cs.CV - 2023-08-30

3D Adversarial Augmentations for Robust Out-of-Domain Predictions

  • paper_url: http://arxiv.org/abs/2308.15479
  • repo_url: None
  • paper_authors: Alexander Lehner, Stefano Gasperini, Alvaro Marcos-Ramiro, Michael Schmidt, Nassir Navab, Benjamin Busam, Federico Tombari
  • for: This paper aims to improve the generalization of 3D object detection and semantic segmentation models to out-of-domain data.
  • methods: The authors augment the training set with adversarial examples: they learn a set of vectors that deform objects in an adversarial fashion while preserving their plausibility through a series of constraints.
  • results: The approach substantially improves the robustness and generalization of both 3D object detection and 3D semantic segmentation methods to out-of-domain data, across a variety of scenarios using data from KITTI, Waymo, and CrashD for object detection and from SemanticKITTI, Waymo, and nuScenes for semantic segmentation, despite training on a single standard dataset.
    Abstract Since real-world training datasets cannot properly sample the long tail of the underlying data distribution, corner cases and rare out-of-domain samples can severely hinder the performance of state-of-the-art models. This problem becomes even more severe for dense tasks, such as 3D semantic segmentation, where points of non-standard objects can be confidently associated to the wrong class. In this work, we focus on improving the generalization to out-of-domain data. We achieve this by augmenting the training set with adversarial examples. First, we learn a set of vectors that deform the objects in an adversarial fashion. To prevent the adversarial examples from being too far from the existing data distribution, we preserve their plausibility through a series of constraints, ensuring sensor-awareness and shapes smoothness. Then, we perform adversarial augmentation by applying the learned sample-independent vectors to the available objects when training a model. We conduct extensive experiments across a variety of scenarios on data from KITTI, Waymo, and CrashD for 3D object detection, and on data from SemanticKITTI, Waymo, and nuScenes for 3D semantic segmentation. Despite training on a standard single dataset, our approach substantially improves the robustness and generalization of both 3D object detection and 3D semantic segmentation methods to out-of-domain data.
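The adversarial augmentation step can be pictured as follows. This is a toy sketch, not the paper's implementation: the function name, the nearest-neighbour pairing of points to vectors, and the amplitude clamp (standing in for the paper's sensor-awareness and smoothness constraints) are all illustrative assumptions.

```python
# Toy sketch of sample-independent adversarial augmentation: a fixed set of
# learned deformation vectors is added to an object's points at training time,
# with a clamp standing in for the paper's plausibility constraints.

def deform_object(points, vectors, max_shift=0.05):
    """Shift each 3D point by its assigned deformation vector, clamped."""
    deformed = []
    for (x, y, z), (dx, dy, dz) in zip(points, vectors):
        clamp = lambda v: max(-max_shift, min(max_shift, v))
        deformed.append((x + clamp(dx), y + clamp(dy), z + clamp(dz)))
    return deformed

points = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
vectors = [(0.2, 0.0, 0.0), (0.0, -0.01, 0.0)]  # learned once, reused across samples
augmented = deform_object(points, vectors)
```

Because the vectors are sample-independent, they can be learned once against a surrogate model and then reapplied cheaply to every object during training.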

An Adaptive Tangent Feature Perspective of Neural Networks

  • paper_url: http://arxiv.org/abs/2308.15478
  • repo_url: None
  • paper_authors: Daniel LeJeune, Sina Alemohammad
  • for: Understand the mechanism of feature learning in neural networks.
  • methods: Study linear models in tangent feature space while allowing the features to be linearly transformed during training, yielding a joint optimization over parameters and transformations with a bilinear interpolation constraint.
  • results: The framework provides additional nuance on how features and the kernel function change in neural networks, and an adaptive-feature implementation of tangent feature classification achieves an order of magnitude lower sample complexity than the fixed tangent feature model on MNIST and CIFAR-10.
    Abstract In order to better understand feature learning in neural networks, we propose a framework for understanding linear models in tangent feature space where the features are allowed to be transformed during training. We consider linear transformations of features, resulting in a joint optimization over parameters and transformations with a bilinear interpolation constraint. We show that this optimization problem has an equivalent linearly constrained optimization with structured regularization that encourages approximately low rank solutions. Specializing to neural network structure, we gain insights into how the features and thus the kernel function change, providing additional nuance to the phenomenon of kernel alignment when the target function is poorly represented using tangent features. In addition to verifying our theoretical observations in real neural networks on a simple regression problem, we empirically show that an adaptive feature implementation of tangent feature classification has an order of magnitude lower sample complexity than the fixed tangent feature model on MNIST and CIFAR-10.
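The bilinear structure of the joint optimization can be made concrete with a minimal sketch. This is not the paper's algorithm; the prediction function below simply shows the model class, y = wᵀ(Mφ(x)), which is bilinear in the weights w and the feature transformation M for a fixed tangent feature vector φ(x).

```python
# Minimal sketch of a linear model on transformed tangent features:
# y = w^T (M phi), bilinear in (w, M) for a fixed feature vector phi.

def predict(w, M, phi):
    """Apply the learned feature transformation M, then the linear head w."""
    transformed = [sum(M[i][j] * phi[j] for j in range(len(phi)))
                   for i in range(len(M))]
    return sum(wi * ti for wi, ti in zip(w, transformed))

phi = [1.0, 2.0]
M = [[1.0, 0.0], [0.0, 1.0]]  # identity transform recovers the fixed-feature model
w = [0.5, -1.0]
y = predict(w, M, phi)        # 0.5 * 1.0 + (-1.0) * 2.0 = -1.5
```

Optimizing over M as well as w is what distinguishes the adaptive model from the fixed tangent feature (kernel) model; the paper shows the joint problem is equivalent to a linearly constrained optimization with structured regularization favoring approximately low-rank solutions.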

Learning Modulated Transformation in GANs

  • paper_url: http://arxiv.org/abs/2308.15472
  • repo_url: None
  • paper_authors: Ceyuan Yang, Qihang Zhang, Yinghao Xu, Jiapeng Zhu, Yujun Shen, Bo Dai
  • for: Improve the capacity of generative adversarial networks (GANs) to model geometric variation, across generative tasks including image generation, 3D-aware image synthesis, and video generation.
  • methods: A plug-and-play module, the modulated transformation module (MTM), predicts spatial offsets under the control of latent codes so that convolutions can be applied at variable locations for different instances.
  • results: Extensive experiments show the approach generalizes across generative tasks and is compatible with state-of-the-art frameworks without any hyper-parameter tuning; notably, on human generation on the TaiChi dataset, it improves the FID of StyleGAN3 from 21.36 to 13.60, demonstrating the efficacy of learning modulated geometry transformation.
    Abstract The success of style-based generators largely benefits from style modulation, which helps take care of the cross-instance variation within data. However, the instance-wise stochasticity is typically introduced via regular convolution, where kernels interact with features at some fixed locations, limiting its capacity for modeling geometric variation. To alleviate this problem, we equip the generator in generative adversarial networks (GANs) with a plug-and-play module, termed as modulated transformation module (MTM). This module predicts spatial offsets under the control of latent codes, based on which the convolution operation can be applied at variable locations for different instances, and hence offers the model an additional degree of freedom to handle geometry deformation. Extensive experiments suggest that our approach can be faithfully generalized to various generative tasks, including image generation, 3D-aware image synthesis, and video generation, and get compatible with state-of-the-art frameworks without any hyper-parameter tuning. It is noteworthy that, towards human generation on the challenging TaiChi dataset, we improve the FID of StyleGAN3 from 21.36 to 13.60, demonstrating the efficacy of learning modulated geometry transformation.
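The offset idea behind MTM can be sketched in miniature. All names here are hypothetical, the offset predictor is reduced to a single linear map, and the sampling is nearest-neighbour rather than the differentiable sampling a real implementation would use; the point is only that the latent code decides *where* the feature map is read.

```python
# Toy sketch of latent-controlled offset sampling: a latent code is mapped to
# a spatial offset, and the feature map is read at the shifted location
# instead of the fixed grid position.

def sample_with_offset(feat, y, x, dy, dx):
    """Nearest-neighbour lookup at (y + dy, x + dx); zero outside the map."""
    yy, xx = round(y + dy), round(x + dx)
    if 0 <= yy < len(feat) and 0 <= xx < len(feat[0]):
        return feat[yy][xx]
    return 0.0

def offsets_from_latent(latent, weight):
    """A linear map standing in for the learned offset predictor."""
    return (sum(l * w for l, w in zip(latent, weight[0])),
            sum(l * w for l, w in zip(latent, weight[1])))

feat = [[0.0, 1.0], [2.0, 3.0]]
dy, dx = offsets_from_latent([1.0, 0.5], [[1.0, 0.0], [0.0, 2.0]])  # (1.0, 1.0)
value = sample_with_offset(feat, 0, 0, dy, dx)  # reads feat[1][1], not feat[0][0]
```

Because the offsets depend on the latent code, different instances read the feature map at different positions, which is the extra degree of freedom the module offers for geometry deformation.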

Input margins can predict generalization too

  • paper_url: http://arxiv.org/abs/2308.15466
  • repo_url: https://github.com/sayantann11/all-classification-templetes-for-ML
  • paper_authors: Coenraad Mouton, Marthinus W. Theunissen, Marelie H. Davel
  • for: investigate the relationship between generalization and classification margins in deep neural networks
  • methods: use margin measurements, specifically constrained margins, to predict generalization ability
  • results: constrained margins achieve highly competitive scores and outperform other margin measurements in general, providing a novel insight into the relationship between generalization and classification margins.
    Abstract Understanding generalization in deep neural networks is an active area of research. A promising avenue of exploration has been that of margin measurements: the shortest distance to the decision boundary for a given sample or its representation internal to the network. While margins have been shown to be correlated with the generalization ability of a model when measured at its hidden representations (hidden margins), no such link between large margins and generalization has been established for input margins. We show that while input margins are not generally predictive of generalization, they can be if the search space is appropriately constrained. We develop such a measure based on input margins, which we refer to as `constrained margins'. The predictive power of this new measure is demonstrated on the 'Predicting Generalization in Deep Learning' (PGDL) dataset and contrasted with hidden representation margins. We find that constrained margins achieve highly competitive scores and outperform other margin measurements in general. This provides a novel insight on the relationship between generalization and classification margins, and highlights the importance of considering the data manifold for investigations of generalization in DNNs.
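The distinction between unconstrained and constrained input margins can be illustrated for a linear classifier, where the distance to the decision boundary along a direction has a closed form. This is an illustrative sketch under simplifying assumptions, not the paper's search procedure: for f(x) = w·x + b, solving f(x + t·d) = 0 gives t = -f(x)/(w·d), and a "constrained" margin restricts the minimization to a given set of directions.

```python
# Distance to the decision boundary of a linear classifier along a direction d,
# and a constrained margin that only searches over an allowed direction set.

def margin_along(w, b, x, d):
    slope = sum(wi * di for wi, di in zip(w, d))
    if slope == 0:
        return float("inf")  # the boundary is never crossed along d
    f_x = sum(wi * xi for wi, xi in zip(w, x)) + b
    return abs(f_x / slope) * (sum(di * di for di in d) ** 0.5)

def constrained_margin(w, b, x, directions):
    return min(margin_along(w, b, x, d) for d in directions)

w, b, x = [1.0, 0.0], 0.0, [2.0, 5.0]
# The shortest distance is 2.0 along [1, 0]; along [0, 1] there is no crossing.
m = constrained_margin(w, b, x, [[1.0, 0.0], [0.0, 1.0]])
```

The paper's point is that which directions are allowed matters: restricting the search (e.g., toward the data manifold) is what turns input margins into a useful generalization predictor.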

Online Overexposed Pixels Hallucination in Videos with Adaptive Reference Frame Selection

  • paper_url: http://arxiv.org/abs/2308.15462
  • repo_url: None
  • paper_authors: Yazhou Xing, Amrita Mazumdar, Anjul Patney, Chao Liu, Hongxu Yin, Qifeng Chen, Jan Kautz, Iuri Frosio
  • for: Address the inability of LDR cameras to handle wide-dynamic-range inputs, reducing local overexposure artifacts and improving image quality.
  • methods: A transformer-based multiscale deep neural network (DNN), trained with a suitable cost function, infers the missing HDR details. A reference frame from the past, chosen by a reference-frame selection DNN trained with reinforcement learning, serves as an additional input to aid the reconstruction of overexposed areas.
  • results: The method achieves state-of-the-art quality without complex acquisition mechanisms (such as alternating exposures) or costly high-dynamic-range imaging processing. A demo video is available at https://drive.google.com/file/d/1-r12BKImLOYCLUoPzdebnMyNjJ4Rk360/view
    Abstract Low dynamic range (LDR) cameras cannot deal with wide dynamic range inputs, frequently leading to local overexposure issues. We present a learning-based system to reduce these artifacts without resorting to complex acquisition mechanisms like alternating exposures or costly processing that are typical of high dynamic range (HDR) imaging. We propose a transformer-based deep neural network (DNN) to infer the missing HDR details. In an ablation study, we show the importance of using a multiscale DNN and train it with the proper cost function to achieve state-of-the-art quality. To aid the reconstruction of the overexposed areas, our DNN takes a reference frame from the past as an additional input. This leverages the commonly occurring temporal instabilities of autoexposure to our advantage: since well-exposed details in the current frame may be overexposed in the future, we use reinforcement learning to train a reference frame selection DNN that decides whether to adopt the current frame as a future reference. Without resorting to alternating exposures, we obtain therefore a causal, HDR hallucination algorithm with potential application in common video acquisition settings. Our demo video can be found at https://drive.google.com/file/d/1-r12BKImLOYCLUoPzdebnMyNjJ4Rk360/view
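The reference-selection decision can be illustrated with a deliberately crude stand-in. The paper trains this decision with reinforcement learning; the heuristic below (adopt the current frame as the future reference when it has a smaller overexposed fraction) is purely illustrative and all names are assumptions.

```python
# Toy stand-in for the learned reference-selection policy: keep as future
# reference whichever frame has fewer overexposed pixels, exploiting the
# temporal instability of autoexposure.

def overexposed_fraction(frame, threshold=250):
    pixels = [p for row in frame for p in row]
    return sum(p >= threshold for p in pixels) / len(pixels)

def update_reference(reference, current):
    if overexposed_fraction(current) < overexposed_fraction(reference):
        return current
    return reference

ref = [[255, 255], [100, 255]]  # mostly blown out
cur = [[120, 255], [100, 90]]   # better exposed
ref = update_reference(ref, cur)
```

Because the reference always comes from the past, the resulting hallucination algorithm stays causal, which is what makes it applicable to ordinary video capture.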

Canonical Factors for Hybrid Neural Fields

  • paper_url: http://arxiv.org/abs/2308.15461
  • repo_url: https://github.com/brentyi/tilted
  • paper_authors: Brent Yi, Weijia Zeng, Sam Buchanan, Yi Ma
  • for: Characterize the undesirable biases that factored (hybrid) neural field architectures introduce for axis-aligned signals, and remove them.
  • methods: Learn a set of canonicalizing transformations jointly with scene appearance to eliminate these biases.
  • results: Experiments show improvements in quality, robustness, compactness, and runtime for image, signed distance, and radiance field reconstruction.
    Abstract Factored feature volumes offer a simple way to build more compact, efficient, and interpretable neural fields, but also introduce biases that are not necessarily beneficial for real-world data. In this work, we (1) characterize the undesirable biases that these architectures have for axis-aligned signals -- they can lead to radiance field reconstruction differences of as high as 2 PSNR -- and (2) explore how learning a set of canonicalizing transformations can improve representations by removing these biases. We prove in a two-dimensional model problem that simultaneously learning these transformations together with scene appearance succeeds with drastically improved efficiency. We validate the resulting architectures, which we call TILTED, using image, signed distance, and radiance field reconstruction tasks, where we observe improvements across quality, robustness, compactness, and runtime. Results demonstrate that TILTED can enable capabilities comparable to baselines that are 2x larger, while highlighting weaknesses of neural field evaluation procedures.
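A minimal 2D sketch of the canonicalization idea: query coordinates are rotated before the axis-aligned feature lookup, so a tilted signal can be represented as if it were axis-aligned. The rotation angle here is fixed by hand for illustration; in TILTED the transformations are learned jointly with scene appearance.

```python
# Rotate a 2D query point before the axis-aligned grid lookup, so an off-axis
# signal is "canonicalized" into the grid's preferred orientation.

import math

def canonicalize(point, angle):
    """Rotate a 2D query point by -angle prior to the feature-grid lookup."""
    c, s = math.cos(angle), math.sin(angle)
    x, y = point
    return (c * x + s * y, -s * x + c * y)

# A point lying on a 45-degree tilted axis maps back onto the x-axis.
x, y = canonicalize((1.0, 1.0), math.pi / 4)
```

After this remapping, an axis-aligned factored feature volume no longer pays the reconstruction penalty (up to 2 PSNR in the paper's characterization) for signals that happen to be tilted.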

Pseudo-Boolean Polynomials Approach To Edge Detection And Image Segmentation

  • paper_url: http://arxiv.org/abs/2308.15453
  • repo_url: None
  • paper_authors: Tendai Mapungwana Chikake, Boris Goldengorin, Alexey Samosyuk
  • for: Used for image edge detection and segmentation
  • methods: Binary classification of blob and edge regions based on the degrees of pseudo-Boolean polynomials calculated on image patches
  • results: Edge detection and segmentation demonstrated on simple images containing primitive shapes, then applied to complex instances such as aerial landscape images
    Abstract We introduce a deterministic approach to edge detection and image segmentation by formulating pseudo-Boolean polynomials on image patches. The approach works by applying a binary classification of blob and edge regions in an image based on the degrees of pseudo-Boolean polynomials calculated on patches extracted from the provided image. We test our method on simple images containing primitive shapes of constant and contrasting colour and establish the feasibility before applying it to complex instances like aerial landscape images. The proposed method is based on the exploitation of the reduction, polynomial degree, and equivalence properties of penalty-based pseudo-Boolean polynomials.
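The patch-classification idea can be illustrated very loosely: a uniform (blob) patch should reduce to a low-degree pseudo-Boolean polynomial, while a patch crossed by a contrast boundary should not. The proxy below (counting distinct rows of the patch) is *not* the paper's reduction; it is a hypothetical stand-in chosen only to make the blob-versus-edge decision concrete.

```python
# Crude illustration: classify a patch as "blob" or "edge" by a degree-like
# proxy. The actual method computes reduced pseudo-Boolean polynomial degrees;
# here the number of distinct rows stands in for that degree.

def degree_proxy(patch):
    return len({tuple(row) for row in patch}) - 1

def classify_patch(patch, threshold=0):
    return "edge" if degree_proxy(patch) > threshold else "blob"

blob = [[5, 5], [5, 5]]        # constant colour reduces to a constant
edge = [[5, 5], [200, 200]]    # a contrast boundary runs through the patch
labels = (classify_patch(blob), classify_patch(edge))
```

The appeal of the deterministic formulation is that the decision depends only on algebraic properties (reduction, degree, equivalence) of the patch, with no learned parameters.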

WrappingNet: Mesh Autoencoder via Deep Sphere Deformation

  • paper_url: http://arxiv.org/abs/2308.15413
  • repo_url: None
  • paper_authors: Eric Lei, Muhammad Asad Lodhi, Jiahao Pang, Junghyun Ahn, Dong Tian
  • for: Enable unsupervised representation learning directly on mesh data, across heterogeneous object categories rather than a single category-specific template.
  • methods: A mesh autoencoder whose bottleneck includes a novel base graph dedicated to representing mesh connectivity, which facilitates learning a shared latent space for object shape.
  • results: Compared to point cloud learning, WrappingNet yields improved reconstruction quality and competitive classification, and supports latent interpolation between meshes of different categories.
    Abstract There have been recent efforts to learn more meaningful representations via fixed length codewords from mesh data, since a mesh serves as a complete model of underlying 3D shape compared to a point cloud. However, the mesh connectivity presents new difficulties when constructing a deep learning pipeline for meshes. Previous mesh unsupervised learning approaches typically assume category-specific templates, e.g., human face/body templates. It restricts the learned latent codes to only be meaningful for objects in a specific category, so the learned latent spaces are unable to be used across different types of objects. In this work, we present WrappingNet, the first mesh autoencoder enabling general mesh unsupervised learning over heterogeneous objects. It introduces a novel base graph in the bottleneck dedicated to representing mesh connectivity, which is shown to facilitate learning a shared latent space representing object shape. The superiority of WrappingNet mesh learning is further demonstrated via improved reconstruction quality and competitive classification compared to point cloud learning, as well as latent interpolation between meshes of different categories.

Robust Long-Tailed Learning via Label-Aware Bounded CVaR

  • paper_url: http://arxiv.org/abs/2308.15405
  • repo_url: None
  • paper_authors: Hong Zhu, Runpeng Yu, Xing Tang, Yifei Wang, Yuan Fang, Yisen Wang
  • for: Real-world classification data are often imbalanced or long-tailed, so naively trained models perform poorly on minority classes.
  • methods: Two novel CVaR (Conditional Value at Risk)-based approaches with solid theoretical guarantees: a Label-Aware Bounded CVaR (LAB-CVaR) loss that overcomes the pessimistic result of the original CVaR, with theoretically optimal weight bounds, and a LAB-CVaR with logit adjustment (LAB-CVaR-logit) loss that stabilizes the optimization process.
  • results: Extensive experiments on real-world datasets with long-tailed label distributions verify the superiority of the proposed methods over naive baselines.
    Abstract Data in the real-world classification problems are always imbalanced or long-tailed, wherein the majority classes have the most of the samples that dominate the model training. In such setting, the naive model tends to have poor performance on the minority classes. Previously, a variety of loss modifications have been proposed to address the long-tailed leaning problem, while these methods either treat the samples in the same class indiscriminatingly or lack a theoretical guarantee. In this paper, we propose two novel approaches based on CVaR (Conditional Value at Risk) to improve the performance of long-tailed learning with a solid theoretical ground. Specifically, we firstly introduce a Label-Aware Bounded CVaR (LAB-CVaR) loss to overcome the pessimistic result of the original CVaR, and further design the optimal weight bounds for LAB-CVaR theoretically. Based on LAB-CVaR, we additionally propose a LAB-CVaR with logit adjustment (LAB-CVaR-logit) loss to stabilize the optimization process, where we also offer the theoretical support. Extensive experiments on real-world datasets with long-tailed label distributions verify the superiority of our proposed methods.
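The CVaR-style objective that the paper builds on can be sketched schematically: instead of the mean loss, average only the worst α-fraction of per-sample losses, optionally scaled by a per-label weight (the "label-aware" part). This sketch does not reproduce the bounded weights or logit adjustment derived in the paper; the function name and weight scheme are illustrative assumptions.

```python
# Schematic CVaR-style loss: average the worst alpha-fraction of per-sample
# losses, with a per-label weight standing in for the label-aware component.

def cvar_loss(losses, labels, alpha, label_weights):
    weighted = sorted((label_weights[y] * l for l, y in zip(losses, labels)),
                      reverse=True)
    k = max(1, int(alpha * len(weighted)))  # size of the worst-case tail
    return sum(weighted[:k]) / k

losses = [0.1, 2.0, 0.3, 4.0]
labels = [0, 1, 0, 1]
loss = cvar_loss(losses, labels, alpha=0.5, label_weights={0: 1.0, 1: 1.0})
```

Focusing the objective on the worst-case tail is what gives tail (minority) classes leverage during training; the paper's contribution is choosing the label-aware weights and their bounds so this focus is neither too pessimistic nor unstable.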