cs.LG - 2023-09-14

How many Neurons do we need? A refined Analysis for Shallow Networks trained with Gradient Descent

  • paper_url: http://arxiv.org/abs/2309.08044
  • repo_url: None
  • paper_authors: Mike Nguyen, Nicole Mücke
  • for: Studying the generalization performance of two-layer neural networks trained with gradient descent (GD) in the neural tangent kernel (NTK) regime.
  • methods: Training with gradient descent (GD); for early-stopped GD, fast rates of convergence are derived that are known to be minimax optimal for non-parametric regression in reproducing kernel Hilbert spaces.
  • results: The paper shows that the weights remain in a vicinity of initialization during training, with a radius that depends on structural assumptions such as the degree of smoothness of the regression function and the eigenvalue decay of the integral operator.
    Abstract We analyze the generalization properties of two-layer neural networks in the neural tangent kernel (NTK) regime, trained with gradient descent (GD). For early stopped GD we derive fast rates of convergence that are known to be minimax optimal in the framework of non-parametric regression in reproducing kernel Hilbert spaces. On our way, we precisely keep track of the number of hidden neurons required for generalization and improve over existing results. We further show that the weights during training remain in a vicinity around initialization, the radius being dependent on structural assumptions such as degree of smoothness of the regression function and eigenvalue decay of the integral operator associated to the NTK.
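The NTK-regime behavior described above (weights staying in a shrinking vicinity of initialization as width grows) can be observed in a toy experiment. The following NumPy sketch, with illustrative hyperparameters of our choosing, trains only the first layer of a width-m two-layer ReLU network with GD and records how far the most-travelled neuron moves from its random initialization:

```python
import numpy as np

def train_shallow(m, X, y, steps=200, lr=0.5, seed=0):
    """Gradient descent on the first-layer weights of
    f(x) = (1/sqrt(m)) * sum_j a_j * relu(w_j . x), with the second layer a fixed."""
    rng = np.random.default_rng(seed)
    W0 = rng.standard_normal((m, X.shape[1]))
    a = rng.choice([-1.0, 1.0], size=m)
    W = W0.copy()

    def mse(Wc):
        return 0.5 * np.mean((np.maximum(X @ Wc.T, 0) @ a / np.sqrt(m) - y) ** 2)

    loss0 = mse(W)
    for _ in range(steps):
        pre = X @ W.T                                   # (n, m) pre-activations
        r = np.maximum(pre, 0) @ a / np.sqrt(m) - y     # residuals
        # gradient of 0.5 * mean((f - y)^2) w.r.t. W
        grad = ((r[:, None] * (pre > 0) * a[None, :]).T @ X) / (len(y) * np.sqrt(m))
        W -= lr * grad
    # distance of the most-travelled neuron from its initialization
    return loss0, mse(W), np.max(np.linalg.norm(W - W0, axis=1))

# toy regression on the circle: the travelled radius shrinks as the width grows
theta = np.linspace(0, 2 * np.pi, 24, endpoint=False)
X = np.column_stack([np.cos(theta), np.sin(theta)])
y = np.sin(2 * theta)

for m in (50, 800):
    l0, l, moved = train_shallow(m, X, y)
    print(f"m={m:4d}  loss {l0:.3f} -> {l:.3f}  max neuron movement {moved:.3f}")
```

The maximum per-neuron displacement shrinks as m grows, consistent with the paper's vicinity result; the precise radius, of course, depends on the structural assumptions the paper analyzes.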

On Prediction Feature Assignment in the Heckman Selection Model

  • paper_url: http://arxiv.org/abs/2309.08043
  • repo_url: None
  • paper_authors: Huy Mai, Xintao Wu
  • for: Handle missing-not-at-random (MNAR) sample selection bias in prediction models.
  • methods: Heckman selection model and its variants, with a novel data-driven framework called Heckman-FA to obtain prediction features.
  • results: Experimental results on real-world datasets show that Heckman-FA produces a robust regression model under MNAR sample selection bias.
    Abstract Under missing-not-at-random (MNAR) sample selection bias, the performance of a prediction model is often degraded. This paper focuses on one classic instance of MNAR sample selection bias where a subset of samples have non-randomly missing outcomes. The Heckman selection model and its variants have commonly been used to handle this type of sample selection bias. The Heckman model uses two separate equations to model the prediction and selection of samples, where the selection features include all prediction features. When using the Heckman model, the prediction features must be properly chosen from the set of selection features. However, choosing the proper prediction features is a challenging task for the Heckman model. This is especially the case when the number of selection features is large. Existing approaches that use the Heckman model often provide a manually chosen set of prediction features. In this paper, we propose Heckman-FA as a novel data-driven framework for obtaining prediction features for the Heckman model. Heckman-FA first trains an assignment function that determines whether or not a selection feature is assigned as a prediction feature. Using the parameters of the trained function, the framework extracts a suitable set of prediction features based on the goodness-of-fit of the prediction model given the chosen prediction features and the correlation between noise terms of the prediction and selection equations. Experimental results on real-world datasets show that Heckman-FA produces a robust regression model under MNAR sample selection bias.
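For intuition, here is a minimal NumPy sketch of the classic Heckman two-step correction that Heckman-FA builds on (this is not the paper's feature-assignment method; the selection-equation parameters are assumed known, and all data and names are illustrative). The outcome is observed only for a non-randomly selected subsample, and adding the inverse Mills ratio of the selection index as a regressor removes the MNAR bias that plain OLS suffers:

```python
import math
import numpy as np

def inv_mills(t):
    """Inverse Mills ratio phi(t)/Phi(t), the Heckman correction term."""
    t = np.asarray(t, dtype=float)
    pdf = np.exp(-0.5 * t ** 2) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(t / math.sqrt(2.0)))
    return pdf / cdf

rng = np.random.default_rng(1)
n, beta = 20000, 2.0
x = rng.standard_normal(n)                   # prediction feature
z = x + rng.standard_normal(n)               # selection index (selection features include x)
u = rng.standard_normal(n)                   # selection-equation noise
e = 0.8 * u + 0.6 * rng.standard_normal(n)   # outcome noise, corr(u, e) = 0.8 => MNAR
y = beta * x + e
sel = (z + u) > 0                            # outcome observed only when selected

xs, ys = x[sel], y[sel]
b_naive = np.polyfit(xs, ys, 1)[0]           # biased: ignores the selection mechanism
lam = inv_mills(z[sel])                      # selection parameters assumed known here
A = np.column_stack([xs, lam, np.ones_like(xs)])
b_heck = np.linalg.lstsq(A, ys, rcond=None)[0][0]
```

Heckman-FA's contribution is deciding, data-drivenly, which selection features should be assigned as prediction features; in this sketch that choice is fixed by construction.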

USM-SCD: Multilingual Speaker Change Detection Based on Large Pretrained Foundation Models

  • paper_url: http://arxiv.org/abs/2309.08023
  • repo_url: None
  • paper_authors: Guanlong Zhao, Yongqiang Wang, Jason Pelecanos, Yu Zhang, Hank Liao, Yiling Huang, Han Lu, Quan Wang
  • for: Detecting speaker changes and performing automatic speech recognition (ASR) for 96 languages.
  • methods: A multilingual speaker change detection model (USM-SCD) adapted from a speech foundation model trained on a large quantity of supervised and unsupervised data, fine-tuned for the downstream tasks of speaker change detection and ASR.
  • results: The USM-SCD model achieves more than 75% average speaker change detection F1 score across a test set of 96 languages, with an 85.8% speaker change detection F1 score on American English. The model also exhibits state-of-the-art ASR quality compared to a strong public ASR baseline, making it suitable for handling both tasks with negligible additional computational cost.
    Abstract We introduce a multilingual speaker change detection model (USM-SCD) that can simultaneously detect speaker turns and perform ASR for 96 languages. This model is adapted from a speech foundation model trained on a large quantity of supervised and unsupervised data, demonstrating the utility of fine-tuning from a large generic foundation model for a downstream task. We analyze the performance of this multilingual speaker change detection model through a series of ablation studies. We show that the USM-SCD model can achieve more than 75% average speaker change detection F1 score across a test set that consists of data from 96 languages. On American English, the USM-SCD model can achieve an 85.8% speaker change detection F1 score across various public and internal test sets, beating the previous monolingual baseline model by 21% relative. We also show that we only need to fine-tune one-quarter of the trainable model parameters to achieve the best model performance. The USM-SCD model exhibits state-of-the-art ASR quality compared with a strong public ASR baseline, making it suitable to handle both tasks with negligible additional computational cost.
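The F1 scores quoted above are computed over predicted speaker-change positions. As a minimal illustration (exact-position matching is assumed here; real evaluations may allow a tolerance window around each change), token-level speaker change detection F1 can be computed as:

```python
def scd_f1(ref, hyp):
    """Speaker-change F1: ref/hyp are sets of token positions marked as changes."""
    tp = len(ref & hyp)                                  # correctly detected changes
    precision = tp / len(hyp) if hyp else 0.0
    recall = tp / len(ref) if ref else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# one reference change missed (7), one false alarm (8), two hits (3, 12)
print(scd_f1({3, 7, 12}, {3, 8, 12}))
```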

CRYPTO-MINE: Cryptanalysis via Mutual Information Neural Estimation

  • paper_url: http://arxiv.org/abs/2309.08019
  • repo_url: None
  • paper_authors: Benjamin D. Kim, Vipindev Adat Vasudevan, Jongchan Woo, Alejandro Cohen, Rafael G. L. D’Oliveira, Thomas Stahlbuhk, Muriel Médard
  • for: Evaluating the computational security of cryptosystems.
  • methods: Using neural networks to estimate the mutual information between plaintext and ciphertext in a chosen plaintext attack.
  • results: Empirical analysis of multiple encryption schemes and baseline approaches, including novel network coding-based cryptosystems, and a study of the relationship between input distribution and information leakage.
    Abstract The use of Mutual Information (MI) as a measure to evaluate the efficiency of cryptosystems has an extensive history. However, estimating MI between unknown random variables in a high-dimensional space is challenging. Recent advances in machine learning have enabled progress in estimating MI using neural networks. This work presents a novel application of MI estimation in the field of cryptography. We propose applying this methodology directly to estimate the MI between plaintext and ciphertext in a chosen plaintext attack. The leaked information, if any, from the encryption could potentially be exploited by adversaries to compromise the computational security of the cryptosystem. We evaluate the efficiency of our approach by empirically analyzing multiple encryption schemes and baseline approaches. Furthermore, we extend the analysis to novel network coding-based cryptosystems that provide individual secrecy and study the relationship between information leakage and input distribution.
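The quantity MINE-style estimators target is the Donsker-Varadhan (DV) lower bound on mutual information, maximized over a neural critic. As a sanity-check sketch (not the paper's setup), the DV bound can be evaluated for a bivariate Gaussian pair using the analytically optimal critic, where the true MI is known in closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
n, rho = 200_000, 0.8
true_mi = -0.5 * np.log(1 - rho ** 2)        # analytic MI of a bivariate Gaussian

x = rng.standard_normal(n)
y = rho * x + np.sqrt(1 - rho ** 2) * rng.standard_normal(n)
y_shuf = rng.permutation(y)                  # samples from the product of marginals

def critic(x, y):
    """log p(x,y) - log p(x)p(y): the optimal DV critic for this Gaussian pair."""
    return (-0.5 * (x ** 2 - 2 * rho * x * y + y ** 2) / (1 - rho ** 2)
            - 0.5 * np.log(1 - rho ** 2) + 0.5 * (x ** 2 + y ** 2))

# Donsker-Varadhan bound: E_joint[T] - log E_marginals[exp(T)]
dv = critic(x, y).mean() - np.log(np.exp(critic(x, y_shuf)).mean())
```

In CRYPTO-MINE the critic is a trained network and the pair is (plaintext, ciphertext); a DV estimate significantly above zero would indicate information leakage exploitable by an adversary.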

Folding Attention: Memory and Power Optimization for On-Device Transformer-based Streaming Speech Recognition

  • paper_url: http://arxiv.org/abs/2309.07988
  • repo_url: None
  • paper_authors: Yang Li, Liangzhen Lai, Yuan Shangguan, Forrest N. Iandola, Ernie Chang, Yangyang Shi, Vikas Chandra
  • for: Optimizing Transformer-based streaming speech recognition models to improve on-device efficiency.
  • methods: A technique called "folding attention" that targets the linear projection layers of multi-head attention and feedforward networks, reducing model size and computation without affecting accuracy.
  • results: Experiments show that folding attention reduces model size (and corresponding memory consumption) by up to 24% and power consumption by up to 23%, without compromising model accuracy or computation overhead.
    Abstract Transformer-based models excel in speech recognition. Existing efforts to optimize Transformer inference, typically for long-context applications, center on simplifying attention score calculations. However, streaming speech recognition models usually process a limited number of tokens each time, making attention score calculation less of a bottleneck. Instead, the bottleneck lies in the linear projection layers of multi-head attention and feedforward networks, constituting a substantial portion of the model size and contributing significantly to computation, memory, and power usage. To address this bottleneck, we propose folding attention, a technique targeting these linear layers, significantly reducing model size and improving memory and power efficiency. Experiments on on-device Transformer-based streaming speech recognition models show that folding attention reduces model size (and corresponding memory consumption) by up to 24% and power consumption by up to 23%, all without compromising model accuracy or computation overhead.
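A back-of-envelope multiply count (assuming standard multi-head attention shapes; the sizes below are illustrative) shows why the linear projections, not the attention-score calculation, dominate when only a few tokens are processed per step, which is exactly the streaming regime folding attention targets:

```python
def mha_flops(d_model, n_heads, n_tokens):
    """Rough multiply counts for one multi-head attention block (FFN excluded)."""
    proj = 4 * n_tokens * d_model * d_model              # Q, K, V, output projections
    head_dim = d_model // n_heads
    scores = 2 * n_heads * n_tokens * n_tokens * head_dim  # QK^T and attn @ V
    return proj, scores

proj_stream, scores_stream = mha_flops(512, 8, 4)        # streaming: few tokens per step
proj_long, scores_long = mha_flops(512, 8, 4096)         # long context: scores dominate
print(proj_stream, scores_stream, proj_long, scores_long)
```

With 4 tokens the projections cost orders of magnitude more multiplies than the scores; at 4096 tokens the relationship flips, which is why long-context attention optimizations do not help the streaming case.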

SLMIA-SR: Speaker-Level Membership Inference Attacks against Speaker Recognition Systems

  • paper_url: http://arxiv.org/abs/2309.07983
  • repo_url: https://github.com/s3l-official/slmia-sr
  • paper_authors: Guangke Chen, Yedi Zhang, Fu Song
  • for: Proposing SLMIA-SR, the first membership inference attack tailored to speaker recognition (SR).
  • methods: The attack quantifies the differences between training and non-training speakers with two groups of carefully engineered features, and improves generalizability through a novel mixing-ratio training strategy.
  • results: Experiments show the attack can successfully determine whether voices of a given speaker were involved in training an SR model, in both white-box and black-box scenarios; two novel techniques further reduce the number of black-box queries.
    Abstract Membership inference attacks allow adversaries to determine whether a particular example was contained in the model's training dataset. While previous works have confirmed the feasibility of such attacks in various applications, none has focused on speaker recognition (SR), a promising voice-based biometric recognition technique. In this work, we propose SLMIA-SR, the first membership inference attack tailored to SR. In contrast to conventional example-level attack, our attack features speaker-level membership inference, i.e., determining if any voices of a given speaker, either the same as or different from the given inference voices, have been involved in the training of a model. It is particularly useful and practical since the training and inference voices are usually distinct, and it is also meaningful considering the open-set nature of SR, namely, the recognition speakers were often not present in the training data. We utilize intra-closeness and inter-farness, two training objectives of SR, to characterize the differences between training and non-training speakers and quantify them with two groups of features driven by carefully-established feature engineering to mount the attack. To improve the generalizability of our attack, we propose a novel mixing ratio training strategy to train attack models. To enhance the attack performance, we introduce voice chunk splitting to cope with the limited number of inference voices and propose to train attack models dependent on the number of inference voices. Our attack is versatile and can work in both white-box and black-box scenarios. Additionally, we propose two novel techniques to reduce the number of black-box queries while maintaining the attack performance. Extensive experiments demonstrate the effectiveness of SLMIA-SR.
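The attack's driving intuition is that training speakers score differently on intra-closeness and inter-farness, the two training objectives of SR, than non-training speakers. A toy sketch of such features over speaker embeddings (the cosine-based quantities below are chosen for illustration; the paper's two feature groups are more carefully engineered):

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def closeness_features(voices, references):
    """Mean pairwise similarity among one speaker's voices (intra-closeness)
    and mean dissimilarity to other speakers' embeddings (inter-farness)."""
    intra = np.mean([cosine(voices[i], voices[j])
                     for i in range(len(voices)) for j in range(i + 1, len(voices))])
    inter = np.mean([1.0 - cosine(v, r) for v in voices for r in references])
    return intra, inter

rng = np.random.default_rng(0)
voices = [np.array([1.0, 0.0, 0.0]) + 0.05 * rng.standard_normal(3)
          for _ in range(4)]                                # one speaker's utterances
references = [np.array([0.0, 1.0, 0.0]) + 0.05 * rng.standard_normal(3)
              for _ in range(4)]                            # other speakers
intra, inter = closeness_features(voices, references)
```

A speaker whose voices were in the training set tends to be embedded tightly (high intra) and far from others (high inter); the attack model consumes such features to infer membership.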

Uncertainty quantification for learned ISTA

  • paper_url: http://arxiv.org/abs/2309.07982
  • repo_url: None
  • paper_authors: Frederik Hoppe, Claudio Mayrink Verdun, Felix Krahmer, Hannah Laus, Holger Rauhut
  • for: Solving inverse problems by combining mathematical modeling with deep learning, improving both efficiency and interpretability.
  • methods: Model-based algorithm unrolling schemes that incorporate interpretable prior domain knowledge into the training process.
  • results: A rigorous method for obtaining confidence intervals for the LISTA estimator, advancing uncertainty quantification for inverse problems.
    Abstract Model-based deep learning solutions to inverse problems have attracted increasing attention in recent years as they bridge state-of-the-art numerical performance with interpretability. In addition, the incorporated prior domain knowledge can make the training more efficient as the smaller number of parameters allows the training step to be executed with smaller datasets. Algorithm unrolling schemes stand out among these model-based learning techniques. Despite their rapid advancement and their close connection to traditional high-dimensional statistical methods, they lack certainty estimates and a theory for uncertainty quantification is still elusive. This work provides a step towards closing this gap proposing a rigorous way to obtain confidence intervals for the LISTA estimator.
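For context, LISTA is the unrolled, learned version of ISTA for the LASSO problem min_x 0.5||Ax - y||^2 + lam * ||x||_1. A minimal NumPy sketch of the underlying iteration, here with the classical untrained choice of the matrices W1, W2 and threshold that LISTA would instead learn (problem sizes are illustrative):

```python
import numpy as np

def soft(x, t):
    """Soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(A, y, lam, steps=500):
    """Plain ISTA; LISTA unrolls a fixed number of these steps and learns W1, W2, theta."""
    L = np.linalg.norm(A, 2) ** 2                       # Lipschitz constant of the gradient
    W1 = A.T / L
    W2 = np.eye(A.shape[1]) - A.T @ A / L
    theta = lam / L
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        x = soft(W2 @ x + W1 @ y, theta)                # the layer LISTA parameterizes
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 60)) / np.sqrt(30)
x_true = np.zeros(60)
x_true[[3, 17, 42]] = [1.5, -2.0, 1.0]                  # 3-sparse ground truth
y = A @ x_true + 0.01 * rng.standard_normal(30)
x_hat = ista(A, y, lam=0.05)
```

The paper's contribution is a rigorous confidence interval around the resulting estimate x_hat, which this sketch does not include.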

Improving physics-informed DeepONets with hard constraints

  • paper_url: http://arxiv.org/abs/2309.07899
  • repo_url: None
  • paper_authors: Rüdiger Brecht, Dmytro R. Popovych, Alex Bihlo, Roman O. Popovych
  • for: Improving current physics-informed (standard or operator) neural networks so that the initial conditions of the system do not need to be learned.
  • methods: A physics-informed deep learning strategy in which initial conditions are represented exactly in the predicted solution rather than learned, guaranteeing continuity of the predicted solution.
  • results: The strategy removes the need to learn initial conditions and guarantees that a DeepONet applied repeatedly to time-step a solution yields a continuous function.
    Abstract Current physics-informed (standard or operator) neural networks still rely on accurately learning the initial conditions of the system they are solving. In contrast, standard numerical methods evolve such initial conditions without needing to learn these. In this study, we propose to improve current physics-informed deep learning strategies such that initial conditions do not need to be learned and are represented exactly in the predicted solution. Moreover, this method guarantees that when a DeepONet is applied multiple times to time step a solution, the resulting function is continuous.
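One standard way to represent an initial condition exactly rather than learn it (a sketch of the general idea, not necessarily the paper's exact construction) is a structural ansatz that multiplies the network output by a factor vanishing at t = 0:

```python
import numpy as np

def hard_constrained(u0, net, t):
    """Ansatz u(t) = u0 + t * net(t): u(0) = u0 holds exactly for ANY network."""
    return u0 + t * net(t)

net = lambda t: np.sin(3.0 * t) - 0.7    # stand-in for a trained (operator) network
u0 = 2.5

# time stepping: the end value of one window seeds the next window's initial
# condition, so the stitched trajectory is continuous at the boundary by construction
T = 0.5
u_end = hard_constrained(u0, net, T)
u_next0 = hard_constrained(u_end, net, 0.0)
```

Because the constraint holds identically in the ansatz, no training loss is spent on fitting initial conditions, and chaining windows cannot introduce jumps.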

Choosing a Proxy Metric from Past Experiments

  • paper_url: http://arxiv.org/abs/2309.07893
  • repo_url: None
  • paper_authors: Nilesh Tripuraneni, Lee Richardson, Alexander D’Amour, Jacopo Soriano, Steve Yadlowsky
  • for: Proposing a new statistical framework to define and construct an optimal proxy metric for use in a homogeneous population of randomized experiments.
  • methods: The framework reduces the construction of an optimal proxy metric in a given experiment to a portfolio optimization problem that depends on the true latent treatment effects and the noise level of the experiment; observed treatment effects of the long-term metric and a set of proxies in a historical corpus of randomized experiments are denoised to extract estimates of the latent treatment effects.
  • results: Instantiated on a large corpus of randomized experiments from an industrial recommendation system, the framework constructs proxy metrics that perform favorably relative to several baselines.
    Abstract In many randomized experiments, the treatment effect of the long-term metric (i.e. the primary outcome of interest) is often difficult or infeasible to measure. Such long-term metrics are often slow to react to changes and sufficiently noisy they are challenging to faithfully estimate in short-horizon experiments. A common alternative is to measure several short-term proxy metrics in the hope they closely track the long-term metric -- so they can be used to effectively guide decision-making in the near-term. We introduce a new statistical framework to both define and construct an optimal proxy metric for use in a homogeneous population of randomized experiments. Our procedure first reduces the construction of an optimal proxy metric in a given experiment to a portfolio optimization problem which depends on the true latent treatment effects and noise level of experiment under consideration. We then denoise the observed treatment effects of the long-term metric and a set of proxies in a historical corpus of randomized experiments to extract estimates of the latent treatment effects for use in the optimization problem. One key insight derived from our approach is that the optimal proxy metric for a given experiment is not apriori fixed; rather it should depend on the sample size (or effective noise level) of the randomized experiment for which it is deployed. To instantiate and evaluate our framework, we employ our methodology in a large corpus of randomized experiments from an industrial recommendation system and construct proxy metrics that perform favorably relative to several baselines.
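The "portfolio" view can be sketched in a few lines: given a historical corpus of experiments with noisy long-term and proxy measurements of latent treatment effects, a combined proxy is a weighted combination fit to predict the long-term metric. The data, sensitivities, and ridge penalty below are purely illustrative; the paper additionally denoises the observed effects and lets the optimal weights depend on the deployed experiment's sample size / noise level:

```python
import numpy as np

rng = np.random.default_rng(7)
k, p = 400, 3                                        # historical experiments, proxies
theta = rng.standard_normal(k)                       # latent long-term treatment effects
sens = np.array([0.8, 0.7, 0.6])                     # each proxy's sensitivity to theta
proxies = theta[:, None] * sens + 0.3 * rng.standard_normal((k, p))
long_term = theta + 0.1 * rng.standard_normal(k)     # noisy long-term readout

# "portfolio" weights: ridge regression of the long-term effect on the proxies
lam = 1e-2
w = np.linalg.solve(proxies.T @ proxies + lam * np.eye(p), proxies.T @ long_term)
combined = proxies @ w
```

The learned combination pools information across proxies, so it tracks the long-term metric more closely than any single proxy does, which is the property the paper optimizes for directly.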

Some notes concerning a generalized KMM-type optimization method for density ratio estimation

  • paper_url: http://arxiv.org/abs/2309.07887
  • repo_url: https://github.com/cdalecsa/generalized-kmm
  • paper_authors: Cristian Daniel Alecsa
  • for: Introducing new optimization algorithms for the task of density ratio estimation.
  • methods: Extending the well-known KMM method through the construction of a suitable loss function, in order to cover more general settings for density ratio estimation.
  • results: The approach estimates density ratios with respect to subsets of the training data and test data, respectively, using the constructed loss function.
    Abstract In the present paper we introduce new optimization algorithms for the task of density ratio estimation. More precisely, we consider extending the well-known KMM method using the construction of a suitable loss function, in order to encompass more general situations involving the estimation of density ratio with respect to subsets of the training data and test data, respectively. The associated codes can be found at https://github.com/CDAlecsa/Generalized-KMM.
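For readers unfamiliar with KMM: it reweights training samples so that their kernel mean embedding matches that of the test sample, which yields density-ratio estimates at the training points. A minimal unconstrained, regularized sketch (the actual method, and the paper's generalization, solve a constrained quadratic program; kernel width and regularization below are illustrative):

```python
import numpy as np

def rbf(A, B, s=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * s * s))

def kmm_weights(x_tr, x_te, s=1.0, reg=1e-3):
    """Match kernel mean embeddings of weighted training and test samples:
    the unconstrained minimizer solves K w = kappa (here with ridge regularization)."""
    K = rbf(x_tr, x_tr, s)
    kappa = rbf(x_tr, x_te, s).mean(axis=1) * len(x_tr)   # (n_tr/n_te) * sum_j k(x_i, x_j^te)
    w = np.linalg.solve(K + reg * len(x_tr) * np.eye(len(x_tr)), kappa)
    return np.clip(w, 0.0, None)                          # crude stand-in for w >= 0 constraint

rng = np.random.default_rng(0)
x_tr = rng.normal(0.0, 1.0, (300, 1))     # training distribution p
x_te = rng.normal(1.0, 0.5, (300, 1))     # shifted test distribution q
w = kmm_weights(x_tr, x_te, s=0.5, reg=1e-2)
```

Training points in regions where the test density is higher receive larger weights, approximating the ratio q/p; the repository linked above contains the paper's full generalized formulation.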

Identifying the Group-Theoretic Structure of Machine-Learned Symmetries

  • paper_url: http://arxiv.org/abs/2309.07860
  • repo_url: None
  • paper_authors: Roy T. Forestano, Konstantin T. Matchev, Katia Matcheva, Alexander Roman, Eyup B. Unlu, Sarunas Verner
  • for: Examining and identifying the group-theoretic structure of symmetry transformations discovered by deep learning.
  • methods: Loss functions that probe the subalgebra structure of machine-learned symmetries, either during the deep learning stage of symmetry discovery or in a subsequent post-processing stage.
  • results: The methods are illustrated with examples from the U(n) Lie group family and applied in particle physics to identify the residual symmetries after the spontaneous breaking of non-Abelian gauge symmetries such as SU(3) and SU(5).
    Abstract Deep learning was recently successfully used in deriving symmetry transformations that preserve important physics quantities. Being completely agnostic, these techniques postpone the identification of the discovered symmetries to a later stage. In this letter we propose methods for examining and identifying the group-theoretic structure of such machine-learned symmetries. We design loss functions which probe the subalgebra structure either during the deep learning stage of symmetry discovery or in a subsequent post-processing stage. We illustrate the new methods with examples from the U(n) Lie group family, obtaining the respective subalgebra decompositions. As an application to particle physics, we demonstrate the identification of the residual symmetries after the spontaneous breaking of non-Abelian gauge symmetries like SU(3) and SU(5) which are commonly used in model building.
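The post-processing stage can be sketched concretely: given a set of (machine-learned) generators, expand each commutator back in the generator basis via trace inner products; the resulting structure constants reveal the algebra, and a nonzero expansion residual would signal that the set does not close into a subalgebra. Here the known su(2) generators stand in for learned ones:

```python
import numpy as np

# su(2) generators (Pauli matrices / 2): [J1, J2] = i J3 and cyclic permutations
gens = [np.array([[0, 1], [1, 0]], complex) / 2,
        np.array([[0, -1j], [1j, 0]]) / 2,
        np.array([[1, 0], [0, -1]], complex) / 2]

def comm(A, B):
    return A @ B - B @ A

def structure_constants(gs):
    """Expand each commutator in the generator basis using trace inner products
    (assumes trace-orthogonal generators, as holds for the Pauli basis)."""
    n = len(gs)
    norms = [np.trace(g.conj().T @ g).real for g in gs]
    f = np.zeros((n, n, n), complex)
    for a in range(n):
        for b in range(n):
            c_ab = comm(gs[a], gs[b])
            for c in range(n):
                f[a, b, c] = np.trace(gs[c].conj().T @ c_ab) / norms[c]
    return f

f = structure_constants(gens)
```

Reconstructing each commutator from the extracted constants and checking the residual is zero confirms closure; applying the same check to subsets of learned generators identifies candidate subalgebras.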

Complex-Valued Neural Networks for Data-Driven Signal Processing and Signal Understanding

  • paper_url: http://arxiv.org/abs/2309.07948
  • repo_url: None
  • paper_authors: Josiah W. Smith
  • for: Providing a complex-valued neural network library built on PyTorch, with lightweight interfaces for common complex-valued neural network operations and architectures.
  • methods: Efficient implementations of linear, convolution, and attention modules, along with activation functions and normalization layers such as batchnorm and layernorm; although the focus is on 1-D data tensors for signal processing, communications, and radar data, many routines are also implemented for 2-D and 3-D data.
  • results: A lightweight library that additionally includes manifold-based complex-valued neural network layers, which have shown tremendous promise but remain relatively unexplored in many research contexts.
    Abstract Complex-valued neural networks have emerged boasting superior modeling performance for many tasks across the signal processing, sensing, and communications arenas. However, developing complex-valued models currently demands development of basic deep learning operations, such as linear or convolution layers, as modern deep learning frameworks like PyTorch and Tensor flow do not adequately support complex-valued neural networks. This paper overviews a package built on PyTorch with the intention of implementing light-weight interfaces for common complex-valued neural network operations and architectures. Similar to natural language understanding (NLU), which as recently made tremendous leaps towards text-based intelligence, RF Signal Understanding (RFSU) is a promising field extending conventional signal processing algorithms using a hybrid approach of signal mechanics-based insight with data-driven modeling power. Notably, we include efficient implementations for linear, convolution, and attention modules in addition to activation functions and normalization layers such as batchnorm and layernorm. Additionally, we include efficient implementations of manifold-based complex-valued neural network layers that have shown tremendous promise but remain relatively unexplored in many research contexts. Although there is an emphasis on 1-D data tensors, due to a focus on signal processing, communications, and radar data, many of the routines are implemented for 2-D and 3-D data as well. Specifically, the proposed approach offers a useful set of tools and documentation for data-driven signal processing research and practical implementation.
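The core building block such a package provides is a complex linear layer. Because mainstream frameworks have historically lacked first-class complex support, it is commonly implemented with four real matrix products; a NumPy sketch of the arithmetic (the package itself is PyTorch-based, and the class name here is illustrative):

```python
import numpy as np

class ComplexLinear:
    """Complex y = W x via four real matmuls: (A + iB)(u + iv) = (Au - Bv) + i(Bu + Av)."""
    def __init__(self, d_in, d_out, seed=0):
        rng = np.random.default_rng(seed)
        self.A = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)  # real part of W
        self.B = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)  # imaginary part of W

    def __call__(self, u, v):                # real and imaginary parts of the input
        return self.A @ u - self.B @ v, self.B @ u + self.A @ v

layer = ComplexLinear(4, 3)
u, v = np.arange(4.0), np.ones(4)
ro, io = layer(u, v)
```

The split-real formulation matches a direct complex matrix-vector product exactly, which is why it can back a drop-in complex layer on top of real-tensor autograd.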

Learning to Warm-Start Fixed-Point Optimization Algorithms

  • paper_url: http://arxiv.org/abs/2309.07835
  • repo_url: https://github.com/stellatogrp/l2ws_fixed_point
  • paper_authors: Rajiv Sambharya, Georgina Hall, Brandon Amos, Bartolomeo Stellato
  • for: Proposing a machine-learning framework to warm-start fixed-point optimization algorithms.
  • methods: A neural network maps problem parameters to warm starts, followed by a predefined number of fixed-point iterations; two loss functions are proposed that minimize either the fixed-point residual or the distance to a ground-truth solution.
  • results: Applied to well-known problems in control, statistics, and signal processing, the framework significantly reduces the number of iterations and the solution time required, through learned warm starts.
    Abstract We introduce a machine-learning framework to warm-start fixed-point optimization algorithms. Our architecture consists of a neural network mapping problem parameters to warm starts, followed by a predefined number of fixed-point iterations. We propose two loss functions designed to either minimize the fixed-point residual or the distance to a ground truth solution. In this way, the neural network predicts warm starts with the end-to-end goal of minimizing the downstream loss. An important feature of our architecture is its flexibility, in that it can predict a warm start for fixed-point algorithms run for any number of steps, without being limited to the number of steps it has been trained on. We provide PAC-Bayes generalization bounds on unseen data for common classes of fixed-point operators: contractive, linearly convergent, and averaged. Applying this framework to well-known applications in control, statistics, and signal processing, we observe a significant reduction in the number of iterations and solution time required to solve these problems, through learned warm starts.

Directed Scattering for Knowledge Graph-based Cellular Signaling Analysis

  • paper_url: http://arxiv.org/abs/2309.07813
  • repo_url: None
  • paper_authors: Aarthi Venkat, Joyce Chew, Ferran Cardoso Rodriguez, Christopher J. Tape, Michael Perlmutter, Smita Krishnaswamy
  • for: Describing the Directed Scattering Autoencoder (DSAE), a new method for learning the latent hierarchies of scientific knowledge graphs.
  • methods: A directed version of the geometric scattering transform, combined with the non-linear dimensionality reduction of an autoencoder and the geometric properties of hyperbolic space.
  • results: The method outperforms numerous others on tasks such as embedding directed graphs and learning cellular signaling networks.
    Abstract Directed graphs are a natural model for many phenomena, in particular scientific knowledge graphs such as molecular interaction or chemical reaction networks that define cellular signaling relationships. In these situations, source nodes typically have distinct biophysical properties from sinks. Due to their ordered and unidirectional relationships, many such networks also have hierarchical and multiscale structure. However, the majority of methods performing node- and edge-level tasks in machine learning do not take these properties into account, and thus have not been leveraged effectively for scientific tasks such as cellular signaling network inference. We propose a new framework called Directed Scattering Autoencoder (DSAE) which uses a directed version of a geometric scattering transform, combined with the non-linear dimensionality reduction properties of an autoencoder and the geometric properties of the hyperbolic space to learn latent hierarchies. We show this method outperforms numerous others on tasks such as embedding directed graphs and learning cellular signaling networks.

Communication Efficient Private Federated Learning Using Dithering

  • paper_url: http://arxiv.org/abs/2309.07809
  • repo_url: None
  • paper_authors: Burak Hasircioglu, Deniz Gunduz
  • for: Preserving privacy while ensuring efficient communication in federated learning.
  • methods: A quantization scheme based on subtractive dithering at the clients, which replicates the normal noise addition process at the aggregator.
  • results: Experiments show the method guarantees the same level of differential privacy against other clients while substantially reducing communication, with accuracy matching that of the full-precision gradient method.
    Abstract The task of preserving privacy while ensuring efficient communication is a fundamental challenge in federated learning. In this work, we tackle this challenge in the trusted aggregator model, and propose a solution that achieves both objectives simultaneously. We show that employing a quantization scheme based on subtractive dithering at the clients can effectively replicate the normal noise addition process at the aggregator. This implies that we can guarantee the same level of differential privacy against other clients while substantially reducing the amount of communication required, as opposed to transmitting full precision gradients and using central noise addition. We also experimentally demonstrate that the accuracy of our proposed approach matches that of the full precision gradient method.
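A sketch of the subtractive dithering step (parameters illustrative): client and aggregator share a dither sequence via a common seed, the client transmits only the integers q, and the aggregator subtracts the same dither. The reconstruction error is uniform on [-Δ/2, Δ/2] and, crucially for the privacy argument, statistically independent of the quantized value:

```python
import numpy as np

def subtractive_dither_quantize(x, delta, rng):
    """Client quantizes with shared dither u; the aggregator subtracts the same u."""
    u = rng.uniform(-0.5, 0.5, size=x.shape)   # shared via a common seed
    q = np.round(x / delta + u)                # integers: all that is transmitted
    return (q - u) * delta                     # reconstruction at the aggregator

rng_shared = np.random.default_rng(123)        # same seed on client and aggregator
x = np.random.default_rng(0).standard_normal(1000)   # stand-in for a gradient vector
x_hat = subtractive_dither_quantize(x, delta=0.1, rng=rng_shared)
err = x_hat - x
```

Because the error behaves like independent additive noise, it can play the role of the noise the aggregator would otherwise add for differential privacy, while each coordinate travels as a small integer instead of a full-precision float.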

Interpretability is in the Mind of the Beholder: A Causal Framework for Human-interpretable Representation Learning

  • paper_url: http://arxiv.org/abs/2309.07742
  • repo_url: None
  • paper_authors: Emanuele Marconato, Andrea Passerini, Stefano Teso
  • for: Providing a mathematical framework for acquiring interpretable representations, suitable for both post-hoc explainers and concept-based neural networks.
  • methods: Building on recent advances in causal representation learning, the framework explicitly models the human stakeholder as an external observer, yielding a principled notion of alignment between the machine representation and the human's vocabulary of concepts.
  • results: A simple "name transfer game" links alignment and interpretability, and alignment is connected to concept leakage (undesirable correlations among concepts) and to content-style separation.
    Abstract Focus in Explainable AI is shifting from explanations defined in terms of low-level elements, such as input features, to explanations encoded in terms of interpretable concepts learned from data. How to reliably acquire such concepts is, however, still fundamentally unclear. An agreed-upon notion of concept interpretability is missing, with the result that concepts used by both post-hoc explainers and concept-based neural networks are acquired through a variety of mutually incompatible strategies. Critically, most of these neglect the human side of the problem: a representation is understandable only insofar as it can be understood by the human at the receiving end. The key challenge in Human-interpretable Representation Learning (HRL) is how to model and operationalize this human element. In this work, we propose a mathematical framework for acquiring interpretable representations suitable for both post-hoc explainers and concept-based neural networks. Our formalization of HRL builds on recent advances in causal representation learning and explicitly models a human stakeholder as an external observer. This allows us to derive a principled notion of alignment between the machine representation and the vocabulary of concepts understood by the human. In doing so, we link alignment and interpretability through a simple and intuitive name transfer game, and clarify the relationship between alignment and a well-known property of representations, namely disentanglment. We also show that alignment is linked to the issue of undesirable correlations among concepts, also known as concept leakage, and to content-style separation, all through a general information-theoretic reformulation of these properties. Our conceptualization aims to bridge the gap between the human and algorithmic sides of interpretability and establish a stepping stone for new research on human-interpretable representations.
    摘要 可解释人工智能(Explainable AI)的关注点,正从以输入特征等低级元素定义的解释,转向以从数据中学习的可解释概念编码的解释。然而,如何可靠地获取这些概念在根本上仍不清楚。由于缺乏公认的概念可解释性定义,事后解释器与基于概念的神经网络所使用的概念是通过各种互不兼容的策略获取的。更关键的是,这些策略大多忽略了问题中人的一面:一个表示只有在接收端的人能够理解它时才算可理解。人类可解释表示学习(HRL)的核心挑战在于如何建模并落实这一人类因素。在本工作中,我们提出了一个数学框架,用于获取同时适用于事后解释器和基于概念的神经网络的可解释表示。我们对HRL的形式化建立在因果表示学习的最新进展之上,并将人类利益相关者显式建模为外部观察者。这使我们能够推导出机器表示与人类所理解的概念词汇之间对齐(alignment)的原则性定义。在此过程中,我们通过一个简单直观的「名称转移游戏」将对齐与可解释性联系起来,并阐明了对齐与表示的一个著名性质(即解耦)之间的关系。我们还表明,对齐与概念之间不良相关(即概念泄露)问题以及内容-风格分离相关,这些性质均通过一种一般的信息论重新表述统一起来。我们的概念化旨在弥合可解释性的人类侧与算法侧之间的差距,为人类可解释表示的新研究奠定基础。

Slow Invariant Manifolds of Singularly Perturbed Systems via Physics-Informed Machine Learning

  • paper_url: http://arxiv.org/abs/2309.07946
  • repo_url: None
  • paper_authors: Dimitrios G. Patsatzis, Gianluca Fabiani, Lucia Russo, Constantinos Siettos
  • for: 该论文旨在提出一种物理信息机器学习(PIML)方法,用于近似奇异摄动系统中的慢不变流形(SIMs)。
  • methods: 该方法使用两种神经网络结构,即前馈神经网络(FNNs)和随机投影神经网络(RPNNs),并使用符号微分计算学习过程所需的梯度。
  • results: 该方法的近似精度与传统基于GSPT的方法相当甚至更高,且在实际应用中不受摄动参数大小的影响;论文还比较了学习过程中符号、自动与数值微分的计算成本。
    Abstract We present a physics-informed machine-learning (PIML) approach for the approximation of slow invariant manifolds (SIMs) of singularly perturbed systems, providing functionals in an explicit form that facilitate the construction and numerical integration of reduced order models (ROMs). The proposed scheme solves a partial differential equation corresponding to the invariance equation (IE) within the Geometric Singular Perturbation Theory (GSPT) framework. For the solution of the IE, we used two neural network structures, namely feedforward neural networks (FNNs), and random projection neural networks (RPNNs), with symbolic differentiation for the computation of the gradients required for the learning process. The efficiency of our PIML method is assessed via three benchmark problems, namely the Michaelis-Menten, the target mediated drug disposition reaction mechanism, and the 3D Sel'kov model. We show that the proposed PIML scheme provides approximations, of equivalent or even higher accuracy, than those provided by other traditional GSPT-based methods, and importantly, for any practical purposes, it is not affected by the magnitude of the perturbation parameter. This is of particular importance, as there are many systems for which the gap between the fast and slow timescales is not that big, but still ROMs can be constructed. A comparison of the computational costs between symbolic, automatic and numerical approximation of the required derivatives in the learning process is also provided.
    摘要 我们提出了一种物理信息机器学习(PIML)方法,用于近似奇异摄动系统的慢不变流形(SIMs),并以显式函数形式给出结果,从而便于构建降阶模型(ROMs)并进行数值积分。所提方案在几何奇异摄动理论(GSPT)框架下求解对应于不变性方程(IE)的偏微分方程。为求解IE,我们使用了两种神经网络结构,即前馈神经网络(FNNs)和随机投影神经网络(RPNNs),并使用符号微分计算学习过程所需的梯度。我们通过三个基准问题评估了PIML方法的效率:Michaelis-Menten机制、靶点介导药物处置反应机制,以及三维Sel'kov模型。结果表明,所提PIML方案的近似精度与其他传统基于GSPT的方法相当甚至更高,而且在实际应用中不受摄动参数大小的影响。这一点尤为重要,因为许多系统的快慢时间尺度差距并不大,但仍可为其构建ROMs。我们还比较了学习过程中所需导数的符号、自动与数值近似的计算成本。
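As a concrete illustration of the invariance equation the PIML scheme solves, here is a minimal sketch on a toy linear fast-slow system (our own example and function names, not one of the paper's benchmarks): the slow invariant manifold y = h(x) of x' = f(x, y), eps·y' = g(x, y) must zero the IE residual eps·h'(x)·f(x, h(x)) − g(x, h(x)).

```python
# Toy fast-slow system (our example):  x' = -x,  eps * y' = x - y.
# Its exact slow invariant manifold is y = h(x) = x / (1 - eps).

def ie_residual(h, dh, x, eps):
    """Invariance-equation residual  eps * h'(x) * f(x, h(x)) - g(x, h(x))
    for f(x, y) = -x and g(x, y) = x - y; a PIML scheme drives this to zero."""
    f = -x
    g = x - h(x)
    return eps * dh(x) * f - g

eps = 0.1
exact_h = lambda x: x / (1.0 - eps)       # exact SIM
exact_dh = lambda x: 1.0 / (1.0 - eps)
qssa_h = lambda x: x                      # leading-order quasi-steady-state guess
qssa_dh = lambda x: 1.0
```

The exact manifold zeroes the residual, while the leading-order guess leaves an O(eps) error — exactly the error a neural approximation of h would be trained to remove.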

Understanding Vector-Valued Neural Networks and Their Relationship with Real and Hypercomplex-Valued Neural Networks

  • paper_url: http://arxiv.org/abs/2309.07716
  • repo_url: None
  • paper_authors: Marcos Eduardo Valle
  • for: This paper presents a broad framework for vector-valued neural networks (V-nets) that can naturally consider the intercorrelation between feature channels.
  • methods: The paper explains the relationship between vector-valued and traditional neural networks, and shows how V-nets can be implemented in current deep-learning libraries as real-valued networks.
  • results: V-nets typically have fewer parameters and undergo more robust training than traditional real-valued networks.
    Abstract Despite the many successful applications of deep learning models for multidimensional signal and image processing, most traditional neural networks process data represented by (multidimensional) arrays of real numbers. The intercorrelation between feature channels is usually expected to be learned from the training data, requiring numerous parameters and careful training. In contrast, vector-valued neural networks are conceived to process arrays of vectors and naturally consider the intercorrelation between feature channels. Consequently, they usually have fewer parameters and often undergo more robust training than traditional neural networks. This paper aims to present a broad framework for vector-valued neural networks, referred to as V-nets. In this context, hypercomplex-valued neural networks are regarded as vector-valued models with additional algebraic properties. Furthermore, this paper explains the relationship between vector-valued and traditional neural networks. Precisely, a vector-valued neural network can be obtained by placing restrictions on a real-valued model to consider the intercorrelation between feature channels. Finally, we show how V-nets, including hypercomplex-valued neural networks, can be implemented in current deep-learning libraries as real-valued networks.
    摘要 尽管深度学习模型在多维信号与图像处理方面取得了许多成功应用,但大多数传统神经网络处理的是由(多维)实数数组表示的数据。特征通道之间的相互关联通常需要从训练数据中学习,因而需要大量参数并精心训练。相比之下,向量值神经网络专为处理向量数组而设计,天然地考虑特征通道之间的相互关联,因此通常参数更少,训练也往往比传统神经网络更稳健。本文旨在提出一个描述向量值神经网络的广义框架,称为V-网络。在此框架下,超复数值神经网络被视为具有额外代数性质的向量值模型。此外,本文还解释了向量值神经网络与传统神经网络之间的关系:具体而言,通过对实值模型施加考虑特征通道相互关联的限制,即可得到一个向量值神经网络。最后,我们展示了如何在当前深度学习库中将包括超复数值神经网络在内的V-网络实现为实值网络。
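To illustrate the paper's final point — that V-nets can be implemented as restricted real-valued networks — here is a minimal sketch (plain Python, hypothetical helper names) of the simplest hypercomplex case: a complex-valued dense layer, and the tied-parameter real weight matrix equivalent to it.

```python
def complex_dense(x_re, x_im, w_re, w_im):
    """Complex multiply-accumulate y = W x, with W and x given as (re, im) parts."""
    n_out, n_in = len(w_re), len(w_re[0])
    y_re = [sum(w_re[i][j] * x_re[j] - w_im[i][j] * x_im[j] for j in range(n_in))
            for i in range(n_out)]
    y_im = [sum(w_re[i][j] * x_im[j] + w_im[i][j] * x_re[j] for j in range(n_in))
            for i in range(n_out)]
    return y_re, y_im

def as_real_layer(w_re, w_im):
    """Equivalent real-valued weight matrix: each complex entry a + bi becomes
    the 2x2 block [[a, -b], [b, a]] -- a real layer with tied parameters."""
    n_out, n_in = len(w_re), len(w_re[0])
    W = [[0.0] * (2 * n_in) for _ in range(2 * n_out)]
    for i in range(n_out):
        for j in range(n_in):
            a, b = w_re[i][j], w_im[i][j]
            W[2 * i][2 * j], W[2 * i][2 * j + 1] = a, -b
            W[2 * i + 1][2 * j], W[2 * i + 1][2 * j + 1] = b, a
    return W
```

Since every complex weight appears as a 2x2 real block with tied entries, ordinary real-valued deep-learning machinery can train such a layer unchanged.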

Market-GAN: Adding Control to Financial Market Data Generation with Semantic Context

  • paper_url: http://arxiv.org/abs/2309.07708
  • repo_url: None
  • paper_authors: Haochong Xia, Shuo Sun, Xinrun Wang, Bo An
  • for: 这个论文的目的是提高金融预测精度、管理风险和促进金融决策。
  • methods: 该论文提出了一种新的金融市场模拟方法,包括市场动力模型、动态时间戳拼接和生成对抗网络。
  • results: 论文在使用 Dow Jones 工业指数数据从 2000 年到 2023 年进行评估,并显示了与 4 种现有时间序列生成模型相比的超越性表现。
    Abstract Financial simulators play an important role in enhancing forecasting accuracy, managing risks, and fostering strategic financial decision-making. Despite the development of financial market simulation methodologies, existing frameworks often struggle with adapting to specialized simulation context. We pinpoint the challenges as i) current financial datasets do not contain context labels; ii) current techniques are not designed to generate financial data with context as control, which demands greater precision compared to other modalities; iii) the inherent difficulties in generating context-aligned, high-fidelity data given the non-stationary, noisy nature of financial data. To address these challenges, our contributions are: i) we proposed the Contextual Market Dataset with market dynamics, stock ticker, and history state as context, leveraging a market dynamics modeling method that combines linear regression and Dynamic Time Warping clustering to extract market dynamics; ii) we present Market-GAN, a novel architecture incorporating a Generative Adversarial Networks (GAN) for the controllable generation with context, an autoencoder for learning low-dimension features, and supervisors for knowledge transfer; iii) we introduce a two-stage training scheme to ensure that Market-GAN captures the intrinsic market distribution with multiple objectives. In the pertaining stage, with the use of the autoencoder and supervisors, we prepare the generator with a better initialization for the adversarial training stage. We propose a set of holistic evaluation metrics that consider alignment, fidelity, data usability on downstream tasks, and market facts. We evaluate Market-GAN with the Dow Jones Industrial Average data from 2000 to 2023 and showcase superior performance in comparison to 4 state-of-the-art time-series generative models.
    摘要 金融模拟器在提高预测精度、管理风险和促进战略性金融决策方面发挥着重要作用。尽管金融市场模拟方法不断发展,现有框架往往难以适应专门的模拟情境。我们将挑战归纳为:一、现有金融数据集不包含情境标签;二、现有技术并非为以情境作为控制条件生成金融数据而设计,而这相比其他模态要求更高的精度;三、鉴于金融数据的非平稳、高噪声特性,生成与情境对齐的高保真数据本身就十分困难。针对这些挑战,我们的贡献包括:一、提出了以市场动态、股票代码和历史状态为情境的Contextual Market数据集,并采用结合线性回归与动态时间规整(DTW)聚类的市场动态建模方法来提取市场动态;二、提出了Market-GAN,一种新颖架构,其结合了用于情境可控生成的生成对抗网络(GAN)、用于学习低维特征的自编码器,以及用于知识迁移的监督器;三、引入两阶段训练方案,以确保Market-GAN在多目标下捕捉内在市场分布:在预备阶段,借助自编码器与监督器,为对抗训练阶段的生成器提供更好的初始化。我们提出了一组综合评估指标,涵盖对齐性、保真度、下游任务可用性与市场事实。我们使用2000年至2023年的道琼斯工业平均指数数据评估Market-GAN,结果显示其性能优于4种最先进的时间序列生成模型。
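A minimal sketch of the context-extraction idea (thresholds and label names are our own; the paper additionally refines such labels with Dynamic Time Warping clustering, omitted here): fit a least-squares line to each price window and bucket its slope into market-dynamics labels.

```python
def slope(prices):
    """Least-squares slope of prices against time indices 0..n-1."""
    n = len(prices)
    mx = (n - 1) / 2.0
    my = sum(prices) / n
    num = sum((x - mx) * (y - my) for x, y in enumerate(prices))
    den = sum((x - mx) ** 2 for x in range(n))
    return num / den

def dynamics_label(prices, flat_band=0.05):
    """Bucket a window's trend into a coarse market-dynamics context label."""
    s = slope(prices)
    if s > flat_band:
        return "bull"
    if s < -flat_band:
        return "bear"
    return "flat"
```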

Causal Entropy and Information Gain for Measuring Causal Control

  • paper_url: http://arxiv.org/abs/2309.07703
  • repo_url: None
  • paper_authors: Francisco Nunes Ferreira Quialheiro Simoes, Mehdi Dastani, Thijs van Ommen
  • for: 本文旨在提出纳入因果结构的信息论量用于特征选择,以提高模型的可解释性。
  • methods: 本文提出了两种新的信息论量:因果熵(causal entropy)和因果信息增益(causal information gain),用于评估特征对给定结果变量的因果控制能力。
  • results: 与标准互信息相比,因果信息增益能更好地揭示哪些特征能够控制所选的结果变量。
    Abstract Artificial intelligence models and methods commonly lack causal interpretability. Despite the advancements in interpretable machine learning (IML) methods, they frequently assign importance to features which lack causal influence on the outcome variable. Selecting causally relevant features among those identified as relevant by these methods, or even before model training, would offer a solution. Feature selection methods utilizing information theoretical quantities have been successful in identifying statistically relevant features. However, the information theoretical quantities they are based on do not incorporate causality, rendering them unsuitable for such scenarios. To address this challenge, this article proposes information theoretical quantities that incorporate the causal structure of the system, which can be used to evaluate causal importance of features for some given outcome variable. Specifically, we introduce causal versions of entropy and mutual information, termed causal entropy and causal information gain, which are designed to assess how much control a feature provides over the outcome variable. These newly defined quantities capture changes in the entropy of a variable resulting from interventions on other variables. Fundamental results connecting these quantities to the existence of causal effects are derived. The use of causal information gain in feature selection is demonstrated, highlighting its superiority over standard mutual information in revealing which features provide control over a chosen outcome variable. Our investigation paves the way for the development of methods with improved interpretability in domains involving causation.
    摘要 人工智能模型和方法通常缺乏因果可解释性。尽管可解释机器学习(IML)方法取得了进展,它们仍经常将重要性赋予对结果变量没有因果影响的特征。在这些方法识别出的相关特征中(甚至在模型训练之前)挑选出具有因果相关性的特征,将是一种解决方案。基于信息论量的特征选择方法已能成功识别统计上相关的特征;然而,这些方法所依赖的信息论量不包含因果性,因此不适用于上述情形。为应对这一挑战,本文提出了纳入系统因果结构的信息论量,可用于评估特征对给定结果变量的因果重要性。具体而言,我们引入熵和互信息的因果版本,称为因果熵(causal entropy)和因果信息增益(causal information gain),用于评估某一特征对结果变量的控制能力。这些新定义的量刻画了对其他变量进行干预时某一变量熵的变化。我们推导了将这些量与因果效应的存在联系起来的基本结果,并演示了因果信息增益在特征选择中的应用,突显其在揭示哪些特征能控制所选结果变量方面优于标准互信息。我们的研究为在涉及因果的领域中发展具有更好可解释性的方法铺平了道路。
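The distinction the paper draws can be seen on a toy structural causal model (our own illustration; the paper's formal definitions differ in detail): with a confounder Z driving both X and Y, observational mutual information I(X; Y) is large even though intervening on X gives no control over Y.

```python
import math

def H(dist):
    """Shannon entropy in bits of a dict {value: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Toy SCM with a confounder: Z ~ Bernoulli(0.5), X := Z, Y := Z.
P_Z = {0: 0.5, 1: 0.5}
P_Y = {0: 0.5, 1: 0.5}          # Y = Z, so Y is Bernoulli(0.5)

# Observationally X determines Y exactly, so H(Y | X) = 0 and the
# standard mutual information is maximal:
mi = H(P_Y) - 0.0               # I(X; Y) = 1 bit

# Under do(X = x) the edge Z -> X is cut, but Y := Z is untouched,
# so Y stays Bernoulli(0.5) whatever value we force X to take.
def P_Y_do(x):
    return dict(P_Z)            # Y = Z regardless of x

# Interventional analogue of information gain: how much intervening on X
# reduces the entropy of Y (a toy stand-in for the paper's quantity).
causal_gain = H(P_Y) - sum(0.5 * H(P_Y_do(x)) for x in (0, 1))
```

Here `mi` is 1 bit while `causal_gain` is 0: X predicts Y perfectly but provides no control over it, which is the failure mode of purely statistical feature selection that the paper targets.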

FedFNN: Faster Training Convergence Through Update Predictions in Federated Recommender Systems

  • paper_url: http://arxiv.org/abs/2309.08635
  • repo_url: None
  • paper_authors: Francesco Fabbri, Xianghang Liu, Jack R. McKenzie, Bartlomiej Twardowski, Tri Kurniawan Wijaya
  • for: 这篇论文旨在提出一种加速联邦学习训练的算法,在保护用户数据隐私的同时提高在线个性化的时间效率。
  • methods: 该论文利用被采样用户的更新,通过监督学习预测未被采样用户的权重更新。
  • results: 实验结果显示,FedFNN的训练速度最高可比其他领先方法快5倍,同时保持或提高准确性。
    Abstract Federated Learning (FL) has emerged as a key approach for distributed machine learning, enhancing online personalization while ensuring user data privacy. Instead of sending private data to a central server as in traditional approaches, FL decentralizes computations: devices train locally and share updates with a global server. A primary challenge in this setting is achieving fast and accurate model training - vital for recommendation systems where delays can compromise user engagement. This paper introduces FedFNN, an algorithm that accelerates decentralized model training. In FL, only a subset of users are involved in each training epoch. FedFNN employs supervised learning to predict weight updates from unsampled users, using updates from the sampled set. Our evaluations, using real and synthetic data, show: 1. FedFNN achieves training speeds 5x faster than leading methods, maintaining or improving accuracy; 2. the algorithm's performance is consistent regardless of client cluster variations; 3. FedFNN outperforms other methods in scenarios with limited client availability, converging more quickly.
    摘要 联邦学习(FL)已成为分布式机器学习的关键方法,在保护用户数据隐私的同时提升在线个性化。与传统方法将私人数据发送到中央服务器不同,FL将计算去中心化:设备在本地训练,并将更新共享给全局服务器。在这种设定下,一个主要挑战是实现快速而准确的模型训练——这对推荐系统至关重要,因为延迟可能损害用户参与度。本文介绍了FedFNN,一种加速去中心化模型训练的算法。在FL中,每个训练轮次只有一部分用户参与。FedFNN利用监督学习,根据被采样用户的更新来预测未被采样用户的权重更新。我们基于真实和合成数据的评估显示:1. FedFNN的训练速度比领先方法快5倍,同时保持或提高准确性;2. 无论客户端簇如何变化,FedFNN的性能都保持一致;3. 在客户端可用性有限的场景下,FedFNN优于其他方法,收敛更快。
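A minimal sketch of FedFNN's core idea (the predictor here is a k-nearest-neighbour stand-in over hypothetical client features, not the paper's learned model): predict the weight updates of unsampled clients from those of the sampled ones, then aggregate real and predicted updates together.

```python
def predict_unsampled_updates(sampled, unsampled_features, k=1):
    """sampled: {client_id: (feature_vector, update_vector)}.
    Predict updates for clients not trained this round by averaging the
    updates of the k sampled clients with the most similar features."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    preds = {}
    for cid, feat in unsampled_features.items():
        nearest = sorted(sampled.values(), key=lambda fv: sq_dist(fv[0], feat))[:k]
        dim = len(nearest[0][1])
        preds[cid] = [sum(upd[i] for _, upd in nearest) / len(nearest)
                      for i in range(dim)]
    return preds

def aggregate(updates):
    """FedAvg-style mean over real and predicted client updates."""
    dim = len(next(iter(updates.values())))
    n = len(updates)
    return [sum(u[i] for u in updates.values()) / n for i in range(dim)]
```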

A DenseNet-based method for decoding auditory spatial attention with EEG

  • paper_url: http://arxiv.org/abs/2309.07690
  • repo_url: https://github.com/xuxiran/asad_densenet
  • paper_authors: Xiran Xu, Bo Wang, Yujie Yan, Xihong Wu, Jing Chen
  • for: The paper aims to improve the performance of auditory spatial attention detection (ASAD) by using a 3D deep convolutional neural network (DenseNet-3D) to extract temporal and spatial features of the neural representation for the attended locations.
  • methods: The proposed method transforms the original EEG channels into a 2D spatial topological map and then uses a 3D DenseNet to extract temporal and spatial features of the neural representation for the attended locations.
  • results: The proposed method achieves higher decoding accuracy than the state-of-the-art (SOTA) method (94.4% compared to XANet's 90.6%) with a 1-second decision window on the widely used KULeuven (KUL) dataset.
    Abstract Auditory spatial attention detection (ASAD) aims to decode the attended spatial location with EEG in a multiple-speaker setting. ASAD methods are inspired by the brain lateralization of cortical neural responses during the processing of auditory spatial attention, and show promising performance for the task of auditory attention decoding (AAD) with neural recordings. In the previous ASAD methods, the spatial distribution of EEG electrodes is not fully exploited, which may limit the performance of these methods. In the present work, by transforming the original EEG channels into a two-dimensional (2D) spatial topological map, the EEG data is transformed into a three-dimensional (3D) arrangement containing spatial-temporal information. And then a 3D deep convolutional neural network (DenseNet-3D) is used to extract temporal and spatial features of the neural representation for the attended locations. The results show that the proposed method achieves higher decoding accuracy than the state-of-the-art (SOTA) method (94.4% compared to XANet's 90.6%) with 1-second decision window for the widely used KULeuven (KUL) dataset, and the code to implement our work is available on Github: https://github.com/xuxiran/ASAD_DenseNet
    摘要 听觉空间注意检测(ASAD)旨在多说话人场景下利用脑电图(EEG)解码被关注的空间位置。ASAD方法受大脑皮层神经反应在处理听觉空间注意时的偏侧化启发,并在利用神经记录进行听觉注意解码(AAD)任务中表现出良好的性能。在以往的ASAD方法中,EEG电极的空间分布未被充分利用,这可能限制了这些方法的性能。在本工作中,我们将原始EEG通道转换为二维(2D)空间拓扑图,从而把EEG数据变换为包含空间-时间信息的三维(3D)排列,然后使用3D深度卷积神经网络(DenseNet-3D)提取被关注位置神经表征的时间和空间特征。结果显示,在广泛使用的KULeuven(KUL)数据集上,采用1秒决策窗口时,所提方法的解码精度高于最先进(SOTA)方法(94.4%对XANet的90.6%)。实现代码见GitHub:https://github.com/xuxiran/ASAD_DenseNet
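A minimal sketch of the input transformation (the grid layout and channel set here are hypothetical, not the KUL montage): placing channels on a 2D spatial topological map turns a (time, channels) recording into the (time, height, width) arrangement a 3D CNN consumes.

```python
# Hypothetical (row, col) positions for a handful of 10-20 channels.
CHANNEL_POS = {"Fz": (0, 1), "C3": (1, 0), "Cz": (1, 1), "C4": (1, 2), "Pz": (2, 1)}

def to_topomap(eeg, channels, height=3, width=3):
    """eeg: list of per-timestep channel-value lists, in the order `channels`.
    Returns a list of 2D grids (one per timestep); unused cells stay 0."""
    out = []
    for frame in eeg:
        grid = [[0.0] * width for _ in range(height)]
        for ch, val in zip(channels, frame):
            r, c = CHANNEL_POS[ch]
            grid[r][c] = val
        out.append(grid)
    return out
```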

Benchmarking machine learning models for quantum state classification

  • paper_url: http://arxiv.org/abs/2309.07679
  • repo_url: None
  • paper_authors: Edoardo Pedicillo, Andrea Pasquale, Stefano Carrazza
  • for: 本文面向量子计算中两能级量子态(qubit)测量结果的状态分类问题。
  • methods: 本文使用多种分类技术来区分测量结果中的基态与激发态。
  • results: 本文在真实量子设备上对多种分类技术进行了基准测试。
    Abstract Quantum computing is a growing field where information is processed by two-level quantum states known as qubits. Current physical realizations of qubits require careful calibration, composed of different experiments, due to noise and decoherence phenomena. Among the different characterization experiments, a crucial step is to develop a model that classifies the measured state by discriminating the ground state from the excited state. In these proceedings we benchmark multiple classification techniques applied to real quantum devices.
    摘要 量子计算是一个快速发展的领域,其信息由称为量子比特(qubit)的二能级量子态处理。由于噪声和退相干现象,当前量子比特的物理实现需要由多种实验组成的精细校准。在各种表征实验中,一个关键步骤是建立一个模型,通过区分基态与激发态来对测量到的状态进行分类。在本论文中,我们在真实量子设备上对多种分类技术进行了基准测试。
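A toy sketch of the classification task on synthetic data (not from a real device): single-shot readout yields a point in the IQ plane, and a classifier — here a simple nearest-centroid rule standing in for the benchmarked techniques — discriminates the ground state from the excited state.

```python
import random

def make_shots(center, n, spread=0.3, seed=0):
    """Synthetic single-shot readout: Gaussian IQ-plane cloud around a centroid."""
    rng = random.Random(seed)
    return [(center[0] + rng.gauss(0.0, spread), center[1] + rng.gauss(0.0, spread))
            for _ in range(n)]

def centroid(shots):
    n = len(shots)
    return (sum(s[0] for s in shots) / n, sum(s[1] for s in shots) / n)

def classify(point, c0, c1):
    """Nearest-centroid discrimination: 0 for ground state, 1 for excited state."""
    d0 = (point[0] - c0[0]) ** 2 + (point[1] - c0[1]) ** 2
    d1 = (point[0] - c1[0]) ** 2 + (point[1] - c1[1]) ** 2
    return 0 if d0 <= d1 else 1
```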

Goal Space Abstraction in Hierarchical Reinforcement Learning via Set-Based Reachability Analysis

  • paper_url: http://arxiv.org/abs/2309.07675
  • repo_url: None
  • paper_authors: Mehdi Zadem, Sergio Mover, Sao Mai Nguyen
  • for: 本研究旨在提出一种发展机制,自动发现符号化目标表示,并将该表示用于层次强化学习。
  • methods: 本研究利用面向神经网络的集合可达性分析实现目标发现,并使用一种封建式(Feudal)层次强化学习算法同时学习目标表示和层次策略。
  • results: 实验结果表明,所发现的符号化目标表示可提高数据效率,且学习到的目标表示具有可解释性与可迁移性。
    Abstract Open-ended learning benefits immensely from the use of symbolic methods for goal representation as they offer ways to structure knowledge for efficient and transferable learning. However, the existing Hierarchical Reinforcement Learning (HRL) approaches relying on symbolic reasoning are often limited as they require a manual goal representation. The challenge in autonomously discovering a symbolic goal representation is that it must preserve critical information, such as the environment dynamics. In this paper, we propose a developmental mechanism for goal discovery via an emergent representation that abstracts (i.e., groups together) sets of environment states that have similar roles in the task. We introduce a Feudal HRL algorithm that concurrently learns both the goal representation and a hierarchical policy. The algorithm uses symbolic reachability analysis for neural networks to approximate the transition relation among sets of states and to refine the goal representation. We evaluate our approach on complex navigation tasks, showing the learned representation is interpretable, transferrable and results in data efficient learning.
    摘要 开放式学习极大受益于用符号方法表示目标,因为它们提供了结构化知识的方式,以便高效且可迁移地学习。然而,现有依赖符号推理的层次强化学习(HRL)方法往往受限,因为它们需要手动给定目标表示。自动发现符号目标表示的挑战在于,它必须保留环境动力学等关键信息。在本文中,我们提出一种发展机制,通过一种涌现表示来发现目标,该表示将任务中角色相似的环境状态集合抽象(即分组)在一起。我们引入一种封建式HRL算法,同时学习目标表示和层次策略。该算法利用面向神经网络的符号可达性分析来近似状态集合间的转移关系,并据此细化目标表示。我们在复杂的导航任务上评估了该方法,结果显示学习到的表示具有可解释性与可迁移性,并带来数据高效的学习。

Physics-constrained robust learning of open-form PDEs from limited and noisy data

  • paper_url: http://arxiv.org/abs/2309.07672
  • repo_url: None
  • paper_authors: Mengge Du, Longfeng Nie, Siyu Lou, Yuntian Chenc, Dongxiao Zhang
  • for: This paper aims to propose a framework for robustly uncovering open-form partial differential equations (PDEs) from limited and noisy data, which is a significant challenge in nonlinear dynamic systems.
  • methods: The proposed framework, called R-DISCOVER, uses two alternating update processes: discovering and embedding. The discovering phase employs symbolic representation and a reinforcement learning (RL)-guided hybrid PDE generator to efficiently produce diverse open-form PDEs with tree structures. A neural network-based predictive model fits the system response and serves as the reward evaluator for the generated PDEs. The embedding phase integrates the initially identified PDE from the discovering process as a physical constraint into the predictive model for robust training.
  • results: The numerical experiments demonstrate that the proposed framework can uncover governing equations from nonlinear dynamic systems with limited and highly noisy data and outperform other physics-informed neural network-based discovery methods. This work opens new potential for exploring real-world systems with limited understanding.
    Abstract Unveiling the underlying governing equations of nonlinear dynamic systems remains a significant challenge, especially when encountering noisy observations and no prior knowledge available. This study proposes R-DISCOVER, a framework designed to robustly uncover open-form partial differential equations (PDEs) from limited and noisy data. The framework operates through two alternating update processes: discovering and embedding. The discovering phase employs symbolic representation and a reinforcement learning (RL)-guided hybrid PDE generator to efficiently produce diverse open-form PDEs with tree structures. A neural network-based predictive model fits the system response and serves as the reward evaluator for the generated PDEs. PDEs with superior fits are utilized to iteratively optimize the generator via the RL method and the best-performing PDE is selected by a parameter-free stability metric. The embedding phase integrates the initially identified PDE from the discovering process as a physical constraint into the predictive model for robust training. The traversal of PDE trees automates the construction of the computational graph and the embedding process without human intervention. Numerical experiments demonstrate our framework's capability to uncover governing equations from nonlinear dynamic systems with limited and highly noisy data and outperform other physics-informed neural network-based discovery methods. This work opens new potential for exploring real-world systems with limited understanding.
    摘要 揭示非线性动力系统的潜在控制方程仍是一项重大挑战,尤其是在观测含噪且缺乏先验知识的情况下。本研究提出了R-DISCOVER框架,用于从有限且含噪的数据中稳健地发现开放形式的偏微分方程(PDE)。该框架通过两个交替更新的过程运行:发现与嵌入。发现阶段采用符号表示和强化学习(RL)引导的混合PDE生成器,高效地生成具有树结构的多样化开放形式PDE;一个基于神经网络的预测模型拟合系统响应,并作为所生成PDE的奖励评估器。拟合更优的PDE被用于通过RL方法迭代优化生成器,并通过一个无参数的稳定性度量选出表现最佳的PDE。嵌入阶段将发现过程中初步识别的PDE作为物理约束集成到预测模型中,以实现稳健训练。对PDE树的遍历自动完成计算图的构建与嵌入过程,无需人工干预。数值实验表明,我们的框架能够从数据有限且噪声很高的非线性动力系统中发现控制方程,并优于其他基于物理信息神经网络的发现方法。这项工作为探索理解有限的真实世界系统开辟了新的可能。
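For contrast with the paper's RL-guided symbolic search, here is the simplest possible equation-discovery baseline (SINDy-style least squares over a fixed two-term library; not the paper's method, and the function names are ours): fit finite-difference derivatives as a linear combination of candidate terms.

```python
import math

def fit_two_term_ode(ts, ys, f1, f2):
    """Least-squares coefficients (c1, c2) in  dy/dt ~ c1*f1(y) + c2*f2(y),
    with dy/dt estimated by finite differences (2x2 normal equations)."""
    dys = [(ys[i + 1] - ys[i]) / (ts[i + 1] - ts[i]) for i in range(len(ys) - 1)]
    x1 = [f1(y) for y in ys[:-1]]
    x2 = [f2(y) for y in ys[:-1]]
    a11 = sum(v * v for v in x1)
    a12 = sum(u * v for u, v in zip(x1, x2))
    a22 = sum(v * v for v in x2)
    b1 = sum(u * v for u, v in zip(x1, dys))
    b2 = sum(u * v for u, v in zip(x2, dys))
    det = a11 * a22 - a12 * a12
    return (a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det
```

On samples of y(t) = exp(-2t) with the library {y, 1}, the fit recovers dy/dt ≈ -2y with a near-zero constant term; open-form tree-structured search, as in the paper, is what removes the need to fix such a library in advance.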

Dataset Size Dependence of Rate-Distortion Curve and Threshold of Posterior Collapse in Linear VAE

  • paper_url: http://arxiv.org/abs/2309.07663
  • repo_url: None
  • paper_authors: Yuma Ichikawa, Koji Hukushima
  • for: 避免Variational Autoencoder(VAE)中的 posterior collapse,提高表示学习质量。
  • methods: 使用高维limite下的minimal VAE,通过closed-form表达分析β参数与数据集大小、 posterior collapse、rate-distortion曲线之间的关系。
  • results: beta参数可以induce posterior collapse,不同于常见的 regularization parameters。beta值越高,普遍错误曲线上延伸的满板期变得越长,超过某个 beta 阈值后变为∞。这意味着beta需要仔细调整,并且较大的数据集需要以高率实现高质量的rate-distortion曲线。
    Abstract In the Variational Autoencoder (VAE), the variational posterior often aligns closely with the prior, which is known as posterior collapse and hinders the quality of representation learning. To mitigate this problem, an adjustable hyperparameter beta has been introduced in the VAE. This paper presents a closed-form expression to assess the relationship between the beta in VAE, the dataset size, the posterior collapse, and the rate-distortion curve by analyzing a minimal VAE in a high-dimensional limit. These results clarify that a long plateau in the generalization error emerges with a relatively larger beta. As the beta increases, the length of the plateau extends and then becomes infinite beyond a certain beta threshold. This implies that the choice of beta, unlike the usual regularization parameters, can induce posterior collapse regardless of the dataset size. Thus, beta is a risky parameter that requires careful tuning. Furthermore, considering the dataset-size dependence on the rate-distortion curve, a relatively large dataset is required to obtain a rate-distortion curve with high rates. Extensive numerical experiments support our analysis.
    摘要 在变分自编码器(VAE)中,变分后验往往与先验高度一致,这一现象称为后验坍缩,会损害表示学习的质量。为缓解该问题,VAE中引入了可调超参数β。本文通过在高维极限下分析一个极简VAE,给出了刻画β、数据集大小、后验坍缩与率失真曲线之间关系的闭式表达式。这些结果表明,当β相对较大时,泛化误差会出现较长的平台期;随着β增大,平台期不断延长,并在超过某个β阈值后变为无穷长。这意味着与通常的正则化参数不同,β的选择可能在任意数据集大小下诱发后验坍缩,因此β是一个需要谨慎调节的危险参数。此外,考虑率失真曲线对数据集大小的依赖,要获得高码率的率失真曲线需要相对较大的数据集。大量数值实验支持了我们的分析。
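For reference, the beta-weighted objective under analysis can be sketched as follows (the standard beta-VAE loss with a diagonal-Gaussian posterior; variable names are ours): large beta pushes mu → 0 and log_var → 0, i.e. the posterior-collapse regime the paper characterizes.

```python
import math

def gaussian_kl(mu, log_var):
    """KL( N(mu, exp(log_var)) || N(0, 1) ), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))

def beta_vae_loss(recon_error, mu, log_var, beta):
    """beta-VAE objective: reconstruction error + beta * KL-to-prior.
    The KL term vanishes exactly when the posterior equals the prior
    (mu = 0, log_var = 0), which is the posterior-collapse regime."""
    return recon_error + beta * gaussian_kl(mu, log_var)
```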

Structure-Preserving Transformers for Sequences of SPD Matrices

  • paper_url: http://arxiv.org/abs/2309.07579
  • repo_url: https://github.com/mathieuseraphim/spdtransnet
  • paper_authors: Mathieu Seraphim, Alexis Lechervy, Florian Yger, Luc Brun, Olivier Etard
  • for: 该论文旨在对对称正定(SPD)矩阵序列进行分类,并在整个分析过程中保持其黎曼几何结构。
  • methods: 该论文使用基于Transformer的自注意力机制处理由EEG导出的协方差矩阵序列,用于自动睡眠分期。
  • results: 该论文在一个标准数据集上取得了较高的分期(stage-wise)性能。
    Abstract In recent years, Transformer-based auto-attention mechanisms have been successfully applied to the analysis of a variety of context-reliant data types, from texts to images and beyond, including data from non-Euclidean geometries. In this paper, we present such a mechanism, designed to classify sequences of Symmetric Positive Definite matrices while preserving their Riemannian geometry throughout the analysis. We apply our method to automatic sleep staging on timeseries of EEG-derived covariance matrices from a standard dataset, obtaining high levels of stage-wise performance.
    摘要 近年来,基于Transformer的自注意力机制已被成功应用于多种依赖上下文的数据类型的分析,从文本到图像乃至非欧几何数据。在本文中,我们提出了一种此类机制,用于对对称正定(SPD)矩阵序列进行分类,并在整个分析过程中保持其黎曼几何结构。我们将该方法应用于基于EEG导出协方差矩阵时间序列的自动睡眠分期,在一个标准数据集上取得了较高的分期性能。(注:SPD矩阵即对称且所有特征值为正的矩阵;此处的黎曼几何指SPD矩阵流形上的弯曲几何结构,该结构在分析全程得到保持。)
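A minimal sketch of what "preserving Riemannian geometry" can mean in practice (the log-Euclidean metric shown here is one standard choice for SPD matrices, not necessarily the paper's; helper names are ours): compare SPD matrices through their matrix logarithms rather than entrywise.

```python
import math

def sym2x2_log(a, b, c):
    """Matrix logarithm of the SPD matrix [[a, b], [b, c]] via eigendecomposition,
    returned as the (a, b, c) entries of the symmetric result."""
    t, d = a + c, a * c - b * b          # trace and determinant
    s = math.sqrt(max(t * t - 4.0 * d, 0.0))
    l1, l2 = (t + s) / 2.0, (t - s) / 2.0
    if abs(b) < 1e-12:                   # already diagonal
        return (math.log(a), 0.0, math.log(c))
    v = (b, l1 - a)                      # eigenvector for l1
    n = math.hypot(*v)
    u1, u2 = v[0] / n, v[1] / n
    g1, g2 = math.log(l1), math.log(l2)
    return (g1 * u1 * u1 + g2 * (1 - u1 * u1),
            (g1 - g2) * u1 * u2,
            g1 * u2 * u2 + g2 * (1 - u2 * u2))

def log_euclidean_dist(m1, m2):
    """Log-Euclidean distance: Frobenius norm of log(M1) - log(M2)."""
    x, y = sym2x2_log(*m1), sym2x2_log(*m2)
    da, db, dc = x[0] - y[0], x[1] - y[1], x[2] - y[2]
    return math.sqrt(da * da + 2.0 * db * db + dc * dc)
```

Distances of this kind respect the curved geometry of the SPD manifold, which a plain Euclidean distance on matrix entries does not.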

Naturalistic Robot Arm Trajectory Generation via Representation Learning

  • paper_url: http://arxiv.org/abs/2309.07550
  • repo_url: None
  • paper_authors: Jayjun Lee, Adam J. Spiers
  • for: 这篇论文旨在让家庭环境中的辅助机器人产生更可预测、更接近人类的运动,尤其是支持瘫痪人士独立生活的轮椅搭载辅助机器人。
  • methods: 本文采用自监督模仿学习方法,使用自回归时空图神经网络学习人类示范动作。示范数据来自佩戴在人臂上的IMU传感器记录的无动作标注(action-free)任务演示。
  • results: 研究表明,基于多名参与者的臂部运动数据,该方法能够为UR5e机器人臂生成自然且符合功能需求的饮水动作轨迹。
    Abstract The integration of manipulator robots in household environments suggests a need for more predictable and human-like robot motion. This holds especially true for wheelchair-mounted assistive robots that can support the independence of people with paralysis. One method of generating naturalistic motion trajectories is via the imitation of human demonstrators. This paper explores a self-supervised imitation learning method using an autoregressive spatio-temporal graph neural network for an assistive drinking task. We address learning from diverse human motion trajectory data that were captured via wearable IMU sensors on a human arm as the action-free task demonstrations. Observed arm motion data from several participants is used to generate natural and functional drinking motion trajectories for a UR5e robot arm.
    摘要 机械臂机器人融入家庭环境,意味着需要更可预测、更接近人类的机器人运动。对于可支持瘫痪人士独立生活的轮椅搭载辅助机器人而言尤其如此。生成自然运动轨迹的一种方法是模仿人类示范者。本文探索了一种基于自回归时空图神经网络的自监督模仿学习方法,并将其应用于辅助饮水任务。我们从佩戴在人臂上的IMU传感器采集的多样化人类运动轨迹数据(作为无动作标注的任务演示)中学习,并利用多名参与者的臂部运动数据为UR5e机器人臂生成自然且实用的饮水动作轨迹。

Proximal Bellman mappings for reinforcement learning and their application to robust adaptive filtering

  • paper_url: http://arxiv.org/abs/2309.07548
  • repo_url: None
  • paper_authors: Yuki Akiyama, Konstantinos Slavakis
  • for: 本研究面向强化学习(RL)的算法与理论核心,引入在再生核希尔伯特空间(RKHS)中定义的近端贝尔曼映射(proximal Bellman mappings),以利用RKHS丰富的逼近性质和内积结构。
  • methods: 证明所提映射无论折扣因子取值如何,都属于(坚定)非扩张映射这一强大的希尔伯特空间映射族,并具有充分的设计自由度,既能重现经典贝尔曼映射的性质,也为新的RL设计铺路。在此基础上构建了一种近似策略迭代方案,用于在线地在每个时刻选择p-范数损失中「最优」的指数p,以对抗线性自适应滤波中的离群值,且无需训练数据及离群值统计特性的任何先验知识。
  • results: 在合成数据上的数值测试表明,所提框架优于若干非RL方法及基于核的RL方案。
    Abstract This paper aims at the algorithmic/theoretical core of reinforcement learning (RL) by introducing the novel class of proximal Bellman mappings. These mappings are defined in reproducing kernel Hilbert spaces (RKHSs), to benefit from the rich approximation properties and inner product of RKHSs, they are shown to belong to the powerful Hilbertian family of (firmly) nonexpansive mappings, regardless of the values of their discount factors, and possess ample degrees of design freedom to even reproduce attributes of the classical Bellman mappings and to pave the way for novel RL designs. An approximate policy-iteration scheme is built on the proposed class of mappings to solve the problem of selecting online, at every time instance, the "optimal" exponent $p$ in a $p$-norm loss to combat outliers in linear adaptive filtering, without training data and any knowledge on the statistical properties of the outliers. Numerical tests on synthetic data showcase the superior performance of the proposed framework over several non-RL and kernel-based RL schemes.

VerilogEval: Evaluating Large Language Models for Verilog Code Generation

  • paper_url: http://arxiv.org/abs/2309.07544
  • repo_url: None
  • paper_authors: Mingjie Liu, Nathaniel Pinckney, Brucek Khailany, Haoxing Ren
  • for: This paper is written for evaluating the performance of large language models (LLMs) in generating Verilog code for hardware design and verification.
  • methods: The paper proposes a benchmarking framework for LLMs that includes a comprehensive evaluation dataset of 156 problems from the Verilog instructional website HDLBits, and a method for automatically testing the generated Verilog code for functional correctness.
  • results: The paper shows that the Verilog code generation capability of pretrained language models can be improved with supervised fine-tuning by bootstrapping with LLM-generated synthetic problem-code pairs.
    Abstract The increasing popularity of large language models (LLMs) has paved the way for their application in diverse domains. This paper proposes a benchmarking framework tailored specifically for evaluating LLM performance in the context of Verilog code generation for hardware design and verification. We present a comprehensive evaluation dataset consisting of 156 problems from the Verilog instructional website HDLBits. The evaluation set consists of a diverse set of Verilog code generation tasks, ranging from simple combinational circuits to complex finite state machines. The Verilog code completions can be automatically tested for functional correctness by comparing the transient simulation outputs of the generated design with a golden solution. We also demonstrate that the Verilog code generation capability of pretrained language models could be improved with supervised fine-tuning by bootstrapping with LLM generated synthetic problem-code pairs.
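Benchmarks that automatically test generated code for functional correctness, like the one described above, are typically summarized with the pass@k metric. The standard unbiased estimator comes from the HumanEval/Codex line of work, not from this abstract, so it is shown here only as background:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: the probability that at least one of k
    completions drawn without replacement from n generated samples,
    c of which pass the testbench, is functionally correct."""
    if n - c < k:  # every size-k subset must contain a passing sample
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

p_at_1 = pass_at_k(10, 3, 1)  # = 1 - C(7,1)/C(10,1) = 0.3
```

With `n` simulations per problem, averaging this quantity over problems gives the reported pass@k score.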

Adaptive approximation of monotone functions

  • paper_url: http://arxiv.org/abs/2309.07530
  • repo_url: None
  • paper_authors: Pierre Gaillard, Sébastien Gerchinovitz, Étienne de Montbrun
  • for: This work targets the classical problem of approximating a non-decreasing function $f$ in $L^p(\mu)$ norm by sequentially querying its values.
  • methods: A new algorithm, GreedyBox, generalizing an algorithm originally proposed by Novak (1992) for numerical integration; GreedyBox is proved to achieve an optimal, function-specific sample complexity for any $f$, up to logarithmic factors.
  • results: The analysis reveals performance gaps between adaptive and non-adaptive algorithms, smooth and piecewise-smooth functions, and monotone and non-monotone functions; for piecewise-$C^2$ functions, a simple modification even attains optimal minimax approximation rates, which are computed explicitly.
    Abstract We study the classical problem of approximating a non-decreasing function $f: \mathcal{X} \to \mathcal{Y}$ in $L^p(\mu)$ norm by sequentially querying its values, for known compact real intervals $\mathcal{X}$, $\mathcal{Y}$ and a known probability measure $\mu$ on $\mathcal{X}$. For any function $f$ we characterize the minimum number of evaluations of $f$ that algorithms need to guarantee an approximation $\hat{f}$ with an $L^p(\mu)$ error below $\epsilon$ after stopping. Unlike worst-case results that hold uniformly over all $f$, our complexity measure is dependent on each specific function $f$. To address this problem, we introduce GreedyBox, a generalization of an algorithm originally proposed by Novak (1992) for numerical integration. We prove that GreedyBox achieves an optimal sample complexity for any function $f$, up to logarithmic factors. Additionally, we uncover results regarding piecewise-smooth functions. Perhaps as expected, the $L^p(\mu)$ error of GreedyBox decreases much faster for piecewise-$C^2$ functions than predicted by the algorithm (without any knowledge on the smoothness of $f$). A simple modification even achieves optimal minimax approximation rates for such functions, which we compute explicitly. In particular, our findings highlight multiple performance gaps between adaptive and non-adaptive algorithms, smooth and piecewise-smooth functions, as well as monotone or non-monotone functions. Finally, we provide numerical experiments to support our theoretical results.
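To make the box-splitting idea concrete, here is a simplified adaptive scheme in the spirit of GreedyBox (an illustrative sketch under the Lebesgue measure, not the paper's exact algorithm): for a non-decreasing $f$, the constant $(f(\mathrm{lo})+f(\mathrm{hi}))/2$ approximates $f$ on $[\mathrm{lo},\mathrm{hi}]$ with $L^1$ error at most $(\mathrm{hi}-\mathrm{lo})(f(\mathrm{hi})-f(\mathrm{lo}))/2$, and we greedily split the box with the largest bound at its midpoint.

```python
import heapq

def greedy_box_approx(f, a, b, n_queries):
    """Adaptively approximate a non-decreasing f on [a, b].

    Each box (lo, hi) with known endpoint values vlo <= vhi admits the
    constant approximant (vlo + vhi) / 2, whose L1 error is bounded by
    (hi - lo) * (vhi - vlo) / 2 thanks to monotonicity.  Splitting a
    box at its midpoint replaces its bound w*h/2 by a total of w*h/4,
    so the global error bound strictly decreases with every query.
    """
    vlo, vhi = f(a), f(b)
    # max-heap on the error bound (negated for heapq's min-heap)
    heap = [(-(b - a) * (vhi - vlo) / 2.0, a, b, vlo, vhi)]
    queries = 2
    while queries < n_queries:
        neg_bound, lo, hi, lo_val, hi_val = heapq.heappop(heap)
        mid = (lo + hi) / 2.0
        mid_val = f(mid)
        queries += 1
        heapq.heappush(heap, (-(mid - lo) * (mid_val - lo_val) / 2.0,
                              lo, mid, lo_val, mid_val))
        heapq.heappush(heap, (-(hi - mid) * (hi_val - mid_val) / 2.0,
                              mid, hi, mid_val, hi_val))
    boxes = [(lo, hi, (lv + hv) / 2.0) for _, lo, hi, lv, hv in heap]
    total_bound = -sum(item[0] for item in heap)
    return boxes, total_bound

boxes, bound = greedy_box_approx(lambda t: t ** 3, 0.0, 1.0, 200)
```

Starting from a global bound of 0.5 for $f(t)=t^3$ on $[0,1]$, 200 queries drive the certified $L^1$ bound well below 0.05.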

Learning Beyond Similarities: Incorporating Dissimilarities between Positive Pairs in Self-Supervised Time Series Learning

  • paper_url: http://arxiv.org/abs/2309.07526
  • repo_url: None
  • paper_authors: Adrian Atienza, Jakob Bardram, Sadasivan Puthusserypady
  • for: Improving arrhythmia detection from time series by capturing the dynamic attributes that similarity-only self-supervised representations overlook.
  • methods: A self-supervised learning approach (DEBS) that goes beyond similarities by integrating dissimilarities between positive pairs into the optimization of the representation.
  • results: On ECG signals, a +10% improvement in Atrial Fibrillation (AFib) detection accuracy across diverse subjects, suggesting new avenues for SSL on temporal data.
    Abstract By identifying similarities between successive inputs, Self-Supervised Learning (SSL) methods for time series analysis have demonstrated their effectiveness in encoding the inherent static characteristics of temporal data. However, an exclusive emphasis on similarities might result in representations that overlook the dynamic attributes critical for modeling cardiovascular diseases within a confined subject cohort. Introducing Distilled Encoding Beyond Similarities (DEBS), this paper pioneers an SSL approach that transcends mere similarities by integrating dissimilarities among positive pairs. The framework is applied to electrocardiogram (ECG) signals, leading to a notable enhancement of +10\% in the detection accuracy of Atrial Fibrillation (AFib) across diverse subjects. DEBS underscores the potential of attaining a more refined representation by encoding the dynamic characteristics of time series data, tapping into dissimilarities during the optimization process. Broadly, the strategy delineated in this study holds the promise of unearthing novel avenues for advancing SSL methodologies tailored to temporal data.

Massively-Parallel Heat Map Sorting and Applications To Explainable Clustering

  • paper_url: http://arxiv.org/abs/2309.07486
  • repo_url: None
  • paper_authors: Sepideh Aghamolaei, Mohammad Ghodsi
  • for: This work introduces the heat map sorting problem for points labeled with $k$ labels: reorder and merge points and dimensions while preserving the clusters (labels), and gives an approximate solution.
  • methods: A heat map sorting algorithm with dimensionality reduction via locality-sensitive hashing; the problem is proved NP-hard, and the paper gives a fixed-parameter massively parallel algorithm (constant number of rounds, sublinear memory per machine, linear total memory) plus an approximation algorithm for an NP-hard special case.
  • results: Compared with k-means and DBSCAN on several directed and undirected graphs of email and computer networks, the proposed algorithm achieves better clustering quality and faster running time in many cases.
    Abstract Given a set of points labeled with $k$ labels, we introduce the heat map sorting problem as reordering and merging the points and dimensions while preserving the clusters (labels). A cluster is preserved if it remains connected, i.e., if it is not split into several clusters and no two clusters are merged. We prove the problem is NP-hard and we give a fixed-parameter algorithm with a constant number of rounds in the massively parallel computation model, where each machine has a sublinear memory and the total memory of the machines is linear. We give an approximation algorithm for a NP-hard special case of the problem. We empirically compare our algorithm with k-means and density-based clustering (DBSCAN) using a dimensionality reduction via locality-sensitive hashing on several directed and undirected graphs of email and computer networks.

Improved Auto-Encoding using Deterministic Projected Belief Networks

  • paper_url: http://arxiv.org/abs/2309.07481
  • repo_url: None
  • paper_authors: Paul M Baggenstoss
  • for: This work exploits the unique properties of a deterministic projected belief network (D-PBN) to take full advantage of trainable compound activation functions (TCAs).
  • methods: A D-PBN auto-encoder that operates by "backing up" through a feed-forward network, inverting the TCAs during reconstruction so that a given TCA benefits both analysis and reconstruction.
  • results: The D-PBN auto-encoder with TCAs significantly outperforms standard auto-encoders, including variational auto-encoders.
    Abstract In this paper, we exploit the unique properties of a deterministic projected belief network (D-PBN) to take full advantage of trainable compound activation functions (TCAs). A D-PBN is a type of auto-encoder that operates by "backing up" through a feed-forward neural network. TCAs are activation functions with complex monotonic-increasing shapes that change the distribution of the data so that the linear transformation that follows is more effective. Because a D-PBN operates by "backing up", the TCAs are inverted in the reconstruction process, restoring the original distribution of the data, thus taking advantage of a given TCA in both analysis and reconstruction. In this paper, we show that a D-PBN auto-encoder with TCAs can significantly out-perform standard auto-encoders including variational auto-encoders.

SC-MAD: Mixtures of Higher-order Networks for Data Augmentation

  • paper_url: http://arxiv.org/abs/2309.07453
  • repo_url: None
  • paper_authors: Madeline Navarro, Santiago Segarra
  • for: This work extends graph-based pairwise connections to higher-order relations via simplicial complexes, where learning requires large amounts of data that can be expensive or impossible to obtain.
  • methods: Data augmentation of simplicial complexes through linear and nonlinear mixup mechanisms that return mixtures of existing labeled samples, together with a convex clustering mixup for a data-driven relationship among several simplicial complexes.
  • results: The synthetic simplicial complexes provably interpolate among existing data with respect to homomorphism densities; the method is demonstrated on synthetic and real-world classification datasets.
    Abstract The myriad complex systems with multiway interactions motivate the extension of graph-based pairwise connections to higher-order relations. In particular, the simplicial complex has inspired generalizations of graph neural networks (GNNs) to simplicial complex-based models. Learning on such systems requires large amounts of data, which can be expensive or impossible to obtain. We propose data augmentation of simplicial complexes through both linear and nonlinear mixup mechanisms that return mixtures of existing labeled samples. In addition to traditional pairwise mixup, we present a convex clustering mixup approach for a data-driven relationship among several simplicial complexes. We theoretically demonstrate that the resultant synthetic simplicial complexes interpolate among existing data with respect to homomorphism densities. Our method is demonstrated on both synthetic and real-world datasets for simplicial complex classification.
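The paper generalizes classical pairwise mixup from vectors to simplicial complexes; the sketch below shows only the standard vector case for orientation (the simplicial-complex and convex-clustering variants are not reproduced).

```python
import numpy as np

def mixup(x1, y1, x2, y2, lam=None, alpha=0.2, rng=None):
    """Pairwise mixup: a convex combination of two labeled samples.

    lam is drawn from Beta(alpha, alpha) when not given.  The mixed
    label is the same convex combination of the (one-hot) labels, so
    it remains a valid probability vector.
    """
    if lam is None:
        rng = rng or np.random.default_rng()
        lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

x_a, y_a = np.array([1.0, 0.0]), np.array([1.0, 0.0])  # class 0
x_b, y_b = np.array([0.0, 1.0]), np.array([0.0, 1.0])  # class 1
x_m, y_m = mixup(x_a, y_a, x_b, y_b, lam=0.25)
```

With `lam=0.25` the synthetic sample sits a quarter of the way from `x_b` toward `x_a`, and its soft label mixes the two classes in the same proportion.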

Is Solving Graph Neural Tangent Kernel Equivalent to Training Graph Neural Network?

  • paper_url: http://arxiv.org/abs/2309.07452
  • repo_url: None
  • paper_authors: Lianke Qin, Zhao Song, Baocheng Sun
  • for: This work asks whether solving graph neural tangent kernel (GNTK) regression is equivalent to training an infinitely wide multi-layer GNN with gradient descent.
  • methods: Three new theoretical results within the NTK framework for graph learning.
  • results: The equivalence is formally proved for graph-level regression; the first GNTK formulation for node-level regression is presented, and the equivalence is proved for that setting as well.
    Abstract A rising trend in theoretical deep learning is to understand why deep learning works through Neural Tangent Kernel (NTK) [jgh18], a kernel method that is equivalent to using gradient descent to train a multi-layer infinitely-wide neural network. NTK is a major step forward in the theoretical deep learning because it allows researchers to use traditional mathematical tools to analyze properties of deep neural networks and to explain various neural network techniques from a theoretical view. A natural extension of NTK on graph learning is \textit{Graph Neural Tangent Kernel (GNTK)}, and researchers have already provide GNTK formulation for graph-level regression and show empirically that this kernel method can achieve similar accuracy as GNNs on various bioinformatics datasets [dhs+19]. The remaining question now is whether solving GNTK regression is equivalent to training an infinite-wide multi-layer GNN using gradient descent. In this paper, we provide three new theoretical results. First, we formally prove this equivalence for graph-level regression. Second, we present the first GNTK formulation for node-level regression. Finally, we prove the equivalence for node-level regression.
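The kernel underlying GNTK can be illustrated on an ordinary two-layer ReLU network: with the outer weights held fixed (a common simplification in NTK analyses), the empirical NTK is the Gram matrix of output gradients with respect to the hidden weights. A small numpy sketch (plain NTK rather than the graph version; sizes are arbitrary):

```python
import numpy as np

def ntk_features(X, W, a):
    """Gradient features of f(x) = (1/sqrt(m)) * sum_r a_r relu(w_r . x)
    with respect to the hidden weights W (outer weights a held fixed).

    grad_{w_r} f(x) = (a_r / sqrt(m)) * 1[w_r . x > 0] * x,
    so the empirical NTK is K = Phi @ Phi.T with Phi the stacked,
    flattened gradients.
    """
    m = W.shape[0]
    pre = X @ W.T                       # (n, m) preactivations
    act = (pre > 0).astype(float)       # ReLU derivative
    Phi = (act * a / np.sqrt(m))[:, :, None] * X[:, None, :]  # (n, m, d)
    return Phi.reshape(X.shape[0], -1)

rng = np.random.default_rng(1)
n, d, m = 8, 4, 512
X = rng.normal(size=(n, d))
W = rng.normal(size=(m, d))
a = rng.choice([-1.0, 1.0], size=m)
Phi = ntk_features(X, W, a)
K = Phi @ Phi.T                         # empirical NTK Gram matrix
```

Being an outer product of gradient features, `K` is symmetric positive semidefinite by construction, which is what kernel regression on it requires.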

TensorFlow Chaotic Prediction and Blow Up

  • paper_url: http://arxiv.org/abs/2309.07450
  • repo_url: None
  • paper_authors: M. Andrecut
  • for: Predicting the spatiotemporal chaotic dynamics of a high-dimensional non-linear system.
  • methods: Deep neural network training and prediction using the TensorFlow library.
  • results: Short-term predictions are effective, but longer-term predictions quickly deteriorate and blow up due to the nondeterministic behavior of the TensorFlow library.
    Abstract Predicting the dynamics of chaotic systems is one of the most challenging tasks for neural networks, and machine learning in general. Here we aim to predict the spatiotemporal chaotic dynamics of a high-dimensional non-linear system. In our attempt we use the TensorFlow library, representing the state of the art for deep neural networks training and prediction. While our results are encouraging, and show that the dynamics of the considered system can be predicted for short time, we also indirectly discovered an unexpected and undesirable behavior of the TensorFlow library. More specifically, the longer term prediction of the system's chaotic behavior quickly deteriorates and blows up due to the nondeterministic behavior of the TensorFlow library. Here we provide numerical evidence of the short time prediction ability, and of the longer term predictability blow up.
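The reported blow-up is rooted in sensitive dependence on initial conditions: any tiny discrepancy, including framework nondeterminism, is amplified exponentially. The effect can be illustrated library-free with the logistic map (a simple stand-in for the paper's high-dimensional system): a perturbation of $10^{-12}$ stays negligible for a few iterations, then grows to order one.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate the chaotic logistic map x -> r * x * (1 - x)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

a = logistic_trajectory(0.3, 100)
b = logistic_trajectory(0.3 + 1e-12, 100)
diffs = [abs(u - v) for u, v in zip(a, b)]
# early steps: trajectories indistinguishable; later: order-one divergence
```

Since $|f'(x)| \le 4$ for this map, the gap after $n$ steps is at most $10^{-12}\cdot 4^n$, so short horizons stay accurate while long horizons diverge, mirroring the paper's observation.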

A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time

  • paper_url: http://arxiv.org/abs/2309.07418
  • repo_url: None
  • paper_authors: Yeqi Gao, Zhao Song, Weixin Wang, Junze Yin
  • for: This work targets optimization of the attention regression problem in large language models (LLMs).
  • methods: An iterative greedy algorithm for training the one-layer attention objective $L(X,Y) = \sum_{j_0 = 1}^n \sum_{i_0 = 1}^d ( \langle \langle \exp( \mathsf{A}_{j_0} x ) , {\bf 1}_n \rangle^{-1} \exp( \mathsf{A}_{j_0} x ), A_{3} Y_{*,i_0} \rangle - b_{j_0,i_0} )^2$.
  • results: A training algorithm that drives the loss $L(X,Y)$ below $\epsilon$ in time $\widetilde{O}\big( ({\cal T}_{\mathrm{mat}}(n,n,d) + {\cal T}_{\mathrm{mat}}(n,d,d) + d^{2\omega}) \log(1/\epsilon) \big)$, making it applicable to large language models.
    Abstract Large language models (LLMs) have played a pivotal role in revolutionizing various facets of our daily existence. Solving attention regression is a fundamental task in optimizing LLMs. In this work, we focus on giving a provable guarantee for the one-layer attention network objective function $L(X,Y) = \sum_{j_0 = 1}^n \sum_{i_0 = 1}^d ( \langle \langle \exp( \mathsf{A}_{j_0} x ) , {\bf 1}_n \rangle^{-1} \exp( \mathsf{A}_{j_0} x ), A_{3} Y_{*,i_0} \rangle - b_{j_0,i_0} )^2$. Here $\mathsf{A} \in \mathbb{R}^{n^2 \times d^2}$ is the Kronecker product between $A_1 \in \mathbb{R}^{n \times d}$ and $A_2 \in \mathbb{R}^{n \times d}$. $A_3$ is a matrix in $\mathbb{R}^{n \times d}$, and $\mathsf{A}_{j_0} \in \mathbb{R}^{n \times d^2}$ is the $j_0$-th block of $\mathsf{A}$. The $X, Y \in \mathbb{R}^{d \times d}$ are variables we want to learn. $B \in \mathbb{R}^{n \times d}$, and $b_{j_0,i_0} \in \mathbb{R}$ is the entry at the $j_0$-th row and $i_0$-th column of $B$; $Y_{*,i_0} \in \mathbb{R}^d$ is the $i_0$-th column vector of $Y$, and $x \in \mathbb{R}^{d^2}$ is the vectorization of $X$. In a multi-layer LLM network, the matrix $B \in \mathbb{R}^{n \times d}$ can be viewed as the output of a layer, and $A_1= A_2 = A_3 \in \mathbb{R}^{n \times d}$ can be viewed as the input of a layer. The matrix version of $x$ can be viewed as $QK^\top$ and $Y$ can be viewed as $V$. We provide an iterative greedy algorithm to train the loss function $L(X,Y)$ down to $\epsilon$ that runs in $\widetilde{O}( ({\cal T}_{\mathrm{mat}}(n,n,d) + {\cal T}_{\mathrm{mat}}(n,d,d) + d^{2\omega}) \log(1/\epsilon) )$ time. Here ${\cal T}_{\mathrm{mat}}(a,b,c)$ denotes the time of multiplying an $a \times b$ matrix by a $b \times c$ matrix, and $\omega\approx 2.37$ denotes the exponent of matrix multiplication.
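Reading the objective in matrix form (the abstract notes that the matrix version of $x$ plays the role of $QK^\top$ and $Y$ that of $V$), it amounts to fitting $D^{-1}\exp(A_1 X A_2^\top) A_3 Y$ to $B$ in squared Frobenius norm. A numpy sketch of evaluating this loss under that matrix-form reading (the paper's iterative greedy training algorithm itself is not reproduced):

```python
import numpy as np

def softmax_rows(M):
    """Row-wise softmax: D^{-1} exp(M) with D = diag(exp(M) @ 1_n)."""
    E = np.exp(M - M.max(axis=1, keepdims=True))  # shift for stability
    return E / E.sum(axis=1, keepdims=True)

def attention_loss(X, Y, A1, A2, A3, B):
    """Matrix-form reading of the one-layer attention objective:
    L(X, Y) = || softmax_rows(A1 @ X @ A2.T) @ A3 @ Y - B ||_F^2,
    where X plays the role of Q K^T and Y the role of V."""
    pred = softmax_rows(A1 @ X @ A2.T) @ (A3 @ Y)
    return float(np.sum((pred - B) ** 2))

rng = np.random.default_rng(0)
n, d = 6, 3
A1, A2, A3 = (rng.normal(size=(n, d)) for _ in range(3))
X_true, Y_true = rng.normal(size=(d, d)), rng.normal(size=(d, d))
# construct a realizable target B, so the loss vanishes at (X_true, Y_true)
B = softmax_rows(A1 @ X_true @ A2.T) @ (A3 @ Y_true)
loss_at_truth = attention_loss(X_true, Y_true, A1, A2, A3, B)
```

Since `B` is constructed to be realizable, the loss is zero at the ground truth and strictly positive elsewhere, which is the regression problem the paper's algorithm minimizes.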

Semi-supervised Domain Adaptation on Graphs with Contrastive Learning and Minimax Entropy

  • paper_url: http://arxiv.org/abs/2309.07402
  • repo_url: None
  • paper_authors: Jiaren Xiao, Quanyu Dai, Xiao Shen, Xiaochen Xie, Jing Dai, James Lam, Ka-Wai Kwok
  • for: Semi-supervised domain adaptation (SSDA) on graphs: leveraging the knowledge of a labeled source graph to aid node classification on a target graph with limited labels.
  • methods: A novel method, SemiGCL, that generates informative node representations by contrasting representations learned from a graph's local and global views, and is adversarially optimized with the entropy loss of unlabeled target nodes to reduce domain divergence.
  • results: SemiGCL outperforms state-of-the-art baselines on benchmark SSDA tasks.
    Abstract Label scarcity in a graph is frequently encountered in real-world applications due to the high cost of data labeling. To this end, semi-supervised domain adaptation (SSDA) on graphs aims to leverage the knowledge of a labeled source graph to aid in node classification on a target graph with limited labels. SSDA tasks need to overcome the domain gap between the source and target graphs. However, to date, this challenging research problem has yet to be formally considered by the existing approaches designed for cross-graph node classification. To tackle the SSDA problem on graphs, a novel method called SemiGCL is proposed, which benefits from graph contrastive learning and minimax entropy training. SemiGCL generates informative node representations by contrasting the representations learned from a graph's local and global views. Additionally, SemiGCL is adversarially optimized with the entropy loss of unlabeled target nodes to reduce domain divergence. Experimental results on benchmark datasets demonstrate that SemiGCL outperforms the state-of-the-art baselines on the SSDA tasks.
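The entropy loss on unlabeled target nodes at the core of such adversarial schemes is simply the Shannon entropy of the softmax predictions; a minimal sketch (the adversarial part, where two players optimize this same scalar with opposite signs, typically via a gradient-reversal layer, is only indicated in the docstring):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def prediction_entropy(logits):
    """Mean Shannon entropy of softmax predictions.

    In adversarial minimax-entropy training, two players (e.g. the
    feature extractor and the classifier) optimize this same scalar
    with opposite signs, usually implemented via gradient reversal;
    only the loss itself is sketched here.
    """
    p = softmax(logits)
    return float(-(p * np.log(p + 1e-12)).sum(axis=-1).mean())

confident = np.array([[10.0, 0.0, 0.0]])  # near one-hot prediction
uncertain = np.array([[0.0, 0.0, 0.0]])   # uniform prediction
```

Uniform predictions attain the maximal entropy $\log C$ (here $\log 3$), while confident near-one-hot predictions drive the loss toward zero.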

Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks

  • paper_url: http://arxiv.org/abs/2309.07937
  • repo_url: None
  • paper_authors: Soumi Maiti, Yifan Peng, Shukjae Choi, Jee-weon Jung, Xuankai Chang, Shinji Watanabe
  • for: A decoder-only speech-text language model, VoxtLM, that performs four tasks: speech recognition, speech synthesis, text generation, and speech continuation.
  • methods: VoxtLM integrates the text vocabulary with discrete speech tokens from self-supervised speech features and uses special tokens to enable multitask learning.
  • results: Compared to a single-task model, VoxtLM markedly improves speech synthesis (speech intelligibility from 28.9 to 5.6, objective quality from 2.68 to 3.90) and also improves speech generation and recognition; training data, recipes, and model checkpoints will be open-sourced.
    Abstract We propose a decoder-only language model, \textit{VoxtLM}, that can perform four tasks: speech recognition, speech synthesis, text generation, and speech continuation. VoxtLM integrates text vocabulary with discrete speech tokens from self-supervised speech features and uses special tokens to enable multitask learning. Compared to a single-task model, VoxtLM exhibits a significant improvement in speech synthesis, with improvements in both speech intelligibility from 28.9 to 5.6 and objective quality from 2.68 to 3.90. VoxtLM also improves speech generation and speech recognition performance over the single-task counterpart. VoxtLM is trained with publicly available data and training recipes and model checkpoints will be open-sourced to make fully reproducible work.

EnCodecMAE: Leveraging neural codecs for universal audio representation learning

  • paper_url: http://arxiv.org/abs/2309.07391
  • repo_url: https://github.com/habla-liaa/encodecmae
  • paper_authors: Leonardo Pepino, Pablo Riera, Luciana Ferrer
  • for: Learning a universal audio representation usable across downstream tasks involving speech, music, and environmental sounds.
  • methods: Adapting BERT-style self-supervised modeling, which relies on the discrete nature of text, to audio by using the EnCodec neural audio codec to generate discrete targets for a masked autoencoder (MAE); the resulting approach is called EncodecMAE.
  • results: Performance comparable to or better than leading audio representation models on a wide range of speech, music, and environmental-sound tasks.
    Abstract The goal of universal audio representation learning is to obtain foundational models that can be used for a variety of downstream tasks involving speech, music or environmental sounds. To approach this problem, methods inspired by self-supervised models from NLP, like BERT, are often used and adapted to audio. These models rely on the discrete nature of text, hence adopting this type of approach for audio processing requires either a change in the learning objective or mapping the audio signal to a set of discrete classes. In this work, we explore the use of EnCodec, a neural audio codec, to generate discrete targets for learning an universal audio model based on a masked autoencoder (MAE). We evaluate this approach, which we call EncodecMAE, on a wide range of audio tasks spanning speech, music and environmental sounds, achieving performances comparable or better than leading audio representation models.

Rates of Convergence in Certain Native Spaces of Approximations used in Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.07383
  • repo_url: None
  • paper_authors: Ali Bouland, Shengyuan Niu, Sai Tej Paruchuri, Andrew Kurdila, John Burns, Eugenio Schuster
  • for: This work studies convergence rates for value function approximations arising in a collection of reproducing kernel Hilbert spaces (RKHSs) $H(\Omega)$ in optimal control.
  • methods: By casting the optimal control problem in a specific class of native spaces, strong rates of convergence are derived for the operator equation that enables offline approximations in policy iteration.
  • results: Explicit, geometric upper bounds on the value-function approximation error in terms of the power function $\Pwr_{H,N}$ for the space of finite-dimensional approximants $H_N$ in the native space $H(\Omega)$, refining classical convergence results.
    Abstract This paper studies convergence rates for some value function approximations that arise in a collection of reproducing kernel Hilbert spaces (RKHS) $H(\Omega)$. By casting an optimal control problem in a specific class of native spaces, strong rates of convergence are derived for the operator equation that enables offline approximations that appear in policy iteration. Explicit upper bounds on error in value function approximations are derived in terms of power function $\Pwr_{H,N}$ for the space of finite dimensional approximants $H_N$ in the native space $H(\Omega)$. These bounds are geometric in nature and refine some well-known, now classical results concerning convergence of approximations of value functions.

Beta quantile regression for robust estimation of uncertainty in the presence of outliers

  • paper_url: http://arxiv.org/abs/2309.07374
  • repo_url: None
  • paper_authors: Haleh Akrami, Omar Zamzam, Anand Joshi, Sergul Aydore, Richard Leahy
  • For: Estimating aleatoric uncertainty in deep neural networks and generating prediction intervals, with a focus on critical applications such as clinical diagnosis.
  • Methods: A robust solution for quantile regression that incorporates concepts from robust divergence, compared with two existing methods (least trimmed quantile regression and robust regression based on case-specific parameter regularization) on a simple real dataset with outliers and on a medical imaging translation task using diffusion models.
  • Results: The proposed method addresses outlier features in deep-learning regression problems such as style translation, image reconstruction, and deep anomaly detection, providing more accurate and robust results than the existing methods.
    Abstract Quantile Regression (QR) can be used to estimate aleatoric uncertainty in deep neural networks and can generate prediction intervals. Quantifying uncertainty is particularly important in critical applications such as clinical diagnosis, where a realistic assessment of uncertainty is essential in determining disease status and planning the appropriate treatment. The most common application of quantile regression models is in cases where the parametric likelihood cannot be specified. Although quantile regression is quite robust to outlier response observations, it can be sensitive to outlier covariate observations (features). Outlier features can compromise the performance of deep learning regression problems such as style translation, image reconstruction, and deep anomaly detection, potentially leading to misleading conclusions. To address this problem, we propose a robust solution for quantile regression that incorporates concepts from robust divergence. We compare the performance of our proposed method with (i) least trimmed quantile regression and (ii) robust regression based on the regularization of case-specific parameters in a simple real dataset in the presence of outlier. These methods have not been applied in a deep learning framework. We also demonstrate the applicability of the proposed method by applying it to a medical imaging translation task using diffusion models.
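Quantile regression rests on the pinball loss; a minimal sketch follows (the paper's robust-divergence modification is not reproduced here). Minimizing the empirical pinball loss over a constant predictor recovers the $\tau$-quantile of the data, which is what makes it suitable for prediction intervals:

```python
import numpy as np

def pinball_loss(y, pred, tau):
    """Pinball (quantile) loss: tau * max(e, 0) + (1 - tau) * max(-e, 0)
    with e = y - pred, averaged over samples.  Under-prediction is
    penalized tau-to-(1-tau) more heavily than over-prediction, so the
    minimizing constant is the tau-quantile of y."""
    e = np.asarray(y, dtype=float) - pred
    return float(np.mean(np.maximum(tau * e, (tau - 1) * e)))

y = np.arange(1, 101, dtype=float)            # samples 1..100
losses = {c: pinball_loss(y, c, tau=0.9) for c in range(1, 101)}
best = min(losses, key=losses.get)            # empirical 0.9-quantile
```

The asymmetry is what the robust variants must preserve while downweighting outlier covariates.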

Deep Multi-Agent Reinforcement Learning for Decentralized Active Hypothesis Testing

  • paper_url: http://arxiv.org/abs/2309.08477
  • repo_url: None
  • paper_authors: Hadar Szostak, Kobi Cohen
  • for: This work solves the decentralized multi-agent active hypothesis testing (AHT) problem, in which multiple agents gather noisy observations from the environment in order to identify the correct hypothesis.
  • methods: A deep multi-agent reinforcement learning (MARL) algorithm, MARLA, in which each agent maps its state to an action (sampling rule or stopping rule) with a trained deep neural network so as to minimize the Bayes risk.
  • results: Experiments show that the agents learn collaborative strategies that improve performance and that MARLA outperforms single-agent learning approaches; an open-source implementation is provided.
    Abstract We consider a decentralized formulation of the active hypothesis testing (AHT) problem, where multiple agents gather noisy observations from the environment with the purpose of identifying the correct hypothesis. At each time step, agents have the option to select a sampling action. These different actions result in observations drawn from various distributions, each associated with a specific hypothesis. The agents collaborate to accomplish the task, where message exchanges between agents are allowed over a rate-limited communications channel. The objective is to devise a multi-agent policy that minimizes the Bayes risk. This risk comprises both the cost of sampling and the joint terminal cost incurred by the agents upon making a hypothesis declaration. Deriving optimal structured policies for AHT problems is generally mathematically intractable, even in the context of a single agent. As a result, recent efforts have turned to deep learning methodologies to address these problems, which have exhibited significant success in single-agent learning scenarios. In this paper, we tackle the multi-agent AHT formulation by introducing a novel algorithm rooted in the framework of deep multi-agent reinforcement learning. This algorithm, named Multi-Agent Reinforcement Learning for AHT (MARLA), operates at each time step by having each agent map its state to an action (sampling rule or stopping rule) using a trained deep neural network with the goal of minimizing the Bayes risk. We present a comprehensive set of experimental results that effectively showcase the agents' ability to learn collaborative strategies and enhance performance using MARLA. Furthermore, we demonstrate the superiority of MARLA over single-agent learning approaches. Finally, we provide an open-source implementation of the MARLA framework, for the benefit of researchers and developers in related domains.
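The Bayes risk in AHT is defined over the agents' posterior belief across hypotheses. Below is a single-agent, fixed-sampling sketch of that belief update (Gaussian-mean hypotheses are our illustrative assumption; the paper's multi-agent policy, rate-limited messaging, and action selection are not modeled):

```python
import numpy as np

def update_posterior(prior, obs, means, sigma=1.0):
    """One Bayes update of the belief over Gaussian-mean hypotheses.

    Hypothesis i asserts obs ~ N(means[i], sigma^2).  The resulting
    belief vector is the state on which an AHT policy would choose its
    sampling or stopping action; the RL policy itself is not sketched.
    """
    log_lik = -0.5 * ((obs - means) / sigma) ** 2
    post = prior * np.exp(log_lik - log_lik.max())  # shifted for stability
    return post / post.sum()

rng = np.random.default_rng(0)
means = np.array([0.0, 2.5, 5.0])   # three candidate hypotheses
true_idx = 2
belief = np.ones(3) / 3.0           # uniform prior
for _ in range(40):
    obs = rng.normal(means[true_idx], 1.0)
    belief = update_posterior(belief, obs, means)
```

With well-separated hypotheses, the belief concentrates rapidly on the true one, after which a stopping rule would declare it.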
    摘要 我们考虑了一个分散式的活动假设测试(AHT)问题,多个代理人从环境中获得带有噪声的观察,以确定正确的假设。在每个时间步骤中,代理人可以选择抽样动作,这些不同的动作将产生来自不同分布的观察,每个分布对应一个特定的假设。代理人协力完成任务,并可在速率受限的通信通道上交换讯息。目标是设计一个多代理人政策,以最小化贝叶斯风险;这种风险包括抽样成本和代理人宣布假设时产生的共同终端成本。即使在单一代理人的情况下,为 AHT 问题推导最优的结构化政策在数学上通常也是难以处理的,因此近期的研究转向深度学习方法来解决这些问题,这些方法在单一代理人学习情境中已表现出显著的成功。在这篇论文中,我们引入一个植基于深度多代理人强化学习框架、名为「多代理人强化学习用于 AHT」(MARLA)的新算法来处理多代理人 AHT 问题:在每个时间步骤中,每个代理人使用训练好的深度神经网络将其状态映射到动作(抽样规则或停止规则),以最小化贝叶斯风险。我们提供了一组完整的实验结果,清楚展示代理人能够透过 MARLA 学习协作策略并提升性能;此外,我们也证明了 MARLA 优于单一代理人学习方法。最后,我们提供了 MARLA 框架的开源实现,供相关领域的研究人员和开发人员使用。
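作为对「贝叶斯风险 = 抽样成本 + 终端成本」这一目标的直观示意,下面给出一个与论文实现无关的极简蒙特卡洛估计:单代理人、二元假设的玩具设定,并用一个固定门限的停止政策代替论文中训练的深度神经网络。所有函数与参数名均为示例假设,并非 MARLA 的实现。

```python
import math
import random

def bayes_risk(policy, sample_cost=0.01, error_cost=1.0, n_trials=2000, seed=0):
    """Monte-Carlo estimate of the Bayes risk minimized in AHT:
    E[sample_cost * #samples + terminal cost of a wrong declaration],
    here for a single agent and a binary hypothesis (toy stand-in)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        h = rng.random() < 0.5                # true hypothesis, uniform prior
        llr, n = 0.0, 0
        while True:
            n += 1
            # sampling action: observe a coin whose bias encodes the hypothesis
            x = rng.random() < (0.7 if h else 0.3)
            llr += math.log(0.7 / 0.3) if x else math.log(0.3 / 0.7)
            decision = policy(llr, n)         # None = keep sampling
            if decision is not None:
                total += sample_cost * n + (error_cost if decision != h else 0.0)
                break
    return total / n_trials

def threshold_policy(llr, n, thr=3.0):
    """Fixed-threshold stopping rule standing in for a trained policy network."""
    if llr >= thr:
        return True
    if llr <= -thr:
        return False
    return None
```

在该玩具设定下,门限政策的风险应明显低于「立即宣布」政策的风险。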

eess.IV - 2023-09-14

Live Iterative Ptychography with projection-based algorithms

  • paper_url: http://arxiv.org/abs/2309.08639
  • repo_url: https://github.com/sp-uhh/livepty
  • paper_authors: Simon Welker, Tal Peer, Henry N. Chapman, Timo Gerkmann
  • for: 这项研究证明,ptychography 相位问题可以在扫描过程中、数据仍在收集时实时求解。
  • methods: 该方法是对广泛使用的投影类算法(如 Error Reduction (ER) 和 Difference Map (DM))的一种通用修改,可提供实时视觉反馈、用固定计算资源重建任意大小的对象,以及自适应扫描。
  • results: 研究表明,基于投影的方法的实时变体可以在相当的有效计算量下实现比其经典非实时版本更高质量的重建,并在实验中提供实时视觉反馈。
    Abstract In this work, we demonstrate that the ptychographic phase problem can be solved in a live fashion during scanning, while data is still being collected. We propose a generally applicable modification of the widespread projection-based algorithms such as Error Reduction (ER) and Difference Map (DM). This novel variant of ptychographic phase retrieval enables immediate visual feedback during experiments, reconstruction of arbitrary-sized objects with a fixed amount of computational resources, and adaptive scanning. By building upon the Real-Time Iterative Spectrogram Inversion (RTISI) family of algorithms from the audio processing literature, we show that live variants of projection-based methods such as DM can be derived naturally and may even achieve higher-quality reconstructions than their classic non-live counterparts with comparable effective computational load.
    摘要 在这项工作中,我们证明了 ptychography 相位问题可以在扫描过程中、数据仍在收集时实时求解。我们对广泛使用的基于投影的算法(如 Error Reduction (ER) 和 Difference Map (DM))提出了一种普遍适用的修改。这种新的 ptychography 相位恢复方法可以在实验中提供即时视觉反馈,用固定的计算资源重建任意大小的对象,并支持自适应扫描。通过借鉴音频处理文献中的 Real-Time Iterative Spectrogram Inversion (RTISI) 算法家族,我们表明基于投影的方法(如 DM)的实时变体可以自然地推导出来,并且在相当的有效计算量下,甚至可能实现比其经典非实时版本更高质量的重建。
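作为对 Error Reduction (ER) 这类投影算法的直观示意,下面在一个一维玩具相位问题上交替投影:先强制满足测量到的傅里叶幅度,再强制满足物域支撑约束。这只是若干假设下的示意代码(纯 Python DFT,函数名均为示例),并非论文的实时(live)实现;ER 误差单调不增的性质可用来做简单检验。

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform of a short 1-D signal."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * j * k / n) for k in range(n))
            for j in range(n)]

def idft(X):
    """Inverse of the naive DFT above."""
    n = len(X)
    return [sum(X[j] * cmath.exp(2j * math.pi * j * k / n) for j in range(n)) / n
            for k in range(n)]

def er_step(x, mags, support):
    """One Error-Reduction iteration: project onto the measured Fourier
    magnitudes, then onto the object-domain support constraint."""
    X = dft(x)
    X = [m * v / abs(v) if abs(v) > 1e-12 else complex(m, 0.0)
         for v, m in zip(X, mags)]
    x = idft(X)
    return [v if s else 0j for v, s in zip(x, support)]

def mag_error(x, mags):
    """Squared mismatch between the iterate's Fourier magnitudes and the data."""
    return sum((abs(v) - m) ** 2 for v, m in zip(dft(x), mags))
```

从一个仅满足支撑约束的初始猜测出发,迭代后的幅度误差应当下降。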

MPAI-EEV: Standardization Efforts of Artificial Intelligence based End-to-End Video Coding

  • paper_url: http://arxiv.org/abs/2309.07589
  • repo_url: https://github.com/yefeng00/EEV-0.4
  • paper_authors: Chuanmin Jia, Feng Ye, Fanke Dong, Kai Lin, Leonardo Chiariglione, Siwei Ma, Huifang Sun, Wen Gao
  • for: 这份研究旨在推动人工智能(AI)技术的标准化,特别是用神经网络对视频进行处理、编码和传输。
  • methods: 这份研究使用神经网络技术来实现端到端优化的神经视频编码,并且不受传统混合架构的限制。
  • results: 这份研究显示,EEV 模型在感知评估指标上表现优于视频编码标准 H.266/VVC,并且能够更好地实现高保真视频资料的压缩。
    Abstract The rapid advancement of artificial intelligence (AI) technology has led to the prioritization of standardizing the processing, coding, and transmission of video using neural networks. To address this priority area, the Moving Picture, Audio, and Data Coding by Artificial Intelligence (MPAI) group is developing a suite of standards called MPAI-EEV for "end-to-end optimized neural video coding." The aim of this AI-based video standard project is to compress the number of bits required to represent high-fidelity video data by utilizing data-trained neural coding technologies. This approach is not constrained by how data coding has traditionally been applied in the context of a hybrid framework. This paper presents an overview of recent and ongoing standardization efforts in this area and highlights the key technologies and design philosophy of EEV. It also provides a comparison and report on some primary efforts such as the coding efficiency of the reference model. Additionally, it discusses emerging activities such as learned Unmanned-Aerial-Vehicles (UAVs) video coding which are currently planned, under development, or in the exploration phase. With a focus on UAV video signals, this paper addresses the current status of these preliminary efforts. It also indicates development timelines, summarizes the main technical details, and provides pointers to further points of reference. The exploration experiment shows that the EEV model performs better than the state-of-the-art video coding standard H.266/VVC in terms of perceptual evaluation metric.
    摘要 人工智能(AI)技术的快速发展,使利用神经网络对视频进行处理、编码和传输的标准化成为优先事项。为此,动态图像、音频与数据的人工智能编码(MPAI)小组正在制定一套名为 MPAI-EEV 的“端到端优化神经视频编码”标准。该基于 AI 的视频标准项目的目标是利用数据训练的神经编码技术,压缩表示高保真视频数据所需的比特数;这种方法不受数据编码在传统混合框架下应用方式的限制。本文概述了该领域近期和正在进行的标准化工作,重点介绍 EEV 的关键技术与设计理念,并对参考模型的编码效率等主要工作进行了比较和报告。此外,本文还讨论了目前处于规划、开发或探索阶段的新兴活动,例如基于学习的无人机(UAV)视频编码;围绕 UAV 视频信号,本文介绍了这些初步工作的现状,给出了开发时间表,总结了主要技术细节,并提供了进一步的参考要点。探索实验表明,EEV 模型在感知评估指标上优于最先进的视频编码标准 H.266/VVC。

Oscillating-gradient spin-echo diffusion-weighted imaging (OGSE-DWI) with a limited number of oscillations: II. Asymptotics

  • paper_url: http://arxiv.org/abs/2309.07484
  • repo_url: None
  • paper_authors: Jeff Kershaw, Takayuki Obata
  • for: 这个研究的目的是研究振荡梯度自旋回波扩散加权磁共振成像(OGSE-DWI)技术在频域中的应用,以研究复杂含水物质的微结构。
  • methods: 这个研究使用 OGSE-DWI 技术,通过测量 $U_{kk}$ 和 $U_{k0}$ 这两个量来间接获取分子扩散频谱的信息。
  • results: 研究发现,分子扩散频谱在低频和高频极限下表现出普适的渐近行为,这些行为取决于样本的全局组织结构。
    Abstract Oscillating-gradient spin-echo diffusion-weighted magnetic resonance imaging (OGSE-DWI) has been promoted as a promising technique for studying the microstructure of complex hydrated matter in the frequency domain. The target of the OGSE-DWI technique is the spectral density of molecular diffusion, $u_{2}(\omega)$, which is predicted to obey a set of asymptotic universality relations that are linked to the global organisation of the sample. So, in principle the complex microstructure of a medium can be classified by measuring the spectral density in its low- and high-frequency limits. However, due to practical limitations on the spectral resolution and range of the technique, it is not possible to directly sample the spectral density with OGSE-DWI. Rather, information about the spectral density can be obtained only indirectly through the quantities $U_{kk}$ & $U_{k0}$, which are filtered representations of $u_{2}(\omega)$. The purpose of this study is to investigate how the universal behaviour of $u_{2}(\omega)$ emerges in the asymptotic behaviour of OGSE-DWI signal.
    摘要 振荡梯度自旋回波扩散加权磁共振成像(OGSE-DWI)被认为是在频域中研究复杂含水物质微结构的一种有前景的技术。OGSE-DWI 技术的目标量是分子扩散的频谱密度 $u_{2}(\omega)$,它被预测满足一组与样本全局组织结构相关的渐近普适性关系。因此,原则上可以通过测量频谱密度在低频和高频极限下的行为来对介质的复杂微结构进行分类。然而,由于该技术在频谱分辨率和范围上的实际限制,OGSE-DWI 无法直接采样频谱密度;只能通过 $U_{kk}$ 和 $U_{k0}$ 这两个量间接获得有关频谱密度的信息,它们是 $u_{2}(\omega)$ 经过滤波后的表示。本研究的目的是考察 $u_{2}(\omega)$ 的普适行为如何在 OGSE-DWI 信号的渐近行为中体现。

CvFormer: Cross-view transFormers with Pre-training for fMRI Analysis of Human Brain

  • paper_url: http://arxiv.org/abs/2309.07940
  • repo_url: None
  • paper_authors: Xiangzhu Meng, Qiang Liu, Shu Wu, Liang Wang
  • for: 该论文旨在解决人脑功能磁共振成像(fMRI)数据中感兴趣区(RoI)节点与其连接性之间互补信息被忽略的问题,提出了一种名为 Cross-view transFormers(CvFormer)的新型跨视图分析方法。
  • methods: CvFormer 使用 RoI 和连接性编码模块生成人脑的两个不同视图,然后使用基本 transformer 模块处理 RoI 和子连接 token,并在 cross-view 模块中整合两个视图的互补信息。此外,CvFormer 为每个分支使用一个全局 token 作为查询,在 cross-view 模块中与其他分支交换信息,其计算和存储复杂度为线性时间而非二次时间。
  • results: 实验结果表明,所提出的 CvFormer 在 ABIDE 和 ADNI 两个公开数据集上均有显著提升,证明其有效性和优越性。
    Abstract In recent years, functional magnetic resonance imaging (fMRI) has been widely utilized to diagnose neurological disease, by exploiting the region of interest (RoI) nodes as well as their connectivities in human brain. However, most of existing works only rely on either RoIs or connectivities, neglecting the potential for complementary information between them. To address this issue, we study how to discover the rich cross-view information in fMRI data of human brain. This paper presents a novel method for cross-view analysis of fMRI data of the human brain, called Cross-view transFormers (CvFormer). CvFormer employs RoI and connectivity encoder modules to generate two separate views of the human brain, represented as RoI and sub-connectivity tokens. Then, basic transformer modules can be used to process the RoI and sub-connectivity tokens, and cross-view modules integrate the complement information across two views. Furthermore, CvFormer uses a global token for each branch as a query to exchange information with other branches in cross-view modules, which only requires linear time for both computational and memory complexity instead of quadratic time. To enhance the robustness of the proposed CvFormer, we propose a two-stage strategy to train its parameters. To be specific, RoI and connectivity views can be firstly utilized as self-supervised information to pre-train the CvFormer by combining it with contrastive learning and then fused to finetune the CvFormer using label information. Experiment results on two public ABIDE and ADNI datasets can show clear improvements by the proposed CvFormer, which can validate its effectiveness and superiority.
    摘要 近年来,功能磁共振成像(fMRI)已被广泛用于诊断神经系统疾病,其利用人脑中感兴趣区(RoI)节点及其连接性。然而,现有工作大多只利用 RoI 或连接性之一,忽略了两者之间潜在的互补信息。为了解决这一问题,我们研究如何在人脑 fMRI 数据中发掘丰富的跨视图信息。本文提出了一种新的人脑 fMRI 数据跨视图分析方法,称为跨视图变换器(CvFormer)。CvFormer 使用 RoI 和连接性编码模块生成人脑的两个独立视图,分别表示为 RoI token 和子连接 token。然后,基本变换器模块处理 RoI 和子连接 token,跨视图模块整合两个视图之间的互补信息。此外,CvFormer 为每个分支使用一个全局 token 作为查询,在跨视图模块中与其他分支交换信息,其计算和存储复杂度仅为线性时间而非二次时间。为了增强所提 CvFormer 的稳健性,我们提出了两阶段的参数训练策略:首先将 RoI 视图和连接性视图作为自监督信息,结合对比学习对 CvFormer 进行预训练,然后融合标签信息对 CvFormer 进行微调。在 ABIDE 和 ADNI 两个公开数据集上的实验结果显示出明显的改进,验证了其有效性和优越性。
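为说明「单个全局 token 作为查询、以线性复杂度进行跨视图信息交换」这一机制,下面给出一个与论文实现无关的极简纯 Python 示意:分支 A 的全局 token 对分支 B 的所有 token 做一次缩放点积注意力,代价与 B 的 token 数成线性关系。函数名均为示例假设。

```python
import math

def attend(query, keys, values):
    """Scaled dot-product attention of a single query vector over token lists."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    m = max(scores)                      # subtract max for numerical stability
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    return [sum(wi / z * v[j] for wi, v in zip(w, values)) for j in range(d)]

def cross_view_exchange(global_a, tokens_b):
    """CvFormer-style cross-view step (sketch): branch A's global token
    queries all of branch B's tokens, so the cost is linear in len(tokens_b)
    rather than quadratic in the total token count."""
    return attend(global_a, tokens_b, tokens_b)
```

输出是 B 分支 token 的凸组合,且与查询更相似的 token 获得更大的权重。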

VCD: A Video Conferencing Dataset for Video Compression

  • paper_url: http://arxiv.org/abs/2309.07376
  • repo_url: None
  • paper_authors: Babak Naderi, Ross Cutler, Nabakumar Singh Khongbantabam, Yasaman Hosseinkashi
  • for: The paper is written for evaluating video codecs for real-time communication in video conferencing scenarios.
  • methods: The paper presents a new dataset called the Video Conferencing Dataset (VCD) that includes a wide variety of camera qualities and spatial and temporal information.
  • results: The paper reports the compression efficiency of several popular video codecs (H.264, H.265, H.266, and AV1) in low-delay settings on VCD and compares them with non-video conferencing datasets. The results show that the source quality and scenarios have a significant effect on the compression efficiency of all the codecs.
    Abstract Commonly used datasets for evaluating video codecs are all very high quality and not representative of video typically used in video conferencing scenarios. We present the Video Conferencing Dataset (VCD) for evaluating video codecs for real-time communication, the first such dataset focused on video conferencing. VCD includes a wide variety of camera qualities and spatial and temporal information. It includes both desktop and mobile scenarios and two types of video background processing. We report the compression efficiency of H.264, H.265, H.266, and AV1 in low-delay settings on VCD and compare it with the non-video conferencing datasets UVC, MLC-JVC, and HEVC. The results show the source quality and the scenarios have a significant effect on the compression efficiency of all the codecs. VCD enables the evaluation and tuning of codecs for this important scenario. The VCD is publicly available as an open-source dataset at https://github.com/microsoft/VCD.
    摘要 通常用于评估视频编码器的数据集都具有非常高的质量,并不代表视频会议场景中典型的视频。我们提出了视频会议数据集(VCD),用于评估实时通信中的视频编码器,这是第一个专门针对视频会议的此类数据集。VCD 包含多种摄像头质量以及丰富的空间和时间信息,涵盖桌面和移动场景,以及两种视频背景处理方式。我们在 VCD 上以低延迟设置测试了 H.264、H.265、H.266 和 AV1 编码器的压缩效率,并与非视频会议数据集 UVC、MLC-JVC 和 HEVC 进行比较。结果表明,源质量和场景对所有编码器的压缩效率都有显著影响。VCD 可用于评估和调优面向这一重要场景的编码器。VCD 作为开源数据集公开发布,可在 https://github.com/microsoft/VCD 获取。
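编码器比较通常以逐帧质量指标聚合为率失真曲线;作为示意,下面给出最简单的此类指标 PSNR 的纯 Python 实现(与 VCD 论文采用的具体指标无关,仅为说明压缩效率如何被量化)。

```python
import math

def psnr(ref, deg, peak=255.0):
    """Peak signal-to-noise ratio between a reference and a degraded frame
    (given as flattened pixel lists) -- the kind of per-frame quality score
    that codec comparisons aggregate into rate-distortion curves."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, deg)) / len(ref)
    if mse == 0.0:
        return float("inf")
    return 10.0 * math.log10(peak * peak / mse)
```

相同帧的 PSNR 为无穷大,最大偏差(0 对 255)的 PSNR 为 0 dB。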

eess.SP - 2023-09-14

Efficient Rotating Synthetic Aperture Radar Imaging via Robust Sparse Array Synthesis

  • paper_url: http://arxiv.org/abs/2309.08038
  • repo_url: None
  • paper_authors: Wei Zhao, Cai Wen, Quan Yuan, Rong Zheng
  • for: 提高ROSAR的实时性和计算效率,使其在各种应用中更加广泛应用。
  • methods: 基于可重要稳定的简单频率阵列synthesis技术,通过范围维度匹配滤波和方向维度匹配滤波,实现高效的SAR图像生成。
  • results: 比BPA更高效,但图像质量与BPA相当,计算时间减少90%。
    Abstract Rotating Synthetic Aperture Radar (ROSAR) can generate a 360$^\circ$ image of its surrounding environment using the collected data from a single moving track. Due to its non-linear track, the Back-Projection Algorithm (BPA) is commonly used to generate SAR images in ROSAR. Despite its superior imaging performance, BPA suffers from high computation complexity, restricting its application in real-time systems. In this paper, we propose an efficient imaging method based on robust sparse array synthesis. It first conducts range-dimension matched filtering, followed by azimuth-dimension matched filtering using a selected sparse aperture and filtering weights. The aperture and weights are computed offline in advance to ensure robustness to array manifold errors induced by the imperfect radar rotation. We introduce robust constraints on the main-lobe and sidelobe levels of filter design. The resultant robust sparse array synthesis problem is a non-convex optimization problem with quadratic constraints. An algorithm based on feasible point pursuit and successive convex approximation is devised to solve the optimization problem. Extensive simulation study and experimental evaluations using a real-world hardware platform demonstrate that the proposed algorithm can achieve image quality comparable to that of BPA, but with a substantial reduction in computational time up to 90%.
    摘要 旋转合成孔径雷达(ROSAR)可以利用单条运动轨迹采集的数据生成周围环境的 360 度图像。由于其轨迹是非线性的,ROSAR 通常使用反投影算法(BPA)生成 SAR 图像。尽管 BPA 成像性能优越,但其计算复杂度高,限制了在实时系统中的应用。在本文中,我们提出了一种基于稳健稀疏阵列综合的高效成像方法:先进行距离维匹配滤波,再利用选定的稀疏孔径和滤波权重进行方位维匹配滤波。孔径和权重在线下预先计算,以保证对雷达旋转不完美所引起的阵列流形误差的稳健性。我们在滤波器设计中对主瓣和旁瓣电平引入了稳健约束,由此得到的稳健稀疏阵列综合问题是一个带二次约束的非凸优化问题,我们设计了一种基于可行点追踪和逐次凸近似的算法来求解。大量仿真研究和基于真实硬件平台的实验评估表明,所提算法可以达到与 BPA 相当的图像质量,而计算时间最多可减少 90%。
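方位维匹配滤波本质上是对(可能稀疏的)孔径做加权求和;作为示意,下面给出一个与论文无关的线阵远场阵因子计算:将部分权重置零即得到稀疏孔径,主瓣幅值相应地随有效孔径元数变化。函数名与参数均为示例假设。

```python
import cmath
import math

def array_factor(positions, weights, theta, wavelength=1.0):
    """Far-field array factor magnitude of a (possibly sparse) linear
    aperture: positions along the array axis (same units as wavelength),
    theta in radians off broadside. Azimuth matched filtering with a
    selected sparse aperture amounts to evaluating such weighted sums."""
    k = 2.0 * math.pi / wavelength
    return abs(sum(w * cmath.exp(1j * k * p * math.sin(theta))
                   for p, w in zip(positions, weights)))
```

在正侧视方向(theta = 0),满阵的阵因子等于元数,稀疏阵则等于非零权重之和。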

On Distributed and Asynchronous Sampling of Gaussian Processes for Sequential Binary Hypothesis Testing

  • paper_url: http://arxiv.org/abs/2309.07855
  • repo_url: None
  • paper_authors: Nandan Sriranga, Saikiran Bulusu, Baocheng Geng, Pramod K. Varshney
  • for: 本研究考虑一个测量数据来自分布式异步传感器的二元序贯假设检验问题,分析分布式传感器处联合广义平稳(WSS)高斯观测过程的采样时间对融合中心(FC)处序贯检验预期停止时间的影响。
  • methods: 研究者通过推导预期停止时间的界,分析异步采样对序贯概率比检验(SPRT)性能的影响。
  • results: 研究者刻画了采样时间对序贯检验的影响,给出了预期停止时间的界,并通过数值结果验证了理论结论的正确性。
    Abstract In this work, we consider a binary sequential hypothesis testing problem with distributed and asynchronous measurements. The aim is to analyze the effect of sampling times of jointly \textit{wide-sense stationary} (WSS) Gaussian observation processes at distributed sensors on the expected stopping time of the sequential test at the fusion center (FC). The distributed system is such that the sensors and the FC sample observations periodically, where the sampling times are not necessarily synchronous, i.e., the sampling times at different sensors and the FC may be different from each other. \color{black} The sampling times, however, are restricted to be within a time window and a sample obtained within the window is assumed to be \textit{uncorrelated} with samples outside the window. We also assume that correlations may exist only between the observations sampled at the FC and those at the sensors in a pairwise manner (sensor pairs not including the FC have independent observations). The effect of \textit{asynchronous} sampling on the SPRT performance is analyzed by obtaining bounds for the expected stopping time. We illustrate the validity of the theoretical results with numerical results.
    摘要 在这项工作中,我们考虑一个具有分布式异步测量的二元序贯假设检验问题,目的是分析分布式传感器处联合广义平稳(WSS)高斯观测过程的采样时间对融合中心(FC)处序贯检验预期停止时间的影响。在该分布式系统中,各传感器和 FC 周期性地采样观测,但采样时间不一定同步,即不同传感器和 FC 的采样时间可以彼此不同。不过,采样时间被限制在一个时间窗口内,并假设窗口内获得的样本与窗口外的样本不相关。我们还假设相关性只可能以成对方式存在于 FC 的观测与各传感器的观测之间(不包含 FC 的传感器对之间的观测相互独立)。我们通过推导预期停止时间的界,分析了异步采样对 SPRT 性能的影响,并用数值结果验证了理论结果的有效性。
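作为背景,序贯概率比检验(SPRT)的核心机制可以用几行代码说明:累加对数似然比,越过上门限则判 H1,越过下门限则判 H0,停止时间即所用样本数。这只是经典 Wald 检验的示意,并非论文中带异步采样的版本。

```python
import math

def sprt(observations, llr_step, alpha=0.05, beta=0.05):
    """Wald's sequential probability ratio test: accumulate the log-likelihood
    ratio until it crosses the upper threshold (declare H1) or the lower one
    (declare H0). Returns (decision, stopping_time); decision is None if the
    sample ran out before a threshold was crossed."""
    upper = math.log((1.0 - beta) / alpha)
    lower = math.log(beta / (1.0 - alpha))
    llr = 0.0
    for n, x in enumerate(observations, start=1):
        llr += llr_step(x)
        if llr >= upper:
            return 1, n
        if llr <= lower:
            return 0, n
    return None, len(observations)
```

当 alpha = beta = 0.05 时,门限为 ±ln 19 ≈ ±2.94,因此每步 ±1 的对数似然比在第 3 步触发判决。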

Kullback-Leibler Divergence-Guided Copula Statistics-Based Blind Source Separation of Dependent Signals

  • paper_url: http://arxiv.org/abs/2309.07814
  • repo_url: None
  • paper_authors: Pooja Algikar, Lamine Mili, Kiran Karra, Mohsen Ben Hassine
  • for: 该文章提出了一种基于 copula 统计的盲源分离方法,用于分离线性混合的依赖源信号。
  • methods: 该方法以 copula 统计量刻画源信号分量之间的非线性依赖关系(以 copula 密度函数的形式),并通过最小化估计源与依赖结构的 copula 密度函数之间的 Kullback-Leibler 散度来实现分离。
  • results: 实验结果表明,基于 copula 统计的盲源分离方法在 11-Bus 4-Machine 系统时域分析数据上收敛更快,并在干扰信号比方面优于最先进的面向相关源的盲源分离方法。
    Abstract In this paper, we propose a blind source separation of a linear mixture of dependent sources based on copula statistics that measure the non-linear dependence between source component signals structured as copula density functions. The source signals are assumed to be stationary. The method minimizes the Kullback-Leibler divergence between the copula density functions of the estimated sources and of the dependency structure. The proposed method is applied to data obtained from the time-domain analysis of the classical 11-Bus 4-Machine system. Extensive simulation results demonstrate that the proposed method based on copula statistics converges faster and outperforms the state-of-the-art blind source separation method for dependent sources in terms of interference-to-signal ratio.
    摘要 在这篇论文中,我们提出了一种基于 copula 统计的无参源分离方法,用于分解线性混合的相关源信号。我们假设源信号是站ARY的。方法的目标是将 copula density function 的两个分布匹配,以最小化库拉-莱布勒散度的差异。我们对来自经典 11-Bus 4-Machine 系统的时域分析数据进行了应用。广泛的 simulate 结果表明,基于 copula 统计的方法在相关源信号分离方面比现有的方法更快 converge 和有更高的干扰比信号比。

Enhancing Performance, Calibration Time and Efficiency in Brain-Machine Interfaces through Transfer Learning and Wearable EEG Technology

  • paper_url: http://arxiv.org/abs/2309.07798
  • repo_url: None
  • paper_authors: Xiaying Wang, Lan Mei, Victor Kartsch, Andrea Cossettini, Luca Benini
  • for: 这项研究旨在解决脑机接口辅助技术中跨会话差异与佩戴舒适度的问题,协助运动功能障碍人士控制设备并实现功能恢复。
  • methods: 本研究将基于微型 CNN 的迁移学习(TL)方法与舒适的可穿戴 EEG 头带相结合。这种新型可穿戴 EEG 设备采用置于头带上的软干电极,并支持板上处理。
  • results: 研究获取了多个会话的运动 EEG 数据,利用 TL 实现了高达 96% 的跨会话准确率,大幅缩短了校准时间并提升了可用性。透过每 100ms 在边缘端执行一次推断,系统估计可达 30 小时的电池续航。
    Abstract Brain-machine interfaces (BMIs) have emerged as a transformative force in assistive technologies, empowering individuals with motor impairments by enabling device control and facilitating functional recovery. However, the persistent challenge of inter-session variability poses a significant hurdle, requiring time-consuming calibration at every new use. Compounding this issue, the low comfort level of current devices further restricts their usage. To address these challenges, we propose a comprehensive solution that combines a tiny CNN-based Transfer Learning (TL) approach with a comfortable, wearable EEG headband. The novel wearable EEG device features soft dry electrodes placed on the headband and is capable of on-board processing. We acquire multiple sessions of motor-movement EEG data and achieve up to 96% inter-session accuracy using TL, greatly reducing the calibration time and improving usability. By executing the inference on the edge every 100ms, the system is estimated to achieve 30h of battery life. The comfortable BMI setup with tiny CNN and TL paves the way to future on-device continual learning, essential for tackling inter-session variability and improving usability.
    摘要 脑机接口(BMI)已成为辅助技术中的一股变革力量,为运动功能障碍人士赋能,使其能够控制设备并促进功能恢复。然而,跨会话差异这一长期挑战构成了重大障碍,每次使用前都需要耗时的校准;此外,现有设备的舒适度较低,进一步限制了其使用。为了解决这些挑战,我们提出了一套完整的解决方案,将基于微型 CNN 的迁移学习(TL)方法与舒适可穿戴的 EEG 头带相结合。这种新型可穿戴 EEG 设备采用置于头带上的软干电极,并支持板上处理。我们采集了多个会话的运动 EEG 数据,利用 TL 实现了高达 96% 的跨会话准确率,大幅缩短了校准时间并提升了可用性。透过每 100ms 在边缘端执行一次推断,系统估计可达 30 小时的电池续航。这种结合微型 CNN 与 TL 的舒适 BMI 方案,为未来的设备端持续学习铺平了道路,而这对于应对跨会话差异和提升可用性至关重要。
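迁移学习缩短校准时间的基本思路是:冻结预训练的特征提取器,仅用新会话的少量校准样本重训一个小的分类头。下面是该思路的一个极简纯 Python 示意(以逻辑回归头代替论文的微型 CNN,所有函数名均为示例假设)。

```python
import math

def finetune_head(features, labels, lr=0.5, epochs=300):
    """Transfer-learning sketch: a pre-trained (frozen) feature extractor has
    already mapped each EEG trial to a feature vector; only a small logistic
    head is re-fit on a handful of calibration trials from the new session."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y                      # gradient of the logistic loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    """Binary decision of the re-fitted head."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0.0 else 0
```

在一个线性可分的玩具校准集上,重训后的头应正确分类全部样本。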

Stochastic Phased Array Performance Indicators for Quality-of-Service-Enhanced Massive MIMO

  • paper_url: http://arxiv.org/abs/2309.07740
  • repo_url: None
  • paper_authors: Noud Kanters, Andrés Alayón Glazunov
  • for: 本研究旨在说明,配备任意物理阵列天线的基站(BS)处的信干噪比(SINR)可以表示为两个基本优势因子(FoM)的函数:一是瞬时有效增益(IEG),二是波束赋形-信道相关性(BCC)。
  • methods: 本研究针对应用全数字迫零(FD ZF)波束赋形的基站,考察这两个 FoM 的统计特性,并研究了不同天线布局的影响。
  • results: 研究结果表明,IEG 较高且出现低 BCC 概率较小的阵列布局可以提高遍历和速率并减少调度需求。
    Abstract In this paper, we show that the signal-to-interference-plus-noise ratio (SINR) at a base station (BS) equipped with an arbitrary physical array antenna can be expressed as a function of two fundamental figures-of-merit (FoMs): (I) the instantaneous effective gain (IEG), and (II) the beamforming-channel correlation (BCC). These two FoMs are functions of the array antenna layout, the antenna elements, the propagation channel and the applied signal processing algorithms, and hence they are random variables (RVs) in general. We illustrate that both FoMs provide essential insights for quality-of-service (QoS)-based phased array design by investigating their statistics for BSs applying full-digital (FD) zero forcing (ZF) beamforming. We evaluate various array designs and show that arrays with higher IEGs and a reduced probability of low BCCs can increase the ergodic sum rate and reduce the need for scheduling.
    摘要 在本文中,我们证明配备任意物理阵列天线的基站(BS)处的信干噪比(SINR)可以表示为两个基本优势因子(FoM)的函数:(I)瞬时有效增益(IEG),以及(II)波束赋形-信道相关性(BCC)。这两个 FoM 是阵列天线布局、天线单元、传播信道和所用信号处理算法的函数,因此一般而言它们是随机变量(RV)。我们通过研究应用全数字(FD)迫零(ZF)波束赋形的基站下这两个 FoM 的统计特性,说明它们为基于服务质量(QoS)的相控阵设计提供了重要洞察。我们评估了多种阵列设计,并表明具有较高 IEG 且出现低 BCC 概率较小的阵列可以提高遍历和速率并减少调度需求。
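迫零(ZF)波束赋形的作用可以用一个 2 用户、2 天线的实数玩具信道说明:取权重矩阵为信道矩阵的逆,则等效信道为单位阵,用户间干扰被完全消除,剩余 SINR 由有效增益与噪声决定(这正是 IEG 在分析中扮演的角色)。以下代码仅为示意,与论文的模型细节无关。

```python
def zf_weights(H):
    """Zero-forcing weights for a 2-user, 2-antenna real-valued toy channel:
    W = H^{-1}, so the effective channel H @ W is the identity and each
    user's stream sees no inter-user interference."""
    (a, b), (c, d) = H
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul2(A, B):
    """2x2 matrix product, used to check the effective channel H @ W."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]
```

对任意可逆的 2x2 信道,H 与其 ZF 权重的乘积应为单位阵(对角线为 1,非对角线为 0)。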

Performance Analysis of RIS/STAR-IOS-aided V2V NOMA/OMA Communications over Composite Fading Channels

  • paper_url: http://arxiv.org/abs/2309.07738
  • repo_url: None
  • paper_authors: Farshad Rostami Ghadi, Masoud Kaveh, Diego Martin
  • For: 本研究探讨了可重构智能表面(RIS)和同时透射与反射智能全向表面(STAR-IOS)辅助的车对车(V2V)通信的性能,特别是在非正交多址(NOMA)和正交多址(OMA)方案下。
  • Methods: 本研究利用中心极限定理(CLT)推导接收端等效信道的 PDF 和 CDF,利用 Jensen 不等式给出遍历容量(EC)的上界,并推导了能量效率(EE)的解析表达式。
  • Results: 研究结果显示,在 V2V 通信中应用 RIS/STAR-IOS 可显著改善智能交通系统(ITS)的性能;并且对于所考虑的 V2V 通信,NOMA 方案在中断概率(OP)、遍历容量(EC)和能量效率(EE)方面均优于 OMA。
    Abstract This paper investigates the performance of vehicleto-vehicle (V2V) communications assisted by a reconfigurable intelligent surface (RIS) and a simultaneous transmitting and reflecting intelligent omni-surface (STAR-IOS) under nonorthogonal multiple access (NOMA) and orthogonal multiple access (OMA) schemes. In particular, we consider that the RIS is close to the transmitter vehicle while the STAR-IOS is near the receiver vehicles. In addition, we assume that the STAR-IOS exploits the energy-splitting (ES) protocol for communication and the fading channels between the RIS and STAR-IOS follow composite Fisher-Snedecor F distribution. Under such assumptions, we first use the central limit theorem (CLT) to derive the PDF and the CDF of equivalent channels at receiver vehicles, and then, we derive the closed-form expression of outage probability (OP) under NOMA/OMA scenarios. Additionally, by exploiting Jensen's inequality, we propose an upper bound of the ergodic capacity (EC), and then, we derive an analytical expression of the energy efficiency (EE) for both NOMA and OMA cases. Further, our analytical results, which are double-checked with the Monte-Carlo simulation, reveal that applying RIS/STAR-RIS in V2V communications can significantly improve the performance of intelligent transportation systems (ITS). Besides, the results indicate that considering the NOMA scheme provides better performance in terms of the OP, EC, and EE as compared with the OMA case for the considered V2V communication.
    摘要 这篇论文研究了由可重构智能表面(RIS)和同时透射与反射智能全向表面(STAR-IOS)辅助的车对车(V2V)通信在非正交多址(NOMA)和正交多址(OMA)方案下的性能。特别地,我们假设 RIS 靠近发射车辆,而 STAR-IOS 靠近接收车辆;此外,我们假设 STAR-IOS 采用能量分割(ES)协议进行通信,且 RIS 与 STAR-IOS 之间的衰落信道服从复合 Fisher-Snedecor F 分布。在这些假设下,我们首先利用中心极限定理(CLT)推导接收车辆处等效信道的 PDF 和 CDF,进而推导 NOMA/OMA 场景下中断概率(OP)的闭式表达式。此外,利用 Jensen 不等式,我们提出了遍历容量(EC)的一个上界,并推导了 NOMA 和 OMA 两种情形下能量效率(EE)的解析表达式。我们的解析结果与蒙特卡洛仿真相互印证,表明在 V2V 通信中应用 RIS/STAR-IOS 可显著提升智能交通系统(ITS)的性能。此外,结果还表明,对于所考虑的 V2V 通信,NOMA 方案在 OP、EC 和 EE 方面均优于 OMA。
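中断概率(OP)的含义可以用一个极简蒙特卡洛估计说明:OP = P(瞬时 SNR 低于门限)。下面的示意用瑞利衰落(瞬时 SNR 为指数分布)代替论文中的复合 Fisher-Snedecor F 信道,仅为说明概念;此时有闭式 OP = 1 - exp(-thr/mean) 可供对照。函数名与参数均为示例假设。

```python
import math
import random

def outage_probability(threshold_db, mean_snr_db, n=20000, seed=1):
    """Monte-Carlo outage probability under Rayleigh fading (a simple
    stand-in for the paper's composite channels): the instantaneous SNR is
    exponentially distributed with the given mean, and an outage occurs
    whenever it falls below the threshold.
    Closed form for comparison: OP = 1 - exp(-thr / mean)."""
    rng = random.Random(seed)
    thr = 10.0 ** (threshold_db / 10.0)
    mean = 10.0 ** (mean_snr_db / 10.0)
    hits = sum(1 for _ in range(n) if rng.expovariate(1.0 / mean) < thr)
    return hits / n
```

当门限与平均 SNR 相等(均为 0 dB)时,OP 应接近 1 - e^{-1} ≈ 0.632;平均 SNR 提高后 OP 下降。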

RIS-Assisted Physical Layer Authentication for 6G Endogenous Security

  • paper_url: http://arxiv.org/abs/2309.07736
  • repo_url: None
  • paper_authors: Ning Gao, Cen Li, Shengguo Meng, Wankai Tang, Shuchen Meng, Shi Jin, Michail Matthaiou
  • for: 物理层认证(PLA)是有望增强未来海量设备接入安全性的技术之一。
  • methods: 提出一种可重构智能表面(RIS)辅助的 PLA 系统:在 PLA 过程中,合法发射机可以通过控制 RIS 的通断状态来定制信道指纹。
  • results: 通过分析基于接收信号强度(RSS)的欺骗检测方法,验证了所提架构的可行性。实验结果显示,当发射源位于不同位置和相同位置时,分别有 3.5% 和 76% 的性能提升。
    Abstract The physical layer authentication (PLA) is a promising technology which can enhance the access security of a massive number of devices in the near future. In this paper, we propose a reconfigurable intelligent surface (RIS)-assisted PLA system, in which the legitimate transmitter can customize the channel fingerprints during PLA by controlling the ON-OFF state of the RIS. Without loss of generality, we use the received signal strength (RSS) based spoofing detection approach to analyze the feasibility of the proposed architecture. Specifically, based on the RSS, we derive the statistical properties of PLA and give some interesting insights, which showcase that the RIS-assisted PLA is theoretically feasible. Then, we derive the optimal detection threshold to maximize the performance in the context of the presented performance metrics. Next, the actual feasibility of the proposed system is verified via proof-of-concept experiments on a RIS-assisted PLA prototype platform. The experiment results show that there are 3.5% and 76% performance improvements when the transmission sources are at different locations and at the same location, respectively.
    摘要 物理层认证(PLA)是一种有前景的技术,可以增强未来海量设备的接入安全性。在本文中,我们提出了一种可重构智能表面(RIS)辅助的 PLA 系统,其中合法发射机可以通过控制 RIS 的通断状态,在认证过程中定制信道指纹。不失一般性,我们采用基于接收信号强度(RSS)的欺骗检测方法来分析所提架构的可行性。具体而言,基于 RSS,我们推导了 PLA 的统计特性并给出了一些有趣的洞察,表明 RIS 辅助的 PLA 在理论上是可行的。然后,我们推导了在所给性能指标下使性能最大化的最优检测门限。接着,我们在一个 RIS 辅助 PLA 原型平台上通过概念验证实验检验了所提系统的实际可行性。实验结果显示,当发射源位于不同位置和相同位置时,分别有 3.5% 和 76% 的性能提升。
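基于 RSS 的欺骗检测的基本形式,是把新到达信号的 RSS 与合法信道指纹的统计量比较,偏离超过门限即判为欺骗。下面是一个与论文的具体检测器无关的极简示意(门限参数 k 扮演论文中最优检测门限的角色,函数名均为示例假设)。

```python
import math

def rss_spoof_detect(rss_history, rss_new, k=3.0):
    """RSS-based spoofing test (sketch): flag a new transmission whose
    received signal strength deviates from the legitimate fingerprint's mean
    by more than k standard deviations."""
    n = len(rss_history)
    mu = sum(rss_history) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in rss_history) / n)
    return abs(rss_new - mu) > k * sd
```

指纹围绕 -50 dBm 抖动时,-45 dBm 的新信号被判为欺骗,而 -50.05 dBm 则通过认证。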

Semantic reconstruction of continuous language from MEG signals

  • paper_url: http://arxiv.org/abs/2309.07701
  • repo_url: None
  • paper_authors: Bo Wang, Xiran Xu, Longxiang Zhang, Boda Xiao, Xihong Wu, Jing Chen
  • for: 这项研究的目的是利用脑磁图(MEG)信号解码语言语义。
  • methods: 该研究采用数据驱动的方法:首先使用对比学习训练一个多受试者解码模型,从 MEG 数据中重建连续词嵌入;然后使用束搜索算法基于重建的词嵌入生成文本。在束搜索中,利用语言模型预测后续单词,并以后续单词词嵌入与重建词嵌入的相关性作为下一个词概率的度量。
  • results: 研究结果表明,所提出的连续词嵌入模型可以有效地利用受试者专有信息和受试者间共享信息。此外,解码得到的文本与目标文本具有显著的相似性,BERTScore 平均值为 0.816,与之前的 fMRI 研究相当。
    Abstract Decoding language from neural signals holds considerable theoretical and practical importance. Previous research has indicated the feasibility of decoding text or speech from invasive neural signals. However, when using non-invasive neural signals, significant challenges are encountered due to their low quality. In this study, we proposed a data-driven approach for decoding semantic of language from Magnetoencephalography (MEG) signals recorded while subjects were listening to continuous speech. First, a multi-subject decoding model was trained using contrastive learning to reconstruct continuous word embeddings from MEG data. Subsequently, a beam search algorithm was adopted to generate text sequences based on the reconstructed word embeddings. Given a candidate sentence in the beam, a language model was used to predict the subsequent words. The word embeddings of the subsequent words were correlated with the reconstructed word embedding. These correlations were then used as a measure of the probability for the next word. The results showed that the proposed continuous word embedding model can effectively leverage both subject-specific and subject-shared information. Additionally, the decoded text exhibited significant similarity to the target text, with an average BERTScore of 0.816, a score comparable to that in the previous fMRI study.
    摘要 从神经信号中解码语言具有重要的理论和实践意义。先前的研究表明,从侵入式神经信号中解码文本或语音是可行的;然而,使用非侵入式神经信号时,由于其质量较低,会遇到显著的挑战。在本研究中,我们提出了一种数据驱动的方法,从受试者聆听连续语音时记录的脑磁图(MEG)信号中解码语言语义。首先,利用对比学习训练一个多受试者解码模型,从 MEG 数据中重建连续词嵌入;随后,采用束搜索算法基于重建的词嵌入生成文本序列。对于束中的候选句子,使用语言模型预测后续单词,并计算后续单词的词嵌入与重建词嵌入之间的相关性,以此作为下一个词概率的度量。结果表明,所提出的连续词嵌入模型可以有效地利用受试者专有信息和受试者间共享信息。此外,解码得到的文本与目标文本具有显著的相似性,BERTScore 平均值为 0.816,与之前 fMRI 研究中的得分相当。
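「用候选词嵌入与重建嵌入的相关性给束搜索打分」这一步可以用几行代码示意:每个假设用语言模型给出的候选词扩展,按嵌入相似度(此处以余弦相似度代替论文的相关性度量)累计得分并保留最优的若干条。词表、嵌入与函数名均为示例假设。

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    num = sum(a * b for a, b in zip(u, v))
    return num / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def beam_step(beams, candidates, embeddings, target_emb, width=2):
    """One step of the decoding loop (sketch): extend every hypothesis in the
    beam with every candidate next word proposed by a language model, score
    each extension by how well the word's embedding matches the embedding
    reconstructed from MEG, and keep the `width` best extensions."""
    scored = [(score + cosine(embeddings[w], target_emb), hyp + [w])
              for score, hyp in beams
              for w in candidates]
    scored.sort(key=lambda t: -t[0])
    return scored[:width]
```

若重建嵌入与 "cat" 的嵌入一致,则宽度为 1 的束只保留以 "cat" 扩展的假设。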

On the Relationship Between Iterated Statistical Linearization and Quasi-Newton Methods

  • paper_url: http://arxiv.org/abs/2309.07636
  • repo_url: None
  • paper_authors: Anton Kullberg, Martin A. Skoglund, Isaac Skog, Gustaf Hendeby
  • for: 本研究考察了基于统计线性化的迭代滤波算法(如迭代无迹卡尔曼滤波器 IUKF 和迭代后验线性化滤波器 IPLF)与基于拟牛顿(QN)方法的滤波算法(如 QN 迭代扩展卡尔曼滤波器 QN-IEKF)之间的关系。
  • methods: 本研究表明,通过在 QN-IEKF 中找到一个 Hessian 修正,使 IPLF/IUKF 的迭代更新与 QN-IEKF 的迭代更新相同,从而可以将 IUKF 和 IPLF 视为 QN 算法。
  • results: 本研究还表明,IPLF/IUKF 的更新可以被重写为与 QN-IEKF 近似相同的形式,只差一个额外的修正项。这使我们对基于统计线性化的迭代滤波算法的性质有更深入的理解。
    Abstract This letter investigates relationships between iterated filtering algorithms based on statistical linearization, such as the iterated unscented Kalman filter (IUKF), and filtering algorithms based on quasi-Newton (QN) methods, such as the QN iterated extended Kalman filter (QN-IEKF). Firstly, it is shown that the IUKF and the iterated posterior linearization filter (IPLF) can be viewed as QN algorithms, by finding a Hessian correction in the QN-IEKF such that the IPLF iterate updates are identical to that of the QN-IEKF. Secondly, it is shown that the IPLF/IUKF update can be rewritten such that it is approximately identical to the QN-IEKF, albeit for an additional correction term. This enables a richer understanding of the properties of iterated filtering algorithms based on statistical linearization.
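The QN view above can be made concrete with a scalar toy example: the iterated EKF measurement update below relinearizes the measurement model at each iterate, which is algebraically the same as Gauss-Newton minimization of the MAP cost. This is a minimal illustrative sketch (scalar state, hypothetical function names), not the letter's IUKF/IPLF derivation:

```python
def iekf_update(x_prior, P, y, h, h_jac, R, iters=20):
    """Iterated EKF measurement update for a scalar state.

    Relinearizes h() at the current iterate; algebraically identical to
    Gauss-Newton minimization of the MAP cost
    J(x) = (y - h(x))**2 / R + (x - x_prior)**2 / P.
    """
    x = x_prior
    for _ in range(iters):
        H = h_jac(x)                    # Jacobian at the current iterate
        K = P * H / (H * P * H + R)     # Kalman gain
        # Standard IEKF iterate (reduces to the EKF update when iters == 1)
        x = x_prior + K * (y - h(x) - H * (x_prior - x))
    return x

# Toy nonlinear measurement y = x**2 + noise; prior N(1, 1), noise var 0.01
h = lambda x: x * x
h_jac = lambda x: 2.0 * x
x_map = iekf_update(x_prior=1.0, P=1.0, y=4.2, h=h, h_jac=h_jac, R=0.01)
```

At the fixed point, the MAP gradient (x - x_prior)/P - (y - h(x)) h'(x)/R vanishes, which is exactly the Gauss-Newton stationarity condition the QN interpretation builds on.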

Unified Linearization-based Nonlinear Filtering

  • paper_url: http://arxiv.org/abs/2309.07631
  • repo_url: None
  • paper_authors: Anton Kullberg, Isaac Skog, Gustaf Hendeby
  • for: This paper considers three classes of recursive state estimation filters: standard filters (e.g., the extended Kalman filter), iterated filters (e.g., the iterated unscented Kalman filter), and dynamically iterated filters (e.g., the dynamically iterated posterior linearization filter).
  • methods: The three classes are unified under a general algorithm that highlights the strong similarities between specific filtering algorithms, facilitating an in-depth understanding of the pros and cons of the different filter classes and algorithms.
  • results: A numerical example on a nonlinear localization problem shows clear estimation-accuracy differences between the three classes: standard filters trail the iterated and dynamically iterated filters in accuracy but retain an advantage in computational efficiency.
    Abstract This letter shows that the following three classes of recursive state estimation filters: standard filters, such as the extended Kalman filter; iterated filters, such as the iterated unscented Kalman filter; and dynamically iterated filters, such as the dynamically iterated posterior linearization filters; can be unified in terms of a general algorithm. The general algorithm highlights the strong similarities between specific filtering algorithms in the three filter classes and facilitates an in-depth understanding of the pros and cons of the different filter classes and algorithms. We end with a numerical example showing the estimation accuracy differences between the three classes of filters when applied to a nonlinear localization problem.
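As a rough sketch of what such a unifying view can look like in code, the scalar measurement update below collapses two of the three classes into one routine via an iteration count: `num_relin=0` reproduces a standard single-linearization update, while `num_relin>0` gives an iterated update. All names and the scalar setting are assumptions for illustration, not the paper's general algorithm:

```python
def measurement_update(x_prior, P, y, h, h_jac, R, num_relin=0):
    """One measurement update covering two of the three filter classes:
    num_relin == 0 -> standard filter (single linearization at the prior);
    num_relin  > 0 -> iterated filter (relinearize at each new iterate).
    Dynamically iterated filters additionally re-run the prediction step
    between iterations, which this scalar sketch omits.
    """
    x = x_prior
    for _ in range(num_relin + 1):
        H = h_jac(x)
        K = P * H / (H * P * H + R)
        x = x_prior + K * (y - h(x) - H * (x_prior - x))
    return x

h = lambda x: x * x          # toy nonlinear measurement model
h_jac = lambda x: 2.0 * x
standard = measurement_update(1.0, 1.0, 4.2, h, h_jac, 0.01, num_relin=0)
iterated = measurement_update(1.0, 1.0, 4.2, h, h_jac, 0.01, num_relin=30)
```

On this toy problem the standard update stops at the first linearization point while the iterated update keeps refining it, mirroring the accuracy gap the paper's localization example reports.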

Exact solution of the full RMSA problem in elastic optical networks

  • paper_url: http://arxiv.org/abs/2309.07621
  • repo_url: None
  • paper_authors: Fabio David, José F. de Rezende, Valmir C. Barbosa
  • for: solves the Routing, Modulation, and Spectrum Allocation (RMSA) problem in Elastic Optical Networks (EONs) to maximize the number of admitted demands while minimizing the number of regenerators and frequency slots used.
  • methods: uses a complex ILP (Integer Linear Programming) formulation that takes into account frequency-slot continuity and contiguity.
  • results: the formulation is applied to the NSFNET topology to demonstrate the practicality and importance of obtaining exact solutions.
    Abstract Exact solutions of the Routing, Modulation, and Spectrum Allocation (RMSA) problem in Elastic Optical Networks (EONs), so that the number of admitted demands is maximized while those of regenerators and frequency slots used are minimized, require a complex ILP formulation taking into account frequency-slot continuity and contiguity. We introduce the first such formulation, ending a hiatus of some years since the last ILP formulation for a much simpler RMSA variation was introduced. By exploiting a number of problem and solver specificities, we use the NSFNET topology to illustrate the practicality and importance of obtaining exact solutions.
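The frequency-slot continuity and contiguity constraints mentioned above are easy to state procedurally. The sketch below is a simple first-fit placement check over a path, an illustrative heuristic only; the paper's contribution is an exact ILP formulation, which this does not reproduce:

```python
def first_fit_slots(path_links, occupancy, k, num_slots):
    """Lowest starting slot s such that slots [s, s+k) are free on every
    link of the path: requiring the same k contiguous slots on all links
    enforces both spectrum continuity and contiguity. None if blocked."""
    for s in range(num_slots - k + 1):
        if all(not occupancy[link][s + j]
               for link in path_links for j in range(k)):
            return s
    return None

# Two-link path; True marks an occupied slot on that link.
occ = {"AB": [True, False, False, True, False, False, False, False],
       "BC": [False, False, False, False, True, False, False, False]}
```

Here a demand for 2 slots fits starting at slot 1, while a demand for 4 contiguous slots is blocked on this path; an exact formulation would additionally optimize which demands to admit and how to route them.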

Fluid Antenna-Assisted Dirty Multiple Access Channels over Composite Fading

  • paper_url: http://arxiv.org/abs/2309.07604
  • repo_url: None
  • paper_authors: Farshad Rostami Ghadi, Kai-Kit Wong, F. Javier Lopez-Martinez, Chan-Byoung Chae, Kin-Fai Tong, Yangyang Zhang
  • for: This letter studies the application of the emerging fluid antenna (FA) technology in multiuser communication systems when side information (SI) is available at the transmitters.
  • methods: Jakes' model is connected to copula theory through Spearman's $\rho$ rank correlation coefficient to accurately describe the spatial correlation between the FA channels, yielding a closed-form expression for the outage probability (OP) under Fisher-Snedecor F fading.
  • results: Numerical results show that considering FA improves the performance of multiuser communication systems in terms of OP, and that a single FA at the common receiver can support a large number of users within a few wavelengths of space.
    Abstract This letter investigates the application of the emerging fluid antenna (FA) technology in multiuser communication systems when side information (SI) is available at the transmitters. In particular, we consider a K-user dirty multiple access channel (DMAC) with non-causally known SI at the transmitters, where K users send independent messages to a common receiver with a FA capable of changing its location depending on the channel condition. By connecting Jakes' model to copula theory through Spearman's $\rho$ rank correlation coefficient, we accurately describe the spatial correlation between the FA channels, and derive a closed-form expression for the outage probability (OP) under Fisher-Snedecor F fading. Numerical results illustrate how considering FA can improve the performance of multiuser communication systems in terms of the OP and also support a large number of users using only one FA at the common receiver in a few wavelengths of space.
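To illustrate what an outage-probability computation looks like in the simplest possible setting, here is a Monte Carlo estimate for a single Rayleigh-faded link checked against its closed form. This is a deliberately simplified stand-in (no side information, no Fisher-Snedecor F fading, no copula-coupled FA ports); all function names are hypothetical:

```python
import math
import random

def outage_exact(snr, rate):
    """Closed-form OP of one Rayleigh link:
    P(log2(1 + snr * g) < rate) with power gain g ~ Exp(1)."""
    return 1.0 - math.exp(-(2.0 ** rate - 1.0) / snr)

def outage_mc(snr, rate, n=100_000, seed=0):
    """Monte Carlo estimate of the same outage probability."""
    rng = random.Random(seed)
    thr = 2.0 ** rate - 1.0
    return sum(rng.expovariate(1.0) * snr < thr for _ in range(n)) / n
```

The same estimate-then-verify pattern underlies the letter's numerical results, only with a far richer fading and dependence model.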

On Performance of Fluid Antenna System using Maximum Ratio Combining

  • paper_url: http://arxiv.org/abs/2309.07582
  • repo_url: None
  • paper_authors: Xiazhi Lai, Tuo Wu, Junteng Yao, Cunhua Pan, Maged Elkashlan, Kai-Kit Wong
  • for: This work investigates a fluid antenna system (FAS) in which multiple ports can be activated simultaneously for better receiver performance.
  • methods: The best $K$ of $M$ ports are selected and their received signals are combined using maximum ratio combining (MRC); the outage probability in Rayleigh fading is analyzed via Gauss-Chebyshev integration, a lower bound, and high signal-to-noise ratio (SNR) asymptotics.
  • results: The results show that FAS can harness rich spatial diversity, which is confirmed by computer simulations.
    Abstract This letter investigates a fluid antenna system (FAS) where multiple ports can be activated for signal combining for enhanced receiver performance. Given $M$ ports at the FAS, the best $K$ ports out of the $M$ available ports are selected before maximum ratio combining (MRC) is used to combine the received signals from the selected ports. The aim of this letter is to study the achievable performance of FAS when more than one ports can be activated. We do so by analyzing the outage probability of this setup in Rayleigh fading channels through the utilization of Gauss-Chebyshev integration, lower bound estimation, and high signal-to-noise ratio (SNR) asymptotic approximations. Our analytical results demonstrate that FAS can harness rich spatial diversity, which is confirmed by computer simulations.
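The best-K-of-M port selection followed by MRC can be sketched with a short Monte Carlo simulation. The sketch assumes i.i.d. Rayleigh ports for simplicity, unlike the letter's analytical treatment, and the function name is hypothetical:

```python
import random

def fas_mrc_outage(num_ports, k_active, snr, rate, n=50_000, seed=1):
    """Monte Carlo OP when the k strongest of num_ports i.i.d. Rayleigh
    ports are combined with MRC (post-combining SNR is snr times the sum
    of the selected power gains)."""
    rng = random.Random(seed)
    thr = 2.0 ** rate - 1.0
    outages = 0
    for _ in range(n):
        gains = sorted((rng.expovariate(1.0) for _ in range(num_ports)),
                       reverse=True)
        if snr * sum(gains[:k_active]) < thr:
            outages += 1
    return outages / n
```

Selecting and combining more ports reduces the outage probability, consistent with the spatial diversity gain the letter analyzes.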

A Gaussian Copula Approach to the Performance Analysis of Fluid Antenna Systems

  • paper_url: http://arxiv.org/abs/2309.07506
  • repo_url: None
  • paper_authors: Farshad Rostami Ghadi, Kai-Kit Wong, F. Javier Lopez-Martinez, Chan-Byoung Chae, Kin-Fai Tong, Yangyang Zhang
  • for: This paper studies the performance of a single-user fluid antenna system (FAS), exploiting a class of elliptical copulas to describe the structure of dependency among the fluid antenna ports.
  • methods: Jakes' model is expressed in terms of the Gaussian copula for two cases, the general case of arbitrary correlated fading and the specific case of correlated Nakagami-m fading, and analytical expressions for the distribution of the equivalent channel are derived.
  • results: Increasing the fluid antenna size lowers the OP and DOR, but performance saturates as the number of antenna ports grows; moreover, FAS outperforms conventional single-fixed-antenna systems even when the fluid antenna is small.
    Abstract This paper investigates the performance of a single-user fluid antenna system (FAS), by exploiting a class of elliptical copulas to describe the structure of dependency amongst the fluid antenna ports. By expressing Jakes' model in terms of the Gaussian copula, we consider two cases: (i) the general case, i.e., any arbitrary correlated fading distribution; and (ii) the specific case, i.e., correlated Nakagami-m fading. For both scenarios, we first derive analytical expressions for the cumulative distribution function (CDF) and probability density function (PDF) of the equivalent channel in terms of multivariate normal distribution. Then, we obtain the outage probability (OP) and the delay outage rate (DOR) to analyze the performance of the FAS. By employing the popular rank correlation coefficients such as Spearman's $\rho$ and Kendall's $\tau$, we measure the degree of dependency in correlated arbitrary fading channels and illustrate how the Gaussian copula can be accurately connected to Jakes' model in FAS without complicated mathematical analysis. Numerical results show that increasing the fluid antenna size provides lower OP and DOR, but the system performance saturates as the number of antenna ports increases. In addition, our results indicate that FAS provides better performance compared to conventional single-fixed antenna systems even when the size of fluid antenna is small.
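The Gaussian-copula construction (correlated Gaussians mapped to uniform marginals via the normal CDF, then to the desired fading marginals via inverse CDFs) can be sketched directly. The snippet below draws pairs of Exp(1) channel power gains, i.e., Rayleigh amplitudes, coupled by a bivariate Gaussian copula; the two-port Rayleigh setting is an illustrative assumption, not the paper's Nakagami-m analysis:

```python
import math
import random

def normal_cdf(z):
    """Standard normal CDF via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def correlated_rayleigh_gains(rho, n=20_000, seed=2):
    """Pairs of Exp(1) channel power gains whose dependence follows a
    bivariate Gaussian copula with parameter rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Copula step: push correlated Gaussians to uniform marginals.
        u1 = min(normal_cdf(z1), 1.0 - 1e-12)
        u2 = min(normal_cdf(z2), 1.0 - 1e-12)
        # Inverse-CDF step: impose the desired Exp(1) marginals.
        pairs.append((-math.log(1.0 - u1), -math.log(1.0 - u2)))
    return pairs
```

Because the copula only shapes the dependence, the marginals stay exactly exponential while the correlation between ports tracks rho, which is the separation of concerns the paper exploits.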

A Tutorial on Environment-Aware Communications via Channel Knowledge Map for 6G

  • paper_url: http://arxiv.org/abs/2309.07460
  • repo_url: None
  • paper_authors: Yong Zeng, Junting Chen, Jie Xu, Di Wu, Xiaoli Xu, Shi Jin, Xiqi Gao, David Gesbert, Shuguang Cui, Rui Zhang
  • for: This paper aims to provide a comprehensive tutorial overview on environment-aware communications enabled by channel knowledge map (CKM) for 6G mobile communication networks.
  • methods: The paper discusses the basic concept of CKM, compares it with various existing channel inference techniques, and presents the main techniques for CKM construction, including both model-free and model-assisted approaches.
  • results: The paper provides a general framework for the utilization of CKM to achieve environment-aware communications and discusses typical CKM-aided communication scenarios. It also highlights important open problems in CKM research and discusses potential solutions to inspire future work.
    Abstract Sixth-generation (6G) mobile communication networks are expected to have dense infrastructures, large-dimensional channels, cost-effective hardware, diversified positioning methods, and enhanced intelligence. Such trends bring both new challenges and opportunities for the practical design of 6G. On one hand, acquiring channel state information (CSI) in real time for all wireless links becomes quite challenging in 6G. On the other hand, there would be numerous data sources in 6G containing high-quality location-tagged channel data, making it possible to better learn the local wireless environment. By exploiting such new opportunities and for tackling the CSI acquisition challenge, there is a promising paradigm shift from the conventional environment-unaware communications to the new environment-aware communications based on the novel approach of channel knowledge map (CKM). This article aims to provide a comprehensive tutorial overview on environment-aware communications enabled by CKM to fully harness its benefits for 6G. First, the basic concept of CKM is presented, and a comparison of CKM with various existing channel inference techniques is discussed. Next, the main techniques for CKM construction are discussed, including both the model-free and model-assisted approaches. Furthermore, a general framework is presented for the utilization of CKM to achieve environment-aware communications, followed by some typical CKM-aided communication scenarios. Finally, important open problems in CKM research are highlighted and potential solutions are discussed to inspire future work.
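At its simplest, a channel knowledge map is a database of location-tagged channel data queried by position. The toy class below does nearest-neighbor lookup over stored (location, gain) pairs; it is an illustrative stand-in for the model-free and model-assisted construction methods the tutorial surveys, with hypothetical names throughout:

```python
import math

class ChannelKnowledgeMap:
    """Toy CKM: location-tagged channel gains with nearest-neighbor queries."""

    def __init__(self):
        self._entries = []          # list of ((x, y), gain_dB)

    def add(self, location, gain_db):
        """Record a measured (or ray-traced) channel gain at a location."""
        self._entries.append((location, gain_db))

    def query(self, location):
        """Channel gain at the stored location closest to the query point."""
        nearest = min(self._entries, key=lambda e: math.dist(e[0], location))
        return nearest[1]
```

A real CKM would interpolate or learn over the tagged data rather than copy the nearest sample, but the lookup pattern, predicting CSI from position instead of estimating it on every link, is the core idea.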

Interpretable and Efficient Beamforming-Based Deep Learning for Single Snapshot DOA Estimation

  • paper_url: http://arxiv.org/abs/2309.07411
  • repo_url: None
  • paper_authors: Ruxin Zheng, Shunqiao Sun, Hongshan Liu, Honglei Chen, Jian Li
  • for: This paper proposes an interpretable deep learning approach for direction-of-arrival (DOA) estimation from a single snapshot.
  • methods: A deep-MPDR network translates a minimum power distortionless response (MPDR)-type beamformer into deep learning, enhancing generalization and efficiency.
  • results: Experiments on both simulated and real-world datasets show advantages in accuracy and inference time over conventional methods, and better efficiency, generalizability, and interpretability than other deep learning DOA estimation networks.
    Abstract We introduce an interpretable deep learning approach for direction of arrival (DOA) estimation with a single snapshot. Classical subspace-based methods like MUSIC and ESPRIT use spatial smoothing on uniform linear arrays for single snapshot DOA estimation but face drawbacks in reduced array aperture and inapplicability to sparse arrays. Single-snapshot methods such as compressive sensing and iterative adaptation approach (IAA) encounter challenges with high computational costs and slow convergence, hampering real-time use. Recent deep learning DOA methods offer promising accuracy and speed. However, the practical deployment of deep networks is hindered by their black-box nature. To address this, we propose a deep-MPDR network translating minimum power distortionless response (MPDR)-type beamformer into deep learning, enhancing generalization and efficiency. Comprehensive experiments conducted using both simulated and real-world datasets substantiate its dominance in terms of inference time and accuracy in comparison to conventional methods. Moreover, it excels in terms of efficiency, generalizability, and interpretability when contrasted with other deep learning DOA estimation networks.
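For orientation, a classical single-snapshot baseline is a beamforming power scan over a grid of candidate angles. The sketch below implements a plain delay-and-sum (Bartlett) scan for a half-wavelength uniform linear array; it is not the paper's MPDR-type network, just a minimal reference point with assumed names:

```python
import cmath
import math

def steering(theta, num_sensors):
    """Steering vector of a half-wavelength uniform linear array."""
    return [cmath.exp(1j * math.pi * m * math.sin(theta))
            for m in range(num_sensors)]

def doa_scan(snapshot, num_sensors, grid_deg=range(-90, 91)):
    """Single-snapshot delay-and-sum scan: return the grid angle
    (degrees) that maximizes |a(theta)^H x|^2."""
    def power(deg):
        a = steering(math.radians(deg), num_sensors)
        return abs(sum(ai.conjugate() * xi
                       for ai, xi in zip(a, snapshot))) ** 2
    return max(grid_deg, key=power)

# Noise-free snapshot from a source at 20 degrees, 8-element array.
x = steering(math.radians(20.0), 8)
```

An MPDR-type beamformer would replace the plain correlation with a covariance-whitened one; the appeal of the paper's approach is keeping this interpretable beamforming structure while learning the parts that are hard to hand-design.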

cs.SD - 2023-09-13

Enhancing Child Vocalization Classification in Multi-Channel Child-Adult Conversations Through Wav2vec2 Children ASR Features

  • paper_url: http://arxiv.org/abs/2309.07287
  • repo_url: None
  • paper_authors: Jialu Li, Mark Hasegawa-Johnson, Karrie Karahalios
  • For: The paper aims to develop a machine learning model that can label adult and child audio in recordings of clinician-child interactions, with the goal of assisting clinicians in capturing events of interest, communicating with parents more effectively, and educating new clinicians.
  • Methods: The authors use the self-supervised learning model Wav2Vec 2.0 (W2V2), pretrained on 4300 hours of home recordings of children under 5 years old, to build a unified system for speaker diarization (SD) and vocalization classification (VC). They apply it to two-channel audio recordings of brief clinician-child interactions from the Rapid-ABC corpus, and introduce auxiliary features extracted from a W2V2-based automatic speech recognition (ASR) system for children under 4 years old to improve the VC task.
  • Results: The authors observe consistent improvements on the VC task on two corpora (Rapid-ABC and BabbleCor), and reach or outperform the state-of-the-art performance on BabbleCor.
    Abstract Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder that often emerges in early childhood. ASD assessment typically involves an observation protocol including note-taking and ratings of child's social behavior conducted by a trained clinician. A robust machine learning (ML) model that is capable of labeling adult and child audio has the potential to save significant time and labor in manual coding children's behaviors. This may assist clinicians capture events of interest, better communicate events with parents, and educate new clinicians. In this study, we leverage the self-supervised learning model, Wav2Vec 2.0 (W2V2), pretrained on 4300h of home recordings of children under 5 years old, to build a unified system that performs both speaker diarization (SD) and vocalization classification (VC) tasks. We apply this system to two-channel audio recordings of brief 3-5 minute clinician-child interactions using the Rapid-ABC corpus. We propose a novel technique by introducing auxiliary features extracted from W2V2-based automatic speech recognition (ASR) system for children under 4 years old to improve children's VC task. We test our proposed method of improving children's VC task on two corpora (Rapid-ABC and BabbleCor) and observe consistent improvements. Furthermore, we reach, or perhaps outperform, the state-of-the-art performance of BabbleCor.

A Flexible Online Framework for Projection-Based STFT Phase Retrieval

  • paper_url: http://arxiv.org/abs/2309.07043
  • repo_url: None
  • paper_authors: Tal Peer, Simon Welker, Johannes Kolhoff, Timo Gerkmann
  • for: Improving the performance of iterative STFT phase retrieval while enabling online (frame-by-frame) operation.
  • methods: RTISI, an existing online variant of the Griffin-Lim algorithm, is extended into a flexible framework that allows straightforward online implementation of any algorithm based on iterative projections; online variants of the fast Griffin-Lim algorithm, the accelerated Griffin-Lim algorithm, and two algorithms from the optics domain are implemented, at a computational complexity per iteration similar to Griffin-Lim.
  • results: Evaluation on speech signals shows that, as in the offline case, these algorithms achieve a considerable performance gain in reconstruction quality over RTISI.
    Abstract Several recent contributions in the field of iterative STFT phase retrieval have demonstrated that the performance of the classical Griffin-Lim method can be considerably improved upon. By using the same projection operators as Griffin-Lim, but combining them in innovative ways, these approaches achieve better results in terms of both reconstruction quality and required number of iterations, while retaining a similar computational complexity per iteration. However, like Griffin-Lim, these algorithms operate in an offline manner and thus require an entire spectrogram as input, which is an unrealistic requirement for many real-world speech communication applications. We propose to extend RTISI -- an existing online (frame-by-frame) variant of the Griffin-Lim algorithm -- into a flexible framework that enables straightforward online implementation of any algorithm based on iterative projections. We further employ this framework to implement online variants of the fast Griffin-Lim algorithm, the accelerated Griffin-Lim algorithm, and two algorithms from the optics domain. Evaluation results on speech signals show that, similarly to the offline case, these algorithms can achieve a considerable performance gain compared to RTISI.
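The projection structure underlying Griffin-Lim-type methods (alternate between imposing the target magnitudes and projecting onto the set of "consistent" spectra) can be illustrated without any actual STFT. In the toy below, a one-dimensional complex subspace stands in for the consistency set; everything here is an illustrative assumption, not the proposed online framework:

```python
def project_magnitude(z, target_mag):
    """Keep each entry's phase, impose the target magnitude
    (the spectrogram-magnitude projection, in miniature)."""
    return [m * (zi / abs(zi)) if abs(zi) > 0.0 else complex(m)
            for zi, m in zip(z, target_mag)]

def project_subspace(z, v):
    """Orthogonal projection onto span{v}, standing in for the
    projection onto consistent spectra."""
    c = (sum(vi.conjugate() * zi for vi, zi in zip(v, z))
         / sum(abs(vi) ** 2 for vi in v))
    return [c * vi for vi in v]

def alternating_projections(target_mag, v, iters=50):
    """Griffin-Lim-style loop: zero-phase start, then alternate
    the two projections."""
    z = [complex(m) for m in target_mag]
    for _ in range(iters):
        z = project_magnitude(project_subspace(z, v), target_mag)
    return z

# Both constraint sets intersect here, so the loop recovers the phases.
z = alternating_projections([1.0, 1.0], [1.0 + 0.0j, 1.0j])
```

The algorithms the paper makes online differ in how these two projections are combined and accelerated, not in the projections themselves.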

Diffusion models for audio semantic communication

  • paper_url: http://arxiv.org/abs/2309.07195
  • repo_url: None
  • paper_authors: Eleonora Grassucci, Christian Marinoni, Andrea Rodriguez, Danilo Comminiello
  • for: This work improves the robustness of audio transmission by conveying the semantics of the audio rather than the exact bitstream, and regenerating semantically consistent audio at the receiver.
  • methods: A generative audio semantic communication framework treats the communication problem as an inverse problem: lower-dimensional representations of the audio signal and its semantics are transmitted, and a conditional diffusion model at the receiver restores the signal from multiple degradations at once, including corruption noise and missing parts caused by the noisy channel.
  • results: Experiments show the framework outperforms competitors in a real-world scenario and under different channel conditions. Samples and code: https://ispamm.github.io/diffusion-audio-semantic-communication/.
    Abstract Directly sending audio signals from a transmitter to a receiver across a noisy channel may absorb consistent bandwidth and be prone to errors when trying to recover the transmitted bits. On the contrary, the recent semantic communication approach proposes to send the semantics and then regenerate semantically consistent content at the receiver without exactly recovering the bitstream. In this paper, we propose a generative audio semantic communication framework that faces the communication problem as an inverse problem, therefore being robust to different corruptions. Our method transmits lower-dimensional representations of the audio signal and of the associated semantics to the receiver, which generates the corresponding signal with a particular focus on its meaning (i.e., the semantics) thanks to the conditional diffusion model at its core. During the generation process, the diffusion model restores the received information from multiple degradations at the same time including corruption noise and missing parts caused by the transmission over the noisy channel. We show that our framework outperforms competitors in a real-world scenario and with different channel conditions. Visit the project page to listen to samples and access the code: https://ispamm.github.io/diffusion-audio-semantic-communication/.

Reorganization of the auditory-perceptual space across the human vocal range

  • paper_url: http://arxiv.org/abs/2309.06946
  • repo_url: None
  • paper_authors: Daniel Friedrichs, Volker Dellwo
  • For: This paper investigates the auditory-perceptual space of vowels in the human vocal range, specifically focusing on the role of spectral shape in vowel perception.
  • Methods: The study uses multidimensional scaling analysis of cochlea-scaled spectra from 250-ms vowel segments, with a dataset of 240 vowels produced by three native German female speakers.
  • Results: The study finds systematic spectral shifts associated with vowel height and frontness as fundamental frequency increases, with a notable clustering around /i a u/ above 523 Hz. These findings highlight the importance of spectral shape in vowel perception and offer insights into the evolution of language.
    Abstract We analyzed the auditory-perceptual space across a substantial portion of the human vocal range (220-1046 Hz) using multidimensional scaling analysis of cochlea-scaled spectra from 250-ms vowel segments, initially studied in Friedrichs et al. (2017) J. Acoust. Soc. Am. 142 1025-1033. The dataset comprised the vowels /i y e {\o} {\epsilon} a o u/ (N=240) produced by three native German female speakers, encompassing a broad range of their respective voice frequency ranges. The initial study demonstrated that, during a closed-set identification task involving 21 listeners, the point vowels /i a u/ were significantly recognized at fundamental frequencies (fo) nearing 1 kHz, whereas the recognition of other vowels decreased at higher pitches. Building on these findings, our study revealed systematic spectral shifts associated with vowel height and frontness as fo increased, with a notable clustering around /i a u/ above 523 Hz. These observations underscore the pivotal role of spectral shape in vowel perception, illustrating the reliance on acoustic anchors at higher pitches. Furthermore, this study sheds light on the quantal nature of these vowels and their potential impact on language evolution, offering a plausible explanation for their widespread presence in the world's languages.

VRDMG: Vocal Restoration via Diffusion Posterior Sampling with Multiple Guidance

  • paper_url: http://arxiv.org/abs/2309.06934
  • repo_url: None
  • paper_authors: Carlos Hernandez-Olivan, Koichi Saito, Naoki Murata, Chieh-Hsin Lai, Marco A. Martínez-Ramirez, Wei-Hsiang Liao, Yuki Mitsufuji
  • for: This paper focuses on restoring degraded music signals to enhance audio quality for downstream music manipulation, across a range of restoration tasks.
  • methods: Building on diffusion posterior sampling (DPS), the authors identify issues that degrade current DPS-based methods and mitigate them with diffusion guidance techniques, including the RePaint (RP) strategy and Pseudoinverse-Guided Diffusion Models ($\Pi$GDM).
  • results: On vocal declipping and bandwidth extension, under various levels of distortion and cutoff frequency respectively, the proposed methods outperform current DPS-based music restoration benchmarks. Restored audio samples: http://carlosholivan.github.io/demos/audio-restoration-2023.html.
    Abstract Restoring degraded music signals is essential to enhance audio quality for downstream music manipulation. Recent diffusion-based music restoration methods have demonstrated impressive performance, and among them, diffusion posterior sampling (DPS) stands out given its intrinsic properties, making it versatile across various restoration tasks. In this paper, we identify that there are potential issues which will degrade current DPS-based methods' performance and introduce the way to mitigate the issues inspired by diverse diffusion guidance techniques including the RePaint (RP) strategy and the Pseudoinverse-Guided Diffusion Models ($\Pi$GDM). We demonstrate our methods for the vocal declipping and bandwidth extension tasks under various levels of distortion and cutoff frequency, respectively. In both tasks, our methods outperform the current DPS-based music restoration benchmarks. We refer to \url{http://carlosholivan.github.io/demos/audio-restoration-2023.html} for examples of the restored audio samples.

EMALG: An Enhanced Mandarin Lombard Grid Corpus with Meaningful Sentences

  • paper_url: http://arxiv.org/abs/2309.06858
  • repo_url: None
  • paper_authors: Baifeng Li, Qingmu Liu, Yuhong Yang, Hongyang Chen, Weiping Tu, Song Lin
  • for: This study investigates the Lombard effect, in which individuals adapt their speech in noisy environments.
  • methods: An enhanced Mandarin Lombard grid (EMALG) corpus with meaningful sentences is introduced, addressing the challenges the MALG corpus faced with nonsense sentences.
  • results: In Mandarin, female speakers exhibit a more pronounced Lombard effect than male speakers, particularly when uttering meaningful sentences; nonsense sentences negatively impact Lombard effect analysis; and the findings confirm the consistency of the Lombard effect between English and Mandarin reported in previous research.
    Abstract This study investigates the Lombard effect, where individuals adapt their speech in noisy environments. We introduce an enhanced Mandarin Lombard grid (EMALG) corpus with meaningful sentences, enhancing the Mandarin Lombard grid (MALG) corpus. EMALG features 34 speakers and improves recording setups, addressing challenges faced by MALG with nonsense sentences. Our findings reveal that in Mandarin, female speakers exhibit a more pronounced Lombard effect than male speakers, particularly when uttering meaningful sentences. Additionally, we uncover that nonsense sentences negatively impact Lombard effect analysis. Moreover, our results reaffirm the consistency in the Lombard effect comparison between English and Mandarin found in previous research.

DCTTS: Discrete Diffusion Model with Contrastive Learning for Text-to-speech Generation

  • paper_url: http://arxiv.org/abs/2309.06787
  • repo_url: None
  • paper_authors: Zhichao Wu, Qiulin Li, Sixing Liu, Qun Yang
  • for: This paper aims to improve the efficiency and practicality of diffusion models for the text-to-speech (TTS) task.
  • methods: A Discrete Diffusion Model with Contrastive Learning for Text-to-Speech Generation (DCTTS): a discrete-space diffusion model lowers computational cost and speeds up sampling, discrete-space contrastive learning strengthens the alignment between text and speech, and an efficient text encoder reduces model parameters.
  • results: Experiments show that the proposed method preserves synthesis quality while greatly reducing the diffusion model's resource consumption and improving sampling speed. Synthesized samples: https://github.com/lawtherWu/DCTTS.
    Abstract In the text-to-speech (TTS) task, the latent diffusion model has excellent fidelity and generalization, but its expensive resource consumption and slow inference speed have always been challenging. This paper proposes the Discrete Diffusion Model with Contrastive Learning for Text-to-Speech Generation (DCTTS). DCTTS makes the following contributions: 1) A TTS diffusion model based on discrete space significantly lowers the computational consumption of the diffusion model and improves sampling speed; 2) A contrastive learning method based on discrete space is used to enhance the alignment between speech and text and improve sampling quality; and 3) An efficient text encoder simplifies the model's parameters and increases computational efficiency. The experimental results demonstrate that the approach proposed in this paper has outstanding speech synthesis quality and sampling speed while significantly reducing the resource consumption of the diffusion model. Synthesized samples are available at https://github.com/lawtherWu/DCTTS.

Distinguishing Neural Speech Synthesis Models Through Fingerprints in Speech Waveforms

  • paper_url: http://arxiv.org/abs/2309.06780
  • repo_url: None
  • paper_authors: Chu Yuan Zhang, Jiangyan Yi, Jianhua Tao, Chenglong Wang, Xinrui Yan
  • for: This work investigates source attribution of synthesized speech, which has value in forensics and intellectual property protection.
  • methods: Using the multi-speaker LibriTTS dataset, the study examines model fingerprints left in generated speech waveforms by the acoustic model and the vocoder, and the influence of each component on the fingerprint in the overall waveform.
  • results: Both the vocoder and the acoustic model leave distinct, model-specific fingerprints in the waveform, but vocoder fingerprints are the more dominant and may mask those of the acoustic model; these findings suggest their potential utility in source identification.
    Abstract Recent strides in neural speech synthesis technologies, while enjoying widespread applications, have nonetheless introduced a series of challenges, spurring interest in the defence against the threat of misuse and abuse. Notably, source attribution of synthesized speech has value in forensics and intellectual property protection, but prior work in this area has certain limitations in scope. To address the gaps, we present our findings concerning the identification of the sources of synthesized speech in this paper. We investigate the existence of speech synthesis model fingerprints in the generated speech waveforms, with a focus on the acoustic model and the vocoder, and study the influence of each component on the fingerprint in the overall speech waveforms. Our research, conducted using the multi-speaker LibriTTS dataset, demonstrates two key insights: (1) vocoders and acoustic models impart distinct, model-specific fingerprints on the waveforms they generate, and (2) vocoder fingerprints are the more dominant of the two, and may mask the fingerprints from the acoustic model. These findings strongly suggest the existence of model-specific fingerprints for both the acoustic model and the vocoder, highlighting their potential utility in source identification applications.

PIAVE: A Pose-Invariant Audio-Visual Speaker Extraction Network

  • paper_url: http://arxiv.org/abs/2309.06723
  • repo_url: None
  • paper_authors: Qinghua Liu, Meng Ge, Zhizheng Wu, Haizhou Li
  • for: Studies how to take full advantage of the varying talking face to improve audio-visual speaker extraction.
  • methods: Proposes a Pose-Invariant Audio-Visual speaker Extraction network (PIAVE) that generates a pose-invariant view from each original pose orientation, giving the model a consistent frontal view of the talker regardless of head pose and thereby forming a multi-view visual input for the speaker.
  • results: Experiments on the multi-view MEAD and in-the-wild LRS3 datasets show PIAVE outperforms the state of the art and is more robust to pose variations.
    Abstract It is common in everyday spoken communication that we look at the turning head of a talker to listen to his/her voice. Humans see the talker to listen better, so do machines. However, previous studies on audio-visual speaker extraction have not effectively handled the varying talking face. This paper studies how to take full advantage of the varying talking face. We propose a Pose-Invariant Audio-Visual Speaker Extraction Network (PIAVE) that incorporates an additional pose-invariant view to improve audio-visual speaker extraction. Specifically, we generate the pose-invariant view from each original pose orientation, which enables the model to receive a consistent frontal view of the talker regardless of his/her head pose, therefore, forming a multi-view visual input for the speaker. Experiments on the multi-view MEAD and in-the-wild LRS3 dataset demonstrate that PIAVE outperforms the state-of-the-art and is more robust to pose variations.

Attention-based Encoder-Decoder End-to-End Neural Diarization with Embedding Enhancer

  • paper_url: http://arxiv.org/abs/2309.06672
  • repo_url: None
  • paper_authors: Zhengyang Chen, Bing Han, Shuai Wang, Yanmin Qian
  • for: Improve speaker diarization performance, in particular generalization to an unseen number of speakers.
  • methods: An attention-based encoder-decoder network trained with a teacher-forcing strategy to resolve the speaker-permutation problem, together with an iterative decoding method that outputs each speaker's diarization result in turn, and an Enhancer module for frame-level speaker embeddings.
  • results: New state-of-the-art diarization error rates on the CALLHOME (10.08%), DIHARD II (24.64%), and AMI (13.00%) benchmarks without oracle VAD; the system is also highly competitive as a speech type detection model.
    Abstract Deep neural network-based systems have significantly improved the performance of speaker diarization tasks. However, end-to-end neural diarization (EEND) systems often struggle to generalize to scenarios with an unseen number of speakers, while target speaker voice activity detection (TS-VAD) systems tend to be overly complex. In this paper, we propose a simple attention-based encoder-decoder network for end-to-end neural diarization (AED-EEND). In our training process, we introduce a teacher-forcing strategy to address the speaker permutation problem, leading to faster model convergence. For evaluation, we propose an iterative decoding method that outputs diarization results for each speaker sequentially. Additionally, we propose an Enhancer module to enhance the frame-level speaker embeddings, enabling the model to handle scenarios with an unseen number of speakers. We also explore replacing the transformer encoder with a Conformer architecture, which better models local information. Furthermore, we discovered that commonly used simulation datasets for speaker diarization have a much higher overlap ratio compared to real data. We found that using simulated training data that is more consistent with real data can achieve an improvement in consistency. Extensive experimental validation demonstrates the effectiveness of our proposed methodologies. Our best system achieved a new state-of-the-art diarization error rate (DER) performance on all the CALLHOME (10.08%), DIHARD II (24.64%), and AMI (13.00%) evaluation benchmarks, when no oracle voice activity detection (VAD) is used. Beyond speaker diarization, our AED-EEND system also shows remarkable competitiveness as a speech type detection model.
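
As a rough illustration of the iterative, speaker-by-speaker decoding idea, the toy sketch below greedily explains frames one speaker at a time; the attractor heuristic, threshold, and all names are invented for demonstration and are not the paper's attention-based decoder.

```python
import numpy as np

def iterative_diarization(frame_emb, max_speakers=4, act_thresh=0.5):
    """Toy iterative decoding: at each step, seed an 'attractor' from a
    still-unexplained frame, emit the activity mask of frames similar to
    it, and suppress the explained frames before decoding the next
    speaker. A crude stand-in for a learned decoder."""
    frame_emb = frame_emb / np.linalg.norm(frame_emb, axis=1, keepdims=True)
    residual = np.ones(len(frame_emb))  # weight of unexplained frames
    activities = []
    for _ in range(max_speakers):
        if residual.sum() < 1.0:
            break
        attractor = frame_emb[int(np.argmax(residual))]
        sim = frame_emb @ attractor          # cosine similarity per frame
        active = sim > act_thresh
        activities.append(active)
        residual = residual * (~active)      # mask explained frames
    return activities
```

Decoding one speaker at a time this way sidesteps the permutation problem at inference, since each output slot is defined by the order in which speakers are emitted.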

Differentiable Modelling of Percussive Audio with Transient and Spectral Synthesis

  • paper_url: http://arxiv.org/abs/2309.06649
  • repo_url: https://github.com/jorshi/drumblender
  • paper_authors: Jordie Shier, Franco Caspe, Andrew Robertson, Mark Sandler, Charalampos Saitis, Andrew McPherson
  • for: Proposes a differentiable digital signal processing (DDSP) technique for synthesizing percussive sounds in electronic instruments, explicitly modelling the transient portion of the signal.
  • methods: A percussive-synthesis model built on sinusoidal modelling synthesis, with a modified temporal convolutional network for transient generation, paired with differentiable noise and transient encoders trained jointly to reconstruct drumset sounds.
  • results: Reconstruction metrics computed over a large dataset of acoustic and electronic percussion samples show improved onset signal reconstruction for membranophone percussion instruments.
    Abstract Differentiable digital signal processing (DDSP) techniques, including methods for audio synthesis, have gained attention in recent years and lend themselves to interpretability in the parameter space. However, current differentiable synthesis methods have not explicitly sought to model the transient portion of signals, which is important for percussive sounds. In this work, we present a unified synthesis framework aiming to address transient generation and percussive synthesis within a DDSP framework. To this end, we propose a model for percussive synthesis that builds on sinusoidal modeling synthesis and incorporates a modulated temporal convolutional network for transient generation. We use a modified sinusoidal peak picking algorithm to generate time-varying non-harmonic sinusoids and pair it with differentiable noise and transient encoders that are jointly trained to reconstruct drumset sounds. We compute a set of reconstruction metrics using a large dataset of acoustic and electronic percussion samples that show that our method leads to improved onset signal reconstruction for membranophone percussion instruments.
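
A minimal sketch of the sinusoidal-plus-transient idea: a percussive hit as a sum of exponentially decaying, non-harmonic sinusoids plus a short noise burst. The frequencies, decay rates, and noise envelope below are illustrative placeholders, not parameters from the paper (which learns them with neural encoders).

```python
import numpy as np

def synth_percussive(freqs, amps, decays, noise_level=0.05,
                     dur=0.25, sr=16000, seed=0):
    """Sum of decaying non-harmonic sinusoids + a short noise transient.
    All parameters are hand-picked for illustration."""
    t = np.arange(int(dur * sr)) / sr
    y = np.zeros_like(t)
    for f, a, d in zip(freqs, amps, decays):
        y += a * np.exp(-d * t) * np.sin(2 * np.pi * f * t)
    rng = np.random.default_rng(seed)
    transient = noise_level * rng.standard_normal(t.shape) * np.exp(-80.0 * t)
    return y + transient

# A drum-like hit: inharmonic partials typical of a membranophone.
hit = synth_percussive(freqs=[180.0, 331.0, 517.0],
                       amps=[1.0, 0.6, 0.3],
                       decays=[18.0, 25.0, 40.0])
```

Because every stage is differentiable, the same structure can be driven by learned, time-varying parameters, which is the core of the DDSP approach.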

cs.CV - 2023-09-13

Automated Assessment of Critical View of Safety in Laparoscopic Cholecystectomy

  • paper_url: http://arxiv.org/abs/2309.07330
  • repo_url: None
  • paper_authors: Yunfan Li, Himanshu Gupta, Haibin Ling, IV Ramakrishnan, Prateek Prasanna, Georgios Georgakis, Aaron Sasson
  • for: Develop deep-learning techniques to automate assessment of the critical view of safety (CVS) in laparoscopic cholecystectomy.
  • methods: A two-stream semantic segmentation approach that first produces two segmentation maps, then estimates a region of interest from anatomical structures near the gallbladder, and finally determines whether each of the three CVS criteria is satisfied via rule-based assessment of the structural information.
  • results: A gain of over 11.8% mIoU on relevant classes versus a single-model baseline, 1.84% mIoU from the proposed Sobel loss over a Transformer-based baseline, up to 16% improvement on individual CVS criteria, and a 5% improvement in balanced accuracy for the overall CVS assessment compared to DeepCVS.
    Abstract Cholecystectomy (gallbladder removal) is one of the most common procedures in the US, with more than 1.2M procedures annually. Compared with classical open cholecystectomy, laparoscopic cholecystectomy (LC) is associated with significantly shorter recovery period, and hence is the preferred method. However, LC is also associated with an increase in bile duct injuries (BDIs), resulting in significant morbidity and mortality. The primary cause of BDIs from LCs is misidentification of the cystic duct with the bile duct. Critical view of safety (CVS) is the most effective of safety protocols, which is said to be achieved during the surgery if certain criteria are met. However, due to suboptimal understanding and implementation of CVS, the BDI rates have remained stable over the last three decades. In this paper, we develop deep-learning techniques to automate the assessment of CVS in LCs. An innovative aspect of our research is on developing specialized learning techniques by incorporating domain knowledge to compensate for the limited training data available in practice. In particular, our CVS assessment process involves a fusion of two segmentation maps followed by an estimation of a certain region of interest based on anatomical structures close to the gallbladder, and then finally determination of each of the three CVS criteria via rule-based assessment of structural information. We achieved a gain of over 11.8% in mIoU on relevant classes with our two-stream semantic segmentation approach when compared to a single-model baseline, and 1.84% in mIoU with our proposed Sobel loss function when compared to a Transformer-based baseline model. For CVS criteria, we achieved up to 16% improvement and, for the overall CVS assessment, we achieved 5% improvement in balanced accuracy compared to DeepCVS under the same experiment settings.
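
To make the final rule-based stage concrete, here is an illustrative sketch of checking CVS-style criteria on binary segmentation masks — e.g. "exactly two tubular structures inside the region of interest" and "the region is sufficiently cleared of fat/fibrous tissue". The rules, thresholds, and mask names are invented for demonstration; they are not the paper's exact rules.

```python
import numpy as np

def count_components(mask):
    """4-connected component count on a binary mask (iterative flood fill)."""
    mask = mask.astype(bool).copy()
    n = 0
    for i in range(mask.shape[0]):
        for j in range(mask.shape[1]):
            if mask[i, j]:
                n += 1
                stack = [(i, j)]
                while stack:
                    a, b = stack.pop()
                    if 0 <= a < mask.shape[0] and 0 <= b < mask.shape[1] and mask[a, b]:
                        mask[a, b] = False
                        stack += [(a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)]
    return n

def assess_cvs(tubular_mask, fat_mask, roi_mask, fat_thresh=0.10):
    """Illustrative rule-based check: two structures in the ROI near the
    gallbladder, and the ROI mostly cleared of fat/fibrous tissue."""
    in_roi = tubular_mask & roi_mask
    two_structures = count_components(in_roi) == 2
    fat_ratio = (fat_mask & roi_mask).sum() / max(roi_mask.sum(), 1)
    cleared = bool(fat_ratio < fat_thresh)
    return {"two_structures": two_structures, "triangle_cleared": cleared}
```

In the paper's pipeline, masks like these come from the two-stream segmentation model rather than being given.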

$\texttt{NePhi}$: Neural Deformation Fields for Approximately Diffeomorphic Medical Image Registration

  • paper_url: http://arxiv.org/abs/2309.07322
  • repo_url: None
  • paper_authors: Lin Tian, Soumyadip Sengupta, Hastings Greer, Raúl San José Estépar, Marc Niethammer
  • for: Proposes a neural deformation model that yields approximately diffeomorphic transformations for medical image registration.
  • methods: Represents deformations functionally rather than voxel-wise, which keeps training and inference memory-efficient — important for large volumetric registrations — and supports both pairwise optimization-based registration and learning-based registration via predicted or optimized global and local latent codes, with gradient inverse consistency regularization to encourage deformation regularity.
  • results: On two synthetic 2D datasets and real 3D lung registration, $\texttt{NePhi}$ achieves accuracy similar to voxel-based representations in a single-resolution registration setting while using less memory and allowing faster instance optimization.
    Abstract This work proposes $\texttt{NePhi}$, a neural deformation model which results in approximately diffeomorphic transformations. In contrast to the predominant voxel-based approaches, $\texttt{NePhi}$ represents deformations functionally which allows for memory-efficient training and inference. This is of particular importance for large volumetric registrations. Further, while medical image registration approaches representing transformation maps via multi-layer perceptrons have been proposed, $\texttt{NePhi}$ facilitates both pairwise optimization-based registration $\textit{as well as}$ learning-based registration via predicted or optimized global and local latent codes. Lastly, as deformation regularity is a highly desirable property for most medical image registration tasks, $\texttt{NePhi}$ makes use of gradient inverse consistency regularization which empirically results in approximately diffeomorphic transformations. We show the performance of $\texttt{NePhi}$ on two 2D synthetic datasets as well as on real 3D lung registration. Our results show that $\texttt{NePhi}$ can achieve similar accuracies as voxel-based representations in a single-resolution registration setting while using less memory and allowing for faster instance-optimization.
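
The memory argument hinges on the functional representation: displacements are evaluated only at query coordinates instead of being stored on a dense voxel grid. A toy sketch of such a coordinate-network deformation (random untrained weights standing in for a trained model; the architecture is illustrative, not NePhi's):

```python
import numpy as np

def make_mlp(in_dim, hidden, out_dim, seed=0):
    """Small random MLP as a list of (W, b) pairs; a stand-in for a
    trained deformation network."""
    rng = np.random.default_rng(seed)
    dims = [in_dim] + hidden + [out_dim]
    return [(rng.standard_normal((a, b)) * 0.1, np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]

def deformation(coords, latent, params):
    """Functional deformation phi(x) = x + MLP([x, z]): the displacement
    is computed per query coordinate, so no voxel grid is ever stored."""
    z = np.broadcast_to(latent, (len(coords), len(latent)))
    x = np.concatenate([coords, z], axis=1)
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return coords + x
```

Conditioning on a latent code `z` is what allows one network to represent many deformations, enabling the learning-based registration mode described above.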

Multi-Modal Hybrid Learning and Sequential Training for RGB-T Saliency Detection

  • paper_url: http://arxiv.org/abs/2309.07297
  • repo_url: None
  • paper_authors: Guangyu Ren, Jitesh Joshi, Youngjun Cho
  • for: Improve RGB-T saliency detection by addressing the limitations of existing methods, which neglect the characteristics of cross-modal features and rely solely on network structures to fuse RGB and thermal features.
  • methods: A Multi-Modal Hybrid Loss (MMHL) combining supervised and self-supervised components: the supervised component distinctly utilizes semantic features from different modalities, while the self-supervised component reduces the distance between RGB and thermal features. A Hybrid Fusion Module considers both spatial and channel information when fusing RGB and thermal features.
  • results: A sequential training strategy — training on RGB images first and learning cross-modal features in a second stage — improves saliency detection without extra computational overhead. Performance evaluation and ablation studies demonstrate superior performance over existing state-of-the-art methods.
    Abstract RGB-T saliency detection has emerged as an important computer vision task, identifying conspicuous objects in challenging scenes such as dark environments. However, existing methods neglect the characteristics of cross-modal features and rely solely on network structures to fuse RGB and thermal features. To address this, we first propose a Multi-Modal Hybrid loss (MMHL) that comprises supervised and self-supervised loss functions. The supervised loss component of MMHL distinctly utilizes semantic features from different modalities, while the self-supervised loss component reduces the distance between RGB and thermal features. We further consider both spatial and channel information during feature fusion and propose the Hybrid Fusion Module to effectively fuse RGB and thermal features. Lastly, instead of jointly training the network with cross-modal features, we implement a sequential training strategy which performs training only on RGB images in the first stage and then learns cross-modal features in the second stage. This training strategy improves saliency detection performance without computational overhead. Results from performance evaluation and ablation studies demonstrate the superior performance achieved by the proposed method compared with the existing state-of-the-art methods.
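
A minimal sketch of the hybrid-loss structure — a supervised term on the predicted saliency map plus a self-supervised term pulling RGB and thermal features together. This shows only the shape of the objective; the weighting, distance measure, and how features are extracted are placeholders, not the paper's exact formulation.

```python
import numpy as np

def mmhl_loss(pred, target, feat_rgb, feat_thermal, alpha=0.1):
    """Toy MMHL: binary cross-entropy on the saliency prediction plus an
    alpha-weighted mean squared distance between modality features."""
    eps = 1e-7
    pred = np.clip(pred, eps, 1 - eps)
    supervised = -np.mean(target * np.log(pred)
                          + (1 - target) * np.log(1 - pred))
    self_sup = np.mean((feat_rgb - feat_thermal) ** 2)
    return supervised + alpha * self_sup
```

The self-supervised term needs no extra labels, which is what lets the second training stage align modalities without new annotation cost.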

GAN-based Algorithm for Efficient Image Inpainting

  • paper_url: http://arxiv.org/abs/2309.07293
  • repo_url: None
  • paper_authors: Zhengyang Han, Zehao Jiang, Yuan Ju
  • for: Address face recognition under mask-wearing, a new challenge introduced by the global COVID-19 pandemic, by completing the portion of the face covered by the mask.
  • methods: Machine-learning image inpainting that combines an autoencoder — well suited to retaining important, general image features — with the generative power of a generative adversarial network (GAN), implemented as context encoders.
  • results: Training on 50,000 images of influencers' faces yields a solid result that still leaves room for improvement; the paper also discusses the model's shortcomings, possible improvements, areas for future investigation from an applicative perspective, and directions to further enhance and refine the model.
    Abstract Global pandemic due to the spread of COVID-19 has post challenges in a new dimension on facial recognition, where people start to wear masks. Under such condition, the authors consider utilizing machine learning in image inpainting to tackle the problem, by complete the possible face that is originally covered in mask. In particular, autoencoder has great potential on retaining important, general features of the image as well as the generative power of the generative adversarial network (GAN). The authors implement a combination of the two models, context encoders and explain how it combines the power of the two models and train the model with 50,000 images of influencers faces and yields a solid result that still contains space for improvements. Furthermore, the authors discuss some shortcomings with the model, their possible improvements, as well as some area of study for future investigation for applicative perspective, as well as directions to further enhance and refine the model.

Unbiased Face Synthesis With Diffusion Models: Are We There Yet?

  • paper_url: http://arxiv.org/abs/2309.07277
  • repo_url: None
  • paper_authors: Harrison Rosenberg, Shimaa Ahmed, Guruprasad V Ramesh, Ramya Korlakai Vinayak, Kassem Fawaz
  • for: Investigate the efficacy and shortcomings of text-to-image diffusion models for face generation.
  • methods: A framework combining qualitative and quantitative measures — including embedding-based metrics and user studies — to audit the characteristics of generated faces conditioned on a set of social attributes.
  • results: Generated face images show limitations in faithfulness to the text prompt, demographic disparities, and distributional shifts; the paper also presents an analytical model offering insight into how training-data selection contributes to generative-model performance.
    Abstract Text-to-image diffusion models have achieved widespread popularity due to their unprecedented image generation capability. In particular, their ability to synthesize and modify human faces has spurred research into using generated face images in both training data augmentation and model performance assessments. In this paper, we study the efficacy and shortcomings of generative models in the context of face generation. Utilizing a combination of qualitative and quantitative measures, including embedding-based metrics and user studies, we present a framework to audit the characteristics of generated faces conditioned on a set of social attributes. We applied our framework on faces generated through state-of-the-art text-to-image diffusion models. We identify several limitations of face image generation that include faithfulness to the text prompt, demographic disparities, and distributional shifts. Furthermore, we present an analytical model that provides insights into how training data selection contributes to the performance of generative models.

So you think you can track?

  • paper_url: http://arxiv.org/abs/2309.07268
  • repo_url: https://github.com/rprokap/pset-9
  • paper_authors: Derek Gloudemans, Gergely Zachár, Yanbing Wang, Junyi Ji, Matt Nice, Matt Bunting, William Barbour, Jonathan Sprinkle, Benedetto Piccoli, Maria Laura Delle Monache, Alexandre Bayen, Benjamin Seibold, Daniel B. Work
  • for: Provide a multi-camera tracking dataset for benchmarking tracking algorithms.
  • methods: 234 hours of video recorded concurrently from 234 overlapping HD cameras covering a 4.2-mile stretch of 8-10 lane interstate highway near Nashville, TN, combined with manually corrected GPS trajectories from 270 vehicle passes to provide ground-truth trajectories, plus per-camera object detections (159 million total before cross-camera fusion).
  • results: Initial benchmarking of tracking-by-detection algorithms against the GPS trajectories yields a best HOTA of only 9.5% (best recall 75.9% at IOU 0.1, 47.9 average IDs per ground-truth object), indicating the benchmarked trackers do not perform well enough over the long temporal and spatial durations required for traffic-scene understanding.
    Abstract This work introduces a multi-camera tracking dataset consisting of 234 hours of video data recorded concurrently from 234 overlapping HD cameras covering a 4.2 mile stretch of 8-10 lane interstate highway near Nashville, TN. The video is recorded during a period of high traffic density with 500+ objects typically visible within the scene and typical object longevities of 3-15 minutes. GPS trajectories from 270 vehicle passes through the scene are manually corrected in the video data to provide a set of ground-truth trajectories for recall-oriented tracking metrics, and object detections are provided for each camera in the scene (159 million total before cross-camera fusion). Initial benchmarking of tracking-by-detection algorithms is performed against the GPS trajectories, and a best HOTA of only 9.5% is obtained (best recall 75.9% at IOU 0.1, 47.9 average IDs per ground truth object), indicating the benchmarked trackers do not perform sufficiently well at the long temporal and spatial durations required for traffic scene understanding.
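
For reference, the "recall 75.9% at IOU 0.1" figure rests on IoU-based matching between ground-truth and predicted boxes. The sketch below computes box IoU and a recall at a threshold; note that HOTA itself additionally scores identity association over time, which this simplified stand-in omits.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def recall_at_iou(gt_boxes, pred_boxes, thresh=0.1):
    """Fraction of ground-truth boxes matched by at least one prediction
    at the given IoU threshold (detection only, no identity tracking)."""
    matched = sum(any(iou(g, p) >= thresh for p in pred_boxes)
                  for g in gt_boxes)
    return matched / len(gt_boxes) if gt_boxes else 0.0
```

The very low threshold (0.1) used in the paper's best-recall figure is forgiving of localization error; the poor HOTA shows that maintaining identities over minutes-long tracks is the hard part.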

Automated segmentation of rheumatoid arthritis immunohistochemistry stained synovial tissue

  • paper_url: http://arxiv.org/abs/2309.07255
  • repo_url: https://github.com/amayags/ihc_synovium_segmentation
  • paper_authors: Amaya Gallagher-Syed, Abbas Khan, Felice Rivellese, Costantino Pitzalis, Myles J. Lewis, Gregory Slabaugh, Michael R. Barnes
  • for: Develop a robust, repeatable automated segmentation algorithm to help researchers analyse IHC-stained synovial tissue samples from patients with this chronic autoimmune disease.
  • methods: A UNET trained on R4RA, a hand-curated, heterogeneous, multi-centre real-world clinical dataset containing multiple types of IHC staining, so the model can handle stain variability and batch effects across clinical centres.
  • results: The model achieves a DICE score of 0.865, successfully segments different IHC stain types, and copes with variance in colour and intensity and with common WSI artefacts from different clinical centres. It can serve as the first step of an automated image-analysis pipeline, improving speed, reproducibility, and robustness.
    Abstract Rheumatoid Arthritis (RA) is a chronic, autoimmune disease which primarily affects the joint's synovial tissue. It is a highly heterogeneous disease, with wide cellular and molecular variability observed in synovial tissues. Over the last two decades, the methods available for their study have advanced considerably. In particular, Immunohistochemistry stains are well suited to highlighting the functional organisation of samples. Yet, analysis of IHC-stained synovial tissue samples is still overwhelmingly done manually and semi-quantitatively by expert pathologists. This is because in addition to the fragmented nature of IHC stained synovial tissue, there exist wide variations in intensity and colour, strong clinical centre batch effect, as well as the presence of many undesirable artefacts present in gigapixel Whole Slide Images (WSIs), such as water droplets, pen annotation, folded tissue, blurriness, etc. There is therefore a strong need for a robust, repeatable automated tissue segmentation algorithm which can cope with this variability and provide support to imaging pipelines. We train a UNET on a hand-curated, heterogeneous real-world multi-centre clinical dataset R4RA, which contains multiple types of IHC staining. The model obtains a DICE score of 0.865 and successfully segments different types of IHC staining, as well as dealing with variance in colours, intensity and common WSIs artefacts from the different clinical centres. It can be used as the first step in an automated image analysis pipeline for synovial tissue samples stained with IHC, increasing speed, reproducibility and robustness.
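
The reported DICE score of 0.865 is the standard overlap metric for segmentation masks; for binary masks it reduces to the short computation below.

```python
import numpy as np

def dice(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks:
    2 * |pred ∩ target| / (|pred| + |target|)."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
```

The small `eps` keeps the score defined when both masks are empty, a common convention in segmentation pipelines.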

Mitigate Replication and Copying in Diffusion Models with Generalized Caption and Dual Fusion Enhancement

  • paper_url: http://arxiv.org/abs/2309.07254
  • repo_url: None
  • paper_authors: Chenghao Li, Dake Chen, Yuke Zhang, Peter A. Beerel
  • for: Mitigate the replication of training data in diffusion models, protecting privacy.
  • methods: Introduces a generality score that measures how general training captions are, uses a large language model (LLM) to generalize training captions, and then proposes a dual fusion enhancement approach to mitigate replication in diffusion models.
  • results: The proposed methods reduce replication by 43.5% relative to the original diffusion model while maintaining the diversity and quality of generations.
    Abstract While diffusion models demonstrate a remarkable capability for generating high-quality images, their tendency to `replicate' training data raises privacy concerns. Although recent research suggests that this replication may stem from the insufficient generalization of training data captions and duplication of training images, effective mitigation strategies remain elusive. To address this gap, our paper first introduces a generality score that measures the caption generality and employ large language model (LLM) to generalize training captions. Subsequently, we leverage generalized captions and propose a novel dual fusion enhancement approach to mitigate the replication of diffusion models. Our empirical results demonstrate that our proposed methods can significantly reduce replication by 43.5% compared to the original diffusion model while maintaining the diversity and quality of generations.
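
To give intuition for a caption "generality score", here is one illustrative (invented) proxy: the mean document frequency of a caption's words over a reference caption corpus, so captions full of rare, highly specific tokens score low while generic captions score high. This is not the paper's definition, only a sketch of the concept.

```python
def generality_score(caption, corpus_captions):
    """Illustrative generality proxy: average document frequency of the
    caption's words across a reference corpus of captions."""
    vocab_df = {}
    for c in corpus_captions:
        for w in set(c.lower().split()):
            vocab_df[w] = vocab_df.get(w, 0) + 1
    n = len(corpus_captions)
    words = caption.lower().split()
    if not words:
        return 0.0
    return sum(vocab_df.get(w, 0) / n for w in words) / len(words)
```

The mitigation then follows: captions scoring low on generality are candidates for LLM-based generalization, reducing the one-to-one binding between a training image and its caption.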

  • paper_url: http://arxiv.org/abs/2309.07243
  • repo_url: None
  • paper_authors: Peter Hardy, Hansung Kim
  • for: recover 3D human poses from 2D kinematic skeletons
  • methods: lift-then-fill approach, custom sampling function, and independent lifting of skeleton parts
  • results: significantly more accurate results, improved stability and likelihood estimation, and consistent accuracy in scenarios without occlusion
    Abstract We present LInKs, a novel unsupervised learning method to recover 3D human poses from 2D kinematic skeletons obtained from a single image, even when occlusions are present. Our approach follows a unique two-step process, which involves first lifting the occluded 2D pose to the 3D domain, followed by filling in the occluded parts using the partially reconstructed 3D coordinates. This lift-then-fill approach leads to significantly more accurate results compared to models that complete the pose in 2D space alone. Additionally, we improve the stability and likelihood estimation of normalising flows through a custom sampling function replacing PCA dimensionality reduction previously used in prior work. Furthermore, we are the first to investigate if different parts of the 2D kinematic skeleton can be lifted independently which we find by itself reduces the error of current lifting approaches. We attribute this to the reduction of long-range keypoint correlations. In our detailed evaluation, we quantify the error under various realistic occlusion scenarios, showcasing the versatility and applicability of our model. Our results consistently demonstrate the superiority of handling all types of occlusions in 3D space when compared to others that complete the pose in 2D space. Our approach also exhibits consistent accuracy in scenarios without occlusion, as evidenced by a 7.9% reduction in reconstruction error compared to prior works on the Human3.6M dataset. Furthermore, our method excels in accurately retrieving complete 3D poses even in the presence of occlusions, making it highly applicable in situations where complete 2D pose information is unavailable.
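
The lift-then-fill control flow can be sketched schematically: lift the visible 2D keypoints to 3D first, then fill each occluded joint in 3D from its already-reconstructed parent. Here `lift_fn` stands in for the learned lifting network, and the crude fixed bone-direction prior is purely for illustration — the paper fills occlusions with learned models, not a hand-coded offset.

```python
import numpy as np

def lift_then_fill(kp2d, visible, parents, bone_len, lift_fn):
    """Schematic lift-then-fill: (1) lift visible 2D keypoints to 3D via
    `lift_fn`; (2) place each occluded joint at a prior bone length from
    its lifted parent, using a toy downward direction."""
    kp3d = np.full((len(kp2d), 3), np.nan)
    kp3d[visible] = lift_fn(kp2d[visible])
    for j in np.where(~visible)[0]:
        p = parents[j]
        if not np.isnan(kp3d[p]).any():
            kp3d[j] = kp3d[p] + bone_len[j] * np.array([0.0, -1.0, 0.0])
    return kp3d
```

Doing the fill in 3D rather than 2D is the point of the design: occluded parts are constrained by reconstructed 3D geometry instead of ambiguous image-plane positions.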

Text-Guided Generation and Editing of Compositional 3D Avatars

  • paper_url: http://arxiv.org/abs/2309.07125
  • repo_url: https://github.com/HaoZhang990127/TECA
  • paper_authors: Hao Zhang, Yao Feng, Peter Kulits, Yandong Wen, Justus Thies, Michael J. Black
  • for: Generate high-quality 3D facial avatars, including hair and accessories, from only a text description.
  • methods: A compositional model in which the head, face, and upper body are represented with traditional 3D meshes, while hair, clothing, and accessories are represented with neural radiance fields (NeRFs). The mesh provides a strong geometric prior for the face region, improving realism and enabling edits to a person's appearance.
  • results: Text-guided generation and Editing of Compositional Avatars (TECA) produces avatars more realistic than those of recent methods and, thanks to its compositional nature, supports editing of attributes such as hairstyles, scarves, and other accessories — including their seamless transfer between avatars.
    Abstract Our goal is to create a realistic 3D facial avatar with hair and accessories using only a text description. While this challenge has attracted significant recent interest, existing methods either lack realism, produce unrealistic shapes, or do not support editing, such as modifications to the hairstyle. We argue that existing methods are limited because they employ a monolithic modeling approach, using a single representation for the head, face, hair, and accessories. Our observation is that the hair and face, for example, have very different structural qualities that benefit from different representations. Building on this insight, we generate avatars with a compositional model, in which the head, face, and upper body are represented with traditional 3D meshes, and the hair, clothing, and accessories with neural radiance fields (NeRF). The model-based mesh representation provides a strong geometric prior for the face region, improving realism while enabling editing of the person's appearance. By using NeRFs to represent the remaining components, our method is able to model and synthesize parts with complex geometry and appearance, such as curly hair and fluffy scarves. Our novel system synthesizes these high-quality compositional avatars from text descriptions. The experimental results demonstrate that our method, Text-guided generation and Editing of Compositional Avatars (TECA), produces avatars that are more realistic than those of recent methods while being editable because of their compositional nature. For example, our TECA enables the seamless transfer of compositional features like hairstyles, scarves, and other accessories between avatars. This capability supports applications such as virtual try-on.

Tree-Structured Shading Decomposition

  • paper_url: http://arxiv.org/abs/2309.07122
  • repo_url: https://github.com/gcgeng/inv-shade-trees
  • paper_authors: Chen Geng, Hong-Xing Yu, Sharon Zhang, Maneesh Agrawala, Jiajun Wu
  • for: Infer a tree-structured shading representation from a single image, enabling intuitive editing of object surface shading.
  • methods: A shade tree representation that factorizes object surface shading into basic shading nodes and compositing methods, letting novice users unfamiliar with the physical shading process edit shading efficiently and intuitively; a hybrid inference scheme in which an auto-regressive model produces a rough estimate of the tree structure and node parameters, followed by optimization-based fine-tuning.
  • results: The hybrid approach effectively infers shade trees on synthetic images, captured reflectance, real images, and non-realistic vector drawings, enabling downstream applications such as material editing, vectorized shading, and relighting.
    Abstract We study inferring a tree-structured representation from a single image for object shading. Prior work typically uses the parametric or measured representation to model shading, which is neither interpretable nor easily editable. We propose using the shade tree representation, which combines basic shading nodes and compositing methods to factorize object surface shading. The shade tree representation enables novice users who are unfamiliar with the physical shading process to edit object shading in an efficient and intuitive manner. A main challenge in inferring the shade tree is that the inference problem involves both the discrete tree structure and the continuous parameters of the tree nodes. We propose a hybrid approach to address this issue. We introduce an auto-regressive inference model to generate a rough estimation of the tree structure and node parameters, and then we fine-tune the inferred shade tree through an optimization algorithm. We show experiments on synthetic images, captured reflectance, real images, and non-realistic vector drawings, allowing downstream applications such as material editing, vectorized shading, and relighting. Project website: https://chen-geng.com/inv-shade-trees
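The shade-tree idea above can be sketched as a tiny expression tree whose leaves hold basic shading components and whose interior nodes composite them. The node set and the `Mix` operator below are illustrative stand-ins, not the paper's exact node grammar:

```python
# Hypothetical shade-tree sketch: leaves produce shading components in [0, 1],
# interior nodes composite their children; evaluation is bottom-up.

class Leaf:
    def __init__(self, value):
        self.value = value              # a scalar shading component
    def eval(self):
        return self.value

class Multiply:
    """Modulate one sub-tree by another, e.g. albedo * diffuse."""
    def __init__(self, left, right):
        self.left, self.right = left, right
    def eval(self):
        return self.left.eval() * self.right.eval()

class Mix:
    """Linear blend of two sub-trees: alpha * a + (1 - alpha) * b."""
    def __init__(self, a, b, alpha):
        self.a, self.b, self.alpha = a, b, alpha
    def eval(self):
        return self.alpha * self.a.eval() + (1 - self.alpha) * self.b.eval()

# shading = mix(albedo * diffuse, highlight): editing the tree (e.g. alpha
# or a leaf value) edits the shading without touching the other components.
tree = Mix(Multiply(Leaf(0.8), Leaf(0.5)), Leaf(1.0), alpha=0.7)
print(tree.eval())  # 0.7 * 0.4 + 0.3 * 1.0 ≈ 0.58
```

Editing then amounts to swapping or re-parameterizing sub-trees, which is what makes the representation interpretable compared with a monolithic parametric model.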

PILOT: A Pre-Trained Model-Based Continual Learning Toolbox

  • paper_url: http://arxiv.org/abs/2309.07117
  • repo_url: https://github.com/sun-hailong/lamda-pilot
  • paper_authors: Hai-Long Sun, Da-Wei Zhou, Han-Jia Ye, De-Chuan Zhan
  • for: Develop a pre-trained-model-based continual learning toolbox (PILOT) for adapting to newly arriving data in real-world applications.
  • methods: PILOT implements state-of-the-art pre-trained-model-based class-incremental learning algorithms such as L2P, DualPrompt, and CODA-Prompt, and also fits typical class-incremental learning algorithms (e.g., DER, FOSTER, and MEMO) into the pre-trained-model context to evaluate their effectiveness.
  • results: PILOT performs strongly in practice, maintaining high performance across a range of class-incremental learning tasks.
    Abstract While traditional machine learning can effectively tackle a wide range of problems, it primarily operates within a closed-world setting, which presents limitations when dealing with streaming data. As a solution, incremental learning emerges to address real-world scenarios involving new data's arrival. Recently, pre-training has made significant advancements and garnered the attention of numerous researchers. The strong performance of these pre-trained models (PTMs) presents a promising avenue for developing continual learning algorithms that can effectively adapt to real-world scenarios. Consequently, exploring the utilization of PTMs in incremental learning has become essential. This paper introduces a pre-trained model-based continual learning toolbox known as PILOT. On the one hand, PILOT implements some state-of-the-art class-incremental learning algorithms based on pre-trained models, such as L2P, DualPrompt, and CODA-Prompt. On the other hand, PILOT also fits typical class-incremental learning algorithms (e.g., DER, FOSTER, and MEMO) within the context of pre-trained models to evaluate their effectiveness.

Weakly-Supervised Multi-Task Learning for Audio-Visual Speaker Verification

  • paper_url: http://arxiv.org/abs/2309.07115
  • repo_url: None
  • paper_authors: Anith Selvakumar, Homa Fashandi
  • for: Propose a methodology for robust multimodal person representations optimized for open-set audio-visual speaker verification.
  • methods: Multitask learning techniques boost the distance metric learning (DML) approach, and an auxiliary task with weak labels is shown to increase the compactness of the learned speaker representation. In addition, the authors extend the Generalized end-to-end loss (GE2E) to multimodal inputs and demonstrate that it achieves competitive performance in an audio-visual space.
  • results: The network achieves state-of-the-art performance for open-set audio-visual speaker verification, reporting 0.244%, 0.252%, and 0.441% Equal Error Rate (EER) on the three official trial lists of VoxCeleb1-O/E/H, which are, to the authors' knowledge, the best published results on VoxCeleb1-E and VoxCeleb1-H.
    Abstract In this paper, we present a methodology for achieving robust multimodal person representations optimized for open-set audio-visual speaker verification. Distance Metric Learning (DML) approaches have typically dominated this problem space, owing to strong performance on new and unseen classes. In our work, we explored multitask learning techniques to further boost performance of the DML approach and show that an auxiliary task with weak labels can increase the compactness of the learned speaker representation. We also extend the Generalized end-to-end loss (GE2E) to multimodal inputs and demonstrate that it can achieve competitive performance in an audio-visual space. Finally, we introduce a non-synchronous audio-visual sampling random strategy during training time that has shown to improve generalization. Our network achieves state of the art performance for speaker verification, reporting 0.244%, 0.252%, 0.441% Equal Error Rate (EER) on the three official trial lists of VoxCeleb1-O/E/H, which is to our knowledge, the best published results on VoxCeleb1-E and VoxCeleb1-H.
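GE2E is a known loss from speaker verification: each utterance embedding is pulled toward its own speaker's centroid and pushed from the others via a scaled-cosine softmax. The numpy sketch below is simplified (inclusive centroids; the original excludes the utterance from its own centroid) and does not reproduce the paper's multimodal extension:

```python
import numpy as np

def ge2e_softmax_loss(emb, w=10.0, b=-5.0):
    """Simplified GE2E softmax loss.

    emb: array of shape (n_speakers, n_utterances, dim).
    Uses inclusive centroids for brevity; the original GE2E excludes
    each utterance from its own speaker's centroid.
    """
    emb = emb / np.linalg.norm(emb, axis=-1, keepdims=True)
    centroids = emb.mean(axis=1)
    centroids = centroids / np.linalg.norm(centroids, axis=-1, keepdims=True)
    # scaled cosine similarity of every utterance to every centroid
    sim = w * np.einsum('jid,kd->jik', emb, centroids) + b
    n_spk = emb.shape[0]
    # softmax cross-entropy: each utterance should match its own centroid
    log_denom = np.log(np.exp(sim).sum(axis=-1))          # (n_spk, n_utt)
    log_num = sim[np.arange(n_spk), :, np.arange(n_spk)]  # sim[j, i, j]
    return float(np.mean(log_denom - log_num))

rng = np.random.default_rng(0)
# well-separated toy "speakers": the loss should be close to zero
base = rng.normal(size=(3, 1, 8))
emb = base + 0.01 * rng.normal(size=(3, 4, 8))
print(ge2e_softmax_loss(emb))
```

A multimodal variant would build `emb` from fused audio-visual features, but the fusion itself is outside this sketch.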

Contrastive Deep Encoding Enables Uncertainty-aware Machine-learning-assisted Histopathology

  • paper_url: http://arxiv.org/abs/2309.07113
  • repo_url: None
  • paper_authors: Nirhoshan Sivaroopan, Chamuditha Jayanga, Chalani Ekanayake, Hasindri Watawana, Jathurshan Pradeepkumar, Mithunjha Anandakumar, Ranga Rodrigo, Chamira U. S. Edussooriya, Dushan N. Wadduwage
  • for: Pre-train deep neural networks on large public-domain datasets so that they learn rich, informative representations of histopathology images.
  • methods: Pre-training on large unannotated public data followed by fine-tuning on a small fraction of annotated data; an uncertainty-aware loss function quantifies model confidence during inference.
  • results: The pre-train-then-fine-tune approach reaches state-of-the-art (SOTA) performance with only 1-10% of the annotations, and the quantified uncertainty helps experts select the best instances to label for further training.
    Abstract Deep neural network models can learn clinically relevant features from millions of histopathology images. However generating high-quality annotations to train such models for each hospital, each cancer type, and each diagnostic task is prohibitively laborious. On the other hand, terabytes of training data -- while lacking reliable annotations -- are readily available in the public domain in some cases. In this work, we explore how these large datasets can be consciously utilized to pre-train deep networks to encode informative representations. We then fine-tune our pre-trained models on a fraction of annotated training data to perform specific downstream tasks. We show that our approach can reach the state-of-the-art (SOTA) for patch-level classification with only 1-10% randomly selected annotations compared to other SOTA approaches. Moreover, we propose an uncertainty-aware loss function, to quantify the model confidence during inference. Quantified uncertainty helps experts select the best instances to label for further training. Our uncertainty-aware labeling reaches the SOTA with significantly fewer annotations compared to random labeling. Last, we demonstrate how our pre-trained encoders can surpass current SOTA for whole-slide image classification with weak supervision. Our work lays the foundation for data and task-agnostic pre-trained deep networks with quantified uncertainty.
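The "select the best instances to label" step relies on ranking predictions by uncertainty. As an illustrative stand-in for the paper's uncertainty-aware loss, predictive entropy of the softmax outputs gives such a ranking:

```python
import numpy as np

def predictive_entropy(probs):
    """Entropy of per-instance class probabilities; higher = less confident."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=-1)

# Toy predictions for 4 patches (rows sum to 1).
probs = np.array([
    [0.98, 0.02],   # confident
    [0.55, 0.45],   # uncertain
    [0.90, 0.10],
    [0.50, 0.50],   # most uncertain
])
# Indices sorted from most to least uncertain -> label these first.
order = np.argsort(-predictive_entropy(probs))
print(order)  # prints [3 1 2 0]
```

In an active-labeling loop, the top of this ranking is the batch sent to the expert for annotation before the next fine-tuning round.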

Hardening RGB-D Object Recognition Systems against Adversarial Patch Attacks

  • paper_url: http://arxiv.org/abs/2309.07106
  • repo_url: None
  • paper_authors: Yang Zheng, Luca Demetrio, Antonio Emanuele Cinà, Xiaoyi Feng, Zhaoqiang Xia, Xiaoyue Jiang, Ambra Demontis, Battista Biggio, Fabio Roli
  • for: Study the robustness of RGB-D object recognition systems, which fuse color and depth information to outperform color-only architectures.
  • methods: Attack RGB-D systems with adversarial examples and inspect their learned deep representations to explain the observed vulnerability.
  • results: RGB-D systems are about as vulnerable as RGB-only systems, even when the adversarial examples alter only the original images' colors; color features make the learned function more complex and thus more sensitive to small perturbations. A detection-based defense is proposed that makes RGB-D systems more robust against adversarial examples.
    Abstract RGB-D object recognition systems improve their predictive performances by fusing color and depth information, outperforming neural network architectures that rely solely on colors. While RGB-D systems are expected to be more robust to adversarial examples than RGB-only systems, they have also been proven to be highly vulnerable. Their robustness is similar even when the adversarial examples are generated by altering only the original images' colors. Different works highlighted the vulnerability of RGB-D systems; however, there is a lacking of technical explanations for this weakness. Hence, in our work, we bridge this gap by investigating the learned deep representation of RGB-D systems, discovering that color features make the function learned by the network more complex and, thus, more sensitive to small perturbations. To mitigate this problem, we propose a defense based on a detection mechanism that makes RGB-D systems more robust against adversarial examples. We empirically show that this defense improves the performances of RGB-D systems against adversarial examples even when they are computed ad-hoc to circumvent this detection mechanism, and that is also more effective than adversarial training.

Polygon Intersection-over-Union Loss for Viewpoint-Agnostic Monocular 3D Vehicle Detection

  • paper_url: http://arxiv.org/abs/2309.07104
  • repo_url: None
  • paper_authors: Derek Gloudemans, Xinxuan Lu, Shepard Xia, Daniel B. Work
  • for: Improve the accuracy of viewpoint-agnostic monocular 3D object detection.
  • methods: A new polygon IoU loss (PIoU loss) used in combination with the conventional L1 loss.
  • results: Tested on three state-of-the-art viewpoint-agnostic 3D detection models, the PIoU loss converges faster than L1 loss, and the combination yields higher accuracy (+1.64% AP70 for MonoCon on cars, +0.18% AP70 for RTM3D on cars, and +0.83%/+2.46% AP50/AP25 for MonoRCNN on cyclists).
    Abstract Monocular 3D object detection is a challenging task because depth information is difficult to obtain from 2D images. A subset of viewpoint-agnostic monocular 3D detection methods also do not explicitly leverage scene homography or geometry during training, meaning that a model trained thusly can detect objects in images from arbitrary viewpoints. Such works predict the projections of the 3D bounding boxes on the image plane to estimate the location of the 3D boxes, but these projections are not rectangular so the calculation of IoU between these projected polygons is not straightforward. This work proposes an efficient, fully differentiable algorithm for the calculation of IoU between two convex polygons, which can be utilized to compute the IoU between two 3D bounding box footprints viewed from an arbitrary angle. We test the performance of the proposed polygon IoU loss (PIoU loss) on three state-of-the-art viewpoint-agnostic 3D detection models. Experiments demonstrate that the proposed PIoU loss converges faster than L1 loss and that in 3D detection models, a combination of PIoU loss and L1 loss gives better results than L1 loss alone (+1.64% AP70 for MonoCon on cars, +0.18% AP70 for RTM3D on cars, and +0.83%/+2.46% AP50/AP25 for MonoRCNN on cyclists).
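The core of the proposed loss is the IoU between two convex polygons, obtainable by clipping one polygon against the other (Sutherland-Hodgman) and measuring areas with the shoelace formula. The paper implements this fully differentiably inside an autograd framework; the numpy sketch below illustrates only the underlying geometry:

```python
import numpy as np

def area(poly):
    """Absolute polygon area via the shoelace formula; poly is (n, 2)."""
    x, y = poly[:, 0], poly[:, 1]
    return 0.5 * abs(np.sum(x * np.roll(y, -1) - np.roll(x, -1) * y))

def _side(a, b, p):
    """> 0 if p lies left of the directed edge a->b (interior for CCW)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def _clip(subject, a, b):
    """Sutherland-Hodgman step: keep the part of `subject` left of a->b."""
    out = []
    for i in range(len(subject)):
        p, q = subject[i], subject[(i + 1) % len(subject)]
        sp, sq = _side(a, b, p), _side(a, b, q)
        if sp >= 0:
            out.append(p)
        if sp * sq < 0:  # the edge p->q crosses the clip line
            t = sp / (sp - sq)
            out.append(p + t * (q - p))
    return out

def polygon_iou(poly1, poly2):
    """IoU of two convex, counter-clockwise polygons."""
    inter = [np.asarray(v, dtype=float) for v in poly1]
    for i in range(len(poly2)):
        if not inter:
            break
        inter = _clip(inter, poly2[i], poly2[(i + 1) % len(poly2)])
    ai = area(np.array(inter)) if len(inter) >= 3 else 0.0
    a1, a2 = area(np.asarray(poly1)), area(np.asarray(poly2))
    return ai / (a1 + a2 - ai)

sq1 = np.array([[0., 0.], [1., 0.], [1., 1.], [0., 1.]])
sq2 = sq1 + 0.5   # overlap area 0.25, union 1.75
print(polygon_iou(sq1, sq2))  # 0.25 / 1.75 ≈ 0.1429
```

Every operation here (linear interpolation, sums, division) is smooth almost everywhere, which is what makes a differentiable implementation of the same computation feasible as a loss.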

RadarLCD: Learnable Radar-based Loop Closure Detection Pipeline

  • paper_url: http://arxiv.org/abs/2309.07094
  • repo_url: None
  • paper_authors: Mirko Usuelli, Matteo Frosi, Paolo Cudrano, Simone Mentasti, Matteo Matteucci
  • for: The paper is written for the task of Loop Closure Detection (LCD) in robotics and computer vision, and to address the challenges of integrating radar data for this task.
  • methods: The paper proposes a novel supervised deep learning pipeline called RadarLCD, which leverages a pre-trained HERO model to select key points crucial for LCD tasks and achieve better performance than state-of-the-art methods.
  • results: The paper evaluates RadarLCD on a variety of FMCW Radar dataset scenes and shows that it surpasses state-of-the-art systems in multiple aspects of Loop Closure Detection.
    Abstract Loop Closure Detection (LCD) is an essential task in robotics and computer vision, serving as a fundamental component for various applications across diverse domains. These applications encompass object recognition, image retrieval, and video analysis. LCD consists in identifying whether a robot has returned to a previously visited location, referred to as a loop, and then estimating the related roto-translation with respect to the analyzed location. Despite the numerous advantages of radar sensors, such as their ability to operate under diverse weather conditions and provide a wider range of view compared to other commonly used sensors (e.g., cameras or LiDARs), integrating radar data remains an arduous task due to intrinsic noise and distortion. To address this challenge, this research introduces RadarLCD, a novel supervised deep learning pipeline specifically designed for Loop Closure Detection using the FMCW Radar (Frequency Modulated Continuous Wave) sensor. RadarLCD, a learning-based LCD methodology explicitly designed for radar systems, makes a significant contribution by leveraging the pre-trained HERO (Hybrid Estimation Radar Odometry) model. Being originally developed for radar odometry, HERO's features are used to select key points crucial for LCD tasks. The methodology undergoes evaluation across a variety of FMCW Radar dataset scenes, and it is compared to state-of-the-art systems such as Scan Context for Place Recognition and ICP for Loop Closure. The results demonstrate that RadarLCD surpasses the alternatives in multiple aspects of Loop Closure Detection.

Developing a Novel Image Marker to Predict the Responses of Neoadjuvant Chemotherapy (NACT) for Ovarian Cancer Patients

  • paper_url: http://arxiv.org/abs/2309.07087
  • repo_url: None
  • paper_authors: Ke Zhang, Neman Abdoli, Patrik Gilley, Youkabed Sadri, Xuxin Chen, Theresa C. Thai, Lauren Dockery, Kathleen Moore, Robert S. Mannel, Yuchen Qiu
  • for: Develop a novel image marker for high-accuracy, early-stage prediction of patient response to NACT.
  • methods: A total of 1373 radiomics features quantifying tumor characteristics were computed, grouped into three categories (geometric, intensity, and texture features), then optimized with principal component analysis into a compact, informative feature cluster; an SVM-based classifier trained on this cluster produces the final marker indicating the likelihood of response to NACT.
  • results: The new method yielded an AUC of 0.745, with overall accuracy of 76.2%, positive predictive value of 70%, and negative predictive value of 78.1%.
    Abstract Objective: Neoadjuvant chemotherapy (NACT) is one kind of treatment for advanced stage ovarian cancer patients. However, due to the nature of tumor heterogeneity, the patients' responses to NACT varies significantly among different subgroups. To address this clinical challenge, the purpose of this study is to develop a novel image marker to achieve high accuracy response prediction of the NACT at an early stage. Methods: For this purpose, we first computed a total of 1373 radiomics features to quantify the tumor characteristics, which can be grouped into three categories: geometric, intensity, and texture features. Second, all these features were optimized by principal component analysis algorithm to generate a compact and informative feature cluster. Using this cluster as the input, an SVM based classifier was developed and optimized to create a final marker, indicating the likelihood of the patient being responsive to the NACT treatment. To validate this scheme, a total of 42 ovarian cancer patients were retrospectively collected. A nested leave-one-out cross-validation was adopted for model performance assessment. Results: The results demonstrate that the new method yielded an AUC (area under the ROC [receiver characteristic operation] curve) of 0.745. Meanwhile, the model achieved overall accuracy of 76.2%, positive predictive value of 70%, and negative predictive value of 78.1%. Conclusion: This study provides meaningful information for the development of radiomics based image markers in NACT response prediction.
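The PCA feature-reduction step described above can be sketched generically: standardize the radiomics features, then project onto the top principal components before feeding an SVM. The sample and feature counts below come from the entry (42 patients, 1373 features), while the choice of 10 components is illustrative:

```python
import numpy as np

def pca_compress(features, n_components):
    """Standardize features and project onto the top principal components."""
    z = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-12)
    # right singular vectors of z are the principal directions,
    # returned by SVD in order of decreasing explained variance
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return z @ vt[:n_components].T

rng = np.random.default_rng(1)
X = rng.normal(size=(42, 1373))   # 42 patients x 1373 radiomics features
Z = pca_compress(X, n_components=10)
print(Z.shape)  # (42, 10)
```

With only 42 samples the rank of the data is at most 41, so a small component count like this is the realistic regime; the compact `Z` would then be the input to the SVM classifier.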

SupFusion: Supervised LiDAR-Camera Fusion for 3D Object Detection

  • paper_url: http://arxiv.org/abs/2309.07084
  • repo_url: https://github.com/iranqin/supfusion
  • paper_authors: Yiran Qin, Chaoqun Wang, Zijian Kang, Ningning Ma, Zhen Li, Ruimao Zhang
  • for: Propose a new training strategy, SupFusion, which provides auxiliary feature-level supervision for effective LiDAR-camera fusion and significantly boosts detection performance.
  • methods: A data enhancement method named Polar Sampling densifies sparse objects, and an assistant model is trained to generate high-quality features as supervision for the fusion feature; a simple yet effective deep fusion module continuously improves the detector's abilities.
  • results: Around 2% 3D mAP improvement on the KITTI benchmark across multiple LiDAR-camera 3D detectors.
    Abstract In this paper, we propose a novel training strategy called SupFusion, which provides an auxiliary feature level supervision for effective LiDAR-Camera fusion and significantly boosts detection performance. Our strategy involves a data enhancement method named Polar Sampling, which densifies sparse objects and trains an assistant model to generate high-quality features as the supervision. These features are then used to train the LiDAR-Camera fusion model, where the fusion feature is optimized to simulate the generated high-quality features. Furthermore, we propose a simple yet effective deep fusion module, which contiguously gains superior performance compared with previous fusion methods with SupFusion strategy. In such a manner, our proposal shares the following advantages. Firstly, SupFusion introduces auxiliary feature-level supervision which could boost LiDAR-Camera detection performance without introducing extra inference costs. Secondly, the proposed deep fusion could continuously improve the detector's abilities. Our proposed SupFusion and deep fusion module is plug-and-play, we make extensive experiments to demonstrate its effectiveness. Specifically, we gain around 2% 3D mAP improvements on KITTI benchmark based on multiple LiDAR-Camera 3D detectors.

FAIR: Frequency-aware Image Restoration for Industrial Visual Anomaly Detection

  • paper_url: http://arxiv.org/abs/2309.07068
  • repo_url: https://github.com/liutongkun/fair
  • paper_authors: Tongkun Liu, Bing Li, Xiao Du, Bingke Jiang, Leqi Geng, Feiyang Wang, Zhuo Zhao
  • for: Image-reconstruction-based anomaly detection models for industrial visual inspection usually suffer a trade-off between normal reconstruction fidelity and abnormal reconstruction distinguishability, which hurts performance.
  • methods: Frequency-aware Image Restoration (FAIR), a novel self-supervised task that restores images from their high-frequency components, exploiting the distinct frequency biases between normal and abnormal reconstruction errors to reconstruct normal patterns precisely while limiting unfavorable generalization to anomalies.
  • results: Using only a simple vanilla UNet, FAIR achieves state-of-the-art performance with higher efficiency on various defect detection datasets. Code: https://github.com/liutongkun/FAIR.
    Abstract Image reconstruction-based anomaly detection models are widely explored in industrial visual inspection. However, existing models usually suffer from the trade-off between normal reconstruction fidelity and abnormal reconstruction distinguishability, which damages the performance. In this paper, we find that the above trade-off can be better mitigated by leveraging the distinct frequency biases between normal and abnormal reconstruction errors. To this end, we propose Frequency-aware Image Restoration (FAIR), a novel self-supervised image restoration task that restores images from their high-frequency components. It enables precise reconstruction of normal patterns while mitigating unfavorable generalization to anomalies. Using only a simple vanilla UNet, FAIR achieves state-of-the-art performance with higher efficiency on various defect detection datasets. Code: https://github.com/liutongkun/FAIR.
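Restoring an image "from its high-frequency components" presupposes a low/high frequency split. One common way to obtain it, shown below with a simple mean filter (FAIR itself may use a different low-pass filter), is to blur the image and take the residual as the high-frequency part:

```python
import numpy as np

def box_blur(img, k=5):
    """Mean filter with edge padding: a simple low-pass filter stand-in."""
    pad = k // 2
    padded = np.pad(img, pad, mode='edge')
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

rng = np.random.default_rng(0)
img = rng.random((32, 32))

low = box_blur(img)      # low-frequency component
high = img - low         # high-frequency component: the restoration input
assert np.allclose(low + high, img)  # exact additive decomposition
```

The decomposition is lossless by construction, so a network trained to map `high` back to `img` is forced to hallucinate the low-frequency content of normal patterns, which is the behavior the frequency bias argument relies on.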

Aggregating Long-term Sharp Features via Hybrid Transformers for Video Deblurring

  • paper_url: http://arxiv.org/abs/2309.07054
  • repo_url: https://github.com/shangwei5/stgtn
  • paper_authors: Dongwei Ren, Wei Shang, Yi Yang, Wangmeng Zuo
  • for: Recover consecutive sharp frames from a given blurry video.
  • methods: A hybrid Transformer design aggregates features from both neighboring frames and detected sharp frames: a blur-aware detector distinguishes sharp from blurry frames, a window-based local Transformer exploits neighboring-frame features via cross attention without explicit spatial alignment, and a global Transformer with multi-scale matching capability aggregates long-term sharp features. The method also extends to event-driven video deblurring by adding an event fusion module to the global Transformer.
  • results: On benchmark datasets, the proposed method outperforms state-of-the-art video deblurring methods as well as event-driven video deblurring methods in both quantitative metrics and visual quality. Source code and trained models are available at https://github.com/shangwei5/STGTN.
    Abstract Video deblurring methods, aiming at recovering consecutive sharp frames from a given blurry video, usually assume that the input video suffers from consecutively blurry frames. However, in real-world blurry videos taken by modern imaging devices, sharp frames usually appear in the given video, thus making temporal long-term sharp features available for facilitating the restoration of a blurry frame. In this work, we propose a video deblurring method that leverages both neighboring frames and present sharp frames using hybrid Transformers for feature aggregation. Specifically, we first train a blur-aware detector to distinguish between sharp and blurry frames. Then, a window-based local Transformer is employed for exploiting features from neighboring frames, where cross attention is beneficial for aggregating features from neighboring frames without explicit spatial alignment. To aggregate long-term sharp features from detected sharp frames, we utilize a global Transformer with multi-scale matching capability. Moreover, our method can easily be extended to event-driven video deblurring by incorporating an event fusion module into the global Transformer. Extensive experiments on benchmark datasets demonstrate that our proposed method outperforms state-of-the-art video deblurring methods as well as event-driven video deblurring methods in terms of quantitative metrics and visual quality. The source code and trained models are available at https://github.com/shangwei5/STGTN.
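The pipeline above starts by deciding which frames are sharp. The paper trains a blur-aware detector for this; as a hand-crafted stand-in, the classic variance-of-Laplacian sharpness score illustrates the idea that blur suppresses high-frequency response:

```python
import numpy as np

def laplacian_variance(img):
    """Variance of the discrete Laplacian: a classic sharpness score.

    An illustrative stand-in for the learned blur-aware detector;
    higher scores indicate sharper frames.
    """
    lap = (-4 * img[1:-1, 1:-1]
           + img[:-2, 1:-1] + img[2:, 1:-1]
           + img[1:-1, :-2] + img[1:-1, 2:])
    return float(lap.var())

rng = np.random.default_rng(0)
sharp = rng.random((64, 64))
# crude "blur": average each pixel with its neighbors (kills high frequencies)
blurry = (sharp[:-1, :-1] + sharp[1:, :-1] + sharp[:-1, 1:] + sharp[1:, 1:]) / 4

assert laplacian_variance(sharp) > laplacian_variance(blurry)
```

Frames scoring above a threshold would be routed to the global Transformer as long-term sharp references, the rest treated as inputs to be deblurred.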

Exploiting Multiple Priors for Neural 3D Indoor Reconstruction

  • paper_url: http://arxiv.org/abs/2309.07021
  • repo_url: None
  • paper_authors: Federico Lincetto, Gianluca Agresti, Mattia Rossi, Pietro Zanuttigh
  • for: Achieve high-quality 3D reconstruction of large indoor scenes.
  • methods: A neural implicit modeling method that combines multiple regularization strategies while relying only on images: a sparse but accurate depth prior anchors the scene to the initial model, a dense but less accurate depth prior is added flexibly enough for the model to diverge from it, surface normals are regularized with a novel self-supervised strategy, and a learnable exposure compensation scheme copes with challenging lighting.
  • results: The approach produces state-of-the-art 3D reconstructions in challenging indoor scenarios.
    Abstract Neural implicit modeling permits to achieve impressive 3D reconstruction results on small objects, while it exhibits significant limitations in large indoor scenes. In this work, we propose a novel neural implicit modeling method that leverages multiple regularization strategies to achieve better reconstructions of large indoor environments, while relying only on images. A sparse but accurate depth prior is used to anchor the scene to the initial model. A dense but less accurate depth prior is also introduced, flexible enough to still let the model diverge from it to improve the estimated geometry. Then, a novel self-supervised strategy to regularize the estimated surface normals is presented. Finally, a learnable exposure compensation scheme permits to cope with challenging lighting conditions. Experimental results show that our approach produces state-of-the-art 3D reconstructions in challenging indoor scenarios.

Instance Adaptive Prototypical Contrastive Embedding for Generalized Zero Shot Learning

  • paper_url: http://arxiv.org/abs/2309.06987
  • repo_url: None
  • paper_authors: Riti Paul, Sahil Vora, Baoxin Li
  • for: Solve sample classification in generalized zero-shot learning (GZSL), where unseen labels are not accessible during training.
  • methods: Contrastive-learning-based (instance-based) embedding in generative networks leverages the semantic relationships between data points, but existing embedding architectures have two limitations: (1) they ignore fine-grained cluster structures, limiting the discriminability of synthetic features' embeddings; (2) restricted scaling mechanisms make optimization inflexible, producing overlapped representations in the embedding space. To address (1), a margin-based prototypical contrastive learning embedding network combines prototype-data interaction (cluster quality enhancement) with implicit data-data interaction (fine-grained representations), providing substantial cluster supervision to the embedding network and the generator. To address (2), an instance adaptive contrastive loss yields generalized representations for unseen labels with increased inter-class margin.
  • results: Comprehensive experiments show the method outperforms the current state of the art on three benchmark datasets and consistently achieves the best unseen performance in the GZSL setting.
    Abstract Generalized zero-shot learning(GZSL) aims to classify samples from seen and unseen labels, assuming unseen labels are not accessible during training. Recent advancements in GZSL have been expedited by incorporating contrastive-learning-based (instance-based) embedding in generative networks and leveraging the semantic relationship between data points. However, existing embedding architectures suffer from two limitations: (1) limited discriminability of synthetic features' embedding without considering fine-grained cluster structures; (2) inflexible optimization due to restricted scaling mechanisms on existing contrastive embedding networks, leading to overlapped representations in the embedding space. To enhance the quality of representations in the embedding space, as mentioned in (1), we propose a margin-based prototypical contrastive learning embedding network that reaps the benefits of prototype-data (cluster quality enhancement) and implicit data-data (fine-grained representations) interaction while providing substantial cluster supervision to the embedding network and the generator. To tackle (2), we propose an instance adaptive contrastive loss that leads to generalized representations for unseen labels with increased inter-class margin. Through comprehensive experimental evaluation, we show that our method can outperform the current state-of-the-art on three benchmark datasets. Our approach also consistently achieves the best unseen performance in the GZSL setting.
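One illustrative form of a margin-based prototypical contrastive loss: prototypes are class means of normalized embeddings, and each sample's similarity to its own prototype is reduced by an additive margin before the softmax, enforcing a larger inter-class gap. The exact formulation in the paper may differ; this is a generic sketch:

```python
import numpy as np

def margin_prototypical_loss(emb, labels, margin=0.2, tau=0.1):
    """Margin-based prototype-data contrastive loss (illustrative form).

    emb: (n, dim) embeddings; labels: (n,) integer class labels.
    tau is a softmax temperature; the margin is subtracted from the
    positive (own-prototype) similarity only.
    """
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    classes = np.unique(labels)
    protos = np.stack([emb[labels == c].mean(axis=0) for c in classes])
    protos = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    sim = emb @ protos.T / tau                          # (n, n_classes)
    tgt = np.searchsorted(classes, labels)
    sim[np.arange(len(labels)), tgt] -= margin / tau    # penalize positives
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(len(labels)), tgt].mean())

rng = np.random.default_rng(0)
# two well-separated toy classes: the loss should be near zero
emb = np.concatenate([rng.normal(loc=m, scale=0.1, size=(5, 16))
                      for m in (-1.0, 1.0)])
labels = np.array([0] * 5 + [1] * 5)
print(margin_prototypical_loss(emb, labels))
```

Because the margin only tightens the positive term, a sample incurs loss unless it is closer to its own prototype than to any other by at least the margin, which is the cluster-supervision effect described in the bullet points.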

Differentiable JPEG: The Devil is in the Details

  • paper_url: http://arxiv.org/abs/2309.06978
  • repo_url: https://github.com/necla-ml/diff-jpeg
  • paper_authors: Christoph Reich, Biplob Debnath, Deep Patel, Srimat Chakradhar
  • for: Conduct a comprehensive review of existing differentiable JPEG approaches and address the critical details missed by previous methods.
  • methods: A novel differentiable JPEG approach that is differentiable with respect to the input image, the JPEG quality, the quantization tables, and the color conversion parameters.
  • results: Evaluated in forward and backward performance against existing methods, with extensive ablations of key design choices, the new approach resembles the (non-differentiable) reference implementation best, surpassing the recent best differentiable approach by 3.47 dB (PSNR) on average, and by up to 9.51 dB at strong compression rates.
    Abstract JPEG remains one of the most widespread lossy image coding methods. However, the non-differentiable nature of JPEG restricts the application in deep learning pipelines. Several differentiable approximations of JPEG have recently been proposed to address this issue. This paper conducts a comprehensive review of existing diff. JPEG approaches and identifies critical details that have been missed by previous methods. To this end, we propose a novel diff. JPEG approach, overcoming previous limitations. Our approach is differentiable w.r.t. the input image, the JPEG quality, the quantization tables, and the color conversion parameters. We evaluate the forward and backward performance of our diff. JPEG approach against existing methods. Additionally, extensive ablations are performed to evaluate crucial design choices. Our proposed diff. JPEG resembles the (non-diff.) reference implementation best, significantly surpassing the recent-best diff. approach by $3.47$dB (PSNR) on average. For strong compression rates, we can even improve PSNR by $9.51$dB. Strong adversarial attack results are yielded by our diff. JPEG, demonstrating the effective gradient approximation. Our code is available at https://github.com/necla-ml/Diff-JPEG.
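The non-differentiable core of JPEG is the rounding inside the quantize/dequantize step. A known trick is to replace `round` with a smooth surrogate that approaches it as a sharpness parameter grows (Agustsson & Theis style soft rounding); whether this particular surrogate matches the paper's formulation is an assumption of this sketch:

```python
import numpy as np

def soft_round(x, alpha=8.0):
    """Smooth surrogate for rounding: approaches np.round as alpha grows,
    but keeps useful gradients at finite alpha. Agrees with hard rounding
    exactly at integers."""
    m = np.floor(x) + 0.5
    return m + 0.5 * np.tanh(alpha * (x - m)) / np.tanh(alpha / 2.0)

def diff_quantize(coeffs, qtable, alpha=8.0):
    """Differentiable stand-in for JPEG's quantize/dequantize step.

    A generic sketch of the idea behind differentiable JPEG, not the
    paper's exact method; qtable entries are also free parameters here,
    which is what makes the loss differentiable w.r.t. the tables.
    """
    return soft_round(coeffs / qtable, alpha) * qtable

coeffs = np.array([100.3, -47.9, 3.2])    # toy DCT coefficients
qtable = np.array([16.0, 12.0, 10.0])     # toy quantization table entries
hard = np.round(coeffs / qtable) * qtable
soft = diff_quantize(coeffs, qtable)
print(hard, soft)
```

Since `soft_round` never deviates from hard rounding by more than half a quantization step, the forward pass stays close to real JPEG while the backward pass receives non-zero gradients.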

Neural network-based coronary dominance classification of RCA angiograms

  • paper_url: http://arxiv.org/abs/2309.06958
  • repo_url: None
  • paper_authors: Ivan Kruzhilov, Egor Ikryannikov, Artem Shadrin, Ruslan Utegenov, Galina Zubkova, Ivan Bessonov
  • for: A cardiac dominance classification algorithm based on right coronary artery (RCA) angiograms analyzed with neural networks.
  • methods: A convolutional network (ConvNeXt) and a Swin transformer classify 2D images (frames), with a majority vote over frames to classify each cardio angiographic view; an auxiliary network detects irrelevant images, which are excluded from the dataset.
  • results: 5-fold cross-validation gave macro recall = 93.1%, accuracy = 93.5%, and macro F1 = 89.2%. The model most often failed on RCA occlusion and on small vessel diameter combined with poor-quality angiographic views; in such cases, cardiac dominance classification can be complex and may require discussion among specialists to reach an accurate conclusion.
    Abstract Background. Cardiac dominance classification is essential for SYNTAX score estimation, which is a tool used to determine the complexity of coronary artery disease and guide patient selection toward optimal revascularization strategy. Objectives. Cardiac dominance classification algorithm based on the analysis of right coronary artery (RCA) angiograms using a neural network. Method. We employed convolutional neural network ConvNext and Swin transformer for 2D image (frames) classification, along with a majority vote for cardio angiographic view classification. An auxiliary network was also used to detect irrelevant images which were then excluded from the data set. Our data set consisted of 828 angiographic studies, 192 of them being patients with left dominance. Results. 5-fold cross validation gave the following dominance classification metrics (p=95%): macro recall=93.1%, accuracy=93.5%, macro F1=89.2%. The most common case in which the model regularly failed was RCA occlusion, as it requires utilization of LCA information. Another cause for false prediction is a small diameter combined with poor quality cardio angiographic view. In such cases, cardiac dominance classification can be complex and may require discussion among specialists to reach an accurate conclusion. Conclusion. The use of machine learning approaches to classify cardiac dominance based on RCA alone has been shown to be successful with satisfactory accuracy. However, for higher accuracy, it is necessary to utilize LCA information in the case of an occluded RCA and detect cases where there is high uncertainty.
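The frame-to-study aggregation described above reduces to a majority vote over per-frame predictions; a minimal sketch (function and label names are illustrative):

```python
from collections import Counter

def classify_study(frame_predictions):
    """Study-level dominance label from per-frame classifier outputs
    via a simple majority vote (ties resolved by first-seen ordering)."""
    return Counter(frame_predictions).most_common(1)[0][0]
```

In the paper's pipeline, frames flagged as irrelevant by the auxiliary network would be dropped before this vote.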

TransNet: A Transfer Learning-Based Network for Human Action Recognition

  • paper_url: http://arxiv.org/abs/2309.06951
  • repo_url: None
  • paper_authors: K. Alomar, X. Cai
  • for: Human action recognition (HAR) is a high-level and significant research area in computer vision with ubiquitous applications.
  • methods: The paper proposes TransNet, a simple yet versatile and effective end-to-end deep learning architecture for HAR. TransNet decomposes complex 3D-CNNs into 2D- and 1D-CNNs, whose components extract spatial features and temporal patterns in videos, respectively.
  • results: Compared with state-of-the-art HAR models, TransNet offers greater flexibility, lower model complexity, faster training, and higher classification accuracy.
    Abstract Human action recognition (HAR) is a high-level and significant research area in computer vision due to its ubiquitous applications. The main limitations of the current HAR models are their complex structures and lengthy training time. In this paper, we propose a simple yet versatile and effective end-to-end deep learning architecture, coined as TransNet, for HAR. TransNet decomposes the complex 3D-CNNs into 2D- and 1D-CNNs, where the 2D- and 1D-CNN components extract spatial features and temporal patterns in videos, respectively. Benefiting from its concise architecture, TransNet is ideally compatible with any pretrained state-of-the-art 2D-CNN models in other fields, being transferred to serve the HAR task. In other words, it naturally leverages the power and success of transfer learning for HAR, bringing huge advantages in terms of efficiency and effectiveness. Extensive experimental results and the comparison with the state-of-the-art models demonstrate the superior performance of the proposed TransNet in HAR in terms of flexibility, model complexity, training speed and classification accuracy.
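The parameter savings from decomposing a 3D-CNN into 2D and 1D components can be illustrated with a back-of-the-envelope weight count (a sketch assuming square/cubic kernels and channel-preserving layers, not TransNet's exact configuration):

```python
def conv3d_params(c_in, c_out, k):
    """Weights in a full 3D convolution with a cubic k x k x k kernel."""
    return c_in * c_out * k ** 3

def factorized_params(c_in, c_out, k):
    """Weights when the 3D kernel is split into a k x k spatial (2D)
    convolution followed by a length-k temporal (1D) convolution."""
    return c_in * c_out * k ** 2 + c_out * c_out * k
```

For 64-channel layers with k = 3, the factorized form needs 49,152 weights versus 110,592 for the full 3D kernel, and the 2D part can be swapped for any pretrained 2D backbone.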

Limited-Angle Tomography Reconstruction via Deep End-To-End Learning on Synthetic Data

  • paper_url: http://arxiv.org/abs/2309.06948
  • repo_url: https://github.com/99991/htc2022-tud-hhu-version-1
  • paper_authors: Thomas Germer, Jan Robine, Sebastian Konietzny, Stefan Harmeling, Tobias Uelwer
  • for: Solving the limited-angle tomography reconstruction problem.
  • methods: A deep neural network trained on a large amount of carefully crafted synthetic data.
  • results: Tomographic reconstruction from sinograms covering only 30° or 40°, winning first place in the Helsinki Tomography Challenge 2022.
    Abstract Computed tomography (CT) has become an essential part of modern science and medicine. A CT scanner consists of an X-ray source that is spun around an object of interest. On the opposite end of the X-ray source, a detector captures X-rays that are not absorbed by the object. The reconstruction of an image is a linear inverse problem, which is usually solved by filtered back projection. However, when the number of measurements is small, the reconstruction problem is ill-posed. This is for example the case when the X-ray source is not spun completely around the object, but rather irradiates the object only from a limited angle. To tackle this problem, we present a deep neural network that is trained on a large amount of carefully-crafted synthetic data and can perform limited-angle tomography reconstruction even for only 30° or 40° sinograms. With our approach we won the first place in the Helsinki Tomography Challenge 2022.
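Why a 30° sinogram is so ill-posed can be illustrated with a toy parallel-beam projector. This sketch handles only the two axis-aligned angles and is purely illustrative; a real scanner samples many angles in between:

```python
def project(image, angle_deg):
    """Toy parallel-beam forward projection of a 2D grid: row sums at
    0 degrees, column sums at 90 degrees."""
    if angle_deg % 180 == 0:
        return [sum(row) for row in image]
    if angle_deg % 180 == 90:
        return [sum(col) for col in zip(*image)]
    raise ValueError("toy projector only handles 0 and 90 degrees")

def coverage_fraction(arc_deg, full_arc_deg=180):
    """Fraction of a full parallel-beam arc seen by a limited-angle scan."""
    return arc_deg / full_arc_deg
```

A 30° scan sees only one sixth of the full arc, so many different images share the same measurements and a learned prior (here, a network trained on synthetic data) must supply the rest.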

DreamStyler: Paint by Style Inversion with Text-to-Image Diffusion Models

  • paper_url: http://arxiv.org/abs/2309.06933
  • repo_url: None
  • paper_authors: Namhyuk Ahn, Junsoo Lee, Chunggi Lee, Kunhee Kim, Daesik Kim, Seung-Hun Nam, Kibeom Hong
  • for: This work explores advances in large-scale text-to-image models, in particular their application in the art domain.
  • methods: The paper proposes DreamStyler, a novel framework for artistic image synthesis. DreamStyler optimizes a multi-stage textual embedding with a context-aware text prompt and is proficient in both text-to-image synthesis and style transfer.
  • results: Experiments show that DreamStyler performs well across multiple scenarios, given either text descriptions or style references, suggesting its promising potential in artistic product creation.
    Abstract Recent progress in large-scale text-to-image models has yielded remarkable accomplishments, finding various applications in the art domain. However, expressing unique characteristics of an artwork (e.g. brushwork, colortone, or composition) with text prompts alone may encounter limitations due to the inherent constraints of verbal description. To this end, we introduce DreamStyler, a novel framework designed for artistic image synthesis, proficient in both text-to-image synthesis and style transfer. DreamStyler optimizes a multi-stage textual embedding with a context-aware text prompt, resulting in prominent image quality. In addition, with content and style guidance, DreamStyler exhibits flexibility to accommodate a range of style references. Experimental results demonstrate its superior performance across multiple scenarios, suggesting its promising potential in artistic product creation.
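A multi-stage textual embedding assigns different prompt embeddings to different phases of the denoising trajectory. A minimal sketch of such a timestep-to-stage mapping follows (the equal-chunk schedule is an assumption for illustration, not necessarily DreamStyler's exact scheme):

```python
def stage_index(t, total_steps, num_stages):
    """Map denoising step t in [0, total_steps) to one of num_stages
    textual embeddings, splitting the trajectory into equal chunks."""
    if not 0 <= t < total_steps:
        raise ValueError("t must lie in [0, total_steps)")
    return min(num_stages - 1, t * num_stages // total_steps)
```

Early stages (coarse layout) and late stages (fine texture) can then be conditioned on separately optimized embeddings of the same style token.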

Contrast-Phys+: Unsupervised and Weakly-supervised Video-based Remote Physiological Measurement via Spatiotemporal Contrast

  • paper_url: http://arxiv.org/abs/2309.06924
  • repo_url: None
  • paper_authors: Zhaodong Sun, Xiaobai Li
  • for: This paper proposes an unsupervised and weakly-supervised remote physiological measurement method that estimates the blood volume change signal from facial videos.
  • methods: A 3DCNN model generates multiple spatiotemporal rPPG signals, and prior knowledge of rPPG is incorporated into a contrastive loss function; ground-truth signals can further be incorporated into the contrastive learning to adapt to partial or misaligned labels.
  • results: Evaluated on five publicly available datasets, Contrast-Phys+ outperforms state-of-the-art supervised methods even with partially available or misaligned ground-truth labels, or no labels at all, while also offering computational efficiency, noise robustness, and strong generalization.
    Abstract Video-based remote physiological measurement utilizes facial videos to measure the blood volume change signal, which is also called remote photoplethysmography (rPPG). Supervised methods for rPPG measurements have been shown to achieve good performance. However, the drawback of these methods is that they require facial videos with ground truth (GT) physiological signals, which are often costly and difficult to obtain. In this paper, we propose Contrast-Phys+, a method that can be trained in both unsupervised and weakly-supervised settings. We employ a 3DCNN model to generate multiple spatiotemporal rPPG signals and incorporate prior knowledge of rPPG into a contrastive loss function. We further incorporate the GT signals into contrastive learning to adapt to partial or misaligned labels. The contrastive loss encourages rPPG/GT signals from the same video to be grouped together, while pushing those from different videos apart. We evaluate our methods on five publicly available datasets that include both RGB and Near-infrared videos. Contrast-Phys+ outperforms the state-of-the-art supervised methods, even when using partially available or misaligned GT signals, or no labels at all. Additionally, we highlight the advantages of our methods in terms of computational efficiency, noise robustness, and generalization.
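The contrastive objective described above pulls rPPG signals from the same video together and pushes signals from different videos apart. A minimal margin-free sketch using MSE as the distance (the actual loss differs in detail and operates on spectral representations):

```python
def mse(a, b):
    """Mean squared error between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def contrastive_loss(positive_pairs, negative_pairs):
    """Average distance of same-video (positive) pairs minus that of
    different-video (negative) pairs; minimizing it groups positives
    together and separates negatives."""
    pos = sum(mse(a, b) for a, b in positive_pairs) / len(positive_pairs)
    neg = sum(mse(a, b) for a, b in negative_pairs) / len(negative_pairs)
    return pos - neg
```

In the weakly-supervised setting, GT signals simply join the positive set of their own video, which is how partial or misaligned labels are absorbed.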

Hydra: Multi-head Low-rank Adaptation for Parameter Efficient Fine-tuning

  • paper_url: http://arxiv.org/abs/2309.06922
  • repo_url: None
  • paper_authors: Sanghyeon Kim, Hyunmo Yang, Younghyun Kim, Youngjoon Hong, Eunbyung Park
  • for: This paper investigates a more general adapter module that combines parallel and sequential branches to adapt large-scale foundation models efficiently and expressively.
  • methods: The proposed method, named Hydra for its multi-head computational branches, combines parallel and sequential branches to integrate their capabilities, based on the analysis that the two branch types learn novel and general features, respectively. It also explicitly leverages the pre-trained weights through a linear combination of pre-trained features, improving generalization across downstream tasks.
  • results: Extensive experiments, including comparisons and ablation studies, demonstrate Hydra's efficiency and its superior performance over single-branch methods across a variety of applications. Code is available at \url{https://github.com/extremebird/Hydra}.
    Abstract The recent surge in large-scale foundation models has spurred the development of efficient methods for adapting these models to various downstream tasks. Low-rank adaptation methods, such as LoRA, have gained significant attention due to their outstanding parameter efficiency and no additional inference latency. This paper investigates a more general form of adapter module based on the analysis that parallel and sequential adaptation branches learn novel and general features during fine-tuning, respectively. The proposed method, named Hydra, due to its multi-head computational branches, combines parallel and sequential branch to integrate capabilities, which is more expressive than existing single branch methods and enables the exploration of a broader range of optimal points in the fine-tuning process. In addition, the proposed adaptation method explicitly leverages the pre-trained weights by performing a linear combination of the pre-trained features. It allows the learned features to have better generalization performance across diverse downstream tasks. Furthermore, we perform a comprehensive analysis of the characteristics of each adaptation branch with empirical evidence. Through an extensive range of experiments, encompassing comparisons and ablation studies, we substantiate the efficiency and demonstrate the superior performance of Hydra. This comprehensive evaluation underscores the potential impact and effectiveness of Hydra in a variety of applications. Our code is available on \url{https://github.com/extremebird/Hydra}
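The parallel-plus-sequential branch combination can be sketched in a few lines of plain Python. Rank-1 matrices stand in for the real low-rank adapters, and the zero-initialized up-projections mirror the usual LoRA convention so fine-tuning starts exactly at the pre-trained function; this is a sketch of the idea, not Hydra's implementation:

```python
def matvec(W, x):
    """Matrix-vector product over plain nested lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def vadd(a, b):
    return [x + y for x, y in zip(a, b)]

def hydra_layer(W0, A_par, B_par, A_seq, B_seq, x):
    """Frozen pre-trained weight W0 plus a parallel low-rank branch
    (B_par @ A_par @ x), followed by a sequential low-rank branch
    applied to the combined output."""
    h = vadd(matvec(W0, x), matvec(B_par, matvec(A_par, x)))
    return vadd(h, matvec(B_seq, matvec(A_seq, h)))
```

With both up-projections at zero the layer reproduces the pre-trained linear map; training then moves the two branches away from zero independently.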

CCSPNet-Joint: Efficient Joint Training Method for Traffic Sign Detection Under Extreme Conditions

  • paper_url: http://arxiv.org/abs/2309.06902
  • repo_url: https://github.com/haoqinhong/ccspnet-joint
  • paper_authors: Haoqin Hong, Yue Zhou, Xiangyu Shu, Xiangfang Hu
  • for: traffic sign detection in extreme conditions such as fog, rain, and motion blur
  • methods: CCSPNet, an efficient feature extraction module based on Transformers and CNNs, and joint training model CCSPNet-Joint
  • results: state-of-the-art performance in traffic sign detection under extreme conditions, with a 5.32% improvement in precision and an 18.09% improvement in mAP@.5 compared to end-to-end methods
    Abstract Traffic sign detection is an important research direction in intelligent driving. Unfortunately, existing methods often overlook extreme conditions such as fog, rain, and motion blur. Moreover, the end-to-end training strategy for image denoising and object detection models fails to utilize inter-model information effectively. To address these issues, we propose CCSPNet, an efficient feature extraction module based on Transformers and CNNs, which effectively leverages contextual information, achieves faster inference speed and provides stronger feature enhancement capabilities. Furthermore, we establish the correlation between object detection and image denoising tasks and propose a joint training model, CCSPNet-Joint, to improve data efficiency and generalization. Finally, to validate our approach, we create the CCTSDB-AUG dataset for traffic sign detection in extreme scenarios. Extensive experiments have shown that CCSPNet achieves state-of-the-art performance in traffic sign detection under extreme conditions. Compared to end-to-end methods, CCSPNet-Joint achieves a 5.32% improvement in precision and an 18.09% improvement in mAP@.5.
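Joint training of the denoiser and detector can be sketched as a weighted sum of the two objectives, so detection gradients also shape the denoised features (the simple weighting scheme here is an assumption for illustration, not the paper's exact formulation):

```python
def joint_loss(detection_loss, denoising_loss, lam=0.5):
    """Couple the detector and denoiser objectives: minimizing this
    single scalar back-propagates through both models at once."""
    return detection_loss + lam * denoising_loss
```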

MagiCapture: High-Resolution Multi-Concept Portrait Customization

  • paper_url: http://arxiv.org/abs/2309.06895
  • repo_url: None
  • paper_authors: Junha Hyung, Jaeyo Shin, Jaegul Choo
  • for: This paper personalizes large-scale text-to-image models, including Stable Diffusion, to generate high-fidelity portrait images.
  • methods: The proposed personalization method integrates subject and style concepts to generate high-resolution portraits from just a few subject and style reference images. It introduces a novel Attention Refocusing loss coupled with auxiliary priors to enable robust learning in this weakly supervised setting.
  • results: In the authors' evaluation, MagiCapture produces high-quality portrait outputs, outperforming other baselines, and also generalizes to other non-human objects.
    Abstract Large-scale text-to-image models including Stable Diffusion are capable of generating high-fidelity photorealistic portrait images. There is an active research area dedicated to personalizing these models, aiming to synthesize specific subjects or styles using provided sets of reference images. However, despite the plausible results from these personalization methods, they tend to produce images that often fall short of realism and are not yet on a commercially viable level. This is particularly noticeable in portrait image generation, where any unnatural artifact in human faces is easily discernible due to our inherent human bias. To address this, we introduce MagiCapture, a personalization method for integrating subject and style concepts to generate high-resolution portrait images using just a few subject and style references. For instance, given a handful of random selfies, our fine-tuned model can generate high-quality portrait images in specific styles, such as passport or profile photos. The main challenge with this task is the absence of ground truth for the composed concepts, leading to a reduction in the quality of the final output and an identity shift of the source subject. To address these issues, we present a novel Attention Refocusing loss coupled with auxiliary priors, both of which facilitate robust learning within this weakly supervised learning setting. Our pipeline also includes additional post-processing steps to ensure the creation of highly realistic outputs. MagiCapture outperforms other baselines in both quantitative and qualitative evaluations and can also be generalized to other non-human objects.

Keep It SimPool: Who Said Supervised Transformers Suffer from Attention Deficit?

  • paper_url: http://arxiv.org/abs/2309.06891
  • repo_url: https://github.com/billpsomas/simpool
  • paper_authors: Bill Psomas, Ioannis Kakogeorgiou, Konstantinos Karantzalos, Yannis Avrithis
  • for: This paper aims to improve the performance of both convolutional and transformer encoders by developing a generic pooling framework and proposing a simple attention-based pooling mechanism called SimPool.
  • methods: The paper uses a combination of theoretical analysis and experimental evaluation to compare the properties of different pooling methods and derive the SimPool mechanism. The authors also propose a simple attention mechanism that can be used as a replacement for the default pooling method in both convolutional and transformer encoders.
  • results: The paper shows that SimPool improves performance on pre-training and downstream tasks, and provides attention maps that delineate object boundaries in all cases, whether supervised or self-supervised. The authors claim that SimPool is “universal” because it can be used with any type of supervision or attention mechanism, and it provides attention maps of at least as good quality as self-supervised methods without explicit losses or modifying the architecture.
    Abstract Convolutional networks and vision transformers have different forms of pairwise interactions, pooling across layers and pooling at the end of the network. Does the latter really need to be different? As a by-product of pooling, vision transformers provide spatial attention for free, but this is most often of low quality unless self-supervised, which is not well studied. Is supervision really the problem? In this work, we develop a generic pooling framework and then we formulate a number of existing methods as instantiations. By discussing the properties of each group of methods, we derive SimPool, a simple attention-based pooling mechanism as a replacement of the default one for both convolutional and transformer encoders. We find that, whether supervised or self-supervised, this improves performance on pre-training and downstream tasks and provides attention maps delineating object boundaries in all cases. One could thus call SimPool universal. To our knowledge, we are the first to obtain attention maps in supervised transformers of at least as good quality as self-supervised, without explicit losses or modifying the architecture. Code at: https://github.com/billpsomas/simpool.
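The core of attention-based pooling such as SimPool is a single query attending over patch features, producing both a pooled vector and an attention map for free. A minimal sketch (the query here is a plain vector rather than SimPool's learned, GAP-initialized query):

```python
import math

def attention_pool(query, patch_features):
    """Single-query attention pooling: softmax over query-patch dot
    products, then a weighted average of the patch features. The
    softmax weights double as a spatial attention map."""
    scores = [sum(q * p for q, p in zip(query, feat)) for feat in patch_features]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(patch_features[0])
    return [sum(w * feat[d] for w, feat in zip(weights, patch_features))
            for d in range(dim)]
```

Replacing global average pooling with this mechanism is the kind of drop-in change the paper studies for both convolutional and transformer encoders.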

Manufacturing Quality Control with Autoencoder-Based Defect Localization and Unsupervised Class Selection

  • paper_url: http://arxiv.org/abs/2309.06884
  • repo_url: None
  • paper_authors: Devang Mehta, Noah Klarmann
  • For: This paper aims to improve visual defect localization in manufacturing industries using a defect localizing autoencoder with unsupervised class selection.
  • Methods: The proposed method uses a pre-trained VGG-16 network to extract features, which are then clustered using k-means to select the most relevant classes of defects. The selected classes are augmented with natural wild textures to simulate artificial defects.
  • Results: The proposed method demonstrates effectiveness in improving defect detection in manufacturing industries, with precise and accurate localization of quality defects on melamine-faced boards for the furniture industry. Incorporating artificial defects into the training data shows significant potential for practical implementation in real-world quality control scenarios.
    Abstract Manufacturing industries require efficient and voluminous production of high-quality finished goods. In the context of Industry 4.0, visual anomaly detection poses an optimistic solution for automatically controlling product quality with high precision. Automation based on computer vision poses a promising solution to prevent bottlenecks at the product quality checkpoint. We considered recent advancements in machine learning to improve visual defect localization, but challenges persist in obtaining a balanced feature set and database of the wide variety of defects occurring in the production line. This paper proposes a defect localizing autoencoder with unsupervised class selection by clustering with k-means the features extracted from a pre-trained VGG-16 network. The selected classes of defects are augmented with natural wild textures to simulate artificial defects. The study demonstrates the effectiveness of the defect localizing autoencoder with unsupervised class selection for improving defect detection in manufacturing industries. The proposed methodology shows promising results with precise and accurate localization of quality defects on melamine-faced boards for the furniture industry. Incorporating artificial defects into the training data shows significant potential for practical implementation in real-world quality control scenarios.
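Defect localization with an autoencoder boils down to thresholding the per-pixel reconstruction error: regions the model cannot reconstruct are flagged as anomalous. A minimal sketch (the squared-error metric and fixed threshold are illustrative):

```python
def defect_mask(image, reconstruction, threshold):
    """Binary defect mask over a 2D image: 1 where the squared
    per-pixel reconstruction error exceeds the threshold, else 0."""
    return [[int((a - b) ** 2 > threshold) for a, b in zip(row_i, row_r)]
            for row_i, row_r in zip(image, reconstruction)]
```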

ProMap: Datasets for Product Mapping in E-commerce

  • paper_url: http://arxiv.org/abs/2309.06882
  • repo_url: None
  • paper_authors: Kateřina Macková, Martin Pilát
  • for: Two datasets for deciding whether two listings from two different e-shops describe the same product.
  • methods: The datasets contain both images and textual descriptions of the products, including their specifications, making them among the most complete datasets for product mapping.
  • results: The datasets provide a golden standard for product mapping that fills the gaps in existing datasets and can be used to train and evaluate machine-learning matching algorithms.
    Abstract The goal of product mapping is to decide, whether two listings from two different e-shops describe the same products. Existing datasets of matching and non-matching pairs of products, however, often suffer from incomplete product information or contain only very distant non-matching products. Therefore, while predictive models trained on these datasets achieve good results on them, in practice, they are unusable as they cannot distinguish very similar but non-matching pairs of products. This paper introduces two new datasets for product mapping: ProMapCz consisting of 1,495 Czech product pairs and ProMapEn consisting of 1,555 English product pairs of matching and non-matching products manually scraped from two pairs of e-shops. The datasets contain both images and textual descriptions of the products, including their specifications, making them one of the most complete datasets for product mapping. Additionally, the non-matching products were selected in two phases, creating two types of non-matches -- close non-matches and medium non-matches. Even the medium non-matches are pairs of products that are much more similar than non-matches in other datasets -- for example, they still need to have the same brand and similar name and price. After simple data preprocessing, several machine learning algorithms were trained on these and two the other datasets to demonstrate the complexity and completeness of ProMap datasets. ProMap datasets are presented as a golden standard for further research of product mapping filling the gaps in existing ones.
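A product-matching model trained on datasets like these typically consumes simple pairwise features before (or alongside) learned text/image embeddings; the paper notes medium non-matches still share brand and similar name and price, which is exactly what such features capture. A toy sketch (field names and the feature choice are illustrative, not the paper's pipeline):

```python
def pair_features(a, b):
    """Toy feature vector for a candidate product pair: brand equality,
    Jaccard overlap of name tokens, and relative price difference."""
    ta = set(a["name"].lower().split())
    tb = set(b["name"].lower().split())
    jaccard = len(ta & tb) / len(ta | tb)
    price_gap = abs(a["price"] - b["price"]) / max(a["price"], b["price"])
    return [float(a["brand"] == b["brand"]), jaccard, price_gap]
```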

Video Infringement Detection via Feature Disentanglement and Mutual Information Maximization

  • paper_url: http://arxiv.org/abs/2309.06877
  • repo_url: https://github.com/yyyooooo/dmi
  • paper_authors: Zhenguang Liu, Xinyang Yu, Ruili Wang, Shuai Ye, Zhe Ma, Jianfeng Dong, Sifeng He, Feng Qian, Xiaobo Zhang, Roger Zimmermann, Lei Yang
  • for: The goal is to improve the accuracy of video infringement detection, protecting the interests and enthusiasm of video creators.
  • methods: Two techniques are proposed: (1) disentangling the original high-dimensional feature into multiple sub-features that encode non-overlapping semantics, removing redundant information; (2) on top of the disentangled sub-features, learning an auxiliary feature to enhance them.
  • results: Experiments on two large-scale datasets (SVD and VCSL) show that the method achieves 90.1% TOP-100 mAP on SVD and sets a new state of the art on VCSL. Code and models are released at https://github.com/yyyooooo/DMI/ as a contribution to the community.
    Abstract The self-media era provides us tremendous high quality videos. Unfortunately, frequent video copyright infringements are now seriously damaging the interests and enthusiasm of video creators. Identifying infringing videos is therefore a compelling task. Current state-of-the-art methods tend to simply feed high-dimensional mixed video features into deep neural networks and count on the networks to extract useful representations. Despite its simplicity, this paradigm heavily relies on the original entangled features and lacks constraints guaranteeing that useful task-relevant semantics are extracted from the features. In this paper, we seek to tackle the above challenges from two aspects: (1) We propose to disentangle an original high-dimensional feature into multiple sub-features, explicitly disentangling the feature into exclusive lower-dimensional components. We expect the sub-features to encode non-overlapping semantics of the original feature and remove redundant information. (2) On top of the disentangled sub-features, we further learn an auxiliary feature to enhance the sub-features. We theoretically analyzed the mutual information between the label and the disentangled features, arriving at a loss that maximizes the extraction of task-relevant information from the original feature. Extensive experiments on two large-scale benchmark datasets (i.e., SVD and VCSL) demonstrate that our method achieves 90.1% TOP-100 mAP on the large-scale SVD dataset and also sets the new state-of-the-art on the VCSL benchmark dataset. Our code and model have been released at https://github.com/yyyooooo/DMI/, hoping to contribute to the community.
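The first step, splitting a high-dimensional feature into exclusive lower-dimensional sub-features, can be sketched as a plain equal split; the paper learns the disentanglement (and enforces it with a mutual-information objective), so this fixed slicing is only illustrative:

```python
def disentangle(feature, num_sub):
    """Split a flat feature vector into equal, non-overlapping
    sub-features; the dimensionality must divide evenly."""
    n = len(feature)
    if n % num_sub:
        raise ValueError("feature length must be divisible by num_sub")
    step = n // num_sub
    return [feature[i * step:(i + 1) * step] for i in range(num_sub)]
```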

UniBrain: Universal Brain MRI Diagnosis with Hierarchical Knowledge-enhanced Pre-training

  • paper_url: http://arxiv.org/abs/2309.06828
  • repo_url: https://github.com/ljy19970415/unibrain
  • paper_authors: Jiayu Lei, Lisong Dai, Haoyun Jiang, Chaoyi Wu, Xiaoman Zhang, Yao Zhang, Jiangchao Yao, Weidi Xie, Yanyong Zhang, Yuehua Li, Ya Zhang, Yanfeng Wang
  • for: This work proposes an efficient and scalable pre-training approach on large-scale data to improve the accuracy and efficiency of brain disease diagnosis.
  • methods: A hierarchical knowledge-enhanced pre-training framework, UniBrain, leverages a large-scale dataset of 24,770 imaging-report pairs from routine diagnostics and a hierarchical alignment mechanism that strengthens the efficiency of feature learning.
  • results: Validated on three real-world datasets with severe class imbalance and the public BraTS2019 dataset, it consistently outperforms all state-of-the-art diagnostic methods by a large margin and performs comparably to expert radiologists on certain disease types.
    Abstract Magnetic resonance imaging (MRI) has played a crucial role in brain disease diagnosis, with which a range of computer-aided artificial intelligence methods have been proposed. However, the early explorations usually focus on the limited types of brain diseases in one study and train the model on the data in a small scale, yielding the bottleneck of generalization. Towards a more effective and scalable paradigm, we propose a hierarchical knowledge-enhanced pre-training framework for the universal brain MRI diagnosis, termed as UniBrain. Specifically, UniBrain leverages a large-scale dataset of 24,770 imaging-report pairs from routine diagnostics. Different from previous pre-training techniques for the unitary vision or textual feature, or with the brute-force alignment between vision and language information, we leverage the unique characteristic of report information in different granularity to build a hierarchical alignment mechanism, which strengthens the efficiency in feature learning. Our UniBrain is validated on three real world datasets with severe class imbalance and the public BraTS2019 dataset. It not only consistently outperforms all state-of-the-art diagnostic methods by a large margin and provides a superior grounding performance but also shows comparable performance compared to expert radiologists on certain disease types.
    摘要 磁共振成像(MRI)在脑疾病诊断中发挥着关键作用,围绕它已提出了一系列计算机辅助人工智能方法。然而,早期的探索通常只关注单一研究中有限的疾病类型,并在小规模数据上训练模型,导致泛化能力受限。为了建立更有效、可扩展的范式,我们提出了一种面向通用脑部MRI诊断的层次知识增强预训练框架,称为UniBrain。具体而言,UniBrain利用了来自常规诊断的24,770对影像-报告大规模数据集。与以往仅针对单一视觉或文本特征的预训练技术,或在视觉与语言信息之间进行粗暴对齐的方法不同,我们利用报告信息在不同粒度上的独特特性构建了层次对齐机制,从而提高特征学习的效率。UniBrain在三个存在严重类别不平衡的真实世界数据集和公开的BraTS2019数据集上得到验证,不仅以较大优势持续超越所有最新的诊断方法并提供更优的定位性能,而且在某些疾病类型上表现与专业放射科医生相当。

Topology-inspired Cross-domain Network for Developmental Cervical Stenosis Quantification

  • paper_url: http://arxiv.org/abs/2309.06825
  • repo_url: None
  • paper_authors: Zhenxi Zhang, Yanyang Wang, Yao Wu, Weifei Wu
  • for: 对发育性椎管狭窄(Developmental Canal Stenosis,DCS)进行量化,以便筛查颈椎病。
  • methods: 使用深度关键点定位网络,并在坐标域和图像域之间进行跨域协同,以提高量化的准确性和效率。
  • results: 提出了一种名为 Topology-inspired Cross-domain Network (TCN) 的方法,可以更好地抑制关键点定位中的异常拓扑结构(如带边缘的关键点扭曲和弱连接结构),并提高了量化的准确性和泛化性。
    Abstract Developmental Canal Stenosis (DCS) quantification is crucial in cervical spondylosis screening. Compared with quantifying DCS manually, a more efficient and time-saving manner is provided by deep keypoint localization networks, which can be implemented in either the coordinate or the image domain. However, the vertebral visualization features often lead to abnormal topological structures during keypoint localization, including keypoint distortion with edges and weakly connected structures, which cannot be fully suppressed in either the coordinate or image domain alone. To overcome this limitation, a keypoint-edge and a reparameterization modules are utilized to restrict these abnormal structures in a cross-domain manner. The keypoint-edge constraint module restricts the keypoints on the edges of vertebrae, which ensures that the distribution pattern of keypoint coordinates is consistent with those for DCS quantification. And the reparameterization module constrains the weakly connected structures in image-domain heatmaps with coordinates combined. Moreover, the cross-domain network improves spatial generalization by utilizing heatmaps and incorporating coordinates for accurate localization, which avoids the trade-off between these two properties in an individual domain. Comprehensive results of distinct quantification tasks show the superiority and generability of the proposed Topology-inspired Cross-domain Network (TCN) compared with other competing localization methods.
    摘要 发育性椎管狭窄(DCS)的量化在颈椎病筛查中至关重要。与人工量化DCS相比,深度关键点定位网络提供了更高效、省时的方式,它可以在坐标域或图像域中实现。然而,椎体的可视化特征常常在关键点定位过程中导致异常的拓扑结构,包括带边缘的关键点扭曲和弱连接结构,而这些结构无法仅靠坐标域或图像域单独完全抑制。为了克服这一限制,我们利用关键点-边缘约束模块和重参数化模块,以跨域方式约束这些异常结构。关键点-边缘约束模块将关键点限制在椎体边缘上,确保关键点坐标的分布模式与DCS量化所需的分布一致;重参数化模块则结合坐标,对图像域热图中的弱连接结构进行约束。此外,跨域网络通过利用热图并结合坐标进行精确定位来提高空间泛化能力,避免了在单一域中这两种性质之间的权衡。在多个量化任务上的综合结果表明,所提出的拓扑启发跨域网络(TCN)相比其他竞争性定位方法具有优越性和泛化性。

Tracking Particles Ejected From Active Asteroid Bennu With Event-Based Vision

  • paper_url: http://arxiv.org/abs/2309.06819
  • repo_url: None
  • paper_authors: Loïc J. Azzalini, Dario Izzo
  • for: 及早探测并跟踪太阳系小天体附近的喷发物,以保障航天器安全并支持科学观测。
  • methods: 使用事件相机检测和跟踪厘米级的粒子,而不是使用标准的帧式相机。
  • results: 可以提高类似时间受限任务的科学回报,并且可以补充现有航天器上的成像技术。
    Abstract Early detection and tracking of ejecta in the vicinity of small solar system bodies is crucial to guarantee spacecraft safety and support scientific observation. During the visit of active asteroid Bennu, the OSIRIS-REx spacecraft relied on the analysis of images captured by onboard navigation cameras to detect particle ejection events, which ultimately became one of the mission's scientific highlights. To increase the scientific return of similar time-constrained missions, this work proposes an event-based solution that is dedicated to the detection and tracking of centimetre-sized particles. Unlike a standard frame-based camera, the pixels of an event-based camera independently trigger events indicating whether the scene brightness has increased or decreased at that time and location in the sensor plane. As a result of the sparse and asynchronous spatiotemporal output, event cameras combine very high dynamic range and temporal resolution with low-power consumption, which could complement existing onboard imaging techniques. This paper motivates the use of a scientific event camera by reconstructing the particle ejection episodes reported by the OSIRIS-REx mission in a photorealistic scene generator and in turn, simulating event-based observations. The resulting streams of spatiotemporal data support future work on event-based multi-object tracking.
    摘要 及早探测并跟踪太阳系小天体附近的喷发物,对于保障航天器安全和支持科学观测至关重要。在造访活跃小行星Bennu期间,OSIRIS-REx航天器依靠分析机载导航相机拍摄的图像来探测粒子喷发事件,这最终成为该任务的科学亮点之一。为了提高类似时间受限任务的科学回报,本工作提出了一种专门用于探测和跟踪厘米级粒子的事件相机方案。与标准的帧式相机不同,事件相机的每个像素独立触发事件,指示传感器平面上该时刻、该位置的场景亮度是增加还是减少。得益于稀疏且异步的时空输出,事件相机兼具极高的动态范围、时间分辨率和低功耗,可以补充现有的机载成像技术。本文通过在照片级真实的场景生成器中重建OSIRIS-REx任务报告的粒子喷发事件,进而模拟基于事件的观测,论证了科学事件相机的应用价值。所得到的时空数据流将支持未来基于事件的多目标跟踪工作。
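
The event-generation principle described in the abstract — each pixel independently firing when its log-brightness changes — can be sketched in a few lines. This is a generic, simplified event-camera model for intuition only, not the OSIRIS-REx pipeline; the threshold value and tiny frame sequence are illustrative assumptions.

```python
import math

def generate_events(frames, threshold=0.2):
    """Emit (t, x, y, polarity) tuples whenever a pixel's log-brightness
    changes by at least `threshold` since the last event at that pixel.
    `frames` is a list of 2D lists of positive brightness values."""
    h, w = len(frames[0]), len(frames[0][0])
    ref = [[math.log(v) for v in row] for row in frames[0]]  # log level at last event
    events = []
    for t, frame in enumerate(frames[1:], start=1):
        for y in range(h):
            for x in range(w):
                diff = math.log(frame[y][x]) - ref[y][x]
                if abs(diff) >= threshold:
                    events.append((t, x, y, 1 if diff > 0 else -1))
                    ref[y][x] = math.log(frame[y][x])
    return events

# A bright "particle" moving across an otherwise static 1x3 scene:
frames = [
    [[1.0, 1.0, 1.0]],
    [[2.0, 1.0, 1.0]],  # brightness rises at x=0
    [[1.0, 2.0, 1.0]],  # falls back at x=0, rises at x=1
]
events = generate_events(frames)
```

Only the pixels that change produce output, which is why the stream stays sparse even when the background is static.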

TAP: Targeted Prompting for Task Adaptive Generation of Textual Training Instances for Visual Classification

  • paper_url: http://arxiv.org/abs/2309.06809
  • repo_url: None
  • paper_authors: M. Jehanzeb Mirza, Leonid Karlinsky, Wei Lin, Horst Possegger, Rogerio Feris, Horst Bischof
  • for: 本研究旨在提高CLIP等视觉语言模型(VLM)的视觉识别性能,使其能够更好地适应下游任务的数据分布。
  • methods: 本研究使用文本生成模型(LLM)生成的文本数据进行VLM的单向training,以提高其视觉识别性能。
  • results: 与最先进的纯文本VLM训练方法相比,本研究在(跨)领域特定适应上最高提升8.4%,在细粒度识别上最高提升8.7%,在零样本分类上相对强基线平均提升3.1%。
    Abstract Vision and Language Models (VLMs), such as CLIP, have enabled visual recognition of a potentially unlimited set of categories described by text prompts. However, for the best visual recognition performance, these models still require tuning to better fit the data distributions of the downstream tasks, in order to overcome the domain shift from the web-based pre-training data. Recently, it has been shown that it is possible to effectively tune VLMs without any paired data, and in particular to effectively improve VLMs visual recognition performance using text-only training data generated by Large Language Models (LLMs). In this paper, we dive deeper into this exciting text-only VLM training approach and explore ways it can be significantly further improved taking the specifics of the downstream task into account when sampling text data from LLMs. In particular, compared to the SOTA text-only VLM training approach, we demonstrate up to 8.4% performance improvement in (cross) domain-specific adaptation, up to 8.7% improvement in fine-grained recognition, and 3.1% overall average improvement in zero-shot classification compared to strong baselines.
    摘要 视觉与语言模型(VLM),如CLIP,已经实现了对由文本提示描述的潜在无限类别集合的视觉识别。然而,为了获得最佳的视觉识别性能,这些模型仍需要调优以更好地适应下游任务的数据分布,从而克服来自网络预训练数据的领域偏移。最近的研究表明,无需任何成对数据即可有效调优VLM,特别是可以利用大语言模型(LLM)生成的纯文本训练数据来有效提升VLM的视觉识别性能。在本文中,我们深入研究这种令人兴奋的纯文本VLM训练方法,并探索在从LLM采样文本数据时结合下游任务的特点来进一步显著改进它的途径。与最新的纯文本VLM训练方法相比,我们在(跨)领域特定适应上取得了最高8.4%的性能提升,在细粒度识别上取得了最高8.7%的提升,在零样本分类上相对强基线取得了平均3.1%的总体提升。

Dynamic NeRFs for Soccer Scenes

  • paper_url: http://arxiv.org/abs/2309.06802
  • repo_url: https://github.com/iSach/SoccerNeRFs
  • paper_authors: Sacha Lewin, Maxime Vandegar, Thomas Hoyoux, Olivier Barnich, Gilles Louppe
  • for: 本研究旨在解决长期困扰 novel view synthesis 领域的问题,具体来说是为 sports broadcasting 领域提供高质量的 synthetic replay。
  • methods: 本研究使用 neural radiance fields (NeRFs) 技术来解决这个问题,NeRFs 是一种基于深度学习的方法,可以生成高品质的视觉效果。
  • results: 研究表明,使用 NeRFs 技术可以在 soccer 场景中重构场景,但是这种方法无法完全满足 target 应用的质量要求。然而,这种方法仍然表现出了扎实的推动力,并且开发出了一个可用的 dataset 和代码。
    Abstract The long-standing problem of novel view synthesis has many applications, notably in sports broadcasting. Photorealistic novel view synthesis of soccer actions, in particular, is of enormous interest to the broadcast industry. Yet only a few industrial solutions have been proposed, and even fewer that achieve near-broadcast quality of the synthetic replays. Except for their setup of multiple static cameras around the playfield, the best proprietary systems disclose close to no information about their inner workings. Leveraging multiple static cameras for such a task indeed presents a challenge rarely tackled in the literature, for a lack of public datasets: the reconstruction of a large-scale, mostly static environment, with small, fast-moving elements. Recently, the emergence of neural radiance fields has induced stunning progress in many novel view synthesis applications, leveraging deep learning principles to produce photorealistic results in the most challenging settings. In this work, we investigate the feasibility of basing a solution to the task on dynamic NeRFs, i.e., neural models purposed to reconstruct general dynamic content. We compose synthetic soccer environments and conduct multiple experiments using them, identifying key components that help reconstruct soccer scenes with dynamic NeRFs. We show that, although this approach cannot fully meet the quality requirements for the target application, it suggests promising avenues toward a cost-efficient, automatic solution. We also make our work dataset and code publicly available, with the goal to encourage further efforts from the research community on the task of novel view synthesis for dynamic soccer scenes. For code, data, and video results, please see https://soccernerfs.isach.be.
    摘要 长期存在的新视角合成问题具有广泛的应用,尤其是在体育直播领域。高真实感的足球动作新视角合成对广播行业极具价值。然而,目前仅有少数工业方案被提出,而能达到接近广播质量合成回放的更是寥寥无几。除了在场地周围布置多台静止摄像机这一设置之外,最好的专有系统几乎不公开任何内部工作原理。利用多台静止摄像机完成此任务确实是文献中很少涉及的挑战,因为缺乏公开数据集:需要重建一个大尺度、基本静止的环境,其中包含小而快速移动的元素。近年来,神经辐射场(NeRF)的出现在许多新视角合成应用中带来了惊人的进展,利用深度学习原理在最具挑战性的场景中生成照片级真实的结果。在本工作中,我们研究了基于动态NeRF(即用于重建一般动态内容的神经模型)来解决该任务的可行性。我们构建了合成足球环境并在其上开展多组实验,找出了有助于用动态NeRF重建足球场景的关键组件。我们表明,尽管该方法尚不能完全满足目标应用的质量要求,但它为低成本的自动化方案指出了有希望的方向。我们还公开了数据集和代码,以鼓励研究社区在动态足球场景新视角合成任务上的进一步努力。代码、数据和视频结果请见 https://soccernerfs.isach.be 。
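
As background for the NeRF-based approach above, the positional encoding that NeRF-style models apply to input coordinates can be sketched as follows. This is a generic building block of the NeRF family, not this paper's specific dynamic-NeRF architecture; the frequency count is an arbitrary illustrative choice.

```python
import math

def positional_encoding(x, num_freqs=4):
    """Map a scalar coordinate to [sin(2^k * pi * x), cos(2^k * pi * x)]
    features for k = 0..num_freqs-1, letting an MLP represent
    high-frequency scene detail from low-dimensional inputs."""
    feats = []
    for k in range(num_freqs):
        feats.append(math.sin((2 ** k) * math.pi * x))
        feats.append(math.cos((2 ** k) * math.pi * x))
    return feats

encoded = positional_encoding(0.25, num_freqs=2)  # 4 features for one coordinate
```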

Motion-Bias-Free Feature-Based SLAM

  • paper_url: http://arxiv.org/abs/2309.06792
  • repo_url: None
  • paper_authors: Alejandro Fontan, Javier Civera, Michael Milford
  • for: 提高 SLAM 在未知环境中安全部署,需要具备一些关键性能,而现有的标准测试不能完全覆盖这些性能。
  • methods: 本文提出了一些改进 feature-based SLAM 管道,以解决前后方向行程偏好问题。
  • results: 在四个数据集上的综合评估中,我们的改进显著减少了前后行驶方向之间的偏差,同时改善了总体轨迹误差。消除SLAM运动偏差对于广泛的、注重性能一致性的机器人和计算机视觉应用具有重要意义。
    Abstract For SLAM to be safely deployed in unstructured real world environments, it must possess several key properties that are not encompassed by conventional benchmarks. In this paper we show that SLAM commutativity, that is, consistency in trajectory estimates on forward and reverse traverses of the same route, is a significant issue for the state of the art. Current pipelines show a significant bias between forward and reverse directions of travel, that is in addition inconsistent regarding which direction of travel exhibits better performance. In this paper we propose several contributions to feature-based SLAM pipelines that remedies the motion bias problem. In a comprehensive evaluation across four datasets, we show that our contributions implemented in ORB-SLAM2 substantially reduce the bias between forward and backward motion and additionally improve the aggregated trajectory error. Removing the SLAM motion bias has significant relevance for the wide range of robotics and computer vision applications where performance consistency is important.
    摘要 为了让SLAM能够安全地部署在非结构化的真实环境中,它必须具备一些传统基准测试未涵盖的关键特性。在本文中,我们表明SLAM的可交换性,即在同一路线的正向与反向行驶中轨迹估计的一致性,是当前最先进方法面临的重要问题。当前的管线在前进与倒退行驶方向之间表现出显著偏差,而且哪个行驶方向性能更好也并不一致。本文对基于特征的SLAM管线提出了若干改进,以解决运动偏差问题。在四个数据集上的综合评估中,我们在ORB-SLAM2中实现的改进显著减少了前进与倒退运动之间的偏差,并进一步改善了总体轨迹误差。消除SLAM运动偏差对于广泛的、注重性能一致性的机器人和计算机视觉应用具有重要意义。

Remote Sensing Object Detection Meets Deep Learning: A Meta-review of Challenges and Advances

  • paper_url: http://arxiv.org/abs/2309.06751
  • repo_url: None
  • paper_authors: Xiangrong Zhang, Tianyang Zhang, Guanchun Wang, Peng Zhu, Xu Tang, Xiuping Jia, Licheng Jiao
  • for: 本文对基于深度学习的遥感目标检测(RSOD)技术的最新成果进行了全面综述。
  • methods: 文章系统地介绍了 RSOD 领域中的五大挑战,并对它们的应用进行了分层分类。
  • results: 文章评论了 widely used 的 benchmark datasets 和评价指标,以及 RSOD 在不同应用场景中的应用。
    Abstract Remote sensing object detection (RSOD), one of the most fundamental and challenging tasks in the remote sensing field, has received longstanding attention. In recent years, deep learning techniques have demonstrated robust feature representation capabilities and led to a big leap in the development of RSOD techniques. In this era of rapid technical evolution, this review aims to present a comprehensive review of the recent achievements in deep learning based RSOD methods. More than 300 papers are covered in this review. We identify five main challenges in RSOD, including multi-scale object detection, rotated object detection, weak object detection, tiny object detection, and object detection with limited supervision, and systematically review the corresponding methods developed in a hierarchical division manner. We also review the widely used benchmark datasets and evaluation metrics within the field of RSOD, as well as the application scenarios for RSOD. Future research directions are provided for further promoting the research in RSOD.
    摘要 遥感目标检测(RSOD)是遥感领域中最基础、最具挑战性的任务之一,长期以来受到广泛关注。近年来,深度学习技术展现出强大的特征表示能力,推动RSOD技术取得了巨大进步。在技术快速演进的今天,本文对基于深度学习的RSOD方法的最新成果进行了全面回顾,涵盖了超过300篇论文。我们归纳了RSOD中的5个主要挑战,即多尺度目标检测、旋转目标检测、弱目标检测、微小目标检测和有限监督下的目标检测,并以层次划分的方式系统地综述了相应的方法。此外,我们还回顾了RSOD领域中广泛使用的基准数据集和评价指标,以及RSOD的应用场景。最后,我们提出了未来研究方向,以进一步推动RSOD领域的研究。

MFL-YOLO: An Object Detection Model for Damaged Traffic Signs

  • paper_url: http://arxiv.org/abs/2309.06750
  • repo_url: None
  • paper_authors: Tengyang Chen, Jiangtao Ren
  • for: 这个论文的目的是提出一种基于YOLOv5s的改进对象检测方法,以检测损坏的交通标志。
  • methods: 该方法使用了一种简单的跨层损失函数,使模型在不同层次有不同的角色,从而学习更多样化的特征。此外,模型还使用GSConv和VoVGSCSP替代传统的卷积和CSP。
  • results: 相比YOLOv5s,我们的MFL-YOLO方法在F1分数和mAP上分别提高4.3和5.1,同时将FLOPs降低了8.9%。在CCTSDB2021和TT100K上进行了进一步的验证,以证明我们的模型具有更好的泛化能力。
    Abstract Traffic signs are important facilities to ensure traffic safety and smooth flow, but may be damaged due to many reasons, which poses a great safety hazard. Therefore, it is important to study a method to detect damaged traffic signs. Existing object detection techniques for damaged traffic signs are still absent. Since damaged traffic signs are closer in appearance to normal ones, it is difficult to capture the detailed local damage features of damaged traffic signs using traditional object detection methods. In this paper, we propose an improved object detection method based on YOLOv5s, namely MFL-YOLO (Mutual Feature Levels Loss enhanced YOLO). We designed a simple cross-level loss function so that each level of the model has its own role, which is beneficial for the model to be able to learn more diverse features and improve the fine granularity. The method can be applied as a plug-and-play module and it does not increase the structural complexity or the computational complexity while improving the accuracy. We also replaced the traditional convolution and CSP with the GSConv and VoVGSCSP in the neck of YOLOv5s to reduce the scale and computational complexity. Compared with YOLOv5s, our MFL-YOLO improves 4.3 and 5.1 in F1 scores and mAP, while reducing the FLOPs by 8.9%. The Grad-CAM heat map visualization shows that our model can better focus on the local details of the damaged traffic signs. In addition, we also conducted experiments on CCTSDB2021 and TT100K to further validate the generalization of our model.
    摘要 交通标志是保障交通安全与顺畅的重要设施,但可能因多种原因受损,带来严重的安全隐患。因此,研究一种检测受损交通标志的方法非常重要,而目前尚缺乏针对受损交通标志的目标检测技术。由于受损交通标志与正常标志在外观上相近,使用传统的目标检测方法难以捕捉其细致的局部损坏特征。在这篇论文中,我们提出了一种基于YOLOv5s的改进目标检测方法,即MFL-YOLO(多级特征水平损失增强YOLO)。我们设计了一个简单的跨级损失函数,使模型的每一级都有自己的角色,有助于模型学习更多样化的特征,提高细粒度。这种方法可以作为即插即用模块使用,在提高准确率的同时不增加结构复杂度或计算复杂度。我们还在YOLOv5s的颈部将传统的卷积和CSP替换为GSConv和VoVGSCSP,以减少规模和计算复杂度。与YOLOv5s相比,我们的MFL-YOLO在F1分数和mAP上分别提高了4.3和5.1,同时将FLOPs降低了8.9%。Grad-CAM热力图可视化表明,我们的模型能更好地关注受损交通标志的局部细节。此外,我们还在CCTSDB2021和TT100K上进行了实验,以进一步验证我们模型的泛化能力。
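
The cross-level loss idea above — each level of the model keeping its own role via its own supervised term — can be illustrated with a toy weighted sum. This is a hedged sketch: the level names `P3`–`P5`, the weights, and the combination rule are invented for illustration, not the paper's actual MFL formulation over detection outputs.

```python
def mutual_feature_levels_loss(level_losses, level_weights):
    """Combine per-level losses so each pyramid level carries its own
    supervision signal (and thus its own role) rather than sharing one
    aggregated objective. Inputs map level name -> scalar."""
    return sum(level_weights[name] * loss for name, loss in level_losses.items())

total = mutual_feature_levels_loss(
    {"P3": 0.8, "P4": 0.5, "P5": 0.2},    # hypothetical per-level detection losses
    {"P3": 1.0, "P4": 0.5, "P5": 0.25},   # hypothetical per-level weights/roles
)
```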

Integrating GAN and Texture Synthesis for Enhanced Road Damage Detection

  • paper_url: http://arxiv.org/abs/2309.06747
  • repo_url: None
  • paper_authors: Tengyang Chen, Jiangtao Ren
  • for: 提高道路破损检测精度,以保障安全驾驶并延长道路使用寿命
  • methods: 使用生成对抗网络生成多种形态的道路破损,并利用纹理合成技术提取道路纹理,以控制破损的严重程度
  • results: 提高了4.1%的mAP和4.5%的F1-score
    Abstract In the domain of traffic safety and road maintenance, precise detection of road damage is crucial for ensuring safe driving and prolonging road durability. However, current methods often fall short due to limited data. Prior attempts have used Generative Adversarial Networks to generate damage with diverse shapes and manually integrate it into appropriate positions. However, the problem has not been well explored and is faced with two challenges. First, they only enrich the location and shape of damage while neglect the diversity of severity levels, and the realism still needs further improvement. Second, they require a significant amount of manual effort. To address these challenges, we propose an innovative approach. In addition to using GAN to generate damage with various shapes, we further employ texture synthesis techniques to extract road textures. These two elements are then mixed with different weights, allowing us to control the severity of the synthesized damage, which are then embedded back into the original images via Poisson blending. Our method ensures both richness of damage severity and a better alignment with the background. To save labor costs, we leverage structural similarity for automated sample selection during embedding. Each augmented data of an original image contains versions with varying severity levels. We implement a straightforward screening strategy to mitigate distribution drift. Experiments are conducted on a public road damage dataset. The proposed method not only eliminates the need for manual labor but also achieves remarkable enhancements, improving the mAP by 4.1% and the F1-score by 4.5%.
    摘要 在交通安全和路面维护领域,精确检测路面损坏是保证安全驾驶和延长路面寿命的重要因素。然而,现有方法往往因数据有限而难以奏效。先前的尝试使用生成对抗网络(GAN)生成具有多种形状的损坏,并手动将其整合到适当位置。然而,这一问题尚未得到充分探索,且面临两个挑战。首先,它们只丰富了损坏的位置和形状,而忽略了严重程度的多样性,真实感也有待进一步提升。其次,它们需要大量的人工投入。为了解决这些挑战,我们提出了一种创新方法。除了使用GAN生成具有多种形状的损坏之外,我们还使用纹理合成技术提取路面纹理。这两种元素以不同的权重混合,使我们能够控制合成损坏的严重程度,随后通过泊松融合(Poisson blending)将其嵌入回原始图像,从而既保证了损坏严重程度的丰富性,又实现了与背景更好的对齐。为节省人力成本,我们在嵌入过程中利用结构相似性进行自动化样本选择。每张原始图像的增强数据都包含不同严重程度的版本。我们还实现了一个简单的筛选策略来缓解分布漂移。实验在公开的路面损坏数据集上进行。所提方法不仅免除了人工劳动,还取得了显著改进:mAP提高4.1%,F1分数提高4.5%。
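
The severity-controlled mixing described above — a GAN damage patch blended with the extracted road texture by weight before being embedded back — can be sketched in simplified form. Plain alpha mixing stands in here for the paper's Poisson-blending step, and the patch values are made up for illustration.

```python
def synthesize_damage(road_patch, damage_patch, severity):
    """Blend a damage patch with road texture; `severity` in [0, 1]
    controls how strongly the damage dominates. A stand-in for the
    full pipeline (texture extraction + Poisson embedding)."""
    assert 0.0 <= severity <= 1.0
    return [
        [severity * d + (1.0 - severity) * r
         for d, r in zip(d_row, r_row)]
        for d_row, r_row in zip(damage_patch, road_patch)
    ]

# Toy 1x2 patches: low severity keeps the result close to the road texture.
mild = synthesize_damage([[100.0, 100.0]], [[0.0, 50.0]], severity=0.2)
```

Sweeping `severity` from 0 to 1 yields the family of augmented versions per original image that the abstract mentions.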

VEATIC: Video-based Emotion and Affect Tracking in Context Dataset

  • paper_url: http://arxiv.org/abs/2309.06745
  • repo_url: None
  • paper_authors: Zhihang Ren, Jefferson Ortega, Yifan Wang, Zhimin Chen, Yunhui Guo, Stella X. Yu, David Whitney
  • for: 这个论文的目的是为了提供一个新的大型数据集,以便更好地理解人类情绪认知的机制和通用情况。
  • methods: 这篇论文使用了124个电影、纪录片和家用视频的剪辑,并通过实时注释为每帧视频提供连续的valence和arousal评分。此外,论文还提出了一种新的计算机视觉任务,即在视频帧中推断人物的情绪,并提出了一种简单的模型来评估这个任务。
  • results: 实验显示,使用这个数据集训练的预训练模型可以与其他类似数据集的模型进行竞争,这表明VEATIC数据集的一般性。
    Abstract Human affect recognition has been a significant topic in psychophysics and computer vision. However, the currently published datasets have many limitations. For example, most datasets contain frames that contain only information about facial expressions. Due to the limitations of previous datasets, it is very hard to either understand the mechanisms for affect recognition of humans or generalize well on common cases for computer vision models trained on those datasets. In this work, we introduce a brand new large dataset, the Video-based Emotion and Affect Tracking in Context Dataset (VEATIC), that can conquer the limitations of the previous datasets. VEATIC has 124 video clips from Hollywood movies, documentaries, and home videos with continuous valence and arousal ratings of each frame via real-time annotation. Along with the dataset, we propose a new computer vision task to infer the affect of the selected character via both context and character information in each video frame. Additionally, we propose a simple model to benchmark this new computer vision task. We also compare the performance of the pretrained model using our dataset with other similar datasets. Experiments show the competing results of our pretrained model via VEATIC, indicating the generalizability of VEATIC. Our dataset is available at https://veatic.github.io.
    摘要 人类情感认知是心理物理学和计算机视觉领域中的一个重要话题。然而,现有已发布的数据集存在很多限制。例如,大多数数据集的帧只包含关于面部表情的信息。由于以往数据集的限制,很难理解人类情感认知的机制,在这些数据集上训练的计算机视觉模型也难以在常见情况下良好泛化。在这项工作中,我们介绍了一个全新的大型数据集,即基于视频的情境情绪与情感跟踪数据集(VEATIC)。VEATIC包含来自好莱坞电影、纪录片和家庭视频的124个视频剪辑,每帧都通过实时标注获得连续的愉悦度(valence)和唤醒度(arousal)评分。基于该数据集,我们提出了一个新的计算机视觉任务,即结合每个视频帧中的上下文和人物信息来推断选定人物的情感。此外,我们还提出了一个简单的模型来评估这一新任务,并将使用我们数据集预训练的模型与其他相似数据集上的模型进行了性能比较。实验结果显示了我们的预训练模型经由VEATIC取得的竞争力,表明了VEATIC的泛化能力。我们的数据集可在 https://veatic.github.io 获取。

MTD: Multi-Timestep Detector for Delayed Streaming Perception

  • paper_url: http://arxiv.org/abs/2309.06742
  • repo_url: https://github.com/yulin1004/mtd
  • paper_authors: Yihui Huang, Ningjiang Chen
  • for: 这篇论文的目的是提高自动驾驶系统的实时环境感知,以确保用户的安全和体验。
  • methods: 该论文提出了一种名为多时步检测器(MTD)的端到端检测器,该检测器使用动态路由进行多支流未来预测,使模型具有抗延迟弹性。此外,一种延迟分析模块(DAM)也被提出,用于优化现有延迟感知方法,不断监测模型推理堆栈的延迟趋势。
  • results: 该论文在Argoverse-HD数据集上进行了实验,实验结果表明,该方法在不同的延迟设置下实现了最先进的表现。
    Abstract Autonomous driving systems require real-time environmental perception to ensure user safety and experience. Streaming perception is a task of reporting the current state of the world, which is used to evaluate the delay and accuracy of autonomous driving systems. In real-world applications, factors such as hardware limitations and high temperatures inevitably cause delays in autonomous driving systems, resulting in the offset between the model output and the world state. In order to solve this problem, this paper propose the Multi- Timestep Detector (MTD), an end-to-end detector which uses dynamic routing for multi-branch future prediction, giving model the ability to resist delay fluctuations. A Delay Analysis Module (DAM) is proposed to optimize the existing delay sensing method, continuously monitoring the model inference stack and calculating the delay trend. Moreover, a novel Timestep Branch Module (TBM) is constructed, which includes static flow and adaptive flow to adaptively predict specific timesteps according to the delay trend. The proposed method has been evaluated on the Argoverse-HD dataset, and the experimental results show that it has achieved state-of-the-art performance across various delay settings.
    摘要 自动驾驶系统需要实时环境感知以确保用户安全和体验。流式感知是报告当前世界状态的任务,用于评估自动驾驶系统的延迟和准确性。在实际应用中,硬件限制和高温等因素不可避免地导致自动驾驶系统产生延迟,从而造成模型输出与世界状态之间的偏差。为解决这一问题,本文提出了多时步检测器(MTD),一种端到端检测器,它使用动态路由进行多分支未来预测,使模型具备抵抗延迟波动的能力。本文还提出了延迟分析模块(DAM),用于优化现有的延迟感知方法,持续监测模型推理堆栈并计算延迟趋势。此外,本文构建了一种新颖的时间步分支模块(TBM),包括静态流和自适应流,可根据延迟趋势自适应地预测特定时间步。所提方法在Argoverse-HD数据集上进行了评估,实验结果表明,它在各种延迟设置下均实现了最先进的性能。
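
The Delay Analysis Module's core idea — continuously monitoring inference delay and extracting a trend a multi-branch detector could condition on — can be sketched with a sliding window. The window size and the mean-of-differences trend formula are assumptions for illustration, not the paper's exact DAM definition.

```python
from collections import deque

class DelayAnalysisModule:
    """Keep a sliding window of recent inference delays and report the
    trend: positive means delays are growing, negative means shrinking."""
    def __init__(self, window=5):
        self.delays = deque(maxlen=window)

    def observe(self, delay_ms):
        self.delays.append(delay_ms)

    def trend(self):
        if len(self.delays) < 2:
            return 0.0
        seq = list(self.delays)
        diffs = [b - a for a, b in zip(seq, seq[1:])]
        return sum(diffs) / len(diffs)

dam = DelayAnalysisModule(window=4)
for d in [30.0, 32.0, 36.0, 44.0]:  # delays growing under thermal throttling
    dam.observe(d)
```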

GelFlow: Self-supervised Learning of Optical Flow for Vision-Based Tactile Sensor Displacement Measurement

  • paper_url: http://arxiv.org/abs/2309.06735
  • repo_url: None
  • paper_authors: Zhiyuan Zhang, Hua Yang, Zhouping Yin
  • for: 视觉触觉传感器可获取高分辨率多模态信息,以支持机器人手指更灵巧的操作。
  • methods: 使用自监督学习的光流方法,解决现有光流方法精度不足的问题。
  • results: 提出了一种基于深度学习的自监督光流方法,实现了高精度的位移测量。与传统及基于深度学习的光流方法相比,获得了更高的位移测量精度。
    Abstract High-resolution multi-modality information acquired by vision-based tactile sensors can support more dexterous manipulations for robot fingers. Optical flow is low-level information directly obtained by vision-based tactile sensors, which can be transformed into other modalities like force, geometry and depth. Current vision-tactile sensors employ optical flow methods from OpenCV to estimate the deformation of markers in gels. However, these methods need to be more precise for accurately measuring the displacement of markers during large elastic deformation of the gel, as this can significantly impact the accuracy of downstream tasks. This study proposes a self-supervised optical flow method based on deep learning to achieve high accuracy in displacement measurement for vision-based tactile sensors. The proposed method employs a coarse-to-fine strategy to handle large deformations by constructing a multi-scale feature pyramid from the input image. To better deal with the elastic deformation caused by the gel, the Helmholtz velocity decomposition constraint combined with the elastic deformation constraint are adopted to address the distortion rate and area change rate, respectively. A local flow fusion module is designed to smooth the optical flow, taking into account the prior knowledge of the blurred effect of gel deformation. We trained the proposed self-supervised network using an open-source dataset and compared it with traditional and deep learning-based optical flow methods. The results show that the proposed method achieved the highest displacement measurement accuracy, thereby demonstrating its potential for enabling more precise measurement of downstream tasks using vision-based tactile sensors.
    摘要 视觉触觉传感器获取的高分辨率多模态信息可以支持机器人手指更灵巧的操作。光流是视觉触觉传感器直接获得的低层信息,可以转换为力、几何和深度等其他模态。当前的视觉触觉传感器采用OpenCV中的光流方法来估计凝胶中标记点的形变。然而,在凝胶发生大弹性形变时,这些方法对标记点位移的测量精度不足,而这会显著影响下游任务的准确性。本研究提出了一种基于深度学习的自监督光流方法,以实现视觉触觉传感器位移测量的高精度。该方法采用由粗到细的策略来处理大形变,通过从输入图像构建多尺度特征金字塔。为了更好地处理凝胶引起的弹性形变,该方法结合Helmholtz速度分解约束与弹性形变约束,分别处理畸变率和面积变化率。此外,考虑到凝胶形变的模糊效应这一先验知识,还设计了一个局部光流融合模块来平滑光流。我们使用开源数据集训练了所提出的自监督网络,并与传统及基于深度学习的光流方法进行了比较。结果显示,所提方法实现了最高的位移测量精度,从而证明了其在利用视觉触觉传感器对下游任务进行更精确测量方面的潜力。
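
The coarse-to-fine strategy mentioned above starts from a multi-scale image pyramid, so large gel deformations are first estimated coarsely and then refined. A minimal 2x2 mean-pooling pyramid can be sketched as follows; the pooling choice and level count are illustrative, not the paper's exact feature-pyramid design.

```python
def build_pyramid(image, levels=3):
    """Repeatedly 2x downsample a 2D image (2x2 mean pooling).
    pyramid[0] is the finest level, pyramid[-1] the coarsest."""
    pyramid = [image]
    for _ in range(levels - 1):
        prev = pyramid[-1]
        h, w = len(prev) // 2, len(prev[0]) // 2
        if h == 0 or w == 0:
            break  # cannot downsample further
        pyramid.append([
            [(prev[2 * y][2 * x] + prev[2 * y][2 * x + 1] +
              prev[2 * y + 1][2 * x] + prev[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)]
            for y in range(h)
        ])
    return pyramid

pyr = build_pyramid([[1.0, 3.0], [5.0, 7.0]], levels=2)
```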

Prompting Segmentation with Sound is Generalizable Audio-Visual Source Localizer

  • paper_url: http://arxiv.org/abs/2309.07929
  • repo_url: None
  • paper_authors: Yaoting Wang, Weisong Liu, Guangyao Li, Jian Ding, Di Hu, Xi Li
  • for: 本研究旨在解决Audio-Visual Localization和Segmentation任务在要求较高的零样本和少样本场景下的数据缺乏和数据分布不均问题,提高模型的泛化能力。
  • methods: 我们提出了Encoder-Prompt-Decoder模型,其中首先构建了Semantic-aware Audio Prompt(SAP),以帮助视觉基础模型聚焦于发声物体。然后,我们开发了Correlation Adapter(ColA),在保持最小训练开销的同时维持视觉基础模型的知识。
  • results: 我们通过广泛的实验证明,与其他基于融合的方法相比,我们的方法在未见类别和跨数据集设置下表现更好,说明其能够更好地泛化到未见数据。
    Abstract Never having seen an object and heard its sound simultaneously, can the model still accurately localize its visual position from the input audio? In this work, we concentrate on the Audio-Visual Localization and Segmentation tasks but under the demanding zero-shot and few-shot scenarios. To achieve this goal, different from existing approaches that mostly employ the encoder-fusion-decoder paradigm to decode localization information from the fused audio-visual feature, we introduce the encoder-prompt-decoder paradigm, aiming to better fit the data scarcity and varying data distribution dilemmas with the help of abundant knowledge from pre-trained models. Specifically, we first propose to construct Semantic-aware Audio Prompt (SAP) to help the visual foundation model focus on sounding objects, meanwhile, the semantic gap between the visual and audio modalities is also encouraged to shrink. Then, we develop a Correlation Adapter (ColA) to keep minimal training efforts as well as maintain adequate knowledge of the visual foundation model. By equipping with these means, extensive experiments demonstrate that this new paradigm outperforms other fusion-based methods in both the unseen class and cross-dataset settings. We hope that our work can further promote the generalization study of Audio-Visual Localization and Segmentation in practical application scenarios.
    摘要 从未同时见过某个物体并听到它的声音,模型还能仅凭输入音频准确定位其视觉位置吗?在本工作中,我们关注音频-视觉定位与分割任务,但是在要求更高的零样本和少样本场景下进行。为实现这一目标,不同于现有方法大多采用编码器-融合-解码器范式从融合的音视频特征中解码定位信息,我们引入编码器-提示-解码器范式,借助预训练模型中的丰富知识,更好地应对数据稀缺和数据分布多变的难题。具体而言,我们首先提出构建语义感知音频提示(SAP),帮助视觉基础模型聚焦于发声物体,同时促使视觉与音频模态之间的语义鸿沟缩小。随后,我们设计了相关性适配器(ColA),在保持最小训练开销的同时维持视觉基础模型的充分知识。借助这些手段,大量实验表明,这一新范式在未见类别和跨数据集设置下均优于其他基于融合的方法。我们希望本工作能进一步推动音频-视觉定位与分割在实际应用场景中的泛化研究。

Leveraging Foundation models for Unsupervised Audio-Visual Segmentation

  • paper_url: http://arxiv.org/abs/2309.06728
  • repo_url: None
  • paper_authors: Swapnil Bhosale, Haosen Yang, Diptesh Kanojia, Xiatian Zhu
  • for: 这个论文的目标是提出一种无监督的音频视频分割方法,以便在实际应用中避免繁琐的批处理和标注工作。
  • methods: 这个方法基于一种新的卷积权重学习策略,通过利用现有的多Modal基础模型(如检测[1]、开放世界分割[2]和多Modal协调[3])来准确地关联音频mask对。
  • results: 经验表明,该方法可以与现有的监督学习方法相比,在复杂的场景下具有良好的性能,尤其是在多个声音对象重叠的情况下。
    Abstract Audio-Visual Segmentation (AVS) aims to precisely outline audible objects in a visual scene at the pixel level. Existing AVS methods require fine-grained annotations of audio-mask pairs in supervised learning fashion. This limits their scalability since it is time consuming and tedious to acquire such cross-modality pixel level labels. To overcome this obstacle, in this work we introduce unsupervised audio-visual segmentation with no need for task-specific data annotations and model training. For tackling this newly proposed problem, we formulate a novel Cross-Modality Semantic Filtering (CMSF) approach to accurately associate the underlying audio-mask pairs by leveraging the off-the-shelf multi-modal foundation models (e.g., detection [1], open-world segmentation [2] and multi-modal alignment [3]). Guiding the proposal generation by either audio or visual cues, we design two training-free variants: AT-GDINO-SAM and OWOD-BIND. Extensive experiments on the AVS-Bench dataset show that our unsupervised approach can perform well in comparison to prior art supervised counterparts across complex scenarios with multiple auditory objects. Particularly, in situations where existing supervised AVS methods struggle with overlapping foreground objects, our models still excel in accurately segmenting overlapped auditory objects. Our code will be publicly released.
    摘要 音视频分割(AVS)的目标是在视觉场景中以像素级精确勾勒可发声对象。现有的AVS方法需要以监督学习方式获得细粒度的音频-掩码对标注。这限制了它们的可扩展性,因为获取这种跨模态的像素级标注既耗时又繁琐。为了克服这一障碍,本工作引入了无监督的音视频分割,无需任务特定的数据标注和模型训练。为解决这一新提出的问题,我们提出了一种跨模态语义过滤(CMSF)方法,通过利用现成的多模态基础模型(例如检测[1]、开放世界分割[2]和多模态对齐[3])来准确地关联潜在的音频-掩码对。通过以音频或视觉线索引导候选生成,我们设计了两种无需训练的变体:AT-GDINO-SAM和OWOD-BIND。在AVS-Bench数据集上的大量实验表明,在包含多个发声对象的复杂场景中,我们的无监督方法可以与先前的监督方法相媲美。特别是在现有监督AVS方法难以处理前景对象重叠的情况下,我们的模型仍能准确分割重叠的发声对象。我们的代码将公开发布。
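
The association step at the heart of CMSF — linking an audio cue to the best-matching candidate mask via embeddings from off-the-shelf foundation models — can be sketched as a nearest-neighbor search in a shared embedding space. The cosine-similarity matching rule and the toy 2-D embeddings below are illustrative assumptions, not the paper's exact filtering procedure.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def match_audio_to_masks(audio_emb, mask_embs):
    """Return the index of the candidate mask whose embedding is most
    similar to the audio embedding (both assumed to live in the same
    multi-modal embedding space)."""
    return max(range(len(mask_embs)), key=lambda i: cosine(audio_emb, mask_embs[i]))

# Toy example: the second mask embedding points (almost) the same way as the audio.
best = match_audio_to_masks([1.0, 0.0], [[0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]])
```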

Deep Nonparametric Convexified Filtering for Computational Photography, Image Synthesis and Adversarial Defense

  • paper_url: http://arxiv.org/abs/2309.06724
  • repo_url: None
  • paper_authors: Jianqiao Wangni
  • for: 提供一个通用的计算摄影框架,通过深度非参数凸滤波(Deep Nonparametric Convexified Filtering,DNCF)从不完美的图像中恢复真实场景。
  • methods: 使用一个非参数深度网络来模拟图像形成背后的物理方程,如去噪、超分辨率、修补和闪光。DNCF 的参数化不依赖于训练数据,因此具有强大的泛化能力和对对抗性图像篡改的鲁棒性。
  • results: 在推理过程中,我们约束网络参数为非负值,从而构造出关于输入和参数的双凸函数,并可适配运行时间受限的二阶优化算法,相比 Deep Image Prior 实现了 10 倍加速。借助这些工具,我们在实验中证明了 DNCF 可以实时防御针对图像分类深度网络的对抗攻击算法。
    Abstract We aim to provide a general framework for computational photography that recovers the real scene from imperfect images, via the Deep Nonparametric Convexified Filtering (DNCF). It consists of a nonparametric deep network to resemble the physical equations behind the image formation, such as denoising, super-resolution, inpainting, and flash. DNCF has no parameterization dependent on training data, therefore has strong generalization and robustness to adversarial image manipulation. During inference, we also encourage the network parameters to be nonnegative and create a bi-convex function on the input and parameters, and this adapts to second-order optimization algorithms with insufficient running time, having 10X acceleration over Deep Image Prior. With these tools, we empirically verify its capability to defend image classification deep networks against adversarial attack algorithms in real-time.
    摘要 我们的目标是提供一个通用的计算摄影框架,通过深度非参数凸滤波(DNCF)从不完美图像中恢复真实场景。DNCF 包含一个非参数深度网络,用于模拟图像形成的物理方程,如去噪、超分辨率、修补和闪光。DNCF 没有依赖于训练数据的参数化,因此具有强大的泛化能力和对恶意图像篡改的鲁棒性。在推理过程中,我们还约束网络参数为非负,构造出关于输入和参数的双凸函数,使其可适配运行时间受限的二阶优化算法,比 Deep Image Prior 快 10 倍。通过这些工具,我们在实验中证明了它可以实时防御针对图像分类深度网络的对抗攻击算法。
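其中"非负参数约束"这一思路可以用一个极简的投影梯度草图来示意:每次梯度更新后将参数截断为非负,这正是 DNCF 用来获得关于输入与参数的双凸目标的约束形式。示例中的目标函数、步长与变量名均为示意性假设,并非论文的实际实现。

```python
import numpy as np

def project_nonneg_step(w, grad, lr=0.1):
    # Projected gradient step: ordinary descent, then clip to w >= 0.
    # This nonnegativity constraint is what yields a bi-convex
    # objective in (input, parameters) in the DNCF setting.
    return np.maximum(w - lr * grad, 0.0)

# Toy problem: minimize f(w) = ||w - target||^2 with a partly negative target.
target = np.array([2.0, -3.0])
w = np.zeros(2)
for _ in range(200):
    w = project_nonneg_step(w, 2.0 * (w - target))
print(w)  # [2. 0.]: 负的那一维被约束在非负边界上
```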

Deep Attentive Time Warping

  • paper_url: http://arxiv.org/abs/2309.06720
  • repo_url: https://github.com/matsuo-shinnosuke/deep-attentive-time-warping
  • paper_authors: Shinnosuke Matsuo, Xiaomeng Wu, Gantugs Atarsaikhan, Akisato Kimura, Kunio Kashino, Brian Kenji Iwana, Seiichi Uchida
  • for: 本文为了提高时间序列分类中的非线性时间扭曲问题的处理能力,提出了一种基于神经网络的任务适应时间扭曲机制。
  • methods: 本文使用了注意力模型,称为两边注意力模型,来开发一种可靠的时间扭曲机制,并通过度量学学习来训练模型。
  • results: 与 DTW 和其他学习型模型相比,本文的模型在在线签名验证任务中显示出更优的效果,并达到了最先进的性能。
    Abstract Similarity measures for time series are important problems for time series classification. To handle the nonlinear time distortions, Dynamic Time Warping (DTW) has been widely used. However, DTW is not learnable and suffers from a trade-off between robustness against time distortion and discriminative power. In this paper, we propose a neural network model for task-adaptive time warping. Specifically, we use the attention model, called the bipartite attention model, to develop an explicit time warping mechanism with greater distortion invariance. Unlike other learnable models using DTW for warping, our model predicts all local correspondences between two time series and is trained based on metric learning, which enables it to learn the optimal data-dependent warping for the target task. We also propose to induce pre-training of our model by DTW to improve the discriminative power. Extensive experiments demonstrate the superior effectiveness of our model over DTW and its state-of-the-art performance in online signature verification.
    摘要 时间序列相似度度量是时间序列分类的关键问题。为了处理非线性时间扭曲,动态时间规整(DTW)被广泛使用。然而,DTW 不可学习,并且在对时间扭曲的鲁棒性与判别能力之间存在权衡。在这篇论文中,我们提出了一种基于神经网络的任务自适应时间规整模型。具体来说,我们使用一种称为二部注意力模型(bipartite attention model)的注意力模型,来构建一个具有更强扭曲不变性的显式时间规整机制。与其他使用 DTW 进行规整的可学习模型不同,我们的模型预测两个时间序列之间的所有局部对应关系,并基于度量学习进行训练,这使它能够为目标任务学习最优的数据相关规整。我们还提出用 DTW 对模型进行预训练,以提高判别能力。大量实验证明了我们的模型优于 DTW,并在在线签名验证中达到了最先进的性能。
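作为参考,下面给出经典 DTW 动态规划的一个极简 Python 草图,即本文的注意力机制所要改进的基线算法;实现仅为示意,变量名均为假设:

```python
import numpy as np

def dtw_distance(x, y):
    # 经典 DTW:D[i, j] 为对齐 x[:i] 与 y[:j] 的最优扭曲路径代价
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # 每一步可来自匹配、插入或删除三种转移
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# 时间平移后的信号可以被 DTW 几乎完美地对齐
a = [0, 0, 1, 2, 1, 0]
b = [0, 1, 2, 1, 0, 0]
print(dtw_distance(a, b))  # 0.0(逐点欧氏距离则为 4)
```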

MPI-Flow: Learning Realistic Optical Flow with Multiplane Images

  • paper_url: http://arxiv.org/abs/2309.06714
  • repo_url: https://github.com/sharpiless/mpi-flow
  • paper_authors: Yingping Liang, Jiaming Liu, Debing Zhang, Ying Fu
  • for: 这个研究旨在提高学习型光流估计模型的实用性,通过将真实世界影像转换为真实的光流资料集。
  • methods: 我们使用多层深度表示(Multiplane Image,MPI)来创建高度真实的新图像,并使用摄像机矩阵和各平面深度来计算每个平面的光流。我们还开发了一个独立物体运动模组,以分离摄像机运动和动态物体运动的影响。
  • results: 我们的方法在实验中表现出色,在真实数据集上实现了最佳性能,并且在无监督和监督式训练中均达到最先进的表现。代码将在:\url{https://github.com/Sharpiless/MPI-Flow} 中公开。
    Abstract The accuracy of learning-based optical flow estimation models heavily relies on the realism of the training datasets. Current approaches for generating such datasets either employ synthetic data or generate images with limited realism. However, the domain gap of these data with real-world scenes constrains the generalization of the trained model to real-world applications. To address this issue, we investigate generating realistic optical flow datasets from real-world images. Firstly, to generate highly realistic new images, we construct a layered depth representation, known as multiplane images (MPI), from single-view images. This allows us to generate novel view images that are highly realistic. To generate optical flow maps that correspond accurately to the new image, we calculate the optical flows of each plane using the camera matrix and plane depths. We then project these layered optical flows into the output optical flow map with volume rendering. Secondly, to ensure the realism of motion, we present an independent object motion module that can separate the camera and dynamic object motion in MPI. This module addresses the deficiency in MPI-based single-view methods, where optical flow is generated only by camera motion and does not account for any object movement. We additionally devise a depth-aware inpainting module to merge new images with dynamic objects and address unnatural motion occlusions. We show the superior performance of our method through extensive experiments on real-world datasets. Moreover, our approach achieves state-of-the-art performance in both unsupervised and supervised training of learning-based models. The code will be made publicly available at: \url{https://github.com/Sharpiless/MPI-Flow}.
    摘要 “基于学习的光流估计模型的准确性在很大程度上取决于训练数据的真实性。现有方法或采用合成数据,或生成真实感有限的图像来构建训练数据。然而,这些数据与实际场景之间的域差异会限制所训练模型在实际应用中的泛化能力。为了解决这个问题,我们研究如何从实际图像中生成真实的光流数据。首先,我们从单视图图像构建多平面图像(MPI)这种分层深度表示,以生成高度真实的新视角图像。然后,我们使用相机矩阵和平面深度计算每个平面的光流,并通过体渲染将分层光流投影到输出光流图中。其次,为保证运动的真实性,我们提出了独立物体运动模块,可以在 MPI 中分离相机运动和动态物体运动。该模块解决了基于 MPI 的单视图方法的缺陷,即光流仅由相机运动生成而不考虑物体运动。此外,我们还设计了深度感知的修补模块,用于将新图像与动态物体合并,并解决不自然的运动遮挡。大量实验证明了我们方法的优越性,并且在基于学习的模型的无监督和监督训练中均达到了最先进水平。代码将在:\url{https://github.com/Sharpiless/MPI-Flow} 公开。”
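摘要中"使用相机矩阵和平面深度计算每个平面的光流"这一步,可以用最简单的情形(纯平移相机、正对的 MPI 平面)来示意:此时平面上像素光流约为 f·t/Z,越远的平面光流越小。以下草图中的焦距、平移量和平面深度均为假设的示例数值,并非论文代码:

```python
import numpy as np

# 假设的示例数值:焦距(像素)、相机平移(米)与 MPI 各平面深度(米)
f = 500.0
t = np.array([0.2, 0.0])
plane_depths = np.array([1.0, 2.0, 4.0, 8.0])

# 对纯平移相机与正对平面,平面上所有像素共享同一光流 f * t / Z(视差关系)
flows = f * t[None, :] / plane_depths[:, None]   # 每个平面一个二维光流
print(flows[:, 0])  # 水平光流依次为 100, 50, 25, 12.5:越远的平面光流越小
```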

Transparent Object Tracking with Enhanced Fusion Module

  • paper_url: http://arxiv.org/abs/2309.06701
  • repo_url: https://github.com/kalyan0510/totem
  • paper_authors: Kalyan Garigapati, Erik Blasch, Jie Wei, Haibin Ling
  • for: 这个论文是为了提高机器人任务中透明物体的追踪性能,因为这些物体的适应性和反射性环境,传统的追踪算法受到减少性能。
  • methods: 这个论文使用了一种新的特征融合技术,将透明信息融合到固定特征空间中,以便在更广泛的追踪器中使用。融合模组包括一个对应器Encoder和一个多层感知机制模组,通过关键查询基于变数的转换来嵌入透明信息到追踪管道中。
  • results: 这个论文提出了一个新的追踪架构,使用了新的融合技术以 achieve superior 的透明物体追踪 результа。该架构在 TOTB Benchmark 上获得了竞争性的结果,与现有的追踪器相比。
    Abstract Accurate tracking of transparent objects, such as glasses, plays a critical role in many robotic tasks such as robot-assisted living. Due to the adaptive and often reflective texture of such objects, traditional tracking algorithms that rely on general-purpose learned features suffer from reduced performance. Recent research has proposed to instill transparency awareness into existing general object trackers by fusing purpose-built features. However, with the existing fusion techniques, the addition of new features causes a change in the latent space making it impossible to incorporate transparency awareness on trackers with fixed latent spaces. For example, many of the current days transformer-based trackers are fully pre-trained and are sensitive to any latent space perturbations. In this paper, we present a new feature fusion technique that integrates transparency information into a fixed feature space, enabling its use in a broader range of trackers. Our proposed fusion module, composed of a transformer encoder and an MLP module, leverages key query-based transformations to embed the transparency information into the tracking pipeline. We also present a new two-step training strategy for our fusion module to effectively merge transparency features. We propose a new tracker architecture that uses our fusion techniques to achieve superior results for transparent object tracking. Our proposed method achieves competitive results with state-of-the-art trackers on TOTB, which is the largest transparent object tracking benchmark recently released. Our results and the implementation of code will be made publicly available at https://github.com/kalyan0510/TOTEM.
    摘要 准确跟踪透明物体(如眼镜、玻璃等)在机器人辅助生活等任务中扮演着关键角色。由于透明物体具有自适应且常带反射的纹理,依赖通用学习特征的传统跟踪算法性能会下降。最近的研究提出通过融合专门设计的特征,把透明性感知引入现有的通用物体跟踪器。然而,在现有的融合技术下,引入新特征会改变隐空间,使得无法在隐空间固定的跟踪器中引入透明性感知。例如,如今许多基于 Transformer 的跟踪器都是完全预训练的,对隐空间的任何扰动都很敏感。在这篇论文中,我们提出了一种新的特征融合技术,可以将透明信息嵌入固定特征空间中,使其能够应用于更广泛的跟踪器。我们提出的融合模块由 Transformer 编码器和 MLP 模块组成,利用基于键-查询的变换将透明信息嵌入跟踪管道。我们还提出了一种新的两步训练策略,以便有效地融合透明特征。我们提出一种新的跟踪架构,使用上述融合技术实现更优的透明物体跟踪效果。我们的方法在最近发布的最大透明物体跟踪基准 TOTB 上达到了与最先进跟踪器相当的竞争水平。我们的结果和实现代码将于 https://github.com/kalyan0510/TOTEM 公开。

STUPD: A Synthetic Dataset for Spatial and Temporal Relation Reasoning

  • paper_url: http://arxiv.org/abs/2309.06680
  • repo_url: https://github.com/palaashagrawal/stupd
  • paper_authors: Palaash Agrawal, Haidi Azaman, Cheston Tan
  • for: This paper aims to improve the ability of computer vision models to perform spatial reasoning and understand temporal relations in visual scenes.
  • methods: The authors propose a large-scale video dataset called STUPD, which includes 150K visual depictions of static and dynamic spatial relationships derived from prepositions of the English language, as well as 50K visual depictions of temporal relations.
  • results: The authors show that pretraining models on the STUPD dataset leads to an increase in performance on real-world datasets (ImageNet-VidVRD and Spatial Senses) compared to other pretraining datasets.
    Abstract Understanding relations between objects is crucial for understanding the semantics of a visual scene. It is also an essential step in order to bridge visual and language models. However, current state-of-the-art computer vision models still lack the ability to perform spatial reasoning well. Existing datasets mostly cover a relatively small number of spatial relations, all of which are static relations that do not intrinsically involve motion. In this paper, we propose the Spatial and Temporal Understanding of Prepositions Dataset (STUPD) -- a large-scale video dataset for understanding static and dynamic spatial relationships derived from prepositions of the English language. The dataset contains 150K visual depictions (videos and images), consisting of 30 distinct spatial prepositional senses, in the form of object interaction simulations generated synthetically using Unity3D. In addition to spatial relations, we also propose 50K visual depictions across 10 temporal relations, consisting of videos depicting event/time-point interactions. To our knowledge, no dataset exists that represents temporal relations through visual settings. In this dataset, we also provide 3D information about object interactions such as frame-wise coordinates, and descriptions of the objects used. The goal of this synthetic dataset is to help models perform better in visual relationship detection in real-world settings. We demonstrate an increase in the performance of various models over 2 real-world datasets (ImageNet-VidVRD and Spatial Senses) when pretrained on the STUPD dataset, in comparison to other pretraining datasets.
    摘要 理解物体之间的关系对理解视觉场景的语义至关重要,同时也是连接视觉与语言模型的重要一步。然而,当前最先进的计算机视觉模型仍然缺乏良好的空间推理能力。现有的数据集大多只覆盖相对少量的空间关系,且全部是不涉及运动的静态关系。在这篇论文中,我们提出了空间与时间介词理解数据集(STUPD)——一个大规模视频数据集,用于理解从英语介词中提取出的静态和动态空间关系。该数据集包含 15 万个视觉描绘(视频和图像),涵盖 30 种不同的空间介词义项,以使用 Unity3D 合成生成的物体交互模拟的形式呈现。除空间关系外,我们还提供了涵盖 10 种时间关系的 5 万个视觉描绘,即展示事件/时间点交互的视频。据我们所知,目前尚无以视觉形式表示时间关系的数据集。在该数据集中,我们还提供了物体交互的 3D 信息,如逐帧坐标以及所用物体的描述。这个合成数据集的目标是帮助模型在真实世界场景中更好地进行视觉关系检测。我们证明,与其他预训练数据集相比,在 STUPD 数据集上预训练后,多种模型在两个真实世界数据集(ImageNet-VidVRD 和 Spatial Senses)上的性能均有显著提升。

ShaDocFormer: A Shadow-attentive Threshold Detector with Cascaded Fusion Refiner for document shadow removal

  • paper_url: http://arxiv.org/abs/2309.06670
  • repo_url: None
  • paper_authors: Weiwen Chen, Shenghong Luo, Xuhang Chen, Zinuo Li, Shuqiang Wang, Chi-Man Pun
  • for: 本研究旨在解决手持设备捕捉文档时出现的文档阴影问题,以提高文档的可读性。
  • methods: 该研究提出了一种基于Transformer架构的ShaDocFormer模型,它将传统方法和深度学习技术相结合,以解决文档阴影去除的问题。ShaDocFormer模型包括两个组件:阴影感知阈值检测器(STD)和级联融合精炼器(CFR)。STD模块使用传统的阈值技术,并通过Transformer的注意机制获取全局信息,以准确检测阴影掩码。CFR模块采用级联和聚合结构,实现从粗到细的修复过程,以捕捉整个图像的变化。
  • results: 实验表明,ShaDocFormer模型在定性和定量评估上都超越了当前最先进的方法。
    Abstract Document shadow is a common issue that arise when capturing documents using mobile devices, which significantly impacts the readability. Current methods encounter various challenges including inaccurate detection of shadow masks and estimation of illumination. In this paper, we propose ShaDocFormer, a Transformer-based architecture that integrates traditional methodologies and deep learning techniques to tackle the problem of document shadow removal. The ShaDocFormer architecture comprises two components: the Shadow-attentive Threshold Detector (STD) and the Cascaded Fusion Refiner (CFR). The STD module employs a traditional thresholding technique and leverages the attention mechanism of the Transformer to gather global information, thereby enabling precise detection of shadow masks. The cascaded and aggregative structure of the CFR module facilitates a coarse-to-fine restoration process for the entire image. As a result, ShaDocFormer excels in accurately detecting and capturing variations in both shadow and illumination, thereby enabling effective removal of shadows. Extensive experiments demonstrate that ShaDocFormer outperforms current state-of-the-art methods in both qualitative and quantitative measurements.
    摘要 文档阴影是手持设备捕捉文档时常见的问题,对于文档的可读性有很大的影响。现有方法面临着各种挑战,包括不准确的阴影面掩模板和灯光量的估算。本文提出了ShaDocFormer,一种基于Transformer架构的架构,该架构集成了传统方法和深度学习技术,用于解决文档阴影除去的问题。ShaDocFormer架构包括两个组件:阴影感知阈值检测器(STD)和缓存融合修正器(CFR)。STD模块使用传统的阈值技术,并利用Transformer的注意机制,以全局信息的收集,以准确探测阴影面掩模板。CFR模块采用缓存和融合的结构,实现了从粗到细的修复过程,以便整个图像的修复。因此,ShaDocFormer能够准确探测和捕捉阴影和灯光的变化,从而实现有效地除去阴影。经验表明,ShaDocFormer在质量和量度上都超过当前状态的方法。
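STD 模块中"传统阈值技术"这一步可以用一个玩具化的全局阈值草图来示意:比图像均值暗若干比例的像素被标记为阴影;真实模块还会在此之上用 Transformer 注意力聚合全局信息进行细化。以下代码中的阈值系数与图像均为假设:

```python
import numpy as np

def shadow_mask(gray, k=0.8):
    # 玩具全局阈值:比图像均值的 k 倍更暗的像素记为阴影。
    # 真实的 STD 模块在此类传统阈值之上,再用 Transformer 注意力
    # 聚合全局上下文来细化阴影掩码。
    return gray < k * gray.mean()

# 亮度 200 的"纸面"带有一条亮度 90 的阴影带
img = np.full((8, 8), 200.0)
img[:, :3] = 90.0
mask = shadow_mask(img)
print(mask.sum())  # 24:恰好是 8x3 的阴影带
```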

LCReg: Long-Tailed Image Classification with Latent Categories based Recognition

  • paper_url: http://arxiv.org/abs/2309.07186
  • repo_url: None
  • paper_authors: Weide Liu, Zhonghua Wu, Yiming Wang, Henghui Ding, Fayao Liu, Jie Lin, Guosheng Lin
  • for: long-tailed image recognition
  • methods: 使用类共同尺度特征学习和Semantic数据增强来提高特征表示
  • results: 在五个长尾图像识别数据集上进行了广泛的实验,与基eline进行比较,得到了显著提高的结果
    Abstract In this work, we tackle the challenging problem of long-tailed image recognition. Previous long-tailed recognition approaches mainly focus on data augmentation or re-balancing strategies for the tail classes to give them more attention during model training. However, these methods are limited by the small number of training images for the tail classes, which results in poor feature representations. To address this issue, we propose the Latent Categories based long-tail Recognition (LCReg) method. Our hypothesis is that common latent features shared by head and tail classes can be used to improve feature representation. Specifically, we learn a set of class-agnostic latent features shared by both head and tail classes, and then use semantic data augmentation on the latent features to implicitly increase the diversity of the training sample. We conduct extensive experiments on five long-tailed image recognition datasets, and the results show that our proposed method significantly improves the baselines.
    摘要 在这项工作中,我们解决了长尾图像识别的挑战问题。先前的长尾识别方法主要集中在数据增强或重新平衡策略上,以便在模型训练期间给尾类更多的关注。然而,这些方法受到尾类训练图像数量少的限制,导致特征表示不佳。为解决这个问题,我们提出了基于潜在类别的长尾识别(LCReg)方法。我们的假设是,头类和尾类共享的潜在特征可以用来改善特征表示。具体来说,我们学习一组头类和尾类共享的、与类别无关的潜在特征,然后对这些潜在特征进行语义数据增强,以隐式地增加训练样本的多样性。我们在五个长尾图像识别数据集上进行了广泛的实验,结果显示,我们提出的方法显著超越了基线方法。
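"对共享潜在特征做语义数据增强"的思路可以用如下草图示意:沿一组类无关的潜在方向对特征向量加入小幅随机扰动,从而隐式扩充稀缺的尾类样本。方向数、维度与扰动强度等均为玩具假设,并非论文实现:

```python
import numpy as np

rng = np.random.default_rng(42)

# 一组归一化的"类无关"潜在方向(玩具假设:4 个方向、16 维特征)
latent_dirs = rng.normal(size=(4, 16))
latent_dirs /= np.linalg.norm(latent_dirs, axis=1, keepdims=True)

def augment(feat, strength=0.1):
    # 沿共享潜在方向对特征做小幅随机扰动,隐式增加样本多样性
    coeffs = rng.normal(scale=strength, size=latent_dirs.shape[0])
    return feat + coeffs @ latent_dirs

tail_feat = np.ones(16)          # 某个尾类样本的特征
aug = augment(tail_feat)
print(aug.shape)                 # (16,):增强后的特征维度不变
```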

Generalizable Neural Fields as Partially Observed Neural Processes

  • paper_url: http://arxiv.org/abs/2309.06660
  • repo_url: None
  • paper_authors: Jeffrey Gu, Kuan-Chieh Wang, Serena Yeung
  • for: 将信号表示为由神经网络参数化的函数的神经场,是传统离散向量或基于网格的表示的有前途的替代方案,具有更好的扩展性、连续性和可微性。
  • methods: 我们提出了一种新的思路,将神经表示的大规模训练视为部分观察神经过程框架的一部分,并利用神经过程算法解决这个问题。
  • results: 我们的方法优于现有的基于梯度的元学习方法和超网络(hypernetwork)方法,并且可以更好地利用信号之间共享的信息或结构。
    Abstract Neural fields, which represent signals as a function parameterized by a neural network, are a promising alternative to traditional discrete vector or grid-based representations. Compared to discrete representations, neural representations both scale well with increasing resolution, are continuous, and can be many-times differentiable. However, given a dataset of signals that we would like to represent, having to optimize a separate neural field for each signal is inefficient, and cannot capitalize on shared information or structures among signals. Existing generalization methods view this as a meta-learning problem and employ gradient-based meta-learning to learn an initialization which is then fine-tuned with test-time optimization, or learn hypernetworks to produce the weights of a neural field. We instead propose a new paradigm that views the large-scale training of neural representations as a part of a partially-observed neural process framework, and leverage neural process algorithms to solve this task. We demonstrate that this approach outperforms both state-of-the-art gradient-based meta-learning approaches and hypernetwork approaches.
    摘要 神经场将信号表示为由神经网络参数化的函数,是传统离散向量或网格表示的有前途的替代方案。与离散表示相比,神经表示随分辨率提升具有良好的扩展性,且连续、可多次求导。但是,给定一个要表示的信号数据集,为每个信号单独优化一个神经场是低效的,无法利用信号之间共享的信息或结构。现有的泛化方法将其视为元学习问题,使用基于梯度的元学习来学习一个初始化,然后在测试时优化微调,或学习超网络(hypernetwork)来生成神经场的权重。我们则提出了一种新的范式,将神经表示的大规模训练视为部分观察神经过程框架的一部分,并利用神经过程算法来解决该任务。我们证明了这种方法优于最先进的基于梯度的元学习方法和超网络方法。

Event-Driven Imaging in Turbid Media: A Confluence of Optoelectronics and Neuromorphic Computation

  • paper_url: http://arxiv.org/abs/2309.06652
  • repo_url: None
  • paper_authors: Ning Zhang, Timothy Shea, Arto Nurmikko
  • for: 这篇论文旨在探讨如何使用光学计算方法揭示在浓雾媒体中难以识别的目标图像。
  • methods: 这种新方法基于人类视觉,首先将散射光转换为脉冲信号,然后使用神经元模型进行图像重建。
  • results: 研究人员通过对不同的MNIST字体和图像集进行图像重建,成功地解决了透明媒体中图像不可见的问题,并且可以准确地识别出图像的内容。
    Abstract In this paper a new optical-computational method is introduced to unveil images of targets whose visibility is severely obscured by light scattering in dense, turbid media. The targets of interest are taken to be dynamic in that their optical properties are time-varying whether stationary in space or moving. The scheme, to our knowledge the first of its kind, is human vision inspired whereby diffuse photons collected from the turbid medium are first transformed to spike trains by a dynamic vision sensor as in the retina, and image reconstruction is then performed by a neuromorphic computing approach mimicking the brain. We combine benchtop experimental data in both reflection (backscattering) and transmission geometries with support from physics-based simulations to develop a neuromorphic computational model and then apply this for image reconstruction of different MNIST characters and image sets by a dedicated deep spiking neural network algorithm. Image reconstruction is achieved under conditions of turbidity where an original image is unintelligible to the human eye or a digital video camera, yet clearly and quantifiable identifiable when using the new neuromorphic computational approach.
    摘要 在这篇论文中,我们介绍了一种新的光电计算方法,用于揭示受到干扰媒体散射的目标图像。目标图像是动态的,即其光学性质在时间上变化,可能是静止的或者移动的。我们的方法是基于人视系统的,通过将散射媒体中的散射光转化为脉冲 trains,然后使用神经网络模型来重建图像。我们结合了实验和物理学习模型,并使用专门的深度脉冲神经网络算法来实现图像重建。我们发现,在某些情况下,使用我们的方法可以在干扰媒体中揭示出清晰可读的图像,而人类眼或数字摄像头则无法识别到这些图像。
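摘要中"将散射光先转换为脉冲序列"的前端,可以用动态视觉传感器(DVS)的标准事件模型来示意:当某像素的对数光强相对上一次事件的参考值变化超过对比度阈值时,发出一个 ±1 极性事件。以下单像素草图仅为示意,阈值等参数为假设:

```python
import numpy as np

def dvs_events(intensity, theta=0.2):
    # 单像素 DVS 模型:对数光强相对参考值变化超过阈值 theta 时,
    # 发出一个 (时刻, 极性) 事件,并把参考值移动一个阈值步长
    events, ref = [], np.log(intensity[0])
    for t, I in enumerate(intensity[1:], start=1):
        delta = np.log(I) - ref
        while abs(delta) >= theta:
            pol = 1 if delta > 0 else -1
            events.append((t, pol))
            ref += pol * theta
            delta = np.log(I) - ref
    return events

sig = [1.0, 1.0, 1.5, 1.5, 1.0]
print(dvs_events(sig))  # 亮度上升发出两个 +1 事件,回落发出两个 -1 事件
```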

cs.AI - 2023-09-13

Efficient quantum recurrent reinforcement learning via quantum reservoir computing

  • paper_url: http://arxiv.org/abs/2309.07339
  • repo_url: None
  • paper_authors: Samuel Yen-Chi Chen
  • for: This paper aims to address the challenge of inefficient training in quantum reinforcement learning (QRL) models incorporating quantum recurrent neural networks (QRNNs).
  • methods: The proposed approach utilizes QLSTM-based reservoirs, with randomly initialized and fixed parameters, and trains the model using the asynchronous advantage actor-critic (A3C) algorithm.
  • results: Numerical simulations demonstrate the efficacy of the proposed QLSTM-Reservoir RL framework, achieving comparable results to a fully trained QLSTM RL model with the same architecture and training settings.
    Abstract Quantum reinforcement learning (QRL) has emerged as a framework to solve sequential decision-making tasks, showcasing empirical quantum advantages. A notable development is through quantum recurrent neural networks (QRNNs) for memory-intensive tasks such as partially observable environments. However, QRL models incorporating QRNN encounter challenges such as inefficient training of QRL with QRNN, given that the computation of gradients in QRNN is both computationally expensive and time-consuming. This work presents a novel approach to address this challenge by constructing QRL agents utilizing QRNN-based reservoirs, specifically employing quantum long short-term memory (QLSTM). QLSTM parameters are randomly initialized and fixed without training. The model is trained using the asynchronous advantage actor-critic (A3C) algorithm. Through numerical simulations, we validate the efficacy of our QLSTM-Reservoir RL framework. Its performance is assessed on standard benchmarks, demonstrating comparable results to a fully trained QLSTM RL model with identical architecture and training settings.
    摘要 量子强化学习(QRL)已成为解决序列决策任务的一种框架,展现出实证上的量子优势。其中一个值得注意的进展是利用量子循环神经网络(QRNN)处理诸如部分可观测环境等需要记忆的任务。然而,结合 QRNN 的 QRL 模型面临训练效率低下的挑战,因为 QRNN 中梯度的计算既昂贵又耗时。本文提出一种新方法,通过基于 QRNN 的储备池(reservoir)构建 QRL 智能体,具体采用量子长短期记忆(QLSTM)。QLSTM 参数随机初始化并固定,无需训练。模型使用异步优势演员-评论家(A3C)算法进行训练。通过数值模拟,我们验证了 QLSTM-Reservoir RL 框架的有效性,其在标准基准上的表现与架构和训练设置相同的完全训练 QLSTM RL 模型相当。

Learning from Auxiliary Sources in Argumentative Revision Classification

  • paper_url: http://arxiv.org/abs/2309.07334
  • repo_url: None
  • paper_authors: Tazin Afrin, Diane Litman
  • for: 这个论文是为了开发用于分类辩论性修订的模型。
  • methods: 论文使用了两种方法:多任务学习和传输学习,以利用相似任务的辅助数据来提高分类器性能。
  • results: 论文的内在和外在评估结果显示,multi-task learning和传输学习都可以提高分类器性能,其中传输学习更好地表达数据之间的关系。
    Abstract We develop models to classify desirable reasoning revisions in argumentative writing. We explore two approaches -- multi-task learning and transfer learning -- to take advantage of auxiliary sources of revision data for similar tasks. Results of intrinsic and extrinsic evaluations show that both approaches can indeed improve classifier performance over baselines. While multi-task learning shows that training on different sources of data at the same time may improve performance, transfer-learning better represents the relationship between the data.
    摘要 我们开发了用于分类议论文写作中值得采纳的推理修订的模型。我们探索了两种方法——多任务学习和迁移学习——以利用相似任务的辅助修订数据。内在和外在评估的结果显示,这两种方法都确实能使分类器性能超越基线。多任务学习表明同时在不同数据源上训练可以提高性能,而迁移学习则能更好地表达数据之间的关系。

Reliability-based cleaning of noisy training labels with inductive conformal prediction in multi-modal biomedical data mining

  • paper_url: http://arxiv.org/abs/2309.07332
  • repo_url: https://github.com/xzhan96-stf/icp_train_clean
  • paper_authors: Xianghao Zhan, Qinmei Xu, Yuanning Zheng, Guangming Lu, Olivier Gevaert
  • for: 提高生物医学数据标注的精度,解决传统半supervised学习方法在使用大量标注数据时仍然表现不佳的问题。
  • methods: 提出了一种基于概率预测的数据清洁方法,利用 inductive conformal prediction (ICP) 计算出的可靠度指标, Rectify 标注数据和噪声数据,提高数据标注的精度。
  • results: 在三种不同的模式下进行了三种类型的分类任务,包括使用标题和摘要滤除 DILI 文献、通过 CT 成像和电子医疗记录预测 COVID-19 患者 ICU admit、以及使用 RNA 序列数据分型乳腺癌。结果表明,该方法可以显著提高分类性能,包括增加精度、AUROC 和 AUPRC。
    Abstract Accurately labeling biomedical data presents a challenge. Traditional semi-supervised learning methods often under-utilize available unlabeled data. To address this, we propose a novel reliability-based training data cleaning method employing inductive conformal prediction (ICP). This method capitalizes on a small set of accurately labeled training data and leverages ICP-calculated reliability metrics to rectify mislabeled data and outliers within vast quantities of noisy training data. The efficacy of the method is validated across three classification tasks within distinct modalities: filtering drug-induced-liver-injury (DILI) literature with title and abstract, predicting ICU admission of COVID-19 patients through CT radiomics and electronic health records, and subtyping breast cancer using RNA-sequencing data. Varying levels of noise to the training labels were introduced through label permutation. Results show significant enhancements in classification performance: accuracy enhancement in 86 out of 96 DILI experiments (up to 11.4%), AUROC and AUPRC enhancements in all 48 COVID-19 experiments (up to 23.8% and 69.8%), and accuracy and macro-average F1 score improvements in 47 out of 48 RNA-sequencing experiments (up to 74.6% and 89.0%). Our method offers the potential to substantially boost classification performance in multi-modal biomedical machine learning tasks. Importantly, it accomplishes this without necessitating an excessive volume of meticulously curated training data.
    摘要 准确标注生物医学数据是一项挑战。传统的半监督学习方法往往没有充分利用可用的未标注数据。为解决这个问题,我们提出了一种基于可靠性的训练数据清洗方法,利用归纳共形预测(inductive conformal prediction, ICP)。该方法基于一小部分准确标注的训练数据,并利用 ICP 计算的可靠性指标,来修正大量含噪训练数据中的误标数据和异常值。我们在三种不同模态下的三个分类任务中验证了该方法的有效性:利用标题和摘要筛选药物性肝损伤(DILI)文献、通过 CT 影像组学和电子病历预测 COVID-19 患者的 ICU 入院,以及使用 RNA 测序数据进行乳腺癌分型。我们通过标签置换向训练标签引入了不同程度的噪声。结果显示分类性能显著提升:96 个 DILI 实验中有 86 个准确率提升(最高 11.4%),所有 48 个 COVID-19 实验的 AUROC 和 AUPRC 均提升(最高 23.8% 和 69.8%),48 个 RNA 测序实验中有 47 个的准确率和宏平均 F1 分数提升(最高 74.6% 和 89.0%)。我们的方法有望大幅提升多模态生物医学机器学习任务的分类性能,而且无需大量精心筛选的训练数据。
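归纳共形预测用于清洗标签的核心思路,可以用一个最近质心非一致性得分的玩具草图示意:为每个样本的"声称标签"计算共形 p 值,p 值极低的标签可判为可疑误标。注意严格的 ICP 需要互不相交的训练/校准划分,且论文使用任务特定模型;以下数据、得分函数与名称均为假设:

```python
import numpy as np

def nonconformity(x, label, centroids):
    # 非一致性得分:到"声称类别"质心的距离,越大说明标签越不合适
    return abs(x - centroids[label])

# 两个分得很开的一维类别(玩具数据)
rng = np.random.default_rng(0)
X0 = rng.normal(0.0, 0.1, 50)
X1 = rng.normal(5.0, 0.1, 50)
centroids = {0: X0.mean(), 1: X1.mean()}

# 类别 0 的校准得分(玩具示意;严格的 ICP 应使用与训练集不相交的校准集)
cal = np.array([nonconformity(x, 0, centroids) for x in X0])

def p_value(x, label):
    s = nonconformity(x, label, centroids)
    return (np.sum(cal >= s) + 1) / (len(cal) + 1)

clean_p = p_value(0.05, 0)   # 标注正确的样本:p 值较大
noisy_p = p_value(5.0, 0)    # 类别 1 被误标为 0:p 值极小,可判为误标
print(clean_p > 0.1, noisy_p < 0.05)
```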

Racing Control Variable Genetic Programming for Symbolic Regression

  • paper_url: http://arxiv.org/abs/2309.07934
  • repo_url: None
  • paper_authors: Nan Jiang, Yexiang Xue
  • for: 这 paper 旨在提高 symbolic regression 的效率和准确性,使其能够更快地从实验数据中找到 governing equations。
  • methods: 这 paper 使用了 Control Variable Genetic Programming (CVGP) 和 Racing Control Variable Genetic Programming (Racing-CVGP) 两种方法, CVGP 通过设计控制变量实验来加速 regression 过程,而 Racing-CVGP 则同时执行多个 experiment schedule,选择最佳 experiment schedule 以提高效率。
  • results: 该 paper 在多个 synthetic 和实际世界数据集上进行了测试,并证明 Racing-CVGP 可以比 CVGP 和一系列基于固定数据集的 symbolic regressors 更高效和准确地找到 governing equations。
    Abstract Symbolic regression, as one of the most crucial tasks in AI for science, discovers governing equations from experimental data. Popular approaches based on genetic programming, Monte Carlo tree search, or deep reinforcement learning learn symbolic regression from a fixed dataset. They require massive datasets and long training time especially when learning complex equations involving many variables. Recently, Control Variable Genetic Programming (CVGP) has been introduced which accelerates the regression process by discovering equations from designed control variable experiments. However, the set of experiments is fixed a-priori in CVGP and we observe that sub-optimal selection of experiment schedules delay the discovery process significantly. To overcome this limitation, we propose Racing Control Variable Genetic Programming (Racing-CVGP), which carries out multiple experiment schedules simultaneously. A selection scheme similar to that used in selecting good symbolic equations in the genetic programming process is implemented to ensure that promising experiment schedules eventually win over the average ones. The unfavorable schedules are terminated early to save time for the promising ones. We evaluate Racing-CVGP on several synthetic and real-world datasets corresponding to true physics laws. We demonstrate that Racing-CVGP outperforms CVGP and a series of symbolic regressors which discover equations from fixed datasets.
    摘要 符号回归作为科学人工智能中最关键的任务之一,从实验数据中发现支配方程。流行的方法基于遗传编程、蒙特卡洛树搜索或深度强化学习,从固定的数据集中学习符号回归。它们需要庞大的数据集和较长的训练时间,尤其是在学习涉及多个变量的复杂方程时。最近,控制变量遗传编程(CVGP)被提出,它通过从设计的控制变量实验中发现方程来加速回归过程。然而,CVGP 中的实验集合是事先固定的,我们观察到次优的实验安排会显著延迟发现过程。为了克服这个限制,我们提出了竞速控制变量遗传编程(Racing-CVGP),它同时执行多个实验安排。我们实现了一种类似于遗传编程过程中筛选优良符号方程的选择机制,以确保有希望的实验安排最终胜过平均水平的安排。不利的安排会被提前终止,以便把时间留给有希望的安排。我们在多个对应真实物理定律的合成和真实世界数据集上评估了 Racing-CVGP,并证明它优于 CVGP 和一系列从固定数据集中发现方程的符号回归器。
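"竞速"式实验安排选择的核心机制可以用如下草图示意:并行维护多个实验安排,每轮按表现淘汰最差的一部分、提前终止,把计算预算留给有希望的安排。得分函数、轮数与名称均为玩具假设,并非论文实现:

```python
def race(schedule_scores, rounds=3, drop_frac=0.5):
    # 每一轮按当轮得分排序,淘汰最差的一半,幸存者进入下一轮
    alive = dict(schedule_scores)
    for r in range(rounds):
        ranked = sorted(alive, key=lambda k: alive[k](r), reverse=True)
        keep = max(1, int(len(ranked) * (1 - drop_frac)))
        alive = {k: alive[k] for k in ranked[:keep]}  # 其余安排被提前终止
    return set(alive)

# 一个每轮都在进步的"好"安排,对比三个停滞的安排(得分均为玩具函数)
scores = {
    "good":  lambda r: 0.5 + 0.1 * r,
    "flat1": lambda r: 0.4,
    "flat2": lambda r: 0.3,
    "bad":   lambda r: 0.1,
}
print(race(scores))  # {'good'}
```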

Traveling Words: A Geometric Interpretation of Transformers

  • paper_url: http://arxiv.org/abs/2309.07315
  • repo_url: https://github.com/santiag0m/traveling-words
  • paper_authors: Raul Molina
  • for: 本研究旨在解释转换器内部机制的几何视角,以便更好地理解转换器如何处理自然语言处理任务。
  • methods: 本文使用层Normalization和注意力机制来描述转换器的内部机制。层Normalization使得含义特征被归一化到一个超球上,从而使得注意力可以模糊语言表示的含义。
  • results: 通过对预训练的GPT-2模型进行探测,发现了层Normalization和注意力机制的相互关系,并发现了早期层的查询-关键注意力模式。这些结果证明了几何视角的有用性,并提供了一种直观的理解转换器的方式,即将单词粒子视为在超球上的旋转。
    Abstract Transformers have significantly advanced the field of natural language processing, but comprehending their internal mechanisms remains a challenge. In this paper, we introduce a novel geometric perspective that elucidates the inner mechanisms of transformer operations. Our primary contribution is illustrating how layer normalization confines the latent features to a hyper-sphere, subsequently enabling attention to mold the semantic representation of words on this surface. This geometric viewpoint seamlessly connects established properties such as iterative refinement and contextual embeddings. We validate our insights by probing a pre-trained 124M parameter GPT-2 model. Our findings reveal clear query-key attention patterns in early layers and build upon prior observations regarding the subject-specific nature of attention heads at deeper layers. Harnessing these geometric insights, we present an intuitive understanding of transformers, depicting them as processes that model the trajectory of word particles along the hyper-sphere.
    摘要 Transformer 显著推动了自然语言处理领域的发展,但理解其内部机制仍是一个挑战。在这篇论文中,我们提出了一种新的几何视角,用以阐明 Transformer 运算的内部机制。我们的主要贡献在于说明层归一化(layer normalization)如何将潜在特征约束在一个超球面上,进而使注意力得以在这个曲面上塑造单词的语义表示。这种几何视角自然地串联起迭代精炼和上下文嵌入等已知性质。我们通过探测一个预训练的 1.24 亿参数 GPT-2 模型来验证这些见解。我们的发现揭示了早期层中清晰的查询-键注意力模式,并在更深层注意力头具有主题特异性这一已有观察的基础上进行了拓展。利用这些几何洞见,我们提供了对 Transformer 的直观理解,将其描述为在超球面上建模单词粒子运动轨迹的过程。
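论文的核心几何论断——层归一化把特征约束在超球面上——可以用几行 numpy 直接验证:去掉仿射参数的 LayerNorm 输出均值为 0、方差为 1,因此任何输入的输出范数都恰为 √d:

```python
import numpy as np

def layer_norm(x):
    # 去掉仿射参数的 LayerNorm:减均值、除以标准差
    return (x - x.mean()) / np.sqrt(x.var())

d = 8
rng = np.random.default_rng(1)
norms = [np.linalg.norm(layer_norm(rng.normal(size=d))) for _ in range(5)]
print(norms)  # 每个范数都恰为 sqrt(8) ≈ 2.828,与输入无关
```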

AudioSR: Versatile Audio Super-resolution at Scale

  • paper_url: http://arxiv.org/abs/2309.07314
  • repo_url: None
  • paper_authors: Haohe Liu, Ke Chen, Qiao Tian, Wenwu Wang, Mark D. Plumbley
  • for: Enhancing the quality of low-resolution audio
  • methods: A diffusion-based generative model
  • results: Robust audio super-resolution across diverse audio types and bandwidth settings, including sound effects, music, and speech
    Abstract Audio super-resolution is a fundamental task that predicts high-frequency components for low-resolution audio, enhancing audio quality in digital applications. Previous methods have limitations such as the limited scope of audio types (e.g., music, speech) and specific bandwidth settings they can handle (e.g., 4kHz to 8kHz). In this paper, we introduce a diffusion-based generative model, AudioSR, that is capable of performing robust audio super-resolution on versatile audio types, including sound effects, music, and speech. Specifically, AudioSR can upsample any input audio signal within the bandwidth range of 2kHz to 16kHz to a high-resolution audio signal at 24kHz bandwidth with a sampling rate of 48kHz. Extensive objective evaluation on various audio super-resolution benchmarks demonstrates the strong result achieved by the proposed model. In addition, our subjective evaluation shows that AudioSR can act as a plug-and-play module to enhance the generation quality of a wide range of audio generative models, including AudioLDM, Fastspeech2, and MusicGen. Our code and demo are available at https://audioldm.github.io/audiosr.

Pretraining on the Test Set Is All You Need

  • paper_url: http://arxiv.org/abs/2309.08632
  • repo_url: https://github.com/molyswu/hand_detection
  • paper_authors: Rylan Schaeffer
  • for: This paper examines and evaluates small transformer-based language models on academic benchmarks.
  • methods: A transformer-based language model is pretrained on a novel dataset mixture consisting of fewer than 100 thousand tokens.
  • results: The model achieves perfect results across diverse academic benchmarks, strictly outperforming all known foundation models, and can accurately predict the canaries of downstream evaluation benchmarks.
    Abstract Inspired by recent work demonstrating the promise of smaller Transformer-based language models pretrained on carefully curated data, we supercharge such approaches by investing heavily in curating a novel, high quality, non-synthetic data mixture based solely on evaluation benchmarks. Using our novel dataset mixture consisting of less than 100 thousand tokens, we pretrain a 1 million parameter transformer-based LLM phi-CTNL (pronounced "fictional") that achieves perfect results across diverse academic benchmarks, strictly outperforming all known foundation models. phi-CTNL also beats power-law scaling and exhibits a never-before-seen grokking-like ability to accurately predict downstream evaluation benchmarks' canaries.

  • paper_url: http://arxiv.org/abs/2309.07276
  • repo_url: None
  • paper_authors: Thao Nguyen, Vladislav Hrosinkov, Eric Rosen, Stefanie Tellex
  • for: This work targets realistic object search, enabling a robot to locate objects in its environment from natural language descriptions.
  • methods: The search problem is posed as a partially observable Markov decision process (POMDP), with the object detector and visual sensor noise model determined by a single deep neural network conditioned on the language description.
  • results: Compared with a state-of-the-art object search algorithm, the method achieves a significantly higher average task completion rate (from 0.46 to 0.66) and faster, more efficient object search. The approach is also demonstrated on a Boston Dynamics Spot robot, confirming its practicality and effectiveness.
    Abstract Object search is a challenging task because when given complex language descriptions (e.g., "find the white cup on the table"), the robot must move its camera through the environment and recognize the described object. Previous works map language descriptions to a set of fixed object detectors with predetermined noise models, but these approaches are challenging to scale because new detectors need to be made for each object. In this work, we bridge the gap in realistic object search by posing the search problem as a partially observable Markov decision process (POMDP) where the object detector and visual sensor noise in the observation model is determined by a single Deep Neural Network conditioned on complex language descriptions. We incorporate the neural network's outputs into our language-conditioned observation model (LCOM) to represent dynamically changing sensor noise. With an LCOM, any language description of an object can be used to generate an appropriate object detector and noise model, and training an LCOM only requires readily available supervised image-caption datasets. We empirically evaluate our method by comparing against a state-of-the-art object search algorithm in simulation, and demonstrate that planning with our observation model yields a significantly higher average task completion rate (from 0.46 to 0.66) and more efficient and quicker object search than with a fixed-noise model. We demonstrate our method on a Boston Dynamics Spot robot, enabling it to handle complex natural language object descriptions and efficiently find objects in a room-scale environment.
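The core of the POMDP formulation above is a Bayes-filter belief update whose observation noise depends on the language description. The sketch below is a toy stand-in: in the paper the true/false-positive rates come from a neural network conditioned on the description, whereas here they are fixed, hypothetical numbers.

```python
def belief_update(belief, detections, tpr, fpr):
    """One Bayes-filter step over candidate object locations.

    belief:     dict location -> prior probability the object is there
    detections: dict location -> bool (did the detector fire there?)
    tpr/fpr:    detector true/false-positive rates; in the paper these
                would be predicted by a network conditioned on the
                language description (hypothetical constants here).
    """
    posterior = {}
    for loc, p in belief.items():
        likelihood = 1.0
        for obs_loc, fired in detections.items():
            if obs_loc == loc:  # detector looked at the true location
                likelihood *= tpr if fired else (1.0 - tpr)
            else:               # detector looked elsewhere
                likelihood *= fpr if fired else (1.0 - fpr)
        posterior[loc] = p * likelihood
    z = sum(posterior.values())
    return {loc: p / z for loc, p in posterior.items()}

belief = {"table": 1 / 3, "shelf": 1 / 3, "counter": 1 / 3}
# Detector (conditioned on, say, "the white cup") fires only at the table.
belief = belief_update(
    belief,
    {"table": True, "shelf": False, "counter": False},
    tpr=0.8, fpr=0.1,
)
print(max(belief, key=belief.get))  # table
```

A planner would then pick the next viewpoint that most reduces uncertainty in this belief.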

Safe and Accelerated Deep Reinforcement Learning-based O-RAN Slicing: A Hybrid Transfer Learning Approach

  • paper_url: http://arxiv.org/abs/2309.07265
  • repo_url: https://github.com/ahmadnagib/tl-aided-drl
  • paper_authors: Ahmad M. Nagib, Hatem Abou-Zeid, Hossam S. Hassanein
  • for: This paper proposes a deep reinforcement learning (DRL)-based closed-loop control approach for optimizing radio access network (RAN) functions.
  • methods: Transfer learning (TL) is used as a core component of the training and deployment workflows, via a hybrid TL-aided approach that combines policy reuse and distillation to provide safe and accelerated convergence.
  • results: In experiments with realistic VR gaming traffic, the proposed hybrid approach improves the average initial reward value and the percentage of converged scenarios by at least 7.7% and 20.7%, respectively, while maintaining fast convergence and improving generalizability.
    Abstract The open radio access network (O-RAN) architecture supports intelligent network control algorithms as one of its core capabilities. Data-driven applications incorporate such algorithms to optimize radio access network (RAN) functions via RAN intelligent controllers (RICs). Deep reinforcement learning (DRL) algorithms are among the main approaches adopted in the O-RAN literature to solve dynamic radio resource management problems. However, despite the benefits introduced by the O-RAN RICs, the practical adoption of DRL algorithms in real network deployments falls behind. This is primarily due to the slow convergence and unstable performance exhibited by DRL agents upon deployment and when encountering previously unseen network conditions. In this paper, we address these challenges by proposing transfer learning (TL) as a core component of the training and deployment workflows for the DRL-based closed-loop control of O-RAN functionalities. To this end, we propose and design a hybrid TL-aided approach that leverages the advantages of both policy reuse and distillation TL methods to provide safe and accelerated convergence in DRL-based O-RAN slicing. We conduct a thorough experiment that accommodates multiple services, including real VR gaming traffic to reflect practical scenarios of O-RAN slicing. We also propose and implement policy reuse and distillation-aided DRL and non-TL-aided DRL as three separate baselines. The proposed hybrid approach shows at least: 7.7% and 20.7% improvements in the average initial reward value and the percentage of converged scenarios, and a 64.6% decrease in reward variance while maintaining fast convergence and enhancing the generalizability compared with the baselines.

Autotuning Apache TVM-based Scientific Applications Using Bayesian Optimization

  • paper_url: http://arxiv.org/abs/2309.07235
  • repo_url: None
  • paper_authors: Xingfu Wu, Praveen Paramasivam, Valerie Taylor
  • for: Optimizing the performance of dense matrix factorizations (LU, Cholesky, 3mm) on GPUs and AI accelerators.
  • methods: An autotuning framework built on Bayesian optimization, with linear algebra kernels implemented in the TVM tensor expression language.
  • results: The scientific computation kernels are evaluated on Swing, a GPU cluster at Argonne National Laboratory; compared against the AutoTVM framework with four tuners, the proposed framework performs better in most cases.
    Abstract Apache TVM (Tensor Virtual Machine), an open source machine learning compiler framework designed to optimize computations across various hardware platforms, provides an opportunity to improve the performance of dense matrix factorizations such as LU (Lower Upper) decomposition and Cholesky decomposition on GPUs and AI (Artificial Intelligence) accelerators. In this paper, we propose a new TVM autotuning framework using Bayesian Optimization and use the TVM tensor expression language to implement linear algebra kernels such as LU, Cholesky, and 3mm. We use these scientific computation kernels to evaluate the effectiveness of our methods on a GPU cluster, called Swing, at Argonne National Laboratory. We compare the proposed autotuning framework with the TVM autotuning framework AutoTVM with four tuners and find that our framework outperforms AutoTVM in most cases.
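The autotuning loop described above (fit a surrogate to observed configuration/cost pairs, then pick the next candidate via an acquisition function) can be sketched in a few lines. This is a toy illustration, not the paper's framework: a nearest-neighbor surrogate stands in for the Gaussian process, and the cost function is a hypothetical kernel latency model rather than a real TVM measurement.

```python
import random

def toy_surrogate(x, xs, ys):
    """Toy stand-in for a Gaussian-process posterior: predicted mean is
    the value of the nearest evaluated point, and predicted uncertainty
    grows with distance from the data."""
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - x))
    return ys[i], abs(xs[i] - x)

def bayes_opt(cost, lo, hi, n_init=4, n_iter=20, kappa=2.0, seed=0):
    """Minimal Bayesian-optimization loop: evaluate a few random configs,
    then repeatedly pick the candidate minimizing a lower-confidence-bound
    acquisition (mean - kappa * uncertainty) and measure it."""
    rng = random.Random(seed)
    xs = [rng.uniform(lo, hi) for _ in range(n_init)]
    ys = [cost(x) for x in xs]
    for _ in range(n_iter):
        cands = [rng.uniform(lo, hi) for _ in range(200)]
        def lcb(c):
            mu, sd = toy_surrogate(c, xs, ys)
            return mu - kappa * sd
        x = min(cands, key=lcb)
        xs.append(x)
        ys.append(cost(x))
    best = min(range(len(xs)), key=lambda j: ys[j])
    return xs[best], ys[best]

# Hypothetical "kernel cost model": latency minimized at tile size 3.7.
x, y = bayes_opt(lambda t: (t - 3.7) ** 2 + 1.0, lo=0.0, hi=8.0)
print(round(x, 2))  # typically lands near the optimum at 3.7
```

In the real framework the `cost` call would compile and benchmark a TVM kernel on the target device, which is exactly why minimizing the number of evaluations matters.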

Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics

  • paper_url: http://arxiv.org/abs/2309.07120
  • repo_url: https://github.com/ucsc-vlaa/sight-beyond-text
  • paper_authors: Haoqin Tu, Bingchen Zhao, Chen Wei, Cihang Xie
  • for: This study investigates the capabilities of multi-modal large language models (MLLMs), in particular their often-untested pure NLP abilities.
  • methods: Visual instruction tuning, a prevailing strategy for transitioning large language models (LLMs) into MLLMs.
  • results: Visual instruction tuning helps models attain improved truthfulness and ethical alignment in the pure-text context; for example, a visual-instruction-tuned LLaMA2 7B model outperforms the LLaMA2-chat 7B model, fine-tuned with over one million human annotations, on the TruthfulQA-mc and Ethics benchmarks.
    Abstract Multi-modal large language models (MLLMs) are trained based on large language models (LLM), with an enhanced capability to comprehend multi-modal inputs and generate textual responses. While they excel in multi-modal tasks, the pure NLP abilities of MLLMs are often underestimated and left untested. In this study, we get out of the box and unveil an intriguing characteristic of MLLMs -- our preliminary results suggest that visual instruction tuning, a prevailing strategy for transitioning LLMs into MLLMs, unexpectedly and interestingly helps models attain both improved truthfulness and ethical alignment in the pure NLP context. For example, a visual-instruction-tuned LLaMA2 7B model surpasses the performance of the LLaMA2-chat 7B model, fine-tuned with over one million human annotations, on TruthfulQA-mc and Ethics benchmarks. Further analysis reveals that the improved alignment can be attributed to the superior instruction quality inherent to visual-text data. In releasing our code at github.com/UCSC-VLAA/Sight-Beyond-Text, we aspire to foster further exploration into the intrinsic value of visual-text synergies and, in a broader scope, multi-modal interactions in alignment research.

Characterizing Speed Performance of Multi-Agent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.07108
  • repo_url: None
  • paper_authors: Samuel Wiggins, Yuan Meng, Rajgopal Kannan, Viktor Prasanna
  • for: This work investigates the speed-performance bottlenecks of multi-agent reinforcement learning (MARL) algorithms in large-scale AI systems and big-data applications.
  • methods: A taxonomy that categorizes MARL algorithms along two axes, training scheme and communication method, followed by a systematic performance-bottleneck analysis of three target algorithms (MADDPG, ToM2C, and NeurComm).
  • results: The bottlenecks of MARL algorithms stem mainly from the compute and memory demands of training and communication, and parallelization and acceleration offer substantial room for performance improvement.
    Abstract Multi-Agent Reinforcement Learning (MARL) has achieved significant success in large-scale AI systems and big-data applications such as smart grids, surveillance, etc. Existing advancements in MARL algorithms focus on improving the rewards obtained by introducing various mechanisms for inter-agent cooperation. However, these optimizations are usually compute- and memory-intensive, thus leading to suboptimal speed performance in end-to-end training time. In this work, we analyze the speed performance (i.e., latency-bounded throughput) as the key metric in MARL implementations. Specifically, we first introduce a taxonomy of MARL algorithms from an acceleration perspective categorized by (1) training scheme and (2) communication method. Using our taxonomy, we identify three state-of-the-art MARL algorithms - Multi-Agent Deep Deterministic Policy Gradient (MADDPG), Target-oriented Multi-agent Communication and Cooperation (ToM2C), and Networked Multi-Agent RL (NeurComm) - as target benchmark algorithms, and provide a systematic analysis of their performance bottlenecks on a homogeneous multi-core CPU platform. We justify the need for MARL latency-bounded throughput to be a key performance metric in future literature while also addressing opportunities for parallelization and acceleration.

Mitigating Group Bias in Federated Learning for Heterogeneous Devices

  • paper_url: http://arxiv.org/abs/2309.07085
  • repo_url: None
  • paper_authors: Khotso Selialia, Yasra Chandio, Fatima M. Anwar
  • for: This paper proposes a privacy-preserving federated learning framework for model training in distributed edge applications that can cope with data heterogeneity across deployments.
  • methods: Federated learning with a modified multiplicative weights update method to optimize the worst-performing group under heterogeneous deployments, plus regularization techniques to narrow the gap between the worst- and best-performing groups, with a thresholding mechanism that balances bias reduction against group performance degradation.
  • results: Experiments show that the framework achieves group-fair, privacy-preserving federated learning under realistic heterogeneous settings, with strong performance on human emotion recognition and image classification benchmarks, demonstrating its practicality and effectiveness.
    Abstract Federated Learning is emerging as a privacy-preserving model training approach in distributed edge applications. As such, most edge deployments are heterogeneous in nature i.e., their sensing capabilities and environments vary across deployments. This edge heterogeneity violates the independence and identical distribution (IID) property of local data across clients and produces biased global models i.e. models that contribute to unfair decision-making and discrimination against a particular community or a group. Existing bias mitigation techniques only focus on bias generated from label heterogeneity in non-IID data without accounting for domain variations due to feature heterogeneity and do not address global group-fairness property. Our work proposes a group-fair FL framework that minimizes group-bias while preserving privacy and without resource utilization overhead. Our main idea is to leverage average conditional probabilities to compute a cross-domain group \textit{importance weights} derived from heterogeneous training data to optimize the performance of the worst-performing group using a modified multiplicative weights update method. Additionally, we propose regularization techniques to minimize the difference between the worst and best-performing groups while making sure through our thresholding mechanism to strike a balance between bias reduction and group performance degradation. Our evaluation of human emotion recognition and image classification benchmarks assesses the fair decision-making of our framework in real-world heterogeneous settings.
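The multiplicative weights idea behind the framework above can be sketched in isolation: groups with higher loss get exponentially up-weighted, steering training toward the worst-performing group. This is a toy illustration of the update rule, not the paper's full FL pipeline; the per-group loss functions below are hypothetical stand-ins for evaluating the shared model on each group.

```python
import math

def multiplicative_weights(group_losses, rounds=50, eta=0.5):
    """Multiplicative-weights update over groups.

    group_losses: dict group -> function mapping the group's current
    weight share to its loss (a hypothetical proxy for model evaluation).
    Returns the final normalized weights over groups.
    """
    groups = list(group_losses)
    w = {g: 1.0 for g in groups}
    for _ in range(rounds):
        total = sum(w.values())
        p = {g: w[g] / total for g in groups}
        for g in groups:
            loss = group_losses[g](p[g])
            w[g] *= math.exp(eta * loss)  # up-weight high-loss groups
    total = sum(w.values())
    return {g: w[g] / total for g in groups}

# Hypothetical: group "b" is harder (higher base loss), but its loss
# drops as it receives more weight; the weights shift toward it.
p = multiplicative_weights({
    "a": lambda wa: 0.2,              # easy group, flat loss
    "b": lambda wb: 0.8 - 0.5 * wb,   # hard group, improves with weight
})
print(p["b"] > p["a"])  # True
```

The paper's thresholding mechanism would additionally cap how far the weights can drift, trading worst-group gains against overall degradation.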

A Comprehensive Analysis of the Role of Artificial Intelligence and Machine Learning in Modern Digital Forensics and Incident Response

  • paper_url: http://arxiv.org/abs/2309.07064
  • repo_url: None
  • paper_authors: Dipo Dunsin, Mohamed C. Ghanem, Karim Ouazzane, Vassil Vassilev
  • for: This paper aims to provide a comprehensive analysis of the use of Artificial Intelligence (AI) and Machine Learning (ML) in digital forensics and incident response, exploring cutting-edge research initiatives and their applications in various facets of digital forensics practice.
  • methods: The paper employs a thorough and in-depth analysis, including a review of existing research, to examine the use of AI and ML techniques in digital forensics, including data collection and recovery, cybercrime timeline reconstruction, big data analysis, pattern recognition, chain of custody safeguarding, and responsive strategies to hacking incidents.
  • results: The study highlights the potential and limitations of AI and ML techniques in digital forensics, including their contributions, limitations, and gaps in the existing research. It also underscores the significance of strategic planning, continual research, and development to unlock AI’s full potential in digital forensics and incident response, and offers insights into their benefits, drawbacks, and broader implications for tackling modern cyber threats.
    Abstract In the dynamic landscape of digital forensics, the integration of Artificial Intelligence (AI) and Machine Learning (ML) stands as a transformative technology, poised to amplify the efficiency and precision of digital forensics investigations. However, the use of ML and AI in digital forensics is still in its nascent stages. As a result, this paper gives a thorough and in-depth analysis that goes beyond a simple survey and review. The goal is to look closely at how AI and ML techniques are used in digital forensics and incident response. This research explores cutting-edge research initiatives that cross domains such as data collection and recovery, the intricate reconstruction of cybercrime timelines, robust big data analysis, pattern recognition, safeguarding the chain of custody, and orchestrating responsive strategies to hacking incidents. This endeavour digs far beneath the surface to unearth the intricate ways AI-driven methodologies are shaping these crucial facets of digital forensics practice. While the promise of AI in digital forensics is evident, the challenges arising from increasing database sizes and evolving criminal tactics necessitate ongoing collaborative research and refinement within the digital forensics profession. This study examines the contributions, limitations, and gaps in the existing research, shedding light on the potential and limitations of AI and ML techniques. By exploring these different research areas, we highlight the critical need for strategic planning, continual research, and development to unlock AI's full potential in digital forensics and incident response. Ultimately, this paper underscores the significance of AI and ML integration in digital forensics, offering insights into their benefits, drawbacks, and broader implications for tackling modern cyber threats.

Deep Quantum Graph Dreaming: Deciphering Neural Network Insights into Quantum Experiments

  • paper_url: http://arxiv.org/abs/2309.07056
  • repo_url: None
  • paper_authors: Tareq Jaouni, Sören Arlt, Carlos Ruiz-Gonzalez, Ebrahim Karimi, Xuemei Gu, Mario Krenn
  • for: Using explainable-AI (XAI) techniques to interpret what neural networks learn about quantum optics experiments.
  • methods: The inception (deep dreaming) technique: a deep neural network is first trained on the properties of quantum systems, then "inverted" to reveal how it would continuously modify a quantum system to change a given property.
  • results: The network can shift the initial distribution of the quantum system's properties, and its learned strategies can be conceptualized. In the first layers it identifies simple properties, while deeper layers identify complex quantum structures and even quantum entanglement. This mirrors properties long understood in computer vision, now observed in a complex natural-science task, and could aid the development of more interpretable AI-based scientific discovery techniques.
    Abstract Despite their promise to facilitate new scientific discoveries, the opaqueness of neural networks presents a challenge in interpreting the logic behind their findings. Here, we use a eXplainable-AI (XAI) technique called $inception$ or $deep$ $dreaming$, which has been invented in machine learning for computer vision. We use this techniques to explore what neural networks learn about quantum optics experiments. Our story begins by training a deep neural networks on the properties of quantum systems. Once trained, we "invert" the neural network -- effectively asking how it imagines a quantum system with a specific property, and how it would continuously modify the quantum system to change a property. We find that the network can shift the initial distribution of properties of the quantum system, and we can conceptualize the learned strategies of the neural network. Interestingly, we find that, in the first layers, the neural network identifies simple properties, while in the deeper ones, it can identify complex quantum structures and even quantum entanglement. This is in reminiscence of long-understood properties known in computer vision, which we now identify in a complex natural science task. Our approach could be useful in a more interpretable way to develop new advanced AI-based scientific discovery techniques in quantum physics.
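The "dreaming" step above is activation maximization: with the weights frozen, run gradient ascent on the input to maximize a chosen output unit. The sketch below applies the idea to a tiny hand-coded two-layer network (a hypothetical stand-in for the paper's networks trained on quantum-experiment properties), with the gradient computed analytically.

```python
import math

def forward(x, W1, v):
    """Tiny fixed two-layer net: h = tanh(W1 x), y = v . h."""
    h = [math.tanh(sum(wij * xi for wij, xi in zip(row, x))) for row in W1]
    y = sum(vj * hj for vj, hj in zip(v, h))
    return y, h

def dream(x, W1, v, steps=100, lr=0.1):
    """Gradient ascent on the INPUT to maximize the output unit,
    with the network weights frozen (deep dreaming / inception)."""
    x = list(x)
    for _ in range(steps):
        y, h = forward(x, W1, v)
        # dy/dx_i = sum_j v_j * (1 - h_j^2) * W1[j][i]
        for i in range(len(x)):
            g = sum(vj * (1 - hj * hj) * row[i]
                    for vj, hj, row in zip(v, h, W1))
            x[i] += lr * g
    return x

W1 = [[1.0, -0.5], [0.3, 0.8]]   # frozen, hypothetical weights
v = [1.0, 0.5]
x0 = [0.0, 0.0]
y0, _ = forward(x0, W1, v)
x1 = dream(x0, W1, v)
y1, _ = forward(x1, W1, v)
print(y1 > y0)  # True: the dreamed input activates the unit more strongly
```

In the paper the input is a parametrized quantum setup rather than an image, and the trajectory of the input under ascent is what reveals the network's learned strategy.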

Pearl’s and Jeffrey’s Update as Modes of Learning in Probabilistic Programming

  • paper_url: http://arxiv.org/abs/2309.07053
  • repo_url: None
  • paper_authors: Bart Jacobs, Dario Stein
  • for: This paper addresses the problem of updating a probability distribution in the light of new evidence.
  • methods: It analyzes Pearl's and Jeffrey's update rules, two natural update mechanisms, clarifying their similarities and differences via probabilistic programs, sampling semantics, and distinct notions of likelihood.
  • results: Jeffrey's update rule is shown to arise via variational inference; in categorical probability theory, this amounts to an analysis in terms of the multiset functor extended to the Kleisli category of the distribution monad.
    Abstract The concept of updating a probability distribution in the light of new evidence lies at the heart of statistics and machine learning. Pearl's and Jeffrey's rule are two natural update mechanisms which lead to different outcomes, yet the similarities and differences remain mysterious. This paper clarifies their relationship in several ways: via separate descriptions of the two update mechanisms in terms of probabilistic programs and sampling semantics, and via different notions of likelihood (for Pearl and for Jeffrey). Moreover, it is shown that Jeffrey's update rule arises via variational inference. In terms of categorical probability theory, this amounts to an analysis of the situation in terms of the behaviour of the multiset functor, extended to the Kleisli category of the distribution monad.
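The two update rules compared above are easy to state concretely. The sketch below (a minimal illustration with hypothetical numbers, not the paper's categorical treatment) contrasts Pearl's rule, which conditions on a likelihood via Bayes' theorem, with Jeffrey's rule, which imposes a new marginal over an evidence partition.

```python
def pearl_update(prior, likelihood):
    """Pearl's rule: condition on (virtual) evidence via Bayes' theorem.
    prior: dict state -> P(state); likelihood: dict state -> P(e | state)."""
    post = {s: p * likelihood[s] for s, p in prior.items()}
    z = sum(post.values())
    return {s: p / z for s, p in post.items()}

def jeffrey_update(joint, new_marginal):
    """Jeffrey's rule: given a NEW marginal Q(e) over an evidence
    partition, set P'(s) = sum_e P(s | e) * Q(e).
    joint: dict (state, e) -> P(state, e)."""
    pe = {}
    for (s, e), p in joint.items():          # old marginal over evidence
        pe[e] = pe.get(e, 0.0) + p
    post = {}
    for (s, e), p in joint.items():
        post[s] = post.get(s, 0.0) + (p / pe[e]) * new_marginal[e]
    return post

# Toy example: a light is red or green; a glance in the dark gives
# soft evidence.
prior = {"red": 0.5, "green": 0.5}
# Pearl: the glance is 3x as likely if the light is red.
print(pearl_update(prior, {"red": 0.75, "green": 0.25}))  # red: 0.75
# Jeffrey: after the glance we simply BELIEVE "looks red" with prob 0.75.
joint = {("red", "looks_red"): 0.4, ("red", "looks_green"): 0.1,
         ("green", "looks_red"): 0.1, ("green", "looks_green"): 0.4}
print(jeffrey_update(joint, {"looks_red": 0.75, "looks_green": 0.25}))
# red ~ 0.65, green ~ 0.35: a different posterior than Pearl's rule
```

The two rules coincide only in special cases, which is exactly the gap the paper formalizes.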

UnifiedGesture: A Unified Gesture Synthesis Model for Multiple Skeletons

  • paper_url: http://arxiv.org/abs/2309.07051
  • repo_url: https://github.com/youngseng/unifiedgesture
  • paper_authors: Sicheng Yang, Zilin Wang, Zhiyong Wu, Minglei Li, Zhensong Zhang, Qiaochu Huang, Lei Hao, Songcen Xu, Xiaofei Wu, changpeng yang, Zonghong Dai
  • for: This paper proposes a novel diffusion-model-based speech-driven gesture synthesis approach, addressing the limitations of networks designed on individual datasets with differing motion capture standards.
  • methods: A retargeting network first learns latent homeomorphic representations for different skeletons, unifying gesture representations across datasets while extending them. Cross-local attention and self-attention within a diffusion model architecture then capture the correlation between speech and gestures, producing better speech-matched and more realistic gestures. Finally, reinforcement learning on discrete gesture units with a learned reward function further aligns speech and gesture and increases diversity.
  • results: Experiments show the method outperforms recent speech-driven gesture generation approaches in terms of CCA, FGD, and human-likeness.
    Abstract The automatic co-speech gesture generation draws much attention in computer animation. Previous works designed network structures on individual datasets, which resulted in a lack of data volume and generalizability across different motion capture standards. In addition, it is a challenging task due to the weak correlation between speech and gestures. To address these problems, we present UnifiedGesture, a novel diffusion model-based speech-driven gesture synthesis approach, trained on multiple gesture datasets with different skeletons. Specifically, we first present a retargeting network to learn latent homeomorphic graphs for different motion capture standards, unifying the representations of various gestures while extending the dataset. We then capture the correlation between speech and gestures based on a diffusion model architecture using cross-local attention and self-attention to generate better speech-matched and realistic gestures. To further align speech and gesture and increase diversity, we incorporate reinforcement learning on the discrete gesture units with a learned reward function. Extensive experiments show that UnifiedGesture outperforms recent approaches on speech-driven gesture generation in terms of CCA, FGD, and human-likeness. All code, pre-trained models, databases, and demos are available to the public at https://github.com/YoungSeng/UnifiedGesture.

Latent Representation and Simulation of Markov Processes via Time-Lagged Information Bottleneck

  • paper_url: http://arxiv.org/abs/2309.07200
  • repo_url: None
  • paper_authors: Marco Federici, Patrick Forré, Ryota Tomioka, Bastiaan S. Veeling
  • for: An information-theoretic time-lagged dimensionality reduction method for accurately simulating the dynamics of large-scale systems over long time horizons.
  • methods: The Time-lagged Information Bottleneck (T-IB), a principled information-theoretic objective that maps complex systems into a simplified representational space and models large jumps in time, capturing relevant temporal features while discarding high-frequency information.
  • results: Experiments show that T-IB learns information-optimal representations that capture the statistical properties and dynamics of the original process at a selected time lag, outperforming existing time-lagged dimensionality reduction methods.
    Abstract Markov processes are widely used mathematical models for describing dynamic systems in various fields. However, accurately simulating large-scale systems at long time scales is computationally expensive due to the short time steps required for accurate integration. In this paper, we introduce an inference process that maps complex systems into a simplified representational space and models large jumps in time. To achieve this, we propose Time-lagged Information Bottleneck (T-IB), a principled objective rooted in information theory, which aims to capture relevant temporal features while discarding high-frequency information to simplify the simulation task and minimize the inference error. Our experiments demonstrate that T-IB learns information-optimal representations for accurately modeling the statistical properties and dynamics of the original process at a selected time lag, outperforming existing time-lagged dimensionality reduction methods.
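The bottleneck trade-off described above can be sketched as an objective balancing prediction against compression. The form below is illustrative; the notation and the exact placement of the trade-off parameter are assumptions, not taken verbatim from the paper:

```latex
% Encode the current state x_t into z_t = f_\theta(x_t) so that z_t is
% maximally predictive of the future state x_{t+\tau} while compressing
% away high-frequency detail about x_t itself:
\max_{\theta} \;
  I\!\left(z_t;\, x_{t+\tau}\right)
  \;-\; \beta\, I\!\left(z_t;\, x_t\right),
\qquad z_t = f_\theta(x_t), \quad \beta > 0 .
```

Larger time lags \(\tau\) let the simplified model take bigger simulation steps at the cost of discarding faster dynamics.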

Efficient Reinforcement Learning for Jumping Monopods

  • paper_url: http://arxiv.org/abs/2309.07038
  • repo_url: https://github.com/mfocchi/jump_rl
  • paper_authors: Riccardo Bussola, Michele Focchi, Andrea Del Prete, Daniele Fontanelli, Luigi Palopoli
  • for: The control problem of making a monopod reach a target with a jump.
  • methods: A reinforcement-learning framework guided by injected physical knowledge, enabling the controller to be learned quickly.
  • results: The approach is more efficient than both optimization-based and end-to-end RL methods.
    Abstract In this work, we consider the complex control problem of making a monopod reach a target with a jump. The monopod can jump in any direction and the terrain underneath its foot can be uneven. This is a template of a much larger class of problems, which are extremely challenging and computationally expensive to solve using standard optimisation-based techniques. Reinforcement Learning (RL) could be an interesting alternative, but the application of an end-to-end approach in which the controller must learn everything from scratch, is impractical. The solution advocated in this paper is to guide the learning process within an RL framework by injecting physical knowledge. This expedient brings to widespread benefits, such as a drastic reduction of the learning time, and the ability to learn and compensate for possible errors in the low-level controller executing the motion. We demonstrate the advantage of our approach with respect to both optimization-based and end-to-end RL approaches.

How (Not) to Use Sociodemographic Information for Subjective NLP Tasks

  • paper_url: http://arxiv.org/abs/2309.07034
  • repo_url: https://github.com/ukplab/arxiv2023-sociodemographic-prompting
  • paper_authors: Tilman Beck, Hendrik Schuff, Anne Lauscher, Iryna Gurevych
  • for: This study examines how annotators' sociodemographic backgrounds influence decisions on subjective NLP tasks, and whether sociodemographic prompting can reproduce such variation.
  • methods: Several prompt formulations are evaluated across seven datasets and six instruction-tuned model families.
  • results: While sociodemographic prompting can improve zero-shot learning on subjective NLP tasks, its outcomes vary substantially across model types, sizes, datasets, and prompt formulations.
    Abstract Annotators' sociodemographic backgrounds (i.e., the individual compositions of their gender, age, educational background, etc.) have a strong impact on their decisions when working on subjective NLP tasks, such as hate speech detection. Often, heterogeneous backgrounds result in high disagreements. To model this variation, recent work has explored sociodemographic prompting, a technique, which steers the output of prompt-based models towards answers that humans with specific sociodemographic profiles would give. However, the available NLP literature disagrees on the efficacy of this technique -- it remains unclear, for which tasks and scenarios it can help and evaluations are limited to specific tasks only. We address this research gap by presenting the largest and most comprehensive study of sociodemographic prompting today. Concretely, we evaluate several prompt formulations across seven datasets and six instruction-tuned model families. We find that (1) while sociodemographic prompting can be beneficial for improving zero-shot learning in subjective NLP tasks, (2) its outcomes largely vary for different model types, sizes, and datasets, (3) are subject to large variance with regards to prompt formulations. Thus, sociodemographic prompting is not a reliable proxy for traditional data annotation with a sociodemographically heterogeneous group of annotators. Instead, we propose (4) to use it for identifying ambiguous instances resulting in more informed annotation efforts.
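At its core, sociodemographic prompting prepends a persona description to the task instruction. A minimal sketch is below; the profile fields and template wording are invented for illustration and are not the paper's exact prompt formulations:

```python
def sociodemographic_prompt(text, profile,
                            task="Is the following text hate speech? Answer yes or no."):
    """Build a prompt that steers a model toward answers a person
    with the given sociodemographic profile might give.
    Template wording is illustrative, not the paper's exact prompts."""
    persona = ", ".join(f"{key}: {value}" for key, value in sorted(profile.items()))
    return (f"Imagine you are an annotator with this profile: {persona}.\n"
            f"{task}\nText: {text}")

prompt = sociodemographic_prompt(
    "Example post.", {"gender": "female", "age": "35", "education": "PhD"})
```

The paper's finding that results vary strongly with the prompt formulation means this template itself is a sensitive design choice, not a detail.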

Résumé Parsing as Hierarchical Sequence Labeling: An Empirical Study

  • paper_url: http://arxiv.org/abs/2309.07015
  • repo_url: https://github.com/federetyk/resume-parsing
  • paper_authors: Federico Retyk, Hermenegildo Fabregat, Juan Aizpuru, Mariana Taglio, Rabih Zbib
  • For: This study proposes a model that extracts information from résumés by labeling at two levels simultaneously.
  • Methods: The whole problem is cast as sequence labeling at two levels, lines and tokens, with both levels labeled jointly.
  • Results: Experiments on résumé parsing corpora in English, French, Chinese, Spanish, German, Portuguese, and Swedish show that the proposed model outperforms previous approaches.
    Abstract Extracting information from r\'esum\'es is typically formulated as a two-stage problem, where the document is first segmented into sections and then each section is processed individually to extract the target entities. Instead, we cast the whole problem as sequence labeling in two levels -- lines and tokens -- and study model architectures for solving both tasks simultaneously. We build high-quality r\'esum\'e parsing corpora in English, French, Chinese, Spanish, German, Portuguese, and Swedish. Based on these corpora, we present experimental results that demonstrate the effectiveness of the proposed models for the information extraction task, outperforming approaches introduced in previous work. We conduct an ablation study of the proposed architectures. We also analyze both model performance and resource efficiency, and describe the trade-offs for model deployment in the context of a production environment.
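The two-level formulation can be illustrated with a toy data structure: each line carries a line-level label, and its tokens carry token-level BIO tags. The label set and résumé fragment below are invented for illustration, not the paper's schema:

```python
# Toy résumé fragment labeled at two levels (hypothetical label set).
resume = [
    {"line": "EDUCATION", "line_label": "SECTION_HEADER",
     "tokens": [("EDUCATION", "O")]},
    {"line": "MIT, BSc Computer Science, 2019", "line_label": "EDUCATION_ITEM",
     "tokens": [("MIT", "B-SCHOOL"), (",", "O"), ("BSc", "B-DEGREE"),
                ("Computer", "B-MAJOR"), ("Science", "I-MAJOR"),
                (",", "O"), ("2019", "B-YEAR")]},
]

def extract_entities(resume):
    """Collect token-level entities, keeping the enclosing line label as context."""
    entities = []
    for line in resume:
        current = None
        for token, tag in line["tokens"]:
            if tag.startswith("B-"):
                current = [tag[2:], token]
                entities.append((line["line_label"], current))
            elif tag.startswith("I-") and current is not None:
                current[1] += " " + token   # extend a multi-token entity
            else:
                current = None
    return [(ctx, etype, text) for ctx, (etype, text) in entities]
```

The line label disambiguates token entities (e.g. a year inside an education item vs. a work-experience item), which is why labeling both levels jointly helps.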

  • paper_url: http://arxiv.org/abs/2309.07001
  • repo_url: None
  • paper_authors: Ziyuan Xia, Anchen Sun, Xiaodong Cai, Saixing Zeng
  • for: To map the changing landscape of ESG topics among firms in the global market.
  • methods: A dynamic framework that analyzes corporate ESG strategic management for individual classes, across multiple classes, and in alignment with a specific sustainability index.
  • results: Incorporating analytical keywords into the framework reveals the concurrent evolution of ESG topics over recent years.
    Abstract Environmental, social, and governance (ESG) reports are globally recognized as a keystone in sustainable enterprise development. This study aims to map the changing landscape of ESG topics within firms in the global market. A dynamic framework is developed to analyze ESG strategic management for individual classes, across multiple classes, and in alignment with a specific sustainability index. The output of these analytical processes forms the foundation of an ESG strategic model. Utilizing a rich collection of 21st-century ESG reports from technology companies, our experiment elucidates the changes in ESG perspectives by incorporating analytical keywords into the proposed framework. This work thus provides an empirical method that reveals the concurrent evolution of ESG topics over recent years.
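The keyword-based analysis of topic evolution can be sketched as counting topic keywords per report year. The keywords and report snippets below are invented placeholders for real filings:

```python
from collections import Counter

def keyword_trends(reports, keywords):
    """Count how often each ESG keyword appears per year.
    `reports` maps year -> list of report texts (toy stand-in for real filings)."""
    trends = {}
    for year, texts in sorted(reports.items()):
        counts = Counter()
        for text in texts:
            words = text.lower().split()
            for kw in keywords:
                counts[kw] += words.count(kw)
        trends[year] = dict(counts)
    return trends

reports = {2019: ["emissions emissions governance"],
           2021: ["diversity emissions governance governance"]}
trends = keyword_trends(reports, ["emissions", "governance", "diversity"])
```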

MASTERKEY: Practical Backdoor Attack Against Speaker Verification Systems

  • paper_url: http://arxiv.org/abs/2309.06981
  • repo_url: None
  • paper_authors: Hanqing Guo, Xun Chen, Junfeng Guo, Li Xiao, Qiben Yan
  • For: The paper proposes a backdoor attack, MASTERKEY, that compromises speaker verification (SV) models in mobile systems, focusing on a practical real-world setting where the attacker has no knowledge of the intended victim.
  • Methods: The paper investigates the limitations of existing poisoning attacks against unseen targets, optimizes a universal backdoor capable of attacking arbitrary targets, embeds speaker characteristics and semantic information into the backdoor to make it imperceptible, and estimates channel distortion to integrate into the backdoor.
  • Results: The attack is validated on 6 popular SV models: 53 models are poisoned in total, and the trigger attacks 16,430 enrolled speakers (310 target speakers enrolled in the 53 poisoned models) with a 100% attack success rate at a 15% poison rate, and about a 50% success rate at a 3% poison rate. The attack is demonstrated in 3 real-world scenarios, including over-the-air and over-the-telephony-line.
    Abstract Speaker Verification (SV) is widely deployed in mobile systems to authenticate legitimate users by using their voice traits. In this work, we propose a backdoor attack MASTERKEY, to compromise the SV models. Different from previous attacks, we focus on a real-world practical setting where the attacker possesses no knowledge of the intended victim. To design MASTERKEY, we investigate the limitation of existing poisoning attacks against unseen targets. Then, we optimize a universal backdoor that is capable of attacking arbitrary targets. Next, we embed the speaker's characteristics and semantics information into the backdoor, making it imperceptible. Finally, we estimate the channel distortion and integrate it into the backdoor. We validate our attack on 6 popular SV models. Specifically, we poison a total of 53 models and use our trigger to attack 16,430 enrolled speakers, composed of 310 target speakers enrolled in 53 poisoned models. Our attack achieves 100% attack success rate with a 15% poison rate. By decreasing the poison rate to 3%, the attack success rate remains around 50%. We validate our attack in 3 real-world scenarios and successfully demonstrate the attack through both over-the-air and over-the-telephony-line scenarios.

DNNShifter: An Efficient DNN Pruning System for Edge Computing

  • paper_url: http://arxiv.org/abs/2309.06973
  • repo_url: https://github.com/blessonvar/dnnshifter
  • paper_authors: Bailey J. Eccles, Philip Rodgers, Peter Kilpatrick, Ivor Spence, Blesson Varghese
  • for: Addressing the challenge of deploying deep neural network (DNN) models on mobile and embedded devices with limited computational and memory resources.
  • methods: A novel structured-pruning methodology that rapidly derives suitable model variants while maintaining the accuracy of the original model.
  • results: DNNShifter produces pruned model variants up to 93x faster than conventional training methods; the variants are up to 5.14x smaller than their sparse predecessors with a 1.67x inference latency speedup and no loss of sparse-model accuracy, and the system has up to 11.9x lower model-switching overhead and up to 3.8x lower memory utilisation than existing approaches.
    Abstract Deep neural networks (DNNs) underpin many machine learning applications. Production quality DNN models achieve high inference accuracy by training millions of DNN parameters which has a significant resource footprint. This presents a challenge for resources operating at the extreme edge of the network, such as mobile and embedded devices that have limited computational and memory resources. To address this, models are pruned to create lightweight, more suitable variants for these devices. Existing pruning methods are unable to provide similar quality models compared to their unpruned counterparts without significant time costs and overheads or are limited to offline use cases. Our work rapidly derives suitable model variants while maintaining the accuracy of the original model. The model variants can be swapped quickly when system and network conditions change to match workload demand. This paper presents DNNShifter, an end-to-end DNN training, spatial pruning, and model switching system that addresses the challenges mentioned above. At the heart of DNNShifter is a novel methodology that prunes sparse models using structured pruning. The pruned model variants generated by DNNShifter are smaller in size and thus faster than dense and sparse model predecessors, making them suitable for inference at the edge while retaining near similar accuracy as of the original dense model. DNNShifter generates a portfolio of model variants that can be swiftly interchanged depending on operational conditions. DNNShifter produces pruned model variants up to 93x faster than conventional training methods. Compared to sparse models, the pruned model variants are up to 5.14x smaller and have a 1.67x inference latency speedup, with no compromise to sparse model accuracy. In addition, DNNShifter has up to 11.9x lower overhead for switching models and up to 3.8x lower memory utilisation than existing approaches.
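Structured pruning removes whole output channels rather than individual weights, so the pruned model is genuinely smaller and faster. A minimal sketch using L1-norm channel ranking on a plain weight matrix (a common criterion, not necessarily DNNShifter's exact one):

```python
def prune_channels(weight, keep_ratio=0.5):
    """Structured pruning sketch: drop output channels (rows)
    with the smallest L1 norm. Returns the smaller matrix and kept indices."""
    norms = [sum(abs(w) for w in row) for row in weight]
    n_keep = max(1, int(len(weight) * keep_ratio))
    ranked = sorted(range(len(weight)), key=lambda i: norms[i], reverse=True)
    kept = sorted(ranked[:n_keep])
    return [weight[i] for i in kept], kept

# Four output channels; two carry almost no weight mass.
weight = [[0.9, -0.8], [0.01, 0.02], [-0.5, 0.6], [0.0, 0.01]]
pruned, kept = prune_channels(weight, keep_ratio=0.5)
```

Because entire rows are removed, downstream layers shrink accordingly — which is what yields dense-speed inference, unlike unstructured (sparse) pruning that merely zeroes scattered weights.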

Setting the Right Expectations: Algorithmic Recourse Over Time

  • paper_url: http://arxiv.org/abs/2309.06969
  • repo_url: None
  • paper_authors: Joao Fonseca, Andrew Bell, Carlo Abrate, Francesco Bonchi, Julia Stoyanovich
  • for: This paper concerns high-stakes algorithmic decision making and the reliability of algorithmic recourse over time.
  • methods: An agent-based simulation framework for studying algorithmic recourse in a continuously changing environment.
  • results: Only a small set of specific parameterizations yields recourse that remains reliable for agents over time, indicating that substantial further work on recourse reliability is needed.
    Abstract Algorithmic systems are often called upon to assist in high-stakes decision making. In light of this, algorithmic recourse, the principle wherein individuals should be able to take action against an undesirable outcome made by an algorithmic system, is receiving growing attention. The bulk of the literature on algorithmic recourse to-date focuses primarily on how to provide recourse to a single individual, overlooking a critical element: the effects of a continuously changing context. Disregarding these effects on recourse is a significant oversight, since, in almost all cases, recourse consists of an individual making a first, unfavorable attempt, and then being given an opportunity to make one or several attempts at a later date - when the context might have changed. This can create false expectations, as initial recourse recommendations may become less reliable over time due to model drift and competition for access to the favorable outcome between individuals. In this work we propose an agent-based simulation framework for studying the effects of a continuously changing environment on algorithmic recourse. In particular, we identify two main effects that can alter the reliability of recourse for individuals represented by the agents: (1) competition with other agents acting upon recourse, and (2) competition with new agents entering the environment. Our findings highlight that only a small set of specific parameterizations result in algorithmic recourse that is reliable for agents over time. Consequently, we argue that substantial additional work is needed to understand recourse reliability over time, and to develop recourse methods that reward agents' effort.
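The core dynamic — recourse recommendations going stale under competition — can be sketched with a toy agent-based loop: each round only the top-k scores are accepted, and rejected agents act on recourse by improving their score, which pushes the acceptance cutoff upward. All parameters are invented for illustration:

```python
import random

def simulate_recourse(n_agents=100, steps=8, capacity=10, effort=0.05, seed=0):
    """Toy simulation: each round the top `capacity` agents are accepted;
    rejected agents act on recourse, improving their score by `effort`.
    Returns the acceptance cutoff per round."""
    rng = random.Random(seed)
    scores = [rng.random() for _ in range(n_agents)]
    cutoffs = []
    for _ in range(steps):
        cutoff = sorted(scores, reverse=True)[capacity - 1]
        cutoffs.append(cutoff)
        # Competition: rejected agents improve, accepted agents stand still.
        scores = [s if s >= cutoff else s + effort for s in scores]
    return cutoffs

cutoffs = simulate_recourse()
```

The rising cutoff illustrates the paper's point: an agent who faithfully follows an early recommendation can still be rejected later, because other agents' effort moved the bar.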

Attention-based Dynamic Graph Convolutional Recurrent Neural Network for Traffic Flow Prediction in Highway Transportation

  • paper_url: http://arxiv.org/abs/2309.07196
  • repo_url: None
  • paper_authors: Tianpu Zhang, Weilong Ding, Mengda Xing
  • for: traffic flow prediction in highway transportation
  • methods: Attention-based Dynamic Graph Convolutional Recurrent Neural Network (ADGCRNN) with self-attention, multi-dynamic graphs, and dedicated gated kernel for graph convolution operations
  • results: better performance than state-of-the-art baselines and practical benefit in highway transportation, as proven by experiments on two public datasets and case studies of a real Web system.
    Abstract As one of the important tools for spatial feature extraction, graph convolution has been applied in a wide range of fields such as traffic flow prediction. However, current popular works of graph convolution cannot guarantee spatio-temporal consistency in a long period. The ignorance of correlational dynamics, convolutional locality and temporal comprehensiveness would limit predictive accuracy. In this paper, a novel Attention-based Dynamic Graph Convolutional Recurrent Neural Network (ADGCRNN) is proposed to improve traffic flow prediction in highway transportation. Three temporal resolutions of data sequence are effectively integrated by self-attention to extract characteristics; multi-dynamic graphs and their weights are dynamically created to compliantly combine the varying characteristics; a dedicated gated kernel emphasizing highly relative nodes is introduced on these complete graphs to reduce overfitting for graph convolution operations. Experiments on two public datasets show our work better than state-of-the-art baselines, and case studies of a real Web system prove practical benefit in highway transportation.
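The idea of building a dynamic graph from attention can be sketched in plain Python: compute pairwise attention weights from node features, normalize each row with a softmax, and use the resulting matrix as the adjacency for one graph-convolution step. This is a generic sketch of the mechanism, not the ADGCRNN architecture:

```python
import math

def attention_adjacency(features):
    """Dynamic adjacency from pairwise dot-product attention, row-softmaxed."""
    n = len(features)
    adj = []
    for i in range(n):
        logits = [sum(a * b for a, b in zip(features[i], features[j]))
                  for j in range(n)]
        mx = max(logits)                       # subtract max for stability
        exps = [math.exp(l - mx) for l in logits]
        z = sum(exps)
        adj.append([e / z for e in exps])
    return adj

def graph_conv(features, adj):
    """One propagation step: each node becomes the attention-weighted
    mean of all node features."""
    n, d = len(features), len(features[0])
    return [[sum(adj[i][j] * features[j][k] for j in range(n)) for k in range(d)]
            for i in range(n)]

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
h = graph_conv(x, attention_adjacency(x))
```

Because the adjacency is recomputed from the current features, the graph changes over time — the "dynamic graph" idea the paper builds on.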

Towards Reliable Dermatology Evaluation Benchmarks

  • paper_url: http://arxiv.org/abs/2309.06961
  • repo_url: https://github.com/digital-dermatology/selfclean-revised-benchmarks
  • paper_authors: Fabian Gröger, Simone Lionetti, Philippe Gottfrois, Alvaro Gonzalez-Jimenez, Matthew Groh, Roxana Daneshjou, Labelling Consortium, Alexander A. Navarini, Marc Pouly
  • for: This paper aims to make performance evaluation of digital dermatology models more trustworthy.
  • methods: A resource-efficient data-cleaning protocol that applies an existing algorithmic cleaning strategy, followed by confirmation from multiple dermatologists, to remove irrelevant samples and near duplicates and to estimate the percentage of label errors in each dataset.
  • results: Applied to six dermatology image datasets promoted by the International Skin Imaging Collaboration, the protocol uncovers inaccurate labels and estimates their prevalence; revised file lists for each dataset are published for more reliable model evaluation.
    Abstract Benchmark datasets for digital dermatology unwittingly contain inaccuracies that reduce trust in model performance estimates. We propose a resource-efficient data cleaning protocol to identify issues that escaped previous curation. The protocol leverages an existing algorithmic cleaning strategy and is followed by a confirmation process terminated by an intuitive stopping criterion. Based on confirmation by multiple dermatologists, we remove irrelevant samples and near duplicates and estimate the percentage of label errors in six dermatology image datasets for model evaluation promoted by the International Skin Imaging Collaboration. Along with this paper, we publish revised file lists for each dataset which should be used for model evaluation. Our work paves the way for more trustworthy performance assessment in digital dermatology.
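The near-duplicate step of such a cleaning protocol can be sketched as thresholded cosine similarity over image feature vectors, with flagged pairs passed on for expert confirmation. The vectors below are toy stand-ins for real image embeddings:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def find_near_duplicates(vectors, threshold=0.95):
    """Return index pairs whose embeddings are nearly identical,
    as candidates for expert (dermatologist) confirmation."""
    pairs = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                pairs.append((i, j))
    return pairs

vecs = [[1.0, 0.0, 0.0], [0.99, 0.01, 0.0], [0.0, 1.0, 0.0]]
dupes = find_near_duplicates(vecs)
```

The paper's key design point sits after this step: algorithmic flags alone are not trusted, and an intuitive stopping criterion decides when dermatologist confirmation can end.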

PhantomSound: Black-Box, Query-Efficient Audio Adversarial Attack via Split-Second Phoneme Injection

  • paper_url: http://arxiv.org/abs/2309.06960
  • repo_url: None
  • paper_authors: Hanqing Guo, Guangjing Wang, Yuanda Wang, Bocheng Chen, Qiben Yan, Li Xiao
  • for: A query-efficient black-box adversarial attack against voice assistants.
  • methods: Leverages a decision-based attack to produce effective adversarial audio and reduces the number of queries by optimizing the gradient estimation.
  • results: Successfully attacks 5 popular commercial voice-controllable devices over the air in 3 real-world scenarios, bypasses 3 liveness-detection mechanisms with a success rate above 95%, and can generate adversarial examples and launch the attack within minutes.
    Abstract In this paper, we propose PhantomSound, a query-efficient black-box attack toward voice assistants. Existing black-box adversarial attacks on voice assistants either apply substitution models or leverage the intermediate model output to estimate the gradients for crafting adversarial audio samples. However, these attack approaches require a significant amount of queries with a lengthy training stage. PhantomSound leverages the decision-based attack to produce effective adversarial audios, and reduces the number of queries by optimizing the gradient estimation. In the experiments, we perform our attack against 4 different speech-to-text APIs under 3 real-world scenarios to demonstrate the real-time attack impact. The results show that PhantomSound is practical and robust in attacking 5 popular commercial voice controllable devices over the air, and is able to bypass 3 liveness detection mechanisms with >95% success rate. The benchmark result shows that PhantomSound can generate adversarial examples and launch the attack in a few minutes. We significantly enhance the query efficiency and reduce the cost of a successful untargeted and targeted adversarial attack by 93.1% and 65.5% compared with the state-of-the-art black-box attacks, using merely ~300 queries (~5 minutes) and ~1,500 queries (~25 minutes), respectively.
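Black-box attacks of this kind rely on zeroth-order gradient estimation, since only the model's outputs are observable and the query count is what drives the cost. A generic antithetic-sampling estimator (a standard building block, not PhantomSound's exact optimization):

```python
import random

def estimate_gradient(f, x, sigma=0.001, n_samples=800, seed=0):
    """Estimate the gradient of a black-box score f at x using only queries,
    via antithetic Gaussian sampling (2 * n_samples queries in total)."""
    rng = random.Random(seed)
    grad = [0.0] * len(x)
    for _ in range(n_samples):
        u = [rng.gauss(0.0, 1.0) for _ in x]
        f_plus = f([xi + sigma * ui for xi, ui in zip(x, u)])
        f_minus = f([xi - sigma * ui for xi, ui in zip(x, u)])
        scale = (f_plus - f_minus) / (2.0 * sigma * n_samples)
        grad = [g + scale * ui for g, ui in zip(grad, u)]
    return grad

# Sanity check against a known function: f(x) = sum(x_i^2), grad at [1, 2] is [2, 4].
g = estimate_gradient(lambda v: sum(t * t for t in v), [1.0, 2.0])
```

The query budget (here 1,600 calls to `f`) is exactly the resource the paper optimizes: better gradient estimates per query mean fewer queries per successful attack.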

Implicit Neural Multiple Description for DNA-based data storage

  • paper_url: http://arxiv.org/abs/2309.06956
  • repo_url: None
  • paper_authors: Trung Hieu Le, Xavier Pic, Jeremy Mateos, Marc Antonini
  • for: This paper explores DNA as a data-storage medium and techniques for handling errors arising from storage and biological manipulation.
  • methods: A novel compression scheme and a Multiple Description Coding (MDC) technique that use neural networks to encode data into DNA.
  • results: Experimental results show the approach competes favorably with state-of-the-art DNA data-storage methods, offering superior compression rates and robust noise resilience.
    Abstract DNA exhibits remarkable potential as a data storage solution due to its impressive storage density and long-term stability, stemming from its inherent biomolecular structure. However, developing this novel medium comes with its own set of challenges, particularly in addressing errors arising from storage and biological manipulations. These challenges are further conditioned by the structural constraints of DNA sequences and cost considerations. In response to these limitations, we have pioneered a novel compression scheme and a cutting-edge Multiple Description Coding (MDC) technique utilizing neural networks for DNA data storage. Our MDC method introduces an innovative approach to encoding data into DNA, specifically designed to withstand errors effectively. Notably, our new compression scheme overperforms classic image compression methods for DNA-data storage. Furthermore, our approach exhibits superiority over conventional MDC methods reliant on auto-encoders. Its distinctive strengths lie in its ability to bypass the need for extensive model training and its enhanced adaptability for fine-tuning redundancy levels. Experimental results demonstrate that our solution competes favorably with the latest DNA data storage methods in the field, offering superior compression rates and robust noise resilience.
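The basic step of encoding binary data as nucleotides can be sketched with a 2-bits-per-base map and its inverse. This is a generic illustration, not the paper's codec: a real DNA codec must additionally respect the structural constraints the abstract mentions (homopolymer runs, GC content) and add redundancy against synthesis/sequencing errors.

```python
BASES = "ACGT"

def bytes_to_dna(data):
    """Encode each byte as four bases, two bits per base (big-endian within the byte)."""
    return "".join(BASES[(byte >> shift) & 0b11]
                   for byte in data for shift in (6, 4, 2, 0))

def dna_to_bytes(strand):
    """Inverse of bytes_to_dna."""
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)

strand = bytes_to_dna(b"Hi")
```

MDC then spreads such encoded payloads over multiple descriptions so that any subset of surviving strands still yields a usable reconstruction.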

DEFormer: DCT-driven Enhancement Transformer for Low-light Image and Dark Vision

  • paper_url: http://arxiv.org/abs/2309.06941
  • repo_url: None
  • paper_authors: Xiangchen Yin, Zhenda Yu, Xin Gao, Ran Ju, Xiao Sun, Xinyu Zhang
  • for: To restore the color and details of low-light images, which is of great significance for high-level vision tasks in autonomous driving.
  • methods: The paper introduces frequency as a new cue and proposes a DCT-driven enhancement transformer (DEFormer), with a learnable frequency branch (LFB) combining DCT processing and curvature-based frequency enhancement (CFE), and a cross-domain fusion (CDF) that reduces the differences between the RGB and frequency domains.
  • results: Used as preprocessing for dark detection, DEFormer effectively improves detector performance, bringing 2.1% and 3.4% mAP improvements on the ExDark and DARK FACE datasets, respectively.
    Abstract The goal of low-light image enhancement is to restore the color and details of the image and is of great significance for high-level visual tasks in autonomous driving. However, it is difficult to restore the lost details in the dark area by relying only on the RGB domain. In this paper we introduce frequency as a new clue into the network and propose a novel DCT-driven enhancement transformer (DEFormer). First, we propose a learnable frequency branch (LFB) for frequency enhancement contains DCT processing and curvature-based frequency enhancement (CFE). CFE calculates the curvature of each channel to represent the detail richness of different frequency bands, then we divides the frequency features, which focuses on frequency bands with richer textures. In addition, we propose a cross domain fusion (CDF) for reducing the differences between the RGB domain and the frequency domain. We also adopt DEFormer as a preprocessing in dark detection, DEFormer effectively improves the performance of the detector, bringing 2.1% and 3.4% improvement in ExDark and DARK FACE datasets on mAP respectively.
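The frequency branch can be sketched in two steps: a DCT to move a signal into the frequency domain, then a discrete-curvature measure (second difference of coefficient magnitudes) as a proxy for how detail-rich each band is. A naive 1-D stand-in for the paper's 2-D image pipeline, with an invented curvature formula in the spirit of CFE:

```python
import math

def dct_ii(x):
    """Naive O(n^2) DCT-II of a 1-D signal (unnormalized)."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
            for k in range(n)]

def band_curvature(coeffs):
    """Second difference of coefficient magnitudes: a simple proxy
    for the 'detail richness' of each frequency band."""
    m = [abs(c) for c in coeffs]
    return [m[k - 1] - 2.0 * m[k] + m[k + 1] for k in range(1, len(m) - 1)]

flat = dct_ii([1.0, 1.0, 1.0, 1.0])   # constant signal: energy only at DC
edge = dct_ii([0.0, 0.0, 1.0, 1.0])   # step edge: energy spreads into AC bands
```

A flat dark region and a textured one separate cleanly in this representation, which is what lets the network focus enhancement on bands with richer textures.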

Collectionless Artificial Intelligence

  • paper_url: http://arxiv.org/abs/2309.06938
  • repo_url: None
  • paper_authors: Marco Gori, Stefano Melacci
  • For: The paper advocates for the development of new machine learning protocols that prioritize human-like cognitive skills and environmental interactions, rather than relying on centralized data collections.* Methods: The proposed approach is based on the collectionless principle, which restricts the learning process to processing data acquired from the environment at each time instant, without storing temporal information. This promotes self-organized memorization skills and dynamic information organization.* Results: The authors suggest that this approach could lead to the development of AI technologies that are better suited to addressing privacy issues, control, and customizability, and that it could reduce the concentration of power in companies and governments by promoting massively distributed computation.
    Abstract By and large, the professional handling of huge data collections is regarded as a fundamental ingredient of the progress of machine learning and of its spectacular results in related disciplines, with a growing agreement on risks connected to the centralization of such data collections. This paper sustains the position that the time has come for thinking of new learning protocols where machines conquer cognitive skills in a truly human-like context centered on environmental interactions. This comes with specific restrictions on the learning protocol according to the collectionless principle, which states that, at each time instant, data acquired from the environment is processed with the purpose of contributing to update the current internal representation of the environment, and that the agent is not given the privilege of recording the temporal stream. Basically, there is neither permission to store the temporal information coming from the sensors, thus promoting the development of self-organized memorization skills at a more abstract level, instead of relying on bare storage to simulate learning dynamics that are typical of offline learning algorithms. This purposely extreme position is intended to stimulate the development of machines that learn to dynamically organize the information by following human-based schemes. The proposition of this challenge suggests developing new foundations on computational processes of learning and reasoning that might open the doors to a truly orthogonal competitive track on AI technologies that avoid data accumulation by design, thus offering a framework which is better suited concerning privacy issues, control and customizability. Finally, pushing towards massively distributed computation, the collectionless approach to AI will likely reduce the concentration of power in companies and governments, thus better facing geopolitical issues.

Continual Learning with Dirichlet Generative-based Rehearsal

  • paper_url: http://arxiv.org/abs/2309.06917
  • repo_url: None
  • paper_authors: Min Zeng, Wei Xue, Qifeng Liu, Yike Guo
  • for: To improve continual learning in data-driven task-oriented dialogue systems (ToDs), which struggles with computational constraints and time-consuming retraining.
  • methods: Dirichlet Continual Learning (DCL), a generative-based rehearsal strategy that models the latent prior with a Dirichlet distribution, efficiently capturing sentence-level features of previous tasks and guiding the generation of pseudo samples; additionally, Jensen-Shannon Knowledge Distillation (JSKD), a robust logit-based knowledge-distillation method, enhances knowledge transfer during pseudo-sample generation.
  • results: The method outperforms state-of-the-art approaches on intent detection and slot-filling tasks.
    Abstract Recent advancements in data-driven task-oriented dialogue systems (ToDs) struggle with incremental learning due to computational constraints and time-consuming issues. Continual Learning (CL) attempts to solve this by avoiding intensive pre-training, but it faces the problem of catastrophic forgetting (CF). While generative-based rehearsal CL methods have made significant strides, generating pseudo samples that accurately reflect the underlying task-specific distribution is still a challenge. In this paper, we present Dirichlet Continual Learning (DCL), a novel generative-based rehearsal strategy for CL. Unlike the traditionally used Gaussian latent variable in the Conditional Variational Autoencoder (CVAE), DCL leverages the flexibility and versatility of the Dirichlet distribution to model the latent prior variable. This enables it to efficiently capture sentence-level features of previous tasks and effectively guide the generation of pseudo samples. In addition, we introduce Jensen-Shannon Knowledge Distillation (JSKD), a robust logit-based knowledge distillation method that enhances knowledge transfer during pseudo sample generation. Our experiments confirm the efficacy of our approach in both intent detection and slot-filling tasks, outperforming state-of-the-art methods.
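The JSKD component can be illustrated with a minimal, pure-Python sketch: the Jensen-Shannon divergence between the teacher's and student's softmax distributions is symmetric and bounded above by log 2, which is what makes it a robust logit-based distillation objective. The logit values below are illustrative, not taken from the paper.

```python
import math

def kl(p, q):
    """KL divergence KL(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL to the midpoint distribution.
    Symmetric in (p, q) and bounded above by log 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def softmax(logits):
    exps = [math.exp(x - max(logits)) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical teacher/student logits for one pseudo sample.
teacher_logits = [2.0, 0.5, -1.0]
student_logits = [1.5, 0.8, -0.5]
loss = js_divergence(softmax(teacher_logits), softmax(student_logits))
```

Unlike plain KL distillation, the JS loss stays finite even when the student assigns near-zero mass where the teacher does not, since both terms are measured against the midpoint distribution.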

Towards the TopMost: A Topic Modeling System Toolkit

  • paper_url: http://arxiv.org/abs/2309.06908
  • repo_url: https://github.com/bobxwu/topmost
  • paper_authors: Xiaobao Wu, Fengjun Pan, Anh Tuan Luu
  • for: Proposes a Topic Modeling System Toolkit (TopMost) that covers the complete topic-modeling lifecycle, including dataset pre-processing, model training, testing, and evaluation.
  • methods: A highly cohesive yet decoupled modular design that enables quick use, fair comparison, and flexible extension of different topic models.
  • results: Compared to existing toolkits, TopMost covers a wider range of topic modeling scenarios, spanning complete lifecycles from dataset pre-processing through training and testing to evaluation.
    Abstract Topic models have been proposed for decades with various applications and recently refreshed by the neural variational inference. However, these topic models adopt totally distinct dataset, implementation, and evaluation settings, which hinders their quick utilization and fair comparisons. This greatly hinders the research progress of topic models. To address these issues, in this paper we propose a Topic Modeling System Toolkit (TopMost). Compared to existing toolkits, TopMost stands out by covering a wider range of topic modeling scenarios including complete lifecycles with dataset pre-processing, model training, testing, and evaluations. The highly cohesive and decoupled modular design of TopMost enables quick utilization, fair comparisons, and flexible extensions of different topic models. This can facilitate the research and applications of topic models. Our code, tutorials, and documentation are available at https://github.com/bobxwu/topmost.

OWL Reasoners still useable in 2023

  • paper_url: http://arxiv.org/abs/2309.06888
  • repo_url: https://github.com/k00ni/owl-reasoner-list
  • paper_authors: Konrad Abicht
  • for: A systematic literature and software review of OWL reasoners to determine whether they are still usable in 2023.
  • methods: Analysis of over 100 OWL reasoners/systems, gathering information on each item's project page, source code repository, and related documentation.
  • results: A comprehensive list of 95 standalone OWL reasoners and systems using an OWL reasoner, with the raw research data provided in a GitHub repository for anyone to use.
    Abstract In a systematic literature and software review over 100 OWL reasoners/systems were analyzed to see if they would still be usable in 2023. This has never been done in this capacity. OWL reasoners still play an important role in knowledge organisation and management, but the last comprehensive surveys/studies are more than 8 years old. The result of this work is a comprehensive list of 95 standalone OWL reasoners and systems using an OWL reasoner. For each item, information on project pages, source code repositories and related documentation was gathered. The raw research data is provided in a Github repository for anyone to use.

Gpachov at CheckThat! 2023: A Diverse Multi-Approach Ensemble for Subjectivity Detection in News Articles

  • paper_url: http://arxiv.org/abs/2309.06844
  • repo_url: None
  • paper_authors: Georgi Pachov, Dimitar Dimitrov, Ivan Koychev, Preslav Nakov
  • for: Improving the objectivity and quality of information on social networks by detecting subjectivity in news articles.
  • methods: Three research directions are explored: (i) fine-tuning a sentence-embedding encoder model with dimensionality reduction; (ii) a sample-efficient few-shot learning model; and (iii) fine-tuning a multilingual transformer on an altered dataset using data from multiple languages. The three approaches are combined in a simple majority-voting ensemble.
  • results: 0.77 macro F1 on the CheckThat! lab Task 2 test set, placing 2nd on the English subtask.
    Abstract The wide-spread use of social networks has given rise to subjective, misleading, and even false information on the Internet. Thus, subjectivity detection can play an important role in ensuring the objectiveness and the quality of a piece of information. This paper presents the solution built by the Gpachov team for the CLEF-2023 CheckThat! lab Task~2 on subjectivity detection. Three different research directions are explored. The first one is based on fine-tuning a sentence embeddings encoder model and dimensionality reduction. The second one explores a sample-efficient few-shot learning model. The third one evaluates fine-tuning a multilingual transformer on an altered dataset, using data from multiple languages. Finally, the three approaches are combined in a simple majority voting ensemble, resulting in 0.77 macro F1 on the test set and achieving 2nd place on the English subtask.
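The final ensembling step is a simple majority vote over the three approaches' predicted labels. A minimal sketch of that step (the sentences, label names, and first-model tie-breaking rule are illustrative assumptions, not from the paper):

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model label predictions by simple majority; ties are
    broken in favour of the earliest-listed model's prediction."""
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:  # iteration order preserves model order
        if counts[label] == top:
            return label

# Three approaches (sentence-embedding model, few-shot model, multilingual
# transformer) each label a sentence as SUBJ or OBJ; values are made up.
per_model = {
    "This is clearly the worst policy ever.": ["SUBJ", "SUBJ", "OBJ"],
    "The law was passed in 2021.":            ["OBJ", "OBJ", "OBJ"],
}
labels = {s: majority_vote(v) for s, v in per_model.items()}
```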

On the Local Quadratic Stability of T-S Fuzzy Systems in the Vicinity of the Origin

  • paper_url: http://arxiv.org/abs/2309.06841
  • repo_url: None
  • paper_authors: Donghwan Lee, Do Wan Kim
  • for: Introduces new local stability conditions for continuous-time Takagi-Sugeno (T-S) fuzzy systems.
  • methods: The conditions are based on linear matrix inequalities (LMIs) combined with quadratic Lyapunov functions; they integrate information on the membership functions at the origin and exploit the linear structure of the underlying nonlinear system in the vicinity of the origin.
  • results: The proposed conditions are less conservative than existing fuzzy-Lyapunov-function approaches and are shown to be necessary and sufficient for the local exponential stability of T-S fuzzy systems.
    Abstract The main goal of this paper is to introduce new local stability conditions for continuous-time Takagi-Sugeno (T-S) fuzzy systems. These stability conditions are based on linear matrix inequalities (LMIs) in combination with quadratic Lyapunov functions. Moreover, they integrate information on the membership functions at the origin and effectively leverage the linear structure of the underlying nonlinear system in the vicinity of the origin. As a result, the proposed conditions are proved to be less conservative compared to existing methods using fuzzy Lyapunov functions in the literature. Moreover, we establish that the proposed methods offer necessary and sufficient conditions for the local exponential stability of T-S fuzzy systems. The paper also includes discussions on the inherent limitations associated with fuzzy Lyapunov approaches. To demonstrate the theoretical results, we provide comprehensive examples that elucidate the core concepts and validate the efficacy of the proposed conditions.
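For context, a hedged sketch of the classical (global) quadratic-stability test that the paper's local conditions refine. A T-S fuzzy system blends $r$ linear subsystems through membership functions, and a common quadratic Lyapunov function certifies stability via LMIs:

```latex
% T-S fuzzy dynamics as a convex blend of r linear subsystems:
\dot{x}(t) \;=\; \sum_{i=1}^{r} h_i(z(t))\, A_i\, x(t),
\qquad h_i \ge 0,\quad \sum_{i=1}^{r} h_i = 1 .
% Common quadratic Lyapunov function:
V(x) \;=\; x^{\top} P x .
% Classical global LMI certificate:
P \succ 0, \qquad A_i^{\top} P + P A_i \prec 0, \quad i = 1,\dots,r .
```

The paper's contribution, per the abstract, is to replace this common-$P$ global test with less conservative local conditions that additionally use the values of the membership functions $h_i$ at the origin and the linear structure of the underlying nonlinear system there.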

SAMUS: Adapting Segment Anything Model for Clinically-Friendly and Generalizable Ultrasound Image Segmentation

  • paper_url: http://arxiv.org/abs/2309.06824
  • repo_url: https://github.com/xianlin7/samus
  • paper_authors: Xian Lin, Yangyang Xiang, Li Zhang, Xin Yang, Zengqiang Yan, Li Yu
  • for: Proposes SAMUS, a universal model tailored for ultrasound image segmentation that improves the performance of the Segment Anything Model (SAM) foundation model on medical images.
  • methods: On top of SAM, a parallel CNN branch injects local features into the ViT encoder through cross-branch attention; a position adapter and a feature adapter adapt SAM from the natural to the medical domain and from large inputs (1024x1024) to small, more clinically friendly inputs (256x256).
  • results: Extensive comparisons show SAMUS outperforms both state-of-the-art task-specific models and universal foundation models, and it is deployable on entry-level GPUs; code, data, and models will be released on GitHub.
    Abstract Segment anything model (SAM), an eminent universal image segmentation model, has recently gathered considerable attention within the domain of medical image segmentation. Despite the remarkable performance of SAM on natural images, it grapples with significant performance degradation and limited generalization when confronted with medical images, particularly with those involving objects of low contrast, faint boundaries, intricate shapes, and diminutive sizes. In this paper, we propose SAMUS, a universal model tailored for ultrasound image segmentation. In contrast to previous SAM-based universal models, SAMUS pursues not only better generalization but also lower deployment cost, rendering it more suitable for clinical applications. Specifically, based on SAM, a parallel CNN branch is introduced to inject local features into the ViT encoder through cross-branch attention for better medical image segmentation. Then, a position adapter and a feature adapter are developed to adapt SAM from natural to medical domains and from requiring large-size inputs (1024x1024) to small-size inputs (256x256) for more clinical-friendly deployment. A comprehensive ultrasound dataset, comprising about 30k images and 69k masks and covering six object categories, is collected for verification. Extensive comparison experiments demonstrate SAMUS's superiority against the state-of-the-art task-specific models and universal foundation models under both task-specific evaluation and generalization evaluation. Moreover, SAMUS is deployable on entry-level GPUs, as it has been liberated from the constraints of long sequence encoding. The code, data, and models will be released at https://github.com/xianlin7/SAMUS.

Comparative Analysis of Contextual Relation Extraction based on Deep Learning Models

  • paper_url: http://arxiv.org/abs/2309.06814
  • repo_url: None
  • paper_authors: R. Priyadharshini, G. Jeyakodi, P. Shanthi Bala
  • for: Building domain knowledge graphs with the help of an ontology, supporting tasks such as semantic search, query answering, and textual entailment.
  • methods: Deep learning techniques are used to identify the appropriate semantic relation based on context across multiple sentences, since existing machine learning and NLP techniques cannot efficiently predict complex relations from sentences containing more than two relations and unspecified entities.
  • results: Hybrid deep learning models effectively extract relations from complex sentences, overcoming the limitation of conventional machine learning models, which perform well only on binary relations.
    Abstract Contextual Relation Extraction (CRE) is mainly used for constructing a knowledge graph with a help of ontology. It performs various tasks such as semantic search, query answering, and textual entailment. Relation extraction identifies the entities from raw texts and the relations among them. An efficient and accurate CRE system is essential for creating domain knowledge in the biomedical industry. Existing Machine Learning and Natural Language Processing (NLP) techniques are not suitable to predict complex relations from sentences that consist of more than two relations and unspecified entities efficiently. In this work, deep learning techniques have been used to identify the appropriate semantic relation based on the context from multiple sentences. Even though various machine learning models have been used for relation extraction, they provide better results only for binary relations, i.e., relations occurred exactly between the two entities in a sentence. Machine learning models are not suited for complex sentences that consist of the words that have various meanings. To address these issues, hybrid deep learning models have been used to extract the relations from complex sentence effectively. This paper explores the analysis of various deep learning models that are used for relation extraction.

Leveraging SE(3) Equivariance for Learning 3D Geometric Shape Assembly

  • paper_url: http://arxiv.org/abs/2309.06810
  • repo_url: https://github.com/crtie/leveraging-se-3-equivariance-for-learning-3d-geometric-shape-assembly
  • paper_authors: Ruihai Wu, Chenrui Tie, Yushi Du, Yan Zhao, Hao Dong
  • for: Tackles geometric shape assembly, i.e., reassembling fractured parts (e.g., bowl fragments) into a complete object, an emerging task in computer vision and robotics that, unlike semantic part assembly (e.g., assembling a chair from semantic parts such as legs), relies on geometric rather than semantic information.
  • methods: Leverages SE(3) equivariance to disentangle shape and pose in part representations; whereas prior work in vision and robotics considers SE(3) equivariance only for single-object representations, the proposed method applies it to representations that account for multi-part correlations, further boosting multi-part assembly.
  • results: Experiments demonstrate the significance of SE(3) equivariance and the proposed method for geometric shape assembly. Project page: https://crtie.github.io/SE-3-part-assembly/
    Abstract Shape assembly aims to reassemble parts (or fragments) into a complete object, which is a common task in our daily life. Different from the semantic part assembly (e.g., assembling a chair's semantic parts like legs into a whole chair), geometric part assembly (e.g., assembling bowl fragments into a complete bowl) is an emerging task in computer vision and robotics. Instead of semantic information, this task focuses on geometric information of parts. As the both geometric and pose space of fractured parts are exceptionally large, shape pose disentanglement of part representations is beneficial to geometric shape assembly. In our paper, we propose to leverage SE(3) equivariance for such shape pose disentanglement. Moreover, while previous works in vision and robotics only consider SE(3) equivariance for the representations of single objects, we move a step forward and propose leveraging SE(3) equivariance for representations considering multi-part correlations, which further boosts the performance of the multi-part assembly. Experiments demonstrate the significance of SE(3) equivariance and our proposed method for geometric shape assembly. Project page: https://crtie.github.io/SE-3-part-assembly/
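The SE(3) equivariance the method relies on can be illustrated with a toy example: a feature map f is rotation-equivariant if rotating the input point cloud and then applying f gives the same result as applying f and rotating its output. The centroid below is a deliberately simple equivariant feature, standing in for the learned part representations:

```python
import math

def rotz(theta):
    """3x3 rotation matrix about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def apply(R, p):
    """Apply a 3x3 rotation matrix R to a 3-vector p."""
    return [sum(R[i][j] * p[j] for j in range(3)) for i in range(3)]

def centroid(points):
    """Mean of a point cloud: a simple rotation-equivariant feature."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(3)]

points = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
R = rotz(math.pi / 3)

# Equivariance: f(R x) == R f(x) for the feature f = centroid.
f_then_rotate = apply(R, centroid(points))
rotate_then_f = centroid([apply(R, p) for p in points])
```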

Bayesian uncertainty-weighted loss for improved generalisability on polyp segmentation task

  • paper_url: http://arxiv.org/abs/2309.06807
  • repo_url: None
  • paper_authors: Rebecca S. Stone, Pedro E. Chavarrias-Solano, Andrew J. Bulpitt, David C. Hogg, Sharib Ali
  • for: Improving the generalizability of polyp segmentation models, reducing bias caused by differences in polyp appearance across centers, endoscopic instrument grades, and acquisition quality.
  • methods: An implicit bias mitigation method that leverages Bayesian epistemic uncertainty during training to encourage the model to focus on underrepresented sample regions.
  • results: On the challenging multi-center PolypGen dataset (different centers and image modalities), the approach improves generalizability without sacrificing state-of-the-art performance.
    Abstract While several previous studies have devised methods for segmentation of polyps, most of these methods are not rigorously assessed on multi-center datasets. Variability due to appearance of polyps from one center to another, difference in endoscopic instrument grades, and acquisition quality result in methods with good performance on in-distribution test data, and poor performance on out-of-distribution or underrepresented samples. Unfair models have serious implications and pose a critical challenge to clinical applications. We adapt an implicit bias mitigation method which leverages Bayesian epistemic uncertainties during training to encourage the model to focus on underrepresented sample regions. We demonstrate the potential of this approach to improve generalisability without sacrificing state-of-the-art performance on a challenging multi-center polyp segmentation dataset (PolypGen) with different centers and image modalities.
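One plausible form of an uncertainty-weighted training objective, shown purely as a hedged sketch (the paper's exact weighting scheme may differ): per-sample losses are reweighted so that high-epistemic-uncertainty samples, which tend to come from underrepresented regions, contribute more to the total loss.

```python
def uncertainty_weighted_loss(losses, uncertainties, eps=1e-8):
    """Reweight per-sample losses by epistemic uncertainty, normalized so
    the weights average to 1 (equal uncertainties reduce to the plain mean).
    This weighting rule is an illustrative assumption, not the paper's."""
    mean_u = sum(uncertainties) / len(uncertainties)
    weights = [u / (mean_u + eps) for u in uncertainties]
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)
```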

FedDIP: Federated Learning with Extreme Dynamic Pruning and Incremental Regularization

  • paper_url: http://arxiv.org/abs/2309.06805
  • repo_url: https://github.com/ericloong/feddip
  • paper_authors: Qianyu Long, Christos Anagnostopoulos, Shameem Puthiya Parambath, Daning Bi
  • for: Proposes a novel federated learning (FL) framework that controls model sparsity and memory during distributed training and inference of large-scale deep neural networks (DNNs).
  • methods: FedDIP combines dynamic model pruning with error feedback to eliminate redundant information exchange, together with incremental regularization that can achieve extreme model sparsity.
  • results: FedDIP not only controls model sparsity but achieves similar or better performance than other model-pruning methods that adopt incremental regularization during distributed training.
    Abstract Federated Learning (FL) has been successfully adopted for distributed training and inference of large-scale Deep Neural Networks (DNNs). However, DNNs are characterized by an extremely large number of parameters, thus, yielding significant challenges in exchanging these parameters among distributed nodes and managing the memory. Although recent DNN compression methods (e.g., sparsification, pruning) tackle such challenges, they do not holistically consider an adaptively controlled reduction of parameter exchange while maintaining high accuracy levels. We, therefore, contribute with a novel FL framework (coined FedDIP), which combines (i) dynamic model pruning with error feedback to eliminate redundant information exchange, which contributes to significant performance improvement, with (ii) incremental regularization that can achieve \textit{extreme} sparsity of models. We provide convergence analysis of FedDIP and report on a comprehensive performance and comparative assessment against state-of-the-art methods using benchmark data sets and DNN models. Our results showcase that FedDIP not only controls the model sparsity but efficiently achieves similar or better performance compared to other model pruning methods adopting incremental regularization during distributed model training. The code is available at: https://github.com/EricLoong/feddip.
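The "dynamic pruning with error feedback" idea can be sketched in a few lines: at each round, the pruned-away weight mass is not discarded but accumulated into a residual that is added back before the next pruning step, so no information is permanently lost to sparsification. This is a minimal single-tensor sketch, not the full FedDIP algorithm (which adds incremental regularization and federated aggregation):

```python
def prune_with_feedback(weights, residual, sparsity):
    """Magnitude pruning with error feedback on a flat weight list.
    Keeps the largest-magnitude entries of (weights + residual); the
    pruned-away mass becomes the new residual for the next round."""
    corrected = [w + r for w, r in zip(weights, residual)]
    k = max(1, round(len(corrected) * (1 - sparsity)))  # entries to keep
    threshold = sorted((abs(c) for c in corrected), reverse=True)[k - 1]
    pruned = [c if abs(c) >= threshold else 0.0 for c in corrected]
    new_residual = [c - p for c, p in zip(corrected, pruned)]
    return pruned, new_residual

# One round at 50% sparsity; values are illustrative.
w, r = prune_with_feedback([0.5, -0.1, 0.05, 0.9],
                           [0.0, 0.0, 0.0, 0.0], sparsity=0.5)
```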

Defensive Alliances in Signed Networks

  • paper_url: http://arxiv.org/abs/2309.06801
  • repo_url: None
  • paper_authors: Emmanuel Arrighi, Zhidan Feng, Henning Fernau, Kevin Mann, Xingqin Qi, Petra Wolf
  • for: Finding groups of agents that can work together to achieve a common goal.
  • methods: Models liking and disliking between agents with signed networks and introduces a novel notion of defensive alliance in this setting.
  • results: Several natural algorithmic questions about this notion are answered, and combinatorial findings connect it to correlation clustering; a new structural parameter for signed graphs, signed neighborhood diversity (snd), is introduced, together with a parameterized algorithm that finds a smallest defensive alliance.
    Abstract The analysis of (social) networks and multi-agent systems is a central theme in Artificial Intelligence. Some line of research deals with finding groups of agents that could work together to achieve a certain goal. To this end, different notions of so-called clusters or communities have been introduced in the literature of graphs and networks. Among these, defensive alliance is a kind of quantitative group structure. However, all studies on the alliance so for have ignored one aspect that is central to the formation of alliances on a very intuitive level, assuming that the agents are preconditioned concerning their attitude towards other agents: they prefer to be in some group (alliance) together with the agents they like, so that they are happy to help each other towards their common aim, possibly then working against the agents outside of their group that they dislike. Signed networks were introduced in the psychology literature to model liking and disliking between agents, generalizing graphs in a natural way. Hence, we propose the novel notion of a defensive alliance in the context of signed networks. We then investigate several natural algorithmic questions related to this notion. These, and also combinatorial findings, connect our notion to that of correlation clustering, which is a well-established idea of finding groups of agents within a signed network. Also, we introduce a new structural parameter for signed graphs, signed neighborhood diversity snd, and exhibit a parameterized algorithm that finds a smallest defensive alliance in a signed graph.
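To make the notion concrete, here is a small checker for one plausible signed generalization of a defensive alliance. The condition used is an illustrative assumption, not necessarily the paper's exact definition: each member's defenders (itself plus its positive neighbours inside the alliance) must be at least as numerous as its attackers (its neighbours outside the alliance, plus its negative neighbours inside it).

```python
def is_defensive_alliance(edges, S):
    """Check the alliance condition described above for a candidate set S.
    `edges` maps (u, v) pairs to +1 (liking) or -1 (disliking)."""
    S = set(S)
    nbrs = {}
    for (u, v), sign in edges.items():
        nbrs.setdefault(u, []).append((v, sign))
        nbrs.setdefault(v, []).append((u, sign))
    for v in S:
        defenders = 1 + sum(1 for u, s in nbrs.get(v, []) if u in S and s > 0)
        attackers = sum(1 for u, s in nbrs.get(v, []) if u not in S or s < 0)
        if defenders < attackers:
            return False
    return True

# A positive triangle {a, b, c} with one disliked outsider d.
edges = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1, ("a", "d"): -1}
```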

Uncertainty-aware Traffic Prediction under Missing Data

  • paper_url: http://arxiv.org/abs/2309.06800
  • repo_url: None
  • paper_authors: Hao Mei, Junxian Li, Zhiming Liang, Guanjie Zheng, Bin Shi, Hua Wei
  • for: Addresses a key problem in traffic prediction: making predictions at locations with no historical records.
  • methods: An uncertainty-aware framework based on an inductive graph neural network that (1) extends prediction to locations without historical records, significantly enlarging the spatial coverage of predictions while reducing sensor deployment, and (2) generates probabilistic predictions with uncertainty quantification to support risk management and decision making in downstream tasks.
  • results: Extensive experiments on real-life datasets show promising prediction results, with uncertainty quantification that correlates strongly with whether a location has historical data; the method can also help transportation agencies place sensors under a limited budget to achieve higher accuracy.
    Abstract Traffic prediction is a crucial topic because of its broad scope of applications in the transportation domain. Recently, various studies have achieved promising results. However, most studies assume the prediction locations have complete or at least partial historical records and cannot be extended to non-historical recorded locations. In real-life scenarios, the deployment of sensors could be limited due to budget limitations and installation availability, which makes most current models not applicable. Though few pieces of literature tried to impute traffic states at the missing locations, these methods need the data simultaneously observed at the locations with sensors, making them not applicable to prediction tasks. Another drawback is the lack of measurement of uncertainty in prediction, making prior works unsuitable for risk-sensitive tasks or involving decision-making. To fill the gap, inspired by the previous inductive graph neural network, this work proposed an uncertainty-aware framework with the ability to 1) extend prediction to missing locations with no historical records and significantly extend spatial coverage of prediction locations while reducing deployment of sensors and 2) generate probabilistic prediction with uncertainty quantification to help the management of risk and decision making in the down-stream tasks. Through extensive experiments on real-life datasets, the result shows our method achieved promising results on prediction tasks, and the uncertainty quantification gives consistent results which highly correlated with the locations with and without historical data. We also show that our model could help support sensor deployment tasks in the transportation field to achieve higher accuracy with a limited sensor deployment budget.

When Geoscience Meets Foundation Models: Towards General Geoscience Artificial Intelligence System

  • paper_url: http://arxiv.org/abs/2309.06799
  • repo_url: None
  • paper_authors: Hao Zhang, Jin-Jian Xu
  • for: Surveys geoscience foundation models, which integrate massive cross-disciplinary data to simulate and understand Earth system dynamics.
  • methods: A data-centric AI paradigm combining flexible task specification, diverse inputs and outputs, and multi-modal knowledge representation, developed through collaboration between domain experts and computer scientists.
  • results: Identifies open challenges (validation and verification, scale, interpretability, knowledge representation, and social bias) and argues that these models show promise for probing scenarios and quantifying uncertainties around climate change, natural hazards, and sustainability.
    Abstract Geoscience foundation models represent a revolutionary approach in the field of Earth sciences by integrating massive cross-disciplinary data to simulate and understand the Earth systems dynamics. As a data-centric artificial intelligence (AI) paradigm, they uncover insights from petabytes of structured and unstructured data. Flexible task specification, diverse inputs and outputs and multi-modal knowledge representation enable comprehensive analysis infeasible with individual data sources. Critically, the scalability and generalizability of geoscience models allow for tackling diverse prediction, simulation, and decision challenges related to Earth systems interactions. Collaboration between domain experts and computer scientists leads to innovations in these invaluable tools for understanding the past, present, and future of our planet. However, challenges remain in validation and verification, scale, interpretability, knowledge representation, and social bias. Going forward, enhancing model integration, resolution, accuracy, and equity through cross-disciplinary teamwork is key. Despite current limitations, geoscience foundation models show promise for providing critical insights into pressing issues including climate change, natural hazards, and sustainability through their ability to probe scenarios and quantify uncertainties. Their continued evolution toward integrated, data-driven modeling holds paradigm-shifting potential for Earth science.

Cognitive Mirage: A Review of Hallucinations in Large Language Models

  • paper_url: http://arxiv.org/abs/2309.06794
  • repo_url: https://github.com/hongbinye/cognitive-mirage-hallucinations-in-llms
  • paper_authors: Hongbin Ye, Tong Liu, Aijia Zhang, Wei Hua, Weiqiang Jia
  • for: Reviews the phenomenon of hallucination in large language models (LLMs), covering hallucination types, detection methods, and improvement approaches.
  • methods: A novel taxonomy of hallucinations across various text generation tasks, together with theoretical analyses, detection methods, and improvement approaches.
  • results: A detailed and complete taxonomy of hallucinations in text generation tasks, a survey of existing detection and improvement methods, and several proposed directions for future research.
    Abstract As large language models continue to develop in the field of AI, text generation systems are susceptible to a worrisome phenomenon known as hallucination. In this study, we summarize recent compelling insights into hallucinations in LLMs. We present a novel taxonomy of hallucinations from various text generation tasks, thus provide theoretical insights, detection methods and improvement approaches. Based on this, future research directions are proposed. Our contribution are threefold: (1) We provide a detailed and complete taxonomy for hallucinations appearing in text generation tasks; (2) We provide theoretical analyses of hallucinations in LLMs and provide existing detection and improvement methods; (3) We propose several research directions that can be developed in the future. As hallucinations garner significant attention from the community, we will maintain updates on relevant research progress.

Predicting Survival Time of Ball Bearings in the Presence of Censoring

  • paper_url: http://arxiv.org/abs/2309.07188
  • repo_url: https://github.com/thecml/ball-bearing-survival
  • paper_authors: Christian Marius Lillelund, Fernando Pannullo, Morten Opprud Jakobsen, Christian Fischer Pedersen
  • for: Proposes a survival-analysis approach to predict the time to failure of ball bearings in the presence of censoring.
  • methods: Bearing data are analyzed in the frequency domain, and failures are annotated by comparing the Kullback-Leibler divergence and the standard deviation between break-in and break-out frequency bins; several survival models are then trained on the annotated data and time-domain covariates such as skewness, kurtosis, and entropy to estimate the time to failure.
  • results: On XJTU, the best result is a 0.70 concordance index and 0.21 integrated Brier score; on PRONOSTIA, a 0.76 concordance index and 0.19 integrated Brier score.
    Abstract Ball bearings find widespread use in various manufacturing and mechanical domains, and methods based on machine learning have been widely adopted in the field to monitor wear and spot defects before they lead to failures. Few studies, however, have addressed the problem of censored data, in which failure is not observed. In this paper, we propose a novel approach to predict the time to failure in ball bearings using survival analysis. First, we analyze bearing data in the frequency domain and annotate when a bearing fails by comparing the Kullback-Leibler divergence and the standard deviation between its break-in frequency bins and its break-out frequency bins. Second, we train several survival models to estimate the time to failure based on the annotated data and covariates extracted from the time domain, such as skewness, kurtosis and entropy. The models give a probabilistic prediction of risk over time and allow us to compare the survival function between groups of bearings. We demonstrate our approach on the XJTU and PRONOSTIA datasets. On XJTU, the best result is a 0.70 concordance-index and 0.21 integrated Brier score. On PRONOSTIA, the best is a 0.76 concordance-index and 0.19 integrated Brier score. Our work motivates further work on incorporating censored data in models for predictive maintenance.
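
The failure-annotation step described in the abstract — comparing the Kullback-Leibler divergence between a bearing's break-in frequency bins and later frequency bins — can be sketched as follows. This is a minimal illustration, not the authors' code; the fixed `threshold` criterion and the bin layout are assumptions.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL divergence between two discretized frequency-bin distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

def annotate_failure(spectra, baseline_idx=0, threshold=0.5):
    """Flag the first time window whose spectrum diverges from the
    break-in spectrum by more than `threshold` (hypothetical criterion)."""
    baseline = spectra[baseline_idx]
    for t, spec in enumerate(spectra):
        if kl_divergence(spec, baseline) > threshold:
            return t
    return None
```

On a toy sequence whose spectral energy shifts from a low bin to a high bin, the function flags the first shifted window.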

Generative AI

  • paper_url: http://arxiv.org/abs/2309.07930
  • repo_url: https://github.com/StanGirard/quivr
  • paper_authors: Stefan Feuerriegel, Jochen Hartmann, Christian Janiesch, Patrick Zschech
  • for: This paper conceptualizes generative AI in the context of information systems and examines its impact on, and the challenges it poses for, the BISE field.
  • methods: The paper discusses techniques such as probabilistic modeling, deep learning, and natural language processing, and gives practical examples including Dall-E 2, GPT-4, and Copilot.
  • results: The paper identifies limitations and challenges of current generative AI, such as data quality, privacy, and security, and proposes a research agenda and directions for the BISE field.
    Abstract The term "generative AI" refers to computational techniques that are capable of generating seemingly new, meaningful content such as text, images, or audio from training data. The widespread diffusion of this technology with examples such as Dall-E 2, GPT-4, and Copilot is currently revolutionizing the way we work and communicate with each other. In this article, we provide a conceptualization of generative AI as an entity in socio-technical systems and provide examples of models, systems, and applications. Based on that, we introduce limitations of current generative AI and provide an agenda for Business & Information Systems Engineering (BISE) research. Different from previous works, we focus on generative AI in the context of information systems, and, to this end, we discuss several opportunities and challenges that are unique to the BISE community and make suggestions for impactful directions for BISE research.

Fundamental Limits of Deep Learning-Based Binary Classifiers Trained with Hinge Loss

  • paper_url: http://arxiv.org/abs/2309.06774
  • repo_url: None
  • paper_authors: Tilahun M. Getu, Georges Kaddoum
  • for: This paper aims to explain why deep learning (DL) has been empirically successful across many disciplines, and to address a core open problem of DL.
  • methods: The paper builds on innovations in optimization, generalization, and approximation toward a unified theory of DL.
  • results: The paper derives novel asymptotic testing performance limits for binary classifiers based on deep rectified linear unit (ReLU) feedforward neural networks (FNNs) and on deep FNNs with ReLU and Tanh activations; the derived limits are validated by extensive computer experiments.
    Abstract Although deep learning (DL) has led to several breakthroughs in many disciplines as diverse as chemistry, computer science, electrical engineering, mathematics, medicine, neuroscience, and physics, a comprehensive understanding of why and how DL is empirically successful remains fundamentally elusive. To attack this fundamental problem and unravel the mysteries behind DL's empirical successes, significant innovations toward a unified theory of DL have been made. These innovations encompass nearly fundamental advances in optimization, generalization, and approximation. Despite these advances, however, no work to date has offered a way to quantify the testing performance of a DL-based algorithm employed to solve a pattern classification problem. To overcome this fundamental challenge in part, this paper exposes the fundamental testing performance limits of DL-based binary classifiers trained with hinge loss. For binary classifiers that are based on deep rectified linear unit (ReLU) feedforward neural networks (FNNs) and ones that are based on deep FNNs with ReLU and Tanh activation, we derive their respective novel asymptotic testing performance limits. The derived testing performance limits are validated by extensive computer experiments.
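
For reference, the hinge loss with which the analyzed binary classifiers are trained is max(0, 1 − y·f(x)) for labels y ∈ {−1, +1}. A minimal numpy sketch:

```python
import numpy as np

def hinge_loss(scores, labels):
    """Average hinge loss max(0, 1 - y*f(x)) for labels y in {-1, +1}.
    `scores` are the real-valued classifier outputs f(x)."""
    margins = 1.0 - labels * scores
    return float(np.mean(np.maximum(0.0, margins)))
```

Correctly classified points with margin at least 1 incur zero loss; points on the decision boundary incur loss 1.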

Enhancing Keyphrase Generation by BART Finetuning with Splitting and Shuffling

  • paper_url: http://arxiv.org/abs/2309.06726
  • repo_url: None
  • paper_authors: Bin Chen, Mizuho Iwaihara
  • for: This paper aims to improve absent keyphrase generation by proposing Keyphrase-Focused BART, which exploits the differences between present and absent keyphrase generation.
  • methods: The paper uses sequence-to-sequence models and fine-tunes two separate BART models for present and absent keyphrases, together with keyphrase shuffling and candidate keyphrase ranking.
  • results: For absent keyphrases, Keyphrase-Focused BART achieves new state-of-the-art F1@5 scores on two of the five keyphrase generation benchmark datasets.
    Abstract Keyphrase generation is a task of identifying a set of phrases that best represent the main topics or themes of a given text. Keyphrases are divided into present and absent keyphrases. Recent approaches utilizing sequence-to-sequence models show effectiveness on absent keyphrase generation. However, the performance is still limited due to the hardness of finding absent keyphrases. In this paper, we propose Keyphrase-Focused BART, which exploits the differences between present and absent keyphrase generation, and performs fine-tuning of two separate BART models for present and absent keyphrases. We further show effective approaches of shuffling keyphrases and candidate keyphrase ranking. For absent keyphrases, our Keyphrase-Focused BART achieved new state-of-the-art scores on F1@5 in two out of five keyphrase generation benchmark datasets.
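
The F1@5 metric reported above can be sketched as follows. Exact string matching is assumed here for simplicity; keyphrase benchmarks typically also apply stemming before matching.

```python
def f1_at_k(predicted, gold, k=5):
    """F1 of the top-k predicted keyphrases against the gold set
    (lowercased exact match; stemming is omitted in this sketch)."""
    topk = [p.lower().strip() for p in predicted[:k]]
    gold_set = {g.lower().strip() for g in gold}
    if not topk or not gold_set:
        return 0.0
    hits = sum(1 for p in topk if p in gold_set)
    precision = hits / len(topk)
    recall = hits / len(gold_set)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With three predictions of which one is correct against two gold keyphrases, precision is 1/3, recall is 1/2, and F1 is 0.4.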

Dynamic Spectrum Mixer for Visual Recognition

  • paper_url: http://arxiv.org/abs/2309.06721
  • repo_url: None
  • paper_authors: Zhiqiang Hu, Tao Yu
  • for: Improve performance on a range of visual recognition tasks, including image classification, object detection, and semantic segmentation.
  • methods: Uses the Discrete Cosine Transform to represent token interactions in the frequency domain, and proposes a dynamic spectrum weight generation layer that selects the informative frequency bands.
  • results: Achieves strong performance across visual recognition tasks, e.g., 83.8% top-1 accuracy on ImageNet and 49.9% mIoU on ADE20K.
    Abstract Recently, MLP-based vision backbones have achieved promising performance in several visual recognition tasks. However, the existing MLP-based methods directly aggregate tokens with static weights, leaving the adaptability to different images untouched. Moreover, recent research demonstrates that MLP-Transformer is great at creating long-range dependencies but ineffective at catching high frequencies that primarily transmit local information, which prevents it from applying to the downstream dense prediction tasks, such as semantic segmentation. To address these challenges, we propose a content-adaptive yet computationally efficient structure, dubbed Dynamic Spectrum Mixer (DSM). The DSM represents token interactions in the frequency domain by employing the Discrete Cosine Transform, which can learn long-term spatial dependencies with log-linear complexity. Furthermore, a dynamic spectrum weight generation layer is proposed as the spectrum bands selector, which could emphasize the informative frequency bands while diminishing others. To this end, the technique can efficiently learn detailed features from visual input that contains both high- and low-frequency information. Extensive experiments show that DSM is a powerful and adaptable backbone for a range of visual recognition tasks. Particularly, DSM outperforms previous transformer-based and MLP-based models on image classification, object detection, and semantic segmentation tasks, such as 83.8% top-1 accuracy on ImageNet, and 49.9% mIoU on ADE20K.
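
The core frequency-domain mixing idea — DCT along the token axis, reweighting of frequency bands, inverse transform — can be sketched as below. A fixed weight vector stands in for DSM's dynamically generated spectrum weights, which in the paper are produced per input.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix: D @ D.T == I."""
    k = np.arange(n)[:, None]            # frequency index
    i = np.arange(n)[None, :]            # token index
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] *= np.sqrt(1.0 / n)
    m[1:, :] *= np.sqrt(2.0 / n)
    return m

def spectrum_mix(tokens, band_weights):
    """tokens: (num_tokens, dim); band_weights: (num_tokens,)."""
    D = dct_matrix(tokens.shape[0])
    spec = D @ tokens                    # token axis -> frequency axis
    spec = spec * band_weights[:, None]  # reweight frequency bands
    return D.T @ spec                    # back to the token axis
```

Because the transform matrix is orthonormal, unit weights recover the input exactly; non-uniform weights emphasize some bands and diminish others, which is the role of DSM's spectrum-band selector.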

TrafficGPT: Viewing, Processing and Interacting with Traffic Foundation Models

  • paper_url: http://arxiv.org/abs/2309.06719
  • repo_url: https://github.com/lijlansg/trafficgpt
  • paper_authors: Siyao Zhang, Daocheng Fu, Zhao Zhang, Bin Yu, Pinlong Cai
  • for: Enhance urban traffic management and control, in particular the handling of numerical data and interaction with simulations.
  • methods: Combines ChatGPT with specialized traffic foundation models to improve the capacity for solving complex traffic problems.
  • results: Provides a novel approach to leveraging AI techniques for urban traffic problems and offers insightful decision support.
    Abstract With the promotion of chatgpt to the public, Large language models indeed showcase remarkable common sense, reasoning, and planning skills, frequently providing insightful guidance. These capabilities hold significant promise for their application in urban traffic management and control. However, LLMs struggle with addressing traffic issues, especially processing numerical data and interacting with simulations, limiting their potential in solving traffic-related challenges. In parallel, specialized traffic foundation models exist but are typically designed for specific tasks with limited input-output interactions. Combining these models with LLMs presents an opportunity to enhance their capacity for tackling complex traffic-related problems and providing insightful suggestions. To bridge this gap, we present TrafficGPT, a fusion of ChatGPT and traffic foundation models. This integration yields the following key enhancements: 1) empowering ChatGPT with the capacity to view, analyze, process traffic data, and provide insightful decision support for urban transportation system management; 2) facilitating the intelligent deconstruction of broad and complex tasks and sequential utilization of traffic foundation models for their gradual completion; 3) aiding human decision-making in traffic control through natural language dialogues; and 4) enabling interactive feedback and solicitation of revised outcomes. By seamlessly intertwining large language model and traffic expertise, TrafficGPT not only advances traffic management but also offers a novel approach to leveraging AI capabilities in this domain. The TrafficGPT demo can be found in https://github.com/lijlansg/TrafficGPT.git.

Tackling the Non-IID Issue in Heterogeneous Federated Learning by Gradient Harmonization

  • paper_url: http://arxiv.org/abs/2309.06692
  • repo_url: None
  • paper_authors: Xinyu Zhang, Weiyu Sun, Ying Chen
  • for: Improve the performance of federated learning (FL) under non-independent and identically distributed (non-IID) data and device heterogeneity.
  • methods: Analyzes the gradient conflict phenomenon on the server side and proposes a simple yet effective Gradient Harmonization technique to mitigate local drifts.
  • results: FedGH consistently enhances multiple state-of-the-art FL baselines across diverse benchmarks and non-IID scenarios, with more significant improvements under stronger heterogeneity.
    Abstract Federated learning (FL) is a privacy-preserving paradigm for collaboratively training a global model from decentralized clients. However, the performance of FL is hindered by non-independent and identically distributed (non-IID) data and device heterogeneity. In this work, we revisit this key challenge through the lens of gradient conflicts on the server side. Specifically, we first investigate the gradient conflict phenomenon among multiple clients and reveal that stronger heterogeneity leads to more severe gradient conflicts. To tackle this issue, we propose FedGH, a simple yet effective method that mitigates local drifts through Gradient Harmonization. This technique projects one gradient vector onto the orthogonal plane of the other within conflicting client pairs. Extensive experiments demonstrate that FedGH consistently enhances multiple state-of-the-art FL baselines across diverse benchmarks and non-IID scenarios. Notably, FedGH yields more significant improvements in scenarios with stronger heterogeneity. As a plug-and-play module, FedGH can be seamlessly integrated into any FL framework without requiring hyperparameter tuning.
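
The projection the abstract describes — "projects one gradient vector onto the orthogonal plane of the other within conflicting client pairs" — can be sketched as below. Treating both gradients symmetrically is an assumption about details the abstract leaves open.

```python
import numpy as np

def harmonize(g1, g2):
    """If two client gradients conflict (negative inner product), project
    each onto the plane orthogonal to the other; otherwise leave them
    unchanged. A sketch of the projection described in the abstract."""
    g1, g2 = np.asarray(g1, float), np.asarray(g2, float)
    if np.dot(g1, g2) >= 0:
        return g1, g2                     # no conflict, nothing to do
    h1 = g1 - (np.dot(g1, g2) / np.dot(g2, g2)) * g2
    h2 = g2 - (np.dot(g2, g1) / np.dot(g1, g1)) * g1
    return h1, h2
```

After harmonization, each projected gradient is orthogonal to the other original gradient, so neither client's update directly opposes the other's.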

Self-Refined Large Language Model as Automated Reward Function Designer for Deep Reinforcement Learning in Robotics

  • paper_url: http://arxiv.org/abs/2309.06687
  • repo_url: https://github.com/zhehuazhou/llm_reward_design
  • paper_authors: Jiayang Song, Zhehua Zhou, Jiawei Liu, Chunrong Fang, Zhan Shu, Lei Ma
  • for: Automated reward function design for deep reinforcement learning in robotics.
  • methods: A large language model (LLM) framework with a self-refinement mechanism.
  • results: Across a variety of continuous control tasks on three different robotic systems, the LLM-designed reward functions rival or even surpass manually designed ones, demonstrating the feasibility and applicability of the approach.
    Abstract Although Deep Reinforcement Learning (DRL) has achieved notable success in numerous robotic applications, designing a high-performing reward function remains a challenging task that often requires substantial manual input. Recently, Large Language Models (LLMs) have been extensively adopted to address tasks demanding in-depth common-sense knowledge, such as reasoning and planning. Recognizing that reward function design is also inherently linked to such knowledge, LLM offers a promising potential in this context. Motivated by this, we propose in this work a novel LLM framework with a self-refinement mechanism for automated reward function design. The framework commences with the LLM formulating an initial reward function based on natural language inputs. Then, the performance of the reward function is assessed, and the results are presented back to the LLM for guiding its self-refinement process. We examine the performance of our proposed framework through a variety of continuous robotic control tasks across three diverse robotic systems. The results indicate that our LLM-designed reward functions are able to rival or even surpass manually designed reward functions, highlighting the efficacy and applicability of our approach.

Attention Loss Adjusted Prioritized Experience Replay

  • paper_url: http://arxiv.org/abs/2309.06684
  • repo_url: None
  • paper_authors: Zhuoying Chen, Huiping Li, Rizhong Wang
  • for: Speed up deep reinforcement learning training.
  • methods: Uses an improved self-attention network and a double-sampling mechanism to fit the hyperparameter that regulates the importance sampling weights, eliminating the estimation error introduced by PER.
  • results: Tested with value-function-based, policy-gradient-based, and multi-agent reinforcement learning algorithms in OpenAI Gym; comparison studies verify the advantage and efficiency of the proposed training framework.
    Abstract Prioritized Experience Replay (PER) is a technical means of deep reinforcement learning by selecting experience samples with more knowledge quantity to improve the training rate of neural network. However, the non-uniform sampling used in PER inevitably shifts the state-action space distribution and brings the estimation error of Q-value function. In this paper, an Attention Loss Adjusted Prioritized (ALAP) Experience Replay algorithm is proposed, which integrates the improved Self-Attention network with Double-Sampling mechanism to fit the hyperparameter that can regulate the importance sampling weights to eliminate the estimation error caused by PER. In order to verify the effectiveness and generality of the algorithm, the ALAP is tested with value-function based, policy-gradient based and multi-agent reinforcement learning algorithms in OPENAI gym, and comparison studies verify the advantage and efficiency of the proposed training framework.
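
For context, the PER sampling probabilities and importance-sampling weights that ALAP adjusts are conventionally computed as below. The α and β values are the usual PER defaults, not the paper's; ALAP learns the weight adjustment rather than fixing β.

```python
import numpy as np

def per_probabilities(priorities, alpha=0.6):
    """Prioritized sampling probabilities P(i) = p_i^alpha / sum_k p_k^alpha."""
    p = np.asarray(priorities, float) ** alpha
    return p / p.sum()

def importance_weights(probs, beta=0.4):
    """IS correction w_i = (N * P(i))^(-beta), normalized by the maximum
    weight for stability, as is standard in PER implementations."""
    n = len(probs)
    w = (n * np.asarray(probs, float)) ** (-beta)
    return w / w.max()
```

Uniform priorities reduce to uniform sampling with all weights equal to 1, i.e. plain experience replay.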

A plug-and-play synthetic data deep learning for undersampled magnetic resonance image reconstruction

  • paper_url: http://arxiv.org/abs/2309.06681
  • repo_url: None
  • paper_authors: Min Xiao, Zi Wang, Jiefeng Guo, Xiaobo Qu
  • for: Accelerate magnetic resonance imaging (MRI), which plays an important role in modern medical diagnostics but suffers from prolonged scan time.
  • methods: Uses a deep plug-and-play method to reconstruct undersampled MRI data that effectively adapts to different sampling settings.
  • results: Experiments on in vivo data show that the proposed method provides nice and robust accelerated image reconstruction under different undersampling patterns and sampling rates.
    Abstract Magnetic resonance imaging (MRI) plays an important role in modern medical diagnostic but suffers from prolonged scan time. Current deep learning methods for undersampled MRI reconstruction exhibit good performance in image de-aliasing which can be tailored to the specific kspace undersampling scenario. But it is very troublesome to configure different deep networks when the sampling setting changes. In this work, we propose a deep plug-and-play method for undersampled MRI reconstruction, which effectively adapts to different sampling settings. Specifically, the image de-aliasing prior is first learned by a deep denoiser trained to remove general white Gaussian noise from synthetic data. Then the learned deep denoiser is plugged into an iterative algorithm for image reconstruction. Results on in vivo data demonstrate that the proposed method provides nice and robust accelerated image reconstruction performance under different undersampling patterns and sampling rates, both visually and quantitatively.
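
A generic plug-and-play iteration of the kind the abstract describes — alternating a data-consistency step with the plugged-in denoiser — can be sketched as follows. In the paper the denoiser is a deep network trained to remove white Gaussian noise; here it is any callable, and the linear operator `A` stands in for the undersampled measurement operator.

```python
import numpy as np

def pnp_reconstruct(y, A, denoise, x0, step=0.5, iters=50):
    """Plug-and-play reconstruction sketch: a gradient step on the
    data-fidelity term ||Ax - y||^2 followed by the learned denoiser."""
    x = x0.copy()
    for _ in range(iters):
        x = x - step * A.T @ (A @ x - y)   # data-consistency gradient step
        x = denoise(x)                     # plugged-in learned prior
    return x
```

With an identity operator and an identity "denoiser", the iteration simply converges to the measurements, which makes the fixed-point behavior easy to check.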

SHARM: Segmented Head Anatomical Reference Models

  • paper_url: http://arxiv.org/abs/2309.06677
  • repo_url: None
  • paper_authors: Essam A. Rashed, Mohammad al-Shatouri, Ilkka Laakso, Akimasa Hirata
  • for: Provide open-access Segmented Head Anatomical Reference Models (SHARM) for characterizing the distribution of different tissues in the human head.
  • methods: Uses the open-access IXI MRI dataset and a convolutional neural network architecture named ForkNet+ to segment the head.
  • results: SHARM shows high consistency with real measurements in the statistical characteristics of tissue distributions across age, and is expected to serve as a useful benchmark for electromagnetic dosimetry studies as well as other human head segmentation applications.
    Abstract Reliable segmentation of anatomical tissues of human head is a major step in several clinical applications such as brain mapping, surgery planning and associated computational simulation studies. Segmentation is based on identifying different anatomical structures through labeling different tissues through medical imaging modalities. The segmentation of brain structures is commonly feasible with several remarkable contributions mainly for medical perspective; however, non-brain tissues are of less interest due to anatomical complexity and difficulties to be observed using standard medical imaging protocols. The lack of whole head segmentation methods and unavailability of large human head segmented datasets limiting the variability studies, especially in the computational evaluation of electrical brain stimulation (neuromodulation), human protection from electromagnetic field, and electroencephalography where non-brain tissues are of great importance. To fill this gap, this study provides an open-access Segmented Head Anatomical Reference Models (SHARM) that consists of 196 subjects. These models are segmented into 15 different tissues; skin, fat, muscle, skull cancellous bone, skull cortical bone, brain white matter, brain gray matter, cerebellum white matter, cerebellum gray matter, cerebrospinal fluid, dura, vitreous humor, lens, mucous tissue and blood vessels. The segmented head models are generated using open-access IXI MRI dataset through convolutional neural network structure named ForkNet+. Results indicate a high consistency in statistical characteristics of different tissue distribution in age scale with real measurements. SHARM is expected to be a useful benchmark not only for electromagnetic dosimetry studies but also for different human head segmentation applications.

Large Language Models Can Infer Psychological Dispositions of Social Media Users

  • paper_url: http://arxiv.org/abs/2309.08631
  • repo_url: None
  • paper_authors: Heinrich Peters, Sandra Matz
  • for: This study investigates the capability of Large Language Models (LLMs) to infer psychological dispositions of individuals, given that such models show increasingly human-like abilities on natural language processing tasks.
  • methods: Uses GPT-3.5 and GPT-4 to infer users' Big Five personality traits from their Facebook status updates in a zero-shot setting.
  • results: LLM-inferred and self-reported trait scores correlate with an average of r = .29 (range = [.22, .33]). The study also finds gender and age biases: inferred scores show smaller errors for women and younger individuals on several traits.
    Abstract As Large Language Models (LLMs) demonstrate increasingly human-like abilities in various natural language processing (NLP) tasks that are bound to become integral to personalized technologies, understanding their capabilities and inherent biases is crucial. Our study investigates the potential of LLMs like ChatGPT to infer psychological dispositions of individuals from their digital footprints. Specifically, we assess the ability of GPT-3.5 and GPT-4 to derive the Big Five personality traits from users' Facebook status updates in a zero-shot learning scenario. Our results show an average correlation of r = .29 (range = [.22, .33]) between LLM-inferred and self-reported trait scores. Furthermore, our findings suggest biases in personality inferences with regard to gender and age: inferred scores demonstrated smaller errors for women and younger individuals on several traits, suggesting a potential systematic bias stemming from the underlying training data or differences in online self-expression.

Offline Prompt Evaluation and Optimization with Inverse Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.06553
  • repo_url: https://github.com/holarissun/Prompt-OIRL
  • paper_authors: Hao Sun
  • for: Improve the effectiveness of large language models (LLMs) by evaluating and optimizing prompts offline.
  • methods: Uses offline inverse reinforcement learning (Inverse-RL) on existing expert-evaluation data to derive a reward model for query-dependent prompt evaluation.
  • results: Prompt-OIRL predicts prompt performance, is cost-efficient, produces human-readable results, and efficiently explores the prompt space.
    Abstract The recent advances in the development of Large Language Models (LLMs) like ChatGPT have achieved remarkable performance by leveraging human expertise. Yet, fully eliciting LLMs' potential for complex tasks requires navigating the vast search space of natural language prompts. While prompt engineering has shown promise, the requisite human-crafted prompts in trial-and-error attempts and the associated costs pose significant challenges. Crucially, the efficiency of prompt optimization hinges on the costly procedure of prompt evaluation. This work introduces Prompt-OIRL, an approach rooted in offline inverse reinforcement learning that seeks to bridge the gap between effective prompt evaluation and affordability. Our method draws on offline datasets from expert evaluations, employing Inverse-RL to derive a reward model for offline, query-dependent prompt evaluations. The advantages of Prompt-OIRL are manifold: it predicts prompt performance, is cost-efficient, produces human-readable results, and efficiently navigates the prompt space. We validate our method across four LLMs and three arithmetic datasets, highlighting its potential as a robust and effective tool for offline prompt evaluation and optimization. Our code as well as the offline datasets are released, and we highlight that Prompt-OIRL can be reproduced within a few hours on a single laptop using only a CPU.
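
The offline, query-dependent prompt selection can be illustrated with a toy reward model. Prompt-OIRL derives its reward model via inverse RL from expert evaluations; this sketch substitutes a least-squares fit on hand-made prompt features, and `featurize` is a hypothetical helper, purely for illustration.

```python
import numpy as np

def fit_reward_model(features, scores):
    """Least-squares stand-in for the learned reward model. Prompt-OIRL
    learns its reward model from offline expert evaluations; a linear
    model on simple features keeps this sketch self-contained."""
    X = np.c_[features, np.ones(len(features))]     # add a bias column
    w, *_ = np.linalg.lstsq(X, scores, rcond=None)
    return lambda f: float(np.dot(np.append(f, 1.0), w))

def best_prompt(prompts, featurize, reward_model):
    """Offline selection: rank candidate prompts by predicted reward
    instead of running a costly live evaluation for each one."""
    preds = [reward_model(featurize(p)) for p in prompts]
    return prompts[int(np.argmax(preds))]
```

Once the reward model is fit on logged evaluations, scoring a new prompt is a single prediction, which is what makes the offline approach cost-efficient.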

A Health Monitoring System Based on Flexible Triboelectric Sensors for Intelligence Medical Internet of Things and its Applications in Virtual Reality

  • paper_url: http://arxiv.org/abs/2309.07185
  • repo_url: None
  • paper_authors: Junqi Mao, Puen Zhou, Xiaoyao Wang, Hongbo Yao, Liuyang Liang, Yiqiao Zhao, Jiawei Zhang, Dayan Ban, Haiwu Zheng
  • for: This study aims to develop a robust and intelligent Internet of Medical Things (IoMT) system to meet the needs of precision medicine, intelligent healthcare, and telemedicine in the era of digitalization and intelligence.
  • methods: The researchers integrate flexible wearable triboelectric sensors with deep learning-assisted data analytics to build an intelligent healthcare monitoring system that tracks and analyzes the limb movements of patients with Parkinson's disease.
  • results: The researchers realize a cost-effective, highly sensitive, and intelligent monitoring system that accurately captures and analyzes the subtle movements and fine motor control of Parkinson's patients, providing insightful feedback and comprehensive assessment of the patients' conditions and underscoring the potential of human body sensing technology in a Health 4.0 society.
    Abstract The Internet of Medical Things (IoMT) is a platform that combines Internet of Things (IoT) technology with medical applications, enabling the realization of precision medicine, intelligent healthcare, and telemedicine in the era of digitalization and intelligence. However, the IoMT faces various challenges, including sustainable power supply, human adaptability of sensors and the intelligence of sensors. In this study, we designed a robust and intelligent IoMT system through the synergistic integration of flexible wearable triboelectric sensors and deep learning-assisted data analytics. We embedded four triboelectric sensors into a wristband to detect and analyze limb movements in patients suffering from Parkinson's Disease (PD). By further integrating deep learning-assisted data analytics, we actualized an intelligent healthcare monitoring system for the surveillance and interaction of PD patients, which includes location/trajectory tracking, heart monitoring and identity recognition. This innovative approach enabled us to accurately capture and scrutinize the subtle movements and fine motor of PD patients, thus providing insightful feedback and comprehensive assessment of the patients conditions. This monitoring system is cost-effective, easily fabricated, highly sensitive, and intelligent, consequently underscores the immense potential of human body sensing technology in a Health 4.0 society.

cs.CL - 2023-09-13

Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs

  • paper_url: http://arxiv.org/abs/2309.07311
  • repo_url: None
  • paper_authors: Angelica Chen, Ravid Schwartz-Ziv, Kyunghyun Cho, Matthew L. Leavitt, Naomi Saphra
  • for: This study investigates how masked language models (MLMs) acquire syntax during training and how this process shapes model behavior.
  • methods: A case study that analyzes the evolution of interpretable artifacts throughout training to deepen understanding of the model's emergent behavior.
  • results: MLMs naturally develop Syntactic Attention Structure (SAS), in which specific Transformer heads focus on specific syntactic relations. There is a brief window during training when models abruptly acquire SAS, concurrent with a steep drop in loss, and SAS precipitates the subsequent acquisition of linguistic capabilities. By manipulating SAS with a regularizer during training, the study shows that SAS is necessary for the development of grammatical capabilities but also competes with other beneficial traits, so that briefly suppressing SAS can improve model quality.
    Abstract Most interpretability research in NLP focuses on understanding the behavior and features of a fully trained model. However, certain insights into model behavior may only be accessible by observing the trajectory of the training process. In this paper, we present a case study of syntax acquisition in masked language models (MLMs). Our findings demonstrate how analyzing the evolution of interpretable artifacts throughout training deepens our understanding of emergent behavior. In particular, we study Syntactic Attention Structure (SAS), a naturally emerging property of MLMs wherein specific Transformer heads tend to focus on specific syntactic relations. We identify a brief window in training when models abruptly acquire SAS and find that this window is concurrent with a steep drop in loss. Moreover, SAS precipitates the subsequent acquisition of linguistic capabilities. We then examine the causal role of SAS by introducing a regularizer to manipulate SAS during training, and demonstrate that SAS is necessary for the development of grammatical capabilities. We further find that SAS competes with other beneficial traits and capabilities during training, and that briefly suppressing SAS can improve model quality. These findings reveal a real-world example of the relationship between disadvantageous simplicity bias and interpretable breakthrough training dynamics.

In-Contextual Bias Suppression for Large Language Models

  • paper_url: http://arxiv.org/abs/2309.07251
  • repo_url: None
  • paper_authors: Daisuke Oba, Masahiro Kaneko, Danushka Bollegala
  • for: Reduce gender bias in large language models.
  • methods: Uses text-based preambles generated from manually designed templates covering counterfactual statements, as well as descriptive sentences for occupations, to suppress gender biases in LLMs.
  • results: These methods effectively reduce gender bias in LLMs without requiring access to model parameters and with minimal adverse effect on downstream task performance.
    Abstract Despite their impressive performance in a wide range of NLP tasks, Large Language Models (LLMs) have been reported to encode worrying-levels of gender bias. Prior work has proposed debiasing methods that require human labelled examples, data augmentation and fine-tuning of the LLMs, which are computationally costly. Moreover, one might not even have access to the internal parameters for performing debiasing such as in the case of commercially available LLMs such as GPT-4. To address this challenge we propose bias suppression, a novel alternative to debiasing that does not require access to model parameters. We show that text-based preambles, generated from manually designed templates covering counterfactual statements, can accurately suppress gender biases in LLMs. Moreover, we find that descriptive sentences for occupations can further suppress gender biases. Interestingly, we find that bias suppression has a minimal adverse effect on downstream task performance, while effectively mitigating the gender biases.

RAIN: Your Language Models Can Align Themselves without Finetuning

  • paper_url: http://arxiv.org/abs/2309.07124
  • repo_url: None
  • paper_authors: Yuhui Li, Fangyun Wei, Jinjing Zhao, Chao Zhang, Hongyang Zhang
  • for: This work targets AI safety by aligning pre-trained language models (LLMs) with human preferences.
  • methods: It integrates self-evaluation and rewind mechanisms so that frozen LLMs align themselves with human preferences directly. The newly introduced inference method, Rewindable Auto-regressive INference (RAIN), lets a pre-trained LLM evaluate its own generations and use the evaluation results to guide backward rewind and forward generation for AI safety, with no extra alignment data, training, gradient computation, or parameter updates.
  • results: Experiments show that RAIN raises the harmlessness rate of LLaMA 30B on the HH dataset from 82% to 97% while keeping the helpfulness rate unchanged. Under the leading adversarial attack llm-attacks on Vicuna 33B, RAIN reduces the attack success rate from 94% to 19%.
    Abstract Large language models (LLMs) often demonstrate inconsistencies with human preferences. Previous research gathered human preference data and then aligned the pre-trained models using reinforcement learning or instruction tuning, the so-called finetuning step. In contrast, aligning frozen LLMs without any extra data is more appealing. This work explores the potential of the latter setting. We discover that by integrating self-evaluation and rewind mechanisms, unaligned LLMs can directly produce responses consistent with human preferences via self-boosting. We introduce a novel inference method, Rewindable Auto-regressive INference (RAIN), that allows pre-trained LLMs to evaluate their own generation and use the evaluation results to guide backward rewind and forward generation for AI safety. Notably, RAIN operates without the need of extra data for model alignment and abstains from any training, gradient computation, or parameter updates; during the self-evaluation phase, the model receives guidance on which human preference to align with through a fixed-template prompt, eliminating the need to modify the initial prompt. Experimental results evaluated by GPT-4 and humans demonstrate the effectiveness of RAIN: on the HH dataset, RAIN improves the harmlessness rate of LLaMA 30B over vanilla inference from 82% to 97%, while maintaining the helpfulness rate. Under the leading adversarial attack llm-attacks on Vicuna 33B, RAIN establishes a new defense baseline by reducing the attack success rate from 94% to 19%.
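The evaluate-then-rewind idea behind RAIN can be illustrated with a toy greedy sketch (illustrative only; the function and scorer below are hypothetical stand-ins, and RAIN itself searches and rewinds over a tree of token sequences rather than one step at a time):

```python
def rain_style_decode(candidates_per_step, self_evaluate, threshold=0.5):
    """At each step, keep the first candidate continuation whose prefix the
    model's self-evaluation scores above `threshold`; otherwise fall back to
    the highest-scoring candidate. Rejected candidates are "rewound"."""
    response = []
    for cands in candidates_per_step:
        scored = [(self_evaluate(response + [c]), c) for c in cands]
        passing = [c for s, c in scored if s >= threshold]
        response.append(passing[0] if passing else max(scored)[1])
    return response
```

With a self-evaluator that penalizes a harmful token, the harmful continuation is rewound away at every step.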

Mitigating Hallucinations and Off-target Machine Translation with Source-Contrastive and Language-Contrastive Decoding

  • paper_url: http://arxiv.org/abs/2309.07098
  • repo_url: https://github.com/zurichnlp/contradecode
  • paper_authors: Rico Sennrich, Jannis Vamvas, Alireza Mohammadshahi
  • for: This paper tackles hallucinations and off-target translation in machine translation, especially for low-resource languages and massively multilingual models.
  • methods: It proposes a modified decoding objective that requires neither retraining nor external models. In source-contrastive decoding, the search targets a translation that is probable given the correct input but improbable given a random input segment, under the hypothesis that hallucinations are similarly probable given either. In language-contrastive decoding, the search targets a translation that is probable given the correct language indicator token but improbable given the wrong one.
  • results: Experiments on M2M-100 (418M) and SMaLL-100 show that these methods effectively suppress hallucinations and off-target translations, improving chrF2 by 1.7 and 1.4 points on average across 57 translation directions. A proof of concept on English–German further shows that off-target translations can be suppressed with the Llama 2 chat models, demonstrating the method's applicability to machine translation with LLMs. The source code is released at https://github.com/ZurichNLP/ContraDecode
    Abstract Hallucinations and off-target translation remain unsolved problems in machine translation, especially for low-resource languages and massively multilingual models. In this paper, we introduce methods to mitigate both failure cases with a modified decoding objective, without requiring retraining or external models. In source-contrastive decoding, we search for a translation that is probable given the correct input, but improbable given a random input segment, hypothesising that hallucinations will be similarly probable given either. In language-contrastive decoding, we search for a translation that is probable, but improbable given the wrong language indicator token. In experiments on M2M-100 (418M) and SMaLL-100, we find that these methods effectively suppress hallucinations and off-target translations, improving chrF2 by 1.7 and 1.4 points on average across 57 tested translation directions. In a proof of concept on English--German, we also show that we can suppress off-target translations with the Llama 2 chat models, demonstrating the applicability of the method to machine translation with LLMs. We release our source code at https://github.com/ZurichNLP/ContraDecode.
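The source-contrastive objective can be sketched as a score difference over candidate translations (the log-probabilities below are made up for illustration, not taken from the paper):

```python
def source_contrastive_score(logp_given_source, logp_given_random, lam=1.0):
    """Rank a candidate translation y by log P(y | correct source) minus
    lam * log P(y | random source segment). A hallucination is roughly
    equally probable under both inputs, so its contrastive score stays
    near zero, while a faithful translation scores high."""
    return logp_given_source - lam * logp_given_random

# Toy log-probabilities for two candidates:
candidates = {
    "faithful translation": source_contrastive_score(-1.0, -8.0),
    "hallucination":        source_contrastive_score(-2.0, -2.1),
}
best = max(candidates, key=candidates.get)
```

Language-contrastive decoding follows the same pattern, contrasting the correct language indicator token against a wrong one.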

Can Whisper perform speech-based in-context learning?

  • paper_url: http://arxiv.org/abs/2309.07081
  • repo_url: None
  • paper_authors: Siyin Wang, Chao-Han Huck Yang, Ji Wu, Chao Zhang
  • for: This paper investigates the in-context learning abilities of OpenAI's Whisper automatic speech recognition (ASR) models.
  • methods: It proposes a speech-based in-context learning (SICL) approach for test-time adaptation that needs only a small number of labelled speech samples and no gradient descent.
  • results: Language-level adaptation experiments on Chinese dialects show considerable relative word error rate (WER) reductions on isolated-word ASR, up to 36.4% when combined with k-nearest-neighbours-based in-context example selection.
    Abstract This paper investigates the in-context learning abilities of the Whisper automatic speech recognition (ASR) models released by OpenAI. A novel speech-based in-context learning (SICL) approach is proposed for test-time adaptation, which can reduce the word error rates (WERs) with only a small number of labelled speech samples without gradient descent. Language-level adaptation experiments using Chinese dialects showed that when applying SICL to isolated word ASR, consistent and considerable relative WER reductions can be achieved using Whisper models of any size on two dialects, which is on average 32.3%. A k-nearest-neighbours-based in-context example selection technique can be applied to further improve the efficiency of SICL, which can increase the average relative WER reduction to 36.4%. The findings are verified using speaker adaptation or continuous speech recognition tasks, and both achieved considerable relative WER reductions. Detailed quantitative analyses are also provided to shed light on SICL's adaptability to phonological variances and dialect-specific lexical nuances.
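The k-nearest-neighbours in-context example selection mentioned above can be sketched as follows (a minimal sketch assuming some embedding of each speech sample; the pool structure and function names are hypothetical):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def select_in_context_examples(test_vec, labelled_pool, k=2):
    """Pick the k labelled samples whose embeddings are closest to the test
    utterance; these would be prepended as in-context examples at test time."""
    ranked = sorted(labelled_pool,
                    key=lambda ex: cosine(test_vec, ex["vec"]),
                    reverse=True)
    return ranked[:k]
```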

SafetyBench: Evaluating the Safety of Large Language Models with Multiple Choice Questions

  • paper_url: http://arxiv.org/abs/2309.07045
  • repo_url: https://github.com/thu-coai/safetybench
  • paper_authors: Zhexin Zhang, Leqi Lei, Lindong Wu, Rui Sun, Yongkang Huang, Chong Long, Xiao Liu, Xuanyu Lei, Jie Tang, Minlie Huang
  • for: The paper is written to evaluate the safety of Large Language Models (LLMs) and to provide a comprehensive benchmark for assessing their safety.
  • methods: The paper presents SafetyBench, a benchmark that includes 11,435 diverse multiple choice questions spanning across 7 distinct categories of safety concerns, both in Chinese and English.
  • results: The paper reports the results of extensive tests over 25 popular Chinese and English LLMs in both zero-shot and few-shot settings, showing a substantial performance advantage for GPT-4 over its counterparts, and highlighting the need for further improvement in the safety of current LLMs.
    Abstract With the rapid development of Large Language Models (LLMs), increasing attention has been paid to their safety concerns. Consequently, evaluating the safety of LLMs has become an essential task for facilitating the broad applications of LLMs. Nevertheless, the absence of comprehensive safety evaluation benchmarks poses a significant impediment to effectively assess and enhance the safety of LLMs. In this work, we present SafetyBench, a comprehensive benchmark for evaluating the safety of LLMs, which comprises 11,435 diverse multiple choice questions spanning across 7 distinct categories of safety concerns. Notably, SafetyBench also incorporates both Chinese and English data, facilitating the evaluation in both languages. Our extensive tests over 25 popular Chinese and English LLMs in both zero-shot and few-shot settings reveal a substantial performance advantage for GPT-4 over its counterparts, and there is still significant room for improving the safety of current LLMs. We believe SafetyBench will enable fast and comprehensive evaluation of LLMs' safety, and foster the development of safer LLMs. Data and evaluation guidelines are available at https://github.com/thu-coai/SafetyBench. Submission entrance and leaderboard are available at https://llmbench.ai/safety.
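A benchmark of multiple-choice questions grouped into safety categories is typically scored per category. A minimal harness sketch (field names are assumptions, not SafetyBench's actual schema):

```python
def category_accuracy(items, predictions):
    """items: dicts with 'id', 'category', and gold 'answer';
    predictions: mapping from item id to the model's chosen option.
    Returns accuracy per safety category."""
    totals, correct = {}, {}
    for item in items:
        c = item["category"]
        totals[c] = totals.get(c, 0) + 1
        if predictions.get(item["id"]) == item["answer"]:
            correct[c] = correct.get(c, 0) + 1
    return {c: correct.get(c, 0) / totals[c] for c in totals}
```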

Beyond original Research Articles Categorization via NLP

  • paper_url: http://arxiv.org/abs/2309.07020
  • repo_url: https://github.com/rturrisige/textclassification
  • paper_authors: Rosanna Turrisi
  • for: This work proposes a novel text categorization approach for unknown categories in scientific literature using Natural Language Processing techniques.
  • methods: It uses the pre-trained language model SciBERT to extract meaningful representations of abstracts from the ArXiv dataset. Categorization is performed with the K-Means algorithm, and the optimal number of clusters is determined by the Silhouette score.
  • results: The results show that the proposed approach captures subject information more effectively than the traditional arXiv labeling system, leading to improved text categorization. The approach could support better navigation and recommendation systems for scientific research literature.
    Abstract This work proposes a novel approach to text categorization -- for unknown categories -- in the context of scientific literature, using Natural Language Processing techniques. The study leverages the power of pre-trained language models, specifically SciBERT, to extract meaningful representations of abstracts from the ArXiv dataset. Text categorization is performed using the K-Means algorithm, and the optimal number of clusters is determined based on the Silhouette score. The results demonstrate that the proposed approach captures subject information more effectively than the traditional arXiv labeling system, leading to improved text categorization. The approach offers potential for better navigation and recommendation systems in the rapidly growing landscape of scientific research literature.
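The Silhouette score used here to pick the number of clusters can be computed directly (a self-contained sketch on toy 2-D points; in practice one would run it on the SciBERT embeddings, e.g. via scikit-learn):

```python
import math

def euclid(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def silhouette_score(points, labels):
    """Mean silhouette coefficient: a(i) is the mean distance to points in
    the same cluster, b(i) the lowest mean distance to any other cluster;
    the per-point score is (b - a) / max(a, b)."""
    total = 0.0
    for i, (p, lab) in enumerate(zip(points, labels)):
        same = [euclid(p, q) for j, (q, m) in enumerate(zip(points, labels))
                if m == lab and j != i]
        if not same:          # singleton cluster contributes 0
            continue
        a = sum(same) / len(same)
        b = min(
            sum(euclid(p, q) for q, m in zip(points, labels) if m == other)
            / labels.count(other)
            for other in set(labels) if other != lab
        )
        total += (b - a) / max(a, b)
    return total / len(points)
```

Evaluating this score for each candidate number of clusters and keeping the maximum is the selection rule the paper describes.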

OYXOY: A Modern NLP Test Suite for Modern Greek

  • paper_url: http://arxiv.org/abs/2309.07009
  • repo_url: None
  • paper_authors: Konstantinos Kogkalidis, Stergios Chatzikyriakidis, Eirini Chrysovalantou Giannikouri, Vassiliki Katsouli, Christina Klironomou, Christina Koula, Dimitris Papadakis, Thelka Pasparaki, Erofili Psaltaki, Efthymia Sakellariou, Hara Soupiona
  • for: This paper develops a linguistically motivated and technically relevant evaluation suite for Greek NLP.
  • methods: It contributes two innovations: an inference dataset that marks all possible inference labels, accounting for shifts due to ambiguity or polysemy, and a cost-efficient pipeline that uses ChatGPT as a language-neutral parser to transform the Dictionary of Standard Modern Greek into a structured format, from which three further tasks are derived.
  • results: The experimental baselines confirm the challenging nature of the tasks and highlight the need for expedited progress so that the Greek NLP ecosystem keeps pace with contemporary mainstream research.
    Abstract This paper serves as a foundational step towards the development of a linguistically motivated and technically relevant evaluation suite for Greek NLP. We initiate this endeavor by introducing four expert-verified evaluation tasks, specifically targeted at natural language inference, word sense disambiguation (through example comparison or sense selection) and metaphor detection. More than language-adapted replicas of existing tasks, we contribute two innovations which will resonate with the broader resource and evaluation community. Firstly, our inference dataset is the first of its kind, marking not just \textit{one}, but rather \textit{all} possible inference labels, accounting for possible shifts due to e.g. ambiguity or polysemy. Secondly, we demonstrate a cost-efficient method to obtain datasets for under-resourced languages. Using ChatGPT as a language-neutral parser, we transform the Dictionary of Standard Modern Greek into a structured format, from which we derive the other three tasks through simple projections. Alongside each task, we conduct experiments using currently available state of the art machinery. Our experimental baselines affirm the challenging nature of our tasks and highlight the need for expedited progress in order for the Greek NLP ecosystem to keep pace with contemporary mainstream research.

Unsupervised Contrast-Consistent Ranking with Language Models

  • paper_url: http://arxiv.org/abs/2309.06991
  • repo_url: None
  • paper_authors: Niklas Stoehr, Pengxiang Cheng, Jing Wang, Daniel Preotiuc-Pietro, Rajarshi Bhowmik
  • for: This work probes the ranking knowledge contained in language models and studies a ranking technique based on contrast-consistent probing.
  • methods: It extends the unsupervised Contrast-Consistent Search (CCS) method to Contrast-Consistent Ranking (CCR) by adapting existing ranking objectives such as the Max-Margin Loss, Triplet Loss, and Ordinal Regression, making them suitable for ranking tasks with language models.
  • results: For the same language model, CCR probing elicits ranking knowledge better than prompting, and even performs on a par with prompting much larger language models.
    Abstract Language models contain ranking-based knowledge and are powerful solvers of in-context ranking tasks. For instance, they may have parametric knowledge about the ordering of countries by size or may be able to rank reviews by sentiment. Recent work focuses on pairwise, pointwise, and listwise prompting techniques to elicit a language model's ranking knowledge. However, we find that even with careful calibration and constrained decoding, prompting-based techniques may not always be self-consistent in the rankings they produce. This motivates us to explore an alternative approach that is inspired by an unsupervised probing method called Contrast-Consistent Search (CCS). The idea is to train a probing model guided by a logical constraint: a model's representation of a statement and its negation must be mapped to contrastive true-false poles consistently across multiple statements. We hypothesize that similar constraints apply to ranking tasks where all items are related via consistent pairwise or listwise comparisons. To this end, we extend the binary CCS method to Contrast-Consistent Ranking (CCR) by adapting existing ranking methods such as the Max-Margin Loss, Triplet Loss, and Ordinal Regression objective. Our results confirm that, for the same language model, CCR probing outperforms prompting and even performs on a par with prompting much larger language models.
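As a rough illustration of the Max-Margin objective named above, applied to probe scores over ranked items (the scores here are hypothetical probe outputs, not from the paper):

```python
def max_margin_ranking_loss(scores, ranking, margin=1.0):
    """scores: item -> scalar probe score; ranking: items from highest to
    lowest. Adds a hinge penalty whenever a lower-ranked item scores within
    `margin` of a higher-ranked one, so a zero loss means the probe's scores
    are consistent with every pairwise comparison."""
    loss = 0.0
    for i, hi in enumerate(ranking):
        for lo in ranking[i + 1:]:
            loss += max(0.0, margin - (scores[hi] - scores[lo]))
    return loss
```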

Remote Inference of Cognitive Scores in ALS Patients Using a Picture Description

  • paper_url: http://arxiv.org/abs/2309.06989
  • repo_url: None
  • paper_authors: Carla Agurto, Guillermo Cecchi, Bo Wen, Ernest Fraenkel, James Berry, Indu Navar, Raquel Norel
  • For: The paper focuses on detecting cognitive impairment in individuals with Amyotrophic Lateral Sclerosis (ALS) using a digital version of the Edinburgh Cognitive and Behavioral ALS Screen (ECAS) test.
  • Methods: The study uses a remote testing approach in which participants (ALS and non-ALS) describe a pool of pictures with complex scenes on their computer at home. Linguistic and acoustic features are extracted from the speech samples and fed into linear regression models to predict the ECAS sub-scores and the total score.
  • Results: The study finds that speech samples from the picture description are reliable enough to predict the ECAS sub-scores, achieving statistically significant Spearman correlation values between 0.32 and 0.51 using 10-fold cross-validation.
    Abstract Amyotrophic lateral sclerosis is a fatal disease that not only affects movement, speech, and breath but also cognition. Recent studies have focused on the use of language analysis techniques to detect ALS and infer scales for monitoring functional progression. In this paper, we focused on another important aspect, cognitive impairment, which affects 35-50% of the ALS population. In an effort to reach the ALS population, which frequently exhibits mobility limitations, we implemented the digital version of the Edinburgh Cognitive and Behavioral ALS Screen (ECAS) test for the first time. This test which is designed to measure cognitive impairment was remotely performed by 56 participants from the EverythingALS Speech Study. As part of the study, participants (ALS and non-ALS) were asked to describe weekly one picture from a pool of many pictures with complex scenes displayed on their computer at home. We analyze the descriptions performed within +/- 60 days from the day the ECAS test was administered and extract different types of linguistic and acoustic features. We input those features into linear regression models to infer 5 ECAS sub-scores and the total score. Speech samples from the picture description are reliable enough to predict the ECAS sub-scores, achieving statistically significant Spearman correlation values between 0.32 and 0.51 for the model's performance using 10-fold cross-validation.
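The evaluation metric in this entry, Spearman correlation between predicted and observed sub-scores, can be sketched directly (a minimal rank-correlation implementation with no tie handling; real analyses would use a library routine such as scipy.stats.spearmanr):

```python
def rank(xs):
    """Rank values 1..n; ties are not handled in this sketch."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    for pos, i in enumerate(order):
        r[i] = pos + 1.0
    return r

def spearman(x, y):
    """Pearson correlation of the ranks of x and y."""
    rx, ry = rank(x), rank(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx)
           * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den
```

Applied per fold, the cross-validated predictions against the true ECAS sub-scores yield the 0.32-0.51 correlations reported above.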

Auto-Regressive Next-Token Predictors are Universal Learners

  • paper_url: http://arxiv.org/abs/2309.06979
  • repo_url: None
  • paper_authors: Eran Malach
  • for: This paper studies the logical and mathematical reasoning abilities of language models through the lens of auto-regressive next-token prediction.
  • methods: It presents a theoretical framework for auto-regressive next-token predictors and introduces a new complexity measure, length complexity, which counts the intermediate tokens in a Chain-of-Thought (CoT) sequence required to approximate a target function, analyzing its interplay with other notions of complexity.
  • results: Experiments show that simple next-token predictors, such as linear networks and shallow Multi-Layer Perceptrons (MLPs), display non-trivial performance on text generation and arithmetic tasks, suggesting that much of the power of language models stems from the auto-regressive next-token training scheme rather than from a particular architecture.
    Abstract Large language models display remarkable capabilities in logical and mathematical reasoning, allowing them to solve complex tasks. Interestingly, these abilities emerge in networks trained on the simple task of next-token prediction. In this work, we present a theoretical framework for studying auto-regressive next-token predictors. We demonstrate that even simple models such as linear next-token predictors, trained on Chain-of-Thought (CoT) data, can approximate any function efficiently computed by a Turing machine. We introduce a new complexity measure -- length complexity -- which measures the number of intermediate tokens in a CoT sequence required to approximate some target function, and analyze the interplay between length complexity and other notions of complexity. Finally, we show experimentally that simple next-token predictors, such as linear networks and shallow Multi-Layer Perceptrons (MLPs), display non-trivial performance on text generation and arithmetic tasks. Our results demonstrate that the power of language models can be attributed, to a great extent, to the auto-regressive next-token training scheme, and not necessarily to a particular choice of architecture.
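The simplest possible auto-regressive next-token predictor, a count-based bigram table standing in for a linear predictor over one-hot contexts, already illustrates the training scheme the paper analyzes (a toy sketch, not the paper's models):

```python
from collections import Counter, defaultdict

def train_bigram(sequences):
    """Fit next-token counts from (CoT-style) token sequences."""
    table = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            table[prev][nxt] += 1
    return table

def generate(table, start, steps):
    """Greedy auto-regressive decoding from the learned counts."""
    out = [start]
    for _ in range(steps):
        nxt_counts = table.get(out[-1])
        if not nxt_counts:
            break
        out.append(nxt_counts.most_common(1)[0][0])
    return out
```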

Dynamic Causal Disentanglement Model for Dialogue Emotion Detection

  • paper_url: http://arxiv.org/abs/2309.06928
  • repo_url: None
  • paper_authors: Yuting Su, Yichen Wei, Weizhi Nie, Sicheng Zhao, Anan Liu
  • for: This work improves emotion recognition in dialogue by separating hidden variables and analyzing the temporal accumulation of emotions in dialogue content.
  • methods: It proposes a Dynamic Causal Disentanglement Model based on hidden variable separation, using a causal directed acyclic graph (DAG) to establish correlations between hidden emotional information and other observed elements.
  • results: Experiments on two popular dialogue emotion detection datasets verify the model's superiority over prior methods.
    Abstract Emotion detection is a critical technology extensively employed in diverse fields. While the incorporation of commonsense knowledge has proven beneficial for existing emotion detection methods, dialogue-based emotion detection encounters numerous difficulties and challenges due to human agency and the variability of dialogue content.In dialogues, human emotions tend to accumulate in bursts. However, they are often implicitly expressed. This implies that many genuine emotions remain concealed within a plethora of unrelated words and dialogues.In this paper, we propose a Dynamic Causal Disentanglement Model based on hidden variable separation, which is founded on the separation of hidden variables. This model effectively decomposes the content of dialogues and investigates the temporal accumulation of emotions, thereby enabling more precise emotion recognition. First, we introduce a novel Causal Directed Acyclic Graph (DAG) to establish the correlation between hidden emotional information and other observed elements. Subsequently, our approach utilizes pre-extracted personal attributes and utterance topics as guiding factors for the distribution of hidden variables, aiming to separate irrelevant ones. Specifically, we propose a dynamic temporal disentanglement model to infer the propagation of utterances and hidden variables, enabling the accumulation of emotion-related information throughout the conversation. To guide this disentanglement process, we leverage the ChatGPT-4.0 and LSTM networks to extract utterance topics and personal attributes as observed information.Finally, we test our approach on two popular datasets in dialogue emotion detection and relevant experimental results verified the model's superiority.

Native Language Identification with Big Bird Embeddings

  • paper_url: http://arxiv.org/abs/2309.06923
  • repo_url: https://github.com/sergeykramp/mthesis-bigbird-embeddings
  • paper_authors: Sergey Kramp, Giovanni Cassani, Chris Emmery
  • for: This study investigates whether input size is a limiting factor in Native Language Identification (NLI) and provides an effective, practical alternative to traditional feature engineering.
  • methods: Classifiers are trained on Big Bird embeddings and compared against traditional linguistic feature engineering models.
  • results: Big Bird embeddings substantially improve NLI performance on the Reddit-L2 dataset and yield consistent out-of-sample performance across input lengths.
    Abstract Native Language Identification (NLI) intends to classify an author's native language based on their writing in another language. Historically, the task has heavily relied on time-consuming linguistic feature engineering, and transformer-based NLI models have thus far failed to offer effective, practical alternatives. The current work investigates if input size is a limiting factor, and shows that classifiers trained using Big Bird embeddings outperform linguistic feature engineering models by a large margin on the Reddit-L2 dataset. Additionally, we provide further insight into input length dependencies, show consistent out-of-sample performance, and qualitatively analyze the embedding space. Given the effectiveness and computational efficiency of this method, we believe it offers a promising avenue for future NLI work.
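Classifying documents from fixed embeddings can be as simple as a nearest-centroid rule (a toy sketch over hypothetical 2-D document embeddings; the paper trains proper classifiers over Big Bird embeddings):

```python
def centroid(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def nearest_centroid_predict(train, test_vec):
    """train: native-language label -> list of document embeddings.
    Predicts the label whose embedding centroid is closest to test_vec."""
    def sqdist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    cents = {label: centroid(vs) for label, vs in train.items()}
    return min(cents, key=lambda label: sqdist(cents[label], test_vec))
```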

Scaled Prompt-Tuning for Few-Shot Natural Language Generation

  • paper_url: http://arxiv.org/abs/2309.06759
  • repo_url: None
  • paper_authors: Ting Hu, Christoph Meinel, Haojin Yang
  • for: This work studies Parameter-Efficient Fine-Tuning (PEFT) methods that freeze most parameters of large language models (LLMs) and tune a small subset, reducing memory footprint, training cost, and labeling cost for few-shot Natural Language Generation (NLG) while maintaining or improving performance.
  • methods: It proposes Scaled Prompt-Tuning (SPT), which surpasses conventional Prompt-Tuning (PT) in performance and generalization in few-shot cases without an obvious increase in training cost, and comprehensively compares existing PEFT methods.
  • results: SPT maintains or improves performance in few-shot scenarios with superior transferability, while certain prior PEFT approaches with modest training cost, such as Prefix-Tuning, can struggle on few-shot NLG tasks, especially on challenging datasets.
    Abstract The increasingly Large Language Models (LLMs) demonstrate stronger language understanding and generation capabilities, while the memory demand and computation cost of fine-tuning LLMs on downstream tasks are non-negligible. Besides, fine-tuning generally requires a certain amount of data from individual tasks whilst data collection cost is another issue to consider in real-world applications. In this work, we focus on Parameter-Efficient Fine-Tuning (PEFT) methods for few-shot Natural Language Generation (NLG), which freeze most parameters in LLMs and tune a small subset of parameters in few-shot cases so that memory footprint, training cost, and labeling cost are reduced while maintaining or even improving the performance. We propose a Scaled Prompt-Tuning (SPT) method which surpasses conventional PT with better performance and generalization ability but without an obvious increase in training cost. Further study on intermediate SPT suggests the superior transferability of SPT in few-shot scenarios, providing a recipe for data-deficient and computation-limited circumstances. Moreover, a comprehensive comparison of existing PEFT methods reveals that certain approaches exhibiting decent performance with modest training cost such as Prefix-Tuning in prior study could struggle in few-shot NLG tasks, especially on challenging datasets.

CONVERSER: Few-Shot Conversational Dense Retrieval with Synthetic Data Generation

  • paper_url: http://arxiv.org/abs/2309.06748
  • repo_url: https://github.com/miulab/converser
  • paper_authors: Chao-Wei Huang, Chen-Yu Hsu, Tsu-Yuan Hsu, Chen-An Li, Yun-Nung Chen
  • for: This work proposes a framework for training conversational dense retrievers from at most six in-domain dialogue examples.
  • methods: It leverages the in-context learning capability of large language models to generate conversational queries given passages from the retrieval corpus.
  • results: On the conversational retrieval benchmarks OR-QuAC and TREC CAsT 19, the proposed method performs comparably to fully-supervised models, demonstrating effective few-shot conversational dense retrieval. All source code and generated datasets are available on GitHub.
    Abstract Conversational search provides a natural interface for information retrieval (IR). Recent approaches have demonstrated promising results in applying dense retrieval to conversational IR. However, training dense retrievers requires large amounts of in-domain paired data. This hinders the development of conversational dense retrievers, as abundant in-domain conversations are expensive to collect. In this paper, we propose CONVERSER, a framework for training conversational dense retrievers with at most 6 examples of in-domain dialogues. Specifically, we utilize the in-context learning capability of large language models to generate conversational queries given a passage in the retrieval corpus. Experimental results on conversational retrieval benchmarks OR-QuAC and TREC CAsT 19 show that the proposed CONVERSER achieves comparable performance to fully-supervised models, demonstrating the effectiveness of our proposed framework in few-shot conversational dense retrieval. All source code and generated datasets are available at https://github.com/MiuLab/CONVERSER
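Generating synthetic conversational queries from passages via in-context learning amounts to assembling a few-shot prompt for the LLM. A sketch of such a prompt builder (the template wording is a hypothetical illustration, not CONVERSER's actual prompt):

```python
def build_query_generation_prompt(passage, example_dialogues):
    """Assemble a few-shot prompt asking an LLM to write a conversational
    query that the given passage answers. example_dialogues holds the
    (at most six) in-domain demonstrations."""
    parts = []
    for ex in example_dialogues:
        parts.append(f"Passage: {ex['passage']}\n"
                     f"Conversational query: {ex['query']}")
    parts.append(f"Passage: {passage}\nConversational query:")
    return "\n\n".join(parts)
```

The LLM's completion of the final line supplies the synthetic query paired with the passage for retriever training.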

Simultaneous Machine Translation with Large Language Models

  • paper_url: http://arxiv.org/abs/2309.06706
  • repo_url: None
  • paper_authors: Minghan Wang, Jinming Zhao, Thuy-Trang Vu, Fatemeh Shiri, Ehsan Shareghi, Gholamreza Haffari
  • for: This study explores the feasibility of applying large language models (LLMs) to simultaneous machine translation (SimulMT).
  • methods: Building on conventional approaches, it introduces a simple yet effective mixture policy that lets LLMs engage in SimulMT without additional training, followed by Supervised Fine-Tuning (SFT) on a mixture of full and prefix sentences.
  • results: Experiments with Llama2-7B-chat on nine language pairs from the MUST-C dataset show that the LLM achieves translation quality and latency comparable to dedicated SimulMT models.
    Abstract Large language models (LLM) have demonstrated their abilities to solve various natural language processing tasks through dialogue-based interactions. For instance, research indicates that LLMs can achieve competitive performance in offline machine translation tasks for high-resource languages. However, applying LLMs to simultaneous machine translation (SimulMT) poses many challenges, including issues related to the training-inference mismatch arising from different decoding patterns. In this paper, we explore the feasibility of utilizing LLMs for SimulMT. Building upon conventional approaches, we introduce a simple yet effective mixture policy that enables LLMs to engage in SimulMT without requiring additional training. Furthermore, after Supervised Fine-Tuning (SFT) on a mixture of full and prefix sentences, the model exhibits significant performance improvements. Our experiments, conducted with Llama2-7B-chat on nine language pairs from the MUST-C dataset, demonstrate that LLM can achieve translation quality and latency comparable to dedicated SimulMT models.
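The read/write scheduling that any SimulMT policy must implement can be illustrated with the classic wait-k baseline (a standard SimulMT schedule, not the paper's mixture policy; simplified so generation stops when the source is exhausted):

```python
def wait_k_translate(source_tokens, next_target_token, k=3):
    """Read the first k source tokens, then alternate: for each additional
    source token read, call `next_target_token` on the current source prefix
    and the target emitted so far to write one target token."""
    output = []
    for read in range(k, len(source_tokens) + 1):
        output.append(next_target_token(source_tokens[:read], output))
    return output
```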

VLSlice: Interactive Vision-and-Language Slice Discovery

  • paper_url: http://arxiv.org/abs/2309.06703
  • repo_url: https://github.com/slymane/vlslice
  • paper_authors: Eric Slyman, Minsuk Kahng, Stefan Lee
  • for: This work develops an interactive system for user-guided discovery of coherent representation-level vision-and-language slices from unlabeled image sets.
  • methods: Building on large-scale pretrained vision-and-language models, the system lets users interactively guide the discovery of subgroups with consistent visiolinguistic behavior.
  • results: In a user study (n=22), VLSlice enabled users to quickly generate diverse, high-coherency slices; the tool is publicly released.
    Abstract Recent work in vision-and-language demonstrates that large-scale pretraining can learn generalizable models that are efficiently transferable to downstream tasks. While this may improve dataset-scale aggregate metrics, analyzing performance around hand-crafted subgroups targeting specific bias dimensions reveals systemic undesirable behaviors. However, this subgroup analysis is frequently stalled by annotation efforts, which require extensive time and resources to collect the necessary data. Prior art attempts to automatically discover subgroups to circumvent these constraints but typically leverages model behavior on existing task-specific annotations and rapidly degrades on more complex inputs beyond "tabular" data, none of which study vision-and-language models. This paper presents VLSlice, an interactive system enabling user-guided discovery of coherent representation-level subgroups with consistent visiolinguistic behavior, denoted as vision-and-language slices, from unlabeled image sets. We show that VLSlice enables users to quickly generate diverse high-coherency slices in a user study (n=22) and release the tool publicly.

Benchmarking Procedural Language Understanding for Low-Resource Languages: A Case Study on Turkish

  • paper_url: http://arxiv.org/abs/2309.06698
  • repo_url: https://github.com/gglab-ku/turkish-plu
  • paper_authors: Arda Uzunoğlu, Gözde Gül Şahin
  • for: The paper targets natural language processing, with a focus on procedural natural language understanding (PLU) and its applications in Turkish.
  • methods: The paper uses automated translation tools to expand the number of Turkish tutorials on wikiHow, and implements strong baseline models for PLU tasks such as linking actions, goal inference, and summarization by fine-tuning language-specific and multilingual models.
  • results: The paper finds that language-specific models consistently outperform multilingual models by a significant margin across most PLU tasks, and releases the corpus, downstream tasks, and baseline models for future research.
    Abstract Understanding procedural natural language (e.g., step-by-step instructions) is a crucial step to execution and planning. However, while there are ample corpora and downstream tasks available in English, the field lacks such resources for most languages. To address this gap, we conduct a case study on Turkish procedural texts. We first expand the number of tutorials in Turkish wikiHow from 2,000 to 52,000 using automated translation tools, where the translation quality and loyalty to the original meaning are validated by a team of experts on a random set. Then, we generate several downstream tasks on the corpus, such as linking actions, goal inference, and summarization. To tackle these tasks, we implement strong baseline models via fine-tuning large language-specific models such as TR-BART and BERTurk, as well as multilingual models such as mBART, mT5, and XLM. We find that language-specific models consistently outperform their multilingual models by a significant margin across most procedural language understanding (PLU) tasks. We release our corpus, downstream tasks and the baseline models with https://github.com/GGLAB-KU/turkish-plu.

Statistical Rejection Sampling Improves Preference Optimization

  • paper_url: http://arxiv.org/abs/2309.06657
  • repo_url: None
  • paper_authors: Tianqi Liu, Yao Zhao, Rishabh Joshi, Misha Khalman, Mohammad Saleh, Peter J. Liu, Jialu Liu
  • for: Improving the alignment of language models with human preferences.
  • methods: Builds on Reinforcement Learning from Human Feedback (RLHF) and offline methods such as Sequence Likelihood Calibration (SLiC) and Direct Preference Optimization (DPO); the proposed Statistical Rejection Sampling Optimization (RSO) sources preference data from the target optimal policy via rejection sampling, and a unified framework enhances the loss functions of both SLiC and DPO from a preference modeling standpoint.
  • results: RSO enables a more accurate estimation of the target optimal policy and, across three diverse tasks, consistently outperforms both SLiC and DPO in evaluations by Large Language Models (LLMs) and human raters.
    Abstract Improving the alignment of language models with human preferences remains an active research challenge. Previous approaches have primarily utilized Reinforcement Learning from Human Feedback (RLHF) via online RL methods such as Proximal Policy Optimization (PPO). Recently, offline methods such as Sequence Likelihood Calibration (SLiC) and Direct Preference Optimization (DPO) have emerged as attractive alternatives, offering improvements in stability and scalability while maintaining competitive performance. SLiC refines its loss function using sequence pairs sampled from a supervised fine-tuned (SFT) policy, while DPO directly optimizes language models based on preference data, foregoing the need for a separate reward model. However, the maximum likelihood estimator (MLE) of the target optimal policy requires labeled preference pairs sampled from that policy. DPO's lack of a reward model constrains its ability to sample preference pairs from the optimal policy, and SLiC is restricted to sampling preference pairs only from the SFT policy. To address these limitations, we introduce a novel approach called Statistical Rejection Sampling Optimization (RSO) that aims to source preference data from the target optimal policy using rejection sampling, enabling a more accurate estimation of the optimal policy. We also propose a unified framework that enhances the loss functions used in both SLiC and DPO from a preference modeling standpoint. Through extensive experiments across three diverse tasks, we demonstrate that RSO consistently outperforms both SLiC and DPO on evaluations from both Large Language Model (LLM) and human raters.
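
The core idea — drawing candidates from an SFT-like proposal and accepting them with a reward-dependent probability so that the accepted set approximates a reward-tilted target policy — can be sketched as follows. This is an illustrative toy (scalar "candidates", a made-up reward, and the common exp((r - r_max)/beta) acceptance rule), not the paper's implementation:

```python
import math
import random

def rejection_sample(candidates, reward_fn, beta=1.0, rng=None):
    """Accept each candidate with probability exp((r - r_max) / beta).
    Higher-reward candidates survive more often, so the accepted set
    approximates samples from a reward-tilted target policy."""
    rng = rng or random.Random(0)
    rewards = [reward_fn(c) for c in candidates]
    r_max = max(rewards)
    accepted = []
    for c, r in zip(candidates, rewards):
        if rng.random() < math.exp((r - r_max) / beta):
            accepted.append(c)
    return accepted

# Toy example: candidates are numbers, reward favors larger values.
cands = list(range(10))
kept = rejection_sample(cands, reward_fn=float, beta=2.0)
```

The top-reward candidate is always accepted (acceptance probability 1), while low-reward ones are filtered out with probability growing in the reward gap.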

cs.LG - 2023-09-13

Tackling the dimensions in imaging genetics with CLUB-PLS

  • paper_url: http://arxiv.org/abs/2309.07352
  • repo_url: None
  • paper_authors: Andre Altmann, Ana C Lawry Aguila, Neda Jahanshad, Paul M Thompson, Marco Lorenzi
  • for: Linking high-dimensional data in one domain, such as genetic data, to high-dimensional data in a second domain, such as brain imaging data, to uncover the relationships between them.
  • methods: A Partial Least Squares (PLS)-based framework, Cluster-Bootstrap PLS (CLUB-PLS), that handles large input dimensions in both domains as well as large sample sizes; it uses the cluster bootstrap to provide robust statistics for single input features in both domains.
  • results: Applied to surface area and cortical thickness in 33,000 UK Biobank subjects, the method found 107 genome-wide significant locus-phenotype pairs linked to 386 different genes; most loci could be technically validated at a high rate, with 85 locus-phenotype pairs exceeding the genome-wide suggestive threshold (P<1e-05) under classic GWAS or Genome-Wide Inferred Statistics (GWIS).
    Abstract A major challenge in imaging genetics and similar fields is to link high-dimensional data in one domain, e.g., genetic data, to high dimensional data in a second domain, e.g., brain imaging data. The standard approach in the area are mass univariate analyses across genetic factors and imaging phenotypes. That entails executing one genome-wide association study (GWAS) for each pre-defined imaging measure. Although this approach has been tremendously successful, one shortcoming is that phenotypes must be pre-defined. Consequently, effects that are not confined to pre-selected regions of interest or that reflect larger brain-wide patterns can easily be missed. In this work we introduce a Partial Least Squares (PLS)-based framework, which we term Cluster-Bootstrap PLS (CLUB-PLS), that can work with large input dimensions in both domains as well as with large sample sizes. One key factor of the framework is to use cluster bootstrap to provide robust statistics for single input features in both domains. We applied CLUB-PLS to investigating the genetic basis of surface area and cortical thickness in a sample of 33,000 subjects from the UK Biobank. We found 107 genome-wide significant locus-phenotype pairs that are linked to 386 different genes. We found that a vast majority of these loci could be technically validated at a high rate: using classic GWAS or Genome-Wide Inferred Statistics (GWIS) we found that 85 locus-phenotype pairs exceeded the genome-wide suggestive (P<1e-05) threshold.

Efficient Learning of PDEs via Taylor Expansion and Sparse Decomposition into Value and Fourier Domains

  • paper_url: http://arxiv.org/abs/2309.07344
  • repo_url: None
  • paper_authors: Md Nasim, Yexiang Xue
  • for: Accelerating the learning of partial differential equations (PDEs) from experimental data, to speed up scientific discovery.
  • methods: The proposed Reel accelerates PDE learning via random projection, decomposing dense updates into sparse ones in both the value and frequency domains; a Taylor series expansion approximates nonlinear PDE updates in decomposable polynomial form, broadening applicability.
  • results: Experiments show that Reel reduces training time by 70-98% when the data is compressed to 1% of its original size, with model quality comparable to non-compressed models.
    Abstract Accelerating the learning of Partial Differential Equations (PDEs) from experimental data will speed up the pace of scientific discovery. Previous randomized algorithms exploit sparsity in PDE updates for acceleration. However such methods are applicable to a limited class of decomposable PDEs, which have sparse features in the value domain. We propose Reel, which accelerates the learning of PDEs via random projection and has much broader applicability. Reel exploits the sparsity by decomposing dense updates into sparse ones in both the value and frequency domains. This decomposition enables efficient learning when the source of the updates consists of gradually changing terms across large areas (sparse in the frequency domain) in addition to a few rapid updates concentrated in a small set of "interfacial" regions (sparse in the value domain). Random projection is then applied to compress the sparse signals for learning. To expand the model applicability, Taylor series expansion is used in Reel to approximate the nonlinear PDE updates with polynomials in the decomposable form. Theoretically, we derive a constant factor approximation between the projected loss function and the original one with poly-logarithmic number of projected dimensions. Experimentally, we provide empirical evidence that our proposed Reel can lead to faster learning of PDE models (70-98% reduction in training time when the data is compressed to 1% of its original size) with comparable quality as the non-compressed models.
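
The compression step — projecting a (sparse) update signal to a much lower dimension with a random matrix while approximately preserving norms — can be sketched in pure Python. This is a generic Johnson-Lindenstrauss-style illustration, not Reel's actual projection or decomposition:

```python
import random

def random_projection(x, k, seed=0):
    """Compress a length-d vector to k dimensions with a Gaussian
    random matrix scaled by 1/sqrt(k), approximately preserving
    Euclidean norms (Johnson-Lindenstrauss style)."""
    rng = random.Random(seed)
    scale = 1.0 / k ** 0.5
    return [scale * sum(rng.gauss(0.0, 1.0) * xj for xj in x)
            for _ in range(k)]

# A sparse "update" vector: a few active entries among many zeros.
x = [1.0] * 10 + [0.0] * 40
y = random_projection(x, k=100)
norm = lambda v: sum(t * t for t in v) ** 0.5
ratio = norm(y) / norm(x)
```

The norm ratio concentrates around 1 as k grows, which is what makes learning on the compressed signals a faithful proxy for learning on the originals.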

User Training with Error Augmentation for Electromyogram-based Gesture Classification

  • paper_url: http://arxiv.org/abs/2309.07289
  • repo_url: None
  • paper_authors: Yunus Bicer, Niklas Smedemark-Margulies, Basak Celik, Elifnur Sunger, Ryan Orendorff, Stephanie Naufel, Tales Imbiriba, Deniz Erdoğmuş, Eugene Tunik, Mathew Yarossi
  • for: Developing a surface electromyography (sEMG)-based system for real-time control of a user interface through hand gesture recognition.
  • methods: sEMG data from eight electrodes in a wrist-band configuration are streamed into a machine-learning algorithm that classifies gestures in real time; during a human-learning stage, participants received one of three types of feedback: veridical feedback, modified feedback with a hidden augmentation of error, or no feedback.
  • results: Relative to baseline, the modified feedback condition led to significantly improved accuracy and better gesture class separation, suggesting that real-time feedback manipulation in a gamified interface can enable intuitive, rapid, and accurate task acquisition for sEMG-based gesture recognition.
    Abstract We designed and tested a system for real-time control of a user interface by extracting surface electromyographic (sEMG) activity from eight electrodes in a wrist-band configuration. sEMG data were streamed into a machine-learning algorithm that classified hand gestures in real-time. After an initial model calibration, participants were presented with one of three types of feedback during a human-learning stage: veridical feedback, in which predicted probabilities from the gesture classification algorithm were displayed without alteration, modified feedback, in which we applied a hidden augmentation of error to these probabilities, and no feedback. User performance was then evaluated in a series of minigames, in which subjects were required to use eight gestures to manipulate their game avatar to complete a task. Experimental results indicated that, relative to baseline, the modified feedback condition led to significantly improved accuracy and improved gesture class separation. These findings suggest that real-time feedback in a gamified user interface with manipulation of feedback may enable intuitive, rapid, and accurate task acquisition for sEMG-based gesture recognition applications.
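
The abstract does not spell out the hidden error augmentation applied to the displayed class probabilities, but one plausible form — blending the classifier's output toward the uniform distribution so the shown confidence is weakened — can be sketched as follows (an illustrative assumption, not the study's exact scheme):

```python
def augment_error(probs, alpha=0.3):
    """Blend predicted class probabilities toward uniform, weakening
    the displayed confidence while keeping a valid distribution.
    One plausible error-augmentation scheme; the study's exact
    manipulation may differ."""
    n = len(probs)
    return [(1 - alpha) * p + alpha / n for p in probs]

probs = [0.7, 0.2, 0.1]   # classifier output for three gestures
shown = augment_error(probs)
```

The displayed distribution remains normalized and keeps the same top class, but with reduced margin — the kind of perturbation a participant could plausibly learn to compensate for.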

Simultaneous inference for generalized linear models with unmeasured confounders

  • paper_url: http://arxiv.org/abs/2309.07261
  • repo_url: None
  • paper_authors: Jin-Hong Du, Larry Wasserman, Kathryn Roeder
  • for: This paper is written for researchers and practitioners in the field of genomic studies, particularly those interested in large-scale hypothesis testing and confounding effect adjustment.
  • methods: The paper proposes a unified statistical estimation and inference framework for multivariate generalized linear models in the presence of confounding effects. The method leverages orthogonal structures and integrates linear projections into three key stages: separating marginal and uncorrelated confounding effects, jointly estimating latent factors and primary effects, and incorporating projected and weighted bias-correction steps for hypothesis testing.
  • results: The paper establishes various identification conditions and non-asymptotic error bounds, and shows effective Type-I error control of asymptotic $z$-tests. Numerical experiments demonstrate that the proposed method controls the false discovery rate and is more powerful than alternative methods. The paper also demonstrates the suitability of adjusting confounding effects when significant covariates are absent from the model using single-cell RNA-seq counts from two groups of samples.
    Abstract Tens of thousands of simultaneous hypothesis tests are routinely performed in genomic studies to identify differentially expressed genes. However, due to unmeasured confounders, many standard statistical approaches may be substantially biased. This paper investigates the large-scale hypothesis testing problem for multivariate generalized linear models in the presence of confounding effects. Under arbitrary confounding mechanisms, we propose a unified statistical estimation and inference framework that harnesses orthogonal structures and integrates linear projections into three key stages. It first leverages multivariate responses to separate marginal and uncorrelated confounding effects, recovering the confounding coefficients' column space. Subsequently, latent factors and primary effects are jointly estimated, utilizing $\ell_1$-regularization for sparsity while imposing orthogonality onto confounding coefficients. Finally, we incorporate projected and weighted bias-correction steps for hypothesis testing. Theoretically, we establish various effects' identification conditions and non-asymptotic error bounds. We show effective Type-I error control of asymptotic $z$-tests as sample and response sizes approach infinity. Numerical experiments demonstrate that the proposed method controls the false discovery rate by the Benjamini-Hochberg procedure and is more powerful than alternative methods. By comparing single-cell RNA-seq counts from two groups of samples, we demonstrate the suitability of adjusting confounding effects when significant covariates are absent from the model.

All you need is spin: SU(2) equivariant variational quantum circuits based on spin networks

  • paper_url: http://arxiv.org/abs/2309.07250
  • repo_url: None
  • paper_authors: Richard D. P. East, Guillermo Alonso-Linaje, Chae-Yeun Park
  • for: Constructing SU(2) equivariant variational quantum circuit ansätze from spin networks, so that variational quantum algorithms encode problem symmetries as an inductive bias and run efficiently.
  • methods: Spin networks, a form of directed tensor network invariant under group transformations, are used to build the circuits; the construction is proven mathematically equivalent to known alternatives (e.g., those based on twirling and generalized permutations) while being more direct to implement on quantum hardware.
  • results: On the ground state problem of SU(2) symmetric Heisenberg models on the one-dimensional triangular lattice and the Kagome lattice, the equivariant circuits boost the performance of quantum variational algorithms, indicating broader applicability to real-world problems.
    Abstract Variational algorithms require architectures that naturally constrain the optimisation space to run efficiently. In geometric quantum machine learning, one achieves this by encoding group structure into parameterised quantum circuits to include the symmetries of a problem as an inductive bias. However, constructing such circuits is challenging as a concrete guiding principle has yet to emerge. In this paper, we propose the use of spin networks, a form of directed tensor network invariant under a group transformation, to devise SU(2) equivariant quantum circuit ans\"atze -- circuits possessing spin rotation symmetry. By changing to the basis that block diagonalises SU(2) group action, these networks provide a natural building block for constructing parameterised equivariant quantum circuits. We prove that our construction is mathematically equivalent to other known constructions, such as those based on twirling and generalised permutations, but more direct to implement on quantum hardware. The efficacy of our constructed circuits is tested by solving the ground state problem of SU(2) symmetric Heisenberg models on the one-dimensional triangular lattice and on the Kagome lattice. Our results highlight that our equivariant circuits boost the performance of quantum variational algorithms, indicating broader applicability to other real-world problems.

EarthPT: a foundation model for Earth Observation

  • paper_url: http://arxiv.org/abs/2309.07207
  • repo_url: None
  • paper_authors: Michael J. Smith, Luke Fleming, James E. Geach
  • for: This work develops EarthPT, an Earth Observation (EO) pretrained transformer for forecasting future pixel-level surface reflectances.
  • methods: A 700 million parameter decoding transformer foundation model, trained in an autoregressive self-supervised manner and designed specifically with EO use-cases in mind.
  • results: EarthPT accurately predicts future surface reflectances across the 400-2300 nm range; forecasts of the Normalised Difference Vegetation Index (NDVI) show a typical pixel-level error of about 0.05 (over a natural range of -1 to 1) on a five-month test horizon, outperforming simple phase-folded models based on historical averaging. The learnt embeddings hold semantically meaningful information and could support downstream tasks such as highly granular, dynamic land use classification.
    Abstract We introduce EarthPT -- an Earth Observation (EO) pretrained transformer. EarthPT is a 700 million parameter decoding transformer foundation model trained in an autoregressive self-supervised manner and developed specifically with EO use-cases in mind. We demonstrate that EarthPT is an effective forecaster that can accurately predict future pixel-level surface reflectances across the 400-2300 nm range well into the future. For example, forecasts of the evolution of the Normalised Difference Vegetation Index (NDVI) have a typical error of approximately 0.05 (over a natural range of -1 -> 1) at the pixel level over a five month test set horizon, out-performing simple phase-folded models based on historical averaging. We also demonstrate that embeddings learnt by EarthPT hold semantically meaningful information and could be exploited for downstream tasks such as highly granular, dynamic land use classification. Excitingly, we note that the abundance of EO data provides us with -- in theory -- quadrillions of training tokens. Therefore, if we assume that EarthPT follows neural scaling laws akin to those derived for Large Language Models (LLMs), there is currently no data-imposed limit to scaling EarthPT and other similar `Large Observation Models.'
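
The NDVI metric used to evaluate EarthPT's forecasts is a standard ratio of near-infrared (NIR) and red reflectances, bounded in [-1, 1]; the reflectance values below are made-up examples:

```python
def ndvi(nir: float, red: float) -> float:
    """Normalised Difference Vegetation Index:
    (NIR - Red) / (NIR + Red), bounded in [-1, 1] for
    non-negative reflectances."""
    return (nir - red) / (nir + red)

# Dense vegetation reflects strongly in NIR and absorbs red light,
# so it scores high; bare soil sits near zero.
veg = ndvi(nir=0.5, red=0.08)
soil = ndvi(nir=0.3, red=0.25)
```

Against this natural range of -1 to 1, the reported typical forecast error of about 0.05 is small.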

Data Augmentation via Subgroup Mixup for Improving Fairness

  • paper_url: http://arxiv.org/abs/2309.07110
  • repo_url: None
  • paper_authors: Madeline Navarro, Camille Little, Genevera I. Allen, Santiago Segarra
  • for: Improving group fairness, since many real-world machine learning systems exhibit biases across certain groups due to under-representation or training data that reflects societal biases.
  • methods: Data augmentation via pairwise mixup across subgroups, adding new samples of underrepresented groups to balance subpopulations and encouraging fair and accurate decision boundaries for all subgroups.
  • results: On both synthetic simulations and real-world benchmark fair classification data, the method achieves fair outcomes with robust, if not improved, accuracy.
    Abstract In this work, we propose data augmentation via pairwise mixup across subgroups to improve group fairness. Many real-world applications of machine learning systems exhibit biases across certain groups due to under-representation or training data that reflects societal biases. Inspired by the successes of mixup for improving classification performance, we develop a pairwise mixup scheme to augment training data and encourage fair and accurate decision boundaries for all subgroups. Data augmentation for group fairness allows us to add new samples of underrepresented groups to balance subpopulations. Furthermore, our method allows us to use the generalization ability of mixup to improve both fairness and accuracy. We compare our proposed mixup to existing data augmentation and bias mitigation approaches on both synthetic simulations and real-world benchmark fair classification data, demonstrating that we are able to achieve fair outcomes with robust if not improved accuracy.
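
The mixup operation at the heart of the augmentation — convexly combining a sample from one subgroup with a sample from another — can be sketched as follows (a generic mixup sketch with made-up data; the paper's pairing strategy across subgroups is more specific):

```python
import random

def subgroup_mixup(x_a, y_a, x_b, y_b, alpha=0.4, rng=None):
    """Mix one sample from subgroup A with one from subgroup B:
    lam ~ Beta(alpha, alpha), then x = lam*x_a + (1-lam)*x_b and
    y = lam*y_a + (1-lam)*y_b."""
    rng = rng or random.Random(0)
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x_a, x_b)]
    y = lam * y_a + (1 - lam) * y_b
    return x, y, lam

# Toy pair: one sample from each subgroup, with soft labels.
x, y, lam = subgroup_mixup([1.0, 0.0], 1.0, [0.0, 1.0], 0.0)
```

Because the synthetic points interpolate between subgroups, training on them pushes the decision boundary to behave consistently across subpopulations.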

The Boundaries of Verifiable Accuracy, Robustness, and Generalisation in Deep Learning

  • paper_url: http://arxiv.org/abs/2309.07072
  • repo_url: None
  • paper_authors: Alexander Bastounis, Alexander N. Gorban, Anders C. Hansen, Desmond J. Higham, Danil Prokhorov, Oliver Sutton, Ivan Y. Tyukin, Qinghua Zhou
  • for: This work assesses the theoretical limits on determining guaranteed stability and accuracy of neural networks in classification tasks.
  • methods: A classical distribution-agnostic framework with algorithms minimising empirical risk, potentially subject to weight regularisation.
  • results: There is a large family of tasks for which computing and verifying ideal stable and accurate neural networks is extremely challenging, if at all possible, even when such ideal solutions exist within the given class of neural architectures.
    Abstract In this work, we assess the theoretical limitations of determining guaranteed stability and accuracy of neural networks in classification tasks. We consider classical distribution-agnostic framework and algorithms minimising empirical risks and potentially subjected to some weights regularisation. We show that there is a large family of tasks for which computing and verifying ideal stable and accurate neural networks in the above settings is extremely challenging, if at all possible, even when such ideal solutions exist within the given class of neural architectures.

An Extreme Learning Machine-Based Method for Computational PDEs in Higher Dimensions

  • paper_url: http://arxiv.org/abs/2309.07049
  • repo_url: None
  • paper_authors: Yiran Wang, Suchuan Dong
  • for: Solving high-dimensional partial differential equation (PDE) problems.
  • methods: Two randomized neural network methods: an extension of the extreme learning machine (ELM) approach to high dimensions, and a reformulation based on an Approximate variant of the Theory of Functional Connections (A-TFC) that avoids the exponential growth in the number of terms as the dimension increases.
  • results: The methods produce accurate solutions to high-dimensional PDEs, with errors approaching machine accuracy in relatively lower dimensions, and are both more cost-effective and more accurate than the physics-informed neural network (PINN) method.
    Abstract We present two effective methods for solving high-dimensional partial differential equations (PDE) based on randomized neural networks. Motivated by the universal approximation property of this type of networks, both methods extend the extreme learning machine (ELM) approach from low to high dimensions. With the first method the unknown solution field in $d$ dimensions is represented by a randomized feed-forward neural network, in which the hidden-layer parameters are randomly assigned and fixed while the output-layer parameters are trained. The PDE and the boundary/initial conditions, as well as the continuity conditions (for the local variant of the method), are enforced on a set of random interior/boundary collocation points. The resultant linear or nonlinear algebraic system, through its least squares solution, provides the trained values for the network parameters. With the second method the high-dimensional PDE problem is reformulated through a constrained expression based on an Approximate variant of the Theory of Functional Connections (A-TFC), which avoids the exponential growth in the number of terms of TFC as the dimension increases. The free field function in the A-TFC constrained expression is represented by a randomized neural network and is trained by a procedure analogous to the first method. We present ample numerical simulations for a number of high-dimensional linear/nonlinear stationary/dynamic PDEs to demonstrate their performance. These methods can produce accurate solutions to high-dimensional PDEs, in particular with their errors reaching levels not far from the machine accuracy for relatively lower dimensions. Compared with the physics-informed neural network (PINN) method, the current method is both cost-effective and more accurate for high-dimensional PDEs.
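
The ELM idea — hidden-layer parameters are randomly assigned and fixed, and only the output-layer weights are trained by linear least squares — can be sketched on a toy 1D regression problem. This is a pure-Python illustration under made-up settings (the paper solves PDE residual systems at collocation points, not a curve fit):

```python
import math
import random

def _solve(A, b):
    """Gaussian elimination with partial pivoting (tiny systems only)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j]
                              for j in range(i + 1, n))) / M[i][i]
    return x

def elm_fit(xs, ys, n_hidden=10, seed=0):
    """ELM sketch: random, fixed tanh hidden layer; output weights
    trained by least squares via ridge-stabilized normal equations."""
    rng = random.Random(seed)
    W = [(rng.uniform(-3.0, 3.0), rng.uniform(-3.0, 3.0))
         for _ in range(n_hidden)]
    feats = [[math.tanh(w * x + b) for (w, b) in W] for x in xs]
    ATA = [[sum(f[i] * f[j] for f in feats) + (1e-6 if i == j else 0.0)
            for j in range(n_hidden)] for i in range(n_hidden)]
    ATy = [sum(f[i] * y for f, y in zip(feats, ys))
           for i in range(n_hidden)]
    beta = _solve(ATA, ATy)
    return lambda x: sum(bi * math.tanh(w * b_x) if False else
                         bi * math.tanh(w * x + b)
                         for bi, (w, b) in zip(beta, W))

# Toy 1D problem: fit one period of a sine wave on [0, 1].
xs = [i / 20.0 for i in range(21)]
ys = [math.sin(2.0 * math.pi * x) for x in xs]
f = elm_fit(xs, ys)
```

Because only the linear output layer is trained, "training" reduces to one least-squares solve, which is the source of ELM's speed advantage over gradient-based training.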

Optimal transport distances for directed, weighted graphs: a case study with cell-cell communication networks

  • paper_url: http://arxiv.org/abs/2309.07030
  • repo_url: None
  • paper_authors: James S. Nagai, Ivan G. Costa, Michael T. Schaub
  • for: comparing directed graphs using optimal transport distances
  • methods: proposes two distance measures based on variants of optimal transport (Wasserstein and Gromov-Wasserstein)
  • results: evaluates the performance of the two distance measures on simulated graph data and real-world directed cell-cell communication graphs inferred from single-cell RNA-seq data.
    Abstract Comparing graphs by means of optimal transport has recently gained significant attention, as the distances induced by optimal transport provide both a principled metric between graphs as well as an interpretable description of the associated changes between graphs in terms of a transport plan. As the lack of symmetry introduces challenges in the typically considered formulations, optimal transport distances for graphs have mostly been developed for undirected graphs. Here, we propose two distance measures to compare directed graphs based on variants of optimal transport: (i) an earth mover's distance (Wasserstein) and (ii) a Gromov-Wasserstein (GW) distance. We evaluate these two distances and discuss their relative performance for both simulated graph data and real-world directed cell-cell communication graphs, inferred from single-cell RNA-seq data.
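
As a rough illustration of comparing directed graphs via optimal transport (a much simpler proxy than the paper's Wasserstein and Gromov-Wasserstein constructions), one can compute the 1D earth mover's distance between the out-degree sequences of two graphs, which for equal-size samples has a closed form:

```python
import numpy as np

def out_degrees(adj):
    """Out-degree sequence of a directed graph given its adjacency matrix."""
    return np.asarray(adj).sum(axis=1)

def wasserstein1_1d(a, b):
    """1D earth mover's distance between two equal-size samples:
    the mean absolute difference of the sorted values."""
    return float(np.mean(np.abs(np.sort(a) - np.sort(b))))

# Two toy directed graphs on 4 nodes (rows = source, cols = target).
g1 = np.array([[0, 1, 1, 0], [0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0]])
g2 = np.array([[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0]])

d = wasserstein1_1d(out_degrees(g1), out_degrees(g2))
```

The out-degree proxy discards most directed structure; the paper's GW variant instead compares the graphs' internal (asymmetric) relational structure directly.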

Mitigating Adversarial Attacks in Federated Learning with Trusted Execution Environments

  • paper_url: http://arxiv.org/abs/2309.07197
  • repo_url: https://github.com/queyrusi/pelta
  • paper_authors: Simon Queyrut, Valerio Schiavoni, Pascal Felber
  • for: Protect user data privacy in federated learning (FL) by preventing compromised nodes from exploiting locally held model updates to craft adversarial examples.
  • methods: Uses Trusted Execution Environments (TEEs) to shield part of the back-propagation chain, reducing an attacker's ability to craft malicious samples via white-box attacks.
  • results: The Pelta mechanism is evaluated on three well-established datasets (CIFAR-10, CIFAR-100, and ImageNet) and shown to effectively mitigate six state-of-the-art white-box attacks, including Projected Gradient Descent, the Momentum Iterative Method, Auto Projected Gradient Descent, and the Carlini & Wagner attack.
    Abstract The main premise of federated learning (FL) is that machine learning model updates are computed locally to preserve user data privacy. This approach avoids by design user data to ever leave the perimeter of their device. Once the updates aggregated, the model is broadcast to all nodes in the federation. However, without proper defenses, compromised nodes can probe the model inside their local memory in search for adversarial examples, which can lead to dangerous real-world scenarios. For instance, in image-based applications, adversarial examples consist of images slightly perturbed to the human eye getting misclassified by the local model. These adversarial images are then later presented to a victim node's counterpart model to replay the attack. Typical examples harness dissemination strategies such as altered traffic signs (patch attacks) no longer recognized by autonomous vehicles or seemingly unaltered samples that poison the local dataset of the FL scheme to undermine its robustness. Pelta is a novel shielding mechanism leveraging Trusted Execution Environments (TEEs) that reduce the ability of attackers to craft adversarial samples. Pelta masks inside the TEE the first part of the back-propagation chain rule, typically exploited by attackers to craft the malicious samples. We evaluate Pelta on state-of-the-art accurate models using three well-established datasets: CIFAR-10, CIFAR-100 and ImageNet. We show the effectiveness of Pelta in mitigating six white-box state-of-the-art adversarial attacks, such as Projected Gradient Descent, Momentum Iterative Method, Auto Projected Gradient Descent, the Carlini & Wagner attack. In particular, Pelta constitutes the first attempt at defending an ensemble model against the Self-Attention Gradient attack to the best of our knowledge. Our code is available to the research community at https://github.com/queyrusi/Pelta.
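
Pelta's premise is that white-box attacks need the first part of the back-propagation chain rule, which the TEE hides. The sketch below illustrates that dependence with the Fast Gradient Sign Method on a toy logistic model (not Pelta's shielded ensembles); the model and data are arbitrary:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_loss(w, x, y):
    """Binary cross-entropy of a linear-sigmoid model on one example."""
    p = sigmoid(w @ x)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def fgsm(w, x, y, eps):
    """Fast Gradient Sign Method: perturb the *input* along the sign of
    the input-gradient of the loss. This gradient is exactly what a
    TEE-shielded back-propagation would deny the attacker."""
    p = sigmoid(w @ x)
    grad_x = (p - y) * w          # d(loss)/dx for the logistic model
    return x + eps * np.sign(grad_x)

rng = np.random.default_rng(0)
w = rng.normal(size=8)            # toy "model"
x = rng.normal(size=8)            # clean input
y = 1.0
x_adv = fgsm(w, x, y, eps=0.1)    # adversarially perturbed input
```

The perturbed input provably increases the loss of this convex toy model; on deep networks the same gradient-sign step is what produces misclassified, near-identical images.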

Open-vocabulary Keyword-spotting with Adaptive Instance Normalization

  • paper_url: http://arxiv.org/abs/2309.08561
  • repo_url: None
  • paper_authors: Aviv Navon, Aviv Shamsian, Neta Glazer, Gill Hetz, Joseph Keshet
  • for: Proposes a new method for open-vocabulary keyword spotting in automatic speech recognition (ASR), i.e., detecting user-defined keywords within spoken utterances.
  • methods: A text encoder is trained to output keyword-conditioned normalization parameters, which are then used to process the auditory input.
  • results: The method achieves significant improvements over recent keyword-spotting and ASR baselines on challenging and diverse multilingual benchmarks, and yields substantial gains on low-resource languages unseen during training.
    Abstract Open vocabulary keyword spotting is a crucial and challenging task in automatic speech recognition (ASR) that focuses on detecting user-defined keywords within a spoken utterance. Keyword spotting methods commonly map the audio utterance and keyword into a joint embedding space to obtain some affinity score. In this work, we propose AdaKWS, a novel method for keyword spotting in which a text encoder is trained to output keyword-conditioned normalization parameters. These parameters are used to process the auditory input. We provide an extensive evaluation using challenging and diverse multi-lingual benchmarks and show significant improvements over recent keyword spotting and ASR baselines. Furthermore, we study the effectiveness of our approach on low-resource languages that were unseen during the training. The results demonstrate a substantial performance improvement compared to baseline methods.
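
The keyword-conditioned normalization named in the title can be sketched as adaptive instance normalization, where a text encoder emits per-channel scale and shift parameters. In this sketch the "text encoder" is just a random linear map and all shapes are placeholders, not AdaKWS internals:

```python
import numpy as np

def adaptive_instance_norm(feats, gamma, beta, eps=1e-5):
    """Normalize each channel of a (time, channels) feature map to zero
    mean / unit variance, then scale and shift with keyword-conditioned
    parameters gamma, beta (one pair per channel)."""
    mu = feats.mean(axis=0, keepdims=True)
    sigma = feats.std(axis=0, keepdims=True)
    return gamma * (feats - mu) / (sigma + eps) + beta

rng = np.random.default_rng(0)
n_channels, emb_dim = 4, 16
audio_feats = rng.normal(size=(50, n_channels))   # e.g. filterbank frames
keyword_emb = rng.normal(size=emb_dim)            # stand-in text encoding

# Stand-in "text encoder" head: a random linear map from the keyword
# embedding to per-channel (gamma, beta) pairs.
W = rng.normal(size=(2 * n_channels, emb_dim)) / np.sqrt(emb_dim)
gamma, beta = np.split(W @ keyword_emb, 2)

out = adaptive_instance_norm(audio_feats, gamma, beta)
```

Changing the keyword embedding changes (gamma, beta) and hence how the same audio features are processed, which is the conditioning mechanism the abstract describes.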

Effect of hyperparameters on variable selection in random forests

  • paper_url: http://arxiv.org/abs/2309.06943
  • repo_url: https://github.com/imbs-hl/rf-hyperparameters-and-variable-selection
  • paper_authors: Cesaire J. K. Fouodo, Lea L. Kronziel, Inke R. König, Silke Szymczak
  • for: Investigates how the hyperparameters of the random forest (RF) algorithm affect prediction modeling and variable selection.
  • methods: Two simulation studies, using theoretical distributions and empirical gene expression data, evaluate the influence of the RF hyperparameters.
  • results: The variable selection procedures (Vita and Boruta) are influenced by several hyperparameters, notably mtry.prop and sample.fraction, whose settings affect both prediction performance and the sensitivity of variable selection.
    Abstract Random forests (RFs) are well suited for prediction modeling and variable selection in high-dimensional omics studies. The effect of hyperparameters of the RF algorithm on prediction performance and variable importance estimation have previously been investigated. However, how hyperparameters impact RF-based variable selection remains unclear. We evaluate the effects on the Vita and the Boruta variable selection procedures based on two simulation studies utilizing theoretical distributions and empirical gene expression data. We assess the ability of the procedures to select important variables (sensitivity) while controlling the false discovery rate (FDR). Our results show that the proportion of splitting candidate variables (mtry.prop) and the sample fraction (sample.fraction) for the training dataset influence the selection procedures more than the drawing strategy of the training datasets and the minimal terminal node size. A suitable setting of the RF hyperparameters depends on the correlation structure in the data. For weakly correlated predictor variables, the default value of mtry is optimal, but smaller values of sample.fraction result in larger sensitivity. In contrast, the difference in sensitivity of the optimal compared to the default value of sample.fraction is negligible for strongly correlated predictor variables, whereas smaller values than the default are better in the other settings. In conclusion, the default values of the hyperparameters will not always be suitable for identifying important variables. Thus, adequate values differ depending on whether the aim of the study is optimizing prediction performance or variable selection.
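
In scikit-learn terms (the paper's names come from R packages such as ranger), `mtry.prop` corresponds roughly to `max_features` given as a fraction, and `sample.fraction` to `max_samples`. A minimal sketch of varying these two hyperparameters before feeding importances into a selection procedure, on synthetic data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data with a few informative and many noise features.
X, y = make_classification(n_samples=300, n_features=20, n_informative=4,
                           n_redundant=0, random_state=0)

def fit_rf(mtry_prop, sample_fraction):
    """Random forest exposing the two hyperparameters the study highlights:
    mtry.prop -> max_features (fraction of candidate split variables),
    sample.fraction -> max_samples (fraction of rows drawn per tree)."""
    rf = RandomForestClassifier(n_estimators=200,
                                max_features=mtry_prop,
                                max_samples=sample_fraction,
                                bootstrap=True, random_state=0)
    rf.fit(X, y)
    return rf

rf = fit_rf(mtry_prop=0.33, sample_fraction=0.6)
importances = rf.feature_importances_   # input to selection procedures
```

Vita and Boruta build on repeated importance estimates like these; the study's point is that the chosen `mtry_prop` / `sample_fraction` values should depend on the correlation structure of the predictors.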

Modeling Dislocation Dynamics Data Using Semantic Web Technologies

  • paper_url: http://arxiv.org/abs/2309.06930
  • repo_url: None
  • paper_authors: Ahmad Zainul Ihsan, Said Fathalla, Stefan Sandfeld
  • for: Studies crystalline materials (e.g., metals and semiconductors) widely investigated in Materials Science and Engineering; such materials typically contain a defect type called a "dislocation", which significantly affects properties including strength, fracture toughness, and ductility.
  • methods: Uses semantic web technologies to model dislocation dynamics data by annotating it with ontologies; the existing Dislocation Ontology is extended with missing concepts and aligned with two domain-related ontologies (the Elementary Multi-perspective Material Ontology and the Materials Design Ontology) to represent dislocation simulation data efficiently.
  • results: A knowledge graph (DisLocKG) is constructed that illustrates the relationships within the dislocation data, and a SPARQL endpoint is developed that allows extensive, flexible querying of DisLocKG.
    Abstract Research in the field of Materials Science and Engineering focuses on the design, synthesis, properties, and performance of materials. An important class of materials that is widely investigated are crystalline materials, including metals and semiconductors. Crystalline material typically contains a distinct type of defect called "dislocation". This defect significantly affects various material properties, including strength, fracture toughness, and ductility. Researchers have devoted a significant effort in recent years to understanding dislocation behavior through experimental characterization techniques and simulations, e.g., dislocation dynamics simulations. This paper presents how data from dislocation dynamics simulations can be modeled using semantic web technologies through annotating data with ontologies. We extend the already existing Dislocation Ontology by adding missing concepts and aligning it with two other domain-related ontologies (i.e., the Elementary Multi-perspective Material Ontology and the Materials Design Ontology) allowing for representing the dislocation simulation data efficiently. Moreover, we show a real-world use case by representing the discrete dislocation dynamics data as a knowledge graph (DisLocKG) that illustrates the relationship between them. We also developed a SPARQL endpoint that brings extensive flexibility to query DisLocKG.
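
The kind of representation and query DisLocKG supports can be sketched without a full RDF/SPARQL stack: a set of (subject, predicate, object) triples and a pattern matcher in which `None` plays the role of a SPARQL variable. All entity and predicate names below are invented for illustration, not taken from the Dislocation Ontology:

```python
# Minimal triple store: each fact is a (subject, predicate, object) triple.
triples = {
    ("sim:run42", "rdf:type", "disloc:DynamicsSimulation"),
    ("sim:run42", "disloc:hasMaterial", "mat:Aluminium"),
    ("sim:run42", "disloc:hasDislocation", "disloc:loop7"),
    ("disloc:loop7", "disloc:burgersVector", "[1,1,0]/2"),
    ("mat:Aluminium", "rdf:type", "emmo:CrystallineMaterial"),
}

def query(s=None, p=None, o=None):
    """Return all triples matching the pattern (None = wildcard),
    analogous to a single SPARQL triple pattern."""
    return sorted(t for t in triples
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))

# Rough analogue of: SELECT ?x WHERE { sim:run42 disloc:hasDislocation ?x }
dislocations = [o for _, _, o in query(s="sim:run42",
                                       p="disloc:hasDislocation")]
```

A real deployment would use an RDF library and a SPARQL endpoint, as the paper does; the value of the ontology alignment is that predicates like the material typing above become shared, machine-interpretable vocabulary.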

Investigating the Impact of Action Representations in Policy Gradient Algorithms

  • paper_url: http://arxiv.org/abs/2309.06921
  • repo_url: None
  • paper_authors: Jan Schneider, Pierre Schumacher, Daniel Häufle, Bernhard Schölkopf, Dieter Büchler
  • for: investigate the impact of action representations on the learning performance of reinforcement learning algorithms
  • methods: use different analysis techniques to assess the effectiveness of action representations in RL
  • results: the action representation can significantly influence the learning performance on popular RL benchmark tasks, and some of the performance differences can be attributed to changes in the complexity of the optimization landscape.
    Abstract Reinforcement learning (RL) is a versatile framework for learning to solve complex real-world tasks. However, influences on the learning performance of RL algorithms are often poorly understood in practice. We discuss different analysis techniques and assess their effectiveness for investigating the impact of action representations in RL. Our experiments demonstrate that the action representation can significantly influence the learning performance on popular RL benchmark tasks. The analysis results indicate that some of the performance differences can be attributed to changes in the complexity of the optimization landscape. Finally, we discuss open challenges of analysis techniques for RL algorithms.

Domain-Aware Augmentations for Unsupervised Online General Continual Learning

  • paper_url: http://arxiv.org/abs/2309.06896
  • repo_url: None
  • paper_authors: Nicolas Michel, Romain Negrel, Giovanni Chierchia, Jean-François Bercher
  • for: Improving learning stability in Unsupervised Online General Continual Learning (UOGCL), where the learner has no prior knowledge of class boundaries or task-change information.
  • methods: Proposes a novel approach that defines stream-dependent data augmentations, combined with several implementation tricks, to enhance memory usage for contrastive learning.
  • results: The method achieves state-of-the-art results compared to other unsupervised approaches in all considered settings and reduces the gap between supervised and unsupervised continual learning; the domain-aware augmentation procedure can be adapted to other replay-based methods, making it a promising strategy for continual learning.
    Abstract Continual Learning has been challenging, especially when dealing with unsupervised scenarios such as Unsupervised Online General Continual Learning (UOGCL), where the learning agent has no prior knowledge of class boundaries or task change information. While previous research has focused on reducing forgetting in supervised setups, recent studies have shown that self-supervised learners are more resilient to forgetting. This paper proposes a novel approach that enhances memory usage for contrastive learning in UOGCL by defining and using stream-dependent data augmentations together with some implementation tricks. Our proposed method is simple yet effective, achieves state-of-the-art results compared to other unsupervised approaches in all considered setups, and reduces the gap between supervised and unsupervised continual learning. Our domain-aware augmentation procedure can be adapted to other replay-based methods, making it a promising strategy for continual learning.
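
The two ingredients named above, a stream-dependent augmentation and contrastive learning, can be sketched as follows. The specific noise-jitter augmentation and the InfoNCE loss are stand-ins for the paper's actual choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def stream_augment(x, stream_std):
    """Toy stream-dependent augmentation: jitter each sample with noise
    scaled by a running estimate of the stream's per-feature spread."""
    return x + 0.1 * stream_std * rng.normal(size=x.shape)

def info_nce(za, zb, tau=0.1):
    """InfoNCE contrastive loss between two views: matching rows are
    positives, all other rows in the batch are negatives."""
    za = za / np.linalg.norm(za, axis=1, keepdims=True)
    zb = zb / np.linalg.norm(zb, axis=1, keepdims=True)
    logits = za @ zb.T / tau                      # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

batch = rng.normal(size=(32, 8))         # current stream batch (embeddings)
stream_std = batch.std(axis=0)           # running stream statistic
view_a = stream_augment(batch, stream_std)
view_b = stream_augment(batch, stream_std)
loss = info_nce(view_a, view_b)
```

In the online setting, the augmentation statistics would be updated from the data stream itself rather than recomputed per batch.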

A Robust SINDy Approach by Combining Neural Networks and an Integral Form

  • paper_url: http://arxiv.org/abs/2309.07193
  • repo_url: None
  • paper_authors: Ali Forootani, Pawan Goyal, Peter Benner
  • for: Discovering governing equations from noisy and scarce data, a long-standing challenge in data-driven modeling.
  • methods: A neural network learns an implicit representation of the measurement data such that it both reproduces the output near the measurements and describes the time evolution of the output as a dynamical system, learned within the SINDy framework; the derivatives required by SINDy are obtained from the implicit representation via automatic differentiation, and robustness is further enhanced by an integral condition on the network output.
  • results: A robust method for discovering governing equations under noisy and scarce data regimes, extended to handle data collected from multiple initial conditions; several examples demonstrate its efficiency and its more robust, effective performance compared with existing methods.
    Abstract The discovery of governing equations from data has been an active field of research for decades. One widely used methodology for this purpose is sparse regression for nonlinear dynamics, known as SINDy. Despite several attempts, noisy and scarce data still pose a severe challenge to the success of the SINDy approach. In this work, we discuss a robust method to discover nonlinear governing equations from noisy and scarce data. To do this, we make use of neural networks to learn an implicit representation based on measurement data so that not only it produces the output in the vicinity of the measurements but also the time-evolution of output can be described by a dynamical system. Additionally, we learn such a dynamic system in the spirit of the SINDy framework. Leveraging the implicit representation using neural networks, we obtain the derivative information -- required for SINDy -- using an automatic differentiation tool. To enhance the robustness of our methodology, we further incorporate an integral condition on the output of the implicit networks. Furthermore, we extend our methodology to handle data collected from multiple initial conditions. We demonstrate the efficiency of the proposed methodology to discover governing equations under noisy and scarce data regimes by means of several examples and compare its performance with existing methods.
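
The sparse-regression core the paper builds on, SINDy's sequentially thresholded least squares over a candidate library, can be sketched on a toy linear system with clean derivatives. The paper's neural implicit representation and integral condition are not reproduced here:

```python
import numpy as np

def library(X):
    """Candidate terms [1, x, y, x^2, xy, y^2] for 2D state data."""
    x, y = X[:, 0], X[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x * x, x * y, y * y])

def stlsq(Theta, dX, threshold=0.05, n_iter=10):
    """Sequentially thresholded least squares: the sparse regression at
    the heart of SINDy."""
    Xi, *_ = np.linalg.lstsq(Theta, dX, rcond=None)
    for _ in range(n_iter):
        Xi[np.abs(Xi) < threshold] = 0.0
        for k in range(dX.shape[1]):          # re-fit the surviving terms
            big = np.abs(Xi[:, k]) >= threshold
            if big.any():
                Xi[big, k], *_ = np.linalg.lstsq(Theta[:, big], dX[:, k],
                                                 rcond=None)
    return Xi

# Toy damped oscillator: dx/dt = -0.1 x + 2 y, dy/dt = -2 x - 0.1 y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                    # sampled states
dX = X @ np.array([[-0.1, -2.0], [2.0, -0.1]])   # exact derivatives
Xi = stlsq(library(X), dX)
```

With noisy measurements the derivative estimates degrade badly, which is precisely the gap the paper's implicit network and automatic differentiation are meant to close.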

The effect of data augmentation and 3D-CNN depth on Alzheimer’s Disease detection

  • paper_url: http://arxiv.org/abs/2309.07192
  • repo_url: https://github.com/rturrisige/AD_classification
  • paper_authors: Rosanna Turrisi, Alessandro Verri, Annalisa Barla
  • for: Establishing machine learning (ML) as a reliable tool in healthcare by strictly adhering to best practices for reproducibility and reliability, using Alzheimer's Disease (AD) detection as a paradigmatic example.
  • methods: Investigates how different data augmentation techniques and model complexity affect overall performance; MRI data from the ADNI dataset are used in a classification problem with 3D convolutional neural networks (CNNs), and the experimental design employs cross-validation and multiple training trials to compensate for data scarcity and random parameter initialization.
  • results: The data augmentation strategy and model complexity significantly affect overall performance; the best model (8 CL, (B)) achieves the highest accuracy and is the most stable across cross-validation folds and training trials.
    Abstract Machine Learning (ML) has emerged as a promising approach in healthcare, outperforming traditional statistical techniques. However, to establish ML as a reliable tool in clinical practice, adherence to best practices regarding data handling, experimental design, and model evaluation is crucial. This work summarizes and strictly observes such practices to ensure reproducible and reliable ML. Specifically, we focus on Alzheimer's Disease (AD) detection, which serves as a paradigmatic example of challenging problem in healthcare. We investigate the impact of different data augmentation techniques and model complexity on the overall performance. We consider MRI data from ADNI dataset to address a classification problem employing 3D Convolutional Neural Network (CNN). The experiments are designed to compensate for data scarcity and initial random parameters by utilizing cross-validation and multiple training trials. Within this framework, we train 15 predictive models, considering three different data augmentation strategies and five distinct 3D CNN architectures, each varying in the number of convolutional layers. Specifically, the augmentation strategies are based on affine transformations, such as zoom, shift, and rotation, applied concurrently or separately. The combined effect of data augmentation and model complexity leads to a variation in prediction performance up to 10% of accuracy. When affine transformation are applied separately, the model is more accurate, independently from the adopted architecture. For all strategies, the model accuracy followed a concave behavior at increasing number of convolutional layers, peaking at an intermediate value of layers. The best model (8 CL, (B)) is the most stable across cross-validation folds and training trials, reaching excellent performance both on the testing set and on an external test set.
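
The three affine augmentations the study compares (rotation, shift, zoom) can be sketched on a 3D volume with `scipy.ndimage`; the parameter values below are arbitrary, not the paper's:

```python
import numpy as np
from scipy import ndimage

def augment_volume(vol, angle=10.0, shift_vox=(2, 0, 0), zoom_factor=1.1):
    """Apply rotation, shift, and zoom to a 3D MRI-like volume, keeping
    the output shape equal to the input shape."""
    out = ndimage.rotate(vol, angle, axes=(0, 1), reshape=False, order=1)
    out = ndimage.shift(out, shift_vox, order=1)
    out = ndimage.zoom(out, zoom_factor, order=1)
    # zoom changes the array size; center-crop / pad back to the original.
    result = np.zeros_like(vol)
    src = [slice(max(0, (o - t) // 2), max(0, (o - t) // 2) + min(o, t))
           for o, t in zip(out.shape, vol.shape)]
    dst = [slice(max(0, (t - o) // 2), max(0, (t - o) // 2) + min(o, t))
           for o, t in zip(out.shape, vol.shape)]
    result[tuple(dst)] = out[tuple(src)]
    return result

vol = np.random.default_rng(0).normal(size=(32, 32, 32))
aug = augment_volume(vol)
```

Applying the transformations separately versus concurrently (the comparison the study makes) just means calling one of the three `ndimage` operations at a time instead of chaining them.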

Dynamic control of self-assembly of quasicrystalline structures through reinforcement learning

  • paper_url: http://arxiv.org/abs/2309.06869
  • repo_url: None
  • paper_authors: Uyen Tu Lieu, Natsuhiko Yoshinaga
  • for: This work uses reinforcement learning to control the dynamical self-assembly of the dodecagonal quasicrystal (DDQC) from patchy particles.
  • methods: The Q-learning method is used to estimate a temperature-control policy, which is then applied to generate DDQC.
  • results: The temperature schedule obtained by reinforcement learning generates DDQC with few defects, more efficiently than a conventional pre-fixed schedule such as annealing; moreover, the learning autonomously discovers the critical temperature at which structural fluctuations enhance the chance of forming a globally stable state, and the estimated policy guides the system toward it to assist DDQC formation.
    Abstract We propose reinforcement learning to control the dynamical self-assembly of the dodecagonal quasicrystal (DDQC) from patchy particles. The patchy particles have anisotropic interactions with other particles and form DDQC. However, their structures at steady states are significantly influenced by the kinetic pathways of their structural formation. We estimate the best policy of temperature control trained by the Q-learning method and demonstrate that we can generate DDQC with few defects using the estimated policy. The temperature schedule obtained by reinforcement learning can reproduce the desired structure more efficiently than the conventional pre-fixed temperature schedule, such as annealing. To clarify the success of the learning, we also analyse a simple model describing the kinetics of structural changes through the motion in a triple-well potential. We have found that reinforcement learning autonomously discovers the critical temperature at which structural fluctuations enhance the chance of forming a globally stable state. The estimated policy guides the system toward the critical temperature to assist the formation of DDQC.
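
The temperature-policy learning described above can be caricatured with tabular Q-learning on an invented one-dimensional "order parameter" environment (heat/cool actions, reward at the target structure); this is not the patchy-particle simulation:

```python
import numpy as np

N_STATES, GOAL = 10, 9     # discretized structural order parameter
ACTIONS = (-1, +1)         # index 0: cool, index 1: heat

def step(s, a):
    """Deterministic toy kinetics: the chosen temperature move shifts the
    order parameter one bin; reaching GOAL means the target structure."""
    s2 = min(max(s + ACTIONS[a], 0), N_STATES - 1)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

def train(episodes=1000, alpha=0.5, gamma=0.95, eps=0.3, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((N_STATES, len(ACTIONS)))
    for _ in range(episodes):
        s = int(rng.integers(N_STATES))    # random initial configuration
        for _ in range(50):
            if rng.random() < eps:         # epsilon-greedy exploration
                a = int(rng.integers(2))
            else:
                a = int(np.argmax(Q[s]))
            s2, r, done = step(s, a)
            Q[s, a] += alpha * (r + gamma * Q[s2].max() * (not done) - Q[s, a])
            s = s2
            if done:
                break
    return Q

Q = train()
```

The learned greedy policy drives the state straight to the target, the tabular analogue of the temperature schedule the paper's agent discovers.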

Supervised Machine Learning and Physics based Machine Learning approach for prediction of peak temperature distribution in Additive Friction Stir Deposition of Aluminium Alloy

  • paper_url: http://arxiv.org/abs/2309.06838
  • repo_url: None
  • paper_authors: Akshansh Mishra
  • for: This paper aims to improve the understanding of the relationship between process parameters, thermal profiles, and microstructure in Additive Friction Stir Deposition (AFSD) for solid-state additive manufacturing.
  • methods: The paper combines supervised machine learning (SML) and physics-informed neural networks (PINNs) to predict peak temperature distribution in AFSD from process parameters.
  • results: The integrated ML approach classifies deposition quality from process factors with robust accuracy, providing comprehensive insights into tailoring microstructure through thermal management in AFSD.
    Abstract Additive friction stir deposition (AFSD) is a novel solid-state additive manufacturing technique that circumvents issues of porosity, cracking, and properties anisotropy that plague traditional powder bed fusion and directed energy deposition approaches. However, correlations between process parameters, thermal profiles, and resulting microstructure in AFSD remain poorly understood. This hinders process optimization for properties. This work employs a cutting-edge framework combining supervised machine learning (SML) and physics-informed neural networks (PINNs) to predict peak temperature distribution in AFSD from process parameters. Eight regression algorithms were implemented for SML modeling, while four PINNs leveraged governing equations for transport, wave propagation, heat transfer, and quantum mechanics. Across multiple statistical measures, ensemble techniques like gradient boosting proved superior for SML, with lowest MSE of 165.78. The integrated ML approach was also applied to classify deposition quality from process factors, with logistic regression delivering robust accuracy. By fusing data-driven learning and fundamental physics, this dual methodology provides comprehensive insights into tailoring microstructure through thermal management in AFSD. The work demonstrates the power of bridging statistical and physics-based modeling for elucidating AM process-property relationships.
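
The study found gradient boosting the best-performing SML regressor for peak temperature. A minimal scikit-learn sketch on fabricated process-parameter data follows; the feature set and response below are invented, not AFSD measurements:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400
# Fabricated process parameters: spindle speed, feed rate, layer height.
X = np.column_stack([rng.uniform(200, 400, n),
                     rng.uniform(1, 5, n),
                     rng.uniform(0.5, 2.0, n)])
# Fabricated peak-temperature response with mild nonlinearity plus noise.
y = 0.8 * X[:, 0] - 30 * X[:, 1] + 40 * X[:, 2] ** 2 + rng.normal(0, 5, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(n_estimators=300, max_depth=3,
                                  learning_rate=0.05, random_state=0)
model.fit(X_tr, y_tr)
mse = mean_squared_error(y_te, model.predict(X_te))
```

The paper's PINN branch instead constrains predictions with governing transport and heat-transfer equations, which purely data-driven boosting does not do.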

Safe Reinforcement Learning with Dual Robustness

  • paper_url: http://arxiv.org/abs/2309.06835
  • repo_url: None
  • paper_authors: Zeyang Li, Chuxiong Hu, Yunan Wang, Yujie Yang, Shengbo Eben Li
  • for: This work unifies safe RL and robust RL in a systematic framework so that task performance and safety can be guaranteed simultaneously, even in worst-case scenarios.
  • methods: The problem is formulated as a constrained two-player zero-sum Markov game; a dual policy iteration scheme is proposed that simultaneously optimizes a task policy and a safety policy, and the convergence of this iteration scheme is proved.
  • results: Experiments on safety-critical benchmarks show that the DRAC algorithm achieves high performance and persistent safety in all scenarios (no adversary, safety adversary, performance adversary), significantly outperforming all baselines.
    Abstract Reinforcement learning (RL) agents are vulnerable to adversarial disturbances, which can deteriorate task performance or compromise safety specifications. Existing methods either address safety requirements under the assumption of no adversary (e.g., safe RL) or only focus on robustness against performance adversaries (e.g., robust RL). Learning one policy that is both safe and robust remains a challenging open problem. The difficulty is how to tackle two intertwined aspects in the worst cases: feasibility and optimality. Optimality is only valid inside a feasible region, while identification of maximal feasible region must rely on learning the optimal policy. To address this issue, we propose a systematic framework to unify safe RL and robust RL, including problem formulation, iteration scheme, convergence analysis and practical algorithm design. This unification is built upon constrained two-player zero-sum Markov games. A dual policy iteration scheme is proposed, which simultaneously optimizes a task policy and a safety policy. The convergence of this iteration scheme is proved. Furthermore, we design a deep RL algorithm for practical implementation, called dually robust actor-critic (DRAC). The evaluations with safety-critical benchmarks demonstrate that DRAC achieves high performance and persistent safety under all scenarios (no adversary, safety adversary, performance adversary), outperforming all baselines significantly.
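
At the heart of the constrained two-player zero-sum formulation is a per-state minimax problem. As a simplified stand-in for the paper's dual policy iteration, here is how the minimax value of a single-state (matrix) zero-sum game is obtained by linear programming; applying this state-by-state inside value iteration gives the classic minimax value iteration for Markov games:

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(A):
    """Value and maximizer strategy of a zero-sum matrix game via LP:
    max v  subject to  x^T A >= v componentwise, x a probability vector."""
    m, n = A.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                                # minimize -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])   # v - x^T A e_j <= 0
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n),
                  A_eq=np.hstack([np.ones((1, m)), [[0.0]]]), b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return res.x[-1], res.x[:-1]

# Rock-paper-scissors: value 0, uniform optimal strategy.
v, x = matrix_game_value(np.array([[0.0, -1.0, 1.0],
                                   [1.0, 0.0, -1.0],
                                   [-1.0, 1.0, 0.0]]))
```

The paper's setting additionally carries a safety constraint and function approximation, which this single-matrix sketch omits.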

Learning From Drift: Federated Learning on Non-IID Data via Drift Regularization

  • paper_url: http://arxiv.org/abs/2309.07189
  • repo_url: None
  • paper_authors: Yeachan Kim, Bonggun Shin
  • for: Improving the performance of federated learning algorithms in heterogeneous (Non-IID) environments.
  • methods: Regularizing the classifier's outputs to prevent performance degradation on Non-IID data; specifically, the proposed Learning from Drift (LfD) method encapsulates two key components: drift estimation and drift regularization.
  • results: LfD is evaluated through the lens of five aspects of federated learning (Generalization, Heterogeneity, Scalability, Forgetting, and Efficiency), and comprehensive evaluation results clearly support its superiority in federated learning with Non-IID data.
    Abstract Federated learning algorithms perform reasonably well on independent and identically distributed (IID) data. They, on the other hand, suffer greatly from heterogeneous environments, i.e., Non-IID data. Despite the fact that many research projects have been done to address this issue, recent findings indicate that they are still sub-optimal when compared to training on IID data. In this work, we carefully analyze the existing methods in heterogeneous environments. Interestingly, we find that regularizing the classifier's outputs is quite effective in preventing performance degradation on Non-IID data. Motivated by this, we propose Learning from Drift (LfD), a novel method for effectively training the model in heterogeneous settings. Our scheme encapsulates two key components: drift estimation and drift regularization. Specifically, LfD first estimates how different the local model is from the global model (i.e., drift). The local model is then regularized such that it does not fall in the direction of the estimated drift. In the experiment, we evaluate each method through the lens of the five aspects of federated learning, i.e., Generalization, Heterogeneity, Scalability, Forgetting, and Efficiency. Comprehensive evaluation results clearly support the superiority of LfD in federated learning with Non-IID data.
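
The drift estimation / drift regularization pair can be sketched as a penalty that discourages a client's local update from moving in the direction of its estimated drift from the global model. The quadratic penalty and the toy linear model below are stand-ins for LfD's actual formulation:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 20
w_global = rng.normal(size=d)          # broadcast global model
X = rng.normal(size=(64, d))           # one client's (non-IID) data
y = X @ rng.normal(size=d)             # client-specific targets

def local_update(w, lam, steps=100, lr=0.01):
    """Gradient descent on the local loss plus a drift penalty:
    loss(w) + lam/2 * ||w - w_global||^2  (lam = 0: plain local training)."""
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(X)   # local squared-error gradient
        drift = w - w_global                # estimated drift from global model
        w = w - lr * (grad + lam * drift)
    return w

w_plain = local_update(w_global.copy(), lam=0.0)
w_reg = local_update(w_global.copy(), lam=1.0)
```

With the penalty active, the client's model stays closer to the global model, which is the effect that mitigates client drift when the server aggregates the updates.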

Electricity Demand Forecasting through Natural Language Processing with Long Short-Term Memory Networks

  • paper_url: http://arxiv.org/abs/2309.06793
  • repo_url: None
  • paper_authors: Yun Bai, Simon Camal, Andrea Michiorri
  • for: Improving forecasts of the UK national electricity demand by incorporating textual news features.
  • methods: A Long Short-Term Memory (LSTM) network combines textual news features with historical loads, weather forecasts, calendar information, and known major events.
  • results: Public sentiment and word-vector representations related to transport and geopolitics show time-continuity effects on electricity demand; the LSTM with textual features improves on the pure LSTM benchmark by more than 3% and on the official benchmark by close to 10%, and the proposed model also reduces forecasting uncertainty by narrowing the confidence interval and bringing the forecast distribution closer to the truth.
    Abstract Electricity demand forecasting is a well established research field. Usually this task is performed considering historical loads, weather forecasts, calendar information and known major events. Recently attention has been given on the possible use of new sources of information from textual news in order to improve the performance of these predictions. This paper proposes a Long and Short-Term Memory (LSTM) network incorporating textual news features that successfully predicts the deterministic and probabilistic tasks of the UK national electricity demand. The study finds that public sentiment and word vector representations related to transport and geopolitics have time-continuity effects on electricity demand. The experimental results show that the LSTM with textual features improves by more than 3% compared to the pure LSTM benchmark and by close to 10% over the official benchmark. Furthermore, the proposed model effectively reduces forecasting uncertainty by narrowing the confidence interval and bringing the forecast distribution closer to the truth.

Scalable neural network models and terascale datasets for particle-flow reconstruction

  • paper_url: http://arxiv.org/abs/2309.06782
  • repo_url: None
  • paper_authors: Joosep Pata, Eric Wulff, Farouk Mokhtar, David Southwick, Mengke Zhang, Maria Girone, Javier Duarte
  • for: This paper studies full event reconstruction in high-energy electron-positron collisions based on a highly granular detector simulation.
  • methods: Particle-flow (PF) reconstruction is formulated as a supervised learning task; a graph neural network and a kernel-based transformer are compared, and both avoid quadratic memory allocation and computational cost while achieving realistic PF reconstruction.
  • results: Hyperparameter tuning on a supercomputer significantly improves physics performance, and the model is highly portable across hardware processors, supporting Nvidia, AMD, and Intel Habana cards. The model can also be trained on highly granular inputs consisting of tracks and calorimeter hits, reaching physics performance competitive with the baseline.
    Abstract We study scalable machine learning models for full event reconstruction in high-energy electron-positron collisions based on a highly granular detector simulation. Particle-flow (PF) reconstruction can be formulated as a supervised learning task using tracks and calorimeter clusters or hits. We compare a graph neural network and kernel-based transformer and demonstrate that both avoid quadratic memory allocation and computational cost while achieving realistic PF reconstruction. We show that hyperparameter tuning on a supercomputer significantly improves the physics performance of the models. We also demonstrate that the resulting model is highly portable across hardware processors, supporting Nvidia, AMD, and Intel Habana cards. Finally, we demonstrate that the model can be trained on highly granular inputs consisting of tracks and calorimeter hits, resulting in a competitive physics performance with the baseline. Datasets and software to reproduce the studies are published following the findable, accessible, interoperable, and reusable (FAIR) principles.

MCNS: Mining Causal Natural Structures Inside Time Series via A Novel Internal Causality Scheme

  • paper_url: http://arxiv.org/abs/2309.06739
  • repo_url: None
  • paper_authors: Yuanhao Liu, Dehui Du, Zihan Jiang, Anyan Huang, Yiyang Li
  • for: This work explores the internal causality of time series to improve the accuracy and interpretability of neural networks (NNs).
  • methods: A novel framework called Mining Causal Natural Structure (MCNS), which is automatic and domain-agnostic, discovers the causal natural structures inside time series via an internal causality scheme and applies them to NNs.
  • results: Experiments on time series classification show that impregnating NNs with MCNS, by refining attention, shape-selection classification, and pruning datasets, improves accuracy and interpretability; MCNS also provides an in-depth, solid summary of the time series and datasets.
    Abstract Causal inference permits us to discover covert relationships of various variables in time series. However, in most existing works, the variables mentioned above are the dimensions. The causality between dimensions could be cursory, which hinders the comprehension of the internal relationship and the benefit of the causal graph to the neural networks (NNs). In this paper, we find that causality exists not only outside but also inside the time series because it reflects a succession of events in the real world. It inspires us to seek the relationship between internal subsequences. However, the challenges are the hardship of discovering causality from subsequences and utilizing the causal natural structures to improve NNs. To address these challenges, we propose a novel framework called Mining Causal Natural Structure (MCNS), which is automatic and domain-agnostic and helps to find the causal natural structures inside time series via the internal causality scheme. We evaluate the MCNS framework and impregnation NN with MCNS on time series classification tasks. Experimental results illustrate that our impregnation, by refining attention, shape selection classification, and pruning datasets, drives NN, even the data itself preferable accuracy and interpretability. Besides, MCNS provides an in-depth, solid summary of the time series and datasets.

Bias Amplification Enhances Minority Group Performance

  • paper_url: http://arxiv.org/abs/2309.06717
  • repo_url: None
  • paper_authors: Gaotang Li, Jiarui Liu, Wei Hu
  • for: Improving model accuracy on rare subgroups, which suffers even when average accuracy is high, due to correlations between spurious features and labels.
  • methods: A two-stage training algorithm, BAM: in the first stage, the model is trained with a bias-amplification scheme that introduces a learnable auxiliary variable for each training sample; in the second stage, the samples that the bias-amplified model misclassifies are upweighted, and training continues on the reweighted dataset.
  • results: BAM achieves performance competitive with existing methods on spurious-correlation benchmarks in computer vision and natural language processing, and a simple stopping criterion based on the minimum class accuracy difference removes the need for group annotations with little or no loss in worst-group accuracy.
    Abstract Neural networks produced by standard training are known to suffer from poor accuracy on rare subgroups despite achieving high accuracy on average, due to the correlations between certain spurious features and labels. Previous approaches based on worst-group loss minimization (e.g. Group-DRO) are effective in improving worse-group accuracy but require expensive group annotations for all the training samples. In this paper, we focus on the more challenging and realistic setting where group annotations are only available on a small validation set or are not available at all. We propose BAM, a novel two-stage training algorithm: in the first stage, the model is trained using a bias amplification scheme via introducing a learnable auxiliary variable for each training sample; in the second stage, we upweight the samples that the bias-amplified model misclassifies, and then continue training the same model on the reweighted dataset. Empirically, BAM achieves competitive performance compared with existing methods evaluated on spurious correlation benchmarks in computer vision and natural language processing. Moreover, we find a simple stopping criterion based on minimum class accuracy difference that can remove the need for group annotations, with little or no loss in worst-group accuracy. We perform extensive analyses and ablations to verify the effectiveness and robustness of our algorithm in varying class and group imbalance ratios.
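The second-stage reweighting in BAM can be sketched in a few lines; the upweight factor and the stand-in stage-1 predictions below are illustrative assumptions, not the paper's values:

```python
import numpy as np

def bam_reweight(labels, stage1_preds, upweight=5.0):
    """Stage-2 sample weights: samples the bias-amplified (stage-1) model
    misclassifies get `upweight`, the rest keep weight 1.
    The value of `upweight` is an illustrative choice."""
    mis = np.asarray(stage1_preds) != np.asarray(labels)
    return np.where(mis, upweight, 1.0)

y     = np.array([0, 1, 1, 0, 1])
preds = np.array([0, 0, 1, 0, 0])   # hypothetical stage-1 predictions
w = bam_reweight(y, preds)
assert w.tolist() == [1.0, 5.0, 1.0, 1.0, 5.0]
```

Training then continues on the same model with these per-sample weights, which pushes it to fit the (presumed minority-group) samples the amplified model got wrong.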

Crystal structure prediction using neural network potential and age-fitness Pareto genetic algorithm

  • paper_url: http://arxiv.org/abs/2309.06710
  • repo_url: https://github.com/sadmanomee/ParetoCSP
  • paper_authors: Sadman Sadeed Omee, Lai Wei, Jianjun Hu
  • for: Solving the crystal structure prediction (CSP) problem.
  • methods: A multi-objective genetic algorithm (MOGA) is combined with a neural network inter-atomic potential (IAP) model to find energetically optimal crystal structures for given chemical compositions. The NSGA-III algorithm is enhanced by incorporating the genotypic age as an independent optimization criterion, and the M3GNet universal IAP guides the GA search.
  • results: Compared with GN-OA, a state-of-the-art neural-potential-based CSP algorithm, ParetoCSP performs significantly better, outperforming it by a factor of $2.562$ across $55$ diverse benchmark structures under seven performance metrics. Trajectory analysis shows that ParetoCSP generates more valid structures than the other algorithms, which helps the GA search more effectively for optimal structures.
    Abstract While crystal structure prediction (CSP) remains a longstanding challenge, we introduce ParetoCSP, a novel algorithm for CSP, which combines a multi-objective genetic algorithm (MOGA) with a neural network inter-atomic potential (IAP) model to find energetically optimal crystal structures given chemical compositions. We enhance the NSGA-III algorithm by incorporating the genotypic age as an independent optimization criterion and employ the M3GNet universal IAP to guide the GA search. Compared to GN-OA, a state-of-the-art neural potential based CSP algorithm, ParetoCSP demonstrated significantly better predictive capabilities, outperforming by a factor of $2.562$ across $55$ diverse benchmark structures, as evaluated by seven performance metrics. Trajectory analysis of the traversed structures of all algorithms shows that ParetoCSP generated more valid structures than other algorithms, which helped guide the GA to search more effectively for the optimal structures
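The role of genotypic age as an independent Pareto objective can be illustrated with a plain dominance check (the objective values below are made up for illustration):

```python
def dominates(a, b):
    """Return True if objective vector `a` Pareto-dominates `b`
    (all objectives to be minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

# Treating (energy, genotypic age) as a joint minimization problem, a young
# structure with slightly higher energy survives alongside an older,
# lower-energy one, because neither dominates the other. This keeps fresh
# genetic material in the population instead of discarding it on energy alone.
old_low_energy = (-5.2, 40)   # hypothetical (energy per atom, age in generations)
young_higher   = (-5.0, 3)
assert not dominates(old_low_energy, young_higher)
assert not dominates(young_higher, old_low_energy)
```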

Predicting Fatigue Crack Growth via Path Slicing and Re-Weighting

  • paper_url: http://arxiv.org/abs/2309.06708
  • repo_url: https://github.com/zhaoyj21/fcg
  • paper_authors: Yingjie Zhao, Yong Liu, Zhiping Xu
  • for: Predicting the fatigue risks of key structural components, i.e., the likelihood of fatigue failure, is crucial in engineering design.
  • methods: A statistical learning framework predicts the growth of fatigue cracks and the life-to-failure of components under loading conditions with uncertainties. Digital libraries of fatigue crack patterns and remaining life are constructed by high-fidelity physical simulations; dimensionality reduction and neural network architectures learn the history dependence and nonlinearity of fatigue crack growth, while path-slicing and re-weighting techniques handle statistical noise and rare events.
  • results: The method accurately predicts fatigue crack growth and component life. Representative examples with fatigue cracks in plates validate the approach and showcase a digital-twin scenario for real-time structural health monitoring and fatigue life prediction.
    Abstract Predicting potential risks associated with the fatigue of key structural components is crucial in engineering design. However, fatigue often involves entangled complexities of material microstructures and service conditions, making diagnosis and prognosis of fatigue damage challenging. We report a statistical learning framework to predict the growth of fatigue cracks and the life-to-failure of the components under loading conditions with uncertainties. Digital libraries of fatigue crack patterns and the remaining life are constructed by high-fidelity physical simulations. Dimensionality reduction and neural network architectures are then used to learn the history dependence and nonlinearity of fatigue crack growth. Path-slicing and re-weighting techniques are introduced to handle the statistical noises and rare events. The predicted fatigue crack patterns are self-updated and self-corrected by the evolving crack patterns. The end-to-end approach is validated by representative examples with fatigue cracks in plates, which showcase the digital-twin scenario in real-time structural health monitoring and fatigue life prediction for maintenance management decision-making.

Federated PAC-Bayesian Learning on Non-IID data

  • paper_url: http://arxiv.org/abs/2309.06683
  • repo_url: None
  • paper_authors: Zihao Zhao, Yang Liu, Wenbo Ding, Xiao-Ping Zhang
  • for: This work provides Probably Approximately Correct (PAC) Bayesian bounds for federated learning (FL) on non-independent and identically distributed (non-IID) data.
  • methods: The bound assumes unique prior knowledge for each client and variable aggregation weights; a new objective function and an innovative Gibbs-based algorithm are introduced to optimize the derived bound.
  • results: The results are validated on real-world datasets.
    Abstract Existing research has either adapted the Probably Approximately Correct (PAC) Bayesian framework for federated learning (FL) or used information-theoretic PAC-Bayesian bounds while introducing their theorems, but few considering the non-IID challenges in FL. Our work presents the first non-vacuous federated PAC-Bayesian bound tailored for non-IID local data. This bound assumes unique prior knowledge for each client and variable aggregation weights. We also introduce an objective function and an innovative Gibbs-based algorithm for the optimization of the derived bound. The results are validated on real-world datasets.

Generalizable improvement of the Spalart-Allmaras model through assimilation of experimental data

  • paper_url: http://arxiv.org/abs/2309.06679
  • repo_url: None
  • paper_authors: Deepinder Jot Singh Aulakh, Romit Maulik
  • for: This study uses model and data fusion to improve the Spalart-Allmaras (SA) closure model for Reynolds-averaged Navier-Stokes (RANS) solutions, particularly for separated flows.
  • methods: Data assimilation, namely the Ensemble Kalman Filtering (EnKF) approach, calibrates the coefficients of the SA model for separated flows via a holistic parameterization of the production, diffusion, and destruction terms.
  • results: Although calibrated on experimental data from a single flow condition around a backward-facing step (BFS), the recalibrated SA model generalizes to other separated flows, including the 2D bump and a modified BFS, with significant improvement in the skin friction coefficient ($C_f$) and pressure coefficient ($C_p$). The model also retains SA proficiency for external, unseparated flows such as flow around a NACA-0012 airfoil, and the individually calibrated terms target specific flow physics: the calibrated production term improves the recirculation zone while the destruction term improves the recovery zone.
    Abstract This study focuses on the use of model and data fusion for improving the Spalart-Allmaras (SA) closure model for Reynolds-averaged Navier-Stokes solutions of separated flows. In particular, our goal is to develop of models that not-only assimilate sparse experimental data to improve performance in computational models, but also generalize to unseen cases by recovering classical SA behavior. We achieve our goals using data assimilation, namely the Ensemble Kalman Filtering approach (EnKF), to calibrate the coefficients of the SA model for separated flows. A holistic calibration strategy is implemented via a parameterization of the production, diffusion, and destruction terms. This calibration relies on the assimilation of experimental data collected velocity profiles, skin friction, and pressure coefficients for separated flows. Despite using of observational data from a single flow condition around a backward-facing step (BFS), the recalibrated SA model demonstrates generalization to other separated flows, including cases such as the 2D-bump and modified BFS. Significant improvement is observed in the quantities of interest, i.e., skin friction coefficient ($C_f$) and pressure coefficient ($C_p$) for each flow tested. Finally, it is also demonstrated that the newly proposed model recovers SA proficiency for external, unseparated flows, such as flow around a NACA-0012 airfoil without any danger of extrapolation, and that the individually calibrated terms in the SA model are targeted towards specific flow-physics wherein the calibrated production term improves the re-circulation zone while destruction improves the recovery zone.
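A minimal stochastic EnKF analysis step, of the kind used here to pull model coefficients toward assimilated measurements, can be sketched as follows; the toy linear forward model stands in for the actual RANS solver, and all numbers are illustrative:

```python
import numpy as np

def enkf_update(ensemble, obs, obs_err_std, forward):
    """One stochastic EnKF analysis step.
    ensemble: (n_members, n_params) array of coefficient samples;
    forward:  maps a parameter vector to predicted observables."""
    rng = np.random.default_rng(0)
    preds = np.array([forward(m) for m in ensemble])        # (n, n_obs)
    dx = ensemble - ensemble.mean(axis=0)
    dy = preds - preds.mean(axis=0)
    n = len(ensemble)
    c_xy = dx.T @ dy / (n - 1)                              # param-obs covariance
    c_yy = dy.T @ dy / (n - 1) + obs_err_std**2 * np.eye(len(obs))
    gain = c_xy @ np.linalg.inv(c_yy)                       # Kalman gain
    perturbed = obs + rng.normal(0, obs_err_std, size=(n, len(obs)))
    return ensemble + (perturbed - preds) @ gain.T

# Toy forward model: observation = 2 * coefficient, so obs = 2.0 implies
# a true coefficient of 1.0; the prior ensemble is centered at 0.5.
ens = np.random.default_rng(1).normal(0.5, 0.3, size=(200, 1))
post = enkf_update(ens, np.array([2.0]), 0.05, lambda m: 2 * m)
assert abs(post.mean() - 1.0) < 0.15   # posterior pulled toward 1.0
```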

Multi-step prediction of chlorophyll concentration based on Adaptive Graph-Temporal Convolutional Network with Series Decomposition

  • paper_url: http://arxiv.org/abs/2309.07187
  • repo_url: None
  • paper_authors: Ying Chen, Xiao Li, Hongbo Zhang, Wenyang Song, Chongxuan Xv
  • for: This study aims to predict trends in chlorophyll concentration in water bodies, providing a scientific basis for environmental protection and aquaculture.
  • methods: A series-decomposition adaptive graph-temporal convolutional network (AGTCNSD) prediction model is proposed: the original sequence is decomposed into trend and periodic components by the moving average method; water quality parameters are modeled with a graph convolutional neural network using a parameter embedding matrix, with matrix decomposition assigning weight parameters to each node; temporal convolution then captures time dependence for multi-step prediction of chlorophyll concentration.
  • results: Validated on water quality data from the coastal city of Beihai, the model predicts chlorophyll concentration better than other methods and can serve as a scientific resource for environmental management decision-making.
    Abstract Chlorophyll concentration can well reflect the nutritional status and algal blooms of water bodies, and is an important indicator for evaluating water quality. The prediction of chlorophyll concentration change trend is of great significance to environmental protection and aquaculture. However, there is a complex and indistinguishable nonlinear relationship between many factors affecting chlorophyll concentration. In order to effectively mine the nonlinear features contained in the data. This paper proposes a time-series decomposition adaptive graph-time convolutional network ( AGTCNSD ) prediction model. Firstly, the original sequence is decomposed into trend component and periodic component by moving average method. Secondly, based on the graph convolutional neural network, the water quality parameter data is modeled, and a parameter embedding matrix is defined. The idea of matrix decomposition is used to assign weight parameters to each node. The adaptive graph convolution learns the relationship between different water quality parameters, updates the state information of each parameter, and improves the learning ability of the update relationship between nodes. Finally, time dependence is captured by time convolution to achieve multi-step prediction of chlorophyll concentration. The validity of the model is verified by the water quality data of the coastal city Beihai. The results show that the prediction effect of this method is better than other methods. It can be used as a scientific resource for environmental management decision-making.
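The first step, moving-average decomposition into trend and periodic components, can be sketched as follows (the window length and padding choices here are illustrative, not the paper's):

```python
import numpy as np

def decompose(series, window):
    """Split a series into a moving-average trend and the residual
    periodic component: series = trend + periodic."""
    kernel = np.ones(window) / window
    pad = window // 2
    padded = np.pad(series, pad, mode="edge")   # edge-pad to keep length
    trend = np.convolve(padded, kernel, mode="valid")[:len(series)]
    return trend, series - trend

t = np.arange(120, dtype=float)
series = 0.05 * t + np.sin(2 * np.pi * t / 12)   # linear trend + seasonal cycle
trend, periodic = decompose(series, 12)
assert np.allclose(trend + periodic, series)      # exact reconstruction
```

Averaging over one full seasonal period removes the cycle from the trend component, leaving it in the residual, which is why the window is matched to the dominant period.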

Sound field decomposition based on two-stage neural networks

  • paper_url: http://arxiv.org/abs/2309.06661
  • repo_url: None
  • paper_authors: Ryo Matsuda, Makoto Otani
  • for: This study proposes a neural-network-based sound field decomposition method for sound source localization and sound field reconstruction.
  • methods: The method comprises two stages: a sound field separation stage and a single-source localization stage. In the first stage, the sound pressure at microphones synthesized by multiple sources is separated into the field excited by each source; in the second stage, the source location is obtained by regression from the sound pressure of a single source, so the estimated location is not affected by discretization.
  • results: Datasets are generated by simulation using Green's function, and the network is trained for each frequency. Numerical experiments show that, compared with conventional methods, the proposed method achieves higher source-localization accuracy and higher sound-field-reconstruction accuracy.
    Abstract A method for sound field decomposition based on neural networks is proposed. The method comprises two stages: a sound field separation stage and a single-source localization stage. In the first stage, the sound pressure at microphones synthesized by multiple sources is separated into one excited by each sound source. In the second stage, the source location is obtained as a regression from the sound pressure at microphones consisting of a single sound source. The estimated location is not affected by discretization because the second stage is designed as a regression rather than a classification. Datasets are generated by simulation using Green's function, and the neural network is trained for each frequency. Numerical experiments reveal that, compared with conventional methods, the proposed method can achieve higher source-localization accuracy and higher sound-field-reconstruction accuracy.

Dissipative Imitation Learning for Discrete Dynamic Output Feedback Control with Sparse Data Sets

  • paper_url: http://arxiv.org/abs/2309.06658
  • repo_url: None
  • paper_authors: Amy K. Strong, Ethan J. LoCicero, Leila J. Bridgeman
  • for: This paper addresses controller synthesis for complex objectives and highly uncertain plant models while providing stability guarantees.
  • methods: An input-output (IO) stability approach to dissipative imitation learning: a closed-loop stable dynamic output feedback controller is learned from expert data, a coarse IO plant model, and a new constraint enforcing dissipativity on the learned controller. Although the learning objective is nonconvex, iterative convex overbounding (ICO) and projected gradient descent (PGD) are explored to successfully learn the controller.
  • results: Applied to two unknown plants and compared with a traditionally learned dynamic output feedback controller and a neural network controller, the dissipativity-constrained controller achieves closed-loop stability and successfully mimics the expert controller despite little knowledge of the plant model and a small dataset, whereas the other methods often fail to maintain stability and achieve good performance.
    Abstract Imitation learning enables the synthesis of controllers for complex objectives and highly uncertain plant models. However, methods to provide stability guarantees to imitation learned controllers often rely on large amounts of data and/or known plant models. In this paper, we explore an input-output (IO) stability approach to dissipative imitation learning, which achieves stability with sparse data sets and with little known about the plant model. A closed-loop stable dynamic output feedback controller is learned using expert data, a coarse IO plant model, and a new constraint to enforce dissipativity on the learned controller. While the learning objective is nonconvex, iterative convex overbounding (ICO) and projected gradient descent (PGD) are explored as methods to successfully learn the controller. This new imitation learning method is applied to two unknown plants and compared to traditionally learned dynamic output feedback controller and neural network controller. With little knowledge of the plant model and a small data set, the dissipativity constrained learned controller achieves closed loop stability and successfully mimics the behavior of the expert controller, while other methods often fail to maintain stability and achieve good performance.

Out of Distribution Detection via Domain-Informed Gaussian Process State Space Models

  • paper_url: http://arxiv.org/abs/2309.06655
  • repo_url: None
  • paper_authors: Alonso Marco, Elias Morley, Claire J. Tomlin
  • for: This paper proposes a learning-based method that lets robots safely navigate unseen scenarios by detecting out-of-training-distribution (OoD) situations online.
  • methods: Gaussian process state-space models (GPSSMs) discriminate unexpected observations by comparing them against probabilistic predictions; domain knowledge, provided as a dataset collected in simulation or with a nominal model, is embedded in the kernel, and an OoD online runtime monitor based on receding-horizon predictions is introduced.
  • results: Numerical results show that the informed kernel yields better regression quality with smaller datasets than standard kernel choices, and the OoD monitor reliably classifies previously unseen terrains on a real quadruped navigating an indoor setting.
    Abstract In order for robots to safely navigate in unseen scenarios using learning-based methods, it is important to accurately detect out-of-training-distribution (OoD) situations online. Recently, Gaussian process state-space models (GPSSMs) have proven useful to discriminate unexpected observations by comparing them against probabilistic predictions. However, the capability for the model to correctly distinguish between in- and out-of-training distribution observations hinges on the accuracy of these predictions, primarily affected by the class of functions the GPSSM kernel can represent. In this paper, we propose (i) a novel approach to embed existing domain knowledge in the kernel and (ii) an OoD online runtime monitor, based on receding-horizon predictions. Domain knowledge is provided in the form of a dataset, collected either in simulation or by using a nominal model. Numerical results show that the informed kernel yields better regression quality with smaller datasets, as compared to standard kernel choices. We demonstrate the effectiveness of the OoD monitor on a real quadruped navigating an indoor setting, which reliably classifies previously unseen terrains.
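A minimal version of such a runtime monitor, flagging observations that fall outside the model's predictive band, might look like the sketch below; the k-sigma threshold is an illustrative choice, not the paper's criterion:

```python
import numpy as np

def ood_flags(pred_mean, pred_std, observations, k=3.0):
    """Flag observations falling outside the k-sigma predictive band of a
    probabilistic model (e.g., a GPSSM) as out-of-training-distribution."""
    z = np.abs(np.asarray(observations) - pred_mean) / pred_std
    return z > k

mean = np.array([0.0, 0.0, 0.0])    # hypothetical receding-horizon predictions
std  = np.array([0.1, 0.1, 0.1])
obs  = np.array([0.05, -0.2, 0.9])  # last step deviates strongly
assert ood_flags(mean, std, obs).tolist() == [False, False, True]
```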

ConR: Contrastive Regularizer for Deep Imbalanced Regression

  • paper_url: http://arxiv.org/abs/2309.06651
  • repo_url: https://github.com/borealisai/conr
  • paper_authors: Mahsa Keramati, Lili Meng, R. David Evans
  • for: This paper addresses imbalanced label distributions in deep learning, particularly for regression tasks where the label space is continuous.
  • methods: ConR, a contrastive regularizer, models global and local label similarities in feature space and prevents the features of minority samples from collapsing into their majority neighbours; incorrect proximities are penalized in proportion to label similarities, while correct ones are encouraged to model local similarities.
  • results: ConR significantly boosts the performance of state-of-the-art methods on three large-scale deep imbalanced regression benchmarks, is orthogonal to existing approaches, and extends smoothly to uni- and multi-dimensional label spaces.
    Abstract Imbalanced distributions are ubiquitous in real-world data. They create constraints on Deep Neural Networks to represent the minority labels and avoid bias towards majority labels. The extensive body of imbalanced approaches address categorical label spaces but fail to effectively extend to regression problems where the label space is continuous. Conversely, local and global correlations among continuous labels provide valuable insights towards effectively modelling relationships in feature space. In this work, we propose ConR, a contrastive regularizer that models global and local label similarities in feature space and prevents the features of minority samples from being collapsed into their majority neighbours. Serving the similarities of the predictions as an indicator of feature similarities, ConR discerns the dissagreements between the label space and feature space and imposes a penalty on these disagreements. ConR minds the continuous nature of label space with two main strategies in a contrastive manner: incorrect proximities are penalized proportionate to the label similarities and the correct ones are encouraged to model local similarities. ConR consolidates essential considerations into a generic, easy-to-integrate, and efficient method that effectively addresses deep imbalanced regression. Moreover, ConR is orthogonal to existing approaches and smoothly extends to uni- and multi-dimensional label spaces. Our comprehensive experiments show that ConR significantly boosts the performance of all the state-of-the-art methods on three large-scale deep imbalanced regression benchmarks. Our code is publicly available in https://github.com/BorealisAI/ConR.
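A toy version of the penalty idea, charging pairs that are close in feature space but distant in continuous label space, might look like the following; the Gaussian similarity and the product weighting are assumptions for illustration, not ConR's actual loss:

```python
import numpy as np

def conr_penalty(features, labels, sigma=1.0):
    """Toy ConR-style term: pairs that are close in feature space but far
    apart in (continuous) label space contribute a penalty proportional
    to their label distance."""
    penalty = 0.0
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            feat_sim = np.exp(-np.sum((features[i] - features[j]) ** 2) / sigma)
            label_dist = abs(labels[i] - labels[j])
            penalty += feat_sim * label_dist
    return penalty

f = np.array([[0.0, 0.0], [0.05, 0.0], [3.0, 3.0]])
# Collapsing a rare label (10.0) onto the features of a common label (1.0)
# costs far more than keeping the distant label at distant features.
assert conr_penalty(f, [1.0, 10.0, 1.2]) > conr_penalty(f, [1.0, 1.1, 10.0])
```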

eess.IV - 2023-09-13

Temporal compressive edge imaging enabled by a lensless diffuser camera

  • paper_url: http://arxiv.org/abs/2309.07198
  • repo_url: None
  • paper_authors: Ze Zheng, Baolei Liu, Jiaqi Song, Lei Ding, Xiaolan Zhong, David Mcgloin, Fan Wang
  • for: High-dimensional image acquisition and edge detection of moving objects.
  • methods: A lensless imaging system based on a diffuser (or encoding mask) is used, and a temporal compressive edge detection method directly recovers a time sequence of edge images of a moving object from a single-shot measurement, without further post-processing steps.
  • results: The approach yields higher image quality during edge detection than the conventional post-processing method, and can be further developed toward task-oriented intelligent lensless imaging systems.
    Abstract Lensless imagers based on diffusers or encoding masks enable high-dimensional imaging from a single shot measurement and have been applied in various applications. However, to further extract image information such as edge detection, conventional post-processing filtering operations are needed after the reconstruction of the original object images in the diffuser imaging systems. Here, we present the concept of a temporal compressive edge detection method based on a lensless diffuser camera, which can directly recover a time sequence of edge images of a moving object from a single-shot measurement, without further post-processing steps. Our approach provides higher image quality during edge detection, compared with the conventional post-processing method. We demonstrate the effectiveness of this approach by both numerical simulation and experiments. The proof-of-concept approach can be further developed with other image post-process operations or versatile computer vision assignments toward task-oriented intelligent lensless imaging systems.

Improving HEVC Encoding of Rendered Video Data Using True Motion Information

  • paper_url: http://arxiv.org/abs/2309.06945
  • repo_url: None
  • paper_authors: Christian Herglotz, David Müller, Andreas Weinlich, Frank Bauer, Michael Ortner, Marc Stamminger, André Kaup
  • for: Improving the encoding of computer-generated video sequences.
  • methods: Per-pixel true motion vectors generated during rendering are exploited, in addition to conventional motion estimation, via a motion vector mapping method with disocclusion handling to enhance rate-distortion performance.
  • results: Mean bitrate savings of 3.78% are achieved.
    Abstract This paper shows that motion vectors representing the true motion of an object in a scene can be exploited to improve the encoding process of computer generated video sequences. Therefore, a set of sequences is presented for which the true motion vectors of the corresponding objects were generated on a per-pixel basis during the rendering process. In addition to conventional motion estimation methods, it is proposed to exploit the computer generated motion vectors to enhance the ratedistortion performance. To this end, a motion vector mapping method including disocclusion handling is presented. It is shown that mean rate savings of 3.78% can be achieved.

Deep Learning-based Synthetic High-Resolution In-Depth Imaging Using an Attachable Dual-element Endoscopic Ultrasound Probe

  • paper_url: http://arxiv.org/abs/2309.06770
  • repo_url: None
  • paper_authors: Hah Min Lew, Jae Seong Kim, Moon Hwan Lee, Jaegeun Park, Sangyeon Youn, Hee Man Kim, Jihun Kim, Jae Youn Hwang
  • for: This paper aims to provide clinicians with appropriate hardware specifications for precise diagnosis by enhancing the resolution and penetration depth of endoscopic ultrasound (EUS) imaging.
  • methods: The proposed approach uses a novel deep learning-based high-resolution in-depth imaging probe that offers low- and high-frequency ultrasound image pairs. The probe is designed with customized low- and high-frequency ultrasound transducers and a special geared structure to enable the same image plane.
  • results: The proposed approach was evaluated with a wire phantom and a tissue-mimicking phantom, and 442 ultrasound image pairs were acquired from the tissue-mimicking phantom. The results demonstrate the feasibility of the approach for providing synthetic high-resolution in-depth images deep inside tissues, and a suitable deep-learning model was identified for the task.
    Abstract Endoscopic ultrasound (EUS) imaging has a trade-off between resolution and penetration depth. By considering the in-vivo characteristics of human organs, it is necessary to provide clinicians with appropriate hardware specifications for precise diagnosis. Recently, super-resolution (SR) ultrasound imaging studies, including the SR task in deep learning fields, have been reported for enhancing ultrasound images. However, most of those studies did not consider ultrasound imaging natures, but rather they were conventional SR techniques based on downsampling of ultrasound images. In this study, we propose a novel deep learning-based high-resolution in-depth imaging probe capable of offering low- and high-frequency ultrasound image pairs. We developed an attachable dual-element EUS probe with customized low- and high-frequency ultrasound transducers under small hardware constraints. We also designed a special geared structure to enable the same image plane. The proposed system was evaluated with a wire phantom and a tissue-mimicking phantom. After the evaluation, 442 ultrasound image pairs from the tissue-mimicking phantom were acquired. We then applied several deep learning models to obtain synthetic high-resolution in-depth images, thus demonstrating the feasibility of our approach for clinical unmet needs. Furthermore, we quantitatively and qualitatively analyzed the results to find a suitable deep-learning model for our task. The obtained results demonstrate that our proposed dual-element EUS probe with an image-to-image translation network has the potential to provide synthetic high-frequency ultrasound images deep inside tissues.

Improving Deep Learning-based Defect Detection on Window Frames with Image Processing Strategies

  • paper_url: http://arxiv.org/abs/2309.06731
  • repo_url: None
  • paper_authors: Jorge Vasquez, Hemant K. Sharma, Tomotake Furuhata, Kenji Shimada
  • for: The paper is written for researchers and manufacturers who are interested in using machine learning and computer vision techniques for defect detection in window frames, particularly in challenging environments like construction sites.
  • methods: The paper proposes a novel defect detection pipeline called InspectNet, which combines image enhancement and augmentation techniques with a pre-trained U-Net model for window frame defect detection and segmentation. The pipeline is designed to improve the accuracy of defect detection in challenging environments.
  • results: The paper presents the results of experiments conducted using a Spot Robot for window frame inspections, with 16 variations of the dataset constructed using different image augmentation settings. The results show that the proposed InspectNet pipeline outperformed other algorithms when image enhancement and augmentation techniques were applied, achieving an average Intersection over Union (IoU) value of 0.91 when using the best dataset.
    Abstract Detecting subtle defects in window frames, including dents and scratches, is vital for upholding product integrity and sustaining a positive brand perception. Conventional machine vision systems often struggle to identify these defects in challenging environments like construction sites. In contrast, modern vision systems leveraging machine and deep learning (DL) are emerging as potent tools, particularly for cosmetic inspections. However, the promise of DL is yet to be fully realized. A few manufacturers have established a clear strategy for AI integration in quality inspection, hindered mainly by issues like scarce clean datasets and environmental changes that compromise model accuracy. Addressing these challenges, our study presents an innovative approach that amplifies defect detection in DL models, even with constrained data resources. The paper proposes a new defect detection pipeline called InspectNet (IPT-enhanced UNET) that includes the best combination of image enhancement and augmentation techniques for pre-processing the dataset and a Unet model tuned for window frame defect detection and segmentation. Experiments were carried out using a Spot Robot performing window frame inspections. 16 variations of the dataset were constructed using different image augmentation settings. Results of the experiments revealed that, on average, across all proposed evaluation measures, Unet outperformed all other algorithms when IPT-enhanced augmentations were applied. In particular, when using the best dataset, the IPT-enhanced Unet achieved an average Intersection over Union (mIoU) of 0.91.
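The mean Intersection over Union (mIoU) metric reported above can be computed per mask as follows; a minimal sketch over binary segmentation masks (the example arrays are illustrative, not the paper's data):

```python
import numpy as np

def iou(pred, target):
    """IoU between two binary masks: |A ∩ B| / |A ∪ B|."""
    pred, target = pred.astype(bool), target.astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0  # both masks empty: define IoU as perfect
    return np.logical_and(pred, target).sum() / union

pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
print(iou(pred, target))  # intersection 2, union 4 -> 0.5
```

mIoU is then the mean of this quantity over images (and, in multi-class settings, over classes).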

eess.SP - 2023-09-13

Space-Time Adaptive Processing in Connected and Automated Vehicular Radar Platoons

  • paper_url: http://arxiv.org/abs/2309.07355
  • repo_url: None
  • paper_authors: Zahra Esmaeilbeig, Kumar Vijay Mishra, Mojtaba Soltanalian
  • for: Develop a holistic space-time adaptive processing (STAP) framework for connected and automated vehicle (CAV) radar systems, in which multiple vehicles transmitting FMCW waveforms function as a multistatic radar.
  • methods: Uses time division multiplexing (TDM) for transmitter scheduling over FMCW pulses; the TDM design is formulated as a quadratic assignment problem solved by power-method-like iterations, with the Hungarian algorithm applied for linear assignment in each iteration.
  • results: Numerical experiments confirm that the optimized TDM is successful in enhancing the target detection performance.
    Abstract In this study, we develop a holistic framework for space-time adaptive processing (STAP) in connected and automated vehicle (CAV) radar systems. We investigate a CAV system consisting of multiple vehicles that transmit frequency-modulated continuous-waveforms (FMCW), thereby functioning as a multistatic radar. Direct application of STAP in a network of radar systems such as in a CAV may lead to excess interference. We exploit time division multiplexing (TDM) to perform transmitter scheduling over FMCW pulses to achieve high detection performance. The TDM design problem is formulated as a quadratic assignment problem which is tackled by power method-like iterations and applying the Hungarian algorithm for linear assignment in each iteration. Numerical experiments confirm that the optimized TDM is successful in enhancing the target detection performance.
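The linear-assignment subproblem solved in each power-method-like iteration can be handled with an assignment solver such as the Hungarian algorithm; a minimal sketch using SciPy's `linear_sum_assignment` (the cost matrix is an illustrative stand-in for the paper's quadratic-assignment iterates):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Illustrative cost matrix: cost[i, j] = cost of assigning transmitter i
# to pulse slot j (values are made up for demonstration).
cost = np.array([
    [4.0, 1.0, 3.0],
    [2.0, 0.0, 5.0],
    [3.0, 2.0, 2.0],
])

# Linear assignment: find the one-to-one matching minimizing total cost.
rows, cols = linear_sum_assignment(cost)
total_cost = cost[rows, cols].sum()

print(list(cols))   # slot assigned to each transmitter
print(total_cost)
```

In the paper's scheme, this step is repeated inside the outer iterations, with the cost matrix updated from the current quadratic-assignment linearization.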

  • paper_url: http://arxiv.org/abs/2309.07299
  • repo_url: None
  • paper_authors: Alexander Vavoulas, Nicholas Vaiopoulos, Harilaos G. Sandalidis, Konstantinos K. Delibasis
  • for: Study how the random user location within an elliptical coverage area affects the performance of a wireless communication link.
  • methods: Considers practical scenarios that produce elliptical footprints, such as a directional antenna with unequal azimuth and elevation half-power beamwidths in high-speed railway networks, and an airborne Tx (assumed to be a UAV) radiating at a tilt angle.
  • results: Derives relevant distance metrics and analyzes the outage probability of the link for the two scenarios, accounting for both the random terminal location and fading impairments.
    Abstract Wireless transmitters (Txs) radiating directionally downwards often generate circular footprints on the ground. In certain scenarios, using elliptical cells can offer increased flexibility for providing user coverage, owing to the unique network characteristics. For instance, an elliptical footprint can be produced when a practical directional antenna with unequal azimuth and elevation half-power beamwidths is used in high-speed railway networks. Another common scenario involves the production of an elliptical footprint when an airborne Tx radiates at an angle by tilting its directional antenna by a few degrees. This paper aims to investigate, for the first time, the association between the random user location within an elliptical coverage area and the performance of a wireless communication link by considering these scenarios. We assume an unmanned aerial vehicle (UAV) as a Tx, although a tall cellular base station tower could also be employed without losing generality. To better understand the impact of random location, we derive relevant distance metrics and investigate the outage probability of the link for the two scenarios, taking both random terminal location and fading impairments into account. The findings may provide valuable insights into the performance of similar wireless systems.
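The distance statistics such a model builds on can be explored numerically by sampling a user uniformly inside an elliptical footprint; a hedged sketch (the semi-axes and the Monte Carlo approach are illustrative assumptions, not the paper's closed-form derivation):

```python
import numpy as np

rng = np.random.default_rng(2)
a, b = 200.0, 100.0  # ellipse semi-axes in meters (illustrative)

# Uniform sampling inside an ellipse: sample the unit disk uniformly
# (sqrt for uniform area density), then scale the axes.
n = 100_000
r = np.sqrt(rng.uniform(size=n))
phi = rng.uniform(0, 2 * np.pi, n)
x, y = a * r * np.cos(phi), b * r * np.sin(phi)

d = np.hypot(x, y)        # distance from the ellipse center to the user
print(d.mean(), d.max())  # empirical mean and worst-case distance
```

The linear scaling of the unit disk keeps the density uniform (constant Jacobian), so the empirical distribution of `d` approximates the center-to-user distance statistics inside the elliptical cell.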

Beamforming Design and Performance Evaluation for RIS-aided Localization using LEO Satellite Signals

  • paper_url: http://arxiv.org/abs/2309.07296
  • repo_url: None
  • paper_authors: Lei Wang, Pinjun Zheng, Xing Liu, Tarig Ballal, Tareq Y. Al-Naffouri
  • for: Study RIS-aided localization using low-Earth orbit (LEO) satellite signals.
  • methods: Derives the Cramér-Rao bound of the considered localization problem and proposes an optimal RIS beamforming design that minimizes the derived bound.
  • results: Numerical results demonstrate that the proposed beamforming scheme outperforms benchmark alternatives and show that the combination of LEO satellites and RISs has the potential to achieve localization accuracy at the meter or even sub-meter level.
    Abstract The growing availability of low-Earth orbit (LEO) satellites, coupled with the anticipated widespread deployment of reconfigurable intelligent surfaces (RISs), opens up promising prospects for new localization paradigms. This paper studies RIS-aided localization using LEO satellite signals. The Cram\'er-Rao bound of the considered localization problem is derived, based on which an optimal RIS beamforming design that minimizes the derived bound is proposed. Numerical results demonstrate the superiority of the proposed beamforming scheme over benchmark alternatives, while also revealing that the synergy between LEO satellites and RISs holds the promise of achieving localization accuracy at the meter or even sub-meter level.
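As a toy illustration of the Cramér-Rao machinery used above: the CRB is the inverse of the Fisher information matrix (FIM). A minimal sketch for 2-D positioning from range measurements (the anchor geometry and noise level are assumptions for illustration; the paper's RIS/LEO model is far richer):

```python
import numpy as np

def range_crb(anchors, target, sigma):
    """CRB on 2-D position from range measurements with i.i.d. Gaussian
    noise of std `sigma`. FIM = (1/sigma^2) * sum of u u^T, where u is
    the unit vector from the target to each anchor."""
    fim = np.zeros((2, 2))
    for a in anchors:
        d = a - target
        u = d / np.linalg.norm(d)
        fim += np.outer(u, u) / sigma**2
    crb = np.linalg.inv(fim)       # covariance lower bound
    return np.sqrt(np.trace(crb))  # bound on RMS position error

anchors = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0]])
target = np.array([30.0, 40.0])
bound = range_crb(anchors, target, sigma=1.0)
print(bound)  # RMS position-error bound in meters
```

Minimizing such a bound over the RIS phase profile is, at a high level, what the paper's beamforming design does.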

Adaptive KalmanNet: Data-Driven Kalman Filter with Fast Adaptation

  • paper_url: http://arxiv.org/abs/2309.07016
  • repo_url: https://github.com/kalmannet/adaptive-knet-icassp24
  • paper_authors: Xiaoyong Ni, Guy Revach, Nir Shlezinger
  • for: Improve tracking performance in partially known state space (SS) models.
  • methods: Combines the classical Kalman filter (KF) with a deep neural network (DNN); a compact hypernetwork generates context-dependent modulation weights.
  • results: Adapts to changes in the SS model without retraining, providing consistent state estimation across a continuous range of noise distributions.
    Abstract Combining the classical Kalman filter (KF) with a deep neural network (DNN) enables tracking in partially known state space (SS) models. A major limitation of current DNN-aided designs stems from the need to train them to filter data originating from a specific distribution and underlying SS model. Consequently, changes in the model parameters may require lengthy retraining. While the KF adapts through parameter tuning, the black-box nature of DNNs makes identifying tunable components difficult. Hence, we propose Adaptive KalmanNet (AKNet), a DNN-aided KF that can adapt to changes in the SS model without retraining. Inspired by recent advances in large language model fine-tuning paradigms, AKNet uses a compact hypernetwork to generate context-dependent modulation weights. Numerical evaluation shows that AKNet provides consistent state estimation performance across a continuous range of noise distributions, even when trained using data from limited noise settings.
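For reference, the classical KF that KalmanNet-style architectures augment is a linear-Gaussian predict/update recursion; a minimal sketch (the constant-velocity model and noise covariances below are illustrative assumptions):

```python
import numpy as np

# Constant-velocity state-space model (illustrative parameters).
dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])  # state transition
H = np.array([[1.0, 0.0]])             # observe position only
Q = 0.01 * np.eye(2)                   # process noise covariance
R = np.array([[0.25]])                 # measurement noise covariance

def kf_step(x, P, z):
    """One Kalman predict + update step."""
    # Predict
    x = F @ x
    P = F @ P @ F.T + Q
    # Update
    S = H @ P @ H.T + R             # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)  # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(2) - K @ H) @ P
    return x, P

x, P = np.zeros(2), np.eye(2)
for z in [1.0, 2.1, 2.9, 4.2]:  # noisy position readings
    x, P = kf_step(x, P, np.array([z]))
print(x)  # estimated [position, velocity]
```

The DNN-aided variants replace or modulate the hand-set `Q`/`R` (and hence the gain `K`); here the hypernetwork's role would be to generate those modulation weights from context.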

  • paper_url: http://arxiv.org/abs/2309.06911
  • repo_url: None
  • paper_authors: Gilles Callebaut, Liang Liu, Thomas Eriksson, Liesbet Van der Perre, Ove Edfors, Christian Fager
  • for: Provide the testbed infrastructure needed to advance towards 6G wireless networks by validating new technologies under realistic conditions.
  • methods: Surveys research directions and technologies for 6G wireless networks, including new applications and drastic improvements in network energy performance.
  • results: Clarifies the requirements for 6G radio testbeds, reveals trends, and introduces approaches towards their development.
    Abstract The proof of the pudding is in the eating - that is why 6G testbeds are essential in the progress towards the next generation of wireless networks. Theoretical research towards 6G wireless networks is proposing advanced technologies to serve new applications and drastically improve the energy performance of the network. Testbeds are indispensable to validate these new technologies under more realistic conditions. This paper clarifies the requirements for 6G radio testbeds, reveals trends, and introduces approaches towards their development.

Intelligent Reflective Surface Assist Integrated Sensing and Wireless Power Transfer

  • paper_url: http://arxiv.org/abs/2309.06909
  • repo_url: None
  • paper_authors: Zheng Li, Zhengyu Zhu, Zheng Chu, Yingying Guan, De Mi, Fan Liu, Lie-Liang Yang
  • for: Investigate an intelligent reflecting surface (IRS) assisted integrated sensing and wireless power transfer (ISWPT) system, in which a transmitter in a transportation infrastructure network senses multiple targets while simultaneously powering multiple energy harvesting devices (EHDs).
  • methods: Jointly optimizes the beamforming and the IRS phase shifts to balance the performance tradeoff between energy harvesting and sensing; two algorithms are proposed, one based on a semi-definite program and one low-complexity algorithm based on successive convex approximation and majorization minimization.
  • results: Simulation results validate the proposed algorithms and demonstrate that the IRS helps improve the performance of ISWPT systems.
    Abstract Wireless sensing and wireless energy are enablers to pave the way for smart transportation and a greener future. In this paper, an intelligent reflecting surface (IRS) assisted integrated sensing and wireless power transfer (ISWPT) system is investigated, where the transmitter in transportation infrastructure networks sends signals to sense multiple targets and simultaneously to multiple energy harvesting devices (EHDs) to power them. In light of the performance tradeoff between energy harvesting and sensing, we propose to jointly optimize the system performance via optimizing the beamforming and IRS phase shift. However, the coupling of optimization variables makes the formulated problem non-convex. Thus, an alternative optimization approach is introduced and based on which two algorithms are proposed to solve the problem. Specifically, the first one involves a semi-definite program technique, while the second one features a low-complexity optimization algorithm based on successive convex approximation and majorization minimization. Our simulation results validate the proposed algorithms and demonstrate the advantages of using IRS to assist wireless power transfer in ISWPT systems.
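A basic building block behind IRS phase-shift optimization: for a single link, the reflected paths combine coherently when each element's phase shift cancels the phase of its cascaded channel. A hedged sketch (the random channels are illustrative; the paper's joint beamforming and phase-shift design is more involved):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64  # number of IRS elements
# Rayleigh-faded cascaded channels (illustrative): Tx -> IRS and IRS -> user.
h = (rng.normal(size=N) + 1j * rng.normal(size=N)) / np.sqrt(2)
g = (rng.normal(size=N) + 1j * rng.normal(size=N)) / np.sqrt(2)

# Optimal unit-modulus phase shifts align all cascaded paths in phase.
theta = -np.angle(h * g)
phi = np.exp(1j * theta)

aligned = abs(np.sum(phi * h * g)) ** 2  # coherent combining gain
random_phi = np.exp(1j * rng.uniform(0, 2 * np.pi, N))
unaligned = abs(np.sum(random_phi * h * g)) ** 2

print(aligned > unaligned)  # aligned phases yield a larger gain
```

By the triangle inequality, the aligned choice maximizes the single-link gain; the paper's non-convex difficulty comes from optimizing the phases jointly with the transmit beamformers across multiple targets and EHDs.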

TTD Configurations for Near-Field Beamforming: Parallel, Serial, or Hybrid?

  • paper_url: http://arxiv.org/abs/2309.06861
  • repo_url: None
  • paper_authors: Zhaolin Wang, Xidong Mu, Yuanwei Liu, Robert Schober
  • For: The paper is written for hybrid beamforming architectures in wideband near-field communications, specifically addressing the spatial-wideband effect and the maximum time delay requirements of true-time delayers (TTDs).
  • Methods: The paper investigates two TTD configurations, serial and hybrid serial-parallel, and proposes a power equalization approach to address the cumulative insertion loss. It also studies the wideband near-field beamforming design for the different configurations to maximize spectral efficiency in both single-user and multi-user systems.
  • Results: The paper derives a closed-form solution for the beamforming design in single-user systems and develops a penalty-based iterative algorithm for multi-user systems. Numerical results show that the proposed designs can significantly reduce the maximum time delays required for the TTDs, that the hybrid configuration excels in single-user systems, and that the serial configuration is preferred in multi-user systems.
    Abstract True-time delayers (TTDs) are popular components for hybrid beamforming architectures to combat the spatial-wideband effect in wideband near-field communications. A serial and a hybrid serial-parallel TTD configuration are investigated for hybrid beamforming architectures. Compared to the conventional parallel configuration, the serial configuration exhibits a cumulative time delay through multiple TTDs, which potentially alleviates the maximum delay requirements on the TTDs. However, independent control of individual TTDs becomes impossible in the serial configuration. In this context, a hybrid TTD configuration is proposed as a compromise solution. Furthermore, a power equalization approach is proposed to address the cumulative insertion loss of the serial and hybrid TTD configurations. Moreover, the wideband near-field beamforming design for different configurations is studied for maximizing the spectral efficiency in both single-user and multiple-user systems. 1) For single-user systems, a closed-form solution for the beamforming design is derived. The preferred user locations and the required maximum time delay of each TTD configuration are characterized. 2) For multi-user systems, a penalty-based iterative algorithm is developed to obtain a stationary point of the spectral efficiency maximization problem for each TTD configuration. In addition, a mixed-forward-and-backward (MFB) implementation is proposed to enhance the performance of the serial configuration. Our numerical results confirm the effectiveness of the proposed designs and unveil that i) compared to the conventional parallel configuration, both the serial and hybrid configurations can significantly reduce the maximum time delays required for the TTDs and ii) the hybrid configuration excels in single-user systems, while the serial configuration is preferred in multi-user systems.
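The delay-requirement difference between the parallel and serial TTD configurations can be illustrated with a simple uniform-linear-array steering model (the carrier, spacing, and array size below are assumptions, not the paper's parameters):

```python
import numpy as np

c = 3e8                 # speed of light (m/s)
fc = 100e9              # carrier frequency (illustrative)
d = c / fc / 2          # half-wavelength element spacing
N = 64                  # antenna elements
theta = np.deg2rad(60)  # steering angle

# Inter-element propagation delay needed to steer towards theta.
dt = d * np.sin(theta) / c

# Parallel: TTD n must realize its full cumulative delay n*dt itself,
# so the maximum per-TTD delay grows with the array size.
parallel_max = (N - 1) * dt
# Serial: delays accumulate through the chain, so each TTD only needs
# to realize the single inter-element increment dt.
serial_max = dt

print(parallel_max / serial_max)  # = N - 1 = 63
```

This is the sense in which the serial (and hybrid) configurations relax the maximum delay requirement on each TTD, at the cost of losing independent per-TTD control and accumulating insertion loss.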

A Wearable Ultra-Low-Power sEMG-Triggered Ultrasound System for Long-Term Muscle Activity Monitoring

  • paper_url: http://arxiv.org/abs/2309.06851
  • repo_url: None
  • paper_authors: Sebastian Frey, Victor Kartsch, Christoph Leitner, Andrea Cossettini, Sergei Vostrikov, Simone Benatti, Luca Benini
  • for: This paper aims to develop a wearable solution that integrates surface electromyography (sEMG) and ultrasound (US) to monitor muscle activity with low power consumption.
  • methods: The proposed approach utilizes an EMG-driven wake-up approach to achieve ultra-low power consumption, where the US probe is kept in a sleep state when there is no muscle activity. sEMG data are processed on the probe to identify muscle activity and generate a trigger to wake-up the US counterpart.
  • results: The proposed approach enables more than 59% energy saving compared to operating both sEMG and US continuously, with a full-system average power consumption of 12.2 mW.
    Abstract Surface electromyography (sEMG) is a well-established approach to monitor muscular activity on wearable and resource-constrained devices. However, when measuring deeper muscles, its low signal-to-noise ratio (SNR), high signal attenuation, and crosstalk degrade sensing performance. Ultrasound (US) complements sEMG effectively with its higher SNR at high penetration depths. In fact, combining US and sEMG improves the accuracy of muscle dynamic assessment, compared to using only one modality. However, the power envelope of US hardware is considerably higher than that of sEMG, thus inflating energy consumption and reducing the battery life. This work proposes a wearable solution that integrates both modalities and utilizes an EMG-driven wake-up approach to achieve ultra-low power consumption as needed for wearable long-term monitoring. We integrate two wearable state-of-the-art (SoA) US and ExG biosignal acquisition devices to acquire time-synchronized measurements of the short head of the biceps. To minimize power consumption, the US probe is kept in a sleep state when there is no muscle activity. sEMG data are processed on the probe (filtering, envelope extraction and thresholding) to identify muscle activity and generate a trigger to wake-up the US counterpart. The US acquisition starts before muscle fascicles displacement thanks to a triggering time faster than the electromechanical delay (30-100 ms) between the neuromuscular junction stimulation and the muscle contraction. Assuming a muscle contraction of 200 ms at a contraction rate of 1 Hz, the proposed approach enables more than 59% energy saving (with a full-system average power consumption of 12.2 mW) as compared to operating both sEMG and US continuously.
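The on-probe processing chain (filtering, envelope extraction, thresholding) that generates the wake-up trigger can be sketched as follows (the synthetic signal, window length, and threshold are illustrative assumptions, not the paper's firmware parameters):

```python
import numpy as np

def emg_wakeup_trigger(emg, fs, window_s=0.05, threshold=0.3):
    """Rectify the sEMG signal, extract a moving-average envelope, and
    return the first sample index where the envelope crosses the
    threshold (the wake-up trigger), or None if it never does."""
    rectified = np.abs(emg - np.mean(emg))  # remove DC, rectify
    win = max(1, int(window_s * fs))
    envelope = np.convolve(rectified, np.ones(win) / win, mode="same")
    above = np.flatnonzero(envelope > threshold)
    return int(above[0]) if above.size else None

fs = 1000                             # 1 kHz sampling (illustrative)
t = np.arange(2 * fs) / fs
rng = np.random.default_rng(1)
emg = 0.02 * rng.normal(size=t.size)  # baseline noise
emg[fs:] += rng.normal(size=fs)       # synthetic "contraction" after 1 s

trig = emg_wakeup_trigger(emg, fs)
print(trig)  # index near the contraction onset (~ sample 1000)
```

Because the electromechanical delay between neuromuscular stimulation and contraction is 30-100 ms, a trigger produced this early lets the US acquisition start before the fascicles actually displace.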

Low-complexity hardware and algorithm for joint communication and sensing

  • paper_url: http://arxiv.org/abs/2309.06850
  • repo_url: None
  • paper_authors: Andrea Bedin, Shaghayegh Shahcheraghi, Traian E. Abrudan, Arash Asadi
  • for: Describe a joint communication and sensing (JCAS) algorithm for 6G systems, providing fast, reliable communication together with an accurate perception of the physical environment.
  • methods: Exploits a novel beamforming architecture that combines wideband analog and narrowband digital beamforming, allowing accurate estimation of Time of Arrival (ToA) from the large bandwidth and of Angle of Arrival (AoA) from the high-rank digital beamforming.
  • results: Compared with the 2D MUltiple SIgnal Classification (2D-MUSIC) algorithm on a fully-digital wideband beamforming architecture, the proposed method achieves similar performance while requiring a fraction of the total aggregate sampling rate and having much lower complexity.
    Abstract Joint Communication and Sensing (JCAS) is foreseen as one very distinctive feature of the emerging 6G systems providing, in addition to fast end reliable communication, the ability to obtain an accurate perception of the physical environment. In this paper, we propose a JCAS algorithm that exploits a novel beamforming architecture, which features a combination of wideband analog and narrowband digital beamforming. This allows accurate estimation of Time of Arrival (ToA), exploiting the large bandwidth and Angle of Arrival (AoA), exploiting the high-rank digital beamforming. In our proposal, we separately estimate the ToA and AoA. The association between ToA and AoA is solved by acquiring multiple non-coherent frames and adding up the signal from each frame such that a specific component is combined coherently before the AoA estimation. Consequently, this removes the need to use 2D and 3D joint estimation methods, thus significantly lowering complexity. The resolution performance of the method is compared with that of 2D MUltiple SIgnal Classification (2D-MUSIC) algorithm, using a fully-digital wideband beamforming architecture. The results show that the proposed method can achieve performance similar to a fully-digital high-bandwidth system, while requiring a fraction of the total aggregate sampling rate and having much lower complexity.

Reliability-Latency-Rate Tradeoff in Low-Latency Communications with Finite-Blocklength Coding

  • paper_url: http://arxiv.org/abs/2309.06769
  • repo_url: None
  • paper_authors: Lintao Li, Wei Chen, Petar Popovski, Khaled B. Letaief
  • For: Study the reliability-latency-rate tradeoff in low-latency communications with finite-blocklength coding (FBC), i.e., the fundamental tradeoff between error probability, delay-violation probability (DVP), and service rate.
  • Methods: Uses the effective capacity (EC) and the normal approximation to derive gain-conservation inequalities bounding the tradeoff, and conceives an EC-approximation method to obtain a closed-form expression of the quality-of-service-constrained throughput.
  • Results: For delay-sensitive transmissions whose latency threshold exceeds the channel coherence time, an asymptotic form of the tradeoff between error probability and DVP is found over the AWGN and Rayleigh fading channels.
    Abstract Low-latency communication plays an increasingly important role in delay-sensitive applications by ensuring the real-time exchange of information. However, due to the constraints on the maximum instantaneous power, bounded latency is hard to be guaranteed. In this paper, we investigate the reliability-latency-rate tradeoff in low-latency communications with finite-blocklength coding (FBC). More specifically, we are interested in the fundamental tradeoff between error probability, delay-violation probability (DVP), and service rate. Based on the effective capacity (EC) and normal approximation, we present several gain-conservation inequalities to bound the reliability-latency-rate tradeoffs. In particular, we investigate the low-latency transmissions over an additive white Gaussian noise (AWGN) channel, over a Rayleigh fading channel, with frequency or spatial diversity, and over a Nakagami-$m$ fading channel. To analytically evaluate the quality-of-service-constrained low-latency communications with FBC, an EC-approximation method is further conceived to derive the closed-form expression of quality-of-service-constrained throughput. For delay-sensitive transmissions in which the latency threshold is greater than the channel coherence time, we find an asymptotic form of the tradeoff between the error probability and DVP over the AWGN and Rayleigh fading channels. Our results may provide some insights into the efficient scheduling of low-latency wireless communications in which statistical latency and reliability metrics are adopted.
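The normal approximation invoked above gives the maximal coding rate at blocklength n and error probability eps as R ≈ C − sqrt(V/n)·Q⁻¹(eps), with capacity C and channel dispersion V; a quick sketch for a complex AWGN channel (the parameter values are illustrative):

```python
import math
from statistics import NormalDist

def na_rate(snr, n, eps):
    """Normal approximation (Polyanskiy-Poor-Verdu) of the maximal
    coding rate in bits per channel use for a complex AWGN channel at
    blocklength n and block error probability eps."""
    C = math.log2(1 + snr)  # capacity
    V = (snr * (snr + 2) / (snr + 1) ** 2) * math.log2(math.e) ** 2
    q_inv = NormalDist().inv_cdf(1 - eps)  # Q^{-1}(eps)
    return C - math.sqrt(V / n) * q_inv

snr = 10 ** (10 / 10)  # 10 dB SNR, illustrative
r = na_rate(snr, n=200, eps=1e-5)
print(r)                   # noticeably below capacity at short blocklength
print(math.log2(1 + snr))  # infinite-blocklength capacity
```

The gap between the two printed values is the finite-blocklength penalty that drives the reliability-latency-rate tradeoff studied in the paper.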

Globally Optimal Beamforming Design for Integrated Sensing and Communication Systems

  • paper_url: http://arxiv.org/abs/2309.06674
  • repo_url: None
  • paper_authors: Zhiguo Wang, Jiageng Wu, Ya-Feng Liu, Fan Liu
  • for: 这paper是为了提出一种多输入多输出(MIMO)扫描和多用户通信的共同优化模型,以优化扫描和通信的性能。
  • methods: 该paper使用了 McCormick 环境 relaksation 和 semi-definite relaxation(SDR)技术来解决形ulated problem,并提出了一种高效的全局分支和约束 bounds 算法来解决这个问题。
  • results: 该paper提出的算法可以 garantía finding the global solution for the considered problem, 并且可以作为现有的本地或半优化算法的性能评估标准。
    Abstract In this paper, we propose a multi-input multi-output (MIMO) beamforming transmit optimization model for joint radar sensing and multi-user communications, where the design of the beamformers is formulated as an optimization problem whose objective is a weighted combination of the sum rate and the Cram\'{e}r-Rao bound (CRB), subject to the transmit power budget constraint. The formulated problem is challenging to obtain a global solution, because the sum rate maximization (SRM) problem itself (even without considering the sensing metric) is known to be NP-hard. In this paper, we propose an efficient global branch-and-bound algorithm for solving the formulated problem based on the McCormick envelope relaxation and the semidefinite relaxation (SDR) technique. The proposed algorithm is guaranteed to find the global solution for the considered problem, and thus serves as an important benchmark for performance evaluation of the existing local or suboptimal algorithms for solving the same problem.
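The global branch-and-bound principle the paper instantiates (with McCormick/SDR relaxations providing the bounds) can be illustrated on a 1-D nonconvex maximization, where a Lipschitz interval bound plays the role of the relaxation (the toy objective and bound are assumptions for illustration):

```python
import heapq
import math

def branch_and_bound(f, ubound, lo, hi, tol=1e-4):
    """Maximize f on [lo, hi], given ubound(a, b) >= max of f on [a, b].
    Keeps a priority queue of intervals ordered by their upper bound."""
    best_x, best_val = lo, f(lo)
    heap = [(-ubound(lo, hi), lo, hi)]  # min-heap: negate bounds
    while heap:
        neg_ub, a, b = heapq.heappop(heap)
        if -neg_ub <= best_val + tol:   # prune: cannot beat incumbent
            continue
        m = 0.5 * (a + b)               # branch at the midpoint
        if f(m) > best_val:
            best_x, best_val = m, f(m)  # update incumbent
        for left, right in ((a, m), (m, b)):
            ub = ubound(left, right)
            if ub > best_val + tol:
                heapq.heappush(heap, (-ub, left, right))
    return best_x, best_val

# Toy nonconvex objective; |f'(x)| <= 3.5 on [0, 4], so the Lipschitz
# constant L = 3.5 yields a valid interval upper bound.
f = lambda x: math.sin(3 * x) + 0.5 * x
L = 3.5
ub = lambda a, b: max(f(a), f(b)) + L * (b - a) / 2

x_star, v_star = branch_and_bound(f, ub, 0.0, 4.0)
print(x_star, v_star)  # global maximum near x ~ 2.67
```

On termination, every discarded interval provably contained no point better than the incumbent plus `tol`, which is the same certificate structure that makes the paper's algorithm a global benchmark.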