cs.LG - 2023-09-24

Design Principles of Robust Multi-Armed Bandit Framework in Video Recommendations

  • paper_url: http://arxiv.org/abs/2310.01419
  • repo_url: None
  • paper_authors: Belhassen Bayar, Phanideep Gampa, Ainur Yessenalina, Zhen Wen
  • for: Proposes new design principles for multi-armed bandit recommender systems that are robust to time-variant metadata signals and avoid the bandit-model anomalies caused by item cannibalization and data sparsity.
  • methods: Three design principles: (i) make bandit models robust to time-variant metadata signals; (ii) make them less prone to item cannibalization; (iii) prevent bandit model weights from fluctuating due to data sparsity.
  • results: A series of experiments demonstrates the advantage of the proposed design principles, including relative gains over a baseline bandit model of up to 11.88% in ROC-AUC and 44.85% in PR-AUC, while maintaining fairness when recommending specific popular and unpopular titles.
    Abstract Current multi-armed bandit approaches in recommender systems (RS) have focused more on devising effective exploration techniques, while not adequately addressing common exploitation challenges related to distributional changes and item cannibalization. Little work exists to guide the design of robust bandit frameworks that can address these frequent challenges in RS. In this paper, we propose new design principles to (i) make bandit models robust to time-variant metadata signals, (ii) less prone to item cannibalization, and (iii) prevent their weights from fluctuating due to data sparsity. Through a series of experiments, we systematically examine the influence of several important bandit design choices. We demonstrate the advantage of our proposed design principles at making bandit models robust to dynamic behavioral changes through in-depth analyses. Noticeably, we show improved relative gain compared to a baseline bandit model not incorporating our design choices of up to $11.88\%$ and $44.85\%$, respectively in ROC-AUC and PR-AUC. Case studies about fairness in recommending specific popular and unpopular titles are presented, to demonstrate the robustness of our proposed design at addressing popularity biases.

The Rashomon Importance Distribution: Getting RID of Unstable, Single Model-based Variable Importance

  • paper_url: http://arxiv.org/abs/2309.13775
  • repo_url: https://github.com/jdonnelly36/Rashomon_Importance_Distribution
  • paper_authors: Jon Donnelly, Srikar Katta, Cynthia Rudin, Edward P. Browne
  • for: Proposes a new variable importance framework that quantifies the importance of a variable across the set of all good models and is stable across the data distribution.
  • methods: The framework accounts for all well-performing explanations of the data and remains stable under reasonable data perturbations; it can be integrated with most existing model classes and global variable importance metrics.
  • results: Experiments show the framework recovers variable importance rankings in complex simulation setups where other methods fail and accurately estimates the true importance of a variable for the underlying data distribution; theoretical guarantees on consistency and finite sample error rates are provided, along with a real-world case study demonstrating its practical utility.
    Abstract Quantifying variable importance is essential for answering high-stakes questions in fields like genetics, public policy, and medicine. Current methods generally calculate variable importance for a given model trained on a given dataset. However, for a given dataset, there may be many models that explain the target outcome equally well; without accounting for all possible explanations, different researchers may arrive at many conflicting yet equally valid conclusions given the same data. Additionally, even when accounting for all possible explanations for a given dataset, these insights may not generalize because not all good explanations are stable across reasonable data perturbations. We propose a new variable importance framework that quantifies the importance of a variable across the set of all good models and is stable across the data distribution. Our framework is extremely flexible and can be integrated with most existing model classes and global variable importance metrics. We demonstrate through experiments that our framework recovers variable importance rankings for complex simulation setups where other methods fail. Further, we show that our framework accurately estimates the true importance of a variable for the underlying data distribution. We provide theoretical guarantees on the consistency and finite sample error rates for our estimator. Finally, we demonstrate its utility with a real-world case study exploring which genes are important for predicting HIV load in persons with HIV, highlighting an important gene that has not previously been studied in connection with HIV. Code is available here.
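
The abstract describes averaging a variable-importance statistic over the set of all near-optimal models (the Rashomon set) and over perturbations of the data. A minimal sketch of that idea follows, using permutation importance and bootstrap resamples; the `models` and `loss` callables, the tolerance `eps`, and the permutation-based importance are illustrative assumptions, not the authors' RID estimator.

```python
import numpy as np

def rashomon_importance(models, loss, X, y, j, eps=0.01, n_boot=20, seed=0):
    """Average the importance of feature j over all near-optimal models and
    over bootstrap resamples of the data (sketch, not the paper's estimator)."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))          # bootstrap resample
        Xb, yb = X[idx], y[idx]
        losses = np.array([loss(m, Xb, yb) for m in models])
        good = [m for m, l in zip(models, losses) if l <= losses.min() + eps]
        Xp = Xb.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])           # break feature j
        scores.append(np.mean([loss(m, Xp, yb) - loss(m, Xb, yb) for m in good]))
    return float(np.mean(scores))
```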

Improving Robustness of Deep Convolutional Neural Networks via Multiresolution Learning

  • paper_url: http://arxiv.org/abs/2309.13752
  • repo_url: None
  • paper_authors: Hongyan Zhou, Yao Liang
  • for: Improve the robustness of deep learning models for both 1D signal and 2D image prediction problems.
  • methods: Multiresolution learning; the authors show that it significantly improves the robustness of deep learning models to both random noise and adversarial attacks.
  • results: Multiresolution learning significantly improves the robustness of DNN models for both 1D signal and 2D signal (image) prediction problems, and this improvement can be achieved with small training dataset sizes and without sacrificing standard accuracy.
    Abstract The current learning process of deep learning, regardless of any deep neural network (DNN) architecture and/or learning algorithm used, is essentially a single resolution training. We explore multiresolution learning and show that multiresolution learning can significantly improve robustness of DNN models for both 1D signal and 2D signal (image) prediction problems. We demonstrate this improvement in terms of both noise and adversarial robustness as well as with small training dataset size. Our results also suggest that it may not be necessary to trade standard accuracy for robustness with multiresolution learning, which is, interestingly, contrary to the observation obtained from the traditional single resolution learning setting.
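
The abstract does not spell out how the multiple resolutions are produced; one common way to obtain coarse-to-fine views of a 2D signal is repeated 2x average pooling, sketched below purely as an illustration (the pooling scheme and training protocol are assumptions, not the paper's method).

```python
import numpy as np

def multiresolution_views(img, levels=3):
    """Coarse-to-fine views of a 2D array by repeated 2x2 average pooling."""
    views = [img]
    for _ in range(levels - 1):
        h, w = views[-1].shape
        views.append(views[-1].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3)))
    return views[::-1]  # coarsest resolution first

views = multiresolution_views(np.random.default_rng(0).random((32, 32)))
print([v.shape for v in views])  # [(8, 8), (16, 16), (32, 32)]
```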

Generative Residual Diffusion Modeling for Km-scale Atmospheric Downscaling

  • paper_url: http://arxiv.org/abs/2309.15214
  • repo_url: None
  • paper_authors: Morteza Mardani, Noah Brenowitz, Yair Cohen, Jaideep Pathak, Chieh-Yu Chen, Cheng-Chin Liu, Arash Vahdat, Karthik Kashinath, Jan Kautz, Mike Pritchard
  • for: Provide a reliable and cost-effective alternative to the expensive km-scale numerical simulations currently used for physical hazard prediction.
  • methods: A two-step approach called ResDiff, in which a (UNet) regression predicts the mean in the first step and a diffusion model predicts the residual in the second step; the split accommodates the different physics involved at different scales.
  • results: ResDiff shows encouraging skill in bulk RMSE and CRPS scores, faithfully recovers the power-law relationships governing damaging wind and rain extremes, and reproduces appropriate multivariate relationships in case studies of coherent weather phenomena such as cold fronts and typhoon eyewalls.
    Abstract The state of the art for physical hazard prediction from weather and climate requires expensive km-scale numerical simulations driven by coarser resolution global inputs. Here, a km-scale downscaling diffusion model is presented as a cost-effective alternative. The model is trained from a regional high-resolution weather model over Taiwan, and conditioned on ERA5 reanalysis data. To address the downscaling uncertainties, large resolution ratios (25km to 2km), the different physics involved at different scales, and the prediction of channels that are not in the input data, we employ a two-step approach (ResDiff) where a (UNet) regression predicts the mean in the first step and a diffusion model predicts the residual in the second step. ResDiff exhibits encouraging skill in bulk RMSE and CRPS scores. The predicted spectra and distributions from ResDiff faithfully recover important power law relationships regulating damaging wind and rain extremes. Case studies of coherent weather phenomena reveal appropriate multivariate relationships reminiscent of learnt physics. This includes the sharp wind and temperature variations that co-locate with intense rainfall in a cold front, and the extreme winds and rainfall bands that surround the eyewall of typhoons. Some evidence of simultaneous bias correction is found. A first attempt at downscaling directly from an operational global forecast model successfully retains many of these benefits. The implication is that a new era of fully end-to-end, global-to-regional machine learning weather prediction is likely near at hand.
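
The two-step structure described above (a regression predicts the mean, a diffusion model predicts the residual) composes as sketched below; the stand-in `regression_mean` and `diffusion_residual` functions are placeholders for the paper's UNet regressor and diffusion sampler, not their implementations.

```python
import numpy as np

rng = np.random.default_rng(0)

def regression_mean(coarse):
    # stand-in for the UNet regression: deterministic 2 km mean from 25 km inputs
    return np.kron(coarse, np.ones((8, 8)))        # naive upsampling placeholder

def diffusion_residual(coarse, mean):
    # stand-in for the diffusion sampler: stochastic fine-scale residual
    return 0.1 * rng.standard_normal(mean.shape)

coarse = rng.standard_normal((8, 8))               # 25 km conditioning field
mean = regression_mean(coarse)                     # step 1
fine = mean + diffusion_residual(coarse, mean)     # step 2: ResDiff-style output
```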

Geometry of Linear Neural Networks: Equivariance and Invariance under Permutation Groups

  • paper_url: http://arxiv.org/abs/2309.13736
  • repo_url: None
  • paper_authors: Kathlén Kohn, Anna-Laura Sattelberger, Vahid Shahverdi
  • for: investigate the subvariety of functions that are equivariant or invariant under the action of a permutation group.
  • methods: explicit description of their dimension, degree, Euclidean distance degree, and singularities.
  • results: fully characterize invariance for arbitrary permutation groups and equivariance for cyclic groups, and prove that all invariant linear functions can be learned by linear autoencoders.
    Abstract The set of functions parameterized by a linear fully-connected neural network is a determinantal variety. We investigate the subvariety of functions that are equivariant or invariant under the action of a permutation group. Examples of such group actions are translations or $90^\circ$ rotations on images. For such equivariant or invariant subvarieties, we provide an explicit description of their dimension, their degree as well as their Euclidean distance degree, and their singularities. We fully characterize invariance for arbitrary permutation groups, and equivariance for cyclic groups. We draw conclusions for the parameterization and the design of equivariant and invariant linear networks, such as a weight sharing property, and we prove that all invariant linear functions can be learned by linear autoencoders.

Towards Tuning-Free Minimum-Volume Nonnegative Matrix Factorization

  • paper_url: http://arxiv.org/abs/2309.13733
  • repo_url: None
  • paper_authors: Duc Toan Nguyen, Eric C. Chi
  • for: Study nonnegative matrix factorization (NMF) as a tool for discovering latent structure in data matrices.
  • methods: A minimum-volume NMF formulation, inspired by the square-root lasso, for the identifiable recovery of rank-deficient matrices in the presence of noise.
  • results: The proposed formulation's optimal tuning parameter does not depend on the noise level; a majorization-minimization (MM) algorithm with global convergence guarantees is proposed to fit the model, and experiments show the optimal choice of the tuning parameter is insensitive to the noise level in the data.
    Abstract Nonnegative Matrix Factorization (NMF) is a versatile and powerful tool for discovering latent structures in data matrices, with many variations proposed in the literature. Recently, Leplat et al. (2019) introduced a minimum-volume NMF for the identifiable recovery of rank-deficient matrices in the presence of noise. The performance of their formulation, however, requires the selection of a tuning parameter whose optimal value depends on the unknown noise level. In this work, we propose an alternative formulation of minimum-volume NMF inspired by the square-root lasso and its tuning-free properties. Our formulation also requires the selection of a tuning parameter, but its optimal value does not depend on the noise level. To fit our NMF model, we propose a majorization-minimization (MM) algorithm that comes with global convergence guarantees. We show empirically that the optimal choice of our tuning parameter is insensitive to the noise level in the data.
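
As a rough illustration of the "square-root lasso"-style idea, the sketch below evaluates a min-volume NMF objective that pairs an unsquared Frobenius fit term with the usual log-det volume penalty; the exact formulation, the regularizer form, and the constant `lam` are assumptions rather than the paper's.

```python
import numpy as np

def sqrt_minvol_objective(X, W, H, lam=0.1, delta=1e-8):
    """Unsquared (square-root-style) fit term plus a log-det volume penalty on W."""
    fit = np.linalg.norm(X - W @ H, ord="fro")
    vol = np.log(np.linalg.det(W.T @ W + delta * np.eye(W.shape[1])))
    return fit + lam * vol

rng = np.random.default_rng(0)
X = rng.random((20, 30))
W, H = rng.random((20, 4)), rng.random((4, 30))
print(sqrt_minvol_objective(X, W, H))
```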

Deep neural networks with ReLU, leaky ReLU, and softplus activation provably overcome the curse of dimensionality for Kolmogorov partial differential equations with Lipschitz nonlinearities in the $L^p$-sense

  • paper_url: http://arxiv.org/abs/2309.13722
  • repo_url: None
  • paper_authors: Julia Ackermann, Arnulf Jentzen, Thomas Kruse, Benno Kuckuck, Joshua Lee Padgett
  • for: Deep learning methods for approximating the solutions of high-dimensional partial differential equations (PDEs).
  • methods: Deep neural networks (DNNs) with ReLU, leaky ReLU, and softplus activation functions.
  • results: The number of parameters of the approximating DNN grows at most polynomially in the PDE dimension and the reciprocal of the approximation accuracy, i.e., the approximation provably overcomes the curse of dimensionality (COD); the result is established in the $L^p$-sense, generalizing earlier $L^2$ results.
    Abstract Recently, several deep learning (DL) methods for approximating high-dimensional partial differential equations (PDEs) have been proposed. The interest that these methods have generated in the literature is in large part due to simulations which appear to demonstrate that such DL methods have the capacity to overcome the curse of dimensionality (COD) for PDEs in the sense that the number of computational operations they require to achieve a certain approximation accuracy $\varepsilon\in(0,\infty)$ grows at most polynomially in the PDE dimension $d\in\mathbb N$ and the reciprocal of $\varepsilon$. While there is thus far no mathematical result that proves that one of such methods is indeed capable of overcoming the COD, there are now a number of rigorous results in the literature that show that deep neural networks (DNNs) have the expressive power to approximate PDE solutions without the COD in the sense that the number of parameters used to describe the approximating DNN grows at most polynomially in both the PDE dimension $d\in\mathbb N$ and the reciprocal of the approximation accuracy $\varepsilon>0$. Roughly speaking, in the literature it has been proved for every $T>0$ that solutions $u_d\colon [0,T]\times\mathbb R^d\to \mathbb R$, $d\in\mathbb N$, of semilinear heat PDEs with Lipschitz continuous nonlinearities can be approximated by DNNs with ReLU activation at the terminal time in the $L^2$-sense without the COD provided that the initial value functions $\mathbb R^d\ni x\mapsto u_d(0,x)\in\mathbb R$, $d\in\mathbb N$, can be approximated by ReLU DNNs without the COD. It is the key contribution of this work to generalize this result by establishing this statement in the $L^p$-sense with $p\in(0,\infty)$ and by allowing the activation function to be more general, covering the ReLU, the leaky ReLU, and the softplus activation functions as special cases.

Federated Deep Multi-View Clustering with Global Self-Supervision

  • paper_url: http://arxiv.org/abs/2309.13697
  • repo_url: None
  • paper_authors: Xinyue Chen, Jie Xu, Yazhou Ren, Xiaorong Pu, Ce Zhu, Xiaofeng Zhu, Zhifeng Hao, Lifang He
  • for: Address the challenges of incomplete multi-view data in distributed environments, where label information is unknown and data privacy must be preserved.
  • methods: A novel federated deep multi-view clustering method: on the server, sample alignment and data extension techniques explore the complementary cluster structures of multiple views, and global prototypes and global pseudo-labels are distributed to clients as global self-supervised information; on the clients, this information and deep autoencoders are used to learn view-specific cluster assignments and embedded features, which are uploaded to the server to refine the global self-supervised information.
  • results: Extensive experiments show that the proposed method effectively handles incomplete multi-view data and privacy concerns and achieves superior performance.
    Abstract Federated multi-view clustering has the potential to learn a global clustering model from data distributed across multiple devices. In this setting, label information is unknown and data privacy must be preserved, leading to two major challenges. First, views on different clients often have feature heterogeneity, and mining their complementary cluster information is not trivial. Second, the storage and usage of data from multiple clients in a distributed environment can lead to incompleteness of multi-view data. To address these challenges, we propose a novel federated deep multi-view clustering method that can mine complementary cluster structures from multiple clients, while dealing with data incompleteness and privacy concerns. Specifically, in the server environment, we propose sample alignment and data extension techniques to explore the complementary cluster structures of multiple views. The server then distributes global prototypes and global pseudo-labels to each client as global self-supervised information. In the client environment, multiple clients use the global self-supervised information and deep autoencoders to learn view-specific cluster assignments and embedded features, which are then uploaded to the server for refining the global self-supervised information. Finally, the results of our extensive experiments demonstrate that our proposed method exhibits superior performance in addressing the challenges of incomplete multi-view data in distributed environments.

Performance Evaluation of Equal-Weight Portfolio and Optimum Risk Portfolio on Indian Stocks

  • paper_url: http://arxiv.org/abs/2309.13696
  • repo_url: None
  • paper_authors: Abhiraj Sen, Jaydip Sen
  • for: Design an optimal portfolio that allocates suitable weights to its constituent assets so that the portfolio's return and risk are optimized.
  • methods: Three portfolio design approaches are compared: minimum-risk, optimum-risk, and equal-weight allocation.
  • results: For each of thirteen critical NSE sectors, portfolios of the top ten stocks by free-float market capitalization are built from historical prices (Jan 1, 2017 to Dec 31, 2022), evaluated on stock price data from Jan 1, 2022 to Dec 31, 2022, and compared against the market; the portfolio yielding the higher return for each sector is identified.
    Abstract Designing an optimum portfolio for allocating suitable weights to its constituent assets so that the return and risk associated with the portfolio are optimized is a computationally hard problem. The seminal work of Markowitz that attempted to solve the problem by estimating the future returns of the stocks is found to perform sub-optimally on real-world stock market data. This is because the estimation task becomes extremely challenging due to the stochastic and volatile nature of stock prices. This work illustrates three approaches to portfolio design minimizing the risk, optimizing the risk, and assigning equal weights to the stocks of a portfolio. Thirteen critical sectors listed on the National Stock Exchange (NSE) of India are first chosen. Three portfolios are designed following the above approaches choosing the top ten stocks from each sector based on their free-float market capitalization. The portfolios are designed using the historical prices of the stocks from Jan 1, 2017, to Dec 31, 2022. The portfolios are evaluated on the stock price data from Jan 1, 2022, to Dec 31, 2022. The performances of the portfolios are compared, and the portfolio yielding the higher return for each sector is identified.
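
For reference, the equal-weight allocation and the unconstrained, analytical minimum-variance allocation compare as in the sketch below; the covariance estimate and closed-form weights are textbook formulas, not the paper's exact optimization setup, and real portfolios would add no-short and budget constraints.

```python
import numpy as np

def equal_weight(n_assets):
    return np.full(n_assets, 1.0 / n_assets)

def min_variance(returns):
    """Closed-form unconstrained minimum-variance weights: w = S^-1 1 / (1' S^-1 1)."""
    cov = np.cov(returns, rowvar=False)
    ones = np.ones(cov.shape[0])
    w = np.linalg.solve(cov, ones)
    return w / w.sum()

rng = np.random.default_rng(0)
daily_returns = rng.normal(0.0005, 0.02, size=(250, 10))   # 250 days x 10 stocks
print(equal_weight(10), min_variance(daily_returns))
```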

Regularization and Optimal Multiclass Learning

  • paper_url: http://arxiv.org/abs/2309.13692
  • repo_url: None
  • paper_authors: Julian Asilis, Siddartha Devic, Shaddin Dughmi, Vatsal Sharan, Shang-Hua Teng
  • for: This paper is written to study the role of regularization in multiclass learning with arbitrary label sets, and to introduce optimal learning algorithms that incorporate regularization using one-inclusion graphs (OIGs).
  • methods: The paper uses OIGs to exhibit optimal learning algorithms that relax structural risk minimization on two dimensions: allowing the regularization function to be “local” to datapoints, and using an unsupervised learning stage to learn this regularizer at the outset. The paper also introduces a combinatorial sequence called the Hall complexity, which is the first to characterize a problem’s transductive error rate exactly.
  • results: The paper shows that the introduced optimal learner relaxes structural risk minimization on two dimensions and uses an unsupervised learning stage to learn a regularizer at the outset. The paper also demonstrates that an agnostic version of the Hall complexity characterizes error rates exactly, and exhibits an optimal learner using maximum entropy programs.
    Abstract The quintessential learning algorithm of empirical risk minimization (ERM) is known to fail in various settings for which uniform convergence does not characterize learning. It is therefore unsurprising that the practice of machine learning is rife with considerably richer algorithmic techniques for successfully controlling model capacity. Nevertheless, no such technique or principle has broken away from the pack to characterize optimal learning in these more general settings. The purpose of this work is to characterize the role of regularization in perhaps the simplest setting for which ERM fails: multiclass learning with arbitrary label sets. Using one-inclusion graphs (OIGs), we exhibit optimal learning algorithms that dovetail with tried-and-true algorithmic principles: Occam's Razor as embodied by structural risk minimization (SRM), the principle of maximum entropy, and Bayesian reasoning. Most notably, we introduce an optimal learner which relaxes structural risk minimization on two dimensions: it allows the regularization function to be "local" to datapoints, and uses an unsupervised learning stage to learn this regularizer at the outset. We justify these relaxations by showing that they are necessary: removing either dimension fails to yield a near-optimal learner. We also extract from OIGs a combinatorial sequence we term the Hall complexity, which is the first to characterize a problem's transductive error rate exactly. Lastly, we introduce a generalization of OIGs and the transductive learning setting to the agnostic case, where we show that optimal orientations of Hamming graphs -- judged using nodes' outdegrees minus a system of node-dependent credits -- characterize optimal learners exactly. We demonstrate that an agnostic version of the Hall complexity again characterizes error rates exactly, and exhibit an optimal learner using maximum entropy programs.

Accelerating Large Batch Training via Gradient Signal to Noise Ratio (GSNR)

  • paper_url: http://arxiv.org/abs/2309.13681
  • repo_url: None
  • paper_authors: Guo-qing Jiang, Jinlong Liu, Zixiang Ding, Lin Guo, Wei Lin
  • for: Improve the throughput of large-batch (LB) training; LB training often suffers from a large generalization gap and degraded final accuracy, which limits how far the batch size can be scaled.
  • methods: A variance reduced gradient descent technique (VRGD) based on the gradient signal to noise ratio (GSNR), applied to popular optimizers such as SGD/Adam/LARS/LAMB; a convergence-rate analysis explains its fast training dynamics and a generalization analysis explains its smaller generalization gap.
  • results: Experiments show that VRGD accelerates training (1-2x), narrows the generalization gap, and improves final accuracy; the batch size of BERT pretraining is pushed to 128k/64k and DLRM to 512k without noticeable accuracy loss, ImageNet Top-1 accuracy at 96k improves by 0.52pp over LARS, and the generalization gap of BERT and ImageNet training is reduced by over 65%.
    Abstract As models for natural language processing (NLP), computer vision (CV) and recommendation systems (RS) require surging computation, a large number of GPUs/TPUs are paralleled as a large batch (LB) to improve training throughput. However, training such LB tasks often meets large generalization gap and downgrades final precision, which limits enlarging the batch size. In this work, we develop the variance reduced gradient descent technique (VRGD) based on the gradient signal to noise ratio (GSNR) and apply it onto popular optimizers such as SGD/Adam/LARS/LAMB. We carry out a theoretical analysis of convergence rate to explain its fast training dynamics, and a generalization analysis to demonstrate its smaller generalization gap on LB training. Comprehensive experiments demonstrate that VRGD can accelerate training ($1\sim 2 \times$), narrow generalization gap and improve final accuracy. We push the batch size limit of BERT pretraining up to 128k/64k and DLRM to 512k without noticeable accuracy loss. We improve ImageNet Top-1 accuracy at 96k by $0.52pp$ than LARS. The generalization gap of BERT and ImageNet training is significantly reduce by over $65\%$.
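
The gradient signal-to-noise ratio that VRGD builds on is commonly defined per parameter as the squared mean of the per-sample gradients over their variance; a minimal sketch of that statistic follows (the per-sample gradients `g` are assumed given, and how VRGD then rescales updates is not shown).

```python
import numpy as np

def gsnr(per_sample_grads, eps=1e-12):
    """GSNR per parameter: mean(g)^2 / var(g) over the sample dimension."""
    mean = per_sample_grads.mean(axis=0)
    var = per_sample_grads.var(axis=0)
    return mean ** 2 / (var + eps)

g = np.random.default_rng(0).normal(0.1, 1.0, size=(256, 5))  # 256 samples, 5 params
print(gsnr(g))
```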

Topology-Agnostic Detection of Temporal Money Laundering Flows in Billion-Scale Transactions

  • paper_url: http://arxiv.org/abs/2309.13662
  • repo_url: None
  • paper_authors: Haseeb Tariq, Marwan Hassani
  • for: Detect temporal money laundering flows: launderers exploit weaknesses in detection systems by placing illicit funds into multiple accounts at different banks and layering transactions through mule accounts to obscure the origin and flow of the money.
  • methods: A framework called FaSTMAN, adapted for domain-specific constraints, efficiently constructs a temporal graph of sequential transactions and uses a second-order graph representation to weight the significance of edges, enabling complex queries to be distributed over smaller, densely connected networks of flows.
  • results: On a dataset of over 1 billion transactions from multiple large European banks, the framework shows clear superiority in both efficiency and usefulness over two state-of-the-art methods for detecting suspicious transaction flows.
    Abstract Money launderers exploit the weaknesses in detection systems by purposefully placing their ill-gotten money into multiple accounts, at different banks. That money is then layered and moved around among mule accounts to obscure the origin and the flow of transactions. Consequently, the money is integrated into the financial system without raising suspicion. Path finding algorithms that aim at tracking suspicious flows of money usually struggle with scale and complexity. Existing community detection techniques also fail to properly capture the time-dependent relationships. This is particularly evident when performing analytics over massive transaction graphs. We propose a framework (called FaSTMAN), adapted for domain-specific constraints, to efficiently construct a temporal graph of sequential transactions. The framework includes a weighting method, using 2nd order graph representation, to quantify the significance of the edges. This method enables us to distribute complex queries on smaller and densely connected networks of flows. Finally, based on those queries, we can effectively identify networks of suspicious flows. We extensively evaluate the scalability and the effectiveness of our framework against two state-of-the-art solutions for detecting suspicious flows of transactions. For a dataset of over 1 Billion transactions from multiple large European banks, the results show a clear superiority of our framework both in efficiency and usefulness.

Fantastic Generalization Measures are Nowhere to be Found

  • paper_url: http://arxiv.org/abs/2309.13658
  • repo_url: None
  • paper_authors: Michael Gastpar, Ido Nachum, Jonathan Shafer, Thomas Weinberger
  • for: Examine whether tight generalization bounds for neural networks are possible in the overparameterized setting.
  • methods: Two common types of generalization bounds are considered: bounds that depend on the training set and the output of the learning algorithm (e.g., norm-based and margin-based bounds), and bounds that additionally depend on the learning algorithm itself (e.g., stability bounds).
  • results: No bound of the first type can be uniformly tight in the overparameterized setting; for the second type there is a trade-off between the algorithm's performance and the bound's tightness, so if the algorithm achieves good accuracy on certain distributions, no generalization bound can be tight for it. The conclusion is that generalization bounds in the overparameterized setting cannot be tight without suitable assumptions on the population distribution.
    Abstract Numerous generalization bounds have been proposed in the literature as potential explanations for the ability of neural networks to generalize in the overparameterized setting. However, none of these bounds are tight. For instance, in their paper "Fantastic Generalization Measures and Where to Find Them", Jiang et al. (2020) examine more than a dozen generalization bounds, and show empirically that none of them imply guarantees that can explain the remarkable performance of neural networks. This raises the question of whether tight generalization bounds are at all possible. We consider two types of generalization bounds common in the literature: (1) bounds that depend on the training set and the output of the learning algorithm. There are multiple bounds of this type in the literature (e.g., norm-based and margin-based bounds), but we prove mathematically that no such bound can be uniformly tight in the overparameterized setting; (2) bounds that depend on the training set and on the learning algorithm (e.g., stability bounds). For these bounds, we show a trade-off between the algorithm's performance and the bound's tightness. Namely, if the algorithm achieves good accuracy on certain distributions in the overparameterized setting, then no generalization bound can be tight for it. We conclude that generalization bounds in the overparameterized setting cannot be tight without suitable assumptions on the population distribution.

A Probabilistic Model for Data Redundancy in the Feature Domain

  • paper_url: http://arxiv.org/abs/2309.13657
  • repo_url: None
  • paper_authors: Ghurumuruhan Ganesan
  • for: Use a probabilistic model to estimate the number of uncorrelated features in a large dataset.
  • methods: The probabilistic method is used to obtain upper and lower bounds of the same order on the size of a feature set exhibiting low collinearity and low multicollinearity.
  • results: The paper provides a way to estimate the size of feature sets with low pairwise correlation and low multicollinearity in large datasets, and proves an auxiliary result on mutually good constrained sets that is of independent interest.
    Abstract In this paper, we use a probabilistic model to estimate the number of uncorrelated features in a large dataset. Our model allows for both pairwise feature correlation (collinearity) and interdependency of multiple features (multicollinearity) and we use the probabilistic method to obtain upper and lower bounds of the same order, for the size of a feature set that exhibits low collinearity and low multicollinearity. We also prove an auxiliary result regarding mutually good constrained sets that is of independent interest.

REWAFL: Residual Energy and Wireless Aware Participant Selection for Efficient Federated Learning over Mobile Devices

  • paper_url: http://arxiv.org/abs/2309.13643
  • repo_url: None
  • paper_authors: Y. Li, X. Qin, J. Geng, R. Chen, Y. Hou, Y. Gong, M. Pan, P. Zhang
  • for: Accelerate federated learning (FL) training over mobile devices while accounting for devices' residual energy and heterogeneous wireless transmission rates.
  • methods: A residual energy and wireless aware participant selection (PS) design built on a new PS utility function that jointly considers global FL training utility and local energy utility; REWAFL also includes a residual energy and wireless aware local computing policy, and both absorb the solution to the staleness issue.
  • results: Experiments show that REWAFL improves training accuracy and efficiency while avoiding draining mobile devices' batteries.
    Abstract Participant selection (PS) helps to accelerate federated learning (FL) convergence, which is essential for the practical deployment of FL over mobile devices. However, most existing PS approaches focus on improving training accuracy and efficiency rather than the residual energy of mobile devices, which fundamentally determines whether the selected devices can participate. Meanwhile, the impact of mobile devices' heterogeneous wireless transmission rates on PS and FL training efficiency is largely ignored. Moreover, PS causes the staleness issue. Prior research exploits isolated functions to force long-neglected devices to participate, which is decoupled from original PS designs. In this paper, we propose a residual energy and wireless aware PS design for efficient FL training over mobile devices (REWAFL). REWAFL introduces a novel PS utility function that jointly considers global FL training utility and local energy utility, which integrates the energy consumption and residual battery energy of candidate mobile devices. Under the proposed PS utility function framework, REWAFL further presents a residual energy and wireless aware local computing policy. Besides, REWAFL buries the staleness solution into its utility function and local computing policy. The experimental results show that REWAFL is effective in improving training accuracy and efficiency, while avoiding the "flat battery" of mobile devices.
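
A hypothetical scoring rule in the spirit of the described PS utility function, trading a client's contribution to global training against its local energy utility; the functional form, the trade-off weight `alpha`, and both utility terms are assumptions, not REWAFL's actual definition.

```python
def ps_score(train_gain, energy_cost_j, residual_energy_j, alpha=0.5):
    """Hypothetical participant-selection score: global training utility traded off
    against the fraction of remaining battery one round would consume."""
    energy_utility = 1.0 - energy_cost_j / max(residual_energy_j, 1e-9)
    return alpha * train_gain + (1.0 - alpha) * energy_utility

clients = {"a": (0.8, 50.0, 900.0), "b": (0.9, 80.0, 120.0)}   # gain, cost (J), residual (J)
selected = max(clients, key=lambda c: ps_score(*clients[c]))
```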

Crack-Net: Prediction of Crack Propagation in Composites

  • paper_url: http://arxiv.org/abs/2309.13626
  • repo_url: None
  • paper_authors: Hao Xu, Wei Fan, Ambrose C. Taylor, Dongxiao Zhang, Lecheng Ruan, Rundong Shi
  • for: Provide a deep-learning-based prediction of crack propagation in composites, supporting the optimization of material performance and microstructural design in structural applications.
  • methods: A deep learning framework, Crack-Net, that incorporates the relationship between crack evolution and stress response; it is trained on a high-precision fracture development dataset generated with the phase field method, and transfer learning is adopted to improve generalization to composites with reinforcements of different strengths.
  • results: Crack-Net accurately forecasts the long-term evolution of crack growth patterns and the stress-strain curve for a given composite design, and can handle more complex microstructures such as binary co-continuous structures.
    Abstract Computational solid mechanics has become an indispensable approach in engineering, and numerical investigation of fracture in composites is essential as composites are widely used in structural applications. Crack evolution in composites is the bridge to elucidate the relationship between the microstructure and fracture performance, but crack-based finite element methods are computationally expensive and time-consuming, limiting their application in computation-intensive scenarios. Here we propose a deep learning framework called Crack-Net, which incorporates the relationship between crack evolution and stress response to predict the fracture process in composites. Trained on a high-precision fracture development dataset generated using the phase field method, Crack-Net demonstrates a remarkable capability to accurately forecast the long-term evolution of crack growth patterns and the stress-strain curve for a given composite design. The Crack-Net captures the essential principle of crack growth, which enables it to handle more complex microstructures such as binary co-continuous structures. Moreover, transfer learning is adopted to further improve the generalization ability of Crack-Net for composite materials with reinforcements of different strengths. The proposed Crack-Net holds great promise for practical applications in engineering and materials science, in which accurate and efficient fracture prediction is crucial for optimizing material performance and microstructural design.

Reinforcement-Enhanced Autoregressive Feature Transformation: Gradient-steered Search in Continuous Space for Postfix Expressions

  • paper_url: http://arxiv.org/abs/2309.13618
  • repo_url: None
  • paper_authors: Dongjie Wang, Meng Xiao, Min Wu, Pengfei Wang, Yuanchun Zhou, Yanjie Fu
  • for: This paper aims to improve the efficiency and effectiveness of feature transformation for machine learning tasks by reformulating the discrete search space into a continuous optimization task.
  • methods: The proposed method includes four steps: (1) reinforcement-enhanced data preparation, (2) feature transformation operation sequence embedding, (3) gradient-steered optimal embedding search, and (4) transformation operation sequence reconstruction.
  • results: The proposed method is expected to fundamentally fill the gap between efficiency and stability/robustness in feature transformation, and to provide a more effective and efficient way to optimize feature transformation for machine learning tasks.
    Abstract Feature transformation aims to generate new pattern-discriminative feature space from original features to improve downstream machine learning (ML) task performances. However, the discrete search space for the optimal feature explosively grows on the basis of combinations of features and operations from low-order forms to high-order forms. Existing methods, such as exhaustive search, expansion reduction, evolutionary algorithms, reinforcement learning, and iterative greedy, suffer from large search space. Overly emphasizing efficiency in algorithm design usually sacrifices stability or robustness. To fundamentally fill this gap, we reformulate discrete feature transformation as a continuous space optimization task and develop an embedding-optimization-reconstruction framework. This framework includes four steps: 1) reinforcement-enhanced data preparation, aiming to prepare high-quality transformation-accuracy training data; 2) feature transformation operation sequence embedding, intending to encapsulate the knowledge of prepared training data within a continuous space; 3) gradient-steered optimal embedding search, dedicating to uncover potentially superior embeddings within the learned space; 4) transformation operation sequence reconstruction, striving to reproduce the feature transformation solution to pinpoint the optimal feature space.

DPA-WNO: A gray box model for a class of stochastic mechanics problem

  • paper_url: http://arxiv.org/abs/2309.15128
  • repo_url: None
  • paper_authors: Tushar, Souvik Chakraborty
  • for: Address the shortcomings of purely data-driven models (lack of interpretability, data hunger, and poor generalization) through data-physics fusion, in which a data-driven model corrects/identifies the missing physics.
  • methods: A novel Differentiable Physics Augmented Wavelet Neural Operator (DPA-WNO) that blends a differentiable physics solver with a Wavelet Neural Operator (WNO); the WNO models the missing physics, so the framework learns from data while retaining the interpretability and generalizability of physics-based solvers.
  • results: Four benchmark time-dependent uncertainty quantification and reliability analysis problems from different fields of science and engineering are solved with the proposed approach, illustrating its effectiveness.
    Abstract The well-known governing physics in science and engineering is often based on certain assumptions and approximations. Therefore, analyses and designs carried out based on these equations are also approximate. The emergence of data-driven models has, to a certain degree, addressed this challenge; however, the purely data-driven models often (a) lack interpretability, (b) are data-hungry, and (c) do not generalize beyond the training window. Operator learning has recently been proposed as a potential alternative to address the aforementioned challenges; however, the challenges are still persistent. We here argue that one of the possible solutions resides in data-physics fusion, where the data-driven model is used to correct/identify the missing physics. To that end, we propose a novel Differentiable Physics Augmented Wavelet Neural Operator (DPA-WNO). The proposed DPA-WNO blends a differentiable physics solver with the Wavelet Neural Operator (WNO), where the role of WNO is to model the missing physics. This empowers the proposed framework to exploit the capability of WNO to learn from data while retaining the interpretability and generalizability associated with physics-based solvers. We illustrate the applicability of the proposed approach in solving time-dependent uncertainty quantification problems due to randomness in the initial condition. Four benchmark uncertainty quantification and reliability analysis examples from various fields of science and engineering are solved using the proposed approach. The results presented illustrate interesting features of the proposed approach.

Self-Tuning Hamiltonian Monte Carlo for Accelerated Sampling

  • paper_url: http://arxiv.org/abs/2309.13593
  • repo_url: None
  • paper_authors: Henrik Christiansen, Federico Errica, Francesco Alesiani
  • for: Automatically tune the parameters of Hamiltonian Monte Carlo, in particular the integration timestep and the number of integration steps, to promote fast exploration of phase space.
  • methods: A fully differentiable setup optimized with backpropagation; an attention-like loss allows gradient-driven learning of the distribution of integration steps, and jittering is used to keep the loss surface smooth.
  • results: On the one-dimensional harmonic oscillator and alanine dipeptide, the loss correlates well with the autocorrelation times, yielding well-tuned Hamiltonian Monte Carlo parameters.
    Abstract The performance of Hamiltonian Monte Carlo crucially depends on its parameters, in particular the integration timestep and the number of integration steps. We present an adaptive general-purpose framework to automatically tune these parameters based on a loss function which promotes the fast exploration of phase-space. For this, we make use of a fully-differentiable set-up and use backpropagation for optimization. An attention-like loss is defined which allows for the gradient driven learning of the distribution of integration steps. We also highlight the importance of jittering for a smooth loss-surface. Our approach is demonstrated for the one-dimensional harmonic oscillator and alanine dipeptide, a small protein common as a test-case for simulation methods. We find a good correspondence between our loss and the autocorrelation times, resulting in well-tuned parameters for Hamiltonian Monte Carlo.
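
For context, the two parameters being tuned (the timestep `eps` and the number of leapfrog steps `L`) enter a standard HMC update as sketched below for the paper's 1D harmonic-oscillator test case; this is the plain sampler only, not the differentiable, attention-loss-driven tuning scheme itself.

```python
import numpy as np

def hmc_step(q, eps, L, rng, grad_U=lambda q: q, U=lambda q: 0.5 * q ** 2):
    """One HMC step for U(q) = q^2/2 (1D harmonic oscillator)."""
    p = rng.standard_normal()
    q_new, p_new = q, p
    for _ in range(L):                        # leapfrog integration with timestep eps
        p_new -= 0.5 * eps * grad_U(q_new)
        q_new += eps * p_new
        p_new -= 0.5 * eps * grad_U(q_new)
    dH = (U(q_new) + 0.5 * p_new ** 2) - (U(q) + 0.5 * p ** 2)
    return q_new if rng.random() < np.exp(-dH) else q   # Metropolis accept/reject

rng = np.random.default_rng(0)
q, samples = 1.0, []
for _ in range(1000):
    q = hmc_step(q, eps=0.3, L=10, rng=rng)
    samples.append(q)
```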

Robust Distributed Learning: Tight Error Bounds and Breakdown Point under Data Heterogeneity

  • paper_url: http://arxiv.org/abs/2309.13591
  • repo_url: None
  • paper_authors: Youssef Allouah, Rachid Guerraoui, Nirupam Gupta, Rafaël Pinot, Geovani Rizk
  • for: Study the theoretical underpinnings of robust distributed learning algorithms designed to resist adversarial machines.
  • methods: Distributed learning under data heterogeneity is analyzed using the (G,B)-gradient dissimilarity model, a more realistic heterogeneity model that covers a larger class of learning problems than existing theory.
  • results: Existing lower bounds on the learning error are essentially vacuous under practical data heterogeneity, and the breakdown point under heterogeneity is lower than the classical fraction 1/2; the paper proves a new lower bound on the learning error of any distributed learning algorithm, derives a matching upper bound for a robust variant of distributed gradient descent, and validates the analysis empirically.
    Abstract The theory underlying robust distributed learning algorithms, designed to resist adversarial machines, matches empirical observations when data is homogeneous. Under data heterogeneity however, which is the norm in practical scenarios, established lower bounds on the learning error are essentially vacuous and greatly mismatch empirical observations. This is because the heterogeneity model considered is too restrictive and does not cover basic learning tasks such as least-squares regression. We consider in this paper a more realistic heterogeneity model, namely (G,B)-gradient dissimilarity, and show that it covers a larger class of learning problems than existing theory. Notably, we show that the breakdown point under heterogeneity is lower than the classical fraction 1/2. We also prove a new lower bound on the learning error of any distributed learning algorithm. We derive a matching upper bound for a robust variant of distributed gradient descent, and empirically show that our analysis reduces the gap between theory and practice.

Physics Informed Neural Network Code for 2D Transient Problems (PINN-2DT) Compatible with Google Colab

  • paper_url: http://arxiv.org/abs/2310.03755
  • repo_url: None
  • paper_authors: Paweł Maczuga, Maciej Skoczeń, Przemysław Rożnawski, Filip Tłuszcz, Marcin Szubert, Marcin Łoś, Witold Dzwinel, Keshav Pingali, Maciej Paszyński
  • for: The paper presents an open-source Physics Informed Neural Network (PINN) environment for simulating transient phenomena on two-dimensional rectangular domains.
  • methods: The PINN environment uses a neural network to solve time-dependent partial differential equations (PDEs) and supports various boundary conditions, including Neumann and Dirichlet conditions. It also allows customization of the number of layers and neurons per layer, as well as arbitrary activation functions.
  • results: The environment provides a simple interface for defining the residual loss, boundary condition loss, and initial loss, together with their weights, and includes a library of problems such as non-stationary heat transfer, a wave equation modeling a tsunami, atmospheric simulations including thermal inversion, and tumor growth simulations.
    Abstract We present an open-source Physics Informed Neural Network environment for simulations of transient phenomena on two-dimensional rectangular domains, with the following features: (1) it is compatible with Google Colab, which allows automatic execution in a cloud environment; (2) it supports two dimensional time-dependent PDEs; (3) it provides a simple interface for definition of the residual loss, boundary condition loss and initial loss, together with their weights; (4) it supports Neumann and Dirichlet boundary conditions; (5) it allows for customizing the number of layers and neurons per layer, as well as for an arbitrary activation function; (6) the learning rate and number of epochs are available as parameters; (7) it automatically differentiates PINN with respect to spatial and temporal variables; (8) it provides routines for plotting the convergence (with running average), initial conditions learnt, 2D and 3D snapshots from the simulation and movies; (9) it includes a library of problems: (a) non-stationary heat transfer; (b) wave equation modeling a tsunami; (c) atmospheric simulations including thermal inversion; (d) tumor growth simulations.
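
Feature (7), automatic differentiation of the PINN with respect to spatial and temporal variables, is the core mechanic; a minimal PyTorch sketch of a 2D transient residual follows (here a heat equation, with the network size, diffusivity, and sampling purely illustrative, not the library's API).

```python
import torch

net = torch.nn.Sequential(torch.nn.Linear(3, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))

def heat_residual(x, y, t, alpha=0.1):
    """PDE residual u_t - alpha*(u_xx + u_yy) for the candidate solution u = net(x, y, t)."""
    x, y, t = (v.clone().requires_grad_(True) for v in (x, y, t))
    u = net(torch.cat([x, y, t], dim=1))
    grad = lambda out, var: torch.autograd.grad(out, var, torch.ones_like(out), create_graph=True)[0]
    u_t, u_x, u_y = grad(u, t), grad(u, x), grad(u, y)
    u_xx, u_yy = grad(u_x, x), grad(u_y, y)
    return u_t - alpha * (u_xx + u_yy)

x, y, t = (torch.rand(128, 1) for _ in range(3))
residual_loss = heat_residual(x, y, t).pow(2).mean()   # combined with weighted BC/IC losses in practice
```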

Graph-enhanced Optimizers for Structure-aware Recommendation Embedding Evolution

  • paper_url: http://arxiv.org/abs/2310.03032
  • repo_url: None
  • paper_authors: Cong Xu, Jun Wang, Jianyong Wang, Wei Zhang
  • for: Improve embeddings in modern recommender systems, which serve as virtual representations of real-world entities and as the foundation for subsequent decision models.
  • methods: A new embedding update mechanism, Structure-aware Embedding Evolution (SEvo), that encourages related nodes to evolve similarly at each step; unlike a GNN (Graph Neural Network), which typically serves as an intermediate module, SEvo injects graph structure information directly into the embedding with negligible computational overhead in training.
  • results: SEvo improves recommendation performance and can be seamlessly integrated into existing optimizers; in particular, SEvo-enhanced AdamW with moment estimate correction shows consistent improvements across a range of models and datasets.
    Abstract Embedding plays a critical role in modern recommender systems because they are virtual representations of real-world entities and the foundation for subsequent decision models. In this paper, we propose a novel embedding update mechanism, Structure-aware Embedding Evolution (SEvo for short), to encourage related nodes to evolve similarly at each step. Unlike GNN (Graph Neural Network) that typically serves as an intermediate part, SEvo is able to directly inject the graph structure information into embedding with negligible computational overhead in training. The convergence properties of SEvo as well as its possible variants are theoretically analyzed to justify the validity of the designs. Moreover, SEvo can be seamlessly integrated into existing optimizers for state-of-the-art performance. In particular, SEvo-enhanced AdamW with moment estimate correction demonstrates consistent improvements across a spectrum of models and datasets, suggesting a novel technical route to effectively utilize graph structure information beyond explicit GNN modules.
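
The idea of making related nodes evolve similarly can be pictured as smoothing each optimizer update over the interaction graph before applying it; the propagation rule below is a hypothetical illustration (the mixing rule, `beta`, and the number of hops are assumptions, not SEvo's actual mechanism).

```python
import numpy as np

def structure_aware_update(delta, adj, beta=0.5, hops=2):
    """Mix each node's embedding update with its neighbours' via a row-normalized
    adjacency, so connected entities receive similar updates (hypothetical rule)."""
    a_norm = adj / adj.sum(axis=1, keepdims=True).clip(min=1.0)
    out = delta.copy()
    for _ in range(hops):
        out = (1.0 - beta) * delta + beta * (a_norm @ out)
    return out

rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)   # 3-node chain
emb = np.zeros((3, 4))
emb -= 0.01 * structure_aware_update(rng.standard_normal((3, 4)), adj)
```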

Tackling the Unlimited Staleness in Federated Learning with Intertwined Data and Device Heterogeneities

  • paper_url: http://arxiv.org/abs/2309.13536
  • repo_url: https://github.com/pittisl/fl-with-intertwined-heterogeneity
  • paper_authors: Haoming Wang, Wei Gao
  • for: Improve the efficiency of federated learning (FL) in the presence of intertwined data and device heterogeneities.
  • methods: A new FL framework that uses the gradient inversion technique to convert stale client model updates into non-stale ones.
  • results: Experiments show that, when tackling unlimited staleness, the approach improves trained model accuracy by up to 20% and speeds up FL training progress by up to 35%.
    Abstract The efficiency of Federated Learning (FL) is often affected by both data and device heterogeneities. Data heterogeneity is defined as the heterogeneity of data distributions on different clients. Device heterogeneity is defined as the clients' variant latencies in uploading their local model updates due to heterogeneous conditions of local hardware resources, and causes the problem of staleness when being addressed by asynchronous FL. Traditional schemes of tackling the impact of staleness consider data and device heterogeneities as two separate and independent aspects in FL, but this assumption is unrealistic in many practical FL scenarios where data and device heterogeneities are intertwined. In these cases, traditional schemes of weighted aggregation in FL have been proved to be ineffective, and a better approach is to convert a stale model update into a non-stale one. In this paper, we present a new FL framework that leverages the gradient inversion technique for such conversion, hence efficiently tackling unlimited staleness in clients' model updates. Our basic idea is to use gradient inversion to get estimations of clients' local training data from their uploaded stale model updates, and use these estimations to compute non-stale client model updates. In this way, we address the problem of possible data quality drop when using gradient inversion, while still preserving the clients' local data privacy. We compared our approach with the existing FL strategies on mainstream datasets and models, and experiment results demonstrate that when tackling unlimited staleness, our approach can significantly improve the trained model accuracy by up to 20% and speed up the FL training progress by up to 35%.

Data-Driven Modeling of an Unsaturated Bentonite Buffer Model Test Under High Temperatures Using an Enhanced Axisymmetric Reproducing Kernel Particle Method

  • paper_url: http://arxiv.org/abs/2309.13519
  • repo_url: None
  • paper_authors: Jonghyuk Baek, Yanran Wang, Xiaolong He, Yu Lu, John S. McCartney, J. S. Chen
  • for: Model the thermo-hydro-mechanical (THM) behavior of unsaturated bentonite buffers under the high temperatures encountered in deep geological repositories for high-level nuclear waste.
  • methods: A deep neural network (DNN)-based soil-water retention curve (SWRC) of bentonite, with temperature as an additional input, integrated into a Reproducing Kernel Particle Method (RKPM) for THM simulations.
  • results: The approach is demonstrated by modeling a tank-scale experiment in which a cylindrical layer of MX-80 bentonite is exposed to central heating; new axisymmetric reproducing kernel basis functions are developed for effective modeling of the test.
    Abstract In deep geological repositories for high level nuclear waste with close canister spacings, bentonite buffers can experience temperatures higher than 100 {\deg}C. In this range of extreme temperatures, phenomenological constitutive laws face limitations in capturing the thermo-hydro-mechanical (THM) behavior of the bentonite, since the pre-defined functional constitutive laws often lack generality and flexibility to capture a wide range of complex coupling phenomena as well as the effects of stress state and path dependency. In this work, a deep neural network (DNN)-based soil-water retention curve (SWRC) of bentonite is introduced and integrated into a Reproducing Kernel Particle Method (RKPM) for conducting THM simulations of the bentonite buffer. The DNN-SWRC model incorporates temperature as an additional input variable, allowing it to learn the relationship between suction and degree of saturation under the general non-isothermal condition, which is difficult to represent using a phenomenological SWRC. For effective modeling of the tank-scale test, new axisymmetric Reproducing Kernel basis functions enriched with singular Dirichlet enforcement representing heater placement and an effective convective heat transfer coefficient representing thin-layer composite tank construction are developed. The proposed method is demonstrated through the modeling of a tank-scale experiment involving a cylindrical layer of MX-80 bentonite exposed to central heating.
    摘要 在罐体间距较小的高放射性核废料深层地质处置库中,膨润土缓冲层可能经受超过 100°C 的高温。在这种极端温度范围内,唯象本构关系在刻画膨润土的热-水-力(THM)行为时存在局限,因为预先设定的函数形式往往缺乏足够的普适性和灵活性,难以描述各种复杂的耦合现象以及应力状态和路径依赖的影响。在本工作中,我们引入基于深度神经网络(DNN)的膨润土土水特征曲线(SWRC)模型,并将其集成到再生核粒子法(RKPM)中,用于膨润土缓冲层的 THM 模拟。DNN-SWRC 模型把温度作为额外输入变量,从而能够学习一般非等温条件下吸力与饱和度之间的关系,而这是唯象 SWRC 难以表达的。为了有效模拟罐体尺度试验,我们还发展了新的轴对称再生核基函数,其中包含表示加热器布置的奇异 Dirichlet 强加项,以及表示薄层复合罐体构造的等效对流换热系数。最后,通过对一个中心加热的 MX-80 膨润土圆柱层罐体尺度试验进行建模,演示了所提出的方法。

eess.IV - 2023-09-24

Autopet Challenge 2023: nnUNet-based whole-body 3D PET-CT Tumour Segmentation

  • paper_url: http://arxiv.org/abs/2309.13675
  • repo_url: None
  • paper_authors: Anissa Alloula, Daniel R McGowan, Bartłomiej W. Papież
  • for: 这个论文的目的是用 nnUNet 进行全身 PET-CT 扫描中的肿瘤分割,并对不同的训练和后处理策略进行调查。
  • methods: 这个论文使用的方法是 nnUNet,并对不同的训练和后处理策略进行了比较。
  • results: 这个论文的最佳模型在内部测试集上获得了 69% 的 Dice 分数,假阴性体积为 6.27 mL,假阳性体积为 5.78 mL。
    Abstract Fluorodeoxyglucose Positron Emission Tomography (FDG-PET) combined with Computed Tomography (CT) scans are critical in oncology to the identification of solid tumours and the monitoring of their progression. However, precise and consistent lesion segmentation remains challenging, as manual segmentation is time-consuming and subject to intra- and inter-observer variability. Despite their promise, automated segmentation methods often struggle with false positive segmentation of regions of healthy metabolic activity, particularly when presented with such a complex range of tumours across the whole body. In this paper, we explore the application of the nnUNet to tumour segmentation of whole-body PET-CT scans and conduct different experiments on optimal training and post-processing strategies. Our best model obtains a Dice score of 69\% and a false negative and false positive volume of 6.27 and 5.78 mL respectively, on our internal test set. This model is submitted as part of the autoPET 2023 challenge. Our code is available at: https://github.com/anissa218/autopet\_nnunet
    摘要 氟代脱氧葡萄糖正电子发射断层扫描(FDG-PET)与计算机断层扫描(CT)相结合,对肿瘤学中实体瘤的识别及其进展的监测至关重要。然而,精确且一致的病灶分割仍然具有挑战性:手动分割耗时,且存在观察者内和观察者间的差异。尽管自动分割方法前景可观,它们常常会把健康组织的正常代谢活动误分割为假阳性区域,尤其是在面对全身范围内形态各异的肿瘤时。在这篇论文中,我们探讨将 nnUNet 应用于全身 PET-CT 扫描的肿瘤分割,并对最优的训练与后处理策略进行了不同的实验。我们的最佳模型在内部测试集上获得了 69% 的 Dice 分数,假阴性体积为 6.27 mL,假阳性体积为 5.78 mL。该模型已作为 autoPET 2023 挑战赛的一部分提交。我们的代码可以在以下链接中找到:https://github.com/anissa218/autopet_nnunet。

Sparsity-regularized coded ptychography for robust and efficient lensless microscopy on a chip

  • paper_url: http://arxiv.org/abs/2309.13611
  • repo_url: None
  • paper_authors: Ninghe Liu, Qianhao Zhao, Guoan Zheng
  • for: 在降低测量次数的同时保持叠层成像(ptychographic imaging)的高分辨率和重建质量
  • methods: 利用稀疏先验将重建任务表述为总变差(total variation)正则化优化问题,并提出 PPTV 求解器进行求解
  • results: 只需八次强度测量即可生成高精度的重建图像,并在光学实验平台上得到了验证
    Abstract In ptychographic imaging, the trade-off between the number of acquisitions and the resultant imaging quality presents a complex optimization problem. Increasing the number of acquisitions typically yields reconstructions with higher spatial resolution and finer details. Conversely, a reduction in measurement frequency often compromises the quality of the reconstructed images, manifesting as increased noise and coarser details. To address this challenge, we employ sparsity priors to reformulate the ptychographic reconstruction task as a total variation regularized optimization problem. We introduce a new computational framework, termed the ptychographic proximal total-variation (PPTV) solver, designed to integrate into existing ptychography settings without necessitating hardware modifications. Through comprehensive numerical simulations, we validate that PPTV-driven coded ptychography is capable of producing highly accurate reconstructions with a minimal set of eight intensity measurements. Convergence analysis further substantiates the robustness, stability, and computational feasibility of the proposed PPTV algorithm. Experimental results obtained from optical setups unequivocally demonstrate that the PPTV algorithm facilitates high-throughput, high-resolution imaging while significantly reducing the measurement burden. These findings indicate that the PPTV algorithm has the potential to substantially mitigate the resource-intensive requirements traditionally associated with high-quality ptychographic imaging, thereby offering a pathway toward the development of more compact and efficient ptychographic microscopy systems.
    摘要 在叠层成像(ptychographic imaging)中,采集次数与成像质量之间的权衡构成了一个复杂的优化问题:增加采集次数通常可以获得空间分辨率更高、细节更丰富的重建结果;反之,降低测量次数往往会损害重建图像质量,表现为噪声增大、细节变粗。为了解决这一挑战,我们利用稀疏先验,将叠层成像重建任务重新表述为一个总变差(total variation)正则化优化问题,并提出了一个新的计算框架——叠层成像近端总变差(PPTV)求解器,它可以直接集成到现有的叠层成像系统中,无需修改硬件。通过全面的数值模拟,我们验证了由 PPTV 驱动的编码叠层成像仅需八次强度测量即可生成高精度的重建结果。收敛性分析进一步证明了所提出的 PPTV 算法的鲁棒性、稳定性和计算可行性。光学实验结果清楚地表明,PPTV 算法能够在显著降低测量负担的同时实现高通量、高分辨率成像。这些结果表明,PPTV 算法有望大幅缓解高质量叠层成像传统上对资源的高度依赖,为更紧凑、更高效的叠层显微系统的发展开辟道路。
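The abstract frames reconstruction as a total-variation regularized optimization solved by a proximal method. A generic sketch of such an iteration (gradient step on the data term, TV-denoising proximal step) is shown below; the Gaussian-blur forward model, step size, and TV weight are illustrative stand-ins for the actual coded-ptychography operator and PPTV solver.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from skimage.restoration import denoise_tv_chambolle

def forward(x):
    """Illustrative linear forward model (a blur), standing in for the real
    coded-ptychography operator mapping the object to intensity measurements."""
    return gaussian_filter(x, sigma=2.0)

def adjoint(r):
    return gaussian_filter(r, sigma=2.0)   # the Gaussian blur is self-adjoint

def proximal_tv_reconstruct(y, n_iter=100, step=1.0, tv_weight=0.02):
    """ISTA-style iteration: gradient step on ||A x - y||^2, then a TV prox."""
    x = np.zeros_like(y)
    for _ in range(n_iter):
        grad = adjoint(forward(x) - y)
        x = denoise_tv_chambolle(x - step * grad, weight=tv_weight)
    return x

# toy example: a piecewise-constant object observed through the blur with noise
truth = np.zeros((64, 64))
truth[16:48, 20:40] = 1.0
y = forward(truth) + 0.01 * np.random.default_rng(0).normal(size=truth.shape)
recon = proximal_tv_reconstruct(y)
print(float(np.mean((recon - truth) ** 2)))
```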

MediViSTA-SAM: Zero-shot Medical Video Analysis with Spatio-temporal SAM Adaptation

  • paper_url: http://arxiv.org/abs/2309.13539
  • repo_url: None
  • paper_authors: Sekeun Kim, Kyungsang Kim, Jiang Hu, Cheng Chen, Zhiliang Lyu, Ren Hui, Sunghwan Kim, Zhengliang Liu, Aoxiao Zhong, Xiang Li, Tianming Liu, Quanzheng Li
  • for: 这项研究旨在将 Segmentation Anything Model(SAM)适配到医学视频分割任务中。
  • methods: 研究提出了一种名为 MediViSTA-SAM 的医学视频分割新方法,采用带跨帧注意力机制的时空适配器,并通过 U 形编码器与改进的掩码解码器实现多尺度融合。
  • results: 实验结果表明,MediViSTA-SAM 在医学视频分割任务中具有较高的准确性和有效性。
    Abstract In recent years, the Segmentation Anything Model (SAM) has attracted considerable attention as a foundational model well-known for its robust generalization capabilities across various downstream tasks. However, SAM does not exhibit satisfactory performance in the realm of medical image analysis. In this study, we introduce the first study on adapting SAM on video segmentation, called MediViSTA-SAM, a novel approach designed for medical video segmentation. Given video data, MediViSTA, spatio-temporal adapter captures long and short range temporal attention with cross-frame attention mechanism effectively constraining it to consider the immediately preceding video frame as a reference, while also considering spatial information effectively. Additionally, it incorporates multi-scale fusion by employing a U-shaped encoder and a modified mask decoder to handle objects of varying sizes. To evaluate our approach, extensive experiments were conducted using state-of-the-art (SOTA) methods, assessing its generalization abilities on multi-vendor in-house echocardiography datasets. The results highlight the accuracy and effectiveness of our network in medical video segmentation.
    摘要 近年来,Segmentation Anything Model(SAM)作为一种以跨下游任务的强泛化能力著称的基础模型受到了广泛关注。然而,SAM 在医学影像分析领域的表现并不令人满意。在这项研究中,我们首次探讨了将 SAM 适配到视频分割任务上,提出了一种面向医学视频分割的新方法 MediViSTA-SAM。给定视频数据,MediViSTA 的时空适配器通过跨帧注意力机制捕获长短时程的时间注意力,将参考有效约束为紧邻的前一视频帧,同时也充分利用空间信息。此外,它通过 U 形编码器和改进的掩码解码器实现多尺度融合,以处理不同尺寸的目标。为了评估该方法,我们使用最新的(SOTA)方法进行了大量实验,在多厂商的内部超声心动图数据集上考察其泛化能力。结果突显了我们的网络在医学视频分割中的准确性和有效性。

Deep learning based workflow for accelerated industrial X-ray Computed Tomography

  • paper_url: http://arxiv.org/abs/2309.14371
  • repo_url: None
  • paper_authors: Obaidullah Rahman, Singanallur V. Venkatakrishnan, Luke Scime, Paul Brackman, Curtis Frederick, Ryan Dehoff, Vincent Paquit, Amirkoushyar Ziabari
  • for: 用于对增材制造金属部件进行高分辨率无损表征
  • methods: 使用两个神经网络,从稀疏视角扫描中获得加速的高质量重建
  • results: 能够准确地检测缺陷与瑕疵,并且无需重新训练即可在多种合金和不同稀疏程度的数据上稳健泛化
    Abstract X-ray computed tomography (XCT) is an important tool for high-resolution non-destructive characterization of additively-manufactured metal components. XCT reconstructions of metal components may have beam hardening artifacts such as cupping and streaking which makes reliable detection of flaws and defects challenging. Furthermore, traditional workflows based on using analytic reconstruction algorithms require a large number of projections for accurate characterization - leading to longer measurement times and hindering the adoption of XCT for in-line inspections. In this paper, we introduce a new workflow based on the use of two neural networks to obtain high-quality accelerated reconstructions from sparse-view XCT scans of single material metal parts. The first network, implemented using fully-connected layers, helps reduce the impact of BH in the projection data without the need of any calibration or knowledge of the component material. The second network, a convolutional neural network, maps a low-quality analytic 3D reconstruction to a high-quality reconstruction. Using experimental data, we demonstrate that our method robustly generalizes across several alloys, and for a range of sparsity levels without any need for retraining the networks thereby enabling accurate and fast industrial XCT inspections.
    摘要 X 射线计算机断层扫描(XCT)是对增材制造金属部件进行高分辨率无损表征的重要工具。金属部件的 XCT 重建结果可能出现射束硬化伪影(如杯状伪影和条纹伪影),使得缺陷和瑕疵的可靠检测变得困难。此外,基于解析重建算法的传统工作流程需要大量投影才能实现准确表征,导致测量时间较长,阻碍了 XCT 在产线检测中的应用。在本文中,我们提出了一种新的工作流程,利用两个神经网络,从单一材料金属部件的稀疏视角 XCT 扫描中获得高质量的加速重建。第一个网络由全连接层构成,无需任何标定或部件材料的先验知识,即可减小投影数据中射束硬化的影响;第二个网络是卷积神经网络,将低质量的解析三维重建映射为高质量重建。基于实验数据,我们证明了该方法能够在多种合金和不同稀疏程度下稳健泛化,且无需重新训练网络,从而实现准确、快速的工业 XCT 检测。

eess.SP - 2023-09-24

Non-Uniform Sampling Reconstruction for Symmetrical NMR Spectroscopy by Exploiting Inherent Symmetry

  • paper_url: http://arxiv.org/abs/2309.13660
  • repo_url: None
  • paper_authors: Enping Lin, Ze Fang, Yuqing Huang, Yu Yang, Zhong Chen
  • for: 本文面向使用核磁共振(NMR)波谱研究生物大分子的研究人员,尤其是使用多维 NMR 波谱和非均匀采样(NUS)技术的研究者。
  • methods: 本文提出了一种新的采样方案 SCPG(Symmetrical Copy Poisson Gap),并使用压缩感知(CS)方法进行重建。作者从理论上证明,当 SCPG 与 CS 方法结合时,除稀疏性之外,对称性约束也被隐式地施加。
  • results: 模拟和实验数据均表明,在对称 NMR 波谱的 NUS 重建中,SCPG 采样方案优于最先进的 2D Woven PG。
    Abstract Symmetrical NMR spectroscopy constitutes a vital branch of multidimensional NMR spectroscopy, providing a powerful tool for the structural elucidation of biological macromolecules. Non-Uniform Sampling (NUS) serves as an effective strategy for averting the prohibitive acquisition time of multidimensional NMR spectroscopy by only sampling a few points according to NUS sampling schedules and reconstructing missing points via algorithms. However, current sampling schedules are unable to maintain the accurate recovery of cross peaks that are weak but important. In this work, we propose a novel sampling schedule termed as SCPG (Symmetrical Copy Poisson Gap) and employ CS (Compressed Sensing) methods for reconstruction. We theoretically prove that the symmetrical constraint, apart from sparsity, is implicitly implemented when SCPG is combined with CS methods. The simulated and experimental data substantiate the advantage of SCPG over state-of-the-art 2D Woven PG in the NUS reconstruction of symmetrical NMR spectroscopy.
    摘要 对称 NMR 波谱是多维 NMR 波谱学的重要分支,为生物大分子的结构解析提供了有力工具。非均匀采样(NUS)是避免多维 NMR 波谱采集时间过长的有效策略:按照 NUS 采样方案只采集少量数据点,再通过算法重建缺失的数据点。然而,现有的采样方案难以保证对微弱但重要的交叉峰的准确恢复。在本工作中,我们提出了一种新的采样方案 SCPG(Symmetrical Copy Poisson Gap),并采用压缩感知(CS)方法进行重建。我们从理论上证明,当 SCPG 与 CS 方法结合时,除了稀疏性之外,对称性约束也被隐式地施加。模拟和实验数据都证实,在对称 NMR 波谱的 NUS 重建中,SCPG 优于最先进的 2D Woven PG。

6G Positioning and Sensing Through the Lens of Sustainability, Inclusiveness, and Trustworthiness

  • paper_url: http://arxiv.org/abs/2309.13602
  • repo_url: None
  • paper_authors: Henk Wymeersch, Hui Chen, Hao Guo, Musa Furkan Keskin, Bahare M. Khorsandi, Mohammad H. Moghaddam, Alejandro Ramirez, Kim Schindhelm, Athanasios Stavridis, Tommy Svensson, Vijaya Yajnanarayana
  • for: 本研究旨在探讨6G技术如何体现可持续、包容和可信这三类价值指标,以及它们与传统关键性能指标之间的关系。
  • methods: 本研究采用文献综述和理论分析的方法,探讨6G定位与感知在可持续性、包容性和可信性方面的实现方式,以及这些价值指标与传统性能指标之间的关系。
  • results: 本研究发现,6G可以通过定位与感知的深度集成来提升通信性能,同时体现包容性和可信性等价值;这些价值指标与传统性能指标之间相互交织,需要在6G的设计和实现中统筹考虑。
    Abstract 6G promises a paradigm shift in which positioning and sensing are inherently integrated, enhancing not only the communication performance but also enabling location- and context-aware services. Historically, positioning and sensing have been viewed through the lens of cost and performance trade-offs, implying an escalated demand for resources, such as radio, physical, and computational resources, for improved performance. However, 6G goes beyond this traditional perspective to encompass a set of broader values, namely sustainability, inclusiveness, and trustworthiness. This paper aims to: (i) shed light on these important value indicators and their relationship with the conventional key performance indicators, and (ii) unveil the dual nature of 6G in relation to these key value indicators (i.e., ensuring operation according to the values and enabling services that affect the values).
    摘要 6G 承诺带来一种范式转变:定位与感知将被内生地集成,不仅提升通信性能,还能支撑位置感知和情境感知的服务。历史上,定位与感知往往被置于成本与性能权衡的视角之下,即为了提升性能需要投入更多的无线、物理和计算资源。然而,6G 超越了这一传统视角,涵盖了一组更广泛的价值:可持续性、包容性和可信性。本文旨在:(i)阐明这些重要的价值指标及其与传统关键性能指标之间的关系;(ii)揭示 6G 对这些关键价值指标的双重属性(既要确保运行符合这些价值,又要支撑会影响这些价值的服务)。

Identification of Ghost Targets for Automotive Radar in the Presence of Multipath

  • paper_url: http://arxiv.org/abs/2309.13585
  • repo_url: None
  • paper_authors: Le Zheng, Jiamin Long, Marco Lops, Fan Liu, Xueyao Hu
  • for: The paper is written for detecting the presence of ghosts in automotive radar systems due to multipath.
  • methods: The paper uses a composite hypothesis testing approach based on the Generalized Likelihood Ratio Test (GLRT) philosophy, combined with a sparsity-enforced Compressed Sensing (CS) approach and Levenberg-Marquardt (LM) optimization to estimate the angular parameters in the continuous domain.
  • results: The paper provides an extensive performance analysis to validate the proposed solution for detecting ghosts in automotive radar systems.
    Abstract Colocated multiple-input multiple-output (MIMO) technology has been widely used in automotive radars as it provides accurate angular estimation of the objects with relatively small number of transmitting and receiving antennas. Since the Direction Of Departure (DOD) and the Direction Of Arrival (DOA) of line-of-sight targets coincide, MIMO signal processing allows forming a larger virtual array for angle finding. However, multiple paths impinging the receiver is a major limiting factor, in that radar signals may bounce off obstacles, creating echoes for which the DOD does not equal the DOA. Thus, in complex scenarios with multiple scatterers, the direct paths of the intended targets may be corrupted by indirect paths from other objects, which leads to inaccurate angle estimation or ghost targets. In this paper, we focus on detecting the presence of ghosts due to multipath by regarding it as the problem of deciding between a composite hypothesis, ${\cal H}_0$ say, that the observations only contain an unknown number of direct paths sharing the same (unknown) DOD's and DOA's, and a composite alternative, ${\cal H}_1$ say, that the observations also contain an unknown number of indirect paths, for which DOD's and DOA's do not coincide. We exploit the Generalized Likelihood Ratio Test (GLRT) philosophy to determine the detector structure, wherein the unknown parameters are replaced by carefully designed estimators. The angles of both the active direct paths and of the multi-paths are indeed estimated through a sparsity-enforced Compressed Sensing (CS) approach with Levenberg-Marquardt (LM) optimization to estimate the angular parameters in the continuous domain. An extensive performance analysis is finally offered in order to validate the proposed solution.
    摘要 同址多输入多输出(MIMO)技术已广泛应用于车载雷达,因为它只需相对较少的发射和接收天线即可对目标进行精确的角度估计。由于视距目标的出发角(DOD)与到达角(DOA)相同,MIMO 信号处理可以构成更大的虚拟阵列用于测角。然而,多径信号到达接收机是一个主要的限制因素:雷达信号可能经障碍物反射,产生 DOD 与 DOA 不相等的回波。因此,在存在多个散射体的复杂场景中,目标的直接路径可能被来自其他物体的间接路径所干扰,导致角度估计不准确或出现鬼影目标。在本文中,我们把多径引起的鬼影检测表述为一个复合假设检验问题:在假设 ${\cal H}_0$ 下,观测只包含数量未知、DOD 与 DOA 相同(但未知)的直接路径;在备择假设 ${\cal H}_1$ 下,观测还包含数量未知、DOD 与 DOA 不一致的间接路径。我们利用广义似然比检验(GLRT)的思想确定检测器结构,其中未知参数由精心设计的估计量替代。直接路径与多径的角度通过施加稀疏性的压缩感知(CS)方法并结合 Levenberg-Marquardt(LM)优化在连续域中估计。最后,我们给出了广泛的性能分析以验证所提出的方案。

  • paper_url: http://arxiv.org/abs/2309.13545
  • repo_url: None
  • paper_authors: An Chen, Wenbo Xu, Liyang Lu, Yue Wang
  • for: 提升5G无线通信系统中大规模多输入多输出(MIMO)的频谱效率和能量效率,避免过高的导频开销。
  • methods: 将模型驱动的压缩感知(CS)与数据驱动的深度展开(deep unrolling)技术相结合,构建混合信道估计方案,包括粗估计部分和精细校正部分,分别利用帧间与帧内的稀疏性来大幅降低导频开销。
  • results: 理论结果说明了精细校正网络与粗估计网络的收敛性;仿真结果表明,该方案能够在低导频开销下估计MIMO信道,同时以较低的复杂度保证估计精度。
    Abstract Massive multiple-input multiple-output (MIMO) enjoys great advantage in 5G wireless communication systems owing to its spectrum and energy efficiency. However, hundreds of antennas require large volumes of pilot overhead to guarantee reliable channel estimation in FDD massive MIMO system. Compressive sensing (CS) has been applied for channel estimation by exploiting the inherent sparse structure of massive MIMO channel but suffer from high complexity. To overcome this challenge, this paper develops a hybrid channel estimation scheme by integrating the model-driven CS and data-driven deep unrolling technique. The proposed scheme consists of a coarse estimation part and a fine correction part to respectively exploit the inter- and intraframe sparsities of channels to greatly reduce the pilot overhead. Theoretical result is provided to indicate the convergence of the fine correction and coarse estimation net. Simulation results are provided to verify that our scheme can estimate MIMO channels with low pilot overhead while guaranteeing estimation accuracy with relatively low complexity.
    摘要 大规模多输入多输出(MIMO)凭借其频谱效率和能量效率,在5G无线通信系统中具有显著优势。然而,成百上千的天线意味着在FDD大规模MIMO系统中需要大量导频开销才能保证可靠的信道估计。压缩感知(CS)利用大规模MIMO信道固有的稀疏结构被用于信道估计,但其复杂度较高。为克服这一挑战,本文将模型驱动的CS与数据驱动的深度展开(deep unrolling)技术相结合,提出了一种混合信道估计方案。该方案由粗估计部分和精细校正部分组成,分别利用信道的帧间与帧内稀疏性,从而大幅降低导频开销。我们给出了理论结果,说明精细校正网络与粗估计网络的收敛性。仿真结果验证了该方案能够在低导频开销下估计MIMO信道,同时以相对较低的复杂度保证估计精度。
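The coarse, model-driven CS stage of a hybrid estimator like the one described can be illustrated with a standard greedy sparse recovery such as orthogonal matching pursuit over an angular (DFT) dictionary. The dimensions, pilot matrix, and sparsity level below are illustrative, and this is not the paper's network-based scheme.

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal matching pursuit: recover a k-sparse x with y ~= A @ x."""
    residual, support = y.copy(), []
    x = np.zeros(A.shape[1], dtype=complex)
    for _ in range(k):
        j = int(np.argmax(np.abs(A.conj().T @ residual)))   # most correlated atom
        support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x[support] = coef
    return x

rng = np.random.default_rng(0)
n_antennas, n_pilots, sparsity = 64, 16, 3          # few pilots, sparse angular channel

F = np.fft.fft(np.eye(n_antennas)) / np.sqrt(n_antennas)   # angular (DFT) dictionary
h_angular = np.zeros(n_antennas, dtype=complex)
paths = rng.choice(n_antennas, sparsity, replace=False)
h_angular[paths] = rng.normal(size=sparsity) + 1j * rng.normal(size=sparsity)
h = F @ h_angular                                            # spatial channel

P = (rng.normal(size=(n_pilots, n_antennas)) +
     1j * rng.normal(size=(n_pilots, n_antennas))) / np.sqrt(2 * n_pilots)
y = P @ h + 0.01 * (rng.normal(size=n_pilots) + 1j * rng.normal(size=n_pilots))

h_hat = F @ omp(P @ F, y, sparsity)                          # estimate with 16 pilots only
print(np.linalg.norm(h - h_hat) / np.linalg.norm(h))
```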

cs.SD - 2023-09-23

Attention Is All You Need For Blind Room Volume Estimation

  • paper_url: http://arxiv.org/abs/2309.13504
  • repo_url: None
  • paper_authors: Chunxi Wang, Maoshen Jia, Meiran Li, Changchun Bao, Wenyu Jin
  • for: 这篇论文针对声学环境的动态参数化问题,具体研究室内声学参数(房间体积)的盲估计。
  • methods: 提出了一种纯注意力(attention-based)模型,以 Gammatone 幅度谱系数和相位谱图为输入进行房间体积盲估计,并利用跨模态迁移学习提升模型性能。
  • results: 实验结果表明,所提模型在真实声学环境中表现出色,尤其是在采用专门的预训练和数据增强方案时,在各种声学空间上均优于传统的 CNN 模型。
    Abstract In recent years, dynamic parameterization of acoustic environments has raised increasing attention in the field of audio processing. One of the key parameters that characterize the local room acoustics in isolation from orientation and directivity of sources and receivers is the geometric room volume. Convolutional neural networks (CNNs) have been widely selected as the main models for conducting blind room acoustic parameter estimation, which aims to learn a direct mapping from audio spectrograms to corresponding labels. With the recent trend of self-attention mechanisms, this paper introduces a purely attention-based model to blindly estimate room volumes based on single-channel noisy speech signals. We demonstrate the feasibility of eliminating the reliance on CNN for this task and the proposed Transformer architecture takes Gammatone magnitude spectral coefficients and phase spectrograms as inputs. To enhance the model performance given the task-specific dataset, cross-modality transfer learning is also applied. Experimental results demonstrate that the proposed model outperforms traditional CNN models across a wide range of real-world acoustics spaces, especially with the help of the dedicated pretraining and data augmentation schemes.
    摘要 近年来,声学环境的动态参数化在音频处理领域受到了越来越多的关注。在与声源和接收器的朝向及指向性无关的前提下,刻画局部房间声学特性的一个关键参数是房间的几何体积。卷积神经网络(CNN)被广泛选作进行盲房间声学参数估计的主要模型,其目标是学习从音频谱图到相应标签的直接映射。随着自注意力机制的兴起,本文提出了一种纯注意力模型,基于单通道含噪语音信号对房间体积进行盲估计。我们证明了在该任务中可以不再依赖 CNN,所提出的 Transformer 架构以 Gammatone 幅度谱系数和相位谱图作为输入。为了在任务特定数据集上提升模型性能,我们还采用了跨模态迁移学习。实验结果表明,在专门的预训练和数据增强方案的帮助下,所提模型在各种真实声学空间中均优于传统的 CNN 模型。

Two vs. Four-Channel Sound Event Localization and Detection

  • paper_url: http://arxiv.org/abs/2309.13343
  • repo_url: None
  • paper_authors: Julia Wilkins, Magdalena Fuentes, Luca Bondi, Shabnam Ghaffarzadegan, Ali Abavisani, Juan Pablo Bello
  • for: 本研究旨在探讨DCASE 2022 SELD挑战任务(任务3)模型在4通道设置下的性能,以及不同音频输入表示方式对SELD性能的影响。
  • methods: 本研究使用DCASE 2022 SELD基线模型,对不同音频输入表示方式进行比较分析,以评估它们对SELD性能的影响。
  • results: 研究发现,基于双耳和立体声(即2通道)音频的SELD模型尽管整体性能有所下降,但仍能较好地在侧向定位和检测声源。此外,研究还按场景中声源复音程度的不同进行了分段分析,以更好地理解在场景条件日趋复杂时音频输入表示方式对定位和检测性能的影响。
    Abstract Sound event localization and detection (SELD) systems estimate both the direction-of-arrival (DOA) and class of sound sources over time. In the DCASE 2022 SELD Challenge (Task 3), models are designed to operate in a 4-channel setting. While beneficial to further the development of SELD systems using a multichannel recording setup such as first-order Ambisonics (FOA), most consumer electronics devices rarely are able to record using more than two channels. For this reason, in this work we investigate the performance of the DCASE 2022 SELD baseline model using three audio input representations: FOA, binaural, and stereo. We perform a novel comparative analysis illustrating the effect of these audio input representations on SELD performance. Crucially, we show that binaural and stereo (i.e. 2-channel) audio-based SELD models are still able to localize and detect sound sources laterally quite well, despite overall performance degrading as less audio information is provided. Further, we segment our analysis by scenes containing varying degrees of sound source polyphony to better understand the effect of audio input representation on localization and detection performance as scene conditions become increasingly complex.
    摘要 声事件定位与检测(SELD)系统随时间估计声源的到达方向(DOA)和类别。在DCASE 2022 SELD挑战(任务3)中,模型被设计为在4通道设置下工作。虽然使用一阶全景声(FOA)等多通道录音设置有利于SELD系统的进一步发展,但大多数消费电子设备通常只能录制不超过两个通道。为此,本工作考察了DCASE 2022 SELD基线模型在三种音频输入表示下的性能:FOA、双耳和立体声。我们进行了新颖的对比分析,展示了这些音频输入表示对SELD性能的影响。重要的是,我们发现基于双耳和立体声(即2通道)音频的SELD模型尽管随着可用音频信息减少整体性能下降,但仍能较好地在侧向定位和检测声源。此外,我们按场景中声源复音程度的不同对分析进行了分段,以更好地理解在场景条件日趋复杂时音频输入表示对定位和检测性能的影响。

Contrastive Speaker Embedding With Sequential Disentanglement

  • paper_url: http://arxiv.org/abs/2309.13253
  • repo_url: None
  • paper_authors: Youzhi Tu, Man-Wai Mak, Jen-Tzung Chien
  • for: 本文旨在提出一种基于对比学习的说话人嵌入方法,利用解耦序列变分自编码器(DSVAE)去除语言内容,使得只有说话人因素被用于构建对比损失目标。
  • methods: 在传统的SimCLR框架中引入DSVAE,以去除语言内容,并通过对比学习来学习说话人特征。
  • results: 实验结果表明,所提方法在 VoxCeleb1-test 上的表现一致优于 SimCLR,说明引入序列解耦有利于学习具有说话人判别性的嵌入。
    Abstract Contrastive speaker embedding assumes that the contrast between the positive and negative pairs of speech segments is attributed to speaker identity only. However, this assumption is incorrect because speech signals contain not only speaker identity but also linguistic content. In this paper, we propose a contrastive learning framework with sequential disentanglement to remove linguistic content by incorporating a disentangled sequential variational autoencoder (DSVAE) into the conventional SimCLR framework. The DSVAE aims to disentangle speaker factors from content factors in an embedding space so that only the speaker factors are used for constructing a contrastive loss objective. Because content factors have been removed from the contrastive learning, the resulting speaker embeddings will be content-invariant. Experimental results on VoxCeleb1-test show that the proposed method consistently outperforms SimCLR. This suggests that applying sequential disentanglement is beneficial to learning speaker-discriminative embeddings.
    摘要 对比说话人嵌入假设正负语音片段对之间的差异仅来源于说话人身份。然而,这一假设并不成立,因为语音信号不仅包含说话人身份,还包含语言内容。在本文中,我们提出了一种带有序列解耦的对比学习框架:将解耦序列变分自编码器(DSVAE)引入传统的SimCLR框架,以去除语言内容。DSVAE 旨在嵌入空间中将说话人因素与内容因素解耦,从而只用说话人因素构建对比损失目标。由于内容因素已从对比学习中去除,得到的说话人嵌入将与内容无关。在 VoxCeleb1-test 上的实验结果表明,所提方法一致优于 SimCLR。这说明应用序列解耦有利于学习具有说话人判别性的嵌入。
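A minimal sketch of the contrastive objective applied to speaker factors only, assuming an encoder whose output is already split into speaker and content parts. The toy encoder below merely stands in for the DSVAE described in the abstract; the NT-Xent loss is the standard SimCLR objective.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.1):
    """SimCLR NT-Xent loss over a batch of positive pairs (z1[i], z2[i])."""
    B = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)          # (2B, d)
    sim = z @ z.T / temperature
    sim = sim.masked_fill(torch.eye(2 * B, dtype=torch.bool), float('-inf'))
    targets = torch.cat([torch.arange(B, 2 * B), torch.arange(0, B)])
    return F.cross_entropy(sim, targets)

class ToyDisentanglingEncoder(torch.nn.Module):
    """Placeholder for a DSVAE-style encoder: the first half of the output is
    treated as the speaker factor, the second half as the content factor."""
    def __init__(self, in_dim=80, spk_dim=32, cnt_dim=32):
        super().__init__()
        self.net = torch.nn.Sequential(torch.nn.Linear(in_dim, 128), torch.nn.ReLU(),
                                       torch.nn.Linear(128, spk_dim + cnt_dim))
        self.spk_dim = spk_dim
    def forward(self, x):
        h = self.net(x)
        return h[:, :self.spk_dim], h[:, self.spk_dim:]          # (speaker, content)

enc = ToyDisentanglingEncoder()
seg_a = torch.randn(16, 80)           # two augmented segments per utterance
seg_b = torch.randn(16, 80)
spk_a, _ = enc(seg_a)
spk_b, _ = enc(seg_b)
loss = nt_xent(spk_a, spk_b)          # contrast the speaker factors only
loss.backward()
```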

cs.CV - 2023-09-23

Portrait Stylization: Artistic Style Transfer with Auxiliary Networks for Human Face Stylization

  • paper_url: http://arxiv.org/abs/2309.13492
  • repo_url: https://github.com/thiagoambiel/PortraitStylization
  • paper_authors: Thiago Ambiel
  • for: 提高图像风格传递中人脸个体特征的保留
  • methods: 使用辅助预训练人脸识别模型的嵌入来鼓励算法在内容图像上传递人脸特征到最终风格化结果中
  • results: 提高了图像风格传递中人脸个体特征的保留
    Abstract Today's image style transfer methods have difficulty retaining humans face individual features after the whole stylizing process. This occurs because the features like face geometry and people's expressions are not captured by the general-purpose image classifiers like the VGG-19 pre-trained models. This paper proposes the use of embeddings from an auxiliary pre-trained face recognition model to encourage the algorithm to propagate human face features from the content image to the final stylized result.
    摘要 现有的图像风格迁移方法在完成整个风格化过程后,往往难以保留人脸的个体特征。这是因为人脸几何形状和人物表情等特征无法被 VGG-19 预训练模型等通用图像分类器捕捉。本文提出利用辅助预训练人脸识别模型的嵌入,促使算法将内容图像中的人脸特征传递到最终的风格化结果中。
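A minimal sketch of the auxiliary identity term described above: alongside the usual content/style losses, penalize the distance between face-recognition embeddings of the content image and the stylized image. `face_net` is a placeholder for any pretrained face-recognition backbone, and the VGG-based content/style losses are reduced to stubs to keep the sketch short.

```python
import torch
import torch.nn.functional as F

def identity_loss(face_net, content_img, stylized_img):
    """Encourage the stylized portrait to keep the subject's face embedding."""
    with torch.no_grad():
        target = F.normalize(face_net(content_img), dim=1)       # fixed reference
    current = F.normalize(face_net(stylized_img), dim=1)
    return 1.0 - (target * current).sum(dim=1).mean()            # 1 - cosine similarity

# Placeholder backbone: any pretrained face-recognition model (e.g. an
# ArcFace-style network) could play this role; a random CNN keeps it runnable.
face_net = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 3, stride=2, padding=1), torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(), torch.nn.Linear(8, 128),
)

content = torch.rand(1, 3, 224, 224)
stylized = torch.rand(1, 3, 224, 224, requires_grad=True)        # image being optimized

content_loss = torch.tensor(0.0)   # stub: perceptual loss on VGG features
style_loss = torch.tensor(0.0)     # stub: Gram-matrix loss on VGG features
total = content_loss + style_loss + 10.0 * identity_loss(face_net, content, stylized)
total.backward()                   # gradients flow into the stylized image
```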

Identifying Systematic Errors in Object Detectors with the SCROD Pipeline

  • paper_url: http://arxiv.org/abs/2309.13489
  • repo_url: https://github.com/hieu9955/ggggg
  • paper_authors: Valentyn Boreiko, Matthias Hein, Jan Hendrik Metzen
  • for: 本研究旨在识别并消除目标检测器中的系统性错误,以便其在自动驾驶和机器人等安全关键应用中部署。
  • methods: 我们提出了一种新框架,结合物理仿真器与生成模型两者的优点,以全自动、可扩展的方式生成可精细控制的街景图像。
  • results: 我们的框架能够在精细控制下自动生成街景场景;此外,我们还提出了一种评估设定,可作为同类流程的基准,促进标准化的测试流程。
    Abstract The identification and removal of systematic errors in object detectors can be a prerequisite for their deployment in safety-critical applications like automated driving and robotics. Such systematic errors can for instance occur under very specific object poses (location, scale, orientation), object colors/textures, and backgrounds. Real images alone are unlikely to cover all relevant combinations. We overcome this limitation by generating synthetic images with fine-granular control. While generating synthetic images with physical simulators and hand-designed 3D assets allows fine-grained control over generated images, this approach is resource-intensive and has limited scalability. In contrast, using generative models is more scalable but less reliable in terms of fine-grained control. In this paper, we propose a novel framework that combines the strengths of both approaches. Our meticulously designed pipeline along with custom models enables us to generate street scenes with fine-grained control in a fully automated and scalable manner. Moreover, our framework introduces an evaluation setting that can serve as a benchmark for similar pipelines. This evaluation setting will contribute to advancing the field and promoting standardized testing procedures.
    摘要 “系统性错误在物检测器中的识别和移除可以是安全应用程序like自动驾驶和机器人的必要条件。这些系统性错误可能会发生在非常特定的物品位置(位置、比例、姿态)、物品颜色/ texture 和背景下。实际的图像独立无法覆盖所有相关的 комbination。我们 overcome这个限制,通过生成Synthetic图像,并具有精细的控制。使用物理 simulator 和手动设计的 3D 资产来生成Synthetic图像可以实现精细的控制,但这种方法是资源耗尽和有限的可扩展性。相比之下,使用生成模型是更可扩展的,但是在精细控制方面 Less reliable。在这篇论文中,我们提出一个新的框架,让我们在自动化和可扩展的方式下,生成街景图像,并具有精细的控制。此外,我们的框架还引入了评估环境,可以作为类似框架的参考。这个评估环境将对领域的进步和标准化 testing процедуures 做出贡献。”

Detecting and Mitigating System-Level Anomalies of Vision-Based Controllers

  • paper_url: http://arxiv.org/abs/2309.13475
  • repo_url: None
  • paper_authors: Aryaman Gupta, Kaustav Chakraborty, Somil Bansal
  • for: 本研究旨在通过在运行时检测并缓解基于视觉的控制器的系统级异常,提高自主系统的安全性和可靠性。
  • methods: 本研究利用基于可达性的框架对视觉控制器进行离线压力测试并挖掘其系统级失效数据,再用这些数据训练在线使用的异常检测分类器。
  • results: 结果表明,所提方法能够识别并处理视觉控制器的系统级异常,优于基于预测误差的检测和集成方法,从而提升自主系统整体的安全性和鲁棒性。
    Abstract Autonomous systems, such as self-driving cars and drones, have made significant strides in recent years by leveraging visual inputs and machine learning for decision-making and control. Despite their impressive performance, these vision-based controllers can make erroneous predictions when faced with novel or out-of-distribution inputs. Such errors can cascade to catastrophic system failures and compromise system safety. In this work, we introduce a run-time anomaly monitor to detect and mitigate such closed-loop, system-level failures. Specifically, we leverage a reachability-based framework to stress-test the vision-based controller offline and mine its system-level failures. This data is then used to train a classifier that is leveraged online to flag inputs that might cause system breakdowns. The anomaly detector highlights issues that transcend individual modules and pertain to the safety of the overall system. We also design a fallback controller that robustly handles these detected anomalies to preserve system safety. We validate the proposed approach on an autonomous aircraft taxiing system that uses a vision-based controller for taxiing. Our results show the efficacy of the proposed approach in identifying and handling system-level anomalies, outperforming methods such as prediction error-based detection, and ensembling, thereby enhancing the overall safety and robustness of autonomous systems.
    摘要 自主系统(如自动驾驶汽车和无人机)近年来借助视觉输入和机器学习进行决策与控制,取得了长足进步。尽管表现出色,这些基于视觉的控制器在面对新颖或分布外的输入时仍可能做出错误预测。这类错误可能级联为灾难性的系统失效,危及系统安全。在本工作中,我们提出了一个运行时异常监测器,用于检测并缓解此类闭环的系统级失效。具体而言,我们利用基于可达性的框架对视觉控制器进行离线压力测试,挖掘其系统级失效数据;再利用这些数据训练一个分类器,在线使用它来标记可能导致系统崩溃的输入。该异常检测器突出的是超越单个模块、关乎整个系统安全的问题。我们还设计了一个后备控制器,用以稳健地处理检测到的异常,保障系统安全。我们在一个使用视觉控制器进行滑行的自主飞机滑行系统上验证了所提方法。结果显示,该方法在识别和处理系统级异常方面优于基于预测误差的检测和集成等方法,提升了自主系统的整体安全性和鲁棒性。

Edge Aware Learning for 3D Point Cloud

  • paper_url: http://arxiv.org/abs/2309.13472
  • repo_url: None
  • paper_authors: Lei Li
  • for: 该方法旨在应对点云数据中的噪声,提升物体识别与分割效果。
  • methods: 借鉴人类视觉系统中的边缘感知概念,将其融入学习方法中,结合局部与全局网络学习范式以及分层 Transformer 架构,以提升物体识别与分割。
  • results: 该方法在 ModelNet40 和 ShapeNet 数据集上表现出色,在物体分类和分割任务中展现出显著优势。
    Abstract This paper proposes an innovative approach to Hierarchical Edge Aware 3D Point Cloud Learning (HEA-Net) that seeks to address the challenges of noise in point cloud data, and improve object recognition and segmentation by focusing on edge features. In this study, we present an innovative edge-aware learning methodology, specifically designed to enhance point cloud classification and segmentation. Drawing inspiration from the human visual system, the concept of edge-awareness has been incorporated into this methodology, contributing to improved object recognition while simultaneously reducing computational time. Our research has led to the development of an advanced 3D point cloud learning framework that effectively manages object classification and segmentation tasks. A unique fusion of local and global network learning paradigms has been employed, enriched by edge-focused local and global embeddings, thereby significantly augmenting the model's interpretative prowess. Further, we have applied a hierarchical transformer architecture to boost point cloud processing efficiency, thus providing nuanced insights into structural understanding. Our approach demonstrates significant promise in managing noisy point cloud data and highlights the potential of edge-aware strategies in 3D point cloud learning. The proposed approach is shown to outperform existing techniques in object classification and segmentation tasks, as demonstrated by experiments on ModelNet40 and ShapeNet datasets.
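The abstract does not spell out how edge features are obtained; one common way to approximate edge-awareness on raw point clouds is to score each point by its local surface variation (the smallest-eigenvalue fraction of the neighborhood covariance), which is high near sharp edges and corners. The sketch below illustrates that idea under those assumptions and is not the HEA-Net implementation.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def edge_scores(points, k=16):
    """Score each 3D point by local surface variation: lambda_min / sum(lambdas).
    High values indicate edges/corners, low values indicate flat regions."""
    nn = NearestNeighbors(n_neighbors=k).fit(points)
    _, idx = nn.kneighbors(points)
    scores = np.empty(len(points))
    for i, neigh in enumerate(idx):
        local = points[neigh] - points[neigh].mean(axis=0)
        cov = local.T @ local / k
        eigvals = np.linalg.eigvalsh(cov)            # ascending order
        scores[i] = eigvals[0] / (eigvals.sum() + 1e-12)
    return scores

# toy shape: the surface of a unit cube -> points near edges score highest
rng = np.random.default_rng(0)
face = rng.uniform(size=(600, 2))
faces = []
for axis in range(3):
    for side in (0.0, 1.0):
        faces.append(np.insert(face, axis, side, axis=1))
points = np.concatenate(faces)
s = edge_scores(points)
print("mean score:", s.mean(), "| 95th percentile (edge-like):", np.quantile(s, 0.95))
```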

HAVE-Net: Hallucinated Audio-Visual Embeddings for Few-Shot Classification with Unimodal Cues

  • paper_url: http://arxiv.org/abs/2309.13470
  • repo_url: None
  • paper_authors: Ankit Jha, Debabrata Pal, Mainak Singha, Naman Agarwal, Biplab Banerjee
  • for: 本研究旨在解决遥感(RS)图像识别中的一个新问题:在元训练阶段音频与视觉两种模态同时可用,但在元测试阶段其中一种模态可能缺失。
  • methods: 我们提出了一种新的少样本生成框架——幻象视听嵌入网络(HAVE-Net),用于在有限的单模态数据上元训练跨模态特征;在推理阶段,使用这些幻象出的特征进行少样本分类。
  • results: 实验结果表明,在 ADVANCE 和 AudioSetZSL 基准数据集上,我们的幻象模态增强策略使少样本分类器的性能至少比使用真实多模态信息训练的分类器高出 0.8%–2%。
    Abstract Recognition of remote sensing (RS) or aerial images is currently of great interest, and advancements in deep learning algorithms added flavor to it in recent years. Occlusion, intra-class variance, lighting, etc., might arise while training neural networks using unimodal RS visual input. Even though joint training of audio-visual modalities improves classification performance in a low-data regime, it has yet to be thoroughly investigated in the RS domain. Here, we aim to solve a novel problem where both the audio and visual modalities are present during the meta-training of a few-shot learning (FSL) classifier; however, one of the modalities might be missing during the meta-testing stage. This problem formulation is pertinent in the RS domain, given the difficulties in data acquisition or sensor malfunctioning. To mitigate, we propose a novel few-shot generative framework, Hallucinated Audio-Visual Embeddings-Network (HAVE-Net), to meta-train cross-modal features from limited unimodal data. Precisely, these hallucinated features are meta-learned from base classes and used for few-shot classification on novel classes during the inference phase. The experimental results on the benchmark ADVANCE and AudioSetZSL datasets show that our hallucinated modality augmentation strategy for few-shot classification outperforms the classifier performance trained with the real multimodal information at least by 0.8-2%.
    摘要 遥感(RS)或航拍图像的识别目前备受关注,近年来深度学习算法的进展更为其增色不少。在仅使用单模态 RS 视觉输入训练神经网络时,可能会遇到遮挡、类内差异、光照等问题。尽管在低数据量情形下联合训练视听两种模态可以提升分类性能,但这一点在 RS 领域尚未得到充分研究。在此,我们着手解决一个新问题:在少样本学习(FSL)分类器的元训练阶段音频与视觉两种模态同时存在,但在元测试阶段其中一种模态可能缺失。考虑到数据采集困难或传感器故障,这一问题设定在 RS 领域尤为切合实际。为此,我们提出了一种新的少样本生成框架——幻象视听嵌入网络(HAVE-Net),用于从有限的单模态数据中元训练跨模态特征。具体而言,这些幻象特征在基类上进行元学习,并在推理阶段用于新类的少样本分类。在 ADVANCE 和 AudioSetZSL 基准数据集上的实验结果表明,我们的幻象模态增强策略使少样本分类性能至少比使用真实多模态信息训练的分类器高出 0.8%–2%。

Turbulence in Focus: Benchmarking Scaling Behavior of 3D Volumetric Super-Resolution with BLASTNet 2.0 Data

  • paper_url: http://arxiv.org/abs/2309.13457
  • repo_url: None
  • paper_authors: Wai Tong Chung, Bassem Akoush, Pushan Sharma, Alex Tamkin, Ki Sung Jung, Jacqueline H. Chen, Jack Guo, Davy Brouzet, Mohsen Talei, Bruno Savard, Alexei Y. Poludnenko, Matthias Ihme
  • for: The paper aims to provide a large-scale dataset of 3D high-fidelity compressible turbulent flow simulations for training and benchmarking deep learning models.
  • methods: The paper uses 744 full-domain samples from 34 high-fidelity direct numerical simulations to create a network-of-datasets called BLASTNet 2.0, and benchmarks 49 variations of five deep learning approaches for 3D super-resolution.
  • results: The paper performs a neural scaling analysis on these models to examine the performance of different machine learning (ML) approaches, including two scientific ML techniques, and demonstrates that (i) predictive performance can scale with model size and cost, (ii) architecture matters significantly, especially for smaller models, and (iii) the benefits of physics-based losses can persist with increasing model size.
    Abstract Analysis of compressible turbulent flows is essential for applications related to propulsion, energy generation, and the environment. Here, we present BLASTNet 2.0, a 2.2 TB network-of-datasets containing 744 full-domain samples from 34 high-fidelity direct numerical simulations, which addresses the current limited availability of 3D high-fidelity reacting and non-reacting compressible turbulent flow simulation data. With this data, we benchmark a total of 49 variations of five deep learning approaches for 3D super-resolution - which can be applied for improving scientific imaging, simulations, turbulence models, as well as in computer vision applications. We perform neural scaling analysis on these models to examine the performance of different machine learning (ML) approaches, including two scientific ML techniques. We demonstrate that (i) predictive performance can scale with model size and cost, (ii) architecture matters significantly, especially for smaller models, and (iii) the benefits of physics-based losses can persist with increasing model size. The outcomes of this benchmark study are anticipated to offer insights that can aid the design of 3D super-resolution models, especially for turbulence models, while this data is expected to foster ML methods for a broad range of flow physics applications. This data is publicly available with download links and browsing tools consolidated at https://blastnet.github.io.
    摘要 We benchmarked 49 variations of five deep learning approaches for 3D super-resolution on this dataset. The neural scaling analysis shows that (i) predictive performance can scale with model size and cost, (ii) model architecture matters significantly, especially for smaller models, and (iii) the benefits of physics-based losses can persist with increasing model size. These findings can aid the design of 3D super-resolution models, particularly for turbulence models. The BLASTNet 2.0 data is publicly available at https://blastnet.github.io, with download links and browsing tools, and is expected to facilitate the development of machine learning methods for a broad range of flow physics applications.
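The neural scaling analysis mentioned above can be illustrated by fitting the usual power-law ansatz of error versus model size; the numbers below are synthetic placeholders, not BLASTNet results.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(n_params, a, alpha, c):
    """Error ~ a * N^(-alpha) + c, the usual neural-scaling ansatz."""
    return a * n_params ** (-alpha) + c

# synthetic (model size, validation error) pairs -- placeholders, not paper data
sizes = np.array([1e5, 3e5, 1e6, 3e6, 1e7, 3e7])
errors = 2.0 * sizes ** (-0.21) + 0.015 + np.random.default_rng(0).normal(0, 1e-3, 6)

(a, alpha, c), _ = curve_fit(power_law, sizes, errors, p0=(1.0, 0.2, 0.01))
print(f"fitted exponent alpha = {alpha:.3f}, irreducible error c = {c:.4f}")
print("extrapolated error at 1e8 params:", power_law(1e8, a, alpha, c))
```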

Video Timeline Modeling For News Story Understanding

  • paper_url: http://arxiv.org/abs/2309.13446
  • repo_url: https://github.com/google-research/google-research
  • paper_authors: Meng Liu, Mingda Zhang, Jialu Liu, Hanjun Dai, Ming-Hsuan Yang, Shuiwang Ji, Zheyun Feng, Boqing Gong
  • for: 本文探讨视频时间轴建模问题,目标是为特定主题的一组视频构建与之关联的时间轴,以便于理解所叙述故事的内容与结构。
  • methods: 本文构建了一个真实的基准数据集(YouTube-News-Timeline),提出了一组用于评估和比较方法的量化指标,并开发、测试了多种深度学习方法来解决该问题。
  • results: 本文预计这项探索性工作将为视频时间轴建模的进一步研究铺平道路,并为研究者提供可供继续开发的测试平台。
    Abstract In this paper, we present a novel problem, namely video timeline modeling. Our objective is to create a video-associated timeline from a set of videos related to a specific topic, thereby facilitating the content and structure understanding of the story being told. This problem has significant potential in various real-world applications, for instance, news story summarization. To bootstrap research in this area, we curate a realistic benchmark dataset, YouTube-News-Timeline, consisting of over $12$k timelines and $300$k YouTube news videos. Additionally, we propose a set of quantitative metrics to comprehensively evaluate and compare methodologies. With such a testbed, we further develop and benchmark several deep learning approaches to tackling this problem. We anticipate that this exploratory work will pave the way for further research in video timeline modeling. The assets are available via https://github.com/google-research/google-research/tree/master/video_timeline_modeling.
    摘要 在这篇论文中,我们提出了一个新的问题,即视频时间轴建模。我们的目标是从一组关于特定主题的视频集合中生成一个视频相关的时间轴,以便更好地理解视频中的内容和结构。这个问题在实际应用中具有重要的潜在价值,例如新闻故事概要。为了推动这个领域的研究,我们制作了一个现实的测试集,YouTube-News-Timeline,包含超过12000个时间轴和300000个YouTube新闻视频。此外,我们提出了一些量化的评价指标,以全面评估和比较不同方法的性能。通过这些实验,我们预计会开拓视频时间轴建模的研究途径。资产可以通过https://github.com/google-research/google-research/tree/master/video_timeline_modeling获取。

Dream the Impossible: Outlier Imagination with Diffusion Models

  • paper_url: http://arxiv.org/abs/2309.13415
  • repo_url: https://github.com/deeplearning-wisc/dream-ood
  • paper_authors: Xuefeng Du, Yiyou Sun, Xiaojin Zhu, Yixuan Li
  • for: 本研究旨在提出一种新的框架,以便生成高品质的异常样例,以提高机器学习模型的OOD检测和预测安全性。
  • methods: 该框架基于diffusion模型,通过文本条件的latent空间学习,生成高维像素空间中的异常样例。
  • results: 研究表明,通过使用DREAM-OOD生成的样例进行训练,可以提高OOD检测性能。
    Abstract Utilizing auxiliary outlier datasets to regularize the machine learning model has demonstrated promise for out-of-distribution (OOD) detection and safe prediction. Due to the labor intensity in data collection and cleaning, automating outlier data generation has been a long-desired alternative. Despite the appeal, generating photo-realistic outliers in the high dimensional pixel space has been an open challenge for the field. To tackle the problem, this paper proposes a new framework DREAM-OOD, which enables imagining photo-realistic outliers by way of diffusion models, provided with only the in-distribution (ID) data and classes. Specifically, DREAM-OOD learns a text-conditioned latent space based on ID data, and then samples outliers in the low-likelihood region via the latent, which can be decoded into images by the diffusion model. Different from prior works, DREAM-OOD enables visualizing and understanding the imagined outliers, directly in the pixel space. We conduct comprehensive quantitative and qualitative studies to understand the efficacy of DREAM-OOD, and show that training with the samples generated by DREAM-OOD can benefit OOD detection performance. Code is publicly available at https://github.com/deeplearning-wisc/dream-ood.
    摘要 利用辅助离群(outlier)数据集对机器学习模型进行正则化,已被证明有助于分布外(OOD)检测和安全预测。由于数据收集与清洗的人力成本高昂,自动生成离群数据一直是人们期望的替代方案。尽管前景诱人,在高维像素空间中生成逼真的离群样本一直是该领域的开放难题。为解决这一问题,本文提出了一个新框架 DREAM-OOD:在仅给定分布内(ID)数据及其类别的条件下,借助扩散模型来想象逼真的离群样本。具体而言,DREAM-OOD 基于 ID 数据学习一个文本条件的潜在空间,然后在该潜在空间的低似然区域中采样离群点,这些离群点可由扩散模型解码为图像。与以往工作不同,DREAM-OOD 使得所想象的离群样本可以直接在像素空间中可视化和理解。我们进行了全面的定量与定性研究以验证 DREAM-OOD 的有效性,并表明利用 DREAM-OOD 生成的样本进行训练有助于提升 OOD 检测性能。代码公开于 https://github.com/deeplearning-wisc/dream-ood。

WS-YOLO: Weakly Supervised Yolo Network for Surgical Tool Localization in Endoscopic Videos

  • paper_url: http://arxiv.org/abs/2309.13404
  • repo_url: https://github.com/breezewrf/weakly-supervised-yolov8
  • paper_authors: Rongfeng Wei, Jinlin Wu, You Pang, Zhen Chen
  • for: 这篇论文旨在提升内窥镜视频记录中手术器械的检测与跟踪能力。
  • methods: 论文提出了弱监督的 WS-YOLO(Weakly Supervised Yolo Network),从达芬奇手术机器人输出的粗粒度语义信息生成内窥镜视频中手术器械的位置与类别等细粒度语义信息。
  • results: 实验结果表明,WS-YOLO 能够准确地检测和跟踪手术器械,并显著减少所需的人工标注工作量;代码已开源。
    Abstract Being able to automatically detect and track surgical instruments in endoscopic video recordings would allow for many useful applications that could transform different aspects of surgery. In robot-assisted surgery, the potentially informative data like categories of surgical tool can be captured, which is sparse, full of noise and without spatial information. We proposed a Weakly Supervised Yolo Network (WS-YOLO) for Surgical Tool Localization in Endoscopic Videos, to generate fine-grained semantic information with location and category from coarse-grained semantic information outputted by the da Vinci surgical robot, which significantly diminished the necessary human annotation labor while striking an optimal balance between the quantity of manually annotated data and detection performance. The source code is available at https://github.com/Breezewrf/Weakly-Supervised-Yolov8.
    摘要 能够在内窥镜视频记录中自动检测和跟踪手术器械,将带来许多有用的应用,有望改变手术的多个方面。在机器人辅助手术中,可以采集诸如手术器械类别等潜在有用的数据,但这些数据稀疏、噪声大且缺乏空间信息。我们提出了用于内窥镜视频中手术器械定位的弱监督网络 WS-YOLO(Weakly Supervised Yolo Network),它能从达芬奇手术机器人输出的粗粒度语义信息生成带有位置与类别的细粒度语义信息,在大幅减少人工标注工作量的同时,在人工标注数据量与检测性能之间取得了良好平衡。源代码见 https://github.com/Breezewrf/Weakly-Supervised-Yolov8。

Dual-Reference Source-Free Active Domain Adaptation for Nasopharyngeal Carcinoma Tumor Segmentation across Multiple Hospitals

  • paper_url: http://arxiv.org/abs/2309.13401
  • repo_url: https://github.com/whq-xxh/Active-GTV-Seg
  • paper_authors: Hongqiu Wang, Jian Chen, Shichen Zhang, Yuan He, Jinfeng Xu, Mengwan Wu, Jinlan He, Wenjun Liao, Xiangde Luo
  • for: 这篇论文旨在提高鼻咽癌(NPC)大体肿瘤靶区(GTV)的分割精度,以保障NPC放射治疗的效果。
  • methods: 论文提出了一种无源主动域自适应(SFADA)框架来解决GTV分割任务中的域自适应问题;该框架采用双参照策略,在目标域中选择域不变和域特定的代表性样本进行标注和模型微调,而无需访问源域数据。
  • results: 实验结果表明,SFADA优于无监督域自适应(UDA)方法,即使只标注少量样本,其结果也可与全监督上界(UB)相当。此外,论文还收集了来自五家医院、共1057名NPC患者的临床数据用于验证该方法。
    Abstract Nasopharyngeal carcinoma (NPC) is a prevalent and clinically significant malignancy that predominantly impacts the head and neck area. Precise delineation of the Gross Tumor Volume (GTV) plays a pivotal role in ensuring effective radiotherapy for NPC. Despite recent methods that have achieved promising results on GTV segmentation, they are still limited by lacking carefully-annotated data and hard-to-access data from multiple hospitals in clinical practice. Although some unsupervised domain adaptation (UDA) has been proposed to alleviate this problem, unconditionally mapping the distribution distorts the underlying structural information, leading to inferior performance. To address this challenge, we devise a novel Sourece-Free Active Domain Adaptation (SFADA) framework to facilitate domain adaptation for the GTV segmentation task. Specifically, we design a dual reference strategy to select domain-invariant and domain-specific representative samples from a specific target domain for annotation and model fine-tuning without relying on source-domain data. Our approach not only ensures data privacy but also reduces the workload for oncologists as it just requires annotating a few representative samples from the target domain and does not need to access the source data. We collect a large-scale clinical dataset comprising 1057 NPC patients from five hospitals to validate our approach. Experimental results show that our method outperforms the UDA methods and achieves comparable results to the fully supervised upper bound, even with few annotations, highlighting the significant medical utility of our approach. In addition, there is no public dataset about multi-center NPC segmentation, we will release code and dataset for future research.
    摘要 鼻咽癌(NPC)是一种高发且临床意义重大的恶性肿瘤,主要累及头颈部。精确勾画大体肿瘤靶区(GTV)对保证NPC放疗的疗效至关重要。尽管近期方法在GTV分割上取得了可喜的结果,但它们仍受限于精细标注数据的缺乏,以及临床实践中难以获取来自多家医院的数据。虽然已有一些无监督域自适应(UDA)方法试图缓解这一问题,但无条件地对分布进行映射会扭曲底层的结构信息,导致性能欠佳。为应对这一挑战,我们设计了一种新颖的无源主动域自适应(SFADA)框架来推进GTV分割任务的域自适应。具体而言,我们设计了双参照策略,在不依赖源域数据的情况下,从特定目标域中选择域不变和域特定的代表性样本用于标注和模型微调。该方法既保护了数据隐私,又减轻了肿瘤科医生的工作量:只需标注目标域中的少量代表性样本,且无需访问源数据。我们收集了来自五家医院、共1057名NPC患者的大规模临床数据集来验证所提方法。实验结果表明,我们的方法优于UDA方法,即使标注极少,也能取得与全监督上界相当的结果,凸显了该方法重要的医学应用价值。此外,目前尚无公开的多中心NPC分割数据集,我们将发布代码和数据集以供后续研究。

A mirror-Unet architecture for PET/CT lesion segmentation

  • paper_url: http://arxiv.org/abs/2309.13398
  • repo_url: https://github.com/yrotstein/autopet2023_mv1
  • paper_authors: Yamila Rotstein Habarnau, Mauro Namías
  • for: 这项研究旨在从全身PET/CT扫描中自动检测并分割肿瘤病灶。
  • methods: 研究采用一种结合两个UNet-3D分支的深度学习方法:其中一个分支用于从CT图像中分割组织,另一个分支用于从PET图像中分割病灶,并在瓶颈处融合CT分支已训练得到的嵌入信息。
  • results: 研究在 AutoPET MICCAI 2023 挑战数据集上训练并验证了所提网络,实现了肿瘤病灶的自动分割。
    Abstract Automatic lesion detection and segmentation from [${}^{18}$F]FDG PET/CT scans is a challenging task, due to the diversity of shapes, sizes, FDG uptake and location they may present, besides the fact that physiological uptake is also present on healthy tissues. In this work, we propose a deep learning method aimed at the segmentation of oncologic lesions, based on a combination of two UNet-3D branches. First, one of the network's branches is trained to segment a group of tissues from CT images. The other branch is trained to segment the lesions from PET images, combining on the bottleneck the embedded information of CT branch, already trained. We trained and validated our networks on the AutoPET MICCAI 2023 Challenge dataset. Our code is available at: https://github.com/yrotstein/AutoPET2023_Mv1.
    摘要 从 [${}^{18}$F]FDG PET/CT 扫描中自动检测并分割病灶是一项具有挑战性的任务,因为病灶在形状、大小、FDG 摄取和位置上差异很大,而且健康组织也存在生理性摄取。在这项工作中,我们提出了一种面向肿瘤病灶分割的深度学习方法,它由两个 UNet-3D 分支组合而成:首先,其中一个分支被训练用于从 CT 图像中分割一组组织;另一个分支被训练用于从 PET 图像中分割病灶,并在瓶颈处融合已训练好的 CT 分支的嵌入信息。我们在 AutoPET MICCAI 2023 挑战数据集上训练并验证了所提网络。代码见 https://github.com/yrotstein/AutoPET2023_Mv1。

YOLORe-IDNet: An Efficient Multi-Camera System for Person-Tracking

  • paper_url: http://arxiv.org/abs/2309.13387
  • repo_url: None
  • paper_authors: Vipin Gautam, Shitala Prasad, Sharad Sinha
  • for: 实时多摄像头人识别和跟踪
  • methods: 相关滤波器(correlation filters)与交并比(IOU)约束相结合实现稳健跟踪,并在 YOLOv5 基础上使用深度学习跨摄像头行人重识别(Re-ID)模型
  • results: F1 分数达到 79%,IOU 达到 59%,与现有最先进算法相当;在公开的 OTB-100 数据集上进行了评估
    Abstract The growing need for video surveillance in public spaces has created a demand for systems that can track individuals across multiple cameras feeds in real-time. While existing tracking systems have achieved impressive performance using deep learning models, they often rely on pre-existing images of suspects or historical data. However, this is not always feasible in cases where suspicious individuals are identified in real-time and without prior knowledge. We propose a person-tracking system that combines correlation filters and Intersection Over Union (IOU) constraints for robust tracking, along with a deep learning model for cross-camera person re-identification (Re-ID) on top of YOLOv5. The proposed system quickly identifies and tracks suspect in real-time across multiple cameras and recovers well after full or partial occlusion, making it suitable for security and surveillance applications. It is computationally efficient and achieves a high F1-Score of 79% and an IOU of 59% comparable to existing state-of-the-art algorithms, as demonstrated in our evaluation on a publicly available OTB-100 dataset. The proposed system offers a robust and efficient solution for the real-time tracking of individuals across multiple camera feeds. Its ability to track targets without prior knowledge or historical data is a significant improvement over existing systems, making it well-suited for public safety and surveillance applications.
    摘要 公共场所对视频监控日益增长的需求,催生了能够在多路摄像头视频流中实时跟踪个人的系统。现有的跟踪系统借助深度学习模型已取得令人瞩目的性能,但它们往往依赖嫌疑人的既有图像或历史数据;而在嫌疑人于实时中被发现、且无任何先验信息的情况下,这并不总是可行的。我们提出了一种行人跟踪系统,将相关滤波器与交并比(IOU)约束相结合以实现稳健跟踪,并在 YOLOv5 之上使用深度学习模型进行跨摄像头行人重识别(Re-ID)。该系统能够实时地在多路摄像头中快速识别并跟踪目标,且在完全或部分遮挡后能较好地恢复,适用于安防与监控应用。它计算高效,在公开的 OTB-100 数据集上的评估表明,其 F1 分数达到 79%,IOU 达到 59%,与现有最先进算法相当。所提系统为跨多路摄像头的实时人员跟踪提供了稳健而高效的解决方案;其无需先验知识或历史数据即可跟踪目标的能力,是对现有系统的显著改进,非常适合公共安全与监控应用。
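A minimal sketch of the IOU gating that the abstract combines with correlation filters and Re-ID: detections in a new frame are greedily matched to existing tracks when their overlap exceeds a threshold. The threshold and greedy matching rule are illustrative, and the Re-ID check used for cross-camera handover is only indicated in a comment.

```python
import numpy as np

def iou(a, b):
    """IOU of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def associate(tracks, detections, iou_thresh=0.3):
    """Greedy IOU association; returns matched (track, detection) pairs and
    the indices of detections that start new tracks (or go to the Re-ID stage)."""
    order = sorted(((iou(t, d), ti, di) for ti, t in enumerate(tracks)
                                         for di, d in enumerate(detections)),
                   reverse=True)
    pairs, matched_tracks, used_dets = [], set(), set()
    for score, ti, di in order:
        if score < iou_thresh or ti in matched_tracks or di in used_dets:
            continue
        pairs.append((ti, di))
        matched_tracks.add(ti)
        used_dets.add(di)
    unmatched = [di for di in range(len(detections)) if di not in used_dets]
    return pairs, unmatched

tracks = [np.array([100, 100, 150, 200]), np.array([300, 120, 340, 210])]
detections = [np.array([305, 118, 338, 215]), np.array([102, 98, 152, 205]),
              np.array([500, 50, 540, 150])]
pairs, new_dets = associate(tracks, detections)
print(pairs, new_dets)   # unmatched detections would be checked against the Re-ID gallery
```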

Cine cardiac MRI reconstruction using a convolutional recurrent network with refinement

  • paper_url: http://arxiv.org/abs/2309.13385
  • repo_url: https://github.com/vios-s/CMRxRECON_Challenge_EDIPO
  • paper_authors: Yuyang Xue, Yuning Du, Gianluca Carloni, Eva Pachetti, Connor Jordan, Sotirios A. Tsaftaris
  • for: 心脏功能和状态的非侵入性理解
  • methods: 使用卷积循环神经网络(CRNN)架构结合单帧图像超分辨率细化模块
  • results: 与普通的 CRNN 实现相比,结构相似性提升 4.4%,归一化均方误差改善 3.9%
    Abstract Cine Magnetic Resonance Imaging (MRI) allows for understanding of the heart's function and condition in a non-invasive manner. Undersampling of the $k$-space is employed to reduce the scan duration, thus increasing patient comfort and reducing the risk of motion artefacts, at the cost of reduced image quality. In this challenge paper, we investigate the use of a convolutional recurrent neural network (CRNN) architecture to exploit temporal correlations in supervised cine cardiac MRI reconstruction. This is combined with a single-image super-resolution refinement module to improve single coil reconstruction by 4.4\% in structural similarity and 3.9\% in normalised mean square error compared to a plain CRNN implementation. We deploy a high-pass filter to our $\ell_1$ loss to allow greater emphasis on high-frequency details which are missing in the original data. The proposed model demonstrates considerable enhancements compared to the baseline case and holds promising potential for further improving cardiac MRI reconstruction.

Beyond Grids: Exploring Elastic Input Sampling for Vision Transformers

  • paper_url: http://arxiv.org/abs/2309.13353
  • repo_url: None
  • paper_authors: Adam Pardyl, Grzegorz Kurzejamski, Jan Olszewski, Tomasz Trzciński, Bartosz Zieliński
  • for: 该论文旨在通过提高输入弹性,提升视觉 Transformer 在实际应用中的表现和效率。
  • methods: 论文形式化了视觉 Transformer 的输入弹性概念,提出了相应的评估协议,并对 Transformer 架构和训练策略进行修改,以增强其输入弹性。
  • results: 通过广泛的实验,论文揭示了不同输入采样策略带来的机会与挑战。
    Abstract Vision transformers have excelled in various computer vision tasks but mostly rely on rigid input sampling using a fixed-size grid of patches. This limits their applicability in real-world problems, such as in the field of robotics and UAVs, where one can utilize higher input elasticity to boost model performance and efficiency. Our paper addresses this limitation by formalizing the concept of input elasticity for vision transformers and introducing an evaluation protocol, including dedicated metrics for measuring input elasticity. Moreover, we propose modifications to the transformer architecture and training regime, which increase its elasticity. Through extensive experimentation, we spotlight opportunities and challenges associated with input sampling strategies.
    摘要 视觉 Transformer 在各类计算机视觉任务中表现出色,但它们通常依赖固定大小网格的刚性图块采样来处理输入。这限制了其在实际问题中的适用性,例如在机器人和无人机领域,更高的输入弹性可用于提升模型性能与效率。我们的论文解决了这一限制:形式化了视觉 Transformer 的输入弹性概念,提出了包含专用弹性度量指标的评估协议,并对 Transformer 架构和训练方式进行了修改以提高其弹性。通过广泛的实验,我们揭示了不同输入采样策略带来的机会与挑战。
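One generic way to make patch sampling elastic is to crop patches at arbitrary centres and read the positional embedding at those centres by interpolation. The sketch below only illustrates that idea under assumed sizes (patch size, grid size, embedding dimension); it is not the authors' implementation.

```python
# Illustrative sketch: sample ViT patches at arbitrary (non-grid) locations and
# read the positional embedding at those locations by bilinear interpolation.
# All sizes here are assumptions for demonstration.
import torch
import torch.nn.functional as F

def sample_patches(image, centers, patch=16):
    """Crop patch x patch windows at normalized (x, y) positions in [0, 1]."""
    _, _, H, W = image.shape
    crops = []
    for cx, cy in centers:
        x0 = int(cx * (W - patch))
        y0 = int(cy * (H - patch))
        crops.append(image[:, :, y0:y0 + patch, x0:x0 + patch])
    return torch.cat(crops, dim=0)

def interp_pos_embed(pos_grid, centers):
    """Bilinearly sample a (1, D, G, G) positional-embedding grid at the centers."""
    pts = torch.tensor(centers).view(1, -1, 1, 2) * 2 - 1   # map [0,1] -> [-1,1]
    out = F.grid_sample(pos_grid, pts, align_corners=True)  # (1, D, N, 1)
    return out.squeeze(-1).permute(0, 2, 1)                 # (1, N, D)

image = torch.rand(1, 3, 224, 224)
pos_grid = torch.rand(1, 192, 14, 14)            # assumed embedding dim / grid size
centers = [(0.2, 0.3), (0.55, 0.6), (0.8, 0.1)]
patches = sample_patches(image, centers)         # (3, 3, 16, 16)
pos = interp_pos_embed(pos_grid, centers)        # (1, 3, 192)
```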

FedDrive v2: an Analysis of the Impact of Label Skewness in Federated Semantic Segmentation for Autonomous Driving

  • paper_url: http://arxiv.org/abs/2309.13336
  • repo_url: https://github.com/Erosinho13/FedDrive
  • paper_authors: Eros Fanì, Marco Ciccone, Barbara Caputo
  • for: 研究标签分布偏斜(label skewness)对自动驾驶语义分割联邦学习基准的影响。
  • methods: 提出了六种新的联邦场景,以研究标签偏斜对分割模型性能的影响,并与域偏移(domain shift)的影响进行比较。
  • results: 研究了在测试时使用域信息的影响。
    Abstract We propose FedDrive v2, an extension of the Federated Learning benchmark for Semantic Segmentation in Autonomous Driving. While the first version aims at studying the effect of domain shift of the visual features across clients, in this work, we focus on the distribution skewness of the labels. We propose six new federated scenarios to investigate how label skewness affects the performance of segmentation models and compare it with the effect of domain shift. Finally, we study the impact of using the domain information during testing. Official website: https://feddrive.github.io
    摘要 我们提出了 FedDrive v2,它是自动驾驶语义分割联邦学习基准的扩展版本。第一版旨在研究视觉特征在各客户端之间的域偏移效应,而本工作专注于标签的分布偏斜。我们提出了六种新的联邦场景,以研究标签偏斜如何影响分割模型的性能,并与域偏移的影响进行比较。最后,我们研究了在测试时使用域信息的影响。官方网站:https://feddrive.github.io
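A common generic recipe for building label-skewed federated splits is to allocate each class across clients with a Dirichlet prior; a small concentration parameter yields heavy skew. The sketch below is purely illustrative and is not FedDrive v2's actual scenario construction (which operates on segmentation data); the alpha value and class count are assumptions.

```python
# Illustrative Dirichlet-based label-skew partition for federated clients.
# Generic recipe for demonstration, not the exact FedDrive v2 protocol.
import numpy as np

def dirichlet_partition(labels, num_clients, alpha=0.3, seed=0):
    """Split sample indices across clients; small alpha -> stronger label skew."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        props = rng.dirichlet(alpha * np.ones(num_clients))  # share per client
        cuts = (np.cumsum(props) * len(idx)).astype(int)[:-1]
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return client_indices

# Example: 1000 samples over 19 classes (Cityscapes-style), 10 clients.
labels = np.random.default_rng(1).integers(0, 19, size=1000)
splits = dirichlet_partition(labels, num_clients=10, alpha=0.1)
print([len(s) for s in splits])
```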

Tackling the Incomplete Annotation Issue in Universal Lesion Detection Task By Exploratory Training

  • paper_url: http://arxiv.org/abs/2309.13306
  • repo_url: None
  • paper_authors: Xiaoyu Bai, Benteng Ma, Changyang Li, Yong Xia
  • for: 该研究旨在提升通用病灶检测(universal lesion detection)的精度,即在医学图像中检测多个器官上的多种病灶。
  • methods: 该研究使用深度学习方法,提出基于教师-学生模型与预测银行的探索式训练,利用 pseudo-label 技术挖掘未标注的病灶。
  • results: 实验表明,所提方法在两个医学图像数据集上均优于现有方法。
    Abstract Universal lesion detection has great value for clinical practice as it aims to detect various types of lesions in multiple organs on medical images. Deep learning methods have shown promising results, but demand large volumes of annotated data for training. However, annotating medical images is costly and requires specialized knowledge. The diverse forms and contrasts of objects in medical images make full annotation even more challenging, resulting in incomplete annotations. Directly training ULD detectors on such datasets can yield suboptimal results. Pseudo-label-based methods examine the training data and mine unlabelled objects for retraining, which has been shown to be effective for tackling this issue. Presently, top-performing methods rely on a dynamic label-mining mechanism, operating at the mini-batch level. However, the model's performance varies at different iterations, leading to inconsistencies in the quality of the mined labels and limiting their performance enhancement. Inspired by the observation that deep models learn concepts with increasing complexity, we introduce an innovative exploratory training to assess the reliability of mined lesions over time. Specifically, we introduce a teacher-student detection model as the basis, where the teacher's predictions are combined with incomplete annotations to train the student. Additionally, we design a prediction bank to record high-confidence predictions. Each sample is trained several times, allowing us to get a sequence of records for each sample. If a prediction consistently appears in the record sequence, it is likely to be a true object; otherwise it may just be noise. This serves as a crucial criterion for selecting reliable mined lesions for retraining. Our experimental results substantiate that the proposed framework surpasses state-of-the-art methods on two medical image datasets, demonstrating its superior performance.
    摘要 通用病灶检测旨在从医学图像中检测多个器官上不同类型的病灶,对临床实践具有很高的价值。深度学习方法已取得可喜的成果,但训练需要大量标注数据;而标注医学图像成本高昂且需要专业知识,医学图像中对象形态与对比度的多样性更使完整标注难上加难,从而导致标注不完整。直接在此类数据上训练通用病灶检测器往往效果欠佳。基于伪标签的方法会检视训练数据并挖掘未标注的对象用于重新训练,已被证明能有效缓解这一问题。目前表现最好的方法依赖在小批量级别运作的动态标签挖掘机制;然而,模型在不同迭代间性能波动,导致挖掘标签的质量不一致,限制了性能提升。受深度模型按复杂度递增学习概念这一观察的启发,我们提出了一种创新的探索式训练,以随时间评估所挖掘病灶的可靠性。具体而言,我们以教师-学生检测模型为基础,将教师的预测与不完整标注结合来训练学生;并设计了预测银行以记录高置信度预测。每个样本会被多次训练,从而得到一个记录序列:若某个预测在记录序列中持续出现,它很可能是真实对象,否则可能只是噪声。这成为筛选可靠挖掘病灶用于重训的重要标准。实验结果表明,所提框架在两个医学图像数据集上超越了最先进方法,展现出优越的性能。
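The prediction-bank idea — record high-confidence teacher predictions for each sample over repeated passes and keep only boxes that reappear consistently — can be sketched as below. The confidence threshold, IoU matching criterion, and minimum hit count are assumptions for illustration only.

```python
# Illustrative prediction bank: a mined box is trusted only if it keeps
# re-appearing across training passes. All thresholds here are assumptions.
from collections import defaultdict

def box_iou(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
    return inter / union if union else 0.0

class PredictionBank:
    def __init__(self, conf_thresh=0.7, iou_thresh=0.5, min_hits=3):
        self.records = defaultdict(list)   # sample_id -> list of [box, hit_count]
        self.conf_thresh, self.iou_thresh, self.min_hits = conf_thresh, iou_thresh, min_hits

    def update(self, sample_id, boxes, scores):
        for box, score in zip(boxes, scores):
            if score < self.conf_thresh:
                continue
            for rec in self.records[sample_id]:
                if box_iou(rec[0], box) >= self.iou_thresh:
                    rec[1] += 1          # same object seen again
                    break
            else:
                self.records[sample_id].append([box, 1])

    def reliable(self, sample_id):
        """Boxes that re-appear often enough to be treated as true objects."""
        return [b for b, hits in self.records[sample_id] if hits >= self.min_hits]

bank = PredictionBank()
for _ in range(4):  # four passes over the same sample
    bank.update("scan_001", [(10, 10, 50, 60), (200, 30, 240, 90)], [0.9, 0.65])
print(bank.reliable("scan_001"))   # only the consistently high-confidence box
```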

C$^2$VAE: Gaussian Copula-based VAE Differing Disentangled from Coupled Representations with Contrastive Posterior

  • paper_url: http://arxiv.org/abs/2309.13303
  • repo_url: None
  • paper_authors: Zhangkai Wu, Longbing Cao
  • for: 同时学习解耦的与相互耦合的隐藏因素
  • methods: 使用自监督变分自编码器(VAE)、神经高斯 Copula 与对比后验分类器
  • results: 提升了解耦表示学习的效果,同时缓解了基于总相关(TC)的 VAE 的不稳定性以及重建与表示之间的权衡问题
    Abstract We present a self-supervised variational autoencoder (VAE) to jointly learn disentangled and dependent hidden factors and then enhance disentangled representation learning by a self-supervised classifier to eliminate coupled representations in a contrastive manner. To this end, a Contrastive Copula VAE (C$^2$VAE) is introduced without relying on prior knowledge about data in the probabilistic principle and involving strong modeling assumptions on the posterior in the neural architecture. C$^2$VAE simultaneously factorizes the posterior (evidence lower bound, ELBO) with total correlation (TC)-driven decomposition for learning factorized disentangled representations and extracts the dependencies between hidden features by a neural Gaussian copula for copula coupled representations. Then, a self-supervised contrastive classifier differentiates the disentangled representations from the coupled representations, where a contrastive loss regularizes this contrastive classification together with the TC loss for eliminating entangled factors and strengthening disentangled representations. C$^2$VAE demonstrates a strong effect in enhancing disentangled representation learning. C$^2$VAE further contributes to improved optimization addressing the TC-based VAE instability and the trade-off between reconstruction and representation.
    摘要 我们提出了一种自监督变分自编码器(VAE),用于同时学习解耦的与相互耦合的隐藏因素,并通过一个自监督对比分类器以对比方式消除耦合表示,从而增强解耦表示学习。为此,我们引入了对比 Copula VAE(C$^2$VAE),它既不依赖关于数据的先验概率假设,也不在神经架构中对后验做强建模假设。C$^2$VAE 利用总相关(TC)驱动的分解来分解后验(证据下界,ELBO)以学习因子化的解耦表示,并通过神经高斯 Copula 提取隐藏特征之间的依赖关系以获得耦合表示。随后,自监督对比分类器区分解耦表示与耦合表示,其对比损失与 TC 损失共同正则化训练,以消除纠缠因素并强化解耦表示。C$^2$VAE 显著增强了解耦表示学习,并有助于改善优化,缓解基于 TC 的 VAE 的不稳定性以及重建与表示之间的权衡。

Gaining the Sparse Rewards by Exploring Binary Lottery Tickets in Spiking Neural Network

  • paper_url: http://arxiv.org/abs/2309.13302
  • repo_url: None
  • paper_authors: Hao Cheng, Jiahang Cao, Erjia Xiao, Pu Zhao, Mengshu Sun, Jiaxu Wang, Jize Zhang, Xue Lin, Bhavya Kailkhura, Kaidi Xu, Renjing Xu
  • for: This paper aims to explore the efficiency of Spiking Neural Networks (SNNs) by investigating the existence of Lottery Tickets (LTs) in binary SNNs and comparing the spiking mechanism with simple model binarization.
  • methods: The paper proposes a sparse training method called Binary Weights Spiking Lottery Tickets (BinW-SLT) to find LTs in binary SNNs under different network structures.
  • results: The paper shows that BinW-SLT can achieve up to +5.86% and +3.17% improvement on CIFAR-10 and CIFAR-100 compared with binary LTs, as well as achieve 1.86x and 8.92x energy saving compared with full-precision SNNs and ANNs.
    Abstract Spiking Neural Network (SNN) as a brain-inspired strategy receives lots of attention because of the high-sparsity and low-power properties derived from its inherent spiking information state. To further improve the efficiency of SNN, some works declare that the Lottery Tickets (LTs) Hypothesis, which indicates that the Artificial Neural Network (ANN) contains a subnetwork without sacrificing the performance of the original network, also exists in SNN. However, the spiking information handled by SNN has a natural similarity and affinity with binarization in sparsification. Therefore, to further explore SNN efficiency, this paper focuses on (1) the presence or absence of LTs in the binary SNN, and (2) whether the spiking mechanism is a superior strategy in terms of handling binary information compared to simple model binarization. To certify these consumptions, a sparse training method is proposed to find Binary Weights Spiking Lottery Tickets (BinW-SLT) under different network structures. Through comprehensive evaluations, we show that BinW-SLT could attain up to +5.86% and +3.17% improvement on CIFAR-10 and CIFAR-100 compared with binary LTs, as well as achieve 1.86x and 8.92x energy saving compared with full-precision SNN and ANN.
    摘要 脉冲神经网络(SNN)作为一种类脑策略,因其内在脉冲信息状态带来的高稀疏性和低功耗特性而受到广泛关注。一些研究表明,彩票假设(LTs)——即人工神经网络(ANN)中存在不损失原网络性能的子网络——同样存在于 SNN 中。然而,SNN 处理的脉冲信息与稀疏化中的二值化具有天然的相似性与亲和力。为进一步探索 SNN 的效率,本文主要研究两个问题:(1)二值 SNN 中是否存在 LTs;(2)与简单的模型二值化相比,脉冲机制在处理二值信息时是否更优。为验证这些问题,我们提出了一种稀疏训练方法,可在不同网络结构下找到二值权重脉冲彩票(BinW-SLT)。全面评估表明,BinW-SLT 相比二值 LTs 在 CIFAR-10 和 CIFAR-100 上分别提升 +5.86% 和 +3.17%,并相比全精度 SNN 和 ANN 分别节省 1.86 倍和 8.92 倍的能耗。

MP-MVS: Multi-Scale Windows PatchMatch and Planar Prior Multi-View Stereo

  • paper_url: http://arxiv.org/abs/2309.13294
  • repo_url: https://github.com/rongxuantan/mp-mvs
  • paper_authors: Rongxuan Tan, Qing Wang, Xueyan Wang, Chao Yan, Yang Sun, Youyang Feng
  • for: 提升基于多视图立体(MVS)的三维重建精度。
  • methods: 设计了多尺度窗口 PatchMatch(mPM)以获得无纹理区域的可靠深度;改进了现有的棋盘格采样方案,将采样限制在较远区域,从而提高空间传播效率并抑制离群值;并引入和改进了平面先验辅助 PatchMatch(ACMP),利用多视图之间的几何一致性信息(而非光度一致性)来选择可靠的三角化顶点。
  • results: 在 ETH3D 高分辨率多视图基准上与多个最先进方法比较,结果表明我们的方法可以达到最先进水平。
    Abstract Significant strides have been made in enhancing the accuracy of Multi-View Stereo (MVS)-based 3D reconstruction. However, untextured areas with unstable photometric consistency often remain incompletely reconstructed. In this paper, we propose a resilient and effective multi-view stereo approach (MP-MVS). We design a multi-scale windows PatchMatch (mPM) to obtain reliable depth in untextured areas; in contrast with other multi-scale approaches, it is faster and can easily be extended to PatchMatch-based MVS approaches. Subsequently, we improve the existing checkerboard sampling schemes by limiting our sampling to distant regions, which can effectively improve the efficiency of spatial propagation while mitigating outlier generation. Finally, we introduce and improve the planar prior assisted PatchMatch of ACMP. Instead of relying on photometric consistency, we utilize geometric consistency information between multi-views to select reliable triangulated vertices. This strategy can obtain a more accurate planar prior model to rectify photometric consistency measurements. Our approach has been tested on the ETH3D High-res multi-view benchmark against several state-of-the-art approaches. The results demonstrate that our approach can reach the state-of-the-art. The associated codes will be accessible at https://github.com/RongxuanTan/MP-MVS.
    摘要 在基于多视图立体(MVS)的三维重建精度方面已经取得了显著进展,但光度一致性不稳定的无纹理区域往往仍然无法被完整重建。本文提出了一种稳健且高效的多视图立体方法(MP-MVS)。我们设计了多尺度窗口 PatchMatch(mPM)以获得无纹理区域的可靠深度;与其他多尺度方法相比,该方法更快,且容易扩展到基于 PatchMatch 的 MVS 方法。随后,我们改进了现有的棋盘格采样方案,将采样限制在较远区域,在提升空间传播效率的同时抑制离群值的产生。最后,我们引入并改进了 ACMP 的平面先验辅助 PatchMatch:不再依赖光度一致性,而是利用多视图之间的几何一致性信息来选择可靠的三角化顶点,从而获得更准确的平面先验模型来校正光度一致性度量。我们在 ETH3D 高分辨率多视图基准上与多个最先进方法进行了比较,结果表明我们的方法可以达到最先进水平。相关代码见 https://github.com/RongxuanTan/MP-MVS。

Domain-Guided Conditional Diffusion Model for Unsupervised Domain Adaptation

  • paper_url: http://arxiv.org/abs/2309.14360
  • repo_url: None
  • paper_authors: Yulong Zhang, Shuhao Chen, Weisen Jiang, Yu Zhang, Jiangang Lu, James T. Kwok
  • for: 提高深度学习模型在新应用场景中的性能,增强Unsupervised Domain Adaptation(UDA)技术的表现。
  • methods: 提出域引导条件扩散模型(DACDM),通过引入类别信息控制生成样本的标签,并引入域分类器引导样本向目标域生成,从而产生高保真且多样的目标域样本,使现有 UDA 方法更容易从源域迁移到目标域。
  • results: 实验表明,DACDM 能在多个基准上大幅提升现有 UDA 方法的表现。
    Abstract Limited transferability hinders the performance of deep learning models when applied to new application scenarios. Recently, Unsupervised Domain Adaptation (UDA) has achieved significant progress in addressing this issue via learning domain-invariant features. However, the performance of existing UDA methods is constrained by the large domain shift and limited target domain data. To alleviate these issues, we propose DomAin-guided Conditional Diffusion Model (DACDM) to generate high-fidelity and diversity samples for the target domain. In the proposed DACDM, by introducing class information, the labels of generated samples can be controlled, and a domain classifier is further introduced in DACDM to guide the generated samples for the target domain. The generated samples help existing UDA methods transfer from the source domain to the target domain more easily, thus improving the transfer performance. Extensive experiments on various benchmarks demonstrate that DACDM brings a large improvement to the performance of existing UDA methods.
    摘要 迁移能力有限会阻碍深度学习模型在新应用场景中的表现。近来,无监督领域适应(UDA)通过学习域不变特征,在解决这一问题上取得了显著进展。然而,现有 UDA 方法的性能仍受限于较大的域差异和有限的目标域数据。为缓解这些问题,我们提出了域引导条件扩散模型(DACDM),用于为目标域生成高保真且多样的样本。在 DACDM 中,通过引入类别信息可以控制生成样本的标签,并进一步引入域分类器来引导样本向目标域生成。这些生成样本帮助现有 UDA 方法更容易地从源域迁移到目标域,从而提升迁移性能。在多个基准上的大量实验表明,DACDM 能为现有 UDA 方法带来显著的性能提升。

Automatic Reverse Engineering: Creating computer-aided design (CAD) models from multi-view images

  • paper_url: http://arxiv.org/abs/2309.13281
  • repo_url: None
  • paper_authors: Henrik Jobczyk, Hanno Homann
  • for: automated reverse engineering task
  • methods: combine three distinct stages: convolutional neural network, multi-view pooling, and transformer-based CAD sequence generator
  • results: successfully reconstructed valid CAD models from simulated test image data, and demonstrated some capabilities in real-world test with actual photographs of three-dimensional test objects, but limited to basic shapes.
    Abstract Generation of computer-aided design (CAD) models from multi-view images may be useful in many practical applications. To date, this problem is usually solved with an intermediate point-cloud reconstruction and involves manual work to create the final CAD models. In this contribution, we present a novel network for an automated reverse engineering task. Our network architecture combines three distinct stages: A convolutional neural network as the encoder stage, a multi-view pooling stage and a transformer-based CAD sequence generator. The model is trained and evaluated on a large number of simulated input images and extensive optimization of model architectures and hyper-parameters is performed. A proof-of-concept is demonstrated by successfully reconstructing a number of valid CAD models from simulated test image data. Various accuracy metrics are calculated and compared to a state-of-the-art point-based network. Finally, a real world test is conducted supplying the network with actual photographs of two three-dimensional test objects. It is shown that some of the capabilities of our network can be transferred to this domain, even though the training exclusively incorporates purely synthetic training data. However to date, the feasible model complexity is still limited to basic shapes.
    摘要 从多视图图像生成计算机辅助设计(CAD)模型在许多实际应用中都可能有用。目前,这一问题通常需要借助中间点云重建来解决,并依赖手工操作来生成最终的 CAD 模型。本文提出了一种用于自动逆向工程任务的新型网络,其架构由三个阶段组成:作为编码器的卷积神经网络、多视图池化阶段,以及基于 Transformer 的 CAD 序列生成器。我们在大量模拟输入图像上训练并评估了模型,并对模型结构和超参数进行了充分优化。概念验证实验表明,该网络能够从模拟测试图像中重建出多个有效的 CAD 模型;我们还计算了多项精度指标,并与最先进的基于点云的网络进行了比较。最后,我们用两个三维测试物体的真实照片进行了实际测试,结果表明,尽管训练完全基于合成数据,网络的部分能力仍可迁移到真实场景;但目前可行的模型复杂度仍局限于基本形状。
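The three-stage pipeline (per-view encoder, multi-view pooling, sequence decoder) can be outlined as below. The tiny backbone, the element-wise max pooling fusion, and the vocabulary size are assumptions made so the sketch runs; it only shows how view features could be fused before a transformer decoder, not the paper's exact architecture.

```python
# Minimal PyTorch sketch: per-view encoder -> multi-view pooling -> sequence decoder.
# Backbone, max-pooling fusion and vocabulary size are illustrative assumptions.
import torch
import torch.nn as nn

class MultiViewToCAD(nn.Module):
    def __init__(self, feat_dim=256, vocab_size=512):
        super().__init__()
        self.encoder = nn.Sequential(              # tiny stand-in for a CNN backbone
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        layer = nn.TransformerDecoderLayer(feat_dim, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.token_embed = nn.Embedding(vocab_size, feat_dim)
        self.head = nn.Linear(feat_dim, vocab_size)

    def forward(self, views, tokens):
        # views: (B, V, 3, H, W); tokens: (B, T) partial CAD command sequence
        B, V = views.shape[:2]
        feats = self.encoder(views.flatten(0, 1)).view(B, V, -1)
        fused = feats.max(dim=1).values.unsqueeze(1)      # multi-view max pooling
        out = self.decoder(self.token_embed(tokens), memory=fused)
        return self.head(out)                             # next-token logits

model = MultiViewToCAD()
logits = model(torch.rand(2, 4, 3, 64, 64), torch.randint(0, 512, (2, 10)))
print(logits.shape)  # (2, 10, 512)
```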

Discwise Active Learning for LiDAR Semantic Segmentation

  • paper_url: http://arxiv.org/abs/2309.13276
  • repo_url: None
  • paper_authors: Ozan Unal, Dengxin Dai, Ali Tamer Unal, Luc Van Gool
  • for: This paper explores the use of active learning (AL) for LiDAR semantic segmentation, with a focus on improving annotation efficiency and reducing costs.
  • methods: The proposed method, called DiAL, uses a discwise approach to query the region covered by a single frame on global coordinates, and labels all frames simultaneously. It also addresses two major challenges in discwise AL: a new acquisition function that takes 3D point density changes into consideration, and a mixed-integer linear program to select multiple frames while avoiding disc intersections.
  • results: The proposed method is evaluated on a real-world LiDAR dataset, and shows improved performance and efficiency compared to traditional sequential labeling methods. Additionally, a semi-supervised learning approach is proposed to utilize all frames within the dataset and further improve performance.
    Abstract While LiDAR data acquisition is easy, labeling for semantic segmentation remains highly time consuming and must therefore be done selectively. Active learning (AL) provides a solution that can iteratively and intelligently label a dataset while retaining high performance and a low budget. In this work we explore AL for LiDAR semantic segmentation. As a human expert is a component of the pipeline, a practical framework must consider common labeling techniques such as sequential labeling that drastically improve annotation times. We therefore propose a discwise approach (DiAL), where in each iteration, we query the region a single frame covers on global coordinates, labeling all frames simultaneously. We then tackle the two major challenges that emerge with discwise AL. Firstly we devise a new acquisition function that takes 3D point density changes into consideration which arise due to location changes or ego-vehicle motion. Next we solve a mixed-integer linear program that provides a general solution to the selection of multiple frames while taking into consideration the possibilities of disc intersections. Finally we propose a semi-supervised learning approach to utilize all frames within our dataset and improve performance.
    摘要 LiDAR 数据的采集相对容易,但用于语义分割的标注仍然非常耗时,因此必须有选择地进行标注。主动学习(AL)提供了一种解决方案,可以迭代且智能地标注数据集,同时保持高性能和低预算。在这项工作中,我们探索了面向 LiDAR 语义分割的主动学习。由于人类专家是标注流程的一部分,一个实用的框架必须考虑顺序标注等能显著缩短标注时间的常见标注方式。为此,我们提出了一种按圆盘(disc)进行的方案 DiAL:在每次迭代中,查询单帧在全局坐标下覆盖的圆盘区域,并同时标注所有相关帧。随后,我们解决了按圆盘主动学习带来的两大挑战:其一,设计了一种新的采集函数,考虑由位置变化或自车运动引起的三维点密度变化;其二,求解一个混合整数线性规划,在考虑圆盘相交可能性的同时给出多帧选择的一般解。最后,我们提出了一种半监督学习方法,以利用数据集中的全部帧并进一步提升性能。

GLOBER: Coherent Non-autoregressive Video Generation via GLOBal Guided Video DecodER

  • paper_url: http://arxiv.org/abs/2309.13274
  • repo_url: https://github.com/iva-mzsun/glober
  • paper_authors: Mingzhen Sun, Weining Wang, Zihan Qin, Jiahui Sun, Sihan Chen, Jing Liu
  • for: 这个论文目的是提出一种新的非 autoregressive 视频生成方法,以提高视频生成的全局性和本地性。
  • methods: 该方法首先生成全局特征,以获取全面的全局指导,然后基于全局特征,通过非 autoregressive 的方式,生成具有全局性和本地性的视频帧。特别是,我们提出了一个视频自编码器,该自编码器将视频转换为全局特征,并建立了一个基于扩散模型的视频解码器,该解码器通过非 autoregressive 的方式,生成视频帧。
  • results: 实验结果表明,所提方法兼具高效率与高质量的视频生成能力,并在多个基准上取得了新的最先进结果。
    Abstract Video generation necessitates both global coherence and local realism. This work presents a novel non-autoregressive method GLOBER, which first generates global features to obtain comprehensive global guidance and then synthesizes video frames based on the global features to generate coherent videos. Specifically, we propose a video auto-encoder, where a video encoder encodes videos into global features, and a video decoder, built on a diffusion model, decodes the global features and synthesizes video frames in a non-autoregressive manner. To achieve maximum flexibility, our video decoder perceives temporal information through normalized frame indexes, which enables it to synthesize arbitrary sub video clips with predetermined starting and ending frame indexes. Moreover, a novel adversarial loss is introduced to improve the global coherence and local realism between the synthesized video frames. Finally, we employ a diffusion-based video generator to fit the global features outputted by the video encoder for video generation. Extensive experimental results demonstrate the effectiveness and efficiency of our proposed method, and new state-of-the-art results have been achieved on multiple benchmarks.
    摘要 视频生成既需要全局一致性,也需要局部真实性。本文提出了一种新的非自回归方法 GLOBER:先生成全局特征以获得全面的全局指导,再基于全局特征合成视频帧,从而生成连贯的视频。具体而言,我们提出了一个视频自编码器,其中视频编码器将视频编码为全局特征,而基于扩散模型构建的视频解码器则以非自回归方式解码全局特征并合成视频帧。为获得最大的灵活性,视频解码器通过归一化的帧索引感知时间信息,从而能够合成具有指定起止帧索引的任意视频片段。此外,我们引入了一种新的对抗损失,以提升合成视频帧之间的全局一致性和局部真实性。最后,我们利用基于扩散的视频生成器拟合视频编码器输出的全局特征来完成视频生成。大量实验结果表明,所提方法兼具有效性与高效率,并在多个基准上取得了新的最先进结果。

Randomize to Generalize: Domain Randomization for Runway FOD Detection

  • paper_url: http://arxiv.org/abs/2309.13264
  • repo_url: https://github.com/hieu9955/ggggg
  • paper_authors: Javaria Farooq, Nayyer Aafaq, M Khizer Ali Khan, Ammar Saleem, M Ibraheem Siddiqui
  • for: The paper aims to improve object detection in low-resolution images with small objects and diverse backgrounds, which is challenging for existing methods.
  • methods: The proposed method, Synthetic Randomized Image Augmentation (SRIA), consists of two stages: weakly supervised pixel-level segmentation mask generation and batch-wise synthesis of artificial images with diverse augmentations.
  • results: The proposed method significantly improves object detection accuracy on out-of-distribution (OOD) test sets, with a reported improvement from 41% to 92%. The method also outperforms several state-of-the-art (SOTA) models, including CenterNet, SSD, YOLOv3, YOLOv4, YOLOv5, and Outer Vit, on a publicly available foreign object debris (FOD) dataset.
    Abstract Tiny Object Detection is challenging due to small size, low resolution, occlusion, background clutter, lighting conditions and small object-to-image ratio. Further, object detection methodologies often make underlying assumption that both training and testing data remain congruent. However, this presumption often leads to decline in performance when model is applied to out-of-domain(unseen) data. Techniques like synthetic image generation are employed to improve model performance by leveraging variations in input data. Such an approach typically presumes access to 3D-rendered datasets. In contrast, we propose a novel two-stage methodology Synthetic Randomized Image Augmentation (SRIA), carefully devised to enhance generalization capabilities of models encountering 2D datasets, particularly with lower resolution which is more practical in real-world scenarios. The first stage employs a weakly supervised technique to generate pixel-level segmentation masks. Subsequently, the second stage generates a batch-wise synthesis of artificial images, carefully designed with an array of diverse augmentations. The efficacy of proposed technique is illustrated on challenging foreign object debris (FOD) detection. We compare our results with several SOTA models including CenterNet, SSD, YOLOv3, YOLOv4, YOLOv5, and Outer Vit on a publicly available FOD-A dataset. We also construct an out-of-distribution test set encompassing 800 annotated images featuring a corpus of ten common categories. Notably, by harnessing merely 1.81% of objects from source training data and amalgamating with 29 runway background images, we generate 2227 synthetic images. Subsequent model retraining via transfer learning, utilizing enriched dataset generated by domain randomization, demonstrates significant improvement in detection accuracy. We report that detection accuracy improved from an initial 41% to 92% for OOD test set.
    摘要 微小目标检测颇具挑战性,主要原因在于目标尺寸小、分辨率低、遮挡、背景杂乱、光照条件以及目标与图像的比例过小。此外,目标检测方法通常假设训练数据与测试数据分布一致,而当模型应用于域外(未见)数据时,这一假设往往导致性能下降。为提升模型性能,人们常借助合成图像生成技术来丰富输入数据的变化,但此类方法通常假定可以获得 3D 渲染数据集。与之不同,我们提出了一种新颖的两阶段方法——合成随机图像增强(SRIA),专为提升模型在低分辨率二维数据(更贴近真实场景)上的泛化能力而设计。第一阶段采用弱监督技术生成像素级分割掩码;第二阶段按批次合成带有多种增强的人工图像。我们在具有挑战性的跑道异物(FOD)检测任务上验证了该方法的有效性,并在公开的 FOD-A 数据集上与 CenterNet、SSD、YOLOv3、YOLOv4、YOLOv5 和 Outer Vit 等多个最先进模型进行了比较。我们还构建了一个包含 800 张标注图像、涵盖十个常见类别的分布外测试集。值得注意的是,仅利用源训练数据中 1.81% 的目标,并与 29 张跑道背景图像组合,即可生成 2227 张合成图像。随后通过迁移学习,在由域随机化生成的增强数据集上重新训练模型,检测精度显著提升:在分布外测试集上从最初的 41% 提高到 92%。
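The second-stage image synthesis can be pictured with a mask-guided copy-paste sketch: paste object crops (with their segmentation masks as alpha channels) onto background images at random positions. Placement, scale, and the stand-in data below are assumptions; this is a generic illustration of the synthesis idea, not SRIA's exact augmentation set.

```python
# Illustrative sketch of mask-guided copy-paste synthesis: paste a foreign-object
# crop onto a runway background at a random position. Positions and stand-in
# data are assumptions for demonstration.
import numpy as np

def paste_object(background, obj_rgba, rng):
    """Paste an RGBA object crop (alpha = segmentation mask) onto a background."""
    bg = background.copy()
    H, W, _ = bg.shape
    h, w, _ = obj_rgba.shape
    y0 = rng.integers(0, H - h)
    x0 = rng.integers(0, W - w)
    alpha = obj_rgba[..., 3:4] / 255.0
    region = bg[y0:y0 + h, x0:x0 + w, :]
    blended = alpha * obj_rgba[..., :3] + (1 - alpha) * region
    bg[y0:y0 + h, x0:x0 + w, :] = blended.astype(np.uint8)
    bbox = (x0, y0, x0 + w, y0 + h)     # detection label for the pasted object
    return bg, bbox

rng = np.random.default_rng(0)
runway = np.full((480, 640, 3), 120, dtype=np.uint8)           # stand-in background
bolt = np.zeros((32, 48, 4), dtype=np.uint8)
bolt[..., :3] = 200
bolt[..., 3] = 255                                              # fully opaque mask
image, bbox = paste_object(runway, bolt, rng)
print(bbox)
```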

Order-preserving Consistency Regularization for Domain Adaptation and Generalization

  • paper_url: http://arxiv.org/abs/2309.13258
  • repo_url: None
  • paper_authors: Mengmeng Jing, Xiantong Zhen, Jingjing Li, Cees Snoek
  • for: 提高cross-domain任务中深度学习模型的Robustness,使其不受特定领域属性的影响。
  • methods: 采用数据增强和一致规范来使模型更不敏感于特定领域属性。
  • results: 对五种不同的cross-domain任务进行了全面的实验,得到了明显的优势。
    Abstract Deep learning models fail on cross-domain challenges if the model is oversensitive to domain-specific attributes, e.g., lightning, background, camera angle, etc. To alleviate this problem, data augmentation coupled with consistency regularization are commonly adopted to make the model less sensitive to domain-specific attributes. Consistency regularization enforces the model to output the same representation or prediction for two views of one image. These constraints, however, are either too strict or not order-preserving for the classification probabilities. In this work, we propose the Order-preserving Consistency Regularization (OCR) for cross-domain tasks. The order-preserving property for the prediction makes the model robust to task-irrelevant transformations. As a result, the model becomes less sensitive to the domain-specific attributes. The comprehensive experiments show that our method achieves clear advantages on five different cross-domain tasks.
    摘要 当深度学习模型对域特有属性(如光照、背景、摄像机角度等)过于敏感时,就会在跨域任务中失效。为缓解这一问题,通常采用数据增强配合一致性正则化,使模型对域特有属性不那么敏感。一致性正则化要求模型对同一张图像的两个视图输出相同的表示或预测。然而,这类约束要么过于严格,要么无法保持分类概率的相对次序。在本工作中,我们提出了保序一致性正则化(OCR)方法,预测的保序性质使模型对与任务无关的变换具有鲁棒性,从而降低模型对域特有属性的敏感度。大量实验表明,我们的方法在五个不同的跨域任务上具有明显优势。
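The paper's exact loss is not reproduced here; the snippet below only illustrates the general idea of an order-preserving consistency constraint: penalize pairs of classes whose predicted-probability ordering flips between two views of the same image. The pairwise sign-agreement formulation is my own assumption for illustration.

```python
# Hedged sketch of an order-preserving consistency penalty: for two views of the
# same image, penalise class pairs whose probability ordering disagrees.
# This is a generic illustration, not the exact OCR loss from the paper.
import torch
import torch.nn.functional as F

def order_consistency_loss(logits_a, logits_b):
    p_a = F.softmax(logits_a, dim=-1)              # (B, C)
    p_b = F.softmax(logits_b, dim=-1)
    diff_a = p_a.unsqueeze(2) - p_a.unsqueeze(1)   # pairwise differences (B, C, C)
    diff_b = p_b.unsqueeze(2) - p_b.unsqueeze(1)
    # A pair is violated when the sign of the difference disagrees between views.
    violation = F.relu(-diff_a * diff_b)
    return violation.mean()

logits_a = torch.randn(4, 10, requires_grad=True)   # view 1 of a batch
logits_b = logits_a + 0.1 * torch.randn(4, 10)      # perturbed view 2
loss = order_consistency_loss(logits_a, logits_b)
loss.backward()
print(float(loss))
```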

RTrack: Accelerating Convergence for Visual Object Tracking via Pseudo-Boxes Exploration

  • paper_url: http://arxiv.org/abs/2309.13257
  • repo_url: None
  • paper_authors: Guotian Zeng, Bi Zeng, Hong Zhang, Jianqi Liu, Qingmao Wei
  • for: 提高单目标跟踪(SOT)的性能,减少训练时间并靠近对象检测(OD)任务
  • methods: 使用一组样本点来获取pseudo bounding box,自动调整这些点来定义空间范围和强调本地区域
  • results: 在 GOT-10k 数据集上取得与最先进(SOTA)跟踪器相当的性能,训练时间仅为先前 SOTA 跟踪器的 10%,使 SOT 更接近目标检测(OD)任务,并展现出更快的收敛速度。
    Abstract Single object tracking (SOT) heavily relies on the representation of the target object as a bounding box. However, due to the potential deformation and rotation experienced by the tracked targets, the genuine bounding box fails to capture the appearance information explicitly and introduces cluttered background. This paper proposes RTrack, a novel object representation baseline tracker that utilizes a set of sample points to get a pseudo bounding box. RTrack automatically arranges these points to define the spatial extents and highlight local areas. Building upon the baseline, we conducted an in-depth exploration of the training potential and introduced a one-to-many leading assignment strategy. It is worth noting that our approach achieves competitive performance to the state-of-the-art trackers on the GOT-10k dataset while reducing training time to just 10% of the previous state-of-the-art (SOTA) trackers' training costs. The substantial reduction in training costs brings single-object tracking (SOT) closer to the object detection (OD) task. Extensive experiments demonstrate that our proposed RTrack achieves SOTA results with faster convergence.
    摘要 单目标跟踪(SOT)高度依赖以包围盒来表示目标物体。然而,由于被跟踪目标可能发生形变和旋转,真实的包围盒无法显式捕捉外观信息,并会引入杂乱的背景。本文提出了一种新的目标表示基线跟踪器 RTrack,它利用一组采样点来获得伪包围盒,并自动排布这些点以刻画空间范围、突出局部区域。在该基线之上,我们深入挖掘了训练潜力,并引入了一对多的引导分配策略。值得注意的是,我们的方法在 GOT-10k 数据集上取得了与最先进(SOTA)跟踪器相当的性能,同时将训练时间降至先前 SOTA 跟踪器训练开销的 10%。训练开销的大幅降低使单目标跟踪(SOT)更接近目标检测(OD)任务。大量实验表明,所提出的 RTrack 以更快的收敛速度取得了 SOTA 结果。
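A pseudo bounding box can be read off a set of predicted sample points simply by taking their coordinate extremes; the sketch below shows that conversion with made-up points (the number of points is an assumption, and RTrack's point-arrangement learning is not reproduced here).

```python
# Illustrative conversion of predicted sample points into a pseudo bounding box
# by taking coordinate extremes. The example points are made up.
import numpy as np

def points_to_pseudo_box(points):
    """points: (N, 2) array of (x, y) sample points -> (x1, y1, x2, y2)."""
    points = np.asarray(points, dtype=float)
    x1, y1 = points.min(axis=0)
    x2, y2 = points.max(axis=0)
    return float(x1), float(y1), float(x2), float(y2)

pts = np.array([[105, 62], [131, 58], [118, 96], [142, 88], [99, 75]])
print(points_to_pseudo_box(pts))   # (99.0, 58.0, 142.0, 96.0)
```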

Rethinking Amodal Video Segmentation from Learning Supervised Signals with Object-centric Representation

  • paper_url: http://arxiv.org/abs/2309.13248
  • repo_url: https://github.com/kfan21/eoras
  • paper_authors: Ke Fan, Jingshi Lei, Xuelin Qian, Miaopeng Yu, Tianjun Xiao, Tong He, Zheng Zhang, Yanwei Fu
  • for: 该论文旨在为视频非模态(amodal)分割任务提出一种高效的以物体为中心的表示方法。
  • methods: 该方法在真实场景中利用监督信号与以物体为中心的表示来提升特征质量,具体包括:一个将图像特征投影到鸟瞰视图(BEV)以引入三维信息的翻译模块,以及一个配备物体槽、通过注意力机制与不同视图特征交互以完善物体表示的多视图融合时序模块。
  • results: 在多个合成与真实基准上的实验表明,该方法实现了最先进的视频非模态分割性能。
    Abstract Video amodal segmentation is a particularly challenging task in computer vision, which requires to deduce the full shape of an object from the visible parts of it. Recently, some studies have achieved promising performance by using motion flow to integrate information across frames under a self-supervised setting. However, motion flow has a clear limitation by the two factors of moving cameras and object deformation. This paper presents a rethinking to previous works. We particularly leverage the supervised signals with object-centric representation in \textit{real-world scenarios}. The underlying idea is the supervision signal of the specific object and the features from different views can mutually benefit the deduction of the full mask in any specific frame. We thus propose an Efficient object-centric Representation amodal Segmentation (EoRaS). Specially, beyond solely relying on supervision signals, we design a translation module to project image features into the Bird's-Eye View (BEV), which introduces 3D information to improve current feature quality. Furthermore, we propose a multi-view fusion layer based temporal module which is equipped with a set of object slots and interacts with features from different views by attention mechanism to fulfill sufficient object representation completion. As a result, the full mask of the object can be decoded from image features updated by object slots. Extensive experiments on both real-world and synthetic benchmarks demonstrate the superiority of our proposed method, achieving state-of-the-art performance. Our code will be released at \url{https://github.com/kfan21/EoRaS}.
    摘要 视频无模板分割是计算机视觉中特别具有挑战性的任务,需要从可见部分中推断物体的全部形状。最近几年,一些研究已经取得了一定的成果,通过在无监督的设置下使用运动流来集成帧中的信息。然而,运动流受到两个因素的限制:摄像机的移动和物体的变形。本文提出了对前一些研究的重新思考。我们特别利用实际场景中的监督信号和不同视图中的特征进行协同协调,以解决任意帧中的全面 маска推断。我们因此提出了一种高效的物体中心表示协调分割(EoRaS)方法。具体来说,我们不仅仅依靠监督信号,还设计了一种将图像特征投影到鸟瞰视(BEV)中的翻译模块,以此增加图像特征的3D信息。此外,我们提出了基于多视图的协同协调层,该层配备了一组物体槽,通过注意机制与不同视图中的特征进行互动,以便填充物体的完整表示。因此,从图像特征更新后的物体槽中可以解码出物体的全面 маска。广泛的实验表明,我们提出的方法在实际和Syntheticbenchmark上具有最高性能,达到了状态盘的表现。我们的代码将在\url{https://github.com/kfan21/EoRaS}上发布。

Multi-modal Domain Adaptation for REG via Relation Transfer

  • paper_url: http://arxiv.org/abs/2309.13247
  • repo_url: None
  • paper_authors: Yifan Ding, Liqiang Wang, Boqing Gong
  • for: 本文旨在为指代表达定位(Referring Expression Grounding, REG)问题提出一种新的多模态知识迁移方法。
  • methods: 我们提出的方法通过专门针对关系的设计,同时丰富域间关系并在域之间迁移关系,以提升多模态知识的可迁移性。
  • results: 实验表明,我们的方法显著提升了多模态域的可迁移性,并提高了 REG 问题中区域定位的准确率。
    Abstract Domain adaptation, which aims to transfer knowledge between domains, has been well studied in many areas such as image classification and object detection. However, for multi-modal tasks, conventional approaches rely on large-scale pre-training. But due to the difficulty of acquiring multi-modal data, large-scale pre-training is often impractical. Therefore, domain adaptation, which can efficiently utilize the knowledge from different datasets (domains), is crucial for multi-modal tasks. In this paper, we focus on the Referring Expression Grounding (REG) task, which is to localize an image region described by a natural language expression. Specifically, we propose a novel approach to effectively transfer multi-modal knowledge through a specially relation-tailored approach for the REG problem. Our approach tackles the multi-modal domain adaptation problem by simultaneously enriching inter-domain relations and transferring relations between domains. Experiments show that our proposed approach significantly improves the transferability of multi-modal domains and enhances adaptation performance in the REG problem.

RBFormer: Improve Adversarial Robustness of Transformer by Robust Bias

  • paper_url: http://arxiv.org/abs/2309.13245
  • repo_url: None
  • paper_authors: Hao Cheng, Jinhao Duan, Hui Li, Lyutianyang Zhang, Jiahang Cao, Ping Wang, Jize Zhang, Kaidi Xu, Renjing Xu
  • for: 本研究旨在考察基于 Transformer 的结构的内在对抗鲁棒性,而非提出新的对抗攻击防御措施。
  • methods: 我们采用合理的结构设计方法来缓解鲁棒性问题,具体而言是提高高频结构鲁棒偏置的比例。
  • results: 实验结果显示,与多个现有基线结构相比,RBFormer 表现出鲁棒性优势,在 CIFAR-10 和 ImageNet-1k 的不同评价指标上分别提升 +16.12% 和 +5.04%。
    Abstract Recently, there has been a surge of interest and attention in Transformer-based structures, such as Vision Transformer (ViT) and Vision Multilayer Perceptron (VMLP). Compared with the previous convolution-based structures, the Transformer-based structure under investigation showcases a comparable or superior performance under its distinctive attention-based input token mixer strategy. Introducing adversarial examples as a robustness consideration has had a profound and detrimental impact on the performance of well-established convolution-based structures. This inherent vulnerability to adversarial attacks has also been demonstrated in Transformer-based structures. In this paper, our emphasis lies on investigating the intrinsic robustness of the structure rather than introducing novel defense measures against adversarial attacks. To address the susceptibility to robustness issues, we employ a rational structure design approach to mitigate such vulnerabilities. Specifically, we enhance the adversarial robustness of the structure by increasing the proportion of high-frequency structural robust biases. As a result, we introduce a novel structure called Robust Bias Transformer-based Structure (RBFormer) that shows robust superiority compared to several existing baseline structures. Through a series of extensive experiments, RBFormer outperforms the original structures by a significant margin, achieving an impressive improvement of +16.12% and +5.04% across different evaluation criteria on CIFAR-10 and ImageNet-1k, respectively.
    摘要 近期,以视觉 Transformer(ViT)和视觉多层感知机(VMLP)为代表的 Transformer 类结构受到了广泛关注。与以往基于卷积的结构相比,这类结构凭借其独特的基于注意力的输入 token 混合策略,展现出相当甚至更优的性能。引入对抗样本作为鲁棒性考量后,成熟的卷积结构表现出了固有的脆弱性,而 Transformer 类结构同样存在这种对对抗攻击的脆弱性。本文的重点在于考察结构本身的内在鲁棒性,而非提出新的对抗攻击防御措施。为解决鲁棒性方面的脆弱性,我们采用合理的结构设计方法,提高高频结构鲁棒偏置的比例,由此提出了一种新的结构——鲁棒偏置 Transformer 结构(RBFormer),其鲁棒性优于多个现有基线结构。一系列广泛的实验表明,RBFormer 显著优于原始结构,在 CIFAR-10 和 ImageNet-1k 的不同评价指标上分别取得 +16.12% 和 +5.04% 的提升。

NeRF-Enhanced Outpainting for Faithful Field-of-View Extrapolation

  • paper_url: http://arxiv.org/abs/2309.13240
  • repo_url: None
  • paper_authors: Rui Yu, Jiachen Liu, Zihan Zhou, Sharon X. Huang
  • for: 增强环境感知,如 робоット导航和远程视觉协助等应用场景中,扩大摄像头的视野范围有利于提高环境感知。
  • methods: 使用 NeRF 生成扩展视野图像,并利用这些图像训练场景特定的图像外扩(outpainting)模型。
  • results: 在三个真实感渲染数据集和一个真实世界数据集上进行了广泛评估,结果表明该方法稳健且具有潜力。
    Abstract In various applications, such as robotic navigation and remote visual assistance, expanding the field of view (FOV) of the camera proves beneficial for enhancing environmental perception. Unlike image outpainting techniques aimed solely at generating aesthetically pleasing visuals, these applications demand an extended view that faithfully represents the scene. To achieve this, we formulate a new problem of faithful FOV extrapolation that utilizes a set of pre-captured images as prior knowledge of the scene. To address this problem, we present a simple yet effective solution called NeRF-Enhanced Outpainting (NEO) that uses extended-FOV images generated through NeRF to train a scene-specific image outpainting model. To assess the performance of NEO, we conduct comprehensive evaluations on three photorealistic datasets and one real-world dataset. Extensive experiments on the benchmark datasets showcase the robustness and potential of our method in addressing this challenge. We believe our work lays a strong foundation for future exploration within the research community.
    摘要 在各种应用中,如 робоット导航和远程视觉协助,扩展相机的视场(FOV)有利于提高环境识别。不同于基于图像涂抹技术的艺术化视觉,这些应用需要一个扩展的视野,准确反映场景。为实现这一点,我们提出了一个新的 faithful FOV 拓展问题,利用场景中预 capture 的图像作为知识来培养场景特定的图像涂抹模型。为解决这个问题,我们提出了一种简单 yet effective 的解决方案called NeRF-Enhanced Outpainting(NEO),使用通过NeRF生成的扩展 FOV 图像来训练场景特定的图像涂抹模型。为评估 NEO 的性能,我们进行了三个实验室数据集和一个真实世界数据集的全面评估。广泛的实验表明我们的方法在解决这个挑战中具有强大的基础和潜力。我们认为我们的研究为未来研究社区提供了一个坚强的基础。

Spatial-Temporal Knowledge-Embedded Transformer for Video Scene Graph Generation

  • paper_url: http://arxiv.org/abs/2309.13237
  • repo_url: https://github.com/hcplab-sysu/stket
  • paper_authors: Tao Pu, Tianshui Chen, Hefeng Wu, Yongyi Lu, Liang Lin
  • for: VidSGG aims to identify objects in visual scenes and infer their relationships for a given video.
  • methods: The proposed method, STKET, incorporates prior spatial-temporal knowledge into the multi-head cross-attention mechanism to learn more representative relationship representations.
  • results: STKET outperforms current competing algorithms by a large margin, with improvements of 8.1%, 4.7%, and 2.1% on different settings.
    Abstract Video scene graph generation (VidSGG) aims to identify objects in visual scenes and infer their relationships for a given video. It requires not only a comprehensive understanding of each object scattered on the whole scene but also a deep dive into their temporal motions and interactions. Inherently, object pairs and their relationships enjoy spatial co-occurrence correlations within each image and temporal consistency/transition correlations across different images, which can serve as prior knowledge to facilitate VidSGG model learning and inference. In this work, we propose a spatial-temporal knowledge-embedded transformer (STKET) that incorporates the prior spatial-temporal knowledge into the multi-head cross-attention mechanism to learn more representative relationship representations. Specifically, we first learn spatial co-occurrence and temporal transition correlations in a statistical manner. Then, we design spatial and temporal knowledge-embedded layers that introduce the multi-head cross-attention mechanism to fully explore the interaction between visual representation and the knowledge to generate spatial- and temporal-embedded representations, respectively. Finally, we aggregate these representations for each subject-object pair to predict the final semantic labels and their relationships. Extensive experiments show that STKET outperforms current competing algorithms by a large margin, e.g., improving the mR@50 by 8.1%, 4.7%, and 2.1% on different settings over current algorithms.
    摘要 视频场景图生成(VidSGG)目标是从视频中标识对象并推断它们之间的关系。这需要不仅对整个场景中每个对象进行全面的理解,还需要深入研究它们的时间变化和互动。自然地,对象对的关系具有在每个图像中的空间相互关联和在不同图像之间的时间相关性,这些知识可以用来促进VidSGG模型的学习和推断。在这种工作中,我们提出了一种具有空间-时间知识嵌入的变换器(STKET),它将在多头交叉关注机制中嵌入先前学习的空间-时间知识,以学习更加表示关系的表示。具体来说,我们首先在统计方面学习空间共occurrence和时间转换关系。然后,我们设计了空间和时间知识嵌入层,以全面地探索视觉表示和知识之间的交互,生成空间-时间嵌入表示。最后,我们对每个主体-对象对进行综合这些表示,以预测最终的semantic标签和其关系。广泛的实验显示,STKET在不同设置下比现有竞争算法大幅提高了性能,例如在不同设置下提高mR@50的表现,比如提高8.1%, 4.7%和2.1%。
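The "statistical" spatial co-occurrence prior described above can be pictured as a normalized count table over (subject class, predicate, object class) triples observed in training annotations. The data structures and toy triples below are assumptions for illustration; STKET's actual knowledge-embedded layers are not reproduced.

```python
# Illustrative construction of a spatial co-occurrence prior: for each pair of
# object categories, estimate the empirical distribution over predicates seen
# in training annotations. Toy data and structures are assumptions.
from collections import Counter, defaultdict

def build_cooccurrence_prior(annotations):
    """annotations: iterable of (subject_class, predicate, object_class) triples."""
    counts = defaultdict(Counter)
    for subj, pred, obj in annotations:
        counts[(subj, obj)][pred] += 1
    prior = {}
    for pair, preds in counts.items():
        total = sum(preds.values())
        prior[pair] = {p: c / total for p, c in preds.items()}
    return prior

train_triples = [
    ("person", "holding", "cup"), ("person", "holding", "cup"),
    ("person", "drinking_from", "cup"), ("person", "sitting_on", "chair"),
]
prior = build_cooccurrence_prior(train_triples)
print(prior[("person", "cup")])   # predicate distribution for the (person, cup) pair
```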

M$^3$CS: Multi-Target Masked Point Modeling with Learnable Codebook and Siamese Decoders

  • paper_url: http://arxiv.org/abs/2309.13235
  • repo_url: None
  • paper_authors: Qibo Qiu, Honghui Yang, Wenxiao Wang, Shun Zhang, Haiming Gao, Haochao Ying, Wei Hua, Xiaofei He
  • for: 本研究旨在提升点云自监督预训练的表现,使模型兼具低层与高层表示能力,以捕捉点云的几何细节与语义上下文。
  • methods: 该方法以掩码点云为输入,引入两个解码器同时预测掩码表示与原始点云;在解码过程中采用孪生(siamese)解码器以保持可学习参数数量不变;此外提出在线码本,将连续 token 映射为离散 token。
  • results: 实验结果显示,M$^3$CS 在分类与分割任务中均优于现有方法。
    Abstract Masked point modeling has become a promising scheme of self-supervised pre-training for point clouds. Existing methods reconstruct either the original points or related features as the objective of pre-training. However, considering the diversity of downstream tasks, it is necessary for the model to have both low- and high-level representation modeling capabilities to capture geometric details and semantic contexts during pre-training. To this end, M$^3$CS is proposed to enable the model with the above abilities. Specifically, with masked point cloud as input, M$^3$CS introduces two decoders to predict masked representations and the original points simultaneously. While an extra decoder doubles parameters for the decoding process and may lead to overfitting, we propose siamese decoders to keep the amount of learnable parameters unchanged. Further, we propose an online codebook projecting continuous tokens into discrete ones before reconstructing masked points. In such way, we can enforce the decoder to take effect through the combinations of tokens rather than remembering each token. Comprehensive experiments show that M$^3$CS achieves superior performance at both classification and segmentation tasks, outperforming existing methods.
    摘要 掩码点建模已成为点云自监督预训练中一种颇具前景的方案。现有方法通常以重建原始点或相关特征作为预训练目标。然而,考虑到下游任务的多样性,模型需要同时具备低层与高层表示建模能力,以在预训练中捕捉点云的几何细节与语义上下文。为此,我们提出了 M$^3$CS。具体而言,以掩码点云为输入,M$^3$CS 引入两个解码器同时预测掩码表示与原始点。额外的解码器会使解码过程的参数翻倍并可能导致过拟合,因此我们提出孪生(siamese)解码器,以保持可学习参数数量不变。此外,我们提出在线码本,在重建掩码点之前将连续 token 投影为离散 token,从而迫使解码器通过 token 的组合发挥作用,而不是记忆每个 token。大量实验表明,M$^3$CS 在分类与分割任务中均取得优异表现,优于现有方法。

Real3D-AD: A Dataset of Point Cloud Anomaly Detection

  • paper_url: http://arxiv.org/abs/2309.13226
  • repo_url: https://github.com/m-3lab/real3d-ad
  • paper_authors: Jiaqi Liu, Guoyang Xie, Ruitao Chen, Xinpeng Li, Jinbao Wang, Yong Liu, Chengjie Wang, Feng Zheng
  • for: 高精度点云异常检测是精密加工与精密制造领域的金标准,用于检测制造过程中的缺陷。
  • methods: 我们提出了一种基于配准的 3D 异常检测方法 Reg3D-AD,该方法包含一种新颖的特征记忆库,可同时保留局部与全局表示。
  • results: 我们在Real3D-AD dataset上进行了广泛的实验,并证明了Reg3D-AD的效果。Real3D-AD dataset是目前最大的高精度3D工业异常检测dataset,它包括1,254个高分辨率3D项,每个item有40,000到百万个点。
    Abstract High-precision point cloud anomaly detection is the gold standard for identifying the defects of advancing machining and precision manufacturing. Despite some methodological advances in this area, the scarcity of datasets and the lack of a systematic benchmark hinder its development. We introduce Real3D-AD, a challenging high-precision point cloud anomaly detection dataset, addressing the limitations in the field. With 1,254 high-resolution 3D items from forty thousand to millions of points for each item, Real3D-AD is the largest dataset for high-precision 3D industrial anomaly detection to date. Real3D-AD surpasses existing 3D anomaly detection datasets available regarding point cloud resolution (0.0010mm-0.0015mm), 360 degree coverage and perfect prototype. Additionally, we present a comprehensive benchmark for Real3D-AD, revealing the absence of baseline methods for high-precision point cloud anomaly detection. To address this, we propose Reg3D-AD, a registration-based 3D anomaly detection method incorporating a novel feature memory bank that preserves local and global representations. Extensive experiments on the Real3D-AD dataset highlight the effectiveness of Reg3D-AD. For reproducibility and accessibility, we provide the Real3D-AD dataset, benchmark source code, and Reg3D-AD on our website:https://github.com/M-3LAB/Real3D-AD.
    摘要 高精度点云异常检测是识别精密加工与精密制造缺陷的金标准。尽管该领域已有一些方法上的进展,但数据集的稀缺和系统性基准的缺失阻碍了其发展。为此,我们提出了 Real3D-AD——一个具有挑战性的高精度点云异常检测数据集。Real3D-AD 包含 1,254 个高分辨率三维样本,每个样本包含四万到数百万个点,是迄今为止规模最大的高精度三维工业异常检测数据集,在点云分辨率(0.0010mm-0.0015mm)、360 度覆盖和完整原型方面均超越现有的三维异常检测数据集。此外,我们给出了 Real3D-AD 的综合基准,指出高精度点云异常检测尚缺乏基线方法。为此,我们提出了基于配准的三维异常检测方法 Reg3D-AD,其引入了一种可同时保留局部与全局表示的新颖特征记忆库。在 Real3D-AD 上的大量实验验证了 Reg3D-AD 的有效性。为便于复现与使用,数据集、基准源代码与 Reg3D-AD 已发布于 https://github.com/M-3LAB/Real3D-AD。
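Registration details aside, the feature-memory-bank scoring step can be sketched generically: store features from anomaly-free prototypes, then score a test feature by its distance to the nearest stored feature. This nearest-neighbour formulation and the random stand-in features are assumptions; it is not Reg3D-AD's exact procedure.

```python
# Generic sketch of memory-bank anomaly scoring: a test feature is scored by its
# distance to the closest feature stored from anomaly-free prototypes.
# Illustration only, not Reg3D-AD's exact procedure.
import numpy as np

class FeatureMemoryBank:
    def __init__(self):
        self.bank = None

    def fit(self, normal_features):
        """normal_features: (N, D) features extracted from defect-free samples."""
        self.bank = np.asarray(normal_features, dtype=float)

    def score(self, features):
        """Anomaly score per feature = distance to the nearest memory entry."""
        feats = np.asarray(features, dtype=float)
        d2 = ((feats[:, None, :] - self.bank[None, :, :]) ** 2).sum(-1)
        return np.sqrt(d2.min(axis=1))

rng = np.random.default_rng(0)
bank = FeatureMemoryBank()
bank.fit(rng.normal(size=(500, 64)))                 # features from good parts
normal_like = rng.normal(size=(3, 64))
defect_like = rng.normal(loc=3.0, size=(3, 64))      # shifted = anomalous
print(bank.score(normal_like), bank.score(defect_like))
```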

cs.AI - 2023-09-23

Enhancing Student Performance Prediction on Learnersourced Questions with SGNN-LLM Synergy

  • paper_url: http://arxiv.org/abs/2309.13500
  • repo_url: None
  • paper_authors: Lin Ni, Sijie Wang, Zeyu Zhang, Xiaoxuan Li, Xianda Zheng, Paul Denny, Jiamou Liu
  • for: 本文面向学习者出题(learnersourcing)这一新兴教育策略,解决因学生生成数据中固有噪声而难以预测学生答题表现的问题。
  • methods: 本文将有符号图神经网络(SGNN)与大语言模型(LLM)嵌入相结合:用有符号二分图对学生作答进行全面建模,并辅以对比学习框架增强抗噪能力;LLM 用于生成题目的基础嵌入,在交互稀少的冷启动场景中尤为有利。
  • results: 在来自 PeerWise 平台的五个真实数据集上的验证表明,该方法优于基线,预测精度与鲁棒性均有提升。
    Abstract As an emerging education strategy, learnersourcing offers the potential for personalized learning content creation, but also grapples with the challenge of predicting student performance due to inherent noise in student-generated data. While graph-based methods excel in capturing dense learner-question interactions, they falter in cold start scenarios, characterized by limited interactions, as seen when questions lack substantial learner responses. In response, we introduce an innovative strategy that synergizes the potential of integrating Signed Graph Neural Networks (SGNNs) and Large Language Model (LLM) embeddings. Our methodology employs a signed bipartite graph to comprehensively model student answers, complemented by a contrastive learning framework that enhances noise resilience. Furthermore, LLM's contribution lies in generating foundational question embeddings, proving especially advantageous in addressing cold start scenarios characterized by limited graph data interactions. Validation across five real-world datasets sourced from the PeerWise platform underscores our approach's effectiveness. Our method outperforms baselines, showcasing enhanced predictive accuracy and robustness.
    摘要 作为一种新兴的教育策略,学习者出题(learnersourcing)为个性化学习内容创建带来了潜力,但由于学生生成数据中固有的噪声,其在学生表现预测方面也面临挑战。基于图的方法擅长刻画密集的学生—题目交互,但在交互有限的冷启动场景(例如题目缺乏足够的学生作答)中表现不佳。为此,我们提出了一种将有符号图神经网络(SGNN)与大语言模型(LLM)嵌入相结合的创新策略:利用有符号二分图对学生作答进行全面建模,并辅以对比学习框架以增强抗噪能力;LLM 的贡献在于生成题目的基础嵌入,这在图交互数据有限的冷启动场景中尤为有利。我们在来自 PeerWise 平台的五个真实数据集上验证了该方法,结果优于基线,展现出更高的预测精度与鲁棒性。

Enhancing Prediction and Analysis of UK Road Traffic Accident Severity Using AI: Integration of Machine Learning, Econometric Techniques, and Time Series Forecasting in Public Health Research

  • paper_url: http://arxiv.org/abs/2309.13483
  • repo_url: None
  • paper_authors: Md Abu Sufian, Jayasree Varadarajan
  • for: 本研究基于历史数据,综合运用机器学习、计量经济学与统计方法,研究英国道路交通事故的严重程度。
  • methods: 我们使用了各种技术,包括相关分析、回归模型、GMM 处理错误项、时间序列预测VAR 和 ARIMA 模型。
  • results: 我们的方法优于朴素预测方法,MASE 为 0.800,ME 为 -73.80;构建的随机森林分类器取得 73% 精确率、78% 召回率和 73% F1-score;经 H2O AutoML 优化得到的 XGBoost 模型 RMSE 为 0.176、MAE 为 0.087;因子分析确定了关键变量,并使用 SHAP 实现可解释 AI,突出了 Driver_Home_Area_Type 和 Road_Type 等重要因素。
    Abstract This research investigates road traffic accident severity in the UK, using a combination of machine learning, econometric, and statistical methods on historical data. We employed various techniques, including correlation analysis, regression models, GMM for error term issues, and time-series forecasting with VAR and ARIMA models. Our approach outperforms naive forecasting with an MASE of 0.800 and ME of -73.80. We also built a random forest classifier with 73% precision, 78% recall, and a 73% F1-score. Optimizing with H2O AutoML led to an XGBoost model with an RMSE of 0.176 and MAE of 0.087. Factor Analysis identified key variables, and we used SHAP for Explainable AI, highlighting influential factors like Driver_Home_Area_Type and Road_Type. Our study enhances understanding of accident severity and offers insights for evidence-based road safety policies.
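The MASE figure quoted in the results compares forecast errors against a naive (lag-1) forecast; values below 1 mean the model beats the naive baseline. A minimal computation with made-up numbers (not the UK accident data) is shown below.

```python
# Minimal sketch of the Mean Absolute Scaled Error (MASE) against a naive
# lag-1 forecast. The series below is made-up illustration data.
import numpy as np

def mase(y_true, y_pred, y_train):
    naive_mae = np.mean(np.abs(np.diff(y_train)))          # in-sample naive error
    return np.mean(np.abs(y_true - y_pred)) / naive_mae

y_train = np.array([120., 118., 125., 130., 128., 127.])
y_true = np.array([126., 129., 131.])
y_pred = np.array([127., 128., 133.])
print(round(mase(y_true, y_pred, y_train), 3))
```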

Personalised and Adjustable Interval Type-2 Fuzzy-Based PPG Quality Assessment for the Edge

  • paper_url: http://arxiv.org/abs/2309.13464
  • repo_url: None
  • paper_authors: Jose A. Miranda, Celia López-Ongil, Javier Andreu-Perez
  • for: 这篇论文主要是为了提出一种基于Interval Type-2 Fuzzy Logic System (IT2FLS)的个性化和可调PPG信号质量评估方法,以提高PPG信号处理的准确性和可靠性。
  • methods: 该方法使用了个性化的IT2FLS参数来适应每个个体PPG信号的特点,同时提供可调的个性化水平,让医疗提供者可以根据不同应用场景进行调整。
  • results: 实验结果显示,提出的方法可以达到93.72%的准确率,表明该方法可以实现高效、实时的PPG信号质量评估,并提高PPG信号处理系统的准确性和可靠性。
    Abstract Most of today's wearable technology provides seamless cardiac activity monitoring. Specifically, the vast majority employ Photoplethysmography (PPG) sensors to acquire blood volume pulse information, which is further analysed to extract useful and physiologically related features. Nevertheless, PPG-based signal reliability presents different challenges that strongly affect such data processing. This is mainly related to the fact of PPG morphological wave distortion due to motion artefacts, which can lead to erroneous interpretation of the extracted cardiac-related features. On this basis, in this paper, we propose a novel personalised and adjustable Interval Type-2 Fuzzy Logic System (IT2FLS) for assessing the quality of PPG signals. The proposed system employs a personalised approach to adapt the IT2FLS parameters to the unique characteristics of each individual's PPG signals.Additionally, the system provides adjustable levels of personalisation, allowing healthcare providers to adjust the system to meet specific requirements for different applications. The proposed system obtained up to 93.72\% for average accuracy during validation. The presented system has the potential to enable ultra-low complexity and real-time PPG quality assessment, improving the accuracy and reliability of PPG-based health monitoring systems at the edge.
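An interval type-2 fuzzy set replaces a single membership value with a lower/upper pair; a Gaussian membership function with an uncertain standard deviation is one standard construction. The snippet below only illustrates that machinery with assumed parameters; it is not the paper's tuned rule base.

```python
# Illustrative interval type-2 Gaussian membership function: an input is mapped
# to a [lower, upper] membership interval via two standard deviations.
# Parameter values are assumptions, not the paper's personalised rule base.
import numpy as np

def it2_gaussian(x, mean, sigma_lower, sigma_upper):
    lower = np.exp(-0.5 * ((x - mean) / sigma_lower) ** 2)   # narrower MF
    upper = np.exp(-0.5 * ((x - mean) / sigma_upper) ** 2)   # wider MF
    return lower, upper

x = np.linspace(-1.0, 1.0, 5)            # e.g. a normalised PPG quality feature
lo, up = it2_gaussian(x, mean=0.0, sigma_lower=0.2, sigma_upper=0.4)
print(np.round(lo, 3), np.round(up, 3))
```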

A Model-Agnostic Graph Neural Network for Integrating Local and Global Information

  • paper_url: http://arxiv.org/abs/2309.13459
  • repo_url: None
  • paper_authors: Wenzhuo Zhou, Annie Qu, Keiland W. Cooper, Norbert Fortin, Babak Shahbaba
  • for: 提高图像任务的解释性和可解释性,以及提高图像任务的表现。
  • methods: 提出了一种新的模型独立图像神经网络(MaGNet)框架,可以逐渐融合不同阶次的信息,提取高阶几何结构中的知识,并提供可解释的结果。
  • results: 在 simulate 数据上进行了广泛的数值研究,并在一个真实世界的案例中对 brain activity 数据进行了应用,以确认 MaGNet 的效果。
    Abstract Graph Neural Networks (GNNs) have achieved promising performance in a variety of graph-focused tasks. Despite their success, existing GNNs suffer from two significant limitations: a lack of interpretability in results due to their black-box nature, and an inability to learn representations of varying orders. To tackle these issues, we propose a novel Model-agnostic Graph Neural Network (MaGNet) framework, which is able to sequentially integrate information of various orders, extract knowledge from high-order neighbors, and provide meaningful and interpretable results by identifying influential compact graph structures. In particular, MaGNet consists of two components: an estimation model for the latent representation of complex relationships under graph topology, and an interpretation model that identifies influential nodes, edges, and important node features. Theoretically, we establish the generalization error bound for MaGNet via empirical Rademacher complexity, and showcase its power to represent layer-wise neighborhood mixing. We conduct comprehensive numerical studies using simulated data to demonstrate the superior performance of MaGNet in comparison to several state-of-the-art alternatives. Furthermore, we apply MaGNet to a real-world case study aimed at extracting task-critical information from brain activity data, thereby highlighting its effectiveness in advancing scientific research.
    摘要 图神经网络(GNN)已经在各种以图为中心的任务中表现出色。尽管如此,现有 GNN 仍存在两个重要限制:一是黑盒特性导致结果难以解释,二是无法学习不同阶数的表示。为了解决这些问题,我们提出了一种新的模型无关图神经网络(MaGNet)框架,能够逐步融合不同阶数的信息,从高阶邻居中提取知识,并通过识别有影响力的紧凑图结构来提供有意义且可解释的结果。具体来说,MaGNet 由两个部分组成:一个用于估计图拓扑下复杂关系潜在表示的估计模型,以及一个用于识别重要节点、边和节点特征的解释模型。在理论上,我们通过经验 Rademacher 复杂度建立了 MaGNet 的泛化误差界,并展示了其表示逐层邻域混合的能力。我们使用模拟数据进行了广泛的数值研究,证明 MaGNet 相比多种最先进的替代方法表现更优。此外,我们将 MaGNet 应用于大脑活动数据的真实案例,以验证其在推动科学研究中的效果。
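The abstract highlights layer-wise neighborhood mixing across orders. The sketch below shows the generic idea of blending node features propagated over increasingly high-order neighborhoods with per-order weights, assuming a symmetrically normalized adjacency matrix; it illustrates the concept only and is not the authors' MaGNet implementation.

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetrically normalized adjacency with self-loops: D^-1/2 (A+I) D^-1/2."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def mix_orders(A, X, order_weights):
    """Blend node features propagated over 0..K-hop neighbourhoods,
    weighting each order; higher orders capture more global structure."""
    A_norm = normalize_adjacency(A)
    H, out = X.copy(), np.zeros_like(X, dtype=float)
    for w in order_weights:          # order_weights[k] weights the k-hop term
        out += w * H
        H = A_norm @ H               # move one hop further out
    return out

# Toy 4-node graph with 2-dimensional node features (illustrative values).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.random.default_rng(0).normal(size=(4, 2))
print(mix_orders(A, X, order_weights=[0.5, 0.3, 0.2]))  # local + higher-order mix
```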

EMGTFNet: Fuzzy Vision Transformer to decode Upperlimb sEMG signals for Hand Gestures Recognition

  • paper_url: http://arxiv.org/abs/2310.03754
  • repo_url: None
  • paper_authors: Joseph Cherre Córdova, Christian Flores, Javier Andreu-Perez
  • for: 这个论文是为了研究用于手势识别(HGR)的电Myoelectric控制而写的。
  • methods: 这篇论文使用机器学习和深度学习方法进行模式识别,并使用视Transformer(ViT)架构和粗糙神经块(FNB)组成EMGTFNet模型来实现手势识别。
  • results: 该模型可以准确地识别多种手势动作,而无需使用数据扩展技术、迁移学习或增加网络参数的数量。实验结果显示,对于NinaPro数据集中的49种手势动作,使用200 ms窗口大小和56,793个可训练参数时,测试准确率为83.57% ± 3.5%。这些结果超越了不含FNB的ViT模型,因此证明了包含FNB可以提高其性能。
    Abstract Myoelectric control is an area of electromyography of increasing interest nowadays, particularly in applications such as Hand Gesture Recognition (HGR) for bionic prostheses. Today's focus is on pattern recognition using Machine Learning and, more recently, Deep Learning methods. Despite achieving good results on sparse sEMG signals, the latter models typically require large datasets and long training times. Furthermore, due to the stochastic nature of sEMG signals, traditional models fail to generalize to atypical or noisy samples. In this paper, we propose the design of a Vision Transformer (ViT) based architecture with a Fuzzy Neural Block (FNB) called EMGTFNet to perform Hand Gesture Recognition from surface electromyography (sEMG) signals. The proposed EMGTFNet architecture can accurately classify a variety of hand gestures without any need for data augmentation techniques, transfer learning or a significant increase in the number of parameters in the network. The accuracy of the proposed model is tested using the publicly available NinaPro database consisting of 49 different hand gestures. Experiments yield an average test accuracy of 83.57\% $\pm$ 3.5\% using a 200 ms window size and only 56,793 trainable parameters. Our results outperform the ViT without FNB, thus demonstrating that including FNB improves its performance. The proposed EMGTFNet framework shows significant potential for practical application in prosthetic control.
    摘要 肌电控制是肌电图研究中日益受到关注的领域,特别是在仿生义肢的手势识别(HGR)等应用中。当前的重点是使用机器学习以及近来的深度学习方法进行模式识别。尽管这些模型在稀疏 sEMG 信号上取得了不错的结果,但通常需要大量数据和较长的训练时间。此外,由于 sEMG 信号的随机性,传统模型难以泛化到异常或含噪声的样本。在本文中,我们提出了一种基于视觉 Transformer(ViT)架构并结合模糊神经块(FNB)的 EMGTFNet,用于从表面肌电(sEMG)信号中识别手势。所提出的 EMGTFNet 架构无需数据增强技术、迁移学习或显著增加网络参数数量,即可准确分类多种手势。模型的准确率在公开的 NinaPro 数据集(包含 49 种不同手势)上进行了测试:仅使用 200 ms 窗口和 56,793 个可训练参数,实验得到 83.57% ± 3.5% 的平均测试准确率。我们的结果优于不含 FNB 的 ViT,证明加入 FNB 能提升其性能。所提出的 EMGTFNet 框架在义肢控制的实际应用中具有显著潜力。
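The reported accuracy is computed over 200 ms windows of the sEMG recording. A minimal sketch of that kind of sliding-window segmentation is given below; the sampling rate and overlap are assumptions for illustration and may differ from the paper's preprocessing.

```python
import numpy as np

def segment_windows(emg, fs, window_ms=200, overlap=0.5):
    """Split a multi-channel sEMG recording (samples x channels) into
    fixed-length windows suitable for a classifier such as EMGTFNet."""
    win = int(fs * window_ms / 1000)
    step = max(1, int(win * (1 - overlap)))
    starts = range(0, emg.shape[0] - win + 1, step)
    return np.stack([emg[s:s + win] for s in starts])

# Toy recording: 2 seconds of 12-channel sEMG at an assumed 2 kHz sampling rate.
fs = 2000
emg = np.random.default_rng(1).normal(size=(2 * fs, 12))
windows = segment_windows(emg, fs)
print(windows.shape)  # (num_windows, 400 samples, 12 channels)
```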

AxOMaP: Designing FPGA-based Approximate Arithmetic Operators using Mathematical Programming

  • paper_url: http://arxiv.org/abs/2309.13445
  • repo_url: None
  • paper_authors: Siva Satyendra Sahoo, Salim Ullah, Akash Kumar
  • for: 本研究旨在设计低成本计算机算符 для遥感系统中的机器学习(ML)算法。
  • methods: 本研究使用了人工智能/机器学习(AI/ML)基于的方法来设计FPGA基于的伪函数。
  • results: 相比传统的进化算法基于优化方法,本研究使用了混合整数二次函数 constrained programs来实现更有向性的搜索,并提高了精度和性能。
    Abstract With the increasing application of machine learning (ML) algorithms in embedded systems, there is a rising necessity to design low-cost computer arithmetic for these resource-constrained systems. As a result, emerging models of computation, such as approximate and stochastic computing, that leverage the inherent error-resilience of such algorithms are being actively explored for implementing ML inference on resource-constrained systems. Approximate computing (AxC) aims to provide disproportionate gains in the power, performance, and area (PPA) of an application by allowing some level of reduction in its behavioral accuracy (BEHAV). Using approximate operators (AxOs) for computer arithmetic forms one of the more prevalent methods of implementing AxC. AxOs provide the additional scope for finer granularity of optimization, compared to only precision scaling of computer arithmetic. To this end, designing platform-specific and cost-efficient approximate operators forms an important research goal. Recently, multiple works have reported using AI/ML-based approaches for synthesizing novel FPGA-based AxOs. However, most of such works limit usage of AI/ML to designing ML-based surrogate functions used during iterative optimization processes. To this end, we propose a novel data analysis-driven mathematical programming-based approach to synthesizing approximate operators for FPGAs. Specifically, we formulate mixed integer quadratically constrained programs based on the results of correlation analysis of the characterization data and use the solutions to enable a more directed search approach for evolutionary optimization algorithms. Compared to traditional evolutionary algorithms-based optimization, we report up to 21% improvement in the hypervolume, for joint optimization of PPA and BEHAV, in the design of signed 8-bit multipliers.
    摘要 随着机器学习(ML)算法在嵌入式系统中的应用逐渐增加,为这些资源受限的系统设计低成本的计算机算术变得尤为必要。为此,人们正在积极探索近似计算和随机计算等新的计算模型,以利用此类算法固有的容错性,在资源受限系统上实现 ML 推理。近似计算(AxC)的目标是在允许一定程度的行为精度(BEHAV)下降的前提下,为应用带来不成比例的功耗、性能和面积(PPA)收益。使用近似算子(AxO)实现计算机算术是实现 AxC 的常见方法之一;与仅对计算精度进行缩放相比,AxO 提供了更细粒度的优化空间。因此,设计面向特定平台且成本高效的近似算子是一个重要的研究目标。近来,多项工作报道了使用 AI/ML 方法来合成新的基于 FPGA 的 AxO,但其中大多数仅将 AI/ML 用于构建迭代优化过程中的代理函数。为此,我们提出了一种由数据分析驱动、基于数学规划的近似算子合成方法。具体而言,我们依据特征数据的相关性分析结果构建混合整数二次约束规划,并利用其解为进化优化算法提供更有方向性的搜索。与传统的基于进化算法的优化相比,在有符号 8 位乘法器的设计中,针对 PPA 与 BEHAV 的联合优化,我们报告了最高 21% 的超体积提升。

How Do Drivers Behave at Roundabouts in a Mixed Traffic? A Case Study Using Machine Learning

  • paper_url: http://arxiv.org/abs/2309.13442
  • repo_url: None
  • paper_authors: Farah Abu Hamad, Rama Hasiba, Deema Shahwan, Huthaifa I. Ashqar
  • for: 这个研究旨在分类车手在环形巷与其他路用者之间的交互行为,以提高路面安全性。
  • methods: 使用数据驱动的无监督机器学习方法,基于车辆动力学数据对驾驶行为进行分类,分为三种驾驶风格(保守、正常、激进)。
  • results: 研究发现,大多数车手在环形巷上的行为可以分为两种驾驶模式:保守和正常,因为环形巷的交通速度较低。此外,发现当车手与行人或自行车使用者互动时,大约77%的车手被分类为保守驾驶者,对于不参与互动的保守驾驶者而言,只有42%。这些结果显示车手在环形巷与其他路用者互动时可能会发生不寻常的行为,增加了交通碰撞的风险。
    Abstract Driving behavior is considered a unique driving habit of each driver and has a significant impact on road safety. Classifying driving behavior and introducing policies based on the results can reduce the severity of crashes on the road. Roundabouts are particularly interesting because of the interconnected interactions between different road users in the roundabout area, where different driving behaviors are hypothesized. This study investigates driving behavior at roundabouts in a mixed traffic environment, using data-driven unsupervised machine learning to classify driving behavior at three roundabouts in Germany. We used a dataset of vehicle kinematics covering a group of different vehicles and vulnerable road users (VRUs) at roundabouts and classified drivers into three categories (i.e., conservative, normal, and aggressive). Results showed that most drivers proceeding through a roundabout can be classified into two driving styles, conservative and normal, because traffic speeds in roundabouts are relatively lower than at other signalized and unsignalized intersections. Results also showed that about 77% of drivers who interacted with pedestrians or cyclists were classified as conservative, compared to about 42% of drivers who did not interact with VRUs and about 51% of all drivers. Drivers thus tend to behave atypically when they interact with VRUs at roundabouts, which increases the risk of crashes when an intersection is multimodal. The results of this study could help improve road safety by allowing policymakers to determine effective and suitable safety countermeasures. They will also be beneficial for Advanced Driver Assistance Systems (ADAS) as the technology is deployed in mixed traffic environments.
    摘要 驾驶行为被视为每位驾驶员的特有驾驶习惯,对路面安全有着重要影响。根据不同驾驶行为分类并采取相应政策可以减轻路面上的事故严重程度。圆形交叉口特别有趣,因为不同的驾驶行为在圆形交叉口的交叉点发生了互相关联的互动。本研究使用数据驱动无监督机器学习方法在德国三个圆形交叉口中分类驾驶行为。我们使用了车辆动态数据来分类不同的车辆和护理用路用户(VRU)在圆形交叉口中的驾驶行为,并将其分为三类(即保守、常规和强制)。结果显示,大多数通过圆形交叉口的驾驶员可以分为两种驾驶风格:保守和常规,因为圆形交叉口的交通速度相对较低。结果还显示,与步行者或自行车用户互动的77%的驾驶员被分类为保守驾驶员,与不与步行者或自行车用户互动的42%的保守驾驶员相比。这表明在多模式交叉口中,驾驶员在与VRU互动时有异常的行为,这会增加路面上的风险。本研究的结果可以帮助政策制定者确定有效和适当的安全防范措施。此外,这些结果还将有助于高等技术应用系统(ADAS)在混合交通环境中部署。
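A minimal sketch of the unsupervised workflow described above: clustering per-track kinematic features into three driving styles with k-means. The feature set, library choice, and synthetic values are assumptions for illustration, not the authors' exact pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical per-track kinematic features: [mean speed (m/s),
# max |acceleration| (m/s^2), max |deceleration| (m/s^2), speed variance].
rng = np.random.default_rng(42)
features = np.vstack([
    rng.normal([6, 0.8, 0.9, 0.5], 0.2, size=(40, 4)),   # conservative-looking tracks
    rng.normal([9, 1.5, 1.6, 1.0], 0.3, size=(40, 4)),   # normal-looking tracks
    rng.normal([12, 3.0, 3.2, 2.0], 0.4, size=(20, 4)),  # aggressive-looking tracks
])

X = StandardScaler().fit_transform(features)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(np.bincount(labels))  # cluster sizes; clusters are then interpreted as styles
```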

Finding Order in Chaos: A Novel Data Augmentation Method for Time Series in Contrastive Learning

  • paper_url: http://arxiv.org/abs/2309.13439
  • repo_url: https://github.com/eth-siplab/Finding_Order_in_Chaos
  • paper_authors: Berken Utku Demirel, Christian Holz
  • for: 这 paper 的目的是提出一种新的数据增强方法,用于 quasi-periodic 时间序列任务,以连接内类样本并找到隐藏空间中的顺序。
  • methods: 该方法基于 mixup 技术,并提出了一种考虑非平稳时间序列周期性的新方法。通过控制数据增强引入的混乱程度,该方法可以改善特征表示并提高下游任务的性能。
  • results: 对于三个时间序列任务(心率估算、人类活动识别和心血管疾病检测),该方法与州前工作相比,表现出了更好的数据生成和知道数据增强技术。
    Abstract The success of contrastive learning is well known to be dependent on data augmentation. Although the degree of data augmentations has been well controlled by utilizing pre-defined techniques in some domains like vision, time-series data augmentation is less explored and remains a challenging problem due to the complexity of the data generation mechanism, such as the intricate mechanism involved in the cardiovascular system. Moreover, there is no widely recognized and general time-series augmentation method that can be applied across different tasks. In this paper, we propose a novel data augmentation method for quasi-periodic time-series tasks that aims to connect intra-class samples together, and thereby find order in the latent space. Our method builds upon the well-known mixup technique by incorporating a novel approach that accounts for the periodic nature of non-stationary time-series. Also, by controlling the degree of chaos created by data augmentation, our method leads to improved feature representations and performance on downstream tasks. We evaluate our proposed method on three time-series tasks, including heart rate estimation, human activity recognition, and cardiovascular disease detection. Extensive experiments against state-of-the-art methods show that the proposed approach outperforms prior works on optimal data generation and known data augmentation techniques in the three tasks, reflecting the effectiveness of the presented method. Source code: https://github.com/eth-siplab/Finding_Order_in_Chaos
    摘要 众所周知,对比学习的成功依赖于数据增强。虽然在视觉等某些领域中,可以利用预定义的技术很好地控制数据增强的程度,但时间序列数据增强仍然是一个挑战,因为时间序列数据生成机制十分复杂,例如心血管系统中的精细机制。此外,目前还没有一种广泛认可、可适用于不同任务的通用时间序列数据增强方法。在这篇论文中,我们针对准周期时间序列任务提出了一种新的数据增强方法,旨在将同类样本连接起来,从而在隐空间中找到顺序。我们的方法基于已知的 mixup 技术,并加入了一种考虑非平稳时间序列周期性的新方法。此外,通过控制数据增强所产生的混乱程度,我们的方法能够获得更好的特征表示和下游任务性能。我们在三个时间序列任务(心率估计、人体活动识别和心血管疾病检测)上进行了广泛的实验。与现有的最优数据生成和已知数据增强技术相比,我们的方法在这三个任务上都表现更好,体现了所提方法的有效性。源代码:https://github.com/eth-siplab/Finding_Order_in_Chaos
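As a point of reference for the method above, the sketch below applies plain mixup to two toy quasi-periodic signals. It deliberately shows the vanilla baseline, not the authors' phase-aware variant (available in their repository): naively mixing signals with different periods blurs the dominant frequency, which is the failure mode their augmentation addresses.

```python
import numpy as np

def mixup(x1, x2, alpha=0.2, rng=np.random.default_rng(0)):
    """Vanilla mixup: convex combination of two samples with a Beta-drawn weight."""
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam

# Two toy quasi-periodic signals (e.g. PPG-like) with slightly different rates.
t = np.linspace(0, 10, 1000)
x1 = np.sin(2 * np.pi * 1.0 * t) + 0.05 * np.random.default_rng(1).normal(size=t.size)
x2 = np.sin(2 * np.pi * 1.2 * t) + 0.05 * np.random.default_rng(2).normal(size=t.size)

x_aug, lam = mixup(x1, x2)
print(f"mixing coefficient lambda = {lam:.3f}, augmented length = {x_aug.size}")
# Naive mixing of signals with different periods can blur the dominant frequency,
# which is exactly the failure mode the paper's phase-aware augmentation targets.
```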

Rethinking Superpixel Segmentation from Biologically Inspired Mechanisms

  • paper_url: http://arxiv.org/abs/2309.13438
  • repo_url: None
  • paper_authors: Tingyu Zhao, Bo Peng, Yuan Sun, Daipeng Yang, Zhenguang Zhang, Xi Wu
  • for: 这个论文主要针对的是提高基于深度学习的超像素分割方法的效率和性能;然而在生成严格遵循物体边界的超像素时,仍然存在一定的挑战。
  • methods: 我们提出了一种受生物机制启发的超像素分割网络架构,包括增强筛选模块(ESM)和新的边界感知标签(BAL)。ESM通过模拟视觉系统中的交互投影机制来增强语义信息;BAL利用视觉皮层细胞的空间频率特性,促进生成具有强边界贴合性的超像素。
  • results: 我们通过在BSDS500 dataset和NYUv2 dataset上的评估,证明了所提方法的有效性。
    Abstract Recently, advancements in deep learning-based superpixel segmentation methods have brought about improvements in both the efficiency and the performance of segmentation. However, a significant challenge remains in generating superpixels that strictly adhere to object boundaries while conveying rich visual significance, especially when cross-surface color correlations may interfere with objects. Drawing inspiration from neural structure and visual mechanisms, we propose a biological network architecture comprising an Enhanced Screening Module (ESM) and a novel Boundary-Aware Label (BAL) for superpixel segmentation. The ESM enhances semantic information by simulating the interactive projection mechanisms of the visual cortex. Additionally, the BAL emulates the spatial frequency characteristics of visual cortical cells to facilitate the generation of superpixels with strong boundary adherence. We demonstrate the effectiveness of our approach through evaluations on both the BSDS500 dataset and the NYUv2 dataset.
    摘要 近些年,深度学习基于超像素分割方法的进步,使得分割效率和性能得到了改善。然而,仍然存在一大挑战,即生成严格遵循物体边界的超像素,同时捕捉富有视觉意义的信息,特别是当颜色相关性障碍物体时。 drawing inspiration from neural structure and visual mechanisms, we propose a biological network architecture consisting of an Enhanced Screening Module (ESM) and a novel Boundary-Aware Label (BAL) for superpixel segmentation. The ESM enhances semantic information by simulating the interactive projection mechanisms of the visual cortex. Additionally, the BAL emulates the spatial frequency characteristics of visual cortical cells to facilitate the generation of superpixels with strong boundary adherence. We demonstrate the effectiveness of our approach through evaluations on both the BSDS500 dataset and the NYUv2 dataset.

SpeakEasy: A Conversational Intelligence Chatbot for Enhancing College Students’ Communication Skills

  • paper_url: http://arxiv.org/abs/2310.14891
  • repo_url: None
  • paper_authors: Hyunbae Jeon, Rhea Ramachandran, Victoria Ploerer, Yella Diekmann, Max Bagga
  • for: The paper aims to help college students improve their communication skills through a chatbot that provides feedback on their conversational ability.
  • methods: The chatbot, called SpeakEasy, uses a seven-minute spoken conversation with the user, analyzes the user's responses with metrics based on previous research, and provides feedback on how to improve conversational ability.
  • results: SpeakEasy evaluates the quality of the conversation using macros and provides elaborate feedback to the user on how to improve their conversations. The chatbot also updates its algorithms based on the user's responses to questions about its performance.
    Abstract Social interactions and conversation skills separate the successful from the rest and the confident from the shy. For college students in particular, the ability to converse can be an outlet for the stress and anxiety experienced on a daily basis along with a foundation for all-important career skills. In light of this, we designed SpeakEasy: a chatbot with some degree of intelligence that provides feedback to the user on their ability to engage in free-form conversations with the chatbot. SpeakEasy attempts to help college students improve their communication skills by engaging in a seven-minute spoken conversation with the user, analyzing the user's responses with metrics designed based on previous psychology and linguistics research, and providing feedback to the user on how they can improve their conversational ability. To simulate natural conversation, SpeakEasy converses with the user on a wide assortment of topics that two people meeting for the first time might discuss: travel, sports, and entertainment. Unlike most other chatbots with the goal of improving conversation skills, SpeakEasy actually records the user speaking, transcribes the audio into tokens, and uses macros-e.g., sequences that calculate the pace of speech, determine if the user has an over-reliance on certain words, and identifies awkward transitions-to evaluate the quality of the conversation. Based on the evaluation, SpeakEasy provides elaborate feedback on how the user can improve their conversations. In turn, SpeakEasy updates its algorithms based on a series of questions that the user responds to regarding SpeakEasy's performance.
    摘要 社交交流和对话技巧对成功和自信心是非常重要的,尤其是 для大学生。在日常生活中受到压力和焦虑的情况下,与其他人交流可以是一种缓解压力的方式,同时也是职业技能的基础。为了帮助大学生提高communication skills,我们开发了SpeakEasy:一个具有一定程度的人工智能的chatbot,可以与用户进行7分钟的自由对话,并提供用户在对话中的表现评价。SpeakEasy使用了基于前期心理学和语言学研究的度量来评估用户的对话能力,并提供了用户如何改进对话技巧的具体反馈。与其他帮助提高对话技巧的chatbot不同,SpeakEasy实际记录用户的语音,将语音转录为符号,并使用抽象来评估对话质量。SpeakEasy使用的抽象包括语速度、用户语言使用情况和对话过渡的awkwardness等。基于这些评估结果,SpeakEasy提供了详细的反馈, помо助用户改进对话技巧。而SpeakEasy的算法则基于用户对SpeakEasy的表现进行评价的问题来进行更新。
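Two of the macro-style metrics mentioned above, speech pace and over-reliance on particular words, are easy to illustrate from a transcript. The sketch below is a generic illustration with invented thresholds and is not SpeakEasy's actual scoring code.

```python
from collections import Counter

def speech_pace(transcript: str, duration_seconds: float) -> float:
    """Words per minute over the spoken segment."""
    words = transcript.split()
    return 60.0 * len(words) / duration_seconds

def overused_words(transcript: str, threshold: float = 0.15, min_len: int = 4):
    """Content words whose relative frequency exceeds a threshold,
    a rough proxy for over-reliance on certain words."""
    words = [w.strip(".,!?").lower() for w in transcript.split()]
    counts = Counter(w for w in words if len(w) >= min_len)
    total = sum(counts.values()) or 1
    return {w: c / total for w, c in counts.items() if c / total > threshold}

sample = ("I basically went to Rome last summer and it was basically amazing, "
          "basically the food was great and basically everyone was friendly.")
print(round(speech_pace(sample, duration_seconds=9.0), 1), "wpm")
print(overused_words(sample))  # flags 'basically' as over-used
```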

Resolving References in Visually-Grounded Dialogue via Text Generation

  • paper_url: http://arxiv.org/abs/2309.13430
  • repo_url: https://github.com/willemsenbram/reference-resolution-via-text-generation
  • paper_authors: Bram Willemsen, Livia Qian, Gabriel Skantze
  • for: 用于解决基于对话语言的视觉引用解决方案,提高视觉语言模型(VLM)的对话处理能力。
  • methods: 使用修改的大语言模型(LLM)生成定语描述,捕捉对话语言上的核心相关信息;使用预训练的VLM来基于生成的定语描述进行零基本训练引用识别。
  • results: 在人工标注的视觉对话数据集上测试,与基eline比较的result exceeds,并发现使用更大的上下文窗口可以获得更高的返回率。
    Abstract Vision-language models (VLMs) have shown to be effective at image retrieval based on simple text queries, but text-image retrieval based on conversational input remains a challenge. Consequently, if we want to use VLMs for reference resolution in visually-grounded dialogue, the discourse processing capabilities of these models need to be augmented. To address this issue, we propose fine-tuning a causal large language model (LLM) to generate definite descriptions that summarize coreferential information found in the linguistic context of references. We then use a pretrained VLM to identify referents based on the generated descriptions, zero-shot. We evaluate our approach on a manually annotated dataset of visually-grounded dialogues and achieve results that, on average, exceed the performance of the baselines we compare against. Furthermore, we find that using referent descriptions based on larger context windows has the potential to yield higher returns.
    摘要 传感语言模型(VLM)在基于简单文本查询的图像检索方面表现出色,但基于对话输入的文本-图像检索仍然是一个挑战。因此,如果我们想使用VLM进行视觉定位对话,那么这些模型的语言处理能力需要进行增强。为解决这个问题,我们提议通过细化大语言模型(LLM)来生成定语描述,捕捉在语言上下文中的核心相关信息。然后,我们使用预训练的VLM来根据生成的描述来确定参照,无需训练。我们对手动标注的视觉定位对话集进行评估,并超越比较基线的性能。此外,我们发现使用基于更大上下文窗口的定语描述有可能带来更高的返回。

Modeling Student Performance in Game-Based Learning Environments

  • paper_url: http://arxiv.org/abs/2309.13429
  • repo_url: https://github.com/harryjeon24/student_performance
  • paper_authors: Hyunbae Jeon, Harry He, Anthony Wang, Susanna Spooner
  • for: 这项研究探讨了基于游戏学习的教育游戏”Jo Wilder和首都案例”,关注使用不同机器学习模型预测学生表现,包括K-最近邻居(KNN)、多层感知神经网络(MLP)和随机森林。研究目标是确定预测学生表现和正确问题答案的最有价值特征。
  • methods: 通过利用游戏数据,我们建立了完整的基准chmarks для这些模型,并探讨了如何应用正确的数据聚合方法。我们压缩了原始训练数据的大小从4.6 GB压缩到48 MB的预处理训练数据,保持了高F1分数和准确率。
  • results: 我们的发现表明,适当的预处理技术可以在不使用深度学习模型的情况下提高表现。MLP模型在French Touch模型当前状态的比较中表现出色,达到F-1分数0.83和准确率0.74,这表明其适用于这个数据集。未来的研究应该探索使用更大的数据集、其他预处理技术、更先进的深度学习技术和实际应用来为学生根据预测表现提供个性化学习建议。这项研究贡献于游戏学习理解和优化教育游戏经验,以提高学生的成绩和技能发展。
    Abstract This study investigates game-based learning in the context of the educational game "Jo Wilder and the Capitol Case," focusing on predicting student performance using various machine learning models, including K-Nearest Neighbors (KNN), Multi-Layer Perceptron (MLP), and Random Forest. The research aims to identify the features most predictive of student performance and correct question answering. By leveraging gameplay data, we establish complete benchmarks for these models and explore the importance of applying proper data aggregation methods. By compressing all numeric data to min/max/mean/sum and categorical data to first, last, count, and nunique, we reduced the size of the original training data from 4.6 GB to 48 MB of preprocessed training data, maintaining high F1 scores and accuracy. Our findings suggest that proper preprocessing techniques can be vital in enhancing the performance of non-deep-learning-based models. The MLP model outperformed the current state-of-the-art French Touch model, achieving an F-1 score of 0.83 and an accuracy of 0.74, suggesting its suitability for this dataset. Future research should explore using larger datasets, other preprocessing techniques, more advanced deep learning techniques, and real-world applications to provide personalized learning recommendations to students based on their predicted performance. This paper contributes to the understanding of game-based learning and provides insights into optimizing educational game experiences for improved student outcomes and skill development.
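The preprocessing step described above compresses per-session gameplay logs by aggregating numeric columns to min/max/mean/sum and categorical columns to first/last/count/nunique. A minimal pandas sketch of that aggregation pattern follows; the column names are invented for illustration.

```python
import pandas as pd

# Toy gameplay log: several events per session.
log = pd.DataFrame({
    "session_id": [1, 1, 1, 2, 2],
    "elapsed_time": [10, 25, 40, 5, 30],
    "room_coor_x": [0.1, 0.4, 0.3, 0.9, 0.7],
    "event_name": ["click", "hover", "click", "click", "drag"],
})

numeric_aggs = {"elapsed_time": ["min", "max", "mean", "sum"],
                "room_coor_x": ["min", "max", "mean", "sum"]}
categorical_aggs = {"event_name": ["first", "last", "count", "nunique"]}

compressed = log.groupby("session_id").agg({**numeric_aggs, **categorical_aggs})
compressed.columns = ["_".join(c) for c in compressed.columns]  # flatten MultiIndex
print(compressed)
# One row per session instead of one row per event -- the same idea that shrank
# the raw 4.6 GB of logs down to 48 MB of model-ready features.
```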

ECGNet: A generative adversarial network (GAN) approach to the synthesis of 12-lead ECG signals from single lead inputs

  • paper_url: http://arxiv.org/abs/2310.03753
  • repo_url: None
  • paper_authors: Max Bagga, Hyunbae Jeon, Alex Issokson
  • for: 这个论文的目的是生成完整的12导电cardiogram信号,并使用GAN模型来实现这一目标。
  • methods: 这个论文使用了GAN模型,bidirectional LSTM生成器和CNN抗对模型来生成12导电cardiogram信号。
  • results: 该模型可以很好地保留信号中的特有特征,例如P-Q段和R峰的特征,并且可以预测多种心血管疾病的发生。
    Abstract Electrocardiography (ECG) signal generation has been heavily explored using generative adversarial networks (GAN) because the implementation of 12-lead ECGs is not always feasible. The GAN models have achieved remarkable results in reproducing ECG signals but are only designed for multiple lead inputs and the features the GAN model preserves have not been identified-limiting the generated signals use in cardiovascular disease (CVD)-predictive models. This paper presents ECGNet which is a procedure that generates a complete set of 12-lead ECG signals from any single lead input using a GAN framework with a bidirectional long short-term memory (LSTM) generator and a convolutional neural network (CNN) discriminator. Cross and auto-correlation analysis performed on the generated signals identifies features conserved during the signal generation-i.e., features that can characterize the unique-nature of each signal and thus likely indicators of CVD. Finally, by using ECG signals annotated with the CVD-indicative features detailed by the correlation analysis as inputs for a CVD-onset-predictive CNN model, we overcome challenges preventing the prediction of multiple-CVD targets. Our models are experimented on 15s 12-lead ECG dataset recorded using MyoVista's wavECG. Functional outcome data for each patient is recorded and used in the CVD-predictive model. Our best GAN model achieves state-of-the-art accuracy with Frechet Distance (FD) scores of 4.73, 4.89, 5.18, 4.77, 4.71, and 5.55 on the V1-V6 pre-cordial leads respectively and shows strength in preserving the P-Q segments and R-peaks in the generated signals. To the best of our knowledge, ECGNet is the first to predict all of the remaining eleven leads from the input of any single lead.
    摘要 电rokardiography(ECG)信号生成已经得到了广泛的探索,使用生成对抗网络(GAN),因为实施12导ECG的实施不一定可行。GAN模型已经实现了对ECG信号的很好的重现,但是它们只是多导输入的,而且保留的特征没有得到了识别-这限制了生成的信号在冠军疾病预测中的使用。本文提出了ECGNet,一种可以从单个导入信号中生成完整的12导ECG信号的GAN框架,包括一个双向长短期记忆(LSTM)生成器和一个卷积神经网络(CNN)分类器。在生成的信号中进行了交叉和自相关分析,并识别了保留的特征-即可以Characterize每个信号的独特性,因此可能是冠军疾病的指标。最后,我们使用了标注了CVD指标的ECG信号作为输入,并使用了一个CVD发生预测的CNN模型,解决了由于多个CVD目标的预测而产生的挑战。我们对15秒12导ECG数据集进行了实验,该数据集使用MyoVista的wavECG记录。每个患者的功能结果数据都被记录,并用于CVD发生预测模型。我们的最佳GAN模型在V1-V6前心导电位上获得了state-of-the-art的准确率,FD分数分别为4.73、4.89、5.18、4.77、4.71和5.55,并且表现出了保持P-Q段和R-peak的强大能力。而且,根据我们知道,ECGNet是第一个可以从任何单个导入信号中预测所有的11导ECG信号。
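The generator described above is a bidirectional LSTM that maps a single input lead to the remaining eleven leads. The PyTorch sketch below only shows that generator shape; layer sizes are assumptions, and the CNN discriminator, GAN losses, and training loop are omitted.

```python
import torch
import torch.nn as nn

class LeadGenerator(nn.Module):
    """Bidirectional LSTM that maps one ECG lead (seq_len, 1) to the
    remaining 11 leads (seq_len, 11). Sizes are illustrative only."""
    def __init__(self, hidden=64, layers=2, out_leads=11):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, num_layers=layers,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, out_leads)

    def forward(self, x):            # x: (batch, seq_len, 1)
        h, _ = self.lstm(x)          # (batch, seq_len, 2*hidden)
        return self.head(h)          # (batch, seq_len, 11)

gen = LeadGenerator()
single_lead = torch.randn(4, 500, 1)      # e.g. segments of a single input lead
other_leads = gen(single_lead)
print(other_leads.shape)                  # torch.Size([4, 500, 11])
```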

A Chat About Boring Problems: Studying GPT-based text normalization

  • paper_url: http://arxiv.org/abs/2309.13426
  • repo_url: None
  • paper_authors: Yang Zhang, Travis M. Bartley, Mariana Graterol-Fuenmayor, Vitaly Lavrukhin, Evelina Bakhturina, Boris Ginsburg
  • for: 本研究旨在探讨语言模型是否可以有效地进行文本normalization,并提出了一种新的文本normalizationtask设计方法。
  • methods: 本研究使用了大型语言模型(LLM),结合自我一致性理解和语言知识引入的提问工程,以实践文本normalization的可行性。
  • results: 研究发现,使用LLM进行文本normalization可以在几个shotenario下实现错误率大约40%下降,而且通过分析错误原因,发现了传统文本normalization任务的一些限制。
    Abstract Text normalization - the conversion of text from written to spoken form - is traditionally assumed to be an ill-formed task for language models. In this work, we argue otherwise. We empirically show the capacity of Large-Language Models (LLM) for text normalization in few-shot scenarios. Combining self-consistency reasoning with linguistic-informed prompt engineering, we find LLM based text normalization to achieve error rates around 40\% lower than top normalization systems. Further, upon error analysis, we note key limitations in the conventional design of text normalization tasks. We create a new taxonomy of text normalization errors and apply it to results from GPT-3.5-Turbo and GPT-4.0. Through this new framework, we can identify strengths and weaknesses of GPT-based TN, opening opportunities for future work.
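Self-consistency reasoning, mentioned above, samples several candidate normalizations and keeps the most frequent answer. A minimal, provider-agnostic sketch of that voting step follows; the `ask_llm` function is a hypothetical stand-in for whatever chat-completion API is used.

```python
from collections import Counter

def ask_llm(prompt: str, temperature: float = 0.7) -> str:
    """Placeholder for a chat-completion call (e.g. to GPT-3.5/4). Replace with
    a real API client; here it just returns a canned answer for illustration."""
    return "twelve thirty"

def normalize_with_self_consistency(text: str, n_samples: int = 5) -> str:
    prompt = (f"Convert the written text to its spoken form.\n"
              f"Input: {text}\nSpoken form:")
    samples = [ask_llm(prompt).strip().lower() for _ in range(n_samples)]
    # Majority vote across sampled generations = self-consistency.
    return Counter(samples).most_common(1)[0][0]

print(normalize_with_self_consistency("12:30"))
```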

Penalties and Rewards for Fair Learning in Paired Kidney Exchange Programs

  • paper_url: http://arxiv.org/abs/2309.13421
  • repo_url: None
  • paper_authors: Margarida Carvalho, Alison Caulfield, Yi Lin, Adrian Vetta
  • for: 这篇论文旨在研究一种动态的交换与分配机制,以提高肾脏配对捐赠(肾交换)计划的性能。
  • methods: 该论文使用了学习算法,以在动态模拟中学习优化患者-捐献者权重,以提高结果。
  • results: 研究发现,在加拿大肾脏配对捐赠计划的全规模模拟中,使用学习算法可以缩短平均等待时间、增加移植数量并提高群体公平性。具体来说,表现最好的学习算法可将群体公平性提高10%,同时使移植数量增加6%、等待时间缩短24%。但研究最核心的发现是:决定肾交换计划性能的关键因素并不是为患者-捐献者对分配正权重(奖励),而是对计划中少量非定向捐献者进行恰当的负权重(惩罚)分配。
    Abstract A kidney exchange program, also called a kidney paired donation program, can be viewed as a repeated, dynamic trading and allocation mechanism. This suggests that a dynamic algorithm for transplant exchange selection may have superior performance in comparison to the repeated use of a static algorithm. We confirm this hypothesis using a full scale simulation of the Canadian Kidney Paired Donation Program: learning algorithms, that attempt to learn optimal patient-donor weights in advance via dynamic simulations, do lead to improved outcomes. Specifically, our learning algorithms, designed with the objective of fairness (that is, equity in terms of transplant accessibility across cPRA groups), also lead to an increased number of transplants and shorter average waiting times. Indeed, our highest performing learning algorithm improves egalitarian fairness by 10% whilst also increasing the number of transplants by 6% and decreasing waiting times by 24%. However, our main result is much more surprising. We find that the most critical factor in determining the performance of a kidney exchange program is not the judicious assignment of positive weights (rewards) to patient-donor pairs. Rather, the key factor in increasing the number of transplants, decreasing waiting times and improving group fairness is the judicious assignment of a negative weight (penalty) to the small number of non-directed donors in the kidney exchange program.
    摘要 一个肾移植计划,也称为肾对肾移植计划,可以看作是一种循环、动态的交易和分配机制。这表明使用动态算法进行移植交易选择可能会有更高的性能。我们确认这一假设使用加拿大肾对肾移植计划的全规模模拟:学习算法,尝试通过动态模拟来学习患者-捐精对的优质量因子,实际上会导致改进的结果。Specifically, our learning algorithms, designed with the objective of fairness (that is, equity in terms of transplant accessibility across cPRA groups), also lead to an increased number of transplants and shorter average waiting times. Indeed, our highest performing learning algorithm improves egalitarian fairness by 10% whilst also increasing the number of transplants by 6% and decreasing waiting times by 24%. However, our main result is much more surprising. We find that the most critical factor in determining the performance of a kidney exchange program is not the judicious assignment of positive weights (rewards) to patient-donor pairs. Rather, the key factor in increasing the number of transplants, decreasing waiting times and improving group fairness is the judicious assignment of a negative weight (penalty) to the small number of non-directed donors in the kidney exchange program.

State-space Models with Layer-wise Nonlinearity are Universal Approximators with Exponential Decaying Memory

  • paper_url: http://arxiv.org/abs/2309.13414
  • repo_url: None
  • paper_authors: Shida Wang, Beichen Xue
  • for: 这篇论文主要研究了使用层状态模型来模型连续序列之间的关系。
  • methods: 论文使用了层状态模型,并在每层添加非线性活化来提高模型的表达能力。
  • results: 研究表明,通过层状态模型和非线性活化的组合,可以有效地模型复杂的连续序列模式。但是,研究也表明,状态空间模型无法根本解决指数减少的内存问题。
    Abstract State-space models have gained popularity in sequence modelling due to their simple and efficient network structures. However, the absence of nonlinear activation along the temporal direction limits the model's capacity. In this paper, we prove that stacking state-space models with layer-wise nonlinear activation is sufficient to approximate any continuous sequence-to-sequence relationship. Our findings demonstrate that the addition of layer-wise nonlinear activation enhances the model's capacity to learn complex sequence patterns. Meanwhile, it can be seen both theoretically and empirically that the state-space models do not fundamentally resolve the exponential decaying memory issue. Theoretical results are justified by numerical verifications.
    摘要 状态空间模型在序列模型中得到了广泛应用,因为它们的简单和高效的网络结构。然而,在时间方向上缺乏非线性活化限制了模型的容量。在这篇论文中,我们证明了将层weise非线性活化核心到状态空间模型可以近似任何连续序列到序列关系。我们的发现表明,增加层wise非线性活化可以提高模型学习复杂序列模式的能力。同时,可以在理论和实验两个方面见到,状态空间模型并没有根本解决指数减少记忆问题。理论结果得到了数值验证。
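A small numerical sketch of the construction studied here: a stack of linear state-space layers in which a pointwise nonlinearity is applied only between layers, never along the time recursion. Dimensions, initialisation, and the stability trick are arbitrary illustrations, not the paper's setup.

```python
import numpy as np

def ssm_layer(u, A, B, C):
    """Linear state-space scan: x_{t+1} = A x_t + B u_t, y_t = C x_t."""
    x = np.zeros(A.shape[0])
    ys = []
    for u_t in u:
        x = A @ x + B * u_t
        ys.append(C @ x)
    return np.array(ys)

rng = np.random.default_rng(0)
def random_stable(n):
    """Random matrix rescaled so its spectral radius is below 1, giving the
    exponentially decaying memory discussed in the abstract."""
    M = rng.normal(size=(n, n)) / np.sqrt(n)
    return 0.9 * M / max(1e-8, np.max(np.abs(np.linalg.eigvals(M))))

u = np.sin(np.linspace(0, 6 * np.pi, 200))     # toy input sequence
h = u
for _ in range(3):                             # three stacked SSM layers
    n = 8
    A, B, C = random_stable(n), rng.normal(size=n), rng.normal(size=n)
    h = np.tanh(ssm_layer(h, A, B, C))         # layer-wise nonlinearity only
print(h.shape, float(h.max()))
```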

Towards Attributions of Input Variables in a Coalition

  • paper_url: http://arxiv.org/abs/2309.13411
  • repo_url: None
  • paper_authors: Xinhao Zheng, Huiqi Deng, Quanshi Zhang
  • for: 这paper的目的是开发一种新的贡献计算方法,以解释个体变量贡献和其党筹贡献之间的冲突。
  • methods: 该paper使用了一种全新的视角来推导贡献计算方法,包括将Harsanyi交互编码为AI模型中的交互分配,然后将Shapley值扩展到党筹贡献领域。
  • results: 该paper发现了冲突的基本机制,即党筹中包含部分变量的交互导致这种冲突。
    Abstract This paper aims to develop a new attribution method to explain the conflict between individual variables' attributions and their coalition's attribution from a fully new perspective. First, we find that the Shapley value can be reformulated as the allocation of Harsanyi interactions encoded by the AI model. Second, based on the re-allocation of interactions, we extend the Shapley value to the attribution of coalitions. Third, we derive the fundamental mechanism behind the conflict: it comes from interactions that contain only part of the variables in the coalition.
    摘要 这篇论文旨在从一个全新的视角出发,开发一种新的归因方法,以解释个体变量的归因与其所在联盟的归因之间的冲突。我们首先发现,夏普利值可以被重新表述为对人工智能模型所编码的哈萨尼(Harsanyi)交互的分配。其次,基于交互的重新分配,我们将夏普利值扩展到联盟层面的归因。最后,我们推导了这种冲突的基本机制:冲突来自那些只包含联盟中部分变量的交互。
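For context, the Shapley value referenced above attributes a model output to individual variables by averaging their marginal contributions over all coalitions. The brute-force sketch below computes it exactly for a tiny value function with a Harsanyi-style interaction term; extending this allocation to coalition-level attributions is the paper's contribution and is not reproduced here.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values by enumerating all coalitions (feasible for small n)."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[p] += weight * (value(set(S) | {p}) - value(set(S)))
    return phi

# Toy value function with an interaction between variables 'a' and 'b'.
def value(S):
    v = 0.0
    if "a" in S: v += 1.0
    if "b" in S: v += 2.0
    if {"a", "b"} <= S: v += 3.0     # Harsanyi-style interaction term
    return v

print(shapley_values(["a", "b", "c"], value))  # the a-b interaction is split evenly
```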

Time-Series Forecasting: Unleashing Long-Term Dependencies with Fractionally Differenced Data

  • paper_url: http://arxiv.org/abs/2309.13409
  • repo_url: None
  • paper_authors: Sarit Maitra, Vivek Mishra, Srashti Dwivedi, Sukanya Kundu, Goutam Kumar Kundu
  • for: 这个研究旨在提出一种新的预测策略,利用分数差分(FD)来捕捉时间序列数据中的短期和长期依赖关系。
  • methods: 这个研究使用了FD法,与传统的整数差分方法不同,FD可以维护时间序列的记忆,同时为预测目的进行稳定化。研究还使用了新闻报道的 sentiment分析,将FD应用于股票指数SPY的金融数据。
  • results: 研究结果表明,FD在与目标变量进行binary分类时表现出优于整数差分,这得到了ROCAUC和MCC评价的证明。
    Abstract This study introduces a novel forecasting strategy that leverages the power of fractional differencing (FD) to capture both short- and long-term dependencies in time series data. Unlike traditional integer differencing methods, FD preserves memory in series while stabilizing it for modeling purposes. By applying FD to financial data from the SPY index and incorporating sentiment analysis from news reports, this empirical analysis explores the effectiveness of FD in conjunction with binary classification of target variables. Supervised classification algorithms were employed to validate the performance of FD series. The results demonstrate the superiority of FD over integer differencing, as confirmed by Receiver Operating Characteristic/Area Under the Curve (ROCAUC) and Mathews Correlation Coefficient (MCC) evaluations.
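Fractional differencing generalizes integer differencing by applying binomial weights with a non-integer exponent d; the weights decay slowly, which is what preserves long memory while stabilizing the series. A minimal sketch of the weight recursion and its application follows; the value of d and the truncation threshold are illustrative choices, not the paper's fitted values.

```python
import numpy as np

def frac_diff_weights(d, threshold=1e-4, max_len=1000):
    """Weights w_k = -w_{k-1} * (d - k + 1) / k, truncated when |w_k| < threshold."""
    w = [1.0]
    for k in range(1, max_len):
        w_k = -w[-1] * (d - k + 1) / k
        if abs(w_k) < threshold:
            break
        w.append(w_k)
    return np.array(w)

def frac_diff(series, d, threshold=1e-4):
    w = frac_diff_weights(d, threshold)
    out = [np.dot(w, series[t - len(w) + 1:t + 1][::-1])
           for t in range(len(w) - 1, len(series))]
    return np.array(out)

# Toy "price" series; d = 0.4 keeps memory while stabilising the series.
prices = np.cumsum(np.random.default_rng(0).normal(0.05, 1.0, size=500)) + 100
fd = frac_diff(prices, d=0.4)
print(len(fd), round(float(fd.mean()), 3))
```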

A Unitary Weights Based One-Iteration Quantum Perceptron Algorithm for Non-Ideal Training Sets

  • paper_url: http://arxiv.org/abs/2309.14366
  • repo_url: None
  • paper_authors: Wenjie Liu, Peipei Gao, Yuxiang Wang, Wenbin Yu, Maojun Zhang
  • for: 提高量子神经网络的训练集不完美问题和一次学习问题
  • methods: 提出了一种基于单位 weights 的高效量子见解算法,通过计算总加重矩阵的特征值分解来使加重矩阵变为单位矩阵
  • results: 示例验证了量子门 Warren gates {H, S, T, CNOT, Toffoli, Fredkin} 的准确实现,并且与其他量子见解算法进行比较,显示了我们的算法在应用性、准确性和可用性等方面具有优势。此外,为了进一步验证我们的算法的可应用性,还提出了一种量子复合门,该门由多个基本量子门组成。
    Abstract In order to solve the problem of non-ideal training sets (i.e., the less-complete or over-complete sets) and implement one-iteration learning, a novel efficient quantum perceptron algorithm based on unitary weights is proposed, where the singular value decomposition of the total weight matrix from the training set is calculated to make the weight matrix to be unitary. The example validation of quantum gates {H, S, T, CNOT, Toffoli, Fredkin} shows that our algorithm can accurately implement arbitrary quantum gates within one iteration. The performance comparison between our algorithm and other quantum perceptron algorithms demonstrates the advantages of our algorithm in terms of applicability, accuracy, and availability. For further validating the applicability of our algorithm, a quantum composite gate which consists of several basic quantum gates is also illustrated.
    摘要 为解决非理想训练集(即部分或过complete的集)和实现一轮学习,一种新的高效量子批量算法基于单位Weightmatrix是提出的,其中来自训练集的总weight矩阵的singular value decomposition被计算以使weight矩阵变为单位矩阵。例子验证量子门{H, S, T, CNOT, Toffoli, Fredkin}表明,我们的算法可以在一轮内准确实现任意量子门。与其他量子批量算法相比,我们的算法在可用性、准确性和可用性等方面具有优势。为进一步验证我们的算法的可用性,一种量子复合门,由多个基本量子门组成,也被描述。
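The core trick described above is to replace a learned weight matrix by the nearest unitary matrix obtained from its singular value decomposition (set all singular values to one). The small NumPy sketch below shows that projection in isolation, independent of any quantum-gate details.

```python
import numpy as np

def nearest_unitary(W):
    """Polar/SVD projection: for W = U S V^dagger, the closest unitary is U V^dagger."""
    U, _, Vh = np.linalg.svd(W)
    return U @ Vh

# Toy 2x2 complex weight matrix accumulated from a (possibly non-ideal) training set.
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
Q = nearest_unitary(W)

print(np.allclose(Q.conj().T @ Q, np.eye(2)))   # True: Q is unitary
# Q can now be interpreted as a single-iteration perceptron weight and compared
# against standard gates such as H or T.
```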

A Survey on Image-text Multimodal Models

  • paper_url: http://arxiv.org/abs/2309.15857
  • repo_url: https://github.com/i2vec/a-survey-on-image-text-multimodal-models
  • paper_authors: Ruifeng Guo, Jingxuan Wei, Linzhuang Sun, Bihui Yu, Guiyong Chang, Dawei Liu, Sibo Zhang, Zhengbing Yao, Mingjun Xu, Liping Bu
  • for: This paper provides a comprehensive review of the evolution and current state of image-text multimodal models, exploring their application value, challenges, and potential research trajectories.
  • methods: The paper revisits the basic concepts and developmental milestones of image-text multimodal models, introducing a novel classification that segments their evolution into three distinct phases, and proposes a categorization of the tasks associated with image-text multimodal models into five major types.
  • results: The paper delves into the inherent challenges and limitations of image-text multimodal models and fosters the exploration of prospective research directions, offering an exhaustive overview of the present research landscape of image-text multimodal models and serving as a valuable reference for future scholarly endeavors.
    Abstract Amidst the evolving landscape of artificial intelligence, the convergence of visual and textual information has surfaced as a crucial frontier, leading to the advent of image-text multimodal models. This paper provides a comprehensive review of the evolution and current state of image-text multimodal models, exploring their application value, challenges, and potential research trajectories. Initially, we revisit the basic concepts and developmental milestones of these models, introducing a novel classification that segments their evolution into three distinct phases, based on their time of introduction and subsequent impact on the discipline. Furthermore, based on the tasks' significance and prevalence in the academic landscape, we propose a categorization of the tasks associated with image-text multimodal models into five major types, elucidating the recent progress and key technologies within each category. Despite the remarkable accomplishments of these models, numerous challenges and issues persist. This paper delves into the inherent challenges and limitations of image-text multimodal models, fostering the exploration of prospective research directions. Our objective is to offer an exhaustive overview of the present research landscape of image-text multimodal models and to serve as a valuable reference for future scholarly endeavors. We extend an invitation to the broader community to collaborate in enhancing the image-text multimodal model community, accessible at: \href{https://github.com/i2vec/A-survey-on-image-text-multimodal-models}{https://github.com/i2vec/A-survey-on-image-text-multimodal-models}.
    摘要 在人工智能不断演进的格局中,视觉信息与文本信息的融合已成为一个关键前沿,催生了图文多模态模型。本文对图文多模态模型的演进历程和研究现状进行了全面综述,探讨其应用价值、面临的挑战以及潜在的研究方向。首先,我们回顾了这些模型的基本概念和发展里程碑,并提出了一种新的分类方法,依据其提出时间及对学科的影响,将其演进划分为三个不同阶段。其次,依据任务在学术领域中的重要性和普遍程度,我们将与图文多模态模型相关的任务归纳为五大类,并阐述了每一类中的最新进展和关键技术。尽管这些模型取得了显著成就,仍然存在诸多挑战和问题。本文深入探讨了图文多模态模型固有的挑战与局限,以促进对未来研究方向的探索。我们的目标是对图文多模态模型当前的研究版图给出详尽概览,并为后续学术工作提供有价值的参考。我们诚邀更广泛的社区共同参与完善图文多模态模型社区,访问地址:\href{https://github.com/i2vec/A-survey-on-image-text-multimodal-models}{https://github.com/i2vec/A-survey-on-image-text-multimodal-models}。

Smart City Digital Twin Framework for Real-Time Multi-Data Integration and Wide Public Distribution

  • paper_url: http://arxiv.org/abs/2309.13394
  • repo_url: None
  • paper_authors: Lorenzo Adreani, Pierfrancesco Bellini, Marco Fanfani, Paolo Nesi, Gianni Pantaleo
  • for: 这个论文是为了介绍一种基于Snap4City IoT平台的城市数字孪生框架,用于支持城市规划和管理决策。
  • methods: 该框架使用了数据收集、索引、计算和信息分布等方法,并将这些方法集成到了一个跨多个数据源的平台上,以实现实时更新的数字孪生。
  • results: 该框架可以提供实时的城市情况描述、预测和仿真分析结果,包括交通拥堵、污染物分布、可能的结果等,并且支持公民参与城市决策过程。
    Abstract Digital Twins are digital replica of real entities and are becoming fundamental tools to monitor and control the status of entities, predict their future evolutions, and simulate alternative scenarios to understand the impact of changes. Thanks to the large deployment of sensors, with the increasing information it is possible to build accurate reproductions of urban environments including structural data and real-time information. Such solutions help city councils and decision makers to face challenges in urban development and improve the citizen quality of life, by ana-lysing the actual conditions, evaluating in advance through simulations and what-if analysis the outcomes of infrastructural or political chang-es, or predicting the effects of humans and/or of natural events. Snap4City Smart City Digital Twin framework is capable to respond to the requirements identified in the literature and by the international forums. Differently from other solutions, the proposed architecture provides an integrated solution for data gathering, indexing, computing and information distribution offered by the Snap4City IoT platform, therefore realizing a continuously updated Digital Twin. 3D building models, road networks, IoT devices, WoT Entities, point of interests, routes, paths, etc., as well as results from data analytical processes for traffic density reconstruction, pollutant dispersion, predictions of any kind, what-if analysis, etc., are all integrated into an accessible web interface, to support the citizens participation in the city decision processes. What-If analysis to let the user performs simulations and observe possible outcomes. As case of study, the Digital Twin of the city of Florence (Italy) is presented. Snap4City platform, is released as open-source, and made available through GitHub and as docker compose.
    摘要 “数字双”是数字世界中的实体复制品,它们在监测和控制实体状态、预测未来发展和模拟不同enario来理解改变的影响。随着丰富的传感器的扩散,可以建立 precisemodels of urban environments, including structural data and real-time information。这些解决方案帮助城市议会和决策者面对城市发展的挑战,提高公民的生活质量,通过实际情况分析、预测变化和“what-if”分析来评估基础设施或政策变化的影响。Snap4City Smart City Digital Twin框架能够应对文献和国际论坛中所提出的需求。与其他解决方案不同,我们的架构提供了一个集成的数据收集、索引、计算和信息分发的解决方案,以实现不断更新的数字双。3D建筑模型、路网、物联网设备、Web of Things实体、终端、路线、轨迹等都会被集成到一个可访问的Web界面中,以支持公民参与城市决策过程。“what-if”分析允许用户进行模拟和观察可能的结果。作为案例研究,我们介绍了 Florence(意大利)的数字双。Snap4City平台释放为开源,通过 GitHub和docker compose 进行分发。

AgriSORT: A Simple Online Real-time Tracking-by-Detection framework for robotics in precision agriculture

  • paper_url: http://arxiv.org/abs/2309.13393
  • repo_url: None
  • paper_authors: Leonardo Saraceni, Ionut M. Motoi, Daniele Nardi, Thomas A. Ciarfuglia
  • for: 这个论文是为了解决精准农业中的多目标跟踪问题,这个问题是机器人学中的一个挑战。
  • methods: 这篇论文提出了一种基于运动信息的实时跟踪检测管道,即AgriSORT,该管道可以快速和准确地在视频序列中传播跟踪。
  • results: 在一个特制的农业上的MOT benchмарck上测试了AgriSORT管道,并得到了高效和准确的跟踪结果。
    Abstract The problem of multi-object tracking (MOT) consists in detecting and tracking all the objects in a video sequence while keeping a unique identifier for each object. It is a challenging and fundamental problem for robotics. In precision agriculture the challenge of achieving a satisfactory solution is amplified by extreme camera motion, sudden illumination changes, and strong occlusions. Most modern trackers rely on the appearance of objects rather than motion for association, which can be ineffective when most targets are static objects with the same appearance, as in the agricultural case. To this end, on the trail of SORT [5], we propose AgriSORT, a simple, online, real-time tracking-by-detection pipeline for precision agriculture based only on motion information that allows for accurate and fast propagation of tracks between frames. The main focuses of AgriSORT are efficiency, flexibility, minimal dependencies, and ease of deployment on robotic platforms. We test the proposed pipeline on a novel MOT benchmark specifically tailored for the agricultural context, based on video sequences taken in a table grape vineyard, particularly challenging due to strong self-similarity and density of the instances. Both the code and the dataset are available for future comparisons.
    摘要 “多目标追踪(MOT)问题的挑战是在识别和追踪影像序列中的所有物件,并保留每个物件唯一的识别码。这是机器人学中的基本问题。在精确农业中,实现满意的解决方案受到极大的镜头运动、突然的照明变化和强大的遮蔽影响。现代追踪器多数依靠物件的外观而非运动进行相互关联,这在农业案例中可能无效,因为大多数目标是静止的物件,具有相同的外观。为此,我们基于SORT [5]的概念,提出了AgriSORT,一个简单、在线、实时的追踪-by-探测管线,仅基于运动资讯,可以实现精确和快速的探测迹踪转换。AgriSORT的主要专注点包括效率、灵活性、最小化依赖和机器人平台的易用性。我们将该管线评估在特有的农业上的MOT实验中,基于简体葡萄园的视频序列,特别是由于强大的自相似和物件的密度。管线和数据都可以供未来的比较。”

D-Separation for Causal Self-Explanation

  • paper_url: http://arxiv.org/abs/2309.13391
  • repo_url: https://github.com/jugechengzi/rationalization-mcd
  • paper_authors: Wei Liu, Jun Wang, Haozhao Wang, Ruixuan Li, Zhiying Deng, YuanKai Zhang, Yang Qiu
  • for: 提高 NLP 模型的解释性和精度
  • methods: 基于 Minimum Conditional Dependence(MCD) criterion,使用 KL-divergence 度量依赖性,提高 F1 分数
  • results: 与先前最佳 MMI-based 方法比较,MCD 方法可以提高 F1 分数达到 $13.7%$ 之间
    Abstract Rationalization is a self-explaining framework for NLP models. Conventional work typically uses the maximum mutual information (MMI) criterion to find the rationale that is most indicative of the target label. However, this criterion can be influenced by spurious features that correlate with the causal rationale or the target label. Instead of attempting to rectify the issues of the MMI criterion, we propose a novel criterion to uncover the causal rationale, termed the Minimum Conditional Dependence (MCD) criterion, which is grounded on our finding that the non-causal features and the target label are \emph{d-separated} by the causal rationale. By minimizing the dependence between the unselected parts of the input and the target label conditioned on the selected rationale candidate, all the causes of the label are compelled to be selected. In this study, we employ a simple and practical measure of dependence, specifically the KL-divergence, to validate our proposed MCD criterion. Empirically, we demonstrate that MCD improves the F1 score by up to $13.7\%$ compared to previous state-of-the-art MMI-based methods. Our code is available at: \url{https://github.com/jugechengzi/Rationalization-MCD}.
    摘要 <>这是一个自解释的框架 для NLP模型。传统工作通常使用最大共同信息(MMI) criterion 来找到这些模型的理由,但这个标准可能受到假冒的特征所影响,这些特征可能与目标标签或理由相关。而不是尝试修正 MMI 标准的问题,我们提出了一个新的标准,即最小侧项依存性(MCD)标准,这是基于我们发现非 causal 特征和目标标签在 causal 理由下是 d-separated 的现象。通过将选择的理由候选者中的非选择部分的输入与目标标签之间的依存关系降至最低,所有的 Label 的原因都会被选择。在这个研究中,我们使用了一个简单实用的依存度量,具体是 KL- divergence,以验证我们的提出的 MCD 标准。实验结果显示,MCD 可以与之前的 MMI 基于的方法相比,提高 F1 分数达 13.7%。我们的代码可以在:\url{https://github.com/jugechengzi/Rationalization-MCD} 中找到。

Deciphering Spatio-Temporal Graph Forecasting: A Causal Lens and Treatment

  • paper_url: http://arxiv.org/abs/2309.13378
  • repo_url: https://github.com/hieu9955/ggggg
  • paper_authors: Yutong Xia, Yuxuan Liang, Haomin Wen, Xu Liu, Kun Wang, Zhengyang Zhou, Roger Zimmermann
  • for: 本文旨在解决预测空间时间图(STG)中的 temporal out-of-distribution(OoD)问题和动态空间 causation 问题。
  • methods: 本文提出了一种名为 CaST 的新框架,利用 causal 镜头来解读 STG 数据生成过程,并采用 back-door adjustment 和 front-door adjustment 等方法来处理 temporal OoD 问题和 causal 衍生效应。
  • results: 实验结果表明,CaST 可以准确地预测 STG,并且在三个实际数据集上表现出色,常常超过现有方法。此外,CaST 具有良好的解释性。
    Abstract Spatio-Temporal Graph (STG) forecasting is a fundamental task in many real-world applications. Spatio-Temporal Graph Neural Networks have emerged as the most popular method for STG forecasting, but they often struggle with temporal out-of-distribution (OoD) issues and dynamic spatial causation. In this paper, we propose a novel framework called CaST to tackle these two challenges via causal treatments. Concretely, leveraging a causal lens, we first build a structural causal model to decipher the data generation process of STGs. To handle the temporal OoD issue, we employ the back-door adjustment by a novel disentanglement block to separate invariant parts and temporal environments from input data. Moreover, we utilize the front-door adjustment and adopt the Hodge-Laplacian operator for edge-level convolution to model the ripple effect of causation. Experiments results on three real-world datasets demonstrate the effectiveness and practicality of CaST, which consistently outperforms existing methods with good interpretability.
    摘要 时空图(STG)预测是现实世界中许多应用场景的基本任务。时空图神经网络(STGNN)已成为 STG 预测最受欢迎的方法,但它们经常面临时间分布外(OoD)问题和动态空间因果关系的挑战。在这篇论文中,我们提出一种名为 CaST 的新框架,通过因果处理来解决这两个挑战。具体来说,我们首先利用因果视角构建结构因果模型,以解读 STG 的数据生成过程。为了处理时间 OoD 问题,我们借助后门调整,设计了一种新的解耦模块,将输入数据中的不变部分与时间环境分离。此外,我们使用前门调整,并采用霍奇-拉普拉斯(Hodge-Laplacian)算子进行边级卷积,以建模因果关系的涟漪效应。在三个真实世界数据集上的实验结果表明,CaST 具有良好的效果和可解释性,并持续优于现有方法。

Limits of Actor-Critic Algorithms for Decision Tree Policies Learning in IBMDPs

  • paper_url: http://arxiv.org/abs/2309.13365
  • repo_url: None
  • paper_authors: Hecotr Kohler, Riad Akrour, Philippe Preux
  • for: 提高AI模型的可解释性,以便用户建立对其信任。
  • methods: 使用强化学习框架,在DT中探索特征之间的关系,以建立更加紧凑的DT。
  • results: 通过抽离特征之间的关系,可以减少DT的大小,同时保持模型的性能。
    Abstract Interpretability of AI models allows for user safety checks to build trust in such AIs. In particular, Decision Trees (DTs) provide a global look at the learned model and transparently reveal which features of the input are critical for making a decision. However, interpretability is hindered if the DT is too large. To learn compact trees, a recent Reinforcement Learning (RL) framework has been proposed to explore the space of DTs using deep RL. This framework augments a decision problem (e.g. a supervised classification task) with additional actions that gather information about the features of an otherwise hidden input. By appropriately penalizing these actions, the agent learns to optimally trade-off size and performance of DTs. In practice, a reactive policy for a partially observable Markov decision process (MDP) needs to be learned, which is still an open problem. We show in this paper that deep RL can fail even on simple toy tasks of this class. However, when the underlying decision problem is a supervised classification task, we show that finding the optimal tree can be cast as a fully observable Markov decision problem and be solved efficiently, giving rise to a new family of algorithms for learning DTs that go beyond the classical greedy maximization ones.
    摘要 AI模型的可解释性有助于用户进行安全检查,从而建立对AI的信任。决策树(DT)尤其可以从全局角度展示所学模型,并透明地揭示输入中哪些特征对决策至关重要。然而,如果DT过大,可解释性就会受到影响。为了学习紧凑的决策树,最近有人提出了一种强化学习(RL)框架,利用深度RL来探索DT的空间:该框架在决策问题(例如监督分类任务)的基础上增加额外动作,用于获取原本隐藏的输入特征信息,并通过对这些动作施加适当惩罚,使智能体学会在DT的规模与性能之间进行最优折中。在实践中,这需要学习一个针对部分可观测马尔可夫决策过程(MDP)的反应式策略,而这仍然是一个未解决的问题。我们在本文中证明,即使在这类简单的玩具任务上,深度RL也可能失败。然而,当底层决策问题是监督分类任务时,我们证明寻找最优树的问题可以转化为一个完全可观测的马尔可夫决策过程,并被高效求解,从而产生了一类超越传统贪心最大化方法的DT学习新算法。

MLPST: MLP is All You Need for Spatio-Temporal Prediction

  • paper_url: http://arxiv.org/abs/2309.13363
  • repo_url: None
  • paper_authors: Zijian Zhang, Ze Huang, Zhiwei Hu, Xiangyu Zhao, Wanyu Wang, Zitao Liu, Junbo Zhang, S. Joe Qin, Hongwei Zhao
  • For: 预测交通流量,提高公共交通系统的运作效率和可靠性。* Methods: 提出了一种简单、轻量级的多层感知器(MLP)架构,通过快速和高效的MLP处理, capture 空间和时间关系,并且需要只有线性计算复杂度和模型参数数量相对较少。* Results: 经过广泛的实验 validate MLPST的高效性和灵活性,并且在模型准确率最高的情况下,MLPST achieves the best time and space efficiency。
    Abstract Traffic prediction is a typical spatio-temporal data mining task and has great significance to the public transportation system. Considering the demand for its grand application, we recognize key factors for an ideal spatio-temporal prediction method: efficient, lightweight, and effective. However, the current deep model-based spatio-temporal prediction solutions generally own intricate architectures with cumbersome optimization, which can hardly meet these expectations. To accomplish the above goals, we propose an intuitive and novel framework, MLPST, a pure multi-layer perceptron architecture for traffic prediction. Specifically, we first capture spatial relationships from both local and global receptive fields. Then, temporal dependencies in different intervals are comprehensively considered. Through compact and swift MLP processing, MLPST can well capture the spatial and temporal dependencies while requiring only linear computational complexity, as well as model parameters that are more than an order of magnitude lower than baselines. Extensive experiments validated the superior effectiveness and efficiency of MLPST against advanced baselines, and among models with optimal accuracy, MLPST achieves the best time and space efficiency.
    摘要 很多人对汽车流量预测有很大的需求,因为它对城市交通系统的管理有着重要的作用。为了满足这些需求,我们认为一个理想的空间时间预测方法应该具备以下三个特点:高效、轻量级和有效。然而,目前的深度模型基于的空间时间预测解决方案通常具有复杂的体系和繁琐的优化,这些方法很难满足我们的期望。为了实现以上目标,我们提出了一种直观和新型的框架,即多层感知网络(MLPST)。特别是,我们首先从本地和全局感知场景中捕捉到空间关系。然后,在不同时间间隔中考虑到了时间关系。通过紧凑的MLP处理,MLPST可以很好地捕捉到空间和时间关系,同时计算复杂度只有线性增长,并且模型参数比基线模型高出一个数量级。我们进行了广泛的实验,并证明了MLPST在比较先进的基elines上的超越性和效率。在同等准确性下,MLPST在时间和空间效率方面具有优势。
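To illustrate the "MLP is all you need" premise, the sketch below flattens a spatio-temporal input window (past time steps x nodes) and maps it to a next-step prediction with a plain multi-layer perceptron. The sizes are arbitrary and this is a generic baseline in the spirit of the paper, not the MLPST architecture itself.

```python
import torch
import torch.nn as nn

class TinySpatioTemporalMLP(nn.Module):
    """Predict the next traffic snapshot from the last T snapshots with an MLP."""
    def __init__(self, num_nodes=64, history=12, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),                                  # (B, T, N) -> (B, T*N)
            nn.Linear(history * num_nodes, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_nodes),                  # next-step flow per node
        )

    def forward(self, x):        # x: (batch, history, num_nodes)
        return self.net(x)

model = TinySpatioTemporalMLP()
x = torch.randn(8, 12, 64)       # 8 samples, 12 past steps, 64 grid cells/sensors
print(model(x).shape)            # torch.Size([8, 64]); cost is linear in input size
```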

Probing the Moral Development of Large Language Models through Defining Issues Test

  • paper_url: http://arxiv.org/abs/2309.13356
  • repo_url: None
  • paper_authors: Kumar Tanmay, Aditi Khandelwal, Utkarsh Agarwal, Monojit Choudhury
  • for: 这项研究用于测试LLMs的道德理解能力,使用定义问题测试(DIT),这是根据科尔堡认知道的道德发展模型(KCDM)而开发的一种心理测试。
  • methods: 这项研究使用DIT测试LLMs的道德理解能力,包括用道德决策问题和道德考虑因素,评估 respondent 对问题的解决方案和道德价值观的重要性。
  • results: 研究显示,早期LLMs如GPT-3的道德理解能力与随机基线相当,而ChatGPT、Llama2-Chat、PaLM-2和GPT-4则表现出较好的道德理解能力,与成年人相当。GPT-4的后konventional道德理解分数最高,与典型大学生相当。但是,模型在不同的决策问题上表现不一致,指出了其理解和解决能力的重要缺陷。
    Abstract In this study, we measure the moral reasoning ability of LLMs using the Defining Issues Test - a psychometric instrument developed for measuring the moral development stage of a person according to the Kohlberg's Cognitive Moral Development Model. DIT uses moral dilemmas followed by a set of ethical considerations that the respondent has to judge for importance in resolving the dilemma, and then rank-order them by importance. A moral development stage score of the respondent is then computed based on the relevance rating and ranking. Our study shows that early LLMs such as GPT-3 exhibit a moral reasoning ability no better than that of a random baseline, while ChatGPT, Llama2-Chat, PaLM-2 and GPT-4 show significantly better performance on this task, comparable to adult humans. GPT-4, in fact, has the highest post-conventional moral reasoning score, equivalent to that of typical graduate school students. However, we also observe that the models do not perform consistently across all dilemmas, pointing to important gaps in their understanding and reasoning abilities.
    摘要 在这项研究中,我们使用定义问题测试(DIT)来测量LLM的道德推理能力。DIT是一种心理测量工具,用于根据科尔伯格认知道德发展模型测量个人所处的道德发展阶段。DIT给出道德困境,并附上一组伦理考虑因素,要求被试评判各项考虑在解决困境中的重要性并按重要性排序,随后根据相关性评分和排序计算被试的道德发展阶段分数。我们的研究表明,GPT-3等早期LLM的道德推理能力并不优于随机基线,而ChatGPT、Llama2-Chat、PaLM-2和GPT-4在该任务上表现显著更好,可与成年人相当。事实上,GPT-4的后习俗道德推理得分最高,相当于典型研究生的水平。然而,我们也观察到这些模型在不同困境上的表现并不一致,这表明其理解和推理能力仍存在重要缺口。

Lexical Squad@Multimodal Hate Speech Event Detection 2023: Multimodal Hate Speech Detection using Fused Ensemble Approach

  • paper_url: http://arxiv.org/abs/2309.13354
  • repo_url: https://github.com/m0hammad-kashif/multimodalhatespeech
  • paper_authors: Mohammad Kashif, Mohammad Zohair, Saquib Ali
  • for: 本研究旨在探讨如何使用多模态学习方法来检测仇恨言论。
  • methods: 本研究使用了InceptionV3、BERT和XLNet等现状模型,并将其组合成一个ensemble模型来检测仇恨言论。
  • results: 研究得出了75.21%的准确率和74.96%的F1分数,并进行了实验来证明模型在预测和分类上的性能。
    Abstract With a surge in the usage of social media postings to express opinions, emotions, and ideologies, there has been a significant shift towards the calibration of social media as a rapid medium of conveying viewpoints and outlooks over the globe. Concurrently, the emergence of a multitude of conflicts between two entities has given rise to a stream of social media content containing propaganda, hate speech, and inconsiderate views. Thus, the issue of monitoring social media postings is rising swiftly, attracting major attention from those willing to solve such problems. One such problem is Hate Speech detection. To mitigate this problem, we present our novel ensemble learning approach for detecting hate speech, by classifying text-embedded images into two labels, namely "Hate Speech" and "No Hate Speech". We have incorporated state-of-art models including InceptionV3, BERT, and XLNet. Our proposed ensemble model yielded promising results with 75.21 and 74.96 as accuracy and F-1 score (respectively). We also present an empirical evaluation of the text-embedded images to elaborate on how well the model was able to predict and classify. We release our codebase here (https://github.com/M0hammad-Kashif/MultiModalHateSpeech).
    摘要 随着社交媒体被越来越多地用于表达观点、情感和意识形态,它已成为向全球快速传递观点与立场的重要媒介。与此同时,各方冲突的出现使社交媒体内容中充斥着宣传、仇恨言论和欠考虑的观点,因此对社交媒体帖子的监测问题日益突出,受到广泛关注。针对其中的仇恨言论检测问题,我们提出了一种新的集成学习方法,将嵌入文本的图像分类为"仇恨言论"和"非仇恨言论"两类,并纳入了InceptionV3、BERT和XLNet等先进模型。我们提出的集成模型取得了75.21的准确率和74.96的F1分数。我们还对嵌入文本的图像进行了实证评估,以说明模型的预测与分类效果。代码库已发布(https://github.com/M0hammad-Kashif/MultiModalHateSpeech)。
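
The fused ensemble can be pictured as concatenating an image embedding and a text embedding before a shared classifier. The sketch below is a minimal PyTorch version in which the encoders are passed in as arguments; the paper uses InceptionV3 for images and BERT/XLNet for text, while the encoder objects, feature dimensions, and fusion-by-concatenation choice here are illustrative assumptions.

```python
import torch
import torch.nn as nn


class FusedHateSpeechClassifier(nn.Module):
    """Late-fusion sketch: image features + text features -> 2-way logits."""

    def __init__(self, image_encoder, text_encoder, img_dim, txt_dim, hidden=256):
        super().__init__()
        self.image_encoder = image_encoder   # e.g. an InceptionV3 backbone
        self.text_encoder = text_encoder     # e.g. a BERT or XLNet encoder
        self.classifier = nn.Sequential(
            nn.Linear(img_dim + txt_dim, hidden),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden, 2),            # "Hate Speech" vs "No Hate Speech"
        )

    def forward(self, image, input_ids, attention_mask):
        img_feat = self.image_encoder(image)                      # (B, img_dim)
        txt_feat = self.text_encoder(input_ids, attention_mask)   # (B, txt_dim)
        fused = torch.cat([img_feat, txt_feat], dim=-1)
        return self.classifier(fused)
```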

An In-depth Survey of Large Language Model-based Artificial Intelligence Agents

  • paper_url: http://arxiv.org/abs/2309.14365
  • repo_url: None
  • paper_authors: Pengyu Zhao, Zijian Jin, Ning Cheng
  • for: 本文主要研究大语言模型(LLM)与传统人工智能(AI)代理之间的主要区别和特点,以及 LLM 基于 AI 代理的可能性和潜力。
  • methods: 本文首先比较了这两种代理的基本特点,并详细分析了 AI 代理的关键组件,包括规划、记忆和工具使用。特别是在记忆方面,本文提出了一种创新的分类方法,不仅与传统分类方法不同,还为 AI 代理的记忆系统设计提供了新的视角。
  • results: 本文通过对核心组件的深入分析,为未来人工智能代理技术的发展提供了坚实的基础。文章最后还提出了进一步研究的方向,以便为学术研究人员提供价值的思路和指导。
    Abstract Due to the powerful capabilities demonstrated by large language model (LLM), there has been a recent surge in efforts to integrate them with AI agents to enhance their performance. In this paper, we have explored the core differences and characteristics between LLM-based AI agents and traditional AI agents. Specifically, we first compare the fundamental characteristics of these two types of agents, clarifying the significant advantages of LLM-based agents in handling natural language, knowledge storage, and reasoning capabilities. Subsequently, we conducted an in-depth analysis of the key components of AI agents, including planning, memory, and tool use. Particularly, for the crucial component of memory, this paper introduced an innovative classification scheme, not only departing from traditional classification methods but also providing a fresh perspective on the design of an AI agent's memory system. We firmly believe that in-depth research and understanding of these core components will lay a solid foundation for the future advancement of AI agent technology. At the end of the paper, we provide directional suggestions for further research in this field, with the hope of offering valuable insights to scholars and researchers in the field.
    摘要 由于大型语言模型(LLM)展现出的强大能力,近期出现了大量将其与人工智能代理集成以提升性能的尝试。在这篇论文中,我们探讨了基于 LLM 的代理与传统 AI 代理的核心差异和特点。具体而言,我们首先比较了这两类代理的基本特性,阐明基于 LLM 的代理在自然语言处理、知识存储和推理能力方面的显著优势。接着,我们深入分析了 AI 代理的关键组件,包括规划、记忆和工具使用。尤其是在记忆这一关键组件上,本文提出了一种创新的分类方法,不仅有别于传统分类方式,还为 AI 代理记忆系统的设计提供了新的视角。我们认为,对这些核心组件的深入研究和理解将为未来人工智能代理技术的发展打下坚实的基础。在论文结尾,我们还提出了该领域进一步研究的方向性建议,希望为学者和研究人员提供有价值的见解。

LLMs as Counterfactual Explanation Modules: Can ChatGPT Explain Black-box Text Classifiers?

  • paper_url: http://arxiv.org/abs/2309.13340
  • repo_url: None
  • paper_authors: Amrita Bhattacharjee, Raha Moraffah, Joshua Garland, Huan Liu
  • for: 本研究使用大语言模型(LLM)来解释黑obox文本分类器的决策,通过生成后续的、模型无关的counterfactual解释。
  • methods: 我们提出了一个管道,使用LLM生成事后(post-hoc)、模型无关的counterfactual解释:(i)利用LLM的文本理解能力来识别和提取潜在特征;(ii)利用LLM的扰动和生成能力,对由潜在特征导出的输入特征进行扰动,从而生成counterfactual解释。
  • results: 我们在一组state-of-the-art LLM中评估了三种变体,包括不同的特征提取方法和Counterfactual解释生成方法。我们发现这些模型在不同的设置中的性能不同,一种基于两步特征提取的全变体在大多数情况下表现最佳。我们的管道可以用于自动解释系统,可能减少人工劳动。
    Abstract Large language models (LLMs) are increasingly being used for tasks beyond text generation, including complex tasks such as data labeling, information extraction, etc. With the recent surge in research efforts to comprehend the full extent of LLM capabilities, in this work, we investigate the role of LLMs as counterfactual explanation modules, to explain decisions of black-box text classifiers. Inspired by causal thinking, we propose a pipeline for using LLMs to generate post-hoc, model-agnostic counterfactual explanations in a principled way via (i) leveraging the textual understanding capabilities of the LLM to identify and extract latent features, and (ii) leveraging the perturbation and generation capabilities of the same LLM to generate a counterfactual explanation by perturbing input features derived from the extracted latent features. We evaluate three variants of our framework, with varying degrees of specificity, on a suite of state-of-the-art LLMs, including ChatGPT and LLaMA 2. We evaluate the effectiveness and quality of the generated counterfactual explanations, over a variety of text classification benchmarks. Our results show varied performance of these models in different settings, with a full two-step feature extraction based variant outperforming others in most cases. Our pipeline can be used in automated explanation systems, potentially reducing human effort.
    摘要 大型语言模型(LLM) increasingly 用于 tasks beyond 文本生成,包括复杂的任务,如数据标签、信息提取等。 随着研究尝试理解 LLM 的全面能力,在这个工作中,我们 investigate LLM 作为 counterfactual explanation module,以解释黑色盒子文本分类器的决策。 灵感自 causal 思维,我们提出一个管道,使用 LLM 生成 post-hoc,model-agnostic counterfactual explanations 的方式,包括:(i) 利用 LLM 的文本理解能力,识别和提取 latent features,以及 (ii) 利用 LLM 的干扰和生成能力,对 input features 进行推变,生成 counterfactual explanation。 我们评估了三种不同的框架,以不同的具体性,在一些最新的 LLM 上,包括 ChatGPT 和 LLaMA 2。 我们评估这些模型在不同的设定下的效能和质量,并发现在大多数情况下,一个完整的 two-step 特征提取基于的Variant 表现较好。 我们的管道可以用于自动解释系统,可能将人类努力削减。
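
The two-step pipeline (latent-feature extraction, then feature-guided perturbation) can be sketched as two calls to the same LLM. `complete()` below is a placeholder for whatever chat/completion API is being used, and the prompt wording is illustrative rather than the paper's.

```python
def complete(prompt: str) -> str:
    """Placeholder for an LLM call (ChatGPT, LLaMA 2, ...)."""
    raise NotImplementedError


def counterfactual_explanation(text: str, predicted_label: str, target_label: str) -> str:
    # Step 1: ask the LLM to identify latent features driving the prediction.
    features = complete(
        f"Text: {text}\n"
        f"A classifier labels this text as '{predicted_label}'. "
        "List the words or phrases (latent features) most responsible for that label."
    )
    # Step 2: ask the LLM to minimally perturb those features so the classifier
    # would plausibly flip to the target label.
    return complete(
        f"Text: {text}\n"
        f"Key features: {features}\n"
        f"Rewrite the text with minimal edits to those features so that it would "
        f"instead be classified as '{target_label}'. Keep everything else unchanged."
    )
```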

Enhancing Zero-Shot Chain-of-Thought Reasoning in Large Language Models through Logic

  • paper_url: http://arxiv.org/abs/2309.13339
  • repo_url: None
  • paper_authors: Xufeng Zhao, Mengdi Li, Wenhao Lu, Cornelius Weber, Jae Hee Lee, Kun Chu, Stefan Wermter
  • for: 提高大型自然语言模型的逻辑推理能力
  • methods: 基于符号逻辑原理的符号神经网络框架LogiCoT
  • results: 在不同领域的语言任务上,LogiCoT能够提高大型自然语言模型的逻辑推理能力,并且可以避免生成模型的幻觉现象
    Abstract Recent advancements in large language models have showcased their remarkable generalizability across various domains. However, their reasoning abilities still have significant room for improvement, especially when confronted with scenarios requiring multi-step reasoning. Although large language models possess extensive knowledge, their behavior, particularly in terms of reasoning, often fails to effectively utilize this knowledge to establish a coherent thinking paradigm. Generative language models sometimes show hallucinations as their reasoning procedures are unconstrained by logical principles. Aiming to improve the zero-shot chain-of-thought reasoning ability of large language models, we propose Logical Chain-of-Thought (LogiCoT), a neurosymbolic framework that leverages principles from symbolic logic to verify and revise the reasoning processes accordingly. Experimental evaluations conducted on language tasks in diverse domains, including arithmetic, commonsense, symbolic, causal inference, and social problems, demonstrate the efficacy of the enhanced reasoning paradigm by logic.
    摘要 大语言模型近期的进展展现了其在多个领域令人印象深刻的泛化能力。然而,它们的推理能力仍有很大的改进空间,尤其是在需要多步推理的场景下。虽然大语言模型拥有丰富的知识,但其行为(尤其是推理行为)往往不能充分利用这些知识来建立连贯的思维范式;生成式语言模型的推理过程不受逻辑原则约束,因此有时会出现幻觉。为提升大语言模型的零样本思维链推理能力,我们提出了逻辑思维链(LogiCoT),一种神经符号框架,利用符号逻辑的原理来验证并相应地修正推理过程。在算术、常识、符号、因果推断和社会问题等多个领域语言任务上的实验评估,证明了这种由逻辑增强的推理范式的有效性。
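
At a high level, LogiCoT-style verification can be read as: generate a chain of thought, ask the model to check each step against a logical principle (for example by attempting a reductio of its negation), and revise the steps that fail. The loop below is a schematic sketch with a placeholder `llm()` call, not the authors' implementation.

```python
def llm(prompt: str) -> str:
    """Placeholder for a chat/completion call to the underlying LLM."""
    raise NotImplementedError


def logic_guided_cot(question: str, max_rounds: int = 3) -> str:
    chain = llm(f"{question}\nLet's think step by step.")
    for _ in range(max_rounds):
        verdict = llm(
            "Check the reasoning below step by step. For each step, try to derive a "
            "contradiction from its negation (reductio ad absurdum). Reply 'VALID' if "
            "every step survives, otherwise name the first faulty step.\n\n"
            f"Question: {question}\nReasoning:\n{chain}"
        )
        if verdict.strip().upper().startswith("VALID"):
            break
        chain = llm(
            f"Question: {question}\nReasoning:\n{chain}\n"
            f"The following step was found faulty: {verdict}\n"
            "Revise the reasoning from that step onward."
        )
    return chain
```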

Diversifying Question Generation over Knowledge Base via External Natural Questions

  • paper_url: http://arxiv.org/abs/2309.14362
  • repo_url: None
  • paper_authors: Shasha Guo, Jing Zhang, Xirui Ke, Cuiping Li, Hong Chen
  • for: 本研究旨在提高知识基础问题生成(KBQG)的质量。
  • methods: 本研究提出了一种新的多元评价指标,以度量生成的问题的多样性,并 introduces 一种双模型框架,通过两种选择策略来生成多元的问题。
  • results: 实验结果表明,提出的方法可以生成高度多元的问题,并提高问题回答 task 的性能。
    Abstract Previous methods on knowledge base question generation (KBQG) primarily focus on enhancing the quality of a single generated question. Recognizing the remarkable paraphrasing ability of humans, we contend that diverse texts should convey the same semantics through varied expressions. The above insights make diversifying question generation an intriguing task, where the first challenge is evaluation metrics for diversity. Current metrics inadequately assess the above diversity since they calculate the ratio of unique n-grams in the generated question itself, which leans more towards measuring duplication rather than true diversity. Accordingly, we devise a new diversity evaluation metric, which measures the diversity among top-k generated questions for each instance while ensuring their relevance to the ground truth. Clearly, the second challenge is how to enhance diversifying question generation. To address this challenge, we introduce a dual model framework interwoven by two selection strategies to generate diverse questions leveraging external natural questions. The main idea of our dual framework is to extract more diverse expressions and integrate them into the generation model to enhance diversifying question generation. Extensive experiments on widely used benchmarks for KBQG demonstrate that our proposed approach generates highly diverse questions and improves the performance of question answering tasks.
    摘要 以往的知识库问题生成(KBQG)方法主要关注提升单个生成问题的质量。鉴于人类出色的改写能力,我们认为多样的文本应当以不同的表达方式传达相同的语义,这使得多样化问题生成成为一个有趣的任务,而其首要挑战是多样性的评价指标。现有指标只统计生成问题自身中不重复 n-gram 的比例,更像是在度量重复程度而非真正的多样性。为此,我们设计了一种新的多样性评价指标,在保证与标准答案相关的前提下,度量每个实例前 k 个生成问题之间的多样性。第二个挑战是如何增强多样化问题生成。我们提出了一个由两种选择策略交织的双模型框架,借助外部自然问题来生成多样的问题,其核心思想是提取更丰富多样的表达并将其融入生成模型。在常用的 KBQG 基准上的大量实验表明,我们的方法能生成高度多样的问题,并提升问答任务的性能。
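
The proposed metric scores diversity across the top-k questions generated for one instance rather than n-gram duplication inside a single question. A simple stand-in using pairwise n-gram overlap (Jaccard) among relevance-filtered candidates is sketched below; the relevance check and the exact diversity definition in the paper will differ.

```python
from itertools import combinations


def ngrams(text, n=2):
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}


def topk_diversity(questions, is_relevant, n=2):
    """Mean pairwise (1 - Jaccard) over the top-k questions of one instance,
    computed only on candidates judged relevant to the ground truth."""
    kept = [q for q in questions if is_relevant(q)]
    if len(kept) < 2:
        return 0.0
    dists = []
    for a, b in combinations(kept, 2):
        ga, gb = ngrams(a, n), ngrams(b, n)
        union = ga | gb
        jac = len(ga & gb) / len(union) if union else 1.0
        dists.append(1.0 - jac)
    return sum(dists) / len(dists)
```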

Class Attendance System in Education with Deep Learning Method

  • paper_url: http://arxiv.org/abs/2309.13317
  • repo_url: None
  • paper_authors: Hüdaverdi Demir, Serkan Savaş
  • For: The paper develops a system that uses deep learning methods for object detection in images to record students' entrance to educational institutions and to perform class attendance.
  • Methods: The paper uses deep learning methods, specifically object detection algorithms, to detect students' entrance and attendance in educational institutions.
  • Results: The study successfully implemented the object detection system, which will be applied to real-life problems in a school in the 2022-2023 academic year.
  • for: 本研究旨在开发基于深度学习方法的对象检测系统,用于记录学生入学教育机构和进行课程参加。
  • methods: 本研究使用深度学习方法,具体来说是对象检测算法,来检测学生入学和课程参加。
  • results: 研究成功实现对象检测系统,将在2022-2023学年度应用到实际问题中。
    Abstract With the advancing technology, the hardware gain of computers and the increase in the processing capacity of processors have facilitated the processing of instantaneous and real-time images. Face recognition processes are also studies in the field of image processing. Facial recognition processes are frequently used in security applications and commercial applications. Especially in the last 20 years, the high performances of artificial intelligence (AI) studies have contributed to the spread of these studies in many different fields. Education is one of them. The potential and advantages of using AI in education; can be grouped under three headings: student, teacher, and institution. One of the institutional studies may be the security of educational environments and the contribution of automation to education and training processes. From this point of view, deep learning methods, one of the sub-branches of AI, were used in this study. For object detection from images, a pioneering study has been designed and successfully implemented to keep records of students' entrance to the educational institution and to perform class attendance with images taken from the camera using image processing algorithms. The application of the study to real-life problems will be carried out in a school determined in the 2022-2023 academic year.
    摘要 随着技术进步,计算机硬件性能和处理器运算能力的提升,使得即时和实时图像的处理成为可能。人脸识别也是图像处理领域的研究内容之一,常被用于安防和商业应用。特别是近20年来,人工智能(AI)研究的高水平表现推动了其在众多领域的普及,教育便是其中之一。AI 在教育中的潜力与优势可归纳为学生、教师和机构三个层面,其中机构层面的研究之一是教育环境的安全以及自动化对教育教学过程的贡献。基于这一出发点,本研究采用了 AI 的一个分支——深度学习方法,设计并成功实现了一项开创性研究:利用图像处理算法对摄像头采集的图像进行目标检测,记录学生进入教育机构的情况并完成课堂考勤。该研究将在2022-2023学年于选定的学校应用于实际问题。

USL-Net: Uncertainty Self-Learning Network for Unsupervised Skin Lesion Segmentation

  • paper_url: http://arxiv.org/abs/2309.13289
  • repo_url: None
  • paper_authors: Xiaofan Li, Bo Peng, Daipeng Yang, Zhuyang Xie
  • for: 这个研究旨在提出一种无监督的皮肤病变分割方法,以应对无人工标注指导下皮肤病变分割的挑战。
  • methods: 本研究提出了不确定性自学习网络(USL-Net),先通过对比学习提取特征,再据此生成类激活图(CAM)作为显著图:高激活区域作为病变区域的伪标签,低激活区域对应背景,难以判别的中间区域被视为不确定区域,不参与伪标签,由网络自行学习。
  • results: 实验结果显示,该方法的性能可与弱监督和有监督方法相比,并超过其他现有的无监督方法。
    Abstract Unsupervised skin lesion segmentation offers several benefits, including conserving expert human resources, reducing discrepancies due to subjective human labeling, and adapting to novel environments. However, segmenting dermoscopic images without manual labeling guidance presents significant challenges due to dermoscopic image artifacts such as hair noise, blister noise, and subtle edge differences. To address these challenges, we introduce an innovative Uncertainty Self-Learning Network (USL-Net) designed for skin lesion segmentation. The USL-Net can effectively segment a range of lesions, eliminating the need for manual labeling guidance. Initially, features are extracted using contrastive learning, followed by the generation of Class Activation Maps (CAMs) as saliency maps using these features. The different CAM locations correspond to the importance of the lesion region based on their saliency. High-saliency regions in the map serve as pseudo-labels for lesion regions while low-saliency regions represent the background. However, intermediate regions can be hard to classify, often due to their proximity to lesion edges or interference from hair or blisters. Rather than risk potential pseudo-labeling errors or learning confusion by forcefully classifying these regions, we consider them as uncertainty regions, exempting them from pseudo-labeling and allowing the network to self-learn. Further, we employ connectivity detection and centrality detection to refine foreground pseudo-labels and reduce noise-induced errors. The application of cycle refining enhances performance further. Our method underwent thorough experimental validation on the ISIC-2017, ISIC-2018, and PH2 datasets, demonstrating that its performance is on par with weakly supervised and supervised methods, and exceeds that of other existing unsupervised methods.
    摘要 无监督皮肤病变分割具有多方面的优点,例如节省专家人力、减少主观人工标注带来的差异,以及更易适应新环境。然而,在没有人工标注指导的情况下分割皮肤镜图像面临重大挑战,这主要源于皮肤镜图像中的伪影,如毛发噪声、水疱噪声和细微的边缘差异。为应对这些挑战,我们提出了一种创新的不确定性自学习网络(USL-Net)用于皮肤病变分割。USL-Net 能够在无需人工标注指导的情况下有效分割多种病变。首先通过对比学习提取特征,再利用这些特征生成类激活图(CAM)作为显著图:图中高显著性区域作为病变区域的伪标签,低显著性区域代表背景。然而,中间区域往往因靠近病变边缘或受到毛发、水疱干扰而难以分类;我们不强行对其分类以避免潜在的伪标签错误或学习混淆,而是将其视为不确定区域,不参与伪标签,让网络自行学习。此外,我们利用连通性检测和中心性检测来精炼前景伪标签、减少噪声引起的错误,并通过循环精炼进一步提升性能。我们的方法在 ISIC-2017、ISIC-2018 和 PH2 数据集上经过了充分的实验验证,结果表明其性能与弱监督和有监督方法相当,并超过了其他现有的无监督方法。
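
The pseudo-labelling step can be pictured as a two-threshold split of the normalised class activation map: confident foreground, confident background, and an uncertainty band that is simply excluded from the loss (e.g., via an ignore index in cross-entropy), which is what lets the network self-learn on those pixels. A NumPy sketch, with the thresholds as assumptions:

```python
import numpy as np


def cam_to_pseudo_labels(cam, fg_thresh=0.7, bg_thresh=0.3):
    """Map a normalised CAM (values in [0, 1]) to pseudo-labels:
    1 = lesion, 0 = background, 255 = uncertain (excluded from the loss)."""
    labels = np.full(cam.shape, 255, dtype=np.uint8)
    labels[cam >= fg_thresh] = 1
    labels[cam <= bg_thresh] = 0
    return labels
```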

Collision Avoidance and Navigation for a Quadrotor Swarm Using End-to-end Deep Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.13285
  • repo_url: None
  • paper_authors: Zhehui Huang, Zhaojing Yang, Rahul Krupani, Baskın Şenbaşlar, Sumeet Batra, Gaurav S. Sukhatme
  • for: 这paper的目的是使用end-to-end深度强化学习控制带有障碍物的四旋翼机器人群体。
  • methods: 该论文使用的方法包括课程学习、存放截断碰撞回合的经验回放缓冲区,以及针对邻近机器人和障碍物交互的注意力机制。
  • results: 结果表明,借助这些方法可以在含障碍环境中控制四旋翼集群,并将学得的策略零样本迁移到真实四旋翼上。
    Abstract End-to-end deep reinforcement learning (DRL) for quadrotor control promises many benefits -- easy deployment, task generalization and real-time execution capability. Prior end-to-end DRL-based methods have showcased the ability to deploy learned controllers onto single quadrotors or quadrotor teams maneuvering in simple, obstacle-free environments. However, the addition of obstacles increases the number of possible interactions exponentially, thereby increasing the difficulty of training RL policies. In this work, we propose an end-to-end DRL approach to control quadrotor swarms in environments with obstacles. We provide our agents a curriculum and a replay buffer of the clipped collision episodes to improve performance in obstacle-rich environments. We implement an attention mechanism to attend to the neighbor robots and obstacle interactions - the first successful demonstration of this mechanism on policies for swarm behavior deployed on severely compute-constrained hardware. Our work is the first work that demonstrates the possibility of learning neighbor-avoiding and obstacle-avoiding control policies trained with end-to-end DRL that transfers zero-shot to real quadrotors. Our approach scales to 32 robots with 80% obstacle density in simulation and 8 robots with 20% obstacle density in physical deployment. Video demonstrations are available on the project website at: https://sites.google.com/view/obst-avoid-swarm-rl.
    摘要 基于端到端深度强化学习(DRL)的四旋翼飞行器控制具有诸多优势:部署简单、任务泛化性好、可实时执行。此前的端到端 DRL 方法已展示出将学得的控制器部署到单架四旋翼或在简单无障碍环境中机动的四旋翼编队上的能力。然而,障碍物的加入使可能的交互数量呈指数级增长,从而显著提高了强化学习策略的训练难度。在这项工作中,我们提出了一种端到端 DRL 方法,用于在含障碍环境中控制四旋翼集群。我们为智能体提供了课程式训练和存放截断碰撞回合的经验回放缓冲区,以提升其在障碍密集环境中的表现;并实现了一种关注邻近机器人与障碍物交互的注意力机制——这是该机制首次成功应用于部署在算力严重受限硬件上的集群行为策略。本工作首次证明了端到端 DRL 训练得到的避邻与避障控制策略能够零样本迁移到真实四旋翼上。我们的方法在仿真中可扩展到32架机器人、80%障碍密度,在实际部署中可扩展到8架机器人、20%障碍密度。视频演示见项目网站:https://sites.google.com/view/obst-avoid-swarm-rl。

Being Aware of Localization Accuracy By Generating Predicted-IoU-Guided Quality Scores

  • paper_url: http://arxiv.org/abs/2309.13269
  • repo_url: https://github.com/panffeereal/clq
  • paper_authors: Pengfei Liu, Weibo Wang, Yuhan Guo, Jiubin Tan
  • for: 提高检测性能:通过同时考虑分类分数和定位精度来提升检测质量。
  • methods: 采用了新的检测架构CLQ,其包括一个简洁的LQE分支,在预测IoU的引导下获得定位质量分数。在训练和推理过程中,LQE分支被嵌入分类分支,生成一个联合的分类-定位-质量表示。
  • results: 在COCO test-dev数据集上,CLQ实现了最新的先进性能,达到47.8 AP和11.5 fps;扩展到ATSS后,还带来了可靠的1.2 AP提升。
    Abstract Localization Quality Estimation (LQE) helps to improve detection performance as it benefits post processing through jointly considering classification score and localization accuracy. In this perspective, for further leveraging the close relationship between localization accuracy and IoU (Intersection-Over-Union), and for depressing those inconsistent predictions, we designed an elegant LQE branch to acquire localization quality score guided by predicted IoU. Distinctly, for alleviating the inconsistency of classification score and localization quality during training and inference, under which some predictions with low classification scores but high LQE scores will impair the performance, instead of separately and independently setting, we embedded LQE branch into classification branch, producing a joint classification-localization-quality representation. Then a novel one stage detector termed CLQ is proposed. Extensive experiments show that CLQ achieves state-of-the-arts' performance at an accuracy of 47.8 AP and a speed of 11.5 fps with ResNeXt-101 as backbone on COCO test-dev. Finally, we extend CLQ to ATSS, producing a reliable 1.2 AP gain, showing our model's strong adaptability and scalability. Codes are released at https://github.com/PanffeeReal/CLQ.
    摘要 本文提出了一种新的单阶段检测器(CLQ),通过将定位质量估计(LQE)分支嵌入分类分支,实现了分类、定位和质量三者的联合表示。这种方法可以缓解训练和推断过程中分类分数与定位质量之间的不一致问题,从而提高检测性能。实验表明,以 ResNeXt-101 为骨干网络的 CLQ 在 COCO test-dev 上达到了 47.8 AP 的先进性能和 11.5 fps 的速度;将其扩展到 ATSS 后,还带来了可靠的 1.2 AP 提升。代码发布于 https://github.com/PanffeeReal/CLQ。
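
The core idea, a joint classification-localization-quality score, can be illustrated by multiplying the classification probability with a predicted IoU before ranking/NMS, so boxes that are confidently classified but poorly localised are down-ranked. This is an illustrative simplification: CLQ embeds the LQE branch into the classification branch to produce the joint representation rather than combining two independently trained heads.

```python
import torch


def joint_quality_score(cls_logits, pred_iou_logits):
    """cls_logits:      (N, num_classes) raw classification outputs
       pred_iou_logits: (N, 1) raw localization-quality (predicted IoU) outputs
       returns:         (N, num_classes) scores used for ranking and NMS."""
    cls_prob = cls_logits.sigmoid()
    pred_iou = pred_iou_logits.sigmoid()   # predicted IoU in [0, 1]
    return cls_prob * pred_iou             # joint cls-loc-quality score
```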

Robust Navigation with Cross-Modal Fusion and Knowledge Transfer

  • paper_url: http://arxiv.org/abs/2309.13266
  • repo_url: https://github.com/wzcai99/Distill-Navigator
  • paper_authors: Wenzhe Cai, Guangran Cheng, Lingyue Kong, Lu Dong, Changyin Sun
  • for: 提高机器人Navigation技能的通用化和实际应用(improving the generalization of mobile robot navigation skills and achieving sim-to-real transfer)
  • methods: 跨模态融合方法和师生蒸馏框架(cross-modal fusion method and teacher-student distillation architecture)
  • results: 比基eline表现出色,在 simulated和实际环境中实现了Robust Navigation性能(outperforms the baselines in both simulated and real-world environments, achieving robust navigation performance with varying working conditions)
    Abstract Recently, learning-based approaches show promising results in navigation tasks. However, the poor generalization capability and the simulation-reality gap prevent a wide range of applications. We consider the problem of improving the generalization of mobile robots and achieving sim-to-real transfer for navigation skills. To that end, we propose a cross-modal fusion method and a knowledge transfer framework for better generalization. This is realized by a teacher-student distillation architecture. The teacher learns a discriminative representation and the near-perfect policy in an ideal environment. By imitating the behavior and representation of the teacher, the student is able to align the features from noisy multi-modal input and reduce the influence of variations on navigation policy. We evaluate our method in simulated and real-world environments. Experiments show that our method outperforms the baselines by a large margin and achieves robust navigation performance with varying working conditions.
    摘要 最近,基于学习的方法在导航任务中展现出有前景的结果。然而,较差的泛化能力以及仿真与现实之间的差距阻碍了其广泛应用。我们研究如何提升移动机器人导航技能的泛化能力并实现仿真到现实的迁移。为此,我们提出了跨模态融合方法和知识迁移框架,其通过师生蒸馏架构实现:教师在理想环境中学习判别性表示和近乎最优的策略;学生通过模仿教师的行为和表示,对来自带噪多模态输入的特征进行对齐,从而减少环境变化对导航策略的影响。我们在仿真和真实环境中进行了评估,实验结果表明,我们的方法大幅优于基线,并在不同工作条件下实现了稳健的导航性能。

Optimizing Chance-Constrained Submodular Problems with Variable Uncertainties

  • paper_url: http://arxiv.org/abs/2309.14359
  • repo_url: None
  • paper_authors: Xiankun Yan, Anh Viet Do, Feng Shi, Xiaoyu Qin, Frank Neumann
  • for: 这个论文是关于概率性约束限制的实时优化问题的研究,具体来说是关于可变权重项目中的可变概率约束。
  • methods: 该论文针对元素权重期望相同但离散程度不同的情形,给出了贪心算法来求解机会约束下的次模优化问题。
  • results: 论文通过分析和实验表明,所提贪心算法能给出高质量的解,即相对确定性情形下最优解的常数近似比。
    Abstract Chance constraints are frequently used to limit the probability of constraint violations in real-world optimization problems where the constraints involve stochastic components. We study chance-constrained submodular optimization problems, which capture a wide range of optimization problems with stochastic constraints. Previous studies considered submodular problems with stochastic knapsack constraints in the case where uncertainties are the same for each item that can be selected. However, uncertainty levels are usually variable with respect to the different stochastic components in real-world scenarios, and rigorous analysis for this setting is missing in the context of submodular optimization. This paper provides the first such analysis for this case, where the weights of items have the same expectation but different dispersion. We present greedy algorithms that can obtain a high-quality solution, i.e., a constant approximation ratio to the given optimal solution from the deterministic setting. In the experiments, we demonstrate that the algorithms perform effectively on several chance-constrained instances of the maximum coverage problem and the influence maximization problem.
    摘要 机会约束常用于在约束包含随机成分的实际优化问题中限制约束被违反的概率。我们研究机会约束下的次模优化问题,它涵盖了一大类带随机约束的优化问题。以往的研究考虑了带随机背包约束的次模问题,但假设每个可选元素具有相同的不确定性水平。然而,在现实场景中,不同随机成分的不确定程度通常各不相同,而针对这种情形的严格分析在次模优化领域仍属空白。本文首次对该情形进行了分析,即各元素权重期望相同但离散程度不同的情况。我们给出了贪心算法,其解相对于确定性情形下的最优解可达到常数近似比。实验表明,这些算法在最大覆盖问题和影响力最大化问题的多个机会约束实例上表现良好。
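
For the variable-uncertainty setting, a natural algorithm is the greedy rule applied to a surrogate weight that inflates each element's expected weight by a dispersion term, so that the knapsack constraint holds with the required probability. The sketch below uses a Chebyshev-style surrogate E[w] + sqrt((1-alpha)/alpha * Var) as one common choice; the exact surrogate and guarantees used in the paper may differ.

```python
import math


def chance_constrained_greedy(items, f, budget, alpha=0.1):
    """items: dict name -> (mean_weight, variance); f: set -> float, a monotone
    submodular value function; budget: knapsack capacity; alpha: allowed
    probability of violating the weight constraint."""

    def surrogate(name):
        mu, var = items[name]
        return mu + math.sqrt((1 - alpha) / alpha * var)

    chosen, used = set(), 0.0
    remaining = set(items)
    while remaining:
        best, best_ratio = None, -1.0
        for name in remaining:
            cost = surrogate(name)
            if used + cost > budget:
                continue
            gain = f(chosen | {name}) - f(chosen)
            ratio = gain / cost if cost > 0 else float("inf")
            if ratio > best_ratio:
                best, best_ratio = name, ratio
        if best is None:          # nothing else fits under the surrogate budget
            break
        chosen.add(best)
        used += surrogate(best)
        remaining.remove(best)
    return chosen
```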

WikiMT++ Dataset Card

  • paper_url: http://arxiv.org/abs/2309.13259
  • repo_url: None
  • paper_authors: Monan Zhou, Shangda Wu, Yuan Wang, Wei Li
  • for: 扩展和改进 WikiMusicText 数据集,用于音乐信息检索、条件音乐生成、自动作曲和情感分类等应用场景。
  • methods: 添加对象属性(专辑、歌词、视频)和主观情感属性(12种情感词),以及使用 CLaMP 进行属性修正,以提高数据集的准确性和完整性。
  • results: 提高了 WikiMT 的应用场景和可用性,并且通过添加新的属性和修正原始数据,提高了数据集的准确性和完整性。
    Abstract WikiMT++ is an expanded and refined version of WikiMusicText (WikiMT), featuring 1010 curated lead sheets in ABC notation. To expand application scenarios of WikiMT, we add both objective (album, lyrics, video) and subjective emotion (12 emotion adjectives) and emo\_4q (Russell 4Q) attributes, enhancing its usability for music information retrieval, conditional music generation, automatic composition, and emotion classification, etc. Additionally, CLaMP is implemented to correct the attributes inherited from WikiMT to reduce errors introduced during original data collection and enhance the accuracy and completeness of our dataset.
    摘要 WikiMT++ 是 WikiMusicText(WikiMT)的扩展和改进版本,包含1010份以 ABC 记谱法编写、经精心整理的领谱(lead sheet)。为扩展 WikiMT 的应用场景,我们添加了客观属性(专辑、歌词、视频)与主观情感属性(12个情感形容词及 emo_4q,即 Russell 4Q),从而提升其在音乐信息检索、条件音乐生成、自动作曲和情感分类等方面的可用性。此外,我们使用 CLaMP 修正了从 WikiMT 继承的属性,以减少原始数据收集过程中引入的错误,提高数据集的准确性和完整性。

Defending Pre-trained Language Models as Few-shot Learners against Backdoor Attacks

  • paper_url: http://arxiv.org/abs/2309.13256
  • repo_url: https://github.com/zhaohan-xi/plm-prompt-defense
  • paper_authors: Zhaohan Xi, Tianyu Du, Changjiang Li, Ren Pang, Shouling Ji, Jinghui Chen, Fenglong Ma, Ting Wang
  • for: This paper is written to investigate the security risks of pre-trained language models (PLMs) as few-shot learners and to propose a novel defense mechanism called MDP to address these risks.
  • methods: The paper uses a pilot study to demonstrate the vulnerability of PLMs to backdoor attacks in few-shot scenarios and proposes MDP as a lightweight, pluggable, and effective defense. MDP leverages the gap between the masking-sensitivity of poisoned and clean samples to identify poisoned samples.
  • results: The paper shows the efficacy of MDP through analytical analysis and empirical evaluation using benchmark datasets and representative attacks. The results demonstrate that MDP creates an interesting dilemma for the attacker to choose between attack effectiveness and detection evasiveness.
    Abstract Pre-trained language models (PLMs) have demonstrated remarkable performance as few-shot learners. However, their security risks under such settings are largely unexplored. In this work, we conduct a pilot study showing that PLMs as few-shot learners are highly vulnerable to backdoor attacks while existing defenses are inadequate due to the unique challenges of few-shot scenarios. To address such challenges, we advocate MDP, a novel lightweight, pluggable, and effective defense for PLMs as few-shot learners. Specifically, MDP leverages the gap between the masking-sensitivity of poisoned and clean samples: with reference to the limited few-shot data as distributional anchors, it compares the representations of given samples under varying masking and identifies poisoned samples as ones with significant variations. We show analytically that MDP creates an interesting dilemma for the attacker to choose between attack effectiveness and detection evasiveness. The empirical evaluation using benchmark datasets and representative attacks validates the efficacy of MDP.
    摘要 预训练语言模型(PLM)作为少样本学习器已展现出卓越的性能,但其在这种场景下的安全风险在很大程度上尚未被研究。在这项工作中,我们的试点研究表明,作为少样本学习器的 PLM 极易受到后门攻击,而由于少样本场景的特殊挑战,现有防御手段并不充分。为此,我们提出了 MDP,一种轻量、可插拔且有效的防御方法。具体而言,MDP 利用了中毒样本与干净样本在掩码敏感性上的差异:以有限的少样本数据作为分布锚点,比较给定样本在不同掩码下的表示,并将表示变化显著的样本判定为中毒样本。我们的分析表明,MDP 使攻击者在攻击有效性与检测规避性之间陷入两难。在基准数据集和代表性攻击上的实证评估验证了 MDP 的有效性。
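
The intuition behind MDP, that poisoned samples react more strongly to random masking than clean ones, can be sketched as: mask the input several times, embed each masked variant, and measure how far the representations drift from the unmasked one. Everything below (the `embed` function, masking rate, distance, and threshold calibration) is an assumption for illustration, not the authors' exact procedure.

```python
import random
import torch


def masking_sensitivity(tokens, embed, mask_token="[MASK]", n_trials=8, rate=0.15):
    """tokens: list of str; embed: list[str] -> torch.Tensor representation.
    Returns the mean representation drift under random masking; poisoned
    (backdoored) inputs are expected to drift more than clean ones."""
    base = embed(tokens)
    drifts = []
    for _ in range(n_trials):
        masked = [mask_token if random.random() < rate else t for t in tokens]
        drifts.append(torch.dist(base, embed(masked)).item())
    return sum(drifts) / len(drifts)

# A sample whose sensitivity exceeds a threshold calibrated on the few-shot
# clean data (the distributional anchors) would be flagged as poisoned.
```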

Can I Trust the Explanations? Investigating Explainable Machine Learning Methods for Monotonic Models

  • paper_url: http://arxiv.org/abs/2309.13246
  • repo_url: None
  • paper_authors: Dangxing Chen
  • for: 这个论文主要针对的是解释性机器学习方法在具有领域知识的模型上的应用。
  • methods: 这个论文使用了解释性机器学习方法,包括基准值法和集成梯度法,来解释具有领域知识的模型的决策过程。
  • results: 研究发现,当只有个体偏好 monotonicity 存在时,基准值法可以提供良好的解释;而当强对比 monotonicity 存在时,集成梯度法在平均情况下可以提供相对更好的解释。
    Abstract In recent years, explainable machine learning methods have been very successful. Despite their success, most explainable machine learning methods are applied to black-box models without any domain knowledge. By incorporating domain knowledge, science-informed machine learning models have demonstrated better generalization and interpretation. But do we obtain consistent scientific explanations if we apply explainable machine learning methods to science-informed machine learning models? This question is addressed in the context of monotonic models that exhibit three different types of monotonicity. To demonstrate monotonicity, we propose three axioms. Accordingly, this study shows that when only individual monotonicity is involved, the baseline Shapley value provides good explanations; however, when strong pairwise monotonicity is involved, the Integrated gradients method provides reasonable explanations on average.
    摘要 近年来,可解释机器学习方法取得了巨大成功。然而,大多数可解释方法被应用于不含任何领域知识的黑盒模型。通过引入领域知识,融入科学先验的机器学习模型已展现出更好的泛化性与可解释性。但如果将可解释机器学习方法应用于这类模型,我们能否得到一致的科学解释?本文在呈现三种不同单调性的单调模型背景下回答这一问题。为刻画单调性,我们提出了三条公理。研究表明:当仅涉及个体单调性时,基线 Shapley 值能给出良好的解释;而当涉及强成对单调性时,积分梯度(Integrated Gradients)方法平均而言能提供较为合理的解释。
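
For reference, Integrated Gradients attributes a prediction by averaging gradients along a straight path from a baseline x' to the input x and scaling by (x - x'). A compact PyTorch version of that standard definition (not the paper's specific experimental setup):

```python
import torch


def integrated_gradients(model, x, baseline=None, steps=50):
    """Attributions of model(x) (assumed scalar per sample) with respect to x."""
    if baseline is None:
        baseline = torch.zeros_like(x)
    total_grad = torch.zeros_like(x)
    for k in range(1, steps + 1):
        xi = (baseline + (k / steps) * (x - baseline)).requires_grad_(True)
        out = model(xi).sum()
        grad, = torch.autograd.grad(out, xi)
        total_grad += grad
    return (x - baseline) * total_grad / steps
```

Under individual monotonicity the paper finds baseline Shapley values already behave well; under strong pairwise monotonicity, path-based attributions such as the one above are reported to be more reasonable on average.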

UniHead: Unifying Multi-Perception for Detection Heads

  • paper_url: http://arxiv.org/abs/2309.13242
  • repo_url: https://github.com/zht8506/unihead
  • paper_authors: Hantao Zhou, Rui Yang, Yachao Zhang, Haoran Duan, Yawen Huang, Runze Hu, Xiu Li, Yefeng Zheng
    for:This paper aims to improve the object detection performance by developing a novel detection head called UniHead, which unifies three perceptual abilities simultaneously: deformation perception, global perception, and cross-task perception.methods:The proposed UniHead uses a Dual-axial Aggregation Transformer (DAT) to model long-range dependencies and adaptively sample object features, as well as a Cross-task Interaction Transformer (CIT) to facilitate interaction between the classification and localization branches.results:The proposed UniHead achieves significant improvements in object detection performance on the COCO dataset, with +2.7 AP gains in RetinaNet, +2.9 AP gains in FreeAnchor, and +2.1 AP gains in GFL, compared to the baseline methods. The code will be publicly available at https://github.com/zht8506/UniHead.
    Abstract The detection head constitutes a pivotal component within object detectors, tasked with executing both classification and localization functions. Regrettably, the commonly used parallel head often lacks omni perceptual capabilities, such as deformation perception, global perception and cross-task perception. Despite numerous methods attempt to enhance these abilities from a single aspect, achieving a comprehensive and unified solution remains a significant challenge. In response to this challenge, we have developed an innovative detection head, termed UniHead, to unify three perceptual abilities simultaneously. More precisely, our approach (1) introduces deformation perception, enabling the model to adaptively sample object features; (2) proposes a Dual-axial Aggregation Transformer (DAT) to adeptly model long-range dependencies, thereby achieving global perception; and (3) devises a Cross-task Interaction Transformer (CIT) that facilitates interaction between the classification and localization branches, thus aligning the two tasks. As a plug-and-play method, the proposed UniHead can be conveniently integrated with existing detectors. Extensive experiments on the COCO dataset demonstrate that our UniHead can bring significant improvements to many detectors. For instance, the UniHead can obtain +2.7 AP gains in RetinaNet, +2.9 AP gains in FreeAnchor, and +2.1 AP gains in GFL. The code will be publicly available. Code Url: https://github.com/zht8506/UniHead.
    摘要 检测头是目标检测器中的关键组件,负责执行分类与定位两项功能。遗憾的是,常用的并行检测头往往缺乏全方位的感知能力,如形变感知、全局感知和跨任务感知。尽管已有许多方法试图从单一方面增强这些能力,但实现全面统一的解决方案仍是重大挑战。为此,我们设计了一种创新的检测头 UniHead,同时统一三种感知能力。具体而言,我们的方法:(1)引入形变感知,使模型能够自适应地采样目标特征;(2)提出双轴聚合 Transformer(DAT),以有效建模长程依赖,实现全局感知;(3)设计跨任务交互 Transformer(CIT),促进分类分支与定位分支之间的交互,使两个任务相互对齐。作为一种即插即用的方法,UniHead 可以方便地与现有检测器集成。在 COCO 数据集上的大量实验表明,UniHead 能为许多检测器带来显著提升,例如在 RetinaNet 上提升 2.7 AP,在 FreeAnchor 上提升 2.9 AP,在 GFL 上提升 2.1 AP。代码将公开,地址:https://github.com/zht8506/UniHead。

Heterogeneous Feature Representation for Digital Twin-Oriented Complex Networked Systems

  • paper_url: http://arxiv.org/abs/2309.13229
  • repo_url: None
  • paper_authors: Jiaqi Wen, Bogdan Gabrys, Katarzyna Musial
  • for: 这项研究旨在提高复杂网络系统(CNS)模型的表达能力,以更好地反映实际世界系统。
  • methods: 该研究使用了不同的特征表示原则,包括清晰(crisp)特征值和模糊集,以分别刻画节点特征的客观与主观含义。
  • results: 研究发现,使用模糊集表示法可以提高模型的表达能力;不同的特征表示方法会影响网络结构和疫情蔓延速度,因而需要针对不同人群采取不同的缓解策略。
    Abstract Building models of Complex Networked Systems (CNS) that can accurately represent reality forms an important research area. To be able to reflect real world systems, the modelling needs to consider not only the intensity of interactions between the entities but also features of all the elements of the system. This study aims to improve the expressive power of node features in Digital Twin-Oriented Complex Networked Systems (DT-CNSs) with heterogeneous feature representation principles. This involves representing features with crisp feature values and fuzzy sets, each describing the objective and the subjective inductions of the nodes' features and feature differences. Our empirical analysis builds DT-CNSs to recreate realistic physical contact networks in different countries from real node feature distributions based on various representation principles and an optimised feature preference. We also investigate their respective disaster resilience to an epidemic outbreak starting from the most popular node. The results suggest that the increasing flexibility of feature representation with fuzzy sets improves the expressive power and enables more accurate modelling. In addition, the heterogeneous features influence the network structure and the speed of the epidemic outbreak, requiring various mitigation policies targeted at different people.
    摘要 构建能够准确反映现实的复杂网络系统(CNS)模型是一个重要的研究领域。要刻画真实世界系统,建模不仅需要考虑实体之间交互的强度,还需要考虑系统中各要素的特征。本研究旨在通过异质的特征表示原则,提升面向数字孪生的复杂网络系统(DT-CNS)中节点特征的表达能力:同时使用清晰(crisp)特征值与模糊集,分别刻画节点特征及特征差异的客观与主观含义。我们的实证分析基于不同的表示原则和经过优化的特征偏好,利用真实的节点特征分布构建 DT-CNS,以重现不同国家的真实物理接触网络,并考察它们在疫情从最受欢迎节点爆发时的抗灾韧性。结果表明,借助模糊集提升特征表示的灵活性能够增强模型的表达能力,实现更准确的建模;同时,异质特征会影响网络结构和疫情传播速度,因而需要针对不同人群采取不同的缓解政策。
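
The heterogeneous-feature idea contrasts crisp values with fuzzy sets: a node attribute such as age can be a crisp number or a degree of membership in overlapping fuzzy categories. A triangular membership function is one standard way to encode the latter; the breakpoints below are illustrative, not taken from the paper.

```python
def triangular_membership(x, a, b, c):
    """Degree to which crisp value x belongs to a fuzzy set with support
    [a, c] and peak at b (classic triangular membership function)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Example: how strongly a 30-year-old belongs to an (illustrative) fuzzy set
# "young adult" defined over ages 18-40 with a peak at 25.
print(triangular_membership(30, 18, 25, 40))   # -> 0.666...
```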

Pick Planning Strategies for Large-Scale Package Manipulation

  • paper_url: http://arxiv.org/abs/2309.13224
  • repo_url: None
  • paper_authors: Shuai Li, Azarakhsh Keipour, Kevin Jamieson, Nicolas Hudson, Sicong Zhao, Charles Swan, Kostas Bekris
  • for: 自动化仓储作业可以降低物流管理成本、降低消费者最终价格、加快配送速度,并增强对市场波动的适应能力。
  • methods: 基于亚马逊机器人的 Robot Induction(Robin)机群对无序堆放的包裹进行大规模抓取与分拣,每天处理多达600万个包裹,迄今已累计处理超过20亿个。论文介绍了多年来发展的各种启发式方法及其后继方案——使用真实生产数据训练的抓取成功率预测器。
  • results: 本研究是首次在真实生产环境中大规模应用学习抓取质量预测器。
    Abstract Automating warehouse operations can reduce logistics overhead costs, ultimately driving down the final price for consumers, increasing the speed of delivery, and enhancing the resiliency to market fluctuations. This extended abstract showcases a large-scale package manipulation from unstructured piles in Amazon Robotics' Robot Induction (Robin) fleet, which is used for picking and singulating up to 6 million packages per day and so far has manipulated over 2 billion packages. It describes the various heuristic methods developed over time and their successor, which utilizes a pick success predictor trained on real production data. To the best of the authors' knowledge, this work is the first large-scale deployment of learned pick quality estimation methods in a real production system.
    摘要 仓库作业自动化可以降低物流管理成本,从而降低消费者最终支付的价格、加快配送速度,并增强对市场波动的适应能力。本扩展摘要展示了亚马逊机器人(Amazon Robotics)Robot Induction(Robin)机群对无序堆放包裹的大规模操作:该系统用于包裹的抓取与分拣,每天处理多达600万个包裹,迄今已累计处理超过20亿个包裹。文中介绍了多年来发展的各种启发式方法及其后继方案——一种基于真实生产数据训练的抓取成功率预测器。据作者所知,这是学习型抓取质量估计方法首次在真实生产系统中的大规模部署。

Hindi to English: Transformer-Based Neural Machine Translation

  • paper_url: http://arxiv.org/abs/2309.13222
  • repo_url: https://github.com/1502shivam-singh/audio-vision-server
  • paper_authors: Kavit Gangar, Hardik Ruparel, Shreyas Lele
  • for: 这个论文主要针对的是将印度语言印地语(Hindi)自动翻译成英语的问题,以提高翻译质量。
  • methods: 这个论文使用了深度学习技术,特别是Transformer模型,将印地语译成英语。为了扩充训练数据,作者还使用了反向翻译,并用字节对编码(BPE)进行词表构建和分词。
  • results: 在IIT Bombay英语-印地语语料库测试集上,其中一种配置达到了当前最佳的BLEU分数24.53。
    Abstract Machine Translation (MT) is one of the most prominent tasks in Natural Language Processing (NLP) which involves the automatic conversion of texts from one natural language to another while preserving its meaning and fluency. Although the research in machine translation has been going on since multiple decades, the newer approach of integrating deep learning techniques in natural language processing has led to significant improvements in the translation quality. In this paper, we have developed a Neural Machine Translation (NMT) system by training the Transformer model to translate texts from Indian Language Hindi to English. Hindi being a low resource language has made it difficult for neural networks to understand the language thereby leading to a slow growth in the development of neural machine translators. Thus, to address this gap, we implemented back-translation to augment the training data and for creating the vocabulary, we experimented with both word and subword level tokenization using Byte Pair Encoding (BPE) thereby ending up training the Transformer in 10 different configurations. This led us to achieve a state-of-the-art BLEU score of 24.53 on the test set of IIT Bombay English-Hindi Corpus in one of the configurations.
    摘要 机器翻译(MT)是自然语言处理(NLP)中最具代表性的任务之一,它涉及在保持语义和流畅性的前提下将文本从一种自然语言自动转换为另一种。虽然机器翻译的研究已持续数十年,但近年来将深度学习技术引入自然语言处理,使翻译质量得到显著提升。在这篇论文中,我们训练 Transformer 模型,开发了一个将印地语翻译为英语的神经机器翻译(NMT)系统。由于印地语是低资源语言,神经网络难以充分理解该语言,导致相关神经机器翻译系统的发展较为缓慢。为弥补这一差距,我们采用反向翻译来扩充训练数据,并在词表构建上分别试验了基于字节对编码(BPE)的词级和子词级分词,最终以10种不同配置训练 Transformer。在其中一种配置下,我们在 IIT Bombay 英语-印地语语料库的测试集上取得了24.53的最新最优 BLEU 分数。
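
Subword vocabularies of the kind used here are commonly built with byte-pair encoding. The snippet below shows one way to do that with the SentencePiece library; the file names and vocabulary size are placeholders, and the paper actually experimented with both word- and subword-level tokenisation across 10 configurations.

```python
import sentencepiece as spm

# Train a BPE model on the combined Hindi-English training text (placeholder path).
spm.SentencePieceTrainer.train(
    input="train.hi-en.txt",
    model_prefix="bpe_hi_en",
    vocab_size=32000,
    model_type="bpe",
)

sp = spm.SentencePieceProcessor(model_file="bpe_hi_en.model")
pieces = sp.encode("मैं स्कूल जा रहा हूँ", out_type=str)  # subword tokens fed to the Transformer
print(pieces)
```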

cs.CL - 2023-09-23

Hierarchical attention interpretation: an interpretable speech-level transformer for bi-modal depression detection

  • paper_url: http://arxiv.org/abs/2309.13476
  • repo_url: None
  • paper_authors: Qingkun Deng, Saturnino Luz, Sofia de la Fuente Garcia
  • For: The paper aims to improve the accuracy and interpretability of automatic depression detection tools using speech, which can help early screening of depression.
  • Methods: The proposed bi-modal speech-level transformer model avoids segment-level labelling and provides both speech-level and sentence-level interpretations using gradient-weighted attention maps.
  • Results: The proposed model outperforms a model that learns at a segment level, with improved accuracy and interpretability. The model can identify the most relevant sentences and text tokens within a given speech that are indicative of depression.
    Abstract Depression is a common mental disorder. Automatic depression detection tools using speech, enabled by machine learning, help early screening of depression. This paper addresses two limitations that may hinder the clinical implementations of such tools: noise resulting from segment-level labelling and a lack of model interpretability. We propose a bi-modal speech-level transformer to avoid segment-level labelling and introduce a hierarchical interpretation approach to provide both speech-level and sentence-level interpretations, based on gradient-weighted attention maps derived from all attention layers to track interactions between input features. We show that the proposed model outperforms a model that learns at a segment level ($p$=0.854, $r$=0.947, $F1$=0.897 compared to $p$=0.732, $r$=0.808, $F1$=0.768). For model interpretation, using one true positive sample, we show which sentences within a given speech are most relevant to depression detection; and which text tokens and Mel-spectrogram regions within these sentences are most relevant to depression detection. These interpretations allow clinicians to verify the validity of predictions made by depression detection tools, promoting their clinical implementations.
    摘要 抑郁是一种常见的心理疾病。使用机器学习技术实现的自动抑郁检测工具可以帮助早期检测抑郁。这篇论文解决了两个可能阻碍临床应用的限制: segment-level 标注导致的噪音和模型解释性不足。我们提议使用双模块的speech-level transformer来避免 segment-level 标注,并提出一种层次解释方法,以提供 both speech-level 和 sentence-level 的解释,基于所有注意层的梯度权重注意力地图来跟踪输入特征之间的交互。我们表明,提议的模型在比较 segment level 学习的模型($p$=0.732, $r$=0.808, $F1$=0.768)的情况下表现出色,其中 $p$=0.854, $r$=0.947, $F1$=0.897。为了解释模型,我们使用一个真正正确的样本,显示某些speech中的哪些句子是抑郁检测中最重要的,以及哪些文本字符和 Mel-spectrogram 区域在这些句子中对抑郁检测最重要。这些解释可以帮助临床专业人员验证抑郁检测工具的预测结果,从而促进其临床应用。
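
The hierarchical interpretation relies on gradient-weighted attention maps. One common way to form such a map, in the spirit of Grad-CAM applied to attention, is to weight each attention matrix by the ReLU-clipped gradient of the output score with respect to it and average over heads and layers; the sketch below illustrates that combination step (the attention maps and their gradients are assumed to have been captured, e.g., via hooks) and is not the authors' exact formulation.

```python
import torch


def gradient_weighted_attention(attn_maps, attn_grads):
    """attn_maps, attn_grads: lists of (heads, seq, seq) tensors, one per layer,
    with gradients of the depression score taken w.r.t. each attention map.
    Returns a (seq, seq) relevance map averaged over heads and layers."""
    layer_maps = []
    for a, g in zip(attn_maps, attn_grads):
        weighted = a * torch.relu(g)             # keep positively contributing attention
        layer_maps.append(weighted.mean(dim=0))  # average over heads
    return torch.stack(layer_maps).mean(dim=0)   # average over layers
```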

Grounding Description-Driven Dialogue State Trackers with Knowledge-Seeking Turns

  • paper_url: http://arxiv.org/abs/2309.13448
  • repo_url: None
  • paper_authors: Alexandru Coca, Bo-Hsiang Tseng, Jinghong Chen, Weizhe Lin, Weixuan Zhang, Tisha Anders, Bill Byrne
  • for: 提高对话状态跟踪模型的鲁棒性和泛化能力
  • methods: 利用从对话语料中收集的知识寻求轮次以及模式(schema)本身来支撑状态跟踪模型,并在微调和推理的提示中加入这些轮次
  • results: 与不包含这些设计的基线模型相比,该方法可以大幅提高平均联合目标准确率和模式敏感度
    Abstract Schema-guided dialogue state trackers can generalise to new domains without further training, yet they are sensitive to the writing style of the schemata. Augmenting the training set with human or synthetic schema paraphrases improves the model robustness to these variations but can be either costly or difficult to control. We propose to circumvent these issues by grounding the state tracking model in knowledge-seeking turns collected from the dialogue corpus as well as the schema. Including these turns in prompts during finetuning and inference leads to marked improvements in model robustness, as demonstrated by large average joint goal accuracy and schema sensitivity improvements on SGD and SGD-X.
    摘要 基于模式(schema)引导的对话状态跟踪器无需额外训练即可泛化到新领域,但对模式文本的写作风格十分敏感。在训练集中加入人工或合成的模式改写可以提升模型对这类变化的鲁棒性,但代价高昂或难以控制。我们提出利用从对话语料中收集的知识寻求轮次以及模式本身来为状态跟踪模型提供支撑:在微调和推理的提示中加入这些轮次,可以显著提升模型的鲁棒性,在 SGD 和 SGD-X 上的平均联合目标准确率和模式敏感度均有大幅改进。

My Science Tutor (MyST) – A Large Corpus of Children’s Conversational Speech

  • paper_url: http://arxiv.org/abs/2309.13347
  • repo_url: None
  • paper_authors: Sameer S. Pradhan, Ronald A. Cole, Wayne H. Ward
  • for: This paper describes the development of the MyST corpus, a large collection of children’s conversational speech, which can be used to improve automatic speech recognition algorithms, build and evaluate conversational AI agents for education, and develop multimodal applications to improve children’s learning.
  • methods: The MyST corpus was developed as part of the My Science Tutor project, which involves 100K utterances transcribed from approximately 10.5K virtual tutor sessions by 1.3K third, fourth, and fifth grade students. The corpus is available for non-commercial and commercial use under a creative commons license.
  • results: To date, ten organizations have licensed the corpus for commercial use, and approximately 40 university and other not-for-profit research groups have downloaded the corpus. The corpus has the potential to be used to improve children's learning and excitement about science, and to help them learn remotely.
  • for: 这篇论文描述了MyST corpus的开发,这是一个儿童对话语音集,可以用于改进自动语音识别算法、建立和评估教育机器人、并开发多Modal应用程序,以提高儿童学习科学的兴趣和成就。
  • methods: MyST corpus是My Science Tutor项目的一部分,包含约23万句对话语音(其中约10万句已完成转录),来自约1.05万次虚拟辅导会话,由约1300名三、四、五年级学生提供。这个 corpus 可以在创作共用许可下免费用于非商业用途(https://myst.cemantix.org),也可以获得商业使用授权(https://boulderlearning.com/resources/myst-corpus/)。到目前为止,已有十家组织获得了该 corpus 的商业授权,另有约40个大学及其他非营利研究团队下载了该 corpus。
  • results: 这个 corpus 的开发可以用于改进儿童学习科学的方法,并且可以帮助儿童在远程学习中学习更好地。到目前为止,有十家组织已经购买了这个 corpus 的商业授权,并且有约40个大学和其他非营利研究机构下载了这个 corpus。
    Abstract This article describes the MyST corpus developed as part of the My Science Tutor project -- one of the largest collections of children's conversational speech comprising approximately 400 hours, spanning some 230K utterances across about 10.5K virtual tutor sessions by around 1.3K third, fourth and fifth grade students. 100K of all utterances have been transcribed thus far. The corpus is freely available (https://myst.cemantix.org) for non-commercial use using a creative commons license. It is also available for commercial use (https://boulderlearning.com/resources/myst-corpus/). To date, ten organizations have licensed the corpus for commercial use, and approximately 40 university and other not-for-profit research groups have downloaded the corpus. It is our hope that the corpus can be used to improve automatic speech recognition algorithms, build and evaluate conversational AI agents for education, and together help accelerate development of multimodal applications to improve children's excitement and learning about science, and help them learn remotely.
    摘要 这篇文章介绍了作为My Science Tutor项目一部分开发的MyST语料库——规模最大的儿童对话语音语料之一,包含约400小时语音、约23万句话语,来自约1300名三、四、五年级学生的约1.05万次虚拟辅导会话,目前已转录约10万句。MyST语料库在创作共用许可下可免费用于非商业用途(https://myst.cemantix.org),也可用于商业用途(https://boulderlearning.com/resources/myst-corpus/);至今已有10家组织获得商业授权,约40个大学及其他非营利研究团队下载了该语料库。我们希望该语料库能用于改进自动语音识别算法、构建和评估面向教育的对话式AI代理,并共同加速多模态应用的开发,以提升儿童对科学的兴趣和学习效果,并帮助他们进行远程学习。

BAMBOO: A Comprehensive Benchmark for Evaluating Long Text Modeling Capacities of Large Language Models

  • paper_url: http://arxiv.org/abs/2309.13345
  • repo_url: https://github.com/rucaibox/bamboo
  • paper_authors: Zican Dong, Tianyi Tang, Junyi Li, Wayne Xin Zhao, Ji-Rong Wen
  • for: 本研究旨在评估大型自然语言处理(NLP)模型在长文本理解任务上的能力,并提供多任务长文本测试集(BAMBOO)。
  • methods: 本研究使用了五种长文本模型,在BAMBOO上进行了实验,并评估了这些模型在不同任务上的性能。
  • results: 研究发现,现有的长文本模型在某些任务上表现出色,但在其他任务上表现较差。研究还指出了未来可以采取的方法来提高长文本模型的能力。
    Abstract Large language models (LLMs) have achieved dramatic proficiency over NLP tasks with normal length. Recently, multiple studies have committed to extending the context length and enhancing the long text modeling capabilities of LLMs. To comprehensively evaluate the long context ability of LLMs, we propose BAMBOO, a multi-task long context benchmark. BAMBOO has been designed with four principles: comprehensive capacity evaluation, avoidance of data contamination, accurate automatic evaluation, and different length levels. It consists of 10 datasets from 5 different long text understanding tasks, i.e. question answering, hallucination detection, text sorting, language modeling, and code completion, to cover core capacities and various domains of LLMs. We conduct experiments with five long context models on BAMBOO and further discuss four key research questions of long text. We also qualitatively analyze current long context models and point out future directions for enhancing long text modeling capacities. We release our data, prompts, and code at https://github.com/RUCAIBox/BAMBOO.
    摘要 大型语言模型(LLM)已经在普通长度的NLPT任务上达到了戏剑性能。在最近的几项研究中,研究者们努力扩展了LLM的上下文长度和长文本模型化能力。为全面评估LLM的长文本能力,我们提出了BAMBOO,一个多任务长文本benchmark。BAMBOO遵循四个原则:全面评估能力、数据污染避免、自动评估精度和不同长度级别。它包括10个来自5个不同长文理解任务的数据集,例如问答、幻觉检测、文本排序、语言模型和代码完成等,以覆盖LLM的核心能力和不同领域。我们在BAMBOO上进行了5个长文本模型的实验,并讨论了长文本模型的四个关键研究问题。我们还进行了现有长文本模型的Qualitative分析,并指出了未来扩展长文本模型能力的方向。我们在GitHub上发布了数据、提示和代码,请参考https://github.com/RUCAIBox/BAMBOO。

From Text to Source: Results in Detecting Large Language Model-Generated Content

  • paper_url: http://arxiv.org/abs/2309.13322
  • repo_url: https://github.com/chrisneagu/FTC-Skystone-Dark-Angels-Romania-2020
  • paper_authors: Wissam Antoun, Benoît Sagot, Djamé Seddah
  • for: 本研究旨在 investigate Cross-Model Detection,探讨一个基于源LM的泛型分类器是否可以探测目标LM生成的文本。
  • methods: 本研究使用了多种LM大小和家族,并评估了对分类器泛化的影响。
  • results: 研究发现,模型大小与泛型分类器效果之间存在明显的反相关关系,大LM更难于探测,特别是当泛型分类器在小LM上训练时。同时,使用相同大小LM的数据进行训练可以提高大LM的探测性能,但可能会导致小LM的性能下降。模型归因实验也表明,LM生成的文本中含有可识别的签名特征。
    Abstract The widespread use of Large Language Models (LLMs), celebrated for their ability to generate human-like text, has raised concerns about misinformation and ethical implications. Addressing these concerns necessitates the development of robust methods to detect and attribute text generated by LLMs. This paper investigates "Cross-Model Detection," evaluating whether a classifier trained to distinguish between source LLM-generated and human-written text can also detect text from a target LLM without further training. The study comprehensively explores various LLM sizes and families, and assesses the impact of conversational fine-tuning techniques on classifier generalization. The research also delves into Model Attribution, encompassing source model identification, model family classification, and model size classification. Our results reveal several key findings: a clear inverse relationship between classifier effectiveness and model size, with larger LLMs being more challenging to detect, especially when the classifier is trained on data from smaller models. Training on data from similarly sized LLMs can improve detection performance from larger models but may lead to decreased performance when dealing with smaller models. Additionally, model attribution experiments show promising results in identifying source models and model families, highlighting detectable signatures in LLM-generated text. Overall, our study contributes valuable insights into the interplay of model size, family, and training data in LLM detection and attribution.
    摘要 广泛使用大型语言模型(LLM),被夸大为能生成人类语言文本的能力,已引起关于误导和伦理问题的担忧。为解决这些问题,需要开发robust的检测和归因方法。本文研究“交叉模型检测”,检查一个基于源LLM生成文本和人类写作文本的分类器能否检测目标LLM生成的文本。研究全面探讨了不同的LLM大小和家族,以及对分类器泛化的影响。研究还探讨了模型归因,包括来源模型标识、模型家族分类和模型大小分类。我们的结果显示了一些关键发现:与分类器效果相对关系,大型LLM更难于检测,特别是当分类器被训练使用小型LLM的数据时。使用同样大小的LLM数据进行训练可以提高大型LLM的检测性能,但可能导致对小型LLM的性能下降。此外,模型归因实验显示了LLM生成文本中的可察性特征,这些特征可以用于归因LLM。总的来说,我们的研究为LLM检测和归因提供了有价值的发现。
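
A minimal version of the cross-model detection experiments is a binary classifier trained on text from a source LLM versus human text, then evaluated zero-shot on text from a different target LLM. The sketch below uses a TF-IDF plus logistic-regression baseline purely for illustration; the study's actual classifiers and data are not reproduced here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline


def cross_model_detection(human_texts, source_lm_texts, target_lm_texts, held_out_human):
    """Train on human vs. source-LM text, test zero-shot on target-LM text."""
    X_train = human_texts + source_lm_texts
    y_train = [0] * len(human_texts) + [1] * len(source_lm_texts)
    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                        LogisticRegression(max_iter=1000))
    clf.fit(X_train, y_train)

    X_test = held_out_human + target_lm_texts
    y_test = [0] * len(held_out_human) + [1] * len(target_lm_texts)
    return accuracy_score(y_test, clf.predict(X_test))
```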

GlotScript: A Resource and Tool for Low Resource Writing System Identification

  • paper_url: http://arxiv.org/abs/2309.13320
  • repo_url: https://github.com/cisnlp/GlotScript
  • paper_authors: Amir Hossein Kargaran, François Yvon, Hinrich Schütze
  • for: 用于识别低资源文字系统
  • methods: 汇集现有的文字系统资源,并覆盖 Unicode 15.0 的全部文字系统(scripts)
  • results: 支持清理多语言 corpus和分析语言模型的Tokenization
    Abstract We present GlotScript, an open resource and tool for low resource writing system identification. GlotScript-R is a resource that provides the attested writing systems for more than 7,000 languages. It is compiled by aggregating information from existing writing system resources. GlotScript-T is a writing system identification tool that covers all 161 Unicode 15.0 scripts. For an input text, it returns its script distribution where scripts are identified by ISO 15924 codes. We also present two use cases for GlotScript. First, we demonstrate that GlotScript supports cleaning multilingual corpora such as mC4 and OSCAR. Second, we analyze the tokenization of a number of language models such as GPT-4 using GlotScript and provide insights on the coverage of low resource scripts and languages by each language model. We hope that GlotScript will become a useful resource for work on low resource languages in the NLP community. GlotScript-R and GlotScript-T are available at https://github.com/cisnlp/GlotScript.
    摘要 我们介绍GlotScript,一个开源资源和工具,用于低资源文字系统识别。GlotScript-R是一个提供了超过7,000种语言的验证文字系统资源。它通过将现有文字系统资源集成起来编译而成。GlotScript-T是一个可以识别所有Unicode 15.0 编码中的161种文字系统的写作系统识别工具。对于输入文本,它返回该文本的文字系统分布,并将文字系统用ISO 15924 编码来标识。我们还介绍了GlotScript的两个使用场景。首先,我们示例了GlotScript可以清洁多语言 corpus,如mC4和OSCAR。其次,我们分析了一些语言模型,如GPT-4,使用GlotScript进行分词,并提供了低资源文字和语言的覆盖率的视图。我们希望GlotScript可以成为NPLTcommunity中工作低资源语言的有用资源。GlotScript-R和GlotScript-T可以在https://github.com/cisnlp/GlotScript 上获取。
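
GlotScript-T returns a script distribution keyed by ISO 15924 codes. A much smaller illustration of the same idea, covering only a handful of scripts via the third-party `regex` module's script properties, is sketched below; GlotScript itself covers all 161 Unicode 15.0 scripts.

```python
import regex  # third-party module with \p{Script=...} support

SCRIPTS = {"Latn": "Latin", "Cyrl": "Cyrillic", "Arab": "Arabic",
           "Hani": "Han", "Deva": "Devanagari"}


def script_distribution(text):
    counts, total = {}, 0
    for iso_code, script in SCRIPTS.items():
        n = len(regex.findall(rf"\p{{Script={script}}}", text))
        if n:
            counts[iso_code] = n
            total += n
    return {code: n / total for code, n in counts.items()} if total else {}


print(script_distribution("Привет hello नमस्ते"))
```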

Spanish Resource Grammar version 2023

  • paper_url: http://arxiv.org/abs/2309.13318
  • repo_url: None
  • paper_authors: Olga Zamaraeva, Carlos Gómez-Rodríguez
  • for: 这个论文是为了语言研究和自然语言处理应用开发而写的。
  • methods: 这个论文使用了最新版本的Freeling morphological analyzer和tagger,并提供了手动验证的treebank和问题列表。
  • results: 这个论文提供了一个新的研究方向,并在一小部分学习 corpus 上测试了 grammar 的覆盖率和过度生成。
    Abstract We present the latest version of the Spanish Resource Grammar (SRG). The new SRG uses the recent version of Freeling morphological analyzer and tagger and is accompanied by a manually verified treebank and a list of documented issues. We also present the grammar's coverage and overgeneration on a small portion of a learner corpus, an entirely new research line with respect to the SRG. The grammar can be used for linguistic research, such as for empirically driven development of syntactic theory, and in natural language processing applications such as computer-assisted language learning. Finally, as the treebanks grow, they can be used for training high-quality semantic parsers and other systems which may benefit from precise and detailed semantics.
    摘要 我们发布了西班牙语资源语法(SRG)的最新版本。新版 SRG 使用了最新版的 Freeling 形态分析器和词性标注器,并附带经人工校验的树库和已记录的问题列表。我们还在一小部分学习者语料上评测了该语法的覆盖率和过度生成情况,这是相对于 SRG 而言全新的研究方向。该语法可用于语言学研究,例如以实证数据驱动的方式发展句法理论,也可用于计算机辅助语言学习等自然语言处理应用。随着树库规模的增长,它们还可用于训练高质量的语义解析器以及其他能受益于精确、详细语义信息的系统。

Calibrating LLM-Based Evaluator

  • paper_url: http://arxiv.org/abs/2309.13308
  • repo_url: None
  • paper_authors: Yuxuan Liu, Tianchi Yang, Shaohan Huang, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang
  • for: 这篇论文旨在提出一种自动校准并与人类偏好对齐的方法,以便利用大型语言模型(LLM)进行自然语言生成质量评估。
  • methods: 该方法是一种多阶段、无梯度的方法:先通过上下文学习让语言模型基于不同的少样本示例自行起草评分标准,再挑选表现最佳的标准并通过自我改写进行再校准。
  • results: 在多个文本质量评估数据集上的实验表明,该方法能有效提升与专家评估的相关性;对有效评分标准的定性分析也提供了有益的直觉与观察。
    Abstract Recent advancements in large language models (LLMs) on language modeling and emergent capabilities make them a promising reference-free evaluator of natural language generation quality, and a competent alternative to human evaluation. However, hindered by the closed-source or high computational demand to host and tune, there is a lack of practice to further calibrate an off-the-shelf LLM-based evaluator towards better human alignment. In this work, we propose AutoCalibrate, a multi-stage, gradient-free approach to automatically calibrate and align an LLM-based evaluator toward human preference. Instead of explicitly modeling human preferences, we first implicitly encompass them within a set of human labels. Then, an initial set of scoring criteria is drafted by the language model itself, leveraging in-context learning on different few-shot examples. To further calibrate this set of criteria, we select the best performers and re-draft them with self-refinement. Our experiments on multiple text quality evaluation datasets illustrate a significant improvement in correlation with expert evaluation through calibration. Our comprehensive qualitative analysis conveys insightful intuitions and observations on the essence of effective scoring criteria.
    摘要 近期大语言模型(LLM)的进步在语言生成质量评估方面,使得它们成为了无参考的自然语言评估器和人类评估器的有力竞争对手。然而,由于某些原因,很多LLM-based评估器受到了封闭的源代码或高计算需求的限制,导致它们的评估器没有得到进一步的精心调整。在这项工作中,我们提出了一种多stage、gradient-free的自动调整方法,以使得LLM-based评估器更加准确地对应人类的喜好。而不是直接模型人类喜好,我们将人类标签集成到了一个集合中,然后由语言模型自己提出了初始的评估标准。然后,我们选择了最佳表现者,并通过自我修复来重新绘制这些标准。我们在多个文本质量评估数据集上进行了多项实验,并证明了与专家评估的强相关性。我们还提供了深入的Qualitative分析,帮助理解有效的评估标准的本质。

OATS: Opinion Aspect Target Sentiment Quadruple Extraction Dataset for Aspect-Based Sentiment Analysis

  • paper_url: http://arxiv.org/abs/2309.13297
  • repo_url: None
  • paper_authors: Siva Uday Sampreeth Chebolu, Franck Dernoncourt, Nedim Lipka, Thamar Solorio
  • for: 本研究旨在掌握用户生成的评论中的具体元素之情感分析,以提高对文本内容的情感分析和评估。
  • methods: 本研究构建了新的 OATS 数据集,涵盖三个新领域的评论,包含 20,000 个句子级四元组和 13,000 个评论级元组。研究还进行了域内与跨域实验,以探索不同的 ABSA 子任务以及 OATS 的潜力。
  • results: 本研究通过实验获得了OATSdataset的初步基线,并证明了OATS可以解决现有的ABSA领域问题,例如餐厅和笔记型评价等领域的问题。
    Abstract Aspect-based sentiment Analysis (ABSA) delves into understanding sentiments specific to distinct elements within textual content. It aims to analyze user-generated reviews to determine a) the target entity being reviewed, b) the high-level aspect to which it belongs, c) the sentiment words used to express the opinion, and d) the sentiment expressed toward the targets and the aspects. While various benchmark datasets have fostered advancements in ABSA, they often come with domain limitations and data granularity challenges. Addressing these, we introduce the OATS dataset, which encompasses three fresh domains and consists of 20,000 sentence-level quadruples and 13,000 review-level tuples. Our initiative seeks to bridge specific observed gaps: the recurrent focus on familiar domains like restaurants and laptops, limited data for intricate quadruple extraction tasks, and an occasional oversight of the synergy between sentence and review-level sentiments. Moreover, to elucidate OATS's potential and shed light on various ABSA subtasks that OATS can solve, we conducted in-domain and cross-domain experiments, establishing initial baselines. We hope the OATS dataset augments current resources, paving the way for an encompassing exploration of ABSA.
    摘要 基于方面的情感分析(ABSA)旨在理解文本中针对特定元素的情感,其目标是分析用户评论,确定 a) 被评论的目标实体,b) 其所属的高层方面,c) 用于表达观点的情感词,以及 d) 对目标和方面所表达的情感。尽管各种基准数据集推动了 ABSA 的进展,它们往往存在领域局限和数据粒度方面的挑战。为此,我们提出了 OATS 数据集,涵盖三个全新领域,包含 20,000 个句子级四元组和 13,000 个评论级元组。我们的工作旨在弥补已观察到的若干空白:以往研究反复聚焦于餐馆、笔记本电脑等熟悉领域,复杂的四元组抽取任务数据有限,以及句子级与评论级情感之间的协同常被忽视。此外,为了展示 OATS 的潜力并说明其可支持的各类 ABSA 子任务,我们进行了域内和跨域实验,建立了初步基线。我们希望 OATS 数据集能够补充现有资源,为 ABSA 的全面探索铺平道路。

Natural Language Processing for Requirements Formalization: How to Derive New Approaches?

  • paper_url: http://arxiv.org/abs/2309.13272
  • repo_url: https://github.com/ifak-prototypes/nlp_reform
  • paper_authors: Viju Sudhi, Libin Kutty, Robin Gröpler
  • for: 本研究旨在提供一种半自动的需求形式化方法,帮助工业界和研究者尽可能自动化软件开发与测试过程。
  • methods: 本研究使用自然语言处理(NLP)技术,介绍并讨论了两种不同的方法,强调规则集的迭代开发,以半自动地完成需求形式化。
  • results: 研究表明,使用当前预训练的 NLP 模型可以减少创建规则集所需的工作量,并且可以较容易地适配到特定的用例和领域;该方法在来自汽车和铁路领域的两个工业用例上得到了验证。
    Abstract It is a long-standing desire of industry and research to automate the software development and testing process as much as possible. In this process, requirements engineering (RE) plays a fundamental role for all other steps that build on it. Model-based design and testing methods have been developed to handle the growing complexity and variability of software systems. However, major effort is still required to create specification models from a large set of functional requirements provided in natural language. Numerous approaches based on natural language processing (NLP) have been proposed in the literature to generate requirements models using mainly syntactic properties. Recent advances in NLP show that semantic quantities can also be identified and used to provide better assistance in the requirements formalization process. In this work, we present and discuss principal ideas and state-of-the-art methodologies from the field of NLP in order to guide the readers on how to create a set of rules and methods for the semi-automated formalization of requirements according to their specific use case and needs. We discuss two different approaches in detail and highlight the iterative development of rule sets. The requirements models are represented in a human- and machine-readable format in the form of pseudocode. The presented methods are demonstrated on two industrial use cases from the automotive and railway domains. It shows that using current pre-trained NLP models requires less effort to create a set of rules and can be easily adapted to specific use cases and domains. In addition, findings and shortcomings of this research area are highlighted and an outlook on possible future developments is given.
    摘要 工业界和研究界长期希望尽可能自动化软件开发和测试过程,而需求工程(RE)在其后的所有步骤中都起着基础性作用。基于模型的设计和测试方法已被用来应对软件系统日益增长的复杂性和可变性。然而,从以自然语言给出的大量功能需求中创建规格模型仍然需要大量工作。文献中已提出了许多基于自然语言处理(NLP)的方法,主要利用句法特征来生成需求模型。NLP 的最新进展表明,语义信息同样可以被识别并用于在需求形式化过程中提供更好的辅助。在这项工作中,我们介绍并讨论 NLP 领域的主要思想与最新方法,以指导读者根据各自的用例和需求创建一套半自动需求形式化的规则和方法。我们详细讨论了两种不同的方法,并强调规则集的迭代开发。需求模型以人类和机器均可读的伪代码形式表示。我们在来自汽车和铁路领域的两个工业用例上演示了所提方法,结果表明使用当前预训练的 NLP 模型只需较少的工作量即可创建规则集,并且可以容易地适配到特定用例和领域。此外,我们还指出了该研究领域的发现与不足,并展望了未来可能的发展。

A Survey of Document-Level Information Extraction

  • paper_url: http://arxiv.org/abs/2309.13249
  • repo_url: https://github.com/Don-No7/Hack-SQL
  • paper_authors: Hanwen Zheng, Sijia Wang, Lifu Huang
  • for: 本文是一篇文献综述,旨在为 NLP 领域的研究人员提供更多启示,以进一步提升文档级信息抽取(IE)的性能。
  • methods: 本文对现有的最先进算法进行了系统性的错误分析,并识别了它们的局限性以及文档级 IE 任务中尚存的挑战。
  • results: 研究发现,标注噪声、实体共指消解和缺乏推理能力严重限制了文档级 IE 的性能。
    Abstract Document-level information extraction (IE) is a crucial task in natural language processing (NLP). This paper conducts a systematic review of recent document-level IE literature. In addition, we conduct a thorough error analysis with current state-of-the-art algorithms and identify their limitations as well as the remaining challenges for the task of document-level IE. According to our findings, labeling noises, entity coreference resolution, and lack of reasoning, severely affect the performance of document-level IE. The objective of this survey paper is to provide more insights and help NLP researchers to further enhance document-level IE performance.
    摘要 文档级信息抽取(IE)是自然语言处理(NLP)中的关键任务。本文对近期的文档级 IE 文献进行了系统性综述。此外,我们还对当前最先进的算法进行了深入的错误分析,指出了其局限性以及文档级 IE 任务中尚存的挑战。根据我们的发现,标注噪声、实体共指消解和缺乏推理能力严重影响了文档级 IE 的性能。本综述旨在提供更多洞见,帮助 NLP 研究人员进一步提升文档级 IE 的性能。

ChEDDAR: Student-ChatGPT Dialogue in EFL Writing Education

  • paper_url: http://arxiv.org/abs/2309.13243
  • repo_url: None
  • paper_authors: Jieun Han, Haneul Yoo, Junho Myung, Minsun Kim, Tak Yeon Lee, So-Yeon Ahn, Alice Oh
  • for: 这个研究旨在探讨大规模实际场景下学生和AI系统之间的交互,以推动教育领域中AI生成技术的应用。
  • methods: 这个研究使用了对212名英语为外语学生进行了一个学期长的实验,他们被要求通过对ChatGPT进行对话来修改他们的作业。研究收集了对话记录、utterance-level作业修改历史、自我评价和学生的意图,以及每个会话的前后调查记录学生的目标和总体经验。
  • results: 研究发现学生在使用生成AI时的使用模式和满意度与他们的意图有直接关系,并提出了基准结果为两个关键任务在教育上的对话系统中:意图检测和满意度估计。研究建议进一步调整教育中AI生成技术的应用,并提出了可能的使用Scenario使用ChEDDAR。ChEDDAR公共可用于https://github.com/zeunie/ChEDDAR。
    Abstract The integration of generative AI in education is expanding, yet empirical analyses of large-scale, real-world interactions between students and AI systems still remain limited. In this study, we present ChEDDAR, ChatGPT & EFL Learner's Dialogue Dataset As Revising an essay, which is collected from a semester-long longitudinal experiment involving 212 college students enrolled in English as Foreign Langauge (EFL) writing courses. The students were asked to revise their essays through dialogues with ChatGPT. ChEDDAR includes a conversation log, utterance-level essay edit history, self-rated satisfaction, and students' intent, in addition to session-level pre-and-post surveys documenting their objectives and overall experiences. We analyze students' usage patterns and perceptions regarding generative AI with respect to their intent and satisfaction. As a foundational step, we establish baseline results for two pivotal tasks in task-oriented dialogue systems within educational contexts: intent detection and satisfaction estimation. We finally suggest further research to refine the integration of generative AI into education settings, outlining potential scenarios utilizing ChEDDAR. ChEDDAR is publicly available at https://github.com/zeunie/ChEDDAR.
    摘要 整合生成AI在教育中的推广正在进行,但实际的大规模实验却仍然受到限制。本研究公布了ChEDDAR,ChatGPT & EFL Learner's Dialogue Dataset As Revising an essay,这是基于一个半年长的实验,其中212名大学生参与了英语作为外语写作课程。这些学生被要求通过对ChatGPT的对话来修改他们的文章。ChEDDAR包括对话记录、文章修改历史记录、自我评价满意度,以及学生的意图,以及每个会话的前后调查,记录了学生的目标和总体体验。我们分析学生对生成AI的使用方式和满意度的关系,并为两个关键任务在教育上的对话系统提供基线结果:检测意图和满意度估计。最后,我们建议进一步推进生成AI的教育集成,并提出了可能的应用场景,使用ChEDDAR可以在https://github.com/zeunie/ChEDDAR获取。

User Simulation with Large Language Models for Evaluating Task-Oriented Dialogue

  • paper_url: http://arxiv.org/abs/2309.13233
  • repo_url: None
  • paper_authors: Sam Davidson, Salvatore Romeo, Raphael Shu, James Gung, Arshit Gupta, Saab Mansour, Yi Zhang
  • for: 提高自动评估新任务对话系统(TOD)的发展,避免人工评估的多个阶段和迭代过程中的阻碍。
  • methods: 使用最近发展的大型预训练语言模型(LLM)建立新的用户模拟器,通过受 Context 学习提高语言多样性,模拟人类对话伙伴的行为。
  • results: 比前一工作更高的语言多样性和语义多样性,能够与多个 TOD 系统进行有效交流,尤其是单意对话目标,而且生成的语音和语法多样性比前一工作更高。
    Abstract One of the major impediments to the development of new task-oriented dialogue (TOD) systems is the need for human evaluation at multiple stages and iterations of the development process. In an effort to move toward automated evaluation of TOD, we propose a novel user simulator built using recently developed large pretrained language models (LLMs). In order to increase the linguistic diversity of our system relative to the related previous work, we do not fine-tune the LLMs used by our system on existing TOD datasets; rather we use in-context learning to prompt the LLMs to generate robust and linguistically diverse output with the goal of simulating the behavior of human interlocutors. Unlike previous work, which sought to maximize goal success rate (GSR) as the primary metric of simulator performance, our goal is a system which achieves a GSR similar to that observed in human interactions with TOD systems. Using this approach, our current simulator is effectively able to interact with several TOD systems, especially on single-intent conversational goals, while generating lexically and syntactically diverse output relative to previous simulators that rely upon fine-tuned models. Finally, we collect a Human2Bot dataset of humans interacting with the same TOD systems with which we experimented in order to better quantify these achievements.
    摘要 开发新的任务导向对话(TOD)系统的主要障碍之一,是在开发过程的多个阶段和迭代中都需要人工评估。为了推动 TOD 的自动化评估,我们提出了一种基于最新的大型预训练语言模型(LLM)构建的新型用户模拟器。为了提升系统相对于相关已有工作的语言多样性,我们没有在现有 TOD 数据集上对所用 LLM 进行微调,而是通过上下文学习提示 LLM 生成稳健且语言多样的输出,以模拟人类对话者的行为。与以往将目标达成率(GSR)最大化作为模拟器主要指标的工作不同,我们的目标是使系统的 GSR 接近人类与 TOD 系统交互时观察到的水平。基于这一思路,当前的模拟器能够与多个 TOD 系统有效交互,尤其是在单意图对话目标上,同时相对于依赖微调模型的既有模拟器,生成的输出在词汇和句法上更加多样。最后,我们收集了一个 Human2Bot 数据集,记录人类与我们实验所用的同一批 TOD 系统的交互,以更好地量化上述成果。

Unify word-level and span-level tasks: NJUNLP’s Participation for the WMT2023 Quality Estimation Shared Task

  • paper_url: http://arxiv.org/abs/2309.13230
  • repo_url: https://github.com/njunlp/njuqe
  • paper_authors: Xiang Geng, Zhejian Lai, Yu Zhang, Shimin Tao, Hao Yang, Jiajun Chen, Shujian Huang
  • for: 这个研究是为了提出一种基于NJUQE框架的pseudo数据方法,以提高 Machine Translation 的质量预测(QE)性能。
  • methods: 该研究使用了Parallel Data从WMT翻译任务中生成pseudo MQM数据,然后使用XLMR大型模型在pseudo QE数据上进行预训练,并在实际QE数据上进行细化调整。同时,该研究jointly学习了句子级分数和单词级标签。
  • results: 该研究在英语-德语语对的单词级质量预测和细粒度错误范围检测两个子任务上均以明显优势取得了最佳成绩。
    Abstract We introduce the submissions of the NJUNLP team to the WMT 2023 Quality Estimation (QE) shared task. Our team submitted predictions for the English-German language pair on all two sub-tasks: (i) sentence- and word-level quality prediction; and (ii) fine-grained error span detection. This year, we further explore pseudo data methods for QE based on NJUQE framework (https://github.com/NJUNLP/njuqe). We generate pseudo MQM data using parallel data from the WMT translation task. We pre-train the XLMR large model on pseudo QE data, then fine-tune it on real QE data. At both stages, we jointly learn sentence-level scores and word-level tags. Empirically, we conduct experiments to find the key hyper-parameters that improve the performance. Technically, we propose a simple method that covert the word-level outputs to fine-grained error span results. Overall, our models achieved the best results in English-German for both word-level and fine-grained error span detection sub-tasks by a considerable margin.
    摘要 我们介绍NJUNLP团队在WMT 2023质量估计(QE)共享任务中的提交。我们对英语-德语语对 submitting 预测,包括两个子任务:(i)句子和单词水平质量预测,以及(ii)细化错误范围检测。本年,我们进一步探索基于NJUQE框架(https://github.com/NJUNLP/njuqe)的pseudo数据方法 для QE。我们使用WMT翻译任务的平行数据生成pseudo MQM数据,然后在这些数据上预训练XLMR大型模型,然后精度调整在真实QE数据上。在两个阶段中,我们同时学习句子级分数和单词级标签。实际上,我们进行了实验来找到提高性能的关键超参数。技术上,我们提出了一种简单的方法,将单词级输出转换为细化错误范围结果。总的来说,我们的模型在英语-德语语对上的 both word-level 和细化错误范围检测子任务上 achieved 最佳成绩,差距非常明显。
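The abstract mentions a simple conversion from word-level quality tags to fine-grained error spans. A minimal sketch of one natural way to do this (merging consecutive BAD-tagged words into character-offset spans; the exact rule used by the NJUNLP system is not specified here) is:

```python
# Hedged sketch: merge consecutive BAD word tags into (start, end) character spans.
# This is an illustrative reconstruction, not necessarily the exact NJUQE rule.
def tags_to_spans(words, tags):
    spans, offset, current = [], 0, None
    for word, tag in zip(words, tags):
        start, end = offset, offset + len(word)
        if tag == "BAD":
            current = [start, end] if current is None else [current[0], end]
        elif current is not None:
            spans.append(tuple(current))
            current = None
        offset = end + 1  # +1 for the separating space
    if current is not None:
        spans.append(tuple(current))
    return spans

words = ["Das", "ist", "eine", "falsche", "Übersetzung"]
tags = ["OK", "OK", "BAD", "BAD", "OK"]
print(tags_to_spans(words, tags))  # [(8, 20)] -> covers "eine falsche"
```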

COCO-Counterfactuals: Automatically Constructed Counterfactual Examples for Image-Text Pairs

  • paper_url: http://arxiv.org/abs/2309.14356
  • repo_url: None
  • paper_authors: Tiep Le, Vasudev Lal, Phillip Howard
  • for: 这篇论文的目的是提出一种可扩展的框架,用于自动生成多模态的反例,以提高自然语言处理(NLP)领域中模型对数据中的偶极相关性的耐误性。
  • methods: 这篇论文使用了文本到图像扩散模型来生成反例。
  • results: 作者通过人工评估 validate了 COCO-Counterfactuals 多模态反例集的质量,并表明了现有的多模态模型在这些反例中表现不佳。此外,作者还示出了通过使用 COCO-Counterfactuals 进行训练数据增强来提高多模态视语言模型的对外域数据的泛化能力。
    Abstract Counterfactual examples have proven to be valuable in the field of natural language processing (NLP) for both evaluating and improving the robustness of language models to spurious correlations in datasets. Despite their demonstrated utility for NLP, multimodal counterfactual examples have been relatively unexplored due to the difficulty of creating paired image-text data with minimal counterfactual changes. To address this challenge, we introduce a scalable framework for automatic generation of counterfactual examples using text-to-image diffusion models. We use our framework to create COCO-Counterfactuals, a multimodal counterfactual dataset of paired image and text captions based on the MS-COCO dataset. We validate the quality of COCO-Counterfactuals through human evaluations and show that existing multimodal models are challenged by our counterfactual image-text pairs. Additionally, we demonstrate the usefulness of COCO-Counterfactuals for improving out-of-domain generalization of multimodal vision-language models via training data augmentation.
    摘要 反事实样例已被证明在自然语言处理(NLP)中对评估并提升语言模型对数据集中伪相关性的鲁棒性具有价值。尽管其在 NLP 中的效用已得到验证,多模态反事实样例仍相对缺乏研究,原因在于难以构建仅含最小反事实变化的图文配对数据。为应对这一挑战,我们提出了一个可扩展的框架,利用文本到图像扩散模型自动生成反事实样例。我们使用该框架基于 MS-COCO 数据集构建了 COCO-Counterfactuals,一个由图像与文本描述配对组成的多模态反事实数据集。我们通过人工评估验证了 COCO-Counterfactuals 的质量,并表明现有多模态模型难以应对这些反事实图文对。此外,我们还展示了通过训练数据增强,COCO-Counterfactuals 有助于提升多模态视觉语言模型的域外泛化能力。

cs.LG - 2023-09-23

Interpretable and Flexible Target-Conditioned Neural Planners For Autonomous Vehicles

  • paper_url: http://arxiv.org/abs/2309.13485
  • repo_url: None
  • paper_authors: Haolan Liu, Jishen Zhao, Liangjun Zhang
  • for: 这篇论文旨在解决自动驾驶车辆规划器中存在多个可接受规划方案的问题。
  • methods: 论文提出一种可解释的神经规划器,通过自适应高斯核函数和松弛的 hourglass 损失函数来更好地刻画规划问题的不确定性,并使用负高斯核对热图回归施加监督,以有效学习避撞。
  • results: 作者在 Lyft 开放数据集上进行了系统评估,结果表明该模型在真实驾驶场景中比已有方法实现了更安全、更灵活的驾驶表现。
    Abstract Learning-based approaches to autonomous vehicle planners have the potential to scale to many complicated real-world driving scenarios by leveraging huge amounts of driver demonstrations. However, prior work only learns to estimate a single planning trajectory, while there may be multiple acceptable plans in real-world scenarios. To solve the problem, we propose an interpretable neural planner to regress a heatmap, which effectively represents multiple potential goals in the bird's-eye view of an autonomous vehicle. The planner employs an adaptive Gaussian kernel and relaxed hourglass loss to better capture the uncertainty of planning problems. We also use a negative Gaussian kernel to add supervision to the heatmap regression, enabling the model to learn collision avoidance effectively. Our systematic evaluation on the Lyft Open Dataset across a diverse range of real-world driving scenarios shows that our model achieves a safer and more flexible driving performance than prior works.
    摘要 基于学习的自动驾驶规划方法可以利用海量的驾驶示范数据,扩展到许多复杂的真实驾驶场景。然而,已有工作通常只学习估计单一的规划轨迹,而真实场景中可能存在多个可接受的规划方案。为解决这一问题,我们提出一种可解释的神经规划器,通过回归热图来有效表示自动驾驶车辆鸟瞰视角下的多个潜在目标。该规划器采用自适应高斯核与松弛的 hourglass 损失,以更好地刻画规划问题的不确定性;同时使用负高斯核为热图回归提供监督,使模型能够有效学习避撞。我们在 Lyft 开放数据集上对多种真实驾驶场景进行了系统评估,结果表明我们的模型比已有工作实现了更安全、更灵活的驾驶表现。

A Unified Scheme of ResNet and Softmax

  • paper_url: http://arxiv.org/abs/2309.13482
  • repo_url: None
  • paper_authors: Zhao Song, Weixin Wang, Junze Yin
  • for: 这篇论文旨在提供一种统一的理论分析框架,用于研究深度学习中 softmax 回归和残差神经网络(ResNet)两种技术之间的联系。
  • methods: 论文对回归问题 $\| \langle \exp(Ax) + Ax , {\bf 1}_n \rangle^{-1} ( \exp(Ax) + Ax ) - b \|_2^2$ 进行了理论分析。
  • results: 论文推导了损失函数的梯度、Hessian 和 Lipschitz 性质,并证明 Hessian 是半正定矩阵,其结构可表示为低秩矩阵与对角矩阵之和,从而可以使用高效的近似牛顿法进行优化。这一统一框架将两个此前被认为无关的方向联系起来,为深度学习模型的优化提供了新的视角。
    Abstract Large language models (LLMs) have brought significant changes to human society. Softmax regression and residual neural networks (ResNet) are two important techniques in deep learning: they not only serve as significant theoretical components supporting the functionality of LLMs but also are related to many other machine learning and theoretical computer science fields, including but not limited to image classification, object detection, semantic segmentation, and tensors. Previous research works studied these two concepts separately. In this paper, we provide a theoretical analysis of the regression problem: $\| \langle \exp(Ax) + A x , {\bf 1}_n \rangle^{-1} ( \exp(Ax) + Ax ) - b \|_2^2$, where $A$ is a matrix in $\mathbb{R}^{n \times d}$, $b$ is a vector in $\mathbb{R}^n$, and ${\bf 1}_n$ is the $n$-dimensional vector whose entries are all $1$. This regression problem is a unified scheme that combines softmax regression and ResNet, which has never been done before. We derive the gradient, Hessian, and Lipschitz properties of the loss function. The Hessian is shown to be positive semidefinite, and its structure is characterized as the sum of a low-rank matrix and a diagonal matrix. This enables an efficient approximate Newton method. As a result, this unified scheme helps to connect two previously thought unrelated fields and provides novel insight into loss landscape and optimization for emerging over-parameterized neural networks, which is meaningful for future research in deep learning models.
    摘要 大型语言模型(LLM)给人类社会带来了重大变革。softmax 回归与残差神经网络(ResNet)是深度学习中两项重要技术:它们不仅是支撑 LLM 功能的重要理论组成部分,还与图像分类、目标检测、语义分割和张量等许多机器学习与理论计算机科学领域密切相关。以往的研究分别考察这两个概念。本文对回归问题 $\| \langle \exp(Ax) + Ax , \mathbf{1}_n \rangle^{-1} ( \exp(Ax) + Ax ) - b \|_2^2$ 给出理论分析,其中 $A \in \mathbb{R}^{n \times d}$,$b \in \mathbb{R}^n$,$\mathbf{1}_n$ 为元素全为 1 的 $n$ 维向量。该回归问题是将 softmax 回归与 ResNet 结合起来的统一框架,此前尚无此类工作。我们推导了损失函数的梯度、Hessian 与 Lipschitz 性质,证明 Hessian 为半正定矩阵,且其结构可表示为低秩矩阵与对角矩阵之和,从而可以采用高效的近似牛顿法。这一统一框架将两个此前被认为无关的领域联系起来,为过参数化神经网络的损失地形与优化提供了新的见解,对深度学习模型的未来研究具有意义。
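To make the regression objective above concrete, here is a small numerical sketch of the loss (a direct transcription of the stated formula under the stated shapes; it is not code released by the authors):

```python
# Hedged sketch: evaluate L(x) = || <exp(Ax) + Ax, 1_n>^{-1} (exp(Ax) + Ax) - b ||_2^2
# for given A (n x d), x (d,), b (n,). Illustrative only.
import numpy as np

def unified_loss(A: np.ndarray, x: np.ndarray, b: np.ndarray) -> float:
    u = np.exp(A @ x) + A @ x          # exp(Ax) + Ax, shape (n,)
    u = u / np.sum(u)                  # normalize by the inner product <u, 1_n>
    return float(np.sum((u - b) ** 2))

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))
x = rng.normal(size=3)
b = rng.normal(size=5)
print(unified_loss(A, x, b))
```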

Real-time Bandwidth Estimation from Offline Expert Demonstrations

  • paper_url: http://arxiv.org/abs/2309.13481
  • repo_url: None
  • paper_authors: Aashish Gottipati, Sami Khairy, Gabriel Mittag, Vishak Gopal, Ross Cutler
  • for: 该论文targets the problem of bandwidth estimation (BWE) for real-time communication systems, with a focus on integrating data-driven bandwidth estimators into real-time systems.
  • methods: 该论文提出了一种名为Merlin的完全OFFLINE的数据驱动方法,将先前的追溯方法与深度学习技术相结合,以提高BWE的准确性和可靠性。
  • results: 实验表明,Merlin在对比WebRTC的视频会议中具有42.85%和12.8%的包丢失和延迟减少,分别。这些结果表明Merlin可以在实时网络控制中提供高质量的带宽估计。
    Abstract In this work, we tackle the problem of bandwidth estimation (BWE) for real-time communication systems; however, in contrast to previous works, we leverage the vast efforts of prior heuristic-based BWE methods and synergize these approaches with deep learning-based techniques. Our work addresses challenges in generalizing to unseen network dynamics and extracting rich representations from prior experience, two key challenges in integrating data-driven bandwidth estimators into real-time systems. To that end, we propose Merlin, the first purely offline, data-driven solution to BWE that harnesses prior heuristic-based methods to extract an expert BWE policy. Through a series of experiments, we demonstrate that Merlin surpasses state-of-the-art heuristic-based and deep learning-based bandwidth estimators in terms of objective quality of experience metrics while generalizing beyond the offline world to in-the-wild network deployments where Merlin achieves a 42.85% and 12.8% reduction in packet loss and delay, respectively, when compared against WebRTC in inter-continental videoconferencing calls. We hope that Merlin's offline-oriented design fosters new strategies for real-time network control.
    摘要 在这项工作中,我们解决了实时通信系统中的带宽估计(BWE)问题,但是与前一些工作不同,我们利用了过去的规则基本方法的巨大努力和深度学习基本技术的相互作用。我们的工作解决了在总结到未经见过的网络动态和从前经验中提取丰富表示的两个关键挑战,以使得数据驱动的带宽估计器可以成功地集成到实时系统中。为此,我们提出了Merlin,第一个完全OFFLINE、数据驱动的BWE解决方案,利用了过去的规则基本方法提取出专家级带宽估计策略。经过一系列实验,我们证明Merlin在对象质量体验指标方面超过了现有的规则基本方法和深度学习基本方法的带宽估计器,并在实际网络部署中实现了42.85%和12.8%的数据损失和延迟减少,分别与WebRTC在跨洲视频会议中比较。我们希望Merlin的OFFLINE-oriented设计会激发新的实时网络控制策略。

CA-PCA: Manifold Dimension Estimation, Adapted for Curvature

  • paper_url: http://arxiv.org/abs/2309.13478
  • repo_url: None
  • paper_authors: Anna C. Gilbert, Kevin O’Neill
  • for: 本文旨在提出一种基于拟合 embedding 的维度估计方法,以改进现有的维度估计方法,以便更好地分析高维数据。
  • methods: 本文使用了本地 PCA 方法,基于拟合 embedding 来进行维度估计。
  • results: 经过严格的实验表明,本文提出的 CA-PCA 方法在各种设定下都有所改进,可以更好地估计高维数据的维度。
    Abstract The success of algorithms in the analysis of high-dimensional data is often attributed to the manifold hypothesis, which supposes that this data lie on or near a manifold of much lower dimension. It is often useful to determine or estimate the dimension of this manifold before performing dimension reduction, for instance. Existing methods for dimension estimation are calibrated using a flat unit ball. In this paper, we develop CA-PCA, a version of local PCA based instead on a calibration of a quadratic embedding, acknowledging the curvature of the underlying manifold. Numerous careful experiments show that this adaptation improves the estimator in a wide range of settings.
    摘要 高维数据分析中算法的成功常被归因于流形假设,即数据位于或接近一个维度低得多的流形上。在进行降维等操作之前,通常需要先确定或估计该流形的维度。现有的维度估计方法是用平坦的单位球进行校准的。本文提出 CA-PCA,一种基于二次嵌入校准的局部 PCA 方法,从而考虑到底层流形的曲率。大量细致的实验表明,这一改进在各种场景中都能提升估计器的表现。
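For context, a plain local-PCA dimension estimate (the baseline that CA-PCA refines with a curvature-aware quadratic calibration; the neighborhood size and variance threshold below are arbitrary illustrative choices) looks roughly like this:

```python
# Hedged sketch: baseline local-PCA intrinsic dimension estimate around one point.
# CA-PCA additionally calibrates against a quadratic embedding; that step is omitted here.
import numpy as np

def local_pca_dimension(X, center_idx, k=50, var_threshold=0.95):
    center = X[center_idx]
    dists = np.linalg.norm(X - center, axis=1)
    neighbors = X[np.argsort(dists)[1:k + 1]]          # k nearest neighbors (excluding the point)
    neighbors = neighbors - neighbors.mean(axis=0)
    _, s, _ = np.linalg.svd(neighbors, full_matrices=False)
    explained = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(explained, var_threshold) + 1)

# Noisy circle embedded in 3-D: the local estimate should come out close to 1.
rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 2000)
X = np.c_[np.cos(t), np.sin(t), 0.01 * rng.normal(size=t.size)]
print(local_pca_dimension(X, center_idx=0, k=100))
```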

SUDS: Sanitizing Universal and Dependent Steganography

  • paper_url: http://arxiv.org/abs/2309.13467
  • repo_url: https://github.com/pkrobinette/suds-ecai-2023
  • paper_authors: Preston K. Robinette, Hanchen D. Wang, Nishan Shehadeh, Daniel Moyer, Taylor T. Johnson
  • for: This paper focuses on developing a deep learning sanitization technique called SUDS to mitigate the shortcomings of steganalysis in detecting steganography.
  • methods: The paper uses a deep learning approach called SUDS that is not reliant on prior knowledge of steganographic hiding techniques and can sanitize universal and dependent steganography.
  • results: The paper demonstrates the capabilities and limitations of SUDS through five research questions, including baseline comparisons and an ablation study, and shows that SUDS can increase the resistance of a poisoned classifier against attacks by 1375%.
    Abstract Steganography, or hiding messages in plain sight, is a form of information hiding that is most commonly used for covert communication. As modern steganographic mediums include images, text, audio, and video, this communication method is being increasingly used by bad actors to propagate malware, exfiltrate data, and discreetly communicate. Current protection mechanisms rely upon steganalysis, or the detection of steganography, but these approaches are dependent upon prior knowledge, such as steganographic signatures from publicly available tools and statistical knowledge about known hiding methods. These dependencies render steganalysis useless against new or unique hiding methods, which are becoming increasingly common with the application of deep learning models. To mitigate the shortcomings of steganalysis, this work focuses on a deep learning sanitization technique called SUDS that is not reliant upon knowledge of steganographic hiding techniques and is able to sanitize universal and dependent steganography. SUDS is tested using least significant bit method (LSB), dependent deep hiding (DDH), and universal deep hiding (UDH). We demonstrate the capabilities and limitations of SUDS by answering five research questions, including baseline comparisons and an ablation study. Additionally, we apply SUDS to a real-world scenario, where it is able to increase the resistance of a poisoned classifier against attacks by 1375%.
    摘要 隐写术(steganography),即把信息隐藏在众目之下,是一种最常用于隐蔽通信的信息隐藏方式。随着现代隐写载体扩展到图像、文本、音频和视频,这种通信方式正被越来越多的不良行为者用来传播恶意软件、外泄数据以及进行隐蔽通信。现有的防护机制依赖于隐写分析(steganalysis),即对隐写的检测,但这些方法依赖先验知识,例如公开工具的隐写特征以及已知隐藏方法的统计知识。这些依赖使得隐写分析对新的或独特的隐藏方法无能为力,而随着深度学习模型的应用,这类方法正变得越来越常见。为弥补隐写分析的不足,本工作提出一种名为 SUDS 的深度学习净化技术,它不依赖对隐写隐藏技术的先验知识,能够净化通用(universal)和依赖型(dependent)隐写。我们使用最低有效位方法(LSB)、依赖型深度隐藏(DDH)和通用深度隐藏(UDH)对 SUDS 进行了测试,并通过五个研究问题(包括基线比较和消融研究)展示了 SUDS 的能力与局限。此外,我们将 SUDS 应用于一个真实场景,使被投毒分类器对攻击的抵抗力提升了 1375%。

Tight bounds on Pauli channel learning without entanglement

  • paper_url: http://arxiv.org/abs/2309.13461
  • repo_url: None
  • paper_authors: Senrui Chen, Changhun Oh, Sisi Zhou, Hsin-Yuan Huang, Liang Jiang
  • for: 这篇论文主要研究无纠缠学习算法的局限,具体来说是研究不使用纠缠态、纠缠测量和纠缠操作来学习 Pauli 信道的算法。
  • methods: 论文所考虑的无纠缠学习算法只允许在所关心的主系统与辅助系统之间使用可分态、可分测量和可分操作;这类算法等价于在主系统上执行量子电路,并辅以电路中途测量和经典前馈。
  • results: 论文给出了无纠缠学习 Pauli 信道的紧下界,填补了已知上下界之间的三次方差距。具体而言,作者证明在无纠缠的情形下,要以高概率将 $n$ 量子比特 Pauli 信道的每个本征值估计到 $\varepsilon$ 误差,需要 $\Theta(2^n\varepsilon^{-2})$ 轮测量;相比之下,利用纠缠的学习算法只需 $\Theta(\varepsilon^{-2})$ 轮测量。这一紧下界加强了实验演示纠缠增强优势的基础。
    Abstract Entanglement is a useful resource for learning, but a precise characterization of its advantage can be challenging. In this work, we consider learning algorithms without entanglement to be those that only utilize separable states, measurements, and operations between the main system of interest and an ancillary system. These algorithms are equivalent to those that apply quantum circuits on the main system interleaved with mid-circuit measurements and classical feedforward. We prove a tight lower bound for learning Pauli channels without entanglement that closes a cubic gap between the best-known upper and lower bound. In particular, we show that $\Theta(2^n\varepsilon^{-2})$ rounds of measurements are required to estimate each eigenvalue of an $n$-qubit Pauli channel to $\varepsilon$ error with high probability when learning without entanglement. In contrast, a learning algorithm with entanglement only needs $\Theta(\varepsilon^{-2})$ rounds of measurements. The tight lower bound strengthens the foundation for an experimental demonstration of entanglement-enhanced advantages for characterizing Pauli noise.
    摘要 纠缠是一种对学习有用的资源,但精确刻画其优势可能颇具挑战。在这项工作中,我们将不使用纠缠的学习算法定义为:只在所关心的主系统与一个辅助系统之间使用可分态、可分测量与可分操作的算法。这类算法等价于在主系统上执行量子电路,并辅以电路中途测量与经典前馈。我们证明了无纠缠学习 Pauli 信道的紧下界,填补了已知上下界之间的三次方差距。具体而言,我们证明在无纠缠的情形下,要以高概率将 $n$ 量子比特 Pauli 信道的每个本征值估计到 $\varepsilon$ 误差,需要 $\Theta(2^n\varepsilon^{-2})$ 轮测量;相比之下,利用纠缠的学习算法只需 $\Theta(\varepsilon^{-2})$ 轮测量。这一紧下界加强了实验演示纠缠增强在刻画 Pauli 噪声方面优势的基础。

Monotonic Neural Ordinary Differential Equation: Time-series Forecasting for Cumulative Data

  • paper_url: http://arxiv.org/abs/2309.13452
  • repo_url: None
  • paper_authors: Zhichao Chen, Leilei Ding, Zhixuan Chu, Yucheng Qi, Jianmin Huang, Hao Wang
  • for: 预测时间序数据(TSFCD)是决策过程中的关键问题,但现有的时间序预测方法通常忽略了积累数据中的幂等性和不规则性,限制其实际应用。
  • methods: 我们提出了一种原理驱动的方法called Monotonic neural Ordinary Differential Equation (MODE),该方法基于神经ordinary differential equations框架,能够有效捕捉和表示积累数据中的幂等性和不规则性。
  • results: 通过对奖金分配场景的广泛实验,我们展示了MODE的优异性,能够处理积累数据中的幂等性和不规则性,并提供了更好的预测性能。
    Abstract Time-Series Forecasting based on Cumulative Data (TSFCD) is a crucial problem in decision-making across various industrial scenarios. However, existing time-series forecasting methods often overlook two important characteristics of cumulative data, namely monotonicity and irregularity, which limit their practical applicability. To address this limitation, we propose a principled approach called Monotonic neural Ordinary Differential Equation (MODE) within the framework of neural ordinary differential equations. By leveraging MODE, we are able to effectively capture and represent the monotonicity and irregularity in practical cumulative data. Through extensive experiments conducted in a bonus allocation scenario, we demonstrate that MODE outperforms state-of-the-art methods, showcasing its ability to handle both monotonicity and irregularity in cumulative data and delivering superior forecasting performance.
    摘要 时序预测基于累积数据(TSFCD)是决策中的一项重要问题,存在许多工业场景中。然而,现有的时序预测方法通常忽视累积数据中的两个重要特征: monotonicity 和 irregularity,这限制了它们的实际应用。为解决这一限制,我们提议一种原则正的方法 called Monotonic Neural Ordinary Differential Equation(MODE),基于神经常微方程。通过利用 MODE,我们可以有效地捕捉和表示实际累积数据中的 monotonicity 和 irregularity。经过广泛的奖励分配场景的实验,我们示出 MODE 可以在 monotonicity 和 irregularity 的情况下提供更好的预测性能,超过现有的方法。
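One common way to obtain a monotone ODE-style forecaster for cumulative series (a sketch of the general idea only; MODE's actual parameterization is not reproduced here) is to constrain the learned derivative to be nonnegative, for example via a softplus, and integrate it forward:

```python
# Hedged sketch: a monotone forecaster for cumulative data.
# The derivative dy/dt = softplus(f(t)) is nonnegative, so the integrated
# trajectory is nondecreasing by construction. Illustrative, not the MODE model.
import numpy as np

def softplus(z):
    return np.log1p(np.exp(-np.abs(z))) + np.maximum(z, 0.0)  # numerically stable

def monotone_rollout(theta, y0, t_grid):
    """Euler-integrate dy/dt = softplus(theta[0] + theta[1] * t) from y0 over t_grid."""
    y = [y0]
    for t0, t1 in zip(t_grid[:-1], t_grid[1:]):
        rate = softplus(theta[0] + theta[1] * t0)
        y.append(y[-1] + rate * (t1 - t0))
    return np.array(y)

t_grid = np.linspace(0.0, 10.0, 101)
print(monotone_rollout(theta=(0.5, 0.1), y0=0.0, t_grid=t_grid)[-5:])  # nondecreasing tail
```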

NetDiffus: Network Traffic Generation by Diffusion Models through Time-Series Imaging

  • paper_url: http://arxiv.org/abs/2310.04429
  • repo_url: None
  • paper_authors: Nirhoshan Sivaroopan, Dumindu Bandara, Chamara Madarasingha, Guilluame Jourjon, Anura Jayasumana, Kanchana Thilakarathna
  • for: 这篇论文旨在利用扩散模型生成合成网络流量数据,以缓解现代网络数据访问受限的问题。
  • methods: 这篇论文使用了Diffusion Models(DM),将一维时间序列网络流量转换为二维图像,然后生成代表性图像。
  • results: 论文表明,使用NetDiffus可以提高66.4%的数据准确性和18.1%的下游机器学习任务。在七种不同的流量轨迹上进行评估, synthetic数据可以显著改善流量识别、异常检测和流量分类。
    Abstract Network data analytics are now at the core of almost every networking solution. Nonetheless, limited access to networking data has been an enduring challenge due to many reasons including complexity of modern networks, commercial sensitivity, privacy and regulatory constraints. In this work, we explore how to leverage recent advancements in Diffusion Models (DM) to generate synthetic network traffic data. We develop an end-to-end framework - NetDiffus that first converts one-dimensional time-series network traffic into two-dimensional images, and then synthesizes representative images for the original data. We demonstrate that NetDiffus outperforms the state-of-the-art traffic generation methods based on Generative Adversarial Networks (GANs) by providing 66.4% increase in fidelity of the generated data and 18.1% increase in downstream machine learning tasks. We evaluate NetDiffus on seven diverse traffic traces and show that utilizing synthetic data significantly improves traffic fingerprinting, anomaly detection and traffic classification.
    摘要 网络数据分析如今已是几乎所有网络解决方案的核心。然而,由于现代网络的复杂性、商业敏感性、隐私和法规限制等多种原因,网络数据的获取一直受到限制。在这种情况下,我们探讨了如何利用最新的扩散模型(DM)来生成合成网络流量数据。我们开发了一个端到端框架 NetDiffus:它首先将一维时间序列网络流量转换为二维图像,然后为原始数据生成具有代表性的图像。我们证明,与基于生成对抗网络(GAN)的最先进流量生成方法相比,NetDiffus 将生成数据的保真度提升了 66.4%,并使下游机器学习任务提升了 18.1%。我们在七个多样化的流量轨迹上对 NetDiffus 进行了评估,结果显示利用合成数据可以显著改进流量指纹识别、异常检测和流量分类。

Early Classification for Dynamic Inference of Neural Networks

  • paper_url: http://arxiv.org/abs/2309.13443
  • repo_url: None
  • paper_authors: Jingcun Wang, Bing Li, Grace Li Zhang
  • for: 降低edge设备上深度神经网络(DNNs)的计算成本,以便在资源有限的平台上应用。
  • methods: 使用动态神经网络(Dynamic Neural Networks)实现结构适应,并在不同输入上进行早期退出。
  • results: 通过各级别分类器来除外不相关的类别,从而使后续层只需要确定剩下的目标类别。实验结果表明,可以有效降低DNNs在推理中的计算成本。
    Abstract Deep neural networks (DNNs) have been successfully applied in various fields. In DNNs, a large number of multiply-accumulate (MAC) operations is required to be performed, posing critical challenges in applying them in resource-constrained platforms, e.g., edge devices. Dynamic neural networks have been introduced to allow a structural adaption, e.g., early-exit, according to different inputs to reduce the computational cost of DNNs. Existing early-exit techniques deploy classifiers at intermediate layers of DNNs to push them to make a classification decision as early as possible. However, the learned features at early layers might not be sufficient to exclude all the irrelevant classes and decide the correct class, leading to suboptimal results. To address this challenge, in this paper, we propose a class-based early-exit for dynamic inference. Instead of pushing DNNs to make a dynamic decision at intermediate layers, we take advantages of the learned features in these layers to exclude as many irrelevant classes as possible, so that later layers only have to determine the target class among the remaining classes. Until at a layer only one class remains, this class is the corresponding classification result. To realize this class-based exclusion, we assign each class with a classifier at intermediate layers and train the networks together with these classifiers. Afterwards, an exclusion strategy is developed to exclude irrelevant classes at early layers. Experimental results demonstrate the computational cost of DNNs in inference can be reduced significantly.
    摘要 深度神经网络(DNN)在各个领域得到了成功应用,但是它们需要大量的 multiply-accumulate(MAC)操作,这对于资源受限的平台(如边缘设备)构成挑战。动态神经网络被引入以实现结构自适应,例如早期退出,以降低 DNN 的计算成本。现有的早期退出技术是在 DNN 的中间层部署分类器,促使 DNN 尽早做出分类决策。然而,早期层学习到的特征可能不足以排除所有无关类别并确定正确的类别,从而导致次优结果。为了解决这一挑战,我们在本文中提出了基于类别的早期退出动态推理方法。我们不强迫 DNN 在中间层做出最终决策,而是利用中间层学习到的特征尽可能多地排除无关类别,使后续层只需在剩余类别中确定目标类别;当某一层只剩下一个类别时,该类别即为分类结果。为实现这种基于类别的排除,我们在中间层为每个类别分配一个分类器,并与这些分类器一起训练网络,随后设计了在早期层排除无关类别的策略。实验结果表明,该方法可以显著降低 DNN 推理的计算成本。
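The class-exclusion idea can be sketched as a toy inference loop with per-layer class scores (the thresholds, scores, and layer interface below are made up for illustration and are not the authors' implementation):

```python
# Hedged sketch: class-based early exit by progressive class exclusion.
# Each intermediate classifier scores the *remaining* classes; classes whose
# probability falls below a threshold are dropped, and inference stops as soon
# as a single class survives. Purely illustrative.
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def early_exit_predict(layer_scores, threshold=0.10):
    """layer_scores: list of per-layer logits over the full class set."""
    num_classes = layer_scores[0].shape[0]
    remaining = np.arange(num_classes)
    for logits in layer_scores:
        probs = softmax(logits[remaining])
        remaining = remaining[probs >= threshold]      # exclude unlikely classes
        if remaining.size == 1:
            return int(remaining[0])                   # early exit
    return int(remaining[np.argmax(softmax(layer_scores[-1][remaining]))])

layer_scores = [np.array([2.0, 1.9, 0.1, -1.0]),       # layer 1: drops classes 2 and 3
                np.array([3.0, 1.0, 0.0, 0.0])]        # layer 2: class 0 wins
print(early_exit_predict(layer_scores))                # -> 0
```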

MiliPoint: A Point Cloud Dataset for mmWave Radar

  • paper_url: http://arxiv.org/abs/2309.13425
  • repo_url: None
  • paper_authors: Han Cui, Shu Zhong, Jiacheng Wu, Zichao Shen, Naim Dahnoun, Yiren Zhao
  • for: 这项研究是为了开发更有效的点集基于深度学习方法,以激发mmWave雷达技术的应用在人类活动识别领域。
  • methods: 该研究使用了大规模的开放数据集,并在这些数据集上进行了多种点基的深度神经网络模型的实现,包括DGCNN、PointNet++和PointTransformer等。
  • results: 研究发现,使用点基的深度神经网络可以在mmWave雷达数据上实现更高的人类活动识别精度。此外,该研究还提供了一个大规模的开放数据集,可供研究者进一步探索mmWave雷达技术在人类活动识别领域的应用。
    Abstract Millimetre-wave (mmWave) radar has emerged as an attractive and cost-effective alternative for human activity sensing compared to traditional camera-based systems. mmWave radars are also non-intrusive, providing better protection for user privacy. However, as a Radio Frequency (RF) based technology, mmWave radars rely on capturing reflected signals from objects, making them more prone to noise compared to cameras. This raises an intriguing question for the deep learning community: Can we develop more effective point set-based deep learning methods for such attractive sensors? To answer this question, our work, termed MiliPoint, delves into this idea by providing a large-scale, open dataset for the community to explore how mmWave radars can be utilised for human activity recognition. Moreover, MiliPoint stands out as it is larger in size than existing datasets, has more diverse human actions represented, and encompasses all three key tasks in human activity recognition. We have also established a range of point-based deep neural networks such as DGCNN, PointNet++ and PointTransformer, on MiliPoint, which can serve to set the ground baseline for further development.
    摘要 幂米波(mmWave)雷达已成为人类活动感知的吸引人和经济实惠的替代方案,比传统的摄像头系统更加cost-effective。另外,mmWave雷达也是不侵入的,为用户隐私提供更好的保护。然而,作为一种Radio Frequency(RF)基于的技术,mmWave雷达需要捕捉到物体上的反射信号,这使其更容易受到噪声的影响,与摄像头相比。这引发了深度学习社区的一个感人问题:可以开发更有效的点集基于深度学习方法吗?为回答这个问题,我们的工作,称为MiliPoint,探讨了这一想法,提供了一个大规模、开放的数据集,让社区可以探索mmWave雷达如何用于人类活动识别。此外,MiliPoint更大、更多样化的人类行为被表示出来,并包括人类活动识别的三个关键任务。我们还在MiliPoint上建立了一些点基的深度神经网络,如DGCNN、PointNet++和PointTransformer,以设置基准 для后续的发展。

DenMune: Density peak based clustering using mutual nearest neighbors

  • paper_url: http://arxiv.org/abs/2309.13420
  • repo_url: https://github.com/scikit-learn-contrib/denmune-clustering-algorithm
  • paper_authors: Mohamed Abbas, Adel El-Zoghobi, Amin Shoukry
  • for: 该论文是为了解决聚类算法在具有不规则形状、不均匀密度和数据类别受到近似影响的情况下失效问题。
  • methods: 该论文提出了一种新的聚类算法,即 DenMune,该算法基于在K个最近邻域中查找密集区域,K是用户需要提供的唯一参数,并遵循最近邻域一致原理。
  • results: 该论文表明,DenMune算法能够在低维和高维数据集上 produz 稳定和Robust的聚类结果,并能自动除掉聚类过程中的噪音和找到目标聚类。
    Abstract Many clustering algorithms fail when clusters are of arbitrary shapes, of varying densities, or the data classes are unbalanced and close to each other, even in two dimensions. A novel clustering algorithm, DenMune is presented to meet this challenge. It is based on identifying dense regions using mutual nearest neighborhoods of size K, where K is the only parameter required from the user, besides obeying the mutual nearest neighbor consistency principle. The algorithm is stable for a wide range of values of K. Moreover, it is able to automatically detect and remove noise from the clustering process as well as detecting the target clusters. It produces robust results on various low and high-dimensional datasets relative to several known state-of-the-art clustering algorithms.
    摘要 当聚类形状任意、密度不均,或数据类别不平衡且彼此接近时,很多聚类算法即使在二维情形下也会失效。为应对这一挑战,本文提出了一种新的聚类算法 DenMune。它基于大小为 K 的互近邻域来识别稠密区域,其中 K 是用户唯一需要提供的参数,并遵循互近邻一致性原则。该算法在很宽的 K 取值范围内都保持稳定;此外,它能够在聚类过程中自动检测并去除噪声,同时找出目标聚类。与多种已知的最先进聚类算法相比,DenMune 在各种低维和高维数据集上都给出了稳健的结果。
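The mutual-nearest-neighbor notion at the core of DenMune can be sketched with scikit-learn (only the mutual-KNN density signal is shown; the full propagation and noise-removal logic of DenMune is omitted, and the code below is illustrative rather than the reference implementation at the repo above):

```python
# Hedged sketch: count mutual K-nearest neighbors and use the counts as a
# density signal, in the spirit of the first stage of DenMune-style clustering.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def mutual_knn_counts(X, k=10):
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)                  # idx[:, 0] is the point itself
    neighbors = [set(row[1:]) for row in idx]
    counts = np.zeros(len(X), dtype=int)
    for i, neigh in enumerate(neighbors):
        counts[i] = sum(1 for j in neigh if i in neighbors[j])  # mutual pairs only
    return counts

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (100, 2)), rng.normal(3, 0.3, (100, 2))])
counts = mutual_knn_counts(X, k=10)
seeds = np.where(counts == counts.max())[0]    # densest points can act as cluster seeds
print(len(seeds), counts.max())
```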

Learning Large-Scale MTP$_2$ Gaussian Graphical Models via Bridge-Block Decomposition

  • paper_url: http://arxiv.org/abs/2309.13405
  • repo_url: https://github.com/xiwen1997/mtp2-bbd
  • paper_authors: Xiwen Wang, Jiaxi Ying, Daniel P. Palomar
  • for: 本研究旨在学习大规模的二阶多元全正($\text{MTP}_2$)高斯图模型。通过引入桥(bridge)的概念,可以把整个问题等价地分解为若干规模较小的子问题,从而大幅降低计算复杂度,并显著加速现有算法。
  • methods: 本研究对阈值化的样本协方差图进行桥-块分解(bridge-block decomposition),将大问题拆分为较小的可解子问题,并对桥所对应的元素给出显式解。
  • results: 在合成与真实数据上的实验表明,与最先进的基准相比,所提方法具有显著的速度优势。
    Abstract This paper studies the problem of learning the large-scale Gaussian graphical models that are multivariate totally positive of order two ($\text{MTP}_2$). By introducing the concept of bridge, which commonly exists in large-scale sparse graphs, we show that the entire problem can be equivalently optimized through (1) several smaller-scaled sub-problems induced by a \emph{bridge-block decomposition} on the thresholded sample covariance graph and (2) a set of explicit solutions on entries corresponding to bridges. From practical aspect, this simple and provable discipline can be applied to break down a large problem into small tractable ones, leading to enormous reduction on the computational complexity and substantial improvements for all existing algorithms. The synthetic and real-world experiments demonstrate that our proposed method presents a significant speed-up compared to the state-of-the-art benchmarks.
    摘要 这篇论文研究大规模的高斯图模型,即多变量完全正的第二阶(MTP2)。我们引入了桥梁概念,这种概念在大规模稀疏图中广泛存在。我们显示,整个问题可以等价地通过以下两个步骤优化:1. 使用桥梁块分解法对阈值矩阵相关的一些更小规模的子问题进行优化。2. 对桥梁相对应的输入进行直观的解决方案。从实践角度来看,这种简单可证的方法可以将大问题分解成小可解决的问题,从而减少计算复杂性和提高所有现有算法的性能。实验表明,我们提出的方法与现有的标准准则相比,具有显著的速度提升。
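The bridge-block idea can be illustrated directly with networkx: threshold the sample covariance, find the bridges of the resulting graph, and the connected components left after removing them are the smaller sub-problems (a rough sketch of the decomposition step only, not of the MTP2 estimation itself):

```python
# Hedged sketch: bridge-block decomposition of a thresholded sample covariance graph.
# Only the graph-splitting step is shown; solving each MTP2 sub-problem is omitted.
import networkx as nx
import numpy as np

def bridge_block_decomposition(S, tau=0.3):
    p = S.shape[0]
    G = nx.Graph()
    G.add_nodes_from(range(p))
    G.add_edges_from((i, j) for i in range(p) for j in range(i + 1, p)
                     if abs(S[i, j]) > tau)              # thresholded covariance graph
    bridges = list(nx.bridges(G))
    G_reduced = G.copy()
    G_reduced.remove_edges_from(bridges)
    blocks = [sorted(c) for c in nx.connected_components(G_reduced)]
    return bridges, blocks

# Toy covariance: two dense blocks {0,1,2} and {3,4,5} joined only by entry (2,3),
# so (2,3) is a bridge and the problem splits into two small blocks.
S = np.eye(6)
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    S[i, j] = S[j, i] = 0.5
print(bridge_block_decomposition(S))   # -> ([(2, 3)], [[0, 1, 2], [3, 4, 5]])
```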

ML Algorithm Synthesizing Domain Knowledge for Fungal Spores Concentration Prediction

  • paper_url: http://arxiv.org/abs/2309.13402
  • repo_url: https://github.com/azminewasi/qcre23-finalist
  • paper_authors: Md Asif Bin Syed, Azmine Toushik Wasi, Imtiaz Ahmed
  • for: 提高纸品质量控制的效率和可持续性,实现实时精度测试和纠正控制。
  • methods: 利用时间序列数据和领域知识,采用机器学习算法进行精度预测。
  • results: 实现实时精度测试和纠正控制,提高纸品质量和可持续性。
    Abstract The pulp and paper manufacturing industry requires precise quality control to ensure pure, contaminant-free end products suitable for various applications. Fungal spore concentration is a crucial metric that affects paper usability, and current testing methods are labor-intensive with delayed results, hindering real-time control strategies. To address this, a machine learning algorithm utilizing time-series data and domain knowledge was proposed. The optimal model employed Ridge Regression achieving an MSE of 2.90 on training and validation data. This approach could lead to significant improvements in efficiency and sustainability by providing real-time predictions for fungal spore concentrations. This paper showcases a promising method for real-time fungal spore concentration prediction, enabling stringent quality control measures in the pulp-and-paper industry.
    摘要 纸浆与造纸行业需要精准的质量控制,以确保最终产品纯净、无污染,适用于各类用途。真菌孢子浓度是影响纸张可用性的关键指标,而现有检测方法费时费力且结果滞后,阻碍了实时控制策略。为此,我们提出一种结合时间序列数据与领域知识的机器学习算法:最优模型采用岭回归(Ridge Regression),在训练与验证数据上取得 2.90 的 MSE。该方法可提供真菌孢子浓度的实时预测,从而显著提升效率与可持续性。本文展示了一种有前景的实时真菌孢子浓度预测方法,有助于在纸浆造纸行业实施严格的质量控制措施。

On the Sweet Spot of Contrastive Views for Knowledge-enhanced Recommendation

  • paper_url: http://arxiv.org/abs/2309.13384
  • repo_url: None
  • paper_authors: Haibo Ye, Xinjie Li, Yuan Yao, Hanghang Tong
  • for: 这个论文旨在提高推荐系统的效果,通过在知识图(KG)和用户项交互图(IG)之间建立对应关系。
  • methods: 该论文提出了一种新的对比学习框架:为 KG 和 IG 分别构建两个对比视图并最大化二者的互信息;同时为降低两个视图间对比学习的难度,以单向方式将 KG 信息融合进 IG,从而更充分地利用知识。
  • results: 对于三个实际 dataset,该方法的实验结果显示,相比之前的状态 искус技术,该方法能够更高效地提高推荐系统的效果。代码可以通过以下隐藏链接获取:https://figshare.com/articles/conference_contribution/SimKGCL/22783382
    Abstract In recommender systems, knowledge graph (KG) can offer critical information that is lacking in the original user-item interaction graph (IG). Recent process has explored this direction and shows that contrastive learning is a promising way to integrate both. However, we observe that existing KG-enhanced recommenders struggle in balancing between the two contrastive views of IG and KG, making them sometimes even less effective than simply applying contrastive learning on IG without using KG. In this paper, we propose a new contrastive learning framework for KG-enhanced recommendation. Specifically, to make full use of the knowledge, we construct two separate contrastive views for KG and IG, and maximize their mutual information; to ease the contrastive learning on the two views, we further fuse KG information into IG in a one-direction manner.Extensive experimental results on three real-world datasets demonstrate the effectiveness and efficiency of our method, compared to the state-of-the-art. Our code is available through the anonymous link:https://figshare.com/articles/conference_contribution/SimKGCL/22783382
    摘要 在推荐系统中,知识图(KG)可以提供用户-物品交互图(IG)中缺失的关键信息。近期工作探索了这一方向,并表明对比学习是整合两者的一种有前景的方式。然而,我们观察到现有的 KG 增强推荐器很难在 IG 与 KG 两个对比视图之间取得平衡,有时甚至不如仅在 IG 上应用对比学习、完全不使用 KG 的做法。在本文中,我们提出了一个新的面向 KG 增强推荐的对比学习框架。具体来说,为了充分利用知识,我们为 KG 和 IG 分别构建了两个独立的对比视图,并最大化它们之间的互信息;为了降低在两个视图上进行对比学习的难度,我们进一步以单向方式将 KG 信息融合进 IG。在三个真实数据集上的大量实验表明,与最先进的方法相比,我们的方法既有效又高效。我们的代码可通过匿名链接获取:https://figshare.com/articles/conference_contribution/SimKGCL/22783382
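The "maximize mutual information between the two views" step is typically realized with an InfoNCE-style contrastive loss; a generic numpy sketch (not the authors' SimKGCL code) is:

```python
# Hedged sketch: InfoNCE contrastive loss between paired IG-view and KG-view
# embeddings of the same nodes. Generic illustration, not the paper's implementation.
import numpy as np

def info_nce(z_ig, z_kg, temperature=0.2):
    z_ig = z_ig / np.linalg.norm(z_ig, axis=1, keepdims=True)
    z_kg = z_kg / np.linalg.norm(z_kg, axis=1, keepdims=True)
    logits = z_ig @ z_kg.T / temperature          # (n, n) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))    # positives are the matching rows

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
print(info_nce(z, z + 0.05 * rng.normal(size=z.shape)))   # small loss for aligned views
print(info_nce(z, rng.normal(size=z.shape)))              # larger loss for random views
```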

Learning Invariant Representations with a Nonparametric Nadaraya-Watson Head

  • paper_url: http://arxiv.org/abs/2309.13377
  • repo_url: https://github.com/alanqrwang/nwhead
  • paper_authors: Alan Q. Wang, Minh Nguyen, Mert R. Sabuncu
  • for: 本研究旨在提出一种非 Parametric 的协同学习方法,以实现在不同环境下的数据分布不同时,机器学习模型的可重用性。
  • methods: 本研究使用 Nadaraya-Watson (NW) 头,该头通过比较学习的表示与支持集中的标注数据进行比较,来预测。通过控制支持集,可以编码不同的 causal 假设。
  • results: 通过在三个实际世界领域的域泛化任务上进行验证,表明本方法可以学习不受环境影响的抽象特征,并且可以在不同环境下提供好的预测性能。
    Abstract Machine learning models will often fail when deployed in an environment with a data distribution that is different than the training distribution. When multiple environments are available during training, many methods exist that learn representations which are invariant across the different distributions, with the hope that these representations will be transportable to unseen domains. In this work, we present a nonparametric strategy for learning invariant representations based on the recently-proposed Nadaraya-Watson (NW) head. The NW head makes a prediction by comparing the learned representations of the query to the elements of a support set that consists of labeled data. We demonstrate that by manipulating the support set, one can encode different causal assumptions. In particular, restricting the support set to a single environment encourages the model to learn invariant features that do not depend on the environment. We present a causally-motivated setup for our modeling and training strategy and validate on three challenging real-world domain generalization tasks in computer vision.
    摘要 机器学习模型经常在培育环境不同于训练环境下部署时失败。当有多个环境可用于训练时,许多方法可以学习不受环境影响的表示,以期这些表示可以在未见领域中传输。在这种工作中,我们提出了一种非 Parametric 策略,基于最近提出的 Nadaraya-Watson(NW)头来学习不受环境影响的表示。NW 头通过比较学习的表示和一个支持集中的标注数据进行比较,来预测。我们表明,通过修改支持集,可以编码不同的 causal 假设。例如,限制支持集为单个环境,使模型学习不受环境的无关特征。我们采用 causally-motivated 的模型和训练策略,并在计算机视觉领域中进行了三个复杂的实际领域泛化任务的验证。
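The Nadaraya-Watson prediction itself is simple to state in code; a minimal sketch (generic kernel-weighted voting over a labeled support set, with an illustrative Gaussian kernel rather than the paper's exact choice) is:

```python
# Hedged sketch: a Nadaraya-Watson "head" that predicts class probabilities by
# kernel-weighting the labels of a support set in representation space.
# Restricting the support set to a single environment is how invariance is encouraged.
import numpy as np

def nw_head(query_feat, support_feats, support_labels, num_classes, bandwidth=1.0):
    d2 = np.sum((support_feats - query_feat) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    w = w / w.sum()
    probs = np.zeros(num_classes)
    for weight, label in zip(w, support_labels):
        probs[label] += weight                      # soft vote from each support example
    return probs

rng = np.random.default_rng(0)
support_feats = np.vstack([rng.normal(0, 1, (20, 8)), rng.normal(3, 1, (20, 8))])
support_labels = np.array([0] * 20 + [1] * 20)
query = rng.normal(3, 1, 8)                         # looks like a class-1 example
print(nw_head(query, support_feats, support_labels, num_classes=2))
```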

Asca: less audio data is more insightful

  • paper_url: http://arxiv.org/abs/2309.13373
  • repo_url: https://github.com/leeciang/asca
  • paper_authors: Xiang Li, Junhao Chen, Chao Li, Hongwu Lv
  • for: 本研究旨在提升鸟鸣、水下声学等特殊领域的音频识别;这些场景受采样环境和特异性要求限制,难以获得大规模预训练所需的样本。
  • methods: 本研究提出 Audio Spectrogram Convolution Attention(ASCA)模型,基于 CoAtNet 的 Transformer 与卷积混合架构,结合新的网络设计与注意力技术,并辅以数据增强和正则化策略。
  • results: 在 BirdCLEF2023 和 AudioSet(平衡)数据集上,ASCA 分别取得 81.2% 和 35.1% 的准确率,显著优于对比方法。
    Abstract Audio recognition in specialized areas such as birdsong and submarine acoustics faces challenges in large-scale pre-training due to the limitations in available samples imposed by sampling environments and specificity requirements. While the Transformer model excels in audio recognition, its dependence on vast amounts of data becomes restrictive in resource-limited settings. Addressing this, we introduce the Audio Spectrogram Convolution Attention (ASCA) based on CoAtNet, integrating a Transformer-convolution hybrid architecture, novel network design, and attention techniques, further augmented with data enhancement and regularization strategies. On the BirdCLEF2023 and AudioSet(Balanced), ASCA achieved accuracies of 81.2% and 35.1%, respectively, significantly outperforming competing methods. The unique structure of our model enriches output, enabling generalization across various audio detection tasks. Our code can be found at https://github.com/LeeCiang/ASCA.
    摘要 鸟鸣、水下声学等特殊领域的音频识别,因采样环境和特异性要求对可用样本的限制,在大规模预训练方面面临挑战。虽然 Transformer 模型在音频识别方面表现出色,但其对海量数据的依赖在资源受限的场景下成为制约。为此,我们提出了基于 CoAtNet 的 Audio Spectrogram Convolution Attention(ASCA),它融合了 Transformer 与卷积的混合架构、新的网络设计以及注意力技术,并进一步结合数据增强和正则化策略。在 BirdCLEF2023 和 AudioSet(平衡)上,ASCA 分别取得了 81.2% 和 35.1% 的准确率,大幅超越对比方法。我们模型的独特结构丰富了输出,使其能够泛化到各类音频检测任务。我们的代码可以在 https://github.com/LeeCiang/ASCA 找到。

Machine Learning with Chaotic Strange Attractors

  • paper_url: http://arxiv.org/abs/2309.13361
  • repo_url: https://github.com/hieu9955/ggggg
  • paper_authors: Bahadır Utku Kesgin, Uğur Teğin
  • for: 这个论文是为了解决机器学习中的高功耗问题,通过使用混沌非线性吸引器来实现低功耗的机器学习任务。
  • methods: 该论文提出了一种基于混沌非线性吸引器的模拟(analog)计算方法,可以低功耗地执行机器学习任务,并且是一个可编程、通用且可泛化的平台。
  • results: 研究表明,该方法凭借混沌吸引器的非线性映射和对初始条件的敏感性,在聚类任务上表现出色;作为简单的模拟器件部署时仅需毫瓦级功耗,并在回归和分类学习任务上取得了低误差和高准确率,与当前机器学习技术相当。
    Abstract Machine learning studies need colossal power to process massive datasets and train neural networks to reach high accuracies, which have become gradually unsustainable. Limited by the von Neumann bottleneck, current computing architectures and methods fuel this high power consumption. Here, we present an analog computing method that harnesses chaotic nonlinear attractors to perform machine learning tasks with low power consumption. Inspired by neuromorphic computing, our model is a programmable, versatile, and generalized platform for machine learning tasks. Our mode provides exceptional performance in clustering by utilizing chaotic attractors' nonlinear mapping and sensitivity to initial conditions. When deployed as a simple analog device, it only requires milliwatt-scale power levels while being on par with current machine learning techniques. We demonstrate low errors and high accuracies with our model for regression and classification-based learning tasks.
    摘要 机器学习研究需要庞大的算力来处理海量数据集并训练神经网络以达到高精度,这正逐渐变得不可持续。受 von Neumann 瓶颈的限制,当前的计算架构和方法加剧了这种高能耗。在这里,我们提出了一种利用混沌非线性吸引器来执行机器学习任务的模拟计算方法,其功耗很低。受神经形态计算的启发,我们的模型是一个可编程、通用且可泛化的机器学习平台。该模型利用混沌吸引器的非线性映射和对初始条件的敏感性,在聚类任务上表现出色。作为简单的模拟器件部署时,它只需毫瓦级的功耗,同时与当前机器学习技术相当。我们在回归和分类学习任务上展示了该模型的低误差和高准确率。

Accelerating Particle and Fluid Simulations with Differentiable Graph Networks for Solving Forward and Inverse Problems

  • paper_url: http://arxiv.org/abs/2309.13348
  • repo_url: None
  • paper_authors: Krishna Kumar, Yongjin Choi
  • for: 加速粒子和液体 simulations,解决前向和 inverse problems。
  • methods: 使用嵌入物理的可微图网络模拟器(GNS),通过边上的消息传递学习局部交互规律,提升对新环境的泛化能力。
  • results: 在颗粒流预测上,相比 CPU 并行数值模拟实现了 165 倍加速;提出一种 GNS 与物质点法(MPM)混合方案,在 GNS 推演中穿插 MPM 以满足守恒律并最小化误差,相比纯数值模拟实现 24 倍加速。GNS 还可以通过自动微分求解反问题:计算损失函数关于摩擦角的梯度并迭代更新摩擦角,使其最佳匹配目标滑移距离。
    Abstract We leverage physics-embedded differentiable graph network simulators (GNS) to accelerate particulate and fluid simulations to solve forward and inverse problems. GNS represents the domain as a graph with particles as nodes and learned interactions as edges. Compared to modeling global dynamics, GNS enables learning local interaction laws through edge messages, improving its generalization to new environments. GNS achieves over 165x speedup for granular flow prediction compared to parallel CPU numerical simulations. We propose a novel hybrid GNS/Material Point Method (MPM) to accelerate forward simulations by minimizing error on a pure surrogate model by interleaving MPM in GNS rollouts to satisfy conservation laws and minimize errors achieving 24x speedup compared to pure numerical simulations. The differentiable GNS enables solving inverse problems through automatic differentiation, identifying material parameters that result in target runout distances. We demonstrate the ability of GNS to solve inverse problems by iteratively updating the friction angle (a material property) by computing the gradient of a loss function based on the final and target runouts, thereby identifying the friction angle that best matches the observed runout. The physics-embedded and differentiable simulators open an exciting new paradigm for AI-accelerated design, control, and optimization.
    摘要 我们利用嵌入物理的可微图网络模拟器(GNS)加速颗粒与流体模拟,以求解正向与反向问题。GNS 将求解域表示为一个图:粒子作为节点,学习到的交互作为边。与建模全局动力学相比,GNS 通过边上消息学习局部交互规律,从而提升对新环境的泛化能力。在颗粒流预测上,GNS 相比 CPU 并行数值模拟实现了超过 165 倍的加速。我们提出一种新的 GNS 与物质点法(MPM)混合方案,在 GNS 推演中穿插 MPM 以满足守恒律并最小化纯代理模型的误差,相比纯数值模拟实现 24 倍加速。可微的 GNS 还可通过自动微分求解反问题:基于最终滑移距离与目标滑移距离构造损失函数并计算其梯度,迭代更新摩擦角(一种材料属性),从而找到与观测滑移最匹配的摩擦角。嵌入物理且可微的模拟器为 AI 加速的设计、控制与优化开启了令人兴奋的新范式。

On the Asymptotic Learning Curves of Kernel Ridge Regression under Power-law Decay

  • paper_url: http://arxiv.org/abs/2309.13337
  • repo_url: None
  • paper_authors: Yicheng Li, Haobo Zhang, Qian Lin
  • for: 本研究探讨了 neural network 中广泛观察到的 ‘benign overfitting’ 现象,这个现象对 statistical learning theory 中的 ‘bias-variance trade-off’ 假设提出了挑战。
  • methods: 本研究使用核岭回归(kernel ridge regression)来刻画神经网络的学习曲线,并给出了学习曲线的精确刻画,包括正则化参数的选择、源条件(source condition)和噪声的影响。
  • results: 研究结果表明,在小量噪声下,very wide neural network 才存在 ‘benign overfitting’ 现象。
    Abstract The widely observed 'benign overfitting phenomenon' in the neural network literature raises the challenge to the 'bias-variance trade-off' doctrine in the statistical learning theory. Since the generalization ability of the 'lazy trained' over-parametrized neural network can be well approximated by that of the neural tangent kernel regression, the curve of the excess risk (namely, the learning curve) of kernel ridge regression attracts increasing attention recently. However, most recent arguments on the learning curve are heuristic and are based on the 'Gaussian design' assumption. In this paper, under mild and more realistic assumptions, we rigorously provide a full characterization of the learning curve: elaborating the effect and the interplay of the choice of the regularization parameter, the source condition and the noise. In particular, our results suggest that the 'benign overfitting phenomenon' exists in very wide neural networks only when the noise level is small.
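
For reference, the kernel ridge regression estimator and the learning-curve quantity discussed above, written in standard textbook form (these formulas are generic background, not reproduced from the paper):

```latex
% Kernel ridge regression on samples (x_i, y_i) with kernel k and Gram matrix K_{ij} = k(x_i, x_j):
\hat f_\lambda \;=\; \arg\min_{f \in \mathcal{H}}
  \frac{1}{n}\sum_{i=1}^{n}\bigl(f(x_i)-y_i\bigr)^2 \;+\; \lambda \,\|f\|_{\mathcal{H}}^2,
\qquad
\hat f_\lambda(x) \;=\; k(x, X)\,\bigl(K + n\lambda I\bigr)^{-1} y .
% The learning curve is the excess risk  \mathbb{E}\,\| \hat f_\lambda - f^* \|_{L^2}^2
% as a function of n, studied under power-law decay of the kernel eigenvalues,
% \lambda_i \asymp i^{-\beta} with \beta > 1.
```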

Predicting Temperature of Major Cities Using Machine Learning and Deep Learning

  • paper_url: http://arxiv.org/abs/2309.13330
  • repo_url: None
  • paper_authors: Wasiou Jaharabi, MD Ibrahim Al Hossain, Rownak Tahmid, Md. Zuhayer Islam, T. M. Saad Rayhan
  • for: The paper aims to develop an accurate temperature prediction method using machine learning and time series analysis, focusing on temperature data of major cities.
  • methods: The authors use a dataset provided by the University of Dayton containing temperature records of major cities. They apply time series techniques such as ARIMA, SARIMA, and Prophet, and incorporate RNN and LSTM models to filter out anomalies, preprocess the data, and predict future temperature trends.
  • results: The authors obtain accurate predictions of temperature in major cities based on the available data, providing temperature forecasts for future reference in the context of climate change.
    Abstract Currently, the issue that concerns the world leaders most is climate change for its effect on agriculture, environment and economies of daily life. So, to combat this, temperature prediction with strong accuracy is vital. So far, the most effective widely used measure for such forecasting is Numerical weather prediction (NWP) which is a mathematical model that needs broad data from different applications to make predictions. This expensive, time and labor consuming work can be minimized through making such predictions using Machine learning algorithms. Using the database made by University of Dayton which consists the change of temperature in major cities we used the Time Series Analysis method where we use LSTM for the purpose of turning existing data into a tool for future prediction. LSTM takes the long-term data as well as any short-term exceptions or anomalies that may have occurred and calculates trend, seasonality and the stationarity of a data. By using models such as ARIMA, SARIMA, Prophet with the concept of RNN and LSTM we can, filter out any abnormalities, preprocess the data compare it with previous trends and make a prediction of future trends. Also, seasonality and stationarity help us analyze the reoccurrence or repeat over one year variable and removes the constrain of time in which the data was dependent so see the general changes that are predicted. By doing so we managed to make prediction of the temperature of different cities during any time in future based on available data and built a method of accurate prediction. This document contains our methodology for being able to make such predictions.
    摘要 To develop our methodology, we used a database of temperature changes in major cities, provided by the University of Dayton. We employed Time Series Analysis, specifically Long Short-Term Memory (LSTM) models, to turn existing data into a tool for future prediction. LSTM models can capture long-term trends, seasonality, and stationarity in the data, allowing us to make accurate predictions.We used models such as ARIMA, SARIMA, and Prophet, all of which incorporate the concept of Recurrent Neural Networks (RNN) and LSTM. These models can filter out anomalies, preprocess the data, and compare it with previous trends to make accurate predictions. Additionally, seasonality and stationarity help us analyze the reoccurrence of variables over one year and remove the constraints of time, allowing us to see general changes predicted.By using this methodology, we were able to make accurate predictions of temperature in different cities at any time in the future based on available data. This document outlines our methodology for making such predictions.
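
A minimal sketch of the seasonal-ARIMA style of fit described above, assuming a daily city-temperature CSV with `Date` and `AvgTemperature` columns (placeholder names) and illustrative model orders; the paper's actual preprocessing and hyperparameters are not reproduced here.

```python
# Seasonal ARIMA sketch with statsmodels. File name, column names, and the
# (p, d, q)(P, D, Q, s) orders are placeholders, not values from the paper.
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Daily average temperatures for one city, indexed by date.
series = (
    pd.read_csv("city_temperature.csv", parse_dates=["Date"])
      .set_index("Date")["AvgTemperature"]
      .asfreq("D")
      .interpolate()                     # fill occasional gaps before modeling
)

# Aggregate to monthly means so a 12-period seasonal cycle captures the yearly pattern.
monthly = series.resample("MS").mean()

model = SARIMAX(monthly, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
fit = model.fit(disp=False)

forecast = fit.forecast(steps=12)        # 12-month-ahead temperature forecast
print(forecast.head())
```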

An Interpretable Systematic Review of Machine Learning Models for Predictive Maintenance of Aircraft Engine

  • paper_url: http://arxiv.org/abs/2309.13310
  • repo_url: None
  • paper_authors: Abdullah Al Hasib, Ashikur Rahman, Mahpara Khabir, Md. Tanvir Rouf Shawon
  • for: This paper aims to predict aircraft engine failure using machine learning and deep learning models in order to avoid disasters.
  • methods: The paper utilizes sensor data and employs machine learning and deep learning models such as LSTM, Bi-LSTM, RNN, Bi-RNN, GRU, Random Forest, KNN, Naive Bayes, and Gradient Boosting to predict aircraft engine failure within a predetermined number of cycles.
  • results: The paper achieves accuracies of 97.8%, 97.14%, and 96.42% with GRU, Bi-LSTM, and LSTM respectively, demonstrating the models' capability to predict maintenance needs at an early stage.
    Abstract This paper presents an interpretable review of various machine learning and deep learning models to predict the maintenance of aircraft engine to avoid any kind of disaster. One of the advantages of the strategy is that it can work with modest datasets. In this study, sensor data is utilized to predict aircraft engine failure within a predetermined number of cycles using LSTM, Bi-LSTM, RNN, Bi-RNN GRU, Random Forest, KNN, Naive Bayes, and Gradient Boosting. We explain how deep learning and machine learning can be used to generate predictions in predictive maintenance using a straightforward scenario with just one data source. We applied lime to the models to help us understand why machine learning models did not perform well than deep learning models. An extensive analysis of the model's behavior is presented for several test data to understand the black box scenario of the models. A lucrative accuracy of 97.8%, 97.14%, and 96.42% are achieved by GRU, Bi-LSTM, and LSTM respectively which denotes the capability of the models to predict maintenance at an early stage.
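
As a rough illustration of the recurrent classifiers listed above, the following PyTorch sketch builds a GRU that maps a window of engine sensor readings to a failure probability; the window length, sensor count, and layer sizes are assumptions, not the paper's configuration.

```python
# Sketch of a GRU-based failure classifier over sensor windows (PyTorch).
# Input: (batch, cycles, sensors); output: P(failure within N cycles).
import torch
import torch.nn as nn

class GRUFailureClassifier(nn.Module):
    def __init__(self, n_sensors=24, hidden=64):
        super().__init__()
        self.gru = nn.GRU(input_size=n_sensors, hidden_size=hidden, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden, 1), nn.Sigmoid())

    def forward(self, x):                  # x: (batch, time, n_sensors)
        _, h = self.gru(x)                 # h: (1, batch, hidden) — last hidden state
        return self.head(h.squeeze(0))     # (batch, 1) failure probability

model = GRUFailureClassifier()
x = torch.randn(8, 50, 24)                 # 8 engines, 50-cycle windows, 24 sensors
prob = model(x)
loss = nn.BCELoss()(prob, torch.randint(0, 2, (8, 1)).float())
loss.backward()
print(prob.shape, float(loss))
```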

CORE: Common Random Reconstruction for Distributed Optimization with Provable Low Communication Complexity

  • paper_url: http://arxiv.org/abs/2309.13307
  • repo_url: None
  • paper_authors: Pengyun Yue, Hanzhen Zhao, Cong Fang, Di He, Liwei Wang, Zhouchen Lin, Song-chun Zhu
  • For: 降低分布式机器学习中的通信复杂度,以提高训练速度和扩展机器数。* Methods: 提出了一新技术 named Common randOm REconstruction(CORE), 可以压缩在机器之间传输的信息,以降低通信复杂度,不受其他严格条件限制。 CORE 将 вектор值信息投影到低维度的归一化向量上,并在通信后重建信息,通过共同Random vectors。* Results: 应用 CORE 到两个分布式任务,分别是线性模型的凸优化和通用非凸优化,设计了新的分布式算法,可以证明性地降低通信复杂度。例如,我们示出对线性模型,CORE 基于算法可以编码梯度 вектор到 $\mathcal{O}(1)$-bits(对 $\mathcal{O}(d)$ 比),并保持不变的整体趋势,超过现有结果。
    Abstract With distributed machine learning being a prominent technique for large-scale machine learning tasks, communication complexity has become a major bottleneck for speeding up training and scaling up machine numbers. In this paper, we propose a new technique named Common randOm REconstruction(CORE), which can be used to compress the information transmitted between machines in order to reduce communication complexity without other strict conditions. Especially, our technique CORE projects the vector-valued information to a low-dimensional one through common random vectors and reconstructs the information with the same random noises after communication. We apply CORE to two distributed tasks, respectively convex optimization on linear models and generic non-convex optimization, and design new distributed algorithms, which achieve provably lower communication complexities. For example, we show for linear models CORE-based algorithm can encode the gradient vector to $\mathcal{O}(1)$-bits (against $\mathcal{O}(d)$), with the convergence rate not worse, preceding the existing results.
    摘要 With 分布式机器学习技术在大规模机器学习任务中成为主要瓶颈,交流复杂性已成为加速训练和扩大机器数量的关键问题。在这篇论文中,我们提出一种新的技术名为通用随机重建(CORE),可以减少机器之间的信息传输量,从而降低交流复杂性,而无需其他严格条件。尤其是,我们的CORE技术将向量值信息映射到低维度的均匀随机向量上,并在通信后重建信息,同时保留了同样的随机噪声。我们在两个分布式任务中应用CORE技术,分别是线性模型的凸优化和非凸优化,并设计了新的分布式算法,实现了可靠性下降的交流复杂性。例如,我们示出了对线性模型的CORE-based算法可以将梯度向量编码为$\mathcal{O}(1)$-比特(对$\mathcal{O}(d)$比特),并且保持不变的整体性能。
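
A toy sketch of the shared-randomness idea behind CORE: both sides derive the same random projection from a common seed, so only the low-dimensional projection is communicated and the receiver reconstructs an unbiased estimate. The actual CORE scheme adds further encoding to reach O(1) bits; this sketch covers only the projection/reconstruction step.

```python
# Common-random projection/reconstruction illustration (numpy).
import numpy as np

d, k, seed = 10_000, 100, 1234          # gradient dim, compressed dim, shared seed
grad = np.random.randn(d)               # local gradient on the worker

# Worker side: project with common random vectors (known to both sides via the seed).
R = np.random.default_rng(seed).standard_normal((k, d)) / np.sqrt(k)
msg = R @ grad                          # only k floats are communicated

# Server side: regenerate the same R from the seed and reconstruct.
R_server = np.random.default_rng(seed).standard_normal((k, d)) / np.sqrt(k)
grad_hat = R_server.T @ msg             # E[grad_hat] = grad (unbiased estimate)

print("communicated floats:", msg.size, "relative error:",
      np.linalg.norm(grad_hat - grad) / np.linalg.norm(grad))
```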

Beyond Fairness: Age-Harmless Parkinson’s Detection via Voice

  • paper_url: http://arxiv.org/abs/2309.13292
  • repo_url: None
  • paper_authors: Yicheng Wang, Xiaotian Han, Leisheng Yu, Na Zou
  • for: 这个研究旨在提高 Parkinson’s disease(PD)早期识别的准确性,特别是针对年轻人群(age ≤ 55),因为现有的深度学习模型具有年龄问题。
  • methods: 我们使用 GradCAM-based 特征遮罩和组合模型来解决年龄问题,保持公平性和准确性。特别是,GradCAM-based 特征遮罩 selectively 隐藏年龄相关的特征,以保持关键的PD检测信息。组合模型进一步提高了少数群(年轻人群)的预测精度。
  • results: 我们的方法可以增强PD早期识别的准确性,不会对老年人群(age > 55)的预测精度造成影响。此外,我们也提出了一个两步检测策略,以便评估年轻人群中可能的PD早期病人。
    Abstract Parkinson's disease (PD), a neurodegenerative disorder, often manifests as speech and voice dysfunction. While utilizing voice data for PD detection has great potential in clinical applications, the widely used deep learning models currently have fairness issues regarding different ages of onset. These deep models perform well for the elderly group (age $>$ 55) but are less accurate for the young group (age $\leq$ 55). Through our investigation, the discrepancy between the elderly and the young arises due to 1) an imbalanced dataset and 2) the milder symptoms often seen in early-onset patients. However, traditional debiasing methods are impractical as they typically impair the prediction accuracy for the majority group while minimizing the discrepancy. To address this issue, we present a new debiasing method using GradCAM-based feature masking combined with ensemble models, ensuring that neither fairness nor accuracy is compromised. Specifically, the GradCAM-based feature masking selectively obscures age-related features in the input voice data while preserving essential information for PD detection. The ensemble models further improve the prediction accuracy for the minority (young group). Our approach effectively improves detection accuracy for early-onset patients without sacrificing performance for the elderly group. Additionally, we propose a two-step detection strategy for the young group, offering a practical risk assessment for potential early-onset PD patients.
    摘要 帕金森病(PD),一种神经退化疾病,经常表现为语音和嗓音畸形。使用语音数据进行PD检测具有很大的优势,但目前广泛使用的深度学习模型存在年龄偏见问题。这些深度模型对于年龄大于55岁的老年组(age>55)表现良好,但对年龄小于或等于55岁的青年组(age<=55)表现不准确。经过我们的调查,年龄偏见的原因包括1)数据集偏见和2)早期病人的轻微症状。然而,传统的偏见纠正方法不实用,因为它们通常会降低主要组(老年组)的预测精度。为解决这个问题,我们提出了一种新的偏见纠正方法,利用GradCAM基于特征遮盖和ensemble模型,以确保不会降低公平性和精度。具体来说,GradCAM基于特征遮盖选择性地遮盖语音数据中年龄相关的特征,保留PD检测中必要的信息。而ensemble模型进一步提高了少数群(年龄小于或等于55岁)的预测精度。我们的方法可以提高早期PD检测的准确率,而不会损害老年组的表现。此外,我们还提议了一种两步检测策略,为 potential early-onset PD 患者提供实用的风险评估。

Distributional Shift-Aware Off-Policy Interval Estimation: A Unified Error Quantification Framework

  • paper_url: http://arxiv.org/abs/2309.13278
  • repo_url: None
  • paper_authors: Wenzhuo Zhou, Yuhan Li, Ruoqing Zhu, Annie Qu
  • for: 该文章的目的是为了在无法预测行为政策下验证高置信度的策略评估。
  • methods: 该文章使用了一种新的统一错误分析,它同时量化了两种错误来源:模型缺陷和采样统计变化。
  • results: 该文章的方法可以在无需假设任何弱依赖关系的情况下,对无限时间执行 Markov 决策过程中的目标策略值进行高置信度评估,并且可以适应不同的分布转换。
    Abstract We study high-confidence off-policy evaluation in the context of infinite-horizon Markov decision processes, where the objective is to establish a confidence interval (CI) for the target policy value using only offline data pre-collected from unknown behavior policies. This task faces two primary challenges: providing a comprehensive and rigorous error quantification in CI estimation, and addressing the distributional shift that results from discrepancies between the distribution induced by the target policy and the offline data-generating process. Motivated by an innovative unified error analysis, we jointly quantify the two sources of estimation errors: the misspecification error on modeling marginalized importance weights and the statistical uncertainty due to sampling, within a single interval. This unified framework reveals a previously hidden tradeoff between the errors, which undermines the tightness of the CI. Relying on a carefully designed discriminator function, the proposed estimator achieves a dual purpose: breaking the curse of the tradeoff to attain the tightest possible CI, and adapting the CI to ensure robustness against distributional shifts. Our method is applicable to time-dependent data without assuming any weak dependence conditions via leveraging a local supermartingale/martingale structure. Theoretically, we show that our algorithm is sample-efficient, error-robust, and provably convergent even in non-linear function approximation settings. The numerical performance of the proposed method is examined in synthetic datasets and an OhioT1DM mobile health study.
    摘要 我们研究高自信度偏离策略评估在无穷远 horizon Markov决策过程中,目标是使用偏离策略评估数据来建立一个信息interval(CI)的目标策略价值。这个任务面临两个主要挑战:一是提供全面和准确的误差量化,二是Addressing the distributional shift that results from discrepancies between the distribution induced by the target policy and the offline data-generating process。我们受益于一种创新的统一错误分析,可以同时量化两个来源的误差:模型缺陷导致的重要性加权的误差和采样统计误差。这种统一框架显示了一个隐藏的贸易关系,这种关系使得CI的紧密性受到影响。我们采用一种特制的探测函数,该函数可以实现两个目的:打破贸易关系,以便获得最紧密的CI,并适应CI以确保对分布差异的Robustness。我们的方法可以在没有任何弱依赖条件下应用于时间相关的数据,通过利用本地超martingale/martingale结构。理论上,我们显示了我们的算法是样本效率的,误差稳定的,并在非线性函数近似设定下可靠地收敛。我们的方法的数学性能在 sintetic数据和OhioT1DM移动医学研究中进行了数值验证。
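
For orientation, the standard quantities behind off-policy interval estimation, written in textbook form (the paper's own estimator and interval construction are more involved and are not reproduced here):

```latex
% Target policy value in an infinite-horizon discounted MDP:
V(\pi) \;=\; (1-\gamma)\,\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_t\right].
% Marginalized importance weighting rewrites it under the behavior data distribution d^{b}:
V(\pi) \;=\; \mathbb{E}_{(s,a)\sim d^{b}}\!\left[\omega_{\pi}(s,a)\, r(s,a)\right],
\qquad
\omega_{\pi}(s,a) \;=\; \frac{d^{\pi}(s,a)}{d^{b}(s,a)} ,
% so a confidence interval must account both for misspecification in the model of
% \omega_{\pi} and for the sampling error of the empirical average — the two error
% sources the paper quantifies jointly within a single interval.
```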

A Deep Learning Sequential Decoder for Transient High-Density Electromyography in Hand Gesture Recognition Using Subject-Embedded Transfer Learning

  • paper_url: http://arxiv.org/abs/2310.03752
  • repo_url: None
  • paper_authors: Golara Ahmadi Azar, Qin Hu, Melika Emami, Alyson Fletcher, Sundeep Rangan, S. Farokh Atashzar
  • for: 这个研究旨在提高人工智能与人类肢体互动的整合,使用深度学习方法来识别手势。
  • methods: 这个研究使用深度学习模型,结合 subject-embedded transfer learning 与乘法式受试者嵌入结构,在预训练中获得手势识别的先验知识,以提升对部分观测受试者的识别精度。
  • results: 该方法在部分观测受试者上对 65 种手势取得了 73% 的平均准确率,并且在训练数据有限、手势类别较多时明显优于 subject-specific 方法。
    Abstract Hand gesture recognition (HGR) has gained significant attention due to the increasing use of AI-powered human-computer interfaces that can interpret the deep spatiotemporal dynamics of biosignals from the peripheral nervous system, such as surface electromyography (sEMG). These interfaces have a range of applications, including the control of extended reality, agile prosthetics, and exoskeletons. However, the natural variability of sEMG among individuals has led researchers to focus on subject-specific solutions. Deep learning methods, which often have complex structures, are particularly data-hungry and can be time-consuming to train, making them less practical for subject-specific applications. In this paper, we propose and develop a generalizable, sequential decoder of transient high-density sEMG (HD-sEMG) that achieves 73% average accuracy on 65 gestures for partially-observed subjects through subject-embedded transfer learning, leveraging pre-knowledge of HGR acquired during pre-training. The use of transient HD-sEMG before gesture stabilization allows us to predict gestures with the ultimate goal of counterbalancing system control delays. The results show that the proposed generalized models significantly outperform subject-specific approaches, especially when the training data is limited, and there is a significant number of gesture classes. By building on pre-knowledge and incorporating a multiplicative subject-embedded structure, our method comparatively achieves more than 13% average accuracy across partially observed subjects with minimal data availability. This work highlights the potential of HD-sEMG and demonstrates the benefits of modeling common patterns across users to reduce the need for large amounts of data for new users, enhancing practicality.
    摘要 人工智能(AI)激活人机界面(HGR)已经吸引了广泛的关注,因为它可以通过解读 périphérique nervous system的深度空间动态特征,如表面电MYography(sEMG)来控制虚拟现实、迅速 prótesis 和 exoskeletons。然而,人类的自然变化导致研究人员更加注重具体化解决方案。深度学习方法,经常具有复杂结构,需要大量数据和训练时间,使其更难实现具体化应用。在这篇论文中,我们提出和开发了一种通用的、顺序解码器,可以在部分观察者下达73%的平均准确率,对65个手势进行预测,通过在pre-training中获得的HGR知识进行嵌入式传播学习。使用transient HD-sEMG передgesture稳定化可以预测手势,以ultimate goal of counterbalancing system control delays。结果表明,我们的总体模型在限制数据量的情况下,特别是有许多手势类型的情况下,较subject-specific方法表现出优异。通过建立在pre-knowledge基础上,并通过multiplicative subject-embedded结构,我们的方法可以在有限的数据可用性下,实现更高的13%的平均准确率。这种工作展示了HD-sEMG的潜力,并证明了模型Users across common patterns可以降低新用户需要的数据量,提高实用性。
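
A hypothetical PyTorch sketch of the "multiplicative subject-embedded structure" mentioned above: a learned per-subject vector gates the shared features before classification. Channel counts, layer sizes, and the sigmoid gating are illustrative assumptions rather than the paper's exact architecture.

```python
# Multiplicative subject embedding for HD-sEMG gesture classification (sketch).
import torch
import torch.nn as nn

class SubjectEmbeddedGestureNet(nn.Module):
    def __init__(self, n_channels=128, n_subjects=20, n_gestures=65, feat=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_channels, feat), nn.ReLU(), nn.Linear(feat, feat), nn.ReLU()
        )
        self.subject_emb = nn.Embedding(n_subjects, feat)   # one gate vector per subject
        self.classifier = nn.Linear(feat, n_gestures)

    def forward(self, emg, subject_id):
        feats = self.encoder(emg)                            # (batch, feat)
        gate = torch.sigmoid(self.subject_emb(subject_id))   # (batch, feat) in (0, 1)
        return self.classifier(feats * gate)                 # multiplicative interaction

model = SubjectEmbeddedGestureNet()
emg = torch.randn(16, 128)                  # 16 windows of 128-channel sEMG features
subj = torch.randint(0, 20, (16,))
logits = model(emg, subj)                   # (16, 65) gesture scores
print(logits.shape)
```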

Zen: Near-Optimal Sparse Tensor Synchronization for Distributed DNN Training

  • paper_url: http://arxiv.org/abs/2309.13254
  • repo_url: None
  • paper_authors: Zhuang Wang, Zhaozhuo Xu, Anshumali Shrivastava, T. S. Eugene Ng
  • for: 这篇论文旨在提出一种近乎最优的通信方案,以减少分布式训练深度神经网络(DNN)时梯度同步的通信量,并提高端到端训练效率。
  • methods: 论文首先分析了主流 DNN 模型中稀疏张量的特性,然后系统性地探索了稀疏张量通信方案的设计空间以找出最优方案,并开发了名为 Zen 的梯度同步系统来近似实现该方案。
  • results: 与现有最佳方法相比,Zen 在通信时间上最多实现 5.09 倍加速,在训练吞吐量上最多实现 2.48 倍加速。
    Abstract Distributed training is the de facto standard to scale up the training of Deep Neural Networks (DNNs) with multiple GPUs. The performance bottleneck of distributed training lies in communications for gradient synchronization. Recently, practitioners have observed sparsity in gradient tensors, suggesting the potential to reduce the traffic volume in communication and improve end-to-end training efficiency. Yet, the optimal communication scheme to fully leverage sparsity is still missing. This paper aims to address this gap. We first analyze the characteristics of sparse tensors in popular DNN models to understand the fundamentals of sparsity. We then systematically explore the design space of communication schemes for sparse tensors and find the optimal one. % We then find the optimal scheme based on the characteristics by systematically exploring the design space. We also develop a gradient synchronization system called Zen that approximately realizes it for sparse tensors. We demonstrate that Zen can achieve up to 5.09x speedup in communication time and up to 2.48x speedup in training throughput compared to the state-of-the-art methods.
    摘要 分布式训练是深度神经网络(DNN)训练的标准化方法,以多个GPU进行扩大。但是,分布式训练的性能瓶颈在交换梯度同步方面。在实践中,人们发现了梯度矩阵中的稀疏性,这表明可以减少交换的流量并提高端到端训练效率。然而,完全利用稀疏性的最佳通信方案仍然缺失。这篇论文的目的是填补这个差距。我们首先分析了流行的DNN模型中稀疏矩阵的特点,以了解稀疏性的基础。然后,我们系统地探索了稀疏矩阵的通信方案的设计空间,并找到最佳的一种。我们还开发了一个名为“Zen”的梯度同步系统,可以对稀疏矩阵进行约束式实现。我们 demonstarte了Zen可以在交换时间方面实现5.09倍的速度提高和在训练吞吐量方面实现2.48倍的速度提高,比现有方法更高。
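
To see why sparsity matters for synchronization traffic, the following numpy sketch ships a sparse gradient as (indices, values) instead of a dense tensor. It illustrates only the encoding step; Zen's contribution is the near-optimal communication scheme built on top of such sparse tensors, which is not reproduced here.

```python
# Sparse gradient encoding/decoding illustration (numpy).
import numpy as np

def encode_sparse(grad: np.ndarray):
    idx = np.flatnonzero(grad)                    # positions of nonzero entries
    return idx.astype(np.int32), grad[idx].astype(np.float32)

def decode_sparse(idx, vals, size):
    out = np.zeros(size, dtype=np.float32)
    out[idx] = vals
    return out

d = 1_000_000
grad = np.zeros(d, dtype=np.float32)
nnz = np.random.choice(d, size=d // 100, replace=False)   # ~1% nonzero entries
grad[nnz] = np.random.randn(nnz.size).astype(np.float32)

idx, vals = encode_sparse(grad)
restored = decode_sparse(idx, vals, d)

dense_bytes = grad.nbytes
sparse_bytes = idx.nbytes + vals.nbytes
print(f"dense: {dense_bytes} B, sparse: {sparse_bytes} B, "
      f"ratio: {dense_bytes / sparse_bytes:.1f}x, exact: {np.array_equal(grad, restored)}")
```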

Importance of negative sampling in weak label learning

  • paper_url: http://arxiv.org/abs/2309.13227
  • repo_url: None
  • paper_authors: Ankit Shah, Fuyu Tang, Zelin Ye, Rita Singh, Bhiksha Raj
  • for: 这个论文的目的是研究如何在弱标注学习中选择最有用的负例。
  • methods: 该论文使用了多种采样策略来评估负例的用于弱标注学习中的有用性,并选择其中的最有用的负例。
  • results: 该论文在CIFAR-10和AudioSet datasets上进行测试,并显示了减少计算成本和提高弱标注分类性能的result。
    Abstract Weak-label learning is a challenging task that requires learning from data "bags" containing positive and negative instances, but only the bag labels are known. The pool of negative instances is usually larger than positive instances, thus making selecting the most informative negative instance critical for performance. Such a selection strategy for negative instances from each bag is an open problem that has not been well studied for weak-label learning. In this paper, we study several sampling strategies that can measure the usefulness of negative instances for weak-label learning and select them accordingly. We test our method on CIFAR-10 and AudioSet datasets and show that it improves the weak-label classification performance and reduces the computational cost compared to random sampling methods. Our work reveals that negative instances are not all equally irrelevant, and selecting them wisely can benefit weak-label learning.
    摘要 弱标记学习是一项具有挑战性的任务,需要从包含正例和负例的数据袋中学习,但只知道包袋标签。负例pool通常比正例pool更大,因此选择每个包袋中最有用的负例是一个开放的问题,尚未得到了充分的研究。在这篇论文中,我们研究了一些采样策略,可以衡量负例对弱标记学习的用于fulfillment,并根据此选择负例。我们在CIFAR-10和AudioSet数据集上测试了我们的方法,并证明它可以提高弱标记分类性能和降低计算成本,相比随机采样方法。我们的工作表明,负例不是一样无关,选择它们谨慎可以帮助弱标记学习。
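
One plausible way to rank negatives by informativeness is sketched below: score each negative instance with the current model and keep the hardest (highest-scoring) ones. The scoring rule is an assumption for illustration, not necessarily the criterion used in the paper.

```python
# Informativeness-driven negative selection for weak-label (bag-level) learning.
import numpy as np

def select_informative_negatives(neg_scores: np.ndarray, k: int):
    """neg_scores: model's predicted 'positive' probability for each negative
    instance in a bag. Negatives the model already rejects (score ~ 0) add little;
    high-scoring negatives are the hard, informative ones."""
    order = np.argsort(-neg_scores)          # hardest negatives first
    return order[:k]

rng = np.random.default_rng(0)
bag_neg_scores = rng.beta(1, 5, size=200)    # a bag with 200 negative instances
chosen = select_informative_negatives(bag_neg_scores, k=20)
print("selected negatives:", chosen[:5], "mean score:", bag_neg_scores[chosen].mean())
```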

Grad DFT: a software library for machine learning enhanced density functional theory

  • paper_url: http://arxiv.org/abs/2309.15127
  • repo_url: https://github.com/XanaduAI/GradDFT
  • paper_authors: Pablo A. M. Casares, Jack S. Baker, Matija Medvidovic, Roberto dos Reis, Juan Miguel Arrazola
  • for: 本研究旨在利用机器学习提升密度泛函理论(DFT)的精度,特别是针对强关联体系。
  • methods: 提出一种新的 exchange-correlation functional 参数化方法:将交换关联泛函表示为若干能量密度的加权和,其中权重由神经网络确定,并在完全可微的 JAX 框架中实现。
  • results: 研究人员开发了一个名为Grad DFT的完全可导的JAX基础库,可以快速实现和试验机器学习提高DFT的exchange-correlation能量函数。此外,研究人员还编译了一个精心选择的实验数据集,用于训练和测试模型的准确性。
    Abstract Density functional theory (DFT) stands as a cornerstone method in computational quantum chemistry and materials science due to its remarkable versatility and scalability. Yet, it suffers from limitations in accuracy, particularly when dealing with strongly correlated systems. To address these shortcomings, recent work has begun to explore how machine learning can expand the capabilities of DFT; an endeavor with many open questions and technical challenges. In this work, we present Grad DFT: a fully differentiable JAX-based DFT library, enabling quick prototyping and experimentation with machine learning-enhanced exchange-correlation energy functionals. Grad DFT employs a pioneering parametrization of exchange-correlation functionals constructed using a weighted sum of energy densities, where the weights are determined using neural networks. Moreover, Grad DFT encompasses a comprehensive suite of auxiliary functions, notably featuring a just-in-time compilable and fully differentiable self-consistent iterative procedure. To support training and benchmarking efforts, we additionally compile a curated dataset of experimental dissociation energies of dimers, half of which contain transition metal atoms characterized by strong electronic correlations. The software library is tested against experimental results to study the generalization capabilities of a neural functional across potential energy surfaces and atomic species, as well as the effect of training data noise on the resulting model accuracy.
    摘要 density functional theory(DFT)是计算量子化学和材料科学中的一种重要方法,它具有优秀的 universality 和可扩展性。然而,它在强 correlate 系统中的准确性有限制。为了解决这些缺陷,最近的工作开始使用机器学习技术来扩展 DFT 的能力;这是一个充满开放 вопросов和技术挑战的尝试。在这个工作中,我们提出了 Grad DFT:一个完全可导的 JAX 基础库,允许快速的原型和机器学习增强 exchange-correlation 能量函数的 экспериментирование。Grad DFT 使用一种先进的 exchange-correlation 函数的参数化方法,该方法通过使用神经网络确定的权重,将 exchange-correlation 函数转化为一个可导的形式。此外,Grad DFT 还包括一系列辅助函数,其中包括一个可编译的和完全可导的自 consistent 迭代过程。为支持训练和测试努力,我们还编译了一个权威的对应的实验性离解能量数据集,该数据集包括含有过渡金属原子的 dimer 的实验离解能量,这些金属原子具有强电子 correlate 性。软件库在实验结果上进行测试,以研究一个神经函数在 potential energy surface 和原子种之间的泛化能力,以及训练数据噪声对模型准确性的影响。
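
A schematic JAX sketch of the parametrization idea, a weighted sum of energy densities with weights produced by a small neural network, follows. The toy energy densities, the MLP, and all function names are illustrative assumptions; this is not the Grad DFT API.

```python
# Schematic NN-weighted exchange-correlation functional (JAX).
import jax
import jax.numpy as jnp

def energy_densities(rho):
    # Two toy ingredients (e.g., an LDA-exchange-like rho^{4/3} term and a stand-in).
    return jnp.stack([-rho ** (4.0 / 3.0), -jnp.sqrt(rho + 1e-12)], axis=-1)

def mlp_weights(params, rho):
    h = jnp.tanh(rho[..., None] * params["w1"] + params["b1"])
    return jax.nn.softmax(h @ params["w2"] + params["b2"], axis=-1)  # nonnegative, sum to 1

def exc_energy(params, rho, dr):
    w = mlp_weights(params, rho)                       # (grid, n_densities)
    e = energy_densities(rho)                          # (grid, n_densities)
    return jnp.sum(w * e) * dr                         # E_xc ~ integral of sum_i w_i(rho) e_i(rho)

key = jax.random.PRNGKey(0)
params = {
    "w1": jax.random.normal(key, (8,)) * 0.1, "b1": jnp.zeros(8),
    "w2": jax.random.normal(key, (8, 2)) * 0.1, "b2": jnp.zeros(2),
}
rho = jnp.linspace(0.01, 1.0, 128)                     # toy 1-D density on a grid
dr = float(rho[1] - rho[0])

# Differentiable end to end: gradients w.r.t. the network weights are available.
grads = jax.grad(exc_energy)(params, rho, dr)
print(exc_energy(params, rho, dr), grads["w2"].shape)
```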

Causal Reasoning: Charting a Revolutionary Course for Next-Generation AI-Native Wireless Networks

  • paper_url: http://arxiv.org/abs/2309.13223
  • repo_url: None
  • paper_authors: Christo Kurisummoottil Thomas, Christina Chaccour, Walid Saad, Merouane Debbah, Choong Seon Hong
  • for: 本文提出了一种全面、前瞻的视野,以响应现有的 wireless 网络挑战,通过 causal reasoning 建立可解释、理解、可持续的无线网络。
  • methods: 本文提出了一种基于 causal discovery、causal representation learning 和 causal inference 的新框架,用于建立 AI-native 无线网络。
  • results: 本文指出,通过 incorporating causal discovery,可以解决无线网络中的一些挑战,如 ultra-reliable beamforming、near-accurate physical twin modeling、training data augmentation 和 semantic communication。同时,本文还提出了一些可能的框架,用于通过 causal inference 实现未来无线网络的总体目标,包括意图管理、动态适应性、人类水平的认知和理解。
    Abstract Despite the basic premise that next-generation wireless networks (e.g., 6G) will be artificial intelligence (AI)-native, to date, most existing efforts remain either qualitative or incremental extensions to existing ``AI for wireless'' paradigms. Indeed, creating AI-native wireless networks faces significant technical challenges due to the limitations of data-driven, training-intensive AI. These limitations include the black-box nature of the AI models, their curve-fitting nature, which can limit their ability to reason and adapt, their reliance on large amounts of training data, and the energy inefficiency of large neural networks. In response to these limitations, this article presents a comprehensive, forward-looking vision that addresses these shortcomings by introducing a novel framework for building AI-native wireless networks; grounded in the emerging field of causal reasoning. Causal reasoning, founded on causal discovery, causal representation learning, and causal inference, can help build explainable, reasoning-aware, and sustainable wireless networks. Towards fulfilling this vision, we first highlight several wireless networking challenges that can be addressed by causal discovery and representation, including ultra-reliable beamforming for terahertz (THz) systems, near-accurate physical twin modeling for digital twins, training data augmentation, and semantic communication. We showcase how incorporating causal discovery can assist in achieving dynamic adaptability, resilience, and cognition in addressing these challenges. Furthermore, we outline potential frameworks that leverage causal inference to achieve the overarching objectives of future-generation networks, including intent management, dynamic adaptability, human-level cognition, reasoning, and the critical element of time sensitivity.
    摘要 尽管下一代无线网络(如6G)将是人工智能(AI)Native,但到目前为止,大多数现有努力仍然是质量的或增量的对现有“AI for wireless” paradigms的扩展。实际上,创建AI Native的无线网络面临着 significativetchnical挑战,主要是因为AI模型的黑盒性、curve-fitting性、需要大量训练数据、以及大 neural networks的能源浪费。为了解决这些挑战,这篇文章提出了一个全面的、前瞻的视野,通过引入一种新的AI Native无线网络框架来解决这些缺陷。这个框架基于emerging field of causal reasoning,可以帮助建立可解释、 reasoning-aware 和可持续的无线网络。为实现这个视野,我们首先 highlight了一些无线网络挑战可以通过 causal discovery 和 representation learning来解决,包括THz系统中的可靠性 beamforming、数字 twin 模型化、训练数据增强和semantic communication。我们示出了如何通过 causal discovery 来实现动态适应、抗难以适应和认知的能力。此外,我们还 outline了可以利用 causal inference 来实现未来 generation networks 的主要目标,包括意图管理、动态适应、人类水平的认知、reasoning 和时间敏感性。

eess.IV - 2023-09-23

Gaining Insights into Denoising by Inpainting

  • paper_url: http://arxiv.org/abs/2309.13486
  • repo_url: None
  • paper_authors: Daniel Gaa, Vassillen Chizhov, Pascal Peter, Joachim Weickert, Robin Dirk Adam
  • for: 这篇论文提出一种"以填充实现去噪"(denoising by inpainting,DbI)框架,利用扩散填充过程的填充效应来去除图像噪声,并深入分析其关键性质及与现有方法的联系。
  • methods: DbI 对多个不同噪声像素子集的填充(inpainting)结果取平均;采用同质扩散作为简单的填充算子,提出多种选择已知像素位置的策略,并允许修改噪声像素的函数值以进一步提高全局逼近质量。
  • results: 实验表明,用双调和(biharmonic)填充替换同质扩散填充并不会提高重建质量,说明数据自适应比算子自适应更重要;在理论方面,论文建立了确定性与概率性的收敛估计,并在非自适应一维情形下推导了 DbI 与经典同质扩散滤波之间的等价关系。
    Abstract The filling-in effect of diffusion processes is a powerful tool for various image analysis tasks such as inpainting-based compression and dense optic flow computation. For noisy data, an interesting side effect occurs: The interpolated data have higher confidence, since they average information from many noisy sources. This observation forms the basis of our denoising by inpainting (DbI) framework. It averages multiple inpainting results from different noisy subsets. Our goal is to obtain fundamental insights into key properties of DbI and its connections to existing methods. Like in inpainting-based image compression, we choose homogeneous diffusion as a very simple inpainting operator that performs well for highly optimized data. We propose several strategies to choose the location of the selected pixels. Moreover, to improve the global approximation quality further, we also allow to change the function values of the noisy pixels. In contrast to traditional denoising methods that adapt the operator to the data, our approach adapts the data to the operator. Experimentally we show that replacing homogeneous diffusion inpainting by biharmonic inpainting does not improve the reconstruction quality. This again emphasizes the importance of data adaptivity over operator adaptivity. On the foundational side, we establish deterministic and probabilistic theories with convergence estimates. In the non-adaptive 1-D case, we derive equivalence results between DbI on shifted regular grids and classical homogeneous diffusion filtering via an explicit relation between the density and the diffusion time.
    摘要 Diffusion 过程中的填充效果是许多图像分析任务的有力工具,如填充基于压缩和稠密光流计算。对于噪声污染的数据,有一个 interessante 的侧效: interpolated 数据具有更高的信任度,因为它们平均了许多噪声来源的信息。这个观察成为我们denoising by inpainting(DbI)框架的基础。DbI 平均了不同噪声子集的多个填充结果。我们的目标是获得基本的洞察和现有方法的连接。与填充基于图像压缩类似,我们选择了高度一致的扩散作为非常简单的填充算子,它在高度优化的数据上表现良好。我们还提出了多种选择选择的像素位置策略,以及改进全局approximation质量的方法。与传统的噪声除法方法不同,我们的方法将数据适应到算子而不是适应到数据。实验表明,将Homogeneous替换为Biharmonic不会提高重建质量。这再次强调了数据适应性的重要性,而不是算子适应性。在基础方面,我们建立了deterministic和probabilistic 理论,并提供了收敛估计。在非适应的1-D情况下,我们 derivation 了DbI 在偏移的正规网格上和经典扩散滤波器之间的等价关系,这种关系可以用来描述density 和扩散时间之间的直接关系。
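
A small numpy sketch of the DbI idea under simplifying assumptions (periodic boundaries, fixed iteration count): several random subsets of noisy pixels serve as known data, each subset is inpainted with homogeneous diffusion, and the reconstructions are averaged.

```python
# Denoising-by-inpainting sketch: random known-pixel subsets + harmonic inpainting.
import numpy as np

def diffusion_inpaint(img, mask, iters=500):
    """Homogeneous-diffusion (harmonic) inpainting: unknown pixels relax toward the
    average of their 4 neighbors, known pixels (mask == True) stay at their noisy values.
    Boundaries are handled periodically (np.roll) for brevity."""
    u = np.where(mask, img, img.mean())
    for _ in range(iters):
        nb = 0.25 * (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
                     np.roll(u, 1, 1) + np.roll(u, -1, 1))
        u = np.where(mask, img, nb)            # keep known pixels, relax the rest
    return u

def denoise_by_inpainting(noisy, density=0.2, n_subsets=8, seed=0):
    rng = np.random.default_rng(seed)
    recons = []
    for _ in range(n_subsets):
        mask = rng.random(noisy.shape) < density   # random subset of "known" pixels
        recons.append(diffusion_inpaint(noisy, mask))
    return np.mean(recons, axis=0)                 # averaging pools many noisy sources

clean = np.tile(np.linspace(0, 1, 64), (64, 1))    # toy image: horizontal ramp
noisy = clean + 0.2 * np.random.default_rng(1).standard_normal(clean.shape)
denoised = denoise_by_inpainting(noisy)
print("noisy MSE:", np.mean((noisy - clean) ** 2),
      "denoised MSE:", np.mean((denoised - clean) ** 2))
```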

Design of Novel Loss Functions for Deep Learning in X-ray CT

  • paper_url: http://arxiv.org/abs/2309.14367
  • repo_url: None
  • paper_authors: Obaidullah Rahman, Ken D. Sauer, Madhuri Nagare, Charles A. Bouman, Roman Melnyk, Jie Tang, Brian Nett
  • for: 提高透射计算机断层(CT)图像质量
  • methods: 使用深度学习(DL)方法,包括在数据频谱域和重建图像域中进行训练
  • results: 提出创新的损失函数方法,以更好地衡量图像质量和频谱内容的损失,以提高CT图像重建的精度
    Abstract Deep learning (DL) shows promise of advantages over conventional signal processing techniques in a variety of imaging applications. The networks' being trained from examples of data rather than explicitly designed allows them to learn signal and noise characteristics to most effectively construct a mapping from corrupted data to higher quality representations. In inverse problems, one has options of applying DL in the domain of the originally captured data, in the transformed domain of the desired final representation, or both. X-ray computed tomography (CT), one of the most valuable tools in medical diagnostics, is already being improved by DL methods. Whether for removal of common quantum noise resulting from the Poisson-distributed photon counts, or for reduction of the ill effects of metal implants on image quality, researchers have begun employing DL widely in CT. The selection of training data is driven quite directly by the corruption on which the focus lies. However, the way in which differences between the target signal and measured data is penalized in training generally follows conventional, pointwise loss functions. This work introduces a creative technique for favoring reconstruction characteristics that are not well described by norms such as mean-squared or mean-absolute error. Particularly in a field such as X-ray CT, where radiologists' subjective preferences in image characteristics are key to acceptance, it may be desirable to penalize differences in DL more creatively. This penalty may be applied in the data domain, here the CT sinogram, or in the reconstructed image. We design loss functions for both shaping and selectively preserving frequency content of the signal.
    摘要 深度学习(DL)在各种成像应用中显示出优势,比如传统的信号处理技术。DL网络从数据示例而不是直接设计,因此可以学习信号和噪声特征,以最有效地构建受损数据到高质量表示的映射。在逆问题中,可以在原始数据频谱中应用DL,在愿望的最终表示频谱中应用DL,或者两者都应用。X射针 Computed Tomography(CT)是医学诊断中最重要的工具,已经由DL方法进行改进。DL可以用于去除常见的量子噪声,或者去除金属implant的影响而导致的图像质量下降。选择训练数据的驱动因素受到受损的影响很直接。然而,在训练中对目标信号和测量数据之间的差别进行惩罚通常采用传统的点均方差或点绝对差惩罚函数。本工作介绍了一种创新的技术,即在DL中不以均方或绝对差惩罚函数来惩罚差别。特别在X射针CT领域, где radiologists的主观偏好在图像特征上对接受性至关重要。在这种情况下,可能需要通过更创新的惩罚方式来惩罚DL。我们设计了在数据频谱中和重建图像中应用的损失函数,以Shape和选择性保留信号的频谱特征。
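
A hypothetical PyTorch sketch of a frequency-shaping loss in the spirit described above: the residual is weighted in the Fourier domain by a radial profile, so selected frequency bands can be emphasized or preserved. The particular weighting is an illustrative choice, not the paper's design.

```python
# Frequency-weighted reconstruction loss (PyTorch sketch).
import torch

def frequency_weighted_loss(pred, target, low_freq_weight=1.0, high_freq_weight=4.0):
    # pred, target: (batch, H, W) images or sinograms
    residual_f = torch.fft.fft2(pred - target)
    h, w = pred.shape[-2:]
    fy = torch.fft.fftfreq(h, device=pred.device).view(h, 1)
    fx = torch.fft.fftfreq(w, device=pred.device).view(1, w)
    radius = torch.sqrt(fx ** 2 + fy ** 2)             # normalized radial frequency
    weight = low_freq_weight + (high_freq_weight - low_freq_weight) * (radius / radius.max())
    return torch.mean(weight * residual_f.abs() ** 2)  # penalize chosen bands more heavily

pred = torch.randn(2, 64, 64, requires_grad=True)
target = torch.randn(2, 64, 64)
loss = frequency_weighted_loss(pred, target)
loss.backward()
print(float(loss))
```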

Statistically Adaptive Filtering for Low Signal Correction in X-ray Computed Tomography

  • paper_url: http://arxiv.org/abs/2309.13406
  • repo_url: None
  • paper_authors: Obaidullah Rahman, Ken D. Sauer, Charles A. Bouman, Roman Melnyk, Brian Nett
  • for: 在保持诊断价值的前提下降低X射线CT成像剂量,并修正低剂量带来的低信号伪影。
  • methods: 采用基于统计量的自适应滤波:先进行局部线性最小均方误差预校正,再做方差稳定变换,最后应用自适应双边滤波;滤波强度由局部均值决定,细节保留程度由局部标准差决定。
  • results: 在低频偏差、条状伪影、局部均值与标准差、调制传递函数(MTF)以及噪声功率谱(NPS)等指标上均有改善。
    Abstract Low x-ray dose is desirable in x-ray computed tomographic (CT) imaging due to health concerns. But low dose comes with a cost of low signal artifacts such as streaks and low frequency bias in the reconstruction. As a result, low signal correction is needed to help reduce artifacts while retaining relevant anatomical structures. Low signal can be encountered in cases where sufficient number of photons do not reach the detector to have confidence in the recorded data. % NOTE: SNR is ratio of powers, not std. dev. X-ray photons, assumed to have Poisson distribution, have signal to noise ratio proportional to the dose, with poorer SNR in low signal areas. Electronic noise added by the data acquisition system further reduces the signal quality. In this paper we will demonstrate a technique to combat low signal artifacts through adaptive filtration. It entails statistics-based filtering on the uncorrected data, correcting the lower signal areas more aggressively than the high signal ones. We look at local averages to decide how aggressive the filtering should be, and local standard deviation to decide how much detail preservation to apply. Implementation consists of a pre-correction step i.e. local linear minimum mean-squared error correction, followed by a variance stabilizing transform, and finally adaptive bilateral filtering. The coefficients of the bilateral filter are computed using local statistics. Results show that improvements were made in terms of low frequency bias, streaks, local average and standard deviation, modulation transfer function and noise power spectrum.
    摘要 低剂量X射线是在X射线计算机断层成像(CT)中所需的,因为它可以降低健康风险。然而,低剂量也会导致低信号artefacts,如斜线和低频偏好。为了减少这些artefacts,而不失去有关生物结构的信息,需要进行低信号修正。低信号可以在具有不足的X射线 фотоン数据 recording 时出现,这会导致信号质量下降。在这种情况下,X射线 photons 的信号噪声比(SNR)会随剂量的增加。electronic noise 由数据获取系统添加到数据中,进一步减少信号质量。本文将介绍一种用于解决低信号artefacts的技术 - 适应 filters。这种技术基于统计分析,通过对未经修正的数据进行统计分析,更加严格地修正低信号区域。我们根据本地平均值和本地标准差来决定修正的程度,以保留生物结构的细节。实现方式包括先进行本地线性最小二乘均值修正,然后应用变量稳定化变换,最后使用适应二值滤波。适应滤波的系数是根据本地统计来计算的。结果表明,该技术可以提高低频偏好、斜线、本地平均值和标准差、模ulation transfer function 和噪声电力谱的性能。

MBIR Training for a 2.5D DL network in X-ray CT

  • paper_url: http://arxiv.org/abs/2309.13399
  • repo_url: None
  • paper_authors: Obaidullah Rahman, Madhuri Nagare, Ken D. Sauer, Charles A. Bouman, Roman Melnyk, Brian Nett, Jie Tang
  • for: 这个论文目的是使用深度学习模型来快速实现基于模型的迭代重建图像技术(MBIR)的高品质图像。
  • methods: 这个论文使用了一种基于Unet的modified 2.5D深度学习网络来模仿MBIR图像。
  • results: 研究发现,使用该深度学习模型可以更快地获得接近 MBIR 质量的图像,且计算成本远低于传统 MBIR 方法;图像的纹理特征和噪声功率谱均与 MBIR 图像相似,表明该网络成功模拟了 MBIR 算子。
    Abstract In computed tomographic imaging, model based iterative reconstruction methods have generally shown better image quality than the more traditional, faster filtered backprojection technique. The cost we have to pay is that MBIR is computationally expensive. In this work we train a 2.5D deep learning (DL) network to mimic MBIR quality image. The network is realized by a modified Unet, and trained using clinical FBP and MBIR image pairs. We achieve the quality of MBIR images faster and with a much smaller computation cost. Visually and in terms of noise power spectrum (NPS), DL-MBIR images have texture similar to that of MBIR, with reduced noise power. Image profile plots, NPS plots, standard deviation, etc. suggest that the DL-MBIR images result from a successful emulation of an MBIR operator.
    摘要 在计算tomografic imaging中,基于模型的迭代重建方法通常会提供更好的图像质量,相比较传统的快速滤波后 проекcion技术。然而,MBIR是计算成本高的。在这项工作中,我们使用一个modified U-Net架构来模拟MBIR图像质量。我们使用临床FBP和MBIR图像对的 pairs来训练网络,并在计算成本下降的情况下实现MBIR图像质量。视觉和噪声电磁谱(NPS)等指标表明,DL-MBIR图像具有与MBIR图像相似的 текстура,噪声电磁谱下降。图像profile plot、NPS plot等指标表明,DL-MBIR图像是一个成功地模拟MBIROperator的结果。

Direct Iterative Reconstruction of Multiple Basis Material Images in Photon-counting Spectral CT

  • paper_url: http://arxiv.org/abs/2309.13397
  • repo_url: None
  • paper_authors: Obaidullah Rahman, Ken Sauer, Connor Evans, Ryan Roeder
  • for: 这项研究旨在利用基于模型的迭代重建(MBIR)方法,直接从光子计数光谱CT数据中重建多种基材料图像。
  • methods: 材料含量以体积分数表示,其总和被约束为不超过1;使用光子计数探测器扫描包含水、碘、钆、钙四种基材料的体模,将数据分为5个能量区间,并通过标定扫描以最小二乘拟合估计每种材料在各能量下的线性衰减系数,得到 5×4 的系数矩阵并纳入前向模型,配合空间及材料间正则化直接重建四幅基材料图像。
  • results: 在随后的低浓度扫描中,感兴趣区域(ROI)内重建的体积分数与真实值非常接近。该工作旨在为后续研究(包含空间重叠的对比剂混合物、动物分子成像以及最终的临床应用)奠定基础。
    Abstract In this work, we perform direct material reconstruction from spectral CT data using a model based iterative reconstruction (MBIR) approach. Material concentrations are measured in volume fractions, whose total is constrained by a maximum of unity. A phantom containing a combination of 4 basis materials (water, iodine, gadolinium, calcium) was scanned using a photon-counting detector. Iodine and gadolinium were chosen because of their common use as contrast agents in CT imaging. Scan data was binned into 5 energy (keV) levels. Each energy bin in a calibration scan was reconstructed, allowing the linear attenuation coefficient of each material for every energy to be estimated by a least-squares fit to ground truth in the image domain. The resulting $5\times 4$ matrix, for $5$ energies and $4$ materials, is incorporated into the forward model in direct reconstruction of the $4$ basis material images with spatial and/or inter-material regularization. In reconstruction from a subsequent low-concentration scan, volume fractions within regions of interest (ROIs) are found to be close to the ground truth. This work is meant to lay the foundation for further work with phantoms including spatially coincident mixtures of contrast materials and/or contrast agents in widely varying concentrations, molecular imaging from animal scans, and eventually clinical applications.
    摘要 在这项工作中,我们使用基于模型的迭代重建(MBIR)方法直接重建物质图像从 spectral CT 数据。物质浓度表示为体积分数,总是受限于最大unity。一个包含四种基本材料(水、iodine、gadolinium、 calcium)的phantom在一个photon-counting 探测器上进行了扫描。iodine 和 gadolinium 选择是因为它们在 CT 图像中广泛使用为contrast agent。扫描数据被分割成5个能量(keV)层。每个能量层在一个calibration scan中的每个像素的直线吸收系数可以通过对真实图像中的图像域最小二乘来估算。这个 $5\times 4$ 矩阵,其中有5个能量和4种材料,被 incorporated 到了直接重建的 forward 模型中。在重建从后续的低浓度扫描中,ROIs 中的体积分数几乎与真实值一致。这项工作的目的是为了铺垫将来的灵活材料混合物和高级分子成像、动物扫描和临床应用。
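
The per-voxel decomposition step can be pictured with the following numpy sketch: a calibrated 5×4 attenuation matrix maps volume fractions to attenuation in the five energy bins, and least squares inverts a noisy measurement. The coefficient values and the crude handling of the unity constraint are illustrative; the paper performs regularized MBIR rather than voxel-wise least squares.

```python
# Per-voxel material decomposition with a calibrated 5x4 attenuation matrix (sketch).
import numpy as np

# Rows: 5 energy bins; columns: water, iodine, gadolinium, calcium (made-up values).
A = np.array([
    [0.25, 5.0, 7.5, 1.2],
    [0.22, 3.8, 6.1, 1.0],
    [0.20, 2.9, 4.9, 0.8],
    [0.19, 2.2, 3.4, 0.7],
    [0.18, 1.7, 2.6, 0.6],
])

true_fractions = np.array([0.90, 0.03, 0.02, 0.05])      # mostly water plus contrast agents
mu = A @ true_fractions + 0.01 * np.random.default_rng(0).standard_normal(5)

fractions, *_ = np.linalg.lstsq(A, mu, rcond=None)
fractions = np.clip(fractions, 0.0, None)                # crude nonnegativity post-hoc
fractions = fractions / max(fractions.sum(), 1.0)        # crude "total <= 1" handling
print("estimated volume fractions:", np.round(fractions, 3))
```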

Semantic Communications using Foundation Models: Design Approaches and Open Issues

  • paper_url: http://arxiv.org/abs/2309.13315
  • repo_url: None
  • paper_authors: Peiwen Jiang, Chao-Kai Wen, Xinping Yi, Xiao Li, Shi Jin, Jun Zhang
  • for: This paper aims to investigate the impact of foundation models (FMs) on different system levels, including computation and memory complexity, and to explore the use of compact models to balance performance and complexity.
  • methods: The paper uses universal knowledge to profoundly transform system design and employs three separate approaches that employ FMs to balance performance and complexity.
  • results: The study highlights unresolved issues in the field that need addressing, and provides insights into the effectiveness, semantic, and physical levels of system design.
    Abstract Foundation models (FMs), including large language models, have become increasingly popular due to their wide-ranging applicability and ability to understand human-like semantics. While previous research has explored the use of FMs in semantic communications to improve semantic extraction and reconstruction, the impact of these models on different system levels, considering computation and memory complexity, requires further analysis. This study focuses on integrating FMs at the effectiveness, semantic, and physical levels, using universal knowledge to profoundly transform system design. Additionally, it examines the use of compact models to balance performance and complexity, comparing three separate approaches that employ FMs. Ultimately, the study highlights unresolved issues in the field that need addressing.
    摘要 基础模型(FM,包括大型语言模型)因其广泛的适用性以及对类人语义的理解能力而日益受到关注。尽管已有研究探讨了在语义通信中使用 FM 来改进语义提取与重建,但这些模型在不同系统层面上的影响(考虑计算与内存复杂度)仍需进一步分析。本研究着重在有效性、语义与物理三个层面上集成 FM,利用通用知识深度变革系统设计;同时研究使用紧凑模型来平衡性能与复杂度,并比较三种使用 FM 的方法。最后,文章指出了该领域仍待解决的问题。

eess.SP - 2023-09-23

Sens-BERT: Enabling Transferability and Re-calibration of Calibration Models for Low-cost Sensors under Reference Measurements Scarcity

  • paper_url: http://arxiv.org/abs/2309.13390
  • repo_url: None
  • paper_authors: M V Narayana, Kranthi Kumar Rachvarapu, Devendra Jalihal, Shiva Nagendra S M
  • for: 这项研究旨在提高低成本传感器(LCS)的标定精度,使其能够大规模应用于空气质量监测。
  • methods: 提出一种受 BERT 启发的学习方法 Sens-BERT,分两个阶段完成 LCS 标定:自监督预训练阶段仅使用 LCS 数据(无需参考站观测)学习其数据分布特征并生成相应的嵌入;有监督微调阶段再利用 Sens-BERT 嵌入学习标定模型。
  • results: Sens-BERT 只需少量配对数据即可实现高精度标定,无需大量参考站数据或频繁重新标定,并且可迁移到采用相同传感原理的其他传感器和其他地点。
    Abstract Low-cost sensors measurements are noisy, which limits large-scale adaptability in airquality monitoirng. Calibration is generally used to get good estimates of air quality measurements out from LCS. In order to do this, LCS sensors are typically co-located with reference stations for some duration. A calibration model is then developed to transfer the LCS sensor measurements to the reference station measurements. Existing works implement the calibration of LCS as an optimization problem in which a model is trained with the data obtained from real-time deployments; later, the trained model is employed to estimate the air quality measurements of that location. However, this approach is sensor-specific and location-specific and needs frequent re-calibration. The re-calibration also needs massive data like initial calibration, which is a cumbersome process in practical scenarios. To overcome these limitations, in this work, we propose Sens-BERT, a BERT-inspired learning approach to calibrate LCS, and it achieves the calibration in two phases: self-supervised pre-training and supervised fine-tuning. In the pre-training phase, we train Sens-BERT with only LCS data (without reference station observations) to learn the data distributional features and produce corresponding embeddings. We then use the Sens-BERT embeddings to learn a calibration model in the fine-tuning phase. Our proposed approach has many advantages over the previous works. Since the Sens-BERT learns the behaviour of the LCS, it can be transferable to any sensor of the same sensing principle without explicitly training on that sensor. It requires only LCS measurements in pre-training to learn the characters of LCS, thus enabling calibration even with a tiny amount of paired data in fine-tuning. We have exhaustively tested our approach with the Community Air Sensor Network (CAIRSENSE) data set, an open repository for LCS.
    摘要 低成本感测数据具有噪声,限制了大规模适应性在空气质量监测中。通常情况下,使用均拌法来获得良好的空气质量测量结果。为了实现这一点,低成本感测器通常会与参照站同时进行数据采集。然后,通过开发一个均拌模型,将低成本感测器的测量结果转换为参照站的测量结果。现有的方法通常是通过实时部署来训练一个模型,然后使用这个训练好的模型来估计当地的空气质量测量结果。然而,这种方法具有感测器和地点特定的限制,需要频繁重新均拌,并且重新均拌需要大量的数据,如初始均拌,这在实际应用中是一个繁琐的过程。为了解决这些限制,在这项工作中,我们提出了一种基于BERT的学习方法来均拌低成本感测器。我们的方法分为两个阶段:自主启动阶段和精度调整阶段。在自主启动阶段,我们使用只有低成本感测器数据(没有参照站观测)来帮助Sens-BERT学习数据分布特征,并生成相应的嵌入。然后,在精度调整阶段,我们使用Sens-BERT嵌入来学习一个均拌模型。我们的方法具有许多优势。因为Sens-BERT学习了低成本感测器的行为,因此它可以在任何相同感测原理的感测器上进行传输学习,不需要单独对每个感测器进行均拌。此外,我们只需要在启动阶段使用低成本感测器数据来学习低成本感测器的特征,因此在精度调整阶段只需要小量的配对数据,这在实际应用中是一个方便的。我们在社区空气感测网络(CAIRSENSE)数据集上进行了广泛的测试,并证明了我们的方法的可行性。

Multi-Static ISAC in Cell-Free Massive MIMO: Precoder Design and Privacy Assessment

  • paper_url: http://arxiv.org/abs/2309.13368
  • repo_url: https://github.com/isabella-gomes/globecom2023
  • paper_authors: Isabella W. G. da Silva, Diana P. M. Osorio, Markku Juntti
  • for: 本研究旨在利用无蜂窝(cell-free)大规模 MIMO 基础设施,提升多静态感知通信一体化(ISAC)网络的分集增益并降低功耗,同时关注感知信息的隐私问题。
  • methods: 设计联合优化感知与通信需求的发射预编码器,并在多静态 cell-free ISAC 网络中评估内部攻击者从接收信号中推断目标位置信息的概率。
  • results: 结果表明,在多Static 环境中,可以更精准地估算目标位置,比单Static 实现更好。
    Abstract A multi-static sensing-centric integrated sensing and communication (ISAC) network can take advantage of the cell-free massive multiple-input multiple-output infrastructure to achieve remarkable diversity gains and reduced power consumption. While the conciliation of sensing and communication requirements is still a challenge, the privacy of the sensing information is a growing concern that should be seriously taken on the design of these systems to prevent other attacks. This paper tackles this issue by assessing the probability of an internal adversary to infer the target location information from the received signal by considering the design of transmit precoders that jointly optimizes the sensing and communication requirements in a multi-static-based cell-free ISAC network. Our results show that the multi-static setting facilitates a more precise estimation of the location of the target than the mono-static implementation.
    摘要 一种多Static感知中心Integrated sensing and communication(ISAC)网络可以利用无细结构巨量多输入多输出基础设施,实现Remarkable的多样性增强和降低功率消耗。虽然感知和通信需求的妥协仍然是一个挑战,但感知信息的隐私问题在这些系统的设计中应该严重考虑,以防止其他攻击。本文通过评估接收信号中target位置信息的泄露概率,来评估 transmit precoder的设计,并jointly optimizes the sensing and communication requirements in a multi-static-based cell-free ISAC network。我们的结果表明,在多Static设计下,可以更准确地估计目标的位置,比单Static实现更高精度。

Reinforcement Learning for Robust Header Compression under Model Uncertainty

  • paper_url: http://arxiv.org/abs/2309.13291
  • repo_url: None
  • paper_authors: Shusen Jing, Songyang Zhang, Zhi Ding
  • For: This paper investigates the integration of bi-directional header compression (BD-ROHC) with reinforcement learning (RL) to improve data efficiency in modern wireless communication systems.* Methods: The paper formulates a partially observable Markov decision process (POMDP) to model the compression process, and uses a deep Q-network (DQN) to learn the optimal compression policy.* Results: Compared to ideal dynamic programming (DP), the proposed method is more scalable and does not require prior knowledge of the transition dynamics or accurate observation dependency of the model.
    Abstract Robust header compression (ROHC), critically positioned between the network and the MAC layers, plays an important role in modern wireless communication systems for improving data efficiency. This work investigates bi-directional ROHC (BD-ROHC) integrated with a novel architecture of reinforcement learning (RL). We formulate a partially observable \emph{Markov} decision process (POMDP), in which agent is the compressor, and the environment consists of the decompressor, channel and header source. Our work adopts the well-known deep Q-network (DQN), which takes the history of actions and observations as inputs, and outputs the Q-values of corresponding actions. Compared with the ideal dynamic programming (DP) proposed in the existing works, our method is scalable to the state, action and observation spaces. In contrast, DP often suffers from formidable computational complexity when the number of states becomes large due to long decompressor feedback delay and complex channel models. In addition, our method does not require prior knowledge of the transition dynamics and accurate observation dependency of the model, which are often not available in many practical applications.
    摘要 Robust header compression(ROHC),位于网络和 MAC 层之间,在现代无线通信系统中扮演着重要的角色,以提高数据效率。这项工作 investigate 双向 ROHC(BD-ROHC)与 reinforcement learning(RL)新的架构相结合。我们将 partially observable 马尔可夫决策过程(POMDP)形式ulated,其中 compressor 是 agent,环境包括 decompressor、通道和 header source。我们采用了著名的深度优化网络(DQN),它接受了历史动作和观察输入,并输出对应动作的 Q-值。相比于现有的理想动态计划(DP),我们的方法可扩展到状态、动作和观察空间。而 DP 则经常由长 decompressor 反馈延迟和复杂的通道模型而受到强大的计算复杂度限制。此外,我们的方法不需要transition dynamics 和 observation dependency 的准确知识,这些知识在许多实际应用中通常不可获得。

How to Differentiate between Near Field and Far Field: Revisiting the Rayleigh Distance

  • paper_url: http://arxiv.org/abs/2309.13238
  • repo_url: None
  • paper_authors: Shu Sun, Renwang Li, Xingchen Liu, Liuxun Xue, Chong Han, Meixia Tao
  • for: This paper aims to provide a comprehensive overview of the existing near field (NF) and far field (FF) boundaries in wireless communication systems, and to introduce a novel NF-FF demarcation method based on effective degrees of freedom (EDoF) of the channel.
  • methods: The proposed method uses EDoF to characterize the channel and demarcate the NF and FF regions. The authors analyze the main features of the EDoF-based NF-FF boundary and provide insights into wireless system design.
  • results: The authors demonstrate that the EDoF-based border is able to characterize key channel performance more accurately than the classic Rayleigh distance, and provide insights into wireless system design.
    Abstract Future wireless communication systems are likely to adopt extremely large aperture arrays and millimeter-wave/sub-THz frequency bands to achieve higher throughput, lower latency, and higher energy efficiency. Conventional wireless systems predominantly operate in the far field (FF) of the radiation source of signals. As the array size increases and the carrier wavelength shrinks, however, the near field (NF) becomes non-negligible. Since the NF and FF differ in many aspects, it is essential to distinguish their corresponding regions. In this article, we first provide a comprehensive overview of the existing NF-FF boundaries, then introduce a novel NF-FF demarcation method based on effective degrees of freedom (EDoF) of the channel. Since EDoF is intimately related to spectral efficiency, the EDoF-based border is able to characterize key channel performance more accurately, as compared with the classic Rayleigh distance. Furthermore, we analyze the main features of the EDoF-based NF-FF boundary and provide insights into wireless system design.
    摘要 未来无线通信系统可能会采用非常大的天线数组和毫米波/亿赫兹频段来实现更高的传输速率、更低的延迟时间和更高的能效率。传统无线系统主要在辐射源信号的远场(FF)中运行。然而,随着天线数组的增大和辐射波长的减小,近场(NF)变得不可或缺。由于NF和FF在多方面存在差异,因此需要明确NF-FF的分界线。在这篇文章中,我们首先提供了NF-FF分界线的全面回顾,然后介绍了一种基于效果度量(EDoF)的通道分界方法。由于EDoF与spectral efficiency之间存在紧密的关系,EDoF-based分界线能够更加准确地描述通道性能的关键特征,相比于 классический辐射距离。此外,我们还分析了NF-FF分界线的主要特征,并对无线系统设计提供了深入的理解。
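
For reference, the classic boundary the article revisits is the Rayleigh (Fraunhofer) distance; the worked example below uses illustrative numbers.

```latex
% Rayleigh (Fraunhofer) distance for an aperture of largest dimension D at wavelength \lambda:
d_{\mathrm{R}} \;=\; \frac{2D^{2}}{\lambda}.
% Example: a 0.5 m array at 100 GHz (\lambda = 3 mm) gives
% d_R = 2 (0.5)^2 / 0.003 \approx 167 m,
% so much of the practical coverage area lies in the radiating near field — motivating an
% EDoF-based boundary that tracks channel performance rather than phase-error geometry alone.
```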