cs.LG - 2023-07-05

Machine learning at the mesoscale: a computation-dissipation bottleneck

  • paper_url: http://arxiv.org/abs/2307.02379
  • repo_url: None
  • paper_authors: Alessandro Ingrosso, Emanuele Panizon
  • for: This paper studies the cost of information processing in physical systems, i.e., the trade-off between performance and energetic expenditure.
  • methods: The paper formulates a computation-dissipation bottleneck for mesoscopic systems used as input-output devices and, using both real datasets and synthetic tasks, shows how non-equilibrium driven by non-reciprocal interactions leads to enhanced performance.
  • results: The study reveals a crucial compromise between information compression, input-output computation, and the dynamic irreversibility induced by non-reciprocal interactions.
    Abstract The cost of information processing in physical systems calls for a trade-off between performance and energetic expenditure. Here we formulate and study a computation-dissipation bottleneck in mesoscopic systems used as input-output devices. Using both real datasets and synthetic tasks, we show how non-equilibrium leads to enhanced performance. Our framework sheds light on a crucial compromise between information compression, input-output computation and dynamic irreversibility induced by non-reciprocal interactions.
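
The paper's exact bottleneck functional is defined there; schematically, a computation-dissipation trade-off of this kind can be written as a Lagrangian that weighs the information the device computes against its entropy production (the notation below is illustrative, not taken from the paper):

$$\mathcal{L}(\lambda) \,=\, I(y;\hat{y}) \,-\, \lambda\,\sigma,$$

where $I(y;\hat{y})$ measures how well the device's output $\hat{y}$ captures the target $y$, $\sigma$ is the average entropy production (dissipation), and $\lambda$ prices dissipation against computation.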

Continuum Limits of Ollivier’s Ricci Curvature on data clouds: pointwise consistency and global lower bounds

  • paper_url: http://arxiv.org/abs/2307.02378
  • repo_url: None
  • paper_authors: Nicolas Garcia Trillos, Melanie Weber
  • for: To study the relationship between the curvature of a random geometric graph built from points uniformly sampled from a low-dimensional manifold $\mathcal{M} \subseteq \mathbb{R}^d$ and the curvature of $\mathcal{M}$ itself, via continuum limits.
  • methods: Continuum limits of Ollivier's discrete Ricci curvature are used to prove pointwise, non-asymptotic consistency results, and to show that if $\mathcal{M}$ has Ricci curvature bounded from below by a positive constant, the random geometric graph inherits this global structural property with high probability.
  • results: The global discrete curvature bounds yield contraction properties for heat kernels on graphs and have implications for manifold learning from data clouds; in particular, the consistency results allow the intrinsic curvature of $\mathcal{M}$ to be characterized from extrinsic curvature.
    Abstract Let $\mathcal{M} \subseteq \mathbb{R}^d$ denote a low-dimensional manifold and let $\mathcal{X}= \{ x_1, \dots, x_n \}$ be a collection of points uniformly sampled from $\mathcal{M}$. We study the relationship between the curvature of a random geometric graph built from $\mathcal{X}$ and the curvature of the manifold $\mathcal{M}$ via continuum limits of Ollivier's discrete Ricci curvature. We prove pointwise, non-asymptotic consistency results and also show that if $\mathcal{M}$ has Ricci curvature bounded from below by a positive constant, then the random geometric graph will inherit this global structural property with high probability. We discuss applications of the global discrete curvature bounds to contraction properties of heat kernels on graphs, as well as implications for manifold learning from data clouds. In particular, we show that the consistency results allow for characterizing the intrinsic curvature of a manifold from extrinsic curvature.
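
For reference, Ollivier's coarse Ricci curvature of a pair of vertices compares the optimal-transport cost between their neighborhood measures to their distance; in its standard form (the paper's exact normalization and random-walk construction may differ),

$$\kappa(x,y) \,=\, 1 - \frac{W_1(\mu_x,\mu_y)}{d(x,y)},$$

where $\mu_x$ and $\mu_y$ are probability measures supported on the neighborhoods of $x$ and $y$, $W_1$ is the Wasserstein-1 distance, and $d$ is the graph (or geodesic) distance. Positive $\kappa$ means the neighborhoods are, on average, closer to each other than the points themselves.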

Distance Preserving Machine Learning for Uncertainty Aware Accelerator Capacitance Predictions

  • paper_url: http://arxiv.org/abs/2307.02367
  • repo_url: None
  • paper_authors: Steven Goldenberg, Malachi Schram, Kishansingh Rajput, Thomas Britton, Chris Pappas, Dan Lu, Jared Walden, Majdi I. Radaideh, Sarah Cousineau, Sudarshan Harave
  • for: To provide machine learning models with accurate uncertainty estimates, which is essential in safety-critical applications such as accelerator systems.
  • methods: The paper combines a deep neural network with a Gaussian process approximation and compares two feature extractors designed to preserve distance information: the singular value decomposition and a spectral-normalized dense layer.
  • results: The model shows improved distance preservation and predicts in-distribution capacitance values with less than 1% error for the High Voltage Converter Modulators in the Oak Ridge Spallation Neutron Source.
    Abstract Providing accurate uncertainty estimations is essential for producing reliable machine learning models, especially in safety-critical applications such as accelerator systems. Gaussian process models are generally regarded as the gold standard method for this task, but they can struggle with large, high-dimensional datasets. Combining deep neural networks with Gaussian process approximation techniques have shown promising results, but dimensionality reduction through standard deep neural network layers is not guaranteed to maintain the distance information necessary for Gaussian process models. We build on previous work by comparing the use of the singular value decomposition against a spectral-normalized dense layer as a feature extractor for a deep neural Gaussian process approximation model and apply it to a capacitance prediction problem for the High Voltage Converter Modulators in the Oak Ridge Spallation Neutron Source. Our model shows improved distance preservation and predicts in-distribution capacitance values with less than 1% error.
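
A minimal sketch of the spectral-normalized dense feature extractor idea (layer widths are placeholders; this is not the authors' exact architecture):

```python
import torch
import torch.nn as nn

# Feature extractor whose dense layers are spectral-normalized so that each
# layer's largest singular value is constrained, which helps preserve input
# distances for a downstream Gaussian-process head.
class SpectralNormExtractor(nn.Module):
    def __init__(self, in_dim: int, feat_dim: int = 32, hidden: int = 128):
        super().__init__()
        sn = nn.utils.spectral_norm  # re-normalizes the weight's top singular value
        self.net = nn.Sequential(
            sn(nn.Linear(in_dim, hidden)), nn.ReLU(),
            sn(nn.Linear(hidden, hidden)), nn.ReLU(),
            sn(nn.Linear(hidden, feat_dim)),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# The extracted features would then feed a Gaussian-process approximation
# (e.g. an inducing-point or random-feature head) for calibrated predictions.
```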

Scaling Laws Do Not Scale

  • paper_url: http://arxiv.org/abs/2307.03201
  • repo_url: https://github.com/MarkipTheMudkip/in-class-project-2
  • paper_authors: Fernando Diaz, Michael Madaio
  • for: This paper argues that the performance of large AI models may not continue to improve as datasets get larger, because different communities represented in the data may have values or preferences not captured by the metrics used to evaluate model performance.
  • methods: The paper highlights the potential risks of using scaling laws to evaluate the performance of AI models, as these laws overlook the possibility that different communities may have different values or preferences.
  • results: The paper suggests that as datasets used to train large AI models grow, the number of distinct communities included in a dataset is likely to increase, and these communities may have values or preferences that are not captured by the metrics used to evaluate model performance.
    Abstract Recent work has proposed a power law relationship, referred to as ``scaling laws,'' between the performance of artificial intelligence (AI) models and aspects of those models' design (e.g., dataset size). In other words, as the size of a dataset (or model parameters, etc) increases, the performance of a given model trained on that dataset will correspondingly increase. However, while compelling in the aggregate, this scaling law relationship overlooks the ways that metrics used to measure performance may be precarious and contested, or may not correspond with how different groups of people may perceive the quality of models' output. In this paper, we argue that as the size of datasets used to train large AI models grows, the number of distinct communities (including demographic groups) whose data is included in a given dataset is likely to grow, each of whom may have different values. As a result, there is an increased risk that communities represented in a dataset may have values or preferences not captured by (or in the worst case, at odds with) the metrics used to evaluate model performance for scaling laws. We end the paper with implications for AI scaling laws -- that models may not, in fact, continue to improve as the datasets get larger -- at least not for all people or communities impacted by those models.

Decentralized Data Governance as Part of a Data Mesh Platform: Concepts and Approaches

  • paper_url: http://arxiv.org/abs/2307.02357
  • repo_url: None
  • paper_authors: Arif Wider, Sumedha Verma, Atif Akhtar
  • for: To address decentralized data governance in a data mesh, a socio-technical approach to decentralized analytics data management.
  • methods: The paper examines how a self-service data infrastructure platform, through automation, can efficiently manage the decentralization of a data mesh.
  • results: It presents a conceptual model of key data mesh concepts and discusses different approaches to driving governance through platform means, drawing on concrete experience implementing a fully functional data mesh platform.
    Abstract Data mesh is a socio-technical approach to decentralized analytics data management. To manage this decentralization efficiently, data mesh relies on automation provided by a self-service data infrastructure platform. A key aspect of this platform is to enable decentralized data governance. Because data mesh is a young approach, there is a lack of coherence in how data mesh concepts are interpreted in the industry, and almost no work on how a data mesh platform facilitates governance. This paper presents a conceptual model of key data mesh concepts and discusses different approaches to drive governance through platform means. The insights presented are drawn from concrete experiences of implementing a fully-functional data mesh platform that can be used as a reference on how to approach data mesh platform development.

LLQL: Logistic Likelihood Q-Learning for Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.02345
  • repo_url: None
  • paper_authors: Outongyi Lv, Bingxin Zhou, Yu Guang Wang
  • for: To analyze the distributional properties of the Bellman error in both online and offline reinforcement learning.
  • methods: The distribution of the Bellman approximation error is analyzed in online and offline settings, and these findings are used to replace the normality assumption underlying MSELoss with a Logistic maximum-likelihood loss, LLoss.
  • results: In the online setting the Bellman error follows a Logistic distribution, while in the offline setting it follows a constrained Logistic distribution that depends on the prior policy in the offline dataset. The study also observes that rewards in offline datasets should follow a specific distribution to facilitate offline objectives. Controlled experiments on two Soft Actor-Critic variants in both online and offline environments confirm these hypotheses and show that the variance of LLoss is smaller than that of MSELoss.
    Abstract Currently, research on Reinforcement learning (RL) can be broadly classified into two categories: online RL and offline RL. Both in online and offline RL, the primary focus of research on the Bellman error lies in the optimization techniques and performance improvement, rather than exploring the inherent structural properties of the Bellman error, such as distribution characteristics. In this study, we analyze the distribution of the Bellman approximation error in both online and offline settings. We find that in the online environment, the Bellman error follows a Logistic distribution, while in the offline environment, the Bellman error follows a constrained Logistic distribution, where the constrained distribution is dependent on the prior policy in the offline data set. Based on this finding, we have improved the MSELoss which is based on the assumption that the Bellman errors follow a normal distribution, and we utilized the Logistic maximum likelihood function to construct $\rm LLoss$ as an alternative loss function. In addition, we observed that the rewards in the offline data set should follow a specific distribution, which would facilitate the achievement of offline objectives. In our numerical experiments, we performed controlled variable corrections on the loss functions of two variants of Soft-Actor-Critic in both online and offline environments. The results confirmed our hypothesis regarding the online and offline settings, we also found that the variance of LLoss is smaller than MSELoss. Our research provides valuable insights for further investigations based on the distribution of Bellman errors.
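
A minimal sketch of what a Logistic maximum-likelihood loss over Bellman residuals could look like (the paper's exact LLoss, including the offline constraint, may differ; the scale parameter is a placeholder):

```python
import torch
import torch.nn.functional as F

def logistic_nll_loss(q_pred: torch.Tensor, q_target: torch.Tensor, scale: float = 1.0):
    """Negative log-likelihood of Bellman residuals under Logistic(0, scale),
    a sketch of an LLoss-style alternative to the usual MSE (Gaussian) loss."""
    z = (q_pred - q_target) / scale
    # -log pdf of Logistic(0, s) at residual r = z * s:  z + log(s) + 2*log(1 + exp(-z))
    return (z + 2.0 * F.softplus(-z)).mean() + torch.log(torch.as_tensor(scale))
```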

FAM: Relative Flatness Aware Minimization

  • paper_url: http://arxiv.org/abs/2307.02337
  • repo_url: https://github.com/kampmichael/RelativeFlatnessAndGeneralization
  • paper_authors: Linara Adilova, Amr Abourayya, Jianning Li, Amin Dada, Henning Petzka, Jan Egger, Jens Kleesiek, Michael Kamp
  • for: To improve the generalization ability of neural networks.
  • methods: A regularizer based on a relative flatness measure that is easy to compute, works with arbitrary loss functions, and requires the Hessian of only a single layer of the network.
  • results: Relative flatness aware minimization (FAM) improves generalization across a multitude of applications and models, in both finetuning and standard training.
    Abstract Flatness of the loss curve around a model at hand has been shown to empirically correlate with its generalization ability. Optimizing for flatness has been proposed as early as 1994 by Hochreiter and Schmidthuber, and was followed by more recent successful sharpness-aware optimization techniques. Their widespread adoption in practice, though, is dubious because of the lack of theoretically grounded connection between flatness and generalization, in particular in light of the reparameterization curse - certain reparameterizations of a neural network change most flatness measures but do not change generalization. Recent theoretical work suggests that a particular relative flatness measure can be connected to generalization and solves the reparameterization curse. In this paper, we derive a regularizer based on this relative flatness that is easy to compute, fast, efficient, and works with arbitrary loss functions. It requires computing the Hessian only of a single layer of the network, which makes it applicable to large neural networks, and with it avoids an expensive mapping of the loss surface in the vicinity of the model. In an extensive empirical evaluation we show that this relative flatness aware minimization (FAM) improves generalization in a multitude of applications and models, both in finetuning and standard training. We make the code available at github.
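
The exact relative flatness measure and regularizer are defined in the paper; as a rough illustration of the underlying idea (penalizing loss curvature with respect to a single layer's weights, rescaled by that layer's norm), one could estimate a Hessian-trace term with a Hutchinson probe:

```python
import torch

def relative_flatness_penalty(loss, layer_weight, n_probes: int = 1):
    """Rough sketch: ||W||^2 * trace(H), with H the Hessian of `loss` w.r.t. a
    single layer's weights and the trace estimated via Hutchinson probes.
    This illustrates the idea only and is not the paper's exact measure."""
    grad = torch.autograd.grad(loss, layer_weight, create_graph=True)[0]
    trace_est = 0.0
    for _ in range(n_probes):
        v = torch.randn_like(layer_weight)
        hvp = torch.autograd.grad(grad, layer_weight, grad_outputs=v,
                                  retain_graph=True, create_graph=True)[0]
        trace_est = trace_est + (v * hvp).sum()  # E[v^T H v] = tr(H)
    return layer_weight.detach().pow(2).sum() * (trace_est / n_probes)

# Training would then minimize: task_loss + lam * relative_flatness_penalty(task_loss, W_last)
```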

Data-driven Predictive Latency for 5G: A Theoretical and Experimental Analysis Using Network Measurements

  • paper_url: http://arxiv.org/abs/2307.02329
  • repo_url: None
  • paper_authors: Marco Skocaj, Francesca Conserva, Nicol Sarcone Grande, Andrea Orsi, Davide Micheli, Giorgio Ghinamo, Simone Bizzarri, Roberto Verdone
  • for: This paper is written to analyze predictive latency within 5G networks using real-world network data.
  • methods: The paper presents an analytical formulation of user-plane latency as a Hypoexponential distribution, validated against empirical measurements, and conducts experiments on probabilistic regression, anomaly detection, and predictive forecasting using Machine Learning (ML) techniques such as Bayesian Learning (BL) and Machine Learning on Graphs (GML).
  • results: The paper provides valuable insights into the efficacy of predictive algorithms in practical applications, and validates the proposed framework using data gathered from scenarios of vehicular mobility, dense-urban traffic, and social gathering events.
    Abstract The advent of novel 5G services and applications with binding latency requirements and guaranteed Quality of Service (QoS) hastened the need to incorporate autonomous and proactive decision-making in network management procedures. The objective of our study is to provide a thorough analysis of predictive latency within 5G networks by utilizing real-world network data that is accessible to mobile network operators (MNOs). In particular, (i) we present an analytical formulation of the user-plane latency as a Hypoexponential distribution, which is validated by means of a comparative analysis with empirical measurements, and (ii) we conduct experimental results of probabilistic regression, anomaly detection, and predictive forecasting leveraging on emerging domains in Machine Learning (ML), such as Bayesian Learning (BL) and Machine Learning on Graphs (GML). We test our predictive framework using data gathered from scenarios of vehicular mobility, dense-urban traffic, and social gathering events. Our results provide valuable insights into the efficacy of predictive algorithms in practical applications.
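
A hypoexponential random variable is the sum of independent exponential stages with distinct rates, a natural model for end-to-end user-plane latency seen as a chain of processing and transmission stages. A quick simulation sketch (the per-stage rates below are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical per-stage rates (1/ms), e.g. scheduling, transmission, processing.
rates = np.array([2.0, 0.8, 1.5])

# Hypoexponential samples: sum of independent exponentials with different rates.
samples = rng.exponential(1.0 / rates, size=(100_000, len(rates))).sum(axis=1)
print(f"mean latency ~ {samples.mean():.2f} ms (theory: {np.sum(1 / rates):.2f} ms)")
```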

Exploring new ways: Enforcing representational dissimilarity to learn new features and reduce error consistency

  • paper_url: http://arxiv.org/abs/2307.02516
  • repo_url: None
  • paper_authors: Tassilo Wald, Constantin Ulrich, Fabian Isensee, David Zimmerer, Gregor Koehler, Michael Baumgartner, Klaus H. Maier-Hein
  • for: To improve the accuracy of model ensembles by reducing their common failure modes.
  • methods: Methods from the representational similarity field are used to promote dissimilarity between intermediate representations at different depths of the ensemble members during training, instead of decorrelating output predictions or logits.
  • results: Highly dissimilar intermediate representations lead to less correlated output predictions and slightly lower error consistency, resulting in higher ensemble accuracy.
    Abstract Independently trained machine learning models tend to learn similar features. Given an ensemble of independently trained models, this results in correlated predictions and common failure modes. Previous attempts focusing on decorrelation of output predictions or logits yielded mixed results, particularly due to their reduction in model accuracy caused by conflicting optimization objectives. In this paper, we propose the novel idea of utilizing methods of the representational similarity field to promote dissimilarity during training instead of measuring similarity of trained models. To this end, we promote intermediate representations to be dissimilar at different depths between architectures, with the goal of learning robust ensembles with disjoint failure modes. We show that highly dissimilar intermediate representations result in less correlated output predictions and slightly lower error consistency, resulting in higher ensemble accuracy. With this, we shine first light on the connection between intermediate representations and their impact on the output predictions.
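
The abstract does not commit to one particular similarity measure, but a common choice in the representational-similarity literature is linear CKA; a sketch of using it as a dissimilarity penalty between intermediate features of two jointly trained ensemble members:

```python
import torch

def linear_cka(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Linear CKA between two batches of intermediate features, shapes (n, d1) and (n, d2)."""
    x = x - x.mean(dim=0, keepdim=True)
    y = y - y.mean(dim=0, keepdim=True)
    cross = (x.T @ y).norm() ** 2
    return cross / ((x.T @ x).norm() * (y.T @ y).norm() + 1e-8)

# During joint training of two members, one could minimize
#   loss = ce_loss_a + ce_loss_b + lam * linear_cka(feats_a, feats_b)
# so that lowering the loss also pushes their intermediate representations apart.
```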

LOB-Based Deep Learning Models for Stock Price Trend Prediction: A Benchmark Study

  • paper_url: http://arxiv.org/abs/2308.01915
  • repo_url: None
  • paper_authors: Matteo Prata, Giuseppe Masi, Leonardo Berti, Viviana Arrigoni, Andrea Coletta, Irene Cannistraci, Svitlana Vyetrenko, Paola Velardi, Novella Bartolini
  • for: To study the robustness and generalizability of deep learning models for stock price trend prediction (SPTP) based on limit order book (LOB) data.
  • methods: The authors develop LOBCAST, an open-source framework that includes data preprocessing, deep learning model training, evaluation, and profit analysis, and use it to benchmark fifteen state-of-the-art models.
  • results: Extensive experiments reveal that all models exhibit a significant performance drop when exposed to new data, raising questions about their real-world market applicability; the work serves as a benchmark that exposes the potential and the limitations of current approaches.
    Abstract The recent advancements in Deep Learning (DL) research have notably influenced the finance sector. We examine the robustness and generalizability of fifteen state-of-the-art DL models focusing on Stock Price Trend Prediction (SPTP) based on Limit Order Book (LOB) data. To carry out this study, we developed LOBCAST, an open-source framework that incorporates data preprocessing, DL model training, evaluation and profit analysis. Our extensive experiments reveal that all models exhibit a significant performance drop when exposed to new data, thereby raising questions about their real-world market applicability. Our work serves as a benchmark, illuminating the potential and the limitations of current approaches and providing insight for innovative solutions.

Deep Contract Design via Discontinuous Piecewise Affine Neural Networks

  • paper_url: http://arxiv.org/abs/2307.02318
  • repo_url: None
  • paper_authors: Tonghan Wang, Paul Dütting, Dmitry Ivanov, Inbal Talgam-Cohen, David C. Parkes
  • for: To study deep learning for the automated design of optimal contracts.
  • methods: The problem is formulated as offline learning, using a novel representation, the Discontinuous ReLU (DeLU) network, to model the principal's expected utility as a discontinuous piecewise affine function of the contract design, where each piece corresponds to the agent taking a particular action.
  • results: Experiments show that DeLU networks successfully approximate the principal's utility with a small number of training samples, and that approximately optimal contracts can be found via linear programming or interior-point methods, scaling to problems with a large number of actions and outcomes.
    Abstract Contract design involves a principal who establishes contractual agreements about payments for outcomes that arise from the actions of an agent. In this paper, we initiate the study of deep learning for the automated design of optimal contracts. We formulate this as an offline learning problem, where a deep network is used to represent the principal's expected utility as a function of the design of a contract. We introduce a novel representation: the Discontinuous ReLU (DeLU) network, which models the principal's utility as a discontinuous piecewise affine function where each piece corresponds to the agent taking a particular action. DeLU networks implicitly learn closed-form expressions for the incentive compatibility constraints of the agent and the utility maximization objective of the principal, and support parallel inference on each piece through linear programming or interior-point methods that solve for optimal contracts. We provide empirical results that demonstrate success in approximating the principal's utility function with a small number of training samples and scaling to find approximately optimal contracts on problems with a large number of actions and outcomes.

Sumformer: Universal Approximation for Efficient Transformers

  • paper_url: http://arxiv.org/abs/2307.02301
  • repo_url: None
  • paper_authors: Silas Alberti, Niclas Dern, Laura Thesing, Gitta Kutyniok
  • for: To address the quadratic time and space complexity of Transformers with respect to sequence length by analyzing efficient alternatives theoretically.
  • methods: The paper introduces Sumformer, a novel and simple architecture capable of universally approximating equivariant sequence-to-sequence functions, and uses it to analyze Linformer and Performer.
  • results: The paper gives the first universal approximation results for Linformer and Performer, and derives a new proof showing that a single attention layer is sufficient for Transformers to be universal approximators.
    Abstract Natural language processing (NLP) made an impressive jump with the introduction of Transformers. ChatGPT is one of the most famous examples, changing the perception of the possibilities of AI even outside the research community. However, besides the impressive performance, the quadratic time and space complexity of Transformers with respect to sequence length pose significant limitations for handling long sequences. While efficient Transformer architectures like Linformer and Performer with linear complexity have emerged as promising solutions, their theoretical understanding remains limited. In this paper, we introduce Sumformer, a novel and simple architecture capable of universally approximating equivariant sequence-to-sequence functions. We use Sumformer to give the first universal approximation results for Linformer and Performer. Moreover, we derive a new proof for Transformers, showing that just one attention layer is sufficient for universal approximation.

Improving Address Matching using Siamese Transformer Networks

  • paper_url: http://arxiv.org/abs/2307.02300
  • repo_url: https://github.com/avduarte333/adress-matching
  • paper_authors: André V. Duarte, Arlindo L. Oliveira
  • for: To increase the efficiency of address matching for Portuguese addresses and reduce the risk of delivering packages to the wrong recipient.
  • methods: A deep learning pipeline in which a bi-encoder, fine-tuned to produce meaningful embeddings of Portuguese postal addresses, retrieves the top 10 likely matches of an un-normalized target address from a normalized database, and a cross-encoder then reranks those 10 candidates.
  • results: On a real-world scenario of Portuguese addresses the model exceeds 95% accuracy at the door level, and with GPU computation inference is about 4.5 times faster than traditional approaches such as BM25.
    Abstract Matching addresses is a critical task for companies and post offices involved in the processing and delivery of packages. The ramifications of incorrectly delivering a package to the wrong recipient are numerous, ranging from harm to the company's reputation to economic and environmental costs. This research introduces a deep learning-based model designed to increase the efficiency of address matching for Portuguese addresses. The model comprises two parts: (i) a bi-encoder, which is fine-tuned to create meaningful embeddings of Portuguese postal addresses, utilized to retrieve the top 10 likely matches of the un-normalized target address from a normalized database, and (ii) a cross-encoder, which is fine-tuned to accurately rerank the 10 addresses obtained by the bi-encoder. The model has been tested on a real-case scenario of Portuguese addresses and exhibits a high degree of accuracy, exceeding 95% at the door level. When utilized with GPU computations, the inference speed is about 4.5 times quicker than other traditional approaches such as BM25. An implementation of this system in a real-world scenario would substantially increase the effectiveness of the distribution process. Such an implementation is currently under investigation.
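
A minimal sketch of the retrieve-then-rerank pattern with off-the-shelf sentence-transformer models (the checkpoints named below are generic placeholders, not the authors' fine-tuned bi- and cross-encoders, and the addresses are invented):

```python
from sentence_transformers import SentenceTransformer, CrossEncoder, util

bi_encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

normalized_db = ["Rua de Exemplo 1, 1000-001 Lisboa", "Avenida Teste 23, 4000-100 Porto"]
db_embeddings = bi_encoder.encode(normalized_db, convert_to_tensor=True)

query = "rua do exemplo n 1 lisboa"
query_emb = bi_encoder.encode(query, convert_to_tensor=True)

# Stage 1: bi-encoder retrieval of the top-k most similar normalized addresses.
hits = util.semantic_search(query_emb, db_embeddings, top_k=10)[0]
candidates = [normalized_db[h["corpus_id"]] for h in hits]

# Stage 2: rerank the candidates with the slower but more precise cross-encoder.
scores = cross_encoder.predict([(query, c) for c in candidates])
print("best match:", candidates[int(scores.argmax())])
```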

Meta-Learning Adversarial Bandit Algorithms

  • paper_url: http://arxiv.org/abs/2307.02295
  • repo_url: None
  • paper_authors: Mikhail Khodak, Ilya Osadchiy, Keegan Harris, Maria-Florina Balcan, Kfir Y. Levy, Ron Meir, Zhiwei Steven Wu
  • for: To improve performance across multiple bandit tasks when the tasks are similar according to a natural similarity measure.
  • methods: Meta-algorithms in which outer learners simultaneously tune the initialization and other hyperparameters of an inner learner, for two important cases: multi-armed bandits (MAB) and bandit linear optimization (BLO).
  • results: For MAB, the meta-learners tune a Tsallis-entropy generalization of Exp3, with task-averaged regret improving when the entropy of the optima-in-hindsight is small; for BLO, they learn to initialize and tune online mirror descent (OMD) with self-concordant barrier regularizers. The guarantees rely on showing that unregularized follow-the-leader combined with two levels of low-dimensional hyperparameter tuning is enough to learn a sequence of affine functions of non-Lipschitz and sometimes non-convex Bregman divergences bounding the regret of OMD.
    Abstract We study online meta-learning with bandit feedback, with the goal of improving performance across multiple tasks if they are similar according to some natural similarity measure. As the first to target the adversarial online-within-online partial-information setting, we design meta-algorithms that combine outer learners to simultaneously tune the initialization and other hyperparameters of an inner learner for two important cases: multi-armed bandits (MAB) and bandit linear optimization (BLO). For MAB, the meta-learners initialize and set hyperparameters of the Tsallis-entropy generalization of Exp3, with the task-averaged regret improving if the entropy of the optima-in-hindsight is small. For BLO, we learn to initialize and tune online mirror descent (OMD) with self-concordant barrier regularizers, showing that task-averaged regret varies directly with an action space-dependent measure they induce. Our guarantees rely on proving that unregularized follow-the-leader combined with two levels of low-dimensional hyperparameter tuning is enough to learn a sequence of affine functions of non-Lipschitz and sometimes non-convex Bregman divergences bounding the regret of OMD.

Absorbing Phase Transitions in Artificial Deep Neural Networks

  • paper_url: http://arxiv.org/abs/2307.02284
  • repo_url: None
  • paper_authors: Keiichi Tamai, Tsuyoshi Okubo, Truong Vinh Truong Duy, Naotake Natori, Synge Todo
  • for: To understand the behavior of artificial deep neural networks of finite width.
  • methods: Analysis of the order-to-chaos transition in fully-connected feedforward and convolutional networks, using mean-field theory and the framework of absorbing phase transitions.
  • results: The behavior of properly initialized networks can be understood in terms of universal critical phenomena: a well-defined order-to-chaos transition exists even for finite networks, differences in architecture are reflected in the universality class of the transition, and finite-size scaling applies.
    Abstract Theoretical understanding of the behavior of infinitely-wide neural networks has been rapidly developed for various architectures due to the celebrated mean-field theory. However, there is a lack of a clear, intuitive framework for extending our understanding to finite networks that are of more practical and realistic importance. In the present contribution, we demonstrate that the behavior of properly initialized neural networks can be understood in terms of universal critical phenomena in absorbing phase transitions. More specifically, we study the order-to-chaos transition in the fully-connected feedforward neural networks and the convolutional ones to show that (i) there is a well-defined transition from the ordered state to the chaotics state even for the finite networks, and (ii) difference in architecture is reflected in that of the universality class of the transition. Remarkably, the finite-size scaling can also be successfully applied, indicating that intuitive phenomenological argument could lead us to semi-quantitative description of the signal propagation dynamics.

From NeurODEs to AutoencODEs: a mean-field control framework for width-varying Neural Networks

  • paper_url: http://arxiv.org/abs/2307.02279
  • repo_url: None
  • paper_authors: Cristina Cipriani, Massimo Fornasier, Alessandro Scagliotti
  • for: This paper aims to extend the mean-field control framework for continuous-time Autoencoders (AutoencODEs) to handle low Tikhonov regularization and potentially non-convex cost landscapes.
  • methods: The paper proposes a modification of the controlled field in the AutoencODE to enable the extension of the mean-field control framework, and develops a training method tailored to this specific type of Autoencoders with residual connections.
  • results: The paper shows that many of the global results obtained for high Tikhonov regularization can be recovered in regions where the loss function is locally convex, and validates the approach through numerical experiments conducted on various examples.
    Abstract The connection between Residual Neural Networks (ResNets) and continuous-time control systems (known as NeurODEs) has led to a mathematical analysis of neural networks which has provided interesting results of both theoretical and practical significance. However, by construction, NeurODEs have been limited to describing constant-width layers, making them unsuitable for modeling deep learning architectures with layers of variable width. In this paper, we propose a continuous-time Autoencoder, which we call AutoencODE, based on a modification of the controlled field that drives the dynamics. This adaptation enables the extension of the mean-field control framework originally devised for conventional NeurODEs. In this setting, we tackle the case of low Tikhonov regularization, resulting in potentially non-convex cost landscapes. While the global results obtained for high Tikhonov regularization may not hold globally, we show that many of them can be recovered in regions where the loss function is locally convex. Inspired by our theoretical findings, we develop a training method tailored to this specific type of Autoencoders with residual connections, and we validate our approach through numerical experiments conducted on various examples.

First-Explore, then Exploit: Meta-Learning Intelligent Exploration

  • paper_url: http://arxiv.org/abs/2307.02276
  • repo_url: https://github.com/btnorman/First-Explore
  • paper_authors: Ben Norman, Jeff Clune
  • for: The paper aims to address the issue of intelligent exploration in reinforcement learning (RL) agents, which have been limited by the conflict between exploration and exploitation.
  • methods: The proposed First-Explore framework consists of two policies: one for exploration and one for exploitation. The explore policy learns only to explore the environment, while the exploit policy learns only to exploit the knowledge gained during exploration.
  • results: The paper demonstrates that First-Explore can learn intelligent exploration strategies such as exhaustive search, and that it outperforms dominant standard RL and meta-RL approaches on domains where exploration requires sacrificing reward.
    Abstract Standard reinforcement learning (RL) agents never intelligently explore like a human (i.e. by taking into account complex domain priors and previous explorations). Even the most basic intelligent exploration strategies such as exhaustive search are only inefficiently or poorly approximated by approaches such as novelty search or intrinsic motivation, let alone more complicated strategies like learning new skills, climbing stairs, opening doors, or conducting experiments. This lack of intelligent exploration limits sample efficiency and prevents solving hard exploration domains. We argue a core barrier prohibiting many RL approaches from learning intelligent exploration is that the methods attempt to explore and exploit simultaneously, which harms both exploration and exploitation as the goals often conflict. We propose a novel meta-RL framework (First-Explore) with two policies: one policy learns to only explore and one policy learns to only exploit. Once trained, we can then explore with the explore policy, for as long as desired, and then exploit based on all the information gained during exploration. This approach avoids the conflict of trying to do both exploration and exploitation at once. We demonstrate that First-Explore can learn intelligent exploration strategies such as exhaustive search and more, and that it outperforms dominant standard RL and meta-RL approaches on domains where exploration requires sacrificing reward. First-Explore is a significant step towards creating meta-RL algorithms capable of learning human-level exploration which is essential to solve challenging unseen hard-exploration domains.

Convolutions Through the Lens of Tensor Networks

  • paper_url: http://arxiv.org/abs/2307.02275
  • repo_url: None
  • paper_authors: Felix Dangel
  • for: To provide a new perspective on convolutional layers through tensor networks (TNs).
  • methods: Convolutions are expressed as tensor networks, whose diagrams can be drawn and manipulated to perform function transformations, sub-tensor access, and fusion of the underlying tensor multiplications.
  • results: The paper derives diagrams for various autodiff operations and popular approximations of second-order information with full hyper-parameter support, batching, channel groups, and arbitrary convolution dimensions; provides convolution-specific transformations based on the connectivity pattern that re-wire and simplify diagrams before evaluation; and shows that the TN implementation speeds up a recently proposed KFAC variant by up to 4.5x and enables new hardware-efficient tensor dropout for approximate backpropagation.
    Abstract Despite their simple intuition, convolutions are more tedious to analyze than dense layers, which complicates the generalization of theoretical and algorithmic ideas. We provide a new perspective onto convolutions through tensor networks (TNs) which allow reasoning about the underlying tensor multiplications by drawing diagrams, and manipulating them to perform function transformations, sub-tensor access, and fusion. We demonstrate this expressive power by deriving the diagrams of various autodiff operations and popular approximations of second-order information with full hyper-parameter support, batching, channel groups, and generalization to arbitrary convolution dimensions. Further, we provide convolution-specific transformations based on the connectivity pattern which allow to re-wire and simplify diagrams before evaluation. Finally, we probe computational performance, relying on established machinery for efficient TN contraction. Our TN implementation speeds up a recently-proposed KFAC variant up to 4.5x and enables new hardware-efficient tensor dropout for approximate backpropagation.
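
To make the "convolution as tensor multiplication" viewpoint concrete, a 2D convolution can be written as a single tensor contraction over an unfolded input. This is a generic illustration, not the paper's TN machinery:

```python
import torch
import torch.nn.functional as F

x = torch.randn(8, 3, 32, 32)   # (batch, in_channels, height, width)
w = torch.randn(16, 3, 5, 5)    # (out_channels, in_channels, kH, kW)

# Unfold extracts all 5x5 patches: (batch, in_channels * kH * kW, num_patches).
patches = F.unfold(x, kernel_size=5).view(8, 3, 25, -1)   # (batch, c_in, k, patches)
w_mat = w.view(16, 3, 25)                                  # (c_out, c_in, k)

# The convolution is now one contraction over the (c_in, k) indices.
out = torch.einsum("bckp,ock->bop", patches, w_mat).view(8, 16, 28, 28)

assert torch.allclose(out, F.conv2d(x, w), atol=1e-4)
```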

  • paper_url: http://arxiv.org/abs/2307.02263
  • repo_url: None
  • paper_authors: Jianxiang Luo, Junyi Hu, Tianji Pang, Weihao Huang, Chuang Liu
  • for: To make weight-sharing neural architecture search more interpretable and rational, and to guarantee fairness in module evaluation.
  • methods: A novel neural architecture search algorithm based on dynamical isometry; fixed-point analysis from mean-field theory is used to study the dynamics of steady-state random neural networks and to show how dynamical isometry guarantees the fairness of weight-sharing-based NAS, and the module selection strategy is proven rigorously fair by estimating the generalization error of all modules with a well-conditioned Jacobian.
  • results: Extensive experiments on ImageNet classification show that, at the same size, the searched architecture achieves state-of-the-art top-1 validation accuracy, and that the method achieves better and more stable training performance without loss of generality.
    Abstract Recently, the weight-sharing technique has significantly speeded up the training and evaluation procedure of neural architecture search. However, most existing weight-sharing strategies are solely based on experience or observation, which makes the searching results lack interpretability and rationality. In addition, due to the negligence of fairness, current methods are prone to make misjudgments in module evaluation. To address these problems, we propose a novel neural architecture search algorithm based on dynamical isometry. We use the fix point analysis method in the mean field theory to analyze the dynamics behavior in the steady state random neural network, and how dynamic isometry guarantees the fairness of weight-sharing based NAS. Meanwhile, we prove that our module selection strategy is rigorous fair by estimating the generalization error of all modules with well-conditioned Jacobian. Extensive experiments show that, with the same size, the architecture searched by the proposed method can achieve state-of-the-art top-1 validation accuracy on ImageNet classification. In addition, we demonstrate that our method is able to achieve better and more stable training performance without loss of generality.

Multivariate Time Series Classification: A Deep Learning Approach

  • paper_url: http://arxiv.org/abs/2307.02253
  • repo_url: https://github.com/radrumond/timehetnet
  • paper_authors: Mohamed Abouelnaga, Julien Vitay, Aida Farahani
  • for: To investigate different methods and neural network architectures for time series classification.
  • methods: Fully Convolutional Networks (FCN) and Long Short-Term Memory (LSTM) networks for supervised learning, and Recurrent Autoencoders for semi-supervised learning, applied to data from a fleet of gas sensors that measure quantities such as oxygen and sound.
  • results: The study analyzes the effect of parameters such as sequence length and compares the methods using metrics such as precision and recall to identify which technique best suits the problem of detecting events such as occupancy.
    Abstract This paper investigates different methods and various neural network architectures applicable in the time series classification domain. The data is obtained from a fleet of gas sensors that measure and track quantities such as oxygen and sound. With the help of this data, we can detect events such as occupancy in a specific environment. At first, we analyze the time series data to understand the effect of different parameters, such as the sequence length, when training our models. These models employ Fully Convolutional Networks (FCN) and Long Short-Term Memory (LSTM) for supervised learning and Recurrent Autoencoders for semisupervised learning. Throughout this study, we spot the differences between these methods based on metrics such as precision and recall identifying which technique best suits this problem.
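
A minimal sketch of an FCN-style classifier for multivariate time series (channel counts and kernel sizes are illustrative, not the study's configuration):

```python
import torch
import torch.nn as nn

class FCNClassifier(nn.Module):
    """1D fully convolutional network for multivariate time series classification."""
    def __init__(self, n_channels: int, n_classes: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv1d(n_channels, 64, kernel_size=8, padding="same"), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=5, padding="same"), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 64, kernel_size=3, padding="same"), nn.BatchNorm1d(64), nn.ReLU(),
        )
        self.head = nn.Linear(64, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (batch, channels, time)
        h = self.body(x).mean(dim=-1)                       # global average pooling over time
        return self.head(h)

logits = FCNClassifier(n_channels=4, n_classes=2)(torch.randn(16, 4, 200))
```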

RanPAC: Random Projections and Pre-trained Models for Continual Learning

  • paper_url: http://arxiv.org/abs/2307.02251
  • repo_url: None
  • paper_authors: Mark D. McDonnell, Dong Gong, Amin Parveneh, Ehsan Abbasnejad, Anton van den Hengel
  • for: To incrementally learn different tasks from a non-stationary data stream (continual learning, CL) without forgetting old ones.
  • methods: Since forgetting occurs during parameter updating, the approach instead relies on training-free random projectors and class-prototype accumulation: a frozen Random Projection layer with nonlinear activation is injected between the pre-trained model's feature representations and the output head, capturing interactions between features in an expanded dimensionality and providing enhanced linear separability for class-prototype-based CL.
  • results: Decorrelating the class prototypes further reduces the distribution disparity when using pre-trained representations; on seven class-incremental benchmarks, final error rates are reduced by between 10% and 62% compared to previous methods applied to pre-trained ViT-B/16 models, without using any rehearsal memory.
    Abstract Continual learning (CL) aims to incrementally learn different tasks (such as classification) in a non-stationary data stream without forgetting old ones. Most CL works focus on tackling catastrophic forgetting under a learning-from-scratch paradigm. However, with the increasing prominence of foundation models, pre-trained models equipped with informative representations have become available for various downstream requirements. Several CL methods based on pre-trained models have been explored, either utilizing pre-extracted features directly (which makes bridging distribution gaps challenging) or incorporating adaptors (which may be subject to forgetting). In this paper, we propose a concise and effective approach for CL with pre-trained models. Given that forgetting occurs during parameter updating, we contemplate an alternative approach that exploits training-free random projectors and class-prototype accumulation, which thus bypasses the issue. Specifically, we inject a frozen Random Projection layer with nonlinear activation between the pre-trained model's feature representations and output head, which captures interactions between features with expanded dimensionality, providing enhanced linear separability for class-prototype-based CL. We also demonstrate the importance of decorrelating the class-prototypes to reduce the distribution disparity when using pre-trained representations. These techniques prove to be effective and circumvent the problem of forgetting for both class- and domain-incremental continual learning. Compared to previous methods applied to pre-trained ViT-B/16 models, we reduce final error rates by between 10\% and 62\% on seven class-incremental benchmark datasets, despite not using any rehearsal memory. We conclude that the full potential of pre-trained models for simple, effective, and fast continual learning has not hitherto been fully tapped.
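
A rough sketch of the random-projection-plus-class-prototype idea on top of a frozen backbone (dimensions are placeholders, and the paper's prototype decorrelation step is omitted here):

```python
import torch

torch.manual_seed(0)
d_feat, d_proj, n_classes = 768, 4096, 100

# Frozen random projection followed by a nonlinearity; it is never trained.
W = torch.randn(d_feat, d_proj) / d_feat ** 0.5

def project(features: torch.Tensor) -> torch.Tensor:   # features: (batch, d_feat)
    return torch.relu(features @ W)                      # expanded features: (batch, d_proj)

# Class prototypes accumulated as running sums of projected features.
proto_sum = torch.zeros(n_classes, d_proj)
proto_cnt = torch.zeros(n_classes)

def update_prototypes(features: torch.Tensor, labels: torch.Tensor) -> None:
    proto_sum.index_add_(0, labels, project(features))
    proto_cnt.index_add_(0, labels, torch.ones(labels.shape[0]))

def predict(features: torch.Tensor) -> torch.Tensor:
    protos = proto_sum / proto_cnt.clamp(min=1).unsqueeze(1)
    return (project(features) @ protos.T).argmax(dim=1)   # nearest prototype by inner product
```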

Set Learning for Accurate and Calibrated Models

  • paper_url: http://arxiv.org/abs/2307.02245
  • repo_url: https://github.com/lukasmut/oko
  • paper_authors: Lukas Muttenthaler, Robert A. Vandermeulen, Qiuyi Zhang, Thomas Unterthiner, Klaus-Robert Müller
  • for: To alleviate model overconfidence and poor calibration, improving both accuracy and calibration, especially with limited training data and class imbalance.
  • methods: A novel odd-$k$-out learning (OKO) approach that minimizes the cross-entropy error for sets rather than for single examples, which naturally allows the model to capture correlations across data examples.
  • results: OKO improves accuracy and calibration in limited-data and class-imbalanced regimes, and often yields better calibration even when training with hard labels and without additional calibration parameter tuning such as temperature scaling; theoretical justification and extensive experiments corroborate these findings.
    Abstract Model overconfidence and poor calibration are common in machine learning and difficult to account for when applying standard empirical risk minimization. In this work, we propose a novel method to alleviate these problems that we call odd-$k$-out learning (OKO), which minimizes the cross-entropy error for sets rather than for single examples. This naturally allows the model to capture correlations across data examples and achieves both better accuracy and calibration, especially in limited training data and class-imbalanced regimes. Perhaps surprisingly, OKO often yields better calibration even when training with hard labels and dropping any additional calibration parameter tuning, such as temperature scaling. We provide theoretical justification, establishing that OKO naturally yields better calibration, and provide extensive experimental analyses that corroborate our theoretical findings. We emphasize that OKO is a general framework that can be easily adapted to many settings and the trained model can be applied to single examples at inference time, without introducing significant run-time overhead or architecture changes.

Knowledge-Guided Additive Modeling For Supervised Regression

  • paper_url: http://arxiv.org/abs/2307.02229
  • repo_url: https://github.com/yannclaes/kg-regression
  • paper_authors: Yann Claes, Vân Anh Huynh-Thu, Pierre Geurts
  • for: To assess the performance of hybrid models against traditional machine learning methods on standard regression problems.
  • methods: Hybrid models that additively combine a parametric physical term with a machine learning term; several model-agnostic training procedures are compared, and a new hybrid approach based on partial dependence functions is introduced.
  • results: Comparisons on both synthetic and real regression problems, using several types of machine learning models (tree-based models and artificial neural networks), indicate advantages of hybrid models in terms of global performance and parameter identification.
    Abstract Learning processes by exploiting restricted domain knowledge is an important task across a plethora of scientific areas, with more and more hybrid methods combining data-driven and model-based approaches. However, while such hybrid methods have been tested in various scientific applications, they have been mostly tested on dynamical systems, with only limited study about the influence of each model component on global performance and parameter identification. In this work, we assess the performance of hybrid modeling against traditional machine learning methods on standard regression problems. We compare, on both synthetic and real regression problems, several approaches for training such hybrid models. We focus on hybrid methods that additively combine a parametric physical term with a machine learning term and investigate model-agnostic training procedures. We also introduce a new hybrid approach based on partial dependence functions. Experiments are carried out with different types of machine learning models, including tree-based models and artificial neural networks.
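
A toy sketch of such an additive hybrid model, here fitted sequentially: a parametric "physical" term is estimated first and a machine learning term is trained on its residuals. The physical law and the data below are purely illustrative, and the paper compares several (including model-agnostic) training procedures rather than this particular one:

```python
import numpy as np
from scipy.optimize import curve_fit
from sklearn.ensemble import RandomForestRegressor

def f_phys(x, a, b):
    """Illustrative parametric physical term: linear in the first feature."""
    return a * x[:, 0] + b

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 3))
y = 2.0 * X[:, 0] + 0.5 + np.sin(3 * X[:, 1]) + 0.05 * rng.standard_normal(500)

# Step 1: fit the parametric physical term.
theta, _ = curve_fit(f_phys, X, y, p0=[1.0, 0.0])

# Step 2: fit the ML term on the residuals left by the physical term.
ml_term = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y - f_phys(X, *theta))

def predict(X_new):
    return f_phys(X_new, *theta) + ml_term.predict(X_new)

print("identified physical parameters:", theta)
```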

Personalized Federated Learning via Amortized Bayesian Meta-Learning

  • paper_url: http://arxiv.org/abs/2307.02222
  • repo_url: None
  • paper_authors: Shiyu Liu, Shaogao Lv, Dun Zeng, Zenglin Xu, Hui Wang, Yue Yu
  • for: 本研究旨在 Addressing the challenge of statistical heterogeneity in federated learning, 即让多个客户端协同学习一个全局模型,而不曝光他们私有数据。
  • methods: 本文提出了一种新的个性化联邦学习方法,即\emph{FedABML},它在客户端之间使用层次变分推断。全局先验旨在捕捉异质客户端间共同内在结构的表示,随后将其迁移到各客户端的特定任务上,以便通过少量本地更新生成准确的客户端特定近似后验。
  • results: 我们的理论分析给出了平均泛化误差的上界,保证了模型在未见数据上的泛化性能。此外,多项实验结果显示 \emph{FedABML} 优于若干有竞争力的基线方法。
    Abstract Federated learning is a decentralized and privacy-preserving technique that enables multiple clients to collaborate with a server to learn a global model without exposing their private data. However, the presence of statistical heterogeneity among clients poses a challenge, as the global model may struggle to perform well on each client's specific task. To address this issue, we introduce a new perspective on personalized federated learning through Amortized Bayesian Meta-Learning. Specifically, we propose a novel algorithm called \emph{FedABML}, which employs hierarchical variational inference across clients. The global prior aims to capture representations of common intrinsic structures from heterogeneous clients, which can then be transferred to their respective tasks and aid in the generation of accurate client-specific approximate posteriors through a few local updates. Our theoretical analysis provides an upper bound on the average generalization error and guarantees the generalization performance on unseen data. Finally, several empirical results are implemented to demonstrate that \emph{FedABML} outperforms several competitive baselines.
    摘要 “联邦学习”是一种去中心化且保护隐私的技术,让多个客户端与服务器协作学习一个全局模型,而无需暴露各自的私有数据。然而,客户端之间的统计异质性带来了挑战:全局模型可能难以在每个客户端的特定任务上表现良好。为了解决这一问题,我们通过摊销贝叶斯元学习(Amortized Bayesian Meta-Learning)为个性化联邦学习引入新的视角。具体而言,我们提出一种名为 \emph{FedABML} 的新算法,它在客户端之间使用层次变分推断:全局先验旨在从异质客户端中捕捉共同内在结构的表示,随后迁移到各自的任务上,仅需少量本地更新即可生成准确的客户端特定近似后验。我们的理论分析给出了平均泛化误差的上界,保证了在未见数据上的泛化性能。最后,一系列实验结果表明 \emph{FedABML} 优于多个有竞争力的基线方法。

On the Adversarial Robustness of Generative Autoencoders in the Latent Space

  • paper_url: http://arxiv.org/abs/2307.02202
  • repo_url: None
  • paper_authors: Mingfei Lu, Badong Chen
  • for: This paper focuses on the adversarial robustness of generative autoencoders, specifically in the latent space.
  • methods: The authors use various attacks in the latent space to demonstrate the vulnerability of popular generative autoencoders. They also compare the performance of variational autoencoders with their deterministic variants and observe that the latter have better latent robustness.
  • results: The authors find that there is a trade-off between adversarial robustness and the degree of disentanglement of the latent codes. They also show that adversarial training can improve the latent robustness of VAEs.
    Abstract The generative autoencoders, such as the variational autoencoders or the adversarial autoencoders, have achieved great success in lots of real-world applications, including image generation, and signal communication. However, little concern has been devoted to their robustness during practical deployment. Due to the probabilistic latent structure, variational autoencoders (VAEs) may confront problems such as a mismatch between the posterior distribution of the latent and real data manifold, or discontinuity in the posterior distribution of the latent. This leaves a back door for malicious attackers to collapse VAEs from the latent space, especially in scenarios where the encoder and decoder are used separately, such as communication and compressed sensing. In this work, we provide the first study on the adversarial robustness of generative autoencoders in the latent space. Specifically, we empirically demonstrate the latent vulnerability of popular generative autoencoders through attacks in the latent space. We also evaluate the difference between variational autoencoders and their deterministic variants and observe that the latter performs better in latent robustness. Meanwhile, we identify a potential trade-off between the adversarial robustness and the degree of the disentanglement of the latent codes. Additionally, we also verify the feasibility of improvement for the latent robustness of VAEs through adversarial training. In summary, we suggest concerning the adversarial latent robustness of the generative autoencoders, analyze several robustness-relative issues, and give some insights into a series of key challenges.
    摘要 生成式自编码器(如变分自编码器和对抗自编码器)在图像生成、信号通信等许多实际应用中取得了巨大成功,然而其在实际部署中的鲁棒性却很少受到关注。由于潜变量结构是概率性的,变分自编码器(VAE)可能面临潜变量后验分布与真实数据流形不匹配、或潜变量后验分布不连续等问题。这为恶意攻击者从潜空间瓦解 VAE 留下了后门,尤其是在编码器与解码器分开使用的场景下(如通信与压缩感知)。在这项工作中,我们首次研究了生成式自编码器在潜空间中的对抗鲁棒性:我们通过潜空间攻击实验展示了流行的生成式自编码器的脆弱性;我们比较了变分自编码器与其确定性变体,发现后者的潜空间鲁棒性更好;同时我们发现对抗鲁棒性与潜码解耦程度之间存在潜在的权衡;此外,我们验证了通过对抗训练提升 VAE 潜空间鲁棒性的可行性。总之,我们建议关注生成式自编码器的潜空间对抗鲁棒性,分析了若干与鲁棒性相关的问题,并就一系列关键挑战给出了见解。
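As a rough illustration of what a latent-space attack means here, the sketch below perturbs the latent code within an epsilon ball so that the decoded output drifts away from the clean reconstruction; the objective, constraint, and hyperparameters are assumptions, not the attacks used in the paper.

```python
import torch
import torch.nn.functional as F

def latent_attack(encoder, decoder, x, eps=0.5, steps=20, lr=0.05):
    """Illustrative latent-space attack: find a bounded perturbation of the
    latent code that pushes the decoded output far from the clean
    reconstruction. Only a sketch of the general idea."""
    with torch.no_grad():
        z0 = encoder(x)
        x_rec = decoder(z0)
    delta = torch.zeros_like(z0, requires_grad=True)
    for _ in range(steps):
        loss = -F.mse_loss(decoder(z0 + delta), x_rec)  # maximize reconstruction drift
        loss.backward()
        with torch.no_grad():
            delta -= lr * delta.grad.sign()
            delta.clamp_(-eps, eps)
            delta.grad.zero_()
    return z0 + delta.detach()
```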

ChiENN: Embracing Molecular Chirality with Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2307.02198
  • repo_url: https://github.com/gmum/chienn
  • paper_authors: Piotr Gaiński, Michał Koziarski, Jacek Tabor, Marek Śmieja
  • for: 本研究旨在使用图神经网络(GNNs)对化学分子图进行预测,并使其能够区分同一分子的镜像异构体(对映体)。
  • methods: 我们提出了一种有理论依据的消息传递方案,使 GNNs 对邻居节点的顺序敏感。我们将这一通用思想应用于分子手性,构建了具有手性感知能力的 Chiral Edge Neural Network(ChiENN)层。
  • results: 实验结果显示,在 GNN 模型中加入 ChiENN 层,可以在手性敏感的分子性质预测任务上超越当前最先进的方法。
    Abstract Graph Neural Networks (GNNs) play a fundamental role in many deep learning problems, in particular in cheminformatics. However, typical GNNs cannot capture the concept of chirality, which means they do not distinguish between the 3D graph of a chemical compound and its mirror image (enantiomer). The ability to distinguish between enantiomers is important especially in drug discovery because enantiomers can have very distinct biochemical properties. In this paper, we propose a theoretically justified message-passing scheme, which makes GNNs sensitive to the order of node neighbors. We apply that general concept in the context of molecular chirality to construct Chiral Edge Neural Network (ChiENN) layer which can be appended to any GNN model to enable chirality-awareness. Our experiments show that adding ChiENN layers to a GNN outperforms current state-of-the-art methods in chiral-sensitive molecular property prediction tasks.
    摘要 图神经网络(GNN)在许多深度学习问题中扮演着基础角色,尤其是在化学信息学中。然而,典型的 GNN 无法捕捉手性(chirality)的概念,即无法区分一个化合物的三维分子图与其镜像(对映异构体)。区分对映异构体的能力在药物发现中尤为重要,因为对映异构体可能具有截然不同的生化性质。在本文中,我们提出了一种有理论依据的消息传递方案,使 GNN 对节点邻居的顺序敏感。我们将这一通用思想应用于分子手性,构建了 Chiral Edge Neural Network(ChiENN)层,可以附加到任何 GNN 模型上以引入手性感知能力。实验表明,在 GNN 中加入 ChiENN 层后,在手性敏感的分子性质预测任务上超越了当前最先进的方法。

Evaluating AI systems under uncertain ground truth: a case study in dermatology

  • paper_url: http://arxiv.org/abs/2307.02191
  • repo_url: None
  • paper_authors: David Stutz, Ali Taylan Cemgil, Abhijit Guha Roy, Tatiana Matejovicova, Melih Barsbey, Patricia Strachan, Mike Schaekermann, Jan Freyberg, Rajeev Rikhye, Beverly Freeman, Javier Perez Matos, Umesh Telang, Dale R. Webster, Yuan Liu, Greg S. Corrado, Yossi Matias, Pushmeet Kohli, Yun Liu, Arnaud Doucet, Alan Karthikesalingam
  • for: 这个论文目的是提出了一种方法来评估AI模型的性能时考虑到真实的数据预期不确定性。
  • methods: 这个论文使用了一种基于统计模型的方法来汇集笔记,并提出了一种新的性能评价指标来考虑annotations uncertainty。
  • results: 研究发现,使用传统的deterministic aggregation方法时,评估结果具有很大的uncertainty,而使用提出的统计模型方法可以更好地评估模型的性能和uncertainty。
    Abstract For safety, AI systems in health undergo thorough evaluations before deployment, validating their predictions against a ground truth that is assumed certain. However, this is actually not the case and the ground truth may be uncertain. Unfortunately, this is largely ignored in standard evaluation of AI models but can have severe consequences such as overestimating the future performance. To avoid this, we measure the effects of ground truth uncertainty, which we assume decomposes into two main components: annotation uncertainty which stems from the lack of reliable annotations, and inherent uncertainty due to limited observational information. This ground truth uncertainty is ignored when estimating the ground truth by deterministically aggregating annotations, e.g., by majority voting or averaging. In contrast, we propose a framework where aggregation is done using a statistical model. Specifically, we frame aggregation of annotations as posterior inference of so-called plausibilities, representing distributions over classes in a classification setting, subject to a hyper-parameter encoding annotator reliability. Based on this model, we propose a metric for measuring annotation uncertainty and provide uncertainty-adjusted metrics for performance evaluation. We present a case study applying our framework to skin condition classification from images where annotations are provided in the form of differential diagnoses. The deterministic adjudication process called inverse rank normalization (IRN) from previous work ignores ground truth uncertainty in evaluation. Instead, we present two alternative statistical models: a probabilistic version of IRN and a Plackett-Luce-based model. We find that a large portion of the dataset exhibits significant ground truth uncertainty and standard IRN-based evaluation severely over-estimates performance without providing uncertainty estimates.
    摘要 为了安全起见,医疗领域的 AI 系统在部署前都会经过严格评估,将其预测与被假定为确定无疑的真实标签(ground truth)进行比对。然而事实并非如此,真实标签本身可能是不确定的。这一点在 AI 模型的标准评估中往往被忽略,却可能带来严重后果,例如高估模型未来的性能。为避免这种情况,我们度量真实标签不确定性的影响,并假设它可分解为两个主要部分:源于缺乏可靠标注的标注不确定性,以及由于观测信息有限而产生的固有不确定性。当通过确定性的方式(如多数投票或取平均)聚合标注来估计真实标签时,这种不确定性被忽略了。与此相对,我们提出一个使用统计模型进行聚合的框架:将标注聚合视为对所谓 plausibilities(分类设置下类别分布)的后验推断,并以一个超参数刻画标注者的可靠性。基于该模型,我们提出了度量标注不确定性的指标,并给出了经不确定性校正的性能评估指标。我们在基于图像的皮肤病分类任务上进行了案例研究,其中标注以鉴别诊断(differential diagnoses)的形式给出。以往工作中的确定性裁决过程 inverse rank normalization(IRN)在评估中忽略了真实标签不确定性;我们给出两种替代的统计模型:IRN 的概率化版本与基于 Plackett-Luce 的模型。我们发现数据集中有相当大的一部分存在显著的真实标签不确定性,而标准的基于 IRN 的评估严重高估了性能,且不提供不确定性估计。
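The sketch below contrasts deterministic aggregation with a toy probabilistic alternative, assuming a Dirichlet posterior whose concentration is scaled by an assumed annotator-reliability parameter; the paper's statistical models (probabilistic IRN and Plackett-Luce) are more involved.

```python
import numpy as np

def plausibility_posterior(votes, num_classes, reliability=2.0, prior=1.0):
    """Toy stand-in for statistical aggregation of annotations: a Dirichlet
    posterior over classes whose concentration grows with an assumed annotator
    reliability. Contrasts with deterministic majority voting."""
    counts = np.bincount(votes, minlength=num_classes).astype(float)
    alpha = prior + reliability * counts
    probs = alpha / alpha.sum()
    entropy = -(probs * np.log(probs)).sum()   # proxy for annotation uncertainty
    return probs, entropy

probs, unc = plausibility_posterior(np.array([2, 2, 0]), num_classes=3)
print(probs, unc)   # soft class distribution instead of a hard majority label
```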

Diffusion Models for Computational Design at the Example of Floor Plans

  • paper_url: http://arxiv.org/abs/2307.02511
  • repo_url: None
  • paper_authors: Joern Ploennigs, Markus Berger
  • for: 本研究旨在考察扩散模型在土木工程中的应用,尤其是在给定约束下生成特定的建筑平面方案。
  • methods: 本研究使用扩散模型进行图像生成,并提出了改进语义编码的新扩散模型。
  • results: 研究发现,这些扩散模型可将有效楼层平面图的生成率从 6% 提升至 90%,并在不同示例上提升了查询性能。
    Abstract AI Image generators based on diffusion models are widely discussed recently for their capability to create images from simple text prompts. But, for practical use in civil engineering they need to be able to create specific construction plans for given constraints. Within this paper we explore the capabilities of those diffusion-based AI generators for computational design at the example of floor plans and identify their current limitation. We explain how the diffusion-models work and propose new diffusion models with improved semantic encoding. In several experiments we show that we can improve validity of generated floor plans from 6% to 90% and query performance for different examples. We identify short comings and derive future research challenges of those models and discuss the need to combine diffusion models with building information modelling. With this we provide key insights into the current state and future directions for diffusion models in civil engineering.
    摘要 基于扩散模型的 AI 图像生成器近期受到广泛讨论,因为它们能够根据简单的文本提示生成图像。但要在土木工程中实际应用,它们还需要能够在给定约束下生成特定的建筑平面方案。本文以楼层平面图为例,探讨此类基于扩散的 AI 生成器在计算设计中的能力,并指出其当前的局限。我们解释了扩散模型的工作原理,并提出了改进语义编码的新扩散模型。多项实验表明,我们可以将生成的平面图的有效率从 6% 提升到 90%,并在不同示例上提升查询性能。我们识别了这些模型的不足之处,由此引出未来的研究挑战,并讨论了将扩散模型与建筑信息模型(BIM)相结合的必要性。据此,我们为扩散模型在土木工程中的现状与未来方向提供了关键见解。

DiffFlow: A Unified SDE Framework for Score-Based Diffusion Models and Generative Adversarial Networks

  • paper_url: http://arxiv.org/abs/2307.02159
  • repo_url: None
  • paper_authors: Jingwei Zhang, Han Shi, Jincheng Yu, Enze Xie, Zhenguo Li
  • for: 这个论文的目的是提出一种统一的概率理论框架,用于描述 explict 生成模型和 implicit 生成模型之间的关系。
  • methods: 该论文使用了一种名为 Discriminator Denoising Diffusion Flow (DiffFlow) 的新型 Stochastic Differential Equation (SDE),用于描述生成模型的学习动态。
  • results: 该论文表明,通过调整不同得分项的相对权重,可以在显式生成模型与隐式生成模型之间实现平滑过渡,并有望在高样本质量与快速采样之间取得灵活的权衡。
    Abstract Generative models can be categorized into two types: explicit generative models that define explicit density forms and allow exact likelihood inference, such as score-based diffusion models (SDMs) and normalizing flows; implicit generative models that directly learn a transformation from the prior to the data distribution, such as generative adversarial nets (GANs). While these two types of models have shown great success, they suffer from respective limitations that hinder them from achieving fast sampling and high sample quality simultaneously. In this paper, we propose a unified theoretic framework for SDMs and GANs. We shown that: i) the learning dynamics of both SDMs and GANs can be described as a novel SDE named Discriminator Denoising Diffusion Flow (DiffFlow) where the drift can be determined by some weighted combinations of scores of the real data and the generated data; ii) By adjusting the relative weights between different score terms, we can obtain a smooth transition between SDMs and GANs while the marginal distribution of the SDE remains invariant to the change of the weights; iii) we prove the asymptotic optimality and maximal likelihood training scheme of the DiffFlow dynamics; iv) under our unified theoretic framework, we introduce several instantiations of the DiffFLow that provide new algorithms beyond GANs and SDMs with exact likelihood inference and have potential to achieve flexible trade-off between high sample quality and fast sampling speed.
    摘要 生成模型可以分为两类:显式生成模型,它们定义明确的密度形式并支持精确的似然推断,如基于得分的扩散模型(SDM)和归一化流;隐式生成模型,它们直接学习从先验分布到数据分布的变换,如生成对抗网络(GAN)。尽管这两类模型都取得了巨大成功,但各自的局限使它们难以同时实现快速采样与高样本质量。在本文中,我们为 SDM 和 GAN 提出了一个统一的理论框架。我们证明:i) SDM 与 GAN 的学习动力学都可以被描述为一种新的随机微分方程(SDE),称为 Discriminator Denoising Diffusion Flow(DiffFlow),其漂移项由真实数据得分与生成数据得分的加权组合决定;ii) 通过调整不同得分项的相对权重,可以在 SDM 与 GAN 之间实现平滑过渡,且该 SDE 的边缘分布在权重变化下保持不变;iii) 我们证明了 DiffFlow 动力学的渐近最优性与最大似然训练方案;iv) 在该统一框架下,我们给出了 DiffFlow 的若干实例,它们提供了超越 GAN 与 SDM 的新算法,既支持精确似然推断,又有望在高样本质量与快速采样之间灵活权衡。
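A cartoon of the idea that re-weighting score terms interpolates between SDM-like and GAN-like dynamics: one Euler-Maruyama step of an SDE whose drift mixes a real-data score and a generated-data score. The sign convention, weighting, and noise scale are assumptions; the actual DiffFlow drift is derived in the paper.

```python
import torch

def diffflow_step(x, score_real, score_gen, w_real, w_gen, dt=1e-2):
    """One Euler-Maruyama step of an SDE whose drift is a weighted combination
    of the real-data score and the generated-data score. Illustrative only;
    the paper specifies the exact drift and noise schedule."""
    drift = w_real * score_real(x) - w_gen * score_gen(x)
    noise = torch.randn_like(x)
    return x + drift * dt + noise * (2 * dt) ** 0.5
```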

Wasserstein Auto-Encoders of Merge Trees (and Persistence Diagrams)

  • paper_url: http://arxiv.org/abs/2307.02509
  • repo_url: None
  • paper_authors: Mahieu Pont, Julien Tierny
  • for: 本研究提出了一种基于 Wasserstein metric 空间的merge tree auto-encoding(MT-WAE)方法,用于提高传统自编码器的准确率和可读性。
  • methods: 本方法使用了一种新的非线性神经网络结构,将merge tree经过多层神经网络的操作,以实现更高的准确率和可读性。
  • results: 实验结果表明,MT-WAE可以快速计算merge tree,并且可以准确地压缩merge tree,同时 preserved Wasserstein 距离和 clusters。此外,本方法还可以应用于维度减少和数据分析等领域。
    Abstract This paper presents a computational framework for the Wasserstein auto-encoding of merge trees (MT-WAE), a novel extension of the classical auto-encoder neural network architecture to the Wasserstein metric space of merge trees. In contrast to traditional auto-encoders which operate on vectorized data, our formulation explicitly manipulates merge trees on their associated metric space at each layer of the network, resulting in superior accuracy and interpretability. Our novel neural network approach can be interpreted as a non-linear generalization of previous linear attempts [65] at merge tree encoding. It also trivially extends to persistence diagrams. Extensive experiments on public ensembles demonstrate the efficiency of our algorithms, with MT-WAE computations in the orders of minutes on average. We show the utility of our contributions in two applications adapted from previous work on merge tree encoding [65]. First, we apply MT-WAE to data reduction and reliably compress merge trees by concisely representing them with their coordinates in the final layer of our auto-encoder. Second, we document an application to dimensionality reduction, by exploiting the latent space of our auto-encoder, for the visual analysis of ensemble data. We illustrate the versatility of our framework by introducing two penalty terms, to help preserve in the latent space both the Wasserstein distances between merge trees, as well as their clusters. In both applications, quantitative experiments assess the relevance of our framework. Finally, we provide a C++ implementation that can be used for reproducibility.
    摘要 本文提出了合并树的 Wasserstein 自编码(MT-WAE)计算框架,它将经典自编码器神经网络结构推广到合并树的 Wasserstein 度量空间。与在向量化数据上运行的传统自编码器不同,我们的公式在网络的每一层都显式地在其关联的度量空间中操作合并树,从而获得更高的准确性与可解释性。这一新的神经网络方法可以视为先前线性合并树编码尝试的非线性推广,并可直接扩展到持久图(persistence diagrams)。在公开集合数据上的大量实验证明了算法的效率,MT-WAE 的计算平均只需数分钟量级。我们在先前合并树编码工作的两个应用中展示了本文贡献的实用性:其一是数据约简,利用自编码器最后一层的坐标对合并树进行可靠压缩;其二是降维,利用自编码器的潜空间对集合数据进行可视分析。我们还引入了两个惩罚项,以在潜空间中同时保持合并树之间的 Wasserstein 距离及其聚类结构,并通过定量实验验证了框架的有效性。最后,我们提供了可用于复现的 C++ 实现。

Harmonizing Feature Attributions Across Deep Learning Architectures: Enhancing Interpretability and Consistency

  • paper_url: http://arxiv.org/abs/2307.02150
  • repo_url: None
  • paper_authors: Md Abdul Kadir, Gowtham Krishna Addluri, Daniel Sonntag
  • for: 这 study aims to improve the interpretability and trustworthiness of machine learning models by examining the generalization of feature attributions across various deep learning architectures.
  • methods: The study uses feature attribution methods to provide local explanations of model predictions, and explores the feasibility of utilizing these methods as a future detector.
  • results: The findings suggest that harmonized feature attribution methods can improve interpretability and trust in machine learning applications, regardless of the underlying architecture.
    Abstract Ensuring the trustworthiness and interpretability of machine learning models is critical to their deployment in real-world applications. Feature attribution methods have gained significant attention, which provide local explanations of model predictions by attributing importance to individual input features. This study examines the generalization of feature attributions across various deep learning architectures, such as convolutional neural networks (CNNs) and vision transformers. We aim to assess the feasibility of utilizing a feature attribution method as a future detector and examine how these features can be harmonized across multiple models employing distinct architectures but trained on the same data distribution. By exploring this harmonization, we aim to develop a more coherent and optimistic understanding of feature attributions, enhancing the consistency of local explanations across diverse deep-learning models. Our findings highlight the potential for harmonized feature attribution methods to improve interpretability and foster trust in machine learning applications, regardless of the underlying architecture.
    摘要 确保机器学习模型的可信性与可解释性,对其在真实应用中的部署至关重要。特征归因方法通过为单个输入特征赋予重要性,为模型预测提供局部解释,近年来受到广泛关注。本研究考察了特征归因在不同深度学习架构(如卷积神经网络与视觉 Transformer)之间的泛化情况,评估将特征归因方法用作未来检测器的可行性,并研究如何在采用不同架构、但在相同数据分布上训练的多个模型之间协调这些特征。通过探索这种协调,我们希望对特征归因形成更连贯的理解,增强不同深度学习模型之间局部解释的一致性。我们的结果表明,协调后的特征归因方法有望提升可解释性,并增进人们对机器学习应用的信任,而不受底层架构的影响。
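One simple way to quantify agreement between attributions from two architectures, assuming PyTorch models and plain gradient saliency; the attribution method and the rank-correlation score are illustrative choices, not the study's protocol.

```python
import torch
from scipy.stats import spearmanr

def saliency(model, x, target):
    """Plain gradient saliency, aggregated over channels; richer attribution
    methods exist, this is just enough to compare maps across models."""
    x = x.clone().detach().requires_grad_(True)
    model(x)[0, target].backward()
    return x.grad.abs().sum(dim=1)   # (1, H, W)

def attribution_agreement(model_a, model_b, x, target):
    """One possible consistency score: Spearman rank correlation between the
    flattened saliency maps of two models on the same input."""
    sa = saliency(model_a, x, target).flatten().numpy()
    sb = saliency(model_b, x, target).flatten().numpy()
    return spearmanr(sa, sb)[0]
```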

  • paper_url: http://arxiv.org/abs/2307.02140
  • repo_url: https://github.com/morningd/model-centric-fml
  • paper_authors: Moming Duan
  • for: 本文提出了一种新的 Federated Learning(FL)平台设计,即开放式 Federated Learning Platforms,以扩展FL的应用场景和提高数据持有者的参与积极性。
  • methods: 本文提出了两种对接口型FL框架的替换方案:查询型FL和合同型FL,以解决FL中的严重的服务器-客户端耦合、模型重复利用和非公共问题。
  • results: 本文通过对技术和法律领域的分析,证明了开放式FL平台的可行性和优势,并提出了一种模型license兼容分类法,以便在FL研究中更好地识别和解决模型使用权限问题。
    Abstract Traditional Federated Learning (FL) follows a server-domincated cooperation paradigm which narrows the application scenarios of FL and decreases the enthusiasm of data holders to participate. To fully unleash the potential of FL, we advocate rethinking the design of current FL frameworks and extending it to a more generalized concept: Open Federated Learning Platforms. We propose two reciprocal cooperation frameworks for FL to achieve this: query-based FL and contract-based FL. In this survey, we conduct a comprehensive review of the feasibility of constructing an open FL platform from both technical and legal perspectives. We begin by reviewing the definition of FL and summarizing its inherent limitations, including server-client coupling, low model reusability, and non-public. In the query-based FL platform, which is an open model sharing and reusing platform empowered by the community for model mining, we explore a wide range of valuable topics, including the availability of up-to-date model repositories for model querying, legal compliance analysis between different model licenses, and copyright issues and intellectual property protection in model reusing. In particular, we introduce a novel taxonomy to streamline the analysis of model license compatibility in FL studies that involve batch model reusing methods, including combination, amalgamation, distillation, and generation. This taxonomy provides a systematic framework for identifying the corresponding clauses of licenses and facilitates the identification of potential legal implications and restrictions when reusing models. Through this survey, we uncover the the current dilemmas faced by FL and advocate for the development of sustainable open FL platforms. We aim to provide guidance for establishing such platforms in the future, while identifying potential problems and challenges that need to be addressed.
    摘要 传统的联合学习(FL)采用服务器主导的合作方式,这限制了FL的应用场景和数据持有者的参与积极性。为了充分发挥FL的潜力,我们提倡重新设计当前FL框架,扩展其为更通用的概念:开放联合学习平台。我们提出了两种相互合作的FL框架:查询基于的FL和合同基于的FL。在这篇评论中,我们对构建开放FL平台的技术和法律方面进行了全面的审查。我们开始介绍FL的定义和其内置的局限性,包括服务器客户端集成、低级别模型重用和非公共。在查询基于的FL平台中,我们探讨了许多有价值的话题,包括社区 empowered 的模型分享和重用平台,以及模型查询时的法律合规分析、版权问题和知识产权保护。特别是,我们提出了一种新的分类系统,用于协调FL研究中批处理模型 reuse 方法中的许可证兼容性分析,包括组合、混合、精炼和生成等方法。这种分类系统为在FL研究中复用模型时鉴别相关的许可证条款,并且可以帮助确定复用模型时的可能的法律后果和限制。通过这篇评论,我们揭示了当前FL面临的困境,并提倡开发可持续的开放FL平台。我们希望通过这篇评论,为未来建立开放FL平台提供指南,并识别可能的问题和挑战。

Implicit Differentiation for Hyperparameter Tuning the Weighted Graphical Lasso

  • paper_url: http://arxiv.org/abs/2307.02130
  • repo_url: None
  • paper_authors: Can Pouliquen, Paulo Gonçalves, Mathurin Massias, Titouan Vayer
  • for: 提出一种 Framework 和算法来调整图像隐藏常量的超参数。
  • methods: 使用一种精简型搜索方法来解决一个笛卡尔级别优化问题。
  • results: derivation of the Jacobian of the Graphical Lasso solution with respect to its regularization hyperparameters.
    Abstract We provide a framework and algorithm for tuning the hyperparameters of the Graphical Lasso via a bilevel optimization problem solved with a first-order method. In particular, we derive the Jacobian of the Graphical Lasso solution with respect to its regularization hyperparameters.
    摘要 我们提供了一个框架和算法,用于通过一个以一阶方法求解的双层优化问题来调节 Graphical Lasso 的超参数。特别地,我们推导了 Graphical Lasso 解关于其正则化超参数的雅可比矩阵(Jacobian)。
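For contrast with the implicit-differentiation approach, the sketch below estimates the same hypergradient naively by finite differences of a held-out negative log-likelihood with respect to the regularization strength, using scikit-learn's GraphicalLasso; the data-generating setup and validation objective are assumptions.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso
from sklearn.datasets import make_sparse_spd_matrix

# Naive finite-difference hypergradient w.r.t. alpha. The paper instead derives
# the Jacobian of the Graphical Lasso solution analytically, which is what makes
# first-order hyperparameter tuning practical; this only shows what that
# gradient estimates.
rng = np.random.default_rng(0)
prec = make_sparse_spd_matrix(10, alpha=0.9, random_state=0)
X = rng.multivariate_normal(np.zeros(10), np.linalg.inv(prec), size=400)
X_tr, X_val = X[:300], X[300:]

def val_loss(alpha):
    model = GraphicalLasso(alpha=alpha, max_iter=200).fit(X_tr)
    S_val = np.cov(X_val, rowvar=False)
    P = model.precision_
    return np.trace(S_val @ P) - np.linalg.slogdet(P)[1]   # held-out negative log-likelihood

eps, a = 1e-3, 0.1
grad = (val_loss(a + eps) - val_loss(a - eps)) / (2 * eps)
print("d loss / d alpha at alpha=0.1:", grad)
```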

How Deep Neural Networks Learn Compositional Data: The Random Hierarchy Model

  • paper_url: http://arxiv.org/abs/2307.02129
  • repo_url: https://github.com/pcsl-epfl/hierarchy-learning
  • paper_authors: Leonardo Petrini, Francesco Cagnetta, Umberto M. Tomasini, Alessandro Favero, Matthieu Wyart
  • for: 这个论文的目的是解释深度卷积神经网络如何在高维度数据上学习普遍的任务。
  • methods: 这个论文使用的方法是使用深度卷积神经网络来学习Random Hierarchy Model,这是一个模拟真实数据的简单分类任务。
  • results: 研究发现,深度卷积神经网络需要的训练数据量($P^*$)与高维度数据中类别的数量($n_c$)和高级特征的数量($m$)以及重复层数($L$)有关,具体来说,$P^*$的增长率为$n_c m^L$,只是增长平方根。此外,研究还发现,当训练数据量够多时,深度卷积神经网络的表征将变得对于同义词替换无关,并且可以捕捉低级特征与类别之间的相关性。
    Abstract Learning generic high-dimensional tasks is notably hard, as it requires a number of training data exponential in the dimension. Yet, deep convolutional neural networks (CNNs) have shown remarkable success in overcoming this challenge. A popular hypothesis is that learnable tasks are highly structured and that CNNs leverage this structure to build a low-dimensional representation of the data. However, little is known about how much training data they require, and how this number depends on the data structure. This paper answers this question for a simple classification task that seeks to capture relevant aspects of real data: the Random Hierarchy Model. In this model, each of the $n_c$ classes corresponds to $m$ synonymic compositions of high-level features, which are in turn composed of sub-features through an iterative process repeated $L$ times. We find that the number of training data $P^*$ required by deep CNNs to learn this task (i) grows asymptotically as $n_c m^L$, which is only polynomial in the input dimensionality; (ii) coincides with the training set size such that the representation of a trained network becomes invariant to exchanges of synonyms; (iii) corresponds to the number of data at which the correlations between low-level features and classes become detectable. Overall, our results indicate how deep CNNs can overcome the curse of dimensionality by building invariant representations, and provide an estimate of the number of data required to learn a task based on its hierarchically compositional structure.
    摘要 学习一般的高维任务非常困难,因为所需的训练数据量随维度呈指数增长。然而,深度卷积神经网络(CNN)在克服这一挑战上表现出了显著的成功。一种流行的假设是:可学习的任务是高度结构化的,而 CNN 能利用这种结构建立数据的低维表示。然而,它们需要多少训练数据、以及这一数量如何依赖于数据结构,目前知之甚少。本文针对一个旨在刻画真实数据关键特征的简单分类任务,即随机层级模型(Random Hierarchy Model),回答了这一问题。在该模型中,$n_c$ 个类别中的每一个都对应于高层特征的 $m$ 个同义组合,而这些特征又通过重复 $L$ 次的迭代过程由子特征组合而成。我们发现,深度 CNN 学习该任务所需的训练数据量 $P^*$:(i)渐近地按 $n_c m^L$ 增长,这只是输入维度的多项式量级;(ii)恰好等于使训练后网络的表示对同义词交换保持不变所需的训练集规模;(iii)对应于低层特征与类别之间的相关性变得可检测时的数据量。总体而言,我们的结果说明了深度 CNN 如何通过构建不变表示来克服维数灾难,并基于任务的层级组合结构给出了学习该任务所需数据量的估计。
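A toy generator for data of this hierarchical-compositional kind, assuming per-level production rules with m synonymic s-tuples; vocabulary sizes and the rule format are simplifications of the paper's Random Hierarchy Model.

```python
import numpy as np

def random_hierarchy_sample(label, rules, L, rng):
    """Sample one input from a toy hierarchical model: each symbol expands into
    one of m synonymic tuples of lower-level symbols, repeated over L levels,
    so the class label determines the leaf features only compositionally."""
    symbols = [label]
    for level in range(L):
        expanded = []
        for sym in symbols:
            productions = rules[level][sym]              # m alternative tuples
            expanded.extend(productions[rng.integers(len(productions))])
        symbols = expanded
    return np.array(symbols)

v, m, s, L = 4, 2, 2, 3        # vocabulary size, synonyms, tuple size, depth
rng = np.random.default_rng(0)
rules = [{sym: [tuple(rng.integers(v, size=s)) for _ in range(m)] for sym in range(v)}
         for _ in range(L)]
x = random_hierarchy_sample(label=1, rules=rules, L=L, rng=rng)
print(x)                        # leaf features of length s**L
```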

Robust Graph Structure Learning with the Alignment of Features and Adjacency Matrix

  • paper_url: http://arxiv.org/abs/2307.02126
  • repo_url: None
  • paper_authors: Shaogao Lv, Gang Wen, Shiyu Liu, Linsen Wei, Ming Li
  • for: 提高图 neural network 的 robustness,jointly 学习干净图结构和对应表示。
  • methods: 提出了一种新的准则化 graph structure learning 方法,利用特征信息和图信息的协调,基于我们 derive的节点级 Rademacher 复杂性下界。还具有减少维度的稀疏降维方法,使用低维度的节点特征来利用图结构。
  • results: 对实际图据进行了实验,表明我们提出的 GSL 方法在受到噪声影响的图结构下表现出色,超过了多种竞争对手。
    Abstract To improve the robustness of graph neural networks (GNN), graph structure learning (GSL) has attracted great interest due to the pervasiveness of noise in graph data. Many approaches have been proposed for GSL to jointly learn a clean graph structure and corresponding representations. To extend the previous work, this paper proposes a novel regularized GSL approach, particularly with an alignment of feature information and graph information, which is motivated mainly by our derived lower bound of node-level Rademacher complexity for GNNs. Additionally, our proposed approach incorporates sparse dimensional reduction to leverage low-dimensional node features that are relevant to the graph structure. To evaluate the effectiveness of our approach, we conduct experiments on real-world graphs. The results demonstrate that our proposed GSL method outperforms several competitive baselines, especially in scenarios where the graph structures are heavily affected by noise. Overall, our research highlights the importance of integrating feature and graph information alignment in GSL, as inspired by our derived theoretical result, and showcases the superiority of our approach in handling noisy graph structures through comprehensive experiments on real-world datasets.
    摘要 为提升图神经网络(GNN)的鲁棒性,图结构学习(GSL)因图数据中普遍存在的噪声而受到广泛关注。已有许多 GSL 方法被提出,用于同时学习干净的图结构及相应的表示。在已有工作的基础上,本文提出了一种新的带正则化的 GSL 方法,其特点是对特征信息与图信息进行对齐,其动机主要来自我们推导的 GNN 节点级 Rademacher 复杂度下界。此外,所提方法结合了稀疏降维,以利用与图结构相关的低维节点特征。为评估方法的有效性,我们在真实图数据上进行了实验。结果表明,所提出的 GSL 方法优于多个有竞争力的基线方法,尤其是在图结构受噪声严重影响的情形下。总体而言,本研究强调了在 GSL 中融合特征与图信息对齐的重要性(受我们理论结果的启发),并通过在真实数据集上的全面实验展示了该方法在处理含噪图结构方面的优势。

Multi-Scale U-Shape MLP for Hyperspectral Image Classification

  • paper_url: http://arxiv.org/abs/2307.10186
  • repo_url: None
  • paper_authors: Moule Lin, Weipeng Jing, Donglin Di, Guangsheng Chen, Houbing Song
  • for: 该研究旨在提出一种基于多尺度U型多层感知器(MUMLP)模型,以提高 гиперспектраль图像中像素的标识率。
  • methods: 该模型由设计的多尺度渠道(MSC)块和U型多层感知器(UMLP)结构组成。 MSC将通道维度变换并混合 spectral band 特征,以生成深度水平的表示。 UMLP 由encoder-decoder结构和多层感知器层组成,能够压缩大规模参数。
  • results: 对于三个公共数据集(Pavia University、Houston 2013和Houston 2018),研究人员进行了广泛的实验,并证明了该模型可以在多种预测任务中卓越于现状顶尖方法。
    Abstract Hyperspectral images have significant applications in various domains, since they register numerous semantic and spatial information in the spectral band with spatial variability of spectral signatures. Two critical challenges in identifying pixels of the hyperspectral image are respectively representing the correlated information among the local and global, as well as the abundant parameters of the model. To tackle this challenge, we propose a Multi-Scale U-shape Multi-Layer Perceptron (MUMLP) a model consisting of the designed MSC (Multi-Scale Channel) block and the UMLP (U-shape Multi-Layer Perceptron) structure. MSC transforms the channel dimension and mixes spectral band feature to embed the deep-level representation adequately. UMLP is designed by the encoder-decoder structure with multi-layer perceptron layers, which is capable of compressing the large-scale parameters. Extensive experiments are conducted to demonstrate our model can outperform state-of-the-art methods across-the-board on three wide-adopted public datasets, namely Pavia University, Houston 2013 and Houston 2018
    摘要 高光谱图像在诸多领域有着重要应用,因为其在光谱波段中记录了大量语义与空间信息,且光谱特征具有空间变化性。识别高光谱图像像素面临两大关键挑战:其一是表示局部与全局之间的相关信息,其二是模型参数量庞大。为应对这些挑战,我们提出了多尺度 U 形多层感知机(MUMLP)模型,由设计的多尺度通道(MSC)模块与 U 形多层感知机(UMLP)结构组成。MSC 模块变换通道维度并混合光谱波段特征,以充分嵌入深层表示;UMLP 采用编码器-解码器结构与多层感知机层,能够压缩大规模参数。大量实验表明,我们的模型在三个广泛使用的公开数据集(Pavia University、Houston 2013 与 Houston 2018)上全面超越了当前最先进的方法。

Proportional Response: Contextual Bandits for Simple and Cumulative Regret Minimization

  • paper_url: http://arxiv.org/abs/2307.02108
  • repo_url: None
  • paper_authors: Sanath Kumar Krishnamurthy, Ruohan Zhan, Susan Athey, Emma Brunskill
  • for: 这篇 paper 的目的是提出一种 computationally efficient bandit algorithm 来实现 contextual bandit 的 optimal treatment assignment policy,并且可以适应 cumulative regret minimization 和 simple regret minimization 两种不同的目标。
  • methods: 这篇 paper 使用了一种新的 family of computationally efficient bandit algorithms,这些算法可以适应 contextual bandit 的条件下的模型错误和统计不确定性,并且可以在 continuous arm settings 中进行应用。这些算法基于 “conformal arm sets” (CASs) 的构造和依赖,CASs 提供了每个 context 中的一个包含 context-specific optimal arm 的集合,以 guaranteee 最小化 regret。
  • results: 这篇 paper 的实验结果显示了这些算法在 simple regret 和 cumulative regret 上都有优秀的表现,并且可以适应 contextual bandit 的不同条件下。此外,paper 还证明了一个 negative result,即一个 algorithm 无法同时 achiev instance-dependent simple regret guarantees 和 minimax optimal cumulative regret guarantees。
    Abstract Simple regret minimization is a critical problem in learning optimal treatment assignment policies across various domains, including healthcare and e-commerce. However, it remains understudied in the contextual bandit setting. We propose a new family of computationally efficient bandit algorithms for the stochastic contextual bandit settings, with the flexibility to be adapted for cumulative regret minimization (with near-optimal minimax guarantees) and simple regret minimization (with SOTA guarantees). Furthermore, our algorithms adapt to model misspecification and extend to the continuous arm settings. These advantages come from constructing and relying on "conformal arm sets" (CASs), which provide a set of arms at every context that encompass the context-specific optimal arm with some probability across the context distribution. Our positive results on simple and cumulative regret guarantees are contrasted by a negative result, which shows that an algorithm can't achieve instance-dependent simple regret guarantees while simultaneously achieving minimax optimal cumulative regret guarantees.
    摘要 在医疗、电子商务等多个领域中,简单遗憾(simple regret)最小化是学习最优处理分配策略的关键问题,然而它在上下文赌博机(contextual bandit)设定下仍缺乏研究。我们为随机上下文赌博机设定提出了一族新的、计算高效的算法,它们既可用于累积遗憾最小化(具有近似极小极大最优保证),也可用于简单遗憾最小化(达到当前最佳保证)。此外,我们的算法能够适应模型设定偏差,并可扩展到连续臂的设定。这些优势来自于构造并依赖 conformal arm sets(CAS):它们在每个上下文下给出一组臂,使其以一定概率(在上下文分布意义下)包含该上下文对应的最优臂。与简单遗憾和累积遗憾方面的正面结果相对照,我们还给出了一个负面结果:任何算法都无法在实现依赖于实例的简单遗憾保证的同时,达到极小极大最优的累积遗憾保证。

SoK: Privacy-Preserving Data Synthesis

  • paper_url: http://arxiv.org/abs/2307.02106
  • repo_url: None
  • paper_authors: Yuzheng Hu, Fan Wu, Qinbin Li, Yunhui Long, Gonzalo Munilla Garrido, Chang Ge, Bolin Ding, David Forsyth, Bo Li, Dawn Song
  • for: 本研究旨在提供一份概述、分析和讨论隐私保护数据分析(PPDS)领域的综述,以便回答有关PPDS方法的设计原则、分类、优缺点等问题。
  • methods: 本研究批判了两种主流PPDS方法:统计方法和深度学习(DL)基于方法。统计方法包括模型和表示方法的选择,而DL基于方法则包括不同生成模型原理。此外,我们还提供了参考表格、概括结论和开放问题。
  • results: 我们对私人图像生成任务进行了 benchmarking,并确定了DP-MERF是一种通用的方法。此外,我们还系统化了过去十年的研究成果,并提出了未来研究方向和对研究人员的呼吁。
    Abstract As the prevalence of data analysis grows, safeguarding data privacy has become a paramount concern. Consequently, there has been an upsurge in the development of mechanisms aimed at privacy-preserving data analyses. However, these approaches are task-specific; designing algorithms for new tasks is a cumbersome process. As an alternative, one can create synthetic data that is (ideally) devoid of private information. This paper focuses on privacy-preserving data synthesis (PPDS) by providing a comprehensive overview, analysis, and discussion of the field. Specifically, we put forth a master recipe that unifies two prominent strands of research in PPDS: statistical methods and deep learning (DL)-based methods. Under the master recipe, we further dissect the statistical methods into choices of modeling and representation, and investigate the DL-based methods by different generative modeling principles. To consolidate our findings, we provide comprehensive reference tables, distill key takeaways, and identify open problems in the existing literature. In doing so, we aim to answer the following questions: What are the design principles behind different PPDS methods? How can we categorize these methods, and what are the advantages and disadvantages associated with each category? Can we provide guidelines for method selection in different real-world scenarios? We proceed to benchmark several prominent DL-based methods on the task of private image synthesis and conclude that DP-MERF is an all-purpose approach. Finally, upon systematizing the work over the past decade, we identify future directions and call for actions from researchers.
    摘要 随着数据分析的普及,保护数据隐私已成为首要的关注点。因此,在数据分析中保持隐私的机制的开发呈现了增加趋势。然而,这些方法都是任务特定的,设计新任务的算法是一个繁琐的过程。为了解决这问题,可以创建没有隐私信息的Synthetic Data。本文关注于隐私保护数据合成(PPDS),提供了全面的概述、分析和讨论。特别是,我们提出了一种综合方法,称为“master recipe”,可以统一两个PPDS研究的主要流派:统计方法和深度学习(DL)基本方法。在master recipe下,我们进一步剖析统计方法,包括模型和表示方法的选择,并investigate DL基本模型的不同原则。为了归纳我们的发现,我们提供了完整的参考表格,概括关键点,并识别现有文献中的开放问题。因此,我们想回答以下问题:PPDS方法的设计原则是什么?如何分类这些方法,它们具有什么优势和缺点?是否可以提供实际应用场景中的方法选择指南?我们继续使用DP-MERF方法进行私人图像生成测试,并证明它是一种通用的方法。最后,我们系统化过去十年的工作,并提出未来方向和研究者的呼吁。

DARE: Towards Robust Text Explanations in Biomedical and Healthcare Applications

  • paper_url: http://arxiv.org/abs/2307.02094
  • repo_url: https://github.com/ibm/domain-adaptive-attribution-robustness
  • paper_authors: Adam Ivankay, Mattia Rigotti, Pascal Frossard
  • for: This paper aims to provide a better understanding of the robustness of deep neural network explanations in the biomedical domain.
  • methods: The paper proposes a new approach called DomainAdaptiveAREstimator (DARE) to estimate the attribution robustness of explanations in the biomedical domain. DARE takes into account domain-specific plausibility to ensure that the explanations are both accurate and relevant to the domain experts.
  • results: The paper presents two methods, adversarial training and FAR training, to mitigate the brittleness of explanations in the biomedical domain. The proposed methods are validated through extensive experiments on three established biomedical benchmarks.
    Abstract Along with the successful deployment of deep neural networks in several application domains, the need to unravel the black-box nature of these networks has seen a significant increase recently. Several methods have been introduced to provide insight into the inference process of deep neural networks. However, most of these explainability methods have been shown to be brittle in the face of adversarial perturbations of their inputs in the image and generic textual domain. In this work we show that this phenomenon extends to specific and important high stakes domains like biomedical datasets. In particular, we observe that the robustness of explanations should be characterized in terms of the accuracy of the explanation in linking a model's inputs and its decisions - faithfulness - and its relevance from the perspective of domain experts - plausibility. This is crucial to prevent explanations that are inaccurate but still look convincing in the context of the domain at hand. To this end, we show how to adapt current attribution robustness estimation methods to a given domain, so as to take into account domain-specific plausibility. This results in our DomainAdaptiveAREstimator (DARE) attribution robustness estimator, allowing us to properly characterize the domain-specific robustness of faithful explanations. Next, we provide two methods, adversarial training and FAR training, to mitigate the brittleness characterized by DARE, allowing us to train networks that display robust attributions. Finally, we empirically validate our methods with extensive experiments on three established biomedical benchmarks.
    摘要 随着深度神经网络在多个应用领域的成功部署,揭开其黑箱特性的需求近来显著增加,许多方法被提出以洞察深度神经网络的推断过程。然而,其中大多数可解释性方法在图像与一般文本领域中都已被证明在输入遭受对抗扰动时十分脆弱。在这项工作中,我们表明这一现象同样存在于生物医学数据集等重要的高风险领域。特别地,我们认为解释的鲁棒性应从两方面刻画:解释在关联模型输入与其决策上的准确性(忠实性,faithfulness),以及从领域专家角度看的合理性(plausibility)。这对于避免那些不准确、却在特定领域语境下看似可信的解释至关重要。为此,我们展示了如何将现有的归因鲁棒性估计方法适配到给定领域,以纳入领域特定的合理性,从而得到我们的 DomainAdaptiveAREstimator(DARE)归因鲁棒性估计器,能够恰当刻画忠实解释在特定领域下的鲁棒性。随后,我们给出两种方法,即对抗训练与 FAR 训练,来缓解 DARE 所刻画的脆弱性,从而训练出具有鲁棒归因的网络。最后,我们在三个成熟的生物医学基准上通过大量实验对方法进行了实证验证。

Make A Long Image Short: Adaptive Token Length for Vision Transformers

  • paper_url: http://arxiv.org/abs/2307.02092
  • repo_url: None
  • paper_authors: Qiqi Zhou, Yichen Zhu
  • for: 提高预测速度,减少计算成本
  • methods: 提出了一种适应测试时动态调整图像token长的方法,包括训练一个可变长度ViT模型和使用一个轻量级的Token长分配器(TLA)来分配最优的token长度
  • results: 实现了对多种现代视Transformer架构的减少计算成本,并在图像分类和动作识别任务上验证了方法的有效性
    Abstract The vision transformer is a model that breaks down each image into a sequence of tokens with a fixed length and processes them similarly to words in natural language processing. Although increasing the number of tokens typically results in better performance, it also leads to a considerable increase in computational cost. Motivated by the saying "A picture is worth a thousand words," we propose an innovative approach to accelerate the ViT model by shortening long images. Specifically, we introduce a method for adaptively assigning token length for each image at test time to accelerate inference speed. First, we train a Resizable-ViT (ReViT) model capable of processing input with diverse token lengths. Next, we extract token-length labels from ReViT that indicate the minimum number of tokens required to achieve accurate predictions. We then use these labels to train a lightweight Token-Length Assigner (TLA) that allocates the optimal token length for each image during inference. The TLA enables ReViT to process images with the minimum sufficient number of tokens, reducing token numbers in the ViT model and improving inference speed. Our approach is general and compatible with modern vision transformer architectures, significantly reducing computational costs. We verified the effectiveness of our methods on multiple representative ViT models on image classification and action recognition.
    摘要 视觉 Transformer(ViT)将每张图像分解为固定长度的 token 序列,并以类似自然语言处理中处理词语的方式对其进行处理。增加 token 数量通常能提升性能,但也会带来可观的计算开销。受“一图胜千言”的启发,我们提出了一种通过缩短长图来加速 ViT 的创新方法,即在推理阶段为每张图像自适应地分配 token 长度。首先,我们训练了一个能够处理多种 token 长度输入的 Resizable-ViT(ReViT)模型;接着,我们从 ReViT 中提取 token 长度标签,指示达到准确预测所需的最少 token 数;随后,我们用这些标签训练一个轻量级的 Token-Length Assigner(TLA),在推理时为每张图像分配最优的 token 长度。TLA 使 ReViT 能以最少且足够的 token 数处理图像,从而减少 ViT 模型中的 token 数并提升推理速度。我们的方法具有通用性,可与现代视觉 Transformer 架构兼容,显著降低计算开销。我们在多个代表性的 ViT 模型上,就图像分类与动作识别任务验证了方法的有效性。

Combating Confirmation Bias: A Unified Pseudo-Labeling Framework for Entity Alignment

  • paper_url: http://arxiv.org/abs/2307.02075
  • repo_url: None
  • paper_authors: Qijie Ding, Jie Yin, Daokun Zhang, Junbin Gao
  • for: 提高实体对应性预测的准确率,抗衡假标签错误的影响
  • methods: 提出一种独特的 pseudo-labeling 框架(UPL-EA),通过精准的 Transport 模型和跨迭代 pseudo-标签准化来消除 pseudo-标签错误,提高实体对应性预测的准确率
  • results: 实验结果表明,我们的方法可以在有限的先前对应种子基础下达到竞争性的性能,并经过理论支持和实验验证,我们的方法可以减少 Type I 和 Type II pseudo-标签错误的影响
    Abstract Entity alignment (EA) aims at identifying equivalent entity pairs across different knowledge graphs (KGs) that refer to the same real-world identity. To systematically combat confirmation bias for pseudo-labeling-based entity alignment, we propose a Unified Pseudo-Labeling framework for Entity Alignment (UPL-EA) that explicitly eliminates pseudo-labeling errors to boost the accuracy of entity alignment. UPL-EA consists of two complementary components: (1) The Optimal Transport (OT)-based pseudo-labeling uses discrete OT modeling as an effective means to enable more accurate determination of entity correspondences across two KGs and to mitigate the adverse impact of erroneous matches. A simple but highly effective criterion is further devised to derive pseudo-labeled entity pairs that satisfy one-to-one correspondences at each iteration. (2) The cross-iteration pseudo-label calibration operates across multiple consecutive iterations to further improve the pseudo-labeling precision rate by reducing the local pseudo-label selection variability with a theoretical guarantee. The two components are respectively designed to eliminate Type I and Type II pseudo-labeling errors identified through our analyse. The calibrated pseudo-labels are thereafter used to augment prior alignment seeds to reinforce subsequent model training for alignment inference. The effectiveness of UPL-EA in eliminating pseudo-labeling errors is both theoretically supported and experimentally validated. The experimental results show that our approach achieves competitive performance with limited prior alignment seeds.
    摘要 实体对齐(EA)旨在识别不同知识图谱(KG)中指向同一真实世界身份的等价实体对。为系统性地对抗基于伪标签的实体对齐中的确认偏差,我们提出了统一伪标签实体对齐框架(UPL-EA),通过显式消除伪标签错误来提升实体对齐的准确率。UPL-EA 包含两个互补的部分:(1)基于最优传输(OT)的伪标签生成,利用离散 OT 建模更准确地确定两个 KG 之间的实体对应关系,并缓解错误匹配带来的不利影响;我们还设计了一个简单而高效的准则,在每次迭代中导出满足一一对应的伪标签实体对。(2)跨迭代伪标签校准,跨多个连续迭代进一步提升伪标签的精确率,在理论保证下降低局部伪标签选择的波动性。这两个部分分别用于消除我们分析中识别出的第一类与第二类伪标签错误。校准后的伪标签随后用于扩充先验对齐种子,以强化后续用于对齐推断的模型训练。UPL-EA 在消除伪标签错误方面的有效性既有理论支持,也经过了实验验证。实验结果表明,我们的方法在先验对齐种子有限的情况下仍取得了有竞争力的性能。
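For equal-sized entity sets with uniform marginals, exact discrete OT reduces to an assignment problem, so a minimal sketch of one-to-one pseudo-labelling can be written with a Hungarian solver; the embedding distance, threshold, and filtering rule below are assumptions, and UPL-EA additionally performs cross-iteration calibration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def ot_pseudo_labels(emb_a, emb_b, threshold=0.5):
    """Illustrative one-to-one pseudo-labelling: solve an assignment problem on
    the cross-KG embedding distance matrix and keep only confident pairs."""
    cost = np.linalg.norm(emb_a[:, None, :] - emb_b[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)      # one-to-one correspondences
    keep = cost[rows, cols] < threshold           # drop unreliable matches
    return list(zip(rows[keep], cols[keep]))

rng = np.random.default_rng(0)
emb_a = rng.normal(size=(5, 8))
emb_b = emb_a[[2, 0, 1, 4, 3]] + 0.01 * rng.normal(size=(5, 8))
print(ot_pseudo_labels(emb_a, emb_b))             # recovers the hidden permutation
```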

Performance Modeling of Data Storage Systems using Generative Models

  • paper_url: http://arxiv.org/abs/2307.02073
  • repo_url: https://github.com/djdprogramming/adfa2
  • paper_authors: Abdalaziz Rashid Al-Maeeni, Aziz Temirkhanov, Artem Ryzhikov, Mikhail Hushchyn
  • for: 这 paper 是用于高精度模型系统的研究。
  • methods: 这 paper 使用机器学习基于生成模型来构建存储系统模型。
  • results: 实验结果显示,该模型可以对系统性能(IOPS 和响应时间)做出高精度预测,误差率分别在 4-10% 和 3-16% 之间,且预测结果与 Little's law 的 Pearson 相关系数高达 0.99。此外,文章还提供了可用于基准测试机器学习回归算法、条件生成模型和不确定性估计方法的新数据集。
    Abstract High-precision modeling of systems is one of the main areas of industrial data analysis. Models of systems, their digital twins, are used to predict their behavior under various conditions. We have developed several models of a storage system using machine learning-based generative models. The system consists of several components: hard disk drive (HDD) and solid-state drive (SSD) storage pools with different RAID schemes and cache. Each storage component is represented by a probabilistic model that describes the probability distribution of the component performance in terms of IOPS and latency, depending on their configuration and external data load parameters. The results of the experiments demonstrate the errors of 4-10 % for IOPS and 3-16 % for latency predictions depending on the components and models of the system. The predictions show up to 0.99 Pearson correlation with Little's law, which can be used for unsupervised reliability checks of the models. In addition, we present novel data sets that can be used for benchmarking regression algorithms, conditional generative models, and uncertainty estimation methods in machine learning.
    摘要 系统的高精度建模是工业数据分析的主要方向之一。系统模型(即其数字孪生)用于预测系统在各种条件下的行为。我们使用基于机器学习的生成模型,为一套存储系统构建了多个模型。该系统由多个组件构成:采用不同 RAID 方案的硬盘(HDD)与固态硬盘(SSD)存储池,以及缓存。每个存储组件都由一个概率模型表示,用于描述在给定配置与外部数据负载参数下,该组件在 IOPS 与时延方面的性能概率分布。实验结果显示,依组件与模型的不同,IOPS 预测误差为 4-10%,时延预测误差为 3-16%;预测结果与利特尔法则(Little's law)之间的 Pearson 相关系数最高可达 0.99,可用于对模型进行无监督的可靠性检查。此外,我们还提供了可用于基准测试回归算法、条件生成模型与不确定性估计方法的新数据集。
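A toy version of the Little's-law consistency check mentioned in the abstract: predicted IOPS times predicted latency should track the offered queue depth. The saturating load model and noise level below are synthetic, only to show the check itself.

```python
import numpy as np
from scipy.stats import pearsonr

# Under Little's law, throughput (IOPS) times mean latency equals the number of
# outstanding requests, so a model's joint IOPS/latency predictions should
# reproduce the offered queue depth; high correlation is a cheap sanity check.
rng = np.random.default_rng(0)
queue_depth = rng.integers(1, 64, size=200).astype(float)
true_iops = 5000 * queue_depth / (queue_depth + 8)           # saturating throughput
iops_pred = true_iops * (1 + 0.03 * rng.normal(size=200))    # noisy model outputs
lat_pred = queue_depth / true_iops * (1 + 0.03 * rng.normal(size=200))
r, _ = pearsonr(queue_depth, iops_pred * lat_pred)
print(f"Pearson correlation with Little's law: {r:.3f}")
```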

A Comparison of Machine Learning Methods for Data with High-Cardinality Categorical Variables

  • paper_url: http://arxiv.org/abs/2307.02071
  • repo_url: https://github.com/fabsig/compare_ml_highcardinality_categorical_variables
  • paper_authors: Fabio Sigrist
  • for: 这篇论文主要研究高Cardinality categorical variables的机器学习模型。
  • methods: 论文使用了树融合和深度神经网络两种机器学习方法,以及线性混合效应模型。
  • results: 研究发现,机器学习模型带有随机效应的版本比 классиical版本更高的预测精度。此外,树融合带有随机效应的版本也比深度神经网络带有随机效应的版本更高的预测精度。
    Abstract High-cardinality categorical variables are variables for which the number of different levels is large relative to the sample size of a data set, or in other words, there are few data points per level. Machine learning methods can have difficulties with high-cardinality variables. In this article, we empirically compare several versions of two of the most successful machine learning methods, tree-boosting and deep neural networks, and linear mixed effects models using multiple tabular data sets with high-cardinality categorical variables. We find that, first, machine learning models with random effects have higher prediction accuracy than their classical counterparts without random effects, and, second, tree-boosting with random effects outperforms deep neural networks with random effects.
    摘要 高基数类别变量是指相对于数据集的样本量而言,不同水平数量很多的变量,换言之,每个水平上的数据点很少。机器学习方法在处理高基数类别变量时可能遇到困难。本文在多个包含高基数类别变量的表格数据集上,对两类最成功的机器学习方法(树提升与深度神经网络)的若干变体以及线性混合效应模型进行了实证比较。我们发现:第一,带随机效应的机器学习模型比不带随机效应的经典版本具有更高的预测精度;第二,带随机效应的树提升优于带随机效应的深度神经网络。

Universal Rates for Multiclass Learning

  • paper_url: http://arxiv.org/abs/2307.02066
  • repo_url: https://github.com/Machinfy/Human-Activity-Recognition-with-Smartphones
  • paper_authors: Steve Hanneke, Shay Moran, Qian Zhang
  • for: 这个论文是为了研究多类分类的普适率而写的。
  • methods: 这篇论文使用了 pseudo-cubes 和 DSL 树来研究多类分类的学习问题。
  • results: 这篇论文提出了一个普适率 bound,解决了 Kalavasis 等人(2022)对多类分类问题的开问。 Additionally, the paper shows that any class with an infinite Littlestone tree requires arbitrarily slow rates, while any class with a near-linear rate must have no infinite DSL tree.
    Abstract We study universal rates for multiclass classification, establishing the optimal rates (up to log factors) for all hypothesis classes. This generalizes previous results on binary classification (Bousquet, Hanneke, Moran, van Handel, and Yehudayoff, 2021), and resolves an open question studied by Kalavasis, Velegkas, and Karbasi (2022) who handled the multiclass setting with a bounded number of class labels. In contrast, our result applies for any countable label space. Even for finite label space, our proofs provide a more precise bounds on the learning curves, as they do not depend on the number of labels. Specifically, we show that any class admits exponential rates if and only if it has no infinite Littlestone tree, and admits (near-)linear rates if and only if it has no infinite Daniely-Shalev-Shwartz-Littleston (DSL) tree, and otherwise requires arbitrarily slow rates. DSL trees are a new structure we define in this work, in which each node of the tree is given by a pseudo-cube of possible classifications of a given set of points. Pseudo-cubes are a structure, rooted in the work of Daniely and Shalev-Shwartz (2014), and recently shown by Brukhim, Carmon, Dinur, Moran, and Yehudayoff (2022) to characterize PAC learnability (i.e., uniform rates) for multiclass classification. We also resolve an open question of Kalavasis, Velegkas, and Karbasi (2022) regarding the equivalence of classes having infinite Graph-Littlestone (GL) trees versus infinite Natarajan-Littlestone (NL) trees, showing that they are indeed equivalent.
    摘要 我们研究多类分类的普适率(universal rates),对所有假设类确定了(至多相差对数因子的)最优速率。这推广了此前关于二分类的结果(Bousquet, Hanneke, Moran, van Handel, and Yehudayoff, 2021),并解决了 Kalavasis, Velegkas, and Karbasi (2022) 研究的一个公开问题,他们只处理了类别标签数量有界的多类情形;与之相比,我们的结果适用于任意可数的标签空间。即使标签空间有限,我们的证明也给出了更精确的学习曲线界,因为这些界不依赖于标签数量。具体而言,我们证明:任何假设类当且仅当不存在无限 Littlestone 树时才具有指数速率;当且仅当不存在无限 Daniely-Shalev-Shwartz-Littlestone(DSL)树时才具有(近)线性速率;否则其速率可以任意慢。DSL 树是我们在本文中定义的新结构,树的每个节点由给定点集上可能分类构成的伪立方体(pseudo-cube)给出。伪立方体这一结构源于 Daniely and Shalev-Shwartz (2014) 的工作,最近被 Brukhim, Carmon, Dinur, Moran, and Yehudayoff (2022) 证明刻画了多类分类的 PAC 可学习性(即一致速率)。我们还解决了 Kalavasis, Velegkas, and Karbasi (2022) 关于具有无限 Graph-Littlestone(GL)树的类与具有无限 Natarajan-Littlestone(NL)树的类是否等价的公开问题,证明两者确实等价。

Line Graphics Digitization: A Step Towards Full Automation

  • paper_url: http://arxiv.org/abs/2307.02065
  • repo_url: https://github.com/moured/document-graphics-digitization
  • paper_authors: Omar Moured, Jiaming Zhang, Alina Roitberg, Thorsten Schwarz, Rainer Stiefelhagen
  • for: 本研究旨在提高数字化文档的可访问性和可重现性,特别是对于数据统计图表的自动化涂抹和文本内容的研究已经是长期的焦点。
  • methods: 本文引入了细致的数学图表视觉理解任务,并提供了Line Graphics(LG)数据集,包括5种粗细类别的像素级别注解。我们的数据集包括450份来自不同领域的文档,共520张数据图像。
  • results: 我们在7种当前顶峰模型中测试了LG数据集,并发现这些模型在数据统计图表的Semantic Segmentation和Object Detection任务中的表现。为了推动数据统计图表的数字化进程,我们将会在社区内分享数据集、代码和模型。
    Abstract The digitization of documents allows for wider accessibility and reproducibility. While automatic digitization of document layout and text content has been a long-standing focus of research, this problem in regard to graphical elements, such as statistical plots, has been under-explored. In this paper, we introduce the task of fine-grained visual understanding of mathematical graphics and present the Line Graphics (LG) dataset, which includes pixel-wise annotations of 5 coarse and 10 fine-grained categories. Our dataset covers 520 images of mathematical graphics collected from 450 documents from different disciplines. Our proposed dataset can support two different computer vision tasks, i.e., semantic segmentation and object detection. To benchmark our LG dataset, we explore 7 state-of-the-art models. To foster further research on the digitization of statistical graphs, we will make the dataset, code, and models publicly available to the community.
    摘要 文档数字化有助于更广泛的获取与复现。文档版面与文本内容的自动数字化长期以来是研究重点,但针对统计图等图形元素的数字化问题却少有探索。本文引入数学图形的细粒度视觉理解任务,并提出 Line Graphics(LG)数据集,其中包含 5 个粗粒度与 10 个细粒度类别的像素级标注。该数据集涵盖从 450 份不同学科文档中收集的 520 张数学图形图像,可支持语义分割与目标检测两类计算机视觉任务。为建立 LG 数据集的基准,我们评测了 7 个当前最先进的模型。为推动统计图数字化的进一步研究,我们将向社区公开数据集、代码与模型。

Facing off World Model Backbones: RNNs, Transformers, and S4

  • paper_url: http://arxiv.org/abs/2307.02064
  • repo_url: None
  • paper_authors: Fei Deng, Junyeong Park, Sungjin Ahn
  • for: 提高模型基于学习 reinforcement learning(MBRL)代理的能力,增强代理的长期记忆。
  • methods: explore alternative world model backbones,包括Transformers和Structured State Space Sequence(S4)模型,以提高长期记忆。
  • results: S4WM表现出优于Transformer-based world models的长期记忆能力,同时具有更高的训练效率和想象能力。这些结果铺开了开发更强的MBRL代理的道路。
    Abstract World models are a fundamental component in model-based reinforcement learning (MBRL) agents. To perform temporally extended and consistent simulations of the future in partially observable environments, world models need to possess long-term memory. However, state-of-the-art MBRL agents, such as Dreamer, predominantly employ recurrent neural networks (RNNs) as their world model backbone, which have limited memory capacity. In this paper, we seek to explore alternative world model backbones for improving long-term memory. In particular, we investigate the effectiveness of Transformers and Structured State Space Sequence (S4) models, motivated by their remarkable ability to capture long-range dependencies in low-dimensional sequences and their complementary strengths. We propose S4WM, the first S4-based world model that can generate high-dimensional image sequences through latent imagination. Furthermore, we extensively compare RNN-, Transformer-, and S4-based world models across four sets of environments, which we have specifically tailored to assess crucial memory capabilities of world models, including long-term imagination, context-dependent recall, reward prediction, and memory-based reasoning. Our findings demonstrate that S4WM outperforms Transformer-based world models in terms of long-term memory, while exhibiting greater efficiency during training and imagination. These results pave the way for the development of stronger MBRL agents.
    摘要 世界模型是基于模型的强化学习(MBRL)智能体的基础组成部分。为了在部分可观测环境中进行时间跨度长且一致的未来模拟,世界模型需要具备长期记忆。然而,当前最先进的 MBRL 智能体(如 Dreamer)主要采用循环神经网络(RNN)作为世界模型骨干,其记忆容量有限。在本文中,我们探索能改进长期记忆的替代世界模型骨干。具体而言,我们研究了 Transformer 与结构化状态空间序列(S4)模型的有效性,其动机在于它们在低维序列中捕捉长程依赖的出色能力以及二者互补的优势。我们提出了 S4WM,这是首个基于 S4 的世界模型,能够通过潜空间想象生成高维图像序列。此外,我们在四组专门设计的环境中广泛比较了基于 RNN、Transformer 与 S4 的世界模型,这些环境用于评估世界模型的关键记忆能力,包括长期想象、依赖上下文的回忆、奖励预测以及基于记忆的推理。我们的结果表明,S4WM 在长期记忆方面优于基于 Transformer 的世界模型,同时在训练与想象过程中效率更高。这些成果为开发更强的 MBRL 智能体铺平了道路。

Adversarial Attacks on Image Classification Models: FGSM and Patch Attacks and their Impact

  • paper_url: http://arxiv.org/abs/2307.02055
  • repo_url: None
  • paper_authors: Jaydip Sen, Subhasis Dasgupta
  • for: 本文介绍了对图像分类模型使用 adversarial 攻击的概念。
  • methods: 本文讨论了两种常见的 adversarial 攻击方法,即快速梯度签名法 (FGSM) 和 adversarial 贴图攻击。
  • results: 对三种强大预训练的图像分类模型(ResNet-34、GoogleNet、DenseNet-161)进行了攻击性评估,并计算了图像分类任务中模型在攻击和不攻击情况下的分类精度。
    Abstract This chapter introduces the concept of adversarial attacks on image classification models built on convolutional neural networks (CNN). CNNs are very popular deep-learning models which are used in image classification tasks. However, very powerful and pre-trained CNN models working very accurately on image datasets for image classification tasks may perform disastrously when the networks are under adversarial attacks. In this work, two very well-known adversarial attacks are discussed and their impact on the performance of image classifiers is analyzed. These two adversarial attacks are the fast gradient sign method (FGSM) and adversarial patch attack. These attacks are launched on three powerful pre-trained image classifier architectures, ResNet-34, GoogleNet, and DenseNet-161. The classification accuracy of the models in the absence and presence of the two attacks are computed on images from the publicly accessible ImageNet dataset. The results are analyzed to evaluate the impact of the attacks on the image classification task.
    摘要 本章介绍了针对基于卷积神经网络(CNN)的图像分类模型的对抗攻击概念。CNN是图像分类任务中非常流行的深度学习模型。然而,即使是在图像数据集上表现非常准确的强大预训练CNN模型,在遭受对抗攻击时也可能表现得非常糟糕。本文讨论了两种著名的对抗攻击——快速梯度符号法(FGSM)和对抗贴图攻击——并分析了它们对图像分类器性能的影响。这些攻击被施加在三种强大的预训练图像分类器架构(ResNet-34、GoogleNet和DenseNet-161)上。在公开可用的ImageNet数据集图像上,计算了模型在有无这两种攻击情况下的分类精度,并据此分析评估攻击对图像分类任务的影响。
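下面给出FGSM攻击核心步骤 x_adv = x + ε·sign(∇_x L) 的一个最小示意(PyTorch)。其中模型、输入和 ε 取值均为假设的占位,并非论文针对ResNet-34等模型的具体实验配置,仅用于说明该攻击的计算流程。

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=8 / 255):
    """快速梯度符号法(FGSM)的最小示意:x_adv = x + epsilon * sign(grad_x L)。"""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)      # 对真实标签的分类损失
    loss.backward()                          # 计算损失对输入像素的梯度
    x_adv = x + epsilon * x.grad.sign()      # 沿梯度符号方向添加扰动
    return x_adv.clamp(0.0, 1.0).detach()    # 约束到合法像素范围

# 假设的用法:model 可以是任意预训练分类器(如 torchvision 的 resnet34)
# x_adv = fgsm_attack(model, images, labels, epsilon=4 / 255)
```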

Graph Neural Network-based Power Flow Model

  • paper_url: http://arxiv.org/abs/2307.02049
  • repo_url: None
  • paper_authors: Mingjian Tuo, Xingpeng Li, Tianxia Zhao
  • for: 这篇论文的目的是提出一种基于图神经网络(GNN)的电力流计算模型,以提高电力系统中线流计算的准确性和效率。
  • methods: 该模型使用历史电力系统数据进行训练,并使用图神经网络(GNN)模型来预测电力流结果。
  • results: 对比于传统的直流电力流计算模型和深度神经网络(DNN)、卷积神经网络(CNN)模型,该GNN模型能够提供更准确的解决方案,并且高效。
    Abstract Power flow analysis plays a crucial role in examining the electricity flow within a power system network. By performing power flow calculations, the system's steady-state variables, including voltage magnitude, phase angle at each bus, active/reactive power flow across branches, can be determined. While the widely used DC power flow model offers speed and robustness, it may yield inaccurate line flow results for certain transmission lines. This issue becomes more critical when dealing with renewable energy sources such as wind farms, which are often located far from the main grid. Obtaining precise line flow results for these critical lines is vital for next operations. To address these challenges, data-driven approaches leverage historical grid profiles. In this paper, a graph neural network (GNN) model is trained using historical power system data to predict power flow outcomes. The GNN model enables rapid estimation of line flows. A comprehensive performance analysis is conducted, comparing the proposed GNN-based power flow model with the traditional DC power flow model, as well as deep neural network (DNN) and convolutional neural network (CNN). The results on test systems demonstrate that the proposed GNN-based power flow model provides more accurate solutions with high efficiency comparing to benchmark models.
    摘要 潮流分析在考察电力系统网络中的电力流动方面起着关键作用。通过潮流计算,可以确定系统的稳态变量,包括各母线的电压幅值和相角,以及各支路的有功/无功潮流。虽然广泛使用的直流潮流模型具有速度快和稳健的优点,但对某些输电线路可能给出不准确的线路潮流结果。在处理风电场等可再生能源时,这一问题更加突出,因为这些电源往往远离主网。获得这些关键线路的精确潮流结果对后续运行至关重要。为了解决这些挑战,数据驱动方法利用历史电网数据。本文使用历史电力系统数据训练了一种图神经网络(GNN)模型来预测潮流结果,GNN模型可以快速估算线路潮流。我们进行了全面的性能分析,将所提出的基于GNN的潮流模型与传统直流潮流模型以及深度神经网络(DNN)和卷积神经网络(CNN)模型进行了比较。测试系统上的结果表明,所提出的基于GNN的潮流模型能够提供更准确的解,且效率高于基准模型。
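下面是一个"以母线为节点、以支路为边,从节点特征回归支路潮流"这类数据驱动潮流模型的极简消息传递网络示意(PyTorch)。网络结构、特征定义与聚合方式均为假设,并非论文中的具体GNN模型,仅用于说明这一建模思路。

```python
import torch
import torch.nn as nn

class SimpleGNNPowerFlow(nn.Module):
    """极简的两层消息传递网络:节点特征 -> 节点嵌入 -> 每条支路的潮流预测。"""
    def __init__(self, in_dim, hidden=64):
        super().__init__()
        self.lin1 = nn.Linear(in_dim, hidden)
        self.lin2 = nn.Linear(hidden, hidden)
        self.edge_out = nn.Linear(2 * hidden, 1)   # 由支路两端节点嵌入预测线路潮流

    def propagate(self, h, adj):
        # adj: (N, N) 归一化邻接矩阵;一次消息传递 = 邻居聚合 + 非线性
        return torch.relu(adj @ h)

    def forward(self, x, adj, edge_index):
        h = torch.relu(self.lin1(x))
        h = self.propagate(h, adj)
        h = torch.relu(self.lin2(h))
        h = self.propagate(h, adj)
        src, dst = edge_index                      # 每条支路的起止母线编号
        return self.edge_out(torch.cat([h[src], h[dst]], dim=-1)).squeeze(-1)

# 假设的用法:x 为各母线的注入功率等特征,adj 为归一化邻接矩阵,
# edge_index 为 (2, E) 的支路端点索引;以历史潮流结果为监督做 MSE 回归即可。
```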

Fisher-Weighted Merge of Contrastive Learning Models in Sequential Recommendation

  • paper_url: http://arxiv.org/abs/2307.05476
  • repo_url: None
  • paper_authors: Jung Hyun Ryu, Jaeheyoung Jeon, Jewoong Cho, Myungjoo Kang
  • for: 这篇论文主要针对推荐系统中的次序推荐问题,即为用户随时间的偏好进行推荐。
  • methods: 本论文将Fisher合并方法应用于基于对比学习的序列推荐模型,融合多个模型的参数,以提高推荐系统的整体性能。
  • results: 通过广泛的实验,本论文验证了该方法的有效性,并证明其能够推进序列推荐系统的最新水平。
    Abstract Along with the exponential growth of online platforms and services, recommendation systems have become essential for identifying relevant items based on user preferences. The domain of sequential recommendation aims to capture evolving user preferences over time. To address dynamic preference, various contrastive learning methods have been proposed to target data sparsity, a challenge in recommendation systems due to the limited user-item interactions. In this paper, we are the first to apply the Fisher-Merging method to Sequential Recommendation, addressing and resolving practical challenges associated with it. This approach ensures robust fine-tuning by merging the parameters of multiple models, resulting in improved overall performance. Through extensive experiments, we demonstrate the effectiveness of our proposed methods, highlighting their potential to advance the state-of-the-art in sequential learning and recommendation systems.
    摘要 随着在线平台和服务的快速增长,推荐系统已成为根据用户偏好识别相关物品的重要工具。序列推荐领域旨在捕捉用户随时间演变的偏好。为应对由用户-物品交互有限导致的数据稀疏这一推荐系统难题,已有多种对比学习方法被提出。在这篇论文中,我们首次将Fisher合并(Fisher-Merging)方法应用于序列推荐,并解决了与之相关的实际挑战。该方法通过合并多个模型的参数来确保稳健的微调,从而提升整体性能。通过大量实验,我们证明了所提方法的有效性,凸显了其推动序列学习与推荐系统最新水平的潜力。
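以下是Fisher加权参数合并的一个通用最小示意(PyTorch),假设各模型结构相同、且已分别估计出对角Fisher信息;具体到论文在序列推荐中的模型选择与Fisher估计方式此处并未体现,仅作为该合并思路的参考。

```python
import torch

def fisher_weighted_merge(state_dicts, fishers, eps=1e-8):
    """按对角Fisher信息逐元素加权平均多个同构模型的参数:
    theta_merged = sum_i F_i * theta_i / sum_i F_i。"""
    merged = {}
    for name in state_dicts[0]:
        num = torch.zeros_like(state_dicts[0][name], dtype=torch.float32)
        den = torch.zeros_like(num)
        for sd, f in zip(state_dicts, fishers):
            w = f.get(name, torch.ones_like(num))   # 若某参数无Fisher估计,退化为均匀权重
            num += w * sd[name].float()
            den += w
        merged[name] = num / (den + eps)
    return merged

# 假设的用法:fishers[i][name] 可由模型 i 在其训练数据上
# 对数似然梯度平方的期望(对角近似)估计得到。
```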

VertiBench: Advancing Feature Distribution Diversity in Vertical Federated Learning Benchmarks

  • paper_url: http://arxiv.org/abs/2307.02040
  • repo_url: None
  • paper_authors: Zhaomin Wu, Junyi Hou, Bingsheng He
  • for: 本研究旨在解决可用于纵向联邦学习(VFL)性能评估的公开真实数据集匮乏的问题。
  • methods: 本研究考虑了特征重要性和特征相关性这两个关键因素,并提出了相应的评估指标和数据集划分方法。
  • results: 本研究对最先进的VFL算法进行了有效评估,并为未来研究提供了有价值的见解。
    Abstract Vertical Federated Learning (VFL) is a crucial paradigm for training machine learning models on feature-partitioned, distributed data. However, due to privacy restrictions, few public real-world VFL datasets exist for algorithm evaluation, and these represent a limited array of feature distributions. Existing benchmarks often resort to synthetic datasets, derived from arbitrary feature splits from a global set, which only capture a subset of feature distributions, leading to inadequate algorithm performance assessment. This paper addresses these shortcomings by introducing two key factors affecting VFL performance - feature importance and feature correlation - and proposing associated evaluation metrics and dataset splitting methods. Additionally, we introduce a real VFL dataset to address the deficit in image-image VFL scenarios. Our comprehensive evaluation of cutting-edge VFL algorithms provides valuable insights for future research in the field.
    摘要 纵向联邦学习(VFL)是在按特征划分的分布式数据上训练机器学习模型的重要范式。然而,由于隐私限制,可用于算法评估的公开真实世界VFL数据集很少,且只覆盖有限的特征分布。现有的基准测试往往依赖于从全局特征集任意划分得到的合成数据集,这些数据集只能反映一部分特征分布,导致算法性能评估不充分。本文针对这些缺陷,提出了影响VFL性能的两个关键因素——特征重要性和特征相关性,并给出相应的评价指标和数据集划分方法。此外,我们还发布了一个真实的VFL数据集,以弥补图像-图像VFL场景的不足。我们对最先进的VFL算法进行了全面评估,为该领域的未来研究提供了有价值的参考。
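下面给出一个按特征相关性把全局特征划分给多个参与方的简化示意(NumPy),仅用于说明"依据特征相关性控制划分"这一思路;贪心分配策略、起始方式均为假设,并非论文提出的划分算法与评价指标。

```python
import numpy as np

def split_features_by_correlation(X, n_parties=2, seed=0):
    """将特征划分给多个参与方的简化示意:以随机抽取的种子特征为起点,
    贪心地把每个剩余特征分给与其平均相关性最高的参与方,
    从而得到"方内相关性较高"的一种划分。"""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    corr = np.abs(np.corrcoef(X, rowvar=False))
    seeds = rng.choice(d, size=n_parties, replace=False)
    parties = [[int(f)] for f in seeds]
    remaining = [f for f in range(d) if f not in set(seeds.tolist())]
    for f in remaining:
        scores = [corr[f, p].mean() for p in parties]   # 与各参与方已有特征的平均相关性
        parties[int(np.argmax(scores))].append(f)
    return parties

# 假设的用法:X 为 (样本数, 特征数) 的全局数据矩阵,
# parties[i] 即第 i 个参与方在纵向联邦学习中持有的特征列索引。
```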

Monte Carlo Sampling without Isoperimetry: A Reverse Diffusion Approach

  • paper_url: http://arxiv.org/abs/2307.02037
  • repo_url: None
  • paper_authors: Xunpeng Huang, Hanze Dong, Yifan Hao, Yian Ma, Tong Zhang
  • for: 本研究探讨了通过逆扩散(reverse diffusion)实现后验采样(posterior sampling)的可能性;分数估计的精度正是扩散模型生成高质量样本的关键。
  • methods: 通过分解转移核(transition kernel),将分数估计(score estimation)转化为均值估计问题,从而得到一种新的后验采样算法。
  • results: 我们给出了该算法在总变差距离下的收敛分析,并证明其等周(isoperimetric)依赖性低于传统MCMC方法,因而更适合带误差容忍的高维采样。
    Abstract The efficacy of modern generative models is commonly contingent upon the precision of score estimation along the diffusion path, with a focus on diffusion models and their ability to generate high-quality data samples. This study delves into the potentialities of posterior sampling through reverse diffusion. An examination of the sampling literature reveals that score estimation can be transformed into a mean estimation problem via the decomposition of the transition kernel. By estimating the mean of the auxiliary distribution, the reverse diffusion process can give rise to a novel posterior sampling algorithm, which diverges from traditional gradient-based Markov Chain Monte Carlo (MCMC) methods. We provide the convergence analysis in total variation distance and demonstrate that the isoperimetric dependency of the proposed algorithm is comparatively lower than that observed in conventional MCMC techniques, which justifies the superior performance for high dimensional sampling with error tolerance. Our analytical framework offers fresh perspectives on the complexity of score estimation at various time points, as denoted by the properties of the auxiliary distribution.
    摘要 现代生成模型的效果通常取决于沿扩散路径的分数估计精度,其中以扩散模型及其生成高质量数据样本的能力为核心。本研究探讨了通过逆扩散实现后验采样的可能性。对采样文献的考察表明,通过分解转移核,可以将分数估计转化为均值估计问题。通过估计辅助分布的均值,逆扩散过程可以产生一种新的后验采样算法,它不同于传统基于梯度的马尔可夫链蒙特卡洛(MCMC)方法。我们给出了总变差距离下的收敛分析,并证明所提算法的等周(isoperimetric)依赖性低于传统MCMC技术,这解释了其在带误差容忍的高维采样中的更优性能。我们的分析框架通过辅助分布的性质,为不同时刻分数估计的复杂度提供了新的视角。
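作为补充,下面用LaTeX给出把分数估计与均值估计联系起来的一个常见推导示意(Tweedie公式形式),假设加噪模型为 $x_t = x_0 + \sigma_t\,\varepsilon$;这只是对"分数估计可转化为均值估计"这一论述的通用说明,并非论文中转移核分解的完整推导。

```latex
% 假设 x_t = x_0 + \sigma_t \varepsilon,\ \varepsilon \sim \mathcal{N}(0, I)
\nabla_{x_t} \log p_t(x_t)
  = \frac{\mathbb{E}\left[x_0 \mid x_t\right] - x_t}{\sigma_t^{2}},
\qquad\text{即}\qquad
\mathbb{E}\left[x_0 \mid x_t\right]
  = x_t + \sigma_t^{2}\,\nabla_{x_t} \log p_t(x_t).
```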

Ranking with Abstention

  • paper_url: http://arxiv.org/abs/2307.02035
  • repo_url: None
  • paper_authors: Anqi Mao, Mehryar Mohri, Yutao Zhong
  • for: 这个论文提出了一种新的排名概念,即learner可以在一定成本$c$的情况下决定不预测。
  • methods: 这个论文使用了一种扩展的理论分析,包括线性函数家族和带有一个隐藏层的神经网络的$H$-一致性 bound。
  • results: 实验结果表明,这种排名方法在实际应用中具有效果。
    Abstract We introduce a novel framework of ranking with abstention, where the learner can abstain from making prediction at some limited cost $c$. We present a extensive theoretical analysis of this framework including a series of $H$-consistency bounds for both the family of linear functions and that of neural networks with one hidden-layer. These theoretical guarantees are the state-of-the-art consistency guarantees in the literature, which are upper bounds on the target loss estimation error of a predictor in a hypothesis set $H$, expressed in terms of the surrogate loss estimation error of that predictor. We further argue that our proposed abstention methods are important when using common equicontinuous hypothesis sets in practice. We report the results of experiments illustrating the effectiveness of ranking with abstention.
    摘要 我们提出了一种带弃权的排序新框架,其中学习器可以以某个有限成本$c$选择不作预测。我们对该框架进行了广泛的理论分析,包括针对线性函数族和单隐层神经网络族的一系列$H$-一致性界。这些理论保证是文献中最先进的一致性保证,它们以预测器的代理损失估计误差为上界,界定了该预测器在假设集$H$中的目标损失估计误差。我们进一步论证,在实践中使用常见的等度连续(equicontinuous)假设集时,我们提出的弃权方法尤为重要。我们报告的实验结果表明了带弃权排序的有效性。
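下面是"带弃权的预测"这一决策形式的极简示意(Python),假设用排名第一与第二候选的分数差(margin)与一个阈值比较来决定是否弃权;具体的代理损失与理论上最优的弃权规则以论文为准,阈值与成本 $c$ 的关系此处仅为定性说明。

```python
import numpy as np

def rank_with_abstention(scores, threshold):
    """若最高分与次高分之差小于阈值则弃权(返回 None),否则返回最高分候选的索引;
    阈值通常随弃权成本 c 的增大而减小(成本越高越少弃权)。"""
    top2 = np.argsort(scores)[::-1][:2]
    margin = scores[top2[0]] - scores[top2[1]]
    return None if margin < threshold else int(top2[0])

# 假设的用法:
# rank_with_abstention(np.array([0.90, 0.85, 0.10]), threshold=0.1)  # -> None(弃权)
# rank_with_abstention(np.array([0.90, 0.40, 0.10]), threshold=0.1)  # -> 0
```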

Improving Automatic Parallel Training via Balanced Memory Workload Optimization

  • paper_url: http://arxiv.org/abs/2307.02031
  • repo_url: None
  • paper_authors: Yujie Wang, Youhe Jiang, Xupeng Miao, Fangcheng Fu, Xiaonan Nie, Bin Cui
  • for: This paper aims to improve the efficiency of training Transformer models across multiple GPUs.
  • methods: The paper proposes a novel system framework called Galvatron-BMW, which integrates multiple parallelism dimensions and automatically identifies the most efficient hybrid parallelism strategy using a decision tree approach and dynamic programming search algorithm.
  • results: Galvatron-BMW consistently achieves superior system throughput in automating distributed training under varying GPU memory constraints, surpassing previous approaches that rely on limited parallelism strategies.
  • for: 这篇论文目的是提高多个GPU上Transformer模型的训练效率。
  • methods: 该论文提出了一种新的系统框架called Galvatron-BMW,它集成了多种并发方向并自动确定最佳混合并发策略,使用决策树方法和动态规划搜索算法。
  • results: Galvatron-BMW在不同Transformer模型的测试场景中 consistently达到了自动化分布训练的最高系统吞吐量,超过了前一些仅仅采用有限并发策略的方法。
    Abstract Transformer models have emerged as the leading approach for achieving state-of-the-art performance across various application domains, serving as the foundation for advanced large-scale deep learning (DL) models. However, efficiently training these models across multiple GPUs remains a complex challenge due to the abundance of parallelism options. Existing DL systems either require manual efforts to design distributed training plans or limit parallelism combinations to a constrained search space. In this paper, we present Galvatron-BMW, a novel system framework that integrates multiple prevalent parallelism dimensions and automatically identifies the most efficient hybrid parallelism strategy. To effectively navigate this vast search space, we employ a decision tree approach for decomposition and pruning based on intuitive insights. We further utilize a dynamic programming search algorithm to derive the optimal plan. Moreover, to improve resource utilization and enhance system efficiency, we propose a bi-objective optimization workflow that focuses on workload balance. Our evaluations on different Transformer models demonstrate the capabilities of Galvatron-BMW in automating distributed training under varying GPU memory constraints. Across all tested scenarios, Galvatron-BMW consistently achieves superior system throughput, surpassing previous approaches that rely on limited parallelism strategies.
    摘要 Transformer模型已成为在各个应用领域实现最先进性能的主流方法,是先进大规模深度学习(DL)模型的基础。然而,由于并行方式的选择极其丰富,在多个GPU上高效训练这些模型仍然是一个复杂的挑战。现有的DL系统要么需要人工设计分布式训练方案,要么将并行组合限制在受限的搜索空间内。在这篇论文中,我们提出了Galvatron-BMW,一个集成多种主流并行维度、并能自动确定最高效混合并行策略的新型系统框架。为了有效地探索这一庞大的搜索空间,我们基于直观的洞察采用决策树方法进行分解和剪枝,并进一步利用动态规划搜索算法推导最优方案。此外,为了提高资源利用率和系统效率,我们提出了一种关注负载均衡的双目标优化工作流程。我们在不同的Transformer模型上的评估表明,Galvatron-BMW能够在不同GPU内存约束下自动化分布式训练。在所有测试场景中,Galvatron-BMW都取得了更高的系统吞吐量,超越了依赖有限并行策略的先前方法。

EHRSHOT: An EHR Benchmark for Few-Shot Evaluation of Foundation Models

  • paper_url: http://arxiv.org/abs/2307.02028
  • repo_url: https://github.com/som-shahlab/ehrshot-benchmark
  • paper_authors: Michael Wornow, Rahul Thapa, Ethan Steinberg, Jason Fries, Nigam Shah
  • for: 本研究的目的是提高医疗机器学习(ML)在医疗领域的进步,通过公共数据集、任务和模型的共享,但医疗领域的ML进步受到共享资产的限制。本研究通过三个贡献来解决这些挑战。
  • methods: 本研究使用了一个新的数据集,名为EHRSHOT,这是医疗记录电子档案(EHR)中的6,712名患者的去identify的结构化数据。与MIMIC-III/IV和其他流行的EHR数据集不同,EHRSHOT是长期跟踪的,而不是仅仅是ICU/ED patients的数据。此外,本研究还公布了一个141M参数的临床基础模型,这是一个可以处理coded EHR数据的完整模型,而不是只能处理不结构化文本的模型。
  • results: 本研究定义了15个几个shot临床预测任务,使得可以评估基础模型的样本效率和任务适应性。同时,研究者们还提供了一个可重现结果的代码,以及模型和数据集(通过研究数据使用协议获取)。
    Abstract While the general machine learning (ML) community has benefited from public datasets, tasks, and models, the progress of ML in healthcare has been hampered by a lack of such shared assets. The success of foundation models creates new challenges for healthcare ML by requiring access to shared pretrained models to validate performance benefits. We help address these challenges through three contributions. First, we publish a new dataset, EHRSHOT, containing de-identified structured data from the electronic health records (EHRs) of 6,712 patients from Stanford Medicine. Unlike MIMIC-III/IV and other popular EHR datasets, EHRSHOT is longitudinal and not restricted to ICU/ED patients. Second, we publish the weights of a 141M parameter clinical foundation model pretrained on the structured EHR data of 2.57M patients. We are one of the first to fully release such a model for coded EHR data; in contrast, most prior models released for clinical data (e.g. GatorTron, ClinicalBERT) only work with unstructured text and cannot process the rich, structured data within an EHR. We provide an end-to-end pipeline for the community to validate and build upon its performance. Third, we define 15 few-shot clinical prediction tasks, enabling evaluation of foundation models on benefits such as sample efficiency and task adaption. The code to reproduce our results, as well as the model and dataset (via a research data use agreement), are available at our Github repo here: https://github.com/som-shahlab/ehrshot-benchmark
    摘要 机器学习(ML)社区整体受益于公开的数据集、任务和模型,而医疗领域的ML进展却因缺乏此类共享资产而受阻。基础模型的成功给医疗ML带来了新的挑战:需要访问共享的预训练模型来验证其性能优势。我们通过以下三项贡献来应对这些挑战:1. 我们发布了一个新的数据集EHRSHOT,其中包含来自Stanford Medicine的6,712名患者的去标识电子健康记录(EHR)结构化数据。与MIMIC-III/IV和其他流行的EHR数据集不同,EHRSHOT是纵向数据,且不限于ICU/急诊患者。2. 我们发布了一个1.41亿参数的临床基础模型的权重,该模型在257万名患者的结构化EHR数据上进行了预训练。我们是最早完整发布此类面向编码EHR数据模型的团队之一;相比之下,此前发布的大多数临床模型(如GatorTron、ClinicalBERT)只能处理非结构化文本,无法处理EHR中丰富的结构化数据。我们提供了端到端的流程,供社区验证并在其性能基础上继续构建。3. 我们定义了15个少样本临床预测任务,以便评估基础模型在样本效率和任务适应等方面的优势。复现结果的代码以及模型和数据集(通过研究数据使用协议获取)可在我们的GitHub仓库获得:https://github.com/som-shahlab/ehrshot-benchmark
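下面是"冻结基础模型表示 + 少样本线性探针"这类少样本评估流程的一个通用示意(scikit-learn),其中特征矩阵假设已由预训练模型导出,任务假设为二分类;这只是常见做法的示意,并不代表EHRSHOT基准的官方评估代码。

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def few_shot_eval(feat_train, y_train, feat_test, y_test, k=16, seed=0):
    """每类抽取 k 个训练样本,训练线性探针,在完整测试集上评估 AUROC。"""
    rng = np.random.default_rng(seed)
    idx = []
    for c in np.unique(y_train):
        pool = np.flatnonzero(y_train == c)
        idx.extend(rng.choice(pool, size=min(k, pool.size), replace=False))
    clf = LogisticRegression(max_iter=1000).fit(feat_train[idx], y_train[idx])
    return roc_auc_score(y_test, clf.predict_proba(feat_test)[:, 1])

# 假设的用法:feat_* 为冻结的基础模型对每位患者产生的表示向量,
# y_* 为某个二分类临床预测任务的标签;改变 k 即可评估样本效率。
```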

Using Random Effects Machine Learning Algorithms to Identify Vulnerability to Depression

  • paper_url: http://arxiv.org/abs/2307.02023
  • repo_url: None
  • paper_authors: Runa Bhaumik, Jonathan Stange
  • for: 识别易患抑郁的青年人群,并对抑郁症状进行前瞻性与同期预测。
  • methods: 使用数据驱动的机器学习方法(RE-EM树和MERF)对抑郁风险因素进行分类和识别
  • results: 结果表明,RE-EM树和MERF方法可以准确地预测青年成年人抑郁症状,并且可以确定抑郁风险因素的复杂相互作用,以及哪些因素对于预后预测最有用。
    Abstract Background: Reliable prediction of clinical progression over time can improve the outcomes of depression. Little work has been done integrating various risk factors for depression, to determine the combinations of factors with the greatest utility for identifying which individuals are at the greatest risk. Method: This study demonstrates that data-driven machine learning (ML) methods such as RE-EM (Random Effects/Expectation Maximization) trees and MERF (Mixed Effects Random Forest) can be applied to reliably identify variables that have the greatest utility for classifying subgroups at greatest risk for depression. 185 young adults completed measures of depression risk, including rumination, worry, negative cognitive styles, cognitive and coping flexibilities, and negative life events, along with symptoms of depression. We trained RE-EM trees and MERF algorithms and compared them to traditional linear mixed models (LMMs) predicting depressive symptoms prospectively and concurrently with cross-validation. Results: Our results indicated that the RE-EM tree and MERF methods model complex interactions, identify subgroups of individuals and predict depression severity comparable to LMM. Further, machine learning models determined that brooding, negative life events, negative cognitive styles, and perceived control were the most relevant predictors of future depression levels. Conclusions: Random effects machine learning models have the potential for high clinical utility and can be leveraged for interventions to reduce vulnerability to depression.
    摘要 背景:可靠地预测临床进程随时间的发展可以改善抑郁症的结果。目前很少有研究整合多种抑郁风险因素,以确定哪些因素组合最有助于识别处于最高风险的个体。方法:本研究表明,RE-EM(随机效应/期望最大化)树和MERF(混合效应随机森林)等数据驱动的机器学习(ML)方法可以可靠地识别出对划分高风险亚组最有用的变量。185名青年完成了抑郁风险的测量,包括反刍思维、担忧、消极认知风格、认知与应对灵活性以及负面生活事件,同时测量了抑郁症状。我们训练了RE-EM树和MERF算法,并在交叉验证下与传统的线性混合模型(LMM)进行比较,对抑郁症状进行前瞻性和同期预测。结果:我们的结果表明,RE-EM树和MERF方法能够建模复杂的交互作用、识别个体亚组,并且预测抑郁严重程度的能力与LMM相当。此外,机器学习模型确定了反刍、负面生活事件、消极认知风格和感知控制是未来抑郁水平最相关的预测因素。结论:随机效应机器学习模型具有很高的临床应用潜力,可用于干预以降低抑郁易感性。

Modular DFR: Digital Delayed Feedback Reservoir Model for Enhancing Design Flexibility

  • paper_url: http://arxiv.org/abs/2307.11094
  • repo_url: None
  • paper_authors: Sosei Ikeda, Hiromitsu Awano, Takashi Sato
  • for: 这篇论文旨在提出一种适合全数字实现的延迟反馈储备池(DFR)模型。
  • methods: 该论文提出了一种新的模块化DFR模型,可完全在数字域实现;该模型减少了超参数数量,并允许灵活选择非线性函数,从而在降低功耗的同时提高准确性。
  • results: 该论文通过两种不同的非线性函数实现DFR,实现了功耗降低10倍和吞吐量提高5.3倍,而保持相同或更好的准确性。
    Abstract A delayed feedback reservoir (DFR) is a type of reservoir computing system well-suited for hardware implementations owing to its simple structure. Most existing DFR implementations use analog circuits that require both digital-to-analog and analog-to-digital converters for interfacing. However, digital DFRs emulate analog nonlinear components in the digital domain, resulting in a lack of design flexibility and higher power consumption. In this paper, we propose a novel modular DFR model that is suitable for fully digital implementations. The proposed model reduces the number of hyperparameters and allows flexibility in the selection of the nonlinear function, which improves the accuracy while reducing the power consumption. We further present two DFR realizations with different nonlinear functions, achieving 10x power reduction and 5.3x throughput improvement while maintaining equal or better accuracy.
    摘要 延迟反馈储备池(DFR)是一类储备池计算系统,因结构简单而非常适合硬件实现。现有的大多数DFR实现采用模拟电路,需要数模(DAC)和模数(ADC)转换器进行接口。而数字DFR则在数字域中仿真模拟的非线性元件,导致设计灵活性不足且功耗更高。在本文中,我们提出了一种适合全数字实现的新型模块化DFR模型。所提出的模型减少了超参数的数量,并允许灵活选择非线性函数,从而在降低功耗的同时提高准确性。我们进一步给出了采用两种不同非线性函数的DFR实现,在保持相同或更好准确性的情况下,实现了10倍的功耗降低和5.3倍的吞吐量提升。
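下面给出延迟反馈储备池(DFR)基本计算流程的一个简化数值示意(NumPy),采用常见的"随机掩码输入 + 单个非线性节点 + 延迟反馈形成虚拟节点"结构,并假设相邻虚拟节点之间的耦合可忽略;掩码、非线性函数与超参数均为假设,并非论文提出的模块化数字实现本身。

```python
import numpy as np

def dfr_states(u, n_virtual=50, gamma=0.5, eta=0.5, seed=0):
    """延迟反馈储备池示意:每个输入样本经固定随机掩码展开为 n_virtual 个虚拟节点,
    每个虚拟节点的状态由当前掩码输入与上一延迟周期同一节点的反馈共同决定。"""
    rng = np.random.default_rng(seed)
    mask = rng.choice([-1.0, 1.0], size=n_virtual)      # 固定的随机二值掩码
    states = np.zeros((len(u), n_virtual))
    prev = np.zeros(n_virtual)                          # 上一延迟周期的虚拟节点状态
    for t, u_t in enumerate(u):
        x = np.tanh(eta * mask * u_t + gamma * prev)    # 非线性节点 + 延迟反馈
        states[t], prev = x, x
    return states                                       # 可接线性读出层完成分类/回归

# 假设的用法:u 为一维输入序列;读出层可用岭回归在 states 上训练。
```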

Evaluating the Effectiveness of Large Language Models in Representing Textual Descriptions of Geometry and Spatial Relations

  • paper_url: http://arxiv.org/abs/2307.03678
  • repo_url: None
  • paper_authors: Yuhan Ji, Song Gao
  • for: 评估大语言模型(LLMs)在表示几何和其空间关系方面的能力。
  • methods: 使用GPT-2和BERT等大语言模型对著名文本(well-known text, WKT)格式的几何进行编码,并将其嵌入输入分类器和回归器,以评估LLM生成的嵌入对几何属性的表达效果。
  • results: LLMs-生成的embeddings可以保持几何类型和捕捉一定的空间关系(准确率达73%),但还存在估算数值和检索空间相关对象的挑战。
    Abstract This research focuses on assessing the ability of large language models (LLMs) in representing geometries and their spatial relations. We utilize LLMs including GPT-2 and BERT to encode the well-known text (WKT) format of geometries and then feed their embeddings into classifiers and regressors to evaluate the effectiveness of the LLMs-generated embeddings for geometric attributes. The experiments demonstrate that while the LLMs-generated embeddings can preserve geometry types and capture some spatial relations (up to 73% accuracy), challenges remain in estimating numeric values and retrieving spatially related objects. This research highlights the need for improvement in terms of capturing the nuances and complexities of the underlying geospatial data and integrating domain knowledge to support various GeoAI applications using foundation models.
    摘要 本研究重点评估大语言模型(LLM)在表示几何及其空间关系方面的能力。我们利用GPT-2和BERT等LLM对著名文本(WKT)格式的几何进行编码,然后将其嵌入输入分类器和回归器,以评估LLM生成的嵌入在几何属性上的有效性。实验表明,LLM生成的嵌入能够保留几何类型并捕捉一定的空间关系(准确率最高可达73%),但在估计数值和检索空间相关对象方面仍存在挑战。本研究强调需要改进对底层地理空间数据细微差别与复杂性的捕捉,并融合领域知识,以支持基于基础模型的各类GeoAI应用。
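下面是用预训练语言模型对WKT几何文本编码、再接简单分类器判断几何类型的流程示意(Hugging Face transformers + scikit-learn);模型选择与mean-pooling方式为常见做法的假设,并非论文的具体实验配置。

```python
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed_wkt(wkt_strings):
    """对 WKT 字符串做 mean-pooling,得到固定维度的几何嵌入。"""
    batch = tokenizer(wkt_strings, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state     # (B, L, H)
    mask = batch["attention_mask"].unsqueeze(-1)        # 忽略 padding 位置
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()

# 假设的用法:用嵌入训练分类器判断几何类型(Point / LineString / Polygon)
wkts = ["POINT (30 10)", "LINESTRING (30 10, 10 30, 40 40)"]
X = embed_wkt(wkts)
# clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
```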

STS-CCL: Spatial-Temporal Synchronous Contextual Contrastive Learning for Urban Traffic Forecasting

  • paper_url: http://arxiv.org/abs/2307.02507
  • repo_url: None
  • paper_authors: Lincan Li, Kaixiang Yang, Fengji Luo, Jichao Bi
  • for: 该研究旨在高效捕捉大规模无标注交通数据中复杂的时空表示,并服务于其他数据稀缺的时空任务。
  • methods: 该论文采用先进的对比学习,提出了一种新的时空同步上下文对比学习(STS-CCL)模型,包括针对时空图数据的基础与强力增强方法,以及一个时空同步对比模块(STS-CM),用于同时捕捉良好的时空依赖关系并实现图级对比。
  • results: 实验与评估结果表明,基于STS-CCL对比学习模型构建预测器,在交通预测基准上取得了优于现有方法的表现,并且非常适合标注数据稀少的大规模数据集及其他数据稀缺的时空任务。
    Abstract Efficiently capturing the complex spatiotemporal representations from large-scale unlabeled traffic data remains to be a challenging task. In considering of the dilemma, this work employs the advanced contrastive learning and proposes a novel Spatial-Temporal Synchronous Contextual Contrastive Learning (STS-CCL) model. First, we elaborate the basic and strong augmentation methods for spatiotemporal graph data, which not only perturb the data in terms of graph structure and temporal characteristics, but also employ a learning-based dynamic graph view generator for adaptive augmentation. Second, we introduce a Spatial-Temporal Synchronous Contrastive Module (STS-CM) to simultaneously capture the decent spatial-temporal dependencies and realize graph-level contrasting. To further discriminate node individuals in negative filtering, a Semantic Contextual Contrastive method is designed based on semantic features and spatial heterogeneity, achieving node-level contrastive learning along with negative filtering. Finally, we present a hard mutual-view contrastive training scheme and extend the classic contrastive loss to an integrated objective function, yielding better performance. Extensive experiments and evaluations demonstrate that building a predictor upon STS-CCL contrastive learning model gains superior performance than existing traffic forecasting benchmarks. The proposed STS-CCL is highly suitable for large datasets with only a few labeled data and other spatiotemporal tasks with data scarcity issue.
    摘要 如何高效地从大规模无标注交通数据中捕捉复杂的时空表示,仍然是一项具有挑战性的任务。针对这一难题,本文采用先进的对比学习,提出了一种新颖的时空同步上下文对比学习(STS-CCL)模型。首先,我们详细设计了针对时空图数据的基础增强与强力增强方法,不仅在图结构和时间特性上对数据进行扰动,还采用基于学习的动态图视图生成器实现自适应增强。其次,我们引入时空同步对比模块(STS-CM),在实现图级对比的同时捕捉良好的时空依赖关系。为了在负样本过滤中进一步区分节点个体,我们基于语义特征和空间异质性设计了语义上下文对比方法,在进行负样本过滤的同时实现节点级对比学习。最后,我们提出了一种困难互视图对比训练方案,并将经典对比损失扩展为一个综合目标函数,从而获得更好的性能。大量的实验与评估表明,基于STS-CCL对比学习模型构建预测器的性能优于现有的交通预测基准。所提出的STS-CCL非常适合仅有少量标注数据的大规模数据集以及其他存在数据稀缺问题的时空任务。
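下面给出标准InfoNCE对比损失的一个最小PyTorch示意,用于说明此类对比学习模型中"同一对象的两个增强视图互为正样本、其余为负样本"的基本形式;论文中的时空同步对比模块与语义上下文对比在此基础上另有设计,此处不作体现。

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """z1, z2: (N, d) 两个增强视图的嵌入,第 i 行互为正样本对。"""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature                 # (N, N) 相似度矩阵
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)             # 对角线即为正样本

# 假设的用法:z1、z2 可以是同一批交通图节点在两种增强下的表示
# loss = info_nce(encoder(view1), encoder(view2))
```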

Distilling Missing Modality Knowledge from Ultrasound for Endometriosis Diagnosis with Magnetic Resonance Images

  • paper_url: http://arxiv.org/abs/2307.02000
  • repo_url: None
  • paper_authors: Yuan Zhang, Hu Wang, David Butler, Minh-Son To, Jodie Avery, M Louise Hull, Gustavo Carneiro
  • for: 利用知识蒸馏方法,提高磁共振成像(MRI)图像中道格拉斯腔(POD)闭塞的检测精度。
  • methods: 在非配对的经阴道超声(TVUS)数据上预训练教师模型,并通过知识蒸馏训练学生模型,以提高其在MRI图像上检测POD闭塞的精度。
  • results: 实验结果表明,使用我们提出的方法可以提高MRI图像中POD闭塞的检测精度。
    Abstract Endometriosis is a common chronic gynecological disorder that has many characteristics, including the pouch of Douglas (POD) obliteration, which can be diagnosed using Transvaginal gynecological ultrasound (TVUS) scans and magnetic resonance imaging (MRI). TVUS and MRI are complementary non-invasive endometriosis diagnosis imaging techniques, but patients are usually not scanned using both modalities and, it is generally more challenging to detect POD obliteration from MRI than TVUS. To mitigate this classification imbalance, we propose in this paper a knowledge distillation training algorithm to improve the POD obliteration detection from MRI by leveraging the detection results from unpaired TVUS data. More specifically, our algorithm pre-trains a teacher model to detect POD obliteration from TVUS data, and it also pre-trains a student model with 3D masked auto-encoder using a large amount of unlabelled pelvic 3D MRI volumes. Next, we distill the knowledge from the teacher TVUS POD obliteration detector to train the student MRI model by minimizing a regression loss that approximates the output of the student to the teacher using unpaired TVUS and MRI data. Experimental results on our endometriosis dataset containing TVUS and MRI data demonstrate the effectiveness of our method to improve the POD detection accuracy from MRI.
    摘要 子宫内膜异位症是一种常见的慢性妇科疾病,其特征之一是道格拉斯腔(POD)闭塞,可通过经阴道妇科超声(TVUS)扫描和磁共振成像(MRI)进行诊断。TVUS与MRI是互补的无创诊断影像技术,但患者通常不会同时接受两种检查,而且从MRI中检测POD闭塞通常比从TVUS中更困难。为缓解这种分类不平衡,我们提出了一种知识蒸馏训练算法,利用非配对TVUS数据的检测结果来提升MRI上POD闭塞的检测。具体来说,我们的算法首先在TVUS数据上预训练一个用于检测POD闭塞的教师模型,同时利用大量无标注的盆腔3D MRI体数据,通过3D掩码自编码器预训练一个学生模型。接着,我们通过最小化一个回归损失,使学生模型的输出逼近教师模型的输出,从而在非配对的TVUS和MRI数据上将教师TVUS POD闭塞检测器的知识蒸馏给学生MRI模型。在包含TVUS和MRI数据的子宫内膜异位症数据集上的实验结果表明,我们的方法能够提高MRI上POD的检测精度。
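下面是响应式知识蒸馏中"让学生输出逼近教师输出"的回归损失的一个通用PyTorch示意;论文中教师(TVUS)与学生(MRI)使用非配对数据,其具体的配对与近似方式此处未体现,所有变量名均为假设,仅作为蒸馏损失形式的参考。

```python
import torch
import torch.nn.functional as F

def distillation_regression_loss(student_logits, teacher_logits):
    """让学生模型的输出逼近(固定的)教师模型输出的 MSE 回归损失。"""
    return F.mse_loss(student_logits, teacher_logits.detach())

# 假设的用法(示意):
# t_out = teacher(tvus_batch)        # 预训练好的 TVUS 教师,参数冻结
# s_out = student(mri_batch)         # 待训练的 MRI 学生
# loss = distillation_regression_loss(s_out, t_out) + supervised_loss
```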

Zero-Shot Neural Architecture Search: Challenges, Solutions, and Opportunities

  • paper_url: http://arxiv.org/abs/2307.01998
  • repo_url: https://github.com/sldgroup/survey-zero-shot-nas
  • paper_authors: Guihong Li, Duc Hoang, Kartikeya Bhardwaj, Ming Lin, Zhangyang Wang, Radu Marculescu
  • for: 本文旨在全面回顾和比较当前最先进(SOTA)的零样本神经架构搜索(NAS)方法,并着重分析其硬件感知能力。
  • methods: 本文首先介绍主流的零样本代理(proxy)并讨论其理论基础,然后通过大规模实验比较这些代理,并在硬件感知与硬件无关的NAS场景中验证其有效性。
  • results: 实验结果表明,这些零样本代理在硬件感知和硬件无关的NAS场景中均有效。此外,本文还提出了若干有望设计出更好代理的思路。
    Abstract Recently, zero-shot (or training-free) Neural Architecture Search (NAS) approaches have been proposed to liberate the NAS from training requirements. The key idea behind zero-shot NAS approaches is to design proxies that predict the accuracies of the given networks without training network parameters. The proxies proposed so far are usually inspired by recent progress in theoretical deep learning and have shown great potential on several NAS benchmark datasets. This paper aims to comprehensively review and compare the state-of-the-art (SOTA) zero-shot NAS approaches, with an emphasis on their hardware awareness. To this end, we first review the mainstream zero-shot proxies and discuss their theoretical underpinnings. We then compare these zero-shot proxies through large-scale experiments and demonstrate their effectiveness in both hardware-aware and hardware-oblivious NAS scenarios. Finally, we point out several promising ideas to design better proxies. Our source code and the related paper list are available on https://github.com/SLDGroup/survey-zero-shot-nas.
    摘要 最近,零样本(即免训练)神经架构搜索(NAS)方法被提出,以将NAS从训练需求中解放出来。零样本NAS方法的核心思想是设计无需训练网络参数即可预测给定网络精度的代理(proxy)。目前提出的代理通常受近期理论深度学习进展的启发,并在多个NAS基准数据集上展现出巨大潜力。本文旨在全面回顾和比较最先进(SOTA)的零样本NAS方法,并着重分析其硬件感知能力。为此,我们首先回顾主流的零样本代理并讨论其理论基础;然后通过大规模实验比较这些代理,并在硬件感知和硬件无关的NAS场景中验证其有效性;最后,我们提出了若干有望设计出更好代理的思路。我们的源代码和相关论文列表可在https://github.com/SLDGroup/survey-zero-shot-nas获取。
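下面给出一个常见的零成本代理——梯度范数(grad-norm)代理的最小示意(PyTorch),即用单个小批量上损失对参数梯度的范数来为候选架构打分;这只是此类综述所涉及的众多代理之一的通用实现示意,并非本文提出的新代理。

```python
import torch
import torch.nn.functional as F

def grad_norm_proxy(model, x, y):
    """零成本代理示意:不训练网络,仅用一个小批量的梯度范数为架构打分。"""
    model.zero_grad()
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    score = 0.0
    for p in model.parameters():
        if p.grad is not None:
            score += p.grad.norm().item()
    return score    # 分数越高,通常被认为该随机初始化的架构越"可训练"

# 假设的用法:对搜索空间中每个随机初始化的候选架构计算该分数并排序,
# 从而在不训练的情况下筛选有潜力的架构。
```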

Dynamic Feature-based Deep Reinforcement Learning for Flow Control of Circular Cylinder with Sparse Surface Pressure Sensing

  • paper_url: http://arxiv.org/abs/2307.01995
  • repo_url: None
  • paper_authors: Qiulei Wang, Lei Yan, Gang Hu, Wenli Chen, Bernd R. Noack
  • for: 本研究旨在以深度强化学习为起点,开发一种闭环圆柱尾流控制的自学习算法,在传感信息稀疏的情况下降低阻力并减小升力波动。
  • methods: 该方法将传感器信号提升为可预测未来流动状态的动态特征(DF);由此得到的DF-DRL无需动力学模型即可在被控对象中自动学习反馈控制。
  • results: 与直接使用传感器反馈的基准模型相比,DF-DRL模型的阻力系数降低了25%。仅使用单个表面压力传感器,DF-DRL即可在Re = 100时将阻力系数降低约8%,达到最先进水平,并显著减小升力系数波动。该方法在更高雷诺数下同样表现良好,在Re = 500和1000时分别将阻力系数降低32.2%和46.55%。
    Abstract This study proposes a self-learning algorithm for closed-loop cylinder wake control targeting lower drag and lower lift fluctuations with the additional challenge of sparse sensor information, taking deep reinforcement learning as the starting point. DRL performance is significantly improved by lifting the sensor signals to dynamic features (DF), which predict future flow states. The resulting dynamic feature-based DRL (DF-DRL) automatically learns a feedback control in the plant without a dynamic model. Results show that the drag coefficient of the DF-DRL model is 25% less than the vanilla model based on direct sensor feedback. More importantly, using only one surface pressure sensor, DF-DRL can reduce the drag coefficient to a state-of-the-art performance of about 8% at Re = 100 and significantly mitigate lift coefficient fluctuations. Hence, DF-DRL allows the deployment of sparse sensing of the flow without degrading the control performance. This method also shows good robustness in controlling flow under higher Reynolds numbers, which reduces the drag coefficient by 32.2% and 46.55% at Re = 500 and 1000, respectively, indicating the broad applicability of the method. Since surface pressure information is more straightforward to measure in realistic scenarios than flow velocity information, this study provides a valuable reference for experimentally designing the active flow control of a circular cylinder based on wall pressure signals, which is an essential step toward further developing intelligent control in realistic multi-input multi-output (MIMO) system.
    摘要

The KiTS21 Challenge: Automatic segmentation of kidneys, renal tumors, and renal cysts in corticomedullary-phase CT

  • paper_url: http://arxiv.org/abs/2307.01984
  • repo_url: https://github.com/neheller/kits21
  • paper_authors: Nicholas Heller, Fabian Isensee, Dasha Trofimova, Resha Tejpaul, Zhongchen Zhao, Huai Chen, Lisheng Wang, Alex Golts, Daniel Khapun, Daniel Shats, Yoel Shoshan, Flora Gilboa-Solomon, Yasmeen George, Xi Yang, Jianpeng Zhang, Jing Zhang, Yong Xia, Mengran Wu, Zhiyang Liu, Ed Walczak, Sean McSweeney, Ranveer Vasdev, Chris Hornung, Rafat Solaiman, Jamee Schoephoerster, Bailey Abernathy, David Wu, Safa Abdulkadir, Ben Byun, Justice Spriggs, Griffin Struyk, Alexandra Austin, Ben Simpson, Michael Hagstrom, Sierra Virnig, John French, Nitin Venkatesh, Sarah Chan, Keenan Moore, Anna Jacobsen, Susan Austin, Mark Austin, Subodh Regmi, Nikolaos Papanikolopoulos, Christopher Weight
  • for: 本文是2021年肾脏与肾肿瘤分割挑战赛(KiTS21)的挑战报告,该挑战赛与2021年医学图像计算与计算机辅助介入国际会议(MICCAI)同期举行。
  • methods: 本挑战使用了一种新的标注方法,收集了每个区域兴趣的三个独立标注,并使用了一个基于网络的标注工具进行完全透明的标注。此外,KiTS21测试集来自外部机构,挑战参与者开发出能够通用化的方法。
  • results: 尽管存在这些挑战,表现最好的队伍仍相比2019年设定的最新水平取得了显著提升,其性能正日益逼近人类水平。
    Abstract This paper presents the challenge report for the 2021 Kidney and Kidney Tumor Segmentation Challenge (KiTS21) held in conjunction with the 2021 international conference on Medical Image Computing and Computer Assisted Interventions (MICCAI). KiTS21 is a sequel to its first edition in 2019, and it features a variety of innovations in how the challenge was designed, in addition to a larger dataset. A novel annotation method was used to collect three separate annotations for each region of interest, and these annotations were performed in a fully transparent setting using a web-based annotation tool. Further, the KiTS21 test set was collected from an outside institution, challenging participants to develop methods that generalize well to new populations. Nonetheless, the top-performing teams achieved a significant improvement over the state of the art set in 2019, and this performance is shown to inch ever closer to human-level performance. An in-depth meta-analysis is presented describing which methods were used and how they faired on the leaderboard, as well as the characteristics of which cases generally saw good performance, and which did not. Overall KiTS21 facilitated a significant advancement in the state of the art in kidney tumor segmentation, and provides useful insights that are applicable to the field of semantic segmentation as a whole.
    摘要 本文介绍了2021年肾脏与肾肿瘤分割挑战赛(KiTS21)的挑战报告,该挑战赛与2021年医学图像计算与计算机辅助介入国际会议(MICCAI)同期举行。KiTS21是2019年首届挑战赛的续作,除了采用更大的数据集外,在挑战赛的设计上也包含多项创新。本次挑战采用了一种新的标注方法,为每个感兴趣区域收集了三份独立标注,并使用基于网络的标注工具在完全透明的环境中完成标注。此外,KiTS21的测试集来自外部机构,要求参赛者开发能够很好地泛化到新人群的方法。尽管如此,表现最好的队伍仍相比2019年设定的最新水平取得了显著提升,其性能正日益逼近人类水平。本文给出了深入的元分析,描述了各队所用的方法及其在排行榜上的表现,以及哪些病例通常表现良好、哪些表现不佳。总体而言,KiTS21推动了肾肿瘤分割最新水平的显著进步,并为整个语义分割领域提供了有用的见解。

A ChatGPT Aided Explainable Framework for Zero-Shot Medical Image Diagnosis

  • paper_url: http://arxiv.org/abs/2307.01981
  • repo_url: None
  • paper_authors: Jiaxiang Liu, Tianxiang Hu, Yan Zhang, Xiaotang Gai, Yang Feng, Zuozhu Liu
  • for: 本研究旨在提出一个零样本医学影像分类框架,以应对实际场景中难以获取所有疾病类别或大规模标注数据的医学诊断问题。
  • methods: 本研究基于CLIP等预训练视觉语言模型,并与ChatGPT结合以提供可解释的医学诊断。在该框架中,我们用类别名称查询大语言模型(LLM),自动生成疾病症状、描述等类别名称之外的额外线索与知识,以帮助给出更准确、更可解释的诊断。
  • results: 我们在一个私有数据集和四个公开数据集上进行了大量实验与详细分析,结果显示了该零样本医学影像诊断流程的有效性与可解释性,印证了视觉语言模型(VLM)与LLM在医疗应用中的巨大潜力。
    Abstract Zero-shot medical image classification is a critical process in real-world scenarios where we have limited access to all possible diseases or large-scale annotated data. It involves computing similarity scores between a query medical image and possible disease categories to determine the diagnostic result. Recent advances in pretrained vision-language models (VLMs) such as CLIP have shown great performance for zero-shot natural image recognition and exhibit benefits in medical applications. However, an explainable zero-shot medical image recognition framework with promising performance is yet under development. In this paper, we propose a novel CLIP-based zero-shot medical image classification framework supplemented with ChatGPT for explainable diagnosis, mimicking the diagnostic process performed by human experts. The key idea is to query large language models (LLMs) with category names to automatically generate additional cues and knowledge, such as disease symptoms or descriptions other than a single category name, to help provide more accurate and explainable diagnosis in CLIP. We further design specific prompts to enhance the quality of generated texts by ChatGPT that describe visual medical features. Extensive results on one private dataset and four public datasets along with detailed analysis demonstrate the effectiveness and explainability of our training-free zero-shot diagnosis pipeline, corroborating the great potential of VLMs and LLMs for medical applications.
    摘要 zero-shot医疗影像分类是现实世界中的关键过程,其中我们可能只有有限的疾病或大规模注释的数据。它涉及计算医疗影像和可能的疾病类别之间的相似性分数,以确定诊断结果。现代预训练视觉语言模型(VLM),如CLIP,在无需训练的情况下显示出了非常好的性能,并且在医疗应用中展现出了优势。然而,一个可解释的无需训练医疗影像分类框架仍然在开发中。在本文中,我们提出了一种基于CLIP的新的无需训练医疗影像分类框架,并与ChatGPT结合使用以提供可解释的诊断。我们的关键想法是使用类别名称来查询大型语言模型(LLM),以自动生成更多的引导和知识,如疾病 симптом或描述,以帮助提供更准确和可解释的诊断。我们还设计了特定的提示,以提高生成的文本中的可读性。我们的无需训练零shot诊断管道在一个私人数据集和四个公共数据集上进行了广泛的测试,并进行了详细的分析,结果证明了我们的训练free零shot诊断管道的有效性和可解释性,证明了VLM和LLM在医疗应用中的潜力。
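下面用开源CLIP接口给出"以LLM生成的多条类别描述做零样本分类"这一思路的简化示意。其中的类别与描述文本均为虚构示例,CLIP的加载方式假设使用 openai/CLIP 提供的 `clip` 包;论文中的具体提示设计与医学数据处理并未体现,此处仅演示"对同一类别多条描述的相似度取平均作为类别得分"的做法。

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# 假设由 LLM 为每个类别生成的若干描述(此处为虚构示例)
class_descriptions = {
    "pneumonia": ["a chest x-ray with lung opacities", "a chest x-ray showing consolidation"],
    "normal": ["a chest x-ray with clear lungs", "a normal chest radiograph"],
}

def zero_shot_classify(image_path):
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    with torch.no_grad():
        img_feat = model.encode_image(image)
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        scores = {}
        for cls, descs in class_descriptions.items():
            txt = clip.tokenize(descs).to(device)
            txt_feat = model.encode_text(txt)
            txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
            # 同一类别多条描述的相似度取平均,作为该类别得分
            scores[cls] = (img_feat @ txt_feat.t()).mean().item()
    return max(scores, key=scores.get)
```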

Algorithme EM régularisé

  • paper_url: http://arxiv.org/abs/2307.01955
  • repo_url: None
  • paper_authors: Pierre Houdouin, Matthieu Jonkcheere, Frederic Pascal
  • for: 针对样本量小于数据维度时高斯混合模型(GMM)的最大似然估计问题。
  • methods: 提出一种正则化的EM算法,利用先验知识应对小样本问题,通过将估计量向结构化目标协方差矩阵收缩,确保协方差矩阵更新的正定性。
  • results: 实验表明该方法在 clustering 任务中表现良好。
    Abstract Expectation-Maximization (EM) algorithm is a widely used iterative algorithm for computing maximum likelihood estimate when dealing with Gaussian Mixture Model (GMM). When the sample size is smaller than the data dimension, this could lead to a singular or poorly conditioned covariance matrix and, thus, to performance reduction. This paper presents a regularized version of the EM algorithm that efficiently uses prior knowledge to cope with a small sample size. This method aims to maximize a penalized GMM likelihood where regularized estimation may ensure positive definiteness of covariance matrix updates by shrinking the estimators towards some structured target covariance matrices. Finally, experiments on real data highlight the good performance of the proposed algorithm for clustering purposes
    摘要 期望最大化(EM)算法是处理高斯混合模型(GMM)时广泛使用的最大似然估计迭代算法。当样本量小于数据维度时,这可能导致协方差矩阵奇异或病态,从而造成性能下降。本文提出了一种正则化版本的EM算法,能够高效地利用先验知识来应对小样本问题。该方法旨在最大化带惩罚项的GMM似然函数,其中正则化估计通过将估计量向某些结构化目标协方差矩阵收缩,来保证协方差矩阵更新的正定性。最后,在真实数据上的实验突出了所提算法在聚类任务中的良好性能。
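下面是正则化EM中"将协方差更新向结构化目标矩阵收缩"这一步的通用NumPy示意;收缩目标与收缩系数的具体选择以论文为准,此处以保持总方差的尺度化单位阵作为示例目标。

```python
import numpy as np

def regularized_cov_update(S_em, rho=0.3, target=None):
    """M 步的正则化协方差更新:Sigma = (1 - rho) * S_em + rho * T。
    S_em 为标准 EM 得到的经验协方差,T 为结构化目标矩阵(默认取尺度化单位阵);
    当 rho > 0 且 T 正定时,即使 S_em 奇异,更新结果仍保持正定。"""
    d = S_em.shape[0]
    if target is None:
        target = (np.trace(S_em) / d) * np.eye(d)   # 保持总方差的单位阵目标
    return (1.0 - rho) * S_em + rho * target

# 假设的用法:在 GMM 的每次 M 步中,用该函数替换各分量协方差的常规更新。
```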

FEMDA: Une méthode de classification robuste et flexible

  • paper_url: http://arxiv.org/abs/2307.01954
  • repo_url: None
  • paper_authors: Pierre Houdouin, Matthieu Jonckheere, Frederic Pascal
  • for: 本研究旨在提出一种能够处理尺度各异、独立但非同分布样本的新判别分析技术,以替代传统的线性与二次判别分析(LDA/QDA)方法——后者在非高斯分布和受污染数据下性能会严重受损。
  • methods: 该技术假设每个数据点服从各自任意的椭圆对称(ES)分布并带有各自的尺度参数,使模型能够处理可能高度异质、独立但非同分布的样本。
  • results: 由此导出的新判决规则简单、快速,相比其他最先进方法对数据尺度变化更稳健,更适合实际应用中非高斯、受污染的数据。
    Abstract Linear and Quadratic Discriminant Analysis (LDA and QDA) are well-known classical methods but can heavily suffer from non-Gaussian distributions and/or contaminated datasets, mainly because of the underlying Gaussian assumption that is not robust. This paper studies the robustness to scale changes in the data of a new discriminant analysis technique where each data point is drawn by its own arbitrary Elliptically Symmetrical (ES) distribution and its own arbitrary scale parameter. Such a model allows for possibly very heterogeneous, independent but non-identically distributed samples. The new decision rule derived is simple, fast, and robust to scale changes in the data compared to other state-of-the-art method
    摘要 线性判别分析(LDA)与二次判别分析(QDA)是著名的经典方法,但由于其底层的高斯假设缺乏稳健性,在非高斯分布和/或受污染的数据集上可能严重受损。本文研究一种新的判别分析技术对数据尺度变化的稳健性:其中每个数据点由其各自任意的椭圆对称(ES)分布及各自任意的尺度参数生成。这一模型允许样本可能高度异质、独立但非同分布。由此导出的新判决规则简单、快速,且与其他最先进方法相比,对数据尺度变化更加稳健。

A Neural Collapse Perspective on Feature Evolution in Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2307.01951
  • repo_url: https://github.com/kvignesh1420/gnn_collapse
  • paper_authors: Vignesh Kothapalli, Tom Tirer, Joan Bruna
  • for: 本研究关注使用图神经网络(GNN)的节点级分类任务,并探究图拓扑与特征演化之间的相互作用。
  • methods: 以随机块模型(SBM)图上的社区检测为例说明特征演化,并借助"神经塌缩"(Neural Collapse, NC)现象来理解类内变差的减小。
  • results: 研究发现,节点级分类设置下也存在一定程度的类内变差减小,但不及实例级分类情形。理论分析进一步表明,即使在"乐观"的数学模型中,图也必须满足严格的结构条件才可能出现精确塌缩。此外,我们还研究了类内/类间特征变差随网络层数的演化,并与谱方法进行了对比。
    Abstract Graph neural networks (GNNs) have become increasingly popular for classification tasks on graph-structured data. Yet, the interplay between graph topology and feature evolution in GNNs is not well understood. In this paper, we focus on node-wise classification, illustrated with community detection on stochastic block model graphs, and explore the feature evolution through the lens of the "Neural Collapse" (NC) phenomenon. When training instance-wise deep classifiers (e.g. for image classification) beyond the zero training error point, NC demonstrates a reduction in the deepest features' within-class variability and an increased alignment of their class means to certain symmetric structures. We start with an empirical study that shows that a decrease in within-class variability is also prevalent in the node-wise classification setting, however, not to the extent observed in the instance-wise case. Then, we theoretically study this distinction. Specifically, we show that even an "optimistic" mathematical model requires that the graphs obey a strict structural condition in order to possess a minimizer with exact collapse. Interestingly, this condition is viable also for heterophilic graphs and relates to recent empirical studies on settings with improved GNNs' generalization. Furthermore, by studying the gradient dynamics of the theoretical model, we provide reasoning for the partial collapse observed empirically. Finally, we present a study on the evolution of within- and between-class feature variability across layers of a well-trained GNN and contrast the behavior with spectral methods.
    摘要 图神经网络(GNN)在图结构数据的分类任务中日益流行。然而,图拓扑与GNN中特征演化之间的相互作用尚未被充分理解。本文关注节点级分类,以随机块模型图上的社区检测为例,并通过"神经塌缩"(NC)现象的视角探究特征演化。当在零训练误差点之后继续训练实例级深度分类器(例如图像分类)时,NC表现为最深层特征的类内变差减小,且类均值与某些对称结构的对齐程度增强。我们首先通过实证研究表明,在节点级分类设置下同样普遍存在类内变差的减小,但程度不及实例级情形。随后我们从理论上研究了这一差异。具体而言,我们证明即使是一个"乐观"的数学模型,也要求图满足严格的结构条件才能存在精确塌缩的极小值。有趣的是,该条件对异配(heterophilic)图同样可行,并与近期关于GNN泛化性能提升设置的实证研究相关。此外,通过研究理论模型的梯度动力学,我们为实证中观察到的部分塌缩提供了解释。最后,我们研究了训练良好的GNN中类内与类间特征变差随层数的演化,并与谱方法的行为进行了对比。
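作为参考,下面用LaTeX给出神经塌缩文献中常用的类内变差指标(常记作NC1)的一种定义示意,其中 $h_i$ 为样本特征、$\mu_c$ 为第 $c$ 类均值、$\mu_G$ 为全局均值、$C$ 为类别数、$\dagger$ 表示伪逆;论文中针对节点级设置所用的具体度量可能有所不同。

```latex
\Sigma_W = \frac{1}{N}\sum_{c=1}^{C}\sum_{i \in \mathcal{I}_c}
  (h_i - \mu_c)(h_i - \mu_c)^{\top},\qquad
\Sigma_B = \frac{1}{C}\sum_{c=1}^{C} (\mu_c - \mu_G)(\mu_c - \mu_G)^{\top},
\qquad
\mathrm{NC1} = \frac{1}{C}\operatorname{tr}\!\left(\Sigma_W \Sigma_B^{\dagger}\right).
```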

A Synthetic Electrocardiogram (ECG) Image Generation Toolbox to Facilitate Deep Learning-Based Scanned ECG Digitization

  • paper_url: http://arxiv.org/abs/2307.01946
  • repo_url: None
  • paper_authors: Kshama Kodthalu Shivashankara, Afagh Mehri Shervedani, Reza Sameni
  • for: 这个论文的目的是提出一种新的方法来生成Synthetic ECG图像,以便用于训练深度学习模型进行算法式ECG诊断。
  • methods: 该方法利用了深度学习图像处理技术,并将 PhysioNet PTB-XL ECG时间序列数据作为引用时间序列数据,通过数据扩展技术来生成Synthetic ECG图像。
  • results: 研究人员通过计算信号噪声比(SNR)来评估生成的Synthetic ECG图像质量,结果显示了平均信号恢复SNR为27$\pm$2.8dB,这说明了提出的Synthetic ECG图像集可以用于训练深度学习模型。
    Abstract The electrocardiogram (ECG) is an accurate and widely available tool for diagnosing cardiovascular diseases. ECGs have been recorded in printed formats for decades and their digitization holds great potential for training machine learning (ML) models in algorithmic ECG diagnosis. Physical ECG archives are at risk of deterioration and scanning printed ECGs alone is insufficient, as ML models require ECG time-series data. Therefore, the digitization and conversion of paper ECG archives into time-series data is of utmost importance. Deep learning models for image processing show promise in this regard. However, the scarcity of ECG archives with reference time-series is a challenge. Data augmentation techniques utilizing \textit{digital twins} present a potential solution. We introduce a novel method for generating synthetic ECG images on standard paper-like ECG backgrounds with realistic artifacts. Distortions including handwritten text artifacts, wrinkles, creases and perspective transforms are applied to the generated images, without personally identifiable information. As a use case, we generated an ECG image dataset of 21,801 records from the 12-lead PhysioNet PTB-XL ECG time-series dataset. A deep ECG image digitization model was built and trained on the synthetic dataset, and was employed to convert the synthetic images to time-series data for evaluation. The signal-to-noise ratio (SNR) was calculated to assess the image digitization quality vs the ground truth ECG time-series. The results show an average signal recovery SNR of 27$\pm$2.8\,dB, demonstrating the significance of the proposed synthetic ECG image dataset for training deep learning models. The codebase is available as an open-access toolbox for ECG research.
    摘要 心电图(ECG)是一种准确且广泛可得的心血管疾病诊断工具。几十年来,ECG一直以纸质形式记录,将其数字化对于训练用于算法化ECG诊断的机器学习(ML)模型具有巨大潜力。纸质ECG档案面临逐渐劣化的风险,而仅扫描纸质ECG并不足够,因为ML模型需要ECG时间序列数据。因此,将纸质ECG档案数字化并转换为时间序列数据至关重要。用于图像处理的深度学习模型在这方面展现出潜力,然而带有参考时间序列的ECG档案十分稀缺,利用"数字孪生"的数据增强技术提供了一种潜在的解决方案。我们提出了一种在标准纸质ECG背景上生成带有真实伪影的合成ECG图像的新方法:在生成的图像上施加手写文字伪影、褶皱、折痕和透视变换等畸变,且不包含任何个人可识别信息。作为用例,我们基于12导联PhysioNet PTB-XL ECG时间序列数据集生成了包含21,801条记录的ECG图像数据集。我们构建了一个深度ECG图像数字化模型,在该合成数据集上训练,并用其将合成图像转换回时间序列数据以进行评估。我们计算了信噪比(SNR)来评估图像数字化质量与真实ECG时间序列之间的差异。结果显示平均信号恢复SNR为27$\pm$2.8 dB,表明所提出的合成ECG图像数据集对训练深度学习模型具有重要意义。相关代码库已作为开源工具箱发布,供ECG研究使用。

Text + Sketch: Image Compression at Ultra Low Rates

  • paper_url: http://arxiv.org/abs/2307.01944
  • repo_url: https://github.com/leieric/text-sketch
  • paper_authors: Eric Lei, Yiğit Berkay Uslu, Hamed Hassani, Shirin Saeedi Bidokhti
  • for: 本文旨在探讨如何使用文本描述生成高质量图像,并用于图像压缩。
  • methods: 本文研究了若干直接利用预训练模型实现图像压缩的技术,包括将文本描述与辅助信息结合,以生成既保留语义又保留空间结构的高保真重建。
  • results: 研究发现,使用这些技术可以在非常低的比特率下实现高度的semantic和spatial结构保持,并且在learned compressors中显著提高了感知和semantic faithfulness。
    Abstract Recent advances in text-to-image generative models provide the ability to generate high-quality images from short text descriptions. These foundation models, when pre-trained on billion-scale datasets, are effective for various downstream tasks with little or no further training. A natural question to ask is how such models may be adapted for image compression. We investigate several techniques in which the pre-trained models can be directly used to implement compression schemes targeting novel low rate regimes. We show how text descriptions can be used in conjunction with side information to generate high-fidelity reconstructions that preserve both semantics and spatial structure of the original. We demonstrate that at very low bit-rates, our method can significantly improve upon learned compressors in terms of perceptual and semantic fidelity, despite no end-to-end training.
    摘要 文本生成图像模型的最新进展使得从简短文本描述生成高质量图像成为可能。这些基础模型在十亿级数据集上预训练后,无需或只需少量进一步训练即可胜任各种下游任务。一个自然的问题是:这类模型如何用于图像压缩?我们研究了若干直接利用预训练模型实现面向全新低码率区间的压缩方案的技术。我们展示了如何将文本描述与辅助信息结合,生成既保留原图语义又保留空间结构的高保真重建。实验表明,在极低码率下,尽管没有进行端到端训练,我们的方法在感知与语义保真度方面仍能显著优于学习型压缩器。

A Neural Network-Based Enrichment of Reproducing Kernel Approximation for Modeling Brittle Fracture

  • paper_url: http://arxiv.org/abs/2307.01937
  • repo_url: None
  • paper_authors: Jonghyuk Baek, Jiun-Shyan Chen
  • for: 这篇论文旨在提出一种改进版的神经网络增强再生核粒子法(NN-RKPM),用于模拟脆性断裂。
  • methods: 该方法在粗糙且均匀的离散上定义背景再生核(RK)近似,并在单位分解(Partition of Unity)框架下用神经网络(NN)近似对其进行增强;NN近似可在函数空间中自动定位并插入正则化的不连续。
  • results: 该方法通过一系列涉及损伤扩展与分叉的数值算例验证了其有效性,且解的收敛性得到保证。
    Abstract Numerical modeling of localizations is a challenging task due to the evolving rough solution in which the localization paths are not predefined. Despite decades of efforts, there is a need for innovative discretization-independent computational methods to predict the evolution of localizations. In this work, an improved version of the neural network-enhanced Reproducing Kernel Particle Method (NN-RKPM) is proposed for modeling brittle fracture. In the proposed method, a background reproducing kernel (RK) approximation defined on a coarse and uniform discretization is enriched by a neural network (NN) approximation under a Partition of Unity framework. In the NN approximation, the deep neural network automatically locates and inserts regularized discontinuities in the function space. The NN-based enrichment functions are then patched together with RK approximation functions using RK as a Partition of Unity patching function. The optimum NN parameters defining the location, orientation, and displacement distribution across location together with RK approximation coefficients are obtained via the energy-based loss function minimization. To regularize the NN-RK approximation, a constraint on the spatial gradient of the parametric coordinates is imposed in the loss function. Analysis of the convergence properties shows that the solution convergence of the proposed method is guaranteed. The effectiveness of the proposed method is demonstrated by a series of numerical examples involving damage propagation and branching.
    摘要 局部化现象的数值建模是一项具有挑战性的任务,因为解在演化过程中十分粗糙,且局部化路径无法预先设定。尽管经过数十年的努力,仍然需要创新的、与离散无关的计算方法来预测局部化的演化。在本工作中,我们提出了一种改进版的神经网络增强再生核粒子法(NN-RKPM),用于模拟脆性断裂。在所提出的方法中,定义在粗糙且均匀离散上的背景再生核(RK)近似,通过单位分解(Partition of Unity)框架下的神经网络(NN)近似加以增强。在NN近似中,深度神经网络自动在函数空间中定位并插入正则化的不连续。随后,以RK作为单位分解的拼接函数,将基于NN的增强函数与RK近似函数拼接在一起。通过最小化基于能量的损失函数,求得定义不连续位置、方向及跨越位置的位移分布的最优NN参数以及RK近似系数。为了对NN-RK近似进行正则化,在损失函数中对参数坐标的空间梯度施加了约束。收敛性分析表明,所提方法的解收敛性得到保证。一系列涉及损伤扩展与分叉的数值算例证明了所提方法的有效性。

MDI+: A Flexible Random Forest-Based Feature Importance Framework

  • paper_url: http://arxiv.org/abs/2307.01932
  • repo_url: https://github.com/csinva/imodels
  • paper_authors: Abhineet Agarwal, Ana M. Kenney, Yan Shuo Tan, Tiffany M. Tang, Bin Yu
  • for: 本研究旨在提出一种可变的特征重要性框架,即MDI+,以提高Random Forest模型中特征的重要性评估。
  • methods: 本研究使用了Random Forest模型和Generalized Linear Models(GLMs),并提出了一种基于Predictability、Computability和Stability框架的指南,以帮助实践者选择适合的GLM和评价指标。
  • results: 实验表明,MDI+可以在识别信号特征方面表现出色,并且在实际应用中可以提取已知的预测性基因,并且比现有的特征重要性评估方法具有更高的稳定性。
    Abstract Mean decrease in impurity (MDI) is a popular feature importance measure for random forests (RFs). We show that the MDI for a feature $X_k$ in each tree in an RF is equivalent to the unnormalized $R^2$ value in a linear regression of the response on the collection of decision stumps that split on $X_k$. We use this interpretation to propose a flexible feature importance framework called MDI+. Specifically, MDI+ generalizes MDI by allowing the analyst to replace the linear regression model and $R^2$ metric with regularized generalized linear models (GLMs) and metrics better suited for the given data structure. Moreover, MDI+ incorporates additional features to mitigate known biases of decision trees against additive or smooth models. We further provide guidance on how practitioners can choose an appropriate GLM and metric based upon the Predictability, Computability, Stability framework for veridical data science. Extensive data-inspired simulations show that MDI+ significantly outperforms popular feature importance measures in identifying signal features. We also apply MDI+ to two real-world case studies on drug response prediction and breast cancer subtype classification. We show that MDI+ extracts well-established predictive genes with significantly greater stability compared to existing feature importance measures. All code and models are released in a full-fledged python package on Github.
    摘要 平均不纯度减少(MDI)是随机森林(RF)中常用的特征重要性度量。我们证明,RF中每棵树里特征 $X_k$ 的MDI等价于以所有在 $X_k$ 上划分的决策树桩为自变量、对响应变量做线性回归时的未归一化 $R^2$ 值。基于这一解释,我们提出了一个灵活的特征重要性框架MDI+。具体而言,MDI+对MDI进行了推广,允许分析者将线性回归模型和 $R^2$ 指标替换为正则化的广义线性模型(GLM)以及更适合给定数据结构的指标;此外,MDI+还加入了额外特征,以缓解决策树对可加或平滑模型的已知偏差。我们进一步基于可预测性-可计算性-稳定性(PCS)框架,为实践者选择合适的GLM与指标提供了指导。大量数据启发的模拟实验表明,MDI+在识别信号特征方面显著优于常用的特征重要性度量。我们还将MDI+应用于药物反应预测与乳腺癌亚型分类两个真实案例,结果显示MDI+能够以远高于现有特征重要性度量的稳定性提取公认的预测性基因。所有代码与模型已作为完整的Python包发布在GitHub上。

Learning ECG signal features without backpropagation

  • paper_url: http://arxiv.org/abs/2307.01930
  • repo_url: None
  • paper_authors: Péter Pósfay, Marcell T. Kurbucz, Péter Kovács, Antal Jakovác
  • for: 这篇论文的目的是提出一种新的方法来生成时间序列数据的表示方式,以提高下游任务的效果、范围和可应用性。
  • methods: 该方法基于物理学的想法,通过数据驱动的方式构建一个减少的表示,同时能够捕捉数据的下面结构和任务特定信息,并且仍然保持易于理解、可读性和验证性。
  • results: 通过应用该方法于心跳信号分类任务,实现了状态首位表现。
    Abstract Representation learning has become a crucial area of research in machine learning, as it aims to discover efficient ways of representing raw data with useful features to increase the effectiveness, scope and applicability of downstream tasks such as classification and prediction. In this paper, we propose a novel method to generate representations for time series-type data. This method relies on ideas from theoretical physics to construct a compact representation in a data-driven way, and it can capture both the underlying structure of the data and task-specific information while still remaining intuitive, interpretable and verifiable. This novel methodology aims to identify linear laws that can effectively capture a shared characteristic among samples belonging to a specific class. By subsequently utilizing these laws to generate a classifier-agnostic representation in a forward manner, they become applicable in a generalized setting. We demonstrate the effectiveness of our approach on the task of ECG signal classification, achieving state-of-the-art performance.
    摘要 表示学习已成为机器学习中的一个关键研究领域,其目标是找到用有用特征高效表示原始数据的方式,从而提升分类、预测等下游任务的效果、适用范围与可用性。本文提出了一种为时间序列类数据生成表示的新方法。该方法借鉴理论物理的思想,以数据驱动的方式构建紧凑的表示,既能刻画数据的潜在结构与任务相关信息,又保持直观、可解释和可验证。这一新方法旨在识别能够有效刻画同一类别样本共同特征的线性规律,随后利用这些规律以前向方式生成与分类器无关的表示,使其可用于更一般的场景。我们在ECG信号分类任务上验证了该方法的有效性,取得了最先进的性能。

ProtoDiffusion: Classifier-Free Diffusion Guidance with Prototype Learning

  • paper_url: http://arxiv.org/abs/2307.01924
  • repo_url: None
  • paper_authors: Gulcin Baykal, Halil Faruk Karagoz, Taha Binhuraib, Gozde Unal
  • for: 提高生成质量和稳定性,减少训练时间
  • methods: integrate prototype learning into diffusion models
  • results: 在不同的数据集和实验设置下,成功实现更高的生成质量和更快的训练时间
    Abstract Diffusion models are generative models that have shown significant advantages compared to other generative models in terms of higher generation quality and more stable training. However, the computational need for training diffusion models is considerably increased. In this work, we incorporate prototype learning into diffusion models to achieve high generation quality faster than the original diffusion model. Instead of randomly initialized class embeddings, we use separately learned class prototypes as the conditioning information to guide the diffusion process. We observe that our method, called ProtoDiffusion, achieves better performance in the early stages of training compared to the baseline method, signifying that using the learned prototypes shortens the training time. We demonstrate the performance of ProtoDiffusion using various datasets and experimental settings, achieving the best performance in shorter times across all settings.
    摘要 Diffusion models 是一类生成模型,在生成质量和训练稳定性方面相比其他生成模型表现出了明显的优势。然而,训练 diffusion models 所需的计算资源也显著增加。在这种情况下,我们将 prototype learning 引入 diffusion models,以更快地达到较高的生成质量。我们不使用随机初始化的类嵌入,而是使用单独学习的类 prototype 作为条件信息来引导扩散过程。我们发现,我们的方法(即 ProtoDiffusion)在训练的早期阶段表现出了优于基线方法的性能,这表明使用学习好的 prototype 可以缩短训练时间。我们在多个数据集和实验设置下验证了 ProtoDiffusion 的性能,在所有设置下都以更短的时间达到了最佳性能。

ClimateLearn: Benchmarking Machine Learning for Weather and Climate Modeling

  • paper_url: http://arxiv.org/abs/2307.01909
  • repo_url: https://github.com/aditya-grover/climate-learn
  • paper_authors: Tung Nguyen, Jason Jewik, Hritik Bansal, Prakhar Sharma, Aditya Grover
  • for: The paper introduces an open-source PyTorch library called ClimateLearn for training and evaluating machine learning models in data-driven climate science.
  • methods: The library includes holistic pipelines for dataset processing, state-of-the-art deep learning models, and quantitative and qualitative evaluation for standard weather and climate modeling tasks.
  • results: The authors performed comprehensive forecasting and downscaling experiments to showcase the capabilities and key features of the library; to their knowledge, ClimateLearn is the first large-scale, open-source effort for bridging research in weather and climate modeling with modern machine learning systems.
    Abstract Modeling weather and climate is an essential endeavor to understand the near- and long-term impacts of climate change, as well as inform technology and policymaking for adaptation and mitigation efforts. In recent years, there has been a surging interest in applying data-driven methods based on machine learning for solving core problems such as weather forecasting and climate downscaling. Despite promising results, much of this progress has been impaired due to the lack of large-scale, open-source efforts for reproducibility, resulting in the use of inconsistent or underspecified datasets, training setups, and evaluations by both domain scientists and artificial intelligence researchers. We introduce ClimateLearn, an open-source PyTorch library that vastly simplifies the training and evaluation of machine learning models for data-driven climate science. ClimateLearn consists of holistic pipelines for dataset processing (e.g., ERA5, CMIP6, PRISM), implementation of state-of-the-art deep learning models (e.g., Transformers, ResNets), and quantitative and qualitative evaluation for standard weather and climate modeling tasks. We supplement these functionalities with extensive documentation, contribution guides, and quickstart tutorials to expand access and promote community growth. We have also performed comprehensive forecasting and downscaling experiments to showcase the capabilities and key features of our library. To our knowledge, ClimateLearn is the first large-scale, open-source effort for bridging research in weather and climate modeling with modern machine learning systems. Our library is available publicly at https://github.com/aditya-grover/climate-learn.
    摘要 模拟天气和气候是一项非常重要的努力,以便更好地理解气候变化的短期和长期影响,以及为适应和控制努力提供技术和政策。在过去几年中,有一种增长的兴趣在应用基于机器学习的数据驱动方法来解决气候科学中的核心问题,如天气预报和气候减小。然而,由于缺乏大规模、开源的努力,导致许多进步受到了限制,因为很多域科学家和人工智能研究者使用不一致或不够特定的数据集、训练setup和评估方法。我们介绍了一个名为ClimateLearn的开源PyTorch库,该库可以很大程度地简化天气预报和气候模型训练和评估的过程。ClimateLearn包括整体数据处理管道(如ERA5、CMIP6、PRISM)、现代深度学习模型(如转换器、径深网络)的实现,以及标准天气和气候模型计算任务的量化和质量评估。我们还提供了广泛的文档、贡献指南和快速入门教程,以扩大访问权限和促进社区增长。我们还执行了广泛的预测和减小实验,以示出库的能力和关键特点。到我们所知,ClimateLearn是首个大规模、开源的气候科学与现代机器学习系统之间的桥梁。我们的库可以在https://github.com/aditya-grover/climate-learn上获取。

Stability Analysis Framework for Particle-based Distance GANs with Wasserstein Gradient Flow

  • paper_url: http://arxiv.org/abs/2307.01879
  • repo_url: None
  • paper_authors: Chuqi Chen, Yue Wu, Yang Xiang
  • For: Investigates the training process of generative networks that use particle-based distances as the objective function, such as MMD GAN, Cramér GAN, and EIEG GAN, which often suffer from unstable training.
  • Methods: Analyzes training stability from the perspective of probability density dynamics, regarding the discriminator $D$ as a feature transformation mapping and the generator $G$ as a random variable mapping, and using the Wasserstein gradient flow of the probability density function for the stability analysis.
  • Results: The discriminator's training is usually unstable due to the $\min_G \max_D E(G, D)$ formulation in GANs; adding a stabilizing term to the discriminator loss addresses this, and experiments validate both the stability analysis and the stabilizing method.
    Abstract In this paper, we investigate the training process of generative networks that use a type of probability density distance named particle-based distance as the objective function, e.g. MMD GAN, Cram\'er GAN, EIEG GAN. However, these GANs often suffer from the problem of unstable training. In this paper, we analyze the stability of the training process of these GANs from the perspective of probability density dynamics. In our framework, we regard the discriminator $D$ in these GANs as a feature transformation mapping that maps high dimensional data into a feature space, while the generator $G$ maps random variables to samples that resemble real data in terms of feature space. This perspective enables us to perform stability analysis for the training of GANs using the Wasserstein gradient flow of the probability density function. We find that the training process of the discriminator is usually unstable due to the formulation of $\min_G \max_D E(G, D)$ in GANs. To address this issue, we add a stabilizing term in the discriminator loss function. We conduct experiments to validate our stability analysis and stabilizing method.
    摘要 在这篇论文中,我们研究了生成网络在使用某种概率密度距离函数作为目标函数时的训练过程,例如MMD GAN、Cramér GAN、EIEG GAN。然而,这些GANs经常遇到训练不稳定的问题。在这篇论文中,我们从概率密度动力学的角度分析了这些GANs的训练过程的稳定性。我们认为权重网络$D$ acts as a feature transformation mapping that maps high-dimensional data into a feature space, while generator $G$ maps random variables to samples that resemble real data in terms of feature space.这种角度允许我们使用泊松流程来分析GANs的训练过程的稳定性。我们发现通常在GANs中的训练过程中,权重网络的训练是不稳定的,这是由于GANs中的$\min_G \max_D E(G, D)$的形式化引起的。为了解决这个问题,我们在权重网络的损失函数中添加了稳定化项。我们进行了实验来验证我们的稳定性分析和稳定化方法。
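A sketch of a particle-based (MMD) objective with an added stabilizing term on the discriminator loss. The specific penalty below, which simply controls the norm of the discriminator features, is a generic stand-in; the exact form of the stabilizing term proposed in the paper is not reproduced here.

```python
# Sketch of an MMD-GAN-style objective: the discriminator (feature map) maximizes
# the particle-based distance, with an extra penalty keeping features bounded.
import torch

def rbf_mmd2(x, y, sigma=1.0):
    """Squared MMD between two batches of features under an RBF kernel."""
    def k(a, b):
        d2 = torch.cdist(a, b) ** 2
        return torch.exp(-d2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

D = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.ReLU(), torch.nn.Linear(64, 16))
G = torch.nn.Sequential(torch.nn.Linear(8, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2))

real = torch.randn(256, 2) + torch.tensor([2.0, 0.0])
fake = G(torch.randn(256, 8))

mmd2 = rbf_mmd2(D(real), D(fake))
# discriminator: maximize MMD, plus a stabilizing penalty (a stand-in choice here)
d_loss = -mmd2 + 1e-3 * (D(real).pow(2).mean() + D(fake).pow(2).mean())
# generator: minimize the particle-based distance measured in feature space
g_loss = mmd2
```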

Fast Private Kernel Density Estimation via Locality Sensitive Quantization

  • paper_url: http://arxiv.org/abs/2307.01877
  • repo_url: https://github.com/talwagner/lsq
  • paper_authors: Tal Wagner, Yonatan Naamad, Nina Mishra
  • for: efficient mechanisms for differentially private kernel density estimation (DP-KDE)
  • methods: Locality Sensitive Quantization (LSQ) framework, which leverages existing non-private KDE methods and privatizes them in a black-box manner
  • results: DP-KDE mechanisms that are fast and accurate on large datasets in both high and low dimensions, with linear time complexity in the number of dimensions $d$
    Abstract We study efficient mechanisms for differentially private kernel density estimation (DP-KDE). Prior work for the Gaussian kernel described algorithms that run in time exponential in the number of dimensions $d$. This paper breaks the exponential barrier, and shows how the KDE can privately be approximated in time linear in $d$, making it feasible for high-dimensional data. We also present improved bounds for low-dimensional data. Our results are obtained through a general framework, which we term Locality Sensitive Quantization (LSQ), for constructing private KDE mechanisms where existing KDE approximation techniques can be applied. It lets us leverage several efficient non-private KDE methods -- like Random Fourier Features, the Fast Gauss Transform, and Locality Sensitive Hashing -- and ``privatize'' them in a black-box manner. Our experiments demonstrate that our resulting DP-KDE mechanisms are fast and accurate on large datasets in both high and low dimensions.
    摘要 我们研究高效的权限私钥频率概率密度估计(DP-KDE)机制。先前的工作对于 Gaussian kernel 提出了时间复杂度为对数函数($d$)的算法。这篇论文破坏了这个限制,并示出了在高维数据时间复杂度 linear 的 KDE aproximation 机制,使得其成为可行的。我们还提供了低维数据的改进 bound。我们的结果基于一个通用的框架,我们称之为 Local Sensitive Quantization(LSQ),用于构建私钥 KDE 机制。它允许我们利用一些高效的非私钥 KDE 方法,如 Random Fourier Features、Fast Gauss Transform 和 Locality Sensitive Hashing,并将它们“黑盒”化,以实现私钥 KDE 机制。我们的实验表明,我们的 resulting DP-KDE 机制在大数据集上具有高速和高准确性。
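For intuition, here is a minimal sketch of privatizing a Gaussian-kernel KDE through Random Fourier Features with the Gaussian mechanism: the mean feature embedding is noised once, and any number of queries can then be answered from it in time linear in the dimension. The sensitivity bound and parameters are illustrative assumptions; this is the general idea the LSQ framework builds on, not the paper's construction verbatim.

```python
# Minimal sketch of a DP Gaussian-kernel KDE via Random Fourier Features (RFF).
import numpy as np

def rff_features(X, W, b):
    """z(x) = sqrt(2/D) * cos(Wx + b); note ||z(x)||_2 <= sqrt(2)."""
    D = W.shape[0]
    return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

def dp_kde(X, queries, bandwidth=1.0, D=2048, eps=1.0, delta=1e-6,
           rng=np.random.default_rng(0)):
    d = X.shape[1]
    W = rng.normal(scale=1.0 / bandwidth, size=(D, d))  # spectral samples of the Gaussian kernel
    b = rng.uniform(0, 2 * np.pi, size=D)
    mean_emb = rff_features(X, W, b).mean(axis=0)
    # replacing one point changes the mean embedding by at most 2*sqrt(2)/n in L2
    sensitivity = 2 * np.sqrt(2.0) / len(X)
    sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / eps  # Gaussian mechanism
    noisy_emb = mean_emb + rng.normal(scale=sigma, size=D)
    return rff_features(queries, W, b) @ noisy_emb      # approx. (1/n) * sum_i k(q, x_i)

X = np.random.default_rng(1).normal(size=(5000, 2))
print(dp_kde(X, np.array([[0.0, 0.0], [3.0, 3.0]])))
```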

Generalization Guarantees via Algorithm-dependent Rademacher Complexity

  • paper_url: http://arxiv.org/abs/2307.02501
  • repo_url: None
  • paper_authors: Sarah Sachs, Tim van Erven, Liam Hodgkinson, Rajiv Khanna, Umut Simsekli
  • for: Proposing a new complexity measure to control the generalization error of modern machine learning algorithms.
  • methods: Uses the empirical Rademacher complexity of an algorithm- and data-dependent hypothesis class, combining standard properties of Rademacher complexity with the convenient structure of this class.
  • results: Obtains novel bounds based on the finite fractal dimension that extend earlier fractal-dimension bounds from continuous to finite hypothesis classes and avoid a mutual information term, greatly simplifies the proof of a recent dimension-independent bound for stochastic gradient descent, and easily recovers results for VC classes and compression schemes.
    Abstract Algorithm- and data-dependent generalization bounds are required to explain the generalization behavior of modern machine learning algorithms. In this context, there exists information theoretic generalization bounds that involve (various forms of) mutual information, as well as bounds based on hypothesis set stability. We propose a conceptually related, but technically distinct complexity measure to control generalization error, which is the empirical Rademacher complexity of an algorithm- and data-dependent hypothesis class. Combining standard properties of Rademacher complexity with the convenient structure of this class, we are able to (i) obtain novel bounds based on the finite fractal dimension, which (a) extend previous fractal dimension-type bounds from continuous to finite hypothesis classes, and (b) avoid a mutual information term that was required in prior work; (ii) we greatly simplify the proof of a recent dimension-independent generalization bound for stochastic gradient descent; and (iii) we easily recover results for VC classes and compression schemes, similar to approaches based on conditional mutual information.
    摘要 Algorithm-和数据-依赖的总结 bounds 是现代机器学习算法的总结行为的解释需要的。在这种情况下,存在信息理论性的总结 bounds,其中包括(不同形式的)相互信息,以及基于假设集的稳定性。我们提出了一个概念上相关,但技术上不同的复杂度测量来控制总结错误,即算法和数据依赖的假设集中的Empirical Rademacher complexity。通过将标准的Rademacher complexity性质与这种类型的概念结合,我们能够:(i)获得基于finite fractal dimension的新的 bounds,这些bounds(a)在前期的继承维度类型 bounds 中扩展到了有限假设类型,并(b)避免在先前的工作中需要的mutual information项;(ii)我们大大简化了最近的维度独立总结 bound for stochastic gradient descent;(iii)我们轻松地回归到VC类和压缩 schemes中的结果,与基于conditional mutual information的方法类似。
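A small Monte-Carlo sketch of the quantity being controlled, the empirical Rademacher complexity of a finite hypothesis class; the hypothesis outputs below are hypothetical stand-ins for the algorithm- and data-dependent class studied in the paper.

```python
# Monte-Carlo estimate of the empirical Rademacher complexity of a finite
# hypothesis class, given each hypothesis's outputs on the n data points:
#   R_hat = E_sigma[ sup_h (1/n) * sum_i sigma_i * h(x_i) ],  sigma_i in {-1, +1}.
import numpy as np

def empirical_rademacher(outputs, num_draws=2000, rng=np.random.default_rng(0)):
    """outputs: array of shape (num_hypotheses, n) with h(x_i) in each row."""
    H, n = outputs.shape
    sigma = rng.choice([-1.0, 1.0], size=(num_draws, n))
    correlations = sigma @ outputs.T / n       # (num_draws, num_hypotheses)
    return correlations.max(axis=1).mean()

# toy example: 50 hypotheses evaluated on 200 points
outs = np.random.default_rng(1).uniform(-1, 1, size=(50, 200))
print(empirical_rademacher(outs))              # shrinks roughly like 1/sqrt(n)
```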

Approximate, Adapt, Anonymize (3A): a Framework for Privacy Preserving Training Data Release for Machine Learning

  • paper_url: http://arxiv.org/abs/2307.01875
  • repo_url: None
  • paper_authors: Tamas Madl, Weijie Xu, Olivia Choudhury, Matthew Howard
  • for: Increasing the utility of released training data for machine learning while guaranteeing differential privacy.
  • methods: Proposes the 3A (Approximate, Adapt, Anonymize) data release framework, implemented with mixture models to approximate, kernel-inducing points to adapt, and Gaussian differential privacy to anonymize a dataset.
  • results: Models trained on the privatized data perform nearly as well as models trained on real data when evaluated on held-out real data, and classification performance is significantly higher than that of state-of-the-art privacy-preserving synthetic data generation models.
    Abstract The availability of large amounts of informative data is crucial for successful machine learning. However, in domains with sensitive information, the release of high-utility data which protects the privacy of individuals has proven challenging. Despite progress in differential privacy and generative modeling for privacy-preserving data release in the literature, only a few approaches optimize for machine learning utility: most approaches only take into account statistical metrics on the data itself and fail to explicitly preserve the loss metrics of machine learning models that are to be subsequently trained on the generated data. In this paper, we introduce a data release framework, 3A (Approximate, Adapt, Anonymize), to maximize data utility for machine learning, while preserving differential privacy. We also describe a specific implementation of this framework that leverages mixture models to approximate, kernel-inducing points to adapt, and Gaussian differential privacy to anonymize a dataset, in order to ensure that the resulting data is both privacy-preserving and high utility. We present experimental evidence showing minimal discrepancy between performance metrics of models trained on real versus privatized datasets, when evaluated on held-out real data. We also compare our results with several privacy-preserving synthetic data generation models (such as differentially private generative adversarial networks), and report significant increases in classification performance metrics compared to state-of-the-art models. These favorable comparisons show that the presented framework is a promising direction of research, increasing the utility of low-risk synthetic data release for machine learning.
    摘要 “具有大量有用数据的可用性是成功机器学习的关键。然而,在包含敏感信息的领域中,发布高Utility数据以保护个人隐私是挑战。尽管在Literature中已有进步的泛化隐私和生成模型,但大多数方法只考虑数据本身的统计指标,并没有显式保持机器学习模型将要在生成数据上训练的损失指标。在这篇论文中,我们介绍了一个数据发布框架,称为3A(简化、适应、匿名),以最大化机器学习数据的有用性,同时保持泛化隐私。我们还描述了该框架的具体实现,利用混合模型简化数据,使用抽象点适应数据,并使用泛化隐私保护数据,以确保生成的数据具有隐私保护和高Utility。我们通过实验证明,在评估模型在真实数据上的性能时, Privatized 数据与实际数据的差异很小。我们还与一些隐私保护生成数据生成模型进行比较,并发现我们的结果具有显著的提高性,相比于当前的模型。这些有利的比较表明,我们提出的框架是一个有前途的研究方向,增加低风险的生成数据发布的机器学习 utility。”

A hybrid machine learning framework for clad characteristics prediction in metal additive manufacturing

  • paper_url: http://arxiv.org/abs/2307.01872
  • repo_url: https://github.com/sinatayebati/cladnet-ml-for-am
  • paper_authors: Sina Tayebati, Kyu Taek Cho
  • for: Developing a hybrid approach that combines computational fluid dynamics (CFD) modeling and machine learning (ML) techniques to predict and understand the characteristics of clads printed by metal additive manufacturing (MAM).
  • methods: A calibrated CFD model generates a comprehensive dataset of clad characteristics, including geometry, quality, and processing parameters; two sets of processing parameters, versatile ML models, and reliable evaluation metrics are then used to build a scalable learning framework for predicting clad geometry and quality.
  • results: The hybrid approach resolves many challenges of conventional modeling methods in MAM by providing an efficient, accurate, and scalable platform for clad characteristics prediction and optimization under different processing conditions.
    Abstract During the past decade, metal additive manufacturing (MAM) has experienced significant developments and gained much attention due to its ability to fabricate complex parts, manufacture products with functionally graded materials, minimize waste, and enable low-cost customization. Despite these advantages, predicting the impact of processing parameters on the characteristics of an MAM printed clad is challenging due to the complex nature of MAM processes. Machine learning (ML) techniques can help connect the physics underlying the process and processing parameters to the clad characteristics. In this study, we introduce a hybrid approach which involves utilizing the data provided by a calibrated multi-physics computational fluid dynamic (CFD) model and experimental research for preparing the essential big dataset, and then uses a comprehensive framework consisting of various ML models to predict and understand clad characteristics. We first compile an extensive dataset by fusing experimental data into the data generated using the developed CFD model for this study. This dataset comprises critical clad characteristics, including geometrical features such as width, height, and depth, labels identifying clad quality, and processing parameters. Second, we use two sets of processing parameters for training the ML models: machine setting parameters and physics-aware parameters, along with versatile ML models and reliable evaluation metrics to create a comprehensive and scalable learning framework for predicting clad geometry and quality. This framework can serve as a basis for clad characteristics control and process optimization. The framework resolves many challenges of conventional modeling methods in MAM by solving the issue of data scarcity using a hybrid approach and introducing an efficient, accurate, and scalable platform for clad characteristics prediction and optimization.
    摘要 过去一个 décennie,金属添加itive制造(MAM)经历了重要的发展和引起了广泛关注,因为它可以制造复杂的部件,生产具有功能分布的材料的产品,最小化废弃物,并实现低成本定制。然而,预测MAM打印后皮层特性的影响因素是复杂的,因为MAM过程的自然特性。机器学习(ML)技术可以帮助将物理下面的过程和处理参数与皮层特性相连。在本研究中,我们提出了一种混合方法,利用实验室数据和CFD模型提供的数据来准备 essencial的大型数据集,然后使用包括多种ML模型的完整框架来预测和理解皮层特性。我们首先编辑了一个广泛的数据集,将实验室数据和CFD模型生成的数据融合在一起,这个数据集包括皮层特性的关键特征,如宽度、高度和深度,标签标识皮层质量,以及处理参数。第二,我们使用两组处理参数进行训练ML模型:机器设置参数和物理意识参数,以及多种可靠的ML模型和评价指标来建立一个全面、精准和可扩展的学习框架,用于预测皮层几何和质量。这个框架可以作为皮层特性控制和过程优化的基础。这个框架解决了传统模型方法在MAM中的多个挑战,例如数据不足问题,通过混合方法和引入高效、准确和可扩展的平台,以便预测皮层特性和优化过程。
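As a toy illustration of the final learning stage (processing parameters mapped to clad geometry), the sketch below uses hypothetical feature names and synthetic targets; it is not the paper's pipeline, whose dataset comes from the calibrated CFD model fused with experiments.

```python
# Generic sketch of the learning stage: processing parameters -> clad geometry.
# Feature names, target relations, and the model choice are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
# hypothetical "machine setting" inputs: laser power, scan speed, powder feed rate
X = rng.uniform([150, 5, 2], [450, 20, 10], size=(500, 3))
# stand-in targets: clad width / height / depth with mild nonlinearity + noise
Y = np.column_stack([
    0.004 * X[:, 0] - 0.02 * X[:, 1] + rng.normal(0, 0.05, 500),
    0.002 * X[:, 0] + 0.03 * X[:, 2] + rng.normal(0, 0.05, 500),
    0.001 * X[:, 0] * X[:, 2] / X[:, 1] + rng.normal(0, 0.05, 500),
])
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, Y_tr)
print("R^2 per target:", r2_score(Y_te, model.predict(X_te), multioutput="raw_values"))
```

Evaluating with $R^2$ per target mirrors the paper's choice of a relative fidelity measure rather than a domain-wide MSE.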

Self-Consuming Generative Models Go MAD

  • paper_url: http://arxiv.org/abs/2307.01850
  • repo_url: None
  • paper_authors: Sina Alemohammad, Josue Casco-Rodriguez, Lorenzo Luzi, Ahmed Imtiaz Humayun, Hossein Babaei, Daniel LeJeune, Ali Siahkoohi, Richard G. Baraniuk
  • for: Studying the properties of the self-consuming (autophagous) loops that arise when generative AI models are trained on synthetic data produced by previous generations of models.
  • methods: Analytical and empirical analysis, using state-of-the-art generative image models, of three families of autophagous loops that differ in whether fixed or fresh real training data is available across generations, and in whether samples from previous-generation models are biased to trade off data quality against diversity.
  • results: Without enough fresh real data in each generation, future generative models progressively lose quality (precision) or diversity (recall); the authors call this condition Model Autophagy Disorder (MAD), by analogy with mad cow disease.
    Abstract Seismic advances in generative AI algorithms for imagery, text, and other data types has led to the temptation to use synthetic data to train next-generation models. Repeating this process creates an autophagous (self-consuming) loop whose properties are poorly understood. We conduct a thorough analytical and empirical analysis using state-of-the-art generative image models of three families of autophagous loops that differ in how fixed or fresh real training data is available through the generations of training and in whether the samples from previous generation models have been biased to trade off data quality versus diversity. Our primary conclusion across all scenarios is that without enough fresh real data in each generation of an autophagous loop, future generative models are doomed to have their quality (precision) or diversity (recall) progressively decrease. We term this condition Model Autophagy Disorder (MAD), making analogy to mad cow disease.
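A toy caricature of an autophagous loop, assuming a one-dimensional Gaussian "generative model" refit on its own samples each generation; it only illustrates how, without fresh real data, sampling noise makes the fitted parameters drift and the estimated spread (a crude proxy for diversity) tends to shrink.

```python
# Toy autophagous loop: a Gaussian "generator" is refit, generation after
# generation, on its own samples, with no fresh real data mixed in.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.0, 1.0                   # generation-0 model, fit on real data
for gen in range(1, 201):
    samples = rng.normal(mu, sigma, size=100)   # sample the current model
    mu, sigma = samples.mean(), samples.std()   # "train" the next model on them
    if gen % 50 == 0:
        print(f"generation {gen:3d}: mean={mu:+.3f}  std={sigma:.3f}")
```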

Crossway Diffusion: Improving Diffusion-based Visuomotor Policy via Self-supervised Learning

  • paper_url: http://arxiv.org/abs/2307.01849
  • repo_url: None
  • paper_authors: Xiang Li, Varun Belagali, Jinghuan Shang, Michael S. Ryoo
  • for: robot imitation learning
  • methods: diffusion models, self-supervised learning (SSL) objective
  • results: better representation for policy learning, especially when the demonstrations have different proficiencies.
    Abstract Sequence modeling approaches have shown promising results in robot imitation learning. Recently, diffusion models have been adopted for behavioral cloning, benefiting from their exceptional capabilities in modeling complex data distribution. In this work, we propose Crossway Diffusion, a method to enhance diffusion-based visuomotor policy learning by using an extra self-supervised learning (SSL) objective. The standard diffusion-based policy generates action sequences from random noise conditioned on visual observations and other low-dimensional states. We further extend this by introducing a new decoder that reconstructs raw image pixels (and other state information) from the intermediate representations of the reverse diffusion process, and train the model jointly using the SSL loss. Our experiments demonstrate the effectiveness of Crossway Diffusion in various simulated and real-world robot tasks, confirming its advantages over the standard diffusion-based policy. We demonstrate that such self-supervised reconstruction enables better representation for policy learning, especially when the demonstrations have different proficiencies.
    摘要 sequence modeling方法在机器人模仿学习中显示了扎实的成果。最近,扩散模型在行为刻画中被采用,因为它们在处理复杂数据分布方面表现出色。在这种工作中,我们提议了跨度扩散(Crossway Diffusion),一种使用额外的自动学习(SSL)目标来增强扩散基于视 Motor 政策学习的方法。标准的扩散基于策略会根据随机噪声和视觉观察结果生成动作序列。我们进一步延伸了这种方法,通过引入一个新的解码器,将推 diffusion 过程中的中间表示重建为原始图像像素和其他状态信息,并在模型中同时使用 SSL 损失进行训练。我们的实验表明,跨度扩散在各种模拟和实际的机器人任务中具有优势,特别是当示例具有不同的技巧水平时。

Empirical Sample Complexity of Neural Network Mixed State Reconstruction

  • paper_url: http://arxiv.org/abs/2307.01840
  • repo_url: None
  • paper_authors: Haimeng Zhao, Giuseppe Carleo, Filippo Vicentini
  • for: Studying quantum state reconstruction techniques that reduce quantum shot complexity in practical applications.
  • methods: Numerically investigates different neural quantum state reconstruction techniques for mixed states (the finite-temperature Ising model), including variance reduction techniques.
  • results: Variance reduction systematically lowers the quantum resource requirements; comparing the two leading neural quantum state encodings, the Neural Density Operator and the positive operator-valued measurement representation, shows that their efficiency differs across regimes of mixedness, pointing to the need for more efficient encodings in both classical and quantum resources.
    Abstract Quantum state reconstruction using Neural Quantum States has been proposed as a viable tool to reduce quantum shot complexity in practical applications, and its advantage over competing techniques has been shown in numerical experiments focusing mainly on the noiseless case. In this work, we numerically investigate the performance of different quantum state reconstruction techniques for mixed states: the finite-temperature Ising model. We show how to systematically reduce the quantum resource requirement of the algorithms by applying variance reduction techniques. Then, we compare the two leading neural quantum state encodings of the state, namely, the Neural Density Operator and the positive operator-valued measurement representation, and illustrate their different performance as the mixedness of the target state varies. We find that certain encodings are more efficient in different regimes of mixedness and point out the need for designing more efficient encodings in terms of both classical and quantum resources.
    摘要 量子状态重建使用神经量子状态已被提议为实际应用中减少量子射频复杂性的可能工具,并其优势于竞争技术在数字实验中得到了证明。在这项工作中,我们数字实验 investigate了不同量子状态重建技术的性能在杂态场景下:finite-temperature Ising模型。我们表明如何系统地减少量子资源需求的算法,并应用变差缓和技术。然后,我们比较了两种主要的神经量子状态编码方法, namely,神经激发函数和正值算符测量表示法,并示出它们在不同杂度水平下的不同性能。我们发现某些编码在不同的杂度范围内更高效,并指出了设计更高效的编码的需求,即类比和量子资源。

Collaborative Score Distillation for Consistent Visual Synthesis

  • paper_url: http://arxiv.org/abs/2307.04787
  • repo_url: https://github.com/subin-kim-cv/CSD
  • paper_authors: Subin Kim, Kyungmin Lee, June Suk Choi, Jongheon Jeong, Kihyuk Sohn, Jinwoo Shin
  • for: Broadening the applicability and editability of text-to-image diffusion models to complex visual modalities represented as multiple images.
  • methods: Collaborative Score Distillation (CSD), based on Stein Variational Gradient Descent (SVGD), treats multiple samples as particles and combines their score functions to distill generative priors over a set of images synchronously.
  • results: Improves consistency across image sets in tasks such as editing panorama images, videos, and 3D scenes, thereby extending the applicability of text-to-image diffusion models.
    Abstract Generative priors of large-scale text-to-image diffusion models enable a wide range of new generation and editing applications on diverse visual modalities. However, when adapting these priors to complex visual modalities, often represented as multiple images (e.g., video), achieving consistency across a set of images is challenging. In this paper, we address this challenge with a novel method, Collaborative Score Distillation (CSD). CSD is based on the Stein Variational Gradient Descent (SVGD). Specifically, we propose to consider multiple samples as "particles" in the SVGD update and combine their score functions to distill generative priors over a set of images synchronously. Thus, CSD facilitates seamless integration of information across 2D images, leading to a consistent visual synthesis across multiple samples. We show the effectiveness of CSD in a variety of tasks, encompassing the visual editing of panorama images, videos, and 3D scenes. Our results underline the competency of CSD as a versatile method for enhancing inter-sample consistency, thereby broadening the applicability of text-to-image diffusion models.
    摘要 大规模文本到图像扩散模型的生成先验可以激发多种新的生成和编辑应用程序。然而,当应用这些先验到复杂的视觉模式时,保证一组图像之间的一致性是挑战。在这篇论文中,我们解决这个挑战方法是协同分数精灵(CSD)。CSD基于斯坦变分 Gradient Descent(SVGD)。我们建议将多个样本视为“粒子”在SVGD更新中,并将它们的分数函数组合以静止生成先验覆盖多个图像同步。因此,CSD使得多个图像之间的信息集成更加简单,从而实现了多个样本之间的视觉同步。我们在多种任务中展示了CSD的效果,包括修改广角图像、视频和3D场景。我们的结果表明CSD是一种多功能的方法,可以增强样本之间的一致性,从而扩大文本到图像扩散模型的应用范围。
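A sketch of the plain SVGD update that CSD builds on: each particle is moved by a kernel-weighted combination of all particles' scores plus a repulsive term. A toy analytic score stands in for a diffusion model's score network; step size, kernel bandwidth, and the target distribution are illustrative.

```python
# Plain SVGD step: phi(x_i) = (1/n) sum_j [ k(x_j, x_i) * score(x_j) + grad_{x_j} k(x_j, x_i) ].
import numpy as np

def score(x):                     # grad log p(x) for p = N(0, I); stand-in only
    return -x

def svgd_step(particles, step=0.1, sigma=1.0):
    n = len(particles)
    diff = particles[:, None, :] - particles[None, :, :]
    K = np.exp(-(diff ** 2).sum(-1) / (2 * sigma ** 2))          # k(x_i, x_j), symmetric
    grad_K = diff * K[..., None] / sigma ** 2                    # grad_{x_j} k(x_j, x_i), indexed [i, j]
    phi = (K @ score(particles) + grad_K.sum(axis=1)) / n
    return particles + step * phi

pts = np.random.default_rng(0).normal(loc=5.0, size=(64, 2))
for _ in range(200):
    pts = svgd_step(pts)
print(pts.mean(axis=0), pts.std(axis=0))   # particles drift toward N(0, I) while staying spread out
```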

DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation

  • paper_url: http://arxiv.org/abs/2307.01831
  • repo_url: https://github.com/DiT-3D/DiT-3D
  • paper_authors: Shentong Mo, Enze Xie, Ruihang Chu, Lewei Yao, Lanqing Hong, Matthias Nießner, Zhenguo Li
  • for: This paper is written for generating high-quality 3D point clouds using a novel Diffusion Transformer architecture, specifically designed for 3D shape generation.
  • methods: The paper proposes a novel Diffusion Transformer architecture called DiT-3D, which adapts the design philosophy of DiT but incorporates 3D positional and patch embeddings to adaptively aggregate input from voxelized point clouds. The paper also introduces 3D window attention to reduce computational cost in 3D shape generation.
  • results: The proposed DiT-3D achieves state-of-the-art performance in high-fidelity and diverse 3D point cloud generation on the ShapeNet dataset, with a 4.59 decrease in 1-Nearest Neighbor Accuracy and a 3.51 increase in Coverage metric compared to the state-of-the-art method.
    Abstract Recent Diffusion Transformers (e.g., DiT) have demonstrated their powerful effectiveness in generating high-quality 2D images. However, it is still being determined whether the Transformer architecture performs equally well in 3D shape generation, as previous 3D diffusion methods mostly adopted the U-Net architecture. To bridge this gap, we propose a novel Diffusion Transformer for 3D shape generation, namely DiT-3D, which can directly operate the denoising process on voxelized point clouds using plain Transformers. Compared to existing U-Net approaches, our DiT-3D is more scalable in model size and produces much higher quality generations. Specifically, the DiT-3D adopts the design philosophy of DiT but modifies it by incorporating 3D positional and patch embeddings to adaptively aggregate input from voxelized point clouds. To reduce the computational cost of self-attention in 3D shape generation, we incorporate 3D window attention into Transformer blocks, as the increased 3D token length resulting from the additional dimension of voxels can lead to high computation. Finally, linear and devoxelization layers are used to predict the denoised point clouds. In addition, our transformer architecture supports efficient fine-tuning from 2D to 3D, where the pre-trained DiT-2D checkpoint on ImageNet can significantly improve DiT-3D on ShapeNet. Experimental results on the ShapeNet dataset demonstrate that the proposed DiT-3D achieves state-of-the-art performance in high-fidelity and diverse 3D point cloud generation. In particular, our DiT-3D decreases the 1-Nearest Neighbor Accuracy of the state-of-the-art method by 4.59 and increases the Coverage metric by 3.51 when evaluated on Chamfer Distance.
    摘要 最近的扩散变换器(例如DiT)已经表现出了高质量的2D图像生成能力。然而,是否Transformer架构在3D形状生成中表现 similarly well,现在都是一个问题。因为前一些3D扩散方法主要采用U-Net架构。为了bridging这个差距,我们提出了一种新的3D扩散变换器,即DiT-3D,它可以直接对粗糙点云进行杂化处理,并使用平杂Transformers进行操作。相比现有的U-Net方法,我们的DiT-3D更加扩展性强,生成质量更高。具体来说,DiT-3D采用了Diffusion Transformer的设计哲学,但是将其修改为包括3D位置嵌入和补丁嵌入,以适应 voxelized点云的输入。为了降低3D形状生成中自我注意力的计算成本,我们引入了3D窗口注意力,并在Transformer块中应用。最后,我们使用线性和反粗糙层来预测净化后的点云。此外,我们的 transformer 架构支持高效的 fine-tuning 从2D到3D,其中预先训练的 DiT-2D checkpoint 在 ImageNet 上可以显著提高 DiT-3D 的性能。实验结果表明,我们提出的 DiT-3D 在 ShapeNet 数据集上实现了状态可见的高精度和多样化3D点云生成。具体来说,我们的 DiT-3D 在1-Nearest Neighbor Accuracy 和 Coverage 指标上降低了state-of-the-art 方法的值,分别降低了4.59和3.51。
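A sketch of the voxelize, 3D patch-embed, and 3D positional-embedding front end described above; the patch size, hidden width, and voxel resolution are illustrative, not the paper's configuration.

```python
# Voxelized point cloud -> non-overlapping 3D patches -> tokens with a learned
# 3D positional embedding, ready for plain (window-attention) Transformer blocks.
import torch
import torch.nn as nn

def voxelize(points, resolution=32):
    """points: (N, 3) in [-1, 1]^3 -> occupancy grid of shape (1, 1, R, R, R)."""
    idx = ((points + 1) / 2 * (resolution - 1)).long().clamp(0, resolution - 1)
    grid = torch.zeros(1, 1, resolution, resolution, resolution)
    grid[0, 0, idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return grid

patch, dim, res = 4, 384, 32
patch_embed = nn.Conv3d(1, dim, kernel_size=patch, stride=patch)   # 3D patch embedding
num_patches = (res // patch) ** 3
pos_embed = nn.Parameter(torch.zeros(1, num_patches, dim))         # learned 3D positional embedding

voxels = voxelize(torch.rand(2048, 3) * 2 - 1)
tokens = patch_embed(voxels).flatten(2).transpose(1, 2) + pos_embed  # (1, 512, 384)
# `tokens` then pass through Transformer blocks with 3D window attention and a
# linear/devoxelization head that predicts the denoised point cloud.
```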

Deconstructing Data Reconstruction: Multiclass, Weight Decay and General Losses

  • paper_url: http://arxiv.org/abs/2307.01827
  • repo_url: None
  • paper_authors: Gon Buzaglo, Niv Haim, Gilad Yehudai, Gal Vardi, Yakir Oz, Yaniv Nikankin, Michal Irani
  • for: Investigating the memorization of training samples in neural networks and how it can be exploited to reconstruct training data from network parameters.
  • methods: Extends training-sample reconstruction from multilayer perceptron binary classifiers to multiclass and convolutional neural networks, and derives a more general reconstruction scheme applicable to a wider range of loss functions, such as regression losses.
  • results: Using weight decay during training increases reconstructability in both quantity and quality; the number of neurons relative to the number of training samples also influences reconstructability.
    Abstract Memorization of training data is an active research area, yet our understanding of the inner workings of neural networks is still in its infancy. Recently, Haim et al. (2022) proposed a scheme to reconstruct training samples from multilayer perceptron binary classifiers, effectively demonstrating that a large portion of training samples are encoded in the parameters of such networks. In this work, we extend their findings in several directions, including reconstruction from multiclass and convolutional neural networks. We derive a more general reconstruction scheme which is applicable to a wider range of loss functions such as regression losses. Moreover, we study the various factors that contribute to networks' susceptibility to such reconstruction schemes. Intriguingly, we observe that using weight decay during training increases reconstructability both in terms of quantity and quality. Additionally, we examine the influence of the number of neurons relative to the number of training samples on the reconstructability.
    摘要 neural network 的吸收训练数据是一个活跃的研究领域,然而我们对其内部工作的理解仍然处于初期阶段。最近,海沃特等人(2022)提出了一种方案,可以从多层感知器二分类网络中重建训练样本,Effectively demonstrating that a large portion of training samples are encoded in the parameters of such networks。在这项工作中,我们将这些发现扩展到多类和卷积神经网络,并 derivate a more general reconstruction scheme 可以应用于更广泛的损失函数,如回归损失。此外,我们研究了不同因素对神经网络的重建性的影响,发现使用权重衰减 durante 训练可以提高重建性 both in terms of quantity and quality。此外,我们还研究了神经网络的 neurons 和训练样本的数量之间的关系。

Structural Balance and Random Walks on Complex Networks with Complex Weights

  • paper_url: http://arxiv.org/abs/2307.01813
  • repo_url: None
  • paper_authors: Yu Tian, Renaud Lambiotte
  • for: This paper focuses on the study of complex-weighted networks, specifically investigating their structural and dynamical properties when the weight matrix is Hermitian.
  • methods: The authors use concepts from signed graphs to classify complex-weighted networks based on structural balance and explore the shared spectral properties within each type. They also apply the results to characterize the dynamics of random walks on these networks.
  • results: The paper shows that local consensus can be achieved asymptotically when the graph is structurally balanced, while global consensus will be obtained when it is strictly unbalanced. The authors also propose a spectral clustering algorithm and explore the performance of the algorithm on both synthetic and real networks.
    Abstract Complex numbers define the relationship between entities in many situations. A canonical example would be the off-diagonal terms in a Hamiltonian matrix in quantum physics. Recent years have seen an increasing interest to extend the tools of network science when the weight of edges are complex numbers. Here, we focus on the case when the weight matrix is Hermitian, a reasonable assumption in many applications, and investigate both structural and dynamical properties of the complex-weighted networks. Building on concepts from signed graphs, we introduce a classification of complex-weighted networks based on the notion of structural balance, and illustrate the shared spectral properties within each type. We then apply the results to characterise the dynamics of random walks on complex-weighted networks, where local consensus can be achieved asymptotically when the graph is structurally balanced, while global consensus will be obtained when it is strictly unbalanced. Finally, we explore potential applications of our findings by generalising the notion of cut, and propose an associated spectral clustering algorithm. We also provide further characteristics of the magnetic Laplacian, associating directed networks to complex-weighted ones. The performance of the algorithm is verified on both synthetic and real networks.
    摘要 复杂数字定义了实体之间的关系在许多情况下。一个典型的例子是量子物理中的哈密顿矩阵中的偏置项。过去几年,有越来越多的研究者想要扩展网络科学中的工具,当Weight of edges是复数时。我们在这里关注Hermitian矩阵的情况,这是许多应用中的合理假设。我们研究了复数权重网络的结构和动态特性,并基于签名图的概念引入了复数权重网络的分类。我们发现在不同类型的网络中,存在共同的 спектраль性质。然后,我们应用结果来描述复杂权重网络上Random walk的动态,当网络是结构均衡的时,本地协同可以在极限上 achievable,而全球协同则需要网络是严格不均衡的。最后,我们探讨了我们的发现的应用,包括通过扩展割的概念和相关的 спектраль划分算法。我们还提供了复杂 Laplacian的性能,将导向网络与复数权重网络相关联。我们的实验表明,我们的算法在 synthetic 和实际网络上都能够 достичь好的性能。
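A sketch of one way to build a Hermitian (magnetic) Laplacian for a directed graph and cluster on its eigenvectors; the charge parameter and the unnormalized Laplacian below are illustrative assumptions rather than the paper's exact construction.

```python
# Magnetic Laplacian: symmetrized weights carry the magnitude, a complex phase
# encodes edge direction; the result is Hermitian, so eigenvalues are real.
import numpy as np
from sklearn.cluster import KMeans

def magnetic_laplacian(A, g=0.25):
    """A: (possibly asymmetric) nonnegative adjacency matrix. Returns Hermitian L."""
    As = (A + A.T) / 2                            # symmetrized weights
    theta = 2 * np.pi * g * (A - A.T)             # antisymmetric phase from direction
    H = As * np.exp(1j * theta)                   # Hermitian complex-weighted adjacency
    D = np.diag(As.sum(axis=1))
    return D - H

def spectral_clusters(A, k=2, g=0.25):
    L = magnetic_laplacian(A, g)
    vals, vecs = np.linalg.eigh(L)                # ascending real eigenvalues
    emb = np.hstack([vecs[:, :k].real, vecs[:, :k].imag])
    return KMeans(n_clusters=k, n_init=10).fit_predict(emb)

# toy directed graph with two denser groups
rng = np.random.default_rng(0)
A = (rng.random((20, 20)) < 0.1).astype(float)
A[:10, :10] = (rng.random((10, 10)) < 0.6)
A[10:, 10:] = (rng.random((10, 10)) < 0.6)
np.fill_diagonal(A, 0)
print(spectral_clusters(A, k=2))
```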

Capturing Local Temperature Evolution during Additive Manufacturing through Fourier Neural Operators

  • paper_url: http://arxiv.org/abs/2307.01804
  • repo_url: None
  • paper_authors: Jiangce Chen, Wenzhuo Xu, Martha Baldwin, Björn Nijhuis, Ton van den Boogaard, Noelia Grande Gutiérrez, Sneha Prabha Narra, Christopher McComb
  • for: The purpose of this research is to improve the performance of additive manufacturing technologies by quickly simulating thermal behavior.
  • methods: The paper uses a Fourier Neural Operator to capture the local temperature evolution during the manufacturing process.
  • results: The model shows high accuracy in numerical simulations and maintains generalizability to different geometries.
    Abstract High-fidelity, data-driven models that can quickly simulate thermal behavior during additive manufacturing (AM) are crucial for improving the performance of AM technologies in multiple areas, such as part design, process planning, monitoring, and control. However, the complexities of part geometries make it challenging for current models to maintain high accuracy across a wide range of geometries. Additionally, many models report a low mean square error (MSE) across the entire domain (part). However, in each time step, most areas of the domain do not experience significant changes in temperature, except for the heat-affected zones near recent depositions. Therefore, the MSE-based fidelity measurement of the models may be overestimated. This paper presents a data-driven model that uses Fourier Neural Operator to capture the local temperature evolution during the additive manufacturing process. In addition, the authors propose to evaluate the model using the $R^2$ metric, which provides a relative measure of the model's performance compared to using mean temperature as a prediction. The model was tested on numerical simulations based on the Discontinuous Galerkin Finite Element Method for the Direct Energy Deposition process, and the results demonstrate that the model achieves high fidelity as measured by $R^2$ and maintains generalizability to geometries that were not included in the training process.
    摘要 高精度、数据驱动的模型可以快速模拟附加制造过程中的热性能,这些模型在多个领域,如部件设计、过程规划、监测和控制方面,都有提高附加制造技术的表现。然而,部件的复杂 геометри Structure 使得当前的模型难以保持高精度 across 各种 geometries。此外,许多模型报告了 across 整个领域 ($part$) 的低 Mean Square Error ($MSE$),但在每个时间步骤中,大多数领域并不经历 significannot 的温度变化,只有近 recent depositions 的热效应区域。因此,基于 $MSE$ 的模型准确性测试可能受到过度估计。本文提出了一种基于 Fourier Neural Operator 的数据驱动模型,用于捕捉附加制造过程中的本地温度演化。此外,作者们提议使用 $R^2$ 指标来评估模型的性能,$R^2$ 指标为模型的Relative 性能指标,可以与使用 Mean Temperature 作为预测的 $R^2$ 指标进行比较。模型在基于 Discontinuous Galerkin Finite Element Method 的数值 simulations 上进行测试,结果表明该模型在 $R^2$ 指标下达到了高准确性,并且可以在不包含在训练过程中的 geometry 上保持通用性。
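A minimal 1-D Fourier layer of the kind a Fourier Neural Operator stacks: FFT the field, mix only the lowest frequency modes with learned complex weights, and inverse-FFT back. Channel counts, mode counts, and input shapes are illustrative.

```python
# One spectral convolution layer, the core building block of an FNO.
import torch
import torch.nn as nn

class SpectralConv1d(nn.Module):
    def __init__(self, in_ch, out_ch, modes):
        super().__init__()
        self.modes = modes
        scale = 1.0 / (in_ch * out_ch)
        self.weight = nn.Parameter(scale * torch.randn(in_ch, out_ch, modes, dtype=torch.cfloat))

    def forward(self, x):                      # x: (batch, in_ch, n)
        x_ft = torch.fft.rfft(x)               # (batch, in_ch, n//2 + 1)
        out_ft = torch.zeros(x.shape[0], self.weight.shape[1], x_ft.shape[-1],
                             dtype=torch.cfloat, device=x.device)
        out_ft[..., :self.modes] = torch.einsum("bim,iom->bom",
                                                x_ft[..., :self.modes], self.weight)
        return torch.fft.irfft(out_ft, n=x.shape[-1])

layer = SpectralConv1d(in_ch=4, out_ch=4, modes=12)
temps = torch.randn(8, 4, 256)                 # e.g. temperature fields along a scan path
print(layer(temps).shape)                      # torch.Size([8, 4, 256])
```

In practice such layers are combined with pointwise convolutions and nonlinearities; because the learned weights act on frequency modes, the operator can be queried on discretizations it was not trained on, which is what makes it attractive for heat-affected-zone prediction across geometries.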

Edge-aware Multi-task Network for Integrating Quantification Segmentation and Uncertainty Prediction of Liver Tumor on Multi-modality Non-contrast MRI

  • paper_url: http://arxiv.org/abs/2307.01798
  • repo_url: None
  • paper_authors: Xiaojiao Xiao, Qinmin Hu, Guanghui Wang
  • for: Liver tumor diagnosis and analysis
  • methods: Multi-modality non-contrast magnetic resonance imaging (NCMRI) fusion, edge-aware feature aggregation module (EaFA), and multi-task learning
  • results: Outperformed state-of-the-art methods with a dice similarity coefficient of 90.01$\pm$1.23 and a mean absolute error of 2.72$\pm$0.58 mm for MD.
    Abstract Simultaneous multi-index quantification, segmentation, and uncertainty estimation of liver tumors on multi-modality non-contrast magnetic resonance imaging (NCMRI) are crucial for accurate diagnosis. However, existing methods lack an effective mechanism for multi-modality NCMRI fusion and accurate boundary information capture, making these tasks challenging. To address these issues, this paper proposes a unified framework, namely edge-aware multi-task network (EaMtNet), to associate multi-index quantification, segmentation, and uncertainty of liver tumors on the multi-modality NCMRI. The EaMtNet employs two parallel CNN encoders and the Sobel filters to extract local features and edge maps, respectively. The newly designed edge-aware feature aggregation module (EaFA) is used for feature fusion and selection, making the network edge-aware by capturing long-range dependency between feature and edge maps. Multi-tasking leverages prediction discrepancy to estimate uncertainty and improve segmentation and quantification performance. Extensive experiments are performed on multi-modality NCMRI with 250 clinical subjects. The proposed model outperforms the state-of-the-art by a large margin, achieving a dice similarity coefficient of 90.01$\pm$1.23 and a mean absolute error of 2.72$\pm$0.58 mm for MD. The results demonstrate the potential of EaMtNet as a reliable clinical-aided tool for medical image analysis.
    摘要 simultanous多指标评估、分割和不确定度估计liver肿瘤在多Modal非contrast磁共振成像(NCMRI)中是诊断精准的关键。然而,现有方法缺乏有效的多Modal NCMRI融合机制和准确边界信息捕获机制,使这些任务变得困难。为解决这些问题,这篇论文提出了一个统一框架,即edge-aware多任务网络(EaMtNet),用于 associating multi-index评估、分割和不确定度估计liver肿瘤在多Modal NCMRI中。EaMtNet使用了两个并行的CNN Encoder和Sobel滤波器来提取本地特征和边图,分别。新设计的edge-aware特征聚合模块(EaFA)用于特征融合和选择,使网络变得edge-aware,捕捉特征和边图之间的长距离依赖关系。多任务利用预测差异来估计不确定度和提高分割和评估性能。广泛的实验在多Modal NCMRI上进行,涉及250名临床实验者。提出的模型在多Modal NCMRI中表现出色,达到了dice相似度系数90.01±1.23和平均绝对误差2.72±0.58mm for MD。结果表明EaMtNet可能成为一种可靠的临床辅助工具 для医学影像分析。
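A small sketch of the fixed Sobel-filter edge-map extraction that feeds the edge-aware aggregation; the kernel values are the standard Sobel operators, and the rest of EaMtNet (the EaFA fusion module and the multi-task heads) is omitted.

```python
# Sobel edge maps from an NCMRI slice via fixed convolution kernels.
import torch
import torch.nn.functional as F

def sobel_edges(img):
    """img: (B, 1, H, W) grayscale slice -> gradient-magnitude edge map."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(img, kx, padding=1)
    gy = F.conv2d(img, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-12)

mri_slice = torch.rand(2, 1, 256, 256)     # one NCMRI modality, batch of 2
edge_map = sobel_edges(mri_slice)          # passed to the edge-aware feature aggregation
```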

eess.IV - 2023-07-05

Dual Arbitrary Scale Super-Resolution for Multi-Contrast MRI

  • paper_url: http://arxiv.org/abs/2307.02334
  • repo_url: https://github.com/jmzhang79/dual-arbnet
  • paper_authors: Jiamiao Zhang, Yichen Chi, Jun Lyu, Wenming Yang, Yapeng Tian
  • for: Improving the resolution of magnetic resonance imaging (MRI) so that lesions are better visualized for diagnosis and treatment.
  • methods: Dual-ArbNet, an implicit neural representation based dual-arbitrary multi-contrast MRI super-resolution (SR) method that decouples the resolutions of the target and reference images, fuses multi-contrast features with an implicit decoder, and is trained with a curriculum learning strategy.
  • results: Outperforms state-of-the-art approaches under different scale factors on two public MRI datasets and shows strong potential for clinical practice.
    Abstract Limited by imaging systems, the reconstruction of Magnetic Resonance Imaging (MRI) images from partial measurement is essential to medical imaging research. Benefiting from the diverse and complementary information of multi-contrast MR images in different imaging modalities, multi-contrast Super-Resolution (SR) reconstruction is promising to yield SR images with higher quality. In the medical scenario, to fully visualize the lesion, radiologists are accustomed to zooming the MR images at arbitrary scales rather than using a fixed scale, as used by most MRI SR methods. In addition, existing multi-contrast MRI SR methods often require a fixed resolution for the reference image, which makes acquiring reference images difficult and imposes limitations on arbitrary scale SR tasks. To address these issues, we proposed an implicit neural representations based dual-arbitrary multi-contrast MRI super-resolution method, called Dual-ArbNet. First, we decouple the resolution of the target and reference images by a feature encoder, enabling the network to input target and reference images at arbitrary scales. Then, an implicit fusion decoder fuses the multi-contrast features and uses an Implicit Decoding Function~(IDF) to obtain the final MRI SR results. Furthermore, we introduce a curriculum learning strategy to train our network, which improves the generalization and performance of our Dual-ArbNet. Extensive experiments in two public MRI datasets demonstrate that our method outperforms state-of-the-art approaches under different scale factors and has great potential in clinical practice.
    摘要 限于快照系统,重建快照成像(MRI)图像从部分测量是医学成像研究中的关键。利用不同和补充的多比特MR成像模式的多比特超分辨(SR)重建可以获得更高质量的SR图像。在医疗场景下,为了全面显示肿瘤,辐射医生通常会在自定义的比例下缩放MR图像,而不是使用固定比例,这与大多数MRI SR方法不同。此外,现有的多比特MRI SR方法通常需要固定的参参图像分辨率,这使得获得参考图像困难,并对自定义比例SR任务带来限制。为解决这些问题,我们提出了基于卷积神经表示的双自由多比特MRI超分辨方法,称为Dual-ArbNet。首先,我们将目标和参考图像的分辨率解耦通过特征编码器,使网络可以输入自定义的目标和参考图像。然后,我们使用卷积叠加器将多比特特征进行卷积叠加,并使用偏函数IDF获取最终的MRI SR结果。此外,我们引入了课程学习策略来训练我们的网络,这有助于提高我们Dual-ArbNet的一般化和性能。广泛的实验在两个公共MRI数据集上表明,我们的方法在不同的比例因子下表现出色,有很好的潜在应用前景。

Joint Hierarchical Priors and Adaptive Spatial Resolution for Efficient Neural Image Compression

  • paper_url: http://arxiv.org/abs/2307.02273
  • repo_url: None
  • paper_authors: Ahmed Ghorbel, Wassim Hamidouche, Luce Morin
  • for: Improving neural image compression (NIC) by building on the Transformer-based transform coding framework SwinT-ChARM to raise both efficiency and quality.
  • methods: A more straightforward yet effective Transformer-based channel-wise auto-regressive prior model enhances SwinT-ChARM, and a learnable scaling module with a ConvNeXt-based pre-/post-processor extracts more compact latent codes.
  • results: On benchmark datasets, the framework significantly improves the trade-off between coding efficiency and decoder complexity compared with the VVC reference encoder (VTM-18.0) and the SwinT-ChARM neural codec.
    Abstract Recently, the performance of neural image compression (NIC) has steadily improved thanks to the last line of study, reaching or outperforming state-of-the-art conventional codecs. Despite significant progress, current NIC methods still rely on ConvNet-based entropy coding, limited in modeling long-range dependencies due to their local connectivity and the increasing number of architectural biases and priors, resulting in complex underperforming models with high decoding latency. Motivated by the efficiency investigation of the Tranformer-based transform coding framework, namely SwinT-ChARM, we propose to enhance the latter, as first, with a more straightforward yet effective Tranformer-based channel-wise auto-regressive prior model, resulting in an absolute image compression transformer (ICT). Through the proposed ICT, we can capture both global and local contexts from the latent representations and better parameterize the distribution of the quantized latents. Further, we leverage a learnable scaling module with a sandwich ConvNeXt-based pre-/post-processor to accurately extract more compact latent codes while reconstructing higher-quality images. Extensive experimental results on benchmark datasets showed that the proposed framework significantly improves the trade-off between coding efficiency and decoder complexity over the versatile video coding (VVC) reference encoder (VTM-18.0) and the neural codec SwinT-ChARM. Moreover, we provide model scaling studies to verify the computational efficiency of our approach and conduct several objective and subjective analyses to bring to the fore the performance gap between the adaptive image compression transformer (AICT) and the neural codec SwinT-ChARM.
    摘要 近些年,神经网络图像压缩(NIC)的性能已经逐渐提高,达到或超越传统编码器的状态元。 DESPITE 这些进步,当前的 NIC 方法仍然基于 ConvNet 来实现 entropy coding,受到本地连接性的限制,以及逐渐增加的建筑学偏好和先验,导致复杂的不够表现的模型和高解码延迟。 被Transformer 基于 transform coding 框架的 SwinT-ChARM 的效率调查所驱动,我们提议使用更直观而有效的 Tranformer 基于通道 wise auto-regressive prior 模型,以提高后者。通过我们的提议的 ICT,我们可以从 latent 表示中捕捉全局和局部上下文,更好地参数化归一化的量化 latent。此外,我们利用一个可学习的缩放模块,并在 ConvNeXt 基于的预处理/后处理器中使用它来准确地提取更紧凑的 latent 代码,并在重建更高质量的图像。对于一系列的 benchmark 数据集,我们进行了广泛的实验研究,并证明了我们的框架可以显著提高对 coding 效率和解码器复杂度的质量权衡。此外,我们还提供了模型缩放研究,以证明我们的方法的计算效率。最后,我们进行了一些对象和主观分析,以强调 AICT 与 SwinT-ChARM 之间的性能差距。

Direct segmentation of brain white matter tracts in diffusion MRI

  • paper_url: http://arxiv.org/abs/2307.02223
  • repo_url: None
  • paper_authors: Hamza Kebiri, Ali Gholipour, Meritxell Bach Cuadra, Davood Karimi
  • for: Segmentation of brain white matter tracts, the bundles that connect distinct regions of the brain.
  • methods: A deep learning method that segments white matter tracts directly from diffusion MRI data, sidestepping intermediate computations such as tractography or fiber orientation density estimation.
  • results: Segmentation accuracy on par with the state of the art (mean Dice Similarity Coefficient of 0.826), with far superior generalizability to the undersampled data typical of clinical studies and to data obtained with different acquisition protocols.
    Abstract The brain white matter consists of a set of tracts that connect distinct regions of the brain. Segmentation of these tracts is often needed for clinical and research studies. Diffusion-weighted MRI offers unique contrast to delineate these tracts. However, existing segmentation methods rely on intermediate computations such as tractography or estimation of fiber orientation density. These intermediate computations, in turn, entail complex computations that can result in unnecessary errors. Moreover, these intermediate computations often require dense multi-shell measurements that are unavailable in many clinical and research applications. As a result, current methods suffer from low accuracy and poor generalizability. Here, we propose a new deep learning method that segments these tracts directly from the diffusion MRI data, thereby sidestepping the intermediate computation errors. Our experiments show that this method can achieve segmentation accuracy that is on par with the state of the art methods (mean Dice Similarity Coefficient of 0.826). Compared with the state of the art, our method offers far superior generalizability to undersampled data that are typical of clinical studies and to data obtained with different acquisition protocols. Moreover, we propose a new method for detecting inaccurate segmentations and show that it is more accurate than standard methods that are based on estimation uncertainty quantification. The new methods can serve many critically important clinical and scientific applications that require accurate and reliable non-invasive segmentation of white matter tracts.
    摘要 脑白atter包括一组通过不同脑区域的脑 tract, segmentation 这些 tract 常需要在临床和研究实验中进行。Diffusion-weighted MRI 提供了一个唯一的对比,以定义这些 tract。然而,现有的 segmentation 方法通常需要中间计算,如 tractography 或 fibre orientation density 的估计。这些中间计算可能会导致多余的错误,并且常常需要 dense multi-shell measurements,这些 measurements 在许多临床和研究应用中不可得。因此,现有的方法受到低精度和差异化的限制。在这里,我们提出了一种新的深度学习方法,可以直接从 diffusion MRI 数据中分割 white matter tract,并且避免中间计算的错误。我们的实验表明,这种方法可以达到与现有方法相同的 segmentation 精度(mean Dice Similarity Coefficient 0.826)。相比之下,我们的方法在不同的数据采样和数据采集协议下具有更好的普适性。此外,我们还提出了一种新的方法来检测不准确的分割,并证明它比标准的方法更加准确。这些新方法可以为许多重要的临床和科学应用提供准确和可靠的非侵入式 white matter tract 的分割。

Compound Attention and Neighbor Matching Network for Multi-contrast MRI Super-resolution

  • paper_url: http://arxiv.org/abs/2307.02148
  • repo_url: None
  • paper_authors: Wenxuan Chen, Sirui Wu, Shuai Wang, Zhongsen Li, Jia Yang, Huifeng Yao, Xiaomeng Li, Xiaolei Song
  • for: A new multi-contrast MRI super-resolution (SR) network architecture that addresses shortcomings of existing methods, such as ill-suited use of reference features and the neglect of channel-dimension self-attention.
  • methods: CANM-Net combines a compound self-attention mechanism, which captures dependencies in both the spatial and channel dimensions, with neighborhood-based feature-matching modules that match degraded features to adjacent reference features and fuse them to produce high-quality images.
  • results: Outperforms state-of-the-art methods on SR experiments with the IXI, fastMRI, and real-world scanning datasets, and still performs well when reference and degraded images are imperfectly registered, indicating good potential for clinical applications.
    Abstract Multi-contrast magnetic resonance imaging (MRI) reflects information about human tissue from different perspectives and has many clinical applications. By utilizing the complementary information among different modalities, multi-contrast super-resolution (SR) of MRI can achieve better results than single-image super-resolution. However, existing methods of multi-contrast MRI SR have the following shortcomings that may limit their performance: First, existing methods either simply concatenate the reference and degraded features or exploit global feature-matching between them, which are unsuitable for multi-contrast MRI SR. Second, although many recent methods employ transformers to capture long-range dependencies in the spatial dimension, they neglect that self-attention in the channel dimension is also important for low-level vision tasks. To address these shortcomings, we proposed a novel network architecture with compound-attention and neighbor matching (CANM-Net) for multi-contrast MRI SR: The compound self-attention mechanism effectively captures the dependencies in both spatial and channel dimension; the neighborhood-based feature-matching modules are exploited to match degraded features and adjacent reference features and then fuse them to obtain the high-quality images. We conduct experiments of SR tasks on the IXI, fastMRI, and real-world scanning datasets. The CANM-Net outperforms state-of-the-art approaches in both retrospective and prospective experiments. Moreover, the robustness study in our work shows that the CANM-Net still achieves good performance when the reference and degraded images are imperfectly registered, proving good potential in clinical applications.
    摘要 多模式磁共振成像(MRI)可以从不同角度获取人体组织信息,有广泛的临床应用。通过利用不同模式之间的共趋性信息,多模式超解析(SR)的MRI可以实现更好的结果,而存在的方法却有以下缺点:首先,现有方法可能会简单地 concatenate 参考和压缩特征,或者利用全局特征匹配,这些方法不适合多模式MRI SR。其次,虽然许多最新的方法使用 transformer 来捕捉空间维度的长距离依赖关系,但它们忽略了通道维度的自我注意力的重要性,这对低级视觉任务来说非常重要。为了解决这些缺点,我们提出了一种新的网络架构,即嵌入式自注意和邻居匹配网络(CANM-Net),用于多模式MRI SR:嵌入式自注意机制可以有效捕捉空间和通道维度之间的依赖关系;邻居特征匹配模块可以将压缩特征和相邻参考特征匹配并融合,以获得高质量的图像。我们在 IXI、fastMRI 和实际扫描数据集上进行 SR 任务的实验,CANM-Net 比 estado-of-the-art 方法在回顾和前瞻性实验中表现出色。此外,我们的 robustness 研究显示,CANM-Net 在参考和压缩图像不完美匹配时仍能保持良好的性能,这证明它在临床应用中具有良好的潜力。

A Mini-Batch Quasi-Newton Proximal Method for Constrained Total-Variation Nonlinear Image Reconstruction

  • paper_url: http://arxiv.org/abs/2307.02043
  • repo_url: None
  • paper_authors: Tao Hong, Thanh-an Pham, Irad Yavneh, Michael Unser
  • for: Computational imaging with accurate nonlinear physical models, which enable high-quality reconstructions but are computationally demanding.
  • methods: A mini-batch quasi-Newton proximal method (BQNPM) tailored to image reconstruction with total-variation regularization, which computes a weighted proximal mapping at a cost similar to the proximal mapping in accelerated stochastic proximal methods (ASPMs).
  • results: BQNPM converges in fewer iterations than ASPMs; experiments on three-dimensional inverse-scattering problems with linear and nonlinear physical models, on simulated and real data, show its effectiveness and efficiency.
    Abstract Over the years, computational imaging with accurate nonlinear physical models has drawn considerable interest due to its ability to achieve high-quality reconstructions. However, such nonlinear models are computationally demanding. A popular choice for solving the corresponding inverse problems is accelerated stochastic proximal methods (ASPMs), with the caveat that each iteration is expensive. To overcome this issue, we propose a mini-batch quasi-Newton proximal method (BQNPM) tailored to image-reconstruction problems with total-variation regularization. It involves an efficient approach that computes a weighted proximal mapping at a cost similar to that of the proximal mapping in ASPMs. However, BQNPM requires fewer iterations than ASPMs to converge. We assess the performance of BQNPM on three-dimensional inverse-scattering problems with linear and nonlinear physical models. Our results on simulated and real data show the effectiveness and efficiency of BQNPM,
    摘要 随着时间的推移,计算成像技术已经吸引了广泛的关注,因为它可以实现高质量的重建。然而,这些非线性模型在计算上具有挑战性。一种受欢迎的解决方案是加速随机邻域方法(ASPMs),但每个迭代都是贵夫。为了解决这个问题,我们提议一种基于图像重建问题的权重贝叶斯方法(BQNPM)。这种方法具有计算贝叶斯映射的效率,但需要 fewer than ASPMs 的迭代次数才能达到 convergence。我们对三维反射问题进行了线性和非线性物理模型的测试,结果表明 BQNPM 的效果和效率。

Joint Recovery of T1, T2* and Proton Density Maps Using a Bayesian Approach with Parameter Estimation and Complementary Undersampling Patterns

  • paper_url: http://arxiv.org/abs/2307.02015
  • repo_url: None
  • paper_authors: Shuai Huang, James J. Lah, Jason W. Allen, Deqiang Qiu
    for: This paper aims to improve the quality of quantitative MR images recovered from undersampled measurements by incorporating the signal model of the variable-flip-angle (VFA) multi-echo 3D gradient-echo (GRE) method into the reconstruction of $T_1$, $T_2^*$, and proton density (PD) maps.methods: The proposed approach is based on a probabilistic Bayesian formulation of the recovery problem, and uses approximate message passing with built-in parameter estimation (AMP-PE) to jointly recover distribution parameters, VFA multi-echo images, and $T_1$, $T_2^*$, and PD maps without the need for hyperparameter tuning.results: The proposed AMP-PE approach outperforms the state-of-the-art $l1$-norm minimization approach in terms of reconstruction performance, and adopting complementary undersampling patterns across different flip angles and/or echo times yields the best performance for $T_2^*$ and proton density mappings.
    Abstract Purpose: To improve the quality of quantitative MR images recovered from undersampled measurements, we incorporate the signal model of the variable-flip-angle (VFA) multi-echo 3D gradient-echo (GRE) method into the reconstruction of $T_1$, $T_2^*$ and proton density (PD) maps. Additionally, we investigate the use of complementary undersampling patterns to determine optimal undersampling schemes for quantitative MRI. Theory: We propose a probabilistic Bayesian formulation of the recovery problem. Our proposed approach, approximate message passing with built-in parameter estimation (AMP-PE), enables the joint recovery of distribution parameters, VFA multi-echo images, and $T_1$, $T_2^*$, and PD maps without the need for hyperparameter tuning. Methods: We conducted both retrospective and prospective undersampling to obtain Fourier measurements using variable-density and Poisson-disk patterns. We investigated a variety of undersampling schemes, adopting complementary patterns across different flip angles and/or echo times. Results: AMP-PE adopts a joint recovery strategy, it outperforms the state-of-the-art $l1$-norm minimization approach that follows a decoupled recovery strategy. For $T_1$ mapping, employing fixed sampling patterns across different echo times produced the best performance. Whereas for $T_2^*$ and proton density mappings, using complementary sampling patterns across different flip angles yielded the best performance. Conclusion: AMP-PE achieves better performance by combining information from both the MR signal model and the sparse prior on VFA multi-echo images. It is equipped with automatic and adaptive parameter estimation, and works naturally with the clinical prospective undersampling scheme.
    摘要 目的:提高 Undersampled 测量中的量子 MR 图像质量,我们在 reconstruction 中 incorporate 变量扭矩(VFA)多echo 3D 梯阶 echo(GRE)方法的信号模型。此外,我们还 investigate 使用 complementary 抽象样本来确定最佳的 Undersampling 方案。理论:我们提出了一种 Bayesian 形式的回归问题。我们的提议方法为 approximate message passing with built-in parameter estimation(AMP-PE),它可以同时回归分布参数、VFA multi-echo 图像和 $T_1$, $T_2^*$ 和 proton density(PD)图像,无需进行hyperparameter 调整。方法:我们在 retrospective 和 prospectively 抽象到 obtain Fourier 测量。我们 investigate 了不同的抽象方案,包括 variable-density 和 Poisson-disk 模式。结果:AMP-PE 采用了联合回归策略,其表现更好于 state-of-the-art $l1$-norm 最小化方法,后者采用了解 Coupled 回归策略。对 $T_1$ 图像,使用 fixes 抽象模式 across 不同的 echo times 得到了最佳性能。而对 $T_2^*$ 和 proton density 图像,使用 complementary 抽象模式 across 不同的扭矩 angles 得到了最佳性能。结论:AMP-PE 通过结合 MR 信号模型和 VFA multi-echo 图像的稀热先验来提高量子 MR 图像的质量。它具有自动和适应参数估计,并可以自然地与临床的前向抽象 schemes 结合。
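For reference, here is a direct implementation of the standard variable-flip-angle multi-echo spoiled GRE signal model that the recovery is built around; the tissue parameters and acquisition settings below are illustrative values, not those used in the paper.

```python
# VFA multi-echo spoiled GRE signal model:
#   S(alpha, TE) = PD * sin(alpha) * (1 - E1) / (1 - cos(alpha) * E1) * exp(-TE / T2*),
#   with E1 = exp(-TR / T1). All times in milliseconds.
import numpy as np

def vfa_multiecho_signal(pd, t1, t2_star, flip_deg, te, tr):
    """Return the signal for every (flip angle, echo time) pair."""
    alpha = np.deg2rad(np.asarray(flip_deg))[:, None]      # (n_flip, 1)
    te = np.asarray(te)[None, :]                           # (1, n_echo)
    e1 = np.exp(-tr / t1)
    return pd * np.sin(alpha) * (1 - e1) / (1 - np.cos(alpha) * e1) * np.exp(-te / t2_star)

# e.g. white-matter-like tissue, two flip angles, four echo times
S = vfa_multiecho_signal(pd=1.0, t1=900.0, t2_star=50.0,
                         flip_deg=[5, 20], te=[5, 10, 15, 20], tr=40.0)
print(S.shape, S)    # (2, 4) signal matrix the forward model predicts per voxel
```

Joint recovery then amounts to inverting this forward model per voxel from undersampled Fourier measurements, which is where the sparse prior and the AMP-PE parameter estimation come in.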

Unsupervised Spectral Demosaicing with Lightweight Spectral Attention Networks

  • paper_url: http://arxiv.org/abs/2307.01990
  • repo_url: None
  • paper_authors: Kai Feng, Yongqiang Zhao, Seong G. Kong, Haijin Zeng
  • for: A deep-learning-based spectral demosaicing technique trained in an unsupervised manner for high-quality reconstruction of real-world spectral mosaic images.
  • methods: An unsupervised framework consisting of a mosaic loss function, a corresponding model structure, a transformation strategy, and an early-stopping strategy; the spectral attention module is made lightweight by splitting the attention tensor into spatial-dimension matrices and a channel-dimension vector.
  • results: Outperforms conventional unsupervised methods in spatial distortion suppression, spectral fidelity, robustness, and computational cost in extensive experiments on synthetic and real-world data, including the new Mosaic25 dataset.
    Abstract This paper presents a deep learning-based spectral demosaicing technique trained in an unsupervised manner. Many existing deep learning-based techniques relying on supervised learning with synthetic images, often underperform on real-world images especially when the number of spectral bands increases. According to the characteristics of the spectral mosaic image, this paper proposes a mosaic loss function, the corresponding model structure, a transformation strategy, and an early stopping strategy, which form a complete unsupervised spectral demosaicing framework. A challenge in real-world spectral demosaicing is inconsistency between the model parameters and the computational resources of the imager. We reduce the complexity and parameters of the spectral attention module by dividing the spectral attention tensor into spectral attention matrices in the spatial dimension and spectral attention vector in the channel dimension, which is more suitable for unsupervised framework. This paper also presents Mosaic25, a real 25-band hyperspectral mosaic image dataset of various objects, illuminations, and materials for benchmarking. Extensive experiments on synthetic and real-world datasets demonstrate that the proposed method outperforms conventional unsupervised methods in terms of spatial distortion suppression, spectral fidelity, robustness, and computational cost.
    摘要 本文提出了一种以无监督方式训练的基于深度学习的光谱去马赛克技术。许多现有的依赖监督学习和合成图像的深度学习方法,在真实图像上往往表现不佳,尤其是在光谱波段数增加时。根据光谱马赛克图像的特点,本文提出了马赛克损失函数、相应的模型结构、变换策略和提前停止策略,构成了完整的无监督光谱去马赛克框架。真实场景去马赛克的一个挑战是模型参数量与成像设备计算资源之间的不匹配。为此,本文将光谱注意力张量分解为空间维度的光谱注意力矩阵和通道维度的光谱注意力向量,以降低光谱注意力模块的复杂度和参数量,使其更适合无监督框架。本文还提供了 Mosaic25——一个包含多种物体、光照和材质的真实 25 波段高光谱马赛克图像数据集,用于基准测试。在合成与真实数据集上的大量实验表明,所提方法在空间失真抑制、光谱保真度、鲁棒性和计算成本方面均优于传统无监督方法。
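The complexity reduction described above, splitting the spectral attention tensor into spatial attention matrices and a channel-wise spectral vector, can be illustrated with a small PyTorch module. This is a hedged sketch of the general idea, not the authors' implementation; the layer sizes and gating functions are assumptions.

```python
import torch
import torch.nn as nn

class FactorizedSpectralAttention(nn.Module):
    """Toy factorized attention: a 2-D spatial map plus a 1-D channel (spectral) vector.

    Instead of one dense attention tensor over (bands x H x W), the weighting is
    factorized into a spatial map and a spectral vector, keeping the parameter count
    small -- the property the paper exploits for its unsupervised setting.
    """
    def __init__(self, bands: int):
        super().__init__()
        # Channel/spectral branch: global average pool -> 1x1 conv -> sigmoid vector
        self.channel_fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(bands, bands, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial branch: collapse bands -> 3x3 conv -> sigmoid map
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(1, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (B, bands, H, W)
        channel_w = self.channel_fc(x)                                # (B, bands, 1, 1)
        spatial_w = self.spatial_conv(x.mean(dim=1, keepdim=True))    # (B, 1, H, W)
        return x * channel_w * spatial_w

feat = torch.randn(2, 25, 64, 64)      # e.g. a 25-band mosaic feature map
out = FactorizedSpectralAttention(bands=25)(feat)
print(out.shape)                        # torch.Size([2, 25, 64, 64])
```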

ToothSegNet: Image Degradation meets Tooth Segmentation in CBCT Images

  • paper_url: http://arxiv.org/abs/2307.01979
  • repo_url: None
  • paper_authors: Jiaxiang Liu, Tianxiang Hu, Yang Feng, Wanghui Ding, Zuozhu Liu
  • for: constructing three-dimensional tooth models in computer-assisted orthodontics
  • methods: 使用ToothSegNet框架,通过生成降低图像的信息来训练分割模型,并使用通道维度的混合来减少Encoder和Decoder之间的语义差异,以及通过结构约束损失来精细调整预测的牙齿形态
  • results: 比前一代医疗图像分割方法更高精度的牙齿分割结果
    Abstract In computer-assisted orthodontics, three-dimensional tooth models are required for many medical treatments. Tooth segmentation from cone-beam computed tomography (CBCT) images is a crucial step in constructing the models. However, CBCT image quality problems such as metal artifacts and blurring caused by shooting equipment and patients' dental conditions make the segmentation difficult. In this paper, we propose ToothSegNet, a new framework which acquaints the segmentation model with generated degraded images during training. ToothSegNet merges the information of high and low quality images from the designed degradation simulation module using channel-wise cross fusion to reduce the semantic gap between encoder and decoder, and also refines the shape of tooth prediction through a structural constraint loss. Experimental results suggest that ToothSegNet produces more precise segmentation and outperforms the state-of-the-art medical image segmentation methods.
    摘要 在计算机辅助正畸中,许多治疗都需要三维牙齿模型,而从锥形束 CT(CBCT)图像中分割牙齿是构建模型的关键步骤。然而,拍摄设备和患者牙齿状况造成的金属伪影和模糊等 CBCT 图像质量问题使分割变得困难。本文提出了 ToothSegNet,一个在训练过程中让分割模型接触生成的退化图像的新框架。ToothSegNet 通过通道级交叉融合来融合所设计的退化模拟模块产生的高、低质量图像信息,以缩小编码器与解码器之间的语义差距,并通过结构约束损失来细化牙齿预测的形状。实验结果表明,ToothSegNet 能产生更精确的分割结果,并优于最先进的医学图像分割方法。

Millimeter-Wave Reflectionless Filters Using Advanced Thin-Film Fabrication

  • paper_url: http://arxiv.org/abs/2307.01914
  • repo_url: None
  • paper_authors: Matthew Morgan, Seng Loo, Tod Boyd, Miho Hunter
  • for: developing millimeter-wave, lumped-element reflectionless filters
  • methods: using advanced thin-film fabrication process with better than 2 μm feature size and integrated elements such as SiN Metal-Insulator-Metal (MIM) capacitors, bridges, and TaN Thin-Film Resistors (TFRs)
  • results: achieved higher frequency implementation than ever before
    Abstract We report on the development of millimeter-wave, lumped-element reflectionless filters using an advanced thin-film fabrication process. Based on previously demonstrated circuit topologies capable of achieving 50{\Omega} impedance match at all frequencies, these circuits have been implemented at higher frequencies than ever before by leveraging a thin-film process with better than 2 {\mu}m feature size and integrated elements such as SiN Metal-Insulator-Metal (MIM) capacitors, bridges, and TaN Thin-Film Resistors (TFRs).
    摘要 我们报道了利用先进薄膜制造工艺开发毫米波集总元件无反射滤波器的进展。这些电路基于此前已验证的、可在全频段实现 50Ω 阻抗匹配的电路拓扑,借助特征尺寸优于 2 μm 的薄膜工艺以及 SiN 金属-绝缘体-金属(MIM)电容、空气桥和 TaN 薄膜电阻(TFR)等集成元件,实现了前所未有的更高工作频率。

Self-Supervised Deep Learning for Model Correction in the Computational Crystallography Toolbox

  • paper_url: http://arxiv.org/abs/2307.01901
  • repo_url: https://github.com/gigantocypris/spread
  • paper_authors: Vidya Ganapati, Daniel Tchon, Aaron S. Brewster, Nicholas K. Sauter
  • for: This paper aims to use the Computational Crystallography Toolbox (CCTBX) to determine the oxidation state of individual metal atoms in a macromolecule.
  • methods: The paper uses self-supervised deep learning to correct the scientific model in CCTBX and provide uncertainty quantification.
  • results: The paper describes the potential impact of using self-supervised deep learning to correct the scientific model in CCTBX and provide uncertainty quantification, and provides code for forward model simulation and data analysis at https://github.com/gigantocypris/SPREAD.
    Abstract The Computational Crystallography Toolbox (CCTBX) is open-source software that allows for processing of crystallographic data, including from serial femtosecond crystallography (SFX), for macromolecular structure determination. We aim to use the modules in CCTBX to determine the oxidation state of individual metal atoms in a macromolecule. Changes in oxidation state are reflected in small shifts of the atom's X-ray absorption edge. These energy shifts can be extracted from the diffraction images recorded in serial femtosecond crystallography, given knowledge of a forward physics model. However, as the diffraction changes only slightly due to the absorption edge shift, inaccuracies in the forward physics model make it extremely challenging to observe the oxidation state. In this work, we describe the potential impact of using self-supervised deep learning to correct the scientific model in CCTBX and provide uncertainty quantification. We provide code for forward model simulation and data analysis, built from CCTBX modules, at https://github.com/gigantocypris/SPREAD , which can be integrated with machine learning. We describe open questions in algorithm development to help spur advances through dialog between crystallographers and machine learning researchers. New methods could help elucidate charge transfer processes in many reactions, including key events in photosynthesis.
    摘要 计算晶体学工具箱(CCTBX)是一款开源软件,可处理包括序列飞秒晶体学(SFX)在内的晶体学数据,用于大分子结构测定。我们希望利用 CCTBX 中的模块来确定大分子中单个金属原子的氧化态。氧化态的变化体现为原子 X 射线吸收边的微小移动。在已知前向物理模型的情况下,这些能量偏移可以从序列飞秒晶体学记录的衍射图像中提取。然而,由于吸收边移动引起的衍射变化非常微小,前向物理模型的不精确使得观察氧化态极具挑战。在这项工作中,我们阐述了利用自监督深度学习来校正 CCTBX 中的科学模型并提供不确定性量化的潜在影响。我们在 https://github.com/gigantocypris/SPREAD 提供了基于 CCTBX 模块构建的前向模型模拟与数据分析代码,可与机器学习相结合。我们还讨论了算法开发中的开放问题,希望通过晶体学家与机器学习研究者之间的对话推动进展。新方法有望帮助阐明包括光合作用关键事件在内的许多反应中的电荷转移过程。

Grad-FEC: Unequal Loss Protection of Deep Features in Collaborative Intelligence

  • paper_url: http://arxiv.org/abs/2307.01846
  • repo_url: None
  • paper_authors: Korcan Uyanik, S. Faegheh Yeganli, Ivan V. Bajić
  • for: 提高edge设备和云端的智能合作系统的可靠性和Robustness,即Collaborative Intelligence(CI)系统。
  • methods: 提出了一种基于Unequal Loss Protection(ULP)的新方法,包括特征重要度估计器,以优先保护front-end生成的重要特征包。
  • results: 实验结果表明,提出的方法可以在 packet loss 的情况下显著提高CI系统的可靠性和Robustness。
    Abstract Collaborative intelligence (CI) involves dividing an artificial intelligence (AI) model into two parts: front-end, to be deployed on an edge device, and back-end, to be deployed in the cloud. The deep feature tensors produced by the front-end are transmitted to the cloud through a communication channel, which may be subject to packet loss. To address this issue, in this paper, we propose a novel approach to enhance the resilience of the CI system in the presence of packet loss through Unequal Loss Protection (ULP). The proposed ULP approach involves a feature importance estimator, which estimates the importance of feature packets produced by the front-end, and then selectively applies Forward Error Correction (FEC) codes to protect important packets. Experimental results demonstrate that the proposed approach can significantly improve the reliability and robustness of the CI system in the presence of packet loss.
    摘要 协同智能(CI)是将人工智能(AI)模型分为两部分:部署在边缘设备上的前端和部署在云端的后端。前端产生的深度特征张量通过可能发生丢包的通信信道传输到云端。针对这一问题,本文提出了一种基于不等损失保护(ULP)的新方法,以增强 CI 系统在丢包情况下的韧性。所提出的 ULP 方法包含一个特征重要性估计器,用于估计前端产生的特征数据包的重要性,进而有选择地对重要数据包施加前向纠错(FEC)编码加以保护。实验结果表明,该方法能显著提升 CI 系统在丢包情况下的可靠性和鲁棒性。
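Unequal loss protection boils down to ranking feature packets by estimated importance and spending the FEC budget on the top-ranked ones. The snippet below sketches that selection step with a simple magnitude-based importance proxy; the packetization, the importance measure, and the protection budget are illustrative assumptions, not the authors' design.

```python
import numpy as np

def select_packets_for_fec(feature_tensor, packet_size=256, protect_fraction=0.3):
    """Split a flattened deep-feature tensor into packets and pick the ones to protect.

    Importance here is simply the L2 energy of each packet (a stand-in for a learned
    importance estimator); the top `protect_fraction` of packets get FEC protection.
    """
    flat = feature_tensor.ravel()
    n_packets = int(np.ceil(flat.size / packet_size))
    padded = np.zeros(n_packets * packet_size, dtype=flat.dtype)
    padded[: flat.size] = flat
    packets = padded.reshape(n_packets, packet_size)

    importance = np.linalg.norm(packets, axis=1)          # per-packet energy
    n_protect = max(1, int(round(protect_fraction * n_packets)))
    protected_idx = np.argsort(importance)[::-1][:n_protect]
    return packets, set(protected_idx.tolist())

features = np.random.randn(64, 28, 28).astype(np.float32)  # toy intermediate feature map
packets, protected = select_packets_for_fec(features)
print(f"{len(protected)} of {len(packets)} packets marked for FEC protection")
```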

Multi-Channel Feature Extraction for Virtual Histological Staining of Photon Absorption Remote Sensing Images

  • paper_url: http://arxiv.org/abs/2307.01824
  • repo_url: None
  • paper_authors: Marian Boktor, James E. D. Tweel, Benjamin R. Ecclestone, Jennifer Ai Ye, Paul Fieguth, Parsin Haji Reza
  • for: 这项研究旨在提高组织学染色的效率和可靠性,为病理诊断提供可靠的信息,帮助病理医生进行疾病分类、分级和治疗规划。
  • methods: 该研究提出了一种基于深度学习的虚拟组织学染色框架,利用光子吸收遥感(PARS)图像,并使用一种 K-means 变体从 PARS 时间分辨信号中提取有价值的多模态特征;在传统 cycleGAN 框架的基础上,进一步提出了可纳入更多特征的多通道 cycleGAN(MC-GAN)模型。
  • results: 实验结果表明,特定的特征组合优于传统通道,使虚拟染色结果与化学染色(H&E)图像在视觉和定量上高度一致;在人体皮肤和小鼠脑组织上的应用表明,选择最优特征组合对提高虚拟染色结果的可靠性至关重要。
    Abstract Accurate and fast histological staining is crucial in histopathology, impacting diagnostic precision and reliability. Traditional staining methods are time-consuming and subjective, causing delays in diagnosis. Digital pathology plays a vital role in advancing and optimizing histology processes to improve efficiency and reduce turnaround times. This study introduces a novel deep learning-based framework for virtual histological staining using photon absorption remote sensing (PARS) images. By extracting features from PARS time-resolved signals using a variant of the K-means method, valuable multi-modal information is captured. The proposed multi-channel cycleGAN (MC-GAN) model expands on the traditional cycleGAN framework, allowing the inclusion of additional features. Experimental results reveal that specific combinations of features outperform the conventional channels by improving the labeling of tissue structures prior to model training. Applied to human skin and mouse brain tissue, the results underscore the significance of choosing the optimal combination of features, as it reveals a substantial visual and quantitative concurrence between the virtually stained and the gold standard chemically stained hematoxylin and eosin (H&E) images, surpassing the performance of other feature combinations. Accurate virtual staining is valuable for reliable diagnostic information, aiding pathologists in disease classification, grading, and treatment planning. This study aims to advance label-free histological imaging and opens doors for intraoperative microscopy applications.
    摘要 准精准快的 Histological 染色是 Histopathology 中非常重要的,它直接影响诊断的准确性和可靠性。传统的染色方法需要较长的时间和主观的干预,导致诊断的延迟。数字化Patology 在提高和优化 Histology 过程中扮演着重要的角色,以提高效率和减少回转时间。本研究提出了一种基于深度学习的虚拟 Histological 染色方法,使用 photon absorption remote sensing(PARS)图像来提取特征。通过 variants of the K-means 方法提取 PARS 时间分解信号中的有价值多Modal 信息。提出的多通道 cycleGAN(MC-GAN)模型在传统 cycleGAN 框架上进行扩展,以包括额外的特征。实验结果表明,特定的特征组合能够超越传统渠道的表现,提高识别组织结构之前的标签。应用于人皮和 Mouse brain 组织样本,结果表明选择最佳特征组合非常重要,它可以提供较高的视觉和量化协调性,超过其他特征组合。准确的虚拟染色对诊断信息的可靠性至关重要,帮助病理学家在疾病分类、评分和治疗规划中做出更加准确的决策。本研究旨在提高无标签 Histological 成像,开启了Intraoperative 镜像应用的大门。
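The feature-extraction step described above, clustering PARS time-resolved signals with a K-means variant to obtain additional input channels, can be sketched as follows. This uses scikit-learn's plain KMeans as a stand-in for the paper's variant; the signal dimensions and cluster count are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_channel_from_pars(signals, n_clusters=4, random_state=0):
    """Cluster per-pixel time-resolved PARS signals and return a label image.

    signals : array of shape (H, W, T) -- one time-resolved signal per pixel.
    Returns an (H, W) map of cluster indices that can be stacked with other
    channels as input to the image-to-image translation network.
    """
    h, w, t = signals.shape
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = km.fit_predict(signals.reshape(h * w, t))
    return labels.reshape(h, w)

# Toy example: 32x32 pixels, 50 time samples each.
pars = np.random.rand(32, 32, 50)
label_map = kmeans_channel_from_pars(pars)
print(label_map.shape, label_map.min(), label_map.max())
```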

cs.SD - 2023-07-04

Disentanglement in a GAN for Unconditional Speech Synthesis

  • paper_url: http://arxiv.org/abs/2307.01673
  • repo_url: https://github.com/rf5/simple-asgan
  • paper_authors: Matthew Baas, Herman Kamper
  • for: This paper is written for unconditional speech synthesis, specifically to learn a disentangled latent space for speech synthesis.
  • methods: The paper proposes a generative adversarial network (GAN) called AudioStyleGAN (ASGAN), which is tailored to learn a disentangled latent space for speech synthesis. The ASGAN model builds upon the StyleGAN family of image synthesis models, and it uses a modified adaptation of adaptive discriminator augmentation to successfully train the model.
  • results: The paper achieves state-of-the-art results in unconditional speech synthesis on the small-vocabulary Google Speech Commands digits dataset, and it is substantially faster than existing top-performing diffusion models. The paper also demonstrates that the ASGAN model’s latent space is disentangled, and that simple linear operations in the space can be used to perform several tasks unseen during training, such as voice conversion, speech enhancement, speaker verification, and keyword classification.
    Abstract Can we develop a model that can synthesize realistic speech directly from a latent space, without explicit conditioning? Despite several efforts over the last decade, previous adversarial and diffusion-based approaches still struggle to achieve this, even on small-vocabulary datasets. To address this, we propose AudioStyleGAN (ASGAN) -- a generative adversarial network for unconditional speech synthesis tailored to learn a disentangled latent space. Building upon the StyleGAN family of image synthesis models, ASGAN maps sampled noise to a disentangled latent vector which is then mapped to a sequence of audio features so that signal aliasing is suppressed at every layer. To successfully train ASGAN, we introduce a number of new techniques, including a modification to adaptive discriminator augmentation which probabilistically skips discriminator updates. We apply it on the small-vocabulary Google Speech Commands digits dataset, where it achieves state-of-the-art results in unconditional speech synthesis. It is also substantially faster than existing top-performing diffusion models. We confirm that ASGAN's latent space is disentangled: we demonstrate how simple linear operations in the space can be used to perform several tasks unseen during training. Specifically, we perform evaluations in voice conversion, speech enhancement, speaker verification, and keyword classification. Our work indicates that GANs are still highly competitive in the unconditional speech synthesis landscape, and that disentangled latent spaces can be used to aid generalization to unseen tasks. Code, models, samples: https://github.com/RF5/simple-asgan/
    摘要 我们能否开发一个无需显式条件、直接从潜在空间合成真实语音的模型?尽管过去十年已有多次尝试,现有的对抗式和扩散式方法即便在小词表数据集上也难以做到这一点。为此,我们提出了 AudioStyleGAN(ASGAN)——一种面向无条件语音合成、旨在学习解耦潜在空间的生成对抗网络。ASGAN 建立在 StyleGAN 系列图像合成模型之上,将采样噪声映射为解耦的潜在向量,再映射为一系列音频特征,并在每一层抑制信号混叠。为了成功训练 ASGAN,我们引入了多项新技术,包括对自适应判别器增强的修改,使其以一定概率跳过判别器更新。我们将其应用于小词表的 Google Speech Commands 数字数据集,在无条件语音合成上取得了最先进的结果,并且显著快于现有表现最好的扩散模型。我们验证了 ASGAN 的潜在空间是解耦的:仅通过该空间中简单的线性运算,即可完成训练时未见过的多项任务,包括语音转换、语音增强、说话人验证和关键词分类。我们的工作表明 GAN 在无条件语音合成领域仍极具竞争力,而解耦的潜在空间有助于泛化到未见任务。代码、模型与样例见 https://github.com/RF5/simple-asgan/。
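The downstream tasks listed in the abstract rely on the claim that simple linear operations in the disentangled latent space are semantically meaningful. A generic sketch of one such operation (interpolating between two latent vectors, e.g. for a voice-conversion-style edit) is shown below; the latent dimensionality and the mapping-network placeholders are assumptions, not the released API.

```python
import numpy as np

def interpolate_latents(w_src, w_tgt, alpha=0.5):
    """Linear interpolation in a disentangled latent space.

    In a GAN with a disentangled W space, moving along the straight line between
    two latent codes tends to produce a semantically smooth change in the output,
    which is the property exploited for tasks such as voice conversion.
    """
    return (1.0 - alpha) * w_src + alpha * w_tgt

rng = np.random.default_rng(0)
latent_dim = 512                              # assumed size, for illustration only
w_source = rng.standard_normal(latent_dim)    # stands in for mapping_network(z_source)
w_target = rng.standard_normal(latent_dim)    # stands in for mapping_network(z_target)

for a in (0.0, 0.25, 0.5, 0.75, 1.0):
    w_mix = interpolate_latents(w_source, w_target, alpha=a)
    print(f"alpha={a:4.2f}  ||w_mix - w_source|| = {np.linalg.norm(w_mix - w_source):.2f}")
```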

Pretraining Conformer with ASR or ASV for Anti-Spoofing Countermeasure

  • paper_url: http://arxiv.org/abs/2307.01546
  • repo_url: None
  • paper_authors: Yikang Wang, Hiromitsu Nishizaki, Ming Li
  • for: 这个论文旨在提出一种基于Transformer的多级特征聚合嵌入器(MFA-Conformer)结构,用于音频反假护照(CM)。MFA-Conformer可以同时聚合全局和本地信息,从而帮助CM系统捕捉到假造的音频特征。
  • methods: 该论文提出了一种基于Conformer模型的转移学习方法,使得CM系统可以通过使用预训练的Conformer模型来增强其鲁棒性。此外,论文还提出了一种使用嵌入器融合多级特征的方法,以提高CM系统的抗误差性。
  • results: 实验结果表明,MFA-Conformer模型在清晰语音库(FAD)的清洁集上达到了0.038%的EER,远远超过了基eline。此外,转移学习方法在纯音频段上进行了有效的提升。
    Abstract This paper introduces the Multi-scale Feature Aggregation Conformer (MFA-Conformer) structure for audio anti-spoofing countermeasures (CM). MFA-Conformer combines a convolutional neural network with the Transformer, allowing it to aggregate both global and local information. This may help the anti-spoofing CM system capture synthetic artifacts hidden both locally and globally. In addition, given the excellent performance of MFA-Conformer on automatic speech recognition (ASR) and automatic speaker verification (ASV) tasks, we present a transfer learning method that utilizes Conformer models pretrained on ASR or ASV tasks to enhance the robustness of CM systems. The proposed method is evaluated on both Chinese and English spoofing detection databases. On the FAD clean set, the MFA-Conformer model pretrained on the ASR task achieves an EER of 0.038%, which dramatically outperforms the baseline. Moreover, experimental results demonstrate that the proposed transfer learning method on Conformer is effective on pure speech segments after voice activity detection processing.
    摘要 这篇论文介绍了一种基于Transformer的多级特征汇集声音防 spoofing countermeasure(MFA-Conformer)结构。MFA-Conformer结合了一个卷积神经网络,使其能够汇集全局和本地信息。这可能使防 spoofing CM 系统能够捕捉到假声音中的合成artefacts。此外,基于ASR和ASV任务的预训练Conformer模型的表现很出色,我们提出了一种在CM系统中使用这些预训练模型进行升级,以提高CM系统的Robustness。我们在中文和英文伪声检测数据库上评估了该方法。在FAD清洁集上,预训练MFA-Conformer模型在ASR任务上达到了EER值为0.038%,这在比基准值有很大的提升。此外,实验结果表明,在声音段后的语音活动检测处理后,提出的传输学习方法对Conformer是有效的。
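Multi-scale feature aggregation of the kind used here concatenates the outputs of every encoder block along the feature dimension before pooling into an utterance-level embedding. The sketch below shows that aggregation with simple mean-and-std pooling; the block count, dimensions, and pooling choice are illustrative, not the paper's exact configuration.

```python
import torch

def aggregate_multiscale(block_outputs):
    """Concatenate per-block frame features, then pool to an utterance embedding.

    block_outputs : list of tensors, each (batch, frames, dim) -- one per encoder block.
    Returns (batch, n_blocks * dim * 2): mean and std statistics over frames.
    """
    stacked = torch.cat(block_outputs, dim=-1)             # (B, T, n_blocks * dim)
    mean = stacked.mean(dim=1)
    std = stacked.std(dim=1)
    return torch.cat([mean, std], dim=-1)

# Toy example: 6 encoder blocks, 200 frames, 256-dim features.
outputs = [torch.randn(4, 200, 256) for _ in range(6)]
embedding = aggregate_multiscale(outputs)
print(embedding.shape)                                      # torch.Size([4, 3072])
```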

Spatial-temporal Graph Based Multi-channel Speaker Verification With Ad-hoc Microphone Arrays

  • paper_url: http://arxiv.org/abs/2307.01386
  • repo_url: None
  • paper_authors: Yijiang Chen, Chengdong Liang, Xiao-Lei Zhang
  • for: 这篇论文旨在提升恶劣声学环境中基于自组织麦克风阵列的多通道说话人验证性能。
  • methods: 该方法包括一个特征聚合模块和一个通道选择模块,二者均基于图构建:特征聚合模块通过时空图卷积网络融合不同时间和通道上的说话人特征,通道选择模块则剔除可能对系统产生负面影响的噪声通道。
  • results: 实验结果显示,与六种代表性方法相比,所提方法在仿真数据上取得 15.39% 的相对等错误率(EER)下降,在真实数据上取得 17.70% 的下降,且在不同信噪比和混响时间下性能稳健。
    Abstract The performance of speaker verification degrades significantly in adverse acoustic environments with strong reverberation and noise. To address this issue, this paper proposes a spatial-temporal graph convolutional network (GCN) method for multi-channel speaker verification with ad-hoc microphone arrays. It includes a feature aggregation block and a channel selection block, both of which are built on graphs. The feature aggregation block fuses speaker features among different times and channels by a spatial-temporal GCN. The graph-based channel selection block discards the noisy channels that may contribute negatively to the system. The proposed method is flexible in incorporating various kinds of graphs and prior knowledge. We compared the proposed method with six representative methods in both real-world and simulated environments. Experimental results show that the proposed method achieves a relative equal error rate (EER) reduction of $\mathbf{15.39\%}$ compared with the strongest reference method on the simulated datasets, and of $\mathbf{17.70\%}$ on the real datasets. Moreover, its performance is robust across different signal-to-noise ratios and reverberation times.
    摘要 在强混响和强噪声的恶劣声学环境中,说话人验证的性能会显著下降。针对这一问题,本文提出了一种用于自组织麦克风阵列多通道说话人验证的时空图卷积网络(GCN)方法,包括一个特征聚合模块和一个通道选择模块,二者均建立在图之上。特征聚合模块通过时空 GCN 融合不同时间和不同通道上的说话人特征;基于图的通道选择模块则剔除可能对系统产生负面影响的噪声通道。该方法可以灵活地引入各种图和先验知识。我们在真实和仿真环境中与六种代表性方法进行了比较。实验结果表明,所提方法相对最强对比方法在仿真数据集上将等错误率(EER)降低了 15.39%,在真实数据集上降低了 17.70%;并且其性能在不同信噪比和混响时间下均保持稳健。
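A single graph-convolution step over the channel-time nodes of an ad-hoc array can be written compactly with a symmetrically normalized adjacency matrix. The sketch below shows that aggregation rule in NumPy; the graph construction and feature sizes are illustrative, not the paper's exact architecture.

```python
import numpy as np

def gcn_layer(x, adj, weight):
    """One graph convolution: X' = ReLU(D^-1/2 (A + I) D^-1/2 X W).

    x      : (N, F) node features, one node per (channel, time-segment) pair
    adj    : (N, N) adjacency over the spatial-temporal graph
    weight : (F, F_out) trainable projection
    """
    a_hat = adj + np.eye(adj.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ x @ weight, 0.0)        # ReLU

rng = np.random.default_rng(0)
n_nodes, feat_dim, out_dim = 12, 192, 64               # e.g. 4 channels x 3 segments
x = rng.standard_normal((n_nodes, feat_dim))           # speaker embeddings per node
adj = (rng.random((n_nodes, n_nodes)) > 0.5).astype(float)
adj = np.maximum(adj, adj.T)                            # make the graph undirected
h = gcn_layer(x, adj, rng.standard_normal((feat_dim, out_dim)) * 0.01)
print(h.shape)                                          # (12, 64)
```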

Semantic enrichment towards efficient speech representations

  • paper_url: http://arxiv.org/abs/2307.01323
  • repo_url: None
  • paper_authors: Gaëlle Laperrière, Ha Nguyen, Sahar Ghannay, Bassam Jabaian, Yannick Estève
  • for: 本研究的目的是提高 spoken language understanding 任务中的 semantic extraction,并且考虑 computation costs。
  • methods: 本研究使用 SAMU-XLSR 模型,通过特有的域内semantic enrichment来增强多语言Speech representation。同时,我们还使用 same-domain French和Italian benchmarks 来提高 low-resource language 的可移植性,以及 explore cross-domain capacities of the enriched SAMU-XLSR。
  • results: 本研究表明,通过特有的域内semantic enrichment,可以提高 spoken language understanding 任务中的 semantic extraction性能,同时也可以提高 low-resource language 的可移植性。
    Abstract Over the past few years, self-supervised learned speech representations have emerged as fruitful replacements for conventional surface representations when solving Spoken Language Understanding (SLU) tasks. Simultaneously, multilingual models trained on massive textual data were introduced to encode language agnostic semantics. Recently, the SAMU-XLSR approach introduced a way to make profit from such textual models to enrich multilingual speech representations with language agnostic semantics. By aiming for better semantic extraction on a challenging Spoken Language Understanding task and in consideration with computation costs, this study investigates a specific in-domain semantic enrichment of the SAMU-XLSR model by specializing it on a small amount of transcribed data from the downstream task. In addition, we show the benefits of the use of same-domain French and Italian benchmarks for low-resource language portability and explore cross-domain capacities of the enriched SAMU-XLSR.
    摘要 近年来,自监督学习得到的语音表示已成为解决口语理解(SLU)任务时传统表层表示的有效替代。同时,基于海量文本数据训练的多语言模型被引入,用于编码与语言无关的语义。最近,SAMU-XLSR 方法提出了利用此类文本模型为多语言语音表示注入语言无关语义的途径。为了在一项具有挑战性的口语理解任务上获得更好的语义提取效果,并兼顾计算成本,本研究通过在下游任务的少量转写数据上进行专门化训练,探索了对 SAMU-XLSR 模型的领域内语义增强。此外,我们展示了使用同领域法语和意大利语基准对低资源语言可迁移性的益处,并探索了增强后 SAMU-XLSR 的跨领域能力。

cs.CV - 2023-07-04

Localized Data Work as a Precondition for Data-Centric ML: A Case Study of Full Lifecycle Crop Disease Identification in Ghana

  • paper_url: http://arxiv.org/abs/2307.01767
  • repo_url: None
  • paper_authors: Darlington Akogo, Issah Samori, Cyril Akafia, Harriet Fiagbor, Andrews Kangah, Donald Kwame Asiedu, Kwabena Fuachie, Luis Oala
  • for: 这项研究的目的是提高农业生产力和食品安全。
  • methods: 该研究使用无人机采集数据和机器学习算法来确定作物受到的压力。
  • results: 研究组共同开发了数据、模型和应用程序,并将其提供给当地农民 via 桌面应用程序。
    Abstract The Ghana Cashew Disease Identification with Artificial Intelligence (CADI AI) project demonstrates the importance of sound data work as a precondition for the delivery of useful, localized datacentric solutions for public good tasks such as agricultural productivity and food security. Drone collected data and machine learning are utilized to determine crop stressors. Data, model and the final app are developed jointly and made available to local farmers via a desktop application.
    摘要 加纳腰果病害人工智能识别(CADI AI)项目表明,扎实的数据工作是交付有用的、本地化的以数据为中心的解决方案(服务于提升农业生产力和粮食安全等公共利益任务)的前提条件。项目利用无人机采集的数据和机器学习来判定作物胁迫因素。数据、模型和最终应用由各方共同开发,并通过桌面应用程序提供给当地农民使用。

Pretraining is All You Need: A Multi-Atlas Enhanced Transformer Framework for Autism Spectrum Disorder Classification

  • paper_url: http://arxiv.org/abs/2307.01759
  • repo_url: https://github.com/lugges991/metaformer
  • paper_authors: Lucas Mahler, Qi Wang, Julius Steiglechner, Florian Birk, Samuel Heczko, Klaus Scheffler, Gabriele Lohmann
  • for: 这项研究提出了一种新的多图谱增强 Transformer 架构(METAFormer),用于自闭症谱系障碍(ASD)分类。
  • methods: 该架构使用静息态功能磁共振成像数据,并采用多图谱方法(AAL、CC200 和 DOS160 图谱),通过重建被掩码的输入值进行自监督预训练。
  • results: 研究显示,METAFormer 在 ABIDE I 数据集上超越现有最佳方法,达到 83.7% 的准确率和 0.832 的 AUC。
    Abstract Autism spectrum disorder (ASD) is a prevalent psychiatric condition characterized by atypical cognitive, emotional, and social patterns. Timely and accurate diagnosis is crucial for effective interventions and improved outcomes in individuals with ASD. In this study, we propose a novel Multi-Atlas Enhanced Transformer framework, METAFormer, for ASD classification. Our framework utilizes resting-state functional magnetic resonance imaging data from the ABIDE I dataset, comprising 406 ASD and 476 typical control (TC) subjects. METAFormer employs a multi-atlas approach, where flattened connectivity matrices from the AAL, CC200, and DOS160 atlases serve as input to the transformer encoder. Notably, we demonstrate that self-supervised pretraining, involving the reconstruction of masked values from the input, significantly enhances classification performance without the need for additional or separate training data. Through stratified cross-validation, we evaluate the proposed framework and show that it surpasses state-of-the-art performance on the ABIDE I dataset, with an average accuracy of 83.7% and an AUC-score of 0.832. The code for our framework is available at https://github.com/Lugges991/METAFormer
    摘要 自闭症谱系障碍(ASD)是一种常见的精神疾病,表现为非典型的认知、情绪和社交模式。及时而准确的诊断对于有效干预和改善 ASD 患者的预后至关重要。在本研究中,我们提出了一种新的多图谱增强 Transformer 框架 METAFormer,用于 ASD 分类。该框架使用 ABIDE I 数据集中 406 名 ASD 受试者和 476 名正常对照(TC)受试者的静息态功能磁共振成像数据。METAFormer 采用多图谱方法,将来自 AAL、CC200 和 DOS160 图谱的展平连接矩阵作为 Transformer 编码器的输入。值得注意的是,我们证明了自监督预训练(即重建输入中被掩码的数值)能够在无需额外或单独训练数据的情况下显著提升分类性能。通过分层交叉验证,我们评估了所提框架,结果显示其在 ABIDE I 数据集上超越了现有最佳水平,平均准确率为 83.7%,AUC 为 0.832。代码见 https://github.com/Lugges991/METAFormer。
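The self-supervised pretraining described above, reconstructing masked entries of the flattened connectivity matrices, has a simple core: hide a random subset of input values, predict them, and penalize only the masked positions. The PyTorch sketch below shows that loss; the encoder is a stand-in and the masking ratio is an assumed value, not the paper's setting.

```python
import torch
import torch.nn as nn

def masked_reconstruction_loss(model, x, mask_ratio=0.15):
    """Reconstruct randomly masked entries of flattened connectivity vectors.

    x : (batch, features) flattened functional-connectivity matrices.
    Masked entries are zeroed at the input; the loss is MSE on those entries only.
    """
    mask = torch.rand_like(x) < mask_ratio          # True where the value is hidden
    x_masked = x.masked_fill(mask, 0.0)
    recon = model(x_masked)
    return ((recon - x)[mask] ** 2).mean()

# Toy "encoder + reconstruction head" standing in for the transformer.
feat_dim = 200 * 199 // 2                            # e.g. upper triangle of a 200-ROI atlas
model = nn.Sequential(nn.Linear(feat_dim, 256), nn.GELU(), nn.Linear(256, feat_dim))
x = torch.randn(8, feat_dim)
loss = masked_reconstruction_loss(model, x)
loss.backward()
print(float(loss))
```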

K-complex Detection Using Fourier Spectrum Analysis In EEG

  • paper_url: http://arxiv.org/abs/2307.01754
  • repo_url: None
  • paper_authors: Alexey Protopopov
  • for: automatic K-complex detection in EEG records
  • methods: based on fast Fourier transform, not using neural networks
  • results: comparable or superior quality to previous methods, including those using neural networks, with less computational power required.
    Abstract K-complexes are an important marker of brain activity and are used both in clinical practice to perform sleep scoring, and in research. However, due to the size of electroencephalography (EEG) records, as well as the subjective nature of K-complex detection performed by somnologists, it is reasonable to automate K-complex detection. Previous works in this field of research have relied on the values of true positive rate and false positive rate to quantify the effectiveness of proposed methods, however this set of metrics may be misleading. The objective of the present research is to find a more accurate set of metrics and use them to develop a new method of K-complex detection, which would not rely on neural networks. Thus, the present article proposes two new methods for K-complex detection based on the fast Fourier transform. The results achieved demonstrated that the proposed methods offered a quality of K-complex detection that is either similar or superior to the quality of the methods demonstrated in previous works, including the methods employing neural networks, while requiring less computational power, meaning that K-complex detection does not require the use of neural networks. The proposed methods were evaluated using a new set of metrics, which is more representative of the quality of K-complex detection.
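Because the proposed detectors are built on the fast Fourier transform rather than a neural network, the core operation is a windowed spectral-power criterion. The sketch below illustrates one plausible variant, flagging windows whose low-frequency power dominates; the band limits and threshold are illustrative choices, not the paper's tuned values.

```python
import numpy as np

def low_freq_power_ratio(window, fs, band=(0.5, 2.0)):
    """Fraction of spectral power inside `band` (Hz) for one EEG window."""
    spectrum = np.abs(np.fft.rfft(window * np.hanning(len(window)))) ** 2
    freqs = np.fft.rfftfreq(len(window), d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return spectrum[in_band].sum() / (spectrum.sum() + 1e-12)

def detect_candidate_kcomplexes(eeg, fs=100, win_sec=2.0, step_sec=0.5, ratio_thr=0.6):
    """Slide a window over a single-channel EEG trace and flag K-complex candidates."""
    win, step = int(win_sec * fs), int(step_sec * fs)
    hits = []
    for start in range(0, len(eeg) - win + 1, step):
        if low_freq_power_ratio(eeg[start:start + win], fs) > ratio_thr:
            hits.append(start / fs)          # candidate onset time in seconds
    return hits

rng = np.random.default_rng(1)
fs = 100
t = np.arange(0, 30, 1 / fs)                 # 30 s of synthetic EEG
eeg = 10 * rng.standard_normal(t.size)
eeg[1000:1200] += 80 * np.sin(2 * np.pi * 1.0 * t[:200])   # injected slow-wave burst
print(detect_candidate_kcomplexes(eeg, fs))
```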

SRCD: Semantic Reasoning with Compound Domains for Single-Domain Generalized Object Detection

  • paper_url: http://arxiv.org/abs/2307.01750
  • repo_url: None
  • paper_authors: Zhijie Rao, Jingcai Guo, Luyao Tang, Yue Huang, Xinghao Ding, Song Guo
  • for: 本文提出了一种单域泛化物体检测(Single-DGOD)框架,旨在学习和维护自增强样本的 semantic 结构,以提高模型的泛化能力。
  • methods: 我们提出的 SRCD 包括两个主要组成部分: texture-based self-augmentation (TBSA) 模块和 local-global semantic reasoning (LGSR) 模块。 TBSA 模块通过自适应增强来消除影响标签的不相关特征,如光、阴影、颜色等。而 LGSR 模块则用于进一步模型实例特征之间的semantic关系,以暴露和维护内在的 semantic 结构。
  • results: 我们在多个 benchmark 上进行了广泛的实验,证明了我们提出的 SRCD 的效果。
    Abstract This paper provides a novel framework for single-domain generalized object detection (i.e., Single-DGOD), where we are interested in learning and maintaining the semantic structures of self-augmented compound cross-domain samples to enhance the model's generalization ability. Different from DGOD trained on multiple source domains, Single-DGOD is far more challenging to generalize well to multiple target domains with only one single source domain. Existing methods mostly adopt a similar treatment from DGOD to learn domain-invariant features by decoupling or compressing the semantic space. However, there may have two potential limitations: 1) pseudo attribute-label correlation, due to extremely scarce single-domain data; and 2) the semantic structural information is usually ignored, i.e., we found the affinities of instance-level semantic relations in samples are crucial to model generalization. In this paper, we introduce Semantic Reasoning with Compound Domains (SRCD) for Single-DGOD. Specifically, our SRCD contains two main components, namely, the texture-based self-augmentation (TBSA) module, and the local-global semantic reasoning (LGSR) module. TBSA aims to eliminate the effects of irrelevant attributes associated with labels, such as light, shadow, color, etc., at the image level by a light-yet-efficient self-augmentation. Moreover, LGSR is used to further model the semantic relationships on instance features to uncover and maintain the intrinsic semantic structures. Extensive experiments on multiple benchmarks demonstrate the effectiveness of the proposed SRCD.
    摘要 这篇论文提出了一种新的单域总体化对象检测框架(即Single-DGOD),我们在这种框架中学习和维护自适应增强的混合域样本的 semantic structure,以提高模型的通用能力。与多源域DGOD相比,单域DGOD更加具有挑战性,因为只有一个源域,因此模型难以通用到多个目标域。现有方法通常采用类似于DGOD的方法,即学习域无关特征,通过减少或压缩 semantic space来实现。但是,存在两个潜在的限制:1) Pseudo attribute-label correlation,由于单域数据非常稀缺; 2)模型忽略了实例水平的semantic structural information,即 samples中实例之间的semantic关系的强度是对模型泛化的关键。在本文中,我们提出了Semantic Reasoning with Compound Domains(SRCD)模型,其包括两个主要组件:texture-based self-augmentation(TBSA)模块和local-global semantic reasoning(LGSR)模块。TBSA模块通过快速和高效地自适应来消除与标签相关的不相关特征,如光、阴影、颜色等。而LGSR模块则用于进一步模型实例水平的semantic关系,以uncover和维护内在的semantic结构。我们在多个 benchmark上进行了广泛的实验,并证明了我们提出的SRCD的效果。

Ben-ge: Extending BigEarthNet with Geographical and Environmental Data

  • paper_url: http://arxiv.org/abs/2307.01741
  • repo_url: https://github.com/hsg-aiml/ben-ge
  • paper_authors: Michael Mommert, Nicolas Kesseli, Joëlle Hanna, Linus Scheibenreif, Damian Borth, Begüm Demir
  • for: 本研究旨在探讨多模式 Earth observation 数据的分析方法,并证明 combining 不同数据模式可以提高下游任务的准确率。
  • methods: 本研究使用了 Earth observation 数据的多模式 combine,包括 patch-based 地用/地形类别和地用/地形分割等下游任务。
  • results: 研究表明,通过 combining 不同数据模式,可以提高下游任务的准确率,并且可以作为 Earth observation 应用的测试平台。
    Abstract Deep learning methods have proven to be a powerful tool in the analysis of large amounts of complex Earth observation data. However, while Earth observation data are multi-modal in most cases, only single or few modalities are typically considered. In this work, we present the ben-ge dataset, which supplements the BigEarthNet-MM dataset by compiling freely and globally available geographical and environmental data. Based on this dataset, we showcase the value of combining different data modalities for the downstream tasks of patch-based land-use/land-cover classification and land-use/land-cover segmentation. ben-ge is freely available and expected to serve as a test bed for fully supervised and self-supervised Earth observation applications.
    摘要 深度学习方法在大量复杂的地球观测数据分析中表现出了强大的功能。然而,大多数情况下的地球观测数据是多modal的,但只考虑单个或少数modalities。在这个工作中,我们提供了ben-ge数据集,该数据集收集了全球和自由可用的地理和环境数据,并基于这个数据集,我们展示了不同modalities的结合对下游任务(patch-based 土地用途/土地覆盖分类和土地用途/土地覆盖分割)的价值。ben-ge是免费可用的,预计将成为全supervised和self-supervised Earth observation应用程序的测试床。

Synchronous Image-Label Diffusion Probability Model with Application to Stroke Lesion Segmentation on Non-contrast CT

  • paper_url: http://arxiv.org/abs/2307.01740
  • repo_url: None
  • paper_authors: Jianhai Zhang, Tonghua Wan, Ethan MacDonald, Bijoy Menon, Aravind Ganesh, Qiu Wu
  • for: 这个论文是为了提出一种基于Markov扩散过程的同步图像标签扩散可能性模型(SDPM),用于非对比CT扫描图像中的血栓病变 segmentation。
  • methods: 该模型基于一个隐变量模型(LVM),并引入了一个额外网络流,以获得初始噪声标签估计,以便高效地推理最终标签。通过优化指定的可变边界,训练好的模型可以对输入图像噪声给出多个标签估计。
  • results: 该模型在三个血栓病变数据集上进行测试,包括一个公共数据集和两个私人数据集,并与一些U-net和变换器基于的分割方法进行比较。结果显示,提出的SDPM模型能够达到当前最佳性能。代码公开 disponível。
    Abstract Stroke lesion volume is a key radiologic measurement for assessing the prognosis of Acute Ischemic Stroke (AIS) patients, which is challenging to be automatically measured on Non-Contrast CT (NCCT) scans. Recent diffusion probabilistic models have shown potentials of being used for image segmentation. In this paper, a novel Synchronous image-label Diffusion Probability Model (SDPM) is proposed for stroke lesion segmentation on NCCT using Markov diffusion process. The proposed SDPM is fully based on a Latent Variable Model (LVM), offering a complete probabilistic elaboration. An additional net-stream, parallel with a noise prediction stream, is introduced to obtain initial noisy label estimates for efficiently inferring the final labels. By optimizing the specified variational boundaries, the trained model can infer multiple label estimates for reference given the input images with noises. The proposed model was assessed on three stroke lesion datasets including one public and two private datasets. Compared to several U-net and transformer-based segmentation methods, our proposed SDPM model is able to achieve state-of-the-art performance. The code is publicly available.
    摘要 卒中病灶体积是评估急性缺血性卒中(AIS)患者预后的关键影像学指标,但在非增强 CT(NCCT)上实现自动测量具有挑战性。近期的扩散概率模型展现了用于图像分割的潜力。本文提出了一种基于马尔可夫扩散过程的同步图像-标签扩散概率模型(SDPM),用于 NCCT 上的卒中病灶分割。SDPM 完全建立在潜变量模型(LVM)之上,提供了完整的概率推导。模型引入了一条与噪声预测流并行的附加网络流,用于获得初始的含噪标签估计,从而高效地推断最终标签。通过优化指定的变分界,训练好的模型可以针对含噪输入图像给出多个标签估计作为参考。我们在三个卒中病灶数据集(一个公开数据集和两个私有数据集)上评估了该模型。与多种基于 U-net 和 Transformer 的分割方法相比,所提出的 SDPM 模型达到了最先进的性能。代码已公开。

Mitigating Calibration Bias Without Fixed Attribute Grouping for Improved Fairness in Medical Imaging Analysis

  • paper_url: http://arxiv.org/abs/2307.01738
  • repo_url: None
  • paper_authors: Changjian Shui, Justin Szeto, Raghav Mehta, Douglas L. Arnold, Tal Arbel
  • for: 这个研究是为了提高深度学习医学影像分析模型在实际医疗执行中的可靠性和准确性。
  • methods: 我们提出了一个新的二阶方法:集群专注法,它可以首先识别低准确的样本,然后将它们分为群体,最后将每个群体使用集群专注损失来改善准确偏差。
  • results: 我们的方法可以帮助控制最差表现的子群体的准确偏差,同时保持预测性能,并比最近的基elines表现更好。
    Abstract Trustworthy deployment of deep learning medical imaging models into real-world clinical practice requires that they be calibrated. However, models that are well calibrated overall can still be poorly calibrated for a sub-population, potentially resulting in a clinician unwittingly making poor decisions for this group based on the recommendations of the model. Although methods have been shown to successfully mitigate biases across subgroups in terms of model accuracy, this work focuses on the open problem of mitigating calibration biases in the context of medical image analysis. Our method does not require subgroup attributes during training, permitting the flexibility to mitigate biases for different choices of sensitive attributes without re-training. To this end, we propose a novel two-stage method: Cluster-Focal to first identify poorly calibrated samples, cluster them into groups, and then introduce group-wise focal loss to improve calibration bias. We evaluate our method on skin lesion classification with the public HAM10000 dataset, and on predicting future lesional activity for multiple sclerosis (MS) patients. In addition to considering traditional sensitive attributes (e.g. age, sex) with demographic subgroups, we also consider biases among groups with different image-derived attributes, such as lesion load, which are required in medical image analysis. Our results demonstrate that our method effectively controls calibration error in the worst-performing subgroups while preserving prediction performance, and outperforming recent baselines.
    摘要 信任性的深度学习医疗影像模型在实际临床实践中部署需要进行准确化。然而,即使模型在整体上具有良好的准确性,也可能对一个子population产生差异,导致医生因模型的建议而做出不良决策。虽然有方法可以在不同 subgroup 上减少偏见,但这项工作将关注医疗影像分析中的开放问题——准确性偏见的缓解。我们的方法不需要在训练过程中提供 subgroup 特征,因此可以随时缓解不同敏感特征的偏见。为此,我们提出了一种新的两阶段方法:首先使用 clustering 来identify poorly calibrated samples,然后引入 group-wise focal loss 来改善偏见偏见。我们在皮肤损害分类和多发性硬化病(MS)患者预测未来损害情况上进行了评估。此外,我们还考虑了传统敏感特征(如年龄、性别)和医疗影像分析中必需的图像特征,如肿瘤荷重。我们的结果表明,我们的方法可以控制最差 subgroup 的准确性错误,保持预测性能,并超过最近的基elines。
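The second stage described above, applying a focal loss per group of poorly calibrated samples, can be sketched as follows. Group assignments are assumed to come from the first (clustering) stage; the focusing parameter and the group-weighting scheme are illustrative assumptions rather than the paper's exact choices.

```python
import torch
import torch.nn.functional as F

def group_wise_focal_loss(logits, targets, group_ids, gamma=2.0):
    """Focal loss averaged within each group, then across groups.

    logits    : (N, C) raw classifier outputs
    targets   : (N,)  integer class labels
    group_ids : (N,)  integer cluster assignment for each sample (from stage one)
    """
    log_probs = F.log_softmax(logits, dim=1)
    pt = log_probs.gather(1, targets[:, None]).squeeze(1).exp()    # prob. of true class
    focal = -((1.0 - pt) ** gamma) * pt.clamp_min(1e-12).log()     # per-sample focal term
    group_losses = [focal[group_ids == g].mean() for g in group_ids.unique()]
    return torch.stack(group_losses).mean()

logits = torch.randn(16, 3, requires_grad=True)
targets = torch.randint(0, 3, (16,))
groups = torch.randint(0, 4, (16,))           # e.g. 4 clusters of poorly calibrated samples
loss = group_wise_focal_loss(logits, targets, groups)
loss.backward()
print(float(loss))
```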

Interpretable Computer Vision Models through Adversarial Training: Unveiling the Robustness-Interpretability Connection

  • paper_url: http://arxiv.org/abs/2307.02500
  • repo_url: https://github.com/delyan-boychev/pytorch_trainers_interpretability
  • paper_authors: Delyan Boychev
  • for: 本研究旨在评估对抗训练的影响,以生成更加鲁棒的模型,即具有对抗攻击的抵抗能力。
  • methods: 本研究使用了本地特征重要性方法(SHAP和Integrated Gradients)和特征视觉技术(Representation Inversion和Class Specific Image Generation)进行了广泛的测试和分析。
  • results: 研究发现,对抗训练可以使计算机视觉模型更加 interpretable,即其学习的特征更加类似于人类的理解。此外,对抗训练的模型在对抗攻击时表现更加鲁棒,并且更加注重图像中的特定区域,以支持其预测。
    Abstract With the perpetual increase of complexity of the state-of-the-art deep neural networks, it becomes a more and more challenging task to maintain their interpretability. Our work aims to evaluate the effects of adversarial training utilized to produce robust models - less vulnerable to adversarial attacks. It has been shown to make computer vision models more interpretable. Interpretability is as essential as robustness when we deploy the models to the real world. To prove the correlation between these two problems, we extensively examine the models using local feature-importance methods (SHAP, Integrated Gradients) and feature visualization techniques (Representation Inversion, Class Specific Image Generation). Standard models, compared to robust are more susceptible to adversarial attacks, and their learned representations are less meaningful to humans. Conversely, these models focus on distinctive regions of the images which support their predictions. Moreover, the features learned by the robust model are closer to the real ones.
    摘要 随着现代深度神经网络的复杂性不断增加,维护它们的解释性变得越来越困难。我们的工作旨在评估针对性训练可以生成更加鲁棒的模型,以降低对抗性攻击的脆弱性。已经证明了在计算机视觉领域中,使用针对性训练可以提高模型的解释性。当我们将模型部署到实际应用中时,解释性的重要性与鲁棒性一样高。为了证明这两个问题之间的相关性,我们广泛使用本地特征重要性方法(SHAP、整合梯度)和特征视觉技术( Representation Inversion、类特征图生成)进行检验。对比标准模型和鲁棒模型,后者更易受到抗性攻击,并且它所学习的特征更难以被人类理解。然而,这些模型强调特定的图像区域,这些区域支持它们的预测。此外,鲁棒模型学习的特征更加接近真实的特征。
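Adversarial training of the kind analyzed here typically replaces clean minibatches with perturbed ones produced by projected gradient descent (PGD). The snippet below is a generic PGD-based training step, not the authors' training recipe; the epsilon, step size, and step count are illustrative values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=7):
    """Generate L-infinity bounded adversarial examples with PGD."""
    x_adv = x.clone().detach() + torch.empty_like(x).uniform_(-eps, eps)
    x_adv = x_adv.clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)  # project
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y):
    """One robust-training step: train on adversarial examples instead of clean ones."""
    model.eval()
    x_adv = pgd_attack(model, x, y)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.01)
x, y = torch.rand(8, 3, 32, 32), torch.randint(0, 10, (8,))
print(adversarial_training_step(model, opt, x, y))
```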

Graph-Ensemble Learning Model for Multi-label Skin Lesion Classification using Dermoscopy and Clinical Images

  • paper_url: http://arxiv.org/abs/2307.01704
  • repo_url: None
  • paper_authors: Peng Tang, Yang Nan, Tobias Lasser
  • for: 本研究旨在开发一种基于多模态数据的多标签分类方法,以提高皮肤病诊断的准确性。
  • methods: 该方法利用图像 convolutional neural network (GCN) 来利用多模态数据的协调关系,并通过自适应 fusion 技术将 GCN 的预测与多模态数据 fusion 模型的预测进行权重平均 fusions,以获取更高的分类精度。
  • results: 实验结果表明,提出的 Graph-Ensemble Learning Model (GELN) 可以在不同的 dataset 上提高分类性能,并在 SPC 和诊断分类方面达到领先的表现。
    Abstract Many skin lesion analysis (SLA) methods recently focused on developing a multi-modal-based multi-label classification method due to two factors. The first is multi-modal data, i.e., clinical and dermoscopy images, which can provide complementary information to obtain more accurate results than single-modal data. The second one is that multi-label classification, i.e., seven-point checklist (SPC) criteria as an auxiliary classification task can not only boost the diagnostic accuracy of melanoma in the deep learning (DL) pipeline but also provide more useful functions to the clinical doctor as it is commonly used in clinical dermatologist's diagnosis. However, most methods only focus on designing a better module for multi-modal data fusion; few methods explore utilizing the label correlation between SPC and skin disease for performance improvement. This study fills the gap that introduces a Graph Convolution Network (GCN) to exploit prior co-occurrence between each category as a correlation matrix into the DL model for the multi-label classification. However, directly applying GCN degraded the performances in our experiments; we attribute this to the weak generalization ability of GCN in the scenario of insufficient statistical samples of medical data. We tackle this issue by proposing a Graph-Ensemble Learning Model (GELN) that views the prediction from GCN as complementary information of the predictions from the fusion model and adaptively fuses them by a weighted averaging scheme, which can utilize the valuable information from GCN while avoiding its negative influences as much as possible. To evaluate our method, we conduct experiments on public datasets. The results illustrate that our GELN can consistently improve the classification performance on different datasets and that the proposed method can achieve state-of-the-art performance in SPC and diagnosis classification.
    摘要 多种皮肤 lesion 分析(SLA)方法最近都在努力开发一种多模态基于多标签分类方法,主要是因为两点。第一,我们有多种模态数据,例如临床和肤视图图像,这些数据可以提供补偿信息,以获得更准确的结果。第二,多标签分类可以不仅提高抑制癌症的深度学习(DL)管道的诊断精度,还可以提供更有用的功能 для临床医生,因为这种分类方法在临床 dermatologist 的诊断中广泛使用。然而,大多数方法都是关注设计更好的多模态数据融合模块,很少方法探讨利用皮病分类标准(SPC)和皮肤病的标签相关性来提高性能。本研究填补了这一空白,通过引入一个图像卷积网络(GCN),利用每个类别之间的协同关系,在 DL 模型中进行多标签分类。然而,直接应用 GCN 会下降性能,我们归因于医学数据的不充分统计样本的问题。我们解决这个问题,提出一种图像ensemble学习模型(GELN),视图GCN 的预测为补充信息,并通过一种权重平均方式,将其与融合模型的预测相乘,以利用 GCN 的有价值信息,同时避免它的负面影响。为评估我们的方法,我们在公共数据集上进行实验。结果表明,我们的 GELN 可以在不同的数据集上具有稳定的分类性能,并且可以在 SPC 和诊断分类中达到国际级的表现。
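The adaptive fusion described above amounts to treating the graph branch, which reasons over a label co-occurrence (correlation) matrix, as a second predictor and averaging it with the multi-modal fusion model under a weighting scheme. The sketch below shows one plausible form of that ensemble; the propagation rule and the fixed weight are assumptions for illustration, not the paper's exact GELN design.

```python
import numpy as np

def label_graph_scores(label_embed, cooccurrence, image_feat, weight):
    """Score labels by propagating label embeddings over a co-occurrence graph.

    label_embed  : (L, D) one embedding per label (e.g. seven-point-checklist items)
    cooccurrence : (L, L) row-normalized label co-occurrence matrix from training data
    image_feat   : (D,)   feature vector of the input image pair
    weight       : (D, D) projection applied after propagation
    """
    propagated = cooccurrence @ label_embed          # mix correlated labels' embeddings
    return (propagated @ weight) @ image_feat        # one logit per label

def ensemble_predict(p_fusion, p_gcn, lam=0.3):
    """Weighted average of the fusion-model and graph-branch probabilities."""
    return lam * p_gcn + (1.0 - lam) * p_fusion

rng = np.random.default_rng(0)
L, D = 8, 32                                          # toy: 7 SPC criteria + diagnosis
co = rng.random((L, L)); co /= co.sum(axis=1, keepdims=True)
logits = label_graph_scores(rng.standard_normal((L, D)), co,
                            rng.standard_normal(D), rng.standard_normal((D, D)) * 0.1)
p_gcn = 1 / (1 + np.exp(-logits))
p_fusion = rng.random(L)
print(np.round(ensemble_predict(p_fusion, p_gcn), 3))
```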

Augment Features Beyond Color for Domain Generalized Segmentation

  • paper_url: http://arxiv.org/abs/2307.01703
  • repo_url: None
  • paper_authors: Qiyu Sun, Pavlo Melnyk, Michael Felsberg, Yang Tang
  • for: 这篇论文旨在提出一种适用于不同类型资料的通用Semantic Segmentation方法,并且不需要target资料进行训练。
  • methods: 我们的方法包括两个模组:随机图像颜色增强(RICA)和随机特征分布增强(RFDA)。RICA将图像从RGB转换为CIELAB颜色模型,并在一个感知基础上随机调整图像以提高图像质量。我们还使用CycleGAN-based生成网络将增强后的图像扩展到特征空间,以更好地丰富数据。
  • results: 我们进行了广泛的实验,结果显示我们的方法在不同的资料集上(包括Synthia、Cityscapes、BDDS和Mapillary)实现了顶尖的Semantic Segmentation性能。
    Abstract Domain generalized semantic segmentation (DGSS) is an essential but highly challenging task, in which the model is trained only on source data and any target data is not available. Previous DGSS methods can be partitioned into augmentation-based and normalization-based ones. The former either introduces extra biased data or only conducts channel-wise adjustments for data augmentation, and the latter may discard beneficial visual information, both of which lead to limited performance in DGSS. Contrarily, our method performs inter-channel transformation and meanwhile evades domain-specific biases, thus diversifying data and enhancing model generalization performance. Specifically, our method consists of two modules: random image color augmentation (RICA) and random feature distribution augmentation (RFDA). RICA converts images from RGB to the CIELAB color model and randomizes color maps in a perception-based way for image enhancement purposes. We further this augmentation by extending it beyond color to feature space using a CycleGAN-based generative network, which complements RICA and further boosts generalization capability. We conduct extensive experiments, and the generalization results from the synthetic GTAV and SYNTHIA to the real Cityscapes, BDDS, and Mapillary datasets show that our method achieves state-of-the-art performance in DGSS.
    摘要 领域普遍 semantic segmentation (DGSS) 是一个非常重要但也非常具有挑战性的任务,模型在训练时只有source数据可用,target数据不可用。先前的DGSS方法可以分为两种:增强型和normalization型。前者可能引入额外的偏见数据或只是执行通道 wise的调整,后者可能会弃用有利的视觉信息,这两者都导致DGSS的表现有限。相反,我们的方法会执行 между通道转换和避免域专偏见,因此可以多样化数据和提高模型的普遍性表现。我们的方法包括两个模组:随机图像颜色增强 (RICA) 和随机特征分布增强 (RFDA)。RICA 将图像从 RGB 转换为 CIELAB 颜色模型,并在感知方式下随机调整图像以增强图像表现。我们继续这个增强,通过使用 CycleGAN 基本的生成网络,该网络可以补充 RICA 并进一步提高普遍能力。我们实现了广泛的实验,并从 sintetic GTAV 和 SYNTHIA Synthetic 资料集到 real Cityscapes、BDDS 和 Mapillary 资料集的一致性结果显示,我们的方法在 DGSS 中实现了顶尖的表现。
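The RICA step described above hinges on perturbing images in a perceptual color space rather than in RGB. A minimal version of that idea, using scikit-image for the RGB-to-CIELAB conversion and simple random shifts and scales on the L, a, b channels, is sketched below; the perturbation ranges are illustrative, not the paper's settings.

```python
import numpy as np
from skimage import color

def random_lab_augment(rgb, rng, l_jitter=10.0, ab_jitter=8.0, l_scale=0.15):
    """Randomly perturb an RGB image in CIELAB space and convert back to RGB.

    rgb : float image in [0, 1] with shape (H, W, 3).
    L is randomly rescaled and shifted; a and b are randomly shifted, which changes
    brightness/hue while keeping the perturbation roughly perceptually uniform.
    """
    lab = color.rgb2lab(rgb)
    lab[..., 0] = lab[..., 0] * (1.0 + rng.uniform(-l_scale, l_scale)) \
                  + rng.uniform(-l_jitter, l_jitter)
    lab[..., 1] += rng.uniform(-ab_jitter, ab_jitter)
    lab[..., 2] += rng.uniform(-ab_jitter, ab_jitter)
    lab[..., 0] = np.clip(lab[..., 0], 0.0, 100.0)
    return np.clip(color.lab2rgb(lab), 0.0, 1.0)

rng = np.random.default_rng(42)
image = rng.random((64, 64, 3))
augmented = random_lab_augment(image, rng)
print(augmented.shape, augmented.min(), augmented.max())
```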

Spike-driven Transformer

  • paper_url: http://arxiv.org/abs/2307.01694
  • repo_url: https://github.com/biclab/spike-driven-transformer
  • paper_authors: Man Yao, Jiakui Hu, Zhaokun Zhou, Li Yuan, Yonghong Tian, Bo Xu, Guoqi Li
  • for: The paper proposes a new deep learning model, the Spike-driven Transformer, which incorporates the spike-driven paradigm into the Transformer architecture.
  • methods: The proposed Spike-driven Transformer relies on four unique properties: event-driven computation, binary spike communication, self-attention with linear complexity, and mask-and-addition operations.
  • results: The Spike-driven Transformer achieves a top-1 accuracy of 77.1% on ImageNet-1K, which is the state-of-the-art result in the SNN field.
    Abstract Spiking Neural Networks (SNNs) provide an energy-efficient deep learning option due to their unique spike-based event-driven (i.e., spike-driven) paradigm. In this paper, we incorporate the spike-driven paradigm into Transformer by the proposed Spike-driven Transformer with four unique properties: 1) Event-driven, no calculation is triggered when the input of Transformer is zero; 2) Binary spike communication, all matrix multiplications associated with the spike matrix can be transformed into sparse additions; 3) Self-attention with linear complexity at both token and channel dimensions; 4) The operations between spike-form Query, Key, and Value are mask and addition. Together, there are only sparse addition operations in the Spike-driven Transformer. To this end, we design a novel Spike-Driven Self-Attention (SDSA), which exploits only mask and addition operations without any multiplication, and thus having up to $87.2\times$ lower computation energy than vanilla self-attention. Especially in SDSA, the matrix multiplication between Query, Key, and Value is designed as the mask operation. In addition, we rearrange all residual connections in the vanilla Transformer before the activation functions to ensure that all neurons transmit binary spike signals. It is shown that the Spike-driven Transformer can achieve 77.1\% top-1 accuracy on ImageNet-1K, which is the state-of-the-art result in the SNN field. The source code is available at https://github.com/BICLab/Spike-Driven-Transformer.

Training Energy-Based Models with Diffusion Contrastive Divergences

  • paper_url: http://arxiv.org/abs/2307.01668
  • repo_url: None
  • paper_authors: Weijian Luo, Hao Jiang, Tianyang Hu, Jiacheng Sun, Zhenguo Li, Zhihua Zhang
  • for: This paper focuses on improving the efficiency and accuracy of Energy-Based Models (EBMs) for generative modeling, specifically addressing the trade-off between computational burden and validity in Contrastive Divergence (CD) training.
  • methods: The authors propose a new family of Diffusion Contrastive Divergence (DCD) methods that replace the Langevin dynamic used in CD with other EBM-parameter-free diffusion processes, leading to more efficient and accurate training of EBMs.
  • results: The proposed DCD methods outperform CD in terms of computational efficiency and accuracy, as demonstrated through extensive experiments on synthetic data modeling, high-dimensional image denoising, and image generation; in particular, DCD can train an EBM for generating the CelebA $32\times 32$ dataset.
    Abstract Energy-Based Models (EBMs) have been widely used for generative modeling. Contrastive Divergence (CD), a prevailing training objective for EBMs, requires sampling from the EBM with Markov Chain Monte Carlo methods (MCMCs), which leads to an irreconcilable trade-off between the computational burden and the validity of the CD. Running MCMCs till convergence is computationally intensive. On the other hand, short-run MCMC brings in an extra non-negligible parameter gradient term that is difficult to handle. In this paper, we provide a general interpretation of CD, viewing it as a special instance of our proposed Diffusion Contrastive Divergence (DCD) family. By replacing the Langevin dynamic used in CD with other EBM-parameter-free diffusion processes, we propose a more efficient divergence. We show that the proposed DCDs are both more computationally efficient than the CD and are not limited to a non-negligible gradient term. We conduct intensive experiments, including both synthesis data modeling and high-dimensional image denoising and generation, to show the advantages of the proposed DCDs. On the synthetic data learning and image denoising experiments, our proposed DCD outperforms CD by a large margin. In image generation experiments, the proposed DCD is capable of training an energy-based model for generating the Celab-A $32\times 32$ dataset, which is comparable to existing EBMs.
    摘要 基于能量的模型(EBM)已被广泛用于生成建模。对比散度(CD)是 EBM 常用的训练目标,它需要使用马尔可夫链蒙特卡洛方法(MCMC)从 EBM 中采样,这导致计算开销与 CD 有效性之间难以调和的权衡:将 MCMC 运行至收敛计算代价高昂,而短程 MCMC 又会引入一个难以处理、不可忽略的参数梯度项。在本文中,我们给出了 CD 的一种一般性解释,将其视为我们提出的扩散对比散度(DCD)家族的一个特例。通过将 CD 中使用的朗之万动力学替换为其他与 EBM 参数无关的扩散过程,我们提出了更高效的散度。我们证明所提出的 DCD 不仅比 CD 计算效率更高,而且不受不可忽略梯度项的限制。我们进行了大量实验,包括合成数据建模以及高维图像去噪与生成,以展示所提 DCD 的优势。在合成数据学习和图像去噪实验中,我们提出的 DCD 大幅优于 CD;在图像生成实验中,所提 DCD 能够训练一个用于生成 CelebA $32\times 32$ 数据集的基于能量的模型,与现有 EBM 相当。

Sensors and Systems for Monitoring Mental Fatigue: A systematic review

  • paper_url: http://arxiv.org/abs/2307.01666
  • repo_url: None
  • paper_authors: Prabin Sharma, Joanna C. Justus, Govinda R. Poudel
  • for: 本研究旨在对精神疲劳的理论模型进行批判性综述,介绍关键的使能传感器技术,并系统回顾利用生物传感器系统检测人类精神疲劳的最新研究。
  • methods: 本研究对 57 篇文献(N=1082)进行了系统检索与评价,其中大多数研究使用基于脑电图(EEG)的传感器来跟踪精神疲劳。
  • results: 本研究发现,基于 EEG 的传感器在疲劳检测上可提供中等至良好的灵敏度,且高密度 EEG 传感器并未带来额外优势。基于这些发现,本研究对将可穿戴 EEG 与环境传感器整合到真实场景监测中进行了批判性讨论,并指出仍需进一步改进与调整相关技术,以便在半自主和自主行业中广泛部署用于疲劳监测的可穿戴传感器与系统。
    Abstract Mental fatigue is a leading cause of motor vehicle accidents, medical errors, loss of workplace productivity, and student disengagements in e-learning environment. Development of sensors and systems that can reliably track mental fatigue can prevent accidents, reduce errors, and help increase workplace productivity. This review provides a critical summary of theoretical models of mental fatigue, a description of key enabling sensor technologies, and a systematic review of recent studies using biosensor-based systems for tracking mental fatigue in humans. We conducted a systematic search and review of recent literature which focused on detection and tracking of mental fatigue in humans. The search yielded 57 studies (N=1082), majority of which used electroencephalography (EEG) based sensors for tracking mental fatigue. We found that EEG-based sensors can provide a moderate to good sensitivity for fatigue detection. Notably, we found no incremental benefit of using high-density EEG sensors for application in mental fatigue detection. Given the findings, we provide a critical discussion on the integration of wearable EEG and ambient sensors in the context of achieving real-world monitoring. Future work required to advance and adapt the technologies toward widespread deployment of wearable sensors and systems for fatigue monitoring in semi-autonomous and autonomous industries is examined.
    摘要 心理疲劳是主要导致机动车事故、医疗错误、工作场所产量下降和电子学习环境中学生失业的原因。开发可靠跟踪心理疲劳的感应器和系统可以预防事故、减少错误,并帮助提高工作场所产量。本文提供了心理疲劳理论模型的批判摘要、关键实现感应器技术的描述,以及在人类中使用感应器基于系统的评估。我们对最新的文献进行了系统性的搜索和评估,搜索结果共57篇论文(N=1082),大多数使用电encephalography(EEG)基于感应器跟踪心理疲劳。我们发现EEG基于感应器可以提供moderate至good的敏感性 для疲劳检测。另外,我们未发现使用高密度EEG感应器在心理疲劳检测中增加的优势。根据发现,我们提供了在实际监测中 integrating wearable EEG和 ambient sensor的批判讨论,以及将这些技术应用于自动化和半自动化业务的未来工作。

Exploring Transformers for On-Line Handwritten Signature Verification

  • paper_url: http://arxiv.org/abs/2307.01663
  • repo_url: None
  • paper_authors: Pietro Melzi, Ruben Tolosana, Ruben Vera-Rodriguez, Paula Delgado-Santos, Giuseppe Stragapede, Julian Fierrez, Javier Ortega-Garcia
  • for: 这个研究旨在评估基于最新的Transformers架构的在线签名验证系统的可靠性。
  • methods: 研究人员使用四种不同的配置,其中两种使用基于Vanilla Transformer核心的Encoder,另外两种则已经在步行和活动识别任务上得到了成功。
  • results: 实验结果表明,使用Transformers架构可以提供高度可靠的在线签名验证系统。
    Abstract The application of mobile biometrics as a user-friendly authentication method has increased in the last years. Recent studies have proposed novel behavioral biometric recognition systems based on Transformers, which currently outperform the state of the art in several application scenarios. On-line handwritten signature verification aims to verify the identity of subjects, based on their biometric signatures acquired using electronic devices such as tablets or smartphones. This paper investigates the suitability of architectures based on recent Transformers for on-line signature verification. In particular, four different configurations are studied, two of them rely on the Vanilla Transformer encoder, and the two others have been successfully applied to the tasks of gait and activity recognition. We evaluate the four proposed configurations according to the experimental protocol proposed in the SVC-onGoing competition. The results obtained in our experiments are promising, and promote the use of Transformers for on-line signature verification.
    摘要 随着移动生物 метрик作为用户友好的验证方法的应用逐渐增加,最近的研究已经提出了基于转换器的新型行为生物метри克认证系统。在线手写签名验证目的是验证使用电子设备such as 平板或智能手机所获取的生物签名的真实性,以验证个体身份。本文研究了基于最新的转换器架构的在线手写签名验证。特别是,我们研究了四种不同的配置,其中两种使用 Vanilla Transformer 编码器,另外两种已经成功应用于步态和活动识别任务。我们按照SVC-onGoing competition的实验协议进行了测试,实验结果很俊朗,这些结果推荐使用转换器进行在线手写签名验证。

Task Planning Support for Arborists and Foresters: Comparing Deep Learning Approaches for Tree Inventory and Tree Vitality Assessment Based on UAV-Data

  • paper_url: http://arxiv.org/abs/2307.01651
  • repo_url: None
  • paper_authors: Jonas-Dario Troles, Richard Nieding, Sonia Simons, Ute Schmid
  • for: This paper aims to optimize workflows and increase productivity for arborists and foresters who care for trees in urban areas and forests.
  • methods: The approach uses RGB and multispectral UAV data, as well as multispectral satellite data and soil moisture sensors, to create tree inventories and vitality assessments.
  • results: The approach generates helpful information and improves task planning for arborists and foresters, allowing them to better care for trees in urban areas and forests.Here’s the text in Simplified Chinese:
    Abstract Climate crisis and correlating prolonged, more intense periods of drought threaten tree health in cities and forests. In consequence, arborists and foresters suffer from increasing workloads and, in the best case, a consistent but often declining workforce. To optimise workflows and increase productivity, we propose a novel open-source end-to-end approach that generates helpful information and improves task planning of those who care for trees in and around cities. Our approach is based on RGB and multispectral UAV data, which is used to create tree inventories of city parks and forests and to deduce tree vitality assessments through statistical indices and Deep Learning. Due to EU restrictions regarding flying drones in urban areas, we will also use multispectral satellite data and fifteen soil moisture sensors to extend our tree vitality-related basis of data. Furthermore, Bamberg already has a georeferenced tree cadastre of around 15,000 solitary trees in the city area, which is also used to generate helpful information. All mentioned data is then joined and visualised in an interactive web application allowing arborists and foresters to generate individual and flexible evaluations, thereby improving daily task planning.
    摘要 气候危机及其引发的更长、更严重的干旱期威胁着城市和森林中树木的健康。因此,树木养护人员和林务员的工作量不断增加,而可用人手即便在最好的情况下也往往持续减少。为了优化工作流程并提高生产效率,我们提出了一种新的开源端到端方法,为城市及周边地区的树木养护人员生成有用信息并改进其任务规划。我们的方法基于RGB和多光谱无人机数据,用于建立城市公园和森林的树木台账,并通过统计指标和深度学习推断树木活力。由于欧盟对在城市区域飞行无人机的限制,我们还使用多光谱卫星数据和十五个土壤湿度传感器来扩展与树木活力相关的数据基础。此外,Bamberg已拥有覆盖市区约15,000棵独立树木的带地理参考的树木台账,这些数据同样用于生成有用信息。上述所有数据随后被整合并在一个交互式网页应用中进行可视化,使树木养护人员和林务员能够生成个性化、灵活的评估,从而改进日常任务规划。
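
The vitality assessment relies on statistical vegetation indices computed from the multispectral data. The sketch below shows the textbook NDVI computation and a coarse per-crown vitality flag; the band names, reflectance ranges, crown mask, and threshold are placeholders rather than the paper's actual index set.

```python
import numpy as np

def ndvi(red: np.ndarray, nir: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Normalized Difference Vegetation Index per pixel: (NIR - RED) / (NIR + RED)."""
    red = red.astype(np.float32)
    nir = nir.astype(np.float32)
    return (nir - red) / (nir + red + eps)

def crown_vitality(ndvi_map: np.ndarray, crown_mask: np.ndarray, healthy_thresh: float = 0.6):
    """Mean NDVI inside a tree-crown mask and a coarse healthy/stressed label."""
    mean_ndvi = float(ndvi_map[crown_mask].mean())
    return mean_ndvi, ("healthy" if mean_ndvi >= healthy_thresh else "stressed")

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    red = rng.uniform(0.05, 0.3, size=(64, 64))   # reflectance in the red band
    nir = rng.uniform(0.3, 0.8, size=(64, 64))    # reflectance in the near-infrared band
    mask = np.zeros((64, 64), dtype=bool)
    mask[16:48, 16:48] = True                     # hypothetical crown polygon rasterized to a mask
    print(crown_vitality(ndvi(red, nir), mask))
```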

In-Domain Self-Supervised Learning Can Lead to Improvements in Remote Sensing Image Classification

  • paper_url: http://arxiv.org/abs/2307.01645
  • repo_url: None
  • paper_authors: Ivica Dimitrovski, Ivan Kitanovski, Nikola Simidjievski, Dragi Kocev
  • for: 本研究旨在探讨自监督学习(SSL)在遥感图像分类中的应用前景,尝试利用大量无标签数据来学习图像表示。
  • methods: 本研究使用Million AID数据集,通过构造辅助任务生成伪标签,并使用ViT模型进行预训练和微调。
  • results: 实验结果显示,使用域内SSL预训练(iBOT框架和ViT模型)在14个多样化的遥感图像分类任务中表现更好,比supervised预训练使用ImageNet数据集。
    Abstract Self-supervised learning (SSL) has emerged as a promising approach for remote sensing image classification due to its ability to leverage large amounts of unlabeled data. In contrast to traditional supervised learning, SSL aims to learn representations of data without the need for explicit labels. This is achieved by formulating auxiliary tasks that can be used to create pseudo-labels for the unlabeled data and learn pre-trained models. The pre-trained models can then be fine-tuned on downstream tasks such as remote sensing image scene classification. The paper analyzes the effectiveness of SSL pre-training using Million AID - a large unlabeled remote sensing dataset on various remote sensing image scene classification datasets as downstream tasks. More specifically, we evaluate the effectiveness of SSL pre-training using the iBOT framework coupled with Vision transformers (ViT) in contrast to supervised pre-training of ViT using the ImageNet dataset. The comprehensive experimental work across 14 datasets with diverse properties reveals that in-domain SSL leads to improved predictive performance of models compared to the supervised counterparts.
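
The study's recipe is SSL pre-training of a ViT on unlabeled in-domain imagery (Million AID, via iBOT), followed by supervised fine-tuning on each downstream scene-classification dataset. The sketch below illustrates only the generic fine-tuning stage with a stand-in backbone and synthetic data; it is not the iBOT pre-training loop, and all module names are hypothetical.

```python
import torch
import torch.nn as nn

class Backbone(nn.Module):
    """Stand-in for a ViT backbone whose weights come from SSL pre-training."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                 nn.Linear(32, feat_dim))
    def forward(self, x):
        return self.net(x)

def fine_tune(backbone, loader, num_classes, epochs=1, lr=1e-4):
    """Attach a linear head and fine-tune the whole model on labeled downstream data."""
    model = nn.Sequential(backbone, nn.Linear(256, num_classes))
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            opt.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            opt.step()
    return model

if __name__ == "__main__":
    # tiny synthetic "downstream dataset" standing in for a remote sensing benchmark
    images = torch.randn(16, 3, 64, 64)
    labels = torch.randint(0, 5, (16,))
    loader = [(images[i:i + 4], labels[i:i + 4]) for i in range(0, 16, 4)]
    backbone = Backbone()              # in practice: load SSL pre-trained weights here
    model = fine_tune(backbone, loader, num_classes=5)
    print(model(images[:2]).shape)
```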

ChildPlay: A New Benchmark for Understanding Children’s Gaze Behaviour

  • paper_url: http://arxiv.org/abs/2307.01630
  • repo_url: None
  • paper_authors: Samy Tafasca, Anshul Gupta, Jean-Marc Odobez
  • for: 这个研究是为了预测儿童和成人之间的注视目标,以便更好地诊断developmental disorders。
  • methods: 我们提出了一个新的注视目标预测模型,利用latest geometry preserving depth inference methods和rich gaze information,以及一个新的 ChildPlay 数据集。
  • results: 我们的模型在benchmark datasets和 ChildPlay 上 achieved state of the art results,并且发现looking at faces prediction performance on children is much worse than on adults,可以通过 fine-tuning models using child gaze annotations 进一步提高。
    Abstract Gaze behaviors such as eye-contact or shared attention are important markers for diagnosing developmental disorders in children. While previous studies have looked at some of these elements, the analysis is usually performed on private datasets and is restricted to lab settings. Furthermore, all publicly available gaze target prediction benchmarks mostly contain instances of adults, which makes models trained on them less applicable to scenarios with young children. In this paper, we propose the first study for predicting the gaze target of children and interacting adults. To this end, we introduce the ChildPlay dataset: a curated collection of short video clips featuring children playing and interacting with adults in uncontrolled environments (e.g. kindergarten, therapy centers, preschools etc.), which we annotate with rich gaze information. We further propose a new model for gaze target prediction that is geometrically grounded by explicitly identifying the scene parts in the 3D field of view (3DFoV) of the person, leveraging recent geometry preserving depth inference methods. Our model achieves state of the art results on benchmark datasets and ChildPlay. Furthermore, results show that looking at faces prediction performance on children is much worse than on adults, and can be significantly improved by fine-tuning models using child gaze annotations. Our dataset and models will be made publicly available.
    摘要 目光行为(如眼神接触或共同注意)是诊断儿童发育障碍的重要标志。以往的研究通常只在私有数据集上进行分析,且局限于实验室环境。此外,现有公开的注视目标预测基准几乎都只包含成年人,使得在其上训练的模型难以适用于涉及幼儿的场景。在本文中,我们提出了首个针对儿童及与其互动的成年人的注视目标预测研究。为此,我们构建了ChildPlay数据集:一个经过整理的短视频片段集,记录儿童在幼儿园、治疗中心、学前班等非受控环境中玩耍并与成年人互动的场景,并为其标注了丰富的注视信息。我们还提出了一种新的注视目标预测模型,它通过在人的3D视野(3DFoV)内显式识别场景部件而具有几何依据,并利用最新的保持几何结构的深度推断方法。我们的模型在基准数据集和ChildPlay上均取得了最先进的结果。结果还表明,对儿童的"注视人脸"预测性能明显低于成年人,而使用儿童注视标注对模型进行微调可以显著改善这一点。我们的数据集和模型将公开发布。

Learning Lie Group Symmetry Transformations with Neural Networks

  • paper_url: http://arxiv.org/abs/2307.01583
  • repo_url: https://github.com/victoria-klein/learning-lie-group-symmetries
  • paper_authors: Alex Gabel, Victoria Klein, Riccardo Valperga, Jeroen S. W. Lamb, Kevin Webster, Rick Quax, Efstratios Gavves
  • for: 本研究旨在探讨和评估数据集中的对称性,以便更好地进行模型选择、生成模型和数据分析等。
  • methods: 本研究提出了一种新方法,可以自动发现并刻画数据集中未知的对称性,即超出传统旋转、缩放和平移范围的李群对称变换。
  • results: 研究结果表明,该方法可以在不同的数据点 Parametrization 下成功地描述数据集的对称性,并且可以在不同的设置下进行数据分析和模型选择。
    Abstract The problem of detecting and quantifying the presence of symmetries in datasets is useful for model selection, generative modeling, and data analysis, amongst others. While existing methods for hard-coding transformations in neural networks require prior knowledge of the symmetries of the task at hand, this work focuses on discovering and characterizing unknown symmetries present in the dataset, namely, Lie group symmetry transformations beyond the traditional ones usually considered in the field (rotation, scaling, and translation). Specifically, we consider a scenario in which a dataset has been transformed by a one-parameter subgroup of transformations with different parameter values for each data point. Our goal is to characterize the transformation group and the distribution of the parameter values. The results showcase the effectiveness of the approach in both these settings.
    摘要 检测和量化数据集中对称性的存在,对模型选择、生成建模和数据分析等均有帮助。现有在神经网络中硬编码变换的方法需要事先知道任务的对称性,而本工作的目标是发现并刻画数据集中未知的对称性,即超出领域内通常考虑的旋转、缩放和平移之外的李群对称变换。具体而言,我们考虑这样一种情形:数据集已被某个一参数变换子群所变换,且每个数据点对应的参数值各不相同。我们的目标是刻画该变换群以及参数值的分布。结果表明该方法在这两个设定下均有效。
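
The setting is that each data point has been acted on by a group element exp(tG) of a one-parameter subgroup, with a different parameter t per point. The toy sketch below fixes a known rotation generator G and recovers each point's t by a bounded 1-D fit; it illustrates the problem formulation only, not the paper's neural-network approach to discovering unknown generators.

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import minimize_scalar

# generator of in-plane rotations (a one-parameter subgroup of SO(2))
G = np.array([[0.0, -1.0],
              [1.0,  0.0]])

def transform(x, t):
    """Apply the group element exp(t * G) to a point x."""
    return expm(t * G) @ x

def recover_parameter(x, y):
    """Recover t such that exp(t*G) x is as close as possible to the observed y."""
    obj = lambda t: np.linalg.norm(transform(x, t) - y) ** 2
    return minimize_scalar(obj, bounds=(-np.pi, np.pi), method="bounded").x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xs = rng.standard_normal((5, 2))
    ts_true = rng.uniform(-1.0, 1.0, size=5)           # a different parameter per data point
    ys = np.stack([transform(x, t) for x, t in zip(xs, ts_true)])
    ts_hat = np.array([recover_parameter(x, y) for x, y in zip(xs, ys)])
    print(np.round(ts_true, 3), np.round(ts_hat, 3))   # recovered parameters should match
```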

A Comprehensive Multi-scale Approach for Speech and Dynamics Synchrony in Talking Head Generation

  • paper_url: http://arxiv.org/abs/2307.03270
  • repo_url: https://github.com/louisbearing/hmo-audio
  • paper_authors: Louis Airale, Dominique Vaufreydaz, Xavier Alameda-Pineda
  • for: 这种研究旨在提高深度生成模型中的语音与表情动作的同步,以便生成更自然的人脸动作和语音同步。
  • methods: 该研究使用了多级音视频同步损失函数和多级权重网络来更好地处理语音和表情动作之间的短期和长期相关性。
  • results: 实验表明,使用该方法可以大幅提高人脸动作的自然性和多级音视频同步的质量,并且在不同时间尺度上都能够保持良好的同步。
    Abstract Animating still face images with deep generative models using a speech input signal is an active research topic and has seen important recent progress. However, much of the effort has been put into lip syncing and rendering quality while the generation of natural head motion, let alone the audio-visual correlation between head motion and speech, has often been neglected. In this work, we propose a multi-scale audio-visual synchrony loss and a multi-scale autoregressive GAN to better handle short and long-term correlation between speech and the dynamics of the head and lips. In particular, we train a stack of syncer models on multimodal input pyramids and use these models as guidance in a multi-scale generator network to produce audio-aligned motion unfolding over diverse time scales. Our generator operates in the facial landmark domain, which is a standard low-dimensional head representation. The experiments show significant improvements over the state of the art in head motion dynamics quality and in multi-scale audio-visual synchrony both in the landmark domain and in the image domain.
    摘要 使用语音输入信号驱动深度生成模型为静态人脸图像生成动画是一个活跃的研究方向,近年来取得了重要进展。然而,大部分工作集中在口型同步和渲染质量上,而自然头部运动的生成,尤其是头部运动与语音之间的音视频相关性,则常常被忽略。在本工作中,我们提出了一种多尺度音视频同步损失和多尺度自回归GAN,以更好地建模语音与头部及嘴唇动态之间的短期和长期相关性。具体来说,我们在多模态输入金字塔上训练了一组同步判别模型,并将其作为指导用于多尺度生成网络,以生成在多个时间尺度上与音频对齐的运动。我们的生成器在人脸关键点空间中运行,这是一种标准的低维头部表示。实验表明,该方法在头部运动动态质量和多尺度音视频同步方面均有显著提升,并且在关键点空间和图像空间都达到领先水平。
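
A hedged sketch of the central idea of scoring audio-visual synchrony at several temporal scales: both feature streams are average-pooled at each scale and penalized by one minus their cosine similarity. The scales, pooling, and similarity measure are assumptions and do not reproduce the paper's stack of syncer models.

```python
import torch
import torch.nn.functional as F

def multiscale_sync_loss(audio_feats, motion_feats, scales=(1, 2, 4)):
    """Average (1 - cosine similarity) between audio and motion features
    after temporal average pooling at several scales.

    audio_feats, motion_feats: tensors of shape (batch, time, dim), already
    projected to a shared embedding dimension.
    """
    losses = []
    for s in scales:
        a = F.avg_pool1d(audio_feats.transpose(1, 2), kernel_size=s, stride=s)
        m = F.avg_pool1d(motion_feats.transpose(1, 2), kernel_size=s, stride=s)
        cos = F.cosine_similarity(a, m, dim=1)       # similarity per pooled time step
        losses.append((1.0 - cos).mean())
    return torch.stack(losses).mean()

if __name__ == "__main__":
    audio = torch.randn(2, 32, 64)    # e.g. per-frame audio embeddings
    motion = torch.randn(2, 32, 64)   # e.g. facial-landmark motion embeddings
    print(multiscale_sync_loss(audio, motion))
```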

Learning to reconstruct the bubble distribution with conductivity maps using Invertible Neural Networks and Error Diffusion

  • paper_url: http://arxiv.org/abs/2307.02496
  • repo_url: None
  • paper_authors: Nishant Kumar, Lukas Krause, Thomas Wondrak, Sven Eckert, Kerstin Eckert, Stefan Gumhold
  • for: 该论文旨在支持环保的电解水制氢,解决电解过程中产生的气泡阻碍反应、降低电解槽效率并增加能耗的问题。
  • methods: 该方法利用外部磁场传感器测量气泡引起的磁场波动,并求解Biot-Savart定律的逆问题,以估计电解槽内的电导率分布,进而推断气泡的大小和位置。
  • results: 研究表明,可逆神经网络(INN)能够从少量磁场测量中重建电导率场,其定性结果与基于随机误差扩散的定量评估均显著优于Tikhonov正则化。
    Abstract Electrolysis is crucial for eco-friendly hydrogen production, but gas bubbles generated during the process hinder reactions, reduce cell efficiency, and increase energy consumption. Additionally, these gas bubbles cause changes in the conductivity inside the cell, resulting in corresponding variations in the induced magnetic field around the cell. Therefore, measuring these gas bubble-induced magnetic field fluctuations using external magnetic sensors and solving the inverse problem of Biot-Savart Law allows for estimating the conductivity in the cell and, thus, bubble size and location. However, determining high-resolution conductivity maps from only a few induced magnetic field measurements is an ill-posed inverse problem. To overcome this, we exploit Invertible Neural Networks (INNs) to reconstruct the conductivity field. Our qualitative results and quantitative evaluation using random error diffusion show that INN achieves far superior performance compared to Tikhonov regularization.
    摘要 电解是绿色制氢不可或缺的工艺,但过程中生成的气泡会阻碍反应、降低电解槽效率并增加能耗。此外,这些气泡会改变电解槽内的电导率,从而引起电解槽周围感应磁场的相应变化。因此,利用外部磁场传感器测量这些由气泡引起的磁场波动,并求解Biot-Savart定律的逆问题,即可估计电解槽内的电导率,进而推断气泡的大小和位置。然而,仅凭少量感应磁场测量来确定高分辨率电导率图是一个不适定的逆问题。为此,我们利用可逆神经网络(INN)重建电导率场。定性结果以及基于随机误差扩散的定量评估均表明,INN的性能远优于Tikhonov正则化。
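
The underlying task is an under-determined linear inverse problem: recover a conductivity map from a handful of induced-field measurements. The sketch below implements only the Tikhonov-regularized baseline the paper compares against, with a random matrix standing in for the Biot-Savart forward operator; the INN itself is not shown.

```python
import numpy as np

def tikhonov_solve(A, b, lam):
    """Minimize ||A x - b||^2 + lam * ||x||^2 via the normal equations."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_sensors, n_pixels = 16, 256                     # far fewer measurements than unknowns
    A = rng.standard_normal((n_sensors, n_pixels))    # stand-in for the Biot-Savart operator
    x_true = np.zeros(n_pixels)
    x_true[40:60] = 1.0                               # toy "bubble": a block of altered conductivity
    b = A @ x_true + 0.01 * rng.standard_normal(n_sensors)   # noisy field measurements
    x_hat = tikhonov_solve(A, b, lam=1.0)
    print(np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```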

EffSeg: Efficient Fine-Grained Instance Segmentation using Structure-Preserving Sparsity

  • paper_url: http://arxiv.org/abs/2307.01545
  • repo_url: None
  • paper_authors: Cédric Picron, Tinne Tuytelaars
  • for: 提高实例分割精度和效率
  • methods: 使用Structure-Preserving Sparsity(SPS)方法,将活动特征、潜在特征和 dense 2D 索引地图分别存储,以保持2D空间结构
  • results: 与 RefineMask 相当的性能在 COCO 上,降低 FLOPs 71%,提高 FPS 29%
    Abstract Many two-stage instance segmentation heads predict a coarse 28x28 mask per instance, which is insufficient to capture the fine-grained details of many objects. To address this issue, PointRend and RefineMask predict a 112x112 segmentation mask resulting in higher quality segmentations. Both methods however have limitations by either not having access to neighboring features (PointRend) or by performing computation at all spatial locations instead of sparsely (RefineMask). In this work, we propose EffSeg performing fine-grained instance segmentation in an efficient way by using our Structure-Preserving Sparsity (SPS) method based on separately storing the active features, the passive features and a dense 2D index map containing the feature indices. The goal of the index map is to preserve the 2D spatial configuration or structure between the features such that any 2D operation can still be performed. EffSeg achieves similar performance on COCO compared to RefineMask, while reducing the number of FLOPs by 71% and increasing the FPS by 29%. Code will be released.
    摘要 许多两阶段实例分割头对每个实例仅预测粗糙的28x28掩码,不足以刻画许多目标的细粒度细节。为解决这一问题,PointRend和RefineMask预测112x112的分割掩码,从而得到更高质量的分割结果。然而,这两种方法各有局限:PointRend无法利用邻域特征,而RefineMask则在所有空间位置上进行稠密计算而非稀疏计算。在本工作中,我们提出了高效的EffSeg,通过我们的结构保持稀疏(SPS)方法实现细粒度实例分割:分别存储活动特征、被动特征以及一张包含特征索引的稠密2D索引图。索引图的作用是保留特征之间的2D空间结构,使任何2D操作仍可照常执行。EffSeg在COCO上取得与RefineMask相当的性能,同时将FLOPs减少71%、FPS提高29%。代码将公开发布。

Unsupervised Video Anomaly Detection with Diffusion Models Conditioned on Compact Motion Representations

  • paper_url: http://arxiv.org/abs/2307.01533
  • repo_url: https://github.com/anilosmantur/conditioned_video_anomaly_diffusion
  • paper_authors: Anil Osman Tur, Nicola Dall’Asen, Cigdem Beyan, Elisa Ricci
  • for: 本研究旨在解决无监督视频异常检测(VAD)问题,即根据视频帧是否为正常或异常来分类,无需任何标签。
  • methods: 本方法使用条件扩散模型,输入数据是一个预训练网络提取的空间时间特征,条件是一个缩写视频段的运动和外观特征。
  • results: 我们的方法使用数据驱动的阈值,并将高重建错误视为异常事件。实验结果表明,我们的方法可以在两个大规模的VAD benchmark上提高异常检测性能,特别是在不同的数据集上表现更好,超过了当前state-of-the-art和基线方法。
    Abstract This paper aims to address the unsupervised video anomaly detection (VAD) problem, which involves classifying each frame in a video as normal or abnormal, without any access to labels. To accomplish this, the proposed method employs conditional diffusion models, where the input data is the spatiotemporal features extracted from a pre-trained network, and the condition is the features extracted from compact motion representations that summarize a given video segment in terms of its motion and appearance. Our method utilizes a data-driven threshold and considers a high reconstruction error as an indicator of anomalous events. This study is the first to utilize compact motion representations for VAD and the experiments conducted on two large-scale VAD benchmarks demonstrate that they supply relevant information to the diffusion model, and consequently improve VAD performances w.r.t the prior art. Importantly, our method exhibits better generalization performance across different datasets, notably outperforming both the state-of-the-art and baseline methods. The code of our method is available at https://github.com/AnilOsmanTur/conditioned_video_anomaly_diffusion
    摘要 本文旨在解决无监督视频异常检测(VAD)问题,即在不使用任何标签的情况下,将视频中的每一帧分类为正常或异常。为此,所提方法采用条件扩散模型,其输入为预训练网络提取的时空特征,条件则是从紧凑运动表示中提取的特征,该表示从运动和外观两方面概括给定的视频片段。我们的方法使用数据驱动的阈值,并将较高的重建误差视为异常事件的指标。本研究首次将紧凑运动表示用于VAD;在两个大规模VAD基准上的实验表明,它们为扩散模型提供了有用信息,从而使VAD性能优于已有方法。重要的是,我们的方法在不同数据集之间表现出更好的泛化能力,显著优于最先进方法和基线方法。代码见 https://github.com/AnilOsmanTur/conditioned_video_anomaly_diffusion 。
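
Anomaly scoring in this setup reduces to thresholding per-frame reconstruction errors with a data-driven threshold. A minimal sketch, assuming the diffusion model's reconstruction errors are already available as scalars and using a percentile of errors on normal training frames as the threshold (the percentile value is an assumption).

```python
import numpy as np

def fit_threshold(train_errors, percentile=95.0):
    """Data-driven threshold: a high percentile of reconstruction errors on normal data."""
    return float(np.percentile(train_errors, percentile))

def detect(test_errors, threshold):
    """Frames whose reconstruction error exceeds the threshold are flagged as anomalous."""
    return test_errors > threshold

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    normal_errors = rng.gamma(shape=2.0, scale=1.0, size=1000)   # errors on normal training frames
    thr = fit_threshold(normal_errors)
    test_errors = np.concatenate([rng.gamma(2.0, 1.0, 20), rng.gamma(8.0, 1.0, 5)])
    print(thr, detect(test_errors, thr).astype(int))
```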

Exploiting Richness of Learned Compressed Representation of Images for Semantic Segmentation

  • paper_url: http://arxiv.org/abs/2307.01524
  • repo_url: https://github.com/DL4Compression/Semantic_Segmentation_of_Driving_Videos_on_Learning_based_Image_Compression
  • paper_authors: Ravi Kakaiya, Rakshith Sathish, Ramanathan Sethuraman, Debdoot Sheet
  • for: 这个论文目的是提出一种基于学习的压缩编码器,以减少自这些汽车与高级驾驶辅助系统(ADAS)中的数据传输网络延迟。
  • methods: 这个方法使用学习基于的压缩编码器,以减少从汽车给云端服务器的数据传输网络延迟。实验 validate 了提案的管道,并证明了学习压缩表示可以用于进行像分类 segmentation 的任务,同时实现了 $66 \times$ 的压缩因子,并保留了适用于分类的信息,而且降低了总 Compute 的 $11%$。
  • results: 这个研究的结果显示,使用学习基于的压缩编码器可以实现高效的数据传输和分类 tasks,并且可以降低从汽车至云端服务器的网络延迟。
    Abstract Autonomous vehicles and Advanced Driving Assistance Systems (ADAS) have the potential to radically change the way we travel. Many such vehicles currently rely on segmentation and object detection algorithms to detect and track objects around its surrounding. The data collected from the vehicles are often sent to cloud servers to facilitate continual/life-long learning of these algorithms. Considering the bandwidth constraints, the data is compressed before sending it to servers, where it is typically decompressed for training and analysis. In this work, we propose the use of a learning-based compression Codec to reduce the overhead in latency incurred for the decompression operation in the standard pipeline. We demonstrate that the learned compressed representation can also be used to perform tasks like semantic segmentation in addition to decompression to obtain the images. We experimentally validate the proposed pipeline on the Cityscapes dataset, where we achieve a compression factor up to $66 \times$ while preserving the information required to perform segmentation with a dice coefficient of $0.84$ as compared to $0.88$ achieved using decompressed images while reducing the overall compute by $11\%$.
    摘要 自动驾驶车和高级驾驶助手系统(ADAS)有可能对我们的旅行方式产生重大变革。许多这些车辆目前使用分割和物体检测算法来检测和跟踪周围的对象。收集到的数据通常将被送往云服务器以便持续学习这些算法。由于带宽限制,这些数据通常会进行压缩,然后在服务器上解压缩以进行训练和分析。在这项工作中,我们提议使用学习基于的压缩编码器来减少标准管道中的延迟过程中的开销。我们示出了学习压缩表示可以同时完成压缩和减少计算的任务,例如 semantic segmentation。我们对Cityscapes dataset进行实验,并实现了最高的66倍压缩因子,保留了用于完成分割的信息,而计算总量减少了11%。

LPN: Language-guided Prototypical Network for few-shot classification

  • paper_url: http://arxiv.org/abs/2307.01515
  • repo_url: None
  • paper_authors: Kaihui Cheng, Chule Yang
  • for: 这篇论文主要针对几何shot分类问题进行研究,旨在对新任务进行适应,并充分利用可用的数据。
  • methods: 本文提出了一个名为Language-guided Prototypical Network(LPN)的方法,它利用视觉和语言模式的共同作用,通过两个平行分支来实现这一目的。
  • results: 实验结果显示,LPN方法与现有方法相比,在 benchmark 数据集上表现竞争力强。
    Abstract Few-shot classification aims to adapt to new tasks with limited labeled examples. To fully use the accessible data, recent methods explore suitable measures for the similarity between the query and support images and better high-dimensional features with meta-training and pre-training strategies. However, the potential of multi-modality information has barely been explored, which may bring promising improvement for few-shot classification. In this paper, we propose a Language-guided Prototypical Network (LPN) for few-shot classification, which leverages the complementarity of vision and language modalities via two parallel branches. Concretely, to introduce language modality with limited samples in the visual task, we leverage a pre-trained text encoder to extract class-level text features directly from class names while processing images with a conventional image encoder. Then, a language-guided decoder is introduced to obtain text features corresponding to each image by aligning class-level features with visual features. In addition, to take advantage of class-level features and prototypes, we build a refined prototypical head that generates robust prototypes in the text branch for follow-up measurement. Finally, we aggregate the visual and text logits to calibrate the deviation of a single modality. Extensive experiments demonstrate the competitiveness of LPN against state-of-the-art methods on benchmark datasets.
    摘要 小样本分类旨在仅凭少量带标注示例适应新任务。为充分利用可用数据,现有方法探索了查询图像与支持图像之间更合适的相似度度量,并借助元训练和预训练策略获得更好的高维特征。然而,多模态信息的潜力尚未被充分挖掘,而它有望为小样本分类带来可观的提升。本文提出一种语言引导的原型网络(LPN),通过两个并行分支利用视觉与语言模态的互补性。具体而言,为了在视觉任务中引入语言模态,我们利用预训练的文本编码器直接从类名中提取类级文本特征,同时用常规图像编码器处理图像。随后,我们引入语言引导的解码器,通过将类级特征与视觉特征对齐,为每幅图像获得对应的文本特征。此外,为充分利用类级特征与原型,我们构建了精细化的原型头,在文本分支中生成稳健的原型以供后续度量。最后,我们聚合视觉与文本的logits,以校正单一模态的偏差。大量实验表明,LPN在基准数据集上与最先进方法相比具有竞争力。
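
A simplified sketch of the prototype idea: per-class visual prototypes are averaged from support features and fused with class-name text features, and queries are assigned to the nearest fused prototype. The convex-combination fusion and distance-based classifier are stand-ins for the paper's language-guided decoder and refined prototypical head.

```python
import torch

def fuse_prototypes(support_feats, support_labels, text_feats, n_classes, alpha=0.5):
    """Visual prototypes (per-class mean of support features) fused with
    class-level text features by a simple convex combination."""
    dim = support_feats.shape[1]
    visual_protos = torch.zeros(n_classes, dim)
    for c in range(n_classes):
        visual_protos[c] = support_feats[support_labels == c].mean(dim=0)
    return alpha * visual_protos + (1.0 - alpha) * text_feats

def classify(query_feats, prototypes):
    """Assign each query to the nearest prototype (negative squared Euclidean distance)."""
    logits = -torch.cdist(query_feats, prototypes) ** 2
    return logits.argmax(dim=1)

if __name__ == "__main__":
    n_classes, dim = 5, 32
    support_feats = torch.randn(5 * n_classes, dim)                # 5-shot support set
    support_labels = torch.arange(n_classes).repeat_interleave(5)
    text_feats = torch.randn(n_classes, dim)                       # embeddings of the class names
    queries = torch.randn(10, dim)
    protos = fuse_prototypes(support_feats, support_labels, text_feats, n_classes)
    print(classify(queries, protos))
```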

SelfFed: Self-supervised Federated Learning for Data Heterogeneity and Label Scarcity in IoMT

  • paper_url: http://arxiv.org/abs/2307.01514
  • repo_url: None
  • paper_authors: Sunder Ali Khowaja, Kapal Dev, Syed Muhammad Anwar, Marius George Linguraru
  • for: 这个研究旨在提出一个基于自适应学习的联邦学习框架,以解决联邦学习中的数据不均匀问题和标签稀缺问题。
  • methods: 我们提出了一个两阶段的方法,首先是预训练阶段,使用Swin Transformer基本Encoder在分散式的方式下进行增强模型。第二阶段是精度调整阶段,引入了对照网络和一个新的联合策略,用于在分散式的方式下训练仅有几个标签的目标任务。
  • results: 我们在公开可用的医疗影像数据集上进行实验分析,结果显示我们的提出的SelfFed框架在非相同和相同数据集上比 existed baseline perform 更好,尤其是在标签稀缺的情况下。我们的方法在非IID数据集上取得最大改进率为8.8%和4.1%。此外,我们的方法甚至在仅使用10%标签的情况下也能够超越 existed baseline。
    Abstract Self-supervised learning in federated learning paradigm has been gaining a lot of interest both in industry and research due to the collaborative learning capability on unlabeled yet isolated data. However, self-supervised based federated learning strategies suffer from performance degradation due to label scarcity and diverse data distributions, i.e., data heterogeneity. In this paper, we propose the SelfFed framework for Internet of Medical Things (IoMT). Our proposed SelfFed framework works in two phases. The first phase is the pre-training paradigm that performs augmentive modeling using Swin Transformer based encoder in a decentralized manner. The first phase of SelfFed framework helps to overcome the data heterogeneity issue. The second phase is the fine-tuning paradigm that introduces contrastive network and a novel aggregation strategy that is trained on limited labeled data for a target task in a decentralized manner. This fine-tuning stage overcomes the label scarcity problem. We perform our experimental analysis on publicly available medical imaging datasets and show that our proposed SelfFed framework performs better when compared to existing baselines concerning non-independent and identically distributed (IID) data and label scarcity. Our method achieves a maximum improvement of 8.8% and 4.1% on Retina and COVID-FL datasets on non-IID dataset. Further, our proposed method outperforms existing baselines even when trained on a few (10%) labeled instances.
    摘要 联邦学习范式下的自监督学习能够在彼此隔离且无标签的数据上进行协同学习,因此在产业界和学术界都受到广泛关注。然而,基于自监督的联邦学习策略会因标签稀缺和数据分布差异(即数据异质性)而出现性能下降。本文提出了面向医疗物联网(IoMT)的SelfFed框架。SelfFed框架分两个阶段:第一阶段为预训练阶段,以去中心化方式使用基于Swin Transformer的编码器进行增广建模,以缓解数据异质性问题;第二阶段为微调阶段,引入对比网络和一种新的聚合策略,以去中心化方式在少量标注数据上针对目标任务进行训练,从而克服标签稀缺问题。我们在公开的医学影像数据集上进行了实验分析,结果表明在非独立同分布(non-IID)数据和标签稀缺条件下,SelfFed框架优于现有基线,在Retina和COVID-FL数据集上分别取得最高8.8%和4.1%的提升。此外,即便仅使用10%的标注样本进行训练,我们的方法仍然优于现有基线。
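
SelfFed trains encoders across clients in a decentralized manner. The sketch below shows plain FedAvg-style weight averaging over local client updates as a point of reference; it is not the paper's novel aggregation strategy, and the toy model and client data are placeholders.

```python
import copy
import torch
import torch.nn as nn

def local_update(model, data, target, lr=0.01, steps=5):
    """One client's local training, starting from the current global weights."""
    local = copy.deepcopy(model)
    opt = torch.optim.SGD(local.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(local(data), target).backward()
        opt.step()
    return local.state_dict()

def fedavg(states, weights):
    """Weighted average of client state dicts (weights typically proportional to data size)."""
    avg = copy.deepcopy(states[0])
    for key in avg:
        avg[key] = sum(w * s[key] for w, s in zip(weights, states))
    return avg

if __name__ == "__main__":
    torch.manual_seed(0)
    global_model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 3))
    clients = [(torch.randn(20, 16), torch.randint(0, 3, (20,))) for _ in range(4)]
    for round_ in range(3):                                   # a few communication rounds
        states = [local_update(global_model, x, y) for x, y in clients]
        global_model.load_state_dict(fedavg(states, weights=[0.25] * 4))
    print("finished", round_ + 1, "rounds")
```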

FB-OCC: 3D Occupancy Prediction based on Forward-Backward View Transformation

  • paper_url: http://arxiv.org/abs/2307.01492
  • repo_url: https://github.com/nvlabs/fb-bev
  • paper_authors: Zhiqi Li, Zhiding Yu, David Austin, Mingsheng Fang, Shiyi Lan, Jan Kautz, Jose M. Alvarez
  • for: 本研究旨在提出一个获胜解决方案来3D占用预测挑战,该挑战是CVPR 2023 Workshop on End-to-End Autonomous Driving和CVPR 23 Workshop on Vision-Centric Autonomous Driving Workshop的一部分。
  • methods: 本研究基于FB-BEV,一个前进后退投影的镜头基础预测设计。在FB-BEV的基础上,我们进一步研究了特有的设计和优化,包括共同深度-Semantic预训练、共同矩阵-BEV表示、模型缩放和有效的后处理策略。
  • results: 这些设计和优化导致在nuScenes数据集上的mIoU分数为54.19%,在挑战赛道上排名第一。代码和模型将在:https://github.com/NVlabs/FB-BEV 中发布。
    Abstract This technical report summarizes the winning solution for the 3D Occupancy Prediction Challenge, which is held in conjunction with the CVPR 2023 Workshop on End-to-End Autonomous Driving and CVPR 23 Workshop on Vision-Centric Autonomous Driving Workshop. Our proposed solution FB-OCC builds upon FB-BEV, a cutting-edge camera-based bird's-eye view perception design using forward-backward projection. On top of FB-BEV, we further study novel designs and optimization tailored to the 3D occupancy prediction task, including joint depth-semantic pre-training, joint voxel-BEV representation, model scaling up, and effective post-processing strategies. These designs and optimization result in a state-of-the-art mIoU score of 54.19% on the nuScenes dataset, ranking the 1st place in the challenge track. Code and models will be released at: https://github.com/NVlabs/FB-BEV.
    摘要 这份技术报告介绍了在CVPR 2023 工作坊上的3D占用预测挑战赛中获胜的解决方案,该挑战赛与CVPR 23 工作坊联合举行。我们的提议方案FB-OCC基于FB-BEV,这是一种前瞻型镜头视图识别设计,使用前进和后退投影。在FB-BEV的基础之上,我们进一步研究了适应3D占用预测任务的新设计和优化策略,包括共同深度semantic预训练、共同矩阵BEV表示、模型缩放和有效 posterior 处理策略。这些设计和优化使得我们在nuScenes数据集上 achieve 54.19%的mIoU分数,在挑战赛中名列第一名。代码和模型将在:https://github.com/NVlabs/FB-BEV 中发布。

Semantic Segmentation on 3D Point Clouds with High Density Variations

  • paper_url: http://arxiv.org/abs/2307.01489
  • repo_url: None
  • paper_authors: Ryan Faulkner, Luke Haub, Simon Ratcliffe, Ian Reid, Tat-Jun Chin
  • for: 这篇论文是为了解决 LiDAR 探测应用中的大规模 3D 点云对应问题,这些点云具有广泛的区域和距离,并且具有巨大的地方密度变化。
  • methods: 这篇论文提出了一个名为 HDVNet 的新架构,这个架构包含一个嵌入的集合 Encoder-Decoder 路径,每个路径处理特定的点密度范围。限制Feature Map 之间的连接,使得 HDVNet 能够根据点密度来衡量每个特征的可靠性,例如,对于低密度物体而言,杜重高密度的特征。
  • results: 在实际的点云数据中,HDVNet 比 state-of-the-art 模型具有更高的准确性,只需使用一半的 Parameters。
    Abstract LiDAR scanning for surveying applications acquire measurements over wide areas and long distances, which produces large-scale 3D point clouds with significant local density variations. While existing 3D semantic segmentation models conduct downsampling and upsampling to build robustness against varying point densities, they are less effective under the large local density variations characteristic of point clouds from surveying applications. To alleviate this weakness, we propose a novel architecture called HDVNet that contains a nested set of encoder-decoder pathways, each handling a specific point density range. Limiting the interconnections between the feature maps enables HDVNet to gauge the reliability of each feature based on the density of a point, e.g., downweighting high density features not existing in low density objects. By effectively handling input density variations, HDVNet outperforms state-of-the-art models in segmentation accuracy on real point clouds with inconsistent density, using just over half the weights.
    摘要 用于测绘应用的LiDAR扫描在大范围、长距离上采集测量值,生成的大规模3D点云存在显著的局部密度变化。现有的3D语义分割模型通过下采样和上采样来增强对点云密度变化的鲁棒性,但在测绘点云所特有的大幅局部密度变化下效果有限。为弥补这一不足,我们提出了一种名为HDVNet的新架构,它包含一组嵌套的编码器-解码器路径,每条路径处理特定的点密度区间。通过限制特征图之间的连接,HDVNet能够依据点的密度评估各特征的可靠性,例如降低在低密度物体中并不存在的高密度特征的权重。凭借对输入密度变化的有效处理,HDVNet在密度不一致的真实点云上取得了高于最先进模型的分割精度,而所用参数量仅为其一半左右。

H-DenseFormer: An Efficient Hybrid Densely Connected Transformer for Multimodal Tumor Segmentation

  • paper_url: http://arxiv.org/abs/2307.01486
  • repo_url: https://github.com/shijun18/h-denseformer
  • paper_authors: Jun Shi, Hongyu Kan, Shulan Ruan, Ziqi Zhu, Minfan Zhao, Liang Qiao, Zhaohui Wang, Hong An, Xudong Xue
  • for: 这篇论文旨在提出一个混合式密接连接网络(H-DenseFormer)来进行肿瘤分类,以提高多modalities的医疗影像识别能力。
  • methods: 这篇论文使用了一个具有多路平行嵌入(MPE)模组,可以将多modalities的输入资料组合成多modalities的融合特征。然后,这些融合特征会被传递到不同层次的encoder中进行增强多modalities的学习表现。此外,还设计了一个轻量级的密接连接几何(DCT)封顶,以取代标准几何封顶,从而实现显著的计算量削减。
  • results: 在HECKTOR21和PI-CAI22两个公共多modalities datasets上进行了广泛的实验,结果显示,我们的提案方法在与现有的州前方法进行比较中,具有更高的识别率和更低的计算量。代码可以在https://github.com/shijun18/H-DenseFormer上取得。
    Abstract Recently, deep learning methods have been widely used for tumor segmentation of multimodal medical images with promising results. However, most existing methods are limited by insufficient representational ability, specific modality number and high computational complexity. In this paper, we propose a hybrid densely connected network for tumor segmentation, named H-DenseFormer, which combines the representational power of the Convolutional Neural Network (CNN) and the Transformer structures. Specifically, H-DenseFormer integrates a Transformer-based Multi-path Parallel Embedding (MPE) module that can take an arbitrary number of modalities as input to extract the fusion features from different modalities. Then, the multimodal fusion features are delivered to different levels of the encoder to enhance multimodal learning representation. Besides, we design a lightweight Densely Connected Transformer (DCT) block to replace the standard Transformer block, thus significantly reducing computational complexity. We conduct extensive experiments on two public multimodal datasets, HECKTOR21 and PI-CAI22. The experimental results show that our proposed method outperforms the existing state-of-the-art methods while having lower computational complexity. The source code is available at https://github.com/shijun18/H-DenseFormer.
    摘要 近些年来,深度学习方法在多Modal医学影像肿瘤分割方面取得了可观的成果。然而,现有的方法受到表达能力不充分、特定的Modal数量和高计算复杂性的限制。在这篇论文中,我们提出了一种混合 densely connected network для肿瘤分割,称为 H-DenseFormer,它将 CNN 和 Transformer 结构相结合。具体来说,H-DenseFormer 使用 Transformer 结构基于多Path Parallel Embedding(MPE)模块,可以将任意数量的Modalities作为输入,以提取不同Modalities中的融合特征。然后,多Modal融合特征被传递到不同层次的编码器,以增强多Modal学习表达。此外,我们设计了一种轻量级的 Densely Connected Transformer(DCT)块,以取代标准 Transformer 块,从而实现显著降低计算复杂性。我们对公共的两个多Modal 数据集进行了广泛的实验,结果表明,我们的提议方法在计算复杂性下与现有的状态态势表达方法相比,表现出色,并且可以在多Modal 数据集上达到更高的识别率。代码可以在 https://github.com/shijun18/H-DenseFormer 上找到。

A Review of Driver Gaze Estimation and Application in Gaze Behavior Understanding

  • paper_url: http://arxiv.org/abs/2307.01470
  • repo_url: None
  • paper_authors: Pavan Kumar Sharma, Pranamesh Chakraborty
  • for: 本研究的主要目标是对 Driver Gaze 基础知识、测算方法和应用场景进行全面的概述,以便更好地理解和应用 Driver Gaze 技术。
  • methods: 本研究主要使用 Head-mounted 和远程设置基于眼动估计的方法,并详细介绍了现有的 Driver Gaze 数据集和估计算法,包括传统机器学习和深度学习等技术。
  • results: 本研究使用 Driver Gaze 估计结果来理解 drivers 在拐弯、加速和减速时的眼动行为,以及道路投放广告结构的影响。同时,本研究也提出了现有文献的限制、挑战和未来发展方向。
    Abstract Driver gaze plays an important role in different gaze-based applications such as driver attentiveness detection, visual distraction detection, gaze behavior understanding, and building driver assistance system. The main objective of this study is to perform a comprehensive summary of driver gaze fundamentals, methods to estimate driver gaze, and it's applications in real world driving scenarios. We first discuss the fundamentals related to driver gaze, involving head-mounted and remote setup based gaze estimation and the terminologies used for each of these data collection methods. Next, we list out the existing benchmark driver gaze datasets, highlighting the collection methodology and the equipment used for such data collection. This is followed by a discussion of the algorithms used for driver gaze estimation, which primarily involves traditional machine learning and deep learning based techniques. The estimated driver gaze is then used for understanding gaze behavior while maneuvering through intersections, on-ramps, off-ramps, lane changing, and determining the effect of roadside advertising structures. Finally, we have discussed the limitations in the existing literature, challenges, and the future scope in driver gaze estimation and gaze-based applications.
    摘要 Driver's gaze plays an important role in various gaze-based applications such as driver attentiveness detection, visual distraction detection, and understanding gaze behavior. The main objective of this study is to provide a comprehensive summary of driver gaze fundamentals, methods for estimating driver gaze, and its applications in real-world driving scenarios.First, we discuss the fundamentals of driver gaze, including head-mounted and remote setup-based gaze estimation, and the terminologies used for each of these data collection methods. Next, we list out the existing benchmark driver gaze datasets, highlighting the collection methodology and equipment used for such data collection.Then, we discuss the algorithms used for driver gaze estimation, which primarily involve traditional machine learning and deep learning-based techniques. The estimated driver gaze is used for understanding gaze behavior while maneuvering through intersections, on-ramps, off-ramps, lane changing, and determining the effect of roadside advertising structures.Finally, we discuss the limitations in the existing literature, challenges, and the future scope in driver gaze estimation and gaze-based applications.

Generating Animatable 3D Cartoon Faces from Single Portraits

  • paper_url: http://arxiv.org/abs/2307.01468
  • repo_url: None
  • paper_authors: Chuanyu Pan, Guowei Yang, Taijiang Mu, Yu-Kun Lai
  • for: 本研究旨在提供一种生成可animatable 3D漫画人脸的新方法,以满足现代虚拟现实技术的需求。
  • methods: 我们提出了一种两阶段重建方法:首先使用StyleGAN将输入的真实人像照片转换为风格化卡通图像,再在关键点监督下先基于模板模型做粗略估计、后通过非刚性形变细化模型,从而准确还原带精细纹理的3D卡通人脸;最后,我们提出了一种基于手工模板和形变迁移的语义保持人脸绑定方法。
  • results: 与已有研究相比,我们的方法在准确性、美观性和相似性指标上均更优;此外,我们的3D模型还支持实时人脸动画。
    Abstract With the booming of virtual reality (VR) technology, there is a growing need for customized 3D avatars. However, traditional methods for 3D avatar modeling are either time-consuming or fail to retain similarity to the person being modeled. We present a novel framework to generate animatable 3D cartoon faces from a single portrait image. We first transfer an input real-world portrait to a stylized cartoon image with a StyleGAN. Then we propose a two-stage reconstruction method to recover the 3D cartoon face with detailed texture, which first makes a coarse estimation based on template models, and then refines the model by non-rigid deformation under landmark supervision. Finally, we propose a semantic preserving face rigging method based on manually created templates and deformation transfer. Compared with prior arts, qualitative and quantitative results show that our method achieves better accuracy, aesthetics, and similarity criteria. Furthermore, we demonstrate the capability of real-time facial animation of our 3D model.
    摘要 随着虚拟现实(VR)技术的兴起,对个性化3D虚拟形象的需求不断增长。然而,传统的3D形象建模方法要么耗时,要么难以保持与被建模者的相似性。我们提出了一种从单张人像照片生成可驱动3D卡通人脸的新框架:首先使用StyleGAN将输入的真实人像转换为风格化卡通图像;随后采用两阶段重建方法恢复带精细纹理的3D卡通人脸,先基于模板模型进行粗略估计,再在关键点监督下通过非刚性形变进行细化;最后,我们提出了一种基于手工模板与形变迁移的语义保持人脸绑定方法。与已有方法相比,定性与定量结果表明我们的方法在准确性、美观性和相似性指标上均更优。此外,我们还演示了该3D模型的实时人脸动画能力。

Technical Report for Ego4D Long Term Action Anticipation Challenge 2023

  • paper_url: http://arxiv.org/abs/2307.01467
  • repo_url: None
  • paper_authors: Tatsuya Ishibashi, Kosuke Ono, Noriyuki Kugo, Yuji Sato
  • for: 预测未来动作序列
  • methods: 基于SlowFast和SlowFast-CLIP模型的ensemble,加入 Label smoothing 和动作类别(动词、名称)的约束
  • results: 超越基线性能,在公共领先板上记录第二名成绩
    Abstract In this report, we describe the technical details of our approach for the Ego4D Long-Term Action Anticipation Challenge 2023. The aim of this task is to predict a sequence of future actions that will take place at an arbitrary time or later, given an input video. To accomplish this task, we introduce three improvements to the baseline model, which consists of an encoder that generates clip-level features from the video, an aggregator that integrates multiple clip-level features, and a decoder that outputs Z future actions. 1) Model ensemble of SlowFast and SlowFast-CLIP; 2) Label smoothing to relax order constraints for future actions; 3) Constraining the prediction of the action class (verb, noun) based on word co-occurrence. Our method outperformed the baseline performance and recorded as second place solution on the public leaderboard.
    摘要 在这份报告中,我们介绍了我们对Ego4D长期动作预测挑战2023年的技术细节。该任务的目标是基于输入视频预测未来动作的序列,我们在这里引入了三种改进基eline模型,即encoder生成视频clip级别特征,aggregator将多个clip级别特征集成,以及decoder输出Z个未来动作。1)SlowFast和SlowFast-CLIP模型集成;2)将未来动作顺序约束松弛为label smoothing;3)基于单词共occurrence constrain动作类别(动词、名词)预测。我们的方法比基eline表现更好,在公共排名板上录制第二名。

AdAM: Few-Shot Image Generation via Adaptation-Aware Kernel Modulation

  • paper_url: http://arxiv.org/abs/2307.01465
  • repo_url: None
  • paper_authors: Yunqing Zhao, Keshigeyan Chandrasegaran, Abdollahzadeh Milad, Chao Du, Tianyu Pang, Ruoteng Li, Henghui Ding, Ngai-Man Cheung
  • for: 这个论文目的是解决几个示例(例如10)训练样本的图像生成问题。
  • methods: 这个论文使用了一个已经在大规模源频道上预训练的GAN,并将其适应到目标频道。Central to recent FSIG methods are knowledge preservation criteria, which select and preserve a subset of source knowledge to the adapted model.
  • results: 我们的研究表明,许多现有的State-of-the-art(SOTA)方法,只考虑源频道,在不同的频道距离下进行适应,表现不佳。我们的第二个贡献是提出了适应aware的kernel Modulation(AdAM),用于通用的几个源-目标频道距离的图像生成。广泛的实验表明,AdAM可以在挑战性的设置下 consistently achieve SOTA performance。
    Abstract Few-shot image generation (FSIG) aims to learn to generate new and diverse images given few (e.g., 10) training samples. Recent work has addressed FSIG by leveraging a GAN pre-trained on a large-scale source domain and adapting it to the target domain with few target samples. Central to recent FSIG methods are knowledge preservation criteria, which select and preserve a subset of source knowledge to the adapted model. However, a major limitation of existing methods is that their knowledge preserving criteria consider only source domain/task and fail to consider target domain/adaptation in selecting source knowledge, casting doubt on their suitability for setups of different proximity between source and target domain. Our work makes two contributions. Firstly, we revisit recent FSIG works and their experiments. We reveal that under setups which assumption of close proximity between source and target domains is relaxed, many existing state-of-the-art (SOTA) methods which consider only source domain in knowledge preserving perform no better than a baseline method. As our second contribution, we propose Adaptation-Aware kernel Modulation (AdAM) for general FSIG of different source-target domain proximity. Extensive experiments show that AdAM consistently achieves SOTA performance in FSIG, including challenging setups where source and target domains are more apart.
    摘要 几个示例图像生成(FSIG)目标是通过几个(例如10)训练样本学习生成新和多样化的图像。 latest work has addressed FSIG by leveraging a pre-trained GAN on a large-scale source domain and adapting it to the target domain with few target samples. central to recent FSIG methods are knowledge preservation criteria, which select and preserve a subset of source knowledge to the adapted model. However, a major limitation of existing methods is that their knowledge preserving criteria consider only source domain/task and fail to consider target domain/adaptation in selecting source knowledge, casting doubt on their suitability for setups of different proximity between source and target domain. Our work makes two contributions. Firstly, we revisit recent FSIG works and their experiments. We reveal that under setups which assumption of close proximity between source and target domains is relaxed, many existing state-of-the-art (SOTA) methods which consider only source domain in knowledge preserving perform no better than a baseline method. As our second contribution, we propose Adaptation-Aware kernel Modulation (AdAM) for general FSIG of different source-target domain proximity. Extensive experiments show that AdAM consistently achieves SOTA performance in FSIG, including challenging setups where source and target domains are more apart.

Unsupervised Quality Prediction for Improved Single-Frame and Weighted Sequential Visual Place Recognition

  • paper_url: http://arxiv.org/abs/2307.01464
  • repo_url: None
  • paper_authors: Helen Carson, Jason J. Ford, Michael Milford
  • for: 本研究旨在提高自动驾驶系统的定位和视觉定位技术的可靠性和预测性。
  • methods: 本研究使用一种新的训练自由的方法来预测定位估计的质量,并使用这些预测来偏补一种序列匹配过程,以实现更高的精度性能。
  • results: 在四个数据集和三种视觉定位技术上,我们的结合系统可以提高定位精度性能,特别是在高精度低匹配点操作点上。我们还提供了减少和分析,以分析预测系统和偏补序列匹配器的性能贡献。
    Abstract While substantial progress has been made in the absolute performance of localization and Visual Place Recognition (VPR) techniques, it is becoming increasingly clear from translating these systems into applications that other capabilities like integrity and predictability are just as important, especially for safety- or operationally-critical autonomous systems. In this research we present a new, training-free approach to predicting the likely quality of localization estimates, and a novel method for using these predictions to bias a sequence-matching process to produce additional performance gains beyond that of a naive sequence matching approach. Our combined system is lightweight, runs in real-time and is agnostic to the underlying VPR technique. On extensive experiments across four datasets and three VPR techniques, we demonstrate our system improves precision performance, especially at the high-precision/low-recall operating point. We also present ablation and analysis identifying the performance contributions of the prediction and weighted sequence matching components in isolation, and the relationship between the quality of the prediction system and the benefits of the weighted sequential matcher.
    摘要 尽管定位与视觉地点识别(VPR)技术的绝对性能已取得长足进步,但在将这些系统落地应用的过程中,诸如完整性和可预测性等能力同样重要,对于安全或运行关键的自主系统尤其如此。在本研究中,我们提出了一种无需训练的新方法来预测定位估计的可能质量,并提出了一种利用这些预测对序列匹配过程加权的新方法,以在朴素序列匹配之上获得额外的性能提升。我们的组合系统轻量、可实时运行,且与底层VPR技术无关。在四个数据集和三种VPR技术上的大量实验表明,我们的系统能够提升精确率性能,在高精确率/低召回率工作点上尤为明显。我们还给出了消融与分析,分别说明预测模块和加权序列匹配模块的性能贡献,以及预测系统质量与加权序列匹配收益之间的关系。
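
The performance gain comes from weighting a sequence-matching cost by the predicted per-frame localization quality. The sketch below scores candidate database offsets with quality-weighted single-frame distances; the fixed one-to-one alignment and the quality values are simplifying assumptions, not the paper's predictor or matcher.

```python
import numpy as np

def weighted_sequence_score(dist_matrix, quality, offset):
    """Quality-weighted cost of aligning a query sequence against the database
    starting at `offset`, assuming a one-to-one frame alignment.

    dist_matrix: (n_query, n_db) single-frame descriptor distances
    quality:     (n_query,) predicted reliability of each query frame in [0, 1]
    """
    n_query = dist_matrix.shape[0]
    per_frame = dist_matrix[np.arange(n_query), offset + np.arange(n_query)]
    w = quality / (quality.sum() + 1e-9)
    return float((w * per_frame).sum())

def localize(dist_matrix, quality):
    """Pick the database offset with the lowest quality-weighted sequence cost."""
    n_query, n_db = dist_matrix.shape
    scores = [weighted_sequence_score(dist_matrix, quality, o)
              for o in range(n_db - n_query + 1)]
    return int(np.argmin(scores))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dist = rng.uniform(0.5, 1.0, size=(5, 50))
    dist[np.arange(5), 20 + np.arange(5)] = 0.1       # true place starts at database index 20
    quality = np.array([0.9, 0.2, 0.8, 0.9, 0.7])     # e.g. low predicted confidence for frame 1
    print(localize(dist, quality))                    # expected: 20
```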

Practical Collaborative Perception: A Framework for Asynchronous and Multi-Agent 3D Object Detection

  • paper_url: http://arxiv.org/abs/2307.01462
  • repo_url: None
  • paper_authors: Minh-Quan Dao, Julie Stephany Berrio, Vincent Frémont, Mao Shan, Elwan Héry, Stewart Worrall
  • for: 提高 LiDAR-based 物体探测方法中的遮挡问题,特别是在城市交通中, где egocar 需要可靠的物体探测,以避免碰撞,而其视场受到了大量道路用户的阻挡。
  • methods: collaborative perception via Vehicle-to-Everything (V2X) communication,利用多个连接的代理机构形成完整的场景表示,并通过中间协作来解决性能和带宽之间的负担。
  • results: 我们提出了一种简单又有效的协作方法,可以在实际应用中超越先前的状态艺术方法,同时尽可能减少对单车辆检测模型的修改和假设不实的多个代理机构同步。实验结果表明,我们的协作方法可以达到98%的性能水平,只 consume相同的带宽,与先前的中间协作方法相比。
    Abstract Occlusion is a major challenge for LiDAR-based object detection methods. This challenge becomes safety-critical in urban traffic where the ego vehicle must have reliable object detection to avoid collision while its field of view is severely reduced due to the obstruction posed by a large number of road users. Collaborative perception via Vehicle-to-Everything (V2X) communication, which leverages the diverse perspective thanks to the presence at multiple locations of connected agents to form a complete scene representation, is an appealing solution. State-of-the-art V2X methods resolve the performance-bandwidth tradeoff using a mid-collaboration approach where the Bird-Eye View images of point clouds are exchanged so that the bandwidth consumption is lower than communicating point clouds as in early collaboration, and the detection performance is higher than late collaboration, which fuses agents' output, thanks to a deeper interaction among connected agents. While achieving strong performance, the real-world deployment of most mid-collaboration approaches is hindered by their overly complicated architectures, involving learnable collaboration graphs and autoencoder-based compressor/ decompressor, and unrealistic assumptions about inter-agent synchronization. In this work, we devise a simple yet effective collaboration method that achieves a better bandwidth-performance tradeoff than prior state-of-the-art methods while minimizing changes made to the single-vehicle detection models and relaxing unrealistic assumptions on inter-agent synchronization. Experiments on the V2X-Sim dataset show that our collaboration method achieves 98\% of the performance of an early-collaboration method, while only consuming the equivalent bandwidth of a late-collaboration method.
    摘要 干扰是LiDAR基于对象检测方法的主要挑战。在城市交通中, egovehicle 需要可靠的对象检测,以避免事故,而其视场受到了大量道路用户的干扰。 Collaborative perception via Vehicle-to-Everything (V2X) communication 是一种吸引人的解决方案,它利用了多个位置的连接代理人形成完整的场景表示,并且可以解决性能和带宽的贸易。现状的V2X方法在性能和带宽之间进行了中间协作,通过在 Bird-Eye View 图像上进行点云的交换,以避免在早期协作中的带宽浪费,并在晚期协作中进行拼接,以获得更高的检测性能。在实现优秀性能的同时,大多数中间协作方法的实际部署受到了其复杂的架构和学习可学的协作图、 autoencoder 基于的压缩器/解压缩器的假设,以及 между代理人的不realistic 同步假设。在这种工作中,我们设计了一种简单 yet 有效的协作方法,可以在带宽-性能贸易中提供更好的贸易比例,而且尽可能地避免改变单车检测模型,并放弃不realistic 的 между代理人同步假设。 V2X-Sim 数据集的实验表明,我们的协作方法可以达到98%的性能,只消耗与晚期协作相同的带宽。

Learning Feature Matching via Matchable Keypoint-Assisted Graph Neural Network

  • paper_url: http://arxiv.org/abs/2307.01447
  • repo_url: None
  • paper_authors: Zizhuo Li, Jiayi Ma
  • for: 提高计算机视觉中的特征匹配精度和效率,特别是在相机转换、基本矩阵估计和视觉地标 tasks 上。
  • methods: 我们提出了一种快速准确的 MaKeGNN 模型,它使用了稀疏注意力机制来缺省非重复的关键点,并通过匹配关键点来导引有意义的信息传递。
  • results: 我们在相机转换、基本矩阵估计和视觉地标 tasks 上达到了领先的性能,同时减少了计算和内存复杂度。
    Abstract Accurately matching local features between a pair of images is a challenging computer vision task. Previous studies typically use attention based graph neural networks (GNNs) with fully-connected graphs over keypoints within/across images for visual and geometric information reasoning. However, in the context of feature matching, considerable keypoints are non-repeatable due to occlusion and failure of the detector, and thus irrelevant for message passing. The connectivity with non-repeatable keypoints not only introduces redundancy, resulting in limited efficiency, but also interferes with the representation aggregation process, leading to limited accuracy. Targeting towards high accuracy and efficiency, we propose MaKeGNN, a sparse attention-based GNN architecture which bypasses non-repeatable keypoints and leverages matchable ones to guide compact and meaningful message passing. More specifically, our Bilateral Context-Aware Sampling Module first dynamically samples two small sets of well-distributed keypoints with high matchability scores from the image pair. Then, our Matchable Keypoint-Assisted Context Aggregation Module regards sampled informative keypoints as message bottlenecks and thus constrains each keypoint only to retrieve favorable contextual information from intra- and inter- matchable keypoints, evading the interference of irrelevant and redundant connectivity with non-repeatable ones. Furthermore, considering the potential noise in initial keypoints and sampled matchable ones, the MKACA module adopts a matchability-guided attentional aggregation operation for purer data-dependent context propagation. By these means, we achieve the state-of-the-art performance on relative camera estimation, fundamental matrix estimation, and visual localization, while significantly reducing computational and memory complexity compared to typical attentional GNNs.
    摘要 通过快速匹配本地特征点,计算机视觉任务中的一个挑战是准确地匹配两个图像中的特征点。先前的研究通常使用注意力基于图像内部/ между图像的全连接图 neural networks (GNNs) 进行视觉和几何信息的推理。然而,在特征匹配任务中,许多特征点是不可重复的,这些特征点由遮挡和检测器失败所致,因此对信息传递无关。与非重复的特征点连接不仅引入纠续,导致效率有限,而且干扰归纳表达过程,从而限制准确性。为了实现高精度和效率,我们提出了MaKeGNN,一种稀缺注意力基于GNN架构。MaKeGNN通过快速匹配特征点来导引有效的信息传递。更加详细地说,我们的双边上下文感知抽取模块首先在图像对中动态选择了一小集well-distributed的高匹配分数的特征点。然后,我们的匹配点协助上下文聚合模块将这些有用的特征点作为信道瓶颈,限制每个特征点只能从匹配的内部和外部匹配点中 retrieve 有利的上下文信息,避免与非匹配的非重复连接干扰信息传递。此外,为了避免初始特征点和抽取的匹配点中的噪音,MKACA模块采用了匹配性指导的注意力聚合操作,以保证数据依赖关系的纯净传递。通过这些手段,我们实现了相对摄像头估算、基本矩阵估算和视觉地理位置估算的状态 искусственный情况,同时显著降低了通常的注意力GNNs的计算和内存复杂性。

Continual Learning in Open-vocabulary Classification with Complementary Memory Systems

  • paper_url: http://arxiv.org/abs/2307.01430
  • repo_url: None
  • paper_authors: Zhen Zhu, Weijie Lyu, Yao Xiao, Derek Hoiem
  • for: 这paper是为了解决开放词汇图像分类 tasks中的灵活 continual learning问题, Drawing inspiration from human cognition的 complementary learning systems.
  • methods: 我们提出了一种”tree probe”方法, Which enables fast learning from new examples with competitive accuracy to batch-trained linear models. We also propose a method to combine predictions from a CLIP zero-shot model and the exemplar-based model, using the zero-shot estimated probability that a sample’s class is within any of the exemplar classes.
  • results: Our proposed method achieves a good balance of learning speed, target task effectiveness, and zero-shot effectiveness in data incremental, class incremental, and task incremental settings, as well as ability to perform flexible inference on varying subsets of zero-shot and learned categories.
    Abstract We introduce a method for flexible continual learning in open-vocabulary image classification, drawing inspiration from the complementary learning systems observed in human cognition. We propose a "tree probe" method, an adaption of lazy learning principles, which enables fast learning from new examples with competitive accuracy to batch-trained linear models. Further, we propose a method to combine predictions from a CLIP zero-shot model and the exemplar-based model, using the zero-shot estimated probability that a sample's class is within any of the exemplar classes. We test in data incremental, class incremental, and task incremental settings, as well as ability to perform flexible inference on varying subsets of zero-shot and learned categories. Our proposed method achieves a good balance of learning speed, target task effectiveness, and zero-shot effectiveness.
    摘要 我们提出了一种面向开放词汇图像分类的灵活持续学习方法,灵感来自人类认知中的互补学习系统。我们提出了"树探针(tree probe)"方法,它改编了惰性学习原则,能够快速从新样本中学习,且准确率可与批量训练的线性模型相媲美。此外,我们提出了将CLIP零样本模型与基于范例的模型的预测进行组合的方法,组合权重由零样本模型估计的"样本类别属于任一范例类别"的概率给出。我们在数据增量、类增量和任务增量设置下进行了测试,并评估了在零样本类别与已学习类别的不同子集上进行灵活推理的能力。我们的方法在学习速度、目标任务效果和零样本效果之间取得了良好的平衡。
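
The abstract describes gating the exemplar-based prediction by the zero-shot probability that the sample's class is among the exemplar classes. One way to realize such a mixing rule is sketched below; the exact combination used in the paper may differ.

```python
import numpy as np

def combine(zero_shot_probs, exemplar_probs, exemplar_class_ids):
    """Mix zero-shot and exemplar-based predictions over the full label set.

    zero_shot_probs:    (n_all,) zero-shot distribution over every candidate class
    exemplar_probs:     (n_learned,) distribution from the exemplar-based model
    exemplar_class_ids: indices of the learned (exemplar) classes within the full label set
    """
    p_learned = zero_shot_probs[exemplar_class_ids].sum()   # zero-shot prob. the class is a learned one
    combined = zero_shot_probs.copy()
    combined[exemplar_class_ids] = p_learned * exemplar_probs
    # unlearned classes keep their zero-shot mass (1 - p_learned), so the result still sums to 1
    return combined

if __name__ == "__main__":
    zero_shot = np.array([0.05, 0.40, 0.10, 0.30, 0.15])   # over 5 candidate classes
    exemplar = np.array([0.7, 0.3])                        # over the 2 learned classes
    learned_ids = np.array([1, 3])
    mixed = combine(zero_shot, exemplar, learned_ids)
    print(mixed, mixed.sum())
```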

DeepfakeBench: A Comprehensive Benchmark of Deepfake Detection

  • paper_url: http://arxiv.org/abs/2307.01426
  • repo_url: https://github.com/sclbd/deepfakebench
  • paper_authors: Zhiyuan Yan, Yong Zhang, Xinhang Yuan, Siwei Lyu, Baoyuan Wu
  • for: 这篇论文是为了提供一个标准化、一致的深伪检测 benchmark,以解决现有的检测模型之间的不公正比较和可能的误导结果。
  • methods: 这篇论文使用了一个统一的数据管理系统,以确保所有检测模型的输入数据都是一致的。此外,论文还提供了一个整合了现有方法的框架,以及一个标准化的评估度量和评估协议,以促进透明度和重复性。
  • results: 这篇论文提供了一个全面的检测模型评估,包括15种现有的检测方法、9个深伪数据集、多种深伪检测评估协议和分析工具,以及广泛的评估。此外,论文还提供了新的分析结果,包括不同的观点(例如,数据增强、后向)。
    Abstract A critical yet frequently overlooked challenge in the field of deepfake detection is the lack of a standardized, unified, comprehensive benchmark. This issue leads to unfair performance comparisons and potentially misleading results. Specifically, there is a lack of uniformity in data processing pipelines, resulting in inconsistent data inputs for detection models. Additionally, there are noticeable differences in experimental settings, and evaluation strategies and metrics lack standardization. To fill this gap, we present the first comprehensive benchmark for deepfake detection, called DeepfakeBench, which offers three key contributions: 1) a unified data management system to ensure consistent input across all detectors, 2) an integrated framework for state-of-the-art methods implementation, and 3) standardized evaluation metrics and protocols to promote transparency and reproducibility. Featuring an extensible, modular-based codebase, DeepfakeBench contains 15 state-of-the-art detection methods, 9 deepfake datasets, a series of deepfake detection evaluation protocols and analysis tools, as well as comprehensive evaluations. Moreover, we provide new insights based on extensive analysis of these evaluations from various perspectives (e.g., data augmentations, backbones). We hope that our efforts could facilitate future research and foster innovation in this increasingly critical domain. All codes, evaluations, and analyses of our benchmark are publicly available at https://github.com/SCLBD/DeepfakeBench.
    摘要 深度伪造检测领域一个关键却常被忽视的挑战是缺乏标准化、统一且全面的基准。这一问题导致了不公平的性能比较,并可能产生误导性的结果。具体而言,数据处理流程缺乏统一性,导致各检测模型的输入数据不一致;此外,实验设置存在明显差异,评估策略和指标也缺乏标准化。为填补这一空白,我们提出了首个面向深度伪造检测的全面基准DeepfakeBench,其三项主要贡献为:1)统一的数据管理系统,确保所有检测器的输入一致;2)集成了最先进方法实现的统一框架;3)标准化的评估指标与协议,以促进透明性和可复现性。DeepfakeBench基于可扩展、模块化的代码库,包含15种最先进检测方法、9个深度伪造数据集、一系列检测评估协议与分析工具,以及全面的评估结果。此外,我们基于对这些评估从多个角度(如数据增强、骨干网络)的深入分析给出了新的见解。我们希望这些工作能够推动这一日益重要领域的后续研究与创新。我们基准的全部代码、评估和分析均公开于 https://github.com/SCLBD/DeepfakeBench 。

Consistent Multimodal Generation via A Unified GAN Framework

  • paper_url: http://arxiv.org/abs/2307.01425
  • repo_url: None
  • paper_authors: Zhen Zhu, Yijun Li, Weijie Lyu, Krishna Kumar Singh, Zhixin Shu, Soeren Pirk, Derek Hoiem
  • for: To generate multimodal image outputs -- RGB, depth, and surface normals -- with a single generative model.
  • methods: Builds on the StyleGAN3 architecture with a shared backbone and modality-specific branches in the last layers of the synthesis network, together with per-modality fidelity discriminators and a cross-modality consistency discriminator.
  • results: Realistic and consistent RGB, depth, and normal generation on the Stanford2D3D dataset; a training recipe for extending the pretrained model to a new domain with only a few paired samples; an evaluation of synthetically generated RGB-depth pairs for training or fine-tuning depth estimators.
    Abstract We investigate how to generate multimodal image outputs, such as RGB, depth, and surface normals, with a single generative model. The challenge is to produce outputs that are realistic, and also consistent with each other. Our solution builds on the StyleGAN3 architecture, with a shared backbone and modality-specific branches in the last layers of the synthesis network, and we propose per-modality fidelity discriminators and a cross-modality consistency discriminator. In experiments on the Stanford2D3D dataset, we demonstrate realistic and consistent generation of RGB, depth, and normal images. We also show a training recipe to easily extend our pretrained model on a new domain, even with a few pairwise data. We further evaluate the use of synthetically generated RGB and depth pairs for training or fine-tuning depth estimators. Code will be available at https://github.com/jessemelpolio/MultimodalGAN.
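    The shared-backbone / modality-specific-branch idea can be sketched in a few lines of PyTorch. The toy convolutional trunk below stands in for the StyleGAN3 synthesis network, and all layer sizes are illustrative assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultimodalHead(nn.Module):
    """Toy generator trunk with modality-specific output branches."""
    def __init__(self, in_ch=64):
        super().__init__()
        # Shared backbone (stand-in for the StyleGAN3 synthesis trunk).
        self.backbone = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )
        # Modality-specific branches in the last layers.
        self.rgb_head = nn.Conv2d(64, 3, 1)     # RGB
        self.depth_head = nn.Conv2d(64, 1, 1)   # depth
        self.normal_head = nn.Conv2d(64, 3, 1)  # surface normals

    def forward(self, feats):
        h = self.backbone(feats)
        return {
            "rgb": torch.tanh(self.rgb_head(h)),
            "depth": self.depth_head(h),
            "normal": F.normalize(self.normal_head(h), dim=1),
        }

out = MultimodalHead()(torch.randn(2, 64, 32, 32))
print({k: v.shape for k, v in out.items()})
```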

Direct Superpoints Matching for Fast and Robust Point Cloud Registration

  • paper_url: http://arxiv.org/abs/2307.01362
  • repo_url: None
  • paper_authors: Aniket Gupta, Yiming Xie, Hanumant Singh, Huaizu Jiang
  • for: To propose a simple yet effective method that directly matches superpoints in order to estimate the rigid transformation between source and target point clouds.
  • methods: A global softmax layer matches superpoints end-to-end, and the resulting correspondences are used to solve for the transformation.
  • results: Compared with methods that directly predict corresponding points, the approach estimates the transformation more accurately, needs no postprocessing refinement, and reaches state-of-the-art results on the ModelNet and 3DMatch benchmarks.
    Abstract Although deep neural networks endow the downsampled superpoints with discriminative feature representations, directly matching them is usually not used alone in state-of-the-art methods, mainly for two reasons. First, the correspondences are inevitably noisy, so RANSAC-like refinement is usually adopted. Such ad hoc postprocessing, however, is slow and not differentiable, which can not be jointly optimized with feature learning. Second, superpoints are sparse and thus more RANSAC iterations are needed. Existing approaches use the coarse-to-fine strategy to propagate the superpoints correspondences to the point level, which are not discriminative enough and further necessitates the postprocessing refinement. In this paper, we present a simple yet effective approach to extract correspondences by directly matching superpoints using a global softmax layer in an end-to-end manner, which are used to determine the rigid transformation between the source and target point cloud. Compared with methods that directly predict corresponding points, by leveraging the rich information from the superpoints matchings, we can obtain more accurate estimation of the transformation and effectively filter out outliers without any postprocessing refinement. As a result, our approach is not only fast, but also achieves state-of-the-art results on the challenging ModelNet and 3DMatch benchmarks. Our code and model weights will be publicly released.
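    A minimal sketch of the core idea -- soft superpoint matching via a global softmax followed by a closed-form weighted Kabsch solve for the rigid transform -- is shown below. The feature dimensions, temperature, and confidence weighting are assumptions, not the paper's implementation.

```python
import torch

def soft_match_and_register(src_xyz, tgt_xyz, src_feat, tgt_feat, tau=0.1):
    """Soft superpoint matching followed by a weighted Kabsch solve.

    src_xyz: (N, 3), tgt_xyz: (M, 3) superpoint coordinates.
    src_feat: (N, D), tgt_feat: (M, D) learned descriptors.
    Returns rotation R (3, 3) and translation t (3,).
    """
    # Global softmax over similarity scores -> soft correspondences.
    sim = src_feat @ tgt_feat.T / tau                 # (N, M)
    corr = torch.softmax(sim, dim=1)                  # each row sums to 1
    matched = corr @ tgt_xyz                          # soft target for each source point
    conf = corr.max(dim=1).values                     # confidence weights
    w = conf / conf.sum()

    # Weighted Kabsch: closed-form rigid transform from weighted pairs.
    src_c = (w[:, None] * src_xyz).sum(0)
    tgt_c = (w[:, None] * matched).sum(0)
    H = (w[:, None] * (src_xyz - src_c)).T @ (matched - tgt_c)
    U, _, Vt = torch.linalg.svd(H)
    d = torch.sign(torch.det(Vt.T @ U.T))
    D = torch.diag(torch.stack([torch.ones_like(d), torch.ones_like(d), d]))
    R = Vt.T @ D @ U.T                                # proper rotation (det = +1)
    t = tgt_c - R @ src_c
    return R, t

R, t = soft_match_and_register(torch.randn(50, 3), torch.randn(60, 3),
                               torch.randn(50, 32), torch.randn(60, 32))
print(R.shape, t.shape)
```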

Patch-CNN: Training data-efficient deep learning for high-fidelity diffusion tensor estimation from minimal diffusion protocols

  • paper_url: http://arxiv.org/abs/2307.01346
  • repo_url: None
  • paper_authors: Tobias Goodwin-Allcock, Ting Gong, Robert Gray, Parashkev Nachev, Hui Zhang
  • for: High-fidelity diffusion tensor (DT) estimation from diffusion-weighted images (DWI) acquired with only six directions.
  • methods: A deep-learning approach built around a minimal 3x3x3 convolutional kernel that regresses diffusion tensor parameters from six-direction DWIs.
  • results: Compared with conventional model fitting and voxel-wise fully-connected networks (FCNs), Patch-CNN improves the estimation of scalar dMRI parameters and fibre orientation from six-direction DWIs and yields better tractograms.
    Abstract We propose a new method, Patch-CNN, for diffusion tensor (DT) estimation from only six-direction diffusion weighted images (DWI). Deep learning-based methods have been recently proposed for dMRI parameter estimation, using either voxel-wise fully-connected neural networks (FCN) or image-wise convolutional neural networks (CNN). In the acute clinical context -- where pressure of time limits the number of imaged directions to a minimum -- existing approaches either require an infeasible number of training images volumes (image-wise CNNs), or do not estimate the fibre orientations (voxel-wise FCNs) required for tractogram estimation. To overcome these limitations, we propose Patch-CNN, a neural network with a minimal (non-voxel-wise) convolutional kernel (3$\times$3$\times$3). Compared with voxel-wise FCNs, this has the advantage of allowing the network to leverage local anatomical information. Compared with image-wise CNNs, the minimal kernel vastly reduces training data demand. Evaluated against both conventional model fitting and a voxel-wise FCN, Patch-CNN, trained with a single subject is shown to improve the estimation of both scalar dMRI parameters and fibre orientation from six-direction DWIs. The improved fibre orientation estimation is shown to produce improved tractogram.
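    A toy PyTorch version of a patch-based network with the minimal 3x3x3 kernel is sketched below; it maps a six-direction DWI patch to the six unique diffusion-tensor coefficients. Layer widths and the output parameterization are illustrative assumptions, not the published architecture.

```python
import torch
import torch.nn as nn

class TinyPatchCNN(nn.Module):
    """Map a 3x3x3 patch of six-direction DWI signals to 6 tensor coefficients."""
    def __init__(self, n_dirs=6, width=64):
        super().__init__()
        self.net = nn.Sequential(
            # A single minimal 3x3x3 kernel collapses the patch to its centre voxel.
            nn.Conv3d(n_dirs, width, kernel_size=3),
            nn.ReLU(),
            nn.Conv3d(width, width, kernel_size=1),
            nn.ReLU(),
            nn.Conv3d(width, 6, kernel_size=1),  # Dxx, Dyy, Dzz, Dxy, Dxz, Dyz
        )

    def forward(self, x):              # x: (B, 6 directions, 3, 3, 3)
        return self.net(x).flatten(1)  # (B, 6)

model = TinyPatchCNN()
dwi_patch = torch.rand(8, 6, 3, 3, 3)   # batch of 8 patches
print(model(dwi_patch).shape)           # torch.Size([8, 6])
```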

Real-time Monocular Full-body Capture in World Space via Sequential Proxy-to-Motion Learning

  • paper_url: http://arxiv.org/abs/2307.01200
  • repo_url: None
  • paper_authors: Yuxiang Zhang, Hongwen Zhang, Liangxiao Hu, Hongwei Yi, Shengping Zhang, Yebin Liu
  • for: A learning-based monocular motion-capture system that captures full-body motion in world space in real time while remaining accurate.
  • methods: A sequential proxy-to-motion learning scheme with a proxy dataset of 2D skeleton sequences and 3D rotational motions in world space, which supplies accurate full-body supervision; a contact-aware neural motion descent module accounts for foot-ground contact and motion misalignment, and body-hand context is shared in the network for better wrist-pose recovery.
  • results: The first real-time monocular full-body capture system with plausible foot-ground contact in world space; more video results are available on the project page: https://liuyebin.com/proxycap.
    Abstract Learning-based approaches to monocular motion capture have recently shown promising results by learning to regress in a data-driven manner. However, due to the challenges in data collection and network designs, it remains challenging for existing solutions to achieve real-time full-body capture while being accurate in world space. In this work, we contribute a sequential proxy-to-motion learning scheme together with a proxy dataset of 2D skeleton sequences and 3D rotational motions in world space. Such proxy data enables us to build a learning-based network with accurate full-body supervision while also mitigating the generalization issues. For more accurate and physically plausible predictions, a contact-aware neural motion descent module is proposed in our network so that it can be aware of foot-ground contact and motion misalignment with the proxy observations. Additionally, we share the body-hand context information in our network for more compatible wrist poses recovery with the full-body model. With the proposed learning-based solution, we demonstrate the first real-time monocular full-body capture system with plausible foot-ground contact in world space. More video results can be found at our project page: https://liuyebin.com/proxycap.

Segment Anything Meets Point Tracking

  • paper_url: http://arxiv.org/abs/2307.01197
  • repo_url: https://github.com/syscv/sam-pt
  • paper_authors: Frano Rajič, Lei Ke, Yu-Wing Tai, Chi-Keung Tang, Martin Danelljan, Fisher Yu
  • for: To extend the Segment Anything Model (SAM) so that it can track and segment anything in dynamic videos.
  • methods: Robust, sparse point selection and propagation techniques generate masks, with point propagation used to track the target object.
  • results: Strong zero-shot performance on the popular video object segmentation benchmarks DAVIS, YouTube-VOS, and MOSE; unlike traditional object-centric mask propagation, point propagation exploits local structure information that is agnostic to object semantics.
    Abstract The Segment Anything Model (SAM) has established itself as a powerful zero-shot image segmentation model, employing interactive prompts such as points to generate masks. This paper presents SAM-PT, a method extending SAM's capability to tracking and segmenting anything in dynamic videos. SAM-PT leverages robust and sparse point selection and propagation techniques for mask generation, demonstrating that a SAM-based segmentation tracker can yield strong zero-shot performance across popular video object segmentation benchmarks, including DAVIS, YouTube-VOS, and MOSE. Compared to traditional object-centric mask propagation strategies, we uniquely use point propagation to exploit local structure information that is agnostic to object semantics. We highlight the merits of point-based tracking through direct evaluation on the zero-shot open-world Unidentified Video Objects (UVO) benchmark. To further enhance our approach, we utilize K-Medoids clustering for point initialization and track both positive and negative points to clearly distinguish the target object. We also employ multiple mask decoding passes for mask refinement and devise a point re-initialization strategy to improve tracking accuracy. Our code integrates different point trackers and video segmentation benchmarks and will be released at https://github.com/SysCV/sam-pt.
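    One ingredient that is easy to illustrate is the K-Medoids initialization of query points inside a mask. The self-contained sketch below uses a small from-scratch K-Medoids loop; the number of points and the update scheme are assumptions, not SAM-PT's exact procedure.

```python
import numpy as np

def kmedoids_points(mask, k=8, iters=20, seed=0):
    """Pick k query points inside a binary mask via a tiny K-Medoids loop."""
    rng = np.random.default_rng(seed)
    pts = np.argwhere(mask)                                   # (P, 2) pixel coordinates
    medoids = pts[rng.choice(len(pts), k, replace=False)]
    for _ in range(iters):
        # Assign every mask pixel to its nearest medoid.
        d = np.linalg.norm(pts[:, None] - medoids[None], axis=-1)   # (P, k)
        labels = d.argmin(axis=1)
        # Move each medoid to the cluster member with the smallest total distance.
        for j in range(k):
            members = pts[labels == j]
            if len(members) == 0:
                continue
            cost = np.linalg.norm(members[:, None] - members[None], axis=-1).sum(1)
            medoids[j] = members[cost.argmin()]
    return medoids   # (k, 2) positive query points (row, col)

mask = np.zeros((64, 64), dtype=bool)
mask[16:48, 20:52] = True
print(kmedoids_points(mask, k=4))
```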

Investigating Data Memorization in 3D Latent Diffusion Models for Medical Image Synthesis

  • paper_url: http://arxiv.org/abs/2307.01148
  • repo_url: None
  • paper_authors: Salman Ul Hassan Dar, Arman Ghanaat, Jannik Kahmann, Isabelle Ayx, Theano Papavassiliu, Stefan O. Schoenberg, Sandy Engelhardt
  • for: Generating realistic synthetic medical data while protecting patient privacy.
  • methods: Generative latent diffusion models, with self-supervised contrastive models used to detect memorization of training data.
  • results: The models do memorize training data, so mitigation strategies are needed.
    Abstract Generative latent diffusion models have been established as state-of-the-art in data generation. One promising application is generation of realistic synthetic medical imaging data for open data sharing without compromising patient privacy. Despite the promise, the capacity of such models to memorize sensitive patient training data and synthesize samples showing high resemblance to training data samples is relatively unexplored. Here, we assess the memorization capacity of 3D latent diffusion models on photon-counting coronary computed tomography angiography and knee magnetic resonance imaging datasets. To detect potential memorization of training samples, we utilize self-supervised models based on contrastive learning. Our results suggest that such latent diffusion models indeed memorize training data, and there is a dire need for devising strategies to mitigate memorization.
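    The detection idea -- flagging generated samples whose nearest training neighbour in a (e.g., contrastively learned) embedding space is suspiciously close -- can be sketched as follows. The encoder is stubbed out with random embeddings, and the similarity threshold is an assumption.

```python
import numpy as np

def flag_potential_copies(train_emb, synth_emb, threshold=0.95):
    """Flag synthetic samples that are suspiciously close to a training sample.

    train_emb: (N, D) embeddings of real training scans (e.g., from a
               contrastively trained encoder).
    synth_emb: (M, D) embeddings of generated scans.
    Returns the nearest training neighbour index, the cosine similarity, and a
    boolean "possible memorization" flag for each synthetic sample.
    """
    a = train_emb / np.linalg.norm(train_emb, axis=1, keepdims=True)
    b = synth_emb / np.linalg.norm(synth_emb, axis=1, keepdims=True)
    sim = b @ a.T                      # (M, N) cosine similarities
    nn_idx = sim.argmax(axis=1)
    nn_sim = sim.max(axis=1)
    return nn_idx, nn_sim, nn_sim >= threshold

rng = np.random.default_rng(0)
train = rng.normal(size=(100, 128))
synth = np.vstack([rng.normal(size=(9, 128)), train[3] + 1e-3])  # last one is a near-copy
_, sims, flags = flag_potential_copies(train, synth)
print(flags)   # only the near-copy should be flagged
```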

AVSegFormer: Audio-Visual Segmentation with Transformer

  • paper_url: http://arxiv.org/abs/2307.01146
  • repo_url: https://github.com/vvvb-github/avsegformer
  • paper_authors: Shengyi Gao, Zhe Chen, Guo Chen, Wenhai Wang, Tong Lu
  • for: The audio-visual segmentation (AVS) task: locating and segmenting the sounding objects in a given video.
  • methods: A transformer-based framework, AVSegFormer, that introduces audio queries and learnable queries into the transformer decoder so the network can selectively attend to the visual features of interest.
  • results: Extensive experiments show that AVSegFormer achieves state-of-the-art results on the AVS benchmark and performs well across varied video and audio backgrounds.
    Abstract The combination of audio and vision has long been a topic of interest in the multi-modal community. Recently, a new audio-visual segmentation (AVS) task has been introduced, aiming to locate and segment the sounding objects in a given video. This task demands audio-driven pixel-level scene understanding for the first time, posing significant challenges. In this paper, we propose AVSegFormer, a novel framework for AVS tasks that leverages the transformer architecture. Specifically, we introduce audio queries and learnable queries into the transformer decoder, enabling the network to selectively attend to interested visual features. Besides, we present an audio-visual mixer, which can dynamically adjust visual features by amplifying relevant and suppressing irrelevant spatial channels. Additionally, we devise an intermediate mask loss to enhance the supervision of the decoder, encouraging the network to produce more accurate intermediate predictions. Extensive experiments demonstrate that AVSegFormer achieves state-of-the-art results on the AVS benchmark. The code is available at https://github.com/vvvb-github/AVSegFormer.

cs.AI - 2023-07-04

The Inner Sentiments of a Thought

  • paper_url: http://arxiv.org/abs/2307.01784
  • repo_url: None
  • paper_authors: Chris Gagne, Peter Dayan
  • for: This paper examines how transformer-based large language models (LLMs), which generate highly realistic text, can express and at least implicitly represent a wide range of sentiments and coloring.
  • methods: Distributional predictors are trained on the hidden representations of an LLM to predict quantiles of the distribution of a sentence's final sentiment.
  • results: The predictors are well calibrated and can be used to analyze the sentimental trajectory of sentences -- for example, showing how an ordinary conjunction such as "but" can dramatically shift an utterance toward the emotional extremes -- and the distributional predictions can also be used to generate sentences with sentiments in the tails of the distributions.
    Abstract Transformer-based large-scale language models (LLMs) are able to generate highly realistic text. They are duly able to express, and at least implicitly represent, a wide range of sentiments and color, from the obvious, such as valence and arousal to the subtle, such as determination and admiration. We provide a first exploration of these representations and how they can be used for understanding the inner sentimental workings of single sentences. We train predictors of the quantiles of the distributions of final sentiments of sentences from the hidden representations of an LLM applied to prefixes of increasing lengths. After showing that predictors of distributions of valence, determination, admiration, anxiety and annoyance are well calibrated, we provide examples of using these predictors for analyzing sentences, illustrating, for instance, how even ordinary conjunctions (e.g., "but") can dramatically alter the emotional trajectory of an utterance. We then show how to exploit the distributional predictions to generate sentences with sentiments in the tails of distributions. We discuss the implications of our results for the inner workings of thoughts, for instance for psychiatric dysfunction.
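    A minimal sketch of training a quantile predictor on fixed hidden-state features with the pinball loss is given below. The feature extraction, sentiment targets, and quantile grid are placeholders, not the paper's setup.

```python
import torch
import torch.nn as nn

quantiles = torch.tensor([0.1, 0.25, 0.5, 0.75, 0.9])

def pinball_loss(pred, target, q):
    """pred: (B, Q) predicted quantiles, target: (B,) observed final sentiment."""
    err = target[:, None] - pred              # (B, Q)
    return torch.maximum(q * err, (q - 1) * err).mean()

# Linear quantile head on top of frozen LLM hidden states (random stand-ins here).
head = nn.Linear(768, len(quantiles))
opt = torch.optim.Adam(head.parameters(), lr=1e-3)

hidden = torch.randn(256, 768)        # hidden state of a sentence prefix
sentiment = torch.randn(256)          # final-sentiment score of the completed sentence

for _ in range(200):
    opt.zero_grad()
    loss = pinball_loss(head(hidden), sentiment, quantiles)
    loss.backward()
    opt.step()
print(loss.item())
```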

GHOST: A Graph Neural Network Accelerator using Silicon Photonics

  • paper_url: http://arxiv.org/abs/2307.01782
  • repo_url: None
  • paper_authors: Salma Afifi, Febin Sunny, Amin Shafiee, Mahdi Nikdast, Sudeep Pasricha
  • for: This paper proposes a silicon-photonic hardware accelerator for running graph neural network (GNN) models on graph-structured data.
  • methods: The three main stages involved in running GNNs -- neighbor updates, message passing, and sampling -- are implemented in the optical domain to improve the accelerator's efficiency and energy use.
  • results: Simulations show that GHOST achieves at least 10.2x better throughput and 3.8x better energy efficiency than GPU, TPU, CPU, and several state-of-the-art GNN hardware accelerators.
    Abstract Graph neural networks (GNNs) have emerged as a powerful approach for modelling and learning from graph-structured data. Multiple fields have since benefitted enormously from the capabilities of GNNs, such as recommendation systems, social network analysis, drug discovery, and robotics. However, accelerating and efficiently processing GNNs require a unique approach that goes beyond conventional artificial neural network accelerators, due to the substantial computational and memory requirements of GNNs. The slowdown of scaling in CMOS platforms also motivates a search for alternative implementation substrates. In this paper, we present GHOST, the first silicon-photonic hardware accelerator for GNNs. GHOST efficiently alleviates the costs associated with both vertex-centric and edge-centric operations. It implements separately the three main stages involved in running GNNs in the optical domain, allowing it to be used for the inference of various widely used GNN models and architectures, such as graph convolution networks and graph attention networks. Our simulation studies indicate that GHOST exhibits at least 10.2x better throughput and 3.8x better energy efficiency when compared to GPU, TPU, CPU and multiple state-of-the-art GNN hardware accelerators.

Physically Realizable Natural-Looking Clothing Textures Evade Person Detectors via 3D Modeling

  • paper_url: http://arxiv.org/abs/2307.01778
  • repo_url: https://github.com/WhoTHU/Adversarial_camou
  • paper_authors: Zhanhao Hu, Wenda Chu, Xiaopei Zhu, Hui Zhang, Bo Zhang, Xiaolin Hu
  • for: The goal of this paper is to craft clothing that evades person detectors and works across multiple viewing angles.
  • methods: 3D modeling is used to craft adversarial textures, a technique that has succeeded for rigid objects; however, humans and clothes are non-rigid, which makes physical realization considerably harder.
  • results: Experiments show that the AdvCaT textures evade multiple person detectors across viewing angles and can be applied in the real world.
    Abstract Recent works have proposed to craft adversarial clothes for evading person detectors, while they are either only effective at limited viewing angles or very conspicuous to humans. We aim to craft adversarial texture for clothes based on 3D modeling, an idea that has been used to craft rigid adversarial objects such as a 3D-printed turtle. Unlike rigid objects, humans and clothes are non-rigid, leading to difficulties in physical realization. In order to craft natural-looking adversarial clothes that can evade person detectors at multiple viewing angles, we propose adversarial camouflage textures (AdvCaT) that resemble one kind of the typical textures of daily clothes, camouflage textures. We leverage the Voronoi diagram and Gumbel-softmax trick to parameterize the camouflage textures and optimize the parameters via 3D modeling. Moreover, we propose an efficient augmentation pipeline on 3D meshes combining topologically plausible projection (TopoProj) and Thin Plate Spline (TPS) to narrow the gap between digital and real-world objects. We printed the developed 3D texture pieces on fabric materials and tailored them into T-shirts and trousers. Experiments show high attack success rates of these clothes against multiple detectors.

MOPO-LSI: A User Guide

  • paper_url: http://arxiv.org/abs/2307.01719
  • repo_url: None
  • paper_authors: Yong Zheng, Kumar Neelotpal Shukla, Jasmine Xu, David, Wang, Michael O’Leary
  • for: This document is a user guide for the MOPO-LSI library, covering problem setup, workflow, and configuration parameters.
  • methods: The MOPO-LSI library is used for multi-objective portfolio optimization for sustainable investments, including problem setup, workflow, and hyper-parameter configuration.
  • results: A user guide for MOPO-LSI version 1.0 that helps users quickly adopt the library for multi-objective portfolio optimization.
    Abstract MOPO-LSI is an open-source Multi-Objective Portfolio Optimization Library for Sustainable Investments. This document provides a user guide for MOPO-LSI version 1.0, including problem setup, workflow and the hyper-parameters in configurations.

On the Constrained Time-Series Generation Problem

  • paper_url: http://arxiv.org/abs/2307.01717
  • repo_url: None
  • paper_authors: Andrea Coletta, Sriram Gopalakrishan, Daniel Borrajo, Svitlana Vyetrenko
  • for: To provide an effective method for generating constrained time series for practical needs such as improving machine learning performance, amplifying the occurrence of rare events, and creating counterfactual scenarios.
  • methods: A new generation approach framed as a constrained optimization problem, including "GuidedDiffTime", a guided diffusion model that produces realistic time series.
  • results: Experiments show the method generates constrained time series more effectively than existing approaches and, for new constraints, does not require re-training, which reduces the carbon footprint.
    Abstract Synthetic time series are often used in practical applications to augment the historical time series dataset for better performance of machine learning algorithms, amplify the occurrence of rare events, and also create counterfactual scenarios described by the time series. Distributional-similarity (which we refer to as realism) as well as the satisfaction of certain numerical constraints are common requirements in counterfactual time series scenario generation requests. For instance, the US Federal Reserve publishes synthetic market stress scenarios given by the constrained time series for financial institutions to assess their performance in hypothetical recessions. Existing approaches for generating constrained time series usually penalize training loss to enforce constraints, and reject non-conforming samples. However, these approaches would require re-training if we change constraints, and rejection sampling can be computationally expensive, or impractical for complex constraints. In this paper, we propose a novel set of methods to tackle the constrained time series generation problem and provide efficient sampling while ensuring the realism of generated time series. In particular, we frame the problem using a constrained optimization framework and then we propose a set of generative methods including ``GuidedDiffTime'', a guided diffusion model to generate realistic time series. Empirically, we evaluate our work on several datasets for financial and energy data, where incorporating constraints is critical. We show that our approaches outperform existing work both qualitatively and quantitatively. Most importantly, we show that our ``GuidedDiffTime'' model is the only solution where re-training is not necessary for new constraints, resulting in a significant carbon footprint reduction.
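    Constraint guidance in a diffusion sampler can be sketched as a single denoising step that is nudged by the gradient of a differentiable penalty. The update below is a simplified illustration under assumed names and a toy schedule, not the paper's GuidedDiffTime algorithm.

```python
import torch

def guided_denoise_step(x_t, t, eps_model, alpha_bar, constraint_penalty, scale=1.0):
    """One DDPM-style estimate nudged toward satisfying a soft constraint.

    x_t: (B, T) noisy series, eps_model(x, t) -> predicted noise,
    alpha_bar: cumulative noise schedule, constraint_penalty(x0) -> per-sample penalty.
    """
    x_t = x_t.detach().requires_grad_(True)
    eps = eps_model(x_t, t)
    # Predicted clean series from the current noisy state.
    x0_hat = (x_t - torch.sqrt(1 - alpha_bar[t]) * eps) / torch.sqrt(alpha_bar[t])
    # Gradient of the constraint penalty w.r.t. the current state.
    penalty = constraint_penalty(x0_hat).sum()
    grad = torch.autograd.grad(penalty, x_t)[0]
    # Nudge the estimate away from constraint violations.
    return x0_hat.detach() - scale * grad

# Toy usage: soft constraint that keeps each series' mean close to zero.
alpha_bar = torch.linspace(0.99, 0.01, 100)
eps_model = lambda x, t: torch.zeros_like(x)          # dummy denoiser
penalty = lambda x0: x0.mean(dim=1) ** 2
x = torch.randn(4, 24)
print(guided_denoise_step(x, t=50, eps_model=eps_model,
                          alpha_bar=alpha_bar, constraint_penalty=penalty).shape)
```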

Distributional Model Equivalence for Risk-Sensitive Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.01708
  • repo_url: None
  • paper_authors: Tyler Kastner, Murat A. Erdogdu, Amir-massoud Farahmand
  • for: Learning models for risk-sensitive reinforcement learning.
  • methods: Distributional reinforcement learning is used to introduce two new notions of model equivalence: a general one that applies to any risk measure but is computationally intractable, and a practical variant that lets one choose which risk measures to plan optimally for.
  • results: Tabular and large-scale experiments show the framework can augment any model-free risk-sensitive algorithm, and several practical use cases are demonstrated.
    Abstract We consider the problem of learning models for risk-sensitive reinforcement learning. We theoretically demonstrate that proper value equivalence, a method of learning models which can be used to plan optimally in the risk-neutral setting, is not sufficient to plan optimally in the risk-sensitive setting. We leverage distributional reinforcement learning to introduce two new notions of model equivalence, one which is general and can be used to plan for any risk measure, but is intractable; and a practical variation which allows one to choose which risk measures they may plan optimally for. We demonstrate how our framework can be used to augment any model-free risk-sensitive algorithm, and provide both tabular and large-scale experiments to demonstrate its ability.

Synthetic is all you need: removing the auxiliary data assumption for membership inference attacks against synthetic data

  • paper_url: http://arxiv.org/abs/2307.01701
  • repo_url: None
  • paper_authors: Florent Guépin, Matthieu Meeus, Ana-Maria Cretu, Yves-Alexandre de Montjoye
  • for: This paper evaluates the privacy of synthetic data.
  • methods: Membership inference attacks based on shadow modeling are used to assess the security of synthetic data.
  • results: Using only synthetic data, the attacks succeed in three scenarios across two real-world datasets and two synthetic data generators, showing that the auxiliary-dataset assumption can be relaxed when auditing synthetic data, which makes practical attacks feasible.
    Abstract Synthetic data is emerging as the most promising solution to share individual-level data while safeguarding privacy. Membership inference attacks (MIAs), based on shadow modeling, have become the standard to evaluate the privacy of synthetic data. These attacks, however, currently assume the attacker to have access to an auxiliary dataset sampled from a similar distribution as the training dataset. This often is a very strong assumption that would make an attack unlikely to happen in practice. We here show how this assumption can be removed and how MIAs can be performed using only the synthetic data. More specifically, in three different attack scenarios using only synthetic data, our results demonstrate that MIAs are still successful, across two real-world datasets and two synthetic data generators. These results show how the strong hypothesis made when auditing synthetic data releases - access to an auxiliary dataset - can be relaxed to perform an actual attack.

Online Learning and Solving Infinite Games with an ERM Oracle

  • paper_url: http://arxiv.org/abs/2307.01689
  • repo_url: None
  • paper_authors: Angelos Assos, Idan Attias, Yuval Dagan, Constantinos Daskalakis, Maxwell Fishelson
  • for: This paper addresses online learning, where existing algorithms for general concept classes rely on computationally inefficient oracles such as the Standard Optimal Algorithm (SOA).
  • methods: An algorithm for online binary classification that relies solely on ERM oracle calls is proposed and shown to have finite regret in the realizable setting and sublinearly growing regret in the agnostic setting, with regret bounded in terms of the Littlestone and threshold dimensions of the underlying concept class.
  • results: In nonparametric games, the ERM oracle can be interpreted as a best-response oracle; learning algorithms that rely only on best-response oracles converge to approximate minimax equilibria in two-player zero-sum games and to approximate coarse correlated equilibria in multi-player general-sum games, provided the game has bounded fat-threshold dimension.
    Abstract While ERM suffices to attain near-optimal generalization error in the stochastic learning setting, this is not known to be the case in the online learning setting, where algorithms for general concept classes rely on computationally inefficient oracles such as the Standard Optimal Algorithm (SOA). In this work, we propose an algorithm for online binary classification setting that relies solely on ERM oracle calls, and show that it has finite regret in the realizable setting and sublinearly growing regret in the agnostic setting. We bound the regret in terms of the Littlestone and threshold dimensions of the underlying concept class. We obtain similar results for nonparametric games, where the ERM oracle can be interpreted as a best response oracle, finding the best response of a player to a given history of play of the other players. In this setting, we provide learning algorithms that only rely on best response oracles and converge to approximate-minimax equilibria in two-player zero-sum games and approximate coarse correlated equilibria in multi-player general-sum games, as long as the game has a bounded fat-threshold dimension. Our algorithms apply to both binary-valued and real-valued games and can be viewed as providing justification for the wide use of double oracle and multiple oracle algorithms in the practice of solving large games.

Serving Graph Neural Networks With Distributed Fog Servers For Smart IoT Services

  • paper_url: http://arxiv.org/abs/2307.01684
  • repo_url: None
  • paper_authors: Liekang Zeng, Xu Chen, Peng Huang, Ke Luo, Xiaoxi Zhang, Zhi Zhou
  • for: This paper presents a distributed real-time Graph Neural Network (GNN) inference framework that serves GNN-based models for IoT-driven smart applications.
  • methods: The framework distributes GNN inference across multiple fog nodes, leveraging the diverse and dynamic resources available close to IoT data sources.
  • results: Prototype-based evaluation and a case study show that Fograph outperforms state-of-the-art cloud serving and fog deployments by up to 5.39x in execution speedup and 6.84x in throughput.
    Abstract Graph Neural Networks (GNNs) have gained growing interest in miscellaneous applications owing to their outstanding ability in extracting latent representation on graph structures. To render GNN-based service for IoT-driven smart applications, traditional model serving paradigms usually resort to the cloud by fully uploading geo-distributed input data to remote datacenters. However, our empirical measurements reveal the significant communication overhead of such cloud-based serving and highlight the profound potential in applying the emerging fog computing. To maximize the architectural benefits brought by fog computing, in this paper, we present Fograph, a novel distributed real-time GNN inference framework that leverages diverse and dynamic resources of multiple fog nodes in proximity to IoT data sources. By introducing heterogeneity-aware execution planning and GNN-specific compression techniques, Fograph tailors its design to well accommodate the unique characteristics of GNN serving in fog environments. Prototype-based evaluation and case study demonstrate that Fograph significantly outperforms the state-of-the-art cloud serving and fog deployment by up to 5.39x execution speedup and 6.84x throughput improvement.

Learning Discrete Weights and Activations Using the Local Reparameterization Trick

  • paper_url: http://arxiv.org/abs/2307.01683
  • repo_url: None
  • paper_authors: Guy Berger, Aviv Navon, Ethan Fetaya
  • for: Lowering the computation and memory demands of neural-network inference in computer vision and machine learning.
  • methods: Binarization of network weights and activations, replacing expensive floating-point operations with faster bitwise ones.
  • results: Networks with binary activations can be trained effectively, enabling efficient inference on low-resource devices with state-of-the-art results.
    Abstract In computer vision and machine learning, a crucial challenge is to lower the computation and memory demands for neural network inference. A commonplace solution to address this challenge is through the use of binarization. By binarizing the network weights and activations, one can significantly reduce computational complexity by substituting the computationally expensive floating operations with faster bitwise operations. This leads to a more efficient neural network inference that can be deployed on low-resource devices. In this work, we extend previous approaches that trained networks with discrete weights using the local reparameterization trick to also allow for discrete activations. The original approach optimized a distribution over the discrete weights and uses the central limit theorem to approximate the pre-activation with a continuous Gaussian distribution. Here we show that the probabilistic modeling can also allow effective training of networks with discrete activation as well. This further reduces runtime and memory footprint at inference time with state-of-the-art results for networks with binary activations.
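    The local reparameterization trick for stochastic binary weights can be sketched directly: instead of sampling each weight, the pre-activation is sampled from its CLT Gaussian approximation. The sketch below assumes weights in {-1, +1} and is an illustration under assumed names, not the paper's training code.

```python
import torch

def binary_layer_local_reparam(x, w_logits):
    """Sample a layer's pre-activation under stochastic {-1, +1} weights.

    x: (B, D_in) inputs, w_logits: (D_in, D_out) logits of p(w = +1).
    Rather than sampling every weight, the pre-activation is drawn from its
    CLT Gaussian approximation (the local reparameterization trick).
    """
    p = torch.sigmoid(w_logits)            # probability that each weight is +1
    w_mean = 2 * p - 1                     # E[w] for w in {-1, +1}
    w_var = 1 - w_mean ** 2                # Var[w] = 1 - E[w]^2
    mu = x @ w_mean                        # mean of the pre-activation
    var = (x ** 2) @ w_var                 # variance of the pre-activation
    return mu + torch.sqrt(var + 1e-8) * torch.randn_like(mu)

x = torch.randn(16, 128)
w_logits = torch.zeros(128, 64, requires_grad=True)
# Binary activation of the sampled pre-activation
# (training would pair this with a straight-through estimator for the sign).
act = torch.sign(binary_layer_local_reparam(x, w_logits))
print(act.shape)
```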

RaidEnv: Exploring New Challenges in Automated Content Balancing for Boss Raid Games

  • paper_url: http://arxiv.org/abs/2307.01676
  • repo_url: None
  • paper_authors: Hyeon-Chang Jeon, In-Chang Baek, Cheong-mok Bae, Taehwa Park, Wonsang You, Taegwan Ha, Hoyun Jung, Jinha Noh, Seungwon Oh, Kyung-Joong Kim
  • for: This study introduces a new game simulator and two benchmarks for automated game-content balancing.
  • methods: Artificial intelligence techniques are used to adjust game content automatically, evaluated with diverse and customizable boss-raid content in an MMORPG setting.
  • results: A new game research platform that broadens the scope of automated game balancing problems and offers a framework within a realistic game production pipeline.
    Abstract The balance of game content significantly impacts the gaming experience. Unbalanced game content diminishes engagement or increases frustration because of repetitive failure. Although game designers intend to adjust the difficulty of game content, this is a repetitive, labor-intensive, and challenging process, especially for commercial-level games with extensive content. To address this issue, the game research community has explored automated game balancing using artificial intelligence (AI) techniques. However, previous studies have focused on limited game content and did not consider the importance of the generalization ability of playtesting agents when encountering content changes. In this study, we propose RaidEnv, a new game simulator that includes diverse and customizable content for the boss raid scenario in MMORPG games. Additionally, we design two benchmarks for the boss raid scenario that can aid in the practical application of game AI. These benchmarks address two open problems in automatic content balancing, and we introduce two evaluation metrics to provide guidance for AI in automatic content balancing. This novel game research platform expands the frontiers of automatic game balancing problems and offers a framework within a realistic game production pipeline.

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

  • paper_url: http://arxiv.org/abs/2307.02499
  • repo_url: https://github.com/x-plug/mplug-docowl
  • paper_authors: Jiabo Ye, Anwen Hu, Haiyang Xu, Qinghao Ye, Ming Yan, Yuhao Dan, Chenlin Zhao, Guohai Xu, Chenliang Li, Junfeng Tian, Qian Qi, Ji Zhang, Fei Huang
  • for: This paper studies OCR-free document understanding, aiming to improve the document-understanding ability of existing multimodal models.
  • methods: The approach builds on the mPLUG-Owl model, strengthened and adapted through a dedicated instruction-tuning dataset and training strategy.
  • results: Experiments show strong performance on OCR-free document understanding, and the model also generalizes well to a range of downstream tasks without task-specific fine-tuning.
    Abstract Document understanding refers to automatically extract, analyze and comprehend information from various types of digital documents, such as a web page. Existing Multi-model Large Language Models (MLLMs), including mPLUG-Owl, have demonstrated promising zero-shot capabilities in shallow OCR-free text recognition, indicating their potential for OCR-free document understanding. Nevertheless, without in-domain training, these models tend to ignore fine-grained OCR features, such as sophisticated tables or large blocks of text, which are essential for OCR-free document understanding. In this paper, we propose mPLUG-DocOwl based on mPLUG-Owl for OCR-free document understanding. Specifically, we first construct a instruction tuning dataset featuring a wide range of visual-text understanding tasks. Then, we strengthen the OCR-free document understanding ability by jointly train the model on language-only, general vision-and-language, and document instruction tuning dataset with our unified instruction tuning strategy. We also build an OCR-free document instruction understanding evaluation set LLMDoc to better compare models' capabilities on instruct compliance and document understanding. Experimental results show that our model outperforms existing multi-modal models, demonstrating its strong ability of document understanding. Besides, without specific fine-tuning, mPLUG-DocOwl generalizes well on various downstream tasks. Our code, models, training data and evaluation set are available at https://github.com/X-PLUG/mPLUG-DocOwl.

SwinGNN: Rethinking Permutation Invariance in Diffusion Models for Graph Generation

  • paper_url: http://arxiv.org/abs/2307.01646
  • repo_url: https://github.com/qiyan98/swingnn
  • paper_authors: Qi Yan, Zhengyang Liang, Yang Song, Renjie Liao, Lele Wang
  • for: This paper studies diffusion models based on permutation-equivariant networks, which can learn permutation-invariant distributions for graph data; compared with non-invariant models, these invariant models face greater learning challenges, since their effective target distributions exhibit more modes and their optimal one-step denoising scores are the score functions of Gaussian mixtures with more components.
  • methods: Motivated by this analysis, the paper proposes a non-invariant diffusion model, SwinGNN, which employs an efficient edge-to-edge 2-WL message passing network and shifted-window self-attention inspired by SwinTransformers; systematic ablations identify training and sampling techniques that substantially improve the quality of generated samples.
  • results: SwinGNN achieves state-of-the-art performance on synthetic and real-world protein and molecule datasets; the code is released at https://github.com/qiyan98/SwinGNN.
    Abstract Diffusion models based on permutation-equivariant networks can learn permutation-invariant distributions for graph data. However, in comparison to their non-invariant counterparts, we have found that these invariant models encounter greater learning challenges since 1) their effective target distributions exhibit more modes; 2) their optimal one-step denoising scores are the score functions of Gaussian mixtures with more components. Motivated by this analysis, we propose a non-invariant diffusion model, called $\textit{SwinGNN}$, which employs an efficient edge-to-edge 2-WL message passing network and utilizes shifted window based self-attention inspired by SwinTransformers. Further, through systematic ablations, we identify several critical training and sampling techniques that significantly improve the sample quality of graph generation. At last, we introduce a simple post-processing trick, $\textit{i.e.}$, randomly permuting the generated graphs, which provably converts any graph generative model to a permutation-invariant one. Extensive experiments on synthetic and real-world protein and molecule datasets show that our SwinGNN achieves state-of-the-art performances. Our code is released at https://github.com/qiyan98/SwinGNN.
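    The post-processing trick is simple enough to show directly: sample a graph from any generator and relabel its nodes with a uniformly random permutation, which makes the induced distribution permutation-invariant. A minimal sketch, with the adjacency-matrix representation as an assumption:

```python
import numpy as np

def permute_generated_graph(adj, rng=None):
    """Apply a uniformly random node permutation to a generated adjacency matrix.

    Relabelling the nodes of every generated graph with a uniform random
    permutation yields a permutation-invariant output distribution.
    """
    rng = rng or np.random.default_rng()
    perm = rng.permutation(adj.shape[0])
    return adj[np.ix_(perm, perm)]

# Toy undirected graph sampled from some generator.
adj = (np.random.rand(5, 5) < 0.4).astype(int)
adj = np.triu(adj, 1)
adj = adj + adj.T
print(permute_generated_graph(adj))
```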

Insert-expansions for Tool-enabled Conversational Agents

  • paper_url: http://arxiv.org/abs/2307.01644
  • repo_url: None
  • paper_authors: Andreas Göldi, Roman Rietsche
  • for: This paper examines an advanced implementation of chain-of-thought prompting in large language models, focusing on conversational agents that use tools (or "plug-ins") within the explicit reasoning paths generated by this prompting method.
  • methods: Conversation analysis is used to study how users supply the necessary details and refine their requests so that the agent produces the preferred response.
  • results: Two empirical studies with direct comparison show benefits of this "user-as-a-tool" approach in the recommendation domain.
    Abstract This paper delves into an advanced implementation of Chain-of-Thought-Prompting in Large Language Models, focusing on the use of tools (or "plug-ins") within the explicit reasoning paths generated by this prompting method. We find that tool-enabled conversational agents often become sidetracked, as additional context from tools like search engines or calculators diverts from original user intents. To address this, we explore a concept wherein the user becomes the tool, providing necessary details and refining their requests. Through Conversation Analysis, we characterize this interaction as insert-expansion - an intermediary conversation designed to facilitate the preferred response. We explore possibilities arising from this 'user-as-a-tool' approach in two empirical studies using direct comparison, and find benefits in the recommendation domain.

Heuristic Algorithms for the Approximation of Mutual Coherence

  • paper_url: http://arxiv.org/abs/2307.01639
  • repo_url: None
  • paper_authors: Gregor Betz, Vera Chekan, Tamara Mchedlidze
  • for: This paper aims to accelerate the computation of mutual coherence, which is a measure of similarity between two opinions, in the context of the Wahl-O-Mat system used in Germany to help voters find candidates that align with their political preferences.
  • methods: The authors model the distribution of confirmation values as a mixture of three Gaussians and present efficient heuristics to estimate the model parameters. They also use the expected value of the distribution to approximate the mutual coherence. Some of the presented algorithms are fully polynomial-time, while others only require solving a small number of instances of the SAT model counting problem.
  • results: The authors’ best algorithm achieves an average squared error of less than 0.0035, which is considered insignificant given the efficiency of the algorithm. The accuracy is precise enough to be used in Wahl-O-Mat-like systems.
    Abstract Mutual coherence is a measure of similarity between two opinions. Although the notion comes from philosophy, it is essential for a wide range of technologies, e.g., the Wahl-O-Mat system. In Germany, this system helps voters to find candidates that are the closest to their political preferences. The exact computation of mutual coherence is highly time-consuming due to the iteration over all subsets of an opinion. Moreover, for every subset, an instance of the SAT model counting problem has to be solved which is known to be a hard problem in computer science. This work is the first study to accelerate this computation. We model the distribution of the so-called confirmation values as a mixture of three Gaussians and present efficient heuristics to estimate its model parameters. The mutual coherence is then approximated with the expected value of the distribution. Some of the presented algorithms are fully polynomial-time, others only require solving a small number of instances of the SAT model counting problem. The average squared error of our best algorithm lies below 0.0035 which is insignificant if the efficiency is taken into account. Furthermore, the accuracy is precise enough to be used in Wahl-O-Mat-like systems.
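    The approximation step can be sketched with scikit-learn: fit a three-component Gaussian mixture to sampled confirmation values and take the mixture's expected value as the mutual-coherence estimate. The confirmation values themselves (which require SAT model counting in the real system) are stubbed out below with synthetic numbers; this is an illustration, not the paper's heuristics.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def approximate_mutual_coherence(confirmation_values):
    """Fit a 3-component Gaussian mixture and return its expected value."""
    x = np.asarray(confirmation_values, dtype=float).reshape(-1, 1)
    gmm = GaussianMixture(n_components=3, random_state=0).fit(x)
    # E[X] of a Gaussian mixture is the weight-averaged component mean.
    return float(np.dot(gmm.weights_, gmm.means_.ravel()))

# Stand-in confirmation values; in the real system these come from
# (expensive) SAT model counting over subsets of an opinion.
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(-0.2, 0.10, 300),
                          rng.normal(0.1, 0.05, 400),
                          rng.normal(0.5, 0.10, 300)])
print(approximate_mutual_coherence(samples))
```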

Random Walk on Multiple Networks

  • paper_url: http://arxiv.org/abs/2307.01637
  • repo_url: https://github.com/flyingdoog/rwm
  • paper_authors: Dongsheng Luo, Yuchen Bian, Yaowei Yan, Xiong Yu, Jun Huan, Xiao Liu, Xiang Zhang
  • for: This study aims to leverage multiple networks to make better inferences on entities in tasks such as local community detection and network embedding.
  • methods: Random Walk on Multiple networks (RWM) handles both multiplex networks and general multiple networks, which may form many-to-many node mappings; a random walker on each network computes the local proximity (node visiting probabilities) with respect to the starting nodes, and walkers with similar visiting probabilities reinforce each other.
  • results: The paper analyzes the convergence properties of RWM and proposes two approximation methods with theoretical performance guarantees for efficient computation; applied to link prediction, network embedding, and local community detection, RWM performs strongly in extensive experiments on both synthetic and real-world datasets.
    Abstract Random Walk is a basic algorithm to explore the structure of networks, which can be used in many tasks, such as local community detection and network embedding. Existing random walk methods are based on single networks that contain limited information. In contrast, real data often contain entities with different types or/and from different sources, which are comprehensive and can be better modeled by multiple networks. To take advantage of rich information in multiple networks and make better inferences on entities, in this study, we propose random walk on multiple networks, RWM. RWM is flexible and supports both multiplex networks and general multiple networks, which may form many-to-many node mappings between networks. RWM sends a random walker on each network to obtain the local proximity (i.e., node visiting probabilities) w.r.t. the starting nodes. Walkers with similar visiting probabilities reinforce each other. We theoretically analyze the convergence properties of RWM. Two approximation methods with theoretical performance guarantees are proposed for efficient computation. We apply RWM in link prediction, network embedding, and local community detection. Comprehensive experiments conducted on both synthetic and real-world datasets demonstrate the effectiveness and efficiency of RWM.

SageFormer: Series-Aware Graph-Enhanced Transformers for Multivariate Time Series Forecasting

  • paper_url: http://arxiv.org/abs/2307.01616
  • repo_url: None
  • paper_authors: Zhenwei Zhang, Xin Wang, Yuantao Gu
  • for: This paper proposes a series-aware model that effectively captures and models dependencies between series to improve the accuracy of multivariate time series forecasting.
  • methods: A series-aware graph-enhanced Transformer that uses graph structures to represent the diverse temporal patterns across multiple series while mitigating redundant information among them.
  • results: Extensive experiments on real-world and synthetic data show that SageFormer clearly outperforms previous state-of-the-art approaches.
    Abstract Multivariate time series forecasting plays a critical role in diverse domains. While recent advancements in deep learning methods, especially Transformers, have shown promise, there remains a gap in addressing the significance of inter-series dependencies. This paper introduces SageFormer, a Series-aware Graph-enhanced Transformer model designed to effectively capture and model dependencies between series using graph structures. SageFormer tackles two key challenges: effectively representing diverse temporal patterns across series and mitigating redundant information among series. Importantly, the proposed series-aware framework seamlessly integrates with existing Transformer-based models, augmenting their ability to model inter-series dependencies. Through extensive experiments on real-world and synthetic datasets, we showcase the superior performance of SageFormer compared to previous state-of-the-art approaches.

Overconfidence is a Dangerous Thing: Mitigating Membership Inference Attacks by Enforcing Less Confident Prediction

  • paper_url: http://arxiv.org/abs/2307.01610
  • repo_url: https://github.com/dependablesystemslab/mia_defense_hamp
  • paper_authors: Zitao Chen, Karthik Pattabiraman
  • for: Defending machine learning models against membership inference attacks and protecting the privacy of their training data.
  • methods: A defense technique consisting of a training framework with high-entropy soft labels and an entropy-based regularizer, which makes the model produce similar predictions on training and testing samples and thereby protects membership privacy.
  • results: Extensive evaluation on five benchmark datasets shows that HAMP maintains high accuracy and strong membership privacy; compared with seven state-of-the-art defenses, HAMP achieves a better privacy-utility trade-off.
    Abstract Machine learning (ML) models are vulnerable to membership inference attacks (MIAs), which determine whether a given input is used for training the target model. While there have been many efforts to mitigate MIAs, they often suffer from limited privacy protection, large accuracy drop, and/or requiring additional data that may be difficult to acquire. This work proposes a defense technique, HAMP that can achieve both strong membership privacy and high accuracy, without requiring extra data. To mitigate MIAs in different forms, we observe that they can be unified as they all exploit the ML model's overconfidence in predicting training samples through different proxies. This motivates our design to enforce less confident prediction by the model, hence forcing the model to behave similarly on the training and testing samples. HAMP consists of a novel training framework with high-entropy soft labels and an entropy-based regularizer to constrain the model's prediction while still achieving high accuracy. To further reduce privacy risk, HAMP uniformly modifies all the prediction outputs to become low-confidence outputs while preserving the accuracy, which effectively obscures the differences between the prediction on members and non-members. We conduct extensive evaluation on five benchmark datasets, and show that HAMP provides consistently high accuracy and strong membership privacy. Our comparison with seven state-of-the-art defenses shows that HAMP achieves a superior privacy-utility trade off than those techniques.
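    One way to realize "high-entropy soft labels plus an entropy regularizer" is sketched below; the smoothing level and regularizer weight are assumptions, not HAMP's published hyper-parameters.

```python
import torch
import torch.nn.functional as F

def hamp_style_loss(logits, labels, n_classes, smooth=0.7, lam=0.1):
    """Cross-entropy against high-entropy soft labels plus an entropy bonus.

    smooth: probability mass spread uniformly over the non-target classes.
    lam: weight of the entropy regularizer that discourages confident outputs.
    """
    # High-entropy soft labels: (1 - smooth) on the true class, rest uniform.
    soft = torch.full((labels.size(0), n_classes), smooth / (n_classes - 1))
    soft.scatter_(1, labels[:, None], 1.0 - smooth)

    log_p = F.log_softmax(logits, dim=1)
    ce = -(soft * log_p).sum(dim=1).mean()
    entropy = -(log_p.exp() * log_p).sum(dim=1).mean()
    return ce - lam * entropy          # maximizing entropy lowers confidence

logits = torch.randn(32, 10, requires_grad=True)
labels = torch.randint(0, 10, (32,))
print(hamp_style_loss(logits, labels, n_classes=10))
```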

Bridge the Performance Gap in Peak-hour Series Forecasting: The Seq2Peak Framework

  • paper_url: http://arxiv.org/abs/2307.01597
  • repo_url: None
  • paper_authors: Zhenwei Zhang, Xin Wang, Jingyuan Xie, Heling Zhang, Yuantao Gu
  • for: Peak-Hour Series Forecasting (PHSF) is an important yet underexplored task across many domains; existing deep learning models excel at regular Time Series Forecasting (TSF) but perform poorly on PHSF, which can be attributed to the strong non-stationarity of peak-hour series that makes direct forecasting harder than standard TSF.
  • methods: The paper proposes Seq2Peak, a framework designed specifically for PHSF with two key components: a CyclicNorm pipeline that mitigates the non-stationarity issue, and a simple yet effective trainable-parameter-free peak-hour decoder with a hybrid loss function that uses both the original series and the peak-hour series as supervision signals.
  • results: Extensive experiments on four real-world datasets demonstrate the effectiveness of the proposed framework, with an average relative improvement of 37.7% for both transformer- and non-transformer-based TSF models.
    Abstract Peak-Hour Series Forecasting (PHSF) is a crucial yet underexplored task in various domains. While state-of-the-art deep learning models excel in regular Time Series Forecasting (TSF), they struggle to achieve comparable results in PHSF. This can be attributed to the challenges posed by the high degree of non-stationarity in peak-hour series, which makes direct forecasting more difficult than standard TSF. Additionally, manually extracting the maximum value from regular forecasting results leads to suboptimal performance due to models minimizing the mean deficit. To address these issues, this paper presents Seq2Peak, a novel framework designed specifically for PHSF tasks, bridging the performance gap observed in TSF models. Seq2Peak offers two key components: the CyclicNorm pipeline to mitigate the non-stationarity issue, and a simple yet effective trainable-parameter-free peak-hour decoder with a hybrid loss function that utilizes both the original series and peak-hour series as supervised signals. Extensive experimentation on publicly available time series datasets demonstrates the effectiveness of the proposed framework, yielding a remarkable average relative improvement of 37.7\% across four real-world datasets for both transformer- and non-transformer-based TSF models.
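    The hybrid-loss idea -- supervising both the full forecast and the peak values derived from it -- can be sketched as follows. The window length and the loss weighting are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def hybrid_peak_loss(pred, target, steps_per_day=24, alpha=0.5):
    """Supervise the full horizon and the per-day peak values jointly.

    pred, target: (B, H) forecasts over a horizon H that is a multiple of
    steps_per_day. alpha balances the series loss and the peak loss.
    """
    series_loss = F.mse_loss(pred, target)
    # Reshape into days and take the daily maxima (the peak-hour values).
    pred_peaks = pred.view(pred.size(0), -1, steps_per_day).amax(dim=2)
    target_peaks = target.view(target.size(0), -1, steps_per_day).amax(dim=2)
    peak_loss = F.mse_loss(pred_peaks, target_peaks)
    return alpha * series_loss + (1 - alpha) * peak_loss

pred = torch.randn(8, 96, requires_grad=True)     # 4 days of hourly forecasts
target = torch.randn(8, 96)
print(hybrid_peak_loss(pred, target))
```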

Prompt Tuning Pushes Farther, Contrastive Learning Pulls Closer: A Two-Stage Approach to Mitigate Social Biases

  • paper_url: http://arxiv.org/abs/2307.01595
  • repo_url: https://github.com/liyingji1996/CCPA
  • paper_authors: Yingji Li, Mengnan Du, Xin Wang, Ying Wang
  • for: This work aims to mitigate the social biases that pre-trained language models inherit from unprocessed corpora while preserving their performance.
  • methods: A two-stage approach: a data augmentation method based on continuous prompt tuning, followed by contrastive learning that fine-tunes the pre-trained model's parameters to obtain debiased encodings.
  • results: Experiments show that CCPA outperforms baselines based on Counterfactual Data Augmentation in debiasing performance while retaining the language modeling capability of the pre-trained models.
    Abstract As the representation capability of Pre-trained Language Models (PLMs) improve, there is growing concern that they will inherit social biases from unprocessed corpora. Most previous debiasing techniques used Counterfactual Data Augmentation (CDA) to balance the training corpus. However, CDA slightly modifies the original corpus, limiting the representation distance between different demographic groups to a narrow range. As a result, the debiasing model easily fits the differences between counterfactual pairs, which affects its debiasing performance with limited text resources. In this paper, we propose an adversarial training-inspired two-stage debiasing model using Contrastive learning with Continuous Prompt Augmentation (named CCPA) to mitigate social biases in PLMs' encoding. In the first stage, we propose a data augmentation method based on continuous prompt tuning to push farther the representation distance between sample pairs along different demographic groups. In the second stage, we utilize contrastive learning to pull closer the representation distance between the augmented sample pairs and then fine-tune PLMs' parameters to get debiased encoding. Our approach guides the model to achieve stronger debiasing performance by adding difficulty to the training process. Extensive experiments show that CCPA outperforms baselines in terms of debiasing performance. Meanwhile, experimental results on the GLUE benchmark show that CCPA retains the language modeling capability of PLMs.
    摘要 随着预训练语言模型(PLMs)表达能力的提高,人们越来越担心它们会从未经处理的语料中继承社会偏见。以往的去偏技术大多使用反事实数据增强(CDA)来平衡训练语料。然而,CDA 只对原始语料做了轻微修改,使不同人群之间的表示距离被限制在很窄的范围内;因此,去偏模型很容易拟合反事实样本对之间的差异,在文本资源有限时影响其去偏效果。在本文中,我们提出一种受对抗训练启发的两阶段去偏模型 CCPA,它结合对比学习与连续提示增强来缓解 PLMs 编码中的社会偏见。在第一阶段,我们提出一种基于连续提示调优的数据增强方法,把不同人群方向上的样本对表示距离推得更远;在第二阶段,我们利用对比学习把增强后的样本对表示距离拉近,并微调 PLMs 的参数以获得去偏的编码。我们的方法通过增加训练难度引导模型获得更强的去偏能力。大量实验表明,CCPA 的去偏性能优于基线方法;同时,GLUE 基准上的实验结果表明 CCPA 保留了 PLMs 的语言建模能力。
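
The second-stage "pull closer" step is a standard contrastive objective over augmented sample pairs. The sketch below is a generic InfoNCE-style loss, assuming in-batch negatives and a temperature of 0.1; it is not CCPA's released implementation (see the repository linked above for that).

```python
import torch
import torch.nn.functional as F

def contrastive_pull_loss(z_a, z_b, temperature=0.1):
    """Generic InfoNCE-style objective: pull each augmented pair (z_a[i], z_b[i])
    together while pushing it away from the other samples in the batch."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature               # (batch, batch) similarity matrix
    labels = torch.arange(z_a.size(0), device=z_a.device)
    return F.cross_entropy(logits, labels)
```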

Cross-Element Combinatorial Selection for Multi-Element Creative in Display Advertising

  • paper_url: http://arxiv.org/abs/2307.01593
  • repo_url: None
  • paper_authors: Wei Zhang, Ping Zhang, Jian Dong, Yongkang Wang, Pengye Zhang, Bo Zhang, Xingxing Wang, Dong Wang
  • for: 提高电商广告创作效果
  • methods: 跨元素组合选择框架(CECS)
  • results: 实现了最高纪录的Offline指标分数,并且在实际业务中实现了6.02%的点击率和10.37%的营收增长
    Abstract The effectiveness of ad creatives is greatly influenced by their visual appearance. Advertising platforms can generate ad creatives with different appearances by combining creative elements provided by advertisers. However, with the increasing number of ad creative elements, it becomes challenging to select a suitable combination from the countless possibilities. The industry's mainstream approach is to select individual creative elements independently, which often overlooks the importance of interaction between creative elements during the modeling process. In response, this paper proposes a Cross-Element Combinatorial Selection framework for multiple creative elements, termed CECS. In the encoder process, a cross-element interaction is adopted to dynamically adjust the expression of a single creative element based on the current candidate creatives. In the decoder process, the creative combination problem is transformed into a cascade selection problem of multiple creative elements. A pointer mechanism with a cascade design is used to model the associations among candidates. Comprehensive experiments on real-world datasets show that CECS achieved the SOTA score on offline metrics. Moreover, the CECS algorithm has been deployed in our industrial application, resulting in a significant 6.02% CTR and 10.37% GMV lift, which is beneficial to the business.
    摘要 “广告创意的有效性受到它的见识性标志影响。广告平台可以通过不同的创意元素结合来生成不同的创意标志。然而,随着创意元素的数量增加,选择合适的结合成为愈来愈困难。industry的主流方法是选择个别创意元素独立,往往忽略了创意元素间的互动过程中的重要性。对此,本文提出了跨元素兼容选择框架,简称为CECS。在Encoder过程中,采用跨元素互动以静态地调整单一创意元素的表达,以满足目前候选的创意标志。在Decoder过程中,创意组合问题转化为多个创意元素之间的传递选择问题。使用一个链接机制,模型候选者之间的协调关系。实际测试结果显示,CECS取得了线上数据上的SOTA分数。此外,CECS算法已经在我们的商业应用中实现了6.02%的Click Through Rate(CTR)和10.37%的Gross Merchandise Value(GMV)提升,对业务有益。”

Transcribing Educational Videos Using Whisper: A preliminary study on using AI for transcribing educational videos

  • paper_url: http://arxiv.org/abs/2307.03200
  • repo_url: None
  • paper_authors: Ashwin Rao
  • for: 这篇论文是为了探讨如何使用自动语音识别(ASR)系统来提高电子学习视频的掌握效果。
  • methods: 该论文使用了各种语音识别算法和技术来生成视频的字幕,并对25个教育视频进行了评估。
  • results: 研究发现,使用ASR系统可以减少对视频的杂音和干扰的影响,并提高视频的掌握效果。同时,还有一些开放的研究方向,如如何更好地识别教育视频中的语音、如何提高语音识别精度等。
    Abstract Videos are increasingly being used for e-learning, and transcripts are vital to enhance the learning experience. The costs and delays of generating transcripts can be alleviated by automatic speech recognition (ASR) systems. In this article, we quantify the transcripts generated by whisper for 25 educational videos and identify some open avenues of research when leveraging ASR for transcribing educational videos.
    摘要 视频越来越多地被用于电子学习,而字幕对提升学习体验至关重要。自动语音识别(ASR)系统可以降低生成字幕的成本和延迟。本文对 Whisper 为 25 个教育视频生成的字幕进行了量化分析,并指出了利用 ASR 转录教育视频时若干有待研究的方向。
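
For readers who want to reproduce the basic pipeline, the snippet below shows one plausible way to transcribe a lecture video with the open-source whisper package; the checkpoint name and file path are placeholders, and the paper's exact settings are not stated in the abstract.

```python
import whisper

# Load a Whisper checkpoint ("base" is an assumption; larger models trade speed for accuracy).
model = whisper.load_model("base")

# Transcribe one educational video; whisper extracts the audio track via ffmpeg.
result = model.transcribe("lecture_01.mp4")

print(result["text"])                        # full transcript
for seg in result["segments"]:               # timestamped segments, useful for subtitles
    print(f'{seg["start"]:7.2f}-{seg["end"]:7.2f}  {seg["text"]}')
```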

IAdet: Simplest human-in-the-loop object detection

  • paper_url: http://arxiv.org/abs/2307.01582
  • repo_url: https://github.com/franchesoni/iadet
  • paper_authors: Franco Marchesoni-Acland, Gabriele Facciolo
  • for: 这个论文是为了提出一种人工智能注解策略,帮助在数据标注过程中训练模型。
  • methods: 这个策略包括三个模块:一、助け物标注;二、背景模型训练;三、活动选择下一个数据点。这个框架下开源了一个专门用于单类物体检测的工具——IAdet。
  • results: 对于PASCAL VOC数据集,IAdet工具可以将数据标注时间减少$25%$,并提供一个免费的训练模型。这些结果是基于一个故意简单的IAdet设计而得到的。因此,IAdet具有多种可以轻松改进的可能性,这为人工智能 loop对象检测系统开创了道路。
    Abstract This work proposes a strategy for training models while annotating data named Intelligent Annotation (IA). IA involves three modules: (1) assisted data annotation, (2) background model training, and (3) active selection of the next datapoints. Under this framework, we open-source the IAdet tool, which is specific for single-class object detection. Additionally, we devise a method for automatically evaluating such a human-in-the-loop system. For the PASCAL VOC dataset, the IAdet tool reduces the database annotation time by $25\%$ while providing a trained model for free. These results are obtained for a deliberately very simple IAdet design. As a consequence, IAdet is susceptible to multiple easy improvements, paving the way for powerful human-in-the-loop object detection systems.
    摘要 本工作提出了一种在标注数据的同时训练模型的策略,称为智能标注(IA)。IA 包括三个模块:(1)辅助数据标注,(2)后台模型训练,(3)主动选择下一个数据点。在该框架下,我们开源了专用于单类目标检测的 IAdet 工具。此外,我们还设计了一种自动评估这类人在回路(human-in-the-loop)系统的方法。在 PASCAL VOC 数据集上,IAdet 工具可以将数据标注时间减少 $25\%$,并免费提供一个训练好的模型。这些结果是在刻意保持 IAdet 设计非常简单的前提下获得的,因此 IAdet 仍有许多易于实现的改进空间,为强大的人在回路目标检测系统铺平了道路。
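
A minimal skeleton of the three-module IA loop, with every component (assisted annotation, detector training, active selection) left as a placeholder callable; this only illustrates the control flow, not IAdet's actual interfaces (see the linked repository for those).

```python
def intelligent_annotation(unlabeled_images, annotate_with_assistance, train_detector, select_next):
    """Hypothetical skeleton of the IA loop described in the abstract:
    (1) assisted annotation, (2) background model training, (3) active selection.
    All callables here are placeholders, not IAdet's actual API."""
    labeled = []
    model = None
    pool = list(unlabeled_images)
    while pool:
        image = select_next(model, pool)                # active selection of the next datapoint
        boxes = annotate_with_assistance(model, image)  # human corrects the model's proposals
        labeled.append((image, boxes))
        pool.remove(image)
        model = train_detector(labeled)                 # background (re)training on all labels so far
    return model, labeled
```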

Optimal and Efficient Binary Questioning for Human-in-the-Loop Annotation

  • paper_url: http://arxiv.org/abs/2307.01578
  • repo_url: None
  • paper_authors: Franco Marchesoni-Acland, Jean-Michel Morel, Josselin Kherroubi, Gabriele Facciolo
  • for: The paper aims to solve the problem of fully annotating a binary classification dataset when a predictor is available.
  • methods: The paper uses a series of optimization strategies and lookahead minimization of proxy cost functions to solve the problem.
  • results: On synthetic and real-world datasets, the proposed method achieves significant improvements (23-86%) in annotation efficiency.
    Abstract Even though data annotation is extremely important for interpretability, research and development of artificial intelligence solutions, most research efforts such as active learning or few-shot learning focus on the sample efficiency problem. This paper studies the neglected complementary problem of getting annotated data given a predictor. For the simple binary classification setting, we present the spectrum ranging from optimal general solutions to practical efficient methods. The problem is framed as the full annotation of a binary classification dataset with the minimal number of yes/no questions when a predictor is available. For the case of general binary questions the solution is found in coding theory, where the optimal questioning strategy is given by the Huffman encoding of the possible labelings. However, this approach is computationally intractable even for small dataset sizes. We propose an alternative practical solution based on several heuristics and lookahead minimization of proxy cost functions. The proposed solution is analysed, compared with optimal solutions and evaluated on several synthetic and real-world datasets. On these datasets, the method allows a significant improvement ($23-86\%$) in annotation efficiency.
    摘要 尽管数据标注对于可解释性以及人工智能方案的研究与开发极为重要,大多数研究工作(如主动学习或少样本学习)关注的都是样本效率问题。本文研究与之互补却被忽视的问题:在已有预测器的情况下获取标注数据。针对简单的二分类场景,我们给出了从最优通用解法到实用高效方法的完整谱系。问题被形式化为:在预测器可用时,用最少的是/否问题完成对二分类数据集的全部标注。对于一般形式的二元问题,最优提问策略可由对所有可能标注方案做 Huffman 编码得到;然而即使数据集规模很小,这种做法在计算上也不可行。我们因此提出一种基于多种启发式规则和对代理成本函数做前瞻最小化的实用方案,并对其进行分析、与最优解比较,并在多个合成与真实数据集上评估。在这些数据集上,该方法可将标注效率提升 23-86%。
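
To make the coding-theory connection concrete, the sketch below builds a Huffman code over a distribution of possible labelings and returns the expected number of yes/no questions. The toy probabilities are invented; the point is that the label space grows as 2^n with n items, which is why this optimal strategy is intractable and the paper resorts to heuristics.

```python
import heapq

def expected_questions(labeling_probs):
    """Build a Huffman code over the possible labelings and return the expected
    number of yes/no questions (the expected codeword length)."""
    heap = [(p, i, 0.0) for i, p in enumerate(labeling_probs)]  # (prob, tiebreak, accumulated length)
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, l1 = heapq.heappop(heap)
        p2, _, l2 = heapq.heappop(heap)
        # Merging two nodes adds one question along every path below them,
        # contributing (p1 + p2) to the expected codeword length.
        heapq.heappush(heap, (p1 + p2, counter, l1 + l2 + p1 + p2))
        counter += 1
    return heap[0][2]

# Toy example: an assumed predictor over 3 items gives 2**3 = 8 labeling probabilities.
probs = [0.4, 0.2, 0.1, 0.1, 0.08, 0.06, 0.04, 0.02]
print(expected_questions(probs))
```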

Conceptual Cognitive Maps Formation with Neural Successor Networks and Word Embeddings

  • paper_url: http://arxiv.org/abs/2307.01577
  • repo_url: None
  • paper_authors: Paul Stoewer, Achim Schilling, Andreas Maier, Patrick Krauss
  • for: 这个论文旨在探讨人工智能中如何利用人脑中的Contextualization能力,以提高人工智能模型的表现。
  • methods: 该论文使用Successor Representation和神经网络,以及word embedding vector,构建了三个不同概念的认知地图。
  • results: 该模型能够学习两种不同的地图级别,并将新信息与相关的先前知识表示相似地分布在认知地图中。
    Abstract The human brain possesses the extraordinary capability to contextualize the information it receives from our environment. The entorhinal-hippocampal plays a critical role in this function, as it is deeply engaged in memory processing and constructing cognitive maps using place and grid cells. Comprehending and leveraging this ability could significantly augment the field of artificial intelligence. The multi-scale successor representation serves as a good model for the functionality of place and grid cells and has already shown promise in this role. Here, we introduce a model that employs successor representations and neural networks, along with word embedding vectors, to construct a cognitive map of three separate concepts. The network adeptly learns two different scaled maps and situates new information in proximity to related pre-existing representations. The dispersion of information across the cognitive map varies according to its scale - either being heavily concentrated, resulting in the formation of the three concepts, or spread evenly throughout the map. We suggest that our model could potentially improve current AI models by providing multi-modal context information to any input, based on a similarity metric for the input and pre-existing knowledge representations.
    摘要 人脑具有Contextualizing信息的杰出能力,即将来自环境的信息Contextualized在我们的认知中。Entorhinal-hippocampal系统在这一功能中扮演关键角色,因为它深入参与记忆处理和构建认知地图,使用Place和Grid维度。理解和利用这种能力可能会大幅提升人工智能领域。我们提出一种使用Successor表示和神经网络,以及Word Embedding向量,构建三个不同概念的认知地图。该网络能够学习两种不同的缩放级别的地图,并将新的信息与相关的先前表示相关联。信息在认知地图中的散布方式因缩放级别而异,可能是集中形成三个概念,或者在整个地图中均匀分布。我们建议,我们的模型可能可以提高当前的人工智能模型,通过为输入提供多modal的上下文信息,基于输入和先前知识表示之间的相似度 metric。
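
For context, the successor representation the paper builds on can be maintained with a simple temporal-difference update; the tabular sketch below shows only that building block (the paper's model additionally uses neural networks and word-embedding inputs). The discount factor and learning rate are arbitrary choices.

```python
import numpy as np

def update_successor(M, s, s_next, gamma=0.95, lr=0.1):
    """One temporal-difference update of a tabular successor representation:
    M[s, s'] approximates the expected discounted future occupancy of s'
    when starting in state s."""
    onehot = np.zeros(M.shape[0])
    onehot[s] = 1.0
    td_error = onehot + gamma * M[s_next] - M[s]
    M[s] += lr * td_error
    return M

n_states = 5
M = np.zeros((n_states, n_states))
M = update_successor(M, s=0, s_next=1)
```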

Machine Learning-Based Intrusion Detection: Feature Selection versus Feature Extraction

  • paper_url: http://arxiv.org/abs/2307.01570
  • repo_url: None
  • paper_authors: Vu-Duc Ngo, Tuan-Cuong Vuong, Thien Van Luong, Hung Tran
  • for: 这种研究旨在比较Feature Selection和Feature Extraction两种方法在网络入侵检测中的性能,以及它们在不同的数据集和分类方式下的运行时间复杂度。
  • methods: 这种研究使用了UNSW-NB15数据集和多种性能指标,如准确率、回归率、检测精度和运行时间复杂度,对Feature Selection和Feature Extraction两种方法进行了比较。
  • results: 研究发现,Feature Selection方法在多数情况下提供了更好的检测性能,同时具有较低的训练和检测时间复杂度。然而,Feature Extraction方法在某些情况下(如K很小)表现更为可靠,并且对K的变化更为敏感。
    Abstract Internet of things (IoT) has been playing an important role in many sectors, such as smart cities, smart agriculture, smart healthcare, and smart manufacturing. However, IoT devices are highly vulnerable to cyber-attacks, which may result in security breaches and data leakages. To effectively prevent these attacks, a variety of machine learning-based network intrusion detection methods for IoT networks have been developed, which often rely on either feature extraction or feature selection techniques for reducing the dimension of input data before being fed into machine learning models. This aims to make the detection complexity low enough for real-time operations, which is particularly vital in any intrusion detection systems. This paper provides a comprehensive comparison between these two feature reduction methods of intrusion detection in terms of various performance metrics, namely, precision rate, recall rate, detection accuracy, as well as runtime complexity, in the presence of the modern UNSW-NB15 dataset as well as both binary and multiclass classification. For example, in general, the feature selection method not only provides better detection performance but also lower training and inference time compared to its feature extraction counterpart, especially when the number of reduced features K increases. However, the feature extraction method is much more reliable than its selection counterpart, particularly when K is very small, such as K = 4. Additionally, feature extraction is less sensitive to changing the number of reduced features K than feature selection, and this holds true for both binary and multiclass classifications. Based on this comparison, we provide a useful guideline for selecting a suitable intrusion detection type for each specific scenario, as detailed in Tab. 14 at the end of Section IV.
    摘要 互联网智能化 (IoT) 在多个领域中扮演着重要角色,如智能城市、智能农业、智能医疗和智能制造。然而,IoT 设备高度易受到黑客攻击,这可能会导致安全泄露和数据泄露。为了有效防止这些攻击,一些基于机器学习的网络入侵检测方法在 IoT 网络中得到了开发,这些方法通常是基于特征抽象或特征选择技术来将输入数据的维度降低到可以被机器学习模型处理的水平。这样可以确保检测的复杂度足够低,以便在实时运行,特别是在任何入侵检测系统中。本文提供了对这两种特征减少方法的入侵检测方法在不同的性能指标下进行了比较,包括精度率、回传率、检测率和运行时间复杂度。例如,通常来说,选择特征方法不仅提供更好的检测性能,而且还比特征抽象方法来的训练和检测时间更短,尤其当K增加时。然而,抽象方法在K很小时(例如K=4)时更加可靠,而且特征选择方法比特征抽象方法更敏感于K的变化。根据这个比较,我们提供了一个实用的指南,可以帮助您在具体情况下选择适合的入侵检测类型,详细可见在表14中。
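
A minimal way to reproduce the style of comparison, assuming a generic selector (SelectKBest with mutual information) and extractor (PCA) with a random-forest classifier on synthetic data; the paper's exact feature reducers, classifier, and UNSW-NB15 preprocessing are not reproduced here.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a flow-feature dataset such as UNSW-NB15 (shapes are assumptions).
X, y = make_classification(n_samples=5000, n_features=40, n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

K = 4  # number of reduced features, as in the paper's smallest setting
reducers = {
    "feature selection (SelectKBest)": SelectKBest(mutual_info_classif, k=K),
    "feature extraction (PCA)": PCA(n_components=K),
}
for name, reducer in reducers.items():
    Z_tr = reducer.fit_transform(X_tr, y_tr)   # PCA ignores y; SelectKBest needs it
    Z_te = reducer.transform(X_te)
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(Z_tr, y_tr)
    print(f"{name:32s} K={K}  accuracy={accuracy_score(y_te, clf.predict(Z_te)):.3f}")
```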

Scalable variable selection for two-view learning tasks with projection operators

  • paper_url: http://arxiv.org/abs/2307.01558
  • repo_url: https://github.com/aalto-ics-kepaco/projse
  • paper_authors: Sandor Szedmak, Riikka Huusari, Tat Hong Duong Le, Juho Rousu
  • for: 提出了一种新的变量选择方法,适用于两视图设置或 vector-valued supervised learning 问题。
  • methods: 使用迭代选择高度相关于输出变量的变量,但与先前选择的变量不相关。使用投影算子和其代数来测量相关性,并可以利用 kernel 函数来表达非线性相关模型。
  • results: 通过实验 validate 了我们的方法,并在真实数据上验证了其扩展性和选择的有用性。
    Abstract In this paper we propose a novel variable selection method for two-view settings, or for vector-valued supervised learning problems. Our framework is able to handle extremely large scale selection tasks, where number of data samples could be even millions. In a nutshell, our method performs variable selection by iteratively selecting variables that are highly correlated with the output variables, but which are not correlated with the previously chosen variables. To measure the correlation, our method uses the concept of projection operators and their algebra. With the projection operators the relationship, correlation, between sets of input and output variables can also be expressed by kernel functions, thus nonlinear correlation models can be exploited as well. We experimentally validate our approach, showing on both synthetic and real data its scalability and the relevance of the selected features. Keywords: Supervised variable selection, vector-valued learning, projection-valued measure, reproducing kernel Hilbert space
    摘要 本文提出了一种新的变量选择方法,适用于双视图设置或向量值监督学习问题。我们的框架能够处理规模极大的选择任务,数据样本数可达数百万。简而言之,该方法通过迭代选择与输出变量高度相关、但与已选变量不相关的变量来完成变量选择。为了度量相关性,方法使用投影算子及其代数;借助投影算子,输入与输出变量集合之间的相关性也可以用核函数表示,从而能够利用非线性相关模型。我们在合成数据和真实数据上验证了该方法,展示了其可扩展性以及所选特征的相关性。关键词:监督变量选择,向量值学习,投影值测度,再生核希尔伯特空间(reproducing kernel Hilbert space)。
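
A toy, purely linear rendering of the greedy idea (select the variable most correlated with the outputs, then project it out before the next round); the paper's kernelized projection-operator machinery is not reproduced, and all names below are illustrative.

```python
import numpy as np

def greedy_projection_selection(X, Y, n_select=5):
    """Toy linear version of the greedy scheme described in the abstract:
    repeatedly pick the input variable most correlated with the outputs, then
    project both the remaining inputs and the outputs onto the orthogonal
    complement of the chosen variable (deflation)."""
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    selected = []
    Xr, Yr = X.copy(), Y.copy()
    for _ in range(n_select):
        norms = np.linalg.norm(Xr, axis=0) + 1e-12
        scores = np.linalg.norm(Xr.T @ Yr, axis=1) / norms   # correlation of each variable with the outputs
        for j_prev in selected:                              # never pick a variable twice
            scores[j_prev] = -np.inf
        j = int(np.argmax(scores))
        selected.append(j)
        v = Xr[:, [j]] / norms[j]                            # unit vector of the chosen variable
        Xr = Xr - v @ (v.T @ Xr)                             # deflate: remove its direction
        Yr = Yr - v @ (v.T @ Yr)
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))
Y = X[:, [3, 7]] + 0.1 * rng.normal(size=(200, 2))
print(greedy_projection_selection(X, Y, n_select=2))  # likely picks columns 3 and 7
```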

Separated RoadTopoFormer

  • paper_url: http://arxiv.org/abs/2307.01557
  • repo_url: None
  • paper_authors: Mingjie Lu, Yuanxian Huang, Ji Liu, Jinzhang Peng, Lu Tian, Ashish Sirasao
  • for: 本研究的目的是提高自动驾驶技术的实现,强调了驾驶场景理解的重要性。
  • methods: 本研究使用了Separated RoadTopoFormer框架,这是一个端到端的框架,可以同时检测路径中的交通元素和车道中心线,以及这些元素之间的关系。
  • results: 本研究的最终提交得分为0.445 OLS,在两个子任务和总分中都具有竞争力。
    Abstract Understanding driving scenarios is crucial to realizing autonomous driving. Previous works such as map learning and BEV lane detection neglect the connection relationship between lane instances, and traffic elements detection tasks usually neglect the relationship with lane lines. To address these issues, the task is presented which includes 4 sub-tasks, the detection of traffic elements, the detection of lane centerlines, reasoning connection relationships among lanes, and reasoning assignment relationships between lanes and traffic elements. We present Separated RoadTopoFormer to tackle the issues, which is an end-to-end framework that detects lane centerline and traffic elements with reasoning relationships among them. We optimize each module separately to prevent interaction with each other and aggregate them together with few finetunes. For two detection heads, we adopted a DETR-like architecture to detect objects, and for the relationship head, we concat two instance features from front detectors and feed them to the classifier to obtain relationship probability. Our final submission achieves 0.445 OLS, which is competitive in both sub-task and combined scores.
    摘要 理解驾驶场景是自动驾驶实现的关键。先前的工作,如地图学习和BEV车道检测,忽略了车道实例之间的连接关系,而交通元素检测任务通常忽略车道线的关系。为解决这些问题,我们提出了一个包含4个子任务的任务,即交通元素检测、车道中心线检测、车道间连接关系的推理和车道和交通元素之间的关系推理。我们提出了分离的路况拟合器(Separated RoadTopoFormer)来解决这些问题,它是一个端到端框架,可以同时检测车道中心线和交通元素,并推理它们之间的关系。我们对每个模块进行独立优化,以避免它们之间的交互,并将它们粗略地融合。为两个检测头,我们采用了一种类似于DETR架构来检测对象,而对于关系头,我们将前两个检测器的实例特征 concatenate 并feed 到分类器来获得关系概率。我们的最终提交得分为0.445 OLS,这在两个子任务和合并分数中都是竞争力强的。

Knowledge Graph for NLG in the context of conversational agents

  • paper_url: http://arxiv.org/abs/2307.01548
  • repo_url: None
  • paper_authors: Hussam Ghanem, Massinissa Atmani, Christophe Cruz
  • for: 本文提供了对知识图(KG)到文本生成的不同架构的回顾,包括图神经网络、图转换器和 seq2seq 模型。
  • methods: 本文讲解了不同架构的优势和局限性,并指出了在实际任务中选择架构的重要性。
  • results: 本文选择了基于 PLM 的 seq2seq 转换器模型来进行知识图到文本生成任务,并计划修改 PLM 上的 kg-to-text 生成数据集,以及在未来的工作中探讨情感和多语言维度。
    Abstract The use of knowledge graphs (KGs) enhances the accuracy and comprehensiveness of the responses provided by a conversational agent. While generating answers during conversations consists in generating text from these KGs, it is still regarded as a challenging task that has gained significant attention in recent years. In this document, we provide a review of different architectures used for knowledge graph-to-text generation including: Graph Neural Networks, the Graph Transformer, and linearization with seq2seq models. We discuss the advantages and limitations of each architecture and conclude that the choice of architecture will depend on the specific requirements of the task at hand. We also highlight the importance of considering constraints such as execution time and model validity, particularly in the context of conversational agents. Based on these constraints and the availability of labeled data for the domains of DAVI, we choose to use seq2seq Transformer-based models (PLMs) for the Knowledge Graph-to-Text Generation task. We aim to refine benchmark datasets of kg-to-text generation on PLMs and to explore the emotional and multilingual dimensions in our future work. Overall, this review provides insights into the different approaches for knowledge graph-to-text generation and outlines future directions for research in this area.
    摘要 使用知识图(KG)可以提高对话机器人的回答准确性和全面性。在生成回答时,从KG中生成文本是一项有趣且复杂的任务,在最近几年内受到了广泛关注。在本文中,我们提供了不同架构的知识图到文本生成评论,包括图神经网络、图转换器和 linearization with seq2seq 模型。我们讨论了每个架构的优点和缺点,并结论认为选择架构取决于任务的具体需求。我们还 highlight了考虑执行时间和模型有效性的重要性,特别在对话机器人的上下文中。基于这些约束和 DAVI 领域的可用标注数据,我们选择使用 Transformer 基于 seq2seq 模型(PLMs)进行知识图到文本生成任务。我们希望可以修改 PLMs 上 kg-to-text 生成的标准数据集,并在未来的工作中探讨情感和多语言维度。总的来说,本文提供了不同的知识图到文本生成方法的概述,并预示了未来这个领域的研究方向。
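
To illustrate the seq2seq-PLM route the authors settle on, the sketch below linearizes a few triples and feeds them to an off-the-shelf T5 checkpoint via Hugging Face transformers. The linearization tags, the prompt prefix, and the t5-small checkpoint are assumptions; without fine-tuning on a KG-to-text dataset the generated text will be poor.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

def linearize(triples):
    """Assumed linearization: '<H> head <R> relation <T> tail' per triple, space-joined."""
    return " ".join(f"<H> {h} <R> {r} <T> {t}" for h, r, t in triples)

tokenizer = AutoTokenizer.from_pretrained("t5-small")        # placeholder PLM, not the paper's model
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

triples = [("Alan Turing", "field", "computer science"),
           ("Alan Turing", "born in", "London")]
inputs = tokenizer("translate Graph to Text: " + linearize(triples), return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```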

Learning to Prompt in the Classroom to Understand AI Limits: A pilot study

  • paper_url: http://arxiv.org/abs/2307.01540
  • repo_url: None
  • paper_authors: Emily Theophilou, Cansu Koyuturk, Mona Yavari, Sathya Bursic, Gregor Donabauer, Alessia Telari, Alessia Testa, Raffaele Boiano, Davinia Hernandez-Leo, Martin Ruskov, Davide Taibi, Alessandro Gabbiadini, Dimitri Ognibene
  • for: 本研究的目的是研究人工智能的acceptance和用途,以及如何将人工智能应用于解决社会问题。
  • methods: 本研究使用了Large Language Models(LLM)和其 derivated chatbots,如ChatGPT,来实现人工智能的自然语言处理能力。
  • results: 研究获得了显著的结果,包括学生对人工智能的评价高,对 chatGPT 的互动质量改善,对人工智能的态度变得更积极,同时更好地理解人工智能的限制。
    Abstract Artificial intelligence's progress holds great promise in assisting society in addressing pressing societal issues. In particular, Large Language Models (LLM) and the derived chatbots, like ChatGPT, have highly improved the natural language processing capabilities of AI systems, allowing them to process an unprecedented amount of unstructured data. The consequent hype has also backfired, raising negative sentiment even after novel AI methods' surprising contributions. One of the causes, but also an important issue per se, is the rising and misleading feeling of being able to access and process any form of knowledge to solve problems in any domain with no effort or previous expertise in AI or the problem domain, disregarding current LLMs' limits, such as hallucinations and reasoning limits. Acknowledging AI fallibility is crucial to address the impact of dogmatic overconfidence in possibly erroneous suggestions generated by LLMs. At the same time, it can reduce fear and other negative attitudes toward AI. AI literacy interventions are necessary that allow the public to understand such LLM limits and learn how to use them in a more effective manner, i.e. learning to "prompt". With this aim, a pilot educational intervention was performed in a high school with 30 students. It involved (i) presenting high-level concepts about intelligence, AI, and LLM, (ii) an initial naive practice with ChatGPT in a non-trivial task, and finally (iii) applying currently-accepted prompting strategies. Encouraging preliminary results have been collected, such as students reporting a) high appreciation of the activity, b) improved quality of the interaction with the LLM during the educational activity, c) decreased negative sentiments toward AI, and d) increased understanding of limitations. We aim to study factors that impact AI acceptance and to refine and repeat this activity in more controlled settings.
    摘要 人工智能的进步具有巨大的承诺,可以帮助社会解决一系列的社会问题。特别是大型自然语言处理模型(LLM)和其 derivated chatbot如ChatGPT,有效提高了人工智能系统的自然语言处理能力,使其能处理前所未有的大量不结构化数据。然而,这种热情也导致了负面情绪的升温,包括误导和过度自信的问题。一个重要的原因是人们对人工智能的训练和应用方面的知识和经验不足,导致他们假设AI系统可以轻松地解决任何问题,不需要专业知识或培训。我们认为,承诺AI系统的有限性是关键,以避免“欺诈式”的过度自信和误导。同时,了解AI系统的限制可以减少对人工智能的恐惧和负面情绪。为此,我们发展了一种AI文化干预措施,以帮助公众更好地理解LLM的限制和如何更有效地使用它们。在这个研究中,我们在一所高中进行了一场教育干预,涉及到(i)介绍智能、AI和LLM的概念,(ii)初步使用ChatGPT完成一项非常困难的任务,以及(iii)应用已知的提示策略。结果表明,学生对这个活动表示高度的欢迎,并且在与LLM的交互中改善了质量,同时减少了对人工智能的负面情绪。我们计划进一步研究这些因素,以便更好地理解AI接受度的影响因素,并在更加控制的环境下重复和改进这种活动。

Anomaly detection in image or latent space of patch-based auto-encoders for industrial image analysis

  • paper_url: http://arxiv.org/abs/2307.02495
  • repo_url: None
  • paper_authors: Nicolas Pinon, Robin Trombetta, Carole Lartizien
  • for: 检测颜色图像中异常点的方法
  • methods: 基于patch-based auto-encoder构建的三种方法: errors between original image and its reconstruction, support estimation of normal image distribution in latent space, and error between original image and restored version of reconstructed image
  • results: 在MVTecAD工业图像数据库上评估和比较三种方法的性能,并与两种现有的状态前方法进行比较
    Abstract We study several methods for detecting anomalies in color images, constructed on patch-based auto-encoders. We compare the performance of three types of methods based, first, on the error between the original image and its reconstruction, second, on the support estimation of the normal image distribution in the latent space, and third, on the error between the original image and a restored version of the reconstructed image. These methods are evaluated on the industrial image database MVTecAD and compared to two competitive state-of-the-art methods.
    摘要 我们研究了一些用于检测颜色图像异常的方法,基于patch-based自适应编码器。我们对三种类型的方法进行比较,分别是根据原始图像与其重建图像之间的错误、在隐藏空间中正常图像分布的支持估计、以及原始图像与重建后的图像之间的错误。这些方法在MVTecAD工业图像数据库上进行评估,并与两种现有的状态艺术方法进行比较。
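
The first family of methods (reconstruction error) can be illustrated with a tiny convolutional auto-encoder and a per-pixel error map, as sketched below; the architecture and sizes are placeholders, not the models evaluated in the paper.

```python
import torch
import torch.nn as nn

class PatchAE(nn.Module):
    """Tiny convolutional auto-encoder standing in for the patch-based models
    compared in the paper (architecture and sizes are illustrative only)."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.dec = nn.Sequential(nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
                                 nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid())
    def forward(self, x):
        return self.dec(self.enc(x))

def reconstruction_anomaly_map(model, image):
    """Per-pixel squared error between the image and its reconstruction;
    high values flag candidate anomalies."""
    with torch.no_grad():
        recon = model(image)
    return ((image - recon) ** 2).mean(dim=1)   # (B, H, W) error map

model = PatchAE()
img = torch.rand(1, 3, 64, 64)
print(reconstruction_anomaly_map(model, img).shape)
```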

Analyzing Intentional Behavior in Autonomous Agents under Uncertainty

  • paper_url: http://arxiv.org/abs/2307.01532
  • repo_url: https://github.com/filipcano/intentional-autonomous-agents
  • paper_authors: Filip Cano Córdoba, Samuel Judson, Timos Antonopoulos, Katrine Bjørner, Nicholas Shoemaker, Scott J. Shapiro, Ruzica Piskac, Bettina Könighofer
  • for: 本研究旨在提供一种量化评估自动决策系统的责任性,以便在不确定环境中进行原则正的决策。
  • methods: 我们使用Markov决策过程(MDP)模型不确定环境,并使用概率模型检查来计算机器人在达到某个事件的能力。我们称这为“职责范围”。我们还使用Counterfactual reasoning来自动生成相关的场景,以提高评估的可靠性。
  • results: 我们通过一个实验示例,证明我们的方法可以区分“意图的”和“巧合的”交通事故。
    Abstract Principled accountability for autonomous decision-making in uncertain environments requires distinguishing intentional outcomes from negligent designs from actual accidents. We propose analyzing the behavior of autonomous agents through a quantitative measure of the evidence of intentional behavior. We model an uncertain environment as a Markov Decision Process (MDP). For a given scenario, we rely on probabilistic model checking to compute the ability of the agent to influence reaching a certain event. We call this the scope of agency. We say that there is evidence of intentional behavior if the scope of agency is high and the decisions of the agent are close to being optimal for reaching the event. Our method applies counterfactual reasoning to automatically generate relevant scenarios that can be analyzed to increase the confidence of our assessment. In a case study, we show how our method can distinguish between 'intentional' and 'accidental' traffic collisions.
    摘要 要在不确定环境中对自主决策进行有原则的问责,就需要区分有意造成的结果、疏忽的设计与真正的意外。我们提出通过一种量化的意图行为证据度量来分析自主智能体的行为。我们将不确定环境建模为马尔可夫决策过程(MDP)。对于给定场景,我们利用概率模型检测来计算智能体促成某一事件发生的能力,并称之为"能动范围"(scope of agency)。如果能动范围较大,且智能体的决策接近达成该事件的最优策略,我们就认为存在意图行为的证据。我们的方法还利用反事实推理自动生成相关场景,以提高评估的置信度。在案例研究中,我们展示了该方法如何区分"有意"与"意外"的交通碰撞。
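
The "scope of agency" builds on the maximal probability of reaching an event in an MDP, the quantity probabilistic model checkers compute; the sketch below is a tiny value-iteration solver for that quantity on a hand-made 3-state MDP, purely to show what is being computed (the paper uses a proper model checker).

```python
import numpy as np

def max_reach_probability(P, target_states, n_iter=500):
    """Value iteration for the maximal probability of eventually reaching a
    target set in an MDP. P has shape (S, A, S) with P[s, a, s'] the
    transition probability; an illustrative solver, not a model checker."""
    S = P.shape[0]
    v = np.zeros(S)
    v[list(target_states)] = 1.0
    for _ in range(n_iter):
        q = P @ v                        # (S, A): value of each action
        new_v = q.max(axis=1)
        new_v[list(target_states)] = 1.0
        if np.max(np.abs(new_v - v)) < 1e-10:
            break
        v = new_v
    return v

# Toy 3-state MDP with 2 actions; state 2 is the target event.
P = np.array([[[0.8, 0.2, 0.0], [0.1, 0.0, 0.9]],
              [[0.0, 1.0, 0.0], [0.0, 0.5, 0.5]],
              [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]])
print(max_reach_probability(P, target_states={2}))
```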

Convolutional Transformer for Autonomous Recognition and Grading of Tomatoes Under Various Lighting, Occlusion, and Ripeness Conditions

  • paper_url: http://arxiv.org/abs/2307.01530
  • repo_url: None
  • paper_authors: Asim Khan, Taimur Hassan, Muhammad Shafay, Israa Fahmy, Naoufel Werghi, Lakmal Seneviratne, Irfan Hussain
  • for: Tomatoes are harvested using mobile robots in real-world scenarios, but this is challenging due to factors such as occlusion and color similarity between the fruit and the surrounding foliage.
  • methods: The proposed framework uses a convolutional transformer architecture to autonomously recognize and grade tomatoes, regardless of occlusion level, lighting conditions, and ripeness.
  • results: The proposed framework outperforms existing methods by 58.14%, 65.42%, and 66.39% in terms of mean average precision scores on three different datasets.
    Abstract Harvesting fully ripe tomatoes with mobile robots presents significant challenges in real-world scenarios. These challenges arise from factors such as occlusion caused by leaves and branches, as well as the color similarity between tomatoes and the surrounding foliage during the fruit development stage. The natural environment further compounds these issues with varying light conditions, viewing angles, occlusion factors, and different maturity levels. To overcome these obstacles, this research introduces a novel framework that leverages a convolutional transformer architecture to autonomously recognize and grade tomatoes, irrespective of their occlusion level, lighting conditions, and ripeness. The proposed model is trained and tested using carefully annotated images curated specifically for this purpose. The dataset is prepared under various lighting conditions, viewing perspectives, and employs different mobile camera sensors, distinguishing it from existing datasets such as Laboro Tomato and Rob2Pheno Annotated Tomato. The effectiveness of the proposed framework in handling cluttered and occluded tomato instances was evaluated using two additional public datasets, Laboro Tomato and Rob2Pheno Annotated Tomato, as benchmarks. The evaluation results across these three datasets demonstrate the exceptional performance of our proposed framework, surpassing the state-of-the-art by 58.14%, 65.42%, and 66.39% in terms of mean average precision scores for KUTomaData, Laboro Tomato, and Rob2Pheno Annotated Tomato, respectively. The results underscore the superiority of the proposed model in accurately detecting and delineating tomatoes compared to baseline methods and previous approaches. Specifically, the model achieves an F1-score of 80.14%, a Dice coefficient of 73.26%, and a mean IoU of 66.41% on the KUTomaData image dataset.
    摘要 摘取全熟 Tomatoes 的 mobile robot 在实际应用场景中具有显著的挑战。这些挑战来自于叶子和枝条所引起的遮挡,以及 Tomatoes 和周围的植物发育阶段的颜色相似性。自然环境更进一步复杂这些问题,包括不同的照明条件、视角、遮挡因子和不同的成熟度。为了解决这些问题,这项研究提出了一种新的框架,利用卷积变换器架构来自动认识和评分 Tomatoes,不受遮挡、照明条件和成熟度的影响。该模型被训练和测试使用特别为这个目的制作的精心注释图像集。该数据集在不同的照明条件和视角下准备,并使用不同的移动摄像头感知器,与现有的数据集不同。为了评估提案的效果,我们使用了两个额外的公共数据集,即 Laboro Tomato 和 Rob2Pheno Annotated Tomato,作为标准。测试结果表明,提案的框架在处理受遮挡和受遮挡的 Tomatoes 实例方面表现出色,相比基eline方法和前一代方法,提案的模型在三个数据集上的mean average precision分数上出色,高于state-of-the-art的58.14%、65.42%和66.39%。结果表明,提案的模型在识别和定义 Tomatoes 方面具有显著的优势,具体来说,模型在 KUTomaData 图像集上达到了80.14%的F1分数、73.26%的Dice系数和66.41%的mean IoU。
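
For reference, the segmentation metrics quoted above (Dice coefficient and IoU) can be computed from binary masks as in the generic snippet below; this is standard bookkeeping, not code from the paper.

```python
import numpy as np

def dice_and_iou(pred_mask, gt_mask):
    """Dice coefficient and IoU for binary segmentation masks, the kind of
    metrics reported for tomato delineation."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    dice = 2 * inter / (pred.sum() + gt.sum() + 1e-9)
    iou = inter / (np.logical_or(pred, gt).sum() + 1e-9)
    return dice, iou

pred = np.zeros((8, 8)); pred[2:6, 2:6] = 1
gt = np.zeros((8, 8)); gt[3:7, 3:7] = 1
print(dice_and_iou(pred, gt))
```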

LEAT: Towards Robust Deepfake Disruption in Real-World Scenarios via Latent Ensemble Attack

  • paper_url: http://arxiv.org/abs/2307.01520
  • repo_url: None
  • paper_authors: Joonkyo Shim, Hyunsoo Yoon
  • for: 防止深伪(deepfake)威胁, recent studies 使用 adversarial perturbation 来攻击深伪模型的输出。
  • methods: 我们提出了一个简单又有效的攻击方法,called Latent Ensemble ATtack (LEAT),它攻击独立的潜在编码过程,从而生成不同于目标属性的复杂的出力图像。
  • results: 我们的方法在实际应用中具有更高的防护成功率,比之前的方法更加有效。
    Abstract Deepfakes, malicious visual contents created by generative models, pose an increasingly harmful threat to society. To proactively mitigate deepfake damages, recent studies have employed adversarial perturbation to disrupt deepfake model outputs. However, previous approaches primarily focus on generating distorted outputs based on only predetermined target attributes, leading to a lack of robustness in real-world scenarios where target attributes are unknown. Additionally, the transferability of perturbations between two prominent generative models, Generative Adversarial Networks (GANs) and Diffusion Models, remains unexplored. In this paper, we emphasize the importance of target attribute-transferability and model-transferability for achieving robust deepfake disruption. To address this challenge, we propose a simple yet effective disruption method called Latent Ensemble ATtack (LEAT), which attacks the independent latent encoding process. By disrupting the latent encoding process, it generates distorted output images in subsequent generation processes, regardless of the given target attributes. This target attribute-agnostic attack ensures robust disruption even when the target attributes are unknown. Additionally, we introduce a Normalized Gradient Ensemble strategy that effectively aggregates gradients for iterative gradient attacks, enabling simultaneous attacks on various types of deepfake models, involving both GAN-based and Diffusion-based models. Moreover, we demonstrate the insufficiency of evaluating disruption quality solely based on pixel-level differences. As a result, we propose an alternative protocol for comprehensively evaluating the success of defense. Extensive experiments confirm the efficacy of our method in disrupting deepfakes in real-world scenarios, reporting a higher defense success rate compared to previous methods.
    摘要 深刻复制(Deepfakes),由生成模型生成的恶意视觉内容,对社会 pose 危害性的增加。为了积极防止深刻复制害处, latest studies have employed adversarial perturbation to disrupt deepfake model outputs. However, previous approaches primarily focus on generating distorted outputs based on predetermined target attributes, leading to a lack of robustness in real-world scenarios where target attributes are unknown. Additionally, the transferability of perturbations between two prominent generative models, 生成 adversarial Networks (GANs) and Diffusion Models, remains unexplored. In this paper, we emphasize the importance of target attribute-transferability and model-transferability for achieving robust deepfake disruption. To address this challenge, we propose a simple yet effective disruption method called Latent Ensemble ATtack (LEAT), which attacks the independent latent encoding process. By disrupting the latent encoding process, it generates distorted output images in subsequent generation processes, regardless of the given target attributes. This target attribute-agnostic attack ensures robust disruption even when the target attributes are unknown. Additionally, we introduce a Normalized Gradient Ensemble strategy that effectively aggregates gradients for iterative gradient attacks, enabling simultaneous attacks on various types of deepfake models, involving both GAN-based and Diffusion-based models. Moreover, we demonstrate the insufficiency of evaluating disruption quality solely based on pixel-level differences. As a result, we propose an alternative protocol for comprehensively evaluating the success of defense. Extensive experiments confirm the efficacy of our method in disrupting deepfakes in real-world scenarios, reporting a higher defense success rate compared to previous methods.

Deep Attention Q-Network for Personalized Treatment Recommendation

  • paper_url: http://arxiv.org/abs/2307.01519
  • repo_url: https://github.com/stevenmsm/rl-icu-daqn
  • paper_authors: Simin Ma, Junghwan Lee, Nicoleta Serban, Shihao Yang
  • for: 这研究旨在提出个性化治疗建议的新方法,以提高医疗结果。
  • methods: 该研究使用深度注意力Q网络, combinig transformer架构和深度强化学习框架,fficiently incorporate all past patient observations。
  • results: 研究在实际的 septic shock 和急性低血压 cohorts 中展示了其超过当前状态艺术模型的优势。
    Abstract Tailoring treatment for individual patients is crucial yet challenging in order to achieve optimal healthcare outcomes. Recent advances in reinforcement learning offer promising personalized treatment recommendations; however, they rely solely on current patient observations (vital signs, demographics) as the patient's state, which may not accurately represent the true health status of the patient. This limitation hampers policy learning and evaluation, ultimately limiting treatment effectiveness. In this study, we propose the Deep Attention Q-Network for personalized treatment recommendations, utilizing the Transformer architecture within a deep reinforcement learning framework to efficiently incorporate all past patient observations. We evaluated the model on real-world sepsis and acute hypotension cohorts, demonstrating its superiority to state-of-the-art models. The source code for our model is available at https://github.com/stevenmsm/RL-ICU-DAQN.
    摘要 个人化治疗是现代医疗的关键,但同时也是非常具有挑战性,以达到最佳医疗结果。最新的增强学习技术具有个人化治疗建议的承诺,但是它们只是基于当前患者的观察(生命 Parameters、人口)来判断患者的状态,这可能并不准确地反映患者的真实健康状况。这种限制会阻碍策略学习和评估,从而限制治疗的效果。在这项研究中,我们提出了深度注意力Q网络,利用转换器架构在深度增强学习框架中高效地包含所有过去患者的观察。我们对实际世界的 septic shock 和急性低血压群体进行了评估,并证明了我们的模型的优越性。模型的源代码可以在https://github.com/stevenmsm/RL-ICU-DAQN上获取。

All in One: Multi-task Prompting for Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2307.01504
  • repo_url: https://github.com/sheldonresearch/ProG
  • paper_authors: Xiangguo Sun, Hong Cheng, Jia Li, Bo Liu, Jihong Guan
  • for: 填充预训练模型的知识空间,以提高graph任务的性能。
  • methods: 提出了一种多任务提问方法,通过统一格式、语言提问的拓展和下游任务的改进,使得自然语言处理中的提问思想可以轻松地应用于图领域。
  • results: 经过广泛的实验,结果表明该方法能够提高graph任务的性能,并且可以适应不同的任务。
    Abstract Recently, ''pre-training and fine-tuning'' has been adopted as a standard workflow for many graph tasks since it can take general graph knowledge to relieve the lack of graph annotations from each application. However, graph tasks with node level, edge level, and graph level are far diversified, making the pre-training pretext often incompatible with these multiple tasks. This gap may even cause a ''negative transfer'' to the specific application, leading to poor results. Inspired by the prompt learning in natural language processing (NLP), which has presented significant effectiveness in leveraging prior knowledge for various NLP tasks, we study the prompting topic for graphs with the motivation of filling the gap between pre-trained models and various graph tasks. In this paper, we propose a novel multi-task prompting method for graph models. Specifically, we first unify the format of graph prompts and language prompts with the prompt token, token structure, and inserting pattern. In this way, the prompting idea from NLP can be seamlessly introduced to the graph area. Then, to further narrow the gap between various graph tasks and state-of-the-art pre-training strategies, we further study the task space of various graph applications and reformulate downstream problems to the graph-level task. Afterward, we introduce meta-learning to efficiently learn a better initialization for the multi-task prompt of graphs so that our prompting framework can be more reliable and general for different tasks. We conduct extensive experiments, results from which demonstrate the superiority of our method.
    摘要 近期,“预训练和精度调整”被广泛采用为许多图像任务的标准工作流程,因为它可以将通用的图像知识传递给不同应用程序,填补每个应用程序的图像缺失。然而,图像任务中的节点水平、边水平和图像水平具有很大的多样性,这使得预训练预测经常与这些多种任务不兼容。这种差距甚至可能导致特定应用程序的负面传播,从而导致Results poor。 inspirited by the prompt learning in natural language processing (NLP), which has shown significant effectiveness in leveraging prior knowledge for various NLP tasks, we investigate the prompting topic for graphs with the motivation of filling the gap between pre-trained models and various graph tasks. In this paper, we propose a novel multi-task prompting method for graph models. Specifically, we first unify the format of graph prompts and language prompts with the prompt token, token structure, and inserting pattern. In this way, the prompting idea from NLP can be seamlessly introduced to the graph area. Then, to further narrow the gap between various graph tasks and state-of-the-art pre-training strategies, we further study the task space of various graph applications and reformulate downstream problems to the graph-level task. Afterward, we introduce meta-learning to efficiently learn a better initialization for the multi-task prompt of graphs so that our prompting framework can be more reliable and general for different tasks. We conduct extensive experiments, results from which demonstrate the superiority of our method.

HEDI: First-Time Clinical Application and Results of a Biomechanical Evaluation and Visualisation Tool for Incisional Hernia Repair

  • paper_url: http://arxiv.org/abs/2307.01502
  • repo_url: None
  • paper_authors: Jacob J. Relle, Samuel Voß, Ramesch Raschidi, Regine Nessel, Johannes Görich, Mark O. Wielpütz, Thorsten Löffler, Vincent Heuveline, Friedrich Kallinowski, Philipp D. Lösel
  • for: Treatment of abdominal wall defects to reduce pain, discomfort, and the risk of repeated surgical repairs
  • methods: Use of biomechanical methods that consider muscle activation, intra-abdominal pressure, tissue elasticity, and abdominal wall distention to improve treatment outcomes
  • results: Significantly improved success rates in the first clinical application of HEDI in the preoperative evaluation of 31 patients, with all patients remaining pain-free and showing no hernia recurrence after three years of follow-up.
    Abstract Abdominal wall defects often lead to pain, discomfort, and recurrence of incisional hernias, resulting in significant morbidity and repeated surgical repairs worldwide. Mesh repair for large hernias is usually based on the defect area with a fixed overlap, without considering biomechanical aspects such as muscle activation, intra-abdominal pressure, tissue elasticity, and abdominal wall distention. To address this issue, we present a biomechanical approach to incisional hernia repair that takes into account the unstable abdominal wall. Additionally, we introduce HEDI, a tool that uses dynamic computed tomography with Valsalva maneuver to automatically detect and assess hernia size, volume, and abdominal wall instability. Our first clinical application of HEDI in the preoperative evaluation of 31 patients shows significantly improved success rates compared to reported rates, with all patients remaining pain-free and showing no hernia recurrence after three years of follow-up.
    摘要 腹壁缺损常导致疼痛、不适以及切口疝复发,在全球范围内造成显著的并发症负担和反复的手术修补。对大型疝的补片修补通常只依据缺损面积并采用固定的重叠范围,而不考虑肌肉激活、腹内压、组织弹性和腹壁扩张等生物力学因素。针对这一问题,我们提出一种考虑腹壁不稳定性的切口疝修复生物力学方法。此外,我们还介绍了 HEDI 工具,它利用带 Valsalva 动作的动态 CT 自动检测并评估疝的大小、体积和腹壁不稳定性。在 31 名患者的术前评估中首次临床应用 HEDI 的结果显示,其成功率显著高于已报道的水平,所有患者在三年随访中均无疼痛且无疝复发。

FREEDOM: Target Label & Source Data & Domain Information-Free Multi-Source Domain Adaptation for Unsupervised Personalization

  • paper_url: http://arxiv.org/abs/2307.02493
  • repo_url: None
  • paper_authors: Eunju Yang, Gyusang Cho, Chan-Hyun Youn
  • for: 本研究旨在提出一种实用的多源领域适应(MSDA)问题scenario,以适应部署模型到客户端的数据集。
  • methods: 本研究提出了一种新的适应问题scenario,称为Three-Free Domain Adaptation(TFDA),其中目标标签、源数据集和源领域信息(领域标签和领域数量)都不可用。为解决这种问题scenario,我们提出了一种实用的适应框架called FREEDOM。
  • results: FREEDOM可以在无源领域信息的情况下实现state-of-the-art或相当的性能,同时减少了最终模型的大小。此外,FREEDOM可以独立于源领域数量进行部署。
    Abstract From a service perspective, Multi-Source Domain Adaptation (MSDA) is a promising scenario to adapt a deployed model to a client's dataset. It can provide adaptation without a target label and support the case where a source dataset is constructed from multiple domains. However, it is impractical, wherein its training heavily relies on prior domain information of the multi-source dataset -- how many domains exist and the domain label of each data sample. Moreover, MSDA requires both source and target datasets simultaneously (physically), causing storage limitations on the client device or data privacy issues by transferring client data to a server. For a more practical scenario of model adaptation from a service provider's point of view, we relax these constraints and present a novel problem scenario of Three-Free Domain Adaptation, namely TFDA, where 1) target labels, 2) source dataset, and mostly 3) source domain information (domain labels + the number of domains) are unavailable. Under the problem scenario, we propose a practical adaptation framework called FREEDOM. It leverages the power of the generative model, disentangling data into class and style aspects, where the style is defined as the class-independent information from the source data and designed with a nonparametric Bayesian approach. In the adaptation stage, FREEDOM aims to match the source class distribution with the target's under the philosophy that class distribution is consistent even if the style is different; after then, only part of the classification model is deployed as a personalized network. As a result, FREEDOM achieves state-of-the-art or comparable performance even without domain information, with reduced final model size on the target side, independent of the number of source domains.
    摘要 从服务方面来看,多源领域适应(MSDA)是一个有前途的enario,可以将部署的模型适应到客户的资料集。它可以无需目标标签进行适应,并且可以处理多个领域的资料集合建构的情况。然而,MSDA的训练 heavily rely on多个领域的领域信息(每个数据标签),并且需要源和目标资料集同时存在(physically),导致客户设备限制或资料隐私问题。为了更实际的模型适应方案,我们将这些限制放宽,并提出了一个新的问题场景:三自领域适应(TFDA),其中1)目标标签、2)源资料集和3)源领域信息(领域标签和领域数量)都不可用。在这个问题场景下,我们提出了一个实用的适应框架called FREEDOM。它利用了生成模型的力量,将数据分解为类别和风格的两个方面,其中风格是源数据中的class独立信息,通过非 Parametric Bayesian方法设计。在适应阶段,FREEDOM的目标是将源类别分布与目标的类别分布相对consistent,然后仅部署一部分的分类模型为个人化网络。因此,FREEDOM可以实现state-of-the-art或相等的性能,而不需要领域信息,并且对数据集大小进行干扰。

A Bibliographic Study on Artificial Intelligence Research: Global Panorama and Indian Appearance

  • paper_url: http://arxiv.org/abs/2308.00705
  • repo_url: None
  • paper_authors: Amit Tiwari, Susmita Bardhan, Vikas Kumar
  • for: 本研究用科学映射方法对2015-2020年的人工智能(AI)研究进行了 bibliometric 分析,以了解AI研究的发展趋势。
  • methods: 本研究使用了Scopus数据库收集必要的数据,并对数据进行了手动和工具(OpenRefine)的数据转换,以便进行分析。
  • results: 研究发现,在2015-2020年间,开放获取和商业刊物的AI研究量相对较高,IEEE是最主要的出版商,发表了84%的最高引用文章。此外,中国和美国是AI领域的主要贡献国。研究还发现, neural networks和深度学习是AI研究中最主要的话题。最后,研究发现,不仅公共机构,私人机构也在投入AI研究。
    Abstract The present study identifies and assesses the bibliographic trend in Artificial Intelligence (AI) research for the years 2015-2020 using the science mapping method of bibliometric study. The required data has been collected from the Scopus database. To make the collected data analysis-ready, essential data transformation was performed manually and with the help of a tool viz. OpenRefine. For determining the trend and performing the mapping techniques, top five open access and commercial journals of AI have been chosen based on their citescore driven ranking. The work includes 6880 articles published in the specified period for analysis. The trend is based on Country-wise publications, year-wise publications, topical terms in AI, top-cited articles, prominent authors, major institutions, involvement of industries in AI and Indian appearance. The results show that compared to open access journals; commercial journals have a higher citescore and number of articles published over the years. Additionally, IEEE is the prominent publisher which publishes 84% of the top-cited publications. Further, China and the United States are the major contributors to literature in the AI domain. The study reveals that neural networks and deep learning are the major topics included in top AI research publications. Recently, not only public institutions but also private bodies are investing their resources in AI research. The study also investigates the relative position of Indian researchers in terms of AI research. Present work helps in understanding the initial development, current stand and future direction of AI.
    摘要
  1. Country-wise publications: China and the United States are the major contributors to AI literature.
  2. Year-wise publications: There has been a steady increase in the number of publications over the years.
  3. Topical terms in AI: Neural networks and deep learning are the major topics included in top AI research publications.
  4. Top-cited articles: IEEE publishes 84% of the top-cited publications.
  5. Prominent authors: The study reveals that there are several prominent authors in the field of AI.
  6. Major institutions: The study shows that China and the United States are the major contributors to AI research.
  7. Involvement of industries in AI: Private bodies are investing their resources in AI research, in addition to public institutions.
  8. Indian appearance: The study investigates the relative position of Indian researchers in terms of AI research.
  The present work provides an understanding of the initial development, current stand, and future direction of AI research.

Mitigating Bias: Enhancing Image Classification by Improving Model Explanations

  • paper_url: http://arxiv.org/abs/2307.01473
  • repo_url: None
  • paper_authors: Raha Ahmadi, Mohammad Javad Rajabi, Mohammad Khalooie, Mohammad Sabokrou
  • for: 提高图像分类器的主要概念理解和表示,增强模型对图像中主要元素的理解。
  • methods: 提出了一种新的方法,通过同时引导模型对前景进行注意力调控,使模型更好地捕捉图像中的主要概念。
  • results: 通过对标准数据集进行广泛的实验,证明了该方法的有效性,提高了图像分类器的准确率。
    Abstract Deep learning models have demonstrated remarkable capabilities in learning complex patterns and concepts from training data. However, recent findings indicate that these models tend to rely heavily on simple and easily discernible features present in the background of images rather than the main concepts or objects they are intended to classify. This phenomenon poses a challenge to image classifiers as the crucial elements of interest in images may be overshadowed. In this paper, we propose a novel approach to address this issue and improve the learning of main concepts by image classifiers. Our central idea revolves around concurrently guiding the model's attention toward the foreground during the classification task. By emphasizing the foreground, which encapsulates the primary objects of interest, we aim to shift the focus of the model away from the dominant influence of the background. To accomplish this, we introduce a mechanism that encourages the model to allocate sufficient attention to the foreground. We investigate various strategies, including modifying the loss function or incorporating additional architectural components, to enable the classifier to effectively capture the primary concept within an image. Additionally, we explore the impact of different foreground attention mechanisms on model performance and provide insights into their effectiveness. Through extensive experimentation on benchmark datasets, we demonstrate the efficacy of our proposed approach in improving the classification accuracy of image classifiers. Our findings highlight the importance of foreground attention in enhancing model understanding and representation of the main concepts within images. The results of this study contribute to advancing the field of image classification and provide valuable insights for developing more robust and accurate deep-learning models.
    摘要 深度学习模型已经展现出了学习复杂模式和概念的Remarkable能力。然而,最近的发现表明这些模型在训练数据中很可能会依赖于图像的背景中的简单和易于识别的特征,而不是主要的概念或 объек。这种情况会导致图像分类器的挑战,因为图像中的关键元素可能会被遮盖。在这篇论文中,我们提出了一种新的方法来解决这个问题,并提高图像分类器的学习。我们的中心思想是在分类任务中同时引导模型的注意力向前景方向。通过强调前景,我们希望使模型忽略背景的主导性的影响。为实现这一点,我们引入了一种机制,使得模型能够充分分配注意力于前景。我们 investigate了多种策略,包括修改损失函数或添加额外的建筑 комponents,以使模型能够有效地捕捉图像中的主要概念。此外,我们还探讨了不同的前景注意力机制对模型性能的影响,并提供了有价值的发现。通过对标准数据集进行广泛的实验,我们证明了我们提出的方法的可行性和效果。我们的发现指出了图像分类器中的前景注意力对模型理解和图像中主要概念的表示的重要性。这些发现对图像分类领域的进一步发展具有重要意义,并为开发更加稳定和准确的深度学习模型提供了有价值的发现。

Beyond Conservatism: Diffusion Policies in Offline Multi-agent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.01472
  • repo_url: None
  • paper_authors: Zhuoran Li, Ling Pan, Longbo Huang
  • For: 提出了一种新的扩散离线多智能体模型(DOM2),用于离线多智能体强化学习(MARL)。
  • Methods: 在策略网络中集成了扩散模型,并提出了一种基于轨迹的数据增强方案,以提高策略的表达力和多样性。
  • Results: 在多智能体 particle 和多智能体 MuJoCo 环境上进行了广泛的实验,结果显示 DOM2 在环境发生变化时的泛化能力和表现优于现有方法;此外,DOM2 在数据效率方面也具有优势,仅用现有算法约二十分之一的数据即可达到同等性能水平。
    Abstract We present a novel Diffusion Offline Multi-agent Model (DOM2) for offline Multi-Agent Reinforcement Learning (MARL). Different from existing algorithms that rely mainly on conservatism in policy design, DOM2 enhances policy expressiveness and diversity based on diffusion. Specifically, we incorporate a diffusion model into the policy network and propose a trajectory-based data-augmentation scheme in training. These key ingredients make our algorithm more robust to environment changes and achieve significant improvements in performance, generalization and data-efficiency. Our extensive experimental results demonstrate that DOM2 outperforms existing state-of-the-art methods in multi-agent particle and multi-agent MuJoCo environments, and generalizes significantly better in shifted environments thanks to its high expressiveness and diversity. Furthermore, DOM2 shows superior data efficiency and can achieve state-of-the-art performance with $20+$ times less data compared to existing algorithms.
    摘要 我们提出了一种新的扩散离线多智能体模型(DOM2),用于离线多智能体强化学习(MARL)。与现有主要依赖保守性策略设计的算法不同,DOM2 基于扩散模型来增强策略的表达力和多样性。具体而言,我们将扩散模型集成到策略网络中,并提出了一种基于轨迹的数据增强方案。这些关键设计使我们的算法对环境变化更加鲁棒,并在性能、泛化能力和数据效率上取得显著提升。大量实验结果表明,DOM2 在多智能体 particle 和多智能体 MuJoCo 环境中优于现有最先进方法,并且凭借其高表达力和多样性,在发生偏移的环境中泛化得更好。此外,DOM2 还表现出更高的数据效率,仅用现有算法约二十分之一($20+$ 倍更少)的数据即可达到最先进性能。

A multilevel framework for AI governance

  • paper_url: http://arxiv.org/abs/2307.03198
  • repo_url: None
  • paper_authors: Hyesun Choung, Prabu David, John S. Seberger
  • for: 本研究旨在发展一种基于伦理和基本人类价值观的AI治理框架,以实现AI的潜在好荣和风险减少。
  • methods: 本研究使用多级治理方法,包括政府、企业和公民三个层次的潜在利益相互关系,以及三个维度的信任(能力、完整性和善良)。
  • results: 本研究提供了实用的洞察,可以用于进一步提高用户体验和指导AI相关公共政策。
    Abstract To realize the potential benefits and mitigate potential risks of AI, it is necessary to develop a framework of governance that conforms to ethics and fundamental human values. Although several organizations have issued guidelines and ethical frameworks for trustworthy AI, without a mediating governance structure, these ethical principles will not translate into practice. In this paper, we propose a multilevel governance approach that involves three groups of interdependent stakeholders: governments, corporations, and citizens. We examine their interrelationships through dimensions of trust, such as competence, integrity, and benevolence. The levels of governance combined with the dimensions of trust in AI provide practical insights that can be used to further enhance user experiences and inform public policy related to AI.
    摘要 为了实现人工智能的潜在优势并mitigate其潜在风险,需要建立一个遵循伦理和基本人类价值观的管理框架。虽然一些组织已经发布了信任worthy AI的指南和伦理体系,但无法mediating governance结构,这些伦理原则将不会在实践中传递。在这篇论文中,我们提议一种多级管理方法,其包括三个相互依赖的团队:政府、企业和公民。我们研究这些团队之间的关系通过信任的维度,如能力、完整性和好意。这些管理层结合了信任的维度,可以为AI的用户经验提供实用的洞察,同时也可以为AI相关的公共政策提供指导。

Causal Reinforcement Learning: A Survey

  • paper_url: http://arxiv.org/abs/2307.01452
  • repo_url: https://github.com/Aryia-Behroziuan/References
  • paper_authors: Zhihong Deng, Jing Jiang, Guodong Long, Chengqi Zhang
  • for: 本文主要是为了介绍 causal reinforcement learning,一种增强现有算法的措施,通过 incorporating causal relationships 来增强知识传递效果。
  • methods: 本文首先介绍了 causality 和 reinforcement learning 的基本概念,然后解释了如何通过 causality Address core challenges in non-causal reinforcement learning。最后,文章系统地审视了现有的 causal reinforcement learning 方法,根据其 Target problems 和方法ologies 进行分类。
  • results: 本文提供了一个 comprehensive review of the literature on causal reinforcement learning,包括了现有的方法和技术,以及未来的开发方向。
    Abstract Reinforcement learning is an essential paradigm for solving sequential decision problems under uncertainty. Despite many remarkable achievements in recent decades, applying reinforcement learning methods in the real world remains challenging. One of the main obstacles is that reinforcement learning agents lack a fundamental understanding of the world and must therefore learn from scratch through numerous trial-and-error interactions. They may also face challenges in providing explanations for their decisions and generalizing the acquired knowledge. Causality, however, offers a notable advantage as it can formalize knowledge in a systematic manner and leverage invariance for effective knowledge transfer. This has led to the emergence of causal reinforcement learning, a subfield of reinforcement learning that seeks to enhance existing algorithms by incorporating causal relationships into the learning process. In this survey, we comprehensively review the literature on causal reinforcement learning. We first introduce the basic concepts of causality and reinforcement learning, and then explain how causality can address core challenges in non-causal reinforcement learning. We categorize and systematically review existing causal reinforcement learning approaches based on their target problems and methodologies. Finally, we outline open issues and future directions in this emerging field.
    摘要 强化学习是在不确定性下求解序列决策问题的重要范式。尽管最近几十年取得了许多引人注目的成就,但在真实世界中应用强化学习方法仍然充满挑战。主要障碍之一是强化学习智能体缺乏对世界的基本理解,因而必须通过大量的试错交互从零开始学习;它们还可能难以对自身决策给出解释,以及难以将习得的知识泛化。而因果性在这方面具有显著优势:它能够以系统化的方式形式化知识,并利用不变性实现有效的知识迁移。这促成了因果强化学习这一子领域的出现,其目标是通过在学习过程中引入因果关系来增强现有算法。在这篇综述中,我们全面回顾了因果强化学习的相关文献:首先介绍因果性与强化学习的基本概念,然后解释因果性如何解决非因果强化学习中的核心挑战;接着按照目标问题和方法论对现有的因果强化学习方法进行分类和系统综述;最后,我们总结了这一新兴领域中的开放问题与未来方向。

A Double Machine Learning Approach to Combining Experimental and Observational Data

  • paper_url: http://arxiv.org/abs/2307.01449
  • repo_url: None
  • paper_authors: Marco Morucci, Vittorio Orlandi, Harsh Parikh, Sudeepa Roy, Cynthia Rudin, Alexander Volfovsky
  • for: combines experimental and observational studies to test for assumption violations and estimate treatment effects consistently
  • methods: double machine learning approach, tests for violations of external validity and ignorability under milder assumptions, semi-parametrically efficient treatment effect estimators
  • results: demonstrated in three real-world case studies, relevant for practical settings
    Abstract Experimental and observational studies often lack validity due to untestable assumptions. We propose a double machine learning approach to combine experimental and observational studies, allowing practitioners to test for assumption violations and estimate treatment effects consistently. Our framework tests for violations of external validity and ignorability under milder assumptions. When only one assumption is violated, we provide semi-parametrically efficient treatment effect estimators. However, our no-free-lunch theorem highlights the necessity of accurately identifying the violated assumption for consistent treatment effect estimation. We demonstrate the applicability of our approach in three real-world case studies, highlighting its relevance for practical settings.
    摘要 由于存在无法检验的假设,实验研究和观察研究常常缺乏效度。我们提出一种双重机器学习方法,将实验研究与观察研究结合起来,使研究者能够检验假设是否被违反,并一致地估计处理效应。我们的框架可以在更弱的假设下检验外部效度和可忽略性是否被违反;当只有一个假设被违反时,我们给出了半参数有效的处理效应估计量。不过,我们的"无免费午餐"定理表明,要获得一致的处理效应估计,必须准确识别出被违反的是哪一个假设。我们在三个真实世界案例研究中展示了该方法的适用性,突出其对实际场景的价值。
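
As background for the "double machine learning" terminology, the sketch below is a generic cross-fitted partialling-out estimator of a constant treatment effect using random forests as nuisance models; it is not the paper's estimator for combining experimental and observational data, and the toy data-generating process is invented.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import KFold

def dml_ate(X, t, y, n_splits=2, seed=0):
    """Cross-fitted partialling-out estimator: regress out X from both the
    treatment and the outcome with ML nuisance models, then regress the
    outcome residuals on the treatment residuals."""
    res_t = np.zeros_like(t, dtype=float)
    res_y = np.zeros_like(y, dtype=float)
    for train, test in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        m_t = RandomForestClassifier(n_estimators=200, random_state=seed).fit(X[train], t[train])
        m_y = RandomForestRegressor(n_estimators=200, random_state=seed).fit(X[train], y[train])
        res_t[test] = t[test] - m_t.predict_proba(X[test])[:, 1]   # treatment residuals
        res_y[test] = y[test] - m_y.predict(X[test])               # outcome residuals
    return float(np.dot(res_t, res_y) / np.dot(res_t, res_t))

# Toy data with a true effect of 2.0 (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
t = (rng.random(2000) < 1 / (1 + np.exp(-X[:, 0]))).astype(int)
y = 2.0 * t + X[:, 0] + rng.normal(size=2000)
print(dml_ate(X, t, y))
```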

TablEye: Seeing small Tables through the Lens of Images

  • paper_url: http://arxiv.org/abs/2307.02491
  • repo_url: None
  • paper_authors: Seung-eon Lee, Sang-Chul Lee
  • for: This work explores few-shot tabular learning, particularly in settings where labeled tabular data is scarce.
  • methods: Proposes a framework called TablEye that alleviates the limitations of tabular data through domain transformation, generating tabular images and reusing proven few-shot learning algorithms and embedding functions.
  • results: TablEye outperforms TabLLM by up to 0.11 AUC in a 4-shot task and leads STUNT by an average of 3.17% accuracy in a 1-shot setting, demonstrating strong few-shot tabular learning performance.
    Abstract The exploration of few-shot tabular learning becomes imperative. Tabular data is a versatile representation that captures diverse information, yet it is not exempt from limitations, property of data and model size. Labeling extensive tabular data can be challenging, and it may not be feasible to capture every important feature. Few-shot tabular learning, however, remains relatively unexplored, primarily due to scarcity of shared information among independent datasets and the inherent ambiguity in defining boundaries within tabular data. To the best of our knowledge, no meaningful and unrestricted few-shot tabular learning techniques have been developed without imposing constraints on the dataset. In this paper, we propose an innovative framework called TablEye, which aims to overcome the limit of forming prior knowledge for tabular data by adopting domain transformation. It facilitates domain transformation by generating tabular images, which effectively conserve the intrinsic semantics of the original tabular data. This approach harnesses rigorously tested few-shot learning algorithms and embedding functions to acquire and apply prior knowledge. Leveraging shared data domains allows us to utilize this prior knowledge, originally learned from the image domain. Specifically, TablEye demonstrated a superior performance by outstripping the TabLLM in a 4-shot task with a maximum 0.11 AUC and a STUNT in a 1- shot setting, where it led on average by 3.17% accuracy.

Garbage in, garbage out: Zero-shot detection of crime using Large Language Models

  • paper_url: http://arxiv.org/abs/2307.06844
  • repo_url: https://github.com/anjsimmo/zero-shot-crime-detection
  • paper_authors: Anj Simmons, Rajesh Vasa
  • for: Exploit the common sense knowledge learned by large language models to perform zero-shot reasoning about crimes from textual descriptions of surveillance video.
  • methods: Uses large language models for zero-shot reasoning, which requires the video to be (manually) converted into high-quality textual descriptions.
  • results: When videos are manually converted to high-quality textual descriptions, large language models detect and classify crimes with performance on par with the state of the art using zero-shot reasoning alone; existing automated video-to-text methods cannot yet produce descriptions of sufficient quality, so the language model outputs garbage.
    Abstract This paper proposes exploiting the common sense knowledge learned by large language models to perform zero-shot reasoning about crimes given textual descriptions of surveillance videos. We show that when video is (manually) converted to high quality textual descriptions, large language models are capable of detecting and classifying crimes with state-of-the-art performance using only zero-shot reasoning. However, existing automated video-to-text approaches are unable to generate video descriptions of sufficient quality to support reasoning (garbage video descriptions into the large language model, garbage out).

Unsupervised Feature Learning with Emergent Data-Driven Prototypicality

  • paper_url: http://arxiv.org/abs/2307.01421
  • repo_url: None
  • paper_authors: Yunhui Guo, Youren Zhang, Yubei Chen, Stella X. Yu
  • for: Given an unlabeled image set, train a model that maps each image to a feature space where proximity indicates visual similarity and the location of the point directly encodes how prototypical the image is within the dataset.
  • methods: The key insight is to perform unsupervised feature learning in hyperbolic rather than Euclidean space: distances between points still reflect image similarity, while the extra capacity of hyperbolic geometry lets the distance to the origin encode prototypicality.
  • results: The proposed algorithm, HACK, packs particles uniformly in the Poincaré ball of hyperbolic space and assigns each image to a particle; after congealing, images move closer to the origin, confirming unsupervised prototypicality discovery, and the data-driven prototypicality enables unsupervised instance selection that reduces sample complexity, improves generalization with atypical instances, and improves robustness with typical ones.
    Abstract Given an image set without any labels, our goal is to train a model that maps each image to a point in a feature space such that, not only proximity indicates visual similarity, but where it is located directly encodes how prototypical the image is according to the dataset. Our key insight is to perform unsupervised feature learning in hyperbolic instead of Euclidean space, where the distance between points still reflect image similarity, and yet we gain additional capacity for representing prototypicality with the location of the point: The closer it is to the origin, the more prototypical it is. The latter property is simply emergent from optimizing the usual metric learning objective: The image similar to many training instances is best placed at the center of corresponding points in Euclidean space, but closer to the origin in hyperbolic space. We propose an unsupervised feature learning algorithm in Hyperbolic space with sphere pACKing. HACK first generates uniformly packed particles in the Poincar\'e ball of hyperbolic space and then assigns each image uniquely to each particle. Images after congealing are regarded more typical of the dataset it belongs to. With our feature mapper simply trained to spread out training instances in hyperbolic space, we observe that images move closer to the origin with congealing, validating our idea of unsupervised prototypicality discovery. We demonstrate that our data-driven prototypicality provides an easy and superior unsupervised instance selection to reduce sample complexity, increase model generalization with atypical instances and robustness with typical ones.
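
As a small illustration of the geometric idea, the sketch below computes the Poincaré-ball distance and uses the norm of an embedding as a prototypicality proxy; the two example embeddings are made up and this is not the paper's training code.

```python
# Toy illustration: in the Poincaré ball, pairwise distance encodes similarity
# while the norm (distance to the origin) can encode prototypicality.
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between two points inside the unit Poincaré ball."""
    sq = np.sum((u - v) ** 2)
    denom = (1 - np.sum(u ** 2)) * (1 - np.sum(v ** 2))
    return np.arccosh(1 + 2 * sq / (denom + eps))

prototypical = np.array([0.05, 0.02])   # near the origin -> "typical" image
atypical     = np.array([0.80, 0.35])   # near the boundary -> "atypical" image

print("norms (prototypicality proxy):", np.linalg.norm(prototypical), np.linalg.norm(atypical))
print("hyperbolic distance between them:", poincare_distance(prototypical, atypical))
```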

Analyzing the vulnerabilities in SplitFed Learning: Assessing the robustness against Data Poisoning Attacks

  • paper_url: http://arxiv.org/abs/2307.03197
  • repo_url: None
  • paper_authors: Aysha Thahsin Zahir Ismail, Raj Mani Shukla
  • for: Study the impact of data poisoning attacks in distributed collaborative machine learning (DCML), in particular on SplitFed Learning (SFL), the hybrid of Split Learning (SL) and Federated Learning (FL).
  • methods: Proposes three novel attack strategies, untargeted, targeted, and distance-based, all aimed at degrading the performance of the DCML-based classifier.
  • results: Experiments on two case studies, electrocardiogram signal classification and automatic handwritten digit recognition, show that untargeted and distance-based poisoning attacks degrade the SFL classifier substantially more than targeted attacks.
    Abstract Distributed Collaborative Machine Learning (DCML) is a potential alternative to address the privacy concerns associated with centralized machine learning. Split Learning (SL) and Federated Learning (FL) are two effective learning approaches in DCML. Recently, there has been increased interest in the hybrid of FL and SL known as SplitFed Learning (SFL). This research is the earliest attempt to study, analyze and present the impact of data poisoning attacks in SFL. We propose three kinds of novel attack strategies namely untargeted, targeted and distance-based attacks for SFL. All the attack strategies aim to degrade the performance of the DCML-based classifier. We test the proposed attack strategies for two different case studies on Electrocardiogram signal classification and automatic handwritten digit recognition. A series of attack experiments were conducted by varying the percentage of malicious clients and the choice of the model split layer between the clients and the server. The results after the comprehensive analysis of attack strategies clearly convey that untargeted and distance-based poisoning attacks have greater impacts in evading the classifier outcomes compared to targeted attacks in SFL.

Learning to Communicate using Contrastive Learning

  • paper_url: http://arxiv.org/abs/2307.01403
  • repo_url: https://github.com/SonamSangpoLama/Music-Genre-Classification
  • paper_authors: Yat Long Lo, Biswa Sengupta, Jakob Foerster, Michael Noukhovitch
  • for: Propose a contrastive-learning-based approach to communication that enables effective coordination in multi-agent RL.
  • methods: Treats messages exchanged between agents as incomplete views of the environment state and learns to communicate by maximizing the mutual information between messages of a given trajectory using contrastive learning.
  • results: In communication-essential environments the method outperforms prior work in both performance and learning speed, induces more symmetric communication, and captures global state information from the environment.
    Abstract Communication is a powerful tool for coordination in multi-agent RL. But inducing an effective, common language is a difficult challenge, particularly in the decentralized setting. In this work, we introduce an alternative perspective where communicative messages sent between agents are considered as different incomplete views of the environment state. By examining the relationship between messages sent and received, we propose to learn to communicate using contrastive learning to maximize the mutual information between messages of a given trajectory. In communication-essential environments, our method outperforms previous work in both performance and learning speed. Using qualitative metrics and representation probing, we show that our method induces more symmetric communication and captures global state information from the environment. Overall, we show the power of contrastive learning and the importance of leveraging messages as encodings for effective communication.
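
A hedged sketch of the kind of contrastive objective described above: messages from the same trajectory are treated as a positive pair and other trajectories in the batch as negatives, which maximizes an InfoNCE lower bound on their mutual information. The encoder, message dimensions, and batch construction are illustrative assumptions rather than the paper's architecture.

```python
# InfoNCE over paired messages from the same trajectory (illustrative sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MessageEncoder(nn.Module):
    def __init__(self, msg_dim=16, emb_dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(msg_dim, 64), nn.ReLU(), nn.Linear(64, emb_dim))
    def forward(self, m):
        return F.normalize(self.net(m), dim=-1)

def info_nce(z_sent, z_recv, temperature=0.1):
    """z_sent[i] and z_recv[i] come from the same trajectory (positive pair)."""
    logits = z_sent @ z_recv.t() / temperature   # (B, B) similarity matrix
    labels = torch.arange(z_sent.size(0))        # positives sit on the diagonal
    return F.cross_entropy(logits, labels)

encoder = MessageEncoder()
sent = torch.randn(8, 16)      # messages emitted by agent A on 8 trajectories
received = torch.randn(8, 16)  # messages observed by agent B on the same trajectories
loss = info_nce(encoder(sent), encoder(received))
loss.backward()
print("contrastive loss:", loss.item())
```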

In-depth Analysis On Parallel Processing Patterns for High-Performance Dataframes

  • paper_url: http://arxiv.org/abs/2307.01394
  • repo_url: None
  • paper_authors: Niranda Perera, Arup Kumar Sarker, Mills Staylor, Gregor von Laszewski, Kaiying Shan, Supun Kamburugamuve, Chathura Widanage, Vibhatha Abeykoon, Thejaka Amila Kanewela, Geoffrey Fox
  • for: The paper aims to improve the performance of data processing pipelines by optimizing the use of Dataframes, which are widely used in data engineering applications.
  • methods: The authors propose a cost model for evaluating parallel processing patterns for distributed Dataframe operators and evaluate the performance of their reference runtime implementation, Cylon, on the ORNL Summit supercomputer.
  • results: The authors evaluate the performance of Cylon on the ORNL Summit supercomputer and present the results, which demonstrate the potential for improving the performance of data processing pipelines using their proposed approach.
    Abstract The Data Science domain has expanded monumentally in both research and industry communities during the past decade, predominantly owing to the Big Data revolution. Artificial Intelligence (AI) and Machine Learning (ML) are bringing more complexities to data engineering applications, which are now integrated into data processing pipelines to process terabytes of data. Typically, a significant amount of time is spent on data preprocessing in these pipelines, and hence improving its efficiency directly impacts the overall pipeline performance. The community has recently embraced the concept of Dataframes as the de-facto data structure for data representation and manipulation. However, the most widely used serial Dataframes today (R, pandas) experience performance limitations while working on even moderately large data sets. We believe that there is plenty of room for improvement by taking a look at this problem from a high-performance computing point of view. In a prior publication, we presented a set of parallel processing patterns for distributed dataframe operators and the reference runtime implementation, Cylon [1]. In this paper, we are expanding on the initial concept by introducing a cost model for evaluating the said patterns. Furthermore, we evaluate the performance of Cylon on the ORNL Summit supercomputer.

Depth video data-enabled predictions of longitudinal dairy cow body weight using thresholding and Mask R-CNN algorithms

  • paper_url: http://arxiv.org/abs/2307.01383
  • repo_url: https://github.com/yebigithub/BW_dairy
  • paper_authors: Ye Bi, Leticia M. Campos, Jin Wang, Haipeng Yu, Mark D. Hanigan, Gota Morota
  • for: Predict dairy cow body weight from repeatedly measured depth video data.
  • methods: Compares three approaches for segmenting the cow's body from the background, single thresholding, adaptive thresholding, and Mask R-CNN, and estimates image-derived biometric features for body weight regression.
  • results: Mask R-CNN combined with a linear mixed model achieved the best prediction coefficient of determination (0.98) and mean absolute percentage error (2.03%), and was also the best approach in leave-three-cows-out cross-validation.
    Abstract Monitoring cow body weight is crucial to support farm management decisions due to its direct relationship with the growth, nutritional status, and health of dairy cows. Cow body weight is a repeated trait, however, the majority of previous body weight prediction research only used data collected at a single point in time. Furthermore, the utility of deep learning-based segmentation for body weight prediction using videos remains unanswered. Therefore, the objectives of this study were to predict cow body weight from repeatedly measured video data, to compare the performance of the thresholding and Mask R-CNN deep learning approaches, to evaluate the predictive ability of body weight regression models, and to promote open science in the animal science community by releasing the source code for video-based body weight prediction. A total of 40,405 depth images and depth map files were obtained from 10 lactating Holstein cows and 2 non-lactating Jersey cows. Three approaches were investigated to segment the cow's body from the background, including single thresholding, adaptive thresholding, and Mask R-CNN. Four image-derived biometric features, such as dorsal length, abdominal width, height, and volume, were estimated from the segmented images. On average, the Mask-RCNN approach combined with a linear mixed model resulted in the best prediction coefficient of determination and mean absolute percentage error of 0.98 and 2.03%, respectively, in the forecasting cross-validation. The Mask-RCNN approach was also the best in the leave-three-cows-out cross-validation. The prediction coefficients of determination and mean absolute percentage error of the Mask-RCNN coupled with the linear mixed model were 0.90 and 4.70%, respectively. Our results suggest that deep learning-based segmentation improves the prediction performance of cow body weight from longitudinal depth video data.
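
For intuition, the sketch below runs the simplest of the three pipelines, single thresholding, on a synthetic top-down depth map and derives the kind of biometric features that feed the body-weight regression; the depth values, threshold, and feature definitions are illustrative assumptions.

```python
# Single-threshold segmentation of a synthetic depth map plus toy biometric features.
import numpy as np

depth = np.full((240, 320), 2.5)                 # background ~2.5 m from the camera
depth[80:160, 60:260] = 1.6                      # a "cow" region closer to the camera

mask = depth < 2.0                               # single threshold separates cow from floor
rows, cols = np.where(mask)

dorsal_length = cols.max() - cols.min() + 1      # extent along the body axis (pixels)
abdominal_width = rows.max() - rows.min() + 1    # extent across the body (pixels)
height = (2.5 - depth[mask]).max()               # tallest point above the floor (m)
volume_proxy = np.sum(2.5 - depth[mask])         # integrated height over the mask

print(dorsal_length, abdominal_width, round(height, 2), round(volume_proxy, 1))
# Per-frame, per-cow features like these would then enter a linear mixed model
# with cow-level random effects to predict longitudinal body weight.
```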

Shifting Attention to Relevance: Towards the Uncertainty Estimation of Large Language Models

  • paper_url: http://arxiv.org/abs/2307.01379
  • repo_url: https://github.com/jinhaoduan/shifting-attention-to-relevance
  • paper_authors: Jinhao Duan, Hao Cheng, Shiqi Wang, Chenan Wang, Alex Zavalny, Renjing Xu, Bhavya Kailkhura, Kaidi Xu
  • for: Address uncertainty estimation for large language model (LLM) generations, i.e., when users can trust model outputs.
  • methods: Studies auto-regressive LLMs and how generative inequalities, the fact that tokens contribute unequally to the meaning of a generation, bias uncertainty estimation.
  • results: Experiments show that the bias caused by generative inequalities can be mitigated by jointly Shifting Attention to more Relevant (SAR) components at both the token and sentence level when estimating uncertainty.
    Abstract Although Large Language Models (LLMs) have shown great potential in Natural Language Generation, it is still challenging to characterize the uncertainty of model generations, i.e., when users could trust model outputs. Our research is derived from the heuristic facts that tokens are created unequally in reflecting the meaning of generations by auto-regressive LLMs, i.e., some tokens are more relevant (or representative) than others, yet all the tokens are equally valued when estimating uncertainty. It is because of the linguistic redundancy where mostly a few keywords are sufficient to convey the meaning of a long sentence. We name these inequalities as generative inequalities and investigate how they affect uncertainty estimation. Our results reveal that considerable tokens and sentences containing limited semantics are weighted equally or even heavily when estimating uncertainty. To tackle these biases posed by generative inequalities, we propose to jointly Shifting Attention to more Relevant (SAR) components from both the token level and the sentence level while estimating uncertainty. We conduct experiments over popular "off-the-shelf" LLMs (e.g., OPT, LLaMA) with model sizes up to 30B and powerful commercial LLMs (e.g., Davinci from OpenAI), across various free-form question-answering tasks. Experimental results and detailed demographic analysis indicate the superior performance of SAR. Code is available at https://github.com/jinhaoduan/shifting-attention-to-relevance.
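
A toy, hedged illustration of the core idea: reweighting token-level uncertainty by relevance so that filler tokens no longer dilute the uncertainty of the keywords. The token probabilities and relevance scores below are invented numbers; the paper derives relevance from the model itself and also applies the shift at the sentence level.

```python
# Relevance-weighted token uncertainty vs. uniform averaging (toy numbers).
import numpy as np

tokens      = ["the", "capital", "of", "France", "is", "Paris"]
token_probs = np.array([0.95, 0.60, 0.97, 0.70, 0.96, 0.55])   # p(token | prefix)
relevance   = np.array([0.05, 0.60, 0.05, 0.90, 0.05, 1.00])   # semantic importance

neg_log_p = -np.log(token_probs)

uniform_uncertainty = neg_log_p.mean()                 # treats all tokens equally
weights = relevance / relevance.sum()
sar_uncertainty = np.sum(weights * neg_log_p)          # attention shifted to relevant tokens

print(f"uniform: {uniform_uncertainty:.3f}  relevance-weighted: {sar_uncertainty:.3f}")
# High-probability filler words no longer dilute the uncertainty of the keywords
# ("France", "Paris") that actually carry the answer.
```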

A CNN regression model to estimate buildings height maps using Sentinel-1 SAR and Sentinel-2 MSI time series

  • paper_url: http://arxiv.org/abs/2307.01378
  • repo_url: None
  • paper_authors: Ritu Yadav, Andrea Nascetti, Yifang Ban
  • for: The paper is written for urban planning, infrastructure management, and environmental analysis, with a focus on accurately estimating building heights using satellite time series data.
  • methods: The paper proposes a supervised Multimodal Building Height Regression Network (MBHR-Net) that uses Sentinel-1 (S1) and Sentinel-2 (S2) satellite time series data to estimate building heights at 10m spatial resolution. The model extracts meaningful features from both S1 and S2 images to learn complex spatio-temporal relationships between image patterns and building heights.
  • results: The preliminary results demonstrate the effectiveness of the MBHR-Net model in accurately estimating building heights, with a Root Mean Squared Error (RMSE) of 3.73m, an Intersection over Union (IOU) of 0.95, and an R-squared (R2) score of 0.61. These results show the potential of the model for urban planning, environmental impact analysis, and other related applications.
    Abstract Accurate estimation of building heights is essential for urban planning, infrastructure management, and environmental analysis. In this study, we propose a supervised Multimodal Building Height Regression Network (MBHR-Net) for estimating building heights at 10m spatial resolution using Sentinel-1 (S1) and Sentinel-2 (S2) satellite time series. S1 provides Synthetic Aperture Radar (SAR) data that offers valuable information on building structures, while S2 provides multispectral data that is sensitive to different land cover types, vegetation phenology, and building shadows. Our MBHR-Net aims to extract meaningful features from the S1 and S2 images to learn complex spatio-temporal relationships between image patterns and building heights. The model is trained and tested in 10 cities in the Netherlands. Root Mean Squared Error (RMSE), Intersection over Union (IOU), and R-squared (R2) score metrics are used to evaluate the performance of the model. The preliminary results (3.73m RMSE, 0.95 IoU, 0.61 R2) demonstrate the effectiveness of our deep learning model in accurately estimating building heights, showcasing its potential for urban planning, environmental impact analysis, and other related applications.

Efficient Determination of Safety Requirements for Perception Systems

  • paper_url: http://arxiv.org/abs/2307.01371
  • repo_url: None
  • paper_authors: Sydney M. Katz, Anthony L. Corso, Esen Yel, Mykel J. Kochenderfer
  • for: Improve the design of safe perception systems by distilling high-level safety requirements into component-level requirements on the perception system.
  • methods: Combines common black-box estimation techniques such as Gaussian processes and threshold bandits into a new estimation method called smoothing bandits.
  • results: On a vision-based aircraft collision avoidance problem, smoothing bandits improves both accuracy and efficiency over the Gaussian process and threshold bandit baselines.
    Abstract Perception systems operate as a subcomponent of the general autonomy stack, and perception system designers often need to optimize performance characteristics while maintaining safety with respect to the overall closed-loop system. For this reason, it is useful to distill high-level safety requirements into component-level requirements on the perception system. In this work, we focus on efficiently determining sets of safe perception system performance characteristics given a black-box simulator of the fully-integrated, closed-loop system. We combine the advantages of common black-box estimation techniques such as Gaussian processes and threshold bandits to develop a new estimation method, which we call smoothing bandits. We demonstrate our method on a vision-based aircraft collision avoidance problem and show improvements in terms of both accuracy and efficiency over the Gaussian process and threshold bandit baselines.

Minimizing Age of Information for Mobile Edge Computing Systems: A Nested Index Approach

  • paper_url: http://arxiv.org/abs/2307.01366
  • repo_url: None
  • paper_authors: Shuo Chen, Ning Yang, Meng Zhang, Jun Wang
  • for: Meet the needs of real-time applications by minimizing the Age-of-Information (AoI) freshness metric.
  • methods: Uses mobile edge computing (MEC) to offload tasks from mobile devices to heterogeneous edge servers, improving computational efficiency.
  • results: Reformulates the scheduling problem as a Restless Multi-Arm-Bandit (RMAB) problem, proposes a nested index framework and a nested index policy with provable asymptotic optimality that trades off computational complexity against accuracy, reduces the optimality gap by up to 40% compared with benchmarks, and asymptotically approaches the lower bound as the system scale grows.
    Abstract Exploiting the computational heterogeneity of mobile devices and edge nodes, mobile edge computation (MEC) provides an efficient approach to achieving real-time applications that are sensitive to information freshness, by offloading tasks from mobile devices to edge nodes. We use the metric Age-of-Information (AoI) to evaluate information freshness. An efficient solution to minimize the AoI for the MEC system with multiple users is non-trivial to obtain due to the random computing time. In this paper, we consider multiple users offloading tasks to heterogeneous edge servers in a MEC system. We first reformulate the problem as a Restless Multi-Arm-Bandit (RMAB) problem and establish a hierarchical Markov Decision Process (MDP) to characterize the updating of AoI for the MEC system. Based on the hierarchical MDP, we propose a nested index framework and design a nested index policy with provably asymptotic optimality. Finally, the closed form of the nested index is obtained, which enables the performance tradeoffs between computation complexity and accuracy. Our algorithm leads to an optimality gap reduction of up to 40%, compared to benchmarks. Our algorithm asymptotically approximates the lower bound as the system scalar gets large enough.
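
For readers unfamiliar with the freshness metric: if $u_i(t)$ denotes the generation time of the freshest update of source $i$ whose computation has completed by time $t$, the age of information is $\Delta_i(t) = t - u_i(t)$; in discrete time it grows by one slot between deliveries, $\Delta_i(t+1) = \Delta_i(t) + 1$, and drops back to the age of the newly delivered update when an offloaded task finishes. This is the standard AoI definition, not the paper's nested index derivation.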

Towards Safe Autonomous Driving Policies using a Neuro-Symbolic Deep Reinforcement Learning Approach

  • paper_url: http://arxiv.org/abs/2307.01316
  • repo_url: https://github.com/cav-research-lab/safe-reinforcement-learning-using-symbolic-logical-programming-for-autonomous-highway-driving
  • paper_authors: Iman Sharifi, Mustafa Yildirim, Saber Fallah
  • for: Develop a neuro-symbolic deep reinforcement learning method (DRLSL) that can learn autonomous driving policies through real-time interaction with real environments while ensuring safety.
  • methods: Combines neural networks (learning from experience) with symbolic first-order logic (knowledge-driven reasoning) to learn driving policies.
  • results: Implemented with the highD dataset, DRLSL avoids unsafe actions during both training and testing, converges faster during training, and generalizes better to new driving scenarios than traditional DRL methods.
    Abstract The dynamic nature of driving environments and the presence of diverse road users pose significant challenges for decision-making in autonomous driving. Deep reinforcement learning (DRL) has emerged as a popular approach to tackle this problem. However, the application of existing DRL solutions is mainly confined to simulated environments due to safety concerns, impeding their deployment in real-world. To overcome this limitation, this paper introduces a novel neuro-symbolic model-free DRL approach, called DRL with Symbolic Logics (DRLSL) that combines the strengths of DRL (learning from experience) and symbolic first-order logics (knowledge-driven reasoning) to enable safe learning in real-time interactions of autonomous driving within real environments. This innovative approach provides a means to learn autonomous driving policies by actively engaging with the physical environment while ensuring safety. We have implemented the DRLSL framework in autonomous driving using the highD dataset and demonstrated that our method successfully avoids unsafe actions during both the training and testing phases. Furthermore, our results indicate that DRLSL achieves faster convergence during training and exhibits better generalizability to new driving scenarios compared to traditional DRL methods.

Self-Tuning PID Control via a Hybrid Actor-Critic-Based Neural Structure for Quadcopter Control

  • paper_url: http://arxiv.org/abs/2307.01312
  • repo_url: None
  • paper_authors: Iman Sharifi, Aria Alasty
  • for: Design and train a self-tuning PID controller in real time to improve the stability of quadrotor attitude and altitude control.
  • methods: A reinforcement-learning-based neural network tunes the controller's variable gains; the network has two hidden layers with sigmoid activation functions and is trained with the Adaptive Momentum (ADAM) optimizer and the Back-Propagation (BP) algorithm.
  • results: The proposed method is more robust and effective under uncertain model parameters and external disturbances, adapts quickly during training and control, and outperforms a conventional PID controller with constant gains on quadrotor attitude and altitude control.
    Abstract The Proportional-Integrator-Derivative (PID) controller is used in a wide range of industrial and experimental processes. There are a couple of offline methods for tuning PID gains. However, due to the uncertainty of model parameters and external disturbances, real systems such as Quadrotors need more robust and reliable PID controllers. In this research, a self-tuning PID controller using a Reinforcement-Learning-based Neural Network for attitude and altitude control of a Quadrotor has been investigated. An Incremental PID, which contains static and dynamic gains, has been considered and only the variable gains have been tuned. To tune dynamic gains, a model-free actor-critic-based hybrid neural structure was used that was able to properly tune PID gains, and also has done the best as an identifier. In both tuning and identification tasks, a Neural Network with two hidden layers and sigmoid activation functions has been learned using Adaptive Momentum (ADAM) optimizer and Back-Propagation (BP) algorithm. This method is online, able to tackle disturbance, and fast in training. In addition to robustness to mass uncertainty and wind gust disturbance, results showed that the proposed method had a better performance when compared to a PID controller with constant gains.
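
A minimal sketch of the incremental (velocity-form) PID update referenced above, whose variable gains are what the actor-critic network would adjust online; the gains and the toy first-order plant are illustrative assumptions, not the paper's quadrotor model.

```python
# Incremental PID: the control signal is updated by a delta from the last three errors.
class IncrementalPID:
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.e1 = 0.0   # e(k-1)
        self.e2 = 0.0   # e(k-2)
        self.u = 0.0    # u(k-1)

    def step(self, error):
        # u(k) = u(k-1) + Kp*(e(k)-e(k-1)) + Ki*e(k) + Kd*(e(k)-2e(k-1)+e(k-2))
        du = (self.kp * (error - self.e1)
              + self.ki * error
              + self.kd * (error - 2 * self.e1 + self.e2))
        self.u += du
        self.e2, self.e1 = self.e1, error
        return self.u

pid = IncrementalPID(kp=1.2, ki=0.3, kd=0.05)
altitude, target = 0.0, 1.0
for k in range(50):
    u = pid.step(target - altitude)
    altitude += 0.05 * u            # crude first-order response of the plant
print(f"altitude after 50 steps: {altitude:.3f}")
# In the paper's setting, the dynamic part of (kp, ki, kd) would be produced by
# the actor network and updated each step from the critic's feedback.
```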

Reliable AI: Does the Next Generation Require Quantum Computing?

  • paper_url: http://arxiv.org/abs/2307.01301
  • repo_url: None
  • paper_authors: Aras Bacho, Holger Boche, Gitta Kutyniok
  • for: The paper explores the question of whether quantum computing is necessary for the next generation of artificial intelligence.
  • methods: The paper uses various computational models, including digital and analog computing models, to evaluate the limitations of current artificial intelligence systems and the potential benefits of quantum computing.
  • results: The paper finds that current digital computing models are limited in their ability to solve certain problems, such as optimization and deep learning, and that analog computing models may offer a way to overcome these limitations. However, even when using quantum computing models, some limitations persist.
    Abstract In this survey, we aim to explore the fundamental question of whether the next generation of artificial intelligence requires quantum computing. Artificial intelligence is increasingly playing a crucial role in many aspects of our daily lives and is central to the fourth industrial revolution. It is therefore imperative that artificial intelligence is reliable and trustworthy. However, there are still many issues with reliability of artificial intelligence, such as privacy, responsibility, safety, and security, in areas such as autonomous driving, healthcare, robotics, and others. These problems can have various causes, including insufficient data, biases, and robustness problems, as well as fundamental issues such as computability problems on digital hardware. The cause of these computability problems is rooted in the fact that digital hardware is based on the computing model of the Turing machine, which is inherently discrete. Notably, our findings demonstrate that digital hardware is inherently constrained in solving problems about optimization, deep learning, or differential equations. Therefore, these limitations carry substantial implications for the field of artificial intelligence, in particular for machine learning. Furthermore, although it is well known that the quantum computer shows a quantum advantage for certain classes of problems, our findings establish that some of these limitations persist when employing quantum computing models based on the quantum circuit or the quantum Turing machine paradigm. In contrast, analog computing models, such as the Blum-Shub-Smale machine, exhibit the potential to surmount these limitations.

Pareto-Secure Machine Learning (PSML): Fingerprinting and Securing Inference Serving Systems

  • paper_url: http://arxiv.org/abs/2307.01292
  • repo_url: None
  • paper_authors: Debopam Sanyal, Jui-Tse Hung, Manav Agrawal, Prahlad Jasti, Shahab Nikkhoo, Somesh Jha, Tianhao Wang, Sibin Mohan, Alexey Tumanov
  • for: Addresses the security of model-serving (inference serving) systems, specifically their robustness against model extraction attacks.
  • methods: Proposes a new query-efficient fingerprinting algorithm and a noise-based defense mechanism against model extraction attacks.
  • results: Shows that the noise-based defense reduces the accuracy and fidelity of fingerprinting-enabled model extraction attacks while keeping system goodput at an acceptable level.
    Abstract Model-serving systems have become increasingly popular, especially in real-time web applications. In such systems, users send queries to the server and specify the desired performance metrics (e.g., desired accuracy, latency). The server maintains a set of models (model zoo) in the back-end and serves the queries based on the specified metrics. This paper examines the security, specifically robustness against model extraction attacks, of such systems. Existing black-box attacks assume a single model can be repeatedly selected for serving inference requests. Modern inference serving systems break this assumption. Thus, they cannot be directly applied to extract a victim model, as models are hidden behind a layer of abstraction exposed by the serving system. An attacker can no longer identify which model she is interacting with. To this end, we first propose a query-efficient fingerprinting algorithm to enable the attacker to trigger any desired model consistently. We show that by using our fingerprinting algorithm, model extraction can have fidelity and accuracy scores within $1\%$ of the scores obtained when attacking a single, explicitly specified model, as well as up to $14.6\%$ gain in accuracy and up to $7.7\%$ gain in fidelity compared to the naive attack. Second, we counter the proposed attack with a noise-based defense mechanism that thwarts fingerprinting by adding noise to the specified performance metrics. The proposed defense strategy reduces the attack's accuracy and fidelity by up to $9.8\%$ and $4.8\%$, respectively (on medium-sized model extraction). Third, we show that the proposed defense induces a fundamental trade-off between the level of protection and system goodput, achieving configurable and significant victim model extraction protection while maintaining acceptable goodput ($>80\%$). We implement the proposed defense in a real system with plans to open source.

Fighting the disagreement in Explainable Machine Learning with consensus

  • paper_url: http://arxiv.org/abs/2307.01288
  • repo_url: None
  • paper_authors: Antonio Jesus Banegas-Luna, Carlos Martınez-Cortes, Horacio Perez-Sanchez
  • for: Understand how machine learning models work internally.
  • methods: Uses interpretability algorithms to explain the models and consensus functions to reconcile their contradictory explanations.
  • results: Among the six consensus functions evaluated on five ML models, the function proposed by the authors is fairer than the others and provides more consistent and accurate explanations.
    Abstract Machine learning (ML) models are often valued by the accuracy of their predictions. However, in some areas of science, the inner workings of models are as relevant as their accuracy. To understand how ML models work internally, the use of interpretability algorithms is the preferred option. Unfortunately, despite the diversity of algorithms available, they often disagree in explaining a model, leading to contradictory explanations. To cope with this issue, consensus functions can be applied once the models have been explained. Nevertheless, the problem is not completely solved because the final result will depend on the selected consensus function and other factors. In this paper, six consensus functions have been evaluated for the explanation of five ML models. The models were previously trained on four synthetic datasets whose internal rules were known in advance. The models were then explained with model-agnostic local and global interpretability algorithms. Finally, consensus was calculated with six different functions, including one developed by the authors. The results demonstrated that the proposed function is fairer than the others and provides more consistent and accurate explanations.

Using BOLD-fMRI to Compute the Respiration Volume per Time (RTV) and Respiration Variation (RV) with Convolutional Neural Networks (CNN) in the Human Connectome Development Cohort

  • paper_url: http://arxiv.org/abs/2307.05426
  • repo_url: None
  • paper_authors: Abdoljalil Addeh, Fernando Vega, Rebecca J Williams, Ali Golestani, G. Bruce Pike, M. Ethan MacDonald
  • for: Lower the cost of fMRI studies, simplify the experimental setup, and reduce the burden on participants, who would no longer need to wear a respiratory bellows.
  • methods: A one-dimensional convolutional neural network that captures informative features from resting-state BOLD signals to reconstruct the RV and RVT respiratory time series.
  • results: The CNN model captures informative features from resting BOLD signals and reconstructs realistic RV and RVT time series.
    Abstract In many fMRI studies, respiratory signals are unavailable or do not have acceptable quality. Consequently, the direct removal of low-frequency respiratory variations from BOLD signals is not possible. This study proposes a one-dimensional CNN model for reconstruction of two respiratory measures, RV and RVT. Results show that a CNN can capture informative features from resting BOLD signals and reconstruct realistic RV and RVT timeseries. It is expected that application of the proposed method will lower the cost of fMRI studies, reduce complexity, and decrease the burden on participants as they will not be required to wear a respiratory bellows.
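
A hedged sketch of a 1D CNN that maps windows of BOLD time courses to a respiratory time series, in the spirit of the model described above; the number of input channels, window length, and layer sizes are guesses for illustration, not the paper's architecture.

```python
# Toy 1D CNN regressing a respiratory time series from multi-ROI BOLD windows.
import torch
import torch.nn as nn

class RVRegressor(nn.Module):
    def __init__(self, n_rois=16):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_rois, 32, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv1d(32, 1, kernel_size=1),     # one output channel: RV per time point
        )
    def forward(self, bold):                     # bold: (batch, n_rois, time)
        return self.features(bold).squeeze(1)    # (batch, time)

model = RVRegressor()
bold = torch.randn(4, 16, 200)                   # 4 windows, 16 ROI signals, 200 TRs
rv_true = torch.randn(4, 200)
loss = nn.functional.mse_loss(model(bold), rv_true)
loss.backward()
print("MSE on random data:", loss.item())
```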

NeuBTF: Neural fields for BTF encoding and transfer

  • paper_url: http://arxiv.org/abs/2307.01199
  • repo_url: None
  • paper_authors: Carlos Rodriguez-Pardo, Konstantinos Kazatzis, Jorge Lopez-Moreno, Elena Garces
  • for: Proposes a new neural material representation that addresses the immutability of existing neural materials, jointly tackling BTF compression, tiling, and extrapolation to improve rendering.
  • methods: At test time a guidance image is given as input to condition the neural BTF on the structural features of that image; the neural BTF can then be queried like a regular BTF with UVs, camera, and light vectors.
  • results: Achieves competitive compression rates on a variety of synthetic and captured materials and learns to represent many optical properties.
    Abstract Neural material representations are becoming a popular way to represent materials for rendering. They are more expressive than analytic models and occupy less memory than tabulated BTFs. However, existing neural materials are immutable, meaning that their output for a certain query of UVs, camera, and light vector is fixed once they are trained. While this is practical when there is no need to edit the material, it can become very limiting when the fragment of the material used for training is too small or not tileable, which frequently happens when the material has been captured with a gonioreflectometer. In this paper, we propose a novel neural material representation which jointly tackles the problems of BTF compression, tiling, and extrapolation. At test time, our method uses a guidance image as input to condition the neural BTF to the structural features of this input image. Then, the neural BTF can be queried as a regular BTF using UVs, camera, and light vectors. Every component in our framework is purposefully designed to maximize BTF encoding quality at minimal parameter count and computational complexity, achieving competitive compression rates compared with previous work. We demonstrate the results of our method on a variety of synthetic and captured materials, showing its generality and capacity to learn to represent many optical properties.

Squeezing Large-Scale Diffusion Models for Mobile

  • paper_url: http://arxiv.org/abs/2307.01193
  • repo_url: None
  • paper_authors: Jiwoong Choi, Minkyu Kim, Daehyun Ahn, Taesu Kim, Yulhwa Kim, Dongwon Jo, Hyesung Jeon, Jae-Joon Kim, Hyungjun Kim
  • for: Explores how to deploy Stable Diffusion on mobile devices to enable high-fidelity image generation.
  • methods: Deploys the model with the TensorFlow Lite framework, which supports both iOS and Android devices, and addresses the problems posed by limited computational and memory resources.
  • results: The resulting Mobile Stable Diffusion achieves an inference latency of under 7 seconds for 512x512 image generation on Android devices with mobile GPUs.
    Abstract The emergence of diffusion models has greatly broadened the scope of high-fidelity image synthesis, resulting in notable advancements in both practical implementation and academic research. With the active adoption of the model in various real-world applications, the need for on-device deployment has grown considerably. However, deploying large diffusion models such as Stable Diffusion with more than one billion parameters to mobile devices poses distinctive challenges due to the limited computational and memory resources, which may vary according to the device. In this paper, we present the challenges and solutions for deploying Stable Diffusion on mobile devices with TensorFlow Lite framework, which supports both iOS and Android devices. The resulting Mobile Stable Diffusion achieves the inference latency of smaller than 7 seconds for a 512x512 image generation on Android devices with mobile GPUs.
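
For orientation, the snippet below shows the standard TensorFlow Lite export path (post-training float16 quantization) that such a deployment builds on; the saved-model path is hypothetical, and converting the full Stable Diffusion pipeline additionally requires exporting each sub-network (text encoder, UNet, decoder) separately and running the sampling loop on device.

```python
# Standard TFLite export with float16 post-training quantization (sketch).
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model/unet")  # hypothetical path
converter.optimizations = [tf.lite.Optimize.DEFAULT]        # enable post-training quantization
converter.target_spec.supported_types = [tf.float16]        # fp16 weights for mobile GPUs
tflite_model = converter.convert()

with open("unet_fp16.tflite", "wb") as f:
    f.write(tflite_model)
# The .tflite file can then be executed with the TFLite GPU delegate on Android/iOS.
```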

SAMAug: Point Prompt Augmentation for Segment Anything Model

  • paper_url: http://arxiv.org/abs/2307.01187
  • repo_url: None
  • paper_authors: Haixing Dai, Chong Ma, Zhengliang Liu, Yiwei Li, Peng Shu, Xiaozheng Wei, Lin Zhao, Zihao Wu, Dajiang Zhu, Wei Liu, Quanzheng Li, Tianming Liu, Xiang Li
  • for: Improve the interactive segmentation performance of the Segment Anything Model (SAM).
  • methods: Proposes SAMAug, a visual point augmentation method that generates augmented point prompts from the initial mask to provide SAM with more information; four augmentation techniques are evaluated: random selection, maximum difference entropy, maximum distance, and a saliency model.
  • results: On the COCO, Fundus, and Chest X-ray datasets, SAMAug boosts SAM's segmentation results, especially with the maximum distance and saliency model methods, highlighting the potential of visual prompt engineering for interactive computer vision models.
    Abstract This paper introduces SAMAug, a novel visual point augmentation method for the Segment Anything Model (SAM) that enhances interactive image segmentation performance. SAMAug generates augmented point prompts to provide more information to SAM. From the initial point prompt, SAM produces the initial mask, which is then fed into our proposed SAMAug to generate augmented point prompts. By incorporating these extra points, SAM can generate augmented segmentation masks based on the augmented point prompts and the initial prompt, resulting in improved segmentation performance. We evaluate four point augmentation techniques: random selection, maximum difference entropy, maximum distance, and a saliency model. Experiments on the COCO, Fundus, and Chest X-ray datasets demonstrate that SAMAug can boost SAM's segmentation results, especially using the maximum distance and saliency model methods. SAMAug underscores the potential of visual prompt engineering to advance interactive computer vision models.
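
As an illustration of one of the four strategies, the sketch below implements a maximum-distance point augmentation: given SAM's initial mask and the initial click, it picks the in-mask pixel farthest from the click as an extra prompt. The mask and click coordinates are synthetic, and this is a simplified reading of the method.

```python
# Maximum-distance point augmentation over a synthetic mask and click.
import numpy as np

mask = np.zeros((128, 128), dtype=bool)
mask[40:100, 30:110] = True                      # initial mask returned by SAM
initial_point = np.array([60, 50])               # (row, col) of the user's click

fg = np.argwhere(mask)                           # coordinates of all mask pixels
dists = np.linalg.norm(fg - initial_point, axis=1)
augmented_point = fg[np.argmax(dists)]           # farthest in-mask pixel

print("augmented point prompt:", augmented_point)
# SAM is then re-run with both points (and the initial prompt) to produce the
# augmented segmentation mask.
```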

PlanE: Representation Learning over Planar Graphs

  • paper_url: http://arxiv.org/abs/2307.01180
  • repo_url: https://github.com/zzysonny/plane
  • paper_authors: Radoslav Dimitrov, Zeyang Zhao, Ralph Abboud, İsmail İlkan Ceylan
  • for: Design representation learning architectures that can efficiently learn complete isomorphism invariants of planar graphs.
  • methods: Proposes PlanE, a framework inspired by the classical planar graph isomorphism algorithm of Hopcroft and Tarjan, containing architectures that can learn complete invariants over planar graphs while remaining practically scalable.
  • results: Experiments show that PlanE learns complete invariants over planar graphs efficiently and achieves multiple state-of-the-art results on well-known planar graph benchmarks.
    Abstract Graph neural networks are prominent models for representation learning over graphs, where the idea is to iteratively compute representations of nodes of an input graph through a series of transformations in such a way that the learned graph function is isomorphism invariant on graphs, which makes the learned representations graph invariants. On the other hand, it is well-known that graph invariants learned by these class of models are incomplete: there are pairs of non-isomorphic graphs which cannot be distinguished by standard graph neural networks. This is unsurprising given the computational difficulty of graph isomorphism testing on general graphs, but the situation begs to differ for special graph classes, for which efficient graph isomorphism testing algorithms are known, such as planar graphs. The goal of this work is to design architectures for efficiently learning complete invariants of planar graphs. Inspired by the classical planar graph isomorphism algorithm of Hopcroft and Tarjan, we propose PlanE as a framework for planar representation learning. PlanE includes architectures which can learn complete invariants over planar graphs while remaining practically scalable. We empirically validate the strong performance of the resulting model architectures on well-known planar graph benchmarks, achieving multiple state-of-the-art results.

Don’t freeze: Finetune encoders for better Self-Supervised HAR

  • paper_url: http://arxiv.org/abs/2307.01168
  • repo_url: None
  • paper_authors: Vitor Fortes Rey, Dominique Nshimyimana, Paul Lukowicz
  • for: solves the labelled data availability problem in human activity recognition
  • methods: uses pretext tasks such as reconstruction or contrastive predictive coding to learn useful representations
  • results: substantial performance gains across pretext tasks, with the improvement inversely proportional to the amount of labelled data.
    Abstract Recently, self-supervised learning has been proposed in the field of human activity recognition as a solution to the labelled data availability problem. The idea is that by using pretext tasks such as reconstruction or contrastive predictive coding, useful representations can be learned that can then be used for classification. Those approaches follow the pretrain, freeze and fine-tune procedure. In this paper we will show how a simple change - not freezing the representation - leads to substantial performance gains across pretext tasks. The improvement was found in all four investigated datasets and across all four pretext tasks and is inversely proportional to the amount of labelled data. Moreover, the effect is present whether the pretext task is carried out on the Capture24 dataset or directly on unlabelled data of the target dataset.
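
The change the paper advocates amounts to a one-line switch at fine-tuning time, keeping the pretrained encoder trainable rather than frozen; the placeholder encoder and classifier below are illustrative, not the paper's models.

```python
# Freeze vs. fine-tune switch for a pretrained encoder (illustrative modules).
import torch.nn as nn

encoder = nn.Sequential(nn.Conv1d(3, 32, 5), nn.ReLU(), nn.AdaptiveAvgPool1d(1), nn.Flatten())
classifier = nn.Linear(32, 6)                    # e.g., 6 activity classes
model = nn.Sequential(encoder, classifier)

freeze_encoder = False                           # the paper's finding: don't freeze
for p in encoder.parameters():
    p.requires_grad = not freeze_encoder

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print("trainable parameters:", trainable)
# With freeze_encoder=True only the linear head would be updated during
# fine-tuning; leaving it False lets the self-supervised representation adapt.
```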

Human in the AI loop via xAI and Active Learning for Visual Inspection

  • paper_url: http://arxiv.org/abs/2307.05508
  • repo_url: None
  • paper_authors: Jože M. Rožanec, Elias Montini, Vincenzo Cutrona, Dimitrios Papamartzivanos, Timotej Klemenčič, Blaž Fortuna, Dunja Mladenić, Entso Veliou, Thanassis Giannetsos, Christos Emmanouilidis
  • for: Examines the impact of industrial revolutions and human-machine collaboration, with a focus on visual quality inspection.
  • methods: Draws on two subfields of artificial intelligence, active learning and explainable artificial intelligence, to realize and enhance human-machine collaboration.
  • results: Outlines how human-machine collaboration can be realized in visual inspection and shares results on visual inspection from the EU H2020 STAR project, covering artificial intelligence, human digital twins, and cybersecurity.
    Abstract Industrial revolutions have historically disrupted manufacturing by introducing automation into production. Increasing automation reshapes the role of the human worker. Advances in robotics and artificial intelligence open new frontiers of human-machine collaboration. Such collaboration can be realized considering two sub-fields of artificial intelligence: active learning and explainable artificial intelligence. Active learning aims to devise strategies that help obtain data that allows machine learning algorithms to learn better. On the other hand, explainable artificial intelligence aims to make the machine learning models intelligible to the human person. The present work first describes Industry 5.0, human-machine collaboration, and state-of-the-art regarding quality inspection, emphasizing visual inspection. Then it outlines how human-machine collaboration could be realized and enhanced in visual inspection. Finally, some of the results obtained in the EU H2020 STAR project regarding visual inspection are shared, considering artificial intelligence, human digital twins, and cybersecurity.
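
A minimal sketch of the active-learning half of such a human-in-the-loop setup, where the model queries the human inspector for labels on the samples it is least certain about; the synthetic features, logistic-regression model, and uncertainty criterion are illustrative assumptions.

```python
# Pool-based active learning with uncertainty sampling (synthetic inspection data).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_pool = rng.normal(size=(500, 8))                              # unlabeled inspection features
y_pool = (X_pool[:, 0] + 0.5 * X_pool[:, 1] > 0).astype(int)    # hidden defect labels

labeled = list(rng.choice(len(X_pool), size=20, replace=False))
for round_ in range(5):
    clf = LogisticRegression().fit(X_pool[labeled], y_pool[labeled])
    proba = clf.predict_proba(X_pool)[:, 1]
    uncertainty = -np.abs(proba - 0.5)                # closest to 0.5 = most uncertain
    uncertainty[labeled] = -np.inf                    # never re-query labeled items
    query = int(np.argmax(uncertainty))               # item sent to the human inspector
    labeled.append(query)                             # the inspector's label joins the training set
    print(f"round {round_}: queried item {query}, train size {len(labeled)}")
```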

Soft Gripping: Specifying for Trustworthiness

  • paper_url: http://arxiv.org/abs/2307.01159
  • repo_url: None
  • paper_authors: Dhaminda B. Abeywickrama, Nguyen Hao Le, Greg Chance, Peter D. Winter, Arianna Manzini, Alix J. Partridge, Jonathan Ives, John Downer, Graham Deacon, Jonathan Rossiter, Kerstin Eder, Shane Windsor
  • for: Promote the wide adoption of soft robotics by making soft robots trustworthy.
  • methods: Argues for formulating specifications during the development of soft robotic systems, covering functional and non-functional requirements such as reliability, safety, adaptability, predictability, ethics, and regulations.
  • results: Presents an extensive example specification for a soft gripper for pick-and-place tasks on grocery items and highlights verifiability as a first-class design objective.
    Abstract Soft robotics is an emerging technology in which engineers create flexible devices for use in a variety of applications. In order to advance the wide adoption of soft robots, ensuring their trustworthiness is essential; if soft robots are not trusted, they will not be used to their full potential. In order to demonstrate trustworthiness, a specification needs to be formulated to define what is trustworthy. However, even for soft robotic grippers, which is one of the most mature areas in soft robotics, the soft robotics community has so far given very little attention to formulating specifications. In this work, we discuss the importance of developing specifications during development of soft robotic systems, and present an extensive example specification for a soft gripper for pick-and-place tasks for grocery items. The proposed specification covers both functional and non-functional requirements, such as reliability, safety, adaptability, predictability, ethics, and regulations. We also highlight the need to promote verifiability as a first-class objective in the design of a soft gripper.

Theory of Mind as Intrinsic Motivation for Multi-Agent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.01158
  • repo_url: None
  • paper_authors: Ini Oguntola, Joseph Campbell, Simon Stepputtis, Katia Sycara
  • for: Give artificial agents in multi-agent settings the social intelligence to model the mental states of others.
  • methods: Grounds semantically meaningful, human-interpretable beliefs within policies modeled by deep networks, and uses each agent's ability to predict other agents' beliefs as an intrinsic reward for multi-agent reinforcement learning.
  • results: Presents preliminary empirical results in a mixed cooperative-competitive environment.
    Abstract The ability to model the mental states of others is crucial to human social intelligence, and can offer similar benefits to artificial agents with respect to the social dynamics induced in multi-agent settings. We present a method of grounding semantically meaningful, human-interpretable beliefs within policies modeled by deep networks. We then consider the task of 2nd-order belief prediction. We propose that ability of each agent to predict the beliefs of the other agents can be used as an intrinsic reward signal for multi-agent reinforcement learning. Finally, we present preliminary empirical results in a mixed cooperative-competitive environment.
    摘要 模拟他人心理状态的能力是人类社交智能的关键,也能为多智能体环境中的人工智能体带来类似的益处。我们提出了一种方法,将语义明确、人类可解释的信念嵌入由深度网络建模的策略中。随后,我们研究了二阶信念预测任务,并提出将每个智能体预测其他智能体信念的能力作为多智能体强化学习的内在奖励信号。最后,我们给出了在混合合作-竞争环境中的初步实验结果。
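
As a rough illustration of using belief prediction as an intrinsic reward, the sketch below adds a bonus proportional to how well an agent predicts the other agents' (discrete) beliefs; the function names and the simple accuracy-based bonus are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def intrinsic_tom_reward(pred_beliefs, true_beliefs, beta=0.1):
    """Illustrative intrinsic bonus: accuracy of one agent's predictions of the
    other agents' discrete beliefs. Shapes: (n_other_agents, n_belief_dims)."""
    acc = (pred_beliefs == true_beliefs).mean()
    return beta * acc

def shaped_reward(env_reward, pred_beliefs, true_beliefs, beta=0.1):
    # Total reward = extrinsic task reward + ToM-based intrinsic bonus.
    return env_reward + intrinsic_tom_reward(pred_beliefs, true_beliefs, beta)

# Toy usage
pred = np.array([[1, 0, 1], [0, 0, 1]])
true = np.array([[1, 0, 0], [0, 0, 1]])
print(shaped_reward(1.0, pred, true))  # 1.0 + 0.1 * (5/6)
```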

SCITUNE: Aligning Large Language Models with Scientific Multimodal Instructions

  • paper_url: http://arxiv.org/abs/2307.01139
  • repo_url: https://github.com/lupantech/ScienceQA
  • paper_authors: Sameera Horawalavithana, Sai Munikoti, Ian Stewart, Henry Kvinge
  • for: 这个论文主要是为了提高大型语言模型(LLM)与科学领域的整合。
  • methods: 这篇论文使用了一种名为SciTune的调整框架,以提高LLM的科学多Modal指令遵循能力。
  • results: 与机器生成数据进行finetuning的模型相比,LLaMA-SciTune在科学问答 benchMark中的表现平均高于人类表现,并在多个子类中也达到了人类水平。
    Abstract Instruction finetuning is a popular paradigm to align large language models (LLM) with human intent. Despite its popularity, this idea is less explored in improving the LLMs to align existing foundation models with scientific disciplines, concepts and goals. In this work, we present SciTune as a tuning framework to improve the ability of LLMs to follow scientific multimodal instructions. To test our methodology, we use a human-generated scientific instruction tuning dataset and train a large multimodal model LLaMA-SciTune that connects a vision encoder and LLM for science-focused visual and language understanding. In comparison to the models that are finetuned with machine generated data only, LLaMA-SciTune surpasses human performance on average and in many sub-categories on the ScienceQA benchmark.
    摘要 指令微调是一种将大型语言模型(LLM)与人类意图对齐的流行范式。尽管这种方法很流行,但在使现有基础模型与科学学科、概念和目标对齐方面仍探索较少。在这项工作中,我们提出了SciTune调优框架,用于提升LLM遵循科学多模态指令的能力。为测试该方法,我们使用人工生成的科学指令调优数据集,训练了一个连接视觉编码器和LLM、面向科学视觉与语言理解的大型多模态模型LLaMA-SciTune。与仅用机器生成数据微调的模型相比,LLaMA-SciTune在ScienceQA基准上平均超越人类表现,并在许多子类别中也达到或超过人类水平。

Exploring the In-context Learning Ability of Large Language Model for Biomedical Concept Linking

  • paper_url: http://arxiv.org/abs/2307.01137
  • repo_url: None
  • paper_authors: Qinyong Wang, Zhenxiang Gao, Rong Xu
  • for: This research aims to explore the effectiveness of large language models (LLMs) in biomedical concept mapping, specifically in the task of biomedical concept linking.
  • methods: The proposed approach uses a two-stage retrieve-and-rank framework that leverages in-context learning (ICL) capabilities of LLMs. The approach first embeds biomedical concepts using language models, and then uses embedding similarity to retrieve the top candidates. The contextual information of these candidates is incorporated into the prompt and processed by a large language model to re-rank the concepts.
  • results: The approach achieved an accuracy of 90.% in BC5CDR disease entity normalization and 94.7% in chemical entity normalization, demonstrating competitive performance relative to supervised learning methods. Additionally, it showed a significant improvement (over 20-point absolute increase in F1 score) on an oncology matching dataset.
    Abstract The biomedical field relies heavily on concept linking in various areas such as literature mining, graph alignment, information retrieval, question-answering, data, and knowledge integration. Although large language models (LLMs) have made significant strides in many natural language processing tasks, their effectiveness in biomedical concept mapping is yet to be fully explored. This research investigates a method that exploits the in-context learning (ICL) capabilities of large models for biomedical concept linking. The proposed approach adopts a two-stage retrieve-and-rank framework. Initially, biomedical concepts are embedded using language models, and then embedding similarity is utilized to retrieve the top candidates. These candidates' contextual information is subsequently incorporated into the prompt and processed by a large language model to re-rank the concepts. This approach achieved an accuracy of 90.% in BC5CDR disease entity normalization and 94.7% in chemical entity normalization, exhibiting a competitive performance relative to supervised learning methods. Further, it showed a significant improvement, with an over 20-point absolute increase in F1 score on an oncology matching dataset. Extensive qualitative assessments were conducted, and the benefits and potential shortcomings of using large language models within the biomedical domain were discussed.
    摘要 生物医学领域在文献挖掘、图谱对齐、信息检索、问答系统以及数据与知识整合等方面都高度依赖概念链接。尽管大型语言模型(LLM)在许多自然语言处理任务中取得了显著进展,但其在生物医学概念映射方面的效果尚未得到充分探索。本研究探讨了一种利用大型模型的上下文学习(ICL)能力进行生物医学概念链接的方法。该方法采用两阶段的检索-重排框架:首先用语言模型对生物医学概念进行嵌入,然后利用嵌入相似度检索最相关的候选概念;这些候选概念的上下文信息随后被纳入提示中,由大型语言模型处理并对概念重新排序。该方法在BC5CDR疾病实体规范化和化学实体规范化上分别取得了约90%和94.7%的准确率,与有监督学习方法相比具有竞争力;此外,在一个肿瘤匹配数据集上,F1分数绝对提升超过20个百分点。我们还进行了广泛的定性评估,并讨论了在生物医学领域使用大型语言模型的优势与潜在不足。
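
A minimal sketch of the two-stage retrieve-and-rank idea described in the abstract is given below; the character-level TF-IDF embedding and the `llm_rerank` stub are illustrative stand-ins for the language-model embeddings and the LLM call used in the paper.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

ontology = ["myocardial infarction", "type 2 diabetes mellitus", "hypertension"]
mention = "heart attack"

# Stage 1: embed mention and candidate concepts, retrieve top-k by cosine similarity.
vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)).fit(ontology + [mention])
C = vec.transform(ontology).toarray()
q = vec.transform([mention]).toarray()[0]
sims = C @ q / (np.linalg.norm(C, axis=1) * np.linalg.norm(q) + 1e-9)
top_k = [ontology[i] for i in np.argsort(-sims)[:2]]

# Stage 2: put the candidates (and their context) into a prompt and let an LLM re-rank.
prompt = (f"Mention: {mention}\nCandidates:\n" +
          "\n".join(f"- {c}" for c in top_k) +
          "\nReturn the candidate that denotes the same biomedical concept.")

def llm_rerank(prompt):          # placeholder for the large language model call
    return top_k[0]

print(llm_rerank(prompt))
```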

ChatGPT vs. Google: A Comparative Study of Search Performance and User Experience

  • paper_url: http://arxiv.org/abs/2307.01135
  • repo_url: None
  • paper_authors: Ruiyun Xu, Yue Feng, Hailiang Chen
  • for: investigate the differences in user behavior when employing search engines and chatbot tools for information-seeking tasks
  • methods: randomized online experiment, ChatGPT-like tool and Google Search-like tool
  • results: ChatGPT group consistently spends less time on all tasks, ChatGPT levels user search performance across different education levels, perceived information quality and user experience are better with ChatGPT, but it may also lead to overreliance and generate or replicate misinformation.
    Abstract The advent of ChatGPT, a large language model-powered chatbot, has prompted questions about its potential implications for traditional search engines. In this study, we investigate the differences in user behavior when employing search engines and chatbot tools for information-seeking tasks. We carry out a randomized online experiment, dividing participants into two groups: one using a ChatGPT-like tool and the other using a Google Search-like tool. Our findings reveal that the ChatGPT group consistently spends less time on all tasks, with no significant difference in overall task performance between the groups. Notably, ChatGPT levels user search performance across different education levels and excels in answering straightforward questions and providing general solutions but falls short in fact-checking tasks. Users perceive ChatGPT's responses as having higher information quality compared to Google Search, despite displaying a similar level of trust in both tools. Furthermore, participants using ChatGPT report significantly better user experiences in terms of usefulness, enjoyment, and satisfaction, while perceived ease of use remains comparable between the two tools. However, ChatGPT may also lead to overreliance and generate or replicate misinformation, yielding inconsistent results. Our study offers valuable insights for search engine management and highlights opportunities for integrating chatbot technologies into search engine designs.
    摘要 ChatGPT等基于大型语言模型的聊天机器人的出现,引发了其对传统搜索引擎潜在影响的讨论。本研究考察了用户在信息检索任务中使用搜索引擎与聊天机器人工具时的行为差异。我们开展了一项随机在线实验,将参与者分为两组:一组使用类ChatGPT工具,另一组使用类Google搜索工具。结果显示,ChatGPT组在所有任务上花费的时间都更少,而两组的总体任务表现没有显著差异。值得注意的是,ChatGPT拉平了不同教育水平用户的搜索表现,在回答直接问题和提供通用解决方案方面表现出色,但在事实核查任务上表现欠佳。用户认为ChatGPT回答的信息质量高于Google搜索,尽管对两种工具的信任程度相近。此外,使用ChatGPT的参与者在有用性、愉悦度和满意度方面报告了明显更好的用户体验,而两种工具的易用性感知相当。然而,ChatGPT也可能导致过度依赖,并生成或复制错误信息,产生不一致的结果。本研究为搜索引擎管理提供了有价值的启示,并指出了将聊天机器人技术融入搜索引擎设计的机会。

Iterative Zero-Shot LLM Prompting for Knowledge Graph Construction

  • paper_url: http://arxiv.org/abs/2307.01128
  • repo_url: None
  • paper_authors: Salvatore Carta, Alessandro Giuliani, Leonardo Piano, Alessandro Sebastian Podda, Livio Pompianu, Sandro Gabriele Tiddia
  • for: 本研究旨在提出一种可扩展和灵活的知识图生成方法,以解决现有知识图生成技术的瓶颈和局限性。
  • methods: 该方法基于最新的生成大语言模型GPT-3.5,包括迭代提示策略和外部知识无关策略,以解决知识图生成过程中的主要挑战。
  • results: 实验结果表明,该方法可以有效地生成高质量的知识图,并且可以应对不同的应用场景。
    Abstract In the current digitalization era, capturing and effectively representing knowledge is crucial in most real-world scenarios. In this context, knowledge graphs represent a potent tool for retrieving and organizing a vast amount of information in a properly interconnected and interpretable structure. However, their generation is still challenging and often requires considerable human effort and domain expertise, hampering the scalability and flexibility across different application fields. This paper proposes an innovative knowledge graph generation approach that leverages the potential of the latest generative large language models, such as GPT-3.5, that can address all the main critical issues in knowledge graph building. The approach is conveyed in a pipeline that comprises novel iterative zero-shot and external knowledge-agnostic strategies in the main stages of the generation process. Our unique manifold approach may encompass significant benefits to the scientific community. In particular, the main contribution can be summarized by: (i) an innovative strategy for iteratively prompting large language models to extract relevant components of the final graph; (ii) a zero-shot strategy for each prompt, meaning that there is no need for providing examples for "guiding" the prompt result; (iii) a scalable solution, as the adoption of LLMs avoids the need for any external resources or human expertise. To assess the effectiveness of our proposed model, we performed experiments on a dataset that covered a specific domain. We claim that our proposal is a suitable solution for scalable and versatile knowledge graph construction and may be applied to different and novel contexts.
    摘要 在当今数字化时代,捕捉并有效地表达知识是许多实际场景中的关键。在这个 контексте,知识图表示一种可观之的工具,可以快速地检索和组织大量信息,并将其拼接成可读可写的结构。然而,知识图的生成仍然是一个挑战,经常需要大量的人工劳动和领域专业知识,从而限制了扩展性和灵活性在不同应用领域。这篇论文提出了一种创新的知识图生成方法,利用最新的生成大语言模型,如GPT-3.5,解决了知识图生成中的主要挑战。该方法通过一个包含新的迭代零shot和外部知识无关策略的管道来实现。我们的独特 manifoldapproach可能带来了科学社区的重要收益。具体来说,主要贡献可以概括为:(i)一种创新的迭代Prompt大语言模型中提取 relevante组件的策略;(ii)每个Prompt不需要提供示例,即零shot策略;(iii)可扩展的解决方案,因为采用LLMs可以避免任何外部资源或人类专业知识的需求。为评估我们提出的模型效果,我们在特定领域中进行了实验。我们宣称,我们的提案是一种可扩展和多样化的知识图生成方法,可以应用于不同的和新的上下文。
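
The iterative, zero-shot prompting loop described above might look roughly like the following sketch; `call_llm`, the prompt wording, and the stopping rule (no new triples found) are placeholders and assumptions for illustration, not the authors' pipeline.

```python
# Minimal sketch of iterative zero-shot prompting for knowledge graph construction.
def call_llm(prompt: str) -> str:
    return ""   # plug in GPT-3.5 or any other chat model here

def build_graph(document: str, max_rounds: int = 3):
    triples, known_entities = set(), set()
    entity_reply = call_llm(f"List the entities mentioned in:\n{document}")
    known_entities.update(e.strip() for e in entity_reply.split(",") if e.strip())
    for _ in range(max_rounds):                      # iterative refinement
        new = set()
        for ent in known_entities:
            reply = call_llm(
                f"Text:\n{document}\n"
                f"Give (subject, relation, object) facts about '{ent}', one per line.")
            for line in reply.splitlines():
                parts = [p.strip(" ()") for p in line.split(",")]
                if len(parts) == 3:
                    new.add(tuple(parts))
                    known_entities.update({parts[0], parts[2]})
        if new <= triples:                           # stop when nothing new is found
            break
        triples |= new
    return triples
```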

cs.CL - 2023-07-04

Knowledge-Aware Audio-Grounded Generative Slot Filling for Limited Annotated Data

  • paper_url: http://arxiv.org/abs/2307.01764
  • repo_url: https://github.com/the-anonymous-bs/espnet
  • paper_authors: Guangzhi Sun, Chao Zhang, Ivan Vulić, Paweł Budzianowski, Philip C. Woodland
  • for: 提高 task-oriented dialogue (ToD) 系统中的 slot-filling 精度和效率,即使具有有限的标注数据。
  • methods: 提出了一种 Knowledge-Aware Audio-Grounded 框架(KA2G),通过将 text 生成任务和 audio 模式结合起来,实现了数据稀缺下的 slot-filling。KA2G 还使用了可用的外部知识(如预先定义的槽值列表)来进一步提高 slot-filling 的精度。
  • results: KA2G 在 speech-based single-turn SLURP 数据集和一个商业 ToD 系统中的 multi-turn 数据集上进行了实验,并显示了与先前作品相比,特别是在 few-shot 和 zero-shot 设置下,具有强大和一致的提升。
    Abstract Manually annotating fine-grained slot-value labels for task-oriented dialogue (ToD) systems is an expensive and time-consuming endeavour. This motivates research into slot-filling methods that operate with limited amounts of labelled data. Moreover, the majority of current work on ToD is based solely on text as the input modality, neglecting the additional challenges of imperfect automatic speech recognition (ASR) when working with spoken language. In this work, we propose a Knowledge-Aware Audio-Grounded generative slot-filling framework, termed KA2G, that focuses on few-shot and zero-shot slot filling for ToD with speech input. KA2G achieves robust and data-efficient slot filling for speech-based ToD by 1) framing it as a text generation task, 2) grounding text generation additionally in the audio modality, and 3) conditioning on available external knowledge (e.g. a predefined list of possible slot values). We show that combining both modalities within the KA2G framework improves the robustness against ASR errors. Further, the knowledge-aware slot-value generator in KA2G, implemented via a pointer generator mechanism, particularly benefits few-shot and zero-shot learning. Experiments, conducted on the standard speech-based single-turn SLURP dataset and a multi-turn dataset extracted from a commercial ToD system, display strong and consistent gains over prior work, especially in few-shot and zero-shot setups.
    摘要 人工标注细腻槽值标签 для任务对话(ToD)系统是一项昂贵和耗时的努力。这种情况激励了对槽 filling 方法的研究,这些方法可以采用有限量的标注数据。此外,当前大多数 ToD 研究仅基于文本输入模式,忽略了自动语音识别(ASR)的额外挑战。在这种工作中,我们提出了一个知识感知音频根据 generator 框架(KA2G),这个框架专注于几 shot 和零 shot 槽 filling для speech-based ToD。KA2G 通过以下三种方法实现了稳定和数据效果的槽 filling:1. 将其视为文本生成任务。2. 在音频模式中进一步地固定文本生成。3. 使用可用的外部知识(如预定的槽值列表)进行条件。我们发现,在 KA2G 框架中结合两个模式可以提高对 ASR 错误的强度。此外,KA2G 中的知识感知槽值生成器,通过使用指针生成机制实现,尤其是在几 shot 和零 shot 学习中具有优势。我们在标准的speech-based single-turn SLURP 数据集和一个商业 ToD 系统提取的多turn 数据集上进行了实验,并表现出了强大和一致的提升,特别是在几 shot 和零 shot 设置下。

Align With Purpose: Optimize Desired Properties in CTC Models with a General Plug-and-Play Framework

  • paper_url: http://arxiv.org/abs/2307.01715
  • repo_url: None
  • paper_authors: Eliya Segev, Maya Alroy, Ronen Katsir, Noam Wies, Ayana Shenhav, Yael Ben-Oren, David Zar, Oren Tadmor, Jacob Bitterman, Amnon Shashua, Tal Rosenwein
  • for: 提高 Automatic Speech Recognition (ASR) 模型的某些特性,如采样时间和单词错误率 (WER)。
  • methods: 提出了一种通用的 Plug-and-Play 框架,可以补充 CTCP 损失函数,以便根据某些愿望的属性进行优化。
  • results: 在 ASR 领域中应用该框架,可以提高 emission time 的优化效果,最多提高 570ms,同时只有较少影响 WER 的准确率。此外,还可以提高 WER 的准确率,相比基eline模型,提高了4.5%。
    Abstract Connectionist Temporal Classification (CTC) is a widely used criterion for training supervised sequence-to-sequence (seq2seq) models. It enables learning the relations between input and output sequences, termed alignments, by marginalizing over perfect alignments (that yield the ground truth), at the expense of imperfect alignments. This binary differentiation of perfect and imperfect alignments falls short of capturing other essential alignment properties that hold significance in other real-world applications. Here we propose $\textit{Align With Purpose}$, a $\textbf{general Plug-and-Play framework}$ for enhancing a desired property in models trained with the CTC criterion. We do that by complementing the CTC with an additional loss term that prioritizes alignments according to a desired property. Our method does not require any intervention in the CTC loss function, enables easy optimization of a variety of properties, and allows differentiation between both perfect and imperfect alignments. We apply our framework in the domain of Automatic Speech Recognition (ASR) and show its generality in terms of property selection, architectural choice, and scale of training dataset (up to 280,000 hours). To demonstrate the effectiveness of our framework, we apply it to two unrelated properties: emission time and word error rate (WER). For the former, we report an improvement of up to 570ms in latency optimization with a minor reduction in WER, and for the latter, we report a relative improvement of 4.5% WER over the baseline models. To the best of our knowledge, these applications have never been demonstrated to work on a scale of data as large as ours. Notably, our method can be implemented using only a few lines of code, and can be extended to other alignment-free loss functions and to domains other than ASR.
    摘要 Connectionist Temporal Classification (CTC) 是一种广泛使用的训练监督序列到序列(seq2seq)模型的评价标准。它使得学习输入和输出序列之间的关系,称为对齐,通过排除完美对齐(导致真实的输出)的权重,以换取不完美对齐。这种二分法对于真实世界应用中的其他重要对齐特性不足。我们提出了《Align With Purpose》,一种通用的插件和替换框架,可以增强模型在 CTc 评价标准下的愿望特性。我们通过补充 CTc loss函数中的一个额外损失项来实现这一点,该项将对齐按照愿望特性进行优先级排序。我们的方法不需要对 CTc loss函数进行任何改变,可以轻松地优化多种特性,并允许对不完美对齐进行区分。我们在自动语音识别(ASR)领域应用了我们的框架,并在不同的特性、模型选择和训练数据集大小(最大达280,000小时)上进行了通用性测试。为证明我们的框架的有效性,我们在两种不相关的特性上应用了它:发射时间和单词错误率(WER)。对于前者,我们report了最多570ms的延迟优化和一定的WER降低,对于后者,我们report了相对于基eline模型的4.5% WER提升。到目前为止,这些应用都没有在这样大的数据集上进行过。值得注意的是,我们的方法只需要几行代码实现,并且可以扩展到其他对齐无法损失函数和领域。
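
To make the plug-and-play idea concrete, here is a hedged PyTorch sketch that adds a property-oriented term on top of the standard CTC loss; the latency term (expected frame index of non-blank probability mass) is one illustrative choice of property, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def ctc_with_property(logits, targets, in_lens, tgt_lens, lam=0.1, blank=0):
    # logits: (T, N, C) raw scores from the acoustic model.
    log_probs = F.log_softmax(logits, dim=-1)
    ctc = F.ctc_loss(log_probs, targets, in_lens, tgt_lens, blank=blank)

    # Illustrative "emission time" property: expected frame index of non-blank mass.
    probs = log_probs.exp()
    non_blank = 1.0 - probs[..., blank]                          # (T, N)
    frame_idx = torch.arange(logits.size(0), device=logits.device).unsqueeze(1)
    latency = (non_blank * frame_idx).sum(0) / (non_blank.sum(0) + 1e-6)
    return ctc + lam * latency.mean()

# Toy usage
T, N, C, S = 50, 2, 10, 5
loss = ctc_with_property(torch.randn(T, N, C, requires_grad=True),
                         torch.randint(1, C, (N, S)),
                         torch.full((N,), T, dtype=torch.long),
                         torch.full((N,), S, dtype=torch.long))
loss.backward()
```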

Dipping PLMs Sauce: Bridging Structure and Text for Effective Knowledge Graph Completion via Conditional Soft Prompting

  • paper_url: http://arxiv.org/abs/2307.01709
  • repo_url: https://github.com/chenchens190009/csprom-kg
  • paper_authors: Chen Chen, Yufei Wang, Aixin Sun, Bing Li, Kwok-Yan Lam
  • for: 本研究的目的是提高知识图谱完成(KGC) task 的效果,通过维护知识图谱的结构信息和文本信息之间的平衡。
  • methods: 本研究提出了一种名为 CSProm-KG(Conditional Soft Prompts for KGC)的方法,它只是根据实体和关系表示生成的条件软提示参数进行调整。
  • results: 对三个常见的静态 KGC 测试集 WN18RR、FB15K-237 和 Wikidata5M 以及两个时间 KGC 测试集 ICEWS14 和 ICEWS05-15 进行测试,CSProm-KG 表现出色,超越了比较基eline模型。我们还进行了进一步的分析,以证明我们的提出的组件的有效性、CSProm-KG 的效率和其可变性。
    Abstract Knowledge Graph Completion (KGC) often requires both KG structural and textual information to be effective. Pre-trained Language Models (PLMs) have been used to learn the textual information, usually under the fine-tune paradigm for the KGC task. However, the fine-tuned PLMs often overwhelmingly focus on the textual information and overlook structural knowledge. To tackle this issue, this paper proposes CSProm-KG (Conditional Soft Prompts for KGC) which maintains a balance between structural information and textual knowledge. CSProm-KG only tunes the parameters of Conditional Soft Prompts that are generated by the entities and relations representations. We verify the effectiveness of CSProm-KG on three popular static KGC benchmarks WN18RR, FB15K-237 and Wikidata5M, and two temporal KGC benchmarks ICEWS14 and ICEWS05-15. CSProm-KG outperforms competitive baseline models and sets new state-of-the-art on these benchmarks. We conduct further analysis to show (i) the effectiveness of our proposed components, (ii) the efficiency of CSProm-KG, and (iii) the flexibility of CSProm-KG.
    摘要 知识图结束 (KGC) 常常需要知识图结构和文本信息同时进行效果。先训练语言模型 (PLMs) 已经被用来学习文本信息,通常在细致调参 paradigm 中进行 KGC 任务。然而,细致调参 PLMs 经常偏重于文本信息,忽略知识图结构。为了解决这个问题,这篇论文提出了 CSProm-KG (Conditional Soft Prompts for KGC),它保持了知识图结构和文本知识之间的平衡。CSProm-KG 只是调整基于实体和关系表示的 Conditional Soft Prompts 的参数。我们证明了 CSProm-KG 在三个流行的静态 KGC 标准测试集 WN18RR、FB15K-237 和 Wikidata5M 上表现出色,并在两个时间 KGC 标准测试集 ICEWS14 和 ICEWS05-15 上设置新的状态纪录。我们进一步分析表明(i)我们提posed的组件的效果,(ii)CSProm-KG 的效率,以及(iii)CSProm-KG 的灵活性。
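
A minimal sketch of conditional soft prompting is shown below: a small trainable module maps entity and relation embeddings to a sequence of prompt vectors that are prepended to a frozen PLM's input embeddings; the dimensions, prompt length, and names are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ConditionalPromptGenerator(nn.Module):
    def __init__(self, kg_dim=200, plm_dim=768, prompt_len=10):
        super().__init__()
        self.proj = nn.Linear(2 * kg_dim, prompt_len * plm_dim)
        self.prompt_len, self.plm_dim = prompt_len, plm_dim

    def forward(self, ent_emb, rel_emb):
        # Condition the soft prompt on the (entity, relation) pair.
        p = self.proj(torch.cat([ent_emb, rel_emb], dim=-1))
        return p.view(-1, self.prompt_len, self.plm_dim)

def prepend_prompt(prompt_embs, token_embs):
    # token_embs: (B, L, plm_dim) word embeddings from a *frozen* PLM;
    # only the prompt generator's parameters receive gradients.
    return torch.cat([prompt_embs, token_embs], dim=1)

gen = ConditionalPromptGenerator()
prompt = gen(torch.randn(4, 200), torch.randn(4, 200))
inputs = prepend_prompt(prompt, torch.randn(4, 32, 768))   # (4, 42, 768)
```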

  • paper_url: http://arxiv.org/abs/2307.01693
  • repo_url: None
  • paper_authors: Rohan Jinturkar
  • for: 这篇论文探讨了美国法律中的种族偏见,具体来说是法官的言论中是否存在种族偏见,以及这种偏见是否随时间和地区而变化。
  • methods: 作者使用了一种方法来测量大规模文本中的隐性种族偏见,并应用了这种方法来分析600万多个美国联邦和州法院案例文献从1860年到2009年。
  • results: 研究发现,美国法官的言论中存在强烈的种族偏见,传统的黑人名字更加与“不愉快”的词语相关,而传统的白人名字更加与“愉快”的词语相关。此外,研究还发现,在1950年之前的法律意见中没有发现更高的隐性种族偏见,nor did legal opinions from Northeastern states show greater change in racial bias over time compared to Southern states.
    Abstract Although there is widespread recognition of racial bias in US law, it is unclear how such bias appears in the language of law, namely judicial opinions, and whether it varies across time period or region. Building upon approaches for measuring implicit racial bias in large-scale corpora, we approximate GloVe word embeddings for over 6 million US federal and state court cases from 1860 to 2009. We find strong evidence of racial bias across nearly all regions and time periods, as traditionally Black names are more closely associated with pre-classified "unpleasant" terms whereas traditionally White names are more closely associated with pre-classified "pleasant" terms. We also test whether legal opinions before 1950 exhibit more implicit racial bias than those after 1950, as well as whether opinions from Southern states exhibit less change in racial bias than those from Northeastern states. We do not find evidence of elevated bias in legal opinions before 1950, or evidence that legal opinions from Northeastern states show greater change in racial bias over time compared to Southern states. These results motivate further research into institutionalized racial bias.
    摘要 尽管美国法律界存在普遍的种族偏见,但是未知如何在法律语言中表现出这种偏见,以及是否随时间或地区而变化。我们基于大规模文本潜在偏见测量方法,对1860年至2009年美国联邦和州法院案例600万起进行了 aproximate GloVe词嵌入。我们发现在大多数地区和时间期间,传统的黑人名字更加密切相关于预先分类的“不愉快” terms,而传统的白人名字更加密切相关于预先分类的“愉悦” terms。我们还测试了1950年之前的法律意见是否具有更高的潜在种族偏见,以及南部州法律意见是否在时间的推移中改变了种族偏见的变化。我们未能发现1950年之前的法律意见具有偏高的偏见,也未能发现南部州法律意见在时间的推移中改变种族偏见的变化。这些结果激励进一步研究 институциализи了种族偏见。
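
The association test described above can be approximated with a WEAT-style effect size over word embeddings, as in the sketch below; the word lists and random vectors are toy stand-ins for the GloVe embeddings trained on the court opinions.

```python
import numpy as np

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

def association(w, pleasant, unpleasant, emb):
    return (np.mean([cos(emb[w], emb[p]) for p in pleasant]) -
            np.mean([cos(emb[w], emb[u]) for u in unpleasant]))

def weat_effect(names_a, names_b, pleasant, unpleasant, emb):
    # Positive values: names_a sit closer to the "pleasant" terms than names_b do.
    sa = [association(w, pleasant, unpleasant, emb) for w in names_a]
    sb = [association(w, pleasant, unpleasant, emb) for w in names_b]
    pooled = np.std(sa + sb, ddof=1) + 1e-9
    return (np.mean(sa) - np.mean(sb)) / pooled

rng = np.random.default_rng(0)
vocab = ["emily", "greg", "lakisha", "jamal", "joy", "peace", "agony", "prison"]
emb = {w: rng.normal(size=50) for w in vocab}     # stand-in for GloVe vectors
print(weat_effect(["emily", "greg"], ["lakisha", "jamal"],
                  ["joy", "peace"], ["agony", "prison"], emb))
```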

Robust Hate Speech Detection in Social Media: A Cross-Dataset Empirical Evaluation

  • paper_url: http://arxiv.org/abs/2307.01680
  • repo_url: None
  • paper_authors: Dimosthenis Antypas, Jose Camacho-Collados
  • for: 这篇论文的目的是探讨在自然语言处理(NLP)领域中自动推测仇恨言论的研究。大多数前一 studies 都是基于社交媒体数据集,这些数据集的创建过程中含有自己的偏见,而模型从这些数据集偏见中学习。
  • methods: 在这篇论文中,我们使用了大规模的训练语言模型,并在不同的仇恨言论检测数据集上进行了精致的调整。我们还进行了许多数据集的比较,以探讨不同数据集在培训 hate speech detection 模型时的可行性。
  • results: 我们的实验结果显示,不同的数据集在培训 hate speech detection 模型时有所不同的可行性。其中,一些数据集更加普遍,可以在不同的背景下进行应用。此外,我们发现可以通过 комбінуing 不同的数据集来建立更加Robust的 hate speech detection 模型,这个 Robustness 甚至在控制data size 和比较最佳个别数据集时仍然保持。
    Abstract The automatic detection of hate speech online is an active research area in NLP. Most of the studies to date are based on social media datasets that contribute to the creation of hate speech detection models trained on them. However, data creation processes contain their own biases, and models inherently learn from these dataset-specific biases. In this paper, we perform a large-scale cross-dataset comparison where we fine-tune language models on different hate speech detection datasets. This analysis shows how some datasets are more generalisable than others when used as training data. Crucially, our experiments show how combining hate speech detection datasets can contribute to the development of robust hate speech detection models. This robustness holds even when controlling by data size and compared with the best individual datasets.
    摘要 自然语言处理(NLP)领域中自动发现仇恨言论在线是一个活跃的研究领域。大多数研究到目前为止都是基于社交媒体数据集,这些数据集在创建仇恨言论检测模型时提供了贡献。然而,数据创建过程中带有自己的偏见,模型从这些数据集特定的偏见中学习。在这篇论文中,我们进行了大规模的跨数据集比较,我们在不同的仇恨言论检测数据集上进行了精细的微调。这一分析表明了某些数据集在用于训练模型时更加通用,而且我们的实验显示,将多个仇恨言论检测数据集组合起来可以帮助建立更加鲁棒的仇恨言论检测模型。这种鲁棒性甚至在控制数据量和相比最佳单个数据集的情况下保持。
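
A cross-dataset robustness study of this kind can be organized as a simple train/evaluate grid, sketched below with a TF-IDF plus logistic-regression stand-in for the fine-tuned language models; the `combined` source mimics the dataset-combination setting, and the dataset names are placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

def cross_dataset_grid(datasets):
    """datasets: dict name -> (texts, labels). Train on each source (and on the
    union of all sources) and report macro-F1 on every target dataset."""
    sources = {**datasets,
               "combined": (sum((d[0] for d in datasets.values()), []),
                            sum((d[1] for d in datasets.values()), []))}
    results = {}
    for src, (x_tr, y_tr) in sources.items():
        clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
        clf.fit(x_tr, y_tr)
        for tgt, (x_te, y_te) in datasets.items():
            results[(src, tgt)] = f1_score(y_te, clf.predict(x_te), average="macro")
    return results

# usage: cross_dataset_grid({"dataset_a": (texts_a, labels_a),
#                            "dataset_b": (texts_b, labels_b)})
```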

Disentanglement in a GAN for Unconditional Speech Synthesis

  • paper_url: http://arxiv.org/abs/2307.01673
  • repo_url: https://github.com/rf5/simple-asgan
  • paper_authors: Matthew Baas, Herman Kamper
  • for: 这个研究旨在开发一个可以直接从潜在空间生成真实语音的模型,不需要明确的条件。
  • methods: 这个模型基于StyleGAN家族的内生对称网络,可以将抽象的噪音映射到一个分离的潜在空间中,然后将这个潜在空间映射到一系列的语音特征,以消除干扰信号扩散。
  • results: 在使用Google Speech Commands数据集的小词库数据集上,ASGAN已经取得了顶尖的结果,并且比 existing 的扩散模型快得多。此外,我们还证明了 ASGAN 的潜在空间是分离的,可以使用Simple linear operations在这个空间中进行多个未见 durante 训练的任务。
    Abstract Can we develop a model that can synthesize realistic speech directly from a latent space, without explicit conditioning? Despite several efforts over the last decade, previous adversarial and diffusion-based approaches still struggle to achieve this, even on small-vocabulary datasets. To address this, we propose AudioStyleGAN (ASGAN) -- a generative adversarial network for unconditional speech synthesis tailored to learn a disentangled latent space. Building upon the StyleGAN family of image synthesis models, ASGAN maps sampled noise to a disentangled latent vector which is then mapped to a sequence of audio features so that signal aliasing is suppressed at every layer. To successfully train ASGAN, we introduce a number of new techniques, including a modification to adaptive discriminator augmentation which probabilistically skips discriminator updates. We apply it on the small-vocabulary Google Speech Commands digits dataset, where it achieves state-of-the-art results in unconditional speech synthesis. It is also substantially faster than existing top-performing diffusion models. We confirm that ASGAN's latent space is disentangled: we demonstrate how simple linear operations in the space can be used to perform several tasks unseen during training. Specifically, we perform evaluations in voice conversion, speech enhancement, speaker verification, and keyword classification. Our work indicates that GANs are still highly competitive in the unconditional speech synthesis landscape, and that disentangled latent spaces can be used to aid generalization to unseen tasks. Code, models, samples: https://github.com/RF5/simple-asgan/

Boosting Norwegian Automatic Speech Recognition

  • paper_url: http://arxiv.org/abs/2307.01672
  • repo_url: None
  • paper_authors: Javier de la Rosa, Rolv-Arild Braaten, Per Egil Kummervold, Freddy Wetjen, Svein Arne Brygfjeld
  • for: 本研究为自动speech recognition(ASR)模型在挪威官方文字两种语言中提供了多个基线。
  • methods: 本研究使用不同大小和预训练方法的ASR模型在多个挪威语音dataset上进行了比较。同时,我们也测试了这些模型在之前的状态艺术模型和尘泥dataset上的性能。
  • results: 我们在挪威议会语音 corpus(NPSC)上从单词错误率(WER)17.10%下降至7.60%,模型在挪威语言中获得了5.81%的最佳状态。同时,我们还讨论了进一步改进ASR模型的挑战和解决方案。
    Abstract In this paper, we present several baselines for automatic speech recognition (ASR) models for the two official written languages in Norway: Bokm{\aa}l and Nynorsk. We compare the performance of models of varying sizes and pre-training approaches on multiple Norwegian speech datasets. Additionally, we measure the performance of these models against previous state-of-the-art ASR models, as well as on out-of-domain datasets. We improve the state of the art on the Norwegian Parliamentary Speech Corpus (NPSC) from a word error rate (WER) of 17.10\% to 7.60\%, with models achieving 5.81\% for Bokm{\aa}l and 11.54\% for Nynorsk. We also discuss the challenges and potential solutions for further improving ASR models for Norwegian.
    摘要 在这篇论文中,我们为挪威的两种官方书面语言——书面挪威语(Bokmål)和新挪威语(Nynorsk)——提供了多个自动语音识别(ASR)基线模型。我们在多个挪威语音数据集上比较了不同规模和预训练方式的模型的性能,并将这些模型与此前最先进的 ASR 模型以及域外数据集上的表现进行了对比。我们将挪威议会语音语料库(NPSC)上的词错误率(WER)从 17.10% 降低到 7.60%,其中 Bokmål 为 5.81%,Nynorsk 为 11.54%。我们还讨论了进一步改进挪威语 ASR 模型的挑战与可能的解决方案。

Unified Conversational Models with System-Initiated Transitions between Chit-Chat and Task-Oriented Dialogues

  • paper_url: http://arxiv.org/abs/2307.01664
  • repo_url: None
  • paper_authors: Ye Liu, Stefan Ultes, Wolfgang Minker, Wolfgang Maier
  • for: 这个论文的目的是研究在对话模型中实现功能目标和社交对话的联合模型,以及在对话模式之间发生转变时的启发机制。
  • methods: 这个论文使用了两种类型的对话场景,一种从社交对话逐渐转移到任务 Orientated 请求,另一种从任务 Orientated 交互开始,然后在所有请求信息都提供后转移到社交对话。作者还提出了两种有效的启发模型,一种是一个精确的批量模型,另一种是一个连续的启发模型使用自动生成的推荐embeddings。
  • results: 研究发现,连续启发模型可以在多个领域任务中实现更高的转换效果,并且可以用于指导对话模型在不同领域之间的批量转换。
    Abstract Spoken dialogue systems (SDSs) have been separately developed under two different categories, task-oriented and chit-chat. The former focuses on achieving functional goals and the latter aims at creating engaging social conversations without special goals. Creating a unified conversational model that can engage in both chit-chat and task-oriented dialogue is a promising research topic in recent years. However, the potential ``initiative'' that occurs when there is a change between dialogue modes in one dialogue has rarely been explored. In this work, we investigate two kinds of dialogue scenarios, one starts from chit-chat implicitly involving task-related topics and finally switching to task-oriented requests; the other starts from task-oriented interaction and eventually changes to casual chat after all requested information is provided. We contribute two efficient prompt models which can proactively generate a transition sentence to trigger system-initiated transitions in a unified dialogue model. One is a discrete prompt model trained with two discrete tokens, the other one is a continuous prompt model using continuous prompt embeddings automatically generated by a classifier. We furthermore show that the continuous prompt model can also be used to guide the proactive transitions between particular domains in a multi-domain task-oriented setting.
    摘要 干脆对话系统(SDS)已经分别开发出了两类:任务oriented和聊天。前者关注实现功能目标,而后者想创造有趣的社交对话没有特定目标。在最近几年中,创建一个综合对话模型可以在一个对话中同时进行聊天和任务oriented对话是一个有前途的研究话题。然而,在对话模式之间的变化中可能会发生的“发起”(initiative) rarely been explored。在这项工作中,我们研究了两种对话场景:一个从聊天逐渐涉及到任务相关话题,最后转换到任务oriented请求;另一个从任务oriented交互开始,最后变成了聊天。我们提出了两种高效的提示模型,可以触发系统自主发起对话模式的转换。一个是使用两个简单的Token进行训练的批示模型,另一个是使用自动生成的连续提示嵌入数据来 guideline 系统自主转换的连续提示模型。此外,我们还证明了连续提示模型可以在多个领域任务oriented Setting 中用于指导系统自主转换。

Chain of Thought Prompting Elicits Knowledge Augmentation

  • paper_url: http://arxiv.org/abs/2307.01640
  • repo_url: https://github.com/ruckbreasoning/cot-ka
  • paper_authors: Dingjun Wu, Jing Zhang, Xinmei Huang
  • for: 这篇论文旨在提出一种基于链条思维(Chain-of-Thought,CoT)的知识增强深度学习方法(Knowledge-Augmented Deep Learning,KADL)。
  • methods: 这种方法使用大语言模型进行广泛预训练,然后将其作为外部知识集成到深度学习模型中。
  • results: 对于多种逻辑任务的 eleven 个公共数据集上,CoT-KA 方法比纯CoT方法和非增强方法表现出色,得到了更高的性能。
    Abstract The knowledge-augmented deep learning paradigm refers to a paradigm in which domain knowledge is identified and integrated into deep models. Conventional methods typically employ task-specific approaches to gather external knowledge from various sources. In contrast, large language models are extensively pre-trained and can serve as a comprehensive source of external knowledge. In this paper, we propose CoT-KA, a Chain-of-Thought-based method that augments knowledge for deep learning. CoT-KA avoids the need for additional knowledge retrieval or knowledge reasoning models, as required in conventional augmentation methods. Our results demonstrate that CoT-KA outperforms both pure CoT-based methods and the non-augmented method across the majority of eleven publicly available benchmarks for various reasoning tasks.
    摘要 知识增强深度学习方式指的是一种将领域知识集成到深度模型中的方法。传统方法通常采用任务特定的方法来从多种来源中收集外部知识。然而,大型语言模型已经广泛预训练,可以作为外部知识的全面来源。在这篇论文中,我们提出了基于链条思想的CoT-KA方法,用于增强深度学习。CoT-KA不需要额外的知识检索或知识推理模型,与传统增强方法不同。我们的结果表明,CoT-KA在多种公共可用的benchmark上比纯CoT方法和非增强方法表现出色,其中大多数任务的性能都高于非增强方法。

A Language Model for Grammatical Error Correction in L2 Russian

  • paper_url: http://arxiv.org/abs/2307.01609
  • repo_url: None
  • paper_authors: Nikita Remnev, Sergei Obiedkov, Ekaterina Rakhilina, Ivan Smirnov, Anastasia Vyrenkova
  • for: correction of non-native (L2) writing errors in Russian language
  • methods: use of a language model trained on untagged texts of the Newspaper subcorpus of the Russian National Corpus
  • results: validation of the model’s quality against the RULEC-GEC corpus
    Abstract Grammatical error correction is one of the fundamental tasks in Natural Language Processing. For the Russian language, most of the spellcheckers available correct typos and other simple errors with high accuracy, but often fail when faced with non-native (L2) writing, since the latter contains errors that are not typical for native speakers. In this paper, we propose a pipeline involving a language model intended for correcting errors in L2 Russian writing. The language model proposed is trained on untagged texts of the Newspaper subcorpus of the Russian National Corpus, and the quality of the model is validated against the RULEC-GEC corpus.
    摘要 语法纠错是自然语言处理中的基本任务之一。对于俄语,现有的大多数拼写检查器能够较准确地纠正拼写错误和其他简单错误,但在面对非母语(L2)写作时往往失效,因为其中包含母语者不常出现的错误类型。在这篇论文中,我们提出了一个基于语言模型的流程,用于纠正 L2 俄语写作中的错误。该语言模型在俄语国家语料库报刊子集(Newspaper subcorpus)的未标注文本上训练,并使用 RULEC-GEC 语料库对模型质量进行了验证。

Mitigating the Learning Bias towards Repetition by Self-Contrastive Training for Open-Ended Generation

  • paper_url: http://arxiv.org/abs/2307.01542
  • repo_url: https://github.com/thu-coai/selfcont
  • paper_authors: Jian Guan, Minlie Huang
  • for: 提高自然语言生成 tasks 中的多样性,尤其是使用 GPT2 预训练语言模型进行开放式生成时,具有重复性的问题。
  • methods: 我们提出了一种自我对比训练方法,通过对同一模型的 premature checkpoint 的输出进行罚款,以避免重复性的过度估计。
  • results: 我们在两个 datasets 上进行了实验,发现这种方法可以有效地避免重复性,同时保持流畅性。此外,我们发现Language Models 在预测重复Token时使用更长的词语关系,可能是句子水平重复的原因。
    Abstract Despite the huge progress in myriad generation tasks, pretrained language models (LMs) such as GPT2 still tend to generate repetitive texts with maximization-based decoding algorithms for open-ended generation. We attribute their overestimation of token-level repetition probabilities to the learning bias: LMs capture simple repetitive patterns faster with the MLE loss. We propose self-contrastive training to penalize the output of a premature checkpoint of the same model when it incorrectly predicts repetition, which is shown to mitigate repetition effectively while maintaining fluency on two datasets. Furthermore, we find that LMs use longer-range dependencies to predict repetitive tokens than non-repetitive ones, which may be the cause of sentence-level repetition loops.
    摘要 尽管在许多生成任务中进步很大,预训练语言模型(LM)如GPT2仍然很容易通过最大化基于解码算法来生成重复的文本。我们认为这是因为学习偏见:LM学习了简单的重复模式更快,使得它们在MLE损失函数下过度估计token级别的重复概率。我们提议使用自我对比训练来追加预训练模型的检查点,并在检查点不正确预测重复时进行惩罚,这有效地减少了重复,同时保持了流畅性在两个数据集上。此外,我们发现LM在预测重复token时使用了更长的距离,这可能是句子水平的重复循环的原因。
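
One way to read the self-contrastive idea is sketched below: keep the usual MLE term and additionally push down the model's probability on tokens that a premature checkpoint predicts incorrectly (typically repetitions); this simplified loss is an assumption for illustration, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def self_contrastive_loss(logits, premature_logits, gold, alpha=0.5):
    # logits, premature_logits: (B, L, V); gold token ids: (B, L)
    mle = F.cross_entropy(logits.flatten(0, 1), gold.flatten())
    with torch.no_grad():
        amateur_pred = premature_logits.argmax(-1)      # premature checkpoint's guess
        wrong = amateur_pred.ne(gold).float()           # penalise only its mistakes
    log_p = F.log_softmax(logits, dim=-1)
    p_amateur_tok = log_p.gather(-1, amateur_pred.unsqueeze(-1)).squeeze(-1).exp()
    penalty = (p_amateur_tok * wrong).sum() / (wrong.sum() + 1e-6)
    return mle + alpha * penalty

loss = self_contrastive_loss(torch.randn(2, 8, 100, requires_grad=True),
                             torch.randn(2, 8, 100),
                             torch.randint(0, 100, (2, 8)))
```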

On Evaluating and Mitigating Gender Biases in Multilingual Settings

  • paper_url: http://arxiv.org/abs/2307.01503
  • repo_url: None
  • paper_authors: Aniket Vashishtha, Kabir Ahuja, Sunayana Sitaram
  • for: 本研究旨在 investigate the challenges of evaluating and mitigating biases in multilingual settings, especially for non-western context.
  • methods: 本研究使用 human annotations 创建了一个用于评估 gender biases 的指标,并将 existing debiasing methods 扩展到不同的印度语言。
  • results: 研究发现了 multilingual settings 中 studying social biases 的挑战,并提供了资源和 mitigation techniques 以逐步扩展到更多的语言。
    Abstract While understanding and removing gender biases in language models has been a long-standing problem in Natural Language Processing, prior research work has primarily been limited to English. In this work, we investigate some of the challenges with evaluating and mitigating biases in multilingual settings which stem from a lack of existing benchmarks and resources for bias evaluation beyond English especially for non-western context. In this paper, we first create a benchmark for evaluating gender biases in pre-trained masked language models by extending DisCo to different Indian languages using human annotations. We extend various debiasing methods to work beyond English and evaluate their effectiveness for SOTA massively multilingual models on our proposed metric. Overall, our work highlights the challenges that arise while studying social biases in multilingual settings and provides resources as well as mitigation techniques to take a step toward scaling to more languages.
    摘要 tradicional,理解并消除语言模型中的性别偏见问题一直是自然语言处理领域的长期问题,但之前的研究主要集中在英语上。在这项工作中,我们探讨了在多语言设置中评估和消除偏见的挑战,以及由于英语以外的语言缺乏现有的偏见评估 benchmark和资源而导致的问题。在这篇论文中,我们首先创建了评估隐藏语言模型中的性别偏见的benchmark,通过对印度语言进行扩展DisCo以获得人工纠正。然后,我们扩展了不同的去偏见方法以工作在英语以外的语言上,并评估这些方法在我们提出的指标上的效果。总之,我们的工作揭示了在多语言设置中研究社会偏见的挑战,并提供了资源以及消除技术,以便扩展到更多的语言。

SCAT: Robust Self-supervised Contrastive Learning via Adversarial Training for Text Classification

  • paper_url: http://arxiv.org/abs/2307.01488
  • repo_url: None
  • paper_authors: Junjie Wu, Dit-Yan Yeung
  • for: 防御文本攻击,提高自然语言处理(NLP)系统的鲁棒性。
  • methods: 自我标注对假数据进行随机修饰,并通过对这些修饰和其对应的对抗样本进行对比来实现对抗训练。
  • results: 可以很好地训练不含标签数据的语言模型,并且可以提高现有预训练语言模型的鲁棒性。
    Abstract Despite their promising performance across various natural language processing (NLP) tasks, current NLP systems are vulnerable to textual adversarial attacks. To defend against these attacks, most existing methods apply adversarial training by incorporating adversarial examples. However, these methods have to rely on ground-truth labels to generate adversarial examples, rendering it impractical for large-scale model pre-training which is commonly used nowadays for NLP and many other tasks. In this paper, we propose a novel learning framework called SCAT (Self-supervised Contrastive Learning via Adversarial Training), which can learn robust representations without requiring labeled data. Specifically, SCAT modifies random augmentations of the data in a fully labelfree manner to generate adversarial examples. Adversarial training is achieved by minimizing the contrastive loss between the augmentations and their adversarial counterparts. We evaluate SCAT on two text classification datasets using two state-of-the-art attack schemes proposed recently. Our results show that SCAT can not only train robust language models from scratch, but it can also significantly improve the robustness of existing pre-trained language models. Moreover, to demonstrate its flexibility, we show that SCAT can also be combined with supervised adversarial training to further enhance model robustness.
    摘要 尽管现有的自然语言处理(NLP)系统在不同的任务上表现出色,但它们对文本恶作剂攻击仍然易受到影响。为防止这些攻击,大多数现有的方法采用了对抗训练,但这些方法往往需要使用准确的标签来生成对抗示例,这使得大规模模型预训练成为不可能的。在这篇论文中,我们提出了一种新的学习框架,即SCAT(自主对抗学习),可以不需要标签数据来学习强化表示。具体来说,SCAT通过修改数据的随机扩展来生成对抗示例,然后通过对这些对抗示例和其对抗样本进行对抗训练来减少对抗攻击的影响。我们在两个文本分类任务上使用了两种最新的攻击方案进行评估。我们的结果显示,SCAT不仅可以从零开始训练Robust语言模型,而且还可以显著提高现有预训练语言模型的Robust性。此外,我们还证明了SCAT可以与有监督对抗训练结合使用,以进一步增强模型的Robust性。
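
The adversarial-contrastive part of SCAT can be pictured as an InfoNCE-style loss between an augmented view and its adversarial counterpart, as in the hedged sketch below; the label-free generation of the adversarial examples themselves is omitted.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z_aug, z_adv, temperature=0.1):
    # z_aug, z_adv: (B, D) sentence representations of the two views.
    z_aug, z_adv = F.normalize(z_aug, dim=-1), F.normalize(z_adv, dim=-1)
    logits = z_aug @ z_adv.t() / temperature          # (B, B) similarity matrix
    labels = torch.arange(z_aug.size(0), device=z_aug.device)
    # Each augmented view should be closest to its own adversarial counterpart.
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))

loss = contrastive_loss(torch.randn(8, 128), torch.randn(8, 128))
```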

CARE-MI: Chinese Benchmark for Misinformation Evaluation in Maternity and Infant Care

  • paper_url: http://arxiv.org/abs/2307.01458
  • repo_url: https://github.com/meetyou-ai-lab/care-mi
  • paper_authors: Tong Xiang, Liangzhi Li, Wangyue Li, Mingbai Bai, Lu Wei, Bowen Wang, Noa Garcia
  • for: This paper evaluates the misinformation generated by large language models (LLMs) on the sensitive topic of maternity and infant care, and provides a benchmark for assessing the quality of long-form generation in Chinese.
  • methods: The paper introduces a new benchmark, named CARE-MI, to evaluate LLM misinformation in the maternity and infant care domain, and compares potential solutions for long-form generation evaluation.
  • results: The paper finds that current Chinese LLMs still have a long way to go in this domain, and proposes a judgment model for automatically assessing the long-form output of LLMs using the benchmark questions.
    Abstract The recent advances in NLP, have led to a new trend of applying LLMs to real-world scenarios. While the latest LLMs are astonishingly fluent when interacting with humans, they suffer from the misinformation problem by unintentionally generating factually false statements. This can lead to harmful consequences, especially when produced within sensitive contexts, such as healthcare. Yet few previous works have focused on evaluating misinformation in the long-form generation of LLMs, especially for knowledge-intensive topics. Moreover, although LLMs have been shown to perform well in different languages, misinformation evaluation has been mostly conducted in English. To this end, we present a benchmark, CARE-MI, for evaluating LLM misinformation in: 1) a sensitive topic, specifically the maternity and infant care domain; and 2) a language other than English, namely Chinese. Most importantly, we provide an innovative paradigm for building long-form generation evaluation benchmarks that can be transferred to other knowledge-intensive domains and low-resourced languages. Our proposed benchmark fills the gap between the extensive usage of LLMs and the lack of datasets for assessing the misinformation generated by these models. It contains 1,612 expert-checked questions, accompanied with human-selected references. Using our benchmark, we conduct extensive experiments and found that current Chinese LLMs are far from perfect in the topic of maternity and infant care. In an effort to minimize the reliance on human resources for performance evaluation, we offer a judgment model for automatically assessing the long-form output of LLMs using the benchmark questions. Moreover, we compare potential solutions for long-form generation evaluation and provide insights for building more robust and efficient automated metric.
    摘要 近些年,自然语言处理(NLP)的进步,启动了应用大型自然语言模型(LLM)到实际场景的新趋势。latest LLMs在与人类交互时表现出很高的流畅性,但它们受到谎言问题的困扰,即不慎生成的false信息。这可能导致有害的后果,特别是在敏感场景中,如医疗领域。然而,前期工作很少关注了LLMs中的谎言评估,特别是在知识密集的领域和语言中。为了解决这问题,我们提出了一个benchmark,CARE-MI,用于评估LLMs中的谎言。CARE-MI包括以下两个方面:1)敏感领域,即婴儿护理领域;2)语言,即中文。我们还提供了一种创新的评估长形生成 benchmark的方法,可以转移到其他知识密集的领域和低资源语言。我们的提案填补了LLMs的广泛使用和评估谎言生成的数据差距。我们的benchmark包括1,612个专家审核的问题,以及人选的参考文献。使用我们的benchmark,我们进行了广泛的实验,发现当前的中文LLMs在婴儿护理领域还有很大的改进空间。为了减少人工资源的依赖,我们提供了一种自动评估长形输出的模型,以及对不同解决方案的比较。

Diverse Retrieval-Augmented In-Context Learning for Dialogue State Tracking

  • paper_url: http://arxiv.org/abs/2307.01453
  • repo_url: https://github.com/jlab-nlp/refpydst
  • paper_authors: Brendan King, Jeffrey Flanigan
  • for: 提高对对话状态跟踪(DST)的表现,特别是在零和几个示例学习环境下。
  • methods: 提出了三种改进来提高在对话状态跟踪中的受Context learning,包括:将DST视为Python编程任务,明确表示语言核心引用在Python中;选择多个相关示例来提高性能;在解码阶段使用重新权重方法,考虑竞争表达形式的概率,生成更准确的对话状态预测。
  • results: 使用MultiWOZ进行评估,在零和几个示例学习环境下实现了多个多任务共同目标准确率的状态前几。
    Abstract There has been significant interest in zero and few-shot learning for dialogue state tracking (DST) due to the high cost of collecting and annotating task-oriented dialogues. Recent work has demonstrated that in-context learning requires very little data and zero parameter updates, and even outperforms trained methods in the few-shot setting (Hu et al. 2022). We propose RefPyDST, which advances the state of the art with three advancements to in-context learning for DST. First, we formulate DST as a Python programming task, explicitly modeling language coreference as variable reference in Python. Second, since in-context learning depends highly on the context examples, we propose a method to retrieve a diverse set of relevant examples to improve performance. Finally, we introduce a novel re-weighting method during decoding that takes into account probabilities of competing surface forms, and produces a more accurate dialogue state prediction. We evaluate our approach using MultiWOZ and achieve state-of-the-art multi-domain joint-goal accuracy in zero and few-shot settings.
    摘要 有很多人表达了对零和几个shot学习对话状态追踪(DST)的兴趣,这是因为收集和标注任务型对话的成本很高。现有研究表明,在Context中学习只需要很少数据和零参数更新,甚至在几个shot Setting下超越训练方法的性能(Hu et al. 2022)。我们提出了RefPyDST,这是一种在Context中学习DST的新方法,它具有以下三个进步:1. 我们将DST视为一种Python编程任务,直接在Python中表示语言核心语言引用。2. 由于Context学习强烈取决于上下文示例,我们提议一种方法来检索更多相关的示例,以提高性能。3. 我们提出了一种新的重新权重方法,在解码过程中考虑竞争表面形式的概率,并生成更准确的对话状态预测。我们使用MultiWOZ进行评估,并在零和几个shot Setting下实现了多个领域共同目标准确率的状态前景。
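
Retrieving a diverse set of in-context examples could be done with a maximal-marginal-relevance style selection, sketched below; this retriever is an illustrative choice rather than necessarily the paper's, and the Python-program formulation of dialogue states and the decoding re-weighting are not shown.

```python
import numpy as np

def diverse_retrieve(query_vec, example_vecs, k=5, lam=0.5):
    # Trade off similarity to the query against redundancy with already-chosen examples.
    sims = example_vecs @ query_vec
    chosen = [int(np.argmax(sims))]
    while len(chosen) < min(k, len(example_vecs)):
        redundancy = np.max(example_vecs @ example_vecs[chosen].T, axis=1)
        score = lam * sims - (1 - lam) * redundancy
        score[chosen] = -np.inf                      # never pick the same example twice
        chosen.append(int(np.argmax(score)))
    return chosen

rng = np.random.default_rng(0)
pool = rng.normal(size=(100, 64))
pool /= np.linalg.norm(pool, axis=1, keepdims=True)
print(diverse_retrieve(pool[0], pool, k=5))
```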

ReactIE: Enhancing Chemical Reaction Extraction with Weak Supervision

  • paper_url: http://arxiv.org/abs/2307.01448
  • repo_url: None
  • paper_authors: Ming Zhong, Siru Ouyang, Minhao Jiang, Vivian Hu, Yizhu Jiao, Xuan Wang, Jiawei Han
  • for: 本研究旨在提供一种用于提取结构化反应信息的方法,以便化学家在实验室工作和计算机辅助药物设计等高级任务中使用。
  • methods: 本研究使用了两种弱监督方法进行预训练,利用文本中频繁出现的语言特征来识别化学反应的特征。同时,我们采用了专利记录中的 sintetic 数据作为远程监督,以把领域知识integrated到模型中。
  • results: 实验表明, ReactIE 方法可以达到显著提高,并超过所有基eline。
    Abstract Structured chemical reaction information plays a vital role for chemists engaged in laboratory work and advanced endeavors such as computer-aided drug design. Despite the importance of extracting structured reactions from scientific literature, data annotation for this purpose is cost-prohibitive due to the significant labor required from domain experts. Consequently, the scarcity of sufficient training data poses an obstacle to the progress of related models in this domain. In this paper, we propose ReactIE, which combines two weakly supervised approaches for pre-training. Our method utilizes frequent patterns within the text as linguistic cues to identify specific characteristics of chemical reactions. Additionally, we adopt synthetic data from patent records as distant supervision to incorporate domain knowledge into the model. Experiments demonstrate that ReactIE achieves substantial improvements and outperforms all existing baselines.
    摘要 科学文献中的结构化化学反应信息对化学家进行实验室工作和高级尝试(如计算机支持药物设计)起着重要作用。然而,提取结构化反应的数据标注因为需要域专家劳动量大,因此成本高昂。这导致相关模型在这个领域进步受阻。在这篇论文中,我们提议了ReactIE,它组合了两种弱监督方法进行预训练。我们的方法利用文本中的频繁出现的语言特征作为化学反应的特征标志。此外,我们采用了专利记录中的 sintetic data作为远程监督,以把领域知识引入模型中。实验表明,ReactIE可以实现显著改进,并超过所有基elines。

On Conditional and Compositional Language Model Differentiable Prompting

  • paper_url: http://arxiv.org/abs/2307.01446
  • repo_url: https://github.com/jpilaul/PRopS
  • paper_authors: Jonathan Pilault, Can Liu, Mohit Bansal, Markus Dreyer
  • for: 本研究旨在提高预训练语言模型(PLM)的下游任务性能,通过各种提示方法来适应不同任务。
  • methods: 本研究使用了 conditional和compositional的可微分提示方法,并提出了一种新的模型——Prompt Production System(PRopS),可以将任务说明或输入元数据转化为Continuous提示,以便从PLM中获取特定任务输出。PRopS使用了基于神经网络的Production Systems模型结构,可以学习到特定提示输入模式的精细规则,从而实现compositional transfer learning和少量学习。
  • results: 对比其他PLM适应技术,PRopS在compositional generalization任务、可控摘要和多语言翻译等任务中具有优异表现,需要更少的可训练参数。
    Abstract Prompts have been shown to be an effective method to adapt a frozen Pretrained Language Model (PLM) to perform well on downstream tasks. Prompts can be represented by a human-engineered word sequence or by a learned continuous embedding. In this work, we investigate conditional and compositional differentiable prompting. We propose a new model, Prompt Production System (PRopS), which learns to transform task instructions or input metadata, into continuous prompts that elicit task-specific outputs from the PLM. Our model uses a modular network structure based on our neural formulation of Production Systems, which allows the model to learn discrete rules -- neural functions that learn to specialize in transforming particular prompt input patterns, making it suitable for compositional transfer learning and few-shot learning. We present extensive empirical and theoretical analysis and show that PRopS consistently surpasses other PLM adaptation techniques, and often improves upon fully fine-tuned models, on compositional generalization tasks, controllable summarization and multilingual translation, while needing fewer trainable parameters.
    摘要 提示(prompt)已被证明是一种有效的方法,可以让参数冻结的预训练语言模型(PLM)在下游任务上取得良好表现。提示可以由人工设计的词序列表示,也可以由学习得到的连续嵌入表示。在本工作中,我们研究条件式与组合式的可微分提示。我们提出了一个新模型——提示生产系统(PRopS),它学习将任务说明或输入元数据转换为连续提示,从而让PLM产生任务特定的输出。该模型采用基于我们对生产系统(Production Systems)的神经化表述的模块化网络结构,使模型能够学习离散规则——即专门负责转换特定提示输入模式的神经函数,因而适用于组合式迁移学习和少样本学习。我们给出了大量实证与理论分析,结果表明PRopS在组合泛化任务、可控摘要和多语言翻译上持续优于其他PLM适配技术,并常常超过完全微调的模型,同时所需的可训练参数更少。

Modeling Tag Prediction based on Question Tagging Behavior Analysis of CommunityQA Platform Users

  • paper_url: http://arxiv.org/abs/2307.01420
  • repo_url: None
  • paper_authors: Kuntal Kumar Pal, Michael Gamon, Nirupama Chandrasekaran, Silviu Cucerzan
  • for: 提高社区问答平台信息组织和检索效果,更快准确地回答问题,评估话题Popularity。
  • methods: 对17个StackExchange社区用户标签行为进行了系统性分析,发现了不同领域的共同特性。采用发现结果开发flexible的神经网络标签预测模型,可预测问题的流行标签和更加细化的标签。
  • results: 经过广泛的实验和性能评估,证明了模型的有效性。
    Abstract In community question-answering platforms, tags play essential roles in effective information organization and retrieval, better question routing, faster response to questions, and assessment of topic popularity. Hence, automatic assistance for predicting and suggesting tags for posts is of high utility to users of such platforms. To develop better tag prediction across diverse communities and domains, we performed a thorough analysis of users' tagging behavior in 17 StackExchange communities. We found various common inherent properties of this behavior in those diverse domains. We used the findings to develop a flexible neural tag prediction architecture, which predicts both popular tags and more granular tags for each question. Our extensive experiments and obtained performance show the effectiveness of our model
    摘要 在社区问答平台上,标签扮演着关键的角色,即信息组织和检索、更好的问题路由、更快的问题回答以及评估话题 популярность。因此,自动为帖子提供标签预测和建议是用户们的高Utility功能。为了在多个社区和领域中提高标签预测,我们进行了17个Stack Exchange社区用户标签行为的严格分析。我们发现了这些多样化领域中标签行为的共同特性。我们使用这些发现来开发一种灵活的神经网络标签预测架构,可以预测每个问题的流行标签以及更加细化的标签。我们的广泛的实验和表现表明我们的模型的效果。
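
A flexible tag predictor can be framed as multi-label classification over the tag vocabulary; the toy sketch below uses TF-IDF features and one-vs-rest logistic regression as a stand-in for the paper's neural architecture, and the example questions and tags are made up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

questions = ["How do I join two tables in SQL?",
             "Why does my pointer dereference segfault in C?",
             "How do I merge two dataframes in pandas?"]
tags = [["sql", "database"], ["c", "pointers"], ["python", "pandas"]]

vec = TfidfVectorizer().fit(questions)
mlb = MultiLabelBinarizer()
X, Y = vec.transform(questions), mlb.fit_transform(tags)
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)

# Predicted tag set for an unseen question (empty tuples are possible on toy data).
print(mlb.inverse_transform(clf.predict(vec.transform(["group rows by a column in pandas"]))))
```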

Multi-Task Learning Improves Performance In Deep Argument Mining Models

  • paper_url: http://arxiv.org/abs/2307.01401
  • repo_url: https://github.com/Aryia-Behroziuan/References
  • paper_authors: Amirhossein Farzam, Shashank Shekhar, Isaac Mehlhaff, Marco Morucci
  • for: 本研究的目的是提高对用户生成文本中的论证技巧的分析,以便进行政治和市场分析等下游任务。
  • methods: 本研究使用了现代深度学习方法,包括多任务学习,以提取和注释用户生成文本中的论证技巧。
  • results: 研究表明,不同的论证检测任务共享相似的semantic和logical结构,并且可以通过共享表示和 Parametern sharing 来提高性能。
    Abstract The successful analysis of argumentative techniques from user-generated text is central to many downstream tasks such as political and market analysis. Recent argument mining tools use state-of-the-art deep learning methods to extract and annotate argumentative techniques from various online text corpora, however each task is treated as separate and different bespoke models are fine-tuned for each dataset. We show that different argument mining tasks share common semantic and logical structure by implementing a multi-task approach to argument mining that achieves better performance than state-of-the-art methods for the same problems. Our model builds a shared representation of the input text that is common to all tasks and exploits similarities between tasks in order to further boost performance via parameter-sharing. Our results are important for argument mining as they show that different tasks share substantial similarities and suggest a holistic approach to the extraction of argumentative techniques from text.
    摘要 成功分析口说技巧是许多下游任务的核心,如政治和市场分析。现有的口说采矿工具使用 cutting-edge 深度学习方法提取和标注口说技巧,但每个任务都是专门训练不同的模型。我们显示出不同的口说采矿任务有共同的semantic和logical结构,通过实现多任务方法来采矿口说,可以更好地提高性能。我们的模型建立了输入文本共同的表示,并利用任务之间的相似性以进一步提高性能。我们的结果对口说采矿有重要意义,表明不同任务之间有许多相似之处,并建议一个整体的方法来从文本中提取口说技巧。
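
Parameter sharing across argument-mining tasks can be implemented with one shared encoder and lightweight per-task heads, as in the sketch below; the architecture, dimensions, and task names are illustrative rather than the authors' exact model.

```python
import torch
import torch.nn as nn

class MultiTaskArgModel(nn.Module):
    def __init__(self, vocab=30000, dim=256,
                 tasks={"claim": 2, "evidence": 2, "stance": 3}):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True, bidirectional=True)
        self.heads = nn.ModuleDict({t: nn.Linear(2 * dim, c) for t, c in tasks.items()})

    def forward(self, token_ids, task):
        h, _ = self.encoder(self.emb(token_ids))
        pooled = h.mean(dim=1)               # shared representation for all tasks
        return self.heads[task](pooled)      # task-specific classification head

model = MultiTaskArgModel()
logits = model(torch.randint(0, 30000, (4, 50)), task="stance")   # (4, 3)
```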

ALBERTI, a Multilingual Domain Specific Language Model for Poetry Analysis

  • paper_url: http://arxiv.org/abs/2307.01387
  • repo_url: None
  • paper_authors: Javier de la Rosa, Álvaro Pérez Pozo, Salvador Ros, Elena González-Blanco
  • for: This paper is written for the analysis of poetry in a multilingual setting, specifically to address the lack of tools for automatically analyzing and scanning poems.
  • methods: The paper presents a new approach called \textsc{Alberti}, which is a multilingual pre-trained large language model for poetry. The model is trained using domain-specific pre-training (DSP) on a corpus of over 12 million verses from 12 languages.
  • results: The paper reports that \textsc{Alberti} outperforms multilingual BERT and other transformers-based models of similar sizes on two structural poetry tasks: Spanish stanza type classification and metrical pattern prediction for Spanish, English, and German. Additionally, \textsc{Alberti} achieves state-of-the-art results for German when compared to rule-based systems.
    Abstract The computational analysis of poetry is limited by the scarcity of tools to automatically analyze and scan poems. In a multilingual settings, the problem is exacerbated as scansion and rhyme systems only exist for individual languages, making comparative studies very challenging and time consuming. In this work, we present \textsc{Alberti}, the first multilingual pre-trained large language model for poetry. Through domain-specific pre-training (DSP), we further trained multilingual BERT on a corpus of over 12 million verses from 12 languages. We evaluated its performance on two structural poetry tasks: Spanish stanza type classification, and metrical pattern prediction for Spanish, English and German. In both cases, \textsc{Alberti} outperforms multilingual BERT and other transformers-based models of similar sizes, and even achieves state-of-the-art results for German when compared to rule-based systems, demonstrating the feasibility and effectiveness of DSP in the poetry domain.
    摘要 计算 poetry 的分析受到计算 poetry 工具的缺乏的限制。在多语言设置下,问题更加严重,因为押韵和律诗系统只存在于个别语言中,这使得比较研究非常困难和耗时。在这种工作中,我们介绍了 \textsc{Alberti},首个用于 poetry 的多语言预训练大语言模型。通过领域特定预训练(DSP),我们进一步训练了多语言 BERT 在12种语言的超过12万句诗歌中进行预训练。我们对其表现进行评估,并在西班牙押韵类型分类和德语、英语和西班牙的 мет律 Pattern 预测任务上达到了比较好的结果,并且在对比rule-based系统的 germany 语言中达到了国际一流的结果,这表明了 DSP 在 poetry 领域的可能性和有效性。

Implicit Memory Transformer for Computationally Efficient Simultaneous Speech Translation

  • paper_url: http://arxiv.org/abs/2307.01381
  • repo_url: https://github.com/osu-starlab/implicitmemory
  • paper_authors: Matthew Raffel, Lizhong Chen
  • for: This paper proposes a new approach to simultaneous speech translation that lets the model consume the incoming speech stream as it arrives without excessive computational cost.
  • methods: The paper introduces a new left-context method that reuses the attention output of the previous segment as the left context of the next segment. This reduces computation while preserving model accuracy.
  • results: Experiments show that this left-context method substantially speeds up the encoder forward pass while achieving nearly identical translation quality to the approach that uses both left context and memory banks.
    Abstract Simultaneous speech translation is an essential communication task difficult for humans whereby a translation is generated concurrently with oncoming speech inputs. For such a streaming task, transformers using block processing to break an input sequence into segments have achieved state-of-the-art performance at a reduced cost. Current methods to allow information to propagate across segments, including left context and memory banks, have faltered as they are both insufficient representations and unnecessarily expensive to compute. In this paper, we propose an Implicit Memory Transformer that implicitly retains memory through a new left context method, removing the need to explicitly represent memory with memory banks. We generate the left context from the attention output of the previous segment and include it in the keys and values of the current segment's attention calculation. Experiments on the MuST-C dataset show that the Implicit Memory Transformer provides a substantial speedup on the encoder forward pass with nearly identical translation quality when compared with the state-of-the-art approach that employs both left context and memory banks.
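Below is a minimal single-head sketch of the left-context idea: the attention output of the previous segment is carried forward and prepended to the keys and values of the current segment, so no explicit memory bank is needed. The projection matrices, segment sizes, and single-head formulation are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def segment_attention_with_implicit_memory(q_proj, k_proj, v_proj, segments, d):
    """Process a streaming input segment by segment; the attention *output* of the
    previous segment is prepended to the keys/values of the current segment."""
    left_ctx = None
    outputs = []
    for seg in segments:                      # seg: (seg_len, d)
        q = seg @ q_proj
        kv_input = seg if left_ctx is None else torch.cat([left_ctx, seg], dim=0)
        k, v = kv_input @ k_proj, kv_input @ v_proj
        attn = F.softmax(q @ k.T / d ** 0.5, dim=-1)
        out = attn @ v                        # (seg_len, d)
        left_ctx = out                        # becomes the left context of the next segment
        outputs.append(out)
    return torch.cat(outputs, dim=0)

d = 16
proj = [torch.randn(d, d) for _ in range(3)]
segs = [torch.randn(8, d) for _ in range(3)]
y = segment_attention_with_implicit_memory(*proj, segs, d)
print(y.shape)  # (24, 16)
```

The point of the construction is that the carried-over tensor is a by-product of the previous segment's computation, so no separate memory representation has to be computed or stored.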

Shiftable Context: Addressing Training-Inference Context Mismatch in Simultaneous Speech Translation

  • paper_url: http://arxiv.org/abs/2307.01377
  • repo_url: https://github.com/osu-starlab/shiftablecontext
  • paper_authors: Matthew Raffel, Drew Penney, Lizhong Chen
  • for: Improving simultaneous translation accuracy by resolving the context mismatch that segment-based processing models face between training and inference.
  • methods: Proposes the Shiftable Context scheme, which keeps segment and context sizes consistent between training and inference to improve simultaneous translation accuracy.
  • results: On the English-German, English-French, and English-Spanish pairs, applying Shiftable Context to the Augmented Memory Transformer improves BLEU by an average of 2.09, 1.83, and 1.95 points across wait-k values, with minimal impact on computation-aware Average Lagging.
    Abstract Transformer models using segment-based processing have been an effective architecture for simultaneous speech translation. However, such models create a context mismatch between training and inference environments, hindering potential translation accuracy. We solve this issue by proposing Shiftable Context, a simple yet effective scheme to ensure that consistent segment and context sizes are maintained throughout training and inference, even with the presence of partially filled segments due to the streaming nature of simultaneous translation. Shiftable Context is also broadly applicable to segment-based transformers for streaming tasks. Our experiments on the English-German, English-French, and English-Spanish language pairs from the MUST-C dataset demonstrate that when applied to the Augmented Memory Transformer, a state-of-the-art model for simultaneous speech translation, the proposed scheme achieves an average increase of 2.09, 1.83, and 1.95 BLEU scores across each wait-k value for the three language pairs, respectively, with a minimal impact on computation-aware Average Lagging.

Multilingual Language Models are not Multicultural: A Case Study in Emotion

  • paper_url: http://arxiv.org/abs/2307.01370
  • repo_url: https://github.com/shreyahavaldar/multicultural_emotion
  • paper_authors: Shreya Havaldar, Sunny Rai, Bhumika Singhal, Langchen Liu, Sharath Chandra Guntuku, Lyle Ungar
  • for: investigate whether the widely-used multilingual LMs in 2023 reflect differences in emotional expressions across cultures and languages
  • methods: use Large Language Models (LMs) for multilingual tasks that require emotional sensitivity, and investigate the Anglocentricity of embeddings obtained from LMs and the Western norms reflected in generative LMs
  • results: multilingual LMs do not successfully learn the culturally appropriate nuances of emotion, and possible research directions towards correcting this are highlighted
    Abstract Emotions are experienced and expressed differently across the world. In order to use Large Language Models (LMs) for multilingual tasks that require emotional sensitivity, LMs must reflect this cultural variation in emotion. In this study, we investigate whether the widely-used multilingual LMs in 2023 reflect differences in emotional expressions across cultures and languages. We find that embeddings obtained from LMs (e.g., XLM-RoBERTa) are Anglocentric, and generative LMs (e.g., ChatGPT) reflect Western norms, even when responding to prompts in other languages. Our results show that multilingual LMs do not successfully learn the culturally appropriate nuances of emotion and we highlight possible research directions towards correcting this.

Semantic enrichment towards efficient speech representations

  • paper_url: http://arxiv.org/abs/2307.01323
  • repo_url: None
  • paper_authors: Gaëlle Laperrière, Ha Nguyen, Sahar Ghannay, Bassam Jabaian, Yannick Estève
  • for: This work aims to improve semantic extraction for a spoken language understanding task while keeping computation costs in mind.
  • methods: The study enriches the SAMU-XLSR model with in-domain semantics by specializing it on a small amount of transcribed data from the downstream task, strengthening its multilingual speech representations.
  • results: In-domain semantic enrichment improves semantic extraction on the spoken language understanding task and also improves portability to low-resource languages, as shown on same-domain French and Italian benchmarks.
    Abstract Over the past few years, self-supervised learned speech representations have emerged as fruitful replacements for conventional surface representations when solving Spoken Language Understanding (SLU) tasks. Simultaneously, multilingual models trained on massive textual data were introduced to encode language agnostic semantics. Recently, the SAMU-XLSR approach introduced a way to make profit from such textual models to enrich multilingual speech representations with language agnostic semantics. By aiming for better semantic extraction on a challenging Spoken Language Understanding task and in consideration with computation costs, this study investigates a specific in-domain semantic enrichment of the SAMU-XLSR model by specializing it on a small amount of transcribed data from the downstream task. In addition, we show the benefits of the use of same-domain French and Italian benchmarks for low-resource language portability and explore cross-domain capacities of the enriched SAMU-XLSR.

Exploring Spoken Named Entity Recognition: A Cross-Lingual Perspective

  • paper_url: http://arxiv.org/abs/2307.01310
  • repo_url: https://github.com/moncefbenaicha/spokenner
  • paper_authors: Moncef Benaicha, David Thulke, M. A. Tuğtekin Turan
  • for: Advancing spoken named entity recognition (NER), which lags behind text-based NER due to limited research and scarce datasets.
  • methods: Uses cross-lingual transfer learning across Dutch, English, and German with both pipeline and End-to-End (E2E) schemes, training Wav2Vec2-XLS-R models on custom pseudo-annotated datasets and investigating several architectures for cross-lingual adaptability.
  • results: End-to-End spoken NER outperforms pipeline-based systems; in particular, transfer learning from German to Dutch surpasses the Dutch E2E system by 7% and the Dutch pipeline system by 4%. The study demonstrates the feasibility of cross-lingual transfer for spoken NER and points to more comprehensive data collection for future evaluations.
    Abstract Recent advancements in Named Entity Recognition (NER) have significantly improved the identification of entities in textual data. However, spoken NER, a specialized field of spoken document retrieval, lags behind due to its limited research and scarce datasets. Moreover, cross-lingual transfer learning in spoken NER has remained unexplored. This paper utilizes transfer learning across Dutch, English, and German using pipeline and End-to-End (E2E) schemes. We employ Wav2Vec2-XLS-R models on custom pseudo-annotated datasets and investigate several architectures for the adaptability of cross-lingual systems. Our results demonstrate that End-to-End spoken NER outperforms pipeline-based alternatives over our limited annotations. Notably, transfer learning from German to Dutch surpasses the Dutch E2E system by 7% and the Dutch pipeline system by 4%. This study not only underscores the feasibility of transfer learning in spoken NER but also sets promising outcomes for future evaluations, hinting at the need for comprehensive data collection to augment the results.

The Evolution of Substance Use Coverage in the Philadelphia Inquirer

  • paper_url: http://arxiv.org/abs/2307.01299
  • repo_url: None
  • paper_authors: Layla Bouzoubaa, Ramtin Ehsani, Preetha Chatterjee, Rezvaneh Rezapour
  • for: This study examines how media coverage and discourse around illicit substance use have evolved over time, and how such coverage shapes public perception of addiction, policy, and public health outcomes.
  • methods: The study analyzes 157,476 Philadelphia Inquirer articles published over a decade, narrowing to a sample of 3,903 articles that mention at least one commonly abused substance.
  • results: Cannabis and narcotics are the most frequently covered drug classes; hallucinogens are portrayed more positively than other categories, while narcotics receive the most negative coverage. The findings underscore the need for accurate, inclusive media portrayals of substance use and addiction to reduce stereotyping and stigma.
    Abstract The media's representation of illicit substance use can lead to harmful stereotypes and stigmatization for individuals struggling with addiction, ultimately influencing public perception, policy, and public health outcomes. To explore how the discourse and coverage of illicit drug use changed over time, this study analyzes 157,476 articles published in the Philadelphia Inquirer over a decade. Specifically, the study focuses on articles that mentioned at least one commonly abused substance, resulting in a sample of 3,903 articles. Our analysis shows that cannabis and narcotics are the most frequently discussed classes of drugs. Hallucinogenic drugs are portrayed more positively than other categories, whereas narcotics are portrayed the most negatively. Our research aims to highlight the need for accurate and inclusive portrayals of substance use and addiction in the media.

Trainable Transformer in Transformer

  • paper_url: http://arxiv.org/abs/2307.01189
  • repo_url: https://github.com/abhishekpanigrahi1996/transformer_in_transformer
  • paper_authors: Abhishek Panigrahi, Sadhika Malladi, Mengzhou Xia, Sanjeev Arora
  • for: This paper investigates how large pre-trained language models can perform in-context learning (ICL) by simulating and fine-tuning an internal model (e.g., a linear or 2-layer MLP) during inference.
  • methods: Proposes Transformer in Transformer (TinT), an efficient construction that lets a transformer simulate and fine-tune complex models (e.g., pre-trained language models) internally during inference. Novel approximation techniques allow a TinT model with fewer than 2 billion parameters to simulate and fine-tune a 125-million-parameter transformer within a single forward pass.
  • results: End-to-end experiments validate TinT's internal fine-tuning procedure across language modeling and downstream tasks; even with a limited one-step budget, TinT improves performance by 4-16% absolute on average over OPT-125M, suggesting that large pre-trained language models can carry out intricate subroutines.
    Abstract Recent works attribute the capability of in-context learning (ICL) in large pre-trained language models to implicitly simulating and fine-tuning an internal model (e.g., linear or 2-layer MLP) during inference. However, such constructions require large memory overhead, which makes simulation of more sophisticated internal models intractable. In this work, we propose an efficient construction, Transformer in Transformer (in short, TinT), that allows a transformer to simulate and fine-tune complex models internally during inference (e.g., pre-trained language models). In particular, we introduce innovative approximation techniques that allow a TinT model with less than 2 billion parameters to simulate and fine-tune a 125 million parameter transformer model within a single forward pass. TinT accommodates many common transformer variants and its design ideas also improve the efficiency of past instantiations of simple models inside transformers. We conduct end-to-end experiments to validate the internal fine-tuning procedure of TinT on various language modeling and downstream tasks. For example, even with a limited one-step budget, we observe TinT for a OPT-125M model improves performance by 4-16% absolute on average compared to OPT-125M. These findings suggest that large pre-trained language models are capable of performing intricate subroutines. To facilitate further work, a modular and extensible codebase for TinT is included.
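TinT itself is an elaborate construction, but the hypothesis it builds on can be illustrated very simply: treat the in-context demonstrations as a tiny training set for an internal linear model, take one gradient step on it, and answer the query with the updated model, all within one "forward pass". The sketch below shows only this hypothesized subroutine, not TinT's transformer-level simulation; the learning rate and zero initialization are arbitrary.

```python
import torch

def predict_with_internal_finetuning(context_x, context_y, query_x, lr=0.1):
    """Fit an internal linear model to the in-context examples with one gradient
    step, then answer the query with the updated model."""
    d = context_x.shape[1]
    w = torch.zeros(d, requires_grad=True)             # internal model parameters
    loss = ((context_x @ w - context_y) ** 2).mean()   # fit the demonstrations
    loss.backward()
    with torch.no_grad():
        w_updated = w - lr * w.grad                    # simulated fine-tuning step
    return query_x @ w_updated

x_ctx = torch.randn(16, 8)
y_ctx = x_ctx @ torch.randn(8)
print(predict_with_internal_finetuning(x_ctx, y_ctx, torch.randn(8)))
```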

Improving Language Plasticity via Pretraining with Active Forgetting

  • paper_url: http://arxiv.org/abs/2307.01163
  • repo_url: None
  • paper_authors: Yihong Chen, Kelly Marchisio, Roberta Raileanu, David Ifeoluwa Adelani, Pontus Stenetorp, Sebastian Riedel, Mikel Artetxe
  • for: This paper proposes a simple active forgetting mechanism that lets pretrained language models (PLMs) adapt quickly to new languages.
  • methods: During pretraining, the embedding layer is reset every K updates, encouraging the PLM to learn new embeddings within a limited number of updates, akin to a meta-learning effect.
  • results: Experiments show that models pretrained with this forgetting mechanism converge faster during language adaptation and perform better in low-data regimes, particularly for languages distant from English.
    Abstract Pretrained language models (PLMs) are today the primary model for natural language processing. Despite their impressive downstream performance, it can be difficult to apply PLMs to new languages, a barrier to making their capabilities universally accessible. While prior work has shown it possible to address this issue by learning a new embedding layer for the new language, doing so is both data and compute inefficient. We propose to use an active forgetting mechanism during pretraining, as a simple way of creating PLMs that can quickly adapt to new languages. Concretely, by resetting the embedding layer every K updates during pretraining, we encourage the PLM to improve its ability of learning new embeddings within a limited number of updates, similar to a meta-learning effect. Experiments with RoBERTa show that models pretrained with our forgetting mechanism not only demonstrate faster convergence during language adaptation but also outperform standard ones in a low-data regime, particularly for languages that are distant from English.
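The mechanism is easy to state in code: an ordinary pretraining loop in which the token-embedding matrix is re-initialized every K optimizer updates. The toy language model, loss, and hyper-parameters below are placeholders (and the optimizer state for the embeddings is kept for simplicity); only the periodic embedding reset reflects the paper's proposal.

```python
import torch
import torch.nn as nn

class ToyLM(nn.Module):
    def __init__(self, vocab=1000, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.body = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab)

    def forward(self, tokens):
        h, _ = self.body(self.embed(tokens))
        return self.head(h[:, -1])            # predict the next token from the last state

def pretrain_with_active_forgetting(model, batches, K=1000, lr=1e-3):
    """Plain pretraining loop, except the embedding layer is re-initialized every K
    updates ("active forgetting"), pushing the network body to support embeddings
    that can be relearned quickly, e.g. for a new language."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for step, (tokens, targets) in enumerate(batches, start=1):
        loss = nn.functional.cross_entropy(model(tokens), targets)
        opt.zero_grad(); loss.backward(); opt.step()
        if step % K == 0:
            nn.init.normal_(model.embed.weight, std=0.02)   # forget the embeddings

model = ToyLM()
data = [(torch.randint(0, 1000, (8, 16)), torch.randint(0, 1000, (8,))) for _ in range(5)]
pretrain_with_active_forgetting(model, data, K=2)
```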

Translating Latin with Artificial Intelligence

  • paper_url: http://arxiv.org/abs/2307.07520
  • repo_url: None
  • paper_authors: Sylvio R. Bistafa
  • for: The study addresses the limited availability of translations of early scientific literature, particularly the works and correspondence of Leonhard Euler, by turning to AI translation.
  • methods: Two popular AI translation systems, Google Translate and ChatGPT, are compared through benchmark tests of their performance on Latin.
  • results: ChatGPT performed better on the benchmarks and produced a good translation of an excerpt of a 1739 letter from Johann Bernoulli to Euler, suggesting it is a valuable tool for both general Latin practitioners and specialized Latin translators.
    Abstract The major hindrance in the study of earlier scientific literature is the availability of Latin translations into modern languages. This is particular true for the works of Euler who authored about 850 manuscripts and wrote a thousand letters and received back almost two thousand more. The translation of many of these manuscripts, books and letters have been published in various sources over the last two centuries, but many more have not yet appeared. Fortunately, nowadays, the artificial intelligence AI translation can be used to circumvent the challenges of translating such substantial number of texts. To validate this tool, benchmark tests have been performed to compare the performance of two popular AI translating algorithms, namely Google Translate and ChatGPT. Since it was found that ChatGPT performed better on these tests, this translating support was then used on an excerpt of a 1739 letter from Johann Bernoulli to Euler, where he notifies that he was sending to Euler the first part of his manuscript Hydraulica. The findings highlight ChatGPT as a valuable translation tool, catering not only to general Latin practitioners but also proving beneficial for specialized Latin translators.

cs.LG - 2023-07-04

GHOST: A Graph Neural Network Accelerator using Silicon Photonics

  • paper_url: http://arxiv.org/abs/2307.01782
  • repo_url: None
  • paper_authors: Salma Afifi, Febin Sunny, Amin Shafiee, Mahdi Nikdast, Sudeep Pasricha
  • for: This paper proposes GHOST, a silicon-photonic hardware accelerator for graph neural network (GNN) computation.
  • methods: GHOST implements the three main stages of GNN execution separately in the optical domain, efficiently handling both vertex-centric and edge-centric operations, and supports inference for widely used GNN models and architectures such as graph convolution networks and graph attention networks.
  • results: Simulation studies show GHOST delivers at least 10.2x better throughput and 3.8x better energy efficiency than GPU, TPU, CPU, and several state-of-the-art GNN hardware accelerators.
    Abstract Graph neural networks (GNNs) have emerged as a powerful approach for modelling and learning from graph-structured data. Multiple fields have since benefitted enormously from the capabilities of GNNs, such as recommendation systems, social network analysis, drug discovery, and robotics. However, accelerating and efficiently processing GNNs require a unique approach that goes beyond conventional artificial neural network accelerators, due to the substantial computational and memory requirements of GNNs. The slowdown of scaling in CMOS platforms also motivates a search for alternative implementation substrates. In this paper, we present GHOST, the first silicon-photonic hardware accelerator for GNNs. GHOST efficiently alleviates the costs associated with both vertex-centric and edge-centric operations. It implements separately the three main stages involved in running GNNs in the optical domain, allowing it to be used for the inference of various widely used GNN models and architectures, such as graph convolution networks and graph attention networks. Our simulation studies indicate that GHOST exhibits at least 10.2x better throughput and 3.8x better energy efficiency when compared to GPU, TPU, CPU and multiple state-of-the-art GNN hardware accelerators.

FedHIL: Heterogeneity Resilient Federated Learning for Robust Indoor Localization with Mobile Devices

  • paper_url: http://arxiv.org/abs/2307.01780
  • repo_url: None
  • paper_authors: Danish Gufran, Sudeep Pasricha
  • for: This work aims to improve indoor localization accuracy across heterogeneous devices and diverse indoor environments while preserving user data privacy.
  • methods: Proposes FedHIL, an embedded machine learning framework that combines federated learning (FL) with indoor localization and uses a domain-specific selective weight adjustment scheme to preserve ML model performance even with extremely noisy data.
  • results: Experiments in diverse real-world indoor environments with heterogeneous mobile devices show FedHIL outperforms state-of-the-art FL and non-FL frameworks, achieving on average 1.62x better localization accuracy than the best prior FL-based indoor localization framework.
    Abstract Indoor localization plays a vital role in applications such as emergency response, warehouse management, and augmented reality experiences. By deploying machine learning (ML) based indoor localization frameworks on their mobile devices, users can localize themselves in a variety of indoor and subterranean environments. However, achieving accurate indoor localization can be challenging due to heterogeneity in the hardware and software stacks of mobile devices, which can result in inconsistent and inaccurate location estimates. Traditional ML models also heavily rely on initial training data, making them vulnerable to degradation in performance with dynamic changes across indoor environments. To address the challenges due to device heterogeneity and lack of adaptivity, we propose a novel embedded ML framework called FedHIL. Our framework combines indoor localization and federated learning (FL) to improve indoor localization accuracy in device-heterogeneous environments while also preserving user data privacy. FedHIL integrates a domain-specific selective weight adjustment approach to preserve the ML model's performance for indoor localization during FL, even in the presence of extremely noisy data. Experimental evaluations in diverse real-world indoor environments and with heterogeneous mobile devices show that FedHIL outperforms state-of-the-art FL and non-FL indoor localization frameworks. FedHIL is able to achieve 1.62x better localization accuracy on average than the best performing FL-based indoor localization framework from prior work.
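FedHIL builds on a federated averaging loop; the sketch below shows that FL backbone for a toy fingerprint-to-location regressor, where each client trains locally and the server averages the resulting weights. The domain-specific selective weight adjustment that distinguishes FedHIL is not reproduced here, and the model, data, and hyper-parameters are illustrative.

```python
import copy
import torch

def federated_averaging(global_model, client_loaders, rounds=5, local_epochs=1, lr=1e-3):
    """Plain FedAvg backbone; FedHIL's selective, localization-specific weight
    adjustment would sit on top of a loop like this."""
    for _ in range(rounds):
        client_states = []
        for loader in client_loaders:                    # each client trains locally
            local = copy.deepcopy(global_model)
            opt = torch.optim.SGD(local.parameters(), lr=lr)
            for _ in range(local_epochs):
                for x, y in loader:
                    loss = torch.nn.functional.mse_loss(local(x), y)
                    opt.zero_grad(); loss.backward(); opt.step()
            client_states.append(local.state_dict())
        avg = {k: torch.stack([s[k] for s in client_states]).mean(0)   # server averages
               for k in client_states[0]}
        global_model.load_state_dict(avg)
    return global_model

model = torch.nn.Linear(10, 2)      # toy RSSI-fingerprint -> (x, y) location regressor
clients = [[(torch.randn(4, 10), torch.randn(4, 2))] for _ in range(3)]
federated_averaging(model, clients)
```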

Shapley Sets: Feature Attribution via Recursive Function Decomposition

  • paper_url: http://arxiv.org/abs/2307.01777
  • repo_url: None
  • paper_authors: Torty Sivill, Peter Flach
  • for: This work proposes Shapley Sets, an alternative to Shapley value feature attribution, which can be misleading in the presence of feature interactions in both the model and the data.
  • methods: A recursive function decomposition algorithm, with log-linear complexity in the number of variables, splits the underlying model into non-separable variable groups; each group is then attributed its combined value for a given prediction.
  • results: Shapley Sets is equivalent to the Shapley value over the transformed feature set and thus inherits the same fairness axioms, while avoiding pitfalls of Shapley-value-based alternatives; it is particularly advantageous for data with complex dependency structure.
    Abstract Despite their ubiquitous use, Shapley value feature attributions can be misleading due to feature interaction in both model and data. We propose an alternative attribution approach, Shapley Sets, which awards value to sets of features. Shapley Sets decomposes the underlying model into non-separable variable groups using a recursive function decomposition algorithm with log linear complexity in the number of variables. Shapley Sets attributes to each non-separable variable group their combined value for a particular prediction. We show that Shapley Sets is equivalent to the Shapley value over the transformed feature set and thus benefits from the same axioms of fairness. Shapley Sets is value function agnostic and we show theoretically and experimentally how Shapley Sets avoids pitfalls associated with Shapley value based alternatives and are particularly advantageous for data types with complex dependency structure.

Fast Optimal Transport through Sliced Wasserstein Generalized Geodesics

  • paper_url: http://arxiv.org/abs/2307.01770
  • repo_url: None
  • paper_authors: Guillaume Mahey, Laetitia Chapel, Gilles Gasso, Clément Bonet, Nicolas Courty
  • for: The paper introduces min-SWGG, a new proxy for the squared Wasserstein distance based on the transport map induced by an optimal one-dimensional projection of the two input distributions, and connects it to Wasserstein generalized geodesics whose pivot measure is supported on a line.
  • methods: A new closed form for the exact Wasserstein distance is derived when one of the distributions is supported on a line, yielding a fast computational scheme amenable to gradient-descent optimization. min-SWGG is shown to be an upper bound of the Wasserstein distance, with a complexity similar to Sliced-Wasserstein while additionally providing an associated transport plan.
  • results: Empirical evidence supports the benefits of min-SWGG in a range of settings, including gradient flows, shape matching, and image colorization.
    Abstract Wasserstein distance (WD) and the associated optimal transport plan have been proven useful in many applications where probability measures are at stake. In this paper, we propose a new proxy of the squared WD, coined min-SWGG, that is based on the transport map induced by an optimal one-dimensional projection of the two input distributions. We draw connections between min-SWGG and Wasserstein generalized geodesics in which the pivot measure is supported on a line. We notably provide a new closed form for the exact Wasserstein distance in the particular case of one of the distributions supported on a line allowing us to derive a fast computational scheme that is amenable to gradient descent optimization. We show that min-SWGG is an upper bound of WD and that it has a complexity similar to as Sliced-Wasserstein, with the additional feature of providing an associated transport plan. We also investigate some theoretical properties such as metricity, weak convergence, computational and topological properties. Empirical evidences support the benefits of min-SWGG in various contexts, from gradient flows, shape matching and image colorization, among others.
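A rough Monte-Carlo sketch of the idea, for two empirical measures with the same number of points: each direction induces a 1D optimal sorting, that sorting defines a full-dimensional transport plan, and the cheapest plan over the candidate directions gives an upper bound on the squared Wasserstein distance. The paper optimizes the direction (using its closed form and a gradient-descent scheme) rather than the random search used here; the random-direction loop is purely illustrative.

```python
import numpy as np

def min_swgg(X, Y, n_directions=200, seed=0):
    """For each random direction, sort both projections (the 1D optimal coupling),
    evaluate the induced d-dimensional transport plan, and keep the cheapest one."""
    rng = np.random.default_rng(seed)
    best_cost, best_plan = np.inf, None
    for _ in range(n_directions):
        theta = rng.normal(size=X.shape[1])
        theta /= np.linalg.norm(theta)
        pi_x = np.argsort(X @ theta)
        pi_y = np.argsort(Y @ theta)
        cost = np.sum((X[pi_x] - Y[pi_y]) ** 2) / len(X)   # cost of the induced plan
        if cost < best_cost:
            best_cost, best_plan = cost, (pi_x, pi_y)
    return best_cost, best_plan   # upper bound on the squared Wasserstein distance

X, Y = np.random.randn(100, 5), np.random.randn(100, 5) + 1.0
print(min_swgg(X, Y)[0])
```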

Localized Data Work as a Precondition for Data-Centric ML: A Case Study of Full Lifecycle Crop Disease Identification in Ghana

  • paper_url: http://arxiv.org/abs/2307.01767
  • repo_url: None
  • paper_authors: Darlington Akogo, Issah Samori, Cyril Akafia, Harriet Fiagbor, Andrews Kangah, Donald Kwame Asiedu, Kwabena Fuachie, Luis Oala
  • for: The paper demonstrates how collaborative, localized data work underpins useful data-centric solutions for public-good tasks such as agricultural productivity and food security.
  • methods: Drone-collected data and machine learning are used to identify crop stressors.
  • results: The data, model, and final application were developed jointly and made available to local farmers through a desktop application.
    Abstract The Ghana Cashew Disease Identification with Artificial Intelligence (CADI AI) project demonstrates the importance of sound data work as a precondition for the delivery of useful, localized datacentric solutions for public good tasks such as agricultural productivity and food security. Drone collected data and machine learning are utilized to determine crop stressors. Data, model and the final app are developed jointly and made available to local farmers via a desktop application.

Pretraining is All You Need: A Multi-Atlas Enhanced Transformer Framework for Autism Spectrum Disorder Classification

  • paper_url: http://arxiv.org/abs/2307.01759
  • repo_url: https://github.com/lugges991/metaformer
  • paper_authors: Lucas Mahler, Qi Wang, Julius Steiglechner, Florian Birk, Samuel Heczko, Klaus Scheffler, Gabriele Lohmann
  • for: This paper proposes a novel framework for ASD classification using resting-state functional magnetic resonance imaging data.
  • methods: The proposed framework, METAFormer, utilizes a multi-atlas approach and self-supervised pretraining to improve classification performance.
  • results: The framework achieves state-of-the-art performance on the ABIDE I dataset, with an average accuracy of 83.7% and an AUC score of 0.832.
    Abstract Autism spectrum disorder (ASD) is a prevalent psychiatric condition characterized by atypical cognitive, emotional, and social patterns. Timely and accurate diagnosis is crucial for effective interventions and improved outcomes in individuals with ASD. In this study, we propose a novel Multi-Atlas Enhanced Transformer framework, METAFormer, ASD classification. Our framework utilizes resting-state functional magnetic resonance imaging data from the ABIDE I dataset, comprising 406 ASD and 476 typical control (TC) subjects. METAFormer employs a multi-atlas approach, where flattened connectivity matrices from the AAL, CC200, and DOS160 atlases serve as input to the transformer encoder. Notably, we demonstrate that self-supervised pretraining, involving the reconstruction of masked values from the input, significantly enhances classification performance without the need for additional or separate training data. Through stratified cross-validation, we evaluate the proposed framework and show that it surpasses state-of-the-art performance on the ABIDE I dataset, with an average accuracy of 83.7% and an AUC-score of 0.832. The code for our framework is available at https://github.com/Lugges991/METAFormer
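The self-supervised pretraining stage can be sketched as masked-value reconstruction over flattened connectivity features: mask random patches, encode with a transformer, and regress the masked values with an MSE loss. The input dimension, patch size, and depth below are illustrative assumptions, not the METAFormer configuration.

```python
import torch
import torch.nn as nn

class MaskedConnectomePretrainer(nn.Module):
    """Masked-value pretraining sketch: a flattened connectivity vector is split into
    patches, a random subset of patches is zeroed out, and a transformer encoder is
    trained to reconstruct the masked values."""
    def __init__(self, n_features=19900, patch=100, d_model=128):
        super().__init__()
        self.patch, self.n_patches = patch, n_features // patch
        self.embed = nn.Linear(patch, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
        self.decode = nn.Linear(d_model, patch)

    def forward(self, x, mask_ratio=0.3):
        b = x.size(0)
        patches = x[:, : self.n_patches * self.patch].reshape(b, self.n_patches, self.patch)
        mask = (torch.rand(b, self.n_patches, 1, device=x.device) < mask_ratio).expand_as(patches)
        recon = self.decode(self.encoder(self.embed(patches.masked_fill(mask, 0.0))))
        return ((recon - patches) ** 2)[mask].mean()   # loss only on the masked values

model = MaskedConnectomePretrainer()
loss = model(torch.randn(4, 19900))   # e.g. flattened multi-atlas connectivity features
loss.backward()
```

After this pretraining, the same encoder would be fine-tuned with a classification head on the labeled ASD/TC subjects.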

Local primordial non-Gaussianity from the large-scale clustering of photometric DESI luminous red galaxies

  • paper_url: http://arxiv.org/abs/2307.01753
  • repo_url: https://github.com/mehdirezaie/dimagfnl
  • paper_authors: Mehdi Rezaie, Ashley J. Ross, Hee-Jong Seo, Hui Kong, Anna Porredon, Lado Samushia, Edmond Chaussidon, Alex Krolewski, Arnaud de Mattia, Florian Beutler, Jessica Nicole Aguilar, Steven Ahlen, Shadab Alam, Santiago Avila, Benedict Bahr-Kalus, Jose Bermejo-Climent, David Brooks, Todd Claybaugh, Shaun Cole, Kyle Dawson, Axel de la Macorra, Peter Doel, Andreu Font-Ribera, Jaime E. Forero-Romero, Satya Gontcho A Gontcho, Julien Guy, Klaus Honscheid, Theodore Kisner, Martin Landriau, Michael Levi, Marc Manera, Aaron Meisner, Ramon Miquel, Eva-Maria Mueller, Adam Myers, Jeffrey A. Newman, Jundan Nie, Nathalie Palanque-Delabrouille, Will Percival, Claire Poppett, Graziano Rossi, Eusebio Sanchez, Michael Schubnell, Gregory Tarlé, Benjamin Alan Weaver, Christophe Yèche, Zhimin Zhou, Hu Zou
  • for: The paper aims to constrain the local primordial non-Gaussianity parameter fNL using angular clustering of luminous red galaxies from the Dark Energy Spectroscopic Instrument (DESI) imaging surveys.
  • methods: The paper uses linear regression and artificial neural networks to alleviate non-cosmological excess clustering on large scales, and tests the methods against log-normal simulations with and without fNL and systematics.
  • results: The paper finds fNL $= 47^{+14(+29)}_{-11(-22)}$ at 68% (95%) confidence, with a maximum likelihood value of fNL $\sim 50$ and increased uncertainty when including the full set of imaging maps. The results indicate fNL > 0 at the 99.9 percent confidence level, which could be attributed to unforeseen systematics or to a scale-dependent fNL model.
    Abstract We use angular clustering of luminous red galaxies from the Dark Energy Spectroscopic Instrument (DESI) imaging surveys to constrain the local primordial non-Gaussianity parameter fNL. Our sample comprises over 12 million targets, covering 14,000 square degrees of the sky, with redshifts in the range 0.2< z < 1.35. We identify Galactic extinction, survey depth, and astronomical seeing as the primary sources of systematic error, and employ linear regression and artificial neural networks to alleviate non-cosmological excess clustering on large scales. Our methods are tested against log-normal simulations with and without fNL and systematics, showing superior performance of the neural network treatment in reducing remaining systematics. Assuming the universality relation, we find fNL $= 47^{+14(+29)}_{-11(-22)}$ at 68\%(95\%) confidence. With a more aggressive treatment, including regression against the full set of imaging maps, our maximum likelihood value shifts slightly to fNL$ \sim 50$ and the uncertainty on fNL increases due to the removal of large-scale clustering information. We apply a series of robustness tests (e.g., cuts on imaging, declination, or scales used) that show consistency in the obtained constraints. Despite extensive efforts to mitigate systematics, our measurements indicate fNL > 0 with a 99.9 percent confidence level. This outcome raises concerns as it could be attributed to unforeseen systematics, including calibration errors or uncertainties associated with low-\ell systematics in the extinction template. Alternatively, it could suggest a scale-dependent fNL model--causing significant non-Gaussianity around large-scale structure while leaving cosmic microwave background scales unaffected. Our results encourage further studies of fNL with DESI spectroscopic samples, where the inclusion of 3D clustering modes should help separate imaging systematics.

SRCD: Semantic Reasoning with Compound Domains for Single-Domain Generalized Object Detection

  • paper_url: http://arxiv.org/abs/2307.01750
  • repo_url: None
  • paper_authors: Zhijie Rao, Jingcai Guo, Luyao Tang, Yue Huang, Xinghao Ding, Song Guo
  • for: This paper proposes a new framework for single-domain generalized object detection (Single-DGOD) that learns and maintains the semantic structures of self-augmented compound cross-domain samples to enhance generalization.
  • methods: The framework has two main components: a texture-based self-augmentation (TBSA) module, which removes label-irrelevant image-level attributes such as light, shadow, and color, and a local-global semantic reasoning (LGSR) module, which models instance-level semantic relations to uncover and maintain intrinsic semantic structures.
  • results: Extensive experiments on multiple benchmarks demonstrate the effectiveness of the proposed SRCD.
    Abstract This paper provides a novel framework for single-domain generalized object detection (i.e., Single-DGOD), where we are interested in learning and maintaining the semantic structures of self-augmented compound cross-domain samples to enhance the model's generalization ability. Different from DGOD trained on multiple source domains, Single-DGOD is far more challenging to generalize well to multiple target domains with only one single source domain. Existing methods mostly adopt a similar treatment from DGOD to learn domain-invariant features by decoupling or compressing the semantic space. However, there may have two potential limitations: 1) pseudo attribute-label correlation, due to extremely scarce single-domain data; and 2) the semantic structural information is usually ignored, i.e., we found the affinities of instance-level semantic relations in samples are crucial to model generalization. In this paper, we introduce Semantic Reasoning with Compound Domains (SRCD) for Single-DGOD. Specifically, our SRCD contains two main components, namely, the texture-based self-augmentation (TBSA) module, and the local-global semantic reasoning (LGSR) module. TBSA aims to eliminate the effects of irrelevant attributes associated with labels, such as light, shadow, color, etc., at the image level by a light-yet-efficient self-augmentation. Moreover, LGSR is used to further model the semantic relationships on instance features to uncover and maintain the intrinsic semantic structures. Extensive experiments on multiple benchmarks demonstrate the effectiveness of the proposed SRCD.

RRCNN: A novel signal decomposition approach based on recurrent residue convolutional neural network

  • paper_url: http://arxiv.org/abs/2307.01725
  • repo_url: https://github.com/zhoudafa08/rrcnn
  • paper_authors: Feng Zhou, Antonio Cicone, Haomin Zhou
  • for: The goal is a deep-learning-based method for decomposing non-stationary signals that addresses the shortcomings of existing approaches, such as boundary and mode mixing effects and limited robustness to noise.
  • methods: The method computes the local average of a signal with convolutional layers, residual structures, and nonlinear activation functions, yielding a new non-stationary signal decomposition scheme within a deep learning framework.
  • results: Experiments show that the proposed method handles boundary effects, mode mixing, noise robustness, and the orthogonality of the decomposed components better than existing methods, both in computing local averages and in decomposing signals.
    Abstract The decomposition of non-stationary signals is an important and challenging task in the field of signal time-frequency analysis. In the recent two decades, many signal decomposition methods led by the empirical mode decomposition, which was pioneered by Huang et al. in 1998, have been proposed by different research groups. However, they still have some limitations. For example, they are generally prone to boundary and mode mixing effects and are not very robust to noise. Inspired by the successful applications of deep learning in fields like image processing and natural language processing, and given the lack in the literature of works in which deep learning techniques are used directly to decompose non-stationary signals into simple oscillatory components, we use the convolutional neural network, residual structure and nonlinear activation function to compute in an innovative way the local average of the signal, and study a new non-stationary signal decomposition method under the framework of deep learning. We discuss the training process of the proposed model and study the convergence analysis of the learning algorithm. In the experiments, we evaluate the performance of the proposed model from two points of view: the calculation of the local average and the signal decomposition. Furthermore, we study the mode mixing, noise interference, and orthogonality properties of the decomposed components produced by the proposed method. All results show that the proposed model allows for better handling boundary effect, mode mixing effect, robustness, and the orthogonality of the decomposed components than existing methods.
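The core loop is easy to sketch: a small convolutional block estimates the local average of the current residue, the difference is emitted as one oscillatory component, and the local average becomes the residue for the next step. In the actual method such blocks are trained end to end and are deeper; the untrained toy blocks and kernel sizes below are placeholders for the structure only.

```python
import torch
import torch.nn as nn

class LocalAverageNet(nn.Module):
    """A small 1-D convolutional block that estimates the slowly varying local
    average of a signal (kernel width and depth are illustrative)."""
    def __init__(self, channels=16, kernel=11):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, channels, kernel, padding=kernel // 2), nn.Tanh(),
            nn.Conv1d(channels, 1, kernel, padding=kernel // 2))

    def forward(self, x):              # x: (batch, 1, length)
        return self.net(x)             # estimated local average

def decompose(signal, blocks):
    """Recursive-residue loop: each block peels off one oscillatory component as
    (current residue - its local average); the local average is passed on as the
    residue for the next block."""
    residue, components = signal, []
    for block in blocks:
        local_avg = block(residue)
        components.append(residue - local_avg)
        residue = local_avg
    return components, residue

t = torch.linspace(0, 1, 512).view(1, 1, -1)
signal = torch.sin(2 * torch.pi * 50 * t) + torch.sin(2 * torch.pi * 5 * t)
comps, trend = decompose(signal, [LocalAverageNet(), LocalAverageNet()])
print([c.shape for c in comps], trend.shape)
```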

MOPO-LSI: A User Guide

  • paper_url: http://arxiv.org/abs/2307.01719
  • repo_url: None
  • paper_authors: Yong Zheng, Kumar Neelotpal Shukla, Jasmine Xu, David, Wang, Michael O’Leary
  • for: This document is a user guide for MOPO-LSI, an open-source multi-objective portfolio optimization library for sustainable investments.
  • methods: The library applies multi-objective optimization algorithms to portfolio problems and exposes configuration files for setting algorithm hyper-parameters.
  • results: The guide covers problem setup, the workflow, and the hyper-parameters available in configurations for MOPO-LSI version 1.0.
    Abstract MOPO-LSI is an open-source Multi-Objective Portfolio Optimization Library for Sustainable Investments. This document provides a user guide for MOPO-LSI version 1.0, including problem setup, workflow and the hyper-parameters in configurations.

On the Constrained Time-Series Generation Problem

  • paper_url: http://arxiv.org/abs/2307.01717
  • repo_url: None
  • paper_authors: Andrea Coletta, Sriram Gopalakrishan, Daniel Borrajo, Svitlana Vyetrenko
  • for: The paper addresses the constrained time-series generation problem, where synthetic series are used to augment historical datasets for machine learning, amplify rare events, and create counterfactual scenarios that must remain realistic while satisfying numerical constraints.
  • methods: The problem is framed as constrained optimization, and a set of generative methods is proposed, including "GuidedDiffTime", a guided diffusion model that produces realistic constrained time series and, unlike penalty- or rejection-based approaches, does not require re-training when constraints change.
  • results: Evaluations on financial and energy datasets, where constraints are critical, show the methods outperform existing work both qualitatively and quantitatively; GuidedDiffTime in particular needs no re-training for new constraints, significantly reducing the carbon footprint.
    Abstract Synthetic time series are often used in practical applications to augment the historical time series dataset for better performance of machine learning algorithms, amplify the occurrence of rare events, and also create counterfactual scenarios described by the time series. Distributional-similarity (which we refer to as realism) as well as the satisfaction of certain numerical constraints are common requirements in counterfactual time series scenario generation requests. For instance, the US Federal Reserve publishes synthetic market stress scenarios given by the constrained time series for financial institutions to assess their performance in hypothetical recessions. Existing approaches for generating constrained time series usually penalize training loss to enforce constraints, and reject non-conforming samples. However, these approaches would require re-training if we change constraints, and rejection sampling can be computationally expensive, or impractical for complex constraints. In this paper, we propose a novel set of methods to tackle the constrained time series generation problem and provide efficient sampling while ensuring the realism of generated time series. In particular, we frame the problem using a constrained optimization framework and then we propose a set of generative methods including ``GuidedDiffTime'', a guided diffusion model to generate realistic time series. Empirically, we evaluate our work on several datasets for financial and energy data, where incorporating constraints is critical. We show that our approaches outperform existing work both qualitatively and quantitatively. Most importantly, we show that our ``GuidedDiffTime'' model is the only solution where re-training is not necessary for new constraints, resulting in a significant carbon footprint reduction.
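One way to read the constrained-optimization framing is as guidance at sampling time: each reverse-diffusion step is nudged by the gradient of a differentiable constraint penalty, so the constraints can change without retraining the generative model. The sketch below is a generic guided-sampling loop with a stand-in denoiser and made-up constraints; it is not the GuidedDiffTime model.

```python
import torch

def constraint_penalty(x):
    """Example differentiable constraints (illustrative only): stay below 1.0 and end
    within 0.1 of the starting value."""
    overshoot = torch.relu(x - 1.0).sum()
    endpoints = torch.abs(x[..., -1] - x[..., 0]).sum()
    return overshoot + torch.relu(endpoints - 0.1)

@torch.no_grad()
def guided_sampling(denoiser, steps=50, length=64, guidance=0.5):
    """Reverse-diffusion sketch where each step is nudged by the constraint gradient,
    trading off realism (the denoiser) against constraint satisfaction."""
    x = torch.randn(1, length)
    for t in reversed(range(steps)):
        x = denoiser(x, t)                              # one (placeholder) denoising step
        with torch.enable_grad():
            xg = x.detach().requires_grad_(True)
            constraint_penalty(xg).backward()
        x = x - guidance * xg.grad                      # steer toward the constraint set
    return x

toy_denoiser = lambda x, t: 0.95 * x                    # stand-in for a trained model
print(guided_sampling(toy_denoiser).shape)
```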

Align With Purpose: Optimize Desired Properties in CTC Models with a General Plug-and-Play Framework

  • paper_url: http://arxiv.org/abs/2307.01715
  • repo_url: None
  • paper_authors: Eliya Segev, Maya Alroy, Ronen Katsir, Noam Wies, Ayana Shenhav, Yael Ben-Oren, David Zar, Oren Tadmor, Jacob Bitterman, Amnon Shashua, Tal Rosenwein
  • for: The goal is to extend the CTC criterion used to train sequence-to-sequence models so that desired alignment properties can be optimized directly.
  • methods: Proposes "Align With Purpose", a general plug-and-play framework that complements the CTC loss with an additional loss term prioritizing alignments according to a desired property, without modifying the CTC loss function itself.
  • results: Applied to automatic speech recognition on training sets of up to 280,000 hours, the framework yields an improvement of up to 570 ms in latency optimization with a minor reduction in WER for the emission-time property, and a 4.5% relative WER improvement over baseline models for the WER property.
    Abstract Connectionist Temporal Classification (CTC) is a widely used criterion for training supervised sequence-to-sequence (seq2seq) models. It enables learning the relations between input and output sequences, termed alignments, by marginalizing over perfect alignments (that yield the ground truth), at the expense of imperfect alignments. This binary differentiation of perfect and imperfect alignments falls short of capturing other essential alignment properties that hold significance in other real-world applications. Here we propose $\textit{Align With Purpose}$, a $\textbf{general Plug-and-Play framework}$ for enhancing a desired property in models trained with the CTC criterion. We do that by complementing the CTC with an additional loss term that prioritizes alignments according to a desired property. Our method does not require any intervention in the CTC loss function, enables easy optimization of a variety of properties, and allows differentiation between both perfect and imperfect alignments. We apply our framework in the domain of Automatic Speech Recognition (ASR) and show its generality in terms of property selection, architectural choice, and scale of training dataset (up to 280,000 hours). To demonstrate the effectiveness of our framework, we apply it to two unrelated properties: emission time and word error rate (WER). For the former, we report an improvement of up to 570ms in latency optimization with a minor reduction in WER, and for the latter, we report a relative improvement of 4.5% WER over the baseline models. To the best of our knowledge, these applications have never been demonstrated to work on a scale of data as large as ours. Notably, our method can be implemented using only a few lines of code, and can be extended to other alignment-free loss functions and to domains other than ASR.
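In code, the framework amounts to adding a property-dependent term next to the unchanged CTC loss. The sketch below pairs torch's CTC loss with an illustrative low-latency proxy (penalizing non-blank probability mass that arrives late); the paper's actual property term, which samples and prioritizes alignment pairs, is not reproduced here, and the weighting is arbitrary.

```python
import torch
import torch.nn.functional as F

def align_with_purpose_loss(logits, targets, in_lens, tgt_lens, lam=0.1, blank=0):
    """CTC loss plus an extra, property-dependent term. The latency proxy below is an
    assumption for illustration, not the paper's formulation."""
    log_probs = F.log_softmax(logits, dim=-1)            # (T, N, C)
    ctc = F.ctc_loss(log_probs, targets, in_lens, tgt_lens, blank=blank)
    T = logits.shape[0]
    frame_pos = torch.arange(T, dtype=logits.dtype).view(T, 1) / T
    non_blank = 1.0 - log_probs[..., blank].exp()        # (T, N) non-blank mass per frame
    latency = (frame_pos * non_blank).sum(0) / non_blank.sum(0).clamp_min(1e-8)
    return ctc + lam * latency.mean()

logits = torch.randn(50, 2, 30, requires_grad=True)      # (time, batch, vocab)
targets = torch.randint(1, 30, (2, 12))
loss = align_with_purpose_loss(logits, targets, torch.full((2,), 50), torch.full((2,), 12))
loss.backward()
```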

Distributional Model Equivalence for Risk-Sensitive Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.01708
  • repo_url: None
  • paper_authors: Tyler Kastner, Murat A. Erdogdu, Amir-massoud Farahmand
  • for: The paper studies how to learn models for risk-sensitive reinforcement learning.
  • methods: Using distributional reinforcement learning, it introduces two new notions of model equivalence: a general one that supports planning for any risk measure but is intractable, and a practical variant that lets the user choose which risk measures to plan optimally for.
  • results: The framework can augment any model-free risk-sensitive algorithm, and its effectiveness is demonstrated in both tabular and large-scale experiments.
    Abstract We consider the problem of learning models for risk-sensitive reinforcement learning. We theoretically demonstrate that proper value equivalence, a method of learning models which can be used to plan optimally in the risk-neutral setting, is not sufficient to plan optimally in the risk-sensitive setting. We leverage distributional reinforcement learning to introduce two new notions of model equivalence, one which is general and can be used to plan for any risk measure, but is intractable; and a practical variation which allows one to choose which risk measures they may plan optimally for. We demonstrate how our framework can be used to augment any model-free risk-sensitive algorithm, and provide both tabular and large-scale experiments to demonstrate its ability.

Online Learning and Solving Infinite Games with an ERM Oracle

  • paper_url: http://arxiv.org/abs/2307.01689
  • repo_url: None
  • paper_authors: Angelos Assos, Idan Attias, Yuval Dagan, Constantinos Daskalakis, Maxwell Fishelson
  • for: This paper targets the online learning setting, where algorithms for general concept classes typically rely on computationally inefficient oracles, and asks what can be achieved with ERM oracle calls alone.
  • methods: It proposes an online binary classification algorithm that relies solely on ERM oracle calls, along with equilibrium-computation algorithms for games in which the ERM oracle acts as a best-response oracle.
  • results: The algorithm achieves finite regret in the realizable setting and sublinearly growing regret in the agnostic setting, bounded in terms of the Littlestone and threshold dimensions of the concept class; the game-theoretic algorithms converge to approximate minimax equilibria in two-player zero-sum games and approximate coarse correlated equilibria in multi-player general-sum games with bounded fat-threshold dimension.
    Abstract While ERM suffices to attain near-optimal generalization error in the stochastic learning setting, this is not known to be the case in the online learning setting, where algorithms for general concept classes rely on computationally inefficient oracles such as the Standard Optimal Algorithm (SOA). In this work, we propose an algorithm for online binary classification setting that relies solely on ERM oracle calls, and show that it has finite regret in the realizable setting and sublinearly growing regret in the agnostic setting. We bound the regret in terms of the Littlestone and threshold dimensions of the underlying concept class. We obtain similar results for nonparametric games, where the ERM oracle can be interpreted as a best response oracle, finding the best response of a player to a given history of play of the other players. In this setting, we provide learning algorithms that only rely on best response oracles and converge to approximate-minimax equilibria in two-player zero-sum games and approximate coarse correlated equilibria in multi-player general-sum games, as long as the game has a bounded fat-threshold dimension. Our algorithms apply to both binary-valued and real-valued games and can be viewed as providing justification for the wide use of double oracle and multiple oracle algorithms in the practice of solving large games.

Serving Graph Neural Networks With Distributed Fog Servers For Smart IoT Services

  • paper_url: http://arxiv.org/abs/2307.01684
  • repo_url: None
  • paper_authors: Liekang Zeng, Xu Chen, Peng Huang, Ke Luo, Xiaoxi Zhang, Zhi Zhou
  • for: Fograph is designed to provide real-time GNN inference for IoT-driven smart applications, leveraging the resources of multiple fog nodes to reduce communication overhead and improve performance.
  • methods: Fograph employs heterogeneity-aware execution planning and GNN-specific compression techniques to optimize the performance of GNN inference in fog environments.
  • results: Compared to state-of-the-art cloud serving and fog deployment, Fograph achieves up to 5.39x execution speedup and 6.84x throughput improvement, demonstrating its effectiveness in improving the performance of GNN-based services for IoT applications.
    Abstract Graph Neural Networks (GNNs) have gained growing interest in miscellaneous applications owing to their outstanding ability in extracting latent representation on graph structures. To render GNN-based service for IoT-driven smart applications, traditional model serving paradigms usually resort to the cloud by fully uploading geo-distributed input data to remote datacenters. However, our empirical measurements reveal the significant communication overhead of such cloud-based serving and highlight the profound potential in applying the emerging fog computing. To maximize the architectural benefits brought by fog computing, in this paper, we present Fograph, a novel distributed real-time GNN inference framework that leverages diverse and dynamic resources of multiple fog nodes in proximity to IoT data sources. By introducing heterogeneity-aware execution planning and GNN-specific compression techniques, Fograph tailors its design to well accommodate the unique characteristics of GNN serving in fog environments. Prototype-based evaluation and case study demonstrate that Fograph significantly outperforms the state-of-the-art cloud serving and fog deployment by up to 5.39x execution speedup and 6.84x throughput improvement.

Learning Discrete Weights and Activations Using the Local Reparameterization Trick

  • paper_url: http://arxiv.org/abs/2307.01683
  • repo_url: None
  • paper_authors: Guy Berger, Aviv Navon, Ethan Fetaya
  • for: Lowering the computation and memory demands of neural network inference in computer vision and machine learning.
  • methods: Binarizing network weights and activations replaces expensive floating-point operations with faster bitwise operations; the paper extends the local reparameterization trick, previously used to train networks with discrete weights, to also allow discrete activations.
  • results: The approach further reduces runtime and memory footprint at inference time and achieves state-of-the-art results for networks with binary activations.
    Abstract In computer vision and machine learning, a crucial challenge is to lower the computation and memory demands for neural network inference. A commonplace solution to address this challenge is through the use of binarization. By binarizing the network weights and activations, one can significantly reduce computational complexity by substituting the computationally expensive floating operations with faster bitwise operations. This leads to a more efficient neural network inference that can be deployed on low-resource devices. In this work, we extend previous approaches that trained networks with discrete weights using the local reparameterization trick to also allow for discrete activations. The original approach optimized a distribution over the discrete weights and uses the central limit theorem to approximate the pre-activation with a continuous Gaussian distribution. Here we show that the probabilistic modeling can also allow effective training of networks with discrete activation as well. This further reduces runtime and memory footprint at inference time with state-of-the-art results for networks with binary activations.
    摘要 在计算机视觉和机器学习中,一个重要挑战是降低神经网络推理的计算和内存需求。一种常见的解决方案是通过 binarization 来实现这一目标。通过将神经网络权重和活动化值binarized,可以在替换计算昂贵的浮点运算时大幅降低计算复杂性。这会导致更高效的神经网络推理,可以在低资源设备上部署。在这项工作中,我们extend了之前的方法,使得神经网络可以使用随机变量的批处理技术进行训练,而不是通过精确的权重值来进行训练。我们原始的方法是使用中心假设定理来近似预Activation的Continuous Gaussian Distribution。这里我们表明,可以通过概率模型来有效地训练具有随机变量的神经网络。这会进一步降低执行时间和内存占用,并且在state-of-the-art 的结果下,对于具有二进制活动化的神经网络进行推理。
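To make the weight-side mechanism concrete, below is a minimal PyTorch sketch of the local reparameterization trick for binary {-1, +1} weights: the pre-activation is approximated by a Gaussian via the central limit theorem and sampled with reparameterization. This is only an illustration of the underlying trick, not the paper's implementation; it omits the extension to discrete activations, and all layer sizes and names are made up.

```python
import torch
import torch.nn as nn

class BinaryLinearLRT(nn.Module):
    """Linear layer with binary {-1, +1} weights trained via the local
    reparameterization trick: the pre-activation is approximated by a
    Gaussian (CLT) and sampled with reparameterization."""
    def __init__(self, in_features, out_features):
        super().__init__()
        # theta parameterizes P(w = +1) = sigmoid(theta)
        self.theta = nn.Parameter(torch.zeros(out_features, in_features))

    def forward(self, x):
        p = torch.sigmoid(self.theta)          # P(w = +1)
        w_mean = 2.0 * p - 1.0                 # E[w] in (-1, 1)
        w_var = 1.0 - w_mean ** 2              # Var[w] = 1 - E[w]^2
        z_mean = x @ w_mean.t()                # mean of the pre-activation
        z_var = (x ** 2) @ w_var.t()           # variance of the pre-activation
        eps = torch.randn_like(z_mean)
        return z_mean + eps * torch.sqrt(z_var + 1e-8)   # sampled pre-activation

# toy usage
layer = BinaryLinearLRT(16, 4)
out = layer(torch.randn(8, 16))
print(out.shape)   # torch.Size([8, 4])
```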

Training Energy-Based Models with Diffusion Contrastive Divergences

  • paper_url: http://arxiv.org/abs/2307.01668
  • repo_url: None
  • paper_authors: Weijian Luo, Hao Jiang, Tianyang Hu, Jiacheng Sun, Zhenguo Li, Zhihua Zhang
  • for: 这个论文主要针对的问题是如何改进对能量基模型(EBM)的训练方法,以提高EBM的生成能力和效率。
  • methods: 这个论文提出了一种新的启发对EBM的训练方法,即Diffusion Contrastive Divergence(DCD),它将Langevin dynamic更换为其他EBM参数自由的扩散过程。这种方法可以更高效地进行训练,并且不受非可忽略的梯度项的限制。
  • results: 作者在实验中表明,提出的DCD方法在合成数据建模、高维图像去噪和图像生成任务中表现出色,比CD更高效和稳定。此外,DCD还能训练EBM生成CelebA $32\times 32$ 数据集,与现有EBM相当。
    Abstract Energy-Based Models (EBMs) have been widely used for generative modeling. Contrastive Divergence (CD), a prevailing training objective for EBMs, requires sampling from the EBM with Markov Chain Monte Carlo methods (MCMCs), which leads to an irreconcilable trade-off between the computational burden and the validity of the CD. Running MCMCs till convergence is computationally intensive. On the other hand, short-run MCMC brings in an extra non-negligible parameter gradient term that is difficult to handle. In this paper, we provide a general interpretation of CD, viewing it as a special instance of our proposed Diffusion Contrastive Divergence (DCD) family. By replacing the Langevin dynamic used in CD with other EBM-parameter-free diffusion processes, we propose a more efficient divergence. We show that the proposed DCDs are both more computationally efficient than the CD and are not limited to a non-negligible gradient term. We conduct intensive experiments, including both synthesis data modeling and high-dimensional image denoising and generation, to show the advantages of the proposed DCDs. On the synthetic data learning and image denoising experiments, our proposed DCD outperforms CD by a large margin. In image generation experiments, the proposed DCD is capable of training an energy-based model for generating the Celab-A $32\times 32$ dataset, which is comparable to existing EBMs.
    摘要 能量基模型(EBM)在生成模型方面广泛使用。对比差分泵(CD)是EBM训练的主要目标函数,但是使用Markov链约化 Monte Carlo方法(MCMC)来采样EBM,导致计算成本和验证CD之间存在不可 reconcile的负担。在MCMC运行至收敛之前,计算成本很高;另一方面,使用短跑MCMC会带来额外的非可忽略的参数梯度项,而且难以处理。在本文中,我们提供了CD的普遍解释,视其为我们提议的噪声对照分布(DCD)家族的特例。我们将CD中使用的朗格温动力换用其他EBM参数无关的扩散过程,并提出了更高效的分离。我们表明,我们提议的DCD比CD更高效,而且不受非可忽略的参数梯度项的限制。我们进行了广泛的实验,包括生成数据模型和高维图像减震和生成,以显示我们的提议DCD的优势。在生成数据学习和图像减震实验中,我们的提议DCD比CD大幅提高。在图像生成实验中,我们的提议DCD可以训练一个能量基模型,用于生成Celab-A $32\times 32$ 数据集,与现有EBM相当。
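For orientation, here is a hedged toy sketch of the vanilla Contrastive Divergence baseline that the paper generalizes: an energy network trained with short-run Langevin negatives. It is not the proposed DCD (which replaces the Langevin sampler with other EBM-parameter-free diffusion processes); the architecture, step sizes, and toy data are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Tiny energy network E_theta(x). DCD would replace the Langevin sampler
# below with other parameter-free diffusion processes; this sketch shows
# only the vanilla CD training loop the paper starts from.
energy = nn.Sequential(nn.Linear(2, 64), nn.SiLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(energy.parameters(), lr=1e-3)

def langevin(x, steps=20, step_size=0.01):
    """Short-run Langevin dynamics targeting the density exp(-E(x))."""
    x = x.clone().detach().requires_grad_(True)
    for _ in range(steps):
        grad = torch.autograd.grad(energy(x).sum(), x)[0]
        x = x - 0.5 * step_size * grad + torch.randn_like(x) * step_size ** 0.5
        x = x.detach().requires_grad_(True)
    return x.detach()

data = torch.randn(256, 2) * 0.5 + 1.0           # toy "real" samples
for _ in range(100):
    neg = langevin(torch.randn(256, 2))          # negative samples from the model
    # CD-style loss: push energy down on data, up on model samples
    loss = energy(data).mean() - energy(neg).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```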

Nonparametric Classification on Low Dimensional Manifolds using Overparameterized Convolutional Residual Networks

  • paper_url: http://arxiv.org/abs/2307.01649
  • repo_url: None
  • paper_authors: Kaiqi Zhang, Zixuan Zhang, Minshuo Chen, Mengdi Wang, Tuo Zhao, Yu-Xiang Wang
  • for: 这 paper 的目的是研究 Convolutional residual neural networks (ConvResNets) 的性能,并解释它们在实践中的出色预测能力,不能由 conventional wisdom 解释。
  • methods: 这 paper 使用 weight decay 来研究 ConvResNeXts 的表现,从非Parametric classification 的角度来看。
  • results: 研究表明,ConvResNeXts 可以具有高精度的预测性能,并且可以有效地适应函数的柔和性和低维度结构。
    Abstract Convolutional residual neural networks (ConvResNets), though overparameterized, can achieve remarkable prediction performance in practice, which cannot be well explained by conventional wisdom. To bridge this gap, we study the performance of ConvResNeXts, which cover ConvResNets as a special case, trained with weight decay from the perspective of nonparametric classification. Our analysis allows for infinitely many building blocks in ConvResNeXts, and shows that weight decay implicitly enforces sparsity on these blocks. Specifically, we consider a smooth target function supported on a low-dimensional manifold, then prove that ConvResNeXts can adapt to the function smoothness and low-dimensional structures and efficiently learn the function without suffering from the curse of dimensionality. Our findings partially justify the advantage of overparameterized ConvResNeXts over conventional machine learning models.

SwinGNN: Rethinking Permutation Invariance in Diffusion Models for Graph Generation

  • paper_url: http://arxiv.org/abs/2307.01646
  • repo_url: https://github.com/qiyan98/swingnn
  • paper_authors: Qi Yan, Zhengyang Liang, Yang Song, Renjie Liao, Lele Wang
  • for: 本文重新审视图生成扩散模型中的置换不变性,提出一种非置换不变的扩散模型,用于学习图数据的分布。
  • methods: 该模型使用高效的边到边2-WL消息传递网络,并利用Shifted Window基于SwinTransformers的自注意机制。
  • results: 经过系统的ablations和训练技巧优化,我们的SwinGNN在synthetic和实际的蛋白质和分子数据上达到了顶尖性能。
    Abstract Diffusion models based on permutation-equivariant networks can learn permutation-invariant distributions for graph data. However, in comparison to their non-invariant counterparts, we have found that these invariant models encounter greater learning challenges since 1) their effective target distributions exhibit more modes; 2) their optimal one-step denoising scores are the score functions of Gaussian mixtures with more components. Motivated by this analysis, we propose a non-invariant diffusion model, called $\textit{SwinGNN}$, which employs an efficient edge-to-edge 2-WL message passing network and utilizes shifted window based self-attention inspired by SwinTransformers. Further, through systematic ablations, we identify several critical training and sampling techniques that significantly improve the sample quality of graph generation. At last, we introduce a simple post-processing trick, $\textit{i.e.}$, randomly permuting the generated graphs, which provably converts any graph generative model to a permutation-invariant one. Extensive experiments on synthetic and real-world protein and molecule datasets show that our SwinGNN achieves state-of-the-art performances. Our code is released at https://github.com/qiyan98/SwinGNN.

Heuristic Algorithms for the Approximation of Mutual Coherence

  • paper_url: http://arxiv.org/abs/2307.01639
  • repo_url: None
  • paper_authors: Gregor Betz, Vera Chekan, Tamara Mchedlidze
  • for: This paper is written for those interested in efficient computation of mutual coherence, particularly in the context of political preference matching systems like Wahl-O-Mat.
  • methods: The paper presents several heuristics to estimate the model parameters of a mixture of three Gaussians distribution, which is used to approximate the mutual coherence. Some of the algorithms are fully polynomial-time, while others require solving a small number of instances of the SAT model counting problem.
  • results: The paper reports the average squared error of the best algorithm, which is below 0.0035, indicating a high degree of accuracy while also being efficient. The results are precise enough to be used in Wahl-O-Mat-like systems.
    Abstract Mutual coherence is a measure of similarity between two opinions. Although the notion comes from philosophy, it is essential for a wide range of technologies, e.g., the Wahl-O-Mat system. In Germany, this system helps voters to find candidates that are the closest to their political preferences. The exact computation of mutual coherence is highly time-consuming due to the iteration over all subsets of an opinion. Moreover, for every subset, an instance of the SAT model counting problem has to be solved which is known to be a hard problem in computer science. This work is the first study to accelerate this computation. We model the distribution of the so-called confirmation values as a mixture of three Gaussians and present efficient heuristics to estimate its model parameters. The mutual coherence is then approximated with the expected value of the distribution. Some of the presented algorithms are fully polynomial-time, others only require solving a small number of instances of the SAT model counting problem. The average squared error of our best algorithm lies below 0.0035 which is insignificant if the efficiency is taken into account. Furthermore, the accuracy is precise enough to be used in Wahl-O-Mat-like systems.
    摘要 互相协调是两个意见之间的相似度度量。这个概念起源于哲学,但是它对广泛的技术领域都很重要,例如德国的 Wahl-O-Mat 系统。这个系统帮助选民找到最符合其政治偏好的候选人。计算互相协调的精确方法需要遍历所有意见的所有子集,并解决每个子集的 SAT 模型计数问题,这是计算机科学中知名的困难问题。这项工作是首次加速这种计算的研究。我们模型了确认值的分布为三个高斯分布的混合,并提供了高效的启发式来估算模型参数。然后,我们使用分布的期望值来 aproximate 互相协调。一些我们提出的算法是完全多项式时间的,其他些只需解决一些 SAT 模型计数问题。我们的最佳算法的平均方差平方误差低于 0.0035,这对于效率来说是无意义的。此外,我们的精度够精确,可以用于 Wahl-O-Mat 类系统。
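As a rough illustration of the approximation idea, the sketch below fits a three-component Gaussian mixture to confirmation values and takes the mixture's expectation as the coherence estimate. The confirmation values here are synthetic stand-ins (computing real ones requires SAT model counting), and plain EM is used in place of the paper's heuristics.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical stand-ins for confirmation values; in the paper these come
# from subsets of an opinion and require SAT model counting to compute.
rng = np.random.default_rng(0)
confirmation_values = np.concatenate([
    rng.normal(-0.4, 0.10, 300),
    rng.normal(0.1, 0.15, 500),
    rng.normal(0.6, 0.10, 200),
])

# Model the distribution as a mixture of three Gaussians.
gmm = GaussianMixture(n_components=3, random_state=0)
gmm.fit(confirmation_values.reshape(-1, 1))

# Mutual coherence is then approximated by the expectation of the fitted mixture.
approx_coherence = float(np.dot(gmm.weights_, gmm.means_.ravel()))
print(approx_coherence)
```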

HAGNN: Hybrid Aggregation for Heterogeneous Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2307.01636
  • repo_url: None
  • paper_authors: Guanghui Zhu, Zhennan Zhu, Hongyang Chen, Chunfeng Yuan, Yihua Huang
  • for: Handle heterogeneous graphs with rich type semantic information.
  • methods:
      + Propose a novel framework called HAGNN (Hybrid Aggregation for Heterogeneous GNNs)
      + Leverage both meta-path neighbors and directly connected neighbors for node aggregation
      + Divide the aggregation process into two phases: meta-path-based intra-type aggregation and meta-path-free inter-type aggregation
      + Use a new data structure called fused meta-path graph for intra-type aggregation
      + Perform structural semantic aware aggregation
  • results:
      + Outperform existing heterogeneous GNN models on node classification, node clustering, and link prediction tasks
      + Demonstrate the effectiveness of HAGNN in handling heterogeneous graphs with rich type semantic information.
    Abstract Heterogeneous graph neural networks (GNNs) have been successful in handling heterogeneous graphs. In existing heterogeneous GNNs, meta-path plays an essential role. However, recent work pointed out that simple homogeneous graph model without meta-path can also achieve comparable results, which calls into question the necessity of meta-path. In this paper, we first present the intrinsic difference about meta-path-based and meta-path-free models, i.e., how to select neighbors for node aggregation. Then, we propose a novel framework to utilize the rich type semantic information in heterogeneous graphs comprehensively, namely HAGNN (Hybrid Aggregation for Heterogeneous GNNs). The core of HAGNN is to leverage the meta-path neighbors and the directly connected neighbors simultaneously for node aggregations. HAGNN divides the overall aggregation process into two phases: meta-path-based intra-type aggregation and meta-path-free inter-type aggregation. During the intra-type aggregation phase, we propose a new data structure called fused meta-path graph and perform structural semantic aware aggregation on it. Finally, we combine the embeddings generated by each phase. Compared with existing heterogeneous GNN models, HAGNN can take full advantage of the heterogeneity in heterogeneous graphs. Extensive experimental results on node classification, node clustering, and link prediction tasks show that HAGNN outperforms the existing modes, demonstrating the effectiveness of HAGNN.
    摘要 《异类图 neural network(GNN)在处理异类图方面取得成功。现有的异类GNN中,元路扮演着关键性的角色。然而,最近的研究表明,简单的同类图模型无需元路可以达到相似的结果,这意味着元路的必要性被质疑。在这篇论文中,我们首先介绍异类GNN中元路和无元路两种模型之间的本质差异,即如何选择节点 для节点聚合。然后,我们提出了一种新的框架,即Hybrid Aggregation for Heterogeneous GNNs(异类GNN中的混合聚合),用于全面利用异类图中各种类型Semantic信息。核心思想是同时利用元路邻居和直接连接邻居进行节点聚合。我们将整个聚合过程分成两个阶段:元路基于的内部聚合和元路无的交叉聚合。在内部聚合阶段,我们提出了一种新的数据结构called fused meta-path graph,并在其上进行结构层次意识感知聚合。最后,我们将每个阶段生成的embeddings合并。与现有的异类GNN模型相比,HAGNN可以全面利用异类图中的异类性。我们在节点分类、节点封顶和链接预测任务上进行了广泛的实验,结果显示,HAGNN在这些任务上表现出了更好的效果,证明了HAGNN的效果。》

Renewable energy management in smart home environment via forecast embedded scheduling based on Recurrent Trend Predictive Neural Network

  • paper_url: http://arxiv.org/abs/2307.01622
  • repo_url: https://github.com/mertnakip/Recurrent-Trend-Predictive-Neural-Network
  • paper_authors: Mert Nakıp, Onur Çopur, Emrah Biyik, Cüneyt Güzeliş
  • for: The paper proposes an advanced machine learning algorithm for efficient residential demand control in smart home energy management systems.
  • methods: The proposed algorithm, called Recurrent Trend Predictive Neural Network based Forecast Embedded Scheduling (rTPNN-FES), simultaneously forecasts renewable energy generation and schedules household appliances, eliminating the need for separate algorithms.
  • results: The evaluation results show that rTPNN-FES provides near-optimal scheduling $37.5$ times faster than optimization while outperforming state-of-the-art forecasting techniques.
    Abstract Smart home energy management systems help the distribution grid operate more efficiently and reliably, and enable effective penetration of distributed renewable energy sources. These systems rely on robust forecasting, optimization, and control/scheduling algorithms that can handle the uncertain nature of demand and renewable generation. This paper proposes an advanced ML algorithm, called Recurrent Trend Predictive Neural Network based Forecast Embedded Scheduling (rTPNN-FES), to provide efficient residential demand control. rTPNN-FES is a novel neural network architecture that simultaneously forecasts renewable energy generation and schedules household appliances. By its embedded structure, rTPNN-FES eliminates the utilization of separate algorithms for forecasting and scheduling and generates a schedule that is robust against forecasting errors. This paper also evaluates the performance of the proposed algorithm for an IoT-enabled smart home. The evaluation results reveal that rTPNN-FES provides near-optimal scheduling $37.5$ times faster than the optimization while outperforming state-of-the-art forecasting techniques.
    摘要 智能家庭能源管理系统可以使分布式电力网络运行更加高效和可靠,并允许有效地推进分布式可再生能源源。这些系统需要可靠的预测、优化和控制/调度算法,以处理各种不确定的需求和可再生能源生产。本文提出了一种高级的机器学习算法,即循环趋势预测神经网络基于预测嵌入的调度算法(rTPNN-FES),以提供高效的家庭需求控制。rTPNN-FES是一种新的神经网络架构,同时预测可再生能源生产和调度家用电器。由嵌入结构,rTPNN-FES消除了分离的预测和调度算法,生成一个强健对预测错误的负荷调度。本文还评估了提议的算法在智能家庭上的性能。评估结果显示,rTPNN-FES提供了近似优化的调度,比传统预测技术更高效,并且比优化算法更快,每秒37.5次。

SageFormer: Series-Aware Graph-Enhanced Transformers for Multivariate Time Series Forecasting

  • paper_url: http://arxiv.org/abs/2307.01616
  • repo_url: None
  • paper_authors: Zhenwei Zhang, Xin Wang, Yuantao Gu
  • for: 本研究旨在改进多变量时间序列预测中的深度学习方法,尤其是Transformer的应用。
  • methods: 本文引入Series-aware Graph-enhanced Transformer模型(SageFormer),利用图结构有效地捕捉和建模序列之间的依赖关系。
  • results: 在真实数据和合成数据集上的大量实验表明,SageFormer的性能优于此前的最先进方法。
    Abstract Multivariate time series forecasting plays a critical role in diverse domains. While recent advancements in deep learning methods, especially Transformers, have shown promise, there remains a gap in addressing the significance of inter-series dependencies. This paper introduces SageFormer, a Series-aware Graph-enhanced Transformer model designed to effectively capture and model dependencies between series using graph structures. SageFormer tackles two key challenges: effectively representing diverse temporal patterns across series and mitigating redundant information among series. Importantly, the proposed series-aware framework seamlessly integrates with existing Transformer-based models, augmenting their ability to model inter-series dependencies. Through extensive experiments on real-world and synthetic datasets, we showcase the superior performance of SageFormer compared to previous state-of-the-art approaches.
    摘要 多ivariate时间序列预测在多个领域发挥关键作用。最近的深度学习方法,特别是Transformers,已经显示了承诺,但还有一个差距在处理多个时间序列之间的相互依赖关系。这篇论文引入了SageFormer,一种基于图结构的Series-aware Graph-enhanced Transformer模型,用于有效地捕捉和模型多个时间序列之间的依赖关系。SageFormer解决了两个关键挑战:一是有效地表示多个时间序列中的多样化时间模式,二是避免多个时间序列之间的重复信息。重要的是,我们提出的系列意识框架可以轻松地与现有的Transformer-based模型结合使用,以提高对多个时间序列之间的依赖关系的模型。通过对真实世界和 sintetic 数据集进行广泛的实验,我们展示了SageFormer比前一个状态的方法更高效。

Overconfidence is a Dangerous Thing: Mitigating Membership Inference Attacks by Enforcing Less Confident Prediction

  • paper_url: http://arxiv.org/abs/2307.01610
  • repo_url: https://github.com/dependablesystemslab/mia_defense_hamp
  • paper_authors: Zitao Chen, Karthik Pattabiraman
  • for: This paper addresses membership inference attacks (MIAs) on machine learning (ML) models, which can compromise the privacy of training data. It proposes a defense technique called HAMP that can provide strong membership privacy and high accuracy without requiring additional data.
  • methods: The HAMP technique consists of a novel training framework with high-entropy soft labels and an entropy-based regularizer to constrain the model's prediction while still achieving high accuracy. The technique also modifies all prediction outputs to become low-confidence outputs, effectively obscuring the differences between predictions on members and non-members.
  • results: The paper conducts extensive evaluation on five benchmark datasets and shows that HAMP provides consistently high accuracy and strong membership privacy, outperforming seven state-of-the-art defenses in terms of the privacy-utility trade-off.
    Abstract Machine learning (ML) models are vulnerable to membership inference attacks (MIAs), which determine whether a given input is used for training the target model. While there have been many efforts to mitigate MIAs, they often suffer from limited privacy protection, large accuracy drop, and/or requiring additional data that may be difficult to acquire. This work proposes a defense technique, HAMP that can achieve both strong membership privacy and high accuracy, without requiring extra data. To mitigate MIAs in different forms, we observe that they can be unified as they all exploit the ML model's overconfidence in predicting training samples through different proxies. This motivates our design to enforce less confident prediction by the model, hence forcing the model to behave similarly on the training and testing samples. HAMP consists of a novel training framework with high-entropy soft labels and an entropy-based regularizer to constrain the model's prediction while still achieving high accuracy. To further reduce privacy risk, HAMP uniformly modifies all the prediction outputs to become low-confidence outputs while preserving the accuracy, which effectively obscures the differences between the prediction on members and non-members. We conduct extensive evaluation on five benchmark datasets, and show that HAMP provides consistently high accuracy and strong membership privacy. Our comparison with seven state-of-the-art defenses shows that HAMP achieves a superior privacy-utility trade off than those techniques.
    摘要 为了mitigate MIA的不同形式,我们发现它们都利用了 ML 模型对训练样本的过于自信的预测,通过不同的代理来实现。这种情况使我们设计了一种强制模型在训练和测试样本上具有相同的预测行为的方法。HAMP 包括一种新的训练框架,高级 entropy 软标签和一种基于 entropy 的 regularizer,以防止模型的预测,同时仍然实现高准确率。为了进一步减少隐私风险,HAMP 对所有预测输出进行了一致的低信任输出修改,使模型的预测结果变得模拟,从而隐藏了训练和测试样本之间的差异。我们对五个 benchmark 数据集进行了广泛的评估,并显示了 HAMP 可以在高准确率和强大的成员隐私之间取得平衡。我们与七种 state-of-the-art 防御技术进行比较,发现 HAMP 在隐私利用与实用性之间取得了更好的平衡。
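A minimal sketch of the two training-side ideas described above (high-entropy soft labels plus an entropy-based regularizer) might look as follows in PyTorch; the smoothing and regularization weights are illustrative assumptions rather than values from the paper, and HAMP's output-modification stage is not shown.

```python
import torch
import torch.nn.functional as F

def hamp_style_loss(logits, targets, num_classes, smooth=0.7, alpha=0.1):
    """Illustrative loss combining (1) high-entropy soft labels and
    (2) an entropy-based regularizer that penalizes over-confident
    predictions. `smooth` and `alpha` are made-up hyper-parameters."""
    # high-entropy soft labels: most mass on the true class, rest spread uniformly
    soft = torch.full((logits.size(0), num_classes),
                      (1 - smooth) / (num_classes - 1), device=logits.device)
    soft.scatter_(1, targets.unsqueeze(1), smooth)

    log_probs = F.log_softmax(logits, dim=1)
    ce = -(soft * log_probs).sum(dim=1).mean()

    probs = log_probs.exp()
    entropy = -(probs * log_probs).sum(dim=1).mean()
    return ce - alpha * entropy   # reward higher prediction entropy

# toy usage
logits = torch.randn(32, 10, requires_grad=True)
targets = torch.randint(0, 10, (32,))
loss = hamp_style_loss(logits, targets, num_classes=10)
loss.backward()
```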

Prototypes as Explanation for Time Series Anomaly Detection

  • paper_url: http://arxiv.org/abs/2307.01601
  • repo_url: None
  • paper_authors: Bin Li, Carsten Jentsch, Emmanuel Müller
  • for: 本文针对时间序列资料中的异常模式探测,尤其是在没有标签的情况下,时间序列资料的动态性和未料到的异常行为导致探测过程具有挑战性。
  • methods: 本文提出了ProtoAD方法,利用示例来解释深度黑盒模型中的异常探测过程。在不对探测性能有重要影响的情况下,示例提供了深度黑盒模型中的透彻关键,并提供了域专家和投资者对模型的直觉理解。
  • results: 本文extend了广泛使用的示例学习在分类问题上的应用,并将其推广到异常探测问题上。通过视觉化示例的latent空间和输入空间,我们直观地解释了常规资料如何被模型,并且解释了具体的异常模式是如何被识别为异常的。
    Abstract Detecting abnormal patterns that deviate from a certain regular repeating pattern in time series is essential in many big data applications. However, the lack of labels, the dynamic nature of time series data, and unforeseeable abnormal behaviors make the detection process challenging. Despite the success of recent deep anomaly detection approaches, the mystical mechanisms in such black-box models have become a new challenge in safety-critical applications. The lack of model transparency and prediction reliability hinders further breakthroughs in such domains. This paper proposes ProtoAD, using prototypes as the example-based explanation for the state of regular patterns during anomaly detection. Without significant impact on the detection performance, prototypes shed light on the deep black-box models and provide intuitive understanding for domain experts and stakeholders. We extend the widely used prototype learning in classification problems into anomaly detection. By visualizing both the latent space and input space prototypes, we intuitively demonstrate how regular data are modeled and why specific patterns are considered abnormal.
    摘要 检测时序序数据中异常模式的检测是许多大数据应用场景中的关键问题。然而,缺乏标签、时序序数据的动态性和未预期的异常行为使检测过程具有挑战性。虽然最近的深度异常检测方法已经取得了成功,但这些黑盒模型中的神秘机制成为了新的挑战。模型的不透明度和预测可靠性限制了进一步的突破。本文提出了ProtoAD,使用模型为异常检测中的示例基本解释。无需对检测性能产生显著影响,示例揭示了深度黑盒模型的内部机制,提供了域专家和投资者Intuitive的理解。我们将通用的 prototype 学习在分类问题中扩展到异常检测。通过视觉化 latent space 和输入空间示例,我们直观地解释了如何模型正常数据和哪些特定模式被视为异常。
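The general prototype idea can be illustrated with a tiny scoring routine: each window's anomaly score is its distance in latent space to the nearest prototype. This is a generic sketch of prototype-based scoring, not ProtoAD's trained encoder or prototype-learning procedure; the encoder outputs and prototypes below are random stand-ins.

```python
import numpy as np

def prototype_anomaly_scores(latent, prototypes):
    """Score each window by its distance to the nearest prototype;
    larger distance = more anomalous."""
    d = np.linalg.norm(latent[:, None, :] - prototypes[None, :, :], axis=-1)
    return d.min(axis=1), d.argmin(axis=1)

rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 8))        # stand-in encoder outputs for 100 windows
prototypes = rng.normal(size=(5, 8))      # stand-in learned prototypes
scores, assigned = prototype_anomaly_scores(latent, prototypes)
print(scores.shape, assigned[:10])
```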

A Scalable Reinforcement Learning-based System Using On-Chain Data for Cryptocurrency Portfolio Management

  • paper_url: http://arxiv.org/abs/2307.01599
  • repo_url: None
  • paper_authors: Zhenhan Huang, Fumihide Tanaka
  • for: The paper proposes a novel reinforcement learning-based system for cryptocurrency portfolio management that incorporates on-chain data for end-to-end management.
  • methods: The paper uses on-chain data to train a reinforcement learning model for cryptocurrency portfolio management, and the model is tested and evaluated using backtesting results on three portfolios.
  • results: The results show that the proposed CryptoRLPM system outperforms all baselines in terms of accumulated rate of return, daily rate of return, and Sortino ratio, with enhancements of at least 83.14%, 0.5603%, and 2.1767 respectively compared to Bitcoin.
    Abstract On-chain data (metrics) of blockchain networks, akin to company fundamentals, provide crucial and comprehensive insights into the networks. Despite their informative nature, on-chain data have not been utilized in reinforcement learning (RL)-based systems for cryptocurrency (crypto) portfolio management (PM). An intriguing subject is the extent to which the utilization of on-chain data can enhance an RL-based system's return performance compared to baselines. Therefore, in this study, we propose CryptoRLPM, a novel RL-based system incorporating on-chain data for end-to-end crypto PM. CryptoRLPM consists of five units, spanning from information comprehension to trading order execution. In CryptoRLPM, the on-chain data are tested and specified for each crypto to solve the issue of ineffectiveness of metrics. Moreover, the scalable nature of CryptoRLPM allows changes in the portfolios' cryptos at any time. Backtesting results on three portfolios indicate that CryptoRLPM outperforms all the baselines in terms of accumulated rate of return (ARR), daily rate of return (DRR), and Sortino ratio (SR). Particularly, when compared to Bitcoin, CryptoRLPM enhances the ARR, DRR, and SR by at least 83.14%, 0.5603%, and 2.1767 respectively.
    摘要 区块链网络的链上数据(指标)类似于公司基本面,为网络提供了关键而全面的洞察。尽管信息量丰富,链上数据此前尚未被用于基于强化学习(RL)的加密货币投资组合管理(PM)系统。一个有趣的问题是,利用链上数据能在多大程度上提升RL系统相对于基线的收益表现。因此,本研究提出了CryptoRLPM——一种融合链上数据、端到端的新型RL投资组合管理系统。CryptoRLPM由五个单元组成,覆盖从信息理解到交易指令执行的全过程。系统针对每种加密货币分别测试并指定链上指标,以解决指标失效的问题;其可扩展的架构还允许随时更换组合中的币种。在三个投资组合上的回测结果表明,CryptoRLPM在累计收益率(ARR)、日收益率(DRR)和Sortino比率(SR)上均优于所有基线;与比特币相比,ARR、DRR和SR分别至少提升83.14%、0.5603%和2.1767。

Bridge the Performance Gap in Peak-hour Series Forecasting: The Seq2Peak Framework

  • paper_url: http://arxiv.org/abs/2307.01597
  • repo_url: None
  • paper_authors: Zhenwei Zhang, Xin Wang, Jingyuan Xie, Heling Zhang, Yuantao Gu
  • for: 预测峰值时间序列 (PHSF) 是许多领域的关键任务,但是目前的深度学习模型在这种任务上表现不佳。这可以归结于峰值时间序列的高度非站台性,导致直接预测更加困难于标准时间序列预测 (TSF)。
  • methods: 本文提出了一种名为Seq2Peak的新框架,用于解决 PHSF 任务中的性能差距。Seq2Peak 包括两个关键组件:一个名为 CyclicNorm 的管道,用于解决非站台性问题,以及一个简单 yet effective 的可学习参数自由的峰值时间序列解码器,使用了一种混合损失函数,将原始序列和峰值时间序列作为监督信号。
  • results: 对于四个实际世界数据集,Seq2Peak 实现了惊人的平均相对提升率为 37.7%,对于基于 transformer 和非 transformer 的 TSF 模型。
    Abstract Peak-Hour Series Forecasting (PHSF) is a crucial yet underexplored task in various domains. While state-of-the-art deep learning models excel in regular Time Series Forecasting (TSF), they struggle to achieve comparable results in PHSF. This can be attributed to the challenges posed by the high degree of non-stationarity in peak-hour series, which makes direct forecasting more difficult than standard TSF. Additionally, manually extracting the maximum value from regular forecasting results leads to suboptimal performance due to models minimizing the mean deficit. To address these issues, this paper presents Seq2Peak, a novel framework designed specifically for PHSF tasks, bridging the performance gap observed in TSF models. Seq2Peak offers two key components: the CyclicNorm pipeline to mitigate the non-stationarity issue, and a simple yet effective trainable-parameter-free peak-hour decoder with a hybrid loss function that utilizes both the original series and peak-hour series as supervised signals. Extensive experimentation on publicly available time series datasets demonstrates the effectiveness of the proposed framework, yielding a remarkable average relative improvement of 37.7\% across four real-world datasets for both transformer- and non-transformer-based TSF models.
    摘要 《峰值小时序列预测(PHSF)是许多领域中的关键 yet 未得到充分的研究。当前的深度学习模型在标准时间序列预测(TSF)中表现出色,但在 PHSF 中却很难达到相似的结果。这可以归因于峰值小时序列的高度非站ARY,使得直接预测变得更加困难于标准 TSF。另外,通过手动提取最大值从 regular 预测结果来获得优化性能的方法会带来较差的性能,因为模型会尝试最小化均方误差。为解决这些问题,本文提出了 Seq2Peak 框架,这是专门为 PHSF 任务设计的。Seq2Peak 包括两个关键组成部分: CyclicNorm 管道,用于mitigate 非站ARY问题,以及一个简单 yet 高效的可学习参数无 peak-hour 解码器,使用了 Hybrid 损失函数,该函数使用原始序列和峰值小时序列作为监督信号。经过对公共可用时间序列数据集的广泛实验,Seq2Peak 的效果得到了许多实验证明,其中平均相对提升率为 37.7%,在四个实际世界数据集上。
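To illustrate the supervision scheme, here is a small sketch of a hybrid loss that scores a forecast against both the original series and its per-period (e.g., daily) peak series. The period, weighting, and MSE choice are assumptions made for illustration; the CyclicNorm pipeline and the actual Seq2Peak decoder are not reproduced.

```python
import torch
import torch.nn.functional as F

def peak_hour_series(x, period=24):
    """Collapse a regular series (batch, T) into its per-period maxima,
    e.g. the daily peak of an hourly series."""
    b, t = x.shape
    return x[:, : t - t % period].reshape(b, -1, period).max(dim=-1).values

def hybrid_loss(pred_series, true_series, lam=0.5, period=24):
    """Hybrid objective supervised by both the original series and its
    peak-hour series; `lam` is an illustrative weighting."""
    base = F.mse_loss(pred_series, true_series)
    peak = F.mse_loss(peak_hour_series(pred_series, period),
                      peak_hour_series(true_series, period))
    return lam * base + (1 - lam) * peak

# toy usage: 4 series, 4 days of hourly values each
pred = torch.randn(4, 96, requires_grad=True)
true = torch.randn(4, 96)
hybrid_loss(pred, true).backward()
```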

Cross-Element Combinatorial Selection for Multi-Element Creative in Display Advertising

  • paper_url: http://arxiv.org/abs/2307.01593
  • repo_url: None
  • paper_authors: Wei Zhang, Ping Zhang, Jian Dong, Yongkang Wang, Pengye Zhang, Bo Zhang, Xingxing Wang, Dong Wang
  • for: 本研究旨在提高广告创作的效果,通过采用跨元素共同选择机制来选择多个创意元素的合适组合。
  • methods: 本研究提出了一种跨元素共同选择框架(CECS),包括编码器过程和解码器过程。编码器过程采用跨元素交互来动态调整单个创意元素的表达,而解码器过程将创意组合问题转化为多个创意元素之间的链式选择问题。
  • results: 实验结果表明,CECS取得了最佳成绩(SOTA)在线上数据集上的评价指标,并在实际应用中实现了显著的6.02% CTR和10.37% GMV提升,这对业务具有益处。
    Abstract The effectiveness of ad creatives is greatly influenced by their visual appearance. Advertising platforms can generate ad creatives with different appearances by combining creative elements provided by advertisers. However, with the increasing number of ad creative elements, it becomes challenging to select a suitable combination from the countless possibilities. The industry's mainstream approach is to select individual creative elements independently, which often overlooks the importance of interaction between creative elements during the modeling process. In response, this paper proposes a Cross-Element Combinatorial Selection framework for multiple creative elements, termed CECS. In the encoder process, a cross-element interaction is adopted to dynamically adjust the expression of a single creative element based on the current candidate creatives. In the decoder process, the creative combination problem is transformed into a cascade selection problem of multiple creative elements. A pointer mechanism with a cascade design is used to model the associations among candidates. Comprehensive experiments on real-world datasets show that CECS achieved the SOTA score on offline metrics. Moreover, the CECS algorithm has been deployed in our industrial application, resulting in a significant 6.02% CTR and 10.37% GMV lift, which is beneficial to the business.
    摘要 “广告创意的有效性受到它的视觉形象影响很大。广告平台可以通过结合广告主提供的创意元素,生成不同的创意形象。然而,随着创意元素的数量增加,选择适当的组合变得越来越困难。业界主流的方法是选择个别创意元素独立地,往往忽略了创意元素间的互动过程中的重要性。因此,这篇文章提出了跨元素选择框架(CECS)。在encode过程中,采用了跨元素互动来动态地调整单一创意元素的表达,以满足目前的候选者。在decode过程中,创意组合问题转化为多个创意元素之间的传递选择问题。使用一个链接机制,模型候选者之间的协力。实际测试统计表明,CECS已经 дости得了最佳成绩(SOTA)的数据。此外,CECS算法已经在我们的业务应用中实现了6.02%的Click Through Rate(CTR)和10.37%的Gross Merchandise Value(GMV)提升,对业务有很大的帮助。”

Learning Lie Group Symmetry Transformations with Neural Networks

  • paper_url: http://arxiv.org/abs/2307.01583
  • repo_url: https://github.com/victoria-klein/learning-lie-group-symmetries
  • paper_authors: Alex Gabel, Victoria Klein, Riccardo Valperga, Jeroen S. W. Lamb, Kevin Webster, Rick Quax, Efstratios Gavves
  • for: 检测和评估数据集中的对称性,用于模型选择、生成模型和数据分析等方面。
  • methods: 提出一种新方法,自动发现并刻画数据集中未知的李群对称变换,超越该领域通常考虑的旋转、缩放和平移。
  • results: 结果表明,该方法能够有效刻画作用于数据集的变换群,并估计各数据点对应参数值的分布。
    Abstract The problem of detecting and quantifying the presence of symmetries in datasets is useful for model selection, generative modeling, and data analysis, amongst others. While existing methods for hard-coding transformations in neural networks require prior knowledge of the symmetries of the task at hand, this work focuses on discovering and characterizing unknown symmetries present in the dataset, namely, Lie group symmetry transformations beyond the traditional ones usually considered in the field (rotation, scaling, and translation). Specifically, we consider a scenario in which a dataset has been transformed by a one-parameter subgroup of transformations with different parameter values for each data point. Our goal is to characterize the transformation group and the distribution of the parameter values. The results showcase the effectiveness of the approach in both these settings.
    摘要 检测并量化数据集中对称性的存在,对模型选择、生成建模和数据分析等都很有用。现有在神经网络中硬编码变换的方法需要事先知道任务的对称性,而本工作关注的是发现并刻画数据集中未知的对称性,即超越该领域通常考虑的旋转、缩放和平移的李群对称变换。具体而言,我们考虑这样一种情形:数据集中的每个数据点都被某个单参数变换子群作用,且各数据点的参数值不同。我们的目标是刻画该变换群以及参数值的分布。结果展示了该方法在这两种设定下的有效性。
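The data-generation setting can be illustrated with a tiny NumPy example: each point is acted on by a one-parameter subgroup exp(t·G) of SO(2), with a different parameter t per point. This only reproduces the scenario described above; the networks that recover the generator and the parameter distribution are not shown.

```python
import numpy as np
from scipy.linalg import expm

# Generator of 2D rotations; expm(t * G) is the rotation by angle t,
# i.e. a one-parameter subgroup of SO(2).
G = np.array([[0.0, -1.0],
              [1.0,  0.0]])

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 2))           # original data points
t = rng.uniform(0, np.pi, size=1000)     # a different parameter value per point

# Dataset as considered in the paper: each point transformed by exp(t_i * G).
x_transformed = np.stack([expm(ti * G) @ xi for ti, xi in zip(t, x)])
print(x_transformed.shape)               # (1000, 2)
```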

IAdet: Simplest human-in-the-loop object detection

  • paper_url: http://arxiv.org/abs/2307.01582
  • repo_url: https://github.com/franchesoni/iadet
  • paper_authors: Franco Marchesoni-Acland, Gabriele Facciolo
  • for: 提高单类物体检测模型的训练效率和质量,通过人工监督系统。
  • methods: propose an Intelligent Annotation (IA) strategy, including three modules: 辅助数据标注、后台模型训练和主动选择下一个数据点。开发了特定于单类物体检测的IAdet工具,并提出了自动评估这种人在回路系统的方法。
  • results: 在PASCAL VOC数据集上,IAdet工具可以减少数据库标注时间25%,并提供一个免费训练过的模型。这些结果是基于偏门设计的very simple IAdet design,因此IAdet具有多个简单的改进空间,预示了可以实现强大的人工监督对象检测系统。
    Abstract This work proposes a strategy for training models while annotating data named Intelligent Annotation (IA). IA involves three modules: (1) assisted data annotation, (2) background model training, and (3) active selection of the next datapoints. Under this framework, we open-source the IAdet tool, which is specific for single-class object detection. Additionally, we devise a method for automatically evaluating such a human-in-the-loop system. For the PASCAL VOC dataset, the IAdet tool reduces the database annotation time by $25\%$ while providing a trained model for free. These results are obtained for a deliberately very simple IAdet design. As a consequence, IAdet is susceptible to multiple easy improvements, paving the way for powerful human-in-the-loop object detection systems.
    摘要 这个工作提出了一种名为智能注释(IA)的模型训练策略。IA包括三个模块:(1)助手数据注释、(2)背景模型训练和(3)活动选择下一个数据点。在这个框架下,我们开源了专门用于单类对象检测的IADE工具。此外,我们还提出了一种自动评估这种人在循环系统的方法。对于PASCAL VOC数据集,IADE工具可以降低数据库注释时间$25\%$,同时提供免费的训练模型。这些结果是基于故意设计非常简单的IADE设计而得到的。因此,IADE易受到多个简单的改进,开展出具有强大人在循环对象检测系统的可能性。

Optimal and Efficient Binary Questioning for Human-in-the-Loop Annotation

  • paper_url: http://arxiv.org/abs/2307.01578
  • repo_url: None
  • paper_authors: Franco Marchesoni-Acland, Jean-Michel Morel, Josselin Kherroubi, Gabriele Facciolo
  • for: 本研究旨在解决人工监督学习中数据注释的缺失问题,即使用一个预测器可以获得更多的注释数据。
  • methods: 本研究将最优提问策略归结为对可能标注的哈夫曼编码,并提出了若干启发式方法和对代理成本函数进行前瞻最小化的方法。
  • results: 研究表明,使用提议的方法可以在几种 sintetic 和实际世界的数据集上实现23-86%的注释效率提升。
    Abstract Even though data annotation is extremely important for interpretability, research and development of artificial intelligence solutions, most research efforts such as active learning or few-shot learning focus on the sample efficiency problem. This paper studies the neglected complementary problem of getting annotated data given a predictor. For the simple binary classification setting, we present the spectrum ranging from optimal general solutions to practical efficient methods. The problem is framed as the full annotation of a binary classification dataset with the minimal number of yes/no questions when a predictor is available. For the case of general binary questions the solution is found in coding theory, where the optimal questioning strategy is given by the Huffman encoding of the possible labelings. However, this approach is computationally intractable even for small dataset sizes. We propose an alternative practical solution based on several heuristics and lookahead minimization of proxy cost functions. The proposed solution is analysed, compared with optimal solutions and evaluated on several synthetic and real-world datasets. On these datasets, the method allows a significant improvement ($23-86\%$) in annotation efficiency.
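A toy sketch of the coding-theoretic part: with an (assumed) independent predictor over a tiny dataset, the optimal general yes/no questioning strategy corresponds to Huffman coding of the possible labelings, and its expected number of questions equals the expected code length. The probabilities are made up, and real datasets make this enumeration intractable, which is why the paper resorts to heuristics.

```python
import heapq
from itertools import product

# Hypothetical predictor confidences for a tiny 4-item dataset.
p = [0.9, 0.8, 0.6, 0.95]          # P(label_i = 1) according to the predictor

# Probability of each complete labeling, assuming independence across items.
probs = []
for bits in product([0, 1], repeat=len(p)):
    q = 1.0
    for b, pi in zip(bits, p):
        q *= pi if b == 1 else 1.0 - pi
    probs.append(q)

# Huffman construction: every merge of two subtrees with mass q1 + q2 adds
# (q1 + q2) to the expected codeword length, which here equals the expected
# number of general yes/no questions of the optimal strategy.
heap = list(probs)
heapq.heapify(heap)
expected_questions = 0.0
while len(heap) > 1:
    q1, q2 = heapq.heappop(heap), heapq.heappop(heap)
    expected_questions += q1 + q2
    heapq.heappush(heap, q1 + q2)

print(f"expected number of questions: {expected_questions:.3f} (vs. {len(p)} naive)")
```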

Multi-Task Learning to Enhance Generalizability of Neural Network Equalizers in Coherent Optical Systems

  • paper_url: http://arxiv.org/abs/2307.05374
  • repo_url: None
  • paper_authors: Sasipim Srivallapanondh, Pedro J. Freire, Ashraful Alam, Nelson Costa, Bernhard Spinnler, Antonio Napoli, Egor Sedov, Sergei K. Turitsyn, Jaroslaw E. Prilepsky
  • for: 提高相干光通信系统中均衡器的灵活性
  • methods: 使用多任务学习方法改进基于神经网络的均衡器
  • results: 单个NN基于平衡器可以提高Q因子至4dB,不需要重新训练,即使发射功率、符号速率或传输距离发生变化。
    Abstract For the first time, multi-task learning is proposed to improve the flexibility of NN-based equalizers in coherent systems. A "single" NN-based equalizer improves Q-factor by up to 4 dB compared to CDC, without re-training, even with variations in launch power, symbol rate, or transmission distance.

Approximate information for efficient exploration-exploitation strategies

  • paper_url: http://arxiv.org/abs/2307.01563
  • repo_url: None
  • paper_authors: Alex Barbier-Chebbah, Christian L. Vestergaard, Jean-Baptiste Masson
  • for: 这篇论文目标是解决决策中的探索-利用矛盾,具体是多重枪支问题。
  • methods: 这篇论文提出了一种新的算法,即approximate information maximization(AIM),该算法使用分析式 entropy 导数来选择每个时刻哪个枪支。AIM 与 Infomax 和 Thompson sampling 性能相同,同时具有加速、决定性和可追踪性等优点。
  • results: 实验证明 AIM 遵循 Lai-Robbins asymptotic bound,并在不同的假设下表现稳定。其表达可调,可以根据具体情况进行特定优化。
    Abstract This paper addresses the exploration-exploitation dilemma inherent in decision-making, focusing on multi-armed bandit problems. The problems involve an agent deciding whether to exploit current knowledge for immediate gains or explore new avenues for potential long-term rewards. We here introduce a novel algorithm, approximate information maximization (AIM), which employs an analytical approximation of the entropy gradient to choose which arm to pull at each point in time. AIM matches the performance of Infomax and Thompson sampling while also offering enhanced computational speed, determinism, and tractability. Empirical evaluation of AIM indicates its compliance with the Lai-Robbins asymptotic bound and demonstrates its robustness for a range of priors. Its expression is tunable, which allows for specific optimization in various settings.

A Comprehensive Multi-scale Approach for Speech and Dynamics Synchrony in Talking Head Generation

  • paper_url: http://arxiv.org/abs/2307.03270
  • repo_url: https://github.com/louisbearing/hmo-audio
  • paper_authors: Louis Airale, Dominique Vaufreydaz, Xavier Alameda-Pineda
  • for: 这个论文主要针对的是使用深度生成模型来动画非动体图像,以实现更加自然的头部动作和语音同步。
  • methods: 该论文提出了一种多尺度音视频同步损失函数和多尺度自适应GAN,以更好地处理语音和头部动作之间的短期和长期相关性。
  • results: 实验表明,该方法可以在多尺度音视频同步和头部动作质量上达到州前的提升,并且在标准的面部特征域中生成更加自然的头部动作。
    Abstract Animating still face images with deep generative models using a speech input signal is an active research topic and has seen important recent progress. However, much of the effort has been put into lip syncing and rendering quality while the generation of natural head motion, let alone the audio-visual correlation between head motion and speech, has often been neglected. In this work, we propose a multi-scale audio-visual synchrony loss and a multi-scale autoregressive GAN to better handle short and long-term correlation between speech and the dynamics of the head and lips. In particular, we train a stack of syncer models on multimodal input pyramids and use these models as guidance in a multi-scale generator network to produce audio-aligned motion unfolding over diverse time scales. Our generator operates in the facial landmark domain, which is a standard low-dimensional head representation. The experiments show significant improvements over the state of the art in head motion dynamics quality and in multi-scale audio-visual synchrony both in the landmark domain and in the image domain.
    摘要 使用深度生成模型动画静止图像是一个活跃的研究领域,最近几年得到了重要的进步。然而,许多努力都是 lip syncing 和图像质量的优化,而生成自然的头部运动和语音-图像相关性往往被忽略。在这项工作中,我们提议一种多尺度音视频同步损失和多尺度自适应GAN,以更好地处理语音和头部运动之间的短期和长期相关性。特别是,我们在多modal输入 pyramids 上堆叠 syncer 模型,并使用这些模型作为导向在多尺度生成网络中生成音频同步的动作。我们的生成器在 facial landmark 领域中运行,这是一个标准的低维度头部表示。实验结果表明,我们的方法可以在头部运动动态质量和多尺度音视频同步两个方面达到显著提高。

Secure Deep Learning-based Distributed Intelligence on Pocket-sized Drones

  • paper_url: http://arxiv.org/abs/2307.01559
  • repo_url: None
  • paper_authors: Elia Cereda, Alessandro Giusti, Daniele Palossi
  • for: 这个研究旨在解决单位大小仅对应小型飞行器(nano-drone)上进行大型深度学习模型的问题。
  • methods: 本研究提出了一种分布式边缘-fog计算模型,以实现在nano-drone上进行大型深度学习模型的执行。此外,本研究还提出了一种验证fog计算的方法,以确保fog节点或通信链路不可信。
  • results: 相比于完全在nano-drone上执行的现有Visual Pose Estimation网络,这个分布式边缘-fog执行方案可以提高$R^2$ score +0.19。在攻击情况下,本方法可以在2秒内检测攻击,95%的概率可以检测到。
    Abstract Palm-sized nano-drones are an appealing class of edge nodes, but their limited computational resources prevent running large deep-learning models onboard. Adopting an edge-fog computational paradigm, we can offload part of the computation to the fog; however, this poses security concerns if the fog node, or the communication link, can not be trusted. To tackle this concern, we propose a novel distributed edge-fog execution scheme that validates fog computation by redundantly executing a random subnetwork aboard our nano-drone. Compared to a State-of-the-Art visual pose estimation network that entirely runs onboard, a larger network executed in a distributed way improves the $R^2$ score by +0.19; in case of attack, our approach detects it within 2s with 95% probability.
    摘要 手持式奈米型机器人的 Computational Resources 有限,无法进行大型深度学习模型的 Calculation。我们运用 Edge-Fog 计算模式,将一部分计算推广到fog中,但这会带来安全性 Concern ,如果fog Node 或通信链路不能被信任。为解决这问题,我们提出了一个分布式 Edge-Fog 执行方案,透过重复运行 Random Subnetwork 在我们的奈米型机器人上,以验证fog计算。相比于完全在board上运行的 State-of-the-Art 视觉 pose 估测网络,分布式执行的大型网络可以提高 $R^2$ 分数 +0.19; 在攻击情况下,我们的方法可以在2秒内检测到攻击,95%的机会性。

Multi-gauge Hydrological Variational Data Assimilation: Regionalization Learning with Spatial Gradients using Multilayer Perceptron and Bayesian-Guided Multivariate Regression

  • paper_url: http://arxiv.org/abs/2307.02497
  • repo_url: None
  • paper_authors: Ngo Nghi Truyen Huynh, Pierre-André Garambois, François Colleoni, Benjamin Renard, Hélène Roux
  • for: 这篇论文旨在解决水文模型中难以估计的空间分布型水文参数问题,特别是无测水道上的洪水。
  • methods: 本研究使用了一种新的区域化技术,将复杂的区域转换函数融合到高分辨率水文模型中,以便使用机器学习优化算法进行学习。
  • results: 本研究获得了一种可靠地估计水文模型中的空间分布型参数,并且可以处理多测站数据,实现了高精度的水文预测。
    Abstract Tackling the difficult problem of estimating spatially distributed hydrological parameters, especially for floods on ungauged watercourses, this contribution presents a novel seamless regionalization technique for learning complex regional transfer functions designed for high-resolution hydrological models. The transfer functions rely on: (i) a multilayer perceptron enabling a seamless flow of gradient computation to employ machine learning optimization algorithms, or (ii) a multivariate regression mapping optimized by variational data assimilation algorithms and guided by Bayesian estimation, addressing the equifinality issue of feasible solutions. The approach involves incorporating the inferable regionalization mappings into a differentiable hydrological model and optimizing a cost function computed on multi-gauge data with accurate adjoint-based spatially distributed gradients.

Scalable variable selection for two-view learning tasks with projection operators

  • paper_url: http://arxiv.org/abs/2307.01558
  • repo_url: https://github.com/aalto-ics-kepaco/projse
  • paper_authors: Sandor Szedmak, Riikka Huusari, Tat Hong Duong Le, Juho Rousu
  • for: 该论文提出了一种新的变量选择方法,适用于两视图设置或vector-valued超vision学习问题。该方法可以处理巨大规模的选择任务,数据样本数可以达到百万级。
  • methods: 该方法通过Iteratively选择高度相关于输出变量的变量,但不相关于先前选择的变量。为度量相关性,该方法使用投影算子和其代数。通过投影算子,输入和输出变量之间的关系可以表示为kernel函数,从而可以利用非线性相关模型。
  • results: 该方法在synthetic和实际数据上进行了实验 validate,显示了其扩展性和选择的有效性。
    Abstract In this paper we propose a novel variable selection method for two-view settings, or for vector-valued supervised learning problems. Our framework is able to handle extremely large scale selection tasks, where number of data samples could be even millions. In a nutshell, our method performs variable selection by iteratively selecting variables that are highly correlated with the output variables, but which are not correlated with the previously chosen variables. To measure the correlation, our method uses the concept of projection operators and their algebra. With the projection operators the relationship, correlation, between sets of input and output variables can also be expressed by kernel functions, thus nonlinear correlation models can be exploited as well. We experimentally validate our approach, showing on both synthetic and real data its scalability and the relevance of the selected features. Keywords: Supervised variable selection, vector-valued learning, projection-valued measure, reproducing kernel Hilbert space
    摘要 在这篇论文中,我们提出了一种新的变量选择方法,适用于两视设定或vector-valued学习问题。我们的框架可以处理非常大规模的选择任务,数据样本数可以达到百万级。总之,我们的方法通过逐步选择输出变量高度相关的变量,但不相关于已经选择的变量来进行变量选择。为了度量相关性,我们使用投影算子和其代数来度量输入和输出变量之间的关系。通过投影算子,我们可以将输入和输出变量之间的关系表示为内积函数,从而可以利用内积函数来表示非线性相关模型。我们在实验中 validate our approach,并在 synthetic 和实际数据上证明了我们的方法的扩展性和选择的相关性。关键词:supervised变量选择、vector-valued学习、投影值度量、 reproduce kernel Hilbert space
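The selection principle (pick variables highly correlated with the outputs but uncorrelated with previously chosen ones, measured through projections) can be sketched in the linear, identity-kernel case as follows. The function name, deflation scheme, and toy data are illustrative assumptions rather than the paper's algorithm, which works with projection operators in a reproducing kernel Hilbert space.

```python
import numpy as np

def greedy_projection_selection(X, Y, k):
    """Repeatedly pick the input variable most correlated with the outputs,
    then project that variable out of the remaining candidates so later
    picks are decorrelated from earlier ones. Linear case only."""
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    Xr, Yr = X.copy(), Y.copy()
    selected = []
    for _ in range(k):
        # correlation of each remaining (residualized) variable with the outputs
        norms = np.linalg.norm(Xr, axis=0) + 1e-12
        scores = np.linalg.norm(Xr.T @ Yr, axis=1) / norms
        scores[selected] = -np.inf
        j = int(np.argmax(scores))
        selected.append(j)
        # project the chosen direction out of the remaining variables and outputs
        v = Xr[:, [j]] / norms[j]
        Xr = Xr - v @ (v.T @ Xr)
        Yr = Yr - v @ (v.T @ Yr)
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
Y = X[:, [3, 17]] @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(200, 4))
print(greedy_projection_selection(X, Y, k=2))   # likely recovers columns 3 and 17
```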

Learning to reconstruct the bubble distribution with conductivity maps using Invertible Neural Networks and Error Diffusion

  • paper_url: http://arxiv.org/abs/2307.02496
  • repo_url: None
  • paper_authors: Nishant Kumar, Lukas Krause, Thomas Wondrak, Sven Eckert, Kerstin Eckert, Stefan Gumhold
  • for: 用于实现可持续的氢生产
  • methods: 使用外部磁场探测器和归一化方法测量磁场干扰,并使用INN重建电导率场
  • results: 比使用提高方法(Tikhonov regularization)表现更好,可以高精度地重建电导率场
    Abstract Electrolysis is crucial for eco-friendly hydrogen production, but gas bubbles generated during the process hinder reactions, reduce cell efficiency, and increase energy consumption. Additionally, these gas bubbles cause changes in the conductivity inside the cell, resulting in corresponding variations in the induced magnetic field around the cell. Therefore, measuring these gas bubble-induced magnetic field fluctuations using external magnetic sensors and solving the inverse problem of Biot-Savart Law allows for estimating the conductivity in the cell and, thus, bubble size and location. However, determining high-resolution conductivity maps from only a few induced magnetic field measurements is an ill-posed inverse problem. To overcome this, we exploit Invertible Neural Networks (INNs) to reconstruct the conductivity field. Our qualitative results and quantitative evaluation using random error diffusion show that INN achieves far superior performance compared to Tikhonov regularization.
    摘要 电解是环保制氢的关键技术,但电解过程中产生的气泡会阻碍反应、降低电解槽效率并增加能耗。此外,这些气泡会引起电解槽内电导率的变化,进而导致槽体周围感应磁场的相应变化。因此,利用外部磁场传感器测量气泡引起的磁场波动,并求解Biot-Savart定律的逆问题,即可估计槽内电导率,从而推断气泡的大小和位置。然而,仅凭少量磁场测量来确定高分辨率电导率图是一个不适定的逆问题。为此,我们利用可逆神经网络(INN)重建电导率场。定性结果以及基于随机误差扩散的定量评估表明,INN的表现远优于Tikhonov正则化。

Exploiting Richness of Learned Compressed Representation of Images for Semantic Segmentation

  • paper_url: http://arxiv.org/abs/2307.01524
  • repo_url: https://github.com/DL4Compression/Semantic_Segmentation_of_Driving_Videos_on_Learning_based_Image_Compression
  • paper_authors: Ravi Kakaiya, Rakshith Sathish, Ramanathan Sethuraman, Debdoot Sheet
  • for: 提高自动驾驶和高级驾驶助手系统(ADAS)的性能和可扩展性。
  • methods: 使用学习基于的压缩编码器来减少传输数据的延迟,并且通过学习的方式使得压缩编码器可以同时执行压缩和解压缩操作。
  • results: 在Cityscapes dataset上实验 validate the proposed pipeline,实现了压缩因子达66倍,保留了segmenation任务所需的信息,而且降低了总计算量11%。
    Abstract Autonomous vehicles and Advanced Driving Assistance Systems (ADAS) have the potential to radically change the way we travel. Many such vehicles currently rely on segmentation and object detection algorithms to detect and track objects around its surrounding. The data collected from the vehicles are often sent to cloud servers to facilitate continual/life-long learning of these algorithms. Considering the bandwidth constraints, the data is compressed before sending it to servers, where it is typically decompressed for training and analysis. In this work, we propose the use of a learning-based compression Codec to reduce the overhead in latency incurred for the decompression operation in the standard pipeline. We demonstrate that the learned compressed representation can also be used to perform tasks like semantic segmentation in addition to decompression to obtain the images. We experimentally validate the proposed pipeline on the Cityscapes dataset, where we achieve a compression factor up to $66 \times$ while preserving the information required to perform segmentation with a dice coefficient of $0.84$ as compared to $0.88$ achieved using decompressed images while reducing the overall compute by $11\%$.
    摘要 自动驾驶车和高级驾驶助手系统(ADAS)有可能改变我们的旅行方式。许多这些车辆目前都使用分割和对象检测算法来检测和跟踪周围的对象。收集到的数据通常会被发送到云服务器以便持续/人生学习这些算法。由于带宽约束,数据通常会被压缩后发送到服务器,其中它们通常会被解压缩以进行训练和分析。在这种情况下,我们提议使用学习基于压缩编码器来减少标准管道中的延迟过载。我们示出了learned压缩表示可以用于实现像semantic segmentation这样的任务,而不需要解压缩。我们对Cityscapes数据集进行实验,并实现了最多$66\times$的压缩因子,保留了需要进行分割的信息,并且将compute总体减少$11\%$。

Deep Attention Q-Network for Personalized Treatment Recommendation

  • paper_url: http://arxiv.org/abs/2307.01519
  • repo_url: https://github.com/stevenmsm/rl-icu-daqn
  • paper_authors: Simin Ma, Junghwan Lee, Nicoleta Serban, Shihao Yang
  • for: 这篇论文旨在提供个性化治疗建议,以实现医疗结果最佳化。
  • methods: 本研究使用深度注意力Q网络(DAQN),利用对应架构内的强化学习框架,高效地包含所有过去病人观察数据。
  • results: 比较先前的模型,本研究的DAQN模型在实际世界的 septic shock 和急性低血压患者群中表现出色,显示其超越性。
    Abstract Tailoring treatment for individual patients is crucial yet challenging in order to achieve optimal healthcare outcomes. Recent advances in reinforcement learning offer promising personalized treatment recommendations; however, they rely solely on current patient observations (vital signs, demographics) as the patient's state, which may not accurately represent the true health status of the patient. This limitation hampers policy learning and evaluation, ultimately limiting treatment effectiveness. In this study, we propose the Deep Attention Q-Network for personalized treatment recommendations, utilizing the Transformer architecture within a deep reinforcement learning framework to efficiently incorporate all past patient observations. We evaluated the model on real-world sepsis and acute hypotension cohorts, demonstrating its superiority to state-of-the-art models. The source code for our model is available at https://github.com/stevenmsm/RL-ICU-DAQN.
    摘要 个人化治疗是现代医疗的关键,但是实现优化医疗效果却是挑战。 latest advances in reinforcement learning 提供了个人化治疗建议的可能性,但是它们只基于当前患者的观察数据(生命 Parameters, demographics)来定义患者的状态,这可能不准确地反映患者的真实健康状况。这种限制策略学习和评估,最终限制了治疗效果。在这项研究中,我们提出了 Deep Attention Q-Network,使用 transformer 架构在深度强化学习框架中高效地包含所有过去患者的观察数据。我们对现实世界的 septic shock 和急性低血压群体进行了评估,并证明了我们的模型在现有模型之上表现出色。我们的模型的源代码可以在 https://github.com/stevenmsm/RL-ICU-DAQN 上获取。

SelfFed: Self-supervised Federated Learning for Data Heterogeneity and Label Scarcity in IoMT

  • paper_url: http://arxiv.org/abs/2307.01514
  • repo_url: None
  • paper_authors: Sunder Ali Khowaja, Kapal Dev, Syed Muhammad Anwar, Marius George Linguraru
  • for: 这个研究旨在提出一个基于自适应学习的联邦学习框架,以实现在对没有标签的隔离数据上进行协同学习。
  • methods: 我们提出了一个名为SelfFed的框架,它包括两个阶段:首先是预训阶段,使用Swin Transformer基本Encoder进行增强模型,在分散式的方式下进行执行。其次是精度调整阶段,引入对照网络和一个新的聚合策略,在分散式的方式下进行训练,以解决标签稀缺问题。
  • results: 我们在公共可用的医疗图像数据集上进行实验分析,结果显示,我们的提出的SelfFed框架在非Identical和相似数据(IID) dataset上比基于已有的基eline出perform得更好,具体的提高8.8%和4.1%在Retina和COVID-FL数据集上。此外,我们的方法甚至在仅使用10%标签的情况下也能超越现有的基eline。
    Abstract Self-supervised learning in federated learning paradigm has been gaining a lot of interest both in industry and research due to the collaborative learning capability on unlabeled yet isolated data. However, self-supervised based federated learning strategies suffer from performance degradation due to label scarcity and diverse data distributions, i.e., data heterogeneity. In this paper, we propose the SelfFed framework for Internet of Medical Things (IoMT). Our proposed SelfFed framework works in two phases. The first phase is the pre-training paradigm that performs augmentive modeling using Swin Transformer based encoder in a decentralized manner. The first phase of SelfFed framework helps to overcome the data heterogeneity issue. The second phase is the fine-tuning paradigm that introduces contrastive network and a novel aggregation strategy that is trained on limited labeled data for a target task in a decentralized manner. This fine-tuning stage overcomes the label scarcity problem. We perform our experimental analysis on publicly available medical imaging datasets and show that our proposed SelfFed framework performs better when compared to existing baselines concerning non-independent and identically distributed (IID) data and label scarcity. Our method achieves a maximum improvement of 8.8% and 4.1% on Retina and COVID-FL datasets on non-IID dataset. Further, our proposed method outperforms existing baselines even when trained on a few (10%) labeled instances.
    摘要 “自我指导学习在联合学习框架中得到了产业和研究领域的广泛关注,因为它可以在不同数据源上进行协同学习,无需标签数据。然而,基于自我指导学习的联合学习策略受到数据不均衡和标签稀缺的限制,即数据多样性问题。在本文中,我们提出了基于互联网医疗器件(IoMT)的SelfFed框架。我们的提议的SelfFed框架分为两个阶段。第一阶段是预训练阶段,使用Swin Transformer基于编码器进行增强模型,在分布式方式下进行。第一阶段的SelfFed框架帮助解决数据多样性问题。第二阶段是精度调整阶段,引入对照网络和一种新的聚合策略,在分布式方式下进行限制标签数据的训练。这个精度调整阶段帮助解决标签稀缺问题。我们在公开available的医学成像数据集上进行实验分析,并证明我们的提议的SelfFed框架在非Identical和相似数据(IID)下性能更好,提高了8.8%和4.1%的提升。此外,我们的提议方法还能在只有10%标签实例的情况下超越现有的基准值。”

Relation-aware graph structure embedding with co-contrastive learning for drug-drug interaction prediction

  • paper_url: http://arxiv.org/abs/2307.01507
  • repo_url: None
  • paper_authors: Mengying Jiang, Guizhong Liu, Biao Zhao, Yuanchao Su, Weiqiang Jin
  • for: 预测多种关系 drug-drug interaction (DDIs)
  • methods: 使用 relation-aware graph structure embedding (RaGSE) with co-contrastive learning
  • results: 在三个任务上比前state-of-the-art方法表现出色,得到更好的预测结果
    Abstract Relation-aware graph structure embedding is promising for predicting multi-relational drug-drug interactions (DDIs). Typically, most existing methods begin by constructing a multi-relational DDI graph and then learning relation-aware graph structure embeddings (RaGSEs) of drugs from the DDI graph. Nevertheless, most existing approaches are usually limited in learning RaGSEs of new drugs, leading to serious over-fitting when the test DDIs involve such drugs. To alleviate this issue, we propose a novel DDI prediction method based on relation-aware graph structure embedding with co-contrastive learning, RaGSECo. The proposed RaGSECo constructs two heterogeneous drug graphs: a multi-relational DDI graph and a multi-attribute drug-drug similarity (DDS) graph. The two graphs are used respectively for learning and propagating the RaGSEs of drugs, aiming to ensure all drugs, including new ones, can possess effective RaGSEs. Additionally, we present a novel co-contrastive learning module to learn drug-pairs (DPs) representations. This mechanism learns DP representations from two distinct views (interaction and similarity views) and encourages these views to supervise each other collaboratively to obtain more discriminative DP representations. We evaluate the effectiveness of our RaGSECo on three different tasks using two real datasets. The experimental results demonstrate that RaGSECo outperforms existing state-of-the-art prediction methods.

All in One: Multi-task Prompting for Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2307.01504
  • repo_url: https://github.com/sheldonresearch/ProG
  • paper_authors: Xiangguo Sun, Hong Cheng, Jia Li, Bo Liu, Jihong Guan
  • for: Bridging the gap between pre-trained graph models and diverse downstream graph tasks.
  • methods: A multi-task prompting method for graph models that unifies graph prompts with language prompts, reformulates downstream problems as graph-level tasks to match pre-training strategies, and uses meta-learning to learn a better prompt initialization.
  • results: Extensive experiments demonstrate superior performance across different graph tasks.
    Abstract Recently, ''pre-training and fine-tuning'' has been adopted as a standard workflow for many graph tasks since it can take general graph knowledge to relieve the lack of graph annotations from each application. However, graph tasks with node level, edge level, and graph level are far diversified, making the pre-training pretext often incompatible with these multiple tasks. This gap may even cause a ''negative transfer'' to the specific application, leading to poor results. Inspired by the prompt learning in natural language processing (NLP), which has presented significant effectiveness in leveraging prior knowledge for various NLP tasks, we study the prompting topic for graphs with the motivation of filling the gap between pre-trained models and various graph tasks. In this paper, we propose a novel multi-task prompting method for graph models. Specifically, we first unify the format of graph prompts and language prompts with the prompt token, token structure, and inserting pattern. In this way, the prompting idea from NLP can be seamlessly introduced to the graph area. Then, to further narrow the gap between various graph tasks and state-of-the-art pre-training strategies, we further study the task space of various graph applications and reformulate downstream problems to the graph-level task. Afterward, we introduce meta-learning to efficiently learn a better initialization for the multi-task prompt of graphs so that our prompting framework can be more reliable and general for different tasks. We conduct extensive experiments, results from which demonstrate the superiority of our method.

Accelerated stochastic approximation with state-dependent noise

  • paper_url: http://arxiv.org/abs/2307.01497
  • repo_url: None
  • paper_authors: Sasila Ilandarideva, Anatoli Juditsky, Guanghui Lan, Tianjiao Li
  • for: solves a class of stochastic smooth convex optimization problems with general noise assumptions.
  • methods: uses two non-Euclidean accelerated stochastic approximation routines: stochastic accelerated gradient descent (SAGD) and stochastic gradient extrapolation (SGE).
  • results: achieves the optimal convergence rate and attains the optimal iteration and sample complexities simultaneously, with more general assumptions for SGE that allow for efficient application to statistical estimation problems under heavy tail noises and discontinuous score functions.
    Abstract We consider a class of stochastic smooth convex optimization problems under rather general assumptions on the noise in the stochastic gradient observation. As opposed to the classical problem setting in which the variance of noise is assumed to be uniformly bounded, herein we assume that the variance of stochastic gradients is related to the "sub-optimality" of the approximate solutions delivered by the algorithm. Such problems naturally arise in a variety of applications, in particular, in the well-known generalized linear regression problem in statistics. However, to the best of our knowledge, none of the existing stochastic approximation algorithms for solving this class of problems attain optimality in terms of the dependence on accuracy, problem parameters, and mini-batch size. We discuss two non-Euclidean accelerated stochastic approximation routines--stochastic accelerated gradient descent (SAGD) and stochastic gradient extrapolation (SGE)--which carry a particular duality relationship. We show that both SAGD and SGE, under appropriate conditions, achieve the optimal convergence rate, attaining the optimal iteration and sample complexities simultaneously. However, corresponding assumptions for the SGE algorithm are more general; they allow, for instance, for efficient application of the SGE to statistical estimation problems under heavy tail noises and discontinuous score functions. We also discuss the application of the SGE to problems satisfying quadratic growth conditions, and show how it can be used to recover sparse solutions. Finally, we report on some simulation experiments to illustrate numerical performance of our proposed algorithms in high-dimensional settings.

Review of Deep Learning-based Malware Detection for Android and Windows System

  • paper_url: http://arxiv.org/abs/2307.01494
  • repo_url: None
  • paper_authors: Nazmul Islam, Seokjoo Shin
  • for: Detecting and differentiating malware to assess their behavior and threat level and to devise defensive strategies.
  • methods: AI-enabled anti-malware techniques that are robust to various hiding and obfuscation techniques.
  • results: The reviewed AI-enabled techniques achieve perfect (100%) detection accuracy across various malware families.
    Abstract Differentiating malware is important for determining their behaviors and threat levels, as well as for devising defensive strategies against them. In response, various anti-malware systems have been developed to distinguish between different malware. However, most of the recent malware families are Artificial Intelligence (AI)-enabled and can deceive traditional anti-malware systems using different obfuscation techniques. Therefore, only an AI-enabled anti-malware system is robust against these techniques and can detect different features in the malware files that aid in malicious activities. In this study, we review two AI-enabled techniques for detecting malware in the Windows and Android operating systems, respectively. Both techniques achieved perfect accuracy in detecting various malware families.

FREEDOM: Target Label & Source Data & Domain Information-Free Multi-Source Domain Adaptation for Unsupervised Personalization

  • paper_url: http://arxiv.org/abs/2307.02493
  • repo_url: None
  • paper_authors: Eunju Yang, Gyusang Cho, Chan-Hyun Youn
  • for: Addressing multi-source domain adaptation (MSDA) when target labels, source data, and source domain information are all unavailable.
  • methods: Formulates the Three-Free Domain Adaptation (TFDA) problem setting, in which 1) target labels, 2) source datasets, and 3) source domain information (domain labels and the number of domains) are unavailable, and proposes FREEDOM, a practical adaptation framework that uses a generative model to disentangle data into class and style aspects, with style modeled by a nonparametric Bayesian approach. During adaptation, FREEDOM matches the source class distribution to the target's and then deploys only part of the classification model as a personalized network.
  • results: Achieves state-of-the-art or comparable performance without domain information, with a reduced final model size on the target side that is independent of the number of source domains.
    Abstract From a service perspective, Multi-Source Domain Adaptation (MSDA) is a promising scenario to adapt a deployed model to a client's dataset. It can provide adaptation without a target label and support the case where a source dataset is constructed from multiple domains. However, it is impractical, wherein its training heavily relies on prior domain information of the multi-source dataset -- how many domains exist and the domain label of each data sample. Moreover, MSDA requires both source and target datasets simultaneously (physically), causing storage limitations on the client device or data privacy issues by transferring client data to a server. For a more practical scenario of model adaptation from a service provider's point of view, we relax these constraints and present a novel problem scenario of Three-Free Domain Adaptation, namely TFDA, where 1) target labels, 2) source dataset, and mostly 3) source domain information (domain labels + the number of domains) are unavailable. Under the problem scenario, we propose a practical adaptation framework called FREEDOM. It leverages the power of the generative model, disentangling data into class and style aspects, where the style is defined as the class-independent information from the source data and designed with a nonparametric Bayesian approach. In the adaptation stage, FREEDOM aims to match the source class distribution with the target's under the philosophy that class distribution is consistent even if the style is different; after then, only part of the classification model is deployed as a personalized network. As a result, FREEDOM achieves state-of-the-art or comparable performance even without domain information, with reduced final model size on the target side, independent of the number of source domains.

Nexus sine qua non: Essentially Connected Networks for Traffic Forecasting

  • paper_url: http://arxiv.org/abs/2307.01482
  • repo_url: None
  • paper_authors: Tong Nie, Guoyang Qin, Lijun Sun, Yunpeng Wang, Jian Sun
  • for: Developing simple yet efficient neural models for learning representations of and forecasting traffic data.
  • methods: Observes that certain forms of spatiotemporal contextualization lie at the core of STGNN representations, and designs Nexus sine qua non (NexuSQN), an essentially connected network built on an efficient message-passing backbone with learnable "where" and "when" locators, omitting intricate components such as RNNs, Transformers, and diffusion convolutions.
  • results: NexuSQN outperforms intricately designed baselines in size, computational efficiency, and accuracy, suggesting a promising future for simple yet efficient neural predictors.
    Abstract Spatiotemporal graph neural networks (STGNNs) have emerged as a leading approach for learning representations and forecasting on traffic datasets with underlying topological and correlational structures. However, current STGNNs use intricate techniques with high complexities to capture these structures, making them difficult to understand and scale. The existence of simple yet efficient architectures remains an open question. Upon closer examination, we find what lies at the core of STGNN's representations are certain forms of spatiotemporal contextualization. In light of this, we design Nexus sine qua non (NexuSQN), an essentially connected network built on an efficient message-passing backbone. NexuSQN simply uses learnable "where" and "when" locators for the aforementioned contextualization and omits any intricate components such as RNNs, Transformers, and diffusion convolutions. Results show that NexuSQN outperforms intricately designed benchmarks in terms of size, computational efficiency, and accuracy. This suggests a promising future for developing simple yet efficient neural predictors.

Beyond Conservatism: Diffusion Policies in Offline Multi-agent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.01472
  • repo_url: None
  • paper_authors: Zhuoran Li, Ling Pan, Longbo Huang
  • for: A novel Diffusion Offline Multi-agent Model (DOM2) for offline multi-agent reinforcement learning (MARL).
  • methods: Integrates a diffusion model into the policy network and proposes a trajectory-based data-augmentation scheme, making the algorithm more robust to environment changes and improving performance, generalization, and data efficiency.
  • results: Outperforms state-of-the-art methods in multi-agent particle and multi-agent MuJoCo environments, generalizes better in shifted environments, and reaches state-of-the-art performance with 20+ times less data.
    Abstract We present a novel Diffusion Offline Multi-agent Model (DOM2) for offline Multi-Agent Reinforcement Learning (MARL). Different from existing algorithms that rely mainly on conservatism in policy design, DOM2 enhances policy expressiveness and diversity based on diffusion. Specifically, we incorporate a diffusion model into the policy network and propose a trajectory-based data-augmentation scheme in training. These key ingredients make our algorithm more robust to environment changes and achieve significant improvements in performance, generalization and data-efficiency. Our extensive experimental results demonstrate that DOM2 outperforms existing state-of-the-art methods in multi-agent particle and multi-agent MuJoCo environments, and generalizes significantly better in shifted environments thanks to its high expressiveness and diversity. Furthermore, DOM2 shows superior data efficiency and can achieve state-of-the-art performance with $20+$ times less data compared to existing algorithms.

A Review of Driver Gaze Estimation and Application in Gaze Behavior Understanding

  • paper_url: http://arxiv.org/abs/2307.01470
  • repo_url: None
  • paper_authors: Pavan Kumar Sharma, Pranamesh Chakraborty
  • for: A comprehensive summary of driver gaze fundamentals, methods for estimating driver gaze, and their applications in real-world driving scenarios.
  • methods: Reviews head-mounted and remote-setup gaze estimation and the associated terminology, lists existing benchmark driver gaze datasets with their collection methodology and equipment, and surveys gaze estimation algorithms based on traditional machine learning and deep learning.
  • results: Estimated driver gaze is used to understand gaze behavior at intersections, on-ramps, off-ramps, and during lane changes, and to assess the effect of roadside advertising structures; limitations, challenges, and future directions are also discussed.
    Abstract Driver gaze plays an important role in different gaze-based applications such as driver attentiveness detection, visual distraction detection, gaze behavior understanding, and building driver assistance system. The main objective of this study is to perform a comprehensive summary of driver gaze fundamentals, methods to estimate driver gaze, and it's applications in real world driving scenarios. We first discuss the fundamentals related to driver gaze, involving head-mounted and remote setup based gaze estimation and the terminologies used for each of these data collection methods. Next, we list out the existing benchmark driver gaze datasets, highlighting the collection methodology and the equipment used for such data collection. This is followed by a discussion of the algorithms used for driver gaze estimation, which primarily involves traditional machine learning and deep learning based techniques. The estimated driver gaze is then used for understanding gaze behavior while maneuvering through intersections, on-ramps, off-ramps, lane changing, and determining the effect of roadside advertising structures. Finally, we have discussed the limitations in the existing literature, challenges, and the future scope in driver gaze estimation and gaze-based applications.

Causal Reinforcement Learning: A Survey

  • paper_url: http://arxiv.org/abs/2307.01452
  • repo_url: https://github.com/Aryia-Behroziuan/References
  • paper_authors: Zhihong Deng, Jing Jiang, Guodong Long, Chengqi Zhang
  • for: A survey and overview of the literature on causal reinforcement learning.
  • methods: Introduces basic concepts of causality and reinforcement learning, and categorizes and systematically reviews existing causal reinforcement learning approaches based on their target problems and methodologies.
  • results: Reviews the current literature on causal reinforcement learning and identifies several open issues and future directions in this emerging field.
    Abstract Reinforcement learning is an essential paradigm for solving sequential decision problems under uncertainty. Despite many remarkable achievements in recent decades, applying reinforcement learning methods in the real world remains challenging. One of the main obstacles is that reinforcement learning agents lack a fundamental understanding of the world and must therefore learn from scratch through numerous trial-and-error interactions. They may also face challenges in providing explanations for their decisions and generalizing the acquired knowledge. Causality, however, offers a notable advantage as it can formalize knowledge in a systematic manner and leverage invariance for effective knowledge transfer. This has led to the emergence of causal reinforcement learning, a subfield of reinforcement learning that seeks to enhance existing algorithms by incorporating causal relationships into the learning process. In this survey, we comprehensively review the literature on causal reinforcement learning. We first introduce the basic concepts of causality and reinforcement learning, and then explain how causality can address core challenges in non-causal reinforcement learning. We categorize and systematically review existing causal reinforcement learning approaches based on their target problems and methodologies. Finally, we outline open issues and future directions in this emerging field.

A Double Machine Learning Approach to Combining Experimental and Observational Data

  • paper_url: http://arxiv.org/abs/2307.01449
  • repo_url: None
  • paper_authors: Marco Morucci, Vittorio Orlandi, Harsh Parikh, Sudeepa Roy, Cynthia Rudin, Alexander Volfovsky
  • for: A double machine learning approach to combining experimental and observational studies, allowing practitioners to test identifying assumptions and estimate treatment effects consistently.
  • methods: Uses double machine learning to test for violations of external validity and ignorability under milder assumptions; when only one assumption is violated, provides semi-parametrically efficient treatment effect estimators.
  • results: Demonstrates applicability in three real-world case studies and highlights the importance of accurately identifying the violated assumption for consistent treatment effect estimation.
    Abstract Experimental and observational studies often lack validity due to untestable assumptions. We propose a double machine learning approach to combine experimental and observational studies, allowing practitioners to test for assumption violations and estimate treatment effects consistently. Our framework tests for violations of external validity and ignorability under milder assumptions. When only one assumption is violated, we provide semi-parametrically efficient treatment effect estimators. However, our no-free-lunch theorem highlights the necessity of accurately identifying the violated assumption for consistent treatment effect estimation. We demonstrate the applicability of our approach in three real-world case studies, highlighting its relevance for practical settings.
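For readers who want a concrete starting point, the sketch below implements only the generic cross-fitted doubly-robust (AIPW) treatment effect recipe that double machine learning builds on; it is not the authors' estimator and omits the experimental/observational combination and the assumption tests described in the abstract.

```python
# A minimal sketch of a cross-fitted doubly-robust (AIPW) average treatment
# effect estimator, illustrating the generic "double machine learning" recipe.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
from sklearn.model_selection import KFold

def aipw_ate(X, T, Y, n_splits=5, seed=0):
    """Cross-fitted augmented inverse-propensity-weighted ATE estimate."""
    psi = np.zeros(len(Y))
    for train, test in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        # Nuisance models are fit on the training fold only (cross-fitting).
        prop = GradientBoostingClassifier().fit(X[train], T[train])
        mu1 = GradientBoostingRegressor().fit(X[train][T[train] == 1], Y[train][T[train] == 1])
        mu0 = GradientBoostingRegressor().fit(X[train][T[train] == 0], Y[train][T[train] == 0])
        e = np.clip(prop.predict_proba(X[test])[:, 1], 0.01, 0.99)
        m1, m0 = mu1.predict(X[test]), mu0.predict(X[test])
        t, y = T[test], Y[test]
        # AIPW score: outcome-model difference plus IPW residual correction.
        psi[test] = m1 - m0 + t * (y - m1) / e - (1 - t) * (y - m0) / (1 - e)
    return psi.mean(), psi.std() / np.sqrt(len(psi))

# Toy usage with synthetic data; the true effect is 2.0.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 4))
T = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
Y = 2.0 * T + X[:, 0] + rng.normal(size=2000)
print(aipw_ate(X, T, Y))
```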

On Conditional and Compositional Language Model Differentiable Prompting

  • paper_url: http://arxiv.org/abs/2307.01446
  • repo_url: https://github.com/jpilaul/PRopS
  • paper_authors: Jonathan Pilault, Can Liu, Mohit Bansal, Markus Dreyer
  • for: Adapting frozen pretrained language models (PLMs) to perform well on downstream tasks.
  • methods: Conditional and compositional differentiable prompting via a new model, the Prompt Production System (PRopS), which transforms task instructions or input metadata into fine-grained continuous prompts that elicit task-specific outputs from the PLM; its modular, Production Systems-based structure learns discrete rules suited to compositional transfer learning and few-shot learning.
  • results: Consistently surpasses other PLM adaptation techniques and often improves over fully fine-tuned models on compositional generalization, controllable summarization, and multilingual translation, while requiring fewer trainable parameters.
    Abstract Prompts have been shown to be an effective method to adapt a frozen Pretrained Language Model (PLM) to perform well on downstream tasks. Prompts can be represented by a human-engineered word sequence or by a learned continuous embedding. In this work, we investigate conditional and compositional differentiable prompting. We propose a new model, Prompt Production System (PRopS), which learns to transform task instructions or input metadata, into continuous prompts that elicit task-specific outputs from the PLM. Our model uses a modular network structure based on our neural formulation of Production Systems, which allows the model to learn discrete rules -- neural functions that learn to specialize in transforming particular prompt input patterns, making it suitable for compositional transfer learning and few-shot learning. We present extensive empirical and theoretical analysis and show that PRopS consistently surpasses other PLM adaptation techniques, and often improves upon fully fine-tuned models, on compositional generalization tasks, controllable summarization and multilingual translation, while needing fewer trainable parameters.

Human Emotion Recognition Based On Galvanic Skin Response signal Feature Selection and SVM

  • paper_url: http://arxiv.org/abs/2307.05383
  • repo_url: None
  • paper_authors: Di Fan, Mingyang Liu, Xiaohan Zhang, Xiaopeng Gong
  • for: Human emotion recognition based on automatically selected galvanic skin response (GSR) signal features and a Support Vector Machine (SVM).
  • methods: GSR signals acquired with the e-Health Sensor Platform V2.0 are wavelet-denoised and normalized to remove individual differences; 30 features are extracted and refined with covariance-based feature selection, and the optimized features are fed to an SVM.
  • results: The proposed method achieves good emotion recognition, with accuracy above 66.67%.
    Abstract A novel human emotion recognition method based on automatically selected Galvanic Skin Response (GSR) signal features and an SVM is proposed in this paper. GSR signals were acquired with the e-Health Sensor Platform V2.0. The data were then de-noised with a wavelet function and normalized to remove individual differences. Thirty features are extracted from the normalized data; however, using these features directly leads to a low recognition rate. To obtain an optimized feature set, covariance-based feature selection is employed in our method. Finally, an SVM taking the optimized features as input is used for human emotion recognition. The experimental results indicate that the proposed method achieves good human emotion recognition, with a recognition accuracy above 66.67%.
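The pipeline described above (normalize, extract features, select the most informative ones, classify with an SVM) is straightforward to prototype. The sketch below is a minimal illustration: the seven statistical features and the correlation-based selection rule are assumptions standing in for the paper's 30 features and its covariance-based criterion.

```python
# A minimal sketch of a GSR emotion-recognition pipeline: per-trial
# normalization, simple statistical features, correlation-based feature
# selection (a stand-in for the paper's covariance-based selection), and an SVM.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def gsr_features(signal):
    signal = (signal - signal.mean()) / (signal.std() + 1e-8)  # remove individual differences
    diff = np.diff(signal)
    return np.array([signal.mean(), signal.std(), signal.min(), signal.max(),
                     diff.mean(), diff.std(), np.abs(diff).sum()])

def select_top_k(F, y, k=4):
    # Rank features by absolute correlation with the label.
    corr = np.abs([np.corrcoef(F[:, j], y)[0, 1] for j in range(F.shape[1])])
    return np.argsort(corr)[::-1][:k]

# Toy usage with synthetic recordings (200 trials, 512 samples each).
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
signals = rng.normal(size=(200, 512)) + y[:, None] * 0.3
F = np.array([gsr_features(s) for s in signals])
idx = select_top_k(F, y)
print(cross_val_score(SVC(kernel="rbf"), F[:, idx], y, cv=5).mean())
```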

TablEye: Seeing small Tables through the Lens of Images

  • paper_url: http://arxiv.org/abs/2307.02491
  • repo_url: None
  • paper_authors: Seung-eon Lee, Sang-Chul Lee
  • for: Few-shot tabular learning, i.e., training models on tabular data with very few labeled examples.
  • methods: A domain-transformation framework that generates tabular images preserving the intrinsic semantics of the original tabular data, then applies well-tested few-shot learning algorithms and embedding functions to acquire and apply prior knowledge learned in the image domain.
  • results: Outperforms TabLLM by up to 0.11 AUC in a 4-shot task and leads STUNT by 3.17% accuracy on average in a 1-shot setting.
    Abstract The exploration of few-shot tabular learning becomes imperative. Tabular data is a versatile representation that captures diverse information, yet it is not exempt from limitations, property of data and model size. Labeling extensive tabular data can be challenging, and it may not be feasible to capture every important feature. Few-shot tabular learning, however, remains relatively unexplored, primarily due to scarcity of shared information among independent datasets and the inherent ambiguity in defining boundaries within tabular data. To the best of our knowledge, no meaningful and unrestricted few-shot tabular learning techniques have been developed without imposing constraints on the dataset. In this paper, we propose an innovative framework called TablEye, which aims to overcome the limit of forming prior knowledge for tabular data by adopting domain transformation. It facilitates domain transformation by generating tabular images, which effectively conserve the intrinsic semantics of the original tabular data. This approach harnesses rigorously tested few-shot learning algorithms and embedding functions to acquire and apply prior knowledge. Leveraging shared data domains allows us to utilize this prior knowledge, originally learned from the image domain. Specifically, TablEye demonstrated a superior performance by outstripping the TabLLM in a 4-shot task with a maximum 0.11 AUC and a STUNT in a 1- shot setting, where it led on average by 3.17% accuracy.

Learning to Branch in Combinatorial Optimization with Graph Pointer Networks

  • paper_url: http://arxiv.org/abs/2307.01434
  • repo_url: None
  • paper_authors: Rui Wang, Zhiming Zhou, Tao Zhang, Ling Wang, Xin Xu, Xiangke Liao, Kaiwen Li
  • for: Learning variable selection policies for branch-and-bound in combinatorial optimization.
  • methods: A graph pointer network that represents the solver state with graph, global, and historical features and combines a graph neural network with a pointer mechanism to map solver states to branching decisions, trained to imitate the classic strong branching rule with a top-k Kullback-Leibler divergence loss.
  • results: Significantly outperforms widely used expert-designed branching rules and state-of-the-art machine-learning-based branch-and-bound methods in solving speed and search tree size on benchmark problems, and generalizes to unseen and larger instances.
    Abstract Branch-and-bound is a typical way to solve combinatorial optimization problems. This paper proposes a graph pointer network model for learning the variable selection policy in the branch-and-bound. We extract the graph features, global features and historical features to represent the solver state. The proposed model, which combines the graph neural network and the pointer mechanism, can effectively map from the solver state to the branching variable decisions. The model is trained to imitate the classic strong branching expert rule by a designed top-k Kullback-Leibler divergence loss function. Experiments on a series of benchmark problems demonstrate that the proposed approach significantly outperforms the widely used expert-designed branching rules. Our approach also outperforms the state-of-the-art machine-learning-based branch-and-bound methods in terms of solving speed and search tree size on all the test instances. In addition, the model can generalize to unseen instances and scale to larger instances.
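The "top-k Kullback-Leibler divergence loss" used for imitating strong branching admits several formulations; the sketch below shows one plausible reading (KL between the expert's distribution over its k best candidates and the policy's log-probabilities at those candidates). The exact masking and normalization in the paper may differ.

```python
# A hedged sketch of a top-k KL imitation loss for learning to branch: the
# expert (strong branching) scores define a target distribution over the k
# best candidate variables, and the policy is pushed toward it.
import torch
import torch.nn.functional as F

def topk_kl_loss(policy_logits, expert_scores, k=10):
    k = min(k, expert_scores.shape[-1])
    topk_scores, topk_idx = torch.topk(expert_scores, k=k, dim=-1)
    target = F.softmax(topk_scores, dim=-1)                               # expert dist. over its top-k
    log_probs = F.log_softmax(policy_logits, dim=-1).gather(-1, topk_idx)  # policy log-probs at those vars
    return (target * (target.log() - log_probs)).sum(-1).mean()

# Toy usage: a batch of 2 branch-and-bound nodes with 50 candidate variables each.
logits = torch.randn(2, 50, requires_grad=True)
scores = torch.randn(2, 50)
loss = topk_kl_loss(logits, scores)
loss.backward()
```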

SleepEGAN: A GAN-enhanced Ensemble Deep Learning Model for Imbalanced Classification of Sleep Stages

  • paper_url: http://arxiv.org/abs/2307.05362
  • repo_url: None
  • paper_authors: Xuewei Cheng, Ke Huang, Yi Zou, Shujie Ma
  • for: Automatic sleep stage classification.
  • methods: A GAN-enhanced ensemble deep learning model (SleepEGAN) with GAN-based data augmentation for minority classes and a cost-free ensemble learning strategy.
  • results: Improved classification accuracy compared to existing state-of-the-art methods on three public sleep datasets.
    Abstract Deep neural networks have played an important role in automatic sleep stage classification because of their strong representation and in-model feature transformation abilities. However, class imbalance and individual heterogeneity which typically exist in raw EEG signals of sleep data can significantly affect the classification performance of any machine learning algorithms. To solve these two problems, this paper develops a generative adversarial network (GAN)-powered ensemble deep learning model, named SleepEGAN, for the imbalanced classification of sleep stages. To alleviate class imbalance, we propose a new GAN (called EGAN) architecture adapted to the features of EEG signals for data augmentation. The generated samples for the minority classes are used in the training process. In addition, we design a cost-free ensemble learning strategy to reduce the model estimation variance caused by the heterogeneity between the validation and test sets, so as to enhance the accuracy and robustness of prediction performance. We show that the proposed method can improve classification accuracy compared to several existing state-of-the-art methods using three public sleep datasets.

Smart filter aided domain adversarial neural network: An unsupervised domain adaptation method for fault diagnosis in noisy industrial scenarios

  • paper_url: http://arxiv.org/abs/2307.01429
  • repo_url: None
  • paper_authors: Baorui Dai, Gaëtan Frusque, Tianfu Li, Qi Li, Olga Fink
  • for: Unsupervised domain adaptation (UDA) for fault diagnosis in noisy industrial scenarios, transferring operational experience and fault signatures across operating conditions, across units of a fleet, or from simulated to real data.
  • methods: The Smart Filter-aided Domain Adversarial Neural Network (SFDANN) works in two steps. First, a smart filter combining a learnable wavelet packet transform network (LWPT) with a traditional wavelet packet transform module dynamically enforces time-frequency similarity between source and target data. Second, the reconstructed data are fed into a domain adversarial neural network (DANN) trained jointly for time-frequency feature proximity, domain alignment, and fault classification.
  • results: On bearing fault diagnosis in noisy environments and slab track fault diagnosis in a train-track-bridge coupling vibration system (transferring from numerical simulations to field measurements), SFDANN shows superior performance and remarkable stability compared with representative state-of-the-art UDA methods.
    Abstract The application of unsupervised domain adaptation (UDA)-based fault diagnosis methods has shown significant efficacy in industrial settings, facilitating the transfer of operational experience and fault signatures between different operating conditions, different units of a fleet or between simulated and real data. However, in real industrial scenarios, unknown levels and types of noise can amplify the difficulty of domain alignment, thus severely affecting the diagnostic performance of deep learning models. To address this issue, we propose an UDA method called Smart Filter-Aided Domain Adversarial Neural Network (SFDANN) for fault diagnosis in noisy industrial scenarios. The proposed methodology comprises two steps. In the first step, we develop a smart filter that dynamically enforces similarity between the source and target domain data in the time-frequency domain. This is achieved by combining a learnable wavelet packet transform network (LWPT) and a traditional wavelet packet transform module. In the second step, we input the data reconstructed by the smart filter into a domain adversarial neural network (DANN). To learn domain-invariant and discriminative features, the learnable modules of SFDANN are trained in a unified manner with three objectives: time-frequency feature proximity, domain alignment, and fault classification. We validate the effectiveness of the proposed SFDANN method based on two fault diagnosis cases: one involving fault diagnosis of bearings in noisy environments and another involving fault diagnosis of slab tracks in a train-track-bridge coupling vibration system, where the transfer task involves transferring from numerical simulations to field measurements. Results show that compared to other representative state of the art UDA methods, SFDANN exhibits superior performance and remarkable stability.
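The second stage relies on a standard domain adversarial neural network. The sketch below shows only that generic DANN component (a gradient-reversal layer plus a domain discriminator), not the learnable wavelet filter that is the paper's main contribution; module sizes are illustrative assumptions.

```python
# A minimal sketch of the domain-adversarial (DANN) component: a gradient-
# reversal layer lets the domain classifier train normally while the feature
# extractor receives reversed gradients, encouraging domain-invariant features.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None  # reverse (and scale) the gradient

class DANN(nn.Module):
    def __init__(self, in_dim=128, n_classes=4, n_domains=2):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU())
        self.label_head = nn.Linear(64, n_classes)   # fault classifier
        self.domain_head = nn.Linear(64, n_domains)  # domain discriminator

    def forward(self, x, lamb=1.0):
        h = self.features(x)
        return self.label_head(h), self.domain_head(GradReverse.apply(h, lamb))

# Toy forward/backward pass on random features.
model = DANN()
x = torch.randn(8, 128)
y_fault, y_dom = torch.randint(0, 4, (8,)), torch.randint(0, 2, (8,))
cls_logits, dom_logits = model(x)
loss = nn.functional.cross_entropy(cls_logits, y_fault) + \
       nn.functional.cross_entropy(dom_logits, y_dom)
loss.backward()
```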

Generative Flow Networks: a Markov Chain Perspective

  • paper_url: http://arxiv.org/abs/2307.01422
  • repo_url: None
  • paper_authors: Tristan Deleu, Yoshua Bengio
  • for: A Markov chain perspective on Generative Flow Networks (GFlowNets), a framework for sampling from highly multi-modal probability distributions.
  • methods: Treats sampling as a sequential decision-making problem and formalizes the connection between GFlowNets and Markov chains.
  • results: Offers a unifying view of GFlowNets as recurrent Markov chains regardless of the nature of the state space, highlighting similarities with and differences from MCMC methods.
    Abstract While Markov chain Monte Carlo methods (MCMC) provide a general framework to sample from a probability distribution defined up to normalization, they often suffer from slow convergence to the target distribution when the latter is highly multi-modal. Recently, Generative Flow Networks (GFlowNets) have been proposed as an alternative framework to mitigate this issue when samples have a clear compositional structure, by treating sampling as a sequential decision making problem. Although they were initially introduced from the perspective of flow networks, the recent advances of GFlowNets draw more and more inspiration from the Markov chain literature, bypassing completely the need for flows. In this paper, we formalize this connection and offer a new perspective for GFlowNets using Markov chains, showing a unifying view for GFlowNets regardless of the nature of the state space as recurrent Markov chains. Positioning GFlowNets under the same theoretical framework as MCMC methods also allows us to identify the similarities between both frameworks, and most importantly to highlight their differences.
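For readers unfamiliar with the framework, the detailed-balance condition below is the standard GFlowNet consistency criterion from the broader GFlowNet literature (not a result specific to this paper); it makes the Markov-chain reading concrete. A state flow $F$ together with forward and backward transition kernels $P_F$ and $P_B$ must satisfy
$$F(s)\, P_F(s' \mid s) \;=\; F(s')\, P_B(s \mid s'),$$
with the flow through terminal states tied to the reward $R(x)$, so that sampling forward from the initial state produces objects $x$ with probability proportional to $R(x)$.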

Free energy of Bayesian Convolutional Neural Network with Skip Connection

  • paper_url: http://arxiv.org/abs/2307.01417
  • repo_url: None
  • paper_authors: Shuya Nagayasu, Sumio Watanabe
  • for: Studying the effect of skip connections in convolutional neural networks (CNNs) within Bayesian learning.
  • methods: Analyzes the Bayesian free energy of CNNs with and without skip connections and explains the generalization behavior of Bayesian CNNs.
  • results: The upper bound of the free energy of a Bayesian CNN with skip connections does not depend on overparametrization, and the generalization error of the Bayesian CNN has a similar property.
    Abstract Since the success of the Residual Network (ResNet), many architectures of Convolutional Neural Networks (CNNs) have adopted skip connections. While the generalization performance of a CNN with skip connections has been explained within the framework of ensemble learning, the dependency on the number of parameters has not been revealed. In this paper, we analyze the Bayesian free energy of Convolutional Neural Networks both with and without skip connections in Bayesian learning. The upper bound of the free energy of a Bayesian CNN with skip connections does not depend on the overparametrization, and the generalization error of the Bayesian CNN has a similar property.

Analyzing the vulnerabilities in SplitFed Learning: Assessing the robustness against Data Poisoning Attacks

  • paper_url: http://arxiv.org/abs/2307.03197
  • repo_url: None
  • paper_authors: Aysha Thahsin Zahir Ismail, Raj Mani Shukla
  • for: Studying and analyzing the impact of data poisoning attacks on SplitFed Learning (SFL).
  • methods: Proposes three novel attack strategies: untargeted, targeted, and distance-based attacks.
  • results: Untargeted and distance-based attacks have a greater impact than targeted attacks in degrading the classifier; attack experiments on two case studies (electrocardiogram signal classification and automatic handwritten digit recognition) vary the fraction of malicious clients and the choice of model split layer.
    Abstract Distributed Collaborative Machine Learning (DCML) is a potential alternative to address the privacy concerns associated with centralized machine learning. Split Learning (SL) and Federated Learning (FL) are the two effective learning approaches in DCML. Recently, there has been increased interest in the hybrid of FL and SL known as SplitFed Learning (SFL). This research is the earliest attempt to study, analyze and present the impact of data poisoning attacks in SFL. We propose three kinds of novel attack strategies, namely untargeted, targeted and distance-based attacks, for SFL. All the attack strategies aim to degrade the performance of the DCML-based classifier. We test the proposed attack strategies on two different case studies: electrocardiogram signal classification and automatic handwritten digit recognition. A series of attack experiments were conducted by varying the percentage of malicious clients and the choice of the model split layer between the clients and the server. The results of the comprehensive analysis of attack strategies clearly convey that untargeted and distance-based poisoning attacks have greater impact in evading the classifier outcomes compared to targeted attacks in SFL.

Multi-Predictor Fusion: Combining Learning-based and Rule-based Trajectory Predictors

  • paper_url: http://arxiv.org/abs/2307.01408
  • repo_url: None
  • paper_authors: Sushant Veer, Apoorva Sharma, Marco Pavone
  • for: Trajectory prediction for autonomous vehicles (AVs), particularly in highly interactive traffic scenarios, to support safe and efficient planning.
  • methods: Multi-Predictor Fusion (MPF), an algorithm that augments learning-based predictors with motion planners satisfying logic-based rules and probabilistically mixes trajectories from both according to a belief distribution reflecting each predictor's online performance.
  • results: MPF outperforms both standalone predictors on various metrics and delivers the most consistent performance.
    Abstract Trajectory prediction modules are key enablers for safe and efficient planning of autonomous vehicles (AVs), particularly in highly interactive traffic scenarios. Recently, learning-based trajectory predictors have experienced considerable success in providing state-of-the-art performance due to their ability to learn multimodal behaviors of other agents from data. In this paper, we present an algorithm called multi-predictor fusion (MPF) that augments the performance of learning-based predictors by imbuing them with motion planners that are tasked with satisfying logic-based rules. MPF probabilistically combines learning- and rule-based predictors by mixing trajectories from both standalone predictors in accordance with a belief distribution that reflects the online performance of each predictor. In our results, we show that MPF outperforms the two standalone predictors on various metrics and delivers the most consistent performance.
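The core mechanism, maintaining a belief over the two predictors and sampling trajectories accordingly, can be illustrated with a few lines of code. The exponential-weights update and the error metric below are illustrative assumptions, not necessarily the belief update used in the paper.

```python
# A hedged sketch of belief-weighted fusion of a learned and a rule-based
# trajectory predictor: lower recent prediction error shifts the belief toward
# that predictor, and the fused output is sampled according to the belief.
import numpy as np

class PredictorFusion:
    def __init__(self, eta=1.0):
        self.log_w = np.zeros(2)   # log-weights over [learned, rule-based]
        self.eta = eta

    def belief(self):
        w = np.exp(self.log_w - self.log_w.max())
        return w / w.sum()

    def update(self, errors):
        # Multiplicative-weights style update from observed online errors.
        self.log_w -= self.eta * np.asarray(errors)

    def fuse(self, traj_learned, traj_rule, rng=np.random.default_rng()):
        idx = rng.choice(2, p=self.belief())   # sample which predictor to trust
        return (traj_learned, traj_rule)[idx]

# Toy usage: the rule-based predictor happens to be more accurate here.
fusion = PredictorFusion()
for _ in range(20):
    fusion.update(errors=[0.8, 0.2])   # (learned error, rule-based error) per step
print(fusion.belief())                 # belief shifts toward the rule-based predictor
```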

Learning to Communicate using Contrastive Learning

  • paper_url: http://arxiv.org/abs/2307.01403
  • repo_url: https://github.com/SonamSangpoLama/Music-Genre-Classification
  • paper_authors: Yat Long Lo, Biswa Sengupta, Jakob Foerster, Michael Noukhovitch
  • for: Improving coordination in multi-agent RL by inducing an effective common communication protocol in the decentralized setting.
  • methods: Treats messages sent between agents as incomplete views of the environment state and learns to communicate with contrastive learning, maximizing the mutual information between messages of a given trajectory.
  • results: In communication-essential environments, improves performance and learning speed over prior work; qualitative metrics and representation probing show more symmetric communication and better capture of global state information.
    Abstract Communication is a powerful tool for coordination in multi-agent RL. But inducing an effective, common language is a difficult challenge, particularly in the decentralized setting. In this work, we introduce an alternative perspective where communicative messages sent between agents are considered as different incomplete views of the environment state. By examining the relationship between messages sent and received, we propose to learn to communicate using contrastive learning to maximize the mutual information between messages of a given trajectory. In communication-essential environments, our method outperforms previous work in both performance and learning speed. Using qualitative metrics and representation probing, we show that our method induces more symmetric communication and captures global state information from the environment. Overall, we show the power of contrastive learning and the importance of leveraging messages as encodings for effective communication.
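Maximizing mutual information between messages of the same trajectory is most commonly instantiated with an InfoNCE-style objective; the sketch below shows that generic instantiation (positives from the same trajectory, in-batch negatives), not the authors' exact objective or architecture.

```python
# A minimal InfoNCE-style contrastive objective over agent messages: message
# embeddings from the same trajectory are positive pairs, other trajectories in
# the batch serve as negatives, giving a lower bound on mutual information.
import torch
import torch.nn.functional as F

def message_infonce(msgs_a, msgs_b, temperature=0.1):
    """msgs_a[i] and msgs_b[i] are message embeddings from the same trajectory."""
    a = F.normalize(msgs_a, dim=-1)
    b = F.normalize(msgs_b, dim=-1)
    logits = a @ b.t() / temperature      # (batch, batch) similarity matrix
    labels = torch.arange(a.shape[0])     # positives sit on the diagonal
    return F.cross_entropy(logits, labels)

# Toy usage: batch of 32 trajectories, 16-dimensional messages.
msgs_t = torch.randn(32, 16, requires_grad=True)   # messages at time t
msgs_t1 = torch.randn(32, 16)                       # messages at time t+1, same trajectories
loss = message_infonce(msgs_t, msgs_t1)
loss.backward()
```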

Spatio-Temporal Surrogates for Interaction of a Jet with High Explosives: Part II – Clustering Extremely High-Dimensional Grid-Based Data

  • paper_url: http://arxiv.org/abs/2307.01400
  • repo_url: None
  • paper_authors: Chandrika Kamath, Juliette S. Franzman
  • for: Building accurate surrogate models for the spatio-temporal outputs of computer simulations of a jet interacting with high explosives.
  • methods: Clusters the simulation outputs by similarity and builds a separate surrogate per cluster; because the spatial domain has millions of grid points, the data are first brought into a consistent format, their dimension is reduced by a factor of a thousand with random projections, and iterative k-means is used for clustering.
  • results: The randomness of both the projections and the initial k-means centroids is used to determine the number of clusters, making clustering of extremely high-dimensional data tractable and yielding meaningful cluster assignments despite the approximation introduced by the random projections.
    Abstract Building an accurate surrogate model for the spatio-temporal outputs of a computer simulation is a challenging task. A simple approach to improve the accuracy of the surrogate is to cluster the outputs based on similarity and build a separate surrogate model for each cluster. This clustering is relatively straightforward when the output at each time step is of moderate size. However, when the spatial domain is represented by a large number of grid points, numbering in the millions, the clustering of the data becomes more challenging. In this report, we consider output data from simulations of a jet interacting with high explosives. These data are available on spatial domains of different sizes, at grid points that vary in their spatial coordinates, and in a format that distributes the output across multiple files at each time step of the simulation. We first describe how we bring these data into a consistent format prior to clustering. Borrowing the idea of random projections from data mining, we reduce the dimension of our data by a factor of thousand, making it possible to use the iterative k-means method for clustering. We show how we can use the randomness of both the random projections, and the choice of initial centroids in k-means clustering, to determine the number of clusters in our data set. Our approach makes clustering of extremely high dimensional data tractable, generating meaningful cluster assignments for our problem, despite the approximation introduced in the random projections.
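The projection-then-cluster recipe is easy to prototype end to end. The sketch below uses toy sizes (the report works with millions of grid points per snapshot), and the silhouette score is only a stand-in for the stability analysis the authors use to pick the number of clusters.

```python
# A small end-to-end sketch: reduce very high-dimensional simulation snapshots
# with a Gaussian random projection, cluster with k-means, and repeat with
# different seeds to see how stable the chosen number of clusters is.
import numpy as np
from sklearn.random_projection import GaussianRandomProjection
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20000))   # 300 snapshots, 20k "grid points" each
X[:150] += 2.0                      # plant two coarse clusters

for seed in range(3):               # vary projection and k-means initialization
    Z = GaussianRandomProjection(n_components=50, random_state=seed).fit_transform(X)
    scores = {k: silhouette_score(Z, KMeans(n_clusters=k, n_init=10,
                                            random_state=seed).fit_predict(Z))
              for k in range(2, 6)}
    print(seed, max(scores, key=scores.get), scores)
```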

In-depth Analysis On Parallel Processing Patterns for High-Performance Dataframes

  • paper_url: http://arxiv.org/abs/2307.01394
  • repo_url: None
  • paper_authors: Niranda Perera, Arup Kumar Sarker, Mills Staylor, Gregor von Laszewski, Kaiying Shan, Supun Kamburugamuve, Chathura Widanage, Vibhatha Abeykoon, Thejaka Amila Kanewela, Geoffrey Fox
  • for: Improving the performance of data engineering applications, especially data preprocessing over very large (terabyte-scale) datasets.
  • methods: Takes a high-performance computing view of distributed dataframe operators, builds on previously proposed parallel processing patterns and the reference runtime Cylon, and introduces a cost model for evaluating those patterns.
  • results: Evaluates the performance of Cylon on the ORNL Summit supercomputer.
    Abstract The Data Science domain has expanded monumentally in both research and industry communities during the past decade, predominantly owing to the Big Data revolution. Artificial Intelligence (AI) and Machine Learning (ML) are bringing more complexities to data engineering applications, which are now integrated into data processing pipelines to process terabytes of data. Typically, a significant amount of time is spent on data preprocessing in these pipelines, and hence improving its efficiency directly impacts the overall pipeline performance. The community has recently embraced the concept of Dataframes as the de-facto data structure for data representation and manipulation. However, the most widely used serial Dataframes today (R, pandas) experience performance limitations while working on even moderately large data sets. We believe that there is plenty of room for improvement by taking a look at this problem from a high-performance computing point of view. In a prior publication, we presented a set of parallel processing patterns for distributed dataframe operators and the reference runtime implementation, Cylon [1]. In this paper, we are expanding on the initial concept by introducing a cost model for evaluating the said patterns. Furthermore, we evaluate the performance of Cylon on the ORNL Summit supercomputer.

Spatio-Temporal Surrogates for Interaction of a Jet with High Explosives: Part I – Analysis with a Small Sample Size

  • paper_url: http://arxiv.org/abs/2307.01393
  • repo_url: None
  • paper_authors: Chandrika Kamath, Juliette S. Franzman, Brian H. Daub
  • for: Developing high-quality spatio-temporal surrogates that help interpret computer simulations of complex phenomena, studied on a two-dimensional problem of a jet interacting with high explosives.
  • methods: Machine-learning-based surrogates relating vector-valued simulation outputs to the inputs, with simple techniques to improve surrogate accuracy; the data are unusual in that outputs are available at over two million spatial locations, each simulation runs for relatively few time steps, the computational domain varies across simulations, and resource constraints limit the number of runs.
  • results: Shows how to analyze these very large data sets, set algorithm parameters, and improve the accuracy of the spatio-temporal surrogates without substantially increasing the number of simulations required.
    Abstract Computer simulations, especially of complex phenomena, can be expensive, requiring high-performance computing resources. Often, to understand a phenomenon, multiple simulations are run, each with a different set of simulation input parameters. These data are then used to create an interpolant, or surrogate, relating the simulation outputs to the corresponding inputs. When the inputs and outputs are scalars, a simple machine learning model can suffice. However, when the simulation outputs are vector valued, available at locations in two or three spatial dimensions, often with a temporal component, creating a surrogate is more challenging. In this report, we use a two-dimensional problem of a jet interacting with high explosives to understand how we can build high-quality surrogates. The characteristics of our data set are unique - the vector-valued outputs from each simulation are available at over two million spatial locations; each simulation is run for a relatively small number of time steps; the size of the computational domain varies with each simulation; and resource constraints limit the number of simulations we can run. We show how we analyze these extremely large data-sets, set the parameters for the algorithms used in the analysis, and use simple ways to improve the accuracy of the spatio-temporal surrogates without substantially increasing the number of simulations required.
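To make the surrogate idea concrete, the sketch below shows one standard generic recipe for spatio-temporal surrogates of expensive simulations: compress each run's large output field with PCA (proper orthogonal decomposition) and interpolate the reduced coefficients over the input parameters with radial basis functions. This is not the specific construction used in the report, and the sizes are toy values.

```python
# A generic PCA + RBF-interpolation surrogate: map simulation inputs to a
# low-dimensional representation of the output field, then decode back.
import numpy as np
from sklearn.decomposition import PCA
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)
params = rng.uniform(size=(40, 3))                     # 40 runs, 3 input parameters
fields = np.sin(params @ rng.normal(size=(3, 5000)))   # each run: 5000-point output field

pca = PCA(n_components=10).fit(fields)
coeffs = pca.transform(fields)                         # reduced representation per run
surrogate = RBFInterpolator(params, coeffs, smoothing=1e-6)

new_params = rng.uniform(size=(5, 3))
predicted_fields = pca.inverse_transform(surrogate(new_params))
print(predicted_fields.shape)                          # (5, 5000)
```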

Adversarial Learning in Real-World Fraud Detection: Challenges and Perspectives

  • paper_url: http://arxiv.org/abs/2307.01390
  • repo_url: None
  • paper_authors: Danele Lunghi, Alkis Simitsis, Olivier Caelen, Gianluca Bontempi
  • for: Examining how adversarial attacks against fraud detection systems differ from other applications of adversarial machine learning, and how to generalize adversarial techniques to this and other domains.
  • methods: Surveys adversarial machine learning in the context of real-world fraud detection.
  • results: Identifies the specific challenges fraud detection poses for adversarial machine learning and proposes several directions to bridge the gap.
    Abstract Data economy relies on data-driven systems and complex machine learning applications are fueled by them. Unfortunately, however, machine learning models are exposed to fraudulent activities and adversarial attacks, which threaten their security and trustworthiness. In the last decade or so, the research interest on adversarial machine learning has grown significantly, revealing how learning applications could be severely impacted by effective attacks. Although early results of adversarial machine learning indicate the huge potential of the approach to specific domains such as image processing, still there is a gap in both the research literature and practice regarding how to generalize adversarial techniques in other domains and applications. Fraud detection is a critical defense mechanism for data economy, as it is for other applications as well, which poses several challenges for machine learning. In this work, we describe how attacks against fraud detection systems differ from other applications of adversarial machine learning, and propose a number of interesting directions to bridge this gap.

Identification of Causal Relationship between Amyloid-beta Accumulation and Alzheimer’s Disease Progression via Counterfactual Inference

  • paper_url: http://arxiv.org/abs/2307.01389
  • repo_url: None
  • paper_authors: Haixing Dai, Mengxuan Hu, Qing Li, Lu Zhang, Lin Zhao, Dajiang Zhu, Ibai Diez, Jorge Sepulcre, Fan Zhang, Xingyu Gao, Manhua Liu, Quanzheng Li, Sheng Li, Tianming Liu, Xiang Li
  • for: Uncovering the causal relationship between amyloid-beta accumulation and Alzheimer's disease (AD) progression to support early diagnosis and tailored care.
  • methods: A graph varying coefficient neural network (GVCNet) for estimating individual treatment effects with continuous treatment levels, built on a graph convolutional neural network.
  • results: Demonstrates the potential of causal inference approaches, including GVCNet, for measuring regional causal connections between amyloid-beta accumulation and AD pathophysiology.
    Abstract Alzheimer's disease (AD) is a neurodegenerative disorder that begins with amyloidosis, followed by neuronal loss and deterioration in structure, function, and cognition. The accumulation of amyloid-beta in the brain, measured through 18F-florbetapir (AV45) positron emission tomography (PET) imaging, has been widely used for early diagnosis of AD. However, the relationship between amyloid-beta accumulation and AD pathophysiology remains unclear, and causal inference approaches are needed to uncover how amyloid-beta levels can impact AD development. In this paper, we propose a graph varying coefficient neural network (GVCNet) for estimating the individual treatment effect with continuous treatment levels using a graph convolutional neural network. We highlight the potential of causal inference approaches, including GVCNet, for measuring the regional causal connections between amyloid-beta accumulation and AD pathophysiology, which may serve as a robust tool for early diagnosis and tailored care.

Systematic Bias in Sample Inference and its Effect on Machine Learning

  • paper_url: http://arxiv.org/abs/2307.01384
  • repo_url: None
  • paper_authors: Owen O’Neill, Fintan Costello
  • for: Explaining why machine learning models underpredict the target feature, especially for minority groups.
  • methods: Shows that prediction amounts to statistical inference on small subsets of the training data that are similar to the new individual (with subset sizes following a power-law distribution), and that such small-sample inference is subject to systematic, directional bias.
  • results: Across more than 70 subsets of the 'adult' and COMPAS datasets, a bias-prediction measure based on small-sample inference has significant positive correlations (0.56 and 0.85) with the observed underprediction rates.
    Abstract A commonly observed pattern in machine learning models is an underprediction of the target feature, with the model's predicted target rate for members of a given category typically being lower than the actual target rate for members of that category in the training set. This underprediction is usually larger for members of minority groups; while income level is underpredicted for both men and women in the 'adult' dataset, for example, the degree of underprediction is significantly higher for women (a minority in that dataset). We propose that this pattern of underprediction for minorities arises as a predictable consequence of statistical inference on small samples. When presented with a new individual for classification, an ML model performs inference not on the entire training set, but on a subset that is in some way similar to the new individual, with sizes of these subsets typically following a power law distribution so that most are small (and with these subsets being necessarily smaller for the minority group). We show that such inference on small samples is subject to systematic and directional statistical bias, and that this bias produces the observed patterns of underprediction seen in ML models. Analysing a standard sklearn decision tree model's predictions on a set of over 70 subsets of the 'adult' and COMPAS datasets, we found that a bias prediction measure based on small-sample inference had a significant positive correlations (0.56 and 0.85) with the observed underprediction rate for these subsets.
    摘要 机器学习模型中常见的一种现象是对目标特征的低估:模型对某一类别成员预测的目标率通常低于训练集中该类别成员的实际目标率。这种低估对少数群体成员通常更为严重;例如在 'adult' 数据集中,男性和女性的收入水平都被低估,但女性(该数据集中的少数群体)的低估程度显著更高。我们提出,这种针对少数群体的低估模式是小样本统计推断的可预测后果。当模型对一个新个体进行分类时,它实际上并非在整个训练集上进行推断,而是在与该个体某种程度上相似的子集上进行推断;这些子集的大小通常服从幂律分布,因此大多数子集都很小(而少数群体对应的子集必然更小)。我们证明,这种基于小样本的推断存在系统性且有方向性的统计偏差,并且正是这种偏差产生了机器学习模型中观察到的低估模式。我们分析了标准 sklearn 决策树模型在 'adult' 和 COMPAS 数据集的 70 多个子集上的预测,发现基于小样本推断的偏差预测度量与这些子集上观察到的低估率存在显著的正相关(0.56 和 0.85)。
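
The claimed mechanism (inference on small, similar subsets producing a directional bias) can be illustrated with a toy simulation: estimate a group's positive rate from small samples with a smoothed estimator and compare with the true rate. The smoothing constant, subset sizes, and true rate below are arbitrary choices for illustration only, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
true_rate = 0.3            # actual target rate for the group (assumed)
alpha = 1.0                # Laplace-style smoothing used by the estimator (assumed)

def smoothed_estimate(sample):
    # plug-in estimate with smoothing toward 0.5, as many learners implicitly apply
    return (sample.sum() + alpha) / (len(sample) + 2 * alpha)

for n in [5, 20, 100, 1000]:               # subset sizes; small for minority groups
    ests = []
    for _ in range(20000):
        sample = rng.random(n) < true_rate
        ests.append(smoothed_estimate(sample))
    bias = np.mean(ests) - true_rate
    print(f"subset size {n:5d}: mean estimate {np.mean(ests):.3f}, bias {bias:+.3f}")
# The bias is directional (toward 0.5, i.e. underprediction when true_rate < 0.5)
# and shrinks as the subset grows -- mirroring the paper's qualitative claim.
```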

Implicit Memory Transformer for Computationally Efficient Simultaneous Speech Translation

  • paper_url: http://arxiv.org/abs/2307.01381
  • repo_url: https://github.com/osu-starlab/implicitmemory
  • paper_authors: Matthew Raffel, Lizhong Chen
  • for: 该论文旨在提出一种计算上更高效的 Transformer 结构,用于同声(流式)语音翻译。
  • methods: 该方法使用块处理来分割输入序列,并使用新的左上下文方法来隐式地保留记忆。
  • results: 实验结果表明,该方法能显著加速编码器的前向计算,且与同时使用左上下文和记忆库的方法相比,翻译质量几乎相同。
    Abstract Simultaneous speech translation is an essential communication task difficult for humans whereby a translation is generated concurrently with oncoming speech inputs. For such a streaming task, transformers using block processing to break an input sequence into segments have achieved state-of-the-art performance at a reduced cost. Current methods to allow information to propagate across segments, including left context and memory banks, have faltered as they are both insufficient representations and unnecessarily expensive to compute. In this paper, we propose an Implicit Memory Transformer that implicitly retains memory through a new left context method, removing the need to explicitly represent memory with memory banks. We generate the left context from the attention output of the previous segment and include it in the keys and values of the current segment's attention calculation. Experiments on the MuST-C dataset show that the Implicit Memory Transformer provides a substantial speedup on the encoder forward pass with nearly identical translation quality when compared with the state-of-the-art approach that employs both left context and memory banks.
    摘要 同声语音翻译是一项对人类而言十分困难的重要交流任务,需要在语音输入持续到来的同时实时生成翻译。对于这类流式任务,使用块处理将输入序列切分为段的 Transformer 已经以更低的计算成本达到了最先进的性能。现有的跨段信息传递方法(包括左上下文和记忆库)表示能力不足,且计算代价过高。在这篇论文中,我们提出了隐式记忆 Transformer,通过一种新的左上下文方法隐式地保留记忆,从而无需用记忆库显式表示记忆。我们从前一段的注意力输出生成左上下文,并将其并入当前段注意力计算的键和值中。在 MuST-C 数据集上的实验表明,隐式记忆 Transformer 大幅加速了编码器的前向计算,同时翻译质量与同时使用左上下文和记忆库的最先进方法几乎相同。
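
A minimal sketch of the left-context idea: the attention output of the previous segment is cached and prepended to the keys and values of the current segment, so no separate memory bank is maintained. The single-head attention, tensor shapes, and caching policy are simplifications assumed here, not the paper's exact architecture.

```python
import torch

def segment_attention_with_implicit_memory(q, k, v, prev_attn_out):
    """q, k, v: (seg_len, d). prev_attn_out: (ctx_len, d) attention output of the
    previous segment, reused as extra keys/values (the 'implicit memory')."""
    if prev_attn_out is not None:
        k = torch.cat([prev_attn_out, k], dim=0)
        v = torch.cat([prev_attn_out, v], dim=0)
    attn = torch.softmax(q @ k.T / k.shape[-1] ** 0.5, dim=-1)
    return attn @ v

# stream a sequence segment by segment
d, seg_len = 64, 8
stream = torch.randn(4 * seg_len, d)
prev_out = None
outputs = []
for s in range(0, stream.shape[0], seg_len):
    x = stream[s:s + seg_len]
    out = segment_attention_with_implicit_memory(x, x, x, prev_out)
    outputs.append(out)
    prev_out = out.detach()      # cache this segment's attention output as left context
print(torch.cat(outputs).shape)  # torch.Size([32, 64])
```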

Shifting Attention to Relevance: Towards the Uncertainty Estimation of Large Language Models

  • paper_url: http://arxiv.org/abs/2307.01379
  • repo_url: https://github.com/jinhaoduan/shifting-attention-to-relevance
  • paper_authors: Jinhao Duan, Hao Cheng, Shiqi Wang, Chenan Wang, Alex Zavalny, Renjing Xu, Bhavya Kailkhura, Kaidi Xu
  • for: 该研究旨在刻画大语言模型(LLMs)生成结果的不确定性,即回答用户何时可以信任模型输出的问题。
  • methods: 该研究基于如下观察:自回归 LLM 生成的各个 token 在表达句子含义上并不平等,一些 token 比其他 token 更相关(更具代表性),而现有不确定性估计方法却对所有 token 一视同仁;论文研究这种"生成不平等"如何影响不确定性估计。
  • results: 结果显示,在估计不确定性时,大量语义有限的 token 和句子被赋予了同等甚至过高的权重,从而引入偏差。为消除这些偏差,论文提出将注意力转移到更相关成分(SAR)的方法,并在实验中取得了更优的表现。
    Abstract Although Large Language Models (LLMs) have shown great potential in Natural Language Generation, it is still challenging to characterize the uncertainty of model generations, i.e., when users could trust model outputs. Our research is derived from the heuristic facts that tokens are created unequally in reflecting the meaning of generations by auto-regressive LLMs, i.e., some tokens are more relevant (or representative) than others, yet all the tokens are equally valued when estimating uncertainty. It is because of the linguistic redundancy where mostly a few keywords are sufficient to convey the meaning of a long sentence. We name these inequalities as generative inequalities and investigate how they affect uncertainty estimation. Our results reveal that considerable tokens and sentences containing limited semantics are weighted equally or even heavily when estimating uncertainty. To tackle these biases posed by generative inequalities, we propose to jointly Shifting Attention to more Relevant (SAR) components from both the token level and the sentence level while estimating uncertainty. We conduct experiments over popular "off-the-shelf" LLMs (e.g., OPT, LLaMA) with model sizes up to 30B and powerful commercial LLMs (e.g., Davinci from OpenAI), across various free-form question-answering tasks. Experimental results and detailed demographic analysis indicate the superior performance of SAR. Code is available at https://github.com/jinhaoduan/shifting-attention-to-relevance.
    摘要 尽管大语言模型(LLM)在自然语言生成方面展现出了巨大潜力,但刻画模型生成结果的不确定性(即用户何时可以信任模型输出)仍然是一个挑战。我们的研究源于这样一个经验事实:自回归 LLM 生成的各个 token 在表达句子含义上并不平等,一些 token 比其他 token 更相关(更具代表性),然而在估计不确定性时所有 token 却被同等看待。其原因在于语言冗余:通常只需少数关键词即可传达一个长句的含义。我们将这种不平等称为"生成不平等",并研究它们如何影响不确定性估计。结果表明,大量语义有限的 token 和句子在估计不确定性时被赋予同等甚至过高的权重。为消除生成不平等带来的偏差,我们提出在估计不确定性时同时在 token 层面和句子层面将注意力转移到更相关(SAR)的成分上。我们在多种常见的开源 LLM(如 OPT、LLaMA,模型规模最高达 30B)以及强大的商用 LLM(如 OpenAI 的 Davinci)上,针对多种自由形式问答任务进行了实验。实验结果和详细分析表明 SAR 具有更优的性能。代码见 https://github.com/jinhaoduan/shifting-attention-to-relevance。

Shiftable Context: Addressing Training-Inference Context Mismatch in Simultaneous Speech Translation

  • paper_url: http://arxiv.org/abs/2307.01377
  • repo_url: https://github.com/osu-starlab/shiftablecontext
  • paper_authors: Matthew Raffel, Drew Penney, Lizhong Chen
  • for: simultaneous speech translation
  • methods: 使用 segment-based processing 和 Shiftable Context scheme
  • results: average increase of 2.09, 1.83, and 1.95 BLEU scores across each wait-k value for the three language pairs, with minimal impact on computation-aware Average Lagging.
    Abstract Transformer models using segment-based processing have been an effective architecture for simultaneous speech translation. However, such models create a context mismatch between training and inference environments, hindering potential translation accuracy. We solve this issue by proposing Shiftable Context, a simple yet effective scheme to ensure that consistent segment and context sizes are maintained throughout training and inference, even with the presence of partially filled segments due to the streaming nature of simultaneous translation. Shiftable Context is also broadly applicable to segment-based transformers for streaming tasks. Our experiments on the English-German, English-French, and English-Spanish language pairs from the MUST-C dataset demonstrate that when applied to the Augmented Memory Transformer, a state-of-the-art model for simultaneous speech translation, the proposed scheme achieves an average increase of 2.09, 1.83, and 1.95 BLEU scores across each wait-k value for the three language pairs, respectively, with a minimal impact on computation-aware Average Lagging.
    摘要 transformer模型使用分段处理有效地实现同时语音翻译。然而,这些模型在训练和推理环境中存在上下文匹配问题,从而限制了翻译准确性。我们解决这个问题,提出了Shiftable Context,一种简单 yet effective的方案,确保在训练和推理过程中保持一致的分段和上下文大小。Shiftable Context还可以广泛应用于流处理任务中的 segment-based transformer。我们在MUST-C数据集上进行英语-德语、英语-法语和英语-西班牙语三对语言对的实验,结果显示,当应用到Augmented Memory Transformer模型时,提出的方案平均提高了2.09、1.83和1.95的BLEU分数 across each wait-k值,并且对计算意识的均衡延迟产生了最小的影响。

Adaptive Principal Component Regression with Applications to Panel Data

  • paper_url: http://arxiv.org/abs/2307.01357
  • repo_url: None
  • paper_authors: Anish Agarwal, Keegan Harris, Justin Whitehouse, Zhiwei Steven Wu
  • for: This paper provides time-uniform finite sample guarantees for online principal component regression (PCR) in the presence of adaptive data collection.
  • methods: The paper uses tools from modern martingale concentration to analyze PCR in the online setting, which is a generalization of the fixed-design error-in-variables regression.
  • results: The paper provides a framework for experiment design in panel data settings when interventions are assigned adaptively, which can be seen as a generalization of the synthetic control and synthetic interventions frameworks.
    Abstract Principal component regression (PCR) is a popular technique for fixed-design error-in-variables regression, a generalization of the linear regression setting in which the observed covariates are corrupted with random noise. We provide the first time-uniform finite sample guarantees for online (regularized) PCR whenever data is collected adaptively. Since the proof techniques for analyzing PCR in the fixed design setting do not readily extend to the online setting, our results rely on adapting tools from modern martingale concentration to the error-in-variables setting. As an application of our bounds, we provide a framework for experiment design in panel data settings when interventions are assigned adaptively. Our framework may be thought of as a generalization of the synthetic control and synthetic interventions frameworks, where data is collected via an adaptive intervention assignment policy.
    摘要 主成分回归(PCR)是一种常用的固定设计变量含误差(error-in-variables)回归技术,它是线性回归设定的推广,其中观测到的协变量被随机噪声污染。我们为数据自适应采集情形下的在线(正则化)PCR 提供了首个时间一致的有限样本保证。由于固定设计下分析 PCR 的证明技巧无法直接推广到在线设定,我们的结果依赖于将现代鞅集中(martingale concentration)工具改造到变量含误差设定中。作为这些界的一个应用,我们给出了在干预被自适应分配时面板数据设置下的实验设计框架。该框架可以视为合成控制(synthetic control)与合成干预(synthetic interventions)框架的推广,其中数据通过自适应的干预分配策略采集。
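
Principal component regression itself is simple to sketch: project the (noisy) covariates onto their top principal components and regress the outcomes on those components. The sketch below is the vanilla offline version with an arbitrary number of components, not the paper's online, adaptively-collected-data variant.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Principal component regression: regress y on the top-k principal
    directions of X. Returns a coefficient vector in the original feature space."""
    X_c = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X_c, full_matrices=False)
    Vk = Vt[:k].T                       # (d, k) top principal directions
    Z = X_c @ Vk                        # principal component scores
    beta_pc, *_ = np.linalg.lstsq(Z, y - y.mean(), rcond=None)
    return Vk @ beta_pc                 # map back to feature space

rng = np.random.default_rng(1)
n, d, k = 200, 30, 5
latent = rng.normal(size=(n, k)) @ rng.normal(size=(k, d))    # low-rank signal
X = latent + 0.1 * rng.normal(size=(n, d))                    # noisy covariates
beta_true = rng.normal(size=d)
y = latent @ beta_true + 0.05 * rng.normal(size=n)
beta_hat = pcr_fit(X, y, k)
resid = y - y.mean() - (X - X.mean(axis=0)) @ beta_hat
print("in-sample fit (1 - residual variance / total variance):",
      round(1 - np.var(resid) / np.var(y), 3))
```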

Learning Generic Solutions for Multiphase Transport in Porous Media via the Flux Functions Operator

  • paper_url: http://arxiv.org/abs/2307.01354
  • repo_url: None
  • paper_authors: Waleed Diab, Omar Chaabi, Shayma Alkobaisi, Abeeb Awotunde, Mohammed Al Kobaisi
  • for: 加速多孔介质中流体流动与输运的数值模拟,使科学和工程领域的相关问题能够更快地求解。
  • methods: 使用深度学习技术,具体来说是物理信息 DeepONet(PI-DeepONet),通过学习偏微分方程(PDE)的解算子实现快速求解。
  • results: 比传统数值求解器快多达四个数量级,并且能够对任意类型的通量函数(凹、凸或非凸)给出准确解;训练好的 PI-DeepONet 模型还表现出优秀的泛化能力,是求解多孔介质输运问题的有力工具。
    Abstract Traditional numerical schemes for simulating fluid flow and transport in porous media can be computationally expensive. Advances in machine learning for scientific computing have the potential to help speed up the simulation time in many scientific and engineering fields. DeepONet has recently emerged as a powerful tool for accelerating the solution of partial differential equations (PDEs) by learning operators (mapping between function spaces) of PDEs. In this work, we learn the mapping between the space of flux functions of the Buckley-Leverett PDE and the space of solutions (saturations). We use Physics-Informed DeepONets (PI-DeepONets) to achieve this mapping without any paired input-output observations, except for a set of given initial or boundary conditions; ergo, eliminating the expensive data generation process. By leveraging the underlying physical laws via soft penalty constraints during model training, in a manner similar to Physics-Informed Neural Networks (PINNs), and a unique deep neural network architecture, the proposed PI-DeepONet model can predict the solution accurately given any type of flux function (concave, convex, or non-convex) while achieving up to four orders of magnitude improvements in speed over traditional numerical solvers. Moreover, the trained PI-DeepONet model demonstrates excellent generalization qualities, rendering it a promising tool for accelerating the solution of transport problems in porous media.
    摘要 用于模拟多孔介质中流体流动与物质输运的传统数值方法计算代价高昂。面向科学计算的机器学习进展有望在许多科学与工程领域缩短模拟时间。DeepONet 最近成为一种通过学习偏微分方程(PDE)的算子(函数空间之间的映射)来加速求解的有力工具。在本工作中,我们学习 Buckley-Leverett PDE 的通量函数空间到解(饱和度)空间的映射。我们使用物理信息 DeepONet(PI-DeepONet)来实现这一映射,除给定的初始或边界条件外不需要任何成对的输入输出观测数据,从而省去了昂贵的数据生成过程。通过在模型训练中以软惩罚约束的方式引入底层物理规律(类似于物理信息神经网络 PINN),并结合独特的深度神经网络架构,所提出的 PI-DeepONet 模型能够在任意类型的通量函数(凹、凸或非凸)下准确预测解,同时相比传统数值求解器实现最高四个数量级的速度提升。此外,训练好的 PI-DeepONet 模型展现出优秀的泛化能力,是加速多孔介质输运问题求解的一个有前景的工具。
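
The operator-learning structure of a DeepONet can be sketched with a branch network that encodes samples of the input function (here, the flux function) and a trunk network that encodes the query coordinates, combined by an inner product. Network widths, the number of flux samples, and the plain forward pass below are illustrative assumptions; the paper's PI-DeepONet additionally enforces the PDE residual as a soft penalty during training.

```python
import torch
import torch.nn as nn

class DeepONet(nn.Module):
    def __init__(self, n_sensors=50, p=64):
        super().__init__()
        # branch net: encodes the input function f sampled at fixed sensor points
        self.branch = nn.Sequential(nn.Linear(n_sensors, 128), nn.Tanh(),
                                    nn.Linear(128, p))
        # trunk net: encodes a query coordinate (x, t)
        self.trunk = nn.Sequential(nn.Linear(2, 128), nn.Tanh(),
                                   nn.Linear(128, p))

    def forward(self, f_samples, xt):
        # f_samples: (batch, n_sensors), xt: (batch, 2)
        b = self.branch(f_samples)
        t = self.trunk(xt)
        return (b * t).sum(dim=-1)         # G(f)(x, t)

model = DeepONet()
f_samples = torch.rand(16, 50)             # flux function sampled on a saturation grid
xt = torch.rand(16, 2)                     # query points (x, t)
u_pred = model(f_samples, xt)              # predicted saturation
print(u_pred.shape)                        # torch.Size([16])
```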

Patch-CNN: Training data-efficient deep learning for high-fidelity diffusion tensor estimation from minimal diffusion protocols

  • paper_url: http://arxiv.org/abs/2307.01346
  • repo_url: None
  • paper_authors: Tobias Goodwin-Allcock, Ting Gong, Robert Gray, Parashkev Nachev, Hui Zhang
  • for: 该论文提出一种新方法 Patch-CNN,用于仅从六方向扩散加权图像(DWI)估计扩散张量(DT)。
  • methods: 该方法使用深度学习,采用核尺寸极小(3×3×3)的卷积神经网络(CNN)来学习扩散张量估计。
  • results: 与传统模型拟合和体素级全连接神经网络(voxel-wise FCN)相比,Patch-CNN 能够更好地估计扩散张量标量参数和纤维方向,并且只需单个受试者的数据即可训练。
    Abstract We propose a new method, Patch-CNN, for diffusion tensor (DT) estimation from only six-direction diffusion weighted images (DWI). Deep learning-based methods have been recently proposed for dMRI parameter estimation, using either voxel-wise fully-connected neural networks (FCN) or image-wise convolutional neural networks (CNN). In the acute clinical context -- where pressure of time limits the number of imaged directions to a minimum -- existing approaches either require an infeasible number of training images volumes (image-wise CNNs), or do not estimate the fibre orientations (voxel-wise FCNs) required for tractogram estimation. To overcome these limitations, we propose Patch-CNN, a neural network with a minimal (non-voxel-wise) convolutional kernel (3$\times$3$\times$3). Compared with voxel-wise FCNs, this has the advantage of allowing the network to leverage local anatomical information. Compared with image-wise CNNs, the minimal kernel vastly reduces training data demand. Evaluated against both conventional model fitting and a voxel-wise FCN, Patch-CNN, trained with a single subject is shown to improve the estimation of both scalar dMRI parameters and fibre orientation from six-direction DWIs. The improved fibre orientation estimation is shown to produce improved tractogram.
    摘要 我们提出一种新方法 Patch-CNN,用于仅从六方向扩散加权图像(DWI)估计扩散张量(DT)。近来已有基于深度学习的 dMRI 参数估计方法,它们或采用体素级全连接神经网络(FCN),或采用整幅图像级卷积神经网络(CNN)。在急诊临床场景中(时间压力使成像方向数被压缩到最少),现有方法要么需要数量上不可行的训练图像卷(图像级 CNN),要么无法估计纤维追踪所需的纤维方向(体素级 FCN)。为克服这些限制,我们提出 Patch-CNN:一种采用极小(非体素级)卷积核(3×3×3)的神经网络。与体素级 FCN 相比,它的优势在于可以利用局部解剖信息;与图像级 CNN 相比,极小的卷积核大幅降低了训练数据需求。与传统模型拟合和体素级 FCN 的对比评估表明,仅用单个受试者训练的 Patch-CNN 能同时改进六方向 DWI 的标量 dMRI 参数和纤维方向估计,而改进的纤维方向估计进一步提升了纤维束成像的质量。
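
The core architectural point (a minimal 3×3×3 convolutional receptive field mapping six DWI channels to the six independent diffusion-tensor components) can be sketched as below. Channel counts and the number of layers are assumptions; the actual Patch-CNN design and training details are in the paper.

```python
import torch
import torch.nn as nn

class PatchCNN(nn.Module):
    """Hypothetical sketch: 6 diffusion-weighted volumes in, 6 tensor components out,
    using only 3x3x3 kernels so the receptive field stays a small local patch."""
    def __init__(self, in_ch=6, out_ch=6, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_ch, hidden, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(hidden, hidden, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(hidden, out_ch, kernel_size=1),   # per-voxel tensor components
        )

    def forward(self, dwi):
        # dwi: (batch, 6, D, H, W) -> (batch, 6, D, H, W): Dxx, Dxy, Dxz, Dyy, Dyz, Dzz
        return self.net(dwi)

model = PatchCNN()
dwi = torch.randn(1, 6, 32, 32, 32)
dt = model(dwi)
print(dt.shape)   # torch.Size([1, 6, 32, 32, 32])
```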

Robust Uncertainty Estimation for Classification of Maritime Objects

  • paper_url: http://arxiv.org/abs/2307.01325
  • repo_url: None
  • paper_authors: Jonathan Becktor, Frederik Scholler, Evangelos Boukas, Lazaros Nalpantidis
  • for: 这篇论文的目的是探讨在海上领域中使用不确定性估计的可能性,并在具有各种硬件和软件限制的实际场景中进行评估。
  • methods: 这篇论文使用蒙特卡洛随机失活(Monte Carlo Dropout)来获得类内不确定性,并结合离群检测领域的最新成果,以得到更全面的不确定性度量。
  • results: 该论文的实验结果显示,通过将Monte Carlo Dropout与异常检测技术结合使用,可以提高FPR95的性能,相比之下当模型没有异常数据训练时,该方法的性能提高了8%。此外,相比于基本实现的宽度网络,该方法可以提高性能 by 77%。此外, authors还释放了SHIPS数据集,并证明了该方法的有效性,将FPR95提高了44.2%。
    Abstract We explore the use of uncertainty estimation in the maritime domain, showing the efficacy on toy datasets (CIFAR10) and proving it on an in-house dataset, SHIPS. We present a method joining the intra-class uncertainty achieved using Monte Carlo Dropout, with recent discoveries in the field of outlier detection, to gain more holistic uncertainty measures. We explore the relationship between the introduced uncertainty measures and examine how well they work on CIFAR10 and in a real-life setting. Our work improves the FPR95 by 8% compared to the current highest-performing work when the models are trained without out-of-distribution data. We increase the performance by 77% compared to a vanilla implementation of the Wide ResNet. We release the SHIPS dataset and show the effectiveness of our method by improving the FPR95 by 44.2% with respect to the baseline. Our approach is model agnostic, easy to implement, and often does not require model retraining.
    摘要 我们探索了不确定性估计在海事领域中的应用,先在玩具数据集(CIFAR10)上展示其有效性,再在自有数据集 SHIPS 上加以验证。我们提出了一种将 Monte Carlo Dropout 得到的类内不确定性与离群检测领域的最新发现相结合的方法,以获得更全面的不确定性度量。我们研究了所引入的各种不确定性度量之间的关系,并考察它们在 CIFAR10 和真实场景中的效果。在模型未使用分布外数据训练的情况下,我们的工作将 FPR95 相比当前性能最高的工作提升了 8%;相比 Wide ResNet 的朴素实现,性能提升了 77%。我们发布了 SHIPS 数据集,并通过将 FPR95 相对基线提升 44.2% 证明了方法的有效性。我们的方法与模型无关、易于实现,且通常不需要重新训练模型。
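
Monte Carlo Dropout itself is easy to sketch: keep dropout active at test time, run several stochastic forward passes, and use the spread of the softmax outputs as an intra-class uncertainty signal. The small network, number of passes, and entropy-based score below are illustrative; the paper combines this with a separate outlier-detection signal.

```python
import torch
import torch.nn as nn

class SmallClassifier(nn.Module):
    def __init__(self, in_dim=128, n_classes=10, p_drop=0.3):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Dropout(p_drop),
                                 nn.Linear(256, n_classes))
    def forward(self, x):
        return self.net(x)

def mc_dropout_predict(model, x, n_passes=30):
    model.train()               # keep dropout active at inference time
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(n_passes)])
    mean_probs = probs.mean(dim=0)
    entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=-1)
    return mean_probs, entropy  # entropy acts as the uncertainty score

model = SmallClassifier()
x = torch.randn(4, 128)         # e.g. embeddings of maritime images
probs, unc = mc_dropout_predict(model, x)
print(probs.shape, unc)
```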

Density-based Feasibility Learning with Normalizing Flows for Introspective Robotic Assembly

  • paper_url: http://arxiv.org/abs/2307.01317
  • repo_url: https://github.com/DLR-RM/GRACE
  • paper_authors: Jianxiang Feng, Matan Atad, Ismael Rodríguez, Maximilian Durner, Stephan Günnemann, Rudolph Triebel
  • for: 本研究旨在提高机器学习(ML)模型在机器人组装序列规划(RASP)中的 introspection 能力,以避免效率下降。
  • methods: 本研究提出了一种基于密度的可行性学习方法,不需要非可行示例。具体来说,我们将可行性学习问题转化为Out-of-Distribution(OOD)探测问题,使用Normalizing Flows(NF)来估计复杂的概率分布。
  • results: 在机器人组装用例中,提出的方法比单类基elines表现出色地探测不可行的组装。我们还进一步调查了我们方法的内部工作机制,发现可以通过高级变体NF实现很大的内存节省。
    Abstract Machine Learning (ML) models in Robotic Assembly Sequence Planning (RASP) need to be introspective on the predicted solutions, i.e. whether they are feasible or not, to circumvent potential efficiency degradation. Previous works need both feasible and infeasible examples during training. However, the infeasible ones are hard to collect sufficiently when re-training is required for swift adaptation to new product variants. In this work, we propose a density-based feasibility learning method that requires only feasible examples. Concretely, we formulate the feasibility learning problem as Out-of-Distribution (OOD) detection with Normalizing Flows (NF), which are powerful generative models for estimating complex probability distributions. Empirically, the proposed method is demonstrated on robotic assembly use cases and outperforms other single-class baselines in detecting infeasible assemblies. We further investigate the internal working mechanism of our method and show that a large memory saving can be obtained based on an advanced variant of NF.
    摘要 机器学习(ML)模型在机器人组装序列规划(RASP)中需要对其预测的解决方案具备内省能力(即判断方案是否可行),以避免效率下降。先前的工作在训练时同时需要可行与不可行两类示例;然而,当需要重新训练以快速适配新产品变体时,很难收集到足够的不可行示例。在本工作中,我们提出一种基于密度的可行性学习方法,仅需可行示例。具体而言,我们将可行性学习问题形式化为分布外(OOD)检测,并使用归一化流(Normalizing Flows,NF)这一强大的生成模型来估计复杂的概率分布。实验表明,所提方法在机器人组装用例中优于其他单类基线,能够有效检测不可行的组装。我们进一步分析了方法的内部工作机制,并表明基于 NF 的一种高级变体可以实现大量内存节省。

Towards Safe Autonomous Driving Policies using a Neuro-Symbolic Deep Reinforcement Learning Approach

  • paper_url: http://arxiv.org/abs/2307.01316
  • repo_url: https://github.com/cav-research-lab/safe-reinforcement-learning-using-symbolic-logical-programming-for-autonomous-highway-driving
  • paper_authors: Iman Sharifi, Mustafa Yildirim, Saber Fallah
  • for: 本研究旨在开发一种能够在真实环境中学习自动驾驶策略,并确保安全性的神经符号逻辑深度学习方法(DRLSL)。
  • methods: 本方法结合神经网络学习和符号逻辑推理,以便在真实环境中学习自动驾驶策略,并且能够保证安全性。
  • results: 我们在使用高D数据集进行实践中,发现DRLSL方法可以避免不安全行为,并且在训练和测试阶段都能够快速 converges。此外,我们的结果还表明,DRLSL方法在面对新的驾驶场景时能够更好地泛化。
    Abstract The dynamic nature of driving environments and the presence of diverse road users pose significant challenges for decision-making in autonomous driving. Deep reinforcement learning (DRL) has emerged as a popular approach to tackle this problem. However, the application of existing DRL solutions is mainly confined to simulated environments due to safety concerns, impeding their deployment in real-world. To overcome this limitation, this paper introduces a novel neuro-symbolic model-free DRL approach, called DRL with Symbolic Logics (DRLSL) that combines the strengths of DRL (learning from experience) and symbolic first-order logics (knowledge-driven reasoning) to enable safe learning in real-time interactions of autonomous driving within real environments. This innovative approach provides a means to learn autonomous driving policies by actively engaging with the physical environment while ensuring safety. We have implemented the DRLSL framework in autonomous driving using the highD dataset and demonstrated that our method successfully avoids unsafe actions during both the training and testing phases. Furthermore, our results indicate that DRLSL achieves faster convergence during training and exhibits better generalizability to new driving scenarios compared to traditional DRL methods.
    摘要 驾驶环境的动态性和道路使用者的多样性给自动驾驶决策带来了重大挑战。深度强化学习(DRL)已成为解决这一问题的流行方法。然而,出于安全考虑,现有 DRL 方案的应用主要局限于模拟环境,阻碍了其在真实世界中的部署。为突破这一限制,本文提出了一种新的神经-符号、无模型 DRL 方法,称为带符号逻辑的 DRL(DRLSL)。该方法结合了 DRL(从经验中学习)与符号一阶逻辑(知识驱动的推理)的优势,使自动驾驶能够在真实环境的实时交互中安全地学习。我们基于 highD 数据集在自动驾驶中实现了 DRLSL 框架,并证明该方法在训练和测试阶段都能够避免不安全行为。此外,结果表明,与传统 DRL 方法相比,DRLSL 在训练阶段收敛更快,并且对新驾驶场景具有更好的泛化能力。
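
The neuro-symbolic combination can be caricatured as a safety shield: the DRL policy proposes an action, and hand-written first-order-style rules veto actions that violate safety constraints before they reach the environment. The rules, state fields, and fallback action below are invented for illustration; the paper encodes its knowledge with symbolic logical programming rather than plain Python predicates.

```python
import random

def unsafe(state, action):
    """Hypothetical symbolic safety rules for highway driving."""
    if action == "accelerate" and state["gap_ahead_m"] < 15:
        return True                       # too close to the leading vehicle
    if action == "lane_change_left" and state["left_lane_occupied"]:
        return True                       # target lane not free
    return False

def shielded_action(policy, state, actions, fallback="keep_lane"):
    a = policy(state)                     # action proposed by the DRL agent
    if not unsafe(state, a):
        return a
    safe = [b for b in actions if not unsafe(state, b)]
    return safe[0] if safe else fallback  # override with a safe alternative

# toy usage with a random stand-in "policy"
actions = ["accelerate", "brake", "keep_lane", "lane_change_left"]
policy = lambda s: random.choice(actions)
state = {"gap_ahead_m": 8.0, "left_lane_occupied": True}
print(shielded_action(policy, state, actions))
```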

A numerical algorithm for attaining the Chebyshev bound in optimal learning

  • paper_url: http://arxiv.org/abs/2307.01304
  • repo_url: None
  • paper_authors: Pradyumna Paruchuri, Debasish Chatterjee
  • for: 解决从有限数据点集中最优恢复(学习)函数的问题
  • methods: 基于近似求解凸半无穷问题的目标采样技术
  • results: 计算假设空间的切比雪夫半径和切比雪夫中心,从而求解函数的最优恢复问题
    Abstract Given a compact subset of a Banach space, the Chebyshev center problem consists of finding a minimal circumscribing ball containing the set. In this article we establish a numerically tractable algorithm for solving the Chebyshev center problem in the context of optimal learning from a finite set of data points. For a hypothesis space realized as a compact but not necessarily convex subset of a finite-dimensional subspace of some underlying Banach space, this algorithm computes the Chebyshev radius and the Chebyshev center of the hypothesis space, thereby solving the problem of optimal recovery of functions from data. The algorithm itself is based on, and significantly extends, recent results for near-optimal solutions of convex semi-infinite problems by means of targeted sampling, and it is of independent interest. Several examples of numerical computations of Chebyshev centers are included in order to illustrate the effectiveness of the algorithm.
    摘要 给定 Banach 空间中的一个紧子集,切比雪夫中心问题是寻找包含该集合的最小外接球。本文在从有限数据点进行最优学习的背景下,建立了一个数值上可行的算法来求解切比雪夫中心问题。当假设空间实现为某个 Banach 空间的有限维子空间中的一个紧(但不必凸)子集时,该算法可以计算假设空间的切比雪夫半径和切比雪夫中心,从而求解从数据中最优恢复函数的问题。该算法基于并显著扩展了近期通过目标采样求解凸半无穷问题近似最优解的结果,本身也具有独立的价值。文中还给出了若干切比雪夫中心的数值计算例子,以说明算法的有效性。
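
For a finite point cloud in Euclidean space, the Chebyshev center and radius can be obtained by minimizing the maximum distance to the points; the sketch below does this with a generic derivative-free optimizer as a rough illustration. It is not the paper's semi-infinite sampling algorithm, and Nelder-Mead on the max-distance objective is just one convenient, approximate choice.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
points = rng.normal(size=(40, 3))                # a finite "hypothesis set" in R^3

def max_dist(c):
    return np.max(np.linalg.norm(points - c, axis=1))

res = minimize(max_dist, x0=points.mean(axis=0), method="Nelder-Mead",
               options={"xatol": 1e-8, "fatol": 1e-8, "maxiter": 20000})
center, radius = res.x, res.fun
print("Chebyshev center (approx):", np.round(center, 3))
print("Chebyshev radius (approx):", round(radius, 4))
# Every point lies inside the ball of this radius around the center:
print(np.all(np.linalg.norm(points - center, axis=1) <= radius + 1e-6))
```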

Pareto-Secure Machine Learning (PSML): Fingerprinting and Securing Inference Serving Systems

  • paper_url: http://arxiv.org/abs/2307.01292
  • repo_url: None
  • paper_authors: Debopam Sanyal, Jui-Tse Hung, Manav Agrawal, Prahlad Jasti, Shahab Nikkhoo, Somesh Jha, Tianhao Wang, Sibin Mohan, Alexey Tumanov
  • for: 本研究探讨基于服务端模型库(model zoo)的实时网络应用的安全性,具体而言是其对模型抽取攻击的鲁棒性。
  • methods: 本研究提出了一种查询高效的指纹识别算法,使攻击者能够稳定地触发任意指定的模型;同时提出了一种基于噪声的防御机制,通过向指定的性能指标注入噪声来阻止指纹识别。
  • results: 结果表明,利用该指纹识别算法,模型抽取的保真度与准确率与攻击单个显式指定模型时相差在 $1\%$ 以内,并且相比朴素攻击最多提升 $14.6\%$ 的准确率和 $7.7\%$ 的保真度;所提防御机制可将攻击的准确率和保真度分别降低最多 $9.8\%$ 和 $4.8\%$,同时在保持可接受吞吐量($>80\%$)的前提下提供可配置的保护。
    Abstract Model-serving systems have become increasingly popular, especially in real-time web applications. In such systems, users send queries to the server and specify the desired performance metrics (e.g., desired accuracy, latency). The server maintains a set of models (model zoo) in the back-end and serves the queries based on the specified metrics. This paper examines the security, specifically robustness against model extraction attacks, of such systems. Existing black-box attacks assume a single model can be repeatedly selected for serving inference requests. Modern inference serving systems break this assumption. Thus, they cannot be directly applied to extract a victim model, as models are hidden behind a layer of abstraction exposed by the serving system. An attacker can no longer identify which model she is interacting with. To this end, we first propose a query-efficient fingerprinting algorithm to enable the attacker to trigger any desired model consistently. We show that by using our fingerprinting algorithm, model extraction can have fidelity and accuracy scores within $1\%$ of the scores obtained when attacking a single, explicitly specified model, as well as up to $14.6\%$ gain in accuracy and up to $7.7\%$ gain in fidelity compared to the naive attack. Second, we counter the proposed attack with a noise-based defense mechanism that thwarts fingerprinting by adding noise to the specified performance metrics. The proposed defense strategy reduces the attack's accuracy and fidelity by up to $9.8\%$ and $4.8\%$, respectively (on medium-sized model extraction). Third, we show that the proposed defense induces a fundamental trade-off between the level of protection and system goodput, achieving configurable and significant victim model extraction protection while maintaining acceptable goodput ($>80\%$). We implement the proposed defense in a real system with plans to open source.
    摘要 模型服务系统在实时网络应用中越来越流行:用户向服务器发送查询并指定期望的性能指标(例如精度、延迟),服务器在后端维护一个模型库(model zoo),并根据指定指标选择模型来响应查询。本文研究此类系统的安全性,特别是其对模型抽取攻击的鲁棒性。现有的黑盒攻击假设可以反复选中同一个模型来响应推理请求;而现代推理服务系统打破了这一假设——模型隐藏在服务系统暴露的抽象层之后,攻击者无法确定自己正在与哪个模型交互,因此现有攻击无法直接用于抽取受害模型。为此,我们首先提出了一种查询高效的指纹识别算法,使攻击者能够稳定地触发任意指定的模型。我们显示,借助该指纹算法,模型抽取的保真度与准确率可以达到与攻击单个显式指定模型时相差 $1\%$ 以内的水平,并且相比朴素攻击最多可提升 $14.6\%$ 的准确率和 $7.7\%$ 的保真度。其次,我们提出一种基于噪声的防御机制,通过向指定的性能指标注入噪声来阻止指纹识别。该防御策略可将攻击的准确率和保真度分别降低最多 $9.8\%$ 和 $4.8\%$(在中等规模的模型抽取任务上)。最后,我们表明该防御在保护强度与系统吞吐量之间引入了一个基本权衡:在保持可接受吞吐量($>80\%$)的同时,能够提供可配置且显著的受害模型抽取保护。我们在真实系统中实现了该防御机制,并计划将其开源。

Fighting the disagreement in Explainable Machine Learning with consensus

  • paper_url: http://arxiv.org/abs/2307.01288
  • repo_url: None
  • paper_authors: Antonio Jesus Banegas-Luna, Carlos Martınez-Cortes, Horacio Perez-Sanchez
  • for: 本研究旨在解释机器学习模型的内部工作方式,以提高模型的可解释性。
  • methods: 本研究使用了多种可解释性算法,包括本研究所开发的一种新的共识函数,以解释五种机器学习模型。
  • results: 研究结果显示,提出的函数比其他函数更公平,提供了更一致和准确的解释。
    Abstract Machine learning (ML) models are often valued by the accuracy of their predictions. However, in some areas of science, the inner workings of models are as relevant as their accuracy. To understand how ML models work internally, the use of interpretability algorithms is the preferred option. Unfortunately, despite the diversity of algorithms available, they often disagree in explaining a model, leading to contradictory explanations. To cope with this issue, consensus functions can be applied once the models have been explained. Nevertheless, the problem is not completely solved because the final result will depend on the selected consensus function and other factors. In this paper, six consensus functions have been evaluated for the explanation of five ML models. The models were previously trained on four synthetic datasets whose internal rules were known in advance. The models were then explained with model-agnostic local and global interpretability algorithms. Finally, consensus was calculated with six different functions, including one developed by the authors. The results demonstrated that the proposed function is fairer than the others and provides more consistent and accurate explanations.
    摘要 机器学习(ML)模型通常以其预测准确率来评价。然而,在一些科学领域中,模型的内部工作方式与其准确率同样重要。为了理解 ML 模型的内部机制,使用可解释性算法是首选方案。遗憾的是,尽管可用的算法种类繁多,它们在解释同一个模型时往往互相矛盾,给出相互冲突的解释。为应对这一问题,可以在模型被解释之后应用共识函数。然而,问题并未彻底解决,因为最终结果仍取决于所选的共识函数及其他因素。本文评估了六种共识函数(包括作者提出的一种)用于解释五个 ML 模型。这些模型先在四个内部规则已知的合成数据集上训练,随后用与模型无关的局部和全局可解释性算法进行解释,最后用六种不同的函数计算共识。结果表明,所提出的函数比其他函数更公平,能给出更一致、更准确的解释。
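
A consensus over explainers can be sketched very simply: normalize the feature-importance vector produced by each interpretability method and aggregate them, e.g. with a median, which is less sensitive to a single disagreeing explainer than a mean. The normalization and the median aggregation are generic choices for illustration, not the specific consensus function proposed by the authors.

```python
import numpy as np

def normalize(importances):
    imp = np.abs(np.asarray(importances, dtype=float))
    s = imp.sum()
    return imp / s if s > 0 else imp

def consensus_explanation(explanations, agg=np.median):
    """explanations: dict explainer_name -> feature-importance vector."""
    stacked = np.stack([normalize(v) for v in explanations.values()])
    return agg(stacked, axis=0)

# toy example: three explainers disagreeing on four features
explanations = {
    "shap_like":   [0.50, 0.30, 0.15, 0.05],
    "perm_import": [0.10, 0.60, 0.20, 0.10],
    "saliency":    [0.45, 0.35, 0.10, 0.10],
}
print(np.round(consensus_explanation(explanations), 3))
```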

Using BOLD-fMRI to Compute the Respiration Volume per Time (RTV) and Respiration Variation (RV) with Convolutional Neural Networks (CNN) in the Human Connectome Development Cohort

  • paper_url: http://arxiv.org/abs/2307.05426
  • repo_url: None
  • paper_authors: Abdoljalil Addeh, Fernando Vega, Rebecca J Williams, Ali Golestani, G. Bruce Pike, M. Ethan MacDonald
  • for: 这个研究的目的是提高fMRI研究中肺功能信号的质量和可用性。
  • methods: 该研究使用一种一维 convolutional neural network(CNN)模型来重建两种肺功能指标,即RV和RVT。
  • results: 研究结果表明,CNN模型可以从休息BOLD信号中捕捉有用的特征,并重建实际的RV和RVT时间序列。
    Abstract In many fMRI studies, respiratory signals are unavailable or do not have acceptable quality. Consequently, the direct removal of low-frequency respiratory variations from BOLD signals is not possible. This study proposes a one-dimensional CNN model for reconstruction of two respiratory measures, RV and RVT. Results show that a CNN can capture informative features from resting BOLD signals and reconstruct realistic RV and RVT timeseries. It is expected that application of the proposed method will lower the cost of fMRI studies, reduce complexity, and decrease the burden on participants as they will not be required to wear a respiratory bellows.
    摘要 很多fMRI研究中的呼吸信号不可用或者质量不良。因此,直接从BOLD信号中除去低频呼吸变化是不可能的。本研究提出了一种一维 convolutional neural network(CNN)模型,用于重建两个呼吸指标:RV和RVT。结果显示,CNN可以从休息BOLD信号中捕捉有用的特征,重建真实的RV和RVT时间序列。预计该方法的应用将降低fMRI研究的成本,降低复杂性,并减少参与者的负担,因为他们不需要穿戴呼吸膜。

NeuBTF: Neural fields for BTF encoding and transfer

  • paper_url: http://arxiv.org/abs/2307.01199
  • repo_url: None
  • paper_authors: Carlos Rodriguez-Pardo, Konstantinos Kazatzis, Jorge Lopez-Moreno, Elena Garces
  • for: 这篇论文旨在提出一种新的神经网络材料表示方法,用于解决神经网络材料的固定性问题,以便在渲染中使用。
  • methods: 该方法使用神经网络来表示材料,并使用引导图像来控制神经网络的输出。在测试时,该方法可以使用UV、摄像头和光照向量来查询神经网络的输出。
  • results: 该方法可以在多种 sintetic和实际材料上达到竞争性的压缩率,并且可以通过引导图像来控制神经网络的输出。
    Abstract Neural material representations are becoming a popular way to represent materials for rendering. They are more expressive than analytic models and occupy less memory than tabulated BTFs. However, existing neural materials are immutable, meaning that their output for a certain query of UVs, camera, and light vector is fixed once they are trained. While this is practical when there is no need to edit the material, it can become very limiting when the fragment of the material used for training is too small or not tileable, which frequently happens when the material has been captured with a gonioreflectometer. In this paper, we propose a novel neural material representation which jointly tackles the problems of BTF compression, tiling, and extrapolation. At test time, our method uses a guidance image as input to condition the neural BTF to the structural features of this input image. Then, the neural BTF can be queried as a regular BTF using UVs, camera, and light vectors. Every component in our framework is purposefully designed to maximize BTF encoding quality at minimal parameter count and computational complexity, achieving competitive compression rates compared with previous work. We demonstrate the results of our method on a variety of synthetic and captured materials, showing its generality and capacity to learn to represent many optical properties.
    摘要 神经材料表示法是现代渲染中广泛应用的一种表示方法。它比分析模型更加表达力,且占用内存更少,但现有的神经材料都是不可变的,意味着它们的输出对于特定的UV、摄像机和光量向量的训练后就是固定的。这在材料的预测中是有用的,但在材料需要编辑时可能变得非常限制性。在这篇论文中,我们提出了一种新的神经材料表示方法,该方法同时解决了BTF压缩、瓦片和推导问题。在测试时,我们使用导航图像作为输入,通过conditioning神经BTF于这个输入图像的结构特征来控制神经BTF。然后,神经BTF可以被查询作为普通BTF使用UV、摄像机和光量向量。我们的框架中每个组件都是为最大化BTF编码质量而设计,而且减少参数计数和计算复杂度,与之前的工作相比,我们的方法实现了竞争力的压缩率。我们在多种 sintetic和捕捉的材料上进行了试验,展示了我们的方法的通用性和能力学习表示多种光学性质。

Improved sampling via learned diffusions

  • paper_url: http://arxiv.org/abs/2307.01198
  • repo_url: None
  • paper_authors: Lorenz Richter, Julius Berner, Guan-Horng Liu
  • for: 这些论文提出了基于深度学习的方法,用于从未归一化的目标分布中采样。
  • methods: 这些方法可以视为薛定谔桥问题的特例,即寻找从给定先验分布到指定目标分布之间最有可能的随机演化。
  • results: 本文引入了一种变分形式,基于时间反转扩散过程在路径空间测度之间的散度。这一抽象视角给出了可用基于梯度的算法优化的实用损失,并将先前的目标函数作为特例涵盖在内。此外,它还允许考虑除反向 Kullback-Leibler 散度之外的其他散度,以避免模式塌缩。特别地,本文提出了对数方差损失,它在数值上表现出良好的性质,并在所有考虑的方法上带来显著的性能提升。
    Abstract Recently, a series of papers proposed deep learning-based approaches to sample from unnormalized target densities using controlled diffusion processes. In this work, we identify these approaches as special cases of the Schr\"odinger bridge problem, seeking the most likely stochastic evolution between a given prior distribution and the specified target. We further generalize this framework by introducing a variational formulation based on divergences between path space measures of time-reversed diffusion processes. This abstract perspective leads to practical losses that can be optimized by gradient-based algorithms and includes previous objectives as special cases. At the same time, it allows us to consider divergences other than the reverse Kullback-Leibler divergence that is known to suffer from mode collapse. In particular, we propose the so-called log-variance loss, which exhibits favorable numerical properties and leads to significantly improved performance across all considered approaches.
    摘要 最近,一系列论文提出了基于深度学习、利用受控扩散过程从未归一化目标密度中采样的方法。在本工作中,我们将这些方法识别为薛定谔桥问题的特例,即寻找给定先验分布与指定目标分布之间最有可能的随机演化。我们进一步推广这一框架,引入基于时间反转扩散过程路径空间测度之间散度的变分形式。这种抽象视角导出了可以用基于梯度的算法优化的实用损失,并将先前的目标函数作为特例包含在内。同时,它允许我们考虑不同于反向 Kullback-Leibler 散度(已知其容易出现模式塌缩)的其他散度。特别地,我们提出了所谓的对数方差损失,它具有良好的数值性质,并在所有考虑的方法上带来显著的性能提升。

Squeezing Large-Scale Diffusion Models for Mobile

  • paper_url: http://arxiv.org/abs/2307.01193
  • repo_url: None
  • paper_authors: Jiwoong Choi, Minkyu Kim, Daehyun Ahn, Taesu Kim, Yulhwa Kim, Dongwon Jo, Hyesung Jeon, Jae-Joon Kim, Hyungjun Kim
  • for: 这 paper 旨在探讨将 Stable Diffusion 模型部署到移动设备上,以便实现高精度图像生成。
  • methods: 该 paper 使用 TensorFlow Lite 框架来实现移动设备上的 Stable Diffusion 部署,并支持 iOS 和 Android 设备。
  • results: 该 paper 实现的 Mobile Stable Diffusion 可以在 Android 设备上 achieve 512x512 图像生成的推理延迟时间小于 7 秒,并且可以在移动 GPU 上实现。
    Abstract The emergence of diffusion models has greatly broadened the scope of high-fidelity image synthesis, resulting in notable advancements in both practical implementation and academic research. With the active adoption of the model in various real-world applications, the need for on-device deployment has grown considerably. However, deploying large diffusion models such as Stable Diffusion with more than one billion parameters to mobile devices poses distinctive challenges due to the limited computational and memory resources, which may vary according to the device. In this paper, we present the challenges and solutions for deploying Stable Diffusion on mobile devices with TensorFlow Lite framework, which supports both iOS and Android devices. The resulting Mobile Stable Diffusion achieves the inference latency of smaller than 7 seconds for a 512x512 image generation on Android devices with mobile GPUs.
    摘要 Diffusion模型的出现已经极大地扩大了高精度图像生成的范围,导致了实践部署和学术研究中的重要进步。然而,将大型Diffusion模型,如Stable Diffusion,deploy到移动设备上具有限制的计算和内存资源的问题。在这篇文章中,我们介绍了将Stable Diffusion部署到移动设备上的挑战和解决方案,使用TensorFlow Lite框架支持iOS和Android设备。我们的Mobile Stable Diffusion实现了512x512像素生成的推理延迟低于7秒钟在Android设备上。

Trainable Transformer in Transformer

  • paper_url: http://arxiv.org/abs/2307.01189
  • repo_url: https://github.com/abhishekpanigrahi1996/transformer_in_transformer
  • paper_authors: Abhishek Panigrahi, Sadhika Malladi, Mengzhou Xia, Sanjeev Arora
  • for: 这个论文目的是提出一种高效的Transformer模型内部精细调整方法,以便在推理过程中进行精细调整。
  • methods: 这个方法使用了一些创新的近似技术,使得一个少于20亿参数的TinT模型能够在单次前向传播中模拟并精细调整一个1.25亿(125M)参数的Transformer模型。
  • results: 在语言模型和下游任务中进行综合实验 validate了TinT模型的内部精细调整过程,并证明了大型预训练语言模型可以执行复杂的子任务。例如,even with a limited one-step budget, we observe TinT for a OPT-125M model improves performance by 4-16% absolute on average compared to OPT-125M。
    Abstract Recent works attribute the capability of in-context learning (ICL) in large pre-trained language models to implicitly simulating and fine-tuning an internal model (e.g., linear or 2-layer MLP) during inference. However, such constructions require large memory overhead, which makes simulation of more sophisticated internal models intractable. In this work, we propose an efficient construction, Transformer in Transformer (in short, TinT), that allows a transformer to simulate and fine-tune complex models internally during inference (e.g., pre-trained language models). In particular, we introduce innovative approximation techniques that allow a TinT model with less than 2 billion parameters to simulate and fine-tune a 125 million parameter transformer model within a single forward pass. TinT accommodates many common transformer variants and its design ideas also improve the efficiency of past instantiations of simple models inside transformers. We conduct end-to-end experiments to validate the internal fine-tuning procedure of TinT on various language modeling and downstream tasks. For example, even with a limited one-step budget, we observe TinT for a OPT-125M model improves performance by 4-16% absolute on average compared to OPT-125M. These findings suggest that large pre-trained language models are capable of performing intricate subroutines. To facilitate further work, a modular and extensible codebase for TinT is included.
    摘要 近期研究将大型预训练语言模型的上下文学习(ICL)能力归因于其在推理过程中隐式地模拟并微调一个内部模型(例如线性模型或2层MLP)。然而,这类构造需要很大的内存开销,使得模拟更复杂的内部模型变得不可行。在这项工作中,我们提出了一种高效的构造——Transformer in Transformer(简称TinT),允许 Transformer 在推理过程中内部模拟并微调复杂模型(例如预训练语言模型)。具体来说,我们提出了创新的近似技术,使得参数量不到20亿的 TinT 模型能够在单次前向传播中模拟并微调一个1.25亿参数的 Transformer 模型。TinT 支持许多常见的 Transformer 变体,其设计思想也提高了以往在 Transformer 内部实例化简单模型的效率。我们通过端到端实验,在多种语言建模和下游任务上验证了 TinT 的内部微调过程。例如,即便只有一步的预算,我们观察到针对 OPT-125M 的 TinT 相比 OPT-125M 平均绝对提升 4-16%。这些发现表明大规模预训练语言模型能够执行复杂的子程序。为便于后续工作,我们提供了模块化、可扩展的 TinT 代码库。

Fitting an ellipsoid to a quadratic number of random points

  • paper_url: http://arxiv.org/abs/2307.01181
  • repo_url: None
  • paper_authors: Afonso S. Bandeira, Antoine Maillard, Shahar Mendelson, Elliot Paquette
  • for: 该论文研究在 $n, d \to \infty$ 时,将 $\mathbb{R}^d$ 中 $n$ 个标准高斯随机向量拟合到一个中心椭球边界上的可行性问题。
  • methods: 该论文使用了 Bartl & Mendelson 关于随机向量 Gram 矩阵集中性的结果(只需对尾部行为做温和假设),据此改进了先前的方法,给出一个简单的证明。
  • results: 证明了当 $n \leq d^2 / C$(其中 $C > 0$ 是一个可能很大的常数)时,问题 $(\mathrm{P})$ 以高概率可行。
    Abstract We consider the problem $(\mathrm{P})$ of fitting $n$ standard Gaussian random vectors in $\mathbb{R}^d$ to the boundary of a centered ellipsoid, as $n, d \to \infty$. This problem is conjectured to have a sharp feasibility transition: for any $\varepsilon > 0$, if $n \leq (1 - \varepsilon) d^2 / 4$ then $(\mathrm{P})$ has a solution with high probability, while $(\mathrm{P})$ has no solutions with high probability if $n \geq (1 + \varepsilon) d^2 /4$. So far, only a trivial bound $n \geq d^2 / 2$ is known on the negative side, while the best results on the positive side assume $n \leq d^2 / \mathrm{polylog}(d)$. In this work, we improve over previous approaches using a key result of Bartl & Mendelson on the concentration of Gram matrices of random vectors under mild assumptions on their tail behavior. This allows us to give a simple proof that $(\mathrm{P})$ is feasible with high probability when $n \leq d^2 / C$, for a (possibly large) constant $C > 0$.
    摘要 我们考虑如下问题 $(\mathrm{P})$:当 $n, d \to \infty$ 时,把 $\mathbb{R}^d$ 中的 $n$ 个标准高斯随机向量拟合到某个中心椭球的边界上。该问题被猜想存在一个明显的可行性相变:对任意 $\varepsilon > 0$,若 $n \leq (1 - \varepsilon) d^2 / 4$,则 $(\mathrm{P})$ 以高概率有解;若 $n \geq (1 + \varepsilon) d^2 /4$,则 $(\mathrm{P})$ 以高概率无解。目前在否定方向上只知道平凡的界 $n \geq d^2 / 2$,而在肯定方向上最好的结果要求 $n \leq d^2 / \mathrm{polylog}(d)$。在本工作中,我们借助 Bartl & Mendelson 关于随机向量 Gram 矩阵集中性的关键结果(只需对尾部行为做温和假设),改进了先前的方法,给出一个简单的证明:当 $n \leq d^2 / C$ 时($C > 0$ 为可能较大的常数),$(\mathrm{P})$ 以高概率可行。
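
The feasibility question can be probed numerically: fitting an ellipsoid through points x_1, ..., x_n means finding a positive semidefinite matrix A with x_i^T A x_i = 1 for all i. The sketch below looks for A = I/d plus a minimum-norm symmetric correction obtained by least squares, then reports the smallest eigenvalue and the interpolation residual; a nonnegative smallest eigenvalue with a tiny residual certifies feasibility for that sample. This is only a rough empirical probe of the n versus d^2/4 regime, not the proof technique of the paper.

```python
import numpy as np

def fit_ellipsoid(X):
    n, d = X.shape
    idx = [(i, j) for i in range(d) for j in range(i, d)]
    # design matrix over the d(d+1)/2 free entries of a symmetric matrix
    M = np.array([[(1 if i == j else 2) * x[i] * x[j] for (i, j) in idx] for x in X])
    # look for A = I/d + Delta and solve for the min-norm correction Delta
    rhs = 1.0 - (X ** 2).sum(axis=1) / d
    delta, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    A = np.eye(d) / d
    for val, (i, j) in zip(delta, idx):
        A[i, j] += val
        if i != j:
            A[j, i] += val
    eigmin = np.linalg.eigvalsh(A).min()
    residual = np.max(np.abs(np.einsum('ni,ij,nj->n', X, A, X) - 1.0))
    return eigmin, residual

rng = np.random.default_rng(0)
d = 20
for n in [d * d // 8, d * d // 4]:     # below vs near the conjectured threshold d^2/4
    X = rng.normal(size=(n, d))
    eigmin, residual = fit_ellipsoid(X)
    print(f"d={d}, n={n}: smallest eigenvalue {eigmin:+.4f}, residual {residual:.2e}")
```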

PlanE: Representation Learning over Planar Graphs

  • paper_url: http://arxiv.org/abs/2307.01180
  • repo_url: https://github.com/zzysonny/plane
  • paper_authors: Radoslav Dimitrov, Zeyang Zhao, Ralph Abboud, İsmail İlkan Ceylan
  • for: 本研究的目的是设计一个可以快速学习完整的平面图 isomorphism 的架构,以便在平面图上进行图像学习。
  • methods: 本研究使用了一种称为 PlanE 的框架,它是基于 Hopcroft 和 Tarjan 的平面图 isomorphism 算法。PlanE 包括一些可以学习完整的平面图 invariants 的架构,并且可以在实际上扩展到大规模的平面图。
  • results: 本研究透过实验验证了 PlanE 的模型架构,并取得了多个 state-of-the-art 的结果。在 well-known 平面图 benchmark 上,PlanE 的模型能够实现高效地学习完整的平面图 invariants。
    Abstract Graph neural networks are prominent models for representation learning over graphs, where the idea is to iteratively compute representations of nodes of an input graph through a series of transformations in such a way that the learned graph function is isomorphism invariant on graphs, which makes the learned representations graph invariants. On the other hand, it is well-known that graph invariants learned by these class of models are incomplete: there are pairs of non-isomorphic graphs which cannot be distinguished by standard graph neural networks. This is unsurprising given the computational difficulty of graph isomorphism testing on general graphs, but the situation begs to differ for special graph classes, for which efficient graph isomorphism testing algorithms are known, such as planar graphs. The goal of this work is to design architectures for efficiently learning complete invariants of planar graphs. Inspired by the classical planar graph isomorphism algorithm of Hopcroft and Tarjan, we propose PlanE as a framework for planar representation learning. PlanE includes architectures which can learn complete invariants over planar graphs while remaining practically scalable. We empirically validate the strong performance of the resulting model architectures on well-known planar graph benchmarks, achieving multiple state-of-the-art results.
    摘要 “图 neural networks 是 Representation learning over graphs 中的主要模型,其中的思想是通过一系列转换来计算输入图的节点的表示,以确定learned graph function 是isoformation invariant的,这使得learned representation 成为图 invariants。然而,已知这些类型的模型学习的图 invariants 是不完全的:存在一些非同构的图对标准图 neural networks 无法分辨。这不Surprising,因为计算通用图 isomorphism testing 的计算复杂度很高,但在特定的图类中,有高效的图 isomorphism testing 算法,如平面图。我们的目标是设计一种能够有效地学习完整的平面图 invariants的architecture。 draw inspiration from Hopcroft 和 Tarjan 的平面图 isomorphism 算法,我们提出 PlanE 框架,用于平面 representation learning。 PlanE 包括一些可以学习完整的平面图 invariants 的architecture,并且 remain practically scalable。我们通过实验证明了这些结果的强性,在well-known planar graph benchmarks 上达到多个state-of-the-art result。”

Learning Mixtures of Gaussians Using the DDPM Objective

  • paper_url: http://arxiv.org/abs/2307.01178
  • repo_url: None
  • paper_authors: Kulin Shah, Sitan Chen, Adam Klivans
  • for: 本文研究了 diffusion 模型可以学习 Gaussian mixture models 的参数。
  • methods: 本文使用了 gradient descent 算法,并证明了其可以高效地学习 Gaussian mixture models 的参数。
  • results: 本文证明了 gradient descent 算法可以在两种设置下高效地学习 Gaussian mixture models:1) 随机初始化下可以learns mixtures of two spherical Gaussians in $d$ dimensions with $1/\text{poly}(d)$-separated centers。2) warm start 下可以 learns mixtures of $K$ spherical Gaussians with $\Omega(\sqrt{\log(\min(K,d))})$-separated centers。
    Abstract Recent works have shown that diffusion models can learn essentially any distribution provided one can perform score estimation. Yet it remains poorly understood under what settings score estimation is possible, let alone when practical gradient-based algorithms for this task can provably succeed. In this work, we give the first provably efficient results along these lines for one of the most fundamental distribution families, Gaussian mixture models. We prove that gradient descent on the denoising diffusion probabilistic model (DDPM) objective can efficiently recover the ground truth parameters of the mixture model in the following two settings: 1) We show gradient descent with random initialization learns mixtures of two spherical Gaussians in $d$ dimensions with $1/\text{poly}(d)$-separated centers. 2) We show gradient descent with a warm start learns mixtures of $K$ spherical Gaussians with $\Omega(\sqrt{\log(\min(K,d))})$-separated centers. A key ingredient in our proofs is a new connection between score-based methods and two other approaches to distribution learning, the EM algorithm and spectral methods.
    摘要 近期研究表明,只要能够进行分数估计,扩散模型本质上可以学习任何分布。然而,目前仍不清楚在什么情形下分数估计是可行的,更不用说实用的基于梯度的算法何时能被证明有效。在本工作中,我们针对最基本的分布族之一——高斯混合模型——给出了这一方向上首批可证明高效的结果。我们证明,在去噪扩散概率模型(DDPM)目标上进行梯度下降,可以在以下两种设定中高效地恢复混合模型的真实参数:1)随机初始化的梯度下降可以学习 $d$ 维中中心间隔为 $1/\text{poly}(d)$ 的两个球形高斯的混合;2)带温启动(warm start)的梯度下降可以学习中心间隔为 $\Omega(\sqrt{\log(\min(K,d))})$ 的 $K$ 个球形高斯的混合。我们证明中的一个关键要素,是在基于分数的方法与另外两类分布学习方法(EM 算法与谱方法)之间建立的新联系。
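
The first setting can be reproduced in miniature: draw data from the mixture (1/2)N(mu*, I) + (1/2)N(-mu*, I), fix a single noise scale, and run gradient descent on a DDPM-style denoising objective where the denoiser has the Bayes-optimal form for a symmetric two-Gaussian mixture parameterized by mu. The single fixed noise level, step size, and dimensions below are simplifications assumed for illustration, not the paper's full analysis.

```python
import torch

torch.manual_seed(0)
d, sigma, n = 5, 1.0, 4096
mu_star = torch.randn(d) * 2.0                        # ground-truth mixture mean

def sample_mixture(n):
    signs = torch.randint(0, 2, (n, 1)).float() * 2 - 1
    return signs * mu_star + torch.randn(n, d)

def denoiser(y, mu, sigma):
    # Bayes-optimal E[x0 | y] for 0.5*N(mu, I) + 0.5*N(-mu, I), where y = x0 + sigma*eps
    w = torch.tanh(y @ mu / (1.0 + sigma ** 2))
    return (y + (sigma ** 2) * w.unsqueeze(-1) * mu) / (1.0 + sigma ** 2)

mu = torch.randn(d, requires_grad=True)               # random initialization
opt = torch.optim.SGD([mu], lr=0.3)
for step in range(3000):
    x0 = sample_mixture(n)
    y = x0 + sigma * torch.randn_like(x0)
    loss = ((denoiser(y, mu, sigma) - x0) ** 2).sum(dim=1).mean()   # denoising loss
    opt.zero_grad(); loss.backward(); opt.step()

err = min((mu - mu_star).norm(), (mu + mu_star).norm())   # sign of mu is unidentifiable
print(f"parameter error after training: {err.item():.3f}")
```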

Neural Hilbert Ladders: Multi-Layer Neural Networks in Function Space

  • paper_url: http://arxiv.org/abs/2307.01177
  • repo_url: None
  • paper_authors: Zhengdao Chen
  • for: 本文研究深度学习理论中神经网络(NN)定义的函数空间的特点。
  • methods: 作者将任意宽度的多层NN视为定义了一个特定层级的再生核希尔伯特空间(RKHS),称为神经希尔伯特阶梯(NHL)。
  • results: 作者证明了多层NN所表达的函数与NHL之间的对应关系,并在复杂度度量受控时给出了泛化保证;此外,作者推导出NHL的演化可刻画为多个随机场的动力学,并在不同激活函数下给出了NHL中深度分离的例子。
    Abstract The characterization of the functions spaces explored by neural networks (NNs) is an important aspect of deep learning theory. In this work, we view a multi-layer NN with arbitrary width as defining a particular hierarchy of reproducing kernel Hilbert spaces (RKHSs), named a Neural Hilbert Ladder (NHL). This allows us to define a function space and a complexity measure that generalize prior results for shallow NNs, and we then examine their theoretical properties and implications in several aspects. First, we prove a correspondence between functions expressed by L-layer NNs and those belonging to L-level NHLs. Second, we prove generalization guarantees for learning an NHL with the complexity measure controlled. Third, corresponding to the training of multi-layer NNs in the infinite-width mean-field limit, we derive an evolution of the NHL characterized as the dynamics of multiple random fields. Fourth, we show examples of depth separation in NHLs under ReLU and quadratic activation functions. Finally, we complement the theory with numerical results to illustrate the learning of RKHS in NN training.
    摘要 刻画神经网络(NN)所探索的函数空间是深度学习理论的重要课题。在本工作中,我们将任意宽度的多层NN视为定义了一个特定层级的再生核希尔伯特空间(RKHS),称为神经希尔伯特阶梯(NHL)。这使我们能够定义一个函数空间和一个复杂度度量,推广了先前关于浅层NN的结果,随后我们从多个方面考察其理论性质与含义。首先,我们证明了L层NN所表达的函数与L层NHL中函数之间的对应关系。其次,我们证明了在复杂度度量受控时学习NHL的泛化保证。第三,对应于无限宽度平均场极限下多层NN的训练,我们推导出NHL的演化,可刻画为多个随机场的动力学。第四,我们展示了在ReLU和二次激活函数下NHL中的深度分离例子。最后,我们用数值结果补充理论,以展示NN训练过程中RKHS的学习。

Quantum Neural Estimation of Entropies

  • paper_url: http://arxiv.org/abs/2307.01171
  • repo_url: None
  • paper_authors: Ziv Goldfeld, Dhrumil Patel, Sreejith Sreekumar, Mark M. Wilde
  • for: 估计量子系统中的信息量和相关性
  • methods: 使用变分量子算法和经典神经网络参数化测量方法
  • results: 精确地估计了不同 entropy 度量的值,有效地应用于下游任务
    Abstract Entropy measures quantify the amount of information and correlations present in a quantum system. In practice, when the quantum state is unknown and only copies thereof are available, one must resort to the estimation of such entropy measures. Here we propose a variational quantum algorithm for estimating the von Neumann and R\'enyi entropies, as well as the measured relative entropy and measured R\'enyi relative entropy. Our approach first parameterizes a variational formula for the measure of interest by a quantum circuit and a classical neural network, and then optimizes the resulting objective over parameter space. Numerical simulations of our quantum algorithm are provided, using a noiseless quantum simulator. The algorithm provides accurate estimates of the various entropy measures for the examples tested, which renders it as a promising approach for usage in downstream tasks.
    摘要 熵度量刻画量子系统中所含的信息量与关联性。在实践中,当量子态未知、只能获得其若干副本时,必须对这些熵度量进行估计。我们提出了一种变分量子算法,用于估计 von Neumann 熵和 Rényi 熵,以及测量相对熵和测量 Rényi 相对熵。我们的方法首先用量子线路和经典神经网络参数化所关心度量的变分公式,然后在参数空间上优化所得目标。我们用无噪声量子模拟器给出了该量子算法的数值模拟。在所测试的例子中,算法都能准确估计各种熵度量,因此它是一种有望用于下游任务的方法。

Online nearest neighbor classification

  • paper_url: http://arxiv.org/abs/2307.01170
  • repo_url: https://github.com/sayantann11/all-classification-templetes-for-ML
  • paper_authors: Sanjoy Dasgupta, Geelon So
  • for: 研究在可实现 Setting 中的在线非参数化分类问题。
  • methods: 采用经典的 1-最近邻算法,并证明其在可实现设定中能达到次线性遗憾。
  • results: 在面对受支配(dominated)或平滑(smoothed)对手时实现次线性遗憾,即错误率趋于消失。
    Abstract We study an instance of online non-parametric classification in the realizable setting. In particular, we consider the classical 1-nearest neighbor algorithm, and show that it achieves sublinear regret - that is, a vanishing mistake rate - against dominated or smoothed adversaries in the realizable setting.
    摘要 我们研究在可实现 setting 中的在线非参数化分类问题。特别是,我们考虑了经典的1 nearest neighbor算法,并证明它在可实现 setting 中对于受控或平滑的反对敌人(adversaries) achieve 子线性 regret - 即消失的错误率。

Analyzing and Improving Greedy 2-Coordinate Updates for Equality-Constrained Optimization via Steepest Descent in the 1-Norm

  • paper_url: http://arxiv.org/abs/2307.01169
  • repo_url: None
  • paper_authors: Amrutha Varshini Ramesh, Aaron Mishkin, Mark Schmidt, Yihan Zhou, Jonathan Wilder Lavington, Jennifer She
  • for: 这篇论文研究等式(求和)约束下的光滑函数最小化问题,分析贪婪二坐标更新与 1-范数下最陡下降之间的联系。
  • methods: 在 proximal Polyak-Lojasiewicz 假设下分析贪婪坐标选择的收敛速度;并针对同时含求和约束与界约束的问题(如支持向量机对偶问题),提出基于 L1 范数下界约束与求和约束最陡下降的更新规则。
  • results: 贪婪选择的收敛速度快于随机选择,且与问题维度 $n$ 无关;对于含界约束的情形,所提规则每次迭代能取得比以往规则更大的进展,并且只需 $O(n \log n)$ 时间即可计算。
    Abstract We consider minimizing a smooth function subject to a summation constraint over its variables. By exploiting a connection between the greedy 2-coordinate update for this problem and equality-constrained steepest descent in the 1-norm, we give a convergence rate for greedy selection under a proximal Polyak-Lojasiewicz assumption that is faster than random selection and independent of the problem dimension $n$. We then consider minimizing with both a summation constraint and bound constraints, as arises in the support vector machine dual problem. Existing greedy rules for this setting either guarantee trivial progress only or require $O(n^2)$ time to compute. We show that bound- and summation-constrained steepest descent in the L1-norm guarantees more progress per iteration than previous rules and can be computed in only $O(n \log n)$ time.
    摘要 我们考虑在变量满足求和约束的条件下最小化一个光滑函数。通过利用该问题的贪婪二坐标更新与 1-范数下等式约束最陡下降之间的联系,我们在 proximal Polyak-Lojasiewicz 假设下给出了贪婪选择的收敛速度,其快于随机选择且与问题维度 $n$ 无关。随后我们考虑同时含有求和约束与界约束的最小化问题,例如支持向量机对偶问题中出现的情形。现有的贪婪规则在该设定下要么只能保证平凡的进展,要么需要 $O(n^2)$ 时间来计算。我们证明,L1 范数下含界约束与求和约束的最陡下降每次迭代能比以往规则取得更多进展,且只需 $O(n \log n)$ 时间即可计算。
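
For minimizing a smooth f(x) subject to sum(x) = const, a greedy 2-coordinate step picks the coordinates with the largest and smallest partial derivatives and transfers mass between them, which keeps the summation constraint satisfied. The quadratic objective and exact line search below are illustrative choices; the paper analyzes this selection as steepest descent in the 1-norm and gives the O(n log n) rule for the bound-constrained case.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
A = rng.normal(size=(n, n))
Q = A @ A.T / n + np.eye(n)            # symmetric positive definite quadratic
b = rng.normal(size=n)
f = lambda x: 0.5 * x @ Q @ x - b @ x
grad = lambda x: Q @ x - b

x = np.ones(n) / n                     # feasible start: sum(x) = 1
for it in range(2000):
    g = grad(x)
    i, j = int(np.argmax(g)), int(np.argmin(g))   # greedy pair (steepest 1-norm direction)
    if g[i] - g[j] < 1e-10:
        break
    # exact line search along e_j - e_i; sum(x) stays constant
    L = Q[i, i] + Q[j, j] - 2 * Q[i, j]           # curvature along the update direction
    step = (g[i] - g[j]) / max(L, 1e-12)
    x[i] -= step
    x[j] += step
print("sum(x) =", round(x.sum(), 6), " f(x) =", round(f(x), 6))
# at the constrained optimum all partial derivatives are equal (KKT condition):
print("spread of partial derivatives:", round(np.ptp(grad(x)), 6))
```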

Don’t freeze: Finetune encoders for better Self-Supervised HAR

  • paper_url: http://arxiv.org/abs/2307.01168
  • repo_url: None
  • paper_authors: Vitor Fortes Rey, Dominique Nshimyimana, Paul Lukowicz
  • for: 这个论文是为了解决人类活动识别领域中的标签数据可用性问题而提出的一种解决方案。
  • methods: 这个论文使用了自然语言处理中的预测任务,如重构和对比预测编码,来学习有用的表示。这些方法采用了预训练、冻结和细化的过程。
  • results: 这个论文发现,不冻结表示后的表示可以获得显著性能提升,这种提升是随着标签数据的量而增加的。此外,这种效果是无论在Capture24数据集上进行预测任务还是直接在目标数据集上进行预测任务中都存在。
    Abstract Recently self-supervised learning has been proposed in the field of human activity recognition as a solution to the labelled data availability problem. The idea being that by using pretext tasks such as reconstruction or contrastive predictive coding, useful representations can be learned that then can be used for classification. Those approaches follow the pretrain, freeze and fine-tune procedure. In this paper we will show how a simple change - not freezing the representation - leads to substantial performance gains across pretext tasks. The improvement was found in all four investigated datasets and across all four pretext tasks and is inversely proportional to amount of labelled data. Moreover the effect is present whether the pretext task is carried on the Capture24 dataset or directly in unlabelled data of the target dataset.
    摘要 近期,无监督学习在人活动识别领域被提出,作为数据可用性问题的解决方案。这种方法是通过重构或对比预测编码来学习有用的表示,然后用于分类。这些方法遵循“预训练、冻结并微调”的过程。在这篇论文中,我们将展示一种简单的改变:不冻结表示,导致了重要的性能提升,并且这种提升随着数据量的减少而增加。此外,这种效果是不论预测任务是在 Capture24 数据集上进行还是直接在无标签数据集上进行的。

Coupled Gradient Flows for Strategic Non-Local Distribution Shift

  • paper_url: http://arxiv.org/abs/2307.01166
  • repo_url: None
  • paper_authors: Lauren Conger, Franca Hoffmann, Eric Mazumdar, Lillian Ratliff
  • for: This work analyzes the dynamics of distribution shift in real-world systems, capturing the feedback loop between learning algorithms and the distributions on which they are deployed.
  • methods: A coupled partial differential equation model is proposed that captures fine-grained changes in the distribution over time, accounting for strategic responses to algorithmic decision-making, non-local endogenous population interactions, and other exogenous sources of shift (an illustrative coupled gradient-flow system is written out after the abstract below).
  • results: When the algorithm retrains via gradient descent, the retraining procedure converges asymptotically to a steady state in both finite and infinite dimensions, with explicit rates in terms of the model parameters; empirically, the framework captures well-documented forms of distribution shift, such as polarization and disparate impacts, that simpler models cannot.
    Abstract We propose a novel framework for analyzing the dynamics of distribution shift in real-world systems that captures the feedback loop between learning algorithms and the distributions on which they are deployed. Prior work largely models feedback-induced distribution shift as adversarial or via an overly simplistic distribution-shift structure. In contrast, we propose a coupled partial differential equation model that captures fine-grained changes in the distribution over time by accounting for complex dynamics that arise due to strategic responses to algorithmic decision-making, non-local endogenous population interactions, and other exogenous sources of distribution shift. We consider two common settings in machine learning: cooperative settings with information asymmetries, and competitive settings where a learner faces strategic users. For both of these settings, when the algorithm retrains via gradient descent, we prove asymptotic convergence of the retraining procedure to a steady-state, both in finite and in infinite dimensions, obtaining explicit rates in terms of the model parameters. To do so we derive new results on the convergence of coupled PDEs that extends what is known on multi-species systems. Empirically, we show that our approach captures well-documented forms of distribution shifts like polarization and disparate impacts that simpler models cannot capture.
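
For orientation, the coupled structure described above can be written schematically as a gradient flow in the learner's parameters coupled to a Wasserstein-type gradient flow in the population density. The system below is an illustrative form under that assumption, not the paper's exact equations.

```latex
% Illustrative coupled gradient-flow structure (not the paper's exact system):
% the learner's parameters \theta follow gradient descent on a loss that depends
% on the population density \rho, while \rho evolves as a Wasserstein-type
% gradient flow on an energy that in turn depends on \theta.
\begin{aligned}
\frac{\mathrm{d}\theta_t}{\mathrm{d}t} &= -\nabla_{\theta}\, L(\theta_t,\rho_t),\\
\partial_t \rho_t &= \nabla \cdot \Big( \rho_t \, \nabla \frac{\delta E(\rho_t,\theta_t)}{\delta \rho} \Big).
\end{aligned}
```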

Improving Language Plasticity via Pretraining with Active Forgetting

  • paper_url: http://arxiv.org/abs/2307.01163
  • repo_url: None
  • paper_authors: Yihong Chen, Kelly Marchisio, Roberta Raileanu, David Ifeoluwa Adelani, Pontus Stenetorp, Sebastian Riedel, Mikel Artetxe
  • for: To make pretrained language models (PLMs) easier to adapt to new languages, extending their capabilities beyond the original pretraining language.
  • methods: An active forgetting mechanism is used during pretraining: the token embedding layer is reset every K updates, encouraging the model to learn new embeddings within a limited number of updates, akin to a meta-learning effect (a minimal training-loop sketch follows the abstract below).
  • results: In language adaptation experiments with RoBERTa, models pretrained with the forgetting mechanism converge faster and outperform standard models in the low-data regime, particularly for languages distant from English.
    Abstract Pretrained language models (PLMs) are today the primary model for natural language processing. Despite their impressive downstream performance, it can be difficult to apply PLMs to new languages, a barrier to making their capabilities universally accessible. While prior work has shown it possible to address this issue by learning a new embedding layer for the new language, doing so is both data and compute inefficient. We propose to use an active forgetting mechanism during pretraining, as a simple way of creating PLMs that can quickly adapt to new languages. Concretely, by resetting the embedding layer every K updates during pretraining, we encourage the PLM to improve its ability of learning new embeddings within a limited number of updates, similar to a meta-learning effect. Experiments with RoBERTa show that models pretrained with our forgetting mechanism not only demonstrate faster convergence during language adaptation but also outperform standard ones in a low-data regime, particularly for languages that are distant from English.
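
A minimal sketch of the active-forgetting loop described above: every K optimizer steps, only the token-embedding layer is re-initialized while the rest of the model keeps its weights. The tiny stand-in model, the value of K, and the normal re-initialization are assumptions for illustration.

```python
import torch
import torch.nn as nn

vocab_size, d_model, K = 1000, 64, 100    # K = reset period (illustrative)
embedding = nn.Embedding(vocab_size, d_model)
body = nn.Linear(d_model, vocab_size)     # stand-in for the transformer body
opt = torch.optim.Adam(list(embedding.parameters()) + list(body.parameters()), lr=3e-4)

for step in range(1, 501):
    tokens = torch.randint(0, vocab_size, (16, 32))           # dummy pretraining batch
    logits = body(embedding(tokens))                           # (batch, seq, vocab)
    loss = nn.functional.cross_entropy(logits.view(-1, vocab_size), tokens.view(-1))
    opt.zero_grad(); loss.backward(); opt.step()

    if step % K == 0:                                          # active forgetting:
        nn.init.normal_(embedding.weight, std=0.02)            # re-initialize embeddings only;
        # the body keeps its weights, so it must learn to read fresh embeddings quickly.
```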

Theory of Mind as Intrinsic Motivation for Multi-Agent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.01158
  • repo_url: None
  • paper_authors: Ini Oguntola, Joseph Campbell, Simon Stepputtis, Katia Sycara
  • for: To improve the social intelligence of artificial agents in multi-agent settings by modelling the mental states of other agents.
  • methods: Semantically meaningful, human-interpretable beliefs are grounded within policies modelled by deep networks, and each agent's ability to predict the other agents' beliefs (2nd-order belief prediction) is used as an intrinsic reward signal for multi-agent reinforcement learning (a minimal sketch of the reward shaping follows the abstract below).
  • results: Preliminary empirical results are reported in a mixed cooperative-competitive environment.
    Abstract The ability to model the mental states of others is crucial to human social intelligence, and can offer similar benefits to artificial agents with respect to the social dynamics induced in multi-agent settings. We present a method of grounding semantically meaningful, human-interpretable beliefs within policies modeled by deep networks. We then consider the task of 2nd-order belief prediction. We propose that ability of each agent to predict the beliefs of the other agents can be used as an intrinsic reward signal for multi-agent reinforcement learning. Finally, we present preliminary empirical results in a mixed cooperative-competitive environment.
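
A minimal sketch of using belief-prediction accuracy as an intrinsic reward, as proposed above: agent i trains a predictor of agent j's belief and receives a bonus that grows with prediction accuracy. The predictor architecture, the MSE objective, and the scaling factor beta are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

belief_dim, obs_dim, beta = 8, 16, 0.1               # beta scales the intrinsic term (assumed)
belief_predictor = nn.Linear(obs_dim, belief_dim)     # agent i's model of agent j's belief
opt = torch.optim.Adam(belief_predictor.parameters(), lr=1e-3)

def step_reward(obs_i, true_belief_j, env_reward):
    """Return the shaped reward: environment reward plus an intrinsic bonus that is
    larger when agent i predicts agent j's belief more accurately."""
    pred = belief_predictor(obs_i)
    pred_loss = nn.functional.mse_loss(pred, true_belief_j)
    opt.zero_grad(); pred_loss.backward(); opt.step()  # train the predictor online
    intrinsic = -float(pred_loss.detach())              # higher accuracy -> higher bonus
    return env_reward + beta * intrinsic

# Toy usage with random tensors standing in for observations and beliefs.
r = step_reward(torch.randn(obs_dim), torch.randn(belief_dim), env_reward=1.0)
print("shaped reward:", r)
```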

A novel approach for predicting epidemiological forecasting parameters based on real-time signals and Data Assimilation

  • paper_url: http://arxiv.org/abs/2307.01157
  • repo_url: None
  • paper_authors: Romain Molinas, César Quilodrán Casas, Rossella Arcucci, Ovidiu Şerban
  • for: Predicting epidemiological forecasting parameters by integrating new real-time signals from sources such as social media-based population density maps and air quality data.
  • methods: An ensemble of Convolutional Neural Network (CNN) models over multiple data sources with a fusion methodology is used to build robust predictions, and data assimilation estimates the state of the system from the fused CNN predictions.
  • results: Combining meteorological signals with social media-based population density maps improved the performance and flexibility of COVID-19 outbreak forecasts for London, outperforming standard compartmental models such as SEIR in accuracy while remaining more stable.
    Abstract This paper proposes a novel approach to predict epidemiological parameters by integrating new real-time signals from various sources of information, such as novel social media-based population density maps and Air Quality data. We implement an ensemble of Convolutional Neural Networks (CNN) models using various data sources and fusion methodology to build robust predictions and simulate several dynamic parameters that could improve the decision-making process for policymakers. Additionally, we used data assimilation to estimate the state of our system from fused CNN predictions. The combination of meteorological signals and social media-based population density maps improved the performance and flexibility of our prediction of the COVID-19 outbreak in London. While the proposed approach outperforms standard models, such as compartmental models traditionally used in disease forecasting (SEIR), generating robust and consistent predictions allows us to increase the stability of our model while increasing its accuracy.

AVSegFormer: Audio-Visual Segmentation with Transformer

  • paper_url: http://arxiv.org/abs/2307.01146
  • repo_url: https://github.com/vvvb-github/avsegformer
  • paper_authors: Shengyi Gao, Zhe Chen, Guo Chen, Wenhai Wang, Tong Lu
  • for: The audio-visual segmentation (AVS) task: locating and segmenting the sounding objects in a given video, which requires audio-driven pixel-level scene understanding.
  • methods: AVSegFormer, a transformer-based framework that introduces audio queries and learnable queries into the transformer decoder so the network can selectively attend to relevant visual features; an audio-visual mixer dynamically amplifies relevant and suppresses irrelevant spatial channels, and an intermediate mask loss strengthens supervision of the decoder (a minimal sketch of the query mechanism follows the abstract below).
  • results: Extensive experiments show that AVSegFormer achieves state-of-the-art results on the AVS benchmark; the code is available at https://github.com/vvvb-github/AVSegFormer.
    Abstract The combination of audio and vision has long been a topic of interest in the multi-modal community. Recently, a new audio-visual segmentation (AVS) task has been introduced, aiming to locate and segment the sounding objects in a given video. This task demands audio-driven pixel-level scene understanding for the first time, posing significant challenges. In this paper, we propose AVSegFormer, a novel framework for AVS tasks that leverages the transformer architecture. Specifically, we introduce audio queries and learnable queries into the transformer decoder, enabling the network to selectively attend to interested visual features. Besides, we present an audio-visual mixer, which can dynamically adjust visual features by amplifying relevant and suppressing irrelevant spatial channels. Additionally, we devise an intermediate mask loss to enhance the supervision of the decoder, encouraging the network to produce more accurate intermediate predictions. Extensive experiments demonstrate that AVSegFormer achieves state-of-the-art results on the AVS benchmark. The code is available at https://github.com/vvvb-github/AVSegFormer.
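
A minimal sketch of the query mechanism described above: audio-derived queries are concatenated with learnable queries and fed to a standard transformer decoder that attends over flattened visual features, and each query produces a mask by a dot product with those features. All dimensions, the projection layers, and the mask head are illustrative placeholders rather than AVSegFormer's actual modules.

```python
import torch
import torch.nn as nn

d_model, n_learnable = 256, 16
audio_proj = nn.Linear(128, d_model)                 # project audio features to query space
learnable_queries = nn.Parameter(torch.randn(n_learnable, d_model))
decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model=d_model, nhead=8, batch_first=True), num_layers=3)
mask_head = nn.Linear(d_model, d_model)              # queries -> mask embeddings

def forward(audio_feats, visual_feats):
    """audio_feats: (B, Ta, 128); visual_feats: (B, HW, d_model) flattened frame features."""
    b = audio_feats.size(0)
    aq = audio_proj(audio_feats)                                     # audio queries
    lq = learnable_queries.unsqueeze(0).expand(b, -1, -1)            # shared learnable queries
    queries = torch.cat([aq, lq], dim=1)
    out = decoder(tgt=queries, memory=visual_feats)                  # attend to visual features
    mask_emb = mask_head(out)                                        # (B, Nq, d_model)
    masks = torch.einsum("bqc,bpc->bqp", mask_emb, visual_feats)     # per-query mask logits
    return masks

masks = forward(torch.randn(2, 10, 128), torch.randn(2, 14 * 14, d_model))
print(masks.shape)   # (2, 10 + 16, 196)
```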

SCITUNE: Aligning Large Language Models with Scientific Multimodal Instructions

  • paper_url: http://arxiv.org/abs/2307.01139
  • repo_url: https://github.com/lupantech/ScienceQA
  • paper_authors: Sameera Horawalavithana, Sai Munikoti, Ian Stewart, Henry Kvinge
  • for: To improve the ability of large language models (LLMs) to follow scientific multimodal instructions.
  • methods: The SciTune tuning framework uses a human-generated scientific instruction tuning dataset to train LLaMA-SciTune, a multimodal model that connects a vision encoder and an LLM for science-focused visual and language understanding.
  • results: Compared to models finetuned with machine-generated data only, LLaMA-SciTune surpasses human performance on average and in many sub-categories on the ScienceQA benchmark.
    Abstract Instruction finetuning is a popular paradigm to align large language models (LLM) with human intent. Despite its popularity, this idea is less explored in improving the LLMs to align existing foundation models with scientific disciplines, concepts and goals. In this work, we present SciTune as a tuning framework to improve the ability of LLMs to follow scientific multimodal instructions. To test our methodology, we use a human-generated scientific instruction tuning dataset and train a large multimodal model LLaMA-SciTune that connects a vision encoder and LLM for science-focused visual and language understanding. In comparison to the models that are finetuned with machine generated data only, LLaMA-SciTune surpasses human performance on average and in many sub-categories on the ScienceQA benchmark.

eess.IV - 2023-07-04

Mitigating Calibration Bias Without Fixed Attribute Grouping for Improved Fairness in Medical Imaging Analysis

  • paper_url: http://arxiv.org/abs/2307.01738
  • repo_url: None
  • paper_authors: Changjian Shui, Justin Szeto, Raghav Mehta, Douglas L. Arnold, Tal Arbel
  • for: Trustworthy clinical deployment of deep learning medical imaging models requires calibration, including for poorly calibrated sub-populations that overall metrics can hide.
  • methods: A two-stage Cluster-Focal method: first identify poorly calibrated samples and cluster them into groups, then apply a group-wise focal loss to reduce calibration bias; no subgroup attributes are needed during training, so different choices of sensitive attributes can be addressed without re-training (a minimal sketch of the two stages follows the abstract below).
  • results: On skin lesion classification with the HAM10000 dataset and on predicting future lesional activity in multiple sclerosis patients, the method controls calibration error in the worst-performing subgroups while preserving prediction performance and outperforming recent baselines.
    Abstract Trustworthy deployment of deep learning medical imaging models into real-world clinical practice requires that they be calibrated. However, models that are well calibrated overall can still be poorly calibrated for a sub-population, potentially resulting in a clinician unwittingly making poor decisions for this group based on the recommendations of the model. Although methods have been shown to successfully mitigate biases across subgroups in terms of model accuracy, this work focuses on the open problem of mitigating calibration biases in the context of medical image analysis. Our method does not require subgroup attributes during training, permitting the flexibility to mitigate biases for different choices of sensitive attributes without re-training. To this end, we propose a novel two-stage method: Cluster-Focal to first identify poorly calibrated samples, cluster them into groups, and then introduce group-wise focal loss to improve calibration bias. We evaluate our method on skin lesion classification with the public HAM10000 dataset, and on predicting future lesional activity for multiple sclerosis (MS) patients. In addition to considering traditional sensitive attributes (e.g. age, sex) with demographic subgroups, we also consider biases among groups with different image-derived attributes, such as lesion load, which are required in medical image analysis. Our results demonstrate that our method effectively controls calibration error in the worst-performing subgroups while preserving prediction performance, and outperforming recent baselines.
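
A minimal two-stage sketch of the Cluster-Focal idea described above: samples are grouped by a simple calibration-gap statistic, and a focal loss with a group-dependent focusing parameter is applied. The clustering feature, the number of groups, and the per-group gamma values are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans

def focal_loss(logits, targets, gamma):
    """Standard focal loss: down-weights well-classified samples by (1 - p_t)^gamma."""
    logp = F.log_softmax(logits, dim=-1)
    logp_t = logp.gather(1, targets.unsqueeze(1)).squeeze(1)
    p_t = logp_t.exp()
    return -((1 - p_t) ** gamma) * logp_t

# Stage 1 (illustrative): cluster samples by their calibration gap |confidence - correctness|.
def assign_groups(confidences, correct, n_groups=3):
    gap = np.abs(confidences - correct.astype(float)).reshape(-1, 1)
    return KMeans(n_clusters=n_groups, n_init=10, random_state=0).fit_predict(gap)

# Stage 2 (illustrative): a different focal gamma per group, averaged over all samples.
def cluster_focal_loss(logits, targets, groups, gammas=(0.0, 1.0, 2.0)):
    per_sample = []
    for g, gamma in enumerate(gammas):
        idx = torch.as_tensor(groups == g)
        if idx.any():
            per_sample.append(focal_loss(logits[idx], targets[idx], gamma))
    return torch.cat(per_sample).mean()

# Toy usage.
logits, targets = torch.randn(32, 2), torch.randint(0, 2, (32,))
conf = F.softmax(logits, dim=-1).max(dim=-1).values.detach().numpy()
correct = (logits.argmax(dim=-1) == targets).numpy()
groups = assign_groups(conf, correct)
print("loss:", float(cluster_focal_loss(logits, targets, groups)))
```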

Once-Training-All-Fine: No-Reference Point Cloud Quality Assessment via Domain-relevance Degradation Description

  • paper_url: http://arxiv.org/abs/2307.01567
  • repo_url: None
  • paper_authors: Yipeng Liu, Qi Yang, Yujie Zhang, Yiling Xu, Le Yang, Xiaozhong Xu, Shan Liu
  • for: Improving the generalization ability of no-reference point cloud quality assessment (NR-PCQA) methods.
  • methods: D$^3$-PCQA, which characterizes quality assessment as a leap from the perceptual domain to the quality domain via an intermediate, learned description domain; interpretability is derived from a kernelized ridge regression model, anchor features from a structured latent space serve as cross-domain auxiliary information, and the problem is decomposed into a degradation-description classification stage and a confidence-regression stage that gives a semantic explanation for the predicted scores.
  • results: Experiments on several publicly available datasets show that D$^3$-PCQA has robust performance and strong generalization ability; the code will be released at https://smt.sjtu.edu.cn.
    Abstract Full-reference (FR) point cloud quality assessment (PCQA) has achieved impressive progress in recent years. However, as reference point clouds are not available in many cases, no-reference (NR) metrics have become a research hotspot. Existing NR methods suffer from poor generalization performance. To address this shortcoming, we propose a novel NR-PCQA method, Point Cloud Quality Assessment via Domain-relevance Degradation Description (D$^3$-PCQA). First, we demonstrate our model's interpretability by deriving the function of each module using a kernelized ridge regression model. Specifically, quality assessment can be characterized as a leap from the scattered perceptual domain (reflecting subjective perception) to the ordered quality domain (reflecting mean opinion score). Second, to reduce the significant domain discrepancy, we establish an intermediate domain, the description domain, based on insights from subjective experiments, by considering the domain relevance among samples located in the perception domain and learning a structured latent space. The anchor features derived from the learned latent space are generated as cross-domain auxiliary information to promote domain transformation. Furthermore, the newly established description domain decomposes the NR-PCQA problem into two relevant stages. These stages include a classification stage that gives the degradation descriptions to point clouds and a regression stage to determine the confidence degrees of descriptions, providing a semantic explanation for the predicted quality scores. Experimental results demonstrate that D$^3$-PCQA exhibits robust performance and outstanding generalization ability on several publicly available datasets. The code in this work will be publicly available at https://smt.sjtu.edu.cn.

Spatio-Temporal Perception-Distortion Trade-off in Learned Video SR

  • paper_url: http://arxiv.org/abs/2307.01556
  • repo_url: https://github.com/kuis-ai-tekalp-research-group/perceptual-vsr
  • paper_authors: Nasrin Rahimi, A. Murat Tekalp
  • for: Extending the perception-distortion trade-off from single-image super-resolution to video super-resolution (VSR), where naturalness of motion also matters.
  • methods: A new measure of spatio-temporal perceptual video quality that emphasizes naturalness of optical flow via the perceptual straightness hypothesis (PSH), and a perceptual VSR architecture (PSVR) that explicitly enforces naturalness of flow.
  • results: Experiments with PSVR support the hypothesis that a meaningful perception-distortion trade-off for video should account for naturalness of motion in addition to naturalness of texture.
    Abstract Perception-distortion trade-off is well-understood for single-image super-resolution. However, its extension to video super-resolution (VSR) is not straightforward, since popular perceptual measures only evaluate naturalness of spatial textures and do not take naturalness of flow (temporal coherence) into account. To this effect, we propose a new measure of spatio-temporal perceptual video quality emphasizing naturalness of optical flow via the perceptual straightness hypothesis (PSH) for meaningful spatio-temporal perception-distortion trade-off. We also propose a new architecture for perceptual VSR (PSVR) to explicitly enforce naturalness of flow to achieve realistic spatio-temporal perception-distortion trade-off according to the proposed measures. Experimental results with PVSR support the hypothesis that a meaningful perception-distortion tradeoff for video should account for the naturalness of motion in addition to naturalness of texture.

Convolutional Transformer for Autonomous Recognition and Grading of Tomatoes Under Various Lighting, Occlusion, and Ripeness Conditions

  • paper_url: http://arxiv.org/abs/2307.01530
  • repo_url: None
  • paper_authors: Asim Khan, Taimur Hassan, Muhammad Shafay, Israa Fahmy, Naoufel Werghi, Lakmal Seneviratne, Irfan Hussain
  • for: Autonomous recognition and grading of tomatoes for mobile harvesting robots in real-world scenarios, where occlusion by leaves and branches, colour similarity with foliage, lighting, and viewing angle all vary.
  • methods: A convolutional transformer framework trained and tested on purpose-annotated images collected under varying lighting conditions, viewing perspectives, and mobile camera sensors, so that tomatoes can be recognized and graded regardless of occlusion level, lighting, and ripeness.
  • results: Evaluated against the Laboro Tomato and Rob2Pheno Annotated Tomato datasets as benchmarks, the framework surpasses the state of the art by 58.14%, 65.42%, and 66.39% in mean average precision on KUTomaData, Laboro Tomato, and Rob2Pheno Annotated Tomato respectively, and achieves an F1-score of 80.14%, a Dice coefficient of 73.26%, and a mean IoU of 66.41% on KUTomaData.
    Abstract Harvesting fully ripe tomatoes with mobile robots presents significant challenges in real-world scenarios. These challenges arise from factors such as occlusion caused by leaves and branches, as well as the color similarity between tomatoes and the surrounding foliage during the fruit development stage. The natural environment further compounds these issues with varying light conditions, viewing angles, occlusion factors, and different maturity levels. To overcome these obstacles, this research introduces a novel framework that leverages a convolutional transformer architecture to autonomously recognize and grade tomatoes, irrespective of their occlusion level, lighting conditions, and ripeness. The proposed model is trained and tested using carefully annotated images curated specifically for this purpose. The dataset is prepared under various lighting conditions, viewing perspectives, and employs different mobile camera sensors, distinguishing it from existing datasets such as Laboro Tomato and Rob2Pheno Annotated Tomato. The effectiveness of the proposed framework in handling cluttered and occluded tomato instances was evaluated using two additional public datasets, Laboro Tomato and Rob2Pheno Annotated Tomato, as benchmarks. The evaluation results across these three datasets demonstrate the exceptional performance of our proposed framework, surpassing the state-of-the-art by 58.14%, 65.42%, and 66.39% in terms of mean average precision scores for KUTomaData, Laboro Tomato, and Rob2Pheno Annotated Tomato, respectively. The results underscore the superiority of the proposed model in accurately detecting and delineating tomatoes compared to baseline methods and previous approaches. Specifically, the model achieves an F1-score of 80.14%, a Dice coefficient of 73.26%, and a mean IoU of 66.41% on the KUTomaData image dataset.

H-DenseFormer: An Efficient Hybrid Densely Connected Transformer for Multimodal Tumor Segmentation

  • paper_url: http://arxiv.org/abs/2307.01486
  • repo_url: https://github.com/shijun18/h-denseformer
  • paper_authors: Jun Shi, Hongyu Kan, Shulan Ruan, Ziqi Zhu, Minfan Zhao, Liang Qiao, Zhaohui Wang, Hong An, Xudong Xue
  • for: Multimodal tumor segmentation in medical images.
  • methods: H-DenseFormer, a hybrid densely connected network combining CNN and Transformer structures: a Transformer-based Multi-path Parallel Embedding (MPE) module takes an arbitrary number of modalities as input and delivers fused multimodal features to different levels of the encoder, while a lightweight Densely Connected Transformer (DCT) block replaces the standard Transformer block to reduce computational complexity.
  • results: Extensive experiments on the public HECKTOR21 and PI-CAI22 multimodal datasets show that the method outperforms existing state-of-the-art approaches at lower computational cost.
    Abstract Recently, deep learning methods have been widely used for tumor segmentation of multimodal medical images with promising results. However, most existing methods are limited by insufficient representational ability, specific modality number and high computational complexity. In this paper, we propose a hybrid densely connected network for tumor segmentation, named H-DenseFormer, which combines the representational power of the Convolutional Neural Network (CNN) and the Transformer structures. Specifically, H-DenseFormer integrates a Transformer-based Multi-path Parallel Embedding (MPE) module that can take an arbitrary number of modalities as input to extract the fusion features from different modalities. Then, the multimodal fusion features are delivered to different levels of the encoder to enhance multimodal learning representation. Besides, we design a lightweight Densely Connected Transformer (DCT) block to replace the standard Transformer block, thus significantly reducing computational complexity. We conduct extensive experiments on two public multimodal datasets, HECKTOR21 and PI-CAI22. The experimental results show that our proposed method outperforms the existing state-of-the-art methods while having lower computational complexity. The source code is available at https://github.com/shijun18/H-DenseFormer.

Zero-DeepSub: Zero-Shot Deep Subspace Reconstruction for Rapid Multiparametric Quantitative MRI Using 3D-QALAS

  • paper_url: http://arxiv.org/abs/2307.01410
  • repo_url: None
  • paper_authors: Yohan Jun, Yamin Arefeen, Jaejin Cho, Shohei Fujita, Xiaoqing Wang, P. Ellen Grant, Borjan Gagoski, Camilo Jaimes, Michael S. Gee, Berkin Bilgic
  • for: To develop and evaluate methods for reconstructing 3D-quantification using an interleaved Look-Locker acquisition sequence with T2 preparation pulse (3D-QALAS) time-series images.
  • methods: A low-rank subspace method (subspace QALAS) and a zero-shot deep-learning subspace method (Zero-DeepSub) for rapid, high-fidelity T1 and T2 mapping and time-resolved imaging (a minimal sketch of the subspace idea follows the abstract below).
  • results: Good linearity and reduced biases compared to conventional QALAS, better g-factor maps, reduced voxel blurring, noise, and artifacts, and robust performance at up to 9-fold acceleration; with Zero-DeepSub, whole-brain T1, T2, and PD mapping at 1 mm isotropic resolution is obtained within 2 min of scan time.
    Abstract Purpose: To develop and evaluate methods for 1) reconstructing 3D-quantification using an interleaved Look-Locker acquisition sequence with T2 preparation pulse (3D-QALAS) time-series images using a low-rank subspace method, which enables accurate and rapid T1 and T2 mapping, and 2) improving the fidelity of subspace QALAS by combining scan-specific deep-learning-based reconstruction and subspace modeling. Methods: A low-rank subspace method for 3D-QALAS (i.e., subspace QALAS) and zero-shot deep-learning subspace method (i.e., Zero-DeepSub) were proposed for rapid and high fidelity T1 and T2 mapping and time-resolved imaging using 3D-QALAS. Using an ISMRM/NIST system phantom, the accuracy of the T1 and T2 maps estimated using the proposed methods was evaluated by comparing them with reference techniques. The reconstruction performance of the proposed subspace QALAS using Zero-DeepSub was evaluated in vivo and compared with conventional QALAS at high reduction factors of up to 9-fold. Results: Phantom experiments showed that subspace QALAS had good linearity with respect to the reference methods while reducing biases compared to conventional QALAS, especially for T2 maps. Moreover, in vivo results demonstrated that subspace QALAS had better g-factor maps and could reduce voxel blurring, noise, and artifacts compared to conventional QALAS and showed robust performance at up to 9-fold acceleration with Zero-DeepSub, which enabled whole-brain T1, T2, and PD mapping at 1 mm isotropic resolution within 2 min of scan time. Conclusion: The proposed subspace QALAS along with Zero-DeepSub enabled high fidelity and rapid whole-brain multiparametric quantification and time-resolved imaging.
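
A minimal sketch of the low-rank subspace idea behind subspace QALAS: a dictionary of simulated signal evolutions defines a small temporal basis, and each voxel's time series is represented by its coefficients in that basis. The toy exponential signal model, the subspace rank, and the fully sampled setting are assumptions; the k-space reconstruction and the Zero-DeepSub network are omitted.

```python
import numpy as np

# Dictionary of simulated signal evolutions over the acquisition timepoints
# (a toy exponential-recovery model standing in for Bloch-simulated 3D-QALAS signals).
t = np.linspace(0.1, 3.0, 40)                       # 40 acquisition timepoints (s)
T1_grid = np.linspace(0.3, 3.0, 200)                # candidate T1 values (s)
dictionary = 1 - 2 * np.exp(-t[None, :] / T1_grid[:, None])   # (200, 40)

# Low-rank temporal subspace from the dictionary's top right singular vectors.
K = 4
_, _, Vt = np.linalg.svd(dictionary, full_matrices=False)
phi = Vt[:K]                                        # (K, 40) subspace basis

# Represent a measured voxel time-series by K coefficients instead of 40 samples.
true_signal = 1 - 2 * np.exp(-t / 1.2)              # a voxel with T1 = 1.2 s
noisy = true_signal + 0.02 * np.random.default_rng(0).normal(size=t.size)
coeffs = phi @ noisy                                 # project onto the subspace
recon = phi.T @ coeffs                               # subspace-constrained reconstruction
print("relative error:", np.linalg.norm(recon - true_signal) / np.linalg.norm(true_signal))
```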

A CNN regression model to estimate buildings height maps using Sentinel-1 SAR and Sentinel-2 MSI time series

  • paper_url: http://arxiv.org/abs/2307.01378
  • repo_url: None
  • paper_authors: Ritu Yadav, Andrea Nascetti, Yifang Ban
  • for: Estimating building heights at 10 m spatial resolution from Sentinel-1 (S1) and Sentinel-2 (S2) satellite time series, to support urban planning, infrastructure management, and environmental analysis.
  • methods: A supervised Multimodal Building Height Regression Network (MBHR-Net) that fuses S1 Synthetic Aperture Radar (SAR) data, which carries information on building structure, with S2 multispectral data, which is sensitive to land cover, vegetation phenology, and building shadows, learning spatio-temporal relationships between image patterns and building heights.
  • results: Trained and tested on 10 cities in the Netherlands, the preliminary results (3.73 m RMSE, 0.95 IoU, 0.61 R2) demonstrate accurate height estimation (a minimal sketch of these metrics follows the abstract below).
    Abstract Accurate estimation of building heights is essential for urban planning, infrastructure management, and environmental analysis. In this study, we propose a supervised Multimodal Building Height Regression Network (MBHR-Net) for estimating building heights at 10m spatial resolution using Sentinel-1 (S1) and Sentinel-2 (S2) satellite time series. S1 provides Synthetic Aperture Radar (SAR) data that offers valuable information on building structures, while S2 provides multispectral data that is sensitive to different land cover types, vegetation phenology, and building shadows. Our MBHR-Net aims to extract meaningful features from the S1 and S2 images to learn complex spatio-temporal relationships between image patterns and building heights. The model is trained and tested in 10 cities in the Netherlands. Root Mean Squared Error (RMSE), Intersection over Union (IOU), and R-squared (R2) score metrics are used to evaluate the performance of the model. The preliminary results (3.73m RMSE, 0.95 IoU, 0.61 R2) demonstrate the effectiveness of our deep learning model in accurately estimating building heights, showcasing its potential for urban planning, environmental impact analysis, and other related applications.
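
A minimal sketch of the three reported metrics (RMSE, IoU, R2) computed on a toy height map. Treating IoU as overlap of building footprints obtained by thresholding the height maps is an assumption about how the metric is defined, as is the 2 m threshold.

```python
import numpy as np

def rmse(pred, target):
    return float(np.sqrt(np.mean((pred - target) ** 2)))

def r2_score(pred, target):
    ss_res = np.sum((target - pred) ** 2)
    ss_tot = np.sum((target - target.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def building_iou(pred, target, height_threshold=2.0):
    """IoU between predicted and reference building footprints, obtained by
    thresholding the height maps (the threshold is an assumption)."""
    p, t = pred > height_threshold, target > height_threshold
    inter, union = np.logical_and(p, t).sum(), np.logical_or(p, t).sum()
    return float(inter / union) if union else 1.0

# Toy 10 m resolution height maps (metres).
rng = np.random.default_rng(0)
target = rng.uniform(0, 30, size=(64, 64)) * (rng.random((64, 64)) > 0.6)
pred = target + rng.normal(0, 3, size=target.shape)
print(rmse(pred, target), building_iou(pred, target), r2_score(pred, target))
```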

Investigating Data Memorization in 3D Latent Diffusion Models for Medical Image Synthesis

  • paper_url: http://arxiv.org/abs/2307.01148
  • repo_url: None
  • paper_authors: Salman Ul Hassan Dar, Arman Ghanaat, Jannik Kahmann, Isabelle Ayx, Theano Papavassiliu, Stefan O. Schoenberg, Sandy Engelhardt
  • for: To assess how much 3D latent diffusion models memorize their training data when generating synthetic medical images.
  • methods: Self-supervised models based on contrastive learning are used to detect potential memorization of training samples in models trained on photon-counting coronary CT angiography and knee MRI datasets (a minimal sketch of the similarity-based detection idea follows the abstract below).
  • results: The results suggest that such latent diffusion models indeed memorize training data, and strategies to mitigate memorization are needed.
    Abstract Generative latent diffusion models have been established as state-of-the-art in data generation. One promising application is generation of realistic synthetic medical imaging data for open data sharing without compromising patient privacy. Despite the promise, the capacity of such models to memorize sensitive patient training data and synthesize samples showing high resemblance to training data samples is relatively unexplored. Here, we assess the memorization capacity of 3D latent diffusion models on photon-counting coronary computed tomography angiography and knee magnetic resonance imaging datasets. To detect potential memorization of training samples, we utilize self-supervised models based on contrastive learning. Our results suggest that such latent diffusion models indeed memorize training data, and there is a dire need for devising strategies to mitigate memorization.
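
A minimal sketch of the detection idea described above: embed synthetic and training volumes with an encoder (in the paper, one trained with contrastive learning) and flag synthetic samples whose nearest training embedding is suspiciously similar. The random embeddings, the cosine-similarity score, and the threshold are placeholders for illustration.

```python
import numpy as np

def flag_memorized(synth_emb, train_emb, sim_threshold=0.95):
    """Flag synthetic samples whose nearest training embedding exceeds a cosine-
    similarity threshold (a simple proxy for 'copied from the training set')."""
    s = synth_emb / np.linalg.norm(synth_emb, axis=1, keepdims=True)
    t = train_emb / np.linalg.norm(train_emb, axis=1, keepdims=True)
    sims = s @ t.T                                   # (n_synth, n_train) cosine similarities
    nearest = sims.max(axis=1)
    return nearest, nearest > sim_threshold

# Toy embeddings: pretend the encoder maps each volume to a 128-d vector, and make
# a few synthetic samples near-copies of training samples.
rng = np.random.default_rng(0)
train = rng.normal(size=(100, 128))
synth = rng.normal(size=(20, 128))
synth[:3] = train[:3] + 0.01 * rng.normal(size=(3, 128))   # memorized-looking samples
scores, flags = flag_memorized(synth, train)
print("flagged:", int(flags.sum()), "max similarity:", round(float(scores.max()), 3))
```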

cs.SD - 2023-07-03

musif: a Python package for symbolic music feature extraction

  • paper_url: http://arxiv.org/abs/2307.01120
  • repo_url: https://github.com/didoneproject/musif
  • paper_authors: Ana Llorens, Federico Simonetta, Martín Serrano, Álvaro Torrente
  • for: musif, a Python package for the automatic extraction of features from symbolic music scores.
  • methods: The package implements a large number of features developed by a team of experts in musicology, music theory, statistics, and computer science, and allows custom features to be created easily with commonly available Python libraries.
  • results: musif is primarily geared towards high-quality musicological data encoded in MusicXML, but also supports other formats commonly used in music information retrieval, such as MIDI, MEI, and Kern; comprehensive documentation and tutorials help extend the framework and introduce new users.
    Abstract In this work, we introduce musif, a Python package that facilitates the automatic extraction of features from symbolic music scores. The package includes the implementation of a large number of features, which have been developed by a team of experts in musicology, music theory, statistics, and computer science. Additionally, the package allows for the easy creation of custom features using commonly available Python libraries. musif is primarily geared towards processing high-quality musicological data encoded in MusicXML format, but also supports other formats commonly used in music information retrieval tasks, including MIDI, MEI, Kern, and others. We provide comprehensive documentation and tutorials to aid in the extension of the framework and to facilitate the introduction of new and inexperienced users to its usage.

Multilingual Contextual Adapters To Improve Custom Word Recognition In Low-resource Languages

  • paper_url: http://arxiv.org/abs/2307.00759
  • repo_url: None
  • paper_authors: Devang Kulshreshtha, Saket Dingliwal, Brady Houston, Sravan Bodapati
  • for: Improving the recognition of custom words in automatic speech recognition, especially for low-resource languages.
  • methods: Contextual Adapters, an attention-based biasing model for CTC, trained with an additional supervision loss for smoother training and combined with a multilingual strategy to improve performance with limited training data.
  • results: A 48% F1 improvement in retrieving unseen custom entities for a low-resource language; as a by-product, the base CTC model's word error rate also drops by 5-11%.
    Abstract Connectionist Temporal Classification (CTC) models are popular for their balance between speed and performance for Automatic Speech Recognition (ASR). However, these CTC models still struggle in other areas, such as personalization towards custom words. A recent approach explores Contextual Adapters, wherein an attention-based biasing model for CTC is used to improve the recognition of custom entities. While this approach works well with enough data, we showcase that it isn't an effective strategy for low-resource languages. In this work, we propose a supervision loss for smoother training of the Contextual Adapters. Further, we explore a multilingual strategy to improve performance with limited training data. Our method achieves 48% F1 improvement in retrieving unseen custom entities for a low-resource language. Interestingly, as a by-product of training the Contextual Adapters, we see a 5-11% Word Error Rate (WER) reduction in the performance of the base CTC model as well.

An End-to-End Multi-Module Audio Deepfake Generation System for ADD Challenge 2023

  • paper_url: http://arxiv.org/abs/2307.00729
  • repo_url: None
  • paper_authors: Sheng Zhao, Qilong Yuan, Yibo Duan, Zhuoyue Chen
  • for: Building a synthetic speech generation system that produces spoken content from text while simulating a human voice, for the ADD 2023 challenge.
  • methods: An end-to-end multi-module model consisting of a speaker encoder, a Tacotron2-based synthesizer, and a WaveRNN-based vocoder, evaluated through comparative experiments across datasets and model structures.
  • results: First place in the ADD 2023 challenge Track 1.1 with a weighted deception success rate (WDSR) of 44.97%.
    Abstract The task of synthetic speech generation is to produce spoken content from a given text while simulating a fake human voice. The key factors that determine the quality of synthetic speech generation mainly include speed of generation, accuracy of word segmentation, naturalness of synthesized speech, etc. This paper builds an end-to-end multi-module synthetic speech generation model, including a speaker encoder, a synthesizer based on Tacotron2, and a vocoder based on WaveRNN. In addition, we perform a lot of comparative experiments on different datasets and various model structures. Finally, we won the first place in the ADD 2023 challenge Track 1.1 with a weighted deception success rate (WDSR) of 44.97%.

eess.AS - 2023-07-03

ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading

  • paper_url: http://arxiv.org/abs/2307.00782
  • repo_url: None
  • paper_authors: Yujia Xiao, Shaofei Zhang, Xi Wang, Xu Tan, Lei He, Sheng Zhao, Frank K. Soong, Tan Lee
  • for: Improving text-to-speech (TTS) quality for paragraph and long-form reading.
  • methods: ContextSpeech, a lightweight yet effective TTS system with a memory-cached recurrence mechanism that incorporates global text and speech context into sentence encoding, hierarchically structured textual semantics that broaden the scope of global context enhancement, and linearized self-attention for model efficiency (a minimal sketch of linearized attention follows the abstract below).
  • results: Experiments show that ContextSpeech significantly improves voice quality and prosody expressiveness in paragraph reading with competitive model efficiency; audio samples are available at https://contextspeech.github.io/demo/.
    Abstract While state-of-the-art Text-to-Speech systems can generate natural speech of very high quality at sentence level, they still meet great challenges in speech generation for paragraph / long-form reading. Such deficiencies are due to i) ignorance of cross-sentence contextual information, and ii) high computation and memory cost for long-form synthesis. To address these issues, this work develops a lightweight yet effective TTS system, ContextSpeech. Specifically, we first design a memory-cached recurrence mechanism to incorporate global text and speech context into sentence encoding. Then we construct hierarchically-structured textual semantics to broaden the scope for global context enhancement. Additionally, we integrate linearized self-attention to improve model efficiency. Experiments show that ContextSpeech significantly improves the voice quality and prosody expressiveness in paragraph reading with competitive model efficiency. Audio samples are available at: https://contextspeech.github.io/demo/
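
A minimal sketch of linearized self-attention, the efficiency component mentioned above: the softmax attention softmax(QK^T)V is replaced by phi(Q)(phi(K)^T V) with a positive feature map phi, so no N x N attention matrix is ever materialized. The elu+1 feature map and the toy shapes are common choices assumed here, not necessarily ContextSpeech's exact formulation.

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """Linearized self-attention: softmax(QK^T)V is replaced by
    phi(Q) (phi(K)^T V) with a positive feature map phi, so the cost is
    linear in sequence length instead of quadratic."""
    phi_q = F.elu(q) + 1                     # positive feature map (one common choice)
    phi_k = F.elu(k) + 1
    kv = torch.einsum("bnd,bne->bde", phi_k, v)              # (B, d, d_v)
    z = 1.0 / (torch.einsum("bnd,bd->bn", phi_q, phi_k.sum(dim=1)) + eps)
    return torch.einsum("bnd,bde,bn->bne", phi_q, kv, z)

# Toy usage: batch of 2, sequence length 512, head dimension 64.
q = k = v = torch.randn(2, 512, 64)
out = linear_attention(q, k, v)
print(out.shape)    # torch.Size([2, 512, 64])
```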