cs.LG - 2023-07-05

Machine learning at the mesoscale: a computation-dissipation bottleneck

  • paper_url: http://arxiv.org/abs/2307.02379
  • repo_url: None
  • paper_authors: Alessandro Ingrosso, Emanuele Panizon
  • for: The paper investigates the cost of information processing in physical systems, i.e., the trade-off between performance and energetic expenditure.
  • methods: The paper formulates a computation-dissipation bottleneck framework and, using both real datasets and synthetic tasks, shows that the presence of asymmetric (non-reciprocal) interactions leads to enhanced performance.
  • results: The study shows that, in input-output devices, non-equilibrium conditions give rise to a trade-off between information compression, input-output computation, and dynamical irreversibility.
    Abstract The cost of information processing in physical systems calls for a trade-off between performance and energetic expenditure. Here we formulate and study a computation-dissipation bottleneck in mesoscopic systems used as input-output devices. Using both real datasets and synthetic tasks, we show how non-equilibrium leads to enhanced performance. Our framework sheds light on a crucial compromise between information compression, input-output computation and dynamic irreversibility induced by non-reciprocal interactions.

Continuum Limits of Ollivier’s Ricci Curvature on data clouds: pointwise consistency and global lower bounds

  • paper_url: http://arxiv.org/abs/2307.02378
  • repo_url: None
  • paper_authors: Nicolas Garcia Trillos, Melanie Weber
  • for: Studies the relationship between the curvature of a random geometric graph built from points sampled on a low-dimensional manifold $\mathcal{M} \subseteq \mathbb{R}^d$ and the curvature of $\mathcal{M}$ itself, via continuum limits.
  • methods: Uses continuum limits of Ollivier's discrete Ricci curvature, proving pointwise, non-asymptotic consistency results and showing that if $\mathcal{M}$ has Ricci curvature bounded from below by a positive constant, the random geometric graph inherits this global structural property with high probability.
  • results: Relates the global discrete curvature bounds to contraction properties of heat kernels on graphs and to manifold learning from data clouds; in particular, the consistency results allow the intrinsic curvature of $\mathcal{M}$ to be characterized from extrinsic curvature.
    Abstract Let $\mathcal{M} \subseteq \mathbb{R}^d$ denote a low-dimensional manifold and let $\mathcal{X}= \{ x_1, \dots, x_n \}$ be a collection of points uniformly sampled from $\mathcal{M}$. We study the relationship between the curvature of a random geometric graph built from $\mathcal{X}$ and the curvature of the manifold $\mathcal{M}$ via continuum limits of Ollivier's discrete Ricci curvature. We prove pointwise, non-asymptotic consistency results and also show that if $\mathcal{M}$ has Ricci curvature bounded from below by a positive constant, then the random geometric graph will inherit this global structural property with high probability. We discuss applications of the global discrete curvature bounds to contraction properties of heat kernels on graphs, as well as implications for manifold learning from data clouds. In particular, we show that the consistency results allow for characterizing the intrinsic curvature of a manifold from extrinsic curvature.
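
The central quantity is simple enough to compute directly. Below is a minimal sketch, assuming a lazy-random-walk neighborhood measure, hop distances on the graph, and illustrative values for the laziness and connection radius (not the paper's estimator or kernel), of Ollivier's curvature $\kappa(x,y) = 1 - W_1(\mu_x,\mu_y)/d(x,y)$ on a random geometric graph built from points sampled on the unit circle.

```python
# A minimal sketch (not the paper's estimator): Ollivier-Ricci curvature on a
# random geometric graph built from points sampled uniformly on the unit circle.
import numpy as np
import networkx as nx
from scipy.optimize import linprog

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, size=200)
pts = np.stack([np.cos(theta), np.sin(theta)], axis=1)   # points on S^1 in R^2

eps = 0.3                                                 # illustrative connection radius
G = nx.random_geometric_graph(len(pts), eps, pos={i: p for i, p in enumerate(pts)})

def lazy_walk_measure(G, v, alpha=0.5):
    """Probability mass alpha at v, (1 - alpha) spread uniformly over neighbors."""
    nbrs = list(G.neighbors(v))
    mu = {v: alpha}
    for u in nbrs:
        mu[u] = mu.get(u, 0.0) + (1 - alpha) / len(nbrs)
    return mu

def w1(G, mu, nu):
    """Wasserstein-1 distance between two finitely supported measures via an LP."""
    supp_mu, supp_nu = list(mu), list(nu)
    C = np.array([[nx.shortest_path_length(G, a, b) for b in supp_nu] for a in supp_mu], float)
    m, n = C.shape
    A_eq, b_eq = [], []
    for i in range(m):                                    # row sums equal mu
        row = np.zeros(m * n); row[i * n:(i + 1) * n] = 1
        A_eq.append(row); b_eq.append(mu[supp_mu[i]])
    for j in range(n):                                    # column sums equal nu
        col = np.zeros(m * n); col[j::n] = 1
        A_eq.append(col); b_eq.append(nu[supp_nu[j]])
    res = linprog(C.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq), bounds=(0, None))
    return res.fun

x, y = next(iter(G.edges()))
kappa = 1.0 - w1(G, lazy_walk_measure(G, x), lazy_walk_measure(G, y)) / 1.0   # d(x, y) = 1 on an edge
print(f"Ollivier-Ricci curvature of edge ({x}, {y}): {kappa:.3f}")
```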

Distance Preserving Machine Learning for Uncertainty Aware Accelerator Capacitance Predictions

  • paper_url: http://arxiv.org/abs/2307.02367
  • repo_url: None
  • paper_authors: Steven Goldenberg, Malachi Schram, Kishansingh Rajput, Thomas Britton, Chris Pappas, Dan Lu, Jared Walden, Majdi I. Radaideh, Sarah Cousineau, Sudarshan Harave
  • for: The goal is to provide machine learning models with reliable uncertainty estimates, especially for safety-critical applications such as accelerator systems.
  • methods: Combines deep neural networks with Gaussian process approximation techniques and compares two feature extractors: the singular value decomposition and a spectral-normalized dense layer.
  • results: The model shows improved distance preservation and predicts in-distribution capacitance values with less than 1% error.
    Abstract Providing accurate uncertainty estimations is essential for producing reliable machine learning models, especially in safety-critical applications such as accelerator systems. Gaussian process models are generally regarded as the gold standard method for this task, but they can struggle with large, high-dimensional datasets. Combining deep neural networks with Gaussian process approximation techniques have shown promising results, but dimensionality reduction through standard deep neural network layers is not guaranteed to maintain the distance information necessary for Gaussian process models. We build on previous work by comparing the use of the singular value decomposition against a spectral-normalized dense layer as a feature extractor for a deep neural Gaussian process approximation model and apply it to a capacitance prediction problem for the High Voltage Converter Modulators in the Oak Ridge Spallation Neutron Source. Our model shows improved distance preservation and predicts in-distribution capacitance values with less than 1% error.
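
For a sense of why a spectral-normalized dense layer helps as a feature extractor, here is a minimal PyTorch sketch; the layer sizes and the distance-ratio check are illustrative assumptions, not the paper's architecture or metric.

```python
# A minimal sketch: a spectral-normalized dense feature extractor. Constraining
# each layer's spectral norm keeps the mapping roughly distance-preserving,
# which is what a downstream (approximate) Gaussian process head relies on.
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import spectral_norm

class SpectralNormExtractor(nn.Module):
    def __init__(self, in_dim: int, feat_dim: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            spectral_norm(nn.Linear(in_dim, 64)), nn.ReLU(),
            spectral_norm(nn.Linear(64, 64)), nn.ReLU(),
            spectral_norm(nn.Linear(64, feat_dim)),
        )

    def forward(self, x):
        return self.net(x)

# Quick distance-preservation check on random inputs: ratios of pairwise
# distances before and after the extractor should stay in a narrow band.
extractor = SpectralNormExtractor(in_dim=32)
x = torch.randn(128, 32)
z = extractor(x)
d_in, d_out = torch.cdist(x, x), torch.cdist(z, z)
mask = d_in > 0
print("median distance ratio:", (d_out[mask] / d_in[mask]).median().item())
```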

Scaling Laws Do Not Scale

  • paper_url: http://arxiv.org/abs/2307.03201
  • repo_url: https://github.com/MarkipTheMudkip/in-class-project-2
  • paper_authors: Fernando Diaz, Michael Madaio
  • for: This paper argues that the performance of large AI models may not continue to improve as datasets get larger, as different communities represented in the dataset may have values or preferences not captured by the metrics used to evaluate model performance.
  • methods: The paper highlights the potential risks of using scaling laws to evaluate the performance of AI models, as these laws overlook the possibility that different communities may have different values or preferences.
  • results: The paper suggests that as datasets used to train large AI models grow, the number of distinct communities included in the dataset is likely to increase, and these communities may have different values or preferences that are not captured by the metrics used to evaluate model performance.
    Abstract Recent work has proposed a power law relationship, referred to as ``scaling laws,'' between the performance of artificial intelligence (AI) models and aspects of those models' design (e.g., dataset size). In other words, as the size of a dataset (or model parameters, etc) increases, the performance of a given model trained on that dataset will correspondingly increase. However, while compelling in the aggregate, this scaling law relationship overlooks the ways that metrics used to measure performance may be precarious and contested, or may not correspond with how different groups of people may perceive the quality of models' output. In this paper, we argue that as the size of datasets used to train large AI models grows, the number of distinct communities (including demographic groups) whose data is included in a given dataset is likely to grow, each of whom may have different values. As a result, there is an increased risk that communities represented in a dataset may have values or preferences not captured by (or in the worst case, at odds with) the metrics used to evaluate model performance for scaling laws. We end the paper with implications for AI scaling laws -- that models may not, in fact, continue to improve as the datasets get larger -- at least not for all people or communities impacted by those models.

Decentralized Data Governance as Part of a Data Mesh Platform: Concepts and Approaches

  • paper_url: http://arxiv.org/abs/2307.02357
  • repo_url: None
  • paper_authors: Arif Wider, Sumedha Verma, Atif Akhtar
  • for: The paper presents a conceptual model for addressing decentralized data governance in a data mesh.
  • methods: The paper relies on automation provided by a self-service data infrastructure platform to manage the data mesh efficiently.
  • results: The paper proposes a conceptual model of key data mesh concepts and discusses approaches to drive decentralized governance through platform means.
    Abstract Data mesh is a socio-technical approach to decentralized analytics data management. To manage this decentralization efficiently, data mesh relies on automation provided by a self-service data infrastructure platform. A key aspect of this platform is to enable decentralized data governance. Because data mesh is a young approach, there is a lack of coherence in how data mesh concepts are interpreted in the industry, and almost no work on how a data mesh platform facilitates governance. This paper presents a conceptual model of key data mesh concepts and discusses different approaches to drive governance through platform means. The insights presented are drawn from concrete experiences of implementing a fully-functional data mesh platform that can be used as a reference on how to approach data mesh platform development.

LLQL: Logistic Likelihood Q-Learning for Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.02345
  • repo_url: None
  • paper_authors: Outongyi Lv, Bingxin Zhou, Yu Guang Wang
  • for: The study analyzes the distributional properties of the Bellman error in online and offline RL.
  • methods: Analyzes the distribution of the Bellman approximation error and, based on the observed distributions, replaces the normality assumption behind MSELoss with a Logistic maximum-likelihood loss, LLoss.
  • results: In the online setting the Bellman error follows a Logistic distribution, while in the offline setting it follows a constrained Logistic distribution that depends on the prior policy in the offline dataset. The rewards in an offline dataset should also follow a specific distribution to facilitate the offline objectives. Controlled-variable experiments on two Soft Actor-Critic variants in online and offline environments confirm these hypotheses and show that the variance of LLoss is smaller than that of MSELoss.
    Abstract Currently, research on Reinforcement learning (RL) can be broadly classified into two categories: online RL and offline RL. Both in online and offline RL, the primary focus of research on the Bellman error lies in the optimization techniques and performance improvement, rather than exploring the inherent structural properties of the Bellman error, such as distribution characteristics. In this study, we analyze the distribution of the Bellman approximation error in both online and offline settings. We find that in the online environment, the Bellman error follows a Logistic distribution, while in the offline environment, the Bellman error follows a constrained Logistic distribution, where the constrained distribution is dependent on the prior policy in the offline data set. Based on this finding, we have improved the MSELoss which is based on the assumption that the Bellman errors follow a normal distribution, and we utilized the Logistic maximum likelihood function to construct $\rm LLoss$ as an alternative loss function. In addition, we observed that the rewards in the offline data set should follow a specific distribution, which would facilitate the achievement of offline objectives. In our numerical experiments, we performed controlled variable corrections on the loss functions of two variants of Soft-Actor-Critic in both online and offline environments. The results confirmed our hypothesis regarding the online and offline settings, we also found that the variance of LLoss is smaller than MSELoss. Our research provides valuable insights for further investigations based on the distribution of Bellman errors.
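
A logistic negative log-likelihood on Bellman residuals is straightforward to write down as a drop-in replacement for the MSE loss. The sketch below assumes PyTorch and a fixed scale parameter `s`; the paper's exact parameterization of LLoss may differ.

```python
# A minimal sketch: Logistic negative log-likelihood on Bellman residuals as an
# alternative to MSE (which corresponds to a Gaussian error assumption).
import torch
import torch.nn.functional as F

def mse_bellman_loss(q_pred, q_target):
    return F.mse_loss(q_pred, q_target)

def logistic_bellman_loss(q_pred, q_target, s: float = 1.0):
    """NLL of delta = q_pred - q_target under a zero-mean Logistic(0, s)."""
    delta = (q_pred - q_target) / s
    # -log f(delta) = delta + 2 * softplus(-delta) + log s  (symmetric in delta)
    return (delta + 2.0 * F.softplus(-delta)).mean() + torch.log(torch.tensor(s))

# Usage with a detached TD target, as in standard Q-learning updates:
q_pred = torch.randn(256, requires_grad=True)
q_target = torch.randn(256)   # stands in for r + gamma * max_a' Q_target(s', a')
loss = logistic_bellman_loss(q_pred, q_target.detach(), s=1.0)
loss.backward()
```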

FAM: Relative Flatness Aware Minimization

  • paper_url: http://arxiv.org/abs/2307.02337
  • repo_url: https://github.com/kampmichael/RelativeFlatnessAndGeneralization
  • paper_authors: Linara Adilova, Amr Abourayya, Jianning Li, Amin Dada, Henning Petzka, Jan Egger, Jens Kleesiek, Michael Kamp
  • for: Improve model generalization.
  • methods: Uses a relative flatness measure, optimized through a simple regularization term.
  • results: FAM improves generalization across a range of applications and models, in both standard training and finetuning.
    Abstract Flatness of the loss curve around a model at hand has been shown to empirically correlate with its generalization ability. Optimizing for flatness has been proposed as early as 1994 by Hochreiter and Schmidthuber, and was followed by more recent successful sharpness-aware optimization techniques. Their widespread adoption in practice, though, is dubious because of the lack of theoretically grounded connection between flatness and generalization, in particular in light of the reparameterization curse - certain reparameterizations of a neural network change most flatness measures but do not change generalization. Recent theoretical work suggests that a particular relative flatness measure can be connected to generalization and solves the reparameterization curse. In this paper, we derive a regularizer based on this relative flatness that is easy to compute, fast, efficient, and works with arbitrary loss functions. It requires computing the Hessian only of a single layer of the network, which makes it applicable to large neural networks, and with it avoids an expensive mapping of the loss surface in the vicinity of the model. In an extensive empirical evaluation we show that this relative flatness aware minimization (FAM) improves generalization in a multitude of applications and models, both in finetuning and standard training. We make the code available at github.
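
As a rough illustration of a single-layer, Hessian-based flatness penalty, the sketch below (PyTorch) uses one Hutchinson probe of the Hessian trace with respect to one layer's weights, scaled by that layer's squared weight norm. This is a simplified proxy in the spirit described; the paper's relative flatness measure and its exact regularizer differ in detail.

```python
# A simplified flatness-style penalty from the Hessian of the loss w.r.t. a
# single layer's weights (one Hutchinson probe of the trace, scaled by ||w||^2).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))
feature_layer = model[2]          # penalize flatness w.r.t. this layer only
criterion = nn.CrossEntropyLoss()

def flatness_penalty(loss, layer):
    w = layer.weight
    grad = torch.autograd.grad(loss, w, create_graph=True)[0]
    v = torch.randn_like(w)                         # Hutchinson probe
    hvp = torch.autograd.grad((grad * v).sum(), w, create_graph=True)[0]
    trace_est = (hvp * v).sum()                     # E[v^T H v] = Tr(H)
    return (w.detach() ** 2).sum() * trace_est

x, y = torch.randn(32, 20), torch.randint(0, 10, (32,))
loss = criterion(model(x), y)
total = loss + 1e-3 * flatness_penalty(loss, feature_layer)
total.backward()
```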

Data-driven Predictive Latency for 5G: A Theoretical and Experimental Analysis Using Network Measurements

  • paper_url: http://arxiv.org/abs/2307.02329
  • repo_url: None
  • paper_authors: Marco Skocaj, Francesca Conserva, Nicol Sarcone Grande, Andrea Orsi, Davide Micheli, Giorgio Ghinamo, Simone Bizzarri, Roberto Verdone
  • for: This paper analyzes predictive latency within 5G networks using real-world network data available to mobile network operators.
  • methods: The paper derives an analytical formulation of user-plane latency as a Hypoexponential distribution, and presents experimental results on probabilistic regression, anomaly detection, and predictive forecasting using Machine Learning (ML) techniques such as Bayesian Learning (BL) and Machine Learning on Graphs (GML).
  • results: The paper provides valuable insights into the efficacy of predictive algorithms in practical applications, and validates the proposed framework using data gathered from scenarios of vehicular mobility, dense-urban traffic, and social gathering events.
    Abstract The advent of novel 5G services and applications with binding latency requirements and guaranteed Quality of Service (QoS) hastened the need to incorporate autonomous and proactive decision-making in network management procedures. The objective of our study is to provide a thorough analysis of predictive latency within 5G networks by utilizing real-world network data that is accessible to mobile network operators (MNOs). In particular, (i) we present an analytical formulation of the user-plane latency as a Hypoexponential distribution, which is validated by means of a comparative analysis with empirical measurements, and (ii) we conduct experimental results of probabilistic regression, anomaly detection, and predictive forecasting leveraging on emerging domains in Machine Learning (ML), such as Bayesian Learning (BL) and Machine Learning on Graphs (GML). We test our predictive framework using data gathered from scenarios of vehicular mobility, dense-urban traffic, and social gathering events. Our results provide valuable insights into the efficacy of predictive algorithms in practical applications.
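
The Hypoexponential model is just a sum of independent exponential stages with distinct rates, which is easy to simulate. The sketch below uses illustrative rates, not values estimated from the paper's measurements.

```python
# A minimal sketch: user-plane latency modeled as a Hypoexponential random
# variable, i.e., the sum of independent exponential stages with distinct rates.
import numpy as np

rng = np.random.default_rng(0)
rates = np.array([2.0, 5.0, 10.0])          # illustrative per-ms rates of the serial stages

def sample_hypoexponential(rates, size, rng):
    """Sum of independent Exp(rate_i) stages: one draw per stage, then summed."""
    stages = np.stack([rng.exponential(1.0 / r, size) for r in rates])
    return stages.sum(axis=0)

lat = sample_hypoexponential(rates, size=100_000, rng=rng)
print(f"mean latency   : {lat.mean():.3f} ms (theory {np.sum(1 / rates):.3f} ms)")
print(f"95th percentile: {np.quantile(lat, 0.95):.3f} ms")
```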

Exploring new ways: Enforcing representational dissimilarity to learn new features and reduce error consistency

  • paper_url: http://arxiv.org/abs/2307.02516
  • repo_url: None
  • paper_authors: Tassilo Wald, Constantin Ulrich, Fabian Isensee, David Zimmerer, Gregor Koehler, Michael Baumgartner, Klaus H. Maier-Hein
  • for: Improve the accuracy of model ensembles.
  • methods: Uses methods from the representational similarity field to promote dissimilarity between models during training.
  • results: Achieves higher ensemble accuracy, with less correlated output predictions and therefore fewer shared failure modes.
    Abstract Independently trained machine learning models tend to learn similar features. Given an ensemble of independently trained models, this results in correlated predictions and common failure modes. Previous attempts focusing on decorrelation of output predictions or logits yielded mixed results, particularly due to their reduction in model accuracy caused by conflicting optimization objectives. In this paper, we propose the novel idea of utilizing methods of the representational similarity field to promote dissimilarity during training instead of measuring similarity of trained models. To this end, we promote intermediate representations to be dissimilar at different depths between architectures, with the goal of learning robust ensembles with disjoint failure modes. We show that highly dissimilar intermediate representations result in less correlated output predictions and slightly lower error consistency, resulting in higher ensemble accuracy. With this, we shine first light on the connection between intermediate representations and their impact on the output predictions.
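
One common representational-similarity measure that could serve as such a penalty is linear CKA. The sketch below (PyTorch) computes it between the intermediate features of two ensemble members; the paper's actual choice of similarity measure, layers, and loss weighting may differ.

```python
# A minimal sketch: linear CKA between two networks' intermediate features,
# usable as a differentiable term that pushes representations apart.
import torch

def linear_cka(X: torch.Tensor, Y: torch.Tensor) -> torch.Tensor:
    """Linear CKA between feature matrices X (n, d1) and Y (n, d2), in [0, 1]."""
    X = X - X.mean(dim=0, keepdim=True)
    Y = Y - Y.mean(dim=0, keepdim=True)
    hsic = (Y.T @ X).pow(2).sum()
    norm_x = (X.T @ X).pow(2).sum().sqrt()
    norm_y = (Y.T @ Y).pow(2).sum().sqrt()
    return hsic / (norm_x * norm_y)

# Toy usage: representations of the same batch from two ensemble members.
feats_a = torch.randn(64, 128, requires_grad=True)
feats_b = torch.randn(64, 256, requires_grad=True)
dissimilarity_penalty = linear_cka(feats_a, feats_b)   # minimize to decorrelate
dissimilarity_penalty.backward()
```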

LOB-Based Deep Learning Models for Stock Price Trend Prediction: A Benchmark Study

  • paper_url: http://arxiv.org/abs/2308.01915
  • repo_url: None
  • paper_authors: Matteo Prata, Giuseppe Masi, Leonardo Berti, Viviana Arrigoni, Andrea Coletta, Irene Cannistraci, Svitlana Vyetrenko, Paola Velardi, Novella Bartolini
  • for: A study of the robustness and generalizability of deep learning models for Stock Price Trend Prediction (SPTP) based on Limit Order Book (LOB) data.
  • methods: We developed LOBCAST, an open-source framework covering data preprocessing, deep learning model training, evaluation, and profit analysis.
  • results: Extensive experiments show that all models exhibit a significant performance drop on new data, raising questions about their real-world market applicability. The work serves as a benchmark, exposing the potential and the limitations of current approaches and offering directions for innovative solutions.
    Abstract The recent advancements in Deep Learning (DL) research have notably influenced the finance sector. We examine the robustness and generalizability of fifteen state-of-the-art DL models focusing on Stock Price Trend Prediction (SPTP) based on Limit Order Book (LOB) data. To carry out this study, we developed LOBCAST, an open-source framework that incorporates data preprocessing, DL model training, evaluation and profit analysis. Our extensive experiments reveal that all models exhibit a significant performance drop when exposed to new data, thereby raising questions about their real-world market applicability. Our work serves as a benchmark, illuminating the potential and the limitations of current approaches and providing insight for innovative solutions.

Deep Contract Design via Discontinuous Piecewise Affine Neural Networks

  • paper_url: http://arxiv.org/abs/2307.02318
  • repo_url: None
  • paper_authors: Tonghan Wang, Paul Dütting, Dmitry Ivanov, Inbal Talgam-Cohen, David C. Parkes
  • for: Studies the application of deep learning to automated contract design.
  • methods: Uses deep learning to automatically design optimal contracts, introducing a novel representation, the Discontinuous ReLU (DeLU) network, to model the principal's utility function.
  • results: Experiments show that DeLU networks successfully approximate the principal's utility with a small number of training samples and scale to finding approximately optimal contracts via linear programming or interior-point methods.
    Abstract Contract design involves a principal who establishes contractual agreements about payments for outcomes that arise from the actions of an agent. In this paper, we initiate the study of deep learning for the automated design of optimal contracts. We formulate this as an offline learning problem, where a deep network is used to represent the principal's expected utility as a function of the design of a contract. We introduce a novel representation: the Discontinuous ReLU (DeLU) network, which models the principal's utility as a discontinuous piecewise affine function where each piece corresponds to the agent taking a particular action. DeLU networks implicitly learn closed-form expressions for the incentive compatibility constraints of the agent and the utility maximization objective of the principal, and support parallel inference on each piece through linear programming or interior-point methods that solve for optimal contracts. We provide empirical results that demonstrate success in approximating the principal's utility function with a small number of training samples and scaling to find approximately optimal contracts on problems with a large number of actions and outcomes.

Sumformer: Universal Approximation for Efficient Transformers

  • paper_url: http://arxiv.org/abs/2307.02301
  • repo_url: None
  • paper_authors: Silas Alberti, Niclas Dern, Laura Thesing, Gitta Kutyniok
  • for: Proposes a new sequence-to-sequence architecture to address the quadratic time and space complexity of Transformers with respect to sequence length.
  • methods: Introduces Sumformer, a novel and simple architecture, and uses it to establish universal approximation results for Linformer and Performer.
  • results: Provides the first universal approximation results for Linformer and Performer, and a new proof that a single attention layer suffices for Transformers to be universal approximators.
    Abstract Natural language processing (NLP) made an impressive jump with the introduction of Transformers. ChatGPT is one of the most famous examples, changing the perception of the possibilities of AI even outside the research community. However, besides the impressive performance, the quadratic time and space complexity of Transformers with respect to sequence length pose significant limitations for handling long sequences. While efficient Transformer architectures like Linformer and Performer with linear complexity have emerged as promising solutions, their theoretical understanding remains limited. In this paper, we introduce Sumformer, a novel and simple architecture capable of universally approximating equivariant sequence-to-sequence functions. We use Sumformer to give the first universal approximation results for Linformer and Performer. Moreover, we derive a new proof for Transformers, showing that just one attention layer is sufficient for universal approximation.
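
One natural reading of a sum-based efficient layer is to update every token from its own embedding plus a single summed summary of all tokens, which costs O(n) rather than O(n^2) in the sequence length. The sketch below is that reading only, not the paper's exact Sumformer definition.

```python
# A minimal sketch of a sum-aggregation sequence layer: psi(x_i, sum_j phi(x_j)).
# Dimensions and MLP shapes are illustrative assumptions.
import torch
import torch.nn as nn

class SumAggregationLayer(nn.Module):
    def __init__(self, d_model: int, d_summary: int = 64):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(d_model, d_summary), nn.ReLU(),
                                 nn.Linear(d_summary, d_summary))
        self.psi = nn.Sequential(nn.Linear(d_model + d_summary, d_model), nn.ReLU(),
                                 nn.Linear(d_model, d_model))

    def forward(self, x):                                     # x: (batch, seq_len, d_model)
        summary = self.phi(x).sum(dim=1, keepdim=True)        # (batch, 1, d_summary)
        summary = summary.expand(-1, x.size(1), -1)           # broadcast to every token
        return self.psi(torch.cat([x, summary], dim=-1))      # (batch, seq_len, d_model)

layer = SumAggregationLayer(d_model=32)
out = layer(torch.randn(4, 1000, 32))
print(out.shape)   # torch.Size([4, 1000, 32])
```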

Improving Address Matching using Siamese Transformer Networks

  • paper_url: http://arxiv.org/abs/2307.02300
  • repo_url: https://github.com/avduarte333/adress-matching
  • paper_authors: André V. Duarte, Arlindo L. Oliveira
  • for: Improve the efficiency of address matching for Portuguese addresses and reduce the risk of incorrect deliveries.
  • methods: Uses deep learning models: a bi-encoder to embed Portuguese postal addresses and retrieve candidates, and a cross-encoder to rerank them with high precision.
  • results: On a real case of Portuguese addresses the model exceeds 95% accuracy at the door level, and with GPU computation inference is about 4.5 times faster than traditional approaches such as BM25.
    Abstract Matching addresses is a critical task for companies and post offices involved in the processing and delivery of packages. The ramifications of incorrectly delivering a package to the wrong recipient are numerous, ranging from harm to the company's reputation to economic and environmental costs. This research introduces a deep learning-based model designed to increase the efficiency of address matching for Portuguese addresses. The model comprises two parts: (i) a bi-encoder, which is fine-tuned to create meaningful embeddings of Portuguese postal addresses, utilized to retrieve the top 10 likely matches of the un-normalized target address from a normalized database, and (ii) a cross-encoder, which is fine-tuned to accurately rerank the 10 addresses obtained by the bi-encoder. The model has been tested on a real-case scenario of Portuguese addresses and exhibits a high degree of accuracy, exceeding 95% at the door level. When utilized with GPU computations, the inference speed is about 4.5 times quicker than other traditional approaches such as BM25. An implementation of this system in a real-world scenario would substantially increase the effectiveness of the distribution process. Such an implementation is currently under investigation.
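
The retrieve-then-rerank pipeline maps naturally onto the sentence-transformers library. The sketch below uses generic multilingual checkpoints as placeholders; the paper fine-tunes its own bi-encoder and cross-encoder on Portuguese postal addresses.

```python
# A minimal sketch: bi-encoder retrieval of the top-10 normalized addresses,
# followed by cross-encoder reranking. Model names are placeholder checkpoints.
from sentence_transformers import SentenceTransformer, CrossEncoder, util

bi_encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

normalized_db = [
    "Rua de Santa Catarina 100, 4000-447 Porto",
    "Avenida da Liberdade 10, 1250-144 Lisboa",
    "Praca do Comercio 1, 1100-148 Lisboa",
    # ... the full normalized address database goes here
]
db_embeddings = bi_encoder.encode(normalized_db, convert_to_tensor=True)

def match_address(raw_address: str, top_k: int = 10):
    # Stage 1: bi-encoder retrieval of the top-k candidates.
    query_emb = bi_encoder.encode(raw_address, convert_to_tensor=True)
    hits = util.semantic_search(query_emb, db_embeddings, top_k=top_k)[0]
    candidates = [normalized_db[h["corpus_id"]] for h in hits]
    # Stage 2: cross-encoder reranking of the (query, candidate) pairs.
    scores = cross_encoder.predict([(raw_address, c) for c in candidates])
    return max(zip(candidates, scores), key=lambda t: t[1])

print(match_address("av liberdade 10 lisboa"))
```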

Meta-Learning Adversarial Bandit Algorithms

  • paper_url: http://arxiv.org/abs/2307.02295
  • repo_url: None
  • paper_authors: Mikhail Khodak, Ilya Osadchiy, Keegan Harris, Maria-Florina Balcan, Kfir Y. Levy, Ron Meir, Zhiwei Steven Wu
  • for: Improve performance across multiple bandit tasks when the tasks are similar according to some natural similarity measure.
  • methods: Designs meta-algorithms in which an outer learner simultaneously tunes the initialization and other hyperparameters of an inner learner, for two important cases: multi-armed bandits (MAB) and bandit linear optimization (BLO).
  • results: Proves that unregularized follow-the-leader combined with two levels of low-dimensional hyperparameter tuning suffices to learn a sequence of affine functions of non-Lipschitz and sometimes non-convex Bregman divergences bounding the regret of online mirror descent (OMD).
    Abstract We study online meta-learning with bandit feedback, with the goal of improving performance across multiple tasks if they are similar according to some natural similarity measure. As the first to target the adversarial online-within-online partial-information setting, we design meta-algorithms that combine outer learners to simultaneously tune the initialization and other hyperparameters of an inner learner for two important cases: multi-armed bandits (MAB) and bandit linear optimization (BLO). For MAB, the meta-learners initialize and set hyperparameters of the Tsallis-entropy generalization of Exp3, with the task-averaged regret improving if the entropy of the optima-in-hindsight is small. For BLO, we learn to initialize and tune online mirror descent (OMD) with self-concordant barrier regularizers, showing that task-averaged regret varies directly with an action space-dependent measure they induce. Our guarantees rely on proving that unregularized follow-the-leader combined with two levels of low-dimensional hyperparameter tuning is enough to learn a sequence of affine functions of non-Lipschitz and sometimes non-convex Bregman divergences bounding the regret of OMD.

Absorbing Phase Transitions in Artificial Deep Neural Networks

  • paper_url: http://arxiv.org/abs/2307.02284
  • repo_url: None
  • paper_authors: Keiichi Tamai, Tsuyoshi Okubo, Truong Vinh Truong Duy, Naotake Natori, Synge Todo
  • for: Understand the behavior of artificial deep neural networks.
  • methods: Uses mean-field theory and the study of absorbing phase transitions.
  • results: Shows that the behavior of properly initialized networks can be understood in terms of critical phenomena, and that differences in architecture are reflected in the universality class of the order-to-chaos transition.
    Abstract Theoretical understanding of the behavior of infinitely-wide neural networks has been rapidly developed for various architectures due to the celebrated mean-field theory. However, there is a lack of a clear, intuitive framework for extending our understanding to finite networks that are of more practical and realistic importance. In the present contribution, we demonstrate that the behavior of properly initialized neural networks can be understood in terms of universal critical phenomena in absorbing phase transitions. More specifically, we study the order-to-chaos transition in the fully-connected feedforward neural networks and the convolutional ones to show that (i) there is a well-defined transition from the ordered state to the chaotics state even for the finite networks, and (ii) difference in architecture is reflected in that of the universality class of the transition. Remarkably, the finite-size scaling can also be successfully applied, indicating that intuitive phenomenological argument could lead us to semi-quantitative description of the signal propagation dynamics.
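
The order-to-chaos transition the paper studies can be seen in a few lines of NumPy by propagating two nearby inputs through a deep random tanh network at different weight variances; the widths, depth, and variances below are illustrative, not the paper's settings.

```python
# A minimal sketch: signal propagation through a deep random tanh network.
# Below the critical weight scale the distance between two nearby inputs
# shrinks with depth (ordered phase); above it, the distance grows (chaotic phase).
import numpy as np

rng = np.random.default_rng(0)
width, depth = 500, 50

def propagate_distance(sigma_w, eps=1e-3):
    x1 = rng.standard_normal(width)
    x2 = x1 + eps * rng.standard_normal(width)       # nearby perturbed input
    for _ in range(depth):
        W = rng.standard_normal((width, width)) * sigma_w / np.sqrt(width)
        x1, x2 = np.tanh(W @ x1), np.tanh(W @ x2)    # same weights for both inputs
    return np.linalg.norm(x1 - x2)

for sigma_w in [0.5, 1.0, 1.5, 2.0]:
    print(f"sigma_w = {sigma_w:.1f}  ->  final distance {propagate_distance(sigma_w):.2e}")
```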

From NeurODEs to AutoencODEs: a mean-field control framework for width-varying Neural Networks

  • paper_url: http://arxiv.org/abs/2307.02279
  • repo_url: None
  • paper_authors: Cristina Cipriani, Massimo Fornasier, Alessandro Scagliotti
  • for: This paper extends the mean-field control framework originally devised for NeurODEs to continuous-time Autoencoders (AutoencODEs), handling low Tikhonov regularization and potentially non-convex cost landscapes.
  • methods: The paper proposes a modification of the controlled field in the AutoencODE to enable the extension of the mean-field control framework, and develops a training method tailored to this specific type of Autoencoders with residual connections.
  • results: The paper shows that many of the global results obtained for high Tikhonov regularization can be recovered in regions where the loss function is locally convex, and validates the approach through numerical experiments conducted on various examples.
    Abstract The connection between Residual Neural Networks (ResNets) and continuous-time control systems (known as NeurODEs) has led to a mathematical analysis of neural networks which has provided interesting results of both theoretical and practical significance. However, by construction, NeurODEs have been limited to describing constant-width layers, making them unsuitable for modeling deep learning architectures with layers of variable width. In this paper, we propose a continuous-time Autoencoder, which we call AutoencODE, based on a modification of the controlled field that drives the dynamics. This adaptation enables the extension of the mean-field control framework originally devised for conventional NeurODEs. In this setting, we tackle the case of low Tikhonov regularization, resulting in potentially non-convex cost landscapes. While the global results obtained for high Tikhonov regularization may not hold globally, we show that many of them can be recovered in regions where the loss function is locally convex. Inspired by our theoretical findings, we develop a training method tailored to this specific type of Autoencoders with residual connections, and we validate our approach through numerical experiments conducted on various examples.

First-Explore, then Exploit: Meta-Learning Intelligent Exploration

  • paper_url: http://arxiv.org/abs/2307.02276
  • repo_url: https://github.com/btnorman/First-Explore
  • paper_authors: Ben Norman, Jeff Clune
  • for: The paper aims to address the issue of intelligent exploration in reinforcement learning (RL) agents, which have been limited by the conflict between exploration and exploitation.
  • methods: The proposed First-Explore framework consists of two policies: one for exploration and one for exploitation. The explore policy learns to explore the environment, while the exploit policy learns to exploit the learned knowledge.
  • results: The paper demonstrates that First-Explore can learn intelligent exploration strategies such as exhaustive search and outperforms dominant standard RL and meta-RL approaches on domains where exploration requires sacrificing reward.
    Abstract Standard reinforcement learning (RL) agents never intelligently explore like a human (i.e. by taking into account complex domain priors and previous explorations). Even the most basic intelligent exploration strategies such as exhaustive search are only inefficiently or poorly approximated by approaches such as novelty search or intrinsic motivation, let alone more complicated strategies like learning new skills, climbing stairs, opening doors, or conducting experiments. This lack of intelligent exploration limits sample efficiency and prevents solving hard exploration domains. We argue a core barrier prohibiting many RL approaches from learning intelligent exploration is that the methods attempt to explore and exploit simultaneously, which harms both exploration and exploitation as the goals often conflict. We propose a novel meta-RL framework (First-Explore) with two policies: one policy learns to only explore and one policy learns to only exploit. Once trained, we can then explore with the explore policy, for as long as desired, and then exploit based on all the information gained during exploration. This approach avoids the conflict of trying to do both exploration and exploitation at once. We demonstrate that First-Explore can learn intelligent exploration strategies such as exhaustive search and more, and that it outperforms dominant standard RL and meta-RL approaches on domains where exploration requires sacrificing reward. First-Explore is a significant step towards creating meta-RL algorithms capable of learning human-level exploration which is essential to solve challenging unseen hard-exploration domains.
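
The two-policy decomposition is easy to illustrate on a toy bandit: explore for as long as desired, then exploit on everything gathered. In the sketch below both policies are hard-coded (exhaustive sweep, then greedy), whereas First-Explore meta-learns them, so this shows only the structure, not the method.

```python
# A minimal sketch of the explore-then-exploit decomposition on a toy bandit.
import numpy as np

rng = np.random.default_rng(0)
true_means = rng.uniform(0, 1, size=10)            # one bandit task

def explore_policy(n_rounds):
    """Exhaustive sweep: pull each arm in turn, record the observed rewards."""
    history = []
    for t in range(n_rounds):
        arm = t % len(true_means)
        history.append((arm, rng.normal(true_means[arm], 0.1)))
    return history

def exploit_policy(history, n_rounds):
    """Act greedily on the exploration history (no further exploration)."""
    sums, counts = np.zeros(len(true_means)), np.zeros(len(true_means))
    for arm, reward in history:
        sums[arm] += reward; counts[arm] += 1
    best_arm = int(np.argmax(sums / np.maximum(counts, 1)))
    return sum(rng.normal(true_means[best_arm], 0.1) for _ in range(n_rounds))

history = explore_policy(n_rounds=50)              # explore first, for as long as desired
total_reward = exploit_policy(history, n_rounds=200)
print(f"exploit-phase reward: {total_reward:.1f} (oracle ~{200 * true_means.max():.1f})")
```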

Convolutions Through the Lens of Tensor Networks

  • paper_url: http://arxiv.org/abs/2307.02275
  • repo_url: None
  • paper_authors: Felix Dangel
  • for: Explores how tensor networks (TNs) can be used to understand convolutional layers.
  • methods: Uses tensor networks to reason about the tensor multiplications underlying convolutions by drawing diagrams and manipulating them to perform function transformations, sub-tensor access, and fusion.
  • results: Demonstrates the expressive power of TNs by deriving diagrams for various autodiff operations and popular second-order approximations, provides convolution-specific transformations based on the connectivity pattern that simplify and speed up diagram evaluation, and shows strong computational performance, speeding up a recently proposed KFAC variant by up to 4.5x.
    Abstract Despite their simple intuition, convolutions are more tedious to analyze than dense layers, which complicates the generalization of theoretical and algorithmic ideas. We provide a new perspective onto convolutions through tensor networks (TNs) which allow reasoning about the underlying tensor multiplications by drawing diagrams, and manipulating them to perform function transformations, sub-tensor access, and fusion. We demonstrate this expressive power by deriving the diagrams of various autodiff operations and popular approximations of second-order information with full hyper-parameter support, batching, channel groups, and generalization to arbitrary convolution dimensions. Further, we provide convolution-specific transformations based on the connectivity pattern which allow to re-wire and simplify diagrams before evaluation. Finally, we probe computational performance, relying on established machinery for efficient TN contraction. Our TN implementation speeds up a recently-proposed KFAC variant up to 4.5x and enables new hardware-efficient tensor dropout for approximate backpropagation.
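
To make the "convolution as tensor multiplication" view concrete, the sketch below (PyTorch) writes a 2D convolution as an explicit unfold-plus-einsum contraction and checks it against `F.conv2d`; the paper itself works at the level of full tensor-network diagrams and their transformations.

```python
# A minimal sketch: 2D convolution as an explicit tensor contraction.
import torch
import torch.nn.functional as F

x = torch.randn(2, 3, 8, 8)              # (batch, in_channels, H, W)
weight = torch.randn(5, 3, 3, 3)         # (out_channels, in_channels, kH, kW)

ref = F.conv2d(x, weight)                # reference result from the library

patches = F.unfold(x, kernel_size=3)                        # (batch, C*kH*kW, L)
patches = patches.view(2, 3, 3, 3, -1)                       # (batch, C, kH, kW, L)
out = torch.einsum("ocij,bcijl->bol", weight, patches)       # contract C, kH, kW
out = out.view(2, 5, 6, 6)                                   # fold L back to (H', W')

print(torch.allclose(ref, out, atol=1e-5))                   # True
```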

  • paper_url: http://arxiv.org/abs/2307.02263
  • repo_url: None
  • paper_authors: Jianxiang Luo, Junyi Hu, Tianji Pang, Weihao Huang, Chuang Liu
  • for: Improve the efficiency and interpretability of neural architecture search and guarantee fairness in module evaluation.
  • methods: A novel neural architecture search algorithm based on dynamical isometry, using fixed-point analysis from mean-field theory to study the steady-state dynamics of random neural networks and proving that the module selection strategy is rigorously fair.
  • results: Extensive experiments on ImageNet classification show that, at the same model size, the searched architecture achieves state-of-the-art top-1 validation accuracy, with better and more stable training performance and no loss of generality.
    Abstract Recently, the weight-sharing technique has significantly speeded up the training and evaluation procedure of neural architecture search. However, most existing weight-sharing strategies are solely based on experience or observation, which makes the searching results lack interpretability and rationality. In addition, due to the negligence of fairness, current methods are prone to make misjudgments in module evaluation. To address these problems, we propose a novel neural architecture search algorithm based on dynamical isometry. We use the fix point analysis method in the mean field theory to analyze the dynamics behavior in the steady state random neural network, and how dynamic isometry guarantees the fairness of weight-sharing based NAS. Meanwhile, we prove that our module selection strategy is rigorous fair by estimating the generalization error of all modules with well-conditioned Jacobian. Extensive experiments show that, with the same size, the architecture searched by the proposed method can achieve state-of-the-art top-1 validation accuracy on ImageNet classification. In addition, we demonstrate that our method is able to achieve better and more stable training performance without loss of generality.
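
Dynamical isometry itself is easy to probe: look at the singular values of a module's input-output Jacobian at initialization and check how tightly they concentrate around one. The sketch below does only that check; how such a criterion enters the paper's search and weight-sharing procedure is specific to the paper.

```python
# A minimal sketch: checking how close a candidate module is to dynamical
# isometry via the singular values of its input-output Jacobian.
import torch
import torch.nn as nn

module = nn.Sequential(nn.Linear(64, 64), nn.Tanh(), nn.Linear(64, 64))

x = torch.randn(64)
jac = torch.autograd.functional.jacobian(module, x)       # (64, 64) Jacobian
singular_values = torch.linalg.svdvals(jac)

print(f"mean singular value: {singular_values.mean():.3f}")
print(f"spread (max - min) : {(singular_values.max() - singular_values.min()):.3f}")
```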

Multivariate Time Series Classification: A Deep Learning Approach

  • paper_url: http://arxiv.org/abs/2307.02253
  • repo_url: https://github.com/radrumond/timehetnet
  • paper_authors: Mohamed Abouelnaga, Julien Vitay, Aida Farahani
  • for: Investigates different methods and neural network architectures applicable to time series classification.
  • methods: Uses Fully Convolutional Networks (FCN) and Long Short-Term Memory (LSTM) for supervised learning, and Recurrent Autoencoders for semi-supervised learning.
  • results: Analyzes the effect of parameters such as sequence length and compares the methods using metrics such as precision and recall to identify which technique best suits the problem.
    Abstract This paper investigates different methods and various neural network architectures applicable in the time series classification domain. The data is obtained from a fleet of gas sensors that measure and track quantities such as oxygen and sound. With the help of this data, we can detect events such as occupancy in a specific environment. At first, we analyze the time series data to understand the effect of different parameters, such as the sequence length, when training our models. These models employ Fully Convolutional Networks (FCN) and Long Short-Term Memory (LSTM) for supervised learning and Recurrent Autoencoders for semisupervised learning. Throughout this study, we spot the differences between these methods based on metrics such as precision and recall identifying which technique best suits this problem.
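
As a reference point for the supervised branch, here is a minimal FCN-style classifier for multivariate time series in PyTorch; channel counts, kernel sizes, and the number of classes are placeholders rather than the study's configuration.

```python
# A minimal sketch: a small FCN classifier for multivariate time series
# of shape (batch, channels, length).
import torch
import torch.nn as nn

class FCNClassifier(nn.Module):
    def __init__(self, in_channels: int, n_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(in_channels, 64, kernel_size=8, padding="same"), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=5, padding="same"), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 64, kernel_size=3, padding="same"), nn.BatchNorm1d(64), nn.ReLU(),
        )
        self.head = nn.Linear(64, n_classes)

    def forward(self, x):                       # x: (batch, channels, length)
        h = self.features(x).mean(dim=-1)       # global average pooling over time
        return self.head(h)

model = FCNClassifier(in_channels=4, n_classes=2)   # e.g., 4 gas-sensor channels, occupancy yes/no
logits = model(torch.randn(8, 4, 300))
print(logits.shape)   # torch.Size([8, 2])
```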

RanPAC: Random Projections and Pre-trained Models for Continual Learning

  • paper_url: http://arxiv.org/abs/2307.02251
  • repo_url: None
  • paper_authors: Mark D. McDonnell, Dong Gong, Amin Parveneh, Ehsan Abbasnejad, Anton van den Hengel
  • for: Address continual learning (CL) on non-stationary data streams, incrementally learning new tasks without forgetting previously learned ones.
  • methods: Builds on pre-trained models applied to various downstream requirements. Prior methods either use pre-extracted features directly or rely on adaptors, both of which can suffer from forgetting; this work proposes a concise and effective method that injects interactions between features to improve linear separability for class-prototype-based CL while avoiding forgetting.
  • results: Injecting a frozen Random Projection layer with nonlinear activation between the pre-trained model's feature representations and the output head increases feature interactions, improves linear separability, and avoids forgetting; decorrelating the class prototypes further reduces distribution disparity. On seven class-incremental benchmarks these techniques substantially reduce final error rates compared to previous methods, without any rehearsal memory.
    Abstract Continual learning (CL) aims to incrementally learn different tasks (such as classification) in a non-stationary data stream without forgetting old ones. Most CL works focus on tackling catastrophic forgetting under a learning-from-scratch paradigm. However, with the increasing prominence of foundation models, pre-trained models equipped with informative representations have become available for various downstream requirements. Several CL methods based on pre-trained models have been explored, either utilizing pre-extracted features directly (which makes bridging distribution gaps challenging) or incorporating adaptors (which may be subject to forgetting). In this paper, we propose a concise and effective approach for CL with pre-trained models. Given that forgetting occurs during parameter updating, we contemplate an alternative approach that exploits training-free random projectors and class-prototype accumulation, which thus bypasses the issue. Specifically, we inject a frozen Random Projection layer with nonlinear activation between the pre-trained model's feature representations and output head, which captures interactions between features with expanded dimensionality, providing enhanced linear separability for class-prototype-based CL. We also demonstrate the importance of decorrelating the class-prototypes to reduce the distribution disparity when using pre-trained representations. These techniques prove to be effective and circumvent the problem of forgetting for both class- and domain-incremental continual learning. Compared to previous methods applied to pre-trained ViT-B/16 models, we reduce final error rates by between 10\% and 62\% on seven class-incremental benchmark datasets, despite not using any rehearsal memory. We conclude that the full potential of pre-trained models for simple, effective, and fast continual learning has not hitherto been fully tapped.
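
The core mechanism, a frozen nonlinear random projection on top of pre-trained features plus class-prototype accumulation, can be sketched in a few lines. The version below classifies by nearest (cosine) prototype and omits the prototype decorrelation step, so it is a simplified illustration rather than RanPAC itself.

```python
# A simplified, training-free sketch: frozen random projection + ReLU on
# pre-trained features, per-class prototype accumulation, cosine classification.
import torch
import torch.nn.functional as F

feat_dim, proj_dim, n_classes = 768, 2048, 100     # e.g., ViT-B/16 feature size

W = torch.randn(feat_dim, proj_dim) / feat_dim ** 0.5   # frozen random projector
proto_sum = torch.zeros(n_classes, proj_dim)
proto_cnt = torch.zeros(n_classes)

def project(features):                       # features: (batch, feat_dim) from a frozen backbone
    return torch.relu(features @ W)          # expanded, nonlinearly mixed representation

def accumulate(features, labels):            # called task by task, no gradient updates
    h = project(features)
    proto_sum.index_add_(0, labels, h)
    proto_cnt.index_add_(0, labels, torch.ones_like(labels, dtype=torch.float))

def predict(features):
    prototypes = proto_sum / proto_cnt.clamp(min=1).unsqueeze(1)
    sims = F.normalize(project(features), dim=1) @ F.normalize(prototypes, dim=1).T
    return sims.argmax(dim=1)

# Toy usage with random stand-in "pre-trained" features for one incremental task.
feats, labels = torch.randn(64, feat_dim), torch.randint(0, 10, (64,))
accumulate(feats, labels)
print(predict(feats[:5]))
```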

Set Learning for Accurate and Calibrated Models

  • paper_url: http://arxiv.org/abs/2307.02245
  • repo_url: https://github.com/lukasmut/oko
  • paper_authors: Lukas Muttenthaler, Robert A. Vandermeulen, Qiuyi Zhang, Thomas Unterthiner, Klaus-Robert Müller
  • for: Alleviate model overconfidence and poor calibration, improving both accuracy and calibration, especially with limited training data and class imbalance.
  • methods: Proposes odd-$k$-out learning (OKO), which minimizes the cross-entropy error over sets rather than single examples, allowing the model to capture correlations across data examples and improving both accuracy and calibration.
  • results: OKO improves accuracy and calibration in limited-data and class-imbalanced regimes, often without additional calibration parameter tuning such as temperature scaling; theoretical justification and extensive experimental analyses corroborate its effectiveness.
    Abstract Model overconfidence and poor calibration are common in machine learning and difficult to account for when applying standard empirical risk minimization. In this work, we propose a novel method to alleviate these problems that we call odd-$k$-out learning (OKO), which minimizes the cross-entropy error for sets rather than for single examples. This naturally allows the model to capture correlations across data examples and achieves both better accuracy and calibration, especially in limited training data and class-imbalanced regimes. Perhaps surprisingly, OKO often yields better calibration even when training with hard labels and dropping any additional calibration parameter tuning, such as temperature scaling. We provide theoretical justification, establishing that OKO naturally yields better calibration, and provide extensive experimental analyses that corroborate our theoretical findings. We emphasize that OKO is a general framework that can be easily adapted to many settings and the trained model can be applied to single examples at inference time, without introducing significant run-time overhead or architecture changes.
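
As a purely illustrative reading of "cross-entropy over sets", the sketch below sums a model's logits over the members of a small set and scores the sum against the set's majority label; the exact set construction and aggregation used by OKO are specified in the paper and its repository.

```python
# An illustrative sketch of set-level cross-entropy (not OKO's exact recipe):
# aggregate logits over a set of examples and score against the majority class.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))

def set_cross_entropy(x_set, majority_label):
    """x_set: (set_size, features); majority_label: scalar class index."""
    summed_logits = model(x_set).sum(dim=0, keepdim=True)    # aggregate over the set
    return F.cross_entropy(summed_logits, majority_label.view(1))

# Toy set: three examples of class 3 plus one "odd" example of another class.
x_set = torch.randn(4, 32)
loss = set_cross_entropy(x_set, torch.tensor(3))
loss.backward()
```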

Knowledge-Guided Additive Modeling For Supervised Regression

  • paper_url: http://arxiv.org/abs/2307.02229
  • repo_url: https://github.com/yannclaes/kg-regression
  • paper_authors: Yann Claes, Vân Anh Huynh-Thu, Pierre Geurts
  • for: Assess the performance of hybrid models on standard regression problems and compare them with traditional machine learning methods.
  • methods: Focuses on hybrid models that additively combine a parametric physical term with a machine learning term, and investigates model-agnostic training procedures.
  • results: Comparisons of several approaches on synthetic and real regression problems show the advantages of hybrid models for both global performance and parameter identification.
    Abstract Learning processes by exploiting restricted domain knowledge is an important task across a plethora of scientific areas, with more and more hybrid methods combining data-driven and model-based approaches. However, while such hybrid methods have been tested in various scientific applications, they have been mostly tested on dynamical systems, with only limited study about the influence of each model component on global performance and parameter identification. In this work, we assess the performance of hybrid modeling against traditional machine learning methods on standard regression problems. We compare, on both synthetic and real regression problems, several approaches for training such hybrid models. We focus on hybrid methods that additively combine a parametric physical term with a machine learning term and investigate model-agnostic training procedures. We also introduce a new hybrid approach based on partial dependence functions. Experiments are carried out with different types of machine learning models, including tree-based models and artificial neural networks.
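
A minimal instance of the additive hybrid setup is a known parametric physical term plus an ML model fitted on its residuals. The sketch below uses a linear physical term, a random forest, and sequential training as illustrative choices; the paper compares several training procedures, including joint ones.

```python
# A minimal sketch: additive hybrid regressor y ≈ f_phys(x; theta) + f_ML(x),
# trained sequentially (fit the physical parameter, then the ML residual model).
import numpy as np
from scipy.optimize import curve_fit
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 2))
y = 1.5 * X[:, 0] + np.sin(2 * X[:, 1]) + 0.1 * rng.standard_normal(500)   # physics + unknown effect

def f_phys(x, a):                       # known parametric physical term: a * x_0
    return a * x[:, 0]

theta, _ = curve_fit(f_phys, X, y)      # step 1: identify the physical parameter
residual_model = RandomForestRegressor(n_estimators=200, random_state=0)
residual_model.fit(X, y - f_phys(X, *theta))    # step 2: ML term on the residuals

y_hat = f_phys(X, *theta) + residual_model.predict(X)
print(f"estimated physical parameter a = {theta[0]:.3f} (true 1.5)")
print(f"train RMSE: {np.sqrt(np.mean((y - y_hat) ** 2)):.3f}")
```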

Personalized Federated Learning via Amortized Bayesian Meta-Learning

  • paper_url: http://arxiv.org/abs/2307.02222
  • repo_url: None
  • paper_authors: Shiyu Liu, Shaogao Lv, Dun Zeng, Zenglin Xu, Hui Wang, Yue Yu
  • for: Address the challenge of statistical heterogeneity in federated learning, where multiple clients collaborate to learn a global model without exposing their private data.
  • methods: Proposes a new personalized federated learning method, \emph{FedABML}, which employs hierarchical variational inference across clients. The global prior aims to capture representations of common intrinsic structure shared by heterogeneous clients, which are then transferred to each client's specific task so that accurate client-specific approximate posteriors can be generated with a few local updates.
  • results: The theoretical analysis provides an upper bound on the average generalization error, guaranteeing generalization performance on unseen data; empirical results show that \emph{FedABML} outperforms several competitive baselines.
    Abstract Federated learning is a decentralized and privacy-preserving technique that enables multiple clients to collaborate with a server to learn a global model without exposing their private data. However, the presence of statistical heterogeneity among clients poses a challenge, as the global model may struggle to perform well on each client's specific task. To address this issue, we introduce a new perspective on personalized federated learning through Amortized Bayesian Meta-Learning. Specifically, we propose a novel algorithm called \emph{FedABML}, which employs hierarchical variational inference across clients. The global prior aims to capture representations of common intrinsic structures from heterogeneous clients, which can then be transferred to their respective tasks and aid in the generation of accurate client-specific approximate posteriors through a few local updates. Our theoretical analysis provides an upper bound on the average generalization error and guarantees the generalization performance on unseen data. Finally, several empirical results are implemented to demonstrate that \emph{FedABML} outperforms several competitive baselines.

On the Adversarial Robustness of Generative Autoencoders in the Latent Space

  • paper_url: http://arxiv.org/abs/2307.02202
  • repo_url: None
  • paper_authors: Mingfei Lu, Badong Chen
  • for: This paper focuses on the adversarial robustness of generative autoencoders, specifically in the latent space.
  • methods: The authors use various attacks in the latent space to demonstrate the vulnerability of popular generative autoencoders. They also compare the performance of variational autoencoders with their deterministic variants and observe that the latter have better latent robustness.
  • results: The authors find that there is a trade-off between adversarial robustness and the degree of disentanglement of the latent codes. They also show that adversarial training can improve the latent robustness of VAEs.
    Abstract The generative autoencoders, such as the variational autoencoders or the adversarial autoencoders, have achieved great success in lots of real-world applications, including image generation, and signal communication. However, little concern has been devoted to their robustness during practical deployment. Due to the probabilistic latent structure, variational autoencoders (VAEs) may confront problems such as a mismatch between the posterior distribution of the latent and real data manifold, or discontinuity in the posterior distribution of the latent. This leaves a back door for malicious attackers to collapse VAEs from the latent space, especially in scenarios where the encoder and decoder are used separately, such as communication and compressed sensing. In this work, we provide the first study on the adversarial robustness of generative autoencoders in the latent space. Specifically, we empirically demonstrate the latent vulnerability of popular generative autoencoders through attacks in the latent space. We also evaluate the difference between variational autoencoders and their deterministic variants and observe that the latter performs better in latent robustness. Meanwhile, we identify a potential trade-off between the adversarial robustness and the degree of the disentanglement of the latent codes. Additionally, we also verify the feasibility of improvement for the latent robustness of VAEs through adversarial training. In summary, we suggest concerning the adversarial latent robustness of the generative autoencoders, analyze several robustness-relative issues, and give some insights into a series of key challenges.
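
A latent-space attack of the kind studied here can be sketched as a small gradient-based search for a perturbation of the latent code that maximally distorts the decoder output. The loss, step rule, and budget below are illustrative assumptions; the paper evaluates several attack formulations.

```python
import torch
import torch.nn.functional as F

def latent_attack(decoder, z, steps=50, eps=0.1, lr=0.01):
    """Find a bounded latent perturbation that maximizes output distortion."""
    x_ref = decoder(z).detach()
    delta = torch.zeros_like(z, requires_grad=True)
    for _ in range(steps):
        loss = -F.mse_loss(decoder(z + delta), x_ref)   # negative: we maximize distortion
        loss.backward()
        with torch.no_grad():
            delta -= lr * delta.grad.sign()
            delta.clamp_(-eps, eps)                     # keep the perturbation small
        delta.grad.zero_()
    return (z + delta).detach()
```

In a split encoder/decoder deployment (e.g., communication or compressed sensing), such a perturbed code can be injected between the two parts, which is the scenario the paper highlights.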

ChiENN: Embracing Molecular Chirality with Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2307.02198
  • repo_url: https://github.com/gmum/chienn
  • paper_authors: Piotr Gaiński, Michał Koziarski, Jacek Tabor, Marek Śmieja
  • for: Aims to enable Graph Neural Networks (GNNs) operating on molecular graphs to distinguish between mirror images (enantiomers) of the same molecule.
  • methods: Proposes a theoretically justified message-passing scheme that makes GNNs sensitive to the order of node neighbors, and applies this idea to molecular chirality by constructing a chirality-aware Chiral Edge Neural Network (ChiENN) layer.
  • results: Experiments show that adding ChiENN layers to a GNN outperforms current state-of-the-art methods on chiral-sensitive molecular property prediction tasks.
    Abstract Graph Neural Networks (GNNs) play a fundamental role in many deep learning problems, in particular in cheminformatics. However, typical GNNs cannot capture the concept of chirality, which means they do not distinguish between the 3D graph of a chemical compound and its mirror image (enantiomer). The ability to distinguish between enantiomers is important especially in drug discovery because enantiomers can have very distinct biochemical properties. In this paper, we propose a theoretically justified message-passing scheme, which makes GNNs sensitive to the order of node neighbors. We apply that general concept in the context of molecular chirality to construct Chiral Edge Neural Network (ChiENN) layer which can be appended to any GNN model to enable chirality-awareness. Our experiments show that adding ChiENN layers to a GNN outperforms current state-of-the-art methods in chiral-sensitive molecular property prediction tasks.
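
The key idea of being sensitive to the order of node neighbors can be sketched as an aggregation where each neighbor message is transformed by a position-dependent network, so that permuting two neighbors changes the result. This is only a schematic of the general principle; the actual ChiENN layer derives the neighbor ordering from the molecular geometry and is defined differently.

```python
import torch
import torch.nn as nn

class OrderSensitiveAggregation(nn.Module):
    """Aggregate neighbor features with position-dependent MLPs (neighbor order matters)."""
    def __init__(self, dim, max_neighbors):
        super().__init__()
        self.position_mlps = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
             for _ in range(max_neighbors)]
        )

    def forward(self, neighbor_feats):      # (num_neighbors, dim), in a fixed order
        msgs = [mlp(h) for mlp, h in zip(self.position_mlps, neighbor_feats)]
        return torch.stack(msgs).sum(dim=0)
```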

Evaluating AI systems under uncertain ground truth: a case study in dermatology

  • paper_url: http://arxiv.org/abs/2307.02191
  • repo_url: None
  • paper_authors: David Stutz, Ali Taylan Cemgil, Abhijit Guha Roy, Tatiana Matejovicova, Melih Barsbey, Patricia Strachan, Mike Schaekermann, Jan Freyberg, Rajeev Rikhye, Beverly Freeman, Javier Perez Matos, Umesh Telang, Dale R. Webster, Yuan Liu, Greg S. Corrado, Yossi Matias, Pushmeet Kohli, Yun Liu, Arnaud Doucet, Alan Karthikesalingam
  • for: Proposes a method for evaluating AI model performance that takes the uncertainty of the ground truth into account.
  • methods: Annotations are aggregated with a statistical model, and new uncertainty-adjusted performance metrics are proposed that account for annotation uncertainty.
  • results: The study finds that evaluations based on conventional deterministic aggregation carry large uncertainty and severely over-estimate performance, whereas the proposed statistical models give a better picture of both model performance and its uncertainty.
    Abstract For safety, AI systems in health undergo thorough evaluations before deployment, validating their predictions against a ground truth that is assumed certain. However, this is actually not the case and the ground truth may be uncertain. Unfortunately, this is largely ignored in standard evaluation of AI models but can have severe consequences such as overestimating the future performance. To avoid this, we measure the effects of ground truth uncertainty, which we assume decomposes into two main components: annotation uncertainty which stems from the lack of reliable annotations, and inherent uncertainty due to limited observational information. This ground truth uncertainty is ignored when estimating the ground truth by deterministically aggregating annotations, e.g., by majority voting or averaging. In contrast, we propose a framework where aggregation is done using a statistical model. Specifically, we frame aggregation of annotations as posterior inference of so-called plausibilities, representing distributions over classes in a classification setting, subject to a hyper-parameter encoding annotator reliability. Based on this model, we propose a metric for measuring annotation uncertainty and provide uncertainty-adjusted metrics for performance evaluation. We present a case study applying our framework to skin condition classification from images where annotations are provided in the form of differential diagnoses. The deterministic adjudication process called inverse rank normalization (IRN) from previous work ignores ground truth uncertainty in evaluation. Instead, we present two alternative statistical models: a probabilistic version of IRN and a Plackett-Luce-based model. We find that a large portion of the dataset exhibits significant ground truth uncertainty and standard IRN-based evaluation severely over-estimates performance without providing uncertainty estimates.
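
A toy version of statistical annotation aggregation, as opposed to deterministic majority voting, is sketched below: annotator votes update a Dirichlet posterior whose mean acts as the "plausibilities" over classes. The Dirichlet form and the single reliability weight are simplifying assumptions; the paper's probabilistic IRN and Plackett-Luce models are considerably richer.

```python
import numpy as np

def plausibility_posterior(votes, num_classes, reliability=2.0, alpha0=1.0):
    """Aggregate annotator votes into a distribution over classes instead of one hard label."""
    counts = np.bincount(votes, minlength=num_classes)
    alpha = alpha0 + reliability * counts      # Dirichlet posterior parameters
    return alpha / alpha.sum()                 # posterior-mean plausibilities

# Example: five annotators give differential diagnoses over three conditions.
print(plausibility_posterior(np.array([0, 0, 1, 0, 2]), num_classes=3))
```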

Diffusion Models for Computational Design at the Example of Floor Plans

  • paper_url: http://arxiv.org/abs/2307.02511
  • repo_url: None
  • paper_authors: Joern Ploennigs, Markus Berger
  • for: This study explores the use of diffusion models in civil engineering, in particular for creating specific floor plans under given constraints.
  • methods: Diffusion-based image generation is used, and new diffusion models with improved semantic encoding are proposed.
  • results: The proposed models raise the share of valid generated floor plans from 6% to 90%, demonstrated in experiments on several different examples.
    Abstract AI image generators based on diffusion models have been widely discussed recently for their capability to create images from simple text prompts. But, for practical use in civil engineering, they need to be able to create specific construction plans for given constraints. Within this paper we explore the capabilities of those diffusion-based AI generators for computational design at the example of floor plans and identify their current limitations. We explain how the diffusion models work and propose new diffusion models with improved semantic encoding. In several experiments we show that we can improve the validity of generated floor plans from 6% to 90% and query performance for different examples. We identify shortcomings and derive future research challenges of those models and discuss the need to combine diffusion models with building information modelling. With this we provide key insights into the current state and future directions for diffusion models in civil engineering.

DiffFlow: A Unified SDE Framework for Score-Based Diffusion Models and Generative Adversarial Networks

  • paper_url: http://arxiv.org/abs/2307.02159
  • repo_url: None
  • paper_authors: Jingwei Zhang, Han Shi, Jincheng Yu, Enze Xie, Zhenguo Li
  • for: Proposes a unified theoretical framework that describes the relationship between explicit generative models and implicit generative models.
  • methods: Introduces a novel stochastic differential equation (SDE), the Discriminator Denoising Diffusion Flow (DiffFlow), which describes the learning dynamics of both families of generative models.
  • results: By adjusting the relative weights of the score terms, the framework interpolates smoothly between explicit and implicit generative models, enabling a flexible trade-off between high sample quality and fast sampling.
    Abstract Generative models can be categorized into two types: explicit generative models that define explicit density forms and allow exact likelihood inference, such as score-based diffusion models (SDMs) and normalizing flows; implicit generative models that directly learn a transformation from the prior to the data distribution, such as generative adversarial nets (GANs). While these two types of models have shown great success, they suffer from respective limitations that hinder them from achieving fast sampling and high sample quality simultaneously. In this paper, we propose a unified theoretic framework for SDMs and GANs. We show that: i) the learning dynamics of both SDMs and GANs can be described as a novel SDE named Discriminator Denoising Diffusion Flow (DiffFlow) where the drift can be determined by some weighted combinations of scores of the real data and the generated data; ii) by adjusting the relative weights between different score terms, we can obtain a smooth transition between SDMs and GANs while the marginal distribution of the SDE remains invariant to the change of the weights; iii) we prove the asymptotic optimality and maximal likelihood training scheme of the DiffFlow dynamics; iv) under our unified theoretic framework, we introduce several instantiations of DiffFlow that provide new algorithms beyond GANs and SDMs with exact likelihood inference and have potential to achieve flexible trade-off between high sample quality and fast sampling speed.
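
The weighted-score idea can be written schematically as an SDE whose drift combines the score of the real data with the score of the generated data; the notation below ($\lambda_1$, $\lambda_2$, $g$, $p_{g_t}$) is assumed for illustration and is not copied from the paper.

```latex
\mathrm{d}x_t \;=\; \Big[\,\lambda_1(t)\,\nabla_x \log p_{\mathrm{data}}(x_t)
\;-\;\lambda_2(t)\,\nabla_x \log p_{g_t}(x_t)\,\Big]\,\mathrm{d}t
\;+\; g(t)\,\mathrm{d}W_t
```

Intuitively, turning off the generated-data score leaves a purely score-driven (diffusion-like) drift, while weighting the difference of the two scores gives the dynamics a discriminator-like flavor; the paper's result is that such re-weightings can be made while the marginal distribution of the SDE remains invariant.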

Wasserstein Auto-Encoders of Merge Trees (and Persistence Diagrams)

  • paper_url: http://arxiv.org/abs/2307.02509
  • repo_url: None
  • paper_authors: Mahieu Pont, Julien Tierny
  • for: Proposes merge tree auto-encoding in the Wasserstein metric space (MT-WAE) to improve the accuracy and interpretability of classical auto-encoders.
  • methods: Uses a novel non-linear neural network architecture that manipulates merge trees on their associated metric space at each layer of the network.
  • results: Experiments show that MT-WAE computes merge trees quickly and compresses them accurately while preserving Wasserstein distances and clusters; the method further applies to data reduction and dimensionality reduction for ensemble analysis.
    Abstract This paper presents a computational framework for the Wasserstein auto-encoding of merge trees (MT-WAE), a novel extension of the classical auto-encoder neural network architecture to the Wasserstein metric space of merge trees. In contrast to traditional auto-encoders which operate on vectorized data, our formulation explicitly manipulates merge trees on their associated metric space at each layer of the network, resulting in superior accuracy and interpretability. Our novel neural network approach can be interpreted as a non-linear generalization of previous linear attempts [65] at merge tree encoding. It also trivially extends to persistence diagrams. Extensive experiments on public ensembles demonstrate the efficiency of our algorithms, with MT-WAE computations in the orders of minutes on average. We show the utility of our contributions in two applications adapted from previous work on merge tree encoding [65]. First, we apply MT-WAE to data reduction and reliably compress merge trees by concisely representing them with their coordinates in the final layer of our auto-encoder. Second, we document an application to dimensionality reduction, by exploiting the latent space of our auto-encoder, for the visual analysis of ensemble data. We illustrate the versatility of our framework by introducing two penalty terms, to help preserve in the latent space both the Wasserstein distances between merge trees, as well as their clusters. In both applications, quantitative experiments assess the relevance of our framework. Finally, we provide a C++ implementation that can be used for reproducibility.

Harmonizing Feature Attributions Across Deep Learning Architectures: Enhancing Interpretability and Consistency

  • paper_url: http://arxiv.org/abs/2307.02150
  • repo_url: None
  • paper_authors: Md Abdul Kadir, Gowtham Krishna Addluri, Daniel Sonntag
  • for: This study aims to improve the interpretability and trustworthiness of machine learning models by examining the generalization of feature attributions across various deep learning architectures.
  • methods: The study uses feature attribution methods to provide local explanations of model predictions, and explores the feasibility of utilizing these methods as a future detector.
  • results: The findings suggest that harmonized feature attribution methods can improve interpretability and trust in machine learning applications, regardless of the underlying architecture.
    Abstract Ensuring the trustworthiness and interpretability of machine learning models is critical to their deployment in real-world applications. Feature attribution methods have gained significant attention, which provide local explanations of model predictions by attributing importance to individual input features. This study examines the generalization of feature attributions across various deep learning architectures, such as convolutional neural networks (CNNs) and vision transformers. We aim to assess the feasibility of utilizing a feature attribution method as a future detector and examine how these features can be harmonized across multiple models employing distinct architectures but trained on the same data distribution. By exploring this harmonization, we aim to develop a more coherent and optimistic understanding of feature attributions, enhancing the consistency of local explanations across diverse deep-learning models. Our findings highlight the potential for harmonized feature attribution methods to improve interpretability and foster trust in machine learning applications, regardless of the underlying architecture.

  • paper_url: http://arxiv.org/abs/2307.02140
  • repo_url: https://github.com/morningd/model-centric-fml
  • paper_authors: Moming Duan
  • for: Proposes a new design for federated learning (FL) platforms, Open Federated Learning Platforms, to broaden the application scenarios of FL and increase the willingness of data holders to participate.
  • methods: Proposes two reciprocal cooperation frameworks as alternatives to the server-dominated paradigm, query-based FL and contract-based FL, to address server-client coupling, low model reusability, and the non-public nature of FL.
  • results: A review from both technical and legal perspectives demonstrates the feasibility and advantages of open FL platforms, and a model license compatibility taxonomy is proposed to help identify and resolve model usage rights issues in FL research.
    Abstract Traditional Federated Learning (FL) follows a server-dominated cooperation paradigm which narrows the application scenarios of FL and decreases the enthusiasm of data holders to participate. To fully unleash the potential of FL, we advocate rethinking the design of current FL frameworks and extending it to a more generalized concept: Open Federated Learning Platforms. We propose two reciprocal cooperation frameworks for FL to achieve this: query-based FL and contract-based FL. In this survey, we conduct a comprehensive review of the feasibility of constructing an open FL platform from both technical and legal perspectives. We begin by reviewing the definition of FL and summarizing its inherent limitations, including server-client coupling, low model reusability, and non-public availability. In the query-based FL platform, which is an open model sharing and reusing platform empowered by the community for model mining, we explore a wide range of valuable topics, including the availability of up-to-date model repositories for model querying, legal compliance analysis between different model licenses, and copyright issues and intellectual property protection in model reusing. In particular, we introduce a novel taxonomy to streamline the analysis of model license compatibility in FL studies that involve batch model reusing methods, including combination, amalgamation, distillation, and generation. This taxonomy provides a systematic framework for identifying the corresponding clauses of licenses and facilitates the identification of potential legal implications and restrictions when reusing models. Through this survey, we uncover the current dilemmas faced by FL and advocate for the development of sustainable open FL platforms. We aim to provide guidance for establishing such platforms in the future, while identifying potential problems and challenges that need to be addressed.

Implicit Differentiation for Hyperparameter Tuning the Weighted Graphical Lasso

  • paper_url: http://arxiv.org/abs/2307.02130
  • repo_url: None
  • paper_authors: Can Pouliquen, Paulo Gonçalves, Mathurin Massias, Titouan Vayer
  • for: Proposes a framework and algorithm for tuning the hyperparameters of the weighted Graphical Lasso.
  • methods: Solves a bilevel optimization problem with a first-order method.
  • results: Derives the Jacobian of the Graphical Lasso solution with respect to its regularization hyperparameters.
    Abstract We provide a framework and algorithm for tuning the hyperparameters of the Graphical Lasso via a bilevel optimization problem solved with a first-order method. In particular, we derive the Jacobian of the Graphical Lasso solution with respect to its regularization hyperparameters.

How Deep Neural Networks Learn Compositional Data: The Random Hierarchy Model

  • paper_url: http://arxiv.org/abs/2307.02129
  • repo_url: https://github.com/pcsl-epfl/hierarchy-learning
  • paper_authors: Leonardo Petrini, Francesco Cagnetta, Umberto M. Tomasini, Alessandro Favero, Matthieu Wyart
  • for: This paper aims to explain how deep convolutional neural networks learn generic tasks on high-dimensional data.
  • methods: Deep CNNs are studied on the Random Hierarchy Model, a simple classification task designed to capture the hierarchical and compositional structure of real data.
  • results: The number of training data $P^*$ required by deep CNNs grows asymptotically as $n_c m^L$, where $n_c$ is the number of classes, $m$ the number of synonymic compositions of high-level features, and $L$ the number of composition levels, i.e., only polynomially in the input dimensionality. At this training-set size, the learned representation becomes invariant to exchanges of synonyms, and the correlations between low-level features and classes become detectable.
    Abstract Learning generic high-dimensional tasks is notably hard, as it requires a number of training data exponential in the dimension. Yet, deep convolutional neural networks (CNNs) have shown remarkable success in overcoming this challenge. A popular hypothesis is that learnable tasks are highly structured and that CNNs leverage this structure to build a low-dimensional representation of the data. However, little is known about how much training data they require, and how this number depends on the data structure. This paper answers this question for a simple classification task that seeks to capture relevant aspects of real data: the Random Hierarchy Model. In this model, each of the $n_c$ classes corresponds to $m$ synonymic compositions of high-level features, which are in turn composed of sub-features through an iterative process repeated $L$ times. We find that the number of training data $P^*$ required by deep CNNs to learn this task (i) grows asymptotically as $n_c m^L$, which is only polynomial in the input dimensionality; (ii) coincides with the training set size such that the representation of a trained network becomes invariant to exchanges of synonyms; (iii) corresponds to the number of data at which the correlations between low-level features and classes become detectable. Overall, our results indicate how deep CNNs can overcome the curse of dimensionality by building invariant representations, and provide an estimate of the number of data required to learn a task based on its hierarchically compositional structure.
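
To make the quantities $n_c$, $m$, and $L$ concrete, here is a toy sampler in the spirit of the Random Hierarchy Model: a class label is rewritten through $L$ levels of production rules, each symbol expanding into $s$ sub-symbols via one of $m$ synonymous rules (assuming the number of classes does not exceed the symbol vocabulary). The random rule construction and the parameter names are illustrative assumptions rather than the paper's exact specification.

```python
import numpy as np

def sample_rhm(n_classes, m, s, vocab, L, rng):
    """Sample (label, input string) from a toy random hierarchy of production rules."""
    # One rule table per level: every symbol has m synonymous expansions into s sub-symbols.
    rules = [{sym: [rng.integers(vocab, size=s) for _ in range(m)]
              for sym in range(vocab)} for _ in range(L)]
    label = int(rng.integers(n_classes))
    symbols = [label]
    for level in range(L):                        # rewrite L times; input length becomes s**L
        symbols = [int(sub) for sym in symbols
                   for sub in rules[level][sym][rng.integers(m)]]
    return label, symbols

rng = np.random.default_rng(0)
print(sample_rhm(n_classes=4, m=2, s=2, vocab=8, L=3, rng=rng))
```

Under this picture, the paper's estimate says the required training set grows like $n_c m^L$, i.e., with the number of synonymous productions per level, which is only polynomial in the input size $s^L$.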

Robust Graph Structure Learning with the Alignment of Features and Adjacency Matrix

  • paper_url: http://arxiv.org/abs/2307.02126
  • repo_url: None
  • paper_authors: Shaogao Lv, Gang Wen, Shiyu Liu, Linsen Wei, Ming Li
  • for: Improve the robustness of graph neural networks by jointly learning a clean graph structure and the corresponding representations.
  • methods: Proposes a novel regularized graph structure learning (GSL) approach that aligns feature information with graph information, motivated by a derived lower bound on the node-level Rademacher complexity of GNNs; sparse dimensionality reduction is incorporated to exploit low-dimensional node features that are relevant to the graph structure.
  • results: Experiments on real-world graphs show that the proposed GSL method outperforms several competitive baselines, especially when the graph structure is heavily affected by noise.
    Abstract To improve the robustness of graph neural networks (GNN), graph structure learning (GSL) has attracted great interest due to the pervasiveness of noise in graph data. Many approaches have been proposed for GSL to jointly learn a clean graph structure and corresponding representations. To extend the previous work, this paper proposes a novel regularized GSL approach, particularly with an alignment of feature information and graph information, which is motivated mainly by our derived lower bound of node-level Rademacher complexity for GNNs. Additionally, our proposed approach incorporates sparse dimensional reduction to leverage low-dimensional node features that are relevant to the graph structure. To evaluate the effectiveness of our approach, we conduct experiments on real-world graphs. The results demonstrate that our proposed GSL method outperforms several competitive baselines, especially in scenarios where the graph structures are heavily affected by noise. Overall, our research highlights the importance of integrating feature and graph information alignment in GSL, as inspired by our derived theoretical result, and showcases the superiority of our approach in handling noisy graph structures through comprehensive experiments on real-world datasets.

Multi-Scale U-Shape MLP for Hyperspectral Image Classification

  • paper_url: http://arxiv.org/abs/2307.10186
  • repo_url: None
  • paper_authors: Moule Lin, Weipeng Jing, Donglin Di, Guangsheng Chen, Houbing Song
  • for: This work proposes a Multi-Scale U-shape Multi-Layer Perceptron (MUMLP) model to improve pixel-level classification of hyperspectral images.
  • methods: The model consists of a designed Multi-Scale Channel (MSC) block and a U-shape Multi-Layer Perceptron (UMLP) structure. The MSC block transforms the channel dimension and mixes spectral band features to embed deep-level representations, while the UMLP uses an encoder-decoder structure with multi-layer perceptron layers that compresses large-scale parameters.
  • results: Extensive experiments on three public datasets (Pavia University, Houston 2013, and Houston 2018) show that the model outperforms state-of-the-art methods across the board.
    Abstract Hyperspectral images have significant applications in various domains, since they register numerous semantic and spatial information in the spectral band with spatial variability of spectral signatures. Two critical challenges in identifying pixels of the hyperspectral image are respectively representing the correlated information among the local and global, as well as the abundant parameters of the model. To tackle this challenge, we propose a Multi-Scale U-shape Multi-Layer Perceptron (MUMLP) a model consisting of the designed MSC (Multi-Scale Channel) block and the UMLP (U-shape Multi-Layer Perceptron) structure. MSC transforms the channel dimension and mixes spectral band feature to embed the deep-level representation adequately. UMLP is designed by the encoder-decoder structure with multi-layer perceptron layers, which is capable of compressing the large-scale parameters. Extensive experiments are conducted to demonstrate our model can outperform state-of-the-art methods across-the-board on three wide-adopted public datasets, namely Pavia University, Houston 2013 and Houston 2018

Proportional Response: Contextual Bandits for Simple and Cumulative Regret Minimization

  • paper_url: http://arxiv.org/abs/2307.02108
  • repo_url: None
  • paper_authors: Sanath Kumar Krishnamurthy, Ruohan Zhan, Susan Athey, Emma Brunskill
  • for: This paper proposes computationally efficient bandit algorithms for learning optimal treatment assignment policies in the contextual bandit setting, adaptable to both cumulative regret minimization and simple regret minimization.
  • methods: A new family of algorithms for the stochastic contextual bandit setting is introduced that adapts to model misspecification and extends to continuous arm settings. The algorithms build on "conformal arm sets" (CASs), which provide for every context a set of arms that contains the context-specific optimal arm with some probability over the context distribution, thereby guaranteeing low regret.
  • results: The algorithms perform well on both simple and cumulative regret, adapting to the different goals of the contextual bandit problem. The paper also proves a negative result: no algorithm can achieve instance-dependent simple regret guarantees while simultaneously achieving minimax optimal cumulative regret guarantees.
    Abstract Simple regret minimization is a critical problem in learning optimal treatment assignment policies across various domains, including healthcare and e-commerce. However, it remains understudied in the contextual bandit setting. We propose a new family of computationally efficient bandit algorithms for the stochastic contextual bandit settings, with the flexibility to be adapted for cumulative regret minimization (with near-optimal minimax guarantees) and simple regret minimization (with SOTA guarantees). Furthermore, our algorithms adapt to model misspecification and extend to the continuous arm settings. These advantages come from constructing and relying on "conformal arm sets" (CASs), which provide a set of arms at every context that encompass the context-specific optimal arm with some probability across the context distribution. Our positive results on simple and cumulative regret guarantees are contrasted by a negative result, which shows that an algorithm can't achieve instance-dependent simple regret guarantees while simultaneously achieving minimax optimal cumulative regret guarantees.

SoK: Privacy-Preserving Data Synthesis

  • paper_url: http://arxiv.org/abs/2307.02106
  • repo_url: None
  • paper_authors: Yuzheng Hu, Fan Wu, Qinbin Li, Yunhui Long, Gonzalo Munilla Garrido, Chang Ge, Bolin Ding, David Forsyth, Bo Li, Dawn Song
  • for: This work provides an overview, analysis, and discussion of privacy-preserving data synthesis (PPDS), answering questions about the design principles, categorization, and trade-offs of PPDS methods.
  • methods: The survey dissects two mainstream strands of PPDS: statistical methods, organized by choices of modeling and representation, and deep learning (DL)-based methods, organized by generative modeling principles. Reference tables, key takeaways, and open problems are also provided.
  • results: Several prominent DL-based methods are benchmarked on private image generation, identifying DP-MERF as an all-purpose approach; the past decade of work is systematized, and future research directions and calls to action for the community are laid out.
    Abstract As the prevalence of data analysis grows, safeguarding data privacy has become a paramount concern. Consequently, there has been an upsurge in the development of mechanisms aimed at privacy-preserving data analyses. However, these approaches are task-specific; designing algorithms for new tasks is a cumbersome process. As an alternative, one can create synthetic data that is (ideally) devoid of private information. This paper focuses on privacy-preserving data synthesis (PPDS) by providing a comprehensive overview, analysis, and discussion of the field. Specifically, we put forth a master recipe that unifies two prominent strands of research in PPDS: statistical methods and deep learning (DL)-based methods. Under the master recipe, we further dissect the statistical methods into choices of modeling and representation, and investigate the DL-based methods by different generative modeling principles. To consolidate our findings, we provide comprehensive reference tables, distill key takeaways, and identify open problems in the existing literature. In doing so, we aim to answer the following questions: What are the design principles behind different PPDS methods? How can we categorize these methods, and what are the advantages and disadvantages associated with each category? Can we provide guidelines for method selection in different real-world scenarios? We proceed to benchmark several prominent DL-based methods on the task of private image synthesis and conclude that DP-MERF is an all-purpose approach. Finally, upon systematizing the work over the past decade, we identify future directions and call for actions from researchers.

DARE: Towards Robust Text Explanations in Biomedical and Healthcare Applications

  • paper_url: http://arxiv.org/abs/2307.02094
  • repo_url: https://github.com/ibm/domain-adaptive-attribution-robustness
  • paper_authors: Adam Ivankay, Mattia Rigotti, Pascal Frossard
  • for: This paper aims to provide a better understanding of the robustness of deep neural network explanations in the biomedical domain.
  • methods: The paper proposes a new approach called DomainAdaptiveAREstimator (DARE) to estimate the attribution robustness of explanations in the biomedical domain. DARE takes into account domain-specific plausibility to ensure that the explanations are both accurate and relevant to the domain experts.
  • results: The paper presents two methods, adversarial training and FAR training, to mitigate the brittleness of explanations in the biomedical domain. The proposed methods are validated through extensive experiments on three established biomedical benchmarks.
    Abstract Along with the successful deployment of deep neural networks in several application domains, the need to unravel the black-box nature of these networks has seen a significant increase recently. Several methods have been introduced to provide insight into the inference process of deep neural networks. However, most of these explainability methods have been shown to be brittle in the face of adversarial perturbations of their inputs in the image and generic textual domain. In this work we show that this phenomenon extends to specific and important high stakes domains like biomedical datasets. In particular, we observe that the robustness of explanations should be characterized in terms of the accuracy of the explanation in linking a model's inputs and its decisions - faithfulness - and its relevance from the perspective of domain experts - plausibility. This is crucial to prevent explanations that are inaccurate but still look convincing in the context of the domain at hand. To this end, we show how to adapt current attribution robustness estimation methods to a given domain, so as to take into account domain-specific plausibility. This results in our DomainAdaptiveAREstimator (DARE) attribution robustness estimator, allowing us to properly characterize the domain-specific robustness of faithful explanations. Next, we provide two methods, adversarial training and FAR training, to mitigate the brittleness characterized by DARE, allowing us to train networks that display robust attributions. Finally, we empirically validate our methods with extensive experiments on three established biomedical benchmarks.

Make A Long Image Short: Adaptive Token Length for Vision Transformers

  • paper_url: http://arxiv.org/abs/2307.02092
  • repo_url: None
  • paper_authors: Qiqi Zhou, Yichen Zhu
  • for: Speed up inference and reduce the computational cost of vision transformers.
  • methods: Proposes adaptively assigning the token length for each image at test time: a Resizable-ViT (ReViT) model is trained to handle inputs with diverse token lengths, and a lightweight Token-Length Assigner (TLA) is trained to allocate the optimal (minimum sufficient) token length for each image during inference.
  • results: The approach is general and compatible with modern vision transformer architectures, significantly reducing computational cost; its effectiveness is verified on image classification and action recognition.
    Abstract The vision transformer is a model that breaks down each image into a sequence of tokens with a fixed length and processes them similarly to words in natural language processing. Although increasing the number of tokens typically results in better performance, it also leads to a considerable increase in computational cost. Motivated by the saying "A picture is worth a thousand words," we propose an innovative approach to accelerate the ViT model by shortening long images. Specifically, we introduce a method for adaptively assigning token length for each image at test time to accelerate inference speed. First, we train a Resizable-ViT (ReViT) model capable of processing input with diverse token lengths. Next, we extract token-length labels from ReViT that indicate the minimum number of tokens required to achieve accurate predictions. We then use these labels to train a lightweight Token-Length Assigner (TLA) that allocates the optimal token length for each image during inference. The TLA enables ReViT to process images with the minimum sufficient number of tokens, reducing token numbers in the ViT model and improving inference speed. Our approach is general and compatible with modern vision transformer architectures, significantly reducing computational costs. We verified the effectiveness of our methods on multiple representative ViT models on image classification and action recognition.
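
At inference time the pipeline can be sketched as two calls: the lightweight assigner picks one of the supported token lengths, and the resizable backbone runs with that length. The interfaces below (a `tla` classifier over length options and a `revit(image, num_tokens=...)` call) are hypothetical and only illustrate the control flow.

```python
import torch

SUPPORTED_TOKEN_LENGTHS = (49, 98, 196)   # example length options (assumed values)

@torch.no_grad()
def classify_adaptive(tla, revit, image):
    """Pick the shortest sufficient token length for the image, then run the ViT."""
    idx = tla(image).argmax(dim=-1).item()        # TLA scores the length options
    n_tokens = SUPPORTED_TOKEN_LENGTHS[idx]
    return revit(image, num_tokens=n_tokens)      # ReViT accepts a variable token length
```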

Combating Confirmation Bias: A Unified Pseudo-Labeling Framework for Entity Alignment

  • paper_url: http://arxiv.org/abs/2307.02075
  • repo_url: None
  • paper_authors: Qijie Ding, Jie Yin, Daokun Zhang, Junbin Gao
  • for: Improve the accuracy of entity alignment predictions by combating confirmation bias caused by pseudo-labeling errors.
  • methods: Proposes a Unified Pseudo-Labeling framework for Entity Alignment (UPL-EA) that explicitly eliminates pseudo-labeling errors, combining Optimal Transport-based pseudo-labeling for accurate one-to-one entity correspondences across two KGs with cross-iteration pseudo-label calibration.
  • results: Experiments show that the method achieves competitive performance with limited prior alignment seeds; theoretical support and empirical validation confirm that it reduces the impact of Type I and Type II pseudo-labeling errors.
    Abstract Entity alignment (EA) aims at identifying equivalent entity pairs across different knowledge graphs (KGs) that refer to the same real-world identity. To systematically combat confirmation bias for pseudo-labeling-based entity alignment, we propose a Unified Pseudo-Labeling framework for Entity Alignment (UPL-EA) that explicitly eliminates pseudo-labeling errors to boost the accuracy of entity alignment. UPL-EA consists of two complementary components: (1) The Optimal Transport (OT)-based pseudo-labeling uses discrete OT modeling as an effective means to enable more accurate determination of entity correspondences across two KGs and to mitigate the adverse impact of erroneous matches. A simple but highly effective criterion is further devised to derive pseudo-labeled entity pairs that satisfy one-to-one correspondences at each iteration. (2) The cross-iteration pseudo-label calibration operates across multiple consecutive iterations to further improve the pseudo-labeling precision rate by reducing the local pseudo-label selection variability with a theoretical guarantee. The two components are respectively designed to eliminate Type I and Type II pseudo-labeling errors identified through our analyse. The calibrated pseudo-labels are thereafter used to augment prior alignment seeds to reinforce subsequent model training for alignment inference. The effectiveness of UPL-EA in eliminating pseudo-labeling errors is both theoretically supported and experimentally validated. The experimental results show that our approach achieves competitive performance with limited prior alignment seeds.
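
The OT-based pseudo-labeling step can be approximated with entropic optimal transport: a cross-KG entity similarity matrix is turned into a transport plan with Sinkhorn iterations, and mutually best-matched pairs are kept as one-to-one pseudo-labels. The regularization value, iteration count, and the mutual-argmax criterion below are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def ot_pseudo_labels(sim, n_iter=50, reg=0.05):
    """Return one-to-one (i, j) pseudo-label pairs from an entity similarity matrix."""
    K = np.exp((sim - sim.max()) / reg)      # Gibbs kernel (shifted for numerical stability)
    u = np.ones(K.shape[0])
    for _ in range(n_iter):                  # Sinkhorn scaling towards uniform marginals
        v = 1.0 / (K.T @ u)
        u = 1.0 / (K @ v)
    plan = u[:, None] * K * v[None, :]
    best_row, best_col = plan.argmax(axis=1), plan.argmax(axis=0)
    return [(i, j) for i, j in enumerate(best_row) if best_col[j] == i]
```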

Performance Modeling of Data Storage Systems using Generative Models

  • paper_url: http://arxiv.org/abs/2307.02073
  • repo_url: https://github.com/djdprogramming/adfa2
  • paper_authors: Abdalaziz Rashid Al-Maeeni, Aziz Temirkhanov, Artem Ryzhikov, Mikhail Hushchyn
  • for: This paper studies high-precision modeling of data storage systems.
  • methods: Machine-learning-based generative models are trained on historical data to model a storage system consisting of HDD and SSD storage pools with different RAID schemes and cache; each component is represented by a probabilistic model of its performance (IOPS and latency) as a function of its configuration and the external data load.
  • results: Experiments show prediction errors of 4-10% for IOPS and 3-16% for latency, depending on the component and model, and up to 0.99 Pearson correlation with Little's law, which can be used for unsupervised reliability checks. The paper also releases new datasets for benchmarking regression algorithms, conditional generative models, and uncertainty estimation methods.
    Abstract High-precision modeling of systems is one of the main areas of industrial data analysis. Models of systems, their digital twins, are used to predict their behavior under various conditions. We have developed several models of a storage system using machine learning-based generative models. The system consists of several components: hard disk drive (HDD) and solid-state drive (SSD) storage pools with different RAID schemes and cache. Each storage component is represented by a probabilistic model that describes the probability distribution of the component performance in terms of IOPS and latency, depending on their configuration and external data load parameters. The results of the experiments demonstrate the errors of 4-10 % for IOPS and 3-16 % for latency predictions depending on the components and models of the system. The predictions show up to 0.99 Pearson correlation with Little's law, which can be used for unsupervised reliability checks of the models. In addition, we present novel data sets that can be used for benchmarking regression algorithms, conditional generative models, and uncertainty estimation methods in machine learning.
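
The Little's law check mentioned above is a simple consistency test: throughput times mean latency should match the concurrency kept in flight by the load generator. The numbers below are made up for illustration.

```python
# Unsupervised sanity check of model predictions via Little's law: L = X * W.
iops_pred = 52_000        # predicted throughput X, operations per second
latency_pred = 0.0006     # predicted mean response time W, seconds
queue_depth = 32          # concurrency the benchmark keeps in flight

implied_in_flight = iops_pred * latency_pred
print(f"Little's law implies {implied_in_flight:.1f} in-flight ops "
      f"vs. the configured queue depth of {queue_depth}")
```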

A Comparison of Machine Learning Methods for Data with High-Cardinality Categorical Variables

  • paper_url: http://arxiv.org/abs/2307.02071
  • repo_url: https://github.com/fabsig/compare_ml_highcardinality_categorical_variables
  • paper_authors: Fabio Sigrist
  • for: This paper studies machine learning models for tabular data with high-cardinality categorical variables.
  • methods: Several versions of two of the most successful machine learning methods, tree-boosting and deep neural networks, as well as linear mixed effects models, are compared empirically on multiple tabular datasets with high-cardinality categorical variables.
  • results: Machine learning models with random effects achieve higher prediction accuracy than their classical counterparts without random effects, and tree-boosting with random effects outperforms deep neural networks with random effects.
    Abstract High-cardinality categorical variables are variables for which the number of different levels is large relative to the sample size of a data set, or in other words, there are few data points per level. Machine learning methods can have difficulties with high-cardinality variables. In this article, we empirically compare several versions of two of the most successful machine learning methods, tree-boosting and deep neural networks, and linear mixed effects models using multiple tabular data sets with high-cardinality categorical variables. We find that, first, machine learning models with random effects have higher prediction accuracy than their classical counterparts without random effects, and, second, tree-boosting with random effects outperforms deep neural networks with random effects.
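
The random-effects idea behind the mixed effects baseline can be illustrated with a linear mixed model in which the high-cardinality categorical variable enters as a random intercept rather than as one dummy column per level. The data and column names below are synthetic; the tree-boosting and neural-network variants with random effects compared in the paper use different tooling.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_groups, per_group = 50, 4
group = np.repeat(np.arange(n_groups), per_group)      # stands in for a high-cardinality category
x = rng.normal(size=n_groups * per_group)
y = 2.0 * x + rng.normal(size=n_groups)[group] + 0.3 * rng.normal(size=x.size)
df = pd.DataFrame({"y": y, "x": x, "group": group.astype(str)})

# One random intercept per category level, estimated jointly with the fixed effect of x.
fit = smf.mixedlm("y ~ x", df, groups=df["group"]).fit()
print(fit.params)
```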

Universal Rates for Multiclass Learning

  • paper_url: http://arxiv.org/abs/2307.02066
  • repo_url: https://github.com/Machinfy/Human-Activity-Recognition-with-Smartphones
  • paper_authors: Steve Hanneke, Shay Moran, Qian Zhang
  • for: This paper studies universal rates for multiclass classification, establishing the optimal rates (up to log factors) for all hypothesis classes.
  • methods: The analysis builds on pseudo-cubes and newly introduced Daniely-Shalev-Shwartz-Littlestone (DSL) trees to characterize the learnability of multiclass problems over any countable label space.
  • results: The paper establishes universal rate bounds that resolve an open question of Kalavasis et al. (2022) on multiclass classification: a class admits exponential rates if and only if it has no infinite Littlestone tree, admits (near-)linear rates if and only if it has no infinite DSL tree, and otherwise requires arbitrarily slow rates.
    Abstract We study universal rates for multiclass classification, establishing the optimal rates (up to log factors) for all hypothesis classes. This generalizes previous results on binary classification (Bousquet, Hanneke, Moran, van Handel, and Yehudayoff, 2021), and resolves an open question studied by Kalavasis, Velegkas, and Karbasi (2022) who handled the multiclass setting with a bounded number of class labels. In contrast, our result applies for any countable label space. Even for finite label space, our proofs provide a more precise bounds on the learning curves, as they do not depend on the number of labels. Specifically, we show that any class admits exponential rates if and only if it has no infinite Littlestone tree, and admits (near-)linear rates if and only if it has no infinite Daniely-Shalev-Shwartz-Littleston (DSL) tree, and otherwise requires arbitrarily slow rates. DSL trees are a new structure we define in this work, in which each node of the tree is given by a pseudo-cube of possible classifications of a given set of points. Pseudo-cubes are a structure, rooted in the work of Daniely and Shalev-Shwartz (2014), and recently shown by Brukhim, Carmon, Dinur, Moran, and Yehudayoff (2022) to characterize PAC learnability (i.e., uniform rates) for multiclass classification. We also resolve an open question of Kalavasis, Velegkas, and Karbasi (2022) regarding the equivalence of classes having infinite Graph-Littlestone (GL) trees versus infinite Natarajan-Littlestone (NL) trees, showing that they are indeed equivalent.

Line Graphics Digitization: A Step Towards Full Automation

  • paper_url: http://arxiv.org/abs/2307.02065
  • repo_url: https://github.com/moured/document-graphics-digitization
  • paper_authors: Omar Moured, Jiaming Zhang, Alina Roitberg, Thorsten Schwarz, Rainer Stiefelhagen
  • for: This work aims to improve the accessibility and reproducibility of digitized documents; automatic digitization of document layout and text has long been a research focus, whereas graphical elements such as statistical plots remain under-explored.
  • methods: The paper introduces the task of fine-grained visual understanding of mathematical graphics and presents the Line Graphics (LG) dataset, with pixel-wise annotations for 5 coarse and 10 fine-grained categories, covering 520 plot images collected from 450 documents across different disciplines.
  • results: Seven state-of-the-art models are benchmarked on the dataset for semantic segmentation and object detection; the dataset, code, and models will be shared with the community to advance the digitization of statistical graphs.
    Abstract The digitization of documents allows for wider accessibility and reproducibility. While automatic digitization of document layout and text content has been a long-standing focus of research, this problem in regard to graphical elements, such as statistical plots, has been under-explored. In this paper, we introduce the task of fine-grained visual understanding of mathematical graphics and present the Line Graphics (LG) dataset, which includes pixel-wise annotations of 5 coarse and 10 fine-grained categories. Our dataset covers 520 images of mathematical graphics collected from 450 documents from different disciplines. Our proposed dataset can support two different computer vision tasks, i.e., semantic segmentation and object detection. To benchmark our LG dataset, we explore 7 state-of-the-art models. To foster further research on the digitization of statistical graphs, we will make the dataset, code, and models publicly available to the community.

Facing off World Model Backbones: RNNs, Transformers, and S4

  • paper_url: http://arxiv.org/abs/2307.02064
  • repo_url: None
  • paper_authors: Fei Deng, Junyeong Park, Sungjin Ahn
  • for: Improve the capabilities of model-based reinforcement learning (MBRL) agents by strengthening the long-term memory of their world models.
  • methods: Alternative world model backbones, including Transformers and Structured State Space Sequence (S4) models, are explored to improve long-term memory; S4WM, the first S4-based world model that generates high-dimensional image sequences through latent imagination, is proposed.
  • results: S4WM shows stronger long-term memory than Transformer-based world models while being more efficient during training and imagination, paving the way for stronger MBRL agents.
    Abstract World models are a fundamental component in model-based reinforcement learning (MBRL) agents. To perform temporally extended and consistent simulations of the future in partially observable environments, world models need to possess long-term memory. However, state-of-the-art MBRL agents, such as Dreamer, predominantly employ recurrent neural networks (RNNs) as their world model backbone, which have limited memory capacity. In this paper, we seek to explore alternative world model backbones for improving long-term memory. In particular, we investigate the effectiveness of Transformers and Structured State Space Sequence (S4) models, motivated by their remarkable ability to capture long-range dependencies in low-dimensional sequences and their complementary strengths. We propose S4WM, the first S4-based world model that can generate high-dimensional image sequences through latent imagination. Furthermore, we extensively compare RNN-, Transformer-, and S4-based world models across four sets of environments, which we have specifically tailored to assess crucial memory capabilities of world models, including long-term imagination, context-dependent recall, reward prediction, and memory-based reasoning. Our findings demonstrate that S4WM outperforms Transformer-based world models in terms of long-term memory, while exhibiting greater efficiency during training and imagination. These results pave the way for the development of stronger MBRL agents.

Adversarial Attacks on Image Classification Models: FGSM and Patch Attacks and their Impact

  • paper_url: http://arxiv.org/abs/2307.02055
  • repo_url: None
  • paper_authors: Jaydip Sen, Subhasis Dasgupta
  • for: This chapter introduces the concept of adversarial attacks on image classification models.
  • methods: Two well-known adversarial attacks are discussed: the fast gradient sign method (FGSM) and the adversarial patch attack.
  • results: The attacks are launched against three powerful pre-trained image classifiers (ResNet-34, GoogleNet, and DenseNet-161), and the classification accuracy of the models with and without the attacks is computed on images from the ImageNet dataset to evaluate their impact.
    Abstract This chapter introduces the concept of adversarial attacks on image classification models built on convolutional neural networks (CNN). CNNs are very popular deep-learning models which are used in image classification tasks. However, very powerful and pre-trained CNN models working very accurately on image datasets for image classification tasks may perform disastrously when the networks are under adversarial attacks. In this work, two very well-known adversarial attacks are discussed and their impact on the performance of image classifiers is analyzed. These two adversarial attacks are the fast gradient sign method (FGSM) and adversarial patch attack. These attacks are launched on three powerful pre-trained image classifier architectures, ResNet-34, GoogleNet, and DenseNet-161. The classification accuracy of the models in the absence and presence of the two attacks are computed on images from the publicly accessible ImageNet dataset. The results are analyzed to evaluate the impact of the attacks on the image classification task.
    摘要
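The FGSM perturbation discussed above has a one-line closed form, $x_{adv} = x + \epsilon \cdot \mathrm{sign}(\nabla_x \mathcal{L}(x, y))$. The sketch below is a minimal, generic PyTorch illustration of that formula; the model, the $\epsilon$ values, and the stand-in input tensor are placeholders rather than the chapter's exact experimental setup.

```python
# Minimal FGSM sketch (illustrative; not the chapter's exact pipeline).
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet34(weights=None)  # in practice, load ImageNet-pretrained weights
model.eval()

def fgsm_attack(model, x, y, eps=8 / 255):
    """Return x_adv = x + eps * sign(grad_x loss), clamped to the valid pixel range."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

# Toy usage on a stand-in image tensor in [0, 1]; real inputs would come from ImageNet.
img = torch.rand(1, 3, 224, 224)
label = torch.tensor([207])
adv = fgsm_attack(model, img, label, eps=4 / 255)
print(model(img).argmax(1).item(), model(adv).argmax(1).item())
```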

Graph Neural Network-based Power Flow Model

  • paper_url: http://arxiv.org/abs/2307.02049
  • repo_url: None
  • paper_authors: Mingjian Tuo, Xingpeng Li, Tianxia Zhao
  • for: 这篇论文的目的是提出一种基于图神经网络(GNN)的电力流计算模型,以提高电力系统中线流计算的准确性和效率。
  • methods: 该模型使用历史电力系统数据进行训练,并使用图神经网络(GNN)模型来预测电力流结果。
  • results: 对比于传统的直流电力流计算模型和深度神经网络(DNN)、卷积神经网络(CNN)模型,该GNN模型能够提供更准确的解决方案,并且高效。
    Abstract Power flow analysis plays a crucial role in examining the electricity flow within a power system network. By performing power flow calculations, the system's steady-state variables, including voltage magnitude, phase angle at each bus, active/reactive power flow across branches, can be determined. While the widely used DC power flow model offers speed and robustness, it may yield inaccurate line flow results for certain transmission lines. This issue becomes more critical when dealing with renewable energy sources such as wind farms, which are often located far from the main grid. Obtaining precise line flow results for these critical lines is vital for next operations. To address these challenges, data-driven approaches leverage historical grid profiles. In this paper, a graph neural network (GNN) model is trained using historical power system data to predict power flow outcomes. The GNN model enables rapid estimation of line flows. A comprehensive performance analysis is conducted, comparing the proposed GNN-based power flow model with the traditional DC power flow model, as well as deep neural network (DNN) and convolutional neural network (CNN). The results on test systems demonstrate that the proposed GNN-based power flow model provides more accurate solutions with high efficiency comparing to benchmark models.
    摘要 电流流分析在电力系统网络中扮演着关键的角色,可以确定电力系统的稳定状态变量,包括每个总机的相位角和电压大小。虽然广泛使用的直流电流模型具有速度和可靠性,但可能导致certain transmission lines的流量结果不准确。这个问题在处理可再生能源such as wind farms时变得更加重要,这些可再生能源往往位于主网络远离的地方。为了解决这些挑战,数据驱动方法可以利用历史电力系统数据来预测电流流的结果。在这篇论文中,一种基于图神经网络(GNN)模型被训练使用历史电力系统数据来预测电流流的结果。GNN模型可以快速估算线流。我们进行了全面的性能分析,比较了提议的GNN-based电流流模型与传统的直流电流模型、深度神经网络(DNN)和卷积神经网络(CNN)模型。测试系统上的结果表明,提议的GNN-based电流流模型可以提供更加准确的解决方案,并且高效性比benchmark模型更高。
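As background for the GNN-based line-flow prediction described above, the sketch below shows a minimal graph-convolution model that maps per-bus features to per-branch flow predictions. The layer structure, input features, and toy grid are illustrative assumptions, not the architecture or data used in the paper.

```python
# Minimal graph-convolution sketch for line-flow regression (illustrative only).
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, h, a_hat):
        # h: (num_buses, in_dim) node features; a_hat: normalized adjacency with self-loops
        return torch.relu(self.lin(a_hat @ h))

class PowerFlowGNN(nn.Module):
    def __init__(self, in_dim, hidden):
        super().__init__()
        self.g1, self.g2 = GCNLayer(in_dim, hidden), GCNLayer(hidden, hidden)
        self.readout = nn.Linear(hidden, 1)  # one flow value per branch

    def forward(self, h, a_hat, branches):
        h = self.g2(self.g1(h, a_hat), a_hat)
        pair = h[branches[:, 0]] + h[branches[:, 1]]  # combine the two end-bus embeddings
        return self.readout(pair).squeeze(-1)

# Toy 4-bus grid with 3 branches and 2 features per bus (e.g. active/reactive injections)
a = torch.tensor([[0., 1, 0, 0], [1, 0, 1, 1], [0, 1, 0, 0], [0, 1, 0, 0]])
a_hat = a + torch.eye(4)
d = a_hat.sum(1)
a_hat = a_hat / torch.sqrt(d[:, None] * d[None, :])  # symmetric normalization
model = PowerFlowGNN(in_dim=2, hidden=16)
flows = model(torch.randn(4, 2), a_hat, torch.tensor([[0, 1], [1, 2], [1, 3]]))
print(flows.shape)  # torch.Size([3])
```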

Fisher-Weighted Merge of Contrastive Learning Models in Sequential Recommendation

  • paper_url: http://arxiv.org/abs/2307.05476
  • repo_url: None
  • paper_authors: Jung Hyun Ryu, Jaeheyoung Jeon, Jewoong Cho, Myungjoo Kang
  • for: 这篇论文主要针对推荐系统中的次序推荐问题,即为用户随时间的偏好进行推荐。
  • methods: 本论文使用了对照学习方法,将多个模型的参数融合,以提高推荐系统的总性能。
  • results: 经过广泛的实验,本论文显示出该方法的效果,并证明其能够提高次序推荐系统的状态前进。
    Abstract Along with the exponential growth of online platforms and services, recommendation systems have become essential for identifying relevant items based on user preferences. The domain of sequential recommendation aims to capture evolving user preferences over time. To address dynamic preference, various contrastive learning methods have been proposed to target data sparsity, a challenge in recommendation systems due to the limited user-item interactions. In this paper, we are the first to apply the Fisher-Merging method to Sequential Recommendation, addressing and resolving practical challenges associated with it. This approach ensures robust fine-tuning by merging the parameters of multiple models, resulting in improved overall performance. Through extensive experiments, we demonstrate the effectiveness of our proposed methods, highlighting their potential to advance the state-of-the-art in sequential learning and recommendation systems.
    摘要 随着在线平台和服务的快速增长,推荐系统已成为根据用户偏好识别相关物品的关键工具。序列推荐领域旨在捕捉用户随时间演变的偏好。为了应对动态偏好以及用户-物品交互有限导致的数据稀疏问题,已有多种对比学习方法被提出。在这篇论文中,我们首次将Fisher合并(Fisher-Merging)方法应用于序列推荐,并解决其相关的实际挑战。该方法通过融合多个模型的参数实现稳健的微调,从而提升整体性能。通过广泛的实验,我们证明了所提方法的有效性,突显了其推动序列学习与推荐系统发展的潜力。
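Fisher merging combines checkpoints by an elementwise Fisher-weighted average of their parameters, $\theta = \sum_k F_k \odot \theta_k / \sum_k F_k$. The sketch below illustrates this generic recipe with a diagonal Fisher approximation; the helper names, the way the Fisher is estimated, and the toy models are assumptions, and the paper's exact merging procedure for contrastive sequential recommenders may differ.

```python
# Generic Fisher-weighted parameter merging sketch (not the paper's exact recipe).
import torch

def diagonal_fisher(model, batches, loss_fn):
    """Approximate the diagonal Fisher as the mean squared gradient over a few batches."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for x, y in batches:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2 / len(batches)
    return fisher

def fisher_merge(param_dicts, fishers, eps=1e-8):
    """Elementwise merge: theta[n] = sum_k F_k[n] * theta_k[n] / (sum_k F_k[n] + eps)."""
    merged = {}
    for name in param_dicts[0]:
        num = sum(f[name] * p[name] for p, f in zip(param_dicts, fishers))
        den = sum(f[name] for f in fishers) + eps
        merged[name] = num / den
    return merged

# Toy usage with two linear "models" on a regression batch
net_a, net_b = torch.nn.Linear(4, 1), torch.nn.Linear(4, 1)
batches = [(torch.randn(8, 4), torch.randn(8, 1))]
loss_fn = torch.nn.functional.mse_loss
fishers = [diagonal_fisher(net_a, batches, loss_fn), diagonal_fisher(net_b, batches, loss_fn)]
params = [{n: p.detach() for n, p in m.named_parameters()} for m in (net_a, net_b)]
merged = fisher_merge(params, fishers)
print({k: tuple(v.shape) for k, v in merged.items()})
```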

VertiBench: Advancing Feature Distribution Diversity in Vertical Federated Learning Benchmarks

  • paper_url: http://arxiv.org/abs/2307.02040
  • repo_url: None
  • paper_authors: Zhaomin Wu, Junyi Hou, Bingsheng He
  • for: This work addresses the lack of public real-world datasets for evaluating the performance of Vertical Federated Learning (VFL).
  • methods: It considers two key factors affecting VFL performance, feature importance and feature correlation, and proposes corresponding evaluation metrics and dataset splitting methods.
  • results: It provides an effective evaluation of state-of-the-art VFL algorithms and offers valuable insights for future research.
    Abstract Vertical Federated Learning (VFL) is a crucial paradigm for training machine learning models on feature-partitioned, distributed data. However, due to privacy restrictions, few public real-world VFL datasets exist for algorithm evaluation, and these represent a limited array of feature distributions. Existing benchmarks often resort to synthetic datasets, derived from arbitrary feature splits from a global set, which only capture a subset of feature distributions, leading to inadequate algorithm performance assessment. This paper addresses these shortcomings by introducing two key factors affecting VFL performance - feature importance and feature correlation - and proposing associated evaluation metrics and dataset splitting methods. Additionally, we introduce a real VFL dataset to address the deficit in image-image VFL scenarios. Our comprehensive evaluation of cutting-edge VFL algorithms provides valuable insights for future research in the field.
    摘要 纵向联合学习(VFL)是训练机器学习模型的重要方法,该方法在分布式数据上进行特征分区。然而由于隐私限制,公共世界中的VFL数据集很少,这些数据集只代表了有限的特征分布。现有的标准 benchmark 通常采用人工生成的数据集,这些数据集只反映了一部分特征分布,导致算法性能评估不准确。本文解决这些缺陷,通过介绍特征重要性和特征相关性两个关键因素,并提出相应的评价指标和数据分割方法。此外,我们还介绍了一个真实存在的VFL数据集,用于解决图像-图像VFL场景中的不足。我们对当前VFL领域最先进的算法进行了全面的评估,提供了valuable的情况参考。

Monte Carlo Sampling without Isoperimetry: A Reverse Diffusion Approach

  • paper_url: http://arxiv.org/abs/2307.02037
  • repo_url: None
  • paper_authors: Xunpeng Huang, Hanze Dong, Yifan Hao, Yian Ma, Tong Zhang
  • for: 本研究探讨了 posterior sampling 的可能性,它是通过反射扩散来实现高质量数据样本的生成模型的效能的一种方法。
  • methods: 本研究使用了分解过程kernel的技术,将 score estimation 转化为了一个mean estimation问题,从而实现了一种新的 posterior sampling 算法。
  • results: 我们提供了这种算法的收敛分析,并证明了其在高维样本中的性能比传统MCMC方法更高,这是因为该算法的auxiliary distribution的一些性质可以减少误差。
    Abstract The efficacy of modern generative models is commonly contingent upon the precision of score estimation along the diffusion path, with a focus on diffusion models and their ability to generate high-quality data samples. This study delves into the potentialities of posterior sampling through reverse diffusion. An examination of the sampling literature reveals that score estimation can be transformed into a mean estimation problem via the decomposition of the transition kernel. By estimating the mean of the auxiliary distribution, the reverse diffusion process can give rise to a novel posterior sampling algorithm, which diverges from traditional gradient-based Markov Chain Monte Carlo (MCMC) methods. We provide the convergence analysis in total variation distance and demonstrate that the isoperimetric dependency of the proposed algorithm is comparatively lower than that observed in conventional MCMC techniques, which justifies the superior performance for high dimensional sampling with error tolerance. Our analytical framework offers fresh perspectives on the complexity of score estimation at various time points, as denoted by the properties of the auxiliary distribution.
    摘要 现代生成模型的效果通常取决于扩散路径上的分数估计精度,尤其是扩散模型及其生成高质量数据样本的能力。这项研究探讨了通过反向扩散进行后验采样的可能性:通过分解转移核,分数估计可以转化为辅助分布的均值估计问题。通过估计辅助分布的均值,反向扩散过程可以产生一种新的后验采样算法,它不同于传统的基于梯度的马尔可夫链蒙特卡洛(MCMC)方法。我们给出了全变差距离下的收敛分析,并证明所提算法的等周(isoperimetric)依赖性低于传统MCMC技术,这解释了其在允许一定误差的高维采样中的更优表现。我们的分析框架还借助辅助分布的性质,为不同时刻分数估计的复杂性提供了新的视角。
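As background on the "score estimation as mean estimation" reduction mentioned above, one standard identity for a Gaussian transition kernel (Tweedie's formula, stated here as general context rather than the paper's exact decomposition) reads:

```latex
% Forward perturbation: x_t = x_0 + \sigma_t\,\varepsilon, \quad \varepsilon \sim \mathcal{N}(0, I).
% The score of the smoothed density is then determined by a conditional mean:
\nabla_{x}\log p_t(x) \;=\; \frac{\mathbb{E}\!\left[x_0 \mid x_t = x\right] - x}{\sigma_t^{2}}
```

so estimating the score at time $t$ amounts to estimating the posterior mean $\mathbb{E}[x_0 \mid x_t]$.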

Ranking with Abstention

  • paper_url: http://arxiv.org/abs/2307.02035
  • repo_url: None
  • paper_authors: Anqi Mao, Mehryar Mohri, Yutao Zhong
  • for: 这个论文提出了一种新的排名概念,即learner可以在一定成本$c$的情况下决定不预测。
  • methods: 这个论文使用了一种扩展的理论分析,包括线性函数家族和带有一个隐藏层的神经网络的$H$-一致性 bound。
  • results: 实验结果表明,这种排名方法在实际应用中具有效果。
    Abstract We introduce a novel framework of ranking with abstention, where the learner can abstain from making prediction at some limited cost $c$. We present a extensive theoretical analysis of this framework including a series of $H$-consistency bounds for both the family of linear functions and that of neural networks with one hidden-layer. These theoretical guarantees are the state-of-the-art consistency guarantees in the literature, which are upper bounds on the target loss estimation error of a predictor in a hypothesis set $H$, expressed in terms of the surrogate loss estimation error of that predictor. We further argue that our proposed abstention methods are important when using common equicontinuous hypothesis sets in practice. We report the results of experiments illustrating the effectiveness of ranking with abstention.
    摘要 我们提出了一种带弃权的排名新框架,其中学习器可以以一定的成本$c$选择不作预测。我们对该框架进行了广泛的理论分析,包括针对线性函数族和单隐藏层神经网络族的一系列$H$-一致性界。这些理论保证是文献中最先进的一致性保证,即假设集$H$中预测器的目标损失估计误差的上界,用该预测器的替代损失估计误差来表示。我们进一步论证了在实践中使用常见的等度连续假设集时,所提弃权方法的重要性。我们报告了实验结果,证明了带弃权排名方法的有效性。
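To make the abstention-at-cost-$c$ setup concrete, the toy sketch below scores examples, abstains when the top-two score margin is small, and evaluates the resulting loss with abstention cost $c$. The margin-threshold rule and the 0/1-plus-cost loss are simplifying assumptions; the paper's surrogate losses and $H$-consistency bounds are not reproduced here.

```python
# Toy sketch of prediction with abstention at cost c (illustrative only).
import numpy as np

def predict_with_abstention(scores, threshold):
    """Return predicted labels, with -1 meaning 'abstain' when the top-score margin is small."""
    top2 = np.sort(scores, axis=1)[:, -2:]   # two largest scores per example
    margin = top2[:, 1] - top2[:, 0]
    preds = scores.argmax(axis=1)
    preds[margin < threshold] = -1           # abstain on low-margin examples
    return preds

def abstention_loss(preds, labels, c=0.1):
    """Average loss: cost c when abstaining, 0/1 loss otherwise."""
    abstain = preds == -1
    errors = (preds != labels) & ~abstain
    return (c * abstain + errors).mean()

# Toy usage: sweeping the abstention threshold trades errors against abstention cost
rng = np.random.default_rng(0)
scores = rng.normal(size=(100, 5))
labels = rng.integers(0, 5, size=100)
for t in [0.0, 0.5, 1.0]:
    print(t, abstention_loss(predict_with_abstention(scores, t), labels, c=0.1))
```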

Improving Automatic Parallel Training via Balanced Memory Workload Optimization

  • paper_url: http://arxiv.org/abs/2307.02031
  • repo_url: None
  • paper_authors: Yujie Wang, Youhe Jiang, Xupeng Miao, Fangcheng Fu, Xiaonan Nie, Bin Cui
  • for: This paper aims to improve the efficiency of training Transformer models across multiple GPUs.
  • methods: The paper proposes a novel system framework called Galvatron-BMW, which integrates multiple parallelism dimensions and automatically identifies the most efficient hybrid parallelism strategy using a decision tree approach and dynamic programming search algorithm.
  • results: Galvatron-BMW consistently achieves superior system throughput in automating distributed training under varying GPU memory constraints, surpassing previous approaches that rely on limited parallelism strategies.
    Abstract Transformer models have emerged as the leading approach for achieving state-of-the-art performance across various application domains, serving as the foundation for advanced large-scale deep learning (DL) models. However, efficiently training these models across multiple GPUs remains a complex challenge due to the abundance of parallelism options. Existing DL systems either require manual efforts to design distributed training plans or limit parallelism combinations to a constrained search space. In this paper, we present Galvatron-BMW, a novel system framework that integrates multiple prevalent parallelism dimensions and automatically identifies the most efficient hybrid parallelism strategy. To effectively navigate this vast search space, we employ a decision tree approach for decomposition and pruning based on intuitive insights. We further utilize a dynamic programming search algorithm to derive the optimal plan. Moreover, to improve resource utilization and enhance system efficiency, we propose a bi-objective optimization workflow that focuses on workload balance. Our evaluations on different Transformer models demonstrate the capabilities of Galvatron-BMW in automating distributed training under varying GPU memory constraints. Across all tested scenarios, Galvatron-BMW consistently achieves superior system throughput, surpassing previous approaches that rely on limited parallelism strategies.
    摘要 Transformer模型已成为在各应用领域实现最先进性能的主流方法,并作为大规模深度学习(DL)模型的基础。然而,在多个GPU上高效训练这些模型仍然是一个复杂的挑战,因为存在丰富的并行性选择。现有的DL系统要么需要手动设计分布式训练计划,要么将并行组合限制在受约束的搜索空间内。在这篇论文中,我们提出了Galvatron-BMW系统框架,该框架集成了多种流行的并行维度,并自动确定最有效的混合并行策略。为了有效地探索这个庞大的搜索空间,我们基于直观的洞察,使用决策树方法进行分解和剪枝,并利用动态规划搜索算法推导最佳方案。此外,为了提高资源利用率和系统效率,我们提出了一种关注工作负载均衡的双目标优化工作流程。我们在不同的Transformer模型上进行了评估,展示了Galvatron-BMW在不同GPU内存限制下自动化分布式训练的能力。在所有测试场景中,Galvatron-BMW始终实现了更高的系统吞吐量,超越了依赖有限并行策略的先前方法。

EHRSHOT: An EHR Benchmark for Few-Shot Evaluation of Foundation Models

  • paper_url: http://arxiv.org/abs/2307.02028
  • repo_url: https://github.com/som-shahlab/ehrshot-benchmark
  • paper_authors: Michael Wornow, Rahul Thapa, Ethan Steinberg, Jason Fries, Nigam Shah
  • for: 本研究的目的是提高医疗机器学习(ML)在医疗领域的进步,通过公共数据集、任务和模型的共享,但医疗领域的ML进步受到共享资产的限制。本研究通过三个贡献来解决这些挑战。
  • methods: 本研究使用了一个新的数据集,名为EHRSHOT,这是医疗记录电子档案(EHR)中的6,712名患者的去identify的结构化数据。与MIMIC-III/IV和其他流行的EHR数据集不同,EHRSHOT是长期跟踪的,而不是仅仅是ICU/ED patients的数据。此外,本研究还公布了一个141M参数的临床基础模型,这是一个可以处理coded EHR数据的完整模型,而不是只能处理不结构化文本的模型。
  • results: 本研究定义了15个几个shot临床预测任务,使得可以评估基础模型的样本效率和任务适应性。同时,研究者们还提供了一个可重现结果的代码,以及模型和数据集(通过研究数据使用协议获取)。
    Abstract While the general machine learning (ML) community has benefited from public datasets, tasks, and models, the progress of ML in healthcare has been hampered by a lack of such shared assets. The success of foundation models creates new challenges for healthcare ML by requiring access to shared pretrained models to validate performance benefits. We help address these challenges through three contributions. First, we publish a new dataset, EHRSHOT, containing de-identified structured data from the electronic health records (EHRs) of 6,712 patients from Stanford Medicine. Unlike MIMIC-III/IV and other popular EHR datasets, EHRSHOT is longitudinal and not restricted to ICU/ED patients. Second, we publish the weights of a 141M parameter clinical foundation model pretrained on the structured EHR data of 2.57M patients. We are one of the first to fully release such a model for coded EHR data; in contrast, most prior models released for clinical data (e.g. GatorTron, ClinicalBERT) only work with unstructured text and cannot process the rich, structured data within an EHR. We provide an end-to-end pipeline for the community to validate and build upon its performance. Third, we define 15 few-shot clinical prediction tasks, enabling evaluation of foundation models on benefits such as sample efficiency and task adaption. The code to reproduce our results, as well as the model and dataset (via a research data use agreement), are available at our Github repo here: https://github.com/som-shahlab/ehrshot-benchmark
    摘要 通用机器学习(ML)社区得益于公共数据集、任务和模型,而医疗领域的ML进步却因缺乏此类共享资产而受阻。基础模型的成功为医疗ML带来了新的挑战:需要访问共享的预训练模型来验证其性能收益。我们通过以下三个贡献来应对这些挑战:1. 我们发布了一个新的数据集EHRSHOT,包含斯坦福医学中心6,712名患者的去标识化电子病历(EHR)结构化数据。与MIMIC-III/IV和其他流行的EHR数据集不同,EHRSHOT是纵向(longitudinal)数据,且不局限于ICU/ED患者。2. 我们发布了一个141M参数的临床基础模型的权重,该模型在257万名患者的结构化EHR数据上预训练。我们是最早完整发布此类面向编码EHR数据模型的团队之一;相比之下,此前发布的大多数临床模型(如GatorTron、ClinicalBERT)只能处理非结构化文本,无法处理EHR中丰富的结构化数据。我们还提供了端到端的流程,供社区验证并在其性能基础上继续研究。3. 我们定义了15个少样本(few-shot)临床预测任务,使得可以评估基础模型在样本效率和任务适应等方面的收益。复现结果的代码、模型和数据集(通过研究数据使用协议获取)可在我们的GitHub仓库获得:https://github.com/som-shahlab/ehrshot-benchmark。

Using Random Effects Machine Learning Algorithms to Identify Vulnerability to Depression

  • paper_url: http://arxiv.org/abs/2307.02023
  • repo_url: None
  • paper_authors: Runa Bhaumik, Jonathan Stange
  • for: 对年轻成年人的抑郁症状进行预测,用于诊断与预后评估
  • methods: 使用数据驱动的机器学习方法(RE-EM树和MERF)对抑郁风险因素进行分类和识别
  • results: 结果表明,RE-EM树和MERF方法可以准确地预测青年成年人抑郁症状,并且可以确定抑郁风险因素的复杂相互作用,以及哪些因素对于预后预测最有用。
    Abstract Background: Reliable prediction of clinical progression over time can improve the outcomes of depression. Little work has been done integrating various risk factors for depression, to determine the combinations of factors with the greatest utility for identifying which individuals are at the greatest risk. Method: This study demonstrates that data-driven machine learning (ML) methods such as RE-EM (Random Effects/Expectation Maximization) trees and MERF (Mixed Effects Random Forest) can be applied to reliably identify variables that have the greatest utility for classifying subgroups at greatest risk for depression. 185 young adults completed measures of depression risk, including rumination, worry, negative cognitive styles, cognitive and coping flexibilities, and negative life events, along with symptoms of depression. We trained RE-EM trees and MERF algorithms and compared them to traditional linear mixed models (LMMs) predicting depressive symptoms prospectively and concurrently with cross-validation. Results: Our results indicated that the RE-EM tree and MERF methods model complex interactions, identify subgroups of individuals and predict depression severity comparable to LMM. Further, machine learning models determined that brooding, negative life events, negative cognitive styles, and perceived control were the most relevant predictors of future depression levels. Conclusions: Random effects machine learning models have the potential for high clinical utility and can be leveraged for interventions to reduce vulnerability to depression.
    摘要 背景:可靠地预测临床进程随时间的变化可以改善抑郁症的结局。然而,目前很少有研究整合多种抑郁风险因素,以确定哪些因素组合最有助于识别风险最高的个体。方法:本研究表明,数据驱动的机器学习方法,如RE-EM(随机效应/期望最大化)树和MERF(混合效应随机森林),可以可靠地识别对划分高风险亚组最有用的变量。185名年轻成人完成了抑郁风险的测量,包括反刍思维、担忧、消极认知风格、认知与应对灵活性以及负面生活事件,同时测量了抑郁症状。我们训练了RE-EM树和MERF算法,并与传统的线性混合模型(LMM)进行交叉验证比较,以前瞻性和同期方式预测抑郁症状。结果:我们的结果表明,RE-EM树和MERF方法能够建模复杂的交互作用、识别个体亚组,并且预测抑郁严重程度的能力与LMM相当。此外,机器学习模型确定反刍思维、负面生活事件、消极认知风格和感知控制是未来抑郁水平最相关的预测因素。结论:随机效应机器学习模型具有较高的临床实用性,可用于降低抑郁易感性的干预。

Modular DFR: Digital Delayed Feedback Reservoir Model for Enhancing Design Flexibility

  • paper_url: http://arxiv.org/abs/2307.11094
  • repo_url: None
  • paper_authors: Sosei Ikeda, Hiromitsu Awano, Takashi Sato
  • for: 这个论文主要是为了提出一种全数字式延迟反馈水库系统(DFR),以便在硬件实现中使用。
  • methods: 该论文提出了一种新的模块化DFR模型,该模型可以完全在数字domain中实现,并且可以采用不同的非线性函数进行选择,从而提高准确性而减少功耗。
  • results: 该论文通过两种不同的非线性函数实现DFR,实现了功耗降低10倍和吞吐量提高5.3倍,而保持相同或更好的准确性。
    Abstract A delayed feedback reservoir (DFR) is a type of reservoir computing system well-suited for hardware implementations owing to its simple structure. Most existing DFR implementations use analog circuits that require both digital-to-analog and analog-to-digital converters for interfacing. However, digital DFRs emulate analog nonlinear components in the digital domain, resulting in a lack of design flexibility and higher power consumption. In this paper, we propose a novel modular DFR model that is suitable for fully digital implementations. The proposed model reduces the number of hyperparameters and allows flexibility in the selection of the nonlinear function, which improves the accuracy while reducing the power consumption. We further present two DFR realizations with different nonlinear functions, achieving 10x power reduction and 5.3x throughput improvement while maintaining equal or better accuracy.
    摘要 延迟反馈水库(DFR)是一类结构简单、适合硬件实现的水库计算系统。现有的大多数DFR实现使用模拟电路,需要数模和模数转换器进行接口。然而,数字DFR需要在数字域中模拟模拟电路的非线性元件,导致设计灵活性不足且功耗较高。在本文中,我们提出了一种适合全数字实现的新型模块化DFR模型。该模型减少了超参数的数量,并允许灵活选择非线性函数,从而在降低功耗的同时提高准确性。我们进一步给出了采用两种不同非线性函数的DFR实现,在保持相同或更好准确性的情况下,实现了10倍的功耗降低和5.3倍的吞吐量提升。
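For readers unfamiliar with delayed feedback reservoirs, the sketch below simulates a generic masked-input DFR with a single delay loop and a pluggable nonlinearity (echoing the "modular" idea of swapping the nonlinear function). It deliberately omits the inter-node coupling that a physical delay line introduces and is not the paper's specific digital model; all parameter values are illustrative.

```python
# Simplified delayed-feedback-reservoir (DFR) state update sketch in NumPy.
import numpy as np

def dfr_states(u, n_nodes=50, gamma=0.5, eta=0.8, nonlinearity=np.tanh, seed=0):
    """Map a 1-D input sequence u to reservoir states via a single delayed feedback loop."""
    rng = np.random.default_rng(seed)
    mask = rng.choice([-1.0, 1.0], size=n_nodes)   # fixed input mask over virtual nodes
    x = np.zeros(n_nodes)                          # delay line = virtual node states
    states = np.zeros((len(u), n_nodes))
    for t, u_t in enumerate(u):
        for i in range(n_nodes):
            # each virtual node mixes its delayed state with the masked input sample
            x[i] = nonlinearity(eta * x[i] + gamma * mask[i] * u_t)
        states[t] = x
    return states

# Toy usage: the states would normally feed a simple linear readout (e.g. ridge regression)
u = np.sin(np.linspace(0, 20, 200))
S = dfr_states(u)
print(S.shape)  # (200, 50)
```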

Evaluating the Effectiveness of Large Language Models in Representing Textual Descriptions of Geometry and Spatial Relations

  • paper_url: http://arxiv.org/abs/2307.03678
  • repo_url: None
  • paper_authors: Yuhan Ji, Song Gao
  • for: 评估大语言模型(LLMs)在表示几何和其空间关系方面的能力。
  • methods: 使用GPT-2和BERT等大语言模型将文本(WKT)格式的几何编码并feed其 embeddings 到分类器和回归器进行评估效果。
  • results: LLMs-生成的embeddings可以保持几何类型和捕捉一定的空间关系(准确率达73%),但还存在估算数值和检索空间相关对象的挑战。
    Abstract This research focuses on assessing the ability of large language models (LLMs) in representing geometries and their spatial relations. We utilize LLMs including GPT-2 and BERT to encode the well-known text (WKT) format of geometries and then feed their embeddings into classifiers and regressors to evaluate the effectiveness of the LLMs-generated embeddings for geometric attributes. The experiments demonstrate that while the LLMs-generated embeddings can preserve geometry types and capture some spatial relations (up to 73% accuracy), challenges remain in estimating numeric values and retrieving spatially related objects. This research highlights the need for improvement in terms of capturing the nuances and complexities of the underlying geospatial data and integrating domain knowledge to support various GeoAI applications using foundation models.
    摘要

STS-CCL: Spatial-Temporal Synchronous Contextual Contrastive Learning for Urban Traffic Forecasting

  • paper_url: http://arxiv.org/abs/2307.02507
  • repo_url: None
  • paper_authors: Lincan Li, Kaixiang Yang, Fengji Luo, Jichao Bi
  • for: 这个研究目的是为了提高大规模未标注交通数据中的复杂空间时间表现,以及对于其他缺乏数据的跨空间任务。
  • methods: 这篇论文使用了先进的对比学习,提出了一个新的空间-时间同步上下文对比学习(STS-CCL)模型,包括针对空间-时间图数据的基本与强化增强方法,以及一个空间-时间同步对比模块(STS-CM),以同时捕捉良好的空间-时间依赖关系。
  • results: 实验和评估结果显示,使用STS-CCL模型建立预测器可以对交通预测 benchmark 进行超越性的表现,并且适合具有缺乏数据的大规模跨空间任务。
    Abstract Efficiently capturing the complex spatiotemporal representations from large-scale unlabeled traffic data remains to be a challenging task. In considering of the dilemma, this work employs the advanced contrastive learning and proposes a novel Spatial-Temporal Synchronous Contextual Contrastive Learning (STS-CCL) model. First, we elaborate the basic and strong augmentation methods for spatiotemporal graph data, which not only perturb the data in terms of graph structure and temporal characteristics, but also employ a learning-based dynamic graph view generator for adaptive augmentation. Second, we introduce a Spatial-Temporal Synchronous Contrastive Module (STS-CM) to simultaneously capture the decent spatial-temporal dependencies and realize graph-level contrasting. To further discriminate node individuals in negative filtering, a Semantic Contextual Contrastive method is designed based on semantic features and spatial heterogeneity, achieving node-level contrastive learning along with negative filtering. Finally, we present a hard mutual-view contrastive training scheme and extend the classic contrastive loss to an integrated objective function, yielding better performance. Extensive experiments and evaluations demonstrate that building a predictor upon STS-CCL contrastive learning model gains superior performance than existing traffic forecasting benchmarks. The proposed STS-CCL is highly suitable for large datasets with only a few labeled data and other spatiotemporal tasks with data scarcity issue.
    摘要
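As generic background on the contrastive-objective family used here, the sketch below implements a standard InfoNCE loss between two augmented views. STS-CCL's synchronous contrastive module, semantic negative filtering, and integrated objective are considerably more involved and are not reproduced here.

```python
# Generic InfoNCE-style contrastive loss between two views (background illustration only).
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """z1, z2: (batch, dim) embeddings of two augmented views of the same graphs/nodes."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature      # (batch, batch) similarity matrix
    targets = torch.arange(z1.size(0))      # positives sit on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Toy usage
z_a, z_b = torch.randn(32, 64), torch.randn(32, 64)
print(info_nce(z_a, z_b).item())
```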

Distilling Missing Modality Knowledge from Ultrasound for Endometriosis Diagnosis with Magnetic Resonance Images

  • paper_url: http://arxiv.org/abs/2307.02000
  • repo_url: None
  • paper_authors: Yuan Zhang, Hu Wang, David Butler, Minh-Son To, Jodie Avery, M Louise Hull, Gustavo Carneiro
  • for: 利用知识蒸馏,提高磁共振成像(MRI)图像中道格拉斯陷凹(POD)闭塞的检测精度。
  • methods: 利用在不同模态数据上训练的教师模型(TVUS)与学生模型(MRI),通过知识蒸馏进行训练,提高学生模型在MRI图像中检测POD闭塞的精度。
  • results: 实验结果表明,所提方法可以提高MRI图像中POD闭塞的检测精度。
    Abstract Endometriosis is a common chronic gynecological disorder that has many characteristics, including the pouch of Douglas (POD) obliteration, which can be diagnosed using Transvaginal gynecological ultrasound (TVUS) scans and magnetic resonance imaging (MRI). TVUS and MRI are complementary non-invasive endometriosis diagnosis imaging techniques, but patients are usually not scanned using both modalities and, it is generally more challenging to detect POD obliteration from MRI than TVUS. To mitigate this classification imbalance, we propose in this paper a knowledge distillation training algorithm to improve the POD obliteration detection from MRI by leveraging the detection results from unpaired TVUS data. More specifically, our algorithm pre-trains a teacher model to detect POD obliteration from TVUS data, and it also pre-trains a student model with 3D masked auto-encoder using a large amount of unlabelled pelvic 3D MRI volumes. Next, we distill the knowledge from the teacher TVUS POD obliteration detector to train the student MRI model by minimizing a regression loss that approximates the output of the student to the teacher using unpaired TVUS and MRI data. Experimental results on our endometriosis dataset containing TVUS and MRI data demonstrate the effectiveness of our method to improve the POD detection accuracy from MRI.
    摘要 具体来说,我们的算法首先在TVUS数据上预训练一个用于检测POD闭塞的教师模型,同时利用大量未标注的盆腔3D MRI数据,以3D掩码自编码器预训练一个学生模型。接着,我们通过在未配对的TVUS和MRI数据上最小化一个使学生输出逼近教师输出的回归损失,将教师TVUS POD闭塞检测器的知识蒸馏给学生MRI模型。在包含TVUS和MRI数据的子宫内膜异位症数据集上的实验结果表明,我们的方法能够提高MRI上POD闭塞的检测精度。
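A minimal sketch of the teacher-to-student regression distillation idea is shown below; for simplicity it assumes paired TVUS/MRI batches and binary labels, whereas the paper works with unpaired data (the harder part). Model and variable names are placeholders, not the authors' code.

```python
# Cross-modality distillation sketch: the student regresses toward the teacher's output.
import torch
import torch.nn as nn

def distillation_step(student, teacher, mri_batch, tvus_batch, labels, alpha=0.5):
    """One step mixing supervised BCE with a teacher-matching regression (MSE) loss."""
    with torch.no_grad():
        teacher_prob = torch.sigmoid(teacher(tvus_batch))   # teacher POD-obliteration score
    student_logit = student(mri_batch)
    supervised = nn.functional.binary_cross_entropy_with_logits(student_logit, labels)
    distill = nn.functional.mse_loss(torch.sigmoid(student_logit), teacher_prob)
    return alpha * supervised + (1 - alpha) * distill

# Toy usage with stand-in linear "models" for the two imaging modalities
student, teacher = nn.Linear(32, 1), nn.Linear(16, 1)
loss = distillation_step(student, teacher, torch.randn(4, 32), torch.randn(4, 16), torch.rand(4, 1))
print(loss.item())
```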

Zero-Shot Neural Architecture Search: Challenges, Solutions, and Opportunities

  • paper_url: http://arxiv.org/abs/2307.01998
  • repo_url: https://github.com/sldgroup/survey-zero-shot-nas
  • paper_authors: Guihong Li, Duc Hoang, Kartikeya Bhardwaj, Ming Lin, Zhangyang Wang, Radu Marculescu
  • for: 本文旨在审视和比较当前最佳实践(SOTA)的零shot Neural Architecture Search(NAS)方法,强调它们在硬件上的意识。
  • methods: 本文首先介绍主流的零样本代理指标(proxy),并阐述其理论基础;然后通过大规模实验比较这些代理指标,并在硬件感知和硬件无关的NAS场景中验证其效果。
  • results: 实验结果表明,零样本NAS方法在硬件感知和硬件无关场景中均具有良好效果,可在不同硬件条件下进行可靠的NAS。此外,本文还提出了若干有望设计出更好代理指标的思路。
    Abstract Recently, zero-shot (or training-free) Neural Architecture Search (NAS) approaches have been proposed to liberate the NAS from training requirements. The key idea behind zero-shot NAS approaches is to design proxies that predict the accuracies of the given networks without training network parameters. The proxies proposed so far are usually inspired by recent progress in theoretical deep learning and have shown great potential on several NAS benchmark datasets. This paper aims to comprehensively review and compare the state-of-the-art (SOTA) zero-shot NAS approaches, with an emphasis on their hardware awareness. To this end, we first review the mainstream zero-shot proxies and discuss their theoretical underpinnings. We then compare these zero-shot proxies through large-scale experiments and demonstrate their effectiveness in both hardware-aware and hardware-oblivious NAS scenarios. Finally, we point out several promising ideas to design better proxies. Our source code and the related paper list are available on https://github.com/SLDGroup/survey-zero-shot-nas.
    摘要 最近,零样本(即无需训练)的神经架构搜索(NAS)方法被提出,以使NAS摆脱训练的要求。零样本NAS方法的关键思想是设计无需训练网络参数即可预测给定网络准确率的代理指标(proxy)。目前提出的代理指标大多受到深度学习理论最新进展的启发,并在多个NAS基准数据集上展现出巨大潜力。本文旨在全面回顾并比较当前最先进(SOTA)的零样本NAS方法,并着重分析其硬件感知能力。为此,我们首先介绍主流的零样本代理指标,并讨论其理论基础;然后通过大规模实验比较这些代理指标,并在硬件感知和硬件无关的NAS场景中验证其有效性;最后,我们指出了若干有望设计出更好代理指标的思路。我们的源代码和相关论文列表可以在 https://github.com/SLDGroup/survey-zero-shot-nas 上获取。
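To illustrate what a training-free proxy looks like in practice, the sketch below scores randomly initialized candidate networks by the norm of their gradients on a single mini-batch. This is one simple member of the zero-cost proxy family, used purely as an example; it is not any specific proxy surveyed in the paper.

```python
# Simple zero-shot (training-free) proxy sketch: gradient norm at initialization.
import torch
import torch.nn as nn

def grad_norm_proxy(model, x, y):
    model.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    return sum(p.grad.norm().item() for p in model.parameters() if p.grad is not None)

# Toy usage: rank two randomly initialized candidates without training either of them
x, y = torch.randn(16, 3, 32, 32), torch.randint(0, 10, (16,))
cand_a = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ReLU(), nn.Linear(256, 10))
cand_b = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64), nn.ReLU(), nn.Linear(64, 10))
for name, net in [("A", cand_a), ("B", cand_b)]:
    print(name, grad_norm_proxy(net, x, y))
```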

Dynamic Feature-based Deep Reinforcement Learning for Flow Control of Circular Cylinder with Sparse Surface Pressure Sensing

  • paper_url: http://arxiv.org/abs/2307.01995
  • repo_url: None
  • paper_authors: Qiulei Wang, Lei Yan, Gang Hu, Wenli Chen, Bernd R. Noack
  • for: 这个研究旨在开发一种基于深度学习的闭Loop瓣纹控制算法,以降低瓣纹 drag 和 lift 波动,并且在感知不充分的情况下进行自适应控制。
  • methods: 该研究基于深度学习,将感知信号提升为动态特征(DF),以预测未来的流态态。 resulting DF-DRL 自动学习了响应控制器,无需动态模型。
  • results: 对比标准模型,DF-DRL 模型的瓣纹系数降低了25%。使用单个表面压力传感器,DF-DRL 可以降低瓣纹系数到状态 искусственный智能性的8%,并且减少了升力系数波动。这种方法还在更高的 Reynolds 数下表现良好,降低了瓣纹系数32.2% 和 46.55%。
    Abstract This study proposes a self-learning algorithm for closed-loop cylinder wake control targeting lower drag and lower lift fluctuations with the additional challenge of sparse sensor information, taking deep reinforcement learning as the starting point. DRL performance is significantly improved by lifting the sensor signals to dynamic features (DF), which predict future flow states. The resulting dynamic feature-based DRL (DF-DRL) automatically learns a feedback control in the plant without a dynamic model. Results show that the drag coefficient of the DF-DRL model is 25% less than the vanilla model based on direct sensor feedback. More importantly, using only one surface pressure sensor, DF-DRL can reduce the drag coefficient to a state-of-the-art performance of about 8% at Re = 100 and significantly mitigate lift coefficient fluctuations. Hence, DF-DRL allows the deployment of sparse sensing of the flow without degrading the control performance. This method also shows good robustness in controlling flow under higher Reynolds numbers, which reduces the drag coefficient by 32.2% and 46.55% at Re = 500 and 1000, respectively, indicating the broad applicability of the method. Since surface pressure information is more straightforward to measure in realistic scenarios than flow velocity information, this study provides a valuable reference for experimentally designing the active flow control of a circular cylinder based on wall pressure signals, which is an essential step toward further developing intelligent control in realistic multi-input multi-output (MIMO) system.
    摘要

The KiTS21 Challenge: Automatic segmentation of kidneys, renal tumors, and renal cysts in corticomedullary-phase CT

  • paper_url: http://arxiv.org/abs/2307.01984
  • repo_url: https://github.com/neheller/kits21
  • paper_authors: Nicholas Heller, Fabian Isensee, Dasha Trofimova, Resha Tejpaul, Zhongchen Zhao, Huai Chen, Lisheng Wang, Alex Golts, Daniel Khapun, Daniel Shats, Yoel Shoshan, Flora Gilboa-Solomon, Yasmeen George, Xi Yang, Jianpeng Zhang, Jing Zhang, Yong Xia, Mengran Wu, Zhiyang Liu, Ed Walczak, Sean McSweeney, Ranveer Vasdev, Chris Hornung, Rafat Solaiman, Jamee Schoephoerster, Bailey Abernathy, David Wu, Safa Abdulkadir, Ben Byun, Justice Spriggs, Griffin Struyk, Alexandra Austin, Ben Simpson, Michael Hagstrom, Sierra Virnig, John French, Nitin Venkatesh, Sarah Chan, Keenan Moore, Anna Jacobsen, Susan Austin, Mark Austin, Subodh Regmi, Nikolaos Papanikolopoulos, Christopher Weight
  • for: 本文是关于2021年的肾茵和肾肿瘤分割挑战(KiTS21)的挑战报告,与2021年的医疗图像计算和计算机助手外科会议(MICCAI)一起举行。
  • methods: 本挑战使用了一种新的标注方法,收集了每个区域兴趣的三个独立标注,并使用了一个基于网络的标注工具进行完全透明的标注。此外,KiTS21测试集来自外部机构,挑战参与者开发出能够通用化的方法。
  • results: Despite the challenges, the top-performing teams achieved a significant improvement over the state of the art set in 2019, and this performance is shown to inch ever closer to human-level performance.
    Abstract This paper presents the challenge report for the 2021 Kidney and Kidney Tumor Segmentation Challenge (KiTS21) held in conjunction with the 2021 international conference on Medical Image Computing and Computer Assisted Interventions (MICCAI). KiTS21 is a sequel to its first edition in 2019, and it features a variety of innovations in how the challenge was designed, in addition to a larger dataset. A novel annotation method was used to collect three separate annotations for each region of interest, and these annotations were performed in a fully transparent setting using a web-based annotation tool. Further, the KiTS21 test set was collected from an outside institution, challenging participants to develop methods that generalize well to new populations. Nonetheless, the top-performing teams achieved a significant improvement over the state of the art set in 2019, and this performance is shown to inch ever closer to human-level performance. An in-depth meta-analysis is presented describing which methods were used and how they faired on the leaderboard, as well as the characteristics of which cases generally saw good performance, and which did not. Overall KiTS21 facilitated a significant advancement in the state of the art in kidney tumor segmentation, and provides useful insights that are applicable to the field of semantic segmentation as a whole.
    摘要 这篇论文介绍了2021年的肾脏和肾肿瘤分割挑战(KiTS21)的挑战报告,该挑战在2021年的医学影像计算和计算助手外科学会(MICCAI)会议上举行。KiTS21是2019年的首届版本的续作,它在设计方面添加了许多创新,同时使用了更大的数据集。在这次挑战中,使用了一种新的注解方法,每个区域兴趣都有三个独立的注解,并在网络上使用了 transparent 的注解工具进行了注解。此外,KiTS21 测试集来自于外部机构,挑战参与者们开发出能够在新人口中广泛应用的方法。不过,最高排名的团队在2019年的状态前进set上达到了显著的改进,并且这种性能在人类水平逐渐往近。文章还提供了一个深入的meta-分析,描述了参与者们使用的方法以及其在排名表上的表现,以及特定情况下的好坏表现。总的来说,KiTS21 对肾脏瘤分割领域的状态前进做出了重要贡献,并为 semantic segmentation 领域提供了有用的指导。

A ChatGPT Aided Explainable Framework for Zero-Shot Medical Image Diagnosis

  • paper_url: http://arxiv.org/abs/2307.01981
  • repo_url: None
  • paper_authors: Jiaxiang Liu, Tianxiang Hu, Yan Zhang, Xiaotang Gai, Yang Feng, Zuozhu Liu
  • for: 这个研究是为了提出一个零条件医疗影像分类框架,以便在实际应用中对于有限的疾病或大规模标注数据进行医疗诊断。
  • methods: 这个研究使用了CLIP的预训练视觉语言模型,并与ChatGPT进行整合,以提供可解释的医疗诊断。在这个框架中,我们使用了分类名称来询问大型语言模型(LLMs),以生成更多的cue和知识,例如疾病 симптом或描述,帮助提供更加精确和可解释的诊断。
  • results: 我们在一个私人数据集和四个公共数据集上进行了广泛的实验,并进行了详细分析,结果显示了我们的零条件医疗影像分类框架的有效性和可解释性,证明了VLMs和LLMs在医疗应用中的巨大潜力。
    Abstract Zero-shot medical image classification is a critical process in real-world scenarios where we have limited access to all possible diseases or large-scale annotated data. It involves computing similarity scores between a query medical image and possible disease categories to determine the diagnostic result. Recent advances in pretrained vision-language models (VLMs) such as CLIP have shown great performance for zero-shot natural image recognition and exhibit benefits in medical applications. However, an explainable zero-shot medical image recognition framework with promising performance is yet under development. In this paper, we propose a novel CLIP-based zero-shot medical image classification framework supplemented with ChatGPT for explainable diagnosis, mimicking the diagnostic process performed by human experts. The key idea is to query large language models (LLMs) with category names to automatically generate additional cues and knowledge, such as disease symptoms or descriptions other than a single category name, to help provide more accurate and explainable diagnosis in CLIP. We further design specific prompts to enhance the quality of generated texts by ChatGPT that describe visual medical features. Extensive results on one private dataset and four public datasets along with detailed analysis demonstrate the effectiveness and explainability of our training-free zero-shot diagnosis pipeline, corroborating the great potential of VLMs and LLMs for medical applications.
    摘要 零样本医疗影像分类是现实场景中的关键过程,因为我们往往无法获得所有可能疾病的数据或大规模标注数据。它通过计算查询医疗影像与候选疾病类别之间的相似度分数来确定诊断结果。近来,预训练视觉语言模型(VLM,如CLIP)在零样本自然图像识别中表现出色,并在医疗应用中展现出优势。然而,一个性能可观且可解释的零样本医疗影像识别框架仍有待开发。在本文中,我们提出了一种基于CLIP的零样本医疗影像分类框架,并结合ChatGPT实现可解释的诊断,模拟人类专家的诊断过程。其关键思想是用类别名称查询大型语言模型(LLM),自动生成额外的线索和知识,例如疾病症状或除单一类别名称之外的描述,以帮助CLIP给出更准确、更可解释的诊断。我们还设计了特定的提示词,以提高ChatGPT生成的描述视觉医学特征文本的质量。在一个私有数据集和四个公共数据集上的大量结果及详细分析表明,我们这一无需训练的零样本诊断流程兼具有效性和可解释性,印证了VLM和LLM在医疗应用中的巨大潜力。
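A rough sketch of the cue-augmented zero-shot idea follows: each class is represented by several textual descriptions instead of one class name, and the per-class score is the average CLIP similarity over its cues. The checkpoint is a generic (non-medical) CLIP, the prompts are invented placeholders rather than ChatGPT output, and the image path is hypothetical.

```python
# CLIP-style zero-shot classification with several text cues per class (illustrative only).
import torch
from transformers import CLIPModel, CLIPProcessor
from PIL import Image

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Each class is described by several cues instead of a single class name (placeholder text).
class_cues = {
    "pneumonia": ["a chest X-ray with lung opacities", "a chest X-ray showing consolidation"],
    "normal": ["a clear chest X-ray", "a chest X-ray with no abnormal findings"],
}

image = Image.open("example_xray.png")   # hypothetical image path
prompts = [c for cues in class_cues.values() for c in cues]
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image.squeeze(0)   # similarity to each prompt

# Average the similarities of each class's cues and pick the best class.
scores, i = {}, 0
for name, cues in class_cues.items():
    scores[name] = logits[i:i + len(cues)].mean().item()
    i += len(cues)
print(max(scores, key=scores.get), scores)
```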

Algorithme EM régularisé

  • paper_url: http://arxiv.org/abs/2307.01955
  • repo_url: None
  • paper_authors: Pierre Houdouin, Matthieu Jonkcheere, Frederic Pascal
  • for: 用于处理小样本大数据的 Gaussian Mixture Model (GMM) 最优化likelihood问题。
  • methods: 提出了一种受限制的EM算法,通过使用先验知识来缓解小样本大数据的问题,以确保covariance矩阵更新的正定性。
  • results: 实验表明该方法在 clustering 任务中表现良好。
    Abstract Expectation-Maximization (EM) algorithm is a widely used iterative algorithm for computing maximum likelihood estimate when dealing with Gaussian Mixture Model (GMM). When the sample size is smaller than the data dimension, this could lead to a singular or poorly conditioned covariance matrix and, thus, to performance reduction. This paper presents a regularized version of the EM algorithm that efficiently uses prior knowledge to cope with a small sample size. This method aims to maximize a penalized GMM likelihood where regularized estimation may ensure positive definiteness of covariance matrix updates by shrinking the estimators towards some structured target covariance matrices. Finally, experiments on real data highlight the good performance of the proposed algorithm for clustering purposes
    摘要 期望最大化(EM)算法是计算高斯混合模型(GMM)最大似然估计时广泛使用的迭代算法。当样本量小于数据维度时,这可能导致奇异或病态的协方差矩阵,从而降低性能。本文提出了一种正则化版本的EM算法,能够有效利用先验知识来应对小样本情形。该方法旨在最大化带惩罚项的GMM似然,其中正则化估计通过将估计量向某些结构化目标协方差矩阵收缩,来保证协方差矩阵更新的正定性。最后,在真实数据上的实验表明了所提算法在聚类任务中的良好性能。
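The core of a regularized M-step can be illustrated by shrinking the weighted empirical covariance toward a structured target, $\Sigma = (1-\rho)\,\hat\Sigma + \rho\,T$, which keeps the update positive definite when the sample size is below the dimension. The identity-scaled target and fixed $\rho$ below are illustrative choices; the paper's penalty and target may differ.

```python
# Sketch of a shrinkage-regularized covariance update inside an EM M-step.
import numpy as np

def regularized_cov(x, resp, mu, rho=0.3):
    """x: (n, d) data; resp: (n,) responsibilities for one component; mu: (d,) component mean."""
    w = resp / resp.sum()
    diff = x - mu
    emp = (w[:, None] * diff).T @ diff                      # weighted empirical covariance
    target = np.trace(emp) / x.shape[1] * np.eye(x.shape[1])
    return (1 - rho) * emp + rho * target                   # shrinkage keeps it well-conditioned

# Toy usage in the d > n regime, where the unregularized estimate would be singular
rng = np.random.default_rng(0)
x = rng.normal(size=(10, 25))
resp = rng.uniform(size=10)
cov = regularized_cov(x, resp, x.mean(axis=0))
print(np.linalg.cond(cov))
```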

FEMDA: Une méthode de classification robuste et flexible

  • paper_url: http://arxiv.org/abs/2307.01954
  • repo_url: None
  • paper_authors: Pierre Houdouin, Matthieu Jonckheere, Frederic Pascal
  • for: 本研究旨在提出一种能够应对不同尺度参数以及独立但非同分布样本的新判别分析技术,以替代传统的线性和二次判别分析(LDA、QDA);后者因依赖高斯假设而易受非高斯分布和受污染数据的影响。
  • methods: 该技术假设每个数据点服从各自任意的椭圆对称(ES)分布,并具有各自的尺度参数,从而能够处理可能高度异质、独立但非同分布的样本。
  • results: 与其他最新方法相比,所推导的判别规则更简单、快速,并且对数据尺度变化更稳健,更适合涉及非高斯分布和受污染数据的实际应用。
    Abstract Linear and Quadratic Discriminant Analysis (LDA and QDA) are well-known classical methods but can heavily suffer from non-Gaussian distributions and/or contaminated datasets, mainly because of the underlying Gaussian assumption that is not robust. This paper studies the robustness to scale changes in the data of a new discriminant analysis technique where each data point is drawn by its own arbitrary Elliptically Symmetrical (ES) distribution and its own arbitrary scale parameter. Such a model allows for possibly very heterogeneous, independent but non-identically distributed samples. The new decision rule derived is simple, fast, and robust to scale changes in the data compared to other state-of-the-art method
    摘要

A Neural Collapse Perspective on Feature Evolution in Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2307.01951
  • repo_url: https://github.com/kvignesh1420/gnn_collapse
  • paper_authors: Vignesh Kothapalli, Tom Tirer, Joan Bruna
  • for: 本研究聚焦于使用图神经网络(GNN)的节点级分类任务,并探讨图拓扑与特征演化之间的相互作用。
  • methods: 本研究以随机块模型图上的社群检测为例来展示特征演化,并借助"神经崩溃"(Neural Collapse, NC)现象来理解类内变异性的减少。
  • results: 研究发现,在节点级分类设定下同样存在一定程度的类内变异性减少,但不及实例级分类情形。理论分析进一步表明,类内变异性的精确崩溃要求图满足较为严格的结构条件。此外,我们还研究了特征变异性在各层的演化,并与谱方法进行了对比。
    Abstract Graph neural networks (GNNs) have become increasingly popular for classification tasks on graph-structured data. Yet, the interplay between graph topology and feature evolution in GNNs is not well understood. In this paper, we focus on node-wise classification, illustrated with community detection on stochastic block model graphs, and explore the feature evolution through the lens of the "Neural Collapse" (NC) phenomenon. When training instance-wise deep classifiers (e.g. for image classification) beyond the zero training error point, NC demonstrates a reduction in the deepest features' within-class variability and an increased alignment of their class means to certain symmetric structures. We start with an empirical study that shows that a decrease in within-class variability is also prevalent in the node-wise classification setting, however, not to the extent observed in the instance-wise case. Then, we theoretically study this distinction. Specifically, we show that even an "optimistic" mathematical model requires that the graphs obey a strict structural condition in order to possess a minimizer with exact collapse. Interestingly, this condition is viable also for heterophilic graphs and relates to recent empirical studies on settings with improved GNNs' generalization. Furthermore, by studying the gradient dynamics of the theoretical model, we provide reasoning for the partial collapse observed empirically. Finally, we present a study on the evolution of within- and between-class feature variability across layers of a well-trained GNN and contrast the behavior with spectral methods.
    摘要 图神经网络(GNN)在图结构数据的分类任务中得到了广泛应用。然而,图拓扑与GNN中特征演化之间的相互作用仍未被充分理解。在这篇论文中,我们聚焦于节点级分类,以随机块模型图上的社群检测为例,并通过"神经崩溃"(NC)现象来考察特征演化。在实例级深度分类器(例如图像分类)训练超过零训练误差点之后,NC表现为最深层特征的类内变异性减少,且类均值向某些对称结构对齐。我们首先通过实验研究表明,在节点级分类设定下同样普遍存在类内变异性的减少,但程度不及实例级情形。随后我们对这一差异进行了理论研究:我们证明,即使在一个"乐观"的数学模型中,图也必须满足严格的结构条件,其极小值点才会出现精确的崩溃。有趣的是,这一条件对于异配(heterophilic)图同样可行,并与近期关于提升GNN泛化能力设定的实证研究相关。此外,通过研究该理论模型的梯度动力学,我们为实验中观察到的部分崩溃提供了解释。最后,我们展示了训练良好的GNN中类内与类间特征变异性在各层的演化,并与谱方法的行为进行了对比。
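For readers who want to quantify the "reduction in within-class variability" discussed above, the sketch below computes the standard NC1 statistic, $\mathrm{tr}(\Sigma_W \Sigma_B^{+})/C$, on penultimate-layer features. This is the usual neural-collapse metric from the literature, used here as background; it is not code from the paper.

```python
# NC1 (within-class variability) metric sketch on penultimate-layer features.
import numpy as np

def nc1(features, labels):
    classes = np.unique(labels)
    global_mean = features.mean(axis=0)
    sw = np.zeros((features.shape[1],) * 2)   # within-class covariance
    sb = np.zeros_like(sw)                    # between-class covariance
    for c in classes:
        fc = features[labels == c]
        mc = fc.mean(axis=0)
        sw += (fc - mc).T @ (fc - mc) / len(features)
        sb += len(fc) / len(features) * np.outer(mc - global_mean, mc - global_mean)
    return np.trace(sw @ np.linalg.pinv(sb)) / len(classes)

# Toy usage: two well-separated classes give a small NC1 value
rng = np.random.default_rng(0)
f = np.vstack([rng.normal(0, 0.1, (50, 8)), rng.normal(3, 0.1, (50, 8))])
y = np.array([0] * 50 + [1] * 50)
print(nc1(f, y))
```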

A Synthetic Electrocardiogram (ECG) Image Generation Toolbox to Facilitate Deep Learning-Based Scanned ECG Digitization

  • paper_url: http://arxiv.org/abs/2307.01946
  • repo_url: None
  • paper_authors: Kshama Kodthalu Shivashankara, Afagh Mehri Shervedani, Reza Sameni
  • for: 这个论文的目的是提出一种新的方法来生成Synthetic ECG图像,以便用于训练深度学习模型进行算法式ECG诊断。
  • methods: 该方法利用了深度学习图像处理技术,并将 PhysioNet PTB-XL ECG时间序列数据作为引用时间序列数据,通过数据扩展技术来生成Synthetic ECG图像。
  • results: 研究人员通过计算信号噪声比(SNR)来评估生成的Synthetic ECG图像质量,结果显示了平均信号恢复SNR为27$\pm$2.8dB,这说明了提出的Synthetic ECG图像集可以用于训练深度学习模型。
    Abstract The electrocardiogram (ECG) is an accurate and widely available tool for diagnosing cardiovascular diseases. ECGs have been recorded in printed formats for decades and their digitization holds great potential for training machine learning (ML) models in algorithmic ECG diagnosis. Physical ECG archives are at risk of deterioration and scanning printed ECGs alone is insufficient, as ML models require ECG time-series data. Therefore, the digitization and conversion of paper ECG archives into time-series data is of utmost importance. Deep learning models for image processing show promise in this regard. However, the scarcity of ECG archives with reference time-series is a challenge. Data augmentation techniques utilizing \textit{digital twins} present a potential solution. We introduce a novel method for generating synthetic ECG images on standard paper-like ECG backgrounds with realistic artifacts. Distortions including handwritten text artifacts, wrinkles, creases and perspective transforms are applied to the generated images, without personally identifiable information. As a use case, we generated an ECG image dataset of 21,801 records from the 12-lead PhysioNet PTB-XL ECG time-series dataset. A deep ECG image digitization model was built and trained on the synthetic dataset, and was employed to convert the synthetic images to time-series data for evaluation. The signal-to-noise ratio (SNR) was calculated to assess the image digitization quality vs the ground truth ECG time-series. The results show an average signal recovery SNR of 27$\pm$2.8\,dB, demonstrating the significance of the proposed synthetic ECG image dataset for training deep learning models. The codebase is available as an open-access toolbox for ECG research.
    摘要 心电图(ECG)是一种准确且广泛可用的心血管疾病诊断工具。数十年来,ECG一直以纸质形式记录,将其数字化对于训练用于算法化ECG诊断的机器学习(ML)模型具有巨大潜力。纸质ECG档案面临退化的风险,而仅扫描纸质ECG并不足够,因为ML模型需要ECG时间序列数据。因此,将纸质ECG档案数字化并转换为时间序列数据至关重要。用于图像处理的深度学习模型在这方面展现出前景,但带有参考时间序列的ECG档案十分稀缺,利用"数字孪生"的数据增强技术提供了一个潜在的解决方案。我们提出了一种在标准纸质ECG背景上生成具有真实伪影的合成ECG图像的新方法,对生成图像施加手写文本伪影、褶皱、折痕和透视变换等失真,且不包含个人可识别信息。作为用例,我们基于12导联PhysioNet PTB-XL ECG时间序列数据集生成了包含21,801条记录的ECG图像数据集。我们构建并在该合成数据集上训练了一个深度ECG图像数字化模型,并用它将合成图像转换回时间序列数据以进行评估,通过计算信噪比(SNR)来衡量图像数字化质量与真实ECG时间序列的差距。结果显示平均信号恢复SNR为27$\pm$2.8dB,表明所提出的合成ECG图像数据集对训练深度学习模型具有重要意义。代码库已作为开放工具箱发布,供ECG研究使用。
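The signal-recovery SNR quoted above (27 dB on average) follows the usual definition $\mathrm{SNR} = 10\log_{10}(\lVert x\rVert^2 / \lVert x-\hat x\rVert^2)$; a minimal sketch, with a synthetic signal standing in for a digitized lead, is given below. The toolbox's alignment and preprocessing steps are not shown.

```python
# Signal-recovery SNR between a reference ECG lead and its digitized reconstruction.
import numpy as np

def recovery_snr_db(reference, recovered):
    noise = reference - recovered
    return 10 * np.log10(np.sum(reference ** 2) / np.sum(noise ** 2))

# Toy usage: a clean sine wave "lead" recovered with small additive error
t = np.linspace(0, 1, 500)
ref = np.sin(2 * np.pi * 5 * t)
rec = ref + 0.02 * np.random.default_rng(0).normal(size=t.size)
print(round(recovery_snr_db(ref, rec), 1), "dB")
```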

Text + Sketch: Image Compression at Ultra Low Rates

  • paper_url: http://arxiv.org/abs/2307.01944
  • repo_url: https://github.com/leieric/text-sketch
  • paper_authors: Eric Lei, Yiğit Berkay Uslu, Hamed Hassani, Shirin Saeedi Bidokhti
  • for: 本文旨在探讨如何使用文本描述生成高质量图像,并用于图像压缩。
  • methods: 本文使用了一些直接使用预训练模型进行图像压缩的技术,包括使用文本描述和侧信息生成高质量重建图像,以及使用预训练模型进行图像压缩。
  • results: 研究发现,使用这些技术可以在非常低的比特率下实现高度的semantic和spatial结构保持,并且在learned compressors中显著提高了感知和semantic faithfulness。
    Abstract Recent advances in text-to-image generative models provide the ability to generate high-quality images from short text descriptions. These foundation models, when pre-trained on billion-scale datasets, are effective for various downstream tasks with little or no further training. A natural question to ask is how such models may be adapted for image compression. We investigate several techniques in which the pre-trained models can be directly used to implement compression schemes targeting novel low rate regimes. We show how text descriptions can be used in conjunction with side information to generate high-fidelity reconstructions that preserve both semantics and spatial structure of the original. We demonstrate that at very low bit-rates, our method can significantly improve upon learned compressors in terms of perceptual and semantic fidelity, despite no end-to-end training.
    摘要

A Neural Network-Based Enrichment of Reproducing Kernel Approximation for Modeling Brittle Fracture

  • paper_url: http://arxiv.org/abs/2307.01937
  • repo_url: None
  • paper_authors: Jonghyuk Baek, Jiun-Shyan Chen
  • For: The paper is written to propose an improved version of the neural network-enhanced Reproducing Kernel Particle Method (NN-RKPM) for modeling brittle fracture.* Methods: The proposed method uses a background reproducing kernel (RK) approximation defined on a coarse and uniform discretization, which is enriched by a neural network (NN) approximation under a Partition of Unity framework. The NN approximation automatically locates and inserts regularized discontinuities in the function space.* Results: The proposed method is demonstrated to be effective through a series of numerical examples involving damage propagation and branching, and the solution convergence of the proposed method is guaranteed.Here are the three points in Simplified Chinese:* For: 这篇论文是为了提出一种改进版的神经网络增强的 reproduce kernel particle method (NN-RKPM),用于模拟脆性断裂。* Methods: 该方法使用了一个背景 reproduce kernel (RK) approximation,定义在一个粗略和均匀的离散中,并通过一个神经网络 (NN) aproximation 下的 Partition of Unity 框架进行增强。NN aproximation 自动在函数空间中找到和插入正规破碎。* Results: 该方法在一系列的数值例子中,包括损害传播和分支,并且解的收敛性是保证的。
    Abstract Numerical modeling of localizations is a challenging task due to the evolving rough solution in which the localization paths are not predefined. Despite decades of efforts, there is a need for innovative discretization-independent computational methods to predict the evolution of localizations. In this work, an improved version of the neural network-enhanced Reproducing Kernel Particle Method (NN-RKPM) is proposed for modeling brittle fracture. In the proposed method, a background reproducing kernel (RK) approximation defined on a coarse and uniform discretization is enriched by a neural network (NN) approximation under a Partition of Unity framework. In the NN approximation, the deep neural network automatically locates and inserts regularized discontinuities in the function space. The NN-based enrichment functions are then patched together with RK approximation functions using RK as a Partition of Unity patching function. The optimum NN parameters defining the location, orientation, and displacement distribution across location together with RK approximation coefficients are obtained via the energy-based loss function minimization. To regularize the NN-RK approximation, a constraint on the spatial gradient of the parametric coordinates is imposed in the loss function. Analysis of the convergence properties shows that the solution convergence of the proposed method is guaranteed. The effectiveness of the proposed method is demonstrated by a series of numerical examples involving damage propagation and branching.
    摘要 numerical modeling of localizations 是一个复杂的任务,因为本地化路径不是预定的。 DESPITE 数十年的努力,目前仍需要创新的离散独立计算方法,以预测本地化的演化。 在这种工作中,一种改进的神经网络增强的复现器kernel方法(NN-RKPM)被提议用于模拟脆弱裂解。在提议的方法中,背景的复现器kernel(RK)approximation在粗略和均匀的离散上定义,然后通过一个神经网络(NN)approximation在Partition of Unity框架下进行增强。在NNapproximation中,深度神经网络自动在函数空间中找到并插入正规化缺陷。然后,NN基于的增强函数被与RKapproximation函数用RK作为Partition of Unity patching函数相连接。通过能量基本的损失函数最小化来获取优化NN参数,其中NN参数包括位置、方向和分布的拟合。为了正则化NN-RKapproximation,在损失函数中添加了空间梯度的约束。分析表示方法的扩散性是保证的。通过一系列数字示例,包括损失传播和分支,这种方法的有效性得到证明。

MDI+: A Flexible Random Forest-Based Feature Importance Framework

  • paper_url: http://arxiv.org/abs/2307.01932
  • repo_url: https://github.com/csinva/imodels
  • paper_authors: Abhineet Agarwal, Ana M. Kenney, Yan Shuo Tan, Tiffany M. Tang, Bin Yu
  • for: 本研究旨在提出一种可变的特征重要性框架,即MDI+,以提高Random Forest模型中特征的重要性评估。
  • methods: 本研究使用了Random Forest模型和Generalized Linear Models(GLMs),并提出了一种基于Predictability、Computability和Stability框架的指南,以帮助实践者选择适合的GLM和评价指标。
  • results: 实验表明,MDI+可以在识别信号特征方面表现出色,并且在实际应用中可以提取已知的预测性基因,并且比现有的特征重要性评估方法具有更高的稳定性。
    Abstract Mean decrease in impurity (MDI) is a popular feature importance measure for random forests (RFs). We show that the MDI for a feature $X_k$ in each tree in an RF is equivalent to the unnormalized $R^2$ value in a linear regression of the response on the collection of decision stumps that split on $X_k$. We use this interpretation to propose a flexible feature importance framework called MDI+. Specifically, MDI+ generalizes MDI by allowing the analyst to replace the linear regression model and $R^2$ metric with regularized generalized linear models (GLMs) and metrics better suited for the given data structure. Moreover, MDI+ incorporates additional features to mitigate known biases of decision trees against additive or smooth models. We further provide guidance on how practitioners can choose an appropriate GLM and metric based upon the Predictability, Computability, Stability framework for veridical data science. Extensive data-inspired simulations show that MDI+ significantly outperforms popular feature importance measures in identifying signal features. We also apply MDI+ to two real-world case studies on drug response prediction and breast cancer subtype classification. We show that MDI+ extracts well-established predictive genes with significantly greater stability compared to existing feature importance measures. All code and models are released in a full-fledged python package on Github.
    摘要 平均不纯度减少(mean decrease in impurity, MDI)是随机森林(RF)中常用的特征重要性度量。我们证明,RF中每棵树对特征 $X_k$ 的MDI等价于将响应变量对该树中以 $X_k$ 为分裂变量的决策树桩集合做线性回归时的未归一化 $R^2$ 值。基于这一解释,我们提出了一个灵活的特征重要性框架MDI+。具体而言,MDI+ 允许分析者将线性回归模型和 $R^2$ 指标替换为正则化的广义线性模型(GLM)以及更适合给定数据结构的指标,从而推广了MDI;此外,MDI+ 还引入了额外的设计,以缓解决策树对可加或平滑模型的已知偏差。我们进一步基于可预测性-可计算性-稳定性(PCS)框架,为实践者如何选择合适的GLM和指标提供了指导。大量数据驱动的模拟表明,MDI+ 在识别信号特征方面显著优于常用的特征重要性度量。我们还将MDI+ 应用于药物反应预测和乳腺癌亚型分类两个实际案例,结果显示MDI+ 能以比现有特征重要性度量显著更高的稳定性提取出已确立的预测基因。全部代码和模型已作为完整的Python包发布在GitHub上。
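The stump-regression view of MDI described above can be illustrated loosely as follows: build depth-1 indicator ("stump") features per input feature, fit a regularized GLM (ridge) on all of them, and score each feature by the drop in $R^2$ when its stumps are removed. This is a simplified sketch of the idea only, not the authors' MDI+ estimator from the imodels package; the threshold grid and ridge penalty are arbitrary choices.

```python
# Loose illustration of stump-regression feature importance (not the MDI+ estimator itself).
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

def stump_features(x_col, n_thresholds=8):
    """Indicator features 1[x <= t] for a grid of quantile thresholds on one feature."""
    ts = np.quantile(x_col, np.linspace(0.1, 0.9, n_thresholds))
    return (x_col[:, None] <= ts[None, :]).astype(float)

def stump_r2_importance(X, y):
    blocks = [stump_features(X[:, j]) for j in range(X.shape[1])]
    Phi = np.hstack(blocks)
    full_r2 = r2_score(y, Ridge(alpha=1.0).fit(Phi, y).predict(Phi))
    scores = []
    for j in range(X.shape[1]):
        keep = np.hstack([b for k, b in enumerate(blocks) if k != j])
        r2_wo = r2_score(y, Ridge(alpha=1.0).fit(keep, y).predict(keep))
        scores.append(full_r2 - r2_wo)       # partial-R^2-style importance of feature j
    return np.array(scores)

# Toy usage: the response depends only on features 0 and 1
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = 2 * X[:, 0] + np.sin(3 * X[:, 1]) + 0.1 * rng.normal(size=300)
print(stump_r2_importance(X, y).round(3))
```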

Learning ECG signal features without backpropagation

  • paper_url: http://arxiv.org/abs/2307.01930
  • repo_url: None
  • paper_authors: Péter Pósfay, Marcell T. Kurbucz, Péter Kovács, Antal Jakovác
  • for: 这篇论文的目的是提出一种新的方法来生成时间序列数据的表示方式,以提高下游任务的效果、范围和可应用性。
  • methods: 该方法基于物理学的想法,通过数据驱动的方式构建一个减少的表示,同时能够捕捉数据的下面结构和任务特定信息,并且仍然保持易于理解、可读性和验证性。
  • results: 通过应用该方法于心跳信号分类任务,实现了状态首位表现。
    Abstract Representation learning has become a crucial area of research in machine learning, as it aims to discover efficient ways of representing raw data with useful features to increase the effectiveness, scope and applicability of downstream tasks such as classification and prediction. In this paper, we propose a novel method to generate representations for time series-type data. This method relies on ideas from theoretical physics to construct a compact representation in a data-driven way, and it can capture both the underlying structure of the data and task-specific information while still remaining intuitive, interpretable and verifiable. This novel methodology aims to identify linear laws that can effectively capture a shared characteristic among samples belonging to a specific class. By subsequently utilizing these laws to generate a classifier-agnostic representation in a forward manner, they become applicable in a generalized setting. We demonstrate the effectiveness of our approach on the task of ECG signal classification, achieving state-of-the-art performance.
    摘要 研究者们在机器学习领域内,尤其是在 Representation learning 方面,为了找到可以快速、高效地将原始数据转换为有用特征,以提高下游任务(如分类和预测)的效果、范围和可重用性。在这篇论文中,我们提出了一种新的方法,用于生成时间序列型数据的表示。这种方法基于物理学的想法,通过在数据驱动的方式下构建一个压缩表示,能够捕捉数据的下面结构和任务特定信息,同时仍然保持易于理解、可读性和可验证性。这种新的方法ology 目标是在特定类别中找到共同的特征,并通过这些法律生成一个批处器无关的表示,以便在总体上应用。我们在 ECG 信号分类任务中证明了我们的方法的效果,达到了领导性的表现。

ProtoDiffusion: Classifier-Free Diffusion Guidance with Prototype Learning

  • paper_url: http://arxiv.org/abs/2307.01924
  • repo_url: None
  • paper_authors: Gulcin Baykal, Halil Faruk Karagoz, Taha Binhuraib, Gozde Unal
  • for: 提高生成质量和稳定性,减少训练时间
  • methods: integrate prototype learning into diffusion models
  • results: 在不同的数据集和实验设置下,成功实现更高的生成质量和更快的训练时间
    Abstract Diffusion models are generative models that have shown significant advantages compared to other generative models in terms of higher generation quality and more stable training. However, the computational need for training diffusion models is considerably increased. In this work, we incorporate prototype learning into diffusion models to achieve high generation quality faster than the original diffusion model. Instead of randomly initialized class embeddings, we use separately learned class prototypes as the conditioning information to guide the diffusion process. We observe that our method, called ProtoDiffusion, achieves better performance in the early stages of training compared to the baseline method, signifying that using the learned prototypes shortens the training time. We demonstrate the performance of ProtoDiffusion using various datasets and experimental settings, achieving the best performance in shorter times across all settings.
    摘要 Diffusion models 是一类生成模型,在生成质量和训练稳定性方面表现出了明显的优势。然而,训练 diffusion models 所需的计算资源增加了 considrably。在这种情况下,我们将 prototype learning 引入 diffusion models,以实现更高的生成质量和更快的训练速度。而不是使用随机初始化的类嵌入,我们使用分开学习的类prototype来导引diffusion过程。我们发现,我们的方法(即 ProtoDiffusion)在训练的早期阶段表现出了更好的性能,这表明使用学习的 prototype 可以缩短训练时间。我们通过不同的数据集和实验设置来证明 ProtoDiffusion 的性能,在所有设置下都达到了最佳性能,并且在更短的时间内完成。

ClimateLearn: Benchmarking Machine Learning for Weather and Climate Modeling

  • paper_url: http://arxiv.org/abs/2307.01909
  • repo_url: https://github.com/aditya-grover/climate-learn
  • paper_authors: Tung Nguyen, Jason Jewik, Hritik Bansal, Prakhar Sharma, Aditya Grover
  • for: The paper introduces ClimateLearn, an open-source PyTorch library for training and evaluating machine learning models in data-driven climate science.
  • methods: The library includes holistic pipelines for dataset processing, state-of-the-art deep learning models, and quantitative and qualitative evaluation for standard weather and climate modeling tasks.
  • results: The authors perform comprehensive forecasting and downscaling experiments to showcase the capabilities and key features of the library; to their knowledge, ClimateLearn is the first large-scale, open-source effort to bridge research in weather and climate modeling with modern machine learning systems.
    Abstract Modeling weather and climate is an essential endeavor to understand the near- and long-term impacts of climate change, as well as inform technology and policymaking for adaptation and mitigation efforts. In recent years, there has been a surging interest in applying data-driven methods based on machine learning for solving core problems such as weather forecasting and climate downscaling. Despite promising results, much of this progress has been impaired due to the lack of large-scale, open-source efforts for reproducibility, resulting in the use of inconsistent or underspecified datasets, training setups, and evaluations by both domain scientists and artificial intelligence researchers. We introduce ClimateLearn, an open-source PyTorch library that vastly simplifies the training and evaluation of machine learning models for data-driven climate science. ClimateLearn consists of holistic pipelines for dataset processing (e.g., ERA5, CMIP6, PRISM), implementation of state-of-the-art deep learning models (e.g., Transformers, ResNets), and quantitative and qualitative evaluation for standard weather and climate modeling tasks. We supplement these functionalities with extensive documentation, contribution guides, and quickstart tutorials to expand access and promote community growth. We have also performed comprehensive forecasting and downscaling experiments to showcase the capabilities and key features of our library. To our knowledge, ClimateLearn is the first large-scale, open-source effort for bridging research in weather and climate modeling with modern machine learning systems. Our library is available publicly at https://github.com/aditya-grover/climate-learn.
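The sketch below deliberately does not use the ClimateLearn API (see the repository above for the actual interface); it only illustrates the downscaling task the library benchmarks: upsample a coarse gridded field (e.g. a low-resolution ERA5 temperature map) and learn a residual correction against the high-resolution target. All layer sizes are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DownscalingCNN(nn.Module):
    """Generic downscaling baseline: bilinear upsampling plus a learned residual."""
    def __init__(self, channels=1, hidden=64, scale=4):
        super().__init__()
        self.scale = scale
        self.net = nn.Sequential(
            nn.Conv2d(channels, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, channels, 3, padding=1),
        )

    def forward(self, coarse):                     # coarse: (B, C, H, W)
        up = F.interpolate(coarse, scale_factor=self.scale, mode="bilinear",
                           align_corners=False)
        return up + self.net(up)                   # interpolation baseline + correction

# training against the high-resolution target with an MSE loss is omitted for brevity
```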

Stability Analysis Framework for Particle-based Distance GANs with Wasserstein Gradient Flow

  • paper_url: http://arxiv.org/abs/2307.01879
  • repo_url: None
  • paper_authors: Chuqi Chen, Yue Wu, Yang Xiang
  • for: This paper investigates the training process of generative networks that use particle-based distances as the objective function, such as MMD GAN, Cramér GAN, and EIEG GAN, which often suffer from unstable training.
  • methods: The authors analyze the stability of the training process of these GANs from the perspective of probability density dynamics, regarding the discriminator $D$ as a feature transformation mapping and the generator $G$ as a random variable mapping, and using the Wasserstein gradient flow of the probability density function to perform stability analysis.
  • results: The training process of the discriminator is usually unstable due to the $\min_G \max_D E(G, D)$ formulation of GANs; to address this, a stabilizing term is added to the discriminator loss function, and experiments validate the stability analysis and the stabilizing method.
    Abstract In this paper, we investigate the training process of generative networks that use a type of probability density distance named particle-based distance as the objective function, e.g. MMD GAN, Cram\'er GAN, EIEG GAN. However, these GANs often suffer from the problem of unstable training. In this paper, we analyze the stability of the training process of these GANs from the perspective of probability density dynamics. In our framework, we regard the discriminator $D$ in these GANs as a feature transformation mapping that maps high dimensional data into a feature space, while the generator $G$ maps random variables to samples that resemble real data in terms of feature space. This perspective enables us to perform stability analysis for the training of GANs using the Wasserstein gradient flow of the probability density function. We find that the training process of the discriminator is usually unstable due to the formulation of $\min_G \max_D E(G, D)$ in GANs. To address this issue, we add a stabilizing term in the discriminator loss function. We conduct experiments to validate our stability analysis and stabilizing method.
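For reference, a minimal PyTorch sketch of the particle-based distance at the core of MMD GAN: the (biased) squared MMD between discriminator features of real and generated samples under a Gaussian kernel. The specific stabilizing term this paper adds to the discriminator loss is not reproduced here.

```python
import torch

def gaussian_kernel(a, b, sigma=1.0):
    # a: (n, d), b: (m, d) feature vectors produced by the discriminator D
    d2 = torch.cdist(a, b) ** 2
    return torch.exp(-d2 / (2 * sigma ** 2))

def mmd2(x_feat, y_feat, sigma=1.0):
    """Biased estimator of the squared MMD between two feature samples."""
    kxx = gaussian_kernel(x_feat, x_feat, sigma).mean()
    kyy = gaussian_kernel(y_feat, y_feat, sigma).mean()
    kxy = gaussian_kernel(x_feat, y_feat, sigma).mean()
    return kxx + kyy - 2 * kxy

# The generator minimizes mmd2(D(real), D(G(z))); the discriminator D maximizes it.
```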

Fast Private Kernel Density Estimation via Locality Sensitive Quantization

  • paper_url: http://arxiv.org/abs/2307.01877
  • repo_url: https://github.com/talwagner/lsq
  • paper_authors: Tal Wagner, Yonatan Naamad, Nina Mishra
  • for: efficient mechanisms for differentially private kernel density estimation (DP-KDE)
  • methods: Locality Sensitive Quantization (LSQ) framework, which leverages existing non-private KDE methods and privatizes them in a black-box manner
  • results: DP-KDE mechanisms that are fast and accurate on large datasets in both high and low dimensions, with linear time complexity in the number of dimensions $d$
    Abstract We study efficient mechanisms for differentially private kernel density estimation (DP-KDE). Prior work for the Gaussian kernel described algorithms that run in time exponential in the number of dimensions $d$. This paper breaks the exponential barrier, and shows how the KDE can privately be approximated in time linear in $d$, making it feasible for high-dimensional data. We also present improved bounds for low-dimensional data. Our results are obtained through a general framework, which we term Locality Sensitive Quantization (LSQ), for constructing private KDE mechanisms where existing KDE approximation techniques can be applied. It lets us leverage several efficient non-private KDE methods -- like Random Fourier Features, the Fast Gauss Transform, and Locality Sensitive Hashing -- and ``privatize'' them in a black-box manner. Our experiments demonstrate that our resulting DP-KDE mechanisms are fast and accurate on large datasets in both high and low dimensions.
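A minimal NumPy sketch of one LSQ-style instantiation as we read it: approximate the Gaussian-kernel KDE with Random Fourier Features, whose bounded feature norm lets the summed feature vector be released once under the Gaussian mechanism; queries then run in time linear in the feature dimension and the data dimension. The noise calibration and the exact point of privatization are assumptions, not the paper's pseudocode.

```python
import numpy as np

def rff_features(X, W, b):
    """Random Fourier features z(x) approximating a Gaussian kernel;
    ||z(x)||_2 <= sqrt(2), which bounds the sensitivity of their sum."""
    D = W.shape[1]
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

def private_kde_release(X, num_features=2048, bandwidth=1.0,
                        epsilon=1.0, delta=1e-6, rng=np.random.default_rng(0)):
    n, d = X.shape
    W = rng.normal(scale=1.0 / bandwidth, size=(d, num_features))
    b = rng.uniform(0, 2 * np.pi, size=num_features)
    z_sum = rff_features(X, W, b).sum(axis=0)
    # Gaussian mechanism: the l2-sensitivity of z_sum to one record is sqrt(2)
    sigma = np.sqrt(2.0) * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    z_sum_priv = z_sum + rng.normal(scale=sigma, size=num_features)
    return W, b, z_sum_priv, n

def kde_query(y, W, b, z_sum_priv, n):
    """Approximate KDE value at query y, computed from the privatized release."""
    return float(rff_features(y[None, :], W, b)[0] @ z_sum_priv) / n
```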

Generalization Guarantees via Algorithm-dependent Rademacher Complexity

  • paper_url: http://arxiv.org/abs/2307.02501
  • repo_url: None
  • paper_authors: Sarah Sachs, Tim van Erven, Liam Hodgkinson, Rajiv Khanna, Umut Simsekli
  • for: To propose a new complexity measure that controls the generalization error of modern machine learning algorithms.
  • methods: The paper introduces the empirical Rademacher complexity of an algorithm- and data-dependent hypothesis class, and combines standard properties of Rademacher complexity with the convenient structure of this class to derive new bounds.
  • results: New bounds are obtained, including fractal-dimension-based bounds that extend previous results from continuous to finite hypothesis classes and avoid a mutual information term; the proof of a recent dimension-independent generalization bound for stochastic gradient descent is greatly simplified; and results for VC classes and compression schemes are easily recovered.
    Abstract Algorithm- and data-dependent generalization bounds are required to explain the generalization behavior of modern machine learning algorithms. In this context, there exists information theoretic generalization bounds that involve (various forms of) mutual information, as well as bounds based on hypothesis set stability. We propose a conceptually related, but technically distinct complexity measure to control generalization error, which is the empirical Rademacher complexity of an algorithm- and data-dependent hypothesis class. Combining standard properties of Rademacher complexity with the convenient structure of this class, we are able to (i) obtain novel bounds based on the finite fractal dimension, which (a) extend previous fractal dimension-type bounds from continuous to finite hypothesis classes, and (b) avoid a mutual information term that was required in prior work; (ii) we greatly simplify the proof of a recent dimension-independent generalization bound for stochastic gradient descent; and (iii) we easily recover results for VC classes and compression schemes, similar to approaches based on conditional mutual information.
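For reference, the standard empirical Rademacher complexity that the paper builds on, stated for a generic function class $\mathcal{G}$ evaluated on a sample; the paper's twist is to take $\mathcal{G}$ to be an algorithm- and data-dependent hypothesis class.

```latex
% Empirical Rademacher complexity of a function class \mathcal{G} on a sample
% S = (z_1, \dots, z_n), with i.i.d. Rademacher signs \sigma_i \in \{-1, +1\}:
\widehat{\mathfrak{R}}_S(\mathcal{G})
  \;=\; \mathbb{E}_{\sigma}\!\left[\,\sup_{g \in \mathcal{G}}
      \frac{1}{n} \sum_{i=1}^{n} \sigma_i \, g(z_i)\right]
```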

Approximate, Adapt, Anonymize (3A): a Framework for Privacy Preserving Training Data Release for Machine Learning

  • paper_url: http://arxiv.org/abs/2307.01875
  • repo_url: None
  • paper_authors: Tamas Madl, Weijie Xu, Olivia Choudhury, Matthew Howard
  • for: To maximize the utility of released training data for machine learning while preserving differential privacy.
  • methods: The paper proposes a data release framework called 3A (Approximate, Adapt, Anonymize) that maximizes data utility for machine learning under differential privacy.
  • results: Experiments show minimal discrepancy between the performance of models trained on real versus privatized datasets, and significant increases in classification performance compared to state-of-the-art privacy-preserving synthetic data generation models.
    Abstract The availability of large amounts of informative data is crucial for successful machine learning. However, in domains with sensitive information, the release of high-utility data which protects the privacy of individuals has proven challenging. Despite progress in differential privacy and generative modeling for privacy-preserving data release in the literature, only a few approaches optimize for machine learning utility: most approaches only take into account statistical metrics on the data itself and fail to explicitly preserve the loss metrics of machine learning models that are to be subsequently trained on the generated data. In this paper, we introduce a data release framework, 3A (Approximate, Adapt, Anonymize), to maximize data utility for machine learning, while preserving differential privacy. We also describe a specific implementation of this framework that leverages mixture models to approximate, kernel-inducing points to adapt, and Gaussian differential privacy to anonymize a dataset, in order to ensure that the resulting data is both privacy-preserving and high utility. We present experimental evidence showing minimal discrepancy between performance metrics of models trained on real versus privatized datasets, when evaluated on held-out real data. We also compare our results with several privacy-preserving synthetic data generation models (such as differentially private generative adversarial networks), and report significant increases in classification performance metrics compared to state-of-the-art models. These favorable comparisons show that the presented framework is a promising direction of research, increasing the utility of low-risk synthetic data release for machine learning.
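The paper's specific combination of mixture models and kernel-inducing points is not reproduced here; the NumPy sketch below only illustrates the general approximate-then-anonymize pattern with the simplest possible approximation: per-class means released under the Gaussian mechanism after l2 clipping, from which synthetic data can be sampled. The clipping bound, noise calibration, treating class counts as public, and the isotropic sampling step are all simplifying assumptions.

```python
import numpy as np

def dp_class_means(X, y, clip=1.0, epsilon=1.0, delta=1e-6,
                   rng=np.random.default_rng(0)):
    """Simplified 'approximate + anonymize' step: release per-class means
    under the Gaussian mechanism (records clipped to an l2 ball of radius clip)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xc = X * np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    sigma = clip * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    means = {}
    for c in np.unique(y):
        Xc_c = Xc[y == c]
        noisy_sum = Xc_c.sum(axis=0) + rng.normal(scale=sigma, size=X.shape[1])
        means[c] = noisy_sum / max(len(Xc_c), 1)   # class counts treated as public here
    return means

def sample_synthetic(means, per_class=100, scale=0.1, rng=np.random.default_rng(1)):
    """Sample a labeled synthetic dataset around the released noisy means."""
    Xs, ys = [], []
    for c, mu in means.items():
        Xs.append(mu + scale * rng.normal(size=(per_class, mu.shape[0])))
        ys.append(np.full(per_class, c))
    return np.vstack(Xs), np.concatenate(ys)
```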

A hybrid machine learning framework for clad characteristics prediction in metal additive manufacturing

  • paper_url: http://arxiv.org/abs/2307.01872
  • repo_url: https://github.com/sinatayebati/cladnet-ml-for-am
  • paper_authors: Sina Tayebati, Kyu Taek Cho
  • for: This paper aims to develop a hybrid approach that combines computational fluid dynamics (CFD) modeling and machine learning (ML) techniques to predict and understand the characteristics of metal additive manufacturing (MAM) printed clads.
  • methods: The authors use a calibrated CFD model to generate a comprehensive dataset of clad characteristics, including geometry, quality, and processing parameters. They then employ two sets of processing parameters for training ML models, along with versatile ML models and reliable evaluation metrics, to create a scalable learning framework for predicting clad geometry and quality.
  • results: The proposed hybrid approach resolves many challenges of conventional modeling methods in MAM by providing an efficient, accurate, and scalable platform for clad characteristics prediction and optimization. The authors demonstrate the effectiveness of their approach by using it to predict clad geometry and quality under different processing conditions.
    Abstract During the past decade, metal additive manufacturing (MAM) has experienced significant developments and gained much attention due to its ability to fabricate complex parts, manufacture products with functionally graded materials, minimize waste, and enable low-cost customization. Despite these advantages, predicting the impact of processing parameters on the characteristics of an MAM printed clad is challenging due to the complex nature of MAM processes. Machine learning (ML) techniques can help connect the physics underlying the process and processing parameters to the clad characteristics. In this study, we introduce a hybrid approach which involves utilizing the data provided by a calibrated multi-physics computational fluid dynamic (CFD) model and experimental research for preparing the essential big dataset, and then uses a comprehensive framework consisting of various ML models to predict and understand clad characteristics. We first compile an extensive dataset by fusing experimental data into the data generated using the developed CFD model for this study. This dataset comprises critical clad characteristics, including geometrical features such as width, height, and depth, labels identifying clad quality, and processing parameters. Second, we use two sets of processing parameters for training the ML models: machine setting parameters and physics-aware parameters, along with versatile ML models and reliable evaluation metrics to create a comprehensive and scalable learning framework for predicting clad geometry and quality. This framework can serve as a basis for clad characteristics control and process optimization. The framework resolves many challenges of conventional modeling methods in MAM by solving the issue of data scarcity using a hybrid approach and introducing an efficient, accurate, and scalable platform for clad characteristics prediction and optimization.
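As a rough illustration of the supervised learning task described above (the paper's actual model suite is not reproduced), a hedged scikit-learn sketch that maps processing parameters to clad geometry targets and reports per-target R^2 on held-out data; the feature and target names in the comments are assumptions.

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

def fit_clad_regressor(X, y):
    """X: processing parameters (e.g. laser power, scan speed, powder feed rate);
    y: clad geometry targets (width, height, depth) from the fused CFD/experimental dataset."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    model = MultiOutputRegressor(GradientBoostingRegressor(random_state=0))
    model.fit(X_tr, y_tr)
    print("R^2 per target:",
          r2_score(y_te, model.predict(X_te), multioutput="raw_values"))
    return model
```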

Self-Consuming Generative Models Go MAD

  • paper_url: http://arxiv.org/abs/2307.01850
  • repo_url: None
  • paper_authors: Sina Alemohammad, Josue Casco-Rodriguez, Lorenzo Luzi, Ahmed Imtiaz Humayun, Hossein Babaei, Daniel LeJeune, Ali Siahkoohi, Richard G. Baraniuk
  • for: To study the properties of the self-consuming (autophagous) loops that arise when generative models are trained on synthetic data produced by previous generations of models.
  • methods: Using three families of state-of-the-art generative image models, the authors analyze autophagous loops that differ in whether fixed or fresh real training data is available across generations, and in whether samples from previous-generation models are biased to trade off data quality versus diversity.
  • results: Without enough fresh real data in each generation, the quality (precision) or diversity (recall) of future generative models progressively decreases; the authors term this condition Model Autophagy Disorder (MAD), by analogy with mad cow disease.
    Abstract Seismic advances in generative AI algorithms for imagery, text, and other data types has led to the temptation to use synthetic data to train next-generation models. Repeating this process creates an autophagous (self-consuming) loop whose properties are poorly understood. We conduct a thorough analytical and empirical analysis using state-of-the-art generative image models of three families of autophagous loops that differ in how fixed or fresh real training data is available through the generations of training and in whether the samples from previous generation models have been biased to trade off data quality versus diversity. Our primary conclusion across all scenarios is that without enough fresh real data in each generation of an autophagous loop, future generative models are doomed to have their quality (precision) or diversity (recall) progressively decrease. We term this condition Model Autophagy Disorder (MAD), making analogy to mad cow disease.
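A schematic simulation of the autophagous loop studied above: each generation retrains on samples from the previous generation's model, optionally mixed with a fraction of fresh real data. `train_fn` and `sample_fn` are user-supplied placeholders; only the loop structure itself is being illustrated, not the paper's experimental setup.

```python
import numpy as np

def autophagous_loop(real_data, train_fn, sample_fn, generations=10,
                     n_synthetic=10_000, fresh_fraction=0.0,
                     rng=np.random.default_rng(0)):
    """Train a chain of models where each generation consumes the previous
    generation's samples, optionally refreshed with real data."""
    data = real_data
    models = []
    for _ in range(generations):
        model = train_fn(data)                      # user-supplied training routine
        models.append(model)
        synthetic = sample_fn(model, n_synthetic)   # user-supplied sampler
        n_fresh = int(fresh_fraction * n_synthetic)
        if n_fresh > 0:
            idx = rng.choice(len(real_data), size=n_fresh, replace=False)
            data = np.concatenate([synthetic, real_data[idx]])
        else:
            data = synthetic                        # fully synthetic loop
    return models
```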

Crossway Diffusion: Improving Diffusion-based Visuomotor Policy via Self-supervised Learning

  • paper_url: http://arxiv.org/abs/2307.01849
  • repo_url: None
  • paper_authors: Xiang Li, Varun Belagali, Jinghuan Shang, Michael S. Ryoo
  • for: robot imitation learning
  • methods: diffusion models, self-supervised learning (SSL) objective
  • results: better representation for policy learning, especially when the demonstrations have different proficiencies.
    Abstract Sequence modeling approaches have shown promising results in robot imitation learning. Recently, diffusion models have been adopted for behavioral cloning, benefiting from their exceptional capabilities in modeling complex data distribution. In this work, we propose Crossway Diffusion, a method to enhance diffusion-based visuomotor policy learning by using an extra self-supervised learning (SSL) objective. The standard diffusion-based policy generates action sequences from random noise conditioned on visual observations and other low-dimensional states. We further extend this by introducing a new decoder that reconstructs raw image pixels (and other state information) from the intermediate representations of the reverse diffusion process, and train the model jointly using the SSL loss. Our experiments demonstrate the effectiveness of Crossway Diffusion in various simulated and real-world robot tasks, confirming its advantages over the standard diffusion-based policy. We demonstrate that such self-supervised reconstruction enables better representation for policy learning, especially when the demonstrations have different proficiencies.
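A hypothetical sketch of the joint objective: the usual action-denoising loss plus a self-supervised reconstruction of raw observations decoded from intermediate features of the reverse-diffusion network. The assumption that the policy network returns its intermediate representation alongside the noise prediction, and all names and shapes, are illustrative rather than the authors' implementation.

```python
import torch

def crossway_style_loss(diffusion_policy, decoder, obs_feat, actions, t, noise,
                        alphas_cumprod, raw_obs, ssl_weight=1.0):
    """Joint loss = action-denoising loss + SSL reconstruction of raw observations."""
    a = alphas_cumprod[t].view(-1, 1, 1)              # actions assumed (B, T, action_dim)
    noisy_actions = a.sqrt() * actions + (1 - a).sqrt() * noise
    pred_noise, intermediate = diffusion_policy(noisy_actions, t, obs_feat)
    diffusion_loss = ((pred_noise - noise) ** 2).mean()
    recon = decoder(intermediate)                      # reconstruct image pixels
    ssl_loss = ((recon - raw_obs) ** 2).mean()
    return diffusion_loss + ssl_weight * ssl_loss
```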

Empirical Sample Complexity of Neural Network Mixed State Reconstruction

  • paper_url: http://arxiv.org/abs/2307.01840
  • repo_url: None
  • paper_authors: Haimeng Zhao, Giuseppe Carleo, Filippo Vicentini
  • for: To study neural-network quantum state reconstruction techniques that reduce quantum shot complexity in practical applications.
  • methods: The authors numerically investigate different quantum state reconstruction techniques for mixed states on the finite-temperature Ising model, and apply variance reduction techniques to systematically reduce the quantum resource requirements.
  • results: Variance reduction lowers the quantum resource requirements of the algorithms, and the two leading neural quantum state encodings, the Neural Density Operator and the positive operator-valued measurement representation, perform differently depending on the mixedness of the target state, pointing to the need for more efficient encodings in terms of both classical and quantum resources.
    Abstract Quantum state reconstruction using Neural Quantum States has been proposed as a viable tool to reduce quantum shot complexity in practical applications, and its advantage over competing techniques has been shown in numerical experiments focusing mainly on the noiseless case. In this work, we numerically investigate the performance of different quantum state reconstruction techniques for mixed states: the finite-temperature Ising model. We show how to systematically reduce the quantum resource requirement of the algorithms by applying variance reduction techniques. Then, we compare the two leading neural quantum state encodings of the state, namely, the Neural Density Operator and the positive operator-valued measurement representation, and illustrate their different performance as the mixedness of the target state varies. We find that certain encodings are more efficient in different regimes of mixedness and point out the need for designing more efficient encodings in terms of both classical and quantum resources.

Collaborative Score Distillation for Consistent Visual Synthesis

  • paper_url: http://arxiv.org/abs/2307.04787
  • repo_url: https://github.com/subin-kim-cv/CSD
  • paper_authors: Subin Kim, Kyungmin Lee, June Suk Choi, Jongheon Jeong, Kihyuk Sohn, Jinwoo Shin
  • for: To broaden the applicability and improve the consistency of text-to-image diffusion models when editing complex visual modalities represented as multiple images.
  • methods: Collaborative Score Distillation (CSD), based on Stein Variational Gradient Descent (SVGD), treats multiple samples as "particles" and combines their score functions to distill generative priors over a set of images synchronously.
  • results: Across tasks such as editing panorama images, videos, and 3D scenes, CSD improves consistency across images, broadening the applicability of text-to-image diffusion models.
    Abstract Generative priors of large-scale text-to-image diffusion models enable a wide range of new generation and editing applications on diverse visual modalities. However, when adapting these priors to complex visual modalities, often represented as multiple images (e.g., video), achieving consistency across a set of images is challenging. In this paper, we address this challenge with a novel method, Collaborative Score Distillation (CSD). CSD is based on the Stein Variational Gradient Descent (SVGD). Specifically, we propose to consider multiple samples as "particles" in the SVGD update and combine their score functions to distill generative priors over a set of images synchronously. Thus, CSD facilitates seamless integration of information across 2D images, leading to a consistent visual synthesis across multiple samples. We show the effectiveness of CSD in a variety of tasks, encompassing the visual editing of panorama images, videos, and 3D scenes. Our results underline the competency of CSD as a versatile method for enhancing inter-sample consistency, thereby broadening the applicability of text-to-image diffusion models.
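For concreteness, one SVGD update over a set of particles, which is the mechanism CSD builds on: each particle is moved by a kernel-weighted combination of the particles' score functions plus a repulsive kernel-gradient term. Here `score_fn` stands in for the score supplied by a pretrained diffusion model; the step size, bandwidth, and that interface are assumptions.

```python
import numpy as np

def rbf_kernel_and_grad(X, h):
    """RBF kernel matrix K and, per particle i, the term sum_j grad_{x_j} k(x_j, x_i)."""
    diffs = X[:, None, :] - X[None, :, :]          # (n, n, d): x_i - x_j
    sq = (diffs ** 2).sum(-1)
    K = np.exp(-sq / h)
    # grad_{x_j} k(x_j, x_i) = (2/h) * K_ij * (x_i - x_j)
    grad_term = (2.0 / h) * (K[:, :, None] * diffs).sum(axis=1)
    return K, grad_term

def svgd_step(X, score_fn, step=1e-2, h=1.0):
    """One SVGD update treating the rows of X as particles; score_fn(X) returns
    grad_x log p(x) for each particle (e.g. from a pretrained diffusion model)."""
    n = X.shape[0]
    K, grad_term = rbf_kernel_and_grad(X, h)
    phi = (K @ score_fn(X) + grad_term) / n        # driving term + repulsive term
    return X + step * phi
```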

DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation

  • paper_url: http://arxiv.org/abs/2307.01831
  • repo_url: https://github.com/DiT-3D/DiT-3D
  • paper_authors: Shentong Mo, Enze Xie, Ruihang Chu, Lewei Yao, Lanqing Hong, Matthias Nießner, Zhenguo Li
  • for: This paper is written for generating high-quality 3D point clouds using a novel Diffusion Transformer architecture, specifically designed for 3D shape generation.
  • methods: The paper proposes a novel Diffusion Transformer architecture called DiT-3D, which adapts the design philosophy of DiT but incorporates 3D positional and patch embeddings to adaptively aggregate input from voxelized point clouds. The paper also introduces 3D window attention to reduce computational cost in 3D shape generation.
  • results: The proposed DiT-3D achieves state-of-the-art performance in high-fidelity and diverse 3D point cloud generation on the ShapeNet dataset, with a 4.59 decrease in 1-Nearest Neighbor Accuracy and a 3.51 increase in Coverage metric compared to the state-of-the-art method.
    Abstract Recent Diffusion Transformers (e.g., DiT) have demonstrated their powerful effectiveness in generating high-quality 2D images. However, it is still being determined whether the Transformer architecture performs equally well in 3D shape generation, as previous 3D diffusion methods mostly adopted the U-Net architecture. To bridge this gap, we propose a novel Diffusion Transformer for 3D shape generation, namely DiT-3D, which can directly operate the denoising process on voxelized point clouds using plain Transformers. Compared to existing U-Net approaches, our DiT-3D is more scalable in model size and produces much higher quality generations. Specifically, the DiT-3D adopts the design philosophy of DiT but modifies it by incorporating 3D positional and patch embeddings to adaptively aggregate input from voxelized point clouds. To reduce the computational cost of self-attention in 3D shape generation, we incorporate 3D window attention into Transformer blocks, as the increased 3D token length resulting from the additional dimension of voxels can lead to high computation. Finally, linear and devoxelization layers are used to predict the denoised point clouds. In addition, our transformer architecture supports efficient fine-tuning from 2D to 3D, where the pre-trained DiT-2D checkpoint on ImageNet can significantly improve DiT-3D on ShapeNet. Experimental results on the ShapeNet dataset demonstrate that the proposed DiT-3D achieves state-of-the-art performance in high-fidelity and diverse 3D point cloud generation. In particular, our DiT-3D decreases the 1-Nearest Neighbor Accuracy of the state-of-the-art method by 4.59 and increases the Coverage metric by 3.51 when evaluated on Chamfer Distance.
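A small PyTorch sketch of the voxel patchification idea: split a voxelized point cloud into non-overlapping 3D patches with a strided Conv3d and add a learned 3D positional embedding before feeding the tokens to a plain Transformer. The resolution, patch size, and embedding dimension are illustrative, not the paper's settings.

```python
import torch
import torch.nn as nn

class VoxelPatchEmbed3D(nn.Module):
    """Voxelized point cloud -> sequence of patch tokens with 3D positional embedding."""
    def __init__(self, voxel_res=32, patch=4, in_ch=1, dim=384):
        super().__init__()
        self.proj = nn.Conv3d(in_ch, dim, kernel_size=patch, stride=patch)
        n_tokens = (voxel_res // patch) ** 3
        self.pos_embed = nn.Parameter(torch.zeros(1, n_tokens, dim))

    def forward(self, vox):                 # vox: (B, in_ch, R, R, R)
        x = self.proj(vox)                  # (B, dim, R/p, R/p, R/p)
        x = x.flatten(2).transpose(1, 2)    # (B, n_tokens, dim)
        return x + self.pos_embed
```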

Deconstructing Data Reconstruction: Multiclass, Weight Decay and General Losses

  • paper_url: http://arxiv.org/abs/2307.01827
  • repo_url: None
  • paper_authors: Gon Buzaglo, Niv Haim, Gilad Yehudai, Gal Vardi, Yakir Oz, Yaniv Nikankin, Michal Irani
  • for: To investigate the memorization of training samples in neural networks and the factors that make networks susceptible to training data reconstruction.
  • methods: The authors extend training-sample reconstruction schemes to multiclass and convolutional neural networks and derive a more general reconstruction scheme applicable to a wider range of loss functions, such as regression losses.
  • results: Using weight decay during training increases reconstructability in both quantity and quality, and the ratio of the number of neurons to the number of training samples also influences reconstructability.
    Abstract Memorization of training data is an active research area, yet our understanding of the inner workings of neural networks is still in its infancy. Recently, Haim et al. (2022) proposed a scheme to reconstruct training samples from multilayer perceptron binary classifiers, effectively demonstrating that a large portion of training samples are encoded in the parameters of such networks. In this work, we extend their findings in several directions, including reconstruction from multiclass and convolutional neural networks. We derive a more general reconstruction scheme which is applicable to a wider range of loss functions such as regression losses. Moreover, we study the various factors that contribute to networks' susceptibility to such reconstruction schemes. Intriguingly, we observe that using weight decay during training increases reconstructability both in terms of quantity and quality. Additionally, we examine the influence of the number of neurons relative to the number of training samples on the reconstructability.
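As context for the extensions above, the stationarity condition exploited by Haim et al. (2022) for binary, homogeneous networks, as we recall it; candidate training samples and dual variables are optimized to satisfy it. The multiclass, weight-decay, and general-loss variants derived in this paper modify this condition and are not reproduced here.

```latex
% KKT stationarity of the max-margin problem, reached in direction by gradient
% flow on homogeneous networks \Phi(\theta; x), with dual variables \lambda_i \ge 0:
\theta \;=\; \sum_{i=1}^{n} \lambda_i \, y_i \, \nabla_{\theta} \Phi(\theta; x_i),
\qquad
% reconstruction: search for samples and multipliers satisfying the condition
\min_{\{x_i,\ \lambda_i \ge 0\}}\;
  \Big\lVert \theta - \sum_{i=1}^{n} \lambda_i \, y_i \, \nabla_{\theta}\Phi(\theta; x_i) \Big\rVert_2^2 .
```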

Structural Balance and Random Walks on Complex Networks with Complex Weights

  • paper_url: http://arxiv.org/abs/2307.01813
  • repo_url: None
  • paper_authors: Yu Tian, Renaud Lambiotte
  • for: This paper focuses on the study of complex-weighted networks, specifically investigating their structural and dynamical properties when the weight matrix is Hermitian.
  • methods: The authors use concepts from signed graphs to classify complex-weighted networks based on structural balance and explore the shared spectral properties within each type. They also apply the results to characterize the dynamics of random walks on these networks.
  • results: The paper shows that local consensus can be achieved asymptotically when the graph is structurally balanced, while global consensus will be obtained when it is strictly unbalanced. The authors also propose a spectral clustering algorithm and explore the performance of the algorithm on both synthetic and real networks.
    Abstract Complex numbers define the relationship between entities in many situations. A canonical example would be the off-diagonal terms in a Hamiltonian matrix in quantum physics. Recent years have seen an increasing interest to extend the tools of network science when the weight of edges are complex numbers. Here, we focus on the case when the weight matrix is Hermitian, a reasonable assumption in many applications, and investigate both structural and dynamical properties of the complex-weighted networks. Building on concepts from signed graphs, we introduce a classification of complex-weighted networks based on the notion of structural balance, and illustrate the shared spectral properties within each type. We then apply the results to characterise the dynamics of random walks on complex-weighted networks, where local consensus can be achieved asymptotically when the graph is structurally balanced, while global consensus will be obtained when it is strictly unbalanced. Finally, we explore potential applications of our findings by generalising the notion of cut, and propose an associated spectral clustering algorithm. We also provide further characteristics of the magnetic Laplacian, associating directed networks to complex-weighted ones. The performance of the algorithm is verified on both synthetic and real networks.
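A minimal NumPy/scikit-learn sketch of spectral clustering on a complex-weighted graph with a Hermitian weight matrix: the Laplacian is Hermitian, so its spectrum is real, and nodes are clustered on the real and imaginary parts of the leading eigenvectors. The degree definition and the use of k-means here are illustrative choices and not necessarily the paper's algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans

def hermitian_spectral_clustering(W, n_clusters=2, n_vecs=2):
    """Spectral clustering for a Hermitian complex weight matrix W (W = W.conj().T)."""
    assert np.allclose(W, W.conj().T)
    deg = np.abs(W).sum(axis=1)                 # degrees from the modulus of the weights
    L = np.diag(deg) - W                        # Hermitian Laplacian, real spectrum
    vals, vecs = np.linalg.eigh(L)              # eigh handles Hermitian matrices
    U = vecs[:, :n_vecs]                        # eigenvectors of the smallest eigenvalues
    feats = np.hstack([U.real, U.imag])         # embed nodes in a real feature space
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(feats)
```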

Capturing Local Temperature Evolution during Additive Manufacturing through Fourier Neural Operators

  • paper_url: http://arxiv.org/abs/2307.01804
  • repo_url: None
  • paper_authors: Jiangce Chen, Wenzhuo Xu, Martha Baldwin, Björn Nijhuis, Ton van den Boogaard, Noelia Grande Gutiérrez, Sneha Prabha Narra, Christopher McComb
  • for: The purpose of this research is to improve the performance of additive manufacturing technologies by quickly simulating thermal behavior.
  • methods: The paper uses a Fourier Neural Operator to capture the local temperature evolution during the manufacturing process.
  • results: The model shows high accuracy in numerical simulations and maintains generalizability to different geometries.
    Abstract High-fidelity, data-driven models that can quickly simulate thermal behavior during additive manufacturing (AM) are crucial for improving the performance of AM technologies in multiple areas, such as part design, process planning, monitoring, and control. However, the complexities of part geometries make it challenging for current models to maintain high accuracy across a wide range of geometries. Additionally, many models report a low mean square error (MSE) across the entire domain (part). However, in each time step, most areas of the domain do not experience significant changes in temperature, except for the heat-affected zones near recent depositions. Therefore, the MSE-based fidelity measurement of the models may be overestimated. This paper presents a data-driven model that uses Fourier Neural Operator to capture the local temperature evolution during the additive manufacturing process. In addition, the authors propose to evaluate the model using the $R^2$ metric, which provides a relative measure of the model's performance compared to using mean temperature as a prediction. The model was tested on numerical simulations based on the Discontinuous Galerkin Finite Element Method for the Direct Energy Deposition process, and the results demonstrate that the model achieves high fidelity as measured by $R^2$ and maintains generalizability to geometries that were not included in the training process.
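For readers unfamiliar with Fourier Neural Operators, a compact PyTorch sketch of the spectral convolution layer at their core (a standard formulation, simplified to keep only the lowest-frequency block); the paper's actual architecture, inputs, and training setup are not reproduced here.

```python
import torch
import torch.nn as nn

class SpectralConv2d(nn.Module):
    """FNO building block: FFT, multiply the retained low-frequency modes by
    learned complex weights, inverse FFT back to the spatial domain."""
    def __init__(self, in_ch, out_ch, modes1, modes2):
        super().__init__()
        scale = 1.0 / (in_ch * out_ch)
        self.modes1, self.modes2 = modes1, modes2
        self.weights = nn.Parameter(
            scale * torch.randn(in_ch, out_ch, modes1, modes2, dtype=torch.cfloat))

    def forward(self, x):                       # x: (B, in_ch, H, W)
        x_ft = torch.fft.rfft2(x)               # (B, in_ch, H, W//2 + 1), complex
        out_ft = torch.zeros(x.size(0), self.weights.size(1),
                             x_ft.size(2), x_ft.size(3),
                             dtype=torch.cfloat, device=x.device)
        out_ft[:, :, :self.modes1, :self.modes2] = torch.einsum(
            "bixy,ioxy->boxy",
            x_ft[:, :, :self.modes1, :self.modes2], self.weights)
        return torch.fft.irfft2(out_ft, s=x.shape[-2:])
```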

Edge-aware Multi-task Network for Integrating Quantification Segmentation and Uncertainty Prediction of Liver Tumor on Multi-modality Non-contrast MRI

  • paper_url: http://arxiv.org/abs/2307.01798
  • repo_url: None
  • paper_authors: Xiaojiao Xiao, Qinmin Hu, Guanghui Wang
  • for: Liver tumor diagnosis and analysis
  • methods: Multi-modality non-contrast magnetic resonance imaging (NCMRI) fusion, edge-aware feature aggregation module (EaFA), and multi-task learning
  • results: Outperformed state-of-the-art methods with a dice similarity coefficient of 90.01$\pm$1.23 and a mean absolute error of 2.72$\pm$0.58 mm for MD.
    Abstract Simultaneous multi-index quantification, segmentation, and uncertainty estimation of liver tumors on multi-modality non-contrast magnetic resonance imaging (NCMRI) are crucial for accurate diagnosis. However, existing methods lack an effective mechanism for multi-modality NCMRI fusion and accurate boundary information capture, making these tasks challenging. To address these issues, this paper proposes a unified framework, namely edge-aware multi-task network (EaMtNet), to associate multi-index quantification, segmentation, and uncertainty of liver tumors on the multi-modality NCMRI. The EaMtNet employs two parallel CNN encoders and the Sobel filters to extract local features and edge maps, respectively. The newly designed edge-aware feature aggregation module (EaFA) is used for feature fusion and selection, making the network edge-aware by capturing long-range dependency between feature and edge maps. Multi-tasking leverages prediction discrepancy to estimate uncertainty and improve segmentation and quantification performance. Extensive experiments are performed on multi-modality NCMRI with 250 clinical subjects. The proposed model outperforms the state-of-the-art by a large margin, achieving a dice similarity coefficient of 90.01$\pm$1.23 and a mean absolute error of 2.72$\pm$0.58 mm for MD. The results demonstrate the potential of EaMtNet as a reliable clinical-aided tool for medical image analysis.
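A small PyTorch sketch of the fixed Sobel filtering used to obtain edge maps for the edge-aware branch; single-channel 2D slices are assumed for illustration, and the surrounding feature-aggregation module is not reproduced.

```python
import torch
import torch.nn.functional as F

def sobel_edge_map(img):
    """Gradient-magnitude edge map from fixed Sobel filters; img: (B, 1, H, W)."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
    ky = kx.t()
    kernels = torch.stack([kx, ky]).unsqueeze(1).to(img)   # (2, 1, 3, 3)
    grads = F.conv2d(img, kernels, padding=1)               # (B, 2, H, W)
    return torch.sqrt((grads ** 2).sum(dim=1, keepdim=True) + 1e-8)
```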