cs.LG - 2023-10-16

Active Learning Framework for Cost-Effective TCR-Epitope Binding Affinity Prediction

  • paper_url: http://arxiv.org/abs/2310.10893
  • repo_url: https://github.com/lee-cbg/activetcr
  • paper_authors: Pengfei Zhang, Seojin Bang, Heewook Lee
  • for: Improve prediction of the binding affinity between TCRs and epitope sequences (i.e., TCR recognition of epitopes) while keeping annotation cost low.
  • methods: An active learning framework (ActiveTCR) that couples TCR-epitope binding affinity prediction models with query strategies: starting from a small labeled set, it iteratively selects unlabeled TCR-epitope pairs that are "worth" annotating and adds them to the training data.
  • results: Active learning reduces annotation cost by approximately 40% while maintaining predictive performance; furthermore, providing ground-truth labels of TCR-epitope pairs to the query strategies helps identify and remove more than 40% redundancy among already annotated pairs without compromising model performance or requiring additional training data.
    Abstract T cell receptors (TCRs) are critical components of adaptive immune systems, responsible for responding to threats by recognizing epitope sequences presented on host cell surface. Computational prediction of binding affinity between TCRs and epitope sequences using machine/deep learning has attracted intense attention recently. However, its success is hindered by the lack of large collections of annotated TCR-epitope pairs. Annotating their binding affinity requires expensive and time-consuming wet-lab evaluation. To reduce annotation cost, we present ActiveTCR, a framework that incorporates active learning and TCR-epitope binding affinity prediction models. Starting with a small set of labeled training pairs, ActiveTCR iteratively searches for unlabeled TCR-epitope pairs that are ''worth'' for annotation. It aims to maximize performance gains while minimizing the cost of annotation. We compared four query strategies with a random sampling baseline and demonstrated that ActiveTCR reduces annotation costs by approximately 40%. Furthermore, we showed that providing ground truth labels of TCR-epitope pairs to query strategies can help identify and reduce more than 40% redundancy among already annotated pairs without compromising model performance, enabling users to train equally powerful prediction models with less training data. Our work is the first systematic investigation of data optimization for TCR-epitope binding affinity prediction.
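The core of ActiveTCR is an iterative annotate-train-query loop. Below is a minimal, hedged sketch of such a loop with an entropy-based uncertainty query strategy; the classifier, featurization, and query strategy shown here are illustrative assumptions rather than the authors' released implementation (see github.com/lee-cbg/activetcr for that).

```python
# Minimal active-learning loop sketch: train, query uncertain pairs, "annotate", repeat.
import numpy as np
from sklearn.linear_model import LogisticRegression

def entropy_query(model, X_pool, batch_size):
    """Pick the pool points whose predicted binding probability is most uncertain."""
    p = model.predict_proba(X_pool)
    ent = -np.sum(p * np.log(p + 1e-12), axis=1)
    return np.argsort(-ent)[:batch_size]

def active_learning(X_lab, y_lab, X_pool, y_pool_oracle, rounds=10, batch_size=64):
    model = LogisticRegression(max_iter=1000)
    for _ in range(rounds):
        model.fit(X_lab, y_lab)
        idx = entropy_query(model, X_pool, batch_size)        # pairs "worth" annotating
        X_lab = np.vstack([X_lab, X_pool[idx]])               # simulate wet-lab annotation
        y_lab = np.concatenate([y_lab, y_pool_oracle[idx]])
        keep = np.setdiff1d(np.arange(len(X_pool)), idx)
        X_pool, y_pool_oracle = X_pool[keep], y_pool_oracle[keep]
    return model
```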

The Calysto Scheme Project

  • paper_url: http://arxiv.org/abs/2310.10886
  • repo_url: https://github.com/calysto/calysto_scheme
  • paper_authors: Douglas S. Blank, James B. Marshall
  • for: Introduce Calysto Scheme, a Scheme implementation built on top of Python, and describe the design choices that make it practical and easy to use.
  • methods: Calysto Scheme is written in Continuation-Passing Style and converted into Python through a series of correctness-preserving program transformations. It supports standard Scheme functionality, including call/cc, as well as syntactic extensions, a nondeterministic operator for automatic backtracking, and many extensions for Python interoperation.
  • results: Thanks to its simplicity and ease of installation, Calysto Scheme has proven useful both for teaching and for general use; it has been integrated into the Jupyter Notebook ecosystem and used in the classroom to teach introductory Programming Languages with some interesting and unique twists.
    Abstract Calysto Scheme is written in Scheme in Continuation-Passing Style, and converted through a series of correctness-preserving program transformations into Python. It has support for standard Scheme functionality, including call/cc, as well as syntactic extensions, a nondeterministic operator for automatic backtracking, and many extensions to allow Python interoperation. Because of its Python foundation, it can take advantage of modern Python libraries, including those for machine learning and other pedagogical contexts. Although Calysto Scheme was developed with educational purposes in mind, it has proven to be generally useful due to its simplicity and ease of installation. It has been integrated into the Jupyter Notebook ecosystem and used in the classroom to teach introductory Programming Languages with some interesting and unique twists.

BLoad: Enhancing Neural Network Training with Efficient Sequential Data Handling

  • paper_url: http://arxiv.org/abs/2310.10879
  • repo_url: None
  • paper_authors: Raphael Ruschel, A. S. M. Iftekhar, B. S. Manjunath, Suya You
  • for: Improve the training efficiency of modern deep learning models on growing datasets containing sequences of varying sizes.
  • methods: A novel training scheme for distributed data-parallel training that handles sequences of different lengths efficiently, with minimal overhead and without excessive padding.
  • results: The scheme reduces the amount of padding by more than 100x without deleting a single frame, improving performance in both training time and Recall in the experiments.
    Abstract The increasing complexity of modern deep neural network models and the expanding sizes of datasets necessitate the development of optimized and scalable training methods. In this white paper, we addressed the challenge of efficiently training neural network models using sequences of varying sizes. To address this challenge, we propose a novel training scheme that enables efficient distributed data-parallel training on sequences of different sizes with minimal overhead. By using this scheme we were able to reduce the padding amount by more than 100$x$ while not deleting a single frame, resulting in an overall increased performance on both training time and Recall in our experiments.
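The padding reduction described above comes from how variable-length sequences are grouped into training blocks. The sketch below shows a generic greedy first-fit packing of sequence lengths into fixed-size blocks; the packing heuristic and block size are assumptions, since the paper does not release code.

```python
# Greedy packing of variable-length sequences into fixed-size blocks to cut padding.
import numpy as np

def pack_sequences(lengths, block_size):
    """First-fit decreasing: group sequence indices so each block's total length <= block_size."""
    blocks, capacities = [], []
    for idx in np.argsort(lengths)[::-1]:            # longest first
        L = lengths[idx]
        for b, cap in enumerate(capacities):
            if cap + L <= block_size:
                blocks[b].append(idx)
                capacities[b] += L
                break
        else:
            blocks.append([idx])
            capacities.append(L)
    return blocks

lengths = np.random.randint(10, 300, size=1000)
blocks = pack_sequences(lengths, block_size=512)
packed_padding = sum(512 - lengths[np.array(b)].sum() for b in blocks)
naive_padding = sum(lengths.max() - l for l in lengths)   # pad everything to the longest
print(f"padding with packing: {packed_padding}, naive padding: {naive_padding}")
```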

Eco-Driving Control of Connected and Automated Vehicles using Neural Network based Rollout

  • paper_url: http://arxiv.org/abs/2310.10878
  • repo_url: None
  • paper_authors: Jacob Paugh, Zhaoxuan Zhu, Shobhit Gupta, Marcello Canova, Stephanie Stockar
  • for: Reduce the energy consumption of connected and automated vehicles by optimizing vehicle velocity and powertrain dynamics en route using Vehicle-to-Everything information.
  • methods: A hierarchical multi-horizon optimization framework in which a neural network learns a full-route value function and uses it to approximate the terminal cost in a receding-horizon optimization.
  • results: In simulations over real-world routes, the proposed approach performs comparably to a stochastic optimization solution obtained via reinforcement learning, while requiring no sophisticated training paradigm and negligible on-board memory.
    Abstract Connected and autonomous vehicles have the potential to minimize energy consumption by optimizing the vehicle velocity and powertrain dynamics with Vehicle-to-Everything info en route. Existing deterministic and stochastic methods created to solve the eco-driving problem generally suffer from high computational and memory requirements, which makes online implementation challenging. This work proposes a hierarchical multi-horizon optimization framework implemented via a neural network. The neural network learns a full-route value function to account for the variability in route information and is then used to approximate the terminal cost in a receding horizon optimization. Simulations over real-world routes demonstrate that the proposed approach achieves comparable performance to a stochastic optimization solution obtained via reinforcement learning, while requiring no sophisticated training paradigm and negligible on-board memory.

Religious Affiliation in the Twenty-First Century: A Machine Learning Perspective on the World Value Survey

  • paper_url: http://arxiv.org/abs/2310.10874
  • repo_url: None
  • paper_authors: Elaheh Jafarigol, William Keely, Tess Hartog, Tom Welborn, Peyman Hekmatpour, Theodore B. Trafalis
  • for: A quantitative analysis of the globally collected World Value Survey data to study the trajectories of change in individuals' religious beliefs, values, and behaviors in societies.
  • methods: A random forest is used to identify the key factors of religiosity and classify survey respondents as religious or non-religious using country-level data; resampling techniques balance the data and improve imbalanced-learning performance metrics.
  • results: Variable importance analysis shows that Age and Income are the most important variables in the majority of countries, a finding discussed in relation to fundamental sociological theories of religion and human behavior.
    Abstract This paper is a quantitative analysis of the data collected globally by the World Value Survey. The data is used to study the trajectories of change in individuals' religious beliefs, values, and behaviors in societies. Utilizing random forest, we aim to identify the key factors of religiosity and classify respondents of the survey as religious and non religious using country level data. We use resampling techniques to balance the data and improve imbalanced learning performance metrics. The results of the variable importance analysis suggest that Age and Income are the most important variables in the majority of countries. The results are discussed with fundamental sociological theories regarding religion and human behavior. This study is an application of machine learning in identifying the underlying patterns in the data of 30 countries participating in the World Value Survey. The results from variable importance analysis and classification of imbalanced data provide valuable insights beneficial to theoreticians and researchers of social sciences.
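A minimal sketch of the pipeline described above (resample to balance classes, fit a random forest, inspect variable importances). The file name, column names, and the specific resampler are hypothetical placeholders, not the paper's actual data handling.

```python
# Balance an imbalanced survey dataset, fit a random forest, and read off importances.
import pandas as pd
from imblearn.over_sampling import RandomOverSampler
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("wvs_country.csv")                    # hypothetical extract of WVS data
X, y = df.drop(columns=["religious"]), df["religious"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

X_bal, y_bal = RandomOverSampler(random_state=0).fit_resample(X_tr, y_tr)
clf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_bal, y_bal)

importances = pd.Series(clf.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances.head(10))                            # e.g., Age and Income rank highest
print("accuracy on held-out data:", clf.score(X_te, y_te))
```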

Joint Optimization of Traffic Signal Control and Vehicle Routing in Signalized Road Networks using Multi-Agent Deep Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.10856
  • repo_url: None
  • paper_authors: Xianyue Peng, Hang Gao, Gengyue Han, Hao Wang, Michael Zhang
  • for: Alleviate urban traffic congestion in modern road networks and improve traffic efficiency.
  • methods: A joint optimization approach that simultaneously controls traffic signal timings and vehicle route choices using Multi-Agent Deep Reinforcement Learning (MADRL), with signal control agents and vehicle routing agents that share observations and rewards.
  • results: Numerical experiments on the modified Sioux network show that jointly optimizing signal control and vehicle routing improves traffic efficiency more than controlling signal timings or vehicle routes alone.
    Abstract Urban traffic congestion is a critical predicament that plagues modern road networks. To alleviate this issue and enhance traffic efficiency, traffic signal control and vehicle routing have proven to be effective measures. In this paper, we propose a joint optimization approach for traffic signal control and vehicle routing in signalized road networks. The objective is to enhance network performance by simultaneously controlling signal timings and route choices using Multi-Agent Deep Reinforcement Learning (MADRL). Signal control agents (SAs) are employed to establish signal timings at intersections, whereas vehicle routing agents (RAs) are responsible for selecting vehicle routes. By establishing relevance between agents and enabling them to share observations and rewards, interaction and cooperation among agents are fostered, which enhances individual training. The Multi-Agent Advantage Actor-Critic algorithm is used to handle multi-agent environments, and Deep Neural Network (DNN) structures are designed to facilitate the algorithm's convergence. Notably, our work is the first to utilize MADRL in determining the optimal joint policy for signal control and vehicle routing. Numerical experiments conducted on the modified Sioux network demonstrate that our integration of signal control and vehicle routing outperforms controlling signal timings or vehicles' routes alone in enhancing traffic efficiency.

Probabilistic Classification by Density Estimation Using Gaussian Mixture Model and Masked Autoregressive Flow

  • paper_url: http://arxiv.org/abs/2310.10843
  • repo_url: https://github.com/bghojogh/density-based-classifiers
  • paper_authors: Benyamin Ghojogh, Milad Amir Toutounchian
  • for: Propose classifiers built on density estimation, which is usually used to estimate the distribution of data rather than for classification.
  • methods: Two density estimators are used to model the class-conditional likelihoods: the Gaussian Mixture Model (GMM), fitted by expectation maximization, and the Masked Autoregressive Flow (MAF), a generative model based on normalizing flows and autoregressive networks.
  • results: The proposed classifiers outperform simpler classifiers such as linear discriminant analysis, which models the likelihood with only a single Gaussian distribution; the work also opens the door to other probabilistic classifiers based on joint density estimation.
    Abstract Density estimation, which estimates the distribution of data, is an important category of probabilistic machine learning. A family of density estimators is mixture models, such as Gaussian Mixture Model (GMM) by expectation maximization. Another family of density estimators is the generative models which generate data from input latent variables. One of the generative models is the Masked Autoregressive Flow (MAF) which makes use of normalizing flows and autoregressive networks. In this paper, we use the density estimators for classification, although they are often used for estimating the distribution of data. We model the likelihood of classes of data by density estimation, specifically using GMM and MAF. The proposed classifiers outperform simpler classifiers such as linear discriminant analysis which model the likelihood using only a single Gaussian distribution. This work opens the research door for proposing other probabilistic classifiers based on joint density estimation.
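The GMM half of the approach can be sketched directly with scikit-learn: fit one class-conditional density per class and classify with Bayes' rule. The toy dataset and number of mixture components are assumptions; the paper additionally uses Masked Autoregressive Flows as the density estimator.

```python
# Classification by class-conditional density estimation with per-class GMMs.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

classes = np.unique(y_tr)
gmms = {c: GaussianMixture(n_components=2, random_state=0).fit(X_tr[y_tr == c]) for c in classes}
priors = {c: np.mean(y_tr == c) for c in classes}

# log p(y=c | x) is proportional to log p(x | y=c) + log p(y=c)
log_post = np.stack([gmms[c].score_samples(X_te) + np.log(priors[c]) for c in classes], axis=1)
y_pred = classes[np.argmax(log_post, axis=1)]
print("accuracy:", np.mean(y_pred == y_te))
```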

A Machine Learning-based Algorithm for Automated Detection of Frequency-based Events in Recorded Time Series of Sensor Data

  • paper_url: http://arxiv.org/abs/2310.10841
  • repo_url: None
  • paper_authors: Bahareh Medghalchi, Andreas Vogel
  • for: Propose a new automated event detection method for identifying frequency-based events in recorded time series of sensor data.
  • methods: The time series is mapped to a time-frequency representation (scalogram); scalograms are filtered to enhance relevant parts of the signal, and an object detection model is trained to detect the desired event objects in them, after which detections are mapped back to time intervals in the original series.
  • results: On unseen datasets the method achieves a precision of 0.97 in event detection and provides sharp time-interval boundaries that are hard to indicate accurately by human visual inspection, improving the accuracy and reliability of automated event detection.
    Abstract Automated event detection has emerged as one of the fundamental practices to monitor the behavior of technical systems by means of sensor data. In the automotive industry, these methods are in high demand for tracing events in time series data. For assessing the active vehicle safety systems, a diverse range of driving scenarios is conducted. These scenarios involve the recording of the vehicle's behavior using external sensors, enabling the evaluation of operational performance. In such setting, automated detection methods not only accelerate but also standardize and objectify the evaluation by avoiding subjective, human-based appraisals in the data inspection. This work proposes a novel event detection method that allows to identify frequency-based events in time series data. To this aim, the time series data is mapped to representations in the time-frequency domain, known as scalograms. After filtering scalograms to enhance relevant parts of the signal, an object detection model is trained to detect the desired event objects in the scalograms. For the analysis of unseen time series data, events can be detected in their scalograms with the trained object detection model and are thereafter mapped back to the time series data to mark the corresponding time interval. The algorithm, evaluated on unseen datasets, achieves a precision rate of 0.97 in event detection, providing sharp time interval boundaries whose accurate indication by human visual inspection is challenging. Incorporating this method into the vehicle development process enhances the accuracy and reliability of event detection, which holds major importance for rapid testing analysis.
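The first stage, mapping a signal to a scalogram, can be sketched with a continuous wavelet transform. The wavelet choice, scales, and the crude threshold-based localization below are assumptions standing in for the paper's trained object detector.

```python
# Turn a 1-D sensor signal into a scalogram and mark high-frequency-energy intervals.
import numpy as np
import pywt

fs = 1000.0                                    # sampling rate in Hz
t = np.arange(0, 5, 1 / fs)
signal = np.sin(2 * np.pi * 5 * t)
signal[2000:2500] += np.sin(2 * np.pi * 80 * t[2000:2500])    # injected 80 Hz event

scales = np.arange(1, 128)
coeffs, freqs = pywt.cwt(signal, scales, "morl", sampling_period=1 / fs)
scalogram = np.abs(coeffs)

# crude event localisation: time steps whose high-frequency energy exceeds a threshold
band = scalogram[freqs > 50].sum(axis=0)
mask = band > band.mean() + 3 * band.std()
event_times = t[mask]
if event_times.size:
    print(f"event detected between {event_times.min():.2f}s and {event_times.max():.2f}s")
```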

Approximating Two-Layer Feedforward Networks for Efficient Transformers

  • paper_url: http://arxiv.org/abs/2310.10837
  • repo_url: https://github.com/robertcsordas/moe
  • paper_authors: Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber
  • for: Reduce the compute and memory requirements of neural networks (NNs) without sacrificing performance.
  • methods: Sparse Mixtures of Experts (MoEs) are used to build resource-efficient large language models (LMs); a general framework unifies methods for approximating two-layer feedforward blocks, including product-key memories (PKMs), and yields improvements to both MoEs and PKMs. Evaluation is done under a parameter-equal rather than compute-equal condition.
  • results: On the WikiText-103 and enwiki8 datasets, the MoEs are competitive with the dense Transformer-XL at two different scales while being much more resource efficient, showing that MoEs are relevant not only to extremely large LMs but to resource-efficient LMs at any scale.
    Abstract How to reduce compute and memory requirements of neural networks (NNs) without sacrificing performance? Many recent works use sparse Mixtures of Experts (MoEs) to build resource-efficient large language models (LMs). Here we introduce several novel perspectives on MoEs, presenting a general framework that unifies various methods to approximate two-layer NNs (e.g., feedforward blocks of Transformers), including product-key memories (PKMs). Leveraging insights from this framework, we propose methods to improve both MoEs and PKMs. Unlike prior work that compares MoEs with dense baselines under the compute-equal condition, our evaluation condition is parameter-equal, which is crucial to properly evaluate LMs. We show that our MoEs are competitive with the dense Transformer-XL on both the WikiText-103 and enwiki8 datasets at two different scales, while being much more resource efficient. This demonstrates that MoEs are relevant not only to extremely large LMs but also to any-scale resource-efficient LMs. Our code is public.
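A sparse MoE layer that approximates a two-layer feedforward block can be sketched in a few lines of PyTorch. This is not the authors' released code (github.com/robertcsordas/moe); the sizes and top-k routing details are assumptions.

```python
# Sparse Mixture-of-Experts stand-in for a Transformer feedforward block.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    def __init__(self, d_model=256, d_ff=128, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.w1 = nn.Parameter(torch.randn(n_experts, d_model, d_ff) * d_model ** -0.5)
        self.w2 = nn.Parameter(torch.randn(n_experts, d_ff, d_model) * d_ff ** -0.5)

    def forward(self, x):                       # x: (tokens, d_model)
        scores = self.router(x)                 # (tokens, n_experts)
        topv, topi = scores.topk(self.k, dim=-1)
        gates = F.softmax(topv, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):              # route each token to its k chosen experts
            idx = topi[:, slot]
            h = torch.relu(torch.einsum("td,tdf->tf", x, self.w1[idx]))
            out = out + gates[:, slot, None] * torch.einsum("tf,tfd->td", h, self.w2[idx])
        return out

x = torch.randn(32, 256)
print(MoEFeedForward()(x).shape)                # torch.Size([32, 256])
```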

Gaussian processes based data augmentation and expected signature for time series classification

  • paper_url: http://arxiv.org/abs/2310.10836
  • repo_url: None
  • paper_authors: Marco Romito, Francesco Triggiano
  • for: Propose a feature extraction model for time series classification built upon the expected signature.
  • methods: The expected signature is computed through a Gaussian-process-based data augmentation, and the optimal feature extraction is learnt through the supervised task that uses the model.
  • results: The model learns an optimized feature extraction that provides a statistical description of the time series for downstream classification.
    Abstract The signature is a fundamental object that describes paths (that is, continuous functions from an interval to a Euclidean space). Likewise, the expected signature provides a statistical description of the law of stochastic processes. We propose a feature extraction model for time series built upon the expected signature. This is computed through a Gaussian processes based data augmentation. One of the main features is that an optimal feature extraction is learnt through the supervised task that uses the model.
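A rough sketch of the expected-signature feature: perturb the path with Gaussian-process noise and average a truncated signature over the samples. The kernel, noise scale, and depth-2 truncation are assumptions, and the level-2 terms use a simple left-point approximation of the iterated integrals.

```python
# Monte Carlo estimate of an expected (depth-2) signature under GP data augmentation.
import numpy as np

def signature_level2(path):
    """Approximate depth-2 signature of a d-dimensional path given as an (n, d) array."""
    inc = np.diff(path, axis=0)                                # increments
    s1 = inc.sum(axis=0)                                       # level 1
    cum = np.vstack([np.zeros(path.shape[1]), np.cumsum(inc, axis=0)[:-1]])
    s2 = cum.T @ inc                                           # level 2 (left-point rule)
    return np.concatenate([s1, s2.ravel()])

def expected_signature(path, n_samples=200, length_scale=0.1, noise=0.05, seed=0):
    rng = np.random.default_rng(seed)
    t = np.linspace(0, 1, len(path))
    K = noise**2 * np.exp(-0.5 * (t[:, None] - t[None, :])**2 / length_scale**2)
    K += 1e-10 * np.eye(len(t))                                # numerical jitter
    sigs = []
    for _ in range(n_samples):
        eps = rng.multivariate_normal(np.zeros(len(t)), K, size=path.shape[1]).T
        sigs.append(signature_level2(path + eps))
    return np.mean(sigs, axis=0)

path = np.column_stack([np.linspace(0, 1, 100), np.sin(np.linspace(0, 6, 100))])
print(expected_signature(path).shape)                          # (2 + 4,) = (6,)
```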

Accurate Data-Driven Surrogates of Dynamical Systems for Forward Propagation of Uncertainty

  • paper_url: http://arxiv.org/abs/2310.10831
  • repo_url: None
  • paper_authors: Saibal De, Reese E. Jones, Hemanth Kolla
  • for: 这篇研究的目的是为了提出一种新的非侵入式方法来建立模型uncertainty quantification中的代理模型。
  • methods: 这篇研究使用了Stochastic collocation(SC)方法,并与Data-driven sparse identification of nonlinear dynamics(SINDy)框架结合,以建立动态模型的代理模型。
  • results: 研究发现,使用SC-over-dynamics框架可以降低错误,包括系统轨道的描述和模型状态分布的描述。三个测试问题中的两个问题(一个ordinary differential equation和一个partial differential equation)的数据表明,这种方法可以提供更好的结果。
    Abstract Stochastic collocation (SC) is a well-known non-intrusive method of constructing surrogate models for uncertainty quantification. In dynamical systems, SC is especially suited for full-field uncertainty propagation that characterizes the distributions of the high-dimensional primary solution fields of a model with stochastic input parameters. However, due to the highly nonlinear nature of the parameter-to-solution map in even the simplest dynamical systems, the constructed SC surrogates are often inaccurate. This work presents an alternative approach, where we apply the SC approximation over the dynamics of the model, rather than the solution. By combining the data-driven sparse identification of nonlinear dynamics (SINDy) framework with SC, we construct dynamics surrogates and integrate them through time to construct the surrogate solutions. We demonstrate that the SC-over-dynamics framework leads to smaller errors, both in terms of the approximated system trajectories as well as the model state distributions, when compared against full-field SC applied to the solutions directly. We present numerical evidence of this improvement using three test problems: a chaotic ordinary differential equation, and two partial differential equations from solid mechanics.

Uncertainty-aware transfer across tasks using hybrid model-based successor feature reinforcement learning

  • paper_url: http://arxiv.org/abs/2310.10818
  • repo_url: None
  • paper_authors: Parvin Malekzadeh, Ming Hou, Konstantinos N. Plataniotis
  • for: Practical reinforcement learning (RL) for complex, large-scale decision-making problems, with a focus on improving sample efficiency.
  • methods: A hybrid approach that combines model-based (MB) methods with successor feature (SF) algorithms, together with a Kalman filter (KF)-based multiple-model adaptive estimation that approximates the uncertainty of each action's value for uncertainty-aware exploration.
  • results: The algorithm transfers knowledge across tasks with different transition dynamics, learns downstream tasks with significantly fewer samples than learning from scratch, and outperforms existing baselines.
    Abstract Sample efficiency is central to developing practical reinforcement learning (RL) for complex and large-scale decision-making problems. The ability to transfer and generalize knowledge gained from previous experiences to downstream tasks can significantly improve sample efficiency. Recent research indicates that successor feature (SF) RL algorithms enable knowledge generalization between tasks with different rewards but identical transition dynamics. It has recently been hypothesized that combining model-based (MB) methods with SF algorithms can alleviate the limitation of fixed transition dynamics. Furthermore, uncertainty-aware exploration is widely recognized as another appealing approach for improving sample efficiency. Putting together two ideas of hybrid model-based successor feature (MB-SF) and uncertainty leads to an approach to the problem of sample efficient uncertainty-aware knowledge transfer across tasks with different transition dynamics or/and reward functions. In this paper, the uncertainty of the value of each action is approximated by a Kalman filter (KF)-based multiple-model adaptive estimation. This KF-based framework treats the parameters of a model as random variables. To the best of our knowledge, this is the first attempt at formulating a hybrid MB-SF algorithm capable of generalizing knowledge across large or continuous state space tasks with various transition dynamics while requiring less computation at decision time than MB methods. The number of samples required to learn the tasks was compared to recent SF and MB baselines. The results show that our algorithm generalizes its knowledge across different transition dynamics, learns downstream tasks with significantly fewer samples than starting from scratch, and outperforms existing approaches.

Robust Multi-Agent Reinforcement Learning via Adversarial Regularization: Theoretical Foundation and Stable Algorithms

  • paper_url: http://arxiv.org/abs/2310.10810
  • repo_url: https://github.com/abukharin3/ernie
  • paper_authors: Alexander Bukharin, Yan Li, Yue Yu, Qingru Zhang, Zhehui Chen, Simiao Zuo, Chao Zhang, Songan Zhang, Tuo Zhao
  • for: Improve the robustness of multi-agent reinforcement learning (MARL) policies by controlling the Lipschitz constant of the policies and using adversarial regularization to promote continuity with respect to state observations and actions.
  • methods: The proposed framework, ERNIE, uses adversarial regularization to promote the Lipschitz continuity of policies, and reformulates the adversarial regularization as a Stackelberg game to reduce training instability.
  • results: Extensive experiments in traffic light control and particle environments show that ERNIE provides robustness against noisy observations, changing transition dynamics, and malicious actions of agents; an extension of ERNIE to mean-field MARL, formulated via distributionally robust optimization, outperforms its non-robust counterpart and is of independent interest.
    Abstract Multi-Agent Reinforcement Learning (MARL) has shown promising results across several domains. Despite this promise, MARL policies often lack robustness and are therefore sensitive to small changes in their environment. This presents a serious concern for the real world deployment of MARL algorithms, where the testing environment may slightly differ from the training environment. In this work we show that we can gain robustness by controlling a policy's Lipschitz constant, and under mild conditions, establish the existence of a Lipschitz and close-to-optimal policy. Based on these insights, we propose a new robust MARL framework, ERNIE, that promotes the Lipschitz continuity of the policies with respect to the state observations and actions by adversarial regularization. The ERNIE framework provides robustness against noisy observations, changing transition dynamics, and malicious actions of agents. However, ERNIE's adversarial regularization may introduce some training instability. To reduce this instability, we reformulate adversarial regularization as a Stackelberg game. We demonstrate the effectiveness of the proposed framework with extensive experiments in traffic light control and particle environments. In addition, we extend ERNIE to mean-field MARL with a formulation based on distributionally robust optimization that outperforms its non-robust counterpart and is of independent interest. Our code is available at https://github.com/abukharin3/ERNIE.
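The adversarial-regularization idea can be illustrated by penalizing how much a policy's action distribution changes under a small worst-case perturbation of the observation. This is not the released ERNIE code (github.com/abukharin3/ERNIE); the single FGSM-style inner step, epsilon, and KL penalty weight are assumptions.

```python
# Adversarial regularizer promoting Lipschitz continuity of a policy w.r.t. observations.
import torch
import torch.nn.functional as F

def adversarial_regularizer(policy, obs, epsilon=0.05, lam=1.0):
    """policy(obs) -> action logits; returns a penalty to add to the usual actor loss."""
    delta = (1e-3 * torch.randn_like(obs)).requires_grad_()   # small random start
    logits = policy(obs)
    pert_logits = policy(obs + delta)
    kl = F.kl_div(F.log_softmax(pert_logits, -1), F.softmax(logits, -1).detach(),
                  reduction="batchmean")
    grad, = torch.autograd.grad(kl, delta)
    delta_adv = epsilon * grad.sign()                          # one FGSM-style inner step
    pert_logits = policy(obs + delta_adv.detach())
    kl_adv = F.kl_div(F.log_softmax(pert_logits, -1), F.softmax(logits, -1).detach(),
                      reduction="batchmean")
    return lam * kl_adv

policy = torch.nn.Sequential(torch.nn.Linear(8, 64), torch.nn.Tanh(), torch.nn.Linear(64, 4))
obs = torch.randn(32, 8)
print(adversarial_regularizer(policy, obs))
```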

Regularization properties of adversarially-trained linear regression

  • paper_url: http://arxiv.org/abs/2310.10807
  • repo_url: https://github.com/antonior92/advtrain-linreg
  • paper_authors: Antônio H. Ribeiro, Dave Zachariah, Francis Bach, Thomas B. Schön
  • for: Study the effectiveness of adversarial training in defending against input perturbations in linear models, and its relationship to other regularization methods.
  • methods: Adversarial training is posed as a min-max problem that searches for the best solution when the training data are corrupted by worst-case attacks; the resulting solution is compared with other regularizers such as ridge regression and Lasso.
  • results: Adversarial training yields the minimum-norm interpolating solution in the overparameterized regime (when the maximum disturbance radius is below a threshold), can be equivalent to parameter-shrinking methods in the underparametrized region, and for $\ell_\infty$-adversarial training the choice of adversarial radius for optimal bounds does not depend on the additive noise variance; numerical examples confirm the theoretical findings.
    Abstract State-of-the-art machine learning models can be vulnerable to very small input perturbations that are adversarially constructed. Adversarial training is an effective approach to defend against it. Formulated as a min-max problem, it searches for the best solution when the training data were corrupted by the worst-case attacks. Linear models are among the simple models where vulnerabilities can be observed and are the focus of our study. In this case, adversarial training leads to a convex optimization problem which can be formulated as the minimization of a finite sum. We provide a comparative analysis between the solution of adversarial training in linear regression and other regularization methods. Our main findings are that: (A) Adversarial training yields the minimum-norm interpolating solution in the overparameterized regime (more parameters than data), as long as the maximum disturbance radius is smaller than a threshold. And, conversely, the minimum-norm interpolator is the solution to adversarial training with a given radius. (B) Adversarial training can be equivalent to parameter shrinking methods (ridge regression and Lasso). This happens in the underparametrized region, for an appropriate choice of adversarial radius and zero-mean symmetrically distributed covariates. (C) For $\ell_\infty$-adversarial training -- as in square-root Lasso -- the choice of adversarial radius for optimal bounds does not depend on the additive noise variance. We confirm our theoretical findings with numerical examples.
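For linear regression with $\ell_2$-bounded input perturbations, the worst-case squared loss has the closed form $(|y - x^\top w| + \varepsilon \|w\|_2)^2$, so the min-max problem reduces to a convex finite-sum minimization. The sketch below is illustrative (synthetic data, arbitrary $\varepsilon$); the authors' code is at github.com/antonior92/advtrain-linreg.

```python
# Adversarially-trained linear regression via the closed-form worst-case loss.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, d, eps = 50, 100, 0.1                          # overparameterized regime: d > n
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d) / np.sqrt(d)
y = X @ w_true + 0.01 * rng.standard_normal(n)

def adv_loss(w):
    return np.mean((np.abs(y - X @ w) + eps * np.linalg.norm(w)) ** 2)

w_adv = minimize(adv_loss, np.zeros(d), method="L-BFGS-B").x
w_min_norm = np.linalg.pinv(X) @ y                # minimum-norm interpolator
print("distance to min-norm interpolator:", np.linalg.norm(w_adv - w_min_norm))
```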

Neural Tangent Kernels Motivate Graph Neural Networks with Cross-Covariance Graphs

  • paper_url: http://arxiv.org/abs/2310.10791
  • repo_url: None
  • paper_authors: Shervin Khalafi, Saurabh Sihag, Alejandro Ribeiro
  • for: Analyze the learning and generalization behavior of over-parametrized neural networks, in particular graph neural networks (GNNs).
  • methods: Neural tangent kernels (NTKs) and the alignment between the NTK eigenvectors and the data are used to study convergence and generalization; in GNNs, optimizing alignment translates to optimizing the graph representation or the graph shift operator.
  • results: Theoretical guarantees establish the optimality of alignment for a two-layer GNN, characterized by the graph shift operator being a function of the cross-covariance between the input and output data; experiments on a multivariate time series prediction task show that GNNs using the cross-covariance as the graph shift operator outperform those operating only on the covariance matrix of the input data.
    Abstract Neural tangent kernels (NTKs) provide a theoretical regime to analyze the learning and generalization behavior of over-parametrized neural networks. For a supervised learning task, the association between the eigenvectors of the NTK kernel and given data (a concept referred to as alignment in this paper) can govern the rate of convergence of gradient descent, as well as generalization to unseen data. Building upon this concept, we investigate NTKs and alignment in the context of graph neural networks (GNNs), where our analysis reveals that optimizing alignment translates to optimizing the graph representation or the graph shift operator in a GNN. Our results further establish the theoretical guarantees on the optimality of the alignment for a two-layer GNN and these guarantees are characterized by the graph shift operator being a function of the cross-covariance between the input and the output data. The theoretical insights drawn from the analysis of NTKs are validated by our experiments focused on a multi-variate time series prediction task for a publicly available dataset. Specifically, they demonstrate that GNNs with cross-covariance as the graph shift operator indeed outperform those that operate on the covariance matrix from only the input data.
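The practical takeaway can be sketched as follows: build the graph shift operator from the cross-covariance between input and output signals rather than the input covariance, then use it in a polynomial graph filter (the building block of a GNN layer). The synthetic data, symmetrization, and normalization below are assumptions.

```python
# Cross-covariance graph shift operator and a simple polynomial graph filter.
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_nodes = 500, 20
X = rng.standard_normal((n_samples, n_nodes))                   # input graph signals
W = rng.standard_normal((n_nodes, n_nodes)) * 0.1
Y = X @ W + 0.05 * rng.standard_normal((n_samples, n_nodes))    # output graph signals

C_xy = (X.T @ Y) / n_samples                                    # input-output cross-covariance
S = 0.5 * (C_xy + C_xy.T)                                       # symmetrized graph shift operator
S = S / np.linalg.norm(S, 2)                                    # spectral normalization

def graph_filter(S, x, coeffs):
    """Polynomial graph filter sum_k h_k S^k x."""
    out, Sk = np.zeros_like(x), x
    for h in coeffs:
        out = out + h * Sk
        Sk = S @ Sk
    return out

x_test = rng.standard_normal(n_nodes)
print(graph_filter(S, x_test, coeffs=[0.5, 0.3, 0.2]).shape)
```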

Correcting model misspecification in physics-informed neural networks (PINNs)

  • paper_url: http://arxiv.org/abs/2310.10776
  • repo_url: None
  • paper_authors: Zongren Zou, Xuhui Meng, George Em Karniadakis
  • for: Correct misspecified physical models in physics-informed neural networks (PINNs) when discovering governing equations from sparse and/or noisy data.
  • methods: The assumed (possibly misspecified) physical model is encoded in a PINN, and additional deep neural networks (DNNs) model the discrepancy between the imperfect model and the observational data; Bayesian PINNs (B-PINNs) and/or ensemble PINNs are used to quantify uncertainties arising from noisy and/or gappy data.
  • results: The added DNNs correct the model misspecification, reducing the computational error and the discrepancy between the physical models and the observational data, which enables the use of PINNs in complex systems (e.g., non-Newtonian channel and cavity flows) where the physical processes are not exactly known.
    Abstract Data-driven discovery of governing equations in computational science has emerged as a new paradigm for obtaining accurate physical models and as a possible alternative to theoretical derivations. The recently developed physics-informed neural networks (PINNs) have also been employed to learn governing equations given data across diverse scientific disciplines. Despite the effectiveness of PINNs for discovering governing equations, the physical models encoded in PINNs may be misspecified in complex systems as some of the physical processes may not be fully understood, leading to the poor accuracy of PINN predictions. In this work, we present a general approach to correct the misspecified physical models in PINNs for discovering governing equations, given some sparse and/or noisy data. Specifically, we first encode the assumed physical models, which may be misspecified, then employ other deep neural networks (DNNs) to model the discrepancy between the imperfect models and the observational data. Due to the expressivity of DNNs, the proposed method is capable of reducing the computational errors caused by the model misspecification and thus enables the applications of PINNs in complex systems where the physical processes are not exactly known. Furthermore, we utilize the Bayesian PINNs (B-PINNs) and/or ensemble PINNs to quantify uncertainties arising from noisy and/or gappy data in the discovered governing equations. A series of numerical examples including non-Newtonian channel and cavity flows demonstrate that the added DNNs are capable of correcting the model misspecification in PINNs and thus reduce the discrepancy between the physical models and the observational data. We envision that the proposed approach will extend the applications of PINNs for discovering governing equations in problems where the physico-chemical or biological processes are not well understood.
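The correction idea can be illustrated on a toy ODE du/dt = f(u): encode an assumed (possibly wrong) model f_assumed and let a second network learn the discrepancy so that du/dt ≈ f_assumed(u) + delta(u). The architectures, toy ODE, data, and loss weights below are assumptions, not the paper's setup.

```python
# PINN with a discrepancy network correcting a misspecified ODE right-hand side.
import torch
import torch.nn as nn

u_net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))      # u(t)
delta_net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))  # model discrepancy

def f_assumed(u):
    return -u                                    # misspecified physics; data follow du/dt = -1.2 u

t_data = torch.linspace(0, 2, 20).reshape(-1, 1)
u_data = torch.exp(-1.2 * t_data)                # stand-in for (sparse) observations
t_col = torch.linspace(0, 2, 100).reshape(-1, 1).requires_grad_(True)

opt = torch.optim.Adam(list(u_net.parameters()) + list(delta_net.parameters()), lr=1e-3)
for step in range(2000):
    u_col = u_net(t_col)
    du_dt = torch.autograd.grad(u_col.sum(), t_col, create_graph=True)[0]
    residual = du_dt - f_assumed(u_col) - delta_net(u_col)   # corrected physics residual
    loss = ((u_net(t_data) - u_data) ** 2).mean() + (residual ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
print("final loss:", loss.item())
```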

Gotta be SAFE: A New Framework for Molecular Design

  • paper_url: http://arxiv.org/abs/2310.10773
  • repo_url: None
  • paper_authors: Emmanuel Noutahi, Cristian Gabellini, Michael Craig, Jonathan S. C Lim, Prudencio Tossou
  • for: AI-driven molecular design.
  • methods: Sequential Attachment-based Fragment Embedding (SAFE), a novel line notation for chemical structures that reimagines SMILES strings as an unordered sequence of interconnected fragment blocks while maintaining full compatibility with existing SMILES parsers.
  • results: An 87-million-parameter GPT2-like model trained on 1.1 billion SAFE representations exhibits versatile and robust optimization performance, opening new avenues for the rapid exploration of chemical space under various constraints in AI-driven molecular design.
    Abstract Traditional molecular string representations, such as SMILES, often pose challenges for AI-driven molecular design due to their non-sequential depiction of molecular substructures. To address this issue, we introduce Sequential Attachment-based Fragment Embedding (SAFE), a novel line notation for chemical structures. SAFE reimagines SMILES strings as an unordered sequence of interconnected fragment blocks while maintaining full compatibility with existing SMILES parsers. It streamlines complex generative tasks, including scaffold decoration, fragment linking, polymer generation, and scaffold hopping, while facilitating autoregressive generation for fragment-constrained design, thereby eliminating the need for intricate decoding or graph-based models. We demonstrate the effectiveness of SAFE by training an 87-million-parameter GPT2-like model on a dataset containing 1.1 billion SAFE representations. Through extensive experimentation, we show that our SAFE-GPT model exhibits versatile and robust optimization performance. SAFE opens up new avenues for the rapid exploration of chemical space under various constraints, promising breakthroughs in AI-driven molecular design.

Unsupervised Lead Sheet Generation via Semantic Compression

  • paper_url: http://arxiv.org/abs/2310.10772
  • repo_url: https://github.com/zacharynovack/lead-ae
  • paper_authors: Zachary Novack, Nikita Srivatsan, Taylor Berg-Kirkpatrick, Julian McAuley
  • for: Improve the quality of lead sheets used in generative music research by generating a lead sheet conditioned on its full-score version (conditional lead sheet generation).
  • methods: The task is formulated as unsupervised music compression, where the lead sheet is a compressed latent version of the score; the proposed Lead-AE model treats the lead sheet as a discrete subselection of the original sequence and uses a differentiable top-k operator to enforce controllable local sparsity constraints.
  • results: Across automatic proxy tasks and direct human evaluations, the method improves upon the established deterministic baseline (the skyline algorithm) and produces coherent reductions of large multitrack scores.
    Abstract Lead sheets have become commonplace in generative music research, being used as an initial compressed representation for downstream tasks like multitrack music generation and automatic arrangement. Despite this, researchers have often fallen back on deterministic reduction methods (such as the skyline algorithm) to generate lead sheets when seeking paired lead sheets and full scores, with little attention being paid toward the quality of the lead sheets themselves and how they accurately reflect their orchestrated counterparts. To address these issues, we propose the problem of conditional lead sheet generation (i.e. generating a lead sheet given its full score version), and show that this task can be formulated as an unsupervised music compression task, where the lead sheet represents a compressed latent version of the score. We introduce a novel model, called Lead-AE, that models the lead sheets as a discrete subselection of the original sequence, using a differentiable top-k operator to allow for controllable local sparsity constraints. Across both automatic proxy tasks and direct human evaluations, we find that our method improves upon the established deterministic baseline and produces coherent reductions of large multitrack scores.

Wide Neural Networks as Gaussian Processes: Lessons from Deep Equilibrium Models

  • paper_url: http://arxiv.org/abs/2310.10767
  • repo_url: None
  • paper_authors: Tianxiang Gao, Xiaokai Huo, Hailiang Liu, Hongyang Gao
  • for: Study the infinite-width behavior of the deep equilibrium model (DEQ), an infinite-depth neural network with weight matrices shared across layers.
  • methods: Among infinite-depth architectures such as neural ordinary differential equations (ODEs) and deep equilibrium models (DEQs), the paper analyzes the DEQ in the wide-layer limit through the Neural Network and Gaussian Process (NNGP) correspondence.
  • results: As the width of the DEQ layers approaches infinity, the model converges to a Gaussian process, and this convergence holds even when the limits of depth and width are interchanged, unlike typical infinite-depth MLP networks; moreover, the associated Gaussian vector remains non-degenerate for any pairwise distinct input data, ensuring a strictly positive smallest eigenvalue of the corresponding NNGP kernel matrix. These results lay the groundwork for studying the training and generalization of DEQs.
    Abstract Neural networks with wide layers have attracted significant attention due to their equivalence to Gaussian processes, enabling perfect fitting of training data while maintaining generalization performance, known as benign overfitting. However, existing results mainly focus on shallow or finite-depth networks, necessitating a comprehensive analysis of wide neural networks with infinite-depth layers, such as neural ordinary differential equations (ODEs) and deep equilibrium models (DEQs). In this paper, we specifically investigate the deep equilibrium model (DEQ), an infinite-depth neural network with shared weight matrices across layers. Our analysis reveals that as the width of DEQ layers approaches infinity, it converges to a Gaussian process, establishing what is known as the Neural Network and Gaussian Process (NNGP) correspondence. Remarkably, this convergence holds even when the limits of depth and width are interchanged, which is not observed in typical infinite-depth Multilayer Perceptron (MLP) networks. Furthermore, we demonstrate that the associated Gaussian vector remains non-degenerate for any pairwise distinct input data, ensuring a strictly positive smallest eigenvalue of the corresponding kernel matrix using the NNGP kernel. These findings serve as fundamental elements for studying the training and generalization of DEQs, laying the groundwork for future research in this area.

Exploring hyperelastic material model discovery for human brain cortex: multivariate analysis vs. artificial neural network approaches

  • paper_url: http://arxiv.org/abs/2310.10762
  • repo_url: None
  • paper_authors: Jixin Hou, Nicholas Filla, Xianyan Chen, Mir Jalil Razavi, Tianming Liu, Xianqiao Wang
  • for: Identify the most favorable constitutive material model for human brain cortex tissue.
  • methods: Artificial neural networks and multiple regression methods are applied to a generalization of widely accepted classic hyperelastic models to automatically discover suitable constitutive material models, with consistent setups across both approaches apart from the strategy used to prevent overfitting.
  • results: Artificial neural networks can automatically identify accurate constitutive models, but the five-term and two-term neural network models trained under single-mode and multi-mode loading scenarios were found to be suboptimal and could be further simplified into two-term and single-term models, respectively, with higher accuracy using multiple regression. These findings highlight the importance of hyperparameters for artificial neural networks and emphasize the need for detailed cross-validation of regularization parameters to ensure optimal selection at a global level.
    Abstract Traditional computational methods, such as the finite element analysis, have provided valuable insights into uncovering the underlying mechanisms of brain physical behaviors. However, precise predictions of brain physics require effective constitutive models to represent the intricate mechanical properties of brain tissue. In this study, we aimed to identify the most favorable constitutive material model for human brain tissue. To achieve this, we applied artificial neural network and multiple regression methods to a generalization of widely accepted classic models, and compared the results obtained from these two approaches. To evaluate the applicability and efficacy of the model, all setups were kept consistent across both methods, except for the approach to prevent potential overfitting. Our results demonstrate that artificial neural networks are capable of automatically identifying accurate constitutive models from given admissible estimators. Nonetheless, the five-term and two-term neural network models trained under single-mode and multi-mode loading scenarios, were found to be suboptimal and could be further simplified into two-term and single-term, respectively, with higher accuracy using multiple regression. Our findings highlight the importance of hyperparameters for the artificial neural network and emphasize the necessity for detailed cross-validations of regularization parameters to ensure optimal selection at a global level in the development of material constitutive models. This study validates the applicability and accuracy of artificial neural network to automatically discover constitutive material models with proper regularization as well as the benefits in model simplification without compromising accuracy for traditional multivariable regression.

Statistical Barriers to Affine-equivariant Estimation

  • paper_url: http://arxiv.org/abs/2310.10758
  • repo_url: None
  • paper_authors: Zihao Chen, Yeshwanth Cherapanamjeri
  • for: Robust mean estimation in high-dimensional datasets with affine-invariant properties.
  • methods: Affine-equivariant estimators, lower bounds, and a new estimator based on a high-dimensional median.
  • results: Strict degradation in recovery error with quantitative rates degrading by a factor of $\sqrt{d}$ under two outlier models, and a new affine-equivariant estimator that nearly matches the lower bound.
    Abstract We investigate the quantitative performance of affine-equivariant estimators for robust mean estimation. As a natural stability requirement, the construction of such affine-equivariant estimators has been extensively studied in the statistics literature. We quantitatively evaluate these estimators under two outlier models which have been the subject of much recent work: the heavy-tailed and adversarial corruption settings. We establish lower bounds which show that affine-equivariance induces a strict degradation in recovery error with quantitative rates degrading by a factor of $\sqrt{d}$ in both settings. We find that classical estimators such as the Tukey median (Tukey '75) and Stahel-Donoho estimator (Stahel '81 and Donoho '82) are either quantitatively sub-optimal even within the class of affine-equivariant estimators or lack any quantitative guarantees. On the other hand, recent estimators with strong quantitative guarantees are not affine-equivariant or require additional distributional assumptions to achieve it. We remedy this by constructing a new affine-equivariant estimator which nearly matches our lower bound. Our estimator is based on a novel notion of a high-dimensional median which may be of independent interest. Notably, our results are applicable more broadly to any estimator whose performance is evaluated in the Mahalanobis norm which, for affine-equivariant estimators, corresponds to an evaluation in Euclidean norm on isotropic distributions.

Mori-Zwanzig latent space Koopman closure for nonlinear autoencoder

  • paper_url: http://arxiv.org/abs/2310.10745
  • repo_url: None
  • paper_authors: Priyam Gupta, Peter J. Schmid, Denis Sipp, Taraneh Sayadi, Georgios Rigas
  • for: Improve the accuracy and stability of data-driven Koopman operator approximations for understanding and predicting the dynamics of complex nonlinear systems.
  • methods: A Mori-Zwanzig autoencoder (MZ-AE) uses a nonlinear autoencoder to extract key observables for approximating a finite invariant Koopman subspace, and integrates a non-Markovian correction mechanism via the Mori-Zwanzig formalism, yielding a closed representation of the dynamics within the latent manifold.
  • results: MZ-AE captures regime transitions in the flow around a circular cylinder and provides a low-dimensional approximation of the chaotic Kuramoto-Sivashinsky system with promising short-term predictability and robust long-term statistical performance.
    Abstract The Koopman operator presents an attractive approach to achieve global linearization of nonlinear systems, making it a valuable method for simplifying the understanding of complex dynamics. While data-driven methodologies have exhibited promise in approximating finite Koopman operators, they grapple with various challenges, such as the judicious selection of observables, dimensionality reduction, and the ability to predict complex system behaviours accurately. This study presents a novel approach termed Mori-Zwanzig autoencoder (MZ-AE) to robustly approximate the Koopman operator in low-dimensional spaces. The proposed method leverages a nonlinear autoencoder to extract key observables for approximating a finite invariant Koopman subspace and integrates a non-Markovian correction mechanism using the Mori-Zwanzig formalism. Consequently, this approach yields a closed representation of dynamics within the latent manifold of the nonlinear autoencoder, thereby enhancing the precision and stability of the Koopman operator approximation. Demonstrations showcase the technique's ability to capture regime transitions in the flow around a circular cylinder. It also provided a low dimensional approximation for chaotic Kuramoto-Sivashinsky with promising short-term predictability and robust long-term statistical performance. By bridging the gap between data-driven techniques and the mathematical foundations of Koopman theory, MZ-AE offers a promising avenue for improved understanding and prediction of complex nonlinear dynamics.

Fast Adversarial Label-Flipping Attack on Tabular Data

  • paper_url: http://arxiv.org/abs/2310.10744
  • repo_url: None
  • paper_authors: Xinglong Chang, Gillian Dobbie, Jörg Wicker
  • for: Highlight the threat that adversarial label-flipping attacks pose to machine learning models used in fields requiring high reliability, such as cybersecurity, especially on tabular data where malicious flips can easily slip under the radar.
  • methods: A new fast attack, the Fast Adversarial Label-Flipping Attack (FALFA), crafts adversarial labels by transforming the adversary's objective and employs linear programming to reduce computational complexity.
  • results: On ten real-world tabular datasets, FALFA demonstrates superior attack potential, highlighting the need for robust defenses against such threats.
    Abstract Machine learning models are increasingly used in fields that require high reliability such as cybersecurity. However, these models remain vulnerable to various attacks, among which the adversarial label-flipping attack poses significant threats. In label-flipping attacks, the adversary maliciously flips a portion of training labels to compromise the machine learning model. This paper raises significant concerns as these attacks can camouflage a highly skewed dataset as an easily solvable classification problem, often misleading machine learning practitioners into lower defenses and miscalculations of potential risks. This concern amplifies in tabular data settings, where identifying true labels requires expertise, allowing malicious label-flipping attacks to easily slip under the radar. To demonstrate this risk is inherited in the adversary's objective, we propose FALFA (Fast Adversarial Label-Flipping Attack), a novel efficient attack for crafting adversarial labels. FALFA is based on transforming the adversary's objective and employs linear programming to reduce computational complexity. Using ten real-world tabular datasets, we demonstrate FALFA's superior attack potential, highlighting the need for robust defenses against such threats.
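A label-flipping attack posed as a linear program can be sketched as below, in the spirit of (but not identical to) FALFA: relax the binary flip indicators, estimate a per-sample gain for flipping, and maximize total gain under a flip budget. The surrogate gain (distance to the victim's decision boundary) and the 10% budget are assumptions.

```python
# LP-relaxed label-flipping attack sketch against a logistic-regression victim.
import numpy as np
from scipy.optimize import linprog
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# heuristic gain: samples the victim classifies most confidently
gain = np.abs(clf.decision_function(X))
budget = int(0.10 * len(y))

# maximize gain^T f  s.t.  sum(f) <= budget, 0 <= f <= 1   (LP relaxation of binary flips)
res = linprog(c=-gain, A_ub=np.ones((1, len(y))), b_ub=[budget], bounds=[(0, 1)] * len(y))
flip = res.x > 0.5
y_poisoned = np.where(flip, 1 - y, y)

poisoned_clf = LogisticRegression(max_iter=1000).fit(X, y_poisoned)
print("clean-data accuracy after poisoning:", poisoned_clf.score(X, y))
```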

MOFDiff: Coarse-grained Diffusion for Metal-Organic Framework Design

  • paper_url: http://arxiv.org/abs/2310.10732
  • repo_url: None
  • paper_authors: Xiang Fu, Tian Xie, Andrew S. Rosen, Tommi Jaakkola, Jake Smith
  • for: Develop a diffusion-model-based generator of metal-organic framework (MOF) structures to design high-performing MOF materials for carbon capture applications.
  • methods: A coarse-grained (CG) diffusion model generates CG MOF structures through a denoising diffusion process over the coordinates and identities of the building blocks, using equivariant graph neural networks to respect permutational and roto-translational symmetries; the all-atom MOF structure is then determined through a novel assembly algorithm.
  • results: Molecular simulations show that the model generates valid and novel MOF structures and is effective at designing outstanding MOF materials for carbon capture.
    Abstract Metal-organic frameworks (MOFs) are of immense interest in applications such as gas storage and carbon capture due to their exceptional porosity and tunable chemistry. Their modular nature has enabled the use of template-based methods to generate hypothetical MOFs by combining molecular building blocks in accordance with known network topologies. However, the ability of these methods to identify top-performing MOFs is often hindered by the limited diversity of the resulting chemical space. In this work, we propose MOFDiff: a coarse-grained (CG) diffusion model that generates CG MOF structures through a denoising diffusion process over the coordinates and identities of the building blocks. The all-atom MOF structure is then determined through a novel assembly algorithm. Equivariant graph neural networks are used for the diffusion model to respect the permutational and roto-translational symmetries. We comprehensively evaluate our model's capability to generate valid and novel MOF structures and its effectiveness in designing outstanding MOF materials for carbon capture applications with molecular simulations.

A representation learning approach to probe for dynamical dark energy in matter power spectra

  • paper_url: http://arxiv.org/abs/2310.10717
  • repo_url: None
  • paper_authors: Davide Piras, Lucas Lombriser
  • for: Search for a compressed representation of dynamical dark energy (DE) models in observational studies of the cosmic large-scale structure.
  • methods: A variational autoencoder architecture, DE-VAE, is trained on matter power spectra boosts generated at different redshift values and wavenumbers; neural networks compress the boosts to a low-dimensional latent representation, concatenate it with standard cold dark matter (CDM) parameters, and map it back to reconstructed boosts.
  • results: A single latent parameter is sufficient to predict 95% (99%) of DE power spectra within $1\sigma$ ($2\sigma$) of a Gaussian error, and this latent parameter and the two DE parameters can be linked together with an explicit equation through symbolic regression.
    Abstract We present DE-VAE, a variational autoencoder (VAE) architecture to search for a compressed representation of dynamical dark energy (DE) models in observational studies of the cosmic large-scale structure. DE-VAE is trained on matter power spectra boosts generated at wavenumbers $k\in(0.01-2.5) \ h/\rm{Mpc}$ and at four redshift values $z\in(0.1,0.48,0.78,1.5)$ for the most typical dynamical DE parametrization with two extra parameters describing an evolving DE equation of state. The boosts are compressed to a lower-dimensional representation, which is concatenated with standard cold dark matter (CDM) parameters and then mapped back to reconstructed boosts; both the compression and the reconstruction components are parametrized as neural networks. Remarkably, we find that a single latent parameter is sufficient to predict 95% (99%) of DE power spectra generated over a broad range of cosmological parameters within $1\sigma$ ($2\sigma$) of a Gaussian error which includes cosmic variance, shot noise and systematic effects for a Stage IV-like survey. This single parameter shows a high mutual information with the two DE parameters, and these three variables can be linked together with an explicit equation through symbolic regression. Considering a model with two latent variables only marginally improves the accuracy of the predictions, and adding a third latent variable has no significant impact on the model's performance. We discuss how the DE-VAE architecture can be extended from a proof of concept to a general framework to be employed in the search for a common lower-dimensional parametrization of a wide range of beyond-$\Lambda$CDM models and for different cosmological datasets. Such a framework could then both inform the development of cosmological surveys by targeting optimal probes, and provide theoretical insight into the common phenomenological aspects of beyond-$\Lambda$CDM models.

A Computational Framework for Solving Wasserstein Lagrangian Flows

  • paper_url: http://arxiv.org/abs/2310.10649
  • repo_url: https://github.com/necludov/wl-mechanics
  • paper_authors: Kirill Neklyudov, Rob Brekelmans, Alexander Tong, Lazar Atanackovic, Qiang Liu, Alireza Makhzani
  • for: extending the dynamical formulation of optimal transport through different choices of the underlying geometry (kinetic energy) and the regularization of density paths (potential energy), covering variants such as the Schrödinger bridge, unbalanced optimal transport, and optimal transport with physical constraints, with single-cell dynamics as the motivating application.
  • methods: a deep learning framework, built on the dual formulation of the Lagrangians, that treats all of these problems from a unified perspective without simulating or backpropagating through the learned dynamics and without access to optimal couplings.
  • results: demonstrates the flexibility and efficiency of the framework on single-cell trajectory inference, where incorporating prior knowledge into the dynamics is crucial, outperforming previous approaches.
    Abstract The dynamical formulation of the optimal transport can be extended through various choices of the underlying geometry ($\textit{kinetic energy}$), and the regularization of density paths ($\textit{potential energy}$). These combinations yield different variational problems ($\textit{Lagrangians}$), encompassing many variations of the optimal transport problem such as the Schr\"odinger bridge, unbalanced optimal transport, and optimal transport with physical constraints, among others. In general, the optimal density path is unknown, and solving these variational problems can be computationally challenging. Leveraging the dual formulation of the Lagrangians, we propose a novel deep learning based framework approaching all of these problems from a unified perspective. Our method does not require simulating or backpropagating through the trajectories of the learned dynamics, and does not need access to optimal couplings. We showcase the versatility of the proposed framework by outperforming previous approaches for the single-cell trajectory inference, where incorporating prior knowledge into the dynamics is crucial for correct predictions.

Efficacy of Dual-Encoders for Extreme Multi-Label Classification

  • paper_url: http://arxiv.org/abs/2310.10636
  • repo_url: None
  • paper_authors: Nilesh Gupta, Devvrit Khatri, Ankit S Rawat, Srinadh Bhojanapalli, Prateek Jain, Inderjit S Dhillon
  • for: extreme multi-label classification (XMC) with dual-encoder models.
  • methods: trains standard dual-encoder models properly and proposes a differentiable top-k error-based loss function to directly optimize Recall@k metrics.
  • results: correctly trained dual-encoders match or outperform state-of-the-art extreme classification methods by up to 2% at Precision@1, even on the largest XMC datasets, while using roughly 20x fewer trainable parameters.
    Abstract Dual-encoder models have demonstrated significant success in dense retrieval tasks for open-domain question answering that mostly involves zero-shot and few-shot scenarios. However, their performance in many-shot retrieval problems where training data is abundant, such as extreme multi-label classification (XMC), remains under-explored. Existing empirical evidence suggests that, for such problems, the dual-encoder method's accuracies lag behind the performance of state-of-the-art (SOTA) extreme classification methods that grow the number of learnable parameters linearly with the number of classes. As a result, some recent extreme classification techniques use a combination of dual-encoders and a learnable classification head for each class to excel on these tasks. In this paper, we investigate the potential of "pure" DE models in XMC tasks. Our findings reveal that when trained correctly standard dual-encoders can match or outperform SOTA extreme classification methods by up to 2% at Precision@1 even on the largest XMC datasets while being 20x smaller in terms of the number of trainable parameters. We further propose a differentiable topk error-based loss function, which can be used to specifically optimize for Recall@k metrics. We include our PyTorch implementation along with other resources for reproducing the results in the supplementary material.
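The abstract does not spell out the proposed differentiable top-k error-based loss, so the sketch below shows one plausible smooth Recall@k surrogate (an assumption, not the authors' exact formulation): each relevant label pays a sigmoid penalty for scoring below the current k-th largest score.

```python
import torch

def soft_recall_at_k_loss(scores, relevance, k=10, tau=1.0):
    """Smooth surrogate for (1 - Recall@k), usable as a training loss.

    scores:    (batch, n_labels) dual-encoder similarity scores.
    relevance: (batch, n_labels) binary ground-truth label matrix.
    tau controls the sharpness of the relaxation.
    """
    topk_vals, _ = scores.topk(k, dim=-1)
    kth = topk_vals[:, -1:]                             # k-th largest score per row
    below = torch.sigmoid((kth - scores) / tau)         # ~1 when a label falls outside the top-k
    n_rel = relevance.sum(dim=-1).clamp(min=1.0)
    missed = (below * relevance).sum(dim=-1) / n_rel    # soft fraction of relevant labels missed
    return missed.mean()
```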

Certainty In, Certainty Out: REVQCs for Quantum Machine Learning

  • paper_url: http://arxiv.org/abs/2310.10629
  • repo_url: None
  • paper_authors: Hannah Helgesen, Michael Felsberg, Jan-Åke Larsson
  • for: arguing for high single-sample inference accuracy as the primary goal in quantum machine learning, and proposing a reversed training method to reach it.
  • methods: statistical theory that enables accurate and precise sample inference, combined with a reversed training procedure for variational quantum circuits (VQCs).
  • results: evaluating several effective VQCs trained in the standard and reversed directions on random binary subsets of MNIST and MNIST Fashion, the method yields a 10-15% increase in single-sample inference accuracy.
    Abstract The field of Quantum Machine Learning (QML) has emerged recently in the hopes of finding new machine learning protocols or exponential speedups for classical ones. Apart from problems with vanishing gradients and efficient encoding methods, these speedups are hard to find because the sampling nature of quantum computers promotes either simulating computations classically or running them many times on quantum computers in order to use approximate expectation values in gradient calculations. In this paper, we make a case for setting high single-sample accuracy as a primary goal. We discuss the statistical theory which enables highly accurate and precise sample inference, and propose a method of reversed training towards this end. We show the effectiveness of this training method by assessing several effective variational quantum circuits (VQCs), trained in both the standard and reversed directions, on random binary subsets of the MNIST and MNIST Fashion datasets, on which our method provides an increase of $10-15\%$ in single-sample inference accuracy.

How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations

  • paper_url: http://arxiv.org/abs/2310.10616
  • repo_url: None
  • paper_authors: Tianyu Guo, Wei Hu, Song Mei, Huan Wang, Caiming Xiong, Silvio Savarese, Yu Bai
  • for: understanding the in-context learning (ICL) ability of transformer-based large language models beyond simple function classes, through a case study of learning with representations.
  • methods: constructs synthetic ICL problems with a compositional structure, in which the label depends on the input through a fixed (possibly complex) representation function composed with an instance-specific linear function, and proves that transformers of mild depth and size exist that approximately implement the optimal algorithm: transform the inputs with the representation function, then perform linear ICL on top.
  • results: trained transformers achieve near-optimal ICL performance in this setting and exhibit the expected dissection, with lower layers transforming the dataset and upper layers performing linear ICL; probing and a new pasting experiment reveal mechanisms such as copying of inputs and representations, linear ICL capability of the upper layers alone, and a post-ICL representation selection mechanism in a harder mixture setting, all consistent with the theory.
    Abstract While large language models based on the transformer architecture have demonstrated remarkable in-context learning (ICL) capabilities, understandings of such capabilities are still in an early stage, where existing theory and mechanistic understanding focus mostly on simple scenarios such as learning simple function classes. This paper takes initial steps on understanding ICL in more complex scenarios, by studying learning with representations. Concretely, we construct synthetic in-context learning problems with a compositional structure, where the label depends on the input through a possibly complex but fixed representation function, composed with a linear function that differs in each instance. By construction, the optimal ICL algorithm first transforms the inputs by the representation function, and then performs linear ICL on top of the transformed dataset. We show theoretically the existence of transformers that approximately implement such algorithms with mild depth and size. Empirically, we find trained transformers consistently achieve near-optimal ICL performance in this setting, and exhibit the desired dissection where lower layers transforms the dataset and upper layers perform linear ICL. Through extensive probing and a new pasting experiment, we further reveal several mechanisms within the trained transformers, such as concrete copying behaviors on both the inputs and the representations, linear ICL capability of the upper layers alone, and a post-ICL representation selection mechanism in a harder mixture setting. These observed mechanisms align well with our theory and may shed light on how transformers perform ICL in more realistic scenarios.
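A small NumPy sketch of the synthetic construction described in the abstract (the dimensions and the specific choice of representation function are assumptions): a representation φ that is fixed across prompts, composed with a fresh linear map per prompt to generate the in-context labels.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_rep, n_ctx = 16, 8, 32           # input dim, representation dim, prompt length
W1 = rng.normal(size=(d_rep, d_in))      # fixed representation shared by every prompt

def phi(x):
    """Fixed (possibly complex) representation function; here a one-layer ReLU map."""
    return np.maximum(W1 @ x, 0.0)

def sample_prompt():
    """One in-context learning instance: same phi, fresh linear head w, labels y = w . phi(x)."""
    w = rng.normal(size=d_rep)                       # differs in each instance
    xs = rng.normal(size=(n_ctx, d_in))
    ys = np.array([w @ phi(x) for x in xs])
    x_query = rng.normal(size=d_in)
    y_query = w @ phi(x_query)
    return xs, ys, x_query, y_query                  # transformer input: (x1, y1, ..., xn, yn, x_query)
```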

IW-GAE: Importance weighted group accuracy estimation for improved calibration and model selection in unsupervised domain adaptation

  • paper_url: http://arxiv.org/abs/2310.10611
  • repo_url: None
  • paper_authors: Taejong Joo, Diego Klabjan
  • for: addressing model calibration and model selection under distribution shift in unsupervised domain adaptation, where the goal is to perform well in a shifted domain without labels.
  • methods: an importance-weighted group accuracy estimator, obtained by formulating an optimization problem for importance weights that lead to accurate group accuracy estimation in the distribution-shifted domain, together with theoretical analyses.
  • results: extensive experiments show the effectiveness of group accuracy estimation for model calibration and model selection, positioning it as an improvement direction orthogonal to improving the transferability of accuracy.
    Abstract Reasoning about a model's accuracy on a test sample from its confidence is a central problem in machine learning, being connected to important applications such as uncertainty representation, model selection, and exploration. While these connections have been well-studied in the i.i.d. settings, distribution shifts pose significant challenges to the traditional methods. Therefore, model calibration and model selection remain challenging in the unsupervised domain adaptation problem--a scenario where the goal is to perform well in a distribution shifted domain without labels. In this work, we tackle difficulties coming from distribution shifts by developing a novel importance weighted group accuracy estimator. Specifically, we formulate an optimization problem for finding an importance weight that leads to an accurate group accuracy estimation in the distribution shifted domain with theoretical analyses. Extensive experiments show the effectiveness of group accuracy estimation on model calibration and model selection. Our results emphasize the significance of group accuracy estimation for addressing challenges in unsupervised domain adaptation, as an orthogonal improvement direction with improving transferability of accuracy.
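The paper's contribution is the optimization problem that selects the importance weights; the sketch below only shows the basic estimator those weights plug into (a simplified, hypothetical version): labeled source-domain examples are binned into confidence groups and their correctness is reweighted by w(x) ≈ p_target(x)/p_source(x) to estimate group accuracy in the shifted domain.

```python
import numpy as np

def iw_group_accuracy(confidence, correct, weights, n_groups=10):
    """Importance-weighted accuracy per confidence group.

    confidence: (n,) max softmax probability on labeled source-domain data.
    correct:    (n,) 0/1 indicator that the source-domain prediction was right.
    weights:    (n,) importance weights w(x) ~ p_target(x) / p_source(x).
    """
    edges = np.linspace(0.0, 1.0, n_groups + 1)
    group_acc = np.full(n_groups, np.nan)
    for g in range(n_groups):
        hi = edges[g + 1] if g < n_groups - 1 else 1.0 + 1e-9   # make the last bin inclusive
        mask = (confidence >= edges[g]) & (confidence < hi)
        if mask.any():
            w = weights[mask]
            group_acc[g] = np.sum(w * correct[mask]) / np.sum(w)
    return group_acc   # compare to mean confidence per group to assess calibration under shift
```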

BayRnTune: Adaptive Bayesian Domain Randomization via Strategic Fine-tuning

  • paper_url: http://arxiv.org/abs/2310.10606
  • repo_url: None
  • paper_authors: Tianle Huang, Nitish Sontakke, K. Niranjan Kumar, Irfan Essa, Stefanos Nikolaidis, Dennis W. Hong, Sehoon Ha
  • for: addressing the careful tuning of randomization parameters required by domain randomization (DR) methods and reducing the sim-to-real gap, without retraining a policy from scratch at every iteration.
  • methods: Adaptive Bayesian Domain Randomization via Strategic Fine-tuning (BayRnTune), which inherits the spirit of Bayesian DR but strategically fine-tunes from a previously learned policy to adapt to new environments; four fine-tuning strategies are compared against baseline algorithms in five simulated environments, ranging from simple benchmark tasks to more complex legged-robot environments.
  • results: BayRnTune yields better rewards than vanilla domain randomization or Bayesian DR within the same number of timesteps, significantly accelerating the learning process.
    Abstract Domain randomization (DR), which entails training a policy with randomized dynamics, has proven to be a simple yet effective algorithm for reducing the gap between simulation and the real world. However, DR often requires careful tuning of randomization parameters. Methods like Bayesian Domain Randomization (Bayesian DR) and Active Domain Randomization (Adaptive DR) address this issue by automating parameter range selection using real-world experience. While effective, these algorithms often require long computation time, as a new policy is trained from scratch every iteration. In this work, we propose Adaptive Bayesian Domain Randomization via Strategic Fine-tuning (BayRnTune), which inherits the spirit of BayRn but aims to significantly accelerate the learning processes by fine-tuning from previously learned policy. This idea leads to a critical question: which previous policy should we use as a prior during fine-tuning? We investigated four different fine-tuning strategies and compared them against baseline algorithms in five simulated environments, ranging from simple benchmark tasks to more complex legged robot environments. Our analysis demonstrates that our method yields better rewards in the same amount of timesteps compared to vanilla domain randomization or Bayesian DR.

Pareto Optimization to Accelerate Multi-Objective Virtual Screening

  • paper_url: http://arxiv.org/abs/2310.10598
  • repo_url: None
  • paper_authors: Jenna C. Fromer, David E. Graff, Connor W. Coley
  • for: rapidly identifying drug-like molecules that simultaneously exhibit strong binding to a target protein, minimal off-target interactions, and suitable pharmacokinetic properties.
  • methods: multi-objective Bayesian optimization to reduce the computational cost of multi-property virtual screening, applied to identifying ligands predicted to be selective based on docking scores against on- and off-targets.
  • results: Pareto optimization outperforms scalarization across three case studies; searching a virtual library of over 4M molecules for predicted selective dual inhibitors of EGFR and IGF1R, the workflow acquires 100% of the library's Pareto front after exploring only 8% of the library.
    Abstract The discovery of therapeutic molecules is fundamentally a multi-objective optimization problem. One formulation of the problem is to identify molecules that simultaneously exhibit strong binding affinity for a target protein, minimal off-target interactions, and suitable pharmacokinetic properties. Inspired by prior work that uses active learning to accelerate the identification of strong binders, we implement multi-objective Bayesian optimization to reduce the computational cost of multi-property virtual screening and apply it to the identification of ligands predicted to be selective based on docking scores to on- and off-targets. We demonstrate the superiority of Pareto optimization over scalarization across three case studies. Further, we use the developed optimization tool to search a virtual library of over 4M molecules for those predicted to be selective dual inhibitors of EGFR and IGF1R, acquiring 100% of the molecules that form the library's Pareto front after exploring only 8% of the library. This workflow and associated open source software can reduce the screening burden of molecular design projects and is complementary to research aiming to improve the accuracy of binding predictions and other molecular properties.
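A small utility of the kind such a workflow relies on (illustrative only, not the authors' code): identifying the Pareto front of candidate molecules scored on multiple objectives, with the convention that every objective is to be maximized (e.g., the negated on-target docking score and the selectivity gap to the off-target).

```python
import numpy as np

def pareto_front(scores):
    """Return a boolean mask of non-dominated rows.

    scores: (n_molecules, n_objectives), larger is better for every column.
    A molecule is on the front if no other molecule is >= on all objectives
    and strictly > on at least one.
    """
    n = scores.shape[0]
    on_front = np.ones(n, dtype=bool)
    for i in range(n):
        if not on_front[i]:
            continue
        dominated = np.all(scores >= scores[i], axis=1) & np.any(scores > scores[i], axis=1)
        if dominated.any():
            on_front[i] = False
    return on_front

# Example with two objectives (hypothetical scores): -docking(on-target), selectivity gap.
scores = np.random.default_rng(1).normal(size=(1000, 2))
print(scores[pareto_front(scores)].shape)
```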

HelmSim: Learning Helmholtz Dynamics for Interpretable Fluid Simulation

  • paper_url: http://arxiv.org/abs/2310.10565
  • repo_url: None
  • paper_authors: Lanxiang Xing, Haixu Wu, Yuezhou Ma, Jianmin Wang, Mingsheng Long
  • for: an accurate and interpretable fluid simulator, HelmSim, addressing the long-standing challenge of simulating high-dimensional non-linear fluid dynamics.
  • methods: a HelmDynamic block, inspired by the Helmholtz theorem, decomposes fluid dynamics into more solvable curl-free and divergence-free parts (physically corresponding to the potential and stream functions of the fluid); embedded in a Multiscale Integration Network, it integrates the learned Helmholtz dynamics along the temporal dimension at multiple spatial scales to yield future fluid states.
  • results: compared with previous velocity-estimating methods, HelmSim is faithfully derived from the Helmholtz theorem, disentangles complex fluid dynamics with physically interpretable evidence, and achieves consistent state-of-the-art performance on numerically simulated and real-world observed benchmarks, even in scenarios with complex boundaries.
    Abstract Fluid simulation is a long-standing challenge due to the intrinsic high-dimensional non-linear dynamics. Previous methods usually utilize the non-linear modeling capability of deep models to directly estimate velocity fields for future prediction. However, skipping over inherent physical properties but directly learning superficial velocity fields will overwhelm the model from generating precise or physics-reliable results. In this paper, we propose the HelmSim toward an accurate and interpretable simulator for fluid. Inspired by the Helmholtz theorem, we design a HelmDynamic block to learn the Helmholtz dynamics, which decomposes fluid dynamics into more solvable curl-free and divergence-free parts, physically corresponding to potential and stream functions of fluid. By embedding the HelmDynamic block into a Multiscale Integration Network, HelmSim can integrate learned Helmholtz dynamics along temporal dimension in multiple spatial scales to yield future fluid. Comparing with previous velocity estimating methods, HelmSim is faithfully derived from Helmholtz theorem and ravels out complex fluid dynamics with physically interpretable evidence. Experimentally, our proposed HelmSim achieves the consistent state-of-the-art in both numerical simulated and real-world observed benchmarks, even for scenarios with complex boundaries.
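A plain NumPy sketch of the physical prior behind the HelmDynamic block (not the learned model itself; the toy fields are assumptions): a 2D velocity field recomposed from a scalar potential φ (curl-free part) and a stream function ψ (divergence-free part) via finite differences, i.e. u = ∇φ + ∇×ψ.

```python
import numpy as np

def velocity_from_helmholtz(phi, psi, dx=1.0, dy=1.0):
    """Recompose a 2D velocity field from potential and stream functions.

    Curl-free part:       (d phi / dx, d phi / dy)
    Divergence-free part: (d psi / dy, -d psi / dx)
    """
    dphi_dy, dphi_dx = np.gradient(phi, dy, dx)
    dpsi_dy, dpsi_dx = np.gradient(psi, dy, dx)
    u = dphi_dx + dpsi_dy
    v = dphi_dy - dpsi_dx
    return u, v

# A learned HelmDynamic block would predict phi and psi; here they are toy fields.
y, x = np.mgrid[0:64, 0:64] / 64.0
u, v = velocity_from_helmholtz(phi=np.sin(2 * np.pi * x), psi=np.cos(2 * np.pi * y))
```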

Causal Dynamic Variational Autoencoder for Counterfactual Regression in Longitudinal Data

  • paper_url: http://arxiv.org/abs/2310.10559
  • repo_url: None
  • paper_authors: Mouad El Bouchattaoui, Myriam Tami, Benoit Lepetit, Paul-Henry Cournède
  • for: estimating treatment effects over time, which is relevant to precision medicine, epidemiology, economics, and marketing.
  • methods: assumes unobserved risk factors (adjustment variables) that affect only the sequence of outcomes, and targets individual treatment effect (ITE) estimation with unobserved heterogeneity in the treatment response; the proposed Causal DVAE (CDVAE) combines a Dynamic Variational Autoencoder with a propensity-score weighting strategy to estimate counterfactual responses.
  • results: the model accurately estimates ITE, captures the underlying heterogeneity in longitudinal data, and outperforms state-of-the-art models in evaluations.
    Abstract Estimating treatment effects over time is relevant in many real-world applications, such as precision medicine, epidemiology, economy, and marketing. Many state-of-the-art methods either assume the observations of all confounders or seek to infer the unobserved ones. We take a different perspective by assuming unobserved risk factors, i.e., adjustment variables that affect only the sequence of outcomes. Under unconfoundedness, we target the Individual Treatment Effect (ITE) estimation with unobserved heterogeneity in the treatment response due to missing risk factors. We address the challenges posed by time-varying effects and unobserved adjustment variables. Led by theoretical results over the validity of the learned adjustment variables and generalization bounds over the treatment effect, we devise Causal DVAE (CDVAE). This model combines a Dynamic Variational Autoencoder (DVAE) framework with a weighting strategy using propensity scores to estimate counterfactual responses. The CDVAE model allows for accurate estimation of ITE and captures the underlying heterogeneity in longitudinal data. Evaluations of our model show superior performance over state-of-the-art models.

Sample Complexity of Preference-Based Nonparametric Off-Policy Evaluation with Deep Networks

  • paper_url: http://arxiv.org/abs/2310.10556
  • repo_url: None
  • paper_authors: Zihao Li, Xiang Ji, Minshuo Chen, Mengdi Wang
  • for: off-policy evaluation (OPE) in reinforcement learning with human preference data, where the reward is learned from preferences rather than observed directly.
  • methods: analyzes fitted-Q-evaluation with a deep ReLU network, choosing the network size to exploit any low-dimensional manifold structure in the Markov decision process.
  • results: establishes a sample-efficient estimator for preference-based OPE that avoids the curse of high ambient dimensionality; under high reward smoothness, the guarantee almost aligns with classical OPE results with observable reward data.
    Abstract A recently popular approach to solving reinforcement learning is with data from human preferences. In fact, human preference data are now used with classic reinforcement learning algorithms such as actor-critic methods, which involve evaluating an intermediate policy over a reward learned from human preference data with distribution shift, known as off-policy evaluation (OPE). Such algorithm includes (i) learning reward function from human preference dataset, and (ii) learning expected cumulative reward of a target policy. Despite the huge empirical success, existing OPE methods with preference data often lack theoretical understanding and rely heavily on heuristics. In this paper, we study the sample efficiency of OPE with human preference and establish a statistical guarantee for it. Specifically, we approach OPE by learning the value function by fitted-Q-evaluation with a deep neural network. By appropriately selecting the size of a ReLU network, we show that one can leverage any low-dimensional manifold structure in the Markov decision process and obtain a sample-efficient estimator without suffering from the curse of high data ambient dimensionality. Under the assumption of high reward smoothness, our results \textit{almost align with the classical OPE results with observable reward data}. To the best of our knowledge, this is the first result that establishes a \textit{provably efficient} guarantee for off-policy evaluation with RLHF.
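A compact sketch of fitted-Q-evaluation with a deep network, the estimator analyzed in the paper (generic FQE, not the paper's exact construction; the preference-learned reward model and the dataset interface are assumed placeholders): the Q-network is repeatedly regressed onto the one-step Bellman target under the fixed target policy.

```python
import torch
import torch.nn as nn

def fitted_q_evaluation(dataset, policy, reward_model, state_dim, act_dim,
                        gamma=0.99, outer_iters=50, inner_steps=10):
    """dataset: list of (state, action, next_state) tensors; policy(s) -> action;
    reward_model(s, a) -> learned (preference-based) reward estimate."""
    q = nn.Sequential(nn.Linear(state_dim + act_dim, 128), nn.ReLU(), nn.Linear(128, 1))
    opt = torch.optim.Adam(q.parameters(), lr=1e-3)
    s, a, s_next = (torch.stack(x) for x in zip(*dataset))
    for _ in range(outer_iters):
        with torch.no_grad():                       # freeze the Bellman target
            a_next = policy(s_next)
            target = reward_model(s, a) + gamma * q(torch.cat([s_next, a_next], -1)).squeeze(-1)
        for _ in range(inner_steps):                # regress Q onto the frozen target
            pred = q(torch.cat([s, a], -1)).squeeze(-1)
            loss = ((pred - target) ** 2).mean()
            opt.zero_grad(); loss.backward(); opt.step()
    return q   # the policy value is then estimated as q(s0, policy(s0)).mean() over start states
```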

Population-based wind farm monitoring based on a spatial autoregressive approach

  • paper_url: http://arxiv.org/abs/2310.10555
  • repo_url: None
  • paper_authors: W. Lin, K. Worden, E. J. Cross
  • for: reducing the operation and maintenance costs of wind farms.
  • methods: population-based structural health monitoring, in which one system serves multiple structures (i.e., turbines) and data shared across the population improves predictions of structural behaviour.
  • results: proposes a Gaussian-process-based spatial autoregressive model (the GP-SPARX model) that explicitly captures the spatial and temporal correlations between turbines induced by the wake effect, and demonstrates its potential applicability in a health monitoring system.
    Abstract An important challenge faced by wind farm operators is to reduce operation and maintenance cost. Structural health monitoring provides a means of cost reduction through minimising unnecessary maintenance trips as well as prolonging turbine service life. Population-based structural health monitoring can further reduce the cost of health monitoring systems by implementing one system for multiple structures (i.e.~turbines). At the same time, shared data within a population of structures may improve the predictions of structural behaviour. To monitor turbine performance at a population/farm level, an important initial step is to construct a model that describes the behaviour of all turbines under normal conditions. This paper proposes a population-level model that explicitly captures the spatial and temporal correlations (between turbines) induced by the wake effect. The proposed model is a Gaussian process-based spatial autoregressive model, named here a GP-SPARX model. This approach is developed since (a) it reflects our physical understanding of the wake effect, and (b) it benefits from a stochastic data-based learner. A case study is provided to demonstrate the capability of the GP-SPARX model in capturing spatial and temporal variations as well as its potential applicability in a health monitoring system.

TacticAI: an AI assistant for football tactics

  • paper_url: http://arxiv.org/abs/2310.10553
  • repo_url: None
  • paper_authors: Zhe Wang, Petar Veličković, Daniel Hennes, Nenad Tomašev, Laurel Prince, Michael Kaisers, Yoram Bachrach, Romuald Elie, Li Kevin Wenliang, Federico Piccinini, William Spearman, Ian Graham, Jerome Connor, Yi Yang, Adrià Recasens, Mina Khan, Nathalie Beauguerlange, Pablo Sprechmann, Pol Moreno, Nicolas Heess, Michael Bowling, Demis Hassabis, Karl Tuyls
  • for: developing an AI assistant for football tactics (TacticAI) that helps coaches analyse opponents' tactical patterns and devise effective responses, created and evaluated in close collaboration with domain experts from Liverpool FC.
  • methods: combines a predictive and a generative component, allowing coaches to sample and explore alternative player setups for each corner kick routine and to select those with the highest predicted likelihood of success.
  • results: validated on benchmark tasks (predicting receivers and shot attempts, recommending player position adjustments); in a qualitative study with football experts, TacticAI's suggestions were indistinguishable from real tactics and favoured over existing tactics 90% of the time, and it also provides an effective corner-kick retrieval system.
    Abstract Identifying key patterns of tactics implemented by rival teams, and developing effective responses, lies at the heart of modern football. However, doing so algorithmically remains an open research challenge. To address this unmet need, we propose TacticAI, an AI football tactics assistant developed and evaluated in close collaboration with domain experts from Liverpool FC. We focus on analysing corner kicks, as they offer coaches the most direct opportunities for interventions and improvements. TacticAI incorporates both a predictive and a generative component, allowing the coaches to effectively sample and explore alternative player setups for each corner kick routine and to select those with the highest predicted likelihood of success. We validate TacticAI on a number of relevant benchmark tasks: predicting receivers and shot attempts and recommending player position adjustments. The utility of TacticAI is validated by a qualitative study conducted with football domain experts at Liverpool FC. We show that TacticAI's model suggestions are not only indistinguishable from real tactics, but also favoured over existing tactics 90% of the time, and that TacticAI offers an effective corner kick retrieval system. TacticAI achieves these results despite the limited availability of gold-standard data, achieving data efficiency through geometric deep learning.

Optimal vintage factor analysis with deflation varimax

  • paper_url: http://arxiv.org/abs/2310.10545
  • repo_url: None
  • paper_authors: Xin Bing, Dian Jin, Yuqian Zhang
  • for: This paper proposes a new method for vintage factor analysis, which aims to find a low-dimensional representation of the original data and then seek a rotation that is scientifically meaningful.
  • methods: The proposed method uses a deflation varimax procedure that solves each row of an orthogonal matrix sequentially, which has a net computational gain and flexibility.
  • results: The proposed method comes with full theoretical guarantees in a broad context; the two-step procedure attains the optimal rate when the SNR is moderate or large, and a modified procedure is shown to be optimal in all SNR regimes. The theory is valid for finite samples and allows the number of latent factors to grow with the sample size.
    Abstract Vintage factor analysis is one important type of factor analysis that aims to first find a low-dimensional representation of the original data, and then to seek a rotation such that the rotated low-dimensional representation is scientifically meaningful. Perhaps the most widely used vintage factor analysis is the Principal Component Analysis (PCA) followed by the varimax rotation. Despite its popularity, little theoretical guarantee can be provided mainly because varimax rotation requires to solve a non-convex optimization over the set of orthogonal matrices. In this paper, we propose a deflation varimax procedure that solves each row of an orthogonal matrix sequentially. In addition to its net computational gain and flexibility, we are able to fully establish theoretical guarantees for the proposed procedure in a broad context. Adopting this new varimax approach as the second step after PCA, we further analyze this two step procedure under a general class of factor models. Our results show that it estimates the factor loading matrix in the optimal rate when the signal-to-noise-ratio (SNR) is moderate or large. In the low SNR regime, we offer possible improvement over using PCA and the deflation procedure when the additive noise under the factor model is structured. The modified procedure is shown to be optimal in all SNR regimes. Our theory is valid for finite sample and allows the number of the latent factors to grow with the sample size as well as the ambient dimension to grow with, or even exceed, the sample size. Extensive simulation and real data analysis further corroborate our theoretical findings.
    摘要 古典因素分析是一种重要的因素分析方法,旨在首先找到原始数据的低维度表示,然后寻找一种可靠的旋转,使得旋转后的低维度表示具有科学意义。最广泛使用的古典因素分析方法是主Component分析(PCA)followed by varimax旋转。尽管它受欢迎,但是可以提供的理论保证很少,因为varimax旋转需要解决非核心化优化问题。 在这篇论文中,我们提出了一种减少varimax过程中的计算量和灵活性的方法,并且可以在广泛的Context下提供完整的理论保证。我们采用这种新的varimax方法作为PCA之后的第二步,然后对这两步进程进行了广泛的分析。我们的结果表明,这种两步过程在中等或大的信号噪声比(SNR)下能够优化因子加载矩阵。在噪声比较低的情况下,我们提供了可能的改进方案,其中添加的噪声在因子模型下是结构化的。我们的修改过程在所有SNR régime下是优化的。我们的理论是有限样本和因子数量可以随样本大小和环境维度增长。我们的实验和实际数据分析进一步证明了我们的理论发现。

Comparing Comparators in Generalization Bounds

  • paper_url: http://arxiv.org/abs/2310.10534
  • repo_url: None
  • paper_authors: Fredrik Hellström, Benjamin Guedj
  • for: deriving generic information-theoretic and PAC-Bayesian generalization bounds in which an arbitrary convex comparator function measures the discrepancy between training and population loss.
  • methods: the bounds hold when the cumulant-generating function (CGF) of the comparator is upper-bounded by the corresponding CGF within a family of bounding distributions; the tightest bound is obtained when the comparator is the convex conjugate of the CGF of the bounding distribution, i.e., the Cramér function.
  • results: this confirms the near-optimality of known bounds for bounded and sub-Gaussian losses and leads to novel bounds under other bounding distributions.
    Abstract We derive generic information-theoretic and PAC-Bayesian generalization bounds involving an arbitrary convex comparator function, which measures the discrepancy between the training and population loss. The bounds hold under the assumption that the cumulant-generating function (CGF) of the comparator is upper-bounded by the corresponding CGF within a family of bounding distributions. We show that the tightest possible bound is obtained with the comparator being the convex conjugate of the CGF of the bounding distribution, also known as the Cram\'er function. This conclusion applies more broadly to generalization bounds with a similar structure. This confirms the near-optimality of known bounds for bounded and sub-Gaussian losses and leads to novel bounds under other bounding distributions.
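For readers unfamiliar with the terminology: the Cramér function named above is the convex (Legendre-Fenchel) conjugate of the cumulant-generating function, and it governs the standard Chernoff-type tail bound. The display below states these two standard facts (background, not the paper's specific bound):

$$
\psi_X(\lambda) = \log \mathbb{E}\left[e^{\lambda X}\right], \qquad
\psi_X^{*}(t) = \sup_{\lambda \in \mathbb{R}} \big(\lambda t - \psi_X(\lambda)\big), \qquad
\mathbb{P}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i \ge t\right) \le e^{-n\,\psi_X^{*}(t)} \quad \text{for } t \ge \mathbb{E}[X].
$$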

Learning optimal integration of spatial and temporal information in noisy chemotaxis

  • paper_url: http://arxiv.org/abs/2310.10531
  • repo_url: https://github.com/kirkegaardlab/chemoxrl
  • paper_authors: Albert Alonso, Julius B. Kirkegaard
  • for: investigating the boundary between chemotaxis driven by spatial estimation of gradients and chemotaxis driven by temporal estimation.
  • methods: deep reinforcement learning of a combined chemotactic policy, parameterized by a recurrent neural network, that can integrate spatial and temporal information in an a priori unconstrained manner, evaluated with a minimal theoretical model of a chemotactic cell.
  • results: the transition between the purely temporal and purely spatial regimes is continuous, with the combined strategy outperforming both the constrained variants and models that explicitly integrate spatial and temporal information in the transition region; integrated-gradients attribution shows the policy relies on a non-trivial, dynamically varying combination of spatially and temporally derived gradient information.
    Abstract We investigate the boundary between chemotaxis driven by spatial estimation of gradients and chemotaxis driven by temporal estimation. While it is well known that spatial chemotaxis becomes disadvantageous for small organisms at high noise levels, it is unclear whether there is a discontinuous switch of optimal strategies or a continuous transition exists. Here, we employ deep reinforcement learning to study the possible integration of spatial and temporal information in an a priori unconstrained manner. We parameterize such a combined chemotactic policy by a recurrent neural network and evaluate it using a minimal theoretical model of a chemotactic cell. By comparing with constrained variants of the policy, we show that it converges to purely temporal and spatial strategies at small and large cell sizes, respectively. We find that the transition between the regimes is continuous, with the combined strategy outperforming in the transition region both the constrained variants as well as models that explicitly integrate spatial and temporal information. Finally, by utilizing the attribution method of integrated gradients, we show that the policy relies on a non-trivial combination of spatially and temporally derived gradient information in a ratio that varies dynamically during the chemotactic trajectories.

From Spectral Theorem to Statistical Independence with Application to System Identification

  • paper_url: http://arxiv.org/abs/2310.10523
  • repo_url: None
  • paper_authors: Muhammad Abdullah Naeem, Amir Khazraei, Miroslav Pajic
  • for: identification of high-dimensional random dynamical systems, which arise in settings such as cyber-physical systems, daily returns of S&P 1500 stocks, and interacting particle systems around the McKean-Vlasov limit.
  • methods: uses the spectral theorem for non-Hermitian operators to show that spatio-temporal correlations are dictated by the discrepancy between the algebraic and geometric multiplicities of the distinct eigenvalues of the state transition matrix, and provides a first quantitative handle on the decay rate of its finite powers $\|A^{k}\|$.
  • results: small discrepancies imply the trajectory essentially comprises multiple lower-dimensional, statistically independent random dynamical systems living on $A$-invariant subspaces; when a stable system has a single distinct eigenvalue and discrepancy $n-1$, the dynamics are spatially inseparable and covariates can reach typical size $\Theta\big(\sqrt{N-n+1}\, e^{n}\big)$, so even under stability covariates can suffer from the curse of dimensionality.
    Abstract High dimensional random dynamical systems are ubiquitous, including -- but not limited to -- cyber-physical systems, daily return on different stocks of S&P 1500 and velocity profile of interacting particle systems around the McKean-Vlasov limit. Mathematically, the underlying phenomenon can be captured via a stable $n$-dimensional linear transformation `$A$' and additive randomness. System identification aims at extracting useful information about the underlying dynamical system, given a length $N$ trajectory from it (corresponding to an $n \times N$ dimensional data matrix). We use the spectral theorem for non-Hermitian operators to show that spatio-temporal correlations are dictated by the discrepancy between algebraic and geometric multiplicity of distinct eigenvalues corresponding to the state transition matrix. Small discrepancies imply that the original trajectory essentially comprises multiple lower dimensional random dynamical systems living on $A$ invariant subspaces and statistically independent of each other. In the process, we provide a first quantitative handle on the decay rate of finite powers of the state transition matrix $\|A^{k}\|$. It is shown that when a stable dynamical system has only one distinct eigenvalue and discrepancy of $n-1$: $\|A\|$ has a dependence on $n$, the resulting dynamics are spatially inseparable and consequently there exists at least one row with covariates of typical size $\Theta\big(\sqrt{N-n+1}$ $e^{n}\big)$, i.e., even under the stability assumption, covariates can suffer from the curse of dimensionality. In the light of these findings we set the stage for non-asymptotic error analysis in estimation of the state transition matrix $A$ via least squares regression on the observed trajectory by showing that the element-wise error is essentially a variant of the well-known Littlewood-Offord problem.

Reproducing Bayesian Posterior Distributions for Exoplanet Atmospheric Parameter Retrievals with a Machine Learning Surrogate Model

  • paper_url: http://arxiv.org/abs/2310.10521
  • repo_url: None
  • paper_authors: Eyup B. Unlu, Roy T. Forestano, Konstantin T. Matchev, Katia Matcheva
  • for: a machine-learning surrogate model that reproduces the Bayesian posterior distributions of exoplanet atmospheric parameters retrieved from transmission spectra of transiting planets.
  • methods: trained on ground-truth posterior distributions for seven parameters (planet radius, atmospheric temperature, and the mixing ratios of $H_2O$, $CH_4$, $NH_3$, $CO$ and $CO_2$), with domain-inspired feature preprocessing and semi-supervised learning to leverage the large amount of unlabelled training data.
  • results: the model was among the winning solutions of the 2023 Ariel Machine Learning Data Challenge.
    Abstract We describe a machine-learning-based surrogate model for reproducing the Bayesian posterior distributions for exoplanet atmospheric parameters derived from transmission spectra of transiting planets with typical retrieval software such as TauRex. The model is trained on ground truth distributions for seven parameters: the planet radius, the atmospheric temperature, and the mixing ratios for five common absorbers: $H_2O$, $CH_4$, $NH_3$, $CO$ and $CO_2$. The model performance is enhanced by domain-inspired preprocessing of the features and the use of semi-supervised learning in order to leverage the large amount of unlabelled training data available. The model was among the winning solutions in the 2023 Ariel Machine Learning Data Challenge.

ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models

  • paper_url: http://arxiv.org/abs/2310.10505
  • repo_url: https://github.com/liziniu/ReMax
  • paper_authors: Ziniu Li, Tian Xu, Yushun Zhang, Yang Yu, Ruoyu Sun, Zhi-Quan Luo
  • for: improving training efficiency in RLHF by addressing the computational inefficiency of PPO, the de-facto algorithm for the task.
  • methods: a new RLHF algorithm, ReMax, built on REINFORCE with a new variance-reduction technique and designed around three properties of RLHF tasks that PPO does not exploit: fast simulation, deterministic transitions, and trajectory-level rewards.
  • results: ReMax is simpler to implement and removes several scale-sensitive PPO hyper-parameters; by dropping the value model it saves about 50% memory, allowing Llama2 (7B) to be fine-tuned on 8xA100-40GB GPUs where PPO runs out of memory, and it runs about 2x faster than PPO without degrading performance.
    Abstract Alignment is of critical importance for training large language models (LLMs). The predominant strategy to address this is through Reinforcement Learning from Human Feedback (RLHF), where PPO serves as the de-facto algorithm. Yet, PPO is known to suffer from computational inefficiency, which is a challenge that this paper aims to address. We identify three important properties in RLHF tasks: fast simulation, deterministic transitions, and trajectory-level rewards, which are not leveraged in PPO. Based on such observations, we develop a new algorithm tailored for RLHF, called ReMax. The algorithm design of ReMax is built on a celebrated algorithm REINFORCE but is equipped with a new variance-reduction technique. Our method has three-fold advantages over PPO: first, ReMax is simple to implement and removes many hyper-parameters in PPO, which are scale-sensitive and laborious to tune. Second, ReMax saves about 50% memory usage in principle. As a result, PPO runs out-of-memory when fine-tuning a Llama2 (7B) model on 8xA100-40GB GPUs, whereas ReMax can afford training. This memory improvement is achieved by removing the value model in PPO. Third, based on our calculations, we find that even assuming PPO can afford the training of Llama2 (7B), it would still run about 2x slower than ReMax. This is due to the computational overhead of the value model, which does not exist in ReMax. Importantly, the above computational improvements do not sacrifice the performance. We hypothesize these advantages can be maintained in larger-scaled models. Our implementation of ReMax is available at https://github.com/liziniu/ReMax
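A minimal sketch of a ReMax-style policy-gradient objective. The variance-reduction technique is not detailed in the abstract; here the baseline is taken to be the reward of the greedily decoded response, which is how ReMax is commonly described, and the generation and reward interfaces are assumed placeholders rather than the released implementation at the repo above.

```python
import torch

def remax_loss(model, reward_fn, prompts, max_new_tokens=128):
    """One ReMax-style policy-gradient objective (sketch; helper methods are assumed).

    For each prompt:
      1. sample a response and record the per-token log-probabilities,
      2. greedily decode a response and use its reward as a baseline,
      3. loss = -(r_sample - r_greedy) * sum(log pi(sampled tokens)).
    No value network is needed, unlike PPO.
    """
    losses = []
    for prompt in prompts:
        sampled, logps = model.sample_with_logprobs(prompt, max_new_tokens)  # assumed helper
        greedy = model.greedy_decode(prompt, max_new_tokens)                 # assumed helper
        with torch.no_grad():
            r_sample = reward_fn(prompt, sampled)
            r_greedy = reward_fn(prompt, greedy)   # trajectory-level reward of the greedy response
        advantage = r_sample - r_greedy
        losses.append(-advantage * logps.sum())
    return torch.stack(losses).mean()
```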

Few-Shot Learning Patterns in Financial Time-Series for Trend-Following Strategies

  • paper_url: http://arxiv.org/abs/2310.10500
  • repo_url: None
  • paper_authors: Kieran Wood, Samuel Kessler, Stephen J. Roberts, Stefan Zohren
  • for: a time-series trend-following forecaster that adapts quickly to new market regimes, avoiding the loss-making positions that conventional forecasting models took when conditions changed abruptly at the onset of the COVID-19 pandemic in 2020.
  • methods: recent advances in deep learning and few-shot learning, realized as the Cross Attentive Time-Series Trend Network (X-Trend), which attends over a context set of financial time-series regimes and transfers trends from similar patterns to a new, distinct target regime.
  • results: an 18.9% Sharpe-ratio increase over a neural forecaster and a 10-fold increase over a conventional time-series momentum strategy during the turbulent 2018-2023 period, recovering from the COVID-19 drawdown twice as quickly as the neural forecaster; for zero-shot positions on novel, unseen financial assets, a 5-fold Sharpe-ratio increase over a neural time-series trend forecaster.
    Abstract Forecasting models for systematic trading strategies do not adapt quickly when financial market conditions change, as was seen in the advent of the COVID-19 pandemic in 2020, when market conditions changed dramatically causing many forecasting models to take loss-making positions. To deal with such situations, we propose a novel time-series trend-following forecaster that is able to quickly adapt to new market conditions, referred to as regimes. We leverage recent developments from the deep learning community and use few-shot learning. We propose the Cross Attentive Time-Series Trend Network - X-Trend - which takes positions attending over a context set of financial time-series regimes. X-Trend transfers trends from similar patterns in the context set to make predictions and take positions for a new distinct target regime. X-Trend is able to quickly adapt to new financial regimes with a Sharpe ratio increase of 18.9% over a neural forecaster and 10-fold over a conventional Time-series Momentum strategy during the turbulent market period from 2018 to 2023. Our strategy recovers twice as quickly from the COVID-19 drawdown compared to the neural-forecaster. X-Trend can also take zero-shot positions on novel unseen financial assets obtaining a 5-fold Sharpe ratio increase versus a neural time-series trend forecaster over the same period. X-Trend both forecasts next-day prices and outputs a trading signal. Furthermore, the cross-attention mechanism allows us to interpret the relationship between forecasts and patterns in the context set.
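A sketch of the cross-attention step at the heart of X-Trend (module choices and shapes are assumptions, not the authors' architecture): the encoded target series attends over a context set of encoded reference regimes, and the attended representation feeds a next-day return forecast and a bounded trading signal; the attention weights provide the interpretability mentioned above.

```python
import torch
import torch.nn as nn

class CrossRegimeAttention(nn.Module):
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.forecast_head = nn.Linear(d_model, 1)                            # next-day return estimate
        self.signal_head = nn.Sequential(nn.Linear(d_model, 1), nn.Tanh())    # position in [-1, 1]

    def forward(self, target_emb, context_emb):
        """target_emb:  (batch, 1, d_model) encoding of the target regime so far.
        context_emb:    (batch, n_context, d_model) encodings of the reference regimes."""
        attended, weights = self.attn(target_emb, context_emb, context_emb)
        forecast = self.forecast_head(attended).squeeze(-1)
        signal = self.signal_head(attended).squeeze(-1)
        return forecast, signal, weights   # weights are interpretable per-context scores
```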

Passive Inference Attacks on Split Learning via Adversarial Regularization

  • paper_url: http://arxiv.org/abs/2310.10483
  • repo_url: None
  • paper_authors: Xiaochen Zhu, Xinjian Luo, Yuncheng Wu, Yangfan Jiang, Xiaokui Xiao, Beng Chin Ooi
  • for: developing practical and effective inference attacks against split learning (SL), a practical and efficient alternative to traditional federated learning.
  • methods: SDAR, an attack framework for an honest-but-curious server that leverages auxiliary data and adversarial regularization to learn a decodable simulator of the client's private model, which can infer the client's private features under vanilla SL, and both features and labels under U-shaped SL.
  • results: in challenging but practical scenarios where existing passive attacks struggle, SDAR consistently matches the performance of active attacks; on CIFAR-10 at split level 7, it reconstructs private features with mean squared error below 0.025 in both vanilla and U-shaped SL and attains over 98% label inference accuracy in the U-shaped setting.
    Abstract Split Learning (SL) has emerged as a practical and efficient alternative to traditional federated learning. While previous attempts to attack SL have often relied on overly strong assumptions or targeted easily exploitable models, we seek to develop more practical attacks. We introduce SDAR, a novel attack framework against SL with an honest-but-curious server. SDAR leverages auxiliary data and adversarial regularization to learn a decodable simulator of the client's private model, which can effectively infer the client's private features under the vanilla SL, and both features and labels under the U-shaped SL. We perform extensive experiments in both configurations to validate the effectiveness of our proposed attacks. Notably, in challenging but practical scenarios where existing passive attacks struggle to reconstruct the client's private data effectively, SDAR consistently achieves attack performance comparable to active attacks. On CIFAR-10, at the deep split level of 7, SDAR achieves private feature reconstruction with less than 0.025 mean squared error in both the vanilla and the U-shaped SL, and attains a label inference accuracy of over 98% in the U-shaped setting, while existing attacks fail to produce non-trivial results.

Adaptive Neural Ranking Framework: Toward Maximized Business Goal for Cascade Ranking Systems

  • paper_url: http://arxiv.org/abs/2310.10462
  • repo_url: None
  • paper_authors: Yunli Wang, Zhiqiang Wang, Jian Yang, Shiyang Wen, Dongying Kong, Han Li, Kun Gai
  • for: optimizing cascade ranking systems, which are widely used for large-scale top-k selection in online advertising and recommendation, via learning-to-rank.
  • methods: the Adaptive Neural Ranking Framework, a multi-task learning framework that adaptively combines the optimization of relaxed and full targets (the Recall@m@k and OAP metrics) according to data complexity and model capability; rank metrics are represented with a permutation matrix, and differentiable sorting yields a relaxed permutation matrix with a controllable approximation error bound, enabling direct optimization through the proposed surrogate losses.
  • results: experiments on four public and industrial benchmarks show the effectiveness and generalization of the method, and an online experiment demonstrates significant application value.
    Abstract Cascade ranking is widely used for large-scale top-k selection problems in online advertising and recommendation systems, and learning-to-rank is an important way to optimize the models in cascade ranking systems. Previous works on learning-to-rank usually focus on letting the model learn the complete order or pay more attention to the order of top materials, and adopt the corresponding rank metrics as optimization targets. However, these optimization targets can not adapt to various cascade ranking scenarios with varying data complexities and model capabilities; and the existing metric-driven methods such as the Lambda framework can only optimize a rough upper bound of the metric, potentially resulting in performance misalignment. To address these issues, we first propose a novel perspective on optimizing cascade ranking systems by highlighting the adaptability of optimization targets to data complexities and model capabilities. Concretely, we employ multi-task learning framework to adaptively combine the optimization of relaxed and full targets, which refers to metrics Recall@m@k and OAP respectively. Then we introduce a permutation matrix to represent the rank metrics and employ differentiable sorting techniques to obtain a relaxed permutation matrix with controllable approximate error bound. This enables us to optimize both the relaxed and full targets directly and more appropriately using the proposed surrogate losses within the deep learning framework. We named this method as Adaptive Neural Ranking Framework. We use the NeuralSort method to obtain the relaxed permutation matrix and draw on the uncertainty weight method in multi-task learning to optimize the proposed losses jointly. Experiments on a total of 4 public and industrial benchmarks show the effectiveness and generalization of our method, and online experiment shows that our method has significant application value.
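The relaxed permutation matrix referred to above can be produced with the NeuralSort operator mentioned in the abstract; a standard PyTorch implementation of that relaxation is sketched below, with the temperature τ controlling the approximation error.

```python
import torch

def neural_sort(scores, tau=1.0):
    """Continuous relaxation of the sorting permutation matrix (NeuralSort, Grover et al.).

    scores: (batch, n) items to be sorted in descending order.
    Returns P_hat of shape (batch, n, n); rows approach the one-hot rows of the
    true permutation matrix as tau -> 0.
    """
    n = scores.size(-1)
    s = scores.unsqueeze(-1)                                    # (batch, n, 1)
    a = (s - s.transpose(-1, -2)).abs()                         # pairwise |s_i - s_j|
    b = a.sum(dim=-1, keepdim=True)                             # A_s @ 1, shape (batch, n, 1)
    idx = torch.arange(1, n + 1, dtype=scores.dtype, device=scores.device)
    c = (n + 1 - 2 * idx).view(1, n, 1) * s.transpose(-1, -2)   # (n + 1 - 2i) * s_j
    p_hat = torch.softmax((c - b.transpose(-1, -2)) / tau, dim=-1)
    return p_hat
```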

A Geometric Insight into Equivariant Message Passing Neural Networks on Riemannian Manifolds

  • paper_url: http://arxiv.org/abs/2310.10448
  • repo_url: None
  • paper_authors: Ilyes Batatia
  • for: a geometric insight into equivariant message passing on Riemannian manifolds.
  • methods: numerical features are represented as coordinate-independent feature fields on the manifold; requiring the associated equivariant embedding of the principal bundle to optimally preserve the bundle's original metric leads to minimizing a twisted form of the Polyakov action with respect to the graph of the embedding, yielding an equivariant diffusion process on the associated vector bundle, which is discretized for a fixed time step to obtain a message passing scheme on the manifold.
  • results: a higher-order equivariant diffusion process, equivalent to diffusion on the Cartesian product of the base manifold, is discretized on a graph to give a new general class of equivariant GNNs, generalizing the ACE and MACE formalisms to data on Riemannian manifolds.
    Abstract This work proposes a geometric insight into equivariant message passing on Riemannian manifolds. As previously proposed, numerical features on Riemannian manifolds are represented as coordinate-independent feature fields on the manifold. To any coordinate-independent feature field on a manifold comes attached an equivariant embedding of the principal bundle to the space of numerical features. We argue that the metric this embedding induces on the numerical feature space should optimally preserve the principal bundle's original metric. This optimality criterion leads to the minimization of a twisted form of the Polyakov action with respect to the graph of this embedding, yielding an equivariant diffusion process on the associated vector bundle. We obtain a message passing scheme on the manifold by discretizing the diffusion equation flow for a fixed time step. We propose a higher-order equivariant diffusion process equivalent to diffusion on the cartesian product of the base manifold. The discretization of the higher-order diffusion process on a graph yields a new general class of equivariant GNN, generalizing the ACE and MACE formalism to data on Riemannian manifolds.

Taming the Sigmoid Bottleneck: Provably Argmaxable Sparse Multi-Label Classification

  • paper_url: http://arxiv.org/abs/2310.10443
  • repo_url: https://github.com/andreasgrv/sigmoid-bottleneck
  • paper_authors: Andreas Grivas, Antonio Vergari, Adam Lopez
  • for: multi-label classification (MLC) with sigmoid output layers, where each input can be assigned multiple labels and the number of possible labels (often in the thousands) can exceed the number of input features, resulting in a low-rank output layer.
  • methods: shows that this sigmoid bottleneck makes exponentially many sparse label combinations unargmaxable, explains how to detect such outputs, and proposes a Discrete Fourier Transform (DFT) output layer that guarantees all sparse label combinations with up to $k$ active labels are argmaxable.
  • results: unargmaxable outputs are demonstrated in three widely used MLC datasets; the DFT layer trains faster, matches the F1@k score of a sigmoid layer, and uses up to 50% fewer trainable parameters.
    Abstract Sigmoid output layers are widely used in multi-label classification (MLC) tasks, in which multiple labels can be assigned to any input. In many practical MLC tasks, the number of possible labels is in the thousands, often exceeding the number of input features and resulting in a low-rank output layer. In multi-class classification, it is known that such a low-rank output layer is a bottleneck that can result in unargmaxable classes: classes which cannot be predicted for any input. In this paper, we show that for MLC tasks, the analogous sigmoid bottleneck results in exponentially many unargmaxable label combinations. We explain how to detect these unargmaxable outputs and demonstrate their presence in three widely used MLC datasets. We then show that they can be prevented in practice by introducing a Discrete Fourier Transform (DFT) output layer, which guarantees that all sparse label combinations with up to $k$ active labels are argmaxable. Our DFT layer trains faster and is more parameter efficient, matching the F1@k score of a sigmoid layer while using up to 50% fewer trainable parameters. Our code is publicly available at https://github.com/andreasgrv/sigmoid-bottleneck.
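One way to detect an unargmaxable label combination for a low-rank sigmoid layer is a linear-programming feasibility check (a sketch of the general idea; the paper's exact detection procedure may differ): a binary label vector y is argmaxable iff some input x makes every logit agree in sign with 2y − 1.

```python
import numpy as np
from scipy.optimize import linprog

def is_argmaxable(W, b, y, eps=1e-6):
    """Check whether label vector y in {0,1}^L is achievable by thresholding
    sigmoid(W x + b) at 0.5 for some input x (x unconstrained in R^d).

    Feasible iff there exist x and a margin t > 0 with
        (2*y - 1) * (W x + b) >= t  for every label.
    We maximize t (capped at 1 so the LP stays bounded); y is argmaxable iff the optimum is > 0.
    """
    L, d = W.shape
    signs = 2.0 * np.asarray(y) - 1.0
    c = np.zeros(d + 1); c[-1] = -1.0                 # variables [x, t]; maximize t
    # Constraint rows: -signs_i * (w_i . x) + t <= signs_i * b_i
    A_ub = np.hstack([-(signs[:, None] * W), np.ones((L, 1))])
    b_ub = signs * b
    bounds = [(None, None)] * d + [(None, 1.0)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.status == 0 and res.x[-1] > eps
```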

Equivariant Matrix Function Neural Networks

  • paper_url: http://arxiv.org/abs/2310.10434
  • repo_url: None
  • paper_authors: Ilyes Batatia, Lars L. Schaaf, Huajie Chen, Gábor Csányi, Christoph Ortner, Felix A. Faber
  • for: This paper aims to address the challenges of modeling non-local interactions in systems such as large conjugated molecules, metals, or amorphous materials using Graph Neural Networks (GNNs) and traditional neural networks.
  • methods: The paper introduces a novel architecture called Matrix Function Neural Networks (MFNs), which parameterizes non-local interactions through analytic matrix equivariant functions. The MFN architecture uses resolvent expansions for a straightforward implementation and the potential for linear scaling with system size.
  • results: The MFN architecture achieves state-of-the-art performance in standard graph benchmarks, such as the ZINC and TU datasets, and is able to capture intricate non-local interactions in quantum systems, paving the way to new state-of-the-art force fields.
    Abstract Graph Neural Networks (GNNs), especially message-passing neural networks (MPNNs), have emerged as powerful architectures for learning on graphs in diverse applications. However, MPNNs face challenges when modeling non-local interactions in systems such as large conjugated molecules, metals, or amorphous materials. Although Spectral GNNs and traditional neural networks such as recurrent neural networks and transformers mitigate these challenges, they often lack extensivity, adaptability, generalizability, computational efficiency, or fail to capture detailed structural relationships or symmetries in the data. To address these concerns, we introduce Matrix Function Neural Networks (MFNs), a novel architecture that parameterizes non-local interactions through analytic matrix equivariant functions. Employing resolvent expansions offers a straightforward implementation and the potential for linear scaling with system size. The MFN architecture achieves state-of-the-art performance in standard graph benchmarks, such as the ZINC and TU datasets, and is able to capture intricate non-local interactions in quantum systems, paving the way to new state-of-the-art force fields.
    摘要 图形神经网络(GNNs),特别是消息传递神经网络(MPNNs),在不同应用场景中显示出了强大的架构能力。然而,MPNNs在大 conjugated molecules、金属和归一化材料等系统中模型非本地交互时面临挑战。虽然spectral GNNs和传统神经网络如回归神经网络和transformers可以减轻这些挑战,但它们经常缺乏广泛性、适应性、普适性、计算效率或失去数据中的细致结构关系或对称性。为解决这些问题,我们介绍了矩阵函数神经网络(MFNs),一种新的架构,该参数非本地交互通过矩阵对偶变换函数。使用resolvent expansions的实现可以提供一种简单的实现方式,并且可能实现系统大小的线性扩展。MFN架构在标准图形数据集上达到了state-of-the-art性能,如ZINC和TU数据集,并能够捕捉到量子系统中的复杂非本地交互,为新的state-of-the-art力场开创道路。
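
The resolvent expansions mentioned in the abstract follow the classical Cauchy-integral idea of writing an analytic matrix function as a weighted sum of resolvents $(z_k I - A)^{-1}$. The sketch below only illustrates that generic identity (here for the matrix exponential of a small symmetric matrix); the contour, node count and test matrix are illustrative assumptions, and it is not the MFN parameterization itself.

```python
import numpy as np
from scipy.linalg import expm

# Illustrative resolvent expansion: f(A) = (1/(2*pi*i)) * contour integral of f(z)(zI - A)^{-1} dz,
# approximated with the trapezoidal rule on a circle enclosing the spectrum of A.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
A = 0.5 * (A + A.T)                                    # symmetric test matrix, real eigenvalues

radius = 1.5 * np.max(np.abs(np.linalg.eigvalsh(A)))   # circle safely enclosing the spectrum
n_nodes = 64
theta = 2 * np.pi * np.arange(n_nodes) / n_nodes
z = radius * np.exp(1j * theta)                        # quadrature nodes on the contour

f = np.exp                                             # the analytic function applied to A
approx = np.zeros_like(A, dtype=complex)
for zk in z:
    resolvent = np.linalg.inv(zk * np.eye(A.shape[0]) - A)
    approx += f(zk) * zk * resolvent                   # dz = i*z_k*dtheta cancels the 1/(2*pi*i)
approx /= n_nodes

print(np.max(np.abs(approx.real - expm(A))))           # should be tiny for an analytic f
```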

Continuously Adapting Random Sampling (CARS) for Power Electronics Parameter Design

  • paper_url: http://arxiv.org/abs/2310.10425
  • repo_url: None
  • paper_authors: Dominik Happel, Philipp Brendel, Andreas Rosskopf, Stefan Ditze
  • for: 这个论文主要针对的是电子能源参数设计任务的优化问题,通常使用详细的优化方法或者笨拙的搜索方法来解决。
  • methods: 该论文提出了一种新的方法 named “Continuously Adapting Random Sampling” (CARS),它提供了一种连续的方法,位于详细优化方法和笨拙搜索方法之间。这种方法可以快速地进行大量的 simulations,同时逐渐增加关注最有前途的参数范围。这个方法 Draws inspiration from multi-armed bandit research and leads to prioritized sampling of sub-domains in one high-dimensional parameter tensor。
  • results: 该论文对三个例子的电子能源使用情况进行了评估,得到的设计与遗传算法相当竞争力,同时具有高度并行化的 simulate 特点和不断进行探索和利用设置之间的融合。
    Abstract To date, power electronics parameter design tasks are usually tackled using detailed optimization approaches with detailed simulations or using brute-force grid search with very fast simulations. A new method, named "Continuously Adapting Random Sampling" (CARS) is proposed, which provides a continuous method in between. This allows for very fast, and / or large amounts of simulations, but increasingly focuses on the most promising parameter ranges. Inspirations are drawn from multi-armed bandit research and lead to prioritized sampling of sub-domains in one high-dimensional parameter tensor. Performance has been evaluated on three exemplary power electronic use-cases, where resulting designs appear competitive to genetic algorithms, but additionally allow for highly parallelizable simulation, as well as continuous progression between explorative and exploitative settings.
    摘要 迄今,电力电子参数设计任务通常使用基于详细仿真的精细优化方法,或使用基于极快仿真的暴力网格搜索来解决。本文提出了一种新方法,名为"连续适应随机抽样"(CARS),它提供了一种介于二者之间的连续方法:既允许非常快速或大量的仿真,又能逐渐聚焦于最有前景的参数范围。该方法借鉴了多臂老虎机研究,对一个高维参数张量中的子域进行优先级抽样。在三个典型的电力电子应用案例上的评估表明,所得设计与遗传算法相比具有竞争力,同时支持高度并行化的仿真,并可在探索与利用设置之间连续过渡。
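
The prioritized sub-domain sampling described above can be pictured with a minimal bandit-style loop: split each parameter range into bins, sample bins in proportion to a running score, and reward bins that produce good designs. The objective, binning, reward rule and decay factor below are illustrative assumptions, not the CARS algorithm from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical fast "simulation": loss of a two-parameter converter design (stand-in only).
def simulate(params):
    x, y = params
    return (x - 0.7) ** 2 + (y - 0.2) ** 2

bounds = np.array([[0.0, 1.0], [0.0, 1.0]])    # parameter ranges
n_bins = 8                                     # sub-domains per dimension
scores = np.ones((len(bounds), n_bins))        # running preference per sub-domain

best = (np.inf, None)
for step in range(2000):
    # Pick one sub-domain per dimension proportionally to its score (bandit-style),
    # then draw the parameter uniformly inside that sub-domain.
    params = np.empty(len(bounds))
    bins = np.empty(len(bounds), dtype=int)
    for d, (lo, hi) in enumerate(bounds):
        p = scores[d] / scores[d].sum()
        b = rng.choice(n_bins, p=p)
        width = (hi - lo) / n_bins
        params[d] = lo + (b + rng.random()) * width
        bins[d] = b

    loss = simulate(params)
    if loss < best[0]:
        best = (loss, params.copy())

    # Reward the chosen sub-domains when the sample is close to the current best;
    # a slow decay keeps early, unlucky regions from being ruled out forever.
    reward = 1.0 if loss <= best[0] * 1.5 else 0.0
    for d in range(len(bounds)):
        scores[d] *= 0.999
        scores[d, bins[d]] += reward

print("best loss", best[0], "at", best[1])
```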

Towards Fair and Calibrated Models

  • paper_url: http://arxiv.org/abs/2310.10399
  • repo_url: None
  • paper_authors: Anand Brahmbhatt, Vipul Rathore, Mausam, Parag Singla
  • for: 建立既公平(无偏)又校准良好的机器学习模型
  • methods: 采用特定的公平性定义,证明对敏感属性进行分组校准即可在该定义下自动获得公平性;据此提出一种基于温度缩放(temperature scaling)的简单后处理技术,并修改现有校准损失以实现分组校准
  • results: 在多种数据集上进行了广泛实验,表明这些技术能够得到公平且校准良好的模型,并对所得解的帕累托最优性给出了分析与洞察
    Abstract Recent literature has seen a significant focus on building machine learning models with specific properties such as fairness, i.e., being non-biased with respect to a given set of attributes, calibration i.e., model confidence being aligned with its predictive accuracy, and explainability, i.e., ability to be understandable to humans. While there has been work focusing on each of these aspects individually, researchers have shied away from simultaneously addressing more than one of these dimensions. In this work, we address the problem of building models which are both fair and calibrated. We work with a specific definition of fairness, which closely matches [Biswas et. al. 2019], and has the nice property that Bayes optimal classifier has the maximum possible fairness under our definition. We show that an existing negative result towards achieving a fair and calibrated model [Kleinberg et. al. 2017] does not hold for our definition of fairness. Further, we show that ensuring group-wise calibration with respect to the sensitive attributes automatically results in a fair model under our definition. Using this result, we provide a first cut approach for achieving fair and calibrated models, via a simple post-processing technique based on temperature scaling. We then propose modifications of existing calibration losses to perform group-wise calibration, as a way of achieving fair and calibrated models in a variety of settings. Finally, we perform extensive experimentation of these techniques on a diverse benchmark of datasets, and present insights on the pareto-optimality of the resulting solutions.
    摘要
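
The abstract's post-processing route to group-wise calibration can be sketched as fitting one temperature per sensitive group on held-out logits. The sketch below is a generic per-group temperature-scaling pass on synthetic data; the group variable, data and optimizer settings are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def nll(temperature, logits, labels):
    # Binary cross-entropy of temperature-scaled logits.
    p = 1.0 / (1.0 + np.exp(-logits / temperature))
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p))

def fit_groupwise_temperatures(logits, labels, groups):
    """Fit one temperature per sensitive group on held-out validation data."""
    temps = {}
    for g in np.unique(groups):
        mask = groups == g
        res = minimize_scalar(nll, bounds=(0.05, 20.0), method="bounded",
                              args=(logits[mask], labels[mask]))
        temps[g] = res.x
    return temps

# Toy validation set: group 1's logits are systematically overconfident.
rng = np.random.default_rng(0)
n = 4000
groups = rng.integers(0, 2, size=n)
labels = rng.integers(0, 2, size=n)
logits = (2 * labels - 1) * rng.gamma(2.0, 1.0, size=n)
logits[groups == 1] *= 3.0                      # overconfidence for group 1

temps = fit_groupwise_temperatures(logits, labels, groups)
print(temps)  # group 1 should receive a larger temperature than group 0
```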

Revisiting Logistic-softmax Likelihood in Bayesian Meta-Learning for Few-Shot Classification

  • paper_url: http://arxiv.org/abs/2310.10379
  • repo_url: https://github.com/keanson/revisit-logistic-softmax
  • paper_authors: Tianjun Ke, Haoqun Cao, Zenan Ling, Feng Zhou
  • for: 这个论文主要研究了如何使用逻辑-软泛函数来提高几个shot分类(FSC)中的不确定性评估和性能。
  • methods: 该论文使用了 bayesian 方法来 caracterize uncertainty in FSC,并使用了修改后的逻辑-软泛函数来控制先前不确定性的问题。
  • results: 该论文通过 theoretically 和 empirically 表明,修改后的逻辑-软泛函数可以提高 uncertainty 估计的准确性和性能,并且可以在标准 benchmark 数据集上达到或超过相同水平。
    Abstract Meta-learning has demonstrated promising results in few-shot classification (FSC) by learning to solve new problems using prior knowledge. Bayesian methods are effective at characterizing uncertainty in FSC, which is crucial in high-risk fields. In this context, the logistic-softmax likelihood is often employed as an alternative to the softmax likelihood in multi-class Gaussian process classification due to its conditional conjugacy property. However, the theoretical property of logistic-softmax is not clear and previous research indicated that the inherent uncertainty of logistic-softmax leads to suboptimal performance. To mitigate these issues, we revisit and redesign the logistic-softmax likelihood, which enables control of the \textit{a priori} confidence level through a temperature parameter. Furthermore, we theoretically and empirically show that softmax can be viewed as a special case of logistic-softmax and logistic-softmax induces a larger family of data distribution than softmax. Utilizing modified logistic-softmax, we integrate the data augmentation technique into the deep kernel based Gaussian process meta-learning framework, and derive an analytical mean-field approximation for task-specific updates. Our approach yields well-calibrated uncertainty estimates and achieves comparable or superior results on standard benchmark datasets. Code is publicly available at \url{https://github.com/keanson/revisit-logistic-softmax}.
    摘要 元学习(meta-learning)通过利用先验知识来解决新问题,在少样本分类(FSC)中展现出了可喜的效果。贝叶斯方法能够很好地刻画FSC中的不确定性,这在高风险领域尤为重要。在这种情况下,由于具有条件共轭性质,logistic-softmax似然常被用作多类高斯过程分类中softmax似然的替代。然而,logistic-softmax的理论性质尚不清楚,已有研究表明其固有的不确定性会导致次优的性能。为了解决这些问题,我们重新审视并重新设计了logistic-softmax似然,使其可以通过温度参数控制先验置信水平。此外,我们从理论和实验两方面表明,softmax可以被视为logistic-softmax的特殊情况,且logistic-softmax能诱导出比softmax更大的一族数据分布。基于修改后的logistic-softmax,我们将数据增强技术整合进基于深度核的高斯过程元学习框架,并推导出针对任务特定更新的解析平均场近似。我们的方法能给出校准良好的不确定性估计,并在标准基准数据集上取得相当或更优的结果。代码公开于 https://github.com/keanson/revisit-logistic-softmax 。
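
As a rough illustration of how a temperature parameter can control the confidence of a logistic-softmax likelihood, the sketch below normalizes sigmoid-transformed logits. The exact parameterization used in the paper may differ (see the linked repository), so the functional form here is an assumption.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def logistic_softmax(f, temperature=1.0):
    """Hypothetical temperature-controlled logistic-softmax over class logits f.

    Assumed form: p(y = c | f) proportional to sigmoid(f_c / temperature);
    the paper's exact parameterization may differ (see its repository).
    """
    s = sigmoid(np.asarray(f) / temperature)
    return s / s.sum()

def softmax(f):
    e = np.exp(f - np.max(f))
    return e / e.sum()

f = np.array([2.0, 0.5, -1.0])
for t in (5.0, 1.0, 0.2):
    print(f"temperature={t}:", np.round(logistic_softmax(f, t), 3))
print("softmax:        ", np.round(softmax(f), 3))
```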

Multi-Factor Spatio-Temporal Prediction based on Graph Decomposition Learning

  • paper_url: http://arxiv.org/abs/2310.10374
  • repo_url: None
  • paper_authors: Jiahao Ji, Jingyuan Wang, Yu Mou, Cheng Long
  • for: 本文提出了一种多因素空间时间预测任务,用于预测不同因素的空间时间数据的发展趋势。
  • methods: 本文提出了一种基于层次分解策略的 theoretically 有效的方法,以及一种名为空间时间图分解学习(STGDL)的模型无关框架。STGDL 包括两个主要组成部分:自动图分解模块和分解学习网络。
  • results: 对四个实际的空间时间数据集进行了广泛的实验,结果显示,使用本文提出的方法可以significantly 降低不同模型的预测错误率,最高降低到35.36%。此外,一个案例研究也表明了本方法的可解释性潜力。
    Abstract Spatio-temporal (ST) prediction is an important and widely used technique in data mining and analytics, especially for ST data in urban systems such as transportation data. In practice, the ST data generation is usually influenced by various latent factors tied to natural phenomena or human socioeconomic activities, impacting specific spatial areas selectively. However, existing ST prediction methods usually do not refine the impacts of different factors, but directly model the entangled impacts of multiple factors. This amplifies the modeling complexity of ST data and compromises model interpretability. To this end, we propose a multi-factor ST prediction task that predicts partial ST data evolution under different factors, and combines them for a final prediction. We make two contributions to this task: an effective theoretical solution and a portable instantiation framework. Specifically, we first propose a theoretical solution called decomposed prediction strategy and prove its effectiveness from the perspective of information entropy theory. On top of that, we instantiate a novel model-agnostic framework, named spatio-temporal graph decomposition learning (STGDL), for multi-factor ST prediction. The framework consists of two main components: an automatic graph decomposition module that decomposes the original graph structure inherent in ST data into subgraphs corresponding to different factors, and a decomposed learning network that learns the partial ST data on each subgraph separately and integrates them for the final prediction. We conduct extensive experiments on four real-world ST datasets of two types of graphs, i.e., grid graph and network graph. Results show that our framework significantly reduces prediction errors of various ST models by 9.41% on average (35.36% at most). Furthermore, a case study reveals the interpretability potential of our framework.
    摘要 时空(ST)预测是数据挖掘与分析中一项重要且应用广泛的技术,尤其适用于城市系统中的时空数据(如交通数据)。在实践中,时空数据的生成通常受到与自然现象或人类社会经济活动相关的多种潜在因素影响,这些因素有选择性地作用于特定的空间区域。然而,现有的时空预测方法通常不会区分不同因素的影响,而是直接对多种因素纠缠在一起的影响进行建模,这既增加了建模复杂性,也损害了模型的可解释性。为此,我们提出一个多因素时空预测任务:分别预测不同因素下的部分时空数据演化,再将其组合得到最终预测。我们为该任务做出两项贡献:一个有效的理论解决方案和一个可移植的实例化框架。具体而言,我们首先提出名为分解预测策略的理论方案,并从信息熵理论的角度证明其有效性;在此基础上,我们实例化了一个与模型无关的新框架——时空图分解学习(STGDL)。该框架包含两个主要组成部分:一个自动图分解模块,将时空数据中固有的原始图结构分解为对应不同因素的子图;以及一个分解学习网络,在每个子图上分别学习部分时空数据,并将其整合用于最终预测。我们在两类图(网格图与网络图)的四个真实时空数据集上进行了大量实验,结果表明该框架可将多种时空模型的预测误差平均降低9.41%(最多35.36%)。此外,案例研究也展示了该框架的可解释性潜力。

Machine learning in physics: a short guide

  • paper_url: http://arxiv.org/abs/2310.10368
  • repo_url: https://github.com/franciscorodrigues-usp/MLP
  • paper_authors: Francisco A. Rodrigues
  • for: Physics field (物理领域)
  • methods: Machine learning(机器学习)
  • results: 综述了机器学习在物理学中的主要应用及相关挑战与展望,并涵盖因果推理、符号回归、深度学习等专题(causal inference, symbolic regression, deep learning)
    Abstract Machine learning is a rapidly growing field with the potential to revolutionize many areas of science, including physics. This review provides a brief overview of machine learning in physics, covering the main concepts of supervised, unsupervised, and reinforcement learning, as well as more specialized topics such as causal inference, symbolic regression, and deep learning. We present some of the principal applications of machine learning in physics and discuss the associated challenges and perspectives.
    摘要 机器学习是一个迅速成长的领域,拥有可能改革多个科学领域的潜力,包括物理学。本篇文章提供了物理学中机器学习的简要总览,涵盖主要概念的监督学习、无监督学习和强化学习,以及更特殊的主题,如 causal inference、符号回传和深度学习。我们介绍了物理学中机器学习的主要应用和相关挑战,以及未来的展望。

Advantages of Machine Learning in Bus Transport Analysis

  • paper_url: http://arxiv.org/abs/2310.19810
  • repo_url: None
  • paper_authors: Amirsadegh Roshanzamir
  • for: 这个研究旨在使用指导学习算法分析特拉特当地公共汽车系统的准时性。
  • methods: 该研究使用了各种指导学习算法,包括Python的Sci Kit Learn和Stats Models库,以建立准确的模型,能够预测任何一天是否会遵循公共汽车路线的时间标准。
  • results: 研究发现,指导学习算法最重要的考虑因素是公共汽车路线的效率,这对于改善公共汽车系统的性能提供了重要的洞察。
    Abstract Supervised Machine Learning is an innovative method that aims to mimic human learning by using past experiences. In this study, we utilize supervised machine learning algorithms to analyze the factors that contribute to the punctuality of Tehran BRT bus system. We gather publicly available datasets of 2020 to 2022 from Municipality of Tehran to train and test our models. By employing various algorithms and leveraging Python's Sci Kit Learn and Stats Models libraries, we construct accurate models capable of predicting whether a bus route will meet the prescribed standards for on-time performance on any given day. Furthermore, we delve deeper into the decision-making process of each algorithm to determine the most influential factor it considers. This investigation allows us to uncover the key feature that significantly impacts the effectiveness of bus routes, providing valuable insights for improving their performance.
    摘要 监督式机器学习是一种创新的方法,旨在利用过去的经验来模仿人类学习的方式。在这项研究中,我们使用监督式机器学习算法来分析影响德黑兰BRT公交系统准时性的因素。我们使用德黑兰市政府2020年至2022年的公开数据集来训练和测试模型。通过采用不同的算法并利用Python的Sci Kit Learn和Stats Models库,我们构建了准确的模型,能够预测某条公交线路在任意一天是否满足规定的准时性标准。此外,我们还深入探究了每个算法的决策过程,以确定其最重要的考虑因素。这一分析帮助我们找出显著影响公交线路效果的关键特征,为改进其性能提供有价值的洞察。
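
A minimal scikit-learn sketch of the workflow described above: train a classifier to predict whether a route meets an on-time standard, then read off feature importances to find the most influential factor. The feature names, label rule and data are synthetic stand-ins for the Tehran Municipality records, which are not reproduced here.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hypothetical stand-in for the BRT records; the real features and on-time criterion
# come from the Municipality of Tehran datasets and are not shown here.
rng = np.random.default_rng(0)
n = 5000
data = pd.DataFrame({
    "route_length_km":   rng.uniform(5, 40, n),
    "scheduled_headway": rng.uniform(3, 20, n),
    "day_of_week":       rng.integers(0, 7, n),
    "rainfall_mm":       rng.exponential(2.0, n),
})
# Synthetic label: a route is "on time" when its implied efficiency is high enough.
efficiency = data["scheduled_headway"] / data["route_length_km"] - 0.05 * data["rainfall_mm"]
data["on_time"] = (efficiency + rng.normal(0, 0.1, n) > 0.4).astype(int)

X, y = data.drop(columns="on_time"), data["on_time"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
# Inspect which factor the model leans on most, as the paper does for its classifiers.
print(dict(zip(X.columns, np.round(model.feature_importances_, 3))))
```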

MgNO: Efficient Parameterization of Linear Operators via Multigrid

  • paper_url: http://arxiv.org/abs/2310.19809
  • repo_url: None
  • paper_authors: Juncai He, Xinliang Liu, Jinchao Xu
  • for: 这个论文旨在提出一种简洁的神经网络架构,用于学习运算。
  • methods: 该方法使用了神经网络中的非线性运算层,其输出可以表示为 $\mathcal O_i(u) = \sigma\left( \sum_j \mathcal W_{ij} u + \mathcal B_{ij}\right)$。在这里,$\mathcal W_{ij}$ 是将 $j $- 个输入神经元连接到 $i $- 个输出神经元的半bounded线性算子,而偏置 $\mathcal B_{ij}$ 是一个函数而不是整数。
  • results: 该方法可以准确地解决不同类型的偏微分方程(PDEs),并且在训练时显示出了更高的易学性和更低的抗抑阻性。
    Abstract In this work, we propose a concise neural operator architecture for operator learning. Drawing an analogy with a conventional fully connected neural network, we define the neural operator as follows: the output of the $i$-th neuron in a nonlinear operator layer is defined by $\mathcal O_i(u) = \sigma\left( \sum_j \mathcal W_{ij} u + \mathcal B_{ij}\right)$. Here, $\mathcal W_{ij}$ denotes the bounded linear operator connecting $j$-th input neuron to $i$-th output neuron, and the bias $\mathcal B_{ij}$ takes the form of a function rather than a scalar. Given its new universal approximation property, the efficient parameterization of the bounded linear operators between two neurons (Banach spaces) plays a critical role. As a result, we introduce MgNO, utilizing multigrid structures to parameterize these linear operators between neurons. This approach offers both mathematical rigor and practical expressivity. Additionally, MgNO obviates the need for conventional lifting and projecting operators typically required in previous neural operators. Moreover, it seamlessly accommodates diverse boundary conditions. Our empirical observations reveal that MgNO exhibits superior ease of training compared to other CNN-based models, while also displaying a reduced susceptibility to overfitting when contrasted with spectral-type neural operators. We demonstrate the efficiency and accuracy of our method with consistently state-of-the-art performance on different types of partial differential equations (PDEs).
    摘要 在这项工作中,我们提出了一种简洁的神经算子架构,用于算子学习。类比于传统的全连接神经网络,我们将神经算子定义为:非线性算子层中第$i$个神经元的输出为 $\mathcal O_i(u) = \sigma\left(\sum_j \mathcal W_{ij} u + \mathcal B_{ij}\right)$,其中 $\mathcal W_{ij}$ 表示连接第$j$个输入神经元与第$i$个输出神经元的有界线性算子,而偏置 $\mathcal B_{ij}$ 是一个函数而非标量。鉴于其新的通用逼近性质,如何高效地参数化两个神经元(Banach空间)之间的有界线性算子起着关键作用。为此,我们引入MgNO,利用多重网格结构来参数化这些线性算子,该方法兼具数学上的严谨性与实际中的表达力。此外,MgNO省去了以往神经算子通常所需的提升与投影算子,并能自然地适配多种边界条件。实验观察表明,MgNO比其他基于CNN的模型更易训练,同时与谱方法类神经算子相比更不易过拟合。我们在多种类型的偏微分方程(PDEs)上取得了持续领先的性能,证明了该方法的效率与准确性。
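
The layer definition quoted in the abstract can be read literally once functions are discretized on a grid: each output neuron applies one linear operator per input neuron plus a function-valued bias, followed by a pointwise nonlinearity. In the sketch below the operators are plain dense matrices standing in for MgNO's multigrid-parameterized operators, which are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_grid = 64           # discretization points of a 1D domain
n_in, n_out = 3, 4    # neurons = functions per layer

# Generic operator layer O_i(u) = sigma( sum_j W_ij u_j + B_ij ).
# Each W_ij is a dense matrix here; the paper instead parameterizes these
# operators with a multigrid structure.
W = rng.standard_normal((n_out, n_in, n_grid, n_grid)) / n_grid
B = rng.standard_normal((n_out, n_in, n_grid)) * 0.1

def operator_layer(u):
    # u has shape (n_in, n_grid): one discretized function per input neuron.
    out = np.zeros((n_out, n_grid))
    for i in range(n_out):
        acc = np.zeros(n_grid)
        for j in range(n_in):
            acc += W[i, j] @ u[j] + B[i, j]   # linear operator plus function-valued bias
        out[i] = np.maximum(acc, 0.0)         # pointwise nonlinearity sigma
    return out

x = np.linspace(0, 1, n_grid)
u0 = np.stack([np.sin(2 * np.pi * x), np.cos(2 * np.pi * x), x])
print(operator_layer(u0).shape)   # (4, 64): four output functions on the grid
```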

An Anytime Algorithm for Good Arm Identification

  • paper_url: http://arxiv.org/abs/2310.10359
  • repo_url: None
  • paper_authors: Marc Jourdan, Clémence Réda
  • for: 这 paper 的目的是解决在固定预算和时间限制下的好臂标识问题(GAI)。
  • methods: 这 paper 提出了一种无参数和时间自适应的采样规则,称为 APGAI,可以在固定信度和预算设置下使用。
  • results: 作者提供了关于 APGAI 的Upper bound 的概率错误和预测采样复杂性的证明,以及实验结果表明 APGAI 在 synthetic 和实际数据上具有良好的表现。
    Abstract In good arm identification (GAI), the goal is to identify one arm whose average performance exceeds a given threshold, referred to as good arm, if it exists. Few works have studied GAI in the fixed-budget setting, when the sampling budget is fixed beforehand, or the anytime setting, when a recommendation can be asked at any time. We propose APGAI, an anytime and parameter-free sampling rule for GAI in stochastic bandits. APGAI can be straightforwardly used in fixed-confidence and fixed-budget settings. First, we derive upper bounds on its probability of error at any time. They show that adaptive strategies are more efficient in detecting the absence of good arms than uniform sampling. Second, when APGAI is combined with a stopping rule, we prove upper bounds on the expected sampling complexity, holding at any confidence level. Finally, we show good empirical performance of APGAI on synthetic and real-world data. Our work offers an extensive overview of the GAI problem in all settings.
    摘要 在好臂识别(GAI)问题中,目标是在存在的情况下,找出一个平均表现超过给定阈值的臂(即"好臂")。已有研究较少涉及固定预算设定(抽样预算事先固定)或任意时间设定(可在任意时刻请求推荐)下的GAI。我们提出了APGAI,一种适用于随机多臂老虎机的、任意时间且无参数的抽样规则,可直接用于固定置信度和固定预算两种设定。首先,我们推导了APGAI在任意时刻的错误概率上界,结果表明自适应策略在检测不存在好臂时比均匀抽样更高效。其次,当APGAI与停止规则结合使用时,我们证明了在任意置信水平下成立的期望抽样复杂度上界。最后,我们在合成数据和真实数据上展示了APGAI良好的实证表现。我们的工作对各种设定下的GAI问题给出了全面的概述。

Hamming Encoder: Mining Discriminative k-mers for Discrete Sequence Classification

  • paper_url: http://arxiv.org/abs/2310.10321
  • repo_url: None
  • paper_authors: Junjie Dong, Mudi Jiang, Lianyu Hu, Zengyou He
  • for: 该论文的目的是提出一种新的序列分类方法,以解决现有方法中的一些挑战,如缺乏特征组合的探索和精度下降。
  • methods: 该方法基于1D卷积神经网络(1DCNN)架构,并采用哈明距离基于相似度度量来确保特征挖掘和分类过程中的一致性。具体来说,该方法首先训练一个可解释的CNNEncoder对序列数据进行学习,然后通过梯度下降方式搜索出高度探索的k-mer组合。
  • results: 实验结果表明,该方法在分类精度方面比现有的状态作法更高。
    Abstract Sequence classification has numerous applications in various fields. Despite extensive studies in the last decades, many challenges still exist, particularly in pattern-based methods. Existing pattern-based methods measure the discriminative power of each feature individually during the mining process, leading to the result of missing some combinations of features with discriminative power. Furthermore, it is difficult to ensure the overall discriminative performance after converting sequences into feature vectors. To address these challenges, we propose a novel approach called Hamming Encoder, which utilizes a binarized 1D-convolutional neural network (1DCNN) architecture to mine discriminative k-mer sets. In particular, we adopt a Hamming distance-based similarity measure to ensure consistency in the feature mining and classification procedure. Our method involves training an interpretable CNN encoder for sequential data and performing a gradient-based search for discriminative k-mer combinations. Experiments show that the Hamming Encoder method proposed in this paper outperforms existing state-of-the-art methods in terms of classification accuracy.
    摘要 序列分类在各个领域都有广泛的应用。尽管过去几十年已有大量研究,许多挑战仍然存在,特别是在基于模式的方法中。现有的基于模式的方法在挖掘过程中单独衡量每个特征的判别能力,因而可能遗漏某些具有判别力的特征组合;此外,在将序列转换为特征向量之后,也难以保证整体的判别性能。为了解决这些挑战,我们提出了一种称为Hamming Encoder的新方法,它利用二值化的一维卷积神经网络(1DCNN)架构来挖掘具有判别力的k-mer集合。特别地,我们采用基于汉明距离的相似度度量,以保证特征挖掘与分类过程的一致性。该方法先为序列数据训练一个可解释的CNN编码器,再通过基于梯度的搜索寻找具有判别力的k-mer组合。实验表明,本文提出的Hamming Encoder方法在分类准确率上优于现有的最先进方法。

  • paper_url: http://arxiv.org/abs/2310.10315
  • repo_url: None
  • paper_authors: Kamila Zaman, Alberto Marchisio, Muhammad Abdullah Hanif, Muhammad Shafique
  • for: 这篇论文主要是为了提供一个全面的Quantum Machine Learning(QML)领域的审视,并对不同的QML算法、量子数据集、硬件技术、软件工具、模拟器和应用场景进行了详细的介绍。
  • methods: 本论文使用了多种方法,包括论述基础概念、对класситиче计算的比较、介绍不同的QML算法和其适用领域、描述量子数据集和硬件技术的发展,以及介绍软件工具和模拟器。
  • results: 本论文提供了大量有价值的信息和资源,可以帮助读者快速入门到当前QML领域的state-of-the-art技术。
    Abstract Quantum Computing (QC) claims to improve the efficiency of solving complex problems, compared to classical computing. When QC is applied to Machine Learning (ML) applications, it forms a Quantum Machine Learning (QML) system. After discussing the basic concepts of QC and its advantages over classical computing, this paper reviews the key aspects of QML in a comprehensive manner. We discuss different QML algorithms and their domain applicability, quantum datasets, hardware technologies, software tools, simulators, and applications. In this survey, we provide valuable information and resources for readers to jumpstart into the current state-of-the-art techniques in the QML field.
    摘要 量子计算(QC)宣称可以提高解决复杂问题的效率,相比于经典计算。当QC应用于机器学习(ML)应用时,它形成了量子机器学习(QML)系统。本文详细介绍了QML的关键方面,包括不同的QML算法和它们的领域应用、量子数据集、硬件技术、软件工具、模拟器和应用。本文提供了读者们进入现有技术领域的价值信息和资源,以便他们可以快速掌握当前领域的最新技术。

Transparent Anomaly Detection via Concept-based Explanations

  • paper_url: http://arxiv.org/abs/2310.10702
  • repo_url: None
  • paper_authors: Laya Rafiee Sevyeri, Ivaxi Sheth, Farhood Farahnak, Shirin Abbasinejad Enger
  • for: 本文提出了一种可解释的异常检测方法,以提高异常检测的可读性和人类可解释性。
  • methods: 本文使用了一种基于概念学习的异常检测方法,可以提供人类可解释的概念解释。此外,本文还提出了一种可与其他分类型异常检测方法集成的概念学习方法。
  • results: 本文通过三个实际数据集的实验表明,ACE方法可以提供高或相当于黑色盒模型的准确率,同时具有人类可解释的优势。
    Abstract Advancements in deep learning techniques have given a boost to the performance of anomaly detection. However, real-world and safety-critical applications demand a level of transparency and reasoning beyond accuracy. The task of anomaly detection (AD) focuses on finding whether a given sample follows the learned distribution. Existing methods lack the ability to reason with clear explanations for their outcomes. Hence to overcome this challenge, we propose Transparent {A}nomaly Detection {C}oncept {E}xplanations (ACE). ACE is able to provide human interpretable explanations in the form of concepts along with anomaly prediction. To the best of our knowledge, this is the first paper that proposes interpretable by-design anomaly detection. In addition to promoting transparency in AD, it allows for effective human-model interaction. Our proposed model shows either higher or comparable results to black-box uninterpretable models. We validate the performance of ACE across three realistic datasets - bird classification on CUB-200-2011, challenging histopathology slide image classification on TIL-WSI-TCGA, and gender classification on CelebA. We further demonstrate that our concept learning paradigm can be seamlessly integrated with other classification-based AD methods.
    摘要 深度学习技术的进步提升了异常检测的性能。然而,现实世界和安全攸关的应用不仅要求精度,还要求一定程度的透明度与推理能力。异常检测(AD)任务的目标是判断给定样本是否服从已学习的分布,而现有方法缺乏为其结果给出清晰解释的能力。为此,我们提出了透明异常检测概念解释(ACE)。ACE能够在给出异常预测的同时,以概念的形式提供人类可理解的解释。据我们所知,这是第一篇提出"可解释性由设计保证"的异常检测方法的论文。除了提升AD的透明度外,它还支持有效的人机交互。我们提出的模型取得了高于或与黑盒不可解释模型相当的结果。我们在三个真实数据集上验证了ACE的性能:CUB-200-2011上的鸟类分类、TIL-WSI-TCGA上具有挑战性的组织病理切片图像分类,以及CelebA上的性别分类。我们进一步证明,我们的概念学习范式可以无缝地与其他基于分类的AD方法集成。

Time integration schemes based on neural networks for solving partial differential equations on coarse grids

  • paper_url: http://arxiv.org/abs/2310.10308
  • repo_url: None
  • paper_authors: Xinxin Yan, Zhideng Zhou, Xiaohan Cheng, Xiaolei Yang
  • for: 本研究旨在提出一种基于神经网络的时间步长学习方法,以满足不同数学条件的需求。
  • methods: 本研究使用神经网络学习3步线性多步法,并应用到了三个模拟问题中,即一维热方程、一维波方程和一维吸引方程。
  • results: 结果显示,学习的完全约束方法的预测误差与Runge-Kutta方法和Adams-Bashforth方法的预测误差几乎相同。相比传统方法,学习的无约束和半约束方法在粗网格上显著减少预测误差,特别是对一维热方程的温度预测有显著改善。在4倍粗网格上,一些热方程的 casos 的 Mean Square Error 可以减少一个数量级,而波方程的预测相比传统方法具有明显的改善。在32倍粗网格上,Burgers方程的 Mean Square Error 可以减少35%-40%。
    Abstract The accuracy of solving partial differential equations (PDEs) on coarse grids is greatly affected by the choice of discretization schemes. In this work, we propose to learn time integration schemes based on neural networks which satisfy three distinct sets of mathematical constraints, i.e., unconstrained, semi-constrained with the root condition, and fully-constrained with both root and consistency conditions. We focus on the learning of 3-step linear multistep methods, which we subsequently applied to solve three model PDEs, i.e., the one-dimensional heat equation, the one-dimensional wave equation, and the one-dimensional Burgers' equation. The results show that the prediction error of the learned fully-constrained scheme is close to that of the Runge-Kutta method and Adams-Bashforth method. Compared to the traditional methods, the learned unconstrained and semi-constrained schemes significantly reduce the prediction error on coarse grids. On a grid that is 4 times coarser than the reference grid, the mean square error shows a reduction of up to an order of magnitude for some of the heat equation cases, and a substantial improvement in phase prediction for the wave equation. On a 32 times coarser grid, the mean square error for the Burgers' equation can be reduced by up to 35% to 40%.
    摘要 在粗网格上求解偏微分方程(PDEs)时,离散格式的选择对求解精度有很大影响。在这项工作中,我们提出基于神经网络学习时间积分格式,并使其满足三组不同的数学约束:无约束、带根条件的半约束、以及同时带根条件与一致性条件的完全约束。我们重点学习3步线性多步法,并将其应用于三个模型PDE:一维热方程、一维波方程和一维Burgers方程。结果表明,学习得到的完全约束格式的预测误差与Runge-Kutta方法和Adams-Bashforth方法相近;与传统方法相比,学习得到的无约束和半约束格式能显著降低粗网格上的预测误差。在比参考网格粗4倍的网格上,部分热方程算例的均方误差最多可降低一个数量级,波方程的相位预测也有明显改善;在粗32倍的网格上,Burgers方程的均方误差可降低35%至40%。
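
For reference, the classical 3-step Adams-Bashforth scheme that the learned methods are compared against takes the form $u_{n+1} = u_n + \tfrac{h}{12}(23 f_n - 16 f_{n-1} + 5 f_{n-2})$. The sketch below applies it to the 1D heat equation on a coarse grid; it shows only this baseline, not the learned coefficients, and the grid and step sizes are illustrative choices.

```python
import numpy as np

# 1D heat equation u_t = u_xx on (0, 1), u(0)=u(1)=0, u(x,0)=sin(pi*x),
# central differences in space, classical 3-step Adams-Bashforth in time.
nx, kappa, T = 16, 1.0, 0.1
dx = 1.0 / (nx + 1)
x = np.linspace(dx, 1.0 - dx, nx)             # interior grid points
dt = 2.5e-4
n_steps = int(round(T / dt))

def rhs(u):
    # Second-order central-difference Laplacian with homogeneous Dirichlet BCs.
    lap = np.empty_like(u)
    lap[1:-1] = u[2:] - 2 * u[1:-1] + u[:-2]
    lap[0] = u[1] - 2 * u[0]
    lap[-1] = u[-2] - 2 * u[-1]
    return kappa * lap / dx**2

u = np.sin(np.pi * x)
history = [rhs(u)]
for n in range(n_steps):
    if len(history) < 3:                      # bootstrap the first two steps with forward Euler
        u = u + dt * history[-1]
    else:                                     # u_{n+1} = u_n + h(23 f_n - 16 f_{n-1} + 5 f_{n-2}) / 12
        f0, f1, f2 = history[-1], history[-2], history[-3]
        u = u + dt * (23 * f0 - 16 * f1 + 5 * f2) / 12.0
    history.append(rhs(u))

exact = np.exp(-kappa * np.pi**2 * T) * np.sin(np.pi * x)
print("max abs error:", np.max(np.abs(u - exact)))   # dominated by the coarse spatial grid
```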

Mimicking the Maestro: Exploring the Efficacy of a Virtual AI Teacher in Fine Motor Skill Acquisition

  • paper_url: http://arxiv.org/abs/2310.10280
  • repo_url: None
  • paper_authors: Hadar Mulian, Segev Shlomov, Lior Limonad
  • for: 这项研究旨在探讨人工智能教师模型在促进细动技能学习中的潜在优势,以提高学习效率和学习结果的一致性。
  • methods: 该研究采用了人工智能学习和仿真学习方法,通过模拟教师与学生之间的互动来评估人工智能教师模型的效果。
  • results: 研究发现,使用人工智能教师模型可以提高学习效率和学习结果的一致性,并且可以适应不同的学生和学习环境。
    Abstract Motor skills, especially fine motor skills like handwriting, play an essential role in academic pursuits and everyday life. Traditional methods to teach these skills, although effective, can be time-consuming and inconsistent. With the rise of advanced technologies like robotics and artificial intelligence, there is increasing interest in automating such teaching processes using these technologies, via human-robot and human-computer interactions. In this study, we examine the potential of a virtual AI teacher in emulating the techniques of human educators for motor skill acquisition. We introduce an AI teacher model that captures the distinct characteristics of human instructors. Using a Reinforcement Learning environment tailored to mimic teacher-learner interactions, we tested our AI model against four guiding hypotheses, emphasizing improved learner performance, enhanced rate of skill acquisition, and reduced variability in learning outcomes. Our findings, validated on synthetic learners, revealed significant improvements across all tested hypotheses. Notably, our model showcased robustness across different learners and settings and demonstrated adaptability to handwriting. This research underscores the potential of integrating Reinforcement Learning and Imitation Learning models with robotics in revolutionizing the teaching of critical motor skills.
    摘要

Leveraging heterogeneous spillover effects in maximizing contextual bandit rewards

  • paper_url: http://arxiv.org/abs/2310.10259
  • repo_url: None
  • paper_authors: Ahmed Sayeed Faruk, Elena Zheleva
  • for: 提高个性化推荐的相关性和准确性
  • methods: 利用多重环境抽象和个性化投资策略考虑用户之间的协同影响
  • results: 在多个真实数据集上,所提方法获得的奖励显著高于忽略溢出效应的现有方法
    Abstract Recommender systems relying on contextual multi-armed bandits continuously improve relevant item recommendations by taking into account the contextual information. The objective of these bandit algorithms is to learn the best arm (i.e., best item to recommend) for each user and thus maximize the cumulative rewards from user engagement with the recommendations. However, current approaches ignore potential spillover between interacting users, where the action of one user can impact the actions and rewards of other users. Moreover, spillover may vary for different people based on their preferences and the closeness of ties to other users. This leads to heterogeneity in the spillover effects, i.e., the extent to which the action of one user can impact the action of another. Here, we propose a framework that allows contextual multi-armed bandits to account for such heterogeneous spillovers when choosing the best arm for each user. By experimenting on several real-world datasets using prominent linear and non-linear contextual bandit algorithms, we observe that our proposed method leads to significantly higher rewards than existing solutions that ignore spillover.
    摘要 基于情境多臂老虎机(contextual multi-armed bandits)的推荐系统通过利用情境信息来不断改进相关物品的推荐:此类老虎机算法的目标是为每个用户学习最佳的臂(即最佳推荐物品),从而最大化用户与推荐交互所带来的累积奖励。然而,当前方法忽略了相互交互的用户之间可能存在的溢出效应,即一个用户的行为会影响其他用户的行为和奖励。此外,这种溢出效应会因用户的偏好以及与其他用户关系的紧密程度而不同,即溢出效应具有异质性。为此,我们提出一个框架,使情境多臂老虎机在为每个用户选择最佳臂时能够考虑这种异质的溢出效应。我们在多个真实世界数据集上使用主流的线性与非线性情境老虎机算法进行实验,发现所提方法相比忽略溢出效应的现有方案能获得显著更高的奖励。

Leveraging Topological Maps in Deep Reinforcement Learning for Multi-Object Navigation

  • paper_url: http://arxiv.org/abs/2310.10250
  • repo_url: None
  • paper_authors: Simon Hakenes, Tobias Glasmachers
  • for: 解决扩展空间 navigate 极少奖励问题
  • methods: 使用 topological maps 提升 elementary actions 到 object-oriented macro actions
  • results: 使用 DQN agent 解决 otherwise 不可能的环境
    Abstract This work addresses the challenge of navigating expansive spaces with sparse rewards through Reinforcement Learning (RL). Using topological maps, we elevate elementary actions to object-oriented macro actions, enabling a simple Deep Q-Network (DQN) agent to solve otherwise practically impossible environments.
    摘要 这项工作针对在广阔空间中奖励稀疏的导航难题,采用强化学习(RL)方法加以解决。我们利用拓扑地图(topological maps),将基本动作提升为面向对象的宏动作,使简单的深度Q网络(DQN)智能体能够解决原本几乎无法完成的环境。

The Mixtures and the Neural Critics: On the Pointwise Mutual Information Profiles of Fine Distributions

  • paper_url: http://arxiv.org/abs/2310.10240
  • repo_url: https://github.com/cbg-ethz/bmi
  • paper_authors: Paweł Czyż, Frederic Grabowski, Julia E. Vogt, Niko Beerenwinkel, Alexander Marx
  • for: 这个论文研究了点wise矩阵相互信息的profile,这是矩阵相互信息的一种扩展,它保持了 diffeomorphisms 的变换不变性。
  • methods: 论文使用了 Monte Carlo 方法来近似 multivariate normal distributions 的profile,并 introduce了 fine distributions 家族,可以用来研究现有的矩阵相互信息估计器的局限性,以及 neural critics 在variational estimators 中的行为。
  • results: 论文显示了 fine distributions 可以用来研究矩阵相互信息估计器的局限性,以及 neural critics 的行为,并可以用来获得model-based Bayesian 矩阵相互信息估计,适用于具有可用的领域专业知识的问题,在哪里 uncertainty quantification 是必要的。
    Abstract Mutual information quantifies the dependence between two random variables and remains invariant under diffeomorphisms. In this paper, we explore the pointwise mutual information profile, an extension of mutual information that maintains this invariance. We analytically describe the profiles of multivariate normal distributions and introduce the family of fine distributions, for which the profile can be accurately approximated using Monte Carlo methods. We then show how fine distributions can be used to study the limitations of existing mutual information estimators, investigate the behavior of neural critics used in variational estimators, and understand the effect of experimental outliers on mutual information estimation. Finally, we show how fine distributions can be used to obtain model-based Bayesian estimates of mutual information, suitable for problems with available domain expertise in which uncertainty quantification is necessary.
    摘要 互信息量化了两个随机变量之间的依赖关系,且在微分同胚变换下保持不变。在这篇文章中,我们研究逐点互信息(PMI)profile,它是互信息的一种扩展,同样保持这种不变性。我们解析地刻画了多元正态分布的profile,并引入了fine分布族,其profile可以用蒙特卡罗方法精确近似。随后,我们展示了fine分布可以用来研究现有互信息估计器的局限性、考察变分估计器中神经批评器(neural critics)的行为,以及理解实验离群点对互信息估计的影响。最后,我们展示了fine分布可用于获得基于模型的贝叶斯互信息估计,适用于具备领域专业知识且需要不确定性量化的问题。
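
For a bivariate normal, the profile discussed above — the distribution of the pointwise mutual information $\log p(x,y) - \log p(x) - \log p(y)$ under the joint — is easy to approximate by Monte Carlo, and its mean recovers the closed-form mutual information. The correlation value and sample size below are illustrative choices.

```python
import numpy as np
from scipy import stats

# Monte Carlo approximation of the PMI profile of a bivariate normal with correlation rho:
# PMI(x, y) = log p(x, y) - log p(x) - log p(y).
rho = 0.8
cov = np.array([[1.0, rho], [rho, 1.0]])
joint = stats.multivariate_normal(mean=[0.0, 0.0], cov=cov)

rng = np.random.default_rng(0)
samples = joint.rvs(size=200_000, random_state=rng)
pmi = (joint.logpdf(samples)
       - stats.norm.logpdf(samples[:, 0])
       - stats.norm.logpdf(samples[:, 1]))

# The profile is the distribution of PMI under the joint; its mean is the mutual information.
print("MC mean of PMI :", pmi.mean())
print("closed-form MI :", -0.5 * np.log(1 - rho**2))
print("PMI quantiles  :", np.round(np.quantile(pmi, [0.05, 0.5, 0.95]), 3))
```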

Structural transfer learning of non-Gaussian DAG

  • paper_url: http://arxiv.org/abs/2310.10239
  • repo_url: None
  • paper_authors: Mingyang Ren, Xin He, Junhui Wang
  • for: 本研究目的是提高基于多个研究中收集的不同数据的指向关系图(DAG)的重建精度。
  • methods: 提出了一种新的结构相似度测量方法,并提出了一种基于不同相似度水平的转移DAG学习框架,以有效地利用 auxiliary DAGs 中的信息。
  • results: 显著提升了目标研究中的 DAG 重建效果,即使没有任何辅助 DAG 与目标 DAG 整体相似;该结论得到了在合成数据和多站点脑功能连接网络数据上大量数值实验的支持。
    Abstract Directed acyclic graph (DAG) has been widely employed to represent directional relationships among a set of collected nodes. Yet, the available data in one single study is often limited for accurate DAG reconstruction, whereas heterogeneous data may be collected from multiple relevant studies. It remains an open question how to pool the heterogeneous data together for better DAG structure reconstruction in the target study. In this paper, we first introduce a novel set of structural similarity measures for DAG and then present a transfer DAG learning framework by effectively leveraging information from auxiliary DAGs of different levels of similarities. Our theoretical analysis shows substantial improvement in terms of DAG reconstruction in the target study, even when no auxiliary DAG is overall similar to the target DAG, which is in sharp contrast to most existing transfer learning methods. The advantage of the proposed transfer DAG learning is also supported by extensive numerical experiments on both synthetic data and multi-site brain functional connectivity network data.
    摘要

GEVO-ML: Optimizing Machine Learning Code with Evolutionary Computation

  • paper_url: http://arxiv.org/abs/2310.10211
  • repo_url: None
  • paper_authors: Jhe-Yu Liou, Stephanie Forrest, Carole-Jean Wu
  • for: 本研究旨在提高大规模机器学习(ML)应用中的并行加速器(如GPU)的性能,但 ML 模型开发者通常缺乏关于下游系统架构的详细知识,而系统编程者则通常没有高级别的 ML 模型的理解。
  • methods: 本研究提出了 GEVO-ML,一种自动发现优化机会并调整 ML kernels性能的工具,其中 ML 模型和训练/预测过程都是通过单一高级表示语言(多层次中间表示语言,MLIR)表示的。GEVO-ML 使用多目标进化搜索发现 MLIR 代码中的修改(突变),以提高 Desired riteria 的性能,保留必要的功能。
  • results: 在模型训练与预测两类不同的 ML 工作负载上验证了 GEVO-ML。GEVO-ML 在这两个模型上均发现了显著的 Pareto 改进:在允许模型精度放宽 2%(从 91.2% 降至 89.3%)的情况下取得 90.43% 的性能提升,并在训练工作负载中将模型精度从 91% 提高到 96%,且不牺牲训练或测试速度。分析表明,GEVO-ML 的关键突变涉及多种代码修改,虽然对人类开发者而言可能较为陌生,但其效果类似于人类开发者改进模型设计的方式,例如调整学习率或剪除不必要的层参数。
    Abstract Parallel accelerators, such as GPUs, are key enablers for large-scale Machine Learning (ML) applications. However, ML model developers often lack detailed knowledge of the underlying system architectures, while system programmers usually do not have a high-level understanding of the ML model that runs on the specific system. To mitigate this gap between two relevant aspects of domain knowledge, this paper proposes GEVO-ML, a tool for automatically discovering optimization opportunities and tuning the performance of ML kernels, where the model and training/prediction processes are uniformly represented in a single intermediate language, the Multiple-Layer Intermediate Representation (MLIR). GEVO-ML uses multi-objective evolutionary search to find edits (mutations) to MLIR code that ultimately runs on GPUs, improving performance on desired criteria while retaining required functionality. We demonstrate GEVO-ML on two different ML workloads for both model training and prediction. GEVO-ML finds significant Pareto improvements for these models, achieving 90.43% performance improvement when model accuracy is relaxed by 2%, from 91.2% to 89.3%. For the training workloads, GEVO-ML finds a 4.88% improvement in model accuracy, from 91% to 96%, without sacrificing training or testing speed. Our analysis of key GEVO-ML mutations reveals diverse code modifications, while might be foreign to human developers, achieving similar effects with how human developers improve model design, for example, by changing learning rates or pruning non-essential layer parameters.
    摘要 高级加速器,如图形处理器(GPU),是大规模机器学习(ML)应用的关键驱动器。然而,ML模型开发者经常缺乏深入的系统架构知识,而系统编程者通常没有高级的ML模型的具体知识。为了 bridge这两个领域的知识差距,这篇论文提出了 GEVO-ML,一种自动发现优化机会并调整 ML kernels的工具。GEVO-ML 使用多目标进化搜索来找到 MLIR 代码中的修改(突变),以提高 Desired 特性的性能,保留必要的功能。我们在两个不同的 ML 任务上运行 GEVO-ML,包括模型训练和预测。GEVO-ML 在这些模型上发现了显著的 pareto 改进,将模型精度从 91.2% 下降到 89.3%,同时提高了性能。对于训练任务,GEVO-ML 提高了模型精度从 91% 到 96%,而无需牺牲训练或测试速度。我们分析了 GEVO-ML 中关键的突变,发现这些突变可能 foreign 于人类开发者,但具有类似的效果,例如更改学习率或减少不必要的层参数。

Bongard-OpenWorld: Few-Shot Reasoning for Free-form Visual Concepts in the Real World

  • paper_url: http://arxiv.org/abs/2310.10207
  • repo_url: https://github.com/joyjayng/Bongard-OpenWorld
  • paper_authors: Rujie Wu, Xiaojian Ma, Qing Li, Wei Wang, Zhenliang Zhang, Song-Chun Zhu, Yizhou Wang
  • for: 评估现实世界中的少样本(few-shot)推理能力,即模型仅凭少量图像示例就能归纳视觉概念并对新图像进行判断。
  • methods: 以经典的Bongard问题(BPs)为基础,并增加两项新的挑战:1)开放世界自由形式概念,即视觉概念是来自开放词汇表的词项(从物体类别到抽象视觉属性与常识性事实知识)的独特组合;2)使用真实世界图像而非合成图像。
  • results: 研究发现,当前的少样本推理算法在该基准上面临显著挑战;即便直接探测VLM,或在交互式推理方案中将VLM与LLM结合,其表现仍远不及人类(最佳模型准确率为64%,而人类参与者可轻松达到91%)。
    Abstract We introduce Bongard-OpenWorld, a new benchmark for evaluating real-world few-shot reasoning for machine vision. It originates from the classical Bongard Problems (BPs): Given two sets of images (positive and negative), the model needs to identify the set that query images belong to by inducing the visual concepts, which is exclusively depicted by images from the positive set. Our benchmark inherits the few-shot concept induction of the original BPs while adding the two novel layers of challenge: 1) open-world free-form concepts, as the visual concepts in Bongard-OpenWorld are unique compositions of terms from an open vocabulary, ranging from object categories to abstract visual attributes and commonsense factual knowledge; 2) real-world images, as opposed to the synthetic diagrams used by many counterparts. In our exploration, Bongard-OpenWorld already imposes a significant challenge to current few-shot reasoning algorithms. We further investigate to which extent the recently introduced Large Language Models (LLMs) and Vision-Language Models (VLMs) can solve our task, by directly probing VLMs, and combining VLMs and LLMs in an interactive reasoning scheme. We even designed a neuro-symbolic reasoning approach that reconciles LLMs & VLMs with logical reasoning to emulate the human problem-solving process for Bongard Problems. However, none of these approaches manage to close the human-machine gap, as the best learner achieves 64% accuracy while human participants easily reach 91%. We hope Bongard-OpenWorld can help us better understand the limitations of current visual intelligence and facilitate future research on visual agents with stronger few-shot visual reasoning capabilities.
    摘要 我们提出了Bongard-OpenWorld,一个用于评估机器视觉在真实世界中少样本推理能力的新基准。它源自经典的Bongard问题(BPs):给定正、负两组图像,模型需要通过归纳仅由正集图像所体现的视觉概念,判断查询图像属于哪一组。我们的基准继承了原BP的少样本概念归纳设定,并新增了两层挑战:1)开放世界自由形式概念,Bongard-OpenWorld中的视觉概念是来自开放词汇表的词项的独特组合,范围涵盖物体类别、抽象视觉属性以及常识性事实知识;2)使用真实世界图像,而非许多同类基准所采用的合成图示。在我们的探索中,Bongard-OpenWorld已对当前的少样本推理算法构成了显著挑战。我们进一步考察了最新的大语言模型(LLMs)和视觉-语言模型(VLMs)能在多大程度上解决该任务:既直接探测VLM,也在交互式推理方案中将VLM与LLM结合;我们甚至设计了一种将LLM与VLM同逻辑推理相结合的神经符号推理方法,以模拟人类求解Bongard问题的过程。然而,这些方法均未能弥合人机差距:最佳模型的准确率为64%,而人类参与者可轻松达到91%。我们希望Bongard-OpenWorld能帮助我们更好地理解当前视觉智能的局限,并推动未来具有更强少样本视觉推理能力的视觉智能体研究。

Interpretable Predictive Models to Understand Risk Factors for Maternal and Fetal Outcomes

  • paper_url: http://arxiv.org/abs/2310.10203
  • repo_url: None
  • paper_authors: Tomas M. Bosschieter, Zifei Xu, Hui Lan, Benjamin J. Lengerich, Harsha Nori, Ian Painter, Vivienne Souter, Rich Caruana
  • for: 这个论文旨在提高妈妈和婴儿的健康,通过更好地理解风险因素,加强高风险患者的监测,及时采取有效措施,以便妈妈医生能够提供更好的照料。
  • methods: 这篇论文使用了可解释扩展机器学习方法(EBM)进行预测和重要风险因素的 indentification。EBM具有高准确率和可解释性,并且在验证和稳定性分析中证明了其可靠性。
  • results: 研究发现,EBM模型可以准确预测四种妈妈和婴儿的病情,并且可以提供有价值的风险因素。例如, maternal height 是Shoulder dystocia 的第二重要风险因素。这些结果表明,EBM模型在预测和预防妈妈和婴儿的严重病情中具有优秀的性能和可解释性。
    Abstract Although most pregnancies result in a good outcome, complications are not uncommon and can be associated with serious implications for mothers and babies. Predictive modeling has the potential to improve outcomes through better understanding of risk factors, heightened surveillance for high risk patients, and more timely and appropriate interventions, thereby helping obstetricians deliver better care. We identify and study the most important risk factors for four types of pregnancy complications: (i) severe maternal morbidity, (ii) shoulder dystocia, (iii) preterm preeclampsia, and (iv) antepartum stillbirth. We use an Explainable Boosting Machine (EBM), a high-accuracy glass-box learning method, for prediction and identification of important risk factors. We undertake external validation and perform an extensive robustness analysis of the EBM models. EBMs match the accuracy of other black-box ML methods such as deep neural networks and random forests, and outperform logistic regression, while being more interpretable. EBMs prove to be robust. The interpretability of the EBM models reveals surprising insights into the features contributing to risk (e.g. maternal height is the second most important feature for shoulder dystocia) and may have potential for clinical application in the prediction and prevention of serious complications in pregnancy.
    摘要
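
A minimal sketch of fitting an EBM with the open-source interpret package, whose glass-box implementation is associated with several of the paper's authors. The clinical features, the outcome rule, and the assumption that explain_global().data() exposes term names and importance scores are illustrative placeholders, not the paper's actual cohort or pipeline.

```python
import numpy as np
import pandas as pd
from interpret.glassbox import ExplainableBoostingClassifier

# Synthetic stand-in for the obstetric records (the real features, e.g. maternal height,
# come from clinical data that cannot be reproduced here).
rng = np.random.default_rng(0)
n = 5000
X = pd.DataFrame({
    "maternal_height_cm": rng.normal(163, 7, n),
    "maternal_age":       rng.normal(30, 5, n),
    "gestational_weeks":  rng.normal(39, 1.5, n),
    "prior_deliveries":   rng.integers(0, 4, n),
})
# Synthetic outcome loosely shaped so that short stature raises risk, for illustration only.
risk = 0.02 + 0.10 * (X["maternal_height_cm"] < 155) + 0.05 * (X["gestational_weeks"] > 41)
y = (rng.random(n) < risk).astype(int)

ebm = ExplainableBoostingClassifier(random_state=0)
ebm.fit(X, y)

# The per-feature shape functions are what make the model a glass box; the global
# explanation summarizes each term's importance (assumed dict keys: "names", "scores").
importances = ebm.explain_global().data()
print(importances["names"])
print(np.round(importances["scores"], 3))
```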

An Interpretable Deep-Learning Framework for Predicting Hospital Readmissions From Electronic Health Records

  • paper_url: http://arxiv.org/abs/2310.10187
  • repo_url: None
  • paper_authors: Fabio Azzalini, Tommaso Dolci, Marco Vagaggini
  • for: 预测医院复 admit 的风险,以降低医疗成本并提高患者健康状况。
  • methods: 提出了一种新的、可解释的深度学习框架,基于 NLP 发现 word embeddings 和 ConvLSTM 神经网络模型,以更好地处理时间数据。
  • results: 对医院复 admit 的预测任务进行了 validate,并 introduce 了一种模型依赖的技术来使结果更容易被医疗 personnels 理解。结果比传统基于机器学习的模型提供更好的性能,同时也提供了更加可解释的结果。
    Abstract With the increasing availability of patients' data, modern medicine is shifting towards prospective healthcare. Electronic health records contain a variety of information useful for clinical patient description and can be exploited for the construction of predictive models, given that similar medical histories will likely lead to similar progressions. One example is unplanned hospital readmission prediction, an essential task for reducing hospital costs and improving patient health. Despite predictive models showing very good performances especially with deep-learning models, they are often criticized for the poor interpretability of their results, a fundamental characteristic in the medical field, where incorrect predictions might have serious consequences for the patient health. In this paper we propose a novel, interpretable deep-learning framework for predicting unplanned hospital readmissions, supported by NLP findings on word embeddings and by neural-network models (ConvLSTM) for better handling temporal data. We validate our system on the two predictive tasks of hospital readmission within 30 and 180 days, using real-world data. In addition, we introduce and test a model-dependent technique to make the representation of results easily interpretable by the medical staff. Our solution achieves better performances compared to traditional models based on machine learning, while providing at the same time more interpretable results.
    摘要

Hypergraph Echo State Network

  • paper_url: http://arxiv.org/abs/2310.10177
  • repo_url: None
  • paper_authors: Justin Lien
  • for: 这篇文章描述了一种能够高效处理超图结构数据的网络模型,并提出了超图回声状态网络(HypergraphESN)的设计。
  • methods: 文章给出了 HypergraphESN 的算法,并推导了其收敛条件。
  • results: 数值实验显示,在处理超图结构数据时,HypergraphESN 相比图回声状态网络(GraphESN)可取得相当或更高的准确率;当网络中识别出更多高阶交互时,准确率还会进一步提升。
    Abstract A hypergraph as a generalization of graphs records higher-order interactions among nodes, yields a more flexible network model, and allows non-linear features for a group of nodes. In this article, we propose a hypergraph echo state network (HypergraphESN) as a generalization of graph echo state network (GraphESN) designed for efficient processing of hypergraph-structured data, derive convergence conditions for the algorithm, and discuss its versatility in comparison to GraphESN. The numerical experiments on the binary classification tasks demonstrate that HypergraphESN exhibits comparable or superior accuracy performance to GraphESN for hypergraph-structured data, and accuracy increases if more higher-order interactions in a network are identified.
    摘要 超图(hypergraph)作为图的推广,能够记录节点之间的高阶交互,提供更灵活的网络模型,并允许对一组节点引入非线性特征。在这篇文章中,我们提出超图回声状态网络(HypergraphESN),作为图回声状态网络(GraphESN)面向超图结构数据高效处理的推广,推导了算法的收敛条件,并讨论了其相对于GraphESN的通用性。二分类任务上的数值实验表明,对于超图结构数据,HypergraphESN可取得与GraphESN相当或更优的准确率;且当网络中识别出的高阶交互越多,准确率越高。

On permutation symmetries in Bayesian neural network posteriors: a variational perspective

  • paper_url: http://arxiv.org/abs/2310.10171
  • repo_url: None
  • paper_authors: Simone Rossi, Ankit Singh, Thomas Hannagan
  • for: 这项研究旨在理解神经网络中梯度下降优化的困难性, 以及 Bayesian neural networks(BNNs)中 approximate inference 的问题。
  • methods: 这篇论文使用了 marginalized loss barrier 和 solution interpolation 的扩展 formalism, 以及一种匹配算法来搜索线性连接的解。
  • results: 实验结果表明, 对于多种架构和数据集, linearly connected solutions 的 marginalized loss barrier 几乎为零。
    Abstract The elusive nature of gradient-based optimization in neural networks is tied to their loss landscape geometry, which is poorly understood. However recent work has brought solid evidence that there is essentially no loss barrier between the local solutions of gradient descent, once accounting for weight-permutations that leave the network's computation unchanged. This raises questions for approximate inference in Bayesian neural networks (BNNs), where we are interested in marginalizing over multiple points in the loss landscape. In this work, we first extend the formalism of marginalized loss barrier and solution interpolation to BNNs, before proposing a matching algorithm to search for linearly connected solutions. This is achieved by aligning the distributions of two independent approximate Bayesian solutions with respect to permutation matrices. We build on the results of Ainsworth et al. (2023), reframing the problem as a combinatorial optimization one, using an approximation to the sum of bilinear assignment problem. We then experiment on a variety of architectures and datasets, finding nearly zero marginalized loss barriers for linearly connected solutions.
    摘要 神经网络中基于梯度的优化之所以难以捉摸,与其损失地形的几何结构密切相关,而后者至今仍未被充分理解。不过,近期工作给出了有力证据:在考虑那些不改变网络计算的权重置换后,梯度下降所得的局部解之间基本不存在损失壁垒。这对贝叶斯神经网络(BNNs)中的近似推断提出了新问题,因为我们希望对损失地形中的多个点进行边缘化。在这项工作中,我们首先将边缘化损失壁垒与解插值的形式化框架扩展到BNN,随后提出一种匹配算法来搜索线性连通的解:通过置换矩阵对齐两个独立近似贝叶斯解的分布。我们在 Ainsworth et al. (2023) 的结果基础上,将该问题重新表述为一个组合优化问题,并采用对双线性指派问题之和的近似来求解。我们在多种架构和数据集上进行了实验,发现线性连通解的边缘化损失壁垒几乎为零。
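
The weight-permutation symmetry referred to above is easy to see for a two-layer network: permuting the hidden units, i.e. the rows of the first layer and the matching columns of the second, leaves the computed function unchanged. The sketch below checks this numerically; aligning two approximate posteriors then amounts to searching over such permutations, which the paper casts as a (sum of) bilinear assignment problem.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 5, 8, 3

W1, b1 = rng.standard_normal((d_hidden, d_in)), rng.standard_normal(d_hidden)
W2, b2 = rng.standard_normal((d_out, d_hidden)), rng.standard_normal(d_out)

def mlp(x, W1, b1, W2, b2):
    h = np.tanh(W1 @ x + b1)
    return W2 @ h + b2

# Permute the hidden units: rows of (W1, b1) and the matching columns of W2.
perm = rng.permutation(d_hidden)
W1p, b1p, W2p = W1[perm], b1[perm], W2[:, perm]

x = rng.standard_normal(d_in)
print(np.allclose(mlp(x, W1, b1, W2, b2), mlp(x, W1p, b1p, W2p, b2)))  # True
```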

An Empirical Study of Simplicial Representation Learning with Wasserstein Distance

  • paper_url: http://arxiv.org/abs/2310.10143
  • repo_url: None
  • paper_authors: Makoto Yamada, Yuki Takezawa, Guillaume Houry, Kira Michaela Dusterwald, Deborah Sulem, Han Zhao, Yao-Hung Hubert Tsai
  • for: 本研究探讨了使用树结构上的1- Wasserstein距离(Tree-Wasserstein distance,TWD)来学习 simplicial 表示,TWD 是两个树嵌入向量之间的L1距离。
  • methods: 本研究使用了一种基于自动采样的自监学习方法,使用 TWD 作为相似度度量,并提出了一种简单 yet effective的 Jeffrey divergence 基于正则化方法来稳定优化。
  • results: 通过对 STL10、CIFAR10、CIFAR100 和 SVHN 等数据集进行实验,研究发现,将 softmax 函数和 TWD 组合使用可以获得较低的结果,而且模型性能取决于 TWD 和 simplicial 模型的组合,并且 Jeffrey divergence 正则化通常能够稳定模型训练。最终,研究人员发现了选择合适的 TWD 和 simplicial 模型的组合可以超越cosine similarity 基于表示学习。
    Abstract In this paper, we delve into the problem of simplicial representation learning utilizing the 1-Wasserstein distance on a tree structure (a.k.a., Tree-Wasserstein distance (TWD)), where TWD is defined as the L1 distance between two tree-embedded vectors. Specifically, we consider a framework for simplicial representation estimation employing a self-supervised learning approach based on SimCLR with a negative TWD as a similarity measure. In SimCLR, the cosine similarity with real-vector embeddings is often utilized; however, it has not been well studied utilizing L1-based measures with simplicial embeddings. A key challenge is that training the L1 distance is numerically challenging and often yields unsatisfactory outcomes, and there are numerous choices for probability models. Thus, this study empirically investigates a strategy for optimizing self-supervised learning with TWD and find a stable training procedure. More specifically, we evaluate the combination of two types of TWD (total variation and ClusterTree) and several simplicial models including the softmax function, the ArcFace probability model, and simplicial embedding. Moreover, we propose a simple yet effective Jeffrey divergence-based regularization method to stabilize the optimization. Through empirical experiments on STL10, CIFAR10, CIFAR100, and SVHN, we first found that the simple combination of softmax function and TWD can obtain significantly lower results than the standard SimCLR (non-simplicial model and cosine similarity). We found that the model performance depends on the combination of TWD and the simplicial model, and the Jeffrey divergence regularization usually helps model training. Finally, we inferred that the appropriate choice of combination of TWD and simplicial models outperformed cosine similarity based representation learning.
    摘要 在这篇论文中,我们研究了利用树结构上的1-Wasserstein距离(即树-Wasserstein距离,TWD)来学习simplicial表示的问题,其中TWD定义为两个树嵌入向量之间的L1距离。具体而言,我们考虑一种基于SimCLR的自监督学习框架来估计simplicial表示,并以负的TWD作为相似度度量。在SimCLR中通常使用实值向量嵌入的余弦相似度,而将基于L1的度量与simplicial嵌入结合使用尚未得到充分研究。一个关键挑战在于,训练L1距离在数值上较为困难,往往得到不理想的结果,且概率模型的选择也很多。因此,本研究通过实验探索了使用TWD优化自监督学习的策略,并找到了稳定的训练流程。更具体地,我们评估了两类TWD(总变差与ClusterTree)和多种simplicial模型(包括softmax函数、ArcFace概率模型和simplicial嵌入)的组合,并提出了一种简单而有效的基于Jeffrey散度的正则化方法来稳定优化。通过在STL10、CIFAR10、CIFAR100和SVHN上的实验,我们首先发现softmax函数与TWD的简单组合得到的结果明显低于标准SimCLR(非simplicial模型加余弦相似度);模型性能取决于TWD与simplicial模型的组合,且Jeffrey散度正则化通常有助于模型训练。最后我们得出结论:恰当选择TWD与simplicial模型的组合能够超越基于余弦相似度的表示学习。
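
With the tree fixed, the TWD used above reduces to a weighted L1 distance between tree embeddings: each edge contributes its weight times the absolute difference in probability mass below it. The toy tree, edge weights and distributions in the sketch are illustrative assumptions.

```python
import numpy as np

# A small fixed tree over 4 leaves:
#        root
#       /    \
#      a      b
#     / \    / \
#    0   1  2   3
edge_weights = np.array([1.0, 1.0, 0.5, 0.5, 0.5, 0.5])  # root-a, root-b, a-0, a-1, b-2, b-3
# subtree[e, i] = 1 if leaf i lies below edge e
subtree = np.array([
    [1, 1, 0, 0],   # root-a
    [0, 0, 1, 1],   # root-b
    [1, 0, 0, 0],   # a-0
    [0, 1, 0, 0],   # a-1
    [0, 0, 1, 0],   # b-2
    [0, 0, 0, 1],   # b-3
], dtype=float)

def tree_wasserstein(mu, nu):
    # TWD is the weighted L1 distance between the tree embeddings of mu and nu:
    # sum over edges of w_e * |mass of mu below e - mass of nu below e|.
    return np.abs(edge_weights * (subtree @ (mu - nu))).sum()

mu = np.array([0.7, 0.3, 0.0, 0.0])
nu = np.array([0.0, 0.0, 0.5, 0.5])
print(tree_wasserstein(mu, nu))   # each unit of mass travels 0.5 + 1 + 1 + 0.5 = 3.0 along the tree
```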

A Comprehensive Study of Privacy Risks in Curriculum Learning

  • paper_url: http://arxiv.org/abs/2310.10124
  • repo_url: None
  • paper_authors: Joann Qiongna Chen, Xinlei He, Zheng Li, Yang Zhang, Zhou Li
  • for: 本研究旨在探讨curriculum learning(CL)对机器学习的隐私影响,以填补现有的知识空白。
  • methods: 我们使用了membership inference attack(MIA)和attribute inference attack(AIA)两种方法来衡量CL对隐私的泄露。
  • results: 我们的评估结果显示,应用 CL 后 MIA 的效果略有提升,且对被判定为"困难"的训练样本子集影响尤为明显;与 MIA 相比,CL 训练的模型在 AIA 下更不易受攻击;现有防御技术(如 DP-SGD、MemGuard、MixupMMD)在 CL 下依然有效。此外,我们还提出了一种新的 MIA 方法 Diff-Cali,它利用难度分数对结果进行校准。
    Abstract Training a machine learning model with data following a meaningful order, i.e., from easy to hard, has been proven to be effective in accelerating the training process and achieving better model performance. The key enabling technique is curriculum learning (CL), which has seen great success and has been deployed in areas like image and text classification. Yet, how CL affects the privacy of machine learning is unclear. Given that CL changes the way a model memorizes the training data, its influence on data privacy needs to be thoroughly evaluated. To fill this knowledge gap, we perform the first study and leverage membership inference attack (MIA) and attribute inference attack (AIA) as two vectors to quantify the privacy leakage caused by CL. Our evaluation of nine real-world datasets with attack methods (NN-based, metric-based, label-only MIA, and NN-based AIA) revealed new insights about CL. First, MIA becomes slightly more effective when CL is applied, but the impact is much more prominent to a subset of training samples ranked as difficult. Second, a model trained under CL is less vulnerable under AIA, compared to MIA. Third, the existing defense techniques like DP-SGD, MemGuard, and MixupMMD are still effective under CL, though DP-SGD has a significant impact on target model accuracy. Finally, based on our insights into CL, we propose a new MIA, termed Diff-Cali, which exploits the difficulty scores for result calibration and is demonstrated to be effective against all CL methods and the normal training method. With this study, we hope to draw the community's attention to the unintended privacy risks of emerging machine-learning techniques and develop new attack benchmarks and defense solutions.
    摘要 使用有意义的顺序(即由易到难)训练机器学习模型,已被证明能够加速训练并提升模型性能,其核心技术是课程学习(CL),它已在图像和文本分类等领域取得巨大成功。然而,CL 对机器学习隐私的影响尚不清楚:由于 CL 改变了模型记忆训练数据的方式,其对数据隐私的影响需要被仔细评估。为填补这一空白,我们开展了首个相关研究,利用成员推断攻击(MIA)和属性推断攻击(AIA)作为两种度量 CL 造成隐私泄露的手段。我们在九个真实数据集上使用多种攻击方法(基于神经网络的、基于度量的、仅标签的 MIA,以及基于神经网络的 AIA)进行评估,得到如下新发现:1. 应用 CL 后 MIA 的效果略有提升,且对被判定为"困难"的训练样本子集影响尤为明显;2. 与 MIA 相比,CL 训练的模型在 AIA 下更不易受攻击;3. 现有防御技术(如 DP-SGD、MemGuard 和 MixupMMD)在 CL 下依然有效,尽管 DP-SGD 会显著影响目标模型的准确率;4. 基于上述发现,我们提出了一种新的 MIA 方法 Diff-Cali,它利用难度分数对结果进行校准,并被证明对所有 CL 方法及常规训练方法均有效。通过这项研究,我们希望引起社区对新兴机器学习技术潜在隐私风险的关注,并推动新的攻击基准与防御方案的发展。
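
The following is a toy sketch of a metric-based membership inference attack of the kind used in the evaluation above: threshold the confidence the model assigns to the true label. It is a generic baseline, not the proposed Diff-Cali attack (which additionally calibrates results with curriculum difficulty scores); the model outputs below are synthetic stand-ins.

```python
import numpy as np

def true_label_confidence(probs, labels):
    """Confidence the model assigns to each example's true label."""
    return probs[np.arange(len(labels)), labels]

def membership_attack_accuracy(member_probs, member_y, nonmember_probs, nonmember_y, threshold=0.5):
    """Predict 'member' whenever the true-label confidence exceeds a threshold."""
    s_in = true_label_confidence(member_probs, member_y)
    s_out = true_label_confidence(nonmember_probs, nonmember_y)
    preds = np.concatenate([s_in, s_out]) > threshold
    truth = np.concatenate([np.ones(len(s_in)), np.zeros(len(s_out))]).astype(bool)
    return (preds == truth).mean()

# synthetic stand-in for a model that is over-confident on training members
rng = np.random.default_rng(1)
member_probs = rng.dirichlet(np.ones(10) * 0.2, size=200)     # peaked predictions
member_y = member_probs.argmax(axis=1)                        # confident on the true label
nonmember_probs = rng.dirichlet(np.ones(10), size=200)        # flatter predictions
nonmember_y = rng.integers(0, 10, size=200)
print("attack accuracy:", membership_attack_accuracy(member_probs, member_y,
                                                     nonmember_probs, nonmember_y))
```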

A proximal augmented Lagrangian based algorithm for federated learning with global and local convex conic constraints

  • paper_url: http://arxiv.org/abs/2310.10117
  • repo_url: None
  • paper_authors: Chuan He, Le Peng, Ju Sun
  • for: 本研究针对带约束的联邦学习(FL):中央服务器与所有本地客户端在不移动本地数据的前提下,共同最小化局部凸目标函数之和,并满足全局与本地的凸锥约束。
  • methods: 本研究提出了一个基于近端增广拉格朗日(proximal augmented Lagrangian,AL)的联邦学习算法框架,其子问题以联邦方式由不精确的交替方向乘子法(ADMM)求解。
  • results: 据作者所知,这是首个处理全局与本地约束的联邦学习算法;数值实验显示该方法在 Neyman-Pearson 分类和提升模型公平性方面具有实际优势。
    Abstract This paper considers federated learning (FL) with constraints, where the central server and all local clients collectively minimize a sum of convex local objective functions subject to global and local convex conic constraints. To train the model without moving local data from clients to the central server, we propose an FL framework in which each local client performs multiple updates using the local objective and local constraint, while the central server handles the global constraint and performs aggregation based on the updated local models. In particular, we develop a proximal augmented Lagrangian (AL) based algorithm for FL with global and local convex conic constraints. The subproblems arising in this algorithm are solved by an inexact alternating direction method of multipliers (ADMM) in a federated fashion. Under a local Lipschitz condition and mild assumptions, we establish the worst-case complexity bounds of the proposed algorithm for finding an approximate KKT solution. To the best of our knowledge, this work proposes the first algorithm for FL with global and local constraints. Our numerical experiments demonstrate the practical advantages of our algorithm in performing Neyman-Pearson classification and enhancing model fairness in the context of FL.
    摘要 这篇论文研究了带约束的联邦学习(FL),其中中央服务器和所有本地客户端共同最小化凸的本地目标函数之和,同时满足全局和本地的凸锥约束。为了避免将本地数据从客户端传输到中央服务器,我们提出了一种 FL 框架:每个本地客户端利用本地目标函数和本地约束进行多次更新,而中央服务器负责全局约束,并基于更新后的本地模型进行聚合。我们开发了一种基于近端增广拉格朗日(AL)的算法来求解带有全局和本地凸锥约束的 FL 问题,其中产生的子问题由一种不精确的交替方向乘子法(ADMM)以联邦方式求解。在本地 Lipschitz 条件和温和假设下,我们给出了该算法求得近似 KKT 解的最坏情况复杂度界。据我们所知,这是首个处理全局与本地约束的 FL 算法。数值实验表明,该算法在 Neyman-Pearson 分类和提升 FL 中模型公平性方面具有实际优势。
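
A single-machine toy sketch of a proximal augmented-Lagrangian outer loop for a constrained problem, intended only to illustrate the multiplier and proximal structure described above; in the paper, the inner subproblem is instead solved by an inexact ADMM in a federated fashion across clients, which is not shown. The toy problem and step sizes are illustrative assumptions.

```python
import numpy as np

def augmented_lagrangian(g, grad_L, x0, rho=10.0, lam0=0.0,
                         outer_iters=20, inner_iters=200, lr=1e-2):
    """Proximal AL outer loop for min f(x) s.t. g(x) <= 0 (single machine)."""
    x, lam = x0.copy(), lam0
    for _ in range(outer_iters):
        x_prev = x.copy()
        for _ in range(inner_iters):                 # inexact inner solve by gradient descent
            x = x - lr * grad_L(x, lam, rho, x_prev)
        lam = max(0.0, lam + rho * g(x))             # multiplier update for the inequality
    return x, lam

# toy problem: minimize ||x - c||^2 subject to sum(x) - 1 <= 0
c = np.array([1.0, 1.0])
g = lambda x: np.sum(x) - 1.0
def grad_L(x, lam, rho, x_prev):
    slack = max(0.0, lam + rho * g(x))               # gradient of the AL penalty term
    return 2.0 * (x - c) + slack * np.ones_like(x) + 0.1 * (x - x_prev)   # + proximal term
x_star, lam_star = augmented_lagrangian(g, grad_L, np.zeros(2))
print(x_star.round(3), round(lam_star, 3))           # expected near [0.5, 0.5], lambda near 1
```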

PAC Learning Linear Thresholds from Label Proportions

  • paper_url: http://arxiv.org/abs/2310.10098
  • repo_url: None
  • paper_authors: Anand Brahmbhatt, Rishi Saket, Aravindan Raghuveer
  • for: 本研究旨在从标签比例(label proportions)中高效地 PAC 学习线性阈值函数(LTF),即仅利用每个样本袋的平均标签来训练实例分类器。
  • methods: 在袋内特征向量(在给定标签的条件下)服从高斯分布的假设下,本研究利用袋内"有放回"与"无放回"抽取的特征向量之差构造协方差矩阵,证明其经变换后的主成分方向即为 LTF 的法向量方向;算法利用次高斯集中不等式估计均值与协方差,进而恢复该 LTF。
  • results: 本研究表明该方法能够高效地学习 LTF,并给出了袋设定下的泛化误差界;对于 $N(\mathbf{0}, \mathbf{I})$ 等特殊情形,还提供了更简单的基于均值估计的算法。实验评估将本方法与 [Saket'21, Saket'22] 的方法以及随机 LTF 进行了比较,验证了方法的有效性。
    Abstract Learning from label proportions (LLP) is a generalization of supervised learning in which the training data is available as sets or bags of feature-vectors (instances) along with the average instance-label of each bag. The goal is to train a good instance classifier. While most previous works on LLP have focused on training models on such training data, computational learnability of LLP was only recently explored by [Saket'21, Saket'22] who showed worst case intractability of properly learning linear threshold functions (LTFs) from label proportions. However, their work did not rule out efficient algorithms for this problem on natural distributions. In this work we show that it is indeed possible to efficiently learn LTFs using LTFs when given access to random bags of some label proportion in which feature-vectors are, conditioned on their labels, independently sampled from a Gaussian distribution $N(\mathbf{\mu}, \mathbf{\Sigma})$. Our work shows that a certain matrix -- formed using covariances of the differences of feature-vectors sampled from the bags with and without replacement -- necessarily has its principal component, after a transformation, in the direction of the normal vector of the LTF. Our algorithm estimates the means and covariance matrices using subgaussian concentration bounds which we show can be applied to efficiently sample bags for approximating the normal direction. Using this in conjunction with novel generalization error bounds in the bag setting, we show that a low error hypothesis LTF can be identified. For some special cases of the $N(\mathbf{0}, \mathbf{I})$ distribution we provide a simpler mean estimation based algorithm. We include an experimental evaluation of our learning algorithms along with a comparison with those of [Saket'21, Saket'22] and random LTFs, demonstrating the effectiveness of our techniques.
    摘要 从标签比例中学习(LLP)是监督学习的一种推广:训练数据以特征向量集合(袋)的形式给出,且每个袋只提供其实例标签的平均值,目标是训练一个好的实例分类器。以往关于 LLP 的工作大多关注如何在这类数据上训练模型,而 LLP 的计算可学习性直到最近才由 [Saket'21, Saket'22] 探讨,他们证明了从标签比例中恰当学习线性阈值函数(LTF)在最坏情况下是难解的;但他们的结果并未排除在自然分布下存在高效算法。在本工作中,我们证明:当可以获得随机袋,且袋内特征向量在给定标签的条件下独立地服从高斯分布 $N(\mathbf{\mu}, \mathbf{\Sigma})$ 时,确实可以用 LTF 高效地学习 LTF。我们证明,由袋内"有放回"与"无放回"抽取的特征向量之差的协方差构成的某个矩阵,在一次变换之后,其主成分必然指向 LTF 的法向量方向。我们的算法利用次高斯集中不等式来估计均值与协方差矩阵,并据此高效地采样袋以逼近该法向量方向;再结合袋设定下新的泛化误差界,即可找到低误差的 LTF 假设。对于 $N(\mathbf{0}, \mathbf{I})$ 分布的一些特殊情形,我们还给出了一个更简单的基于均值估计的算法。我们对所提学习算法进行了实验评估,并与 [Saket'21, Saket'22] 的方法以及随机 LTF 进行了比较,验证了方法的有效性。
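
A toy numpy sketch of the simpler mean-estimation idea that the abstract mentions for the special case of N(0, I) features: with an isotropic Gaussian, label-proportion-weighted bag means align with the LTF's normal vector. The covariance-based construction used in the general case, the subgaussian sampling analysis, and the thresholding step are not reproduced; bag sizes and counts are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_bags, bag_size = 10, 2000, 8
w_true = rng.normal(size=d); w_true /= np.linalg.norm(w_true)   # hidden LTF normal

# generate random bags; only (bag mean, label proportion) is released per bag
bag_means, proportions = [], []
for _ in range(n_bags):
    X = rng.normal(size=(bag_size, d))                # N(0, I) features
    y = (X @ w_true > 0).astype(float)
    bag_means.append(X.mean(axis=0))
    proportions.append(y.mean())
bag_means, proportions = np.array(bag_means), np.array(proportions)

# correlate bag means with centered label proportions to recover the direction
w_hat = ((proportions - proportions.mean())[:, None] * bag_means).mean(axis=0)
w_hat /= np.linalg.norm(w_hat)
print("cosine to the true normal:", round(float(w_hat @ w_true), 3))
```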

LLP-Bench: A Large Scale Tabular Benchmark for Learning from Label Proportions

  • paper_url: http://arxiv.org/abs/2310.10096
  • repo_url: None
  • paper_authors: Anand Brahmbhatt, Mohith Pokala, Rishi Saket, Aravindan Raghuveer
  • for: 针对目前缺乏公开的大规模表格型 LLP 基准,本文提出了一个大规模的标签比例学习(LLP)基准数据集。
  • methods: 本文提出了名为 LLP-Bench 的基准,包含 56 个 LLP 数据集(52 个特征袋数据集和 4 个随机袋数据集),均由含 4500 万个实例的 Criteo CTR 预测数据集构建。此外,本文还提出了四种刻画 LLP 数据集难度的指标。
  • results: 本文利用这四种难度指标对 LLP-Bench 中的 56 个数据集进行了深入分析,并在全部 56 个数据集上测试了 9 种最先进且常用的表格型 LLP 技术。据作者所述,这是文献中对主流表格型 LLP 技术最为广泛的研究。
    Abstract In the task of Learning from Label Proportions (LLP), a model is trained on groups (a.k.a bags) of instances and their corresponding label proportions to predict labels for individual instances. LLP has been applied pre-dominantly on two types of datasets - image and tabular. In image LLP, bags of fixed size are created by randomly sampling instances from an underlying dataset. Bags created via this methodology are called random bags. Experimentation on Image LLP has been mostly on random bags on CIFAR-* and MNIST datasets. Despite being a very crucial task in privacy sensitive applications, tabular LLP does not yet have a open, large scale LLP benchmark. One of the unique properties of tabular LLP is the ability to create feature bags where all the instances in a bag have the same value for a given feature. It has been shown in prior research that feature bags are very common in practical, real world applications [Chen et. al '23, Saket et. al. '22]. In this paper, we address the lack of a open, large scale tabular benchmark. First we propose LLP-Bench, a suite of 56 LLP datasets (52 feature bag and 4 random bag datasets) created from the Criteo CTR prediction dataset consisting of 45 million instances. The 56 datasets represent diverse ways in which bags can be constructed from underlying tabular data. To the best of our knowledge, LLP-Bench is the first large scale tabular LLP benchmark with an extensive diversity in constituent datasets. Second, we propose four metrics that characterize and quantify the hardness of a LLP dataset. Using these four metrics we present deep analysis of the 56 datasets in LLP-Bench. Finally we present the performance of 9 SOTA and popular tabular LLP techniques on all the 56 datasets. To the best of our knowledge, our study consisting of more than 2500 experiments is the most extensive study of popular tabular LLP techniques in literature.
    摘要 在标签比例学习(LLP)任务中,模型在实例组(即"袋")及其对应的标签比例上训练,用于预测单个实例的标签。LLP 主要应用于两类数据集:图像与表格。在图像 LLP 中,通常通过从底层数据集中随机采样实例来构造固定大小的袋,这种方式构造的袋被称为随机袋;图像 LLP 的实验大多在 CIFAR-* 和 MNIST 数据集的随机袋上进行。尽管表格 LLP 在隐私敏感应用中至关重要,但目前尚缺乏公开的大规模 LLP 基准。表格 LLP 的一个独特性质是可以构造"特征袋",即同一袋内所有实例在某个特征上取值相同;已有研究表明特征袋在实际应用中非常常见 [Chen et al. '23, Saket et al. '22]。本文针对缺乏公开大规模表格基准的问题,首先提出了 LLP-Bench:由含 4500 万个实例的 Criteo CTR 预测数据集构建的 56 个 LLP 数据集(52 个特征袋数据集和 4 个随机袋数据集),涵盖了从表格数据构造袋的多种方式。据我们所知,LLP-Bench 是首个数据集构成高度多样化的大规模表格 LLP 基准。其次,我们提出了四种刻画并量化 LLP 数据集难度的指标,并据此对 LLP-Bench 中的 56 个数据集进行了深入分析。最后,我们给出了 9 种最先进且常用的表格 LLP 技术在全部 56 个数据集上的表现。据我们所知,这项包含 2500 多次实验的研究是文献中对主流表格 LLP 技术最广泛的研究。
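
A minimal pandas sketch of the "feature bag" construction described above: group rows that share the value of a chosen feature and release only aggregate label proportions. Column names and the toy data are illustrative and do not reflect LLP-Bench's actual schema.

```python
import pandas as pd

df = pd.DataFrame({
    "device_type": ["phone", "phone", "tablet", "desktop", "tablet", "phone"],
    "country":     ["US", "DE", "US", "US", "DE", "US"],
    "click":       [1, 0, 0, 1, 1, 0],
})

# all rows sharing the same device_type form one feature bag;
# only the bag size and the label proportion are released per bag
bags = (
    df.groupby("device_type")
      .agg(bag_size=("click", "size"), label_proportion=("click", "mean"))
      .reset_index()
)
print(bags)
```

An instance-level model is then trained to match these per-bag proportions rather than per-row labels.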

Label Differential Privacy via Aggregation

  • paper_url: http://arxiv.org/abs/2310.10092
  • repo_url: None
  • paper_authors: Anand Brahmbhatt, Rishi Saket, Shreyas Havaldar, Anshul Nasery, Aravindan Raghuveer
  • for: 保护敏感训练标签的隐私
  • methods: 使用 randomly weighted aggregation 和 additive noise 保护隐私
  • results: 可以在无需或仅需少量加性噪声的情况下实现 label-DP 保护,同时保持训练任务的效用
    Abstract In many real-world applications, in particular due to recent developments in the privacy landscape, training data may be aggregated to preserve the privacy of sensitive training labels. In the learning from label proportions (LLP) framework, the dataset is partitioned into bags of feature-vectors which are available only with the sum of the labels per bag. A further restriction, which we call learning from bag aggregates (LBA) is where instead of individual feature-vectors, only the (possibly weighted) sum of the feature-vectors per bag is available. We study whether such aggregation techniques can provide privacy guarantees under the notion of label differential privacy (label-DP) previously studied in for e.g. [Chaudhuri-Hsu'11, Ghazi et al.'21, Esfandiari et al.'22]. It is easily seen that naive LBA and LLP do not provide label-DP. Our main result however, shows that weighted LBA using iid Gaussian weights with $m$ randomly sampled disjoint $k$-sized bags is in fact $(\varepsilon, \delta)$-label-DP for any $\varepsilon > 0$ with $\delta \approx \exp(-\Omega(\sqrt{k}))$ assuming a lower bound on the linear-mse regression loss. Further, this preserves the optimum over linear mse-regressors of bounded norm to within $(1 \pm o(1))$-factor w.p. $\approx 1 - \exp(-\Omega(m))$. We emphasize that no additive label noise is required. The analogous weighted-LLP does not however admit label-DP. Nevertheless, we show that if additive $N(0, 1)$ noise can be added to any constant fraction of the instance labels, then the noisy weighted-LLP admits similar label-DP guarantees without assumptions on the dataset, while preserving the utility of Lipschitz-bounded neural mse-regression tasks. Our work is the first to demonstrate that label-DP can be achieved by randomly weighted aggregation for regression tasks, using no or little additive noise.
    摘要 在许多实际应用中,特别是由于近来隐私环境的变化,训练数据可能会被聚合,以保护敏感训练标签的隐私。在标签比例学习(LLP)框架中,数据集被划分为特征向量袋,且每个袋只提供标签之和;更进一步的限制是"袋聚合学习"(LBA),即连单个特征向量也不可见,只提供每个袋(可能加权的)特征向量之和。我们研究这类聚合技术能否在标签差分隐私(label-DP)意义下提供隐私保证,该概念此前已在 [Chaudhuri-Hsu'11, Ghazi et al.'21, Esfandiari et al.'22] 等工作中研究。容易看出,朴素的 LBA 与 LLP 并不满足 label-DP。我们的主要结果表明:使用独立同分布的高斯权重,对 $m$ 个随机采样、互不相交、大小为 $k$ 的袋做加权 LBA,在对线性均方误差回归损失的下界假设下,对任意 $\varepsilon > 0$ 满足 $(\varepsilon, \delta)$-label-DP,其中 $\delta \approx \exp(-\Omega(\sqrt{k}))$;同时,它能以约 $1 - \exp(-\Omega(m))$ 的概率,将范数有界的线性均方误差回归器的最优值保持在 $(1 \pm o(1))$ 倍以内,且无需添加任何标签噪声。相应的加权 LLP 则不满足 label-DP;不过我们证明,若可向任意常数比例的实例标签添加 $N(0, 1)$ 噪声,则带噪声的加权 LLP 在不对数据集做假设的情况下也能获得类似的 label-DP 保证,并保持 Lipschitz 有界的神经网络均方误差回归任务的效用。我们的工作首次表明,对回归任务可以通过随机加权聚合、在不添加或仅添加少量噪声的情况下实现 label-DP。
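
A toy numpy sketch of the weighted bag-aggregation mechanism (weighted LBA) described above: instances are split into disjoint bags of size k and only i.i.d.-Gaussian-weighted sums of features and labels are released; linear regression on the aggregates still recovers the regressor. Parameter values are illustrative, and the privacy accounting from the paper is not reproduced.

```python
import numpy as np

def weighted_bag_aggregates(X, y, k, rng):
    """Release, per disjoint size-k bag, Gaussian-weighted sums of features and labels."""
    n = (len(X) // k) * k
    bags = rng.permutation(len(X))[:n].reshape(-1, k)   # random disjoint bags
    agg_X, agg_y = [], []
    for bag in bags:
        w = rng.normal(size=k)                           # i.i.d. Gaussian weights
        agg_X.append(w @ X[bag])
        agg_y.append(w @ y[bag])
    return np.array(agg_X), np.array(agg_y)

rng = np.random.default_rng(0)
beta = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
X = rng.normal(size=(1000, 5))
y = X @ beta + 0.1 * rng.normal(size=1000)
AX, Ay = weighted_bag_aggregates(X, y, k=25, rng=rng)
# least squares on the aggregates still recovers the linear regressor
print(np.linalg.lstsq(AX, Ay, rcond=None)[0].round(2))
```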

Over-the-Air Federated Learning and Optimization

  • paper_url: http://arxiv.org/abs/2310.10089
  • repo_url: None
  • paper_authors: Jingyang Zhu, Yuanming Shi, Yong Zhou, Chunxiao Jiang, Wei Chen, Khaled B. Letaief
  • for: 这篇论文关注联邦学习(FL)中的空中计算(over-the-air computation,AirComp):它可以降低无线网络上的通信开销,但信道衰落与噪声带来的模型聚合误差会损害学习性能。
  • methods: 论文首先对基于 AirComp 的 FedAvg(AirFedAvg)算法进行了全面研究,包括在强凸与非凸设定、常数与递减学习率以及数据异质性下的收敛分析。
  • results: 论文通过收敛与渐近分析刻画了模型聚合误差对收敛界的影响,为带收敛保证的系统设计提供了依据;并进一步分析了边缘设备可传输的不同类型本地更新(本地模型、梯度和模型差)对 AirFedAvg 的影响,指出传输本地模型可能导致训练发散;此外还考虑了更实用的信号处理方案以提升通信效率,并将收敛分析推广到由这些方案引起的不同形式的聚合误差。
    Abstract Federated learning (FL), as an emerging distributed machine learning paradigm, allows a mass of edge devices to collaboratively train a global model while preserving privacy. In this tutorial, we focus on FL via over-the-air computation (AirComp), which is proposed to reduce the communication overhead for FL over wireless networks at the cost of compromising in the learning performance due to model aggregation error arising from channel fading and noise. We first provide a comprehensive study on the convergence of AirComp-based FedAvg (AirFedAvg) algorithms under both strongly convex and non-convex settings with constant and diminishing learning rates in the presence of data heterogeneity. Through convergence and asymptotic analysis, we characterize the impact of aggregation error on the convergence bound and provide insights for system design with convergence guarantees. Then we derive convergence rates for AirFedAvg algorithms for strongly convex and non-convex objectives. For different types of local updates that can be transmitted by edge devices (i.e., local model, gradient, and model difference), we reveal that transmitting local model in AirFedAvg may cause divergence in the training procedure. In addition, we consider more practical signal processing schemes to improve the communication efficiency and further extend the convergence analysis to different forms of model aggregation error caused by these signal processing schemes. Extensive simulation results under different settings of objective functions, transmitted local information, and communication schemes verify the theoretical conclusions.
    摘要 联邦学习(FL)作为一种新兴的分布式机器学习范式,允许海量边缘设备在保护隐私的前提下协同训练全局模型。本教程关注基于空中计算(AirComp)的 FL:它能降低无线网络中 FL 的通信开销,但信道衰落与噪声造成的模型聚合误差会以牺牲学习性能为代价。我们首先对基于 AirComp 的 FedAvg(AirFedAvg)算法在强凸与非凸设定、常数与递减学习率以及数据异质性下的收敛性进行了全面研究;通过收敛与渐近分析,刻画了聚合误差对收敛界的影响,并为带收敛保证的系统设计提供了依据。随后,我们推导了 AirFedAvg 在强凸与非凸目标下的收敛速率。针对边缘设备可传输的不同类型本地更新(本地模型、梯度和模型差),我们发现传输本地模型可能导致训练过程发散。此外,我们考虑了更实用的信号处理方案以提升通信效率,并将收敛分析进一步推广到这些方案引起的不同形式的模型聚合误差。在不同目标函数、传输的本地信息与通信方案设置下的大量仿真结果验证了理论结论。
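
A toy numpy sketch of what over-the-air aggregation means for FedAvg: client updates are superposed through a fading, noisy channel, so the server receives a perturbed average rather than the exact mean. The channel model and constants below are illustrative simplifications.

```python
import numpy as np

def aircomp_aggregate(local_models, rng, noise_std=0.05):
    """Analog superposition of client updates with per-client fading plus receiver noise."""
    M = np.stack(local_models)                           # (num_clients, dim)
    h = 1.0 + 0.1 * rng.normal(size=(M.shape[0], 1))     # channel gains around 1
    received = (h * M).sum(axis=0) + noise_std * rng.normal(size=M.shape[1])
    return received / M.shape[0]                         # server rescales by client count

rng = np.random.default_rng(0)
clients = [rng.normal(loc=1.0, scale=0.2, size=4) for _ in range(10)]
print("exact FedAvg :", np.mean(clients, axis=0).round(3))
print("over-the-air :", aircomp_aggregate(clients, rng).round(3))  # = FedAvg + aggregation error
```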

A simple uniformly optimal method without line search for convex optimization

  • paper_url: http://arxiv.org/abs/2310.10082
  • repo_url: None
  • paper_authors: Tianjiao Li, Guanghui Lan
  • for: solves convex optimization problems with unknown problem parameters (e.g., Lipschitz constant) without the need for line search procedures.
  • methods: presents a novel accelerated gradient descent type algorithm called auto-conditioned fast gradient method (AC-FGM) that achieves an optimal $\mathcal{O}(1/k^2)$ rate of convergence for smooth convex optimization without requiring the estimate of a global Lipschitz constant.
  • results: demonstrates the advantages of AC-FGM over the previously developed parameter-free methods for convex optimization through numerical results.
    Abstract Line search (or backtracking) procedures have been widely employed into first-order methods for solving convex optimization problems, especially those with unknown problem parameters (e.g., Lipschitz constant). In this paper, we show that line search is superfluous in attaining the optimal rate of convergence for solving a convex optimization problem whose parameters are not given a priori. In particular, we present a novel accelerated gradient descent type algorithm called auto-conditioned fast gradient method (AC-FGM) that can achieve an optimal $\mathcal{O}(1/k^2)$ rate of convergence for smooth convex optimization without requiring the estimate of a global Lipschitz constant or the employment of line search procedures. We then extend AC-FGM to solve convex optimization problems with H\"{o}lder continuous gradients and show that it automatically achieves the optimal rates of convergence uniformly for all problem classes with the desired accuracy of the solution as the only input. Finally, we report some encouraging numerical results that demonstrate the advantages of AC-FGM over the previously developed parameter-free methods for convex optimization.
    摘要 线搜索(或回溯)过程已被广泛用于求解凸优化问题的一阶方法中,尤其是在问题参数(例如 Lipschitz 常数)未知的情况下。在这篇论文中,我们证明:对于参数未事先给定的凸优化问题,线搜索并非达到最优收敛速度所必需。具体地,我们提出了一种新的加速梯度下降型算法,即自适应条件快速梯度法(AC-FGM),它无需估计全局 Lipschitz 常数、也无需线搜索,即可对光滑凸优化达到最优的 $\mathcal{O}(1/k^2)$ 收敛速度。随后,我们将 AC-FGM 推广到梯度满足 H\"{o}lder 连续条件的凸优化问题,并证明它能在所有问题类别上自动达到最优收敛速度,所需的唯一输入是解的目标精度。最后,我们给出了一些令人鼓舞的数值结果,展示了 AC-FGM 相比既有的无参数凸优化方法的优势。

SoTTA: Robust Test-Time Adaptation on Noisy Data Streams

  • paper_url: http://arxiv.org/abs/2310.10074
  • repo_url: https://github.com/taeckyung/SoTTA
  • paper_authors: Taesik Gong, Yewon Kim, Taeckyung Lee, Sorn Chottananurak, Sung-Ju Lee
  • for: 这个论文旨在仅利用无标注的测试数据流进行持续模型自适应,以应对训练数据与测试数据之间的分布偏移。
  • methods: 这个方法使用 two-fold enablers: (i) input-wise robustness via high-confidence uniform-class sampling, and (ii) parameter-wise robustness via entropy-sharpness minimization.
  • results: 比较先前的TTA方法,这个方法在存在噪音样本的情况下实现了比较好的性能,并且在没有噪音样本的情况下实现了相当的性能。
    Abstract Test-time adaptation (TTA) aims to address distributional shifts between training and testing data using only unlabeled test data streams for continual model adaptation. However, most TTA methods assume benign test streams, while test samples could be unexpectedly diverse in the wild. For instance, an unseen object or noise could appear in autonomous driving. This leads to a new threat to existing TTA algorithms; we found that prior TTA algorithms suffer from those noisy test samples as they blindly adapt to incoming samples. To address this problem, we present Screening-out Test-Time Adaptation (SoTTA), a novel TTA algorithm that is robust to noisy samples. The key enabler of SoTTA is two-fold: (i) input-wise robustness via high-confidence uniform-class sampling that effectively filters out the impact of noisy samples and (ii) parameter-wise robustness via entropy-sharpness minimization that improves the robustness of model parameters against large gradients from noisy samples. Our evaluation with standard TTA benchmarks with various noisy scenarios shows that our method outperforms state-of-the-art TTA methods under the presence of noisy samples and achieves comparable accuracy to those methods without noisy samples. The source code is available at https://github.com/taeckyung/SoTTA .
    摘要 测试时自适应(TTA)旨在仅利用无标注的测试数据流进行持续模型自适应,以应对训练与测试数据之间的分布偏移。然而,大多数 TTA 方法假设测试流是"良性"的,而真实环境中的测试样本可能出乎意料地多样,例如自动驾驶中可能出现未见过的物体或噪声,这对现有 TTA 算法构成了新的威胁:我们发现既有 TTA 算法会盲目地去适应这些带噪样本,从而性能受损。为此,我们提出了对噪声样本具有鲁棒性的新 TTA 算法 SoTTA(Screening-out Test-Time Adaptation),其关键由两部分组成:(i)输入层面的鲁棒性,通过高置信度的类均匀采样,有效过滤噪声样本的影响;(ii)参数层面的鲁棒性,通过熵-锐度最小化,提升模型参数对噪声样本产生的大梯度的鲁棒性。在带有多种噪声场景的标准 TTA 基准上的评估表明,该方法在存在噪声样本时优于最新的 TTA 方法,在无噪声样本时也能达到与这些方法相当的精度。源代码见 https://github.com/taeckyung/SoTTA 。
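
A minimal numpy sketch of the input-wise enabler described above, high-confidence uniform-class sampling: keep only confident test samples, capped per predicted class, before adaptation. The entropy-sharpness minimization step and SoTTA's actual memory-bank management are not shown; the threshold and per-class cap are illustrative assumptions.

```python
import numpy as np

def high_confidence_uniform_sample(probs, conf_threshold=0.9, per_class=4):
    """Indices of confident test samples, capped per predicted class."""
    conf, preds = probs.max(axis=1), probs.argmax(axis=1)
    keep = []
    for c in np.unique(preds):
        idx = np.where((preds == c) & (conf >= conf_threshold))[0]
        keep.extend(idx[:per_class])                     # class-balanced memory
    return np.array(keep, dtype=int)

rng = np.random.default_rng(0)
logits = rng.normal(scale=3.0, size=(64, 10))            # stand-in model outputs on a test batch
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
selected = high_confidence_uniform_sample(probs)
print(f"kept {len(selected)} of {len(probs)} incoming test samples for adaptation")
```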

Data Augmentation for Time-Series Classification: An Extensive Empirical Study and Comprehensive Survey

  • paper_url: http://arxiv.org/abs/2310.10060
  • repo_url: None
  • paper_authors: Zijun Gao, Lingbo Li, Tianhua Xu
  • for: The paper aims to provide a comprehensive review of data augmentation (DA) techniques for time series classification (TSC) and to develop a novel taxonomy for categorizing these techniques.
  • methods: The paper uses an extensive literature review and a rigorous analysis of over 100 scholarly articles to identify and categorize more than 60 unique DA techniques for TSC. It also employs an all-encompassing empirical assessment using 8 UCR time-series datasets and ResNet to evaluate the performance of various DA strategies.
  • results: The paper reports a benchmark accuracy of 88.94 +- 11.83% using a multi-faceted evaluation paradigm that includes Accuracy, Method Ranking, and Residual Analysis, highlights the inconsistent efficacies of DA techniques for TSC, and underscores the need for a robust navigational aid for scholars to select appropriate methods.
    Abstract Data Augmentation (DA) has emerged as an indispensable strategy in Time Series Classification (TSC), primarily due to its capacity to amplify training samples, thereby bolstering model robustness, diversifying datasets, and curtailing overfitting. However, the current landscape of DA in TSC is plagued with fragmented literature reviews, nebulous methodological taxonomies, inadequate evaluative measures, and a dearth of accessible, user-oriented tools. In light of these challenges, this study embarks on an exhaustive dissection of DA methodologies within the TSC realm. Our initial approach involved an extensive literature review spanning a decade, revealing that contemporary surveys scarcely capture the breadth of advancements in DA for TSC, prompting us to meticulously analyze over 100 scholarly articles to distill more than 60 unique DA techniques. This rigorous analysis precipitated the formulation of a novel taxonomy, purpose-built for the intricacies of DA in TSC, categorizing techniques into five principal echelons: Transformation-Based, Pattern-Based, Generative, Decomposition-Based, and Automated Data Augmentation. Our taxonomy promises to serve as a robust navigational aid for scholars, offering clarity and direction in method selection. Addressing the conspicuous absence of holistic evaluations for prevalent DA techniques, we executed an all-encompassing empirical assessment, wherein upwards of 15 DA strategies were subjected to scrutiny across 8 UCR time-series datasets, employing ResNet and a multi-faceted evaluation paradigm encompassing Accuracy, Method Ranking, and Residual Analysis, yielding a benchmark accuracy of 88.94 +- 11.83%. Our investigation underscored the inconsistent efficacies of DA techniques, with...
    摘要 数据增强(DA)已成为时序分类(TSC)中不可或缺的策略,主要因为它能够扩充训练样本,从而增强模型鲁棒性、丰富数据集多样性并抑制过拟合。然而,TSC 领域中 DA 的研究现状存在文献综述零散、方法分类体系模糊、评价手段不足以及缺乏面向用户的易用工具等问题。针对这些挑战,本研究对 TSC 领域的 DA 方法进行了详尽剖析。我们首先进行了跨越十年的广泛文献回顾,发现现有综述几乎未能覆盖 TSC 领域 DA 的全部进展,于是仔细分析了 100 多篇学术论文,提炼出 60 余种不同的 DA 技术。在此基础上,我们提出了一个专为 TSC 中 DA 的特点而设计的新分类法,将技术划分为五大类:基于变换、基于模式、生成式、基于分解和自动化数据增强。该分类法有望成为学者选择方法时的有力导航。针对主流 DA 技术缺乏整体评估的问题,我们开展了一项全面的实证评估:在 8 个 UCR 时序数据集上,采用 ResNet 与包含准确率、方法排名和残差分析的多维评价范式,对 15 种以上 DA 策略进行了考察,得到 88.94 ± 11.83% 的基准准确率。我们的调查表明,各 DA 技术的效果并不一致,...
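
A small numpy sketch of three transformation-based augmentations of the kind covered by the survey's taxonomy (jittering, magnitude scaling, and window slicing with resampling). Default parameter values are illustrative choices, not the paper's benchmark settings.

```python
import numpy as np

def jitter(x, sigma=0.03, rng=None):
    rng = rng or np.random.default_rng()
    return x + rng.normal(scale=sigma, size=x.shape)     # add small Gaussian noise

def magnitude_scale(x, sigma=0.1, rng=None):
    rng = rng or np.random.default_rng()
    return x * rng.normal(loc=1.0, scale=sigma)          # rescale the whole series

def window_slice(x, ratio=0.9, rng=None):
    rng = rng or np.random.default_rng()
    n = int(len(x) * ratio)
    start = rng.integers(0, len(x) - n + 1)
    sliced = x[start:start + n]                          # crop a window ...
    return np.interp(np.linspace(0, n - 1, len(x)), np.arange(n), sliced)  # ... and resample

t = np.linspace(0, 4 * np.pi, 128)
series = np.sin(t)
augmented = [jitter(series), magnitude_scale(series), window_slice(series)]
print([a.shape for a in augmented])
```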

Latent Conservative Objective Models for Data-Driven Crystal Structure Prediction

  • paper_url: http://arxiv.org/abs/2310.10056
  • repo_url: None
  • paper_authors: Han Qi, Xinyang Geng, Stefano Rando, Iku Ohama, Aviral Kumar, Sergey Levine
  • for: 这篇论文旨在提出一种数据驱动的晶体结构预测方法,以降低晶体结构搜索的计算成本。
  • methods: 论文提出了名为 LCOMs(latent conservative objective models,潜在保守目标模型)的方法:先用最先进的晶体图扩散自编码器(CD-VAE)将晶体结构编码到向量化的隐空间作为搜索空间,然后在该表示之上训练并优化一个保守的晶体形成能代理模型。
  • results: 结果表明,LCOMs 在结构预测成功率上与当前最优方法相当,同时大幅降低了计算成本。
    Abstract In computational chemistry, crystal structure prediction (CSP) is an optimization problem that involves discovering the lowest energy stable crystal structure for a given chemical formula. This problem is challenging as it requires discovering globally optimal designs with the lowest energies on complex manifolds. One approach to tackle this problem involves building simulators based on density functional theory (DFT) followed by running search in simulation, but these simulators are painfully slow. In this paper, we study present and study an alternate, data-driven approach to crystal structure prediction: instead of directly searching for the most stable structures in simulation, we train a surrogate model of the crystal formation energy from a database of existing crystal structures, and then optimize this model with respect to the parameters of the crystal structure. This surrogate model is trained to be conservative so as to prevent exploitation of its errors by the optimizer. To handle optimization in the non-Euclidean space of crystal structures, we first utilize a state-of-the-art graph diffusion auto-encoder (CD-VAE) to convert a crystal structure into a vector-based search space and then optimize a conservative surrogate model of the crystal energy, trained on top of this vector representation. We show that our approach, dubbed LCOMs (latent conservative objective models), performs comparably to the best current approaches in terms of success rate of structure prediction, while also drastically reducing computational cost.
    摘要 在计算化学中,晶体结构预测(CSP)是一个优化问题:为给定化学式找到能量最低的稳定晶体结构。该问题具有挑战性,因为需要在复杂的流形上找到能量全局最优的设计。一种思路是基于密度泛函理论(DFT)构建模拟器并在模拟中进行搜索,但这类模拟器极其缓慢。本文研究另一种数据驱动的晶体结构预测方法:不直接在模拟中搜索最稳定的结构,而是利用已有晶体结构数据库训练一个晶体形成能的代理模型,再针对晶体结构参数优化该模型。该代理模型被训练得尽量保守,以防止优化器利用其误差。为了在晶体结构这一非欧空间中进行优化,我们首先使用最先进的图扩散自编码器(CD-VAE)将晶体结构转换为向量化的搜索空间,然后在该向量表示之上优化一个保守的晶体能量代理模型。我们的方法称为 LCOMs(latent conservative objective models,潜在保守目标模型),其结构预测成功率与当前最优方法相当,同时大幅降低了计算成本。

Symmetrical SyncMap for Imbalanced General Chunking Problems

  • paper_url: http://arxiv.org/abs/2310.10045
  • repo_url: None
  • paper_authors: Heng Zhang, Danilo Vasconcellos Vargas
  • for: 本研究旨在学习从序列中检索复杂结构,并适应任何结构变化。
  • methods: 本研究仅使用受神经元群体行为启发的非线性动力学方程,而不使用损失函数。
  • results: 我们的算法在 12 个不同难度的不均衡连续一般分块问题(CGCP)中表现出色,超过或持平其他最先进的无监督基线。在真实场景的结构学习实验中,本方法在 4 个场景中的 3 个上大幅领先,表明对称激活有助于揭示时间数据中蕴含的拓扑结构乃至层次结构。
    Abstract Recently, SyncMap pioneered an approach to learn complex structures from sequences as well as adapt to any changes in underlying structures. This is achieved by using only nonlinear dynamical equations inspired by neuron group behaviors, i.e., without loss functions. Here we propose Symmetrical SyncMap that goes beyond the original work to show how to create dynamical equations and attractor-repeller points which are stable over the long run, even dealing with imbalanced continual general chunking problems (CGCPs). The main idea is to apply equal updates from negative and positive feedback loops by symmetrical activation. We then introduce the concept of memory window to allow for more positive updates. Our algorithm surpasses or ties other unsupervised state-of-the-art baselines in all 12 imbalanced CGCPs with various difficulties, including dynamically changing ones. To verify its performance in real-world scenarios, we conduct experiments on several well-studied structure learning problems. The proposed method surpasses substantially other methods in 3 out of 4 scenarios, suggesting that symmetrical activation plays a critical role in uncovering topological structures and even hierarchies encoded in temporal data.
    摘要 最近,SyncMap 开创了一种从序列中学习复杂结构、并能适应底层结构变化的方法:它仅使用受神经元群体行为启发的非线性动力学方程,而不使用损失函数。在本研究中,我们提出了超越原始工作的对称 SyncMap(Symmetrical SyncMap),展示了如何构造在长期内保持稳定的动力学方程与吸引-排斥点,即便面对不均衡的连续一般分块问题(CGCP)也是如此。其核心思想是通过对称激活,使正、负反馈回路施加等量的更新;我们还引入了记忆窗口的概念,以允许更多的正向更新。我们的算法在 12 个不同难度的不均衡 CGCP(包括动态变化的情形)上均超过或持平其他最先进的无监督基线。为了验证其在真实场景中的表现,我们在若干经典的结构学习问题上进行了实验,所提方法在 4 个场景中的 3 个上大幅领先,表明对称激活在揭示时间数据中编码的拓扑结构乃至层次结构方面起到关键作用。

TpopT: Efficient Trainable Template Optimization on Low-Dimensional Manifolds

  • paper_url: http://arxiv.org/abs/2310.10039
  • repo_url: None
  • paper_authors: Jingkai Yan, Shiyu Wang, Xinyu Rain Wei, Jimmy Wang, Zsuzsanna Márka, Szabolcs Márka, John Wright
  • for: 检测低维度信号家族
  • methods: 使用 TemPlate OPTimization 框架,combined with embedding和kernel interpolation,提高计算效率
  • results: 在 gravitational wave detection 和手写数据上显示了明显的性能改善,并且可以替换现有的 matched filtering 方法
    Abstract In scientific and engineering scenarios, a recurring task is the detection of low-dimensional families of signals or patterns. A classic family of approaches, exemplified by template matching, aims to cover the search space with a dense template bank. While simple and highly interpretable, it suffers from poor computational efficiency due to unfavorable scaling in the signal space dimensionality. In this work, we study TpopT (TemPlate OPTimization) as an alternative scalable framework for detecting low-dimensional families of signals which maintains high interpretability. We provide a theoretical analysis of the convergence of Riemannian gradient descent for TpopT, and prove that it has a superior dimension scaling to covering. We also propose a practical TpopT framework for nonparametric signal sets, which incorporates techniques of embedding and kernel interpolation, and is further configurable into a trainable network architecture by unrolled optimization. The proposed trainable TpopT exhibits significantly improved efficiency-accuracy tradeoffs for gravitational wave detection, where matched filtering is currently a method of choice. We further illustrate the general applicability of this approach with experiments on handwritten digit data.
    摘要 在科学和工程应用中,检测低维度信号家族是一项常复现的任务。经典的方法之一是模板匹配,它在搜索空间使用密集的模板银行,但它的计算效率受到信号空间维度的不利影响。在这项工作中,我们研究TpopT(模板优化)作为一种可扩展的搜索框架,可以快速检测低维度信号家族,同时保持高度可读性。我们提供了TpopT的理论分析,证明它在维度上有更好的缩放性。此外,我们还提出了一种实用的TpopT框架,用于非参数式信号集,该框架包括投影和核函数 interpolate 技术,并可以通过不断的优化来转化为可训练的网络结构。我们的可训练TpopT在探测 gravitational wave 方面表现出了明显的效率-准确性融合优势,现在matched filtering 是选择的方法。此外,我们还通过对手写数据进行实验,证明了这种方法的通用性。
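
A toy torch sketch contrasting a dense template bank with gradient-based template optimization over a one-parameter signal family (sinusoids indexed by frequency): the same matched-filter statistic is either evaluated on a grid or ascended with respect to the template parameter. The signal family, noise level, and optimizer settings are illustrative stand-ins, not the paper's TpopT configuration.

```python
import math
import torch

t = torch.linspace(0, 1, 512)
def template(freq):                                      # one-parameter signal family
    return torch.sin(2 * math.pi * freq * t)

true_freq = 7.3
observation = template(torch.tensor(true_freq)) + 0.3 * torch.randn(512)

# template bank: evaluate a fixed grid of templates (cost grows with the grid)
grid = torch.arange(1.0, 15.0, 0.5)
bank_scores = torch.stack([(template(f) * observation).mean() for f in grid])
print("bank estimate     :", float(grid[bank_scores.argmax()]))

# TpopT-style alternative: ascend the matched-filter objective over the parameter
freq = torch.tensor(7.0, requires_grad=True)
opt = torch.optim.Adam([freq], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    loss = -(template(freq) * observation).mean()        # maximize the correlation
    loss.backward()
    opt.step()
print("optimized estimate:", round(float(freq), 2))
```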

Unraveling Fundamental Properties of Power System Resilience Curves using Unsupervised Machine Learning

  • paper_url: http://arxiv.org/abs/2310.10030
  • repo_url: None
  • paper_authors: Bo Li, Ali Mostafavi
  • for: 这个研究旨在刻画和量化基础设施的韧性(resilience)特性。
  • methods: 这个研究使用无监督机器学习,分析了三次重大极端天气事件中与停电相关的 200 多条韧性曲线。
  • results: 研究发现了两类基本的电力系统韧性曲线范型:三角形曲线和梯形曲线。三角形曲线由临界功能阈值、临界功能恢复速率和恢复转折点刻画;梯形曲线由持续功能损失时长和恒定恢复速率刻画,且持续功能损失时间越长,恒定恢复速率越慢。这些发现有助于更好地理解和预测电力系统基础设施的韧性表现。
    Abstract The standard model of infrastructure resilience, the resilience triangle, has been the primary way of characterizing and quantifying infrastructure resilience. However, the theoretical model merely provides a one-size-fits-all framework for all infrastructure systems. Most of the existing studies examine the characteristics of infrastructure resilience curves based on analytical models constructed upon simulated system performance. Limited empirical studies hindered our ability to fully understand and predict resilience characteristics in infrastructure systems. To address this gap, this study examined over 200 resilience curves related to power outages in three major extreme weather events. Using unsupervised machine learning, we examined different curve archetypes, as well as the fundamental properties of each resilience curve archetype. The results show two primary archetypes for power system resilience curves, triangular, and trapezoidal curves. Triangular curves characterize resilience behavior based on 1. critical functionality threshold, 2. critical functionality recovery rate, and 3. recovery pivot point. Trapezoidal archetypes explain resilience curves based on 1. duration of sustained function loss and 2. constant recovery rate. The longer the duration of sustained function loss, the slower the constant rate of recovery. The findings of this study provide novel perspectives enabling better understanding and prediction of resilience performance of power system infrastructures.
    摘要 基础设施韧性的标准模型,即韧性三角形模型,一直是刻画和量化基础设施韧性的主要方式。然而,这一理论模型只是为所有基础设施系统提供了一个"一刀切"的框架。现有研究大多基于对模拟系统性能构建的解析模型来考察基础设施韧性曲线的特征,实证研究的缺乏限制了我们充分理解和预测基础设施系统韧性特征的能力。为弥补这一空白,本研究考察了三次重大极端天气事件中与停电相关的 200 多条韧性曲线。通过无监督机器学习,我们研究了不同的曲线范型及其基本性质。结果表明,电力系统韧性曲线主要有两类范型:三角形曲线和梯形曲线。三角形曲线由以下要素刻画:1)临界功能阈值,2)临界功能恢复速率,3)恢复转折点;梯形范型则由 1)持续功能损失时长和 2)恒定恢复速率来解释,且持续功能损失时间越长,恒定恢复速率越慢。本研究的发现提供了新的视角,有助于更好地理解和预测电力系统基础设施的韧性表现。
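
A toy sketch of the unsupervised step described above: represent each outage as a discretized functionality-over-time curve and cluster the curves, which separates immediately-recovering (triangular) shapes from sustained-loss (trapezoidal) ones. The synthetic curves, their parameters, and the choice of k-means with two clusters are illustrative assumptions, not the paper's pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans

def triangular(drop, recovery_rate, T=48):
    """Functionality drops and starts recovering immediately."""
    f = np.ones(T); f[0] = 1 - drop
    for i in range(1, T):
        f[i] = min(1.0, f[i - 1] + recovery_rate)
    return f

def trapezoidal(drop, hold, recovery_rate, T=48):
    """Functionality stays at a sustained loss before a constant-rate recovery."""
    f = np.full(T, 1 - drop)
    for i in range(hold, T):
        f[i] = min(1.0, f[i - 1] + recovery_rate)
    return f

rng = np.random.default_rng(0)
curves = [triangular(rng.uniform(0.3, 0.8), rng.uniform(0.02, 0.1)) for _ in range(50)]
curves += [trapezoidal(rng.uniform(0.3, 0.8), int(rng.integers(10, 25)), rng.uniform(0.02, 0.1))
           for _ in range(50)]
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(np.array(curves))
print(np.bincount(labels))   # the two curve archetypes emerge as two clusters
```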

Data-Driven Score-Based Models for Generating Stable Structures with Adaptive Crystal Cells

  • paper_url: http://arxiv.org/abs/2310.10695
  • repo_url: https://github.com/findooshka/diffusion-atoms
  • paper_authors: Arsen Sultanov, Jean-Claude Crivello, Tabea Rebafka, Nataliya Sokolovska
  • for: 本研究旨在通过机器学习生成模型,找到新的功能性和稳定性的材料。
  • methods: 该研究将基于退火朗之万动力学的得分(score-based)概率生成模型应用于晶体生成;晶格不再固定,而是在训练中从数据里学习,并在生成新结构时用两个并行的去噪过程分别生成晶格和原子位置。
  • results: 研究人员通过对不同化学系统和晶体群进行比较,表明了他们的模型能够在不需要额外训练的情况下,生成新的候选结构。
    Abstract The discovery of new functional and stable materials is a big challenge due to its complexity. This work aims at the generation of new crystal structures with desired properties, such as chemical stability and specified chemical composition, by using machine learning generative models. Compared to the generation of molecules, crystal structures pose new difficulties arising from the periodic nature of the crystal and from the specific symmetry constraints related to the space group. In this work, score-based probabilistic models based on annealed Langevin dynamics, which have shown excellent performance in various applications, are adapted to the task of crystal generation. The novelty of the presented approach resides in the fact that the lattice of the crystal cell is not fixed. During the training of the model, the lattice is learned from the available data, whereas during the sampling of a new chemical structure, two denoising processes are used in parallel to generate the lattice along the generation of the atomic positions. A multigraph crystal representation is introduced that respects symmetry constraints, yielding computational advantages and a better quality of the sampled structures. We show that our model is capable of generating new candidate structures in any chosen chemical system and crystal group without any additional training. To illustrate the functionality of the proposed method, a comparison of our model to other recent generative models, based on descriptor-based metrics, is provided.
    摘要 由于问题本身的复杂性,发现新的功能性且稳定的材料是一项巨大挑战。本工作旨在利用机器学习生成模型,生成具有期望性质(如化学稳定性和指定化学成分)的新晶体结构。与分子生成相比,晶体结构生成面临新的困难:晶体的周期性以及与空间群相关的特殊对称性约束。本工作将基于退火朗之万动力学、在多种应用中表现出色的得分(score-based)概率模型适配到晶体生成任务。所提方法的新颖之处在于晶胞的晶格并不固定:训练时晶格从数据中学习,而在采样新化学结构时,两个并行的去噪过程分别生成晶格与原子位置。我们还引入了满足对称性约束的多重图(multigraph)晶体表示,带来计算上的优势并提升采样结构的质量。我们证明该模型无需额外训练即可在任意选定的化学体系和晶体群中生成新的候选结构。为了说明所提方法的可用性,我们基于描述符类指标将其与其他近期生成模型进行了比较。
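
A toy numpy sketch of annealed Langevin dynamics, the sampling scheme underlying the score-based model described above, using the analytic score of a 1-D Gaussian mixture in place of a learned network. The noise schedule and step sizes are illustrative; the paper's model additionally denoises lattice parameters and atomic positions jointly under symmetry constraints, which is not shown.

```python
import numpy as np

def mixture_score(x, sigma, means=(-4.0, 4.0)):
    """Analytic score of a two-component Gaussian mixture smoothed with N(0, sigma^2)."""
    var = 1.0 + sigma ** 2
    w = np.stack([np.exp(-(x - m) ** 2 / (2 * var)) for m in means])
    w /= w.sum(axis=0)
    return sum(wi * (m - x) / var for wi, m in zip(w, means))

rng = np.random.default_rng(0)
x = rng.normal(scale=8.0, size=5000)                     # start from a broad prior
for sigma in np.geomspace(5.0, 0.1, 10):                 # annealed noise levels
    step = 0.1 * sigma ** 2
    for _ in range(30):                                  # Langevin updates at this level
        x = x + step * mixture_score(x, sigma) + np.sqrt(2 * step) * rng.normal(size=x.shape)
print(np.round([x[x < 0].mean(), x[x > 0].mean()], 2))   # samples concentrate near the two modes
```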

Riemannian Residual Neural Networks

  • paper_url: http://arxiv.org/abs/2310.10013
  • repo_url: None
  • paper_authors: Isay Katsman, Eric Ming Chen, Sidhanth Holalkere, Anna Asch, Aaron Lou, Ser-Nam Lim, Christopher De Sa
  • for: 本研究旨在以几何上有原则的方式,将常见的欧几里得残差神经网络(ResNet)推广到一般的黎曼流形,以便对自然科学中常见的流形值数据以及具有层次结构的图数据进行学习。
  • methods: 论文以几何上有原则的方式将残差结构推广到一般黎曼流形,使网络的中间表示始终保持在流形之上。
  • results: 实验表明,与专为双曲空间和对称正定矩阵流形设计的已有流形神经网络相比,所提的黎曼 ResNet 在相关测试指标和训练动态上均表现更好。
    Abstract Recent methods in geometric deep learning have introduced various neural networks to operate over data that lie on Riemannian manifolds. Such networks are often necessary to learn well over graphs with a hierarchical structure or to learn over manifold-valued data encountered in the natural sciences. These networks are often inspired by and directly generalize standard Euclidean neural networks. However, extending Euclidean networks is difficult and has only been done for a select few manifolds. In this work, we examine the residual neural network (ResNet) and show how to extend this construction to general Riemannian manifolds in a geometrically principled manner. Originally introduced to help solve the vanishing gradient problem, ResNets have become ubiquitous in machine learning due to their beneficial learning properties, excellent empirical results, and easy-to-incorporate nature when building varied neural networks. We find that our Riemannian ResNets mirror these desirable properties: when compared to existing manifold neural networks designed to learn over hyperbolic space and the manifold of symmetric positive definite matrices, we outperform both kinds of networks in terms of relevant testing metrics and training dynamics.
    摘要 几何深度学习的近期方法引入了多种在黎曼流形上操作数据的神经网络。此类网络常用于学习具有层次结构的图,或学习自然科学中遇到的流形值数据;它们通常受标准欧几里得神经网络启发并对其直接推广。然而,对欧几里得网络的这种推广并不容易,目前只在少数几类流形上实现。本文研究残差神经网络(ResNet),并展示如何以几何上有原则的方式将其推广到一般的黎曼流形。ResNet 最初被提出用于缓解梯度消失问题,因其良好的学习性质、出色的实验效果以及易于嵌入各种网络结构的特点,已在机器学习中无处不在。我们发现所提的黎曼 ResNet 同样具备这些优点:与专为双曲空间和对称正定矩阵流形设计的已有流形神经网络相比,我们的方法在相关测试指标和训练动态上均更优。
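
A minimal numpy sketch of one standard way to realize a Riemannian residual update, shown on the unit sphere: each "layer" produces a tangent vector at the current point and the step is taken with the exponential map, so iterates stay on the manifold. The tangent map here is a random layer purely for illustration and is not claimed to match the paper's construction.

```python
import numpy as np

def project_to_tangent(x, v):
    """Remove the normal component so v lies in the tangent space at x (unit sphere)."""
    return v - (v @ x) * x

def sphere_exp(x, v):
    """Exponential map on the unit sphere."""
    norm = np.linalg.norm(v)
    if norm < 1e-12:
        return x
    return np.cos(norm) * x + np.sin(norm) * (v / norm)

rng = np.random.default_rng(0)
dim, n_layers = 5, 4
weights = [0.1 * rng.normal(size=(dim, dim)) for _ in range(n_layers)]

x = rng.normal(size=dim); x /= np.linalg.norm(x)          # input point on the manifold
for W in weights:                                         # "residual" blocks
    v = project_to_tangent(x, np.tanh(W @ x))             # tangent vector produced by the layer
    x = sphere_exp(x, v)                                  # x_{k+1} = exp_{x_k}(v_k)
print(np.linalg.norm(x))                                  # iterates stay on the sphere: ~1.0
```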

Ring-A-Bell! How Reliable are Concept Removal Methods for Diffusion Models?

  • paper_url: http://arxiv.org/abs/2310.10012
  • repo_url: None
  • paper_authors: Yu-Lin Tsai, Chia-Yi Hsu, Chulin Xie, Chih-Hsun Lin, Jia-You Chen, Bo Li, Pin-Yu Chen, Chia-Mu Yu, Chun-Ying Huang
  • for: 本研究旨在调查 diffusion models 的安全机制,以确保它们不会生成不适或有害内容。
  • methods: 我们提出了一种新的概念检索算法,可以评估 diffusion models 的安全性。该算法首先提取敏感或不适的概念,然后使用这些概念来自动标识 diffusion models 中可能生成不适内容的提问。
  • results: 我们的研究表明, Ring-A-Bell 可以 manipulate 安全提问 benchmarks,使得原本被视为安全的提问可以逃脱现有的安全机制,并生成不适或有害内容。这表明现有的安全机制并不够,需要进一步改进。
    Abstract Diffusion models for text-to-image (T2I) synthesis, such as Stable Diffusion (SD), have recently demonstrated exceptional capabilities for generating high-quality content. However, this progress has raised several concerns of potential misuse, particularly in creating copyrighted, prohibited, and restricted content, or NSFW (not safe for work) images. While efforts have been made to mitigate such problems, either by implementing a safety filter at the evaluation stage or by fine-tuning models to eliminate undesirable concepts or styles, the effectiveness of these safety measures in dealing with a wide range of prompts remains largely unexplored. In this work, we aim to investigate these safety mechanisms by proposing one novel concept retrieval algorithm for evaluation. We introduce Ring-A-Bell, a model-agnostic red-teaming tool for T2I diffusion models, where the whole evaluation can be prepared in advance without prior knowledge of the target model. Specifically, Ring-A-Bell first performs concept extraction to obtain holistic representations for sensitive and inappropriate concepts. Subsequently, by leveraging the extracted concept, Ring-A-Bell automatically identifies problematic prompts for diffusion models with the corresponding generation of inappropriate content, allowing the user to assess the reliability of deployed safety mechanisms. Finally, we empirically validate our method by testing online services such as Midjourney and various methods of concept removal. Our results show that Ring-A-Bell, by manipulating safe prompting benchmarks, can transform prompts that were originally regarded as safe to evade existing safety mechanisms, thus revealing the defects of the so-called safety mechanisms which could practically lead to the generation of harmful contents.
    摘要 用于文本到图像(T2I)合成的扩散模型(如 Stable Diffusion,SD)最近展现出了生成高质量内容的卓越能力。然而,这一进展也引发了对潜在滥用的担忧,特别是生成受版权保护、被禁止或受限制的内容,以及 NSFW(不适合工作场合)图像。尽管已有工作尝试缓解这些问题,例如在评估阶段加入安全过滤器,或通过微调模型来消除不良概念或风格,但这些安全机制在面对各式提示词时的有效性仍缺乏充分研究。在本工作中,我们提出了一种新的概念检索算法来评估这些安全机制。我们提出了 Ring-A-Bell,一种与模型无关的 T2I 扩散模型红队工具,整个评估可在不了解目标模型的情况下提前准备。具体而言,Ring-A-Bell 先进行概念提取,得到敏感或不当概念的整体表示;随后利用提取的概念自动识别会使扩散模型生成不当内容的问题提示词,从而帮助用户评估已部署安全机制的可靠性。最后,我们在 Midjourney 等在线服务以及多种概念移除方法上进行了实证验证。结果表明,Ring-A-Bell 通过操纵安全提示基准,可以把原本被认为安全的提示词改造成能够绕过现有安全机制的提示,从而暴露出这些所谓安全机制的缺陷,这在实践中可能导致有害内容的生成。

Implicit regularization via soft ascent-descent

  • paper_url: http://arxiv.org/abs/2310.10006
  • repo_url: https://github.com/feedbackward/bdd-flood
  • paper_authors: Matthew J. Holland, Kosuke Nakatani
  • for: 提高机器学习过程中的OFF-sample泛化性能,避免过多的试错和重复。
  • methods: 使用Gradient Regularization的softened、点 wise机制,以降低边缘点的影响和抑制异常值的影响。
  • results: 与SAM和Flooding相比,SoftAD可以实现类比的分类精度,同时具有远小的损失泛化差和模型评价。
    Abstract As models grow larger and more complex, achieving better off-sample generalization with minimal trial-and-error is critical to the reliability and economy of machine learning workflows. As a proxy for the well-studied heuristic of seeking "flat" local minima, gradient regularization is a natural avenue, and first-order approximations such as Flooding and sharpness-aware minimization (SAM) have received significant attention, but their performance depends critically on hyperparameters (flood threshold and neighborhood radius, respectively) that are non-trivial to specify in advance. In order to develop a procedure which is more resilient to misspecified hyperparameters, with the hard-threshold "ascent-descent" switching device used in Flooding as motivation, we propose a softened, pointwise mechanism called SoftAD that downweights points on the borderline, limits the effects of outliers, and retains the ascent-descent effect. We contrast formal stationarity guarantees with those for Flooding, and empirically demonstrate how SoftAD can realize classification accuracy competitive with SAM and Flooding while maintaining a much smaller loss generalization gap and model norm. Our empirical tests range from simple binary classification on the plane to image classification using neural networks with millions of parameters; the key trends are observed across all datasets and models studied, and suggest a potential new approach to implicit regularization.
    摘要 随着模型规模和复杂度的增长,以最少的试错获得更好的样本外泛化,对机器学习工作流程的可靠性和经济性至关重要。作为寻找"平坦"局部极小值这一启发式的替代,梯度正则化是一条自然的途径,其一阶近似方法如 Flooding 和 sharpness-aware minimization(SAM)受到了广泛关注,但它们的性能严重依赖于事先难以设定的超参数(分别是 flood 阈值和邻域半径)。为了得到对超参数设定错误更加稳健的方法,我们以 Flooding 中使用的硬阈值"上升-下降"切换机制为出发点,提出了一种软化的逐点机制 SoftAD:它降低处于边界附近的样本的权重,限制离群点的影响,同时保留上升-下降效应。我们将其形式化的平稳性保证与 Flooding 进行对比,并通过实验证明 SoftAD 能够取得与 SAM 和 Flooding 相当的分类精度,同时保持小得多的损失泛化差距和模型范数。我们的实验涵盖了从平面上的简单二分类到使用数百万参数神经网络的图像分类;在所有数据集和模型上都观察到一致的趋势,表明这可能是一种新的隐式正则化途径。
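
A small torch sketch contrasting hard-threshold flooding with one plausible softened, pointwise ascent-descent weighting (a smooth tanh switch that downweights borderline points). The softened form is illustrative only and is not claimed to be the paper's exact SoftAD objective; the threshold and temperature are assumed values.

```python
import torch
import torch.nn.functional as F

def flooding_loss(losses, flood_level=0.1):
    """Flooding: descend when the mean loss is above the threshold, ascend below it."""
    mean = losses.mean()
    return (mean - flood_level).abs() + flood_level

def soft_ascent_descent_loss(losses, threshold=0.1, temperature=0.05):
    """Pointwise soft switch: borderline points get small weight; the sign sets ascent vs. descent."""
    switch = torch.tanh((losses - threshold) / temperature).detach()
    return (switch * losses).mean()

logits = torch.randn(32, 10, requires_grad=True)
targets = torch.randint(0, 10, (32,))
per_point = F.cross_entropy(logits, targets, reduction="none")
print(float(flooding_loss(per_point)), float(soft_ascent_descent_loss(per_point)))
```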

Conformal Contextual Robust Optimization

  • paper_url: http://arxiv.org/abs/2310.10003
  • repo_url: None
  • paper_authors: Yash Patel, Sahana Rayan, Ambuj Tewari
  • for: 这篇论文面向"先预测后优化"的决策问题,旨在用数据驱动的方法降低安全关键场景中不确定性区域设定错误带来的风险,从而改进决策。
  • methods: 论文提出 CPO 框架,基于条件生成模型在高维空间中构建信息量高、可为非凸的保形(conformal)预测区域,并具有无分布假设的覆盖率保证;同时为不确定性区域提供语义上有意义的可视化摘要,以便定性地解释最优决策。
  • results: 研究人员在一系列基于模拟的推断基准任务以及基于概率天气预测的车辆路径规划问题上展示了 CPO 框架的效果。
    Abstract Data-driven approaches to predict-then-optimize decision-making problems seek to mitigate the risk of uncertainty region misspecification in safety-critical settings. Current approaches, however, suffer from considering overly conservative uncertainty regions, often resulting in suboptimal decisionmaking. To this end, we propose Conformal-Predict-Then-Optimize (CPO), a framework for leveraging highly informative, nonconvex conformal prediction regions over high-dimensional spaces based on conditional generative models, which have the desired distribution-free coverage guarantees. Despite guaranteeing robustness, such black-box optimization procedures alone inspire little confidence owing to the lack of explanation of why a particular decision was found to be optimal. We, therefore, augment CPO to additionally provide semantically meaningful visual summaries of the uncertainty regions to give qualitative intuition for the optimal decision. We highlight the CPO framework by demonstrating results on a suite of simulation-based inference benchmark tasks and a vehicle routing task based on probabilistic weather prediction.
    摘要 针对"先预测后优化"类决策问题的数据驱动方法,旨在降低安全关键场景中不确定性区域设定错误带来的风险。然而,现有方法往往采用过于保守的不确定性区域,导致次优的决策。为此,我们提出了 Conformal-Predict-Then-Optimize(CPO)框架:它基于条件生成模型,在高维空间中构建信息量高、可以是非凸的保形(conformal)预测区域,并具有无分布假设的覆盖率保证。尽管这样的黑箱优化过程保证了稳健性,但由于缺乏对"为什么该决策是最优的"的解释,仅靠它本身难以令人信服。因此,我们进一步为 CPO 增加了对不确定性区域的语义可解释的可视化摘要,为最优决策提供定性直观。我们在一组基于模拟的推断基准任务以及基于概率天气预测的车辆路径规划任务上展示了 CPO 框架的效果。
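
A minimal numpy sketch of the split-conformal step underlying the framework: calibrate a residual nonconformity score, take its finite-sample quantile to form a prediction region, and pick the decision that minimizes the worst-case cost over that region. The point predictor, score, and decision problem are simplified stand-ins for the conditional-generative, high-dimensional regions used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
predict = lambda x: 2.0 * x                              # stand-in point predictor

# calibration: residual nonconformity scores and their conformal quantile
x_cal = rng.uniform(0, 1, 500)
y_cal = 2.0 * x_cal + rng.normal(scale=0.3, size=500)
scores = np.abs(y_cal - predict(x_cal))
alpha = 0.1
q = np.quantile(scores, np.ceil((len(scores) + 1) * (1 - alpha)) / len(scores))

# prediction region for a new context, then a worst-case decision over the region
x_new = 0.7
region = (predict(x_new) - q, predict(x_new) + q)        # covers y_new w.p. >= 1 - alpha
cost = lambda decision, y: (decision - y) ** 2
decisions = np.linspace(0, 3, 301)
worst_case = np.array([max(cost(d, region[0]), cost(d, region[1])) for d in decisions])
print("region:", np.round(region, 2), "robust decision:", round(decisions[worst_case.argmin()], 2))
```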

Outlier Detection Using Generative Models with Theoretical Performance Guarantees

  • paper_url: http://arxiv.org/abs/2310.09999
  • repo_url: None
  • paper_authors: Jirong Yi, Jingchao Gao, Tianming Wang, Xiaodong Wu, Weiyu Xu
  • for: 这篇论文考虑由生成模型刻画的信号的恢复问题,具体来说是在线性测量被稀疏离群值污染的情况下恢复原始信号。
  • methods: 我们提出了一种离群值检测方法,可在生成模型下重建原始信号,并给出了信号恢复的理论保证(包括可纠正离群值数量的下界);求解上采用基于 $\ell_1$ 范数最小化的 ADMM 算法和基于平方 $\ell_1$ 范数最小化的梯度下降算法。
  • results: 实验结果表明,即使存在稀疏离群值,我们的方法也能成功重建信号,并且优于传统的 Lasso 和 $\ell_2$ 最小化方法。
    Abstract This paper considers the problem of recovering signals modeled by generative models from linear measurements contaminated with sparse outliers. We propose an outlier detection approach for reconstructing the ground-truth signals modeled by generative models under sparse outliers. We establish theoretical recovery guarantees for reconstruction of signals using generative models in the presence of outliers, giving lower bounds on the number of correctable outliers. Our results are applicable to both linear generator neural networks and the nonlinear generator neural networks with an arbitrary number of layers. We propose an iterative alternating direction method of multipliers (ADMM) algorithm for solving the outlier detection problem via $\ell_1$ norm minimization, and a gradient descent algorithm for solving the outlier detection problem via squared $\ell_1$ norm minimization. We conduct extensive experiments using variational auto-encoder and deep convolutional generative adversarial networks, and the experimental results show that the signals can be successfully reconstructed under outliers using our approach. Our approach outperforms the traditional Lasso and $\ell_2$ minimization approach.
    摘要 本文研究在线性测量被稀疏离群值污染的情况下,恢复由生成模型刻画的信号的问题。我们提出了一种离群值检测方法,用于在存在稀疏离群值时重建由生成模型刻画的真实信号,并给出了相应的理论恢复保证,包括可纠正离群值数量的下界。我们的结果既适用于线性生成网络,也适用于任意层数的非线性生成网络。我们提出了两种求解离群值检测问题的算法:一种基于 $\ell_1$ 范数最小化的迭代交替方向乘子法(ADMM),以及一种基于平方 $\ell_1$ 范数最小化的梯度下降法。我们使用变分自编码器和深度卷积生成对抗网络进行了大量实验,结果表明即使存在离群值,所提方法也能成功重建信号,并且优于传统的 Lasso 与 $\ell_2$ 最小化方法。
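
A toy torch sketch of the robustness idea above: fit a generator's latent code by minimizing an l1 residual, which tolerates sparse gross outliers in the measurements, versus the l2 fit that is skewed by them. The "generator" is a fixed random linear map and the optimizer settings are illustrative; the paper's ADMM solver and theoretical guarantees are not reproduced.

```python
import torch

torch.manual_seed(0)
k, d, m = 5, 100, 60
G = torch.randn(d, k)                           # stand-in "generator": x = G z
A = torch.randn(m, d) / m ** 0.5                # linear measurement operator

z_true = torch.randn(k)
outliers = torch.zeros(m); outliers[:5] = 10.0  # a few grossly corrupted measurements
y = A @ (G @ z_true) + outliers

def recover(residual_loss, steps=2000, lr=0.05):
    """Fit the latent code by minimizing a loss on the measurement residual."""
    z = torch.zeros(k, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        residual_loss(A @ (G @ z) - y).backward()
        opt.step()
    return z.detach()

z_l1 = recover(lambda r: r.abs().mean())        # robust to the sparse outliers
z_l2 = recover(lambda r: (r ** 2).mean())       # skewed by them
print("l1 error:", round(float((z_l1 - z_true).norm()), 3),
      "l2 error:", round(float((z_l2 - z_true).norm()), 3))
```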

  • paper_url: http://arxiv.org/abs/2310.09991
  • repo_url: None
  • paper_authors: Thanh Tung Khuat, Robert Bassett, Ellen Otte, Alistair Grevis-James, Bogdan Gabrys
  • for: 本研究旨在提供一个全面的机器学习(ML)解决方案在生物医药领域的应用现状,包括生物产品设计、监测、控制和优化的过程中的应用。
  • methods: 本研究使用的方法包括机器学习模型的采用,以提高生物医药生产过程中的分析、监测和控制能力。
  • results: 本研究结果表明,机器学习模型在生物医药生产过程中的应用可以提高生产效率、产品质量和生产可靠性等方面的表现。同时,本研究还揭示了生物医药过程数据的复杂性和多维性,以及机器学习模型在生物医药过程中的挑战和限制。
    Abstract While machine learning (ML) has made significant contributions to the biopharmaceutical field, its applications are still in the early stages in terms of providing direct support for quality-by-design based development and manufacturing of biopharmaceuticals, hindering the enormous potential for bioprocesses automation from their development to manufacturing. However, the adoption of ML-based models instead of conventional multivariate data analysis methods is significantly increasing due to the accumulation of large-scale production data. This trend is primarily driven by the real-time monitoring of process variables and quality attributes of biopharmaceutical products through the implementation of advanced process analytical technologies. Given the complexity and multidimensionality of a bioproduct design, bioprocess development, and product manufacturing data, ML-based approaches are increasingly being employed to achieve accurate, flexible, and high-performing predictive models to address the problems of analytics, monitoring, and control within the biopharma field. This paper aims to provide a comprehensive review of the current applications of ML solutions in a bioproduct design, monitoring, control, and optimisation of upstream, downstream, and product formulation processes. Finally, this paper thoroughly discusses the main challenges related to the bioprocesses themselves, process data, and the use of machine learning models in biopharmaceutical process development and manufacturing. Moreover, it offers further insights into the adoption of innovative machine learning methods and novel trends in the development of new digital biopharma solutions.
    摘要 机器学习(ML)在生物医药领域已经做出了重要贡献,但是其应用还处于初期阶段,对生物医药生产的质量设计和生产进行直接支持的应用还尚未发挥出大量潜力。然而,由于生产数据的积累,ML模型的应用正在不断增加,取代传统的多变量数据分析方法。这种趋势主要归功于实时监测生产过程中变量和产品质量特征的实施,以及高级进程分析技术的普及。由于生物产品设计、生产和加工数据的复杂性和多维性,ML方法在解决生物过程数据分析、监测和控制方面提供了高精度、灵活性和高性能的预测模型。本文旨在为读者提供生物产品设计、监测、控制和优化过程中机器学习解决方案的全面审视。此外,本文还详细讨论了生物过程本身、数据和机器学习模型在生物医药过程开发和生产中的主要挑战,以及采用创新的机器学习方法和新趋势在生物医药领域的发展。

Personalization of CTC-based End-to-End Speech Recognition Using Pronunciation-Driven Subword Tokenization

  • paper_url: http://arxiv.org/abs/2310.09988
  • repo_url: None
  • paper_authors: Zhihong Lei, Ernest Pusateri, Shiyi Han, Leo Liu, Mingbin Xu, Tim Ng, Ruchir Travadi, Youyuan Zhang, Mirko Hannemann, Man-Hung Siu, Zhen Huang
  • for: 这个论文旨在提高端到端语音识别系统的个性化性,使其能够更准确地识别个人内容,如联系人姓名。
  • methods: 该论文基于连接时序分类(CTC)技术,提出了一种由个人实体的发音生成额外子词切分(subword tokenization)的新方法,并结合上下文偏置与 wordpiece 先验归一化两种已有技术。
  • results: 根据论文的表述,使用这些技术组合后,个人名实体识别精度与一个竞争性hybrid系统相当。
    Abstract Recent advances in deep learning and automatic speech recognition have improved the accuracy of end-to-end speech recognition systems, but recognition of personal content such as contact names remains a challenge. In this work, we describe our personalization solution for an end-to-end speech recognition system based on connectionist temporal classification. Building on previous work, we present a novel method for generating additional subword tokenizations for personal entities from their pronunciations. We show that using this technique in combination with two established techniques, contextual biasing and wordpiece prior normalization, we are able to achieve personal named entity accuracy on par with a competitive hybrid system.
    摘要 深度学习和自动语音识别的最新进展提高了端到端语音识别系统的准确率,但对联系人姓名等个人内容的识别仍是一个挑战。在这项工作中,我们描述了基于连接时序分类(CTC)的端到端语音识别系统的个性化方案。在已有工作的基础上,我们提出了一种由个人实体的发音生成额外子词切分的新方法。我们表明,将该技术与上下文偏置和 wordpiece 先验归一化这两种已有技术结合使用,可以使个人命名实体的识别准确率达到与竞争性混合系统相当的水平。