cs.LG - 2023-11-28

LiveTune: Dynamic Parameter Tuning for Training Deep Neural Networks

  • paper_url: http://arxiv.org/abs/2311.17279
  • repo_url: None
  • paper_authors: Soheil Zibakhsh Shabgahi, Nojan Sheybani, Aiden Tabrizi, Farinaz Koushanfar
  • for: Enables real-time hyperparameter tuning during machine learning training, reducing the overhead of restarts and checkpoints.
  • methods: Uses LiveVariables to adjust parameters on the fly without restarting the training session (see the sketch after the abstract).
  • results: Evaluations show savings of up to 60 seconds of training time and 5.4 kilojoules of energy per hyperparameter change.
    Abstract Traditional machine learning training is a static process that lacks real-time adaptability of hyperparameters. Popular tuning solutions during runtime involve checkpoints and schedulers. Adjusting hyperparameters usually requires the program to be restarted, wasting utilization and time, while placing unnecessary strain on memory and processors. We present LiveTune, a new framework allowing real-time parameter tuning during training through LiveVariables. Live Variables allow for a continuous training session by storing parameters on designated ports on the system, allowing them to be dynamically adjusted. Extensive evaluations of our framework show savings of up to 60 seconds and 5.4 Kilojoules of energy per hyperparameter change.
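
Below is a minimal, hypothetical sketch of the LiveVariable idea from the abstract. LiveTune's actual API is not public in this digest, so the class name, the TCP port protocol, and the plain-float update format are all our assumptions:

```python
import socket
import threading

class LiveVariable:
    """Hypothetical re-implementation: a parameter that can be updated at
    runtime by sending a new value to a designated TCP port."""

    def __init__(self, value: float, port: int):
        self.value = value
        self._server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self._server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        self._server.bind(("localhost", port))
        self._server.listen(1)
        threading.Thread(target=self._listen, daemon=True).start()

    def _listen(self):
        while True:
            conn, _ = self._server.accept()
            with conn:
                data = conn.recv(64)
                try:
                    self.value = float(data.decode())
                except ValueError:
                    pass  # ignore malformed updates

# The training loop polls the live value each step; no restart is needed:
lr = LiveVariable(1e-3, port=5000)
# for batch in loader:
#     optimizer.param_groups[0]["lr"] = lr.value
#     ...
# From another shell:  echo 5e-4 | nc localhost 5000
```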

An Online Optimization-Based Decision Support Tool for Small Farmers in India: Learning in Non-stationary Environments

  • paper_url: http://arxiv.org/abs/2311.17277
  • repo_url: None
  • paper_authors: Tuxun Lu, Aviva Prins
  • for: Helps small farmers in India manage crop planning, especially under the impacts of climate change on agricultural productivity.
  • methods: Models an individual greenhouse as a Markov Decision Process (MDP) and adapts Li and Li (2019)'s Follow the Weighted Leader (FWL) online learning algorithm to offer crop planning advice.
  • results: Successfully produces utility-preserving cropping pattern suggestions in simulations; compared with an offline planning algorithm, it achieves the same cumulative revenue with greatly reduced runtime.
    Abstract Crop management decision support systems are specialized tools for farmers that reduce the riskiness of revenue streams, especially valuable for use under the current climate changes that impact agricultural productivity. Unfortunately, small farmers in India, who could greatly benefit from these tools, do not have access to them. In this paper, we model an individual greenhouse as a Markov Decision Process (MDP) and adapt Li and Li (2019)'s Follow the Weighted Leader (FWL) online learning algorithm to offer crop planning advice. We successfully produce utility-preserving cropping pattern suggestions in simulations. When we compare against an offline planning algorithm, we achieve the same cumulative revenue with greatly reduced runtime.

SoUnD Framework: Analyzing (So)cial Representation in (Un)structured (D)ata

  • paper_url: http://arxiv.org/abs/2311.17259
  • repo_url: None
  • paper_authors: Mark Díaz, Sunipa Dev, Emily Reif, Remi Denton, Vinodkumar Prabhakaran
  • for: Provides a framework for systematically analyzing how people are represented in the unstructured data used in foundation model development, to inform data use and documentation decisions.
  • methods: Proposes a framework, grounded in human representation, that guides analysis of unstructured data and identifies downstream risks.
  • results: Applies the framework in two toy examples using the Common Crawl web text corpus (C4) and LAION-400M, and proposes a set of hypothetical action steps in service of dataset use, development, and documentation.
    Abstract The unstructured nature of data used in foundation model development is a challenge to systematic analyses for making data use and documentation decisions. From a Responsible AI perspective, these decisions often rely upon understanding how people are represented in data. We propose a framework designed to guide analysis of human representation in unstructured data and identify downstream risks. We apply the framework in two toy examples using the Common Crawl web text corpus (C4) and LAION-400M. We also propose a set of hypothetical action steps in service of dataset use, development, and documentation.

Fourier Neural Differential Equations for learning Quantum Field Theories

  • paper_url: http://arxiv.org/abs/2311.17250
  • repo_url: https://github.com/2357e2/fnde
  • paper_authors: Isaac Brant, Alexander Norcliffe, Pietro Liò
  • for: Uses Neural Differential Equations (NDEs) to learn particle scattering matrices, and introduces a new Fourier Neural Differential Equation (FNDE) architecture to improve generalizability.
  • methods: Trains NDE models on $\phi^4$ theory, Scalar-Yukawa theory, and Scalar Quantum Electrodynamics; the FNDE combines NDE integration with Fourier network convolution (a sketch follows the abstract).
  • results: The FNDE generalizes better than the non-integrated FNO equivalent, and training on scattering data allows the interaction Hamiltonian of a theory to be extracted from network parameters.
    Abstract A Quantum Field Theory is defined by its interaction Hamiltonian, and linked to experimental data by the scattering matrix. The scattering matrix is calculated as a perturbative series, and represented succinctly as a first order differential equation in time. Neural Differential Equations (NDEs) learn the time derivative of a residual network's hidden state, and have proven efficacy in learning differential equations with physical constraints. Hence using an NDE to learn particle scattering matrices presents a possible experiment-theory phenomenological connection. In this paper, NDE models are used to learn $\phi^4$ theory, Scalar-Yukawa theory and Scalar Quantum Electrodynamics. A new NDE architecture is also introduced, the Fourier Neural Differential Equation (FNDE), which combines NDE integration and Fourier network convolution. The FNDE model demonstrates better generalisability than the non-integrated equivalent FNO model. It is also shown that by training on scattering data, the interaction Hamiltonian of a theory can be extracted from network parameters.
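
A rough sketch of what an FNDE-style layer could look like: the vector field of a neural ODE is a convolution performed in Fourier space, and the hidden state is integrated in time. This is our reading of "combines NDE integration and Fourier network convolution", not the authors' code (see the repo above); the mode truncation and the fixed-step Euler integrator are simplifying assumptions:

```python
import torch
import torch.nn as nn

class SpectralDynamics(nn.Module):
    """Vector field computed as a truncated convolution in Fourier space."""

    def __init__(self, channels: int, modes: int):
        super().__init__()
        self.modes = modes  # number of retained low-frequency Fourier modes
        self.weights = nn.Parameter(
            torch.randn(channels, channels, modes, dtype=torch.cfloat) * 0.02
        )

    def forward(self, t: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, channels, grid)
        h_hat = torch.fft.rfft(h, dim=-1)
        out_hat = torch.zeros_like(h_hat)
        out_hat[..., : self.modes] = torch.einsum(
            "iom,bim->bom", self.weights, h_hat[..., : self.modes]
        )
        return torch.fft.irfft(out_hat, n=h.shape[-1], dim=-1)

def integrate(f: nn.Module, h0: torch.Tensor, t1: float, steps: int = 100):
    """Fixed-step Euler integration standing in for a full NDE solver."""
    h, dt = h0, t1 / steps
    for k in range(steps):
        h = h + dt * f(torch.tensor(k * dt), h)
    return h

h0 = torch.randn(2, 4, 64)
hT = integrate(SpectralDynamics(channels=4, modes=8), h0, t1=1.0, steps=20)
```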

Invariance assumptions for class distribution estimation

  • paper_url: http://arxiv.org/abs/2311.17225
  • repo_url: None
  • paper_authors: Dirk Tasche
  • for: Addresses class distribution estimation under dataset shift: on the test dataset only the features are observed, and the class prior probabilities must be estimated.
  • methods: Studies invariance assumptions between the training joint distribution of features and labels and the test distribution (covariate shift, factorizable joint shift, and sparse joint shift), which can considerably facilitate the task (a baseline estimator is sketched after the abstract).
  • results: Clarifies the implications of each invariance assumption for class distribution estimation, showing how such assumptions make estimating the test-set class priors tractable.
    Abstract We study the problem of class distribution estimation under dataset shift. On the training dataset, both features and class labels are observed while on the test dataset only the features can be observed. The task then is the estimation of the distribution of the class labels, i.e. the estimation of the class prior probabilities, in the test dataset. Assumptions of invariance between the training joint distribution of features and labels and the test distribution can considerably facilitate this task. We discuss the assumptions of covariate shift, factorizable joint shift, and sparse joint shift and their implications for class distribution estimation.
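
The paper's setting is easiest to appreciate against the standard baseline that holds under plain prior probability shift. The confusion-matrix estimator below (in the spirit of "black box shift estimation") is that baseline, not a method from the paper:

```python
import numpy as np

def estimate_class_priors(confusion: np.ndarray, test_pred_dist: np.ndarray):
    """confusion[i, j] = P(predicted class i | true class j), estimated on
    labelled data; test_pred_dist[i] = fraction of test points predicted as
    class i.  Under prior probability shift these are related by
    confusion @ q = test_pred_dist, which we solve for the test priors q."""
    q, *_ = np.linalg.lstsq(confusion, test_pred_dist, rcond=None)
    q = np.clip(q, 0.0, None)
    return q / q.sum()  # project back onto the probability simplex

# Two-class example:
C = np.array([[0.9, 0.2],   # 20% of class-1 items are predicted as class 0
              [0.1, 0.8]])
p_hat = np.array([0.6, 0.4])  # predicted-label distribution on the test set
print(estimate_class_priors(C, p_hat))
```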

Optimal EEG Electrode Set for Emotion Recognition From Brain Signals: An Empirical Quest

  • paper_url: http://arxiv.org/abs/2311.17204
  • repo_url: None
  • paper_authors: Rumman Ahmed Prodhan, Sumya Akter, Tanmoy Sarkar Pias, Md. Akhtaruzzaman Adnan
  • for: This paper aims to empirically analyze the contribution of each part of the brain in exhibiting emotions.
  • methods: The authors use the DEAP dataset to find the optimal electrode set, Fast Fourier Transformation for effective feature extraction, and a 1D-CNN with residual connection for classification (a sketch of this pipeline follows the abstract).
  • results: The study finds that 12 electrodes (F7, P8, O1, F8, C4, T7, PO3, Fp1, Fp2, O2, P3, and Fz) achieve 95.81% accuracy in recognizing emotions, and that the frontal lobe is the most important for recognizing emotion. Additionally, the authors find that adding more than 10 electrodes does not improve performance significantly.
    Abstract The human brain is a complex organ, still completely undiscovered, that controls almost all the parts of the body. Apart from survival, the human brain stimulates emotions. Recent research indicates that brain signals can be very effective for emotion recognition. However, which parts of the brain exhibit most of the emotions is still under-explored. In this study, we empirically analyze the contribution of each part of the brain in exhibiting emotions. We use the DEAP dataset to find the most optimal electrode set which eventually leads to the effective brain part associated with emotions. We use Fast Fourier Transformation for effective feature extraction and a 1D-CNN with residual connection for classification. Though 32 electrodes from the DEAP dataset got an accuracy of 97.34%, only 12 electrodes (F7, P8, O1, F8, C4, T7, PO3, Fp1, Fp2, O2, P3, and Fz) achieve 95.81% accuracy. This study also shows that adding more than 10 electrodes does not improve performance significantly. Moreover, the frontal lobe is the most important for recognizing emotion.
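
A minimal sketch of the described pipeline: FFT-based features feeding a 1D-CNN with a residual connection. The layer sizes, the log-power features, and the four-class output are our assumptions, not the paper's exact architecture:

```python
import numpy as np
import torch.nn as nn

def fft_power_features(eeg: np.ndarray) -> np.ndarray:
    """Generic FFT features: log power spectrum per channel."""
    spectrum = np.abs(np.fft.rfft(eeg, axis=-1)) ** 2  # (channels, freqs)
    return np.log1p(spectrum)

class ResidualBlock1D(nn.Module):
    """A 1D convolutional block with a residual (skip) connection."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.act = nn.ReLU()

    def forward(self, x):
        out = self.act(self.conv1(x))
        out = self.conv2(out)
        return self.act(out + x)  # skip connection

# 12 input channels, matching the reported optimal electrode set:
model = nn.Sequential(
    nn.Conv1d(12, 32, kernel_size=3, padding=1),
    ResidualBlock1D(32),
    nn.AdaptiveAvgPool1d(1),
    nn.Flatten(),
    nn.Linear(32, 4),  # e.g. four emotion classes (assumption)
)
```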

A personalized Uncertainty Quantification framework for patient survival models: estimating individual uncertainty of patients with metastatic brain tumors in the absence of ground truth

  • paper_url: http://arxiv.org/abs/2311.17173
  • repo_url: None
  • paper_authors: Yuqi Wang, Aarzu Gupta, David Carpenter, Trey Mullikin, Zachary J. Reitman, Scott Floyd, John Kirkpatrick, Joseph K. Salama, Paul W. Sperduto, Jian-Guo Liu, Mustafa R. Bashir, Kyle J. Lafata
  • for: Proposes an Uncertainty Quantification (UQ) framework to estimate the uncertainty of patient survival models in the absence of ground truth.
  • methods: Develops and evaluates the approach on a dataset of 1,383 patients treated with stereotactic radiosurgery (SRS) for brain metastases between January 2015 and December 2020; a patient's uncertainty is represented by the concordance between a patient-similarity rank and a prediction-similarity rank, evaluated across statistical and non-statistical models including CoxPH, conditional survival forest (CSF), and neural multi-task linear regression (NMTLR) (see the sketch after the abstract).
  • results: All models had the lowest uncertainty on intracranial progression (ICP, 2.21%) and the highest on intracranial progression and/or death (ICPD, 17.28%); OS models varied widely, with NMTLR lowest (1.96%) and CSF highest (14.29%), demonstrating that the method can estimate the uncertainty of individual patient survival modeling results.
    Abstract To develop a novel Uncertainty Quantification (UQ) framework to estimate the uncertainty of patient survival models in the absence of ground truth, we developed and evaluated our approach based on a dataset of 1383 patients treated with stereotactic radiosurgery (SRS) for brain metastases between January 2015 and December 2020. Our motivating hypothesis is that a time-to-event prediction of a test patient on inference is more certain given a higher feature-space-similarity to patients in the training set. Therefore, the uncertainty for a particular patient-of-interest is represented by the concordance index between a patient similarity rank and a prediction similarity rank. Model uncertainty was defined as the increased percentage of the max uncertainty-constrained-AUC compared to the model AUC. We evaluated our method on multiple clinically-relevant endpoints, including time to intracranial progression (ICP), progression-free survival (PFS) after SRS, overall survival (OS), and time to ICP and/or death (ICPD), on a variety of both statistical and non-statistical models, including CoxPH, conditional survival forest (CSF), and neural multi-task linear regression (NMTLR). Our results show that all models had the lowest uncertainty on ICP (2.21%) and the highest uncertainty (17.28%) on ICPD. OS models demonstrated high variation in uncertainty performance, where NMTLR had the lowest uncertainty (1.96%) and CSF had the highest uncertainty (14.29%). In conclusion, our method can estimate the uncertainty of individual patient survival modeling results. As expected, our data empirically demonstrate that as model uncertainty measured via our technique increases, the similarity between a feature-space and its predicted outcome decreases.
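
The core idea, that a test prediction is more certain when feature-space neighbors are also prediction-space neighbors, can be sketched as a rank correlation; this simplifies the paper's concordance-index construction, with kendalltau as our stand-in:

```python
import numpy as np
from scipy.stats import kendalltau

def similarity_rank_concordance(x_test, y_pred_test, X_train, y_pred_train):
    """Rank-correlate 'how close is each training patient in feature space'
    with 'how close is that patient's prediction to the test prediction'.
    High concordance suggests low uncertainty for this patient-of-interest."""
    feature_dist = np.linalg.norm(X_train - x_test, axis=1)
    prediction_dist = np.abs(y_pred_train - y_pred_test)
    tau, _ = kendalltau(feature_dist, prediction_dist)
    return tau  # tau near 1: rankings agree, prediction treated as certain
```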

Fast Particle-based Anomaly Detection Algorithm with Variational Autoencoder

  • paper_url: http://arxiv.org/abs/2311.17162
  • repo_url: https://github.com/ryanliu30/fastanomalydetection
  • paper_authors: Ryan Liu, Abhijith Gandrakota, Jennifer Ngadiuba, Maria Spiropulu, Jean-Roch Vlimant
  • for: Explores model-agnostic anomaly detection in the search for physics beyond the Standard Model.
  • methods: Presents Set-VAE, a particle-based variational autoencoder (VAE) anomaly detection algorithm.
  • results: Demonstrates a 2x signal efficiency gain over traditional subjettiness-based jet selection, and proposes CLIP-VAE, which uses the KL-divergence loss as the anomaly score (sketched after the abstract) to cut inference latency by 2x and reduce caching requirements for deployment in trigger systems.
    Abstract Model-agnostic anomaly detection is one of the promising approaches in the search for new beyond the standard model physics. In this paper, we present Set-VAE, a particle-based variational autoencoder (VAE) anomaly detection algorithm. We demonstrate a 2x signal efficiency gain compared with traditional subjettiness-based jet selection. Furthermore, with an eye to the future deployment to trigger systems, we propose the CLIP-VAE, which reduces the inference-time cost of anomaly detection by using the KL-divergence loss as the anomaly score, resulting in a 2x acceleration in latency and reducing the caching requirement.
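
A sketch of the CLIP-VAE scoring idea from the abstract: once the KL term of the VAE loss serves as the anomaly score, only the encoder is needed at inference time. The closed form below assumes the usual diagonal-Gaussian encoder and standard-normal prior:

```python
import torch

def kl_anomaly_score(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) per example, in closed form:
    0.5 * sum(mu^2 + var - logvar - 1).  No decoder pass is required."""
    return 0.5 * torch.sum(mu.pow(2) + logvar.exp() - logvar - 1.0, dim=-1)

# mu, logvar = encoder(events)            # encoder output for a batch
# flags = kl_anomaly_score(mu, logvar) > threshold
```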

A point cloud approach to generative modeling for galaxy surveys at the field level

  • paper_url: http://arxiv.org/abs/2311.17141
  • repo_url: https://github.com/smsharma/point-cloud-galaxy-diffusion
  • paper_authors: Carolina Cuesta-Lazaro, Siddharth Mishra-Sharma
  • for: Introduces a diffusion-based generative model that describes the distribution of galaxies directly as a collection of points in 3D space (coordinates), optionally with attributes such as velocities and masses, without resorting to binning or voxelization.
  • methods: The custom diffusion model serves both for emulation, reproducing essential summary statistics of the galaxy distribution, and for inference, by computing the conditional likelihood of a galaxy field.
  • results: Demonstrates a first application to massive dark matter haloes in the Quijote simulation suite; the approach can be extended to comprehensive analyses of cosmological data, circumventing limitations inherent to summary-statistic and neural simulation-based inference methods.
    Abstract We introduce a diffusion-based generative model to describe the distribution of galaxies in our Universe directly as a collection of points in 3-D space (coordinates) optionally with associated attributes (e.g., velocities and masses), without resorting to binning or voxelization. The custom diffusion model can be used both for emulation, reproducing essential summary statistics of the galaxy distribution, as well as inference, by computing the conditional likelihood of a galaxy field. We demonstrate a first application to massive dark matter haloes in the Quijote simulation suite. This approach can be extended to enable a comprehensive analysis of cosmological data, circumventing limitations inherent to summary statistic -- as well as neural simulation-based inference methods.

Predicting the Age of Astronomical Transients from Real-Time Multivariate Time Series

  • paper_url: http://arxiv.org/abs/2311.17143
  • repo_url: None
  • paper_authors: Hali Huang, Daniel Muthukrishna, Prajna Nair, Zimi Zhang, Michael Fausnaugh, Torsha Majumder, Ryan J. Foley, George R. Ricker
  • for: Improves our understanding of astronomical transients, as new sky surveys will soon record unprecedented numbers of them as sparsely and irregularly sampled multivariate time series.
  • methods: Builds a Bayesian probabilistic recurrent neural network that predicts the age of a transient in real time from multi-wavelength time-series observations.
  • results: Accurately predicts the age of a transient, with robust uncertainties, as soon as it is initially triggered by a survey telescope, enabling age- and class-based prioritization of follow-up for ongoing and upcoming surveys.
    Abstract Astronomical transients, such as supernovae and other rare stellar explosions, have been instrumental in some of the most significant discoveries in astronomy. New astronomical sky surveys will soon record unprecedented numbers of transients as sparsely and irregularly sampled multivariate time series. To improve our understanding of the physical mechanisms of transients and their progenitor systems, early-time measurements are necessary. Prioritizing the follow-up of transients based on their age along with their class is crucial for new surveys. To meet this demand, we present the first method of predicting the age of transients in real-time from multi-wavelength time-series observations. We build a Bayesian probabilistic recurrent neural network. Our method can accurately predict the age of a transient with robust uncertainties as soon as it is initially triggered by a survey telescope. This work will be essential for the advancement of our understanding of the numerous young transients being detected by ongoing and upcoming astronomical surveys.

\texttt{GlycoNMR}: Dataset and benchmarks for NMR chemical shift prediction of carbohydrates with graph neural networks

  • paper_url: http://arxiv.org/abs/2311.17134
  • repo_url: None
  • paper_authors: Zizhang Chen, Ryan Paul Badman, Lachele Foley, Robert Woods, Pengyu Hong
  • for: Brings molecular representation learning (MRL), which converts molecules into numerical representations while preserving their chemical features, to the under-explored sub-field of glycoscience (the study of carbohydrates, where longer carbohydrates are also called glycans).
  • methods: Introduces GlycoNMR, two laboriously curated datasets with 2,609 carbohydrate structures and 211,543 annotated nuclear magnetic resonance (NMR) chemical shifts for precise atomic-level prediction; carbohydrate-specific features are tailored and existing MRL models are adapted to the unique problems presented by carbohydrate data.
  • results: Benchmarks four modified MRL models on the new datasets, showing that tailored carbohydrate features and adapted MRL models can effectively tackle carbohydrate-specific prediction problems.
    Abstract Molecular representation learning (MRL) is a powerful tool for bridging the gap between machine learning and chemical sciences, as it converts molecules into numerical representations while preserving their chemical features. These encoded representations serve as a foundation for various downstream biochemical studies, including property prediction and drug design. MRL has had great success with proteins and general biomolecule datasets. Yet, in the growing sub-field of glycoscience (the study of carbohydrates, where longer carbohydrates are also called glycans), MRL methods have been barely explored. This under-exploration can be primarily attributed to the limited availability of comprehensive and well-curated carbohydrate-specific datasets and a lack of Machine learning (ML) pipelines specifically tailored to meet the unique problems presented by carbohydrate data. Since interpreting and annotating carbohydrate-specific data is generally more complicated than protein data, domain experts are usually required to get involved. The existing MRL methods, predominately optimized for proteins and small biomolecules, also cannot be directly used in carbohydrate applications without special modifications. To address this challenge, accelerate progress in glycoscience, and enrich the data resources of the MRL community, we introduce GlycoNMR. GlycoNMR contains two laboriously curated datasets with 2,609 carbohydrate structures and 211,543 annotated nuclear magnetic resonance (NMR) chemical shifts for precise atomic-level prediction. We tailored carbohydrate-specific features and adapted existing MRL models to tackle this problem effectively. For illustration, we benchmark four modified MRL models on our new datasets.

An Investigation of Time Reversal Symmetry in Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.17008
  • repo_url: None
  • paper_authors: Brett Barkley, Amy Zhang, David Fridovich-Keil
  • for: Improving the sample efficiency of reinforcement learning (RL), where collecting sufficient data is time-consuming and expensive.
  • methods: Formalizes time reversal symmetry in a Markov decision process (MDP) and proposes time symmetric data augmentation (TSDA), which turns every experienced transition into a feasible reverse-time transition, effectively doubling the number of experiences (a sketch follows the abstract).
  • results: In time-reversible scenarios without friction or contact, TSDA improves sample efficiency; in more realistic environments where these assumptions do not hold globally, it can significantly degrade sample efficiency and policy performance, so the environment and reward structure must suit TSDA for it to help.
    Abstract One of the fundamental challenges associated with reinforcement learning (RL) is that collecting sufficient data can be both time-consuming and expensive. In this paper, we formalize a concept of time reversal symmetry in a Markov decision process (MDP), which builds upon the established structure of dynamically reversible Markov chains (DRMCs) and time-reversibility in classical physics. Specifically, we investigate the utility of this concept in reducing the sample complexity of reinforcement learning. We observe that utilizing the structure of time reversal in an MDP allows every environment transition experienced by an agent to be transformed into a feasible reverse-time transition, effectively doubling the number of experiences in the environment. To test the usefulness of this newly synthesized data, we develop a novel approach called time symmetric data augmentation (TSDA) and investigate its application in both proprioceptive and pixel-based state within the realm of off-policy, model-free RL. Empirical evaluations showcase how these synthetic transitions can enhance the sample efficiency of RL agents in time reversible scenarios without friction or contact. We also test this method in more realistic environments where these assumptions are not globally satisfied. We find that TSDA can significantly degrade sample efficiency and policy performance, but can also improve sample efficiency under the right conditions. Ultimately we conclude that time symmetry shows promise in enhancing the sample efficiency of reinforcement learning and provide guidance when the environment and reward structures are of an appropriate form for TSDA to be employed effectively.
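
A hypothetical sketch of the augmentation: each stored transition also contributes its time-reversed counterpart, doubling the experiences. The state/action reversal operators (here: negate velocity, keep the force) and the reuse of the reward are assumptions for a frictionless point mass, not the paper's general construction:

```python
import numpy as np

def augment_time_reversed(buffer, reverse_state, reverse_action):
    """Every transition (s, a, r, s') also yields a reverse-time transition
    (R(s'), R_a(a), r, R(s)).  Reward is simply reused here; in general its
    treatment is environment-specific."""
    augmented = list(buffer)
    for s, a, r, s_next in buffer:
        augmented.append(
            (reverse_state(s_next), reverse_action(a), r, reverse_state(s))
        )
    return augmented

# Frictionless point mass, state (position, velocity), force actions:
reverse_state = lambda s: np.array([s[0], -s[1]])  # t -> -t flips velocity
reverse_action = lambda a: a                       # force is invariant

buffer = [(np.array([0.0, 1.0]), 0.5, 1.0, np.array([0.1, 1.05]))]
print(augment_time_reversed(buffer, reverse_state, reverse_action))
```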

On the Impact of Sampling on Deep Sequential State Estimation

  • paper_url: http://arxiv.org/abs/2311.17006
  • repo_url: None
  • paper_authors: Helena Calatrava, Ricardo Augusto Borsoi, Tales Imbiriba, Pau Closas
  • for: State inference and parameter learning in sequential models via approximation techniques that maximize the evidence lower bound (ELBO) on the marginal log-likelihood, focusing on the deep Kalman filter (DKF) within the Dynamical Variational Autoencoder family.
  • methods: Applies importance sampling to the DKF framework for learning deep Markov models, yielding the IW-DKF, in the spirit of tighter Monte Carlo objectives such as IWAE (the bound is sketched after the abstract).
  • results: Improves log-likelihood estimates and the KL divergence between the variational distribution and the transition model; on the 3-space Lorenz attractor, generative modeling improves and the RMSE of model parameter and latent state estimates decreases, indicating that tighter Monte Carlo objectives improve state inference.
    Abstract State inference and parameter learning in sequential models can be successfully performed with approximation techniques that maximize the evidence lower bound to the marginal log-likelihood of the data distribution. These methods may be referred to as Dynamical Variational Autoencoders, and our specific focus lies on the deep Kalman filter. It has been shown that the ELBO objective can oversimplify data representations, potentially compromising estimation quality. Tighter Monte Carlo objectives have been proposed in the literature to enhance generative modeling performance. For instance, the IWAE objective uses importance weights to reduce the variance of marginal log-likelihood estimates. In this paper, importance sampling is applied to the DKF framework for learning deep Markov models, resulting in the IW-DKF, which shows an improvement in terms of log-likelihood estimates and KL divergence between the variational distribution and the transition model. The framework using the sampled DKF update rule is also accommodated to address sequential state and parameter estimation when working with highly non-linear physics-based models. An experiment with the 3-space Lorenz attractor shows an enhanced generative modeling performance and also a decrease in RMSE when estimating the model parameters and latent states, indicating that tighter MCOs lead to improved state inference performance.
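
For reference, the importance weighted bound that "tighter Monte Carlo objectives" refers to fits in a few lines (a generic IWAE-style bound, not the full IW-DKF update rule):

```python
import math
import torch

def iwae_bound(log_p_joint: torch.Tensor, log_q: torch.Tensor) -> torch.Tensor:
    """K-sample importance weighted lower bound on log p(x).  Inputs of shape
    (K, batch) hold log p(x, z_k) and log q(z_k | x) for z_k ~ q(.|x).  The
    bound  E[ log (1/K) sum_k p(x, z_k) / q(z_k | x) ]  tightens toward
    log p(x) as K grows, reducing the bias of the ELBO (the K = 1 case)."""
    log_w = log_p_joint - log_q
    return (torch.logsumexp(log_w, dim=0) - math.log(log_w.shape[0])).mean()
```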

FedECA: A Federated External Control Arm Method for Causal Inference with Time-To-Event Data in Distributed Settings

  • paper_url: http://arxiv.org/abs/2311.16984
  • repo_url: None
  • paper_authors: Jean Ogier du Terrail, Quentin Klopfenstein, Honghao Li, Imke Mayer, Nicolas Loiseau, Mohammad Hallal, Félix Balazard, Mathieu Andreux
  • for: Providing efficacy evidence for experimental drugs in non-randomized settings via external control arms (ECA), using privacy-enhancing technology to overcome the barriers to data sharing.
  • methods: Leverages federated learning (FL) to develop FedECA, a federated inverse probability of treatment weighted (IPTW) method for time-to-event outcomes that limits patients' data exposure (an IPTW sketch follows the abstract).
  • results: FedECA outperforms its closest competitor, matching-adjusted indirect comparison (MAIC), in statistical power and in balancing the treatment and control groups; the code, built on the open-source FL software Substra, is publicly released.
    Abstract External control arms (ECA) can inform the early clinical development of experimental drugs and provide efficacy evidence for regulatory approval in non-randomized settings. However, the main challenge of implementing ECA lies in accessing real-world data or historical clinical trials. Indeed, data sharing is often not feasible due to privacy considerations related to data leaving the original collection centers, along with pharmaceutical companies' competitive motives. In this paper, we leverage a privacy-enhancing technology called federated learning (FL) to remove some of the barriers to data sharing. We introduce a federated learning inverse probability of treatment weighted (IPTW) method for time-to-event outcomes called FedECA which eases the implementation of ECA by limiting patients' data exposure. We show with extensive experiments that FedECA outperforms its closest competitor, matching-adjusted indirect comparison (MAIC), in terms of statistical power and ability to balance the treatment and control groups. To encourage the use of such methods, we publicly release our code which relies on Substra, an open-source FL software with proven experience in privacy-sensitive contexts.
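
FedECA itself is federated, but the underlying estimator is easiest to see in the centralized setting. The sketch below is that centralized reference point; the column names, the logistic propensity model, and the use of lifelines are our illustrative choices, not FedECA's implementation:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from lifelines import CoxPHFitter

def iptw_cox(df: pd.DataFrame, covariates: list[str]) -> CoxPHFitter:
    # 1) Propensity scores: probability of treatment given covariates.
    ps = LogisticRegression().fit(df[covariates], df["treated"]) \
                             .predict_proba(df[covariates])[:, 1]
    # 2) Inverse probability of treatment weights.
    df = df.assign(w=np.where(df["treated"] == 1, 1.0 / ps, 1.0 / (1.0 - ps)))
    # 3) Weighted Cox model of the treatment effect on survival.
    cph = CoxPHFitter()
    cph.fit(df[["duration", "event", "treated", "w"]],
            duration_col="duration", event_col="event",
            weights_col="w", robust=True)
    return cph
```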

Bidirectional Reactive Programming for Machine Learning

  • paper_url: http://arxiv.org/abs/2311.16977
  • repo_url: None
  • paper_authors: Dumitru Potop Butucaru, Albert Cohen, Gordon Plotkin, Hugo Pompougnac
  • for: Reactive languages dedicated to programming systems that interact continuously and concurrently with their environment, where values take the form of unbounded streams.
  • methods: Introduces a symmetric reactive construct enabling backward recurrences, alongside the conventional forward-in-time recurrences, with constraints on the backward recurrences that make the implementation practical.
  • results: Demonstrates that many machine learning (ML) systems are naturally captured as bidirectional reactive programs, including reverse-mode automatic differentiation, backpropagation, batch normalization, bidirectional recurrent neural networks, and training and reinforcement learning algorithms.
    Abstract Reactive languages are dedicated to the programming of systems which interact continuously and concurrently with their environment. Values take the form of unbounded streams modeling the (discrete) passing of time or the sequence of concurrent interactions. While conventional reactivity models recurrences forward in time, we introduce a symmetric reactive construct enabling backward recurrences. Constraints on the latter allow to make the implementation practical. Machine Learning (ML) systems provide numerous motivations for all of this: we demonstrate that reverse-mode automatic differentiation, backpropagation, batch normalization, bidirectional recurrent neural networks, training and reinforcement learning algorithms, are all naturally captured as bidirectional reactive programs.

Machine learning force-field models for metallic spin glass

  • paper_url: http://arxiv.org/abs/2311.16964
  • repo_url: None
  • paper_authors: Menglin Shi, Sheng Zhang, Gia-Wei Chern
  • for: Studying metallic spin glass systems, such as dilute magnetic alloys, characterized by randomly distributed local moments coupled through a long-range electron-mediated effective interaction.
  • methods: Presents a scalable machine learning (ML) framework for dynamical simulations of metallic spin glasses: a Behler-Parrinello type neural-network model, based on the principle of locality, predicts the electron-induced local magnetic fields that drive the spin dynamics, using a symmetry-invariant magnetic descriptor that incorporates spin degrees of freedom into atom-centered symmetry functions.
  • results: Applies the approach to the relaxation dynamics of an amorphous generalization of the s-d model, highlighting the promise of ML models for large-scale dynamical modeling of itinerant magnets with quenched disorder.
    Abstract Metallic spin glass systems, such as dilute magnetic alloys, are characterized by randomly distributed local moments coupled to each other through a long-range electron-mediated effective interaction. We present a scalable machine learning (ML) framework for dynamical simulations of metallic spin glasses. A Behler-Parrinello type neural-network model, based on the principle of locality, is developed to accurately and efficiently predict electron-induced local magnetic fields that drive the spin dynamics. A crucial component of the ML model is a proper symmetry-invariant representation of local magnetic environment which is direct input to the neural net. We develop such a magnetic descriptor by incorporating the spin degrees of freedom into the atom-centered symmetry function methods which are widely used in ML force-field models for quantum molecular dynamics. We apply our approach to study the relaxation dynamics of an amorphous generalization of the s-d model. Our work highlights the promising potential of ML models for large-scale dynamical modeling of itinerant magnets with quenched disorder.

Adaptive Step Sizes for Preconditioned Stochastic Gradient Descent

  • paper_url: http://arxiv.org/abs/2311.16956
  • repo_url: None
  • paper_authors: Frederik Köhne, Leonie Kreis, Anton Schiela, Roland Herzog
  • for: Proposes an adaptive step size method for stochastic gradient descent (SGD) based on numerically traceable quantities.
  • methods: Uses the Lipschitz constant for gradients and a notion of the local variance in search directions to set step sizes adaptively (one way to do this is sketched after the abstract).
  • results: Yields a nearly hyperparameter-free stochastic optimization algorithm with provable convergence on quadratic problems and truly problem-adaptive behavior on classical image classification tasks; the framework also permits a preconditioner, enabling adaptive step sizes for stochastic second-order methods.
    Abstract This paper proposes a novel approach to adaptive step sizes in stochastic gradient descent (SGD) by utilizing quantities that we have identified as numerically traceable -- the Lipschitz constant for gradients and a concept of the local variance in search directions. Our findings yield a nearly hyperparameter-free algorithm for stochastic optimization, which has provable convergence properties when applied to quadratic problems and exhibits truly problem adaptive behavior on classical image classification tasks. Our framework enables the potential inclusion of a preconditioner, thereby enabling the implementation of adaptive step sizes for stochastic second-order optimization methods.
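
One numerically traceable way to turn a local Lipschitz estimate into a step size, in the spirit of the paper though not its exact rule; the conservative running maximum, the 0.5 safety factor, and the bootstrap step are our choices:

```python
import numpy as np

def gd_adaptive_steps(grad, x0, iters=100, eps=1e-12):
    """Estimate the gradient Lipschitz constant from successive iterates,
    L_k ~ ||g_k - g_{k-1}|| / ||x_k - x_{k-1}||, and step with 0.5 / max L."""
    x_prev = x0
    g_prev = grad(x_prev)
    x = x_prev - 1e-3 * g_prev      # small bootstrap step
    L_max = eps
    for _ in range(iters):
        g = grad(x)
        L = np.linalg.norm(g - g_prev) / (np.linalg.norm(x - x_prev) + eps)
        L_max = max(L_max, L)       # conservative running estimate
        x_prev, g_prev = x, g
        x = x - (0.5 / L_max) * g
    return x

# Quadratic example: f(x) = 0.5 x^T A x, so grad(x) = A x
A = np.diag([1.0, 10.0])
print(gd_adaptive_steps(lambda x: A @ x, np.array([1.0, 1.0])))
```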

Multinomial belief networks

  • paper_url: http://arxiv.org/abs/2311.16909
  • repo_url: https://github.com/debabratabar/Fake_news_detection
  • paper_authors: H. C. Donker, D. Neijzen, G. A. Lunter
  • for: A Bayesian approach to machine learning for healthcare data analysis, where uncertainty must be quantified and observations may be missing, scarce, or sparse.
  • methods: Proposes a deep generative model for multinomial count data in which both the weights and hidden units of the network are Dirichlet distributed; a Gibbs sampling procedure is formulated that exploits a series of augmentation relations, analogous to the Zhou-Cong-Chen model.
  • results: Applies the model to small handwritten digits and a large experimental dataset of DNA mutations in cancer, showing that it extracts biologically meaningful meta-signatures in a fully data-driven way.
    Abstract A Bayesian approach to machine learning is attractive when we need to quantify uncertainty, deal with missing observations, when samples are scarce, or when the data is sparse. All of these commonly apply when analysing healthcare data. To address these analytical requirements, we propose a deep generative model for multinomial count data where both the weights and hidden units of the network are Dirichlet distributed. A Gibbs sampling procedure is formulated that takes advantage of a series of augmentation relations, analogous to the Zhou-Cong-Chen model. We apply the model on small handwritten digits, and a large experimental dataset of DNA mutations in cancer, and we show how the model is able to extract biologically meaningful meta-signatures in a fully data-driven way.

Compressing the Backward Pass of Large-Scale Neural Architectures by Structured Activation Pruning

  • paper_url: http://arxiv.org/abs/2311.16883
  • repo_url: None
  • paper_authors: Daniel Barley, Holger Fröning
  • for: Reducing the memory consumption of training Deep Neural Networks (DNNs) by sparsifying activations, an often overlooked component of memory usage.
  • methods: Employs structured pruning of activations in Block Sparse Compressed Row (BSR) format combined with a magnitude-based criterion, and introduces efficient block-sparse operators for GPUs (block magnitude pruning is sketched after the abstract).
  • results: Evaluating training speed, accuracy, and memory usage of large-scale architectures (ResMLP on image classification tasks), activation pruning achieves a memory reduction of up to 32% while maintaining accuracy.
    Abstract The rise of Deep Neural Networks (DNNs) has led to an increase in model size and complexity, straining the memory capacity of GPUs. Sparsity in DNNs, characterized as structural or ephemeral, has gained attention as a solution. This work focuses on ephemeral sparsity, aiming to reduce memory consumption during training. It emphasizes the significance of activations, an often overlooked component, and their role in memory usage. This work employs structured pruning in Block Sparse Compressed Row (BSR) format in combination with a magnitude-based criterion to efficiently prune activations. We furthermore introduce efficient block-sparse operators for GPUs and showcase their effectiveness, as well as the superior compression offered by block sparsity. We report the effectiveness of activation pruning by evaluating training speed, accuracy, and memory usage of large-scale neural architectures on the example of ResMLP on image classification tasks. As a result, we observe a memory reduction of up to 32% while maintaining accuracy. Ultimately, our approach aims to democratize large-scale model training, reduce GPU requirements, and address ecological concerns.
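
A sketch of the pruning step in NumPy/SciPy: tile the activation map into blocks, drop the lowest-magnitude blocks, and store the survivors in BSR format. The paper pairs this with custom GPU block-sparse operators, which are not reproduced here:

```python
import numpy as np
from scipy.sparse import bsr_matrix

def prune_activations_block(act: np.ndarray, block=(16, 16), keep_ratio=0.5):
    """Structured, magnitude-based activation pruning into BSR format."""
    h, w = act.shape
    bh, bw = block
    blocks = act.reshape(h // bh, bh, w // bw, bw)
    mags = np.abs(blocks).sum(axis=(1, 3))        # L1 magnitude per block
    k = int(mags.size * (1 - keep_ratio))         # number of blocks to drop
    cutoff = np.partition(mags.ravel(), k)[k]
    blocks = np.where(mags[:, None, :, None] >= cutoff, blocks, 0.0)
    return bsr_matrix(blocks.reshape(h, w), blocksize=block)

sparse_act = prune_activations_block(np.random.randn(128, 128))
print(f"stored blocks: {sparse_act.nnz // (16 * 16)}")
```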

Imputation using training labels and classification via label imputation

  • paper_url: http://arxiv.org/abs/2311.16877
  • repo_url: https://github.com/thunguyen177/iul-cbmi
  • paper_authors: Thu Nguyen, Pål Halvorsen, Michael A. Riegler
  • for: Handling missing data, a common problem in practical settings.
  • methods: Stacks the training labels onto the input before imputation, so that the label and the input can be imputed simultaneously (a sketch follows the abstract).
  • results: Improves accuracy, handles training data with missing labels without any prior imputation, and applies to continuous, categorical, or mixed-type data.
    Abstract Missing data is a common problem in practical settings. Various imputation methods have been developed to deal with missing data. However, even though the label is usually available in the training data, the common practice of imputation usually only relies on the input and ignores the label. In this work, we illustrate how stacking the label into the input can significantly improve the imputation of the input. In addition, we propose a classification strategy that initializes the predicted test label with missing values and stacks the label with the input for imputation. This allows imputing the label and the input at the same time. Also, the technique is capable of handling data training with missing labels without any prior imputation and is applicable to continuous, categorical, or mixed-type data. Experiments show promising results in terms of accuracy.
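
The core trick is easy to sketch with scikit-learn; the choice of IterativeImputer and the rounding of imputed labels to class ids are our assumptions, and the paper's procedure may differ in detail:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

def impute_and_classify(X_train, y_train, X_test):
    """Append the label as an extra column, fit the imputer on the stacked
    training matrix, then impute the test rows whose label column is NaN."""
    train = np.column_stack([X_train, y_train])
    test = np.column_stack([X_test, np.full(len(X_test), np.nan)])
    imputer = IterativeImputer(random_state=0)
    imputer.fit(train)
    test_imputed = imputer.transform(test)
    X_test_imputed = test_imputed[:, :-1]
    y_test_pred = np.rint(test_imputed[:, -1])  # round to nearest class id
    return X_test_imputed, y_test_pred
```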

Digital Twin-Enhanced Deep Reinforcement Learning for Resource Management in Networks Slicing

  • paper_url: http://arxiv.org/abs/2311.16876
  • repo_url: None
  • paper_authors: Zhengming Zhang, Yongming Huang, Cheng Zhang, Qingbi Zheng, Luxi Yang, Xiaohu You
  • for: Proposes a framework for dynamic and efficient resource allocation in network slicing-based communication systems, specifically for diversified services.
  • methods: Pairs reinforcement learning agents with a digital twin model, built from historical data with neural networks and calibrated to stay in sync with the real environment, so that slice optimization can be pre-verified virtually.
  • results: Significantly improves the performance of slice optimization strategies in numerical simulation experiments, achieving good results with less interaction with the real environment.
    Abstract Network slicing-based communication systems can dynamically and efficiently allocate resources for diversified services. However, due to the limitation of the network interface on channel access and the complexity of the resource allocation, it is challenging to achieve an acceptable solution in the practical system without precise prior knowledge of the dynamics probability model of the service requests. Existing work attempts to solve this problem using deep reinforcement learning (DRL), however, such methods usually require a lot of interaction with the real environment in order to achieve good results. In this paper, a framework consisting of a digital twin and reinforcement learning agents is present to handle the issue. Specifically, we propose to use the historical data and the neural networks to build a digital twin model to simulate the state variation law of the real environment. Then, we use the data generated by the network slicing environment to calibrate the digital twin so that it is in sync with the real environment. Finally, DRL for slice optimization optimizes its own performance in this virtual pre-verification environment. We conducted an exhaustive verification of the proposed digital twin framework to confirm its scalability. Specifically, we propose to use loss landscapes to visualize the generalization of DRL solutions. We explore a distillation-based optimization scheme for lightweight slicing strategies. In addition, we also extend the framework to offline reinforcement learning, where solutions can be used to obtain intelligent decisions based solely on historical data. Numerical simulation experiments show that the proposed digital twin can significantly improve the performance of the slice optimization strategy.

A unified weighting framework for evaluating nearest neighbour classification

  • paper_url: http://arxiv.org/abs/2311.16872
  • repo_url: None
  • paper_authors: Oliver Urs Lenz, Henri Bollaert, Chris Cornelis
  • for: A comprehensive, large-scale evaluation of weighting choices for classical (NN), fuzzy (FNN), and fuzzy rough (FRNN) nearest neighbour classification.
  • methods: Standardizes existing nearest neighbour weighting proposals as kernel functions applied to the distance values and/or ranks of a test instance's nearest neighbours, and systematically evaluates three distance functions and four scaling measures on 85 real-life classification datasets (a sketch follows the abstract).
  • results: All three classifiers perform best with the Boscovich distance; NN and FRNN perform best with combined Samworth rank- and distance-weights and scaling by $r_1$, $r_2$, or $r_{\infty}^*$, while FNN prefers Samworth distance-weights alone; a new kernel based on fuzzy Yager negation lets NN achieve comparable performance with simpler-to-implement Yager distance-weights, and FRNN generally outperforms NN, which in turn outperforms FNN.
    Abstract We present the first comprehensive and large-scale evaluation of classical (NN), fuzzy (FNN) and fuzzy rough (FRNN) nearest neighbour classification. We show that existing proposals for nearest neighbour weighting can be standardised in the form of kernel functions, applied to the distance values and/or ranks of the nearest neighbours of a test instance. Furthermore, we identify three commonly used distance functions and four scaling measures. We systematically evaluate these choices on a collection of 85 real-life classification datasets. We find that NN, FNN and FRNN all perform best with Boscovich distance. NN and FRNN perform best with a combination of Samworth rank- and distance weights and scaling by the mean absolute deviation around the median ($r_1$), the standard deviaton ($r_2$) or the interquartile range ($r_{\infty}^*$), while FNN performs best with only Samworth distance-weights and $r_1$- or $r_2$-scaling. We also introduce a new kernel based on fuzzy Yager negation, and show that NN achieves comparable performance with Yager distance-weights, which are simpler to implement than a combination of Samworth distance- and rank-weights. Finally, we demonstrate that FRNN generally outperforms NN, which in turns performs systematically better than FNN.
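
In the paper's unified view, a weighting scheme is just a kernel applied to neighbour distances and/or ranks. A minimal sketch with the Boscovich (city-block) distance and a generic inverse-distance kernel standing in for the Samworth weights:

```python
import numpy as np

def boscovich(u, v):
    """Boscovich (city-block / L1) distance."""
    return np.abs(u - v).sum()

def weighted_knn_predict(X_train, y_train, x, k=7,
                         kernel=lambda d: 1.0 / (1e-12 + d)):
    """Classify x by kernel-weighted votes of its k nearest neighbours."""
    dists = np.array([boscovich(row, x) for row in X_train])
    idx = np.argsort(dists)[:k]
    votes = {}
    for i in idx:
        votes[y_train[i]] = votes.get(y_train[i], 0.0) + kernel(dists[i])
    return max(votes, key=votes.get)
```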

Power Hungry Processing: Watts Driving the Cost of AI Deployment?

  • paper_url: http://arxiv.org/abs/2311.16863
  • repo_url: None
  • paper_authors: Alexandra Sasha Luccioni, Yacine Jernite, Emma Strubell
  • for: Compares the ongoing inference cost of various categories of machine learning (ML) systems, including task-specific and general-purpose models.
  • methods: A systematic comparison of deployment cost, measuring the energy and carbon required to perform 1,000 inferences on representative benchmark datasets.
  • results: Multi-purpose, generative architectures are orders of magnitude more expensive than task-specific systems for a variety of tasks, even when controlling for the number of model parameters.
    Abstract Recent years have seen a surge in the popularity of commercial AI products based on generative, multi-purpose AI systems promising a unified approach to building machine learning (ML) models into technology. However, this ambition of "generality" comes at a steep cost to the environment, given the amount of energy these systems require and the amount of carbon that they emit. In this work, we propose the first systematic comparison of the ongoing inference cost of various categories of ML systems, covering both task-specific (i.e. finetuned models that carry out a single task) and `general-purpose' models, (i.e. those trained for multiple tasks). We measure deployment cost as the amount of energy and carbon required to perform 1,000 inferences on representative benchmark dataset using these models. We find that multi-purpose, generative architectures are orders of magnitude more expensive than task-specific systems for a variety of tasks, even when controlling for the number of model parameters. We conclude with a discussion around the current trend of deploying multi-purpose generative ML systems, and caution that their utility should be more intentionally weighed against increased costs in terms of energy and emissions. All the data from our study can be accessed via an interactive demo to carry out further exploration and analysis.

Data-efficient operator learning for solving high Mach number fluid flow problems

  • paper_url: http://arxiv.org/abs/2311.16860
  • repo_url: None
  • paper_authors: Noah Ford, Victor J. Leon, Honest Merman, Jeffrey Gilbert, Alexander New
  • for: Using SciML to predict solutions of high Mach number fluid flows over irregular geometries.
  • methods: Uses Neural Basis Functions (NBF), which learns a basis of behavior modes from the data and then uses this basis to make predictions, targeting the low-data setting.
  • results: NBF is more effective than a basis-unaware baseline model, though continuing challenges remain in predicting solutions for this class of problem.
    Abstract We consider the problem of using SciML to predict solutions of high Mach fluid flows over irregular geometries. In this setting, data is limited, and so it is desirable for models to perform well in the low-data setting. We show that Neural Basis Functions (NBF), which learns a basis of behavior modes from the data and then uses this basis to make predictions, is more effective than a basis-unaware baseline model. In addition, we identify continuing challenges in the space of predicting solutions for this type of problem.

Attentional Graph Neural Networks for Robust Massive Network Localization

  • paper_url: http://arxiv.org/abs/2311.16856
  • repo_url: None
  • paper_authors: Wenzhong Yan, Juntao Wang, Feng Yin, Abdelhak M. Zoubir
  • for: Employs Graph Neural Networks (GNNs) and attention mechanisms to address a classical but challenging nonlinear regression problem: network localization.
  • methods: Proposes a GNN-based network localization method that achieves exceptional stability and accuracy under severe non-line-of-sight (NLOS) propagation, without laborious offline calibration or NLOS identification.
  • results: Experiments validate the high accuracy of the GNN-based model, particularly in challenging NLOS scenarios; however, its accuracy is highly sensitive to the hyperparameter that determines the graph structure, so two attentional graph neural networks (AGNNs) are introduced that automatically learn the optimal hyperparameter for each node and further enhance localization accuracy, with analyses of the gains from dynamic attention and signal denoising characteristics.
    Abstract Graph neural networks (GNNs) have gained significant popularity for classification tasks in machine learning, yet their applications to regression problems remain limited. Concurrently, attention mechanisms have emerged as powerful tools in sequential learning tasks. In this paper, we employ GNNs and attention mechanisms to address a classical but challenging nonlinear regression problem: network localization. We propose a novel GNN-based network localization method that achieves exceptional stability and accuracy in the presence of severe non-line-of-sight (NLOS) propagations, while eliminating the need for laborious offline calibration or NLOS identification. Extensive experimental results validate the effectiveness and high accuracy of our GNN-based localization model, particularly in challenging NLOS scenarios. However, the proposed GNN-based model exhibits limited flexibility, and its accuracy is highly sensitive to a specific hyperparameter that determines the graph structure. To address the limitations and extend the applicability of the GNN-based model to real scenarios, we introduce two attentional graph neural networks (AGNNs) that offer enhanced flexibility and the ability to automatically learn the optimal hyperparameter for each node. Experimental results confirm that the AGNN models are able to enhance localization accuracy, providing a promising solution for real-world applications. We also provide some analyses of the improved performance achieved by the AGNN models from the perspectives of dynamic attention and signal denoising characteristics.

Identifiable Feature Learning for Spatial Data with Nonlinear ICA

  • paper_url: http://arxiv.org/abs/2311.16849
  • repo_url: None
  • paper_authors: Hermanni Hälvä, Jonathan So, Richard E. Turner, Aapo Hyvärinen
  • for: Nonlinear ICA as a principled alternative to heuristic models in deep representation learning and disentanglement, here extended to data with higher-dimensional latent dependency structures such as spatial and spatio-temporal data.
  • methods: Introduces a nonlinear ICA framework with $t$-process (TP) latent components; a new learning and inference algorithm extends variational inference to the combination of a deep neural network mixing function with the TP prior, using inducing points for computational efficiency.
  • results: TP independent components are shown to be identifiable under very general conditions; Gaussian Process (GP) nonlinear ICA is established as a limit of the TP model, where the latent components are identifiable if and only if they have distinctly different covariance kernels. The algorithm and identifiability theorems are explored on simulated spatial data and real-world spatio-temporal data.
    Abstract Recently, nonlinear ICA has surfaced as a popular alternative to the many heuristic models used in deep representation learning and disentanglement. An advantage of nonlinear ICA is that a sophisticated identifiability theory has been developed; in particular, it has been proven that the original components can be recovered under sufficiently strong latent dependencies. Despite this general theory, practical nonlinear ICA algorithms have so far been mainly limited to data with one-dimensional latent dependencies, especially time-series data. In this paper, we introduce a new nonlinear ICA framework that employs $t$-process (TP) latent components which apply naturally to data with higher-dimensional dependency structures, such as spatial and spatio-temporal data. In particular, we develop a new learning and inference algorithm that extends variational inference methods to handle the combination of a deep neural network mixing function with the TP prior, and employs the method of inducing points for computational efficacy. On the theoretical side, we show that such TP independent components are identifiable under very general conditions. Further, Gaussian Process (GP) nonlinear ICA is established as a limit of the TP Nonlinear ICA model, and we prove that the identifiability of the latent components at this GP limit is more restricted. Namely, those components are identifiable if and only if they have distinctly different covariance kernels. Our algorithm and identifiability theorems are explored on simulated spatial data and real world spatio-temporal data.

The HR-Calculus: Enabling Information Processing with Quaternion Algebra

  • paper_url: http://arxiv.org/abs/2311.16771
  • repo_url: None
  • paper_authors: Danilo P. Mandic, Sayed Pouria Talebi, Clive Cheong Took, Yili Xia, Dongpo Xu, Min Xiang, Pauline Bourigault
  • for: Quaternions and their division algebra are advantageous for modelling rotation/orientation in three-dimensional spaces, yet adaptive information processing techniques designed specifically for quaternion-valued signals have only recently drawn the attention of the machine learning, signal processing, and control communities.
  • methods: Revises the foundations of the HR-calculus, which provides the mathematical basis for deriving adaptive learning techniques directly in the quaternion domain, and presents the required tools: the gradient operator, chain and product derivative rules, and Taylor series expansion.
  • results: Establishes the most important applications of adaptive information processing in the quaternion domain for both single-node and multi-node formulations; the article is supported by Supplementary Material (SM).
    Abstract From their inception, quaternions and their division algebra have proven to be advantageous in modelling rotation/orientation in three-dimensional spaces and have seen use from the initial formulation of electromagnetic filed theory through to forming the basis of quantum filed theory. Despite their impressive versatility in modelling real-world phenomena, adaptive information processing techniques specifically designed for quaternion-valued signals have only recently come to the attention of the machine learning, signal processing, and control communities. The most important development in this direction is introduction of the HR-calculus, which provides the required mathematical foundation for deriving adaptive information processing techniques directly in the quaternion domain. In this article, the foundations of the HR-calculus are revised and the required tools for deriving adaptive learning techniques suitable for dealing with quaternion-valued signals, such as the gradient operator, chain and product derivative rules, and Taylor series expansion are presented. This serves to establish the most important applications of adaptive information processing in the quaternion domain for both single-node and multi-node formulations. The article is supported by Supplementary Material, which will be referred to as SM.

Asynchronous Wireless Federated Learning with Probabilistic Client Selection

  • paper_url: http://arxiv.org/abs/2311.16741
  • repo_url: None
  • paper_authors: Jiarong Yang, Yuan Liu, Fangjiong Chen, Wen Chen, Changle Li
  • for: Tackling the stragglers issue in asynchronous federated learning (FL), where distributed clients collaboratively train a model coordinated by a server.
  • methods: Each client keeps local updates and probabilistically transmits its local model to the server at arbitrary times; after deriving an (approximate) expression for the convergence rate, an optimization problem over joint probabilistic client selection and bandwidth allocation trades off the convergence rate of asynchronous FL against mobile energy consumption, solved globally optimally with an iterative algorithm.
  • results: Experiments demonstrate the superiority of the proposed approach over traditional schemes.
    Abstract Federated learning (FL) is a promising distributed learning framework where distributed clients collaboratively train a machine learning model coordinated by a server. To tackle the stragglers issue in asynchronous FL, we consider that each client keeps local updates and probabilistically transmits the local model to the server at arbitrary times. We first derive the (approximate) expression for the convergence rate based on the probabilistic client selection. Then, an optimization problem is formulated to trade off the convergence rate of asynchronous FL and mobile energy consumption by joint probabilistic client selection and bandwidth allocation. We develop an iterative algorithm to solve the non-convex problem globally optimally. Experiments demonstrate the superiority of the proposed approach compared with the traditional schemes.

Sluggish and Chemically-Biased Interstitial Diffusion in Concentrated Solid Solution Alloys: Mechanisms and Methods

  • paper_url: http://arxiv.org/abs/2311.16727
  • repo_url: https://github.com/jeremy1189/interstitial-diffusion
  • paper_authors: Biao Xu, Haijun Fu, Shasha Huang, Shihua Ma, Yaoxu Xiong, Jun Zhang, Xuepeng Xiang, Wenyu Lu, Ji-Jung Kai, Shijun Zhao
  • for: Studies sluggish and chemically-biased interstitial diffusion in Fe-Ni concentrated solid solution alloys (CSAs).
  • methods: Combines machine learning (ML) with kinetic Monte Carlo (kMC), where ML predicts migration energy barriers accurately and efficiently on-the-fly.
  • results: Finds that sluggish diffusion and "Ni-Ni-Ni"-biased diffusion in Fe-Ni alloys stem from a "Barrier Lock" mechanism, while "Fe-Fe-Fe"-biased diffusion is governed by a "Component Dominance" mechanism. These insights motivate a practical AvgS-kMC method for quickly and conveniently determining interstitial-mediated diffusivity from mean migration barriers, which, combined with a differential evolutionary algorithm, enables inverse design of sluggish diffusion properties.
    Abstract Interstitial diffusion is a pivotal process that governs the phase stability and irradiation response of materials in non-equilibrium conditions. In this work, we study sluggish and chemically-biased interstitial diffusion in Fe-Ni concentrated solid solution alloys (CSAs) by combining machine learning (ML) and kinetic Monte Carlo (kMC), where ML is used to accurately and efficiently predict the migration energy barriers on-the-fly. The ML-kMC reproduces the diffusivity that was reported by molecular dynamics results at high temperatures. With this powerful tool, we find that the observed sluggish diffusion and the "Ni-Ni-Ni"-biased diffusion in Fe-Ni alloys are ascribed to a unique "Barrier Lock" mechanism, whereas the "Fe-Fe-Fe"-biased diffusion is influenced by a "Component Dominance" mechanism. Inspired by the mentioned mechanisms, a practical AvgS-kMC method is proposed for conveniently and swiftly determining interstitial-mediated diffusivity by only relying on the mean energy barriers of migration patterns. Combining the AvgS-kMC with the differential evolutionary algorithm, an inverse design strategy for optimizing sluggish diffusion properties is applied to emphasize the crucial role of favorable migration patterns.
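For readers unfamiliar with the kMC side, here is a minimal residence-time (BKL) kinetic Monte Carlo step in which an ML surrogate would supply the migration barriers on-the-fly; the `predicted_barriers` stand-in, the temperature, and the attempt frequency are illustrative assumptions, not the paper's trained model or parameters.

```python
import numpy as np

kB, T = 8.617e-5, 800.0            # Boltzmann constant (eV/K), temperature (K)
nu0 = 1e13                         # attempt frequency (1/s)

def predicted_barriers(env):
    """Stand-in for the ML surrogate: migration barriers (eV) for each
    candidate interstitial jump in the current local chemical environment."""
    rng = np.random.default_rng(abs(hash(env)) % 2**32)
    return rng.uniform(0.3, 1.2, size=8)     # e.g. 8 candidate jump directions

def kmc_step(env, rng):
    """One residence-time (BKL) kinetic Monte Carlo step."""
    rates = nu0 * np.exp(-predicted_barriers(env) / (kB * T))
    total = rates.sum()
    jump = rng.choice(len(rates), p=rates / total)   # pick an event with prob ~ rate
    dt = -np.log(rng.random()) / total               # exponential waiting time
    return jump, dt

rng = np.random.default_rng(0)
t = 0.0
for step in range(5):
    jump, dt = kmc_step(env=step, rng=rng)
    t += dt
    print(f"step {step}: jump direction {jump}, elapsed time {t:.3e} s")
```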

Sinkhorn Flow: A Continuous-Time Framework for Understanding and Generalizing the Sinkhorn Algorithm

  • paper_url: http://arxiv.org/abs/2311.16706
  • repo_url: None
  • paper_authors: Mohammad Reza Karimi, Ya-Ping Hsieh, Andreas Krause
  • for: Many machine learning problems can be cast as entropy-regularized optimal transport over the space of probability measures, canonically solved with Sinkhorn iterates.
  • methods: Builds on the recasting of the Sinkhorn algorithm within the mirror descent framework, drawing on classical optimization theory.
  • results: Introduces a continuous-time analogue of the Sinkhorn algorithm that yields novel variants robust to noise and bias, and that generalizes and unifies several recently discovered dynamics in machine learning and mathematics.
    Abstract Many problems in machine learning can be formulated as solving entropy-regularized optimal transport on the space of probability measures. The canonical approach involves the Sinkhorn iterates, renowned for their rich mathematical properties. Recently, the Sinkhorn algorithm has been recast within the mirror descent framework, thus benefiting from classical optimization theory insights. Here, we build upon this result by introducing a continuous-time analogue of the Sinkhorn algorithm. This perspective allows us to derive novel variants of Sinkhorn schemes that are robust to noise and bias. Moreover, our continuous-time dynamics not only generalize but also offer a unified perspective on several recently discovered dynamics in machine learning and mathematics, such as the "Wasserstein mirror flow" of (Deb et al. 2023) or the "mean-field Schr\"odinger equation" of (Claisse et al. 2023).
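For reference, the discrete-time baseline that the paper's continuous-time flow generalizes is the classic Sinkhorn scaling loop; a minimal sketch on a small cost matrix:

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.1, iters=500):
    """Sinkhorn iterates for entropy-regularized OT between histograms a and b
    with cost matrix C; the continuous-time dynamics discussed in the paper
    generalize this discrete alternating-scaling loop."""
    K = np.exp(-C / eps)                   # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)                  # scale columns to match b
        u = a / (K @ v)                    # scale rows to match a
    return u[:, None] * K * v[None, :]     # transport plan diag(u) K diag(v)

n = 4
a = np.full(n, 1 / n)
b = np.array([0.1, 0.2, 0.3, 0.4])
C = (np.arange(n)[:, None] - np.arange(n)[None, :]) ** 2.0
P = sinkhorn(a, b, C)
print(P.round(3))
print(P.sum(axis=1), P.sum(axis=0))        # marginals recover a and b
```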

PyTorch Geometric High Order: A Unified Library for High Order Graph Neural Network

  • paper_url: http://arxiv.org/abs/2311.16670
  • repo_url: https://github.com/graphpku/pygho
  • paper_authors: Xiyuan Wang, Muhan Zhang
  • for: Provides PyTorch Geometric High Order (PyGHO), an easy-to-use library for High Order Graph Neural Networks (HOGNNs) that extends PyTorch Geometric (PyG).
  • methods: Offers a unified framework for high-order GNN methodologies, including subgraph GNNs and k-WL GNNs, with streamlined data structures for node tuples, comprehensive data processing utilities, and a flexible suite of operators.
  • results: On real-world tasks, PyGHO accelerates HOGNNs by up to 50% and reduces the implementation code required by an order of magnitude.
    Abstract We introduce PyTorch Geometric High Order (PyGHO), a library for High Order Graph Neural Networks (HOGNNs) that extends PyTorch Geometric (PyG). Unlike ordinary Message Passing Neural Networks (MPNNs) that exchange messages between nodes, HOGNNs, encompassing subgraph GNNs and k-WL GNNs, encode node tuples, a method previously lacking a standardized framework and often requiring complex coding. PyGHO's main objective is to provide a unified and user-friendly interface for various HOGNNs. It accomplishes this through streamlined data structures for node tuples, comprehensive data processing utilities, and a flexible suite of operators for high-order GNN methodologies. In this work, we present a detailed, in-depth account of PyGHO and compare HOGNNs implemented with PyGHO against their official implementations on real-world tasks. PyGHO achieves up to $50\%$ acceleration and reduces the code needed for implementation by an order of magnitude. Our library is available at \url{https://github.com/GraphPKU/PygHO}.

Pseudo-Likelihood Inference

  • paper_url: http://arxiv.org/abs/2311.16656
  • repo_url: https://github.com/FTurci/plm-brain
  • paper_authors: Theo Gruner, Boris Belousov, Fabio Muratore, Daniel Palenicek, Jan Peters
  • for: Targets Bayesian system identification problems where the likelihood is intractable.
  • methods: Proposes Pseudo-Likelihood Inference (PLI), which brings neural approximation into Approximate Bayesian Computation (ABC), making it competitive with Sequential Neural Posterior Estimation (SNPE) on high-dimensional problems.
  • results: Using integral probability metrics and an adaptive bandwidth updated via information-theoretic trust regions, PLI optimizes neural posteriors by gradient descent, requires no summary statistics, and accepts multiple observations as input. It outperforms SNPE as more data become available, as shown on four classical SBI benchmark tasks and a highly dynamic physical system, with particular advantages on stochastic simulations and multi-modal posterior landscapes.
    Abstract Simulation-Based Inference (SBI) is a common name for an emerging family of approaches that infer the model parameters when the likelihood is intractable. Existing SBI methods either approximate the likelihood, such as Approximate Bayesian Computation (ABC) or directly model the posterior, such as Sequential Neural Posterior Estimation (SNPE). While ABC is efficient on low-dimensional problems, on higher-dimensional tasks, it is generally outperformed by SNPE, which leverages function approximation. In this paper, we propose Pseudo-Likelihood Inference (PLI), a new method that brings neural approximation into ABC, making it competitive on challenging Bayesian system identification tasks. By utilizing integral probability metrics, we introduce a smooth likelihood kernel with an adaptive bandwidth that is updated based on information-theoretic trust regions. Thanks to this formulation, our method (i) allows for optimizing neural posteriors via gradient descent, (ii) does not rely on summary statistics, and (iii) enables multiple observations as input. In comparison to SNPE, it leads to improved performance when more data is available. The effectiveness of PLI is evaluated on four classical SBI benchmark tasks and on a highly dynamic physical system, showing particular advantages on stochastic simulations and multi-modal posterior landscapes.

Elucidating Discrepancy in Explanations of Predictive Models Developed using EMR

  • paper_url: http://arxiv.org/abs/2311.16654
  • repo_url: None
  • paper_authors: Aida Brankovic, Wenjie Huang, David Cook, Sankalp Khanna, Konstanty Bialkowski
  • for: Examines whether explanations of machine-learning-based clinical decision support algorithms trained on Electronic Medical Records (EMR) data agree with expert clinical knowledge.
  • methods: Applies current state-of-the-art explainable AI (XAI) methods to EMR-based clinical decision support algorithms and analyzes the concordance of their explanations with expert knowledge, discussing discrepancies from both clinical and technical perspectives.
  • results: Identifies discrepancies between current XAI outputs and expert clinical knowledge, explains their likely causes, and discusses factors important for achieving trustworthy XAI in clinical decision support.
    Abstract The lack of transparency and explainability hinders the clinical adoption of Machine learning (ML) algorithms. While explainable artificial intelligence (XAI) methods have been proposed, little research has focused on the agreement between these methods and expert clinical knowledge. This study applies current state-of-the-art explainability methods to clinical decision support algorithms developed for Electronic Medical Records (EMR) data to analyse the concordance between these factors and discusses causes for identified discrepancies from a clinical and technical perspective. Important factors for achieving trustworthy XAI solutions for clinical decision support are also discussed.

Rethinking Backdoor Attacks on Dataset Distillation: A Kernel Method Perspective

  • paper_url: http://arxiv.org/abs/2311.16646
  • repo_url: None
  • paper_authors: Ming-Yu Chung, Sheng-Yen Chou, Chia-Mu Yu, Pin-Yu Chen, Sy-Yen Kuo, Tsung-Yi Ho
  • for: Studies backdoor attacks on dataset distillation, a technique for enhancing data efficiency in deep learning.
  • methods: Develops a kernel-method-based theoretical analysis of backdoor attacks on dataset distillation and introduces two theory-driven trigger pattern generation methods specialized for the distillation setting.
  • results: The optimization-based trigger design framework yields backdoor attacks on dataset distillation that are effective and resilient to conventional detection and mitigation methods.
    Abstract Dataset distillation offers a potential means to enhance data efficiency in deep learning. Recent studies have shown its ability to counteract backdoor risks present in original training samples. In this study, we delve into the theoretical aspects of backdoor attacks and dataset distillation based on kernel methods. We introduce two new theory-driven trigger pattern generation methods specialized for dataset distillation. Following a comprehensive set of analyses and experiments, we show that our optimization-based trigger design framework informs effective backdoor attacks on dataset distillation. Notably, datasets poisoned by our designed trigger prove resilient against conventional backdoor attack detection and mitigation methods. Our empirical results validate that the triggers developed using our approaches are proficient at executing resilient backdoor attacks.

Opening the Black Box: Towards inherently interpretable energy data imputation models using building physics insight

  • paper_url: http://arxiv.org/abs/2311.16632
  • repo_url: https://github.com/antonio955/pi_dae
  • paper_authors: Antonio Liguori, Matias Quintana, Chun Fu, Clayton Miller, Jérôme Frisch, Christoph van Treeck
  • for: Proposes Physics-informed Denoising Autoencoders (PI-DAE) for imputing missing energy data in commercial buildings.
  • methods: Builds on the Denoising Autoencoder (DAE) architecture and adds physics-inspired soft constraints to the loss function, coupling a building thermal balance equation to a multivariate DAE.
  • results: Shows enhanced robustness to varying rates of missing data and improved interpretability through the optimized physics-based coefficients, at no significant cost in reconstruction error.
    Abstract Missing data are frequently observed by practitioners and researchers in the building energy modeling community. In this regard, advanced data-driven solutions, such as Deep Learning methods, are typically required to reflect the non-linear behavior of these anomalies. As an ongoing research question related to Deep Learning, a model's applicability to limited data settings can be explored by introducing prior knowledge in the network. This same strategy can also lead to more interpretable predictions, hence facilitating the field application of the approach. For that purpose, the aim of this paper is to propose the use of Physics-informed Denoising Autoencoders (PI-DAE) for missing data imputation in commercial buildings. In particular, the presented method enforces physics-inspired soft constraints to the loss function of a Denoising Autoencoder (DAE). In order to quantify the benefits of the physical component, an ablation study between different DAE configurations is conducted. First, three univariate DAEs are optimized separately on indoor air temperature, heating, and cooling data. Then, two multivariate DAEs are derived from the previous configurations. Eventually, a building thermal balance equation is coupled to the last multivariate configuration to obtain PI-DAE. Additionally, two commonly used benchmarks are employed to support the findings. It is shown how introducing physical knowledge in a multivariate Denoising Autoencoder can enhance the inherent model interpretability through the optimized physics-based coefficients. While no significant improvement is observed in terms of reconstruction error with the proposed PI-DAE, its enhanced robustness to varying rates of missing data and the valuable insights derived from the physics-based coefficients create opportunities for wider applications within building systems and the built environment.
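A minimal sketch of the physics-informed loss idea, assuming a toy lumped thermal-balance constraint: the coefficients `a` and `b` and the finite-difference residual below are illustrative stand-ins, not the paper's exact building-physics formulation.

```python
import torch, torch.nn as nn

class PIDAE(nn.Module):
    """Denoising autoencoder over stacked [T_in, Q_heat, Q_cool] time series,
    with learnable physics-based coefficients used in a soft constraint."""
    def __init__(self, n_steps=24):
        super().__init__()
        d = 3 * n_steps
        self.net = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, d))
        self.a = nn.Parameter(torch.tensor(0.1))   # physics-based coefficients,
        self.b = nn.Parameter(torch.tensor(0.1))   # interpretable after training

    def forward(self, x_noisy):
        return self.net(x_noisy)

def loss_fn(model, x_noisy, x_clean, lam=0.1):
    x_hat = model(x_noisy)
    T, Qh, Qc = x_hat.chunk(3, dim=-1)
    recon = ((x_hat - x_clean) ** 2).mean()        # standard DAE reconstruction
    # soft constraint: dT/dt ~ a*Q_heat - b*Q_cool, via finite differences in time
    resid = T[:, 1:] - T[:, :-1] - model.a * Qh[:, :-1] + model.b * Qc[:, :-1]
    return recon + lam * (resid ** 2).mean()

model = PIDAE()
x_clean = torch.randn(8, 72)
x_noisy = x_clean.clone()
x_noisy[torch.rand_like(x_noisy) < 0.2] = 0.0      # simulate missing readings
print(loss_fn(model, x_noisy, x_clean).item())
```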

Outfit Completion via Conditional Set Transformation

  • paper_url: http://arxiv.org/abs/2311.16630
  • repo_url: None
  • paper_authors: Takuma Nakamura, Yuki Saito, Ryosuke Goto
  • for: solves the outfit completion problem as a set retrieval task
  • methods: proposes a novel framework with conditional set transformation architecture and compatibility-based regularization method
  • results: outperforms existing approaches in terms of accuracy, condition satisfaction, and compatibility of completion results
    Abstract In this paper, we formulate the outfit completion problem as a set retrieval task and propose a novel framework for solving this problem. The proposal includes a conditional set transformation architecture with deep neural networks and a compatibility-based regularization method. The proposed method utilizes a map that is permutation-invariant with respect to the input set and permutation-equivariant with respect to the condition set. This allows retrieving a set that is compatible with the input set while reflecting the properties of the condition set. In addition, since this structure outputs the elements of the output set in a single inference, it can achieve a scalable inference speed with respect to the cardinality of the output set. Experimental results on real data reveal that the proposed method outperforms existing approaches in terms of accuracy on the outfit completion task, condition satisfaction, and compatibility of completion results.

Symmetry-regularized neural ordinary differential equations

  • paper_url: http://arxiv.org/abs/2311.16628
  • repo_url: None
  • paper_authors: Wenbo Hao
  • for: Interprets the hidden-state dynamics of neural networks as ordinary differential equations, capturing system dynamics in a continuous-time framework.
  • methods: Uses continuous Lie symmetries of the ODEs and PDEs associated with the model to derive conservation laws, which are added to the loss function, making training physics-informed and improving robustness and stability.
  • results: Illustrates the method on a toy model with a cosine rate of change in the hidden state, walking through the identification of Lie symmetries, the derivation of conservation laws, and the construction of a new loss function.
    Abstract Neural Ordinary Differential Equations (Neural ODEs) is a class of deep neural network models that interpret the hidden state dynamics of neural networks as an ordinary differential equation, thereby capable of capturing system dynamics in a continuous time framework. In this work, I integrate symmetry regularization into Neural ODEs. In particular, I use continuous Lie symmetry of ODEs and PDEs associated with the model to derive conservation laws and add them to the loss function, making it physics-informed. This incorporation of inherent structural properties into the loss function could significantly improve robustness and stability of the model during training. To illustrate this method, I employ a toy model that utilizes a cosine rate of change in the hidden state, showcasing the process of identifying Lie symmetries, deriving conservation laws, and constructing a new loss function.
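A minimal sketch of the idea, using rotation dynamics whose Lie (rotational) symmetry implies that Q(h) = ||h||² is conserved; the dynamics, conserved quantity, and penalty weight are illustrative assumptions rather than the paper's cosine example.

```python
import torch, torch.nn as nn

# Neural ODE vector field f and an optimizer
f = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 2))
opt = torch.optim.Adam(f.parameters(), lr=1e-2)

def rollout(h0, steps=20, dt=0.1):
    """Integrate dh/dt = f(h) with explicit Euler steps."""
    hs = [h0]
    for _ in range(steps):
        hs.append(hs[-1] + dt * f(hs[-1]))
    return torch.stack(hs)

# ground-truth trajectory of dh/dt = [[0,-1],[1,0]] h, i.e. a pure rotation
h0 = torch.tensor([[1.0, 0.0]])
ts = torch.arange(21) * 0.1
truth = torch.stack([torch.cos(ts), torch.sin(ts)], dim=-1).unsqueeze(1)

for it in range(300):
    opt.zero_grad()
    hs = rollout(h0)
    data_loss = ((hs - truth) ** 2).mean()
    Q = (hs ** 2).sum(-1)                    # conserved quantity along the path
    sym_loss = ((Q - Q[0]) ** 2).mean()      # penalize conservation-law violation
    (data_loss + 0.1 * sym_loss).backward()
    opt.step()
print(data_loss.item(), sym_loss.item())
```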

Gaussian Processes for Monitoring Air-Quality in Kampala

  • paper_url: http://arxiv.org/abs/2311.16625
  • repo_url: https://github.com/claramst/gps-kampala-airquality
  • paper_authors: Clara Stoddart, Lauren Shrack, Richard Sserunjogi, Usman Abdul-Ganiy, Engineer Bainomugisha, Deo Okure, Ruth Misener, Jose Pablo Folch, Ruby Sedgwick
  • for: Investigates Gaussian Processes for nowcasting air pollution at locations without sensors and forecasting future air pollution at sensor locations.
  • methods: Fits Gaussian Process models to data from AirQo's sensor network in Kampala, Uganda, removing outliers, comparing different kernel functions and additional inputs, and evaluating two sparse approximations to handle the large amount of temporal data.
  • results: Demonstrates the benefit of outlier removal and characterizes the trade-offs among kernels, inputs, and sparse approximations on the AirQo dataset.
    Abstract Monitoring air pollution is of vital importance to the overall health of the population. Unfortunately, devices that can measure air quality can be expensive, and many cities in low and middle-income countries have to rely on a sparse allocation of them. In this paper, we investigate the use of Gaussian Processes for both nowcasting the current air-pollution in places where there are no sensors and forecasting the air-pollution in the future at the sensor locations. In particular, we focus on the city of Kampala in Uganda, using data from AirQo's network of sensors. We demonstrate the advantage of removing outliers, compare different kernel functions and additional inputs. We also compare two sparse approximations to allow for the large amounts of temporal data in the dataset.
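A minimal Gaussian-process sketch in the spirit of the paper, using scikit-learn with an RBF-plus-noise kernel; the synthetic readings and kernel choice are illustrative assumptions, not the AirQo data or the paper's selected kernel.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(40, 2))                 # (x, y) sensor locations
y = 30 + 5 * np.sin(X[:, 0]) + rng.normal(0, 1, 40)  # noisy pollution readings

kernel = 1.0 * RBF(length_scale=2.0) + WhiteKernel(noise_level=1.0)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

x_new = np.array([[5.0, 5.0]])                       # a location with no sensor
mean, std = gp.predict(x_new, return_std=True)       # nowcast with uncertainty
print(f"predicted level: {mean[0]:.1f} +/- {std[0]:.1f}")
```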

Beyond Labels: Advancing Cluster Analysis with the Entropy of Distance Distribution (EDD)

  • paper_url: http://arxiv.org/abs/2311.16621
  • repo_url: None
  • paper_authors: Claus Metzner, Achim Schilling, Patrick Krauss
  • for: Develops a label-free method for quantifying clustering in high-dimensional datasets.
  • methods: Computes the Shannon entropy of the pairwise point-to-point distance distribution, normalized against its maximum; the measure needs no predefined labels, is invariant to global translations and permutations of the data, and becomes scale-invariant with dimension-wise z-scoring.
  • results: In experiments on two-dimensional data with Gaussian cluster centers, the EDD value increases monotonically as cluster widths widen from well-separated to overlapping, demonstrating sensitivity to varying degrees of clustering.
    Abstract In the evolving landscape of data science, the accurate quantification of clustering in high-dimensional data sets remains a significant challenge, especially in the absence of predefined labels. This paper introduces a novel approach, the Entropy of Distance Distribution (EDD), which represents a paradigm shift in label-free clustering analysis. Traditional methods, reliant on discrete labels, often struggle to discern intricate cluster patterns in unlabeled data. EDD, however, leverages the characteristic differences in pairwise point-to-point distances to discern clustering tendencies, independent of data labeling. Our method employs the Shannon information entropy to quantify the 'peakedness' or 'flatness' of distance distributions in a data set. This entropy measure, normalized against its maximum value, effectively distinguishes between strongly clustered data (indicated by pronounced peaks in distance distribution) and more homogeneous, non-clustered data sets. This label-free quantification is resilient against global translations and permutations of data points, and with an additional dimension-wise z-scoring, it becomes invariant to data set scaling. We demonstrate the efficacy of EDD through a series of experiments involving two-dimensional data spaces with Gaussian cluster centers. Our findings reveal a monotonic increase in the EDD value with the widening of cluster widths, moving from well-separated to overlapping clusters. This behavior underscores the method's sensitivity and accuracy in detecting varying degrees of clustering. EDD's potential extends beyond conventional clustering analysis, offering a robust, scalable tool for unraveling complex data structures without reliance on pre-assigned labels.
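A minimal sketch of the EDD computation, assuming a histogram estimate of the distance distribution (the bin count is an illustrative choice).

```python
import numpy as np
from scipy.spatial.distance import pdist

def edd(X, bins=50):
    """Entropy of the pairwise-distance distribution, normalized to [0, 1].
    Lower values indicate 'peaked' distance histograms, i.e. stronger clustering."""
    X = (X - X.mean(0)) / X.std(0)            # dimension-wise z-scoring
    d = pdist(X)                              # all pairwise Euclidean distances
    counts, _ = np.histogram(d, bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]
    H = -(p * np.log(p)).sum()                # Shannon entropy of the histogram
    return H / np.log(bins)                   # normalize by the maximum entropy

rng = np.random.default_rng(0)
clustered = np.concatenate([rng.normal(c, 0.1, (100, 2)) for c in (0.0, 5.0)])
homogeneous = rng.uniform(0, 5, (200, 2))
print(edd(clustered), edd(homogeneous))       # clustered < homogeneous
```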

Adversarial Distribution Balancing for Counterfactual Reasoning

  • paper_url: http://arxiv.org/abs/2311.16616
  • repo_url: https://github.com/sschrod/adbcr
  • paper_authors: Stefan Schrod, Fabian Sinz, Michael Altenbuchinger
  • for: Addresses the core difficulty of causal prediction: outcomes are observable only for the administered (factual) intervention, never for its counterfactual alternatives.
  • methods: Proposes Adversarial Distribution Balancing for Counterfactual Reasoning (ADBCR), which directly uses potential outcome estimates of the counterfactuals to remove spurious causal relations.
  • results: ADBCR outperforms state-of-the-art methods on three benchmark datasets, and improves further when unlabeled validation data are included in training to better adapt the model to the validation domain.
    Abstract The development of causal prediction models is challenged by the fact that the outcome is only observable for the applied (factual) intervention and not for its alternatives (the so-called counterfactuals); in medicine we only know patients' survival for the administered drug and not for other therapeutic options. Machine learning approaches for counterfactual reasoning have to deal with both unobserved outcomes and distributional differences due to non-random treatment administration. Unsupervised domain adaptation (UDA) addresses similar issues; one has to deal with unobserved outcomes -- the labels of the target domain -- and distributional differences between source and target domain. We propose Adversarial Distribution Balancing for Counterfactual Reasoning (ADBCR), which directly uses potential outcome estimates of the counterfactuals to remove spurious causal relations. We show that ADBCR outcompetes state-of-the-art methods on three benchmark datasets, and demonstrate that ADBCR's performance can be further improved if unlabeled validation data are included in the training procedure to better adapt the model to the validation domain.

A Multivariate Unimodality Test Harnessing the Dip Statistic of Mahalanobis Distances Over Random Projections

  • paper_url: http://arxiv.org/abs/2311.16614
  • repo_url: None
  • paper_authors: Prodromos Kolyvakis, Aristidis Likas
  • for: Develops a multivariate unimodality test, grounded in α-unimodality assumptions, for assessing unimodality in multidimensional datasets.
  • methods: Extrapolates one-dimensional unimodality principles to multidimensional spaces through linear random projections and point-to-point distances, applying the dip statistic to Mahalanobis distances.
  • results: Both theoretical and empirical studies confirm the method's efficacy in assessing the unimodality of multidimensional datasets and in estimating the number of clusters.
    Abstract Unimodality, pivotal in statistical analysis, offers insights into dataset structures and drives sophisticated analytical procedures. While unimodality's confirmation is straightforward for one-dimensional data using methods like Silverman's approach and Hartigans' dip statistic, its generalization to higher dimensions remains challenging. By extrapolating one-dimensional unimodality principles to multi-dimensional spaces through linear random projections and leveraging point-to-point distancing, our method, rooted in $\alpha$-unimodality assumptions, presents a novel multivariate unimodality test named mud-pod. Both theoretical and empirical studies confirm the efficacy of our method in unimodality assessment of multidimensional datasets as well as in estimating the number of clusters.
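One plausible reading of the procedure, sketched below under loud assumptions: Mahalanobis distances are measured from randomly chosen reference points and fed to Hartigan's dip test via the third-party `diptest` package (pip install diptest); this is not the authors' reference implementation.

```python
import numpy as np
import diptest   # third-party dip-statistic package; assumed available

def unimodality_test(X, n_trials=20, alpha=0.05, rng=None):
    """Dip-test Mahalanobis distances measured from random reference points;
    reject unimodality if any Bonferroni-corrected p-value is small."""
    rng = rng or np.random.default_rng(0)
    n, d = X.shape
    Sinv = np.linalg.inv(np.cov(X.T) + 1e-6 * np.eye(d))
    pvals = []
    for _ in range(n_trials):
        diff = X - X[rng.integers(n)]                 # offsets to a random point
        maha = np.sqrt(np.einsum('ij,jk,ik->i', diff, Sinv, diff))
        pvals.append(diptest.diptest(maha)[1])        # returns (dip, p-value)
    return min(pvals) * n_trials >= alpha             # True -> looks unimodal

rng = np.random.default_rng(1)
uni = rng.normal(size=(300, 3))
bi = np.vstack([rng.normal(-3, 1, (150, 3)), rng.normal(3, 1, (150, 3))])
print(unimodality_test(uni), unimodality_test(bi))    # expect True, False
```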

Eigenmatrix for unstructured sparse recovery

  • paper_url: http://arxiv.org/abs/2311.16609
  • repo_url: None
  • paper_authors: Lexing Ying
  • for: Considers unstructured sparse recovery problems in a general form, with examples including rational approximation, spectral function estimation, Fourier inversion, Laplace inversion, and sparse deconvolution.
  • methods: Proposes the eigenmatrix, a data-driven construction with desired approximate eigenvalues and eigenvectors, offering a new approach to these problems despite noisy sample values and unstructured sample locations.
  • results: Numerical results demonstrate the efficiency of the proposed method.
    Abstract This paper considers the unstructured sparse recovery problems in a general form. Examples include rational approximation, spectral function estimation, Fourier inversion, Laplace inversion, and sparse deconvolution. The main challenges are the noise in the sample values and the unstructured nature of the sample locations. This paper proposes the eigenmatrix, a data-driven construction with desired approximate eigenvalues and eigenvectors. The eigenmatrix offers a new way for these sparse recovery problems. Numerical results are provided to demonstrate the efficiency of the proposed method.

LC4SV: A Denoising Framework Learning to Compensate for Unseen Speaker Verification Models

  • paper_url: http://arxiv.org/abs/2311.16604
  • repo_url: None
  • paper_authors: Chi-Chang Lee, Hong-Wei Chen, Chu-Song Chen, Hsin-Min Wang, Tsung-Te Liu, Yu Tsao
  • for: Improves speaker verification (SV) performance in noisy environments.
  • methods: Trains a learning-based interpolation agent that automatically generates appropriate coefficients for combining the enhanced signal with its noisy input, compensating for enhancement artifacts before unseen downstream SV models.
  • results: LC4SV consistently improves the performance of various unseen SV systems in noisy environments.
    Abstract The performance of speaker verification (SV) models may drop dramatically in noisy environments. A speech enhancement (SE) module can be used as a front-end strategy. However, existing SE methods may fail to bring performance improvements to downstream SV systems due to artifacts in the predicted signals of SE models. To compensate for artifacts, we propose a generic denoising framework named LC4SV, which can serve as a pre-processor for various unknown downstream SV models. In LC4SV, we employ a learning-based interpolation agent to automatically generate the appropriate coefficients between the enhanced signal and its noisy input to improve SV performance in noisy environments. Our experimental results demonstrate that LC4SV consistently improves the performance of various unseen SV systems. To the best of our knowledge, this work is the first attempt to develop a learning-based interpolation scheme aiming at improving SV performance in noisy environments.

D4AM: A General Denoising Framework for Downstream Acoustic Models

  • paper_url: http://arxiv.org/abs/2311.16595
  • repo_url: https://github.com/changlee0903/d4am
  • paper_authors: Chi-Chang Lee, Yu Tsao, Hsin-Min Wang, Chu-Song Chen
  • for: Improves automatic speech recognition (ASR) robustness in noisy environments by training a general denoising front-end with both speech-text and noisy-clean paired data.
  • methods: Proposes D4AM, a general denoising framework that fine-tunes the speech enhancement model with backward gradients from a specific acoustic model and its classification objective, treating the regression (denoising) objective as an auxiliary loss; an adjustment scheme of gradient calibration and regression-objective weighting estimates suitable coefficients directly, avoiding grid search.
  • results: D4AM consistently improves various unseen acoustic models; evaluated on the Google ASR API with real noisy data unseen during training, it achieves a 24.65% relative WER reduction compared with directly feeding the noisy input.
    Abstract The performance of acoustic models degrades notably in noisy environments. Speech enhancement (SE) can be used as a front-end strategy to aid automatic speech recognition (ASR) systems. However, existing training objectives of SE methods are not fully effective at integrating speech-text and noisy-clean paired data for training toward unseen ASR systems. In this study, we propose a general denoising framework, D4AM, for various downstream acoustic models. Our framework fine-tunes the SE model with the backward gradient according to a specific acoustic model and the corresponding classification objective. In addition, our method aims to consider the regression objective as an auxiliary loss to make the SE model generalize to other unseen acoustic models. To jointly train an SE unit with regression and classification objectives, D4AM uses an adjustment scheme to directly estimate suitable weighting coefficients rather than undergoing a grid search process with additional training costs. The adjustment scheme consists of two parts: gradient calibration and regression objective weighting. The experimental results show that D4AM can consistently and effectively provide improvements to various unseen acoustic models and outperforms other combination setups. Specifically, when evaluated on the Google ASR API with real noisy data completely unseen during SE training, D4AM achieves a relative WER reduction of 24.65% compared with the direct feeding of noisy input. To our knowledge, this is the first work that deploys an effective combination scheme of regression (denoising) and classification (ASR) objectives to derive a general pre-processor applicable to various unseen ASR systems. Our code is available at https://github.com/ChangLee0903/D4AM.

FedAL: Black-Box Federated Knowledge Distillation Enabled by Adversarial Learning

  • paper_url: http://arxiv.org/abs/2311.16584
  • repo_url: None
  • paper_authors: Pengchao Han, Xingyan Shi, Jianwei Huang
  • for: This paper focuses on addressing the challenges of knowledge distillation (KD) in collaborative learning among distributed clients with different model architectures and non-sharing of local data and model parameters.
  • methods: The proposed method, Federated knowledge distillation enabled by Adversarial Learning (FedAL), uses a min-max game between clients and a server acting as a discriminator to guide clients’ local model training and achieve consensus model outputs among clients. Additionally, the method uses less-forgetting regularization to mitigate catastrophic forgetting due to clients’ heterogeneous local data.
  • results: The experimental results show that FedAL and its variants achieve higher accuracy than other federated KD baselines.
    Abstract Knowledge distillation (KD) can enable collaborative learning among distributed clients that have different model architectures and do not share their local data and model parameters with others. Each client updates its local model using the average model output/feature of all client models as the target, known as federated KD. However, existing federated KD methods often do not perform well when clients' local models are trained with heterogeneous local datasets. In this paper, we propose Federated knowledge distillation enabled by Adversarial Learning (FedAL) to address the data heterogeneity among clients. First, to alleviate the local model output divergence across clients caused by data heterogeneity, the server acts as a discriminator to guide clients' local model training to achieve consensus model outputs among clients through a min-max game between clients and the discriminator. Moreover, catastrophic forgetting may happen during the clients' local training and global knowledge transfer due to clients' heterogeneous local data. Towards this challenge, we design the less-forgetting regularization for both local training and global knowledge transfer to guarantee clients' ability to transfer/learn knowledge to/from others. Experimental results show that FedAL and its variants achieve higher accuracy than other federated KD baselines.

Scalable Label Distribution Learning for Multi-Label Classification

  • paper_url: http://arxiv.org/abs/2311.16556
  • repo_url: https://github.com/ailearn-ml/sldl
  • paper_authors: Xingyu Zhao, Yuexuan An, Lei Qi, Xin Geng
  • for: Improves the scalability and efficiency of multi-label classification (MLC) in real-world scenarios with asymmetric label correlations and large output spaces.
  • methods: Represents labels as distributions in a low-dimensional latent space where label correlation is asymmetric and the dimension is independent of the number of labels; learns a mapping from the feature space to the latent space so that computational complexity no longer scales with the label count; decodes latent representations with a nearest-neighbor strategy to obtain final predictions.
  • results: Extensive experiments show SLDL achieves very competitive classification performance with little computational consumption.
    Abstract Multi-label classification (MLC) refers to the problem of tagging a given instance with a set of relevant labels. Most existing MLC methods are based on the assumption that the correlation of two labels in each label pair is symmetric, which is violated in many real-world scenarios. Moreover, most existing methods design learning processes associated with the number of labels, which makes their computational complexity a bottleneck when scaling up to large-scale output space. To tackle these issues, we propose a novel MLC learning method named Scalable Label Distribution Learning (SLDL) for multi-label classification which can describe different labels as distributions in a latent space, where the label correlation is asymmetric and the dimension is independent of the number of labels. Specifically, SLDL first converts labels into continuous distributions within a low-dimensional latent space and leverages the asymmetric metric to establish the correlation between different labels. Then, it learns the mapping from the feature space to the latent space, resulting in the computational complexity is no longer related to the number of labels. Finally, SLDL leverages a nearest-neighbor-based strategy to decode the latent representations and obtain the final predictions. Our extensive experiments illustrate that SLDL can achieve very competitive classification performances with little computational consumption.
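A minimal sketch of the decoding pipeline: labels live as Gaussians in a low-dimensional latent space, features are mapped into that space, and predictions are read off by nearest neighbors; the random embeddings and linear mapping below are stand-ins for the learned components.

```python
import numpy as np

rng = np.random.default_rng(0)
n_labels, latent_dim, n_feats = 1000, 16, 64
label_mu = rng.normal(size=(n_labels, latent_dim))      # label distribution means
W = rng.normal(size=(n_feats, latent_dim)) / np.sqrt(n_feats)  # feature->latent map

def predict(x, k=5):
    """Map a feature vector into the latent space, then return the k labels
    whose latent distributions lie nearest (nearest-neighbor decoding)."""
    z = x @ W                                 # cost independent of n_labels here
    d = np.linalg.norm(label_mu - z, axis=1)  # label comparison only at decoding
    return np.argsort(d)[:k]

x = rng.normal(size=n_feats)
print(predict(x))   # indices of the k most relevant labels
```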

Communication Efficiency Optimization of Federated Learning for Computing and Network Convergence of 6G Networks

  • paper_url: http://arxiv.org/abs/2311.16540
  • repo_url: None
  • paper_authors: Yizhuo Cai, Bo Lei, Qianying Zhao, Jing Peng, Min Wei, Yushun Zhang, Xing Zhang
  • for: Optimizes the communication efficiency of federated learning in complex network environments.
  • methods: Leverages the computing and network convergence (CNC) architecture of 6G networks, whose computing-measurable, perceptible, distributable, dispatchable, and manageable capabilities guide the training process of participating devices according to business requirements, resource load, network conditions, and device computing power.
  • results: Experiments on two device architectures show the method copes well with complex network situations, balances the delay distribution of devices during local training, improves communication efficiency when transferring model parameters, and raises resource utilization in the network.
    Abstract Federated learning effectively addresses issues such as data privacy by collaborating across participating devices to train global models. However, factors such as network topology and device computing power can affect its training or communication process in complex network environments. A new network architecture and paradigm with computing-measurable, perceptible, distributable, dispatchable, and manageable capabilities, computing and network convergence (CNC) of 6G networks can effectively support federated learning training and improve its communication efficiency. By guiding the participating devices' training in federated learning based on business requirements, resource load, network conditions, and arithmetic power of devices, CNC can reach this goal. In this paper, to improve the communication efficiency of federated learning in complex networks, we study the communication efficiency optimization of federated learning for computing and network convergence of 6G networks, methods that gives decisions on its training process for different network conditions and arithmetic power of participating devices in federated learning. The experiments address two architectures that exist for devices in federated learning and arrange devices to participate in training based on arithmetic power while achieving optimization of communication efficiency in the process of transferring model parameters. The results show that the method we proposed can (1) cope well with complex network situations (2) effectively balance the delay distribution of participating devices for local training (3) improve the communication efficiency during the transfer of model parameters (4) improve the resource utilization in the network.

Federated Learning with Diffusion Models for Privacy-Sensitive Vision Tasks

  • paper_url: http://arxiv.org/abs/2311.16538
  • repo_url: None
  • paper_authors: Ye Lin Tun, Chu Myaet Thwal, Ji Su Yoon, Sun Moo Kang, Chaoning Zhang, Choong Seon Hong
  • for: Trains diffusion models with federated learning (FL) so that privacy-sensitive domains can benefit from vision services without centralizing data.
  • methods: Uses FL to gather model parameters rather than raw data, preserving the privacy of the parties involved.
  • results: Experiments across various FL scenarios show that federated diffusion models have great potential to deliver vision services to privacy-sensitive domains.
    Abstract Diffusion models have shown great potential for vision-related tasks, particularly for image generation. However, their training is typically conducted in a centralized manner, relying on data collected from publicly available sources. This approach may not be feasible or practical in many domains, such as the medical field, which involves privacy concerns over data collection. Despite the challenges associated with privacy-sensitive data, such domains could still benefit from valuable vision services provided by diffusion models. Federated learning (FL) plays a crucial role in enabling decentralized model training without compromising data privacy. Instead of collecting data, an FL system gathers model parameters, effectively safeguarding the private data of different parties involved. This makes FL systems vital for managing decentralized learning tasks, especially in scenarios where privacy-sensitive data is distributed across a network of clients. Nonetheless, FL presents its own set of challenges due to its distributed nature and privacy-preserving properties. Therefore, in this study, we explore the FL strategy to train diffusion models, paving the way for the development of federated diffusion models. We conduct experiments on various FL scenarios, and our findings demonstrate that federated diffusion models have great potential to deliver vision services to privacy-sensitive domains.

Personalized Predictions of Glioblastoma Infiltration: Mathematical Models, Physics-Informed Neural Networks and Multimodal Scans

  • paper_url: http://arxiv.org/abs/2311.16536
  • repo_url: None
  • paper_authors: Ray Zirui Zhang, Ivan Ezhov, Michal Balcerak, Andy Zhu, Benedikt Wiestler, Bjoern Menze, John Lowengrub
  • for: This paper aims to predict the infiltration of glioblastoma (GBM) from medical MRI scans to better understand tumor growth dynamics and design personalized radiotherapy treatment plans.
  • methods: The paper proposes using physics-informed neural networks (PINNs) to estimate patient-specific parameters of a reaction-diffusion PDE model of GBM growth from a single 3D structural MRI snapshot. The method integrates both data and theory by embedding the PDE into a loss function.
  • results: The paper validates the method on synthetic and patient datasets and shows promise for real-time parametric inference in the clinical setting for personalized GBM treatment. The method is able to accurately estimate patient-specific parameters and handle complex brain geometry within the PINN framework.
    Abstract Predicting the infiltration of Glioblastoma (GBM) from medical MRI scans is crucial for understanding tumor growth dynamics and designing personalized radiotherapy treatment plans. Mathematical models of GBM growth can complement the data in the prediction of spatial distributions of tumor cells. However, this requires estimating patient-specific parameters of the model from clinical data, which is a challenging inverse problem due to limited temporal data and the limited time between imaging and diagnosis. This work proposes a method that uses Physics-Informed Neural Networks (PINNs) to estimate patient-specific parameters of a reaction-diffusion PDE model of GBM growth from a single 3D structural MRI snapshot. PINNs embed both the data and the PDE into a loss function, thus integrating theory and data. Key innovations include the identification and estimation of characteristic non-dimensional parameters, a pre-training step that utilizes the non-dimensional parameters and a fine-tuning step to determine the patient specific parameters. Additionally, the diffuse domain method is employed to handle the complex brain geometry within the PINN framework. Our method is validated both on synthetic and patient datasets, and shows promise for real-time parametric inference in the clinical setting for personalized GBM treatment.
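A minimal sketch of the PINN ingredient: a reaction-diffusion (Fisher-KPP) residual, u_t = D u_xx + rho u (1 - u), is embedded in the loss so the network and the patient-specific parameters (D, rho) are fit jointly; the 1-D setting, network size, and omitted data-fit term are simplifications for illustration.

```python
import torch

# network mapping (x, t) -> tumor cell density u in [0, 1]
net = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.Tanh(),
                          torch.nn.Linear(32, 1), torch.nn.Sigmoid())
logD = torch.nn.Parameter(torch.tensor(-2.0))   # patient-specific parameters,
logr = torch.nn.Parameter(torch.tensor(-1.0))   # learned alongside the network

def pde_residual(x, t):
    """Residual of u_t - D*u_xx - rho*u*(1 - u) via automatic differentiation."""
    xt = torch.stack([x, t], dim=-1)
    u = net(xt).squeeze(-1)
    u_x, u_t = torch.autograd.grad(u.sum(), (x, t), create_graph=True)
    u_xx = torch.autograd.grad(u_x.sum(), x, create_graph=True)[0]
    return u_t - logD.exp() * u_xx - logr.exp() * u * (1 - u)

opt = torch.optim.Adam(list(net.parameters()) + [logD, logr], lr=1e-3)
x = torch.rand(256, requires_grad=True)
t = torch.rand(256, requires_grad=True)
loss_pde = (pde_residual(x, t) ** 2).mean()  # add a data-fit term on the MRI here
opt.zero_grad()
loss_pde.backward()
opt.step()
print(loss_pde.item())
```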

Contrastive encoder pre-training-based clustered federated learning for heterogeneous data

  • paper_url: http://arxiv.org/abs/2311.16535
  • repo_url: None
  • paper_authors: Ye Lin Tun, Minh N. H. Nguyen, Chu Myaet Thwal, Jinwoo Choi, Choong Seon Hong
  • for: Improves federated learning performance under heterogeneous (non-IID) client data.
  • methods: Combines self-supervised contrastive pre-training on unlabeled data with clustered federated learning, in which clients choose their local models from a model pool based on performance; pre-training prevents the clustering failure that occurs without pre-trained parameters.
  • results: Extensive experiments in heterogeneous FL settings show that CP-CFL improves model convergence and overall performance, with several interesting observations reported.
    Abstract Federated learning (FL) is a promising approach that enables distributed clients to collaboratively train a global model while preserving their data privacy. However, FL often suffers from data heterogeneity problems, which can significantly affect its performance. To address this, clustered federated learning (CFL) has been proposed to construct personalized models for different client clusters. One effective client clustering strategy is to allow clients to choose their own local models from a model pool based on their performance. However, without pre-trained model parameters, such a strategy is prone to clustering failure, in which all clients choose the same model. Unfortunately, collecting a large amount of labeled data for pre-training can be costly and impractical in distributed environments. To overcome this challenge, we leverage self-supervised contrastive learning to exploit unlabeled data for the pre-training of FL systems. Together, self-supervised pre-training and client clustering can be crucial components for tackling the data heterogeneity issues of FL. Leveraging these two crucial strategies, we propose contrastive pre-training-based clustered federated learning (CP-CFL) to improve the model convergence and overall performance of FL systems. In this work, we demonstrate the effectiveness of CP-CFL through extensive experiments in heterogeneous FL settings, and present various interesting observations.

Utility Fairness in Contextual Dynamic Pricing with Demand Learning

  • paper_url: http://arxiv.org/abs/2311.16528
  • repo_url: None
  • paper_authors: Xi Chen, David Simchi-Levi, Yining Wang
  • for: Studies contextual bandit algorithms for personalized pricing under utility fairness constraints with uncertain demand, achieving an optimal regret upper bound.
  • methods: Combines dynamic pricing with demand learning; in the static full-information setting, formulates the optimal pricing policy as a constrained optimization problem and proposes an approximation algorithm for efficiently and approximately computing the ideal policy.
  • results: Mathematical analysis and computational studies characterize the structure of optimal fair contextual pricing policies and derive simplified policies that lay foundations for further research and extensions. Extending to dynamic pricing with demand learning, the paper establishes a non-standard regret lower bound that highlights the complexity added by fairness constraints, offering a comprehensive analysis of the cost of fairness and its impact on the balance between utility and revenue maximization.
    Abstract This paper introduces a novel contextual bandit algorithm for personalized pricing under utility fairness constraints in scenarios with uncertain demand, achieving an optimal regret upper bound. Our approach, which incorporates dynamic pricing and demand learning, addresses the critical challenge of fairness in pricing strategies. We first delve into the static full-information setting to formulate an optimal pricing policy as a constrained optimization problem. Here, we propose an approximation algorithm for efficiently and approximately computing the ideal policy. We also use mathematical analysis and computational studies to characterize the structures of optimal contextual pricing policies subject to fairness constraints, deriving simplified policies which lays the foundations of more in-depth research and extensions. Further, we extend our study to dynamic pricing problems with demand learning, establishing a non-standard regret lower bound that highlights the complexity added by fairness constraints. Our research offers a comprehensive analysis of the cost of fairness and its impact on the balance between utility and revenue maximization. This work represents a step towards integrating ethical considerations into algorithmic efficiency in data-driven dynamic pricing.

On robust overfitting: adversarial training induced distribution matters

  • paper_url: http://arxiv.org/abs/2311.16526
  • repo_url: None
  • paper_authors: Runzhi Tian, Yongyi Mao
  • for: Investigates robust overfitting, the phenomenon whereby adversarial training generalizes far worse than standard training under standard loss.
  • methods: Analyzes PGD-based adversarial training and derives a novel upper bound on the generalization error with respect to the perturbation-induced distributions, in which a notion of the perturbation operator called "local dispersion" plays an important role.
  • results: Shows empirically that robust overfitting correlates with the increasing generalization difficulty of the perturbation-induced distributions along the adversarial training trajectory.
    Abstract Adversarial training may be regarded as standard training with a modified loss function. But its generalization error appears much larger than standard training under standard loss. This phenomenon, known as robust overfitting, has attracted significant research attention and remains largely as a mystery. In this paper, we first show empirically that robust overfitting correlates with the increasing generalization difficulty of the perturbation-induced distributions along the trajectory of adversarial training (specifically PGD-based adversarial training). We then provide a novel upper bound for generalization error with respect to the perturbation-induced distributions, in which a notion of the perturbation operator, referred to "local dispersion", plays an important role.
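For concreteness, the inner maximization of PGD-based adversarial training, whose induced perturbed distribution the paper analyzes, is standard L∞ projected gradient ascent; a minimal sketch with conventional hyperparameters:

```python
import torch, torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """L-infinity PGD: iteratively ascend the loss and clip to the eps-ball."""
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    return (x + delta).clamp(0, 1).detach()

# one adversarial-training step on a toy model
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.rand(16, 3, 32, 32), torch.randint(0, 10, (16,))
x_adv = pgd_attack(model, x, y)      # samples from the perturbation-induced distribution
opt.zero_grad()
F.cross_entropy(model(x_adv), y).backward()
opt.step()
```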

Value Approximation for Two-Player General-Sum Differential Games with State Constraints

  • paper_url: http://arxiv.org/abs/2311.16520
  • repo_url: None
  • paper_authors: Lei Zhang, Mukesh Ghimire, Wenlong Zhang, Zhe Xu, Yi Ren
  • for: Solves Hamilton-Jacobi-Isaacs (HJI) partial differential equations (PDEs) to enable equilibrial feedback control in two-player general-sum differential games with state constraints.
  • methods: Compares three remedies for the poor handling of discontinuous values by physics-informed machine learning: a hybrid method that learns from both equilibrium demonstrations and the HJI PDE, a value-hardening method that solves a sequence of HJIs with increasing Lipschitz constant on the constraint-violation penalty, and an epigraphical technique that lifts the value to a higher-dimensional auxiliary state space where it becomes continuous.
  • results: Across 5D and 9D vehicle simulations and 13D drone simulations, the hybrid method outperforms the others in generalization and safety performance.
    Abstract Solving Hamilton-Jacobi-Isaacs (HJI) PDEs enables equilibrial feedback control in two-player differential games, yet faces the curse of dimensionality (CoD). While physics-informed machine learning has been adopted to address CoD in solving PDEs, this method falls short in learning discontinuous solutions due to its sampling nature, leading to poor safety performance of the resulting controllers in robotics applications where values are discontinuous due to state or other temporal logic constraints. In this study, we explore three potential solutions to this problem: (1) a hybrid learning method that uses both equilibrium demonstrations and the HJI PDE, (2) a value-hardening method where a sequence of HJIs are solved with increasing Lipschitz constant on the constraint violation penalty, and (3) the epigraphical technique that lifts the value to a higher dimensional auxiliary state space where the value becomes continuous. Evaluations through 5D and 9D vehicle simulations and 13D drone simulations reveal that the hybrid method outperforms others in terms of generalization and safety performance.

B-LSTM-MIONet: Bayesian LSTM-based Neural Operators for Learning the Response of Complex Dynamical Systems to Length-Variant Multiple Input Functions

  • paper_url: http://arxiv.org/abs/2311.16519
  • repo_url: None
  • paper_authors: Zhihao Kong, Amirhossein Mollaali, Christian Moya, Na Lu, Guang Lin
  • for: Extends neural operator learning frameworks, namely the Deep Operator Network (DeepONet) and its multiple-input variant MIONet (which accepts input functions from different Banach spaces), to time-dependent data.
  • methods: Integrates Long Short-Term Memory (LSTM) networks into MIONet to learn nonlinear operators from variable-length, real-time data, removing MIONet's offline-input and fixed-sequence-length constraints.
  • results: The resulting B-LSTM-MIONet handles variable-length, real-time data and quantifies uncertainty through a novel Bayesian method that samples from MIONet parameter distributions, yielding a more precise and reliable model on noisy datasets.
    Abstract Deep Operator Network (DeepONet) is a neural network framework for learning nonlinear operators such as those from ordinary differential equations (ODEs) describing complex systems. Multiple-input deep neural operators (MIONet) extended DeepONet to allow multiple input functions in different Banach spaces. MIONet offers flexibility in training dataset grid spacing, without constraints on output location. However, it requires offline inputs and cannot handle varying sequence lengths in testing datasets, limiting its real-time application in dynamic complex systems. This work redesigns MIONet, integrating Long Short Term Memory (LSTM) to learn neural operators from time-dependent data. This approach overcomes data discretization constraints and harnesses LSTM's capability with variable-length, real-time data. Factors affecting learning performance, like algorithm extrapolation ability are presented. The framework is enhanced with uncertainty quantification through a novel Bayesian method, sampling from MIONet parameter distributions. Consequently, we develop the B-LSTM-MIONet, incorporating LSTM's temporal strengths with Bayesian robustness, resulting in a more precise and reliable model for noisy datasets.
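The sketch below illustrates the core architectural idea, assuming a DeepONet-style dot-product decoder: an LSTM branch encodes a variable-length input-function series, while a trunk network encodes the query location. All names and sizes are illustrative, not the authors' code; the Bayesian variant would additionally sample these parameters from a posterior distribution.

```python
import torch
import torch.nn as nn

class LSTMBranchTrunk(nn.Module):
    def __init__(self, in_dim=1, latent=32):
        super().__init__()
        # Branch: encodes a variable-length input-function time series.
        self.lstm = nn.LSTM(in_dim, latent, batch_first=True)
        # Trunk: encodes the query location y at which to evaluate G(u)(y).
        self.trunk = nn.Sequential(nn.Linear(1, latent), nn.Tanh(),
                                   nn.Linear(latent, latent))

    def forward(self, u_seq, y):
        _, (h, _) = self.lstm(u_seq)   # final hidden state: (1, batch, latent)
        b = h.squeeze(0)               # branch coefficients
        t = self.trunk(y)              # trunk basis evaluated at y
        return (b * t).sum(dim=-1, keepdim=True)  # dot-product decoder

model = LSTMBranchTrunk()
u = torch.randn(8, 50, 1)  # batch of input functions, 50 time steps each
y = torch.rand(8, 1)       # query points
pred = model(u, y)         # operator output G(u)(y), shape (8, 1)
```

Because the LSTM consumes the sequence step by step, the same model accepts test sequences of a different length than the training data, which is the property MIONet alone lacks.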

On the Robustness of Decision-Focused Learning

  • paper_url: http://arxiv.org/abs/2311.16487
  • repo_url: https://github.com/yehya-farhat/AdvAttacks-DFL
  • paper_authors: Yehya Farhat
  • for: To study the robustness of Decision-Focused Learning (DFL), in which machine learning (ML) models are trained to predict the missing parameters of incomplete optimization problems, under adversarial attacks.
  • methods: Benchmarks ten distinct DFL methods under two adversarial attacks adapted to the predict-then-optimize problem setting.
  • results: Finds that a model's robustness is highly correlated with its ability to find predictions that lead to optimal decisions without deviating from the ground-truth label; also shows how to target models that violate this condition, and that their responses differ depending on the optimality achieved at the end of training.
    Abstract Decision-Focused Learning (DFL) is an emerging learning paradigm that tackles the task of training a machine learning (ML) model to predict the missing parameters of an incomplete optimization problem. DFL trains the ML model in an end-to-end system by integrating the prediction and optimization tasks, providing better alignment of the training and testing objectives. DFL has shown a lot of promise and holds the capacity to revolutionize decision-making in many real-world applications. However, very little is known about the performance of these models under adversarial attacks. We adopt ten unique DFL methods and benchmark their performance under two distinctly focused attacks adapted towards the Predict-then-Optimize problem setting. Our study proposes the hypothesis that the robustness of a model is highly correlated with its ability to find predictions that lead to optimal decisions without deviating from the ground-truth label. Furthermore, we provide insight into how to target the models that violate this condition and show how these models respond differently depending on the achieved optimality at the end of their training cycles.
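As a rough illustration of one attack surface in the predict-then-optimize setting, the sketch below perturbs input features FGSM-style to maximize prediction loss; the perturbed cost predictions then shift the downstream decision. The model, loss, and epsilon are illustrative assumptions, and the paper's attacks target the decision objective more directly.

```python
import torch

def fgsm_attack(model, x, c_true, loss_fn, eps=0.1):
    """One-step FGSM: move features in the gradient-sign direction."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), c_true)  # e.g., MSE on predicted costs
    loss.backward()
    return (x_adv + eps * x_adv.grad.sign()).detach()

# Usage: perturbed features shift the predicted costs c_hat, which in
# turn shift the downstream decision argmin_w c_hat^T w over the
# feasible set of the optimization problem.
model = torch.nn.Linear(5, 3)             # toy cost predictor
x = torch.randn(16, 5)                    # contextual features
c_true = torch.randn(16, 3)               # true cost parameters
x_adv = fgsm_attack(model, x, c_true, torch.nn.functional.mse_loss)
```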

On the Effect of Defections in Federated Learning and How to Prevent Them

  • paper_url: http://arxiv.org/abs/2311.16459
  • repo_url: None
  • paper_authors: Minbiao Han, Kumar Kshitij Patel, Han Shao, Lingxiao Wang
  • for: This paper focuses on the problem of defections in federated learning, where agents may choose to stop participating in the collaboration and instead use their own models.
  • methods: The paper proposes a new optimization algorithm with theoretical guarantees to prevent defections and ensure asymptotic convergence to an effective solution for all participating agents.
  • results: The paper shows that current federated optimization algorithms fail to disincentivize harmful defections, and that the proposed algorithm is effective in preventing defections and improving the final model’s robustness and ability to generalize.
    Abstract Federated learning is a machine learning protocol that enables a large population of agents to collaborate over multiple rounds to produce a single consensus model. There are several federated learning applications where agents may choose to defect permanently, essentially withdrawing from the collaboration, if they are content with their instantaneous model in that round. This work demonstrates the detrimental impact of such defections on the final model's robustness and ability to generalize. We also show that current federated optimization algorithms fail to disincentivize these harmful defections. We introduce a novel optimization algorithm with theoretical guarantees to prevent defections while ensuring asymptotic convergence to an effective solution for all participating agents. We also provide numerical experiments to corroborate our findings and demonstrate the effectiveness of our algorithm.
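A toy simulation of the defection dynamic is sketched below: agents run FedAvg-style rounds on quadratic local objectives and permanently drop out once their local loss falls below a personal satisfaction threshold. The objectives and thresholds are illustrative assumptions, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, dim = 5, 3
optima = rng.normal(size=(n_agents, dim))      # each agent's local optimum
thresholds = rng.uniform(0.05, 0.5, n_agents)  # satisfaction thresholds
active = np.ones(n_agents, dtype=bool)
w = np.zeros(dim)                              # shared consensus model

for rnd in range(50):
    grads = w - optima                  # gradient of 0.5 * ||w - o_i||^2
    losses = 0.5 * (grads ** 2).sum(axis=1)
    active &= losses > thresholds       # satisfied agents defect permanently
    if not active.any():
        break
    w -= 0.1 * grads[active].mean(axis=0)  # FedAvg-style update, active only

print(f"{active.sum()} agents remain after round {rnd}")
```

The point the simulation makes is that once satisfied agents leave, the consensus model drifts toward the remaining agents' optima, hurting the defectors' future performance and the model's generality.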

Enabling Fast 2-bit LLM on GPUs: Memory Alignment, Sparse Outlier, and Asynchronous Dequantization

  • paper_url: http://arxiv.org/abs/2311.16442
  • repo_url: None
  • paper_authors: Jinhao Li, Shiyao Li, Jiaming Xu, Shan Huang, Yaoxiu Lian, Jun Liu, Yu Wang, Guohao Dai
  • for: To reduce the inference cost of large language models (LLMs) through fast 2-bit quantization on GPUs.
  • methods: Quantizes only the small fraction of weight groups with large ranges at 4 bits, with memory alignment taken into account; keeps only a small fraction of sparse large-weight outliers at 16 bits; and designs asynchronous dequantization on GPUs.
  • results: Achieves >0.5% accuracy improvement with <3% increase in average bits (2.85 bits per weight), a 1.74X end-to-end speedup for Llama2-7b, and reductions of up to 2.70X and 2.81X in runtime and hardware cost.
    Abstract Large language models (LLMs) have demonstrated impressive abilities in various domains, but their inference cost is expensive. The state-of-the-art methods use 2-bit quantization for mainstream LLMs. However, challenges remain: (1) Nonnegligible accuracy loss for 2-bit quantization. Weights are quantized by groups, and the ranges of weights in some groups are large, resulting in large quantization errors and nonnegligible accuracy loss (e.g. >3% for Llama2-7b with 2-bit quantization in GPTQ and Greenbit). (2) Limited accuracy improvement from adding 4-bit weights. Adding 10% extra average bits through more 4-bit weights yields only <0.5% accuracy improvement on a quantized Llama2-7b. (3) Time-consuming dequantization operations on GPUs. The dequantization operations account for >50% of execution time, hindering the potential of reducing LLM inference cost. To tackle these challenges, we propose the following techniques: (1) We quantize only a small fraction of groups with the larger range using 4 bits, with memory alignment consideration on GPUs. (2) We point out that the distribution of the sparse outliers with larger weights differs between 2-bit and 4-bit groups, and only a small fraction of outliers require 16-bit quantization. This design leads to >0.5% accuracy improvement with <3% increase in average bits for Llama2-7b. (3) We design asynchronous dequantization on GPUs, leading to up to 3.92X speedup. We conduct extensive experiments on different model families and model sizes. We achieve 2.85 bits per weight; the end-to-end speedup for Llama2-7b is 1.74X over the original model, and we reduce both runtime cost and hardware cost by up to 2.70X and 2.81X, respectively, with fewer GPUs required.
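A minimal sketch of the range-based mixed-precision idea appears below: weights are quantized in groups, and only the small fraction of groups with the widest ranges are promoted to 4 bits. Group size, the 4-bit fraction, and the uniform quantizer are illustrative assumptions; the paper additionally keeps a few 16-bit outliers and performs dequantization asynchronously on GPU.

```python
import numpy as np

def quantize_group(w, bits):
    """Uniform min-max quantization of one group, returned dequantized."""
    lo, hi = w.min(), w.max()
    if hi == lo:
        return w.copy()
    scale = (hi - lo) / (2 ** bits - 1)
    return np.round((w - lo) / scale) * scale + lo

def mixed_precision(weights, group_size=64, frac_4bit=0.1):
    groups = weights.reshape(-1, group_size)
    ranges = groups.max(axis=1) - groups.min(axis=1)
    k = max(1, int(len(groups) * frac_4bit))
    wide = set(np.argsort(ranges)[-k:])  # widest-range groups get 4 bits
    out = np.empty_like(groups)
    for i, g in enumerate(groups):
        out[i] = quantize_group(g, 4 if i in wide else 2)
    return out.reshape(weights.shape)

w = np.random.randn(4096)
w_q = mixed_precision(w)
print("mean abs quantization error:", np.abs(w - w_q).mean())
```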

A Combinatorial Approach to Robust PCA

  • paper_url: http://arxiv.org/abs/2311.16416
  • repo_url: None
  • paper_authors: Weihao Kong, Mingda Qiao, Rajat Sen
  • for: Studies recovering Gaussian data under adversarial corruptions, assuming the Gaussian data lie in an unknown $k$-dimensional subspace $U \subseteq \mathbb{R}^d$ and that $s$ randomly chosen coordinates of each data point fall under an adversary's control.
  • methods: A new analysis of the Basis Pursuit (BP) method for sparse recovery, obtained by studying a natural combinatorial problem.
  • results: An efficient algorithm that, when $ks^2 = O(d)$, recovers every data point up to a nearly-optimal $\ell_1$ error of $\tilde O(ks/d)$ in expectation.
    Abstract We study the problem of recovering Gaussian data under adversarial corruptions when the noises are low-rank and the corruptions are on the coordinate level. Concretely, we assume that the Gaussian noises lie in an unknown $k$-dimensional subspace $U \subseteq \mathbb{R}^d$, and $s$ randomly chosen coordinates of each data point fall into the control of an adversary. This setting models the scenario of learning from high-dimensional yet structured data that are transmitted through a highly-noisy channel, so that the data points are unlikely to be entirely clean. Our main result is an efficient algorithm that, when $ks^2 = O(d)$, recovers every single data point up to a nearly-optimal $\ell_1$ error of $\tilde O(ks/d)$ in expectation. At the core of our proof is a new analysis of the well-known Basis Pursuit (BP) method for recovering a sparse signal, which is known to succeed under additional assumptions (e.g., incoherence or the restricted isometry property) on the underlying subspace $U$. In contrast, we present a novel approach via studying a natural combinatorial problem and show that, over the randomness in the support of the sparse signal, a high-probability error bound is possible even if the subspace $U$ is arbitrary.
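For intuition, the sketch below runs Basis Pursuit on this corruption model in the simplified case where the subspace $U$ is known: projecting onto the complement of $U$ removes the clean signal, and the sparse corruption is recovered by $\ell_1$ minimization cast as a linear program. Dimensions are illustrative, and the paper's setting (unknown $U$, guarantees for arbitrary subspaces) is substantially harder.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
d, k, s = 30, 3, 4
U, _ = np.linalg.qr(rng.normal(size=(d, k)))     # orthonormal basis of U
x = U @ rng.normal(size=k)                        # clean point in U
e = np.zeros(d)
e[rng.choice(d, s, False)] = 5 * rng.normal(size=s)  # sparse corruption
y = x + e                                         # observed point

P = np.eye(d) - U @ U.T   # projector onto the orthogonal complement of U
# Basis Pursuit: min ||e||_1 s.t. P e = P y, as an LP over z = [e; t]
# with the standard reformulation -t <= e <= t, minimize sum(t).
c = np.concatenate([np.zeros(d), np.ones(d)])
A_ub = np.block([[np.eye(d), -np.eye(d)], [-np.eye(d), -np.eye(d)]])
res = linprog(c, A_ub=A_ub, b_ub=np.zeros(2 * d),
              A_eq=np.hstack([P, np.zeros((d, d))]), b_eq=P @ y,
              bounds=[(None, None)] * (2 * d))
e_hat = res.x[:d]
print("max recovery error:", np.abs(e_hat - e).max())
```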

Reduced-order modeling for parameterized PDEs via implicit neural representations

  • paper_url: http://arxiv.org/abs/2311.16410
  • repo_url: None
  • paper_authors: Tianshu Wen, Kookjin Lee, Youngsoo Choi
  • for: Proposes a data-driven reduced-order modeling approach for efficiently solving parametrized partial differential equations (PDEs) in many-query settings.
  • methods: Uses an implicit neural representation (INR) to model physics signals continuously, and a parametrized neural ODE (PNODE) to learn latent dynamics characterized by multiple PDE parameters.
  • results: On a two-dimensional Burgers equation at a large Reynolds number, achieves speedups of up to O(10^3) with ~1% relative error against ground-truth values.
    Abstract We present a new data-driven reduced-order modeling approach to efficiently solve parametrized partial differential equations (PDEs) for many-query problems. This work is inspired by the concept of implicit neural representation (INR), which models physics signals in a continuous manner, independent of spatial/temporal discretization. The proposed framework encodes the PDE and utilizes a parametrized neural ODE (PNODE) to learn latent dynamics characterized by multiple PDE parameters. PNODE can be inferred by a hypernetwork to reduce the potential difficulties in learning PNODE due to a complex multilayer perceptron (MLP). The framework uses an INR to decode the latent dynamics and reconstruct accurate PDE solutions. Further, a physics-informed loss is introduced to correct the prediction for unseen parameter instances; incorporating this loss also enables the model to be fine-tuned in an unsupervised manner on unseen PDE parameters. A numerical experiment is performed on a two-dimensional Burgers equation with a large variation of PDE parameters. We evaluate the proposed method at a large Reynolds number and obtain speedups of up to O(10^3) with ~1% relative error against the ground truth values.
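The sketch below shows the PNODE-plus-decoder pattern in miniature: latent dynamics conditioned on PDE parameters are integrated (here with plain explicit Euler steps rather than a neural-ODE solver), and a coordinate network decodes latent states at arbitrary spatial locations. All sizes and names are illustrative assumptions; the paper's framework additionally uses a hypernetwork and a physics-informed loss.

```python
import torch
import torch.nn as nn

class PNODE(nn.Module):
    """Latent dynamics dz/dt = f(z, mu) conditioned on PDE parameters mu."""
    def __init__(self, latent=16, n_params=2):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(latent + n_params, 64), nn.Tanh(),
                               nn.Linear(64, latent))

    def forward(self, z0, mu, steps=20, dt=0.05):
        zs, z = [z0], z0
        for _ in range(steps):  # explicit Euler integration
            z = z + dt * self.f(torch.cat([z, mu], dim=-1))
            zs.append(z)
        return torch.stack(zs, dim=1)  # (batch, steps + 1, latent)

class INRDecoder(nn.Module):
    """Decodes a latent state and a spatial coordinate x into u(x, t)."""
    def __init__(self, latent=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent + 1, 64), nn.Tanh(),
                                 nn.Linear(64, 1))

    def forward(self, z, x):
        return self.net(torch.cat([z, x], dim=-1))

pnode, dec = PNODE(), INRDecoder()
z0 = torch.zeros(4, 16)
mu = torch.rand(4, 2)                     # 4 PDE parameter instances
z_traj = pnode(z0, mu)                    # latent trajectories
u = dec(z_traj[:, -1], torch.rand(4, 1))  # solution samples at final time
```

Because the decoder takes continuous coordinates, the reconstruction is not tied to any training grid, which is what the INR viewpoint buys over grid-based reduced-order models.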

Deep Learning for Time Series Classification of Parkinson’s Disease Eye Tracking Data

  • paper_url: http://arxiv.org/abs/2311.16381
  • repo_url: None
  • paper_authors: Gonzalo Uribarri, Simon Ekman von Huth, Josefine Waldthaler, Per Svenningsson, Erik Fransén
  • for: To diagnose and assess Parkinson's disease from eye-tracking data collected in saccade experiments.
  • methods: Applies state-of-the-art deep learning time-series classifiers to raw ~1.5 s fixation intervals rather than hand-crafted saccade features.
  • results: The models learn the classification task and generalize to unseen subjects: InceptionTime reaches 78% accuracy and ROCKET 88%; a novel pruning method improves the ROCKET model's interpretability and generalizability, reaching 96% accuracy.
    Abstract Eye-tracking is an accessible and non-invasive technology that provides information about a subject's motor and cognitive abilities. As such, it has proven to be a valuable resource in the study of neurodegenerative diseases such as Parkinson's disease. Saccade experiments, in particular, have proven useful in the diagnosis and staging of Parkinson's disease. However, to date, no single eye-movement biomarker has been found to conclusively differentiate patients from healthy controls. In the present work, we investigate the use of state-of-the-art deep learning algorithms to perform Parkinson's disease classification using eye-tracking data from saccade experiments. In contrast to previous work, instead of using hand-crafted features from the saccades, we use raw $\sim1.5\,s$ long fixation intervals recorded during the preparatory phase before each trial. Using these short time series as input we implement two different classification models, InceptionTime and ROCKET. We find that the models are able to learn the classification task and generalize to unseen subjects. InceptionTime achieves $78\%$ accuracy, while ROCKET achieves $88\%$ accuracy. We also employ a novel method for pruning the ROCKET model to improve interpretability and generalizability, achieving an accuracy of $96\%$. Our results suggest that fixation data has low inter-subject variability and potentially carries useful information about brain cognitive and motor conditions, making it suitable for use with machine learning in the discovery of disease-relevant biomarkers.
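A minimal ROCKET-style pipeline is sketched below on synthetic stand-in data: random centered convolution kernels produce max and proportion-of-positive-values (PPV) features, which feed a ridge classifier. Kernel counts and the toy data are illustrative assumptions; the study used the full ROCKET and InceptionTime models on real fixation intervals.

```python
import numpy as np
from sklearn.linear_model import RidgeClassifierCV

rng = np.random.default_rng(0)

def make_kernels(n_kernels=200):
    """Random centered kernels, shared across train and test."""
    kernels = []
    for _ in range(n_kernels):
        w = rng.normal(size=rng.choice([7, 9, 11]))
        kernels.append(w - w.mean())
    return kernels

def transform(X, kernels):
    feats = []
    for w in kernels:
        conv = np.stack([np.convolve(x, w, mode="valid") for x in X])
        feats += [conv.max(axis=1), (conv > 0).mean(axis=1)]  # max and PPV
    return np.stack(feats, axis=1)

# Synthetic stand-in for fixation-interval time series, two classes.
X = rng.normal(size=(60, 300))
y = np.repeat([0, 1], 30)
X[y == 1] += 0.5 * np.sin(np.linspace(0, 8 * np.pi, 300))
idx = rng.permutation(60)
X, y = X[idx], y[idx]

kernels = make_kernels()
clf = RidgeClassifierCV().fit(transform(X[:40], kernels), y[:40])
print("held-out accuracy:", clf.score(transform(X[40:], kernels), y[40:]))
```

Pruning, as used in the paper to reach 96% accuracy, would correspond here to discarding kernels whose features receive negligible classifier weight and refitting on the survivors.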