cs.LG - 2023-07-13

PC-Droid: Faster diffusion and improved quality for particle cloud generation

  • paper_url: http://arxiv.org/abs/2307.06836
  • repo_url: None
  • paper_authors: Matthew Leigh, Debajyoti Sengupta, John Andrew Raine, Guillaume Quétant, Tobias Golling
  • for: This work aims to improve the performance of diffusion models for jet particle cloud generation.
  • methods: The study adopts a new diffusion formulation, evaluates more recent integration solvers, and trains on all jet types simultaneously.
  • results: Both a faster attention-based architecture and consistency distillation improve the trade-off between generation speed and quality, with generation up to two orders of magnitude faster than PC-JeDi and three orders of magnitude faster than Delphes.
    Abstract Building on the success of PC-JeDi we introduce PC-Droid, a substantially improved diffusion model for the generation of jet particle clouds. By leveraging a new diffusion formulation, studying more recent integration solvers, and training on all jet types simultaneously, we are able to achieve state-of-the-art performance for all types of jets across all evaluation metrics. We study the trade-off between generation speed and quality by comparing two attention based architectures, as well as the potential of consistency distillation to reduce the number of diffusion steps. Both the faster architecture and consistency models demonstrate performance surpassing many competing models, with generation time up to two orders of magnitude faster than PC-JeDi and three orders of magnitude faster than Delphes.

A Novel Bayes’ Theorem for Upper Probabilities

  • paper_url: http://arxiv.org/abs/2307.06831
  • repo_url: None
  • paper_authors: Michele Caprio, Yusuf Sale, Eyke Hüllermeier, Insup Lee
  • for: This paper addresses the problem of bounding the Bayes posterior probability when the underlying probability distributions are imprecise.
  • methods: The paper builds on Wasserman and Kadane's upper bound for priors lying in a class of probability measures and generalizes it by additionally addressing uncertainty related to the likelihood.
  • results: The paper gives an upper bound on the posterior probability when both the prior and the likelihood belong to sets of probabilities, together with a sufficient condition under which the bound holds with equality. The result is of independent interest and has potential applications in various fields of engineering (e.g. model predictive control), machine learning, and artificial intelligence.
    Abstract In their seminal 1990 paper, Wasserman and Kadane establish an upper bound for the Bayes' posterior probability of a measurable set $A$, when the prior lies in a class of probability measures $\mathcal{P}$ and the likelihood is precise. They also give a sufficient condition for such upper bound to hold with equality. In this paper, we introduce a generalization of their result by additionally addressing uncertainty related to the likelihood. We give an upper bound for the posterior probability when both the prior and the likelihood belong to a set of probabilities. Furthermore, we give a sufficient condition for this upper bound to become an equality. This result is interesting on its own, and has the potential of being applied to various fields of engineering (e.g. model predictive control), machine learning, and artificial intelligence.
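For orientation, the bound being generalized has the following well-known form in the precise-likelihood case; the notation below (upper and lower expectations over the prior class) is assumed for illustration and may differ from the paper's:

$$
\sup_{P \in \mathcal{P}} P(A \mid x) \;\le\; \frac{\overline{E}\left[L\,\mathbf{1}_A\right]}{\overline{E}\left[L\,\mathbf{1}_A\right] + \underline{E}\left[L\,\mathbf{1}_{A^c}\right]},
\qquad
\overline{E}[f] = \sup_{P \in \mathcal{P}} \int f \, dP, \quad
\underline{E}[f] = \inf_{P \in \mathcal{P}} \int f \, dP,
$$

where $L(\theta) = p(x \mid \theta)$ is the precise likelihood. The paper's generalization additionally lets the likelihood range over a set of probabilities and gives a sufficient condition for equality.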

A Causal Framework to Unify Common Domain Generalization Approaches

  • paper_url: http://arxiv.org/abs/2307.06825
  • repo_url: None
  • paper_authors: Nevin L. Zhang, Kaican Li, Han Gao, Weiyan Xie, Zhi Lin, Zhenguo Li, Luning Wang, Yongxiang Huang
  • for: This paper is written for researchers and practitioners who are interested in domain generalization (DG) in machine learning. It aims to provide a causal framework for understanding the key ideas behind different DG approaches and their relationships.
  • methods: The paper uses a causal framework to understand and unify different DG approaches, including domain adaptation, transfer learning, and multi-task learning. It also discusses the theoretical underpinnings of these methods and how they are related to each other.
  • results: The paper provides a new understanding of the underlying principles of DG and sheds light on the relative advantages and limitations of different DG methods. It also helps to identify future research directions in this area.
    Abstract Domain generalization (DG) is about learning models that generalize well to new domains that are related to, but different from, the training domain(s). It is a fundamental problem in machine learning and has attracted much attention in recent years. A large number of approaches have been proposed. Different approaches are motivated from different perspectives, making it difficult to gain an overall understanding of the area. In this paper, we propose a causal framework for domain generalization and present an understanding of common DG approaches in the framework. Our work sheds new lights on the following questions: (1) What are the key ideas behind each DG method? (2) Why is it expected to improve generalization to new domains theoretically? (3) How are different DG methods related to each other and what are relative advantages and limitations? By providing a unified perspective on DG, we hope to help researchers better understand the underlying principles and develop more effective approaches for this critical problem in machine learning.

TinyMetaFed: Efficient Federated Meta-Learning for TinyML

  • paper_url: http://arxiv.org/abs/2307.06822
  • repo_url: None
  • paper_authors: Haoyu Ren, Xue Li, Darko Anicic, Thomas A. Runkler
  • for: This paper explores federated meta-learning for Tiny Machine Learning (TinyML) applications, aiming to aggregate the knowledge of large numbers of resource-constrained miniature devices.
  • methods: The paper introduces TinyMetaFed, a model-agnostic meta-learning framework for collaborative training in TinyML settings. TinyMetaFed uses partial local reconstruction and Top-P% selective communication to save communication cost and protect privacy, online learning to reduce computation, and few-shot learning to remain robust to client heterogeneity.
  • results: Evaluations on three TinyML use cases show that TinyMetaFed significantly reduces energy consumption and communication overhead, accelerates convergence, and stabilizes the training process.
    Abstract The field of Tiny Machine Learning (TinyML) has made substantial advancements in democratizing machine learning on low-footprint devices, such as microcontrollers. The prevalence of these miniature devices raises the question of whether aggregating their knowledge can benefit TinyML applications. Federated meta-learning is a promising answer to this question, as it addresses the scarcity of labeled data and heterogeneous data distribution across devices in the real world. However, deploying TinyML hardware faces unique resource constraints, making existing methods impractical due to energy, privacy, and communication limitations. We introduce TinyMetaFed, a model-agnostic meta-learning framework suitable for TinyML. TinyMetaFed facilitates collaborative training of a neural network initialization that can be quickly fine-tuned on new devices. It offers communication savings and privacy protection through partial local reconstruction and Top-P% selective communication, computational efficiency via online learning, and robustness to client heterogeneity through few-shot learning. The evaluations on three TinyML use cases demonstrate that TinyMetaFed can significantly reduce energy consumption and communication overhead, accelerate convergence, and stabilize the training process.
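The Top-P% selective communication idea can be illustrated with a short, generic sparsification sketch: each client transmits only the largest-magnitude entries of its update together with their indices. The function names, the percentage semantics, and the server-side update below are assumptions for illustration and do not reproduce TinyMetaFed's actual protocol or its partial local reconstruction.

```python
import numpy as np

def top_p_percent_update(delta: np.ndarray, p: float = 10.0):
    """Keep only the top p% of update entries by magnitude.

    Returns the indices and values a client would transmit; all other
    entries are implicitly zero on the server side.
    """
    flat = delta.ravel()
    k = max(1, int(len(flat) * p / 100.0))
    idx = np.argpartition(np.abs(flat), -k)[-k:]   # indices of the k largest |delta|
    return idx, flat[idx]

def apply_sparse_update(params: np.ndarray, idx: np.ndarray, values: np.ndarray, lr: float = 1.0):
    """Server-side: apply the sparse update to the shared initialization."""
    flat = params.ravel()
    flat[idx] += lr * values
    return flat.reshape(params.shape)

# Toy usage: one client round on a vector of 1000 parameters.
rng = np.random.default_rng(0)
global_init = rng.normal(size=1000)
client_delta = rng.normal(size=1000) * 0.01
idx, vals = top_p_percent_update(client_delta, p=10.0)   # ~100 values transmitted
global_init = apply_sparse_update(global_init, idx, vals)
```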

Fast and Functional Structured Data Generators Rooted in Out-of-Equilibrium Physics

  • paper_url: http://arxiv.org/abs/2307.06797
  • repo_url: https://github.com/rossetl/fef
  • paper_authors: Alessandra Carbone, Aurélien Decelle, Lorenzo Rosset, Beatriz Seoane
  • for: This study aims to use energy-based models to generate high-quality, label-specific data for complex structured datasets such as population genetics, RNA, or protein sequence data. Traditional training methods struggle with inefficient Markov chain Monte Carlo mixing, which limits the diversity of the synthetic data and increases generation time.
  • methods: The study uses a novel training algorithm that exploits non-equilibrium effects. Applied to the Restricted Boltzmann Machine, the approach lets the model correctly classify samples and generate high-quality synthetic data in only a few sampling steps.
  • results: The method is successfully applied to four different types of data: handwritten digits, mutations of human genomes classified by continental origin, functionally characterized sequences of an enzyme protein family, and homologous RNA sequences from specific taxonomies.
    Abstract In this study, we address the challenge of using energy-based models to produce high-quality, label-specific data in complex structured datasets, such as population genetics, RNA or protein sequences data. Traditional training methods encounter difficulties due to inefficient Markov chain Monte Carlo mixing, which affects the diversity of synthetic data and increases generation times. To address these issues, we use a novel training algorithm that exploits non-equilibrium effects. This approach, applied on the Restricted Boltzmann Machine, improves the model's ability to correctly classify samples and generate high-quality synthetic data in only a few sampling steps. The effectiveness of this method is demonstrated by its successful application to four different types of data: handwritten digits, mutations of human genomes classified by continental origin, functionally characterized sequences of an enzyme protein family, and homologous RNA sequences from specific taxonomies.
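For context, generation from a Restricted Boltzmann Machine proceeds by alternating block Gibbs sampling between visible and hidden units; the paper's contribution is a training scheme that makes only a few such steps sufficient. Below is a minimal sketch of k-step Gibbs sampling for a binary RBM with assumed weights `W` and biases `b_v`, `b_h` (not the authors' code).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_sample(W, b_v, b_h, v0, k=10, rng=None):
    """Run k steps of block Gibbs sampling in a binary RBM.

    W: (n_visible, n_hidden) weights; b_v, b_h: biases; v0: initial visible batch.
    Returns the visible configuration after k steps.
    """
    rng = rng or np.random.default_rng()
    v = v0
    for _ in range(k):
        p_h = sigmoid(v @ W + b_h)                  # P(h=1 | v)
        h = (rng.random(p_h.shape) < p_h).astype(float)
        p_v = sigmoid(h @ W.T + b_v)                # P(v=1 | h)
        v = (rng.random(p_v.shape) < p_v).astype(float)
    return v

# Toy usage: 16 samples of 784 visible units, 64 hidden units.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(784, 64))
b_v, b_h = np.zeros(784), np.zeros(64)
v0 = (rng.random((16, 784)) < 0.5).astype(float)
samples = gibbs_sample(W, b_v, b_h, v0, k=10, rng=rng)
```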

Robotic surface exploration with vision and tactile sensing for cracks detection and characterisation

  • paper_url: http://arxiv.org/abs/2307.06784
  • repo_url: None
  • paper_authors: Francesca Palermo, Bukeikhan Omarali, Changae Oh, Kaspar Althoefer, Ildar Farkhatdinov
  • for: This paper develops an algorithm for crack localisation, detection, and characterisation based on combined visual and tactile analysis, applicable to inspection in a variety of environments.
  • methods: The algorithm uses a fibre-optic, finger-shaped sensor for data acquisition and a camera that scans the environment while running an object detection algorithm. Once a crack is detected, its skeletonised shape is converted into a fully connected graph, and a minimum spanning tree is used to compute the shortest exploration path, which drives the motion planner of the robotic manipulator.
  • results: Experiments show that the proposed algorithm successfully detects cracks and characterises their length, width, orientation, and number of branches. Thanks to the motion planning algorithm, it also improves on the vision-only results, correctly classifying cracks and their geometry at minimal cost.
    Abstract This paper presents a novel algorithm for crack localisation and detection based on visual and tactile analysis via fibre-optics. A finger-shaped sensor based on fibre-optics is employed for the data acquisition to collect data for the analysis and the experiments. To detect the possible locations of cracks a camera is used to scan an environment while running an object detection algorithm. Once the crack is detected, a fully-connected graph is created from a skeletonised version of the crack. A minimum spanning tree is then employed for calculating the shortest path to explore the crack which is then used to develop the motion planner for the robotic manipulator. The motion planner divides the crack into multiple nodes which are then explored individually. Then, the manipulator starts the exploration and performs the tactile data classification to confirm if there is indeed a crack in that location or just a false positive from the vision algorithm. If a crack is detected, also the length, width, orientation and number of branches are calculated. This is repeated until all the nodes of the crack are explored. In order to validate the complete algorithm, various experiments are performed: comparison of exploration of cracks through full scan and motion planning algorithm, implementation of frequency-based features for crack classification and geometry analysis using a combination of vision and tactile data. From the results of the experiments, it is shown that the proposed algorithm is able to detect cracks and improve the results obtained from vision to correctly classify cracks and their geometry with minimal cost thanks to the motion planning algorithm.
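The graph-based part of the exploration pipeline (skeleton pixels → fully connected graph → minimum spanning tree → exploration order) can be sketched with networkx as below. This is a simplified illustration under an assumed data structure (a list of 2-D skeleton pixel coordinates) and a depth-first visiting order, not the authors' implementation.

```python
import itertools
import math
import networkx as nx

def crack_exploration_path(skeleton_pixels):
    """Build a fully connected graph over skeleton pixels, reduce it to a
    minimum spanning tree, and return a node visiting order for exploration."""
    G = nx.Graph()
    for (i, p), (j, q) in itertools.combinations(enumerate(skeleton_pixels), 2):
        G.add_edge(i, j, weight=math.dist(p, q))    # Euclidean distance as edge weight
    mst = nx.minimum_spanning_tree(G)
    # Visit nodes in depth-first order along the tree, starting from node 0.
    order = list(nx.dfs_preorder_nodes(mst, source=0))
    return [skeleton_pixels[i] for i in order]

# Toy usage on a handful of skeletonised crack pixels (x, y).
pixels = [(0, 0), (1, 1), (2, 1), (3, 2), (3, 4)]
print(crack_exploration_path(pixels))
```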

Towards Ordinal Data Science

  • paper_url: http://arxiv.org/abs/2307.09477
  • repo_url: None
  • paper_authors: Gerd Stumme, Dominik Dürrschnabel, Tom Hanika
  • for: This work aims to establish a new research agenda, Ordinal Data Science, which extracts knowledge from empirical data by measuring and computing with ordinal structures.
  • methods: The paper discusses different means for measuring and 'calculating' with ordinal structures, a specific class of directed graphs, and shows how to infer knowledge from them.
  • results: The paper argues that, beyond cross-fertilization with other cornerstone machine learning and knowledge representation methods, a broad range of disciplines will benefit from this endeavor, including psychology, sociology, economics, web science, knowledge engineering, and scientometrics.
    Abstract Order is one of the main instruments to measure the relationship between objects in (empirical) data. However, compared to methods that use numerical properties of objects, the amount of ordinal methods developed is rather small. One reason for this is the limited availability of computational resources in the last century that would have been required for ordinal computations. Another reason -- particularly important for this line of research -- is that order-based methods are often seen as too mathematically rigorous for applying them to real-world data. In this paper, we will therefore discuss different means for measuring and 'calculating' with ordinal structures -- a specific class of directed graphs -- and show how to infer knowledge from them. Our aim is to establish Ordinal Data Science as a fundamentally new research agenda. Besides cross-fertilization with other cornerstone machine learning and knowledge representation methods, a broad range of disciplines will benefit from this endeavor, including, psychology, sociology, economics, web science, knowledge engineering, scientometrics.

A decision framework for selecting information-transfer strategies in population-based SHM

  • paper_url: http://arxiv.org/abs/2307.06978
  • repo_url: None
  • paper_authors: Aidan J. Hughes, Jack Poole, Nikolaos Dervilis, Paul Gardner, Keith Worden
  • for: This paper provides decision support for the operation and maintenance of structures via population-based structural health monitoring (SHM) systems, with the aim of reducing costs and improving safety.
  • methods: Transfer learning techniques are used to share information between individual structures in a population, mitigating the scarcity of labelled training data.
  • results: The paper proposes a decision framework for selecting transfer strategies based on the expected value of information transfer, such that negative transfer is avoided; optimising the information-transfer strategy reduces operation and maintenance costs and improves safety.
    Abstract Decision-support for the operation and maintenance of structures provides significant motivation for the development and implementation of structural health monitoring (SHM) systems. Unfortunately, the limited availability of labelled training data hinders the development of the statistical models on which these decision-support systems rely. Population-based SHM seeks to mitigate the impact of data scarcity by using transfer learning techniques to share information between individual structures within a population. The current paper proposes a decision framework for selecting transfer strategies based upon a novel concept -- the expected value of information transfer -- such that negative transfer is avoided. By avoiding negative transfer, and by optimising information transfer strategies using the transfer-decision framework, one can reduce the costs associated with operating and maintaining structures, and improve safety.
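To make the value-of-information idea concrete, the snippet below computes a textbook expected value of perfect information for a hypothetical two-action maintenance decision; it is only an illustrative analogue of the kind of decision-theoretic reasoning involved, not the paper's expected value of information transfer, and all numbers are made up.

```python
import numpy as np

# Hypothetical two-state, two-action maintenance problem.
# States: structure healthy / damaged; actions: do nothing / repair.
p_state = np.array([0.8, 0.2])                 # prior belief over states
utility = np.array([[0.0, -100.0],             # do nothing:  healthy, damaged
                    [-10.0, -10.0]])           # repair:      healthy, damaged

# Value of acting now, without further information.
value_prior = max(utility @ p_state)

# Value with perfect information: choose the best action in each state.
value_perfect = p_state @ utility.max(axis=0)

evpi = value_perfect - value_prior
print(f"Expected value of perfect information: {evpi:.1f}")
```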

Generalizing Supervised Deep Learning MRI Reconstruction to Multiple and Unseen Contrasts using Meta-Learning Hypernetworks

  • paper_url: http://arxiv.org/abs/2307.06771
  • repo_url: https://github.com/sriprabhar/km-maml
  • paper_authors: Sriprabha Ramanarayanan, Arun Palla, Keerthi Ram, Mohanasankar Sivaprakasam
  • for: This paper proposes a multimodal meta-learning model for image reconstruction, which aims to improve the knowledge generalization of imaging tasks by learning both shared and discriminative weights for various configurations of imaging tasks.
  • methods: The proposed model, called KM-MAML, uses hypernetworks to evolve mode-specific weights, and incorporates gradient-based meta-learning in the contextual space to update the weights of the hypernetworks for different modes. The model also uses a low-rank kernel modulation operation to provide mode-specific inductive bias for multiple modes.
  • results: The experiments on multi-contrast MRI reconstruction show that the proposed model exhibits superior reconstruction performance over joint training, other meta-learning methods, and context-specific MRI reconstruction methods, and better adaptation capabilities with improvement margins of 0.5 dB in PSNR and 0.01 in SSIM. Additionally, a representation analysis with U-Net shows that kernel modulation infuses 80% of mode-specific representation changes in the high-resolution layers.
    Abstract Meta-learning has recently been an emerging data-efficient learning technique for various medical imaging operations and has helped advance contemporary deep learning models. Furthermore, meta-learning enhances the knowledge generalization of the imaging tasks by learning both shared and discriminative weights for various configurations of imaging tasks. However, existing meta-learning models attempt to learn a single set of weight initializations of a neural network that might be restrictive for multimodal data. This work aims to develop a multimodal meta-learning model for image reconstruction, which augments meta-learning with evolutionary capabilities to encompass diverse acquisition settings of multimodal data. Our proposed model called KM-MAML (Kernel Modulation-based Multimodal Meta-Learning), has hypernetworks that evolve to generate mode-specific weights. These weights provide the mode-specific inductive bias for multiple modes by re-calibrating each kernel of the base network for image reconstruction via a low-rank kernel modulation operation. We incorporate gradient-based meta-learning (GBML) in the contextual space to update the weights of the hypernetworks for different modes. The hypernetworks and the reconstruction network in the GBML setting provide discriminative mode-specific features and low-level image features, respectively. Experiments on multi-contrast MRI reconstruction show that our model, (i) exhibits superior reconstruction performance over joint training, other meta-learning methods, and context-specific MRI reconstruction methods, and (ii) better adaptation capabilities with improvement margins of 0.5 dB in PSNR and 0.01 in SSIM. Besides, a representation analysis with U-Net shows that kernel modulation infuses 80% of mode-specific representation changes in the high-resolution layers. Our source code is available at https://github.com/sriprabhar/KM-MAML/.
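A minimal sketch of what a low-rank, mode-specific modulation of a convolution kernel might look like in PyTorch is given below; the rank-1 multiplicative form and the shapes are assumptions for illustration and do not reproduce KM-MAML's hypernetwork-generated weights.

```python
import torch
import torch.nn.functional as F

def modulate_conv_kernel(base_kernel: torch.Tensor, u: torch.Tensor, v: torch.Tensor):
    """Re-calibrate a conv kernel with a rank-1 multiplicative modulation.

    base_kernel: (out_ch, in_ch, k, k) shared weights of the base network.
    u: (out_ch,), v: (in_ch,) mode-specific factors (e.g. emitted by a hypernetwork).
    """
    scale = 1.0 + torch.outer(u, v)                 # (out_ch, in_ch) low-rank modulation
    return base_kernel * scale[:, :, None, None]    # broadcast over the spatial dims

# Toy usage: modulate a 16x8x3x3 kernel for one acquisition mode.
torch.manual_seed(0)
kernel = torch.randn(16, 8, 3, 3)
u, v = 0.1 * torch.randn(16), 0.1 * torch.randn(8)
x = torch.randn(1, 8, 32, 32)
y = F.conv2d(x, modulate_conv_kernel(kernel, u, v), padding=1)
print(y.shape)  # torch.Size([1, 16, 32, 32])
```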

Privacy-Utility Trade-offs in Neural Networks for Medical Population Graphs: Insights from Differential Privacy and Graph Structure

  • paper_url: http://arxiv.org/abs/2307.06760
  • repo_url: None
  • paper_authors: Tamara T. Mueller, Maulik Chevli, Ameya Daigavane, Daniel Rueckert, Georgios Kaissis
  • for: This paper empirically investigates differentially private graph neural networks on population graphs from the medical domain, examining privacy-utility trade-offs at different privacy levels on both real-world and synthetic datasets.
  • methods: Differentially private graph neural networks are used to protect data privacy, and the trained models are audited through membership inference attacks.
  • results: The findings highlight the potential and the challenges of this application area of differential privacy, and show that the underlying graph structure is a potential factor for larger performance gaps: the degree of graph homophily correlates with the accuracy of the trained model.
    Abstract We initiate an empirical investigation into differentially private graph neural networks on population graphs from the medical domain by examining privacy-utility trade-offs at different privacy levels on both real-world and synthetic datasets and performing auditing through membership inference attacks. Our findings highlight the potential and the challenges of this specific DP application area. Moreover, we find evidence that the underlying graph structure constitutes a potential factor for larger performance gaps by showing a correlation between the degree of graph homophily and the accuracy of the trained model.

Extended Graph Assessment Metrics for Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2307.10112
  • repo_url: None
  • paper_authors: Tamara T. Mueller, Sophie Starck, Leonhard F. Feiner, Kyriaki-Margarita Bintsi, Daniel Rueckert, Georgios Kaissis
  • for: This paper considers restructuring patient cohorts into so-called population graphs and using graph neural networks (GNNs) for medical downstream tasks, where the choice of graph structure strongly affects model performance.
  • methods: The paper introduces extended graph assessment metrics (GAMs) for regression tasks and continuous adjacency matrices, focusing on two metrics: homophily and cross-class neighbourhood similarity (CCNS).
  • results: Experiments on different medical population graphs and under different learning settings show the correlation of these metrics with model performance.
    Abstract When re-structuring patient cohorts into so-called population graphs, initially independent data points can be incorporated into one interconnected graph structure. This population graph can then be used for medical downstream tasks using graph neural networks (GNNs). The construction of a suitable graph structure is a challenging step in the learning pipeline that can have severe impact on model performance. To this end, different graph assessment metrics have been introduced to evaluate graph structures. However, these metrics are limited to classification tasks and discrete adjacency matrices, only covering a small subset of real-world applications. In this work, we introduce extended graph assessment metrics (GAMs) for regression tasks and continuous adjacency matrices. We focus on two GAMs in specific: \textit{homophily} and \textit{cross-class neighbourhood similarity} (CCNS). We extend the notion of GAMs to more than one hop, define homophily for regression tasks, as well as continuous adjacency matrices, and propose a light-weight CCNS distance for discrete and continuous adjacency matrices. We show the correlation of these metrics with model performance on different medical population graphs and under different learning settings.
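For concreteness, the standard (discrete) edge homophily and a simple continuous analogue for regression labels can be computed as in the sketch below; the continuous variant shown here is only an assumed illustration of the idea, not necessarily the paper's exact definition of homophily for regression tasks.

```python
import numpy as np

def edge_homophily(edges, labels):
    """Fraction of edges whose endpoints share the same class label."""
    same = [labels[u] == labels[v] for u, v in edges]
    return float(np.mean(same))

def continuous_homophily(edges, targets):
    """A simple continuous analogue: one minus the mean absolute difference
    of (normalised) regression targets across edges."""
    t = np.asarray(targets, dtype=float)
    t = (t - t.min()) / (t.max() - t.min() + 1e-12)
    diffs = [abs(t[u] - t[v]) for u, v in edges]
    return 1.0 - float(np.mean(diffs))

# Toy population graph with 5 nodes.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(edge_homophily(edges, labels=[0, 0, 1, 1, 1]))        # 0.75
print(continuous_homophily(edges, targets=[35, 37, 60, 62, 64]))
```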

Neuro-symbolic Empowered Denoising Diffusion Probabilistic Models for Real-time Anomaly Detection in Industry 4.0

  • paper_url: http://arxiv.org/abs/2307.06975
  • repo_url: None
  • paper_authors: Luigi Capogrosso, Alessio Mascolini, Federico Girella, Geri Skenderi, Sebastiano Gaiardelli, Nicola Dall’Ora, Francesco Ponzio, Enrico Fraccaroli, Santa Di Cataldo, Sara Vinco, Enrico Macii, Franco Fummi, Marco Cristani
  • for: This paper proposes a diffusion-based model for real-time anomaly prediction in Industry 4.0 processes.
  • methods: A neuro-symbolic approach is used, integrating industrial ontologies into the model to add formal knowledge on smart manufacturing.
  • results: The resulting model offers a simple yet effective approach to anomaly prediction and, after distillation through Random Fourier Features, can be deployed on an embedded system for direct integration into the manufacturing process.
    Abstract Industry 4.0 involves the integration of digital technologies, such as IoT, Big Data, and AI, into manufacturing and industrial processes to increase efficiency and productivity. As these technologies become more interconnected and interdependent, Industry 4.0 systems become more complex, which brings the difficulty of identifying and stopping anomalies that may cause disturbances in the manufacturing process. This paper aims to propose a diffusion-based model for real-time anomaly prediction in Industry 4.0 processes. Using a neuro-symbolic approach, we integrate industrial ontologies in the model, thereby adding formal knowledge on smart manufacturing. Finally, we propose a simple yet effective way of distilling diffusion models through Random Fourier Features for deployment on an embedded system for direct integration into the manufacturing process. To the best of our knowledge, this approach has never been explored before.
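Random Fourier Features themselves are a standard construction (Rahimi and Recht): a Gaussian kernel is approximated by the inner product of randomized cosine features. The sketch below shows only this standard mapping; how the authors use it to distil the diffusion model for embedded deployment is specific to the paper.

```python
import numpy as np

class RandomFourierFeatures:
    """Approximate an RBF kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))
    with z(x)^T z(y), where z is a random cosine feature map."""

    def __init__(self, in_dim, n_features=256, sigma=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=1.0 / sigma, size=(in_dim, n_features))
        self.b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
        self.n_features = n_features

    def transform(self, X):
        return np.sqrt(2.0 / self.n_features) * np.cos(X @ self.W + self.b)

# Toy check: the feature inner product approximates the RBF kernel.
rng = np.random.default_rng(1)
x, y = rng.normal(size=8), rng.normal(size=8)
rff = RandomFourierFeatures(in_dim=8, n_features=4096, sigma=1.0)
approx = rff.transform(x[None]) @ rff.transform(y[None]).T
exact = np.exp(-np.sum((x - y) ** 2) / 2.0)
print(approx.item(), exact)
```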

Cramer Type Distances for Learning Gaussian Mixture Models by Gradient Descent

  • paper_url: http://arxiv.org/abs/2307.06753
  • repo_url: https://github.com/Triang-jyed-driung/C2-SC2-GMM
  • paper_authors: Ruichong Zhang
  • for: This paper addresses the problem of learning Gaussian Mixture Models (GMMs), which play an important role in machine learning thanks to their expressiveness and interpretability, with applications ranging from statistics and computer vision to distributional reinforcement learning.
  • methods: The paper proposes a Sliced Cramér 2-distance for learning general multivariate GMMs. The approach has several advantages: 1) it has a closed-form expression in the univariate case and is easy to compute and implement; 2) it is compatible with gradient descent, so GMMs can be integrated with neural networks seamlessly; 3) it can fit a GMM directly to another GMM, without sampling from the target model; and 4) it has theoretical guarantees such as global gradient boundedness and unbiased sampling gradients.
  • results: A Gaussian Mixture Distributional Deep Q Network is constructed as a toy example to demonstrate the method's effectiveness; compared with previous models, it is parameter-efficient in representing a distribution and possesses better interpretability.
    Abstract The learning of Gaussian Mixture Models (also referred to simply as GMMs) plays an important role in machine learning. Known for their expressiveness and interpretability, Gaussian mixture models have a wide range of applications, from statistics, computer vision to distributional reinforcement learning. However, as of today, few known algorithms can fit or learn these models, some of which include Expectation-Maximization algorithms and Sliced Wasserstein Distance. Even fewer algorithms are compatible with gradient descent, the common learning process for neural networks. In this paper, we derive a closed formula of two GMMs in the univariate, one-dimensional case, then propose a distance function called Sliced Cram\'er 2-distance for learning general multivariate GMMs. Our approach has several advantages over many previous methods. First, it has a closed-form expression for the univariate case and is easy to compute and implement using common machine learning libraries (e.g., PyTorch and TensorFlow). Second, it is compatible with gradient descent, which enables us to integrate GMMs with neural networks seamlessly. Third, it can fit a GMM not only to a set of data points, but also to another GMM directly, without sampling from the target model. And fourth, it has some theoretical guarantees like global gradient boundedness and unbiased sampling gradient. These features are especially useful for distributional reinforcement learning and Deep Q Networks, where the goal is to learn a distribution over future rewards. We will also construct a Gaussian Mixture Distributional Deep Q Network as a toy example to demonstrate its effectiveness. Compared with previous models, this model is parameter efficient in terms of representing a distribution and possesses better interpretability.
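For reference, the univariate Cramér 2-distance between distributions with CDFs $F$ and $G$, and the sliced construction that averages it over random one-dimensional projections, take the following standard form (notation assumed; the paper's closed-form expression for two univariate GMMs builds on this definition):

$$
d_2^2(F, G) \;=\; \int_{-\infty}^{\infty} \bigl(F(x) - G(x)\bigr)^2 \, dx,
\qquad
\mathrm{SC}_2^2(\mu, \nu) \;=\; \mathbb{E}_{\theta \sim \mathcal{U}(\mathbb{S}^{d-1})}\!\left[ d_2^2\bigl(\theta_\# \mu,\; \theta_\# \nu\bigr) \right],
$$

where $\theta_\# \mu$ denotes the one-dimensional projection of $\mu$ onto the direction $\theta$.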

Learning Multiple Coordinated Agents under Directed Acyclic Graph Constraints

  • paper_url: http://arxiv.org/abs/2307.07529
  • repo_url: None
  • paper_authors: Jaeyeon Jang, Diego Klabjan, Han Liu, Nital S. Patel, Xiuqi Li, Balakrishnan Ananthanarayanan, Husam Dauod, Tzung-Han Juang
  • for: This work proposes a multi-agent reinforcement learning (MARL) method for learning multiple coordinated agents under directed acyclic graph (DAG) constraints.
  • methods: The method explicitly exploits the DAG structure between agents to learn more effectively. A novel surrogate value function based on a MARL model with synthetic rewards (MARLM-SR) is proposed and shown to be a lower bound of the optimal value function, together with a practical training algorithm in which a leader agent and a reward generator and distributor agent guide the decomposed follower agents to better explore the parameter space under DAG constraints.
  • results: Experiments on four DAG environments, including a real-world scheduling task from one of Intel's high-volume packaging and test factories, show that the method outperforms non-DAG approaches.
    Abstract This paper proposes a novel multi-agent reinforcement learning (MARL) method to learn multiple coordinated agents under directed acyclic graph (DAG) constraints. Unlike existing MARL approaches, our method explicitly exploits the DAG structure between agents to achieve more effective learning performance. Theoretically, we propose a novel surrogate value function based on a MARL model with synthetic rewards (MARLM-SR) and prove that it serves as a lower bound of the optimal value function. Computationally, we propose a practical training algorithm that exploits new notion of leader agent and reward generator and distributor agent to guide the decomposed follower agents to better explore the parameter space in environments with DAG constraints. Empirically, we exploit four DAG environments including a real-world scheduling for one of Intel's high volume packaging and test factory to benchmark our methods and show it outperforms the other non-DAG approaches.

Vehicle Dispatching and Routing of On-Demand Intercity Ride-Pooling Services: A Multi-Agent Hierarchical Reinforcement Learning Approach

  • paper_url: http://arxiv.org/abs/2307.06742
  • repo_url: None
  • paper_authors: Jinhua Si, Fang He, Xi Lin, Xindi Tang
  • for: This study aims to improve intercity travel within city clusters through demand-responsive, pooled ride services supported by coordinated fleet management.
  • methods: A two-level framework is proposed: the upper level uses a multi-agent feudal reinforcement learning model to cooperatively assign idle vehicles to different intercity lines, while the lower level updates vehicle routes with an adaptive large neighborhood search heuristic.
  • results: Numerical studies based on a realistic dataset of Xiamen and its surrounding cities in China show that the framework effectively mitigates supply-demand imbalances and achieves significant improvements in both the average daily system profit and the order fulfilment ratio.
    Abstract The integrated development of city clusters has given rise to an increasing demand for intercity travel. Intercity ride-pooling service exhibits considerable potential in upgrading traditional intercity bus services by implementing demand-responsive enhancements. Nevertheless, its online operations suffer the inherent complexities due to the coupling of vehicle resource allocation among cities and pooled-ride vehicle routing. To tackle these challenges, this study proposes a two-level framework designed to facilitate online fleet management. Specifically, a novel multi-agent feudal reinforcement learning model is proposed at the upper level of the framework to cooperatively assign idle vehicles to different intercity lines, while the lower level updates the routes of vehicles using an adaptive large neighborhood search heuristic. Numerical studies based on the realistic dataset of Xiamen and its surrounding cities in China show that the proposed framework effectively mitigates the supply and demand imbalances, and achieves significant improvement in both the average daily system profit and order fulfillment ratio.

Implicit regularization in AI meets generalized hardness of approximation in optimization – Sharp results for diagonal linear networks

  • paper_url: http://arxiv.org/abs/2307.07410
  • repo_url: https://github.com/johanwind/which_l1_minimizer
  • paper_authors: Johan S. Wind, Vegard Antun, Anders C. Hansen
  • for: This paper aims to understand the implicit regularization imposed by neural network architectures and gradient-based optimization methods in deep learning.
  • methods: Implicit regularization is studied through the gradient flow of Diagonal Linear Networks (DLNs) in the over-parameterized regression setting, and is linked to phase transitions in generalized hardness of approximation (GHA).
  • results: The paper proves new and sharp convergence bounds with respect to the initialization size, showing that the gradient flow of DLNs with tiny initialization approximates minimizers of the basis pursuit optimization problem, and characterizes which $\ell_1$ minimizer is chosen by the gradient flow when the minimizer is not unique, which depends on the depth of the DLN.
    Abstract Understanding the implicit regularization imposed by neural network architectures and gradient based optimization methods is a key challenge in deep learning and AI. In this work we provide sharp results for the implicit regularization imposed by the gradient flow of Diagonal Linear Networks (DLNs) in the over-parameterized regression setting and, potentially surprisingly, link this to the phenomenon of phase transitions in generalized hardness of approximation (GHA). GHA generalizes the phenomenon of hardness of approximation from computer science to, among others, continuous and robust optimization. It is well-known that the $\ell^1$-norm of the gradient flow of DLNs with tiny initialization converges to the objective function of basis pursuit. We improve upon these results by showing that the gradient flow of DLNs with tiny initialization approximates minimizers of the basis pursuit optimization problem (as opposed to just the objective function), and we obtain new and sharp convergence bounds w.r.t.\ the initialization size. Non-sharpness of our results would imply that the GHA phenomenon would not occur for the basis pursuit optimization problem -- which is a contradiction -- thus implying sharpness. Moreover, we characterize $\textit{which}$ $\ell_1$ minimizer of the basis pursuit problem is chosen by the gradient flow whenever the minimizer is not unique. Interestingly, this depends on the depth of the DLN.
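As background for the setting, a depth-2 diagonal linear network parameterizes the regression vector entrywise as a difference of squares, and the implicit bias result connects its gradient flow (from a tiny initialization) to the basis pursuit problem; the display below is a standard formulation of this setup (notation assumed), which the paper's sharp bounds refine:

$$
x(u, v) \;=\; u \odot u \;-\; v \odot v,
\qquad
\min_{x \in \mathbb{R}^n} \|x\|_1 \;\; \text{subject to} \;\; Ax = b ,
$$

where gradient flow is run on the least-squares loss $\tfrac{1}{2}\|A\,x(u,v) - b\|_2^2$ over $(u, v)$ with initialization $u = v = \alpha \mathbf{1}$ for small $\alpha > 0$.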

MPR-Net: Multi-Scale Pattern Reproduction Guided Universality Time Series Interpretable Forecasting

  • paper_url: http://arxiv.org/abs/2307.06736
  • repo_url: https://github.com/coding-loong/MPR-Net
  • paper_authors: Tianlong Zhao, Xiang Ma, Xuemei Li, Caiming Zhang
  • for: This paper proposes a new time series forecasting model that addresses shortcomings of existing models, such as limited interpretability and high computational complexity.
  • methods: The model adaptively decomposes multi-scale historical series patterns using convolution operations, constructs a pattern extension forecasting method based on prior knowledge of pattern reproduction, and reconstructs future patterns into the future series using deconvolution operations.
  • results: Extensive experiments on more than ten real-world datasets covering both short- and long-term forecasting tasks show state-of-the-art forecasting performance, together with good generalization and robustness.
    Abstract Time series forecasting has received wide interest from existing research due to its broad applications and inherent challenging. The research challenge lies in identifying effective patterns in historical series and applying them to future forecasting. Advanced models based on point-wise connected MLP and Transformer architectures have strong fitting power, but their secondary computational complexity limits practicality. Additionally, those structures inherently disrupt the temporal order, reducing the information utilization and making the forecasting process uninterpretable. To solve these problems, this paper proposes a forecasting model, MPR-Net. It first adaptively decomposes multi-scale historical series patterns using convolution operation, then constructs a pattern extension forecasting method based on the prior knowledge of pattern reproduction, and finally reconstructs future patterns into future series using deconvolution operation. By leveraging the temporal dependencies present in the time series, MPR-Net not only achieves linear time complexity, but also makes the forecasting process interpretable. By carrying out sufficient experiments on more than ten real data sets of both short and long term forecasting tasks, MPR-Net achieves the state of the art forecasting performance, as well as good generalization and robustness performance.

Breaking 3-Factor Approximation for Correlation Clustering in Polylogarithmic Rounds

  • paper_url: http://arxiv.org/abs/2307.06723
  • repo_url: None
  • paper_authors: Nairen Cao, Shang-En Huang, Hsin-Hao Su
  • for: This paper studies parallel algorithms for the correlation clustering problem, in which every pair of distinct entities is labelled as similar or dissimilar and the goal is to partition the entities into clusters so as to minimize the number of disagreements with the labels. Existing efficient parallel algorithms have an approximation ratio of at least 3, leaving a significant gap to the $1.994+\epsilon$ ratio achieved by polynomial-time sequential algorithms [CLN22].
  • methods: The paper gives the first poly-logarithmic depth parallel algorithm with an approximation ratio better than 3: it computes a $(2.4+\epsilon)$-approximate solution using $\tilde{O}(m^{1.5})$ work. The algorithm can also be translated into an $\tilde{O}(m^{1.5})$-time sequential algorithm and a poly-logarithmic-round sublinear-memory MPC algorithm with $\tilde{O}(m^{1.5})$ total memory.
  • results: The approach breaks the 3-factor approximation barrier within $\tilde{O}(m^{1.5})$ work and memory, and carries over to both the sequential and MPC settings.
    Abstract In this paper, we study parallel algorithms for the correlation clustering problem, where every pair of two different entities is labeled with similar or dissimilar. The goal is to partition the entities into clusters to minimize the number of disagreements with the labels. Currently, all efficient parallel algorithms have an approximation ratio of at least 3. In comparison with the $1.994+\epsilon$ ratio achieved by polynomial-time sequential algorithms [CLN22], a significant gap exists. We propose the first poly-logarithmic depth parallel algorithm that achieves a better approximation ratio than 3. Specifically, our algorithm computes a $(2.4+\epsilon)$-approximate solution and uses $\tilde{O}(m^{1.5})$ work. Additionally, it can be translated into a $\tilde{O}(m^{1.5})$-time sequential algorithm and a poly-logarithmic rounds sublinear-memory MPC algorithm with $\tilde{O}(m^{1.5})$ total memory. Our approach is inspired by Awerbuch, Khandekar, and Rao's [AKR12] length-constrained multi-commodity flow algorithm, where we develop an efficient parallel algorithm to solve a truncated correlation clustering linear program of Charikar, Guruswami, and Wirth [CGW05]. Then we show the solution of the truncated linear program can be rounded with a factor of at most 2.4 loss by using the framework of [CMSY15]. Such a rounding framework can then be implemented using parallel pivot-based approaches.

Why Guided Dialog Policy Learning performs well? Understanding the role of adversarial learning and its alternative

  • paper_url: http://arxiv.org/abs/2307.06721
  • repo_url: None
  • paper_authors: Sho Shimoyama, Tetsuro Morimura, Kenshi Abe, Toda Takamichi, Yuta Tomomatsu, Masakazu Sugiyama, Asahi Hentona, Yuuki Azuma, Hirotaka Ninomiya
  • for: This paper addresses the joint learning of dialog policies and reward functions in dialog policy learning (DPL).
  • methods: The paper first analyses the role of adversarial learning (AL), in which the reward estimator and the dialog policy are trained simultaneously, and then proposes a method that eliminates AL from reward estimation and DPL while retaining its advantages.
  • results: Evaluation on MultiWOZ, a multi-domain task-oriented dialog corpus, shows that the proposed method retains the benefits of AL while avoiding its inherent problems, such as mode collapse.
    Abstract Dialog policies, which determine a system's action based on the current state at each dialog turn, are crucial to the success of the dialog. In recent years, reinforcement learning (RL) has emerged as a promising option for dialog policy learning (DPL). In RL-based DPL, dialog policies are updated according to rewards. The manual construction of fine-grained rewards, such as state-action-based ones, to effectively guide the dialog policy is challenging in multi-domain task-oriented dialog scenarios with numerous state-action pair combinations. One way to estimate rewards from collected data is to train the reward estimator and dialog policy simultaneously using adversarial learning (AL). Although this method has demonstrated superior performance experimentally, it is fraught with the inherent problems of AL, such as mode collapse. This paper first identifies the role of AL in DPL through detailed analyses of the objective functions of dialog policy and reward estimator. Next, based on these analyses, we propose a method that eliminates AL from reward estimation and DPL while retaining its advantages. We evaluate our method using MultiWOZ, a multi-domain task-oriented dialog corpus.

Unsupervised Calibration through Prior Adaptation for Text Classification using Large Language Models

  • paper_url: http://arxiv.org/abs/2307.06713
  • repo_url: None
  • paper_authors: Lautaro Estienne, Luciana Ferrer, Matías Vera, Pablo Piantanida
  • for: This work targets text classification tasks without labelled samples, using only a few in-domain sample queries.
  • methods: A method is proposed that treats the large language model (LLM) as a black box and adds a stage in which the model posteriors are calibrated to the task by adapting the prior class distribution.
  • results: Results show that the approach outperforms the un-adapted model for different numbers of training shots in the prompt, as well as a previous calibration approach that does not use any adaptation data.
    Abstract A wide variety of natural language tasks are currently being addressed with large-scale language models (LLMs). These models are usually trained with a very large amount of unsupervised text data and adapted to perform a downstream natural language task using methods like fine-tuning, calibration or in-context learning. In this work, we propose an approach to adapt the prior class distribution to perform text classification tasks without the need for labelled samples and only few in-domain sample queries. The proposed approach treats the LLM as a black box, adding a stage where the model posteriors are calibrated to the task. Results show that these methods outperform the un-adapted model for different number of training shots in the prompt and a previous approach were calibration is performed without using any adaptation data.
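One simple way to adapt a black-box model's posteriors to a new prior class distribution is to reweight the class probabilities by the ratio of target to source priors and renormalize; the snippet below is a generic sketch of this kind of prior correction (with hypothetical numbers), not necessarily the exact calibration procedure proposed in the paper.

```python
import numpy as np

def adapt_posteriors(posteriors, source_prior, target_prior):
    """Reweight black-box class posteriors p(y|x) for a new class prior.

    posteriors: (n_samples, n_classes) scores from the un-adapted model.
    source_prior: prior implied by the model / its training distribution.
    target_prior: prior estimated for the task (e.g. from a few in-domain queries).
    """
    w = np.asarray(target_prior) / np.asarray(source_prior)
    adapted = posteriors * w                       # Bayes-rule style reweighting
    return adapted / adapted.sum(axis=1, keepdims=True)

# Hypothetical 3-class example: the model over-predicts class 0.
posteriors = np.array([[0.70, 0.20, 0.10],
                       [0.55, 0.30, 0.15]])
source_prior = [0.6, 0.25, 0.15]                   # assumed model prior
target_prior = [1 / 3, 1 / 3, 1 / 3]               # assumed task prior
print(adapt_posteriors(posteriors, source_prior, target_prior))
```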

GRAN is superior to GraphRNN: node orderings, kernel- and graph embeddings-based metrics for graph generators

  • paper_url: http://arxiv.org/abs/2307.06709
  • repo_url: https://github.com/otouat/gnnevaluationmetrics
  • paper_authors: Ousmane Touat, Julian Stier, Pierre-Edouard Portier, Michael Granitzer
  • for: This work studies generative models for graphs, which are used in drug discovery, road networks, neural architecture search, and program synthesis.
  • methods: The paper extensively studies kernel-based metrics on distributions of graph invariants, as well as manifold-based and kernel-based metrics in graph embedding space, uses them to compare the well-known generative models GRAN and GraphRNN, and proposes an adaptation of GraphRNN with a depth-first search node ordering.
  • results: Manifold-based metrics outperform kernel-based metrics in embedding space, GRAN is superior to GraphRNN, and the adapted GraphRNN with depth-first search ordering is effective for small graphs. A guideline on good practices regarding dataset selection and node feature initialization is also provided.
    Abstract A wide variety of generative models for graphs have been proposed. They are used in drug discovery, road networks, neural architecture search, and program synthesis. Generating graphs has theoretical challenges, such as isomorphic representations -- evaluating how well a generative model performs is difficult. Which model to choose depending on the application domain? We extensively study kernel-based metrics on distributions of graph invariants and manifold-based and kernel-based metrics in graph embedding space. Manifold-based metrics outperform kernel-based metrics in embedding space. We use these metrics to compare GraphRNN and GRAN, two well-known generative models for graphs, and unveil the influence of node orderings. It shows the superiority of GRAN over GraphRNN - further, our proposed adaptation of GraphRNN with a depth-first search ordering is effective for small-sized graphs. A guideline on good practices regarding dataset selection and node feature initialization is provided. Our work is accompanied by open-source code and reproducible experiments.
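A kernel-based metric on distributions of graph invariants typically reduces to a maximum mean discrepancy (MMD) between two sets of descriptor vectors (e.g., degree histograms of generated versus reference graphs). The sketch below is a generic RBF-kernel MMD estimator with stand-in descriptors, not the paper's specific evaluation code.

```python
import numpy as np

def rbf_mmd2(X, Y, sigma=1.0):
    """Biased MMD^2 estimate between samples X and Y with an RBF kernel.

    X, Y: (n, d) and (m, d) arrays of graph descriptors (e.g. degree histograms).
    """
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()

# Toy usage: compare degree histograms (10 bins) of generated vs. reference graphs.
rng = np.random.default_rng(0)
reference = rng.dirichlet(np.ones(10), size=50)     # stand-in descriptors
generated = rng.dirichlet(np.ones(10) * 2, size=50)
print(rbf_mmd2(reference, generated, sigma=0.5))
```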

S-HR-VQVAE: Sequential Hierarchical Residual Learning Vector Quantized Variational Autoencoder for Video Prediction

  • paper_url: http://arxiv.org/abs/2307.06701
  • repo_url: None
  • paper_authors: Mohammad Adiban, Kalin Stefanov, Sabato Marco Siniscalchi, Giampiero Salvi
  • for: This paper targets the video prediction task, aiming to improve the accuracy and efficiency of video prediction models.
  • methods: The model combines two techniques: (i) a hierarchical residual vector quantized variational autoencoder (HR-VQVAE) and (ii) a spatiotemporal PixelCNN (ST-PixelCNN); the combined model is called the sequential hierarchical residual learning vector quantized variational autoencoder (S-HR-VQVAE).
  • results: Experiments on the KTH Human Action and Moving-MNIST tasks show that S-HR-VQVAE compares favorably against state-of-the-art video prediction techniques in both quantitative and qualitative evaluations, despite a much smaller model size.
    Abstract We address the video prediction task by putting forth a novel model that combines (i) our recently proposed hierarchical residual vector quantized variational autoencoder (HR-VQVAE), and (ii) a novel spatiotemporal PixelCNN (ST-PixelCNN). We refer to this approach as a sequential hierarchical residual learning vector quantized variational autoencoder (S-HR-VQVAE). By leveraging the intrinsic capabilities of HR-VQVAE at modeling still images with a parsimonious representation, combined with the ST-PixelCNN's ability at handling spatiotemporal information, S-HR-VQVAE can better deal with chief challenges in video prediction. These include learning spatiotemporal information, handling high dimensional data, combating blurry prediction, and implicit modeling of physical characteristics. Extensive experimental results on the KTH Human Action and Moving-MNIST tasks demonstrate that our model compares favorably against top video prediction techniques both in quantitative and qualitative evaluations despite a much smaller model size. Finally, we boost S-HR-VQVAE by proposing a novel training method to jointly estimate the HR-VQVAE and ST-PixelCNN parameters.
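The core vector-quantization step shared by VQ-VAE-style models (including the HR-VQVAE component) maps each encoder output to its nearest codebook entry with a straight-through gradient. The sketch below shows only this standard step in PyTorch, not the hierarchical residual or spatiotemporal parts of S-HR-VQVAE.

```python
import torch

def vector_quantize(z_e: torch.Tensor, codebook: torch.Tensor):
    """Map encoder outputs to their nearest codebook vectors.

    z_e: (batch, n_latents, dim) encoder outputs; codebook: (n_codes, dim).
    Returns quantized latents with a straight-through gradient estimator.
    """
    flat = z_e.reshape(-1, z_e.size(-1))                   # (batch * n_latents, dim)
    dists = torch.cdist(flat, codebook)                    # distances to every code
    indices = dists.argmin(dim=-1).view(z_e.shape[:-1])    # (batch, n_latents)
    z_q = codebook[indices]                                # nearest codebook vectors
    z_q = z_e + (z_q - z_e).detach()                       # straight-through estimator
    return z_q, indices

# Toy usage: 2 frames, 16 latents each, 64-dim latents, 512 codes.
torch.manual_seed(0)
z_e = torch.randn(2, 16, 64, requires_grad=True)
codebook = torch.randn(512, 64)
z_q, idx = vector_quantize(z_e, codebook)
print(z_q.shape, idx.shape)   # torch.Size([2, 16, 64]) torch.Size([2, 16])
```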

Short Boolean Formulas as Explanations in Practice

  • paper_url: http://arxiv.org/abs/2307.06971
  • repo_url: None
  • paper_authors: Reijo Jaakkola, Tomi Janhunen, Antti Kuusisto, Masood Feyzbakhsh Rankooh, Miikka Vilander
  • for: This paper studies explainability for data models based on unary relations.
  • methods: Short Boolean formulas are used as explanations: an explanation of length k is a Boolean formula of length k that minimizes the error with respect to the target attribute, computed via an encoding in Answer Set Programming.
  • results: The study finds that short Boolean formulas yield reasonably accurate explanations, and that limiting the formula length via cross-validation avoids overfitting while keeping the explanations human-interpretable.
    Abstract We investigate explainability via short Boolean formulas in the data model based on unary relations. As an explanation of length k, we take a Boolean formula of length k that minimizes the error with respect to the target attribute to be explained. We first provide novel quantitative bounds for the expected error in this scenario. We then also demonstrate how the setting works in practice by studying three concrete data sets. In each case, we calculate explanation formulas of different lengths using an encoding in Answer Set Programming. The most accurate formulas we obtain achieve errors similar to other methods on the same data sets. However, due to overfitting, these formulas are not necessarily ideal explanations, so we use cross validation to identify a suitable length for explanations. By limiting to shorter formulas, we obtain explanations that avoid overfitting but are still reasonably accurate and also, importantly, human interpretable.
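The error notion used here is straightforward to compute: given data points described by unary relations (Boolean attributes), the error of a candidate formula is the number of points on which it disagrees with the target attribute. A minimal sketch with a hypothetical dataset and formula follows; the actual search for minimal-error formulas is done with Answer Set Programming in the paper.

```python
def formula_error(data, target, formula):
    """Number of rows where the formula's prediction disagrees with the target.

    data: list of dicts mapping attribute names to booleans.
    target: name of the attribute to be explained.
    formula: callable taking a row dict and returning a boolean prediction.
    """
    return sum(formula(row) != row[target] for row in data)

# Hypothetical data over unary relations p, q and target attribute t.
data = [
    {"p": True,  "q": False, "t": True},
    {"p": True,  "q": True,  "t": True},
    {"p": False, "q": True,  "t": False},
    {"p": False, "q": False, "t": True},
]
explanation = lambda row: row["p"] or not row["q"]    # a short Boolean formula
print(formula_error(data, "t", explanation))           # 0 disagreements
```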

IntelliGraphs: Datasets for Benchmarking Knowledge Graph Generation

  • paper_url: http://arxiv.org/abs/2307.06698
  • repo_url: https://github.com/thiviyant/intelligraphs
  • paper_authors: Thiviyan Thanapalasingam, Emile van Krieken, Peter Bloem, Paul Groth
  • for: This paper introduces a new knowledge graph reasoning task, subgraph inference, in which a model must generate likely and semantically valid subgraphs.
  • methods: The paper presents IntelliGraphs, a set of five new knowledge graph datasets whose subgraphs carry semantics expressed as logical rules, together with the generator that produced the synthetic datasets and four novel baseline models, three of which are based on traditional knowledge graph embedding (KGE) models.
  • results: Experiments show that the traditional KGE-based models cannot capture the semantics; the benchmark is intended to encourage the development of machine learning models that emphasize semantic understanding.
    Abstract Knowledge Graph Embedding (KGE) models are used to learn continuous representations of entities and relations. A key task in the literature is predicting missing links between entities. However, Knowledge Graphs are not just sets of links but also have semantics underlying their structure. Semantics is crucial in several downstream tasks, such as query answering or reasoning. We introduce the subgraph inference task, where a model has to generate likely and semantically valid subgraphs. We propose IntelliGraphs, a set of five new Knowledge Graph datasets. The IntelliGraphs datasets contain subgraphs with semantics expressed in logical rules for evaluating subgraph inference. We also present the dataset generator that produced the synthetic datasets. We designed four novel baseline models, which include three models based on traditional KGEs. We evaluate their expressiveness and show that these models cannot capture the semantics. We believe this benchmark will encourage the development of machine learning models that emphasize semantic understanding.

Ageing Analysis of Embedded SRAM on a Large-Scale Testbed Using Machine Learning

  • paper_url: http://arxiv.org/abs/2307.06693
  • repo_url: None
  • paper_authors: Leandro Lanzieri, Peter Kietzmann, Goerschwin Fey, Holger Schlarb, Thomas C. Schmidt
  • for: This paper targets ageing detection and failure prediction for IoT devices that operate unattended in the field for years, to support long-term diagnostics and maintenance.
  • methods: A large-scale empirical analysis of natural SRAM wear-out is performed on 154 boards from a general-purpose testbed. Starting from the SRAM initialization bias, which each node can easily collect at startup, various metrics are applied for feature extraction, and common machine learning methods are used to predict a node's age of operation.
  • results: Even though ageing impacts are subtle, the indicators estimate usage times well, with an $R^2$ score of 0.77 and a mean error of 24% using regressors, and an F1 score above 0.6 for classifiers at a six-month resolution.
    Abstract Ageing detection and failure prediction are essential in many Internet of Things (IoT) deployments, which operate huge quantities of embedded devices unattended in the field for years. In this paper, we present a large-scale empirical analysis of natural SRAM wear-out using 154 boards from a general-purpose testbed. Starting from SRAM initialization bias, which each node can easily collect at startup, we apply various metrics for feature extraction and experiment with common machine learning methods to predict the age of operation for this node. Our findings indicate that even though ageing impacts are subtle, our indicators can well estimate usage times with an $R^2$ score of 0.77 and a mean error of 24% using regressors, and with an F1 score above 0.6 for classifiers applying a six-months resolution.

Aeolus Ocean – A simulation environment for the autonomous COLREG-compliant navigation of Unmanned Surface Vehicles using Deep Reinforcement Learning and Maritime Object Detection

  • paper_url: http://arxiv.org/abs/2307.06688
  • repo_url: https://github.com/aavek/aeolus-ocean
  • paper_authors: Andrew Alexander Vekinis, Stavros Perantonis
  • for: To help unmanned surface vehicles (USVs) navigate autonomously in the maritime domain, improving safety and reducing operating costs while opening new possibilities for oceanic research, exploration, and monitoring.
  • methods: Deep reinforcement learning (DRL) and computer vision (CV) algorithms are developed and guided within a realistic ocean simulation environment that acts as a COLREG-compliant digital twin for USV control systems.
  • results: Autonomous agents trained with this approach successfully complete numerous navigations to set waypoints, avoiding collisions with other vessels in both open-sea and coastal encounters.
    Abstract Heading towards navigational autonomy in unmanned surface vehicles (USVs) in the maritime sector can fundamentally lead towards safer waters as well as reduced operating costs, while also providing a range of exciting new capabilities for oceanic research, exploration and monitoring. However, achieving such a goal is challenging. USV control systems must, safely and reliably, be able to adhere to the international regulations for preventing collisions at sea (COLREGs) in encounters with other vessels as they navigate to a given waypoint while being affected by realistic weather conditions, either during the day or at night. To deal with the multitude of possible scenarios, it is critical to have a virtual environment that is able to replicate the realistic operating conditions USVs will encounter, before they can be implemented in the real world. Such "digital twins" form the foundations upon which Deep Reinforcement Learning (DRL) and Computer Vision (CV) algorithms can be used to develop and guide USV control systems. In this paper we describe the novel development of a COLREG-compliant DRL-based collision avoidant navigational system with CV-based awareness in a realistic ocean simulation environment. The performance of the trained autonomous Agents resulting from this approach is evaluated in several successful navigations to set waypoints in both open sea and coastal encounters with other vessels. A binary executable version of the simulator with trained agents is available at https://github.com/aavek/Aeolus-Ocean
    摘要 heading towards autonomous navigation in unmanned surface vehicles (USVs) in the maritime industry can lead to safer waters and lower operating costs, while also providing new opportunities for ocean research, exploration, and monitoring. However, achieving this goal is challenging. USV control systems must be able to safely and reliably follow international collision regulations (COLREGs) when encountering other vessels while navigating to a specific location in realistic weather conditions, both day and night. To handle various scenarios, it is crucial to have a virtual environment that can realistically simulate the operating conditions USVs will encounter. Such "digital twins" provide the foundation for developing and testing USV control systems using Deep Reinforcement Learning (DRL) and Computer Vision (CV) algorithms. In this paper, we describe the development of a COLREG-compliant DRL-based collision avoidance navigational system with CV-based awareness in a realistic ocean simulation environment. The performance of the trained autonomous Agents resulting from this approach is evaluated in several successful navigations to set waypoints in both open sea and coastal encounters with other vessels. A binary executable version of the simulator with trained agents is available at .

Machine Learning-Assisted Pattern Recognition Algorithms for Estimating Ultimate Tensile Strength in Fused Deposition Modeled Polylactic Acid Specimens

  • paper_url: http://arxiv.org/abs/2307.06970
  • repo_url: None
  • paper_authors: Akshansh Mishra, Vijaykumar S Jatti
  • for: 这项研究旨在利用监督学习算法，估算采用熔融沉积成型（FDM）工艺制造的聚乳酸（PLA）试样的极限抗拉强度（UTS）。
  • methods: 本研究使用了四种监督分类算法，即Logistic Classification、Gradient Boosting Classification、Decision Tree和K-Nearest Neighbor，来预测试样的UTS。
  • results: 研究发现，Decision Tree和K-Nearest Neighbor算法均达到了0.71的F1分数，但KNN算法取得了更高的AUC分数（0.79）。这表明KNN算法在区分数据集中两类极限抗拉强度方面能力更强，是本研究背景下分类任务的最佳选择。
    Abstract In this study, we investigate the application of supervised machine learning algorithms for estimating the Ultimate Tensile Strength (UTS) of Polylactic Acid (PLA) specimens fabricated using the Fused Deposition Modeling (FDM) process. A total of 31 PLA specimens were prepared, with Infill Percentage, Layer Height, Print Speed, and Extrusion Temperature serving as input parameters. The primary objective was to assess the accuracy and effectiveness of four distinct supervised classification algorithms, namely Logistic Classification, Gradient Boosting Classification, Decision Tree, and K-Nearest Neighbor, in predicting the UTS of the specimens. The results revealed that while the Decision Tree and K-Nearest Neighbor algorithms both achieved an F1 score of 0.71, the KNN algorithm exhibited a higher Area Under the Curve (AUC) score of 0.79, outperforming the other algorithms. This demonstrates the superior ability of the KNN algorithm in differentiating between the two classes of ultimate tensile strength within the dataset, rendering it the most favorable choice for classification in the context of this research. This study represents the first attempt to estimate the UTS of PLA specimens using machine learning-based classification algorithms, and the findings offer valuable insights into the potential of these techniques in improving the performance and accuracy of predictive models in the domain of additive manufacturing.
    摘要 在这个研究中,我们研究了使用监督式机器学习算法来估计制造使用泵流溶解模型(FDM) proces的聚酸酯(PLA)样品的最大强度(UTS)。总共有31个PLA样品被准备,输入参数包括填充比率、层高、印刷速度和溶解温度。研究的主要目标是评估四种不同的监督式分类算法,namely Logistic Classification、Gradient Boosting Classification、Decision Tree和K-Nearest Neighbor,在预测样品的UTS方面的精度和有效性。结果显示,Despite Tree和K-Nearest Neighbor算法都达到了F1分数0.71,KNN算法的AUC分数为0.79,高于其他算法,这表明KNN算法在数据集中更好地区分两个类别的最终强度,因此在这个上下文中,KNN算法是最佳选择。这项研究是预测PLA样品的UTS使用机器学习基于分类算法的第一次尝试,发现的结果提供了对预测模型在材料加工领域的可能性和精度的有价值的信息。

Real-time Percussive Technique Recognition and Embedding Learning for the Acoustic Guitar

  • paper_url: http://arxiv.org/abs/2307.07426
  • repo_url: https://github.com/iamtheband/martelloni_et_al_ismir2023
  • paper_authors: Andrea Martelloni, Andrew P McPherson, Mathieu Barthet
  • for: 这篇论文旨在通过实时音乐信息检索（RT-MIR）技术增强原声吉他的打击指弹（percussive fingerstyle）演奏。
  • methods: 论文使用卷积神经网络（CNN）以及与变分自编码器（VAE）联合训练的CNN，实现实时的吉他琴身打击识别与嵌入学习。
  • results: 研究发现,使用VAEs可以提高分类器的质量,特别是在简化后的2类识别任务中,而且VAEs可以提高分布之间的类别分离度。
    Abstract Real-time music information retrieval (RT-MIR) has much potential to augment the capabilities of traditional acoustic instruments. We develop RT-MIR techniques aimed at augmenting percussive fingerstyle, which blends acoustic guitar playing with guitar body percussion. We formulate several design objectives for RT-MIR systems for augmented instrument performance: (i) causal constraint, (ii) perceptually negligible action-to-sound latency, (iii) control intimacy support, (iv) synthesis control support. We present and evaluate real-time guitar body percussion recognition and embedding learning techniques based on convolutional neural networks (CNNs) and CNNs jointly trained with variational autoencoders (VAEs). We introduce a taxonomy of guitar body percussion based on hand part and location. We follow a cross-dataset evaluation approach by collecting three datasets labelled according to the taxonomy. The embedding quality of the models is assessed using KL-Divergence across distributions corresponding to different taxonomic classes. Results indicate that the networks are strong classifiers especially in a simplified 2-class recognition task, and the VAEs yield improved class separation compared to CNNs as evidenced by increased KL-Divergence across distributions. We argue that the VAE embedding quality could support control intimacy and rich interaction when the latent space's parameters are used to control an external synthesis engine. Further design challenges around generalisation to different datasets have been identified.
    摘要 现实时音乐信息检索(RT-MIR)具有增强传统音响乐器的潜在能力。我们开发了RT-MIR技术,旨在补充打击式手风琴演奏。我们提出了增强乐器性能的RT-MIR系统设计目标:(i) causal约束,(ii)实际无关作用响应延迟,(iii)控制亲密支持,(iv)合成控制支持。我们介绍了实时鼓部打击识别和嵌入学习技术,使用卷积神经网络(CNN)和CNN与变量自动编码器(VAE)进行联合训练。我们提出了鼓部打击的分类法,并采用跨数据集评估方法。结果表明,网络具有强大分类能力,特别是在简化后2类认知任务中,而VAE增加了分布间的KL散度,表明VAE嵌入质量可以支持控制亲密和丰富的互动。然而,我们还需要进一步探索不同数据集的通用化问题。

Layerwise Linear Mode Connectivity

  • paper_url: http://arxiv.org/abs/2307.06966
  • repo_url: None
  • paper_authors: Linara Adilova, Asja Fischer, Martin Jaggi
  • for: 这个论文是关于联合训练的 federated deep learning 中的一种常用策略,即在训练过程中多次进行模型参数的汇集,以实现更强的全局模型。
  • methods: 论文以逐层（layerwise）的方式考察模型之间的损失障碍，即分别在各层上分析两个模型参数平均后损失的变化。
  • results: 论文观察到不同层的学习过程演化方式不同，并提出猜想：阻碍联邦训练成功的障碍是由某个特定的层或某组层造成的。
    Abstract In the federated setup one performs an aggregation of separate local models multiple times during training in order to obtain a stronger global model; most often aggregation is a simple averaging of the parameters. Understanding when and why averaging works in a non-convex setup, such as federated deep learning, is an open challenge that hinders obtaining highly performant global models. On i.i.d.~datasets federated deep learning with frequent averaging is successful. The common understanding, however, is that during the independent training models are drifting away from each other and thus averaging may not work anymore after many local parameter updates. The problem can be seen from the perspective of the loss surface: for points on a non-convex surface the average can become arbitrarily bad. The assumption of local convexity, often used to explain the success of federated averaging, contradicts to the empirical evidence showing that high loss barriers exist between models from the very beginning of the learning, even when training on the same data. Based on the observation that the learning process evolves differently in different layers, we investigate the barrier between models in a layerwise fashion. Our conjecture is that barriers preventing from successful federated training are caused by a particular layer or group of layers.
    摘要 在联邦设置下，训练过程中会多次对若干本地模型进行聚合以得到更强的全局模型，聚合通常就是对参数做简单平均。然而，在联邦深度学习这样的非凸设置中，理解平均何时以及为何有效仍是一个悬而未决的问题，这阻碍了我们获得高性能的全局模型。在独立同分布（i.i.d.）数据上，频繁平均的联邦深度学习是成功的；但通常的看法是，独立训练会使模型逐渐彼此偏离，因此在多次本地参数更新之后平均可能不再有效。从损失面角度看，在非凸面上两个点的平均可能变得任意糟糕。常被用来解释联邦平均成功的局部凸性假设，与实验证据相矛盾：即使在相同数据上训练，模型之间从学习一开始就存在很高的损失障碍。基于学习过程在不同层中演化方式不同这一观察，我们逐层研究模型之间的损失障碍。我们的猜想是：阻碍联邦训练成功的障碍来自某个特定的层或某组层。
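The layerwise probing idea can be illustrated with a small helper that interpolates a single layer between two trained models and measures how much the loss rises above the straight line between the endpoints; this is an assumed protocol for illustration, not the paper's exact procedure, and `eval_loss` plus the two state dicts are user-supplied.

```python
import copy
import numpy as np

def layerwise_barrier(state_a, state_b, layer_key, eval_loss, steps=11):
    """Loss barrier along t -> (1-t)*A + t*B applied to a single layer's weights."""
    losses = []
    for t in np.linspace(0.0, 1.0, steps):
        mixed = copy.deepcopy(state_a)                      # all other layers stay at A
        mixed[layer_key] = (1.0 - t) * state_a[layer_key] + t * state_b[layer_key]
        losses.append(eval_loss(mixed))
    losses = np.array(losses)
    # Barrier: how much the path rises above the straight line between the endpoints
    endpoint_line = np.linspace(losses[0], losses[-1], steps)
    return float((losses - endpoint_line).max())
```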

Multivariate Time Series characterization and forecasting of VoIP traffic in real mobile networks

  • paper_url: http://arxiv.org/abs/2307.06645
  • repo_url: None
  • paper_authors: Mario Di Mauro, Giovanni Galatro, Fabio Postiglione, Wei Song, Antonio Liotta
  • for: 预测实时流量(如VoIP)的行为可以帮助运营商更好地规划其网络基础设施,并优化资源的分配。本文提出了一种预测QoS/QoE指标的方法,以帮助运营商更好地理解和预测VOIP流量的行为。
  • methods: 本文使用时间序列分析与机器学习技术（基于深度学习和基于树模型）来预测VoIP流量中关键的QoS/QoE指标。具体而言，本文首先将问题形式化为多变量时间序列分析问题，然后使用向量自回归（VAR）模型和机器学习技术来预测QoS/QoE指标的行为。
  • results: 实验结果表明，时间序列分析与机器学习技术可以较准确地预测VoIP流量中的关键QoS/QoE指标，并对各方法的性能和时间复杂度进行了比较。此外，本文还进行了一系列辅助分析（如平稳性检验和正交脉冲响应分析），以更深入地揭示各指标之间的关系和VoIP流量的行为。
    Abstract Predicting the behavior of real-time traffic (e.g., VoIP) in mobility scenarios could help the operators to better plan their network infrastructures and to optimize the allocation of resources. Accordingly, in this work the authors propose a forecasting analysis of crucial QoS/QoE descriptors (some of which neglected in the technical literature) of VoIP traffic in a real mobile environment. The problem is formulated in terms of a multivariate time series analysis. Such a formalization allows to discover and model the temporal relationships among various descriptors and to forecast their behaviors for future periods. Techniques such as Vector Autoregressive models and machine learning (deep-based and tree-based) approaches are employed and compared in terms of performance and time complexity, by reframing the multivariate time series problem into a supervised learning one. Moreover, a series of auxiliary analyses (stationarity, orthogonal impulse responses, etc.) are performed to discover the analytical structure of the time series and to provide deep insights about their relationships. The whole theoretical analysis has an experimental counterpart since a set of trials across a real-world LTE-Advanced environment has been performed to collect, post-process and analyze about 600,000 voice packets, organized per flow and differentiated per codec.
    摘要 预测移动场景下实时流量（如VoIP）的行为，可以帮助运营商更好地规划网络基础设施并优化资源分配。为此，作者在真实移动环境中对VoIP流量的若干关键QoS/QoE指标（其中一些在技术文献中常被忽略）进行了预测分析。问题被形式化为多变量时间序列分析：这种形式化可以发现并建模各指标之间的时间关系，并预测它们在未来时段的行为。作者采用向量自回归（VAR）模型以及机器学习（基于深度学习和基于树模型）方法，将多变量时间序列问题转化为监督学习问题，并从性能与时间复杂度两方面进行比较。此外，作者还进行了一系列辅助分析（如平稳性检验、正交脉冲响应等），以揭示时间序列的分析结构并提供深入的理解。整个理论分析还配有实验验证：在真实的 LTE-Advanced 环境中收集、后处理并分析了约600,000个语音包，按流组织并按编解码器区分。
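For reference, fitting and forecasting a vector autoregression on a table of QoS/QoE descriptors might look like the sketch below (statsmodels); the column contents and synthetic data are placeholders, not the paper's dataset.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

# Synthetic stand-in: each column is one QoS/QoE descriptor sampled over time
rng = np.random.default_rng(0)
t = 500
jitter = np.cumsum(rng.normal(size=t)) * 0.01 + 5.0
loss_pct = 0.3 * jitter + rng.normal(scale=0.2, size=t)
mos = 4.5 - 0.1 * jitter + rng.normal(scale=0.05, size=t)
df = pd.DataFrame({"jitter_ms": jitter, "loss_pct": loss_pct, "mos": mos})

fit = VAR(df).fit(maxlags=8, ic="aic")     # lag order chosen by information criterion
print("selected lag order:", fit.k_ar)

horizon = 12
forecast = fit.forecast(df.values[-fit.k_ar:], steps=horizon)
print(pd.DataFrame(forecast, columns=df.columns).head())
```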

An Improved Uniform Convergence Bound with Fat-Shattering Dimension

  • paper_url: http://arxiv.org/abs/2307.06644
  • repo_url: None
  • paper_authors: Roberto Colomboni, Emmanuel Esposito, Andrea Paudice
  • for: 这个论文是为了研究实值函数的均匀收敛性而写的。
  • methods: 该论文推导了一个新的均匀收敛界，改进了现有最优上界中样本复杂度的对数因子。
  • results: 新的上界去掉了现有结果中乘性的平方对数因子，从而弥合了上界与已知下界之间的差距。
    Abstract The fat-shattering dimension characterizes the uniform convergence property of real-valued functions. The state-of-the-art upper bounds feature a multiplicative squared logarithmic factor on the sample complexity, leaving an open gap with the existing lower bound. We provide an improved uniform convergence bound that closes this gap.
    摘要 fat-shattering维度刻画了实值函数类的均匀收敛性质。目前最优的上界在样本复杂度中带有一个乘性的平方对数因子，与已知下界之间留有差距；我们给出一个改进的均匀收敛界，弥合了这一差距。

Discovering How Agents Learn Using Few Data

  • paper_url: http://arxiv.org/abs/2307.06640
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: Iosif Sakos, Antonios Varvitsiotis, Georgios Piliouras
  • for: 这个论文的目的是为了实时识别多个代理系统的学习动力学,以便在不监控代理系统的情况下,通过短暂的单个系统轨迹来学习代理系统的行为。
  • methods: 这篇论文提出了一个理论与算法框架，通过多项式回归来辨识代理的学习动力学，并利用平方和（sum-of-squares）优化施加侧信息约束、执行计算。
  • results: 实验表明,使用这种方法,只需要使用单个系统轨迹的5个样本,就可以准确地回归真实的代理系统动力学,包括平衡选择和预测混沌系统的结果。这些发现表明,这种方法在多个竞争性多代理系统中可以提供有效的政策和决策支持。
    Abstract Decentralized learning algorithms are an essential tool for designing multi-agent systems, as they enable agents to autonomously learn from their experience and past interactions. In this work, we propose a theoretical and algorithmic framework for real-time identification of the learning dynamics that govern agent behavior using a short burst of a single system trajectory. Our method identifies agent dynamics through polynomial regression, where we compensate for limited data by incorporating side-information constraints that capture fundamental assumptions or expectations about agent behavior. These constraints are enforced computationally using sum-of-squares optimization, leading to a hierarchy of increasingly better approximations of the true agent dynamics. Extensive experiments demonstrated that our approach, using only 5 samples from a short run of a single trajectory, accurately recovers the true dynamics across various benchmarks, including equilibrium selection and prediction of chaotic systems up to 10 Lyapunov times. These findings suggest that our approach has significant potential to support effective policy and decision-making in strategic multi-agent systems.
    摘要 去中心化学习算法是设计多智能体系统的重要工具，它使智能体能够自主地从自身经验与过往交互中学习。在这项工作中，我们提出了一个理论与算法框架，仅用单条系统轨迹的一小段就能实时辨识支配智能体行为的学习动力学。我们通过多项式回归来辨识动力学，并引入刻画智能体行为基本假设或预期的侧信息约束来弥补数据不足；这些约束通过平方和（sum-of-squares）优化在计算上予以施加，从而得到一个对真实动力学逼近越来越好的层次结构。大量实验表明，我们的方法仅用一条短轨迹中的5个样本，就能在多个基准上准确恢复真实动力学，包括均衡选择以及对混沌系统长达10个李雅普诺夫时间的预测。这些发现表明，该方法在战略性多智能体系统的政策制定与决策支持方面具有很大潜力。
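A toy version of the core idea (fit the one-step dynamics by polynomial regression from a handful of trajectory samples) is sketched below; the sum-of-squares side-information constraints central to the paper are omitted, so this is only the unconstrained baseline, and the toy update rule is invented.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# x_t: agent-state samples from one short trajectory (5 points), x_next: their successors
x_t = np.array([[0.10], [0.25], [0.40], [0.60], [0.80]])
x_next = np.array([x * (1.0 + 0.5 * (1.0 - x)) for x in x_t[:, 0]])  # toy replicator-like update

poly = PolynomialFeatures(degree=3, include_bias=True)
Phi = poly.fit_transform(x_t)
dyn = LinearRegression(fit_intercept=False).fit(Phi, x_next)

print("coefficients:", np.round(dyn.coef_, 3))
print("predicted next state at x=0.5:", dyn.predict(poly.transform([[0.5]])))
```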

Frameless Graph Knowledge Distillation

  • paper_url: http://arxiv.org/abs/2307.06631
  • repo_url: https://github.com/dshi3553usyd/frameless_graph_distillation
  • paper_authors: Dai Shi, Zhiqi Shao, Yi Guo, Junbin Gao
  • for: 本研究旨在提高graph neural network(GNN)的推理速度,通过知识传递(KD)机制将复杂的教师模型传递给简单的学生模型,并让学生模型能够快速地完成重要的学习任务。
  • methods: 本研究使用多尺度GNN，即图框架小波（graph framelet），并证明通过在多尺度上充分利用图知识，学生模型既能适应同配（homophilic）图也能适应异配（heterophilic）图，并有望缓解过度压缩（over-squashing）问题。
  • results: 对比 experiments表明,我们提出的模型可以保持与教师模型相同的学习精度,同时具有高速的推理速度。
    Abstract Knowledge distillation (KD) has shown great potential for transferring knowledge from a complex teacher model to a simple student model in which the heavy learning task can be accomplished efficiently and without losing too much prediction accuracy. Recently, many attempts have been made by applying the KD mechanism to the graph representation learning models such as graph neural networks (GNNs) to accelerate the model's inference speed via student models. However, many existing KD-based GNNs utilize MLP as a universal approximator in the student model to imitate the teacher model's process without considering the graph knowledge from the teacher model. In this work, we provide a KD-based framework on multi-scaled GNNs, known as graph framelet, and prove that by adequately utilizing the graph knowledge in a multi-scaled manner provided by graph framelet decomposition, the student model is capable of adapting both homophilic and heterophilic graphs and has the potential of alleviating the over-squashing issue with a simple yet effectively graph surgery. Furthermore, we show how the graph knowledge supplied by the teacher is learned and digested by the student model via both algebra and geometry. Comprehensive experiments show that our proposed model can generate learning accuracy identical to or even surpass the teacher model while maintaining the high speed of inference.
    摘要 知识塑化(KD)已经展示了让知识从复杂的教师模型传递到简单的学生模型中,以便高效地完成重要的学习任务而无需失去很多预测精度。最近,许多人对使用KD机制来加速图表示学习模型(GNNs)的推理速度进行了尝试。然而,大多数现有的KD-基于GNNs使用多层感知网络(MLP)作为学生模型的universal approximator,而不考虑教师模型中的图知识。在这种工作中,我们提供了基于多尺度GNNs的KD框架,称之为图帧lets,并证明了,通过在多尺度的图帧lets中精准地利用图知识,学生模型可以适应同质和不同的图Structures,并有可能解决过分压缩问题。此外,我们还表明了教师模型对图知识的学习和吞吐过程,通过 Both algebra and geometry。经过全面的实验,我们的提议的模型可以达到和教师模型的预测精度,同时保持高速的推理速度。
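For orientation, a standard soft-target knowledge-distillation loss is shown below; this is the generic KD objective, not the framelet-specific formulation proposed in the paper.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Student matches the teacher's softened class distribution plus the hard labels
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Example with random node logits (8 nodes, 5 classes)
s, t = torch.randn(8, 5), torch.randn(8, 5)
y = torch.randint(0, 5, (8,))
print(kd_loss(s, t, y))
```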

Quantum Autoencoders for Learning Quantum Channel Codes

  • paper_url: http://arxiv.org/abs/2307.06622
  • repo_url: None
  • paper_authors: Lakshika Rathi, Stephen DiAdamo, Alireza Shabani
  • for: 本研究探讨了量子机器学习技术在经典通信和量子通信中的应用，涵盖不同量子比特信道模型下的通信场景。
  • methods: 我们采用了参数化的量子循环和灵活的通道噪声模型,开发了一个机器学习框架,用于生成量子通道码和评估其效果。
  • results: 我们在不同量子链路模型下应用了这个框架,并在每个场景中达到了强表现。我们的结果表明,量子机器学习可以在量子通信系统研究中发挥作用,帮助我们更好地理解各种通信设置、多样化通道模型以及容量下限。
    Abstract This work investigates the application of quantum machine learning techniques for classical and quantum communication across different qubit channel models. By employing parameterized quantum circuits and a flexible channel noise model, we develop a machine learning framework to generate quantum channel codes and evaluate their effectiveness. We explore classical, entanglement-assisted, and quantum communication scenarios within our framework. Applying it to various quantum channel models as proof of concept, we demonstrate strong performance in each case. Our results highlight the potential of quantum machine learning in advancing research on quantum communication systems, enabling a better understanding of capacity bounds under modulation constraints, various communication settings, and diverse channel models.
    摘要 这项研究探讨了量子机器学习技术在不同量子比特信道模型下用于经典通信和量子通信的应用。我们利用参数化量子电路和灵活的信道噪声模型，开发了一个机器学习框架，用于生成量子信道编码并评估其效果。我们在该框架内探索了经典通信、纠缠辅助通信和量子通信三种场景，并将其应用于多种量子信道模型作为概念验证，在每种情况下都取得了强劲的表现。我们的结果突显了量子机器学习在推进量子通信系统研究方面的潜力，有助于更好地理解调制约束下的容量界限、各种通信设置以及多样化的信道模型。

Online Distributed Learning with Quantized Finite-Time Coordination

  • paper_url: http://arxiv.org/abs/2307.06620
  • repo_url: None
  • paper_authors: Nicola Bastianello, Apostolos I. Rikos, Karl H. Johansson
  • for: 本研究考虑在分布式学习问题中进行在线分布式学习。在我们的设定中,一组代理需要协同训练来自流动数据源的学习模型。与联邦学习不同,我们的方法不依赖中央服务器,而是仅仅通过代理之间的点对点通信。这种方法在隐私、安全和成本因素的情况下是非常有用。
  • methods: 我们提出了一种分布式算法,该算法基于量化、有限时协调协议来聚合本地训练模型。此外,我们的算法允许在本地训练中使用随机抽样subset的梯度。这使得我们的算法比传统梯度下降更加高效和可扩展。
  • results: 我们以与在线最优解之间的平均距离为指标分析了所提算法的性能。最后，我们在一个逻辑回归任务上给出了数值结果。
    Abstract In this paper we consider online distributed learning problems. Online distributed learning refers to the process of training learning models on distributed data sources. In our setting a set of agents need to cooperatively train a learning model from streaming data. Differently from federated learning, the proposed approach does not rely on a central server but only on peer-to-peer communications among the agents. This approach is often used in scenarios where data cannot be moved to a centralized location due to privacy, security, or cost reasons. In order to overcome the absence of a central server, we propose a distributed algorithm that relies on a quantized, finite-time coordination protocol to aggregate the locally trained models. Furthermore, our algorithm allows for the use of stochastic gradients during local training. Stochastic gradients are computed using a randomly sampled subset of the local training data, which makes the proposed algorithm more efficient and scalable than traditional gradient descent. In our paper, we analyze the performance of the proposed algorithm in terms of the mean distance from the online solution. Finally, we present numerical results for a logistic regression task.
    摘要 在这篇论文中，我们考虑在线分布式学习问题。在线分布式学习指的是在分布式数据源上训练学习模型的过程。在我们的设定中，一群代理需要合作地从流式数据中训练一个学习模型。与联邦学习不同，我们的方法不依赖中央服务器，而只依靠代理之间的点对点通信。这种方法常用于因隐私、安全或成本原因而无法将数据移动到中央位置的场景。为弥补中央服务器的缺失，我们提出了一种分布式算法，它依靠一个量化的、有限时间的协调协议来聚合各自本地训练的模型。此外，我们的算法允许在本地训练中使用随机梯度：随机梯度基于本地训练数据的随机抽样子集计算，这使所提算法比传统梯度下降更高效、更可扩展。在论文中，我们以与在线最优解之间的平均距离为指标分析了所提算法的性能，并在一个逻辑回归任务上给出了数值结果。
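A heavily simplified sketch of one round of the scheme, local stochastic-gradient steps followed by agreement on a quantized average, is given below; the paper's finite-time coordination protocol is more elaborate than the idealized averaging used here, and the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, dim, step = 5, 3, 0.1
models = [rng.normal(size=dim) for _ in range(n_agents)]

def quantize(w, resolution=0.01):
    return np.round(w / resolution) * resolution

def local_sgd_step(w, batch_x, batch_y):
    # Logistic-regression gradient on the agent's local mini-batch
    p = 1.0 / (1.0 + np.exp(-batch_x @ w))
    grad = batch_x.T @ (p - batch_y) / len(batch_y)
    return w - step * grad

for _ in range(10):                                   # streaming rounds
    for i in range(n_agents):
        x = rng.normal(size=(4, dim))
        y = (x @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)
        models[i] = local_sgd_step(models[i], x, y)
    consensus = quantize(np.mean(models, axis=0))     # peer-to-peer agreement (idealized here)
    models = [consensus.copy() for _ in range(n_agents)]

print("shared model after 10 rounds:", np.round(models[0], 3))
```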

Learning IMM Filter Parameters from Measurements using Gradient Descent

  • paper_url: http://arxiv.org/abs/2307.06618
  • repo_url: None
  • paper_authors: André Brandenburger, Folker Hoffmann, Alexander Charlish
  • for: 这篇论文旨在让交互多模型（IMM）滤波器的参数能够仅凭测量数据自动优化，而不需要任何真值数据。
  • methods: 该论文通过梯度下降、仅利用测量数据来优化IMM滤波器的参数。
  • results: 在仿真数据上的消融实验表明，训练得到的模型可以与用真值参数化的滤波器性能相当。
    Abstract The performance of data fusion and tracking algorithms often depends on parameters that not only describe the sensor system, but can also be task-specific. While for the sensor system tuning these variables is time-consuming and mostly requires expert knowledge, intrinsic parameters of targets under track can even be completely unobservable until the system is deployed. With state-of-the-art sensor systems growing more and more complex, the number of parameters naturally increases, necessitating the automatic optimization of the model variables. In this paper, the parameters of an interacting multiple model (IMM) filter are optimized solely using measurements, thus without necessity for any ground-truth data. The resulting method is evaluated through an ablation study on simulated data, where the trained model manages to match the performance of a filter parametrized with ground-truth values.
    摘要 数据融合与跟踪算法的性能往往取决于一些参数，它们不仅刻画传感器系统，还可能与具体任务相关。对传感器系统而言，调节这些变量耗时且大多需要专家知识；而被跟踪目标的内在参数甚至可能在系统部署之前完全不可观测。随着先进传感器系统日益复杂，参数数量自然增加，因此需要对模型变量进行自动优化。在这篇论文中，交互多模型（IMM）滤波器的参数仅利用测量数据进行优化，无需任何真值数据。所得方法在仿真数据上通过消融研究进行评估，训练出的模型可以达到与用真值参数化的滤波器相当的性能。

Introducing Foundation Models as Surrogate Models: Advancing Towards More Practical Adversarial Attacks

  • paper_url: http://arxiv.org/abs/2307.06608
  • repo_url: None
  • paper_authors: Jiaming Zhang, Jitao Sang, Qi Yi, Changsheng Xu
  • for: 本文针对无盒（no-box）对抗攻击这一最具实用性也最具挑战性的设定，旨在提升其攻击效果。
  • methods: 本文采用了一个新颖的思路：将对抗攻击重新表述为下游任务（即图像噪声生成），并引入基础模型（foundation models）作为替代模型（surrogate models）。
  • results: 实验结果表明，用基于间隔（margin-based）的损失策略在目标图像上微调基础模型可以提升其表现，且该方法结合基本的FGSM攻击即可超过其他更复杂的算法。
    Abstract Recently, the no-box adversarial attack, in which the attacker lacks access to the model's architecture, weights, and training data, become the most practical and challenging attack setup. However, there is an unawareness of the potential and flexibility inherent in the surrogate model selection process on no-box setting. Inspired by the burgeoning interest in utilizing foundational models to address downstream tasks, this paper adopts an innovative idea that 1) recasting adversarial attack as a downstream task. Specifically, image noise generation to meet the emerging trend and 2) introducing foundational models as surrogate models. Harnessing the concept of non-robust features, we elaborate on two guiding principles for surrogate model selection to explain why the foundational model is an optimal choice for this role. However, paradoxically, we observe that these foundational models underperform. Analyzing this unexpected behavior within the feature space, we attribute the lackluster performance of foundational models (e.g., CLIP) to their significant representational capacity and, conversely, their lack of discriminative prowess. To mitigate this issue, we propose the use of a margin-based loss strategy for the fine-tuning of foundational models on target images. The experimental results verify that our approach, which employs the basic Fast Gradient Sign Method (FGSM) attack algorithm, outstrips the performance of other, more convoluted algorithms. We conclude by advocating for the research community to consider surrogate models as crucial determinants in the effectiveness of adversarial attacks in no-box settings. The implications of our work bear relevance for improving the efficacy of such adversarial attacks and the overall robustness of AI systems.
    摘要 近期,无框黑盒攻击(no-box adversarial attack)成为了最实用和挑战性最高的攻击设置。然而,关于选择surrogate模型的潜在和可能性的了解却受到了忽略。 draw inspiration from the growing interest in using foundational models to address downstream tasks, this paper proposes an innovative idea that recasts adversarial attacks as a downstream task and introduces foundational models as surrogate models. Based on the concept of non-robust features, we present two guiding principles for surrogate model selection to explain why foundational models are optimal for this role. However, paradoxically, we observe that these foundational models underperform. Analyzing this unexpected behavior within the feature space, we attribute the lackluster performance of foundational models (e.g., CLIP) to their significant representational capacity and, conversely, their lack of discriminative prowess. To mitigate this issue, we propose the use of a margin-based loss strategy for the fine-tuning of foundational models on target images. The experimental results verify that our approach, which employs the basic Fast Gradient Sign Method (FGSM) attack algorithm, outstrips the performance of other, more convoluted algorithms. We conclude by advocating for the research community to consider surrogate models as crucial determinants in the effectiveness of adversarial attacks in no-box settings. The implications of our work bear relevance for improving the efficacy of such adversarial attacks and the overall robustness of AI systems.
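The basic FGSM step referenced in the abstract is standard and can be written as follows; the surrogate model, input range, and epsilon are placeholders.

```python
import torch

def fgsm(model, x, y, epsilon=8 / 255, loss_fn=torch.nn.CrossEntropyLoss()):
    """One fast-gradient-sign step against `model` for inputs x with labels y."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    # Single signed-gradient step, then clip back to the valid image range [0, 1]
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```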

Is Task-Agnostic Explainable AI a Myth?

  • paper_url: http://arxiv.org/abs/2307.06963
  • repo_url: None
  • paper_authors: Alicja Chaszczewicz
  • for: 本研究提供一个对当代可解释人工智能(XAI)的框架,并评估XAI方法的概念和技术限制,以及它们在实际应用中的适用性。
  • methods: 本研究考察了涵盖图像、文本和图数据的三个XAI研究方向，涉及显著性（saliency）、注意力和图类型三类解释器。
  • results: 本研究发现，虽然XAI方法可以为机器学习模型提供补充性且可能有用的输出，但研究人员和决策者应注意其概念与技术上的局限，这些方法本身往往也会变成黑盒子。
    Abstract Our work serves as a framework for unifying the challenges of contemporary explainable AI (XAI). We demonstrate that while XAI methods provide supplementary and potentially useful output for machine learning models, researchers and decision-makers should be mindful of their conceptual and technical limitations, which frequently result in these methods themselves becoming black boxes. We examine three XAI research avenues spanning image, textual, and graph data, covering saliency, attention, and graph-type explainers. Despite the varying contexts and timeframes of the mentioned cases, the same persistent roadblocks emerge, highlighting the need for a conceptual breakthrough in the field to address the challenge of compatibility between XAI methods and application tasks.
    摘要 我们的工作为统一当代可解释人工智能（XAI）面临的挑战提供了一个框架。我们指出，尽管XAI方法可以为机器学习模型提供补充性且可能有用的输出，研究人员和决策者仍应注意其概念与技术上的局限，这些局限常常使这些方法本身也变成黑盒子。我们考察了涵盖图像、文本和图数据的三个XAI研究方向，涉及显著性、注意力和图类型解释器。尽管上述案例的背景和时间各不相同，却出现了同样持续存在的障碍，这凸显了该领域需要概念上的突破，以解决XAI方法与应用任务之间的兼容性难题。

Deep Neural Networks for Semiparametric Frailty Models via H-likelihood

  • paper_url: http://arxiv.org/abs/2307.06581
  • repo_url: None
  • paper_authors: Hangbin Lee, IL DO HA, Youngjo Lee
  • for: 针对聚类（分组）事件时间数据的预测问题，提出了一种新的基于深度神经网络的伽马脆弱（gamma frailty）模型（DNN-FM）。
  • methods: 该模型以剖面掉非参数基线风险后得到的负h-似然作为损失函数；联合最大化新的h-似然可同时给出固定参数的最大似然估计和随机脆弱项的最佳无偏预测。
  • results: 实验研究表明，所提方法提升了现有方法的预测性能；一项真实数据分析表明，引入个体特定的脆弱项有助于改进基于DNN的Cox模型（DNN-Cox）的预测。
    Abstract For prediction of clustered time-to-event data, we propose a new deep neural network based gamma frailty model (DNN-FM). An advantage of the proposed model is that the joint maximization of the new h-likelihood provides maximum likelihood estimators for fixed parameters and best unbiased predictors for random frailties. Thus, the proposed DNN-FM is trained by using a negative profiled h-likelihood as a loss function, constructed by profiling out the non-parametric baseline hazard. Experimental studies show that the proposed method enhances the prediction performance of the existing methods. A real data analysis shows that the inclusion of subject-specific frailties helps to improve prediction of the DNN based Cox model (DNN-Cox).
    摘要 针对聚类事件时间数据的预测，我们提出了一种新的基于深度神经网络的伽马脆弱模型（DNN-FM）。该模型的一个优点是：联合最大化新的h-似然可同时给出固定参数的最大似然估计和随机脆弱项的最佳无偏预测。因此，所提出的DNN-FM以剖面掉非参数基线风险后得到的负h-似然作为损失函数进行训练。实验研究表明，所提方法提升了现有方法的预测性能；一项真实数据分析表明，引入个体特定的脆弱项有助于改进基于DNN的Cox模型（DNN-Cox）的预测。

Efficient SGD Neural Network Training via Sublinear Activated Neuron Identification

  • paper_url: http://arxiv.org/abs/2307.06565
  • repo_url: None
  • paper_authors: Lianke Qin, Zhao Song, Yuanyuan Yang
  • for: 这篇论文的目的是提出一种高效的神经网络训练方法，并给出可证明的收敛保证。
  • methods: 本论文使用了一种名为“static half-space report”的数据结构,并使用了一个具有内置的二层全连接神经网络来实现活化神经元识别。
  • results: 本论文证明了其训练方法可以在 $O(M^2/\epsilon^2)$ 时间内收敛，其中网络规模与系数范数上界 $M$ 成二次关系，$\epsilon$ 为误差项。
    Abstract Deep learning has been widely used in many fields, but the model training process usually consumes massive computational resources and time. Therefore, designing an efficient neural network training method with a provable convergence guarantee is a fundamental and important research question. In this paper, we present a static half-space report data structure that consists of a fully connected two-layer neural network for shifted ReLU activation to enable activated neuron identification in sublinear time via geometric search. We also prove that our algorithm can converge in $O(M^2/\epsilon^2)$ time with network size quadratic in the coefficient norm upper bound $M$ and error term $\epsilon$.
    摘要 深度学习已广泛应用于许多领域，但模型训练过程通常消耗大量计算资源和时间。因此，设计高效且具有可证明收敛保证的神经网络训练方法是一个基本而重要的研究问题。在这篇论文中，我们提出了一种静态半空间报告（static half-space report）数据结构，用于在带平移ReLU激活的两层全连接神经网络中通过几何搜索以次线性时间识别被激活的神经元。我们还证明了我们的算法可以在 $O(M^2/\epsilon^2)$ 时间内收敛，其中网络规模与系数范数上界 $M$ 成二次关系，$\epsilon$ 为误差项。

Prescriptive Process Monitoring Under Resource Constraints: A Reinforcement Learning Approach

  • paper_url: http://arxiv.org/abs/2307.06564
  • repo_url: https://github.com/mshoush/rl-prescriptive-monitoring
  • paper_authors: Mahmoud Shoush, Marlon Dumas
  • for: 本文旨在通过在运行时触发干预来优化业务流程的性能，提高案例取得正面结果的概率。
  • methods: 本文采用强化学习方法，通过试错来学习干预策略，并结合保形预测（conformal prediction）技术刻画预测的不确定性。
  • results: 基于真实数据集的实验结果表明，利用保形预测显式建模预测不确定性，有助于强化学习智能体收敛到净干预收益更高的策略。
    Abstract Prescriptive process monitoring methods seek to optimize the performance of business processes by triggering interventions at runtime, thereby increasing the probability of positive case outcomes. These interventions are triggered according to an intervention policy. Reinforcement learning has been put forward as an approach to learning intervention policies through trial and error. Existing approaches in this space assume that the number of resources available to perform interventions in a process is unlimited, an unrealistic assumption in practice. This paper argues that, in the presence of resource constraints, a key dilemma in the field of prescriptive process monitoring is to trigger interventions based not only on predictions of their necessity, timeliness, or effect but also on the uncertainty of these predictions and the level of resource utilization. Indeed, committing scarce resources to an intervention when the necessity or effects of this intervention are highly uncertain may intuitively lead to suboptimal intervention effects. Accordingly, the paper proposes a reinforcement learning approach for prescriptive process monitoring that leverages conformal prediction techniques to consider the uncertainty of the predictions upon which an intervention decision is based. An evaluation using real-life datasets demonstrates that explicitly modeling uncertainty using conformal predictions helps reinforcement learning agents converge towards policies with higher net intervention gain
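As background on the uncertainty-quantification ingredient, a minimal split-conformal sketch is shown below; it illustrates the general technique (only commit scarce intervention resources to cases whose predicted net gain is confidently positive), not the paper's actual policy, and all numbers are synthetic.

```python
import numpy as np

def conformal_interval(cal_pred, cal_true, test_pred, alpha=0.1):
    """Split-conformal interval with roughly (1 - alpha) marginal coverage."""
    scores = np.abs(cal_true - cal_pred)                 # calibration residuals
    n = len(scores)
    q = np.quantile(scores, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))
    return test_pred - q, test_pred + q

rng = np.random.default_rng(0)
cal_pred = rng.normal(size=200)
cal_true = cal_pred + rng.normal(scale=0.3, size=200)    # pretend model errors on calibration data
gain_pred = np.array([0.8, 0.1, -0.2])                   # predicted net intervention gain per case
lo, hi = conformal_interval(cal_pred, cal_true, gain_pred)
confident_positive = lo > 0                              # example gating rule for scarce resources
print(np.round(lo, 2), np.round(hi, 2), confident_positive)
```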

Nested Elimination: A Simple Algorithm for Best-Item Identification from Choice-Based Feedback

  • paper_url: http://arxiv.org/abs/2307.09295
  • repo_url: None
  • paper_authors: Junwen Yang, Yifan Feng
  • for: 本研究的目标是基于选择型反馈，用尽可能少的样本、以较高的置信水平识别最受偏好的商品。
  • methods: 本文提出了一种嵌套消除算法（Nested Elimination, NE），其灵感来自信息论下界所蕴含的嵌套结构。NE结构简单、易于实现，并对样本复杂度具有很强的理论保证。
  • results: 本文给出了关于NE期望样本复杂度的实例相关、非渐近的上界，并证明NE在最坏情况下达到高阶渐近最优。来自合成数据和真实数据的数值实验印证了理论结果。
    Abstract We study the problem of best-item identification from choice-based feedback. In this problem, a company sequentially and adaptively shows display sets to a population of customers and collects their choices. The objective is to identify the most preferred item with the least number of samples and at a high confidence level. We propose an elimination-based algorithm, namely Nested Elimination (NE), which is inspired by the nested structure implied by the information-theoretic lower bound. NE is simple in structure, easy to implement, and has a strong theoretical guarantee for sample complexity. Specifically, NE utilizes an innovative elimination criterion and circumvents the need to solve any complex combinatorial optimization problem. We provide an instance-specific and non-asymptotic bound on the expected sample complexity of NE. We also show NE achieves high-order worst-case asymptotic optimality. Finally, numerical experiments from both synthetic and real data corroborate our theoretical findings.
    摘要 我们研究基于选择型反馈的最优商品识别问题。在该问题中，公司按顺序、自适应地向顾客群体展示商品集合并收集他们的选择，目标是用尽可能少的样本、以较高的置信水平识别最受偏好的商品。我们提出了一种基于消除思想的算法，即嵌套消除（Nested Elimination, NE），其灵感来自信息论下界所蕴含的嵌套结构。NE结构简单、易于实现，并对样本复杂度具有很强的理论保证；它采用了一个新颖的消除准则，避免了求解任何复杂的组合优化问题。我们给出了NE期望样本复杂度的实例相关、非渐近上界，并证明NE在最坏情况下达到高阶渐近最优。最后，来自合成数据和真实数据的数值实验印证了我们的理论发现。

Metal Oxide-based Gas Sensor Array for the VOCs Analysis in Complex Mixtures using Machine Learning

  • paper_url: http://arxiv.org/abs/2307.06556
  • repo_url: None
  • paper_authors: Shivam Singh, Sajana S, Poornima, Gajje Sreelekha, Chandranath Adak, Rajendra P. Shukla, Vinayak Kamble
  • for: 这个研究目的是为了开发一个能够同时识别和预测多种有机气体的感应器阵列,以便非侵入性地检测疾病。
  • methods: 这个研究使用了三种金属酸电极的感应器阵列,并使用机器学习方法来识别四种不同的有机气体。
  • results: 研究发现,使用机器学习方法可以实现99%以上的准确率来识别不同的化学物质,并且在预测化学物质浓度方面也有出色的效果。
    Abstract Detection of Volatile Organic Compounds (VOCs) from the breath is becoming a viable route for the early detection of diseases non-invasively. This paper presents a sensor array with three metal oxide electrodes that can use machine learning methods to identify four distinct VOCs in a mixture. The metal oxide sensor array was subjected to various VOC concentrations, including ethanol, acetone, toluene and chloroform. The dataset obtained from individual gases and their mixtures were analyzed using multiple machine learning algorithms, such as Random Forest (RF), K-Nearest Neighbor (KNN), Decision Tree, Linear Regression, Logistic Regression, Naive Bayes, Linear Discriminant Analysis, Artificial Neural Network, and Support Vector Machine. KNN and RF have shown more than 99% accuracy in classifying different varying chemicals in the gas mixtures. In regression analysis, KNN has delivered the best results with R2 value of more than 0.99 and LOD of 0.012, 0.015, 0.014 and 0.025 PPM for predicting the concentrations of varying chemicals Acetone, Toluene, Ethanol, and Chloroform, respectively in complex mixtures. Therefore, it is demonstrated that the array utilizing the provided algorithms can classify and predict the concentrations of the four gases simultaneously for disease diagnosis and treatment monitoring.
    摘要 从呼气中检测挥发性有机化合物（VOC）正成为非侵入式早期疾病检测的一条可行途径。本文提出了一个由三个金属氧化物电极组成的传感器阵列，结合机器学习方法可识别混合物中四种不同的VOC。该阵列在不同浓度的VOC下进行了测试，包括乙醇、丙酮、甲苯和氯仿。所得到的单一气体及其混合物数据由多种机器学习算法进行分析，包括随机森林（RF）、K近邻（KNN）、决策树、线性回归、逻辑回归、朴素贝叶斯、线性判别分析、人工神经网络和支持向量机。KNN和RF在区分气体混合物中不同化学物质时准确率超过99%。在回归分析中，KNN取得了最佳结果，R2值超过0.99，对复杂混合物中丙酮、甲苯、乙醇和氯仿浓度预测的检出限（LOD）分别为0.012、0.015、0.014和0.025 ppm。因此，本文证明了该阵列与上述算法可以同时对四种气体进行分类并预测其浓度，用于疾病诊断和治疗监测。
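An illustrative sketch of the two tasks described above (identify the gas, then regress its concentration) with scikit-learn follows; the sensor readings and labels are random placeholders, so the printed scores are meaningless except as a template.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Three metal-oxide sensor responses per sample
X = rng.normal(size=(120, 3))
gas = rng.integers(0, 4, size=120)      # 0: ethanol, 1: acetone, 2: toluene, 3: chloroform
ppm = rng.uniform(0.1, 5.0, size=120)   # concentration label

clf = KNeighborsClassifier(n_neighbors=5)
reg = KNeighborsRegressor(n_neighbors=5)
print("classification accuracy:", cross_val_score(clf, X, gas, cv=5).mean().round(3))
print("regression R^2:", cross_val_score(reg, X, ppm, cv=5, scoring="r2").mean().round(3))
```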

Deep Network Approximation: Beyond ReLU to Diverse Activation Functions

  • paper_url: http://arxiv.org/abs/2307.06555
  • repo_url: None
  • paper_authors: Shijun Zhang, Jianfeng Lu, Hongkai Zhao
  • for: 这个论文探讨了深度神经网络在不同的活动函数下的表达能力。
  • methods: 论文使用了一个活动函数集合 $\mathscr{A}$,其包括大多数常用的活动函数,如 $\mathtt{ReLU}$, $\mathtt{LeakyReLU}$, $\mathtt{ReLU}^2$, $\mathtt{ELU}$, $\mathtt{SELU}$, $\mathtt{Softplus}$, $\mathtt{GELU}$, $\mathtt{SiLU}$, $\mathtt{Swish}$, $\mathtt{Mish}$, $\mathtt{Sigmoid}$, $\mathtt{Tanh}$, $\mathtt{Arctan}$, $\mathtt{Softsign}$, $\mathtt{dSiLU}$, 和 $\mathtt{SRS}$.
  • results: 论文表明,对任意活动函数 $\varrho\in \mathscr{A}$,一个 $\mathtt{ReLU}$ 网络宽度为 $N$,深度为 $L$ 可以在任何绝对上被 $\varrho$-活动的网络宽度为 $6N$,深度为 $2L$ 所 aproximated 到任何精度。这一发现使得大多数approximation结果在 $\mathtt{ReLU}$ 网络上得到的结果可以被推广到各种其他活动函数,只是需要略大些常数。
    Abstract This paper explores the expressive power of deep neural networks for a diverse range of activation functions. An activation function set $\mathscr{A}$ is defined to encompass the majority of commonly used activation functions, such as $\mathtt{ReLU}$, $\mathtt{LeakyReLU}$, $\mathtt{ReLU}^2$, $\mathtt{ELU}$, $\mathtt{SELU}$, $\mathtt{Softplus}$, $\mathtt{GELU}$, $\mathtt{SiLU}$, $\mathtt{Swish}$, $\mathtt{Mish}$, $\mathtt{Sigmoid}$, $\mathtt{Tanh}$, $\mathtt{Arctan}$, $\mathtt{Softsign}$, $\mathtt{dSiLU}$, and $\mathtt{SRS}$. We demonstrate that for any activation function $\varrho\in \mathscr{A}$, a $\mathtt{ReLU}$ network of width $N$ and depth $L$ can be approximated to arbitrary precision by a $\varrho$-activated network of width $6N$ and depth $2L$ on any bounded set. This finding enables the extension of most approximation results achieved with $\mathtt{ReLU}$ networks to a wide variety of other activation functions, at the cost of slightly larger constants.
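Schematically, the main relation reported in the abstract can be written as follows, where the notation $\mathcal{NN}_{\varrho}(\text{width},\text{depth})$ for a network class is ours and the precise norms, constructions, and hypotheses are in the paper:

```latex
% Any ReLU network of width N and depth L is matched, to accuracy \varepsilon on
% a bounded set K, by some \varrho-activated network of width 6N and depth 2L.
\forall\, \varrho \in \mathscr{A},\ \forall\, \phi \in \mathcal{NN}_{\mathtt{ReLU}}(N, L),\ \forall\, \varepsilon > 0,\
\exists\, \psi \in \mathcal{NN}_{\varrho}(6N,\, 2L):\quad
\sup_{x \in K} \bigl|\phi(x) - \psi(x)\bigr| \le \varepsilon, \qquad K \subset \mathbb{R}^d \text{ bounded.}
```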

Causal Influences over Social Learning Networks

  • paper_url: http://arxiv.org/abs/2307.09575
  • repo_url: https://github.com/Aryia-Behroziuan/References
  • paper_authors: Mert Kayaalp, Ali H. Sayed
  • for: 本研究探讨了社交网络上连接的代理之间的 causal 影响,特别是社交学习模型和分布式决策协议的动态。
  • methods: 本研究使用了表达式描述代理对对之间的 causal 关系,并解释了信息流动在网络上的方式。
  • results: 研究发现代理之间的影响关系取决于社交网络的拓扑结构和每个代理对推理问题的信息水平。提出了一种算法来评估代理之间的全局影响,并提供了从Raw observational data中学习模型参数的方法。
    Abstract This paper investigates causal influences between agents linked by a social graph and interacting over time. In particular, the work examines the dynamics of social learning models and distributed decision-making protocols, and derives expressions that reveal the causal relations between pairs of agents and explain the flow of influence over the network. The results turn out to be dependent on the graph topology and the level of information that each agent has about the inference problem they are trying to solve. Using these conclusions, the paper proposes an algorithm to rank the overall influence between agents to discover highly influential agents. It also provides a method to learn the necessary model parameters from raw observational data. The results and the proposed algorithm are illustrated by considering both synthetic data and real Twitter data.

Full-resolution Lung Nodule Segmentation from Chest X-ray Images using Residual Encoder-Decoder Networks

  • paper_url: http://arxiv.org/abs/2307.06547
  • repo_url: None
  • paper_authors: Michael James Horry, Subrata Chakraborty, Biswajeet Pradhan, Manoranjan Paul, Jing Zhu, Prabal Datta Barua, U. Rajendra Acharya, Fang Chen, Jianlong Zhou
  • for: 用于胸片上肺癌的早期诊断，以提高肺癌患者的生存率。
  • methods: 使用高效的残差编码器-解码器神经网络直接处理全分辨率图像，避免下采样带来的信号损失。
  • results: 实现了肺结节定位，内部测试中灵敏度达85%、每张图像8个假阳性，经形态学假阳性削减后在假阳性率为6时灵敏度为81%，且推理时间不足一秒。
    Abstract Lung cancer is the leading cause of cancer death and early diagnosis is associated with a positive prognosis. Chest X-ray (CXR) provides an inexpensive imaging mode for lung cancer diagnosis. Suspicious nodules are difficult to distinguish from vascular and bone structures using CXR. Computer vision has previously been proposed to assist human radiologists in this task, however, leading studies use down-sampled images and computationally expensive methods with unproven generalization. Instead, this study localizes lung nodules using efficient encoder-decoder neural networks that process full resolution images to avoid any signal loss resulting from down-sampling. Encoder-decoder networks are trained and tested using the JSRT lung nodule dataset. The networks are used to localize lung nodules from an independent external CXR dataset. Sensitivity and false positive rates are measured using an automated framework to eliminate any observer subjectivity. These experiments allow for the determination of the optimal network depth, image resolution and pre-processing pipeline for generalized lung nodule localization. We find that nodule localization is influenced by subtlety, with more subtle nodules being detected in earlier training epochs. Therefore, we propose a novel self-ensemble model from three consecutive epochs centered on the validation optimum. This ensemble achieved a sensitivity of 85% in 10-fold internal testing with false positives of 8 per image. A sensitivity of 81% is achieved at a false positive rate of 6 following morphological false positive reduction. This result is comparable to more computationally complex systems based on linear and spatial filtering, but with a sub-second inference time that is faster than other methods. The proposed algorithm achieved excellent generalization results against an external dataset with sensitivity of 77% at a false positive rate of 7.6.
    摘要 肺癌是癌症死亡的首要原因，早期诊断与良好预后相关。胸部X光（CXR）为肺癌诊断提供了一种低成本的影像手段，但在CXR上疑似结节很难与血管和骨骼结构区分开。此前已有研究提出用计算机视觉辅助放射科医生完成这一任务，但主流工作使用降采样图像以及计算代价高昂且泛化性未经验证的方法。与之不同，本研究使用高效的编码器-解码器神经网络直接处理全分辨率图像来定位肺结节，避免了降采样造成的信号损失。编码器-解码器网络在JSRT肺结节数据集上训练和测试，并用于在一个独立的外部CXR数据集上定位肺结节；灵敏度和假阳性率由自动化框架测得，以消除观察者的主观性。这些实验确定了泛化肺结节定位的最优网络深度、图像分辨率和预处理流程。我们发现结节定位受细微程度影响，较细微的结节在较早的训练轮次中即被检测到；因此我们提出了以验证集最优点为中心、由三个连续训练轮次构成的自集成模型。该集成在十折内部测试中取得85%的灵敏度、每张图像8个假阳性；经形态学假阳性削减后，在假阳性率为6时灵敏度为81%。这一结果与基于线性与空间滤波的更复杂系统相当，但推理时间不足一秒，快于其他方法。所提算法在外部数据集上也取得了出色的泛化效果：在假阳性率为7.6时灵敏度为77%。

On the Effective Horizon of Inverse Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.06541
  • repo_url: None
  • paper_authors: Yiqing Xu, Finale Doshi-Velez, David Hsu
  • for: 本文研究逆强化学习（IRL）算法，这类算法通常依靠前向强化学习或规划，为假设的奖励函数计算近似最优策略，再将该策略与专家示范相匹配。
  • methods: 本文使用了时间框架来控制IRL算法的计算效率和奖励函数的准确性。
  • results: 本文的实验结果证明,使用有效的时间框架可以更快地获得更好的结果,并且可以避免过度适应。此外,本文还提出了一种jointly学习奖励函数和有效时间框架的方法,这种方法在实验中获得了好的结果。
    Abstract Inverse reinforcement learning (IRL) algorithms often rely on (forward) reinforcement learning or planning over a given time horizon to compute an approximately optimal policy for a hypothesized reward function and then match this policy with expert demonstrations. The time horizon plays a critical role in determining both the accuracy of reward estimate and the computational efficiency of IRL algorithms. Interestingly, an effective time horizon shorter than the ground-truth value often produces better results faster. This work formally analyzes this phenomenon and provides an explanation: the time horizon controls the complexity of an induced policy class and mitigates overfitting with limited data. This analysis leads to a principled choice of the effective horizon for IRL. It also prompts us to reexamine the classic IRL formulation: it is more natural to learn jointly the reward and the effective horizon together rather than the reward alone with a given horizon. Our experimental results confirm the theoretical analysis.
    摘要 逆强化学习（Inverse Reinforcement Learning，IRL）算法通常依靠前向强化学习或在给定时间范围上的规划，为假设的奖励函数计算近似最优策略，再将该策略与专家示范相匹配。时间范围对奖励估计的准确性和IRL算法的计算效率都起着关键作用。有趣的是，取一个比真实值更短的有效时间范围往往能更快地得到更好的结果。本研究对这一现象进行了形式化分析并给出解释：时间范围控制了所诱导策略类的复杂度，从而在数据有限时缓解过拟合。这一分析为IRL中有效时间范围的选取提供了原则性的依据，也促使我们重新审视经典的IRL表述：与其在给定时间范围下只学习奖励，不如同时联合学习奖励和有效时间范围。我们的实验结果证实了上述理论分析。

Convolutional Neural Networks for Sentiment Analysis on Weibo Data: A Natural Language Processing Approach

  • paper_url: http://arxiv.org/abs/2307.06540
  • repo_url: None
  • paper_authors: Yufei Xie, Rodolfo C. Raga Jr
  • for: 这个研究旨在使用卷积神经网络(CNN)进行微博上的情感分析任务,提供了一种新的自然语言处理(NLP)方法。
  • methods: 该研究使用了精心预处理、分词和分类的方法,并使用了word embedding来进行特征提取。使用了CNN模型进行情感分类任务,并在测试集上达到了大约0.73的macro-average F1分数。
  • results: 该研究发现，CNN模型在正面、中性和负面三类情感分类上取得了较为均衡的性能（测试集宏平均F1约为0.73），可用于社交媒体分析、市场调查和政策研究等实际应用。
    Abstract This study addressed the complex task of sentiment analysis on a dataset of 119,988 original tweets from Weibo using a Convolutional Neural Network (CNN), offering a new approach to Natural Language Processing (NLP). The data, sourced from Baidu's PaddlePaddle AI platform, were meticulously preprocessed, tokenized, and categorized based on sentiment labels. A CNN-based model was utilized, leveraging word embeddings for feature extraction, and trained to perform sentiment classification. The model achieved a macro-average F1-score of approximately 0.73 on the test set, showing balanced performance across positive, neutral, and negative sentiments. The findings underscore the effectiveness of CNNs for sentiment analysis tasks, with implications for practical applications in social media analysis, market research, and policy studies. The complete experimental content and code have been made publicly available on the Kaggle data platform for further research and development. Future work may involve exploring different architectures, such as Recurrent Neural Networks (RNN) or transformers, or using more complex pre-trained models like BERT, to further improve the model's ability to understand linguistic nuances and context.
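A minimal 1-D CNN text classifier in the spirit of the described setup is sketched below (embedding, convolution, global pooling, 3-way sentiment head); vocabulary size, sequence length, and other hyperparameters are placeholders rather than the study's settings.

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self, vocab_size=50_000, emb_dim=128, n_classes=3):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, 100, kernel_size=3, padding=1)
        self.head = nn.Linear(100, n_classes)

    def forward(self, token_ids):                      # (batch, seq_len)
        x = self.emb(token_ids).transpose(1, 2)        # (batch, emb_dim, seq_len)
        x = torch.relu(self.conv(x))
        x = x.max(dim=2).values                        # global max pooling over time
        return self.head(x)

model = TextCNN()
logits = model(torch.randint(0, 50_000, (4, 64)))      # 4 dummy tweets of 64 tokens
print(logits.shape)                                    # torch.Size([4, 3])
```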

Tensor Decompositions Meet Control Theory: Learning General Mixtures of Linear Dynamical Systems

  • paper_url: http://arxiv.org/abs/2307.06538
  • repo_url: None
  • paper_authors: Ainesh Bakshi, Allen Liu, Ankur Moitra, Morris Yau
  • for: 学习混合线性动力系统,以提高时间序列数据的预测和理解。
  • methods: 提出一种基于张量分解的方法来学习线性动力系统的混合，不需要各分量之间的强分离条件，并能与贝叶斯最优的轨迹聚类相竞争。
  • results: 该算法在具有挑战性的部分可观测设定下同样适用，有助于提升对时间序列数据的预测与理解。
    Abstract Recently Chen and Poor initiated the study of learning mixtures of linear dynamical systems. While linear dynamical systems already have wide-ranging applications in modeling time-series data, using mixture models can lead to a better fit or even a richer understanding of underlying subpopulations represented in the data. In this work we give a new approach to learning mixtures of linear dynamical systems that is based on tensor decompositions. As a result, our algorithm succeeds without strong separation conditions on the components, and can be used to compete with the Bayes optimal clustering of the trajectories. Moreover our algorithm works in the challenging partially-observed setting. Our starting point is the simple but powerful observation that the classic Ho-Kalman algorithm is a close relative of modern tensor decomposition methods for learning latent variable models. This gives us a playbook for how to extend it to work with more complicated generative models.
    摘要 最近，Chen 和 Poor 开启了学习线性动力系统混合模型的研究。线性动力系统本身已在时间序列数据建模中有着广泛应用，而使用混合模型可以得到更好的拟合，甚至更深入地理解数据中所蕴含的各个子群体。在这项工作中，我们提出了一种基于张量分解的学习线性动力系统混合的新方法。因此，我们的算法不需要各分量之间的强分离条件，并能与贝叶斯最优的轨迹聚类相竞争；此外，它在具有挑战性的部分可观测设定下也能工作。我们的出发点是一个简单但有力的观察：经典的 Ho-Kalman 算法与现代用于学习隐变量模型的张量分解方法是近亲，这为我们将其推广到更复杂的生成模型提供了一套思路。

DSV: An Alignment Validation Loss for Self-supervised Outlier Model Selection

  • paper_url: http://arxiv.org/abs/2307.06534
  • repo_url: https://github.com/jaeminyoo/dsv
  • paper_authors: Jaemin Yoo, Yue Zhao, Lingxiao Zhao, Leman Akoglu
  • for: 这篇论文关注如何运用自监督学习（self-supervised learning）进行无监督异常检测，并指出此类方法的成败关键在于数据增强函数超参数的调校。
  • methods: 本文提出了一个名为 DSV（Discordance and Separability Validation）的无监督验证损失，透过近似测试数据的不一致度与可分性的替代损失，来挑选增强超参数有效、性能高的检测模型。
  • results: 实验结果显示，DSV 能够帮助选出高性能的检测模型，在 21 个真实任务上优于多种基准方法。
    Abstract Self-supervised learning (SSL) has proven effective in solving various problems by generating internal supervisory signals. Unsupervised anomaly detection, which faces the high cost of obtaining true labels, is an area that can greatly benefit from SSL. However, recent literature suggests that tuning the hyperparameters (HP) of data augmentation functions is crucial to the success of SSL-based anomaly detection (SSAD), yet a systematic method for doing so remains unknown. In this work, we propose DSV (Discordance and Separability Validation), an unsupervised validation loss to select high-performing detection models with effective augmentation HPs. DSV captures the alignment between an augmentation function and the anomaly-generating mechanism with surrogate losses, which approximate the discordance and separability of test data, respectively. As a result, the evaluation via DSV leads to selecting an effective SSAD model exhibiting better alignment, which results in high detection accuracy. We theoretically derive the degree of approximation conducted by the surrogate losses and empirically show that DSV outperforms a wide range of baselines on 21 real-world tasks.

Artificial Intelligence for Drug Discovery: Are We There Yet?

  • paper_url: http://arxiv.org/abs/2307.06521
  • repo_url: None
  • paper_authors: Catrin Hasselgren, Tudor I. Oprea
  • for: 本研究旨在探讨用数据科学、信息学和人工智能(AI)加速有效药物开发,降低成本和动物实验。
  • methods: 本研究使用AI技术,如生成化学、机器学习和多属性优化,对疾病、目标和治疗方式进行三大柱子的应用,主要关注小分子药物。
  • results: AI技术已经使得许多化合物进入临床试验阶段,但科学社区必须仔细评估已知信息,解决复制危机。AI在药物发现中的潜力只能在有足够的基础知识和人类 intervene later ipeline 阶段得到实现。
    Abstract Drug discovery is adapting to novel technologies such as data science, informatics, and artificial intelligence (AI) to accelerate effective treatment development while reducing costs and animal experiments. AI is transforming drug discovery, as indicated by increasing interest from investors, industrial and academic scientists, and legislators. Successful drug discovery requires optimizing properties related to pharmacodynamics, pharmacokinetics, and clinical outcomes. This review discusses the use of AI in the three pillars of drug discovery: diseases, targets, and therapeutic modalities, with a focus on small molecule drugs. AI technologies, such as generative chemistry, machine learning, and multi-property optimization, have enabled several compounds to enter clinical trials. The scientific community must carefully vet known information to address the reproducibility crisis. The full potential of AI in drug discovery can only be realized with sufficient ground truth and appropriate human intervention at later pipeline stages.
    摘要 药物发现正在采用数据科学、信息学和人工智能（AI）等新技术，以加速有效治疗的开发，同时降低成本并减少动物实验。AI正在改变药物发现，投资者、产业界与学术界科学家以及立法者的兴趣都在不断增加。成功的药物发现需要优化与药效学、药代动力学和临床结局相关的属性。本文评述了AI在药物发现三大支柱（疾病、靶点和治疗方式）中的应用，重点关注小分子药物。生成化学、机器学习和多属性优化等AI技术已经使若干化合物进入临床试验。科学界必须仔细审核已知信息，以应对可重复性危机。只有在具备足够的真实基准数据、并在管线后期阶段辅以适当的人工介入时，AI在药物发现中的潜力才能充分发挥。

Machine Learning practices and infrastructures

  • paper_url: http://arxiv.org/abs/2307.06518
  • repo_url: https://github.com/Nikolay-Lysenko/readingbricks
  • paper_authors: Glen Berman
  • for: This paper focuses on the interactions between practitioners and the tools they use in machine learning (ML) practices, and how these interactions shape the development of ML systems.
  • methods: The paper uses an empirical study of questions asked on the Stack Exchange forums to explore the use of interactive computing platforms (e.g. Jupyter Notebook and Google Colab) in ML practices.
  • results: The paper finds that interactive computing platforms are used in a variety of learning and coordination practices, which constitutes an infrastructural relationship between interactive computing platforms and ML practitioners. The paper also highlights how this relationship risks making invisible aspects of the ML life cycle that are important for the societal impact of deployed ML systems.
    Abstract Machine Learning (ML) systems, particularly when deployed in high-stakes domains, are deeply consequential. They can exacerbate existing inequities, create new modes of discrimination, and reify outdated social constructs. Accordingly, the social context (i.e. organisations, teams, cultures) in which ML systems are developed is a site of active research for the field of AI ethics, and intervention for policymakers. This paper focuses on one aspect of social context that is often overlooked: interactions between practitioners and the tools they rely on, and the role these interactions play in shaping ML practices and the development of ML systems. In particular, through an empirical study of questions asked on the Stack Exchange forums, the use of interactive computing platforms (e.g. Jupyter Notebook and Google Colab) in ML practices is explored. I find that interactive computing platforms are used in a host of learning and coordination practices, which constitutes an infrastructural relationship between interactive computing platforms and ML practitioners. I describe how ML practices are co-evolving alongside the development of interactive computing platforms, and highlight how this risks making invisible aspects of the ML life cycle that AI ethics researchers' have demonstrated to be particularly salient for the societal impact of deployed ML systems.
    摘要 Through an empirical study of questions asked on the Stack Exchange forums, this paper explores the use of interactive computing platforms (such as Jupyter Notebook and Google Colab) in ML practices. I find that these platforms are used in a variety of learning and coordination practices, which forms an infrastructural relationship between interactive computing platforms and ML practitioners.I describe how ML practices are co-evolving alongside the development of interactive computing platforms, and highlight how this risks making certain aspects of the ML life cycle invisible to AI ethics researchers. These invisible aspects have been shown to be particularly important for the societal impact of deployed ML systems.

Leveraging Contextual Counterfactuals Toward Belief Calibration

  • paper_url: http://arxiv.org/abs/2307.06513
  • repo_url: None
  • paper_authors: Qiuyi, Zhang, Michael S. Lee, Sherol Chen
  • for: 这个研究旨在探讨如何将人类价值观和信念汇入到人工智能系统中,以便更好地将人类价值观与AI系统的决策联系起来。
  • methods: 这个研究使用了一种名为“信念整合”的过程,将人类价值观和信念与AI系统的决策过程整合起来。研究者还提出了一个名为“信念整合循环”的框架,用于更好地调整人类价值观和信念的多样性,并使用多个目标优化来实现这一目的。
  • results: 研究者透过实际应用“信念整合循环”框架,发现可以在不同的 контек斯中找到一个共识的优化结果,即可以将人类价值观和信念与AI系统的决策过程整合起来,以提高AI系统的决策 accuracy。
    Abstract Beliefs and values are increasingly being incorporated into our AI systems through alignment processes, such as carefully curating data collection principles or regularizing the loss function used for training. However, the meta-alignment problem is that these human beliefs are diverse and not aligned across populations; furthermore, the implicit strength of each belief may not be well calibrated even among humans, especially when trying to generalize across contexts. Specifically, in high regret situations, we observe that contextual counterfactuals and recourse costs are particularly important in updating a decision maker's beliefs and the strengths to which such beliefs are held. Therefore, we argue that including counterfactuals is key to an accurate calibration of beliefs during alignment. To do this, we first segment belief diversity into two categories: subjectivity (across individuals within a population) and epistemic uncertainty (within an individual across different contexts). By leveraging our notion of epistemic uncertainty, we introduce `the belief calibration cycle' framework to more holistically calibrate this diversity of beliefs with context-driven counterfactual reasoning by using a multi-objective optimization. We empirically apply our framework for finding a Pareto frontier of clustered optimal belief strengths that generalize across different contexts, demonstrating its efficacy on a toy dataset for credit decisions.
    摘要 信仰和价值在我们的人工智能系统中越来越被包含,通过谨慎地制定数据收集原则或者训练过程中的损失函数规范化。然而,我们称之为“高痛苦问题”的是,人类的信仰各自不同,而且在不同的人群中并不协调。尤其是在扩展到不同上下文时,人类的偏见可能并不准确。因此,我们认为包含对话框架是对准信仰的准确均衡的关键。我们将信仰多样性分为两类:个人差异(在人口内部)和知识不确定性(在个体内部不同上下文中)。通过我们的知识不确定性概念,我们提出了“信仰均衡ecycle”框架,用于更全面地均衡这些多样性的信仰,通过context驱动的对话框架来实现。我们在一个假设问题上采用多目标优化,实际应用了我们的框架,并证明其在不同上下文中的普遍性。

Hybrid Control Policy for Artificial Pancreas via Ensemble Deep Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.06501
  • repo_url: None
  • paper_authors: Wenzhou Lv, Tianyu Wu, Luolin Xiong, Liang Wu, Jian Zhou, Yang Tang, Feng Qian
  • for: 这项研究的目的是开发能够为1型糖尿病（T1DM）患者实现闭环血糖控制的人工胰腺系统（AP），以改善其血糖水平控制。
  • methods: 研究提出一种混合控制策略（HyCPAP），将模型预测控制（MPC）与集成式深度强化学习（DRL）相结合，以融合MPC的安全性和稳定性以及DRL的个性化与适应性；此外还引入元学习技术，利用既有经验和患者共享知识，在数据有限的情况下快速适应新患者。
  • results: 在FDA认可的UVA/Padova T1DM仿真器上的三种场景实验表明，该方法在目标血糖范围内的时间占比最高、低血糖发生次数最少，证明了所提控制策略在T1DM闭环血糖管理中的有效性。
    Abstract Objective: The artificial pancreas (AP) has shown promising potential in achieving closed-loop glucose control for individuals with type 1 diabetes mellitus (T1DM). However, designing an effective control policy for the AP remains challenging due to the complex physiological processes, delayed insulin response, and inaccurate glucose measurements. While model predictive control (MPC) offers safety and stability through the dynamic model and safety constraints, it lacks individualization and is adversely affected by unannounced meals. Conversely, deep reinforcement learning (DRL) provides personalized and adaptive strategies but faces challenges with distribution shifts and substantial data requirements. Methods: We propose a hybrid control policy for the artificial pancreas (HyCPAP) to address the above challenges. HyCPAP combines an MPC policy with an ensemble DRL policy, leveraging the strengths of both policies while compensating for their respective limitations. To facilitate faster deployment of AP systems in real-world settings, we further incorporate meta-learning techniques into HyCPAP, leveraging previous experience and patient-shared knowledge to enable fast adaptation to new patients with limited available data. Results: We conduct extensive experiments using the FDA-accepted UVA/Padova T1DM simulator across three scenarios. Our approaches achieve the highest percentage of time spent in the desired euglycemic range and the lowest occurrences of hypoglycemia. Conclusion: The results clearly demonstrate the superiority of our methods for closed-loop glucose management in individuals with T1DM. Significance: The study presents novel control policies for AP systems, affirming the great potential of proposed methods for efficient closed-loop glucose control.
    摘要 目标:人工胰腺(AP)在type 1 диабеت�ellitus(T1DM)患者中实现closed-loop血糖控制显示了承诺的潜力。然而,为AP设计有效的控制策略仍然是一个挑战,因为生物学过程复杂、延迟的胰岛响应和不准确的血糖测量。MPC(模型预测控制)可以提供安全性和稳定性通过动态模型和安全约束,但缺乏个性化和适应能力,并且在不期望的饭物上表现不佳。相反,深度学习(DRL)可以提供个性化和适应策略,但面临分布转移和大量数据要求。方法:我们提出一种hybrid控制策略(HyCPAP),以解决以上挑战。HyCPAP将MPC策略和DRL ensemble策略相结合,利用两者之间的优势,并弥补它们的相应局限性。为了更快地部署AP系统在实际 Settings中,我们还在HyCPAP中 интегрирова了meta-学习技术,利用前一个体验和患者共享的知识,以快速适应新的患者,并使用有限的可用数据进行适应。结果:我们在FDA所批准的UVA/Padova T1DM simulator上进行了广泛的实验,在三个场景中。我们的方法在血糖控制中度量最高,并且出现 hypoglycemia 的 случа数最低。结论:结果显示了我们的方法在T1DM患者中closed-loop血糖控制中的优势。重要性:这种控制策略可以减少AP系统中的血糖不稳定性和 hypoglycemia 的风险,提高患者的生活质量。
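To make the hybrid-policy idea concrete, here is a minimal sketch of fusing an MPC dose with an ensemble of DRL doses under a hard safety clip. The mixing rule, the clipping bound, and the `mpc_dose`/`drl_doses` callables are hypothetical stand-ins for illustration, not the HyCPAP controller itself.

```python
import numpy as np

def hybrid_insulin_dose(state, mpc_dose, drl_doses, w=0.5, max_dose=5.0):
    """Illustrative fusion of an MPC action with an ensemble of DRL actions.

    state:     current glucose/insulin observation vector
    mpc_dose:  callable returning a model-predictive-control dose
    drl_doses: list of callables, one per DRL ensemble member
    w:         mixing weight between MPC and DRL (hypothetical rule)
    """
    u_mpc = mpc_dose(state)
    u_drl = np.mean([pi(state) for pi in drl_doses])  # ensemble average
    u = w * u_mpc + (1.0 - w) * u_drl                 # blend the two policies
    return float(np.clip(u, 0.0, max_dose))           # hard safety clipping

# toy usage with dummy policies
dose = hybrid_insulin_dose(
    state=np.array([140.0, 0.2]),
    mpc_dose=lambda s: 1.2,
    drl_doses=[lambda s: 1.0, lambda s: 1.6],
)
print(dose)
```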

Microbial Genetic Algorithm-based Black-box Attack against Interpretable Deep Learning Systems

  • paper_url: http://arxiv.org/abs/2307.06496
  • repo_url: None
  • paper_authors: Eldor Abdukhamidov, Mohammed Abuhamad, Simon S. Woo, Eric Chan-Tin, Tamer Abuhmed
  • for: 本研究旨在攻击可解释深度学习模型(IDLS),以提高攻击者对这些模型的控制。
  • methods: 我们提出了一种基于转移和分数方法的 Query-efficient Score-based black-box attack,称为QuScore。这种攻击方法不需要知道目标模型和其相关的解释模型。
  • results: 我们的实验结果表明,QuScore 可以快速地找到可以欺骗 IDLS 的攻击样本,并且可以在不同的 DNN 模型和解释模型上实现高攻击成功率。在 ImageNet 和 CIFAR 数据集上,我们得到了95%-100% 的攻击成功率和69% 的平均传播率。
    Abstract Deep learning models are susceptible to adversarial samples in white and black-box environments. Although previous studies have shown high attack success rates, coupling DNN models with interpretation models could offer a sense of security when a human expert is involved, who can identify whether a given sample is benign or malicious. However, in white-box environments, interpretable deep learning systems (IDLSes) have been shown to be vulnerable to malicious manipulations. In black-box settings, as access to the components of IDLSes is limited, it becomes more challenging for the adversary to fool the system. In this work, we propose a Query-efficient Score-based black-box attack against IDLSes, QuScore, which requires no knowledge of the target model and its coupled interpretation model. QuScore is based on transfer-based and score-based methods by employing an effective microbial genetic algorithm. Our method is designed to reduce the number of queries necessary to carry out successful attacks, resulting in a more efficient process. By continuously refining the adversarial samples created based on feedback scores from the IDLS, our approach effectively navigates the search space to identify perturbations that can fool the system. We evaluate the attack's effectiveness on four CNN models (Inception, ResNet, VGG, DenseNet) and two interpretation models (CAM, Grad), using both ImageNet and CIFAR datasets. Our results show that the proposed approach is query-efficient with a high attack success rate that can reach between 95% and 100% and transferability with an average success rate of 69% in the ImageNet and CIFAR datasets. Our attack method generates adversarial examples with attribution maps that resemble benign samples. We have also demonstrated that our attack is resilient against various preprocessing defense techniques and can easily be transferred to different DNN models.
    摘要 深度学习模型容易受到恶意样本的攻击,包括白盒和黑盒环境。 Previous studies have shown that coupling DNN models with interpretation models can provide a sense of security, as a human expert can identify whether a given sample is benign or malicious. However, in white-box environments, interpretable deep learning systems (IDLSes) have been shown to be vulnerable to malicious manipulations. In black-box settings, as access to the components of IDLSes is limited, it becomes more challenging for the adversary to fool the system.In this work, we propose a Query-efficient Score-based black-box attack against IDLSes, QuScore, which requires no knowledge of the target model and its coupled interpretation model. QuScore is based on transfer-based and score-based methods using an effective microbial genetic algorithm. Our method is designed to reduce the number of queries necessary to carry out successful attacks, resulting in a more efficient process. By continuously refining the adversarial samples created based on feedback scores from the IDLS, our approach effectively navigates the search space to identify perturbations that can fool the system.We evaluate the attack's effectiveness on four CNN models (Inception, ResNet, VGG, DenseNet) and two interpretation models (CAM, Grad), using both ImageNet and CIFAR datasets. Our results show that the proposed approach is query-efficient with a high attack success rate that can reach between 95% and 100% and transferability with an average success rate of 69% in the ImageNet and CIFAR datasets. Our attack method generates adversarial examples with attribution maps that resemble benign samples. We have also demonstrated that our attack is resilient against various preprocessing defense techniques and can easily be transferred to different DNN models.
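The score-based, query-limited search can be illustrated with a toy microbial genetic algorithm: two candidate perturbations compete, and the loser copies part of the winner and mutates. The `score_fn`, the bounds, and the hyperparameters below are placeholders, not QuScore's actual components.

```python
import numpy as np

def microbial_ga_attack(x, score_fn, eps=8/255, pop=10, iters=200, rng=np.random.default_rng(0)):
    """Toy microbial-GA search for an adversarial perturbation.

    score_fn(x_adv) should return a value that is higher the more the perturbed
    input fools the (black-box) model; here it is a placeholder."""
    P = rng.uniform(-eps, eps, size=(pop,) + x.shape)          # population of perturbations
    for _ in range(iters):
        i, j = rng.choice(pop, size=2, replace=False)          # pick two contestants
        si = score_fn(np.clip(x + P[i], 0, 1))
        sj = score_fn(np.clip(x + P[j], 0, 1))
        win, lose = (i, j) if si >= sj else (j, i)
        mask = rng.random(x.shape) < 0.5                       # crossover: loser copies ~half of winner
        P[lose][mask] = P[win][mask]
        P[lose] += rng.normal(0, eps / 10, size=x.shape)       # mutation
        P[lose] = np.clip(P[lose], -eps, eps)
    best = max(range(pop), key=lambda k: score_fn(np.clip(x + P[k], 0, 1)))
    return np.clip(x + P[best], 0, 1)

# dummy example: the "score" is just the mean pixel change (stand-in for real model feedback)
x0 = np.zeros((8, 8))
x_adv = microbial_ga_attack(x0, score_fn=lambda xa: np.abs(xa - x0).mean())
```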

Embracing the chaos: analysis and diagnosis of numerical instability in variational flows

  • paper_url: http://arxiv.org/abs/2307.06957
  • repo_url: None
  • paper_authors: Zuheng Xu, Trevor Campbell
  • for: 本文研究了数值不稳定性对采样、密度评估和证明下界(ELBO)估计中的可靠性。
  • methods: 作者使用了对流动系统的尝试,并利用了阴影理论提供了有关采样、密度评估和ELBO估计的数据可靠性的理论保证。
  • results: 作者发现,尽管流动可能会出现严重的数值不稳定性,但是在应用中,流动的结果通常足够准确。此外,作者还开发了一种用于实践中验证流动结果的诊断方法。
    Abstract In this paper, we investigate the impact of numerical instability on the reliability of sampling, density evaluation, and evidence lower bound (ELBO) estimation in variational flows. We first empirically demonstrate that common flows can exhibit a catastrophic accumulation of error: the numerical flow map deviates significantly from the exact map -- which affects sampling -- and the numerical inverse flow map does not accurately recover the initial input -- which affects density and ELBO computations. Surprisingly though, we find that results produced by flows are often accurate enough for applications despite the presence of serious numerical instability. In this work, we treat variational flows as dynamical systems, and leverage shadowing theory to elucidate this behavior via theoretical guarantees on the error of sampling, density evaluation, and ELBO estimation. Finally, we develop and empirically test a diagnostic procedure that can be used to validate results produced by numerically unstable flows in practice.
    摘要 在这篇论文中,我们研究了数值不稳定性对样本、分布评估和证据下界(ELBO)估计的可靠性的影响。我们首先经验表明,通用的流体可能会出现严重的数值积累错误:数值流图与精确流图不同很多,影响样本;同时,数值逆流图不能准确地回归初始输入,影响分布和ELBO计算。尽管如此,我们发现在应用中,流体所生成的结果通常够准确。在这篇文章中,我们对变换流体视为动态系统,利用阴影理论提供了对样本、分布评估和ELBO估计错误的理论保证。最后,我们开发了一种可用于实践中验证不稳定流体生成结果的诊断过程。
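A minimal version of the kind of diagnostic the paper motivates is to push samples through the numerical flow and its numerical inverse and measure how far we land from where we started; large errors flag numerical instability. The affine "flow" below is a toy stand-in, and the shadowing-based guarantees are not reproduced.

```python
import numpy as np

def inverse_reconstruction_error(forward, inverse, x0):
    """Reconstruction error of forward-then-inverse flow maps, per sample."""
    z = forward(x0)
    x_rec = inverse(z)
    return np.linalg.norm(x_rec - x0, axis=-1)

# toy "flow": an affine map whose inverse is implemented separately
A = np.array([[1.0, 0.9], [0.0, 1.0]])
forward = lambda x: x @ A.T
inverse = lambda z: z @ np.linalg.inv(A).T

x0 = np.random.default_rng(0).normal(size=(1000, 2))
print(inverse_reconstruction_error(forward, inverse, x0).max())  # ~0 for this well-conditioned toy
```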

Misclassification in Automated Content Analysis Causes Bias in Regression. Can We Fix It? Yes We Can!

  • paper_url: http://arxiv.org/abs/2307.06483
  • repo_url: None
  • paper_authors: Nathan TeBlunthuis, Valerie Hase, Chung-Hong Chan
  • for: 这个论文主要是为了探讨自动分类器（AC）如何在通信科学和相关领域中用于量度大量数据，以及如何修正自动分类器的错误以获得正确的结果。
  • methods: 这篇论文使用了supervised machine learning(SML)方法来建立自动分类器,并对这些分类器进行了系统性的Literature Review。
  • results: 论文发现，通信学者大多忽略自动分类器的错误，但是这些错误会导致误分类偏差和不准确的结果。论文还提出了一种新的错误修正方法，并通过Monte Carlo simulations进行了测试，发现这种方法通用且高效。因此，这种方法可以用于修正自动分类器的错误，以提高量度结果的准确性。
    Abstract Automated classifiers (ACs), often built via supervised machine learning (SML), can categorize large, statistically powerful samples of data ranging from text to images and video, and have become widely popular measurement devices in communication science and related fields. Despite this popularity, even highly accurate classifiers make errors that cause misclassification bias and misleading results in downstream analyses-unless such analyses account for these errors. As we show in a systematic literature review of SML applications, communication scholars largely ignore misclassification bias. In principle, existing statistical methods can use "gold standard" validation data, such as that created by human annotators, to correct misclassification bias and produce consistent estimates. We introduce and test such methods, including a new method we design and implement in the R package misclassificationmodels, via Monte Carlo simulations designed to reveal each method's limitations, which we also release. Based on our results, we recommend our new error correction method as it is versatile and efficient. In sum, automated classifiers, even those below common accuracy standards or making systematic misclassifications, can be useful for measurement with careful study design and appropriate error correction methods.
    摘要 自动分类器(AC),通常通过直接指导学习(SML)建立,可以处理大量的数据样本,从文本到图像和视频,并在通信科学和相关领域中广泛使用。尽管如此,甚至高度准确的分类器也会出现错误,导致分类偏见和误导性结果,而不是在下游分析中考虑这些错误。根据我们在通信学者对SML应用的系统性文献综述中发现,communication scholars在大多数情况下忽略了错误分类的偏见。在理论上,现有的统计方法可以使用“金标准”验证数据,如人工标注者创建的数据,来修正错误分类和生成一致的估计。我们介绍和测试了这些方法,包括我们设计和实现的一种新方法,via Monte Carlo simulations,以揭示每种方法的局限性,并将其发布。根据我们的结果,我们推荐我们的新错误修正方法,因为它是多功能和高效的。总之,自动分类器,即使其准确率低于通常的标准或系统性错误,也可以用于measurement,只要采用合适的研究设计和错误修正方法。
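As a concrete instance of error correction with gold-standard validation data, the classic Rogan-Gladen adjustment recovers a true class proportion from a classifier's observed positive rate using sensitivity and specificity estimated on human-annotated data. This is one simple instance of the idea; the paper's R package `misclassificationmodels` targets the more general regression setting.

```python
def corrected_proportion(p_obs, sensitivity, specificity):
    """Rogan-Gladen correction of a prevalence estimate made by an imperfect
    classifier, using sensitivity/specificity from gold-standard validation data."""
    return (p_obs + specificity - 1.0) / (sensitivity + specificity - 1.0)

# a classifier labels 30% of documents positive, with sensitivity 0.85 and specificity 0.90
print(corrected_proportion(0.30, 0.85, 0.90))  # corrected estimate ~0.27
```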

Tackling Combinatorial Distribution Shift: A Matrix Completion Perspective

  • paper_url: http://arxiv.org/abs/2307.06457
  • repo_url: None
  • paper_authors: Max Simchowitz, Abhishek Gupta, Kaiqing Zhang
  • for: 本研究目的是在分布shift下提供严格的统计保证,尤其是在拥有分布shift的 combinatorial distribution shift Setting 中。
  • methods: 本文使用 bilinear embedding 来描述标签 $z$ 的分布,并提出了一系列的理论结果,包括新的算法、泛化保证和线性代数结果。一个关键的工具是一种基于相对spectral gap的 perturbation bound ,可能对独立的 linear algebra 领域具有广泛的应用。
  • results: 本文提出的泛化保证可以在gradual spectral decay 的情况下提供严格的保证,并且可以涵盖typical high-dimensional data 中的分布shift。
    Abstract Obtaining rigorous statistical guarantees for generalization under distribution shift remains an open and active research area. We study a setting we call combinatorial distribution shift, where (a) under the test- and training-distributions, the labels $z$ are determined by pairs of features $(x,y)$, (b) the training distribution has coverage of certain marginal distributions over $x$ and $y$ separately, but (c) the test distribution involves examples from a product distribution over $(x,y)$ that is {not} covered by the training distribution. Focusing on the special case where the labels are given by bilinear embeddings into a Hilbert space $H$: $\mathbb{E}[z \mid x,y ]=\langle f_{\star}(x),g_{\star}(y)\rangle_{H}$, we aim to extrapolate to a test distribution domain that is $not$ covered in training, i.e., achieving bilinear combinatorial extrapolation. Our setting generalizes a special case of matrix completion from missing-not-at-random data, for which all existing results require the ground-truth matrices to be either exactly low-rank, or to exhibit very sharp spectral cutoffs. In this work, we develop a series of theoretical results that enable bilinear combinatorial extrapolation under gradual spectral decay as observed in typical high-dimensional data, including novel algorithms, generalization guarantees, and linear-algebraic results. A key tool is a novel perturbation bound for the rank-$k$ singular value decomposition approximations between two matrices that depends on the relative spectral gap rather than the absolute spectral gap, a result that may be of broader independent interest.
    摘要 在分布偏移下获得严格的统计泛化保证仍是一个开放且活跃的研究领域。本文研究组合式分布偏移设置：标签由特征对 $(x,y)$ 上的双线性嵌入 $\mathbb{E}[z \mid x,y]=\langle f_{\star}(x),g_{\star}(y)\rangle_{H}$ 决定，训练分布分别覆盖 $x$ 与 $y$ 的边缘分布，但测试分布包含训练中未覆盖的 $(x,y)$ 组合。在谱逐渐衰减的条件下，本文给出了实现双线性组合外推的新算法、泛化保证以及相关线性代数结果，其中的关键工具是一个依赖于相对谱间隔的秩-$k$ 奇异值分解扰动界。
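A noiseless, exactly rank-k toy illustrates the bilinear-extrapolation idea: factor the jointly observed block of the mean matrix, extend the row and column factors through the observed margins, and combine them to predict the unobserved product block. The paper's actual algorithms handle noise and gradual spectral decay, which this sketch does not.

```python
import numpy as np

rng = np.random.default_rng(0)
k, nx, ny = 3, 40, 30
F, G = rng.normal(size=(nx, k)), rng.normal(size=(ny, k))
M = F @ G.T                                    # ground-truth bilinear means <f(x), g(y)>

# "combinatorial" coverage: rows x<20 and columns y<15 are observed in training,
# the bottom-right block (x>=20, y>=15) is never seen.
U, s, Vt = np.linalg.svd(M[:20, :15], full_matrices=False)    # factor the jointly observed block
Fh = M[:, :15] @ Vt[:k].T @ np.diag(1 / s[:k])                # extend row factors via observed columns
Gh = M[:20, :].T @ U[:, :k] @ np.diag(1 / s[:k])              # extend column factors via observed rows
M_hat = Fh @ np.diag(s[:k]) @ Gh.T                            # extrapolate to the unobserved block
print(np.abs(M_hat[20:, 15:] - M[20:, 15:]).max())            # ~0 in this noiseless rank-k toy
```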

Stochastic Delay Differential Games: Financial Modeling and Machine Learning Algorithms

  • paper_url: http://arxiv.org/abs/2307.06450
  • repo_url: None
  • paper_authors: Robert Balkin, Hector D. Ceniceros, Ruimeng Hu
  • for: 这个论文是为了解决含有延迟效应的多代理人游戏的closed-loop纳什平衡问题。
  • methods: 该论文提出了一种基于深度学习的数字方法,通过具有不同的回归神经网络来参数化每个玩家的控制。这些神经网络是通过一种修改后的布朗的虚构玩家来训练的。
  • results: 论文通过对金融相关的问题进行测试,证明了该方法的有效性。此外,论文还开发了一些新的问题,并derived了其分析纳什平衡解的解决方案,作为对深度学习方法的评价标准。
    Abstract In this paper, we propose a numerical methodology for finding the closed-loop Nash equilibrium of stochastic delay differential games through deep learning. These games are prevalent in finance and economics where multi-agent interaction and delayed effects are often desired features in a model, but are introduced at the expense of increased dimensionality of the problem. This increased dimensionality is especially significant as that arising from the number of players is coupled with the potential infinite dimensionality caused by the delay. Our approach involves parameterizing the controls of each player using distinct recurrent neural networks. These recurrent neural network-based controls are then trained using a modified version of Brown's fictitious play, incorporating deep learning techniques. To evaluate the effectiveness of our methodology, we test it on finance-related problems with known solutions. Furthermore, we also develop new problems and derive their analytical Nash equilibrium solutions, which serve as additional benchmarks for assessing the performance of our proposed deep learning approach.
    摘要 在这篇论文中，我们提出了一种基于深度学习的数值方法来求解随机延迟微分博弈的闭环纳什均衡。这类博弈在金融和经济领域非常普遍，因为多个代理人之间的交互和延迟效应是模型中经常需要的特性，但是这些特性会导致问题的维度增加。这种维度增加尤其显著，因为玩家数量带来的维度与延迟效应带来的潜在无穷维度相互叠加。我们的方法使用各自独立的循环神经网络来参数化每个玩家的控制，并通过结合深度学习技术的改进版布朗虚拟对弈（fictitious play）来训练这些控制。为了评估方法的效果，我们在金融相关的已知解问题上进行测试；此外我们还构造了新的问题并推导其解析纳什均衡解，作为评估所提深度学习方法性能的额外基准。

On Collaboration in Distributed Parameter Estimation with Resource Constraints

  • paper_url: http://arxiv.org/abs/2307.06442
  • repo_url: None
  • paper_authors: Yu-Zhen Janice Chen, Daniel S. Menasché, Don Towsley
  • for: 本研究探讨了感知器/代理人数据采集和协作策略,以优化参数估计,考虑到资源约束和感知器/代理人之间的观测相关性。
  • methods: 本研究将感知器/代理人的数据采集和协作策略设计表述为信息最大化（或克拉默-拉奥界最小化）问题。当观测变量之间的相关性已知时，我们分析了两种特殊情况：一种是无法利用观测变量之间的相关性进行协作估计的情况，另一种是最优策略需要投入稀缺资源去协作采集并传输那些本身并非直接关心、且统计特性已知的信息，其唯一目的是提高对目标参数估计的置信度。当某些相关性信息不可用但协作仍可能有价值时，我们提出将多臂老虎机（multi-armed bandit）算法用于学习最佳数据采集和协作策略，并通过实验证明其效果。
  • results: 本研究通过仿真实验表明，所提出的多臂老虎机算法（DOUBLE-F、DOUBLE-Z、UCB-F、UCB-Z）在分布式参数估计问题中是有效的，可以在资源受限且观测相关性未知的情况下提高参数估计的置信度。
    Abstract We study sensor/agent data collection and collaboration policies for parameter estimation, accounting for resource constraints and correlation between observations collected by distinct sensors/agents. Specifically, we consider a group of sensors/agents each samples from different variables of a multivariate Gaussian distribution and has different estimation objectives, and we formulate a sensor/agent's data collection and collaboration policy design problem as a Fisher information maximization (or Cramer-Rao bound minimization) problem. When the knowledge of correlation between variables is available, we analytically identify two particular scenarios: (1) where the knowledge of the correlation between samples cannot be leveraged for collaborative estimation purposes and (2) where the optimal data collection policy involves investing scarce resources to collaboratively sample and transfer information that is not of immediate interest and whose statistics are already known, with the sole goal of increasing the confidence on the estimate of the parameter of interest. When the knowledge of certain correlation is unavailable but collaboration may still be worthwhile, we propose novel ways to apply multi-armed bandit algorithms to learn the optimal data collection and collaboration policy in our distributed parameter estimation problem and demonstrate that the proposed algorithms, DOUBLE-F, DOUBLE-Z, UCB-F, UCB-Z, are effective through simulations.
    摘要 我们研究感知器/代理人数据收集和合作策略,以优化参数估计,考虑资源限制和感知器/代理人之间的观测协同关系。我们具体考虑一组感知器/代理人,每个感知器/代理人都从不同变量的多变量 Gaussian 分布中采样,并有不同的估计目标,我们将感知器/代理人的数据收集和合作策略设计问题定义为 Fisher 信息最大化(或 Cramer-Rao 约束最小化)问题。当知道变量之间的相关性信息时,我们分析出两种特殊情况:(1)在把样本之间的相关性信息不能利用 для协同估计目的时,和(2)在优化数据收集策略时,投入有限的资源,以协同采样和传输不是当前兴趣的信息,并且已知的统计信息,以提高估计参数的信度。当知道某些相关性信息不available时,我们提出了一些新的多重抓捕算法,用于学习最佳数据收集和合作策略,并在 simulations 中证明了这些算法的有效性。
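The bandit-based learning of collaboration policies can be sketched with a plain UCB arm-selection rule; the "reward" here would be an information-gain proxy, and the paper's DOUBLE-F/DOUBLE-Z/UCB-F/UCB-Z variants are not reproduced.

```python
import numpy as np

def ucb_select(pulls, rewards, t, c=2.0):
    """Pick which collaboration choice (arm) to use next via a UCB rule.

    pulls[k]/rewards[k] track how often arm k was used and the cumulative
    information-gain reward it yielded; unexplored arms are tried first."""
    pulls, rewards = np.asarray(pulls, float), np.asarray(rewards, float)
    if (pulls == 0).any():
        return int(np.argmin(pulls))          # explore untried arms first
    means = rewards / pulls
    bonus = np.sqrt(c * np.log(t) / pulls)    # optimism bonus shrinks with pulls
    return int(np.argmax(means + bonus))

# toy usage: 3 possible collaboration partners, round t=10
print(ucb_select(pulls=[4, 3, 3], rewards=[1.2, 0.9, 1.5], t=10))
```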

No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models

  • paper_url: http://arxiv.org/abs/2307.06440
  • repo_url: https://github.com/jeankaddour/notrainnogain
  • paper_authors: Jean Kaddour, Oscar Key, Piotr Nawrot, Pasquale Minervini, Matt J. Kusner
  • for: 这项研究旨在提高Transformer基于语言模型的训练效率,以适应近年来训练计算量的快速增长。
  • methods: 这项研究使用了三类有效训练算法:动态架构(层栈和层产生)、批量选择(选择性反Prop和RHO损失)和高效优化器(Lion和Sophia)。
  • results: 在固定计算预算下使用这些方法对BERT和T5进行预训练时，我们发现与采用完全衰减学习率的基线相比，它们在训练、验证和下游性能上的增益消失了。我们还定义了一种评估协议，使计算可以在任意机器上进行，并将所有计算时间映射到一台参考机器上（参考系统时间）。
    Abstract The computation necessary for training Transformer-based language models has skyrocketed in recent years. This trend has motivated research on efficient training algorithms designed to improve training, validation, and downstream performance faster than standard training. In this work, we revisit three categories of such algorithms: dynamic architectures (layer stacking, layer dropping), batch selection (selective backprop, RHO loss), and efficient optimizers (Lion, Sophia). When pre-training BERT and T5 with a fixed computation budget using such methods, we find that their training, validation, and downstream gains vanish compared to a baseline with a fully-decayed learning rate. We define an evaluation protocol that enables computation to be done on arbitrary machines by mapping all computation time to a reference machine which we call reference system time. We discuss the limitations of our proposed protocol and release our code to encourage rigorous research in efficient training procedures: https://github.com/JeanKaddour/NoTrainNoGain.
    摘要 计算量 necesary for 训练Transformer-based语言模型在最近几年内大幅增长。这种趋势激发了关于高效训练算法的研究,旨在提高训练、验证和下游性能的速度比标准训练更快。在这项工作中,我们回顾了三种类型的高效训练算法:动态建筑(层堆栈、层产生)、批量选择(选择性反馈、RHO损失)和高效优化器(Lion、Sophia)。在使用这些方法预训练BERT和T5时,我们发现其训练、验证和下游性能减零比基eline WITH Fully-decayed学习率。我们定义了一种评估协议,使得计算可以在任意机器上进行,并将所有计算时间映射到一个参照机器(我们称之为参照系统时间)。我们讨论了我们所提出的评估协议的限制,并发布了我们的代码,以便促进高效训练过程的严格研究:https://github.com/JeanKaddour/NoTrainNoGain。
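One of the surveyed batch-selection methods, selective backprop, is easy to sketch: forward the whole batch, then backpropagate only through the highest-loss examples. This is a simplified illustration (a keep-fraction threshold rather than the original method's loss-based sampling probabilities).

```python
import torch
import torch.nn.functional as F

def selective_backprop_step(model, optimizer, x, y, keep_frac=0.5):
    """One simplified selective-backprop step: backprop only on the hardest examples."""
    with torch.no_grad():
        losses = F.cross_entropy(model(x), y, reduction="none")
    k = max(1, int(keep_frac * len(y)))
    idx = losses.topk(k).indices                 # highest-loss examples in the batch
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x[idx]), y[idx])
    loss.backward()
    optimizer.step()
    return loss.item()

# toy usage
model = torch.nn.Linear(10, 3)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 10), torch.randint(0, 3, (32,))
selective_backprop_step(model, opt, x, y)
```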

Improved selective background Monte Carlo simulation at Belle II with graph attention networks and weighted events

  • paper_url: http://arxiv.org/abs/2307.06434
  • repo_url: None
  • paper_authors: Boyang Yu, Nikolai Hartmann, Luca Schinnerl, Thomas Kuhr
  • for: 用于提高 Belle II 实验中衡量罕见过程的精度,需要大量的 simulate 数据,但这需要高度的计算成本并且大多数模拟数据被分析阶段排除。
  • methods: 使用图 neural network 过滤器来缩减数据量,并使用图注意力和统计方法(如采样和重新权重)来处理过滤引入的偏见。
  • results: 通过使用图注意力和统计方法,提高了过滤器的性能,并且可以更好地处理背景辐射。
    Abstract When measuring rare processes at Belle II, a huge luminosity is required, which means a large number of simulations are necessary to determine signal efficiencies and background contributions. However, this process demands high computation costs while most of the simulated data, in particular in case of background, are discarded by the event selection. Thus, filters using graph neural networks are introduced at an early stage to save the resources for the detector simulation and reconstruction of events discarded at analysis level. In our work, we improved the performance of the filters using graph attention and investigated statistical methods including sampling and reweighting to deal with the biases introduced by the filtering.
    摘要 在 Belle II 中测量罕见过程时,需要巨大的亮度,这意味着需要大量的 simulate 来确定信号效率和背景贡献。然而,这个过程需要高度的计算成本,而大多数 simulated 数据,特别是背景数据,都会在分析阶段被抛弃。因此,我们引入了图神经网络滤波器,以避免亮度测量中的资源浪费。在我们的工作中,我们提高了滤波器的性能,并 investigate 了统计方法,包括采样和重新权重,以处理由滤波器引入的偏见。
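The sampling-and-reweighting fix for filter-induced bias can be sketched independently of the graph-attention network: keep an event with probability p(s) given its filter score s and, if kept, weight it by 1/p(s) so that weighted distributions stay unbiased on average. The keep-probability function below is made up for illustration.

```python
import numpy as np

def filter_and_weight(scores, keep_prob_fn, rng=np.random.default_rng(0)):
    """Stochastic event filtering with compensating weights."""
    p = np.clip(keep_prob_fn(scores), 1e-3, 1.0)   # keep probability per event
    kept = rng.random(len(scores)) < p
    weights = np.where(kept, 1.0 / p, 0.0)          # surviving events carry weight 1/p
    return kept, weights

# toy: a GNN-like score in [0,1]; background-like (low-score) events are mostly dropped
scores = np.random.default_rng(1).random(100_000)
kept, w = filter_and_weight(scores, keep_prob_fn=lambda s: 0.05 + 0.95 * s)
print(kept.mean(), w[kept].sum() / len(scores))     # weighted total ~1 per original event
```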

Energy Discrepancies: A Score-Independent Loss for Energy-Based Models

  • paper_url: http://arxiv.org/abs/2307.06431
  • repo_url: None
  • paper_authors: Tobias Schröder, Zijing Ou, Jen Ning Lim, Yingzhen Li, Sebastian J. Vollmer, Andrew B. Duncan
  • for: 提高能量基模型的训练效率和准确性
  • methods: 提出了一种新的损失函数 called Energy Discrepancy (ED), 不需要计算分数或昂贵的Markov链 Monte Carlo
  • results: 在数值实验中,ED可以更快和更准确地学习低维数据分布,并且在高维图像数据上表现出较好的效果。
    Abstract Energy-based models are a simple yet powerful class of probabilistic models, but their widespread adoption has been limited by the computational burden of training them. We propose a novel loss function called Energy Discrepancy (ED) which does not rely on the computation of scores or expensive Markov chain Monte Carlo. We show that ED approaches the explicit score matching and negative log-likelihood loss under different limits, effectively interpolating between both. Consequently, minimum ED estimation overcomes the problem of nearsightedness encountered in score-based estimation methods, while also enjoying theoretical guarantees. Through numerical experiments, we demonstrate that ED learns low-dimensional data distributions faster and more accurately than explicit score matching or contrastive divergence. For high-dimensional image data, we describe how the manifold hypothesis puts limitations on our approach and demonstrate the effectiveness of energy discrepancy by training the energy-based model as a prior of a variational decoder model.
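For reference, the two standard objectives that energy discrepancy is said to interpolate between can be written as follows (textbook forms for an energy-based model $p_\theta(x)\propto e^{-E_\theta(x)}$; the ED loss itself is defined in the paper and not restated here):

```latex
% Negative log-likelihood of an energy-based model
\mathcal{L}_{\mathrm{NLL}}(\theta)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[E_\theta(x)\right]
  + \log \int e^{-E_\theta(x)}\,\mathrm{d}x ,
\qquad
% Explicit score matching compares data and model scores
\mathcal{L}_{\mathrm{SM}}(\theta)
  = \tfrac{1}{2}\,\mathbb{E}_{x \sim p_{\mathrm{data}}}
    \!\left[\big\|\nabla_x \log p_{\mathrm{data}}(x) + \nabla_x E_\theta(x)\big\|^2\right].
```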

Differentially Private Decoupled Graph Convolutions for Multigranular Topology Protection

  • paper_url: http://arxiv.org/abs/2307.06422
  • repo_url: None
  • paper_authors: Eli Chien, Wei-Ning Chen, Chao Pan, Pan Li, Ayfer Özgür, Olgica Milenkovic
  • for: 本文targets at solving real-world learning problems involving graph-structured data while preserving sensitive user information and interactions.
  • methods: 本文提出了一种新的形式化差分隐私（DP）框架，即图差分隐私（GDP），以保证图学习设置中模型参数和预测值的隐私。此外，该文还引入了一种新的松弛化的节点级数据邻接概念，以便针对节点特征和图结构分别实现不同程度的隐私要求。
  • results: 该文的实验结果表明,DPDGC模型可以更好地平衡隐私和实用性的贸易offs,并且在七种节点分类 benchmark dataset上显示出较好的性能。
    Abstract Graph learning methods, such as Graph Neural Networks (GNNs) based on graph convolutions, are highly successful in solving real-world learning problems involving graph-structured data. However, graph learning methods expose sensitive user information and interactions not only through their model parameters but also through their model predictions. Consequently, standard Differential Privacy (DP) techniques that merely offer model weight privacy are inadequate. This is especially the case for node predictions that leverage neighboring node attributes directly via graph convolutions that create additional risks of privacy leakage. To address this problem, we introduce Graph Differential Privacy (GDP), a new formal DP framework tailored to graph learning settings that ensures both provably private model parameters and predictions. Furthermore, since there may be different privacy requirements for the node attributes and graph structure, we introduce a novel notion of relaxed node-level data adjacency. This relaxation can be used for establishing guarantees for different degrees of graph topology privacy while maintaining node attribute privacy. Importantly, this relaxation reveals a useful trade-off between utility and topology privacy for graph learning methods. In addition, our analysis of GDP reveals that existing DP-GNNs fail to exploit this trade-off due to the complex interplay between graph topology and attribute data in standard graph convolution designs. To mitigate this problem, we introduce the Differentially Private Decoupled Graph Convolution (DPDGC) model, which benefits from decoupled graph convolution while providing GDP guarantees. Extensive experiments on seven node classification benchmarking datasets demonstrate the superior privacy-utility trade-off of DPDGC over existing DP-GNNs based on standard graph convolution design.
    摘要 “图学算法,如基于图 convolution 的图神经网络(GNNs),在实际世界中解决了一系列基于图结构数据的学习问题,但是图学算法会暴露用户敏感信息和交互,不仅通过模型参数,还通过模型预测结果。因此,标准的敏感数据隐私(DP)技术,只能保证模型参数的隐私,不够。尤其是节点预测结果,通过图 convolution 直接使用邻居节点属性,会增加隐私泄露的风险。为解决这个问题,我们引入图敏感隐私(GDP),一种新的正式隐私框架,可以保证模型参数和预测结果的隐私。此外,因为节点属性和图结构可能有不同的隐私要求,我们引入了一种新的节点级别数据邻接关系的relaxation。这种宽松可以用于建立不同程度的图结构隐私保证,同时维护节点属性隐私。这种宽松还表明了图学算法中的有用负担轮径性,可以用于提高图学算法的隐私性。此外,我们的分析表明,现有的DP-GNNs不能充分利用这种负担轮径性,因为标准的图 convolution 设计中的图结构和属性数据之间存在复杂的互动。为此,我们引入了干扰隐私的分离图 convolution(DPDGC)模型,它具有隐私保证和标准图 convolution 设计之间的融合。我们在七种节点分类 benchmark 数据集上进行了广泛的实验,并证明了 DPDGC 的隐私性-用途性质量比例明显超过现有的 DP-GNNs。”
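The decoupling idea can be sketched as performing the privacy-sensitive neighbour aggregation once, with per-node clipping and Gaussian noise, and leaving the learnable transformation to a separate per-node model. The clipping rule, noise scale, and feature concatenation below are illustrative assumptions, not the DPDGC mechanism or its GDP accounting.

```python
import numpy as np

def dp_decoupled_conv(X, A, clip=1.0, sigma=1.0, rng=np.random.default_rng(0)):
    """Illustrative 'decoupled' graph-convolution step with noisy aggregation."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xc = X * np.minimum(1.0, clip / np.maximum(norms, 1e-12))   # bound each node's contribution
    H = A @ Xc                                                   # one-hop neighbour aggregation
    H_noisy = H + rng.normal(0.0, sigma * clip, size=H.shape)    # Gaussian-mechanism noise
    return np.concatenate([X, H_noisy], axis=1)                  # input for a per-node MLP

X = np.random.default_rng(1).normal(size=(5, 4))
A = (np.random.default_rng(2).random((5, 5)) < 0.4).astype(float)
Z = dp_decoupled_conv(X, A)
```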

Testing Sparsity Assumptions in Bayesian Networks

  • paper_url: http://arxiv.org/abs/2307.06406
  • repo_url: None
  • paper_authors: Luke Duttweiler, Sally W. Thurston, Anthony Almudevar
  • for: 本文是针对 bayesian network(BN)结构发现算法的研究,它们通常假设真实的网络具有一定的稀疏性。
  • methods: 本文基于 theorem 2 in Duttweiler et. al. (2023) 提出了一种使用样本 eigenvalues 测试 BN 结构的可行性,并提供了一种偏差处理程序来改善测试的精度。
  • results: 通过 simulations 和一个人类皮炎研究数据的示例,本文证明了该测试的性能,并建议了一种 linear BN 结构发现工作流程,以帮助选择合适的结构发现算法。
    Abstract Bayesian network (BN) structure discovery algorithms typically either make assumptions about the sparsity of the true underlying network, or are limited by computational constraints to networks with a small number of variables. While these sparsity assumptions can take various forms, frequently the assumptions focus on an upper bound for the maximum in-degree of the underlying graph $\nabla_G$. Theorem 2 in Duttweiler et. al. (2023) demonstrates that the largest eigenvalue of the normalized inverse covariance matrix ($\Omega$) of a linear BN is a lower bound for $\nabla_G$. Building on this result, this paper provides the asymptotic properties of, and a debiasing procedure for, the sample eigenvalues of $\Omega$, leading to a hypothesis test that may be used to determine if the BN has max in-degree greater than 1. A linear BN structure discovery workflow is suggested in which the investigator uses this hypothesis test to aid in selecting an appropriate structure discovery algorithm. The hypothesis test performance is evaluated through simulations and the workflow is demonstrated on data from a human psoriasis study.
    摘要 bayesian 网络(BN)结构发现算法通常会假设真实的网络结构是稀疏的,或者由于计算限制只能处理具有少量变量的网络。而这些稀疏假设可以有多种形式,经常是关于真实图像 $\nabla_G$ 的最大入度上界的假设。图2在Duttweiler等人(2023)的论文中表明了正则化 inverse covariance 矩阵( $\Omega$) 的最大特征值是 $\nabla_G$ 的下界。基于这个结果,这篇论文描述了 $\Omega$ 的样本特征值的极限性质和一种减偏处理方法,导致了一种用于检测 BN 是否有最大入度大于 1 的 гипотезы测试。一个基于线性 BN 结构发现工作流程是在启用这个测试来帮助选择合适的结构发现算法。这个测试的性能通过模拟和实验来评估,并在人类皮炎研究数据上示例了这个工作流程。
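A rough sketch of the statistic behind the test: compute the largest eigenvalue of a unit-diagonal-normalized sample precision matrix and compare it with 1, since Theorem 2 of Duttweiler et al. (2023) bounds the max in-degree from below by such an eigenvalue. The normalization convention and the informal threshold here are assumptions of this sketch; the paper's debiasing procedure and calibrated hypothesis test are not reproduced.

```python
import numpy as np

def max_eig_normalized_precision(X):
    """Largest eigenvalue of the unit-diagonal-scaled sample precision matrix."""
    S = np.cov(X, rowvar=False)
    P = np.linalg.inv(S)                      # sample precision (inverse covariance)
    d = np.sqrt(np.diag(P))
    Omega = P / np.outer(d, d)                # scale to unit diagonal
    return float(np.linalg.eigvalsh(Omega).max())

X = np.random.default_rng(0).normal(size=(500, 6))   # independent columns
print(max_eig_normalized_precision(X))               # near 1 here; values well above 1
                                                      # would point to max in-degree > 1
```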

Trainability, Expressivity and Interpretability in Gated Neural ODEs

  • paper_url: http://arxiv.org/abs/2307.06398
  • repo_url: https://github.com/timkimd/rnntools.jl
  • paper_authors: Timothy Doyeon Kim, Tankut Can, Kamesh Krishnamurthy
  • for: 这篇论文的目的是探讨生物和人工神经网络如何实现计算所需的任务。特别是需要复杂的记忆存储和检索 pose 一个 significan challenge для这些网络来实现或学习。
  • methods: 这篇论文使用了 Neural Ordinary Differential Equations (nODEs) 家族的模型，并添加了门控（gating）交互来实现 adaptive timescales。这些模型被称为 gated Neural ODEs (gnODEs)。
  • results: 作者们展示了 gnODEs 在需要记忆连续量的任务中具有学习（近似）连续吸引子（continuous attractors）的归纳偏置；降维后的 gnODEs 在保持建模能力的同时大幅提高了可解释性，甚至可以显式地可视化所学吸引子的结构。此外，作者们还引入了一种新的表达能力度量，用于探测神经网络生成复杂轨迹的能力。
    Abstract Understanding how the dynamics in biological and artificial neural networks implement the computations required for a task is a salient open question in machine learning and neuroscience. In particular, computations requiring complex memory storage and retrieval pose a significant challenge for these networks to implement or learn. Recently, a family of models described by neural ordinary differential equations (nODEs) has emerged as powerful dynamical neural network models capable of capturing complex dynamics. Here, we extend nODEs by endowing them with adaptive timescales using gating interactions. We refer to these as gated neural ODEs (gnODEs). Using a task that requires memory of continuous quantities, we demonstrate the inductive bias of the gnODEs to learn (approximate) continuous attractors. We further show how reduced-dimensional gnODEs retain their modeling power while greatly improving interpretability, even allowing explicit visualization of the structure of learned attractors. We introduce a novel measure of expressivity which probes the capacity of a neural network to generate complex trajectories. Using this measure, we explore how the phase-space dimension of the nODEs and the complexity of the function modeling the flow field contribute to expressivity. We see that a more complex function for modeling the flow field allows a lower-dimensional nODE to capture a given target dynamics. Finally, we demonstrate the benefit of gating in nODEs on several real-world tasks.
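One plausible reading of a gated neural ODE, shown only as an assumed illustrative form, is the dynamics dx/dt = z(x, I) * (-x + f(x, I)), where the gate z in (0, 1) sets adaptive per-unit timescales. The networks and the Euler integrator below are toy placeholders, not the paper's model.

```python
import numpy as np

def gnode_step(x, I, f, z, dt=0.01):
    """One Euler step of the assumed gated-ODE form dx/dt = z(x,I) * (-x + f(x,I))."""
    return x + dt * z(x, I) * (-x + f(x, I))

rng = np.random.default_rng(0)
W, Wz = rng.normal(size=(8, 8)) / np.sqrt(8), rng.normal(size=(8, 8)) / np.sqrt(8)
f = lambda x, I: np.tanh(x @ W.T + I)                 # flow-field network (toy)
z = lambda x, I: 1.0 / (1.0 + np.exp(-(x @ Wz.T)))    # sigmoid gate (toy)

x, I = np.zeros(8), rng.normal(size=8)
for _ in range(1000):
    x = gnode_step(x, I, f, z)
```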

Temporal Label-Refinement for Weakly-Supervised Audio-Visual Event Localization

  • paper_url: http://arxiv.org/abs/2307.06385
  • repo_url: None
  • paper_authors: Kalyan Ramakrishnan
  • for: 这篇论文关注的是 Audio-Visual Event Localization(AVEL)任务,即在视频中同时可见和听到的事件的时间本地化和分类。
  • methods: 我们使用了一种基本模型,首先将训练数据中的帧分割成多个时间片(slice),然后使用这些时间片来重新训练基本模型,以获得更细致的时间分布的标签。我们还提出了一个辅助目标函数,以便更好地预测本地化的事件标签。
  • results: 我们的三 stage 管道可以在无需改变模型结构的情况下,超越一些现有的 AVEL 方法,并在一个相关的弱监督任务中提高性能。
    Abstract Audio-Visual Event Localization (AVEL) is the task of temporally localizing and classifying \emph{audio-visual events}, i.e., events simultaneously visible and audible in a video. In this paper, we solve AVEL in a weakly-supervised setting, where only video-level event labels (their presence/absence, but not their locations in time) are available as supervision for training. Our idea is to use a base model to estimate labels on the training data at a finer temporal resolution than at the video level and re-train the model with these labels. I.e., we determine the subset of labels for each \emph{slice} of frames in a training video by (i) replacing the frames outside the slice with those from a second video having no overlap in video-level labels, and (ii) feeding this synthetic video into the base model to extract labels for just the slice in question. To handle the out-of-distribution nature of our synthetic videos, we propose an auxiliary objective for the base model that induces more reliable predictions of the localized event labels as desired. Our three-stage pipeline outperforms several existing AVEL methods with no architectural changes and improves performance on a related weakly-supervised task as well.
    摘要 听视事件地理化(AVEL)是将视频中的听视事件分类和时间地理化的任务。在这篇论文中,我们在弱监督Setting下解决AVEL,即只有视频水平的事件标签(其存在或不存在,但不是时间上的位置)作为训练模型的超级vision。我们的想法是使用基本模型将训练数据中的标签进行精细的时间分辨率的估计,然后在这些标签上重新训练模型。具体来说,我们将每个slice的帧替换为另一个没有交叉的视频中的帧,然后通过基本模型来提取slice中的标签。为了处理我们的 sintetic video的异常性,我们提出了一个辅助目标来使基本模型更加可靠地预测本地化的事件标签。我们的三个阶段管道比许多现有的AVEL方法表现更好,并且提高了一个相关的弱监督任务的性能。

Personalized Anomaly Detection in PPG Data using Representation Learning and Biometric Identification

  • paper_url: http://arxiv.org/abs/2307.06380
  • repo_url: None
  • paper_authors: Ramin Ghorbani, Marcel J. T. Reinders, David M. J. Tax
  • for: 这篇论文旨在提高心跳信号中的异常检测性能,特别是针对罕见和弱迹的心跳异常。
  • methods: 本文提出了一个两阶段框架,首先使用表示学习将原始心跳信号转换为更有吸引力和简洁的表示,然后运用三种不同的无监督异常检测方法进行运动检测和生物识别。
  • results: 结果显示,表示学习可以对异常检测性能有所提高,同时降低了人类间的差异。个性化模型进一步增强了异常检测性能,说明了个体化在心跳信号中的重要性。生物识别结果显示,新用户与授权用户之间比较容易分辨,对于一群用户来说则更加困难。总之,本研究证明了表示学习和个体化在心跳信号中的异常检测性能有所提高。
    Abstract Photoplethysmography (PPG) signals, typically acquired from wearable devices, hold significant potential for continuous fitness-health monitoring. In particular, heart conditions that manifest in rare and subtle deviating heart patterns may be interesting. However, robust and reliable anomaly detection within these data remains a challenge due to the scarcity of labeled data and high inter-subject variability. This paper introduces a two-stage framework leveraging representation learning and personalization to improve anomaly detection performance in PPG data. The proposed framework first employs representation learning to transform the original PPG signals into a more discriminative and compact representation. We then apply three different unsupervised anomaly detection methods for movement detection and biometric identification. We validate our approach using two different datasets in both generalized and personalized scenarios. The results show that representation learning significantly improves anomaly detection performance while reducing the high inter-subject variability. Personalized models further enhance anomaly detection performance, underscoring the role of personalization in PPG-based fitness-health monitoring systems. The results from biometric identification show that it's easier to distinguish a new user from one intended authorized user than from a group of users. Overall, this study provides evidence of the effectiveness of representation learning and personalization for anomaly detection in PPG data.
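A minimal two-stage pipeline in the spirit of the paper, with PCA standing in for the learned representation and IsolationForest for the unsupervised detector (both stand-ins are assumptions of this sketch; personalization would fit both stages per subject):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(500, 256))   # "normal" PPG windows (toy data)
anomal = rng.normal(3.0, 1.0, size=(20, 256))    # deviating windows (toy data)

encoder = PCA(n_components=16).fit(normal)                                   # stage 1: representation
detector = IsolationForest(random_state=0).fit(encoder.transform(normal))   # stage 2: anomaly detector

scores = detector.score_samples(encoder.transform(np.vstack([normal[:5], anomal[:5]])))
print(scores)   # lower scores = more anomalous
```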

Spectral-Bias and Kernel-Task Alignment in Physically Informed Neural Networks

  • paper_url: http://arxiv.org/abs/2307.06362
  • repo_url: None
  • paper_authors: Inbar Seroussi, Asaf Miron, Zohar Ringel
  • for: 这篇论文旨在提出一种权威的理论框架,以帮助选择和训练Physically Informed Neural Networks (PINNs)。
  • methods: 这篇论文使用了 infinitedimensional over-parameterized neural networks和 Gaussian process regression (GPR)的等价性, derivation of an integro-differential equation that governs PINN prediction in the large data-set limit – the Neurally-Informed Equation (NIE)。
  • results: 这篇论文通过spectral decomposition of the source term in the original differential equation来量化PINN网络中的隐式偏见。
    Abstract Physically informed neural networks (PINNs) are a promising emerging method for solving differential equations. As in many other deep learning approaches, the choice of PINN design and training protocol requires careful craftsmanship. Here, we suggest a comprehensive theoretical framework that sheds light on this important problem. Leveraging an equivalence between infinitely over-parameterized neural networks and Gaussian process regression (GPR), we derive an integro-differential equation that governs PINN prediction in the large data-set limit -- the Neurally-Informed Equation (NIE). This equation augments the original one by a kernel term reflecting architecture choices and allows quantifying implicit bias induced by the network via a spectral decomposition of the source term in the original differential equation.
    摘要 物理 Informed neural networks (PINNs) 是一种有前途的新方法,用于解决 diferencial equations。在其他深度学习方法一样,选择 PINN 的设计和训练协议需要小心细作。在这里,我们提出了一个完整的理论框架,以便更好地解决这个重要问题。通过 infinitely over-parameterized neural networks 和 Gaussian process regression (GPR) 之间的等价关系,我们得到了 PINN 预测中的 Neurally-Informed Equation (NIE)。这个公式在大数据集Limit中 governs PINN 预测,并且添加了一个泛函型函数表达式,该函数表达式反映了网络架构选择,并且通过spectral decomposition of the source term in the original differential equation来衡量隐式偏见。
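For context, a generic PINN for an operator equation N[u] = f on a domain Omega with boundary data g minimizes a residual-plus-boundary loss of the standard form below; the Neurally-Informed Equation derived in the paper is not restated here.

```latex
\mathcal{L}(\theta)
  = \frac{1}{N_r}\sum_{i=1}^{N_r}\big\|\mathcal{N}[u_\theta](x_i) - f(x_i)\big\|^2
  + \frac{\lambda}{N_b}\sum_{j=1}^{N_b}\big\|u_\theta(x_j^{\,b}) - g(x_j^{\,b})\big\|^2 ,
\qquad x_i \in \Omega,\; x_j^{\,b} \in \partial\Omega .
```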

Diagnosis, Feedback, Adaptation: A Human-in-the-Loop Framework for Test-Time Policy Adaptation

  • paper_url: http://arxiv.org/abs/2307.06333
  • repo_url: None
  • paper_authors: Andi Peng, Aviv Netanyahu, Mark Ho, Tianmin Shu, Andreea Bobu, Julie Shah, Pulkit Agrawal
  • for: 提高机器人策略的个性化适应性,使其更能符合用户的任务目标。
  • methods: 利用用户反馈来自动标识任务不重要的概念,并使用这些概念进行数据扩展,以适应个性化用户任务目标。
  • results: 通过人工试验,我们的方法可以帮助用户更好地理解机器人失败的原因,降低必要的示例数量,并使机器人更好地适应用户的任务目标。
    Abstract Policies often fail due to distribution shift -- changes in the state and reward that occur when a policy is deployed in new environments. Data augmentation can increase robustness by making the model invariant to task-irrelevant changes in the agent's observation. However, designers don't know which concepts are irrelevant a priori, especially when different end users have different preferences about how the task is performed. We propose an interactive framework to leverage feedback directly from the user to identify personalized task-irrelevant concepts. Our key idea is to generate counterfactual demonstrations that allow users to quickly identify possible task-relevant and irrelevant concepts. The knowledge of task-irrelevant concepts is then used to perform data augmentation and thus obtain a policy adapted to personalized user objectives. We present experiments validating our framework on discrete and continuous control tasks with real human users. Our method (1) enables users to better understand agent failure, (2) reduces the number of demonstrations required for fine-tuning, and (3) aligns the agent to individual user task preferences.
    摘要 (Simplified Chinese translation)政策常常因为分布shift而失败 -- 在新环境中改变状态和奖励导致政策失效。数据扩展可以增强政策的鲁棒性,使模型对任务 irrelevant 的变化变得不变。然而,设计者们在新环境中不知道哪些概念是无关的,特别是当不同的用户有不同的任务完成方式的时候。我们提出一种互动式框架,利用用户直接反馈来标识个性化无关的概念。我们的关键思想是生成对比示例,让用户快速地认出可能的任务相关和无关的概念。然后,根据用户的个性化任务目标,使用这些知识进行数据扩展,并从而获得适应用户目标的策略。我们在实验中 validate 了我们的框架,并在真实的人类用户上进行了实验。我们的方法可以 (1) 帮助用户更好地理解机器人失败的原因, (2) 减少调整的次数, (3) 将机器人调整到用户个性化的任务目标。

Budgeting Counterfactual for Offline RL

  • paper_url: http://arxiv.org/abs/2307.06328
  • repo_url: None
  • paper_authors: Yao Liu, Pratik Chaudhari, Rasool Fakoor
  • for: 本研究旨在解决离线强化学习中数据有限的问题，具体来说是通过限制反事实决策的数量来控制外推误差的累积。
  • methods: 我们提出了一种新的方法，即在训练过程中显式约束分布外（out-of-distribution）动作的数量，以避免外推误差随问题时域累积。我们使用动态规划来决定在何处进行外推、在何处不进行，并对偏离行为策略的决策数量设置上界。
  • results: 我们的方法在D4RL基准任务上的总体表现优于现有的离线强化学习方法。
    Abstract The main challenge of offline reinforcement learning, where data is limited, arises from a sequence of counterfactual reasoning dilemmas within the realm of potential actions: What if we were to choose a different course of action? These circumstances frequently give rise to extrapolation errors, which tend to accumulate exponentially with the problem horizon. Hence, it becomes crucial to acknowledge that not all decision steps are equally important to the final outcome, and to budget the number of counterfactual decisions a policy make in order to control the extrapolation. Contrary to existing approaches that use regularization on either the policy or value function, we propose an approach to explicitly bound the amount of out-of-distribution actions during training. Specifically, our method utilizes dynamic programming to decide where to extrapolate and where not to, with an upper bound on the decisions different from behavior policy. It balances between the potential for improvement from taking out-of-distribution actions and the risk of making errors due to extrapolation. Theoretically, we justify our method by the constrained optimality of the fixed point solution to our $Q$ updating rules. Empirically, we show that the overall performance of our method is better than the state-of-the-art offline RL methods on tasks in the widely-used D4RL benchmarks.
    摘要 主要挑战在线束缚学习中,即数据有限时,是一系列对可能行动的反思困境:假设我们选择了不同的行动方案?这些情况 часто会导致推断错误,这些错误往往会积累性地增长,尤其是随着问题的规模增加。因此,在控制推断错误的同时,也变得非常重要承认不同的决策步骤对最终结果的影响不同。相比现有的方法,我们提出了一种方法,通过显式约束数量外来动作来控制推断错误。具体来说,我们的方法利用动态规划决定在训练中是否进行推断,并且设置了对行动策略的上限。这种方法可以平衡推断错误的风险和尝试外来动作的潜在改进。从理论角度来说,我们证明了我们的方法是受束优化的 fixes 点解的可行解。从实际角度来看,我们的方法在 D4RL 测试集上的总表现比现有的在线学习方法更好。

Provably Faster Gradient Descent via Long Steps

  • paper_url: http://arxiv.org/abs/2307.06324
  • repo_url: https://github.com/bgrimmer/longstepcertificates
  • paper_authors: Benjamin Grimmer
  • for: 这篇论文旨在证明梯度下降在光滑凸优化中具有可证明更快的收敛速率。
  • methods: 论文使用计算机辅助的分析技术，采用非恒定步长策略，其中频繁出现的长步长可能在短期内违反下降性；分析一次考察多步迭代的整体效果，而非传统的单步归纳。
  • results: 论文证明了梯度下降的收敛速率快于传统分析给出的速率，并提出了一个猜想：梯度下降可能达到更快的 $O(1/T\log T)$ 速率，同时给出了简单的数值验证。
    Abstract This work establishes provably faster convergence rates for gradient descent in smooth convex optimization via a computer-assisted analysis technique. Our theory allows nonconstant stepsize policies with frequent long steps potentially violating descent by analyzing the overall effect of many iterations at once rather than the typical one-iteration inductions used in most first-order method analyses. We show that long steps, which may increase the objective value in the short term, lead to provably faster convergence in the long term. A conjecture towards proving a faster $O(1/T\log T)$ rate for gradient descent is also motivated along with simple numerical validation.
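A toy illustration of gradient descent with an occasional long step: the periodic pattern below is invented purely for illustration, whereas the paper derives certified stepsize sequences via computer-assisted analysis.

```python
import numpy as np

def gd_with_long_steps(grad, x0, base_lr, pattern=(1.0, 1.0, 1.0, 6.0), iters=400):
    """Gradient descent with a periodic stepsize pattern including a 'long' step
    that may temporarily increase the objective (hypothetical pattern)."""
    x = np.array(x0, dtype=float)
    for t in range(iters):
        x -= base_lr * pattern[t % len(pattern)] * grad(x)
    return x

# smooth convex toy problem: f(x) = 0.5 * x^T A x
A = np.diag([1.0, 10.0])
grad = lambda x: A @ x
print(gd_with_long_steps(grad, x0=[1.0, 1.0], base_lr=1.0 / 10.0))  # converges to ~0
```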

Data Augmentation in Training CNNs: Injecting Noise to Images

  • paper_url: http://arxiv.org/abs/2307.06855
  • repo_url: None
  • paper_authors: M. Eren Akbiyik
  • for: 本研究旨在探讨对卷积神经网络(CNN)结构的数据扩展如何充分利用噪声插入工具。
  • methods: 本研究使用不同噪声模型,对不同的噪声级别进行比较,以找出最佳的噪声插入方法。
  • results: 研究发现,不同噪声模型的插入会对图像分类 tasks 产生不同的影响,并提出了一些新的准则和建议。这些新方法将为图像分类学习提供更好的理解和优化。
    Abstract Noise injection is a fundamental tool for data augmentation, and yet there is no widely accepted procedure to incorporate it with learning frameworks. This study analyzes the effects of adding or applying different noise models of varying magnitudes to Convolutional Neural Network (CNN) architectures. Noise models that are distributed with different density functions are given common magnitude levels via Structural Similarity (SSIM) metric in order to create an appropriate ground for comparison. The basic results are conforming with the most of the common notions in machine learning, and also introduce some novel heuristics and recommendations on noise injection. The new approaches will provide better understanding on optimal learning procedures for image classification.
    摘要 噪声注入是数据增强的基本工具,但没有一个广泛接受的程序来将其与学习框架结合。这项研究分析了在卷积神经网络(CNN)架构上添加或应用不同噪声模型的效果,并通过不同分布函数来给噪声模型分配相同的噪声水平。研究结果与大多数机器学习概念相符,并提供了一些新的低噪声注入策略和建议,以便更好地理解图像分类的优化学习过程。
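One way to give different noise models a common magnitude, as the paper does via SSIM, is to search for the noise level whose corrupted image reaches a target SSIM against the clean image. The target value, search grid, and Gaussian noise model below are arbitrary choices for this sketch (assuming scikit-image is available).

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def calibrate_noise_std(img, target_ssim=0.8, sigmas=np.linspace(0.01, 0.5, 50),
                        rng=np.random.default_rng(0)):
    """Find a Gaussian noise std whose corrupted image has roughly the target SSIM,
    so that different noise models can be injected at comparable magnitudes."""
    for s in sigmas:
        noisy = np.clip(img + rng.normal(0.0, s, img.shape), 0.0, 1.0)
        if ssim(img, noisy, data_range=1.0) <= target_ssim:
            return s
    return sigmas[-1]

img = np.random.default_rng(1).random((64, 64))
print(calibrate_noise_std(img))
```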

Facial Reenactment Through a Personalized Generator

  • paper_url: http://arxiv.org/abs/2307.06307
  • repo_url: None
  • paper_authors: Ariel Elazary, Yotam Nitzan, Daniel Cohen-Or
  • for: 这 paper 是用于 facial reenactment 的个性化生成模型的研究。
  • methods: 该 paper 使用了个性化生成器,通过使用简单的商业摄像头捕捉的短时间、多样化的自我扫描视频来训练个性化生成器,以保证图像具有人脸的真实性。
  • results: 经过广泛评估，该文在 facial reenactment 任务上实现了最先进（state-of-the-art）的性能，并且展示了由于重演发生在语义潜空间中，其结果可以在后期处理中进行语义编辑和风格化。
    Abstract In recent years, the role of image generative models in facial reenactment has been steadily increasing. Such models are usually subject-agnostic and trained on domain-wide datasets. The appearance of the reenacted individual is learned from a single image, and hence, the entire breadth of the individual's appearance is not entirely captured, leading these methods to resort to unfaithful hallucination. Thanks to recent advancements, it is now possible to train a personalized generative model tailored specifically to a given individual. In this paper, we propose a novel method for facial reenactment using a personalized generator. We train the generator using frames from a short, yet varied, self-scan video captured using a simple commodity camera. Images synthesized by the personalized generator are guaranteed to preserve identity. The premise of our work is that the task of reenactment is thus reduced to accurately mimicking head poses and expressions. To this end, we locate the desired frames in the latent space of the personalized generator using carefully designed latent optimization. Through extensive evaluation, we demonstrate state-of-the-art performance for facial reenactment. Furthermore, we show that since our reenactment takes place in a semantic latent space, it can be semantically edited and stylized in post-processing.
    摘要 近年来,图像生成模型在人脸reenactment中的角色变得越来越重要。这些模型通常是无关主体的,并在域内数据上进行训练。通过学习单个图像中的人脸出现,这些方法会导致不准确的幻像生成。感谢最新的进步,现在可以专门为某个特定个体训练个性化生成器。在这篇论文中,我们提出一种使用专门生成器进行人脸reenactment的新方法。我们使用一段短、但具有多样性的自扫视频捕捉到了一般用途摄像头中的帧。通过专门训练个性化生成器,我们保证生成的图像会保持个体的身份。我们的工作假设是,reenactment任务可以reduced到准确地模仿头部姿势和表情。为此,我们使用特别设计的latent空间优化来确定感兴趣的帧。经过广泛评估,我们证明了在人脸reenactment中的状态之最高表现。此外,我们还表明了由于我们的reenactment发生在semanticlatent空间中,可以在后期处理中进行semantic编辑和风格化。

FDAPT: Federated Domain-adaptive Pre-training for Language Models

  • paper_url: http://arxiv.org/abs/2307.06933
  • repo_url: None
  • paper_authors: Lekang Jiang, Filip Svoboda, Nicholas D. Lane
  • for: 这个论文主要是为了探讨Domain-adaptive Pre-training (DAPT)与 Federated Learning (FL)的组合,以提高模型适应性,同时保持数据隐私。
  • methods: 该论文使用了Federated Domain-adaptive Pre-training (FDAPT)方法,并进行了首先的empirical研究来评估FDAPT的性能。
  • results: 研究发现，FDAPT可以保持下游任务性能与中央化基线相似，并且提出了一种新的算法FFDAPT，可以提高计算效率，并且与标准FDAPT的下游任务性能相似。
    Abstract Combining Domain-adaptive Pre-training (DAPT) with Federated Learning (FL) can enhance model adaptation by leveraging more sensitive and distributed data while preserving data privacy. However, few studies have focused on this method. Therefore, we conduct the first comprehensive empirical study to evaluate the performance of Federated Domain-adaptive Pre-training (FDAPT). We demonstrate that FDAPT can maintain competitive downstream task performance to the centralized baseline in both IID and non-IID situations. Furthermore, we propose a novel algorithm, Frozen Federated Domain-adaptive Pre-training (FFDAPT). FFDAPT improves the computational efficiency by 12.1% on average and exhibits similar downstream task performance to standard FDAPT, with general performance fluctuations remaining less than 1%. Finally, through a critical evaluation of our work, we identify promising future research directions for this new research area.
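The round structure of federated domain-adaptive pre-training can be sketched as below. The toy quadratic "training" step stands in for masked-language-model updates on local domain corpora, and the choice of which tensor the FFDAPT-style variant freezes is an assumption made purely for illustration; the abstract does not specify these details.

```python
import numpy as np

rng = np.random.default_rng(1)

def local_dapt_step(params, corpus, lr=0.1, freeze=()):
    """One client's domain-adaptive pre-training pass (toy quadratic 'loss'
    standing in for language-model training on the local corpus)."""
    new = {}
    for name, w in params.items():
        if name in freeze:               # FFDAPT-style: skip frozen tensors locally
            new[name] = w
            continue
        grad = w - corpus[name]          # gradient of 0.5*||w - corpus_target||^2
        new[name] = w - lr * grad
    return new

def fedavg(updates, weights):
    """Server aggregation: weighted average of client models."""
    total = sum(weights)
    return {name: sum(wi * u[name] for wi, u in zip(weights, updates)) / total
            for name in updates[0]}

# Global LM parameters (toy) and three clients with private domain corpora.
params = {"embeddings": rng.standard_normal(4), "encoder": rng.standard_normal(4)}
clients = [{"embeddings": rng.standard_normal(4), "encoder": rng.standard_normal(4)}
           for _ in range(3)]
sizes = [100, 50, 25]                    # local corpus sizes -> aggregation weights

for rnd in range(20):                    # FDAPT: all tensors trained locally
    updates = [local_dapt_step(params, c) for c in clients]
    params = fedavg(updates, sizes)

# Frozen variant (illustration only): keep "embeddings" fixed during local updates.
frozen_updates = [local_dapt_step(params, c, freeze=("embeddings",)) for c in clients]
params_frozen = fedavg(frozen_updates, sizes)
print({k: v.round(3) for k, v in params_frozen.items()})
```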

Locally Adaptive Federated Learning via Stochastic Polyak Stepsizes

  • paper_url: http://arxiv.org/abs/2307.06306
  • repo_url: https://github.com/IssamLaradji/sps
  • paper_authors: Sohom Mukherjee, Nicolas Loizou, Sebastian U. Stich
  • for: Improving the performance of federated learning algorithms by removing the need for carefully tuned stepsizes.
  • methods: Building on the recently proposed stochastic Polyak stepsize (SPS), the paper introduces new locally adaptive, nearly parameter-free distributed variants (FedSPS and FedDecSPS), proving linear convergence in the strongly convex setting and sublinear convergence in the convex setting under interpolation, and convergence to a neighbourhood of the solution in general.
  • results: In convex experiments, the proposed methods match FedAvg with its best-tuned hyperparameters in the i.i.d. case and outperform FedAvg in the non-i.i.d. case.
    Abstract State-of-the-art federated learning algorithms such as FedAvg require carefully tuned stepsizes to achieve their best performance. The improvements proposed by existing adaptive federated methods involve tuning of additional hyperparameters such as momentum parameters, and consider adaptivity only in the server aggregation round, but not locally. These methods can be inefficient in many practical scenarios because they require excessive tuning of hyperparameters and do not capture local geometric information. In this work, we extend the recently proposed stochastic Polyak stepsize (SPS) to the federated learning setting, and propose new locally adaptive and nearly parameter-free distributed SPS variants (FedSPS and FedDecSPS). We prove that FedSPS converges linearly in strongly convex and sublinearly in convex settings when the interpolation condition (overparametrization) is satisfied, and converges to a neighborhood of the solution in the general case. We extend our proposed method to a decreasing stepsize version FedDecSPS, that converges also when the interpolation condition does not hold. We validate our theoretical claims by performing illustrative convex experiments. Our proposed algorithms match the optimization performance of FedAvg with the best tuned hyperparameters in the i.i.d. case, and outperform FedAvg in the non-i.i.d. case.
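The local stepsize rule at the heart of FedSPS is the stochastic Polyak stepsize. A minimal sketch, assuming the standard SPS form γ = min{(f_i(x) − f_i*)/(c·‖∇f_i(x)‖²), γ_b} with f_i* = 0 under interpolation, is given below; the toy least-squares clients, the number of local steps, and the plain averaging at the server are illustrative choices rather than the paper's exact FedSPS/FedDecSPS algorithms.

```python
import numpy as np

def sps_stepsize(loss, grad, loss_star=0.0, c=0.5, gamma_b=1.0):
    """Stochastic Polyak stepsize: min{(f_i(x) - f_i*)/(c*||grad||^2), gamma_b}.
    loss_star is the optimal value of the sampled loss (0 under interpolation)."""
    denom = c * float(grad @ grad) + 1e-12
    return min((loss - loss_star) / denom, gamma_b)

# Toy federated least-squares problem: each client holds a private (A_k, b_k).
rng = np.random.default_rng(2)
clients = [(rng.standard_normal((20, 5)), rng.standard_normal(20)) for _ in range(4)]

x_global = np.zeros(5)
for rnd in range(50):
    updates = []
    for A, b in clients:
        x = x_global.copy()
        for _ in range(5):                       # local SPS steps, no stepsize tuning
            i = rng.integers(len(b))             # sample one local data point
            r = A[i] @ x - b[i]
            loss, grad = 0.5 * r * r, r * A[i]
            x -= sps_stepsize(loss, grad) * grad
        updates.append(x)
    x_global = np.mean(updates, axis=0)          # plain FedAvg-style aggregation

final = np.mean([np.mean((A @ x_global - b) ** 2) for A, b in clients])
print("mean residual after 50 rounds:", round(float(final), 4))
```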

Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution

  • paper_url: http://arxiv.org/abs/2307.06304
  • repo_url: https://github.com/Natyren/NaViT
  • paper_authors: Mostafa Dehghani, Basil Mustafa, Josip Djolonga, Jonathan Heek, Matthias Minderer, Mathilde Caron, Andreas Steiner, Joan Puigcerver, Robert Geirhos, Ibrahim Alabdulmohsin, Avital Oliver, Piotr Padlewski, Alexey Gritsenko, Mario Lučić, Neil Houlsby
  • for: This paper challenges the standard practice of resizing images to a fixed resolution before processing them with computer vision models.
  • methods: NaViT (Native Resolution ViT) uses sequence packing during training to process inputs of arbitrary resolutions and aspect ratios.
  • results: NaViT improves training efficiency for large-scale supervised and contrastive image-text pretraining, transfers well to image classification, object detection, and semantic segmentation, improves results on robustness and fairness benchmarks, and allows the input resolution to be adjusted at inference time to trade off cost against performance.
    Abstract The ubiquitous and demonstrably suboptimal choice of resizing images to a fixed resolution before processing them with computer vision models has not yet been successfully challenged. However, models such as the Vision Transformer (ViT) offer flexible sequence-based modeling, and hence varying input sequence lengths. We take advantage of this with NaViT (Native Resolution ViT) which uses sequence packing during training to process inputs of arbitrary resolutions and aspect ratios. Alongside flexible model usage, we demonstrate improved training efficiency for large-scale supervised and contrastive image-text pretraining. NaViT can be efficiently transferred to standard tasks such as image and video classification, object detection, and semantic segmentation and leads to improved results on robustness and fairness benchmarks. At inference time, the input resolution flexibility can be used to smoothly navigate the test-time cost-performance trade-off. We believe that NaViT marks a departure from the standard, CNN-designed, input and modelling pipeline used by most computer vision models, and represents a promising direction for ViTs.
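A minimal sketch of the sequence-packing idea follows: images are patchified at their native resolutions, several token sequences are packed into one fixed-length example, and a block-diagonal mask keeps attention within each image. The patch size, maximum sequence length, and greedy packing policy below are assumptions for illustration, not NaViT's actual implementation.

```python
import numpy as np

PATCH, MAX_TOKENS = 16, 256   # patch size and packed sequence length (illustrative)

def patchify(img):
    """Split an HxWxC image at its native resolution into PATCH x PATCH tokens."""
    H, W, C = img.shape
    h, w = H // PATCH, W // PATCH
    tokens = img[:h * PATCH, :w * PATCH].reshape(h, PATCH, w, PATCH, C)
    return tokens.transpose(0, 2, 1, 3, 4).reshape(h * w, PATCH * PATCH * C)

def pack(images):
    """Greedily pack token sequences from several images into one example,
    recording which image each token came from so attention can be masked."""
    seqs, ids, used = [], [], 0
    for idx, img in enumerate(images):
        t = patchify(img)
        if used + len(t) > MAX_TOKENS:
            break
        seqs.append(t)
        ids.append(np.full(len(t), idx))
        used += len(t)
    tokens = np.concatenate(seqs)
    image_id = np.concatenate(ids)
    # Block-diagonal attention mask: tokens only attend within their own image.
    attn_mask = image_id[:, None] == image_id[None, :]
    return tokens, image_id, attn_mask

imgs = [np.random.rand(96, 64, 3), np.random.rand(48, 80, 3), np.random.rand(128, 128, 3)]
tokens, image_id, mask = pack(imgs)
print(tokens.shape, np.bincount(image_id))   # packed tokens and tokens per image
```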

Towards a Certified Proof Checker for Deep Neural Network Verification

  • paper_url: http://arxiv.org/abs/2307.06299
  • repo_url: None
  • paper_authors: Remi Desmartin, Omri Isac, Grant Passmore, Kathrin Stark, Guy Katz, Ekaterina Komendantskaya
  • for: This paper works towards a certified proof checker for deep neural network (DNN) verification, to support the safe use of DNNs in safety-critical systems.
  • methods: The checker is implemented in Imandra, an industrial theorem prover, leveraging its support for infinite-precision real arithmetic and its formal verification infrastructure.
  • results: The paper presents a proof-checker implementation with improved numerical stability and verifiability; its correctness properties have been specified, and formal verification of the checker's compliance with them is underway.
    Abstract Recent developments in deep neural networks (DNNs) have led to their adoption in safety-critical systems, which in turn has heightened the need for guaranteeing their safety. These safety properties of DNNs can be proven using tools developed by the verification community. However, these tools are themselves prone to implementation bugs and numerical stability problems, which make their reliability questionable. To overcome this, some verifiers produce proofs of their results which can be checked by a trusted checker. In this work, we present a novel implementation of a proof checker for DNN verification. It improves on existing implementations by offering numerical stability and greater verifiability. To achieve this, we leverage two key capabilities of Imandra, an industrial theorem prover: its support of infinite precision real arithmetic and its formal verification infrastructure. So far, we have implemented a proof checker in Imandra, specified its correctness properties and started to verify the checker's compliance with them. Our ongoing work focuses on completing the formal verification of the checker and further optimizing its performance.
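To make the checker's role concrete, the sketch below verifies one unsatisfiability-certificate step (a Farkas-style contradiction over linear constraints) using exact rational arithmetic, which is the kind of numerically stable check such proof checkers perform. The certificate format and the use of Python's Fraction type are illustrative stand-ins, not the paper's Imandra implementation.

```python
from fractions import Fraction as F

def check_farkas_contradiction(A, b, y):
    """Check one unsat-certificate step with exact rational arithmetic:
    given constraints A x <= b and multipliers y >= 0, verify that
    y^T A = 0 and y^T b < 0, which proves the constraints are infeasible."""
    assert all(yi >= 0 for yi in y), "multipliers must be nonnegative"
    cols = len(A[0])
    combo = [sum(y[i] * A[i][j] for i in range(len(A))) for j in range(cols)]
    rhs = sum(y[i] * b[i] for i in range(len(b)))
    return all(c == 0 for c in combo) and rhs < 0

# Infeasible toy system: x >= 1 (written as -x <= -1) and x <= 0.
A = [[F(-1)], [F(1)]]
b = [F(-1), F(0)]
y = [F(1), F(1)]            # certificate: summing the rows gives 0 <= -1, a contradiction
print(check_farkas_contradiction(A, b, y))   # True: certificate accepted
```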

Instruction Mining: High-Quality Instruction Data Selection for Large Language Models

  • paper_url: http://arxiv.org/abs/2307.06290
  • repo_url: None
  • paper_authors: Yihan Cao, Yanbin Kang, Lichao Sun
  • for: Improving language models' ability to interpret and respond to human instructions.
  • methods: InstructMining, a linear rule built from specific natural-language indicators, evaluates the quality of instruction-following data; its parameters are estimated from extensive finetuning experiments relating data quality to these indicators.
  • results: Models finetuned on data selected by InstructMining outperform models finetuned on unfiltered datasets in 42.5% of cases.
    Abstract Large language models typically undergo two training stages, pretraining and finetuning. Despite that large-scale pretraining endows the model with strong capabilities to generate natural language responses, these pretrained models can still fail to understand human instructions at times. To enhance language models' ability of interpreting and responding to instructions, instruction finetuning has emerged as a critical method in this area. Recent studies found that large language models can be finetuned to perform well even with a small amount of high-quality instruction-following data. However, the selection of high-quality datasets for finetuning language models still lacks clear guidelines to follow. In this paper, we propose InstructMining, a linear rule for evaluating instruction-following data quality. We formulate InstructMining using specific natural language indicators. To investigate the relationship between data quality and these indicators, we further conduct extensive finetuning experiments. The experiment results are then applied to estimating parameters in InstructMining. To further investigate its performance, we use InstructMining to select high-quality data from unseen datasets. Results demonstrate that InstructMining can help select relatively high-quality samples from various instruction-following datasets. Compared to models finetuned on unfiltered datasets, models finetuned on InstructMining selected datasets perform better on 42.5% cases.
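A hedged sketch of what a linear data-quality rule of this kind might look like is given below; the indicator set, coefficients, and selection ratio are invented for illustration, whereas the paper estimates its rule from finetuning experiments.

```python
import numpy as np

# Hypothetical natural-language indicators for an instruction-response pair;
# the paper's actual indicator set and fitted weights differ.
def indicators(example):
    resp = example["response"]
    return np.array([
        len(resp.split()),                 # response length
        example["reward_score"],           # score from a reward model
        example["perplexity"],             # LM perplexity of the response
    ])

# InstructMining-style linear rule: predicted quality = w . indicators + w0.
w, w0 = np.array([0.01, 1.5, -0.2]), 0.0   # illustrative coefficients only

def quality(example):
    return float(w @ indicators(example) + w0)

dataset = [
    {"response": "Boil water, add pasta, cook 9 minutes.", "reward_score": 0.8, "perplexity": 4.1},
    {"response": "idk", "reward_score": 0.1, "perplexity": 9.5},
]
# Select the top fraction of examples by estimated quality for finetuning.
selected = sorted(dataset, key=quality, reverse=True)[: max(1, len(dataset) // 2)]
print([ex["response"] for ex in selected])
```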

Rational Neural Network Controllers

  • paper_url: http://arxiv.org/abs/2307.06287
  • repo_url: None
  • paper_authors: Matthew Newton, Antonis Papachristodoulou
  • for: This paper aims to improve the robustness of neural network controllers in control systems by using rational activation functions and a general rational neural network structure.
  • methods: The paper proposes a method to recover a stabilising controller from a Sum of Squares feasibility test, and applies this method to a refined rational neural network that is more compatible with Sum of Squares programming.
  • results: The paper shows that the proposed method can successfully recover stabilising rational neural network controllers for neural feedback loops with non-linear plants subject to noise and parametric uncertainty.
    Abstract Neural networks have shown great success in many machine learning related tasks, due to their ability to act as general function approximators. Recent work has demonstrated the effectiveness of neural networks in control systems (known as neural feedback loops), most notably by using a neural network as a controller. However, one of the big challenges of this approach is that neural networks have been shown to be sensitive to adversarial attacks. This means that, unless they are designed properly, they are not an ideal candidate for controllers due to issues with robustness and uncertainty, which are pivotal aspects of control systems. There has been initial work on robustness to both analyse and design dynamical systems with neural network controllers. However, one prominent issue with these methods is that they use existing neural network architectures tailored for traditional machine learning tasks. These structures may not be appropriate for neural network controllers and it is important to consider alternative architectures. This paper considers rational neural networks and presents novel rational activation functions, which can be used effectively in robustness problems for neural feedback loops. Rational activation functions are replaced by a general rational neural network structure, which is convex in the neural network's parameters. A method is proposed to recover a stabilising controller from a Sum of Squares feasibility test. This approach is then applied to a refined rational neural network which is more compatible with Sum of Squares programming. Numerical examples show that this method can successfully recover stabilising rational neural network controllers for neural feedback loops with non-linear plants with noise and parametric uncertainty.
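For concreteness, a rational activation function σ(x) = P(x)/Q(x) can be sketched as below; the polynomial degrees, coefficients, and the absolute-value safeguard on the denominator are illustrative choices and not the paper's refined, Sum-of-Squares-compatible structure.

```python
import numpy as np

def rational_activation(x, a, b):
    """Elementwise rational activation sigma(x) = P(x) / Q(x) with
    P(x) = a0 + a1*x + a2*x^2 + a3*x^3 and Q(x) = 1 + |b1*x + b2*x^2|.
    The absolute value is one common way to keep the denominator positive."""
    P = a[0] + a[1] * x + a[2] * x**2 + a[3] * x**3
    Q = 1.0 + np.abs(b[0] * x + b[1] * x**2)
    return P / Q

# Illustrative coefficients (not the paper's), roughly ReLU-like near the origin.
a = np.array([0.03, 0.50, 0.57, 0.02])
b = np.array([0.00, 1.14])

x = np.linspace(-3, 3, 7)
print(rational_activation(x, a, b).round(3))
```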

Tackling Computational Heterogeneity in FL: A Few Theoretical Insights

  • paper_url: http://arxiv.org/abs/2307.06283
  • repo_url: None
  • paper_authors: Adnan Ben Mansour, Gaia Carenini, Alexandre Duplessis
  • for: This paper focuses on Federated Learning (FL) as a solution for moving data collection and training to the edge, and proposes a novel aggregation framework to tackle computational heterogeneity in federated optimization.
  • methods: The paper introduces and analyzes a new aggregation framework that formalizes and addresses heterogeneity in federated optimization, covering both heterogeneous data and heterogeneous local updates.
  • results: The proposed aggregation algorithms are extensively analyzed from both theoretical and experimental perspectives.
    Abstract The future of machine learning lies in moving data collection along with training to the edge. Federated Learning, for short FL, has been recently proposed to achieve this goal. The principle of this approach is to aggregate models learned over a large number of distributed clients, i.e., resource-constrained mobile devices that collect data from their environment, to obtain a new more general model. The latter is subsequently redistributed to clients for further training. A key feature that distinguishes federated learning from data-center-based distributed training is the inherent heterogeneity. In this work, we introduce and analyse a novel aggregation framework that allows for formalizing and tackling computational heterogeneity in federated optimization, in terms of both heterogeneous data and local updates. Proposed aggregation algorithms are extensively analyzed from a theoretical, and an experimental prospective.
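The paper's own aggregation framework is not reproduced here, but the sketch below shows one standard way to handle computational heterogeneity: client updates computed with different numbers of local steps are normalized by their step counts before a data-size-weighted average (in the spirit of FedNova). The function names, weights, and normalization rule are assumptions for illustration only.

```python
import numpy as np

def aggregate(global_w, client_deltas, local_steps, data_sizes):
    """Aggregate updates from clients that ran different numbers of local steps.
    Each delta is normalized by its client's step count so faster devices do not
    dominate the round (one simple treatment of computational heterogeneity)."""
    data_sizes = np.asarray(data_sizes, dtype=float)
    p = data_sizes / data_sizes.sum()                      # data-size weights
    normalized = [d / s for d, s in zip(client_deltas, local_steps)]
    effective_steps = float(p @ np.asarray(local_steps))   # rescale to average work
    return global_w + effective_steps * sum(pi * n for pi, n in zip(p, normalized))

# Toy round: three clients with unequal compute budgets and data sizes.
rng = np.random.default_rng(3)
w = np.zeros(4)
deltas = [rng.standard_normal(4) * s * 0.01 for s in (20, 5, 1)]  # more steps -> larger drift
w = aggregate(w, deltas, local_steps=[20, 5, 1], data_sizes=[300, 100, 50])
print(w.round(4))
```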

Exposing the Fake: Effective Diffusion-Generated Images Detection

  • paper_url: http://arxiv.org/abs/2307.06272
  • repo_url: None
  • paper_authors: Ruipeng Ma, Jinhao Duan, Fei Kong, Xiaoshuang Shi, Kaidi Xu
  • for: This paper proposes a method for detecting images generated by diffusion models, addressing the potential security and privacy risks they pose.
  • methods: The proposed detector, Stepwise Error for Diffusion-generated Image Detection (SeDID), comprises a statistical variant (SeDID_Stat) and a neural-network variant (SeDID_NNs), and exploits distinctive properties of diffusion models, namely their deterministic reverse and deterministic denoising computation errors.
  • results: Evaluations show that SeDID outperforms existing methods when applied to diffusion models, marking a significant step towards reliably distinguishing diffusion-generated images in the domain of AI security.
    Abstract Image synthesis has seen significant advancements with the advent of diffusion-based generative models like Denoising Diffusion Probabilistic Models (DDPM) and text-to-image diffusion models. Despite their efficacy, there is a dearth of research dedicated to detecting diffusion-generated images, which could pose potential security and privacy risks. This paper addresses this gap by proposing a novel detection method called Stepwise Error for Diffusion-generated Image Detection (SeDID). Comprising statistical-based $\text{SeDID}_{\text{Stat}}$ and neural network-based $\text{SeDID}_{\text{NNs}}$, SeDID exploits the unique attributes of diffusion models, namely deterministic reverse and deterministic denoising computation errors. Our evaluations demonstrate SeDID's superior performance over existing methods when applied to diffusion models. Thus, our work makes a pivotal contribution to distinguishing diffusion model-generated images, marking a significant step in the domain of artificial intelligence security.
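A conceptual sketch of a stepwise-error statistic follows: an image is deterministically inverted with the model's noise predictor, deterministically denoised back, and the discrepancy at an intermediate timestep is used as the detection score, since generated images tend to follow the model's deterministic dynamics more closely than real ones. The toy noise predictor, noise schedule, and exact error definition are placeholders rather than SeDID's actual statistic.

```python
import numpy as np

T = 50
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

def eps_model(x, t):
    """Toy stand-in for the trained noise predictor epsilon_theta(x, t);
    a real detector would call the diffusion model's network here."""
    return 0.1 * x

def ddim_step(x, t_from, t_to):
    """Deterministic (DDIM-style) transition between timesteps, usable for
    both inversion (t_to > t_from) and denoising (t_to < t_from)."""
    a_from, a_to = alpha_bar[t_from], alpha_bar[t_to]
    eps = eps_model(x, t_from)
    x0_hat = (x - np.sqrt(1.0 - a_from) * eps) / np.sqrt(a_from)
    return np.sqrt(a_to) * x0_hat + np.sqrt(1.0 - a_to) * eps

def stepwise_error(x0, t_star=10):
    """Invert deterministically up to t_star, take one deterministic denoising
    step back, and compare against the stored inversion trajectory."""
    xs, x = [x0.copy()], x0.copy()
    for t in range(t_star):
        x = ddim_step(x, t, t + 1)
        xs.append(x)
    x_back = ddim_step(xs[t_star], t_star, t_star - 1)
    return float(np.mean((x_back - xs[t_star - 1]) ** 2))

img = np.random.rand(8, 8)        # stand-in for a test image
print("stepwise error:", stepwise_error(img))
```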

Physics-informed Machine Learning for Calibrating Macroscopic Traffic Flow Models

  • paper_url: http://arxiv.org/abs/2307.06267
  • repo_url: None
  • paper_authors: Yu Tang, Li Jin, Kaan Ozbay
  • for: This paper proposes a physics-informed, learning-based calibration approach for macroscopic traffic flow models, supporting the understanding of traffic phenomena and the design of control strategies.
  • methods: The approach combines a classical deep autoencoder with traffic flow models: the decoder is informed by the physical traffic flow model, which induces the encoder to yield reasonable traffic parameters from flow and speed measurements; a denoising autoencoder is introduced so that the method also handles corrupted data with missing values.
  • results: A case study on I-210 E in California shows performance comparable to, and in some cases better than, traditional optimization-based calibration methods.
    Abstract Well-calibrated traffic flow models are fundamental to understanding traffic phenomena and designing control strategies. Traditional calibration has been developed base on optimization methods. In this paper, we propose a novel physics-informed, learning-based calibration approach that achieves performances comparable to and even better than those of optimization-based methods. To this end, we combine the classical deep autoencoder, an unsupervised machine learning model consisting of one encoder and one decoder, with traffic flow models. Our approach informs the decoder of the physical traffic flow models and thus induces the encoder to yield reasonable traffic parameters given flow and speed measurements. We also introduce the denoising autoencoder into our method so that it can handles not only with normal data but also with corrupted data with missing values. We verified our approach with a case study of I-210 E in California.
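To illustrate the idea of "informing the decoder with the physical traffic flow model", the sketch below uses the Greenshields fundamental diagram as the decoder and fits its parameters so that reconstructed speeds match synthetic measurements; the coarse grid search stands in for the trained encoder, and the Greenshields choice and all numbers are assumptions, not the paper's setup.

```python
import numpy as np

def greenshields_decoder(density, v_free, rho_jam):
    """Physical decoder: Greenshields fundamental diagram.
    Speed falls linearly with density; flow = density * speed."""
    speed = v_free * np.clip(1.0 - density / rho_jam, 0.0, None)
    return speed, density * speed

# Synthetic loop-detector measurements (density in veh/km, observed speed in km/h).
rng = np.random.default_rng(4)
density = rng.uniform(5, 120, 200)
speed_obs = 100.0 * np.clip(1.0 - density / 140.0, 0, None) + rng.normal(0, 3, 200)

def reconstruction_loss(params):
    v_free, rho_jam = params
    speed_hat, _ = greenshields_decoder(density, v_free, rho_jam)
    return np.mean((speed_hat - speed_obs) ** 2)

# A trained encoder would output (v_free, rho_jam) from the measurements; a
# coarse grid search stands in for it to show how the physics constrains the fit.
grid = [(v, r) for v in np.linspace(60, 140, 41) for r in np.linspace(80, 200, 61)]
v_free, rho_jam = min(grid, key=reconstruction_loss)
print(f"calibrated v_free={v_free:.1f} km/h, rho_jam={rho_jam:.1f} veh/km")
```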

On the hierarchical Bayesian modelling of frequency response functions

  • paper_url: http://arxiv.org/abs/2307.06263
  • repo_url: None
  • paper_authors: T. A. Dardeno, R. S. Mills, N. Dervilis, K. Worden, L. A. Bull
  • for: This work aims to monitor the health states of a population of structures by sharing normal- and damage-condition information among members to improve inferences about each member.
  • methods: A hierarchical Bayesian approach learns statistical distributions at the population and individual (domain) levels simultaneously, bolstering statistical strength among the parameters.
  • results: The study develops combined probabilistic FRF models for a small population of nominally-identical helicopter blades under varying temperature conditions, accommodating benign variations between members while exploiting their similarities.
    Abstract Population-based structural health monitoring (PBSHM) aims to share valuable information among members of a population, such as normal- and damage-condition data, to improve inferences regarding the health states of the members. Even when the population is comprised of nominally-identical structures, benign variations among the members will exist as a result of slight differences in material properties, geometry, boundary conditions, or environmental effects (e.g., temperature changes). These discrepancies can affect modal properties and present as changes in the characteristics of the resonance peaks of the frequency response function (FRF). Many SHM strategies depend on monitoring the dynamic properties of structures, so benign variations can be challenging for the practical implementation of these systems. Another common challenge with vibration-based SHM is data loss, which may result from transmission issues, sensor failure, a sample-rate mismatch between sensors, and other causes. Missing data in the time domain will result in decreased resolution in the frequency domain, which can impair dynamic characterisation. The hierarchical Bayesian approach provides a useful modelling structure for PBSHM, because statistical distributions at the population and individual (or domain) level are learnt simultaneously to bolster statistical strength among the parameters. As a result, variance is reduced among the parameter estimates, particularly when data are limited. In this paper, combined probabilistic FRF models are developed for a small population of nominally-identical helicopter blades under varying temperature conditions, using a hierarchical Bayesian structure. These models address critical challenges in SHM, by accommodating benign variations that present as differences in the underlying dynamics, while also considering (and utilising), the similarities among the blades.
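The hierarchical structure can be sketched generatively: population-level distributions govern each blade's modal parameters, each blade's parameters generate its FRF, and inference shrinks blade-level estimates toward the population mean. The single-mode FRF, the Gaussian population model, and the fixed shrinkage weight below are simplifying assumptions for illustration, not the paper's full model or posterior.

```python
import numpy as np

rng = np.random.default_rng(5)

def sdof_frf(omega, omega_n, zeta):
    """Magnitude of a single-mode (SDOF) receptance FRF."""
    return 1.0 / np.sqrt((omega_n**2 - omega**2) ** 2 + (2 * zeta * omega_n * omega) ** 2)

# Population level: natural frequency and damping vary benignly across blades
# (e.g. with temperature); the values here are illustrative, not estimates.
mu_omega, sigma_omega = 2 * np.pi * 55.0, 2 * np.pi * 0.8   # rad/s
mu_zeta, sigma_zeta = 0.02, 0.003

# Blade (domain) level: each blade draws its own parameters from the population.
n_blades = 4
omega_n = rng.normal(mu_omega, sigma_omega, n_blades)
zeta = np.clip(rng.normal(mu_zeta, sigma_zeta, n_blades), 1e-4, None)

# Observation level: noisy FRF measurements per blade around the resonance peak.
freqs = 2 * np.pi * np.linspace(45, 65, 200)
frfs = [sdof_frf(freqs, w, z) * (1 + rng.normal(0, 0.05, freqs.size))
        for w, z in zip(omega_n, zeta)]

# Partial pooling in one line: shrink each blade's naive peak estimate toward the
# population mean; the 0.7/0.3 split stands in for the posterior weighting that a
# hierarchical Bayesian fit would produce when blade-level data are sparse.
peak_est = np.array([freqs[np.argmax(f)] for f in frfs])
shrunk = 0.7 * peak_est + 0.3 * peak_est.mean()
print((shrunk / (2 * np.pi)).round(2), "Hz (pooled natural-frequency estimates)")
```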