cs.LG - 2023-10-24

Task Grouping for Automated Multi-Task Machine Learning via Task Affinity Prediction

  • paper_url: http://arxiv.org/abs/2310.16241
  • repo_url: None
  • paper_authors: Afiya Ayman, Ayan Mukhopadhyay, Aron Laszka
  • for: This study aims to find task groupings that optimize the performance of multi-task learning (MTL) models.
  • methods: Using four widely used benchmark datasets and focusing on neural-network-based MTL models, the authors identify inherent task features and single-task learning (STL) characteristics that predict whether a group of tasks should be learned together with MTL or independently with STL.
  • results: The authors introduce a randomized search algorithm that uses the predictor to minimize the number of MTL trainings performed, and demonstrate on the four benchmark datasets that it finds better task groupings than existing baseline approaches.
    Abstract When a number of similar tasks have to be learned simultaneously, multi-task learning (MTL) models can attain significantly higher accuracy than single-task learning (STL) models. However, the advantage of MTL depends on various factors, such as the similarity of the tasks, the sizes of the datasets, and so on; in fact, some tasks might not benefit from MTL and may even incur a loss of accuracy compared to STL. Hence, the question arises: which tasks should be learned together? Domain experts can attempt to group tasks together following intuition, experience, and best practices, but manual grouping can be labor-intensive and far from optimal. In this paper, we propose a novel automated approach for task grouping. First, we study the affinity of tasks for MTL using four benchmark datasets that have been used extensively in the MTL literature, focusing on neural network-based MTL models. We identify inherent task features and STL characteristics that can help us to predict whether a group of tasks should be learned together using MTL or if they should be learned independently using STL. Building on this predictor, we introduce a randomized search algorithm, which employs the predictor to minimize the number of MTL trainings performed during the search for task groups. We demonstrate on the four benchmark datasets that our predictor-driven search approach can find better task groupings than existing baseline approaches.
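
To make the search procedure concrete, here is a minimal Python sketch of a predictor-driven randomized search over task partitions. The `affinity_predictor` and `train_mtl` callables are placeholders standing in for the paper's learned predictor and an actual MTL training run; the partition sampler and candidate counts are illustrative assumptions, not the authors' exact algorithm.

```python
import random

def random_partition(tasks, rng):
    """Randomly partition the task list into disjoint groups."""
    groups, pool = [], list(tasks)
    rng.shuffle(pool)
    while pool:
        k = rng.randint(1, len(pool))
        groups.append(pool[:k])
        pool = pool[k:]
    return groups

def predictor_driven_search(tasks, affinity_predictor, train_mtl,
                            n_candidates=100, n_trainings=10, seed=0):
    rng = random.Random(seed)
    candidates = [random_partition(tasks, rng) for _ in range(n_candidates)]
    # Rank candidate partitions by predicted affinity so that only the most
    # promising ones are actually trained, keeping the number of MTL runs small.
    candidates.sort(key=lambda p: sum(affinity_predictor(g) for g in p), reverse=True)
    best_partition, best_score = None, float("-inf")
    for partition in candidates[:n_trainings]:
        score = sum(train_mtl(group) for group in partition)  # measured accuracy per group
        if score > best_score:
            best_partition, best_score = partition, score
    return best_partition, best_score
```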

Attention-Based Ensemble Pooling for Time Series Forecasting

  • paper_url: http://arxiv.org/abs/2310.16231
  • repo_url: https://github.com/awikner/denpool
  • paper_authors: Dhruvit Patel, Alexander Wikner
  • for: Reducing bias in time-series forecasting models.
  • methods: Using an ensemble of predictive models and pooling their outputs into an ensemble forecast, with the pooling weights learned by an attention-based ensemble pooling model.
  • results: Excellent valid times in multi-step forecasting of the non-stationary Lorenz '63 equation, but not consistently better than existing ensemble pooling for one-step forecasting of weekly incident COVID-19 deaths.
    Abstract A common technique to reduce model bias in time-series forecasting is to use an ensemble of predictive models and pool their output into an ensemble forecast. In cases where each predictive model has different biases, however, it is not always clear exactly how each model forecast should be weighed during this pooling. We propose a method for pooling that performs a weighted average over candidate model forecasts, where the weights are learned by an attention-based ensemble pooling model. We test this method on two time-series forecasting problems: multi-step forecasting of the dynamics of the non-stationary Lorenz `63 equation, and one-step forecasting of the weekly incident deaths due to COVID-19. We find that while our model achieves excellent valid times when forecasting the non-stationary Lorenz `63 equation, it does not consistently perform better than the existing ensemble pooling when forecasting COVID-19 weekly incident deaths.
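
A minimal PyTorch sketch of the core idea, attention-style weights over candidate forecasts pooled into a weighted average, is shown below. The scoring network conditions only on each member's own forecast value, which is a simplifying assumption; the paper's attention module and its inputs may differ.

```python
import torch
import torch.nn as nn

class AttentionEnsemblePool(nn.Module):
    """Weighted average over ensemble member forecasts with learned attention weights."""
    def __init__(self, d_hidden: int = 32):
        super().__init__()
        # Scores each member's forecast; softmax turns scores into pooling weights.
        self.scorer = nn.Sequential(nn.Linear(1, d_hidden), nn.ReLU(), nn.Linear(d_hidden, 1))

    def forward(self, member_forecasts: torch.Tensor) -> torch.Tensor:
        # member_forecasts: (batch, n_members), one forecast per ensemble member
        scores = self.scorer(member_forecasts.unsqueeze(-1)).squeeze(-1)  # (batch, n_members)
        weights = torch.softmax(scores, dim=-1)
        return (weights * member_forecasts).sum(dim=-1)  # pooled forecast, shape (batch,)

# Usage: pool = AttentionEnsemblePool(); y_hat = pool(torch.randn(8, 5))
```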

Poison is Not Traceless: Fully-Agnostic Detection of Poisoning Attacks

  • paper_url: http://arxiv.org/abs/2310.16224
  • repo_url: None
  • paper_authors: Xinglong Chang, Katharina Dost, Gillian Dobbie, Jörg Wicker
  • for: Detecting poisoning attacks on machine learning models.
  • methods: A novel fully-agnostic detection framework, DIVA (Detecting InVisible Attacks), that analyzes only the potentially poisoned dataset. DIVA detects attacks from the difference between the classifier's accuracy on poisoned data and its estimated accuracy on clean data, using a meta-learner pre-trained on complexity measures.
  • results: In the paper's evaluation on label-flipping attacks, DIVA detects and identifies the attacks effectively.
    Abstract The performance of machine learning models depends on the quality of the underlying data. Malicious actors can attack the model by poisoning the training data. Current detectors are tied to either specific data types, models, or attacks, and therefore have limited applicability in real-world scenarios. This paper presents a novel fully-agnostic framework, DIVA (Detecting InVisible Attacks), that detects attacks solely relying on analyzing the potentially poisoned data set. DIVA is based on the idea that poisoning attacks can be detected by comparing the classifier's accuracy on poisoned and clean data and pre-trains a meta-learner using Complexity Measures to estimate the otherwise unknown accuracy on a hypothetical clean dataset. The framework applies to generic poisoning attacks. For evaluation purposes, in this paper, we test DIVA on label-flipping attacks.
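
A hedged sketch of the accuracy-gap idea follows; `estimate_clean_accuracy` is a placeholder for the meta-learner that DIVA pre-trains on complexity measures, and the decision threshold is an assumption for illustration.

```python
from sklearn.model_selection import cross_val_score

def detect_poisoning(clf, X, y, estimate_clean_accuracy, threshold=0.05):
    """Flag a dataset as poisoned when observed accuracy falls well below the
    accuracy a meta-learner predicts for a hypothetical clean version."""
    observed_acc = cross_val_score(clf, X, y, cv=5).mean()
    expected_clean_acc = estimate_clean_accuracy(X, y)  # placeholder meta-learner
    gap = expected_clean_acc - observed_acc
    return gap > threshold, gap  # a large gap suggests poisoning
```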

Performance Tuning for GPU-Embedded Systems: Machine-Learning-based and Analytical Model-driven Tuning Methodologies

  • paper_url: http://arxiv.org/abs/2310.16214
  • repo_url: None
  • paper_authors: Adrian Perez Dieguez, Margarita Amor Lopez
  • for: Tuning GPU-embedded systems for high performance to meet the demands of real-time or time-consuming applications.
  • methods: Two tuning methodologies: an analytical model-driven methodology and a machine-learning (ML)-based methodology.
  • results: A performance analysis of parallel prefix operations (FFT, scan primitives, tridiagonal system solvers), with practical guidance for developers and researchers optimizing applications on these architectures.
    Abstract GPU-embedded systems have gained popularity across various domains due to their efficient power consumption. However, in order to meet the demands of real-time or time-consuming applications running on these systems, it is crucial for them to be tuned to exhibit high performance. This paper addresses the issue by developing and comparing two tuning methodologies on GPU-embedded systems, and also provides performance insights for developers and researchers seeking to optimize applications running on these architectures. We focus on parallel prefix operations, such as FFT, scan primitives, and tridiagonal system solvers, which are performance-critical components in many applications. The study introduces an analytical model-driven tuning methodology and a Machine Learning (ML)-based tuning methodology. We evaluate the performance of the two tuning methodologies for different parallel prefix implementations of the BPLG library in an NVIDIA Jetson system, and compare their performance to the ones achieved through an exhaustive search. The findings shed light on the best strategies for handling the open challenge of performance portability for major computational patterns among server and embedded devices, providing practical guidance for offline and online tuning. We also address the existing gap in performance studies for parallel computational patterns in GPU-embedded systems by comparing the BPLG performance against other state-of-the-art libraries, including CUSPARSE, CUB, and CUFFT.

ELM Ridge Regression Boosting

  • paper_url: http://arxiv.org/abs/2310.16209
  • repo_url: None
  • paper_authors: M. Andrecut
  • for: Improving the classification performance and robustness of the Extreme Learning Machine (ELM).
  • methods: A boosting approach applied to the Ridge Regression (RR) method.
  • results: Significantly improved classification performance and robustness of ELMs.
    Abstract We discuss a boosting approach for the Ridge Regression (RR) method, with applications to the Extreme Learning Machine (ELM), and we show that the proposed method significantly improves the classification performance and robustness of ELMs.
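
For context, the sketch below shows a plain ELM classifier with a ridge-regression read-out in NumPy, i.e., the RR/ELM baseline that the proposed boosting builds on; the boosting step itself is not reproduced here.

```python
import numpy as np

def elm_ridge_fit(X, Y, n_hidden=200, lam=1e-2, seed=0):
    """X: (n_samples, n_features); Y: one-hot labels (n_samples, n_classes)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))   # random, untrained hidden weights
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                        # hidden-layer activations
    # Ridge-regression read-out: beta = (H^T H + lam I)^{-1} H^T Y
    beta = np.linalg.solve(H.T @ H + lam * np.eye(n_hidden), H.T @ Y)
    return W, b, beta

def elm_predict(X, W, b, beta):
    return (np.tanh(X @ W + b) @ beta).argmax(axis=1)   # predicted class labels
```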

Efficient deep data assimilation with sparse observations and time-varying sensors

  • paper_url: http://arxiv.org/abs/2310.16187
  • repo_url: https://github.com/dl-wg/vivid
  • paper_authors: Sibo Cheng, Che Liu, Yike Guo, Rossella Arcucci
  • for: A new variational data assimilation (DA) method for handling unstructured observation data in high-dimensional dynamical systems.
  • methods: The proposed scheme, Voronoi-tessellation Inverse operator for VariatIonal Data assimilation (VIVID), incorporates a deep-learning (DL) inverse operator into the assimilation objective function. It leverages Voronoi tessellation and convolutional neural networks to handle sparse, unstructured, and time-varying sensor data, and can be combined with Proper Orthogonal Decomposition (POD) into an end-to-end reduced-order DA scheme.
  • results: In numerical experiments on a fluid dynamics system, VIVID significantly outperforms existing DA and DL algorithms. Its robustness is assessed under various levels of prior error, different numbers of sensors, and misspecified error covariance.
    Abstract Variational Data Assimilation (DA) has been broadly used in engineering problems for field reconstruction and prediction by performing a weighted combination of multiple sources of noisy data. In recent years, the integration of deep learning (DL) techniques in DA has shown promise in improving the efficiency and accuracy in high-dimensional dynamical systems. Nevertheless, existing deep DA approaches face difficulties in dealing with unstructured observation data, especially when the placement and number of sensors are dynamic over time. We introduce a novel variational DA scheme, named Voronoi-tessellation Inverse operator for VariatIonal Data assimilation (VIVID), that incorporates a DL inverse operator into the assimilation objective function. By leveraging the capabilities of the Voronoi-tessellation and convolutional neural networks, VIVID is adept at handling sparse, unstructured, and time-varying sensor data. Furthermore, the incorporation of the DL inverse operator establishes a direct link between observation and state space, leading to a reduction in the number of minimization steps required for DA. Additionally, VIVID can be seamlessly integrated with Proper Orthogonal Decomposition (POD) to develop an end-to-end reduced-order DA scheme, which can further expedite field reconstruction. Numerical experiments in a fluid dynamics system demonstrate that VIVID can significantly outperform existing DA and DL algorithms. The robustness of VIVID is also accessed through the application of various levels of prior error, the utilization of varying numbers of sensors, and the misspecification of error covariance in DA.

Image Segmentation using U-Net Architecture for Powder X-ray Diffraction Images

  • paper_url: http://arxiv.org/abs/2310.16186
  • repo_url: None
  • paper_authors: Howard Yanxon, Eric Roberts, Hannah Parraga, James Weng, Wenqian Xu, Uta Ruett, Alexander Hexemer, Petrus Zwart, Nickolas Schwarz
  • for: Examining the crystallographic structures of materials in functional devices with in situ synchrotron high-energy powder X-ray diffraction (XRD).
  • methods: A deep-learning convolutional neural network approach (tunable U-Nets) for identifying artifacts in experimental XRD images.
  • results: The U-Nets achieve 92.4% recall on the test dataset, reduce average false positives by 34% compared with the conventional method, and cut the time needed to identify and separate artifacts by more than 50%.
    Abstract Scientific researchers frequently use the in situ synchrotron high-energy powder X-ray diffraction (XRD) technique to examine the crystallographic structures of materials in functional devices such as rechargeable battery materials. We propose a method for identifying artifacts in experimental XRD images. The proposed method uses deep learning convolutional neural network architectures, such as tunable U-Nets to identify the artifacts. In particular, the predicted artifacts are evaluated against the corresponding ground truth (manually implemented) using the overall true positive rate or recall. The result demonstrates that the U-Nets can consistently produce great recall performance at 92.4% on the test dataset, which is not included in the training, with a 34% reduction in average false positives in comparison to the conventional method. The U-Nets also reduce the time required to identify and separate artifacts by more than 50%. Furthermore, the exclusion of the artifacts shows major changes in the integrated 1D XRD pattern, enhancing further analysis of the post-processing XRD data.

On the Convergence and Sample Complexity Analysis of Deep Q-Networks with $ε$-Greedy Exploration

  • paper_url: http://arxiv.org/abs/2310.16173
  • repo_url: None
  • paper_authors: Shuai Zhang, Hongkang Li, Meng Wang, Miao Liu, Pin-Yu Chen, Songtao Lu, Sijia Liu, Keerthiram Murugesan, Subhajit Chaudhury
  • for: Providing a theoretical understanding of the Deep Q-Network (DQN) with $\epsilon$-greedy exploration in deep reinforcement learning.
  • methods: Analysis of the practical DQN setting, which uses a target network and experience replay to obtain an unbiased estimate of the mean-squared Bellman error (MSBE); the paper gives the first theoretical convergence and sample-complexity analysis of this setting.
  • results: An iterative procedure with decaying $\epsilon$ is proven to converge geometrically to the optimal Q-value function. Higher $\epsilon$ values enlarge the region of convergence but slow convergence, while lower $\epsilon$ values do the opposite; experiments support these theoretical insights.
    Abstract This paper provides a theoretical understanding of Deep Q-Network (DQN) with the $\varepsilon$-greedy exploration in deep reinforcement learning. Despite the tremendous empirical achievement of the DQN, its theoretical characterization remains underexplored. First, the exploration strategy is either impractical or ignored in the existing analysis. Second, in contrast to conventional Q-learning algorithms, the DQN employs the target network and experience replay to acquire an unbiased estimation of the mean-square Bellman error (MSBE) utilized in training the Q-network. However, the existing theoretical analysis of DQNs lacks convergence analysis or bypasses the technical challenges by deploying a significantly overparameterized neural network, which is not computationally efficient. This paper provides the first theoretical convergence and sample complexity analysis of the practical setting of DQNs with $\epsilon$-greedy policy. We prove an iterative procedure with decaying $\epsilon$ converges to the optimal Q-value function geometrically. Moreover, a higher level of $\epsilon$ values enlarges the region of convergence but slows down the convergence, while the opposite holds for a lower level of $\epsilon$ values. Experiments justify our established theoretical insights on DQNs.
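
The exploration scheme analyzed in the paper, $\epsilon$-greedy action selection with a decaying $\epsilon$, can be sketched as follows; the decay schedule and the `q_values` oracle are illustrative assumptions.

```python
import random

def epsilon_greedy(q_values, state, n_actions, step,
                   eps_start=1.0, eps_end=0.05, decay=0.999):
    """Pick an action with decaying-epsilon exploration."""
    eps = max(eps_end, eps_start * decay ** step)      # decaying exploration rate
    if random.random() < eps:
        return random.randrange(n_actions)             # explore
    qs = [q_values(state, a) for a in range(n_actions)]
    return max(range(n_actions), key=lambda a: qs[a])  # exploit the greedy action
```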

Fine tuning Pre trained Models for Robustness Under Noisy Labels

  • paper_url: http://arxiv.org/abs/2310.17668
  • repo_url: None
  • paper_authors: Sumyeong Ahn, Sihyeon Kim, Jongwoo Ko, Se-Young Yun
  • for: Understanding how noisy labels in a training set affect model performance, and how to fine-tune pre-trained models on noisy labeled datasets.
  • methods: An empirical study of how pre-trained models behave on noisy datasets, leading to the TURN algorithm: (1) tune the linear classifier independently to protect the feature extractor from being distorted by noisy labels, then (2) reduce the noisy-label ratio and fine-tune the entire model on the noise-reduced dataset.
  • results: TURN denoises efficiently and improves performance over previous methods across various benchmarks.
    Abstract The presence of noisy labels in a training dataset can significantly impact the performance of machine learning models. To tackle this issue, researchers have explored methods for Learning with Noisy Labels to identify clean samples and reduce the influence of noisy labels. However, constraining the influence of a certain portion of the training dataset can result in a reduction in overall generalization performance. To alleviate this, recent studies have considered the careful utilization of noisy labels by leveraging huge computational resources. Therefore, the increasing training cost necessitates a reevaluation of efficiency. In other areas of research, there has been a focus on developing fine-tuning techniques for large pre-trained models that aim to achieve both high generalization performance and efficiency. However, these methods have mainly concentrated on clean datasets, and there has been limited exploration of the noisy label scenario. In this research, our aim is to find an appropriate way to fine-tune pre-trained models for noisy labeled datasets. To achieve this goal, we investigate the characteristics of pre-trained models when they encounter noisy datasets. Through empirical analysis, we introduce a novel algorithm called TURN, which robustly and efficiently transfers the prior knowledge of pre-trained models. The algorithm consists of two main steps: (1) independently tuning the linear classifier to protect the feature extractor from being distorted by noisy labels, and (2) reducing the noisy label ratio and fine-tuning the entire model based on the noise-reduced dataset to adapt it to the target dataset. The proposed algorithm has been extensively tested and demonstrates efficient yet improved denoising performance on various benchmarks compared to previous methods.
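
A hedged PyTorch-style sketch of the two TURN steps described above follows: linear probing with a frozen feature extractor, then fine-tuning the whole model on a noise-reduced subset. The small-loss filtering rule used here to reduce the noisy-label ratio is an assumption for illustration; the paper's exact selection criterion may differ.

```python
import torch
import torch.nn.functional as F

def turn_finetune(model, head, loader, opt_head, opt_all, keep_ratio=0.7):
    # Step 1: tune only the linear classifier while the feature extractor is frozen.
    for p in model.parameters():
        p.requires_grad_(False)
    for x, y in loader:
        loss = F.cross_entropy(head(model(x)), y)
        opt_head.zero_grad()
        loss.backward()
        opt_head.step()

    # Step 2: keep the lowest-loss samples per batch (assumed cleaner) and
    # fine-tune the entire model on this noise-reduced data.
    for p in model.parameters():
        p.requires_grad_(True)
    for x, y in loader:
        with torch.no_grad():
            per_sample = F.cross_entropy(head(model(x)), y, reduction="none")
        k = max(1, int(keep_ratio * len(y)))
        idx = per_sample.topk(k, largest=False).indices
        loss = F.cross_entropy(head(model(x[idx])), y[idx])
        opt_all.zero_grad()
        loss.backward()
        opt_all.step()
```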

Brainchop: Next Generation Web-Based Neuroimaging Application

  • paper_url: http://arxiv.org/abs/2310.16162
  • repo_url: https://github.com/neuroneural/brainchop
  • paper_authors: Mohamed Masoud, Pratyush Reddy, Farfalla Hu, Sergey Plis
  • for: This paper is written for researchers and practitioners in the field of neuroimaging, particularly those interested in whole brain preprocessing and segmentation using deep learning models.
  • methods: The paper uses a pre-trained full-brain deep learning model to perform volumetric analysis of structural MRI data directly within the browser, without requiring technical expertise or intricate setup procedures. The MeshNet architecture is used to enable client-side processing for volumetric data.
  • results: The paper evaluates the performance of the Brainchop tool across various software and hardware configurations, demonstrating the practicality of client-side processing for volumetric data even within the resource-constrained environment of web browsers. The results show that Brainchop offers multiple benefits, including scalability, low latency, user-friendly operation, cross-platform compatibility, and enhanced accessibility.
    Abstract Performing volumetric image processing directly within the browser, particularly with medical data, presents unprecedented challenges compared to conventional backend tools. These challenges arise from limitations inherent in browser environments, such as constrained computational resources and the availability of frontend machine learning libraries. Consequently, there is a shortage of neuroimaging frontend tools capable of providing comprehensive end-to-end solutions for whole brain preprocessing and segmentation while preserving end-user data privacy and residency. In light of this context, we introduce Brainchop (http://www.brainchop.org) as a groundbreaking in-browser neuroimaging tool that enables volumetric analysis of structural MRI using pre-trained full-brain deep learning models, all without requiring technical expertise or intricate setup procedures. Beyond its commitment to data privacy, this frontend tool offers multiple features, including scalability, low latency, user-friendly operation, cross-platform compatibility, and enhanced accessibility. This paper outlines the processing pipeline of Brainchop and evaluates the performance of models across various software and hardware configurations. The results demonstrate the practicality of client-side processing for volumetric data, owing to the robust MeshNet architecture, even within the resource-constrained environment of web browsers.

Breaking the Curse of Dimensionality in Deep Neural Networks by Learning Invariant Representations

  • paper_url: http://arxiv.org/abs/2310.16154
  • repo_url: None
  • paper_authors: Leonardo Petrini
  • for: Exploring the theoretical foundations of deep learning, in particular how deep models learn useful features from data and how they learn functions in high dimensions.
  • methods: An empirical approach that combines experimental studies with physics-inspired toy models to investigate and interpret the behavior of deep learning systems.
  • results: The efficacy of deep learning is attributed to exploiting the structure of the data (its invariances) rather than sheer data volume, and different architectures exploit different data structures.
    Abstract Artificial intelligence, particularly the subfield of machine learning, has seen a paradigm shift towards data-driven models that learn from and adapt to data. This has resulted in unprecedented advancements in various domains such as natural language processing and computer vision, largely attributed to deep learning, a special class of machine learning models. Deep learning arguably surpasses traditional approaches by learning the relevant features from raw data through a series of computational layers. This thesis explores the theoretical foundations of deep learning by studying the relationship between the architecture of these models and the inherent structures found within the data they process. In particular, we ask What drives the efficacy of deep learning algorithms and allows them to beat the so-called curse of dimensionality-i.e. the difficulty of generally learning functions in high dimensions due to the exponentially increasing need for data points with increased dimensionality? Is it their ability to learn relevant representations of the data by exploiting their structure? How do different architectures exploit different data structures? In order to address these questions, we push forward the idea that the structure of the data can be effectively characterized by its invariances-i.e. aspects that are irrelevant for the task at hand. Our methodology takes an empirical approach to deep learning, combining experimental studies with physics-inspired toy models. These simplified models allow us to investigate and interpret the complex behaviors we observe in deep learning systems, offering insights into their inner workings, with the far-reaching goal of bridging the gap between theory and practice.

FLTrojan: Privacy Leakage Attacks against Federated Language Models Through Selective Weight Tampering

  • paper_url: http://arxiv.org/abs/2310.16152
  • repo_url: None
  • paper_authors: Md Rafi Ur Rashid, Vishnu Asutosh Dasu, Kang Gu, Najrin Sultana, Shagufta Mehnaz
  • for: Investigating privacy leakage in federated learning (FL) of language models, where participants' local datasets contain privacy-sensitive text.
  • methods: Two novel findings: model snapshots from intermediate FL rounds can cause greater privacy leakage than the final trained model, and leakage can be aggravated by tampering with the selective model weights responsible for memorizing the sensitive training data.
  • results: The best-performing method improves membership inference recall by 29% and achieves up to 70% private data reconstruction, clearly outperforming existing attacks.
    Abstract Federated learning (FL) is becoming a key component in many technology-based applications including language modeling -- where individual FL participants often have privacy-sensitive text data in their local datasets. However, realizing the extent of privacy leakage in federated language models is not straightforward and the existing attacks only intend to extract data regardless of how sensitive or naive it is. To fill this gap, in this paper, we introduce two novel findings with regard to leaking privacy-sensitive user data from federated language models. Firstly, we make a key observation that model snapshots from the intermediate rounds in FL can cause greater privacy leakage than the final trained model. Secondly, we identify that privacy leakage can be aggravated by tampering with a model's selective weights that are specifically responsible for memorizing the sensitive training data. We show how a malicious client can leak the privacy-sensitive data of some other user in FL even without any cooperation from the server. Our best-performing method improves the membership inference recall by 29% and achieves up to 70% private data reconstruction, evidently outperforming existing attacks with stronger assumptions of adversary capabilities.

A Risk-Averse Framework for Non-Stationary Stochastic Multi-Armed Bandits

  • paper_url: http://arxiv.org/abs/2310.19821
  • repo_url: None
  • paper_authors: Reda Alami, Mohammed Mahfoud, Mastane Achab
  • for: A risk-averse multi-armed bandit framework for non-stationary environments, targeting learning problems in highly volatile settings such as healthcare or finance.
  • methods: The framework uses risk measures common in the literature to map multiple families of multi-armed bandit algorithms into a risk-sensitive setting, equips the resulting algorithms with the Restarted Bayesian Online Change-Point Detection (R-BOCPD) algorithm, and adds a tunable forced-exploration strategy to detect local (per-arm) switches.
  • results: The framework has finite-time theoretical guarantees and an asymptotic regret bound of order $\tilde O(\sqrt{K_T T})$ up to time horizon $T$, with $K_T$ the total number of change-points; in practice it compares favorably to the state of the art in both synthetic and real-world environments while handling risk-sensitivity and non-stationarity efficiently.
    Abstract In a typical stochastic multi-armed bandit problem, the objective is often to maximize the expected sum of rewards over some time horizon $T$. While the choice of a strategy that accomplishes that is optimal with no additional information, it is no longer the case when provided additional environment-specific knowledge. In particular, in areas of high volatility like healthcare or finance, a naive reward maximization approach often does not accurately capture the complexity of the learning problem and results in unreliable solutions. To tackle problems of this nature, we propose a framework of adaptive risk-aware strategies that operate in non-stationary environments. Our framework incorporates various risk measures prevalent in the literature to map multiple families of multi-armed bandit algorithms into a risk-sensitive setting. In addition, we equip the resulting algorithms with the Restarted Bayesian Online Change-Point Detection (R-BOCPD) algorithm and impose a (tunable) forced exploration strategy to detect local (per-arm) switches. We provide finite-time theoretical guarantees and an asymptotic regret bound of order $\tilde O(\sqrt{K_T T})$ up to time horizon $T$ with $K_T$ the total number of change-points. In practice, our framework compares favorably to the state-of-the-art in both synthetic and real-world environments and manages to perform efficiently with respect to both risk-sensitivity and non-stationarity.

Online Thermal Field Prediction for Metal Additive Manufacturing of Thin Walls

  • paper_url: http://arxiv.org/abs/2310.16125
  • repo_url: None
  • paper_authors: Yifan Tang, M. Rahmani Dehaghani, Pouyan Sajadi, Shahriar Bakrani Balani, Akshay Dhalpe, Suraj Panicker, Di Wu, Eric Coatanea, G. Gary Wang
  • for: This paper studies a practical issue in metal AM: how to predict the thermal field of yet-to-print parts online when only a few sensors are available.
  • methods: An online thermal field prediction method using mapping and reconstruction, which incorporates an artificial neural network and a reduced-order model (ROM) to estimate the temperature profiles of points on the yet-to-print layer.
  • results: The proposed method can construct the thermal field of a yet-to-print layer within 0.1 seconds on a low-cost desktop, and has acceptable generalization capability in most cases, from lower layers to higher layers in the same simulation and from one simulation to a new simulation with different AM process parameters.
    Abstract This paper aims to study a practical issue in metal AM, i.e., how to predict the thermal field of yet-to-print parts online when only a few sensors are available. This work proposes an online thermal field prediction method using mapping and reconstruction, which could be integrated into a metal AM process for online performance control. Based on the similarity of temperature curves (curve segments of a temperature profile of one point), the thermal field mapping applies an artificial neural network to estimate the temperature curves of points on the yet-to-print layer from measured temperatures of certain points on the previously printed layer. With measured/predicted temperature profiles of several points on the same layer, the thermal field reconstruction proposes a reduced order model (ROM) to construct the temperature profiles of all points on the same layer, which could be used to build the temperature field of the entire layer. The training of ROM is performed with an extreme learning machine (ELM) for computational efficiency. Fifteen wire arc AM experiments and nine simulations are designed for thin walls with a fixed length and unidirectional printing of each layer. The test results indicate that the proposed prediction method could construct the thermal field of a yet-to-print layer within 0.1 seconds on a low-cost desktop. Meanwhile, the method has acceptable generalization capability in most cases from lower layers to higher layers in the same simulation and from one simulation to a new simulation on different AM process parameters. More importantly, after fine-tuning the proposed method with limited experimental data, the relative errors of all predicted temperature profiles on a new experiment are sufficiently small, demonstrating the applicability and generalization of the proposed thermal field prediction method in online applications for metal AM.

Anchor Space Optimal Transport: Accelerating Batch Processing of Multiple OT Problems

  • paper_url: http://arxiv.org/abs/2310.16123
  • repo_url: None
  • paper_authors: Jianming Huang, Xun Su, Zhongxi Fang, Hiroyuki Kasai
  • for: Batch processing of solutions to multiple optimal transport (OT) problems.
  • methods: The anchor space optimal transport (ASOT) problem maps the distributions into a shared anchor point space that captures their common characteristics; three methods are proposed for learning the anchor space, each with its own application background.
  • results: Experiments on real-world datasets show that the proposed methods greatly reduce computational time while maintaining reasonable approximation performance.
    Abstract The optimal transport (OT) theory provides an effective way to compare probability distributions on a defined metric space, but it suffers from cubic computational complexity. Although the Sinkhorn's algorithm greatly reduces the computational complexity of OT solutions, the solutions of multiple OT problems are still time-consuming and memory-comsuming in practice. However, many works on the computational acceleration of OT are usually based on the premise of a single OT problem, ignoring the potential common characteristics of the distributions in a mini-batch. Therefore, we propose a translated OT problem designated as the anchor space optimal transport (ASOT) problem, which is specially designed for batch processing of multiple OT problem solutions. For the proposed ASOT problem, the distributions will be mapped into a shared anchor point space, which learns the potential common characteristics and thus help accelerate OT batch processing. Based on the proposed ASOT, the Wasserstein distance error to the original OT problem is proven to be bounded by ground cost errors. Building upon this, we propose three methods to learn an anchor space minimizing the distance error, each of which has its application background. Numerical experiments on real-world datasets show that our proposed methods can greatly reduce computational time while maintaining reasonable approximation performance.
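
For reference, the sketch below is the standard Sinkhorn iteration for entropy-regularized OT in NumPy, the per-problem computation whose batch cost ASOT aims to amortize; it is not the paper's anchor-space formulation.

```python
import numpy as np

def sinkhorn_cost(a, b, cost, eps=0.05, n_iter=200):
    """a, b: source/target histograms; cost: pairwise ground-cost matrix."""
    K = np.exp(-cost / eps)                 # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)                   # alternating scaling updates
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]         # approximate transport plan
    return (P * cost).sum()                 # transport part of the entropic OT cost
```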

19 Parameters Is All You Need: Tiny Neural Networks for Particle Physics

  • paper_url: http://arxiv.org/abs/2310.16121
  • repo_url: https://github.com/abogatskiy/pelican-nano
  • paper_authors: Alexander Bogatskiy, Timothy Hoffman, Jan T. Offermann
  • for: Exploring lightweight, fast neural network architectures for low-latency tasks such as triggering.
  • methods: Instances of the recent Lorentz- and permutation-symmetric PELICAN architecture with as few as 19 trainable parameters.
  • results: These tiny PELICAN models outperform generic architectures with tens of thousands of parameters on the binary classification task of top-quark jet tagging.
    Abstract As particle accelerators increase their collision rates, and deep learning solutions prove their viability, there is a growing need for lightweight and fast neural network architectures for low-latency tasks such as triggering. We examine the potential of one recent Lorentz- and permutation-symmetric architecture, PELICAN, and present its instances with as few as 19 trainable parameters that outperform generic architectures with tens of thousands of parameters when compared on the binary classification task of top quark jet tagging.

Compressed representation of brain genetic transcription

  • paper_url: http://arxiv.org/abs/2310.16113
  • repo_url: None
  • paper_authors: James K Ruffle, Henry Watkins, Robert J Gray, Harpreet Hyare, Michel Thiebaut de Schotten, Parashkev Nachev
  • for: Efficiently compressing large-scale brain gene-expression data to support exploration of brain organization and function.
  • methods: A systematic comparison of widely used linear and non-linear methods, including PCA, kernel PCA, NMF, t-SNE, UMAP, and deep auto-encoding, on whole-brain, voxel-wise Allen Brain Atlas transcription data.
  • results: Deep auto-encoders yield the best reconstruction fidelity, anatomical coherence, and predictive utility, supporting their use as the reference standard for representing transcription patterns in the human brain.
    Abstract The architecture of the brain is too complex to be intuitively surveyable without the use of compressed representations that project its variation into a compact, navigable space. The task is especially challenging with high-dimensional data, such as gene expression, where the joint complexity of anatomical and transcriptional patterns demands maximum compression. Established practice is to use standard principal component analysis (PCA), whose computational felicity is offset by limited expressivity, especially at great compression ratios. Employing whole-brain, voxel-wise Allen Brain Atlas transcription data, here we systematically compare compressed representations based on the most widely supported linear and non-linear methods-PCA, kernel PCA, non-negative matrix factorization (NMF), t-stochastic neighbour embedding (t-SNE), uniform manifold approximation and projection (UMAP), and deep auto-encoding-quantifying reconstruction fidelity, anatomical coherence, and predictive utility with respect to signalling, microstructural, and metabolic targets. We show that deep auto-encoders yield superior representations across all metrics of performance and target domains, supporting their use as the reference standard for representing transcription patterns in the human brain.

Decentralized Learning over Wireless Networks with Broadcast-Based Subgraph Sampling

  • paper_url: http://arxiv.org/abs/2310.16106
  • repo_url: None
  • paper_authors: Daniel Pérez Herrera, Zheng Chen, Erik G. Larsson
  • for: The communication aspects of decentralized learning over wireless networks with consensus-based decentralized stochastic gradient descent (D-SGD); given the communication cost or delay of in-network information exchange, the goal is fast convergence measured as improvement per transmission slot.
  • methods: BASS, an efficient communication framework for D-SGD over wireless networks with broadcast transmission and probabilistic subgraph sampling. In each iteration, multiple subsets of non-interfering nodes are activated to broadcast model updates to their neighbors; the subsets are randomly activated over time with probabilities reflecting their importance for network connectivity, subject to a communication cost constraint (e.g., the average number of transmission slots per iteration), and only bidirectional links are kept during the consensus update to maintain communication symmetry.
  • results: Compared with existing link-based scheduling methods, the inherent broadcast nature of wireless channels speeds up convergence of decentralized learning by creating more communicated links with the same number of transmission slots.
    Abstract This work centers on the communication aspects of decentralized learning over wireless networks, using consensus-based decentralized stochastic gradient descent (D-SGD). Considering the actual communication cost or delay caused by in-network information exchange in an iterative process, our goal is to achieve fast convergence of the algorithm measured by improvement per transmission slot. We propose BASS, an efficient communication framework for D-SGD over wireless networks with broadcast transmission and probabilistic subgraph sampling. In each iteration, we activate multiple subsets of non-interfering nodes to broadcast model updates to their neighbors. These subsets are randomly activated over time, with probabilities reflecting their importance in network connectivity and subject to a communication cost constraint (e.g., the average number of transmission slots per iteration). During the consensus update step, only bi-directional links are effectively preserved to maintain communication symmetry. In comparison to existing link-based scheduling methods, the inherent broadcasting nature of wireless channels offers intrinsic advantages in speeding up convergence of decentralized learning by creating more communicated links with the same number of transmission slots.
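
A minimal NumPy sketch of one consensus-based D-SGD iteration is given below: each node mixes its neighbors' models through a mixing matrix and then takes a local gradient step. The mixing matrix and gradient oracles are placeholders; BASS's broadcast scheduling and subgraph sampling, which decide which links are active each slot, are not modeled here.

```python
import numpy as np

def dsgd_step(models, mixing_matrix, local_grads, lr=0.1):
    """models: (n_nodes, dim); mixing_matrix: (n_nodes, n_nodes), row-stochastic;
    local_grads: list of callables returning each node's stochastic gradient."""
    mixed = mixing_matrix @ models                                   # consensus (averaging) step
    grads = np.stack([g(models[i]) for i, g in enumerate(local_grads)])
    return mixed - lr * grads                                        # local SGD step
```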

Locally Differentially Private Gradient Tracking for Distributed Online Learning over Directed Graphs

  • paper_url: http://arxiv.org/abs/2310.16105
  • repo_url: None
  • paper_authors: Ziqin Chen, Yongqiang Wang
  • for: addresses the tradeoff between learning accuracy and privacy in distributed online learning over directed graphs.
  • methods: proposes a locally differentially private, gradient-tracking-based distributed online learning algorithm that ensures rigorous local differential privacy while converging in mean square to the exact optimal solution.
  • results: outperforms existing counterparts in both training and testing accuracy, with a guaranteed finite cumulative privacy budget even as the number of iterations tends to infinity.
    Abstract Distributed online learning has been proven extremely effective in solving large-scale machine learning problems over streaming data. However, information sharing between learners in distributed learning also raises concerns about the potential leakage of individual learners' sensitive data. To mitigate this risk, differential privacy, which is widely regarded as the "gold standard" for privacy protection, has been widely employed in many existing results on distributed online learning. However, these results often face a fundamental tradeoff between learning accuracy and privacy. In this paper, we propose a locally differentially private gradient tracking based distributed online learning algorithm that successfully circumvents this tradeoff. We prove that the proposed algorithm converges in mean square to the exact optimal solution while ensuring rigorous local differential privacy, with the cumulative privacy budget guaranteed to be finite even when the number of iterations tends to infinity. The algorithm is applicable even when the communication graph among learners is directed. To the best of our knowledge, this is the first result that simultaneously ensures learning accuracy and rigorous local differential privacy in distributed online learning over directed graphs. We evaluate our algorithm's performance by using multiple benchmark machine-learning applications, including logistic regression of the "Mushrooms" dataset and CNN-based image classification of the "MNIST" and "CIFAR-10" datasets, respectively. The experimental results confirm that the proposed algorithm outperforms existing counterparts in both training and testing accuracies.

Contextual Bandits for Evaluating and Improving Inventory Control Policies

  • paper_url: http://arxiv.org/abs/2310.16096
  • repo_url: None
  • paper_authors: Dean Foster, Randy Jia, Dhruv Madeka
  • for: addresses the periodic review inventory control problem with nonstationary random demand, lost sales, and stochastic vendor lead times.
  • methods: uses a light-weight contextual bandit-based algorithm to evaluate and occasionally tweak policies.
  • results: achieves favorable guarantees both theoretically and in empirical studies.
    Abstract Solutions to address the periodic review inventory control problem with nonstationary random demand, lost sales, and stochastic vendor lead times typically involve making strong assumptions on the dynamics for either approximation or simulation, and applying methods such as optimization, dynamic programming, or reinforcement learning. Therefore, it is important to analyze and evaluate any inventory control policy, in particular to see if there is room for improvement. We introduce the concept of an equilibrium policy, a desirable property of a policy that intuitively means that, in hindsight, changing only a small fraction of actions does not result in materially more reward. We provide a light-weight contextual bandit-based algorithm to evaluate and occasionally tweak policies, and show that this method achieves favorable guarantees, both theoretically and in empirical studies.

A Unified, Scalable Framework for Neural Population Decoding

  • paper_url: http://arxiv.org/abs/2310.16046
  • repo_url: None
  • paper_authors: Mehdi Azabou, Vinam Arora, Venkataramana Ganesh, Ximeng Mao, Santosh Nachimuthu, Michael J. Mendelson, Blake Richards, Matthew G. Perich, Guillaume Lajoie, Eva L. Dyer
  • for: A deep-learning approach for analyzing large-scale neural activity data across diverse neural recordings.
  • methods: Individual spikes are tokenized to capture the fine temporal structure of neural activity; cross-attention and a PerceiverIO backbone then build a latent tokenization of neural population activity, which is used to pretrain the model.
  • results: A large-scale multi-session model is pretrained on recordings from seven non-human primates spanning 158 sessions, over 27,373 neural units, and more than 100 hours of data; the pretrained model can be rapidly adapted to new, unseen sessions with unspecified neuron correspondence, achieving few-shot performance with minimal labels.
    Abstract Our ability to use deep learning approaches to decipher neural activity would likely benefit from greater scale, in terms of both model size and datasets. However, the integration of many neural recordings into one unified model is challenging, as each recording contains the activity of different neurons from different individual animals. In this paper, we introduce a training framework and architecture designed to model the population dynamics of neural activity across diverse, large-scale neural recordings. Our method first tokenizes individual spikes within the dataset to build an efficient representation of neural events that captures the fine temporal structure of neural activity. We then employ cross-attention and a PerceiverIO backbone to further construct a latent tokenization of neural population activities. Utilizing this architecture and training framework, we construct a large-scale multi-session model trained on large datasets from seven nonhuman primates, spanning over 158 different sessions of recording from over 27,373 neural units and over 100 hours of recordings. In a number of different tasks, we demonstrate that our pretrained model can be rapidly adapted to new, unseen sessions with unspecified neuron correspondence, enabling few-shot performance with minimal labels. This work presents a powerful new approach for building deep learning tools to analyze neural data and stakes out a clear path to training at scale.

TimewarpVAE: Simultaneous Time-Warping and Representation Learning of Trajectories

  • paper_url: http://arxiv.org/abs/2310.16027
  • repo_url: https://github.com/travers-rhodes/timewarpvae
  • paper_authors: Travers Rhodes, Daniel D. Lee
  • for: Learning effective representations of human demonstration trajectories for complex tasks.
  • methods: TimewarpVAE, a fully differentiable manifold-learning algorithm that incorporates dynamic time warping (DTW) to simultaneously learn timing variations and latent factors of spatial variation.
  • results: On small handwriting and fork-manipulation datasets, TimewarpVAE learns appropriate time alignments and meaningful representations of spatial variation, attains lower spatial reconstruction test error than baseline approaches, and its low-dimensional representations can be used to efficiently generate semantically meaningful novel trajectories.
    Abstract Human demonstrations of trajectories are an important source of training data for many machine learning problems. However, the difficulty of collecting human demonstration data for complex tasks makes learning efficient representations of those trajectories challenging. For many problems, such as for handwriting or for quasistatic dexterous manipulation, the exact timings of the trajectories should be factored from their spatial path characteristics. In this work, we propose TimewarpVAE, a fully differentiable manifold-learning algorithm that incorporates Dynamic Time Warping (DTW) to simultaneously learn both timing variations and latent factors of spatial variation. We show how the TimewarpVAE algorithm learns appropriate time alignments and meaningful representations of spatial variations in small handwriting and fork manipulation datasets. Our results have lower spatial reconstruction test error than baseline approaches and the learned low-dimensional representations can be used to efficiently generate semantically meaningful novel trajectories.
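
Since the method is built around time warping, the sketch below shows classic dynamic time warping in NumPy as a point of reference; TimewarpVAE uses a differentiable formulation inside a VAE, which is not reproduced here.

```python
import numpy as np

def dtw_distance(x, y):
    """x: (n, d), y: (m, d) trajectories; returns the DTW alignment cost."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(x[i - 1] - y[j - 1])
            # Extend the cheapest of the three admissible alignment moves.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```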

Practical Computational Power of Linear Transformers and Their Recurrent and Self-Referential Extensions

  • paper_url: http://arxiv.org/abs/2310.16076
  • repo_url: https://github.com/idsia/fwp-formal-lang
  • paper_authors: Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber
  • for: Studying the practical computational power of linear Transformers and their recurrent and self-referential extensions within the hierarchy of RNN architectures, under real-time and finite-precision assumptions.
  • methods: Auto-regressive Transformers with linearised attention, a.k.a. linear Transformers (LTs) or Fast Weight Programmers (FWPs), which are equivalent to RNN-like sequence processors with a fixed-size state while also being expressible as self-attention networks; formal language recognition experiments on LTs and their extensions.
  • results: Many well-known results for the standard Transformer transfer directly to LTs/FWPs, and extensions such as recurrent FWPs and self-referential weight matrices overcome certain limitations of the LT, e.g., allowing generalisation on the parity problem. The code is public.
    Abstract Recent studies of the computational power of recurrent neural networks (RNNs) reveal a hierarchy of RNN architectures, given real-time and finite-precision assumptions. Here we study auto-regressive Transformers with linearised attention, a.k.a. linear Transformers (LTs) or Fast Weight Programmers (FWPs). LTs are special in the sense that they are equivalent to RNN-like sequence processors with a fixed-size state, while they can also be expressed as the now-popular self-attention networks. We show that many well-known results for the standard Transformer directly transfer to LTs/FWPs. Our formal language recognition experiments demonstrate how recently proposed FWP extensions such as recurrent FWPs and self-referential weight matrices successfully overcome certain limitations of the LT, e.g., allowing for generalisation on the parity problem. Our code is public.
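
The fast-weight view of linearised attention can be sketched as follows: a fixed-size weight matrix is updated with an outer product of each (key, value) pair and queried at every step. The feature map and normalization are illustrative choices; the recurrent and self-referential FWP extensions studied in the paper are not shown.

```python
import numpy as np

def linear_attention(keys, values, queries, phi=lambda z: np.maximum(z, 0.0) + 1.0):
    """keys, queries: (T, d_k); values: (T, d_v); returns outputs of shape (T, d_v)."""
    d_k, d_v = keys.shape[1], values.shape[1]
    W = np.zeros((d_v, d_k))               # fast-weight matrix: the fixed-size state
    z = np.zeros(d_k)                      # running normalizer
    outputs = []
    for k, v, q in zip(keys, values, queries):
        fk = phi(k)
        W += np.outer(v, fk)               # "program" the fast weights with (key, value)
        z += fk
        fq = phi(q)
        outputs.append(W @ fq / (z @ fq + 1e-9))
    return np.stack(outputs)
```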

MLFMF: Data Sets for Machine Learning for Mathematical Formalization

  • paper_url: http://arxiv.org/abs/2310.16005
  • repo_url: https://github.com/ul-fmf/mlfmf-data
  • paper_authors: Andrej Bauer, Matej Petković, Ljupčo Todorovski
  • for: A collection of datasets for benchmarking recommendation systems that support the formalization of mathematics with proof assistants.
  • methods: Each dataset is derived from a library of formalized mathematics written in the proof assistants Agda or Lean and is represented in two ways: as a heterogeneous network and as a list of s-expressions of the syntax trees of all entries.
  • results: Baseline results are reported with standard graph and word embeddings, tree ensembles, and instance-based learning algorithms; with more than 250,000 entries, MLFMF provides solid benchmarking support for machine learning approaches to formalized mathematics.
    Abstract We introduce MLFMF, a collection of data sets for benchmarking recommendation systems used to support formalization of mathematics with proof assistants. These systems help humans identify which previous entries (theorems, constructions, datatypes, and postulates) are relevant in proving a new theorem or carrying out a new construction. Each data set is derived from a library of formalized mathematics written in proof assistants Agda or Lean. The collection includes the largest Lean~4 library Mathlib, and some of the largest Agda libraries: the standard library, the library of univalent mathematics Agda-unimath, and the TypeTopology library. Each data set represents the corresponding library in two ways: as a heterogeneous network, and as a list of s-expressions representing the syntax trees of all the entries in the library. The network contains the (modular) structure of the library and the references between entries, while the s-expressions give complete and easily parsed information about every entry. We report baseline results using standard graph and word embeddings, tree ensembles, and instance-based learning algorithms. The MLFMF data sets provide solid benchmarking support for further investigation of the numerous machine learning approaches to formalized mathematics. The methodology used to extract the networks and the s-expressions readily applies to other libraries, and is applicable to other proof assistants. With more than $250\,000$ entries in total, this is currently the largest collection of formalized mathematical knowledge in machine learnable format.

White-box Compiler Fuzzing Empowered by Large Language Models

  • paper_url: http://arxiv.org/abs/2310.15991
  • repo_url: None
  • paper_authors: Chenyuan Yang, Yinlin Deng, Runyu Lu, Jiayi Yao, Jiawei Liu, Reyhaneh Jabbarvand, Lingming Zhang
  • for: compiler testing, specifically white-box fuzzing of compiler optimizations
  • methods: uses Large Language Models (LLMs) with source-code information to generate high-quality tests for exercising deep optimizations
  • results: found 96 bugs, with 80 confirmed as previously unknown and 51 already fixed, demonstrating the effectiveness of WhiteFox in discovering previously undiscovered compiler bugs.
    Abstract Compiler correctness is crucial, as miscompilation falsifying the program behaviors can lead to serious consequences. In the literature, fuzzing has been extensively studied to uncover compiler defects. However, compiler fuzzing remains challenging: Existing arts focus on black- and grey-box fuzzing, which generates tests without sufficient understanding of internal compiler behaviors. As such, they often fail to construct programs to exercise conditions of intricate optimizations. Meanwhile, traditional white-box techniques are computationally inapplicable to the giant codebase of compilers. Recent advances demonstrate that Large Language Models (LLMs) excel in code generation/understanding tasks and have achieved state-of-the-art performance in black-box fuzzing. Nonetheless, prompting LLMs with compiler source-code information remains a missing piece of research in compiler testing. To this end, we propose WhiteFox, the first white-box compiler fuzzer using LLMs with source-code information to test compiler optimization. WhiteFox adopts a dual-model framework: (i) an analysis LLM examines the low-level optimization source code and produces requirements on the high-level test programs that can trigger the optimization; (ii) a generation LLM produces test programs based on the summarized requirements. Additionally, optimization-triggering tests are used as feedback to further enhance the test generation on the fly. Our evaluation on four popular compilers shows that WhiteFox can generate high-quality tests to exercise deep optimizations requiring intricate conditions, practicing up to 80 more optimizations than state-of-the-art fuzzers. To date, WhiteFox has found in total 96 bugs, with 80 confirmed as previously unknown and 51 already fixed. Beyond compiler testing, WhiteFox can also be adapted for white-box fuzzing of other complex, real-world software systems in general.

Data-driven Traffic Simulation: A Comprehensive Review

  • paper_url: http://arxiv.org/abs/2310.15975
  • repo_url: None
  • paper_authors: Di Chen, Meixin Zhu, Hao Yang, Xuesong Wang, Yinhai Wang
  • for: This paper reviews current research efforts and provides a futuristic perspective on data-driven microscopic traffic simulation for autonomous vehicles.
  • methods: The paper discusses imitation learning, reinforcement learning, generative learning, and deep learning methods, and evaluates their advantages and disadvantages.
  • results: The paper provides a comprehensive evaluation of the state of the art, existing challenges, and future research directions in data-driven microscopic traffic simulation for autonomous vehicles.
    Abstract Autonomous vehicles (AVs) have the potential to significantly revolutionize society by providing a secure and efficient mode of transportation. Recent years have witnessed notable advancements in autonomous driving perception and prediction, but the challenge of validating the performance of AVs remains largely unresolved. Data-driven microscopic traffic simulation has become an important tool for autonomous driving testing due to 1) availability of high-fidelity traffic data; 2) its advantages of enabling large-scale testing and scenario reproducibility; and 3) its potential in reactive and realistic traffic simulation. However, a comprehensive review of this topic is currently lacking. This paper aims to fill this gap by summarizing relevant studies. The primary objective of this paper is to review current research efforts and provide a futuristic perspective that will benefit future developments in the field. It introduces the general issues of data-driven traffic simulation and outlines key concepts and terms. After overviewing traffic simulation, various datasets and evaluation metrics commonly used are reviewed. The paper then offers a comprehensive evaluation of imitation learning, reinforcement learning, generative and deep learning methods, summarizing each and analyzing their advantages and disadvantages in detail. Moreover, it evaluates the state-of-the-art, existing challenges, and future research directions.

Convergence of Sign-based Random Reshuffling Algorithms for Nonconvex Optimization

  • paper_url: http://arxiv.org/abs/2310.15976
  • repo_url: None
  • paper_authors: Zhen Qin, Zhishuai Liu, Pan Xu
  • for: This paper studies sign-based SGD for nonconvex optimization under random reshuffling, where data are shuffled and fed sequentially rather than sampled with replacement, and proves its convergence.
  • methods: The paper analyzes SignRR (random reshuffling), SignRVR (variance-reduced gradients), and SignRVM (momentum updates), establishes their convergence rates, and extends them to distributed variants (dist-SignRVR and dist-SignRVM).
  • results: Experiments on simulated and real-world problems verify that randomly reshuffled sign methods match or surpass existing baselines.
    Abstract signSGD is popular in nonconvex optimization due to its communication efficiency. Yet, existing analyses of signSGD rely on assuming that data are sampled with replacement in each iteration, contradicting the practical implementation where data are randomly reshuffled and sequentially fed into the algorithm. We bridge this gap by proving the first convergence result of signSGD with random reshuffling (SignRR) for nonconvex optimization. Given the dataset size $n$, the number of epochs of data passes $T$, and the variance bound of a stochastic gradient $\sigma^2$, we show that SignRR has the same convergence rate $O(\log(nT)/\sqrt{nT} + \|\sigma\|_1)$ as signSGD \citep{bernstein2018signsgd}. We then present SignRVR and SignRVM, which leverage variance-reduced gradients and momentum updates respectively, both converging at $O(\log(nT)/\sqrt{nT})$. In contrast with the analysis of signSGD, our results do not require an extremely large batch size in each iteration to be of the same order as the total number of iterations \citep{bernstein2018signsgd} or the signs of stochastic and true gradients match element-wise with a minimum probability of 1/2 \citep{safaryan2021stochastic}. We also extend our algorithms to cases where data are distributed across different machines, yielding dist-SignRVR and dist-SignRVM, both converging at $O(\log(n_0T)/\sqrt{n_0T})$, where $n_0$ is the dataset size of a single machine. We back up our theoretical findings through experiments on simulated and real-world problems, verifying that randomly reshuffled sign methods match or surpass existing baselines.
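To make the reshuffling-based sampling concrete, here is a minimal sketch of a SignRR-style update loop: each epoch permutes the dataset once and applies sign-of-gradient steps sequentially, in contrast to with-replacement sampling. The constant learning rate and toy least-squares objective are assumptions for illustration; the variance-reduced (SignRVR), momentum (SignRVM), and distributed variants are not shown.

```python
import numpy as np

def sign_rr(grad_fn, x0, data, epochs=10, lr=1e-3, rng=None):
    """Sketch of signSGD with random reshuffling (SignRR): each epoch shuffles
    the dataset once and applies sign-of-gradient updates sequentially."""
    rng = rng or np.random.default_rng(0)
    x = np.array(x0, dtype=float)
    n = len(data)
    for _ in range(epochs):
        perm = rng.permutation(n)        # random reshuffling, no replacement
        for i in perm:
            g = grad_fn(x, data[i])      # stochastic gradient on one sample
            x -= lr * np.sign(g)         # apply (or communicate) only the sign
    return x

# Toy usage: least-squares on synthetic data (placeholder problem).
rng = np.random.default_rng(1)
A, b = rng.normal(size=(100, 5)), rng.normal(size=100)
grad = lambda x, i: 2 * A[i] * (A[i] @ x - b[i])
x_hat = sign_rr(grad, np.zeros(5), np.arange(100), epochs=50, lr=1e-2, rng=rng)
```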

Minimax Forward and Backward Learning of Evolving Tasks with Performance Guarantees

  • paper_url: http://arxiv.org/abs/2310.15974
  • repo_url: https://github.com/machinelearningbcam/imrcs-for-incremental-learning-neurips-2023
  • paper_authors: Verónica Álvarez, Santiago Mazuelas, Jose A. Lozano
  • for: This paper addresses sequences of classification tasks that arrive over time and evolve, with consecutive tasks often being more similar.
  • methods: It proposes incremental minimax risk classifiers (IMRCs) that effectively exploit forward and backward learning and account for evolving tasks.
  • results: Experiments show that IMRCs can yield significant performance improvements, especially for reduced sample sizes.
    Abstract For a sequence of classification tasks that arrive over time, it is common that tasks are evolving in the sense that consecutive tasks often have a higher similarity. The incremental learning of a growing sequence of tasks holds promise to enable accurate classification even with few samples per task by leveraging information from all the tasks in the sequence (forward and backward learning). However, existing techniques developed for continual learning and concept drift adaptation are either designed for tasks with time-independent similarities or only aim to learn the last task in the sequence. This paper presents incremental minimax risk classifiers (IMRCs) that effectively exploit forward and backward learning and account for evolving tasks. In addition, we analytically characterize the performance improvement provided by forward and backward learning in terms of the tasks' expected quadratic change and the number of tasks. The experimental evaluation shows that IMRCs can result in a significant performance improvement, especially for reduced sample sizes.

Constructing and Machine Learning Calabi-Yau Five-folds

  • paper_url: http://arxiv.org/abs/2310.15966
  • repo_url: None
  • paper_authors: R. Alawadhi, D. Angella, A. Leonardo, T. Schettini Gherardini
  • for: The goal is to construct all possible complete intersection Calabi-Yau five-folds in products of four or fewer complex projective spaces, with up to four constraints, and to study their properties.
  • methods: The authors enumerate configuration matrices (up to row and column permutations), determine the Euler number of all 27068 spaces, and compute cohomological data for 12433 of the non-product spaces, which they release as a dataset.
  • results: They obtain 2375 distinct Hodge diamonds and apply supervised machine learning (classifier and regressor neural networks) to the cohomological data; $h^{1,1}$ is learnt very efficiently with a high $R^2$ score and 96% accuracy, while $h^{1,4}$, $h^{2,3}$ and $\eta$ also reach high $R^2$ scores but lower accuracy due to their large ranges of possible values.
    Abstract We construct all possible complete intersection Calabi-Yau five-folds in a product of four or less complex projective spaces, with up to four constraints. We obtain $27068$ spaces, which are not related by permutations of rows and columns of the configuration matrix, and determine the Euler number for all of them. Excluding the $3909$ product manifolds among those, we calculate the cohomological data for $12433$ cases, i.e. $53.7 \%$ of the non-product spaces, obtaining $2375$ different Hodge diamonds. The dataset containing all the above information is available at https://www.dropbox.com/scl/fo/z7ii5idt6qxu36e0b8azq/h?rlkey=0qfhx3tykytduobpld510gsfy&dl=0 . The distributions of the invariants are presented, and a comparison with the lower-dimensional analogues is discussed. Supervised machine learning is performed on the cohomological data, via classifier and regressor (both fully connected and convolutional) neural networks. We find that $h^{1,1}$ can be learnt very efficiently, with very high $R^2$ score and an accuracy of $96\%$, i.e. $96 \%$ of the predictions exactly match the correct values. For $h^{1,4},h^{2,3}, \eta$, we also find very high $R^2$ scores, but the accuracy is lower, due to the large ranges of possible values.
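A minimal sketch of the kind of supervised learning described above: a small fully connected network mapping flattened, zero-padded configuration matrices to a prediction of $h^{1,1}$. The padding sizes, class count, and architecture are placeholders; the paper also uses convolutional networks and regressors, which are not reproduced here.

```python
import torch
import torch.nn as nn

class HodgeClassifier(nn.Module):
    """Fully connected classifier mapping a flattened, zero-padded configuration
    matrix to a distribution over possible h^{1,1} values (sketch only;
    max_rows, max_cols, and n_classes are placeholder assumptions)."""
    def __init__(self, max_rows=5, max_cols=8, n_classes=30, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(max_rows * max_cols, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, config_matrix):      # (batch, max_rows, max_cols)
        return self.net(config_matrix)

model = HodgeClassifier()
dummy = torch.zeros(4, 5, 8)               # four padded configuration matrices
logits = model(dummy)                       # (4, n_classes)
```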

Weighted Distance Nearest Neighbor Condensing

  • paper_url: http://arxiv.org/abs/2310.15951
  • repo_url: None
  • paper_authors: Lee-Ad Gottlieb, Timor Sharabi, Roi Weiss
  • for: This paper studies weighted distance nearest neighbor condensing, where weights are assigned to each point of the condensed set and new points are labeled based on their weighted-distance nearest neighbor.
  • methods: The authors introduce this new condensing model and study its theoretical properties, showing it can produce dramatically better condensing than the standard nearest neighbor rule while retaining almost identical generalization bounds.
  • results: They propose a condensing heuristic for the new problem, demonstrate its Bayes consistency, and report promising empirical results.
    Abstract The problem of nearest neighbor condensing has enjoyed a long history of study, both in its theoretical and practical aspects. In this paper, we introduce the problem of weighted distance nearest neighbor condensing, where one assigns weights to each point of the condensed set, and then new points are labeled based on their weighted distance nearest neighbor in the condensed set. We study the theoretical properties of this new model, and show that it can produce dramatically better condensing than the standard nearest neighbor rule, yet is characterized by generalization bounds almost identical to the latter. We then suggest a condensing heuristic for our new problem. We demonstrate Bayes consistency for this heuristic, and also show promising empirical results.
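The labeling rule itself is simple to state; below is a minimal sketch of classifying a query point by its weighted-distance nearest neighbor in the condensed set. How the condensed set and its weights are chosen (the paper's heuristic) is not shown, and the multiplicative weighting is one plausible reading of "weighted distance", not necessarily the paper's exact definition.

```python
import numpy as np

def weighted_nn_label(x, condensed_X, condensed_y, weights):
    """Label a new point by its weighted-distance nearest neighbor: each
    condensed point i scales the Euclidean distance to x by its weight w_i."""
    d = np.linalg.norm(condensed_X - x, axis=1) * weights
    return condensed_y[np.argmin(d)]

# Toy usage with a hand-picked condensed set and weights (placeholders).
X_c = np.array([[0.0, 0.0], [3.0, 3.0]])
y_c = np.array([0, 1])
w = np.array([1.0, 0.5])                  # smaller weight -> larger influence
label = weighted_nn_label(np.array([1.8, 1.8]), X_c, y_c, w)
```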

ABKD: Graph Neural Network Compression with Attention-Based Knowledge Distillation

  • paper_url: http://arxiv.org/abs/2310.15938
  • repo_url: None
  • paper_authors: Anshul Ahluwalia, Rohit Das, Payman Behnam, Alind Khare, Pan Li, Alexey Tumanov
  • for: The goal is to compress graph neural networks (GNNs) via knowledge distillation while limiting the accuracy drop.
  • methods: The paper proposes Attention-Based Knowledge Distillation (ABKD), which uses attention to identify important intermediate teacher-student layer pairs and focuses on aligning their outputs.
  • results: Compared to state-of-the-art approaches, ABKD achieves on average a 1.79% increase in accuracy with a 32.3x compression ratio on OGBN-Mag.
    Abstract Graph Neural Networks (GNNs) have proven to be quite versatile for a variety of applications, including recommendation systems, fake news detection, drug discovery, and even computer vision. Due to the expanding size of graph-structured data, GNN models have also increased in complexity, leading to substantial latency issues. This is primarily attributed to the irregular structure of graph data and its access pattern into memory. The natural solution to reduce latency is to compress large GNNs into small GNNs. One way to do this is via knowledge distillation (KD). However, most KD approaches for GNNs only consider the outputs of the last layers and do not consider the outputs of the intermediate layers of the GNNs; these layers may contain important inductive biases indicated by the graph structure. To address this shortcoming, we propose a novel KD approach to GNN compression that we call Attention-Based Knowledge Distillation (ABKD). ABKD is a KD approach that uses attention to identify important intermediate teacher-student layer pairs and focuses on aligning their outputs. ABKD enables higher compression of GNNs with a smaller accuracy dropoff compared to existing KD approaches. On average, we achieve a 1.79% increase in accuracy with a 32.3x compression ratio on OGBN-Mag, a large graph dataset, compared to state-of-the-art approaches.
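The sketch below shows one way to realise the attention-over-layer-pairs idea: score every (teacher layer, student layer) pair, softmax the scores into attention weights, and use them to weight per-pair alignment losses. The scoring function, the MSE alignment term, and the assumption that features are already projected to a common dimension are simplifications, not ABKD's exact design.

```python
import torch
import torch.nn.functional as F

def attention_weighted_kd_loss(teacher_feats, student_feats):
    """Attention-weighted distillation over all (teacher, student) layer pairs.
    Assumes each entry is a (batch, d) tensor already projected to a common
    dimension d; the pair-scoring and alignment terms are illustrative."""
    pair_scores, pair_losses = [], []
    for t in teacher_feats:
        for s in student_feats:
            t_n, s_n = F.normalize(t, dim=-1), F.normalize(s, dim=-1)
            pair_scores.append((t_n * s_n).sum(-1).mean())   # pair similarity
            pair_losses.append(F.mse_loss(s, t.detach()))    # pair alignment
    attn = torch.softmax(torch.stack(pair_scores), dim=0)    # attention over pairs
    return (attn * torch.stack(pair_losses)).sum()
```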

Online Robust Mean Estimation

  • paper_url: http://arxiv.org/abs/2310.15932
  • repo_url: None
  • paper_authors: Daniel M. Kane, Ilias Diakonikolas, Hanshen Xiao, Sihan Liu
  • for: This paper studies high-dimensional robust mean estimation in an online setting: $n$ sensors measure a common, ongoing phenomenon, an $\epsilon$-fraction of them may behave maliciously, and at each time step $t=1,2,\ldots,T$ the algorithm must commit to an estimate $\mu_t$ of the true mean at time $t$.
  • methods: The authors design online algorithms that approximate the true mean $\mu^\ast := \mathbf{E}[X]$ from the sensors' reports, assuming the uncorrupted samples satisfy the standard $(\epsilon,\delta)$-stability condition.
  • results: They give an efficient online algorithm whose estimates satisfy $\|\mu-\mu^\ast\|_2 = O(\delta \log(T))$ with high probability, nearly matching the $O(\delta)$ $\ell_2$-error of the best offline algorithms, and show that under additional assumptions (most notably that $X$ is a product distribution) inefficient algorithms can achieve error independent of $T$.
    Abstract We study the problem of high-dimensional robust mean estimation in an online setting. Specifically, we consider a scenario where $n$ sensors are measuring some common, ongoing phenomenon. At each time step $t=1,2,\ldots,T$, the $i^{th}$ sensor reports its readings $x^{(i)}_t$ for that time step. The algorithm must then commit to its estimate $\mu_t$ for the true mean value of the process at time $t$. We assume that most of the sensors observe independent samples from some common distribution $X$, but an $\epsilon$-fraction of them may instead behave maliciously. The algorithm wishes to compute a good approximation $\mu$ to the true mean $\mu^\ast := \mathbf{E}[X]$. We note that if the algorithm is allowed to wait until time $T$ to report its estimate, this reduces to the well-studied problem of robust mean estimation. However, the requirement that our algorithm produces partial estimates as the data is coming in substantially complicates the situation. We prove two main results about online robust mean estimation in this model. First, if the uncorrupted samples satisfy the standard condition of $(\epsilon,\delta)$-stability, we give an efficient online algorithm that outputs estimates $\mu_t$, $t \in [T],$ such that with high probability it holds that $\|\mu-\mu^\ast\|_2 = O(\delta \log(T))$, where $\mu = (\mu_t)_{t \in [T]}$. We note that this error bound is nearly competitive with the best offline algorithms, which would achieve $\ell_2$-error of $O(\delta)$. Our second main result shows that with additional assumptions on the input (most notably that $X$ is a product distribution) there are inefficient algorithms whose error does not depend on $T$ at all.
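The online protocol can be illustrated with a small simulation harness: at every step the sensors report, an $\epsilon$-fraction of them maliciously, and the estimator must commit to $\mu_t$ before seeing future data. The coordinate-wise median used here is only a naive stand-in for the paper's stability-based algorithm, and the corruption model is a placeholder.

```python
import numpy as np

def online_mean_protocol(T=200, n=50, d=5, eps=0.1, rng=None):
    """Illustration of the online setting only: n sensors report each step,
    an eps-fraction are corrupted, and an estimate must be committed before
    future data arrive. The coordinate-wise median is NOT the paper's
    stability-based filtering algorithm, just a simple baseline."""
    rng = rng or np.random.default_rng(0)
    mu_star = rng.normal(size=d)
    bad = rng.choice(n, size=int(eps * n), replace=False)
    estimates = []
    for _ in range(T):
        X = mu_star + rng.normal(size=(n, d))     # honest readings
        X[bad] = 100.0                             # malicious sensors
        estimates.append(np.median(X, axis=0))     # commit mu_t now
    return np.array(estimates), mu_star
```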

Climate Change Impact on Agricultural Land Suitability: An Interpretable Machine Learning-Based Eurasia Case Study

  • paper_url: http://arxiv.org/abs/2310.15912
  • repo_url: None
  • paper_authors: Valeriy Shevchenko, Daria Taniushkina, Aleksander Lukashevich, Aleksandr Bulkin, Roland Grinis, Kirill Kovalev, Veronika Narozhnaia, Nazar Sotiriadi, Alexander Krenke, Yury Maximov
  • for: The goal is to predict the impact of climate change on agricultural land suitability and provide policymakers with actionable insights for averting a humanitarian crisis.
  • methods: The authors develop a machine learning approach that predicts the risk of substantial land suitability degradation and changes in irrigation patterns under various carbon emissions scenarios, with a comprehensive feature importance analysis revealing the climate and terrain characteristics that influence land suitability.
  • results: The approach achieves remarkable accuracy on a Central Eurasia case study, offering policymakers valuable guidance such as the provision of additional water and fertilizers.
    Abstract The United Nations has identified improving food security and reducing hunger as essential components of its sustainable development goals. As of 2021, approximately 828 million people worldwide are experiencing hunger and malnutrition, with numerous fatalities reported. Climate change significantly impacts agricultural land suitability, potentially leading to severe food shortages and subsequent social and political conflicts. To address this pressing issue, we have developed a machine learning-based approach to predict the risk of substantial land suitability degradation and changes in irrigation patterns. Our study focuses on Central Eurasia, a region burdened with economic and social challenges. This study represents a pioneering effort in utilizing machine learning methods to assess the impact of climate change on agricultural land suitability under various carbon emissions scenarios. Through comprehensive feature importance analysis, we unveil specific climate and terrain characteristics that exert influence on land suitability. Our approach achieves remarkable accuracy, offering policymakers invaluable insights to facilitate informed decisions aimed at averting a humanitarian crisis, including strategies such as the provision of additional water and fertilizers. This research underscores the tremendous potential of machine learning in addressing global challenges, with a particular emphasis on mitigating hunger and malnutrition.

Neural Collapse in Multi-label Learning with Pick-all-label Loss

  • paper_url: http://arxiv.org/abs/2310.15903
  • repo_url: https://github.com/heimine/nc_mlab
  • paper_authors: Pengyu Li, Yutong Wang, Xiao Li, Qing Qu
  • for: This paper studies the neural collapse (NC) phenomenon of deep neural networks, extending it from multi-class classification to multi-label learning with the pick-all-label loss.
  • methods: Building on prior work restricted to the multi-class setting, the authors prove that a generalized NC phenomenon holds for the pick-all-label formulation under the unconstrained feature model (UFM).
  • results: They discover a combinatorial "tag-wise average" property unique to multi-label learning, in which the feature class-means of multi-label samples are scaled averages of the single-label class-means, and establish global optimality of the pick-all-label cross-entropy risk under the UFM; experiments show improved training efficiency on multi-label datasets.
    Abstract We study deep neural networks for the multi-label classification (MLab) task through the lens of neural collapse (NC). Previous works have been restricted to the multi-class classification setting and discovered a prevalent NC phenomenon comprising of the following properties for the last-layer features: (i) the variability of features within every class collapses to zero, (ii) the set of feature means form an equi-angular tight frame (ETF), and (iii) the last layer classifiers collapse to the feature mean upon some scaling. We generalize the study to multi-label learning, and prove for the first time that a generalized NC phenomenon holds with the "pick-all-label'' formulation. Under the natural analog of the unconstrained feature model (UFM), we establish that the only global classifier of the pick-all-label cross entropy loss display the same ETF geometry which further collapse to multiplicity-1 feature class means. Besides, we discover a combinatorial property in generalized NC which is unique for multi-label learning that we call ``tag-wise average'' property, where the feature class-means of samples with multiple labels are scaled average of the feature class-means of single label tags. Theoretically, we establish global optimality result for the pick-all-label cross-entropy risk for the UFM. Additionally, We also provide empirical evidence to support our investigation into training deep neural networks on multi-label datasets, resulting in improved training efficiency.

Cross-feature Contrastive Loss for Decentralized Deep Learning on Heterogeneous Data

  • paper_url: http://arxiv.org/abs/2310.15890
  • repo_url: https://github.com/aparna-aketi/cross_feature_contrastive_loss
  • paper_authors: Sai Aparna Aketi, Kaushik Roy
  • for: This paper proposes a decentralized learning method for heterogeneous (non-IID) data distributions across agents, which are common in practice.
  • methods: The method combines decentralized learning with data-free knowledge distillation through a contrastive loss on cross-features, i.e., the last hidden layer activations obtained from one agent's data with respect to a neighboring agent's model parameters.
  • results: Experiments on computer vision datasets (CIFAR-10, CIFAR-100, Fashion MNIST, Imagenette, ImageNet), model architectures, and network topologies show a 0.2-4% improvement in test accuracy over existing techniques for decentralized learning on heterogeneous data.
    Abstract The current state-of-the-art decentralized learning algorithms mostly assume the data distribution to be Independent and Identically Distributed (IID). However, in practical scenarios, the distributed datasets can have significantly heterogeneous data distributions across the agents. In this work, we present a novel approach for decentralized learning on heterogeneous data, where data-free knowledge distillation through contrastive loss on cross-features is utilized to improve performance. Cross-features for a pair of neighboring agents are the features (i.e., last hidden layer activations) obtained from the data of an agent with respect to the model parameters of the other agent. We demonstrate the effectiveness of the proposed technique through an exhaustive set of experiments on various Computer Vision datasets (CIFAR-10, CIFAR-100, Fashion MNIST, Imagenette, and ImageNet), model architectures, and network topologies. Our experiments show that the proposed method achieves superior performance (0.2-4% improvement in test accuracy) compared to other existing techniques for decentralized learning on heterogeneous data.
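A minimal sketch of computing cross-features for a pair of neighbouring agents, i.e., the features of one agent's data under the other agent's model parameters, followed by a simple alignment term. The `features` method, the normalisation, and the cosine-style penalty are assumptions standing in for the paper's contrastive formulation.

```python
import torch
import torch.nn.functional as F

def cross_feature_alignment(model_self, model_neighbor, batch):
    """Sketch: pass the local agent's data through its own model and through a
    neighbour's model copy (last hidden layer activations), then pull the local
    features towards the cross-features. The `features` interface and this
    penalty are illustrative assumptions."""
    z_self = model_self.features(batch)              # trains the local model
    with torch.no_grad():
        z_cross = model_neighbor.features(batch)     # neighbour's view, fixed target
    z_self = F.normalize(z_self, dim=-1)
    z_cross = F.normalize(z_cross, dim=-1)
    return 1.0 - (z_self * z_cross).sum(dim=-1).mean()
```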

State Sequences Prediction via Fourier Transform for Representation Learning

  • paper_url: http://arxiv.org/abs/2310.15888
  • repo_url: https://github.com/miralab-ustc/rl-spf
  • paper_authors: Mingxuan Ye, Yufei Kuang, Jie Wang, Rui Yang, Wengang Zhou, Houqiang Li, Feng Wu
  • for: The goal is to improve the data efficiency of deep reinforcement learning (RL) and its performance on complex control tasks.
  • methods: The proposed method, State Sequences Prediction via Fourier Transform (SPF), exploits the frequency domain of state sequences by predicting the Fourier transform of infinite-step future state sequences, extracting the underlying structural information to learn expressive representations efficiently.
  • results: Experiments show that the proposed method outperforms several state-of-the-art algorithms in both sample efficiency and performance.
    Abstract While deep reinforcement learning (RL) has been demonstrated effective in solving complex control tasks, sample efficiency remains a key challenge due to the large amounts of data required for remarkable performance. Existing research explores the application of representation learning for data-efficient RL, e.g., learning predictive representations by predicting long-term future states. However, many existing methods do not fully exploit the structural information inherent in sequential state signals, which can potentially improve the quality of long-term decision-making but is difficult to discern in the time domain. To tackle this problem, we propose State Sequences Prediction via Fourier Transform (SPF), a novel method that exploits the frequency domain of state sequences to extract the underlying patterns in time series data for learning expressive representations efficiently. Specifically, we theoretically analyze the existence of structural information in state sequences, which is closely related to policy performance and signal regularity, and then propose to predict the Fourier transform of infinite-step future state sequences to extract such information. One of the appealing features of SPF is that it is simple to implement while not requiring storage of infinite-step future states as prediction targets. Experiments demonstrate that the proposed method outperforms several state-of-the-art algorithms in terms of both sample efficiency and performance.
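The sketch below shows the core idea of a frequency-domain prediction target: take the future state sequence from each time step and keep its low-frequency Fourier coefficients. The paper predicts the transform of infinite-step futures; the finite horizon and the number of kept coefficients here are simplifying assumptions.

```python
import numpy as np

def fourier_targets(states, horizon=32, k=8):
    """Build frequency-domain prediction targets from a trajectory of states
    (shape (T, d)): for each time step, take the next `horizon` states and keep
    the first k real-FFT coefficients along the time axis (sketch only)."""
    T, d = states.shape
    targets = []
    for t in range(T - horizon):
        window = states[t + 1 : t + 1 + horizon]        # future states
        coeffs = np.fft.rfft(window, axis=0)[:k]        # low-frequency part
        targets.append(np.concatenate([coeffs.real, coeffs.imag]).ravel())
    return np.array(targets)                            # (T - horizon, 2*k*d)
```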

Using Causality-Aware Graph Neural Networks to Predict Temporal Centralities in Dynamic Graphs

  • paper_url: http://arxiv.org/abs/2310.15865
  • repo_url: None
  • paper_authors: Franziska Heeg, Ingo Scholtes
  • for: The paper addresses the computationally expensive calculation of path-based centralities in temporal graphs and explores the prediction of temporal betweenness and closeness centralities from time series data.
  • methods: It applies De Bruijn Graph Neural Networks (DBGNN), a causality-aware graph neural network architecture, to predict temporal path-based centralities.
  • results: Experiments on 13 temporal graphs from biological and social systems show that DBGNN considerably improves the prediction of both betweenness and closeness centrality compared to a static Graph Convolutional Neural Network.
    Abstract Node centralities play a pivotal role in network science, social network analysis, and recommender systems. In temporal data, static path-based centralities like closeness or betweenness can give misleading results about the true importance of nodes in a temporal graph. To address this issue, temporal generalizations of betweenness and closeness have been defined that are based on the shortest time-respecting paths between pairs of nodes. However, a major issue of those generalizations is that the calculation of such paths is computationally expensive. Addressing this issue, we study the application of De Bruijn Graph Neural Networks (DBGNN), a causality-aware graph neural network architecture, to predict temporal path-based centralities in time series data. We experimentally evaluate our approach in 13 temporal graphs from biological and social systems and show that it considerably improves the prediction of both betweenness and closeness centrality compared to a static Graph Convolutional Neural Network.

Improving Event Time Prediction by Learning to Partition the Event Time Space

  • paper_url: http://arxiv.org/abs/2310.15853
  • repo_url: https://github.com/djdprogramming/adfa2
  • paper_authors: Jimmy Hickey, Ricardo Henao, Daniel Wojdyla, Michael Pencina, Matthew M. Engelhard
  • for: The goal is to improve survival analysis methods that predict the probability of event occurrence in pre-specified time intervals, particularly in clinical settings with limited data.
  • methods: The authors develop a method that learns from data a set of cut points partitioning the event time space into a limited number of intervals well suited to the prediction task at hand.
  • results: On two simulated datasets the method recovers intervals matching the underlying generative model, and it improves prediction performance on three real-world observational datasets, including a large, newly harmonized stroke risk prediction dataset; the suggested intervals also facilitate clinical decision-making by enabling more accurate risk prediction.
    Abstract Recently developed survival analysis methods improve upon existing approaches by predicting the probability of event occurrence in each of a number of pre-specified (discrete) time intervals. By avoiding placing strong parametric assumptions on the event density, this approach tends to improve prediction performance, particularly when data are plentiful. However, in clinical settings with limited available data, it is often preferable to judiciously partition the event time space into a limited number of intervals well suited to the prediction task at hand. In this work, we develop a method to learn from data a set of cut points defining such a partition. We show that in two simulated datasets, we are able to recover intervals that match the underlying generative model. We then demonstrate improved prediction performance on three real-world observational datasets, including a large, newly harmonized stroke risk prediction dataset. Finally, we argue that our approach facilitates clinical decision-making by suggesting time intervals that are most appropriate for each task, in the sense that they facilitate more accurate risk prediction.
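Once cut points have been learned, mapping event (or censoring) times to interval indices is a one-liner, as sketched below; the cut-point values are placeholders, and the learning procedure itself, which is the paper's contribution, is not shown.

```python
import numpy as np

def discretize_event_times(times, cut_points):
    """Map continuous event/censoring times to the index of the interval they
    fall into, given an ordered set of learned cut points (sketch only)."""
    cut_points = np.sort(np.asarray(cut_points))
    return np.searchsorted(cut_points, times, side="right")

# e.g. cut points at 30, 90, 365 days define 4 intervals indexed 0..3
bins = discretize_event_times([12.0, 45.0, 400.0], [30, 90, 365])  # -> [0, 1, 3]
```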

Spatial-Temporal Hypergraph Neural Network for Traffic Forecasting

  • paper_url: http://arxiv.org/abs/2310.16070
  • repo_url: None
  • paper_authors: Chengzhi Yao, Zhi Li, Junbo Wang
  • for: traffic forecasting in Intelligent Transportation Systems, to implement rich and varied transportation applications and bring convenient transportation services to people based on collected traffic data.
  • methods: combines road network topology and traffic dynamics to capture high-order spatio-temporal dependencies in traffic data, using a spatial module and a temporal module, including an adaptive MixHop hypergraph ODE network and a hyperedge evolving ODE network.
  • results: superior performance compared to various baselines, as demonstrated through extensive experiments conducted on four real-world traffic datasets.
    Abstract Traffic forecasting, which benefits from mobile Internet development and position technologies, plays a critical role in Intelligent Transportation Systems. It helps to implement rich and varied transportation applications and bring convenient transportation services to people based on collected traffic data. Most existing methods usually leverage graph-based deep learning networks to model the complex road network for traffic forecasting shallowly. Despite their effectiveness, these methods are generally limited in fully capturing high-order spatial dependencies caused by road network topology and high-order temporal dependencies caused by traffic dynamics. To tackle the above issues, we focus on the essence of traffic system and propose STHODE: Spatio-Temporal Hypergraph Neural Ordinary Differential Equation Network, which combines road network topology and traffic dynamics to capture high-order spatio-temporal dependencies in traffic data. Technically, STHODE consists of a spatial module and a temporal module. On the one hand, we construct a spatial hypergraph and leverage an adaptive MixHop hypergraph ODE network to capture high-order spatial dependencies. On the other hand, we utilize a temporal hypergraph and employ a hyperedge evolving ODE network to capture high-order temporal dependencies. Finally, we aggregate the outputs of stacked STHODE layers to mutually enhance the prediction performance. Extensive experiments conducted on four real-world traffic datasets demonstrate the superior performance of our proposed model compared to various baselines.

Localization of Small Leakages in Water Distribution Networks using Concept Drift Explanation Methods

  • paper_url: http://arxiv.org/abs/2310.15830
  • repo_url: None
  • paper_authors: Valerie Vaquet, Fabian Hinder, Kathrin Lammers, Jonas Vaquet, Barbara Hammer
  • for: The goal is to detect and localize leakages, especially small ones, in water distribution networks in order to reduce water loss.
  • methods: Leakage localization is attempted using pressure measurements only: leakages are modeled with Bayesian networks, the system dynamics are analyzed, and the problem is framed through the lens of concept drift, with model-based explanations of drift used for localization.
  • results: Experimental evaluation on realistic benchmark scenarios shows that model-based explanations of concept drift are a promising tool for localizing leakages given limited information about the network.
    Abstract Facing climate change the already limited availability of drinking water will decrease in the future rendering drinking water an increasingly scarce resource. Considerable amounts of it are lost through leakages in water transportation and distribution networks. Leakage detection and localization are challenging problems due to the complex interactions and changing demands in water distribution networks. Especially small leakages are hard to pinpoint yet their localization is vital to avoid water loss over long periods of time. While there exist different approaches to solving the tasks of leakage detection and localization, they are relying on various information about the system, e.g. real-time demand measurements and the precise network topology, which is an unrealistic assumption in many real-world scenarios. In contrast, this work attempts leakage localization using pressure measurements only. For this purpose, first, leakages in the water distribution network are modeled employing Bayesian networks, and the system dynamics are analyzed. We then show how the problem is connected to and can be considered through the lens of concept drift. In particular, we argue that model-based explanations of concept drift are a promising tool for localizing leakages given limited information about the network. The methodology is experimentally evaluated using realistic benchmark scenarios.

One or Two Things We know about Concept Drift – A Survey on Monitoring Evolving Environments

  • paper_url: http://arxiv.org/abs/2310.15826
  • repo_url: None
  • paper_authors: Fabian Hinder, Valerie Vaquet, Barbara Hammer
  • for: This paper provides a literature review of concept drift in unsupervised data streams, a setting particularly relevant for monitoring and anomaly detection, which are directly applicable to many tasks and challenges in engineering.
  • methods: It provides a taxonomy of existing work on drift detection, a systematic overview of drift localization, precise mathematical definitions of the considered problems, and standardized experiments on parametric artificial datasets allowing direct comparison of different strategies.
  • results: The survey systematically analyzes the suitability of different detection and localization schemes, provides guidelines for their usage in real-world scenarios, and discusses the emerging topic of explaining concept drift.
    Abstract The world surrounding us is subject to constant change. These changes, frequently described as concept drift, influence many industrial and technical processes. As they can lead to malfunctions and other anomalous behavior, which may be safety-critical in many scenarios, detecting and analyzing concept drift is crucial. In this paper, we provide a literature review focusing on concept drift in unsupervised data streams. While many surveys focus on supervised data streams, so far, there is no work reviewing the unsupervised setting. However, this setting is of particular relevance for monitoring and anomaly detection which are directly applicable to many tasks and challenges in engineering. This survey provides a taxonomy of existing work on drift detection. Besides, it covers the current state of research on drift localization in a systematic way. In addition to providing a systematic literature review, this work provides precise mathematical definitions of the considered problems and contains standardized experiments on parametric artificial datasets allowing for a direct comparison of different strategies for detection and localization. Thereby, the suitability of different schemes can be analyzed systematically and guidelines for their usage in real-world scenarios can be provided. Finally, there is a section on the emerging topic of explaining concept drift.

Nonlinear dimensionality reduction then and now: AIMs for dissipative PDEs in the ML era

  • paper_url: http://arxiv.org/abs/2310.15816
  • repo_url: None
  • paper_authors: Eleni D. Koronaki, Nikolaos Evangelou, Cristina P. Martin-Linares, Edriss S. Titi, Ioannis G. Kevrekidis
  • for: This study presents purely data-driven workflows for constructing reduced-order models (ROMs) of distributed dynamical systems governed by dissipative PDEs.
  • methods: The ROMs are data-assisted models inspired by and templated upon the theory of Approximate Inertial Manifolds (AIMs), in particular the post-processing Galerkin method of Garcia-Archilla, Novo and Titi; machine learning tools (autoencoders and Diffusion Maps) are used to discover good latent variables when they are not known a priori, circumventing the need for accurate truncated Galerkin projections and closed-form corrections.
  • results: The ROMs can be expressed in (a) theoretical (Fourier coefficients), (b) linear data-driven (POD modes), and/or (c) nonlinear data-driven (Diffusion Maps) coordinates; both black-box and (theoretically-informed, data-corrected) gray-box models are described, the latter needed when truncated Galerkin projections are too inaccurate for post-processing. The framework is illustrated and successfully tested on the Chafee-Infante reaction-diffusion and Kuramoto-Sivashinsky dissipative PDEs.
    Abstract This study presents a collection of purely data-driven workflows for constructing reduced-order models (ROMs) for distributed dynamical systems. The ROMs we focus on, are data-assisted models inspired by, and templated upon, the theory of Approximate Inertial Manifolds (AIMs); the particular motivation is the so-called post-processing Galerkin method of Garcia-Archilla, Novo and Titi. Its applicability can be extended: the need for accurate truncated Galerkin projections and for deriving closed-formed corrections can be circumvented using machine learning tools. When the right latent variables are not a priori known, we illustrate how autoencoders as well as Diffusion Maps (a manifold learning scheme) can be used to discover good sets of latent variables and test their explainability. The proposed methodology can express the ROMs in terms of (a) theoretical (Fourier coefficients), (b) linear data-driven (POD modes) and/or (c) nonlinear data-driven (Diffusion Maps) coordinates. Both Black-Box and (theoretically-informed and data-corrected) Gray-Box models are described; the necessity for the latter arises when truncated Galerkin projections are so inaccurate as to not be amenable to post-processing. We use the Chafee-Infante reaction-diffusion and the Kuramoto-Sivashinsky dissipative partial differential equations to illustrate and successfully test the overall framework.
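As one concrete ingredient of such workflows, the sketch below computes the linear data-driven (POD) coordinates mentioned above from a snapshot matrix via the SVD; the mean-centering choice is an assumption, and the subsequent post-processing Galerkin or gray-box ROM construction is omitted.

```python
import numpy as np

def pod_modes(snapshots, r):
    """Compute r POD modes from a snapshot matrix whose columns are solution
    snapshots of the PDE; returns the leading left singular vectors and
    singular values (sketch only, mean-centered along the snapshot axis)."""
    X = snapshots - snapshots.mean(axis=1, keepdims=True)
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :r], S[:r]
```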

Good Better Best: Self-Motivated Imitation Learning for noisy Demonstrations

  • paper_url: http://arxiv.org/abs/2310.15815
  • repo_url: None
  • paper_authors: Ye Yuan, Xin Li, Yong Heng, Leiji Zhang, MingZhong Wang
  • for: The goal is to perform imitation learning from noisy demonstrations by filtering out demonstrations collected by policies inferior to the current policy, without requiring supplementary information about their expertise.
  • methods: Self-Motivated Imitation LEarning (SMILE) uses the forward and reverse processes of diffusion models to emulate the shift in demonstration expertise and extract the noise information that diffuses expertise; this information is used to predict the diffusion steps between the current policy and the demonstrators, shown to be equivalent to their expertise gap, and to filter noisy demonstrations in a self-motivated manner.
  • results: Empirical evaluations on MuJoCo tasks show that SMILE learns the expert policy amidst noisy demonstrations and effectively filters out demonstrations whose expertise is inferior to the current policy.
    Abstract Imitation Learning (IL) aims to discover a policy by minimizing the discrepancy between the agent's behavior and expert demonstrations. However, IL is susceptible to limitations imposed by noisy demonstrations from non-expert behaviors, presenting a significant challenge due to the lack of supplementary information to assess their expertise. In this paper, we introduce Self-Motivated Imitation LEarning (SMILE), a method capable of progressively filtering out demonstrations collected by policies deemed inferior to the current policy, eliminating the need for additional information. We utilize the forward and reverse processes of Diffusion Models to emulate the shift in demonstration expertise from low to high and vice versa, thereby extracting the noise information that diffuses expertise. Then, the noise information is leveraged to predict the diffusion steps between the current policy and demonstrators, which we theoretically demonstrate its equivalence to their expertise gap. We further explain in detail how the predicted diffusion steps are applied to filter out noisy demonstrations in a self-motivated manner and provide its theoretical grounds. Through empirical evaluations on MuJoCo tasks, we demonstrate that our method is proficient in learning the expert policy amidst noisy demonstrations, and effectively filters out demonstrations with expertise inferior to the current policy.

Amortised Inference in Neural Networks for Small-Scale Probabilistic Meta-Learning

  • paper_url: http://arxiv.org/abs/2310.15786
  • repo_url: None
  • paper_authors: Matthew Ashman, Tommy Rochussen, Adrian Weller
  • for: To perform amortised, meta-learned Bayesian inference over task-specific Bayesian neural networks (BNNs).
  • methods: The global inducing point variational approximation is modified by replacing the inducing inputs with the actual data, so that the variational distribution consists of approximate likelihoods for each datapoint, whose parameters are produced by a meta-model known as the inference network.
  • results: Training the inference network across related datasets meta-learns Bayesian inference over task-specific BNNs.
    Abstract The global inducing point variational approximation for BNNs is based on using a set of inducing inputs to construct a series of conditional distributions that accurately approximate the conditionals of the true posterior distribution. Our key insight is that these inducing inputs can be replaced by the actual data, such that the variational distribution consists of a set of approximate likelihoods for each datapoint. This structure lends itself to amortised inference, in which the parameters of each approximate likelihood are obtained by passing each datapoint through a meta-model known as the inference network. By training this inference network across related datasets, we can meta-learn Bayesian inference over task-specific BNNs.

Robust Learning via Conditional Prevalence Adjustment

  • paper_url: http://arxiv.org/abs/2310.15766
  • repo_url: https://github.com/mnhng/CoPA
  • paper_authors: Minh Nguyen, Alan Q. Wang, Heejong Kim, Mert R. Sabuncu
  • for: This work addresses unstable correlations between confounding variables across sites in healthcare data, which can cause deep learning models to fail catastrophically at unseen sites.
  • methods: The proposed method, CoPA (Conditional Prevalence-Adjustment), targets anti-causal tasks and assumes that (1) the generation mechanism is stable, i.e., label Y and confounding variable(s) Z generate X, and (2) the unstable conditional prevalence at each site E fully accounts for the unstable correlations between X and Y; the prevalence can be readily estimated from (Y, Z) samples, and the method works even with a single training site.
  • results: Experiments on synthetic and real data show CoPA beating competitive baselines.
    Abstract Healthcare data often come from multiple sites in which the correlations between confounding variables can vary widely. If deep learning models exploit these unstable correlations, they might fail catastrophically in unseen sites. Although many methods have been proposed to tackle unstable correlations, each has its limitations. For example, adversarial training forces models to completely ignore unstable correlations, but doing so may lead to poor predictive performance. Other methods (e.g. Invariant risk minimization [4]) try to learn domain-invariant representations that rely only on stable associations by assuming a causal data-generating process (input X causes class label Y ). Thus, they may be ineffective for anti-causal tasks (Y causes X), which are common in computer vision. We propose a method called CoPA (Conditional Prevalence-Adjustment) for anti-causal tasks. CoPA assumes that (1) generation mechanism is stable, i.e. label Y and confounding variable(s) Z generate X, and (2) the unstable conditional prevalence in each site E fully accounts for the unstable correlations between X and Y . Our crucial observation is that confounding variables are routinely recorded in healthcare settings and the prevalence can be readily estimated, for example, from a set of (Y, Z) samples (no need for corresponding samples of X). CoPA can work even if there is a single training site, a scenario which is often overlooked by existing methods. Our experiments on synthetic and real data show CoPA beating competitive baselines.

Analyzing Single Cell RNA Sequencing with Topological Nonnegative Matrix Factorization

  • paper_url: http://arxiv.org/abs/2310.15744
  • repo_url: None
  • paper_authors: Yuta Hozumi, Guo-Wei Wei
  • for: This paper develops nonnegative matrix factorization (NMF) methods for the statistical analysis of single-cell RNA sequencing (scRNA-seq) data, which is high-dimensional, complex, and large-scale.
  • methods: It introduces two persistent Laplacian regularized NMF methods, topological NMF (TNMF) and robust topological NMF (rTNMF), and compares them with other NMF-based methods on 12 datasets.
  • results: TNMF and rTNMF significantly outperform all other NMF-based methods and are also used to enhance the visualization of UMAP and t-SNE embeddings.
    Abstract Single-cell RNA sequencing (scRNA-seq) is a relatively new technology that has stimulated enormous interest in statistics, data science, and computational biology due to the high dimensionality, complexity, and large scale associated with scRNA-seq data. Nonnegative matrix factorization (NMF) offers a unique approach due to its meta-gene interpretation of resulting low-dimensional components. However, NMF approaches suffer from the lack of multiscale analysis. This work introduces two persistent Laplacian regularized NMF methods, namely, topological NMF (TNMF) and robust topological NMF (rTNMF). By employing a total of 12 datasets, we demonstrate that the proposed TNMF and rTNMF significantly outperform all other NMF-based methods. We have also utilized TNMF and rTNMF for the visualization of popular Uniform Manifold Approximation and Projection (UMAP) and t-distributed stochastic neighbor embedding (t-SNE).
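For orientation, here is a sketch of ordinary graph-Laplacian-regularized NMF with multiplicative updates, which TNMF/rTNMF extend by replacing the Laplacian with persistent (multiscale) Laplacians and adding robustness terms; the objective and updates below are the standard ones, not the paper's exact algorithm, and the cell-cell adjacency matrix A is assumed given.

```python
import numpy as np

def laplacian_regularized_nmf(X, A, k=10, lam=1.0, iters=200, eps=1e-9, seed=0):
    """Graph-Laplacian-regularized NMF via multiplicative updates (sketch):
    min_{W,H >= 0} ||X - W H||_F^2 + lam * Tr(H L H^T), with L = D - A over
    cells, X of shape (n_genes, n_cells), W (n_genes, k), H (k, n_cells)."""
    rng = np.random.default_rng(seed)
    n_genes, n_cells = X.shape
    W = rng.random((n_genes, k))
    H = rng.random((k, n_cells))
    D = np.diag(A.sum(axis=1))
    for _ in range(iters):
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        H *= (W.T @ X + lam * H @ A) / (W.T @ W @ H + lam * H @ D + eps)
    return W, H
```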

Improving Diffusion Models for ECG Imputation with an Augmented Template Prior

  • paper_url: http://arxiv.org/abs/2310.15742
  • repo_url: None
  • paper_authors: Alexander Jenkins, Zehua Chen, Fu Siong Ng, Danilo Mandic
  • for: The goal is to improve the imputation and forecasting of missing values in electrocardiogram (ECG) signals with probabilistic time-series models.
  • methods: PulseDiff is a template-guided denoising diffusion probabilistic model conditioned on an informative prior: a subject-level pulsative template extracted from the observation, augmented with beat-level stochastic shift terms and a confidence score reflecting the subject's health condition.
  • results: On the PTBXL dataset, PulseDiff improves two strong DDPM baselines, CSDI and SSSD$^{S4}$; combined with SSSD$^{S4}$, it outperforms the leading deterministic model for short-interval missing data and is comparable for long-interval data loss.
    Abstract Pulsative signals such as the electrocardiogram (ECG) are extensively collected as part of routine clinical care. However, noisy and poor-quality recordings, leading to missing values, are a major issue for signals collected using mobile health systems, decreasing the signal quality and affecting the automated downstream tasks. Recent studies have explored imputation of missing values for ECG with probabilistic time-series models. Nevertheless, in comparison with the deterministic models, their performance is still limited, as the variations across subjects and heart-beat relationships are not explicitly considered in the training objective. In this work, to improve the ECG imputation and forecasting accuracy with probabilistic models, we present an template-guided denoising diffusion probabilistic model, PulseDiff, which is conditioned an informative prior for a range of health conditions. Specifically, 1) we first extract a subject-level pulsative template from the observation as an informative prior of missing values, which captures the personal characteristics; 2) we then add beat-level stochastic shift terms on the template for prior augmentation, which considers the beat-level variance of positioning and amplitude; 3) we finally design a confidence score to consider the health condition of subject, which ensures our prior is provided in a safe way. Experiments with the PTBXL dataset reveal PulseDiff improves the performance of two strong DDPMs baseline models, CSDI and SSSD$^{S4}$, verifying our method guides the generation of DDPMs while managing the uncertainty. When combining with SSSD$^{S4}$, our PulseDiff method outperforms the leading deterministic model for short-interval missing data and is comparable for long-interval data loss.

Causal Representation Learning Made Identifiable by Grouping of Observational Variables

  • paper_url: http://arxiv.org/abs/2310.15709
  • repo_url: None
  • paper_authors: Hiroshi Morioka, Aapo Hyvärinen
  • for: Learning a causal model for hidden features in a data-driven manner, addressing the identifiability problem of Causal Representation Learning (CRL), which is severely ill-posed.
  • methods: The paper proposes a novel approach to CRL based on weak constraints that require no temporal structure, interventions, or weak supervision; instead, it assumes the observational mixing exhibits a suitable grouping of the observational variables, and pairs this with a self-supervised estimation framework.
  • results: The proposed estimator is proven statistically consistent and experimentally shows superior CRL performance compared to state-of-the-art baselines; the paper also demonstrates robustness against latent confounders and causal cycles.
    Abstract A topic of great current interest is Causal Representation Learning (CRL), whose goal is to learn a causal model for hidden features in a data-driven manner. Unfortunately, CRL is severely ill-posed since it is a combination of the two notoriously ill-posed problems of representation learning and causal discovery. Yet, finding practical identifiability conditions that guarantee a unique solution is crucial for its practical applicability. Most approaches so far have been based on assumptions on the latent causal mechanisms, such as temporal causality, or existence of supervision or interventions; these can be too restrictive in actual applications. Here, we show identifiability based on novel, weak constraints, which require no temporal structure, intervention, or weak supervision. The approach is based on assuming that the observational mixing exhibits a suitable grouping of the observational variables. We also propose a novel self-supervised estimation framework consistent with the model, prove its statistical consistency, and experimentally show its superior CRL performance compared to state-of-the-art baselines. We further demonstrate its robustness against latent confounders and causal cycles.

Fixed-Budget Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit

  • paper_url: http://arxiv.org/abs/2310.15681
  • repo_url: None
  • paper_authors: Shintaro Nakamura, Masashi Sugiyama
  • for: Real-valued combinatorial pure exploration of the multi-armed bandit in the fixed-budget setting.
  • methods: The paper first introduces the Combinatorial Successive Assign (CSA) algorithm, the first algorithm that can identify the best action even when the action class is exponentially large in the number of arms. It then introduces the Minimax Combinatorial Successive Accepts and Rejects (Minimax-CombSAR) algorithm for action classes of polynomial size and shows that it is optimal, matching a lower bound.
  • results: Experimental comparisons with previous methods show that the proposed algorithms perform better.
    Abstract We study the real-valued combinatorial pure exploration of the multi-armed bandit in the fixed-budget setting. We first introduce the Combinatorial Successive Assign (CSA) algorithm, which is the first algorithm that can identify the best action even when the size of the action class is exponentially large with respect to the number of arms. We show that the upper bound of the probability of error of the CSA algorithm matches a lower bound up to a logarithmic factor in the exponent. Then, we introduce another algorithm named the Minimax Combinatorial Successive Accepts and Rejects (Minimax-CombSAR) algorithm for the case where the size of the action class is polynomial, and show that it is optimal, which matches a lower bound. Finally, we experimentally compare the algorithms with previous methods and show that our algorithm performs better.

Interactive Generalized Additive Model and Its Applications in Electric Load Forecasting

  • paper_url: http://arxiv.org/abs/2310.15662
  • repo_url: None
  • paper_authors: Linxiao Yang, Rui Ren, Xinyue Gu, Liang Sun
  • for: Forecasting electric load to support power system planning and management.
  • methods: An interactive generalized additive model (GAM) built from piecewise linear functions, which improves forecasting performance when data are limited or unavailable; a sketch of boosting piecewise-linear shape functions follows the abstract below.
  • results: On public benchmarks and electricity datasets, the interactive GAM outperforms current state-of-the-art methods and generalizes well to extreme weather events. The model has been integrated into the eForecaster product together with a user-friendly web-based tool.
    Abstract Electric load forecasting is an indispensable component of electric power system planning and management. Inaccurate load forecasting may lead to the threat of outages or a waste of energy. Accurate electric load forecasting is challenging when there is limited data or even no data, such as load forecasting in holiday, or under extreme weather conditions. As high-stakes decision-making usually follows after load forecasting, model interpretability is crucial for the adoption of forecasting models. In this paper, we propose an interactive GAM which is not only interpretable but also can incorporate specific domain knowledge in electric power industry for improved performance. This boosting-based GAM leverages piecewise linear functions and can be learned through our efficient algorithm. In both public benchmark and electricity datasets, our interactive GAM outperforms current state-of-the-art methods and demonstrates good generalization ability in the cases of extreme weather events. We launched a user-friendly web-based tool based on interactive GAM and already incorporated it into our eForecaster product, a unified AI platform for electricity forecasting.
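For illustration only (not the authors' implementation), the sketch below shows the basic idea of boosting piecewise-linear shape functions, one feature at a time, to build an additive model; the binning scheme, learning rate, and number of rounds are arbitrary assumptions.

```python
import numpy as np

def fit_boosted_gam(X, y, n_rounds=50, n_bins=8, lr=0.1):
    """Sketch of a boosting-based GAM with piecewise-linear shape functions.

    Each round fits, for every (assumed continuous) feature, a piecewise-linear
    function of that feature to the current residuals via binned residual
    means, and adds a shrunken copy of it to the additive model.
    """
    n, d = X.shape
    pred = np.full(n, y.mean())
    shape_funcs = [[] for _ in range(d)]              # list of (knots, values) per feature
    for _ in range(n_rounds):
        for j in range(d):
            resid = y - pred
            knots = np.quantile(X[:, j], np.linspace(0, 1, n_bins + 1))
            centers = 0.5 * (knots[:-1] + knots[1:])
            idx = np.clip(np.digitize(X[:, j], knots[1:-1]), 0, n_bins - 1)
            means = np.array([resid[idx == b].mean() if np.any(idx == b) else 0.0
                              for b in range(n_bins)])
            pred += lr * np.interp(X[:, j], centers, means)   # piecewise-linear in x_j
            shape_funcs[j].append((centers, lr * means))
    return y.mean(), shape_funcs

def predict_gam(intercept, shape_funcs, X):
    out = np.full(X.shape[0], intercept)
    for j, funcs in enumerate(shape_funcs):
        for centers, values in funcs:
            out += np.interp(X[:, j], centers, values)
    return out
```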

Enhancing Traffic Prediction with Learnable Filter Module

  • paper_url: http://arxiv.org/abs/2310.16063
  • repo_url: None
  • paper_authors: Yuanshao Zhu, Yongchao Ye, Xiangyu Zhao, James J. Q. Yu
  • for: Improving the accuracy of traffic prediction models by proposing a learnable filter module that adaptively filters out noise in traffic data.
  • methods: The data are transformed to the frequency domain with the Fourier transform, noise is filtered based on its spectral pattern, and the denoised data are recovered with the inverse Fourier transform; a sketch of such a module follows the abstract below.
  • results: Experiments show that the proposed module effectively mitigates noise and improves traffic prediction performance.
    Abstract Modeling future traffic conditions often relies heavily on complex spatial-temporal neural networks to capture spatial and temporal correlations, which can overlook the inherent noise in the data. This noise, often manifesting as unexpected short-term peaks or drops in traffic observation, is typically caused by traffic accidents or inherent sensor vibration. In practice, such noise can be challenging to model due to its stochastic nature and can lead to overfitting risks if a neural network is designed to learn this behavior. To address this issue, we propose a learnable filter module to filter out noise in traffic data adaptively. This module leverages the Fourier transform to convert the data to the frequency domain, where noise is filtered based on its pattern. The denoised data is then recovered to the time domain using the inverse Fourier transform. Our approach focuses on enhancing the quality of the input data for traffic prediction models, which is a critical yet often overlooked aspect in the field. We demonstrate that the proposed module is lightweight, easy to integrate with existing models, and can significantly improve traffic prediction performance. Furthermore, we validate our approach with extensive experimental results on real-world datasets, showing that it effectively mitigates noise and enhances prediction accuracy.
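A module of this kind can be sketched as a trainable element-wise mask applied in the frequency domain via torch.fft; the shapes, initialization, and placement in front of a downstream predictor below are assumptions, not the paper's released code.

```python
import torch
import torch.nn as nn

class LearnableFilter(nn.Module):
    """Sketch of a frequency-domain filter module for traffic series.

    Input: (batch, seq_len, n_nodes). The module applies rFFT along time,
    multiplies by a learnable complex-valued mask, and inverts the transform,
    so high-frequency noise can be attenuated in a data-driven way.
    """
    def __init__(self, seq_len: int, n_nodes: int):
        super().__init__()
        n_freq = seq_len // 2 + 1
        # Real and imaginary parts of the mask, initialized to the identity filter.
        self.weight = nn.Parameter(torch.stack(
            [torch.ones(n_freq, n_nodes), torch.zeros(n_freq, n_nodes)], dim=-1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        spec = torch.fft.rfft(x, dim=1)                  # (batch, n_freq, n_nodes)
        mask = torch.view_as_complex(self.weight)        # learnable complex mask
        return torch.fft.irfft(spec * mask, n=x.size(1), dim=1)

# Usage: denoise the input before feeding a spatial-temporal predictor.
x = torch.randn(8, 12, 207)            # e.g. 12 time steps, 207 sensors
filt = LearnableFilter(seq_len=12, n_nodes=207)
print(filt(x).shape)                    # torch.Size([8, 12, 207])
```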

Momentum Gradient-based Untargeted Attack on Hypergraph Neural Networks

  • paper_url: http://arxiv.org/abs/2310.15656
  • repo_url: None
  • paper_authors: Yang Chen, Stjepan Picek, Zhonglin Ye, Zhaoyang Wang, Haixing Zhao
  • for: Studying adversarial attacks on hypergraph neural networks (HGNNs). HGNNs perform well on hypergraph-related tasks thanks to their higher-order representation capability, but deep models are vulnerable to adversarial attacks; most existing work targets graph neural networks (GNNs), and attacks on HGNNs remain largely unexplored, a gap this paper aims to fill.
  • methods: A new untargeted attack model for HGNNs, MGHGA, which modifies node features via a surrogate model before hypergraph modeling. MGHGA has two parts: feature selection, which uses a momentum gradient mechanism to choose which node features to attack, and feature modification, which uses two generation approaches (direct modification and sign gradient) so that MGHGA applies to both discrete and continuous datasets; a sketch follows the abstract below.
  • results: Extensive experiments on five benchmark datasets validate the attack on node and visual-object classification tasks, where MGHGA improves attack performance by an average of 2% over the baselines.
    Abstract Hypergraph Neural Networks (HGNNs) have been successfully applied in various hypergraph-related tasks due to their excellent higher-order representation capabilities. Recent works have shown that deep learning models are vulnerable to adversarial attacks. Most studies on graph adversarial attacks have focused on Graph Neural Networks (GNNs), and the study of adversarial attacks on HGNNs remains largely unexplored. In this paper, we try to reduce this gap. We design a new HGNNs attack model for the untargeted attack, namely MGHGA, which focuses on modifying node features. We consider the process of HGNNs training and use a surrogate model to implement the attack before hypergraph modeling. Specifically, MGHGA consists of two parts: feature selection and feature modification. We use a momentum gradient mechanism to choose the attack node features in the feature selection module. In the feature modification module, we use two feature generation approaches (direct modification and sign gradient) to enable MGHGA to be employed on discrete and continuous datasets. We conduct extensive experiments on five benchmark datasets to validate the attack performance of MGHGA in the node and the visual object classification tasks. The results show that MGHGA improves performance by an average of 2% compared to the baselines.
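The following sketch illustrates the general momentum-gradient idea for selecting and modifying node features through a surrogate model (continuous-feature, sign-gradient variant). The surrogate is assumed to take only the feature matrix (with the hypergraph fixed inside), and all hyperparameters are illustrative rather than the paper's settings.

```python
import torch
import torch.nn.functional as F

def momentum_feature_attack(surrogate, X, y, k=10, steps=20, mu=0.9, eps=0.1):
    """Sketch of an MGHGA-style untargeted node-feature perturbation.

    Iteratively accumulates a momentum gradient of the training loss w.r.t.
    the node-feature matrix, selects the k entries with the largest
    accumulated magnitude (feature selection), and moves them in the
    sign-gradient direction (feature modification).
    """
    X_adv = X.clone().detach()
    momentum = torch.zeros_like(X)
    for _ in range(steps):
        X_adv.requires_grad_(True)
        loss = F.cross_entropy(surrogate(X_adv), y)
        grad, = torch.autograd.grad(loss, X_adv)
        momentum = mu * momentum + grad / grad.abs().mean().clamp_min(1e-12)
        X_adv = (X_adv + (eps / steps) * momentum.sign()).detach()

    # Feature selection: entries with the largest accumulated momentum.
    idx = torch.topk(momentum.abs().flatten(), k).indices
    mask = torch.zeros(X.numel(), device=X.device)
    mask[idx] = 1.0
    mask = mask.view_as(X)
    # Feature modification: perturb only the selected entries of the clean X.
    return (X + eps * mask * momentum.sign()).detach()
```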

Deceptive Fairness Attacks on Graphs via Meta Learning

  • paper_url: http://arxiv.org/abs/2310.15653
  • repo_url: https://github.com/jiank2/fate
  • paper_authors: Jian Kang, Yinglong Xia, Ross Maciejewski, Jiebo Luo, Hanghang Tong
  • for: Answering the question of how to launch poisoning attacks on graph learning models that exacerbate bias in a deceptive manner.
  • methods: The attack is formulated as a bi-level optimization problem and solved with a meta learning-based framework (FATE), which applies to different fairness definitions, graph learning models, and arbitrary choices of manipulation operations.
  • results: Experiments on real-world datasets show that FATE can amplify the bias of graph neural networks with or without fairness consideration while maintaining utility on the downstream task, offering new insight into the adversarial robustness of fair graph learning and guidance for future work.
    Abstract We study deceptive fairness attacks on graphs to answer the following question: How can we achieve poisoning attacks on a graph learning model to exacerbate the bias deceptively? We answer this question via a bi-level optimization problem and propose a meta learning-based framework named FATE. FATE is broadly applicable with respect to various fairness definitions and graph learning models, as well as arbitrary choices of manipulation operations. We further instantiate FATE to attack statistical parity and individual fairness on graph neural networks. We conduct extensive experimental evaluations on real-world datasets in the task of semi-supervised node classification. The experimental results demonstrate that FATE could amplify the bias of graph neural networks with or without fairness consideration while maintaining the utility on the downstream task. We hope this paper provides insights into the adversarial robustness of fair graph learning and can shed light on designing robust and fair graph learning in future studies.

Dynamic Convolutional Neural Networks as Efficient Pre-trained Audio Models

  • paper_url: http://arxiv.org/abs/2310.15648
  • repo_url: https://github.com/fschmid56/efficientat
  • paper_authors: Florian Schmid, Khaled Koutini, Gerhard Widmer
  • for: Improving audio tagging on large-scale audio datasets and comparing efficient convolutional neural networks (CNNs) with Transformers.
  • methods: Transformer-to-CNN knowledge distillation transfers Transformer knowledge into efficient CNNs, whose capacity is then increased with dynamic CNN blocks built from dynamic convolutions, dynamic non-linearities, and attention mechanisms; a sketch of a dynamic convolution follows the abstract below.
  • results: Experiments show that the dynamic blocks improve the performance-complexity trade-off and parameter efficiency of efficient CNNs on AudioSet, scale up well, and match or even exceed Transformer performance on AudioSet and several downstream tasks.
    Abstract The introduction of large-scale audio datasets, such as AudioSet, paved the way for Transformers to conquer the audio domain and replace CNNs as the state-of-the-art neural network architecture for many tasks. Audio Spectrogram Transformers are excellent at exploiting large datasets, creating powerful pre-trained models that surpass CNNs when fine-tuned on downstream tasks. However, current popular Audio Spectrogram Transformers are demanding in terms of computational complexity compared to CNNs. Recently, we have shown that, by employing Transformer-to-CNN Knowledge Distillation, efficient CNNs can catch up with and even outperform Transformers on large datasets. In this work, we extend this line of research and increase the capacity of efficient CNNs by introducing dynamic CNN blocks, constructed of dynamic non-linearities, dynamic convolutions and attention mechanisms. We show that these dynamic CNNs outperform traditional efficient CNNs, in terms of the performance-complexity trade-off and parameter efficiency, at the task of audio tagging on the large-scale AudioSet. Our experiments further indicate that the introduced dynamic CNNs achieve better performance on downstream tasks and scale up well, attaining Transformer performance and even outperforming them on AudioSet and several downstream tasks.
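As a rough illustration of a dynamic convolution (not the exact block used in the paper), the sketch below mixes K candidate kernels with input-dependent attention weights computed from a global pooling of the feature map.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv2d(nn.Module):
    """Sketch of a dynamic convolution: K parallel kernels mixed per input.

    A small attention head produces K mixing weights from a global pooling of
    the input; the effective kernel is the attention-weighted sum of the K
    kernels, implemented as a grouped convolution over the batch.
    """
    def __init__(self, in_ch, out_ch, kernel_size=3, K=4, temperature=4.0):
        super().__init__()
        self.temperature = temperature
        self.weight = nn.Parameter(
            torch.randn(K, out_ch, in_ch, kernel_size, kernel_size) * 0.02)
        self.attend = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(in_ch, max(in_ch // 4, 4)), nn.ReLU(),
            nn.Linear(max(in_ch // 4, 4), K))

    def forward(self, x):
        B, C, H, W = x.shape
        attn = F.softmax(self.attend(x) / self.temperature, dim=1)      # (B, K)
        kernels = torch.einsum("bk,koihw->boihw", attn, self.weight)    # per-sample kernels
        kernels = kernels.reshape(-1, C, *self.weight.shape[-2:])       # (B*out, in, k, k)
        out = F.conv2d(x.reshape(1, B * C, H, W), kernels,
                       padding=self.weight.shape[-1] // 2, groups=B)
        return out.reshape(B, -1, H, W)

x = torch.randn(2, 16, 64, 64)          # e.g. a mel-spectrogram feature map
block = DynamicConv2d(16, 32)
print(block(x).shape)                    # torch.Size([2, 32, 64, 64])
```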

Light up that Droid! On the Effectiveness of Static Analysis Features against App Obfuscation for Android Malware Detection

  • paper_url: http://arxiv.org/abs/2310.15645
  • repo_url: None
  • paper_authors: Borja Molina-Coronado, Antonio Ruggia, Usue Mori, Alessio Merlo, Alexander Mendiburu, Jose Miguel-Alonso
  • for: Investigating whether machine learning (ML) malware detectors for Android that rely on static analysis features can withstand app obfuscation.
  • methods: The study examines a range of common static analysis features (such as code strings, API call traces, and method names and parameters) extracted with several static analysis tools, and measures how specific obfuscation techniques and tools alter them.
  • results: Obfuscation affects all static analysis features to varying degrees across different tools, but certain features still retain their validity for ML malware detection even in the presence of obfuscation. Based on these findings, the paper proposes an ML malware detector that is robust against obfuscation and outperforms current state-of-the-art detectors.
    Abstract Malware authors have seen obfuscation as the mean to bypass malware detectors based on static analysis features. For Android, several studies have confirmed that many anti-malware products are easily evaded with simple program transformations. As opposed to these works, ML detection proposals for Android leveraging static analysis features have also been proposed as obfuscation-resilient. Therefore, it needs to be determined to what extent the use of a specific obfuscation strategy or tool poses a risk for the validity of ML malware detectors for Android based on static analysis features. To shed some light in this regard, in this article we assess the impact of specific obfuscation techniques on common features extracted using static analysis and determine whether the changes are significant enough to undermine the effectiveness of ML malware detectors that rely on these features. The experimental results suggest that obfuscation techniques affect all static analysis features to varying degrees across different tools. However, certain features retain their validity for ML malware detection even in the presence of obfuscation. Based on these findings, we propose a ML malware detector for Android that is robust against obfuscation and outperforms current state-of-the-art detectors.

Guaranteed Coverage Prediction Intervals with Gaussian Process Regression

  • paper_url: http://arxiv.org/abs/2310.15641
  • repo_url: None
  • paper_authors: Harris Papadopoulos
  • for: Making the predictive uncertainty estimates of Gaussian Process Regression (GPR) reliable, so that its prediction intervals can be trusted.
  • methods: GPR is extended with the Conformal Prediction (CP) machine learning framework, which guarantees that prediction intervals achieve the required coverage even when the model is completely misspecified; a generic sketch of conformalizing a GP follows the abstract below.
  • results: Experiments demonstrate the superiority of the proposed method over existing approaches.
    Abstract Gaussian Process Regression (GPR) is a popular regression method, which unlike most Machine Learning techniques, provides estimates of uncertainty for its predictions. These uncertainty estimates however, are based on the assumption that the model is well-specified, an assumption that is violated in most practical applications, since the required knowledge is rarely available. As a result, the produced uncertainty estimates can become very misleading; for example the prediction intervals (PIs) produced for the 95\% confidence level may cover much less than 95\% of the true labels. To address this issue, this paper introduces an extension of GPR based on a Machine Learning framework called, Conformal Prediction (CP). This extension guarantees the production of PIs with the required coverage even when the model is completely misspecified. The proposed approach combines the advantages of GPR with the valid coverage guarantee of CP, while the performed experimental results demonstrate its superiority over existing methods.
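A generic way to combine the two ingredients is split conformal prediction with GP-normalized nonconformity scores, sketched below with scikit-learn. This is an illustrative construction under standard split-CP assumptions, not necessarily the exact procedure of the paper.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def gpr_conformal_intervals(X_train, y_train, X_cal, y_cal, X_test, alpha=0.05):
    """Sketch: split-conformal prediction intervals on top of a GP regressor.

    Nonconformity scores on a held-out calibration set are |residual| divided
    by the GP predictive std; their empirical quantile rescales the GP std on
    test points, giving marginal coverage >= 1 - alpha even if the GP is
    misspecified.
    """
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    gp.fit(X_train, y_train)

    mu_cal, sd_cal = gp.predict(X_cal, return_std=True)
    scores = np.abs(y_cal - mu_cal) / np.maximum(sd_cal, 1e-8)
    n = len(scores)
    q = np.quantile(scores, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))

    mu, sd = gp.predict(X_test, return_std=True)
    return mu - q * sd, mu + q * sd

# Toy usage on a 1-D regression problem.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + 0.2 * rng.normal(size=300)
low, high = gpr_conformal_intervals(X[:150], y[:150], X[150:250], y[150:250], X[250:])
```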

Contextual directed acyclic graphs

  • paper_url: http://arxiv.org/abs/2310.15627
  • repo_url: https://github.com/ryan-thompson/contextualdag.jl
  • paper_authors: Ryan Thompson, Edwin V. Bonilla, Robert Kohn
  • for: Estimating the structure of directed acyclic graphs (DAGs) from observational data when the graph structure varies across individuals.
  • methods: A neural network maps each individual's contextual features to a DAG represented as a weighted adjacency matrix; a novel projection layer makes the output matrices sparse and enforces a recently developed characterization of acyclicity. A rough sketch of the idea follows the abstract below.
  • results: Experiments show that the new method recovers the true context-specific graphs where existing approaches fail.
    Abstract Estimating the structure of directed acyclic graphs (DAGs) from observational data remains a significant challenge in machine learning. Most research in this area concentrates on learning a single DAG for the entire population. This paper considers an alternative setting where the graph structure varies across individuals based on available "contextual" features. We tackle this contextual DAG problem via a neural network that maps the contextual features to a DAG, represented as a weighted adjacency matrix. The neural network is equipped with a novel projection layer that ensures the output matrices are sparse and satisfy a recently developed characterization of acyclicity. We devise a scalable computational framework for learning contextual DAGs and provide a convergence guarantee and an analytical gradient for backpropagating through the projection layer. Our experiments suggest that the new approach can recover the true context-specific graph where existing approaches fail.
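The overall idea can be sketched as a network that maps context to a weighted adjacency matrix. Since the paper's dedicated projection layer is not reproduced here, the illustration below falls back on the classic NOTEARS acyclicity penalty h(W) = tr(exp(W∘W)) − d as a soft constraint; that substitution is an assumption of the sketch, not the paper's layer.

```python
import torch
import torch.nn as nn

class ContextToDAG(nn.Module):
    """Sketch: map contextual features to a weighted adjacency matrix per individual."""

    def __init__(self, context_dim, d, hidden=64):
        super().__init__()
        self.d = d
        self.net = nn.Sequential(nn.Linear(context_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, d * d))

    def forward(self, c):                       # c: (batch, context_dim)
        W = self.net(c).view(-1, self.d, self.d)
        eye = torch.eye(self.d, device=W.device)
        return W * (1.0 - eye)                  # mask the diagonal: no self-loops

def acyclicity_penalty(W):
    """NOTEARS characterization: zero iff the weighted graph is acyclic."""
    d = W.shape[-1]
    return torch.linalg.matrix_exp(W * W).diagonal(dim1=-2, dim2=-1).sum(-1) - d

ctx = torch.randn(4, 10)
model = ContextToDAG(context_dim=10, d=5)
W = model(ctx)
# In training this would be added to a data-fit term, plus an L1 sparsity penalty.
loss = acyclicity_penalty(W).mean() + 1e-2 * W.abs().mean()
```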

Accelerating Split Federated Learning over Wireless Communication Networks

  • paper_url: http://arxiv.org/abs/2310.15584
  • repo_url: None
  • paper_authors: Ce Xu, Jinxuan Li, Yuan Liu, Yushi Ling, Miaowen Wen
  • for: Enabling deep neural network (DNN) applications on resource-constrained edge devices, where the large parameter count and computational complexity of DNNs make deployment difficult.
  • methods: A model partition/splitting approach divides the DNN into two parts deployed on the device and the server for co-training or co-inference, and a joint split-point selection and bandwidth allocation problem is solved via alternating optimization; a minimal splitting sketch follows the abstract below.
  • results: Experiments show that the approach minimizes system latency while improving accuracy.
    Abstract The development of artificial intelligence (AI) provides opportunities for the promotion of deep neural network (DNN)-based applications. However, the large amount of parameters and computational complexity of DNN makes it difficult to deploy it on edge devices which are resource-constrained. An efficient method to address this challenge is model partition/splitting, in which DNN is divided into two parts which are deployed on device and server respectively for co-training or co-inference. In this paper, we consider a split federated learning (SFL) framework that combines the parallel model training mechanism of federated learning (FL) and the model splitting structure of split learning (SL). We consider a practical scenario of heterogeneous devices with individual split points of DNN. We formulate a joint problem of split point selection and bandwidth allocation to minimize the system latency. By using alternating optimization, we decompose the problem into two sub-problems and solve them optimally. Experiment results demonstrate the superiority of our work in latency reduction and accuracy improvement.
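A minimal sketch of the model-splitting idea, assuming a small CNN and an arbitrary split point; the device/server communication is only indicated in comments, and in this single-process sketch autograd simply flows through both parts.

```python
import torch
import torch.nn as nn

# Sketch: partition one DNN at a device-specific split point, as in split
# federated learning. The device-side layers run locally; the remaining layers
# run on the server, which receives the intermediate activations and would
# return the gradient of those activations in a real deployment.
full_model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(32 * 28 * 28, 128), nn.ReLU(), nn.Linear(128, 10))

split_point = 4                                    # chosen per device
client_part = full_model[:split_point]             # stays on the edge device
server_part = full_model[split_point:]             # runs on the server

x, y = torch.randn(8, 1, 28, 28), torch.randint(0, 10, (8,))
smashed = client_part(x)                           # client forward, then upload
logits = server_part(smashed)                      # server forward
loss = nn.functional.cross_entropy(logits, y)
loss.backward()                                    # here autograd crosses the split;
                                                   # in practice the gradient of
                                                   # `smashed` is sent back to the device
```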

Identifiable Latent Polynomial Causal Models Through the Lens of Change

  • paper_url: http://arxiv.org/abs/2310.15580
  • repo_url: None
  • paper_authors: Yuhang Liu, Zhen Zhang, Dong Gong, Mingming Gong, Biwei Huang, Anton van den Hengel, Kun Zhang, Javen Qinfeng Shi
  • for: Understanding how causal representation learning can uncover latent high-level causal representations and providing reliable identifiability guarantees.
  • methods: Identifiability is established by analyzing how causal influences among latent causal variables change across environments.
  • results: The paper extends latent causal models to nonlinear (polynomial) causal relationships and general exponential-family noise distributions, gives partial identifiability results when only some causal parameters change, proposes a new empirical estimation method, and validates the theory on synthetic and real-world data.
    Abstract Causal representation learning aims to unveil latent high-level causal representations from observed low-level data. One of its primary tasks is to provide reliable assurance of identifying these latent causal models, known as identifiability. A recent breakthrough explores identifiability by leveraging the change of causal influences among latent causal variables across multiple environments \citep{liu2022identifying}. However, this progress rests on the assumption that the causal relationships among latent causal variables adhere strictly to linear Gaussian models. In this paper, we extend the scope of latent causal models to involve nonlinear causal relationships, represented by polynomial models, and general noise distributions conforming to the exponential family. Additionally, we investigate the necessity of imposing changes on all causal parameters and present partial identifiability results when part of them remains unchanged. Further, we propose a novel empirical estimation method, grounded in our theoretical finding, that enables learning consistent latent causal representations. Our experimental results, obtained from both synthetic and real-world data, validate our theoretical contributions concerning identifiability and consistency.

From Oja’s Algorithm to the Multiplicative Weights Update Method with Applications

  • paper_url: http://arxiv.org/abs/2310.15559
  • repo_url: None
  • paper_authors: Dan Garber
  • for: Oja's algorithm, an online algorithm studied mainly in the context of stochastic principal component analysis.
  • methods: A simple yet novel observation: when applied to any (not necessarily stochastic) sequence of symmetric matrices that share common eigenvectors, the regret of Oja's algorithm can be bounded directly in terms of the regret of the multiplicative weights update method for prediction with expert advice; a sketch of the Oja update follows the abstract below.
  • results: Several applications to optimizing quadratic forms over the unit sphere in $\mathbb{R}^n$ are discussed, including stochastic principal component analysis.
    Abstract Oja's algorithm is a well known online algorithm studied mainly in the context of stochastic principal component analysis. We make a simple observation, yet to the best of our knowledge a novel one, that when applied to any (not necessarily stochastic) sequence of symmetric matrices which share common eigenvectors, the regret of Oja's algorithm could be directly bounded in terms of the regret of the well known multiplicative weights update method for the problem of prediction with expert advice. Several applications to optimization with quadratic forms over the unit sphere in $\mathbb{R}^n$ are discussed.
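For reference, a minimal sketch of Oja's update applied to an arbitrary sequence of symmetric matrices; the noisy rank-one stream in the toy check below, which has a common dominant direction, is purely illustrative.

```python
import numpy as np

def oja_top_eigenvector(matrices, eta=0.1, seed=0):
    """Sketch of Oja's algorithm as an online method.

    Given symmetric matrices A_1, ..., A_T (stochastic or not), maintain a
    unit vector w updated by w <- normalize(w + eta * A_t w). For stochastic
    PCA, A_t = x_t x_t^T for sampled data points x_t.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(size=matrices[0].shape[0])
    w /= np.linalg.norm(w)
    for A in matrices:
        w = w + eta * A @ w
        w /= np.linalg.norm(w)
    return w

# Toy check: noisy matrices whose dominant eigenvector is (approximately) shared.
rng = np.random.default_rng(1)
d = 20
u = np.ones(d) / np.sqrt(d)
stream = []
for _ in range(500):
    noise = rng.normal(size=(d, d))
    stream.append(3.0 * np.outer(u, u) + 0.05 * (noise + noise.T))
w = oja_top_eigenvector(stream)
print(abs(w @ u))        # should be close to 1
```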

Transfer learning for day-ahead load forecasting: a case study on European national electricity demand time series

  • paper_url: http://arxiv.org/abs/2310.15555
  • repo_url: None
  • paper_authors: Alexandros-Menelaos Tzortzis, Sotiris Pelekis, Evangelos Spiliotis, Spiros Mouzakitis, John Psarras, Dimitris Askounis
  • for: Improving the accuracy of short-term load forecasting (STLF) and investigating the performance of transfer learning (TL) in STLF.
  • methods: A popular, easy-to-implement neural network (NN) model is combined with a clustering analysis that identifies similar demand series to assist TL; a sketch of this setup follows the abstract below.
  • results: TL achieves higher accuracy than the conventional approach, especially when the clustering step is used.
    Abstract Short-term load forecasting (STLF) is crucial for the daily operation of power grids. However, the non-linearity, non-stationarity, and randomness characterizing electricity demand time series render STLF a challenging task. Various forecasting approaches have been proposed for improving STLF, including neural network (NN) models which are trained using data from multiple electricity demand series that may not necessarily include the target series. In the present study, we investigate the performance of this special case of STLF, called transfer learning (TL), by considering a set of 27 time series that represent the national day-ahead electricity demand of indicative European countries. We employ a popular and easy-to-implement NN model and perform a clustering analysis to identify similar patterns among the series and assist TL. In this context, two different TL approaches, with and without the clustering step, are compiled and compared against each other as well as a typical NN training setup. Our results demonstrate that TL can outperform the conventional approach, especially when clustering techniques are considered.
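Here is a rough sketch of the clustering-assisted transfer-learning setup, with normalized weekly profiles, KMeans, and a warm-started MLP used as stand-ins for the paper's exact feature, clustering, and model choices.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPRegressor

def cluster_then_transfer(series_dict, target_name, n_clusters=3, lags=168):
    """Sketch: cluster national demand series, pre-train on the target's cluster,
    then fine-tune (warm-start) on the target series.

    series_dict maps country name to a 1-D hourly demand array.
    """
    names = list(series_dict)
    profiles = np.stack([
        s[:len(s) // 168 * 168].reshape(-1, 168).mean(0) / s.mean()
        for s in series_dict.values()])                       # mean weekly shape
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(profiles)
    target_label = labels[names.index(target_name)]
    cluster = {n for n, l in zip(names, labels) if l == target_label}

    def windows(s):                                           # 168 h in -> next 24 h out
        X = np.stack([s[i:i + lags] for i in range(len(s) - lags - 24)])
        y = np.stack([s[i + lags:i + lags + 24] for i in range(len(s) - lags - 24)])
        return X, y

    model = MLPRegressor(hidden_layer_sizes=(128,), max_iter=50,
                         warm_start=True, random_state=0)
    for name in cluster - {target_name}:                      # pre-train on similar series
        Xs, ys = windows(series_dict[name])
        model.fit(Xs, ys)
    Xt, yt = windows(series_dict[target_name])                # fine-tune on the target
    model.fit(Xt, yt)
    return model
```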

Algorithmic Regularization in Tensor Optimization: Towards a Lifted Approach in Matrix Sensing

  • paper_url: http://arxiv.org/abs/2310.15549
  • repo_url: None
  • paper_authors: Ziye Ma, Javad Lavaei, Somayeh Sojoudi
  • for: Studying the implicit regularization induced by gradient descent (GD) in machine learning models, particularly in tensor optimization for matrix sensing.
  • methods: GD is analyzed within the lifted matrix sensing framework, which optimizes over symmetric, rank-1 tensors.
  • results: With a sufficiently small initialization scale, GD yields approximately rank-1 tensors and critical points with escape directions, highlighting the importance of the tensor parametrization of matrix sensing, in combination with first-order methods, for achieving global optimality in such problems.
    Abstract Gradient descent (GD) is crucial for generalization in machine learning models, as it induces implicit regularization, promoting compact representations. In this work, we examine the role of GD in inducing implicit regularization for tensor optimization, particularly within the context of the lifted matrix sensing framework. This framework has been recently proposed to address the non-convex matrix sensing problem by transforming spurious solutions into strict saddles when optimizing over symmetric, rank-1 tensors. We show that, with sufficiently small initialization scale, GD applied to this lifted problem results in approximate rank-1 tensors and critical points with escape directions. Our findings underscore the significance of the tensor parametrization of matrix sensing, in combination with first-order methods, in achieving global optimality in such problems.

Symmetry-preserving graph attention network to solve routing problems at multiple resolutions

  • paper_url: http://arxiv.org/abs/2310.15543
  • repo_url: https://github.com/hysonlab/multires-np-hard
  • paper_authors: Cong Dao Tran, Thong Bach, Truong Son Hy
  • for: Improving the accuracy and computation time of machine learning (ML) approaches to Travelling Salesperson Problems (TSPs) and Vehicle Routing Problems (VRPs).
  • methods: The first fully equivariant model and training scheme for these combinatorial problems, combined with a multiresolution scheme and an Equivariant Graph Attention network (mEGAT) architecture that exploits the multiscale structure of the input graph, especially for large and long-range graphs.
  • results: The model significantly improves over existing baselines, showing that symmetry preservation and multiresolution are key ingredients for solving combinatorial problems in a data-driven manner. The code is publicly available at https://github.com/HySonLab/Multires-NP-hard.
    Abstract Travelling Salesperson Problems (TSPs) and Vehicle Routing Problems (VRPs) have achieved reasonable improvement in accuracy and computation time with the adaptation of Machine Learning (ML) methods. However, none of the previous works completely respects the symmetries arising from TSPs and VRPs including rotation, translation, permutation, and scaling. In this work, we introduce the first-ever completely equivariant model and training to solve combinatorial problems. Furthermore, it is essential to capture the multiscale structure (i.e. from local to global information) of the input graph, especially for the cases of large and long-range graphs, while previous methods are limited to extracting only local information that can lead to a local or sub-optimal solution. To tackle the above limitation, we propose a Multiresolution scheme in combination with Equivariant Graph Attention network (mEGAT) architecture, which can learn the optimal route based on low-level and high-level graph resolutions in an efficient way. In particular, our approach constructs a hierarchy of coarse-graining graphs from the input graph, in which we try to solve the routing problems on simple low-level graphs first, then utilize that knowledge for the more complex high-level graphs. Experimentally, we have shown that our model outperforms existing baselines and proved that symmetry preservation and multiresolution are important recipes for solving combinatorial problems in a data-driven manner. Our source code is publicly available at https://github.com/HySonLab/Multires-NP-hard

Privacy Amplification for Matrix Mechanisms

  • paper_url: http://arxiv.org/abs/2310.15526
  • repo_url: None
  • paper_authors: Christopher A. Choquette-Choo, Arun Ganesh, Thomas Steinke, Abhradeep Thakurta
  • for: Extending privacy amplification analysis, which exploits randomness in data selection to obtain tighter differential privacy guarantees, to the newer state-of-the-art DP-FTRL-style algorithms based on the matrix mechanism.
  • methods: Privacy amplification via sampling is analyzed for correlated noise by conditioning correlated outputs on prior outputs, yielding a general "conditional composition theorem".
  • results: The paper proposes MMCC, the first algorithm to analyze privacy amplification via sampling for any generic matrix mechanism; its analysis is nearly tight, approaching a lower bound as $\epsilon\to0$, and it leads to significant improvements in the privacy-utility trade-offs of DP-FTRL algorithms on standard benchmarks.
    Abstract Privacy amplification exploits randomness in data selection to provide tighter differential privacy (DP) guarantees. This analysis is key to DP-SGD's success in machine learning, but, is not readily applicable to the newer state-of-the-art algorithms. This is because these algorithms, known as DP-FTRL, use the matrix mechanism to add correlated noise instead of independent noise as in DP-SGD. In this paper, we propose "MMCC", the first algorithm to analyze privacy amplification via sampling for any generic matrix mechanism. MMCC is nearly tight in that it approaches a lower bound as $\epsilon\to0$. To analyze correlated outputs in MMCC, we prove that they can be analyzed as if they were independent, by conditioning them on prior outputs. Our "conditional composition theorem" has broad utility: we use it to show that the noise added to binary-tree-DP-FTRL can asymptotically match the noise added to DP-SGD with amplification. Our amplification algorithm also has practical empirical utility: we show it leads to significant improvement in the privacy-utility trade-offs for DP-FTRL algorithms on standard benchmarks.

On the Inherent Privacy Properties of Discrete Denoising Diffusion Models

  • paper_url: http://arxiv.org/abs/2310.15524
  • repo_url: None
  • paper_authors: Rongzhe Wei, Eleonora Kreačić, Haoyu Wang, Haoteng Yin, Eli Chien, Vamsi K. Potluru, Pan Li
  • for: Characterizing the inherent privacy protection of discrete denoising diffusion models (DDMs) when they are used to generate synthetic discrete data.
  • methods: A theoretical analysis of per-instance differential privacy (pDP) quantifies the potential privacy leakage of each data point in the training set, offering guidance on data preprocessing to reduce the privacy risk of synthetic data generation with DDMs.
  • results: The bounds show how privacy leakage scales with the dataset size, moving from $(\epsilon, \mathcal{O}(\frac{1}{s^2\epsilon}))$-pDP to $(\epsilon, \mathcal{O}(\frac{1}{s\epsilon}))$-pDP during the transition from pure noise to synthetic clean data, and that a faster decay of the diffusion coefficients amplifies the privacy guarantee; the theory is verified on synthetic and real-world datasets.
    Abstract Privacy concerns have led to a surge in the creation of synthetic datasets, with diffusion models emerging as a promising avenue. Although prior studies have performed empirical evaluations on these models, there has been a gap in providing a mathematical characterization of their privacy-preserving capabilities. To address this, we present the pioneering theoretical exploration of the privacy preservation inherent in discrete diffusion models (DDMs) for discrete dataset generation. Focusing on per-instance differential privacy (pDP), our framework elucidates the potential privacy leakage for each data point in a given training dataset, offering insights into data preprocessing to reduce privacy risks of the synthetic dataset generation via DDMs. Our bounds also show that training with $s$-sized data points leads to a surge in privacy leakage from $(\epsilon, \mathcal{O}(\frac{1}{s^2\epsilon}))$-pDP to $(\epsilon, \mathcal{O}(\frac{1}{s\epsilon}))$-pDP during the transition from the pure noise to the synthetic clean data phase, and a faster decay in diffusion coefficients amplifies the privacy guarantee. Finally, we empirically verify our theoretical findings on both synthetic and real-world datasets.

Graph Attention-based Deep Reinforcement Learning for solving the Chinese Postman Problem with Load-dependent costs

  • paper_url: http://arxiv.org/abs/2310.15516
  • repo_url: https://github.com/hysonlab/chinese_postman_problem
  • paper_authors: Cong Dao Tran, Truong Son Hy
  • for: solves the Chinese Postman Problem with load-dependent costs (CPP-LC) using a novel deep reinforcement learning (DRL) framework.
  • methods: proposes a DRL model consisting of an encoder and decoder to address the CPP-LC challenge effectively, and a bio-inspired meta-heuristic solution based on Evolutionary Algorithm (EA).
  • results: Arc-DRL outperforms existing meta-heuristic methods such as Iterative Local Search (ILS) and Variable Neighborhood Search (VNS) regarding both solution quality and running time, while the proposed EA gives the best solution quality at the cost of much more running time.
    Abstract Recently, Deep reinforcement learning (DRL) models have shown promising results in solving routing problems. However, most DRL solvers are commonly proposed to solve node routing problems, such as the Traveling Salesman Problem (TSP). Meanwhile, there has been limited research on applying neural methods to arc routing problems, such as the Chinese Postman Problem (CPP), since they often feature irregular and complex solution spaces compared to TSP. To fill these gaps, this paper proposes a novel DRL framework to address the CPP with load-dependent costs (CPP-LC) (Corberan et al., 2018), which is a complex arc routing problem with load constraints. The novelty of our method is two-fold. First, we formulate the CPP-LC as a Markov Decision Process (MDP) sequential model. Subsequently, we introduce an autoregressive model based on DRL, namely Arc-DRL, consisting of an encoder and decoder to address the CPP-LC challenge effectively. Such a framework allows the DRL model to work efficiently and scalably to arc routing problems. Furthermore, we propose a new bio-inspired meta-heuristic solution based on Evolutionary Algorithm (EA) for CPP-LC. Extensive experiments show that Arc-DRL outperforms existing meta-heuristic methods such as Iterative Local Search (ILS) and Variable Neighborhood Search (VNS) proposed by (Corberan et al., 2018) on large benchmark datasets for CPP-LC regarding both solution quality and running time; while the EA gives the best solution quality with much more running time. We release our C++ implementations for metaheuristics such as EA, ILS and VNS along with the code for data generation and our generated data at https://github.com/HySonLab/Chinese_Postman_Problem

Interpretable Survival Analysis for Heart Failure Risk Prediction

  • paper_url: http://arxiv.org/abs/2310.15472
  • repo_url: None
  • paper_authors: Mike Van Ness, Tomas Bosschieter, Natasha Din, Andrew Ambrosy, Alexander Sandhu, Madeleine Udell
  • for: Survival (time-to-event) analysis in healthcare research, specifically heart failure risk prediction from electronic health record (EHR) databases.
  • methods: A new survival analysis pipeline that combines machine learning with interpretability: an improved version of survival stacking to convert the problem to classification, ControlBurn for feature selection, and Explainable Boosting Machines for interpretable predictions; a sketch of the stacking step follows the abstract below.
  • results: Using a large-scale EHR database to predict the risk of heart failure, the pipeline achieves state-of-the-art performance and yields interesting and novel insights about risk factors for heart failure.
    Abstract Survival analysis, or time-to-event analysis, is an important and widespread problem in healthcare research. Medical research has traditionally relied on Cox models for survival analysis, due to their simplicity and interpretability. Cox models assume a log-linear hazard function as well as proportional hazards over time, and can perform poorly when these assumptions fail. Newer survival models based on machine learning avoid these assumptions and offer improved accuracy, yet sometimes at the expense of model interpretability, which is vital for clinical use. We propose a novel survival analysis pipeline that is both interpretable and competitive with state-of-the-art survival models. Specifically, we use an improved version of survival stacking to transform a survival analysis problem to a classification problem, ControlBurn to perform feature selection, and Explainable Boosting Machines to generate interpretable predictions. To evaluate our pipeline, we predict risk of heart failure using a large-scale EHR database. Our pipeline achieves state-of-the-art performance and provides interesting and novel insights about risk factors for heart failure.
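To illustrate the stacking step, here is a generic sketch of converting (time, event) data into a person-period classification table; the paper uses an improved variant, so the bin choices and details below are assumptions.

```python
import numpy as np
import pandas as pd

def survival_stack(time, event, X, bins):
    """Sketch of survival stacking: turn (time, event) data into a
    person-period classification table.

    Each subject contributes one row per time bin they are at risk in, with
    label 1 only in the bin where their event occurs. A standard classifier
    (e.g. an Explainable Boosting Machine) can then be fit on the stacked
    table, and per-bin hazards read off its predicted probabilities.
    """
    rows, labels = [], []
    for i in range(len(time)):
        for j in range(len(bins) - 1):
            if time[i] <= bins[j]:
                break                                   # no longer at risk
            rows.append(np.concatenate([X[i], [j]]))    # features + period index
            labels.append(int(event[i] == 1 and bins[j] < time[i] <= bins[j + 1]))
    cols = [f"x{k}" for k in range(X.shape[1])] + ["period"]
    return pd.DataFrame(rows, columns=cols), np.array(labels)

# Toy usage.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
time = np.array([2.5, 7.0, 4.0, 9.5, 1.0])
event = np.array([1, 0, 1, 1, 0])
stacked_X, stacked_y = survival_stack(time, event, X, bins=np.array([0, 2, 4, 6, 8, 10]))
```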

EKGNet: A 10.96μW Fully Analog Neural Network for Intra-Patient Arrhythmia Classification

  • paper_url: http://arxiv.org/abs/2310.15466
  • repo_url: None
  • paper_authors: Benyamin Haghi, Lin Ma, Sahin Lale, Anima Anandkumar, Azita Emami
  • for: Developing a low-power yet accurate system for electrocardiogram (ECG) arrhythmia classification.
  • methods: An integrated approach combining analog computing and deep learning: EKGNet, a hardware-efficient, fully analog arrhythmia classification architecture that achieves high accuracy with low power consumption.
  • results: On PhysioNet's MIT-BIH and PTB Diagnostics datasets, the method achieves average balanced accuracies of 95% and 94.25% for intra-patient arrhythmia classification and myocardial infarction (MI) classification, respectively, showing promise for low-power, accurate, and transferable biomedical diagnosis systems.
    Abstract We present an integrated approach by combining analog computing and deep learning for electrocardiogram (ECG) arrhythmia classification. We propose EKGNet, a hardware-efficient and fully analog arrhythmia classification architecture that archives high accuracy with low power consumption. The proposed architecture leverages the energy efficiency of transistors operating in the subthreshold region, eliminating the need for analog-to-digital converters (ADC) and static random access memory (SRAM). The system design includes a novel analog sequential Multiply-Accumulate (MAC) circuit that mitigates process, supply voltage, and temperature variations. Experimental evaluations on PhysioNet's MIT-BIH and PTB Diagnostics datasets demonstrate the effectiveness of the proposed method, achieving average balanced accuracy of 95% and 94.25% for intra-patient arrhythmia classification and myocardial infarction (MI) classification, respectively. This innovative approach presents a promising avenue for developing low-power arrhythmia classification systems with enhanced accuracy and transferability in biomedical applications.

Private Learning with Public Features

  • paper_url: http://arxiv.org/abs/2310.15454
  • repo_url: https://github.com/Sfedfcv/redesigned-pancake
  • paper_authors: Walid Krichene, Nicolas Mayoraz, Steffen Rendle, Shuang Song, Abhradeep Thakurta, Li Zhang
  • for: A class of private learning problems where the data is a join of private and public features, as in private personalization tasks such as recommendation or ad prediction: features tied to individuals are sensitive, while item-related features (the movies or songs to be recommended, or the ads to be shown) are public. The question is whether private algorithms can achieve higher utility in the presence of public features; the paper answers positively for multi-encoder models in which one encoder operates on public features.
  • methods: New algorithms exploit this separation by protecting only certain sufficient statistics instead of adding noise to the gradients.
  • results: The approach comes with a guaranteed utility improvement for linear regression and achieves the state of the art on two standard private recommendation benchmarks, demonstrating the importance of methods that adapt to the private-public feature separation.
    Abstract We study a class of private learning problems in which the data is a join of private and public features. This is often the case in private personalization tasks such as recommendation or ad prediction, in which features related to individuals are sensitive, while features related to items (the movies or songs to be recommended, or the ads to be shown to users) are publicly available and do not require protection. A natural question is whether private algorithms can achieve higher utility in the presence of public features. We give a positive answer for multi-encoder models where one of the encoders operates on public features. We develop new algorithms that take advantage of this separation by only protecting certain sufficient statistics (instead of adding noise to the gradient). This method has a guaranteed utility improvement for linear regression, and importantly, achieves the state of the art on two standard private recommendation benchmarks, demonstrating the importance of methods that adapt to the private-public feature separation.

General Identifiability and Achievability for Causal Representation Learning

  • paper_url: http://arxiv.org/abs/2310.15450
  • repo_url: https://github.com/bvarici/score-general-id-crl
  • paper_authors: Burak Varıcı, Emre Acartürk, Karthikeyan Shanmugam, Ali Tajer
  • for: This paper focuses on developing a method for causal representation learning (CRL) under a general nonparametric causal latent model and a general transformation model.
  • methods: The method uses two hard uncoupled interventions per node in the latent causal graph to establish identifiability and achievability results. The algorithm leverages score variations across different environments to estimate the inverse of the transformer and, subsequently, the latent variables.
  • results: The paper guarantees perfect recovery of the latent causal model and variables under uncoupled interventions, and recovers the existing identifiability result for two hard coupled interventions. The method does not require additional faithfulness assumptions when observational data is available.
    Abstract This paper focuses on causal representation learning (CRL) under a general nonparametric causal latent model and a general transformation model that maps the latent data to the observational data. It establishes \textbf{identifiability} and \textbf{achievability} results using two hard \textbf{uncoupled} interventions per node in the latent causal graph. Notably, one does not know which pair of intervention environments have the same node intervened (hence, uncoupled environments). For identifiability, the paper establishes that perfect recovery of the latent causal model and variables is guaranteed under uncoupled interventions. For achievability, an algorithm is designed that uses observational and interventional data and recovers the latent causal model and variables with provable guarantees for the algorithm. This algorithm leverages score variations across different environments to estimate the inverse of the transformer and, subsequently, the latent variables. The analysis, additionally, recovers the existing identifiability result for two hard \textbf{coupled} interventions, that is when metadata about the pair of environments that have the same node intervened is known. It is noteworthy that the existing results on non-parametric identifiability require assumptions on interventions and additional faithfulness assumptions. This paper shows that when observational data is available, additional faithfulness assumptions are unnecessary.
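
As a toy illustration of the score-variation idea, the sketch below fits a Gaussian score to each environment and extracts the principal directions along which the scores differ between an observational and an interventional environment; under the paper's setting such differences carry information about the intervened latent node. The Gaussian score fit, the SVD truncation, and all names here are simplifying assumptions for illustration, not the authors' estimator.

```python
import numpy as np

def gaussian_score(samples):
    """Score function of a Gaussian fit: s(x) = -Sigma^{-1} (x - mu)."""
    mu = samples.mean(axis=0)
    prec = np.linalg.inv(np.cov(samples, rowvar=False))
    return lambda x: -(x - mu) @ prec.T

def score_difference_subspace(env_a, env_b, eval_points, rank=1):
    """Top directions along which the scores of two environments differ."""
    s_a, s_b = gaussian_score(env_a), gaussian_score(env_b)
    diffs = s_a(eval_points) - s_b(eval_points)   # (n, d) score variations
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[:rank]                              # principal variation directions

# Toy usage: a hard intervention on one coordinate of a 3-d linear model.
rng = np.random.default_rng(0)
z_obs = rng.normal(size=(2000, 3))
z_int = z_obs.copy()
z_int[:, 1] = 2.0 * rng.normal(size=2000) + 1.0   # hard intervention on node 1
mix = rng.normal(size=(3, 3))                     # a linear stand-in "transformer"
x_a, x_b = z_obs @ mix, z_int @ mix
print(score_difference_subspace(x_a, x_b, x_a[:200]))
```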

An accelerated first-order regularized momentum descent ascent algorithm for stochastic nonconvex-concave minimax problems

  • paper_url: http://arxiv.org/abs/2310.15448
  • repo_url: None
  • paper_authors: Huiling Zhang, Zi Xu
  • for: Solving stochastic nonconvex-concave minimax problems.
  • methods: An accelerated first-order regularized momentum descent ascent algorithm (FORMDA).
  • results: The algorithm attains an iteration complexity of $\tilde{\mathcal{O}}(\varepsilon^{-6.5})$ for finding an $\varepsilon$-stationary point, the best-known bound for single-loop algorithms on stochastic nonconvex-concave minimax problems.
    Abstract Stochastic nonconvex minimax problems have attracted wide attention in machine learning, signal processing and many other fields in recent years. In this paper, we propose an accelerated first-order regularized momentum descent ascent algorithm (FORMDA) for solving stochastic nonconvex-concave minimax problems. The iteration complexity of the algorithm is proved to be $\tilde{\mathcal{O}}(\varepsilon^{-6.5})$ to obtain an $\varepsilon$-stationary point, which achieves the best-known complexity bound for single-loop algorithms to solve the stochastic nonconvex-concave minimax problems under the stationarity of the objective function.
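
A minimal sketch of a single-loop regularized momentum descent-ascent update of the kind the abstract describes is given below; the step sizes, momentum parameter, and regularization weight are placeholders, and the sketch omits the acceleration and parameter schedules that give FORMDA its $\tilde{\mathcal{O}}(\varepsilon^{-6.5})$ complexity.

```python
import numpy as np

def regularized_momentum_descent_ascent(grad_x, grad_y, x0, y0, steps=1000,
                                        eta_x=1e-3, eta_y=1e-2,
                                        beta=0.9, reg=1e-2):
    """Single-loop sketch for min_x max_y f(x, y) with a concave y-player.

    `grad_x(x, y)` and `grad_y(x, y)` return (possibly stochastic) gradients;
    all step sizes and the regularization weight `reg` are placeholders.
    """
    x = np.asarray(x0, dtype=float).copy()
    y = np.asarray(y0, dtype=float).copy()
    m = np.zeros_like(x)                     # momentum buffer for the descent player
    for _ in range(steps):
        # Momentum-averaged (stochastic) gradient step on x.
        m = beta * m + (1.0 - beta) * grad_x(x, y)
        x = x - eta_x * m
        # Ascent on a regularized surrogate; the -reg * y term makes the
        # inner maximization strongly concave and stabilizes the single loop.
        y = y + eta_y * (grad_y(x, y) - reg * y)
    return x, y
```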

Learning Dynamics in Linear VAE: Posterior Collapse Threshold, Superfluous Latent Space Pitfalls, and Speedup with KL Annealing

  • paper_url: http://arxiv.org/abs/2310.15440
  • repo_url: None
  • paper_authors: Yuma Ichikawa, Koji Hukushima
  • for: Addressing the well-known posterior collapse problem of variational autoencoders (VAEs), in which the variational posterior aligns closely with the prior and degrades the quality of representation learning.
  • methods: An adjustable hyperparameter $\beta$ and a strategy for annealing this parameter, called KL annealing, studied through a theoretical analysis of the learning dynamics of a minimal VAE.
  • results: In the limit of large input dimensions, the learning dynamics converge to a deterministic process, enabling a detailed analysis of the generalization error. The analysis shows that the VAE first learns entangled representations and only gradually acquires disentangled ones; once $\beta$ exceeds a certain threshold, posterior collapse becomes inevitable regardless of the learning period. Superfluous latent variables overfit the background noise, harming both generalization and convergence, while appropriately tuned KL annealing accelerates convergence.
    Abstract Variational autoencoders (VAEs) face a notorious problem wherein the variational posterior often aligns closely with the prior, a phenomenon known as posterior collapse, which hinders the quality of representation learning. To mitigate this problem, an adjustable hyperparameter $\beta$ and a strategy for annealing this parameter, called KL annealing, are proposed. This study presents a theoretical analysis of the learning dynamics in a minimal VAE. It is rigorously proved that the dynamics converge to a deterministic process within the limit of large input dimensions, thereby enabling a detailed dynamical analysis of the generalization error. Furthermore, the analysis shows that the VAE initially learns entangled representations and gradually acquires disentangled representations. A fixed-point analysis of the deterministic process reveals that when $\beta$ exceeds a certain threshold, posterior collapse becomes inevitable regardless of the learning period. Additionally, the superfluous latent variables for the data-generative factors lead to overfitting of the background noise; this adversely affects both generalization and learning convergence. The analysis further unveiled that appropriately tuned KL annealing can accelerate convergence.
    摘要 Variational autoencoders (VAEs) suffer from a notorious problem in which the variational posterior aligns closely with the prior, a phenomenon known as posterior collapse that degrades the quality of representation learning. To mitigate it, an adjustable hyperparameter $\beta$ and a strategy for annealing this parameter, called KL annealing, have been proposed. This study gives a theoretical analysis of the learning dynamics of a minimal VAE: in the limit of large input dimensions the dynamics provably converge to a deterministic process, which enables a detailed analysis of the generalization error. The analysis shows that the VAE initially learns entangled representations and gradually acquires disentangled ones. A fixed-point analysis of the deterministic process reveals that once $\beta$ exceeds a certain threshold, posterior collapse becomes inevitable regardless of the learning period; moreover, latent variables that are superfluous for the data-generative factors overfit the background noise, harming both generalization and learning convergence. Appropriately tuned KL annealing accelerates convergence.
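
For intuition, the sketch below spells out a linear KL-annealing schedule and the $\beta$-weighted objective of a linear VAE with a Gaussian posterior; the parameterization, function names, and schedule are simplifying assumptions rather than the paper's exact minimal model.

```python
import numpy as np

def kl_annealing_schedule(epoch, warmup_epochs=50, beta_max=1.0):
    """Linear KL annealing: beta ramps from 0 up to beta_max over the warm-up."""
    return beta_max * min(1.0, epoch / warmup_epochs)

def linear_vae_objective(x, W_enc, W_dec, log_var, beta):
    """Beta-weighted objective for a linear VAE with q(z|x) = N(W_enc x, diag(exp(log_var)))."""
    mu_z = x @ W_enc.T                          # (n, k) posterior means
    var_z = np.exp(log_var)                     # (k,) shared posterior variances
    # Reconstruction term evaluated at the posterior mean (sampling omitted for brevity).
    recon = -0.5 * np.sum((x - mu_z @ W_dec.T) ** 2, axis=1)
    # Closed-form KL( N(mu, var) || N(0, I) ), summed over latent dimensions.
    kl = 0.5 * np.sum(mu_z ** 2 + var_z - log_var - 1.0, axis=1)
    return np.mean(recon - beta * kl)

# A per-dimension KL that stays near zero for all inputs signals posterior
# collapse in that latent coordinate.
```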

Off-Policy Evaluation for Large Action Spaces via Policy Convolution

  • paper_url: http://arxiv.org/abs/2310.15433
  • repo_url: None
  • paper_authors: Noveen Sachdeva, Lequn Wang, Dawen Liang, Nathan Kallus, Julian McAuley
  • for: Developing accurate off-policy estimators for evaluating and optimizing new policies, where the key difficulty is the distribution shift between the logging policy and the target policy.
  • methods: The Policy Convolution (PC) family of estimators, which uses action embeddings to strategically convolve the logging and target policies, giving a controllable bias-variance trade-off.
  • results: Experiments on synthetic and benchmark datasets show that PC yields substantial mean squared error (MSE) improvements, of up to 5-6 orders of magnitude, especially when the action space or the policy mismatch is large.
    Abstract Developing accurate off-policy estimators is crucial for both evaluating and optimizing for new policies. The main challenge in off-policy estimation is the distribution shift between the logging policy that generates data and the target policy that we aim to evaluate. Typically, techniques for correcting distribution shift involve some form of importance sampling. This approach results in unbiased value estimation but often comes with the trade-off of high variance, even in the simpler case of one-step contextual bandits. Furthermore, importance sampling relies on the common support assumption, which becomes impractical when the action space is large. To address these challenges, we introduce the Policy Convolution (PC) family of estimators. These methods leverage latent structure within actions -- made available through action embeddings -- to strategically convolve the logging and target policies. This convolution introduces a unique bias-variance trade-off, which can be controlled by adjusting the amount of convolution. Our experiments on synthetic and benchmark datasets demonstrate remarkable mean squared error (MSE) improvements when using PC, especially when either the action space or policy mismatch becomes large, with gains of up to 5 - 6 orders of magnitude over existing estimators.
    摘要 Developing accurate off-policy estimators is crucial for both evaluating and optimizing new policies. The main challenge is the distribution shift between the logging policy that generates the data and the target policy to be evaluated. Corrections typically rely on some form of importance sampling, which yields unbiased value estimates but often at the cost of high variance, even in the simpler one-step contextual bandit setting, and its common-support assumption becomes impractical when the action space is large. To address these challenges, the Policy Convolution (PC) family of estimators exploits latent structure among actions, made available through action embeddings, to strategically convolve the logging and target policies. This convolution introduces a bias-variance trade-off that can be controlled by adjusting the amount of convolution. Experiments on synthetic and benchmark datasets show remarkable mean squared error (MSE) improvements, of up to 5-6 orders of magnitude over existing estimators, especially when the action space or the policy mismatch becomes large.
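
To illustrate the idea, the sketch below smooths the logging and target policies with a Gaussian kernel over action embeddings and plugs the smoothed probabilities into an importance-weighted value estimate; the kernel choice, bandwidth, and names are illustrative assumptions, and the paper's construction and its bias-variance analysis differ in detail.

```python
import numpy as np

def convolve_policy(pi, action_embeds, bandwidth):
    """Smooth a policy over actions with a Gaussian kernel on action embeddings."""
    d2 = np.sum((action_embeds[:, None, :] - action_embeds[None, :, :]) ** 2, axis=-1)
    K = np.exp(-d2 / (2.0 * bandwidth ** 2))
    K /= K.sum(axis=1, keepdims=True)        # each row of K is a distribution over actions
    return pi @ K                            # pi_smooth(a|x) = sum_{a'} pi(a'|x) K[a', a]

def pc_value_estimate(rewards, logged_actions, pi_log, pi_tgt,
                      action_embeds, bandwidth=0.5):
    """Importance-weighted value estimate using kernel-smoothed policies.

    `pi_log` and `pi_tgt` are (n, A) arrays of per-context action probabilities,
    and `logged_actions` holds the (n,) indices of the actions actually taken.
    """
    pi_log_s = convolve_policy(pi_log, action_embeds, bandwidth)
    pi_tgt_s = convolve_policy(pi_tgt, action_embeds, bandwidth)
    idx = np.arange(len(rewards))
    # Smoothing trades a controlled amount of bias for much lower variance.
    w = pi_tgt_s[idx, logged_actions] / np.clip(pi_log_s[idx, logged_actions], 1e-8, None)
    return float(np.mean(w * rewards))
```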