cs.LG - 2023-09-18

Causal Theories and Structural Data Representations for Improving Out-of-Distribution Classification

  • paper_url: http://arxiv.org/abs/2309.10211
  • repo_url: None
  • paper_authors: Donald Martin, Jr., David Kinney
  • for: Improving the robustness and safety of machine learning systems by using human-generated causal knowledge to reduce the epistemic uncertainty of ML developers.
  • methods: Uses human-centered causal theories and tools from the dynamical systems literature to represent data in terms of the invariant structural causal features of an epidemic system's data-generating process.
  • results: Training a neural network with this data representation improves out-of-distribution (OOD) generalization on a classification task compared with a more naive data representation.
    Abstract We consider how human-centered causal theories and tools from the dynamical systems literature can be deployed to guide the representation of data when training neural networks for complex classification tasks. Specifically, we use simulated data to show that training a neural network with a data representation that makes explicit the invariant structural causal features of the data generating process of an epidemic system improves out-of-distribution (OOD) generalization performance on a classification task as compared to a more naive approach to data representation. We take these results to demonstrate that using human-generated causal knowledge to reduce the epistemic uncertainty of ML developers can lead to more well-specified ML pipelines. This, in turn, points to the utility of a dynamical systems approach to the broader effort aimed at improving the robustness and safety of machine learning systems via improved ML system development practices.

The Kernel Density Integral Transformation

  • paper_url: http://arxiv.org/abs/2309.10194
  • repo_url: https://github.com/calvinmccarter/kditransform
  • paper_authors: Calvin McCarter
  • for: A feature preprocessing strategy for applying machine learning and statistical methods to tabular data.
  • methods: Proposes the kernel density integral transformation as a feature preprocessing step; it subsumes the two leading methods, linear min-max scaling and quantile transformation, as limiting cases and requires no hyperparameter tuning.
  • results: Without tuning, the kernel density integral transformation serves as a simple drop-in replacement for either method while being robust to the weaknesses of each; tuning a single continuous hyperparameter frequently outperforms both. The transformation is also useful in statistical data analysis, particularly correlation analysis and univariate clustering.
    Abstract Feature preprocessing continues to play a critical role when applying machine learning and statistical methods to tabular data. In this paper, we propose the use of the kernel density integral transformation as a feature preprocessing step. Our approach subsumes the two leading feature preprocessing methods as limiting cases: linear min-max scaling and quantile transformation. We demonstrate that, without hyperparameter tuning, the kernel density integral transformation can be used as a simple drop-in replacement for either method, offering robustness to the weaknesses of each. Alternatively, with tuning of a single continuous hyperparameter, we frequently outperform both of these methods. Finally, we show that the kernel density transformation can be profitably applied to statistical data analysis, particularly in correlation analysis and univariate clustering.
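As an illustration of the idea (not the API of the released kditransform package), here is a minimal sketch of a kernel density integral transform on one numeric column: each value is mapped to the KDE-estimated CDF of the training column. The bandwidth scaling factor is an assumed knob interpolating between quantile-like behavior (small bandwidth) and min-max-like behavior (large bandwidth).

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_integral_transform(train_col, query_col, bandwidth_factor=1.0):
    """Map each query value to the KDE-estimated CDF of the training column."""
    kde = gaussian_kde(train_col)
    kde.set_bandwidth(kde.factor * bandwidth_factor)  # scale the default bandwidth
    # CDF at x is the integral of the fitted density from -inf to x
    return np.array([kde.integrate_box_1d(-np.inf, x) for x in query_col])

rng = np.random.default_rng(0)
x_train = rng.lognormal(size=200)   # a skewed numeric feature
print(kde_integral_transform(x_train, x_train[:5]))
```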

Stochastic Deep Koopman Model for Quality Propagation Analysis in Multistage Manufacturing Systems

  • paper_url: http://arxiv.org/abs/2309.10193
  • repo_url: None
  • paper_authors: Zhiyi Chen, Harshal Maske, Huanyi Shui, Devesh Upadhyay, Michael Hopka, Joseph Cohen, Xingjian Lai, Xun Huan, Jun Ni
  • for: Modeling the complex behavior of multistage manufacturing systems (MMSs) with deep learning.
  • methods: A stochastic deep Koopman (SDK) framework that extracts critical quality information with variational autoencoders (VAEs) and propagates it with Koopman operators.
  • results: In a comparative study on an open-source dataset, SDK predicts stagewise product quality within the MMS more accurately than other popular data-driven models; its linear propagation property in the stochastic latent space provides traceability of quality evolution and supports root cause analysis.
    Abstract The modeling of multistage manufacturing systems (MMSs) has attracted increased attention from both academia and industry. Recent advancements in deep learning methods provide an opportunity to accomplish this task with reduced cost and expertise. This study introduces a stochastic deep Koopman (SDK) framework to model the complex behavior of MMSs. Specifically, we present a novel application of Koopman operators to propagate critical quality information extracted by variational autoencoders. Through this framework, we can effectively capture the general nonlinear evolution of product quality using a transferred linear representation, thus enhancing the interpretability of the data-driven model. To evaluate the performance of the SDK framework, we carried out a comparative study on an open-source dataset. The main findings of this paper are as follows. Our results indicate that SDK surpasses other popular data-driven models in accuracy when predicting stagewise product quality within the MMS. Furthermore, the unique linear propagation property in the stochastic latent space of SDK enables traceability for quality evolution throughout the process, thereby facilitating the design of root cause analysis schemes. Notably, the proposed framework requires minimal knowledge of the underlying physics of production lines. It serves as a virtual metrology tool that can be applied to various MMSs, contributing to the ultimate goal of Zero Defect Manufacturing.

Autoencoder-based Anomaly Detection System for Online Data Quality Monitoring of the CMS Electromagnetic Calorimeter

  • paper_url: http://arxiv.org/abs/2309.10157
  • repo_url: None
  • paper_authors: The CMS ECAL Collaboration
  • for: Online data quality monitoring of high-energy physics data from the CMS electromagnetic calorimeter.
  • methods: A real-time autoencoder-based anomaly detection system using semi-supervised machine learning, which exploits the time-dependent evolution of anomalies and spatial variations in detector response to maximize detection performance.
  • results: The system detects anomalies efficiently while maintaining a very low false discovery rate; its performance is validated on anomalies found in 2018 and 2022 LHC collision data, and first results from deployment in the CMS online data quality monitoring workflow at the start of LHC Run 3 show it catches issues missed by the existing system.
    Abstract The CMS detector is a general-purpose apparatus that detects high-energy collisions produced at the LHC. Online Data Quality Monitoring of the CMS electromagnetic calorimeter is a vital operational tool that allows detector experts to quickly identify, localize, and diagnose a broad range of detector issues that could affect the quality of physics data. A real-time autoencoder-based anomaly detection system using semi-supervised machine learning is presented enabling the detection of anomalies in the CMS electromagnetic calorimeter data. A novel method is introduced which maximizes the anomaly detection performance by exploiting the time-dependent evolution of anomalies as well as spatial variations in the detector response. The autoencoder-based system is able to efficiently detect anomalies, while maintaining a very low false discovery rate. The performance of the system is validated with anomalies found in 2018 and 2022 LHC collision data. Additionally, the first results from deploying the autoencoder-based system in the CMS online Data Quality Monitoring workflow during the beginning of Run 3 of the LHC are presented, showing its ability to detect issues missed by the existing system.
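A generic sketch of the underlying recipe (reconstruction-error anomaly detection with an autoencoder trained on anomaly-free data); the CMS-specific inputs, architecture, and corrections for time-dependent and spatial detector response are not reproduced here, and the toy dimensions are assumptions.

```python
import torch
import torch.nn as nn

class AE(nn.Module):
    def __init__(self, dim, latent=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, latent))
        self.dec = nn.Sequential(nn.Linear(latent, 64), nn.ReLU(), nn.Linear(64, dim))
    def forward(self, x):
        return self.dec(self.enc(x))

def fit(model, good_data, epochs=50, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(good_data), good_data)  # reconstruct anomaly-free data
        loss.backward()
        opt.step()

def anomaly_scores(model, data):
    with torch.no_grad():
        return ((model(data) - data) ** 2).mean(dim=1)  # per-sample reconstruction error

# usage: train on known-good maps, flag scores above a quantile threshold
good = torch.randn(512, 100)          # stand-in for anomaly-free detector maps
model = AE(dim=100)
fit(model, good)
threshold = anomaly_scores(model, good).quantile(0.999)
flags = anomaly_scores(model, torch.randn(32, 100)) > threshold
```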

Realistic Website Fingerprinting By Augmenting Network Trace

  • paper_url: http://arxiv.org/abs/2309.10147
  • repo_url: https://github.com/spin-umass/realistic-website-fingerprinting-by-augmenting-network-traces
  • paper_authors: Alireza Bahramali, Ardavan Bozorgi, Amir Houmansadr
  • for: Making website fingerprinting (WF) attacks practical in the real world, challenging assumptions made in the design and evaluation of existing WF attacks.
  • methods: NetAugment, a network-trace augmentation technique tailored to Tor traces that helps a WF adversary generalize to unobserved network conditions; it is instantiated through semi-supervised and self-supervised learning.
  • results: Training WF attacks on augmented traces improves accuracy; for example, with 5-shot learning in a closed-world scenario, the self-supervised attack (NetCLR) reaches up to 80% accuracy on traces collected in a setting unobserved by the WF adversary, compared with 64.4% for the state-of-the-art Triplet Fingerprinting.
    Abstract Website Fingerprinting (WF) is considered a major threat to the anonymity of Tor users (and other anonymity systems). While state-of-the-art WF techniques have claimed high attack accuracies, e.g., by leveraging Deep Neural Networks (DNN), several recent works have questioned the practicality of such WF attacks in the real world due to the assumptions made in the design and evaluation of these attacks. In this work, we argue that such impracticality issues are mainly due to the attacker's inability in collecting training data in comprehensive network conditions, e.g., a WF classifier may be trained only on samples collected on specific high-bandwidth network links but deployed on connections with different network conditions. We show that augmenting network traces can enhance the performance of WF classifiers in unobserved network conditions. Specifically, we introduce NetAugment, an augmentation technique tailored to the specifications of Tor traces. We instantiate NetAugment through semi-supervised and self-supervised learning techniques. Our extensive open-world and close-world experiments demonstrate that under practical evaluation settings, our WF attacks provide superior performances compared to the state-of-the-art; this is due to their use of augmented network traces for training, which allows them to learn the features of target traffic in unobserved settings. For instance, with a 5-shot learning in a closed-world scenario, our self-supervised WF attack (named NetCLR) reaches up to 80% accuracy when the traces for evaluation are collected in a setting unobserved by the WF adversary. This is compared to an accuracy of 64.4% achieved by the state-of-the-art Triplet Fingerprinting [35]. We believe that the promising results of our work can encourage the use of network trace augmentation in other types of network traffic analysis.

A Geometric Framework for Neural Feature Learning

  • paper_url: http://arxiv.org/abs/2309.10140
  • repo_url: https://github.com/xiangxiangxu/nfe
  • paper_authors: Xiangxiang Xu, Lizhong Zheng
  • for: A framework for learning system design based on neural feature extractors that exploits geometric structures in feature spaces.
  • methods: Introduces a feature geometry that unifies statistical dependence and features in the same functional space; each learning problem is formulated as an optimal feature approximation of the dependence component specified by the learning setting, and a nesting technique is proposed for learning the optimal features from data samples.
  • results: The nesting technique can be applied with off-the-shelf network architectures and optimizers; for multivariate learning problems such as conditioned inference and multimodal learning, the framework yields the optimal features and reveals their connections to classical approaches.
    Abstract We present a novel framework for learning system design based on neural feature extractors by exploiting geometric structures in feature spaces. First, we introduce the feature geometry, which unifies statistical dependence and features in the same functional space with geometric structures. By applying the feature geometry, we formulate each learning problem as solving the optimal feature approximation of the dependence component specified by the learning setting. We propose a nesting technique for designing learning algorithms to learn the optimal features from data samples, which can be applied to off-the-shelf network architectures and optimizers. To demonstrate the application of the nesting technique, we further discuss multivariate learning problems, including conditioned inference and multimodal learning, where we present the optimal features and reveal their connections to classical approaches.

Deep smoothness WENO scheme for two-dimensional hyperbolic conservation laws: A deep learning approach for learning smoothness indicators

  • paper_url: http://arxiv.org/abs/2309.10117
  • repo_url: None
  • paper_authors: Tatiana Kossaczká, Ameya D. Jagtap, Matthias Ehrhardt
  • for: Improving the accuracy of numerical solutions of the two-dimensional Euler equations of gas dynamics, particularly near abrupt shocks and rarefaction waves.
  • methods: A compact neural network is trained to adjust the smoothness indicators within the fifth-order weighted essentially non-oscillatory (WENO) shock-capturing scheme; unlike previous deep-learning-based methods, no additional post-processing is needed to maintain consistency.
  • results: On several test problems from the literature, the new scheme outperforms traditional fifth-order WENO schemes, especially where numerical solutions exhibit excessive diffusion or overshoot around shocks.
    Abstract In this paper, we introduce an improved version of the fifth-order weighted essentially non-oscillatory (WENO) shock-capturing scheme by incorporating deep learning techniques. The established WENO algorithm is improved by training a compact neural network to adjust the smoothness indicators within the WENO scheme. This modification enhances the accuracy of the numerical results, particularly near abrupt shocks. Unlike previous deep learning-based methods, no additional post-processing steps are necessary for maintaining consistency. We demonstrate the superiority of our new approach using several examples from the literature for the two-dimensional Euler equations of gas dynamics. Through intensive study of these test problems, which involve various shocks and rarefaction waves, the new technique is shown to outperform traditional fifth-order WENO schemes, especially in cases where the numerical solutions exhibit excessive diffusion or overshoot around shocks.
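For reference, a sketch of the classical (Jiang-Shu) fifth-order WENO smoothness indicators and nonlinear weights that the trained network adjusts; this shows only the standard quantities, not the learned correction proposed in the paper.

```python
import numpy as np

def weno5_smoothness_indicators(f):
    """Classical Jiang-Shu smoothness indicators for a 5-point stencil
    f = (f_{i-2}, f_{i-1}, f_i, f_{i+1}, f_{i+2})."""
    fm2, fm1, f0, fp1, fp2 = f
    b0 = 13/12*(fm2 - 2*fm1 + f0)**2 + 1/4*(fm2 - 4*fm1 + 3*f0)**2
    b1 = 13/12*(fm1 - 2*f0 + fp1)**2 + 1/4*(fm1 - fp1)**2
    b2 = 13/12*(f0 - 2*fp1 + fp2)**2 + 1/4*(3*f0 - 4*fp1 + fp2)**2
    return np.array([b0, b1, b2])

def weno5_weights(beta, eps=1e-6):
    """Nonlinear weights from smoothness indicators (ideal weights 1/10, 6/10, 3/10)."""
    d = np.array([0.1, 0.6, 0.3])
    alpha = d / (eps + beta)**2
    return alpha / alpha.sum()

# near a jump, nearly all weight goes to the smooth substencil
print(weno5_weights(weno5_smoothness_indicators(np.array([1.0, 1.0, 1.0, 5.0, 5.0]))))
```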

A Semi-Supervised Approach for Power System Event Identification

  • paper_url: http://arxiv.org/abs/2309.10095
  • repo_url: None
  • paper_authors: Nima Taghipourbazargani, Lalitha Sankar, Oliver Kosut
  • for: Improving the reliability, security, and stability of the electric power system through data-driven event identification.
  • methods: Semi-supervised learning that uses both labeled and unlabeled eventful PMU samples; three classical approaches are evaluated (self-training, transductive support vector machines (TSVM), and graph-based label spreading (LS)) on physically interpretable features extracted from modal analysis of synthetic PMU data.
  • results: Identification of four operationally important event classes improves noticeably relative to using only a small number of labeled samples, with graph-based LS consistently performing best.
    Abstract Event identification is increasingly recognized as crucial for enhancing the reliability, security, and stability of the electric power system. With the growing deployment of Phasor Measurement Units (PMUs) and advancements in data science, there are promising opportunities to explore data-driven event identification via machine learning classification techniques. However, obtaining accurately-labeled eventful PMU data samples remains challenging due to its labor-intensive nature and uncertainty about the event type (class) in real-time. Thus, it is natural to use semi-supervised learning techniques, which make use of both labeled and unlabeled samples. We propose a novel semi-supervised framework to assess the effectiveness of incorporating unlabeled eventful samples to enhance existing event identification methodologies. We evaluate three categories of classical semi-supervised approaches: (i) self-training, (ii) transductive support vector machines (TSVM), and (iii) graph-based label spreading (LS) method. Our approach characterizes events using physically interpretable features extracted from modal analysis of synthetic eventful PMU data. In particular, we focus on the identification of four event classes whose identification is crucial for grid operations. We have developed and publicly shared a comprehensive Event Identification package which consists of three aspects: data generation, feature extraction, and event identification with limited labels using semi-supervised methodologies. Using this package, we generate and evaluate eventful PMU data for the South Carolina synthetic network. Our evaluation consistently demonstrates that graph-based LS outperforms the other two semi-supervised methods that we consider, and can noticeably improve event identification performance relative to the setting with only a small number of labeled samples.
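A small sketch of the graph-based label spreading step using scikit-learn, with random stand-in features in place of the modal-analysis features described in the paper; the hyperparameters shown are illustrative assumptions.

```python
import numpy as np
from sklearn.semi_supervised import LabelSpreading

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))              # stand-in for extracted event features
y_true = rng.integers(0, 4, size=500)       # four event classes
y = y_true.copy()
y[50:] = -1                                  # only the first 50 samples are labeled

model = LabelSpreading(kernel="knn", n_neighbors=7, alpha=0.2)
model.fit(X, y)                              # spread labels over the kNN graph
acc = (model.transduction_[50:] == y_true[50:]).mean()
print(f"accuracy on unlabeled samples: {acc:.2f}")
```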

Invariant Probabilistic Prediction

  • paper_url: http://arxiv.org/abs/2309.10083
  • repo_url: https://github.com/alexanderhenzi/ipp
  • paper_authors: Alexander Henzi, Xinwei Shen, Michael Law, Peter Bühlmann
  • for: Studying statistical methods that remain robust and invariant under distribution shifts between training and test data, for probabilistic rather than point predictions.
  • methods: Within a causality-inspired framework, investigates the invariance and robustness of probabilistic predictions with respect to proper scoring rules, and proposes a method, IPP, that yields invariant probabilistic predictions.
  • results: In contrast to point prediction, arbitrary distribution shifts do not in general admit invariant and robust probabilistic predictions; the paper shows how to choose evaluation metrics and restrict the class of shifts to obtain identifiability and invariance in the prototypical Gaussian heteroscedastic linear model, studies the consistency of IPP's parameters, and demonstrates empirical performance on simulated and single-cell data.
    Abstract In recent years, there has been a growing interest in statistical methods that exhibit robust performance under distribution changes between training and test data. While most of the related research focuses on point predictions with the squared error loss, this article turns the focus towards probabilistic predictions, which aim to comprehensively quantify the uncertainty of an outcome variable given covariates. Within a causality-inspired framework, we investigate the invariance and robustness of probabilistic predictions with respect to proper scoring rules. We show that arbitrary distribution shifts do not, in general, admit invariant and robust probabilistic predictions, in contrast to the setting of point prediction. We illustrate how to choose evaluation metrics and restrict the class of distribution shifts to allow for identifiability and invariance in the prototypical Gaussian heteroscedastic linear model. Motivated by these findings, we propose a method to yield invariant probabilistic predictions, called IPP, and study the consistency of the underlying parameters. Finally, we demonstrate the empirical performance of our proposed procedure on simulated as well as on single-cell data.

A Unifying Perspective on Non-Stationary Kernels for Deeper Gaussian Processes

  • paper_url: http://arxiv.org/abs/2309.10068
  • repo_url: None
  • paper_authors: Marcus M. Noack, Hengrui Luo, Mark D. Risser
  • for: Helping ML practitioners make sense of common forms of non-stationarity in Gaussian process kernels, toward better predictions and uncertainty quantification.
  • methods: Studies and compares a variety of stationary and non-stationary kernels on representative datasets, carefully examining their properties, advantages, and weaknesses.
  • results: Based on the findings, a new kernel is proposed that combines several of the identified advantages of existing kernels.
    Abstract The Gaussian process (GP) is a popular statistical technique for stochastic function approximation and uncertainty quantification from data. GPs have been adopted into the realm of machine learning in the last two decades because of their superior prediction abilities, especially in data-sparse scenarios, and their inherent ability to provide robust uncertainty estimates. Even so, their performance highly depends on intricate customizations of the core methodology, which often leads to dissatisfaction among practitioners when standard setups and off-the-shelf software tools are being deployed. Arguably the most important building block of a GP is the kernel function which assumes the role of a covariance operator. Stationary kernels of the Mat\'ern class are used in the vast majority of applied studies; poor prediction performance and unrealistic uncertainty quantification are often the consequences. Non-stationary kernels show improved performance but are rarely used due to their more complicated functional form and the associated effort and expertise needed to define and tune them optimally. In this perspective, we want to help ML practitioners make sense of some of the most common forms of non-stationarity for Gaussian processes. We show a variety of kernels in action using representative datasets, carefully study their properties, and compare their performances. Based on our findings, we propose a new kernel that combines some of the identified advantages of existing kernels.
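To make non-stationarity concrete, here is a sketch of one standard non-stationary construction, the Gibbs kernel with an input-dependent lengthscale; it is illustrative only and is not the new kernel proposed in the paper.

```python
import numpy as np

def gibbs_kernel(x1, x2, lengthscale_fn):
    """Gibbs non-stationary covariance with input-dependent lengthscale l(x)."""
    l1 = lengthscale_fn(x1)[:, None]
    l2 = lengthscale_fn(x2)[None, :]
    sq = (x1[:, None] - x2[None, :]) ** 2
    denom = l1**2 + l2**2
    return np.sqrt(2 * l1 * l2 / denom) * np.exp(-sq / denom)

# usage: shorter lengthscale (faster variation) near x = 0
l = lambda x: 0.2 + np.abs(x)
x = np.linspace(-2, 2, 5)
print(gibbs_kernel(x, x, l).round(3))
```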

Dual Student Networks for Data-Free Model Stealing

  • paper_url: http://arxiv.org/abs/2309.10058
  • repo_url: None
  • paper_authors: James Beetham, Navid Kardan, Ajmal Mian, Mubarak Shah
  • for: Data-free model stealing, i.e., training a student model to match a target model's outputs without access to the target's training data or parameters.
  • methods: A Dual Student method in which two students are trained symmetrically, giving the generator a criterion to produce samples the two students disagree on; disagreement implies at least one student misclassifies relative to the target, implicitly encouraging the generator to explore more diverse regions of the input space, while the students' gradients indirectly estimate the target model's gradients.
  • results: Experiments show more accurate gradient estimation of the target model, better accuracies on benchmark classification datasets, a balance between improved query efficiency and training computation cost, and a better proxy model for transfer-based adversarial attacks than existing data-free model stealing methods.
    Abstract Existing data-free model stealing methods use a generator to produce samples in order to train a student model to match the target model outputs. To this end, the two main challenges are estimating gradients of the target model without access to its parameters, and generating a diverse set of training samples that thoroughly explores the input space. We propose a Dual Student method where two students are symmetrically trained in order to provide the generator a criterion to generate samples that the two students disagree on. On one hand, disagreement on a sample implies at least one student has classified the sample incorrectly when compared to the target model. This incentive towards disagreement implicitly encourages the generator to explore more diverse regions of the input space. On the other hand, our method utilizes gradients of student models to indirectly estimate gradients of the target model. We show that this novel training objective for the generator network is equivalent to optimizing a lower bound on the generator's loss if we had access to the target model gradients. We show that our new optimization framework provides more accurate gradient estimation of the target model and better accuracies on benchmark classification datasets. Additionally, our approach balances improved query efficiency with training computation cost. Finally, we demonstrate that our method serves as a better proxy model for transfer-based adversarial attacks than existing data-free model stealing methods.
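A minimal sketch of the generator objective described above, with toy modules and an assumed L1 disagreement measure: the generator is rewarded for samples on which the two students disagree. Training the students against the target model and the gradient-estimation details are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def generator_disagreement_loss(gen, s1, s2, z):
    x = gen(z)
    p1, p2 = F.softmax(s1(x), dim=1), F.softmax(s2(x), dim=1)
    # maximize disagreement -> minimize negative L1 distance between predictions
    return -(p1 - p2).abs().sum(dim=1).mean()

# toy modules just to make the sketch runnable
gen = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))
s1 = nn.Linear(8, 10)
s2 = nn.Linear(8, 10)
opt = torch.optim.Adam(gen.parameters(), lr=1e-3)
loss = generator_disagreement_loss(gen, s1, s2, torch.randn(64, 16))
loss.backward()
opt.step()
```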

Actively Learning Reinforcement Learning: A Stochastic Optimal Control Approach

  • paper_url: http://arxiv.org/abs/2309.10831
  • repo_url: None
  • paper_authors: Mohammad S. Ramadan, Mahmoud A. Hayajnh, Michael T. Tolley, Kyriakos G. Vamvoudakis
  • for: Addressing (i) the fragility of reinforcement learning under modeling uncertainties caused by the mismatch between laboratory/simulation and real-world conditions, and (ii) the prohibitive computational cost of stochastic optimal control.
  • methods: Uses reinforcement learning to solve the stochastic dynamic programming equation, yielding a controller that is safe with respect to several types of constraints and actively probes the modeling uncertainties; probing and safety are employed automatically by the controller itself, enabling real-time learning.
  • results: A simulation example demonstrates the efficacy of the approach, with the controller learning in real time while remaining safe under modeling uncertainty.
    Abstract In this paper we provide a framework to cope with two problems: (i) the fragility of reinforcement learning due to modeling uncertainties because of the mismatch between controlled laboratory/simulation and real-world conditions and (ii) the prohibitive computational cost of stochastic optimal control. We approach both problems by using reinforcement learning to solve the stochastic dynamic programming equation. The resulting reinforcement learning controller is safe with respect to several types of constraints and it can actively learn about the modeling uncertainties. Unlike exploration and exploitation, probing and safety are employed automatically by the controller itself, resulting in real-time learning. A simulation example demonstrates the efficacy of the proposed approach.

A Modular Spatial Clustering Algorithm with Noise Specification

  • paper_url: http://arxiv.org/abs/2309.10047
  • repo_url: None
  • paper_authors: Akhil K, Srikanth H R
  • for: Making spatial clustering for data mining, machine learning, and pattern recognition both accurate and easy to tune, avoiding the hard-to-estimate input parameters of algorithms such as DBSCAN.
  • methods: Bacteria-Farm, a clustering algorithm inspired by the growth of bacteria in closed experimental farms, whose food consumption and growth mirror the ideal cluster growth desired in clustering; a modular design allows versions tailored to specific tasks or data distributions, and the amount of noise to exclude during clustering can be specified.
  • results: The resulting algorithm balances clustering performance with the ease of finding optimal parameters.
    Abstract Clustering techniques have been the key drivers of data mining, machine learning and pattern recognition for decades. One of the most popular clustering algorithms is DBSCAN due to its high accuracy and noise tolerance. Many superior algorithms such as DBSCAN have input parameters that are hard to estimate. Therefore, finding those parameters is a time consuming process. In this paper, we propose a novel clustering algorithm Bacteria-Farm, which balances the performance and ease of finding the optimal parameters for clustering. Bacteria-Farm algorithm is inspired by the growth of bacteria in closed experimental farms - their ability to consume food and grow - which closely represents the ideal cluster growth desired in clustering algorithms. In addition, the algorithm features a modular design to allow the creation of versions of the algorithm for specific tasks / distributions of data. In contrast with other clustering algorithms, our algorithm also has a provision to specify the amount of noise to be excluded during clustering.

A Multi-Token Coordinate Descent Method for Semi-Decentralized Vertical Federated Learning

  • paper_url: http://arxiv.org/abs/2309.09977
  • repo_url: None
  • paper_authors: Pedro Valdeira, Yuejie Chi, Cláudia Soares, João Xavier
  • for: Multi-Token Coordinate Descent (MTCD), a communication-efficient algorithm for semi-decentralized vertical federated learning.
  • methods: Exploits both client-server and client-client communications when each client holds a small subset of the features; MTCD can be viewed as a parallel Markov chain (block) coordinate descent algorithm that subsumes the client-server and decentralized setups as special cases.
  • results: Achieves a convergence rate of $\mathcal{O}(1/T)$ for nonconvex objectives when tokens roam over disjoint subsets of clients (and for convex objectives over possibly overlapping subsets), improves state-of-the-art communication efficiency, and allows a tunable amount of parallel communication.
    Abstract Communication efficiency is a major challenge in federated learning (FL). In client-server schemes, the server constitutes a bottleneck, and while decentralized setups spread communications, they do not necessarily reduce them due to slower convergence. We propose Multi-Token Coordinate Descent (MTCD), a communication-efficient algorithm for semi-decentralized vertical federated learning, exploiting both client-server and client-client communications when each client holds a small subset of features. Our multi-token method can be seen as a parallel Markov chain (block) coordinate descent algorithm and it subsumes the client-server and decentralized setups as special cases. We obtain a convergence rate of $\mathcal{O}(1/T)$ for nonconvex objectives when tokens roam over disjoint subsets of clients and for convex objectives when they roam over possibly overlapping subsets. Numerical results show that MTCD improves the state-of-the-art communication efficiency and allows for a tunable amount of parallel communications.

Des-q: a quantum algorithm to construct and efficiently retrain decision trees for regression and binary classification

  • paper_url: http://arxiv.org/abs/2309.09976
  • repo_url: None
  • paper_authors: Niraj Kumar, Romina Yalovetzky, Changhao Li, Pierre Minssen, Marco Pistoia
  • for: Proposes Des-q, a novel quantum algorithm for constructing and retraining decision trees in regression and binary classification tasks, with the goal of significantly reducing the time required for tree retraining.
  • methods: Des-q uses quantum-accessible memory to efficiently estimate feature weights and perform k-piecewise linear tree splits at each internal node, and employs a quantum-supervised clustering method based on the q-means algorithm to determine the k suitable anchor points for these splits.
  • results: A simulated version of Des-q, benchmarked against the state-of-the-art classical decision tree for regression and binary classification on multiple data sets with numerical features, exhibits similar performance while significantly speeding up the periodic tree retraining.
    Abstract Decision trees are widely used in machine learning due to their simplicity in construction and interpretability. However, as data sizes grow, traditional methods for constructing and retraining decision trees become increasingly slow, scaling polynomially with the number of training examples. In this work, we introduce a novel quantum algorithm, named Des-q, for constructing and retraining decision trees in regression and binary classification tasks. Assuming the data stream produces small increments of new training examples, we demonstrate that our Des-q algorithm significantly reduces the time required for tree retraining, achieving a poly-logarithmic time complexity in the number of training examples, even accounting for the time needed to load the new examples into quantum-accessible memory. Our approach involves building a decision tree algorithm to perform k-piecewise linear tree splits at each internal node. These splits simultaneously generate multiple hyperplanes, dividing the feature space into k distinct regions. To determine the k suitable anchor points for these splits, we develop an efficient quantum-supervised clustering method, building upon the q-means algorithm of Kerenidis et al. Des-q first efficiently estimates each feature weight using a novel quantum technique to estimate the Pearson correlation. Subsequently, we employ weighted distance estimation to cluster the training examples in k disjoint regions and then proceed to expand the tree using the same procedure. We benchmark the performance of the simulated version of our algorithm against the state-of-the-art classical decision tree for regression and binary classification on multiple data sets with numerical features. Further, we showcase that the proposed algorithm exhibits similar performance to the state-of-the-art decision tree while significantly speeding up the periodic tree retraining.

Empirical Study of Mix-based Data Augmentation Methods in Physiological Time Series Data

  • paper_url: http://arxiv.org/abs/2309.09970
  • repo_url: https://github.com/comp-well-org/mix-augmentation-for-physiological-time-series-classification
  • paper_authors: Peikun Guo, Huiyuan Yang, Akane Sano
  • for: Investigating whether mix-based data augmentation techniques such as mixup, which emerged in computer vision, are effective for physiological time series classification.
  • methods: Systematically evaluates three mix-based augmentations (mixup, cutmix, and manifold mixup) on six physiological datasets, across different sensory data and classification tasks.
  • results: All three mix-based augmentations consistently improve performance on the six datasets, and the improvement requires neither expert knowledge nor extensive parameter tuning.
    Abstract Data augmentation is a common practice to help generalization in the procedure of deep model training. In the context of physiological time series classification, previous research has primarily focused on label-invariant data augmentation methods. However, another class of augmentation techniques (\textit{i.e., Mixup}) that emerged in the computer vision field has yet to be fully explored in the time series domain. In this study, we systematically review the mix-based augmentations, including mixup, cutmix, and manifold mixup, on six physiological datasets, evaluating their performance across different sensory data and classification tasks. Our results demonstrate that the three mix-based augmentations can consistently improve the performance on the six datasets. More importantly, the improvement does not rely on expert knowledge or extensive parameter tuning. Lastly, we provide an overview of the unique properties of the mix-based augmentation methods and highlight the potential benefits of using the mix-based augmentation in physiological time series data.
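A minimal sketch of vanilla mixup applied to a batch of physiological time series; the alpha value and one-hot label handling are common defaults, not necessarily the paper's exact configuration.

```python
import numpy as np

def mixup_batch(x, y_onehot, alpha=0.2, rng=np.random.default_rng()):
    """Convexly combine random pairs of series and their one-hot labels."""
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(x))
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]
    return x_mix, y_mix

# usage with a toy batch: 8 series, 100 time steps, 3 channels, 4 classes
x = np.random.randn(8, 100, 3)
y = np.eye(4)[np.random.randint(0, 4, 8)]
x_mix, y_mix = mixup_batch(x, y)
```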

Prompt a Robot to Walk with Large Language Models

  • paper_url: http://arxiv.org/abs/2309.09969
  • repo_url: https://github.com/HybridRobotics/prompt2walk
  • paper_authors: Yen-Jen Wang, Bike Zhang, Jianyu Chen, Koushil Sreenath
  • for: Using few-shot prompting to apply large language models (LLMs) to robot control.
  • methods: Few-shot prompts collected from the physical environment let the LLM autoregressively generate low-level control commands for robots without task-specific fine-tuning.
  • results: Experiments across various robots and environments show the method can effectively prompt a robot to walk, demonstrating that LLMs can serve as low-level feedback controllers for dynamic motion control even in high-dimensional robotic systems.
    Abstract Large language models (LLMs) pre-trained on vast internet-scale data have showcased remarkable capabilities across diverse domains. Recently, there has been escalating interest in deploying LLMs for robotics, aiming to harness the power of foundation models in real-world settings. However, this approach faces significant challenges, particularly in grounding these models in the physical world and in generating dynamic robot motions. To address these issues, we introduce a novel paradigm in which we use few-shot prompts collected from the physical environment, enabling the LLM to autoregressively generate low-level control commands for robots without task-specific fine-tuning. Experiments across various robots and environments validate that our method can effectively prompt a robot to walk. We thus illustrate how LLMs can proficiently function as low-level feedback controllers for dynamic motion control even in high-dimensional robotic systems. The project website and source code can be found at: https://prompt2walk.github.io/ .

Generating and Imputing Tabular Data via Diffusion and Flow-based Gradient-Boosted Trees

  • paper_url: http://arxiv.org/abs/2309.09968
  • repo_url: None
  • paper_authors: Alexia Jolicoeur-Martineau, Kilian Fatras, Tal Kachman
  • for: Generating and imputing mixed-type (continuous and categorical) tabular data.
  • methods: Score-based diffusion and conditional flow matching in which the function approximator is XGBoost, a popular gradient-boosted tree (GBT) method, rather than a neural network.
  • results: On various datasets, the method generates highly realistic synthetic data whether the training data is clean or tainted by missing values, produces diverse plausible imputations, often outperforms deep-learning generation methods, and can be trained in parallel on CPUs without a GPU; code is released as a Python library on PyPI and an R package on CRAN.
    Abstract Tabular data is hard to acquire and is subject to missing values. This paper proposes a novel approach to generate and impute mixed-type (continuous and categorical) tabular data using score-based diffusion and conditional flow matching. Contrary to previous work that relies on neural networks as function approximators, we instead utilize XGBoost, a popular Gradient-Boosted Tree (GBT) method. In addition to being elegant, we empirically show on various datasets that our method i) generates highly realistic synthetic data when the training dataset is either clean or tainted by missing data and ii) generates diverse plausible data imputations. Our method often outperforms deep-learning generation methods and can trained in parallel using CPUs without the need for a GPU. To make it easily accessible, we release our code through a Python library on PyPI and an R package on CRAN.

Evaluating Adversarial Robustness with Expected Viable Performance

  • paper_url: http://arxiv.org/abs/2309.09928
  • repo_url: None
  • paper_authors: Ryan McCoppin, Colin Dawson, Sean M. Kennedy, Leslie M. Blaha
  • for: Evaluating the robustness of classifiers, with particular attention to adversarial perturbations.
  • methods: A classifier is deemed non-functional (functionality zero) with respect to a perturbation bound if a conventional performance measure, such as classification accuracy, falls below a minimally viable threshold on examples from that bound.
  • results: Proposes a robustness metric defined as the expected functionality over possible adversarial perturbations, motivated by a domain-general approach to robustness quantification.
    Abstract We introduce a metric for evaluating the robustness of a classifier, with particular attention to adversarial perturbations, in terms of expected functionality with respect to possible adversarial perturbations. A classifier is assumed to be non-functional (that is, has a functionality of zero) with respect to a perturbation bound if a conventional measure of performance, such as classification accuracy, is less than a minimally viable threshold when the classifier is tested on examples from that perturbation bound. Defining robustness in terms of an expected value is motivated by a domain general approach to robustness quantification.
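A hedged numerical sketch of the idea: treat the classifier as non-functional for any perturbation bound where accuracy drops below a minimally viable threshold, and report the expectation over a distribution of bounds. The functionality values and weighting used here are illustrative assumptions, not the paper's exact definition.

```python
import numpy as np

def expected_viable_performance(accuracy_by_bound, bound_probs, viable_threshold=0.7):
    """Zero out accuracies below the viability threshold, then take the
    expectation over the assumed distribution of perturbation bounds."""
    acc = np.asarray(accuracy_by_bound, dtype=float)
    functionality = np.where(acc >= viable_threshold, acc, 0.0)
    return float(np.dot(functionality, bound_probs))

# accuracies measured at perturbation bounds eps = 0, 2/255, 4/255, 8/255
print(expected_viable_performance([0.95, 0.85, 0.72, 0.40], [0.4, 0.3, 0.2, 0.1]))
```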

Graph topological property recovery with heat and wave dynamics-based features on graphs

  • paper_url: http://arxiv.org/abs/2309.09924
  • repo_url: None
  • paper_authors: Dhananjay Bhaskar, Yanlei Zhang, Charles Xu, Xingzhi Sun, Oluwadamilola Fasina, Guy Wolf, Maximilian Nickel, Michael Perlmutter, Smita Krishnaswamy
  • for: GDeNet, an approach that uses solutions of differential equations on graphs to obtain continuous node- and graph-level representations for various downstream tasks.
  • methods: Derives theoretical results connecting the dynamics of the heat and wave equations on a graph to the graph's spectral properties and to the behavior of continuous-time random walks.
  • results: Experiments show these dynamics capture salient aspects of graph geometry and topology, recovering generating parameters of random graphs, Ricci curvature, and persistent homology; GDeNet also outperforms other methods on real-world datasets including citation graphs, drug-like molecules, and proteins.
    Abstract In this paper, we propose Graph Differential Equation Network (GDeNet), an approach that harnesses the expressive power of solutions to PDEs on a graph to obtain continuous node- and graph-level representations for various downstream tasks. We derive theoretical results connecting the dynamics of heat and wave equations to the spectral properties of the graph and to the behavior of continuous-time random walks on graphs. We demonstrate experimentally that these dynamics are able to capture salient aspects of graph geometry and topology by recovering generating parameters of random graphs, Ricci curvature, and persistent homology. Furthermore, we demonstrate the superior performance of GDeNet on real-world datasets including citation graphs, drug-like molecules, and proteins.
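A small sketch of the heat-dynamics ingredient: diffuse a node signal with the graph heat kernel exp(-tL) at several time scales. The paper's full GDeNet architecture and the wave-equation counterpart are not reproduced here.

```python
import numpy as np
from scipy.linalg import expm

def heat_features(adj, signal, times=(0.5, 1.0, 2.0)):
    """Propagate a node signal with exp(-t L) for several diffusion times."""
    deg = np.diag(adj.sum(axis=1))
    lap = deg - adj                      # combinatorial graph Laplacian
    return np.stack([expm(-t * lap) @ signal for t in times], axis=1)

# toy 4-node path graph with a unit impulse on node 0
adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], float)
print(heat_features(adj, np.array([1.0, 0, 0, 0])).round(3))
```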

Distilling HuBERT with LSTMs via Decoupled Knowledge Distillation

  • paper_url: http://arxiv.org/abs/2309.09920
  • repo_url: None
  • paper_authors: Danilo de Oliveira, Timo Gerkmann
  • for: Compressing the knowledge of the HuBERT model to reduce its size and memory requirements while preserving automatic speech recognition performance.
  • methods: Applies knowledge distillation, and its recently proposed extension decoupled knowledge distillation, to distill HuBERT's Transformer layers into an LSTM-based model; unlike methods that distill internal features, this allows more freedom in the architecture of the compressed model.
  • results: The distilled LSTM-based model reduces the parameter count even below DistilHuBERT while showing improved performance in automatic speech recognition.
    Abstract Much research effort is being applied to the task of compressing the knowledge of self-supervised models, which are powerful, yet large and memory consuming. In this work, we show that the original method of knowledge distillation (and its more recently proposed extension, decoupled knowledge distillation) can be applied to the task of distilling HuBERT. In contrast to methods that focus on distilling internal features, this allows for more freedom in the network architecture of the compressed model. We thus propose to distill HuBERT's Transformer layers into an LSTM-based distilled model that reduces the number of parameters even below DistilHuBERT and at the same time shows improved performance in automatic speech recognition.

Learning Nonparametric High-Dimensional Generative Models: The Empirical-Beta-Copula Autoencoder

  • paper_url: http://arxiv.org/abs/2309.09916
  • repo_url: None
  • paper_authors: Maximilian Coblenz, Oliver Grothe, Fabian Kächele
  • for: Turning an autoencoder into a generative model by sampling from its latent space and decoding the samples, while striving for simplicity.
  • methods: Discusses, assesses, and compares techniques for modeling the autoencoder's latent space, from simple choices (kernel density estimates, a Gaussian distribution) to more sophisticated ones (Gaussian mixture models, copula models, normalization flows), and introduces a new copula-based method, the Empirical Beta Copula Autoencoder.
  • results: Modeling the latent space in these ways yields generative models capable of producing new data samples; the study also provides insights into further aspects such as targeted sampling and synthesizing new data with specific features.
    Abstract By sampling from the latent space of an autoencoder and decoding the latent space samples to the original data space, any autoencoder can simply be turned into a generative model. For this to work, it is necessary to model the autoencoder's latent space with a distribution from which samples can be obtained. Several simple possibilities (kernel density estimates, Gaussian distribution) and more sophisticated ones (Gaussian mixture models, copula models, normalization flows) can be thought of and have been tried recently. This study aims to discuss, assess, and compare various techniques that can be used to capture the latent space so that an autoencoder can become a generative model while striving for simplicity. Among them, a new copula-based method, the Empirical Beta Copula Autoencoder, is considered. Furthermore, we provide insights into further aspects of these methods, such as targeted sampling or synthesizing new data with specific features.
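A minimal sketch of the general recipe described above: fit a density to the latent codes of a trained autoencoder and decode fresh samples from it. A Gaussian KDE stands in for the latent model, and the toy encoder/decoder are placeholders; the paper's Empirical Beta Copula is not implemented here.

```python
import numpy as np
from scipy.stats import gaussian_kde

def make_generative(encode, decode, data, n_samples=10):
    latents = encode(data)                       # (n, d) latent codes
    kde = gaussian_kde(latents.T)                # density model over latent space
    z_new = kde.resample(n_samples).T            # sample new latent codes
    return decode(z_new)                         # map back to data space

# toy encoder/decoder (a trained autoencoder's encode/decode would be used instead)
encode = lambda x: x[:, :2]
decode = lambda z: np.hstack([z, z.sum(axis=1, keepdims=True)])
print(make_generative(encode, decode, np.random.randn(200, 3), n_samples=3))
```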

Learning to Generate Lumped Hydrological Models

  • paper_url: http://arxiv.org/abs/2309.09904
  • repo_url: None
  • paper_authors: Yang Yang, Ting Fong May Chui
  • for: A data-driven approach for representing the hydrological function of a catchment in a low-dimensional space and reconstructing specific hydrological functions from that representation.
  • methods: A deep learning method learns both a generative model and the latent variable values of different catchments directly from climate forcing and runoff data, without using catchment attributes; the generative model is then used like a lumped model structure, with optimal latent values estimated by a generic model calibration algorithm.
  • results: Generative models with eight latent variables were learned from data from over 3,000 catchments worldwide and applied to over 700 catchments; the resulting optimal models were generally comparable to or better than those obtained with 36 types of lumped model structures or with non-generative deep learning methods.
    Abstract In a lumped hydrological model structure, the hydrological function of a catchment is characterized by only a few parameters. Given a set of parameter values, a numerical function useful for hydrological prediction is generated. Thus, this study assumes that the hydrological function of a catchment can be sufficiently well characterized by a small number of latent variables. By specifying the variable values, a numerical function resembling the hydrological function of a real-world catchment can be generated using a generative model. In this study, a deep learning method is used to learn both the generative model and the latent variable values of different catchments directly from their climate forcing and runoff data, without using catchment attributes. The generative models can be used similarly to a lumped model structure, i.e., by estimating the optimal parameter or latent variable values using a generic model calibration algorithm, an optimal numerical model can be derived. In this study, generative models using eight latent variables were learned from data from over 3,000 catchments worldwide, and the learned generative models were applied to model over 700 different catchments using a generic calibration algorithm. The quality of the resulting optimal models was generally comparable to or better than that obtained using 36 different types of lump model structures or using non-generative deep learning methods. In summary, this study presents a data-driven approach for representing the hydrological function of a catchment in low-dimensional space and a method for reconstructing specific hydrological functions from the representations.

Deep Reinforcement Learning for the Joint Control of Traffic Light Signaling and Vehicle Speed Advice

  • paper_url: http://arxiv.org/abs/2309.09881
  • repo_url: None
  • paper_authors: Johannes V. S. Busch, Robert Voelckner, Peter Sossalla, Christian L. Vielhaus, Roberto Calandra, Frank H. P. Fitzek
  • for: Reducing urban traffic congestion and its economic and environmental burden.
  • methods: Deep reinforcement learning that jointly controls traffic light signaling and vehicle speed advice.
  • results: Joint control reduces average vehicle trip delays, relative to controlling only traffic lights, in eight out of eleven benchmark scenarios; qualitatively, the speed advice policy smooths the velocity profile of vehicles near a traffic light.
    Abstract Traffic congestion in dense urban centers presents an economical and environmental burden. In recent years, the availability of vehicle-to-anything communication allows for the transmission of detailed vehicle states to the infrastructure that can be used for intelligent traffic light control. The other way around, the infrastructure can provide vehicles with advice on driving behavior, such as appropriate velocities, which can improve the efficacy of the traffic system. Several research works applied deep reinforcement learning to either traffic light control or vehicle speed advice. In this work, we propose a first attempt to jointly learn the control of both. We show this to improve the efficacy of traffic systems. In our experiments, the joint control approach reduces average vehicle trip delays, w.r.t. controlling only traffic lights, in eight out of eleven benchmark scenarios. Analyzing the qualitative behavior of the vehicle speed advice policy, we observe that this is achieved by smoothing out the velocity profile of vehicles nearby a traffic light. Learning joint control of traffic signaling and speed advice in the real world could help to reduce congestion and mitigate the economical and environmental repercussions of today's traffic systems.

Error Reduction from Stacked Regressions

  • paper_url: http://arxiv.org/abs/2309.09880
  • repo_url: https://github.com/djdprogramming/adfa2
  • paper_authors: Xin Chen, Jason M. Klusowski, Yan Shuo Tan
  • for: Improving predictive accuracy by stacking regressions, i.e., forming linear combinations of different regression estimators.
  • methods: Learns the combination weights by minimizing an estimate of the population risk subject to a nonnegativity constraint, with constituent estimators that are linear least-squares projections onto nested subspaces; the optimization problem can be reformulated as isotonic regression.
  • results: When the nested subspaces are separated by at least three dimensions, a shrinkage effect makes the stacked estimator's population risk strictly smaller than that of the best single estimator (one minimizing a selection criterion such as AIC or BIC), so the best single estimator is inadmissible; the stacked estimator requires the same order of computation as the best single estimator, making it attractive in both performance and implementation.
    Abstract Stacking regressions is an ensemble technique that forms linear combinations of different regression estimators to enhance predictive accuracy. The conventional approach uses cross-validation data to generate predictions from the constituent estimators, and least-squares with nonnegativity constraints to learn the combination weights. In this paper, we learn these weights analogously by minimizing an estimate of the population risk subject to a nonnegativity constraint. When the constituent estimators are linear least-squares projections onto nested subspaces separated by at least three dimensions, we show that thanks to a shrinkage effect, the resulting stacked estimator has strictly smaller population risk than best single estimator among them. Here ``best'' refers to a model that minimizes a selection criterion such as AIC or BIC. In other words, in this setting, the best single estimator is inadmissible. Because the optimization problem can be reformulated as isotonic regression, the stacked estimator requires the same order of computation as the best single estimator, making it an attractive alternative in terms of both performance and implementation.
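For contrast with the paper's approach, a sketch of conventional stacking: fit nested least-squares estimators, then learn nonnegative combination weights on held-out predictions via nonnegative least squares. The paper instead minimizes an estimate of the population risk (equivalently, an isotonic regression), which this sketch does not reproduce.

```python
import numpy as np
from scipy.optimize import nnls
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = X[:, :3] @ np.array([1.0, -2.0, 0.5]) + 0.3 * rng.normal(size=200)
train, val = slice(0, 150), slice(150, 200)

# nested least-squares projections onto the first k features
models = [LinearRegression().fit(X[train, :k], y[train]) for k in (2, 4, 6)]
P_val = np.column_stack([m.predict(X[val, :k]) for m, k in zip(models, (2, 4, 6))])
weights, _ = nnls(P_val, y[val])        # nonnegative stacking weights
print("stacking weights:", weights.round(3))
```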

Domain Generalization with Fourier Transform and Soft Thresholding

  • paper_url: http://arxiv.org/abs/2309.09866
  • repo_url: None
  • paper_authors: Hongyi Pan, Bin Wang, Zheyuan Zhan, Xin Zhu, Debesh Jha, Ahmet Enis Cetin, Concetto Spampinato, Ulas Bagci
  • for: Improving the generalization of neural network models for retinal fundus image segmentation across images from different sources.
  • methods: A Fourier-transform-based domain generalization strategy that swaps the amplitude spectrum between source and target images while preserving the phase spectrum, combined with a soft-thresholding function in the Fourier domain to eliminate background interference in the amplitude spectrum.
  • results: Experiments on public data validate the approach's effectiveness over conventional and state-of-the-art methods, with superior segmentation metrics and better generalization.
    Abstract Domain generalization aims to train models on multiple source domains so that they can generalize well to unseen target domains. Among many domain generalization methods, Fourier-transform-based domain generalization methods have gained popularity primarily because they exploit the power of Fourier transformation to capture essential patterns and regularities in the data, making the model more robust to domain shifts. The mainstream Fourier-transform-based domain generalization swaps the Fourier amplitude spectrum while preserving the phase spectrum between the source and the target images. However, it neglects background interference in the amplitude spectrum. To overcome this limitation, we introduce a soft-thresholding function in the Fourier domain. We apply this newly designed algorithm to retinal fundus image segmentation, which is important for diagnosing ocular diseases but the neural network's performance can degrade across different sources due to domain shifts. The proposed technique basically enhances fundus image augmentation by eliminating small values in the Fourier domain and providing better generalization. The innovative nature of the soft thresholding fused with Fourier-transform-based domain generalization improves neural network models' performance by reducing the target images' background interference significantly. Experiments on public data validate our approach's effectiveness over conventional and state-of-the-art methods with superior segmentation metrics.
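A hedged sketch of the Fourier-domain augmentation described above: swap the low-frequency amplitude of a source image with that of a reference image while keeping the source phase, after soft-thresholding the amplitude spectra to suppress small background components. The swap region size and threshold here are common choices, not necessarily the paper's settings.

```python
import numpy as np

def soft_threshold(x, tau):
    """Shrink magnitudes toward zero and kill values smaller than tau."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def fourier_amplitude_mix(src, ref, tau=0.01, beta=0.1):
    """Replace the low-frequency amplitude of `src` with that of `ref`."""
    F_src, F_ref = np.fft.fft2(src), np.fft.fft2(ref)
    amp_src, pha_src = np.abs(F_src), np.angle(F_src)
    amp_ref = np.abs(F_ref)
    amp_src, amp_ref = soft_threshold(amp_src, tau), soft_threshold(amp_ref, tau)
    h, w = src.shape
    bh, bw = int(h * beta), int(w * beta)
    amp_src_shift = np.fft.fftshift(amp_src)
    amp_ref_shift = np.fft.fftshift(amp_ref)
    ch, cw = h // 2, w // 2
    amp_src_shift[ch-bh:ch+bh, cw-bw:cw+bw] = amp_ref_shift[ch-bh:ch+bh, cw-bw:cw+bw]
    amp_mixed = np.fft.ifftshift(amp_src_shift)
    return np.real(np.fft.ifft2(amp_mixed * np.exp(1j * pha_src)))

aug = fourier_amplitude_mix(np.random.rand(64, 64), np.random.rand(64, 64))
```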

Prognosis of Multivariate Battery State of Performance and Health via Transformers

  • paper_url: http://arxiv.org/abs/2309.10014
  • repo_url: None
  • paper_authors: Noah H. Paulson, Joseph J. Kubal, Susan J. Babinec
  • for: Predicting lithium-ion battery state of performance and health, and useful life, as a function of design and use.
  • methods: Deep transformer networks trained on two cycling datasets covering six lithium-ion cathode chemistries (LFP, NMC111, NMC532, NMC622, HE5050, and 5Vspinel), multiple electrolyte/anode compositions, and different charge-discharge scenarios, predicting 28 battery state-of-health descriptors.
  • results: The predictions are highly accurate, with a mean absolute error of 19 cycles in predicting end of life for an LFP fast-charging dataset, illustrating the promise of deep learning for understanding and controlling battery health.
    Abstract Batteries are an essential component in a deeply decarbonized future. Understanding battery performance and "useful life" as a function of design and use is of paramount importance to accelerating adoption. Historically, battery state of health (SOH) was summarized by a single parameter, the fraction of a battery's capacity relative to its initial state. A more useful approach, however, is a comprehensive characterization of its state and complexities, using an interrelated set of descriptors including capacity, energy, ionic and electronic impedances, open circuit voltages, and microstructure metrics. Indeed, predicting across an extensive suite of properties as a function of battery use is a "holy grail" of battery science; it can provide unprecedented insights toward the design of better batteries with reduced experimental effort, and de-risking energy storage investments that are necessary to meet CO2 reduction targets. In this work, we present a first step in that direction via deep transformer networks for the prediction of 28 battery state of health descriptors using two cycling datasets representing six lithium-ion cathode chemistries (LFP, NMC111, NMC532, NMC622, HE5050, and 5Vspinel), multiple electrolyte/anode compositions, and different charge-discharge scenarios. The accuracy of these predictions versus battery life (with an unprecedented mean absolute error of 19 cycles in predicting end of life for an LFP fast-charging dataset) illustrates the promise of deep learning towards providing deeper understanding and control of battery health.
    摘要 锂离子电池是深度减碳未来的重要组件。理解锂离子电池性能和使用寿命与设计和使用方式的关系,是加速采用的关键。历史上,锂离子电池健康状况(SOH)通常用单一参数表示,即电池容量相对初始状态的比率。然而,一个更有用的方法是对电池状况进行全面描述,使用一组相关的参数,包括容量、能量、离子和电子阻抗、开路电压和微结构指标。实际上,预测电池在使用过程中的广泛性能特征是电池科学的"圣杯",它可以提供前所未有的洞察,帮助以更少的实验工作量设计更好的电池,并降低实现二氧化碳减排目标所需的储能投资风险。在这项工作中,我们迈出了这一方向的第一步,采用深度Transformer网络来预测28个电池健康状况指标,所用的两个循环数据集涵盖六种锂离子电池正极化学体系(LFP、NMC111、NMC532、NMC622、HE5050和5Vspinel)、多种电解质/负极组合以及不同的充放电方案。预测的准确性(例如,在LFP快充数据集上预测寿命终点的平均绝对误差为19个循环)表明深度学习为深入理解和控制电池健康提供了新的可能性。
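下面是一个极简的 PyTorch 草图(并非原论文的网络结构;输入特征维度、层数、28个输出指标的组织方式等均为示意性假设),说明如何用 Transformer 编码器把"按循环排列的电池特征序列"映射到多个健康状况描述量。

```python
import torch
import torch.nn as nn

class SOHTransformer(nn.Module):
    """Toy transformer regressor: per-cycle feature sequence -> 28 SOH descriptors."""
    def __init__(self, n_feat=8, d_model=64, n_head=4, n_layers=2, n_out=28):
        super().__init__()
        self.embed = nn.Linear(n_feat, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_head,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_out)

    def forward(self, x):                       # x: (batch, cycles, n_feat)
        h = self.encoder(self.embed(x))
        return self.head(h.mean(dim=1))         # pool over the cycle dimension

model = SOHTransformer()
cycles = torch.randn(4, 100, 8)                 # 4 cells, 100 cycles, 8 raw features each
print(model(cycles).shape)                      # -> torch.Size([4, 28])
```
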

Convolutional Deep Kernel Machines

  • paper_url: http://arxiv.org/abs/2309.09814
  • repo_url: https://github.com/luisgarzac/Data-Science-Course---Udemy-frogames-Juan-Gabriel-Gomila
  • paper_authors: Edward Milsom, Ben Anson, Laurence Aitchison
  • for: 这篇论文主要是为了探讨深度kernel机器(DKM)的应用和发展。
  • methods: 本论文提出了卷积深度kernel机器(DKM),它只使用核函数而从不使用特征,这一点不同于神经网络、deep kernel learning 乃至深度高斯过程等以特征为基本组件的方法。此外,论文还提出了一种高效的跨域诱导点(inter-domain inducing point)近似方案。
  • results: 根据实验结果,使用了不同的 normalization 和 likelihood 的模型 variants,可以达到约 99% 的测试准确率在 MNIST 上,92% 在 CIFAR-10 上,71% 在 CIFAR-100 上,而且只需要训练约 28 个 GPU 小时,相比于全功能 NNGP / NTK / Myrtle kernels,速度提高了1-2个数量级。
    Abstract Deep kernel machines (DKMs) are a recently introduced kernel method with the flexibility of other deep models including deep NNs and deep Gaussian processes. DKMs work purely with kernels, never with features, and are therefore different from other methods ranging from NNs to deep kernel learning and even deep Gaussian processes, which all use features as a fundamental component. Here, we introduce convolutional DKMs, along with an efficient inter-domain inducing point approximation scheme. Further, we develop and experimentally assess a number of model variants, including 9 different types of normalisation designed for the convolutional DKMs, two likelihoods, and two different types of top-layer. The resulting models achieve around 99% test accuracy on MNIST, 92% on CIFAR-10 and 71% on CIFAR-100, despite training in only around 28 GPU hours, 1-2 orders of magnitude faster than full NNGP / NTK / Myrtle kernels, whilst achieving comparable performance.
    摘要

Learning Optimal Contracts: How to Exploit Small Action Spaces

  • paper_url: http://arxiv.org/abs/2309.09801
  • repo_url: None
  • paper_authors: Francesco Bacchiocchi, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti
  • for: 解决主体和代理人之间的委托-代理问题,即主体通过一系列合同来激励代理人采取成本高、不可观测的行动,以实现有利的结果。
  • methods: 我们研究多轮合同的扩展版本:主体在没有任何关于代理人信息的情况下,仅通过观察每轮实现的结果来学习最优合同。我们提出一种算法,在动作空间较小(动作数量为常数)时,能以高概率在关于结果空间大小的多项式轮数内学习近似最优合同。
  • results: 我们解决了Zhu等人[2022]提出的开放问题,并且在相关的在线学习设定中提供了$\tilde{\mathcal{O}}(T^{4/5})$的 regret bound,显著改进了之前已知的 regret bound。
    Abstract We study principal-agent problems in which a principal commits to an outcome-dependent payment scheme -- called contract -- in order to induce an agent to take a costly, unobservable action leading to favorable outcomes. We consider a generalization of the classical (single-round) version of the problem in which the principal interacts with the agent by committing to contracts over multiple rounds. The principal has no information about the agent, and they have to learn an optimal contract by only observing the outcome realized at each round. We focus on settings in which the size of the agent's action space is small. We design an algorithm that learns an approximately-optimal contract with high probability in a number of rounds polynomial in the size of the outcome space, when the number of actions is constant. Our algorithm solves an open problem by Zhu et al.[2022]. Moreover, it can also be employed to provide a $\tilde{\mathcal{O}}(T^{4/5})$ regret bound in the related online learning setting in which the principal aims at maximizing their cumulative utility, thus considerably improving previously-known regret bounds.
    摘要 我们研究主体-代理人问题,在这个问题中,主体会提出一个结果相依的支付计划(即合同),以使代理人采取费用高、不可观测的行动,导致更有利的结果。我们考虑了经典单轮版本问题的扩展:主体和代理人进行多轮互动,主体没有关于代理人的信息,只能通过每轮实现的结果来学习最佳合同。我们关注代理人行动空间较小的情况。我们设计了一个算法:当代理人动作数量为常数时,它能够以高概率在关于结果空间大小的多项式轮数内学习一个近似最优合同,从而解决了Zhu等人[2022]提出的开放问题。此外,该算法还可以在相关的在线学习设定(主体旨在最大化其累积效用)中提供$\tilde{\mathcal{O}}(T^{4/5})$的 regret bound,从而显著改进了此前已知的 regret bound。

Contrastive Initial State Buffer for Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.09752
  • repo_url: None
  • paper_authors: Nico Messikommer, Yunlong Song, Davide Scaramuzza
  • for: 提高强化学习的效率,使用有限样本学习
  • methods: 引入一种对比初始状态缓存(Contrastive Initial State Buffer),有策略地从过去经验中选择状态,用于在环境中初始化智能体,引导其走向信息量更大的状态
  • results: 在两个复杂的机器人任务中,实验结果显示,我们的初始状态缓存可以实现比基线更高的任务性能,同时也加速了训练的收敛
    Abstract In Reinforcement Learning, the trade-off between exploration and exploitation poses a complex challenge for achieving efficient learning from limited samples. While recent works have been effective in leveraging past experiences for policy updates, they often overlook the potential of reusing past experiences for data collection. Independent of the underlying RL algorithm, we introduce the concept of a Contrastive Initial State Buffer, which strategically selects states from past experiences and uses them to initialize the agent in the environment in order to guide it toward more informative states. We validate our approach on two complex robotic tasks without relying on any prior information about the environment: (i) locomotion of a quadruped robot traversing challenging terrains and (ii) a quadcopter drone racing through a track. The experimental results show that our initial state buffer achieves higher task performance than the nominal baseline while also speeding up training convergence.
    摘要 在强化学习中,探索与利用之间的权衡带来了复杂的挑战,使得从有限样本中实现高效学习十分困难。尽管最近的工作能够有效利用过去经验进行策略更新,但它们往往忽视了将过去经验重新用于数据收集的潜力。我们在独立于基础RL算法的情况下,引入了一个对比初始状态缓存,该缓存从过去经验中选择状态,并将其用于在环境中初始化智能体,以引导它到达更有信息量的状态。我们在不依赖环境先验信息的情况下,在两个复杂的机器人任务上进行了验证:(i)一只四足机器人在困难地形上行走,以及(ii)一架四旋翼无人机在赛道上竞速飞行。实验结果表明,我们的初始状态缓存可以比基线提高任务性能,同时也可以加速训练的收敛。
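下面给出一个极简的 Python 草图(并非论文实现;论文中的选择准则是对比式的,这里用"与已存状态的平均距离"作为新颖度的示意性替代,`capacity` 等参数亦为假设),演示"初始状态缓存:存储访问过的状态,并按某种打分选取重置起点"这一思路。

```python
import numpy as np

class InitialStateBuffer:
    """Toy buffer: store visited states, reset the agent from seemingly novel ones."""
    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.states = []

    def add(self, state):
        self.states.append(np.asarray(state, dtype=float))
        if len(self.states) > self.capacity:
            self.states.pop(0)

    def novelty(self, state):
        # Proxy for "informative": mean distance to the stored states.
        d = np.linalg.norm(np.stack(self.states) - state, axis=1)
        return float(d.mean())

    def sample_reset_state(self, default_state):
        if len(self.states) < 10:
            return default_state
        scores = np.array([self.novelty(s) for s in self.states])
        probs = scores / scores.sum()
        idx = np.random.choice(len(self.states), p=probs)
        return self.states[idx]

buf = InitialStateBuffer()
for _ in range(50):
    buf.add(np.random.randn(6))          # pretend these are visited robot states
print(buf.sample_reset_state(np.zeros(6)))
```
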

Towards Better Modeling with Missing Data: A Contrastive Learning-based Visual Analytics Perspective

  • paper_url: http://arxiv.org/abs/2309.09744
  • repo_url: None
  • paper_authors: Laixin Xie, Yang Ouyang, Longfei Chen, Ziming Wu, Quan Li
  • for: 解决机器学习(ML)建模中缺失数据带来的挑战
  • methods: 使用对比学习(CL)框架,对带有缺失值的观测数据进行建模,不需要任何插补
  • results: 通过定量实验、专家访谈和定性用户研究,证明了高预测精度和模型可解释性
    Abstract Missing data can pose a challenge for machine learning (ML) modeling. To address this, current approaches are categorized into feature imputation and label prediction and are primarily focused on handling missing data to enhance ML performance. These approaches rely on the observed data to estimate the missing values and therefore encounter three main shortcomings in imputation, including the need for different imputation methods for various missing data mechanisms, heavy dependence on the assumption of data distribution, and potential introduction of bias. This study proposes a Contrastive Learning (CL) framework to model observed data with missing values, where the ML model learns the similarity between an incomplete sample and its complete counterpart and the dissimilarity between other samples. Our proposed approach demonstrates the advantages of CL without requiring any imputation. To enhance interpretability, we introduce CIVis, a visual analytics system that incorporates interpretable techniques to visualize the learning process and diagnose the model status. Users can leverage their domain knowledge through interactive sampling to identify negative and positive pairs in CL. The output of CIVis is an optimized model that takes specified features and predicts downstream tasks. We provide two usage scenarios in regression and classification tasks and conduct quantitative experiments, expert interviews, and a qualitative user study to demonstrate the effectiveness of our approach. In short, this study offers a valuable contribution to addressing the challenges associated with ML modeling in the presence of missing data by providing a practical solution that achieves high predictive accuracy and model interpretability.
    摘要 “缺失数据可能会对机器学习(ML)模型带来挑战。为了解决这个问题,现有的方法可以分为两种:特征填充和标签预测,它们主要是对缺失数据进行处理,以提高ML表现。这些方法从观察到的数据中估算缺失的值,因此在填充上会遇到三个主要缺陷:需要针对不同的缺失数据机制使用不同的填充方法、严重依赖数据分布的假设,以及可能引入偏差。本研究提出了针对缺失数据的对比学习(CL)框架,其中ML模型学习不完整样本与其完整样本之间的相似性,以及与其他样本之间的不相似性。我们的方法在不需要任何填充的情况下展现了CL的优势。为了增强可解释性,我们引入了CIVis,一个结合可解释技术的可视分析系统,用以展示学习过程并诊断模型状态。用户可以透过互动采样运用专业知识,选择对比学习中的负例和正例对,而CIVis的输出则是一个已优化的模型,可以根据指定的特征进行下游任务预测。我们在回归和分类任务中提供了两个使用案例,并进行了定量实验、专家访谈和定性用户研究,以证明我们的方法的有效性。简而言之,这项研究为缺失数据下的ML建模挑战提供了实用的解决方案,可以实现高预测精度和模型可解释性。”
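下面给出一个极简的 numpy 草图(并非论文实现;其中的随机投影"编码器"、温度参数 `temp`、掩码比例等均为示意性假设),演示"把不完整样本作为锚点、其完整版本作为正例、批内其他样本作为负例"的 InfoNCE 式对比损失计算。

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W):
    z = np.tanh(x @ W)                          # stand-in encoder (random projection)
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

def info_nce(anchor, positive, negatives, temp=0.1):
    """anchor = masked (incomplete) sample, positive = its complete counterpart,
    negatives = other samples in the batch."""
    pos = np.exp(anchor @ positive / temp)
    neg = np.exp(negatives @ anchor / temp).sum()
    return -np.log(pos / (pos + neg))

X = rng.normal(size=(16, 10))                   # complete observations
mask = rng.random(X.shape) > 0.3                # observed entries
X_masked = np.where(mask, X, 0.0)               # missing values zero-filled, no imputation

W = rng.normal(size=(10, 8))
Z, Z_masked = encode(X, W), encode(X_masked, W)

i = 0
loss = info_nce(Z_masked[i], Z[i], np.delete(Z, i, axis=0))
print(float(loss))
```
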

The NFLikelihood: an unsupervised DNNLikelihood from Normalizing Flows

  • paper_url: http://arxiv.org/abs/2309.09743
  • repo_url: None
  • paper_authors: Humberto Reyes-Gonzalez, Riccardo Torre
  • for: 这个论文是为了探讨一种无监督的方法,基于正常化流程,来学习高维度的likelihood函数,具体来说是在高能物理分析中。
  • methods: 这个论文使用了自回归流(Autoregressive Flows),基于 affine 和 rational quadratic spline 双射函数,来学习高维度的likelihood函数。
  • results: 论文通过实际例子示出,这种方法可以学习复杂的高维度likelihood函数,并且可以应用于高能物理分析中的几个实际问题。
    Abstract We propose the NFLikelihood, an unsupervised version, based on Normalizing Flows, of the DNNLikelihood proposed in Ref.[1]. We show, through realistic examples, how Autoregressive Flows, based on affine and rational quadratic spline bijectors, are able to learn complicated high-dimensional Likelihoods arising in High Energy Physics (HEP) analyses. We focus on a toy LHC analysis example already considered in the literature and on two Effective Field Theory fits of flavor and electroweak observables, whose samples have been obtained throught the HEPFit code. We discuss advantages and disadvantages of the unsupervised approach with respect to the supervised one and discuss possible interplays of the two.
    摘要 我们提出了NFLikelihood,它是Ref.[1]中提出的DNNLikelihood的一种基于归一化流的无监督版本。我们通过实际示例显示,基于 affine 和 rational quadratic spline 双射函数的自回归流能够学习高能物理(HEP)分析中出现的复杂高维Likelihood函数。我们关注文献中已有的一个玩具LHC分析示例,以及两个有效场论对味道和电弱观测量的拟合,其样本通过HEPFit代码获得。我们讨论了无监督方法相对于监督方法的优劣,以及两者之间可能的互补。

Contrastive Learning and Data Augmentation in Traffic Classification Using a Flowpic Input Representation

  • paper_url: http://arxiv.org/abs/2309.09733
  • repo_url: None
  • paper_authors: Alessandro Finamore, Chao Wang, Jonatan Krolikowski, Jose M. Navarro, Fuxing Chen, Dario Rossi
  • for: 本研究是一篇关于交通分类(TC)的论文,采用了最新的深度学习(DL)方法。
  • methods: 本研究使用了少量学习、自我超vision via对比学习和数据增强等方法,以学习从少量样本中,并将学习结果转移到不同的数据集上。
  • results: 研究发现,使用这些DL方法,只需要使用100个输入样本,可以达到非常高的准确率,使用“流图”(i.e., 每个流量的2D histogram)作为输入表示。本研究还重现了原论文中的一些关键结果,并在三个额外的公共数据集上进行了数据增强的研究。
    Abstract Over the last years we witnessed a renewed interest towards Traffic Classification (TC) captivated by the rise of Deep Learning (DL). Yet, the vast majority of TC literature lacks code artifacts, performance assessments across datasets and reference comparisons against Machine Learning (ML) methods. Among those works, a recent study from IMC'22 [17] is worth of attention since it adopts recent DL methodologies (namely, few-shot learning, self-supervision via contrastive learning and data augmentation) appealing for networking as they enable to learn from a few samples and transfer across datasets. The main result of [17] on the UCDAVIS19, ISCX-VPN and ISCX-Tor datasets is that, with such DL methodologies, 100 input samples are enough to achieve very high accuracy using an input representation called "flowpic" (i.e., a per-flow 2d histograms of the packets size evolution over time). In this paper (i) we reproduce [17] on the same datasets and (ii) we replicate its most salient aspect (the importance of data augmentation) on three additional public datasets, MIRAGE-19, MIRAGE-22 and UTMOBILENET21. While we confirm most of the original results, we also found a 20% accuracy drop on some of the investigated scenarios due to a data shift in the original dataset that we uncovered. Additionally, our study validates that the data augmentation strategies studied in [17] perform well on other datasets too. In the spirit of reproducibility and replicability we make all artifacts (code and data) available at [10].
    摘要 过去几年,流量分类(TC)随着深度学习(DL)的兴起再次受到关注。然而,大多数TC文献缺乏代码工件、跨数据集的性能评估以及与机器学习(ML)方法的参照比较。其中,IMC'22的一项研究[17]值得关注,因为它采用了当今的DL方法(即少样本学习、基于对比学习的自监督和数据增强),这些方法对网络领域很有吸引力,因为它们可以从少量样本中学习并在数据集之间迁移。该研究在UCDAVIS19、ISCX-VPN和ISCX-Tor数据集上的主要结论是:借助这些DL方法,只需100个输入样本即可达到非常高的准确率,所用的输入表示称为"流图"(flowpic,即每个流的包大小随时间演化的二维直方图)。在本文中,我们(i)在相同的数据集上复现了[17],并(ii)在三个额外的公共数据集(MIRAGE-19、MIRAGE-22和UTMOBILENET21)上复制了其最重要的方面(数据增强的重要性)。我们证实了大部分原始结果,但也发现在部分场景下准确率下降了20%,这是由我们发现的原始数据集中的数据偏移所致。此外,我们的研究还证明了[17]中研究的数据增强策略在其他数据集上同样表现良好。为了保持可重复性和可复制性,我们在[10]上公开了所有工件(代码和数据)。
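下面给出一个极简的 numpy 草图(并非原论文代码;`bins`、最大包长、归一化方式等均为示意性假设),演示如何从一条流的包到达时间与包大小记录构造"流图"(flowpic)式的二维直方图输入。

```python
import numpy as np

def flowpic(timestamps, pkt_sizes, bins=32, max_size=1500):
    """Per-flow 2D histogram: normalized arrival time vs packet size."""
    t = np.asarray(timestamps, dtype=float)
    s = np.asarray(pkt_sizes, dtype=float)
    t = (t - t.min()) / max(t.max() - t.min(), 1e-9)          # time -> [0, 1]
    hist, _, _ = np.histogram2d(t, s, bins=bins,
                                range=[[0.0, 1.0], [0.0, max_size]])
    return hist / max(hist.max(), 1.0)                         # scale to [0, 1]

# toy flow: 200 packets over ~5 seconds
ts = np.sort(np.random.uniform(0, 5, size=200))
sizes = np.random.randint(40, 1500, size=200)
img = flowpic(ts, sizes)
print(img.shape)            # (32, 32) image that a CNN classifier could consume
```
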

Neural Collapse for Unconstrained Feature Model under Cross-entropy Loss with Imbalanced Data

  • paper_url: http://arxiv.org/abs/2309.09725
  • repo_url: https://github.com/wanlihongc/neural-collapse
  • paper_authors: Wanli Hong, Shuyang Ling
  • for: 这paper研究了不等式特征模型下的神经网络坍缩现象(Neural Collapse,NC)在不均衡数据上的扩展。
  • methods: 该paper使用了无约束特征模型(Unconstrained Feature Model,UFM)来解释NC现象。
  • results: 研究发现,在不均衡数据上,NC现象仍然存在,但是feature vectors内的坍缩现象不再是等角的,而是受样本大小的影响。此外,研究还发现了一个锐度的阈值,当阈值超过这个阈值时,小类坍缩(feature vectors of minority groups collapse to one single vector)会发生。最后,研究发现,随着样本大小的增加,数据不均衡的影响会逐渐减弱。
    Abstract Recent years have witnessed the huge success of deep neural networks (DNNs) in various tasks of computer vision and text processing. Interestingly, these DNNs with massive number of parameters share similar structural properties on their feature representation and last-layer classifier at terminal phase of training (TPT). Specifically, if the training data are balanced (each class shares the same number of samples), it is observed that the feature vectors of samples from the same class converge to their corresponding in-class mean features and their pairwise angles are the same. This fascinating phenomenon is known as Neural Collapse (N C), first termed by Papyan, Han, and Donoho in 2019. Many recent works manage to theoretically explain this phenomenon by adopting so-called unconstrained feature model (UFM). In this paper, we study the extension of N C phenomenon to the imbalanced data under cross-entropy loss function in the context of unconstrained feature model. Our contribution is multi-fold compared with the state-of-the-art results: (a) we show that the feature vectors exhibit collapse phenomenon, i.e., the features within the same class collapse to the same mean vector; (b) the mean feature vectors no longer form an equiangular tight frame. Instead, their pairwise angles depend on the sample size; (c) we also precisely characterize the sharp threshold on which the minority collapse (the feature vectors of the minority groups collapse to one single vector) will take place; (d) finally, we argue that the effect of the imbalance in datasize diminishes as the sample size grows. Our results provide a complete picture of the N C under the cross-entropy loss for the imbalanced data. Numerical experiments confirm our theoretical analysis.
    摘要 近年来,深度神经网络(DNN)在计算机视觉和自然语言处理等领域取得了巨大成功。有趣的是,这些具有海量参数的DNN在训练末期阶段(TPT)的特征表示和最后一层分类器上表现出相似的结构性质。具体来说,如果训练数据均衡(每个类具有相同的样本数),则可以观察到同一类样本的特征向量收敛到其对应的类内均值特征,且这些均值特征之间的两两夹角相同。这种精彩的现象被称为神经塌缩(NC),由Papyan、Han和Donoho在2019年提出。许多最近的工作尝试通过采用不受限制的特征模型(UFM)从理论上解释这种现象。在这篇论文中,我们研究了NC现象在不均衡数据下、使用交叉熵损失函数时的扩展。与现有最好结果相比,我们的贡献包括以下几点:(a)特征向量仍然展现塌缩现象,即同一类的特征向量塌缩到同一个均值向量;(b)均值特征向量不再构成等角紧框架,其两两夹角依赖于样本数量;(c)我们还精确刻画了少数类塌缩(少数类的特征向量塌缩为单一向量)发生的临界阈值;(d)最后,我们指出数据量不平衡的影响会随着样本量的增长而减弱。我们的结果为交叉熵损失下不均衡数据的NC现象提供了完整的图像。数值实验证实了我们的理论分析。
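下面是一个极简的 numpy 草图(与论文理论无关的示意性度量代码;这里在随机特征上计算,仅展示"类内变异塌缩比"和"类均值两两夹角"这两个NC常用观测量如何度量,函数名与指标定义均为常见做法的假设)。

```python
import numpy as np

def collapse_metrics(features, labels):
    """Return (within-class / between-class variability ratio,
    pairwise cosines between centered class means)."""
    classes = np.unique(labels)
    global_mean = features.mean(axis=0)
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])

    within = np.mean([np.var(features[labels == c] - means[i], axis=0).sum()
                      for i, c in enumerate(classes)])
    between = np.var(means - global_mean, axis=0).sum()

    centered = means - global_mean
    centered /= np.linalg.norm(centered, axis=1, keepdims=True)
    return within / between, centered @ centered.T

feats = np.random.randn(300, 16)
labs = np.random.randint(0, 3, size=300)
ratio, cos = collapse_metrics(feats, labs)
print(ratio)                # small values indicate within-class collapse
print(np.round(cos, 2))     # under exact NC with balanced data, off-diagonals -> -1/(K-1)
```
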

FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup for Non-IID Data

  • paper_url: http://arxiv.org/abs/2309.09719
  • repo_url: None
  • paper_authors: Hao Sun, Li Shen, Shixiang Chen, Jingwei Sun, Jing Li, Guangzhong Sun, Dacheng Tao
  • for: This paper focuses on improving the efficiency of federated learning, especially for training large-scale deep neural networks with heterogeneous data.
  • methods: The proposed method, FedLALR, adjusts the learning rate for each client based on local historical gradient squares and synchronized learning rates, which enables the method to converge and achieve linear speedup with respect to the number of clients.
  • results: The theoretical analysis and experimental results show that FedLALR outperforms several communication-efficient federated optimization methods in terms of convergence speed and scalability, and achieves promising results on CV and NLP tasks.
    Abstract Federated learning is an emerging distributed machine learning method, enables a large number of clients to train a model without exchanging their local data. The time cost of communication is an essential bottleneck in federated learning, especially for training large-scale deep neural networks. Some communication-efficient federated learning methods, such as FedAvg and FedAdam, share the same learning rate across different clients. But they are not efficient when data is heterogeneous. To maximize the performance of optimization methods, the main challenge is how to adjust the learning rate without hurting the convergence. In this paper, we propose a heterogeneous local variant of AMSGrad, named FedLALR, in which each client adjusts its learning rate based on local historical gradient squares and synchronized learning rates. Theoretical analysis shows that our client-specified auto-tuned learning rate scheduling can converge and achieve linear speedup with respect to the number of clients, which enables promising scalability in federated optimization. We also empirically compare our method with several communication-efficient federated optimization methods. Extensive experimental results on Computer Vision (CV) tasks and Natural Language Processing (NLP) task show the efficacy of our proposed FedLALR method and also coincides with our theoretical findings.
    摘要 联邦学习是一种新兴的分布式机器学习方法,允许大量客户端共同训练模型,而无需交换本地数据。在联邦学习中,通信时间成本是一个重要瓶颈,特别是在训练大规模深度神经网络时。一些通信高效的联邦学习方法(如FedAvg和FedAdam)在不同客户端之间共享相同的学习率,但在数据异构时效率不高。为了最大化优化方法的性能,主要挑战在于如何在不影响收敛的情况下调整学习率。在这篇论文中,我们提出了AMSGrad的一种异构本地变体,称为FedLALR:每个客户端根据本地历史梯度平方和同步学习率来自适应调整其学习率。理论分析表明,这种客户端自适应的学习率调度可以收敛,并且随客户端数量实现线性加速,从而在联邦优化中具有良好的可扩展性。我们还与几种通信高效的联邦优化方法进行了比较实验。在CV任务和NLP任务上的大量实验结果表明了FedLALR方法的有效性,并与我们的理论结论相符。
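下面给出一个极简的 numpy 草图(并非FedLALR的精确算法;这里用"每个客户端维护AMSGrad式的历史梯度平方、服务器定期平均参数并同步自适应缩放量"来示意其思路,步长、轮数等均为假设)。

```python
import numpy as np

def client_update(w, grads, v_hat, base_lr=0.1, eps=1e-8):
    """Local AMSGrad-like steps: per-coordinate rate from historical grad squares."""
    for g in grads:
        v_hat = np.maximum(v_hat, g * g)                  # running max of squared grads
        w = w - base_lr * g / (np.sqrt(v_hat) + eps)
    return w, v_hat

dim, n_clients = 5, 3
w_global = np.zeros(dim)
v_hats = [np.zeros(dim) for _ in range(n_clients)]

for rnd in range(10):                                      # communication rounds
    locals_w = []
    for c in range(n_clients):
        grads = [np.random.randn(dim) + c for _ in range(5)]   # heterogeneous clients
        w_c, v_hats[c] = client_update(w_global.copy(), grads, v_hats[c])
        locals_w.append(w_c)
    w_global = np.mean(locals_w, axis=0)                   # server averaging
    v_sync = np.mean(v_hats, axis=0)                       # synchronize adaptive scaling
    v_hats = [v_sync.copy() for _ in range(n_clients)]

print(w_global)
```
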

Multi-Dictionary Tensor Decomposition

  • paper_url: http://arxiv.org/abs/2309.09717
  • repo_url: None
  • paper_authors: Maxwell McNeil, Petko Bogdanov
  • for: 多方向数据的分析,如社交媒体、医疗、时空域等领域的数据分析
  • methods: 使用多字典张量分解方法,以编码字典的形式利用张量各模式上的先验结构信息来分解张量
  • results: 提出了一种多字典张量分解框架(MDTD),可以利用外部结构信息获得稀疏编码的张量因子,并且可以处理大型稀疏张量。实验表明,相比于无字典的方法,MDTD 可以学习更简洁的模型,并且可以提高数据重建质量、缺失值填充质量和张量秩的估计。同时,MDTD 的运行时间并不因此增加,可以快速处理大型数据。
    Abstract Tensor decomposition methods are popular tools for analysis of multi-way datasets from social media, healthcare, spatio-temporal domains, and others. Widely adopted models such as Tucker and canonical polyadic decomposition (CPD) follow a data-driven philosophy: they decompose a tensor into factors that approximate the observed data well. In some cases side information is available about the tensor modes. For example, in a temporal user-item purchases tensor a user influence graph, an item similarity graph, and knowledge about seasonality or trends in the temporal mode may be available. Such side information may enable more succinct and interpretable tensor decomposition models and improved quality in downstream tasks. We propose a framework for Multi-Dictionary Tensor Decomposition (MDTD) which takes advantage of prior structural information about tensor modes in the form of coding dictionaries to obtain sparsely encoded tensor factors. We derive a general optimization algorithm for MDTD that handles both complete input and input with missing values. Our framework handles large sparse tensors typical to many real-world application domains. We demonstrate MDTD's utility via experiments with both synthetic and real-world datasets. It learns more concise models than dictionary-free counterparts and improves (i) reconstruction quality ($60\%$ fewer non-zero coefficients coupled with smaller error); (ii) missing values imputation quality (two-fold MSE reduction with up to orders of magnitude time savings) and (iii) the estimation of the tensor rank. MDTD's quality improvements do not come with a running time premium: it can decompose $19GB$ datasets in less than a minute. It can also impute missing values in sparse billion-entry tensors more accurately and scalably than state-of-the-art competitors.
    摘要 张量分解方法是分析来自社交媒体、医疗、时空等领域多维数据的流行工具。广泛采用的模型,如Tucker分解和标准多线性分解(CPD),遵循数据驱动的思路:它们将张量分解为能较好近似观测数据的因子。在某些情况下,张量的各个模式上有侧信息可用。例如,在时间维度的用户-物品购买张量中,可能存在用户影响图、物品相似图,以及时间模式上的季节性或趋势信息。这些侧信息可以使张量分解模型更简洁、更可解释,并提升下游任务的质量。我们提出了多字典张量分解(MDTD)框架,以编码字典的形式利用张量各模式上的先验结构信息,从而获得稀疏编码的张量因子。我们为MDTD推导了一种通用优化算法,可以处理完整输入和带缺失值的输入。我们的框架可以处理许多实际应用领域中常见的大型稀疏张量。我们通过合成数据与真实数据的实验证明了MDTD的实用性:它比无字典的对应方法学习到更简洁的模型,并提高了(i)重建质量(非零系数减少60%且误差更小)、(ii)缺失值插补质量(MSE降低一半,且时间最多可节省数个数量级)和(iii)张量秩的估计。MDTD的质量提升并不以运行时间为代价:它可以在不到一分钟内分解19GB的数据集,并且能比最先进的竞争方法更准确、更可扩展地插补包含数十亿条目的稀疏张量中的缺失值。

A Study of Data-driven Methods for Adaptive Forecasting of COVID-19 Cases

  • paper_url: http://arxiv.org/abs/2309.09698
  • repo_url: None
  • paper_authors: Charithea Stylianides, Kleanthis Malialis, Panayiotis Kolios
  • for: 本研究旨在investigate数据驱动(学习、统计)方法,以适应COVID-19病毒传播的非站点性条件。
  • methods: 本研究使用数据驱动(学习、统计)方法,增量地更新模型,以适应不断变化的病毒传播条件。
  • results: 实验结果表明,该方法在不同的病毒浪涌期内,可以提供高准确率的预测结果,并在疫情爆发时进行有效的预测。
    Abstract Severe acute respiratory disease SARS-CoV-2 has had a found impact on public health systems and healthcare emergency response especially with respect to making decisions on the most effective measures to be taken at any given time. As demonstrated throughout the last three years with COVID-19, the prediction of the number of positive cases can be an effective way to facilitate decision-making. However, the limited availability of data and the highly dynamic and uncertain nature of the virus transmissibility makes this task very challenging. Aiming at investigating these challenges and in order to address this problem, this work studies data-driven (learning, statistical) methods for incrementally training models to adapt to these nonstationary conditions. An extensive empirical study is conducted to examine various characteristics, such as, performance analysis on a per virus wave basis, feature extraction, "lookback" window size, memory size, all for next-, 7-, and 14-day forecasting tasks. We demonstrate that the incremental learning framework can successfully address the aforementioned challenges and perform well during outbreaks, providing accurate predictions.
    摘要 严重急性呼吸综合征冠状病毒SARS-CoV-2对公共卫生系统和医疗应急响应产生了深远影响,尤其是在决定何时采取最有效措施方面。正如过去三年COVID-19疫情所展示的,预测阳性病例数量是辅助决策的一种有效方式。然而,数据的有限性以及病毒传播性的高度动态和不确定性使得这项任务极具挑战。为研究并解决这些挑战,本工作考察了以增量方式训练模型、使其适应非平稳条件的数据驱动(学习、统计)方法。我们进行了广泛的实证研究,考察了多种特性,包括按疫情波次的性能分析、特征提取、"回看"窗口大小、记忆大小等,并针对次日、7天和14天预测任务进行了评估。结果表明,增量学习框架可以成功应对上述挑战,并在疫情爆发期间提供准确的预测。
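下面是一个极简的 numpy 草图(并非论文所用模型;这里用"在最近 `memory` 天的滑动窗口上反复重拟合一个线性自回归模型"来示意增量/自适应预测的思路,`lookback`、`memory` 等参数均为假设)。

```python
import numpy as np

def make_design(series, lookback):
    X = np.stack([series[i:i + lookback] for i in range(len(series) - lookback)])
    y = series[lookback:]
    return X, y

def incremental_forecast(series, lookback=7, memory=60, horizon=1):
    """Refit a least-squares AR model on the most recent `memory` days at every step."""
    preds = []
    for t in range(memory + lookback, len(series) - horizon + 1):
        window = series[t - memory - lookback:t]
        X, y = make_design(window, lookback)
        coef, *_ = np.linalg.lstsq(np.c_[np.ones(len(X)), X], y, rcond=None)
        x_new = np.r_[1.0, series[t - lookback:t]]
        preds.append(float(x_new @ coef))
    return np.array(preds)

cases = np.abs(np.cumsum(np.random.randn(200))) * 50      # toy daily case counts
print(incremental_forecast(cases)[:5])
```
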

VULNERLIZER: Cross-analysis Between Vulnerabilities and Software Libraries

  • paper_url: http://arxiv.org/abs/2309.09649
  • repo_url: None
  • paper_authors: Irdin Pekaric, Michael Felderer, Philipp Steinmüller
  • for: 本研究旨在提供一种新的漏洞检测方法,用于检测软件项目中的漏洞。
  • methods: 本方法使用CVE和软件库数据,结合聚类算法生成漏洞和库之间的链接。此外,还进行模型训练,通过更新分配的权重来重新评估生成的关联。
  • results: 研究结果显示,VULNERLIZER可以基于初始输入的CVE条目或软件库,准确预测未来可能存在漏洞的软件库,预测精度达到75%或更高。
    Abstract The identification of vulnerabilities is a continuous challenge in software projects. This is due to the evolution of methods that attackers employ as well as the constant updates to the software, which reveal additional issues. As a result, new and innovative approaches for the identification of vulnerable software are needed. In this paper, we present VULNERLIZER, which is a novel framework for cross-analysis between vulnerabilities and software libraries. It uses CVE and software library data together with clustering algorithms to generate links between vulnerabilities and libraries. In addition, the training of the model is conducted in order to reevaluate the generated associations. This is achieved by updating the assigned weights. Finally, the approach is then evaluated by making the predictions using the CVE data from the test set. The results show that the VULNERLIZER has a great potential in being able to predict future vulnerable libraries based on an initial input CVE entry or a software library. The trained model reaches a prediction accuracy of 75% or higher.
    摘要 “找到漏洞是软件项目中持续的挑战。这是因为攻击者的方法不断发展,软件也不断更新,从而暴露出新的问题。为此,我们提出了一种新的漏洞与软件库交叉分析框架——VULNERLIZER。它使用CVE和软件库数据,结合聚类算法生成漏洞和库之间的关联。此外,我们还进行了模型训练,通过更新分配的权重来重新评估生成的关联。最后,我们对测试集中的CVE数据进行预测。结果显示,VULNERLIZER能够基于初始输入的CVE条目或软件库,准确预测未来可能存在漏洞的软件库,训练模型的预测准确率达到75%或更高。”
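下面给出一个极简的示意性 Python 草图(并非VULNERLIZER的流程;这里假设用TF-IDF向量化CVE描述与库名称、用k-means聚类,并把落在同一簇中的CVE与库视为"关联",数据与簇数均为虚构示例)。

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

cves = ["buffer overflow in image parsing library",
        "sql injection in web framework query builder",
        "heap corruption when decoding malformed png"]
libraries = ["libpng", "imagemagick", "django-orm", "flask-sqlbuilder"]

texts = cves + libraries
X = TfidfVectorizer().fit_transform(texts)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
cve_clusters, lib_clusters = km.labels_[:len(cves)], km.labels_[len(cves):]

# Link each CVE to libraries sharing its cluster (the "generated associations").
for i in range(len(cves)):
    linked = [lib for lib, c in zip(libraries, lib_clusters) if c == cve_clusters[i]]
    print(f"CVE {i}: {linked}")
```
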

A Discussion on Generalization in Next-Activity Prediction

  • paper_url: http://arxiv.org/abs/2309.09618
  • repo_url: None
  • paper_authors: Luka Abb, Peter Pfeiffer, Peter Fettke, Jana-Rebecca Rehse
  • for: 本研究旨在评估深度学习技术在下一个活动预测中的效果,并提出了不同的预测场景,以促进未来研究的发展。
  • methods: 本研究使用了深度学习技术进行下一个活动预测,并评估了其预测性能使用公共可用事件日志。
  • results: 研究发现现有的评估方法带来很大的示例泄露问题,导致使用深度学习技术的预测方法并不如预期中效果好。
    Abstract Next activity prediction aims to forecast the future behavior of running process instances. Recent publications in this field predominantly employ deep learning techniques and evaluate their prediction performance using publicly available event logs. This paper presents empirical evidence that calls into question the effectiveness of these current evaluation approaches. We show that there is an enormous amount of example leakage in all of the commonly used event logs, so that rather trivial prediction approaches perform almost as well as ones that leverage deep learning. We further argue that designing robust evaluations requires a more profound conceptual engagement with the topic of next-activity prediction, and specifically with the notion of generalization to new data. To this end, we present various prediction scenarios that necessitate different types of generalization to guide future research.
    摘要 下一活动预测的目标是预测运行中进程实例的未来行为。该领域的现有文献主要采用深度学习技术,并使用公开可用的事件日志评估其预测性能。本文提供了实证证据,质疑现有评估方法的有效性。我们发现所有常用的事件日志都存在大量的示例泄露,使得相当简单的预测方法几乎能达到与深度学习方法相当的效果。我们还认为,设计鲁棒的评估需要对下一活动预测这一主题进行更深入的概念性探讨,特别是对"泛化到新数据"这一概念。为此,我们提出了需要不同类型泛化能力的多种预测场景,以引导未来研究。

Latent assimilation with implicit neural representations for unknown dynamics

  • paper_url: http://arxiv.org/abs/2309.09574
  • repo_url: None
  • paper_authors: Zhuoyuan Li, Bin Dong, Pingwen Zhang
  • for: 该研究旨在解决数据同化中由数据维度带来的高计算成本以及对底层机制认识不完整的问题。
  • methods: 该研究提出了新的同化框架,即基于隐式神经表示的潜在同化(LAINR),其中引入了球面隐式神经表示(SINR)和数据驱动的神经网络不确定性估计器。
  • results: 实验结果表明,与基于AutoEncoder的方法相比,LAINR在同化过程中具有更高的精度和效率。
    Abstract Data assimilation is crucial in a wide range of applications, but it often faces challenges such as high computational costs due to data dimensionality and incomplete understanding of underlying mechanisms. To address these challenges, this study presents a novel assimilation framework, termed Latent Assimilation with Implicit Neural Representations (LAINR). By introducing Spherical Implicit Neural Representations (SINR) along with a data-driven uncertainty estimator of the trained neural networks, LAINR enhances efficiency in assimilation process. Experimental results indicate that LAINR holds certain advantage over existing methods based on AutoEncoders, both in terms of accuracy and efficiency.
    摘要 数据同化在各种应用中至关重要,但它经常面临由数据维度带来的高计算成本以及对底层机制理解不完全等挑战。为解决这些挑战,本研究提出了一种新的同化框架,称为基于隐式神经表示的潜在同化(LAINR)。通过引入球面隐式神经表示(SINR)以及针对已训练神经网络的数据驱动不确定性估计器,LAINR 提高了同化过程的效率。实验结果表明,与基于AutoEncoder的方法相比,LAINR 在准确性和效率方面均具有一定优势。

New Bounds on the Accuracy of Majority Voting for Multi-Class Classification

  • paper_url: http://arxiv.org/abs/2309.09564
  • repo_url: None
  • paper_authors: Sina Aeeneh, Nikola Zlatanov, Jiangshan Yu
  • for: 这个论文主要研究了多类别分类问题中的多数投票函数(MVF)的精度。
  • methods: 本论文使用了独立且非Identically分布的选民模型,并 derivated了MVF在多类别分类问题中的新上界。
  • results: 研究发现,在满足特定条件的情况下,MVF在多类别分类问题中的误差率会随独立投票者数量的增加指数衰减到零;而在不满足这些条件时,误差率会指数增长。此外,研究还讨论了真相发现(truth discovery)算法的精度:在最好情况下,它相当于被放大的MVF,只有当MVF误差率较小时才能达到较小的误差率;在最坏情况下,其误差率可能高于MVF。
    Abstract Majority voting is a simple mathematical function that returns the value that appears most often in a set. As a popular decision fusion technique, the majority voting function (MVF) finds applications in resolving conflicts, where a number of independent voters report their opinions on a classification problem. Despite its importance and its various applications in ensemble learning, data crowd-sourcing, remote sensing, and data oracles for blockchains, the accuracy of the MVF for the general multi-class classification problem has remained unknown. In this paper, we derive a new upper bound on the accuracy of the MVF for the multi-class classification problem. More specifically, we show that under certain conditions, the error rate of the MVF exponentially decays toward zero as the number of independent voters increases. Conversely, the error rate of the MVF exponentially grows with the number of independent voters if these conditions are not met. We first explore the problem for independent and identically distributed voters where we assume that every voter follows the same conditional probability distribution of voting for different classes, given the true classification of the data point. Next, we extend our results for the case where the voters are independent but non-identically distributed. Using the derived results, we then provide a discussion on the accuracy of the truth discovery algorithms. We show that in the best-case scenarios, truth discovery algorithms operate as an amplified MVF and thereby achieve a small error rate only when the MVF achieves a small error rate, and vice versa, achieve a large error rate when the MVF also achieves a large error rate. In the worst-case scenario, the truth discovery algorithms may achieve a higher error rate than the MVF. Finally, we confirm our theoretical results using numerical simulations.
    摘要 多数投票是一种简单的数学函数,返回集合中出现最多的值。作为一种受欢迎的决策融合技术,多数投票函数(MVF)在解决冲突、 ensemble learning、数据投票、远程感知和数据链等领域都有应用。 despite its importance and various applications, the accuracy of MVF for the general multi-class classification problem remains unknown. In this paper, we derive a new upper bound on the accuracy of MVF for the multi-class classification problem. Specifically, we show that under certain conditions, the error rate of MVF exponentially decays toward zero as the number of independent voters increases. Conversely, the error rate of MVF exponentially grows with the number of independent voters if these conditions are not met. We first explore the problem for independent and identically distributed voters, assuming that every voter follows the same conditional probability distribution of voting for different classes given the true classification of the data point. Next, we extend our results to the case where the voters are independent but non-identically distributed. Using the derived results, we then provide a discussion on the accuracy of truth discovery algorithms. We show that in the best-case scenarios, truth discovery algorithms operate as an amplified MVF and thereby achieve a small error rate only when the MVF achieves a small error rate, and vice versa, achieve a large error rate when the MVF also achieves a large error rate. In the worst-case scenario, truth discovery algorithms may achieve a higher error rate than MVF. Finally, we confirm our theoretical results using numerical simulations.
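下面是一个极简的 numpy 模拟草图(并非论文的理论推导;投票者准确率 `p_correct`、类别数与试验次数均为示意性假设),用来直观展示"当条件满足时,MVF误差率随独立投票者数量增加而快速衰减"的现象。

```python
import numpy as np

rng = np.random.default_rng(0)

def mvf_error(n_voters, n_classes=4, p_correct=0.4, trials=2000):
    """Empirical error of majority voting with i.i.d. voters that pick the
    true class w.p. p_correct and any wrong class uniformly otherwise."""
    errors = 0
    for _ in range(trials):
        true = rng.integers(n_classes)
        correct = rng.random(n_voters) < p_correct
        wrong = rng.integers(n_classes - 1, size=n_voters)
        wrong = np.where(wrong >= true, wrong + 1, wrong)    # any class except true
        votes = np.where(correct, true, wrong)
        counts = np.bincount(votes, minlength=n_classes)
        errors += int(counts.argmax() != true)
    return errors / trials

for n in (1, 5, 25, 125):
    print(n, mvf_error(n))
```
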

Utilizing Whisper to Enhance Multi-Branched Speech Intelligibility Prediction Model for Hearing Aids

  • paper_url: http://arxiv.org/abs/2309.09548
  • repo_url: None
  • paper_authors: Ryandhimas E. Zezario, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao
  • for: 这个研究的目的是提高助听器设备中语音可懂度自动评估的精度。
  • methods: 这个研究使用了改进后的多分支语音可懂度预测模型,称为MBI-Net+和MBI-Net++。MBI-Net+使用Whisper嵌入来丰富语音特征,而MBI-Net++则进一步引入辅助任务,预测帧级和语句级的客观语音感知指标HASPI分数,并采用多任务学习。
  • results: 实验结果表明,MBI-Net++和MBI-Net+在多种指标上均优于MBI-Net,而MBI-Net++又优于MBI-Net+。
    Abstract Automated assessment of speech intelligibility in hearing aid (HA) devices is of great importance. Our previous work introduced a non-intrusive multi-branched speech intelligibility prediction model called MBI-Net, which achieved top performance in the Clarity Prediction Challenge 2022. Based on the promising results of the MBI-Net model, we aim to further enhance its performance by leveraging Whisper embeddings to enrich acoustic features. In this study, we propose two improved models, namely MBI-Net+ and MBI-Net++. MBI-Net+ maintains the same model architecture as MBI-Net, but replaces self-supervised learning (SSL) speech embeddings with Whisper embeddings to deploy cross-domain features. On the other hand, MBI-Net++ further employs a more elaborate design, incorporating an auxiliary task to predict frame-level and utterance-level scores of the objective speech intelligibility metric HASPI (Hearing Aid Speech Perception Index) and multi-task learning. Experimental results confirm that both MBI-Net++ and MBI-Net+ achieve better prediction performance than MBI-Net in terms of multiple metrics, and MBI-Net++ is better than MBI-Net+.
    摘要 助听器(HA)设备中语音可懂度的自动评估非常重要。我们之前的工作推出了一种非侵入式多分支语音可懂度预测模型,称为 MBI-Net,在 Clarity Prediction Challenge 2022 中获得了优秀的成绩。基于 MBI-Net 模型的良好结果,我们希望通过使用 Whisper 嵌入来丰富声学特征,进一步提高其性能。在这项研究中,我们提出了两种改进的模型,即 MBI-Net+ 和 MBI-Net++。MBI-Net+ 保持了与 MBI-Net 相同的模型结构,但将自监督学习(SSL)语音嵌入替换为 Whisper 嵌入,以便部署跨领域特征。而 MBI-Net++ 则进一步采用了更精细的设计,引入辅助任务来预测客观语音感知指标 HASPI(Hearing Aid Speech Perception Index)的帧级和语句级分数,并采用多任务学习。实验结果表明,MBI-Net++ 和 MBI-Net+ 都在多个指标上优于 MBI-Net,而 MBI-Net++ 又优于 MBI-Net+。

Quantum Wasserstein GANs for State Preparation at Unseen Points of a Phase Diagram

  • paper_url: http://arxiv.org/abs/2309.09543
  • repo_url: None
  • paper_authors: Wiktor Jurasz, Christian B. Mendl
  • for: 本研究旨在扩展生成模型,特别是生成对抗网络(GANs)到量子域,并解决当前方法的局限性。
  • methods: 我们提出了一种新的混合经典-量子方法,基于量子 Wasserstein GAN,可以学习支配输入状态测量期望的函数,并生成未曾见过的新状态。
  • results: 我们的方法可以生成不在输入集中的新状态,其测量期望遵循同一个基础函数。
    Abstract Generative models and in particular Generative Adversarial Networks (GANs) have become very popular and powerful data generation tool. In recent years, major progress has been made in extending this concept into the quantum realm. However, most of the current methods focus on generating classes of states that were supplied in the input set and seen at the training time. In this work, we propose a new hybrid classical-quantum method based on quantum Wasserstein GANs that overcomes this limitation. It allows to learn the function governing the measurement expectations of the supplied states and generate new states, that were not a part of the input set, but which expectations follow the same underlying function.
    摘要 生成模型,特别是生成对抗网络(GANs),在过去几年已成为非常流行且强大的数据生成工具。近年来,将这一概念扩展到量子领域取得了重要进展。然而,大多数当前方法仅能生成训练时输入集中已出现过的状态类别。在本工作中,我们提出一种新的混合经典-量子方法,基于量子 Wasserstein GAN,可以超越这一限制。它能够学习支配输入状态测量期望的函数,并生成不在输入集中、但测量期望遵循相同基础函数的新状态。

Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech

  • paper_url: http://arxiv.org/abs/2309.09510
  • repo_url: https://github.com/dynamic-superb/dynamic-superb
  • paper_authors: Chien-yu Huang, Ke-Han Lu, Shih-Heng Wang, Chi-Yuan Hsiao, Chun-Yi Kuan, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan Sharma, Shinji Watanabe, Bhiksha Ramakrishnan, Shady Shehata, Hung-yi Lee
  • for: This paper aims to present a benchmark for building universal speech models that can perform multiple tasks in a zero-shot fashion using instruction tuning.
  • methods: The paper proposes a benchmark called Dynamic-SUPERB, which combines 33 tasks and 22 datasets to provide comprehensive coverage of diverse speech tasks and harness instruction tuning. The paper also proposes several approaches to establish benchmark baselines, including the use of speech models, text language models, and the multimodal encoder.
  • results: The evaluation results show that while the baselines perform reasonably on seen tasks, they struggle with unseen ones. The paper also conducts an ablation study to assess the robustness and seek improvements in the performance.
    Abstract Text language models have shown remarkable zero-shot capability in generalizing to unseen tasks when provided with well-formulated instructions. However, existing studies in speech processing primarily focus on limited or specific tasks. Moreover, the lack of standardized benchmarks hinders a fair comparison across different approaches. Thus, we present Dynamic-SUPERB, a benchmark designed for building universal speech models capable of leveraging instruction tuning to perform multiple tasks in a zero-shot fashion. To achieve comprehensive coverage of diverse speech tasks and harness instruction tuning, we invite the community to collaborate and contribute, facilitating the dynamic growth of the benchmark. To initiate, Dynamic-SUPERB features 55 evaluation instances by combining 33 tasks and 22 datasets. This spans a broad spectrum of dimensions, providing a comprehensive platform for evaluation. Additionally, we propose several approaches to establish benchmark baselines. These include the utilization of speech models, text language models, and the multimodal encoder. Evaluation results indicate that while these baselines perform reasonably on seen tasks, they struggle with unseen ones. We also conducted an ablation study to assess the robustness and seek improvements in the performance. We release all materials to the public and welcome researchers to collaborate on the project, advancing technologies in the field together.
    摘要 文本语言模型已经展现出很强的零 shot 能力,可以通过提供良好的指令来泛化到未看过任务。然而,现有的语音处理研究主要集中在限定或特定的任务上,而缺乏标准化基准使得不同方法之间的比较不公平。为此,我们提出了Dynamic-SUPERB,一个用于建立通用语音模型的 benchmark,可以通过指令调整来完成多个任务的零 shot 泛化。为确保语音任务的全面覆盖和利用指令调整,我们邀请社区参与合作,以便不断扩展 benchmark。Dynamic-SUPERB 目前已经包含了55个评估实例,通过组合 33 个任务和 22 个数据集,提供了广泛的维度评估。我们还提出了一些建立 benchmark 基准的方法,包括使用语音模型、文本语言模型和多模式Encoder。评估结果显示,虽然这些基准在看到任务上表现良好,但在未看到任务上表现不佳。为了提高性能,我们进行了一些剖析研究,以评估鲁棒性和寻找改进。我们将所有材料公开发布,并邀请研究人员一起合作项目,共同推动领域技术的发展。

Outlier-Insensitive Kalman Filtering: Theory and Applications

  • paper_url: http://arxiv.org/abs/2309.09505
  • repo_url: None
  • paper_authors: Shunit Truzman, Guy Revach, Nir Shlezinger, Itzik Klein
  • for: 这篇论文是关于如何从含有噪音观测的动力系统中进行状态估计,以提高适用范围和精度。
  • methods: 本文提出了一个具有自适应能力的实时状态估计方法,可以快速处理含有噪音观测的动力系统,并且不需要调整参数。
  • results: 仿真和实地实验评估表明,本文的方法能够在观测中含有离群值的情况下对动力系统进行精确的状态估计,并且比其他方法对离群值更具鲁棒性。
    Abstract State estimation of dynamical systems from noisy observations is a fundamental task in many applications. It is commonly addressed using the linear Kalman filter (KF), whose performance can significantly degrade in the presence of outliers in the observations, due to the sensitivity of its convex quadratic objective function. To mitigate such behavior, outlier detection algorithms can be applied. In this work, we propose a parameter-free algorithm which mitigates the harmful effect of outliers while requiring only a short iterative process of the standard update step of the KF. To that end, we model each potential outlier as a normal process with unknown variance and apply online estimation through either expectation maximization or alternating maximization algorithms. Simulations and field experiment evaluations demonstrate competitive performance of our method, showcasing its robustness to outliers in filtering scenarios compared to alternative algorithms.
    摘要 从噪声观测中估计动力系统的状态是许多应用中的基本任务。通常使用线性卡尔曼滤波器(KF)来解决这个问题,但由于其凸二次目标函数对离群值敏感,当观测中出现离群值时,KF的性能会受到严重损害。为缓解这一问题,可以使用离群值检测算法。在本工作中,我们提出了一种无需参数的算法,它在减轻离群值有害影响的同时,只需在KF标准更新步骤上进行一个简短的迭代过程。为此,我们将每个潜在的离群值建模为方差未知的正态过程,并通过期望最大化或交替最大化算法进行在线估计。仿真和实地实验评估表明,我们的方法性能具有竞争力,在滤波场景中对离群观测表现出比其他算法更强的鲁棒性。
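下面是一个极简的一维 Python 草图(并非论文的EM/交替最大化算法;这里用一个粗略的"残差超过约3σ时按矩匹配膨胀观测方差"的启发式来示意"把潜在离群值建模为方差未知并在线估计"的思路,过程噪声 `q`、观测噪声 `r` 等均为假设)。

```python
import numpy as np

def oikf_1d(zs, q=0.01, r=0.1, em_iters=3):
    """Scalar random-walk KF whose measurement variance is inflated per sample."""
    x, p = 0.0, 1.0
    xs = []
    for z in zs:
        p = p + q                                   # predict (random-walk model)
        r_eff = r
        for _ in range(em_iters):                   # crude alternating variance estimate
            s = p + r_eff
            resid = z - x
            r_eff = max(r, resid**2 - p) if resid**2 > 9 * s else r
        k = p / (p + r_eff)                         # update with (possibly inflated) variance
        x = x + k * (z - x)
        p = (1 - k) * p
        xs.append(x)
    return np.array(xs)

t = np.linspace(0, 10, 200)
truth = np.sin(t)
zs = truth + np.random.normal(0, 0.3, size=t.size)
zs[::25] += 8.0                                     # inject outliers
est = oikf_1d(zs)
print(np.abs(est - truth).mean())
```
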

Machine Learning Approaches to Predict and Detect Early-Onset of Digital Dermatitis in Dairy Cows using Sensor Data

  • paper_url: http://arxiv.org/abs/2309.10010
  • repo_url: None
  • paper_authors: Jennifer Magana, Dinu Gavojdian, Yakir Menachem, Teddy Lazebnik, Anna Zamansky, Amber Adams-Progar
  • for: 本研究旨在采用机器学习算法基于传感器行为数据早期发现和预测牛皮病(DD)。
  • methods: 本研究使用了机器学习模型,基于牛皮病症状出现的日期和时间,使用传感器数据进行预测和检测。
  • results: 研究发现,使用行为传感器数据预测和检测牛皮病的机器学习模型可达79%的准确率,预测牛皮病2天前的模型准确率为64%。这些机器学习模型可以帮助开发基于行为传感器数据的实时自动牛皮病监测和诊断工具,用于检测牛皮病的症状变化。
    Abstract The aim of this study was to employ machine learning algorithms based on sensor behavior data for (1) early-onset detection of digital dermatitis (DD); and (2) DD prediction in dairy cows. With the ultimate goal to set-up early warning tools for DD prediction, which would than allow a better monitoring and management of DD under commercial settings, resulting in a decrease of DD prevalence and severity, while improving animal welfare. A machine learning model that is capable of predicting and detecting digital dermatitis in cows housed under free-stall conditions based on behavior sensor data has been purposed and tested in this exploratory study. The model for DD detection on day 0 of the appearance of the clinical signs has reached an accuracy of 79%, while the model for prediction of DD 2 days prior to the appearance of the first clinical signs has reached an accuracy of 64%. The proposed machine learning models could help to develop a real-time automated tool for monitoring and diagnostic of DD in lactating dairy cows, based on behavior sensor data under conventional dairy environments. Results showed that alterations in behavioral patterns at individual levels can be used as inputs in an early warning system for herd management in order to detect variances in health of individual cows.
    摘要 The study proposed a machine learning model that can predict and detect DD in cows housed under free-stall conditions based on behavior sensor data. The model achieved an accuracy of 79% in detecting DD on day 0 of the appearance of clinical signs, and an accuracy of 64% in predicting DD 2 days prior to the first clinical signs.The results of the study showed that alterations in behavioral patterns at the individual level can be used as inputs in an early warning system for herd management to detect variances in the health of individual cows. The proposed machine learning models have the potential to develop a real-time automated tool for monitoring and diagnosis of DD in lactating dairy cows based on behavior sensor data under conventional dairy environments.

Face-Driven Zero-Shot Voice Conversion with Memory-based Face-Voice Alignment

  • paper_url: http://arxiv.org/abs/2309.09470
  • repo_url: None
  • paper_authors: Zheng-Yan Sheng, Yang Ai, Yan-Nian Chen, Zhen-Hua Ling
  • for: 这篇论文目标是提出一种基于 face 图像的零 shot 语音转换任务(zero-shot FaceVC),即将源 speaker 的语音特征转换到目标 speaker 的语音特征上,只使用目标 speaker 的单个 face 图像。
  • methods: 我们提议一种基于记忆的人脸-语音对齐模块的 zero-shot FaceVC 方法,通过槽(slot)对这两种模态进行对齐,从而从人脸图像中捕捉语音特征。我们还提出了一种混合监督策略,以缓解语音转换任务在训练和推断阶段之间长期存在的不一致问题。
  • results: 通过广泛的实验,我们证明了我们提出的方法在 zero-shot FaceVC 任务中的优越性。我们还设计了系统的主观和客观评价指标,以全面评估homogeneity、多样性和一致性 controlled by face images。
    Abstract This paper presents a novel task, zero-shot voice conversion based on face images (zero-shot FaceVC), which aims at converting the voice characteristics of an utterance from any source speaker to a newly coming target speaker, solely relying on a single face image of the target speaker. To address this task, we propose a face-voice memory-based zero-shot FaceVC method. This method leverages a memory-based face-voice alignment module, in which slots act as the bridge to align these two modalities, allowing for the capture of voice characteristics from face images. A mixed supervision strategy is also introduced to mitigate the long-standing issue of the inconsistency between training and inference phases for voice conversion tasks. To obtain speaker-independent content-related representations, we transfer the knowledge from a pretrained zero-shot voice conversion model to our zero-shot FaceVC model. Considering the differences between FaceVC and traditional voice conversion tasks, systematic subjective and objective metrics are designed to thoroughly evaluate the homogeneity, diversity and consistency of voice characteristics controlled by face images. Through extensive experiments, we demonstrate the superiority of our proposed method on the zero-shot FaceVC task. Samples are presented on our demo website.
    摘要 这篇论文介绍了一个新的任务:基于人脸图像的零shot语音转换(zero-shot FaceVC),该任务的目标是仅依靠目标说话人的单张人脸图像,将任意来源说话人话语的语音特征转换为新出现的目标说话人的语音特征。为解决这个任务,作者们提出了一种基于人脸-语音记忆的零shot FaceVC方法。这种方法利用一个基于记忆的人脸-语音对齐模块,以槽(slot)作为桥梁对这两种模态进行对齐,从而从人脸图像中捕捉语音特征。此外,作者们还提出了一种混合监督策略,以缓解语音转换任务在训练和推断阶段之间长期存在的不一致问题。为了获得与说话人无关的内容相关表示,作者们将预训练的零shot语音转换模型的知识迁移到他们的零shot FaceVC模型。鉴于FaceVC和传统语音转换任务之间的差异,作者们设计了系统的主观和客观评价指标,以全面评估由人脸图像控制的语音特征的同质性、多样性和一致性。通过广泛的实验,作者们证明了所提方法在零shot FaceVC任务中的优越性。样例可以在他们的 Demo 网站上找到。

Active anomaly detection based on deep one-class classification

  • paper_url: http://arxiv.org/abs/2309.09465
  • repo_url: https://github.com/mkkim-home/AAD
  • paper_authors: Minkyung Kim, Junsik Kim, Jongmin Yu, Jun Kyun Choi
  • for: 该论文利用主动学习来提升深度异常检测模型的训练效果。
  • methods: 该论文使用了一种基于自适应边界的查询策略,以及一种将噪声对比估计与一类分类模型相结合的半监督学习方法。
  • results: 该论文在七个异常检测数据集上分别验证了这两种方法,并分析了它们单独及组合时的效果。
    Abstract Active learning has been utilized as an efficient tool in building anomaly detection models by leveraging expert feedback. In an active learning framework, a model queries samples to be labeled by experts and re-trains the model with the labeled data samples. It unburdens in obtaining annotated datasets while improving anomaly detection performance. However, most of the existing studies focus on helping experts identify as many abnormal data samples as possible, which is a sub-optimal approach for one-class classification-based deep anomaly detection. In this paper, we tackle two essential problems of active learning for Deep SVDD: query strategy and semi-supervised learning method. First, rather than solely identifying anomalies, our query strategy selects uncertain samples according to an adaptive boundary. Second, we apply noise contrastive estimation in training a one-class classification model to incorporate both labeled normal and abnormal data effectively. We analyze that the proposed query strategy and semi-supervised loss individually improve an active learning process of anomaly detection and further improve when combined together on seven anomaly detection datasets.
    摘要 主动学习已被用作建立异常检测模型的一种高效工具,通过借助专家反馈来优化模型。在主动学习框架中,模型会查询需要专家标注的样本,然后使用这些已标注的数据样本重新训练模型。这不仅可以减轻获取标注数据的负担,还可以提高异常检测性能。然而,大多数现有研究的目标是帮助专家找出尽可能多的异常样本,而对于基于一类分类的深度异常检测来说,这是一种次优的做法。在这篇论文中,我们解决了面向 Deep SVDD 的主动学习中的两个重要问题:查询策略和半监督学习方法。首先,我们的查询策略并非只寻找异常,而是依据自适应边界选择不确定的样本。其次,我们在训练一类分类模型时应用了噪声对比估计,以便有效地同时利用已标注的正常数据和异常数据。我们的分析表明,所提出的查询策略和半监督损失函数分别提高了异常检测的主动学习过程,并且当二者结合时可以进一步提升性能。我们在七个异常检测数据集上验证了这一点。
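下面是一个极简的 numpy 草图(并非论文实现;这里用随机投影代替训练好的编码器,用距离的分位数近似"自适应边界",两者都是为说明而设的假设),演示"查询距离超球边界最近(最不确定)的样本交给专家标注"的查询策略。

```python
import numpy as np

rng = np.random.default_rng(1)

def embed(x, W):
    return np.tanh(x @ W)                       # stand-in for the trained encoder

X_unlabeled = rng.normal(size=(500, 20))
W = rng.normal(size=(20, 8))
Z = embed(X_unlabeled, W)

center = Z.mean(axis=0)                         # Deep SVDD hypersphere center
dist = np.linalg.norm(Z - center, axis=1)
radius = np.quantile(dist, 0.9)                 # adaptive boundary (assumed quantile)

# Query the most uncertain samples: those lying closest to the boundary.
uncertainty = -np.abs(dist - radius)
query_idx = np.argsort(uncertainty)[-10:]       # top-10 to send to the expert
print(sorted(query_idx.tolist()))
```
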

CaT: Balanced Continual Graph Learning with Graph Condensation

  • paper_url: http://arxiv.org/abs/2309.09455
  • repo_url: https://github.com/superallen13/CaT-CGL
  • paper_authors: Yilun Liu, Ruihong Qiu, Zi Huang
  • for: 本研究旨在解决 continual graph learning (CGL) 中的灾难性遗忘问题,提高模型的稳定性和性能。
  • methods: 该研究提出了一种 Condense and Train (CaT) 框架,包括对新来 graph 的压缩和存储,以及在内存中直接更新模型。
  • results: 实验结果表明,CaT 框架可以有效缓解灾难性遗忘问题,并提高 CGL 的效果和效率。
    Abstract Continual graph learning (CGL) is purposed to continuously update a graph model with graph data being fed in a streaming manner. Since the model easily forgets previously learned knowledge when training with new-coming data, the catastrophic forgetting problem has been the major focus in CGL. Recent replay-based methods intend to solve this problem by updating the model using both (1) the entire new-coming data and (2) a sampling-based memory bank that stores replayed graphs to approximate the distribution of historical data. After updating the model, a new replayed graph sampled from the incoming graph will be added to the existing memory bank. Despite these methods are intuitive and effective for the CGL, two issues are identified in this paper. Firstly, most sampling-based methods struggle to fully capture the historical distribution when the storage budget is tight. Secondly, a significant data imbalance exists in terms of the scales of the complex new-coming graph data and the lightweight memory bank, resulting in unbalanced training. To solve these issues, a Condense and Train (CaT) framework is proposed in this paper. Prior to each model update, the new-coming graph is condensed to a small yet informative synthesised replayed graph, which is then stored in a Condensed Graph Memory with historical replay graphs. In the continual learning phase, a Training in Memory scheme is used to update the model directly with the Condensed Graph Memory rather than the whole new-coming graph, which alleviates the data imbalance problem. Extensive experiments conducted on four benchmark datasets successfully demonstrate superior performances of the proposed CaT framework in terms of effectiveness and efficiency. The code has been released on https://github.com/superallen13/CaT-CGL.
    摘要 持续图学习(CGL)的目标是在图数据以流式方式到来时不断更新图模型。由于模型在使用新数据训练时容易遗忘先前学到的知识,灾难性遗忘问题一直是 CGL 的主要关注点。最近基于重放的方法试图通过同时使用(1)全部新到来的数据和(2)一个基于采样的记忆库(存储重放图以近似历史数据的分布)来更新模型。在更新模型之后,会从新到来的图中采样一个新的重放图并加入现有的记忆库。尽管这些方法直观且有效,本文指出了两个问题。第一,当存储预算紧张时,大多数基于采样的方法难以充分捕捉历史分布。第二,规模庞大的新到来图数据与轻量的记忆库之间存在显著的数据不平衡,导致训练不均衡。为了解决这些问题,本文提出了 Condense and Train(CaT)框架。在每次模型更新之前,新到来的图会被压缩成一个小而信息丰富的合成重放图,并与历史重放图一起存储在压缩图记忆(Condensed Graph Memory)中。在持续学习阶段,采用 Train in Memory 方案,直接利用压缩图记忆而非整个新到来图来更新模型,从而缓解数据不平衡问题。在四个基准数据集上进行的大量实验表明,所提出的 CaT 框架在有效性和效率方面均具有优越的表现。代码已发布在 https://github.com/superallen13/CaT-CGL。

Asymptotically Efficient Online Learning for Censored Regression Models Under Non-I.I.D Data

  • paper_url: http://arxiv.org/abs/2309.09454
  • repo_url: None
  • paper_authors: Lantian Zhang, Lei Guo
  • for: investigate the asymptotically efficient online learning problem for stochastic censored regression models.
  • methods: propose a two-step online algorithm, which achieves algorithm convergence and improves estimation performance.
  • results: show that the algorithm is strongly consistent and asymptotically normal, and the covariances of the estimates can achieve the Cramer-Rao bound asymptotically.
    Abstract The asymptotically efficient online learning problem is investigated for stochastic censored regression models, which arise from various fields of learning and statistics but up to now still lacks comprehensive theoretical studies on the efficiency of the learning algorithms. For this, we propose a two-step online algorithm, where the first step focuses on achieving algorithm convergence, and the second step is dedicated to improving the estimation performance. Under a general excitation condition on the data, we show that our algorithm is strongly consistent and asymptotically normal by employing the stochastic Lyapunov function method and limit theories for martingales. Moreover, we show that the covariances of the estimates can achieve the Cramer-Rao (C-R) bound asymptotically, indicating that the performance of the proposed algorithm is the best possible that one can expect in general. Unlike most of the existing works, our results are obtained without resorting to the traditionally used but stringent conditions such as independent and identically distributed (i.i.d) assumption on the data, and thus our results do not exclude applications to stochastic dynamical systems with feedback. A numerical example is also provided to illustrate the superiority of the proposed online algorithm over the existing related ones in the literature.
    摘要 “本文研究随机删失回归(censored regression)模型的渐近高效在线学习问题。这类模型出现在学习与统计的多个领域,但迄今为止仍缺乏关于学习算法效率的系统性理论研究。为此,我们提出了一个两步在线算法:第一步致力于使算法收敛,第二步致力于提高估计性能。在数据满足一般激励条件的情况下,我们使用随机李雅普诺夫函数方法和鞅极限理论,证明了该算法是强相合且渐近正态的。此外,我们还证明了估计的协方差可以渐近达到 Cramer-Rao (C-R) bound,这表明所提算法的性能是一般情况下所能期望的最优性能。与大多数现有工作不同,我们的结果不需要依赖传统上使用但较为苛刻的独立同分布(i.i.d)假设,因此不排除其在带反馈的随机动态系统中的应用。我们还提供了一个数值示例,以说明所提在线算法相对于文献中现有相关算法的优越性。”

On the Use of the Kantorovich-Rubinstein Distance for Dimensionality Reduction

  • paper_url: http://arxiv.org/abs/2309.09442
  • repo_url: None
  • paper_authors: Gaël Giordano
  • for: 这个论文的目的是研究使用康托罗维奇-鲁宾逊距离来建立分类问题中的样本复杂度描述器。
  • methods: 这篇论文使用康托罗维奇-鲁宾逊距离来量化样本之间的几何与拓扑信息,并将每个类别的点关联到一个测度。
  • results: 论文表明,如果这些测度之间的康托罗维奇-鲁宾逊距离较大,则存在一个1-Lipschitz分类器,可以良好地分类这些点。
    Abstract The goal of this thesis is to study the use of the Kantorovich-Rubinstein distance as to build a descriptor of sample complexity in classification problems. The idea is to use the fact that the Kantorovich-Rubinstein distance is a metric in the space of measures that also takes into account the geometry and topology of the underlying metric space. We associate to each class of points a measure and thus study the geometrical information that we can obtain from the Kantorovich-Rubinstein distance between those measures. We show that a large Kantorovich-Rubinstein distance between those measures allows to conclude that there exists a 1-Lipschitz classifier that classifies well the classes of points. We also discuss the limitation of the Kantorovich-Rubinstein distance as a descriptor.
    摘要 本论文的目标是研究使用康托罗维奇-鲁宾逊距离来构建分类问题中的样本复杂度描述器。其思路是利用这样一个事实:康托罗维奇-鲁宾逊距离是测度空间上的一个度量,同时也考虑了底层度量空间的几何和拓扑结构。我们为每个类别的点关联一个测度,并研究这些测度之间的康托罗维奇-鲁宾逊距离所能提供的几何信息。我们证明,当这些测度之间的康托罗维奇-鲁宾逊距离较大时,存在一个能够良好区分各类点的1-Lipschitz分类器。我们还讨论了康托罗维奇-鲁宾逊距离作为描述器的局限性。
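下面给出一个极简的 Python 草图(并非论文方法;这里把各类点投影到一维后,用 scipy.stats.wasserstein_distance 计算两个经验测度之间的 Kantorovich-Rubinstein(W1)距离,投影方向与数据均为示意性假设)。

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)

# Two classes of points in R^5, projected to one dimension for the 1D W1 computation.
class_a = rng.normal(loc=0.0, size=(200, 5))
class_b = rng.normal(loc=2.0, size=(200, 5))

direction = rng.normal(size=5)
direction /= np.linalg.norm(direction)

w1 = wasserstein_distance(class_a @ direction, class_b @ direction)
print(f"Kantorovich-Rubinstein (W1) distance along projection: {w1:.3f}")
# By Kantorovich-Rubinstein duality, W1 = sup over 1-Lipschitz f of |E_a f - E_b f|,
# so a large W1 indicates a 1-Lipschitz witness that separates the two classes well.
```
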

DeepHEN: quantitative prediction essential lncRNA genes and rethinking essentialities of lncRNA genes

  • paper_url: http://arxiv.org/abs/2309.10008
  • repo_url: None
  • paper_authors: Hanlin Zhang, Wenzheng Cheng
  • for: 本研究旨在解释长非编码RNA(lncRNA)基因的必需性。
  • methods: 该研究使用表示学习和图神经网络来预测lncRNA基因的必需性。
  • results: 该模型能够预测lncRNA基因的必需性,并能够区分序列特征和网络空间特征对必需性的影响。此外,该模型还能够解决其他方法因为必需lncRNA基因数量低而导致的过拟合问题。
    Abstract Gene essentiality refers to the degree to which a gene is necessary for the survival and reproductive efficacy of a living organism. Although the essentiality of non-coding genes has been documented, there are still aspects of non-coding genes' essentiality that are unknown to us. For example, We do not know the contribution of sequence features and network spatial features to essentiality. As a consequence, in this work, we propose DeepHEN that could answer the above question. By buidling a new lncRNA-proteion-protein network and utilizing both representation learning and graph neural network, we successfully build our DeepHEN models that could predict the essentiality of lncRNA genes. Compared to other methods for predicting the essentiality of lncRNA genes, our DeepHEN model not only tells whether sequence features or network spatial features have a greater influence on essentiality but also addresses the overfitting issue of those methods caused by the low number of essential lncRNA genes, as evidenced by the results of enrichment analysis.

An Iterative Method for Unsupervised Robust Anomaly Detection Under Data Contamination

  • paper_url: http://arxiv.org/abs/2309.09436
  • repo_url: None
  • paper_authors: Minkyung Kim, Jongmin Yu, Junsik Kim, Tae-Hyun Oh, Jun Kyun Choi
  • for: This paper aims to make deep anomaly detection models robust to the anomalous tails present in real-world training data, i.e., to data contamination.
  • methods: It proposes a learning framework that iteratively updates sample-wise normality weights during training; the framework is model-agnostic and hyperparameter-insensitive, so it can be applied to existing anomaly detection methods without careful tuning.
  • results: On five anomaly detection benchmark datasets and two image datasets, the framework improves the robustness of representative anomaly detection models under different contamination levels and outperforms existing methods.
    Abstract Most deep anomaly detection models are based on learning normality from datasets, because abnormality is hard to define given its diverse and inconsistent nature. Therefore, it has been common practice to learn normality under the assumption that anomalous data are absent from the training dataset, which we call the normality assumption. In practice, however, this assumption is often violated because real data distributions include anomalous tails, i.e., the dataset is contaminated. The resulting gap between the assumption and the actual training data detrimentally affects the learning of an anomaly detection model. In this work, we propose a learning framework to reduce this gap and achieve a better normality representation. Our key idea is to identify sample-wise normality and use it as an importance weight that is updated iteratively during training. Our framework is designed to be model-agnostic and hyperparameter-insensitive, so it applies to a wide range of existing methods without careful parameter tuning. We apply our framework to three representative approaches to deep anomaly detection: one-class classification-, probabilistic model-, and reconstruction-based approaches. In addition, we address the importance of a termination condition for iterative methods and propose a termination criterion inspired by the anomaly detection objective. We validate that our framework improves the robustness of anomaly detection models under different contamination ratios on five anomaly detection benchmark datasets and two image datasets. On various contaminated datasets, our framework improves the performance of three representative anomaly detection methods, measured by the area under the ROC curve.
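A minimal sketch of the kind of loop the framework describes, with PCA reconstruction error standing in for a deep anomaly scorer: each iteration converts anomaly scores into sample-wise normality weights, refits on a weighted view of the data (approximated here by weighted resampling), and stops once the weights stabilize. The scorer, the weighting function, and the termination threshold are illustrative assumptions, not the paper's exact choices.

```python
import numpy as np
from sklearn.decomposition import PCA

def reconstruction_error(model, X):
    """Per-sample squared reconstruction error under the current scorer."""
    return np.square(X - model.inverse_transform(model.transform(X))).sum(axis=1)

def iterative_robust_fit(X, n_components=2, n_iters=10, tol=1e-3, seed=0):
    """Fit a scorer, turn its anomaly scores into sample-wise normality weights,
    refit on a weighted view of the data, and stop once the weights stabilize."""
    rng = np.random.default_rng(seed)
    weights = np.full(len(X), 1.0 / len(X))
    model = PCA(n_components=n_components).fit(X)
    for _ in range(n_iters):
        errors = reconstruction_error(model, X)
        new_w = np.exp(-errors / (errors.mean() + 1e-12))   # high error -> low normality
        new_w /= new_w.sum()
        if np.abs(new_w - weights).sum() < tol:              # termination criterion
            weights = new_w
            break
        weights = new_w
        idx = rng.choice(len(X), size=len(X), p=weights)     # weight by resampling
        model = PCA(n_components=n_components).fit(X[idx])
    return model, weights

# Toy usage: 2% of the "training" points are contaminating outliers.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(490, 5)), rng.normal(loc=8.0, size=(10, 5))])
_, w = iterative_robust_fit(X)
print(w[-10:].mean() / w[:490].mean())   # contaminated points get lower weight
```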

Distributionally Time-Varying Online Stochastic Optimization under Polyak-Łojasiewicz Condition with Application in Conditional Value-at-Risk Statistical Learning

  • paper_url: http://arxiv.org/abs/2309.09411
  • repo_url: None
  • paper_authors: Yuen-Man Pun, Farhad Farokhi, Iman Shames
  • for: This paper studies a sequence of stochastic optimization problems whose data distribution varies over time, viewed through the lens of online optimization.
  • methods: It analyzes online stochastic gradient descent and online stochastic proximal gradient descent under the Polyak-Łojasiewicz condition and establishes dynamic regret bounds for both.
  • results: The dynamic regret bounds decompose into cumulative distribution drifts and cumulative stochastic gradient biases, and the framework is applied to the Conditional Value-at-Risk (CVaR) learning problem.
    Abstract In this work, we consider a sequence of stochastic optimization problems following a time-varying distribution via the lens of online optimization. Assuming that the loss function satisfies the Polyak-Łojasiewicz condition, we apply online stochastic gradient descent and establish its dynamic regret bound, which is composed of cumulative distribution drifts and cumulative gradient biases caused by stochasticity. The distribution metric we adopt here is the Wasserstein distance, which is well defined without the absolute continuity assumption or with a time-varying support set. We also establish a regret bound for online stochastic proximal gradient descent when the objective function is regularized. Moreover, we show that the above framework can be applied to the Conditional Value-at-Risk (CVaR) learning problem. In particular, we improve an existing proof that the CVaR problem satisfies the PL condition, which yields a regret bound for online stochastic gradient descent.
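For the CVaR application, the sketch below runs plain online stochastic gradient descent on the Rockafellar-Uryasev formulation of CVaR, min over (theta, t) of t + (1 - alpha)^{-1} E[(loss(theta; x, y) - t)_+], with a squared-error loss. The constant step size, the loss, and the drifting toy stream are assumptions for illustration; the sketch does not reproduce the paper's dynamic-regret analysis.

```python
import numpy as np

def online_cvar_sgd(stream, dim, alpha=0.95, lr=0.05):
    """Online SGD on the Rockafellar-Uryasev CVaR objective
        t + (1 - alpha)^{-1} * E[ max(loss(theta; x, y) - t, 0) ],
    updating (theta, t) from one sample at a time."""
    theta = np.zeros(dim)
    t = 0.0                                   # auxiliary VaR-level variable
    for x, y in stream:
        resid = x @ theta - y
        loss = 0.5 * resid ** 2
        exceed = 1.0 if loss > t else 0.0     # subgradient of the hinge term
        theta = theta - lr * (exceed / (1.0 - alpha)) * resid * x
        t = t - lr * (1.0 - exceed / (1.0 - alpha))
    return theta, t

# Toy usage: a slowly drifting linear-regression stream.
rng = np.random.default_rng(0)
def drifting_stream(n=5000, dim=3):
    theta_star = np.ones(dim)
    for _ in range(n):
        theta_star += 0.0005 * rng.normal(size=dim)   # distribution drift
        x = rng.normal(size=dim)
        yield x, x @ theta_star + 0.1 * rng.normal()

print(online_cvar_sgd(drifting_stream(), dim=3))
```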

Guided Online Distillation: Promoting Safe Reinforcement Learning by Offline Demonstration

  • paper_url: http://arxiv.org/abs/2309.09408
  • repo_url: None
  • paper_authors: Jinning Li, Xinyi Liu, Banghua Zhu, Jiantao Jiao, Masayoshi Tomizuka, Chen Tang, Wei Zhan
  • for: To obtain reinforcement learning policies that are both safe and efficient, achieving high rewards while respecting cost constraints.
  • methods: A large-capacity model (e.g., a decision transformer, DT) learns an expert policy from offline demonstration data, which then guides online safe RL training while being distilled into a lightweight policy.
  • results: The GOLD framework successfully distills the offline DT policy into a lightweight one that performs well in both safety and reward at execution time, solving decision-making problems in a range of safety-critical scenarios.
    Abstract Safe Reinforcement Learning (RL) aims to find a policy that achieves high rewards while satisfying cost constraints. When learning from scratch, safe RL agents tend to be overly conservative, which impedes exploration and restrains the overall performance. In many realistic tasks, e.g. autonomous driving, large-scale expert demonstration data are available. We argue that extracting an expert policy from offline data to guide online exploration is a promising way to mitigate this conservativeness issue. Large-capacity models, e.g. decision transformers (DT), have been proven competent in offline policy learning. However, data collected in real-world scenarios rarely contain dangerous cases (e.g., collisions), which makes it difficult for the policies to learn safety concepts. Moreover, these bulky policy networks cannot meet the computation speed requirements at inference time on real-world tasks such as autonomous driving. To this end, we propose Guided Online Distillation (GOLD), an offline-to-online safe RL framework. GOLD distills an offline DT policy into a lightweight policy network through guided online safe RL training, and it outperforms both the offline DT policy and online safe RL algorithms. Experiments on both benchmark safe RL tasks and real-world driving tasks based on the Waymo Open Motion Dataset (WOMD) demonstrate that GOLD can successfully distill lightweight policies and solve decision-making problems in challenging safety-critical scenarios.
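As a pointer to what distilling a bulky offline policy into a lightweight network can look like mechanically, here is a minimal PyTorch sketch of the distillation step alone: a small MLP student is regressed onto actions produced by a (not shown) large offline teacher. The network sizes, the MSE distillation loss, and the omission of the online safe-RL term are simplifying assumptions; this is not the GOLD algorithm.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LightweightPolicy(nn.Module):
    """Small MLP student intended to replace a bulky offline teacher at inference time."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),
        )

    def forward(self, obs):
        return self.net(obs)

def distillation_step(student, obs, teacher_actions, optimizer):
    """One distillation update: regress the student's actions onto the teacher's.
    In a guided online setup this loss would be combined with a cost-constrained
    online RL objective, which is omitted here."""
    loss = F.mse_loss(student(obs), teacher_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with random tensors standing in for observations and for actions
# produced by an offline-trained teacher policy.
student = LightweightPolicy(obs_dim=10, act_dim=2)
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
obs = torch.randn(32, 10)
teacher_actions = torch.tanh(torch.randn(32, 2))
print(distillation_step(student, obs, teacher_actions, opt))
```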