For: The paper proposes an algorithm for clustering a dataset into optimal superclusters based on the BIC criterion and statistical separability.
* Methods: The algorithm consists of three stages: representing the dataset as a mixture of Gaussian distributions, estimating inter-cluster distances and cluster sizes using the Mahalanobis distance, and combining clusters into superclusters using the DBSCAN method with a statistical significance level.
* Results: The algorithm demonstrates good results on test datasets in both noisy and noiseless situations, and can predict the correct supercluster for new data using an already trained clusterer. However, the algorithm is slow and stochastic, and requires a sufficiently large dataset for clustering.
Abstract
The paper presents an algorithm for clustering a dataset by grouping the number of Gaussian clusters that is optimal with respect to the BIC criterion into superclusters that are optimal with respect to their statistical separability. The algorithm consists of three stages: representing the dataset as a mixture of Gaussian distributions (clusters), whose number is determined by the minimum of the BIC criterion; using the Mahalanobis distance to estimate the distances between the clusters and the cluster sizes; and combining the resulting clusters into superclusters using the DBSCAN method by finding its hyperparameter (maximum distance) that provides the maximum value of an introduced matrix quality criterion at the maximum number of superclusters. The matrix quality criterion corresponds to the proportion of statistically significantly separated superclusters among all found superclusters. The algorithm has only one hyperparameter, the statistical significance level, and automatically detects the optimal number and shape of superclusters based on a statistical hypothesis testing approach. The algorithm demonstrates good results on test datasets in noisy and noiseless situations. An essential advantage of the algorithm is its ability to predict the correct supercluster for new data using an already trained clusterer and to perform soft (fuzzy) clustering. The disadvantages of the algorithm are its low speed and the stochastic nature of the final clustering. It requires a sufficiently large dataset for clustering, which is typical of many statistical methods.
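The three-stage pipeline above can be sketched with off-the-shelf tools. The following is a minimal illustration using scikit-learn, not the authors' implementation: the symmetrised between-cluster Mahalanobis distance, the caller-chosen `eps`, and `min_samples=1` are assumptions of this sketch, and the paper's automatic selection of the DBSCAN threshold via its matrix quality criterion and significance level is not reproduced.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.cluster import DBSCAN

def fit_superclusters(X, max_components=20, eps=2.0):
    # Stage 1: choose the number of Gaussian clusters by the minimum of BIC.
    gmms = [GaussianMixture(n_components=k, covariance_type="full").fit(X)
            for k in range(1, max_components + 1)]
    gmm = min(gmms, key=lambda m: m.bic(X))

    # Stage 2: pairwise Mahalanobis-style distances between cluster centres,
    # symmetrised here by averaging the two clusters' precision matrices.
    k = gmm.n_components
    D = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            diff = gmm.means_[i] - gmm.means_[j]
            prec = 0.5 * (gmm.precisions_[i] + gmm.precisions_[j])
            D[i, j] = np.sqrt(diff @ prec @ diff)

    # Stage 3: group clusters into superclusters with DBSCAN on the precomputed
    # distance matrix; the paper instead picks eps automatically by maximising
    # its matrix quality criterion at the maximum number of superclusters.
    labels = DBSCAN(eps=eps, min_samples=1, metric="precomputed").fit_predict(D)
    return gmm, labels  # supercluster label for each Gaussian cluster
```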
Generative AI-aided Joint Training-free Secure Semantic Communications via Multi-modal Prompts
results: The system achieves accurate content decoding and secure transmission of the source messages while improving the efficiency of network resource usage.
Abstract
Semantic communication (SemCom) holds promise for reducing network resource consumption while achieving the communications goal. However, the computational overheads of jointly training semantic encoders and decoders, and of subsequently deploying them in network devices, are overlooked. Recent advances in generative artificial intelligence (GAI) offer a potential solution. The robust learning abilities of GAI models indicate that semantic decoders can reconstruct source messages using a limited amount of semantic information, e.g., prompts, without joint training with the semantic encoder. A notable challenge, however, is the instability introduced by GAI's diverse generation ability. This instability, evident in outputs like text-generated images, limits the direct application of GAI in scenarios demanding accurate message recovery, such as face image transmission. To solve the above problems, this paper proposes a GAI-aided SemCom system with multi-model prompts for accurate content decoding. Moreover, in response to security concerns, we introduce the application of covert communications aided by a friendly jammer. The system jointly optimizes the diffusion step, jamming, and transmitting power with the aid of generative diffusion models, enabling successful and secure transmission of the source messages.
Generative Algorithms for Fusion of Physics-Based Wildfire Spread Models with Satellite Data for Initializing Wildfire Forecasts
paper_authors: Bryan Shaddy, Deep Ray, Angel Farguell, Valentina Calaza, Jan Mandel, James Haley, Kyle Hilburn, Derek V. Mallia, Adam Kochanski, Assad Oberai
for: This study aims to use satellite data to initialize high-resolution wildfire behavior models for forecasting fire spread.
methods: The study uses a conditional Wasserstein Generative Adversarial Network (cWGAN) to infer the fire arrival time from satellite active fire data.
results: The cWGAN predicts fire arrival times accurately, with an average Sorensen's coefficient of 0.81 for the fire perimeters and an average ignition time error of 32 minutes.
Abstract
Increases in wildfire activity and the resulting impacts have prompted the development of high-resolution wildfire behavior models for forecasting fire spread. Recent progress in using satellites to detect fire locations further provides the opportunity to use measurements to improve fire spread forecasts from numerical models through data assimilation. This work develops a method for inferring the history of a wildfire from satellite measurements, providing the necessary information to initialize coupled atmosphere-wildfire models from a measured wildfire state in a physics-informed approach. The fire arrival time, which is the time the fire reaches a given spatial location, acts as a succinct representation of the history of a wildfire. In this work, a conditional Wasserstein Generative Adversarial Network (cWGAN), trained with WRF-SFIRE simulations, is used to infer the fire arrival time from satellite active fire data. The cWGAN is used to produce samples of likely fire arrival times from the conditional distribution of arrival times given satellite active fire detections. Samples produced by the cWGAN are further used to assess the uncertainty of predictions. The cWGAN is tested on four California wildfires occurring between 2020 and 2022, and predictions for fire extent are compared against high resolution airborne infrared measurements. Further, the predicted ignition times are compared with reported ignition times. An average Sorensen's coefficient of 0.81 for the fire perimeters and an average ignition time error of 32 minutes suggest that the method is highly accurate.
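For readers unfamiliar with how a conditional generator yields uncertainty estimates, the sketch below shows the sampling step in PyTorch. The generator interface (latent size, conditioning on a gridded satellite detection raster) is a hypothetical stand-in, not the authors' WRF-SFIRE-trained cWGAN.

```python
import torch

@torch.no_grad()
def arrival_time_samples(generator, detections, n_samples=100, latent_dim=64):
    # detections: conditioning tensor, e.g. a (C, H, W) satellite active-fire raster.
    c = detections.unsqueeze(0).expand(n_samples, *detections.shape)
    z = torch.randn(n_samples, latent_dim)
    samples = generator(z, c)                        # (n_samples, H, W) arrival times
    return samples.mean(dim=0), samples.std(dim=0)   # point estimate and uncertainty map
```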
T-SaS: Toward Shift-aware Dynamic Adaptation for Streaming Data
results: Experimental results show that the method is superior both in accurately detecting distribution-shift boundaries in the data stream and in effectively adapting to downstream forecasting or classification tasks.
Abstract
In many real-world scenarios, distribution shifts exist in the streaming data across time steps. Many complex sequential data can be effectively divided into distinct regimes that exhibit persistent dynamics. Discovering the shifted behaviors and the evolving patterns underlying the streaming data is important for understanding the dynamic system. Existing methods typically train one robust model to work for the evolving data of distinct distributions or sequentially adapt the model utilizing explicitly given regime boundaries. However, there are two challenges: (1) shifts in data streams can happen drastically and abruptly without precursors, and boundaries of distribution shifts are usually unavailable; and (2) training a shared model for all domains can fail to capture varying patterns. This paper aims to solve the problem of sequential data modeling in the presence of sudden distribution shifts that occur without any precursors. Specifically, we design a Bayesian framework, dubbed T-SaS, with a discrete distribution-modeling variable to capture abrupt shifts of data. Then, we design a model that enables adaptation with dynamic network selection conditioned on that discrete variable. The proposed method learns specific model parameters for each distribution by learning which neurons should be activated in the full network. A dynamic masking strategy is adopted to support inter-distribution transfer through the overlapping of a set of sparse networks. Extensive experiments show that the proposed method is superior both in accurately detecting shift boundaries to obtain segments of varying distributions and in effectively adapting to downstream forecasting or classification tasks.
Distributed Variational Inference for Online Supervised Learning
results: The key contribution of the paper is the derivation of a distributed lower bound on the centralized estimation objective, which enables distributed variational inference in a sensor network with only one-hop communication. In addition, the paper designs an online distributed algorithm for classification and regression on streaming data, specialized to Gaussian variational densities with non-linear likelihoods. Finally, a diagonalized version is derived for online distributed inference in high-dimensional models and applied to multi-robot probabilistic mapping using indoor LiDAR data.
Abstract
Developing efficient solutions for inference problems in intelligent sensor networks is crucial for the next generation of location, tracking, and mapping services. This paper develops a scalable distributed probabilistic inference algorithm that applies to continuous variables, intractable posteriors and large-scale real-time data in sensor networks. In a centralized setting, variational inference is a fundamental technique for performing approximate Bayesian estimation, in which an intractable posterior density is approximated with a parametric density. Our key contribution lies in the derivation of a separable lower bound on the centralized estimation objective, which enables distributed variational inference with one-hop communication in a sensor network. Our distributed evidence lower bound (DELBO) consists of a weighted sum of observation likelihood and divergence to prior densities, and its gap to the measurement evidence is due to consensus and modeling errors. To solve binary classification and regression problems while handling streaming data, we design an online distributed algorithm that maximizes DELBO, and specialize it to Gaussian variational densities with non-linear likelihoods. The resulting distributed Gaussian variational inference (DGVI) efficiently inverts a $1$-rank correction to the covariance matrix. Finally, we derive a diagonalized version for online distributed inference in high-dimensional models, and apply it to multi-robot probabilistic mapping using indoor LiDAR data.
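The "$1$-rank correction to the covariance matrix" mentioned above can be inverted cheaply with the Sherman-Morrison identity. The snippet below illustrates that linear-algebra step in isolation (it is not the DGVI update itself), assuming the correction has the form $\Sigma + uu^\top$.

```python
import numpy as np

def rank1_precision_update(P, u):
    # Return (Sigma + u u^T)^{-1} given P = Sigma^{-1}, in O(d^2) instead of O(d^3).
    Pu = P @ u
    return P - np.outer(Pu, Pu) / (1.0 + u @ Pu)

# Quick check against a direct inverse.
d = 5
A = np.random.randn(d, d)
Sigma = A @ A.T + np.eye(d)
u = np.random.randn(d)
P = np.linalg.inv(Sigma)
assert np.allclose(rank1_precision_update(P, u), np.linalg.inv(Sigma + np.outer(u, u)))
```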
Screening of Pneumonia and Urinary Tract Infection at Triage using TriNet
paper_authors: Stephen Z. Lu
for: This paper addresses emergency department overcrowding and declining efficiency in healthcare institutions.
methods: The paper uses machine learning algorithms to automate medical directives at emergency department triage, in order to improve efficiency and quality of care.
results: The TriNet models achieve high positive predictive values in detecting pneumonia (0.86) and urinary tract infection (0.93), outperforming current clinical benchmarks and indicating that machine-learning medical directives can offer cost-free, non-invasive screening that reduces over-testing risk and increases emergency department efficiency.
Abstract
Due to the steady rise in population demographics and longevity, emergency department visits are increasing across North America. As more patients visit the emergency department, traditional clinical workflows become overloaded and inefficient, leading to prolonged wait-times and reduced healthcare quality. One of such workflows is the triage medical directive, impeded by limited human workload, inaccurate diagnoses and invasive over-testing. To address this issue, we propose TriNet: a machine learning model for medical directives that automates first-line screening at triage for conditions requiring downstream testing for diagnosis confirmation. To verify screening potential, TriNet was trained on hospital triage data and achieved high positive predictive values in detecting pneumonia (0.86) and urinary tract infection (0.93). These models outperform current clinical benchmarks, indicating that machine-learning medical directives can offer cost-free, non-invasive screening with high specificity for common conditions, reducing the risk of over-testing while increasing emergency department efficiency.
Causal Structure Recovery of Linear Dynamical Systems: An FFT based Approach
results: We find that, for LTI systems, do-calculus machinery can be realized in the frequency domain (FD) for causal inference, and that graph reconstruction can be achieved using multivariate Wiener projections with $O(n)$ complexity.
Abstract
Learning causal effects from data is a fundamental and well-studied problem across science, especially when the cause-effect relationship is static in nature. However, causal effect is less explored when there are dynamical dependencies, i.e., when dependencies exist between entities across time. Identifying dynamic causal effects from time-series observations is computationally expensive when compared to the static scenario. We demonstrate that the computational complexity of recovering the causation structure for the vector auto-regressive (VAR) model is $O(Tn^3N^2)$, where $n$ is the number of nodes, $T$ is the number of samples, and $N$ is the largest time-lag in the dependency between entities. We report a method, with a reduced complexity of $O(Tn^3 \log N)$, to recover the causation structure to obtain frequency-domain (FD) representations of time-series. Since FFT accumulates all the time dependencies on every frequency, causal inference can be performed efficiently by considering the state variables as random variables at any given frequency. We additionally show that, for systems with interactions that are LTI, do-calculus machinery can be realized in the FD resulting in versions of the classical single-door (with cycles), front and backdoor criteria. We demonstrate, for a large class of problems, graph reconstruction using multivariate Wiener projections results in a significant computational advantage with $O(n)$ complexity over reconstruction algorithms such as the PC algorithm which has $O(n^q)$ complexity, where $q$ is the maximum neighborhood size. This advantage accrues due to some remarkable properties of the phase response of the frequency-dependent Wiener coefficients which is not present in any time-domain approach.
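As a rough illustration of the frequency-domain machinery above, the sketch below estimates per-frequency multivariate Wiener coefficients for projecting one node's series onto the others from cross-spectral density estimates (via Welch-style averaging in scipy). Windowing choices and conjugation conventions are simplified, and the paper's phase-based graph reconstruction on top of these coefficients is not shown.

```python
import numpy as np
from scipy.signal import csd

def wiener_coefficients(X, target, fs=1.0, nperseg=256):
    # X: (n_nodes, T) time series. Returns (f, W) where W[f, a] is the coefficient
    # of the a-th non-target node in the Wiener projection of node `target`.
    n, _ = X.shape
    others = [j for j in range(n) if j != target]
    f, _ = csd(X[0], X[0], fs=fs, nperseg=nperseg)          # frequency grid
    Sxx = np.zeros((len(f), len(others), len(others)), dtype=complex)
    Sxy = np.zeros((len(f), len(others)), dtype=complex)
    for a, j in enumerate(others):
        _, Sxy[:, a] = csd(X[j], X[target], fs=fs, nperseg=nperseg)
        for b, k in enumerate(others):
            _, Sxx[:, a, b] = csd(X[j], X[k], fs=fs, nperseg=nperseg)
    W = np.linalg.solve(Sxx, Sxy[..., None])[..., 0]        # batched per-frequency solve
    return f, W
```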
For: This paper is about clustering with Partitioning Around Medoids (PAM, k-medoids) and the FastPAM method.
* Methods: The paper builds on PAM and FastPAM and proposes a sparse and asymmetric variant of the problem for graph data, determining the number of medoids k as part of the optimization.
* Results: The results show that the method provides high-quality clustering solutions in practical applications while avoiding quadratic runtime and memory requirements.
Abstract
Partitioning Around Medoids (PAM, k-Medoids) is a popular clustering technique to use with arbitrary distance functions or similarities, where each cluster is represented by its most central object, called the medoid or the discrete median. In operations research, this family of problems is also known as facility location problem (FLP). FastPAM recently introduced a speedup for large k to make it applicable for larger problems, but the method still has a runtime quadratic in N. In this chapter, we discuss a sparse and asymmetric variant of this problem, to be used for example on graph data such as road networks. By exploiting sparsity, we can avoid the quadratic runtime and memory requirements, and make this method scalable to even larger problems, as long as we are able to build a small enough graph of sufficient connectivity to perform local optimization. Furthermore, we consider asymmetric cases, where the set of medoids is not identical to the set of points to be covered (or in the interpretation of facility location, where the possible facility locations are not identical to the consumer locations). Because of sparsity, it may be impossible to cover all points with just k medoids for too small k, which would render the problem unsolvable, and this breaks common heuristics for finding a good starting condition. We, hence, consider determining k as a part of the optimization problem and propose to first construct a greedy initial solution with a larger k, then to optimize the problem by alternating between PAM-style "swap" operations where the result is improved by replacing medoids with better alternatives and "remove" operations to reduce the number of k until neither allows further improving the result quality. We demonstrate the usefulness of this method on a problem from electrical engineering, with the input graph derived from cartographic data.
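The alternating "swap"/"remove" optimisation described above can be written schematically as below; the cost function, candidate set, and greedy initialisation are placeholders, and the chapter's sparse-graph machinery is omitted.

```python
def optimise_medoids(cost, candidates, medoids):
    # cost(medoids: set) -> float, lower is better; candidates: collection of ids;
    # medoids: initial (deliberately large) greedy solution as a set.
    best = cost(medoids)
    improved = True
    while improved:
        improved = False
        # "swap": replace a medoid with a better non-medoid candidate.
        for m in list(medoids):
            for c in candidates:
                if c in medoids:
                    continue
                trial = (medoids - {m}) | {c}
                trial_cost = cost(trial)
                if trial_cost < best:
                    medoids, best, improved = trial, trial_cost, True
        # "remove": drop a medoid outright, reducing k, if that improves the criterion.
        for m in list(medoids):
            trial = medoids - {m}
            if trial:
                trial_cost = cost(trial)
                if trial_cost < best:
                    medoids, best, improved = trial, trial_cost, True
    return medoids, best
```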
results: Data aggregation with the BETULA algorithm makes HAC viable on resource-constrained systems, with only small losses in clustering quality.
Abstract
Hierarchical Agglomerative Clustering (HAC) is likely the earliest and most flexible clustering method, because it can be used with many distances, similarities, and various linkage strategies. It is often used when the number of clusters the data set forms is unknown and some sort of hierarchy in the data is plausible. Most algorithms for HAC operate on a full distance matrix, and therefore require quadratic memory. The standard algorithm also has cubic runtime to produce a full hierarchy. Both memory and runtime are especially problematic in the context of embedded or otherwise very resource-constrained systems. In this section, we present how data aggregation with BETULA, a numerically stable version of the well known BIRCH data aggregation algorithm, can be used to make HAC viable on systems with constrained resources with only small losses on clustering quality, and hence allow exploratory data analysis of very large data sets.
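A minimal version of the aggregate-then-cluster idea can be put together with scikit-learn, using its BIRCH implementation as a stand-in for BETULA (the numerically stable variant discussed here); thresholds and cluster counts are illustrative, and the per-subcluster weights that a full implementation would carry into the linkage are ignored.

```python
import numpy as np
from sklearn.cluster import Birch, AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=100_000, centers=20, n_features=8, random_state=0)

# Step 1: aggregate the raw points into a much smaller set of cluster-feature summaries.
birch = Birch(threshold=1.5, n_clusters=None).fit(X)
centers = birch.subcluster_centers_            # far fewer rows than X

# Step 2: run HAC only on the aggregated summaries, so the quadratic memory and
# cubic runtime apply to the small summary set rather than the full data set.
hac = AgglomerativeClustering(n_clusters=20, linkage="ward").fit(centers)

# Map each original point to a final cluster via its nearest subcluster.
labels = hac.labels_[birch.predict(X)]
```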
A Generalized Bandsplit Neural Network for Cinematic Audio Source Separation
paper_authors: Karn N. Watcharasupat, Chih-Wei Wu, Yiwei Ding, Iroro Orife, Aaron J. Hipple, Phillip A. Williams, Scott Kramer, Alexander Lerch, William Wolcott
results: On the Divide and Remaster dataset, the model achieves separation performance above the ideal ratio mask for the dialogue stem.
Abstract
Cinematic audio source separation is a relatively new subtask of audio source separation, with the aim of extracting the dialogue stem, the music stem, and the effects stem from their mixture. In this work, we developed a model generalizing the Bandsplit RNN for any complete or overcomplete partitions of the frequency axis. Psycho-acoustically motivated frequency scales were used to inform the band definitions which are now defined with redundancy for more reliable feature extraction. A loss function motivated by the signal-to-noise ratio and the sparsity-promoting property of the 1-norm was proposed. We additionally exploit the information-sharing property of a common-encoder setup to reduce computational complexity during both training and inference, improve separation performance for hard-to-generalize classes of sounds, and allow flexibility during inference time with easily detachable decoders. Our best model sets the state of the art on the Divide and Remaster dataset with performance above the ideal ratio mask for the dialogue stem.
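Below is a hedged sketch of an SNR-style loss with 1-norm terms, in the spirit of the loss described above; the paper's exact formulation (which combines several signal representations) is not reproduced, and `eps` is an illustrative stabiliser.

```python
import torch

def l1_snr_loss(estimate: torch.Tensor, target: torch.Tensor, eps: float = 1e-8):
    # Signal-to-noise ratio in dB, measured with the sparsity-promoting 1-norm
    # instead of the usual squared error; averaged over the batch and negated
    # so that minimising the loss maximises the SNR.
    signal = torch.sum(torch.abs(target), dim=-1)
    noise = torch.sum(torch.abs(target - estimate), dim=-1)
    snr_db = 10.0 * torch.log10((signal + eps) / (noise + eps))
    return -snr_db.mean()
```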
results: The method extends naturally to diffusion on the unit cube, which has applications for bounded image generation.
Abstract
Diffusion models learn to reverse the progressive noising of a data distribution to create a generative model. However, the desired continuous nature of the noising process can be at odds with discrete data. To deal with this tension between continuous and discrete objects, we propose a method of performing diffusion on the probability simplex. Using the probability simplex naturally creates an interpretation where points correspond to categorical probability distributions. Our method uses the softmax function applied to an Ornstein-Uhlenbeck process, a well-known stochastic differential equation. We find that our methodology also naturally extends to include diffusion on the unit cube, which has applications for bounded image generation.
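A small numerical sketch of the mechanism described above: simulate an Ornstein-Uhlenbeck process in $\mathbb{R}^K$ with Euler-Maruyama steps and map every state through the softmax so that it lies on the probability simplex. Step size, mean-reversion rate, and noise scale are illustrative, and this is not the paper's training procedure.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

def ou_on_simplex(x0, n_steps=1000, dt=1e-2, theta=1.0, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    path = []
    for _ in range(n_steps):
        # Euler-Maruyama step of dX = -theta * X dt + sigma dW (mean-reverting to 0).
        x = x - theta * x * dt + sigma * np.sqrt(dt) * rng.standard_normal(x.shape)
        path.append(softmax(x))            # every state is a categorical distribution
    return np.stack(path)

probs = ou_on_simplex(np.log([0.7, 0.2, 0.1]))   # starts near the given distribution
assert np.allclose(probs.sum(axis=-1), 1.0)
```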
Adaptive Adversarial Training Does Not Increase Recourse Costs
paper_authors: Ian Hardy, Jayanth Yetukuri, Yang Liu
for: This study investigates the effects of adaptive adversarial training on algorithmic recourse costs.
methods: The study applies adaptive adversarial training to examine model robustness and algorithmic recourse costs.
results: The results show that adaptive adversarial training improves model robustness, but these improvements have little effect on algorithmic recourse costs.
Abstract
Recent work has connected adversarial attack methods and algorithmic recourse methods: both seek minimal changes to an input instance which alter a model's classification decision. It has been shown that traditional adversarial training, which seeks to minimize a classifier's susceptibility to malicious perturbations, increases the cost of generated recourse; with larger adversarial training radii correlating with higher recourse costs. From the perspective of algorithmic recourse, however, the appropriate adversarial training radius has always been unknown. Another recent line of work has motivated adversarial training with adaptive training radii to address the issue of instance-wise variable adversarial vulnerability, showing success in domains with unknown attack radii. This work studies the effects of adaptive adversarial training on algorithmic recourse costs. We establish that the improvements in model robustness induced by adaptive adversarial training show little effect on algorithmic recourse costs, providing a potential avenue for affordable robustness in domains where recoursability is critical.
Comparative Analysis of CPU and GPU Profiling for Deep Learning Models
for: This paper studies deep learning and machine learning applications, focusing on how GPUs and CPUs allocate and consume resources when training deep neural networks.
methods: The paper implements deep learning workloads in the PyTorch framework and analyzes operation traces on the GPU and CPU to understand resource allocation and consumption during training.
results: The study shows that the GPU has a lower running time than the CPU when training deep neural networks, but for simpler networks there is no large difference between GPU and CPU.
Abstract
Deep Learning (DL) and Machine Learning (ML) applications are rapidly increasing in recent days. Massive amounts of data are being generated over the internet, from which meaningful results can be derived by the use of ML and DL algorithms. Hardware resources and open-source libraries have made it easy to implement these algorithms. TensorFlow and PyTorch are two of the leading frameworks for implementing ML projects. By using those frameworks, we can trace the operations executed on both GPU and CPU to analyze the resource allocations and consumption. This paper presents the time and memory allocation of CPU and GPU while training deep neural networks using PyTorch. The analysis shows that the GPU has a lower running time than the CPU for deep neural networks. For a simpler network, there are not many significant improvements of GPU over the CPU.
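The kind of CPU/GPU operator tracing the paper relies on can be reproduced with the PyTorch profiler; the model and input sizes below are illustrative placeholders.

```python
import torch
from torch.profiler import profile, ProfilerActivity

model = torch.nn.Sequential(torch.nn.Linear(1024, 1024), torch.nn.ReLU(),
                            torch.nn.Linear(1024, 10))
x = torch.randn(256, 1024)

activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    model, x = model.cuda(), x.cuda()
    activities.append(ProfilerActivity.CUDA)

with profile(activities=activities, profile_memory=True) as prof:
    for _ in range(10):
        loss = model(x).sum()
        loss.backward()

# Per-operator time and memory, aggregated over the ten iterations.
print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10))
```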
paper_authors: Jayanth Yetukuri, Ian Hardy, Yang Liu
for: This paper aims to provide actionable recourse to negatively impacted users in machine learning models, with a focus on capturing user preferences via soft constraints.
methods: The paper proposes using three simple forms of soft constraints to capture user preferences: scoring continuous features, bounding feature values, and ranking categorical features. Additionally, the paper proposes a gradient-based approach to identify User Preferred Actionable Recourse (UP-AR).
results: The paper conducts extensive experiments to verify the effectiveness of the proposed approach.
Abstract
Machine Learning's proliferation in critical fields such as healthcare, banking, and criminal justice has motivated the creation of tools which ensure trust and transparency in ML models. One such tool is Actionable Recourse (AR) for negatively impacted users. AR describes recommendations of cost-efficient changes to a user's actionable features to help them obtain favorable outcomes. Existing approaches for providing recourse optimize for properties such as proximity, sparsity, validity, and distance-based costs. However, an often-overlooked but crucial requirement for actionability is a consideration of User Preference to guide the recourse generation process. In this work, we attempt to capture user preferences via soft constraints in three simple forms: i) scoring continuous features, ii) bounding feature values and iii) ranking categorical features. Finally, we propose a gradient-based approach to identify User Preferred Actionable Recourse (UP-AR). We carried out extensive experiments to verify the effectiveness of our approach.
Enhancing Deep Learning Models through Tensorization: A Comprehensive Survey and Framework
results: The results show that using multidimensional data in its native form together with multiway analysis methods captures intricate relationships in the data better, while reducing the number of model parameters and accelerating processing.
Abstract
The burgeoning growth of public domain data and the increasing complexity of deep learning model architectures have underscored the need for more efficient data representation and analysis techniques. This paper is motivated by the work of Helal (2023) and aims to present a comprehensive overview of tensorization. This transformative approach bridges the gap between the inherently multidimensional nature of data and the simplified 2-dimensional matrices commonly used in linear algebra-based machine learning algorithms. This paper explores the steps involved in tensorization, multidimensional data sources, various multiway analysis methods employed, and the benefits of these approaches. A small example of Blind Source Separation (BSS) is presented comparing 2-dimensional algorithms and a multiway algorithm in Python. Results indicate that multiway analysis is more expressive. Contrary to the intuition of the dimensionality curse, utilising multidimensional datasets in their native form and applying multiway analysis methods grounded in multilinear algebra reveal a profound capacity to capture intricate interrelationships among various dimensions while, surprisingly, reducing the number of model parameters and accelerating processing. A survey of multiway analysis methods and their integration with various deep neural network models is presented using case studies in different domains.
Monotone Tree-Based GAMI Models by Adapting XGBoost
paper_authors: Linwei Hu, Soroush Aramideh, Jie Chen, Vijayan N. Nair
for: This paper aims to develop a monotone tree-based functional ANOVA model, called monotone GAMI-Tree, to address the issue of non-monotonicity in existing GAMI models.
methods: The proposed method uses a filtering technique to select important interactions, followed by fitting a monotone XGBoost algorithm with the selected interactions. The results are then parsed and purified to obtain a monotone GAMI model.
results: The proposed method is demonstrated on simulated datasets and shows better performance than existing GAMI models in terms of monotonicity and accuracy. The results also show that the main effects can be monotone, but the interactions may not be monotone.
Abstract
Recent papers have used machine learning architecture to fit low-order functional ANOVA models with main effects and second-order interactions. These GAMI (GAM + Interaction) models are directly interpretable as the functional main effects and interactions can be easily plotted and visualized. Unfortunately, it is not easy to incorporate the monotonicity requirement into the existing GAMI models based on boosted trees, such as EBM (Lou et al. 2013) and GAMI-Lin-T (Hu et al. 2022). This paper considers models of the form $f(x)=\sum_{j,k}f_{j,k}(x_j, x_k)$ and develops monotone tree-based GAMI models, called monotone GAMI-Tree, by adapting the XGBoost algorithm. It is straightforward to fit a monotone model to $f(x)$ using the options in XGBoost. However, the fitted model is still a black box. We take a different approach: i) use a filtering technique to determine the important interactions, ii) fit a monotone XGBoost algorithm with the selected interactions, and finally iii) parse and purify the results to get a monotone GAMI model. Simulated datasets are used to demonstrate the behaviors of mono-GAMI-Tree and EBM, both of which use piecewise constant fits. Note that the monotonicity requirement is for the full model. Under certain situations, the main effects will also be monotone. But, as seen in the examples, the interactions will not be monotone.
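The two XGBoost options the monotone GAMI-Tree approach builds on can be illustrated as follows: interaction constraints restrict each tree to an allowed feature group, and monotone constraints enforce the direction of the fitted effect. Feature directions, the allowed groups, and the simulated response are illustrative, and the paper's filtering and parse-and-purify steps are omitted.

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.uniform(size=(5000, 4))
y = 2 * X[:, 0] + np.sin(3 * X[:, 1]) + X[:, 0] * X[:, 2] + 0.1 * rng.normal(size=5000)

model = xgb.XGBRegressor(
    n_estimators=300,
    max_depth=2,                                   # depth-2 trees: main effects and pairwise terms
    monotone_constraints="(1,0,1,0)",              # +1: non-decreasing in x0 and x2
    interaction_constraints="[[0], [1], [0, 2]]",  # allowed groups: {x0}, {x1}, {x0, x2}
)
model.fit(X, y)
```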
On the Minimax Regret in Online Ranking with Top-k Feedback
For: Online ranking with top-$k$ feedback
* Methods: Partial monitoring techniques, minimax regret rates
* Results: Full characterization of minimax regret rates for all $k$ and for Pairwise Loss, Discounted Cumulative Gain, and Precision@n, together with an efficient algorithm that achieves the minimax regret rate for Precision@n
Abstract
In online ranking, a learning algorithm sequentially ranks a set of items and receives feedback on its ranking in the form of relevance scores. Since obtaining relevance scores typically involves human annotation, it is of great interest to consider a partial feedback setting where feedback is restricted to the top-$k$ items in the rankings. Chaudhuri and Tewari [2017] developed a framework to analyze online ranking algorithms with top $k$ feedback. A key element in their work was the use of techniques from partial monitoring. In this paper, we further investigate online ranking with top $k$ feedback and solve some open problems posed by Chaudhuri and Tewari [2017]. We provide a full characterization of minimax regret rates with the top $k$ feedback model for all $k$ and for the following ranking performance measures: Pairwise Loss, Discounted Cumulative Gain, and Precision@n. In addition, we give an efficient algorithm that achieves the minimax regret rate for Precision@n.
Maximum Mean Discrepancy Meets Neural Networks: The Radon-Kolmogorov-Smirnov Test
paper_authors: Seunghoon Paik, Michael Celentano, Alden Green, Ryan J. Tibshirani
For: The paper introduces a new test for comparing two distributions, called the Radon-Kolmogorov-Smirnov (RKS) test, which is based on the concept of Radon bounded variation (RBV) and has connections to deep learning.
* Methods: The RKS test uses the unit ball in the RBV space of a given smoothness order $k \geq 0$ as the function space $\mathcal{F}$ and maximizes the mean difference over samples from one distribution $P$ versus another $Q$. The test is closely related to neural networks and can be (approximately) optimized using modern deep learning toolkits.
* Results: The paper proves that the RKS test has asymptotically full power at distinguishing any distinct pair $P \not= Q$ of distributions, derives its asymptotic null distribution, and conducts extensive experiments comparing the strengths and weaknesses of the RKS test with the more traditional kernel MMD test.
Abstract
Maximum mean discrepancy (MMD) refers to a general class of nonparametric two-sample tests that are based on maximizing the mean difference over samples from one distribution $P$ versus another $Q$, over all choices of data transformations $f$ living in some function space $\mathcal{F}$. Inspired by recent work that connects what are known as functions of $\textit{Radon bounded variation}$ (RBV) and neural networks (Parhi and Nowak, 2021, 2023), we study the MMD defined by taking $\mathcal{F}$ to be the unit ball in the RBV space of a given smoothness order $k \geq 0$. This test, which we refer to as the $\textit{Radon-Kolmogorov-Smirnov}$ (RKS) test, can be viewed as a generalization of the well-known and classical Kolmogorov-Smirnov (KS) test to multiple dimensions and higher orders of smoothness. It is also intimately connected to neural networks: we prove that the witness in the RKS test -- the function $f$ achieving the maximum mean difference -- is always a ridge spline of degree $k$, i.e., a single neuron in a neural network. This allows us to leverage the power of modern deep learning toolkits to (approximately) optimize the criterion that underlies the RKS test. We prove that the RKS test has asymptotically full power at distinguishing any distinct pair $P \not= Q$ of distributions, derive its asymptotic null distribution, and carry out extensive experiments to elucidate the strengths and weaknesses of the RKS test versus the more traditional kernel MMD test.
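A toy sketch of the "witness is a single neuron" viewpoint for the first-order case: maximise the mean difference between the two samples over one ReLU ridge unit with a unit-norm direction. This is an illustrative approximation with a crude norm constraint, not the authors' implementation of the RKS statistic.

```python
import torch

def rks_statistic(P, Q, steps=500, lr=0.05):
    d = P.shape[1]
    w = torch.randn(d, requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    opt = torch.optim.Adam([w, b], lr=lr)
    for _ in range(steps):
        u = w / w.norm()                             # keep the ridge direction unit-norm
        f_P, f_Q = torch.relu(P @ u + b), torch.relu(Q @ u + b)
        loss = -(f_P.mean() - f_Q.mean())            # maximise the mean difference
        opt.zero_grad()
        loss.backward()
        opt.step()
    return -loss.item()

P = torch.randn(500, 2)
Q = torch.randn(500, 2) + torch.tensor([0.5, 0.0])
print(rks_statistic(P, Q))
```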
Computing SHAP Efficiently Using Model Structure Information
results: All three methods compute SHAP exactly and are computationally efficient. Compared with the sampling approach of Castor & Gomez (2008), the proposed methods are more efficient in almost all cases.
Abstract
SHAP (SHapley Additive exPlanations) has become a popular method to attribute the prediction of a machine learning model on an input to its features. One main challenge of SHAP is the computation time. An exact computation of Shapley values requires exponential time complexity. Therefore, many approximation methods are proposed in the literature. In this paper, we propose methods that can compute SHAP exactly in polynomial time or even faster for SHAP definitions that satisfy our additivity and dummy assumptions (e.g., kernel SHAP and baseline SHAP). We develop different strategies for models with different levels of model structure information: known functional decomposition, known order of model (defined as the highest order of interaction in the model), or unknown order. For the first case, we demonstrate an additive property and a way to compute SHAP from the lower-order functional components. For the second case, we derive formulas that can compute SHAP in polynomial time. Both methods yield exact SHAP results. Finally, if even the order of the model is unknown, we propose an iterative way to approximate Shapley values. The three methods we propose are computationally efficient when the order of the model is not high, which is typically the case in practice. We compare with the sampling approach proposed in Castor & Gomez (2008) using simulation studies to demonstrate the efficacy of our proposed methods.
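A small worked illustration of the additivity case exploited above: for a purely additive model $f(x)=\sum_j f_j(x_j)$ with a single-baseline value function, the SHAP value of feature $j$ reduces to $f_j(x_j) - f_j(b_j)$, with no enumeration of coalitions. The component functions and baseline below are illustrative.

```python
import numpy as np

f_parts = [lambda v: 2.0 * v, lambda v: np.sin(v), lambda v: v ** 2]
baseline = np.zeros(3)
x = np.array([1.0, 0.5, -2.0])

def f(z):
    # Additive model evaluated on a full input vector.
    return sum(fj(zj) for fj, zj in zip(f_parts, z))

# Each feature's marginal contribution is the same for every coalition, so the
# Shapley value is simply the per-component difference to the baseline.
phi = np.array([fj(xj) - fj(bj) for fj, xj, bj in zip(f_parts, x, baseline)])

# Local accuracy / efficiency check: the contributions sum to f(x) - f(baseline).
assert np.isclose(phi.sum(), f(x) - f(baseline))
```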
First and zeroth-order implementations of the regularized Newton method with lazy approximated Hessians
for: solving general non-convex optimization problems
methods: Cubically regularized Newton method with finite difference approximations of the derivatives, and an adaptive search procedure that simultaneously fits the regularization constant and the parameters of the finite difference approximations
results: global complexity bound of $\mathcal{O}( n^{1/2} \epsilon^{-3/2})$ function and gradient evaluations for the Hessian-free method, and a bound of $\mathcal{O}( n^{3/2} \epsilon^{-3/2} )$ function evaluations for the derivative-free method, which significantly improve the previously known ones in terms of the joint dependence on $n$ and $\epsilon$.
Abstract
In this work, we develop first-order (Hessian-free) and zero-order (derivative-free) implementations of the Cubically regularized Newton method for solving general non-convex optimization problems. For that, we employ finite difference approximations of the derivatives. We use a special adaptive search procedure in our algorithms, which simultaneously fits both the regularization constant and the parameters of the finite difference approximations. It makes our schemes free from the need to know the actual Lipschitz constants. Additionally, we equip our algorithms with the lazy Hessian update that reuse a previously computed Hessian approximation matrix for several iterations. Specifically, we prove the global complexity bound of $\mathcal{O}( n^{1/2} \epsilon^{-3/2})$ function and gradient evaluations for our new Hessian-free method, and a bound of $\mathcal{O}( n^{3/2} \epsilon^{-3/2} )$ function evaluations for the derivative-free method, where $n$ is the dimension of the problem and $\epsilon$ is the desired accuracy for the gradient norm. These complexity bounds significantly improve the previously known ones in terms of the joint dependence on $n$ and $\epsilon$, for the first-order and zeroth-order non-convex optimization.
Delta-LoRA: Fine-Tuning High-Rank Parameters with the Delta of Low-Rank Matrices
results: Compared with LoRA and other low-rank adaptation methods, Delta-LoRA performs better on downstream tasks while having comparable memory requirements and computational costs to LoRA.
Abstract
In this paper, we present Delta-LoRA, which is a novel parameter-efficient approach to fine-tune large language models (LLMs). In contrast to LoRA and other low-rank adaptation methods such as AdaLoRA, Delta-LoRA not only updates the low-rank matrices $\bA$ and $\bB$, but also propagate the learning to the pre-trained weights $\bW$ via updates utilizing the delta of the product of two low-rank matrices ($\bA^{(t+1)}\bB^{(t+1)} - \bA^{(t)}\bB^{(t)}$). Such a strategy effectively addresses the limitation that the incremental update of low-rank matrices is inadequate for learning representations capable for downstream tasks. Moreover, as the update of $\bW$ does not need to compute the gradients of $\bW$ and store their momentums, Delta-LoRA shares comparable memory requirements and computational costs with LoRA. Extensive experiments show that Delta-LoRA significantly outperforms existing low-rank adaptation methods. We further support these results with comprehensive analyses that underscore the effectiveness of Delta-LoRA.
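A schematic sketch of the update rule described above: the low-rank factors A and B are trained as usual, while the frozen pre-trained weight W is nudged by the delta of their product between consecutive optimiser steps, so no gradient or momentum state for W is kept. The scaling factor `lam`, the toy loss, and the layer shape are illustrative simplifications.

```python
import torch

d, r, lam, lr = 768, 8, 0.5, 1e-3
W = torch.randn(d, d)                              # frozen pre-trained weight (not in the optimiser)
A = (0.01 * torch.randn(d, r)).requires_grad_()
B = torch.zeros(r, d, requires_grad=True)
opt = torch.optim.AdamW([A, B], lr=lr)

def layer(x):
    # Adapted layer: the effective weight is W + A @ B.
    return x @ (W + A @ B).T

for step in range(100):
    x = torch.randn(32, d)
    prev_AB = (A @ B).detach()                     # product before this optimiser step
    loss = layer(x).pow(2).mean()                  # placeholder task loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():
        W += lam * (A @ B - prev_AB)               # Delta-LoRA: reuse the factors' delta for W
```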
In-Ear-Voice: Towards Milli-Watt Audio Enhancement With Bone-Conduction Microphones for In-Ear Sensing Platforms
results: The paper evaluates a low-power personalized speech detection algorithm based on bone-conduction data and compares it with an approach based on traditional microphone input. Experimental results show that the bone-conduction system detects speech within 12.8 ms at 95% accuracy and achieves a battery life of 43 hours.
Abstract
The recent ubiquitous adoption of remote conferencing has been accompanied by omnipresent frustration with distorted or otherwise unclear voice communication. Audio enhancement can compensate for low-quality input signals from, for example, small true wireless earbuds, by applying noise suppression techniques. Such processing relies on voice activity detection (VAD) with low latency and the added capability of discriminating the wearer's voice from others - a task of significant computational complexity. The tight energy budget of devices as small as modern earphones, however, requires any system attempting to tackle this problem to do so with minimal power and processing overhead, while not relying on speaker-specific voice samples and training due to usability concerns. This paper presents the design and implementation of a custom research platform for low-power wireless earbuds based on novel, commercial, MEMS bone-conduction microphones. Such microphones can record the wearer's speech with much greater isolation, enabling personalized voice activity detection and further audio enhancement applications. Furthermore, the paper accurately evaluates a proposed low-power personalized speech detection algorithm based on bone conduction data and a recurrent neural network running on the implemented research platform. This algorithm is compared to an approach based on traditional microphone input. The performance of the bone conduction system, achieving detection of speech within 12.8ms at an accuracy of 95\% is evaluated. Different SoC choices are contrasted, with the final implementation based on the cutting-edge Ambiq Apollo 4 Blue SoC achieving 2.64mW average power consumption at 14uJ per inference, reaching 43h of battery life on a miniature 32mAh li-ion cell and without duty cycling.
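As a quick sanity check on the quoted battery-life figure, the arithmetic below assumes a nominal 3.7 V li-ion cell voltage (the cell voltage is not stated here) and ignores conversion losses; it lands close to the reported 43 h.

```python
capacity_mAh = 32
nominal_voltage_V = 3.7          # assumed nominal li-ion cell voltage
avg_power_mW = 2.64

energy_mWh = capacity_mAh * nominal_voltage_V    # ~118.4 mWh
runtime_h = energy_mWh / avg_power_mW            # ~44.8 h, roughly consistent with ~43 h
print(f"{runtime_h:.1f} h")
```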
results: The authors confirm their hypotheses experimentally and identify two novel behaviours: ungrokking, in which a network regresses from perfect to low test accuracy, and semi-grokking, in which a network shows delayed generalisation to partial rather than perfect test accuracy.
Abstract
One of the most surprising puzzles in neural network generalisation is grokking: a network with perfect training accuracy but poor generalisation will, upon further training, transition to perfect generalisation. We propose that grokking occurs when the task admits a generalising solution and a memorising solution, where the generalising solution is slower to learn but more efficient, producing larger logits with the same parameter norm. We hypothesise that memorising circuits become more inefficient with larger training datasets while generalising circuits do not, suggesting there is a critical dataset size at which memorisation and generalisation are equally efficient. We make and confirm four novel predictions about grokking, providing significant evidence in favour of our explanation. Most strikingly, we demonstrate two novel and surprising behaviours: ungrokking, in which a network regresses from perfect to low test accuracy, and semi-grokking, in which a network shows delayed generalisation to partial rather than perfect test accuracy.
A Lightweight and Transferable Design for Robust LEGO Manipulation
results: Experiments show that the EOAT reliably manipulates LEGO bricks and that the learning framework can safely improve manipulation performance to a 100% success rate. The solution is also deployed to multiple robots (FANUC LR-mate 200id/7L and Yaskawa GP4) to demonstrate its generalizability and transferability. Finally, the solution enables sustainable robotic LEGO prototyping, in which the robot can repeatedly assemble and disassemble different prototypes.
Abstract
LEGO is a well-known platform for prototyping pixelized objects. However, robotic LEGO prototyping (i.e. manipulating LEGO bricks) is challenging due to the tight connections and accuracy requirement. This paper investigates safe and efficient robotic LEGO manipulation. In particular, this paper reduces the complexity of the manipulation by hardware-software co-design. An end-of-arm tool (EOAT) is designed, which reduces the problem dimension and allows large industrial robots to easily manipulate LEGO bricks. In addition, this paper uses evolution strategy to safely optimize the robot motion for LEGO manipulation. Experiments demonstrate that the EOAT performs reliably in manipulating LEGO bricks and the learning framework can effectively and safely improve the manipulation performance to a 100\% success rate. The co-design is deployed to multiple robots (i.e. FANUC LR-mate 200id/7L and Yaskawa GP4) to demonstrate its generalizability and transferability. In the end, we show that the proposed solution enables sustainable robotic LEGO prototyping, in which the robot can repeatedly assemble and disassemble different prototypes.
Exact Inference for Continuous-Time Gaussian Process Dynamics
paper_authors: Katharina Ensinger, Nicholas Tagliapietra, Sebastian Ziesche, Sebastian Trimpe
for: Learning a GP model of the true continuous-time dynamics
methods: Leveraging multistep and Taylor integrators to make exact GP inference tractable
results: Experiments and theory show that the approach yields an accurate GP representation of the continuous-time system
Abstract
Physical systems can often be described via a continuous-time dynamical system. In practice, the true system is often unknown and has to be learned from measurement data. Since data is typically collected in discrete time, e.g. by sensors, most methods in Gaussian process (GP) dynamics model learning are trained on one-step ahead predictions. This can become problematic in several scenarios, e.g. if measurements are provided at irregularly-sampled time steps or physical system properties have to be conserved. Thus, we aim for a GP model of the true continuous-time dynamics. Higher-order numerical integrators provide the necessary tools to address this problem by discretizing the dynamics function with arbitrary accuracy. Many higher-order integrators require dynamics evaluations at intermediate time steps making exact GP inference intractable. In previous work, this problem is often tackled by approximating the GP posterior with variational inference. However, exact GP inference is preferable in many scenarios, e.g. due to its mathematical guarantees. In order to make direct inference tractable, we propose to leverage multistep and Taylor integrators. We demonstrate how to derive flexible inference schemes for these types of integrators. Further, we derive tailored sampling schemes that allow to draw consistent dynamics functions from the learned posterior. This is crucial to sample consistent predictions from the dynamics model. We demonstrate empirically and theoretically that our approach yields an accurate representation of the continuous-time system.
PolyLUT: Learning Piecewise Polynomials for Ultra-Low Latency FPGA LUT-based Inference
results: The paper shows that using multivariate polynomial building blocks achieves the same accuracy with considerably fewer layers of soft logic, yielding significant latency and area improvements. The approach is demonstrated on three tasks: network intrusion detection, jet identification at the CERN Large Hadron Collider, and handwritten digit recognition on the MNIST dataset.
Abstract
Field-programmable gate arrays (FPGAs) are widely used to implement deep learning inference. Standard deep neural network inference involves the computation of interleaved linear maps and nonlinear activation functions. Prior work for ultra-low latency implementations has hardcoded the combination of linear maps and nonlinear activations inside FPGA lookup tables (LUTs). Our work is motivated by the idea that the LUTs in an FPGA can be used to implement a much greater variety of functions than this. In this paper, we propose a novel approach to training neural networks for FPGA deployment using multivariate polynomials as the basic building block. Our method takes advantage of the flexibility offered by the soft logic, hiding the polynomial evaluation inside the LUTs with zero overhead. We show that by using polynomial building blocks, we can achieve the same accuracy using considerably fewer layers of soft logic than by using linear functions, leading to significant latency and area improvements. We demonstrate the effectiveness of this approach in three tasks: network intrusion detection, jet identification at the CERN Large Hadron Collider, and handwritten digit recognition using the MNIST dataset.
Resilient VAE: Unsupervised Anomaly Detection at the SLAC Linac Coherent Light Source
results: The paper applies ResVAE to accelerator anomaly detection using shot-to-shot data from the beam position monitoring system. The results show that ResVAE exhibits exceptional ability in identifying various types of anomalies present in the accelerator and provides feature-level anomaly attribution.Abstract
Significant advances in utilizing deep learning for anomaly detection have been made in recent years. However, these methods largely assume the existence of a normal training set (i.e., uncontaminated by anomalies) or even a completely labeled training set. In many complex engineering systems, such as particle accelerators, labels are sparse and expensive; in order to perform anomaly detection in these cases, we must drop these assumptions and utilize a completely unsupervised method. This paper introduces the Resilient Variational Autoencoder (ResVAE), a deep generative model specifically designed for anomaly detection. ResVAE exhibits resilience to anomalies present in the training data and provides feature-level anomaly attribution. During the training process, ResVAE learns the anomaly probability for each sample as well as each individual feature, utilizing these probabilities to effectively disregard anomalous examples in the training data. We apply our proposed method to detect anomalies in the accelerator status at the SLAC Linac Coherent Light Source (LCLS). By utilizing shot-to-shot data from the beam position monitoring system, we demonstrate the exceptional capability of ResVAE in identifying various types of anomalies that are visible in the accelerator.
A study on the impact of pre-trained model on Just-In-Time defect prediction
results: We find that every model shows improvements, and when the backbones' pre-training models are similar, the training resources required are much closer. We also find that commit code plays a significant role in defect detection, and different pre-trained models demonstrate different defect detection ability in few-shot scenarios. These results provide new perspectives for optimizing JIT defect prediction with pre-trained models and highlight the factors that require more attention.Abstract
Previous researchers conducting Just-In-Time (JIT) defect prediction tasks have primarily focused on the performance of individual pre-trained models, without exploring the relationship between different pre-trained models as backbones. In this study, we build six models: RoBERTaJIT, CodeBERTJIT, BARTJIT, PLBARTJIT, GPT2JIT, and CodeGPTJIT, each with a distinct pre-trained model as its backbone. We systematically explore the differences and connections between these models. Specifically, we investigate the performance of the models when using Commit code and Commit message as inputs, as well as the relationship between training efficiency and model distribution among these six models. Additionally, we conduct an ablation experiment to explore the sensitivity of each model to inputs. Furthermore, we investigate how the models perform in zero-shot and few-shot scenarios. Our findings indicate that models with different backbones all show improvements, and when the backbones' pre-training models are similar, the training resources that need to be consumed are much closer. We also observe that Commit code plays a significant role in defect detection, and different pre-trained models demonstrate better defect detection ability with a balanced dataset under few-shot scenarios. These results provide new insights for optimizing JIT defect prediction tasks using pre-trained models and highlight the factors that require more attention when constructing such models. Additionally, CodeGPTJIT and GPT2JIT achieved better performance than DeepJIT and CC2Vec on the two datasets respectively under 2000 training samples. These findings emphasize the effectiveness of transformer-based pre-trained models in JIT defect prediction tasks, especially in scenarios with limited training data.
Inferring effective couplings with Restricted Boltzmann Machines
results: The authors validate the method through controlled numerical experiments in which RBMs are trained on equilibrium samples. The results show that the method learns the correct interaction network and can be applied to model complex datasets. The quality of the inferred model under different training methods is also evaluated.Abstract
Generative models offer a direct way to model complex data. Among them, energy-based models provide us with a neural network model that aims to accurately reproduce all statistical correlations observed in the data at the level of the Boltzmann weight of the model. However, one challenge is to understand the physical interpretation of such models. In this study, we propose a simple solution by implementing a direct mapping between the energy function of the Restricted Boltzmann Machine and an effective Ising spin Hamiltonian that includes high-order interactions between spins. This mapping includes interactions of all possible orders, going beyond the conventional pairwise interactions typically considered in the inverse Ising approach, and allowing the description of complex datasets. Earlier works attempted to achieve this goal, but the proposed mappings did not properly treat the complexity of the problem or did not contain direct prescriptions for practical application. To validate our method, we performed several controlled numerical experiments where we trained the RBMs using equilibrium samples of predefined models containing local external fields, two-body and three-body interactions in various low-dimensional topologies. The results demonstrate the effectiveness of our proposed approach in learning the correct interaction network and pave the way for its application in modeling interesting datasets. We also evaluate the quality of the inferred model based on different training methods.
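The paper's mapping covers interactions of every order; as a simplified numerical illustration of the basic idea, the sketch below marginalises the hidden units of a small spin RBM to obtain an effective energy over visible spins and reads off an effective pairwise coupling by finite differences of that energy around a background configuration. The RBM parameters and the finite-difference extraction are assumptions for illustration, not the paper's exact prescription.

```python
# Simplified numerical illustration (not the paper's exact mapping): marginalise the
# hidden units of a small {-1,+1} RBM to get an effective energy over visible spins,
# then read off an effective pairwise coupling J_ik by finite differences of that
# energy. With higher-order interactions present, the value depends on the background
# configuration, which the full mapping in the paper accounts for.
import numpy as np

rng = np.random.default_rng(1)
n_vis, n_hid = 6, 4
W = 0.3 * rng.normal(size=(n_vis, n_hid))   # visible-hidden couplings
a = 0.1 * rng.normal(size=n_vis)            # visible fields
b = 0.1 * rng.normal(size=n_hid)            # hidden fields

def effective_energy(v):
    """-log of the marginal Boltzmann weight of a visible configuration v in {-1,+1}^n."""
    return -a @ v - np.sum(np.log(2.0 * np.cosh(b + v @ W)))

def effective_pairwise_coupling(i, k, background):
    """Finite-difference estimate of the effective J_ik around a background spin config."""
    vals = {}
    for si in (+1, -1):
        for sk in (+1, -1):
            v = background.copy()
            v[i], v[k] = si, sk
            vals[(si, sk)] = effective_energy(v)
    return -0.25 * (vals[(1, 1)] - vals[(1, -1)] - vals[(-1, 1)] + vals[(-1, -1)])

background = np.ones(n_vis)
print("effective J_01:", effective_pairwise_coupling(0, 1, background))
print("effective J_23:", effective_pairwise_coupling(2, 3, background))
```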
A Comparison of Residual-based Methods on Fault Detection
results: Both methods detect faults with an average delay of around 20 cycles while maintaining a low false positive rate, and the input-output model offers better interpretability regarding potential fault types and the possibly faulty components.Abstract
An important initial step in fault detection for complex industrial systems is gaining an understanding of their health condition. Subsequently, continuous monitoring of this health condition becomes crucial to observe its evolution, track changes over time, and isolate faults. As faults are typically rare occurrences, it is essential to perform this monitoring in an unsupervised manner. Various approaches have been proposed not only to detect faults in an unsupervised manner but also to distinguish between different potential fault types. In this study, we perform a comprehensive comparison between two residual-based approaches: autoencoders, and the input-output models that establish a mapping between operating conditions and sensor readings. We explore the sensor-wise residuals and aggregated residuals for the entire system in both methods. The performance evaluation focuses on three tasks: health indicator construction, fault detection, and health indicator interpretation. To perform the comparison, we utilize the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) dynamical model, specifically a subset of the turbofan engine dataset containing three different fault types. All models are trained exclusively on healthy data. Fault detection is achieved by applying a threshold that is determined based on the healthy condition. The detection results reveal that both models are capable of detecting faults with an average delay of around 20 cycles and maintain a low false positive rate. While the fault detection performance is similar for both models, the input-output model provides better interpretability regarding potential fault types and the possible faulty components.
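Both approaches compared in the paper share the same residual-plus-threshold logic; the sketch below illustrates it with synthetic data and a simple input-output model (a ridge regression standing in for the paper's models), not the C-MAPSS setup itself.

```python
# Minimal sketch of the shared residual-plus-threshold logic (not the paper's C-MAPSS
# setup): fit an input-output model on healthy data only, aggregate sensor-wise
# residuals into a health indicator, and flag a fault when the indicator crosses a
# threshold derived from the healthy condition. Data and model choice are illustrative.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Synthetic "operating conditions" -> "sensor readings" with a fault injected at cycle 300
n_cycles, n_ops, n_sensors = 500, 3, 5
ops = rng.normal(size=(n_cycles, n_ops))
true_map = rng.normal(size=(n_ops, n_sensors))
sensors = ops @ true_map + 0.1 * rng.normal(size=(n_cycles, n_sensors))
sensors[300:, 2] += np.linspace(0.0, 1.5, n_cycles - 300)   # drifting faulty sensor

# Train exclusively on (assumed) healthy cycles
healthy = slice(0, 250)
model = Ridge(alpha=1.0).fit(ops[healthy], sensors[healthy])

# Aggregated residual as a health indicator, threshold from the healthy condition
residuals = sensors - model.predict(ops)
health_indicator = np.linalg.norm(residuals, axis=1)
threshold = np.quantile(health_indicator[healthy], 0.99)

alarms = np.where(health_indicator > threshold)[0]
first_alarm = alarms[alarms >= 300]
print("threshold:", round(threshold, 3))
print("fault injected at cycle 300, first alarm at cycle:",
      first_alarm[0] if len(first_alarm) else "none")

# Sensor-wise residuals point to the faulty component
print("mean |residual| per sensor after the fault:",
      np.abs(residuals[300:]).mean(axis=0).round(3))
```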
Graph-Based Automatic Feature Selection for Multi-Class Classification via Mean Simplified Silhouette
results: Experimental results on public datasets show that the proposed GB-AFS outperforms other filter-based techniques and automatic feature selection approaches. Moreover, GB-AFS maintains the accuracy achieved when using all features while using only $7\%$ to $30\%$ of them, reducing the time needed for classification by $15\%$ to $70\%$.Abstract
This paper introduces a novel graph-based filter method for automatic feature selection (abbreviated as GB-AFS) for multi-class classification tasks. The method determines the minimum combination of features required to sustain prediction performance while maintaining complementary discriminating abilities between different classes. It does not require any user-defined parameters such as the number of features to select. The methodology employs the Jeffries-Matusita (JM) distance in conjunction with t-distributed Stochastic Neighbor Embedding (t-SNE) to generate a low-dimensional space reflecting how effectively each feature can differentiate between each pair of classes. The minimum number of features is selected using our newly developed Mean Simplified Silhouette (abbreviated as MSS) index, designed to evaluate the clustering results for the feature selection task. Experimental results on public data sets demonstrate the superior performance of the proposed GB-AFS over other filter-based techniques and automatic feature selection approaches. Moreover, the proposed algorithm maintained the accuracy achieved when utilizing all features, while using only $7\%$ to $30\%$ of the features. Consequently, this resulted in a reduction of the time needed for classifications, from $15\%$ to $70\%$.
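The first ingredient of GB-AFS is a feature-by-class-pair separability matrix; the sketch below computes one with the univariate-Gaussian form of the Jeffries-Matusita distance on a public dataset. The t-SNE embedding and the Mean Simplified Silhouette selection step are omitted, and the JM convention used here (JM = sqrt(2(1 - exp(-B)))) is one common choice rather than necessarily the paper's.

```python
# Sketch of the feature-by-class-pair separability matrix that GB-AFS embeds with
# t-SNE (the t-SNE step and the MSS index are omitted here). Uses the univariate
# Gaussian form of the Jeffries-Matusita (JM) distance, with B the Bhattacharyya
# distance and the convention JM = sqrt(2 * (1 - exp(-B))).
import numpy as np
from itertools import combinations
from sklearn.datasets import load_iris

def jm_distance(x1, x2, eps=1e-9):
    """JM distance between two 1-D samples under a Gaussian assumption."""
    m1, m2 = x1.mean(), x2.mean()
    v1, v2 = x1.var() + eps, x2.var() + eps
    b = (0.125 * (m1 - m2) ** 2 * 2.0 / (v1 + v2)
         + 0.5 * np.log((v1 + v2) / (2.0 * np.sqrt(v1 * v2))))
    return np.sqrt(2.0 * (1.0 - np.exp(-b)))

X, y = load_iris(return_X_y=True)
classes = np.unique(y)
pairs = list(combinations(classes, 2))

# Rows: features, columns: class pairs; entry = how well the feature separates the pair
separability = np.array([
    [jm_distance(X[y == c1, f], X[y == c2, f]) for (c1, c2) in pairs]
    for f in range(X.shape[1])
])
print("class pairs:", pairs)
print(np.round(separability, 3))
```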
Optimal Sample Selection Through Uncertainty Estimation and Its Application in Deep Learning
For: This study aims to provide a reliable method for selecting data subsets to reduce the training cost and error of deep learning models.* Methods: The study considers coreset selection and active learning, and proposes a theoretically optimal solution, COPS (unCertainty based OPtimal Sub-sampling), for linear softmax regression.* Results: In experiments, COPS shows superior performance compared to baseline methods on deep learning tasks, and handles model misspecification by down-weighting low-density samples.Abstract
Modern deep learning heavily relies on large labeled datasets, which often come with high costs in terms of both manual labeling and computational resources. To mitigate these challenges, researchers have explored the use of informative subset selection techniques, including coreset selection and active learning. Specifically, coreset selection involves sampling data with both input ($\bx$) and output ($\by$), while active learning focuses solely on the input data ($\bx$). In this study, we present a theoretically optimal solution for addressing both coreset selection and active learning within the context of linear softmax regression. Our proposed method, COPS (unCertainty based OPtimal Sub-sampling), is designed to minimize the expected loss of a model trained on subsampled data. Unlike existing approaches that rely on explicit calculations of the inverse covariance matrix, which are not easily applicable to deep learning scenarios, COPS leverages the model's logits to estimate the sampling ratio. This sampling ratio is closely associated with model uncertainty and can be effectively applied to deep learning tasks. Furthermore, we address the challenge of model sensitivity to misspecification by incorporating a down-weighting approach for low-density samples, drawing inspiration from previous works. To assess the effectiveness of our proposed method, we conducted extensive empirical experiments using deep neural networks on benchmark datasets. The results consistently showcase the superior performance of COPS compared to baseline methods, reaffirming its efficacy.
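COPS derives its sampling ratio from the logits in a specific way; the sketch below shows only the generic pattern it belongs to: score pool examples by an uncertainty measure computed from softmax outputs, sample with probability proportional to that score, and keep inverse-probability weights for the subsampled loss. All names and the entropy-based score are illustrative assumptions, not the paper's derivation.

```python
# Generic uncertainty-proportional subsampling sketch (not the exact COPS derivation):
# score each pool example by the predictive entropy of its softmax output, sample with
# probability proportional to that score, and keep 1/probability importance weights so
# the subsampled training loss stays (approximately) unbiased.
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n_pool, n_classes, budget = 10_000, 10, 1_000

# Stand-in for logits produced by a pilot / partially-trained model on the pool
logits = rng.normal(scale=2.0, size=(n_pool, n_classes))
probs = softmax(logits)
entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)   # model uncertainty per example

# Sampling ratio proportional to uncertainty, floored to keep weights bounded
p = np.clip(entropy / entropy.sum() * budget, 1e-4, 1.0)
selected = rng.random(n_pool) < p
weights = 1.0 / p[selected]                               # inverse-probability weights

print("selected:", selected.sum(), "of", n_pool)
print("mean entropy selected vs pool:",
      round(entropy[selected].mean(), 3), "vs", round(entropy.mean(), 3))
# `weights` would multiply the per-example loss when training on the subsample.
```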
RoBoSS: A Robust, Bounded, Sparse, and Smooth Loss Function for Supervised Learning
paper_authors: Mushir Akhtar, M. Tanveer, Mohd. Arshad
For: This work proposes a novel robust, bounded, sparse, and smooth (RoBoSS) loss function to improve supervised learning with support vector machines (SVM).* Methods: The RoBoSS loss is integrated into the SVM framework to obtain a new robust algorithm, $\mathcal{L}_{rbss}$-SVM; the classification-calibrated property and generalization ability of the RoBoSS loss are also analyzed theoretically.* Results: Experiments show that the proposed $\mathcal{L}_{rbss}$-SVM performs excellently on 88 real-world UCI and KEEL datasets, and also delivers strong results in the biomedical domain on an electroencephalogram (EEG) signal dataset and the breast cancer (BreaKHis) dataset.Abstract
In the domain of machine learning algorithms, the significance of the loss function is paramount, especially in supervised learning tasks. It serves as a fundamental pillar that profoundly influences the behavior and efficacy of supervised learning algorithms. Traditional loss functions, while widely used, often struggle to handle noisy and high-dimensional data, impede model interpretability, and lead to slow convergence during training. In this paper, we address the aforementioned constraints by proposing a novel robust, bounded, sparse, and smooth (RoBoSS) loss function for supervised learning. Further, we incorporate the RoBoSS loss function within the framework of support vector machine (SVM) and introduce a new robust algorithm named $\mathcal{L}_{rbss}$-SVM. For the theoretical analysis, the classification-calibrated property and generalization ability are also presented. These investigations are crucial for gaining deeper insights into the performance of the RoBoSS loss function in the classification tasks and its potential to generalize well to unseen data. To empirically demonstrate the effectiveness of the proposed $\mathcal{L}_{rbss}$-SVM, we evaluate it on $88$ real-world UCI and KEEL datasets from diverse domains. Additionally, to exemplify the effectiveness of the proposed $\mathcal{L}_{rbss}$-SVM within the biomedical realm, we evaluated it on two medical datasets: the electroencephalogram (EEG) signal dataset and the breast cancer (BreaKHis) dataset. The numerical results substantiate the superiority of the proposed $\mathcal{L}_{rbss}$-SVM model, both in terms of its remarkable generalization performance and its efficiency in training time.
Self-Similarity-Based and Novelty-based loss for music structure analysis
results: The authors show that relative feature learning through self-attention improves performance on the music segmentation task. The approach also compares favourably with previously proposed methods on the standard RWC-Pop dataset and various subsets of SALAMI.Abstract
Music Structure Analysis (MSA) is the task aiming at identifying musical segments that compose a music track and possibly label them based on their similarity. In this paper we propose a supervised approach for the task of music boundary detection. In our approach we simultaneously learn features and convolution kernels. For this we jointly optimize (i) a loss based on the Self-Similarity-Matrix (SSM) obtained with the learned features, denoted the SSM-loss, and (ii) a loss based on the novelty score obtained by applying the learned kernels to the estimated SSM, denoted the novelty-loss. We also demonstrate that relative feature learning, through self-attention, is beneficial for the task of MSA. Finally, we compare the performances of our approach to previously proposed approaches on the standard RWC-Pop, and various subsets of SALAMI.
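In the paper both the features and the convolution kernels are learned; the classical fixed-kernel counterparts of the two quantities involved (an SSM and a Foote-style novelty curve obtained by sliding a checkerboard kernel along the SSM diagonal) are sketched below on a toy feature sequence. This is background illustration, not the proposed supervised method.

```python
# Classical, fixed-kernel version of the two quantities the paper learns jointly:
# a self-similarity matrix (SSM) from frame-wise features and a Foote-style novelty
# curve from a checkerboard kernel slid along the SSM diagonal.
import numpy as np

def self_similarity(features):
    """Cosine self-similarity matrix of an (n_frames, dim) feature sequence."""
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-9)
    return f @ f.T

def checkerboard_kernel(half_size):
    s = np.sign(np.arange(-half_size, half_size + 1))
    g = np.exp(-0.5 * (np.arange(-half_size, half_size + 1) / (half_size / 2.0)) ** 2)
    return np.outer(s, s).astype(float) * np.outer(g, g)   # Gaussian-tapered checkerboard

def novelty_curve(ssm, half_size=8):
    kernel = checkerboard_kernel(half_size)
    n = ssm.shape[0]
    nov = np.zeros(n)
    for t in range(half_size, n - half_size):
        patch = ssm[t - half_size:t + half_size + 1, t - half_size:t + half_size + 1]
        nov[t] = np.sum(patch * kernel)
    return nov

# Toy "track": three homogeneous segments in feature space, boundaries at frames 60 and 140
rng = np.random.default_rng(0)
centers = [rng.normal(size=12) for _ in range(3)]
frames = np.vstack([c + 0.2 * rng.normal(size=(n, 12))
                    for c, n in zip(centers, (60, 80, 70))])

nov = novelty_curve(self_similarity(frames))
print("highest-novelty frames (true boundaries at 60 and 140):",
      np.argsort(nov)[-4:][::-1])
```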
Sample Size in Natural Language Processing within Healthcare Research
results: The study finds that smaller sample sizes give better performance metrics with a K-nearest neighbours classifier, whereas larger sample sizes give better performance with support vector machines and BERT models. Overall, a sample size larger than 1000 is sufficient to provide decent performance metrics.Abstract
Sample size calculation is an essential step in most data-based disciplines. Large enough samples ensure representativeness of the population and determine the precision of estimates. This is true for most quantitative studies, including those that employ machine learning methods, such as natural language processing, where free-text is used to generate predictions and classify instances of text. Within the healthcare domain, the lack of sufficient corpora of previously collected data can be a limiting factor when determining sample sizes for new studies. This paper tries to address the issue by making recommendations on sample sizes for text classification tasks in the healthcare domain. Models trained on the MIMIC-III database of critical care records from Beth Israel Deaconess Medical Center were used to classify documents as having or not having Unspecified Essential Hypertension, the most common diagnosis code in the database. Simulations were performed using various classifiers on different sample sizes and class proportions. This was repeated for a comparatively less common diagnosis code within the database of diabetes mellitus without mention of complication. Smaller sample sizes resulted in better results when using a K-nearest neighbours classifier, whereas larger sample sizes provided better results with support vector machines and BERT models. Overall, a sample size larger than 1000 was sufficient to provide decent performance metrics. The simulations conducted within this study provide guidelines that can be used as recommendations for selecting appropriate sample sizes and class proportions, and for predicting expected performance, when building classifiers for textual healthcare data. The methodology used here can be modified for sample size estimates calculations with other datasets.
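The study's core procedure is a simulation loop: subsample the corpus at increasing sizes and class proportions, train each classifier, and record the resulting metrics. A skeleton of that loop is sketched below on a public text dataset (MIMIC-III requires credentialed access) with simple classifiers standing in for the paper's models; the dataset and model choices are illustrative only.

```python
# Skeleton of the sample-size simulation loop described above, using a public text
# dataset as a stand-in and simple classifiers in place of BERT. The structure --
# subsample, train, evaluate, repeat -- is the point, not the specific numbers.
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

data = fetch_20newsgroups(subset="all", categories=["sci.med", "sci.space"],
                          remove=("headers", "footers", "quotes"))
X = TfidfVectorizer(max_features=5000).fit_transform(data.data)
y = np.array(data.target)
X_pool, X_test, y_pool, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

rng = np.random.default_rng(0)
for n in (100, 300, 1000, min(1300, X_pool.shape[0])):
    idx = rng.choice(X_pool.shape[0], size=n, replace=False)
    for name, clf in (("kNN", KNeighborsClassifier(n_neighbors=5)),
                      ("SVM", LinearSVC())):
        clf.fit(X_pool[idx], y_pool[idx])
        f1 = f1_score(y_test, clf.predict(X_test))
        print(f"n={n:5d}  {name}: F1={f1:.3f}")
```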
Distributionally Robust Machine Learning with Multi-source Data
results: Experiments show that, compared to classical empirical risk minimization, the proposed robust prediction model improves prediction performance for target populations with distribution shifts, and lets users make predictions for a class of target distributions.Abstract
Classical machine learning methods may lead to poor prediction performance when the target distribution differs from the source populations. This paper utilizes data from multiple sources and introduces a group distributionally robust prediction model defined to optimize an adversarial reward about explained variance with respect to a class of target distributions. Compared to classical empirical risk minimization, the proposed robust prediction model improves the prediction accuracy for target populations with distribution shifts. We show that our group distributionally robust prediction model is a weighted average of the source populations' conditional outcome models. We leverage this key identification result to robustify arbitrary machine learning algorithms, including, for example, random forests and neural networks. We devise a novel bias-corrected estimator to estimate the optimal aggregation weight for general machine-learning algorithms and demonstrate its improvement in the convergence rate. Our proposal can be seen as a distributionally robust federated learning approach that is computationally efficient and easy to implement using arbitrary machine learning base algorithms, satisfies some privacy constraints, and has a nice interpretation of different sources' importance for predicting a given target covariate distribution. We demonstrate the performance of our proposed group distributionally robust method on simulated and real data with random forests and neural networks as base-learning algorithms.
Latent Disentanglement in Mesh Variational Autoencoders Improves the Diagnosis of Craniofacial Syndromes and Aids Surgical Planning
paper_authors: Simone Foti, Alexander J. Rickart, Bongjin Koo, Eimear O’ Sullivan, Lara S. van de Lande, Athanasios Papaioannou, Roman Khonsari, Danail Stoyanov, N. u. Owase Jeelani, Silvia Schievano, David J. Dunaway, Matthew J. Clarkson
results: The study can help improve the accuracy of craniofacial syndrome diagnosis, aid surgeons in planning procedures, and allow objective evaluation of surgical outcomes.Abstract
The use of deep learning to undertake shape analysis of the complexities of the human head holds great promise. However, there have traditionally been a number of barriers to accurate modelling, especially when operating on both a global and local level. In this work, we will discuss the application of the Swap Disentangled Variational Autoencoder (SD-VAE) with relevance to Crouzon, Apert and Muenke syndromes. Although syndrome classification is performed on the entire mesh, it is also possible, for the first time, to analyse the influence of each region of the head on the syndromic phenotype. By manipulating specific parameters of the generative model, and producing procedure-specific new shapes, it is also possible to simulate the outcome of a range of craniofacial surgical procedures. This opens new avenues to advance diagnosis, aids surgical planning and allows for the objective evaluation of surgical outcomes.
Language Models for Novelty Detection in System Call Traces
results: The study finds that these architectures achieve an F-score and AuROC above 95% on most novelties, and that the approach requires minimal expert hand-crafting and is data- and task-agnostic.Abstract
Due to the complexity of modern computer systems, novel and unexpected behaviors frequently occur. Such deviations are either normal occurrences, such as software updates and new user activities, or abnormalities, such as misconfigurations, latency issues, intrusions, and software bugs. Regardless, novel behaviors are of great interest to developers, and there is a genuine need for efficient and effective methods to detect them. Nowadays, researchers consider system calls to be the most fine-grained and accurate source of information to investigate the behavior of computer systems. Accordingly, this paper introduces a novelty detection methodology that relies on a probability distribution over sequences of system calls, which can be seen as a language model. Language models estimate the likelihood of sequences, and since novelties deviate from previously observed behaviors by definition, they would be unlikely under the model. Following the success of neural networks for language models, three architectures are evaluated in this work: the widespread LSTM, the state-of-the-art Transformer, and the lower-complexity Longformer. However, large neural networks typically require an enormous amount of data to be trained effectively, and to the best of our knowledge, no massive modern datasets of kernel traces are publicly available. This paper addresses this limitation by introducing a new open-source dataset of kernel traces comprising over 2 million web requests with seven distinct behaviors. The proposed methodology requires minimal expert hand-crafting and achieves an F-score and AuROC greater than 95% on most novelties while being data- and task-agnostic. The source code and trained models are publicly available on GitHub while the datasets are available on Zenodo.
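The paper scores system-call sequences with neural language models (LSTM, Transformer, Longformer); the toy sketch below uses a bigram language model instead, purely to illustrate the likelihood-based scoring idea: sequences with low average log-likelihood under a model of normal behavior are flagged as novel. The vocabulary and traces are made up for the example.

```python
# Toy version of likelihood-based novelty scoring: a bigram language model over
# system-call tokens stands in for the paper's LSTM/Transformer/Longformer models.
# Sequences with low average log-likelihood under the model are flagged as novel.
from collections import Counter, defaultdict
import math, random

def train_bigram_lm(sequences, vocab):
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, cur in zip(["<s>"] + seq, seq):
            counts[prev][cur] += 1
    def logprob(prev, cur):   # add-one smoothing over the known vocabulary
        c = counts[prev]
        return math.log((c[cur] + 1) / (sum(c.values()) + len(vocab)))
    return logprob

def novelty_score(seq, logprob):
    """Negative average per-token log-likelihood; higher = more novel."""
    return -sum(logprob(p, c) for p, c in zip(["<s>"] + seq, seq)) / len(seq)

random.seed(0)
vocab = ["open", "read", "write", "close", "mmap", "futex", "ioctl", "ptrace"]
normal_pattern = ["open", "read", "write", "close"]

train = [normal_pattern * random.randint(3, 6) for _ in range(200)]
lm = train_bigram_lm(train, vocab)

normal_trace = normal_pattern * 5
novel_trace = ["open", "ptrace", "ioctl", "mmap", "futex", "ptrace", "close"]
print("normal trace score:", round(novelty_score(normal_trace, lm), 3))
print("novel  trace score:", round(novelty_score(novel_trace, lm), 3))
```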
On the Complexity of Differentially Private Best-Arm Identification with Fixed Confidence
results: The authors propose AdaP-TT, an $\epsilon$-global DP BAI algorithm that runs in arm-dependent adaptive episodes and adds Laplace noise to ensure a good privacy-utility trade-off. They derive an asymptotic upper bound on the sample complexity of AdaP-TT that matches the lower bound up to multiplicative constants in the high-privacy regime, and finally validate the theoretical results with an experimental analysis.Abstract
Best Arm Identification (BAI) problems are progressively used for data-sensitive applications, such as designing adaptive clinical trials, tuning hyper-parameters, and conducting user studies to name a few. Motivated by the data privacy concerns invoked by these applications, we study the problem of BAI with fixed confidence under $\epsilon$-global Differential Privacy (DP). First, to quantify the cost of privacy, we derive a lower bound on the sample complexity of any $\delta$-correct BAI algorithm satisfying $\epsilon$-global DP. Our lower bound suggests the existence of two privacy regimes depending on the privacy budget $\epsilon$. In the high-privacy regime (small $\epsilon$), the hardness depends on a coupled effect of privacy and a novel information-theoretic quantity, called the Total Variation Characteristic Time. In the low-privacy regime (large $\epsilon$), the sample complexity lower bound reduces to the classical non-private lower bound. Second, we propose AdaP-TT, an $\epsilon$-global DP variant of the Top Two algorithm. AdaP-TT runs in arm-dependent adaptive episodes and adds Laplace noise to ensure a good privacy-utility trade-off. We derive an asymptotic upper bound on the sample complexity of AdaP-TT that matches with the lower bound up to multiplicative constants in the high-privacy regime. Finally, we provide an experimental analysis of AdaP-TT that validates our theoretical results.
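The core privacy mechanism, releasing only per-episode means perturbed with Laplace noise whose scale matches the mean's sensitivity, is sketched below. Arm selection is simplified to uniform exploration with doubling episodes; AdaP-TT's Top Two sampling rule and stopping condition are not reproduced here, so this is an illustration of the mechanism, not the algorithm.

```python
# Illustration of the privacy mechanism only (not the full AdaP-TT algorithm):
# rewards bounded in [0, 1] are collected in per-arm episodes, and each episode
# releases only its mean plus Laplace noise with scale 1/(episode_length * epsilon),
# the standard calibration for the sensitivity of a bounded mean.
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.45, 0.50, 0.60])   # Bernoulli arms, arm 2 is best
epsilon = 1.0                               # DP budget per released statistic

def private_episode_mean(arm, episode_length):
    rewards = rng.binomial(1, true_means[arm], size=episode_length)
    noise = rng.laplace(loc=0.0, scale=1.0 / (episode_length * epsilon))
    return rewards.mean() + noise

# Doubling episodes per arm; the learner only ever sees the noisy episode means
estimates, pulls = np.zeros(3), np.zeros(3)
for episode_length in (8, 16, 32, 64, 128, 256):
    for arm in range(3):
        noisy_mean = private_episode_mean(arm, episode_length)
        # running average of noisy episode means, weighted by episode length
        estimates[arm] = (estimates[arm] * pulls[arm] + noisy_mean * episode_length) \
                         / (pulls[arm] + episode_length)
        pulls[arm] += episode_length

print("noisy estimates:", estimates.round(3))
print("recommended arm:", int(np.argmax(estimates)), "(true best arm: 2)")
```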
Sparse Function-space Representation of Neural Networks
results: Proof-of-concept demonstrations show that the approach can effectively quantify uncertainty on supervised learning tasks and incorporate new data without retraining.Abstract
Deep neural networks (NNs) are known to lack uncertainty estimates and struggle to incorporate new data. We present a method that mitigates these issues by converting NNs from weight space to function space, via a dual parameterization. Importantly, the dual parameterization enables us to formulate a sparse representation that captures information from the entire data set. This offers a compact and principled way of capturing uncertainty and enables us to incorporate new data without retraining whilst retaining predictive performance. We provide proof-of-concept demonstrations with the proposed approach for quantifying uncertainty in supervised learning on UCI benchmark tasks.
Personalized Federated Deep Reinforcement Learning-based Trajectory Optimization for Multi-UAV Assisted Edge Computing
results: In a simulated environment, the proposed algorithm achieves better training performance and faster convergence than other DRL-based approaches and improves service quality.Abstract
In the era of 5G mobile communication, there has been a significant surge in research focused on unmanned aerial vehicles (UAVs) and mobile edge computing technology. UAVs can serve as intelligent servers in edge computing environments, optimizing their flight trajectories to maximize communication system throughput. Deep reinforcement learning (DRL)-based trajectory optimization algorithms may suffer from poor training performance due to intricate terrain features and inadequate training data. To overcome this limitation, some studies have proposed leveraging federated learning (FL) to mitigate the data isolation problem and expedite convergence. Nevertheless, the efficacy of global FL models can be negatively impacted by the high heterogeneity of local data, which could potentially impede the training process and even compromise the performance of local agents. This work proposes a novel solution to address these challenges, namely personalized federated deep reinforcement learning (PF-DRL), for multi-UAV trajectory optimization. PF-DRL aims to develop individualized models for each agent to address the data scarcity issue and mitigate the negative impact of data heterogeneity. Simulation results demonstrate that the proposed algorithm achieves superior training performance with faster convergence rates, and improves service quality compared to other DRL-based approaches.
results: The study finds that, in federated learning, bias can propagate through the network to all participating parties, and that the experienced bias is higher than when a centralized model is trained on the union of all the data. These results call for auditing group fairness in federated learning and for designing learning algorithms that are robust to bias propagation.Abstract
We show that participating in federated learning can be detrimental to group fairness. In fact, the bias of a few parties against under-represented groups (identified by sensitive attributes such as gender or race) can propagate through the network to all the parties in the network. We analyze and explain bias propagation in federated learning on naturally partitioned real-world datasets. Our analysis reveals that biased parties unintentionally yet stealthily encode their bias in a small number of model parameters, and throughout the training, they steadily increase the dependence of the global model on sensitive attributes. What is important to highlight is that the experienced bias in federated learning is higher than what parties would otherwise encounter in centralized training with a model trained on the union of all their data. This indicates that the bias is due to the algorithm. Our work calls for auditing group fairness in federated learning and designing learning algorithms that are robust to bias propagation.
A Simple Asymmetric Momentum Make SGD Greatest Again
results: On the WRN28-10 test network, LCAM reaches a peak average test accuracy of 80.78% on the Cifar100 test set around epoch 120, higher than the 80.75% reported in the original WRN paper, while using roughly half the convergence time.Abstract
We propose the simplest SGD enhanced method ever, Loss-Controlled Asymmetric Momentum (LCAM), aimed directly at the Saddle Point problem. Compared to the traditional SGD with Momentum, there's no increase in computational demand, yet it outperforms all current optimizers. We use the concepts of weight conjugation and traction effect to explain this phenomenon. We designed experiments to rapidly reduce the learning rate at specified epochs to trap parameters more easily at saddle points. We selected WRN28-10 as the test network and chose cifar10 and cifar100 as test datasets, an identical group to the original paper of WRN and Cosine Annealing Scheduling (CAS). We compared the ability to bypass saddle points of Asymmetric Momentum with different priorities. Finally, using WRN28-10 on Cifar100, we achieved a peak average test accuracy of 80.78\% around epoch 120. For comparison, the original WRN paper reported 80.75\%, while CAS was at 80.42\%, both at epoch 200. This means that while potentially increasing accuracy, we use nearly half the convergence time. Our demonstration code is available at https://github.com/hakumaicc/Asymmetric-Momentum-LCAM
Exploiting Spatial-temporal Data for Sleep Stage Classification via Hypergraph Learning
results: The proposed STHL outperforms state-of-the-art models on the sleep stage classification task.Abstract
Sleep stage classification is crucial for detecting patients' health conditions. Existing models, which mainly use Convolutional Neural Networks (CNN) for modelling Euclidean data and Graph Convolution Networks (GNN) for modelling non-Euclidean data, are unable to consider the heterogeneity and interactivity of multimodal data as well as the spatial-temporal correlation simultaneously, which hinders a further improvement of classification performance. In this paper, we propose a dynamic learning framework STHL, which introduces hypergraph to encode spatial-temporal data for sleep stage classification. Hypergraphs can construct multi-modal/multi-type data instead of using simple pairwise between two subjects. STHL creates spatial and temporal hyperedges separately to build node correlations, then it conducts type-specific hypergraph learning process to encode the attributes into the embedding space. Extensive experiments show that our proposed STHL outperforms the state-of-the-art models in sleep stage classification tasks.
An Efficient Approach to Unsupervised Out-of-Distribution Detection with Variational Autoencoders
results: Our experimental results show that our method has a clear advantage over baseline methods on multiple datasets. Our code is available at: https://github.com/ZJLAB-AMMI/VAE4OOD.Abstract
This paper is concerned with deep generative models (DGMs) for unsupervised out-of-distribution (OOD) detection. In particular, we focus on vanilla Variational Autoencoders (VAE) that use a standard normal prior distribution for the latent variables. These models have a smaller model size, enabling faster training and inference, making them well-suited for resource-limited applications compared to more complex DGMs. We propose a novel OOD score called Error Reduction (ER) specifically designed for vanilla VAE. ER incorporate the idea of reconstructing image inputs from their lossy counterparts and takes into account the Kolmogorov complexity of the images. Experimental results on diverse datasets demonstrate the superiority of our approach over baseline methods. Our code is available at: https://github.com/ZJLAB-AMMI/VAE4OOD.
BeeTLe: A Framework for Linear B-Cell Epitope Prediction and Classification
results: Experimental results show that the proposed method effectively predicts linear B-cell epitopes and outperforms competing methods.Abstract
The process of identifying and characterizing B-cell epitopes, which are the portions of antigens recognized by antibodies, is important for our understanding of the immune system, and for many applications including vaccine development, therapeutics, and diagnostics. Computational epitope prediction is challenging yet rewarding as it significantly reduces the time and cost of laboratory work. Most of the existing tools do not have satisfactory performance and only discriminate epitopes from non-epitopes. This paper presents a new deep learning-based multi-task framework for linear B-cell epitope prediction as well as antibody type-specific epitope classification. Specifically, a sequenced-based neural network model using recurrent layers and Transformer blocks is developed. We propose an amino acid encoding method based on eigen decomposition to help the model learn the representations of epitopes. We introduce modifications to standard cross-entropy loss functions by extending a logit adjustment technique to cope with the class imbalance. Experimental results on data curated from the largest public epitope database demonstrate the validity of the proposed methods and the superior performance compared to competing ones.
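One concrete piece of the above is the class-imbalance handling: logit adjustment shifts each class's logit by the log of its prior before the softmax/cross-entropy, so that rare classes (here, epitopes) are not drowned out. The sketch below shows this generic technique in isolation; the paper's actual losses extend it for its multi-task setting, and the toy logits and priors are assumptions.

```python
# Sketch of the class-imbalance piece only: logit-adjusted cross-entropy, where the
# log of each class prior is added to the corresponding logit before the softmax.
# This is the generic technique; the paper extends it for its own loss functions.
import numpy as np

def logit_adjusted_cross_entropy(logits, labels, class_priors, tau=1.0):
    """Mean cross-entropy with logits shifted by tau * log(prior) per class."""
    adjusted = logits + tau * np.log(class_priors)[None, :]
    adjusted = adjusted - adjusted.max(axis=1, keepdims=True)
    log_probs = adjusted - np.log(np.exp(adjusted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(0)
n, priors = 1000, np.array([0.9, 0.1])                 # heavily imbalanced: 90% non-epitope
labels = rng.choice(2, size=n, p=priors)
logits = rng.normal(size=(n, 2)) + np.eye(2)[labels]   # weakly informative toy logits

plain = logit_adjusted_cross_entropy(logits, labels, np.array([0.5, 0.5]))  # no adjustment
adjusted = logit_adjusted_cross_entropy(logits, labels, priors)
print("plain CE:", round(plain, 4), " logit-adjusted CE:", round(adjusted, 4))
```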
Efficiency is Not Enough: A Critical Perspective of Environmentally Sustainable AI
paper_authors: Dustin Wright, Christian Igel, Gabrielle Samuel, Raghavendra Selvan for: This paper examines the environmental sustainability of machine learning (ML) and argues against focusing solely on efficiency as the solution.methods: The paper considers deep learning (DL) and other ML methods and applies systems thinking to analyse the environmental impacts of ML.results: The paper argues that improving the efficiency of ML systems cannot by itself fully remedy their environmental impacts; the interaction of many variables must be considered.Abstract
Artificial Intelligence (AI) is currently spearheaded by machine learning (ML) methods such as deep learning (DL) which have accelerated progress on many tasks thought to be out of reach of AI. These ML methods can often be compute hungry, energy intensive, and result in significant carbon emissions, a known driver of anthropogenic climate change. Additionally, the platforms on which ML systems run are associated with environmental impacts including and beyond carbon emissions. The solution lionized by both industry and the ML community to improve the environmental sustainability of ML is to increase the efficiency with which ML systems operate in terms of both compute and energy consumption. In this perspective, we argue that efficiency alone is not enough to make ML as a technology environmentally sustainable. We do so by presenting three high level discrepancies between the effect of efficiency on the environmental sustainability of ML when considering the many variables which it interacts with. In doing so, we comprehensively demonstrate, at multiple levels of granularity both technical and non-technical reasons, why efficiency is not enough to fully remedy the environmental impacts of ML. Based on this, we present and argue for systems thinking as a viable path towards improving the environmental sustainability of ML holistically.
摘要
However, we argue that efficiency alone is not enough to make ML an environmentally sustainable technology. The reason is that, when ML systems interact with many other variables, the effect of increased efficiency on environmental sustainability is complex. To explain this view, the paper raises three points of tension: 1. the complex relationship between carbon emissions and energy consumption; 2. the sustainability of ML systems is shaped by many factors, including hardware, software, supply chains, and users; 3. increasing efficiency can create new environmental and social impacts, for example around the control of resources and their long-term sustainability. These points show that increasing the efficiency of ML systems alone cannot fully resolve the problem of environmental sustainability. We therefore propose systems thinking as the basis for a sustainability-oriented approach, to ensure that the environmental and social impacts of ML technology remain sustainable.
MvFS: Multi-view Feature Selection for Recommender System
results: Experiments on real-world datasets demonstrate that MvFS is more effective than state-of-the-art baselines.Abstract
Feature selection, which is a technique to select key features in recommender systems, has received increasing research attention. Recently, Adaptive Feature Selection (AdaFS) has shown remarkable performance by adaptively selecting features for each data instance, considering that the importance of a given feature field can vary significantly across data. However, this method still has limitations in that its selection process could be easily biased to major features that frequently occur. To address these problems, we propose Multi-view Feature Selection (MvFS), which selects informative features for each instance more effectively. Most importantly, MvFS employs a multi-view network consisting of multiple sub-networks, each of which learns to measure the feature importance of a part of data with different feature patterns. By doing so, MvFS mitigates the bias problem towards dominant patterns and promotes a more balanced feature selection process. Moreover, MvFS adopts an effective importance score modeling strategy which is applied independently to each field without incurring dependency among features. Experimental results on real-world datasets demonstrate the effectiveness of MvFS compared to state-of-the-art baselines.
paper_authors: Younes Ben Mazziane, Francescomaria Faticanti, Giovanni Neglia, Sara Alouf
for: The goal of this paper is to design caching policies that meet caching needs in high-load and/or memory-constrained scenarios.
methods: The paper uses online learning algorithms to design caching policies, working with estimates of the cache request sequence.
results: The paper proposes the Noisy Follow the Perturbed Leader (NFPL) algorithm, which achieves low regret when the request estimates are noisy, under specific conditions on the estimator. The paper also compares the approach against classical caching policies and validates the proposal experimentally on both synthetic and real request traces.Abstract
Online learning algorithms have been successfully used to design caching policies with regret guarantees. Existing algorithms assume that the cache knows the exact request sequence, but this may not be feasible in high load and/or memory-constrained scenarios, where the cache may have access only to sampled requests or to approximate requests' counters. In this paper, we propose the Noisy-Follow-the-Perturbed-Leader (NFPL) algorithm, a variant of the classic Follow-the-Perturbed-Leader (FPL) when request estimates are noisy, and we show that the proposed solution has sublinear regret under specific conditions on the requests estimator. The experimental evaluation compares the proposed solution against classic caching policies and validates the proposed approach under both synthetic and real request traces.
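The perturbed-leader idea behind NFPL can be illustrated with a few lines: each round, perturb the (sampled, hence noisy) cumulative request counters with fresh noise and cache the top-k apparent leaders. The noise distribution, its scale, and the sampling model below are illustrative assumptions, not the paper's exact calibration or regret analysis.

```python
# Minimal sketch of perturbed-leader caching with noisy request counters (parameters
# and the noise model are illustrative): each round the cache holds the top-k items of
# (estimated counts + fresh perturbation), where the counts themselves come from
# sampled requests rather than the exact request sequence.
import numpy as np

rng = np.random.default_rng(0)
catalog, cache_size, horizon = 50, 5, 5000
sample_prob = 0.3                                     # only a fraction of requests is observed

popularity = 1.0 / np.arange(1, catalog + 1) ** 0.8   # Zipf-like request distribution
popularity /= popularity.sum()

est_counts = np.zeros(catalog)                        # noisy/sampled request counters
hits = 0
for t in range(1, horizon + 1):
    # FPL step: perturb current estimates, cache the k apparent leaders
    perturbation = rng.normal(scale=np.sqrt(t), size=catalog)
    cache = set(np.argsort(est_counts + perturbation)[-cache_size:])

    request = rng.choice(catalog, p=popularity)
    hits += request in cache

    if rng.random() < sample_prob:                    # counter only sees sampled requests
        est_counts[request] += 1.0 / sample_prob      # inverse-probability correction

print("hit ratio:", round(hits / horizon, 3))
print("best static hit ratio (top-k by true popularity):",
      round(popularity[np.argsort(popularity)[-cache_size:]].sum(), 3))
```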
Model-agnostic network inference enhancement from noisy measurements via curriculum learning
results: The framework improves the performance of various network inference models under diverse noise settings, and performs particularly well in scenarios rich in clean samples.Abstract
Noise is a pervasive element within real-world measurement data, significantly undermining the performance of network inference models. However, the quest for a comprehensive enhancement framework capable of bolstering noise resistance across a diverse array of network inference models has remained elusive. Here, we present an elegant and efficient framework tailored to amplify the capabilities of network inference models in the presence of noise. Leveraging curriculum learning, we mitigate the deleterious impact of noisy samples on network inference models. Our proposed framework is model-agnostic, seamlessly integrable into a plethora of model-based and model-free network inference methods. Notably, we utilize one model-based and three model-free network inference methods as the foundation. Extensive experimentation across various synthetic and real-world networks, encapsulating diverse nonlinear dynamic processes, showcases substantial performance augmentation under varied noise types, particularly thriving in scenarios enriched with clean samples. This framework's adeptness in fortifying both model-free and model-based network inference methodologies paves the avenue towards a comprehensive and unified enhancement framework, encompassing the entire spectrum of network inference models. Available Code: https://github.com/xiaoyuans/MANIE.
Probabilistic Self-supervised Learning via Scoring Rules Minimization
paper_authors: Amirhossein Vahidi, Simon Schoßer, Lisa Wimmer, Yawei Li, Bernd Bischl, Eyke Hüllermeier, Mina Rezaei
For: To improve representation quality and avoid collapsing representations.* Methods: Probabilistic models and knowledge distillation are used to enhance representation quality, and a new loss function based on proper scoring rules is proposed.* Results: The method achieves superior accuracy and calibration on a variety of downstream tasks, outperforming self-supervised baselines in a wide range of experiments and demonstrating scalability and real-world applicability.Abstract
In this paper, we propose a novel probabilistic self-supervised learning via Scoring Rule Minimization (ProSMIN), which leverages the power of probabilistic models to enhance representation quality and mitigate collapsing representations. Our proposed approach involves two neural networks; the online network and the target network, which collaborate and learn the diverse distribution of representations from each other through knowledge distillation. By presenting the input samples in two augmented formats, the online network is trained to predict the target network representation of the same sample under a different augmented view. The two networks are trained via our new loss function based on proper scoring rules. We provide a theoretical justification for ProSMIN's convergence, demonstrating the strict propriety of its modified scoring rule. This insight validates the method's optimization process and contributes to its robustness and effectiveness in improving representation quality. We evaluate our probabilistic model on various downstream tasks, such as in-distribution generalization, out-of-distribution detection, dataset corruption, low-shot learning, and transfer learning. Our method achieves superior accuracy and calibration, surpassing the self-supervised baseline in a wide range of experiments on large-scale datasets like ImageNet-O and ImageNet-C, ProSMIN demonstrates its scalability and real-world applicability.
Data-Juicer: A One-Stop Data Processing System for Large Language Models
results: Empirical validation reveals up to 7.45% relative improvement in LLaMA performance, and up to 88.7% reduction in single-machine processing time.Abstract
The immense evolution in Large Language Models (LLMs) has underscored the importance of massive, diverse, and high-quality data. Despite this, existing open-source tools for LLM data processing remain limited and mostly tailored to specific datasets, with an emphasis on the reproducibility of released data over adaptability and usability, inhibiting potential applications. In response, we propose a one-stop, powerful yet flexible and user-friendly LLM data processing system named Data-Juicer. Our system offers over 50 built-in versatile operators and pluggable tools, which synergize modularity, composability, and extensibility dedicated to diverse LLM data processing needs. By incorporating visualized and automatic evaluation capabilities, Data-Juicer enables a timely feedback loop to accelerate data processing and gain data insights. To enhance usability, Data-Juicer provides out-of-the-box components for users with various backgrounds, and fruitful data recipes for LLM pre-training and post-tuning usages. Further, we employ multi-facet system optimization and seamlessly integrate Data-Juicer with both LLM and distributed computing ecosystems, to enable efficient and scalable data processing. Empirical validation of the generated data recipes reveals considerable improvements in LLaMA performance for various pre-training and post-tuning cases, demonstrating up to 7.45% relative improvement of averaged score across 16 LLM benchmarks and 16.25% higher win rate using pair-wise GPT-4 evaluation. The system's efficiency and scalability are also validated, supported by up to 88.7% reduction in single-machine processing time, 77.1% and 73.1% less memory and CPU usage respectively, and 7.91x processing acceleration when utilizing distributed computing ecosystems. Our system, data recipes, and multiple tutorial demos are released, calling for broader research centered on LLM data.
Non-Parametric Representation Learning with Kernels
results: New representer theorems are presented, and generalization error bounds are derived to evaluate the proposed kernel representation learning methods.Abstract
Unsupervised and self-supervised representation learning has become popular in recent years for learning useful features from unlabelled data. Representation learning has been mostly developed in the neural network literature, and other models for representation learning are surprisingly unexplored. In this work, we introduce and analyze several kernel-based representation learning approaches: Firstly, we define two kernel Self-Supervised Learning (SSL) models using contrastive loss functions and secondly, a Kernel Autoencoder (AE) model based on the idea of embedding and reconstructing data. We argue that the classical representer theorems for supervised kernel machines are not always applicable for (self-supervised) representation learning, and present new representer theorems, which show that the representations learned by our kernel models can be expressed in terms of kernel matrices. We further derive generalisation error bounds for representation learning with kernel SSL and AE, and empirically evaluate the performance of these methods in both small data regimes as well as in comparison with neural network based models.
Granger Causal Inference in Multivariate Hawkes Processes by Minimum Message Length
results: The method achieves the highest F1 scores in scenarios with short time horizons, and a numerical study confirms this in specific sparse graph settings. Furthermore, applied to G7 sovereign bond data, the method recovers causal connections that agree with expert knowledge available in the literature.Abstract
Multivariate Hawkes processes (MHPs) are versatile probabilistic tools used to model various real-life phenomena: earthquakes, operations on stock markets, neuronal activity, virus propagation and many others. In this paper, we focus on MHPs with exponential decay kernels and estimate connectivity graphs, which represent the Granger causal relations between their components. We approach this inference problem by proposing an optimization criterion and model selection algorithm based on the minimum message length (MML) principle. MML compares Granger causal models using the Occam's razor principle in the following way: even when models have a comparable goodness-of-fit to the observed data, the one generating the most concise explanation of the data is preferred. While most of the state-of-art methods using lasso-type penalization tend to overfitting in scenarios with short time horizons, the proposed MML-based method achieves high F1 scores in these settings. We conduct a numerical study comparing the proposed algorithm to other related classical and state-of-art methods, where we achieve the highest F1 scores in specific sparse graph settings. We illustrate the proposed method also on G7 sovereign bond data and obtain causal connections, which are in agreement with the expert knowledge available in the literature.
RDGSL: Dynamic Graph Representation Learning with Structure Learning
results: The proposed method achieves up to a 5.1% absolute AUC improvement on downstream tasks compared with the second-best baseline.
Abstract
Temporal Graph Networks (TGNs) have shown remarkable performance in learning representation for continuous-time dynamic graphs. However, real-world dynamic graphs typically contain diverse and intricate noise. Noise can significantly degrade the quality of representation generation, impeding the effectiveness of TGNs in downstream tasks. Though structure learning is widely applied to mitigate noise in static graphs, its adaptation to dynamic graph settings poses two significant challenges. i) Noise dynamics. Existing structure learning methods are ill-equipped to address the temporal aspect of noise, hampering their effectiveness in such dynamic and ever-changing noise patterns. ii) More severe noise. Noise may be introduced along with multiple interactions between two nodes, leading to the re-pollution of these nodes and consequently causing more severe noise compared to static graphs. In this paper, we present RDGSL, a representation learning method in continuous-time dynamic graphs. Meanwhile, we propose dynamic graph structure learning, a novel supervisory signal that empowers RDGSL with the ability to effectively combat noise in dynamic graphs. To address the noise dynamics issue, we introduce the Dynamic Graph Filter, where we innovatively propose a dynamic noise function that dynamically captures both current and historical noise, enabling us to assess the temporal aspect of noise and generate a denoised graph. We further propose the Temporal Embedding Learner to tackle the challenge of more severe noise, which utilizes an attention mechanism to selectively turn a blind eye to noisy edges and hence focus on normal edges, enhancing the expressiveness for representation generation that remains resilient to noise. Our method demonstrates robustness towards downstream tasks, resulting in up to 5.1% absolute AUC improvement in evolving classification versus the second-best baseline.
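To make the "selectively turn a blind eye to noisy edges" idea concrete, here is a minimal attention-weighted aggregation over a node's edges; edges with low compatibility scores contribute little to the message. This is a generic sketch under toy dimensions, not the RDGSL architecture or its dynamic noise function.

```python
# Sketch: attention over a node's incoming edges so that low-scoring
# (e.g. noisy) edges contribute little to the aggregated message.
# Illustrative only; not the RDGSL architecture itself.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d = 8
node = torch.randn(d)              # embedding of the target node
neighbors = torch.randn(5, d)      # embeddings of its 5 neighbours
edge_feats = torch.randn(5, d)     # per-edge features (e.g. time encodings)

# Score each edge by its compatibility with the target node.
scores = (neighbors + edge_feats) @ node / d ** 0.5
weights = F.softmax(scores, dim=0)  # noisy edges ideally receive small weight

message = weights @ neighbors       # attention-weighted aggregation
print(weights, message.shape)
```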
PROMISE: Preconditioned Stochastic Optimization Methods by Incorporating Scalable Curvature Estimates
results: Empirical results show that, using default hyperparameter values, the proposed methods outperform or match popular hand-tuned stochastic gradient optimizers, and in practice exhibit fast global linear convergence.
Abstract
This paper introduces PROMISE ($\textbf{Pr}$econditioned Stochastic $\textbf{O}$ptimization $\textbf{M}$ethods by $\textbf{I}$ncorporating $\textbf{S}$calable Curvature $\textbf{E}$stimates), a suite of sketching-based preconditioned stochastic gradient algorithms for solving large-scale convex optimization problems arising in machine learning. PROMISE includes preconditioned versions of SVRG, SAGA, and Katyusha; each algorithm comes with a strong theoretical analysis and effective default hyperparameter values. In contrast, traditional stochastic gradient methods require careful hyperparameter tuning to succeed, and degrade in the presence of ill-conditioning, a ubiquitous phenomenon in machine learning. Empirically, we verify the superiority of the proposed algorithms by showing that, using default hyperparameter values, they outperform or match popular tuned stochastic gradient optimizers on a test bed of $51$ ridge and logistic regression problems assembled from benchmark machine learning repositories. On the theoretical side, this paper introduces the notion of quadratic regularity in order to establish linear convergence of all proposed methods even when the preconditioner is updated infrequently. The speed of linear convergence is determined by the quadratic regularity ratio, which often provides a tighter bound on the convergence rate compared to the condition number, both in theory and in practice, and explains the fast global linear convergence of the proposed methods.
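A minimal sketch of the general idea of a scalable preconditioner for stochastic gradients, using a row-subsampled Hessian estimate on an ill-conditioned ridge-regression problem. The subsampling scheme, sizes, and step size are illustrative assumptions and do not reproduce PROMISE's sketching-based estimators or its preconditioned SVRG/SAGA/Katyusha variants.

```python
# Sketch: a preconditioned stochastic gradient step for ridge regression,
# using a row-subsampled Hessian estimate as a cheap preconditioner.
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 2000, 50, 1e-2
A = rng.normal(size=(n, d)) * rng.uniform(0.1, 10.0, size=d)   # ill-conditioned columns
x_true = rng.normal(size=d)
b = A @ x_true + 0.1 * rng.normal(size=n)

# Preconditioner from a small row subsample ("sketch") of the data matrix.
idx = rng.choice(n, size=200, replace=False)
H_hat = A[idx].T @ A[idx] / len(idx) + lam * np.eye(d)
P = np.linalg.inv(H_hat)                  # fine here because d is small

x = np.zeros(d)
for _ in range(1000):
    batch = rng.choice(n, size=32, replace=False)
    g = A[batch].T @ (A[batch] @ x - b[batch]) / len(batch) + lam * x
    x -= 0.5 * P @ g                      # preconditioned stochastic step

print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```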
Representation Learning Dynamics of Self-Supervised Models
results: A naive extension of multivariate-regression dynamics to SSL yields trivial scalar representations, i.e., dimension collapse; with orthogonality constraints, the exact learning dynamics of SSL models trained by gradient descent on the Grassmannian manifold are derived, and the infinite-width approximation of SSL models is shown to deviate significantly from that of supervised models.
Abstract
Self-Supervised Learning (SSL) is an important paradigm for learning representations from unlabelled data, and SSL with neural networks has been highly successful in practice. However, current theoretical analysis of SSL is mostly restricted to generalisation error bounds. In contrast, learning dynamics often provide a precise characterisation of the behaviour of neural-network-based models but, so far, are mainly known in supervised settings. In this paper, we study the learning dynamics of SSL models, specifically representations obtained by minimising contrastive and non-contrastive losses. We show that a naive extension of the dynamics of multivariate regression to SSL leads to learning trivial scalar representations that demonstrate dimension collapse in SSL. Consequently, we formulate SSL objectives with orthogonality constraints on the weights, and derive the exact (network width independent) learning dynamics of the SSL models trained using gradient descent on the Grassmannian manifold. We also argue that the infinite width approximation of SSL models significantly deviates from the neural tangent kernel approximations of supervised models. We numerically illustrate the validity of our theoretical findings, and discuss how the presented results provide a framework for further theoretical analysis of contrastive and non-contrastive SSL.
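The orthogonality-constrained optimisation referenced above can be illustrated with a tiny Riemannian gradient-descent loop: the Euclidean gradient is projected onto the tangent space at the current weights and a QR retraction keeps the columns orthonormal. The PCA-like objective and toy covariance are illustrative assumptions; the paper derives the exact dynamics analytically rather than simulating them.

```python
# Sketch: one gradient-descent loop for weights constrained to have orthonormal
# columns (Grassmannian/Stiefel-style constraint), via tangent-space projection
# and QR retraction. Toy objective; illustrative only.
import numpy as np

rng = np.random.default_rng(0)
d, k = 10, 3
W, _ = np.linalg.qr(rng.normal(size=(d, k)))    # orthonormal initial weights
C = np.cov(rng.normal(size=(d, 500)))           # toy data covariance (d x d)


def euclidean_grad(W):
    # Gradient of the toy objective -trace(W^T C W) (a PCA-like stand-in).
    return -2.0 * C @ W


eta = 0.05
for _ in range(200):
    G = euclidean_grad(W)
    G_tan = G - W @ (W.T @ G)              # project onto the tangent space at W
    W, _ = np.linalg.qr(W - eta * G_tan)   # QR retraction back onto the manifold

print(np.round(W.T @ W, 3))                # columns stay (approximately) orthonormal
```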
Establishing a real-time traffic alarm in the city of Valencia with Deep Learning
results: Traffic flux is found to have a significant impact on the levels of certain pollutants (especially $\text{NO}_\text{x}$). In addition, an independent three-tier alarm system is developed that predicts whether a given street is likely to experience unusually high traffic in the next 30 minutes.
Abstract
Urban traffic emissions represent a significant concern due to their detrimental impacts on both public health and the environment. Consequently, decision-makers have flagged their reduction as a crucial goal. In this study, we first analyze the correlation between traffic flux and pollution in the city of Valencia, Spain. Our results demonstrate that traffic has a significant impact on the levels of certain pollutants (especially $\text{NO}_\text{x}$). Secondly, we develop an alarm system to predict if a street is likely to experience unusually high traffic in the next 30 minutes, using an independent three-tier level for each street. To make the predictions, we use traffic data updated every 10 minutes and Long Short-Term Memory (LSTM) neural networks. We trained the LSTM using traffic data from 2018, and tested it using traffic data from 2019.
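A minimal PyTorch sketch of the kind of model described above: an LSTM reads a two-hour window of 10-minute traffic readings and emits a logit for "unusually high traffic over the next 30 minutes". The synthetic series, window length, threshold, and single binary tier are illustrative simplifications of the three-tier, per-street system in the paper.

```python
# Sketch: LSTM alarm for "unusually high traffic in the next 30 minutes".
# Synthetic data and a single binary tier; sizes and thresholds are illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)


class TrafficAlarm(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)           # logit for the alarm

    def forward(self, x):                          # x: (batch, window, 1)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :]).squeeze(-1)


# Build (window, label) pairs from a synthetic 10-minute traffic index.
series = torch.rand(2000)
W, H = 12, 3                                       # 2 h of inputs, 30-minute horizon
xs, ys = [], []
for t in range(len(series) - W - H):
    xs.append(series[t:t + W])
    ys.append((series[t + W:t + W + H].mean() > 0.7).float())
x, y = torch.stack(xs).unsqueeze(-1), torch.stack(ys)

model, loss_fn = TrafficAlarm(), nn.BCEWithLogitsLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for epoch in range(20):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
print("final training loss:", float(loss))
```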
An LSTM-Based Predictive Monitoring Method for Data with Time-varying Variability
results: Simulation studies and a real application show that the proposed method outperforms other NN-based predictive monitoring methods for mean shift detection, which is further confirmed on real time-series sensor data.
Abstract
Recurrent neural networks and their variants have shown great success in processing sequences in recent years. However, these deep neural networks have not attracted much attention in anomaly detection through predictive process monitoring. Furthermore, traditional statistical models rely on assumptions and hypothesis tests, whereas neural network (NN) models do not need as many assumptions. This flexibility enables NN models to work efficiently on data with time-varying variability, a common inherent aspect of data in practice. This paper explores the ability of the recurrent neural network structure to monitor processes and proposes a control chart based on long short-term memory (LSTM) prediction intervals for data with time-varying variability. The simulation studies provide empirical evidence that the proposed model outperforms other NN-based predictive monitoring methods for mean shift detection. The proposed method is also applied to time series sensor data, which confirms that the proposed method is an effective technique for detecting abnormalities.
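A schematic version of the monitoring logic: build a prediction interval around a one-step-ahead forecast from in-control (Phase I) residual quantiles, then raise an alarm whenever a new observation falls outside its interval. A moving-average forecast stands in for the LSTM, and the interval construction is a simplification of the paper's LSTM prediction intervals.

```python
# Sketch of prediction-interval monitoring: alarm when an observation falls
# outside the interval around a one-step forecast. A moving-average forecast
# stands in for the LSTM point prediction; limits are calibrated in Phase I.
import numpy as np

rng = np.random.default_rng(0)
WIN = 5


def forecast(window):
    return window.mean()                        # stand-in for an LSTM forecast


# Phase I (in-control) data: calibrate interval limits from forecast residuals.
calib = rng.normal(0, 1, size=500)
resid = [calib[t] - forecast(calib[t - WIN:t]) for t in range(WIN, len(calib))]
lo, hi = np.quantile(resid, [0.005, 0.995])     # ~99% prediction interval

# Phase II: monitor a stream whose mean shifts upward at t = 60.
stream = rng.normal(0, 1, size=120)
stream[60:] += 4.0
for t in range(WIN, len(stream)):
    pred = forecast(stream[t - WIN:t])
    if not (pred + lo <= stream[t] <= pred + hi):
        print(f"alarm at t={t}: {stream[t]:.2f} outside [{pred + lo:.2f}, {pred + hi:.2f}]")
```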
AdaPlus: Integrating Nesterov Momentum and Precise Stepsize Adjustment on AdamW Basis
for: Proposes an efficient optimizer called AdaPlus, which integrates Nesterov momentum and precise stepsize adjustment on an AdamW basis.
methods: Combines the advantages of AdamW, Nadam, and AdaBelief without introducing any extra hyperparameters.
results: Extensive experiments on three machine learning tasks show that AdaPlus is the best adaptive method on image classification (comparable with, even slightly better than, SGD with momentum), outperforms other state-of-the-art optimizers on language modeling, and exhibits the highest stability when training GANs.
Abstract
This paper proposes an efficient optimizer called AdaPlus which integrates Nesterov momentum and precise stepsize adjustment on an AdamW basis. AdaPlus combines the advantages of AdamW, Nadam, and AdaBelief and, in particular, does not introduce any extra hyper-parameters. We perform extensive experimental evaluations on three machine learning tasks to validate the effectiveness of AdaPlus. The experiment results show that AdaPlus (i) is the best adaptive method, performing comparably with (even slightly better than) SGD with momentum on image classification tasks, and (ii) outperforms other state-of-the-art optimizers on language modeling tasks and exhibits the highest stability when training GANs. The experiment code of AdaPlus is available at: https://github.com/guanleics/AdaPlus.
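Since the abstract only names the ingredients, the sketch below combines them in one plausible way: decoupled weight decay as in AdamW, a Nesterov-style first moment as in Nadam, and a belief-style second moment as in AdaBelief. It is a hedged illustration of those ingredients, not necessarily the exact AdaPlus update; the paper's repository contains the real optimizer.

```python
# Hedged sketch of an AdamW + Nadam + AdaBelief style update.
# Illustrative only; not guaranteed to match the exact AdaPlus algorithm.
import numpy as np


def adaplus_like_step(w, g, m, s, t, lr=1e-3, b1=0.9, b2=0.999,
                      eps=1e-8, weight_decay=1e-2):
    m = b1 * m + (1 - b1) * g                      # first moment
    s = b2 * s + (1 - b2) * (g - m) ** 2           # belief: variance of g around m
    m_hat = (b1 * m + (1 - b1) * g) / (1 - b1 ** (t + 1))   # Nesterov-style look-ahead
    s_hat = s / (1 - b2 ** (t + 1))
    w = w - lr * (m_hat / (np.sqrt(s_hat) + eps) + weight_decay * w)  # decoupled decay
    return w, m, s


# Toy usage: minimise f(w) = ||w||^2 / 2, whose gradient is simply w.
w, m, s = np.ones(5), np.zeros(5), np.zeros(5)
for t in range(2000):
    g = w                                          # gradient of the toy objective
    w, m, s = adaplus_like_step(w, g, m, s, t, lr=1e-2)
print(w)                                           # should end up close to zero
```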
Developing A Fair Individualized Polysocial Risk Score (iPsRS) for Identifying Increased Social Risk of Hospitalizations in Patients with Type 2 Diabetes (T2D)
paper_authors: Yu Huang, Jingchuan Guo, William T Donahoo, Zhengkang Fan, Ying Lu, Wei-Han Chen, Huilin Tang, Lori Bilello, Elizabeth A Shenkman, Jiang Bian
for: To develop an electronic health record (EHR)-based machine learning (ML) analytic pipeline that identifies unmet social needs associated with hospitalization risk in patients with type 2 diabetes (T2D), together with explainable AI (XAI) assessment and fairness optimization.
methods: Using EHR data from the University of Florida Health Integrated Data Repository, including contextual social determinants of health (SDoH) and individual-level SDoH, the study develops an EHR-based ML analytic pipeline, the individualized polysocial risk score (iPsRS), to identify high social risk associated with hospitalizations in T2D patients.
results: After fairness optimization across racial-ethnic groups, the iPsRS achieved a C statistic of 0.72 for predicting 1-year hospitalization. It captures individuals at high hospitalization risk well: the actual 1-year hospitalization rate in the top 5% of iPsRS was roughly 13 times that of the bottom decile.
Abstract
Background: Racial and ethnic minority groups and individuals facing social disadvantages, which often stem from their social determinants of health (SDoH), bear a disproportionate burden of type 2 diabetes (T2D) and its complications. It is therefore crucial to implement effective social risk management strategies at the point of care. Objective: To develop an EHR-based machine learning (ML) analytical pipeline to identify the unmet social needs associated with hospitalization risk in patients with T2D. Methods: We identified 10,192 T2D patients from the EHR data (from 2012 to 2022) from the University of Florida Health Integrated Data Repository, including contextual SDoH (e.g., neighborhood deprivation) and individual-level SDoH (e.g., housing stability). We developed an electronic health records (EHR)-based machine learning (ML) analytic pipeline, namely individualized polysocial risk score (iPsRS), to identify high social risk associated with hospitalizations in T2D patients, along with explainable AI (XAI) techniques and fairness assessment and optimization. Results: Our iPsRS achieved a C statistic of 0.72 in predicting 1-year hospitalization after fairness optimization across racial-ethnic groups. The iPsRS showed excellent utility for capturing individuals at high hospitalization risk; the actual 1-year hospitalization rate in the top 5% of iPsRS was ~13 times as high as the bottom decile. Conclusion: Our ML pipeline iPsRS can fairly and accurately screen for patients who have increased social risk leading to hospitalization in T2D patients.
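To illustrate the kind of subgroup fairness check implied above, the sketch below compares the C statistic (AUC) of a risk score across hypothetical subgroups on synthetic data. The group labels, score, and outcome model are invented for illustration; the real iPsRS pipeline is built on EHR-derived SDoH features with XAI and fairness optimization.

```python
# Sketch: compare the C statistic (AUC) of a risk score across subgroups.
# Synthetic data and invented groups; illustrative of the fairness check only.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 5000
group = rng.choice(["A", "B", "C"], size=n)        # hypothetical subgroups
risk = rng.uniform(size=n)                         # hypothetical risk scores
hospitalized = rng.binomial(1, 0.1 + 0.3 * risk)   # outcome loosely tied to risk

print("overall C statistic:", round(roc_auc_score(hospitalized, risk), 3))
for g in ["A", "B", "C"]:
    mask = group == g
    auc = roc_auc_score(hospitalized[mask], risk[mask])
    print(f"  group {g}: C statistic = {auc:.3f}")
```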
RoboAgent: Generalization and Efficiency in Robot Manipulation via Semantic Augmentations and Action Chunking
results: Using only 7,500 demonstrations, the system trains a single universal agent capable of 12 skills and demonstrates its generalization and versatility across diverse kitchen scenes. In unseen situations, RoboAgent outperforms prior methods while being more sample efficient. Videos: https://robopen.github.io/
Abstract
The grand aim of having a single robot that can manipulate arbitrary objects in diverse settings is at odds with the paucity of robotics datasets. Acquiring and growing such datasets is strenuous due to manual efforts, operational costs, and safety challenges. A path toward such an universal agent would require a structured framework capable of wide generalization but trained within a reasonable data budget. In this paper, we develop an efficient system (RoboAgent) for training universal agents capable of multi-task manipulation skills using (a) semantic augmentations that can rapidly multiply existing datasets and (b) action representations that can extract performant policies with small yet diverse multi-modal datasets without overfitting. In addition, reliable task conditioning and an expressive policy architecture enable our agent to exhibit a diverse repertoire of skills in novel situations specified using language commands. Using merely 7500 demonstrations, we are able to train a single agent capable of 12 unique skills, and demonstrate its generalization over 38 tasks spread across common daily activities in diverse kitchen scenes. On average, RoboAgent outperforms prior methods by over 40% in unseen situations while being more sample efficient and being amenable to capability improvements and extensions through fine-tuning. Videos at https://robopen.github.io/
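Action chunking, one of the ingredients named above, can be shown with a toy control loop in which the policy outputs a short sequence of actions and the controller replans only after the chunk is exhausted. The stand-in policy, dimensions, and fake environment are illustrative, not RoboAgent's learned policy or its semantic augmentations.

```python
# Sketch of action chunking: the policy predicts a short sequence ("chunk") of
# actions from the current observation, and the controller executes the whole
# chunk before querying the policy again. Illustrative; not RoboAgent's network.
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, ACT_DIM, CHUNK = 10, 7, 5


def chunk_policy(obs):
    """Stand-in policy: returns CHUNK actions for the current observation."""
    W = np.ones((CHUNK, ACT_DIM, OBS_DIM)) * 0.01   # placeholder "weights"
    return W @ obs                                   # shape (CHUNK, ACT_DIM)


obs = rng.normal(size=OBS_DIM)
for step in range(20):
    if step % CHUNK == 0:                 # replan only every CHUNK steps
        actions = chunk_policy(obs)
    a = actions[step % CHUNK]             # execute the pre-computed action
    obs = obs + 0.1 * rng.normal(size=OBS_DIM)   # fake environment transition
print("last action:", np.round(a, 3))
```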
A Survey on Physics Informed Reinforcement Learning: Review and Open Problems
results: The survey offers a comprehensive perspective that classifies implementations of physics-informed reinforcement learning (PIRL) into distinct categories, and identifies application areas, open challenges, and directions for future research.
Abstract
The inclusion of physical information in machine learning frameworks has revolutionized many application areas. This involves enhancing the learning process by incorporating physical constraints and adhering to physical laws. In this work we explore their utility for reinforcement learning applications. We present a thorough review of the literature on incorporating physics information, also known as physics priors, in reinforcement learning approaches, commonly referred to as physics-informed reinforcement learning (PIRL). We introduce a novel taxonomy with the reinforcement learning pipeline as the backbone to classify existing works, compare and contrast them, and derive crucial insights. Existing works are analyzed with regard to the representation/form of the governing physics modeled for integration, their specific contribution to the typical reinforcement learning architecture, and their connection to the underlying reinforcement learning pipeline stages. We also identify core learning architectures and physics incorporation biases (i.e., observational, inductive and learning) of existing PIRL approaches and use them to further categorize the works for better understanding and adaptation. By providing a comprehensive perspective on the implementation of the physics-informed capability, the taxonomy presents a cohesive approach to PIRL. It identifies the areas where this approach has been applied, as well as the gaps and opportunities that exist. Additionally, the taxonomy sheds light on unresolved issues and challenges, which can guide future research. This nascent field holds great potential for enhancing reinforcement learning algorithms by increasing their physical plausibility, precision, data efficiency, and applicability in real-world scenarios.
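As one concrete (and deliberately tiny) example of how a physics prior can enter the RL pipeline, the sketch below shapes a reward with a penalty for violating an energy-conservation relation. This illustrates just one of the many integration points the survey covers; the constants and the task reward are made up.

```python
# Tiny sketch of physics-informed reward shaping: penalise transitions whose
# kinetic + potential energy is not conserved. One illustrative integration
# point among the many surveyed (observations, models, policies, ...).
import numpy as np


def physics_informed_reward(task_reward, state, next_state, lam=10.0):
    """Subtract a penalty proportional to the energy-conservation violation."""
    def energy(s):
        pos, vel = s
        return 0.5 * vel ** 2 + 9.81 * pos      # toy: unit mass, height `pos`
    violation = abs(energy(next_state) - energy(state))
    return task_reward - lam * violation


r = physics_informed_reward(task_reward=1.0, state=(1.0, 0.0), next_state=(0.9, 1.4))
print(r)   # near-conserving transition => only a small penalty is subtracted
```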
Extended Symmetry Preserving Attention Networks for LHC Analysis
results: Using the extended SPANet yields significant improvements in semi-leptonic analyses, with detailed results reported for three representative studies.
Abstract
Reconstructing unstable heavy particles requires sophisticated techniques to sift through the large number of possible permutations for assignment of detector objects to partons. An approach based on a generalized attention mechanism, symmetry preserving attention networks (SPANet), has been previously applied to top quark pair decays at the Large Hadron Collider, which produce six hadronic jets. Here we extend the SPANet architecture to consider multiple input streams, such as leptons, as well as global event features, such as the missing transverse momentum. In addition, we provide regression and classification outputs to supplement the parton assignment. We explore the performance of the extended capability of SPANet in the context of semi-leptonic decays of top quark pairs as well as top quark pairs produced in association with a Higgs boson. We find significant improvements in the power of three representative studies: search for ttH, measurement of the top quark mass and a search for a heavy Z' decaying to top quark pairs. We present ablation studies to provide insight on what the network has learned in each case.
Task Generalization with Stability Guarantees via Elastic Dynamical System Motion Policies
results: Across numerous simulated and real-robot experiments, Elastic-DS demonstrates strong flexibility and generalization while preserving control-theoretic guarantees. Supplementary videos: https://sites.google.com/view/elastic-ds
Abstract
Dynamical System (DS) based Learning from Demonstration (LfD) allows learning of reactive motion policies with stability and convergence guarantees from a few trajectories. Yet, current DS learning techniques lack the flexibility to generalize to new task instances as they ignore explicit task parameters that inherently change the underlying trajectories. In this work, we propose Elastic-DS, a novel DS learning, and generalization approach that embeds task parameters into the Gaussian Mixture Model (GMM) based Linear Parameter Varying (LPV) DS formulation. Central to our approach is the Elastic-GMM, a GMM constrained to SE(3) task-relevant frames. Given a new task instance/context, the Elastic-GMM is transformed with Laplacian Editing and used to re-estimate the LPV-DS policy. Elastic-DS is compositional in nature and can be used to construct flexible multi-step tasks. We showcase its strength on a myriad of simulated and real-robot experiments while preserving desirable control-theoretic guarantees. Supplementary videos can be found at https://sites.google.com/view/elastic-ds
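The GMM-based LPV-DS form referenced above can be written out directly: the velocity is a responsibility-weighted combination of linear systems, x_dot = sum_k gamma_k(x) (A_k x + b_k). The two-component parameters below are hand-picked toy values rather than ones learned from demonstrations, and the sketch omits the Elastic-GMM task-frame constraints and Laplacian editing.

```python
# Sketch of a GMM-based LPV-DS: x_dot = sum_k gamma_k(x) * (A_k x + b_k),
# where gamma_k are GMM responsibilities. Toy, hand-picked parameters.
import numpy as np


def gaussian_pdf(x, mu, cov):
    d = len(mu)
    diff = x - mu
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff) / norm


# Two-component toy GMM and per-component stable linear dynamics.
mus = [np.array([1.0, 1.0]), np.array([-1.0, -1.0])]
covs = [np.eye(2), np.eye(2)]
A = [np.array([[-1.0, 0.0], [0.0, -1.0]]), np.array([[-0.5, 0.5], [-0.5, -0.5]])]
b = [np.zeros(2), np.zeros(2)]


def lpv_ds_velocity(x):
    w = np.array([gaussian_pdf(x, m, c) for m, c in zip(mus, covs)])
    gamma = w / w.sum()                               # GMM responsibilities
    return sum(g * (Ak @ x + bk) for g, Ak, bk in zip(gamma, A, b))


# Roll the system out from an initial point; it converges toward the origin.
x = np.array([2.0, -1.5])
for _ in range(200):
    x = x + 0.05 * lpv_ds_velocity(x)
print("final state:", np.round(x, 3))
```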