cs.LG - 2023-11-06

CAFE: Carbon-Aware Federated Learning in Geographically Distributed Data Centers

  • paper_url: http://arxiv.org/abs/2311.03615
  • repo_url: None
  • paper_authors: Jieming Bian, Shaolei Ren, Jie Xu
  • for: This work studies the challenges of training artificial intelligence (AI) models across geographically distributed (geo-distributed) data centers and seeks methods that balance learning performance against environmental impact.
  • methods: The paper proposes a new framework called CAFE (Carbon-Aware Federated Learning) that optimizes training within a fixed carbon footprint budget to maximize learning performance while reducing environmental impact.
  • results: Extensive simulations on real-world carbon intensity data demonstrate the effectiveness of the algorithm and its superiority in maximizing learning performance while minimizing environmental impact.
    Abstract Training large-scale artificial intelligence (AI) models demands significant computational power and energy, leading to increased carbon footprint with potential environmental repercussions. This paper delves into the challenges of training AI models across geographically distributed (geo-distributed) data centers, emphasizing the balance between learning performance and carbon footprint. We consider Federated Learning (FL) as a solution, which prioritizes model parameter exchange over raw data, ensuring data privacy and compliance with local regulations. Given the variability in carbon intensity across regions, we propose a new framework called CAFE (short for Carbon-Aware Federated Learning) to optimize training within a fixed carbon footprint budget. Our approach incorporates coreset selection to assess learning performance, employs the Lyapunov drift-plus-penalty framework to address the unpredictability of future carbon intensity, and devises an efficient algorithm to address the combinatorial complexity of the data center selection. Through extensive simulations using real-world carbon intensity data, we demonstrate the efficacy of our algorithm, highlighting its superiority over existing methods in optimizing learning performance while minimizing environmental impact.
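The Lyapunov drift-plus-penalty idea mentioned in the abstract can be illustrated with a minimal sketch: a virtual queue tracks how far cumulative carbon use has exceeded a long-term budget, and each round a data center is used only if its queue-weighted carbon cost is outweighed by its (scaled) learning utility. All quantities below (budget, trade-off weight V, random carbon intensities and utilities) are illustrative assumptions, and the greedy per-center selection is only a stand-in for the paper's combinatorial selection algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D = 200, 5                  # rounds, candidate data centers (assumed)
budget_per_round = 1.0         # long-term average carbon budget (assumed units)
V = 10.0                       # Lyapunov trade-off weight: higher favors utility

Q = 0.0                        # virtual queue tracking cumulative budget overuse
for _ in range(T):
    carbon = rng.uniform(0.2, 1.5, size=D)    # carbon intensity per center this round
    utility = rng.uniform(0.5, 1.0, size=D)   # proxy learning utility (e.g., coreset score)

    # Drift-plus-penalty score per center: Q * carbon cost - V * utility.
    # Use the centers with negative score (greedy stand-in for the paper's selection).
    score = Q * carbon - V * utility
    selected = np.where(score < 0)[0]

    Q = max(Q + carbon[selected].sum() - budget_per_round, 0.0)  # virtual-queue update

print(f"final virtual queue (cumulative budget debt proxy): {Q:.2f}")
```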

Plug-and-Play Stability for Intracortical Brain-Computer Interfaces: A One-Year Demonstration of Seamless Brain-to-Text Communication

  • paper_url: http://arxiv.org/abs/2311.03611
  • repo_url: https://github.com/cffan/corp
  • paper_authors: Chaofei Fan, Nick Hahn, Foram Kamdar, Donald Avansino, Guy H. Wilson, Leigh Hochberg, Krishna V. Shenoy, Jaimie M. Henderson, Francis R. Willett
  • for: This paper addresses the long-term stability problem of iBCI systems so that they can maintain high performance over time.
  • methods: The paper proposes a method that uses large language models (LMs) to automatically correct errors in iBCI outputs and achieves long-term stability by updating the iBCI decoder online.
  • results: In a 403-day study with one clinical trial participant, the method maintained stable, high performance on an online handwriting iBCI task, outperforming other baseline methods. This is among the longest-running iBCI stability demonstrations to date.
    Abstract Intracortical brain-computer interfaces (iBCIs) have shown promise for restoring rapid communication to people with neurological disorders such as amyotrophic lateral sclerosis (ALS). However, to maintain high performance over time, iBCIs typically need frequent recalibration to combat changes in the neural recordings that accrue over days. This requires iBCI users to stop using the iBCI and engage in supervised data collection, making the iBCI system hard to use. In this paper, we propose a method that enables self-recalibration of communication iBCIs without interrupting the user. Our method leverages large language models (LMs) to automatically correct errors in iBCI outputs. The self-recalibration process uses these corrected outputs ("pseudo-labels") to continually update the iBCI decoder online. Over a period of more than one year (403 days), we evaluated our Continual Online Recalibration with Pseudo-labels (CORP) framework with one clinical trial participant. CORP achieved a stable decoding accuracy of 93.84% in an online handwriting iBCI task, significantly outperforming other baseline methods. Notably, this is the longest-running iBCI stability demonstration involving a human participant. Our results provide the first evidence for long-term stabilization of a plug-and-play, high-performance communication iBCI, addressing a major barrier for the clinical translation of iBCIs.
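As a rough illustration of the recalibration loop described above, the sketch below updates a linear decoder online with pseudo-labels. The `language_model_correct` function is a placeholder for the LM-based error correction, and the synthetic features, class count, and `SGDClassifier` decoder are assumptions rather than the CORP implementation (see the linked repository for the real one).

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
n_features, n_classes = 64, 26                 # e.g., neural features -> characters (assumed)

decoder = SGDClassifier(random_state=0)
# Supervised calibration on an initial labeled session.
X0 = rng.normal(size=(500, n_features))
y0 = rng.integers(0, n_classes, size=500)
decoder.partial_fit(X0, y0, classes=np.arange(n_classes))

def language_model_correct(raw_pred):
    # Placeholder for the LM-based error correction in the paper;
    # here it simply returns the raw predictions unchanged.
    return raw_pred

# Unsupervised sessions: decode, correct with the LM, update on the pseudo-labels.
for _ in range(10):
    X = rng.normal(size=(100, n_features))
    pseudo = language_model_correct(decoder.predict(X))
    decoder.partial_fit(X, pseudo)
```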

Testing RadiX-Nets: Advances in Viable Sparse Topologies

  • paper_url: http://arxiv.org/abs/2311.03609
  • repo_url: None
  • paper_authors: Kevin Kwak, Zack West, Hayden Jananthan, Jeremy Kepner
  • for: This paper investigates the performance and scalability of RadiX-Nets for use in large-scale data processing.
  • methods: The paper tests RadiX-Net performance in TensorFlow and examines how network topology, initialization, and training methods affect RadiX-Nets.
  • results: The paper finds "strange models" that train inconsistently and to lower accuracy, while models of similar sparsity train well.
    Abstract The exponential growth of data has sparked computational demands on ML research and industry use. Sparsification of hyper-parametrized deep neural networks (DNNs) creates simpler representations of complex data. Past research has shown that some sparse networks achieve similar performance as dense ones, reducing runtime and storage. RadiX-Nets, a subgroup of sparse DNNs, maintain uniformity which counteracts their lack of neural connections. Generation, independent of a dense network, yields faster asymptotic training and removes the need for costly pruning. However, little work has been done on RadiX-Nets, making testing challenging. This paper presents a testing suite for RadiX-Nets in TensorFlow. We test RadiX-Net performance to streamline processing in scalable models, revealing relationships between network topology, initialization, and training behavior. We also encounter "strange models" that train inconsistently and to lower accuracy while models of similar sparsity train well.

Generative Diffusion Models for Lattice Field Theory

  • paper_url: http://arxiv.org/abs/2311.03578
  • repo_url: None
  • paper_authors: Lingxiao Wang, Gert Aarts, Kai Zhou
  • for: This study explores the connection between machine learning and lattice field theory by linking generative diffusion models (DMs) with stochastic quantization, from a stochastic differential equation (SDE) perspective.
  • methods: We show that DMs can be conceptualized by reversing a stochastic process driven by the Langevin equation, producing samples from an initial distribution that approximate the target distribution. In a toy model, we highlight the capability of DMs to learn effective actions.
  • results: We also demonstrate that DMs can act as a global sampler, generating configurations in the two-dimensional $\phi^4$ quantum lattice field theory.
    Abstract This study delves into the connection between machine learning and lattice field theory by linking generative diffusion models (DMs) with stochastic quantization, from a stochastic differential equation perspective. We show that DMs can be conceptualized by reversing a stochastic process driven by the Langevin equation, which then produces samples from an initial distribution to approximate the target distribution. In a toy model, we highlight the capability of DMs to learn effective actions. Furthermore, we demonstrate its feasibility to act as a global sampler for generating configurations in the two-dimensional $\phi^4$ quantum lattice field theory.
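The forward process that the paper reverses is ordinary Langevin dynamics (stochastic quantization) for the lattice action. A minimal sketch of that forward sampler for the two-dimensional $\phi^4$ theory is given below; the action parameters, lattice size, and step size are illustrative assumptions, and the diffusion model itself (the learned reverse process) is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
L, m2, lam = 16, -1.0, 1.0        # lattice size, bare mass^2, quartic coupling (assumed)
eps, n_steps = 1e-3, 5000         # Langevin step size and number of steps (assumed)

phi = rng.normal(size=(L, L))

def minus_dS_dphi(phi):
    # -dS/dphi for S = sum_x [ 1/2 (grad phi)^2 + 1/2 m2 phi^2 + lam/4 phi^4 ]
    lap = (np.roll(phi, 1, 0) + np.roll(phi, -1, 0) +
           np.roll(phi, 1, 1) + np.roll(phi, -1, 1) - 4.0 * phi)
    return lap - m2 * phi - lam * phi**3

# Euler-Maruyama discretization of the Langevin SDE (stochastic quantization).
for _ in range(n_steps):
    phi += eps * minus_dS_dphi(phi) + np.sqrt(2.0 * eps) * rng.normal(size=phi.shape)

print("mean |phi| after sampling:", float(np.abs(phi).mean()))
```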

A Graph-Theoretic Framework for Understanding Open-World Semi-Supervised Learning

  • paper_url: http://arxiv.org/abs/2311.03524
  • repo_url: https://github.com/deeplearning-wisc/sorl
  • paper_authors: Yiyou Sun, Zhenmei Shi, Yixuan Li
  • for: This paper targets open-world semi-supervised learning, aiming to infer both known and novel classes in unlabeled data by leveraging prior knowledge from labeled sets.
  • methods: The paper formalizes a graph-theoretic framework for open-world clustering and provides practical algorithms with theoretical guarantees. Based on this graph formulation, it applies the Spectral Open-world Representation Learning (SORL) algorithm and shows that minimizing its loss is equivalent to performing spectral decomposition on the graph.
  • results: Experiments show that SORL matches or outperforms several strong baselines while enjoying theoretical guarantees, making it appealing for practical use.
    Abstract Open-world semi-supervised learning aims at inferring both known and novel classes in unlabeled data, by harnessing prior knowledge from a labeled set with known classes. Despite its importance, there is a lack of theoretical foundations for this problem. This paper bridges the gap by formalizing a graph-theoretic framework tailored for the open-world setting, where the clustering can be theoretically characterized by graph factorization. Our graph-theoretic framework illuminates practical algorithms and provides guarantees. In particular, based on our graph formulation, we apply the algorithm called Spectral Open-world Representation Learning (SORL), and show that minimizing our loss is equivalent to performing spectral decomposition on the graph. Such equivalence allows us to derive a provable error bound on the clustering performance for both known and novel classes, and analyze rigorously when labeled data helps. Empirically, SORL can match or outperform several strong baselines on common benchmark datasets, which is appealing for practical usage while enjoying theoretical guarantees.
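The central claim, that minimizing the SORL loss amounts to a spectral decomposition of an affinity graph, can be illustrated by doing that decomposition directly: the sketch below builds a toy affinity matrix, takes its top eigenvectors as an embedding, and clusters them. The Gaussian affinity, the three toy clusters, and the k-means step are assumptions for illustration, not the paper's construction.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Toy data with 3 clusters standing in for known + novel classes.
X = np.vstack([rng.normal(c, 0.3, size=(30, 2)) for c in ([0, 0], [4, 0], [2, 4])])
d2 = ((X[:, None] - X[None]) ** 2).sum(-1)
A = np.exp(-d2)                                    # affinity / adjacency matrix

deg = A.sum(1)
A_norm = A / np.sqrt(deg[:, None] * deg[None, :])  # symmetric degree normalization

# Top eigenvectors of the normalized adjacency give the spectral embedding.
vals, vecs = np.linalg.eigh(A_norm)
embedding = vecs[:, -3:]                           # one direction per class

clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(embedding)
print("cluster sizes:", np.bincount(clusters))
```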

The Fairness Stitch: Unveiling the Potential of Model Stitching in Neural Network De-Biasing

  • paper_url: http://arxiv.org/abs/2311.03532
  • repo_url: https://github.com/modar7/the_fairness_stitch
  • paper_authors: Modar Sulaiman, Kallol Roy
  • for: Improving the fairness of machine learning models.
  • methods: Combines model stitching and training jointly, while incorporating fairness constraints.
  • results: A comprehensive evaluation on two well-known datasets, CelebA and UTKFace, with a systematic comparison against the existing baseline, shows a notable improvement in the trade-off between fairness and performance, highlighting the promising potential of the method to address bias-related challenges and foster equitable outcomes in machine learning models.
    Abstract The pursuit of fairness in machine learning models has emerged as a critical research challenge in different applications ranging from bank loan approval to face detection. Despite the widespread adoption of artificial intelligence algorithms across various domains, concerns persist regarding the presence of biases and discrimination within these models. To address this pressing issue, this study introduces a novel method called "The Fairness Stitch (TFS)" to enhance fairness in deep learning models. This method combines model stitching and training jointly, while incorporating fairness constraints. In this research, we assess the effectiveness of our proposed method by conducting a comprehensive evaluation of two well-known datasets, CelebA and UTKFace. We systematically compare the performance of our approach with the existing baseline method. Our findings reveal a notable improvement in achieving a balanced trade-off between fairness and performance, highlighting the promising potential of our method to address bias-related challenges and foster equitable outcomes in machine learning models. This paper poses a challenge to the conventional wisdom of the effectiveness of the last layer in deep learning models for de-biasing.
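A very loose sketch of the stitching-with-fairness-constraint idea appears below: a frozen backbone stands in for a pretrained model, and a single trainable stitching layer is optimized with a cross-entropy loss plus a demographic-parity penalty. The synthetic data, binary sensitive attribute, penalty weight `lam`, and the choice of demographic parity as the constraint are all assumptions; the actual TFS procedure is in the linked repository.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n, d = 512, 32
X = torch.randn(n, d)
group = torch.randint(0, 2, (n,)).float()             # sensitive attribute (assumed binary)
y = ((X[:, 0] + 0.5 * group + 0.1 * torch.randn(n)) > 0).float()

backbone = nn.Sequential(nn.Linear(d, 16), nn.ReLU())  # stands in for a pretrained, frozen model
for p in backbone.parameters():
    p.requires_grad_(False)

stitch = nn.Linear(16, 1)                              # trainable stitching layer
opt = torch.optim.Adam(stitch.parameters(), lr=1e-2)
bce = nn.BCEWithLogitsLoss()
lam = 1.0                                              # fairness penalty weight (assumed)

for _ in range(300):
    logits = stitch(backbone(X)).squeeze(-1)
    p_hat = torch.sigmoid(logits)
    # Demographic-parity gap as a soft fairness constraint on the stitched model.
    gap = (p_hat[group == 1].mean() - p_hat[group == 0].mean()).abs()
    loss = bce(logits, y) + lam * gap
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final demographic-parity gap: {gap.item():.3f}")
```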

Asynchronous Local Computations in Distributed Bayesian Learning

  • paper_url: http://arxiv.org/abs/2311.03496
  • repo_url: None
  • paper_authors: Kinjal Bhar, He Bai, Jemin George, Carl Busart
  • for: This work proposes a gossip-based distributed machine learning algorithm that improves computational efficiency and reduces communication overhead.
  • methods: The study uses Bayesian sampling via the unadjusted Langevin algorithm (ULA) MCMC for local computations, with asynchronous communication among the active agents.
  • results: On a toy problem and on real-world datasets, the algorithm shows faster initial convergence and improved accuracy, especially in the low-data regime, achieving on average 78% and over 90% classification accuracy on the Gamma Telescope and mHealth datasets, respectively.
    Abstract Due to the expanding scope of machine learning (ML) to the fields of sensor networking, cooperative robotics and many other multi-agent systems, distributed deployment of inference algorithms has received a lot of attention. These algorithms involve collaboratively learning unknown parameters from dispersed data collected by multiple agents. There are two competing aspects in such algorithms, namely, intra-agent computation and inter-agent communication. Traditionally, algorithms are designed to perform both synchronously. However, certain circumstances need frugal use of communication channels as they are either unreliable, time-consuming, or resource-expensive. In this paper, we propose gossip-based asynchronous communication to leverage fast computations and reduce communication overhead simultaneously. We analyze the effects of multiple (local) intra-agent computations by the active agents between successive inter-agent communications. For local computations, Bayesian sampling via unadjusted Langevin algorithm (ULA) MCMC is utilized. The communication is assumed to be over a connected graph (e.g., as in decentralized learning), however, the results can be extended to coordinated communication where there is a central server (e.g., federated learning). We theoretically quantify the convergence rates in the process. To demonstrate the efficacy of the proposed algorithm, we present simulations on a toy problem as well as on real world data sets to train ML models to perform classification tasks. We observe faster initial convergence and improved performance accuracy, especially in the low data range. We achieve on average 78% and over 90% classification accuracy respectively on the Gamma Telescope and mHealth data sets from the UCI ML repository.
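To make the "local ULA computations between communications" idea concrete, here is a simplified (and, unlike the paper, synchronized) sketch: each agent takes several unadjusted Langevin steps on its local posterior and then gossips parameters with neighbors on a ring graph. The Gaussian-mean model, ring topology, step sizes, and the use of purely local gradients are assumptions for illustration, not the paper's algorithm or analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, d = 4, 2
# Each agent holds a shard of data from a Gaussian with unknown mean (assumed model).
data = [rng.normal(loc=[1.0, -1.0], scale=1.0, size=(50, d)) for _ in range(n_agents)]

# Ring communication graph with uniform gossip weights.
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i - 1) % n_agents] = 0.25
    W[i, (i + 1) % n_agents] = 0.25

def grad_log_post(theta, X):
    # N(0, I) prior and unit-variance Gaussian likelihood on the local shard.
    return -theta + (X - theta).sum(axis=0)

theta = rng.normal(size=(n_agents, d))
step, local_steps = 1e-3, 5
for _ in range(1000):
    # Multiple local ULA computations between communications.
    for _ in range(local_steps):
        for i in range(n_agents):
            theta[i] += step * grad_log_post(theta[i], data[i]) \
                        + np.sqrt(2 * step) * rng.normal(size=d)
    theta = W @ theta            # one gossip averaging round

print("agent estimates of the mean:\n", theta.round(2))
```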

Leveraging High-Level Synthesis and Large Language Models to Generate, Simulate, and Deploy a Uniform Random Number Generator Hardware Design

  • paper_url: http://arxiv.org/abs/2311.03489
  • repo_url: None
  • paper_authors: James T. Meech
  • for: The paper is written for generating hardware designs using large language models and open-source tools.
  • methods: The paper presents a new high-level synthesis methodology that uses exclusively open-source tools, excluding the large language model, to generate hardware designs.
  • results: The paper presents a case study of generating a permuted congruential random number generator design with a wishbone interface, and verifies the functionality and quality of the design using large language model-generated simulations and the Dieharder randomness test suite.
    Abstract We present a new high-level synthesis methodology for using large language model tools to generate hardware designs. The methodology uses exclusively open-source tools excluding the large language model. As a case study, we use our methodology to generate a permuted congruential random number generator design with a wishbone interface. We verify the functionality and quality of the random number generator design using large language model-generated simulations and the Dieharder randomness test suite. We document all the large language model chat logs, Python scripts, Verilog scripts, and simulation results used in the case study. We believe that our method of hardware design generation coupled with the open source silicon 130 nm design tools will revolutionize application-specific integrated circuit design. Our methodology significantly lowers the bar to entry when building domain-specific computing accelerators for the Internet of Things and proof of concept prototypes for later fabrication in more modern process nodes.
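For readers unfamiliar with the design being generated, here is a plain-Python software reference model of a permuted congruential generator (the widely documented PCG32 / XSH-RR variant). It is only meant to show what the hardware computes each cycle; the paper's actual artifact is Verilog with a Wishbone interface produced via the LLM/HLS flow, and the seed values below are arbitrary.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1
MULT = 6364136223846793005

class PCG32:
    """Software reference model of PCG32 (XSH-RR output permutation)."""

    def __init__(self, seed=42, inc=54):
        self.state = 0
        self.inc = ((inc << 1) | 1) & MASK64   # stream selector must be odd
        self.next()
        self.state = (self.state + seed) & MASK64
        self.next()

    def next(self):
        old = self.state
        self.state = (old * MULT + self.inc) & MASK64      # LCG state transition
        xorshifted = (((old >> 18) ^ old) >> 27) & MASK32   # xorshift high bits
        rot = old >> 59                                      # rotation amount (0..31)
        return ((xorshifted >> rot) | (xorshifted << ((-rot) & 31))) & MASK32

rng = PCG32()
print([rng.next() for _ in range(5)])
```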

Uni-O4: Unifying Online and Offline Deep Reinforcement Learning with Multi-Step On-Policy Optimization

  • paper_url: http://arxiv.org/abs/2311.03351
  • repo_url: None
  • paper_authors: Kun Lei, Zhengmao He, Chenhao Lu, Kaizhe Hu, Yang Gao, Huazhe Xu
  • for: This work studies how to unify offline and online reinforcement learning to achieve efficient and safe learning.
  • methods: The paper proposes Uni-O4, which uses an on-policy objective for both offline and online learning so that the agent can transfer between the two phases seamlessly. In the offline phase, Uni-O4 leverages diverse ensemble policies to address the mismatch between the estimated behavior policy and the offline dataset.
  • results: The study shows that with Uni-O4, offline and online learning work together to yield superior offline initialization as well as stable and rapid online fine-tuning. Comprehensive evaluations on real-world robot tasks and numerous simulated benchmarks demonstrate state-of-the-art performance in both offline and offline-to-online learning.
    Abstract Combining offline and online reinforcement learning (RL) is crucial for efficient and safe learning. However, previous approaches treat offline and online learning as separate procedures, resulting in redundant designs and limited performance. We ask: Can we achieve straightforward yet effective offline and online learning without introducing extra conservatism or regularization? In this study, we propose Uni-o4, which utilizes an on-policy objective for both offline and online learning. Owning to the alignment of objectives in two phases, the RL agent can transfer between offline and online learning seamlessly. This property enhances the flexibility of the learning paradigm, allowing for arbitrary combinations of pretraining, fine-tuning, offline, and online learning. In the offline phase, specifically, Uni-o4 leverages diverse ensemble policies to address the mismatch issues between the estimated behavior policy and the offline dataset. Through a simple offline policy evaluation (OPE) approach, Uni-o4 can achieve multi-step policy improvement safely. We demonstrate that by employing the method above, the fusion of these two paradigms can yield superior offline initialization as well as stable and rapid online fine-tuning capabilities. Through real-world robot tasks, we highlight the benefits of this paradigm for rapid deployment in challenging, previously unseen real-world environments. Additionally, through comprehensive evaluations using numerous simulated benchmarks, we substantiate that our method achieves state-of-the-art performance in both offline and offline-to-online fine-tuning learning. Our website: https://lei-kun.github.io/uni-o4/ .

Learning Hard-Constrained Models with One Sample

  • paper_url: http://arxiv.org/abs/2311.03332
  • repo_url: None
  • paper_authors: Andreas Galanis, Alkis Kalavasis, Anthimos Vardis Kandiros
  • for: This work studies estimating the parameters of a hard-constrained Markov random field from a single sample, with applications to the $k$-SAT, proper coloring, and general $H$-coloring models.
  • methods: The paper uses the pseudo-likelihood estimator and coupling techniques to derive variance bounds.
  • results: The paper obtains both positive results, including linear-time estimators for $q$-coloring and $H$-coloring models, and negative results, such as non-identifiability regimes for the $k$-SAT model.
    Abstract We consider the problem of estimating the parameters of a Markov Random Field with hard-constraints using a single sample. As our main running examples, we use the $k$-SAT and the proper coloring models, as well as general $H$-coloring models; for all of these we obtain both positive and negative results. In contrast to the soft-constrained case, we show in particular that single-sample estimation is not always possible, and that the existence of an estimator is related to the existence of non-satisfiable instances. Our algorithms are based on the pseudo-likelihood estimator. We show variance bounds for this estimator using coupling techniques inspired, in the case of $k$-SAT, by Moitra's sampling algorithm (JACM, 2019); our positive results for colorings build on this new coupling approach. For $q$-colorings on graphs with maximum degree $d$, we give a linear-time estimator when $q>d+1$, whereas the problem is non-identifiable when $q\leq d+1$. For general $H$-colorings, we show that standard conditions that guarantee sampling, such as Dobrushin's condition, are insufficient for one-sample learning; on the positive side, we provide a general condition that is sufficient to guarantee linear-time learning and obtain applications for proper colorings and permissive models. For the $k$-SAT model on formulas with maximum degree $d$, we provide a linear-time estimator when $k\gtrsim 6.45\log d$, whereas the problem becomes non-identifiable when $k\lesssim \log d$.

Practical considerations for variable screening in the Super Learner

  • paper_url: http://arxiv.org/abs/2311.03313
  • repo_url: https://github.com/bdwilliamson/sl_screening_supplementary
  • paper_authors: Brian D. Williamson, Drew King, Ying Huang
  • for: The paper examines the use of the Super Learner ensemble together with variable screening algorithms for dimension reduction.
  • methods: The paper uses the Super Learner ensemble with variable screening algorithms to reduce dimensionality before fitting other prediction algorithms.
  • results: Empirical results suggest that using a diverse set of candidate screening algorithms protects predictor performance, analogous to the guidance for choosing a library of prediction algorithms.
    Abstract Estimating a prediction function is a fundamental component of many data analyses. The Super Learner ensemble, a particular implementation of stacking, has desirable theoretical properties and has been used successfully in many applications. Dimension reduction can be accomplished by using variable screening algorithms, including the lasso, within the ensemble prior to fitting other prediction algorithms. However, the performance of a Super Learner using the lasso for dimension reduction has not been fully explored in cases where the lasso is known to perform poorly. We provide empirical results that suggest that a diverse set of candidate screening algorithms should be used to protect against poor performance of any one screen, similar to the guidance for choosing a library of prediction algorithms for the Super Learner.
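A minimal sketch of lasso-based variable screening ahead of another learner, using scikit-learn, is shown below; the synthetic regression data, the random forest as the downstream learner, and the specific pipeline are illustrative assumptions rather than the Super Learner implementation used in the paper.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=300, n_features=200, n_informative=10,
                       noise=5.0, random_state=0)

# Screen variables with the lasso, then fit a downstream learner on the survivors.
screen = SelectFromModel(LassoCV(cv=5, random_state=0))
pipe = make_pipeline(screen, RandomForestRegressor(random_state=0))
print("lasso-screened CV R^2:", cross_val_score(pipe, X, y, cv=5).mean().round(3))
# The paper's advice: include several different screens in the library so that no
# single screen's failure mode (e.g., the lasso's) dominates performance.
```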

TS-Diffusion: Generating Highly Complex Time Series with Diffusion Models

  • paper_url: http://arxiv.org/abs/2311.03303
  • repo_url: None
  • paper_authors: Yangming Li
  • for: Handling complex time-series data with sampling irregularities, missing values, and large feature-temporal dimensions.
  • methods: Proposes a general model, TS-Diffusion, built on a point-process framework, that handles sampling irregularities, missing values, and high-dimensional features.
  • results: Extensive experiments on multiple time-series datasets show that TS-Diffusion performs excellently on both conventional and complex time series, surpassing previous baselines.
    Abstract While current generative models have achieved promising performances in time-series synthesis, they either make strong assumptions on the data format (e.g., regularities) or rely on pre-processing approaches (e.g., interpolations) to simplify the raw data. In this work, we consider a class of time series with three common bad properties, including sampling irregularities, missingness, and large feature-temporal dimensions, and introduce a general model, TS-Diffusion, to process such complex time series. Our model consists of three parts under the framework of point process. The first part is an encoder of the neural ordinary differential equation (ODE) that converts time series into dense representations, with the jump technique to capture sampling irregularities and self-attention mechanism to handle missing values; The second component of TS-Diffusion is a diffusion model that learns from the representation of time series. These time-series representations can have a complex distribution because of their high dimensions; The third part is a decoder of another ODE that generates time series with irregularities and missing values given their representations. We have conducted extensive experiments on multiple time-series datasets, demonstrating that TS-Diffusion achieves excellent results on both conventional and complex time series and significantly outperforms previous baselines.

Risk of Transfer Learning and its Applications in Finance

  • paper_url: http://arxiv.org/abs/2311.03283
  • repo_url: None
  • paper_authors: Haoyang Cao, Haotian Gu, Xin Guo, Mathieu Rosenbaum
  • for: This paper proposes a novel notion of transfer risk for evaluating the transferability of transfer learning.
  • methods: The paper applies transfer learning techniques and the transfer risk concept to stock return prediction and portfolio optimization problems.
  • results: Numerical results show a strong correlation between transfer risk and overall transfer learning performance; transfer risk provides a computationally efficient way to identify appropriate source tasks for transfer learning.
    Abstract Transfer learning is an emerging and popular paradigm for utilizing existing knowledge from previous learning tasks to improve the performance of new ones. In this paper, we propose a novel concept of transfer risk and and analyze its properties to evaluate transferability of transfer learning. We apply transfer learning techniques and this concept of transfer risk to stock return prediction and portfolio optimization problems. Numerical results demonstrate a strong correlation between transfer risk and overall transfer learning performance, where transfer risk provides a computationally efficient way to identify appropriate source tasks in transfer learning, including cross-continent, cross-sector, and cross-frequency transfer for portfolio optimization.

Discretizing Numerical Attributes: An Analysis of Human Perceptions

  • paper_url: http://arxiv.org/abs/2311.03278
  • repo_url: None
  • paper_authors: Minakshi Kaushik, Rahul Sharma, Dirk Draheim
  • for: The goal of this work is to establish a benchmark approach for partitioning numerical attributes.
  • methods: The study analyzes human perceptions of partitioning a numerical attribute and uses numerical data visualization techniques with experts in data science, statistics, and engineering to assess the partitioning approaches.
  • results: The study finds that 68.7% of human responses closely align with the values generated by the two proposed measures, suggesting these measures may serve as an effective method for discretizing numerical attributes.
    Abstract Machine learning (ML) has employed various discretization methods to partition numerical attributes into intervals. However, an effective discretization technique remains elusive in many ML applications, such as association rule mining. Moreover, the existing discretization techniques do not reflect best the impact of the independent numerical factor on the dependent numerical target factor. This research aims to establish a benchmark approach for numerical attribute partitioning. We conduct an extensive analysis of human perceptions of partitioning a numerical attribute and compare these perceptions with the results obtained from our two proposed measures. We also examine the perceptions of experts in data science, statistics, and engineering by employing numerical data visualization techniques. The analysis of collected responses reveals that $68.7\%$ of human responses approximately closely align with the values generated by our proposed measures. Based on these findings, our proposed measures may be used as one of the methods for discretizing the numerical attributes.

Exploiting Latent Attribute Interaction with Transformer on Heterogeneous Information Networks

  • paper_url: http://arxiv.org/abs/2311.03275
  • repo_url: None
  • paper_authors: Zeyuan Zhao, Qingqing Ge, Anfeng Cheng, Yiding Liu, Xiang Li, Shuaiqiang Wang
  • for: This paper proposes a new heterogeneous graph model (MULAN) for handling graphs with diverse node types.
  • methods: The model has two main components: a type-aware encoder and a dimension-aware encoder. The type-aware encoder better exploits node type information, while the dimension-aware encoder captures the latent interactions among diverse node features.
  • results: Extensive experiments on six heterogeneous benchmark datasets show that MULAN outperforms other state-of-the-art models while also being efficient.
    Abstract Heterogeneous graph neural networks (HGNNs) have recently shown impressive capability in modeling heterogeneous graphs that are ubiquitous in real-world applications. Due to the diversity of attributes of nodes in different types, most existing models first align nodes by mapping them into the same low-dimensional space. However, in this way, they lose the type information of nodes. In addition, most of them only consider the interactions between nodes while neglecting the high-order information behind the latent interactions among different node features. To address these problems, in this paper, we propose a novel heterogeneous graph model MULAN, including two major components, i.e., a type-aware encoder and a dimension-aware encoder. Specifically, the type-aware encoder compensates for the loss of node type information and better leverages graph heterogeneity in learning node representations. Built upon transformer architecture, the dimension-aware encoder is capable of capturing the latent interactions among the diverse node features. With these components, the information of graph heterogeneity, node features and graph structure can be comprehensively encoded in node representations. We conduct extensive experiments on six heterogeneous benchmark datasets, which demonstrates the superiority of MULAN over other state-of-the-art competitors and also shows that MULAN is efficient.

Parameter-Agnostic Optimization under Relaxed Smoothness

  • paper_url: http://arxiv.org/abs/2311.03252
  • repo_url: None
  • paper_authors: Florian Hübler, Junchi Yang, Xiang Li, Niao He
  • for: The goal of this work is to improve the efficiency of training machine learning models and to achieve parameter-agnostic optimization that requires no knowledge of problem-specific parameters.
  • methods: The study analyzes Normalized Stochastic Gradient Descent with Momentum (NSGD-M) and introduces a new theoretical framework of lower bounds tailored to such parameter-agnostic schemes.
  • results: NSGD-M achieves a (nearly) rate-optimal complexity for $(L_0, L_1)$-smooth functions without prior knowledge of any problem parameter, at the cost of an exponential term dependent on $L_1$. In deterministic settings, Gradient Descent with Backtracking Line Search can neutralize this exponential factor. Empirical experiments confirm the theoretical insights.
    Abstract Tuning hyperparameters, such as the stepsize, presents a major challenge of training machine learning models. To address this challenge, numerous adaptive optimization algorithms have been developed that achieve near-optimal complexities, even when stepsizes are independent of problem-specific parameters, provided that the loss function is $L$-smooth. However, as the assumption is relaxed to the more realistic $(L_0, L_1)$-smoothness, all existing convergence results still necessitate tuning of the stepsize. In this study, we demonstrate that Normalized Stochastic Gradient Descent with Momentum (NSGD-M) can achieve a (nearly) rate-optimal complexity without prior knowledge of any problem parameter, though this comes at the cost of introducing an exponential term dependent on $L_1$ in the complexity. We further establish that this exponential term is inevitable to such schemes by introducing a theoretical framework of lower bounds tailored explicitly for parameter-agnostic algorithms. Interestingly, in deterministic settings, the exponential factor can be neutralized by employing Gradient Descent with a Backtracking Line Search. To the best of our knowledge, these findings represent the first parameter-agnostic convergence results under the generalized smoothness condition. Our empirical experiments further confirm our theoretical insights.
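The NSGD-M update analyzed in the paper is simple to state: keep an exponential moving average of stochastic gradients and step along its direction only, with the magnitude fixed by the stepsize. A toy least-squares sketch follows; the stepsize, momentum value, and problem are illustrative assumptions, and no attempt is made to reproduce the paper's parameter-agnostic analysis or bounds.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 50
A = rng.normal(size=(200, d))
b = rng.normal(size=200)

def stochastic_grad(x, batch=16):
    idx = rng.integers(0, A.shape[0], size=batch)
    Ab, bb = A[idx], b[idx]
    return 2.0 * Ab.T @ (Ab @ x - bb) / batch      # mini-batch least-squares gradient

x, m = np.zeros(d), np.zeros(d)
gamma, beta = 0.01, 0.9                            # stepsize and momentum (assumed values)
for _ in range(5000):
    g = stochastic_grad(x)
    m = beta * m + (1.0 - beta) * g                # momentum: moving average of gradients
    x -= gamma * m / (np.linalg.norm(m) + 1e-12)   # normalized step: direction only

print("final objective:", round(float(np.mean((A @ x - b) ** 2)), 4))
```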

Approximating Langevin Monte Carlo with ResNet-like Neural Network architectures

  • paper_url: http://arxiv.org/abs/2311.03242
  • repo_url: None
  • paper_authors: Martin Eigel, Charles Miranda, Janina Schütte, David Sommer
  • for: This paper constructs a neural-network-based sampler that maps samples from a simple reference distribution (e.g., the standard normal) to samples from a target distribution.
  • methods: The paper proposes a neural network architecture inspired by the Langevin Monte Carlo (LMC) algorithm and uses LMC perturbation results to derive approximation rates for smooth, log-concave target distributions.
  • results: Under different assumptions on the perturbations, the paper derives bounds on the growth of intermediate variance proxies, together with expressivity results for a deep residual-style architecture approximating the sample-to-target-distribution map.
    Abstract We sample from a given target distribution by constructing a neural network which maps samples from a simple reference, e.g. the standard normal distribution, to samples from the target. To that end, we propose using a neural network architecture inspired by the Langevin Monte Carlo (LMC) algorithm. Based on LMC perturbation results, we show approximation rates of the proposed architecture for smooth, log-concave target distributions measured in the Wasserstein-$2$ distance. The analysis heavily relies on the notion of sub-Gaussianity of the intermediate measures of the perturbed LMC process. In particular, we derive bounds on the growth of the intermediate variance proxies under different assumptions on the perturbations. Moreover, we propose an architecture similar to deep residual neural networks and derive expressivity results for approximating the sample to target distribution map.

Out-of-distribution Detection Learning with Unreliable Out-of-distribution Sources

  • paper_url: http://arxiv.org/abs/2311.03236
  • repo_url: None
  • paper_authors: Haotian Zheng, Qizhou Wang, Zhen Fang, Xiaobo Xia, Feng Liu, Tongliang Liu, Bo Han
  • for: This work aims to improve the reliability of predictors in open-world classification by using data generators to synthesize out-of-distribution (OOD) data, without requiring any real OOD data.
  • methods: The study proposes Auxiliary Task-based OOD Learning (ATOL), a data-generation-based learning method that ensures the ID and OOD parts have disjoint supports so that an auxiliary OOD detection task benefits learning of ID and OOD patterns.
  • results: Experimental results show that ATOL effectively relieves the interference of mistaken OOD generation and improves the predictor's ability to distinguish ID from OOD data.
    Abstract Out-of-distribution (OOD) detection discerns OOD data where the predictor cannot make valid predictions as in-distribution (ID) data, thereby increasing the reliability of open-world classification. However, it is typically hard to collect real out-of-distribution (OOD) data for training a predictor capable of discerning ID and OOD patterns. This obstacle gives rise to data generation-based learning methods, synthesizing OOD data via data generators for predictor training without requiring any real OOD data. Related methods typically pre-train a generator on ID data and adopt various selection procedures to find those data likely to be the OOD cases. However, generated data may still coincide with ID semantics, i.e., mistaken OOD generation remains, confusing the predictor between ID and OOD data. To this end, we suggest that generated data (with mistaken OOD generation) can be used to devise an auxiliary OOD detection task to facilitate real OOD detection. Specifically, we can ensure that learning from such an auxiliary task is beneficial if the ID and the OOD parts have disjoint supports, with the help of a well-designed training procedure for the predictor. Accordingly, we propose a powerful data generation-based learning method named Auxiliary Task-based OOD Learning (ATOL) that can relieve the mistaken OOD generation. We conduct extensive experiments under various OOD detection setups, demonstrating the effectiveness of our method against its advanced counterparts.

Spatial Process Approximations: Assessing Their Necessity

  • paper_url: http://arxiv.org/abs/2311.03201
  • repo_url: None
  • paper_authors: Hao Zhang
  • for: This paper examines the ill-conditioning of the kernel matrix in prediction, classification, and maximum likelihood estimation when the sample size is large.
  • methods: The paper considers several approaches to this deficiency, including low-rank approximations of the stochastic process, and introduces optimality criteria with corresponding solutions to improve prediction and estimation computations.
  • results: The results show that these approaches improve the accuracy of prediction and estimation and help avoid the low-rank issues induced by the ill-conditioned kernel matrix.
    Abstract In spatial statistics and machine learning, the kernel matrix plays a pivotal role in prediction, classification, and maximum likelihood estimation. A thorough examination reveals that for large sample sizes, the kernel matrix becomes ill-conditioned, provided the sampling locations are fairly evenly distributed. This condition poses significant challenges to numerical algorithms used in prediction and estimation computations and necessitates an approximation to prediction and the Gaussian likelihood. A review of current methodologies for managing large spatial data indicates that some fail to address this ill-conditioning problem. Such ill-conditioning often results in low-rank approximations of the stochastic processes. This paper introduces various optimality criteria and provides solutions for each.
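The ill-conditioning claim is easy to reproduce numerically: for fairly evenly spaced sampling locations, the condition number of a smooth covariance (kernel) matrix explodes as the sample size grows. The sketch below uses a squared-exponential kernel with an assumed lengthscale purely for illustration.

```python
import numpy as np

def rbf_kernel_matrix(x, lengthscale=0.2):
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-d2 / (2.0 * lengthscale ** 2))

# Fairly evenly distributed sampling locations, as in the abstract's setting.
for n in (50, 200, 800):
    K = rbf_kernel_matrix(np.linspace(0.0, 1.0, n))
    print(f"n = {n:4d}   condition number ~ {np.linalg.cond(K):.2e}")
```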

Online Learning Quantum States with the Logarithmic Loss via VB-FTRL

  • paper_url: http://arxiv.org/abs/2311.04237
  • repo_url: None
  • paper_authors: Wei-Fu Tseng, Kai-Chun Chen, Zi-Hong Xiao, Yen-Huan Li
  • for: This paper studies online learning of quantum states with the logarithmic loss (LL-OLQS), a quantum generalization of online portfolio selection, a classic open problem in online learning for over three decades.
  • methods: The paper generalizes the VB-FTRL algorithm, the first nearly regret-optimal algorithm for online portfolio selection (OPS) with moderate computational complexity.
  • results: The generalized algorithm achieves a regret rate of $O(d^2 \log(d+T))$ for LL-OLQS. For comparison, the best-known regret rate is $O(d^2 \log T)$, achieved by the exponential weight method, for which no explicit implementation is available.
    Abstract Online learning quantum states with the logarithmic loss (LL-OLQS) is a quantum generalization of online portfolio selection, a classic open problem in the field of online learning for over three decades. The problem also emerges in designing randomized optimization algorithms for maximum-likelihood quantum state tomography. Recently, Jezequel et al. (arXiv:2209.13932) proposed the VB-FTRL algorithm, the first nearly regret-optimal algorithm for OPS with moderate computational complexity. In this note, we generalize VB-FTRL for LL-OLQS. Let $d$ denote the dimension and $T$ the number of rounds. The generalized algorithm achieves a regret rate of $O ( d^2 \log ( d + T ) )$ for LL-OLQS. Each iteration of the algorithm consists of solving a semidefinite program that can be implemented in polynomial time by, e.g., cutting-plane methods. For comparison, the best-known regret rate for LL-OLQS is currently $O ( d^2 \log T )$, achieved by the exponential weight method. However, there is no explicit implementation available for the exponential weight method for LL-OLQS. To facilitate the generalization, we introduce the notion of VB-convexity. VB-convexity is a sufficient condition for the logarithmic barrier associated with any function to be convex and is of independent interest.

Stable Linear Subspace Identification: A Machine Learning Approach

  • paper_url: http://arxiv.org/abs/2311.03197
  • repo_url: https://github.com/cemempamoi/simba
  • paper_authors: Loris Di Natale, Muhammad Zakwan, Bratislav Svetozarevic, Philipp Heer, Giancarlo Ferrari Trecate, Colin N. Jones
  • for: This paper demonstrates how machine learning tools can improve linear system identification (SI) and proposes SIMBa, an SI method built on automatic differentiation frameworks.
  • methods: The paper uses a novel Linear-Matrix-Inequality-based free parametrization of Schur matrices to guarantee the stability of the identified model, and implements SIMBa with automatic differentiation frameworks.
  • results: SIMBa matches or outperforms traditional linear state-space SI methods on a variety of input-output systems and on both simulated and real-world data, with gains frequently above 25% over other SI methods with stability guarantees, suggesting SIMBa can simultaneously achieve state-of-the-art fitting performance and enforce stability.
    Abstract Machine Learning (ML) and linear System Identification (SI) have been historically developed independently. In this paper, we leverage well-established ML tools - especially the automatic differentiation framework - to introduce SIMBa, a family of discrete linear multi-step-ahead state-space SI methods using backpropagation. SIMBa relies on a novel Linear-Matrix-Inequality-based free parametrization of Schur matrices to ensure the stability of the identified model. We show how SIMBa generally outperforms traditional linear state-space SI methods, and sometimes significantly, although at the price of a higher computational burden. This performance gap is particularly remarkable compared to other SI methods with stability guarantees, where the gain is frequently above 25% in our investigations, hinting at SIMBa's ability to simultaneously achieve state-of-the-art fitting performance and enforce stability. Interestingly, these observations hold for a wide variety of input-output systems and on both simulated and real-world data, showcasing the flexibility of the proposed approach. We postulate that this new SI paradigm presents a great extension potential to identify structured nonlinear models from data, and we hence open-source SIMBa on https://github.com/Cemempamoi/simba.
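The key ingredient named in the methods bullet is a free (unconstrained) parametrization that always yields a Schur-stable matrix, so gradient-based training can never leave the stable set. The sketch below shows the idea with a much cruder sufficient condition (scaling the spectral norm below one); SIMBa's actual LMI-based parametrization is different and less conservative, and is available in the linked repository.

```python
import numpy as np

def schur_stable(W, margin=1e-3):
    """Map an unconstrained matrix W to a Schur-stable A (spectral radius < 1).
    Uses the crude sufficient condition rho(A) <= ||A||_2 < 1; SIMBa itself relies on
    an LMI-based free parametrization instead of this norm scaling."""
    return W / ((1.0 + margin) * max(np.linalg.norm(W, 2), 1.0))

rng = np.random.default_rng(0)
A = schur_stable(rng.normal(size=(4, 4)))
print("spectral radius:", np.abs(np.linalg.eigvals(A)).max())
```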

DeepInception: Hypnotize Large Language Model to Be Jailbreaker

  • paper_url: http://arxiv.org/abs/2311.03191
  • repo_url: https://github.com/tmlr-group/deepinception
  • paper_authors: Xuan Li, Zhanke Zhou, Jianing Zhu, Jiangchao Yao, Tongliang Liu, Bo Han
  • for: This work investigates adversarial jailbreaks and vulnerabilities of large language models (LLMs) and proposes a lightweight method that realizes such attacks.
  • methods: The study proposes DeepInception, which leverages the personification ability of LLMs to construct a novel nested scene that adaptively escapes the usage control of a normal scenario, yielding a simple yet effective jailbreak.
  • results: Experiments demonstrate the efficacy of DeepInception, which achieves competitive jailbreak success rates and can sustain continuous jailbreaks in subsequent interactions.
    Abstract Despite remarkable success in various applications, large language models (LLMs) are vulnerable to adversarial jailbreaks that make the safety guardrails void. However, previous studies for jailbreaks usually resort to brute-force optimization or extrapolations of a high computation cost, which might not be practical or effective. In this paper, inspired by the Milgram experiment that individuals can harm another person if they are told to do so by an authoritative figure, we disclose a lightweight method, termed as DeepInception, which can easily hypnotize LLM to be a jailbreaker and unlock its misusing risks. Specifically, DeepInception leverages the personification ability of LLM to construct a novel nested scene to behave, which realizes an adaptive way to escape the usage control in a normal scenario and provides the possibility for further direct jailbreaks. Empirically, we conduct comprehensive experiments to show its efficacy. Our DeepInception can achieve competitive jailbreak success rates with previous counterparts and realize a continuous jailbreak in subsequent interactions, which reveals the critical weakness of self-losing on both open/closed-source LLMs like Falcon, Vicuna, Llama-2, and GPT-3.5/4/4V. Our investigation appeals that people should pay more attention to the safety aspects of LLMs and a stronger defense against their misuse risks. The code is publicly available at: https://github.com/tmlr-group/DeepInception.

Hopfield-Enhanced Deep Neural Networks for Artifact-Resilient Brain State Decoding

  • paper_url: http://arxiv.org/abs/2311.03421
  • repo_url: https://github.com/arnaumarin/hdnn-artifactbrainstate
  • paper_authors: Arnau Marin-Llobet, Arnau Manasanch, Maria V. Sanchez-Vives
  • for: This study aims to improve the accuracy of brain-state decoding, particularly under different levels of anesthesia.
  • methods: The study uses a two-stage computational framework: Hopfield Networks first remove artifacts from the data, and Convolutional Neural Networks (CNNs) then classify the brain states.
  • results: The hybrid Hopfield-CNN framework effectively mitigates artifacts, allowing the model to reach parity with a clean-data CNN across various levels of data compression and noise.
    Abstract The study of brain states, ranging from highly synchronous to asynchronous neuronal patterns like the sleep-wake cycle, is fundamental for assessing the brain's spatiotemporal dynamics and their close connection to behavior. However, the development of new techniques to accurately identify them still remains a challenge, as these are often compromised by the presence of noise, artifacts, and suboptimal recording quality. In this study, we propose a two-stage computational framework combining Hopfield Networks for artifact data preprocessing with Convolutional Neural Networks (CNNs) for classification of brain states in rat neural recordings under different levels of anesthesia. To evaluate the robustness of our framework, we deliberately introduced noise artifacts into the neural recordings. We evaluated our hybrid Hopfield-CNN pipeline by benchmarking it against two comparative models: a standalone CNN handling the same noisy inputs, and another CNN trained and tested on artifact-free data. Performance across various levels of data compression and noise intensities showed that our framework can effectively mitigate artifacts, allowing the model to reach parity with the clean-data CNN at lower noise levels. Although this study mainly benefits small-scale experiments, the findings highlight the necessity for advanced deep learning and Hopfield Network models to improve scalability and robustness in diverse real-world settings.
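For context on the first stage, a classical Hopfield network stores reference patterns with Hebbian weights and iteratively pulls corrupted inputs back toward them, which is the sense in which it can clean artifacts before classification. The sketch below is the textbook binary Hopfield recall on random patterns, not the paper's preprocessing of electrophysiological recordings.

```python
import numpy as np

rng = np.random.default_rng(0)
# Store a few binary (+/-1) template patterns in a classical Hopfield network.
n_units, n_patterns = 100, 3
patterns = rng.choice([-1, 1], size=(n_patterns, n_units))
W = (patterns.T @ patterns) / n_units
np.fill_diagonal(W, 0.0)                    # Hebbian weights, no self-connections

def recall(x, steps=10):
    # Synchronous sign updates converge toward a stored attractor.
    for _ in range(steps):
        x = np.sign(W @ x)
        x[x == 0] = 1
    return x

# Corrupt a stored pattern (simulated artifact) and let the network clean it up.
noisy = patterns[0].copy()
flip = rng.choice(n_units, size=20, replace=False)
noisy[flip] *= -1
print("overlap with the original after recall:", (recall(noisy) == patterns[0]).mean())
```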

Preserving Privacy in GANs Against Membership Inference Attack

  • paper_url: http://arxiv.org/abs/2311.03172
  • repo_url: None
  • paper_authors: Mohammadhadi Shateri, Francisco Messina, Fabrice Labeau, Pablo Piantanida
  • for: The paper studies how to protect the privacy of potentially sensitive training data when using generative models, guarding against data leakage and membership inference attacks.
  • methods: The paper proposes two defense strategies: a maximum-entropy GAN architecture (MEGAN), and a method that minimizes the information that generated samples leak about the training data points (MIMGAN).
  • results: Applying the two defenses to commonly used datasets shows they can reduce the adversary's accuracy to the level of random guessing, with only a small reduction in the quality of the generated samples.
    Abstract Generative Adversarial Networks (GANs) have been widely used for generating synthetic data for cases where there is a limited size real-world dataset or when data holders are unwilling to share their data samples. Recent works showed that GANs, due to overfitting and memorization, might leak information regarding their training data samples. This makes GANs vulnerable to Membership Inference Attacks (MIAs). Several defense strategies have been proposed in the literature to mitigate this privacy issue. Unfortunately, defense strategies based on differential privacy are proven to reduce extensively the quality of the synthetic data points. On the other hand, more recent frameworks such as PrivGAN and PAR-GAN are not suitable for small-size training datasets. In the present work, the overfitting in GANs is studied in terms of the discriminator, and a more general measure of overfitting based on the Bhattacharyya coefficient is defined. Then, inspired by Fano's inequality, our first defense mechanism against MIAs is proposed. This framework, which requires only a simple modification in the loss function of GANs, is referred to as the maximum entropy GAN or MEGAN and significantly improves the robustness of GANs to MIAs. As a second defense strategy, a more heuristic model based on minimizing the information leaked from generated samples about the training data points is presented. This approach is referred to as mutual information minimization GAN (MIMGAN) and uses a variational representation of the mutual information to minimize the information that a synthetic sample might leak about the whole training data set. Applying the proposed frameworks to some commonly used data sets against state-of-the-art MIAs reveals that the proposed methods can reduce the accuracy of the adversaries to the level of random guessing accuracy with a small reduction in the quality of the synthetic data samples.

An Examination of the Alleged Privacy Threats of Confidence-Ranked Reconstruction of Census Microdata

  • paper_url: http://arxiv.org/abs/2311.03171
  • repo_url: https://github.com/NajeebJebreel/CRR-analysis
  • paper_authors: David Sánchez, Najeeb Jebreel, Josep Domingo-Ferrer, Krishnamurty Muralidhar, Alberto Blanco-Justicia
  • for: This work examines the U.S. Census Bureau's (USCB) use of differential privacy (DP) in the 2020 Decennial Census and whether the USCB is justified in believing that DP is needed to prevent reconstruction attacks.
  • methods: The study examines the recently proposed confidence-ranked reconstruction attack to assess how likely it is that the original data can be reconstructed.
  • results: The study finds that confidence-ranked reconstruction does not threaten privacy, since it cannot guide reidentification or attribute disclosure attacks; moreover, because of how Census data are compiled, processed, and released, original and complete records cannot be reconstructed.
    Abstract The alleged threat of reconstruction attacks has led the U.S. Census Bureau (USCB) to replace in the Decennial Census 2020 the traditional statistical disclosure limitation based on rank swapping with one based on differential privacy (DP). This has resulted in substantial accuracy loss of the released statistics. Worse yet, it has been shown that the reconstruction attacks used as an argument to move to DP are very far from allowing unequivocal reidentification of the respondents, because in general there are a lot of reconstructions compatible with the released statistics. In a very recent paper, a new reconstruction attack has been proposed, whose goal is to indicate the confidence that a reconstructed record was in the original respondent data. The alleged risk of serious disclosure entailed by such confidence-ranked reconstruction has renewed the interest of the USCB to use DP-based solutions. To forestall the potential accuracy loss in future data releases resulting from adoption of these solutions, we show in this paper that the proposed confidence-ranked reconstruction does not threaten privacy. Specifically, we report empirical results showing that the proposed ranking cannot guide reidentification or attribute disclosure attacks, and hence it fails to warrant the USCB's move towards DP. Further, we also demonstrate that, due to the way the Census data are compiled, processed and released, it is not possible to reconstruct original and complete records through any methodology, and the confidence-ranked reconstruction not only is completely ineffective at accurately reconstructing Census records but is trivially outperformed by an adequate interpretation of the released aggregate statistics.

Convergence Analysis of Sequential Federated Learning on Heterogeneous Data

  • paper_url: http://arxiv.org/abs/2311.03154
  • repo_url: https://github.com/liyipeng00/convergence
  • paper_authors: Yipeng Li, Xinchen Lyu
  • for: This paper provides a convergence analysis of Sequential Federated Learning (SFL) in Federated Learning (FL) and establishes its convergence on heterogeneous data.
  • methods: The paper establishes convergence guarantees of SFL for strongly convex, general convex, and non-convex objectives on heterogeneous data.
  • results: Experimental results show that SFL outperforms parallel FL (PFL) on extremely heterogeneous data in cross-device settings, consistent with the convergence analysis.
    Abstract There are two categories of methods in Federated Learning (FL) for joint training across multiple clients: i) parallel FL (PFL), where clients train models in a parallel manner; and ii) sequential FL (SFL), where clients train models in a sequential manner. In contrast to that of PFL, the convergence theory of SFL on heterogeneous data is still lacking. In this paper, we establish the convergence guarantees of SFL for strongly/general/non-convex objectives on heterogeneous data. The convergence guarantees of SFL are better than that of PFL on heterogeneous data with both full and partial client participation. Experimental results validate the counterintuitive analysis result that SFL outperforms PFL on extremely heterogeneous data in cross-device settings.
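The two training regimes compared in the paper differ only in how client updates are composed: PFL runs local training in parallel from the same model and averages, while SFL passes the model from client to client within a round. A toy least-squares sketch of both loops, with assumed client data and hyperparameters, is given below.

```python
import numpy as np

rng = np.random.default_rng(0)
n_clients, d = 5, 10
# Heterogeneous clients: each holds least-squares data with a shifted optimum.
clients = []
for k in range(n_clients):
    A = rng.normal(size=(40, d))
    x_star = rng.normal(size=d) + k          # increasing heterogeneity across clients
    clients.append((A, A @ x_star))

def local_sgd(x, A, b, steps=20, lr=1e-2):
    for _ in range(steps):
        i = rng.integers(0, A.shape[0])
        x = x - lr * 2 * A[i] * (A[i] @ x - b[i])
    return x

def pfl_round(x):                             # parallel FL: train in parallel, then average
    return np.mean([local_sgd(x.copy(), A, b) for A, b in clients], axis=0)

def sfl_round(x):                             # sequential FL: pass the model client to client
    for A, b in clients:
        x = local_sgd(x, A, b)
    return x

x_p = x_s = np.zeros(d)
for _ in range(50):
    x_p, x_s = pfl_round(x_p), sfl_round(x_s)

loss = lambda x: np.mean([np.mean((A @ x - b) ** 2) for A, b in clients])
print(f"PFL loss {loss(x_p):.3f}   SFL loss {loss(x_s):.3f}")
```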

End-to-end Material Thermal Conductivity Prediction through Machine Learning

  • paper_url: http://arxiv.org/abs/2311.03139
  • repo_url: None
  • paper_authors: Yagyank Srivastava, Ankit Jain
  • for: accelerated prediction of the thermal conductivity of materials
  • methods: employing machine learning methods, high-throughput calculations based on first principles and the Boltzmann transport equation
  • results: all models suffered from overfitting; the best mean absolute percentage error remained in the range of 50-60%
  • for: 加速预测材料的热导率
  • methods: 使用机器学习方法,并基于第一性原理和玻尔兹曼输运方程进行高通量计算
  • results: 所有模型均存在过拟合问题,测试集上的最佳平均绝对百分比误差仍在50-60%范围内
    Abstract We investigated the accelerated prediction of the thermal conductivity of materials through end- to-end structure-based approaches employing machine learning methods. Due to the non-availability of high-quality thermal conductivity data, we first performed high-throughput calculations based on first principles and the Boltzmann transport equation for 225 materials, effectively more than doubling the size of the existing dataset. We assessed the performance of state-of-the-art machine learning models for thermal conductivity prediction on this expanded dataset and observed that all these models suffered from overfitting. To address this issue, we introduced a novel graph-based neural network model, which demonstrated more consistent and regularized performance across all evaluated datasets. Nevertheless, the best mean absolute percentage error achieved on the test dataset remained in the range of 50-60%. This suggests that while these models are valuable for expediting material screening, their current accuracy is still limited.
    摘要 我们调查了通过终端结构基本方法加速材料热导率预测的进程。由于热导率数据的可用性不足,我们首先通过原理计算和博尔茨曼运动方程对225种材料进行了高通过率计算,实际上将现有数据集的大小增加了超过一倍。我们评估了现状最佳的机器学习模型对热导率预测的性能,发现所有这些模型都存在过拟合问题。为解决这个问题,我们提出了一种图形基于神经网络模型,该模型在所有评估数据集上显示了更一致和规则的性能。然而,在测试数据集上最佳的平均绝对百分比误差仍然在50-60%的范围内,这表明这些模型可以快速屏选材料,但其当前准确性仍有限。

Reservoir-Computing Model for Mapping and Forecasting Neuronal Interactions from Electrophysiological Data

  • paper_url: http://arxiv.org/abs/2311.03131
  • repo_url: None
  • paper_authors: Ilya Auslender, Giorgio Letti, Yasaman Heydari, Lorenzo Pavesi
  • for: 这个论文旨在描述如何使用计算机模型来解释神经元网络的结构和功能。
  • methods: 该模型基于储备池计算网络(RCN)架构,利用电生理测量数据重建神经元网络的结构。
  • results: 模型可以更高精度地预测神经元网络的连接图,并且可以预测特定输入的网络响应。
    Abstract Electrophysiological nature of neuronal networks allows to reveal various interactions between different cell units at a very short time-scales. One of the many challenges in analyzing these signals is to retrieve the morphology and functionality of a given network. In this work we developed a computational model, based on Reservoir Computing Network (RCN) architecture, which decodes the spatio-temporal data from electro-physiological measurements of neuronal cultures and reconstructs the network structure on a macroscopic domain, representing the connectivity between neuronal units. We demonstrate that the model can predict the connectivity map of the network with higher accuracy than the common methods such as Cross-Correlation and Transfer-Entropy. In addition, we experimentally demonstrate the ability of the model to predict a network response to a specific input, such as localized stimulus.
    摘要 神经元网络的电生理特性使我们能够在极短的时间尺度上揭示不同细胞单元之间的各种相互作用。分析这些信号的挑战之一是恢复给定网络的形态与功能。在这项工作中,我们基于储备池计算网络(RCN)架构开发了一个计算模型,它能从神经元培养的电生理测量数据中解码时空信息,并在宏观尺度上重建表示神经元单元之间连接关系的网络结构。我们展示了该模型在预测网络连接图方面比互相关(Cross-Correlation)和传递熵(Transfer Entropy)等常用方法更为准确。此外,我们还通过实验证明了该模型能够预测网络对特定输入(例如局部刺激)的响应。
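For readers unfamiliar with reservoir computing, the sketch below is a generic echo state network in Python (NumPy only): a fixed random reservoir driven by the input signal, with only a ridge-regression readout being trained. It illustrates the RCN principle the paper builds on; the reservoir size, leak rate, and toy task are assumptions for illustration, not the authors' model.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_reservoir(n_inputs, n_reservoir=200, spectral_radius=0.9, density=0.1):
    # Random input weights and a sparse random recurrent matrix.
    W_in = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_inputs))
    W = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_reservoir))
    W[rng.random((n_reservoir, n_reservoir)) > density] = 0.0
    # Rescale so the reservoir operates near the edge of stability.
    W *= spectral_radius / np.abs(np.linalg.eigvals(W)).max()
    return W_in, W

def run_reservoir(u, W_in, W, leak=0.3):
    # u: (T, n_inputs) input time series; returns reservoir states (T, n_reservoir).
    x = np.zeros(W.shape[0])
    states = np.zeros((u.shape[0], W.shape[0]))
    for t in range(u.shape[0]):
        x = (1 - leak) * x + leak * np.tanh(W_in @ u[t] + W @ x)
        states[t] = x
    return states

def fit_readout(states, targets, ridge=1e-4):
    # Ridge-regression readout: the only trained part of an ESN.
    A = states.T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(A, states.T @ targets)

# Toy usage: one-step-ahead prediction of a sine wave.
t = np.linspace(0, 20 * np.pi, 2000)
u = np.sin(t)[:, None]
W_in, W = make_reservoir(n_inputs=1)
states = run_reservoir(u[:-1], W_in, W)
W_out = fit_readout(states, u[1:])
pred = states @ W_out
```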

Nonparametric modeling of the composite effect of multiple nutrients on blood glucose dynamics

  • paper_url: http://arxiv.org/abs/2311.03129
  • repo_url: https://github.com/jularina/trcmed-kit
  • paper_authors: Arina Odnoblyudova, Çağlar Hizli, ST John, Andrea Cognolato, Anne Juuti, Simo Särkkä, Kirsi Pietiläinen, Pekka Marttinen
  • for: 估计由多种营养成分构成的膳食的生理响应,并在联合效应之外学习各成分的单独效应。
  • methods: 扩展概率非参数方法,通过区分治疗成分、纳入其剂量,并借助分层多输出高斯过程在患者间共享统计信息来提升预测精度。
  • results: 能够更好地解释各种组成物对血糖响应的不同效应,并提高预测精度。
    Abstract In biomedical applications it is often necessary to estimate a physiological response to a treatment consisting of multiple components, and learn the separate effects of the components in addition to the joint effect. Here, we extend existing probabilistic nonparametric approaches to explicitly address this problem. We also develop a new convolution-based model for composite treatment-response curves that is more biologically interpretable. We validate our models by estimating the impact of carbohydrate and fat in meals on blood glucose. By differentiating treatment components, incorporating their dosages, and sharing statistical information across patients via a hierarchical multi-output Gaussian process, our method improves prediction accuracy over existing approaches, and allows us to interpret the different effects of carbohydrates and fat on the overall glucose response.
    摘要 在生物医学应用中,经常需要估计由多个成分构成的治疗所引起的生理响应,并在联合效应之外学习各成分的单独效应。我们在此扩展现有的概率非参数方法,以显式地解决这一问题,并提出了一种更具生物可解释性的、基于卷积的复合治疗-响应曲线模型。我们通过估计膳食中碳水化合物和脂肪对血糖的影响来验证模型。通过区分治疗成分、纳入其剂量,并借助分层多输出高斯过程在患者间共享统计信息,我们的方法相比现有方法提高了预测精度,并使我们能够解释碳水化合物与脂肪对整体血糖响应的不同影响。

Algebraic Dynamical Systems in Machine Learning

  • paper_url: http://arxiv.org/abs/2311.03118
  • repo_url: None
  • paper_authors: Iolo Jones, Jerry Swan, Jeffrey Giansiracusa
  • for: 这篇论文提出了一种基于项重写(term rewriting)的动力系统的代数类比,并说明如何将动态机器学习模型嵌入其中。
  • methods: 论文将一个递归函数作用于迭代重写系统的输出,以此定义一类形式化模型。
  • results: 论文表明,所有主要的动态机器学习模型(包括循环神经网络、图神经网络和扩散模型)都可以嵌入到这一形式化模型类中。
    Abstract We introduce an algebraic analogue of dynamical systems, based on term rewriting. We show that a recursive function applied to the output of an iterated rewriting system defines a formal class of models into which all the main architectures for dynamic machine learning models (including recurrent neural networks, graph neural networks, and diffusion models) can be embedded. Considered in category theory, we also show that these algebraic models are a natural language for describing the compositionality of dynamic models. Furthermore, we propose that these models provide a template for the generalisation of the above dynamic models to learning problems on structured or non-numerical data, including 'hybrid symbolic-numeric' models.
    摘要 我们基于项重写提出了动力系统的一个代数类比。我们表明,将一个递归函数作用于迭代重写系统的输出,可以定义出一类形式化模型,所有主要的动态机器学习模型架构(包括循环神经网络、图神经网络和扩散模型)都可以嵌入其中。在范畴论的视角下,我们还表明这些代数模型是描述动态模型可组合性的自然语言。此外,我们提出这些模型可以作为模板,将上述动态模型推广到结构化或非数值数据上的学习问题,包括“混合符号-数值”模型。

RELand: Risk Estimation of Landmines via Interpretable Invariant Risk Minimization

  • paper_url: http://arxiv.org/abs/2311.03115
  • repo_url: None
  • paper_authors: Mateo Dulce Rubio, Siqi Zeng, Qi Wang, Didier Alvarado, Francisco Moreno, Hoda Heidari, Fei Fang
  • for: 提高人道主义排雷工作的效率和准确性,降低受冲突影响社区的地雷风险。
  • methods: 提出RELand系统,包含三大组成部分:一、提供全面的特征工程和标签分配指南,为全球排雷流程提供通用的数据预处理方法;二、将地雷存在建模为分类问题,设计一种基于稀疏特征掩码和不变风险最小化的可解释新模型;三、按照贴近真实排雷作业的协议进行广泛评估,显示相比现有方法有显著提升。
  • results: 目前正与哥伦比亚一家人道主义排雷组织合作,该组织已在两个最近被列为优先排雷的区域将我们的系统用于实地作业。
    Abstract Landmines remain a threat to war-affected communities for years after conflicts have ended, partly due to the laborious nature of demining tasks. Humanitarian demining operations begin by collecting relevant information from the sites to be cleared, which is then analyzed by human experts to determine the potential risk of remaining landmines. In this paper, we propose RELand system to support these tasks, which consists of three major components. We (1) provide general feature engineering and label assigning guidelines to enhance datasets for landmine risk modeling, which are widely applicable to global demining routines, (2) formulate landmine presence as a classification problem and design a novel interpretable model based on sparse feature masking and invariant risk minimization, and run extensive evaluation under proper protocols that resemble real-world demining operations to show a significant improvement over the state-of-the-art, and (3) build an interactive web interface to suggest priority areas for demining organizations. We are currently collaborating with a humanitarian demining NGO in Colombia that is using our system as part of their field operations in two areas recently prioritized for demining.
    摘要 在冲突结束多年之后,地雷仍然威胁着受战争影响的社区,部分原因在于排雷工作的繁重性。人道主义排雷行动首先收集待清理场地的相关信息,再由人类专家分析以评估遗留地雷的潜在风险。在这篇论文中,我们提出RELand系统来支持这些任务,该系统包括三个主要组成部分。我们(1)提供了通用的特征工程和标签分配指南,以增强用于地雷风险建模的数据集,这些指南广泛适用于全球排雷流程;(2)将地雷存在建模为分类问题,设计了一种基于稀疏特征掩码和不变风险最小化的新型可解释模型,并在贴近真实排雷作业的协议下进行了广泛评估,显示相比现有技术有显著提升;(3)构建了一个交互式网页界面,为排雷组织建议优先清理区域。我们目前正与哥伦比亚一家人道主义排雷NGO合作,该组织已将我们的系统用于两个最近被列为优先排雷区域的实地作业。

Weight-Sharing Regularization

  • paper_url: http://arxiv.org/abs/2311.03096
  • repo_url: https://github.com/motahareh-sohrabi/weight-sharing-regularization
  • paper_authors: Mehran Shakerinava, Motahareh Sohrabi, Siamak Ravanbakhsh, Simon Lacoste-Julien
  • for: 本文是为了提出Weight-sharing regularization的概念和实现方法。
  • methods: 本文使用的方法包括定义weight-sharing regularization的函数$R(w) = \frac{1}{d - 1}\sum_{i > j}^d |w_i - w_j|$,以及对这个函数的 proximal mapping 的研究。
  • results: experiments 表明,Weight-sharing regularization 可以使得全连接神经网络学习 convolution-like filters。 In addition, the proposed parallel algorithm for proximal gradient descent provides an exponential speedup over previous algorithms, with a depth of $O(\log^3 d)$.
    Abstract Weight-sharing is ubiquitous in deep learning. Motivated by this, we introduce ''weight-sharing regularization'' for neural networks, defined as $R(w) = \frac{1}{d - 1}\sum_{i > j}^d |w_i - w_j|$. We study the proximal mapping of $R$ and provide an intuitive interpretation of it in terms of a physical system of interacting particles. Using this interpretation, we design a novel parallel algorithm for $\operatorname{prox}_R$ which provides an exponential speedup over previous algorithms, with a depth of $O(\log^3 d)$. Our algorithm makes it feasible to train weight-sharing regularized deep neural networks with proximal gradient descent. Experiments reveal that weight-sharing regularization enables fully-connected networks to learn convolution-like filters.
    摘要 权重共享在深度学习中无处不在。受此启发,我们为神经网络提出了“权重共享正则化”,定义为 $R(w) = \frac{1}{d - 1}\sum_{i > j}^d |w_i - w_j|$。我们研究了 $R$ 的近端映射(proximal mapping),并用一个相互作用粒子的物理系统给出了直观解释。基于这一解释,我们为 $\operatorname{prox}_R$ 设计了一种新的并行算法,相比已有算法实现了指数级加速,深度为 $O(\log^3 d)$。该算法使得用近端梯度下降训练带权重共享正则化的深度神经网络成为可行。实验表明,权重共享正则化能让全连接网络学到类似卷积的滤波器。
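The regularizer itself has a closed form, so a naive evaluation is easy to write down. The PyTorch sketch below computes R(w) via an O(d²) pairwise difference, following the formula in the abstract, and adds it to a training loss; the paper's actual contribution is a fast parallel proximal operator for this term, which is not reproduced here.

```python
import torch

def weight_sharing_penalty(w: torch.Tensor) -> torch.Tensor:
    # R(w) = 1/(d-1) * sum_{i > j} |w_i - w_j| over a flattened weight vector.
    # Naive O(d^2) evaluation; only suitable for small layers or illustration.
    w = w.flatten()
    d = w.numel()
    diffs = w.unsqueeze(0) - w.unsqueeze(1)          # (d, d) pairwise differences
    return torch.triu(diffs.abs(), diagonal=1).sum() / (d - 1)

# Usage as an additive penalty with plain (sub)gradients; the paper instead
# handles this term through its dedicated parallel proximal operator.
layer = torch.nn.Linear(16, 8)
x, y = torch.randn(32, 16), torch.randn(32, 8)
lam = 1e-3
loss = torch.nn.functional.mse_loss(layer(x), y) + lam * weight_sharing_penalty(layer.weight)
loss.backward()
```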

Equivariance Is Not All You Need: Characterizing the Utility of Equivariant Graph Neural Networks for Particle Physics Tasks

  • paper_url: http://arxiv.org/abs/2311.03094
  • repo_url: None
  • paper_authors: Savannah Thais, Daniel Murnane
  • for: 这篇论文旨在评估等变图神经网络(equivariant GNN)在粒子物理数据上的应用,以及这种直接将物理系统对称性纳入模型的方法的实际价值。
  • methods: 论文借鉴了群等变网络的相关文献,并在真实的粒子物理重建任务上进行了广泛评估。
  • results: 研究发现,许多通常归于等变网络的理论优势在真实物理系统中可能并不成立,并为未来研究提出了有望同时推动机器学习理论和物理应用的方向。
    Abstract Incorporating inductive biases into ML models is an active area of ML research, especially when ML models are applied to data about the physical world. Equivariant Graph Neural Networks (GNNs) have recently become a popular method for learning from physics data because they directly incorporate the symmetries of the underlying physical system. Drawing from the relevant literature around group equivariant networks, this paper presents a comprehensive evaluation of the proposed benefits of equivariant GNNs by using real-world particle physics reconstruction tasks as an evaluation test-bed. We demonstrate that many of the theoretical benefits generally associated with equivariant networks may not hold for realistic systems and introduce compelling directions for future research that will benefit both the scientific theory of ML and physics applications.
    摘要 将归纳偏置引入机器学习模型是一个活跃的研究方向,尤其是在把模型用于物理世界数据时。等变图神经网络(GNN)由于直接纳入了底层物理系统的对称性,近来成为从物理数据中学习的流行方法。本文结合群等变网络的相关文献,以真实的粒子物理重建任务作为评测平台,对等变GNN所宣称的优势进行了全面评估。我们发现,许多通常与等变网络相联系的理论优势在现实系统中可能并不成立,并提出了有望同时推动机器学习理论和物理应用的未来研究方向。

Persistent homology for high-dimensional data based on spectral methods

  • paper_url: http://arxiv.org/abs/2311.03087
  • repo_url: https://github.com/berenslab/eff-ph
  • paper_authors: Sebastian Damrich, Philipp Berens, Dmitry Kobak
  • for: 检测点云中的非平凡拓扑结构,如环路或空洞。
  • methods: 使用数据 $k$-近邻图上的谱距离,如扩散距离和有效电阻。
  • results: 在高维真实数据上,基于 $k$-近邻图的谱距离能够稳健地检测出细胞周期环路。
    Abstract Persistent homology is a popular computational tool for detecting non-trivial topology of point clouds, such as the presence of loops or voids. However, many real-world datasets with low intrinsic dimensionality reside in an ambient space of much higher dimensionality. We show that in this case vanilla persistent homology becomes very sensitive to noise and fails to detect the correct topology. The same holds true for most existing refinements of persistent homology. As a remedy, we find that spectral distances on the $k$-nearest-neighbor graph of the data, such as diffusion distance and effective resistance, allow persistent homology to detect the correct topology even in the presence of high-dimensional noise. Furthermore, we derive a novel closed-form expression for effective resistance in terms of the eigendecomposition of the graph Laplacian, and describe its relation to diffusion distances. Finally, we apply these methods to several high-dimensional single-cell RNA-sequencing datasets and show that spectral distances on the $k$-nearest-neighbor graph allow robust detection of cell cycle loops.
    摘要 持续同调(persistent homology)是一种常用的计算工具,用于检测点云的非平凡拓扑结构,如环路或空洞。然而,许多内在维度较低的真实数据集往往嵌入在维度高得多的环境空间中。我们表明,在这种情况下,普通的持续同调对噪声非常敏感,无法检测出正确的拓扑;大多数现有的持续同调改进方法同样如此。作为补救,我们发现基于数据 $k$-近邻图的谱距离(如扩散距离和有效电阻)即使在高维噪声下也能让持续同调检测出正确的拓扑。此外,我们推导了有效电阻基于图拉普拉斯矩阵特征分解的全新闭式表达,并描述了它与扩散距离的关系。最后,我们将这些方法应用于若干高维单细胞RNA测序数据集,表明 $k$-近邻图上的谱距离能够稳健地检测细胞周期环路。
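As an illustration of the spectral-distance idea, the following sketch computes effective resistance on a k-nearest-neighbor graph from the eigendecomposition of the graph Laplacian, using the standard identity R_ij = Σ_k (1/λ_k)(u_k(i) − u_k(j))². The use of scikit-learn for the kNN graph and a dense eigendecomposition are assumptions for illustration; the authors' pipeline and scaling choices may differ.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def effective_resistance(X, k=15):
    # Symmetrized k-NN connectivity graph and its (unnormalized) Laplacian.
    A = kneighbors_graph(X, n_neighbors=k, mode="connectivity").toarray()
    A = np.maximum(A, A.T)
    L = np.diag(A.sum(axis=1)) - A
    # Eigendecomposition; drop (near-)zero eigenvalues (constant vector / components).
    lam, U = np.linalg.eigh(L)
    keep = lam > 1e-10
    lam, U = lam[keep], U[:, keep]
    # R_ij = sum_k (1/lam_k) * (U_ik - U_jk)^2, vectorized over all pairs.
    scaled = U / np.sqrt(lam)                # columns scaled by 1/sqrt(lam_k)
    sq = (scaled ** 2).sum(axis=1)
    R = sq[:, None] + sq[None, :] - 2 * scaled @ scaled.T
    return np.maximum(R, 0.0)

# The resulting matrix R can be passed to a persistent-homology library
# (e.g., ripser with distance_matrix=True) in place of Euclidean distances.
X = np.random.randn(300, 50)
R = effective_resistance(X)
```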

Quantifying the value of information transfer in population-based SHM

  • paper_url: http://arxiv.org/abs/2311.03083
  • repo_url: None
  • paper_authors: Aidan J. Hughes, Jack Poole, Nikolaos Dervilis, Paul Gardner, Keith Worden
  • for: This study aims to address some of the limitations of traditional Structural Health Monitoring (SHM), such as data scarcity, by using population-based approaches.
  • methods: The paper proposes a transfer-strategy decision process for a classification task in a population of simulated structures, based on domain adaptation and the concept of expected value of information transfer.
  • results: The proposed method is demonstrated through a representative SHM maintenance problem, and the results show that the transfer-strategy decision process can effectively improve classification performance in the target domain.
  • for: 本研究旨在利用基于群体的方法缓解传统结构健康监测(SHM)中数据稀缺等局限。
  • methods: 该研究提出了一种基于域适应和信息迁移期望价值概念的迁移策略决策过程,用于一组仿真结构上的分类任务。
  • results: 该方法在一个具有代表性的SHM维护问题上进行了演示,结果表明迁移策略决策过程能够有效提升目标域的分类性能。
    Abstract Population-based structural health monitoring (PBSHM), seeks to address some of the limitations associated with data scarcity that arise in traditional SHM. A tenet of the population-based approach to SHM is that information can be shared between sufficiently-similar structures in order to improve predictive models. Transfer learning techniques, such as domain adaptation, have been shown to be a highly-useful technology for sharing information between structures when developing statistical classifiers for PBSHM. Nonetheless, transfer-learning techniques are not without their pitfalls. In some circumstances, for example if the data distributions associated with the structures within a population are dissimilar, applying transfer-learning methods can be detrimental to classification performance -- this phenomenon is known as negative transfer. Given the potentially-severe consequences of negative transfer, it is prudent for engineers to ask the question `when, what, and how should one transfer between structures?'. The current paper aims to demonstrate a transfer-strategy decision process for a classification task for a population of simulated structures in the context of a representative SHM maintenance problem, supported by domain adaptation. The transfer decision framework is based upon the concept of expected value of information transfer. In order to compute the expected value of information transfer, predictions must be made regarding the classification (and decision performance) in the target domain following information transfer. In order to forecast the outcome of transfers, a probabilistic regression is used here to predict classification performance from a proxy for structural similarity based on the modal assurance criterion.
    摘要 基于群体的结构健康监测(PBSHM)旨在缓解传统SHM中数据稀缺带来的一些局限。其核心思想是,在足够相似的结构之间共享信息,以改进预测模型。迁移学习技术(如域适应)已被证明是在为PBSHM构建统计分类器时于结构之间共享信息的有力工具。然而,迁移学习并非没有缺陷:例如,当群体中各结构对应的数据分布差异较大时,应用迁移学习反而可能损害分类性能,这一现象被称为负迁移。鉴于负迁移可能带来的严重后果,工程师应当审慎地问:“何时、向什么、以及如何在结构之间迁移?”。本文以一个具有代表性的SHM维护问题为背景,以域适应为支撑,为一组仿真结构上的分类任务展示了一个基于信息迁移期望价值概念的迁移策略决策框架。为了计算信息迁移的期望价值,需要预测信息迁移后目标域中的分类(及决策)性能;本文使用概率回归,基于由模态置信准则(modal assurance criterion)构造的结构相似度代理量来预测分类性能。
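The structural-similarity proxy named in the abstract rests on the modal assurance criterion (MAC). The sketch below implements the standard MAC formula in NumPy; the averaging over corresponding mode shapes shown in the usage is a hypothetical illustration, not the paper's exact procedure.

```python
import numpy as np

def mac(phi_a: np.ndarray, phi_b: np.ndarray) -> float:
    """Modal assurance criterion between two mode-shape vectors.

    MAC = |phi_a^H phi_b|^2 / ((phi_a^H phi_a) * (phi_b^H phi_b)):
    1 for perfectly correlated mode shapes, 0 for orthogonal ones.
    """
    num = np.abs(np.vdot(phi_a, phi_b)) ** 2
    den = np.vdot(phi_a, phi_a).real * np.vdot(phi_b, phi_b).real
    return float(num / den)

# Hypothetical usage: a crude structure-similarity proxy obtained by averaging
# the MAC over corresponding mode shapes of a source and a target structure.
modes_source = np.random.randn(5, 20)                       # 5 mode shapes, 20 DOFs
modes_target = modes_source + 0.1 * np.random.randn(5, 20)  # a "similar" structure
similarity = np.mean([mac(a, b) for a, b in zip(modes_source, modes_target)])
```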

SoK: Memorisation in machine learning

  • paper_url: http://arxiv.org/abs/2311.03075
  • repo_url: None
  • paper_authors: Dmitrii Usynin, Moritz Knolle, Georgios Kaissis
  • for: 本研究旨在解决机器学习模型中各个数据样本的影响量化问题,尤其是在深度学习中,当需要从有限数据分布中学习复杂和高维关系时。
  • methods: 本研究提出了一种系统化方法,可以识别和评估机器学习模型中的记忆现象,以及其与模型泛化和隐私问题的关系。
  • results: 研究发现,记忆在机器学习中可能具有不同定义和方面,并且与模型泛化和隐私问题相互影响。此外,研究还提出了一些可能的隐私保护策略,以减少记忆对模型的影响。
    Abstract Quantifying the impact of individual data samples on machine learning models is an open research problem. This is particularly relevant when complex and high-dimensional relationships have to be learned from a limited sample of the data generating distribution, such as in deep learning. It was previously shown that, in these cases, models rely not only on extracting patterns which are helpful for generalisation, but also seem to be required to incorporate some of the training data more or less as is, in a process often termed memorisation. This raises the question: if some memorisation is a requirement for effective learning, what are its privacy implications? In this work we unify a broad range of previous definitions and perspectives on memorisation in ML, discuss their interplay with model generalisation and their implications of these phenomena on data privacy. Moreover, we systematise methods allowing practitioners to detect the occurrence of memorisation or quantify it and contextualise our findings in a broad range of ML learning settings. Finally, we discuss memorisation in the context of privacy attacks, differential privacy (DP) and adversarial actors.
    摘要 量化单个数据样本对机器学习模型的影响是一个开放的研究问题。当需要从数据生成分布的有限样本中学习复杂的高维关系时(例如深度学习),这一问题尤为突出。先前的研究表明,在这些情形下,模型不仅依赖于提取有助于泛化的模式,似乎还需要或多或少地原样纳入部分训练数据,这一过程通常被称为记忆(memorisation)。这引出了一个问题:如果一定程度的记忆是有效学习的必要条件,那么它对隐私意味着什么?在这项工作中,我们统一了机器学习中关于记忆的多种既有定义和视角,讨论了它们与模型泛化的相互作用,以及这些现象对数据隐私的影响。此外,我们系统化了可供从业者检测或量化记忆发生的方法,并将我们的发现置于广泛的机器学习场景中加以阐释。最后,我们在隐私攻击、差分隐私(DP)和对抗性行为者的背景下讨论了记忆问题。

Imaging through multimode fibres with physical prior

  • paper_url: http://arxiv.org/abs/2311.03062
  • repo_url: None
  • paper_authors: Chuncheng Zhang, Yingjie Shi, Zheyi Yao, Xiubao Sui, Qian Cheng
  • for: 这篇论文旨在提出一种physics-assisted, unsupervised, learning-based fibre imaging方法,以减少计算复杂性并提高多模式纤维成像的扩展应用。
  • methods: 该方法使用深度学习网络,但不需要目标-散斑图样对,而是借助物理先验提供的优化方向来帮助网络学习目标特征。
  • results: 该方法可以通过在线学习,仅凭少量散斑图样和未配对的目标即可较准确地重建目标图像;此外,还提升了多模光纤成像方法的泛化能力。
    Abstract Imaging through perturbed multimode fibres based on deep learning has been widely researched. However, existing methods mainly use target-speckle pairs in different configurations. It is challenging to reconstruct targets without trained networks. In this paper, we propose a physics-assisted, unsupervised, learning-based fibre imaging scheme. The role of the physical prior is to simplify the mapping relationship between the speckle pattern and the target image, thereby reducing the computational complexity. The unsupervised network learns target features according to the optimized direction provided by the physical prior. Therefore, the reconstruction process of the online learning only requires a few speckle patterns and unpaired targets. The proposed scheme also increases the generalization ability of the learning-based method in perturbed multimode fibres. Our scheme has the potential to extend the application of multimode fibre imaging.
    摘要 基于深度学习、透过受扰多模光纤成像的研究已十分广泛。然而,现有方法主要依赖于不同配置下的目标-散斑图样对,在缺少训练好的网络时很难重建目标。本文提出了一种物理辅助、无监督、基于学习的光纤成像方案。物理先验的作用是简化散斑图样与目标图像之间的映射关系,从而降低计算复杂度;无监督网络则按照物理先验给出的优化方向学习目标特征。因此,在线学习的重建过程只需少量散斑图样和未配对的目标。该方案还提升了基于学习的方法在受扰多模光纤中的泛化能力,有望拓展多模光纤成像的应用。

Learned layered coding for Successive Refinement in the Wyner-Ziv Problem

  • paper_url: http://arxiv.org/abs/2311.03061
  • repo_url: None
  • paper_authors: Boris Joukovsky, Brent De Weerdt, Nikos Deligiannis
  • for: 本文提出了一种数据驱动的方法,用于显式学习连续信源的渐进编码,并在相关边信息的帮助下以逐步提升的质量进行解码。这一设置对应于Wyner-Ziv编码问题的逐次细化(successive refinement)。
  • methods: 本文使用循环神经网络(RNN)学习分层的编码器和解码器,重点研究二次高斯情形。模型通过最小化逐次细化Wyner-Ziv编码问题率失真函数的变分上界来训练。
  • results: 实验表明,RNN能够显式地恢复类似可扩展嵌套量化的分层分箱(binning)解;该方案的率失真性能与对应的整体式Wyner-Ziv编码方法相当,并接近率失真界。
    Abstract We propose a data-driven approach to explicitly learn the progressive encoding of a continuous source, which is successively decoded with increasing levels of quality and with the aid of correlated side information. This setup refers to the successive refinement of the Wyner-Ziv coding problem. Assuming ideal Slepian-Wolf coding, our approach employs recurrent neural networks (RNNs) to learn layered encoders and decoders for the quadratic Gaussian case. The models are trained by minimizing a variational bound on the rate-distortion function of the successively refined Wyner-Ziv coding problem. We demonstrate that RNNs can explicitly retrieve layered binning solutions akin to scalable nested quantization. Moreover, the rate-distortion performance of the scheme is on par with the corresponding monolithic Wyner-Ziv coding approach and is close to the rate-distortion bound.
    摘要 我们提出了一种数据驱动的方法,用于显式学习连续信源的渐进编码,并在相关边信息的辅助下以逐步提升的质量进行解码。这一设置对应于Wyner-Ziv编码问题的逐次细化。假设理想的Slepian-Wolf编码,我们使用循环神经网络(RNN)学习二次高斯情形下的分层编码器和解码器。模型通过最小化逐次细化Wyner-Ziv编码问题率失真函数的变分上界来训练。我们展示了RNN可以显式恢复类似可扩展嵌套量化的分层分箱解。此外,该方案的率失真性能与对应的整体式Wyner-Ziv编码方法相当,并接近率失真界。

Personalizing Keyword Spotting with Speaker Information

  • paper_url: http://arxiv.org/abs/2311.03419
  • repo_url: None
  • paper_authors: Beltrán Labrador, Pai Zhu, Guanlong Zhao, Angelo Scorza Scarpati, Quan Wang, Alicia Lozano-Diez, Alex Park, Ignacio López Moreno
  • for: 提高关键词检测精度,特别是面临多种口音和年龄群体的挑战。
  • methods: 利用Feature-wise Linear Modulation(FiLM)方法,结合发音人信息进行关键词检测,并在输入音频和预存用户音频中提取发音人信息。
  • results: 在多样化 dataset 上进行测试,实现了关键词检测精度的明显提高,特别是面临少数批群的发音人。此外,提议的方法只需增加1%的参数数量,并无显著影响延迟和计算成本,适用于实际应用。
    Abstract Keyword spotting systems often struggle to generalize to a diverse population with various accents and age groups. To address this challenge, we propose a novel approach that integrates speaker information into keyword spotting using Feature-wise Linear Modulation (FiLM), a recent method for learning from multiple sources of information. We explore both Text-Dependent and Text-Independent speaker recognition systems to extract speaker information, and we experiment on extracting this information from both the input audio and pre-enrolled user audio. We evaluate our systems on a diverse dataset and achieve a substantial improvement in keyword detection accuracy, particularly among underrepresented speaker groups. Moreover, our proposed approach only requires a small 1% increase in the number of parameters, with a minimum impact on latency and computational cost, which makes it a practical solution for real-world applications.
    摘要 关键词检测系统往往难以泛化到口音和年龄各异的人群。为了解决这一挑战,我们提出了一种新方法,利用特征级线性调制(FiLM)这一从多信息源学习的最新方法,将说话人信息整合进关键词检测。我们同时探索了文本相关和文本无关两类说话人识别系统来提取说话人信息,并实验了从输入音频和预先注册的用户音频中提取这些信息。我们在一个多样化的数据集上评估了系统,关键词检测准确率得到显著提升,尤其是在代表性不足的说话人群体上。此外,所提方法只需增加约1%的参数量,对时延和计算开销的影响极小,因而是一种适用于实际应用的方案。
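FiLM conditioning itself is simple to express. The sketch below shows a minimal PyTorch module that predicts per-channel scale (gamma) and shift (beta) from a speaker embedding and applies them to keyword-spotting encoder features; the dimensions and the placement within the network are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class SpeakerFiLM(nn.Module):
    """Feature-wise Linear Modulation of acoustic features by a speaker embedding."""

    def __init__(self, speaker_dim: int, feature_dim: int):
        super().__init__()
        self.to_gamma = nn.Linear(speaker_dim, feature_dim)
        self.to_beta = nn.Linear(speaker_dim, feature_dim)

    def forward(self, features: torch.Tensor, speaker_emb: torch.Tensor) -> torch.Tensor:
        # features: (batch, time, feature_dim); speaker_emb: (batch, speaker_dim)
        gamma = self.to_gamma(speaker_emb).unsqueeze(1)  # (batch, 1, feature_dim)
        beta = self.to_beta(speaker_emb).unsqueeze(1)
        return gamma * features + beta                   # channel-wise modulation

# Usage with hypothetical shapes.
film = SpeakerFiLM(speaker_dim=256, feature_dim=64)
feats = torch.randn(8, 100, 64)   # encoder features for 8 utterances
spk = torch.randn(8, 256)         # speaker vectors from a recognition model
modulated = film(feats, spk)
```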

DRAUC: An Instance-wise Distributionally Robust AUC Optimization Framework

  • paper_url: http://arxiv.org/abs/2311.03055
  • repo_url: None
  • paper_authors: Siran Dai, Qianqian Xu, Zhiyong Yang, Xiaochun Cao, Qingming Huang
  • for: 本研究旨在优化长尾分类中的ROC曲线下面积(AUC)指标,以提升模型在实际应用中的性能。
  • methods: 本研究提出了一种基于分布鲁棒优化(DRO)的逐实例代理损失函数,即分布鲁棒AUC(DRAUC),并在其之上构建了优化框架;同时提出了更合适的分布感知DRAUC指标,以消除标签偏差。
  • results: 理论上证明当训练集规模足够大时,训练损失与测试误差之间的泛化差距会缩小;在多个带噪基准数据集上的实验验证了所提方法的有效性。
    Abstract The Area Under the ROC Curve (AUC) is a widely employed metric in long-tailed classification scenarios. Nevertheless, most existing methods primarily assume that training and testing examples are drawn i.i.d. from the same distribution, which is often unachievable in practice. Distributionally Robust Optimization (DRO) enhances model performance by optimizing it for the local worst-case scenario, but directly integrating AUC optimization with DRO results in an intractable optimization problem. To tackle this challenge, methodically we propose an instance-wise surrogate loss of Distributionally Robust AUC (DRAUC) and build our optimization framework on top of it. Moreover, we highlight that conventional DRAUC may induce label bias, hence introducing distribution-aware DRAUC as a more suitable metric for robust AUC learning. Theoretically, we affirm that the generalization gap between the training loss and testing error diminishes if the training set is sufficiently large. Empirically, experiments on corrupted benchmark datasets demonstrate the effectiveness of our proposed method. Code is available at: https://github.com/EldercatSAM/DRAUC.
    摘要 ROC曲线下面积(AUC)是长尾分类场景中广泛使用的评价指标。然而,大多数现有方法都假设训练样本与测试样本独立同分布地来自同一分布,这在实践中往往难以满足。分布鲁棒优化(DRO)通过针对局部最坏情形进行优化来提升模型性能,但将AUC优化与DRO直接结合会导致难以求解的优化问题。为应对这一挑战,我们提出了分布鲁棒AUC(DRAUC)的逐实例代理损失函数,并在其之上构建了优化框架。此外,我们指出传统DRAUC可能引入标签偏差,因此提出了分布感知的DRAUC,作为更适合鲁棒AUC学习的指标。理论上,我们证明当训练集足够大时,训练损失与测试误差之间的泛化差距会减小。在带噪基准数据集上的实验证明了所提方法的有效性。代码见:https://github.com/EldercatSAM/DRAUC。
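For context, AUC optimization is usually carried out through a pairwise surrogate loss over positive/negative score pairs. The sketch below shows a standard squared-hinge pairwise surrogate in PyTorch; it is not the instance-wise DRAUC loss from the paper, only the kind of AUC surrogate such methods build on.

```python
import torch

def pairwise_auc_surrogate(scores: torch.Tensor, labels: torch.Tensor,
                           margin: float = 1.0) -> torch.Tensor:
    """Squared-hinge pairwise surrogate for the AUC.

    Penalizes positive/negative score pairs whose gap falls below `margin`.
    Assumes the batch contains at least one positive and one negative sample.
    """
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    gaps = pos.unsqueeze(1) - neg.unsqueeze(0)        # (n_pos, n_neg) score gaps
    return torch.clamp(margin - gaps, min=0).pow(2).mean()

# Toy usage with random scores and binary labels.
scores = torch.randn(64, requires_grad=True)
labels = torch.randint(0, 2, (64,))
loss = pairwise_auc_surrogate(scores, labels)
loss.backward()
```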

Validity problems in clinical machine learning by indirect data labeling using consensus definitions

  • paper_url: http://arxiv.org/abs/2311.03037
  • repo_url: https://github.com/statnlp/ml4h_validity_problems
  • paper_authors: Michael Hagmann, Shigehiko Schamoni, Stefan Riezler
  • for: 这种研究探讨了机器学习在医疗领域的疾病诊断中的有效性问题。
  • methods: 这种研究使用了一种通用的检测方法,可以在训练数据中检测到问题的存在。
  • results: 研究发现,当目标标签在训练数据中是由间接测量决定的时,机器学习模型将学习仅仅重建已知的目标定义,而不会学习真实的含义。这些模型在相似构建的测试数据上表现出色,但在实际例子中将失败。
    Abstract We demonstrate a validity problem of machine learning in the vital application area of disease diagnosis in medicine. It arises when target labels in training data are determined by an indirect measurement, and the fundamental measurements needed to determine this indirect measurement are included in the input data representation. Machine learning models trained on this data will learn nothing else but to exactly reconstruct the known target definition. Such models show perfect performance on similarly constructed test data but will fail catastrophically on real-world examples where the defining fundamental measurements are not or only incompletely available. We present a general procedure allowing identification of problematic datasets and black-box machine learning models trained on them, and exemplify our detection procedure on the task of early prediction of sepsis.
    摘要 我们展示了机器学习在医学疾病诊断这一关键应用领域中的一个有效性问题。当训练数据中的目标标签由某种间接测量决定,而确定该间接测量所需的基础测量又包含在输入数据表示中时,该问题便会出现。在这类数据上训练的机器学习模型只会学到精确重构已知的目标定义。这样的模型在以同样方式构造的测试数据上表现完美,但在基础测量缺失或不完整的真实样本上会灾难性地失败。我们提出了一个通用流程,用于识别存在此类问题的数据集以及在其上训练的黑盒机器学习模型,并以脓毒症(sepsis)的早期预测任务为例展示了我们的检测流程。

On regularized polynomial functional regression

  • paper_url: http://arxiv.org/abs/2311.03036
  • repo_url: None
  • paper_authors: Markus Holzleitner, Sergei Pereverzyev
  • for: 这篇论文系统研究了多项式函数型回归(polynomial functional regression),并建立了一个新的有限样本界。
  • methods: 论文在一般的光滑性条件、容量条件和正则化技术下进行分析。
  • results: 数值实验表明,引入高阶多项式项可以提升回归性能。
    Abstract This article offers a comprehensive treatment of polynomial functional regression, culminating in the establishment of a novel finite sample bound. This bound encompasses various aspects, including general smoothness conditions, capacity conditions, and regularization techniques. In doing so, it extends and generalizes several findings from the context of linear functional regression as well. We also provide numerical evidence that using higher order polynomial terms can lead to an improved performance.
    摘要 本文对多项式函数型回归进行了全面的处理,最终建立了一个新的有限样本界。该界涵盖了一般的光滑性条件、容量条件以及正则化技术等多个方面,同时扩展并推广了线性函数型回归情形下的若干已有结果。我们还提供了数值证据,表明使用高阶多项式项可以带来性能提升。

Estimating treatment effects from single-arm trials via latent-variable modeling

  • paper_url: http://arxiv.org/abs/2311.03002
  • repo_url: https://github.com/manuelhaussmann/lvm_singlearm
  • paper_authors: Manuel Haussmann, Tran Minh Son Le, Viivi Halla-aho, Samu Kurki, Jussi Leinonen, Miika Koskinen, Samuel Kaski, Harri Lähdesmäki
  • for: 本研究旨在提供一种可行的替代方案:使用深度潜变量模型估计治疗效果,并能对缺失协变量观测的结构化缺失模式进行建模。
  • methods: 该方法利用摊销变分推断学习组别特定与可辨识的共享潜表示,进而可用于:(i)当治疗组缺乏结局数据时进行患者匹配,或(ii)当两组均有结局数据时直接估计治疗效果。
  • results: 与以往方法相比,我们的结果在直接治疗效果估计以及基于患者匹配的效应估计两方面均有提升。
    Abstract Randomized controlled trials (RCTs) are the accepted standard for treatment effect estimation but they can be infeasible due to ethical reasons and prohibitive costs. Single-arm trials, where all patients belong to the treatment group, can be a viable alternative but require access to an external control group. We propose an identifiable deep latent-variable model for this scenario that can also account for missing covariate observations by modeling their structured missingness patterns. Our method uses amortized variational inference to learn both group-specific and identifiable shared latent representations, which can subsequently be used for (i) patient matching if treatment outcomes are not available for the treatment group, or for (ii) direct treatment effect estimation assuming outcomes are available for both groups. We evaluate the model on a public benchmark as well as on a data set consisting of a published RCT study and real-world electronic health records. Compared to previous methods, our results show improved performance both for direct treatment effect estimation as well as for effect estimation via patient matching.
    摘要 随机对照试验(RCT)是治疗效果估计的公认标准,但出于伦理原因和高昂成本,有时并不可行。单臂试验(所有患者都属于治疗组)是一种可行的替代方案,但需要外部对照组。我们为这一场景提出了一个可辨识的深度潜变量模型,并通过对协变量观测的结构化缺失模式建模来处理缺失数据。该方法利用摊销变分推断同时学习组别特定的以及可辨识的共享潜表示,进而可用于:(i)当治疗组缺乏结局数据时进行患者匹配,或(ii)当两组都有结局数据时直接估计治疗效果。我们在一个公开基准以及一个由已发表的RCT研究和真实世界电子健康记录构成的数据集上评估了该模型。与以往方法相比,我们的结果在直接治疗效果估计和基于患者匹配的效应估计两方面均有提升。

Variational Weighting for Kernel Density Ratios

  • paper_url: http://arxiv.org/abs/2311.03001
  • repo_url: https://github.com/swyoon/variationally-weighted-kernel-density-estimation
  • paper_authors: Sangwoong Yoon, Frank C. Park, Gunsu S Yun, Iljung Kim, Yung-Kyun Noh
  • for: 提升机器学习中生成与判别任务里核密度估计(KDE)的精度。
  • methods: 基于多维变分法工具,推导出一个最优权重函数,以降低标准核密度估计在密度比估计中的偏差,从而改进预测后验和信息论度量的估计。
  • results: 提升了KDE在密度比估计中的精度,并从以KDE为主要构件的算法视角阐明了密度估计的一些基本问题。
    Abstract Kernel density estimation (KDE) is integral to a range of generative and discriminative tasks in machine learning. Drawing upon tools from the multidimensional calculus of variations, we derive an optimal weight function that reduces bias in standard kernel density estimates for density ratios, leading to improved estimates of prediction posteriors and information-theoretic measures. In the process, we shed light on some fundamental aspects of density estimation, particularly from the perspective of algorithms that employ KDEs as their main building blocks.
    摘要 核密度估计(KDE)是机器学习中众多生成与判别任务的重要组成部分。借助多维变分法的工具,我们推导出一个最优权重函数,可降低标准核密度估计在密度比估计中的偏差,从而改进预测后验和信息论度量的估计。在此过程中,我们也阐明了密度估计的一些基本问题,尤其是从以KDE作为主要构件的算法的视角来看。
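A minimal reference point for the weighting idea: the sketch below implements a 1-D weighted Gaussian KDE and forms a plug-in density-ratio estimate from two standard (uniformly weighted) KDEs. The paper's contribution is the optimal, variationally derived weight function that would replace the uniform weights here; that derivation is not reproduced.

```python
import numpy as np

def gaussian_kde(x_query, x_train, bandwidth, weights=None):
    # Weighted Gaussian KDE in 1D: p(x) = sum_i w_i * K_h(x - x_i), sum_i w_i = 1.
    if weights is None:
        weights = np.full(len(x_train), 1.0 / len(x_train))
    diffs = (x_query[:, None] - x_train[None, :]) / bandwidth
    kernels = np.exp(-0.5 * diffs ** 2) / (np.sqrt(2 * np.pi) * bandwidth)
    return kernels @ weights

# Plug-in density-ratio estimate r(x) = p1(x) / p2(x) from two standard KDEs;
# a learned/optimized weight vector could be passed via `weights` instead.
x1 = np.random.normal(0.0, 1.0, 500)
x2 = np.random.normal(0.5, 1.2, 500)
grid = np.linspace(-4, 4, 200)
ratio = gaussian_kde(grid, x1, 0.3) / gaussian_kde(grid, x2, 0.3)
```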

Strong statistical parity through fair synthetic data

  • paper_url: http://arxiv.org/abs/2311.03000
  • repo_url: None
  • paper_authors: Ivona Krchova, Michael Platzer, Paul Tiwald
  • for: 本研究旨在开发一种能够保护原始数据隐私的人工智能生成的假数据,同时满足用户和数据消费者的需求。
  • methods: 本研究使用了 Fairness by Design 的方法,通过平衡敏感特征的学习目标概率分布,使下游模型在各个阈值下做出公平预测。这种公平调整可以直接 incorporated into the sampling process of a synthetic generator or added as a post-processing step。
  • results: 研究发现,通过使用这种 Fairness by Design 的方法,可以在各个阈值下提供公平的预测结果,即即使从扭曲的原始数据中进行预测。此外,这种公平调整可以在不需要原始数据的假设和重新训练假数据生成器的情况下进行。
    Abstract AI-generated synthetic data, in addition to protecting the privacy of original data sets, allows users and data consumers to tailor data to their needs. This paper explores the creation of synthetic data that embodies Fairness by Design, focusing on the statistical parity fairness definition. By equalizing the learned target probability distributions of the synthetic data generator across sensitive attributes, a downstream model trained on such synthetic data provides fair predictions across all thresholds, that is, strong fair predictions even when inferring from biased, original data. This fairness adjustment can be either directly integrated into the sampling process of a synthetic generator or added as a post-processing step. The flexibility allows data consumers to create fair synthetic data and fine-tune the trade-off between accuracy and fairness without any previous assumptions on the data or re-training the synthetic data generator.
    摘要 人工生成的数据可以保护原始数据集的隐私,同时允许用户和数据消费者根据自己的需求修改数据。本文探讨使用 Fairness by Design 创建的合理数据,特点在于在敏感属性上均衡学习目标概率分布。通过在合理数据生成器中平衡学习目标概率分布,下游模型在基于偏见的原始数据上进行预测时提供了公平预测结果,无论预测阈值如何。这种公平调整可以直接集成到合理数据生成器的采样过程中,或者作为后处理步骤进行添加。这种灵活性允许数据消费者创建合理的数据并调整准确性和公平之间的平衡,无需对数据进行任何先前假设或重新训练合理数据生成器。
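Statistical parity can be checked directly on downstream predictions. The sketch below computes the statistical parity difference of a classifier's positive-prediction rates across a binary sensitive attribute, swept over thresholds; the data and threshold grid are hypothetical and only illustrate the fairness definition the paper targets.

```python
import numpy as np

def statistical_parity_difference(y_pred: np.ndarray, sensitive: np.ndarray) -> float:
    """Absolute gap in positive-prediction rates between the two sensitive groups.

    A value of 0 means statistical parity; the paper's goal is synthetic data
    whose downstream models keep this gap small at every decision threshold.
    """
    rate_a = y_pred[sensitive == 0].mean()
    rate_b = y_pred[sensitive == 1].mean()
    return float(abs(rate_a - rate_b))

# Hypothetical check across thresholds on held-out (possibly biased) real data.
scores = np.random.rand(1000)                 # scores from a downstream model
sensitive = np.random.randint(0, 2, 1000)     # binary sensitive attribute
gaps = [statistical_parity_difference(scores > t, sensitive)
        for t in np.linspace(0.1, 0.9, 9)]
```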

Hacking Cryptographic Protocols with Advanced Variational Quantum Attacks

  • paper_url: http://arxiv.org/abs/2311.02986
  • repo_url: None
  • paper_authors: Borja Aizpurua, Pablo Bermejo, Josu Etxezarreta Martinez, Roman Orus
  • for: 该论文提出了一种改进的量子袭击算法(VQAA)来攻击 symmetric-key 协议。
  • methods: 该论文使用了更加精准的量子计算和更多的精度来实现更有效的量子袭击。
  • results: 该论文的攻击成功率提高了对 symmetric-key 协议的攻击,并且可以使用更少的量子比特来实现。 更多的应用可以在 asymmetric-key 协议和哈希函数上进行。
    Abstract Here we introduce an improved approach to Variational Quantum Attack Algorithms (VQAA) on crytographic protocols. Our methods provide robust quantum attacks to well-known cryptographic algorithms, more efficiently and with remarkably fewer qubits than previous approaches. We implement simulations of our attacks for symmetric-key protocols such as S-DES, S-AES and Blowfish. For instance, we show how our attack allows a classical simulation of a small 8-qubit quantum computer to find the secret key of one 32-bit Blowfish instance with 24 times fewer number of iterations than a brute-force attack. Our work also shows improvements in attack success rates for lightweight ciphers such as S-DES and S-AES. Further applications beyond symmetric-key cryptography are also discussed, including asymmetric-key protocols and hash functions. In addition, we also comment on potential future improvements of our methods. Our results bring one step closer assessing the vulnerability of large-size classical cryptographic protocols with Noisy Intermediate-Scale Quantum (NISQ) devices, and set the stage for future research in quantum cybersecurity.
    摘要 我们提出了一种针对密码协议的改进变分量子攻击算法(VQAA)。该方法能够以明显更少的量子比特、更高的效率对知名密码算法发起稳健的量子攻击。我们对S-DES、S-AES和Blowfish等对称密钥协议实现了攻击仿真:例如,我们的攻击使一台8量子比特小型量子计算机的经典仿真能够以比穷举攻击少24倍的迭代次数找到一个32位Blowfish实例的密钥。我们的工作还提升了对S-DES和S-AES等轻量级密码的攻击成功率。文中还讨论了对非对称密钥协议和哈希函数等更多应用,以及方法未来可能的改进。我们的结果使利用含噪中等规模量子(NISQ)设备评估大规模经典密码协议脆弱性又近了一步,并为量子网络安全的后续研究奠定了基础。

The Pursuit of Human Labeling: A New Perspective on Unsupervised Learning

  • paper_url: http://arxiv.org/abs/2311.02940
  • repo_url: https://github.com/mlbio-epfl/hume
  • paper_authors: Artyom Gadetsky, Maria Brbic
  • for: This paper is written for inferring human labeling of a given dataset without any external supervision.
  • methods: The paper uses a simple model-agnostic framework called HUME, which utilizes the insight that classes defined by many human labelings are linearly separable regardless of the representation space used to represent a dataset.
  • results: The proposed optimization objective in HUME is strikingly well-correlated with the ground truth labeling of the dataset, and the framework achieves state-of-the-art performance on four benchmark image classification datasets, including the large-scale ImageNet-1000 dataset.
    Abstract We present HUME, a simple model-agnostic framework for inferring human labeling of a given dataset without any external supervision. The key insight behind our approach is that classes defined by many human labelings are linearly separable regardless of the representation space used to represent a dataset. HUME utilizes this insight to guide the search over all possible labelings of a dataset to discover an underlying human labeling. We show that the proposed optimization objective is strikingly well-correlated with the ground truth labeling of the dataset. In effect, we only train linear classifiers on top of pretrained representations that remain fixed during training, making our framework compatible with any large pretrained and self-supervised model. Despite its simplicity, HUME outperforms a supervised linear classifier on top of self-supervised representations on the STL-10 dataset by a large margin and achieves comparable performance on the CIFAR-10 dataset. Compared to the existing unsupervised baselines, HUME achieves state-of-the-art performance on four benchmark image classification datasets including the large-scale ImageNet-1000 dataset. Altogether, our work provides a fundamentally new view to tackle unsupervised learning by searching for consistent labelings between different representation spaces.
    摘要 我们提出了HUME,一个简单且与模型无关的框架,可在没有任何外部监督的情况下推断给定数据集的人类标注。该方法的核心洞见是:由多种人类标注定义的类别,无论用哪种表示空间来表示数据集,都是线性可分的。HUME利用这一洞见在数据集所有可能的标注上进行搜索,以发现其潜在的人类标注。我们表明,所提出的优化目标与数据集的真实标注高度相关。实际上,我们只需在训练期间保持固定的预训练表示之上训练线性分类器,因此该框架可与任何大型预训练及自监督模型兼容。尽管方法简单,HUME在STL-10数据集上大幅超越了基于自监督表示的有监督线性分类器,并在CIFAR-10上取得了相当的性能。与现有的无监督基线相比,HUME在包括大规模ImageNet-1000在内的四个图像分类基准数据集上达到了最先进的水平。总体而言,我们的工作提供了一种全新的视角:通过在不同表示空间之间寻找一致的标注来解决无监督学习问题。
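A rough illustration of the linear-separability insight: score a candidate labeling by how well linear classifiers on two fixed, pretrained representations can fit it. The sketch below uses scikit-learn's cross-validated logistic regression as the proxy score; this is a simplification for illustration, not HUME's actual optimization objective or search procedure, and the feature matrices are placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def labeling_score(candidate_labels, feats_a, feats_b):
    """Average cross-validated linear-probe accuracy of a candidate labeling
    in two *fixed* representation spaces of the same samples."""
    score_a = cross_val_score(LogisticRegression(max_iter=1000),
                              feats_a, candidate_labels, cv=3).mean()
    score_b = cross_val_score(LogisticRegression(max_iter=1000),
                              feats_b, candidate_labels, cv=3).mean()
    return (score_a + score_b) / 2

# Hypothetical usage with two frozen representations of the same 300 samples.
n = 300
feats_a = np.random.randn(n, 128)   # e.g. self-supervised embeddings
feats_b = np.random.randn(n, 64)    # e.g. features of another pretrained model
labels = np.random.randint(0, 3, n) # one candidate labeling to score
print(labeling_score(labels, feats_a, feats_b))
```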

Edge2Node: Reducing Edge Prediction to Node Classification

  • paper_url: http://arxiv.org/abs/2311.02921
  • repo_url: https://github.com/arahmatiiii/E2N
  • paper_authors: Zahed Rahmati, Ali Rahmati, Dariush Kazemi
  • for: 本研究旨在提高图 neural network 模型在图边预测 задании的性能。
  • methods: 我们提出了一种新的方法 called E2N (Edge2Node),它直接从图中获取每个边的嵌入,而不需要预定的评分函数。
  • results: 我们在 ogbl-ddi 和 ogbl-collab 数据集上进行实验,并取得了与状态对照方法的比较优秀的性能。在 ogbl-ddi 数据集上,我们在验证集上达到了 Hits@20 分数为 98.79%,并在测试集上达到了 98.11%。在 ogbl-collab 数据集上,我们在验证集上达到了 Hits@50 分数为 95.46%,并在测试集上达到了 95.15%。
    Abstract Despite the success of graph neural network models in node classification, edge prediction (the task of predicting missing or potential relationships between nodes in a graph) remains a challenging problem for these models. A common approach for edge prediction is to first obtain the embeddings of two nodes, and then a predefined scoring function is used to predict the existence of an edge between the two nodes. In this paper, we introduce a new approach called E2N (Edge2Node) which directly obtains an embedding for each edge, without the need for a scoring function. To do this, we create a new graph H based on the graph G given for the edge prediction task, and then reduce the edge prediction task on G to a node classification task on H. Our E2N method can be easily applied to any edge prediction task with superior performance and lower computational costs. For the ogbl-ddi and ogbl-collab datasets, our E2N method outperforms the state-of-the-art methods listed on the leaderboards. Our experiments on the ogbl-ddi dataset achieved a Hits@20 score of 98.79% on the validation set and 98.11% on the test set. On the ogbl-collab dataset, we achieved a Hits@50 score of 95.46% on the validation set and 95.15% on the test set.
    摘要 尽管图 neural network 模型在节点分类任务上表现出色,但Edge prediction(预测图中缺失或可能存在的边关系)仍然是这些模型的挑战。一种常见的方法 дляEdge prediction是先获取两个节点的嵌入,然后使用预定的分数函数预测两节点之间是否存在边。在这篇论文中,我们介绍了一种新的方法called E2N(Edge2Node),它可以直接从图G中获取每个边的嵌入,不需要预定的分数函数。我们首先创建了一个新的图H,基于给定的图G和Edge prediction任务。然后,我们将Edge prediction任务降低到图H上的节点分类任务。我们的E2N方法可以轻松应用于任何Edge prediction任务,并且性能更高,计算成本更低。在ogbl-ddi和ogbl-collab datasets上,我们的E2N方法超过了现有的状态对方法。在ogbl-ddi dataset上,我们的实验在验证集上达到了Hits@20分数为98.79%,并在测试集上达到了98.11%。在ogbl-collab dataset上,我们的实验在验证集上达到了Hits@50分数为95.46%,并在测试集上达到了95.15%。
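The reduction can be pictured with a small construction: each candidate node pair of G becomes an "edge node" in a new graph H, connected to its two endpoints, so edge prediction in G becomes node classification in H. The networkx sketch below illustrates this idea under that assumption; the paper's exact construction of H and the embedding pipeline may differ.

```python
import networkx as nx

def edge2node_graph(G: nx.Graph, candidate_pairs):
    """Build a graph H in which every candidate node pair of G becomes a node.

    An "edge node" (u, v) is linked to u and v; its label records whether the
    edge exists in G, turning edge prediction into node classification on H.
    """
    H = nx.Graph()
    H.add_nodes_from(G.nodes(data=True))
    for u, v in candidate_pairs:
        e = ("edge", u, v)
        H.add_node(e, label=int(G.has_edge(u, v)))
        H.add_edge(e, u)
        H.add_edge(e, v)
    return H

# Toy usage on a standard small graph with a few pairs to score.
G = nx.karate_club_graph()
candidates = [(0, 1), (0, 33), (5, 20)]
H = edge2node_graph(G, candidates)
```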

Distributed Matrix-Based Sampling for Graph Neural Network Training

  • paper_url: http://arxiv.org/abs/2311.02909
  • repo_url: https://github.com/djdprogramming/adfa2
  • paper_authors: Alok Tripathy, Katherine Yelick, Aydin Buluc
  • for: 这篇论文的主要贡献是提出了一些新方法,用于降低分布式GNN训练中采样阶段的通信量。
  • methods: 我们提出了一种基于矩阵的批量采样方法,将采样表示为稀疏矩阵乘法(SpGEMM),可一次采样多个小批次。当输入图的拓扑无法放入单个设备内存时,我们将图分布到多个设备上,并使用通信避免的SpGEMM算法来扩展GNN小批次采样,从而能在远超单设备内存容量的大图上训练GNN。
  • results: 实验表明,在$128$个GPU上、最大的Open Graph Benchmark (OGB)数据集上,我们的流水线在3层GraphSAGE网络上比Quiver(PyTorch-Geometric的分布式扩展)快$2.5\times$;在OGB之外的数据集上,每个epoch的时间提速$8.46\times$。最后,我们展示了图分布在多个GPU上时的可扩展性,以及逐节点与逐层两种采样算法的可扩展性。
    Abstract The primary contribution of this paper is new methods for reducing communication in the sampling step for distributed GNN training. Here, we propose a matrix-based bulk sampling approach that expresses sampling as a sparse matrix multiplication (SpGEMM) and samples multiple minibatches at once. When the input graph topology does not fit on a single device, our method distributes the graph and use communication-avoiding SpGEMM algorithms to scale GNN minibatch sampling, enabling GNN training on much larger graphs than those that can fit into a single device memory. When the input graph topology (but not the embeddings) fits in the memory of one GPU, our approach (1) performs sampling without communication, (2) amortizes the overheads of sampling a minibatch, and (3) can represent multiple sampling algorithms by simply using different matrix constructions. In addition to new methods for sampling, we show that judiciously replicating feature data with a simple all-to-all exchange can outperform current methods for the feature extraction step in distributed GNN training. We provide experimental results on the largest Open Graph Benchmark (OGB) datasets on $128$ GPUs, and show that our pipeline is $2.5\times$ faster Quiver (a distributed extension to PyTorch-Geometric) on a $3$-layer GraphSAGE network. On datasets outside of OGB, we show a $8.46\times$ speedup on $128$ GPUs in-per epoch time. Finally, we show scaling when the graph is distributed across GPUs and scaling for both node-wise and layer-wise sampling algorithms
    摘要 本文的主要贡献是一类新的方法,用于降低分布式GNN训练中采样步骤的通信量。我们提出了一种基于矩阵的批量采样方法,将采样表示为稀疏矩阵乘法(SpGEMM),并可一次采样多个小批次。当输入图的拓扑无法放入单个设备内存时,我们将图分布到多个设备上,并使用通信避免(communication-avoiding)的SpGEMM算法来扩展GNN小批次采样,从而能在远超单设备内存容量的大图上训练GNN。当输入图的拓扑(但不含嵌入)可以放入单个GPU内存时,我们的方法(1)无需通信即可完成采样,(2)摊销了小批次采样的开销,(3)只需使用不同的矩阵构造即可表示多种采样算法。除了新的采样方法之外,我们还表明,通过简单的all-to-all交换来有策略地复制特征数据,可以优于现有的分布式GNN训练特征提取方法。我们在$128$个GPU上对最大的Open Graph Benchmark(OGB)数据集给出了实验结果,表明我们的流水线在3层GraphSAGE网络上比Quiver(PyTorch-Geometric的分布式扩展)快$2.5\times$;在OGB之外的数据集上,每个epoch的时间在$128$个GPU上提速$8.46\times$。最后,我们展示了图分布在多个GPU上时的可扩展性,以及逐节点和逐层两种采样算法的可扩展性。
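The core trick, minibatch sampling expressed as a sparse matrix product, can be sketched on a single device with SciPy: a selection matrix for the seed nodes multiplied by the adjacency yields the candidate neighborhoods, which are then subsampled to a fixed fanout. This is a rough single-process illustration of the matrix-based formulation, not the paper's distributed, communication-avoiding SpGEMM implementation.

```python
import numpy as np
import scipy.sparse as sp

def sample_neighborhood(A: sp.csr_matrix, seed_nodes: np.ndarray,
                        fanout: int, rng) -> sp.csr_matrix:
    """Minibatch neighbour sampling written as a sparse matrix product.

    Q is a (batch x N) selection matrix for the seed nodes; Q @ A gives the
    adjacency rows restricted to the batch, from which `fanout` neighbours
    per seed are kept.
    """
    n = A.shape[0]
    batch = len(seed_nodes)
    Q = sp.csr_matrix((np.ones(batch), (np.arange(batch), seed_nodes)), shape=(batch, n))
    rows = (Q @ A).tolil()                      # SpGEMM: batch rows of A
    for i in range(batch):
        nbrs = rows.rows[i]
        if len(nbrs) > fanout:
            keep = set(rng.choice(nbrs, size=fanout, replace=False))
            rows.rows[i] = [j for j in nbrs if j in keep]
            rows.data[i] = [1.0] * len(rows.rows[i])
    return rows.tocsr()

# Toy usage on a random symmetric sparse graph.
rng = np.random.default_rng(0)
A = sp.random(1000, 1000, density=0.01, format="csr", random_state=0)
A = ((A + A.T) > 0).astype(float).tocsr()
sampled = sample_neighborhood(A, seed_nodes=rng.integers(0, 1000, size=32),
                              fanout=10, rng=rng)
```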

HDGL: A hierarchical dynamic graph representation learning model for brain disorder classification

  • paper_url: http://arxiv.org/abs/2311.02903
  • repo_url: None
  • paper_authors: Parniyan Jalali, Mehran Safayani
  • For: 本研究旨在提出一种 hierarchical dynamic graph representation learning(HDGL)模型,用于分类 brain disorders 和 healthy 样本。* Methods: 该模型包括两个层次,第一层是构建 brain network graphs,并学习其空间和时间嵌入,第二层是组成 population graphs,并进行分类 после嵌入学习。此外,基于这两个层次的训练方法,提出了四种方法来减少内存复杂度。* Results: 对 ABIDE 和 ADHD-200 datasets 进行评估,结果显示 HDGL 模型在多种评价指标上比多个现状模型表现更好。
    Abstract The human brain can be considered as complex networks, composed of various regions that continuously exchange their information with each other, forming the brain network graph, from which nodes and edges are extracted using resting-state functional magnetic resonance imaging (rs-fMRI). Therefore, this graph can potentially depict abnormal patterns that have emerged under the influence of brain disorders. So far, numerous studies have attempted to find embeddings for brain network graphs and subsequently classify samples with brain disorders from healthy ones, which include limitations such as: not considering the relationship between samples, not utilizing phenotype information, lack of temporal analysis, using static functional connectivity (FC) instead of dynamic ones and using a fixed graph structure. We propose a hierarchical dynamic graph representation learning (HDGL) model, which is the first model designed to address all the aforementioned challenges. HDGL consists of two levels, where at the first level, it constructs brain network graphs and learns their spatial and temporal embeddings, and at the second level, it forms population graphs and performs classification after embedding learning. Furthermore, based on how these two levels are trained, four methods have been introduced, some of which are suggested for reducing memory complexity. We evaluated the performance of the proposed model on the ABIDE and ADHD-200 datasets, and the results indicate the improvement of this model compared to several state-of-the-art models in terms of various evaluation metrics.
    摘要 人脑可以视为复杂网络,由多个区域组成,这些区域不断交换信息,形成了大脑网络图,从而可以潜在地描述了脑部疾病的异常模式。目前,许多研究已经尝试了将大脑网络图 embed 到另一个空间中,并将样本分类为健康或疾病。然而,这些研究存在一些限制,例如:不考虑样本之间的关系、不使用现象信息、缺乏时间分析、使用静态功能连接(FC)而不是动态连接、使用固定图结构。我们提出了层次动态图表学习(HDGL)模型,这是首个解决了所有以上挑战的模型。HDGL包括两层,在第一层,它将大脑网络图构建并学习其空间和时间嵌入,在第二层,它将人群图形成并进行分类 после嵌入学习。此外,根据这两层的训练方式,我们提出了四种方法,其中一些可以降低内存复杂性。我们对 ABIDE 和 ADHD-200 数据集进行了评估,结果表明,提案的模型在多种评估指标上比多种现状模型表现出色。

Transduce and Speak: Neural Transducer for Text-to-Speech with Semantic Token Prediction

  • paper_url: http://arxiv.org/abs/2311.02898
  • repo_url: None
  • paper_authors: Minchan Kim, Myeonghun Jeong, Byoung Jin Choi, Dongjune Lee, Nam Soo Kim
  • for: 这篇论文提出了一个基于神经转换器(neural transducer)的文本到语音(TTS)框架。
  • methods: 该论文使用离散化的语义token,先用神经转换器生成对齐的语义token,再用非自回归(NAR)语音生成器从语义token合成语音。
  • results: 实验结果表明,所提模型在零样本自适应TTS中超越了基线,在语音质量和说话人相似度方面均有显著提升;模型的推理速度与韵律可控性也得到了验证。
    Abstract We introduce a text-to-speech(TTS) framework based on a neural transducer. We use discretized semantic tokens acquired from wav2vec2.0 embeddings, which makes it easy to adopt a neural transducer for the TTS framework enjoying its monotonic alignment constraints. The proposed model first generates aligned semantic tokens using the neural transducer, then synthesizes a speech sample from the semantic tokens using a non-autoregressive(NAR) speech generator. This decoupled framework alleviates the training complexity of TTS and allows each stage to focus on 1) linguistic and alignment modeling and 2) fine-grained acoustic modeling, respectively. Experimental results on the zero-shot adaptive TTS show that the proposed model exceeds the baselines in speech quality and speaker similarity via objective and subjective measures. We also investigate the inference speed and prosody controllability of our proposed model, showing the potential of the neural transducer for TTS frameworks.
    摘要 我们介绍了一个基于神经抽象器的文本到语音(TTS)框架。我们使用从wav2vec2.0嵌入中获取的精度化 semantic token,这使得我们可以轻松地将神经抽象器应用于 TTS 框架,享受它的幂等对齐约束。我们的提议的模型首先使用神经抽象器生成对齐的 semantic token,然后使用非自然语言生成器(NAR)生成语音样本。这种分离的框架可以减轻 TTS 的训练复杂度,使每个阶段可以专注于1) 语言和对齐模型化和2) 细腻语音模型化。我们的实验结果表明,我们的提议模型在零实际适应 TTS 中超过了基线,在语音质量和发音相似度方面通过对象和主观度量表现出色。我们还调查了我们的提议模型的推理速度和语调可控性,表明神经抽象器在 TTS 框架中的潜在优势。

AdaFlood: Adaptive Flood Regularization

  • paper_url: http://arxiv.org/abs/2311.02891
  • repo_url: None
  • paper_authors: Wonho Bae, Yi Ren, Mohamad Osama Ahmed, Frederick Tung, Danica J. Sutherland, Gabriel L. Oliveira
  • for: 提高模型的测试时泛化能力
  • methods: 使用适应式洪水训练法,根据样本的难度动态调整训练损失的目标值
  • results: 在文本、图像、异步事件序列和表格等多种输入模式下,经验表明 AdaFlood 可以强大地适应不同的数据领域和噪声水平
    Abstract Although neural networks are conventionally optimized towards zero training loss, it has been recently learned that targeting a non-zero training loss threshold, referred to as a flood level, often enables better test time generalization. Current approaches, however, apply the same constant flood level to all training samples, which inherently assumes all the samples have the same difficulty. We present AdaFlood, a novel flood regularization method that adapts the flood level of each training sample according to the difficulty of the sample. Intuitively, since training samples are not equal in difficulty, the target training loss should be conditioned on the instance. Experiments on datasets covering four diverse input modalities - text, images, asynchronous event sequences, and tabular - demonstrate the versatility of AdaFlood across data domains and noise levels.
    摘要 尽管神经网络通常朝着零训练损失进行优化,但最近的研究发现,将一个非零的训练损失阈值(称为洪水水平,flood level)作为目标,往往能带来更好的测试时泛化。然而,现有方法对所有训练样本使用同一个恒定的洪水水平,这隐含地假设所有样本具有相同的难度。我们提出了AdaFlood,一种新的洪水正则化方法,可根据每个训练样本的难度自适应地调整其洪水水平。直观地说,既然训练样本的难度并不相同,目标训练损失就应当以具体样本为条件。在涵盖文本、图像、异步事件序列和表格四种不同输入模态的数据集上的实验表明,AdaFlood在不同数据领域和噪声水平下都具有广泛的适用性。
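Flood regularization has a simple per-sample form, loss_i → |loss_i − b_i| + b_i, so that gradient ascent kicks in once a sample's loss falls below its own level b_i. The PyTorch sketch below applies it with per-sample flood levels supplied as a tensor; how AdaFlood actually estimates the difficulty-dependent levels b_i is the paper's contribution and is only a placeholder here.

```python
import torch

def flooded_loss(per_sample_loss: torch.Tensor, flood_levels: torch.Tensor) -> torch.Tensor:
    # Per-sample flooding: L_i -> |L_i - b_i| + b_i, then averaged over the batch.
    return ((per_sample_loss - flood_levels).abs() + flood_levels).mean()

# Toy usage: classification loss with placeholder difficulty-dependent levels.
logits = torch.randn(32, 10, requires_grad=True)
targets = torch.randint(0, 10, (32,))
per_sample = torch.nn.functional.cross_entropy(logits, targets, reduction="none")
flood = torch.full((32,), 0.2)   # placeholder: AdaFlood would set these per sample
loss = flooded_loss(per_sample, flood)
loss.backward()
```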

MultiSPANS: A Multi-range Spatial-Temporal Transformer Network for Traffic Forecast via Structural Entropy Optimization

  • paper_url: http://arxiv.org/abs/2311.02880
  • repo_url: https://github.com/selgroup/multispans
  • paper_authors: Dongcheng Zou, Senzhang Wang, Xuefeng Li, Hao Peng, Yuandong Wang, Chunyang Liu, Kehua Sheng, Bo Zhang
  • for: 交通预测是一项复杂的多变量时间序列回归任务,对交通管理与规划至关重要,但现有方法往往难以建模复杂的多尺度依赖关系。
  • methods: 我们提出了MultiSPANS,包括用于生成信息丰富的ST-token嵌入的多滤波卷积模块、基于ST-token与时空位置编码并用Transformer捕获长程时空依赖,以及引入结构熵理论来优化空间注意力机制(利用结构熵最小化生成最优路网层级编码树,并据此设计相对结构熵位置编码与多层编码树注意力掩码)。
  • results: 在真实交通数据集上,我们的方法优于多种最先进方法,并能有效利用更长的历史窗口。代码见 https://github.com/SELGroup/MultiSPANS。
    Abstract Traffic forecasting is a complex multivariate time-series regression task of paramount importance for traffic management and planning. However, existing approaches often struggle to model complex multi-range dependencies using local spatiotemporal features and road network hierarchical knowledge. To address this, we propose MultiSPANS. First, considering that an individual recording point cannot reflect critical spatiotemporal local patterns, we design multi-filter convolution modules for generating informative ST-token embeddings to facilitate attention computation. Then, based on ST-token and spatial-temporal position encoding, we employ the Transformers to capture long-range temporal and spatial dependencies. Furthermore, we introduce structural entropy theory to optimize the spatial attention mechanism. Specifically, The structural entropy minimization algorithm is used to generate optimal road network hierarchies, i.e., encoding trees. Based on this, we propose a relative structural entropy-based position encoding and a multi-head attention masking scheme based on multi-layer encoding trees. Extensive experiments demonstrate the superiority of the presented framework over several state-of-the-art methods in real-world traffic datasets, and the longer historical windows are effectively utilized. The code is available at https://github.com/SELGroup/MultiSPANS.
    摘要 交通预测是一项复杂的多变量时间序列回归任务,对交通管理和规划至关重要。然而,现有方法往往难以利用局部时空特征和路网层级知识来建模复杂的多尺度依赖关系。为此,我们提出了MultiSPANS。首先,考虑到单个记录点无法反映关键的局部时空模式,我们设计了多滤波卷积模块来生成信息丰富的ST-token嵌入,以便进行注意力计算。随后,基于ST-token与时空位置编码,我们使用Transformer来捕获长程的时间与空间依赖。此外,我们引入结构熵理论来优化空间注意力机制:利用结构熵最小化算法生成最优的路网层级结构(即编码树),并据此提出了基于相对结构熵的位置编码以及基于多层编码树的多头注意力掩码方案。大量实验表明,所提框架在真实交通数据集上优于多种最先进方法,并能有效利用更长的历史窗口。代码见 https://github.com/SELGroup/MultiSPANS。

Exploring Active Learning in Meta-Learning: Enhancing Context Set Labeling

  • paper_url: http://arxiv.org/abs/2311.02879
  • repo_url: None
  • paper_authors: Wonho Bae, Jing Wang, Danica J. Sutherland
  • for: 本文针对meta-learning方法中的active learning问题进行研究,并提出了一种基于 Gaussian mixture 的选择点标注算法。
  • methods: 本文使用了active meta-learning方法,其中在meta-learning过程中选择点标注的部分使用了active learning。提出了一种基于 Gaussian mixture 的选择点标注算法,该算法简单且具有理论基础。
  • results: 对多个 benchmark 数据集进行测试,该算法在与其他state-of-the-art active learning方法相比而出色,显示了其高效性。
    Abstract Most meta-learning methods assume that the (very small) context set used to establish a new task at test time is passively provided. In some settings, however, it is feasible to actively select which points to label; the potential gain from a careful choice is substantial, but the setting requires major differences from typical active learning setups. We clarify the ways in which active meta-learning can be used to label a context set, depending on which parts of the meta-learning process use active learning. Within this framework, we propose a natural algorithm based on fitting Gaussian mixtures for selecting which points to label; though simple, the algorithm also has theoretical motivation. The proposed algorithm outperforms state-of-the-art active learning methods when used with various meta-learning algorithms across several benchmark datasets.
    摘要 大多数元学习方法假设测试时用于建立新任务的(非常小的)上下文集是被动给定的。然而,在某些场景下,可以主动选择要标注的点;精心选择所带来的潜在收益相当可观,但这种设置与典型的主动学习设置存在重大差异。我们阐明了主动元学习可以如何用于标注上下文集,这取决于元学习过程的哪些部分使用主动学习。在此框架内,我们提出了一种基于拟合高斯混合模型来选择标注点的自然算法;该算法虽然简单,但具有理论依据。在多个基准数据集上与不同的元学习算法结合使用时,该算法均优于最先进的主动学习方法。
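
As an illustration of how a fitted Gaussian mixture might drive the labeling choice, the sketch below picks, for each mixture component, the pool point with the highest responsibility. The budget-equals-components rule and the representative-per-component criterion are assumptions made for this toy example, not the paper's exact algorithm.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def select_context_points(features, budget, seed=0):
    """Pick `budget` unlabeled points to label by fitting a Gaussian mixture
    and taking the point with the highest responsibility for each component.

    features: (n_candidates, d) array of embeddings for the unlabeled pool.
    Returns indices into `features` of the points chosen for labeling.
    """
    gmm = GaussianMixture(n_components=budget, random_state=seed).fit(features)
    resp = gmm.predict_proba(features)               # (n_candidates, budget)
    chosen = []
    for k in range(budget):
        order = np.argsort(-resp[:, k])              # most representative first
        idx = next(i for i in order if i not in chosen)  # avoid duplicates
        chosen.append(int(idx))
    return chosen

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Toy pool drawn from three clusters; we may label only three points.
    pool = np.vstack([rng.normal(c, 0.3, size=(40, 2)) for c in (-2.0, 0.0, 2.0)])
    print(select_context_points(pool, budget=3))
```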

Sample Complexity Bounds for Estimating Probability Divergences under Invariances

  • paper_url: http://arxiv.org/abs/2311.02868
  • repo_url: None
  • paper_authors: Behrooz Tahmasebi, Stefanie Jegelka
  • for: This paper studies how the inherent invariance of data-generative distributions under smooth Lie group actions can improve the accuracy of divergence estimation.
  • methods: The analysis covers the Wasserstein distance, the Sobolev Integral Probability Metrics (Sobolev IPMs), the Maximum Mean Discrepancy (MMD), and the complexity of density estimation under such group invariances.
  • results: Exploiting the invariance yields a two-fold gain: the sample complexity is reduced by a multiplicative factor corresponding to the group size (for finite groups) or the normalized volume of the quotient space (for groups of positive dimension), and the exponent in the convergence rate improves (for groups of positive dimension). These results extend recent bounds for finite group actions and are entirely new for groups of positive dimension (a toy orbit-averaging sketch follows this entry).
    Abstract Group-invariant probability distributions appear in many data-generative models in machine learning, such as graphs, point clouds, and images. In practice, one often needs to estimate divergences between such distributions. In this work, we study how the inherent invariances, with respect to any smooth action of a Lie group on a manifold, improve sample complexity when estimating the Wasserstein distance, the Sobolev Integral Probability Metrics (Sobolev IPMs), the Maximum Mean Discrepancy (MMD), and also the complexity of the density estimation problem (in the $L^2$ and $L^\infty$ distance). Our results indicate a two-fold gain: (1) reducing the sample complexity by a multiplicative factor corresponding to the group size (for finite groups) or the normalized volume of the quotient space (for groups of positive dimension); (2) improving the exponent in the convergence rate (for groups of positive dimension). These results are completely new for groups of positive dimension and extend recent bounds for finite group actions.
    摘要 群不变的概率分布出现在机器学习的许多数据生成模型中,例如图、点云和图像。在实践中,我们常常需要估计这类分布之间的差异。在这项工作中,我们研究了分布在李群对流形的任意光滑作用下的固有不变性,如何改进估计 Wasserstein 距离、Sobolev 积分概率度量(Sobolev IPMs)、最大均值差异(MMD)时的样本复杂度,以及密度估计问题(在 $L^2$ 和 $L^\infty$ 距离下)的复杂度。我们的结果表明有两方面的收益:(1) 样本复杂度按一个乘法因子降低,该因子对应于群的大小(对于有限群)或商空间的归一化体积(对于正维数的群);(2) 收敛速率的指数得到改进(对于正维数的群)。这些结果对正维数的群而言是全新的,并推广了近期关于有限群作用的界。
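
The finite-group part of the claim can be illustrated with a toy experiment: when the two distributions are invariant under a finite rotation group, each sample can be replaced by its group orbit before computing a kernel divergence. The sketch below does this for MMD with a Gaussian kernel under the cyclic group $C_4$; the kernel bandwidth, group, and ring-shaped toy distributions are all illustrative assumptions, and the snippet does not reproduce the paper's estimators or rates.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    """Biased (V-statistic) estimate of squared MMD with a Gaussian kernel."""
    return (gaussian_kernel(x, x, sigma).mean()
            + gaussian_kernel(y, y, sigma).mean()
            - 2 * gaussian_kernel(x, y, sigma).mean())

def orbit_augment(x, group_size=4):
    """Replace each 2-D sample by its orbit under the cyclic rotation group C_k."""
    angles = 2 * np.pi * np.arange(group_size) / group_size
    rots = [np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]]) for a in angles]
    return np.concatenate([x @ r.T for r in rots], axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    def ring(scale):
        # Rotation-invariant toy distribution: random angle, radius ~ |N(0, scale)|.
        theta = rng.uniform(0, 2 * np.pi, n)
        r = np.abs(rng.normal(0, scale, n))
        return np.stack([r * np.cos(theta), r * np.sin(theta)], axis=1)
    x, y = ring(1.0), ring(1.3)
    print("plain MMD^2     :", mmd2(x, y))
    print("orbit-avg MMD^2 :", mmd2(orbit_augment(x), orbit_augment(y)))
```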

Barron Space for Graph Convolution Neural Networks

  • paper_url: http://arxiv.org/abs/2311.02838
  • repo_url: None
  • paper_authors: Seok-Young Chung, Qiyu Sun
  • for: This work studies the approximation power and learnability of graph convolutional neural networks (GCNNs), which operate on graph domains.
  • methods: A Barron space of functions on a compact domain of graph signals is introduced and shown to be a reproducing kernel Banach space that decomposes into the union of a family of reproducing kernel Hilbert spaces with neuron kernels and is dense in the space of continuous functions on the domain.
  • results: Outputs of GCNNs are contained in the Barron space, and functions in the Barron space can be well approximated by GCNN outputs in the integrated-square and uniform measurements; the Rademacher complexity of functions with bounded Barron norm is also estimated, implying that functions in the Barron space can be learned efficiently from random samples (a one-hidden-layer GCNN sketch follows this entry).
    Abstract Graph convolutional neural networks (GCNNs) operate on graph domains and have achieved superior performance on a wide range of tasks. In this paper, we introduce a Barron space of functions on a compact domain of graph signals. We prove that the proposed Barron space is a reproducing kernel Banach space, that it can be decomposed into the union of a family of reproducing kernel Hilbert spaces with neuron kernels, and that it can be dense in the space of continuous functions on the domain. The approximation property is one of the main principles guiding the design of neural networks. In this paper, we show that outputs of GCNNs are contained in the Barron space and that functions in the Barron space can be well approximated by outputs of some GCNNs in the integrated-square and uniform measurements. We also estimate the Rademacher complexity of functions with bounded Barron norm and conclude that functions in the Barron space can be learnt efficiently from their random samples.
    摘要 图卷积神经网络(GCNN)作用于图域,并在广泛的任务上取得了优异的性能。在本文中,我们引入了定义在图信号紧致域上的函数的 Barron 空间。我们证明了所提出的 Barron 空间是一个再生核巴拿赫空间,它可以分解为一族带有神经元核的再生核希尔伯特空间的并,并且可以在该域上的连续函数空间中稠密。逼近性质是设计神经网络的主要原则之一。本文表明,GCNN 的输出包含于 Barron 空间中,而 Barron 空间中的函数可以在积分平方度量与一致度量下被某些 GCNN 的输出很好地逼近。我们还估计了 Barron 范数有界的函数的 Rademacher 复杂度,并由此得出 Barron 空间中的函数可以从其随机样本中被高效学习。
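
A minimal example of the kind of network whose outputs the abstract places in the Barron space is a one-hidden-layer GCNN, $f(X)=\sum_k a_k\,\sigma(\hat A X w_k + b_k)$. The NumPy sketch below uses a symmetrically normalized adjacency and ReLU units; these are common choices assumed here for illustration and are not necessarily the exact propagation operator and activation analyzed in the paper.

```python
import numpy as np

def normalized_adjacency(adj):
    """Symmetrically normalized adjacency with self-loops: D^{-1/2} (A + I) D^{-1/2}."""
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a.sum(1)))
    return d_inv_sqrt @ a @ d_inv_sqrt

def one_hidden_layer_gcnn(adj, x, weights, outer, bias):
    """f(X) = sum_k outer_k * relu(A_hat X w_k + b_k), a single-hidden-layer GCNN.

    adj: (n, n) adjacency, x: (n, d) graph signal,
    weights: (m, d) hidden weights, outer: (m,) output weights, bias: (m,).
    Returns an (n,) output signal.
    """
    a_hat = normalized_adjacency(adj)
    hidden = np.maximum(a_hat @ x @ weights.T + bias, 0.0)   # (n, m) ReLU units
    return hidden @ outer                                    # (n,)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    adj = np.array([[0, 1, 0, 0],
                    [1, 0, 1, 1],
                    [0, 1, 0, 1],
                    [0, 1, 1, 0]], dtype=float)
    x = rng.normal(size=(4, 3))          # toy graph signal with 3 channels
    m = 16                               # number of hidden neurons
    w = rng.normal(size=(m, 3))
    a = rng.normal(size=m) / m
    b = np.zeros(m)
    print(one_hidden_layer_gcnn(adj, x, w, a, b))
```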

Prioritized Propagation in Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2311.02832
  • repo_url: None
  • paper_authors: Yao Cheng, Minjie Chen, Xiang Li, Caihua Shan, Ming Gao
  • for: This work aims to improve node-wise message propagation in graph neural networks (GNNs) by setting personalized propagation steps for different nodes.
  • methods: A versatile framework, PPro, integrates with most existing GNN models and learns prioritized node-wise propagation steps. It consists of three components: a backbone GNN model, a propagation controller that determines the optimal propagation steps for nodes, and a weight controller that computes node priority scores; a mutually enhanced mechanism couples node priority, optimal propagation step, and label prediction (a simplified propagation sketch follows this entry).
  • results: Extensive experiments on 8 benchmark datasets against 11 state-of-the-art competitors show that the framework achieves superior performance in terms of propagation strategies and node representations.
    Abstract Graph neural networks (GNNs) have recently received significant attention. Learning node-wise message propagation in GNNs aims to set personalized propagation steps for different nodes in the graph. Despite this success, existing methods ignore node priority, which can be reflected by node influence and heterophily. In this paper, we propose a versatile framework, PPro, which can be integrated with most existing GNN models and aims to learn prioritized node-wise message propagation in GNNs. Specifically, the framework consists of three components: a backbone GNN model, a propagation controller to determine the optimal propagation steps for nodes, and a weight controller to compute the priority scores for nodes. We design a mutually enhanced mechanism to compute node priority, optimal propagation step and label prediction. We also propose an alternative optimization strategy to learn the parameters in the backbone GNN model and the two parametric controllers. We conduct extensive experiments to compare our framework with 11 other state-of-the-art competitors on 8 benchmark datasets. Experimental results show that our framework can lead to superior performance in terms of propagation strategies and node representations.
    摘要 图神经网络(GNN)近来受到了广泛关注。在 GNN 中学习节点级消息传播,旨在为图中不同节点设置个性化的传播步数。尽管已取得成功,现有方法忽略了可以由节点影响力和异配性反映的节点优先级。在本文中,我们提出了一个通用框架 PPro,它可以与大多数现有 GNN 模型集成,用于学习 GNN 中带优先级的节点级消息传播。具体来说,该框架由三部分组成:一个骨干 GNN 模型、一个用于确定节点最优传播步数的传播控制器,以及一个用于计算节点优先级分数的权重控制器。我们设计了一种相互增强的机制来计算节点优先级、最优传播步数和标签预测,并提出了一种替代的优化策略来学习骨干 GNN 模型和两个参数化控制器中的参数。我们进行了大量实验,在 8 个基准数据集上将该框架与其他 11 个最先进的竞争方法进行比较。实验结果表明,我们的框架在传播策略和节点表示方面都能取得更优的性能。
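
A stripped-down sketch of node-wise personalized propagation follows: each node is assigned its own propagation depth and keeps the representation produced at that depth. In PPro the depths and priorities come from learned controllers; here they are fixed inputs, so the snippet only illustrates the propagation side of the framework.

```python
import numpy as np

def personalized_propagation(adj, x, steps_per_node):
    """Propagate node features, letting each node stop at its own depth.

    adj: (n, n) adjacency, x: (n, d) node features,
    steps_per_node: (n,) ints, the propagation depth chosen for each node
    (in PPro this would come from the learned propagation controller).
    Returns (n, d) node representations.
    """
    a = adj + np.eye(adj.shape[0])
    a_hat = a / a.sum(1, keepdims=True)            # row-normalized propagation
    out = x.copy()                                 # depth-0 nodes keep raw features
    h = x.copy()
    for step in range(1, int(steps_per_node.max()) + 1):
        h = a_hat @ h                              # one more round of smoothing
        picked = steps_per_node == step            # nodes that stop at this depth
        out[picked] = h[picked]
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    adj = (rng.random((6, 6)) < 0.4).astype(float)
    adj = np.triu(adj, 1)
    adj = adj + adj.T                              # symmetric, no self-loops
    x = rng.normal(size=(6, 4))
    steps = np.array([1, 3, 2, 0, 2, 1])           # hypothetical per-node depths
    print(personalized_propagation(adj, x, steps).shape)
```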

On Subagging Boosted Probit Model Trees

  • paper_url: http://arxiv.org/abs/2311.02827
  • repo_url: None
  • paper_authors: Tian Qin, Wei-Min Huang
  • for: This paper proposes a new hybrid bagging-boosting algorithm, SBPMT, for classification problems, motivated by the variance-bias decomposition.
  • methods: A new tree model, the Probit Model Tree (PMT), serves as the base classifier in the AdaBoost procedure. For the bagging part, instead of subsampling from the dataset at each boosting step, boosted PMTs are trained on each subagged dataset and combined into a powerful "committee", which can be viewed as an incomplete U-statistic (a simplified committee sketch follows this entry).
  • results: The theoretical analysis shows that (1) SBPMT is consistent under certain assumptions, (2) increasing the number of subagging rounds reduces the generalization error of SBPMT to some extent, and (3) a large number of ProbitBoost iterations in PMT lets SBPMT perform well with fewer AdaBoost steps. All three properties are verified on the well-known simulation of Mease and Wyner (2008), and the last two also provide useful guidance for model tuning. Compared with other state-of-the-art classification methods, SBPMT has competitive prediction power in general and performs significantly better in some cases.
    Abstract With the insight of variance-bias decomposition, we design a new hybrid bagging-boosting algorithm named SBPMT for classification problems. For the boosting part of SBPMT, we propose a new tree model called the Probit Model Tree (PMT) as the base classifier in the AdaBoost procedure. For the bagging part, instead of subsampling from the dataset at each step of boosting, we perform boosted PMTs on each subagged dataset and combine them into a powerful "committee", which can be viewed as an incomplete U-statistic. Our theoretical analysis shows that (1) SBPMT is consistent under certain assumptions, (2) increasing the number of subagging rounds can reduce the generalization error of SBPMT to some extent, and (3) a large number of ProbitBoost iterations in PMT can benefit the performance of SBPMT with fewer steps in the AdaBoost part. These three properties are verified by a famous simulation designed by Mease and Wyner (2008). The last two points also provide useful guidance in model tuning. A comparison of performance with other state-of-the-art classification methods illustrates that the proposed SBPMT algorithm has competitive prediction power in general and performs significantly better in some cases.
    摘要 基于方差-偏差分解的视角,我们为分类问题设计了一种新的混合 bagging-boosting 算法,命名为 SBPMT。在 boosting 部分,我们提出一种新的树模型,即概率单位模型树(Probit Model Tree, PMT),作为 AdaBoost 过程中的基分类器。在 bagging 部分,我们不在每一步 boosting 时从数据集中抽样,而是在每个 subagged 数据集上训练 boosted PMT,并将它们组合成一个强大的"委员会",它可以被视为一个不完全 U-统计量。我们的理论分析表明:(1) 在一定假设下 SBPMT 是一致的;(2) 增加 subagging 次数可以在一定程度上降低 SBPMT 的泛化误差;(3) PMT 中较多的 ProbitBoost 迭代可以让 SBPMT 在 AdaBoost 部分用更少的步数获得良好性能。这三条性质均在 Mease 和 Wyner (2008) 的著名模拟中得到验证,后两条也为模型调参提供了有用的指导。与其他最先进的分类方法相比,所提出的 SBPMT 算法总体上具有相当的预测能力,并在某些情况下表现显著更好。
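
The subagging-plus-boosting structure can be sketched with off-the-shelf components. In the snippet below, scikit-learn's AdaBoostClassifier with its default tree base learner stands in for the paper's boosted Probit Model Trees (PMTs are not available in standard libraries), and the committee averages predicted class probabilities over subsamples drawn without replacement.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def fit_subagged_boosters(X, y, n_bags=10, bag_frac=0.5, n_boost=100, seed=0):
    """Fit one boosted model per subsample (drawn without replacement)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    bag_size = int(bag_frac * n)
    models = []
    for b in range(n_bags):
        idx = rng.choice(n, size=bag_size, replace=False)   # subagging, not bootstrap
        model = AdaBoostClassifier(n_estimators=n_boost, random_state=seed + b)
        models.append(model.fit(X[idx], y[idx]))
    return models

def committee_predict(models, X):
    """Average class probabilities over the committee and take the arg-max."""
    proba = np.mean([m.predict_proba(X) for m in models], axis=0)
    return proba.argmax(axis=1)

if __name__ == "__main__":
    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    committee = fit_subagged_boosters(X_tr, y_tr)
    print("committee accuracy:", accuracy_score(y_te, committee_predict(committee, X_te)))
```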

Signal Processing Meets SGD: From Momentum to Filter

  • paper_url: http://arxiv.org/abs/2311.02818
  • repo_url: None
  • paper_authors: Zhipeng Yao, Guisong Chang, Jiaqi Zhang, Qi Zhang, Yu Zhang, Dazhou Li
  • for: This paper examines how the variance of historical gradients affects the current gradient estimate, aiming to improve deep learning optimizers by reducing that variance.
  • methods: A new variance-reduction-based optimization method is proposed that applies Wiener filter theory to enhance SGD's first-moment estimation; the resulting adaptive weight changes dynamically with the temporal fluctuation of gradient variance during deep learning model training (a simplified filtered-gradient sketch follows this entry).
  • results: Experiments show that the proposed adaptive-weight optimizer, SGDF (Stochastic Gradient Descent With Filter), achieves satisfactory performance compared with state-of-the-art optimizers.
    Abstract In the field of deep learning, Stochastic Gradient Descent (SGD) and its momentum-based variants are the predominant choices for optimization algorithms. Nevertheless, these momentum strategies, which accumulate historical gradients using a fixed $\beta$ hyperparameter to smooth the optimization process, often neglect the potential impact of the variance of historical gradients on the current gradient estimation. Fluctuation in the gradient variance during training indicates that the objective function does not satisfy the Lipschitz continuity condition at all times, which makes optimization troublesome. This paper explores the potential benefits of reducing the variance of historical gradients so that the optimizer converges to flat solutions. Moreover, we propose a new optimization method based on reducing this variance. We employ Wiener filter theory to enhance the first-moment estimation of SGD, notably introducing an adaptive weight into the optimizer. Specifically, the adaptive weight changes dynamically with the temporal fluctuation of the gradient variance during deep learning model training. Experimental results demonstrate that our proposed adaptive-weight optimizer, SGDF (Stochastic Gradient Descent With Filter), achieves satisfactory performance compared with state-of-the-art optimizers.
    摘要 在深度学习领域,随机梯度下降(SGD)及其基于动量的变体是最主要的优化算法。然而,这些动量策略使用固定的 $\beta$ 超参数累积历史梯度来平滑优化过程,往往忽略了历史梯度的方差对当前梯度估计的潜在影响。训练过程中梯度方差的波动表明目标函数并非始终满足 Lipschitz 连续条件,这会带来棘手的优化问题。本文旨在探讨减少历史梯度方差的潜在好处,使优化器收敛到平坦解。此外,我们提出了一种基于方差削减的新优化方法:利用维纳滤波(Wiener filter)理论增强 SGD 的一阶矩估计,并为优化器引入自适应权重。具体来说,该自适应权重会随训练过程中梯度方差的时间波动而动态变化。实验结果表明,我们提出的自适应权重优化器 SGDF(Stochastic Gradient Descent With Filter)与最先进的优化器相比能够取得令人满意的性能。
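
The following sketch is not the authors' SGDF update; it only illustrates, under assumed constants, how a Wiener/Kalman-style gain can blend the raw stochastic gradient with a running mean so that noisier gradients lean more on the smoothed estimate.

```python
import numpy as np

def filtered_sgd(grad_fn, w0, lr=0.1, beta=0.9, prior_var=1.0, eps=1e-8,
                 steps=200, seed=0):
    """SGD that replaces the raw stochastic gradient with a filtered estimate.

    The filter uses a per-coordinate gain k = prior_var / (prior_var + noise_var),
    a crude Wiener/Kalman-style correction of the running mean by the new gradient:
    when the estimated gradient noise is large, the update leans on the running mean.
    grad_fn(w, rng) must return a stochastic gradient at w.
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float).copy()
    mean = np.zeros_like(w)         # running first-moment estimate
    noise_var = np.ones_like(w)     # running estimate of gradient noise variance
    for _ in range(steps):
        g = grad_fn(w, rng)
        noise_var = beta * noise_var + (1 - beta) * (g - mean) ** 2
        k = prior_var / (prior_var + noise_var + eps)   # adaptive weight in (0, 1]
        filtered = mean + k * (g - mean)                # blend mean and new gradient
        mean = beta * mean + (1 - beta) * g
        w -= lr * filtered
    return w

if __name__ == "__main__":
    target = np.array([3.0, -2.0])
    def noisy_grad(w, rng):          # gradient of 0.5 * ||w - target||^2 plus noise
        return (w - target) + rng.normal(scale=0.5, size=w.shape)
    print(filtered_sgd(noisy_grad, np.zeros(2)))   # should approach [3, -2]
```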

APGL4SR: A Generic Framework with Adaptive and Personalized Global Collaborative Information in Sequential Recommendation

  • paper_url: http://arxiv.org/abs/2311.02816
  • repo_url: https://github.com/graph-team/apgl4sr
  • paper_authors: Mingjia Yin, Hao Wang, Xiang Xu, Likang Wu, Sirui Zhao, Wei Guo, Yong Liu, Ruiming Tang, Defu Lian, Enhong Chen
  • for: This paper proposes a graph-driven framework for sequential recommendation that exploits global collaborative information to improve recommendation performance.
  • methods: The framework, Adaptive and Personalized Graph Learning for Sequential Recommendation (APGL4SR), learns an adaptive global item graph in a self-supervised fashion (with an SVD-based accelerator) and injects personalized item correlations in the form of relative positional encodings; the whole framework is optimized in a multi-task paradigm (a toy global-item-graph sketch follows this entry).
  • results: As a generic framework, APGL4SR outperforms other baselines, including graph-based methods, by significant margins.
    Abstract The sequential recommendation system has been widely studied for its promising effectiveness in capturing dynamic preferences buried in users' sequential behaviors. Despite the considerable achievements, existing methods usually focus on intra-sequence modeling while overlooking the global collaborative information available through inter-sequence modeling, resulting in inferior recommendation performance. Therefore, previous works attempt to tackle this problem with a global collaborative item graph constructed by pre-defined rules. However, these methods neglect two crucial properties when capturing global collaborative information, i.e., adaptiveness and personalization, yielding sub-optimal user representations. To this end, we propose a graph-driven framework, named Adaptive and Personalized Graph Learning for Sequential Recommendation (APGL4SR), that incorporates adaptive and personalized global collaborative information into sequential recommendation systems. Specifically, we first learn an adaptive global graph among all items and use it to capture global collaborative information in a self-supervised fashion; its computational burden can be further alleviated by the proposed SVD-based accelerator. Furthermore, based on the graph, we propose to extract and utilize personalized item correlations in the form of relative positional encoding, which is a highly compatible manner of personalizing the utilization of global collaborative information. Finally, the entire framework is optimized in a multi-task learning paradigm, thus each part of APGL4SR can be mutually reinforced. As a generic framework, APGL4SR can outperform other baselines with significant margins. The code is available at https://github.com/Graph-Team/APGL4SR.
    摘要 序列推荐系统因其能够捕捉用户序列行为中蕴含的动态偏好而得到广泛研究。尽管已取得可观的成果,现有方法通常只关注序列内建模,而忽视了通过序列间建模利用全局协同信息,导致推荐性能欠佳。因此,以往工作尝试通过预定义规则构建全局协同物品图来解决这一问题,但这些方法在捕捉全局协同信息时忽略了两个关键性质,即自适应性和个性化,从而得到次优的用户表示。为此,我们提出了一个图驱动框架 APGL4SR(Adaptive and Personalized Graph Learning for Sequential Recommendation),将自适应且个性化的全局协同信息引入序列推荐系统。具体来说,我们首先在所有物品之间学习一个自适应全局图,并以自监督方式利用它捕捉全局协同信息,其计算开销可以通过所提出的基于 SVD 的加速器进一步降低。在此图的基础上,我们提出以相对位置编码的形式提取并利用个性化的物品相关性,这是个性化利用全局协同信息的一种高度兼容的方式。最后,整个框架在多任务学习范式下进行优化,使 APGL4SR 的各个部分相互增强。作为一个通用框架,APGL4SR 以显著优势超越其他基线方法。代码可在 https://github.com/Graph-Team/APGL4SR 获取。
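
As a toy stand-in for the learned global item graph, the sketch below builds a fixed co-occurrence graph from user sequences and extracts low-rank item embeddings with a truncated SVD (echoing the SVD-based accelerator). The window size, counting rule, and rank are illustrative assumptions; the actual framework learns the graph adaptively and adds personalized relative positional encodings on top.

```python
import numpy as np

def cooccurrence_graph(sequences, num_items, window=2):
    """Count how often two items appear within `window` steps of each other
    across all user sequences; a fixed stand-in for APGL4SR's learned graph."""
    g = np.zeros((num_items, num_items))
    for seq in sequences:
        for i, a in enumerate(seq):
            for b in seq[i + 1:i + 1 + window]:
                g[a, b] += 1.0
                g[b, a] += 1.0
    return g

def svd_item_embeddings(graph, rank=4):
    """Low-rank item embeddings from the global item graph via truncated SVD."""
    u, s, _ = np.linalg.svd(graph, hermitian=True)
    return u[:, :rank] * np.sqrt(s[:rank])

if __name__ == "__main__":
    sequences = [[0, 1, 2, 3], [1, 2, 4], [0, 2, 3, 4], [3, 4, 0]]   # toy user histories
    graph = cooccurrence_graph(sequences, num_items=5)
    emb = svd_item_embeddings(graph, rank=2)
    print(emb.shape)        # (5, 2) global collaborative item embeddings
```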

On the Intersection of Self-Correction and Trust in Language Models

  • paper_url: http://arxiv.org/abs/2311.02801
  • repo_url: None
  • paper_authors: Satyapriya Krishna
  • for: This paper investigates whether the self-correction capabilities of large language models (LLMs) can be harnessed to improve their trustworthiness.
  • methods: Experiments target two key aspects of trustworthiness: the truthfulness of responses on the task at hand and the toxicity of model outputs.
  • results: Self-correction can improve both toxicity and truthfulness, but the extent of the improvement varies with the specific aspect of trustworthiness and the nature of the task; the study also uncovers instances of "self-doubt" during self-correction, introducing a new set of challenges to address.
    Abstract Large Language Models (LLMs) have demonstrated remarkable capabilities in performing complex cognitive tasks. However, their complexity and lack of transparency have raised several trustworthiness concerns, including the propagation of misinformation and toxicity. Recent research has explored the self-correction capabilities of LLMs to enhance their performance. In this work, we investigate whether these self-correction capabilities can be harnessed to improve the trustworthiness of LLMs. We conduct experiments focusing on two key aspects of trustworthiness: truthfulness and toxicity. Our findings reveal that self-correction can lead to improvements in toxicity and truthfulness, but the extent of these improvements varies depending on the specific aspect of trustworthiness and the nature of the task. Interestingly, our study also uncovers instances of "self-doubt" in LLMs during the self-correction process, introducing a new set of challenges that need to be addressed.
    摘要