cs.LG - 2023-07-17

A Study on the Performance of Generative Pre-trained Transformer (GPT) in Simulating Depressed Individuals on the Standardized Depressive Symptom Scale

  • paper_url: http://arxiv.org/abs/2307.08576
  • repo_url: None
  • paper_authors: Sijin Cai, Nanfeng Zhang, Jiaying Zhu, Yanjie Liu, Yongjin Zhou
  • for: Evaluate the potential of GPT technology in diagnosing depression.
  • methods: Three depression assessment tools (HAMD-17, SDS, GDS-15) are used; in two experiments, GPT simulates the responses of individuals with depression and of normal individuals.
  • results: GPT performs accurately on depression assessments, producing responses consistent with both normal individuals and individuals with depression; its performance varies across depression severities and is better on scales with higher sensitivity.
    Abstract Background: Depression is a common mental disorder with societal and economic burden. Current diagnosis relies on self-reports and assessment scales, which have reliability issues. Objective approaches are needed for diagnosing depression. Objective: Evaluate the potential of GPT technology in diagnosing depression. Assess its ability to simulate individuals with depression and investigate the influence of depression scales. Methods: Three depression-related assessment tools (HAMD-17, SDS, GDS-15) were used. Two experiments simulated GPT responses to normal individuals and individuals with depression. Compare GPT's responses with expected results, assess its understanding of depressive symptoms, and performance differences under different conditions. Results: GPT's performance in depression assessment was evaluated. It aligned with scoring criteria for both individuals with depression and normal individuals. Some performance differences were observed based on depression severity. GPT performed better on scales with higher sensitivity. Conclusion: GPT accurately simulates individuals with depression and normal individuals during depression-related assessments. Deviations occur when simulating different degrees of depression, limiting understanding of mild and moderate cases. GPT performs better on scales with higher sensitivity, indicating potential for developing more effective depression scales. GPT has important potential in depression assessment, supporting clinicians and patients.

FedCME: Client Matching and Classifier Exchanging to Handle Data Heterogeneity in Federated Learning

  • paper_url: http://arxiv.org/abs/2307.08574
  • repo_url: None
  • paper_authors: Jun Nie, Danyang Xiao, Lei Yang, Weigang Wu
  • for: Address the data heterogeneity problem in federated learning in order to improve the convergence and performance of the global model.
  • methods: A new federated learning framework, FedCME, tackles heterogeneity through client matching and classifier exchanging: clients with very different data distributions are paired and swap classifiers partway through local training, which corrects the local training direction and alleviates local update divergence; a feature alignment method further strengthens training of the feature extractor (see the sketch below).
  • results: Experiments show that FedCME outperforms FedAvg, FedProx, MOON and FedRS on common federated learning benchmarks such as FMNIST and CIFAR10, particularly under data heterogeneity.
    Abstract Data heterogeneity across clients is one of the key challenges in Federated Learning (FL), which may slow down the global model convergence and even weaken global model performance. Most existing approaches tackle the heterogeneity by constraining local model updates through reference to global information provided by the server. This can alleviate the performance degradation on the aggregated global model. Different from existing methods, we focus on the information exchange between clients, which could also enhance the effectiveness of local training and lead to a high-performance global model. Concretely, we propose a novel FL framework named FedCME by client matching and classifier exchanging. In FedCME, clients with large differences in data distribution will be matched in pairs, and then the corresponding pair of clients will exchange their classifiers at an intermediate moment of local training. Since the local data determines the local model training direction, our method can correct the update direction of classifiers and effectively alleviate local update divergence. Besides, we propose feature alignment to enhance the training of the feature extractor. Experimental results demonstrate that FedCME performs better than FedAvg, FedProx, MOON and FedRS on popular federated learning benchmarks including FMNIST and CIFAR10, in the case where data are heterogeneous.
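The client-matching and classifier-exchanging steps described above can be illustrated with a short, self-contained sketch. The greedy pairing rule, the L1 distance between label distributions, and representing a classifier as a plain parameter dictionary are simplifying assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def match_clients(label_dists):
    """Greedily pair clients whose label distributions differ most (L1 distance)."""
    n = len(label_dists)
    dist = np.abs(label_dists[:, None, :] - label_dists[None, :, :]).sum(-1)
    unmatched, pairs = set(range(n)), []
    while len(unmatched) > 1:
        i, j = max(((a, b) for a in unmatched for b in unmatched if a < b),
                   key=lambda p: dist[p])
        pairs.append((i, j))
        unmatched -= {i, j}
    return pairs

def exchange_classifiers(clients, pairs):
    """Swap the classifier parameters of each matched pair mid-way through local training."""
    for i, j in pairs:
        clients[i]["classifier"], clients[j]["classifier"] = (
            clients[j]["classifier"], clients[i]["classifier"])

# Toy round with 4 clients and 3 classes.
dists = np.array([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8], [0.3, 0.4, 0.3]])
clients = [{"label_dist": d, "classifier": {"w": np.full((3, 8), k, dtype=float)}}
           for k, d in enumerate(dists)]
pairs = match_clients(dists)
exchange_classifiers(clients, pairs)
print(pairs)  # pairs of clients with highly dissimilar label distributions
```

After the exchange, each client continues local training with its partner's classifier head on its own data, which is what corrects the classifiers' update directions.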

Revisiting the Robustness of the Minimum Error Entropy Criterion: A Transfer Learning Case Study

  • paper_url: http://arxiv.org/abs/2307.08572
  • repo_url: https://github.com/lpsilvestrin/mee-finetune
  • paper_authors: Luis Pedro Silvestrin, Shujian Yu, Mark Hoogendoorn
  • for: Achieve good transfer learning performance under the distributional shifts that are common in real-world tasks.
  • methods: The study revisits the minimum error entropy (MEE) criterion, an objective widely used in statistical signal processing to handle non-Gaussian noise, and investigates its feasibility and usefulness in real-world transfer learning regression tasks (an MEE-style loss is sketched below).
  • results: Simply replacing the MSE loss with MEE in basic transfer learning algorithms achieves performance competitive with state-of-the-art transfer learning algorithms.
    Abstract Coping with distributional shifts is an important part of transfer learning methods in order to perform well in real-life tasks. However, most of the existing approaches in this area either focus on an ideal scenario in which the data does not contain noises or employ a complicated training paradigm or model design to deal with distributional shifts. In this paper, we revisit the robustness of the minimum error entropy (MEE) criterion, a widely used objective in statistical signal processing to deal with non-Gaussian noises, and investigate its feasibility and usefulness in real-life transfer learning regression tasks, where distributional shifts are common. Specifically, we put forward a new theoretical result showing the robustness of MEE against covariate shift. We also show that by simply replacing the mean squared error (MSE) loss with the MEE on basic transfer learning algorithms such as fine-tuning and linear probing, we can achieve competitive performance with respect to state-of-the-art transfer learning algorithms. We justify our arguments on both synthetic data and 5 real-world time-series data.
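The core of the method, replacing MSE with a minimum-error-entropy objective, can be sketched as a drop-in loss. Minimizing Renyi's quadratic entropy of the errors is equivalent to maximizing a Parzen-window "information potential" over pairwise error differences; the Gaussian kernel width sigma and the negative-log form below are common choices, not necessarily the paper's exact formulation.

```python
import torch

def mee_loss(pred, target, sigma=1.0):
    """MEE-style loss: minimize Renyi's quadratic entropy of the regression errors,
    estimated with a Gaussian Parzen window of width sigma."""
    e = (pred - target).reshape(-1)
    diff = e[:, None] - e[None, :]                          # pairwise error differences
    info_potential = torch.exp(-diff.pow(2) / (2 * sigma ** 2)).mean()
    return -torch.log(info_potential + 1e-12)               # lower entropy <=> higher potential

# Drop-in replacement for nn.MSELoss in a fine-tuning or linear-probing step.
pred = torch.randn(32, 1, requires_grad=True)
target = torch.randn(32, 1)
mee_loss(pred, target).backward()
```

Because the loss depends only on differences between errors, it is invariant to a constant shift of the predictions, so in practice the prediction bias is typically corrected separately.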

Deep Learning with Passive Optical Nonlinear Mapping

  • paper_url: http://arxiv.org/abs/2307.08558
  • repo_url: None
  • paper_authors: Fei Xia, Kyungduk Kim, Yaniv Eliezer, Liam Shaughnessy, Sylvain Gigan, Hui Cao
  • for: Develop an optics-based deep learning system that improves the performance and energy efficiency of AI hardware.
  • methods: Multiple scattering in a reverberating cavity passively induces optical nonlinear random mapping, without additional laser power, to improve computational performance.
  • results: Optical data compression combined with a digital decoder enables high-performance, high-compression-ratio real-time pedestrian detection and other computational tasks.
    Abstract Deep learning has fundamentally transformed artificial intelligence, but the ever-increasing complexity in deep learning models calls for specialized hardware accelerators. Optical accelerators can potentially offer enhanced performance, scalability, and energy efficiency. However, achieving nonlinear mapping, a critical component of neural networks, remains challenging optically. Here, we introduce a design that leverages multiple scattering in a reverberating cavity to passively induce optical nonlinear random mapping, without the need for additional laser power. A key advantage emerging from our work is that we show we can perform optical data compression, facilitated by multiple scattering in the cavity, to efficiently compress and retain vital information while also decreasing data dimensionality. This allows rapid optical information processing and generation of low dimensional mixtures of highly nonlinear features. These are particularly useful for applications demanding high-speed analysis and responses such as in edge computing devices. Utilizing rapid optical information processing capabilities, our optical platforms could potentially offer more efficient and real-time processing solutions for a broad range of applications. We demonstrate the efficacy of our design in improving computational performance across tasks, including classification, image reconstruction, key-point detection, and object detection, all achieved through optical data compression combined with a digital decoder. Notably, we observed high performance, at an extreme compression ratio, for real-time pedestrian detection. Our findings pave the way for novel algorithms and architectural designs for optical computing.

Machine-Learning-based Colorectal Tissue Classification via Acoustic Resolution Photoacoustic Microscopy

  • paper_url: http://arxiv.org/abs/2307.08556
  • repo_url: None
  • paper_authors: Shangqing Tong, Peng Ge, Yanan Jiao, Zhaofu Ma, Ziye Li, Longhai Liu, Feng Gao, Xiaohui Du, Fei Gao
  • for: An effective approach to colorectal cancer detection.
  • methods: Machine-learning-based classification of colorectal tissue using acoustic resolution photoacoustic microscopy (ARPAM).
  • results: Benign and malignant tissue are classified with multiple machine learning methods, and the results are analyzed quantitatively and qualitatively to evaluate the effectiveness of the approach.
    Abstract Colorectal cancer is a deadly disease that has become increasingly prevalent in recent years. Early detection is crucial for saving lives, but traditional diagnostic methods such as colonoscopy and biopsy have limitations. Colonoscopy cannot provide detailed information within the tissues affected by cancer, while biopsy involves tissue removal, which can be painful and invasive. In order to improve diagnostic efficiency and reduce patient suffering, we studied a machine-learning-based approach for colorectal tissue classification that uses acoustic resolution photoacoustic microscopy (ARPAM). With this tool, we were able to classify benign and malignant tissue using multiple machine learning methods. Our results were analyzed both quantitatively and qualitatively to evaluate the effectiveness of our approach.

Multi-class point cloud completion networks for 3D cardiac anatomy reconstruction from cine magnetic resonance images

  • paper_url: http://arxiv.org/abs/2307.08535
  • repo_url: None
  • paper_authors: Marcel Beetz, Abhirup Banerjee, Julius Ossenberg-Engels, Vicente Grau
  • for: Propose a fully automatic 3D cardiac surface reconstruction pipeline that derives multi-class 3D cardiac anatomy meshes from cine MRI acquisitions.
  • methods: The pipeline's key component is a multi-class point cloud completion network (PCCN) that corrects both the sparsity and the slice misalignment of the 3D reconstruction task in a unified model; it is evaluated on a large synthetic dataset of biventricular anatomies.
  • results: The PCCN achieves Chamfer distances below or similar to the underlying image resolution across multiple levels of slice misalignment (the metric is sketched below), and reduces reconstruction error relative to a benchmark 3D U-Net by 32% (Hausdorff distance) and 24% (mean surface distance). Applied to 1000 UK Biobank subjects, the pipeline reconstructs accurate and topologically plausible biventricular heart meshes with clinical metrics comparable to the previous literature, and it remains robust to multiple common outlier conditions.
    Abstract Cine magnetic resonance imaging (MRI) is the current gold standard for the assessment of cardiac anatomy and function. However, it typically only acquires a set of two-dimensional (2D) slices of the underlying three-dimensional (3D) anatomy of the heart, thus limiting the understanding and analysis of both healthy and pathological cardiac morphology and physiology. In this paper, we propose a novel fully automatic surface reconstruction pipeline capable of reconstructing multi-class 3D cardiac anatomy meshes from raw cine MRI acquisitions. Its key component is a multi-class point cloud completion network (PCCN) capable of correcting both the sparsity and misalignment issues of the 3D reconstruction task in a unified model. We first evaluate the PCCN on a large synthetic dataset of biventricular anatomies and observe Chamfer distances between reconstructed and gold standard anatomies below or similar to the underlying image resolution for multiple levels of slice misalignment. Furthermore, we find a reduction in reconstruction error compared to a benchmark 3D U-Net by 32% and 24% in terms of Hausdorff distance and mean surface distance, respectively. We then apply the PCCN as part of our automated reconstruction pipeline to 1000 subjects from the UK Biobank study in a cross-domain transfer setting and demonstrate its ability to reconstruct accurate and topologically plausible biventricular heart meshes with clinical metrics comparable to the previous literature. Finally, we investigate the robustness of our proposed approach and observe its capacity to successfully handle multiple common outlier conditions.
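For reference, the Chamfer distance reported above can be computed between two point clouds as follows; conventions differ (squared vs. Euclidean distances, sum vs. mean), so this is one common symmetric form rather than necessarily the paper's exact definition.

```python
import numpy as np

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between point clouds p (N, 3) and q (M, 3)."""
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)   # (N, M) pairwise distances
    return d.min(axis=1).mean() + d.min(axis=0).mean()

p = np.random.rand(500, 3)
q = p + 0.01 * np.random.randn(500, 3)    # slightly perturbed copy of p
print(chamfer_distance(p, q))             # small for well-aligned clouds
```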

Nonlinear Processing with Linear Optics

  • paper_url: http://arxiv.org/abs/2307.08533
  • repo_url: None
  • paper_authors: Mustafa Yildirim, Niyazi Ulas Dinc, Ilker Oguz, Demetri Psaltis, Christophe Moser
  • for: Realize multilayer optical networks without resorting to electronic components.
  • methods: A new framework uses multiple scattering to synthesize programmable linear and nonlinear transformations concurrently, enabling nonlinear optical computing at low power with continuous-wave light.
  • results: Theoretical and experimental investigations show that repeating the data through multiple scattering enables nonlinear optical computing with low-power continuous-wave light.
    Abstract Deep neural networks have achieved remarkable breakthroughs by leveraging multiple layers of data processing to extract hidden representations, albeit at the cost of large electronic computing power. To enhance energy efficiency and speed, the optical implementation of neural networks aims to harness the advantages of optical bandwidth and the energy efficiency of optical interconnections. In the absence of low-power optical nonlinearities, the challenge in the implementation of multilayer optical networks lies in realizing multiple optical layers without resorting to electronic components. In this study, we present a novel framework that uses multiple scattering that is capable of synthesizing programmable linear and nonlinear transformations concurrently at low optical power by leveraging the nonlinear relationship between the scattering potential, represented by data, and the scattered field. Theoretical and experimental investigations show that repeating the data by multiple scattering enables non-linear optical computing at low power continuous wave light.

LuckyMera: a Modular AI Framework for Building Hybrid NetHack Agents

  • paper_url: http://arxiv.org/abs/2307.08532
  • repo_url: https://github.com/pervasive-ai-lab/luckymera
  • paper_authors: Luigi Quarantiello, Simone Marzeddu, Antonio Guzzi, Vincenzo Lomonaco
  • for: Propose a configurable and extensible AI framework for testing and training AI agents on the game of NetHack.
  • methods: The framework combines symbolic and neural (reinforcement learning) skill modules and provides utilities to save experiences as trajectories and use them to train neural modules.
  • results: An empirical evaluation validates the skill implementations and yields a strong baseline agent that reaches state-of-the-art performance in the complete NetHack game.
    Abstract In the last few decades we have witnessed a significant development in Artificial Intelligence (AI) thanks to the availability of a variety of testbeds, mostly based on simulated environments and video games. Among those, roguelike games offer a very good trade-off in terms of complexity of the environment and computational costs, which makes them perfectly suited to test AI agents generalization capabilities. In this work, we present LuckyMera, a flexible, modular, extensible and configurable AI framework built around NetHack, a popular terminal-based, single-player roguelike video game. This library is aimed at simplifying and speeding up the development of AI agents capable of successfully playing the game and offering a high-level interface for designing game strategies. LuckyMera comes with a set of off-the-shelf symbolic and neural modules (called "skills"): these modules can be either hard-coded behaviors, or neural Reinforcement Learning approaches, with the possibility of creating compositional hybrid solutions. Additionally, LuckyMera comes with a set of utility features to save its experiences in the form of trajectories for further analysis and to use them as datasets to train neural modules, with a direct interface to the NetHack Learning Environment and MiniHack. Through an empirical evaluation we validate our skills implementation and propose a strong baseline agent that can reach state-of-the-art performances in the complete NetHack game. LuckyMera is open-source and available at https://github.com/Pervasive-AI-Lab/LuckyMera.

Synthetic Lagrangian Turbulence by Generative Diffusion Models

  • paper_url: http://arxiv.org/abs/2307.08529
  • repo_url: https://github.com/smartturb/diffusion-lagr
  • paper_authors: Tianyi Li, Luca Biferale, Fabio Bonaccorso, Martino Andrea Scarpolini, Michele Buzzicotti
  • for: Generate single-particle trajectories in three-dimensional turbulence at high Reynolds numbers using a machine learning approach.
  • methods: The paper proposes a state-of-the-art Diffusion Model to generate the trajectories, bypassing the need for direct numerical simulations or experiments to obtain reliable Lagrangian data.
  • results: The model quantitatively reproduces all relevant statistical benchmarks over the entire range of time scales, including fat-tailed distributions of the velocity increments, anomalous power laws, and enhanced intermittency around the dissipative scale, and it generalizes well to extreme events of unprecedented intensity and rarity.
    Abstract Lagrangian turbulence lies at the core of numerous applied and fundamental problems related to the physics of dispersion and mixing in engineering, bio-fluids, atmosphere, oceans, and astrophysics. Despite exceptional theoretical, numerical, and experimental efforts conducted over the past thirty years, no existing models are capable of faithfully reproducing statistical and topological properties exhibited by particle trajectories in turbulence. We propose a machine learning approach, based on a state-of-the-art Diffusion Model, to generate single-particle trajectories in three-dimensional turbulence at high Reynolds numbers, thereby bypassing the need for direct numerical simulations or experiments to obtain reliable Lagrangian data. Our model demonstrates the ability to quantitatively reproduce all relevant statistical benchmarks over the entire range of time scales, including the presence of fat tails distribution for the velocity increments, anomalous power law, and enhancement of intermittency around the dissipative scale. The model exhibits good generalizability for extreme events, achieving unprecedented intensity and rarity. This paves the way for producing synthetic high-quality datasets for pre-training various downstream applications of Lagrangian turbulence.

Multi-Domain Learning with Modulation Adapters

  • paper_url: http://arxiv.org/abs/2307.08528
  • repo_url: None
  • paper_authors: Ekaterina Iakovleva, Karteek Alahari, Jakob Verbeek
  • for: Image classification across multiple visual domains.
  • methods: The paper introduces Modulation Adapters, which update the convolutional filter weights of the model in a multiplicative manner for each task; parameterizing the adaptation weights in a factored manner allows flexible scaling of the number of per-task parameters and different parameter-accuracy trade-offs (see the sketch below).
  • results: The approach yields excellent results on the Visual Decathlon challenge and the ImageNet-to-Sketch benchmark, with accuracies comparable to or better than existing state-of-the-art approaches.
    Abstract Deep convolutional networks are ubiquitous in computer vision, due to their excellent performance across different tasks for various domains. Models are, however, often trained in isolation for each task, failing to exploit relatedness between tasks and domains to learn more compact models that generalise better in low-data regimes. Multi-domain learning aims to handle related tasks, such as image classification across multiple domains, simultaneously. Previous work on this problem explored the use of a pre-trained and fixed domain-agnostic base network, in combination with smaller learnable domain-specific adaptation modules. In this paper, we introduce Modulation Adapters, which update the convolutional filter weights of the model in a multiplicative manner for each task. Parameterising these adaptation weights in a factored manner allows us to scale the number of per-task parameters in a flexible manner, and to strike different parameter-accuracy trade-offs. We evaluate our approach on the Visual Decathlon challenge, composed of ten image classification tasks across different domains, and on the ImageNet-to-Sketch benchmark, which consists of six image classification tasks. Our approach yields excellent results, with accuracies that are comparable to or better than those of existing state-of-the-art approaches.
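The Modulation Adapter idea, multiplicatively modulating shared convolutional filters with per-task factored weights, can be sketched as a PyTorch layer. The rank-1 (output-channel by input-channel) factorization and the initialization below are illustrative assumptions, not the paper's exact parameterization.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModulatedConv2d(nn.Module):
    """Convolution whose shared filters are modulated multiplicatively per task."""
    def __init__(self, in_ch, out_ch, k, num_tasks, padding=1):
        super().__init__()
        self.weight = nn.Parameter(0.02 * torch.randn(out_ch, in_ch, k, k))  # shared, domain-agnostic filters
        self.u = nn.Parameter(torch.ones(num_tasks, out_ch, 1))              # per-task output-channel factors
        self.v = nn.Parameter(torch.ones(num_tasks, 1, in_ch))               # per-task input-channel factors
        self.padding = padding

    def forward(self, x, task_id):
        mod = (self.u[task_id] @ self.v[task_id]).unsqueeze(-1).unsqueeze(-1)  # (out_ch, in_ch, 1, 1)
        return F.conv2d(x, self.weight * mod, padding=self.padding)            # task-specific filters

layer = ModulatedConv2d(3, 16, 3, num_tasks=10)
y = layer(torch.randn(2, 3, 32, 32), task_id=4)   # same backbone, task-4 modulation
print(y.shape)                                    # torch.Size([2, 16, 32, 32])
```

Increasing the rank of the per-task factors is one way to scale the per-task parameter count, in the spirit of the parameter-accuracy trade-off mentioned above.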

Image Captions are Natural Prompts for Text-to-Image Models

  • paper_url: http://arxiv.org/abs/2307.08526
  • repo_url: None
  • paper_authors: Shiye Lei, Hao Chen, Sen Zhang, Bo Zhao, Dacheng Tao
  • for: Improve the informativeness and diversity of synthetic training data produced by text-to-image generative models.
  • methods: An advanced captioning model captions each real image; the caption is concatenated with the class name and used as the prompt when synthesizing training images.
  • results: Extensive experiments on ImageNette, ImageNet-100 and ImageNet-1K verify that the method significantly improves models trained on synthetic data, with a 10% average improvement in classification accuracy.
    Abstract With the rapid development of Artificial Intelligence Generated Content (AIGC), it has become common practice in many learning tasks to train or fine-tune large models on synthetic data due to the data-scarcity and privacy leakage problems. Albeit promising with unlimited data generation, owing to massive and diverse information conveyed in real images, it is challenging for text-to-image generative models to synthesize informative training data with hand-crafted prompts, which usually leads to inferior generalization performance when training downstream models. In this paper, we theoretically analyze the relationship between the training effect of synthetic data and the synthetic data distribution induced by prompts. Then we correspondingly propose a simple yet effective method that prompts text-to-image generative models to synthesize more informative and diverse training data. Specifically, we caption each real image with the advanced captioning model to obtain informative and faithful prompts that extract class-relevant information and clarify the polysemy of class names. The image captions and class names are concatenated to prompt generative models for training image synthesis. Extensive experiments on ImageNette, ImageNet-100, and ImageNet-1K verify that our method significantly improves the performance of models trained on synthetic training data, i.e., 10% classification accuracy improvements on average.

Results on Counterfactual Invariance

  • paper_url: http://arxiv.org/abs/2307.08519
  • repo_url: None
  • paper_authors: Jake Fawkes, Robin J. Evans
  • for: Provide a theoretical analysis of counterfactual invariance.
  • methods: The paper presents a variety of existing definitions (the standard one is recalled below), studies how they relate to each other, and derives their graphical implications.
  • results: Counterfactual invariance implies conditional independence, but conditional independence provides no information about the degree or likelihood of satisfying counterfactual invariance; moreover, for discrete causal models, counterfactually invariant functions are often restricted to being functions of particular variables, or even constant.
    Abstract In this paper we provide a theoretical analysis of counterfactual invariance. We present a variety of existing definitions, study how they relate to each other and what their graphical implications are. We then turn to the current major question surrounding counterfactual invariance, how does it relate to conditional independence? We show that whilst counterfactual invariance implies conditional independence, conditional independence does not give any implications about the degree or likelihood of satisfying counterfactual invariance. Furthermore, we show that for discrete causal models counterfactually invariant functions are often constrained to be functions of particular variables, or even constant.
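For readers new to the terminology, the notion analyzed here is usually defined as follows (the paper surveys several variants; this is the common form):

```latex
% A function f is counterfactually invariant with respect to A if changing A does not change f(X):
f\big(X(a)\big) = f\big(X(a')\big) \quad \text{almost surely, for all values } a, a' \text{ of } A,
```

where X(a) denotes the counterfactual value of X under an intervention setting A = a. The paper's result is that this property implies a conditional independence statement, while conditional independence alone says nothing about how close a function is to being counterfactually invariant.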

Kernel-Based Testing for Single-Cell Differential Analysis

  • paper_url: http://arxiv.org/abs/2307.08509
  • repo_url: https://github.com/anthoozier/kernel_testsda
  • paper_authors: Anthony Ozier-Lafontaine, Camille Fourneaux, Ghislain Durif, Céline Vallot, Olivier Gandrillon, Sandrine Giraud, Bertrand Michel, Franck Picard
  • for: Compare the distributions of molecular features across single cells, such as gene expression and epigenomic modifications.
  • methods: A non-linear comparison framework based on kernel embeddings that supports both feature-wise analyses and global comparisons of transcriptomes or epigenomes while accounting for their intricate dependencies (the underlying kernel two-sample idea is sketched below).
  • results: The method uncovers heterogeneities in cell populations, identifies cells in transition between reversion and differentiation stages, and, on single-cell ChIP-Seq data, reveals a subpopulation of untreated breast cancer cells whose epigenomic profile resembles persister cells.
    Abstract Single-cell technologies have provided valuable insights into the distribution of molecular features, such as gene expression and epigenomic modifications. However, comparing these complex distributions in a controlled and powerful manner poses methodological challenges. Here we propose to benefit from the kernel-testing framework to compare the complex cell-wise distributions of molecular features in a non-linear manner based on their kernel embedding. Our framework not only allows for feature-wise analyses but also enables global comparisons of transcriptomes or epigenomes, considering their intricate dependencies. By using a classifier to discriminate cells based on the variability of their embedding, our method uncovers heterogeneities in cell populations that would otherwise go undetected. We show that kernel testing overcomes the limitations of differential analysis methods dedicated to single-cell. Kernel testing is applied to investigate the reversion process of differentiating cells, successfully identifying cells in transition between reversion and differentiation stages. Additionally, we analyze single-cell ChIP-Seq data and identify a subpopulation of untreated breast cancer cells that exhibit an epigenomic profile similar to persister cells.
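The kernel two-sample comparison underlying this framework can be illustrated with a maximum mean discrepancy (MMD) estimate between two groups of cells; the paper's actual statistic, kernel choice and test calibration may differ, so this shows only the basic building block.

```python
import numpy as np

def rbf(a, b, gamma=1.0):
    """RBF kernel matrix between the rows of a and b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2(x, y, gamma=1.0):
    """Biased estimate of the squared MMD between two samples of cell feature vectors."""
    return rbf(x, x, gamma).mean() + rbf(y, y, gamma).mean() - 2.0 * rbf(x, y, gamma).mean()

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(200, 50))    # e.g. expression profiles under condition A
y = rng.normal(0.3, 1.0, size=(200, 50))    # condition B with a shifted mean
x2 = rng.normal(0.0, 1.0, size=(200, 50))   # independent sample from the same distribution as x
print(mmd2(x, y), mmd2(x, x2))              # the second value is close to zero
```

A permutation test over the pooled cells then turns this statistic into a p-value.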

Efficient and Accurate Optimal Transport with Mirror Descent and Conjugate Gradients

  • paper_url: http://arxiv.org/abs/2307.08507
  • repo_url: https://github.com/adaptive-agents-lab/mdot-pncg
  • paper_authors: Mete Kemertas, Allan D. Jepson, Amir-massoud Farahmand
  • for: A new optimal transport algorithm that draws on entropic optimal transport, mirror descent, and conjugate gradients.
  • methods: The algorithm computes optimal transport costs to arbitrary accuracy without running into numerical stability issues; it is implemented efficiently on GPUs and in many cases converges faster than traditional algorithms such as Sinkhorn's algorithm (sketched below for reference).
  • results: High-entropy marginals make optimal transport problems harder, and the proposed algorithm is well suited to that regime; a careful ablation over algorithm and problem parameters and benchmarking on MNIST indicate that the algorithm is a useful addition to the practitioner's optimal transport toolkit. Code: https://github.com/adaptive-agents-lab/MDOT-PNCG.
    Abstract We design a novel algorithm for optimal transport by drawing from the entropic optimal transport, mirror descent and conjugate gradients literatures. Our algorithm is able to compute optimal transport costs with arbitrary accuracy without running into numerical stability issues. The algorithm is implemented efficiently on GPUs and is shown empirically to converge more quickly than traditional algorithms such as Sinkhorn's Algorithm both in terms of number of iterations and wall-clock time in many cases. We pay particular attention to the entropy of marginal distributions and show that high entropy marginals make for harder optimal transport problems, for which our algorithm is a good fit. We provide a careful ablation analysis with respect to algorithm and problem parameters, and present benchmarking over the MNIST dataset. The results suggest that our algorithm can be a useful addition to the practitioner's optimal transport toolkit. Our code is open-sourced at https://github.com/adaptive-agents-lab/MDOT-PNCG .
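As context for the comparison above, the Sinkhorn baseline for entropic optimal transport fits in a few lines; the proposed mirror-descent/conjugate-gradient algorithm (MDOT-PNCG) is not shown here. Note how the Gibbs kernel exp(-C/eps) underflows for small eps, which is exactly the numerical-stability issue the paper's method avoids.

```python
import numpy as np

def sinkhorn_cost(mu, nu, C, eps=0.05, n_iter=1000):
    """Entropic OT cost between histograms mu, nu with cost matrix C via Sinkhorn iterations."""
    K = np.exp(-C / eps)                      # Gibbs kernel; unstable for very small eps
    u, v = np.ones_like(mu), np.ones_like(nu)
    for _ in range(n_iter):
        u = mu / (K @ v)                      # alternately rescale to match each marginal
        v = nu / (K.T @ u)
    P = u[:, None] * K * v[None, :]           # approximate transport plan
    return float((P * C).sum())

n = 64
mu = np.ones(n) / n                           # uniform (maximum-entropy) source marginal
nu = np.random.dirichlet(np.ones(n))          # random target marginal
C = (np.linspace(0, 1, n)[:, None] - np.linspace(0, 1, n)[None, :]) ** 2
print(sinkhorn_cost(mu, nu, C))
```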

Does Visual Pretraining Help End-to-End Reasoning?

  • paper_url: http://arxiv.org/abs/2307.08506
  • repo_url: None
  • paper_authors: Chen Sun, Calvin Luo, Xingyi Zhou, Anurag Arnab, Cordelia Schmid
  • for: investigate whether end-to-end learning of visual reasoning can be achieved with general-purpose neural networks, and confirm the feasibility of a neural network “generalist” to solve visual recognition and reasoning tasks.
  • methods: use a simple and general self-supervised framework which “compresses” each video frame into a small set of tokens with a transformer network, and reconstructs the remaining frames based on the compressed temporal context.
  • results: observe that pretraining is essential to achieve compositional generalization for end-to-end visual reasoning, and our proposed framework outperforms traditional supervised pretraining, including image classification and explicit object detection, by large margins.
    Abstract We aim to investigate whether end-to-end learning of visual reasoning can be achieved with general-purpose neural networks, with the help of visual pretraining. A positive result would refute the common belief that explicit visual abstraction (e.g. object detection) is essential for compositional generalization on visual reasoning, and confirm the feasibility of a neural network "generalist" to solve visual recognition and reasoning tasks. We propose a simple and general self-supervised framework which "compresses" each video frame into a small set of tokens with a transformer network, and reconstructs the remaining frames based on the compressed temporal context. To minimize the reconstruction loss, the network must learn a compact representation for each image, as well as capture temporal dynamics and object permanence from temporal context. We perform evaluation on two visual reasoning benchmarks, CATER and ACRE. We observe that pretraining is essential to achieve compositional generalization for end-to-end visual reasoning. Our proposed framework outperforms traditional supervised pretraining, including image classification and explicit object detection, by large margins.

Large-Scale Evaluation of Topic Models and Dimensionality Reduction Methods for 2D Text Spatialization

  • paper_url: http://arxiv.org/abs/2307.11770
  • repo_url: https://github.com/cgshpi/topic-models-and-dimensionality-reduction-benchmark
  • paper_authors: Daniel Atzberger, Tim Cech, Willy Scheibel, Matthias Trapp, Rico Richter, Jürgen Döllner, Tobias Schreck
  • for: Investigate the effectiveness of topic models and dimensionality reduction methods for spatializing corpora as two-dimensional scatter plots.
  • methods: A large-scale, benchmark-based computational evaluation of combinations of topic models and dimensionality reduction algorithms across many experimental factors and datasets (one such layout pipeline is sketched below).
  • results: Interpretable topic models capture the structure of text corpora well, and t-SNE works well as the subsequent dimensionality reduction.
    Abstract Topic models are a class of unsupervised learning algorithms for detecting the semantic structure within a text corpus. Together with a subsequent dimensionality reduction algorithm, topic models can be used for deriving spatializations for text corpora as two-dimensional scatter plots, reflecting semantic similarity between the documents and supporting corpus analysis. Although the choice of the topic model, the dimensionality reduction, and their underlying hyperparameters significantly impact the resulting layout, it is unknown which particular combinations result in high-quality layouts with respect to accuracy and perception metrics. To investigate the effectiveness of topic models and dimensionality reduction methods for the spatialization of corpora as two-dimensional scatter plots (or basis for landscape-type visualizations), we present a large-scale, benchmark-based computational evaluation. Our evaluation consists of (1) a set of corpora, (2) a set of layout algorithms that are combinations of topic models and dimensionality reductions, and (3) quality metrics for quantifying the resulting layout. The corpora are given as document-term matrices, and each document is assigned to a thematic class. The chosen metrics quantify the preservation of local and global properties and the perceptual effectiveness of the two-dimensional scatter plots. By evaluating the benchmark on a computing cluster, we derived a multivariate dataset with over 45 000 individual layouts and corresponding quality metrics. Based on the results, we propose guidelines for the effective design of text spatializations that are based on topic models and dimensionality reductions. As a main result, we show that interpretable topic models are beneficial for capturing the structure of text corpora. We furthermore recommend the use of t-SNE as a subsequent dimensionality reduction.
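One layout pipeline of the kind the benchmark evaluates, an interpretable topic model (LDA) followed by t-SNE, can be sketched with scikit-learn. The corpus, vocabulary size, number of topics and other hyperparameters here are placeholders rather than the benchmark's settings, and the layout-quality metrics used for the actual evaluation are omitted.

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.manifold import TSNE

# Corpus -> document-term matrix -> document-topic proportions -> 2D scatter-plot coordinates.
docs = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes")).data[:2000]
X = CountVectorizer(max_features=5000, stop_words="english").fit_transform(docs)
theta = LatentDirichletAllocation(n_components=20, random_state=0).fit_transform(X)
xy = TSNE(n_components=2, random_state=0).fit_transform(theta)
print(xy.shape)   # (2000, 2): one point per document for the spatialization
```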

Can We Trust Race Prediction?

  • paper_url: http://arxiv.org/abs/2307.08496
  • repo_url: https://github.com/cangyuanli/pyethnicity
  • paper_authors: Cangyuan Li
  • for: Improve the accuracy of race and ethnicity prediction from names and geography when sensitive race and ethnicity data are unavailable, using US voter registration data.
  • methods: A Bidirectional Long Short-Term Memory (BiLSTM) model and an ensemble are trained on a novel voter registration dataset covering all 50 US states (a character-level BiLSTM of this kind is sketched below).
  • results: The author also constructs the most comprehensive database of first and surname distributions in the US to improve the coverage and accuracy of Bayesian Improved Surname Geocoding (BISG) and Bayesian Improved Firstname Surname Geocoding (BIFSG), and provides a high-quality benchmark dataset for fair comparison of models.
    Abstract In the absence of sensitive race and ethnicity data, researchers, regulators, and firms alike turn to proxies. In this paper, I train a Bidirectional Long Short-Term Memory (BiLSTM) model on a novel dataset of voter registration data from all 50 US states and create an ensemble that achieves up to 36.8% higher out of sample (OOS) F1 scores than the best performing machine learning models in the literature. Additionally, I construct the most comprehensive database of first and surname distributions in the US in order to improve the coverage and accuracy of Bayesian Improved Surname Geocoding (BISG) and Bayesian Improved Firstname Surname Geocoding (BIFSG). Finally, I provide the first high-quality benchmark dataset in order to fairly compare existing models and aid future model developers.
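A character-level BiLSTM over names of the kind described above can be sketched in PyTorch; the vocabulary, layer sizes, pooling and five-way output are illustrative assumptions rather than the paper's exact architecture, and the geocoding (BISG/BIFSG) components are not shown.

```python
import torch
import torch.nn as nn

class NameBiLSTM(nn.Module):
    """Character-level BiLSTM mapping a name string to race/ethnicity logits."""
    def __init__(self, vocab_size=128, emb_dim=32, hidden=64, n_classes=5):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, char_ids):                 # (batch, seq_len) of character codes
        h, _ = self.lstm(self.emb(char_ids))     # (batch, seq_len, 2 * hidden)
        return self.head(h.mean(dim=1))          # mean-pool over characters

def encode(name, max_len=32):
    """ASCII-code a name and zero-pad it to a fixed length."""
    ids = [min(ord(c), 127) for c in name.lower()[:max_len]]
    return torch.tensor(ids + [0] * (max_len - len(ids)))

model = NameBiLSTM()
logits = model(torch.stack([encode("cangyuan li"), encode("jane doe")]))
print(logits.shape)   # (2, 5)
```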

Fairness in KI-Systemen

  • paper_url: http://arxiv.org/abs/2307.08486
  • repo_url: None
  • paper_authors: Janine Strotherm, Alissa Müller, Barbara Hammer, Benjamin Paaßen
  • for: Provide an introduction to fairness research in machine learning, covering the main fairness definitions and strategies for achieving fairness (two standard definitions are recalled below).
  • methods: Concepts are explained with concrete examples and visualizations rather than mathematical formulation, aimed at an interdisciplinary audience.
  • results: The chapter places fairness research in the European context.
    Abstract The more AI-assisted decisions affect people's lives, the more important the fairness of such decisions becomes. In this chapter, we provide an introduction to research on fairness in machine learning. We explain the main fairness definitions and strategies for achieving fairness using concrete examples and place fairness research in the European context. Our contribution is aimed at an interdisciplinary audience and therefore avoids mathematical formulation but emphasizes visualizations and examples. -- Je mehr KI-gestützte Entscheidungen das Leben von Menschen betreffen, desto wichtiger ist die Fairness solcher Entscheidungen. In diesem Kapitel geben wir eine Einführung in die Forschung zu Fairness im maschinellen Lernen. Wir erklären die wesentlichen Fairness-Definitionen und Strategien zur Erreichung von Fairness anhand konkreter Beispiele und ordnen die Fairness-Forschung in den europäischen Kontext ein. Unser Beitrag richtet sich dabei an ein interdisziplinäres Publikum und verzichtet daher auf die mathematische Formulierung sondern betont Visualisierungen und Beispiele.
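Although the chapter itself deliberately avoids mathematical formulation, two standard group-fairness definitions can be written down for a classifier \hat{Y}, protected attribute A and true label Y; which definitions the chapter covers in detail is not specified here, so treat these as representative examples.

```latex
% Demographic parity: predictions are independent of the protected attribute.
P(\hat{Y} = 1 \mid A = a) = P(\hat{Y} = 1 \mid A = a') \quad \text{for all } a, a'.
% Equalized odds: error rates match across groups, conditioning on the true label.
P(\hat{Y} = 1 \mid A = a, Y = y) = P(\hat{Y} = 1 \mid A = a', Y = y) \quad \text{for all } a, a' \text{ and } y \in \{0, 1\}.
```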

Cross Feature Selection to Eliminate Spurious Interactions and Single Feature Dominance Explainable Boosting Machines

  • paper_url: http://arxiv.org/abs/2307.08485
  • repo_url: None
  • paper_authors: Shree Charran R, Sandipan Das Mahapatra
  • for: Improve the interpretability and reliability of Explainable Boosting Machine (EBM) models across prediction tasks.
  • methods: Alternate cross-feature selection, ensemble features, and model configuration alteration techniques are used to address spurious interactions with redundant features and single-feature dominance in EBMs (fitting and inspecting a plain EBM is sketched below).
  • results: On three benchmark datasets, the alternate techniques outperform vanilla EBM methods, providing better interpretability and feature selection stability while improving predictive performance.
    Abstract Interpretability is a crucial aspect of machine learning models that enables humans to understand and trust the decision-making process of these models. In many real-world applications, the interpretability of models is essential for legal, ethical, and practical reasons. For instance, in the banking domain, interpretability is critical for lenders and borrowers to understand the reasoning behind the acceptance or rejection of loan applications as per fair lending laws. However, achieving interpretability in machine learning models is challenging, especially for complex high-performance models. Hence Explainable Boosting Machines (EBMs) have been gaining popularity due to their interpretable and high-performance nature in various prediction tasks. However, these models can suffer from issues such as spurious interactions with redundant features and single-feature dominance across all interactions, which can affect the interpretability and reliability of the model's predictions. In this paper, we explore novel approaches to address these issues by utilizing alternate Cross-feature selection, ensemble features and model configuration alteration techniques. Our approach involves a multi-step feature selection procedure that selects a set of candidate features, ensemble features and then benchmark the same using the EBM model. We evaluate our method on three benchmark datasets and show that the alternate techniques outperform vanilla EBM methods, while providing better interpretability and feature selection stability, and improving the model's predictive performance. Moreover, we show that our approach can identify meaningful interactions and reduce the dominance of single features in the model's predictions, leading to more reliable and interpretable models. Index Terms- Interpretability, EBM's, ensemble, feature selection.
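For context, fitting a plain Explainable Boosting Machine and inspecting its term-level explanation takes only a few lines with the interpret package; the paper's multi-step cross-feature selection, feature ensembling and configuration changes wrap around a model like this and are not shown. The dataset is a stand-in.

```python
from interpret import show
from interpret.glassbox import ExplainableBoostingClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

ebm = ExplainableBoostingClassifier(random_state=0)   # learns per-feature shape functions plus pairwise interactions
ebm.fit(X_tr, y_tr)
print("test accuracy:", ebm.score(X_te, y_te))
show(ebm.explain_global())   # term importances; spurious interactions and single-feature dominance show up here
```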

A Fast Task Offloading Optimization Framework for IRS-Assisted Multi-Access Edge Computing System

  • paper_url: http://arxiv.org/abs/2307.08474
  • repo_url: https://github.com/uic-jq/iopo
  • paper_authors: Jianqiu Wu, Zhongyi Yu, Jianxiong Guo, Zhiqing Tang, Tian Wang, Weijia Jia
  • for: Improve wireless networks, in particular aerial-based multi-access edge computing systems assisted by intelligent reflecting surfaces.
  • methods: A deep-learning-based optimization framework, Iterative Order-Preserving policy Optimization (IOPO), generates high-quality task-offloading allocations.
  • results: Experiments show that the proposed framework produces energy-efficient task-offloading decisions within milliseconds, outperforming benchmark methods.
    Abstract Terahertz communication networks and intelligent reflecting surfaces exhibit significant potential in advancing wireless networks, particularly within the domain of aerial-based multi-access edge computing systems. These technologies enable efficient offloading of computational tasks from user electronic devices to Unmanned Aerial Vehicles or local execution. For the generation of high-quality task-offloading allocations, conventional numerical optimization methods often struggle to solve challenging combinatorial optimization problems within the limited channel coherence time, thereby failing to respond quickly to dynamic changes in system conditions. To address this challenge, we propose a deep learning-based optimization framework called Iterative Order-Preserving policy Optimization (IOPO), which enables the generation of energy-efficient task-offloading decisions within milliseconds. Unlike exhaustive search methods, IOPO provides continuous updates to the offloading decisions without resorting to exhaustive search, resulting in accelerated convergence and reduced computational complexity, particularly when dealing with complex problems characterized by extensive solution spaces. Experimental results demonstrate that the proposed framework can generate energy-efficient task-offloading decisions within a very short time period, outperforming other benchmark methods.

Hidden Markov Models with Random Restarts vs Boosting for Malware Detection

  • paper_url: http://arxiv.org/abs/2307.10256
  • repo_url: None
  • paper_authors: Aditya Raghavan, Fabio Di Troia, Mark Stamp
  • for: Improve malware detection.
  • methods: Hidden Markov models (HMMs) trained with multiple random restarts are compared against boosted HMMs (AdaBoost) on a variety of challenging malware datasets (the random-restart training loop is sketched below).
  • results: Random restarts perform surprisingly well compared to boosting; only in the most difficult "cold start" cases, where training data is severely limited, does boosting offer enough improvement to justify its higher computational cost in the scoring phase.
    Abstract Effective and efficient malware detection is at the forefront of research into building secure digital systems. As with many other fields, malware detection research has seen a dramatic increase in the application of machine learning algorithms. One machine learning technique that has been used widely in the field of pattern matching in general-and malware detection in particular-is hidden Markov models (HMMs). HMM training is based on a hill climb, and hence we can often improve a model by training multiple times with different initial values. In this research, we compare boosted HMMs (using AdaBoost) to HMMs trained with multiple random restarts, in the context of malware detection. These techniques are applied to a variety of challenging malware datasets. We find that random restarts perform surprisingly well in comparison to boosting. Only in the most difficult "cold start" cases (where training data is severely limited) does boosting appear to offer sufficient improvement to justify its higher computational cost in the scoring phase.
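The random-restart strategy compared against boosting above amounts to a simple loop: train the HMM from several random initializations and keep the model with the best log-likelihood, since Baum-Welch only hill-climbs from its starting point. A GaussianHMM from hmmlearn stands in here to keep the sketch self-contained; malware work of this kind typically uses discrete-observation HMMs over opcode sequences.

```python
import numpy as np
from hmmlearn import hmm

def train_with_restarts(X, lengths, n_states=3, n_restarts=10):
    """Train an HMM n_restarts times from random initializations; keep the best model."""
    best_model, best_ll = None, -np.inf
    for seed in range(n_restarts):
        model = hmm.GaussianHMM(n_components=n_states, n_iter=100, random_state=seed)
        model.fit(X, lengths)
        ll = model.score(X, lengths)          # log-likelihood of the training sequences
        if ll > best_ll:
            best_model, best_ll = model, ll
    return best_model, best_ll

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 2))                 # stand-in for real observation sequences
model, ll = train_with_restarts(X, lengths=[200, 200, 200])
print(ll)
```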

Generalizable Classification of UHF Partial Discharge Signals in Gas-Insulated HVDC Systems Using Neural Networks

  • paper_url: http://arxiv.org/abs/2307.08466
  • repo_url: None
  • paper_authors: Steffen Seitz, Thomas Götz, Christopher Lindenberg, Ronald Tetzlaff, Stephan Schlegel
  • for: Classify ultra-high-frequency partial discharge (PD) signals in gas-insulated HVDC systems with neural networks, without relying on pulse sequence analysis features.
  • methods: A neural-network classifier is compared on time-domain versus frequency-domain input signals, and the impact of different normalization schemes is examined.
  • results: The classifier discriminates PD signals caused by metallic protrusions and conductive particles on the insulator, and generalizes to unseen operating voltage multiples.
    Abstract Undetected partial discharges (PDs) are a safety critical issue in high voltage (HV) gas insulated systems (GIS). While the diagnosis of PDs under AC voltage is well-established, the analysis of PDs under DC voltage remains an active research field. A key focus of these investigations is the classification of different PD sources to enable subsequent sophisticated analysis. In this paper, we propose and analyze a neural network-based approach for classifying PD signals caused by metallic protrusions and conductive particles on the insulator of HVDC GIS, without relying on pulse sequence analysis features. In contrast to previous approaches, our proposed model can discriminate the studied PD signals obtained at negative and positive potentials, while also generalizing to unseen operating voltage multiples. Additionally, we compare the performance of time- and frequency-domain input signals and explore the impact of different normalization schemes to mitigate the influence of free-space path loss between the sensor and defect location.

A benchmark of categorical encoders for binary classification

  • paper_url: http://arxiv.org/abs/2307.09191
  • repo_url: https://github.com/drcohomology/encoderbenchmarking
  • paper_authors: Federico Matteucci, Vadim Arzamasov, Klemens Boehm
  • for: The most comprehensive benchmark of categorical encoders for binary classification to date.
  • methods: 32 encoder configurations from diverse families are evaluated under 36 combinations of experimental factors on 50 datasets (a minimal two-encoder comparison is sketched below).
  • results: The choice of datasets, experimental factors, and aggregation strategies profoundly influences the benchmark's conclusions; these aspects were disregarded in previous encoder comparisons.
    Abstract Categorical encoders transform categorical features into numerical representations that are indispensable for a wide range of machine learning models. Existing encoder benchmark studies lack generalizability because of their limited choice of (1) encoders, (2) experimental factors, and (3) datasets. Additionally, inconsistencies arise from the adoption of varying aggregation strategies. This paper is the most comprehensive benchmark of categorical encoders to date, including an extensive evaluation of 32 configurations of encoders from diverse families, with 36 combinations of experimental factors, and on 50 datasets. The study shows the profound influence of dataset selection, experimental factors, and aggregation strategies on the benchmark's conclusions -- aspects disregarded in previous encoder benchmarks.
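A minimal version of the kind of comparison the benchmark runs at scale, two encoder families evaluated under an identical downstream model, folds and metric, might look like this; the data, model and metric are placeholders for the benchmark's 50 datasets and full factor grid.

```python
import numpy as np
import pandas as pd
import category_encoders as ce
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Toy binary-classification data with two categorical features.
rng = np.random.default_rng(0)
X = pd.DataFrame({"city": rng.choice(list("abcd"), size=1000),
                  "product": rng.choice(list("uvwxyz"), size=1000)})
y = (X["city"].isin(["a", "b"]) & (X["product"] < "x")).astype(int).values

for name, enc in [("one-hot", ce.OneHotEncoder()), ("target", ce.TargetEncoder())]:
    pipe = make_pipeline(enc, LogisticRegression(max_iter=1000))
    auc = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean ROC AUC = {auc:.3f}")
```

How such per-dataset scores are then aggregated across datasets is one of the factors the paper shows can change the conclusions of an encoder comparison.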

SBMLtoODEjax: efficient simulation and optimization of ODE SBML models in JAX

  • paper_url: http://arxiv.org/abs/2307.08452
  • repo_url: https://github.com/flowersteam/sbmltoodejax
  • paper_authors: Mayalen Etcheverry, Michael Levin, Clément Moulin-Frier, Pierre-Yves Oudeyer
  • for: A lightweight library that automatically parses Systems Biology Markup Language (SBML) models and converts them into Python models written end-to-end in JAX.
  • methods: The generated models rely on JAX, a high-performance numerical computing library with automatic differentiation, to enable efficient numerical simulation and optimization (a toy ODE written directly in JAX is sketched below).
  • results: Researchers can incorporate SBML-specified ODE models into their Python projects and machine learning pipelines and run efficient simulation and optimization with only a few lines of code.
    Abstract Developing methods to explore, predict and control the dynamic behavior of biological systems, from protein pathways to complex cellular processes, is an essential frontier of research for bioengineering and biomedicine. Thus, significant effort has gone in computational inference and mathematical modeling of biological systems. This effort has resulted in the development of large collections of publicly-available models, typically stored and exchanged on online platforms (such as the BioModels Database) using the Systems Biology Markup Language (SBML), a standard format for representing mathematical models of biological systems. SBMLtoODEjax is a lightweight library that allows to automatically parse and convert SBML models into python models written end-to-end in JAX, a high-performance numerical computing library with automatic differentiation capabilities. SBMLtoODEjax is targeted at researchers that aim to incorporate SBML-specified ordinary differential equation (ODE) models into their python projects and machine learning pipelines, in order to perform efficient numerical simulation and optimization with only a few lines of code. SBMLtoODEjax is available at https://github.com/flowersteam/sbmltoodejax.
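The library's value is that the generated models are plain JAX functions, so rollouts are jit-compiled and differentiable. The sketch below illustrates that idea with a hand-written toy ODE rather than SBMLtoODEjax's actual API (which is not reproduced here); the model, parameters, and explicit-Euler integration are all assumptions for illustration.

```python
# Minimal sketch of a pure-JAX ODE rollout of the kind the library generates from SBML.
import jax
import jax.numpy as jnp

DT, STEPS = 0.01, 2000  # step size and number of Euler steps (assumed values)

def dydt(y, params):
    # Toy 2-species model: production/degradation with simple repression.
    k_prod, k_deg, k_rep = params
    a, b = y
    da = k_prod / (1.0 + k_rep * b) - k_deg * a
    db = k_prod * a - k_deg * b
    return jnp.array([da, db])

@jax.jit
def simulate(y0, params):
    def step(y, _):
        y_next = y + DT * dydt(y, params)  # explicit Euler step
        return y_next, y_next
    _, traj = jax.lax.scan(step, y0, None, length=STEPS)
    return traj

traj = simulate(jnp.array([0.1, 0.0]), jnp.array([1.0, 0.2, 5.0]))
# Because the rollout is differentiable, parameters can be fitted by gradient descent.
grad_fn = jax.grad(lambda p: jnp.sum((simulate(jnp.array([0.1, 0.0]), p)[-1] - 1.0) ** 2))
print(traj.shape, grad_fn(jnp.array([1.0, 0.2, 5.0])))
```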

From random-walks to graph-sprints: a low-latency node embedding framework on continuous-time dynamic graphs

  • paper_url: http://arxiv.org/abs/2307.08433
  • repo_url: None
  • paper_authors: Ahmad Naser Eddin, Jacopo Bono, David Aparício, Hugo Ferreira, João Ascensão, Pedro Ribeiro, Pedro Bizarro
  • for: 这篇研究是为了提出一个能够实现低延迟、高效的动态图像学习框架,并且能够处理真实世界的动态图像资料。
  • methods: 这篇研究使用了流动测量的方法,即时间感知的点 cloud 来捕捉多阶资讯,并且使用了单一阶资讯来计算时间感知的点 cloud。
  • results: 研究结果显示,使用 graph-sprints 的方法可以实现与现有的高延迟模型相同或更好的性能,并且可以实现低延迟的推断运算。
    Abstract Many real-world datasets have an underlying dynamic graph structure, where entities and their interactions evolve over time. Machine learning models should consider these dynamics in order to harness their full potential in downstream tasks. Previous approaches for graph representation learning have focused on either sampling k-hop neighborhoods, akin to breadth-first search, or random walks, akin to depth-first search. However, these methods are computationally expensive and unsuitable for real-time, low-latency inference on dynamic graphs. To overcome these limitations, we propose graph-sprints a general purpose feature extraction framework for continuous-time-dynamic-graphs (CTDGs) that has low latency and is competitive with state-of-the-art, higher latency models. To achieve this, a streaming, low latency approximation to the random-walk based features is proposed. In our framework, time-aware node embeddings summarizing multi-hop information are computed using only single-hop operations on the incoming edges. We evaluate our proposed approach on three open-source datasets and two in-house datasets, and compare with three state-of-the-art algorithms (TGN-attn, TGN-ID, Jodie). We demonstrate that our graph-sprints features, combined with a machine learning classifier, achieve competitive performance (outperforming all baselines for the node classification tasks in five datasets). Simultaneously, graph-sprints significantly reduce inference latencies, achieving close to an order of magnitude speed-up in our experimental setting.
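The following is an illustrative sketch of the general streaming idea described above -- per-node, time-decayed summaries refreshed with O(1) single-hop work per incoming edge so that multi-hop information accumulates over time. It is not the paper's exact update rules; the decay rate, mixing weights, and dimensions are made up.

```python
# Illustrative streaming node-summary update on a continuous-time dynamic graph.
import numpy as np
from collections import defaultdict

DIM, DECAY = 8, 0.1                             # embedding size and time-decay rate (assumed)

state = defaultdict(lambda: np.zeros(DIM))      # node -> summary vector
last_seen = defaultdict(float)                  # node -> last update time

def on_edge(src, dst, t, edge_feat):
    """Process one streaming edge (src -> dst at time t) with O(1) work."""
    for node in (src, dst):                     # decay stale state before mixing
        state[node] *= np.exp(-DECAY * (t - last_seen[node]))
        last_seen[node] = t
    # Single-hop update: the destination absorbs the source summary plus edge features,
    # which indirectly propagates multi-hop information as more edges stream in.
    state[dst] = 0.9 * state[dst] + 0.1 * (state[src] + edge_feat)
    return state[dst]                           # low-latency embedding for dst

rng = np.random.default_rng(0)
for t, (u, v) in enumerate([(0, 1), (1, 2), (2, 3), (0, 2)]):
    emb = on_edge(u, v, float(t), rng.normal(size=DIM))
print(emb[:4])
```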

Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems

  • paper_url: http://arxiv.org/abs/2307.08423
  • repo_url: https://github.com/divelab/AIRS
  • paper_authors: Xuan Zhang, Limei Wang, Jacob Helwig, Youzhi Luo, Cong Fu, Yaochen Xie, Meng Liu, Yuchao Lin, Zhao Xu, Keqiang Yan, Keir Adams, Maurice Weiler, Xiner Li, Tianfan Fu, Yucheng Wang, Haiyang Yu, YuQing Xie, Xiang Fu, Alex Strasser, Shenglong Xu, Yi Liu, Yuanqi Du, Alexandra Saxton, Hongyi Ling, Hannah Lawrence, Hannes Stärk, Shurui Gui, Carl Edwards, Nicholas Gao, Adriana Ladera, Tailin Wu, Elyssa F. Hofgard, Aria Mansouri Tehrani, Rui Wang, Ameya Daigavane, Montgomery Bohde, Jerry Kurtin, Qian Huang, Tuong Phung, Minkai Xu, Chaitanya K. Joshi, Simon V. Mathis, Kamyar Azizzadenesheli, Ada Fang, Alán Aspuru-Guzik, Erik Bekkers, Michael Bronstein, Marinka Zitnik, Anima Anandkumar, Stefano Ermon, Pietro Liò, Rose Yu, Stephan Günnemann, Jure Leskovec, Heng Ji, Jimeng Sun, Regina Barzilay, Tommi Jaakkola, Connor W. Coley, Xiaoning Qian, Xiaofeng Qian, Tess Smidt, Shuiwang Ji
  • for: A survey of the emerging paradigm of using artificial intelligence for natural-science research (AI4Science).
  • methods: The paper reviews deep learning methods that capture physics first principles in natural systems, in particular equivariance to symmetry transformations.
  • results: It provides a foundational and unified treatment of AI for quantum, atomistic, and continuum systems and discusses technical challenges including explainability, out-of-distribution generalization, knowledge transfer with foundation and large language models, and uncertainty quantification.
    Abstract Advances in artificial intelligence (AI) are fueling a new paradigm of discoveries in natural sciences. Today, AI has started to advance natural sciences by improving, accelerating, and enabling our understanding of natural phenomena at a wide range of spatial and temporal scales, giving rise to a new area of research known as AI for science (AI4Science). Being an emerging research paradigm, AI4Science is unique in that it is an enormous and highly interdisciplinary area. Thus, a unified and technical treatment of this field is needed yet challenging. This paper aims to provide a technically thorough account of a subarea of AI4Science; namely, AI for quantum, atomistic, and continuum systems. These areas aim at understanding the physical world from the subatomic (wavefunctions and electron density), atomic (molecules, proteins, materials, and interactions), to macro (fluids, climate, and subsurface) scales and form an important subarea of AI4Science. A unique advantage of focusing on these areas is that they largely share a common set of challenges, thereby allowing a unified and foundational treatment. A key common challenge is how to capture physics first principles, especially symmetries, in natural systems by deep learning methods. We provide an in-depth yet intuitive account of techniques to achieve equivariance to symmetry transformations. We also discuss other common technical challenges, including explainability, out-of-distribution generalization, knowledge transfer with foundation and large language models, and uncertainty quantification. To facilitate learning and education, we provide categorized lists of resources that we found to be useful. We strive to be thorough and unified and hope this initial effort may trigger more community interests and efforts to further advance AI4Science.

Neurosymbolic AI for Reasoning on Biomedical Knowledge Graphs

  • paper_url: http://arxiv.org/abs/2307.08411
  • repo_url: None
  • paper_authors: Lauren Nicole DeLong, Ramon Fernández Mir, Zonglin Ji, Fiona Niamh Coulter Smith, Jacques D. Fleuriot
  • for: A survey of hybrid, neurosymbolic AI approaches and their applications and advantages in biomedicine.
  • methods: The paper reviews hybrid approaches that combine embedding-based and symbolic-logic methods for reasoning on biomedical knowledge graphs.
  • results: It summarizes the utilities and prospective benefits of these hybrid methods for biomedical tasks such as knowledge-graph completion and drug repositioning.
    Abstract Biomedical datasets are often modeled as knowledge graphs (KGs) because they capture the multi-relational, heterogeneous, and dynamic natures of biomedical systems. KG completion (KGC), can, therefore, help researchers make predictions to inform tasks like drug repositioning. While previous approaches for KGC were either rule-based or embedding-based, hybrid approaches based on neurosymbolic artificial intelligence are becoming more popular. Many of these methods possess unique characteristics which make them even better suited toward biomedical challenges. Here, we survey such approaches with an emphasis on their utilities and prospective benefits for biomedicine.
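For context on the embedding-based side of the hybrid methods surveyed here, a tiny TransE-style scoring sketch for knowledge-graph completion (purely illustrative; not code from the survey or from any specific method it covers, and all sizes are placeholders):

```python
# TransE-style triple scoring with a margin ranking loss.
import torch
import torch.nn as nn

n_ent, n_rel, dim = 50, 10, 16
ent = nn.Embedding(n_ent, dim)
rel = nn.Embedding(n_rel, dim)

def score(h, r, t):
    """TransE score: smaller ||h + r - t|| means the triple is more plausible."""
    return torch.norm(ent(h) + rel(r) - ent(t), dim=-1)

h = torch.tensor([0, 1]); r = torch.tensor([2, 2]); t = torch.tensor([3, 4])
neg_t = torch.randint(0, n_ent, t.shape)                 # corrupted tails as negatives
loss = torch.relu(1.0 + score(h, r, t) - score(h, r, neg_t)).mean()
loss.backward()
print(loss.item())
```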

Vocoder drift compensation by x-vector alignment in speaker anonymisation

  • paper_url: http://arxiv.org/abs/2307.08403
  • repo_url: None
  • paper_authors: Michele Panariello, Massimiliano Todisco, Nicholas Evans
  • for: To show that, in popular x-vector-based speaker anonymisation approaches, vocoding rather than the core anonymisation function accounts for the bulk of the anonymisation.
  • methods: The origin of so-called vocoder drift is analyzed and an original approach to vocoder drift compensation is proposed.
  • results: Compensation substantially reduces vocoder drift and offers improved control over the x-vector space, laying a foundation for the design of better anonymisation functions, while anonymisation performance degrades as expected.
    Abstract For the most popular x-vector-based approaches to speaker anonymisation, the bulk of the anonymisation can stem from vocoding rather than from the core anonymisation function which is used to substitute an original speaker x-vector with that of a fictitious pseudo-speaker. This phenomenon can impede the design of better anonymisation systems since there is a lack of fine-grained control over the x-vector space. The work reported in this paper explores the origin of so-called vocoder drift and shows that it is due to the mismatch between the substituted x-vector and the original representations of the linguistic content, intonation and prosody. Also reported is an original approach to vocoder drift compensation. While anonymisation performance degrades as expected, compensation reduces vocoder drift substantially, offers improved control over the x-vector space and lays a foundation for the design of better anonymisation functions in the future.

On the application of Large Language Models for language teaching and assessment technology

  • paper_url: http://arxiv.org/abs/2307.08393
  • repo_url: None
  • paper_authors: Andrew Caines, Luca Benedetto, Shiva Taslimipoor, Christopher Davis, Yuan Gao, Oeistein Andersen, Zheng Yuan, Mark Elliott, Russell Moore, Christopher Bryant, Marek Rei, Helen Yannakoudakis, Andrew Mullooly, Diane Nicholls, Paula Buttery
  • for: The paper examines potential applications of large language models (such as PaLM and GPT-4) in AI-driven language teaching and assessment systems.
  • methods: It reviews several research areas and discusses the risks and ethical considerations of using generative AI in education technology for language learners.
  • results: Larger language models improve on previous models for text generation, but for automated grading and grammatical error correction they do not, on their own, surpass state-of-the-art results under standard evaluation metrics.
    Abstract The recent release of very large language models such as PaLM and GPT-4 has made an unprecedented impact in the popular media and public consciousness, giving rise to a mixture of excitement and fear as to their capabilities and potential uses, and shining a light on natural language processing research which had not previously received so much attention. The developments offer great promise for education technology, and in this paper we look specifically at the potential for incorporating large language models in AI-driven language teaching and assessment systems. We consider several research areas and also discuss the risks and ethical considerations surrounding generative AI in education technology for language learners. Overall we find that larger language models offer improvements over previous models in text generation, opening up routes toward content generation which had not previously been plausible. For text generation they must be prompted carefully and their outputs may need to be reshaped before they are ready for use. For automated grading and grammatical error correction, tasks whose progress is checked on well-known benchmarks, early investigations indicate that large language models on their own do not improve on state-of-the-art results according to standard evaluation metrics. For grading it appears that linguistic features established in the literature should still be used for best performance, and for error correction it may be that the models can offer alternative feedback styles which are not measured sensitively with existing methods. In all cases, there is work to be done to experiment with the inclusion of large language models in education technology for language learners, in order to properly understand and report on their capacities and limitations, and to ensure that foreseeable risks such as misinformation and harmful bias are mitigated.

Correlation-aware Spatial-Temporal Graph Learning for Multivariate Time-series Anomaly Detection

  • paper_url: http://arxiv.org/abs/2307.08390
  • repo_url: https://github.com/astha-chem/mvts-ano-eval
  • paper_authors: Yu Zheng, Huan Yee Koh, Ming Jin, Lianhua Chi, Khoa T. Phan, Shirui Pan, Yi-Ping Phoebe Chen, Wei Xiang
  • for: A new multivariate time-series anomaly detection method addressing the failure of existing approaches to capture non-linear relations and explicit pairwise correlations among variables.
  • methods: A multivariate correlation learning module feeds a spatial-temporal graph neural network (STGNN); the STGNN exploits one- and multi-hop neighbor information to encode complex spatial dependencies, a dilated-convolution temporal module captures long-range dependence, and a novel anomaly scoring component estimates the degree of anomaly in a purely unsupervised manner.
  • results: Experiments show that CST-GL detects anomalies effectively in general settings and enables early detection across different time delays.
    Abstract Multivariate time-series anomaly detection is critically important in many applications, including retail, transportation, power grid, and water treatment plants. Existing approaches for this problem mostly employ either statistical models which cannot capture the non-linear relations well or conventional deep learning models (e.g., CNN and LSTM) that do not explicitly learn the pairwise correlations among variables. To overcome these limitations, we propose a novel method, correlation-aware spatial-temporal graph learning (termed CST-GL), for time series anomaly detection. CST-GL explicitly captures the pairwise correlations via a multivariate time series correlation learning module based on which a spatial-temporal graph neural network (STGNN) can be developed. Then, by employing a graph convolution network that exploits one- and multi-hop neighbor information, our STGNN component can encode rich spatial information from complex pairwise dependencies between variables. With a temporal module that consists of dilated convolutional functions, the STGNN can further capture long-range dependence over time. A novel anomaly scoring component is further integrated into CST-GL to estimate the degree of an anomaly in a purely unsupervised manner. Experimental results demonstrate that CST-GL can detect anomalies effectively in general settings as well as enable early detection across different time delays.

Tabular Machine Learning Methods for Predicting Gas Turbine Emissions

  • paper_url: http://arxiv.org/abs/2307.08386
  • repo_url: None
  • paper_authors: Rebecca Potts, Rick Hackney, Georgios Leontidis
  • for: To evaluate the performance of machine learning models for predicting gas turbine emissions.
  • methods: An existing first-principles Chemical Kinetics emissions model is compared against two machine learning models based on SAINT and XGBoost, trained and validated on a Siemens Energy gas turbine test bed dataset.
  • results: The machine learning models improve predictive performance for nitrogen oxides (NOx) and carbon monoxide (CO).
    Abstract Predicting emissions for gas turbines is critical for monitoring harmful pollutants being released into the atmosphere. In this study, we evaluate the performance of machine learning models for predicting emissions for gas turbines. We compare an existing predictive emissions model, a first principles-based Chemical Kinetics model, against two machine learning models we developed based on SAINT and XGBoost, to demonstrate improved predictive performance of nitrogen oxides (NOx) and carbon monoxide (CO) using machine learning techniques. Our analysis utilises a Siemens Energy gas turbine test bed tabular dataset to train and validate the machine learning models. Additionally, we explore the trade-off between incorporating more features to enhance the model complexity, and the resulting presence of increased missing values in the dataset.
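A hedged sketch of an XGBoost-style emissions regressor of the kind compared in the paper, on synthetic data with made-up feature names; this is not the Siemens Energy dataset or the authors' exact configuration.

```python
# Toy tabular regression for a NOx-like target with XGBoost.
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([
    rng.uniform(0, 1, n),    # e.g. ambient temperature (normalized, hypothetical)
    rng.uniform(0, 1, n),    # e.g. compressor discharge pressure (normalized, hypothetical)
    rng.uniform(0, 1, n),    # e.g. turbine inlet temperature (normalized, hypothetical)
])
nox = 30 + 40 * X[:, 2] - 10 * X[:, 0] + rng.normal(0, 2, n)   # synthetic NOx target

X_tr, X_te, y_tr, y_te = train_test_split(X, nox, test_size=0.2, random_state=0)
model = XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.05)
model.fit(X_tr, y_tr)
print("MAE:", mean_absolute_error(y_te, model.predict(X_te)))
```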

Predicting Battery Lifetime Under Varying Usage Conditions from Early Aging Data

  • paper_url: http://arxiv.org/abs/2307.08382
  • repo_url: None
  • paper_authors: Tingkai Li, Zihao Zhou, Adam Thelen, David Howey, Chao Hu
  • for: Predicting lithium-ion battery lifetime to support preventative maintenance, warranties, and improved cell design and manufacturing.
  • methods: New features are derived from early-life capacity-voltage data to predict the lifetime of cells cycled under widely varying charge rates, discharge rates, and depths of discharge.
  • results: On a newly generated dataset of 225 nickel-manganese-cobalt/graphite Li-ion cells aged under a wide range of conditions, lifetimes of in-distribution cells are predicted with 15.1% mean absolute percentage error, and a hierarchical Bayesian regression model improves extrapolation to out-of-distribution cells (21.8% mean absolute percentage error).
    Abstract Accurate battery lifetime prediction is important for preventative maintenance, warranties, and improved cell design and manufacturing. However, manufacturing variability and usage-dependent degradation make life prediction challenging. Here, we investigate new features derived from capacity-voltage data in early life to predict the lifetime of cells cycled under widely varying charge rates, discharge rates, and depths of discharge. Features were extracted from regularly scheduled reference performance tests (i.e., low rate full cycles) during cycling. The early-life features capture a cell's state of health and the rate of change of component-level degradation modes, some of which correlate strongly with cell lifetime. Using a newly generated dataset from 225 nickel-manganese-cobalt/graphite Li-ion cells aged under a wide range of conditions, we demonstrate a lifetime prediction of in-distribution cells with 15.1% mean absolute percentage error using no more than the first 15% of data, for most cells. Further testing using a hierarchical Bayesian regression model shows improved performance on extrapolation, achieving 21.8% mean absolute percentage error for out-of-distribution cells. Our approach highlights the importance of using domain knowledge of lithium-ion battery degradation modes to inform feature engineering. Further, we provide the community with a new publicly available battery aging dataset with cells cycled beyond 80% of their rated capacity.

Q(D)O-ES: Population-based Quality (Diversity) Optimisation for Post Hoc Ensemble Selection in AutoML

  • paper_url: http://arxiv.org/abs/2307.08364
  • repo_url: https://github.com/LennartPurucker/PopulationBasedQDO-PostHocEnsembleSelectionAutoML
  • paper_authors: Lennart Purucker, Lennart Schneider, Marie Anastacio, Joeran Beel, Bernd Bischl, Holger Hoos
  • for: Improving predictive performance through post hoc ensembling in AutoML.
  • methods: Two novel population-based ensemble selection methods, QO-ES and QDO-ES, are introduced and compared with greedy ensemble selection (GES).
  • results: On 71 classification datasets, QO-ES and QDO-ES often outrank GES, though the difference is statistically significant only on validation data; diversity can benefit post hoc ensembling but also increases the risk of overfitting.
    Abstract Automated machine learning (AutoML) systems commonly ensemble models post hoc to improve predictive performance, typically via greedy ensemble selection (GES). However, we believe that GES may not always be optimal, as it performs a simple deterministic greedy search. In this work, we introduce two novel population-based ensemble selection methods, QO-ES and QDO-ES, and compare them to GES. While QO-ES optimises solely for predictive performance, QDO-ES also considers the diversity of ensembles within the population, maintaining a diverse set of well-performing ensembles during optimisation based on ideas of quality diversity optimisation. The methods are evaluated using 71 classification datasets from the AutoML benchmark, demonstrating that QO-ES and QDO-ES often outrank GES, albeit only statistically significant on validation data. Our results further suggest that diversity can be beneficial for post hoc ensembling but also increases the risk of overfitting.
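For context, the greedy ensemble selection (GES) baseline that QO-ES and QDO-ES replace with population-based search can be sketched as follows (a minimal version on validation predictions; the Brier-score objective, number of rounds, and synthetic data are arbitrary choices, not the paper's setup).

```python
# Minimal greedy ensemble selection over base-model validation predictions.
import numpy as np

def greedy_ensemble_selection(val_preds, y_val, n_rounds=10):
    """val_preds: (n_models, n_samples) predicted probabilities for class 1."""
    chosen = []
    for _ in range(n_rounds):
        best_m, best_loss = None, np.inf
        for m in range(val_preds.shape[0]):           # try adding each model (with replacement)
            ens = np.mean(val_preds[chosen + [m]], axis=0)
            loss = np.mean((ens - y_val) ** 2)         # Brier score as the selection metric
            if loss < best_loss:
                best_m, best_loss = m, loss
        chosen.append(best_m)
    return chosen

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)
preds = np.clip(y + rng.normal(0, 0.4, (8, 200)), 0, 1)   # 8 noisy base models
print(greedy_ensemble_selection(preds, y))
```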

Universal Online Learning with Gradual Variations: A Multi-layer Online Ensemble Approach

  • paper_url: http://arxiv.org/abs/2307.08360
  • repo_url: None
  • paper_authors: Yu-Hu Yan, Peng Zhao, Zhi-Hua Zhou
  • for: An online convex optimization method with two levels of adaptivity: agnostic to the type and curvature of the loss functions at the higher level, while exploiting the niceness of the environment at the lower level.
  • methods: A multi-layer online ensemble incorporating carefully designed optimism for unifying diverse function types and cascaded corrections for algorithmic stability.
  • results: The method attains $\mathcal{O}(\ln V_T)$, $\mathcal{O}(d \ln V_T)$ and $\hat{\mathcal{O}}(\sqrt{V_T})$ regret bounds for strongly convex, exp-concave and convex losses respectively, where $d$ is the dimension and $V_T$ denotes problem-dependent gradient variations; it also safeguards worst-case guarantees, directly implies small-loss bounds, and draws connections to adversarial/stochastic convex optimization and game theory.
    Abstract In this paper, we propose an online convex optimization method with two different levels of adaptivity. On a higher level, our method is agnostic to the specific type and curvature of the loss functions, while at a lower level, it can exploit the niceness of the environments and attain problem-dependent guarantees. To be specific, we obtain $\mathcal{O}(\ln V_T)$, $\mathcal{O}(d \ln V_T)$ and $\hat{\mathcal{O}}(\sqrt{V_T})$ regret bounds for strongly convex, exp-concave and convex loss functions, respectively, where $d$ is the dimension, $V_T$ denotes problem-dependent gradient variations and $\hat{\mathcal{O}}(\cdot)$-notation omits logarithmic factors on $V_T$. Our result finds broad implications and applications. It not only safeguards the worst-case guarantees, but also implies the small-loss bounds in analysis directly. Besides, it draws deep connections with adversarial/stochastic convex optimization and game theory, further validating its practical potential. Our method is based on a multi-layer online ensemble incorporating novel ingredients, including carefully-designed optimism for unifying diverse function types and cascaded corrections for algorithmic stability. Remarkably, despite its multi-layer structure, our algorithm necessitates only one gradient query per round, making it favorable when the gradient evaluation is time-consuming. This is facilitated by a novel regret decomposition equipped with customized surrogate losses.

Zero-th Order Algorithm for Softmax Attention Optimization

  • paper_url: http://arxiv.org/abs/2307.08352
  • repo_url: None
  • paper_authors: Yichuan Deng, Zhihang Li, Sridhar Mahadevan, Zhao Song
  • for: Improving optimization techniques for large language models (LLMs), in particular the cost of gradient computation for the softmax unit.
  • methods: A zero-th order algorithm tailored to softmax optimization that approximates gradients using only forward passes.
  • results: The algorithm is shown to converge and to compute gradients efficiently for large-scale LLMs.
    Abstract Large language models (LLMs) have brought about significant transformations in human society. Among the crucial computations in LLMs, the softmax unit holds great importance. Its helps the model generating a probability distribution on potential subsequent words or phrases, considering a series of input words. By utilizing this distribution, the model selects the most probable next word or phrase, based on the assigned probabilities. The softmax unit assumes a vital function in LLM training as it facilitates learning from data through the adjustment of neural network weights and biases. With the development of the size of LLMs, computing the gradient becomes expensive. However, Zero-th Order method can approximately compute the gradient with only forward passes. In this paper, we present a Zero-th Order algorithm specifically tailored for Softmax optimization. We demonstrate the convergence of our algorithm, highlighting its effectiveness in efficiently computing gradients for large-scale LLMs. By leveraging the Zeroth-Order method, our work contributes to the advancement of optimization techniques in the context of complex language models.
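The core idea -- estimating gradients from forward passes only -- can be illustrated with generic two-point zeroth-order estimation on a toy softmax-attention loss. This sketch is not the paper's algorithm and carries none of its guarantees; the parameterization, smoothing radius, and step size are assumptions.

```python
# Generic two-point zeroth-order gradient estimation on a toy softmax-attention loss.
import numpy as np

rng = np.random.default_rng(0)
d, n = 16, 8
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
target = rng.normal(size=(n, d))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def loss(w):
    # w re-weights the key dimensions before the attention scores are formed.
    scores = Q @ np.diag(w) @ K.T / np.sqrt(d)
    out = softmax(scores) @ V
    return np.mean((out - target) ** 2)

def zo_grad(w, mu=1e-3, n_dirs=32):
    """Estimate grad(loss)(w) from forward evaluations only."""
    g = np.zeros_like(w)
    for _ in range(n_dirs):
        u = rng.normal(size=w.shape)
        g += (loss(w + mu * u) - loss(w - mu * u)) / (2 * mu) * u
    return g / n_dirs

w = np.ones(d)
for step in range(200):
    w -= 0.05 * zo_grad(w)
print("final loss:", loss(w))
```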

M-FLAG: Medical Vision-Language Pre-training with Frozen Language Models and Latent Space Geometry Optimization

  • paper_url: http://arxiv.org/abs/2307.08347
  • repo_url: https://github.com/cheliu-computation/m-flag-miccai2023
  • paper_authors: Che Liu, Sibo Cheng, Chen Chen, Mengyun Qiao, Weitong Zhang, Anand Shah, Wenjia Bai, Rossella Arcucci
  • for: A new pre-training method for medical vision-language models that improves training stability and efficiency.
  • methods: The proposed method, M-FLAG, pre-trains with a frozen language model and introduces a novel orthogonality loss to harmonize the latent space geometry.
  • results: M-FLAG outperforms existing medical vision-language pre-training approaches on three downstream tasks (medical image classification, segmentation, and object detection) while reducing the number of parameters by 78%; on segmentation it outperforms ImageNet pre-trained models fine-tuned on 100% of the data while using only 1% of the RSNA dataset.
    Abstract Medical vision-language models enable co-learning and integrating features from medical imaging and clinical text. However, these models are not easy to train and the latent representation space can be complex. Here we propose a novel way for pre-training and regularising medical vision-language models. The proposed method, named Medical vision-language pre-training with Frozen language models and Latent spAce Geometry optimization (M-FLAG), leverages a frozen language model for training stability and efficiency and introduces a novel orthogonality loss to harmonize the latent space geometry. We demonstrate the potential of the pre-trained model on three downstream tasks: medical image classification, segmentation, and object detection. Extensive experiments across five public datasets demonstrate that M-FLAG significantly outperforms existing medical vision-language pre-training approaches and reduces the number of parameters by 78\%. Notably, M-FLAG achieves outstanding performance on the segmentation task while using only 1\% of the RSNA dataset, even outperforming ImageNet pre-trained models that have been fine-tuned using 100\% of the data.
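As a rough illustration of the latent-geometry component, here is a generic orthogonality-style regularizer on batch features. The exact form of M-FLAG's orthogonality loss may differ, so treat this as an assumption about its flavour rather than its definition.

```python
# Generic orthogonality penalty: pull the batch feature Gram matrix toward the identity
# so latent directions stay decorrelated and well spread out.
import torch

def orthogonality_loss(z):
    """z: (batch, dim) latent features from a trainable vision encoder."""
    z = torch.nn.functional.normalize(z, dim=1)
    gram = z.T @ z / z.shape[0]                      # (dim, dim) covariance-like matrix
    eye = torch.eye(z.shape[1], device=z.device)
    return ((gram - eye) ** 2).sum()

z = torch.randn(32, 128, requires_grad=True)         # placeholder features
loss = orthogonality_loss(z)
loss.backward()
print(float(loss), z.grad.shape)
```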

Efficient selective attention LSTM for well log curve synthesis

  • paper_url: http://arxiv.org/abs/2307.10253
  • repo_url: None
  • paper_authors: Yuankai Zhou, Huanyu Li, Hu liu
  • for: A machine learning method for predicting missing well-logging curves from existing log data.
  • methods: The method extends a Long Short-Term Memory (LSTM) network with a selective self-attention mechanism that analyzes spatial dependencies while reducing computational complexity from O(n^2) to O(n log n).
  • results: Experiments show the method predicts missing well-logging curves efficiently and more accurately than curve-synthesis baselines based on Fully Connected Neural Networks (FCNN) and plain LSTM.
    Abstract Non-core drilling has gradually become the primary exploration method in geological engineering, and well logging curves have increasingly gained importance as the main carriers of geological information. However, factors such as geological environment, logging equipment, borehole quality, and unexpected events can all impact the quality of well logging curves. Previous methods of re-logging or manual corrections have been associated with high costs and low efficiency. This paper proposes a machine learning method that utilizes existing data to predict missing well logging curves, and its effectiveness and feasibility have been validated through experiments. The proposed method builds upon the traditional Long Short-Term Memory (LSTM) neural network by incorporating a self-attention mechanism to analyze the spatial dependencies of the data. It selectively includes the dominant computational results in the LSTM, reducing the computational complexity from O(n^2) to O(nlogn) and improving model efficiency. Experimental results demonstrate that the proposed method achieves higher accuracy compared to traditional curve synthesis methods based on Fully Connected Neural Networks (FCNN) and LSTM. This accurate, efficient, and cost-effective prediction method holds practical value in engineering applications.

Gaussian processes for Bayesian inverse problems associated with linear partial differential equations

  • paper_url: http://arxiv.org/abs/2307.08343
  • repo_url: None
  • paper_authors: Tianming Bai, Aretha L. Teckentrup, Konstantinos C. Zygalakis
  • for: The paper focuses on using Gaussian surrogate models for Bayesian inverse problems associated with linear partial differential equations, particularly in the regime where only a small amount of training data is available.
  • methods: The authors extend the framework of Raissi et al. (2017) to construct PDE-informed Gaussian priors, which are then used to construct different approximate posteriors.
  • results: Numerical experiments demonstrate the superiority of the PDE-informed Gaussian priors over more traditional priors.
    Abstract This work is concerned with the use of Gaussian surrogate models for Bayesian inverse problems associated with linear partial differential equations. A particular focus is on the regime where only a small amount of training data is available. In this regime the type of Gaussian prior used is of critical importance with respect to how well the surrogate model will perform in terms of Bayesian inversion. We extend the framework of Raissi et. al. (2017) to construct PDE-informed Gaussian priors that we then use to construct different approximate posteriors. A number of different numerical experiments illustrate the superiority of the PDE-informed Gaussian priors over more traditional priors.

RAYEN: Imposition of Hard Convex Constraints on Neural Networks

  • paper_url: http://arxiv.org/abs/2307.08336
  • repo_url: https://github.com/leggedrobotics/rayen
  • paper_authors: Jesus Tordesillas, Jonathan P. How, Marco Hutter
  • for: A framework for imposing hard convex constraints on the output or latent variables of a neural network, guaranteeing that the constraints are satisfied for any input and any network weights.
  • methods: Unlike prior approaches, RAYEN avoids computationally expensive orthogonal projections onto the feasible set, soft constraints (which do not guarantee satisfaction at test time), conservative approximations of the feasible set, and potentially slow inner gradient-descent corrections.
  • results: RAYEN supports any combination of linear, convex quadratic, second-order cone, and LMI constraints with very small overhead (e.g., 1K quadratic constraints on a 1K-dimensional variable in under 8 ms, and an LMI constraint with 300x300 dense matrices on a 10K-dimensional variable in under 12 ms); when approximating solutions of constrained optimization problems it is 20 to 7468 times faster than state-of-the-art algorithms while guaranteeing constraint satisfaction and achieving near-optimal cost.
    Abstract This paper presents RAYEN, a framework to impose hard convex constraints on the output or latent variable of a neural network. RAYEN guarantees that, for any input or any weights of the network, the constraints are satisfied at all times. Compared to other approaches, RAYEN does not perform a computationally-expensive orthogonal projection step onto the feasible set, does not rely on soft constraints (which do not guarantee the satisfaction of the constraints at test time), does not use conservative approximations of the feasible set, and does not perform a potentially slow inner gradient descent correction to enforce the constraints. RAYEN supports any combination of linear, convex quadratic, second-order cone (SOC), and linear matrix inequality (LMI) constraints, achieving a very small computational overhead compared to unconstrained networks. For example, it is able to impose 1K quadratic constraints on a 1K-dimensional variable with an overhead of less than 8 ms, and an LMI constraint with 300x300 dense matrices on a 10K-dimensional variable in less than 12 ms. When used in neural networks that approximate the solution of constrained optimization problems, RAYEN achieves computation times between 20 and 7468 times faster than state-of-the-art algorithms, while guaranteeing the satisfaction of the constraints at all times and obtaining a cost very close to the optimal one.
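To build intuition for hard-constraint imposition (without claiming this is RAYEN's construction), the toy sketch below maps an unconstrained output onto a polytope {x : Ax <= b} by travelling from a known interior point toward the output and stopping at the boundary. The constraint matrix and interior point are assumptions for illustration.

```python
# Toy hard-constraint imposition by ray scaling from an interior point.
import numpy as np

A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.array([1.0, 1.0, 1.0])
x0 = np.array([0.0, 0.0])                      # strictly feasible interior point (A x0 < b)

def impose(y):
    d = y - x0                                 # ray direction from the interior point
    Ad, slack = A @ d, b - A @ x0
    with np.errstate(divide="ignore"):
        limits = np.where(Ad > 0, slack / Ad, np.inf)
    t = min(1.0, limits.min())                 # shrink only if the ray exits the set
    return x0 + t * d

for y in [np.array([0.3, 0.2]), np.array([5.0, 5.0]), np.array([-4.0, 1.0])]:
    x = impose(y)
    print(y, "->", x, "feasible:", bool(np.all(A @ x <= b + 1e-9)))
```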

A Machine Learning based Empirical Evaluation of Cyber Threat Actors High Level Attack Patterns over Low level Attack Patterns in Attributing Attacks

  • paper_url: http://arxiv.org/abs/2307.10252
  • repo_url: None
  • paper_authors: Umara Noor, Sawera Shahid, Rimsha Kanwal, Zahid Rashid
  • for: The paper studies cyber threat attribution, the process of identifying the actor behind an attack incident in cyberspace.
  • methods: It compares attribution based on low-level Indicators of Compromise (IOC), i.e., attack patterns gathered from honeypot deployments, intrusion detection systems, firewalls, and trace-back procedures, against attribution based on high-level IOC (tactics, techniques, and procedures), using machine learning models trained on a real-world dataset built for this comparison.
  • results: Models trained on high-level IOC attribute cyberattacks with 95% accuracy, compared with 40% for models trained on low-level IOC.
    Abstract Cyber threat attribution is the process of identifying the actor of an attack incident in cyberspace. An accurate and timely threat attribution plays an important role in deterring future attacks by applying appropriate and timely defense mechanisms. Manual analysis of attack patterns gathered by honeypot deployments, intrusion detection systems, firewalls, and via trace-back procedures is still the preferred method of security analysts for cyber threat attribution. Such attack patterns are low-level Indicators of Compromise (IOC). They represent Tactics, Techniques, Procedures (TTP), and software tools used by the adversaries in their campaigns. The adversaries rarely re-use them. They can also be manipulated, resulting in false and unfair attribution. To empirically evaluate and compare the effectiveness of both kinds of IOC, there are two problems that need to be addressed. The first problem is that in recent research works, the ineffectiveness of low-level IOC for cyber threat attribution has been discussed intuitively. An empirical evaluation for the measure of the effectiveness of low-level IOC based on a real-world dataset is missing. The second problem is that the available dataset for high-level IOC has a single instance for each predictive class label that cannot be used directly for training machine learning models. To address these problems in this research work, we empirically evaluate the effectiveness of low-level IOC based on a real-world dataset that is specifically built for comparative analysis with high-level IOC. The experimental results show that the high-level IOC trained models effectively attribute cyberattacks with an accuracy of 95% as compared to the low-level IOC trained models where accuracy is 40%.

Analyzing the Impact of Adversarial Examples on Explainable Machine Learning

  • paper_url: http://arxiv.org/abs/2307.08327
  • repo_url: None
  • paper_authors: Prathyusha Devabhakthini, Sasmita Parida, Raj Mani Shukla, Suvendu Chandan Nayak
  • for: The paper investigates the vulnerability of deep learning models to adversarial attacks and their impact on model interpretability.
  • methods: An ML-based text classification model is developed, adversarial perturbations are introduced into the text data, and classification performance after the attack is measured.
  • results: Adversarial perturbations easily cause the model to make incorrect predictions; the model's explainability before and after the attack is analyzed and interpreted.
    Abstract Adversarial attacks are a type of attack on machine learning models where an attacker deliberately modifies the inputs to cause the model to make incorrect predictions. Adversarial attacks can have serious consequences, particularly in applications such as autonomous vehicles, medical diagnosis, and security systems. Work on the vulnerability of deep learning models to adversarial attacks has shown that it is very easy to make samples that make a model predict things that it doesn't want to. In this work, we analyze the impact of model interpretability due to adversarial attacks on text classification problems. We develop an ML-based classification model for text data. Then, we introduce the adversarial perturbations on the text data to understand the classification performance after the attack. Subsequently, we analyze and interpret the model's explainability before and after the attack

A Secure Aggregation for Federated Learning on Long-Tailed Data

  • paper_url: http://arxiv.org/abs/2307.08324
  • repo_url: None
  • paper_authors: Yanna Jiang, Baihe Ma, Xu Wang, Guangsheng Yu, Caijun Sun, Wei Ni, Ren Ping Liu
  • for: The work targets two challenges in Federated Learning (FL): long-tailed (unbalanced) data distribution among participants and model attacks by Byzantine nodes.
  • methods: A novel two-layer aggregation method is proposed that rejects malicious models and selects valuable models containing tail-class information, leveraging a "think tank" that pools the wisdom of all participants.
  • results: Preliminary experiments show that the think tank makes effective model selections for global aggregation.
    Abstract As a distributed learning, Federated Learning (FL) faces two challenges: the unbalanced distribution of training data among participants, and the model attack by Byzantine nodes. In this paper, we consider the long-tailed distribution with the presence of Byzantine nodes in the FL scenario. A novel two-layer aggregation method is proposed for the rejection of malicious models and the advisable selection of valuable models containing tail class data information. We introduce the concept of think tank to leverage the wisdom of all participants. Preliminary experiments validate that the think tank can make effective model selections for global aggregation.

Airway Label Prediction in Video Bronchoscopy: Capturing Temporal Dependencies Utilizing Anatomical Knowledge

  • paper_url: http://arxiv.org/abs/2307.08318
  • repo_url: None
  • paper_authors: Ron Keuth, Mattias Heinrich, Martin Eichenlaub, Marian Himstedt
  • For: This paper provides a novel approach for navigation guidance during bronchoscopy interventions without the need for electromagnetic tracking or patient-specific CT scans.
  • Methods: The proposed approach uses topological bronchoscope localization and incorporates sequences of CNN-based airway likelihoods into a Hidden Markov Model, leveraging anatomical constraints and temporal context for improved accuracy.
  • Results: The approach is evaluated in a lung phantom model and achieves an accuracy of up to 0.98 compared to 0.81 for a classification based on individual frames, demonstrating the effectiveness of the proposed method.
    Abstract Purpose: Navigation guidance is a key requirement for a multitude of lung interventions using video bronchoscopy. State-of-the-art solutions focus on lung biopsies using electromagnetic tracking and intraoperative image registration w.r.t. preoperative CT scans for guidance. The requirement of patient-specific CT scans hampers the utilisation of navigation guidance for other applications such as intensive care units. Methods: This paper addresses navigation guidance solely incorporating bronchosopy video data. In contrast to state-of-the-art approaches we entirely omit the use of electromagnetic tracking and patient-specific CT scans. Guidance is enabled by means of topological bronchoscope localization w.r.t. an interpatient airway model. Particularly, we take maximally advantage of anatomical constraints of airway trees being sequentially traversed. This is realized by incorporating sequences of CNN-based airway likelihoods into a Hidden Markov Model. Results: Our approach is evaluated based on multiple experiments inside a lung phantom model. With the consideration of temporal context and use of anatomical knowledge for regularization, we are able to improve the accuracy up to to 0.98 compared to 0.81 (weighted F1: 0.98 compared to 0.81) for a classification based on individual frames. Conclusion: We combine CNN-based single image classification of airway segments with anatomical constraints and temporal HMM-based inference for the first time. Our approach renders vision-only guidance for bronchoscopy interventions in the absence of electromagnetic tracking and patient-specific CT scans possible.
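The fusion of per-frame CNN likelihoods with anatomical and temporal structure can be sketched as HMM forward filtering over airway segments. The 4-segment adjacency, transition model, and likelihood values below are toy assumptions, not the paper's airway model.

```python
# HMM forward filtering: anatomy-constrained transitions x per-frame CNN likelihoods.
import numpy as np

# States = airway segments; transitions allowed only between anatomically adjacent ones.
adjacency = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
], dtype=float)
T = adjacency / adjacency.sum(axis=1, keepdims=True)     # row-normalized transition matrix

def forward_filter(frame_likelihoods):
    """frame_likelihoods: (n_frames, n_states) CNN softmax outputs per video frame."""
    belief = np.full(T.shape[0], 1.0 / T.shape[0])
    for lik in frame_likelihoods:
        belief = lik * (T.T @ belief)                     # predict with anatomy, update with CNN
        belief /= belief.sum()
    return belief

cnn_out = np.array([[0.4, 0.3, 0.2, 0.1],
                    [0.2, 0.5, 0.2, 0.1],
                    [0.1, 0.3, 0.5, 0.1]])
print(forward_filter(cnn_out))                            # posterior over airway segments
```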

Soft Prompt Tuning for Augmenting Dense Retrieval with Large Language Models

  • paper_url: http://arxiv.org/abs/2307.08303
  • repo_url: https://github.com/zhiyuanpeng/sptar
  • paper_authors: Zhiyuan Peng, Xuyang Wu, Yi Fang
  • for: Improving dense retrieval performance, especially when domain-specific training data is lacking.
  • methods: Soft prompt tuning optimizes a task-specific soft prompt on limited ground-truth data; the prompted LLM then tags unlabeled documents with weak queries, which are filtered for quality and used to train task-specific dense retrievers.
  • results: SPTAR outperforms the unsupervised BM25 baseline and a recently proposed LLM-based augmentation method for dense retrieval.
    Abstract Dense retrieval (DR) converts queries and documents into dense embeddings and measures the similarity between queries and documents in vector space. One of the challenges in DR is the lack of domain-specific training data. While DR models can learn from large-scale public datasets like MS MARCO through transfer learning, evidence shows that not all DR models and domains can benefit from transfer learning equally. Recently, some researchers have resorted to large language models (LLMs) to improve the zero-shot and few-shot DR models. However, the hard prompts or human-written prompts utilized in these works cannot guarantee the good quality of generated weak queries. To tackle this, we propose soft prompt tuning for augmenting DR (SPTAR): For each task, we leverage soft prompt-tuning to optimize a task-specific soft prompt on limited ground truth data and then prompt the LLMs to tag unlabeled documents with weak queries, yielding enough weak document-query pairs to train task-specific dense retrievers. We design a filter to select high-quality example document-query pairs in the prompt to further improve the quality of weak tagged queries. To the best of our knowledge, there is no prior work utilizing soft prompt tuning to augment DR models. The experiments demonstrate that SPTAR outperforms the unsupervised baselines BM25 and the recently proposed LLMs-based augmentation method for DR.
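A minimal, generic soft-prompt-tuning sketch (toy "language model", made-up dimensions; not SPTAR's code or its weak-query augmentation pipeline): a few learnable prompt vectors are prepended to the frozen model's input embeddings and are the only parameters that receive gradient updates.

```python
# Generic soft prompt tuning against a frozen toy model.
import torch
import torch.nn as nn

vocab, dim, prompt_len = 100, 32, 5
embed = nn.Embedding(vocab, dim)
lm_head = nn.Linear(dim, vocab)
for p in list(embed.parameters()) + list(lm_head.parameters()):
    p.requires_grad_(False)                                # the "LLM" stays frozen

soft_prompt = nn.Parameter(torch.randn(prompt_len, dim) * 0.02)   # the only trained weights
opt = torch.optim.Adam([soft_prompt], lr=1e-2)

tokens = torch.randint(0, vocab, (16, 12))                 # a batch of token ids
targets = torch.randint(0, vocab, (16,))                   # e.g. the next token to generate

for _ in range(100):
    x = embed(tokens)                                             # (B, 12, dim)
    x = torch.cat([soft_prompt.expand(x.shape[0], -1, -1), x], dim=1)
    logits = lm_head(x.mean(dim=1))                               # crude pooled "LM" readout
    loss = nn.functional.cross_entropy(logits, targets)
    opt.zero_grad(); loss.backward(); opt.step()
print(loss.item())
```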

GBT: Two-stage transformer framework for non-stationary time series forecasting

  • paper_url: http://arxiv.org/abs/2307.08302
  • repo_url: https://github.com/origamisl/gbt
  • paper_authors: Li Shen, Yuning Wei, Yangzhu Wang
  • for: Addressing the severe over-fitting of time series forecasting Transformers (TSFTs), especially on non-stationary time series, caused by improper initialization of the unknown decoder inputs.
  • methods: A novel two-stage Transformer framework, the Good Beginning Transformer (GBT), decouples prediction into an Auto-Regression stage and a Self-Regression stage; the Auto-Regression stage's predictions serve as a better initialization ("good beginning") for the Self-Regression stage, and an Error Score Modification module further enhances forecasting.
  • results: Extensive experiments on seven benchmark datasets show that GBT outperforms state-of-the-art TSFTs (FEDformer, Pyraformer, ETSformer, etc.) and other forecasting models (SCINet, N-HiTS, etc.) with lower time and space complexity, and it can be coupled with these models to strengthen their forecasting capability.
    Abstract This paper shows that time series forecasting Transformer (TSFT) suffers from severe over-fitting problem caused by improper initialization method of unknown decoder inputs, esp. when handling non-stationary time series. Based on this observation, we propose GBT, a novel two-stage Transformer framework with Good Beginning. It decouples the prediction process of TSFT into two stages, including Auto-Regression stage and Self-Regression stage to tackle the problem of different statistical properties between input and prediction sequences.Prediction results of Auto-Regression stage serve as a Good Beginning, i.e., a better initialization for inputs of Self-Regression stage. We also propose Error Score Modification module to further enhance the forecasting capability of the Self-Regression stage in GBT. Extensive experiments on seven benchmark datasets demonstrate that GBT outperforms SOTA TSFTs (FEDformer, Pyraformer, ETSformer, etc.) and many other forecasting models (SCINet, N-HiTS, etc.) with only canonical attention and convolution while owning less time and space complexity. It is also general enough to couple with these models to strengthen their forecasting capability. The source code is available at: https://github.com/OrigamiSL/GBT

Systematic Testing of the Data-Poisoning Robustness of KNN

  • paper_url: http://arxiv.org/abs/2307.08288
  • repo_url: None
  • paper_authors: Yannan Li, Jingbo Wang, Chao Wang
  • for: Assessing the data-poisoning robustness of machine-learning-based software components trained on potentially contaminated datasets.
  • methods: A systematic-testing-based method that can both certify and falsify data-poisoning robustness, combining an over-approximate analysis in an abstract domain to narrow the search space with systematic testing in the concrete domain to find actual violations.
  • results: The method is faster and more accurate than a baseline enumeration method and decides the data-poisoning robustness of k-nearest neighbors (KNN) predictions for most test inputs.
    Abstract Data poisoning aims to compromise a machine learning based software component by contaminating its training set to change its prediction results for test inputs. Existing methods for deciding data-poisoning robustness have either poor accuracy or long running time and, more importantly, they can only certify some of the truly-robust cases, but remain inconclusive when certification fails. In other words, they cannot falsify the truly-non-robust cases. To overcome this limitation, we propose a systematic testing based method, which can falsify as well as certify data-poisoning robustness for a widely used supervised-learning technique named k-nearest neighbors (KNN). Our method is faster and more accurate than the baseline enumeration method, due to a novel over-approximate analysis in the abstract domain, to quickly narrow down the search space, and systematic testing in the concrete domain, to find the actual violations. We have evaluated our method on a set of supervised-learning datasets. Our results show that the method significantly outperforms state-of-the-art techniques, and can decide data-poisoning robustness of KNN prediction results for most of the test inputs.
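For intuition about what "certify or falsify" means here, the naive check below brute-forces whether flipping at most m training labels can change a KNN prediction. The paper's contribution is precisely to avoid this exponential enumeration via abstract over-approximation plus targeted concrete testing, so this is a baseline sketch, not their method.

```python
# Brute-force label-flip robustness check for one KNN test prediction.
import numpy as np
from itertools import combinations

def knn_predict(X, y, x, k=3):
    idx = np.argsort(np.linalg.norm(X - x, axis=1))[:k]
    return int(np.round(y[idx].mean()))              # binary majority vote

def robust_under_label_flips(X, y, x, m=1, k=3):
    base = knn_predict(X, y, x, k)
    for flip in combinations(range(len(y)), m):      # brute force: all m-subsets of labels
        y2 = y.copy()
        y2[list(flip)] ^= 1
        if knn_predict(X, y2, x, k) != base:
            return False, flip                       # falsified: a successful poisoning found
    return True, None                                # certified for this flip budget

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2)); y = (X[:, 0] > 0).astype(int)
print(robust_under_label_flips(X, y, np.array([0.05, 0.0]), m=1))
```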

Going Beyond Linear Mode Connectivity: The Layerwise Linear Feature Connectivity

  • paper_url: http://arxiv.org/abs/2307.08286
  • repo_url: None
  • paper_authors: Zhanpeng Zhou, Yongyi Yang, Xiaojiang Yang, Junchi Yan, Wei Hu
  • for: The paper studies intriguing empirical phenomena in neural network training, in particular Linear Mode Connectivity (LMC).
  • methods: It introduces a stronger notion, Layerwise Linear Feature Connectivity (LLFC), and investigates it across a wide range of settings using both the spawning and permutation methods for producing connected networks.
  • results: Whenever two trained networks satisfy LMC, they also satisfy LLFC in nearly all layers; the analysis of the factors underlying LLFC yields new insights into the spawning and permutation approaches.
    Abstract Recent work has revealed many intriguing empirical phenomena in neural network training, despite the poorly understood and highly complex loss landscapes and training dynamics. One of these phenomena, Linear Mode Connectivity (LMC), has gained considerable attention due to the intriguing observation that different solutions can be connected by a linear path in the parameter space while maintaining near-constant training and test losses. In this work, we introduce a stronger notion of linear connectivity, Layerwise Linear Feature Connectivity (LLFC), which says that the feature maps of every layer in different trained networks are also linearly connected. We provide comprehensive empirical evidence for LLFC across a wide range of settings, demonstrating that whenever two trained networks satisfy LMC (via either spawning or permutation methods), they also satisfy LLFC in nearly all the layers. Furthermore, we delve deeper into the underlying factors contributing to LLFC, which reveal new insights into the spawning and permutation approaches. The study of LLFC transcends and advances our understanding of LMC by adopting a feature-learning perspective.
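The basic linear mode connectivity check that LLFC strengthens can be sketched by interpolating the weights of two independently trained networks and tracking the loss along the linear path (toy models and data; LLFC would additionally compare intermediate feature maps layer by layer).

```python
# Loss along the linear path between two independently trained networks.
import torch
import torch.nn as nn

def make_model():
    return nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))

def train(model, X, y, steps=200):
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        nn.functional.cross_entropy(model(X), y).backward()
        opt.step()
    return model

torch.manual_seed(0)
X = torch.randn(512, 10); y = (X[:, 0] > 0).long()
m1, m2 = train(make_model(), X, y), train(make_model(), X, y)

for alpha in [0.0, 0.25, 0.5, 0.75, 1.0]:
    interp = make_model()
    sd = {k: (1 - alpha) * m1.state_dict()[k] + alpha * m2.state_dict()[k]
          for k in m1.state_dict()}
    interp.load_state_dict(sd)
    with torch.no_grad():
        loss = nn.functional.cross_entropy(interp(X), y).item()
    print(f"alpha={alpha:.2f}  loss={loss:.3f}")
```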

Complexity Matters: Rethinking the Latent Space for Generative Modeling

  • paper_url: http://arxiv.org/abs/2307.08283
  • repo_url: None
  • paper_authors: Tianyang Hu, Fei Chen, Haonan Wang, Jiawei Li, Wenjia Wang, Jiacheng Sun, Zhenguo Li
  • for: Rethinking the choice of latent space in generative modeling and how to identify the optimal latent space for better generative performance.
  • methods: A model-complexity perspective yields a novel "distance" between the latent and data distributions whose minimizer characterizes the optimal data-dependent latent; a two-stage training strategy, the Decoupled Autoencoder (DAE), parameterizes this latent with an encoder that is trained with an auxiliary decoder in the first stage and frozen in the second stage while the actual decoder is trained.
  • results: Theoretical analyses and experiments on models such as VQGAN and Diffusion Transformer show that DAE improves sample quality while decreasing model complexity.
    Abstract In generative modeling, numerous successful approaches leverage a low-dimensional latent space, e.g., Stable Diffusion models the latent space induced by an encoder and generates images through a paired decoder. Although the selection of the latent space is empirically pivotal, determining the optimal choice and the process of identifying it remain unclear. In this study, we aim to shed light on this under-explored topic by rethinking the latent space from the perspective of model complexity. Our investigation starts with the classic generative adversarial networks (GANs). Inspired by the GAN training objective, we propose a novel "distance" between the latent and data distributions, whose minimization coincides with that of the generator complexity. The minimizer of this distance is characterized as the optimal data-dependent latent that most effectively capitalizes on the generator's capacity. Then, we consider parameterizing such a latent distribution by an encoder network and propose a two-stage training strategy called Decoupled Autoencoder (DAE), where the encoder is only updated in the first stage with an auxiliary decoder and then frozen in the second stage while the actual decoder is being trained. DAE can improve the latent distribution and as a result, improve the generative performance. Our theoretical analyses are corroborated by comprehensive experiments on various models such as VQGAN and Diffusion Transformer, where our modifications yield significant improvements in sample quality with decreased model complexity.
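A schematic sketch of the two-stage Decoupled Autoencoder recipe described above, with toy MLP modules standing in for the encoder and decoders (the paper applies the idea to models such as VQGAN; the losses, sizes, and step counts here are placeholders).

```python
# Stage 1: shape the latent space with an auxiliary decoder.
# Stage 2: freeze the encoder, train the actual (higher-capacity) decoder.
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Linear(64, 8))
aux_dec = nn.Sequential(nn.Linear(8, 64))
dec = nn.Sequential(nn.Linear(8, 128), nn.ReLU(), nn.Linear(128, 64))
data = torch.randn(1024, 64)

opt1 = torch.optim.Adam(list(enc.parameters()) + list(aux_dec.parameters()), lr=1e-3)
for _ in range(200):
    opt1.zero_grad()
    nn.functional.mse_loss(aux_dec(enc(data)), data).backward()
    opt1.step()

for p in enc.parameters():
    p.requires_grad_(False)                      # encoder (and hence the latent distribution) is now fixed
opt2 = torch.optim.Adam(dec.parameters(), lr=1e-3)
for _ in range(200):
    opt2.zero_grad()
    nn.functional.mse_loss(dec(enc(data)), data).backward()
    opt2.step()
print(nn.functional.mse_loss(dec(enc(data)), data).item())
```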

Certifying the Fairness of KNN in the Presence of Dataset Bias

  • paper_url: http://arxiv.org/abs/2307.08722
  • repo_url: None
  • paper_authors: Yannan Li, Jingbo Wang, Chao Wang
  • for: The paper is written for certifying the fairness of the classification result of the k-nearest neighbors (KNN) algorithm under the assumption of historical bias in the training data.
  • methods: The paper proposes a method for certifying fairness based on three variants of fairness definitions: individual fairness, $\epsilon$-fairness, and label-flipping fairness. The method uses sound approximations of the complex arithmetic computations used in the state-of-the-art KNN algorithm to reduce computational cost.
  • results: The paper shows the effectiveness of the proposed method through experimental evaluation on six widely used datasets in the fairness research literature. The method is able to obtain fairness certifications for a large number of test inputs despite the presence of historical bias in the datasets.
    Abstract We propose a method for certifying the fairness of the classification result of a widely used supervised learning algorithm, the k-nearest neighbors (KNN), under the assumption that the training data may have historical bias caused by systematic mislabeling of samples from a protected minority group. To the best of our knowledge, this is the first certification method for KNN based on three variants of the fairness definition: individual fairness, $\epsilon$-fairness, and label-flipping fairness. We first define the fairness certification problem for KNN and then propose sound approximations of the complex arithmetic computations used in the state-of-the-art KNN algorithm. This is meant to lift the computation results from the concrete domain to an abstract domain, to reduce the computational cost. We show effectiveness of this abstract interpretation based technique through experimental evaluation on six datasets widely used in the fairness research literature. We also show that the method is accurate enough to obtain fairness certifications for a large number of test inputs, despite the presence of historical bias in the datasets.
    摘要 我们提出了一种方法,用于证明一种广泛使用的监督学习算法——k近邻(KNN)——的分类结果是否公平,其假设是训练数据可能因对受保护少数群体样本的系统性错误标注而带有历史偏见。据我们所知,这是第一种基于三种公平定义(个体公平、ε-公平和标签翻转公平)的KNN公平证明方法。我们首先定义了KNN的公平证明问题,然后对最先进KNN算法中使用的复杂算术计算提出可靠(sound)的近似,把计算结果从具体域提升到抽象域,以降低计算成本。我们在公平性研究文献中广泛使用的六个数据集上通过实验评估,证明了这种基于抽象解释的技术的有效性。我们还表明,即使数据集中存在历史偏见,该方法也足够精确,能够为大量测试输入获得公平性证明。
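
To make the certification question above concrete, the sketch below checks a naive sufficient condition for label-flipping robustness of a plain KNN vote on a toy dataset. It is only an illustration of the underlying question, not the paper's abstract-interpretation-based method; the dataset, the choice of k, and the margin rule are assumptions made here for the example.

```python
# Minimal sketch: a naive sufficient condition for label-flipping robustness of a
# plain KNN vote. This is NOT the paper's abstract-interpretation method; it only
# illustrates the underlying question ("can n adversarial label flips change the
# predicted class?") on a toy dataset.
import numpy as np
from collections import Counter

def knn_labels(X_train, y_train, x, k):
    """Return the labels of the k nearest training points to x (Euclidean)."""
    dists = np.linalg.norm(X_train - x, axis=1)
    return y_train[np.argsort(dists)[:k]]

def certify_label_flipping(X_train, y_train, x, k, n_flips):
    """Sufficient condition: the top class keeps winning even if n_flips of the
    k neighbor labels are adversarially changed. Each flip can at worst remove
    one vote from the leader and add one vote to the runner-up, so a margin
    greater than 2 * n_flips certifies the prediction."""
    votes = Counter(knn_labels(X_train, y_train, x, k))
    counts = votes.most_common()
    top_count = counts[0][1]
    runner_up = counts[1][1] if len(counts) > 1 else 0
    return (top_count - runner_up) > 2 * n_flips

# Toy example with a hypothetical binary-labeled dataset.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
x_test = np.array([4.2, 3.8])
print(certify_label_flipping(X, y, x_test, k=7, n_flips=2))
```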

Automated Action Model Acquisition from Narrative Texts

  • paper_url: http://arxiv.org/abs/2307.10247
  • repo_url: None
  • paper_authors: Ruiqi Li, Leyang Cui, Songtuan Lin, Patrik Haslum
  • for: 本研究旨在提高AI代理人的规划技术应用,通过自动从叙述文本中提取结构化事件和生成规划语言风格的动作模型。
  • methods: 本研究使用了自动提取叙述文本中的结构事件,并通过预测通用常识事件关系、文本矛盾和相似性来生成 planning-language-style 动作模型。
  • results: 实验结果表明,NaRuto可以在经典叙述规划领域生成高质量的动作模型,与现有的完全自动方法相当,甚至与半自动方法相当。
    Abstract Action models, which take the form of precondition/effect axioms, facilitate causal and motivational connections between actions for AI agents. Action model acquisition has been identified as a bottleneck in the application of planning technology, especially within narrative planning. Acquiring action models from narrative texts in an automated way is essential, but challenging because of the inherent complexities of such texts. We present NaRuto, a system that extracts structured events from narrative text and subsequently generates planning-language-style action models based on predictions of commonsense event relations, as well as textual contradictions and similarities, in an unsupervised manner. Experimental results in classical narrative planning domains show that NaRuto can generate action models of significantly better quality than existing fully automated methods, and even on par with those of semi-automated methods.
    摘要 行动模型以前提/效果公理的形式,为AI代理在不同行动之间建立因果与动机上的联系。行动模型的获取被认为是规划技术应用中的瓶颈,在叙事规划领域尤为突出。以自动化方式从叙事文本中获取行动模型十分重要,但由于此类文本固有的复杂性而颇具挑战。我们提出了NaRuto系统,它先从叙事文本中抽取结构化事件,再以无监督的方式,基于对常识事件关系以及文本矛盾与相似性的预测,生成规划语言风格的行动模型。在经典叙事规划领域的实验结果表明,NaRuto生成的行动模型质量显著优于现有的全自动方法,甚至可与半自动方法相媲美。

Adversarial Attacks on Traffic Sign Recognition: A Survey

  • paper_url: http://arxiv.org/abs/2307.08278
  • repo_url: None
  • paper_authors: Svetlana Pavlitska, Nico Lambing, J. Marius Zöllner
  • for: 本研究旨在探讨攻击 autonomous driving 系统的可能性,尤其是针对交通标识模型的攻击。
  • methods: 本研究准确地描述了现有的攻击方法,包括数字和实际攻击。
  • results: 研究发现,现有的攻击方法可以轻松地破坏交通标识模型的正常工作,需要进一步的研究以减少这些攻击的风险。
    Abstract Traffic sign recognition is an essential component of perception in autonomous vehicles, which is currently performed almost exclusively with deep neural networks (DNNs). However, DNNs are known to be vulnerable to adversarial attacks. Several previous works have demonstrated the feasibility of adversarial attacks on traffic sign recognition models. Traffic signs are particularly promising for adversarial attack research due to the ease of performing real-world attacks using printed signs or stickers. In this work, we survey existing works performing either digital or real-world attacks on traffic sign detection and classification models. We provide an overview of the latest advancements and highlight the existing research areas that require further investigation.
    摘要 自动驾驶车辆的辨识功能中,交通标志识别是一个关键组件,目前大多使用深度神经网络(DNN)来实现。但是,DNN受到恶意攻击的可能性很高。先前的研究已经证明了对交通标志识别模型的攻击的可能性。由于交通标志的易攻击性,使得实际攻击更加容易。在这种情况下,我们对现有的数字和实际攻击研究进行了抽象和概述,并高亮了需要进一步研究的领域。

Deep Neural Networks and Brain Alignment: Brain Encoding and Decoding (Survey)

  • paper_url: http://arxiv.org/abs/2307.10246
  • repo_url: None
  • paper_authors: Subba Reddy Oota, Manish Gupta, Raju S. Bapi, Gael Jobard, Frederic Alexandre, Xavier Hinaut
  • for: 研究大脑如何表示不同的信息模式,以及设计一个系统可以自动理解用户的思维?
  • methods: 使用功能性磁共振成像(fMRI)记录大脑活动,并提出了多种基于深度学习的编码和解码模型。
  • results: 这些模型可以用于评估和诊断神经科学问题,以及设计大脑机器或计算机界面。
    Abstract How does the brain represent different modes of information? Can we design a system that automatically understands what the user is thinking? Such questions can be answered by studying brain recordings like functional magnetic resonance imaging (fMRI). As a first step, the neuroscience community has contributed several large cognitive neuroscience datasets related to passive reading/listening/viewing of concept words, narratives, pictures and movies. Encoding and decoding models using these datasets have also been proposed in the past two decades. These models serve as additional tools for basic research in cognitive science and neuroscience. Encoding models aim at generating fMRI brain representations given a stimulus automatically. They have several practical applications in evaluating and diagnosing neurological conditions and thus also help design therapies for brain damage. Decoding models solve the inverse problem of reconstructing the stimuli given the fMRI. They are useful for designing brain-machine or brain-computer interfaces. Inspired by the effectiveness of deep learning models for natural language processing, computer vision, and speech, recently several neural encoding and decoding models have been proposed. In this survey, we will first discuss popular representations of language, vision and speech stimuli, and present a summary of neuroscience datasets. Further, we will review popular deep learning based encoding and decoding architectures and note their benefits and limitations. Finally, we will conclude with a brief summary and discussion about future trends. Given the large amount of recently published work in the `computational cognitive neuroscience' community, we believe that this survey nicely organizes the plethora of work and presents it as a coherent story.
    摘要 如何让脑子表示不同的信息?我们可以通过研究脑电图像(fMRI)来回答这些问题。脑科学社区已经提供了许多大量的认知神经科学数据集,这些数据集关于静止阅读/听取/观看概念词、故事、图片和电影。使用这些数据集,以前已经提出了编码和解码模型。这些模型可以用于基础研究认知科学和神经科学。编码模型可以自动生成脑电图像,它们有许多实际应用,如诊断和治疗神经系统疾病。解码模型可以 reconstruction 脑电图像,它们有用于设计脑机或脑计算机界面。鼓励于深度学习模型在自然语言处理、计算机视觉和语音处理等领域的效果,最近几年有很多 neural encoding 和 decoding 模型被提出。在这篇评论中,我们将首先讲讲语言、视觉和听说 stimuli 的受欢迎表示,并提供脑科学数据集的摘要。然后,我们将回顾深度学习基于编码和解码架构的一些模型,并注意它们的优点和局限性。最后,我们将结束于简要的总结和讨论,并讨论未来的趋势。由于最近出版的大量工作在 'computational cognitive neuroscience' 社区,我们认为这篇评论 nicely 组织了这些工作,并将它们表现为一个coherent 的故事。

Transferable Graph Neural Fingerprint Models for Quick Response to Future Bio-Threats

  • paper_url: http://arxiv.org/abs/2308.01921
  • repo_url: None
  • paper_authors: Wei Chen, Yihui Ren, Ai Kagawa, Matthew R. Carbone, Samuel Yen-Chi Chen, Xiaohui Qu, Shinjae Yoo, Austin Clyde, Arvind Ramanathan, Rick L. Stevens, Hubertus J. J. van Dam, Deyu Liu
  • for: 这个论文的目的是为了快速屏测药物分子,以便在药物发现过程中快速搜索出有效的药物候选者。
  • methods: 这个论文使用的方法是基于蛋白质绑定亲和力的图 neural fingerprint方法,这种方法可以在高速和高准确性之间进行药物 docking 模拟。
  • results: 结果表明,图神经指纹方法可以对COVID-19药物对接问题进行高效的虚拟筛选,其预测精度高于传统的环形指纹方法,在大多数对接靶标上的均方误差低于 $0.21$ kcal/mol。此外,作者还提出了一种可迁移到未知靶标的图神经指纹方法,该方法在多个靶标上联合训练,准确性与针对特定靶标的图神经指纹模型相当。
    Abstract Fast screening of drug molecules based on the ligand binding affinity is an important step in the drug discovery pipeline. Graph neural fingerprint is a promising method for developing molecular docking surrogates with high throughput and great fidelity. In this study, we built a COVID-19 drug docking dataset of about 300,000 drug candidates on 23 coronavirus protein targets. With this dataset, we trained graph neural fingerprint docking models for high-throughput virtual COVID-19 drug screening. The graph neural fingerprint models yield high prediction accuracy on docking scores with the mean squared error lower than $0.21$ kcal/mol for most of the docking targets, showing significant improvement over conventional circular fingerprint methods. To make the neural fingerprints transferable for unknown targets, we also propose a transferable graph neural fingerprint method trained on multiple targets. With comparable accuracy to target-specific graph neural fingerprint models, the transferable model exhibits superb training and data efficiency. We highlight that the impact of this study extends beyond COVID-19 dataset, as our approach for fast virtual ligand screening can be easily adapted and integrated into a general machine learning-accelerated pipeline to battle future bio-threats.
    摘要 基于配体结合亲和力的药物分子快速筛选是药物发现流程中的一个重要步骤。图神经指纹是一种有前途的方法,可用于构建高通量、高保真的分子对接替代模型。在这项研究中,我们建立了一个COVID-19药物对接数据集,包含约300,000个候选药物分子在23个冠状病毒蛋白靶标上的对接结果。利用该数据集,我们训练了用于高通量虚拟COVID-19药物筛选的图神经指纹对接模型。图神经指纹模型在多数对接靶标上对对接分数给出了高精度预测,均方误差低于 $0.21$ kcal/mol,相比传统的环形指纹方法有明显改善。为使神经指纹能够应用于未知靶标,我们还提出了一种在多个靶标上训练的可迁移图神经指纹方法。该方法的准确性与针对特定靶标的模型相当,同时在训练和数据效率方面表现出色。我们强调,这项研究的影响不仅限于COVID-19数据集:我们的快速虚拟配体筛选方法可以方便地适配并整合到通用的机器学习加速流程中,以应对未来的生物威胁。

Evaluating and Enhancing Robustness of Deep Recommendation Systems Against Hardware Errors

  • paper_url: http://arxiv.org/abs/2307.10244
  • repo_url: https://github.com/vu-detail/pytei
  • paper_authors: Dongning Ma, Xun Jiao, Fred Lin, Mengshi Zhang, Alban Desmaison, Thomas Sellinger, Daniel Moore, Sriram Sankar
  • for: 这 paper 是关于深度推荐系统(DRS)的可靠性研究,以寻找在大规模队列系统中发现的硬件错误对 DRS 的影响。
  • methods: 这 paper 使用了 PyTorch 构建了一个简单、高效、可扩展的错误插入框架(Terrorch),以测试 DRS 的可靠性。
  • results: 研究发现,DRS 对硬件错误的抵抗力受到多种因素的影响,包括模型参数和输入特征。研究还发现,使用活动clipping可以提高 AUC-ROC 分数,达到30%的恢复率。
    Abstract Deep recommendation systems (DRS) heavily depend on specialized HPC hardware and accelerators to optimize energy, efficiency, and recommendation quality. Despite the growing number of hardware errors observed in large-scale fleet systems where DRS are deployed, the robustness of DRS has been largely overlooked. This paper presents the first systematic study of DRS robustness against hardware errors. We develop Terrorch, a user-friendly, efficient and flexible error injection framework on top of the widely-used PyTorch. We evaluate a wide range of models and datasets and observe that the DRS robustness against hardware errors is influenced by various factors from model parameters to input characteristics. We also explore 3 error mitigation methods including algorithm based fault tolerance (ABFT), activation clipping and selective bit protection (SBP). We find that applying activation clipping can recover up to 30% of the degraded AUC-ROC score, making it a promising mitigation method.
    摘要 深度推荐系统(DRS)强依赖特殊的高性能计算硬件和加速器来优化能效和推荐质量。尽管大规模队列系统中DRS的可靠性受到许多硬件错误的影响,但DRS的可靠性问题还尚未得到足够的关注。本文提出了DRS可靠性对硬件错误的首次系统性研究。我们开发了一个简单、高效和灵活的错误插入框架——Terrorch,并在PyTorch上实现。我们对各种模型和数据集进行了广泛的测试,发现DRS对硬件错误的可靠性受到多种因素的影响,从模型参数到输入特征。我们还探讨了3种错误缓解方法,包括算法基于缺陷tolerance(ABFT)、活动截断和选择性位保护(SBP)。我们发现通过实施活动截断可以恢复30%的降低的AUC-ROC分数,这表明这是一种有前途的缓解方法。
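
The following sketch illustrates, in plain NumPy rather than the Terrorch/PyTorch API, the two ideas highlighted above: injecting random single-bit flips into float32 activations and mitigating the damage with activation clipping. The error rate, bit positions, and clipping range are illustrative assumptions, not values from the paper.

```python
# Sketch of (1) single-bit-flip injection into float32 values and (2) activation
# clipping as a mitigation. Bit positions, error rate, and the clipping range are
# illustrative assumptions; this is not the Terrorch framework itself.
import numpy as np

def inject_bit_flips(x, flip_prob, rng):
    """Flip one random bit in each float32 element with probability flip_prob."""
    flat = x.astype(np.float32).copy().ravel()
    bits = flat.view(np.uint32)                      # reinterpret the same bytes
    mask = rng.random(bits.shape) < flip_prob        # which elements get an error
    positions = rng.integers(0, 32, size=bits.shape) # which bit to flip
    bits[mask] ^= (np.uint32(1) << positions[mask].astype(np.uint32))
    return flat.reshape(x.shape)

def clip_activations(x, lo=-10.0, hi=10.0):
    """Activation clipping: squash corrupted values back into a plausible range."""
    return np.clip(np.nan_to_num(x, nan=0.0, posinf=hi, neginf=lo), lo, hi)

rng = np.random.default_rng(0)
acts = rng.normal(size=(4, 8)).astype(np.float32)    # pretend these are activations
faulty = inject_bit_flips(acts, flip_prob=0.1, rng=rng)
repaired = clip_activations(faulty)
print("max abs before/after clipping:", np.abs(faulty).max(), np.abs(repaired).max())
```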

Convex Bi-Level Optimization Problems with Non-smooth Outer Objective Function

  • paper_url: http://arxiv.org/abs/2307.08245
  • repo_url: None
  • paper_authors: Roey Merchav, Shoham Sabach
  • for: 解决 convex bi-level 优化问题
  • methods: 提出 Bi-Sub-Gradient (Bi-SG) 方法,基于 classical sub-gradient 方法的一种泛化
  • results: Bi-SG 方法可以在 convex bi-level 优化问题中实现 sub-线性速率,并且如果外部目标函数具有强度 convexity,可以提高外部速率至线性速率。此外,我们证明 Bi-SG 方法生成的序列与 bi-level 优化问题的优化解的距离 converges to zero.
    Abstract In this paper, we propose the Bi-Sub-Gradient (Bi-SG) method, which is a generalization of the classical sub-gradient method to the setting of convex bi-level optimization problems. This is a first-order method that is very easy to implement in the sense that it requires only a computation of the associated proximal mapping or a sub-gradient of the outer non-smooth objective function, in addition to a proximal gradient step on the inner optimization problem. We show, under very mild assumptions, that Bi-SG tackles bi-level optimization problems and achieves sub-linear rates both in terms of the inner and outer objective functions. Moreover, if the outer objective function is additionally strongly convex (still could be non-smooth), the outer rate can be improved to a linear rate. Last, we prove that the distance of the generated sequence to the set of optimal solutions of the bi-level problem converges to zero.
    摘要 在本文中,我们提出了Bi-Sub-Gradient(Bi-SG)方法,它是经典次梯度方法在凸双层优化问题中的推广。这是一种非常易于实现的一阶方法,只需计算外层非光滑目标函数的邻近映射或次梯度,并对内层优化问题执行一步邻近梯度。我们证明,在非常温和的假设下,Bi-SG可以求解双层优化问题,并且在内层和外层目标函数上都达到次线性收敛速率。此外,如果外层目标函数还是强凸的(仍可以是非光滑的),外层速率可以提高到线性。最后,我们证明所生成的序列到双层优化问题最优解集的距离收敛于零。
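
The toy script below gives a schematic feel for the convex bi-level setting: a smooth inner least-squares objective with many minimizers, a non-smooth $\ell_1$ outer objective, and iterations that combine a gradient step on the inner problem with a proximal step on a decaying multiple of the outer one. It is a generic illustration under assumed problem data and step sizes, not the exact Bi-SG update analyzed in the paper.

```python
# Schematic toy illustration of the bi-level setup (NOT the exact Bi-SG update):
# inner objective f(x) = 0.5 * ||Ax - b||^2 (smooth, convex), outer objective
# w(x) = ||x||_1 (non-smooth, convex). Each iteration takes a gradient step on f
# followed by a proximal (soft-thresholding) step on a decaying multiple of w,
# so the iterates drift toward inner minimizers with small outer value.
# Problem data and the decay schedule are made up.
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 50))          # underdetermined, so many inner minimizers
x_true = np.zeros(50); x_true[:3] = [1.0, -2.0, 0.5]
b = A @ x_true

x = np.zeros(50)
eta = 1.0 / np.linalg.norm(A, 2) ** 2  # step size from the Lipschitz constant of grad f
for k in range(1, 5001):
    grad_f = A.T @ (A @ x - b)         # gradient step on the inner objective
    lam_k = 1.0 / np.sqrt(k)           # decaying weight on the outer objective
    x = soft_threshold(x - eta * grad_f, eta * lam_k)

print("inner residual ||Ax-b||:", np.linalg.norm(A @ x - b))
print("outer value ||x||_1:    ", np.linalg.norm(x, 1))
```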

A Look into Causal Effects under Entangled Treatment in Graphs: Investigating the Impact of Contact on MRSA Infection

  • paper_url: http://arxiv.org/abs/2307.08237
  • repo_url: None
  • paper_authors: Jing Ma, Chen Chen, Anil Vullikanti, Ritwick Mishra, Gregory Madden, Daniel Borrajo, Jundong Li
  • for: The paper is written to study the problem of causal effect estimation with treatment entangled in a graph, and to propose a novel method (NEAT) to tackle this challenge.
  • methods: The proposed method NEAT explicitly leverages the graph structure to model the treatment assignment mechanism, and mitigates confounding biases based on the treatment assignment modeling.
  • results: The proposed method is validated through experiments on both synthetic datasets and a real-world MRSA dataset, and provides effective results in estimating causal effects with entangled treatments.
    Abstract Methicillin-resistant Staphylococcus aureus (MRSA) is a type of bacteria resistant to certain antibiotics, making it difficult to prevent MRSA infections. Among decades of efforts to conquer infectious diseases caused by MRSA, many studies have been proposed to estimate the causal effects of close contact (treatment) on MRSA infection (outcome) from observational data. In this problem, the treatment assignment mechanism plays a key role as it determines the patterns of missing counterfactuals -- the fundamental challenge of causal effect estimation. Most existing observational studies for causal effect learning assume that the treatment is assigned individually for each unit. However, on many occasions, the treatments are pairwisely assigned for units that are connected in graphs, i.e., the treatments of different units are entangled. Neglecting the entangled treatments can impede the causal effect estimation. In this paper, we study the problem of causal effect estimation with treatment entangled in a graph. Despite a few explorations for entangled treatments, this problem still remains challenging due to the following challenges: (1) the entanglement brings difficulties in modeling and leveraging the unknown treatment assignment mechanism; (2) there may exist hidden confounders which lead to confounding biases in causal effect estimation; (3) the observational data is often time-varying. To tackle these challenges, we propose a novel method NEAT, which explicitly leverages the graph structure to model the treatment assignment mechanism, and mitigates confounding biases based on the treatment assignment modeling. We also extend our method into a dynamic setting to handle time-varying observational data. Experiments on both synthetic datasets and a real-world MRSA dataset validate the effectiveness of the proposed method, and provide insights for future applications.
    摘要 MRSA(多剂肠炎杆菌)是一种抗药菌,它的感染难以预防。在抗生素耗用多年的尝试下,许多研究被提出来估计MRSA感染的 causal effect,从观察数据中获得。在这个问题中,治疗分配机制扮演着关键的角色,它确定了潜在的缺失对照数据的模式——基本挑战 causal effect 估计。大多数现有的观察数据研究假设每个单元都 individually 接受了治疗。然而,在许多情况下,治疗是在图表中连接的单元之间分配的,即不同单元的治疗是 entangled 的。忽略这些杂合的治疗可能会妨碍 causal effect 估计。在这篇文章中,我们研究了图表中的 causal effect 估计问题。虽然有一些对 entangled 治疗的探索,但这个问题仍然具有挑战,因为:(1)杂合带来了对 treatment assignment mechanism 的模型和利用的困难;(2)可能存在隐藏的假设因素,导致 causal effect 估计受到抵消的影响;(3)观察数据通常是时间变化的。为了解决这些挑战,我们提出了一种新方法 NEAT,它明确利用图表结构来模型治疗分配机制,并根据治疗分配模型来减少假设因素的影响。我们还将方法推广到动态设定,以处理时间变化的观察数据。在 synthetic 数据和一个实际MRSA数据上进行了实验,并证明了我们的方法的有效性,并提供了未来应用的参考。

HeroLT: Benchmarking Heterogeneous Long-Tailed Learning

  • paper_url: http://arxiv.org/abs/2307.08235
  • repo_url: https://github.com/ssskj/herolt
  • paper_authors: Haohui Wang, Weijie Guan, Jianpeng Chen, Zi Wang, Dawei Zhou
  • for: 本研究旨在提供一个系统性的长尾学习视角,涵盖数据长尾性、域困难度和新任务多样性等三个纬度。
  • methods: 本研究开发了包括13种现状之最先进算法和6种评价指标的最全面的长尾学习 benchmark 名为 HeroLT,并在14个真实 benchmark 数据集上进行了264项实验。
  • results: 研究人员通过对 HeroLT benchmark 进行了全面的实验和分析,并提出了一些有 Promise 的未来方向。
    Abstract Long-tailed data distributions are prevalent in a variety of domains, including finance, e-commerce, biomedical science, and cyber security. In such scenarios, the performance of machine learning models is often dominated by the head categories, while the learning of tail categories is significantly inadequate. Given abundant studies conducted to alleviate the issue, this work aims to provide a systematic view of long-tailed learning with regard to three pivotal angles: (A1) the characterization of data long-tailedness, (A2) the data complexity of various domains, and (A3) the heterogeneity of emerging tasks. To achieve this, we develop the most comprehensive (to the best of our knowledge) long-tailed learning benchmark named HeroLT, which integrates 13 state-of-the-art algorithms and 6 evaluation metrics on 14 real-world benchmark datasets across 4 tasks from 3 domains. HeroLT with novel angles and extensive experiments (264 in total) enables researchers and practitioners to effectively and fairly evaluate newly proposed methods compared with existing baselines on varying types of datasets. Finally, we conclude by highlighting the significant applications of long-tailed learning and identifying several promising future directions. For accessibility and reproducibility, we open-source our benchmark HeroLT and corresponding results at https://github.com/SSSKJ/HeroLT.
    摘要 长尾数据分布广泛存在多个领域,如金融、电商、生物医学和网络安全。在这些场景下,机器学习模型的性能frequently受到主要类别的影响,而tail categories的学习则是不足的。鉴于丰富的相关研究,本工作想要提供长尾学习的系统视图,涉及以下三个重要角度:(A1)数据长尾性的特征,(A2)各领域的数据复杂性,以及(A3)emerging task的多样性。为实现这一目标,我们开发了最 complet(到我们所知)的长尾学习 benchmarck named HeroLT,该benchmark integrate 13种state-of-the-art算法和6种评价指标在14个真实世界 benchmark数据集上。 HeroLT通过新的角度和广泛的实验(共264个),帮助研究者和实践者对新提出的方法进行有效和公平的评估,并与现有基准值进行比较。最后,我们 conclude by highlighting long-tailed learning的重要应用和未来发展的一些可能性。为便捷性和可重复性,我们在 GitHub 上公开了我们的 benchmark HeroLT 和相应的结果。

Learning for Counterfactual Fairness from Observational Data

  • paper_url: http://arxiv.org/abs/2307.08232
  • repo_url: None
  • paper_authors: Jing Ma, Ruocheng Guo, Aidong Zhang, Jundong Li
  • for: 避免机器学习模型具有对某些子群体的偏见(如种族、性别、年龄等),实现对所有 subgroup 的公正预测。
  • methods: Counterfactual fairness 是一种从 causal 角度定义的公正性观,通过比较每个个体在原始世界和在对敏感特征值进行修改后的世界中的预测,来衡量模型的公正性。但在实际应用中,通常无法获得准确的 causal 模型,因此直接使用这些模型可能会带来偏见。本文提出了一种新的框架 CLAIRE,通过对数据进行 counterfactual 数据扩展和一种对称约束来减轻敏感特征的偏见。
  • results: experiments 表明,CLAIRE 在对实际数据进行预测时比其他方法更好,同时也能够保证对所有 subgroup 的公正预测。
    Abstract Fairness-aware machine learning has attracted a surge of attention in many domains, such as online advertising, personalized recommendation, and social media analysis in web applications. Fairness-aware machine learning aims to eliminate biases of learning models against certain subgroups described by certain protected (sensitive) attributes such as race, gender, and age. Among many existing fairness notions, counterfactual fairness is a popular notion defined from a causal perspective. It measures the fairness of a predictor by comparing the prediction of each individual in the original world and that in the counterfactual worlds in which the value of the sensitive attribute is modified. A prerequisite for existing methods to achieve counterfactual fairness is the prior human knowledge of the causal model for the data. However, in real-world scenarios, the underlying causal model is often unknown, and acquiring such human knowledge could be very difficult. In these scenarios, it is risky to directly trust the causal models obtained from information sources with unknown reliability and even causal discovery methods, as incorrect causal models can consequently bring biases to the predictor and lead to unfair predictions. In this work, we address the problem of counterfactually fair prediction from observational data without given causal models by proposing a novel framework CLAIRE. Specifically, under certain general assumptions, CLAIRE effectively mitigates the biases from the sensitive attribute with a representation learning framework based on counterfactual data augmentation and an invariant penalty. Experiments conducted on both synthetic and real-world datasets validate the superiority of CLAIRE in both counterfactual fairness and prediction performance.
    摘要 “对待公平机器学习在多个领域中引起了广泛关注,例如在网络广告、个人化推荐和社交媒体分析中的网络应用程序。对待公平机器学习的目标是删除机器学习模型对某些子群体(敏感特征)的偏袋,例如性别、年龄和种族。许多现有的公平定义中,Counterfactual fairness是一种受欢迎的定义,它从 causal 的角度定义了公平的定义。Counterfactual fairness 的定义是根据每个个体在原始世界中的预测和在替代世界中的预测来衡量模型的公平。现有的方法以前需要人类对敏感特征的 causal 模型有充分的知识。但在实际情况下,背景 causal 模型通常是未知的,获取这种人类知识可能是很困难的。在这些情况下,直接对这些信息来源不确定的 causal 模型进行信任可能是很危险的。在这个工作中,我们解决了从观察数据中进行 counterfactually 公平预测的问题,不需要人类对敏感特征的 causal 模型的知识。我们提出了一个名为 CLAIRE 的新框架,它在满足一些一般假设下,可以对敏感特征进行优化,并且使用 counterfactual 数据增强和不变 penalty 来减少偏袋。实验结果显示,CLAIRE 在 counterfactual 公平和预测性能方面具有优越性。”

Can Euclidean Symmetry be Leveraged in Reinforcement Learning and Planning?

  • paper_url: http://arxiv.org/abs/2307.08226
  • repo_url: None
  • paper_authors: Linfeng Zhao, Owen Howell, Jung Yeon Park, Xupeng Zhu, Robin Walters, Lawson L. S. Wong
  • for: 这个论文的目的是设计改进的学习算法,用于控制和规划任务,具有欧几何群同质性。
  • methods: 论文使用了一种统一优化算法,可以应用于离散和连续的 symmetry 问题,包括优化算法和样本生成算法。
  • results: 实验证明,通过具有欧几何群同质性的算法,可以更好地解决自然的控制问题。
    Abstract In robotic tasks, changes in reference frames typically do not influence the underlying physical properties of the system, which has been known as invariance of physical laws.These changes, which preserve distance, encompass isometric transformations such as translations, rotations, and reflections, collectively known as the Euclidean group. In this work, we delve into the design of improved learning algorithms for reinforcement learning and planning tasks that possess Euclidean group symmetry. We put forth a theory on that unify prior work on discrete and continuous symmetry in reinforcement learning, planning, and optimal control. Algorithm side, we further extend the 2D path planning with value-based planning to continuous MDPs and propose a pipeline for constructing equivariant sampling-based planning algorithms. Our work is substantiated with empirical evidence and illustrated through examples that explain the benefits of equivariance to Euclidean symmetry in tackling natural control problems.
    摘要 在机器人任务中,参照系统的变化通常不会影响系统的物理性质,这被称为不变性法律。这些变化包括同构射影、旋转和反射,合称为欧几何群。在这个工作中,我们深入探讨改进学习算法的设计,以便在奖励学习和规划任务中具有欧几何群的对称性。我们提出了对往年的绝对同构和连续同构在奖励学习、规划和最优控制中的统一理论。算法方面,我们进一步扩展了二维路径规划,并提出了一个管道的构建同构抽样计划算法。我们的工作得到了实验证明,并通过例子解释了在自然控制问题中如何通过对维持欧几何群的同构性来获得利益。

A Lightweight Framework for High-Quality Code Generation

  • paper_url: http://arxiv.org/abs/2307.08220
  • repo_url: None
  • paper_authors: Mohammed Latif Siddiq, Beatrice Casey, Joanna C. S. Santos
  • for: This paper aims to improve the quality and security of automatically generated source codes using transformer-based code generation models.
  • methods: The proposed framework, FRANC, includes a static filter and a quality-aware ranker to sort code snippets based on compilability and quality scores. Prompt engineering is also used to fix persistent quality issues.
  • results: FRANC improves the compilability of Java and Python code suggestions by 9% to 46% and 10% to 43%, respectively. The average improvement in NDCG@10 score is 0.0763, and the repairing techniques repair the highest 80% of prompts. The framework takes approximately 1.98 seconds for Java and 0.08 seconds for Python.
    Abstract In recent years, the use of automated source code generation utilizing transformer-based generative models has expanded, and these models can generate functional code according to the requirements of the developers. However, recent research revealed that these automatically generated source codes can contain vulnerabilities and other quality issues. Despite researchers' and practitioners' attempts to enhance code generation models, retraining and fine-tuning large language models is time-consuming and resource-intensive. Thus, we describe FRANC, a lightweight framework for recommending more secure and high-quality source code derived from transformer-based code generation models. FRANC includes a static filter to make the generated code compilable with heuristics and a quality-aware ranker to sort the code snippets based on a quality score. Moreover, the framework uses prompt engineering to fix persistent quality issues. We evaluated the framework with five Python and Java code generation models and six prompt datasets, including a newly created one in this work (SOEval). The static filter improves 9% to 46% Java suggestions and 10% to 43% Python suggestions regarding compilability. The average improvement over the NDCG@10 score for the ranking system is 0.0763, and the repairing techniques repair the highest 80% of prompts. FRANC takes, on average, 1.98 seconds for Java; for Python, it takes 0.08 seconds.
    摘要 近年来,使用自动生成源代码的使用者模型(transformer-based generative models)的使用已扩展。这些模型可以根据开发者的需求生成功能代码。然而,最新的研究发现,这些自动生成的代码可能含有漏洞和质量问题。尽管研究人员和实践者尝试了增强代码生成模型,但是重新训练和精度调整大型自然语言模型是时间consuming和资源占用。因此,我们描述了FRANC框架,它是一个轻量级的框架,可以为基于 transformer 的代码生成模型提供更安全和更高质量的源代码。FRANC 包括一个静态筛选器,使得生成的代码可以遵循规范和质量评分器,以根据代码片段的质量进行排序。此外,框架还使用 prompt 工程来修复持续存在的质量问题。我们对五种 Python 和 Java 代码生成模型,以及六个提示集进行评估。静态筛选器可以提高 Java 建议的可 compiling 率由 9% 到 46%,Python 建议的可 compiling 率由 10% 到 43%。具有 NDCG@10 指标的平均提升为 0.0763,并且修复技术可以修复最高 80% 的提示。FRANC 平均需要 1.98 秒钟 для Java,占用 0.08 秒钟 для Python。
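
As a concrete (and deliberately simplified) illustration of the two stages described above, the sketch below filters Python suggestions with a parse-based compilability check and then ranks the survivors with a toy quality heuristic. It is not FRANC's implementation: the heuristics, scores, and example snippets are assumptions for the demo.

```python
# Sketch of a static compilability filter plus a quality-aware ranker for Python
# code suggestions. Not FRANC's actual implementation: the quality heuristics
# below are toy assumptions, and a real system would use proper static analysis.
import ast

def compiles(snippet: str) -> bool:
    """Static filter: keep only suggestions that parse as valid Python."""
    try:
        ast.parse(snippet)
        return True
    except SyntaxError:
        return False

def quality_score(snippet: str) -> float:
    """Toy quality heuristic: penalize bare except clauses and use of eval()."""
    score = 1.0
    if "except:" in snippet:
        score -= 0.4
    if "eval(" in snippet:
        score -= 0.4
    return score

def rank_suggestions(suggestions):
    kept = [s for s in suggestions if compiles(s)]
    return sorted(kept, key=quality_score, reverse=True)

suggestions = [
    "def add(a, b):\n    return a + b\n",                   # fine
    "def add(a, b)\n    return a + b\n",                    # syntax error, filtered
    "def run(cmd):\n    try:\n        return eval(cmd)\n    except:\n        pass\n",
]
for s in rank_suggestions(suggestions):
    print("---\n" + s)
```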

Forward Laplacian: A New Computational Framework for Neural Network-based Variational Monte Carlo

  • paper_url: http://arxiv.org/abs/2307.08214
  • repo_url: None
  • paper_authors: Ruichen Li, Haotian Ye, Du Jiang, Xuelan Wen, Chuwei Wang, Zhe Li, Xiang Li, Di He, Ji Chen, Weiluo Ren, Liwei Wang
  • for: 能够扩展NN-VMC的应用范围到更大的系统,包括更多的原子、分子和化学反应。
  • methods: 使用了一种新的计算框架 named Forward Laplacian,通过高效的前进传播过程计算了神经网络中的 Laplacian,从而大幅提高了NN-VMC的计算效率。
  • results: 对于一系列的原子、分子和化学反应,NN-VMC通过Empirical数据示出了可以解决通用量子力学问题的潜力。
    Abstract Neural network-based variational Monte Carlo (NN-VMC) has emerged as a promising cutting-edge technique of ab initio quantum chemistry. However, the high computational cost of existing approaches hinders their applications in realistic chemistry problems. Here, we report the development of a new NN-VMC method that achieves a remarkable speed-up by more than one order of magnitude, thereby greatly extending the applicability of NN-VMC to larger systems. Our key design is a novel computational framework named Forward Laplacian, which computes the Laplacian associated with neural networks, the bottleneck of NN-VMC, through an efficient forward propagation process. We then demonstrate that Forward Laplacian is not only versatile but also facilitates more developments of acceleration methods across various aspects, including optimization for sparse derivative matrix and efficient neural network design. Empirically, our approach enables NN-VMC to investigate a broader range of atoms, molecules and chemical reactions for the first time, providing valuable references to other ab initio methods. The results demonstrate a great potential in applying deep learning methods to solve general quantum mechanical problems.
    摘要 基于神经网络的变分蒙特卡罗方法(NN-VMC)已成为从头算量子化学领域一种有前景的前沿技术。然而,现有方法的高计算成本阻碍了其在实际化学问题中的应用。本文报告了一种新的NN-VMC方法,其速度提升超过一个数量级,从而大大扩展了NN-VMC对更大体系的适用范围。我们的核心设计是一个名为Forward Laplacian的新计算框架,它通过高效的前向传播过程来计算与神经网络相关的拉普拉斯算子——这正是NN-VMC的计算瓶颈。我们进一步证明,Forward Laplacian不仅具有通用性,还有助于在稀疏导数矩阵优化和高效神经网络设计等多个方面发展更多加速方法。在实验上,我们的方法首次使NN-VMC能够研究更广泛的原子、分子和化学反应,为其他从头算方法提供了有价值的参考。这些结果展示了将深度学习方法用于求解一般量子力学问题的巨大潜力。
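
The snippet below is a minimal NumPy illustration of the forward-propagation idea: the triple (value, input gradient, Laplacian) is pushed through each layer of a small tanh MLP in a single pass, and the result is cross-checked against finite differences. The two-layer network is a made-up stand-in, not the paper's wavefunction ansatz or implementation.

```python
# Minimal NumPy illustration of forward Laplacian propagation through an MLP:
# push (value, gradient w.r.t. the network input, Laplacian) through each layer
# in a single forward pass. The two-layer tanh MLP below is a made-up stand-in.
import numpy as np

def linear_fwd(W, b, v, g, lap):
    # y = W v + b is linear, so gradient and Laplacian transform linearly too.
    return W @ v + b, W @ g, W @ lap

def tanh_fwd(v, g, lap):
    # For elementwise s(u): grad = s'(u) * grad_u,
    # laplacian = s''(u) * ||grad_u||^2 + s'(u) * laplacian_u.
    s1 = 1.0 - np.tanh(v) ** 2              # tanh'
    s2 = -2.0 * np.tanh(v) * s1             # tanh''
    return np.tanh(v), s1[:, None] * g, s2 * np.sum(g ** 2, axis=1) + s1 * lap

def forward_laplacian(params, x):
    d = x.shape[0]
    v, g, lap = x, np.eye(d), np.zeros(d)   # seed: identity gradient, zero Laplacian
    (W1, b1), (W2, b2) = params
    v, g, lap = tanh_fwd(*linear_fwd(W1, b1, v, g, lap))
    v, g, lap = linear_fwd(W2, b2, v, g, lap)
    return v[0], g[0], lap[0]               # scalar-output network

rng = np.random.default_rng(0)
d, h = 3, 16
params = [(rng.normal(size=(h, d)) / np.sqrt(d), rng.normal(size=h)),
          (rng.normal(size=(1, h)) / np.sqrt(h), rng.normal(size=1))]
x = rng.normal(size=d)
f, grad, lap = forward_laplacian(params, x)

# Cross-check the Laplacian with central finite differences.
eps, fd = 1e-4, 0.0
for i in range(d):
    e = np.zeros(d); e[i] = eps
    fp = forward_laplacian(params, x + e)[0]
    fm = forward_laplacian(params, x - e)[0]
    fd += (fp - 2 * f + fm) / eps ** 2
print("forward Laplacian:", lap, " finite difference:", fd)
```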

Towards Stealthy Backdoor Attacks against Speech Recognition via Elements of Sound

  • paper_url: http://arxiv.org/abs/2307.08208
  • repo_url: https://github.com/hanbocai/badspeech_soe
  • paper_authors: Hanbo Cai, Pengcheng Zhang, Hai Dong, Yan Xiao, Stefanos Koffas, Yiming Li
  • for: 这个论文的目的是研究潜在攻击者可以通过恶意投入到语音识别模型的训练过程中,使模型具有恶意预测行为的问题。
  • methods: 这篇论文使用了一些新的攻击方法,包括使用高频谱的尖声作为触发器,并将其与其他音频 clip 混合以实现更隐蔽的攻击。它们还使用了timbre特征来实现隐蔽的攻击。
  • results: 实验结果表明,这些攻击方法可以在不同的设定下(例如,all-to-one、all-to-all、干净标签、物理和多个攻击点设定)下实现高效的攻击。这些攻击方法也比较隐蔽,可以逃脱检测。
    Abstract Deep neural networks (DNNs) have been widely and successfully adopted and deployed in various applications of speech recognition. Recently, a few works revealed that these models are vulnerable to backdoor attacks, where the adversaries can implant malicious prediction behaviors into victim models by poisoning their training process. In this paper, we revisit poison-only backdoor attacks against speech recognition. We reveal that existing methods are not stealthy since their trigger patterns are perceptible to humans or machine detection. This limitation is mostly because their trigger patterns are simple noises or separable and distinctive clips. Motivated by these findings, we propose to exploit elements of sound ($e.g.$, pitch and timbre) to design more stealthy yet effective poison-only backdoor attacks. Specifically, we insert a short-duration high-pitched signal as the trigger and increase the pitch of remaining audio clips to `mask' it for designing stealthy pitch-based triggers. We manipulate timbre features of victim audios to design the stealthy timbre-based attack and design a voiceprint selection module to facilitate the multi-backdoor attack. Our attacks can generate more `natural' poisoned samples and therefore are more stealthy. Extensive experiments are conducted on benchmark datasets, which verify the effectiveness of our attacks under different settings ($e.g.$, all-to-one, all-to-all, clean-label, physical, and multi-backdoor settings) and their stealthiness. The code for reproducing main experiments are available at \url{https://github.com/HanboCai/BadSpeech_SoE}.
    摘要 深度神经网络(DNNs)在语音识别应用中广泛采用和部署。近期一些研究表明,这些模型容易受到后门攻击,敌人可以通过恶意污染训练过程中植入Malicious prediction behaviors。在这篇文章中,我们再次研究了对语音识别的poison-only后门攻击。我们发现现有方法不够隐蔽,因为启发模式是人类或机器检测的。这是因为启发模式通常是简单的噪声或分离的和特征的音频clip。我们被这些发现 motivated,我们提议利用音频元素(如抑声和 timbre)设计更隐蔽又有效的poison-only后门攻击。我们插入短暂的高频声讯作为启发,并增加剩下的音频clip的抑声来mask它。我们操纵受害者音频的timbre特征来设计隐蔽的timbre-based攻击,并设计一个voiceprint选择模块来促进多个后门攻击。我们的攻击可以生成更自然的杂 poisoned samples,因此更隐蔽。我们在标准数据集上进行了广泛的实验,以验证我们的攻击在不同的设置(例如all-to-one、all-to-all、spot、physical和多个后门设置)下的效果和隐蔽性。代码可以在\url{https://github.com/HanboCai/BadSpeech_SoE}中找到。

A Quantum Convolutional Neural Network Approach for Object Detection and Classification

  • paper_url: http://arxiv.org/abs/2307.08204
  • repo_url: None
  • paper_authors: Gowri Namratha Meedinti, Kandukuri Sai Srirekha, Radhakrishnan Delhibabu
  • for: 这篇论文主要评估量子卷积神经网络(QCNN)的潜在能力,与经典卷积神经网络(CNN)和人工神经网络(ANN)模型进行比较。
  • methods: 本论文使用了量子计算方法,将数据存储在量子环境中,并应用了CNN结构来处理这些数据。
  • results: 分析结果表明,QCNNs在某些应用场景下可以超越经典CNN和ANN模型,both in terms of accuracy and efficiency。此外,QCNNs还可以处理更大的复杂性水平。
    Abstract This paper presents a comprehensive evaluation of the potential of Quantum Convolutional Neural Networks (QCNNs) in comparison to classical Convolutional Neural Networks (CNNs) and Artificial / Classical Neural Network (ANN) models. With the increasing amount of data, utilizing computing methods like CNN in real-time has become challenging. QCNNs overcome this challenge by utilizing qubits to represent data in a quantum environment and applying CNN structures to quantum computers. The time and accuracy of QCNNs are compared with classical CNNs and ANN models under different conditions such as batch size and input size. The maximum complexity level that QCNNs can handle in terms of these parameters is also investigated. The analysis shows that QCNNs have the potential to outperform both classical CNNs and ANN models in terms of accuracy and efficiency for certain applications, demonstrating their promise as a powerful tool in the field of machine learning.
    摘要 这篇论文对量子卷积神经网络(QCNN)与经典卷积神经网络(CNN)以及人工神经网络(ANN)模型进行了全面的评估。随着数据量不断增加,在实时场景中使用CNN等计算方法变得越来越困难。QCNN利用量子比特在量子环境中表示数据,并在量子计算机上应用卷积结构,从而克服了这一挑战。在不同的批处理大小和输入大小条件下,比较了QCNN与经典CNN和ANN模型的时间与准确率,并考察了QCNN在这些参数下所能处理的最大复杂度水平。分析结果表明,QCNN在某些应用中可以在准确率和效率方面超越经典CNN和ANN模型,显示了其作为机器学习领域有力工具的潜力。

Noise removal methods on ambulatory EEG: A Survey

  • paper_url: http://arxiv.org/abs/2308.02437
  • repo_url: None
  • paper_authors: Sarthak Johari, Gowri Namratha Meedinti, Radhakrishnan Delhibabu, Deepak Joshi
  • for: 本研究旨在实时处理患者短访EEG数据,以提高医疗干预的精确性和效率。
  • methods: 本研究使用了许多检测和移除噪声的技术,包括模式识别、机器学习、和信号处理等。
  • results: 本研究发现,不同条件下的EEG数据可以使用不同的检测和移除噪声技术,以提高医疗干预的精确性和效率。
    Abstract Over many decades, research has attempted to remove noise from ambulatory EEG. An enormous number of papers have been published on identifying and removing such noise, and it is difficult to present a detailed review of all of this literature. Therefore, in this paper, an attempt has been made to review noise detection and removal: more than 100 research papers are discussed to discern the techniques for detecting and removing noise in ambulatory EEG. Further, the literature survey shows that the pattern recognition required to detect ambulatory conditions, such as eyes open and closed, varies across EEG datasets. This is mainly because EEG recorded under different conditions has different characteristics, which in turn necessitates identifying pattern recognition techniques that can effectively distinguish EEG noise from EEG data recorded under various conditions.
    摘要 几十年来,研究者一直在尝试去除动态脑电图中的噪声,相关研究论文数量庞大,难以一一详述。因此,本文尝试对噪声的检测与去除进行综述,讨论了100余篇研究论文,以梳理动态脑电噪声的检测与去除技术。文献调研还表明,检测动态状态以及睁眼、闭眼等所需的模式识别方法会随脑电数据集条件的不同而变化。这主要是因为在不同条件下采集的脑电信号具有不同的特性,因而需要确定合适的模式识别技术,以便有效区分各种条件下的脑电噪声数据。

HOPE: High-order Polynomial Expansion of Black-box Neural Networks

  • paper_url: http://arxiv.org/abs/2307.08192
  • repo_url: https://github.com/harrypotterxtx/hope
  • paper_authors: Tingxiong Xiao, Weihang Zhang, Yuxiao Cheng, Jinli Suo
  • for: 这篇论文旨在提供一种方法,使深度神经网络变得更加可解,以便在需要作出有理的决策的领域中应用。
  • methods: 这篇论文使用了高阶多项式扩展(High-order Polynomial Expansion,HOPE)方法,将神经网络拓展成高阶多项式的参考输入。特别是,authors derive了高阶DERIVATIVE规则 для复杂函数,并将其扩展到神经网络,以快速和准确地计算神经网络的高阶DERIVATIVE。
  • results: 数值分析表明,提案的方法具有高精度、低计算复杂度和良好的收敛性。此外,authors还用HOPE方法实现了深度学习中的功能发现、快速推理和特征选择等广泛应用。
    Abstract Despite their remarkable performance, deep neural networks remain mostly ``black boxes'', suggesting inexplicability and hindering their wide applications in fields requiring making rational decisions. Here we introduce HOPE (High-order Polynomial Expansion), a method for expanding a network into a high-order Taylor polynomial on a reference input. Specifically, we derive the high-order derivative rule for composite functions and extend the rule to neural networks to obtain their high-order derivatives quickly and accurately. From these derivatives, we can then derive the Taylor polynomial of the neural network, which provides an explicit expression of the network's local interpretations. Numerical analysis confirms the high accuracy, low computational complexity, and good convergence of the proposed method. Moreover, we demonstrate HOPE's wide applications built on deep learning, including function discovery, fast inference, and feature selection. The code is available at https://github.com/HarryPotterXTX/HOPE.git.
    摘要 尽管它们的表现很出色,深度神经网络仍然具有大量的“黑盒子”特性,这限制了它们在需要做合理决策的领域应用。我们在这里介绍HOPE(高阶多项式扩展)方法,它可以将神经网络扩展成参考输入的高阶多项式。我们 derivated高阶DERIVATIVE规则 для复杂函数,并将这个规则扩展到神经网络,从而快速和高精度地计算神经网络的高阶DERIVATIVE。基于这些DERIVATIVE,我们可以计算神经网络的泰勒多项式,从而获得神经网络的本地解释。数值分析表明HOPE的精度高、计算复杂度低,并且 converge 很好。此外,我们还证明HOPE在深度学习建立的各种应用中具有广泛的应用前景,包括函数发现、快速推理和特征选择。代码可以在https://github.com/HarryPotterXTX/HOPE.git中找到。
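
For intuition, the toy script below expands a small 1-D network into a Taylor polynomial around a reference input, obtaining the high-order derivatives by repeatedly calling torch.autograd.grad. This is the naive baseline rather than HOPE's faster high-order forward rules; the network, reference point, and expansion order are arbitrary choices for the example.

```python
# Toy 1-D illustration: expand a small network into a Taylor polynomial around a
# reference input. Derivatives come from repeated torch.autograd.grad calls,
# i.e. the naive baseline -- HOPE's contribution is a faster forward rule for
# these high-order derivatives. Network and expansion order are arbitrary.
import math
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(torch.nn.Linear(1, 16), torch.nn.Tanh(),
                          torch.nn.Linear(16, 1))

def taylor_coefficients(f, x0, order):
    x = x0.clone().requires_grad_(True)
    deriv = f(x).squeeze()            # 0-dim scalar output
    coeffs = [deriv.item()]
    for n in range(1, order + 1):
        (g,) = torch.autograd.grad(deriv, x, create_graph=True)
        deriv = g.squeeze()           # d^n f / dx^n, still differentiable
        coeffs.append(deriv.item() / math.factorial(n))
    return coeffs

def taylor_eval(coeffs, x0, x):
    return sum(c * (x - x0.item()) ** n for n, c in enumerate(coeffs))

x0 = torch.tensor([0.5])
coeffs = taylor_coefficients(net, x0, order=4)
for dx in (0.0, 0.1, 0.3):
    x = x0 + dx
    print(f"dx={dx:.1f}  net={net(x).item():+.5f}  taylor={taylor_eval(coeffs, x0, x).item():+.5f}")
```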

Mini-Giants: “Small” Language Models and Open Source Win-Win

  • paper_url: http://arxiv.org/abs/2307.08189
  • repo_url: None
  • paper_authors: Zhengping Zhou, Lezhi Li, Xinxi Chen, Andy Li
  • for: 这篇论文主要是为了讨论小语言模型的发展和应用。
  • methods: 论文使用了开源社区和小语言模型来实现技术、伦理和社会上的赢利。
  • results: 论文提出了小语言模型在实际应用场景中的需求和潜力,并进行了对小语言模型的比较研究和评估方法。
    Abstract ChatGPT is phenomenal. However, it is prohibitively expensive to train and refine such giant models. Fortunately, small language models are flourishing and becoming more and more competent. We call them "mini-giants". We argue that open source community like Kaggle and mini-giants will win-win in many ways, technically, ethically and socially. In this article, we present a brief yet rich background, discuss how to attain small language models, present a comparative study of small language models and a brief discussion of evaluation methods, discuss the application scenarios where small language models are most needed in the real world, and conclude with discussion and outlook.
    摘要 ChatGPT表现非常出色,但训练和精调如此巨大的模型成本高得令人望而却步。幸运的是,小型语言模型正在蓬勃发展,能力也越来越强,我们称之为"小巨人"。我们认为,像Kaggle这样的开源社区与小巨人将在技术、伦理和社会等多个方面实现共赢。在这篇文章中,我们提供简短而丰富的背景介绍,讨论如何获得小型语言模型,给出小型语言模型的比较研究并简要讨论评估方法,介绍小型语言模型在现实世界中最被需要的应用场景,最后以讨论与展望作结。

An Empirical Investigation of Pre-trained Model Selection for Out-of-Distribution Generalization and Calibration

  • paper_url: http://arxiv.org/abs/2307.08187
  • repo_url: None
  • paper_authors: Hiroki Naganuma, Ryuichiro Hataya
  • for: 提高out-of-distribution泛化性能和推理不确定性
  • methods: investigate pre-trained model selection的影响,并比较不同的数据集和模型参数对性能指标的影响
  • results: 发现预训练模型选择对out-of-distribution泛化性能有显著影响,大型模型表现较好,但需要进一步研究memorization和真正的泛化之间的平衡。
    Abstract In the realm of out-of-distribution generalization tasks, finetuning has risen as a key strategy. While the most focus has been on optimizing learning algorithms, our research highlights the influence of pre-trained model selection in finetuning on out-of-distribution performance and inference uncertainty. Balancing model size constraints of a single GPU, we examined the impact of varying pre-trained datasets and model parameters on performance metrics like accuracy and expected calibration error. Our findings underscore the significant influence of pre-trained model selection, showing marked performance improvements over algorithm choice. Larger models outperformed others, though the balance between memorization and true generalization merits further investigation. Ultimately, our research emphasizes the importance of pre-trained model selection for enhancing out-of-distribution generalization.
    摘要 在异常分布泛化任务中, fine-tuning 已成为一项关键策略。而我们的研究表明,预训练模型选择在 fine-tuning 中对异常分布性能和推理不确定性产生了重要影响。我们在单个 GPU 的模型大小限制下对不同的预训练数据集和模型参数进行了研究,发现预训练模型选择对性能指标如准确率和预期抽象误差产生了显著的影响。大型模型表现更好,但是要找到Memorization 和真正的泛化之间的平衡仍然需要进一步的调查。最终,我们的研究强调了预训练模型选择对异常分布泛化的重要性。
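
Since expected calibration error is one of the metrics discussed above, here is a small self-contained sketch of the standard binned ECE computation on synthetic predictions. The bin count and the synthetic "overconfident classifier" are assumptions for illustration, not the paper's experimental setup.

```python
# Small sketch of expected calibration error (ECE): bin predictions by
# confidence, then average |accuracy - confidence| over bins, weighted by bin
# size. The synthetic predictions are made up; 15 bins is a common choice.
import numpy as np

def expected_calibration_error(confidences, predictions, labels, n_bins=15):
    confidences = np.asarray(confidences)
    correct = (np.asarray(predictions) == np.asarray(labels)).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            avg_conf = confidences[in_bin].mean()
            avg_acc = correct[in_bin].mean()
            ece += in_bin.mean() * abs(avg_acc - avg_conf)
    return ece

# Synthetic example: an overconfident 3-class classifier.
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=2000)
predictions = np.where(rng.random(2000) < 0.7, labels, rng.integers(0, 3, size=2000))
confidences = rng.uniform(0.85, 1.0, size=2000)     # claims far more than it delivers
print("accuracy:", (predictions == labels).mean(),
      " ECE:", expected_calibration_error(confidences, predictions, labels))
```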

Measuring Faithfulness in Chain-of-Thought Reasoning

  • paper_url: http://arxiv.org/abs/2307.13702
  • repo_url: None
  • paper_authors: Tamera Lanham, Anna Chen, Ansh Radhakrishnan, Benoit Steiner, Carson Denison, Danny Hernandez, Dustin Li, Esin Durmus, Evan Hubinger, Jackson Kernion, Kamilė Lukošiūtė, Karina Nguyen, Newton Cheng, Nicholas Joseph, Nicholas Schiefer, Oliver Rausch, Robin Larson, Sam McCandlish, Sandipan Kundu, Saurav Kadavath, Shannon Yang, Thomas Henighan, Timothy Maxwell, Timothy Telleen-Lawton, Tristan Hume, Zac Hatfield-Dodds, Jared Kaplan, Jan Brauner, Samuel R. Bowman, Ethan Perez
  • for: investigate how Chain-of-Thought (CoT) reasoning may be unfaithful in large language models (LLMs)
  • methods: examine how model predictions change when intervening on the CoT (e.g., adding mistakes or paraphrasing)
  • results: CoT’s performance boost is not due to added test-time compute or information encoded in the CoT’s phrasing, and models produce less faithful reasoning as they become larger and more capable.
    Abstract Large language models (LLMs) perform better when they produce step-by-step, "Chain-of-Thought" (CoT) reasoning before answering a question, but it is unclear if the stated reasoning is a faithful explanation of the model's actual reasoning (i.e., its process for answering the question). We investigate hypotheses for how CoT reasoning may be unfaithful, by examining how the model predictions change when we intervene on the CoT (e.g., by adding mistakes or paraphrasing it). Models show large variation across tasks in how strongly they condition on the CoT when predicting their answer, sometimes relying heavily on the CoT and other times primarily ignoring it. CoT's performance boost does not seem to come from CoT's added test-time compute alone or from information encoded via the particular phrasing of the CoT. As models become larger and more capable, they produce less faithful reasoning on most tasks we study. Overall, our results suggest that CoT can be faithful if the circumstances such as the model size and task are carefully chosen.
    摘要 大型语言模型(LLM)在回答问题前先进行逐步的"思维链"(CoT)推理时表现更好,但尚不清楚这些写出的推理是否忠实地反映了模型实际的推理过程(即其得出答案的过程)。我们通过对 CoT 进行干预(例如加入错误或改写)并观察模型预测的变化,来检验 CoT 推理可能不忠实的几种假设。我们发现,模型在不同任务上对 CoT 的依赖程度差异很大:有时高度依赖 CoT,有时则基本忽略它。CoT 带来的性能提升似乎并非仅来自额外的测试时计算,也不是来自 CoT 特定措辞中编码的信息。随着模型变得更大、更强,它们在我们研究的大多数任务上给出的推理反而更不忠实。总体而言,我们的结果表明,只要谨慎选择模型规模和任务等条件,CoT 是可以做到忠实的。

Question Decomposition Improves the Faithfulness of Model-Generated Reasoning

  • paper_url: http://arxiv.org/abs/2307.11768
  • repo_url: https://github.com/anthropics/decompositionfaithfulnesspaper
  • paper_authors: Ansh Radhakrishnan, Karina Nguyen, Anna Chen, Carol Chen, Carson Denison, Danny Hernandez, Esin Durmus, Evan Hubinger, Jackson Kernion, Kamilė Lukošiūtė, Newton Cheng, Nicholas Joseph, Nicholas Schiefer, Oliver Rausch, Sam McCandlish, Sheer El Showk, Tamera Lanham, Tim Maxwell, Venkatesa Chandrasekaran, Zac Hatfield-Dodds, Jared Kaplan, Jan Brauner, Samuel R. Bowman, Ethan Perez
  • for: 帮助验证大型自然语言模型(LLM)的正确性和安全性。
  • methods: 使用基于分解的方法,即将问题拆分成多个子问题,让模型在相互独立的上下文中回答更简单的子问题,以提高模型生成推理的忠实度。
  • results: 研究表明,基于分解的方法可以提高模型生成推理的忠实度,同时不会牺牲太多性能。这些方法可以帮助我们更好地验证 LLM 行为的正确性和安全性。
    Abstract As large language models (LLMs) perform more difficult tasks, it becomes harder to verify the correctness and safety of their behavior. One approach to help with this issue is to prompt LLMs to externalize their reasoning, e.g., by having them generate step-by-step reasoning as they answer a question (Chain-of-Thought; CoT). The reasoning may enable us to check the process that models use to perform tasks. However, this approach relies on the stated reasoning faithfully reflecting the model's actual reasoning, which is not always the case. To improve over the faithfulness of CoT reasoning, we have models generate reasoning by decomposing questions into subquestions. Decomposition-based methods achieve strong performance on question-answering tasks, sometimes approaching that of CoT while improving the faithfulness of the model's stated reasoning on several recently-proposed metrics. By forcing the model to answer simpler subquestions in separate contexts, we greatly increase the faithfulness of model-generated reasoning over CoT, while still achieving some of the performance gains of CoT. Our results show it is possible to improve the faithfulness of model-generated reasoning; continued improvements may lead to reasoning that enables us to verify the correctness and safety of LLM behavior.
    摘要 To improve the faithfulness of CoT reasoning, we have developed methods that decompose questions into subquestions. This approach achieves strong performance on question-answering tasks and sometimes approaches the performance of CoT while improving the faithfulness of the model's stated reasoning on several recently proposed metrics. By forcing the model to answer simpler subquestions in separate contexts, we significantly increase the faithfulness of model-generated reasoning over CoT, while still achieving some of the performance gains of CoT.Our results show that it is possible to improve the faithfulness of model-generated reasoning. Continued improvements may lead to reasoning that enables us to verify the correctness and safety of LLM behavior.

Efficient Prediction of Peptide Self-assembly through Sequential and Graphical Encoding

  • paper_url: http://arxiv.org/abs/2307.09169
  • repo_url: None
  • paper_authors: Zihan Liu, Jiaqi Wang, Yun Luo, Shuang Zhao, Wenbin Li, Stan Z. Li
  • for: 这篇论文旨在探讨深度学习如何应用于蛋白质自组装预测,以提高预测精度。
  • methods: 本研究使用了现代深度学习模型,包括RNN、LSTM、Transformer和GCN、GAT、GraphSAGE等,进行了系统性的检查,以探讨蛋白质编码的影响。
  • results: 研究发现,Transformer模型是最有力的序列编码基于深度学习模型,可以预测蛋白质自组装的精度。此外,研究还发现了不同的蛋白质编码方法对预测精度的影响。
    Abstract In recent years, there has been an explosion of research on the application of deep learning to the prediction of various peptide properties, due to the significant development and market potential of peptides. Molecular dynamics has enabled the efficient collection of large peptide datasets, providing reliable training data for deep learning. However, the lack of systematic analysis of the peptide encoding, which is essential for AI-assisted peptide-related tasks, makes it an urgent problem to be solved for the improvement of prediction accuracy. To address this issue, we first collect a high-quality, colossal simulation dataset of peptide self-assembly containing over 62,000 samples generated by coarse-grained molecular dynamics (CGMD). Then, we systematically investigate the effect of peptide encoding of amino acids into sequences and molecular graphs using state-of-the-art sequential (i.e., RNN, LSTM, and Transformer) and structural deep learning models (i.e., GCN, GAT, and GraphSAGE), on the accuracy of peptide self-assembly prediction, an essential physiochemical process prior to any peptide-related applications. Extensive benchmarking studies have proven Transformer to be the most powerful sequence-encoding-based deep learning model, pushing the limit of peptide self-assembly prediction to decapeptides. In summary, this work provides a comprehensive benchmark analysis of peptide encoding with advanced deep learning models, serving as a guide for a wide range of peptide-related predictions such as isoelectric points, hydration free energy, etc.
    摘要 在最近几年,因为蛋白质的发展和市场潜力而增加了对深度学习应用于蛋白质性质预测的研究,特别是蛋白质自组合的预测。分子动力学技术为蛋白质自组合预测提供了可靠的训练数据,但是蛋白质编码问题的缺乏系统性分析,使得预测精度的提高成为了紧迫的问题。为解决这问题,我们首先收集了高质量的大型蛋白质自组合 simulated annealing 数据集,包含62,000个样本,并使用现状最佳的序列(如 RNN、LSTM 和 Transformer)和结构深度学习模型(如 GCN、GAT 和 GraphSAGE)进行系统性分析,以 investigate the effect of peptide encoding of amino acids into sequences and molecular graphs on the accuracy of peptide self-assembly prediction. 经过广泛的比较研究,我们发现Transformer是最强的序列编码基于深度学习模型,可以为蛋白质自组合预测提供最高的精度,并且可以推动蛋白质自组合预测至十个氨基酸。总之,这项研究提供了深度学习模型的全面性分析,可以作为蛋白质相关预测,如离子点、� hydration free energy 等的指南。
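
The sketch below shows the kind of sequence-encoding pipeline compared in the benchmark: amino acids are tokenized, embedded, passed through a small Transformer encoder, and pooled into a binary self-assembly logit. The vocabulary handling, model sizes, and toy peptide batch are illustrative assumptions, not the paper's architecture or training setup.

```python
# Minimal PyTorch sketch of sequence encoding for peptide self-assembly
# prediction: tokens -> embedding -> Transformer encoder -> mean pool -> logit.
# Sizes and the toy batch are illustrative assumptions, not the paper's setup.
import torch
import torch.nn as nn

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = 0
TOK = {a: i + 1 for i, a in enumerate(AMINO_ACIDS)}   # 0 is reserved for padding

def encode(seqs, max_len=10):
    ids = [[TOK[a] for a in s][:max_len] + [PAD] * (max_len - len(s)) for s in seqs]
    return torch.tensor(ids)

class PeptideTransformer(nn.Module):
    def __init__(self, d_model=64, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(len(AMINO_ACIDS) + 1, d_model, padding_idx=PAD)
        layer = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward=128,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, 1)

    def forward(self, tokens):
        pad_mask = tokens == PAD
        h = self.encoder(self.embed(tokens), src_key_padding_mask=pad_mask)
        h = h.masked_fill(pad_mask.unsqueeze(-1), 0.0)
        pooled = h.sum(dim=1) / (~pad_mask).sum(dim=1, keepdim=True)
        return self.head(pooled).squeeze(-1)          # self-assembly logit

model = PeptideTransformer()
batch = encode(["FF", "KLVFFAE", "GGGGG"])            # toy peptide inputs
logits = model(batch)
print(logits.shape, torch.sigmoid(logits))
```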

Multi-Objective Optimization of Performance and Interpretability of Tabular Supervised Machine Learning Models

  • paper_url: http://arxiv.org/abs/2307.08175
  • repo_url: https://github.com/slds-lmu/paper_2023_eagga
  • paper_authors: Lennart Schneider, Bernd Bischl, Janek Thomas
  • for: 提高超参数优化和解释性之间的协同优化,以提高表格数据上的预测性能和解释性。
  • methods: 利用多目标优化问题的方法,将超参数优化和解释性之间的质量考虑为一个单一的优化问题,并通过增加特征选择、交互和 monotonicity 约束来扩展学习算法的搜索空间。
  • results: 在 benchmark 实验中,提出了一种新的进化算法,可以高效地在扩展的搜索空间上进行优化,并在表格数据上提高了性能和解释性的模型。
    Abstract We present a model-agnostic framework for jointly optimizing the predictive performance and interpretability of supervised machine learning models for tabular data. Interpretability is quantified via three measures: feature sparsity, interaction sparsity of features, and sparsity of non-monotone feature effects. By treating hyperparameter optimization of a machine learning algorithm as a multi-objective optimization problem, our framework allows for generating diverse models that trade off high performance and ease of interpretability in a single optimization run. Efficient optimization is achieved via augmentation of the search space of the learning algorithm by incorporating feature selection, interaction and monotonicity constraints into the hyperparameter search space. We demonstrate that the optimization problem effectively translates to finding the Pareto optimal set of groups of selected features that are allowed to interact in a model, along with finding their optimal monotonicity constraints and optimal hyperparameters of the learning algorithm itself. We then introduce a novel evolutionary algorithm that can operate efficiently on this augmented search space. In benchmark experiments, we show that our framework is capable of finding diverse models that are highly competitive or outperform state-of-the-art XGBoost or Explainable Boosting Machine models, both with respect to performance and interpretability.
    摘要 我们提出了一个模型不偏向的框架,用于同时优化supervised机器学习模型的预测性能和可解性。可解性是通过三个度量来衡量:特征稀缺、特征之间的互动稀缺和非升序特征效应的稀缺。我们将hyperparameter优化问题定义为多目标优化问题,以便在单一优化运行中生成兼顾高性能和易于理解的模型。我们通过将特征选择、互动和升序约束添加到学习算法的搜索空间中来实现高效的优化。我们示出,优化问题实际上是找到允许互动的分组选择的Pareto优化集,以及这些分组中每个特征的最佳升序约束和学习算法的优化参数。然后,我们引入了一种新的进化算法,可以高效地在这个扩展的搜索空间上运行。在测试中,我们发现我们的框架能够找到高竞争力或超越当前XGBoost或Explainable Boosting Machine模型, both with respect to performance and interpretability。
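
To make the multi-objective view concrete, the small sketch below extracts the Pareto-optimal set over (accuracy, number of used features) from a list of hypothetical candidate models, which is the kind of trade-off set a single run of the framework is meant to return. The candidates are made up, and the real search space additionally includes interaction and monotonicity constraints explored by an evolutionary algorithm.

```python
# Small sketch of the multi-objective view: keep candidates that are not
# dominated in (accuracy higher-is-better, feature count lower-is-better).
# The candidate list is made up for illustration.
def pareto_front(candidates):
    """candidates: list of (name, accuracy, n_features). Keep non-dominated ones."""
    front = []
    for name, acc, k in candidates:
        dominated = any(acc2 >= acc and k2 <= k and (acc2 > acc or k2 < k)
                        for _, acc2, k2 in candidates)
        if not dominated:
            front.append((name, acc, k))
    return sorted(front, key=lambda c: c[2])

candidates = [
    ("all_features",      0.871, 30),
    ("top10_monotone",    0.864, 10),
    ("top10_free",        0.862, 10),   # dominated by top10_monotone
    ("top3_main_effects", 0.842,  3),
    ("random5",           0.790,  5),   # dominated by top3_main_effects
]
for name, acc, k in pareto_front(candidates):
    print(f"{name:<18} accuracy={acc:.3f}  features={k}")
```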

Discovering User Types: Mapping User Traits by Task-Specific Behaviors in Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.08169
  • repo_url: None
  • paper_authors: L. L. Ankile, B. S. Ham, K. Mao, E. Shin, S. Swaroop, F. Doshi-Velez, W. Pan
  • for: 在强化学习(RL)辅助人类用户的场景中,用RL代理来刻画用户行为,并研究称为"用户特质"的关键参数,以指导干预设计。
  • methods: 将用户表示为RL代理,研究用户行为(策略类别)与用户特质之间的关系,并针对给定环境提出一种直观工具,用于研究"用户类型"(导致相同行为的宽泛特质集合)的划分。
  • results: 发现看似不同的真实环境可以产生相同的用户类型集合,并将这一观察形式化为定义在环境上的等价关系;通过在同一等价类内的环境之间迁移干预设计,可以帮助快速实现干预的个性化。
    Abstract When assisting human users in reinforcement learning (RL), we can represent users as RL agents and study key parameters, called \emph{user traits}, to inform intervention design. We study the relationship between user behaviors (policy classes) and user traits. Given an environment, we introduce an intuitive tool for studying the breakdown of "user types": broad sets of traits that result in the same behavior. We show that seemingly different real-world environments admit the same set of user types and formalize this observation as an equivalence relation defined on environments. By transferring intervention design between environments within the same equivalence class, we can help rapidly personalize interventions.
    摘要 在强化学习(RL)中辅助人类用户时,我们可以将用户表示为RL代理,并研究称为"用户特质"的关键参数,以指导干预设计。我们研究用户行为(策略类别)与用户特质之间的关系。给定一个环境,我们引入一种直观的工具来研究"用户类型"的划分:即导致相同行为的宽泛特质集合。我们发现看似不同的真实环境可以拥有同一组用户类型,并将这一观察形式化为定义在环境上的等价关系。通过在同一等价类中的环境之间传递干预设计,我们可以帮助快速实现干预的个性化。

Integer Factorisation, Fermat & Machine Learning on a Classical Computer

  • paper_url: http://arxiv.org/abs/2308.12290
  • repo_url: None
  • paper_authors: Sam Blake
  • for: 该论文提出了一种基于深度学习的整数分解算法。
  • methods: 该算法使用Lawrence对费马分解算法的扩展,将整数分解问题转化为二分类问题,并使用大量合成数据进行训练。
  • results: 该论文介绍了算法并总结了一些实验,分析了这些实验的不足之处,同时呼吁其他研究人员复现、验证并改进这种方法,以确定其能否成为实用、可扩展的分解算法。
    Abstract In this paper we describe a deep learning--based probabilistic algorithm for integer factorisation. We use Lawrence's extension of Fermat's factorisation algorithm to reduce the integer factorisation problem to a binary classification problem. To address the classification problem, based on the ease of generating large pseudo--random primes, a corpus of training data, as large as needed, is synthetically generated. We will introduce the algorithm, summarise some experiments, analyse where these experiments fall short, and finally put out a call to others to reproduce, verify and see if this approach can be improved to a point where it becomes a practical, scalable factorisation algorithm.
    摘要 在这篇论文中,我们描述了一种深度学习基于概率算法的整数分解方法。我们使用劳伦斯扩展的费马分解算法将整数分解问题转化为二分类问题。为解决这个分类问题,我们使用大量生成的假随机 prime 数据集进行训练。我们将算法介绍、summarize一些实验结果、分析实验的缺陷,最后呼吁其他人重现、验证以及提高这种方法,以使其成为实用、可扩展的分解算法。
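
For reference, the snippet below implements the classical Fermat factorisation step that the approach builds on: search for $a$ such that $a^2 - N$ is a perfect square. This is only the textbook algorithm, not the paper's Lawrence extension or its neural classifier.

```python
# Classical Fermat factorisation (textbook version only, not the paper's
# Lawrence extension or its neural classifier): for odd N, search for
# a >= ceil(sqrt(N)) such that a*a - N is a perfect square b*b; then
# N = (a - b)(a + b).
import math

def fermat_factor(n):
    assert n % 2 == 1 and n > 1, "Fermat's method expects an odd integer > 1"
    a = math.isqrt(n)
    if a * a < n:
        a += 1
    while True:
        b2 = a * a - n
        b = math.isqrt(b2)
        if b * b == b2:
            return a - b, a + b        # may return (1, n) if n is prime
        a += 1

# Works quickly when the two factors are close together; it degrades badly
# when they are far apart.
print(fermat_factor(5959))             # 59 * 101
print(fermat_factor(10007 * 10009))    # twin primes, found immediately
```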

Feedback is All You Need: Real-World Reinforcement Learning with Approximate Physics-Based Models

  • paper_url: http://arxiv.org/abs/2307.08168
  • repo_url: None
  • paper_authors: Tyler Westenbroek, Jacob Levy, David Fridovich-Keil
  • for: 本研究旨在开发高效可靠的政策优化策略,用于基于真实世界数据的机器人学习。
  • methods: 本研究使用政策梯度方法,并系统地利用一个可能很简单的第一原理模型,以生成有限量的实际数据上的精确控制策略。
  • results: 本研究通过理论分析和硬件实验,证明了这种方法可以在几分钟的实际数据上学习精确控制策略,并且可以重新使用。
    Abstract We focus on developing efficient and reliable policy optimization strategies for robot learning with real-world data. In recent years, policy gradient methods have emerged as a promising paradigm for training control policies in simulation. However, these approaches often remain too data inefficient or unreliable to train on real robotic hardware. In this paper we introduce a novel policy gradient-based policy optimization framework which systematically leverages a (possibly highly simplified) first-principles model and enables learning precise control policies with limited amounts of real-world data. Our approach $1)$ uses the derivatives of the model to produce sample-efficient estimates of the policy gradient and $2)$ uses the model to design a low-level tracking controller, which is embedded in the policy class. Theoretical analysis provides insight into how the presence of this feedback controller addresses overcomes key limitations of stand-alone policy gradient methods, while hardware experiments with a small car and quadruped demonstrate that our approach can learn precise control strategies reliably and with only minutes of real-world data.
    摘要 我们专注于为使用真实世界数据的机器人学习开发高效且可靠的策略优化方法。近年来,策略梯度方法已成为在仿真中训练控制策略的一种有前景的范式,但这些方法在真实机器人硬件上训练时往往数据效率过低或不够可靠。本文提出了一种新的基于策略梯度的策略优化框架,它系统地利用一个(可能高度简化的)第一性原理模型,从而能够在少量真实数据下学习精确的控制策略。我们的方法:1)利用模型的导数得到样本高效的策略梯度估计;2)利用模型设计一个嵌入在策略类中的低层跟踪控制器。理论分析揭示了该反馈控制器如何克服单独使用策略梯度方法的关键局限,而在小车和四足机器人上的硬件实验表明,我们的方法仅需数分钟的真实数据即可可靠地学习精确的控制策略。

Computing the gradients with respect to all parameters of a quantum neural network using a single circuit

  • paper_url: http://arxiv.org/abs/2307.08167
  • repo_url: https://github.com/gphehub/grad2210
  • paper_authors: Guang Ping He
  • for: With the parameter-shift rule, computing the gradient of a quantum neural network requires two cost-function evaluations per adjustable parameter, so when the total number of parameters is high the quantum circuit has to be adjusted and run many times.
  • methods: This paper proposes an approach that computes all the gradients using only a single circuit, with a much reduced circuit depth and fewer classical registers (the conventional per-parameter cost is sketched after this entry).
  • results: Experiments on both real quantum hardware and a simulator show that the circuit compiles in significantly less time than with the conventional approach, yielding a speedup on the total runtime.
    Abstract When computing the gradients of a quantum neural network using the parameter-shift rule, the cost function needs to be calculated twice for the gradient with respect to a single adjustable parameter of the network. When the total number of parameters is high, the quantum circuit for the computation has to be adjusted and run many times. Here we propose an approach to compute all the gradients using a single circuit only, with a much reduced circuit depth and fewer classical registers. We also demonstrate experimentally, on both real quantum hardware and simulator, that our approach has the advantage that the circuit takes a significantly shorter time to compile than the conventional approach, resulting in a speedup on the total runtime.
    摘要 当计算量子神经网络中参数的梯度使用参数变化规则时,需要计算两次函数值以计算单个可变参数的梯度。当总参数数量较高时,量子电路需要调整并运行多次。我们提出了一种方法,可以通过单个电路计算所有梯度,减少电路深度和经典寄存器数量。我们还在实际中进行了实验,在真实的量子硬件和模拟器上证明了我们的方法可以减少电路编译时间,从而提高总时间的速度。
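
For context, a minimal sketch of the conventional parameter-shift rule the abstract refers to, assuming gates with Pauli-type generators and a generic `cost(params)` callable that runs the circuit and returns an expectation value. The two evaluations per parameter shown here are exactly the overhead the paper's single-circuit approach is designed to avoid.

```python
import numpy as np

def parameter_shift_gradient(cost, params, shift=np.pi / 2):
    """Conventional parameter-shift rule: two cost evaluations per parameter.

    `cost` is assumed to be a callable that executes the quantum circuit and
    returns an expectation value; 2 * len(params) circuit runs are needed.
    """
    params = np.asarray(params, dtype=float)
    grad = np.zeros_like(params)
    for i in range(len(params)):
        plus, minus = params.copy(), params.copy()
        plus[i] += shift
        minus[i] -= shift
        grad[i] = (cost(plus) - cost(minus)) / 2.0
    return grad
```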

Neural Stream Functions

  • paper_url: http://arxiv.org/abs/2307.08142
  • repo_url: https://github.com/skywolf829/neuralstreamfunction
  • paper_authors: Skylar Wolfgang Wurster, Hanqi Guo, Tom Peterka, Han-Wei Shen
  • for: This paper presents a neural network approach for computing stream functions, scalar functions whose gradients are orthogonal to a given vector field, so that isosurfaces of the stream function extract stream surfaces for analyzing flow features.
  • methods: An implicit neural representation takes a vector field as input data and learns to map coordinates to stream-function values by minimizing the inner product between the gradient of the network's output and the vector field (a loss sketch follows this entry); optional regularizing losses select particular stream functions of interest.
  • results: The method produces high-quality stream-function solutions, and different regularizing losses yield different solutions, e.g. stream surfaces that follow the flow field's curvature or that pass through a seeding rake; the paper also discusses how to properly visualize the trained network and extract artifact-free surfaces.
    Abstract We present a neural network approach to compute stream functions, which are scalar functions with gradients orthogonal to a given vector field. As a result, isosurfaces of the stream function extract stream surfaces, which can be visualized to analyze flow features. Our approach takes a vector field as input and trains an implicit neural representation to learn a stream function for that vector field. The network learns to map input coordinates to a stream function value by minimizing the inner product of the gradient of the neural network's output and the vector field. Since stream function solutions may not be unique, we give optional constraints for the network to learn particular stream functions of interest. Specifically, we introduce regularizing loss functions that can optionally be used to generate stream function solutions whose stream surfaces follow the flow field's curvature, or that can learn a stream function that includes a stream surface passing through a seeding rake. We also discuss considerations for properly visualizing the trained implicit network and extracting artifact-free surfaces. We compare our results with other implicit solutions and present qualitative and quantitative results for several synthetic and simulated vector fields.
    摘要 我们提出了一种神经网络方法来计算流函数,这些函数的梯度与给定的vector场垂直。因此,iso面流函数提取流面,可以用于分析流体特征。我们的方法通过输入vector场来训练一个隐式神经表示,以学习一个流函数。神经网络将输入坐标映射到流函数值上,通过神经网络输出的梯度和vector场的内积来进行折叠。由于流函数解可能不唯一,我们可以选择ally加入regularizing loss函数,以学习特定的流函数解。例如,我们可以添加一个束制约损失函数,使流函数解的流面与流体场的弯曲性相符,或者学习一个流函数解,其中流面通过种子托铁 passing through。我们还讨论了对训练完成后的隐式神经表示进行正确的visual化和提取 artifact-free的流面。我们与其他隐式解相比较,并对一些synthetic和simulated vector fields进行了质量和量化的结果。
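
A hedged PyTorch sketch of the inner-product training objective described above: the gradient of the predicted stream function should be orthogonal to the given vector field at each sampled coordinate. The squared penalty, shapes, and function names are illustrative assumptions, not the paper's implementation.

```python
import torch

def stream_function_loss(model, coords, vector_field):
    """`model` maps (N, 3) coordinates to (N, 1) stream-function values;
    `vector_field` holds the (N, 3) flow vectors at those coordinates."""
    coords = coords.clone().requires_grad_(True)
    psi = model(coords)                                # (N, 1) predictions
    grad_psi = torch.autograd.grad(
        psi, coords, grad_outputs=torch.ones_like(psi), create_graph=True
    )[0]                                               # (N, 3) output gradient
    inner = (grad_psi * vector_field).sum(dim=-1)      # per-point inner product
    return (inner ** 2).mean()                         # drive it toward zero
```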

DynamicFL: Balancing Communication Dynamics and Client Manipulation for Federated Learning

  • paper_url: http://arxiv.org/abs/2308.06267
  • repo_url: None
  • paper_authors: Bocheng Chen, Nikolay Ivanov, Guangjing Wang, Qiben Yan
  • for: This paper proposes a new federated learning (FL) framework to address the high system heterogeneity that arises in time-sensitive FL scenarios.
  • methods: A specially designed client manipulation strategy selects clients for model updating based on predicted network conditions and training-data quality (a toy scoring sketch follows this entry); a long-term greedy selection strategy counters the performance degradation caused by short-term scheduling in dynamic networks, and the length of the observation window is adjusted dynamically to balance client evaluation against manipulation granularity.
  • results: Compared with state-of-the-art client selection schemes, the method achieves better model accuracy while consuming only 18.9%-84.0% of the wall-clock time; component-wise and sensitivity studies further demonstrate its robustness in various real-life scenarios.
    Abstract Federated Learning (FL) is a distributed machine learning (ML) paradigm, aiming to train a global model by exploiting the decentralized data across millions of edge devices. Compared with centralized learning, FL preserves the clients' privacy by refraining from explicitly downloading their data. However, given the geo-distributed edge devices (e.g., mobile, car, train, or subway) with highly dynamic networks in the wild, aggregating all the model updates from those participating devices will result in inevitable long-tail delays in FL. This will significantly degrade the efficiency of the training process. To resolve the high system heterogeneity in time-sensitive FL scenarios, we propose a novel FL framework, DynamicFL, by considering the communication dynamics and data quality across massive edge devices with a specially designed client manipulation strategy. DynamicFL actively selects clients for model updating based on the network prediction from its dynamic network conditions and the quality of its training data. Additionally, our long-term greedy strategy in client selection tackles the problem of system performance degradation caused by short-term scheduling in a dynamic network. Lastly, to balance the trade-off between client performance evaluation and client manipulation granularity, we dynamically adjust the length of the observation window in the training process to optimize the long-term system efficiency. Compared with the state-of-the-art client selection scheme in FL, DynamicFL can achieve a better model accuracy while consuming only 18.9%-84.0% of the wall-clock time. Our component-wise and sensitivity studies further demonstrate the robustness of DynamicFL under various real-life scenarios.
    摘要 联合学习(FL)是一种分布式机器学习(ML)模式,旨在透过分散在数百万副本设备(例如移动设备、车辆、火车、或地铁)上的分散数据,训练全球模型。相比中央学习,FL 保持客户端隐私,不直接下载客户端数据。然而,在野外的分散式边缘设备上,因为高度动态的网络环境,聚合所有模型更新从参与设备会带来不可预测的长尾延迟。这将严重损害训练过程的效率。为解决高度系统多样性的时间敏感FL情况下,我们提出了一个新的FL框架,即动态FL,通过考虑网络预测和训练数据质量,对于大量边缘设备进行特殊设计的客户端操作策略。我们在选择客户端进行模型更新时,会根据其网络条件预测和训练数据质量进行选择。此外,我们还使用了长期追击策略,以解决因为短期调度而导致的系统性能下降。最后,为寻求训练过程中的平衡,我们在训练过程中动态调整观察窗口的长度,以便最佳化系统效率。相比之前的客户端选择方案,我们的方案可以获得更好的模型精度,并且只需消耗18.9%-84.0%的壁网时间。我们的组件实验和敏感性研究显示了我们的方案在实际情况下的可持续性。
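
A toy sketch of greedy client selection by a combined network and data-quality score, in the spirit of the client manipulation strategy described above. The field names, weighting, and scoring rule are illustrative assumptions, not DynamicFL's actual criteria.

```python
def select_clients(clients, num_selected, alpha=0.5):
    """Rank clients by a weighted trade-off between training-data quality
    (higher is better) and predicted network latency (lower is better)."""
    def score(c):
        return alpha * c["data_quality"] - (1 - alpha) * c["predicted_latency"]
    ranked = sorted(clients, key=score, reverse=True)
    return ranked[:num_selected]

# Hypothetical usage:
clients = [{"id": 0, "predicted_latency": 0.4, "data_quality": 0.90},
           {"id": 1, "predicted_latency": 2.1, "data_quality": 0.95},
           {"id": 2, "predicted_latency": 0.2, "data_quality": 0.50}]
chosen = select_clients(clients, num_selected=2)
```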

Heterogeneous graphs model spatial relationships between biological entities for breast cancer diagnosis

  • paper_url: http://arxiv.org/abs/2307.08132
  • repo_url: None
  • paper_authors: Akhila Krishna K, Ravi Kant Gupta, Nikhil Cherian Kurian, Pranav Jeevan, Amit Sethi
  • for: This paper addresses the challenges that the heterogeneity of breast cancer poses for early detection, prognosis, and treatment selection.
  • methods: A heterogeneous graph neural network (GNN) models histopathological images as cell and tissue graphs, capturing the spatial and hierarchical relationships between these biological entities; a cross-attention-based network and a transformer architecture are also compared for modeling these relationships.
  • results: On three publicly available breast cancer datasets (BRIGHT, BreakHis, and BACH), the model achieves higher accuracy than the transformer-based state-of-the-art approach while using fewer parameters.
    Abstract The heterogeneity of breast cancer presents considerable challenges for its early detection, prognosis, and treatment selection. Convolutional neural networks often neglect the spatial relationships within histopathological images, which can limit their accuracy. Graph neural networks (GNNs) offer a promising solution by coding the spatial relationships within images. Prior studies have investigated the modeling of histopathological images as cell and tissue graphs, but they have not fully tapped into the potential of extracting interrelationships between these biological entities. In this paper, we present a novel approach using a heterogeneous GNN that captures the spatial and hierarchical relations between cell and tissue graphs to enhance the extraction of useful information from histopathological images. We also compare the performance of a cross-attention-based network and a transformer architecture for modeling the intricate relationships within tissue and cell graphs. Our model demonstrates superior efficiency in terms of parameter count and achieves higher accuracy compared to the transformer-based state-of-the-art approach on three publicly available breast cancer datasets -- BRIGHT, BreakHis, and BACH.
    摘要 breast cancer 的多样性呈现出较大的检测早期、诊断和治疗选择的挑战。 convolutional neural networks 经常忽略图像中的空间关系,这可能会限制其精度。 graph neural networks (GNNs)提供了一个有前途的解决方案,通过编码图像中的空间关系。先前的研究已经研究了模型 histopathological 图像为细胞和组织图像,但它们没有充分利用了提取这些生物体系间的关系的潜在。在本文中,我们提出了一种新的方法,使用多样性 GNN 捕捉图像中的空间和层次关系,以提高对 histopathological 图像的EXTRACT 有用信息。我们还对 cross-attention 网络和 transformer 架构进行比较,以模型图像中的复杂关系。我们的模型在三个公共可用的 breast cancer 数据集(BRIGHT、BreakHis 和 BACH)上达到了更高的准确率,并且在参数计数方面表现出了更高的效率,比较 transformer 基于 state-of-the-art 方法。

INFLECT-DGNN: Influencer Prediction with Dynamic Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2307.08131
  • repo_url: https://github.com/banking-analytics-lab/inflect
  • paper_authors: Elena Tiukhova, Emiliano Penaloza, María Óskarsdóttir, Bart Baesens, Monique Snoeck, Cristián Bravo
  • for: This work aims to improve influencer prediction by integrating dynamic graph neural networks with recurrent neural networks.
  • methods: The proposed INFLECT-DGNN framework combines graph neural networks (GNN) and recurrent neural networks (RNN) with weighted loss functions, the Synthetic Minority Oversampling TEchnique (SMOTE) adapted for graph data, and a carefully crafted rolling-window strategy (a toy rolling-window sketch follows this entry).
  • results: Encoding temporal attributes with the RNN alongside the GNN significantly improves predictive performance; comparisons across models demonstrate the importance of capturing graph representation, temporal dependencies, and using a profit-driven methodology for evaluation.
    Abstract Leveraging network information for predictive modeling has become widespread in many domains. Within the realm of referral and targeted marketing, influencer detection stands out as an area that could greatly benefit from the incorporation of dynamic network representation due to the ongoing development of customer-brand relationships. To elaborate this idea, we introduce INFLECT-DGNN, a new framework for INFLuencer prEdiCTion with Dynamic Graph Neural Networks that combines Graph Neural Networks (GNN) and Recurrent Neural Networks (RNN) with weighted loss functions, the Synthetic Minority Oversampling TEchnique (SMOTE) adapted for graph data, and a carefully crafted rolling-window strategy. To evaluate predictive performance, we utilize a unique corporate data set with networks of three cities and derive a profit-driven evaluation methodology for influencer prediction. Our results show how using RNN to encode temporal attributes alongside GNNs significantly improves predictive performance. We compare the results of various models to demonstrate the importance of capturing graph representation, temporal dependencies, and using a profit-driven methodology for evaluation.
    摘要 利用网络信息进行预测模型已经在多个领域广泛应用。在推荐和目标营销领域中,Influencer detection stands out as an area that could greatly benefit from the incorporation of dynamic network representation due to the ongoing development of customer-brand relationships。为了开发这个想法,我们介绍了 INFLECT-DGNN,一个新的框架 для INFLuencer prEdiCTion with Dynamic Graph Neural Networks,该框架结合图 neural network (GNN) 和回归神经网络 (RNN),并使用负权重函数、Synthetic Minority Oversampling TEchnique (SMOTE) adapted for graph data,以及一种精心制定的滚动窗口策略。为了评估预测性能,我们使用了一个独特的企业数据集,并 derivated a profit-driven evaluation methodology for influencer prediction。我们的结果表明,使用 RNN 来编码时间特征 alongside GNNs 可以显著提高预测性能。我们对各种模型进行比较,以示出捕捉图表示、时间依赖和使用财务驱动的评估方法的重要性。
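
A minimal sketch of a rolling-window split over time-ordered graph snapshots, of the general kind mentioned in the methods bullet; the window and horizon semantics are illustrative assumptions rather than INFLECT-DGNN's exact scheme.

```python
def rolling_windows(snapshots, window, horizon=1):
    """Yield (input_window, target_snapshot) pairs from a time-ordered list of
    graph snapshots, e.g. to feed an RNN over per-window GNN embeddings."""
    for t in range(len(snapshots) - window - horizon + 1):
        yield snapshots[t:t + window], snapshots[t + window + horizon - 1]

# e.g. with 12 monthly snapshots, a 6-month window and 1-month horizon:
# list(rolling_windows(list(range(12)), window=6)) -> 6 (inputs, target) pairs
```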

Tangent Transformers for Composition, Privacy and Removal

  • paper_url: http://arxiv.org/abs/2307.08122
  • repo_url: None
  • paper_authors: Tian Yu Liu, Aditya Golatkar, Stefano Soatto
  • for: This paper introduces Tangent Attention Fine-Tuning (TAFT), a method for fine-tuning linearized transformers.
  • methods: The transformer is linearized via a first-order Taylor expansion around a pre-trained initialization, and the resulting Jacobian-vector product is computed in a single forward pass, so training and inference cost stay on the same order as the original non-linear network while using the same number of parameters (a minimal linearization sketch follows this entry).
  • results: On various downstream visual classification tasks, Tangent Transformers fine-tuned with TAFT perform comparably to fine-tuning the original non-linear network; since the model is linear in the new weights and the fine-tuning loss is convex, TAFT also offers advantages for model composition, parallel training, machine unlearning, and differential privacy.
    Abstract We introduce Tangent Attention Fine-Tuning (TAFT), a method for fine-tuning linearized transformers obtained by computing a First-order Taylor Expansion around a pre-trained initialization. We show that the Jacobian-Vector Product resulting from linearization can be computed efficiently in a single forward pass, reducing training and inference cost to the same order of magnitude as its original non-linear counterpart, while using the same number of parameters. Furthermore, we show that, when applied to various downstream visual classification tasks, the resulting Tangent Transformer fine-tuned with TAFT can perform comparably with fine-tuning the original non-linear network. Since Tangent Transformers are linear with respect to the new set of weights, and the resulting fine-tuning loss is convex, we show that TAFT enjoys several advantages compared to non-linear fine-tuning when it comes to model composition, parallel training, machine unlearning, and differential privacy.
    摘要 我们介绍 Tangent Attention Fine-Tuning(TAFT),一种精简 Linearized Transformers 的方法,通过计算首项泰利扩展来初始化预训练。我们证明了 Jacobian-Vector Product 的计算可以在单一前进中进行高效地进行,因此训练和测试成本与原始非线性网络相同的阶层,同时使用相同的参数数量。此外,我们显示了在不同的下游视觉分类任务中,使用 TAFT 精简 Tangent Transformer 的 fine-tuning 可以与原始非线性网络的 fine-tuning 相比。因为 Tangent Transformers 是对新的参数集线性的,并且 fine-tuning 的损失函数是凸函数,我们显示了 TAFT 在模型结构、平行训练、机器学习推广和数据隐私方面具有多个优点。
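
A minimal JAX sketch of the linearization idea: the tangent model is the first-order Taylor expansion of the network around its pre-trained weights, and the Jacobian-vector product is obtained in a single forward-mode pass. The toy apply function and parameter pytree are illustrative assumptions, not the paper's code.

```python
import jax
import jax.numpy as jnp

def tangent_forward(f, params0, delta, x):
    """f_lin(x; params0 + delta) = f(x; params0) + J_params f(x; params0) @ delta,
    where the JVP is computed with a single forward-mode pass (jax.jvp)."""
    out, jvp_out = jax.jvp(lambda p: f(p, x), (params0,), (delta,))
    return out + jvp_out

# Toy usage with a hypothetical one-layer "network":
f = lambda p, x: jnp.tanh(x @ p["w"] + p["b"])
params0 = {"w": jnp.ones((3, 2)), "b": jnp.zeros(2)}       # pre-trained point
delta = {"w": 0.01 * jnp.ones((3, 2)), "b": jnp.zeros(2)}  # new trainable offset
x = jnp.ones((4, 3))
y_lin = tangent_forward(f, params0, delta, x)
```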

Domain Generalisation with Bidirectional Encoder Representations from Vision Transformers

  • paper_url: http://arxiv.org/abs/2307.08117
  • repo_url: https://github.com/sw-packages/d23c4b6afa05094a23071333bd230aceceec08117355003f5c0ea958e60c9c98
  • paper_authors: Hamza Riaz, Alan F. Smeaton
  • for: This paper targets domain generalisation, i.e. pooling knowledge from source domains into a single deep model that generalises to unseen target domains.
  • methods: Four vision transformer architectures, ViT, LeViT, DeiT, and BEIT, are first evaluated on out-of-distribution (OOD) data; BEIT, the best performer, is then used for further experiments.
  • results: With the BEIT architecture, domain generalisation improves significantly, with strong validation and test accuracy on the PACS, Home-Office, and DomainNet benchmarks, and the implementation substantially narrows the gap between within-distribution and OOD data.
    Abstract Domain generalisation involves pooling knowledge from source domain(s) into a single model that can generalise to unseen target domain(s). Recent research in domain generalisation has faced challenges when using deep learning models as they interact with data distributions which differ from those they are trained on. Here we perform domain generalisation on out-of-distribution (OOD) vision benchmarks using vision transformers. Initially we examine four vision transformer architectures namely ViT, LeViT, DeiT, and BEIT on out-of-distribution data. As the bidirectional encoder representation from image transformers (BEIT) architecture performs best, we use it in further experiments on three benchmarks PACS, Home-Office and DomainNet. Our results show significant improvements in validation and test accuracy and our implementation significantly overcomes gaps between within-distribution and OOD data.
    摘要 域名总结是将多个源域的知识汇集到一个可以总结到未看到的目标域的模型中。近期在域名总结中使用深度学习模型时,面临了与训练数据分布不同的数据分布相互作用的挑战。我们在out-of-distribution(OOD)视觉审核中进行域名总结,初步分析了四种视觉变换器架构,即ViT、LeViT、DeiT和BEIT。其中, bidirectional encoder representation from image transformers(BEIT)架构表现最佳,因此我们在三个审核标准 benchmark(PACS、Home-Office和DomainNet)上进行了进一步的实验。我们的结果表明,使用BEIT架构可以在验证和测试精度上实现显著改进,并且我们的实现可以弥补在 dentro-distribution和OOD数据之间的差距。

Tangent Model Composition for Ensembling and Continual Fine-tuning

  • paper_url: http://arxiv.org/abs/2307.08114
  • repo_url: None
  • paper_authors: Tian Yu Liu, Stefano Soatto
  • for: This method composes independently fine-tuned models to support incremental learning, ensembling, or unlearning.
  • methods: Tangent Model Composition (TMC) treats component models as tangent vectors to a pre-trained model that can be added, scaled, or subtracted, and composes them at inference time via scalar combination, reducing the cost of ensembling to that of a single model (a toy combination sketch follows this entry).
  • results: TMC improves accuracy by 4.2% over ensembling non-linearly fine-tuned models at a 2.5x to 10x reduction in inference cost; each component model can be forgotten at zero cost with no residual effect, and TMC outperforms recently published continual fine-tuning methods on task-, class-, and data-incremental settings without using a replay buffer.
    Abstract Tangent Model Composition (TMC) is a method to combine component models independently fine-tuned around a pre-trained point. Component models are tangent vectors to the pre-trained model that can be added, scaled, or subtracted to support incremental learning, ensembling, or unlearning. Component models are composed at inference time via scalar combination, reducing the cost of ensembling to that of a single model. TMC improves accuracy by 4.2% compared to ensembling non-linearly fine-tuned models at a 2.5x to 10x reduction of inference cost, growing linearly with the number of component models. Each component model can be forgotten at zero cost, with no residual effect on the resulting inference. When used for continual fine-tuning, TMC is not constrained by sequential bias and can be executed in parallel on federated data. TMC outperforms recently published continual fine-tuning methods almost uniformly on each setting -- task-incremental, class-incremental, and data-incremental -- on a total of 13 experiments across 3 benchmark datasets, despite not using any replay buffer. TMC is designed for composing models that are local to a pre-trained embedding, but could be extended to more general settings.
    摘要 tangent模型组合(TMC)是一种方法,可以独立地微调component模型,然后在预训练点上组合。component模型是预训练模型的 tangent вектор,可以加、乘、减以支持逐步学习、集成或忘记学习。在推理时,component模型通过scalar组合来实现,因此推理成本只是一个模型的成本。TMC提高了精度4.2%,相比 Ensemble非线性微调模型,并且在推理成本的2.5倍至10倍之间减少了1.5倍。每个component模型可以忘记于零成本,无残留效果。在用于 continual fine-tuning 时,TMC不受顺序偏见的限制,可以在 federated data 上并行执行。TMC在任务逐步、类逐步和数据逐步的13个实验中,对 reciprocal fine-tuning 方法 almost uniformly 的性能优于其他方法,即使不使用 replay buffer。TMC是针对本地预训练 embedding 的模型组合方法,可以扩展到更广泛的设置。
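
A toy sketch of scalar composition of tangent component models around a shared pre-trained point, as described above: positive weights ensemble components, and a negative weight removes ("forgets") one. Tensor-dict parameters and the `.clone()` call assume PyTorch-style state dicts; this is illustrative, not the paper's implementation.

```python
def compose_tangent_models(base_params, component_deltas, weights):
    """params = base + sum_i w_i * delta_i, where each delta_i is a component
    model expressed as an offset (tangent vector) from the pre-trained point."""
    composed = {k: v.clone() for k, v in base_params.items()}
    for delta, w in zip(component_deltas, weights):
        for k in composed:
            composed[k] = composed[k] + w * delta[k]
    return composed

# Ensembling three components equally: weights = [1/3, 1/3, 1/3]
# Dropping the second component later: set its weight to 0 (illustrative)
```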

Discovering a reaction-diffusion model for Alzheimer’s disease by combining PINNs with symbolic regression

  • paper_url: http://arxiv.org/abs/2307.08107
  • repo_url: None
  • paper_authors: Zhen Zhang, Zongren Zou, Ellen Kuhl, George Em Karniadakis
  • for: This work studies the role of misfolded tau protein in the progression and pathology of Alzheimer's disease.
  • methods: Physics-informed neural networks (PINNs) are combined with symbolic regression to discover a reaction-diffusion type partial differential equation for tau misfolding and spreading from longitudinal tau PET data in the Alzheimer's Disease Neuroimaging Initiative database (the discovered reaction terms are written out after this entry).
  • results: On tau imaging data from 46 individuals likely to develop Alzheimer's disease and 30 healthy controls, symbolic regression discovers different misfolding models for the two groups, with faster misfolding in the Alzheimer's group, suggesting that PINNs with symbolic regression can discover a reaction-diffusion type model for misfolded tau in Alzheimer's disease.
    Abstract Misfolded tau proteins play a critical role in the progression and pathology of Alzheimer's disease. Recent studies suggest that the spatio-temporal pattern of misfolded tau follows a reaction-diffusion type equation. However, the precise mathematical model and parameters that characterize the progression of misfolded protein across the brain remain incompletely understood. Here, we use deep learning and artificial intelligence to discover a mathematical model for the progression of Alzheimer's disease using longitudinal tau positron emission tomography from the Alzheimer's Disease Neuroimaging Initiative database. Specifically, we integrate physics informed neural networks (PINNs) and symbolic regression to discover a reaction-diffusion type partial differential equation for tau protein misfolding and spreading. First, we demonstrate the potential of our model and parameter discovery on synthetic data. Then, we apply our method to discover the best model and parameters to explain tau imaging data from 46 individuals who are likely to develop Alzheimer's disease and 30 healthy controls. Our symbolic regression discovers different misfolding models $f(c)$ for two groups, with a faster misfolding for the Alzheimer's group, $f(c) = 0.23c^3 - 1.34c^2 + 1.11c$, than for the healthy control group, $f(c) = -c^3 +0.62c^2 + 0.39c$. Our results suggest that PINNs, supplemented by symbolic regression, can discover a reaction-diffusion type model to explain misfolded tau protein concentrations in Alzheimer's disease. We expect our study to be the starting point for a more holistic analysis to provide image-based technologies for early diagnosis, and ideally early treatment of neurodegeneration in Alzheimer's disease and possibly other misfolding-protein based neurodegenerative disorders.
    摘要 互助蛋白质在阿尔茨海默病的发展和病理中扮演了关键角色。最新的研究表明,蛋白质的折叠发生 follows a reaction-diffusion type equation的特征。然而,正确的数学模型和参数,用于描述蛋白质的发展过程,仍然未得到完全理解。在这里,我们使用深度学习和人工智能,以发现阿尔茨海默病的数学模型。特别是,我们将物理学习神经网络(PINNs)和符号回归相结合,以找到蛋白质折叠的液态方程。首先,我们在 sintetic data 上验证了我们的模型和参数的潜力。然后,我们将我们的方法应用于46名可能发展阿尔茨海默病的个体和30名健康控制组的tau imaging数据中,以发现最佳的模型和参数,以解释蛋白质的折叠。我们的符号回归发现了两个组的不同的折叠模型,即 $f(c) = 0.23c^3 - 1.34c^2 + 1.11c$ 和 $f(c) = -c^3 + 0.62c^2 + 0.39c$。我们的结果表明,PINNs,补充符号回归,可以发现阿尔茨海默病中蛋白质折叠的液态方程。我们预计我们的研究将成为蛋白质折叠技术的开端,以提供早期诊断和治疗阿尔茨海默病和其他折叠蛋白质基因性神经退化疾病的技术。
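
For concreteness, the reaction terms reported in the abstract written inside a generic reaction-diffusion form; the diffusion term is an assumed placeholder for the transport part of the discovered PDE, which the abstract does not spell out.

```latex
\frac{\partial c}{\partial t} \;=\; \nabla \cdot \bigl( D\, \nabla c \bigr) \;+\; f(c),
\qquad
f_{\mathrm{AD}}(c) = 0.23\,c^{3} - 1.34\,c^{2} + 1.11\,c,
\qquad
f_{\mathrm{control}}(c) = -\,c^{3} + 0.62\,c^{2} + 0.39\,c .
```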

Using Decision Trees for Interpretable Supervised Clustering

  • paper_url: http://arxiv.org/abs/2307.08104
  • repo_url: https://github.com/sayantann11/all-classification-templetes-for-ML
  • paper_authors: Natallia Kokash, Leonid Makhnist
  • for: This paper addresses finding explainable clusters of class-uniform data in labelled datasets, i.e. interpretable supervised clustering.
  • methods: An iterative method extracts high-density clusters using decision-tree-based classifiers as the most intuitive learning method, and node-selection strategies are discussed to maximize the quality of the identified groups (a toy sketch of one iteration follows this entry).
  • results: The approach yields high-density clusters of a given class, each described by a set of comprehensive rules.
    Abstract In this paper, we address an issue of finding explainable clusters of class-uniform data in labelled datasets. The issue falls into the domain of interpretable supervised clustering. Unlike traditional clustering, supervised clustering aims at forming clusters of labelled data with high probability densities. We are particularly interested in finding clusters of data of a given class and describing the clusters with the set of comprehensive rules. We propose an iterative method to extract high-density clusters with the help of decision-tree-based classifiers as the most intuitive learning method, and discuss the method of node selection to maximize quality of identified groups.
    摘要 在本文中,我们讨论了一个标签数据集中找到可解释的封闭集的问题。这个问题属于可解释supervised clustering的领域。不同于传统封闭,supervised clustering寻求高概率密度的封闭,以便更好地描述数据。我们特别关注找到某个类型的数据的封闭,并使用设计树基于分类器来描述封闭。我们提出了一种迭代方法,使用决策树基于分类器来提取高密度封闭,并讨论了选择节点以提高寻索到的集的质量。
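
A hedged scikit-learn sketch of one iteration of the idea described above: fit a shallow decision tree on the labelled data, keep leaves that are dense in the target class as candidate clusters, and read off their rules. The purity/size thresholds and node-selection rule are illustrative assumptions, not the paper's exact criteria.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

def high_density_leaves(X, y, target_class=1, min_purity=0.9, min_size=20, max_depth=4):
    """Return leaves of a shallow tree that are dense in `target_class`,
    as candidate interpretable clusters."""
    y = np.asarray(y)
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0).fit(X, y)
    leaf_ids = tree.apply(X)                     # leaf index for every sample
    clusters = []
    for leaf in np.unique(leaf_ids):
        mask = leaf_ids == leaf
        purity = np.mean(y[mask] == target_class)
        if mask.sum() >= min_size and purity >= min_purity:
            clusters.append({"leaf": int(leaf), "size": int(mask.sum()),
                             "purity": float(purity)})
    print(export_text(tree))                     # human-readable rule set
    return clusters
```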

A max-affine spline approximation of neural networks using the Legendre transform of a convex-concave representation

  • paper_url: http://arxiv.org/abs/2307.09602
  • repo_url: https://github.com/adamgoodtime/legendre_net
  • paper_authors: Adam Perrett, Danny Wood, Gavin Brown
  • for: This paper presents an algorithm for transforming a neural network into a spline representation.
  • methods: Unlike prior work, the algorithm does not require convex, piecewise-affine network operators to build a max-affine spline alternate form; the only constraint is that the function be bounded with a well-defined second derivative, and experiments suggest even this is not strictly necessary.
  • results: The transformation can be performed over the whole network rather than on each layer independently, bridging neural networks and approximation theory while enabling visualisation of network feature maps; mathematical proofs and experiments report approximation error and feature maps for a range of architectures, including convolutional neural networks.
    Abstract This work presents a novel algorithm for transforming a neural network into a spline representation. Unlike previous work that required convex and piecewise-affine network operators to create a max-affine spline alternate form, this work relaxes this constraint. The only constraint is that the function be bounded and possess a well-defined second derivative, although this was shown experimentally to not be strictly necessary. It can also be performed over the whole network rather than on each layer independently. As in previous work, this bridges the gap between neural networks and approximation theory but also enables the visualisation of network feature maps. Mathematical proof and experimental investigation of the technique are performed with approximation error and feature maps being extracted from a range of architectures, including convolutional neural networks.
    摘要 这个研究提出了一种新的算法,用于将神经网络转换成spline表示形式。与前一些研究不同,这个算法不需要几何和分割的网络运算符来创建一个最大 afine spline alternate form。它只需要函数是有界的,且具有定义的二阶导数,尽管实验表明这并不是必要的。此外,这个算法还可以在整个网络上进行,而不仅是每层独立进行。与以前的研究相似,这种技术将神经网络与近似理论相连接,同时允许网络特征地图的可视化。这个研究包括数学证明和实验调查,并从多种架构,包括卷积神经网络中提取了近似误差和特征地图。

EasyTPP: Towards Open Benchmarking the Temporal Point Processes

  • paper_url: http://arxiv.org/abs/2307.08097
  • repo_url: https://github.com/ant-research/easytemporalpointprocess
  • paper_authors: Siqiao Xue, Xiaoming Shi, Zhixuan Chu, Yan Wang, Fan Zhou, Hongyan Hao, Caigao Jiang, Chen Pan, Yi Xu, James Y. Zhang, Qingsong Wen, Jun Zhou, Hongyuan Mei
  • for: This paper is written to establish a central benchmark for evaluating temporal point processes (TPPs) in order to promote reproducible research and accelerate progress in the field.
  • methods: The paper uses eight highly cited neural TPPs and integrates commonly used evaluation metrics and datasets into a standardized benchmarking pipeline. The benchmark is implemented in a universal framework that supports multiple machine learning libraries and custom implementations.
  • results: The paper delivers a comprehensive implementation of eight neural TPPs together with a standardized benchmarking pipeline for comparing methods across datasets, supporting reproducible research; the benchmark is open-sourced on GitHub (the standard TPP log-likelihood such benchmarks evaluate is written out after this entry).
    Abstract Continuous-time event sequences play a vital role in real-world domains such as healthcare, finance, online shopping, social networks, and so on. To model such data, temporal point processes (TPPs) have emerged as the most advanced generative models, making a significant impact in both academic and application communities. Despite the emergence of many powerful models in recent years, there is still no comprehensive benchmark to evaluate them. This lack of standardization impedes researchers and practitioners from comparing methods and reproducing results, potentially slowing down progress in this field. In this paper, we present EasyTPP, which aims to establish a central benchmark for evaluating TPPs. Compared to previous work that also contributed datasets, our EasyTPP has three unique contributions to the community: (i) a comprehensive implementation of eight highly cited neural TPPs with the integration of commonly used evaluation metrics and datasets; (ii) a standardized benchmarking pipeline for a transparent and thorough comparison of different methods on different datasets; (iii) a universal framework supporting multiple ML libraries (e.g., PyTorch and TensorFlow) as well as custom implementations. Our benchmark is open-sourced: all the data and implementation can be found at this Github repository: https://github.com/ant-research/EasyTemporalPointProcess. We will actively maintain this benchmark and welcome contributions from other researchers and practitioners. Our benchmark will help promote reproducible research in this field, thus accelerating research progress as well as making more significant real-world impacts.
    摘要 continuous-time event sequences在真实世界中的应用领域,如医疗、金融、在线购物、社交网络等,扮演着重要的角色。为模型这种数据,时间点过程(TPP)已经成为最先进的生成模型,在学术和应用社区中产生了深见的影响。尽管最近几年出现了许多强大的模型,但是还没有一个通用的标准准则来评估它们。这种标准化的缺失使得研究人员和实践者无法比较方法和重现结果,可能会抑制这个领域的进步。在这篇论文中,我们提出了EasyTPP,它的目标是建立TPP的中心评估标准。与之前的工作相比,EasyTPP有三个独特的贡献:(i)对八种最具影响力的神经网络TPP进行了完整的实现,并集成了通用的评估指标和数据集;(ii)提供了一个标准化的评估管道,使得不同的方法在不同的数据集上进行了公平的比较;(iii)支持多种Machine Learning库(如PyTorch和TensorFlow)以及自定义实现。我们的标准是开源的:所有的数据和实现可以在这个 \href{https://github.com/ant-research/EasyTemporalPointProcess}{\textcolor{blue}{Github repository} 中找到。我们将积极维护这个标准,并欢迎其他研究人员和实践者的贡献。我们的标准将助推可重复性的研究进步,从而加速这个领域的研究进步,并在真实世界中产生更 significative的影响。
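
For reference, the standard event-sequence log-likelihood that neural TPP benchmarks of this kind typically report (textbook material, not an EasyTPP-specific API): for events $t_1 < \dots < t_n$ on $[0, T]$ with conditional intensity $\lambda^{*}(t)$,

```latex
\log \mathcal{L} \;=\; \sum_{i=1}^{n} \log \lambda^{*}(t_i) \;-\; \int_{0}^{T} \lambda^{*}(t)\, \mathrm{d}t .
```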

A Comprehensive Survey of Forgetting in Deep Learning Beyond Continual Learning

  • paper_url: http://arxiv.org/abs/2307.09218
  • repo_url: https://github.com/ennengyang/awesome-forgetting-in-deep-learning
  • paper_authors: Zhenyi Wang, Enneng Yang, Li Shen, Heng Huang
  • for: This paper is written to provide a comprehensive survey of forgetting in deep learning, beyond its conventional boundaries, and to explore the potential advantages of forgetting in certain cases, such as privacy-preserving scenarios.
  • methods: The paper uses a broad range of methods to examine forgetting in various research domains within deep learning, including generative models and federated learning. It also draws upon ideas and approaches from other fields that have dealt with forgetting.
  • results: The paper presents a nuanced understanding of forgetting as a double-edged sword, highlighting its potential advantages in certain cases, and provides a comprehensive list of papers about forgetting in various research fields. It encourages the development of novel strategies for mitigating, harnessing, or even embracing forgetting in real applications.
    Abstract Forgetting refers to the loss or deterioration of previously acquired information or knowledge. While the existing surveys on forgetting have primarily focused on continual learning, forgetting is a prevalent phenomenon observed in various other research domains within deep learning. Forgetting manifests in research fields such as generative models due to generator shifts, and federated learning due to heterogeneous data distributions across clients. Addressing forgetting encompasses several challenges, including balancing the retention of old task knowledge with fast learning of new tasks, managing task interference with conflicting goals, and preventing privacy leakage, etc. Moreover, most existing surveys on continual learning implicitly assume that forgetting is always harmful. In contrast, our survey argues that forgetting is a double-edged sword and can be beneficial and desirable in certain cases, such as privacy-preserving scenarios. By exploring forgetting in a broader context, we aim to present a more nuanced understanding of this phenomenon and highlight its potential advantages. Through this comprehensive survey, we aspire to uncover potential solutions by drawing upon ideas and approaches from various fields that have dealt with forgetting. By examining forgetting beyond its conventional boundaries, in future work, we hope to encourage the development of novel strategies for mitigating, harnessing, or even embracing forgetting in real applications. A comprehensive list of papers about forgetting in various research fields is available at \url{https://github.com/EnnengYang/Awesome-Forgetting-in-Deep-Learning}.
    摘要 忘卷(Forgetting)指的是在学习过程中失去或衰退已经获得的信息或知识。 existing surveys on forgetting 主要集中在持续学习领域,但是忘卷在深度学习中的研究领域中也是非常普遍的现象。忘卷在生成模型中的生成器变化和 federated learning 中的客户端数据分布不同而导致的现象。 Addressing 忘卷涉及到保持过去任务知识的 equilibrio 和快速学习新任务的挑战,以及处理任务干扰和矛盾目标的挑战。此外,现有的持续学习survey implicit assumes that forgetting is always harmful。相反,我们的survey argue that forgetting is a double-edged sword and can be beneficial and desirable in certain cases, such as privacy-preserving scenarios。通过探讨忘卷在更广泛的上下文中,我们希望呈现一种更加细腻的理解,并高亮其潜在的优点。通过这种全面的survey,我们希望探讨可以从不同领域中的想法和方法中练习解决忘卷。在未来的工作中,我们希望通过探讨忘卷的不同方面,激发开发 novel strategies for mitigating, harnessing, or even embracing forgetting in real applications。一个完整的关于忘卷的paper的列表可以在 \url{https://github.com/EnnengYang/Awesome-Forgetting-in-Deep-Learning} 中找到。