cs.LG - 2023-11-11

Agnostic Membership Query Learning with Nontrivial Savings: New Results, Techniques

  • paper_url: http://arxiv.org/abs/2311.06690
  • repo_url: None
  • paper_authors: Ari Karchmer
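  • for: This paper studies the design of computationally efficient algorithms in the agnostic learning model (Haussler, 1992; Kearns et al., 1994).
  • methods: The paper uses membership queries, focusing on agnostic learning of touchstone classes at the frontier of agnostic learning and on how much computation can be saved over the trivial 2^n runtime.
  • results: The paper gives several agnostic learning algorithms. One handles circuits with a sublinear number of gates, each computable by a polynomial threshold function of sublogarithmic degree k, and runs in time 2^{n-s(n)} with s(n) ≈ n/(k+1) instead of the trivial 2^n. Another, with the same runtime, handles circuits whose gates are computable by SYM^+ circuits of subexponential size and sublogarithmic degree k, and learns over products of k+1 unknown distributions.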
    Abstract (Abridged) Designing computationally efficient algorithms in the agnostic learning model (Haussler, 1992; Kearns et al., 1994) is notoriously difficult. In this work, we consider agnostic learning with membership queries for touchstone classes at the frontier of agnostic learning, with a focus on how much computation can be saved over the trivial runtime of 2^n. This approach is inspired by and continues the study of "learning with nontrivial savings" (Servedio and Tan, 2017). To this end, we establish multiple agnostic learning algorithms, highlighted by: 1. An agnostic learning algorithm for circuits consisting of a sublinear number of gates, which can each be any function computable by a sublogarithmic degree-k polynomial threshold function (the depth of the circuit is bounded only by size). This algorithm runs in time 2^{n-s(n)} for s(n) ≈ n/(k+1), and learns over the uniform distribution over unlabelled examples on {0,1}^n. 2. An agnostic learning algorithm for circuits consisting of a sublinear number of gates, where each can be any function computable by a SYM^+ circuit of subexponential size and sublogarithmic degree k. This algorithm runs in time 2^{n-s(n)} for s(n) ≈ n/(k+1), and learns over distributions of unlabelled examples that are products of k+1 arbitrary and unknown distributions, each over {0,1}^{n/(k+1)} (assume without loss of generality that k+1 divides n).
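The savings are easiest to see with concrete numbers. The sketch below assumes s(n) is exactly n/(k+1) (the abstract only says s(n) ≈ n/(k+1)) and that k+1 divides n, as in the second result. It splits the n input bits into the k+1 blocks of the product distribution and computes the runtime exponent n - s(n). The function names are hypothetical, and the sketch says nothing about how the learners themselves work.

```python
from typing import List


def split_into_blocks(n: int, k: int) -> List[range]:
    """Partition variable indices 0..n-1 into k+1 contiguous blocks of n/(k+1) bits.

    Each block corresponds to one factor of the product distribution over
    {0,1}^(n/(k+1)); as in the abstract, we assume (k+1) divides n.
    """
    if n % (k + 1) != 0:
        raise ValueError("k+1 must divide n")
    size = n // (k + 1)
    return [range(i * size, (i + 1) * size) for i in range(k + 1)]


def runtime_exponent(n: int, k: int) -> int:
    """Exponent of 2^(n - s(n)), assuming the savings s(n) equal n/(k+1) exactly."""
    savings = len(split_into_blocks(n, k)[0])  # block size n/(k+1)
    return n - savings
```

For example, with n = 60 and k = 2, `runtime_exponent` returns 40: about 2^40 steps instead of the trivial 2^60, a savings factor of 2^20.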

Heuristic Optimal Transport in Branching Networks

  • paper_url: http://arxiv.org/abs/2311.06650
  • repo_url: None
  • paper_authors: M. Andrecut
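  • for: Learning a method for optimal transport on networks that minimizes transport cost.
  • methods: A fast heuristic that generates branching structures for optimal transport in networks.
  • results: The paper presents several applications of the heuristic branching method.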
    Abstract Optimal transport aims to learn a mapping of sources to targets by minimizing the cost, which is typically defined as a function of distance. The solution to this problem consists of straight-line segments optimally connecting sources to targets, and it does not exhibit branching. These optimal solutions are in stark contrast with both natural and man-made transportation networks, where branching structures are prevalent. Here we discuss a fast heuristic branching method for optimal transport in networks, and we provide several applications.
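To make the contrast concrete, here is a minimal sketch of the non-branching baseline the abstract describes, not the paper's branching heuristic. It assumes unit mass at each source and target, so optimal transport reduces to a minimum-cost matching, and it solves that by brute force over permutations. Every route is a separate straight segment, and no two routes share a trunk. The function name is illustrative.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def optimal_assignment(sources: Sequence[Point], targets: Sequence[Point]) -> Tuple[List[int], float]:
    """Find the source->target matching that minimizes total Euclidean length.

    With unit masses this is discrete optimal transport; the solution is a set
    of independent straight segments with no branching. Exhaustive search is
    exponential in the number of points, so this is only for tiny examples.
    """
    if len(sources) != len(targets):
        raise ValueError("need the same number of sources and targets")
    best_perm: List[int] = []
    best_cost = math.inf
    for perm in itertools.permutations(range(len(targets))):
        cost = sum(math.dist(s, targets[j]) for s, j in zip(sources, perm))
        if cost < best_cost:
            best_perm, best_cost = list(perm), cost
    return best_perm, best_cost
```

For sources (0, 0) and (1, 0) and targets (1, 1) and (0, 1), the optimum sends each source straight up, giving the assignment [1, 0] with total cost 2.0. A branching network would instead let nearby routes share a common segment.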

Privacy Risks Analysis and Mitigation in Federated Learning for Medical Images

  • paper_url: http://arxiv.org/abs/2311.06643
  • repo_url: https://github.com/mlsysx/medpfl
  • paper_authors: Badhan Chandra Das, M. Hadi Amini, Yanzhao Wu
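  • for: This study analyzes and mitigates privacy risks to medical data in federated learning (FL).
  • methods: The study proposes a holistic framework, MedPFL, for analyzing and mitigating privacy risks in FL, and experimentally demonstrates the substantial threat that privacy attacks pose to medical data.
  • results: The study finds that the defense of adding random noise to protect medical data may not always be effective, which exposes unique and pressing privacy challenges for medical data.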
    Abstract Federated learning (FL) is gaining increasing popularity in the medical domain for analyzing medical images, where it is considered an effective technique to safeguard sensitive patient data and comply with privacy regulations. However, several recent studies have revealed that the default settings of FL may leak private training data under privacy attacks. It therefore remains unclear whether, and to what extent, such privacy risks of FL exist in the medical domain, and if so, "how to mitigate such risks?". In this paper, first, we propose a holistic framework for Medical data Privacy risk analysis and mitigation in Federated Learning (MedPFL) to analyze privacy risks and develop effective mitigation strategies in FL for protecting private medical data. Second, we demonstrate the substantial privacy risks of using FL to process medical images, where adversaries can easily perform privacy attacks to reconstruct private medical images accurately. Third, we show that the defense approach of adding random noise may not always work effectively to protect medical images against privacy attacks in FL, which poses unique and pressing challenges associated with medical data for privacy protection.
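The noise defense mentioned above can be pictured as a perturbed federated averaging step. The sketch below is illustrative only: it assumes FedAvg aggregation and i.i.d. Gaussian noise added by each client to its update before upload. The abstract does not specify MedPFL's noise mechanism or aggregation rule, and `fedavg_with_noise` is a hypothetical name.

```python
import random
from typing import List, Optional, Sequence


def fedavg_with_noise(
    client_updates: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
    sigma: float = 0.0,
    seed: Optional[int] = None,
) -> List[float]:
    """Weighted FedAvg of client update vectors, each perturbed by N(0, sigma^2) noise.

    sigma = 0 recovers plain FedAvg. A larger sigma hides more of each client's
    update from whoever sees it, at the cost of a noisier global model.
    """
    rng = random.Random(seed)
    if weights is None:
        weights = [1.0] * len(client_updates)
    total = float(sum(weights))
    # Each client adds its own noise locally, before sending the update.
    noisy = [[u + rng.gauss(0.0, sigma) for u in update] for update in client_updates]
    dim = len(noisy[0])
    return [sum(w * upd[j] for w, upd in zip(weights, noisy)) / total for j in range(dim)]
```

The paper's finding is that this kind of noise may not always stop image reconstruction attacks. A sketch like this is a convenient harness for varying `sigma` and observing the privacy/utility trade-off.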

The Exact Determinant of a Specific Class of Sparse Positive Definite Matrices

  • paper_url: http://arxiv.org/abs/2311.06632
  • repo_url: None
  • paper_authors: Mehdi Molkaraie
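  • for: This paper derives the determinant of the covariance matrix for a specific class of sparse Gaussian graphical models.
  • methods: The analysis takes the Fourier transform of the model's local factors, which can be viewed as an application of the Normal Factor Graph Duality Theorem and holographic algorithms, and then applies the Matrix Determinant Lemma to the transformed graphical model to obtain a closed form.
  • results: The paper gives a closed-form expression for the determinant of the covariance matrix of these sparse Gaussian graphical models, and defines a notion of equivalence between two Gaussian graphical models.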
    Abstract For a specific class of sparse Gaussian graphical models, we provide a closed-form solution for the determinant of the covariance matrix. In our framework, the graphical interaction model (i.e., the covariance selection model) is equal to the replacement product of $\mathcal{K}_{n}$ and $\mathcal{K}_{n-1}$, where $\mathcal{K}_n$ is the complete graph with $n$ vertices. Our analysis is based on taking the Fourier transform of the local factors of the model, which can be viewed as an application of the Normal Factor Graph Duality Theorem and holographic algorithms. The closed-form expression is obtained by applying the Matrix Determinant Lemma to the transformed graphical model. In this context, we will also define a notion of equivalence between two Gaussian graphical models.
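The Matrix Determinant Lemma cited above states that, for invertible $A$, $\det(A + u v^\top) = (1 + v^\top A^{-1} u)\det(A)$. The sketch below checks that identity numerically on a small example. It shows only the lemma; how the paper applies it to the Fourier-transformed graphical model is not reproduced here.

```python
import numpy as np


def det_rank_one_update(A: np.ndarray, u: np.ndarray, v: np.ndarray) -> float:
    """det(A + u v^T) via the Matrix Determinant Lemma (A must be invertible)."""
    # Solve A x = u instead of forming A^{-1} explicitly.
    return float((1.0 + v @ np.linalg.solve(A, u)) * np.linalg.det(A))


A = np.diag([2.0, 3.0, 4.0])
u = v = np.ones(3)
lemma = det_rank_one_update(A, u, v)         # 24 * (1 + 1/2 + 1/3 + 1/4) = 50
direct = np.linalg.det(A + np.outer(u, v))   # same matrix, determinant computed directly
```

The lemma's appeal is cost. Once $\det(A)$ is known and $A$ is easy to solve against (here diagonal), a rank-one update needs one linear solve rather than a fresh determinant.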

Streamlining Energy Transition Scenarios to Key Policy Decisions

  • paper_url: http://arxiv.org/abs/2311.06625
  • repo_url: None
  • paper_authors: Florian Joseph Baader, Stefano Moret, Wolfram Wiesemann, Iain Staffell, André Bardow
  • for: The paper provides an approach for interpreting and prioritizing the key factors in the energy transition, in the context of global decarbonization scenarios and a fossil-free Europe.
  • methods: The paper uses decision trees, a popular machine-learning technique, to derive interpretable storylines from many quantitative scenarios and show how the key decisions in the energy transition are interlinked.
  • results: The paper shows that choosing a high deployment of renewables and sector coupling makes global decarbonization scenarios robust against uncertainties in climate sensitivity and demand, and that the energy transition to a fossil-free Europe is primarily determined by choices on the roles of bioenergy, storage, and heat electrification.
    Abstract Uncertainties surrounding the energy transition often lead modelers to present large sets of scenarios that are challenging for policymakers to interpret and act upon. An alternative approach is to define a few qualitative storylines from stakeholder discussions, which can be affected by biases and infeasibilities. Leveraging decision trees, a popular machine-learning technique, we derive interpretable storylines from many quantitative scenarios and show how the key decisions in the energy transition are interlinked. Specifically, our results demonstrate that choosing a high deployment of renewables and sector coupling makes global decarbonization scenarios robust against uncertainties in climate sensitivity and demand. Also, the energy transition to a fossil-free Europe is primarily determined by choices on the roles of bioenergy, storage, and heat electrification. Our transferable approach translates vast energy model results into a small set of critical decisions, guiding decision-makers in prioritizing the key factors that will shape the energy transition.
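The core mechanism of a decision tree is choosing the single split that best separates outcomes. This is what turns many scenarios into a storyline such as "if renewables share ≥ t, the scenario is robust". The sketch below finds one Gini-optimal split (a depth-1 tree). The scenario features and labels are hypothetical toy values, not data from the paper, and the paper's actual tree-fitting setup is not described in the abstract.

```python
from typing import Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_split(X: Sequence[Sequence[float]], y: Sequence[int]) -> Tuple[float, int, float]:
    """Return (weighted Gini, feature index, threshold) of the best rule 'x[j] >= t'."""
    best = (float("inf"), -1, 0.0)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] < t]
            right = [yi for row, yi in zip(X, y) if row[j] >= t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[0]:
                best = (score, j, t)
    return best


# Hypothetical scenarios: [renewables share, bioenergy share] -> robust (1) or not (0).
X = [[0.2, 0.1], [0.3, 0.9], [0.6, 0.2], [0.8, 0.7]]
y = [0, 0, 1, 1]
split = best_split(X, y)  # (0.0, 0, 0.6): "renewables share >= 0.6" separates robust scenarios
```

Applying the same search recursively to each side yields a full tree, and each root-to-leaf path reads as one storyline of key decisions.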

Sparse Attention-Based Neural Networks for Code Classification

  • paper_url: http://arxiv.org/abs/2311.06575
  • repo_url: None
  • paper_authors: Ziyang Xiang, Zaixi Zhang, Qi Liu
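  • for: This paper addresses source code classification in real-world programming education platforms.
  • methods: The paper uses a model based on abstract syntax trees (ASTs), combining syntax parsing and recursive neural network encoding of subtrees with a purpose-designed sparse attention mechanism.
  • results: On code classification tasks, the method delivers efficient and accurate classification, and it addresses problems in prior work such as incomplete classification labels and small datasets.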
    Abstract Categorizing source code accurately and efficiently is a challenging problem in real-world programming education platform management. In recent years, model-based approaches utilizing abstract syntax trees (ASTs) have been widely applied to code classification tasks. In this paper, we introduce the Sparse Attention-based neural network for Code Classification (SACC). The approach involves two main steps. In the first step, source code undergoes syntax parsing and preprocessing. The generated abstract syntax tree is split into sequences of subtrees, which are then encoded using a recursive neural network to obtain a high-dimensional representation. This step simultaneously considers both the logical structure and the lexical-level information contained within the code. In the second step, the encoded sequences of subtrees are fed into a Transformer model that incorporates sparse attention mechanisms for classification. This method efficiently reduces the computational cost of the self-attention mechanisms, improving training speed while preserving effectiveness. Our work introduces a sparse attention pattern designed specifically for the unique needs of code classification tasks. This design helps reduce the influence of redundant information and enhances the overall performance of the model. Finally, we address problems in previous related research, such as incomplete classification labels and small dataset sizes. We annotated the CodeNet dataset, which contains a large amount of data, with algorithm-related label categories. Extensive comparative experimental results demonstrate the effectiveness and efficiency of SACC for code classification tasks.
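The abstract does not specify SACC's sparse attention pattern. As a generic illustration of what "sparse attention" means, the sketch below restricts scaled dot-product attention to a sliding window of neighbouring subtree encodings. The window pattern and function names are assumptions for illustration only. This dense-masked version still computes all n² scores, so it shows the pattern rather than the compute savings; a real implementation would evaluate only the allowed pairs.

```python
import numpy as np


def local_window_mask(n: int, window: int) -> np.ndarray:
    """Boolean (n, n) mask: position i may attend to position j only if |i - j| <= window."""
    idx = np.arange(n)
    return np.abs(idx[:, None] - idx[None, :]) <= window


def masked_attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray, mask: np.ndarray):
    """Scaled dot-product attention restricted to the pairs allowed by `mask`."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores = np.where(mask, scores, -np.inf)              # forbidden pairs get zero weight
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights
```

The diagonal is always allowed, so every row has at least one finite score and the softmax is well defined.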

Convolve and Conquer: Data Comparison with Wiener Filters

  • paper_url: http://arxiv.org/abs/2311.06558
  • repo_url: https://github.com/dpelacani/AWLoss
  • paper_authors: Deborah Pelacani Cruz, George Strong, Oscar Bates, Carlos Cueto, Jiashun Yao, Lluis Guasch
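  • for: This paper proposes a new data comparison method for quantitatively evaluating differences and similarities between data samples.
  • methods: The method is based on Wiener filter theory. The convolutional nature of Wiener filters lets it compare data samples comprehensively, in a globally correlated way, to better capture the characteristics of data distributions.
  • results: The method is evaluated in four machine learning applications: data compression, medical imaging imputation, translated classification, and non-parametric generative modelling. Compared with conventional mean-squared-error implementations, it yields higher-resolution reconstructions with better perceptual quality and higher data fidelity, as well as robustness against translations.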
    Abstract Quantitative evaluations of differences and/or similarities between data samples define and shape optimisation problems associated with learning data distributions. Current methods to compare data often suffer from limitations in capturing such distributions or lack desirable mathematical properties for optimisation (e.g. smoothness, differentiability, or convexity). In this paper, we introduce a new method to measure (dis)similarities between paired samples inspired by Wiener-filter theory. The convolutional nature of Wiener filters allows us to comprehensively compare data samples in a globally correlated way. We validate our approach in four machine learning applications: data compression, medical imaging imputation, translated classification, and non-parametric generative modelling. Our results demonstrate increased resolution in reconstructed images with better perceptual quality and higher data fidelity, as well as robustness against translations, compared to conventional mean-squared-error analogue implementations.
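One way to see why a Wiener filter suits comparison is that it captures how far one signal must be "moved" to match another. The minimal 1-D sketch below is in the spirit of Wiener-filter comparison but is not the paper's loss. It assumes circular convolution and a hypothetical penalty that weights filter energy by its distance from zero lag. The regularized filter $w = \arg\min \|w \circledast x - y\|^2 + \epsilon\|w\|^2$ has the closed form $W = \overline{X}Y/(|X|^2+\epsilon)$ in the Fourier domain. Identical signals give a filter concentrated at lag 0 (dissimilarity ≈ 0), and a copy shifted by s gives a filter concentrated at lag s (dissimilarity ≈ s²).

```python
import numpy as np


def wiener_filter(x: np.ndarray, y: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """1-D filter w minimizing ||w (*) x - y||^2 + eps*||w||^2, with (*) circular convolution."""
    X, Y = np.fft.fft(x), np.fft.fft(y)
    W = np.conj(X) * Y / (np.abs(X) ** 2 + eps)
    return np.real(np.fft.ifft(W))


def wiener_dissimilarity(x: np.ndarray, y: np.ndarray, eps: float = 1e-6) -> float:
    """Normalized filter energy away from zero lag (hypothetical penalty choice)."""
    w = wiener_filter(x, y, eps)
    n = len(w)
    lags = np.arange(n)
    penalty = np.minimum(lags, n - lags).astype(float)  # circular distance from lag 0
    return float(np.sum((penalty * w) ** 2) / np.sum(w ** 2))
```

Unlike the mean-squared error between a noise-like signal and its shifted copy, which does not encode how large the shift is, this score grows smoothly with the shift. That behaviour is one intuition behind the translation robustness the abstract reports.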

Graph ODE with Factorized Prototypes for Modeling Complicated Interacting Dynamics

  • paper_url: http://arxiv.org/abs/2311.06554
  • repo_url: None
  • paper_authors: Xiao Luo, Yiyang Gu, Huiyu Jiang, Jinsheng Huang, Wei Ju, Ming Zhang, Yizhou Sun
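  • for: This study addresses modeling interacting dynamical systems, which is key to understanding both physical dynamics and biological processes.
  • methods: The study represents interactions with geometric graphs, which are then captured by powerful graph neural networks (GNNs).
  • results: The study proposes Graph ODE with factorized prototypes (GOAT) to handle hard settings for predicting interacting dynamics, including out-of-distribution shift and complicated underlying rules. GOAT uses factorized prototypes to extract object-level and system-level contexts, which improves the model's generalization.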
    Abstract This paper studies the problem of modeling interacting dynamical systems, which is critical for understanding physical dynamics and biological processes. Recent research predominantly uses geometric graphs to represent these interactions, which are then captured by powerful graph neural networks (GNNs). However, predicting interacting dynamics in challenging scenarios such as out-of-distribution shift and complicated underlying rules remains unsolved. In this paper, we propose a new approach named Graph ODE with factorized prototypes (GOAT) to address the problem. The core of GOAT is to incorporate factorized prototypes from contextual knowledge into a continuous graph ODE framework. Specifically, GOAT employs representation disentanglement and system parameters to extract both object-level and system-level contexts from historical trajectories, which allows us to explicitly model their independent influence and thus enhances the generalization capability under system changes. Then, we integrate these disentangled latent representations into a graph ODE model, which determines a combination of various interacting prototypes for enhanced model expressivity. The entire model is optimized using an end-to-end variational inference framework to maximize the likelihood. Extensive experiments in both in-distribution and out-of-distribution settings validate the superiority of GOAT.
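A graph ODE evolves every node's state continuously in time, with the time derivative of each node depending on its neighbours. The sketch below shows that backbone with deliberately simple, hypothetical dynamics: linear diffusion dz/dt = -L z, where L is the graph Laplacian, integrated with explicit Euler. GOAT's learned message functions, disentangled contexts, factorized prototypes, and variational training are not modeled here.

```python
import numpy as np


def graph_ode_euler(z0: np.ndarray, adj: np.ndarray, dt: float = 0.01, steps: int = 1000) -> np.ndarray:
    """Integrate dz/dt = -L z (L = D - A, the graph Laplacian) with explicit Euler.

    Each node's state is pulled toward its neighbours' states. Returns the
    trajectory with shape (steps + 1, *z0.shape).
    """
    lap = np.diag(adj.sum(axis=1)) - adj
    z = z0.astype(float)  # astype returns a copy, so z0 is left untouched
    traj = [z.copy()]
    for _ in range(steps):
        z = z + dt * (-lap @ z)
        traj.append(z.copy())
    return np.stack(traj)


adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # path graph 0-1-2
traj = graph_ode_euler(np.array([1.0, 0.0, 0.0]), adj)
```

With these dynamics the total state is conserved and all nodes converge to the average, 1/3 each. In a learned graph ODE, the hand-written `-lap @ z` term is replaced by neural message-passing functions, and a more accurate ODE solver is typically used.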

From Charts to Atlas: Merging Latent Spaces into One

  • paper_url: http://arxiv.org/abs/2311.06547
  • repo_url: None
  • paper_authors: Donato Crisostomi, Irene Cannistraci, Luca Moschella, Pietro Barbiero, Marco Ciccone, Pietro Liò, Emanuele Rodolà
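  • for: This study aims to aggregate the latent spaces of models trained on semantically related tasks and datasets into a single unified space that is better suited for classification.
  • methods: The study uses relative representations to make the latent spaces comparable, then aggregates them with a simple mean.
  • results: The aggregated space is similar to that of an end-to-end model trained over all tasks and is better suited for classification, which the study attributes to the unique imprints left by task-specific embedders. The method can still merge spaces when no shared region exists, albeit with diminished benefits over naive merging.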
    Abstract Models trained on semantically related datasets and tasks exhibit comparable inter-sample relations within their latent spaces. We investigate in this study the aggregation of such latent spaces to create a unified space encompassing the combined information. To this end, we introduce Relative Latent Space Aggregation, a two-step approach that first renders the spaces comparable using relative representations, and then aggregates them via a simple mean. We carefully divide a classification problem into a series of learning tasks under three different settings: sharing samples, classes, or neither. We then train a model on each task and aggregate the resulting latent spaces. We compare the aggregated space with that derived from an end-to-end model trained over all tasks and show that the two spaces are similar. We then observe that the aggregated space is better suited for classification, and empirically demonstrate that it is due to the unique imprints left by task-specific embedders within the representations. We finally test our framework in scenarios where no shared region exists and show that it can still be used to merge the spaces, albeit with diminished benefits over naive merging.
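The two steps of Relative Latent Space Aggregation can be sketched as follows. Each space is re-expressed by its samples' similarities to a shared set of anchor samples, which makes independently trained spaces comparable, and the results are then averaged. This sketch assumes cosine similarity to anchors as the relative representation (a common choice) and anchors shared across all spaces. The function names are illustrative, not the authors' code.

```python
import numpy as np


def relative_representation(emb: np.ndarray, anchor_idx) -> np.ndarray:
    """Cosine similarity of every sample to a fixed set of anchor samples."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return unit @ unit[anchor_idx].T  # shape (n_samples, n_anchors)


def aggregate_spaces(spaces, anchor_idx) -> np.ndarray:
    """Make each latent space comparable, then merge them with a simple mean."""
    return np.mean([relative_representation(s, anchor_idx) for s in spaces], axis=0)
```

Cosine similarities are unchanged by rotations and uniform rescalings of a latent space. Two encoders that differ only by such a transformation therefore produce identical relative representations, which is what makes averaging across spaces meaningful.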

Understanding Generalization via Set Theory

  • paper_url: http://arxiv.org/abs/2311.06545
  • repo_url: None
  • paper_authors: Shiqi Liu
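  • for: This study aims to better understand the generalization of machine learning models.
  • methods: The study uses set theory to introduce the concepts of algorithms, hypotheses, and dataset generalization. It analyzes the properties of dataset generalization and proves a theorem on surrogate generalization procedures, which leads to the proposed generalization method.
  • results: A generalization experiment on the MNIST dataset yields 13,541 sample bases. When the entire training set is used for evaluation, the models reach 99.945% accuracy, but performance drops significantly if the sample bases are shifted or the neural network structure is modified. The consistently mispredicted samples are all challenging examples. The experiments support the accuracy of the generalization definition and the effectiveness of the proposed method.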
    Abstract Generalization is at the core of machine learning models. However, the definition of generalization is not entirely clear. We employ set theory to introduce the concepts of algorithms, hypotheses, and dataset generalization. We analyze the properties of dataset generalization and prove a theorem on surrogate generalization procedures. This theorem leads to our generalization method. Through a generalization experiment on the MNIST dataset, we obtain 13,541 sample bases. When we use the entire training set to evaluate the model's performance, the models achieve an accuracy of 99.945%. However, if we shift the sample bases or modify the neural network structure, the performance experiences a significant decline. We also identify consistently mispredicted samples and find that they are all challenging examples. The experiments substantiated the accuracy of the generalization definition and the effectiveness of the proposed methods. Both the set-theoretic deduction and the experiments help us better understand generalization.

TURBO: The Swiss Knife of Auto-Encoders

  • paper_url: http://arxiv.org/abs/2311.06527
  • repo_url: None
  • paper_authors: Guillaume Quétant, Yury Belousov, Vitaliy Kinakh, Slava Voloshynovskiy
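  • for: This study aims to systematically analyze and generalize auto-encoding methods within an information-theoretic framework.
  • methods: The framework's core concept is the maximization of mutual information between various data representations, expressed in two directions that reflect the information flows.
  • results: Numerous prevalent neural network models fall within the framework, whereas the information bottleneck concept cannot explain all of them. This makes TURBO a preferable theoretical reference.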
    Abstract We present a novel information-theoretic framework, termed TURBO, designed to systematically analyse and generalise auto-encoding methods. We start by examining the principles of information bottleneck and bottleneck-based networks in the auto-encoding setting and identifying their inherent limitations, which become more prominent for data with multiple relevant, physics-related representations. The TURBO framework is then introduced, providing a comprehensive derivation of its core concept, which consists of the maximisation of mutual information between various data representations expressed in two directions reflecting the information flows. We illustrate that numerous prevalent neural network models are encompassed within this framework. The paper underscores the insufficiency of the information bottleneck concept in elucidating all such models, thereby establishing TURBO as a preferable theoretical reference. The introduction of TURBO contributes to a richer understanding of data representation and the structure of neural network models, enabling more efficient and versatile applications.
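TURBO's objective is built on mutual information between data representations. For reference, the sketch below computes the quantity itself, I(X;Y) = Σ p(x,y) log2[p(x,y) / (p(x) p(y))], for discrete variables given as a joint probability table. It does not show how TURBO estimates or maximizes mutual information for neural network representations, which the abstract does not describe.

```python
import math
from typing import Sequence


def mutual_information(joint: Sequence[Sequence[float]]) -> float:
    """I(X;Y) in bits from a joint probability table joint[x][y] whose entries sum to 1."""
    px = [sum(row) for row in joint]        # marginal p(x)
    py = [sum(col) for col in zip(*joint)]  # marginal p(y)
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:  # 0 * log 0 is taken as 0
                mi += pxy * math.log2(pxy / (px[i] * py[j]))
    return mi
```

Independent variables give 0 bits. A binary variable copied perfectly into another gives 1 bit, the most that one binary representation can carry about another.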

CompCodeVet: A Compiler-guided Validation and Enhancement Approach for Code Dataset

  • paper_url: http://arxiv.org/abs/2311.06505
  • repo_url: None
  • paper_authors: Le Chen, Arijit Bhattacharjee, Nesreen K. Ahmed, Niranjan Hasabnis, Gal Oren, Bin Lei, Ali Jannesari
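  • for: Improving the performance of LLMs in generating and understanding C and C++ code.
  • methods: The CompCodeVet approach uses compilers as a teacher to establish a more robust zero-shot reasoning process for LLMs.
  • results: Evaluated on two open-source code datasets, CompCodeVet improves the quality of training data for LLMs.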
    Abstract Large language models (LLMs) have become increasingly prominent in academia and industry due to their remarkable performance in diverse applications. As these models evolve with increasing parameters, they excel in tasks like sentiment analysis and machine translation. However, even models with billions of parameters face challenges in tasks demanding multi-step reasoning. Code generation and comprehension, especially in C and C++, emerge as significant challenges. While LLMs trained on code datasets demonstrate competence in many tasks, they struggle with rectifying non-compilable C and C++ code. Our investigation attributes this subpar performance to two primary factors: the quality of the training dataset and the inherent complexity of the problem which demands intricate reasoning. Existing "Chain of Thought" (CoT) prompting techniques aim to enhance multi-step reasoning. This approach, however, retains the limitations associated with the latent drawbacks of LLMs. In this work, we propose CompCodeVet, a compiler-guided CoT approach to produce compilable code from non-compilable ones. Diverging from the conventional approach of utilizing larger LLMs, we employ compilers as a teacher to establish a more robust zero-shot thought process. The evaluation of CompCodeVet on two open-source code datasets shows that CompCodeVet has the ability to improve the training dataset quality for LLMs.
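The compiler-as-teacher idea can be pictured as a loop: compile, feed the diagnostics back, revise, and repeat until the code compiles. The sketch below is hypothetical and is not CompCodeVet's pipeline. The `fix` callback stands in for an LLM prompted with compiler errors. `gcc_syntax_check` assumes `gcc` is on the PATH and handles only C (C++ would need `g++`).

```python
import os
import shutil
import subprocess
import tempfile
from typing import Callable, Tuple

CheckFn = Callable[[str], Tuple[bool, str]]  # code -> (compiles?, diagnostics)
FixFn = Callable[[str, str], str]            # (code, diagnostics) -> revised code


def gcc_syntax_check(code: str) -> Tuple[bool, str]:
    """Check C source with `gcc -fsyntax-only`; returns (ok, compiler diagnostics)."""
    gcc = shutil.which("gcc")
    if gcc is None:
        raise RuntimeError("gcc not found on PATH")
    with tempfile.NamedTemporaryFile("w", suffix=".c", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run([gcc, "-fsyntax-only", path], capture_output=True, text=True)
    finally:
        os.unlink(path)
    return proc.returncode == 0, proc.stderr


def compiler_guided_repair(code: str, check: CheckFn, fix: FixFn, max_rounds: int = 3) -> Tuple[str, bool]:
    """Alternate compiler checks and fixes until the code compiles or rounds run out."""
    for _ in range(max_rounds):
        ok, diagnostics = check(code)
        if ok:
            return code, True
        code = fix(code, diagnostics)  # hypothetical: prompt an LLM with the diagnostics
    return code, check(code)[0]
```

A call such as `compiler_guided_repair(src, gcc_syntax_check, my_llm_fix)`, where `my_llm_fix` is a hypothetical model-backed function, would return the revised source and whether it now compiles. Keeping the checker and fixer as parameters makes the loop testable without a compiler or a model.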

Stacked networks improve physics-informed training: applications to neural networks and deep operator networks

  • paper_url: http://arxiv.org/abs/2311.06483
  • repo_url: None
  • paper_authors: Amanda A Howard, Sarah H Murphy, Shady E Ahmed, Panos Stinis
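  • for: Physics-informed neural networks and operator networks can be difficult or impossible to train accurately for some systems of equations modeling physical systems; this work aims to make such training easier.
  • methods: The paper proposes a novel multifidelity framework that stacks physics-informed neural networks and operator networks. A chain of networks is built successively: the output of one step acts as a low-fidelity input for training the next, gradually increasing the expressivity of the learned model. The equations imposed at each step can be the same or different (akin to simulated annealing).
  • results: On benchmark problems including a nonlinear pendulum, the wave equation, and the viscous Burgers equation, stacking improves the accuracy of physics-informed neural networks and operator networks and reduces their required size.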
    Abstract Physics-informed neural networks and operator networks have shown promise for effectively solving equations modeling physical systems. However, these networks can be difficult or impossible to train accurately for some systems of equations. We present a novel multifidelity framework for stacking physics-informed neural networks and operator networks that facilitates training. We successively build a chain of networks, where the output at one step can act as a low-fidelity input for training the next step, gradually increasing the expressivity of the learned model. The equations imposed at each step of the iterative process can be the same or different (akin to simulated annealing). The iterative (stacking) nature of the proposed method allows us to progressively learn features of a solution that are hard to learn directly. Through benchmark problems including a nonlinear pendulum, the wave equation, and the viscous Burgers equation, we show how stacking can be used to improve the accuracy and reduce the required size of physics-informed neural networks and operator networks.
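The stacking structure, in which each stage receives the previous stage's output as a low-fidelity input, can be illustrated without neural networks. In the sketch below each "network" is a least-squares fit whose features are a polynomial in x plus the previous prediction and simple nonlinear functions of it. The stand-in models, the supervised data loss (no physics residual), and the function names are all assumptions, not the paper's method. Because each stage's feature set contains the previous prediction, the training error cannot increase from stage to stage.

```python
import numpy as np


def stage_features(x: np.ndarray, prev, degree: int) -> np.ndarray:
    """Polynomial features in x, plus the previous stage's output if there is one."""
    poly = np.vander(x, degree + 1, increasing=True)  # columns 1, x, ..., x^degree
    if prev is None:
        return poly
    # The low-fidelity prediction enters as extra input features.
    return np.column_stack([poly, prev, prev * x, prev ** 2])


def stacked_fit(x: np.ndarray, y: np.ndarray, degrees):
    """Fit a chain of stages; returns the final prediction and each stage's RMSE."""
    prev, errors = None, []
    for d in degrees:
        A = stage_features(x, prev, d)
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        prev = A @ coef
        errors.append(float(np.sqrt(np.mean((prev - y) ** 2))))
    return prev, errors
```

Fitting `np.sin(3 * x)` on `np.linspace(-1, 1, 200)` with `degrees=[1, 2, 3]` gives errors that shrink from stage to stage. Each stage stays small because it only has to correct what the previous one produced, which mirrors the abstract's claim about reducing network size.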

Topology-Matching Normalizing Flows for Out-of-Distribution Detection in Robot Learning

  • paper_url: http://arxiv.org/abs/2311.06481
  • repo_url: None
  • paper_authors: Jianxiang Feng, Jongseok Lee, Simon Geisler, Stephan Gunnemann, Rudolph Triebel
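  • for: Enabling reliable real-world deployment of autonomous robots through out-of-distribution (OOD) detection.
  • methods: OOD detection via density estimation with Normalizing Flows (NFs). Prior NF work tries to match a complex target distribution with naive base distributions, causing a topological mismatch. This work instead uses an expressive class-conditional base distribution, trained with an information-theoretic objective, to match the required topology.
  • results: Superior results on density estimation and 2D object detection benchmarks compared with extensive baselines, and a successful real-robot deployment.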
    Abstract To facilitate reliable deployments of autonomous robots in the real world, Out-of-Distribution (OOD) detection capabilities are often required. A powerful approach for OOD detection is based on density estimation with Normalizing Flows (NFs). However, we find that prior work with NFs attempts to match the complex target distribution topologically with naive base distributions leading to adverse implications. In this work, we circumvent this topological mismatch using an expressive class-conditional base distribution trained with an information-theoretic objective to match the required topology. The proposed method enjoys the merits of wide compatibility with existing learned models without any performance degradation and minimum computation overhead while enhancing OOD detection capabilities. We demonstrate superior results in density estimation and 2D object detection benchmarks in comparison with extensive baselines. Moreover, we showcase the applicability of the method with a real-robot deployment.
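Normalizing flows score inputs with the change-of-variables formula log p(x) = log p_base(f(x)) + log|det ∂f/∂x|, and the choice of base distribution controls which topologies the flow can represent. In the sketch below, the flow is a single elementwise affine layer and the base is a two-component Gaussian mixture with one component per class. Both are simplifying stand-ins, not the paper's architecture or its information-theoretic training objective. A low log-likelihood is then the OOD signal.

```python
import numpy as np


def gaussian_log_prob(z: np.ndarray, mean: np.ndarray) -> np.ndarray:
    """Row-wise log-density of an isotropic unit-variance Gaussian."""
    d = z.shape[-1]
    return -0.5 * np.sum((z - mean) ** 2, axis=-1) - 0.5 * d * np.log(2 * np.pi)


def mixture_log_prob(z: np.ndarray, means, weights) -> np.ndarray:
    """Log-density of a Gaussian mixture base with one component per class."""
    comps = np.stack([np.log(w) + gaussian_log_prob(z, m) for m, w in zip(means, weights)], axis=-1)
    top = comps.max(axis=-1, keepdims=True)  # log-sum-exp for numerical stability
    return (top + np.log(np.exp(comps - top).sum(axis=-1, keepdims=True)))[..., 0]


def flow_log_prob(x, shift, log_scale, means, weights) -> np.ndarray:
    """Change of variables for the elementwise affine flow z = (x - shift) * exp(log_scale)."""
    z = (x - shift) * np.exp(log_scale)
    return mixture_log_prob(z, means, weights) + np.sum(log_scale)  # + log|det dz/dx|
```

A single unimodal Gaussian base would have to stretch one blob over both class clusters. The per-class mixture already has the two-cluster topology of the data, so points between or far from the clusters receive low likelihood. In practice, the OOD threshold would be chosen on held-out in-distribution data.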
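The following sketch shows how a class-conditional base distribution can produce an OOD score. It is not the authors' implementation, and it rests on several assumptions. The trained flow has already mapped an input to a latent vector `z` and reported its log-Jacobian-determinant. The base distribution is a mixture of diagonal Gaussians with one component per class, weighted by class priors. The score is the negative log-likelihood from the change-of-variables formula. The information-theoretic training objective is not shown, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def diag_gaussian_logpdf(z: Sequence[float], mean: Sequence[float], log_std: Sequence[float]) -> float:
    """Log-density of z under a Gaussian with diagonal covariance."""
    total = 0.0
    for zi, mi, ls in zip(z, mean, log_std):
        std = math.exp(ls)
        total += -0.5 * ((zi - mi) / std) ** 2 - ls - 0.5 * math.log(2 * math.pi)
    return total


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mixture_base_logpdf(z: Sequence[float], means: List[List[float]],
                        log_stds: List[List[float]], weights: Sequence[float]) -> float:
    """Log-density under a class-conditional base: one Gaussian component per class (assumed form)."""
    terms = [math.log(w) + diag_gaussian_logpdf(z, m, s)
             for m, s, w in zip(means, log_stds, weights)]
    return logsumexp(terms)


def ood_score(z: Sequence[float], log_det_jacobian: float, means: List[List[float]],
              log_stds: List[List[float]], weights: Sequence[float]) -> float:
    """Negative flow log-likelihood; larger values suggest an out-of-distribution input.

    Change of variables: log p(x) = log p_base(f(x)) + log|det df/dx|.
    """
    return -(mixture_base_logpdf(z, means, log_stds, weights) + log_det_jacobian)
```

With two classes centred at (-3, 0) and (3, 0), a latent point near either centre gets a far lower score than a point far from both. Thresholding the score therefore flags likely OOD inputs.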

Adversarial Fine-tuning using Generated Respiratory Sound to Address Class Imbalance

  • paper_url: http://arxiv.org/abs/2311.06480
  • repo_url: https://github.com/kaen2891/adversarial_fine-tuning_using_generated_respiratory_sound
  • paper_authors: June-Woo Kim, Chihyeon Yoon, Miika Toikkanen, Sangmin Bae, Ho-Young Jung
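  • for: Improving respiratory sound classification under class imbalance, especially for minority classes.
  • methods: Augmenting imbalanced respiratory sound data with an audio diffusion model used as a conditional neural vocoder, then adversarially fine-tuning the classifier to align features of synthetic and real samples.
  • results: On the ICBHI dataset, adversarial fine-tuning improves classification (+2.24% ICBHI Score over the baseline) and raises minority-class accuracy by up to 26.58%, whereas conventional augmentation alone degrades performance.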
    Abstract Deep generative models have emerged as a promising approach in the medical image domain to address data scarcity. However, their use for sequential data like respiratory sounds is less explored. In this work, we propose a straightforward approach to augment imbalanced respiratory sound data using an audio diffusion model as a conditional neural vocoder. We also demonstrate a simple yet effective adversarial fine-tuning method to align features between the synthetic and real respiratory sound samples to improve respiratory sound classification performance. Our experimental results on the ICBHI dataset demonstrate that the proposed adversarial fine-tuning is effective, whereas using only the conventional augmentation method degrades performance. Moreover, our method outperforms the baseline by 2.24% on the ICBHI Score and improves minority-class accuracy by up to 26.58%. The code is provided as supplementary material at https://github.com/kaen2891/adversarial_fine-tuning_using_generated_respiratory_sound.
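The feature alignment in the adversarial fine-tuning step can be summarised with a generic domain-adversarial objective. The abstract does not state the paper's loss, so this is a sketch under assumptions. It follows the standard gradient-reversal (DANN-style) formulation, not necessarily the authors' method. The symbols are illustrative: feature extractor $F$, classifier $C$, discriminator $D$ predicting the origin $d$ (real or synthetic), and weight $\lambda$.

```latex
\min_{\theta_F,\,\theta_C}\;\max_{\theta_D}\;
\mathbb{E}_{(x,y)}\!\left[\ell_{\mathrm{cls}}\big(C(F(x)),\,y\big)\right]
\;-\;\lambda\,\mathbb{E}_{(x,d)}\!\left[\ell_{\mathrm{dom}}\big(D(F(x)),\,d\big)\right]
```

The discriminator maximizes this objective by getting better at telling real features from synthetic ones. The feature extractor minimizes it, which favours features that classify well and whose origin the discriminator cannot separate.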

Online Continual Learning via Logit Adjusted Softmax

  • paper_url: http://arxiv.org/abs/2311.06460
  • repo_url: https://github.com/k1nght/online_cl_logit_adjusted_softmax
  • paper_authors: Zhehao Huang, Tao Li, Chenhe Yuan, Yingwen Wu, Xiaolin Huang
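  • for: Online continual learning, where a model learns from a non-stationary data stream and must avoid catastrophic forgetting and prediction bias toward recently learned classes.
  • methods: A theoretical analysis showing that inter-class imbalance is entirely attributable to imbalanced class priors, and that the function learned from intra-class intrinsic distributions is the Bayes-optimal classifier. Adjusting model logits during training (Logit Adjusted Softmax) therefore resists prior class bias and pursues the Bayes optimum.
  • results: The method mitigates inter-class imbalance in both class-incremental and realistic general setups with little additional computational cost. It improves the best baseline by 4.6% on CIFAR10.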
    Abstract Online continual learning is a challenging problem where models must learn from a non-stationary data stream while avoiding catastrophic forgetting. Inter-class imbalance during training has been identified as a major cause of forgetting, leading to model prediction bias towards recently learned classes. In this paper, we show theoretically that inter-class imbalance is entirely attributable to imbalanced class priors, and that the function learned from intra-class intrinsic distributions is the Bayes-optimal classifier. Accordingly, we show that a simple adjustment of model logits during training can effectively resist prior class bias and pursue the corresponding Bayes optimum. Our proposed method, Logit Adjusted Softmax, can mitigate the impact of inter-class imbalance not only in class-incremental but also in realistic general setups, with little additional computational cost. We evaluate our approach on various benchmarks and demonstrate significant performance improvements compared to prior art. For example, our approach improves the best baseline by 4.6% on CIFAR10.
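Below is a minimal sketch of a logit-adjusted cross-entropy: the training logits are shifted by `tau * log(prior)` before the softmax. The abstract does not say how priors are estimated or which temperature is used. Estimating priors from running label counts in the stream, and the `tau` and `eps` parameters, are assumptions for this sketch. The class name `LogitAdjustedCE` is hypothetical.

```python
import math
from typing import List


class LogitAdjustedCE:
    """Cross-entropy on logits shifted by tau * log(class prior).

    Assumption: priors come from running label counts over the stream seen so far.
    Classes not yet seen get a floor of eps so that log(prior) stays finite.
    """

    def __init__(self, num_classes: int, tau: float = 1.0, eps: float = 1e-8):
        self.counts = [0] * num_classes
        self.tau = tau
        self.eps = eps

    def update_priors(self, labels: List[int]) -> None:
        # Accumulate label counts from the incoming mini-batch.
        for y in labels:
            self.counts[y] += 1

    def priors(self) -> List[float]:
        total = sum(self.counts)
        if total == 0:
            # No data yet: fall back to a uniform prior.
            return [1.0 / len(self.counts)] * len(self.counts)
        return [max(c / total, self.eps) for c in self.counts]

    def loss(self, logits: List[float], label: int) -> float:
        # Training-time adjustment: add tau * log(prior) to each logit.
        adjusted = [z + self.tau * math.log(p) for z, p in zip(logits, self.priors())]
        m = max(adjusted)
        log_norm = m + math.log(sum(math.exp(a - m) for a in adjusted))
        return log_norm - adjusted[label]
```

At test time the raw, unadjusted logits are used for prediction. Because majority classes already received the log-prior offset during training, the model does not need to bias its own logits toward them. With a uniform prior the offset is the same for every class and cancels, so the loss reduces to plain cross-entropy.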

Asymmetric Contrastive Multimodal Learning for Advancing Chemical Understanding

  • paper_url: http://arxiv.org/abs/2311.06456
  • repo_url: None
  • paper_authors: Hao Xu, Yifei Wang, Yunrui Li, Pengyu Hong
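  • for: Proposing a new multimodal deep learning approach to advance chemical research and applications.
  • methods: Asymmetric contrastive learning that transfers information from various chemical modalities into molecular graph representations. It combines pre-trained chemical unimodal encoders with a shallow graph encoder.
  • results: On tasks such as isomer discrimination and uncovering chemical properties relevant to drug discovery, ACML improves the interpretability of learned representations and the expressive power of graph neural networks.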
    Abstract The versatility of multimodal deep learning holds tremendous promise for advancing scientific research and practical applications. As this field continues to evolve, the collective power of cross-modal analysis promises to drive transformative innovations, leading us to new frontiers in chemical understanding and discovery. Hence, we introduce Asymmetric Contrastive Multimodal Learning (ACML) as a novel approach tailored for molecules, showcasing its potential to advance the field of chemistry. ACML harnesses the power of effective asymmetric contrastive learning to seamlessly transfer information from various chemical modalities to molecular graph representations. By combining pre-trained chemical unimodal encoders and a shallow-designed graph encoder, ACML facilitates the assimilation of coordinated chemical semantics from different modalities, leading to comprehensive representation learning with efficient training. This innovative framework enhances the interpretability of learned representations and bolsters the expressive power of graph neural networks. Through practical tasks such as isomer discrimination and uncovering crucial chemical properties for drug discovery, ACML exhibits its capability to revolutionize chemical research and applications, providing a deeper understanding of chemical semantics of different modalities.
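This sketch shows one way to read "asymmetric" contrastive learning: a one-directional InfoNCE loss. It pulls each molecule's graph embedding toward the embedding of the same molecule from a frozen, pre-trained unimodal encoder, and pushes it away from the other molecules in the batch. That reading is an assumption, since the abstract does not give the loss. The function names and the `temperature` value are hypothetical, and the encoders themselves are omitted.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def graph_to_modality_infonce(graph_emb: List[Vector], modality_emb: List[Vector],
                              temperature: float = 0.1) -> float:
    """One-directional InfoNCE: row i of graph_emb should match row i of modality_emb.

    Assumption: modality_emb comes from a frozen pre-trained unimodal encoder, so in a
    real training loop only the (shallow) graph encoder would receive gradients.
    """
    n = len(graph_emb)
    total = 0.0
    for i in range(n):
        # Similarities of graph i to every modality embedding in the batch.
        sims = [cosine(graph_emb[i], modality_emb[j]) / temperature for j in range(n)]
        m = max(sims)
        log_norm = m + math.log(sum(math.exp(s - m) for s in sims))
        total += log_norm - sims[i]  # -log softmax probability of the positive pair
    return total / n
```

Correctly paired embeddings give a loss near zero. Swapping the pairs makes the loss large, which is the signal the graph encoder would learn from.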

A Saliency-based Clustering Framework for Identifying Aberrant Predictions

  • paper_url: http://arxiv.org/abs/2311.06454
  • repo_url: None
  • paper_authors: Aina Tersol Montserrat, Alexander R. Loftus, Yael Daihes
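  • for: Improving the reliability and trustworthiness of machine learning classifiers in biomedical settings, where the ground truth is often inherently uncertain.
  • methods: Introducing the concept of aberrant predictions and proposing a new, efficient training methodology that both reduces the misclassification rate and identifies aberrant predictions.
  • results: Applied to veterinary radiology, the framework achieves a 20% increase in precision.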
    Abstract In machine learning, classification tasks serve as the cornerstone of a wide range of real-world applications. Reliable, trustworthy classification is particularly difficult in biomedical settings, where the ground truth is often inherently uncertain and labeling relies heavily on human expertise. Traditional metrics such as precision and recall, while valuable, are insufficient for capturing the nuances of these ambiguous scenarios. Here we introduce the concept of aberrant predictions, emphasizing that the nature of classification errors is as critical as their frequency. We propose a novel, efficient training methodology aimed at both reducing the misclassification rate and discerning aberrant predictions. Our framework demonstrates a substantial improvement in model performance, achieving a 20% increase in precision. We apply this methodology to veterinary radiology, a domain where the stakes are high but which has been studied far less extensively than human medicine. By focusing on the identification and mitigation of aberrant predictions, we enhance the utility and trustworthiness of machine learning classifiers in high-stakes, real-world scenarios, including new applications in the veterinary world.

Mitigating Pooling Bias in E-commerce Search via False Negative Estimation

  • paper_url: http://arxiv.org/abs/2311.06444
  • repo_url: None
  • paper_authors: Xiaochen Wang, Xiao Xiao, Ruhan Zhang, Xuan Zhang, Taesik Na, Tejaswi Tenneti, Haixun Wang, Fenglong Ma
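  • for: Efficient and accurate product relevance assessment, which is critical for user experience and business success in e-commerce search.
  • methods: Bias-mitigating Hard Negative Sampling (BHNS), a negative sampling strategy that identifies and adjusts for false negatives to mitigate pooling bias. It builds on the authors' False Negative Estimation algorithm.
  • results: Experiments in the Instacart search setting confirm that BHNS is effective for practical e-commerce use. Comparative analyses on a public dataset suggest it is domain-agnostic and applicable to diverse settings.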
    Abstract Efficient and accurate product relevance assessment is critical for user experiences and business success. Training a proficient relevance assessment model requires high-quality query-product pairs, often obtained through negative sampling strategies. Unfortunately, current methods introduce pooling bias by mistakenly sampling false negatives, diminishing performance and business impact. To address this, we present Bias-mitigating Hard Negative Sampling (BHNS), a novel negative sampling strategy tailored to identify and adjust for false negatives, building upon our original False Negative Estimation algorithm. Our experiments in the Instacart search setting confirm that BHNS is effective for practical e-commerce use. Furthermore, comparative analyses on a public dataset showcase its domain-agnostic potential for diverse applications.
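The idea of hard negative sampling that avoids false negatives can be sketched as follows. This is a simplified illustration under assumptions, not the BHNS algorithm: the abstract does not describe the False Negative Estimation method. Here each unlabeled candidate simply carries a `p_false_negative` value from some hypothetical estimator. Candidates above a threshold are skipped before the highest-scoring ("hard") ones are chosen. The names `Candidate`, `sample_hard_negatives` and `max_fn_prob` are illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    product_id: str
    score: float             # current relevance model score for the (query, product) pair
    p_false_negative: float  # hypothetical estimate that this unlabeled product is actually relevant


def sample_hard_negatives(candidates: List[Candidate], k: int,
                          max_fn_prob: float = 0.5) -> List[str]:
    """Return the k highest-scoring candidates, skipping likely false negatives.

    Plain hard negative sampling takes the top-k scores directly. That tends to pick
    relevant-but-unlabeled products, which is the pooling bias the paper targets.
    """
    eligible = [c for c in candidates if c.p_false_negative <= max_fn_prob]
    eligible.sort(key=lambda c: c.score, reverse=True)
    return [c.product_id for c in eligible[:k]]
```

Setting `max_fn_prob=1.0` disables the filter and recovers plain top-k hard negative sampling. Comparing the two selections on the same candidates shows which likely false negatives the filter removes. A soft variant could down-weight candidates by `1 - p_false_negative` instead of dropping them, which is again an illustrative choice.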