cs.LG - 2023-09-12

Bregman Graph Neural Network

  • paper_url: http://arxiv.org/abs/2309.06645
  • repo_url: https://github.com/jiayuzhai1207/bregmangnn
  • paper_authors: Jiayu Zhai, Lequan Lin, Dai Shi, Junbin Gao
  • for: This work proposes a bilevel optimization framework for graph neural networks (GNNs) based on Bregman distance, aimed at mitigating the over-smoothing problem in node classification tasks.
  • methods: A new GNN layer is designed from the notion of Bregman distance, and its effectiveness is validated experimentally (the Bregman distance itself is illustrated in the sketch after this entry).
  • results: Compared with their original counterparts, Bregman GNN layers better mitigate over-smoothing and maintain robust learning accuracy even when the number of layers is large.
    Abstract Numerous recent research on graph neural networks (GNNs) has focused on formulating GNN architectures as an optimization problem with the smoothness assumption. However, in node classification tasks, the smoothing effect induced by GNNs tends to assimilate representations and over-homogenize labels of connected nodes, leading to adverse effects such as over-smoothing and misclassification. In this paper, we propose a novel bilevel optimization framework for GNNs inspired by the notion of Bregman distance. We demonstrate that the GNN layer proposed accordingly can effectively mitigate the over-smoothing issue by introducing a mechanism reminiscent of the "skip connection". We validate our theoretical results through comprehensive empirical studies in which Bregman-enhanced GNNs outperform their original counterparts in both homophilic and heterophilic graphs. Furthermore, our experiments also show that Bregman GNNs can produce more robust learning accuracy even when the number of layers is high, suggesting the effectiveness of the proposed method in alleviating the over-smoothing issue.
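
For reference, the Bregman distance underlying this framework is $D_\varphi(x, y) = \varphi(x) - \varphi(y) - \langle \nabla\varphi(y), x - y \rangle$. The snippet below is a minimal NumPy illustration of that definition only, not the paper's GNN layer; the choice $\varphi(x) = \tfrac{1}{2}\|x\|^2$, which recovers half the squared Euclidean distance, is an illustrative assumption.

```python
import numpy as np

def bregman_distance(x, y, phi, grad_phi):
    """Bregman distance D_phi(x, y) = phi(x) - phi(y) - <grad_phi(y), x - y>."""
    return phi(x) - phi(y) - np.dot(grad_phi(y), x - y)

# Example: phi(x) = ||x||^2 / 2 recovers half the squared Euclidean distance.
phi = lambda v: 0.5 * np.dot(v, v)
grad_phi = lambda v: v

x, y = np.array([1.0, 2.0]), np.array([0.0, 1.0])
print(bregman_distance(x, y, phi, grad_phi))  # 1.0 == 0.5 * ||x - y||^2
```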

Audio-Based Classification of Respiratory Diseases using Advanced Signal Processing and Machine Learning for Assistive Diagnosis Support

  • paper_url: http://arxiv.org/abs/2309.07183
  • repo_url: None
  • paper_authors: Constantino Álvarez Casado, Manuel Lage Cañellas, Matteo Pedone, Xiaoting Wu, Miguel Bordallo López
  • for: This work aims to advance rapid auscultation-based screening techniques to support the diagnosis of respiratory diseases.
  • methods: It uses one of the largest publicly available medical databases of respiratory sounds, combining Empirical Mode Decomposition (EMD) and spectral analysis to extract physiologically relevant biosignals from the acoustic data (see the sketch after this entry).
  • results: A binary classifier reaches 87% balanced accuracy in separating healthy from diseased individuals, and a six-class model diagnoses specific respiratory conditions such as pneumonia and chronic obstructive pulmonary disease (COPD). The study also introduces regression models for age and body mass index (BMI) as well as a gender classification model, all based solely on the acoustic data.
    Abstract In global healthcare, respiratory diseases are a leading cause of mortality, underscoring the need for rapid and accurate diagnostics. To advance rapid screening techniques via auscultation, our research focuses on employing one of the largest publicly available medical database of respiratory sounds to train multiple machine learning models able to classify different health conditions. Our method combines Empirical Mode Decomposition (EMD) and spectral analysis to extract physiologically relevant biosignals from acoustic data, closely tied to cardiovascular and respiratory patterns, making our approach apart in its departure from conventional audio feature extraction practices. We use Power Spectral Density analysis and filtering techniques to select Intrinsic Mode Functions (IMFs) strongly correlated with underlying physiological phenomena. These biosignals undergo a comprehensive feature extraction process for predictive modeling. Initially, we deploy a binary classification model that demonstrates a balanced accuracy of 87% in distinguishing between healthy and diseased individuals. Subsequently, we employ a six-class classification model that achieves a balanced accuracy of 72% in diagnosing specific respiratory conditions like pneumonia and chronic obstructive pulmonary disease (COPD). For the first time, we also introduce regression models that estimate age and body mass index (BMI) based solely on acoustic data, as well as a model for gender classification. Our findings underscore the potential of this approach to significantly enhance assistive and remote diagnostic capabilities.
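
A minimal sketch of the kind of EMD-plus-spectral-analysis pipeline described above, assuming the PyEMD and SciPy packages; the 0.1-2 Hz band used to keep IMFs with cardiorespiratory-range content is an illustrative choice, not the paper's exact selection criterion.

```python
import numpy as np
from PyEMD import EMD               # pip install EMD-signal
from scipy.signal import welch

def select_physiological_imfs(signal, fs, band=(0.1, 2.0)):
    """Decompose a recording with EMD and keep IMFs whose dominant
    power-spectral-density peak lies in a physiologically plausible band."""
    imfs = EMD()(signal)            # rows are Intrinsic Mode Functions
    selected = []
    for imf in imfs:
        freqs, psd = welch(imf, fs=fs, nperseg=min(1024, len(imf)))
        if band[0] <= freqs[np.argmax(psd)] <= band[1]:
            selected.append(imf)
    return np.array(selected)

# Toy usage: a 0.25 Hz "breathing" component buried in noise, sampled at 100 Hz.
fs = 100
t = np.arange(0, 60, 1 / fs)
x = np.sin(2 * np.pi * 0.25 * t) + 0.3 * np.random.randn(len(t))
print(select_physiological_imfs(x, fs).shape)
```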

Adapt and Diffuse: Sample-adaptive Reconstruction via Latent Diffusion Models

  • paper_url: http://arxiv.org/abs/2309.06642
  • repo_url: None
  • paper_authors: Zalan Fabian, Berk Tinaz, Mahdi Soltanolkotabi
  • for: This work aims to make inverse problem solvers more efficient by estimating the degradation severity of each sample and adapting the amount of computation accordingly.
  • methods: It introduces severity encoding, which estimates degradation severity in the latent space of an autoencoder, and a latent diffusion reconstruction method that uses the predicted severity to fine-tune the reverse diffusion sampling trajectory, yielding sample-adaptive inference times.
  • results: Experiments on linear and nonlinear inverse problems show performance comparable to state-of-the-art diffusion-based techniques, with significant improvements in computational efficiency.
    Abstract Inverse problems arise in a multitude of applications, where the goal is to recover a clean signal from noisy and possibly (non)linear observations. The difficulty of a reconstruction problem depends on multiple factors, such as the structure of the ground truth signal, the severity of the degradation, the implicit bias of the reconstruction model and the complex interactions between the above factors. This results in natural sample-by-sample variation in the difficulty of a reconstruction task, which is often overlooked by contemporary techniques. Recently, diffusion-based inverse problem solvers have established new state-of-the-art in various reconstruction tasks. However, they have the drawback of being computationally prohibitive. Our key observation in this paper is that most existing solvers lack the ability to adapt their compute power to the difficulty of the reconstruction task, resulting in long inference times, subpar performance and wasteful resource allocation. We propose a novel method that we call severity encoding, to estimate the degradation severity of noisy, degraded signals in the latent space of an autoencoder. We show that the estimated severity has strong correlation with the true corruption level and can give useful hints at the difficulty of reconstruction problems on a sample-by-sample basis. Furthermore, we propose a reconstruction method based on latent diffusion models that leverages the predicted degradation severities to fine-tune the reverse diffusion sampling trajectory and thus achieve sample-adaptive inference times. We utilize latent diffusion posterior sampling to maintain data consistency with observations. We perform experiments on both linear and nonlinear inverse problems and demonstrate that our technique achieves performance comparable to state-of-the-art diffusion-based techniques, with significant improvements in computational efficiency.

PCN: A Deep Learning Approach to Jet Tagging Utilizing Novel Graph Construction Methods and Chebyshev Graph Convolutions

  • paper_url: http://arxiv.org/abs/2309.08630
  • repo_url: https://github.com/yvsemlani/pcn-jet-tagging
  • paper_authors: Yash Semlani, Mihir Relan, Krithik Ramesh
  • for: The goal of this paper is to improve jet tagging in high-energy physics experiments, enabling searches for physics beyond the Standard Model.
  • methods: It uses deep learning to uncover hidden patterns in complex collision data, proposing a graph-based representation of jets and a graph neural network called Particle Chebyshev Network (PCN) built on Chebyshev graph convolutions (ChebConv); a toy ChebConv classifier is sketched after this entry.
  • results: Experiments show that PCN achieves a substantial improvement in tagging accuracy over existing taggers, opening the door to further studies of graph-based jet representations and ChebConv layers in high-energy physics.
    Abstract Jet tagging is a classification problem in high-energy physics experiments that aims to identify the collimated sprays of subatomic particles, jets, from particle collisions and tag them to their emitter particle. Advances in jet tagging present opportunities for searches of new physics beyond the Standard Model. Current approaches use deep learning to uncover hidden patterns in complex collision data. However, the representation of jets as inputs to a deep learning model have been varied, and often, informative features are withheld from models. In this study, we propose a graph-based representation of a jet that encodes the most information possible. To learn best from this representation, we design Particle Chebyshev Network (PCN), a graph neural network (GNN) using Chebyshev graph convolutions (ChebConv). ChebConv has been demonstrated as an effective alternative to classical graph convolutions in GNNs and has yet to be explored in jet tagging. PCN achieves a substantial improvement in accuracy over existing taggers and opens the door to future studies into graph-based representations of jets and ChebConv layers in high-energy physics experiments. Code is available at https://github.com/YVSemlani/PCN-Jet-Tagging.
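
As a rough illustration of a ChebConv-based jet classifier, a minimal PyTorch Geometric model might look like the following. This is not the actual PCN architecture (whose graph construction and layer sizes are described in the paper and repository); the feature dimension, hidden width, and Chebyshev order K are placeholder choices.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import ChebConv, global_mean_pool

class JetChebNet(torch.nn.Module):
    """Toy ChebConv graph classifier: nodes are jet constituents, graphs are jets."""
    def __init__(self, num_features=7, hidden=64, num_classes=2, K=3):
        super().__init__()
        self.conv1 = ChebConv(num_features, hidden, K=K)
        self.conv2 = ChebConv(hidden, hidden, K=K)
        self.head = torch.nn.Linear(hidden, num_classes)

    def forward(self, x, edge_index, batch):
        x = F.relu(self.conv1(x, edge_index))
        x = F.relu(self.conv2(x, edge_index))
        x = global_mean_pool(x, batch)        # one embedding per jet
        return self.head(x)
```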

Sleep Stage Classification Using a Pre-trained Deep Learning Model

  • paper_url: http://arxiv.org/abs/2309.07182
  • repo_url: None
  • paper_authors: Hassan Ardeshir, Mohammad Araghi
  • for: This work develops a machine-learning model for sleep stage classification, to support the diagnosis of sleep disorders, the monitoring of treatment effectiveness, and the study of how sleep stages relate to various health conditions.
  • methods: The proposed model, "EEGMobile", builds on pre-trained models and learns from electroencephalogram (EEG) spectrograms of brain signals (a generic fine-tuning sketch follows this entry).
  • results: On the publicly available "Sleep-EDF20" dataset the model reaches 86.97% accuracy, outperforming other published models, and achieves 56.4% accuracy in stage N1, which is also better than competing models. These findings suggest the model's potential to improve the treatment of sleep disorders.
    Abstract One of the common human diseases is sleep disorders. The classification of sleep stages plays a fundamental role in diagnosing sleep disorders, monitoring treatment effectiveness, and understanding the relationship between sleep stages and various health conditions. A precise and efficient classification of these stages can significantly enhance our understanding of sleep-related phenomena and ultimately lead to improved health outcomes and disease treatment. Models others propose are often time-consuming and lack sufficient accuracy, especially in stage N1. The main objective of this research is to present a machine-learning model called "EEGMobile". This model utilizes pre-trained models and learns from electroencephalogram (EEG) spectrograms of brain signals. The model achieved an accuracy of 86.97% on a publicly available dataset named "Sleep-EDF20", outperforming other models proposed by different researchers. Moreover, it recorded an accuracy of 56.4% in stage N1, which is better than other models. These findings demonstrate that this model has the potential to achieve better results for the treatment of this disease.
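
The name "EEGMobile" suggests a MobileNet-style backbone applied to EEG spectrogram images. As an illustration only (the paper's exact backbone, input preprocessing, and head are assumptions here), fine-tuning an ImageNet-pretrained torchvision MobileNetV2 for five sleep stages could be sketched as follows.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_STAGES = 5  # W, N1, N2, N3, REM

# Start from an ImageNet-pretrained backbone and replace the classifier head.
model = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.DEFAULT)
model.classifier[1] = nn.Linear(model.last_channel, NUM_STAGES)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def train_step(spectrograms, labels):
    """spectrograms: (batch, 3, H, W) EEG spectrogram images; labels: (batch,)."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(spectrograms), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```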

$G$-Mapper: Learning a Cover in the Mapper Construction

  • paper_url: http://arxiv.org/abs/2309.06634
  • repo_url: None
  • paper_authors: Enrique Alvarado, Robin Belton, Emily Fischer, Kang-Ju Lee, Sourabh Palande, Sarah Percival, Emilie Purvine
  • for: This paper addresses how to select the cover in the Mapper construction from topological data analysis (TDA), so that the resulting Mapper graph retains the essence of the data.
  • methods: It proposes an algorithm based on $G$-means clustering that repeatedly splits a cover according to the Anderson-Darling test for normality, using a Gaussian mixture model to choose the cover based on the distribution of the data (the splitting criterion is sketched after this entry).
  • results: Experiments on synthetic and real-world datasets show that the algorithm generates covers for which the Mapper graphs retain the essence of the datasets.
    Abstract The Mapper algorithm is a visualization technique in topological data analysis (TDA) that outputs a graph reflecting the structure of a given dataset. The Mapper algorithm requires tuning several parameters in order to generate a "nice" Mapper graph. The paper focuses on selecting the cover parameter. We present an algorithm that optimizes the cover of a Mapper graph by splitting a cover repeatedly according to a statistical test for normality. Our algorithm is based on $G$-means clustering which searches for the optimal number of clusters in $k$-means by conducting iteratively the Anderson-Darling test. Our splitting procedure employs a Gaussian mixture model in order to choose carefully the cover based on the distribution of a given data. Experiments for synthetic and real-world datasets demonstrate that our algorithm generates covers so that the Mapper graphs retain the essence of the datasets.
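
For intuition, the $G$-means splitting criterion the paper builds on can be sketched as follows: tentatively split a cluster into two with $k$-means, project the points onto the axis joining the two child centers, and keep the split only if an Anderson-Darling test rejects normality of the projection. This is a generic $G$-means sketch, not the paper's cover-splitting procedure itself.

```python
import numpy as np
from scipy.stats import anderson
from sklearn.cluster import KMeans

def should_split(points, significance_index=2):
    """G-means style test: split a cluster if its 1-D projection onto the axis
    joining the two tentative child centers fails an Anderson-Darling normality test."""
    if len(points) < 8:
        return False
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
    c0, c1 = km.cluster_centers_
    v = c0 - c1
    proj = points @ v / (v @ v)          # project onto the splitting direction
    result = anderson(proj, dist="norm")
    # significance_index=2 corresponds to the 5% critical value in scipy's output
    return result.statistic > result.critical_values[significance_index]

# Toy usage: a clearly bimodal cluster should be split, a Gaussian one should not.
rng = np.random.default_rng(0)
bimodal = np.vstack([rng.normal(-3, 1, (200, 2)), rng.normal(3, 1, (200, 2))])
gaussian = rng.normal(0, 1, (400, 2))
print(should_split(bimodal), should_split(gaussian))   # expected: True False
```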

Epistemic Modeling Uncertainty of Rapid Neural Network Ensembles for Adaptive Learning

  • paper_url: http://arxiv.org/abs/2309.06628
  • repo_url: https://github.com/atticusbeachy/multi-fidelity-nn-ensemble-examples
  • paper_authors: Atticus Beachy, Harok Bae, Jose Camberos, Ramana Grandhi
  • for: This work develops emulator embedded neural networks, a type of physics informed neural network, for efficient design exploration of aerospace engineering systems.
  • methods: The approach leverages multi-fidelity data sources and trains multiple model realizations with different random initializations; the ensemble of realizations is used to assess the epistemic modeling uncertainty caused by a lack of training samples.
  • results: Using the rapid neural network paradigm, the proposed emulator embedded neural network trains near-instantaneously, typically without loss of prediction accuracy (the last-layer linear-regression idea is sketched below). The method is demonstrated on multiple analytical examples and on a flight parameter study of a generic hypersonic vehicle.
    Abstract Emulator embedded neural networks, which are a type of physics informed neural network, leverage multi-fidelity data sources for efficient design exploration of aerospace engineering systems. Multiple realizations of the neural network models are trained with different random initializations. The ensemble of model realizations is used to assess epistemic modeling uncertainty caused due to lack of training samples. This uncertainty estimation is crucial information for successful goal-oriented adaptive learning in an aerospace system design exploration. However, the costs of training the ensemble models often become prohibitive and pose a computational challenge, especially when the models are not trained in parallel during adaptive learning. In this work, a new type of emulator embedded neural network is presented using the rapid neural network paradigm. Unlike the conventional neural network training that optimizes the weights and biases of all the network layers by using gradient-based backpropagation, rapid neural network training adjusts only the last layer connection weights by applying a linear regression technique. It is found that the proposed emulator embedded neural network trains near-instantaneously, typically without loss of prediction accuracy. The proposed method is demonstrated on multiple analytical examples, as well as an aerospace flight parameter study of a generic hypersonic vehicle.
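
The abstract's key idea — adjusting only the last-layer weights by linear regression rather than training all layers by backpropagation — can be illustrated with a small random-feature network whose output weights are solved by least squares. The network size and the frozen random hidden weights below are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_rapid_network(X, y, hidden=200):
    """Fix random hidden-layer weights; solve only the output weights by
    ordinary least squares (the 'rapid' training step)."""
    W = rng.normal(size=(X.shape[1], hidden))
    b = rng.normal(size=hidden)
    H = np.tanh(X @ W + b)                       # frozen hidden features
    beta, *_ = np.linalg.lstsq(H, y, rcond=None) # linear regression on last layer
    return W, b, beta

def predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy usage: fit a 1-D nonlinear function near-instantaneously.
X = np.linspace(-2, 2, 400).reshape(-1, 1)
y = np.sin(3 * X[:, 0]) + 0.05 * rng.normal(size=400)
params = fit_rapid_network(X, y)
print(np.mean((predict(X, *params) - y) ** 2))   # small training MSE
```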

A Sequentially Fair Mechanism for Multiple Sensitive Attributes

  • paper_url: http://arxiv.org/abs/2309.06627
  • repo_url: https://github.com/phi-ra/SequentialFairness
  • paper_authors: François Hu, Philipp Ratz, Arthur Charpentier
  • for: The goal of this paper is to eliminate the relationship between sensitive variables and the corresponding scores when several sensitive attributes are present at once.
  • methods: It uses multi-marginal Wasserstein barycenters to extend the standard notion of Strong Demographic Parity to the multi-attribute setting, yielding a closed-form, sequentially fair predictor with a clear interpretation of inter-sensitive-feature correlations (a single-attribute version is sketched after this entry).
  • results: Numerical experiments on synthetic and real datasets show that the approach effectively reduces unfairness in the multi-attribute setting, while the sequential construction makes the interplay between sensitive features explicit.
    Abstract In the standard use case of Algorithmic Fairness, the goal is to eliminate the relationship between a sensitive variable and a corresponding score. Throughout recent years, the scientific community has developed a host of definitions and tools to solve this task, which work well in many practical applications. However, the applicability and effectivity of these tools and definitions becomes less straightfoward in the case of multiple sensitive attributes. To tackle this issue, we propose a sequential framework, which allows to progressively achieve fairness across a set of sensitive features. We accomplish this by leveraging multi-marginal Wasserstein barycenters, which extends the standard notion of Strong Demographic Parity to the case with multiple sensitive characteristics. This method also provides a closed-form solution for the optimal, sequentially fair predictor, permitting a clear interpretation of inter-sensitive feature correlations. Our approach seamlessly extends to approximate fairness, enveloping a framework accommodating the trade-off between risk and unfairness. This extension permits a targeted prioritization of fairness improvements for a specific attribute within a set of sensitive attributes, allowing for a case specific adaptation. A data-driven estimation procedure for the derived solution is developed, and comprehensive numerical experiments are conducted on both synthetic and real datasets. Our empirical findings decisively underscore the practical efficacy of our post-processing approach in fostering fair decision-making.
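
As a single-attribute illustration of the Wasserstein-barycenter idea behind Strong Demographic Parity (the paper applies it sequentially across several sensitive attributes), scores can be repaired by mapping each group's score distribution onto the barycenter of the group-wise distributions. The implementation below is a standard quantile-based sketch, not the paper's estimator.

```python
import numpy as np

def barycenter_repair(scores, groups):
    """Map each group's scores to the (group-size weighted) Wasserstein
    barycenter of the per-group score distributions on the real line."""
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    labels, counts = np.unique(groups, return_counts=True)
    weights = counts / counts.sum()
    repaired = np.empty_like(scores)
    for g in labels:
        mask = groups == g
        # empirical quantile level of each score within its own group
        ranks = scores[mask].argsort().argsort()
        q = (ranks + 0.5) / mask.sum()
        # barycenter quantile function = weighted average of group quantile functions
        repaired[mask] = sum(
            w * np.quantile(scores[groups == h], q)
            for h, w in zip(labels, weights)
        )
    return repaired

# Toy usage: two groups with shifted score distributions.
rng = np.random.default_rng(1)
s = np.concatenate([rng.normal(0.4, 0.1, 500), rng.normal(0.6, 0.1, 500)])
g = np.array([0] * 500 + [1] * 500)
fair = barycenter_repair(s, g)
print(fair[g == 0].mean(), fair[g == 1].mean())  # nearly equal after repair
```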

On the Contraction Coefficient of the Schrödinger Bridge for Stochastic Linear Systems

  • paper_url: http://arxiv.org/abs/2309.06622
  • repo_url: None
  • paper_authors: Alexis M. H. Teter, Yongxin Chen, Abhishek Halder
  • for: This work concerns the Schrödinger bridge problem: steering a given initial state density to another under controlled diffusion and deadline constraints.
  • methods: The problem is solved numerically via contractive fixed point recursions, which can be seen as dynamic versions of the Sinkhorn iterations and, under mild assumptions, solve the Schrödinger system with guaranteed linear convergence (a static Sinkhorn analogue is sketched after this entry).
  • results: The paper derives a priori estimates for the contraction coefficients governing this convergence, gives new geometric and control-theoretic interpretations for them, and points out that preconditioning the endpoint support sets can improve the computation of worst-case contraction coefficients for linear SBPs.
    Abstract Schr\"{o}dinger bridge is a stochastic optimal control problem to steer a given initial state density to another, subject to controlled diffusion and deadline constraints. A popular method to numerically solve the Schr\"{o}dinger bridge problems, in both classical and in the linear system settings, is via contractive fixed point recursions. These recursions can be seen as dynamic versions of the well-known Sinkhorn iterations, and under mild assumptions, they solve the so-called Schr\"{o}dinger systems with guaranteed linear convergence. In this work, we study a priori estimates for the contraction coefficients associated with the convergence of respective Schr\"{o}dinger systems. We provide new geometric and control-theoretic interpretations for the same. Building on these newfound interpretations, we point out the possibility of improved computation for the worst-case contraction coefficients of linear SBPs by preconditioning the endpoint support sets.
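
For readers unfamiliar with the fixed point recursions mentioned above, the static analogue is the classical Sinkhorn iteration, sketched below for an entropically regularized coupling between two discrete marginals; the Schrödinger-bridge recursions studied in the paper are the dynamic counterpart of this contraction.

```python
import numpy as np

def sinkhorn(mu, nu, C, eps=0.1, iters=500):
    """Alternating scaling (Sinkhorn) iterations for the coupling
    P = diag(u) K diag(v) with K = exp(-C / eps), matching marginals mu, nu."""
    K = np.exp(-C / eps)
    u = np.ones_like(mu)
    for _ in range(iters):
        v = nu / (K.T @ u)
        u = mu / (K @ v)
    return np.diag(u) @ K @ np.diag(v)

# Toy usage: two uniform marginals on a 1-D grid with quadratic cost.
n = 50
x = np.linspace(0, 1, n)
C = (x[:, None] - x[None, :]) ** 2
mu = nu = np.full(n, 1.0 / n)
P = sinkhorn(mu, nu, C)
print(P.sum(axis=1)[:3], P.sum(axis=0)[:3])  # both close to the marginals
```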

The Grand Illusion: The Myth of Software Portability and Implications for ML Progress

  • paper_url: http://arxiv.org/abs/2309.07181
  • repo_url: https://github.com/for-ai/portability
  • paper_authors: Fraser Mince, Dzung Dinh, Jonas Kgomo, Neil Thompson, Sara Hooker
  • for: This study examines how portable mainstream machine learning software frameworks are across different hardware types.
  • methods: The authors conduct a large-scale empirical study that tests the portability of popular ML frameworks across different hardware types.
  • results: Frameworks can lose more than 40% of their key functions when ported to other hardware, and even when functions are portable, the slowdown in their performance can be extreme enough to make performance untenable.
    Abstract Pushing the boundaries of machine learning often requires exploring different hardware and software combinations. However, the freedom to experiment across different tooling stacks can be at odds with the drive for efficiency, which has produced increasingly specialized AI hardware and incentivized consolidation around a narrow set of ML frameworks. Exploratory research can be restricted if software and hardware are co-evolving, making it even harder to stray away from mainstream ideas that work well with popular tooling stacks. While this friction increasingly impacts the rate of innovation in machine learning, to our knowledge the lack of portability in tooling has not been quantified. In this work, we ask: How portable are popular ML software frameworks? We conduct a large-scale study of the portability of mainstream ML frameworks across different hardware types. Our findings paint an uncomfortable picture -- frameworks can lose more than 40% of their key functions when ported to other hardware. Worse, even when functions are portable, the slowdown in their performance can be extreme and render performance untenable. Collectively, our results reveal how costly straying from a narrow set of hardware-software combinations can be - and suggest that specialization of hardware impedes innovation in machine learning research.

Unsupervised Learning of Nanoindentation Data to Infer Microstructural Details of Complex Materials

  • paper_url: http://arxiv.org/abs/2309.06613
  • repo_url: None
  • paper_authors: Chen Zhang, Clémence Bos, Stefan Sandfeld, Ruth Schwaiger
  • for: In this study, Cu-Cr composites are characterized mechanically by nanoindentation.
  • methods: Arrays of indents were placed over large areas of the samples, and the resulting datasets of Young's modulus and hardness at varying indentation depths were analyzed with an unsupervised learning technique, the Gaussian mixture model, to determine the number of "mechanical phases" and their respective properties (see the sketch after this entry).
  • results: The unsupervised analysis effectively identifies the mechanical phases, and a cross-validation approach is introduced to infer whether the data quantity is adequate and to suggest the amount of data required for reliable predictions.
    Abstract In this study, Cu-Cr composites were studied by nanoindentation. Arrays of indents were placed over large areas of the samples resulting in datasets consisting of several hundred measurements of Young's modulus and hardness at varying indentation depths. The unsupervised learning technique, Gaussian mixture model, was employed to analyze the data, which helped to determine the number of "mechanical phases" and the respective mechanical properties. Additionally, a cross-validation approach was introduced to infer whether the data quantity was adequate and to suggest the amount of data required for reliable predictions -- one of the often encountered but difficult to resolve issues in machine learning of materials science problems.
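
A minimal sketch of the kind of Gaussian-mixture analysis described above, using scikit-learn on (modulus, hardness) pairs. Selecting the number of components by BIC is an illustrative choice; the paper instead pairs the mixture model with a cross-validation procedure to judge data sufficiency.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_mechanical_phases(measurements, max_components=5):
    """measurements: (n_indents, 2) array of [Young's modulus, hardness].
    Fit GMMs with 1..max_components components and keep the best by BIC."""
    models = [
        GaussianMixture(n_components=k, random_state=0).fit(measurements)
        for k in range(1, max_components + 1)
    ]
    best = min(models, key=lambda m: m.bic(measurements))
    return best, best.predict(measurements)

# Toy usage: two synthetic "phases" with distinct modulus/hardness clusters.
rng = np.random.default_rng(0)
phase_a = rng.normal([120.0, 2.0], [5.0, 0.2], size=(300, 2))   # e.g. Cu-rich
phase_b = rng.normal([280.0, 4.5], [10.0, 0.3], size=(300, 2))  # e.g. Cr-rich
data = np.vstack([phase_a, phase_b])
gmm, labels = fit_mechanical_phases(data)
print(gmm.n_components, np.bincount(labels))
```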

Reasoning with Latent Diffusion in Offline Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.06599
  • repo_url: https://github.com/ldcq/ldcq
  • paper_authors: Siddarth Venkatraman, Shivesh Khaitan, Ravi Tej Akella, John Dolan, Jeff Schneider, Glen Berseth
  • for: The goal of this paper is offline reinforcement learning: learning high-reward policies from a static dataset without further environment interaction.
  • methods: It uses latent diffusion to model in-support trajectory sequences as compressed latent skills, which avoids the extrapolation errors that arise from a lack of support in the dataset.
  • results: The method achieves state-of-the-art performance on the D4RL benchmarks, particularly excelling in long-horizon, sparse-reward tasks.
    Abstract Offline reinforcement learning (RL) holds promise as a means to learn high-reward policies from a static dataset, without the need for further environment interactions. However, a key challenge in offline RL lies in effectively stitching portions of suboptimal trajectories from the static dataset while avoiding extrapolation errors arising due to a lack of support in the dataset. Existing approaches use conservative methods that are tricky to tune and struggle with multi-modal data (as we show) or rely on noisy Monte Carlo return-to-go samples for reward conditioning. In this work, we propose a novel approach that leverages the expressiveness of latent diffusion to model in-support trajectory sequences as compressed latent skills. This facilitates learning a Q-function while avoiding extrapolation error via batch-constraining. The latent space is also expressive and gracefully copes with multi-modal data. We show that the learned temporally-abstract latent space encodes richer task-specific information for offline RL tasks as compared to raw state-actions. This improves credit assignment and facilitates faster reward propagation during Q-learning. Our method demonstrates state-of-the-art performance on the D4RL benchmarks, particularly excelling in long-horizon, sparse-reward tasks.

Optimal and Fair Encouragement Policy Evaluation and Learning

  • paper_url: http://arxiv.org/abs/2309.07176
  • repo_url: None
  • paper_authors: Angela Zhou
  • for: This paper studies how to design optimal treatment-recommendation (encouragement) policies that maximize causal outcomes when individuals cannot be compelled to adhere to the recommendations.
  • methods: It develops causal identification, statistical variance-reduced estimation, and robust estimation of optimal treatment rules, including under potential violations of positivity, handling fairness constraints such as demographic parity in treatment take-up via constrained optimization.
  • results: The resulting framework yields optimal encouragement policies under non-adherence and distributional preferences over access and outcomes, illustrated in two case studies based on randomized encouragement to enroll in insurance and on pretrial supervised release with electronic monitoring.
    Abstract In consequential domains, it is often impossible to compel individuals to take treatment, so that optimal policy rules are merely suggestions in the presence of human non-adherence to treatment recommendations. In these same domains, there may be heterogeneity both in who responds in taking-up treatment, and heterogeneity in treatment efficacy. While optimal treatment rules can maximize causal outcomes across the population, access parity constraints or other fairness considerations can be relevant in the case of encouragement. For example, in social services, a persistent puzzle is the gap in take-up of beneficial services among those who may benefit from them the most. When in addition the decision-maker has distributional preferences over both access and average outcomes, the optimal decision rule changes. We study causal identification, statistical variance-reduced estimation, and robust estimation of optimal treatment rules, including under potential violations of positivity. We consider fairness constraints such as demographic parity in treatment take-up, and other constraints, via constrained optimization. Our framework can be extended to handle algorithmic recommendations under an often-reasonable covariate-conditional exclusion restriction, using our robustness checks for lack of positivity in the recommendation. We develop a two-stage algorithm for solving over parametrized policy classes under general constraints to obtain variance-sensitive regret bounds. We illustrate the methods in two case studies based on data from randomized encouragement to enroll in insurance and from pretrial supervised release with electronic monitoring.

Convergence of Gradient-based MAML in LQR

  • paper_url: http://arxiv.org/abs/2309.06588
  • repo_url: None
  • paper_authors: Negin Musavi, Geir E. Dullerud
  • for: investigate the local convergence characteristics of Model-agnostic Meta-learning (MAML) in linear system quadratic optimal control (LQR)
  • methods: uses MAML and its variations, with theoretical guarantees provided for the local convergence of the algorithm
  • results: presents simple numerical results to demonstrate the convergence properties of MAML in LQR tasks (a toy two-level MAML update is sketched after this entry)
    Abstract The main objective of this research paper is to investigate the local convergence characteristics of Model-agnostic Meta-learning (MAML) when applied to linear system quadratic optimal control (LQR). MAML and its variations have become popular techniques for quickly adapting to new tasks by leveraging previous learning knowledge in areas like regression, classification, and reinforcement learning. However, its theoretical guarantees remain unknown due to non-convexity and its structure, making it even more challenging to ensure stability in the dynamic system setting. This study focuses on exploring MAML in the LQR setting, providing its local convergence guarantees while maintaining the stability of the dynamical system. The paper also presents simple numerical results to demonstrate the convergence properties of MAML in LQR tasks.
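
To make the setting concrete, the gradient-based MAML update analyzed in the paper consists of an inner adaptation step per task followed by an outer (meta) update. The NumPy sketch below shows this two-level update on simple quadratic task losses, which is only a toy stand-in for the LQR cost considered in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Each task i has a quadratic loss L_i(theta) = 0.5 * (theta - c_i)^T A_i (theta - c_i).
tasks = [(np.diag(rng.uniform(0.5, 2.0, 2)), rng.normal(size=2)) for _ in range(5)]

def grad(A, c, theta):
    return A @ (theta - c)

def maml_step(theta, alpha=0.05, beta=0.1):
    """One gradient-based MAML iteration: adapt per task, then meta-update."""
    meta_grad = np.zeros_like(theta)
    for A, c in tasks:
        adapted = theta - alpha * grad(A, c, theta)          # inner step
        # chain rule: d L_i(adapted) / d theta = (I - alpha A)^T A (adapted - c)
        meta_grad += (np.eye(2) - alpha * A).T @ grad(A, c, adapted)
    return theta - beta * meta_grad / len(tasks)             # outer step

theta = np.zeros(2)
for _ in range(500):
    theta = maml_step(theta)
print(theta)   # a point that adapts well to all tasks after one inner step
```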

  • paper_url: http://arxiv.org/abs/2309.06584
  • repo_url: None
  • paper_authors: Xinyue Hu, Zenan Sun, Yi Nian, Yifang Dang, Fang Li, Jingna Feng, Evan Yu, Cui Tao
  • for: This study aims to improve risk prediction for Alzheimer's disease and related dementias (ADRD) using machine learning and claims data.
  • methods: It uses a Variationally Regularized Encoder-decoder Graph Neural Network (VGNN) to estimate ADRD likelihood, together with a relation importance method for interpreting the predictions.
  • results: VGNN outperforms Random Forest and Light Gradient Boost Machine baseline models by 10% in the area under the receiver operating characteristic curve.
    Abstract Alzheimer's disease and related dementias (ADRD) ranks as the sixth leading cause of death in the US, underlining the importance of accurate ADRD risk prediction. While recent advancement in ADRD risk prediction have primarily relied on imaging analysis, yet not all patients undergo medical imaging before an ADRD diagnosis. Merging machine learning with claims data can reveal additional risk factors and uncover interconnections among diverse medical codes. Our goal is to utilize Graph Neural Networks (GNNs) with claims data for ADRD risk prediction. Addressing the lack of human-interpretable reasons behind these predictions, we introduce an innovative method to evaluate relationship importance and its influence on ADRD risk prediction, ensuring comprehensive interpretation. We employed Variationally Regularized Encoder-decoder Graph Neural Network (VGNN) for estimating ADRD likelihood. We created three scenarios to assess the model's efficiency, using Random Forest and Light Gradient Boost Machine as baselines. We further used our relation importance method to clarify the key relationships for ADRD risk prediction. VGNN surpassed other baseline models by 10% in the area under the receiver operating characteristic. The integration of the GNN model and relation importance interpretation could potentially play an essential role in providing valuable insight into factors that may contribute to or delay ADRD progression. Employing a GNN approach with claims data enhances ADRD risk prediction and provides insights into the impact of interconnected medical code relationships. This methodology not only enables ADRD risk modeling but also shows potential for other image analysis predictions using claims data.

Electron Energy Regression in the CMS High-Granularity Calorimeter Prototype

  • paper_url: http://arxiv.org/abs/2309.06582
  • repo_url: https://github.com/fair-umn/fair-umn-hgcal
  • paper_authors: Roger Rusack, Bhargav Joshi, Alpana Alpana, Seema Sharma, Thomas Vadnais
  • for: This paper presents a new publicly available dataset of simulated data from a novel calorimeter to be installed at the CERN Large Hadron Collider, released to encourage machine-learning experts to develop efficient and accurate reconstruction of electrons.
  • methods: Machine learning methods are used to reconstruct the energy of incident electrons from the energies of three-dimensional hits, recorded by a 12,000-channel prototype exposed to a beam of high-energy electrons.
  • results: The energy of incident electrons is reconstructed to a known precision, and the data are released publicly to foster further research on image reconstruction in high-energy physics.
    Abstract We present a new publicly available dataset that contains simulated data of a novel calorimeter to be installed at the CERN Large Hadron Collider. This detector will have more than six-million channels with each channel capable of position, ionisation and precision time measurement. Reconstructing these events in an efficient way poses an immense challenge which is being addressed with the latest machine learning techniques. As part of this development a large prototype with 12,000 channels was built and a beam of high-energy electrons incident on it. Using machine learning methods we have reconstructed the energy of incident electrons from the energies of three-dimensional hits, which is known to some precision. By releasing this data publicly we hope to encourage experts in the application of machine learning to develop efficient and accurate image reconstruction of these electrons.

Promises of Deep Kernel Learning for Control Synthesis

  • paper_url: http://arxiv.org/abs/2309.06569
  • repo_url: None
  • paper_authors: Robert Reed, Luca Laurenti, Morteza Lahijanian
  • for: Learning and controlling complex dynamical systems.
  • methods: Deep Kernel Learning (DKL), which combines neural networks with Gaussian Processes, is used within a scalable abstraction-based framework for control synthesis of stochastic dynamical systems against temporal logic specifications.
  • results: Experiments on various benchmarks, including a 5-D nonlinear stochastic system, show that control synthesis with DKL can substantially outperform state-of-the-art competitive methods.
    Abstract Deep Kernel Learning (DKL) combines the representational power of neural networks with the uncertainty quantification of Gaussian Processes. Hence, it is potentially a promising tool to learn and control complex dynamical systems. In this work, we develop a scalable abstraction-based framework that enables the use of DKL for control synthesis of stochastic dynamical systems against complex specifications. Specifically, we consider temporal logic specifications and create an end-to-end framework that uses DKL to learn an unknown system from data and formally abstracts the DKL model into an Interval Markov Decision Process (IMDP) to perform control synthesis with correctness guarantees. Furthermore, we identify a deep architecture that enables accurate learning and efficient abstraction computation. The effectiveness of our approach is illustrated on various benchmarks, including a 5-D nonlinear stochastic system, showing how control synthesis with DKL can substantially outperform state-of-the-art competitive methods.

MELAGE: A purely python based Neuroimaging software (Neonatal)

  • paper_url: http://arxiv.org/abs/2309.07175
  • repo_url: https://github.com/bahramjafrasteh/melage
  • paper_authors: Bahram Jafrasteh, Simón Pedro Lubián López, Isabel Benavente Fernández
  • for: This paper introduces MELAGE, a Python-based neuroimaging software package for the visualization, processing, and analysis of medical images, initially conceived for neonatal 3D ultrasound and MRI brain images.
  • methods: MELAGE features a semi-automatic brain extraction tool empowered by a deep learning module, enabling precise and efficient extraction of brain structures from MRI and 3D ultrasound data.
  • results: The software offers dynamic 3D visualization, accurate measurements, and interactive image segmentation, and its adaptability extends its utility to adult human brain imaging, making it broadly applicable in medical imaging.
    Abstract MELAGE, a pioneering Python-based neuroimaging software, emerges as a versatile tool for the visualization, processing, and analysis of medical images. Initially conceived to address the unique challenges of processing 3D ultrasound and MRI brain images during the neonatal period, MELAGE exhibits remarkable adaptability, extending its utility to the domain of adult human brain imaging. At its core, MELAGE features a semi-automatic brain extraction tool empowered by a deep learning module, ensuring precise and efficient brain structure extraction from MRI and 3D Ultrasound data. Moreover, MELAGE offers a comprehensive suite of features, encompassing dynamic 3D visualization, accurate measurements, and interactive image segmentation. This transformative software holds immense promise for researchers and clinicians, offering streamlined image analysis, seamless integration with deep learning algorithms, and broad applicability in the realm of medical imaging.

Commands as AI Conversations

  • paper_url: http://arxiv.org/abs/2309.06551
  • repo_url: https://github.com/dspinellis/ai-cli
  • paper_authors: Diomidis Spinellis
  • for: To make command-line interfaces smarter and more user-friendly, helping developers and data scientists write command-line inputs more efficiently.
  • methods: The "ai-cli" system taps into OpenAI's API through JSON HTTP requests to convert natural language prompts into executable commands for various Linux command-line tools, integrating with each program via dynamic loading and linking against its Readline library API (a minimal JSON HTTP sketch follows this entry).
  • results: The tool makes command-line interfaces smarter and more user-friendly, opening avenues for further enhancement and cross-platform applicability.
    Abstract Developers and data scientists often struggle to write command-line inputs, even though graphical interfaces or tools like ChatGPT can assist. The solution? "ai-cli," an open-source system inspired by GitHub Copilot that converts natural language prompts into executable commands for various Linux command-line tools. By tapping into OpenAI's API, which allows interaction through JSON HTTP requests, "ai-cli" transforms user queries into actionable command-line instructions. However, integrating AI assistance across multiple command-line tools, especially in open source settings, can be complex. Historically, operating systems could mediate, but individual tool functionality and the lack of a unified approach have made centralized integration challenging. The "ai-cli" tool, by bridging this gap through dynamic loading and linking with each program's Readline library API, makes command-line interfaces smarter and more user-friendly, opening avenues for further enhancement and cross-platform applicability.
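
As a rough illustration of the kind of JSON HTTP request to OpenAI's API described above — not the ai-cli implementation itself, which integrates with each program's Readline library rather than acting as a standalone script — the model name and prompt wording below are assumptions for illustration.

```python
import json
import os
import urllib.request

def prompt_to_command(prompt, model="gpt-3.5-turbo"):
    """Ask the OpenAI chat completions endpoint to turn a natural-language
    request into a single shell command (illustrative sketch only)."""
    body = {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Reply with a single Linux shell command, nothing else."},
            {"role": "user", "content": prompt},
        ],
    }
    req = urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps(body).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"].strip()

# Example: print(prompt_to_command("list the five largest files here"))
```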

Distributionally Robust Transfer Learning

  • paper_url: http://arxiv.org/abs/2309.06534
  • repo_url: https://github.com/rvr-account/rvr
  • paper_authors: Xin Xiong, Zijian Guo, Tianxi Cai
  • for: This paper studies transfer learning and proposes Distributionally Robust Optimization for Transfer Learning (TransDRO), which breaks free from strict similarity constraints between source and target data to achieve better prediction performance.
  • methods: TransDRO optimizes the most adversarial loss within an uncertainty set, defined as a collection of target populations generated as convex combinations of source distributions that guarantee excellent prediction performance for the target data.
  • results: Numerical studies and an analysis of multi-institutional electronic health records data substantiate the robustness and accuracy of TransDRO, which also achieves a faster convergence rate than a model fitted with the target data alone.
    Abstract Many existing transfer learning methods rely on leveraging information from source data that closely resembles the target data. However, this approach often overlooks valuable knowledge that may be present in different yet potentially related auxiliary samples. When dealing with a limited amount of target data and a diverse range of source models, our paper introduces a novel approach, Distributionally Robust Optimization for Transfer Learning (TransDRO), that breaks free from strict similarity constraints. TransDRO is designed to optimize the most adversarial loss within an uncertainty set, defined as a collection of target populations generated as a convex combination of source distributions that guarantee excellent prediction performances for the target data. TransDRO effectively bridges the realms of transfer learning and distributional robustness prediction models. We establish the identifiability of TransDRO and its interpretation as a weighted average of source models closest to the baseline model. We also show that TransDRO achieves a faster convergence rate than the model fitted with the target data. Our comprehensive numerical studies and analysis of multi-institutional electronic health records data using TransDRO further substantiate the robustness and accuracy of TransDRO, highlighting its potential as a powerful tool in transfer learning applications.

Exploring the Benefits of Differentially Private Pre-training and Parameter-Efficient Fine-tuning for Table Transformers

  • paper_url: http://arxiv.org/abs/2309.06526
  • repo_url: https://github.com/ibm/dp-tabtransformer
  • paper_authors: Xilong Wang, Chia-Mu Yu, Pin-Yu Chen
  • for: This work explores combining differential privacy (DP) with the Table Transformer (TabTransformer) in a transfer learning setting, balancing data privacy and model utility.
  • methods: It applies several parameter-efficient fine-tuning (PEFT) methods, including Adapter, LoRA, and Prompt Tuning, to differentially private pre-training and fine-tuning of TabTransformers.
  • results: Experiments on the ACSIncome dataset show that these PEFT methods outperform traditional approaches in downstream accuracy and in the number of trainable parameters, achieving an improved trade-off among parameter efficiency, privacy, and accuracy. Code is available at github.com/IBM/DP-TabTransformer.
    Abstract For machine learning with tabular data, Table Transformer (TabTransformer) is a state-of-the-art neural network model, while Differential Privacy (DP) is an essential component to ensure data privacy. In this paper, we explore the benefits of combining these two aspects together in the scenario of transfer learning -- differentially private pre-training and fine-tuning of TabTransformers with a variety of parameter-efficient fine-tuning (PEFT) methods, including Adapter, LoRA, and Prompt Tuning. Our extensive experiments on the ACSIncome dataset show that these PEFT methods outperform traditional approaches in terms of the accuracy of the downstream task and the number of trainable parameters, thus achieving an improved trade-off among parameter efficiency, privacy, and accuracy. Our code is available at github.com/IBM/DP-TabTransformer.

A Q-learning Approach for Adherence-Aware Recommendations

  • paper_url: http://arxiv.org/abs/2309.06519
  • repo_url: None
  • paper_authors: Ioannis Faros, Aditya Dave, Andreas A. Malikopoulos
  • for: This work targets settings in which a human decision-maker (HDM) receives recommendations from an artificial intelligence while holding the ultimate responsibility of accepting or rejecting them.
  • methods: An "adherence-aware Q-learning" algorithm is developed that learns the HDM's adherence level, i.e., the frequency with which the recommended actions are followed, and derives the best recommendation policy in real time (a toy variant is sketched after this entry).
  • results: The proposed Q-learning algorithm is proven to converge to the optimal value, and its performance is evaluated across various scenarios.
    Abstract In many real-world scenarios involving high-stakes and safety implications, a human decision-maker (HDM) may receive recommendations from an artificial intelligence while holding the ultimate responsibility of making decisions. In this letter, we develop an "adherence-aware Q-learning" algorithm to address this problem. The algorithm learns the "adherence level" that captures the frequency with which an HDM follows the recommended actions and derives the best recommendation policy in real time. We prove the convergence of the proposed Q-learning algorithm to the optimal value and evaluate its performance across various scenarios.
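
A toy sketch of the flavor of algorithm described above, assuming a tabular setting: the AI's action is the recommendation, the human's response (follow or deviate) is folded into the effective transition dynamics, and the adherence level is estimated online from how often recommendations are followed. The `env` and `human` interfaces below are hypothetical, and the update rule is a generic Q-learning step rather than the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def adherence_aware_q_learning(env, human, n_states, n_actions,
                               episodes=2000, alpha=0.1, gamma=0.95, eps=0.1):
    """Tabular Q-learning over recommendations: the executed action is whatever
    the human chooses given the recommendation; adherence is estimated online."""
    Q = np.zeros((n_states, n_actions))
    followed, recommended = 1, 2              # Laplace-smoothed adherence counts
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            rec = rng.integers(n_actions) if rng.random() < eps else int(Q[state].argmax())
            executed = human(state, rec)      # human follows rec or picks own action
            recommended += 1
            followed += int(executed == rec)
            next_state, reward, done = env.step(executed)
            # Q is indexed by the recommendation; the human response is part of
            # the effective transition experienced by the recommender.
            target = reward + gamma * (0.0 if done else Q[next_state].max())
            Q[state, rec] += alpha * (target - Q[state, rec])
            state = next_state
    return Q, followed / recommended          # value estimates and adherence level
```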

Bayesian longitudinal tensor response regression for modeling neuroplasticity

  • paper_url: http://arxiv.org/abs/2309.10065
  • repo_url: None
  • paper_authors: Suprateek Kundu, Alec Reinhardt, Serena Song, M. Lawson Meadows, Bruce Crosson, Venkatagiri Krishnamurthy
  • for: The main goal of this paper is to investigate voxel-level neuroplasticity in longitudinal neuroimaging data using a Bayesian tensor response regression approach.
  • methods: The method is implemented with Markov chain Monte Carlo (MCMC) sampling and uses a low-rank decomposition to reduce dimensionality while preserving the spatial configuration of voxels; feature selection is performed via joint credible regions that respect the shape of the posterior distributions.
  • results: The approach supports both group-level and individual-level inference. In a longitudinal aphasia dataset, the control therapy showed long-term increases in brain activity while the intention treatment produced predominantly short-term changes, both concentrated in distinct localized regions; in contrast, voxel-wise regression detected no significant neuroplasticity after multiplicity adjustments, which is biologically implausible and indicates a lack of power.
    Abstract A major interest in longitudinal neuroimaging studies involves investigating voxel-level neuroplasticity due to treatment and other factors across visits. However, traditional voxel-wise methods are beset with several pitfalls, which can compromise the accuracy of these approaches. We propose a novel Bayesian tensor response regression approach for longitudinal imaging data, which pools information across spatially-distributed voxels to infer significant changes while adjusting for covariates. The proposed method, which is implemented using Markov chain Monte Carlo (MCMC) sampling, utilizes low-rank decomposition to reduce dimensionality and preserve spatial configurations of voxels when estimating coefficients. It also enables feature selection via joint credible regions which respect the shape of the posterior distributions for more accurate inference. In addition to group level inferences, the method is able to infer individual-level neuroplasticity, allowing for examination of personalized disease or recovery trajectories. The advantages of the proposed approach in terms of prediction and feature selection over voxel-wise regression are highlighted via extensive simulation studies. Subsequently, we apply the approach to a longitudinal Aphasia dataset consisting of task functional MRI images from a group of subjects who were administered either a control intervention or intention treatment at baseline and were followed up over subsequent visits. Our analysis revealed that while the control therapy showed long-term increases in brain activity, the intention treatment produced predominantly short-term changes, both of which were concentrated in distinct localized regions. In contrast, the voxel-wise regression failed to detect any significant neuroplasticity after multiplicity adjustments, which is biologically implausible and implies lack of power.

A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale

  • paper_url: http://arxiv.org/abs/2309.06497
  • repo_url: https://github.com/facebookresearch/optimizers/tree/main/distributed_shampoo
  • paper_authors: Hao-Jun Michael Shi, Tsung-Hsien Lee, Shintaro Iwasaki, Jose Gallego-Posada, Zhijing Li, Kaushik Rangadurai, Dheevatsa Mudigere, Michael Rabbat
  • for: This paper describes Shampoo, an online and stochastic optimization algorithm in the AdaGrad family for training neural networks.
  • methods: The algorithm constructs a block-diagonal preconditioner in which each block is a coarse Kronecker product approximation to full-matrix AdaGrad for each parameter of the network (the core update is sketched after this entry); the PyTorch implementation distributes the associated memory and computation across GPUs via the DTensor data structure and an AllGather primitive on the computed search directions.
  • results: An ablation study on training ImageNet ResNet50 shows that Shampoo outperforms standard diagonal-scaling-based adaptive gradient methods with minimal hyperparameter tuning, at no more than a 10% increase in per-step wall-clock time.
    Abstract Shampoo is an online and stochastic optimization algorithm belonging to the AdaGrad family of methods for training neural networks. It constructs a block-diagonal preconditioner where each block consists of a coarse Kronecker product approximation to full-matrix AdaGrad for each parameter of the neural network. In this work, we provide a complete description of the algorithm as well as the performance optimizations that our implementation leverages to train deep networks at-scale in PyTorch. Our implementation enables fast multi-GPU distributed data-parallel training by distributing the memory and computation associated with blocks of each parameter via PyTorch's DTensor data structure and performing an AllGather primitive on the computed search directions at each iteration. This major performance enhancement enables us to achieve at most a 10% performance reduction in per-step wall-clock time compared against standard diagonal-scaling-based adaptive gradient methods. We validate our implementation by performing an ablation study on training ImageNet ResNet50, demonstrating Shampoo's superiority over standard training recipes with minimal hyperparameter tuning.
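
For intuition, the core Shampoo update for a single matrix-shaped parameter maintains left and right second-moment statistics $L$ and $R$ and preconditions the gradient as $L^{-1/4} G R^{-1/4}$. The NumPy sketch below covers only this non-distributed, single-parameter case and is not the distributed PyTorch implementation described above; the learning rate, epsilon, and toy objective are illustrative.

```python
import numpy as np
from scipy.linalg import fractional_matrix_power

class ShampooMatrix:
    """Minimal single-matrix Shampoo step: W <- W - lr * L^{-1/4} G R^{-1/4}."""
    def __init__(self, shape, lr=0.1, eps=1e-4):
        self.L = eps * np.eye(shape[0])   # left (row) statistics
        self.R = eps * np.eye(shape[1])   # right (column) statistics
        self.lr = lr

    def step(self, W, G):
        self.L += G @ G.T
        self.R += G.T @ G
        precond = (fractional_matrix_power(self.L, -0.25)
                   @ G @ fractional_matrix_power(self.R, -0.25))
        return W - self.lr * precond

# Toy usage: minimize ||W - T||_F^2 for a random target T.
rng = np.random.default_rng(0)
T = rng.normal(size=(4, 3))
W = np.zeros((4, 3))
opt = ShampooMatrix(W.shape)
for _ in range(200):
    W = opt.step(W, 2 * (W - T))          # gradient of the squared error
print(np.linalg.norm(W - T))              # error shrinks from ||T||_F toward zero
```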

Learning topological operations on meshes with application to block decomposition of polygons

  • paper_url: http://arxiv.org/abs/2309.06484
  • repo_url: None
  • paper_authors: Arjun Narayanan, Yulong Pan, Per-Olof Persson
  • for: Improving the quality of unstructured triangular and quadrilateral meshes.
  • methods: The model learns to improve mesh quality purely via self-play reinforcement learning with no prior heuristics, where the actions are standard local and global element operations.
  • results: The approach effectively minimizes the deviation of node degrees from their ideal values, which for interior vertices corresponds to reducing the number of irregular nodes.
    Abstract We present a learning based framework for mesh quality improvement on unstructured triangular and quadrilateral meshes. Our model learns to improve mesh quality according to a prescribed objective function purely via self-play reinforcement learning with no prior heuristics. The actions performed on the mesh are standard local and global element operations. The goal is to minimize the deviation of the node degrees from their ideal values, which in the case of interior vertices leads to a minimization of irregular nodes.
    摘要 我们提出了一种基于学习的框架,用于改善非结构化三角形和四边形网格的质量。我们的模型完全通过自我对弈强化学习、在不依赖任何先验启发式的情况下,按照给定的目标函数学习改善网格质量。对网格执行的操作为标准的局部和全局单元操作。目标是使节点度与其理想值的偏差最小化,对内部顶点而言,这等价于最小化不规则节点的数量。
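
To make the objective concrete, here is a small sketch (with an invented toy mesh representation) of the irregularity score such a self-play agent could use as a reward signal; the actual element operations and state encoding of the paper are not reproduced.

```python
# Toy irregularity score: total deviation of interior-vertex degrees from the ideal
# value (6 for triangular meshes). A reward can be the decrease of this score after
# applying a local operation. The adjacency dict below is an invented example.
def irregularity(adjacency, interior_vertices, ideal_degree=6):
    return sum(abs(len(adjacency[v]) - ideal_degree) for v in interior_vertices)

adjacency = {0: [1, 2, 3, 4, 5, 6, 7],        # degree 7 -> contributes 1
             1: [0, 2, 7, 8, 9, 10]}          # degree 6 -> contributes 0
print(irregularity(adjacency, interior_vertices=[0, 1]))   # -> 1
```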

Flows for Flows: Morphing one Dataset into another with Maximum Likelihood Estimation

  • paper_url: http://arxiv.org/abs/2309.06472
  • repo_url: None
  • paper_authors: Tobias Golling, Samuel Klein, Radha Mastandrea, Benjamin Nachman, John Andrew Raine
  • for: 本研究旨在解决高能物理和其他领域中数据分析中的数据变换问题,通过将一个数据集转换成另一个数据集而不需要知道起始数据集的概率密度。
  • methods: 本研究使用了normalizing flows机器学习模型,该模型具有在多种 particle physics 任务中的出色精度。然而,normalizing flows 模型需要知道起始数据集的概率密度,而在大多数情况下,我们只能生成更多的例子,但不知道densities explicitly。为解决这个问题,我们提出了一种叫做“flows for flows”的协议,该协议可以让normalizing flows 模型在不知道起始数据集的概率密度情况下进行数据变换。
  • results: 我们研究了这种协议的多种变种,以探索数据点需要移动多远才能使两个数据集在统计上匹配。此外,我们还展示了如何以特定特征为条件来学习流,从而为每个条件特征值得到相应的 morphing function。我们通过 toy examples 和 collider physics 示例来说明这些结果。
    Abstract Many components of data analysis in high energy physics and beyond require morphing one dataset into another. This is commonly solved via reweighting, but there are many advantages of preserving weights and shifting the data points instead. Normalizing flows are machine learning models with impressive precision on a variety of particle physics tasks. Naively, normalizing flows cannot be used for morphing because they require knowledge of the probability density of the starting dataset. In most cases in particle physics, we can generate more examples, but we do not know densities explicitly. We propose a protocol called flows for flows for training normalizing flows to morph one dataset into another even if the underlying probability density of neither dataset is known explicitly. This enables a morphing strategy trained with maximum likelihood estimation, a setup that has been shown to be highly effective in related tasks. We study variations on this protocol to explore how far the data points are moved to statistically match the two datasets. Furthermore, we show how to condition the learned flows on particular features in order to create a morphing function for every value of the conditioning feature. For illustration, we demonstrate flows for flows for toy examples as well as a collider physics example involving dijet events
    摘要 很多高能物理数据分析中的组件需要将一个数据集转换成另一个数据集。通常通过重新权重来解决这个问题,但是有很多优点在保持权重并将数据点移动。正则化流是一种机器学习模型,它在多种粒子物理任务上具有出色的精度。然而,正则化流无法用于 morphing,因为它们需要知道开始数据集的概率密度。在大多数粒子物理任务中,我们可以生成更多的例子,但我们不知道概率密度的准确值。我们提出了一个协议called flows for flows,用于在不知道开始数据集的概率密度情况下,通过最大likelihood估计来训练正则化流。这种设置可以在相关任务中展示出非常高效。我们还研究了这个协议的变种,以探索如何在数据点之间移动的距离,以达到两个数据集的统计匹配。此外,我们还示出了如何基于特定的特征来创建一个适应于每个特征的 morphing 函数。为了说明,我们对假示例和一个撞击物理示例进行了演示。
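
A one-dimensional caricature of the maximum-likelihood morphing idea is sketched below: an affine map stands in for the morphing flow and a Gaussian fit to the target stands in for the second ("flow for the flow") density model, so only the change-of-variables training objective is illustrated, not the paper's normalizing-flow architecture.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(0.0, 1.0, 2000)        # source dataset (its density is never used)
B = rng.normal(3.0, 0.5, 2000)        # target dataset (its density is also unknown)

# Stand-in for the "flow for the flow": a density model fitted to the target samples.
mu_B, sig_B = B.mean(), B.std()

# Morphing map f(x) = a*x + b, trained by maximum likelihood via change of variables:
#   log p(x) = log p_B(f(x)) + log|a|
a, b, lr = 1.0, 0.0, 1e-2
for _ in range(2000):
    y = a * A + b
    dlogp_dy = -(y - mu_B) / sig_B**2          # gradient of the target log-density
    a += lr * (np.mean(dlogp_dy * A) + 1.0 / a)
    b += lr * np.mean(dlogp_dy)
print(a, b)   # roughly a ~ 0.5, b ~ 3.0, i.e. A is morphed onto B
```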

On Computationally Efficient Learning of Exponential Family Distributions

  • paper_url: http://arxiv.org/abs/2309.06413
  • repo_url: None
  • paper_authors: Abhin Shah, Devavrat Shah, Gregory W. Wornell
  • for: 学习 truncated exponential family 的自然参数,即使参数数量很大,可以在计算上和统计上减少计算复杂度。
  • methods: 提出了一种新的损失函数和计算效率高的估计方法,该方法可以在轻量级上实现consistent和asymptotically normal的性质,并且可以视为maximum likelihood estimation的一种变形。
  • results: 提供了finite sample guarantees,表明该估计方法可以在样本数量为$O({\sf poly}(k)/\alpha^2)$下 Achieves an error (in $\ell_2$-norm) of $\alpha$ in the parameter estimation,并且在特定的Markov random fields上实现order-optimal的sample complexity $O({\sf log}(k)/\alpha^2)$。
    Abstract We consider the classical problem of learning, with arbitrary accuracy, the natural parameters of a $k$-parameter truncated \textit{minimal} exponential family from i.i.d. samples in a computationally and statistically efficient manner. We focus on the setting where the support as well as the natural parameters are appropriately bounded. While the traditional maximum likelihood estimator for this class of exponential family is consistent, asymptotically normal, and asymptotically efficient, evaluating it is computationally hard. In this work, we propose a novel loss function and a computationally efficient estimator that is consistent as well as asymptotically normal under mild conditions. We show that, at the population level, our method can be viewed as the maximum likelihood estimation of a re-parameterized distribution belonging to the same class of exponential family. Further, we show that our estimator can be interpreted as a solution to minimizing a particular Bregman score as well as an instance of minimizing the \textit{surrogate} likelihood. We also provide finite sample guarantees to achieve an error (in $\ell_2$-norm) of $\alpha$ in the parameter estimation with sample complexity $O({\sf poly}(k)/\alpha^2)$. Our method achives the order-optimal sample complexity of $O({\sf log}(k)/\alpha^2)$ when tailored for node-wise-sparse Markov random fields. Finally, we demonstrate the performance of our estimator via numerical experiments.
    摘要 我们考虑一个经典的学习问题:如何在计算和统计上高效地、以任意精度从 i.i.d. 样本中学习一个 $k$ 参数截断最小指数族的自然参数。我们关注支撑集和自然参数均被适当约束的设定。传统的最大似然估计量对这类指数族是一致的、渐近正态且渐近有效的,但其计算十分困难。在这项工作中,我们提出了一个新的损失函数和一个计算高效的估计量,在温和条件下该估计量同样是一致且渐近正态的。我们证明,在总体层面上,我们的方法可以被视为对同一指数族中某个重参数化分布的最大似然估计。此外,该估计量既可以被解释为最小化某个特定 Bregman 分数的解,也可以被视为最小化代理似然的一个实例。我们还给出了有限样本保证:在样本量为 $O({\sf poly}(k)/\alpha^2)$ 时,参数估计的 $\ell_2$ 误差可以达到 $\alpha$;当针对节点稀疏的马尔可夫随机场进行定制时,我们的方法可以达到阶最优的样本复杂度 $O({\sf log}(k)/\alpha^2)$。最后,我们通过数值实验展示了估计量的性能。

Using Reed-Muller Codes for Classification with Rejection and Recovery

  • paper_url: http://arxiv.org/abs/2309.06359
  • repo_url: https://github.com/dfenth/rmaggnet
  • paper_authors: Daniel Fentham, David Parker, Mark Ryan
  • for: 防止攻击者通过生成难以分类的图像来诱导分类器错误输出。
  • methods: 基于 Reed-Muller 纠错码的聚合网络(RMAggNet),能够在多种攻击下对输入进行纠正或拒绝。
  • results: RMAggNet可以减少错误率,同时保持good correctness,并且可以在不同的攻击budget下进行多种攻击。
    Abstract When deploying classifiers in the real world, users expect them to respond to inputs appropriately. However, traditional classifiers are not equipped to handle inputs which lie far from the distribution they were trained on. Malicious actors can exploit this defect by making adversarial perturbations designed to cause the classifier to give an incorrect output. Classification-with-rejection methods attempt to solve this problem by allowing networks to refuse to classify an input in which they have low confidence. This works well for strongly adversarial examples, but also leads to the rejection of weakly perturbed images, which intuitively could be correctly classified. To address these issues, we propose Reed-Muller Aggregation Networks (RMAggNet), a classifier inspired by Reed-Muller error-correction codes which can correct and reject inputs. This paper shows that RMAggNet can minimise incorrectness while maintaining good correctness over multiple adversarial attacks at different perturbation budgets by leveraging the ability to correct errors in the classification process. This provides an alternative classification-with-rejection method which can reduce the amount of additional processing in situations where a small number of incorrect classifications are permissible.
    摘要 在实际部署分类器时,用户期望它们能对输入做出恰当的响应。然而,传统分类器无法处理远离训练分布的输入,恶意攻击者可以利用这一缺陷,构造对抗扰动使分类器给出错误输出。带拒绝的分类方法允许网络在置信度低时拒绝对输入进行分类,这对强对抗样本很有效,但也会导致弱扰动、本可被正确分类的图像被拒绝。为了解决这些问题,我们提出了受 Reed-Muller 纠错码启发的聚合网络 RMAggNet,它既能纠正也能拒绝输入。本文表明,RMAggNet 能够利用分类过程中的纠错能力,在不同扰动预算的多种对抗攻击下,于保持良好正确率的同时最小化错误率,从而为允许少量错误分类的场景提供一种可减少额外处理开销的带拒绝分类方法。
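
The sketch below shows the generic error-correcting-output-code mechanism the summary alludes to: classes map to codewords, per-bit classifiers vote, and the aggregate either matches, corrects, or rejects. The codewords, thresholds and toy inputs are illustrative; they are not the paper's Reed-Muller construction.

```python
import numpy as np

CODEWORDS = np.array([[0, 0, 0, 0, 0, 0],
                      [0, 0, 0, 1, 1, 1],
                      [1, 1, 1, 0, 0, 0],
                      [1, 1, 1, 1, 1, 1]])   # 4 classes, minimum Hamming distance 3

def aggregate(bit_probs, correct_up_to=1):
    """bit_probs: outputs of 6 independent binary classifiers, one per codeword bit."""
    bits = (np.asarray(bit_probs) > 0.5).astype(int)
    dists = (CODEWORDS != bits).sum(axis=1)
    best = int(dists.argmin())
    if dists[best] <= correct_up_to:
        return best                           # exact match or recovered by correction
    return None                               # reject: too far from every codeword

print(aggregate([0.9, 0.8, 0.3, 0.1, 0.2, 0.4]))   # -> 2 (one flipped bit corrected)
print(aggregate([0.9, 0.9, 0.1, 0.1, 0.9, 0.9]))   # -> None (rejected)
```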

Generalized Regret Analysis of Thompson Sampling using Fractional Posteriors

  • paper_url: http://arxiv.org/abs/2309.06349
  • repo_url: None
  • paper_authors: Prateek Jaiswal, Debdeep Pati, Anirban Bhattacharya, Bani K. Mallick
  • For: The paper is written to improve the performance of the Thompson Sampling (TS) algorithm in stochastic multi-armed bandit problems by using a fractional posterior distribution.
  • Methods: The paper proposes a variant of TS called $\alpha$-TS, which uses a fractional or $\alpha$-posterior instead of the standard posterior distribution. The paper also provides regret bounds for $\alpha$-TS using recent theoretical developments in non-asymptotic concentration analysis and Bernstein-von Mises type results.
  • Results: The paper obtains instance-dependent and instance-independent regret bounds for $\alpha$-TS. Specifically, the instance-dependent bound is $\mathcal{O}\left(\sum_{k \neq i^*} \Delta_k\left(\frac{\log(T)}{C(\alpha)\Delta_k^2} + \frac{1}{2} \right)\right)$ and the instance-independent bound is $\mathcal{O}(\sqrt{KT\log K})$. The paper also matches (up to constants) the regret bound of the improved UCB algorithm.
    Abstract Thompson sampling (TS) is one of the most popular and earliest algorithms to solve stochastic multi-armed bandit problems. We consider a variant of TS, named $\alpha$-TS, where we use a fractional or $\alpha$-posterior ($\alpha\in(0,1)$) instead of the standard posterior distribution. To compute an $\alpha$-posterior, the likelihood in the definition of the standard posterior is tempered with a factor $\alpha$. For $\alpha$-TS we obtain both instance-dependent $\mathcal{O}\left(\sum_{k \neq i^*} \Delta_k\left(\frac{\log(T)}{C(\alpha)\Delta_k^2} + \frac{1}{2} \right)\right)$ and instance-independent $\mathcal{O}(\sqrt{KT\log K})$ frequentist regret bounds under very mild conditions on the prior and reward distributions, where $\Delta_k$ is the gap between the true mean rewards of the $k^{th}$ and the best arms, and $C(\alpha)$ is a known constant. Both the sub-Gaussian and exponential family models satisfy our general conditions on the reward distribution. Our conditions on the prior distribution just require its density to be positive, continuous, and bounded. We also establish another instance-dependent regret upper bound that matches (up to constants) to that of improved UCB [Auer and Ortner, 2010]. Our regret analysis carefully combines recent theoretical developments in the non-asymptotic concentration analysis and Bernstein-von Mises type results for the $\alpha$-posterior distribution. Moreover, our analysis does not require additional structural properties such as closed-form posteriors or conjugate priors.
    摘要 汤普森采样(TS)是求解随机多臂老虎机问题的最流行、也是最早的算法之一。我们考虑 TS 的一个变体,称为 $\alpha$-TS,其中使用分数(fractional)或 $\alpha$-后验($\alpha\in(0,1)$)代替标准后验分布;在计算 $\alpha$-后验时,标准后验定义中的似然被以因子 $\alpha$ 进行调温。对于 $\alpha$-TS,在对先验和奖励分布的非常温和的条件下,我们同时得到了依赖于实例的 regret 界 $\mathcal{O}\left(\sum_{k \neq i^*} \Delta_k\left(\frac{\log(T)}{C(\alpha)\Delta_k^2} + \frac{1}{2} \right)\right)$ 和与实例无关的 regret 界 $\mathcal{O}(\sqrt{KT\log K})$,其中 $\Delta_k$ 是第 $k$ 个臂与最优臂的真实平均奖励之差,$C(\alpha)$ 是已知常数。亚高斯模型和指数族模型都满足我们对奖励分布的一般条件;对先验分布的条件只要求其密度为正、连续且有界。我们还建立了另一个依赖于实例的 regret 上界,它(在常数意义下)与改进的 UCB [Auer and Ortner, 2010] 相匹配。我们的 regret 分析将非渐近集中分析的最新理论进展与针对 $\alpha$-后验分布的 Bernstein-von Mises 型结果仔细结合,并且不需要诸如封闭形式后验或共轭先验等额外结构性质。
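
For intuition, here is a minimal $\alpha$-TS loop for Gaussian rewards with known noise and a flat prior, where tempering the likelihood by $\alpha$ simply inflates the posterior variance by $1/\alpha$; the general prior/reward conditions analysed in the paper are not captured by this toy.

```python
import numpy as np

# Tempering the Gaussian likelihood by alpha in (0,1): with a flat prior, the
# alpha-posterior of an arm's mean is N(sample_mean, sigma^2 / (alpha * n)).
rng = np.random.default_rng(0)
true_means, sigma, alpha, T = np.array([0.2, 0.5, 0.8]), 1.0, 0.5, 5000
counts, sums = np.ones(3), true_means.copy()      # one fake pull per arm to start

for _ in range(T):
    post_mean = sums / counts
    post_std = sigma / np.sqrt(alpha * counts)
    arm = int(np.argmax(rng.normal(post_mean, post_std)))   # sample, then act greedily
    reward = rng.normal(true_means[arm], sigma)
    counts[arm] += 1
    sums[arm] += reward

print(counts)   # the best arm (index 2) should dominate the pull counts
```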

Band-gap regression with architecture-optimized message-passing neural networks

  • paper_url: http://arxiv.org/abs/2309.06348
  • repo_url: https://github.com/tisabe/jraph_mpeu
  • paper_authors: Tim Bechtel, Daniel T. Speckhard, Jonathan Godwin, Claudia Draxl
  • for: 本研究使用图结构神经网络(MPNN)预测固体物质的物理性质。
  • methods: 本研究使用密集 functional theory数据从AFLOW数据库进行分类材料为半导体/隔离体或金属。然后,通过神经建筑搜索来探索MPNN模型的建筑和超参空间,以预测材料标记为非金属的带隔。
  • results: 搜索出的最佳模型组成一个ensemble,与现有文献中的模型相比显著提高了性能。不确定性评估使用Monte Carlo Dropout和集成方法,集成方法更为成功。对适用范围进行分析,包括晶系、包括Hubbard参数在内的密集函数计算、以及材料中原子species。
    Abstract Graph-based neural networks and, specifically, message-passing neural networks (MPNNs) have shown great potential in predicting physical properties of solids. In this work, we train an MPNN to first classify materials through density functional theory data from the AFLOW database as being metallic or semiconducting/insulating. We then perform a neural-architecture search to explore the model architecture and hyperparameter space of MPNNs to predict the band gaps of the materials identified as non-metals. The parameters in the search include the number of message-passing steps, latent size, and activation-function, among others. The top-performing models from the search are pooled into an ensemble that significantly outperforms existing models from the literature. Uncertainty quantification is evaluated with Monte-Carlo Dropout and ensembling, with the ensemble method proving superior. The domain of applicability of the ensemble model is analyzed with respect to the crystal systems, the inclusion of a Hubbard parameter in the density functional calculations, and the atomic species building up the materials.
    摘要 GRaph-based neural networks和具体地是消息传递神经网络(MPNNs)在预测固体物理性质方面表现出了极高的潜力。在这项工作中,我们使用MPNN进行物料分类,通过density functional theory数据库AFLOW中的数据来判断物料是金属或半导体/隔离体。然后,我们进行神经网络架构和超参数的搜索,以提高MPNN预测非金属材料带隔的能力。搜索的参数包括消息传递步数、隐藏大小和活化函数等。我们从搜索中选拔出最佳性能的模型,并将其 Pooling 成 ensemble,该ensemble在已有文献中的模型表现出色。我们使用Monte Carlo Dropout和集成来评估uncertainty quantification,集成方法表现更优。我们还分析了ensemble模型的适用范围,包括晶系、包括Hubbard参数在density functional计算中,以及物质组成的原子种。
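
As a reference point for the search space described above, the following is a bare-bones message-passing layer with sum aggregation and a linear readout; layer count, latent size and activation are exactly the kinds of hyperparameters the study optimizes, but the weights and toy graph here are random placeholders rather than anything from the AFLOW models.

```python
import numpy as np

def mpnn_layer(node_feats, edges, W_msg, W_upd):
    """One message-passing step: sum neighbour messages, then a ReLU update."""
    messages = np.zeros_like(node_feats)
    for src, dst in edges:                          # aggregate neighbour messages
        messages[dst] += node_feats[src] @ W_msg
    return np.maximum(0.0, (node_feats + messages) @ W_upd)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                         # 4 atoms, 8 features each
edges = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)]
h = mpnn_layer(x, edges, rng.normal(size=(8, 8)), rng.normal(size=(8, 8)))
band_gap_prediction = h.mean(axis=0) @ rng.normal(size=8)   # graph readout + linear head
print(band_gap_prediction)
```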

Modeling Supply and Demand in Public Transportation Systems

  • paper_url: http://arxiv.org/abs/2309.06299
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: Miranda Bihler, Hala Nelson, Erin Okey, Noe Reyes Rivas, John Webb, Anna White
  • for: HDPT 想要使用数据提高运营效率和效果。
  • methods: 我们构建了两个供应和需求模型,帮助HDPT发现服务中的缺陷。这些模型考虑了多个变量,包括HDPT向联邦政府报告的方式和海arrisonburg市最有强制力的地区。我们使用数据分析和机器学习技术进行预测。
  • results: 我们预测了HDPT的服务缺陷,以帮助它提高运营效率和效果。
    Abstract The Harrisonburg Department of Public Transportation (HDPT) aims to leverage their data to improve the efficiency and effectiveness of their operations. We construct two supply and demand models that help the department identify gaps in their service. The models take many variables into account, including the way that the HDPT reports to the federal government and the areas with the most vulnerable populations in Harrisonburg City. We employ data analysis and machine learning techniques to make our predictions.
    摘要 哈里逊堡公共交通部门(HDPT)想要利用数据提高其运营效率和效果。我们构建了两个供应和需求模型,帮助公共交通部门识别服务中的缺陷。这些模型考虑了许多变量,包括公共交通部门向联邦政府报告的方式和哈里逊城市最为易受影响的地区。我们使用数据分析和机器学习技术进行预测。

ELRA: Exponential learning rate adaption gradient descent optimization method

  • paper_url: http://arxiv.org/abs/2309.06274
  • repo_url: None
  • paper_authors: Alexander Kleinsorge, Stefan Kupper, Alexander Fauck, Felix Rothe
  • For: 本研究提出了一种新的、快速(指数速率适应)、无超参数(hyper-parameter-free)的基于梯度的优化算法。
  • Methods: 该方法通过情境感知来调整学习率 α,主要追求相邻梯度彼此正交。该方法成功率高、收敛快,不需手动调整超参数,因此更加通用;它可应用于任意维度 n 的问题,且计算量仅随问题维度线性增长(O(n) 量级)。
  • Results: 在 MNIST 数据集上与现有优化器进行比较,取得了出色的性能。作者认为,这将为梯度下降优化开辟一个全新的研究方向。
    Abstract We present a novel, fast (exponential rate adaption), ab initio (hyper-parameter-free) gradient based optimizer algorithm. The main idea of the method is to adapt the learning rate $\alpha$ by situational awareness, mainly striving for orthogonal neighboring gradients. The method has a high success and fast convergence rate and does not rely on hand-tuned parameters giving it greater universality. It can be applied to problems of any dimensions n and scales only linearly (of order O(n)) with the dimension of the problem. It optimizes convex and non-convex continuous landscapes providing some kind of gradient. In contrast to the Ada-family (AdaGrad, AdaMax, AdaDelta, Adam, etc.) the method is rotation invariant: optimization path and performance are independent of coordinate choices. The impressive performance is demonstrated by extensive experiments on the MNIST benchmark data-set against state-of-the-art optimizers. We name this new class of optimizers after its core idea Exponential Learning Rate Adaption - ELRA. We present it in two variants c2min and p2min with slightly different control. The authors strongly believe that ELRA will open a completely new research direction for gradient descent optimize.
    摘要 我们提出了一种新的、快速(指数速率适应)、无超参数的基于梯度的优化算法。其核心思想是通过情境感知来调整学习率 $\alpha$,主要追求相邻梯度彼此正交。该方法具有较高的成功率和较快的收敛速度,不依赖手动调整的超参数,因此更加通用;它可以应用于任意维度 n 的问题,且计算量仅随维度线性增长(O(n) 量级),能够优化提供某种梯度的凸与非凸连续地形。与 Ada 家族(AdaGrad、AdaMax、AdaDelta、Adam 等)不同,该方法具有旋转不变性:优化路径和性能与坐标选择无关。我们在 MNIST 基准数据集上与最先进的优化器进行了大量实验,展示了其出色的性能。我们以其核心思想将这一类新优化器命名为指数学习率适应(ELRA),并给出了控制方式略有不同的两个变体 c2min 和 p2min。作者们相信,ELRA 将为梯度下降优化开辟一个全新的研究方向。
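
Since the abstract only states the principle (adapt the learning rate from the geometry of neighbouring gradients), the sketch below is a guess at a bare-bones variant: grow the step exponentially while successive gradients stay aligned and shrink it when they oppose. The actual c2min/p2min control laws will differ, and the growth/shrink factors here are arbitrary.

```python
import numpy as np

def elra_like_descent(grad, x0, lr=1e-3, steps=200, grow=1.3, shrink=0.5):
    x, g_prev = np.asarray(x0, dtype=float), None
    for _ in range(steps):
        g = grad(x)
        if g_prev is not None:
            cos = g @ g_prev / (np.linalg.norm(g) * np.linalg.norm(g_prev) + 1e-12)
            lr *= grow if cos > 0 else shrink      # exponential learning-rate adaption
        x -= lr * g
        g_prev = g
    return x

# toy quadratic with minimum at (1, -2)
print(elra_like_descent(lambda x: 2 * (x - np.array([1.0, -2.0])), [10.0, 10.0]))
```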

ssVERDICT: Self-Supervised VERDICT-MRI for Enhanced Prostate Tumour Characterisation

  • paper_url: http://arxiv.org/abs/2309.06268
  • repo_url: None
  • paper_authors: Snigdha Sen, Saurabh Singh, Hayley Pye, Caroline Moore, Hayley Whitaker, Shonit Punwani, David Atkinson, Eleftheria Panagiotaki, Paddy J. Slator
  • for: 用于前列腺癌(PCa)的诊断,特别是利用扩散 MRI(dMRI)估计细胞大小等微结构信息。
  • methods: 使用自监督深度神经网络(DNN)来替代计算代价高昂的非线性最小二乘(NLLS)拟合,而不需要单独的训练数据集。
  • results: 与基线方法(NLLS 和有监督 DNN)相比具有更高的估计精度和更小的偏差,并在区分良性前列腺组织与癌变组织时达到更高的置信水平。
    Abstract MRI is increasingly being used in the diagnosis of prostate cancer (PCa), with diffusion MRI (dMRI) playing an integral role. When combined with computational models, dMRI can estimate microstructural information such as cell size. Conventionally, such models are fit with a nonlinear least squares (NLLS) curve fitting approach, associated with a high computational cost. Supervised deep neural networks (DNNs) are an efficient alternative, however their performance is significantly affected by the underlying distribution of the synthetic training data. Self-supervised learning is an attractive alternative, where instead of using a separate training dataset, the network learns the features of the input data itself. This approach has only been applied to fitting of trivial dMRI models thus far. Here, we introduce a self-supervised DNN to estimate the parameters of the VERDICT (Vascular, Extracellular and Restricted DIffusion for Cytometry in Tumours) model for prostate. We demonstrate, for the first time, fitting of a complex three-compartment biophysical model with machine learning without the requirement of explicit training labels. We compare the estimation performance to baseline NLLS and supervised DNN methods, observing improvement in estimation accuracy and reduction in bias with respect to ground truth values. Our approach also achieves a higher confidence level for discrimination between cancerous and benign prostate tissue in comparison to the other methods on a dataset of 20 PCa patients, indicating potential for accurate tumour characterisation.
    摘要 MRI 越来越多地被用于前列腺癌(PCa)的诊断,其中扩散 MRI(dMRI)起着不可或缺的作用。与计算模型相结合,dMRI 可以估计细胞大小等微结构信息。这类模型通常通过非线性最小二乘(NLLS)曲线拟合来求解,计算代价很高;有监督深度神经网络(DNN)是一种高效的替代方案,但其性能在很大程度上取决于合成训练数据的分布。自监督学习则是一种有吸引力的替代方法:网络直接从输入数据本身学习特征,而无需单独的训练数据集,此前该方法只被用于拟合简单的 dMRI 模型。本文提出一种自监督 DNN,用于估计前列腺 VERDICT(Vascular, Extracellular and Restricted DIffusion for Cytometry in Tumours)模型的参数,首次在不需要显式训练标签的情况下用机器学习拟合复杂的三室生物物理模型。与 NLLS 和有监督 DNN 基线相比,我们的方法提高了估计精度并降低了相对于真实值的偏差;在 20 名 PCa 患者的数据集上,它还以更高的置信度区分癌变与良性前列腺组织,显示出准确刻画肿瘤特征的潜力。

Toward Discretization-Consistent Closure Schemes for Large Eddy Simulation Using Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.06260
  • repo_url: https://github.com/flexi-framework/relexi
  • paper_authors: Andrea Beck, Marius Kurz
  • for: developing discretization-consistent closure schemes for implicitly filtered Large Eddy Simulation (LES)
  • methods: Markov decision process with Reinforcement Learning (RL) to adapt the coefficients of LES closure models
  • results: accurate and consistent results, either matching or outperforming classical state-of-the-art models for different discretizations and resolutions
    Abstract We propose a novel method for developing discretization-consistent closure schemes for implicitly filtered Large Eddy Simulation (LES). In implicitly filtered LES, the induced filter kernel, and thus the closure terms, are determined by the properties of the grid and the discretization operator, leading to additional computational subgrid terms that are generally unknown in a priori analysis. Therefore, the task of adapting the coefficients of LES closure models is formulated as a Markov decision process and solved in an a posteriori manner with Reinforcement Learning (RL). This allows to adjust the model to the actual discretization as it also incorporates the interaction between the discretization and the model itself. This optimization framework is applied to both explicit and implicit closure models. An element-local eddy viscosity model is optimized as the explicit model. For the implicit modeling, RL is applied to identify an optimal blending strategy for a hybrid discontinuous Galerkin (DG) and finite volume scheme. All newly derived models achieve accurate and consistent results, either matching or outperforming classical state-of-the-art models for different discretizations and resolutions. Moreover, the explicit model is demonstrated to adapt its distribution of viscosity within the DG elements to the inhomogeneous discretization properties of the operator. In the implicit case, the optimized hybrid scheme renders itself as a viable modeling ansatz that could initiate a new class of high order schemes for compressible turbulence. Overall, the results demonstrate that the proposed RL optimization can provide discretization-consistent closures that could reduce the uncertainty in implicitly filtered LES.
    摘要 我们提出了一种新方法,用于为隐式滤波的大涡模拟(LES)构建与离散格式一致的封闭模型。在隐式滤波 LES 中,诱导的滤波核以及由此产生的封闭项由网格性质和离散算子决定,从而引入通常无法先验获知的额外亚格子计算项。因此,我们将 LES 封闭模型系数的调整问题表述为马尔可夫决策过程,并利用强化学习(RL)以后验方式求解;这样既能使模型适应实际使用的离散格式,也能考虑离散格式与模型之间的相互作用。该优化框架同时应用于显式与隐式封闭模型:显式模型为按单元局部优化的涡粘性模型;隐式建模则利用 RL 为混合间断伽辽金(DG)/有限体积格式寻找最优的混合策略。所有新导出的模型都在不同的离散格式和分辨率下取得了准确且一致的结果,达到或超过经典的最先进模型。此外,显式模型还能根据算子的非均匀离散特性调整其在 DG 单元内的粘性分布;在隐式情形下,优化得到的混合格式本身就是一种可行的建模思路,有望开启一类新的可压缩湍流高阶格式。总体而言,这些结果表明所提出的 RL 优化能够提供与离散格式一致的封闭模型,从而降低隐式滤波 LES 中的不确定性。

Speciality vs Generality: An Empirical Study on Catastrophic Forgetting in Fine-tuning Foundation Models

  • paper_url: http://arxiv.org/abs/2309.06256
  • repo_url: None
  • paper_authors: Yong Lin, Lu Tan, Hangyu Lin, Zeming Zheng, Renjie Pi, Jipeng Zhang, Shizhe Diao, Haoxiang Wang, Han Zhao, Yuan Yao, Tong Zhang
  • for: 本研究旨在探讨基础模型在精化过程中的特性和通用性之间的质量trade-off,以及如何使用多种正则化方法来解决这个问题。
  • methods: 本研究使用了多种正则化方法,包括 continual learning 和 Wise-FT 方法,以mitigate the loss of generality 在精化过程中。
  • results: 研究发现,使用 Wise-FT 方法可以最 effectively 平衡特性和通用性,并且在多种任务和分布上表现最佳。
    Abstract Foundation models, including Vision Language Models (VLMs) and Large Language Models (LLMs), possess the $generality$ to handle diverse distributions and tasks, which stems from their extensive pre-training datasets. The fine-tuning of foundation models is a common practice to enhance task performance or align the model's behavior with human expectations, allowing them to gain $speciality$. However, the small datasets used for fine-tuning may not adequately cover the diverse distributions and tasks encountered during pre-training. Consequently, the pursuit of speciality during fine-tuning can lead to a loss of {generality} in the model, which is related to catastrophic forgetting (CF) in deep learning. In this study, we demonstrate this phenomenon in both VLMs and LLMs. For instance, fine-tuning VLMs like CLIP on ImageNet results in a loss of generality in handling diverse distributions, and fine-tuning LLMs like Galactica in the medical domain leads to a loss in following instructions and common sense. To address the trade-off between the speciality and generality, we investigate multiple regularization methods from continual learning, the weight averaging method (Wise-FT) from out-of-distributional (OOD) generalization, which interpolates parameters between pre-trained and fine-tuned models, and parameter-efficient fine-tuning methods like Low-Rank Adaptation (LoRA). Our findings show that both continual learning and Wise-ft methods effectively mitigate the loss of generality, with Wise-FT exhibiting the strongest performance in balancing speciality and generality.
    摘要 基础模型,包括视觉语言模型(VLM)和大型语言模型(LLM),由于其广泛的预训练数据集而具备处理多样化分布和任务的通用性。对基础模型进行微调是提升任务性能或使模型行为符合人类期望的常见做法,从而使其获得专用性。然而,微调所用的小规模数据集可能无法充分覆盖预训练中遇到的多样化分布和任务,因此在微调中追求专用性可能导致模型丧失通用性,这与深度学习中的灾难性遗忘(CF)相关。在本研究中,我们在 VLM 和 LLM 中都展示了这一现象:例如,在 ImageNet 上微调 CLIP 这类 VLM 会损失其处理多样化分布的通用性,而在医疗领域微调 Galactica 这类 LLM 会损失其遵循指令和常识的能力。为了在专用性与通用性之间取得平衡,我们考察了来自持续学习的多种正则化方法、来自分布外(OOD)泛化的权重平均方法 Wise-FT(在预训练模型与微调模型的参数之间进行插值),以及 LoRA(低秩适配)等参数高效微调方法。我们的发现表明,持续学习方法与 Wise-FT 方法都能有效缓解通用性的损失,其中 Wise-FT 在平衡专用性与通用性方面表现最佳。
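
The Wise-FT interpolation referred to above is easy to state in code: every parameter of the final model is a convex combination of the pre-trained and fine-tuned checkpoints. The dictionary-of-arrays representation below is a stand-in for a real model state dict.

```python
import numpy as np

def wise_ft(pretrained, finetuned, alpha=0.5):
    """Interpolate two checkpoints given as dicts of equally shaped arrays."""
    return {name: (1 - alpha) * pretrained[name] + alpha * finetuned[name]
            for name in pretrained}

pre = {"w": np.array([1.0, 2.0]), "b": np.array([0.0])}
fin = {"w": np.array([3.0, 0.0]), "b": np.array([1.0])}
print(wise_ft(pre, fin, alpha=0.3))   # {'w': [1.6, 1.4], 'b': [0.3]}
```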

Rethinking Evaluation Metric for Probability Estimation Models Using Esports Data

  • paper_url: http://arxiv.org/abs/2309.06248
  • repo_url: None
  • paper_authors: Euihyeon Choi, Jooyoung Kim, Wonkyung Lee
  • for: 这个论文是为了评估电子竞技比赛中的胜率估计模型而写的。
  • methods: 这个论文使用了布莱尔分数和预期准确错误(ECE)作为评估胜率估计模型的性能评估指标,并提出了一个新的简单 yet effective的度量标准 called Balance score,该标准具有六种好的属性。
  • results: 经过广泛的 simulation studies 和实际游戏快照数据的评估,提出的 Balance score 显示出了可靠地评估电子竞技比赛中胜率估计模型的性能,并且可以作为一个全面的评估指标来评估总的比赛模型。
    Abstract Probability estimation models play an important role in various fields, such as weather forecasting, recommendation systems, and sports analysis. Among several models estimating probabilities, it is difficult to evaluate which model gives reliable probabilities since the ground-truth probabilities are not available. The win probability estimation model for esports, which calculates the win probability under a certain game state, is also one of the fields being actively studied in probability estimation. However, most of the previous works evaluated their models using accuracy, a metric that only can measure the performance of discrimination. In this work, we firstly investigate the Brier score and the Expected Calibration Error (ECE) as a replacement of accuracy used as a performance evaluation metric for win probability estimation models in esports field. Based on the analysis, we propose a novel metric called Balance score which is a simple yet effective metric in terms of six good properties that probability estimation metric should have. Under the general condition, we also found that the Balance score can be an effective approximation of the true expected calibration error which has been imperfectly approximated by ECE using the binning technique. Extensive evaluations using simulation studies and real game snapshot data demonstrate the promising potential to adopt the proposed metric not only for the win probability estimation model for esports but also for evaluating general probability estimation models.
    摘要 概率估计模型在天气预报、推荐系统和体育分析等诸多领域发挥着重要作用。在多种估计概率的模型中,由于无法获得真实概率,很难评估哪个模型给出的概率更可靠。电子竞技中的胜率估计模型(在给定比赛状态下计算获胜概率)也是概率估计中被积极研究的方向之一,但以往的大多数工作仅使用准确率来评估模型,而准确率只能衡量判别性能。在本工作中,我们首先考察用 Brier 分数和期望校准误差(ECE)替代准确率,作为电竞领域胜率估计模型的性能评估指标;在此分析基础上,我们提出了一个新的指标 Balance score,它简单而有效,满足概率估计指标应具备的六条良好性质。在一般条件下,我们还发现 Balance score 可以作为真实期望校准误差的有效近似,而后者此前只能通过分箱技术由 ECE 不完美地近似。基于模拟研究和真实比赛快照数据的大量评估表明,所提出的指标不仅适用于电竞胜率估计模型,也有望用于评估一般的概率估计模型。
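
For reference, a minimal implementation of the two baseline metrics discussed above (Brier score and binned ECE) for win-probability predictions; the paper's Balance score is not reproduced here.

```python
import numpy as np

def brier_score(p, y):
    return np.mean((p - y) ** 2)

def expected_calibration_error(p, y, n_bins=10):
    bins = np.minimum((p * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():                      # |mean predicted - mean observed| per bin
            ece += mask.mean() * abs(p[mask].mean() - y[mask].mean())
    return ece

rng = np.random.default_rng(0)
p = rng.uniform(size=10000)                       # predicted win probabilities
y = (rng.uniform(size=10000) < p).astype(float)   # outcomes consistent with p
print(brier_score(p, y), expected_calibration_error(p, y))   # ECE should be near zero
```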

Consistency and adaptivity are complementary targets for the validation of variance-based uncertainty quantification metrics in machine learning regression tasks

  • paper_url: http://arxiv.org/abs/2309.06240
  • repo_url: None
  • paper_authors: Pascal Pernot
  • for: 这篇论文主要关注机器学习 uncertainty quantification(UQ)在材料和化学领域的可靠性评估。
  • methods: 论文使用了多种方法来评估UQ的可靠性,包括consistency和adaptivity。
  • results: 论文表明了consistency和adaptivity是补充的验证目标,并且一个good consistency不一定意味着good adaptivity。
    Abstract Reliable uncertainty quantification (UQ) in machine learning (ML) regression tasks is becoming the focus of many studies in materials and chemical science. It is now well understood that average calibration is insufficient, and most studies implement additional methods testing the conditional calibration with respect to uncertainty, i.e. consistency. Consistency is assessed mostly by so-called reliability diagrams. There exists however another way beyond average calibration, which is conditional calibration with respect to input features, i.e. adaptivity. In practice, adaptivity is the main concern of the final users of a ML-UQ method, seeking for the reliability of predictions and uncertainties for any point in features space. This article aims to show that consistency and adaptivity are complementary validation targets, and that a good consistency does not imply a good adaptivity. Adapted validation methods are proposed and illustrated on a representative example.
    摘要 通用不确定评估(UQ)在机器学习(ML)回归任务中的可靠性已成为许多材料和化学科学研究的焦点。现已经明确,平均调整不足,大多数研究采用附加方法测试受到不确定性的Conditional calibration,即一致性。一致性通常通过所谓的可靠性图表示。然而,还有另一种超出平均调整的方法,即基于输入特征的 Conditional calibration,即适应性。在实践中,适应性是最终用户的ML-UQ方法可靠性预测和不确定性的首要关心,寻求任何特征空间中的可靠性和不确定性。本文目标表明了一致性和适应性是补充 validate 目标,并且一个好的一致性不一定意味着一个好的适应性。适应性验证方法被提出并在一个代表性的示例中 ilustrated。
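
The distinction can be made concrete with a z-score check binned two ways, as sketched below on synthetic data: an average-calibrated but non-adaptive uncertainty model looks fine when binned by its own predicted uncertainty (consistency) yet fails when binned by the input feature (adaptivity). The statistics and data here are illustrative, not the paper's validation procedure.

```python
import numpy as np

def binned_mean_zsq(errors, uncertainties, key, n_bins=5):
    """Mean squared z-score per quantile bin of `key`; calibrated models give ~1."""
    z2 = (errors / uncertainties) ** 2
    edges = np.quantile(key, np.linspace(0, 1, n_bins + 1))
    idx = np.clip(np.digitize(key, edges[1:-1]), 0, n_bins - 1)
    return np.array([z2[idx == b].mean() for b in range(n_bins)])

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 5000)                       # input feature
sigma_true = np.where(x > 0, 0.5, 0.1)             # heteroscedastic noise
errors = rng.normal(0.0, sigma_true)
u_pred = np.sqrt(np.mean(sigma_true**2)) + 0.01 * rng.uniform(size=x.size)  # average calibration only

print(binned_mean_zsq(errors, u_pred, key=u_pred))   # consistency bins: all close to 1
print(binned_mean_zsq(errors, u_pred, key=x))        # adaptivity bins: far from 1 on both sides
```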

Risk-Aware Reinforcement Learning through Optimal Transport Theory

  • paper_url: http://arxiv.org/abs/2309.06239
  • repo_url: None
  • paper_authors: Ali Baheri
  • for: This paper is written for researchers and practitioners interested in developing risk-aware reinforcement learning (RL) algorithms that can operate in dynamic and uncertain environments.
  • methods: The paper integrates Optimal Transport (OT) theory with RL to create a risk-aware framework. The approach modifies the objective function to ensure that the resulting policy maximizes expected rewards while respecting risk constraints dictated by OT distances between state visitation distributions and desired risk profiles.
  • results: The paper offers a formulation that elevates risk considerations alongside conventional RL objectives, and provides a series of theorems that map the relationships between risk distributions, optimal value functions, and policy behaviors. The work demonstrates a promising direction for RL, ensuring a balanced fusion of reward pursuit and risk awareness.
    Abstract In the dynamic and uncertain environments where reinforcement learning (RL) operates, risk management becomes a crucial factor in ensuring reliable decision-making. Traditional RL approaches, while effective in reward optimization, often overlook the landscape of potential risks. In response, this paper pioneers the integration of Optimal Transport (OT) theory with RL to create a risk-aware framework. Our approach modifies the objective function, ensuring that the resulting policy not only maximizes expected rewards but also respects risk constraints dictated by OT distances between state visitation distributions and the desired risk profiles. By leveraging the mathematical precision of OT, we offer a formulation that elevates risk considerations alongside conventional RL objectives. Our contributions are substantiated with a series of theorems, mapping the relationships between risk distributions, optimal value functions, and policy behaviors. Through the lens of OT, this work illuminates a promising direction for RL, ensuring a balanced fusion of reward pursuit and risk awareness.
    摘要 在强化学习(RL)所处的动态且不确定的环境中,风险管理是确保决策可靠性的关键因素。传统的 RL 方法虽然能够有效优化奖励,却往往忽视潜在风险的全貌。为此,本文率先将最优传输(OT)理论与 RL 相结合,构建一个风险感知框架。我们的方法修改了目标函数,使所得策略在最大化期望奖励的同时,满足由状态访问分布与期望风险画像之间的 OT 距离所刻画的风险约束。借助 OT 的数学精确性,我们给出了一种将风险考量与传统 RL 目标并重的形式化表述,并通过一系列定理刻画了风险分布、最优值函数与策略行为之间的关系。透过 OT 的视角,这项工作为 RL 指明了一个有前景的方向,实现奖励追求与风险意识的平衡融合。
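
A toy version of the modified objective can be written directly with an off-the-shelf optimal-transport distance, as below; the λ weighting, the 1-D state abstraction and the Gaussian profiles are assumptions made for illustration, not the paper's formulation.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def risk_aware_objective(rewards, visited_states, target_profile, lam=1.0):
    """Expected reward minus an OT penalty on the state-visitation distribution."""
    return np.mean(rewards) - lam * wasserstein_distance(visited_states, target_profile)

rng = np.random.default_rng(0)
visited = rng.normal(2.0, 1.5, 1000)        # states visited by the current policy
desired = rng.normal(0.0, 0.5, 1000)        # low-risk reference profile
print(risk_aware_objective(rng.uniform(size=1000), visited, desired, lam=0.1))
```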

A Consistent and Scalable Algorithm for Best Subset Selection in Single Index Models

  • paper_url: http://arxiv.org/abs/2309.06230
  • repo_url: None
  • paper_authors: Borui Tang, Jin Zhu, Junxian Zhu, Xueqin Wang, Heping Zhang
  • for: 这篇论文旨在提出一种可扩展的算法,用于在高维数据中选择最佳子集,以提高模型选择和预测性能。
  • methods: 该论文提出了一种新的算法,基于通信簇信息理论和概率评估,可以在高维数据中直接解决最佳子集选择问题,并且可以确定支持大小。
  • results: simulations 表明,该算法不仅可以快速计算出最佳子集,而且可以准确地回归最佳子集,并且不需要进行模型选择调整。
    Abstract Analysis of high-dimensional data has led to increased interest in both single index models (SIMs) and best subset selection. SIMs provide an interpretable and flexible modeling framework for high-dimensional data, while best subset selection aims to find a sparse model from a large set of predictors. However, best subset selection in high-dimensional models is known to be computationally intractable. Existing methods tend to relax the selection, but do not yield the best subset solution. In this paper, we directly tackle the intractability by proposing the first provably scalable algorithm for best subset selection in high-dimensional SIMs. Our algorithmic solution enjoys the subset selection consistency and has the oracle property with a high probability. The algorithm comprises a generalized information criterion to determine the support size of the regression coefficients, eliminating the model selection tuning. Moreover, our method does not assume an error distribution or a specific link function and hence is flexible to apply. Extensive simulation results demonstrate that our method is not only computationally efficient but also able to exactly recover the best subset in various settings (e.g., linear regression, Poisson regression, heteroscedastic models).
    摘要 高维数据分析引发了人们对单指标模型(SIM)与最佳子集选择的浓厚兴趣。SIM 为高维数据提供了可解释且灵活的建模框架,而最佳子集选择旨在从大量预测变量中找到一个稀疏模型。然而,高维模型中的最佳子集选择在计算上是不可行的,现有方法通常对选择过程进行松弛,因而无法得到真正的最佳子集解。在本文中,我们直接应对这一不可行性,提出了首个在高维 SIM 中具有可证明可扩展性的最佳子集选择算法。我们的算法解具有子集选择一致性,并以高概率满足 oracle 性质。该算法利用一个广义信息准则来确定回归系数的支撑集大小,从而省去了模型选择调参;此外,它不假设误差分布或特定的联系函数,因而应用灵活。大量模拟结果表明,我们的方法不仅计算高效,而且能在多种设定(如线性回归、泊松回归、异方差模型)中准确恢复最佳子集。
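
For contrast with the scalable algorithm proposed in the paper, here is the brute-force baseline it is meant to avoid: exhaustive best subset selection for a plain linear model scored with a BIC-style criterion. The paper's method instead handles single index models and avoids this combinatorial search entirely.

```python
import numpy as np
from itertools import combinations

def best_subset(X, y, max_size=3):
    """Exhaustively score every subset up to max_size with a BIC-like criterion."""
    n, p = X.shape
    best = (np.inf, ())
    for k in range(1, max_size + 1):
        for S in combinations(range(p), k):
            beta, *_ = np.linalg.lstsq(X[:, list(S)], y, rcond=None)
            rss = np.sum((y - X[:, list(S)] @ beta) ** 2)
            bic = n * np.log(rss / n) + k * np.log(n)
            best = min(best, (bic, S))
    return best[1]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 2 * X[:, 1] - 3 * X[:, 6] + 0.1 * rng.normal(size=200)
print(best_subset(X, y))   # expected: (1, 6)
```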

Long-term drought prediction using deep neural networks based on geospatial weather data

  • paper_url: http://arxiv.org/abs/2309.06212
  • repo_url: None
  • paper_authors: Vsevolod Grabar, Alexander Marusov, Yury Maximov, Nazar Sotiriadi, Alexander Bulkin, Alexey Zaytsev
  • for: 预测特定地区发生旱情的概率,以支持农业决策。
  • methods: 使用various spatiotemporal neural networks,包括Convolutional LSTM和transformer模型,以预测旱情Intensity。
  • results: 比基线模型更高的ROC AUC scores,表明Convolutional LSTM和transformer模型具有更高的预测精度。
    Abstract The accurate prediction of drought probability in specific regions is crucial for informed decision-making in agricultural practices. It is important to make predictions one year in advance, particularly for long-term decisions. However, forecasting this probability presents challenges due to the complex interplay of various factors within the region of interest and neighboring areas. In this study, we propose an end-to-end solution to address this issue based on various spatiotemporal neural networks. The models considered focus on predicting the drought intensity based on the Palmer Drought Severity Index (PDSI) for subregions of interest, leveraging intrinsic factors and insights from climate models to enhance drought predictions. Comparative evaluations demonstrate the superior accuracy of Convolutional LSTM (ConvLSTM) and transformer models compared to baseline gradient boosting and logistic regression solutions. The two former models achieved impressive ROC AUC scores from 0.90 to 0.70 for forecast horizons from one to six months, outperforming baseline models. The transformer showed superiority for shorter horizons, while ConvLSTM did so for longer horizons. Thus, we recommend selecting the models accordingly for long-term drought forecasting. To ensure the broad applicability of the considered models, we conduct extensive validation across regions worldwide, considering different environmental conditions. We also run several ablation and sensitivity studies to challenge our findings and provide additional information on how to solve the problem.
    摘要 “精准预测特定地区的旱情机会是农业实践中重要的决策依据。特别是在长期决策中,一年前的预测非常重要。然而,预测这些机会受到当地和邻近地区的复杂因素之间的互动所困扰。在这个研究中,我们提出了一个终端解决方案,基于不同的时空神经网络。我们考虑的模型集中心于预测旱情强度,基于Palmer旱情严重指数(PDSI)的子区域,利用自然因素和气候模型的内在知识来增强旱情预测。比较评估显示,Convolutional LSTM(ConvLSTM)和transformer模型在基于梯度提升和逻辑回传模型的比较下表现出更高的ROC AUC分数,分别在1至6个月的预测时间范围内。transformer模型在短期预测中表现出色,而ConvLSTM模型在长期预测中表现出色。因此,我们建议在长期旱情预测中选择这两种模型。为确保考虑的模型在不同环境下具有广泛的应用性,我们进行了广泛的验证,考虑了不同的环境条件。我们还进行了多个ablation和敏感性研究,以提供额外的信息和解决方案。”

Optimization Guarantees of Unfolded ISTA and ADMM Networks With Smooth Soft-Thresholding

  • paper_url: http://arxiv.org/abs/2309.06195
  • repo_url: None
  • paper_authors: Shaik Basheeruddin Shah, Pradyumna Pradhan, Wei Pu, Ramunaidu Randhi, Miguel R. D. Rodrigues, Yonina C. Eldar
  • for: 本研究探讨了线性逆问题的解决方法,具体来说是使用iterative soft-thresholding algorithm (LISTA)和alternating direction method of multipliers compressive sensing network (ADMM-CSNet)来有效地Addressing these problems.
  • methods: 本文使用了finite-layer unfolded networks,如LISTA和ADMM-CSNet,以及smooth soft-thresholding nonlinearity。
  • results: 本研究表明,在over-parameterized(OP) regime下,通过利用一种修改后的Polyak-Lojasiewicz(PL*)条件,可以确保 Training loss减少到 Near-zero 的情况,并且存在global minimum和抽象减少从初始化点使用梯度下降方法。此外,我们还证明了,随着网络宽度的增加, unfolded networks 的阈值会增加,而 FFNN 的阈值则会减少。
    Abstract Solving linear inverse problems plays a crucial role in numerous applications. Algorithm unfolding based, model-aware data-driven approaches have gained significant attention for effectively addressing these problems. Learned iterative soft-thresholding algorithm (LISTA) and alternating direction method of multipliers compressive sensing network (ADMM-CSNet) are two widely used such approaches, based on ISTA and ADMM algorithms, respectively. In this work, we study optimization guarantees, i.e., achieving near-zero training loss with the increase in the number of learning epochs, for finite-layer unfolded networks such as LISTA and ADMM-CSNet with smooth soft-thresholding in an over-parameterized (OP) regime. We achieve this by leveraging a modified version of the Polyak-Lojasiewicz, denoted PL$^*$, condition. Satisfying the PL$^*$ condition within a specific region of the loss landscape ensures the existence of a global minimum and exponential convergence from initialization using gradient descent based methods. Hence, we provide conditions, in terms of the network width and the number of training samples, on these unfolded networks for the PL$^*$ condition to hold. We achieve this by deriving the Hessian spectral norm of these networks. Additionally, we show that the threshold on the number of training samples increases with the increase in the network width. Furthermore, we compare the threshold on training samples of unfolded networks with that of a standard fully-connected feed-forward network (FFNN) with smooth soft-thresholding non-linearity. We prove that unfolded networks have a higher threshold value than FFNN. Consequently, one can expect a better expected error for unfolded networks than FFNN.
    摘要 求解线性逆问题在众多应用中发挥着关键作用。基于算法展开的、模型感知的数据驱动方法在解决这类问题上获得了广泛关注;学习迭代软阈值算法(LISTA)与基于交替方向乘子法的压缩感知网络(ADMM-CSNet)是其中两种被广泛使用的方法,分别基于 ISTA 和 ADMM 算法。在这项工作中,我们研究在过参数化(OP)情形下,采用光滑软阈值的有限层展开网络(如 LISTA 和 ADMM-CSNet)的优化保证,即随着训练轮数的增加,训练损失趋近于零。我们通过利用修改后的 Polyak-Lojasiewicz 条件(记作 PL$^*$)来实现这一点:在损失地形的特定区域内满足 PL$^*$ 条件,即可保证全局最小值的存在,并保证基于梯度下降的方法从初始化点出发呈指数收敛。据此,我们给出了使 PL$^*$ 条件成立所需的网络宽度和训练样本数量方面的条件,这通过推导这些网络的 Hessian 谱范数得到。此外,我们证明训练样本数量的阈值随网络宽度的增加而增大,并将展开网络的训练样本阈值与采用光滑软阈值非线性的标准全连接前馈网络(FFNN)进行比较,证明展开网络的阈值更高,因此可以期望展开网络获得比 FFNN 更好的期望误差。
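
The structure being analysed is easiest to see in code: each layer of an unfolded network applies a linear map followed by soft-thresholding. The sketch below fixes the layer weights to their classical ISTA values and uses the non-smooth soft-threshold for simplicity, so it shows the unrolling itself, not the learned or smoothed variant studied in the paper.

```python
import numpy as np

def soft(x, thr):
    return np.sign(x) * np.maximum(np.abs(x) - thr, 0.0)

def unfolded_ista(A, y, lam=0.05, n_layers=200):
    L = np.linalg.norm(A, 2) ** 2               # Lipschitz constant of the data term
    W1, W2 = A.T / L, np.eye(A.shape[1]) - A.T @ A / L
    x = np.zeros(A.shape[1])
    for _ in range(n_layers):                   # each iteration = one network "layer"
        x = soft(W1 @ y + W2 @ x, lam / L)
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(30, 60)) / np.sqrt(30)
x_true = np.zeros(60)
x_true[[3, 17, 42]] = [1.0, -2.0, 0.5]
x_hat = unfolded_ista(A, A @ x_true)
print(np.flatnonzero(np.abs(x_hat) > 0.1))      # should (approximately) recover {3, 17, 42}
```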

Assessing the Generalization Gap of Learning-Based Speech Enhancement Systems in Noisy and Reverberant Environments

  • paper_url: http://arxiv.org/abs/2309.06183
  • repo_url: None
  • paper_authors: Philippe Gonzalez, Tommy Sonne Alstrøm, Tobias May
  • for: 这项研究旨在解决基于学习的语音增强系统在不同条件下的泛化问题。
  • methods: 研究引入了一个在测试条件上训练的参照模型,用作测试条件难度的代理,从而将任务难度变化的影响与处理新数据的影响区分开来。
  • results: 研究发现,所有模型在语音失配的情况下性能下降最明显,而通过在多个数据库上训练可以获得良好的噪声与房间泛化;此外,最新的模型在匹配条件下表现最好,但在失配条件下性能大幅下降,甚至可能不如基于 FFNN 的系统。
    Abstract The acoustic variability of noisy and reverberant speech mixtures is influenced by multiple factors, such as the spectro-temporal characteristics of the target speaker and the interfering noise, the signal-to-noise ratio (SNR) and the room characteristics. This large variability poses a major challenge for learning-based speech enhancement systems, since a mismatch between the training and testing conditions can substantially reduce the performance of the system. Generalization to unseen conditions is typically assessed by testing the system with a new speech, noise or binaural room impulse response (BRIR) database different from the one used during training. However, the difficulty of the speech enhancement task can change across databases, which can substantially influence the results. The present study introduces a generalization assessment framework that uses a reference model trained on the test condition, such that it can be used as a proxy for the difficulty of the test condition. This allows to disentangle the effect of the change in task difficulty from the effect of dealing with new data, and thus to define a new measure of generalization performance termed the generalization gap. The procedure is repeated in a cross-validation fashion by cycling through multiple speech, noise, and BRIR databases to accurately estimate the generalization gap. The proposed framework is applied to evaluate the generalization potential of a feedforward neural network (FFNN), Conv-TasNet, DCCRN and MANNER. We find that for all models, the performance degrades the most in speech mismatches, while good noise and room generalization can be achieved by training on multiple databases. Moreover, while recent models show higher performance in matched conditions, their performance substantially decreases in mismatched conditions and can become inferior to that of the FFNN-based system.
    摘要 难以分辨的声音混响样本中的声音特征是多种因素的影响,包括目标说话人的spectro-temporal特征、干扰噪声和room特性。这种大量的变化对学习型声音提高系统来说是一个主要挑战,因为在测试和训练条件不同时,系统的性能可能会下降substantially。通常来衡量系统的普适性是通过在测试数据集中测试系统,并对其进行cross-validation验证。然而,任务难度可能会在不同的数据集中发生变化,这会对结果产生很大的影响。本研究提出了一种普适性评估框架,通过使用测试条件下的参考模型,以便用其作为测试条件的困难度的代理。这 позволяет分解把握新数据的效果与把握任务难度的效果分开,并定义一个新的普适性度量,称为普适差(generalization gap)。该框架在多个语音、噪声和BRIR数据集中重复应用,以准确估计普适性差。研究发现,对所有模型来说,性能最大程度下降是在语音匹配中,而好的噪声和room普适性可以通过训练多个数据集来实现。此外,最新的模型在匹配条件下表现出色,但在匹配不符条件下表现很差,可能变成较为老的FFNN基于系统的性能。

Efficient Memory Management for Large Language Model Serving with PagedAttention

  • paper_url: http://arxiv.org/abs/2309.06180
  • repo_url: https://github.com/vllm-project/vllm
  • paper_authors: Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, Ion Stoica
  • for: 要实现大语言模型(LLM)的高吞吐服务,需要一次批处理足够多的请求;但现有系统因 KV cache 内存的碎片化和冗余复制而限制了批大小。
  • methods: 我们提出了基于经典虚拟内存和分页技术的注意力算法——PagedAttention,并在其基础上构建了vLLM,一个实现了近于零废弃的KV cache内存和请求之间共享的LLM服务系统。
  • results: 我们的评估结果显示,vLLM可以提高受欢迎的LLM的吞吐量,比现状态之系统(如FasterTransformer和Orca)高出2-4倍,同时保持同样的响应时间。这种改进更加明显地出现在更长的序列、更大的模型和更复杂的解码算法中。vLLM的源代码可以在https://github.com/vllm-project/vllm上获取。
    Abstract High throughput serving of large language models (LLMs) requires batching sufficiently many requests at a time. However, existing systems struggle because the key-value cache (KV cache) memory for each request is huge and grows and shrinks dynamically. When managed inefficiently, this memory can be significantly wasted by fragmentation and redundant duplication, limiting the batch size. To address this problem, we propose PagedAttention, an attention algorithm inspired by the classical virtual memory and paging techniques in operating systems. On top of it, we build vLLM, an LLM serving system that achieves (1) near-zero waste in KV cache memory and (2) flexible sharing of KV cache within and across requests to further reduce memory usage. Our evaluations show that vLLM improves the throughput of popular LLMs by 2-4$\times$ with the same level of latency compared to the state-of-the-art systems, such as FasterTransformer and Orca. The improvement is more pronounced with longer sequences, larger models, and more complex decoding algorithms. vLLM's source code is publicly available at https://github.com/vllm-project/vllm
    摘要 要实现大语言模型(LLM)的高吞吐服务,需要一次批处理足够多的请求。然而,现有系统面临的困难在于:每个请求的键值缓存(KV cache)占用的内存巨大,且会动态增长和收缩;管理不当时,这部分内存会因碎片化和冗余复制而被严重浪费,从而限制批大小。为解决这一问题,我们提出了 PagedAttention——一种受操作系统中经典虚拟内存与分页技术启发的注意力算法。在此基础上,我们构建了 vLLM,一个 LLM 服务系统,实现了 (1) KV cache 内存几乎零浪费,以及 (2) 在请求内部和请求之间灵活共享 KV cache 以进一步降低内存占用。评估表明,在相同延迟水平下,vLLM 将主流 LLM 的吞吐量相比 FasterTransformer 和 Orca 等最先进系统提高了 2-4 倍;序列越长、模型越大、解码算法越复杂,改进越明显。vLLM 的源代码可在 https://github.com/vllm-project/vllm 获取。
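
The core bookkeeping idea, block tables mapping logical token positions to physical cache blocks allocated on demand, can be sketched in a few lines; this is only an illustration of the data structure, not vLLM's CUDA/DTensor implementation.

```python
class PagedKVCache:
    """Toy block table: KV slots live in fixed-size blocks drawn from a shared pool."""
    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))      # pool of physical blocks
        self.block_tables = {}                          # seq_id -> list of block ids
        self.lengths = {}                               # seq_id -> number of tokens

    def append(self, seq_id):
        """Reserve a slot for one new token; returns (physical_block, offset)."""
        n = self.lengths.get(seq_id, 0)
        table = self.block_tables.setdefault(seq_id, [])
        if n % self.block_size == 0:                    # current block full: allocate lazily
            table.append(self.free_blocks.pop())
        self.lengths[seq_id] = n + 1
        return table[n // self.block_size], n % self.block_size

    def free(self, seq_id):
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

cache = PagedKVCache(num_blocks=8, block_size=4)
print([cache.append("req-0") for _ in range(6)])   # spills into a second block after 4 tokens
cache.free("req-0")                                # blocks return to the pool for reuse
```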

Accelerating Edge AI with Morpher: An Integrated Design, Compilation and Simulation Framework for CGRAs

  • paper_url: http://arxiv.org/abs/2309.06127
  • repo_url: https://github.com/ecolab-nus/morpher-v2
  • paper_authors: Dhananjaya Wijerathne, Zhaoying Li, Tulika Mitra
  • for: 这个论文旨在探讨基于CGRA的粗粒度可重配置扩展(Coarse-Grained Reconfigurable Arrays)在边缘计算中的应用潜力,以及Morpher框架如何自动将AI应用程序核心编译到用户定义的CGRA架构上,并验证其功能。
  • methods: 这篇论文使用了Morpher框架,包括特制的编译器、模拟器、加速器合成和验证框架,来探讨CGRA在边缘计算中的灵活性和高效性。
  • results: 该论文表明,Morpher框架可以自动将AI应用程序核心编译到用户定义的CGRA架构上,并验证其功能。这些结果表明CGRA在边缘计算中的应用潜力,以及Morpher框架的可靠性和灵活性。
    Abstract Coarse-Grained Reconfigurable Arrays (CGRAs) hold great promise as power-efficient edge accelerator, offering versatility beyond AI applications. Morpher, an open-source, architecture-adaptive CGRA design framework, is specifically designed to explore the vast design space of CGRAs. The comprehensive ecosystem of Morpher includes a tailored compiler, simulator, accelerator synthesis, and validation framework. This study provides an overview of Morpher, highlighting its capabilities in automatically compiling AI application kernels onto user-defined CGRA architectures and verifying their functionality. Through the Morpher framework, the versatility of CGRAs is harnessed to facilitate efficient compilation and verification of edge AI applications, covering important kernels representative of a wide range of embedded AI workloads. Morpher is available online at https://github.com/ecolab-nus/morpher-v2.
    摘要 粗粒度可重构阵列(CGRA)作为低功耗的边缘加速器极具前景,其通用性不仅限于 AI 应用。Morpher 是一个开源的、架构自适应的 CGRA 设计框架,专门用于探索 CGRA 庞大的设计空间,其完整生态系统包括定制的编译器、模拟器、加速器综合与验证框架。本文概述了 Morpher,重点介绍其将 AI 应用核自动编译到用户自定义 CGRA 架构并验证其功能的能力。借助 Morpher 框架,CGRA 的灵活性被用于高效地编译和验证边缘 AI 应用,覆盖了代表多种嵌入式 AI 工作负载的重要计算核。Morpher 可在 https://github.com/ecolab-nus/morpher-v2 获取。

A robust synthetic data generation framework for machine learning in High-Resolution Transmission Electron Microscopy (HRTEM)

  • paper_url: http://arxiv.org/abs/2309.06122
  • repo_url: None
  • paper_authors: Luis Rangel DaCosta, Katherine Sytwu, Catherine Groschner, Mary Scott
  • for: 这项研究的目的是开发高精度自动分析工具,用于 characterizing nanomaterials。
  • methods: 这项研究使用了机器学习技术,包括 neural networks,来开发高精度自动分析工具。
  • results: 研究人员通过使用 Construction Zone 和 simulated databases,成功地实现了高精度的自动分析工具,并在多个实验室 benchmark 上达到了州际精度。
    Abstract Machine learning techniques are attractive options for developing highly-accurate automated analysis tools for nanomaterials characterization, including high-resolution transmission electron microscopy (HRTEM). However, successfully implementing such machine learning tools can be difficult due to the challenges in procuring sufficiently large, high-quality training datasets from experiments. In this work, we introduce Construction Zone, a Python package for rapidly generating complex nanoscale atomic structures, and develop an end-to-end workflow for creating large simulated databases for training neural networks. Construction Zone enables fast, systematic sampling of realistic nanomaterial structures, and can be used as a random structure generator for simulated databases, which is important for generating large, diverse synthetic datasets. Using HRTEM imaging as an example, we train a series of neural networks on various subsets of our simulated databases to segment nanoparticles and holistically study the data curation process to understand how various aspects of the curated simulated data -- including simulation fidelity, the distribution of atomic structures, and the distribution of imaging conditions -- affect model performance across several experimental benchmarks. Using our results, we are able to achieve state-of-the-art segmentation performance on experimental HRTEM images of nanoparticles from several experimental benchmarks and, further, we discuss robust strategies for consistently achieving high performance with machine learning in experimental settings using purely synthetic data.
    摘要 机器学习技术是为纳米材料表征(包括高分辨率透射电子显微镜,HRTEM)开发高精度自动分析工具的有力选择。然而,由于难以从实验中获得足够大且高质量的训练数据集,成功部署这类机器学习工具往往很困难。在这项工作中,我们介绍了 Construction Zone——一个用于快速生成复杂纳米尺度原子结构的 Python 包,并开发了用于构建大规模模拟数据库以训练神经网络的端到端工作流程。Construction Zone 能够快速、系统地采样真实的纳米材料结构,可作为模拟数据库的随机结构生成器,这对于生成大规模、多样化的合成数据集十分重要。以 HRTEM 成像为例,我们在模拟数据库的不同子集上训练了一系列用于纳米颗粒分割的神经网络,并整体研究了数据整理过程,以理解模拟数据的各方面特性(包括模拟保真度、原子结构分布以及成像条件分布)如何影响模型在多个实验基准上的性能。基于这些结果,我们在多个实验基准的纳米颗粒 HRTEM 图像上实现了最先进的分割性能,并进一步讨论了仅使用合成数据便能在实验环境中稳定获得高性能的机器学习策略。

Overview of Human Activity Recognition Using Sensor Data

  • paper_url: http://arxiv.org/abs/2309.07170
  • repo_url: None
  • paper_authors: Rebeen Ali Hamad, Wai Lok Woo, Bo Wei, Longzhi Yang
  • for: 这篇论文主要是为了探讨人活动识别(HAR)领域的最新进展和挑战。
  • methods: 本论文使用了多种感知器和深度学习技术来探讨人活动识别。
  • results: 论文提出了一些关键应用场景,如家居和办公室自动化、安全监测和医疗保健等,同时也探讨了深度学习技术在HAR领域的应用。
    Abstract Human activity recognition (HAR) is an essential research field that has been used in different applications including home and workplace automation, security and surveillance as well as healthcare. Starting from conventional machine learning methods to the recently developing deep learning techniques and the Internet of things, significant contributions have been shown in the HAR area in the last decade. Even though several review and survey studies have been published, there is a lack of sensor-based HAR overview studies focusing on summarising the usage of wearable sensors and smart home sensors data as well as applications of HAR and deep learning techniques. Hence, we overview sensor-based HAR, discuss several important applications that rely on HAR, and highlight the most common machine learning methods that have been used for HAR. Finally, several challenges of HAR are explored that should be addressed to further improve the robustness of HAR.
    摘要 人类活动识别(HAR)是一个重要的研究领域,在不同的应用中都有广泛的应用,包括家庭和工作场所自动化、安全监测以及医疗保健等。过去十年,在传统的机器学习方法基础上,深度学习技术的发展以及互联网物联网技术的应用,在HAR领域内部维护了重要的贡献。然而,当前的文献综述和评视研究却缺乏关于基于感知器的HAR的概要,包括穿戴式感知器和智能家居感知器数据的使用情况,以及基于HAR和深度学习技术的应用。因此,本文将对感知器基于HAR进行概要,讨论一些重要的依赖于HAR的应用,并高亮一些最常用的机器学习方法,最后探讨HAR面临的一些挑战,以便进一步改善HAR的稳定性。

A General Verification Framework for Dynamical and Control Models via Certificate Synthesis

  • paper_url: http://arxiv.org/abs/2309.06090
  • repo_url: None
  • paper_authors: Alec Edwards, Andrea Peruffo, Alessandro Abate
  • for: 这篇论文主要关注在证书学习中,即指定一个自动或控制模型的行为,并通过函数基本证明其正确性。
  • methods: 该论文提出了一种通用框架,用于编码系统特性和相应的证书,以及一种自动化控制器和证书的生成方法。该方法利用神经网络提供候选控制函数和证书函数,并使用SMT解决器提供正式正确性保证。
  • results: 该论文通过开发一个软件工具,测试了其核心框架的可靠性和灵活性。 Results show that the proposed approach can effectively synthesize controllers and certificates for a wide range of system specifications, and provide a formal guarantee of correctness.
    Abstract An emerging branch of control theory specialises in certificate learning, concerning the specification of a desired (possibly complex) system behaviour for an autonomous or control model, which is then analytically verified by means of a function-based proof. However, the synthesis of controllers abiding by these complex requirements is in general a non-trivial task and may elude the most expert control engineers. This results in a need for automatic techniques that are able to design controllers and to analyse a wide range of elaborate specifications. In this paper, we provide a general framework to encode system specifications and define corresponding certificates, and we present an automated approach to formally synthesise controllers and certificates. Our approach contributes to the broad field of safe learning for control, exploiting the flexibility of neural networks to provide candidate control and certificate functions, whilst using SMT-solvers to offer a formal guarantee of correctness. We test our framework by developing a prototype software tool, and assess its efficacy at verification via control and certificate synthesis over a large and varied suite of benchmarks.
    摘要 控制理论中一个新兴的分支专注于证书学习:为自主或受控模型指定期望的(可能十分复杂的)系统行为,并通过基于函数的证明对其进行解析验证。然而,综合出满足这些复杂要求的控制器通常并非易事,即使是最有经验的控制工程师也可能束手无策,因此需要能够自动设计控制器并分析各种复杂规范的技术。本文提供了一个通用框架来编码系统规范并定义相应的证书,并提出了一种自动化方法来形式化地综合控制器与证书。我们的方法属于安全学习控制这一广阔领域:利用神经网络的灵活性提供候选控制函数与证书函数,同时借助 SMT 求解器给出形式化的正确性保证。我们开发了一个原型软件工具来测试该框架,并在一个规模大且多样的基准集合上通过控制器与证书综合评估了其验证能力。

Information Flow in Graph Neural Networks: A Clinical Triage Use Case

  • paper_url: http://arxiv.org/abs/2309.06081
  • repo_url: None
  • paper_authors: Víctor Valls, Mykhaylo Zayats, Alessandra Pascale
  • for: This paper aims to investigate the effect of embedding information flow within Graph Neural Networks (GNNs) on the prediction of links in Knowledge Graphs (KGs), with a specific use case in clinical triage.
  • methods: The paper proposes a mathematical model that decouples the GNN connectivity from the connectivity of the graph data, and evaluates the performance of GNNs with different connectivity strategies.
  • results: The results show that incorporating domain knowledge into the GNN connectivity leads to better performance than using the same connectivity as the KG or allowing unconstrained embedding propagation. Additionally, the paper finds that negative edges play a crucial role in achieving good predictions, and that using too many GNN layers can degrade performance.
  • for: 这篇论文目标是调查图神经网络(GNNs)中 embedding 信息流动对知识图(KGs)中预测链接的影响,特别是在医疗抢救use case中。
  • methods: 论文提出了一个数学模型,将GNN连接性与图数据连接性分离开来,并评估不同连接策略下GNN的表现。
  • results: 结果表明,基于域知识的GNN连接策略可以在预测链接方面获得更好的性能,而使用同KG连接策略或允许无约 embedding 传播的策略则不如其他策略。此外,论文还发现,负边在预测链接方面扮演着关键的角色,并且使用过多GNN层可能会降低性能。
    Abstract Graph Neural Networks (GNNs) have gained popularity in healthcare and other domains due to their ability to process multi-modal and multi-relational graphs. However, efficient training of GNNs remains challenging, with several open research questions. In this paper, we investigate how the flow of embedding information within GNNs affects the prediction of links in Knowledge Graphs (KGs). Specifically, we propose a mathematical model that decouples the GNN connectivity from the connectivity of the graph data and evaluate the performance of GNNs in a clinical triage use case. Our results demonstrate that incorporating domain knowledge into the GNN connectivity leads to better performance than using the same connectivity as the KG or allowing unconstrained embedding propagation. Moreover, we show that negative edges play a crucial role in achieving good predictions, and that using too many GNN layers can degrade performance.
    摘要 图神经网络(GNN)因能够处理多模态、多关系图而在医疗及其他领域得到广泛应用。然而,高效训练 GNN 仍然具有挑战性,存在若干有待解决的研究问题。在本文中,我们研究嵌入信息在 GNN 内部的流动方式如何影响知识图(KG)中链接的预测。具体而言,我们提出了一个将 GNN 连接结构与图数据连接结构解耦的数学模型,并在一个临床分诊用例中评估 GNN 的性能。结果表明,将领域知识融入 GNN 连接结构所获得的性能优于直接采用与 KG 相同的连接结构或允许不受约束的嵌入传播;此外,我们还证明负边对获得良好预测起着关键作用,而使用过多的 GNN 层会降低性能。

Verifiable Fairness: Privacy-preserving Computation of Fairness for Machine Learning Systems

  • paper_url: http://arxiv.org/abs/2309.06061
  • repo_url: None
  • paper_authors: Ehsan Toreini, Maryam Mehrnezhad, Aad van Moorsel
  • for: 这篇论文目的是提出一种安全、可靠、隐私保护的 Fairness as a Service (FaaS) 协议,用于计算和验证机器学习 (ML) 模型的公平性。
  • methods: 该协议使用密文和零知识证明来保证数据和结果的隐私和有效性。它是模型无关的,可以用来审核任何 ML 模型的公平性。
  • results: 我们实现了 FaaS,并对一个公共数据集进行了实验,证明了它的性能和可行性。
    Abstract Fair machine learning is a thriving and vibrant research topic. In this paper, we propose Fairness as a Service (FaaS), a secure, verifiable and privacy-preserving protocol to computes and verify the fairness of any machine learning (ML) model. In the deisgn of FaaS, the data and outcomes are represented through cryptograms to ensure privacy. Also, zero knowledge proofs guarantee the well-formedness of the cryptograms and underlying data. FaaS is model--agnostic and can support various fairness metrics; hence, it can be used as a service to audit the fairness of any ML model. Our solution requires no trusted third party or private channels for the computation of the fairness metric. The security guarantees and commitments are implemented in a way that every step is securely transparent and verifiable from the start to the end of the process. The cryptograms of all input data are publicly available for everyone, e.g., auditors, social activists and experts, to verify the correctness of the process. We implemented FaaS to investigate performance and demonstrate the successful use of FaaS for a publicly available data set with thousands of entries.
    摘要 公平机器学习是一个蓬勃发展的研究领域。在本文中,我们提出了公平即服务(FaaS),一种安全、可验证且保护隐私的协议,用于计算和验证任何机器学习(ML)模型的公平性。在FaaS的设计中,数据和结果均以密文形式表示,以保护隐私;零知识证明则保证了密文及其底层数据的良构性。FaaS与模型无关,可支持多种公平性度量,因此可以作为一种服务来审计任何ML模型的公平性。我们的方案在计算公平性度量时不需要任何可信第三方或私有信道。安全保证和承诺的实现方式使得从头到尾的每一步都安全透明且可验证。所有输入数据的密文对所有人公开,例如审计者、社会活动人士和专家都可以随时验证整个过程的正确性。我们实现了FaaS以考察其性能,并在一个包含数千条记录的公共数据集上展示了其成功应用。
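The sketch below conveys only the flavour of such an audit: hash-based commitments stand in for the paper's cryptograms (the actual protocol additionally relies on zero-knowledge proofs of well-formedness), and demographic parity difference is one example of a fairness metric a model-agnostic audit could compute. Function names and data layout are assumptions.

```python
# Commit to (protected group, predicted label) records, then compute a fairness metric
# (demographic parity difference) over the committed data. Illustrative only.
import hashlib
import secrets

def commit(record):
    """Publish a hiding commitment to a record; the nonce is kept for later opening."""
    nonce = secrets.token_hex(16)
    digest = hashlib.sha256((repr(record) + nonce).encode()).hexdigest()
    return digest, nonce

def demographic_parity_gap(records):
    """records: iterable of (protected_group in {0,1}, predicted_label in {0,1})."""
    pos = {0: 0, 1: 0}
    tot = {0: 0, 1: 0}
    for group, y_hat in records:
        tot[group] += 1
        pos[group] += int(y_hat == 1)
    rates = {g: pos[g] / max(tot[g], 1) for g in (0, 1)}
    return abs(rates[0] - rates[1])

data = [(0, 1), (0, 0), (1, 1), (1, 1), (0, 1), (1, 0)]
commitments = [commit(r) for r in data]                 # what auditors would see first
print("demographic parity gap:", demographic_parity_gap(data))
```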

Frequency Convergence of Complexon Shift Operators (Extended Version)

  • paper_url: http://arxiv.org/abs/2309.07169
  • repo_url: None
  • paper_authors: Purui Zhang, Xingchao Jian, Feng Ji, Wee Peng Tay, Bihan Wen
  • for: 本文研究了拓扑信号处理在单纯复形(simplicial complex)这类高阶结构上的可迁移性,并使用 graphon 的高阶推广 complexon 来对这些结构建模。
  • methods: 作者受 graphon 移位算子的积分算子形式启发,构造了边缘 complexon 与 complexon 移位算子(CSO),并研究了 CSO 的特征值和特征向量及其与一类新的加权邻接矩阵之间的关系。
  • results: 作者证明了当单纯复形序列收敛到某个 complexon 时,相应 CSO 的特征值收敛到极限 complexon 的特征值,并通过数值实验验证了该结论。这些结果提示了在大型单纯复形或单纯复形序列上进行可迁移学习的可能性,将 graphon 信号处理框架推广到更高阶的结构。
    Abstract Topological Signal Processing (TSP) utilizes simplicial complexes to model structures with higher order than vertices and edges. In this paper, we study the transferability of TSP via a generalized higher-order version of graphon, known as complexon. We recall the notion of a complexon as the limit of a simplicial complex sequence. Inspired by the integral operator form of graphon shift operators, we construct a marginal complexon and complexon shift operator (CSO) according to components of all possible dimensions from the complexon. We investigate the CSO's eigenvalues and eigenvectors, and relate them to a new family of weighted adjacency matrices. We prove that when a simplicial complex sequence converges to a complexon, the eigenvalues of the corresponding CSOs converge to that of the limit complexon. This conclusion is further verified by a numerical experiment. These results hint at learning transferability on large simplicial complexes or simplicial complex sequences, which generalize the graphon signal processing framework.
    摘要 拓扑信号处理(TSP)利用单纯复形来建模比顶点和边更高阶的结构。在本文中,我们通过 graphon 的高阶推广 complexon 来研究 TSP 的可迁移性。我们回顾了 complexon 作为单纯复形序列极限的概念。受 graphon 移位算子的积分算子形式启发,我们依据 complexon 中所有可能维度的分量构造了边缘 complexon 与 complexon 移位算子(CSO)。我们研究了 CSO 的特征值和特征向量,并将其与一类新的加权邻接矩阵联系起来。我们证明了当单纯复形序列收敛到某个 complexon 时,相应 CSO 的特征值收敛到极限 complexon 的特征值,并通过数值实验验证了这一结论。这些结果提示了在大型单纯复形或单纯复形序列上进行可迁移学习的可能性,从而推广了 graphon 信号处理框架。
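For intuition, the sketch below stays at the graphon level rather than working with higher-order simplices (an assumption; the paper's complexon shift operator generalises this construction). It discretises a symmetric kernel at increasing resolution and checks that the leading eigenvalues of the resulting shift matrices stabilise, mirroring the stated spectral convergence.

```python
# Discretise a symmetric kernel W(x, y) on finer and finer grids and watch the leading
# eigenvalues of the scaled shift matrices converge. The kernel is an illustrative choice.
import numpy as np

def shift_matrix(n, kernel):
    grid = (np.arange(n) + 0.5) / n
    X, Y = np.meshgrid(grid, grid)
    return kernel(X, Y) / n          # 1/n scaling approximates the integral operator

kernel = lambda x, y: np.exp(-np.abs(x - y))
for n in (20, 80, 320):
    eig = np.sort(np.linalg.eigvalsh(shift_matrix(n, kernel)))[::-1]
    print(n, np.round(eig[:3], 4))   # leading eigenvalues stabilise as n grows
```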

A Perceptron-based Fine Approximation Technique for Linear Separation

  • paper_url: http://arxiv.org/abs/2309.06049
  • repo_url: None
  • paper_authors: Ákos Hajnal
  • for: 本研究提出了一种新的在线学习方法,旨在找到标注为正或负的数据点之间的分隔平面。
  • methods: 本方法基于Perceptron算法,但在搜索分隔超平面的过程中仅在必要的范围内调整神经元权重。
  • results: 实验结果表明,该方法比Perceptron算法更高效,特别是当数据集规模超过数据维度时。
    Abstract This paper presents a novel online learning method that aims at finding a separator hyperplane between data points labelled as either positive or negative. Since weights and biases of artificial neurons can be directly related to hyperplanes in high-dimensional spaces, the technique is applicable to training perceptron-based binary classifiers in machine learning. In the case of large or imbalanced data sets, the use of analytical or gradient-based solutions can become prohibitive and impractical, whereas heuristics and approximation techniques remain applicable. The proposed method is based on the Perceptron algorithm; however, it tunes neuron weights only to the extent necessary while searching for the separator hyperplane. Owing to an appropriate transformation of the initial data set, we need not consider data labels or the bias term, reducing separability to a one-class classification problem. The presented method has been proven to converge; empirical results show that it can be more efficient than the Perceptron algorithm, especially when the size of the data set exceeds the data dimensionality.
    摘要 本文提出了一种新的在线学习方法,旨在寻找将标注为正或负的数据点分开的分隔超平面。由于人工神经元的权重和偏置可以直接对应于高维空间中的超平面,该技术可用于训练机器学习中基于感知机的二分类器。当数据集规模很大或类别不平衡时,解析解或基于梯度的求解可能变得代价过高且不切实际,此时启发式和近似技术仍然适用。所提方法基于Perceptron算法,但在搜索分隔超平面的过程中仅在必要的范围内调整神经元权重。通过对初始数据集进行适当变换,我们既无需考虑数据标签,也无需考虑偏置项,从而将可分性问题化为单类分类问题。该方法已被证明收敛;实验结果表明,它可以比Perceptron算法更高效,特别是当数据集规模超过数据维度时。
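One standard transformation with the stated effect is to absorb the bias by appending a constant feature and to fold the label into each point (z_i = y_i [x_i, 1]), so that separation reduces to finding w with w·z_i > 0 for every transformed point, a one-class condition. The sketch below combines this transformation with the classic Perceptron update; the paper's exact transformation and weight-tuning rule may differ.

```python
# Find a separating hyperplane on label-folded, bias-augmented data with Perceptron updates.
import numpy as np

def perceptron_one_class(Z, lr=1.0, max_epochs=100):
    w = np.zeros(Z.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for z in Z:
            if w @ z <= 0:        # point not yet on the positive side
                w += lr * z       # classic Perceptron correction
                mistakes += 1
        if mistakes == 0:
            return w              # every point satisfied: separator found
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0.2, 1, -1)       # linearly separable labels
Z = y[:, None] * np.hstack([X, np.ones((len(X), 1))])    # fold labels, absorb bias
w = perceptron_one_class(Z)
print("separator (w1, w2, b):", np.round(w, 3))
```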

Normality Learning-based Graph Anomaly Detection via Multi-Scale Contrastive Learning

  • paper_url: http://arxiv.org/abs/2309.06034
  • repo_url: https://github.com/felixdjc/nlgad
  • paper_authors: Jingcan Duan, Pei Zhang, Siwei Wang, Jingtao Hu, Hu Jin, Jiaxin Zhang, Haifang Zhou
  • for: 本文旨在提出一种基于多尺度对比学习网络的正常性学习图异常检测(GAD)方法,以提升检测性能。
  • methods: 本文利用多尺度对比学习网络来学习正常模式并将其用于GAD。具体而言,首先在不同尺度上用对比网络初始化模型,然后设计了一种有效的混合策略来筛选可靠的正常节点,最后仅以这些可靠的正常节点对模型进行精炼,以学习更准确的正常性估计,从而更好地区分异常节点。
  • results: 实验结果表明,所提方法相比已有方法可将GAD的检测性能提升最多5.89%的AUC。代码可在 https://github.com/FelixDJC/NLGAD 下载。
    Abstract Graph anomaly detection (GAD) has attracted increasing attention in machine learning and data mining. Recent works have mainly focused on how to capture richer information to improve the quality of node embeddings for GAD. Despite their significant advances in detection performance, there is still a relative dearth of research on the properties of the task. GAD aims to discern the anomalies that deviate from most nodes. However, the model is prone to learn the pattern of normal samples which make up the majority of samples. Meanwhile, anomalies can be easily detected when their behaviors differ from normality. Therefore, the performance can be further improved by enhancing the ability to learn the normal pattern. To this end, we propose a normality learning-based GAD framework via multi-scale contrastive learning networks (NLGAD for abbreviation). Specifically, we first initialize the model with the contrastive networks on different scales. To provide sufficient and reliable normal nodes for normality learning, we design an effective hybrid strategy for normality selection. Finally, the model is refined with the only input of reliable normal nodes and learns a more accurate estimate of normality so that anomalous nodes can be more easily distinguished. Eventually, extensive experiments on six benchmark graph datasets demonstrate the effectiveness of our normality learning-based scheme on GAD. Notably, the proposed algorithm improves the detection performance (up to 5.89% AUC gain) compared with the state-of-the-art methods. The source code is released at https://github.com/FelixDJC/NLGAD.
    摘要 图异常检测(GAD)在机器学习和数据挖掘领域受到越来越多的关注。近期工作主要集中在如何捕获更丰富的信息以提升节点嵌入的质量。尽管检测性能取得了显著进展,针对该任务本身性质的研究仍相对欠缺。GAD旨在辨识偏离大多数节点的异常,但模型容易学到占样本多数的正常样本的模式;同时,当异常的行为偏离正常模式时,它们也更容易被检测出来。因此,增强对正常模式的学习能力可以进一步提升检测性能。为此,我们提出了一种基于正常性学习的GAD框架,即多尺度对比学习网络(NLGAD)。具体而言,我们首先用不同尺度上的对比网络初始化模型;为了给正常性学习提供充足且可靠的正常节点,我们设计了一种有效的混合正常性筛选策略;最后,模型仅以可靠的正常节点作为输入进行精炼,学习更准确的正常性估计,从而更容易区分异常节点。在六个基准图数据集上的大量实验验证了该正常性学习方案的有效性。值得注意的是,所提算法相比最新方法将检测性能提升了最多5.89%的AUC。源代码发布于 https://github.com/FelixDJC/NLGAD。
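A rough illustration of the scoring idea only (not the trained NLGAD model; the use of neighbourhood means as multi-scale "contexts", the scales, and the selection threshold are assumptions): score each node by its disagreement with its own context and keep the lowest-scoring nodes as the reliable normal pool used for normality learning.

```python
# Contrastive-style anomaly scores from node/context disagreement at two scales,
# followed by a simple normality-selection step.
import numpy as np

def cosine(a, b):
    return (a * b).sum(-1) / (np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + 1e-9)

def anomaly_scores(H, A):
    deg1 = A.sum(1, keepdims=True).clip(min=1.0)
    ctx1 = (A @ H) / deg1                                      # 1-hop context
    A2 = ((A @ A) > 0).astype(float)
    ctx2 = (A2 @ H) / A2.sum(1, keepdims=True).clip(min=1.0)   # 2-hop context
    return 1.0 - 0.5 * (cosine(H, ctx1) + cosine(H, ctx2))     # high score = anomalous

rng = np.random.default_rng(0)
n, d = 8, 16
A = (rng.random((n, n)) < 0.4).astype(float)
np.fill_diagonal(A, 0)
H = rng.normal(size=(n, d))
scores = anomaly_scores(H, A)
normal_pool = np.argsort(scores)[: n // 2]   # most "normal" nodes kept for refinement
print(np.round(scores, 3), normal_pool)
```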

Energy-Aware Federated Learning with Distributed User Sampling and Multichannel ALOHA

  • paper_url: http://arxiv.org/abs/2309.06033
  • repo_url: None
  • paper_authors: Rafael Valente da Silva, Onel L. Alcaraz López, Richard Demo Souza
  • for: 这篇论文是针对分布式学习在边缘设备上的能源有效性问题进行研究。
  • methods: 本文考虑在采用多信道ALOHA的联邦学习网络中引入能量收集设备,并提出了一种方法,以同时保证较低的能量中断概率和后续任务的成功执行。
  • results: 数值结果表明,该方法在平均能量收入不足以覆盖迭代成本的关键场景下尤为有效,在收敛时间和电池电量方面均优于一种基于范数的解决方案。
    Abstract Distributed learning on edge devices has attracted increased attention with the advent of federated learning (FL). Notably, edge devices often have limited battery and heterogeneous energy availability, while multiple rounds are required in FL for convergence, intensifying the need for energy efficiency. Energy depletion may hinder the training process and the efficient utilization of the trained model. To solve these problems, this letter considers the integration of energy harvesting (EH) devices into a FL network with multi-channel ALOHA, while proposing a method to ensure both low energy outage probability and successful execution of future tasks. Numerical results demonstrate the effectiveness of this method, particularly in critical setups where the average energy income fails to cover the iteration cost. The method outperforms a norm based solution in terms of convergence time and battery level.
    摘要 随着联邦学习(FL)的出现,边缘设备上的分布式学习受到了越来越多的关注。然而,边缘设备的电池容量通常有限、能量来源各不相同,而FL需要多轮迭代才能收敛,这使得能效问题更加突出。能量耗尽可能会阻碍训练过程以及训练所得模型的有效利用。为了解决这些问题,本文考虑在采用多信道ALOHA的FL网络中引入能量收集(EH)设备,并提出了一种方法,以同时保证较低的能量中断概率和后续任务的成功执行。数值结果表明了该方法的有效性,特别是在平均能量收入不足以覆盖迭代成本的关键场景下;该方法在收敛时间和电池电量方面均优于一种基于范数的解决方案。
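A toy simulation of the setting described above; the participation rule, energy figures, and channel count are assumptions rather than the paper's actual policy.

```python
# Devices harvest random energy, join a round only if the battery covers the iteration
# cost, and send updates over multichannel ALOHA; collisions discard updates.
import random

def simulate(rounds=20, devices=10, channels=4, cost=1.0, mean_harvest=0.8, seed=0):
    rng = random.Random(seed)
    battery = [2.0] * devices
    delivered = 0
    for _ in range(rounds):
        picks = {}
        for i in range(devices):
            battery[i] += rng.uniform(0, 2 * mean_harvest)   # stochastic energy income
            if battery[i] >= cost and rng.random() < 0.5:     # energy-aware participation
                picks.setdefault(rng.randrange(channels), []).append(i)
        for users in picks.values():
            if len(users) == 1:                               # no ALOHA collision
                battery[users[0]] -= cost
                delivered += 1
    return delivered

print("updates delivered:", simulate())
```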

Emergent Communication in Multi-Agent Reinforcement Learning for Future Wireless Networks

  • paper_url: http://arxiv.org/abs/2309.06021
  • repo_url: None
  • paper_authors: Marwa Chafii, Salmane Naoumi, Reda Alami, Ebtesam Almazrouei, Mehdi Bennis, Merouane Debbah
  • for: 本文探讨了未来6G无线网络中多个网络实体之间的协作问题,即如何以最小的时延和能耗完成高维数据的交换。
  • methods: 本文提出了一种基于具有涌现通信的多智能体强化学习(EC-MARL)的解决方案,可在状态部分可观测的情况下以协作方式解决高维连续控制问题。
  • results: 本文介绍了EC-MARL在未来6G无线网络中的应用潜力与研究机遇,涵盖自动驾驶、机器人导航、飞行基站网络规划和智慧城市应用等场景。
    Abstract In different wireless network scenarios, multiple network entities need to cooperate in order to achieve a common task with minimum delay and energy consumption. Future wireless networks mandate exchanging high dimensional data in dynamic and uncertain environments, therefore implementing communication control tasks becomes challenging and highly complex. Multi-agent reinforcement learning with emergent communication (EC-MARL) is a promising solution to address high dimensional continuous control problems with partially observable states in a cooperative fashion where agents build an emergent communication protocol to solve complex tasks. This paper articulates the importance of EC-MARL within the context of future 6G wireless networks, which imbues autonomous decision-making capabilities into network entities to solve complex tasks such as autonomous driving, robot navigation, flying base stations network planning, and smart city applications. An overview of EC-MARL algorithms and their design criteria are provided while presenting use cases and research opportunities on this emerging topic.
    摘要 在不同的无线网络场景中,多个网络实体需要相互合作,以最小的时延和能耗完成共同任务。未来的无线网络需要在动态且不确定的环境中交换高维数据,因此实现通信控制任务变得极具挑战性和复杂性。具有涌现通信的多智能体强化学习(EC-MARL)是一种有前景的解决方案,它能够在状态部分可观测的情况下,以协作方式处理高维连续控制问题,使各智能体构建出一种涌现的通信协议来解决复杂任务。本文阐述了EC-MARL在未来6G无线网络中的重要性:它将自主决策能力赋予网络实体,用于解决诸如自动驾驶、机器人导航、飞行基站网络规划和智慧城市应用等复杂任务。文中还概述了EC-MARL算法及其设计准则,并给出了该新兴方向的用例与研究机会。

Interpolation, Approximation and Controllability of Deep Neural Networks

  • paper_url: http://arxiv.org/abs/2309.06015
  • repo_url: None
  • paper_authors: Jingpu Cheng, Qianxiao Li, Ting Lin, Zuowei Shen
  • for: investigate the expressive power of deep residual neural networks idealized as continuous dynamical systems through control theory.
  • methods: consider two properties from supervised learning: universal interpolation and universal approximation, and give a characterization of universal interpolation.
  • results: show that universal interpolation holds for essentially any architecture with non-linearity, and elucidate the relationship between universal interpolation and universal approximation in the context of general control systems.
    Abstract We investigate the expressive power of deep residual neural networks idealized as continuous dynamical systems through control theory. Specifically, we consider two properties that arise from supervised learning, namely universal interpolation - the ability to match arbitrary input and target training samples - and the closely related notion of universal approximation - the ability to approximate input-target functional relationships via flow maps. Under the assumption of affine invariance of the control family, we give a characterisation of universal interpolation, showing that it holds for essentially any architecture with non-linearity. Furthermore, we elucidate the relationship between universal interpolation and universal approximation in the context of general control systems, showing that the two properties cannot be deduced from each other. At the same time, we identify conditions on the control family and the target function that ensures the equivalence of the two notions.
    摘要 我们通过控制理论研究将深度残差神经网络理想化为连续动力系统时的表达能力。具体而言,我们考虑监督学习中产生的两个性质:通用插值,即能够精确匹配任意输入与目标训练样本;以及与之密切相关的通用逼近,即能够通过流映射逼近输入与目标之间的函数关系。在控制族具有仿射不变性的假设下,我们给出了通用插值的刻画,证明它对几乎任何带非线性的架构都成立。此外,我们阐明了在一般控制系统的框架下通用插值与通用逼近之间的关系,表明这两个性质不能相互推出;同时,我们给出了保证二者等价的关于控制族与目标函数的条件。
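The continuous idealisation referred to above is usually written as a controlled ODE whose forward-Euler discretisation recovers a residual block; the notation below is the generic neural-ODE/control form and not necessarily the paper's.

```latex
\begin{align}
  \dot{x}(t) &= f\bigl(x(t), \theta(t)\bigr), \qquad x(0) = x_0, \quad t \in [0, T], \\
  x_{k+1}    &= x_k + \Delta t \, f(x_k, \theta_k) \quad \text{(residual block as a forward-Euler step)}.
\end{align}
% Universal interpolation: find a control \theta(\cdot) whose flow \varphi_T matches
% finitely many samples exactly, \varphi_T(x_i) = y_i for i = 1, \dots, N.
% Universal approximation: \sup_{x \in K} \|\varphi_T(x) - F(x)\| < \varepsilon on a compact set K.
```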

Learning Unbiased News Article Representations: A Knowledge-Infused Approach

  • paper_url: http://arxiv.org/abs/2309.05981
  • repo_url: None
  • paper_authors: Sadia Kamal, Jimmy Hartford, Jeremy Willis, Arunkumar Bagavathi
  • for: This paper aims to quantify the political leaning of online news articles and mitigate the algorithmic political bias in machine learning models used for this task.
  • methods: The proposed knowledge-infused deep learning model uses relatively reliable external data resources to learn unbiased representations of news articles based on their global and local contexts.
  • results: The proposed model outperforms baseline methods to predict the political leaning of news articles with up to 73% accuracy, mitigating algorithmic political bias.
    Abstract Quantification of the political leaning of online news articles can aid in understanding the dynamics of political ideology in social groups and in devising measures to mitigate them. However, predicting the accurate political leaning of a news article with machine learning models is a challenging task. This is because (i) the political ideology of a news article is defined by several factors, and (ii) existing learning models tend to absorb the political bias of the news publisher during model training. Only a limited number of methods study the political leaning of news articles, and they do not account for this algorithmic political bias, which lowers the generalization of machine learning models when predicting the political leaning of articles published by previously unseen news publishers. In this work, we propose a knowledge-infused deep learning model that utilizes relatively reliable external data resources to learn unbiased representations of news articles using their global and local contexts. We evaluate the proposed model by setting up the data in such a way that news domains or news publishers in the test set are completely unseen during the training phase. With this setup, we show that the proposed model mitigates algorithmic political bias and outperforms baseline methods, predicting the political leaning of news articles with up to 73% accuracy.
    摘要 对在线新闻文章的政治倾向进行量化,有助于理解社会群体中政治意识形态的动态并制定相应的缓解措施。然而,使用机器学习模型准确预测新闻文章的政治倾向是一项具有挑战性的任务。这是因为:(i)新闻文章的政治意识形态由多种因素共同决定;(ii)现有的学习模型在训练过程中容易吸收新闻发布者自身的政治偏见,这种算法性政治偏见会降低模型对新的新闻发布者所发布文章进行政治倾向预测时的泛化能力,而现有为数不多的相关方法并未考虑这一问题。在本工作中,我们提出了一种知识注入的深度学习模型,利用相对可靠的外部数据资源,基于文章的全局与局部上下文学习无偏的新闻文章表示。我们在评估时将测试集中的新闻域名或新闻发布者设置为在训练阶段完全不可见。实验结果表明,所提模型能够缓解算法性政治偏见,并以最高73%的准确率超越基线方法预测新闻文章的政治倾向。
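A deliberately small sketch of the general recipe only (the paper uses a deep model; the synthetic features, the fusion by concatenation, and the classifier are assumptions): fuse text-derived features with knowledge-derived features and evaluate on publishers that were held out of training, so publisher-specific bias cannot simply be memorised.

```python
# Knowledge-infused representation = [text features | knowledge features]; evaluation
# uses publishers unseen during training. All data here is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d_text, d_kb = 300, 32, 8
X_text = rng.normal(size=(n, d_text))             # stand-in for article text embeddings
X_kb = rng.normal(size=(n, d_kb))                 # stand-in for external knowledge features
publisher = rng.integers(0, 6, size=n)            # six publishers
y = (X_text[:, 0] + X_kb[:, 0] > 0).astype(int)   # synthetic leaning labels

X = np.hstack([X_text, X_kb])                     # knowledge-infused representation
train, test = publisher < 4, publisher >= 4       # unseen publishers at test time
clf = LogisticRegression(max_iter=1000).fit(X[train], y[train])
print("held-out-publisher accuracy:", clf.score(X[test], y[test]))
```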

CleanUNet 2: A Hybrid Speech Denoising Model on Waveform and Spectrogram

  • paper_url: http://arxiv.org/abs/2309.05975
  • repo_url: None
  • paper_authors: Zhifeng Kong, Wei Ping, Ambrish Dantrey, Bryan Catanzaro
  • for: 本文旨在提出一种结合波形去噪器与频谱图去噪器的语音去噪模型,以提升语音去噪性能。
  • methods: 该模型采用受语音合成方法启发的两阶段框架,由波形模型和频谱图模型组成:在最先进的波形去噪器CleanUNet的基础上,将频谱图去噪器预测的频谱图作为额外输入,进一步提升去噪性能。
  • results: 在多种客观与主观评估指标上,CleanUNet 2均优于先前的方法。
    Abstract In this work, we present CleanUNet 2, a speech denoising model that combines the advantages of waveform denoiser and spectrogram denoiser and achieves the best of both worlds. CleanUNet 2 uses a two-stage framework inspired by popular speech synthesis methods that consist of a waveform model and a spectrogram model. Specifically, CleanUNet 2 builds upon CleanUNet, the state-of-the-art waveform denoiser, and further boosts its performance by taking predicted spectrograms from a spectrogram denoiser as the input. We demonstrate that CleanUNet 2 outperforms previous methods in terms of various objective and subjective evaluations.
    摘要 在本工作中,我们提出了CleanUNet 2,一种结合波形去噪器与频谱图去噪器优点的语音去噪模型,兼得两者之长。CleanUNet 2采用受主流语音合成方法启发的两阶段框架,由波形模型和频谱图模型组成。具体而言,CleanUNet 2在最先进的波形去噪器CleanUNet的基础上,进一步以频谱图去噪器预测的频谱图作为输入,从而提升性能。我们的实验表明,CleanUNet 2在多种客观与主观评估指标上均优于先前的方法。
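A structural sketch of the two-stage idea only; the module internals are placeholders and not CleanUNet 2's actual layers.

```python
# Stage 1 denoises the spectrogram; stage 2 denoises the waveform while conditioning on
# the predicted spectrogram (upsampled to the sample rate and concatenated channel-wise).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoStageDenoiser(nn.Module):
    def __init__(self, n_fft=512, hop=128, hidden=64):
        super().__init__()
        self.n_fft, self.hop = n_fft, hop
        freq = n_fft // 2 + 1
        self.spec_denoiser = nn.Sequential(nn.Linear(freq, hidden), nn.ReLU(),
                                           nn.Linear(hidden, freq))            # stage 1
        self.wave_denoiser = nn.Conv1d(1 + freq, 1, kernel_size=9, padding=4)  # stage 2

    def forward(self, noisy):                                  # noisy: (batch, samples)
        spec = torch.stft(noisy, self.n_fft, self.hop, return_complex=True).abs()
        clean_spec = self.spec_denoiser(spec.transpose(1, 2)).transpose(1, 2)
        cond = F.interpolate(clean_spec, size=noisy.shape[-1])
        x = torch.cat([noisy.unsqueeze(1), cond], dim=1)
        return self.wave_denoiser(x).squeeze(1)                # denoised waveform estimate

print(TwoStageDenoiser()(torch.randn(2, 16000)).shape)         # torch.Size([2, 16000])
```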

Neural Network Layer Matrix Decomposition reveals Latent Manifold Encoding and Memory Capacity

  • paper_url: http://arxiv.org/abs/2309.05968
  • repo_url: None
  • paper_authors: Ng Shyh-Chang, A-Li Luo, Bo Qiu
  • for: 本研究证明了通用逼近定理的逆命题,即一个神经网络(NN)编码定理:对任何稳定收敛、采用连续激活函数的NN,其权重矩阵实际上编码了一个连续函数,该函数能在有界域上以有限误差逼近其训练数据集。
  • methods: 该研究利用Eckart-Young定理对每一层NN的权重矩阵做截断奇异值分解,从而揭示各NN层所编码与表示的训练数据潜在流形的结构,以及各NN层所执行数学运算的几何特性。
  • results: 研究结果有助于理解NN如何通过利用记忆容量换取表达能力来突破维数灾难,且二者是互补的。此外,层矩阵分解(LMD)还提示NN各层的特征分解与Hopfield网络及Transformer模型的最新概念进展之间存在密切联系。
    Abstract We prove the converse of the universal approximation theorem, i.e. a neural network (NN) encoding theorem which shows that for every stably converged NN with continuous activation functions, its weight matrix actually encodes a continuous function that approximates its training dataset to within a finite margin of error over a bounded domain. We further show that using the Eckart-Young theorem for truncated singular value decomposition of the weight matrix for every NN layer, we can illuminate the nature of the latent space manifold of the training dataset encoded and represented by every NN layer, and the geometric nature of the mathematical operations performed by each NN layer. Our results have implications for understanding how NNs break the curse of dimensionality by harnessing memory capacity for expressivity, and show that the two are complementary. This Layer Matrix Decomposition (LMD) further suggests a close relationship between eigen-decomposition of NN layers and the latest advances in conceptualizations of Hopfield networks and Transformer NN models.
    摘要 我们证明了通用逼近定理的逆命题,即一个神经网络(NN)编码定理:对任何稳定收敛、采用连续激活函数的NN,其权重矩阵实际上编码了一个连续函数,该函数在有界域上以有限误差逼近其训练数据集。我们进一步证明,利用Eckart-Young定理对每一层NN的权重矩阵做截断奇异值分解,可以揭示各NN层所编码与表示的训练数据潜在流形的结构,以及各NN层所执行数学运算的几何特性。我们的结果有助于理解NN如何通过利用记忆容量换取表达能力来突破维数灾难,并表明二者是互补的。这种层矩阵分解(LMD)还提示NN各层的特征分解与Hopfield网络及Transformer模型的最新概念进展之间存在密切联系。
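The layer-wise analysis described above rests on truncated singular value decomposition and the Eckart-Young theorem (the rank-k truncation is the best rank-k approximation). The sketch below applies it to a stand-in weight matrix and shows how the reconstruction error shrinks as more singular directions are kept; a trained layer's weights would be analysed the same way.

```python
# Truncated SVD of a (stand-in) layer weight matrix and its reconstruction error.
import numpy as np

rng = np.random.default_rng(0)
W_layer = rng.normal(size=(128, 64)) @ np.diag(np.linspace(1, 0.01, 64))  # decaying spectrum

U, s, Vt = np.linalg.svd(W_layer, full_matrices=False)
for k in (4, 16, 64):
    W_k = (U[:, :k] * s[:k]) @ Vt[:k]                          # best rank-k approximation
    err = np.linalg.norm(W_layer - W_k) / np.linalg.norm(W_layer)
    print(f"rank {k}: relative reconstruction error {err:.3f}")
```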

GLAD: Content-aware Dynamic Graphs For Log Anomaly Detection

  • paper_url: http://arxiv.org/abs/2309.05953
  • repo_url: https://github.com/yul091/GraphLogAD
  • paper_authors: Yufei Li, Yanchi Liu, Haoyu Wang, Zhengzhang Chen, Wei Cheng, Yuncong Chen, Wenchao Yu, Haifeng Chen, Cong Liu
  • for: 本研究旨在探讨系统日志中异常检测,尤其是考虑系统组件之间的关系,以优化异常检测和原因探测。
  • methods: 本文提出了一种基于图的日志异常检测框架(GLAD),将日志语义、关系模式和时序模式统一纳入异常检测。具体而言,GLAD包括基于提示的少样本字段提取模块、滑动窗口动态日志图构建模块,以及时序注意力图边异常检测模型。
  • results: 在三个数据集上测试 GLAD,结果表明 GLAD 能够有效地检测异常,异常的关系模式也能够被识别出来。
    Abstract Logs play a crucial role in system monitoring and debugging by recording valuable system information, including events and states. Although various methods have been proposed to detect anomalies in log sequences, they often overlook the significance of considering relations among system components, such as services and users, which can be identified from log contents. Understanding these relations is vital for detecting anomalies and their underlying causes. To address this issue, we introduce GLAD, a Graph-based Log Anomaly Detection framework designed to detect relational anomalies in system logs. GLAD incorporates log semantics, relational patterns, and sequential patterns into a unified framework for anomaly detection. Specifically, GLAD first introduces a field extraction module that utilizes prompt-based few-shot learning to identify essential fields from log contents. Then GLAD constructs dynamic log graphs for sliding windows by interconnecting extracted fields and log events parsed from the log parser. These graphs represent events and fields as nodes and their relations as edges. Subsequently, GLAD utilizes a temporal-attentive graph edge anomaly detection model for identifying anomalous relations in these dynamic log graphs. This model employs a Graph Neural Network (GNN)-based encoder enhanced with transformers to capture content, structural and temporal features. We evaluate our proposed method on three datasets, and the results demonstrate the effectiveness of GLAD in detecting anomalies indicated by varying relational patterns.
    摘要 系统监控和调试中,日志扮演着关键角色,记录了系统中的重要信息,包括事件和状态。虽然有很多方法用于检测日志序列中的异常,但它们通常忽略了考虑系统组件之间的关系,例如服务和用户,这些关系可以从日志内容中得到。了解这些关系非常重要,可以帮助检测异常和其下面的原因。为解决这个问题,我们提出了 GLAD,一个基于图的日志异常检测框架,可以检测系统日志中的关系异常。GLAD 将日志Semantics、关系模式和时序模式集成到一个统一的异常检测框架中。具体来说,GLAD 首先引入一个字段提取模块,使用提示based few-shot learning来确定日志内容中的重要字段。然后,GLAD 构建了动态日志图,将提取的字段和日志事件与日志解析器解析的日志事件相连接。这些图表示事件和字段作为节点,以及它们之间的关系作为边。接着,GLAD 利用一个 temporal-attentive 图边异常检测模型来检测动态日志图中的异常关系。这个模型使用图神经网络(GNN)基本encoder和转换器来捕捉内容、结构和时序特征。我们对 GLAD 进行了三个数据集的测试,结果表明 GLAD 能够根据不同的关系模式检测异常。
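An illustrative sketch of the graph-construction step only; GLAD learns field extraction with prompt-based few-shot learning, whereas the regular expression and field names here are assumptions.

```python
# Build a dynamic log graph for one sliding window: parsed events and extracted field
# values become nodes, and event-field co-occurrence becomes the edges whose relations
# a downstream detector would score.
import re
import networkx as nx

logs = [
    "2023-09-12 10:00:01 user=alice service=auth event=login_ok",
    "2023-09-12 10:00:03 user=bob service=auth event=login_fail",
    "2023-09-12 10:00:04 user=bob service=billing event=charge",
]

def build_window_graph(lines):
    g = nx.Graph()
    for i, line in enumerate(lines):
        fields = dict(re.findall(r"(\w+)=(\w+)", line))
        event_node = f"evt{i}:{fields.pop('event')}"
        g.add_node(event_node, kind="event")
        for key, value in fields.items():                 # e.g. user, service
            field_node = f"{key}:{value}"
            g.add_node(field_node, kind="field")
            g.add_edge(event_node, field_node)
    return g

g = build_window_graph(logs)
print(g.number_of_nodes(), "nodes,", g.number_of_edges(), "edges")
```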

Quantized Non-Volatile Nanomagnetic Synapse based Autoencoder for Efficient Unsupervised Network Anomaly Detection

  • paper_url: http://arxiv.org/abs/2309.06449
  • repo_url: None
  • paper_authors: Muhammad Sabbir Alam, Walid Al Misba, Jayasimha Atulasimha
  • for: 本研究旨在提出一种基于自编码器的异常检测方法,并在边缘设备中实现实时学习,以解决边缘设备的限制性。
  • methods: 该方法采用基于低分辨率非易失性存储器突触的自编码器和一种有效的量化神经网络学习算法,利用带刻槽的铁磁赛道中承载的磁畴壁(DW)来实现非易失性存储器突触。
  • results: 该方法在 NSL-KDD 数据集上进行异常检测,相比浮点精度权重取得了更高的检测准确率(90.98%),并且在训练过程中将权重更新次数减少了至少三个数量级,从而带来显著的能耗节省。
    Abstract In the autoencoder based anomaly detection paradigm, implementing the autoencoder in edge devices capable of learning in real-time is exceedingly challenging due to limited hardware, energy, and computational resources. We show that these limitations can be addressed by designing an autoencoder with low-resolution non-volatile memory-based synapses and employing an effective quantized neural network learning algorithm. We propose a ferromagnetic racetrack with engineered notches hosting a magnetic domain wall (DW) as the autoencoder synapses, where limited state (5-state) synaptic weights are manipulated by spin orbit torque (SOT) current pulses. The performance of anomaly detection of the proposed autoencoder model is evaluated on the NSL-KDD dataset. Limited resolution and DW device stochasticity aware training of the autoencoder is performed, which yields comparable anomaly detection performance to the autoencoder having floating-point precision weights. While the limited number of quantized states and the inherent stochastic nature of DW synaptic weights in nanoscale devices are known to negatively impact the performance, our hardware-aware training algorithm is shown to leverage these imperfect device characteristics to generate an improvement in anomaly detection accuracy (90.98%) compared to accuracy obtained with floating-point trained weights. Furthermore, our DW-based approach demonstrates a remarkable reduction of at least three orders of magnitude in weight updates during training compared to the floating-point approach, implying substantial energy savings for our method. This work could stimulate the development of extremely energy efficient non-volatile multi-state synapse-based processors that can perform real-time training and inference on the edge with unsupervised data.
    摘要 在基于自适应器的异常检测 paradigm中,在边缘设备中实现自适应器是极其挑战性的,主要是因为边缘设备的硬件、能源和计算资源有限。我们表明,这些限制可以通过设计一个具有低分辨率、不可变存储器 synapse的 autoencoder,并使用有效的量化神经网络学习算法来解决。我们提议一种磁気轨道上的磁Domain墙(DW)作为 autoencoder synapse,其中有限状态(5状) synaptic веса通过磁力辐射(SOT)电流脉冲来 manipulate。我们对提议的 autoencoder 模型在 NSL-KDD 数据集上进行异常检测性能的评估。我们采用了限制分辨率和 DW 设备不确定性的意识training autoencoder,而不是使用浮点数精度 weights。虽然有限数量的量化状态和 nanoscale 设备内的固有不确定性会对性能产生负面影响,但我们的硬件意识训练算法可以利用这些不完美设备特性来提高异常检测精度(90.98%)相比浮点数训练 weights。此外,我们的 DW 方法显示在训练期间对 weight updates 的减少是至少三个数量级,这意味着我们的方法可以获得显著的能源抑制。这种工作可能会促进非常能效的非易失multi-state synapse基于处理器的开发,以便在边缘上进行实时训练和推理,并使用不supervised数据。