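results: Experiments show that density estimation with the PF ODE model is robust against high-complexity, high-likelihood attacks. On the CIFAR-10 dataset, some adversarial samples are also semantically meaningful, as expected from a robust estimator.
Abstract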
Beyond their impressive sampling capabilities, score-based diffusion models offer a powerful analysis tool in the form of unbiased density estimation of a query sample under the training data distribution. In this work, we investigate the robustness of density estimation using the probability flow (PF) neural ordinary differential equation (ODE) model against gradient-based likelihood maximization attacks and the relation to sample complexity, where the compressed size of a sample is used as a measure of its complexity. We introduce and evaluate six gradient-based log-likelihood maximization attacks, including a novel reverse integration attack. Our experimental evaluations on CIFAR-10 show that density estimation using the PF ODE is robust against high-complexity, high-likelihood attacks, and that in some cases adversarial samples are semantically meaningful, as expected from a robust estimator.
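Summary
Beyond their appealing sampling capabilities, score-based diffusion models also provide a powerful analysis tool: unbiased density estimation of a query sample under the training data distribution. This work studies how robust density estimation with the probability flow (PF) neural ODE model is against gradient-based likelihood maximization attacks, and how that robustness relates to sample complexity. Six gradient-based log-likelihood maximization attacks are introduced and evaluated, including a new reverse integration attack. The experiments show that PF ODE density estimation is robust against high-complexity, high-likelihood attacks, and that in some cases the adversarial samples are semantically meaningful, consistent with a robust estimator.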
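The abstract measures sample complexity by compressed size. A minimal sketch of that idea follows, assuming zlib as the compressor and raw byte arrays as samples; the abstract does not name the compressor used in the paper, and `compressed_size`, `flat_image`, and `noisy_image` are hypothetical names for illustration.

```python
import random
import zlib


def compressed_size(sample: bytes, level: int = 9) -> int:
    """Return the size in bytes of the zlib-compressed sample."""
    return len(zlib.compress(sample, level))


# Two hypothetical 32x32 single-channel "images" stored as raw bytes.
flat_image = bytes([128] * (32 * 32))  # uniform gray: low complexity
rng = random.Random(0)  # fixed seed so the example is repeatable
noisy_image = bytes(rng.getrandbits(8) for _ in range(32 * 32))  # noise: high complexity

flat_size = compressed_size(flat_image)
noisy_size = compressed_size(noisy_image)
print(flat_size, noisy_size)
```

Under this proxy, the uniform image compresses to far fewer bytes than the noise image, which is the sense in which a sample counts as low or high complexity.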
Taking the human out of decomposition-based optimization via artificial intelligence: Part II. Learning to initialize
for: Solving the large-scale optimization problems frequently encountered in process systems engineering tasks.
methods: Uses machine learning to learn the best initialization of decomposition-based solution methods in order to reduce computational time.
results: The proposed approach can significantly reduce solution time, and active learning can reduce the amount of data required for learning.
Abstract
The repeated solution of large-scale optimization problems arises frequently in process systems engineering tasks. Decomposition-based solution methods have been widely used to reduce the corresponding computational time, yet their implementation has multiple steps that are difficult to configure. We propose a machine learning approach to learn the optimal initialization of such algorithms which minimizes the computational time. Active and supervised learning is used to learn a surrogate model that predicts the computational performance for a given initialization. We apply this approach to the initialization of Generalized Benders Decomposition for the solution of mixed integer model predictive control problems. The surrogate models are used to find the optimal number of initial cuts that should be added in the master problem. The results show that the proposed approach can lead to a significant reduction in solution time, and active learning can reduce the data required for learning.
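Summary
The repeated solution of large-scale optimization problems arises frequently in process systems engineering tasks. Decomposition-based solution methods are widely used, but their implementation involves multiple steps that are difficult to configure. We propose using machine learning to learn the initialization of such algorithms so as to reduce computational time. Active and supervised learning are used to train a surrogate model that predicts the computational performance for a given initialization. We apply the approach to the initialization of Generalized Benders Decomposition for mixed integer model predictive control problems, where the surrogate models are used to find the best number of initial cuts to add to the master problem. The results show a significant reduction in solution time, and active learning reduces the amount of training data needed.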
results: With 30% malicious clients in the system, the reputation mechanism still yields fast model convergence and high accuracy.
Abstract
Federated Learning (FL) is a well-known paradigm of distributed machine learning on mobile and IoT devices, which preserves data privacy and optimizes communication efficiency. To avoid the single point of failure problem in FL, decentralized federated learning (DFL) has been proposed to use peer-to-peer communication for model aggregation, which has been considered an attractive solution for machine learning tasks on distributed personal devices. However, this process is vulnerable to attackers who share false models and data. If there exists a group of malicious clients, they might harm the performance of the model by carrying out a poisoning attack. In addition, in DFL, clients often lack the incentives to contribute their computing power to model training. In this paper, we propose Blockchain-based Decentralized Federated Learning (BDFL), which leverages a blockchain for decentralized model verification and auditing. BDFL includes an auditor committee for model verification, an incentive mechanism to encourage the participation of clients, a reputation model to evaluate the trustworthiness of clients, and a protocol suite for dynamic network updates. Evaluation results show that, with the reputation mechanism, BDFL achieves fast model convergence and high accuracy on real datasets even if there exist 30% malicious clients in the system.
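Summary
Federated learning (FL) is a well-known distributed machine learning paradigm for mobile and IoT devices that preserves data privacy and communication efficiency. To remove the single point of failure in FL, decentralized federated learning (DFL) uses peer-to-peer communication for model aggregation, which is considered an attractive solution for machine learning tasks on distributed personal devices. However, the process is vulnerable to attackers who share false models and data: a group of malicious clients can degrade model performance through a poisoning attack. In addition, DFL clients often lack incentives to take part in model training. This paper proposes Blockchain-based Decentralized Federated Learning (BDFL), which uses a blockchain for decentralized model verification and auditing. BDFL includes an auditor committee for model verification, an incentive mechanism that encourages clients to participate, a reputation model that evaluates client trustworthiness, and a protocol suite for dynamic network updates. The evaluation shows that BDFL still achieves fast convergence and high accuracy on real datasets even with 30% malicious clients.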
Taking the human out of decomposition-based optimization via artificial intelligence: Part I. Learning when to decompose
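results: The approach is used to develop a classifier that determines whether a convex Mixed Integer Nonlinear Programming problem is best solved with branch and bound or with the outer approximation algorithm. The learned classifier can also be integrated into existing mixed integer optimization solvers.
Abstract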
In this paper, we propose a graph classification approach for automatically determining whether to use a monolithic or a decomposition-based solution method. In this approach, an optimization problem is represented as a graph that captures the structural and functional coupling among the variables and constraints of the problem via an appropriate set of features. Given this representation, a graph classifier is built to determine the best solution method for a given problem. The proposed approach is used to develop a classifier that determines whether a convex Mixed Integer Nonlinear Programming problem should be solved using branch and bound or the outer approximation algorithm. Finally, it is shown how the learned classifier can be incorporated into existing mixed integer optimization solvers.
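Summary
This paper proposes a graph classification approach for automatically deciding whether to use a monolithic or a decomposition-based solution method. An optimization problem is represented as a graph that captures the structural and functional coupling among its variables and constraints through an appropriate set of features. Given this representation, a graph classifier is built to determine the best solution method for a given problem. The approach is used to develop a classifier that decides whether a convex mixed integer nonlinear programming problem should be solved with branch and bound or with the outer approximation algorithm. Finally, the paper shows how the learned classifier can be integrated into existing mixed integer optimization solvers.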
Acoustic Model Fusion for End-to-end Speech Recognition
paper_authors: Zhihong Lei, Mingbin Xu, Shiyi Han, Leo Liu, Zhen Huang, Tim Ng, Yuanyuan Zhang, Ernest Pusateri, Mirko Hannemann, Yaqiao Deng, Man-Hung Siu
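for: Improving the accuracy and named entity recognition performance of ASR systems.
methods: Proposes integrating an external acoustic model into an end-to-end ASR system to better address the domain mismatch problem.
results: Word error rate drops across varied test sets, by up to 14.3%, and named entity recognition also improves noticeably.
Abstract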
Recent advances in deep learning and automatic speech recognition (ASR) have enabled the end-to-end (E2E) ASR system and boosted the accuracy to a new level. The E2E systems implicitly model all conventional ASR components, such as the acoustic model (AM) and the language model (LM), in a single network trained on audio-text pairs. Despite this simpler system architecture, fusing a separate LM, trained exclusively on text corpora, into the E2E system has proven to be beneficial. However, the application of LM fusion presents certain drawbacks, such as its inability to address the domain mismatch issue inherent to the internal AM. Drawing inspiration from the concept of LM fusion, we propose the integration of an external AM into the E2E system to better address the domain mismatch. By implementing this novel approach, we have achieved a significant reduction in the word error rate, with an impressive drop of up to 14.3% across varied test sets. We also discovered that this AM fusion approach is particularly beneficial in enhancing named entity recognition.
Summary
Traditional ASR systems consist of separate acoustic model (AM) and language model (LM) components, whereas E2E systems combine these components into a single network trained on audio-text pairs. Despite this simpler architecture, fusing a separate LM trained exclusively on text corpora into the E2E system has been shown to be beneficial. However, LM fusion cannot address the domain mismatch issue inherent to the internal AM. The proposed approach integrates an external AM into the E2E system to better address this mismatch. It achieves a significant reduction in word error rate, with a drop of up to 14.3% across varied test sets, and it is particularly effective in enhancing named entity recognition.
Spiral-Elliptical automated galaxy morphology classification from telescope images
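results: Using galaxy image data from the Sloan Digital Sky Survey, we show that the proposed image statistics accurately detect spiral and elliptical galaxies when used as features of a random forest classifier.
Abstract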
The classification of galaxy morphologies is an important step in the investigation of theories of hierarchical structure formation. While human expert visual classification remains quite effective and accurate, it cannot keep up with the massive influx of data from emerging sky surveys. A variety of approaches have been proposed to classify large numbers of galaxies; these approaches include crowdsourced visual classification, and automated and computational methods, such as machine learning methods based on designed morphology statistics and deep learning. In this work, we develop two novel galaxy morphology statistics, descent average and descent variance, which can be efficiently extracted from telescope galaxy images. We further propose simplified versions of the existing image statistics concentration, asymmetry, and clumpiness, which have been widely used in the literature of galaxy morphologies. We utilize the galaxy image data from the Sloan Digital Sky Survey to demonstrate the effective performance of our proposed image statistics at accurately detecting spiral and elliptical galaxies when used as features of a random forest classifier.
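Summary
Classifying galaxy morphologies is an important step in investigating theories of hierarchical structure formation. Human expert visual classification remains effective and accurate, but it cannot keep up with the influx of data from sky surveys. Many approaches have been proposed for classifying large numbers of galaxies, including crowdsourced visual classification and computational methods such as machine learning on designed morphology statistics and deep learning. This work develops two new galaxy morphology statistics, descent average and descent variance, that can be extracted quickly from telescope galaxy images. It also proposes simplified versions of the existing image statistics concentration, asymmetry, and clumpiness, which are widely used in the literature. Galaxy image data from the Sloan Digital Sky Survey are used to show that the proposed statistics accurately detect spiral and elliptical galaxies when used as features of a random forest classifier.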
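To make the idea of an image statistic concrete, here is a minimal sketch of a rotational asymmetry measure in one common form: the summed absolute difference between an image and its 180-degree rotation, normalized by twice the total flux. This is an assumption for illustration only; it is not the simplified asymmetry statistic the paper proposes, and the toy images are made up.

```python
from typing import List

Image = List[List[float]]


def rotate_180(image: Image) -> Image:
    """Rotate a 2D image by 180 degrees about its center."""
    return [row[::-1] for row in image[::-1]]


def asymmetry(image: Image) -> float:
    """Sum of |I - I_180| divided by 2 * sum of |I|; 0 means point-symmetric."""
    rotated = rotate_180(image)
    diff = sum(
        abs(a - b)
        for row, rotated_row in zip(image, rotated)
        for a, b in zip(row, rotated_row)
    )
    total = sum(abs(a) for row in image for a in row)
    return diff / (2.0 * total) if total > 0 else 0.0


# Hypothetical 3x3 toy images (made-up pixel values).
symmetric_blob = [[0, 1, 0], [1, 4, 1], [0, 1, 0]]
lopsided_blob = [[4, 1, 0], [1, 0, 0], [0, 0, 0]]

print(asymmetry(symmetric_blob), asymmetry(lopsided_blob))
```

A scalar like this could then be one column of the feature matrix passed to a random forest classifier, alongside the other statistics.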
FedMFS: Federated Multimodal Fusion Learning with Selective Modality Communication
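methods: Proposes Federated Multimodal Fusion learning with Selective modality communication (FedMFS), which uses Shapley values to quantify each modality's contribution and modality model size to gauge communication overhead, so that each client can selectively upload modality models to the server for aggregation.
results: Experiments on real-world multimodal datasets show that FedMFS achieves accuracy comparable to the baselines while reducing communication overhead by one twentieth.
Abstract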
Federated learning (FL) is a distributed machine learning (ML) paradigm that enables clients to collaborate without accessing, infringing upon, or leaking original user data by sharing only model parameters. In the Internet of Things (IoT), edge devices are increasingly leveraging multimodal data compositions and fusion paradigms to enhance model performance. However, in FL applications, two main challenges remain open: (i) addressing the issues caused by heterogeneous clients lacking specific modalities and (ii) devising an optimal modality upload strategy to minimize communication overhead while maximizing learning performance. In this paper, we propose Federated Multimodal Fusion learning with Selective modality communication (FedMFS), a new multimodal fusion FL methodology that can tackle the above mentioned challenges. The key idea is to utilize Shapley values to quantify each modality's contribution and modality model size to gauge communication overhead, so that each client can selectively upload the modality models to the server for aggregation. This enables FedMFS to flexibly balance performance against communication costs, depending on resource constraints and applications. Experiments on real-world multimodal datasets demonstrate the effectiveness of FedMFS, achieving comparable accuracy while reducing communication overhead by one twentieth compared to baselines.
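The selection idea in the abstract, Shapley values for contribution and model size for communication cost, can be sketched as follows. This is a minimal illustration under stated assumptions: the accuracy table, the model sizes, and the contribution-per-megabyte rule are all hypothetical, and the abstract does not give the exact criterion FedMFS uses.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, List


def shapley_values(
    players: List[str], value: Callable[[FrozenSet[str]], float]
) -> Dict[str, float]:
    """Exact Shapley value of each player under the coalition value function."""
    n = len(players)
    result = {}
    for player in players:
        others = [q for q in players if q != player]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that the player joins.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                coalition = frozenset(subset)
                total += weight * (value(coalition | {player}) - value(coalition))
        result[player] = total
    return result


# Hypothetical validation accuracy for each subset of modalities (made-up numbers).
ACCURACY = {
    frozenset(): 0.50,
    frozenset({"audio"}): 0.70,
    frozenset({"video"}): 0.60,
    frozenset({"audio", "video"}): 0.80,
}
# Hypothetical modality model sizes in megabytes.
MODEL_SIZE_MB = {"audio": 5.0, "video": 40.0}

contribution = shapley_values(["audio", "video"], ACCURACY.__getitem__)
# Assumed rule: upload the modality with the best contribution per megabyte.
selected = max(contribution, key=lambda m: contribution[m] / MODEL_SIZE_MB[m])
print(contribution, selected)
```

Exact enumeration is only practical for a handful of modalities, since the number of coalitions grows exponentially; that is usually acceptable when a client has two or three modalities.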
A predict-and-optimize approach to profit-driven churn prevention
results: On 12 churn prediction datasets, the approach achieves the best average profit compared with other well-established strategies.
Abstract
In this paper, we introduce a novel predict-and-optimize method for profit-driven churn prevention. We frame the task of targeting customers for a retention campaign as a regret minimization problem. The main objective is to leverage individual customer lifetime values (CLVs) to ensure that only the most valuable customers are targeted. In contrast, many profit-driven strategies focus on churn probabilities while considering average CLVs. This often results in significant information loss due to data aggregation. Our proposed model aligns with the guidelines of Predict-and-Optimize (PnO) frameworks and can be efficiently solved using stochastic gradient descent methods. Results from 12 churn prediction datasets underscore the effectiveness of our approach, which achieves the best average performance compared to other well-established strategies in terms of average profit.
Summary
This paper introduces a new predict-and-optimize method for profit-driven churn prevention. Selecting which customers to target in a retention campaign is framed as a regret minimization problem. The main objective is to use individual customer lifetime values (CLVs) so that only the most valuable customers are targeted. By contrast, many profit-driven strategies focus on churn probabilities while considering only average CLVs, which often causes information loss through data aggregation. The proposed model follows the guidelines of Predict-and-Optimize (PnO) frameworks and can be solved efficiently with stochastic gradient descent methods. Results from 12 churn prediction datasets confirm the effectiveness of the approach, which achieves the best average performance compared to other well-established strategies in terms of average profit.
Neural Harmonium: An Interpretable Deep Structure for Nonlinear Dynamic System Identification with Application to Audio Processing
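results: The proposed model is validated on nonlinear system identification problems. In an acoustic echo cancellation scenario, experiments comparing it against other state-of-the-art solutions confirm its effectiveness for real-life applications.
Abstract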
Improving the interpretability of deep neural networks has recently gained increased attention, especially when the power of deep learning is leveraged to solve problems in physics. Interpretability helps us understand a model's ability to generalize and reveal its limitations. In this paper, we introduce a causal interpretable deep structure for modeling dynamic systems. Our proposed model makes use of the harmonic analysis by modeling the system in a time-frequency domain while maintaining high temporal and spectral resolution. Moreover, the model is built in an order recursive manner which allows for fast, robust, and exact second order optimization without the need for an explicit Hessian calculation. To circumvent the resulting high dimensionality of the building blocks of our system, a neural network is designed to identify the frequency interdependencies. The proposed model is illustrated and validated on nonlinear system identification problems as required for audio signal processing tasks. Crowd-sourced experimentation contrasting the performance of the proposed approach to other state-of-the-art solutions on an acoustic echo cancellation scenario confirms the effectiveness of our method for real-life applications.
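Summary
Improving the interpretability of deep neural networks has gained attention, especially when deep learning is applied to problems in physics. Interpretability helps us understand a model's ability to generalize and reveals its limitations. This paper introduces a causal, interpretable deep structure for modeling dynamic systems. The model draws on harmonic analysis, representing the system in a time-frequency domain while keeping high temporal and spectral resolution. It is also built in an order-recursive manner, which allows fast, robust, and exact second-order optimization without an explicit Hessian calculation. To cope with the resulting high dimensionality of the building blocks, a neural network is designed to identify the frequency interdependencies. The model is validated on nonlinear system identification problems of the kind required for audio signal processing. Crowd-sourced experiments comparing it with other state-of-the-art solutions in an acoustic echo cancellation scenario confirm its effectiveness for real-life applications.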
Neural Relational Inference with Fast Modular Meta-learning
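methods: The paper uses modular meta-learning, in which neural modules are trained to be composed in different ways to solve many tasks.
results: Modular meta-learning increases inference capacity, makes more data-efficient use of observations, and can estimate the state of entities that are not observed directly.
Abstract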
\textit{Graph neural networks} (GNNs) are effective models for many dynamical systems consisting of entities and relations. Although most GNN applications assume a single type of entity and relation, many situations involve multiple types of interactions. \textit{Relational inference} is the problem of inferring these interactions and learning the dynamics from observational data. We frame relational inference as a \textit{modular meta-learning} problem, where neural modules are trained to be composed in different ways to solve many tasks. This meta-learning framework allows us to implicitly encode time invariance and infer relations in context of one another rather than independently, which increases inference capacity. Framing inference as the inner-loop optimization of meta-learning leads to a model-based approach that is more data-efficient and capable of estimating the state of entities that we do not observe directly, but whose existence can be inferred from their effect on observed entities. To address the large search space of graph neural network compositions, we meta-learn a \textit{proposal function} that speeds up the inner-loop simulated annealing search within the modular meta-learning algorithm, providing two orders of magnitude increase in the size of problems that can be addressed.
Summary
\begin{itemize}
\item Graph neural networks (GNNs) are effective models for many dynamical systems consisting of entities and relations. Although most GNN applications assume a single type of entity and relation, many situations involve multiple types of interactions.
\item Relational inference is the problem of inferring these interactions and learning the dynamics from observational data. We frame relational inference as a modular meta-learning problem, where neural modules are trained to be composed in different ways to solve many tasks.
\item This meta-learning framework implicitly encodes time invariance and infers relations in the context of one another rather than independently, which increases inference capacity.
\item Framing inference as the inner-loop optimization of meta-learning leads to a model-based approach that is more data-efficient and can estimate the state of entities that are not directly observed but whose existence can be inferred from their effect on observed entities.
\item To address the large search space of graph neural network compositions, we meta-learn a proposal function that speeds up the inner-loop simulated annealing search within the modular meta-learning algorithm, increasing the size of problems that can be addressed by two orders of magnitude.
\end{itemize}
Sound-skwatter (Did You Mean: Sound-squatter?) AI-powered Generator for Phishing Prevention
paper_authors: Rodolfo Valentim, Idilio Drago, Marco Mellia, Federico Cerutti
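for: Defending against sound-squatting by using AI to generate sound-squatting candidates.
methods: Combines Transformers networks with acoustic models to learn sound similarities.
results: Automatically lists known homophones and thousands of high-quality candidates, and also covers cross-language sound-squatting.
Abstract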
Sound-squatting is a phishing attack that tricks users into malicious resources by exploiting similarities in the pronunciation of words. Proactive defense against sound-squatting candidates is complex, and existing solutions rely on manually curated lists of homophones. We here introduce Sound-skwatter, a multi-language AI-based system that generates sound-squatting candidates for proactive defense. Sound-skwatter relies on an innovative multi-modal combination of Transformers Networks and acoustic models to learn sound similarities. We show that Sound-skwatter can automatically list known homophones and thousands of high-quality candidates. In addition, it covers cross-language sound-squatting, i.e., when the reader and the listener speak different languages, supporting any combination of languages. We apply Sound-skwatter to network-centric phishing via squatted domain names. We find ~ 10% of the generated domains exist in the wild, the vast majority unknown to protection solutions. Next, we show attacks on the PyPI package manager, where ~ 17% of the popular packages have at least one existing candidate. We believe Sound-skwatter is a crucial asset to mitigate the sound-squatting phenomenon proactively on the Internet. To increase its impact, we publish an online demo and release our models and code as open source.
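Summary
Sound-squatting is a phishing attack that tricks users into visiting malicious resources by exploiting similarities in how words are pronounced. Proactive defense is complex, and existing solutions rely on manually curated lists of homophones. This paper introduces Sound-skwatter, a multi-language AI-based system that generates sound-squatting candidates. Sound-skwatter relies on a multi-modal combination of Transformers networks and acoustic models to learn sound similarities. It can automatically list known homophones and thousands of high-quality candidates, and it also covers cross-language sound-squatting, where the reader and the listener speak different languages. When the system is applied to network-centric phishing via squatted domain names, ~10% of the generated domains exist in the wild, the vast majority unknown to protection solutions. For the PyPI package manager, ~17% of popular packages have at least one existing candidate. The authors consider Sound-skwatter a crucial asset for proactively mitigating sound-squatting on the Internet and, to increase its impact, publish an online demo and release the models and code as open source.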
CarDS-Plus ECG Platform: Development and Feasibility Evaluation of a Multiplatform Artificial Intelligence Toolkit for Portable and Wearable Device Electrocardiograms
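paper_authors: Sumukh Vasisht Shankar, Evangelos K Oikonomou, Rohan Khera
for: Developing a multiplatform system for rapidly deploying AI-based single-lead electrocardiogram (ECG) solutions for clinical investigation and care delivery.
methods: The study aligns design considerations with specific applications and develops efficient data flows: single-lead ECGs from diverse wearable devices are channeled into a centralized data lake, and AI models interpret them through real-time inference.
results: The platform goes from acquisition to reported results in a mean of 33.0 to 35.7 seconds, with no substantial difference between the two commercially available devices tested (Apple Watch and KardiaMobile). This shows that the design principles translate into a strategy that can be deployed rapidly for clinical impact.
Abstract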
In the rapidly evolving landscape of modern healthcare, the integration of wearable & portable technology provides a unique opportunity for personalized health monitoring in the community. Devices like the Apple Watch, FitBit, and AliveCor KardiaMobile have revolutionized the acquisition and processing of intricate health data streams. Amidst the variety of data collected by these gadgets, single-lead electrocardiogram (ECG) recordings have emerged as a crucial source of information for monitoring cardiovascular health. There have been significant advances in artificial intelligence capable of interpreting these 1-lead ECGs, facilitating clinical diagnosis as well as the detection of rare cardiac disorders. This design study describes the development of an innovative multiplatform system aimed at the rapid deployment of AI-based ECG solutions for clinical investigation & care delivery. The study examines design considerations, aligns them with specific applications, and develops data flows to maximize efficiency for research & clinical use. This process encompasses the reception of single-lead ECGs from diverse wearable devices, channeling this data into a centralized data lake & facilitating real-time inference through AI models for ECG interpretation. An evaluation of the platform demonstrates a mean duration from acquisition to reporting of results of 33.0 to 35.7 seconds, after a standard 30 second acquisition. There were no substantial differences in acquisition to reporting across two commercially available devices (Apple Watch and KardiaMobile). These results demonstrate the successful translation of design principles into a fully integrated & efficient strategy for leveraging 1-lead ECGs across platforms & interpretation by AI-ECG algorithms. Such a platform is critical to translating AI discoveries for wearable and portable ECG devices to clinical impact through rapid deployment.
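Summary
In the rapidly evolving landscape of modern healthcare, the integration of wearable and portable technology offers a unique opportunity for personalized health monitoring in the community. Devices such as the Apple Watch, FitBit, and AliveCor KardiaMobile have revolutionized the collection and processing of health data streams. Among the data these devices collect, single-lead electrocardiogram (ECG) recordings have become a key source of information for monitoring cardiovascular health. Advances in artificial intelligence (AI) make it possible to interpret these 1-lead ECGs, supporting clinical diagnosis and the detection of rare cardiac disorders. This design study describes a multiplatform system aimed at the rapid deployment of AI-based ECG solutions for clinical investigation and care delivery. It aligns design considerations with specific applications and develops data flows that maximize efficiency for research and clinical use. The process covers receiving single-lead ECGs from diverse wearable devices, channeling the data into a centralized data lake, and running real-time inference with AI models for ECG interpretation. In the evaluation, the mean time from acquisition to reporting of results was 33.0 to 35.7 seconds, with no substantial difference between the two commercial devices (Apple Watch and KardiaMobile). These results show that the design principles translate into an integrated, efficient strategy for interpreting 1-lead ECGs across platforms with AI-ECG algorithms. Such a platform is critical for turning AI discoveries for wearable and portable ECG devices into clinical impact.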
Federated Quantum Machine Learning with Differential Privacy
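results: A quantum-classical machine learning model performs binary classification on the Cats vs Dogs dataset with a test accuracy above 0.98 while keeping epsilon values below 1.3. This shows that federated differentially private training is a viable privacy-preservation method for quantum machine learning on Noisy Intermediate-Scale Quantum (NISQ) devices.
Abstract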
The preservation of privacy is a critical concern in the implementation of artificial intelligence on sensitive training data. There are several techniques to preserve data privacy but quantum computations are inherently more secure due to the no-cloning theorem, resulting in a most desirable computational platform on top of the potential quantum advantages. There have been prior works in protecting data privacy by Quantum Federated Learning (QFL) and Quantum Differential Privacy (QDP) studied independently. However, to the best of our knowledge, no prior work has addressed both QFL and QDP together yet. Here, we propose to combine these privacy-preserving methods and implement them on the quantum platform, so that we can achieve comprehensive protection against data leakage (QFL) and model inversion attacks (QDP). This implementation promises more efficient and secure artificial intelligence. In this paper, we present a successful implementation of these privacy-preservation methods by performing the binary classification of the Cats vs Dogs dataset. Using our quantum-classical machine learning model, we obtained a test accuracy of over 0.98, while maintaining epsilon values less than 1.3. We show that federated differentially private training is a viable privacy preservation method for quantum machine learning on Noisy Intermediate-Scale Quantum (NISQ) devices.
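Summary
Preserving privacy is a critical concern when artificial intelligence is applied to sensitive training data. Several techniques exist for protecting data privacy, but quantum computations are inherently more secure because of the no-cloning theorem, which makes them a desirable computational platform. Prior work has studied Quantum Federated Learning (QFL) and Quantum Differential Privacy (QDP) separately, but to the authors' knowledge none has combined the two. This paper proposes combining these privacy-preserving methods and implementing them on the quantum platform to protect against both data leakage (QFL) and model inversion attacks (QDP), which promises more efficient and secure artificial intelligence. The methods are implemented for binary classification of the Cats vs Dogs dataset. The quantum-classical machine learning model reaches a test accuracy above 0.98 while keeping epsilon values below 1.3. The results show that federated differentially private training is a viable privacy-preservation method for quantum machine learning on Noisy Intermediate-Scale Quantum (NISQ) devices.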
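As a rough illustration of what federated differentially private training involves, the sketch below shows the classical building blocks only: clipping each client update to a norm bound, adding Gaussian noise, and averaging on the server. It is an assumption-laden toy, not the paper's quantum-classical pipeline; the function names, the noise scale, and the client updates are hypothetical, and the accounting that turns a noise level into an epsilon value is omitted.

```python
import math
import random
from typing import List


def clip_by_norm(update: List[float], max_norm: float) -> List[float]:
    """Scale the update down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(u * u for u in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [u * scale for u in update]


def add_gaussian_noise(update: List[float], std: float, rng: random.Random) -> List[float]:
    """Add independent zero-mean Gaussian noise to every coordinate."""
    return [u + rng.gauss(0.0, std) for u in update]


def federated_average(updates: List[List[float]]) -> List[float]:
    """Server step: coordinate-wise mean of the client updates."""
    return [sum(column) / len(updates) for column in zip(*updates)]


rng = random.Random(0)
client_updates = [[3.0, 4.0], [0.3, 0.4]]  # hypothetical local gradients
private_updates = [
    add_gaussian_noise(clip_by_norm(u, max_norm=1.0), std=0.1, rng=rng)
    for u in client_updates
]
global_update = federated_average(private_updates)
print(global_update)
```

Clipping bounds how much any single client can move the model, which is what lets a fixed noise level give a privacy guarantee in schemes of this kind.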
Flood and Echo: Algorithmic Alignment of GNNs with Distributed Computing
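results: The study shows that the framework is more efficient than conventional execution frameworks in terms of message complexity, and that it supports effective information exchange and extrapolation.
Abstract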
Graph Neural Networks are a natural fit for learning algorithms. They can directly represent tasks through an abstract but versatile graph structure and handle inputs of different sizes. This opens up the possibility for scaling and extrapolation to larger graphs, one of the most important advantages of an algorithm. However, this raises two core questions: (i) How can we enable nodes to gather the required information in a given graph ($\textit{information exchange}$), even if it is far away, and (ii) How can we design an execution framework which enables this information exchange for extrapolation to larger graph sizes ($\textit{algorithmic alignment for extrapolation}$)? We propose a new execution framework that is inspired by the design principles of distributed algorithms: Flood and Echo Net. It propagates messages through the entire graph in a wave-like activation pattern, which naturally generalizes to larger instances. Through its sparse but parallel activations it is provably more efficient in terms of message complexity. We study the proposed model and provide both empirical evidence and theoretical insights in terms of its expressiveness, efficiency, information exchange and ability to extrapolate.
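Summary
Graph Neural Networks are a natural fit for learning algorithms. They can represent tasks directly through an abstract but versatile graph structure and handle inputs of different sizes. This opens up the possibility of scaling and extrapolating to larger graphs, one of the most important advantages of an algorithm. It also raises two core questions: (i) how to enable nodes to gather the information they need in a graph (information exchange), even when it is far away, and (ii) how to design an execution framework that enables this information exchange when extrapolating to larger graphs (algorithmic alignment for extrapolation). The paper proposes a new execution framework inspired by the design principles of distributed algorithms: Flood and Echo Net. It propagates messages through the entire graph, which generalizes naturally to larger instances. Through its sparse but parallel activations, it is provably more efficient in terms of message complexity. The paper studies the model and provides both empirical evidence and theoretical insights into its expressiveness, efficiency, information exchange, and ability to extrapolate.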
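The distributed-computing pattern that inspires the framework can be sketched on a plain graph: a flood phase spreads outward from a start node, and an echo phase carries aggregated information back. The code below is a sequential simulation of that classical pattern, not the Flood and Echo Net architecture itself; `flood_and_echo` and the example graph are hypothetical, and node counting stands in for whatever is aggregated.

```python
from collections import deque
from typing import Dict, List, Tuple


def flood_and_echo(
    adjacency: Dict[int, List[int]], root: int
) -> Tuple[Dict[int, int], int]:
    """Flood outward from root, then echo subtree node counts back to it."""
    # Flood phase: a breadth-first wave records each node's distance and parent.
    distance = {root: 0}
    parent: Dict[int, int] = {}
    order = [root]
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                parent[neighbor] = node
                order.append(neighbor)
                queue.append(neighbor)
    # Echo phase: visit nodes farthest-first and pass their counts to parents.
    count = {node: 1 for node in order}
    for node in reversed(order):
        if node in parent:
            count[parent[node]] += count[node]
    return distance, count[root]


# Hypothetical path graph 0 - 1 - 2 - 3.
path_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
distances, reachable_nodes = flood_and_echo(path_graph, root=0)
print(distances, reachable_nodes)
```

Each edge of the spanning tree carries one message out and one back, so the total message count scales with graph size instead of with graph size times the number of rounds.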
Positivity-free Policy Learning with Observational Data
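results: The study provides theoretical guarantees for policy learning and validates the framework's finite-sample performance through comprehensive numerical experiments, so that the identification of causal effects from observational data is both robust and reliable.
Abstract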
Policy learning utilizing observational data is pivotal across various domains, with the objective of learning the optimal treatment assignment policy while adhering to specific constraints such as fairness, budget, and simplicity. This study introduces a novel positivity-free (stochastic) policy learning framework designed to address the challenges posed by the impracticality of the positivity assumption in real-world scenarios. This framework leverages incremental propensity score policies to adjust propensity score values instead of assigning fixed values to treatments. We characterize these incremental propensity score policies and establish identification conditions, employing semiparametric efficiency theory to propose efficient estimators capable of achieving rapid convergence rates, even when integrated with advanced machine learning algorithms. This paper provides a thorough exploration of the theoretical guarantees associated with policy learning and validates the proposed framework's finite-sample performance through comprehensive numerical experiments, ensuring the identification of causal effects from observational data is both robust and reliable.
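Summary
Policy learning from observational data is pivotal across many domains. Its goal is to learn the optimal treatment assignment policy while adhering to specific constraints such as fairness, budget, and simplicity. This study proposes a new positivity-free (stochastic) policy learning framework that addresses the impracticality of the positivity assumption in real-world scenarios. The framework uses incremental propensity score policies to adjust propensity score values instead of assigning fixed values to treatments. The paper characterizes these policies, establishes identification conditions, and uses semiparametric efficiency theory to propose efficient estimators that achieve rapid convergence rates even when combined with advanced machine learning algorithms. It explores the theoretical guarantees of policy learning and validates the framework's finite-sample performance through comprehensive numerical experiments, so that the causal effects identified from observational data are robust and reliable.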
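One way to see why shifting propensity scores avoids the positivity assumption is the odds-multiplying form used in the incremental propensity score intervention literature: the shifted score is delta * p / (delta * p + 1 - p). The sketch below assumes that form; the abstract does not state the paper's exact definition, and `incremental_propensity` and `odds` are hypothetical helper names.

```python
def incremental_propensity(propensity: float, delta: float) -> float:
    """Shift a propensity score by multiplying its odds by delta (delta > 0)."""
    return delta * propensity / (delta * propensity + 1.0 - propensity)


def odds(probability: float) -> float:
    """Odds of an event with the given probability (requires probability < 1)."""
    return probability / (1.0 - probability)


observed = 0.2  # hypothetical observed treatment probability
shifted = incremental_propensity(observed, delta=2.0)
print(shifted, odds(shifted) / odds(observed))
```

Scores of exactly 0 or 1 map to themselves under this shift, so the policy never asks for treatment where it is never observed, which is where a fixed-value assignment would need positivity.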
Diffusion Prior Regularized Iterative Reconstruction for Low-dose CT
paper_authors: Wenjun Xia, Yongyi Shi, Chuang Niu, Wenxiang Cong, Ge Wang
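for: Reducing X-ray radiation dose while improving computed tomography (CT) image quality.
methods: Introduces an iterative reconstruction algorithm that merges the denoising diffusion probabilistic model (DDPM) with a reconstruction procedure that prioritizes data fidelity.
results: Points to a pathway toward high-definition CT image reconstruction at reduced radiation dose.
Abstract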
Computed tomography (CT) involves a patient's exposure to ionizing radiation. To reduce the radiation dose, we can either lower the X-ray photon count or down-sample projection views. However, either of the ways often compromises image quality. To address this challenge, here we introduce an iterative reconstruction algorithm regularized by a diffusion prior. Drawing on the exceptional imaging prowess of the denoising diffusion probabilistic model (DDPM), we merge it with a reconstruction procedure that prioritizes data fidelity. This fusion capitalizes on the merits of both techniques, delivering exceptional reconstruction results in an unsupervised framework. To further enhance the efficiency of the reconstruction process, we incorporate the Nesterov momentum acceleration technique. This enhancement facilitates superior diffusion sampling in fewer steps. As demonstrated in our experiments, our method offers a potential pathway to high-definition CT image reconstruction with minimized radiation.
A Variational Autoencoder Framework for Robust, Physics-Informed Cyberattack Recognition in Industrial Cyber-Physical Systems
results: In a simulation study on a networked power transmission system, the proposed method demonstrates its applicability and efficacy.
Abstract
Cybersecurity of Industrial Cyber-Physical Systems is drawing significant concern as data communication increasingly leverages wireless networks. Many data-driven methods have been developed for detecting cyberattacks, but few focus on distinguishing them from equipment faults. In this paper, we develop a data-driven framework that can be used to detect, diagnose, and localize a type of cyberattack called covert attacks on networked industrial control systems. The framework has a hybrid design that combines a variational autoencoder (VAE), a recurrent neural network (RNN), and a deep neural network (DNN). It considers the temporal behavior of a generic physical system and extracts features from the time series of the sensor measurements that can be used for detecting covert attacks, distinguishing them from equipment faults, and localizing the attack or fault. We evaluate the performance of the proposed method through a realistic simulation study on a networked power transmission system as a typical example of an industrial control system (ICS). We compare the performance of the proposed method with the traditional model-based method to show its applicability and efficacy.
LLMs Killed the Script Kiddie: How Agents Supported by Large Language Models Change the Landscape of Network Threat Testing
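results: The researchers demonstrate a simple cyber campaign use case to assess the extent of LLMs' cyber-specific knowledge and offer guidance on prompt design. They also discuss the potential impact of LLMs on accelerating threat actor capabilities and the associated ethical considerations. The findings indicate that LLMs can generate useful information and automate cyber campaigns, but their ability to handle more complex networks and their sensitivity to prompts remain open questions.
Abstract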
In this paper, we explore the potential of Large Language Models (LLMs) to reason about threats, generate information about tools, and automate cyber campaigns. We begin with a manual exploration of LLMs in supporting specific threat-related actions and decisions. We proceed by automating the decision process in a cyber campaign. We present prompt engineering approaches for a plan-act-report loop for one action of a threat campaign and a prompt chaining design that directs the sequential decision process of a multi-action campaign. We assess the extent of LLMs' cyber-specific knowledge with respect to the short campaign we demonstrate and provide insights into prompt design for eliciting actionable responses. We discuss the potential impact of LLMs on the threat landscape and the ethical considerations of using LLMs for accelerating threat actor capabilities. We report a promising, yet concerning, application of generative AI to cyber threats. However, the LLMs' capabilities to deal with more complex networks and sophisticated vulnerabilities, and their sensitivity to prompts, are open questions. This research should spur deliberations over the inevitable advancements in the LLM-supported cyber adversarial landscape.
Quantum Shadow Gradient Descent for Quantum Learning
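results: The study shows that quantum shadows reduce the computational burden and that the approach extends to more general non-product ansatzes. Theoretical proofs, a convergence analysis, and numerical experiments support the conclusions.
Abstract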
This paper proposes a new procedure called quantum shadow gradient descent (QSGD) that addresses these key challenges. Our method has the benefits of a one-shot approach, in not requiring any sample duplication while having a convergence rate comparable to the ideal update rule using exact gradient computation. We propose a new technique for generating quantum shadow samples (QSS), which generates quantum shadows as opposed to classical shadows used in existing works. With classical shadows, the computations are typically performed on classical computers and, hence, are prohibitive since the dimension grows exponentially. Our approach resolves this issue by measurements of quantum shadows. As the second main contribution, we study more general non-product ansatz of the form $\exp\{i\sum_j \theta_j A_j\}$ that model variational Hamiltonians. We prove that the gradient can be written in terms of the gradient of single-parameter ansatzes that can be easily measured. Our proof is based on the Suzuki-Trotter approximation; however, our expressions are exact, unlike prior efforts that approximate non-product operators. As a result, existing gradient measurement techniques can be applied to more general VQAs followed by correction terms without any approximation penalty. We provide theoretical proofs, convergence analysis and verify our results through numerical experiments.
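For background on why non-product ansatzes are harder to differentiate than products of single-parameter gates, it may help to recall the standard derivative-of-the-exponential identity (Duhamel's formula). This is textbook material, not the paper's own gradient expression, and the shorthand $H(\theta)$ is introduced here only for convenience:

$$
H(\theta) = \sum_j \theta_j A_j, \qquad
\frac{\partial}{\partial \theta_k} e^{iH(\theta)}
  = i \int_0^1 e^{\,i s H(\theta)}\, A_k \, e^{\,i (1-s) H(\theta)} \, ds .
$$

When $A_k$ commutes with $H(\theta)$, the integrand no longer depends on $s$ and the derivative reduces to $i A_k e^{iH(\theta)}$, the familiar single-parameter case. In general the $A_j$ do not commute, which is why the gradient is not directly measurable with single-parameter techniques.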
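results: The approach predicts the prosody attributes of audiobook readings more accurately, and its predictions correlate more closely with human readings. A human evaluation study also indicates that listeners prefer audiobook readings generated with this approach.
Abstract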
Recent advances in text-to-speech have made it possible to generate natural-sounding audio from text. However, audiobook narrations involve dramatic vocalizations and intonations by the reader, with greater reliance on emotions, dialogues, and descriptions in the narrative. Using our dataset of 93 aligned book-audiobook pairs, we present improved models for predicting prosody properties (pitch, volume, and rate of speech) from narrative text using language modeling. Our predicted prosody attributes correlate much better with human audiobook readings than results from a state-of-the-art commercial TTS system: our predicted pitch shows a higher correlation with human reading for 22 out of the 24 books, while our predicted volume attribute proves more similar to human reading for 23 out of the 24 books. Finally, we present a human evaluation study to quantify the extent to which people prefer prosody-enhanced audiobook readings over commercial text-to-speech systems.
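The "higher correlation for 22 out of the 24 books" comparison can be read as a per-book win count. The sketch below shows that bookkeeping on synthetic data; the arrays, the noise levels, and the use of Pearson correlation are assumptions made for illustration, since the abstract does not state which correlation measure was used.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient between two 1-D sequences."""
    return float(np.corrcoef(x, y)[0, 1])

def count_wins(human, ours, baseline):
    """Number of books where `ours` correlates better with `human` than `baseline` does."""
    return sum(pearson(h, o) > pearson(h, b) for h, o, b in zip(human, ours, baseline))

# Synthetic stand-ins for per-sentence pitch values of three books.
rng = np.random.default_rng(0)
human = [rng.normal(size=200) for _ in range(3)]
ours = [h + 0.1 * rng.normal(size=200) for h in human]       # close to the human reading
baseline = [h + 2.0 * rng.normal(size=200) for h in human]   # much noisier system
wins = count_wins(human, ours, baseline)
```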
Stochastic Super-resolution of Cosmological Simulations with Denoising Diffusion Models
results: Denoising diffusion models produce convincing super-resolution images and power spectra, and reproduce the diversity of small-scale features consistent with a given low-resolution simulation. This makes the super-resolution model usable for uncertainty quantification in cosmic structure formation.
Abstract
In recent years, deep learning models have been successfully employed for augmenting low-resolution cosmological simulations with small-scale information, a task known as "super-resolution". So far, these cosmological super-resolution models have relied on generative adversarial networks (GANs), which can achieve highly realistic results, but suffer from various shortcomings (e.g. low sample diversity). We introduce denoising diffusion models as a powerful generative model for super-resolving cosmic large-scale structure predictions (as a first proof-of-concept in two dimensions). To obtain accurate results down to small scales, we develop a new "filter-boosted" training approach that redistributes the importance of different scales in the pixel-wise training objective. We demonstrate that our model not only produces convincing super-resolution images and power spectra consistent at the percent level, but is also able to reproduce the diversity of small-scale features consistent with a given low-resolution simulation. This enables uncertainty quantification for the generated small-scale features, which is critical for the usefulness of such super-resolution models as a viable surrogate model for cosmic structure formation.
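One plausible reading of "redistributing the importance of different scales in the pixel-wise training objective" is to filter the residual in Fourier space before taking the mean squared error. The sketch below follows that reading only; the step-shaped filter, `boost`, and `k0` are hypothetical choices, not the paper's filter.

```python
import numpy as np

def filter_boosted_mse(pred, target, boost=4.0, k0=0.1):
    """MSE of the residual after amplifying Fourier modes with |k| > k0 by `boost`."""
    resid = pred - target
    ky = np.fft.fftfreq(resid.shape[0])[:, None]   # frequencies in cycles per pixel
    kx = np.fft.fftfreq(resid.shape[1])[None, :]
    k = np.sqrt(kx**2 + ky**2)
    weight = 1.0 + (boost - 1.0) * (k > k0)        # hypothetical step-shaped filter
    filtered = np.fft.ifft2(weight * np.fft.fft2(resid)).real
    return float(np.mean(filtered**2))
```

With `boost=1.0` this reduces to the ordinary pixel-wise MSE, so the filter only changes how much each scale contributes to the loss.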
Inverse Factorized Q-Learning for Cooperative Multi-agent Imitation Learning
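results: Extensive experiments on challenging competitive and cooperative multi-agent game environments demonstrate the algorithm's effectiveness; it outperforms existing multi-agent IL algorithms.
Abstract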
This paper concerns imitation learning (IL) (i.e., the problem of learning to mimic expert behaviors from demonstrations) in cooperative multi-agent systems. The learning problem under consideration poses several challenges, characterized by high-dimensional state and action spaces and intricate inter-agent dependencies. In a single-agent setting, IL can be performed efficiently through an inverse soft-Q learning process given expert demonstrations. However, extending this framework to a multi-agent context introduces the need to simultaneously learn both local value functions to capture local observations and individual actions, and a joint value function for exploiting centralized learning. In this work, we introduce a novel multi-agent IL algorithm designed to address these challenges. Our approach enables centralized learning by leveraging mixing networks to aggregate decentralized Q functions. A main advantage of this approach is that the weights of the mixing networks can be trained using information derived from global states. We further establish conditions for the mixing networks under which the multi-agent objective function exhibits convexity within the Q function space. We present extensive experiments conducted on some challenging competitive and cooperative multi-agent game environments, including an advanced version of the StarCraft multi-agent challenge (i.e., SMACv2), which demonstrate the effectiveness of our proposed algorithm compared to existing state-of-the-art multi-agent IL algorithms.
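The idea of aggregating decentralized Q functions with state-dependent weights can be sketched with a single mixing layer. This is a simplified, QMIX-style illustration under assumed shapes, not the paper's architecture; the class name `LinearMixer` and the non-negativity constraint on the weights are assumptions made here.

```python
import numpy as np

class LinearMixer:
    """State-conditioned mixing of per-agent Q-values (one mixing layer)."""

    def __init__(self, n_agents, state_dim, seed=0):
        rng = np.random.default_rng(seed)
        # Tiny linear "hypernetworks" mapping the global state to mixing parameters.
        self.W = rng.normal(scale=0.1, size=(state_dim, n_agents))
        self.v = rng.normal(scale=0.1, size=state_dim)

    def __call__(self, agent_qs, state):
        weights = np.abs(state @ self.W)   # non-negative, so Q_tot is monotone in each Q_i
        bias = state @ self.v
        return float(weights @ agent_qs + bias)

mixer = LinearMixer(n_agents=3, state_dim=4)
state = np.array([1.0, -0.5, 0.2, 0.0])
agent_qs = np.array([1.0, 2.0, 3.0])
q_tot = mixer(agent_qs, state)
```

Because the weights depend only on the global state, this mixer is linear in the agents' Q-values for a fixed state. The paper's actual conditions for convexity of the objective are not reproduced here.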
Test & Evaluation Best Practices for Machine Learning-Enabled Systems
for: This paper aims to present best practices for the Test and Evaluation (T&E) of Machine Learning (ML)-enabled software systems across their lifecycle.
methods: The paper categorizes the lifecycle of ML-enabled software systems into three stages: component, integration and deployment, and post-deployment. The primary objective is to test and evaluate the ML model as a standalone component, and then evaluate an integrated ML-enabled system consisting of both ML and non-ML components.
results: The paper highlights the challenges of T&E in ML-enabled software systems and the need for systematic testing approaches, adequacy measurements, and metrics to address these challenges across all stages of the ML-enabled system lifecycle.
Abstract
Machine learning (ML)-based software systems are rapidly gaining adoption across various domains, making it increasingly essential to ensure they perform as intended. This report presents best practices for the Test and Evaluation (T&E) of ML-enabled software systems across their lifecycle. We categorize the lifecycle of ML-enabled software systems into three stages: component, integration and deployment, and post-deployment. At the component level, the primary objective is to test and evaluate the ML model as a standalone component. Next, in the integration and deployment stage, the goal is to evaluate an integrated ML-enabled system consisting of both ML and non-ML components. Finally, once the ML-enabled software system is deployed and operationalized, the T&E objective is to ensure the system performs as intended. Maintenance activities for ML-enabled software systems span the lifecycle and involve maintaining various assets of ML-enabled software systems. Given their unique characteristics, the T&E of ML-enabled software systems is challenging. While significant research has been reported on T&E at the component level, limited work is reported on T&E in the remaining two stages. Furthermore, in many cases, there is a lack of systematic T&E strategies throughout the ML-enabled system's lifecycle. This leads practitioners to resort to ad-hoc T&E practices, which can undermine user confidence in the reliability of ML-enabled software systems. New systematic testing approaches, adequacy measurements, and metrics are required to address the T&E challenges across all stages of the ML-enabled system lifecycle.
Spectral Entry-wise Matrix Estimation for Low-Rank Reinforcement Learning
paper_authors: Stefan Stojanovic, Yassir Jedra, Alexandre Proutiere
for: Matrix estimation problems in reinforcement learning (RL) with low-rank structure, such as low-rank bandits and Markov Decision Processes (MDPs).
methods: Spectral-based matrix estimation approaches that efficiently recover the singular subspaces of the matrix and exhibit nearly-minimal entry-wise error.
results: State-of-the-art performance guarantees for two examples of algorithms: a regret minimization algorithm for low-rank bandit problems, and a best policy identification algorithm for reward-free RL in low-rank MDPs.
Abstract
We study matrix estimation problems arising in reinforcement learning (RL) with low-rank structure. In low-rank bandits, the matrix to be recovered specifies the expected arm rewards, and for low-rank Markov Decision Processes (MDPs), it may, for example, characterize the transition kernel of the MDP. In both cases, each entry of the matrix carries important information, and we seek estimation methods with low entry-wise error. Importantly, these methods further need to account for inherent correlations in the available data (e.g., for MDPs, the data consists of system trajectories). We investigate the performance of simple spectral-based matrix estimation approaches: we show that they efficiently recover the singular subspaces of the matrix and exhibit nearly-minimal entry-wise error. These new results on low-rank matrix estimation make it possible to devise reinforcement learning algorithms that fully exploit the underlying low-rank structure. We provide two examples of such algorithms: a regret minimization algorithm for low-rank bandit problems, and a best policy identification algorithm for reward-free RL in low-rank MDPs. Both algorithms yield state-of-the-art performance guarantees.
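A simple spectral estimator of the kind discussed here is a truncated singular value decomposition. The sketch below applies it to a synthetic rank-2 matrix with independent Gaussian noise and compares maximum entry-wise errors. The independent noise is a simplifying assumption: the paper is specifically concerned with correlated data such as trajectories, which this toy example does not model.

```python
import numpy as np

def rank_r_estimate(M_noisy, r):
    """Project a noisy matrix onto its top-r singular subspaces (truncated SVD)."""
    U, s, Vt = np.linalg.svd(M_noisy, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

rng = np.random.default_rng(0)
n, m, r = 60, 50, 2
M = rng.normal(size=(n, r)) @ rng.normal(size=(r, m))   # ground-truth rank-r matrix
M_noisy = M + 0.1 * rng.normal(size=(n, m))             # independent noise (assumption)
M_hat = rank_r_estimate(M_noisy, r)

err_raw = np.max(np.abs(M_noisy - M))   # entry-wise error of the raw observations
err_hat = np.max(np.abs(M_hat - M))     # entry-wise error after spectral truncation
```

Comparing `err_hat` with `err_raw` shows how much of the noise the rank constraint removes at the level of individual entries, which is the quantity the abstract focuses on.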
Enhancing Predictive Capabilities in Data-Driven Dynamical Modeling with Automatic Differentiation: Koopman and Neural ODE Approaches
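results: The proposed framework significantly outperforms EDMD-DL. The state space approach (neural ODEs) outperforms the 'pure' Koopman approach, while the modified Koopman approach that alternates between states and observables, and therefore no longer satisfies the linearity of the Koopman operator, performs comparably to the state space approach.
Abstract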
Data-driven approximations of the Koopman operator are promising for predicting the time evolution of systems characterized by complex dynamics. Among these methods, the approach known as extended dynamic mode decomposition with dictionary learning (EDMD-DL) has garnered significant attention. Here we present a modification of EDMD-DL that concurrently determines both the dictionary of observables and the corresponding approximation of the Koopman operator. This innovation leverages automatic differentiation to facilitate gradient descent computations through the pseudoinverse. We also address the performance of several alternative methodologies. We assess a 'pure' Koopman approach, which involves the direct time-integration of a linear, high-dimensional system governing the dynamics within the space of observables. Additionally, we explore a modified approach where the system alternates between spaces of states and observables at each time step -- this approach no longer satisfies the linearity of the true Koopman operator representation. For further comparisons, we also apply a state space approach (neural ODEs). We consider systems encompassing two and three-dimensional ordinary differential equation systems featuring steady, oscillatory, and chaotic attractors, as well as partial differential equations exhibiting increasingly complex and intricate behaviors. Our framework significantly outperforms EDMD-DL. Furthermore, the state space approach offers superior performance compared to the 'pure' Koopman approach where the entire time evolution occurs in the space of observables. When the temporal evolution of the Koopman approach alternates between states and observables at each time step, however, its predictions become comparable to those of the state space approach.
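The role of the pseudoinverse in EDMD can be seen in a minimal example with a fixed dictionary. In EDMD-DL the dictionary is a trained network, and the paper differentiates through the pseudoinverse to learn it; the sketch below keeps the dictionary fixed at the hypothetical choice [x, x^2] and only shows the least-squares Koopman fit.

```python
import numpy as np

def dictionary(x):
    """Fixed dictionary of observables [x, x^2] for a scalar state (illustrative choice)."""
    x = np.asarray(x, dtype=float)
    return np.stack([x, x**2], axis=1)

def edmd(x_now, x_next):
    """Least-squares K such that dictionary(x_next) is approximately dictionary(x_now) @ K."""
    psi_now, psi_next = dictionary(x_now), dictionary(x_next)
    return np.linalg.pinv(psi_now) @ psi_next

# Linear map x -> a * x: the observables x and x^2 evolve with factors a and a^2.
a = 0.9
x_now = np.linspace(-1.0, 1.0, 21)
x_next = a * x_now
K = edmd(x_now, x_next)
```

For this linear map the dictionary spans an invariant subspace, so the fitted `K` is diagonal with entries `a` and `a**2`. For nonlinear dynamics a finite dictionary is generally only approximately invariant, which is where dictionary learning comes in.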
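results: The authors' information-theoretic reward yields efficient exploration and outperforms alternative methods in several games, including Montezuma's Revenge, a well-known difficult reinforcement learning task. The authors also propose an extension that maximizes information content in a discretely compressed latent space to improve sample efficiency and generalize to continuous state spaces.
Abstract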
Sparse reward environments are known to be challenging for reinforcement learning agents. In such environments, efficient and scalable exploration is crucial. Exploration is a means by which an agent gains information about the environment. We expand on this topic and propose a new intrinsic reward that systematically quantifies exploratory behavior and promotes state coverage by maximizing the information content of a trajectory taken by an agent. We compare our method to alternative exploration-based intrinsic reward techniques, namely Curiosity Driven Learning and Random Network Distillation. We show that our information-theoretic reward induces efficient exploration and outperforms these alternatives in various games, including Montezuma's Revenge, a known difficult task for reinforcement learning. Finally, we propose an extension that maximizes information content in a discretely compressed latent space, which boosts sample efficiency and generalizes to continuous state spaces.
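The abstract does not say how the information content of a trajectory is measured. As an assumed stand-in, the sketch below uses the empirical Shannon entropy of the visited discrete states, which is zero when the agent stays in one state and maximal when it spreads its visits evenly.

```python
import math
from collections import Counter

def trajectory_entropy(states):
    """Empirical Shannon entropy (in bits) of the states visited along a trajectory."""
    counts = Counter(states)
    n = len(states)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

stuck = ["s0", "s0", "s0", "s0"]      # no exploration
spread = ["s0", "s1", "s2", "s3"]     # every step reaches a new state
```

Under this proxy, `trajectory_entropy(spread)` exceeds `trajectory_entropy(stuck)`, so an agent rewarded for increasing it is pushed toward broader state coverage.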
Causal Rule Learning: Enhancing the Understanding of Heterogeneous Treatment Effect via Weighted Causal Rules
paper_authors: Ying Wu, Hanzhong Liu, Kai Ren, Xiangyu Chang
for: The paper aims to estimate heterogeneous treatment effects using machine learning methods, with a focus on interpretability for healthcare applications.
methods: The proposed method, called causal rule learning, involves three phases: rule discovery, rule selection, and rule analysis. It uses a causal forest and a D-learning method to identify subgroup-level effects and deconstruct individual-level treatment effects as a linear combination of them.
results: The paper demonstrates the superior performance of causal rule learning in estimating heterogeneous treatment effects when the ground truth is complex and the sample size is sufficient, compared to other methods. It also provides insights into the treatment effects of different subgroups and the weights of each rule in the linear combination.
Abstract
Interpretability is a key concern in estimating heterogeneous treatment effects using machine learning methods, especially for healthcare applications where high-stakes decisions are often made. Inspired by the Predictive, Descriptive, Relevant framework of interpretability, we propose causal rule learning, which finds a refined set of causal rules characterizing potential subgroups to estimate and enhance our understanding of heterogeneous treatment effects. Causal rule learning involves three phases: rule discovery, rule selection, and rule analysis. In the rule discovery phase, we utilize a causal forest to generate a pool of causal rules with corresponding subgroup average treatment effects. The selection phase then employs a D-learning method to select a subset of these rules to deconstruct individual-level treatment effects as a linear combination of the subgroup-level effects. This helps to answer a question ignored by previous literature: what if an individual simultaneously belongs to multiple groups with different average treatment effects? The rule analysis phase outlines a detailed procedure to further analyze each rule in the subset from multiple perspectives, revealing the most promising rules for further validation. The rules themselves, their corresponding subgroup treatment effects, and their weights in the linear combination give us more insights into heterogeneous treatment effects. Simulation and real-world data analysis demonstrate the superior performance of causal rule learning on the interpretable estimation of heterogeneous treatment effects when the ground truth is complex and the sample size is sufficient.
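The "linear combination of subgroup-level effects" can be made concrete with a toy rule set. Everything in the sketch below is hypothetical (the covariates, the rules, the subgroup effects, and the weights); it only shows how an individual who satisfies several rules receives a weighted sum of the corresponding subgroup effects.

```python
# Each hypothetical rule: (membership predicate, subgroup average treatment effect, weight).
rules = [
    (lambda x: x["age"] >= 60, 2.0, 0.5),
    (lambda x: x["smoker"], -1.0, 0.25),
]

def individual_effect(x, rules):
    """Weighted sum of the subgroup effects of every rule the individual satisfies."""
    return sum(weight * ate for applies, ate, weight in rules if applies(x))

older_smoker = {"age": 65, "smoker": True}       # belongs to both subgroups
younger_nonsmoker = {"age": 40, "smoker": False} # belongs to neither
```

Here `individual_effect(older_smoker, rules)` combines both rules (0.5 * 2.0 + 0.25 * (-1.0)), which is the situation the abstract raises: one individual belonging to multiple groups with different average treatment effects.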
Growing ecosystem of deep learning methods for modeling protein–protein interactions
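results: The review covers a series of advances: representation learning to capture complex features of protein interactions, geometric deep learning to predict protein structures and interactions, and generative models to design new protein assemblies. These advances move protein interaction modeling forward and offer new routes to probing the physical mechanisms of protein interactions and engineering them.
Abstract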
Numerous cellular functions rely on protein–protein interactions. Efforts to comprehensively characterize them remain challenged, however, by the diversity of molecular recognition mechanisms employed within the proteome. Deep learning has emerged as a promising approach for tackling this problem by exploiting both experimental data and basic biophysical knowledge about protein interactions. Here, we review the growing ecosystem of deep learning methods for modeling protein interactions, highlighting the diversity of these biophysically-informed models and their respective trade-offs. We discuss recent successes in using representation learning to capture complex features pertinent to predicting protein interactions and interaction sites, geometric deep learning to reason over protein structures and predict complex structures, and generative modeling to design de novo protein assemblies. We also outline some of the outstanding challenges and promising new directions. Opportunities abound to discover novel interactions, elucidate their physical mechanisms, and engineer binders to modulate their functions using deep learning and, ultimately, unravel how protein interactions orchestrate complex cellular behaviors.
Improving Pseudo-Time Stepping Convergence for CFD Simulations With Neural Networks
results: Pseudo-transient continuation is used to improve the nonlinear convergence of the discretized Navier-Stokes equations. A neural network model predicts the local pseudo-time step, separately on each element and using only local information. Numerical results for standard benchmark problems, such as flow through a backward facing step geometry and Couette flow, show the performance of the machine learning-enhanced globalization approach.
Abstract
Computational fluid dynamics (CFD) simulations of viscous fluids described by the Navier-Stokes equations are considered. Depending on the Reynolds number of the flow, the Navier-Stokes equations may exhibit a highly nonlinear behavior. The system of nonlinear equations resulting from the discretization of the Navier-Stokes equations can be solved using nonlinear iteration methods, such as Newton's method. However, fast quadratic convergence is typically only obtained in a local neighborhood of the solution, and for many configurations, the classical Newton iteration does not converge at all. In such cases, so-called globalization techniques may help to improve convergence. In this paper, pseudo-transient continuation is employed in order to improve nonlinear convergence. The classical algorithm is enhanced by a neural network model that is trained to predict a local pseudo-time step. Generalization of the novel approach is facilitated by predicting the local pseudo-time step separately on each element using only local information on a patch of adjacent elements as input. Numerical results for standard benchmark problems, including flow through a backward facing step geometry and Couette flow, show the performance of the machine learning-enhanced globalization approach; as the software for the simulations, the CFD module of COMSOL Multiphysics is employed.
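The effect of pseudo-transient continuation can be seen on a scalar toy problem where plain Newton iteration diverges. The sketch below solves arctan(u) = 0 from u0 = 10. It updates the pseudo-time step with the classical switched evolution relaxation (SER) heuristic, which is an assumption standing in for the paper's neural network prediction, and it has nothing CFD-specific in it.

```python
import math

def newton_solve(F, dF, u0, n_iter=3):
    """Plain Newton iteration, for comparison."""
    u = u0
    for _ in range(n_iter):
        u = u - F(u) / dF(u)
    return u

def ptc_solve(F, dF, u0, dt0=1.0, tol=1e-10, max_iter=100):
    """Pseudo-transient continuation: solve (1/dt + F'(u)) * du = -F(u) at each step."""
    u, dt = u0, dt0
    f = F(u)
    for _ in range(max_iter):
        if abs(f) < tol:
            break
        u = u - f / (1.0 / dt + dF(u))   # damped Newton-like update
        f_new = F(u)
        if f_new != 0.0:
            dt *= abs(f) / abs(f_new)    # SER: enlarge dt as the residual shrinks
        f = f_new
    return u

def d_atan(u):
    """Derivative of arctan."""
    return 1.0 / (1.0 + u * u)

u_newton = newton_solve(math.atan, d_atan, 10.0)   # overshoots further at every step
u_ptc = ptc_solve(math.atan, d_atan, 10.0)         # approaches the root u = 0
```

A small pseudo-time step makes the update cautious far from the solution, and as the residual falls the step grows and the iteration approaches Newton's method with its fast local convergence. In the paper the step is predicted per element from a patch of neighboring elements rather than set by a global rule like SER.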
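Summary
Computational fluid dynamics (CFD) simulations describe the behavior of viscous fluids governed by the Navier-Stokes equations. Depending on the Reynolds number, the Navier-Stokes equations can be highly nonlinear. The system of nonlinear equations obtained by discretizing the Navier-Stokes equations can be solved with nonlinear iteration methods such as Newton's method. However, fast quadratic convergence is typically obtained only in a local neighborhood of the solution. In such cases, so-called globalization techniques can help improve convergence. In this paper, pseudo-transient continuation is used to improve nonlinear convergence. A trained neural network model predicts the local pseudo-time step. The step is predicted separately on each element using only local information, which helps the approach generalize across flow configurations. The simulations use the CFD module of COMSOL Multiphysics.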
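The globalization idea in this entry can be illustrated on a toy problem. The following is a minimal sketch, not the paper's method: it assumes a hypothetical scalar residual F(u) = arctan(u), and it replaces the neural network step-size predictor with the classical switched evolution relaxation (SER) heuristic. All function names are illustrative.

```python
import math

def residual(u):
    # Hypothetical scalar test problem F(u) = arctan(u), with root u = 0.
    return math.atan(u)

def jacobian(u):
    # Derivative of arctan(u).
    return 1.0 / (1.0 + u * u)

def newton(u, steps):
    """Plain Newton iteration for a fixed number of steps."""
    for _ in range(steps):
        u -= residual(u) / jacobian(u)
    return u

def pseudo_transient(u, dt=1.0, tol=1e-12, max_iter=50):
    """Pseudo-transient continuation with an SER step-size update."""
    f = residual(u)
    for _ in range(max_iter):
        if abs(f) < tol:
            break
        # Damped Newton step: (1/dt + F'(u)) * du = -F(u).
        u += -f / (1.0 / dt + jacobian(u))
        f_new = residual(u)
        if f_new == 0.0:
            break
        # SER heuristic (stand-in for a learned predictor):
        # grow the pseudo-time step as the residual shrinks.
        dt *= abs(f) / abs(f_new)
        f = f_new
    return u

u_newton = newton(3.0, steps=5)   # iterates grow in magnitude from this start
u_ptc = pseudo_transient(3.0)     # iterates approach the root at 0
```

A small pseudo-time step makes the update behave like a cautious implicit Euler step, and a large one recovers Newton's method, which is why the choice of step size governs both robustness and speed.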
S4Sleep: Elucidating the design space of deep-learning-based sleep stage classification models
results: The study finds that these architectures yield statistically significant performance improvements on the SHHS dataset, assessed through both statistical and systematic error estimations.
Abstract
Scoring sleep stages in polysomnography recordings is a time-consuming task plagued by significant inter-rater variability. Therefore, it stands to benefit from the application of machine learning algorithms. While many algorithms have been proposed for this purpose, certain critical architectural decisions have not received systematic exploration. In this study, we meticulously investigate these design choices within the broad category of encoder-predictor architectures. We identify robust architectures applicable to both time series and spectrogram input representations. These architectures incorporate structured state space models as integral components, leading to statistically significant advancements in performance on the extensive SHHS dataset. These improvements are assessed through both statistical and systematic error estimations. We anticipate that the architectural insights gained from this study will not only prove valuable for future research in sleep staging but also hold relevance for other time series annotation tasks.
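Summary
Scoring sleep stages in polysomnography recordings is a time-consuming task affected by variability between raters. It therefore stands to benefit from machine learning algorithms. Although many algorithms have been proposed for this purpose, some critical architectural design decisions have not been explored systematically. In this study, we carefully investigate these design choices and validate them empirically on the extensive SHHS dataset. We identify robust architectures applicable to both time series and spectrogram input representations. These architectures include structured state space models as components and lead to statistically significant performance improvements. We assess these improvements through statistical and systematic error estimations. We expect the architectural insights from this study to be valuable for future sleep staging research and relevant to other time series annotation tasks.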
Interpretable Traffic Event Analysis with Bayesian Networks
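results: Through a concrete case study, the framework predicts traffic accidents with competitive accuracy and analyzes the relationships between traffic and weather events, providing an interpretable approach to traffic accident prediction.
Abstract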
Although existing machine learning-based methods for traffic accident analysis can provide good quality results to downstream tasks, they lack interpretability which is crucial for this critical problem. This paper proposes an interpretable framework based on Bayesian Networks for traffic accident prediction. To enable the ease of interpretability, we design a dataset construction pipeline to feed the traffic data into the framework while retaining the essential traffic data information. With a concrete case study, our framework can derive a Bayesian Network from a dataset based on the causal relationships between weather and traffic events across the United States. Consequently, our framework enables the prediction of traffic accidents with competitive accuracy while examining how the probability of these events changes under different conditions, thus illustrating transparent relationships between traffic and weather events. Additionally, the visualization of the network simplifies the analysis of relationships between different variables, revealing the primary causes of traffic accidents and ultimately providing a valuable reference for reducing traffic accidents.
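Summary
Although existing machine learning-based methods can provide good-quality results for downstream tasks, they lack interpretability, which is crucial in traffic accident analysis. This paper proposes an interpretable framework based on Bayesian networks for traffic accident prediction. To support interpretability, we design a dataset construction pipeline that feeds traffic data into the framework while retaining its essential information. Through a concrete case study, our framework derives a Bayesian network from a dataset; the network represents the causal relationships between traffic accidents and weather events in the United States. As a result, our framework predicts traffic accidents and examines how the probability of these events changes under different conditions, revealing transparent relationships between traffic and weather events. In addition, visualizing the network simplifies the analysis of relationships between variables, reveals the primary causes of traffic accidents, and provides a valuable reference for reducing them.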
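To make the idea of examining how accident probability changes under different conditions concrete, here is a minimal sketch of inference by enumeration in a tiny Bayesian network. The network structure (rain and rush hour both influencing accidents) and every probability are hypothetical values chosen for illustration; none of them come from the paper or its dataset.

```python
from itertools import product

# Hypothetical probabilities, for illustration only (not from the paper).
P_RAIN = 0.2
P_RUSH = 0.3
P_ACCIDENT = {  # P(accident | rain, rush)
    (True, True): 0.30,
    (True, False): 0.10,
    (False, True): 0.08,
    (False, False): 0.02,
}

def joint(rain, rush, accident):
    """Joint probability of one full assignment, factored along the network."""
    p = (P_RAIN if rain else 1 - P_RAIN) * (P_RUSH if rush else 1 - P_RUSH)
    p_acc = P_ACCIDENT[(rain, rush)]
    return p * (p_acc if accident else 1 - p_acc)

def query(target, evidence):
    """P(target | evidence), computed by summing the joint over all assignments."""
    names = ("rain", "rush", "accident")
    num = den = 0.0
    for values in product([True, False], repeat=3):
        world = dict(zip(names, values))
        if any(world[k] != v for k, v in evidence.items()):
            continue  # inconsistent with the evidence
        p = joint(**world)
        den += p
        if all(world[k] == v for k, v in target.items()):
            num += p
    return num / den

p_acc_given_rain = query({"accident": True}, {"rain": True})
p_rain_given_acc = query({"rain": True}, {"accident": True})
```

The same `query` function answers both predictive questions (accident given weather) and diagnostic ones (weather given an accident), which is the kind of transparency a Bayesian network offers over a black-box classifier.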
results: The researchers demonstrate the effectiveness of the algorithm in three environments that range in difficulty and in the type of transfer knowledge required.
Abstract
We present an algorithm that learns to imitate expert behavior and can transfer to previously unseen domains without retraining. Such an algorithm is extremely relevant in real-world applications such as robotic learning because 1) reward functions are difficult to design, 2) learned policies from one domain are difficult to deploy in another domain and 3) learning directly in the real world is either expensive or unfeasible due to security concerns. To overcome these constraints, we combine recent advances in Deep RL by using an AnnealedVAE to learn a disentangled state representation and imitate an expert by learning a single Q-function which avoids adversarial training. We demonstrate the effectiveness of our method in 3 environments ranging in difficulty and the type of transfer knowledge required.
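Summary
We present an algorithm that imitates expert behavior and transfers to new domains without retraining. Such an algorithm is highly useful in real-world applications because 1) reward functions are difficult to design, 2) policies learned in one domain are difficult to deploy in another domain, and 3) learning directly in the real world is either expensive or infeasible due to safety concerns. To overcome these constraints, we combine recent advances in deep RL, using an AnnealedVAE to learn a disentangled state representation and imitating the expert by learning a single Q-function. We demonstrate the effectiveness of our method in 3 environments that differ in difficulty and in the type of transfer knowledge required.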
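results: The paper generalizes the cumulant and Wick decompositions to new decompositions in which the product function is replaced by an arbitrary function.
Abstract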
We review the cumulant decomposition (a way of decomposing the expectation of a product of random variables, e.g. $\mathbb{E}[XYZ]$, into a sum of terms corresponding to partitions of these variables) and the Wick decomposition (a way of decomposing a product of (not necessarily random) variables into a sum of terms corresponding to subsets of the variables). Then we generalize each one to a new decomposition where the product function is generalized to an arbitrary function.
Summary
We review the cumulant decomposition (a way of decomposing the expectation of a product of random variables, e.g. $\mathbb{E}[XYZ]$, into a sum of terms corresponding to partitions of these variables) and the Wick decomposition (a way of decomposing a product of variables into a sum of terms corresponding to subsets of the variables). We then extend each to a new decomposition in which the product function is generalized to an arbitrary function.
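The classical cumulant decomposition mentioned in this entry can be checked numerically. The sketch below covers only the standard product case, not the paper's generalization: it enumerates the set partitions of {X, Y, Z}, computes joint cumulants from moments with the usual moment-to-cumulant formula, and confirms that E[XYZ] equals the sum over partitions of products of cumulants. The sample values are hypothetical and are treated as an exact empirical distribution.

```python
from math import factorial, prod

def partitions(items):
    """Yield all set partitions of a list, each as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Either `first` forms its own block ...
        yield [[first]] + part
        # ... or it joins one of the existing blocks.
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]

# Hypothetical samples of (X, Y, Z), used as an empirical distribution.
SAMPLES = [(1.0, 2.0, 0.5), (0.0, 1.0, 3.0), (2.0, 2.0, 1.0), (1.0, 0.0, 1.5)]

def moment(block):
    """E[product of the variables whose indices are in `block`]."""
    return sum(prod(s[i] for i in block) for s in SAMPLES) / len(SAMPLES)

def cumulant(block):
    """Joint cumulant: sum over partitions of (-1)^(k-1) (k-1)! * product of moments."""
    total = 0.0
    for part in partitions(list(block)):
        k = len(part)
        total += (-1) ** (k - 1) * factorial(k - 1) * prod(moment(b) for b in part)
    return total

lhs = moment([0, 1, 2])  # E[XYZ]
rhs = sum(prod(cumulant(b) for b in part) for part in partitions([0, 1, 2]))
n_partitions = len(list(partitions([0, 1, 2])))
```

For three variables there are five partitions, so the right-hand side has five terms: one triple cumulant, three products of a pairwise cumulant with a mean, and one product of three means.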
Enhanced Graph Neural Networks with Ego-Centric Spectral Subgraph Embeddings Augmentation
results: Evaluation on seven datasets and eight baseline models shows that ESGEA improves AUC by up to 10% on graph classification tasks and accuracy by up to 7% on node classification tasks, relative to the baselines.
Abstract
Graph Neural Networks (GNNs) have shown remarkable merit in performing various learning-based tasks in complex networks. The superior performance of GNNs often correlates with the availability and quality of node-level features in the input networks. However, for many network applications, such node-level information may be missing or unreliable, thereby limiting the applicability and efficacy of GNNs. To address this limitation, we present a novel approach denoted as Ego-centric Spectral subGraph Embedding Augmentation (ESGEA), which aims to enhance and design node features, particularly in scenarios where information is lacking. Our method leverages the topological structure of the local subgraph to create topology-aware node features. The subgraph features are generated using an efficient spectral graph embedding technique, and they serve as node features that capture the local topological organization of the network. The explicit node features, if present, are then enhanced with the subgraph embeddings in order to improve the overall performance. ESGEA is compatible with any GNN-based architecture and is effective even in the absence of node features. We evaluate the proposed method in a social network graph classification task where node attributes are unavailable, as well as in a node classification task where node features are corrupted or even absent. The evaluation results on seven datasets and eight baseline models indicate up to a 10% improvement in AUC and a 7% improvement in accuracy for graph and node classification tasks, respectively.
On the importance of catalyst-adsorbate 3D interactions for relaxed energy predictions
results: Although removing binding site information reduces accuracy, the modified models can still predict a system's relaxed energy on the OC20 dataset with remarkably decent MAE.
Abstract
The use of machine learning for material property prediction and discovery has traditionally centered on graph neural networks that incorporate the geometric configuration of all atoms. However, in practice not all this information may be readily available, e.g. when evaluating the potentially unknown binding of adsorbates to a catalyst. In this paper, we investigate whether it is possible to predict a system's relaxed energy in the OC20 dataset while ignoring the relative position of the adsorbate with respect to the electro-catalyst. We consider SchNet, DimeNet++ and FAENet as base architectures and measure the impact of four modifications on model performance: removing edges in the input graph, pooling independent representations, not sharing the backbone weights and using an attention mechanism to propagate non-geometric relative information. We find that while removing binding site information impairs accuracy as expected, modified models are able to predict relaxed energies with remarkably decent MAE. Our work suggests future research directions in accelerated materials discovery where information on reactant configurations can be reduced or altogether omitted.
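Summary
Machine learning for material property prediction and discovery has traditionally relied on graph neural networks that incorporate the geometric configuration of all atoms. In practice, however, this information may not be readily available, for example when evaluating the potentially unknown binding of adsorbates to a catalyst. In this paper, we investigate whether a system's relaxed energy in the OC20 dataset can be predicted without considering the relative position of the adsorbate. We consider SchNet, DimeNet++ and FAENet as base architectures and test the impact of four modifications on model performance: removing edges in the input graph, pooling independent representations, not sharing the backbone weights, and using an attention mechanism to propagate non-geometric relative information. We find that although removing binding site information reduces accuracy, the modified models can still predict relaxed energies with reasonably good MAE. Our work suggests directions for accelerated materials discovery in which information on reactant configurations is reduced or omitted altogether.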
Machine Learning Quantum Systems with Magnetic p-bits
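results: The study shows that such a probabilistic computer enables scalable and energy-efficient computing and is particularly suited to an emerging field combining machine learning and quantum physics.
Abstract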
The slowing down of Moore's Law has led to a crisis as the computing workloads of Artificial Intelligence (AI) algorithms continue skyrocketing. There is an urgent need for scalable and energy-efficient hardware catering to the unique requirements of AI algorithms and applications. In this environment, probabilistic computing with p-bits emerged as a scalable, domain-specific, and energy-efficient computing paradigm, particularly useful for probabilistic applications and algorithms. In particular, spintronic devices such as stochastic magnetic tunnel junctions (sMTJ) show great promise in designing integrated p-computers. Here, we examine how a scalable probabilistic computer with such magnetic p-bits can be useful for an emerging field combining machine learning and quantum physics.
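As an illustration of what a p-bit computes, here is a minimal software sketch of the commonly used p-bit update rule, m = sgn(tanh(beta * I) - r) with r drawn uniformly from [-1, 1). The two-p-bit network, the coupling strength, and the function names are hypothetical choices for this example; the abstract above does not specify them, and real magnetic p-bits realize the randomness in hardware.

```python
import math
import random

def pbit_update(inp, beta, rng):
    """One p-bit: returns +1 with probability (1 + tanh(beta * inp)) / 2."""
    r = rng.uniform(-1.0, 1.0)
    return 1 if math.tanh(beta * inp) > r else -1

def sample_pair(coupling=1.0, beta=1.0, sweeps=5000, seed=0):
    """Sequentially update two coupled p-bits and return their mean correlation."""
    rng = random.Random(seed)
    m = [1, -1]
    corr = 0
    for _ in range(sweeps):
        for i in (0, 1):
            # The input to p-bit i is the coupled state of the other p-bit.
            m[i] = pbit_update(coupling * m[1 - i], beta, rng)
        corr += m[0] * m[1]
    return corr / sweeps

corr = sample_pair()
```

With a positive coupling the two p-bits tend to agree; for this two-spin model the Boltzmann value of the correlation is tanh(beta * coupling), so the sampled estimate should land in that neighborhood.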
Tertiary Lymphoid Structures Generation through Graph-based Diffusion
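results: The researchers demonstrate the utility of the learned generative models for data augmentation in a TLS classification task. This is the first work to use graph-based diffusion models to generate biologically meaningful cell-graphs.
Abstract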
Graph-based representation approaches have been proven to be successful in the analysis of biomedical data, due to their capability of capturing intricate dependencies between biological entities, such as the spatial organization of different cell types in a tumor tissue. However, to further enhance our understanding of the underlying governing biological mechanisms, it is important to accurately capture the actual distributions of such complex data. Graph-based deep generative models are specifically tailored to accomplish that. In this work, we leverage state-of-the-art graph-based diffusion models to generate biologically meaningful cell-graphs. In particular, we show that the adopted graph diffusion model is able to accurately learn the distribution of cells in terms of their tertiary lymphoid structures (TLS) content, a well-established biomarker for evaluating the cancer progression in oncology research. Additionally, we further illustrate the utility of the learned generative models for data augmentation in a TLS classification task. To the best of our knowledge, this is the first work that leverages the power of graph diffusion models in generating meaningful biological cell structures.
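Summary
Graph-based representation approaches have been successful in biomedical data analysis because they can capture intricate dependencies between biological entities, such as the spatial organization of different cell types in tumor tissue. However, to better understand the underlying biological mechanisms, the actual distribution of such data must be captured accurately. Graph-based deep generative models are suited to this goal. In this work, we use state-of-the-art graph diffusion models to generate biologically meaningful cell-graphs. In particular, we show that the adopted graph diffusion model accurately learns the distribution of cells in terms of their tertiary lymphoid structures (TLS) content, a biomarker for evaluating cancer progression. We further demonstrate that the learned generative models can be used for data augmentation in a TLS classification task. To the best of our knowledge, this is the first work to use graph diffusion models to generate meaningful biological cell structures.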
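results: Experiments on several datasets demonstrate the efficacy, generalizability and scalability of the method, which handles applications including deforming 3D shapes, single class encoding and multiclass encoding.
Abstract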
Neural shape representation generally refers to representing 3D geometry using neural networks, e.g., to compute a signed distance or occupancy value at a specific spatial position. Previous methods tend to rely on the auto-decoder paradigm, which often requires densely-sampled and accurate signed distances to be known during training and testing, as well as an additional optimization loop during inference. This introduces a lot of computational overhead, in addition to having to compute signed distances analytically, even during testing. In this paper, we present a novel encoder-decoder neural network for embedding 3D shapes in a single forward pass. Our architecture is based on a multi-scale hybrid system incorporating graph-based and voxel-based components, as well as a continuously differentiable decoder. Furthermore, the network is trained to solve the Eikonal equation and only requires knowledge of the zero-level set for training and inference. Additional volumetric samples can be generated on-the-fly, and incorporated in an unsupervised manner. This means that in contrast to most previous work, our network is able to output valid signed distance fields without explicit prior knowledge of non-zero distance values or shape occupancy. In other words, our network computes approximate solutions to the boundary-valued Eikonal equation. It also requires only a single forward pass during inference, instead of the common latent code optimization. We further propose a modification of the loss function in case that surface normals are not well defined, e.g., in the context of non-watertight surface-meshes and non-manifold geometry. We finally demonstrate the efficacy, generalizability and scalability of our method on datasets consisting of deforming 3D shapes, single class encoding and multiclass encoding, showcasing a wide range of possible applications.
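The Eikonal constraint referred to above says that a signed distance field has a gradient of unit norm. The following sketch, written under stated assumptions, evaluates that residual with finite differences for an analytic sphere distance function; it does not reproduce the paper's network or loss, and `sphere_sdf`, `grad_norm`, and `eikonal_residual` are illustrative names.

```python
import math

def sphere_sdf(p, radius=1.0):
    """Exact signed distance from point p to a sphere centered at the origin."""
    return math.sqrt(sum(c * c for c in p)) - radius

def grad_norm(f, p, h=1e-5):
    """Central finite-difference estimate of |grad f| at point p."""
    g = []
    for i in range(len(p)):
        hi = list(p)
        lo = list(p)
        hi[i] += h
        lo[i] -= h
        g.append((f(hi) - f(lo)) / (2 * h))
    return math.sqrt(sum(c * c for c in g))

def eikonal_residual(f, points):
    """Mean squared deviation of |grad f| from 1 over the sample points."""
    return sum((grad_norm(f, p) - 1.0) ** 2 for p in points) / len(points)

points = [(0.3, -0.2, 0.9), (1.5, 0.4, -0.7), (-2.0, 1.0, 0.1)]
res_sdf = eikonal_residual(sphere_sdf, points)                         # near zero
res_scaled = eikonal_residual(lambda p: 2.0 * sphere_sdf(p), points)   # near one
```

A field with the correct zero-level set but the wrong scale, such as twice the distance, is penalized by this residual, which is why the Eikonal term can supervise values away from the surface without ground-truth distances.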
Implicit Variational Inference for High-Dimensional Posteriors
paper_authors: Anshuk Uppal, Kristoffer Stensbo-Smidt, Wouter K. Boomsma, Jes Frellsen
for: The paper aims to advance variational inference in Bayesian neural networks by proposing a new method for approximating complex multimodal and correlated posteriors using neural samplers with implicit distributions.
methods: The paper introduces novel bounds that come about by locally linearizing the neural sampler, which is distinct from existing methods that rely on additional discriminator networks and unstable adversarial objectives. The paper also presents a new sampler architecture that enables implicit distributions over millions of latent variables, addressing computational concerns by using differentiable numerical approximations.
results: The paper demonstrates that the proposed method is capable of recovering correlations across layers in large Bayesian neural networks, a property that is crucial for a network's performance but notoriously challenging to achieve. The paper also shows that the expressive posteriors obtained using the proposed method outperform state-of-the-art uncertainty quantification methods in downstream tasks, validating the effectiveness of the training algorithm and the quality of the learned implicit approximation.
Abstract
In variational inference, the benefits of Bayesian models rely on accurately capturing the true posterior distribution. We propose using neural samplers that specify implicit distributions, which are well-suited for approximating complex multimodal and correlated posteriors in high-dimensional spaces. Our approach advances inference using implicit distributions by introducing novel bounds that come about by locally linearising the neural sampler. This is distinct from existing methods that rely on additional discriminator networks and unstable adversarial objectives. Furthermore, we present a new sampler architecture that, for the first time, enables implicit distributions over millions of latent variables, addressing computational concerns by using differentiable numerical approximations. Our empirical analysis indicates our method is capable of recovering correlations across layers in large Bayesian neural networks, a property that is crucial for a network's performance but notoriously challenging to achieve. To the best of our knowledge, no other method has been shown to accomplish this task for such large models. Through experiments in downstream tasks, we demonstrate that our expressive posteriors outperform state-of-the-art uncertainty quantification methods, validating the effectiveness of our training algorithm and the quality of the learned implicit approximation.
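Summary
In variational inference, the benefits of Bayesian models depend on accurately capturing the true posterior distribution. We propose using neural samplers that specify implicit distributions, which are suited to complex multimodal and correlated posteriors in high-dimensional spaces. Our approach introduces novel bounds obtained by locally linearising the neural sampler. This differs from existing methods, which rely on additional discriminator networks and unstable adversarial objectives. We also present a new sampler architecture that enables implicit distributions over millions of latent variables, addressing computational concerns through differentiable numerical approximations. Our experiments indicate that the method can recover correlations across layers in large Bayesian neural networks, a property that is crucial for performance but difficult to achieve. Our expressive posteriors also outperform existing uncertainty quantification methods, validating the effectiveness of our training algorithm and the quality of the learned implicit approximation.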
The Lattice Overparametrization Paradigm for the Machine Learning of Lattice Operators
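results: The paper shows that lattice operators can be learned with the stochastic lattice gradient descent algorithm and that the properties of a learned operator can be understood by computing its basis. The paper also argues that this learning paradigm offers control, transparency and interpretability, which modern neural network-based methods lack.
Abstract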
The machine learning of lattice operators has three possible bottlenecks. From a statistical standpoint, it is necessary to design a constrained class of operators based on prior information with low bias, and low complexity relative to the sample size. From a computational perspective, there should be an efficient algorithm to minimize an empirical error over the class. From an understanding point of view, the properties of the learned operator need to be derived, so its behavior can be theoretically understood. The statistical bottleneck can be overcome due to the rich literature about the representation of lattice operators, but there is no general learning algorithm for them. In this paper, we discuss a learning paradigm in which, by overparametrizing a class via elements in a lattice, an algorithm for minimizing functions in a lattice is applied to learn. We present the stochastic lattice gradient descent algorithm as a general algorithm to learn on constrained classes of operators as long as a lattice overparametrization of it is fixed, and we discuss previous works which are proofs of concept. Moreover, if there are algorithms to compute the basis of an operator from its overparametrization, then its properties can be deduced and the understanding bottleneck is also overcome. This learning paradigm has three properties that modern methods based on neural networks lack: control, transparency and interpretability. Nowadays, there is an increasing demand for methods with these characteristics, and we believe that mathematical morphology is in a unique position to supply them. The lattice overparametrization paradigm could be a missing piece for it to achieve its full potential within modern machine learning.
iTransformer: Inverted Transformers Are Effective for Time Series Forecasting
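results: The model achieves consistent state-of-the-art performance on several real-world datasets, improving the Transformer family's performance in time series forecasting, its generalization ability across variates, and its utilization of arbitrary lookback windows.
Abstract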
The recent boom of linear forecasting models questions the ongoing passion for architectural modifications of Transformer-based forecasters. These forecasters leverage Transformers to model the global dependencies over temporal tokens of time series, with each token formed by multiple variates of the same timestamp. However, Transformer is challenged in forecasting series with larger lookback windows due to performance degradation and computation explosion. Besides, the unified embedding for each temporal token fuses multiple variates with potentially unaligned timestamps and distinct physical measurements, which may fail in learning variate-centric representations and result in meaningless attention maps. In this work, we reflect on the competent duties of Transformer components and repurpose the Transformer architecture without any adaptation on the basic components. We propose iTransformer that simply inverts the duties of the attention mechanism and the feed-forward network. Specifically, the time points of individual series are embedded into variate tokens which are utilized by the attention mechanism to capture multivariate correlations; meanwhile, the feed-forward network is applied for each variate token to learn nonlinear representations. The iTransformer model achieves consistent state-of-the-art on several real-world datasets, which further empowers the Transformer family with promoted performance, generalization ability across different variates, and better utilization of arbitrary lookback windows, making it a nice alternative as the fundamental backbone of time series forecasting.
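Summary
The recent boom of linear forecasting models has called into question the ongoing interest in architectural modifications of Transformer-based forecasters. These forecasters use Transformers to model global dependencies over the temporal tokens of a time series, where each token is formed by multiple variates at the same timestamp. However, Transformers struggle to forecast with larger lookback windows because of performance degradation and exploding computation. In addition, the unified embedding of each temporal token may fail to learn variate-centric representations and may produce meaningless attention maps. In this work, we reflect on the duties of the Transformer components and repurpose the Transformer architecture as iTransformer, which simply inverts the duties of the attention mechanism and the feed-forward network. Specifically, the time points of each series are embedded into variate tokens, which the attention mechanism uses to capture multivariate correlations, while the feed-forward network is applied to each variate token to learn nonlinear representations. iTransformer achieves consistent state-of-the-art performance on several real-world datasets, giving the Transformer family improved performance, better generalization across variates, and better utilization of lookback windows, and making it a candidate backbone for time series forecasting.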
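The inversion described in this entry is easiest to see as a change in what counts as a token. The sketch below contrasts the two layouts on a toy series and applies a linear embedding to each variate token. The data values and the embedding weights are hypothetical, and the attention and feed-forward stages of the actual model are omitted.

```python
# Toy multivariate series: T = 4 time steps, N = 3 variates (hypothetical values).
series = [
    [1.0, 10.0, 100.0],
    [2.0, 20.0, 200.0],
    [3.0, 30.0, 300.0],
    [4.0, 40.0, 400.0],
]

def temporal_tokens(x):
    """Conventional layout: one token per time step, holding all variates."""
    return [list(row) for row in x]

def variate_tokens(x):
    """Inverted layout: one token per variate, holding its whole lookback window."""
    return [list(col) for col in zip(*x)]

def embed(token, weights):
    """Linear embedding of one token; `weights` has shape (d_model, len(token))."""
    return [sum(w * v for w, v in zip(row, token)) for row in weights]

# Hypothetical embedding weights mapping a length-4 window to d_model = 2.
W = [
    [0.25, 0.25, 0.25, 0.25],  # mean of the window
    [-1.0, 0.0, 0.0, 1.0],     # last value minus first value
]

embedded = [embed(tok, W) for tok in variate_tokens(series)]
```

In the inverted layout the number of tokens equals the number of variates rather than the number of time steps, so attention compares whole series with each other instead of comparing individual timestamps.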
Robustness May be More Brittle than We Think under Different Degrees of Distribution Shifts
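results: The researchers find that model robustness can be brittle and inconsistent under different degrees of distribution shift, so conclusions drawn from a limited range of shift degrees carry risk. In addition, large-scale pre-trained models such as CLIP are sensitive to even minute distribution shifts in novel downstream tasks.
Abstract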
Out-of-distribution (OOD) generalization is a complicated problem due to the idiosyncrasies of possible distribution shifts between training and test domains. Most benchmarks employ diverse datasets to address this issue; however, the degree of the distribution shift between the training domains and the test domains of each dataset remains largely fixed. This may lead to biased conclusions that either underestimate or overestimate the actual OOD performance of a model. Our study delves into a more nuanced evaluation setting that covers a broad range of shift degrees. We show that the robustness of models can be quite brittle and inconsistent under different degrees of distribution shifts, and therefore one should be more cautious when drawing conclusions from evaluations under a limited range of degrees. In addition, we observe that large-scale pre-trained models, such as CLIP, are sensitive to even minute distribution shifts of novel downstream tasks. This indicates that while pre-trained representations may help improve downstream in-distribution performance, they could have minimal or even adverse effects on generalization in certain OOD scenarios of the downstream task if not used properly. In light of these findings, we encourage future research to conduct evaluations across a broader range of shift degrees whenever possible.
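Summary
Out-of-distribution (OOD) generalization is a complicated problem because of the idiosyncrasies of possible distribution shifts between training and test domains. Most benchmarks use diverse datasets to address this issue, but the degree of distribution shift between the training and test domains of each dataset remains largely fixed. This may lead to biased conclusions that either underestimate or overestimate a model's actual OOD performance. Our study explores a more fine-grained evaluation setting that covers a broad range of shift degrees. We find that model robustness can be brittle and inconsistent under different degrees of distribution shift, so one should be more cautious when drawing conclusions. In addition, we find that large-scale pre-trained models such as CLIP are sensitive to small distribution shifts in new tasks. This indicates that although pre-trained representations can help improve downstream in-distribution performance, they may have minimal or even adverse effects in some OOD scenarios. We encourage future research to conduct evaluations across a broader range of shift degrees whenever possible.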
Discovering Interpretable Physical Models Using Symbolic Regression and Discrete Exterior Calculus
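methods: The approach combines Symbolic Regression (SR) and Discrete Exterior Calculus (DEC), using a natural, general-purpose discrete mathematical language to derive and analyze physical models. DEC provides building blocks for the discrete analogue of field theories and enables a strongly-typed SR procedure that guarantees the mathematical consistency of the recovered models and reduces the search space.
results: Using this approach, the authors re-discover three models of Continuum Physics from synthetic experimental data: the Poisson equation, Euler's Elastica and the equations of Linear Elasticity. Because the methods are general-purpose, they may be applied to diverse physical modeling contexts.
Abstract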
Computational modeling is a key resource to gather insight into physical systems in modern scientific research and engineering. While access to large amounts of data has fueled the use of Machine Learning (ML) to recover physical models from experiments and increase the accuracy of physical simulations, purely data-driven models have limited generalization and interpretability. To overcome these limitations, we propose a framework that combines Symbolic Regression (SR) and Discrete Exterior Calculus (DEC) for the automated discovery of physical models starting from experimental data. Since these models consist of mathematical expressions, they are interpretable and amenable to analysis, and the use of a natural, general-purpose discrete mathematical language for physics favors generalization with limited input data. Importantly, DEC provides building blocks for the discrete analogue of field theories, which are beyond the state-of-the-art applications of SR to physical problems. Further, we show that DEC allows us to implement a strongly-typed SR procedure that guarantees the mathematical consistency of the recovered models and reduces the search space of symbolic expressions. Finally, we prove the effectiveness of our methodology by re-discovering three models of Continuum Physics from synthetic experimental data: Poisson equation, the Euler's Elastica and the equations of Linear Elasticity. Thanks to their general-purpose nature, the methods developed in this paper may be applied to diverse contexts of physical modeling.
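Summary
Computational modeling is a key resource for gaining insight into physical systems in modern scientific research and engineering. Although the availability of large amounts of data has driven the use of machine learning (ML) to recover physical models from experiments and to improve the accuracy of physical simulations, purely data-driven models have limited generalization and interpretability. To overcome these limitations, we propose a framework that combines Symbolic Regression (SR) and Discrete Exterior Calculus (DEC) for the automated discovery of physical models from experimental data. Because these models consist of mathematical expressions, they can be interpreted and analyzed, and the use of a natural, general-purpose discrete mathematical language favors generalization. DEC also provides building blocks for discrete field theories, which go beyond current applications of SR to physical problems. We further show that DEC enables a strongly-typed SR procedure that guarantees the mathematical consistency of the models and reduces the search space of symbolic expressions. Finally, we demonstrate the effectiveness of our methodology by re-discovering the Poisson equation, Euler's Elastica and the equations of Linear Elasticity from synthetic experimental data. Because the methods are general-purpose, they can be applied to diverse physical modeling contexts.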
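To give a feel for the DEC building blocks mentioned in this entry, here is a minimal sketch of the discrete exterior derivative (coboundary operator) on a single oriented triangle. The mesh, the orientation convention, and the function names `d0` and `d1` are assumptions made for this example and are not taken from the paper's implementation.

```python
# One oriented triangle with vertices 0, 1, 2 (a hypothetical minimal mesh).
EDGES = [(0, 1), (1, 2), (0, 2)]
# The boundary of face (0, 1, 2) is edge(0,1) + edge(1,2) - edge(0,2).
FACE_EDGE_SIGNS = [(+1, +1, -1)]

def d0(vertex_values):
    """Coboundary on 0-cochains: difference of endpoint values on each edge."""
    return [vertex_values[b] - vertex_values[a] for a, b in EDGES]

def d1(edge_values):
    """Coboundary on 1-cochains: signed sum of edge values around each face."""
    return [sum(s * e for s, e in zip(signs, edge_values))
            for signs in FACE_EDGE_SIGNS]

f = [3.0, -1.0, 4.0]  # a 0-cochain: one value per vertex
df = d0(f)            # a 1-cochain: one value per edge
ddf = d1(df)          # a 2-cochain: one value per face
```

Each operator accepts one kind of cochain and returns another, so only certain compositions are meaningful. A strongly-typed search can exploit this to discard ill-formed expressions early, and the identity that applying the derivative twice gives zero holds by construction.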
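results: The identified timeframe is 21st April 2019 to 9th August 2019, which loses only 0.75% accuracy compared with the full timeseries. The important timesteps derived by LRP also highlight small details in the input values that can be used to differentiate between classes.
Abstract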
We propose an approach for early crop classification through identifying important timesteps with eXplainable AI (XAI) methods. Our approach consists of training a baseline crop classification model to carry out layer-wise relevance propagation (LRP) so that the salient time step can be identified. We chose a selected number of such important time indices to create the bounding region of the shortest possible classification timeframe. We identified the period 21st April 2019 to 9th August 2019 as having the best trade-off in terms of accuracy and earliness. This timeframe only suffers a 0.75% loss in accuracy as compared to using the full timeseries. We observed that the LRP-derived important timesteps also highlight small details in input values that differentiates between different classes and
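Summary
We propose an approach for early crop classification using eXplainable AI (XAI) methods. Our approach trains a baseline crop classification model and uses layer-wise relevance propagation (LRP) to identify the important time steps. We select a number of important time indices and use them to create the shortest possible classification timeframe. The identified timeframe is 21st April 2019 to 9th August 2019, which reduces accuracy by only 0.75% compared with using the full timeseries. We find that the important time steps derived by LRP also highlight small details in the input values that can be used to distinguish different crop classes.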
Deep Learning reconstruction with uncertainty estimation for $γ$ photon interaction in fast scintillator detectors
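results: The results demonstrate the effectiveness and reliability of the method and highlight the importance of estimating the uncertainty of the results. The paper also discusses the method's potential impact on improving PET imaging quality and how the results can be used to improve the exploitation of the model and its applications. The method can also be extended to applications beyond PET imaging.
Abstract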
This article presents a physics-informed deep learning method for the quantitative estimation of the spatial coordinates of gamma interactions within a monolithic scintillator, with a focus on Positron Emission Tomography (PET) imaging. A Density Neural Network approach is designed to estimate the 2-dimensional gamma photon interaction coordinates in a fast lead tungstate (PbWO4) monolithic scintillator detector. We introduce a custom loss function to estimate the inherent uncertainties associated with the reconstruction process and to incorporate the physical constraints of the detector. This unique combination allows for more robust and reliable position estimations and the obtained results demonstrate the effectiveness of the proposed approach and highlights the significant benefits of the uncertainties estimation. We discuss its potential impact on improving PET imaging quality and show how the results can be used to improve the exploitation of the model, to bring benefits to the application and how to evaluate the validity of the given prediction and the associated uncertainties. Importantly, our proposed methodology extends beyond this specific use case, as it can be generalized to other applications beyond PET imaging.
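Summary
This article presents a physics-based deep learning method for quantifying the spatial coordinates of gamma interactions within a monolithic scintillator detector, with a focus on Positron Emission Tomography (PET) imaging. The method uses a Density Neural Network to estimate the 2-dimensional coordinates of gamma photon interactions in a fast lead tungstate (PbWO4) monolithic scintillator detector. We introduce a custom loss function to estimate the inherent uncertainties of the reconstruction process and to incorporate the physical constraints of the detector. This unique combination makes position estimation more stable and reliable; the results demonstrate the effectiveness of the proposed approach and underline the importance of estimating uncertainty. The method has the potential to improve PET imaging quality and can be extended to applications beyond PET imaging.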
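The abstract does not spell out the custom loss. A common way to make a network report its own uncertainty is to have it predict a mean and a log-variance per coordinate and train with the Gaussian negative log-likelihood; the sketch below shows that generic loss only. It is an assumption, not the paper's loss, and it omits the detector's physical constraints entirely.

```python
import math

def gaussian_nll(pred_mean, pred_log_var, target):
    """Mean Gaussian negative log-likelihood over coordinates.

    Each coordinate is modelled as an independent Gaussian whose mean and
    log-variance are both predicted (hypothetical network outputs).
    """
    total = 0.0
    for mu, log_var, y in zip(pred_mean, pred_log_var, target):
        # 0.5 * [log(2*pi) + log(sigma^2) + (y - mu)^2 / sigma^2]
        total += 0.5 * (math.log(2.0 * math.pi) + log_var
                        + (y - mu) ** 2 / math.exp(log_var))
    return total / len(target)

# A 2-D interaction position (x, y): same prediction, two claimed uncertainties.
target = [2.0, 2.0]
pred = [0.0, 0.0]
overconfident = gaussian_nll(pred, [0.0, 0.0], target)            # sigma^2 = 1
calibrated = gaussian_nll(pred, [math.log(4.0)] * 2, target)      # sigma^2 = 4
print(overconfident, calibrated)
```

For a fixed error, the loss is smallest when the predicted variance equals the squared error, which is what pushes the predicted uncertainty to track the actual reconstruction error.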
Statistical properties and privacy guarantees of an original distance-based fully synthetic data generation method
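results: The computed metrics show that each synthetic dataset reaches a satisfactory level of protection against attribute disclosure attacks, especially when the full framework is used. Membership disclosure attacks are formally prevented without significantly altering the data. Machine learning approaches show a low success rate for simulated singling out and linkability attacks. The distributional and inferential metrics of every dataset are similar to those of the original data.
Abstract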
Introduction: The amount of data generated by original research is growing exponentially. Publicly releasing them is recommended to comply with the Open Science principles. However, data collected from human participants cannot be released as-is without raising privacy concerns. Fully synthetic data represent a promising answer to this challenge. This approach is explored by the French Centre de Recherche en Épidémiologie et Santé des Populations in the form of a synthetic data generation framework based on Classification and Regression Trees and an original distance-based filtering. The goal of this work was to develop a refined version of this framework and to assess its risk-utility profile with empirical and formal tools, including novel ones developed for the purpose of this evaluation.
Materials and Methods: Our synthesis framework consists of four successive steps, each of which is designed to prevent specific risks of disclosure. We assessed its performance by applying two or more of these steps to a rich epidemiological dataset. Privacy and utility metrics were computed for each of the resulting synthetic datasets, which were further assessed using machine learning approaches.
Results: Computed metrics showed a satisfactory level of protection against attribute disclosure attacks for each synthetic dataset, especially when the full framework was used. Membership disclosure attacks were formally prevented without significantly altering the data. Machine learning approaches showed a low risk of success for simulated singling out and linkability attacks. Distributional and inferential similarity with the original data were high with all datasets.
Discussion: This work showed the technical feasibility of generating publicly releasable synthetic data using a multi-step framework. Formal and empirical tools specifically developed for this demonstration are a valuable contribution to this field. Further research should focus on the extension and validation of these tools, in an effort to specify the intrinsic qualities of alternative data synthesis methods.
Conclusion: By successfully assessing the quality of data produced using a novel multi-step synthetic data generation framework, we showed the technical and conceptual soundness of the Open-CESP initiative, which seems ripe for full-scale implementation.
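Summary
Introduction: The volume of original research data is growing rapidly. Under Open Science principles, releasing these data publicly is recommended. However, data collected from human participants cannot be released as-is without raising privacy concerns. Fully synthetic data are a promising answer. The French Centre de Recherche en Épidémiologie et Santé des Populations explores this approach with a synthetic data generation framework based on Classification and Regression Trees and an original distance-based filtering. The goal of this work was to develop a refined version of this framework and to assess its risk-utility profile with empirical and formal tools.
Materials and Methods: Our synthesis framework consists of four steps, each designed to prevent a specific disclosure risk. We applied two or more of these steps to a rich epidemiological dataset. Privacy and utility metrics were computed for each synthetic dataset, and the datasets were further assessed with machine learning approaches.
Results: The computed metrics show a high level of privacy protection for each synthetic dataset, especially against attribute disclosure attacks and when the full framework is used. Membership disclosure attacks were formally prevented without significantly altering the data. Machine learning approaches showed a low risk of success for simulated singling out and linkability attacks. Distributional and inferential similarity with the original data was high for all synthetic datasets.
Discussion: This work demonstrates the technical feasibility of a multi-step synthetic data generation framework. The formal and empirical tools developed for it are a valuable contribution to the field. Future research should focus on extending and validating these tools in order to characterize the intrinsic qualities of other data synthesis methods.
Conclusion: By successfully assessing the quality of data produced with the multi-step synthetic data generation framework, we showed the technical and conceptual soundness of the Open-CESP initiative, which seems ready for full-scale implementation.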
An Edge-Aware Graph Autoencoder Trained on Scale-Imbalanced Data for Travelling Salesman Problems
results: Experiments on 50,000 TSP instances show that the method achieves highly competitive performance across different problem scales.
Abstract
Recent years have witnessed a surge in research on machine learning for combinatorial optimization since learning-based approaches can outperform traditional heuristics and approximate exact solvers at a lower computation cost. However, most existing work on supervised neural combinatorial optimization focuses on TSP instances with a fixed number of cities and requires large amounts of training samples to achieve a good performance, making them less practical to be applied to realistic optimization scenarios. This work aims to develop a data-driven graph representation learning method for solving travelling salesman problems (TSPs) with various numbers of cities. To this end, we propose an edge-aware graph autoencoder (EdgeGAE) model that can learn to solve TSPs after being trained on solution data of various sizes with an imbalanced distribution. We formulate the TSP as a link prediction task on sparse connected graphs. A residual gated encoder is trained to learn latent edge embeddings, followed by an edge-centered decoder to output link predictions in an end-to-end manner. To improve the model's generalization capability of solving large-scale problems, we introduce an active sampling strategy into the training process. In addition, we generate a benchmark dataset containing 50,000 TSP instances with a size from 50 to 500 cities, following an extremely scale-imbalanced distribution, making it ideal for investigating the model's performance for practical applications. We conduct experiments using different amounts of training data with various scales, and the experimental results demonstrate that the proposed data-driven approach achieves a highly competitive performance among state-of-the-art learning-based methods for solving TSPs.
Data-level hybrid strategy selection for disk fault prediction model based on multivariate GAN
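results: The study shows that combining GAN-generated data with a genetic algorithm improves disk fault classification prediction accuracy and handles data class imbalance better.
Abstract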
Data class imbalance is a common problem in classification, where minority class samples are often more important and more costly to misclassify. It is therefore very important to solve the imbalanced classification problem. The SMART dataset exhibits an evident class imbalance, comprising a substantial quantity of healthy samples and a comparatively limited number of defective samples. This dataset serves as a reliable indicator of the disk's health status. In this paper, we obtain the best balanced disk SMART dataset for a specific classification model by mixing and integrating the data synthesised by multivariate generative adversarial networks (GAN) to balance the disk SMART dataset at the data level, and we combine it with genetic algorithms to obtain higher disk fault classification prediction accuracy on a specific classification model.
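Summary
Data class imbalance is a common problem in classification, where misclassifying minority class samples is often more important and more costly. Solving the imbalanced classification problem is therefore very important. The SMART dataset shows a clear class imbalance, with a large number of healthy samples and comparatively few defective samples, and it serves as a reliable indicator of disk health. In this paper, we mix and integrate data generated by multivariate generative adversarial networks (GAN) to reduce the class imbalance of the disk SMART dataset at the data level, and combine this with genetic algorithms and a specific classification model to improve disk fault classification prediction accuracy.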
Disk failure prediction based on multi-layer domain adaptive learning
for: Predicting disk failures, an important task in large-scale data storage systems.
methods: The paper proposes a novel method for predicting disk failures by leveraging multi-layer domain adaptive learning techniques. Disk data with numerous faults is selected as the source domain and disk data with fewer faults as the target domain, and a feature extraction network is trained with the selected source and target domains.
results: The proposed technique is shown to generate a reliable prediction model and to improve the ability to predict failures on disk data with few failure samples.
Abstract
Large-scale data storage is susceptible to failure. As disks are damaged and replaced, traditional machine learning models, which rely on historical data to make predictions, struggle to accurately predict disk failures. This paper presents a novel method for predicting disk failures by leveraging multi-layer domain adaptive learning techniques. First, disk data with numerous faults is selected as the source domain, and disk data with fewer faults is selected as the target domain. A feature extraction network is then trained with the selected source and target domains. The contrast between the two domains facilitates the transfer of diagnostic knowledge from the source domain to the target domain. The experimental findings demonstrate that the proposed technique can generate a reliable prediction model and improve the ability to predict failures on disk data with few failure samples.
AttributionLab: Faithfulness of Feature Attribution Under Controllable Environments
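results: In the synthetic environment designed in AttributionLab, a neural network with manually set weights and matching designed data make it known exactly which input features the network uses, so the faithfulness of attribution methods can be checked against them.
Abstract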
Feature attribution explains neural network outputs by identifying relevant input features. How do we know if the identified features are indeed relevant to the network? This notion is referred to as faithfulness, an essential property that reflects the alignment between the identified (attributed) features and the features used by the model. One recent trend to test faithfulness is to design the data such that we know which input features are relevant to the label and then train a model on the designed data. Subsequently, the identified features are evaluated by comparing them with these designed ground truth features. However, this idea has the underlying assumption that the neural network learns to use all and only these designed features, while there is no guarantee that the learning process trains the network in this way. In this paper, we solve this missing link by explicitly designing the neural network by manually setting its weights, along with designing data, so we know precisely which input features in the dataset are relevant to the designed network. Thus, we can test faithfulness in AttributionLab, our designed synthetic environment, which serves as a sanity check and is effective in filtering out attribution methods. If an attribution method is not faithful in a simple controlled environment, it can be unreliable in more complex scenarios. Furthermore, the AttributionLab environment serves as a laboratory for controlled experiments through which we can study feature attribution methods, identify issues, and suggest potential improvements.
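Summary
Feature attribution explains neural network outputs by identifying relevant input features. How can we determine whether the identified features are the ones the neural network actually uses? This notion is called faithfulness, an important property that reflects the correspondence between the attributed features and the features used by the model. One recent trend for testing faithfulness is to design the data so that we know which input features are related to the label, train a model on the designed data, and then compare the identified features with the designed ground-truth features. However, this idea assumes that the model learns to use all and only the designed features, which is not guaranteed. In this paper, we close this missing link. We explicitly design the weights of the neural network together with the data, so we know which input features in the dataset are relevant to the designed network. We can therefore test faithfulness in AttributionLab, our own synthetic environment, which serves as a sanity check that effectively filters out attribution methods. If an attribution method is not faithful in this simple controlled environment, it may be unreliable in more complex scenarios. AttributionLab also serves as a laboratory for controlled experiments, where we can study attribution methods, identify issues, and suggest improvements.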
Self-Supervised Dataset Distillation for Transfer Learning
results: Experiments validate the effectiveness of the method, which also reduces computational cost and yields a closed-form kernel ridge regression solution.
Abstract
Dataset distillation methods have achieved remarkable success in distilling a large dataset into a small set of representative samples. However, they are not designed to produce a distilled dataset that can be effectively used for facilitating self-supervised pre-training. To this end, we propose a novel problem of distilling an unlabeled dataset into a set of small synthetic samples for efficient self-supervised learning (SSL). We first prove that a gradient of synthetic samples with respect to an SSL objective in naive bilevel optimization is biased due to the randomness originating from data augmentations or masking. To address this issue, we propose to minimize the mean squared error (MSE) between a model's representations of the synthetic examples and their corresponding learnable target feature representations for the inner objective, which does not introduce any randomness. Our primary motivation is that the model obtained by the proposed inner optimization can mimic the self-supervised target model. To achieve this, we also introduce the MSE between representations of the inner model and the self-supervised target model on the original full dataset for outer optimization. Lastly, assuming that a feature extractor is fixed, we only optimize a linear head on top of the feature extractor, which allows us to reduce the computational cost and obtain a closed-form solution of the head with kernel ridge regression. We empirically validate the effectiveness of our method on various applications involving transfer learning.
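Summary
Dataset distillation methods have been very successful at distilling a large dataset into a small set of representative samples. However, they are not designed to produce a distilled dataset that can be used effectively for self-supervised pre-training. We therefore propose a new problem: distilling an unlabeled dataset into a small set of synthetic samples for efficient self-supervised learning (SSL). We first prove that, in naive bilevel optimization, the gradient of the synthetic samples with respect to the SSL objective is biased because of the randomness introduced by data augmentations or masking. To address this, we propose using the mean squared error (MSE) as the inner objective, which introduces no randomness. Our main motivation is that the model obtained by the proposed inner optimization can mimic the self-supervised target model. To this end, we also introduce, for the outer optimization, the MSE between the representations of the inner model and of the self-supervised target model on the original full dataset. Finally, assuming a fixed feature extractor, we optimize only a linear head on top of it, which reduces the computational cost and gives a closed-form solution for the head via kernel ridge regression. We empirically validate the effectiveness of our method on various applications involving transfer learning.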
Runway Sign Classifier: A DAL C Certifiable Machine Learning System
paper_authors: Konstantin Dmitriev, Johann Schumann, Islam Bostanov, Mostafa Abdelhamid, Florian Holzapfel
for: This paper aims to address the certification challenges of Machine Learning (ML) based systems for medium criticality airborne applications.
methods: The authors use a Deep Neural Network (DNN) for airport sign detection and classification, and employ an established architectural mitigation technique involving two redundant and dissimilar DNNs. They also use novel ML-specific data management techniques to enhance this approach.
results: The authors demonstrate compliance with Design Assurance Level (DAL) C, which is a more stringent requirement than the DAL D achieved in their previous work.
Abstract
In recent years, the remarkable progress of Machine Learning (ML) technologies within the domain of Artificial Intelligence (AI) systems has presented unprecedented opportunities for the aviation industry, paving the way for further advancements in automation, including the potential for single pilot or fully autonomous operation of large commercial airplanes. However, ML technology faces major incompatibilities with existing airborne certification standards, such as ML model traceability and explainability issues or the inadequacy of traditional coverage metrics. Certification of ML-based airborne systems using current standards is problematic due to these challenges. This paper presents a case study of an airborne system utilizing a Deep Neural Network (DNN) for airport sign detection and classification. Building upon our previous work, which demonstrates compliance with Design Assurance Level (DAL) D, we upgrade the system to meet the more stringent requirements of Design Assurance Level C. To achieve DAL C, we employ an established architectural mitigation technique involving two redundant and dissimilar Deep Neural Networks. The application of novel ML-specific data management techniques further enhances this approach. This work is intended to illustrate how the certification challenges of ML-based systems can be addressed for medium criticality airborne applications.
Summary
This paper presents a case study of an airborne system that utilizes a Deep Neural Network (DNN) for airport sign detection and classification. Building on our previous work, which demonstrated compliance with Design Assurance Level (DAL) D, we upgraded the system to meet the more stringent requirements of DAL C. To achieve DAL C, we employed an established architectural mitigation technique involving two redundant and dissimilar DNNs. Additionally, we applied novel ML-specific data management techniques to enhance this approach. The purpose of this work is to demonstrate how the certification challenges of ML-based systems can be addressed for medium criticality airborne applications. By upgrading the system to meet DAL C requirements, we were able to demonstrate the feasibility of certifying ML-based airborne systems for use in the aviation industry.
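The text names the mitigation (two redundant and dissimilar DNNs) but not how their outputs are combined. One simple possibility is an agreement monitor that passes a label only when both channels agree. The sketch below illustrates that idea under this assumption; the two stand-in classifiers, their thresholds, and the label names are hypothetical and are not the authors' models.

```python
from typing import Callable, Optional, Sequence

Classifier = Callable[[Sequence[float]], str]

def redundant_classify(x: Sequence[float],
                       primary: Classifier,
                       secondary: Classifier) -> Optional[str]:
    """Return the label only when both channels agree; otherwise None."""
    a = primary(x)
    b = secondary(x)
    return a if a == b else None

# Hypothetical stand-ins for two dissimilar networks (different decision rules).
def channel_a(pixels: Sequence[float]) -> str:
    return "hold_short" if sum(pixels) > 1.0 else "taxiway"

def channel_b(pixels: Sequence[float]) -> str:
    return "hold_short" if max(pixels) > 0.6 else "taxiway"

agreed = redundant_classify([0.7, 0.5], channel_a, channel_b)
disputed = redundant_classify([0.4, 0.4, 0.4], channel_a, channel_b)
print(agreed, disputed)
```

Returning `None` on disagreement trades availability for integrity: the system withholds an answer rather than emitting one that a single channel could have gotten wrong.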
Variance Reduced Online Gradient Descent for Kernelized Pairwise Learning with Limited Memory
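results: Our theoretical analysis shows that variance-reduced online gradients lead to an improved sublinear regret bound. Experiments show that our algorithm outperforms both kernelized and linear online pairwise learning algorithms on real-world datasets.
Abstract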
Pairwise learning is essential in machine learning, especially for problems involving loss functions defined on pairs of training examples. Online gradient descent (OGD) algorithms have been proposed to handle online pairwise learning, where data arrives sequentially. However, the pairwise nature of the problem makes scalability challenging, as the gradient computation for a new sample involves all past samples. Recent advancements in OGD algorithms have aimed to reduce the complexity of calculating online gradients, achieving complexities less than $O(T)$ and even as low as $O(1)$. However, these approaches are primarily limited to linear models and have induced variance. In this study, we propose a limited memory OGD algorithm that extends to kernel online pairwise learning while improving the sublinear regret. Specifically, we establish a clear connection between the variance of online gradients and the regret, and construct online gradients using the most recent stratified samples with a limited buffer of size $s$ representing all past data. The resulting algorithm has a complexity of $O(sT)$ and employs $O(\sqrt{T}\log{T})$ random Fourier features for kernel approximation. Importantly, our theoretical results demonstrate that the variance-reduced online gradients lead to an improved sublinear regret bound. The experiments on real-world datasets demonstrate the superiority of our algorithm over both kernelized and linear online pairwise learning algorithms.
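The random Fourier feature approximation mentioned in the abstract can be sketched as follows for a Gaussian kernel $k(x, y) = \exp(-\gamma \lVert x - y \rVert^2)$. This shows only the kernel approximation, not the paper's buffered OGD algorithm; the choice of a Gaussian kernel and the names `make_rff`, `dot`, and `gamma` are assumptions made for illustration.

```python
import math
import random

def make_rff(dim, num_features, gamma, seed=0):
    """Build a feature map z with dot(z(x), z(y)) ~ exp(-gamma * ||x - y||^2)."""
    rng = random.Random(seed)
    # Frequencies are drawn from N(0, 2*gamma*I), the spectral density of the
    # Gaussian kernel; phases are uniform on [0, 2*pi).
    scale = math.sqrt(2.0 * gamma)
    weights = [[rng.gauss(0.0, scale) for _ in range(dim)]
               for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    norm = math.sqrt(2.0 / num_features)

    def z(x):
        return [norm * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(weights, phases)]
    return z

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

gamma = 0.5
z = make_rff(dim=2, num_features=4000, gamma=gamma, seed=0)
x, y = [0.0, 0.0], [1.0, 0.0]
approx = dot(z(x), z(y))
exact = math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
print(approx, exact)
```

With explicit features, a kernel model becomes a linear model in `z(x)`, so each gradient step costs time proportional to the number of features rather than to the number of past samples.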
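Summary
Pairwise learning is essential in machine learning for problems whose loss functions are defined on pairs of training examples. Online gradient descent (OGD) algorithms have been proposed for online pairwise learning, where data arrives sequentially. However, the pairwise nature of the problem makes scalability difficult, because computing the gradient for a new sample involves all past samples. Recent advances in OGD algorithms have aimed to reduce the complexity of calculating online gradients, achieving complexities less than $O(T)$ and even as low as $O(1)$. However, these approaches are primarily limited to linear models and have induced variance. In this study, we propose a limited memory OGD algorithm that extends to kernel online pairwise learning and improves the sublinear regret. Specifically, we establish the relationship between the variance of online gradients and the regret, and construct online gradients from the most recent stratified samples held in a buffer of size $s$ that represents all past data, with a complexity of $O(sT)$. We also use $O(\sqrt{T}\log{T})$ random Fourier features to approximate the kernel. Our theoretical results show that variance-reduced online gradients lead to an improved sublinear regret bound. Experiments show that our algorithm outperforms both kernelized and linear online pairwise learning algorithms on real-world datasets.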
An improved CTGAN for data processing method of imbalanced disk failure
for: solves the problem of insufficient failure data and imbalance between normal and failure data in disk failure diagnosis.
methods: uses an improved Conditional Tabular Generative Adversarial Networks (CTGAN) with a residual network and a classifier for specific category discrimination, as well as a discriminator based on residual network.
results: The experimental results show that data synthesized by RCTGAN can further improve the fault diagnosis accuracy of the classifier.
Abstract
Disks generate insufficient failure data, and the numbers of normal and failure samples are imbalanced. Existing Conditional Tabular Generative Adversarial Networks (CTGAN) deep learning methods have proven effective on imbalanced disk failure data, but CTGAN cannot learn the internal information of disk failure data very well. In this paper, we propose a fault diagnosis method based on an improved CTGAN, in which a classifier for specific category discrimination is added and the discriminator of the generative adversarial network is based on a residual network. We name it Residual Conditional Tabular Generative Adversarial Networks (RCTGAN). First, a residual network is used to enhance the stability of the system. RCTGAN uses a small amount of real failure data to synthesize fake fault data. Then, the synthesized data is mixed with the real data to balance the amount of normal and failure data. Finally, four classifier models (multilayer perceptron, support vector machine, decision tree, random forest) are trained using the balanced data set, and the performance of the models is evaluated using G-mean. The experimental results show that the data synthesized by the RCTGAN can further improve the fault diagnosis accuracy of the classifier.
Summary
First, a residual network is used to enhance the stability of the system. RCTGAN uses a small amount of real failure data to synthesize fake fault data, and the synthesized data is then mixed with the real data to balance the amount of normal and failure data. Finally, four classifier models (multilayer perceptron, support vector machine, decision tree, random forest) are trained using the balanced data set, and the performance of the models is evaluated using G-mean. The experimental results show that the data synthesized by the RCTGAN can further improve the fault diagnosis accuracy of the classifier.
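The G-mean used for evaluation is, in its usual binary form, the geometric mean of sensitivity (recall on the failure class) and specificity (recall on the normal class). A minimal sketch with made-up labels follows; the function name `g_mean` and the encoding of failures as `1` are choices made here for illustration.

```python
import math

def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for binary labels."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)

# Made-up imbalanced example: 2 failing disks (1) among 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]   # finds one of the two failures
always_healthy = [0] * 10                  # never predicts a failure
print(g_mean(y_true, y_pred), g_mean(y_true, always_healthy))
```

A classifier that always predicts "healthy" reaches 80% accuracy on this example but a G-mean of 0, which is why the metric suits imbalanced failure data.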
Asynchronous Federated Learning with Incentive Mechanism Based on Contract Theory
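results: In experiments on the MNIST dataset, test accuracy is 3.12% and 5.84% higher than that of FedAvg and FedProx, respectively, without attacks, and 1.35% higher than that of the ideal Local SGD under attacks. In addition, for the same target accuracy, the framework requires less computation time.
Abstract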
To address the challenges posed by the heterogeneity inherent in federated learning (FL) and to attract high-quality clients, various incentive mechanisms have been employed. However, existing incentive mechanisms are typically utilized in conventional synchronous aggregation, resulting in significant straggler issues. In this study, we propose a novel asynchronous FL framework that integrates an incentive mechanism based on contract theory. Within the incentive mechanism, we strive to maximize the utility of the task publisher by adaptively adjusting clients' local model training epochs, taking into account factors such as time delay and test accuracy. In the asynchronous scheme, considering client quality, we devise aggregation weights and an access control algorithm to facilitate asynchronous aggregation. Through experiments conducted on the MNIST dataset, the simulation results demonstrate that the test accuracy achieved by our framework is 3.12% and 5.84% higher than that achieved by FedAvg and FedProx without any attacks, respectively. The framework exhibits a 1.35% accuracy improvement over the ideal Local SGD under attacks. Furthermore, aiming for the same target accuracy, our framework demands notably less computation time than both FedAvg and FedProx.
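Summary
Various incentive mechanisms have been used to address the challenges of heterogeneity in federated learning (FL) and to attract high-quality clients. However, existing incentive mechanisms are typically used with synchronous aggregation, which causes significant straggler issues. In this study, we propose a novel asynchronous FL framework that integrates an incentive mechanism based on contract theory. Within the incentive mechanism, we aim to maximize the utility of the task publisher by adjusting the clients' local model training epochs, taking into account factors such as time delay and test accuracy. In the asynchronous scheme, we take client quality into account and design aggregation weights and an access control algorithm to enable asynchronous aggregation. Experiments on the MNIST dataset show that the test accuracy of our framework is 3.12% and 5.84% higher than that of FedAvg and FedProx, respectively, without attacks. Under attacks, the framework improves accuracy by 1.35% over the ideal Local SGD. Furthermore, for the same target accuracy, our framework needs less computation time than FedAvg and FedProx.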
for: This paper is written for developers who need to continually update or correct machine learning models to ensure high prediction accuracy, particularly in complex systems or software.
methods: The paper proposes a correction rule mining approach to acquire a comprehensive list of rules that describe inaccurate subpopulations and how to correct them. The proposed algorithm combines frequent itemset mining and a unique pruning technique for correction rules.
results: The proposed algorithm discovered various rules that help collect insufficiently learned data, directly correct model outputs, and analyze concept drift.
Abstract
Machine learning models need to be continually updated or corrected to ensure that the prediction accuracy remains consistently high. In this study, we consider scenarios where developers should be careful about how model correction changes the prediction results, such as when the model is part of a complex system or software. In such scenarios, the developers want to control the specification of the corrections. To achieve this, the developers need to understand which subpopulations of the inputs get inaccurate predictions from the model. Therefore, we propose correction rule mining to acquire a comprehensive list of rules that describe inaccurate subpopulations and how to correct them. We also develop an efficient correction rule mining algorithm that combines frequent itemset mining with a unique pruning technique for correction rules. We observed that the proposed algorithm found various rules that help to collect insufficiently learned data, directly correct model outputs, and analyze concept drift.
Deep reinforcement learning uncovers processes for separating azeotropic mixtures without prior knowledge
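results: The paper demonstrates a flowsheet design method that learns autonomously and applies to multiple chemical systems, and that can separate more than 99% of the materials into pure components. This shows the agent's planning flexibility and reliability.
Abstract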
Process synthesis in chemical engineering is a complex planning problem due to vast search spaces, continuous parameters and the need for generalization. Deep reinforcement learning agents, trained without prior knowledge, have shown to outperform humans in various complex planning problems in recent years. Existing work on reinforcement learning for flowsheet synthesis shows promising concepts, but focuses on narrow problems in a single chemical system, limiting its practicality. We present a general deep reinforcement learning approach for flowsheet synthesis. We demonstrate the adaptability of a single agent to the general task of separating binary azeotropic mixtures. Without prior knowledge, it learns to craft near-optimal flowsheets for multiple chemical systems, considering different feed compositions and conceptual approaches. On average, the agent can separate more than 99% of the involved materials into pure components, while autonomously learning fundamental process engineering paradigms. This highlights the agent's planning flexibility, an encouraging step toward true generality.
Adversarial Robustness in Graph Neural Networks: A Hamiltonian Approach
paper_authors: Kai Zhao, Qiyu Kang, Yang Song, Rui She, Sijie Wang, Wee Peng Tay
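for: This study investigates the adversarial robustness of graph neural networks (GNNs) derived from diverse neural flows, in particular their connection to stability notions such as BIBO stability, Lyapunov stability, structural stability, and conservative stability.
methods: The paper proposes physics-inspired conservative Hamiltonian neural flows for constructing GNNs that are robust to adversarial attacks, and runs experiments on several benchmark datasets under a variety of adversarial attacks to evaluate the robustness of different neural flow GNNs.
results: The experiments show that GNNs based on conservative Hamiltonian neural flows are substantially more robust under adversarial attacks, whereas Lyapunov stability does not necessarily guarantee adversarial robustness.
Abstract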
Graph neural networks (GNNs) are vulnerable to adversarial perturbations, including those that affect both node features and graph topology. This paper investigates GNNs derived from diverse neural flows, concentrating on their connection to various stability notions such as BIBO stability, Lyapunov stability, structural stability, and conservative stability. We argue that Lyapunov stability, despite its common use, does not necessarily ensure adversarial robustness. Inspired by physics principles, we advocate for the use of conservative Hamiltonian neural flows to construct GNNs that are robust to adversarial attacks. The adversarial robustness of different neural flow GNNs is empirically compared on several benchmark datasets under a variety of adversarial attacks. Extensive numerical experiments demonstrate that GNNs leveraging conservative Hamiltonian flows with Lyapunov stability substantially improve robustness against adversarial perturbations. The implementation code of experiments is available at https://github.com/zknus/NeurIPS-2023-HANG-Robustness.
Summary
Graph neural networks (GNNs) are vulnerable to adversarial perturbations, including those that affect node features and graph topology. This paper studies GNNs derived from diverse neural flows and their connection to stability notions, in particular BIBO stability, Lyapunov stability, structural stability, and conservative stability. We argue that Lyapunov stability, although commonly used, does not necessarily ensure adversarial robustness. Inspired by physics principles, we advocate using conservative Hamiltonian neural flows to build GNNs that are more robust to adversarial perturbations. The adversarial robustness of different neural flow GNNs is compared empirically on several benchmark datasets. The experiments show that leveraging conservative Hamiltonian flows with Lyapunov stability can significantly improve the robustness of GNNs against adversarial attacks. The experiment code is available at https://github.com/zknus/NeurIPS-2023-HANG-Robustness.
Harnessing Administrative Data Inventories to Create a Reliable Transnational Reference Database for Crop Type Monitoring
results: The study produced EuroCrops, a reference dataset that can be used for crop type classification.
Abstract
Leaps in machine learning techniques and their application to Earth observation challenges have unlocked unprecedented performance across the domain. While the further development of these methods was previously limited by the availability and volume of sensor data and computing resources, the lack of adequate reference data now constitutes a new bottleneck. Since creating such ground-truth information is an expensive and error-prone task, new ways must be devised to source reliable, high-quality reference data on large scales. As an example, we showcase EuroCrops, a reference dataset for crop type classification that aggregates and harmonizes administrative data surveyed in different countries with the goal of transnational interoperability.
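Summary
Leaps in machine learning techniques and their application to Earth observation challenges have unlocked unprecedented performance across the domain. The further development of these methods was previously limited by the availability of sensor data and computing resources. Now, because creating ground-truth information is an expensive and error-prone task, new ways must be devised to obtain reliable, high-quality reference data. As an example, we present EuroCrops, a reference dataset for crop type classification that aggregates and harmonizes administrative data surveyed in different countries for transnational interoperability.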
CAST: Cluster-Aware Self-Training for Tabular Data
results: Extensive empirical evaluations on up to 20 real-world datasets show that CAST not only performs better but is also robust across different self-training setups.
Abstract
Self-training has gained attraction because of its simplicity and versatility, yet it is vulnerable to noisy pseudo-labels. Several studies have proposed successful approaches to tackle this issue, but they have diminished the advantages of self-training because they require specific modifications in self-training algorithms or model architectures. Furthermore, most of them are incompatible with gradient boosting decision trees, which dominate the tabular domain. To address this, we revisit the cluster assumption, which states that data samples that are close to each other tend to belong to the same class. Inspired by the assumption, we propose Cluster-Aware Self-Training (CAST) for tabular data. CAST is a simple and universally adaptable approach for enhancing existing self-training algorithms without significant modifications. Concretely, our method regularizes the confidence of the classifier, which represents the value of the pseudo-label, forcing the pseudo-labels in low-density regions to have lower confidence by leveraging prior knowledge for each class within the training data. Extensive empirical evaluations on up to 20 real-world datasets confirm not only the superior performance of CAST but also its robustness in various setups in self-training contexts.
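Summary
Self-training has attracted wide attention because of its simplicity and versatility, but it is vulnerable to noisy pseudo-labels. Many studies have proposed successful ways to tackle this issue, but they diminish the advantages of self-training because they require specific modifications to self-training algorithms or model architectures. Moreover, most of them are incompatible with gradient boosting decision trees, which dominate the tabular domain. To address this, we revisit the cluster assumption, which states that data samples close to each other tend to belong to the same class. Inspired by this assumption, we propose Cluster-Aware Self-Training (CAST), a simple and universally adaptable approach that enhances existing self-training algorithms without significant modifications. Concretely, our method regularizes the confidence of the classifier, which represents the value of the pseudo-label, so that pseudo-labels in low-density regions have lower confidence, by leveraging prior knowledge of each class within the training data. Extensive empirical evaluations on up to 20 real-world datasets confirm both the superior performance of CAST and its robustness across various self-training setups.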
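The abstract does not give CAST's regularizer, so the following is only one hypothetical way to lower pseudo-label confidence in low-density regions. It assumes a single numeric feature, fits one Gaussian per class on the labelled data as the "prior knowledge", and scales the classifier's confidence by the density relative to the class mode. All names and numbers are illustrative.

```python
import math

def fit_class_gaussians(xs, ys):
    """Per-class (mean, std) of a 1-D feature, estimated from labelled data."""
    stats = {}
    for label in set(ys):
        vals = [x for x, y in zip(xs, ys) if y == label]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        stats[label] = (mean, max(math.sqrt(var), 1e-9))
    return stats

def regularized_confidence(x, label, confidence, stats):
    """Scale the classifier confidence by the class-conditional density at x."""
    mean, std = stats[label]
    # Density relative to the class mode: 1 at the mean, near 0 far from it.
    rel_density = math.exp(-0.5 * ((x - mean) / std) ** 2)
    return confidence * rel_density

# Made-up labelled data: class 0 clusters near 1, class 1 clusters near 11.
xs = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
ys = [0, 0, 0, 1, 1, 1]
stats = fit_class_gaussians(xs, ys)

dense = regularized_confidence(1.0, 0, 0.9, stats)    # inside the class-0 cluster
sparse = regularized_confidence(5.0, 0, 0.9, stats)   # between the two clusters
print(dense, sparse)
```

Because only the confidence value changes, a threshold-based self-training loop can consume the regularized value without modifying the underlying classifier, which is consistent with the claim that existing algorithms need no significant changes.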
Initialization Bias of Fourier Neural Operator: Revisiting the Edge of Chaos
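results: Proposes an FNO initialization method based on the He initialization scheme that mitigates the initialization bias of FNO. Experiments show that it enables stable training of a 32-layer FNO without additional techniques or significant performance degradation.
Abstract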
This paper investigates the initialization bias of the Fourier neural operator (FNO). A mean-field theory for FNO is established, analyzing the behavior of the random FNO from an ``edge of chaos'' perspective. We uncover that the forward and backward propagation behaviors exhibit characteristics unique to FNO, induced by mode truncation, while also showcasing similarities to those of densely connected networks. Building upon this observation, we also propose a FNO version of the He initialization scheme to mitigate the negative initialization bias leading to training instability. Experimental results demonstrate the effectiveness of our initialization scheme, enabling stable training of a 32-layer FNO without the need for additional techniques or significant performance degradation.
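Summary
This paper studies the initialization bias of the Fourier neural operator (FNO). A mean-field theory is established that analyzes the behavior of the random FNO from an "edge of chaos" perspective. The study finds that the forward and backward propagation of FNO shows characteristics unique to FNO, induced by mode truncation, while also resembling those of densely connected networks. Building on this observation, we propose an FNO version of the He initialization scheme to mitigate the initialization bias, which enables stable training of a 32-layer FNO without additional techniques or significant performance degradation.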
Partition-based differentially private synthetic data generation
results: Experiments show that the method outperforms previous approaches, producing higher-quality synthetic data even with a limited privacy budget.
Abstract
Private synthetic data sharing is preferred as it keeps the distribution and nuances of original data compared to summary statistics. The state-of-the-art methods adopt a select-measure-generate paradigm, but measuring large domain marginals still results in much error and allocating privacy budget iteratively is still difficult. To address these issues, our method employs a partition-based approach that effectively reduces errors and improves the quality of synthetic data, even with a limited privacy budget. Results from our experiments demonstrate the superiority of our method over existing approaches. The synthetic data produced using our approach exhibits improved quality and utility, making it a preferable choice for private synthetic data sharing.
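Summary
Private synthetic data sharing is preferred because it preserves the distribution and nuances of the original data, unlike summary statistics alone. State-of-the-art methods follow a select-measure-generate paradigm, but measuring large-domain marginals still introduces substantial error, and allocating the privacy budget iteratively remains difficult. To address these issues, our method uses a partition-based approach that effectively reduces error and improves the quality of the synthetic data, even with a limited privacy budget. Our experimental results show a clear advantage over existing approaches. Synthetic data generated with our method has higher quality and utility, making it a preferable choice for private synthetic data sharing.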
DrugCLIP: Contrastive Protein-Molecule Representation Learning for Virtual Screening
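results: Experiments show that DrugCLIP makes efficient predictions across diverse virtual screening tasks, especially in the zero-shot setting, and compares favorably with traditional docking and supervised learning methods.
Abstract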
Virtual screening, which identifies potential drugs from vast compound databases to bind with a particular protein pocket, is a critical step in AI-assisted drug discovery. Traditional docking methods are highly time-consuming, and can only work with a restricted search library in real-life applications. Recent supervised learning approaches using scoring functions for binding-affinity prediction, although promising, have not yet surpassed docking methods due to their strong dependency on limited data with reliable binding-affinity labels. In this paper, we propose a novel contrastive learning framework, DrugCLIP, by reformulating virtual screening as a dense retrieval task and employing contrastive learning to align representations of binding protein pockets and molecules from a large quantity of pairwise data without explicit binding-affinity scores. We also introduce a biological-knowledge inspired data augmentation strategy to learn better protein-molecule representations. Extensive experiments show that DrugCLIP significantly outperforms traditional docking and supervised learning methods on diverse virtual screening benchmarks with highly reduced computation time, especially in zero-shot setting.
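Summary
Virtual screening, which identifies potential drugs from vast compound libraries, is a key step in AI-assisted drug discovery. Traditional docking methods are time-consuming and, in practice, can only work with a restricted search library. Recent supervised learning methods that predict binding affinity with scoring functions are promising but remain dependent on limited data with reliable binding-affinity labels. In this paper we propose a new contrastive learning framework, DrugCLIP, which reformulates virtual screening as a dense retrieval task and uses contrastive learning to align the representations of binding protein pockets and molecules. We also propose a data augmentation strategy based on biological knowledge to learn better protein-molecule representations. Extensive experiments show that DrugCLIP performs strongly on a variety of virtual screening benchmarks, especially in the zero-shot setting.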
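The idea of treating virtual screening as dense retrieval with contrastive alignment can be sketched as follows. This is a generic symmetric InfoNCE loss over toy embedding vectors, not DrugCLIP's actual encoders or training code; the function names, the temperature of 0.07, and the assumption that `pockets[i]` and `molecules[i]` form the positive pair are all illustrative.

```python
import math

def normalize(v):
    """Return v scaled to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(a * a for a in v)) or 1.0
    return [a / norm for a in v]

def info_nce(pockets, molecules, temperature=0.07):
    """Symmetric InfoNCE loss; row i of each list is assumed to be a positive pair."""
    p = [normalize(v) for v in pockets]
    m = [normalize(v) for v in molecules]
    n = len(p)
    # Temperature-scaled cosine similarity matrix (pockets x molecules).
    sim = [
        [sum(a * b for a, b in zip(p[i], m[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy_diag(rows):
        # Mean cross-entropy where the correct "class" for row i is column i.
        total = 0.0
        for i, row in enumerate(rows):
            mx = max(row)
            log_z = mx + math.log(sum(math.exp(s - mx) for s in row))
            total += log_z - row[i]
        return total / len(rows)

    cols = [[sim[i][j] for i in range(n)] for j in range(n)]
    return 0.5 * (cross_entropy_diag(sim) + cross_entropy_diag(cols))

def rank_molecules(pocket, library):
    """Dense retrieval: indices of library molecules sorted by cosine similarity."""
    q = normalize(pocket)
    scores = [sum(a * b for a, b in zip(q, normalize(mol))) for mol in library]
    return sorted(range(len(library)), key=lambda i: -scores[i])

aligned = info_nce([[1, 0], [0, 1]], [[1, 0], [0, 1]])
shuffled = info_nce([[1, 0], [0, 1]], [[0, 1], [1, 0]])
ranking = rank_molecules([1, 0], [[0, 1], [1, 0.1], [0.5, 0.5]])
print(aligned, shuffled, ranking)
```

At screening time only `rank_molecules` is needed: molecule embeddings can be computed once and reused for every query pocket, which is where a retrieval formulation saves computation relative to per-pair docking.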
Core-Intermediate-Peripheral Index: Factor Analysis of Neighborhood and Shortest Paths-based Centrality Metrics
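methods: The study applies factor analysis (varimax-based rotation of the Eigenvectors) to the transpose of the centrality-metrics data matrix, under the hypothesis that two factors (core and peripheral) drive the nodes' centrality values.
results: The approach is tested on 12 complex real-world networks. The summary reports that the CIP Index captures how core or peripheral a node is and can be used to assess this for different types of nodes.
Abstract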
We perform factor analysis on the raw data of the four major neighborhood and shortest paths-based centrality metrics (Degree, Eigenvector, Betweenness and Closeness) and propose a novel quantitative measure called the Core-Intermediate-Peripheral (CIP) Index to capture the extent to which a node could play the role of a core node (nodes at the center of a network with larger values for any centrality metric) vis-a-vis a peripheral node (nodes that exist at the periphery of a network with lower values for any centrality metric). We conduct factor analysis (varimax-based rotation of the Eigenvectors) on the transpose matrix of the raw centrality metrics dataset, with the node ids as features, under the hypothesis that there are two factors (core and peripheral) that drive the values incurred by the nodes with respect to the centrality metrics. We test our approach on a diverse suite of 12 complex real-world networks.
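Summary
We perform factor analysis on the raw data and propose a new quantitative measure, the Core-Intermediate-Peripheral (CIP) Index, to capture the extent to which a node plays the role of a core node (a node at the center of a network, with larger values for any centrality metric) as opposed to a peripheral node (a node at the periphery of a network, with lower values for any centrality metric). We apply factor analysis (varimax-based rotation of the Eigenvectors) to the transpose of the centrality-metrics data matrix, under the hypothesis that two factors (core and peripheral) drive the nodes' centrality values. We test the approach on 12 diverse real-world networks.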
Boosting Continuous Control with Consistency Policy
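results: Experiments show that CPQL improves the policy efficiently, achieves new state-of-the-art performance on 11 offline and 21 online tasks, and speeds up inference by nearly 45 times compared to Diffusion-QL.
Abstract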
Due to its training stability and strong expression, the diffusion model has attracted considerable attention in offline reinforcement learning. However, several challenges have also come with it: 1) The demand for a large number of diffusion steps makes the diffusion-model-based methods time inefficient and limits their applications in real-time control; 2) How to achieve policy improvement with accurate guidance for diffusion model-based policy is still an open problem. Inspired by the consistency model, we propose a novel time-efficiency method named Consistency Policy with Q-Learning (CPQL), which derives action from noise by a single step. By establishing a mapping from the reverse diffusion trajectories to the desired policy, we simultaneously address the issues of time efficiency and inaccurate guidance when updating diffusion model-based policy with the learned Q-function. We demonstrate that CPQL can achieve policy improvement with accurate guidance for offline reinforcement learning, and can be seamlessly extended for online RL tasks. Experimental results indicate that CPQL achieves new state-of-the-art performance on 11 offline and 21 online tasks, significantly improving inference speed by nearly 45 times compared to Diffusion-QL. We will release our code later.
Federated Learning with Reduced Information Leakage and Computation
paper_authors: Tongxin Yin, Xueru Zhang, Mohammad Mahdi Khalili, Mingyan Liu
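for: This paper proposes a privacy-preserving federated learning mechanism in which multiple decentralized clients collaboratively learn a common model without directly exposing local data.
methods: The paper applies a first-order approximation at every even iteration, so that half of the FL updates incur no information leakage and require much less computation on the client side.
results: Experiments show that Upcycled-FL achieves higher accuracy on heterogeneous client data, with a better privacy-accuracy trade-off and shorter training. On average, it reduces training time by 48%.
Abstract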
Federated learning (FL) is a distributed learning paradigm that allows multiple decentralized clients to collaboratively learn a common model without sharing local data. Although local data is not exposed directly, privacy concerns nonetheless exist as clients' sensitive information can be inferred from intermediate computations. Moreover, such information leakage accumulates substantially over time as the same data is repeatedly used during the iterative learning process. As a result, it can be particularly difficult to balance the privacy-accuracy trade-off when designing privacy-preserving FL algorithms. In this paper, we introduce Upcycled-FL, a novel federated learning framework with first-order approximation applied at every even iteration. Under this framework, half of the FL updates incur no information leakage and require much less computation. We first conduct the theoretical analysis on the convergence (rate) of Upcycled-FL, and then apply perturbation mechanisms to preserve privacy. Experiments on real-world data show that Upcycled-FL consistently outperforms existing methods over heterogeneous data, and significantly improves privacy-accuracy trade-off while reducing 48% of the training time on average.
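Summary
Federated learning (FL) is a distributed learning paradigm that lets multiple decentralized clients collaboratively learn a common model without directly sharing local data. Although local data is not exposed directly, privacy concerns remain, because clients' sensitive information can be inferred from intermediate computations. This leakage also accumulates as the same data is reused during training, which makes the privacy-accuracy trade-off especially hard to balance when designing privacy-preserving FL algorithms. In this paper we introduce Upcycled-FL, a new federated learning framework that applies a first-order approximation at every even iteration. Under this framework, half of the FL updates cause no information leakage and require much less computation. We first analyze the convergence of Upcycled-FL theoretically and then apply perturbation mechanisms to preserve privacy. Experiments on real-world data show that Upcycled-FL works well on heterogeneous data, significantly improves the privacy-accuracy trade-off, and reduces training time by 48% on average.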
Automatic nodule identification and differentiation in ultrasound videos to facilitate per-nodule examination
paper_authors: Siyuan Jiang, Yan Ding, Yuling Wang, Lei Xu, Wenli Dai, Wanru Chang, Jianfeng Zhang, Jie Yu, Jianqiao Zhou, Chunquan Zhang, Ping Liang, Dexing Kong
for: This paper aims to address the problem of identifying and differentiating nodules in breast ultrasound videos, which is a challenging task due to the heterogeneous appearances of nodules in different cross-sectional views.
methods: The authors collected hundreds of breast ultrasound videos and built a nodule reidentification system that consists of two parts: an extractor based on a deep learning model and a real-time clustering algorithm.
results: The system obtained satisfactory results and was able to differentiate ultrasound videos. This is the first attempt to apply re-identification techniques in the ultrasonic field.
Abstract
Ultrasound is a vital diagnostic technique in health screening. It is non-invasive, cost-effective, and radiation-free, and is therefore widely applied in the diagnosis of nodules. However, it relies heavily on the expertise and clinical experience of the sonographer. In ultrasound images, a single nodule might present heterogeneous appearances in different cross-sectional views, which makes it hard to perform per-nodule examination. Sonographers usually discriminate between nodules by examining the nodule features and surrounding structures such as glands and ducts, which is cumbersome and time-consuming. To address this problem, we collected hundreds of breast ultrasound videos and built a nodule re-identification system that consists of two parts: an extractor based on a deep learning model that extracts feature vectors from the input video clips, and a real-time clustering algorithm that automatically groups feature vectors by nodule. The system obtains satisfactory results and is able to differentiate ultrasound videos. As far as we know, this is the first attempt to apply re-identification techniques in the ultrasonic field.
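Summary
Ultrasound is an important diagnostic technique in modern medicine. It is non-invasive, low-cost, and radiation-free, and is therefore widely used in nodule diagnosis. However, it depends heavily on the expertise and clinical experience of the sonographer. In ultrasound images, a single nodule can look different across views, which makes per-nodule examination difficult. Sonographers usually tell nodules apart by assessing nodule features and surrounding structures such as glands and ducts, which is laborious and time-consuming. To address this problem, we collected hundreds of breast ultrasound videos and built a nodule re-identification system with two parts: a feature extractor based on a deep learning model, which extracts feature vectors from input video clips, and a real-time clustering algorithm, which automatically groups the feature vectors by nodule. The system achieves satisfactory results and can differentiate ultrasound videos. To our knowledge, this is the first application of re-identification techniques in the ultrasound field.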
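The second stage of the system, grouping feature vectors by nodule in real time, can be sketched with a simple online clustering rule. The abstract does not name the clustering algorithm, so this is only one plausible reading: each incoming clip feature joins the most similar existing cluster if its cosine similarity exceeds a threshold, and otherwise starts a new cluster. The class name `OnlineClusterer` and the threshold of 0.8 are assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class OnlineClusterer:
    """Assigns each incoming feature vector to a cluster as it arrives."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # hypothetical similarity cut-off
        self.centroids = []         # running mean feature per cluster
        self.counts = []            # number of vectors per cluster

    def assign(self, feature):
        best, best_sim = -1, -1.0
        for idx, centroid in enumerate(self.centroids):
            sim = cosine(feature, centroid)
            if sim > best_sim:
                best, best_sim = idx, sim
        if best >= 0 and best_sim >= self.threshold:
            # Update the running mean of the matched cluster.
            n = self.counts[best]
            self.centroids[best] = [
                (c * n + f) / (n + 1) for c, f in zip(self.centroids[best], feature)
            ]
            self.counts[best] = n + 1
            return best
        # No sufficiently similar cluster: treat this as a new nodule.
        self.centroids.append(list(feature))
        self.counts.append(1)
        return len(self.centroids) - 1

# Toy features standing in for extractor outputs from five video clips.
clips = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.95, 0.05], [0.0, 0.9]]
clusterer = OnlineClusterer(threshold=0.8)
labels = [clusterer.assign(f) for f in clips]
print(labels)
```

Each assignment costs one pass over the current centroids, so the rule can keep up with a video stream as long as the number of distinct nodules stays small.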
Learning bounded-degree polytrees with known skeleton
paper_authors: Davin Choo, Joy Qiping Yang, Arnab Bhattacharyya, Clément L. Canonne
for: efficient proper learning of bounded-degree polytrees
methods: polynomial-time algorithm and information-theoretic sample complexity lower bound
results: finite-sample guarantees for learning $d$-polytrees in polynomial time and sample complexity for any bounded $d$ when the underlying undirected graph (skeleton) is known
Abstract
We establish finite-sample guarantees for efficient proper learning of bounded-degree polytrees, a rich class of high-dimensional probability distributions and a subclass of Bayesian networks, a widely-studied type of graphical model. Recently, Bhattacharyya et al. (2021) obtained finite-sample guarantees for recovering tree-structured Bayesian networks, i.e., 1-polytrees. We extend their results by providing an efficient algorithm which learns $d$-polytrees in polynomial time and sample complexity for any bounded $d$ when the underlying undirected graph (skeleton) is known. We complement our algorithm with an information-theoretic sample complexity lower bound, showing that the dependence on the dimension and target accuracy parameters are nearly tight.
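Summary
We establish finite-sample guarantees for efficient proper learning of bounded-degree polytrees, a rich class of high-dimensional probability distributions and a subclass of Bayesian networks, a widely studied type of graphical model. Bhattacharyya et al. (2021) obtained finite-sample guarantees for recovering tree-structured Bayesian networks. We extend their results with an efficient algorithm that learns $d$-polytrees in polynomial time and sample complexity for any bounded $d$ when the underlying undirected graph (skeleton) is known. We complement it with an information-theoretic sample complexity lower bound, showing that the dependence on the dimension and target accuracy parameters is nearly tight.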
Exploit the antenna response consistency to define the alignment criteria for CSI data
results: Experiments confirm the effectiveness of ARC, which improves the performance of self-supervised learning for WiFi-based HAR.
Abstract
Self-supervised learning (SSL) for WiFi-based human activity recognition (HAR) holds great promise due to its ability to address the challenge of insufficient labeled data. However, directly transplanting SSL algorithms, especially contrastive learning, originally designed for other domains to CSI data often fails to achieve the expected performance. We attribute this issue to inappropriate alignment criteria, which disrupt the semantic distance consistency between the feature space and the input space. To address this challenge, we introduce Antenna Response Consistency (ARC) as a solution to define proper alignment criteria. ARC is designed to retain semantic information from the input space while introducing robustness to real-world noise. We analyze ARC from the perspective of CSI data structure, demonstrating that its optimal solution leads to a direct mapping from input CSI data to action vectors in the feature map. Furthermore, we provide extensive experimental evidence to validate the effectiveness of ARC in improving the performance of self-supervised learning for WiFi-based HAR.
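Summary
Self-supervised learning (SSL) for WiFi-based human activity recognition (HAR) is highly promising because it addresses the shortage of labeled data. However, directly applying SSL algorithms from other domains, especially contrastive learning, to CSI data often fails to reach the expected performance. We attribute this to inappropriate alignment criteria, which disrupt the consistency of semantic distance between the feature space and the input space. To address this challenge, we introduce Antenna Response Consistency (ARC) as a way to define proper alignment criteria. ARC retains semantic information from the input space while adding robustness to real-world noise. We analyze ARC from the perspective of the CSI data structure and show that its optimal solution maps input CSI data directly to action vectors in the feature map. We also provide extensive experimental evidence that ARC improves self-supervised learning for WiFi-based HAR.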
Transfer learning-based physics-informed convolutional neural network for simulating flow in porous media with time-varying controls
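results: The model accurately predicts oil pressure and water saturation at every timestep and can be trained quickly with transfer learning. Its computational efficiency and accuracy are evaluated for different reservoir grid blocks and compare well against numerical methods.
Abstract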
A physics-informed convolutional neural network (PICNN) is proposed to simulate two-phase flow in porous media with time-varying well controls. While most PICNNs in the existing literature learn a parameter-to-state mapping, our proposed network parameterizes the solution with time-varying controls to establish a control-to-state regression. First, a finite volume scheme is adopted to discretize the flow equations and formulate a loss function that respects mass conservation laws. Neumann boundary conditions are seamlessly incorporated into the semi-discretized equations, so no additional loss term is needed. The network architecture comprises two parallel U-Net structures, with the well controls as network inputs and the system states as outputs. To capture the time-dependent relationship between inputs and outputs, the network is designed to mimic discretized state space equations. We train the network progressively for every timestep, enabling it to simultaneously predict oil pressure and water saturation at each timestep. After training the network for one timestep, we leverage transfer learning techniques to expedite the training process for subsequent timesteps. The proposed model is used to simulate oil-water porous flow scenarios with varying reservoir gridblocks, and aspects including computational efficiency and accuracy are compared against the corresponding numerical approaches. The results underscore the potential of PICNN in effectively simulating systems with numerous grid blocks, as computation time does not scale with model dimensionality. We assess the temporal error of the proposed control-to-state architecture using 10 different testing controls with variation in magnitude and another 10 with higher alternation frequency. Our observations suggest the need for a more robust and reliable model when dealing with controls that exhibit significant variations in magnitude or frequency.
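Summary
A physics-informed convolutional neural network (PICNN) is proposed to simulate two-phase flow in porous media with time-varying controls. Most existing PICNNs learn a parameter-to-state mapping, whereas the proposed network uses the time-varying controls to establish a control-to-state regression. First, a finite volume scheme discretizes the flow equations, and the loss function is designed to respect mass conservation. Neumann boundary conditions are incorporated naturally into the semi-discretized equations, so no additional loss term is needed. The architecture consists of two parallel U-Net structures, with the controls as inputs and the system states as outputs. To capture the time-dependent relationship between inputs and outputs, the network is designed to mimic discretized state space equations. We train progressively, timestep by timestep, so that the network predicts oil pressure and water saturation at each timestep. After training one timestep, we use transfer learning to speed up training for the following timesteps. The model is used to simulate a range of oil-water porous flow scenarios and is compared against the corresponding numerical methods. The results show that PICNN can effectively simulate systems with many grid blocks, since computation time does not grow with model dimensionality. We assess the temporal error with 10 test controls that vary in magnitude and another 10 with higher alternation frequency. Our observations suggest that a more robust and reliable model is needed when controls vary significantly in magnitude or frequency.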
Discovering Mixtures of Structural Causal Models from Time Series Data
results: Extensive experiments on synthetic and real-world data show that the method performs strongly on causal discovery tasks, particularly when the data comes from several different causal graphs. The paper also proves the identifiability of the model under some mild assumptions.
Abstract
In fields such as finance, climate science, and neuroscience, inferring causal relationships from time series data poses a formidable challenge. While contemporary techniques can handle nonlinear relationships between variables and flexible noise distributions, they rely on the simplifying assumption that data originates from the same underlying causal model. In this work, we relax this assumption and perform causal discovery from time series data originating from mixtures of different causal models. We infer both the underlying structural causal models and the posterior probability for each sample belonging to a specific mixture component. Our approach employs an end-to-end training process that maximizes an evidence-lower bound for data likelihood. Through extensive experimentation on both synthetic and real-world datasets, we demonstrate that our method surpasses state-of-the-art benchmarks in causal discovery tasks, particularly when the data emanates from diverse underlying causal graphs. Theoretically, we prove the identifiability of such a model under some mild assumptions.
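Summary
In fields such as finance, climate science, and neuroscience, inferring causal relationships from time series data is a challenging task. Current techniques can handle nonlinear relationships between variables and flexible noise distributions, but they assume that the data comes from a single underlying causal model. In this work we drop that assumption and perform causal discovery on time series data that originates from a mixture of different causal models. We infer both the underlying structural causal models and the posterior probability that each sample belongs to a specific mixture component. Our method uses an end-to-end training process that maximizes an evidence lower bound on the data likelihood. In extensive experiments, our method surpasses state-of-the-art benchmarks on causal discovery tasks, particularly when the data comes from several different causal graphs. Theoretically, we prove that such a model is identifiable under some mild assumptions.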
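For readers unfamiliar with the training objective, the evidence lower bound for a mixture model has the following generic form. This is the textbook bound, not necessarily the exact objective used in the paper; here $x$ is assumed to be one time-series sample, $z$ a hypothetical index over $K$ mixture components (causal models), $p_\theta$ the likelihood under the component's structural causal model, and $q_\phi$ the inferred membership posterior.

```latex
\log p_\theta(x) \;\ge\;
\mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
\;-\; \mathrm{KL}\big(q_\phi(z \mid x)\,\|\,p(z)\big),
\qquad z \in \{1, \dots, K\}
```

Maximizing the right-hand side jointly over $\theta$ and $\phi$ fits the component models and the membership posterior together, which matches the abstract's description of inferring both in one end-to-end procedure.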
Ensemble Active Learning by Contextual Bandits for AI Incubation in Manufacturing
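methods: Proposes an ensemble active learning method that uses contextual bandits to actively acquire samples for annotation, maintaining the exploration-exploitation balance and improving AI model performance.
results: Experiments show that the proposed method reduces annotation effort while maintaining data quality, which improves AI model performance.
Abstract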
It is challenging but important to save annotation efforts in streaming data acquisition to maintain data quality for supervised learning base learners. We propose an ensemble active learning method to actively acquire samples for annotation by contextual bandits, which enforces the exploration-exploitation balance and leads to improved AI modeling performance.
Summary
Saving annotation effort in streaming data acquisition is important for maintaining data quality for supervised base learners. We propose an ensemble active learning method that acquires samples through contextual bandits, keeping the balance between exploration and exploitation and thereby improving AI model performance.
Gem5Pred: Predictive Approaches For Gem5 Simulation Time
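results: The best regression model achieves a Mean Absolute Error (MAE) of 0.546, and the most accurate classification model achieves an Accuracy of 0.696. These models provide a foundation for future research and a benchmark against which later models can be compared.
Abstract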
Gem5, an open-source, flexible, and cost-effective simulator, is widely recognized and utilized in both academic and industry fields for hardware simulation. However, the typically time-consuming nature of simulating programs on Gem5 underscores the need for a predictive model that can estimate simulation time. As of now, no such dataset or model exists. In response to this gap, this paper makes a novel contribution by introducing a unique dataset specifically created for this purpose. We also analyze the effects of different instruction types on the simulation time in Gem5. After this, we employ three distinct models leveraging CodeBERT to execute the prediction task based on the developed dataset. Our best regression model achieves a Mean Absolute Error (MAE) of 0.546, while our top-performing classification model records an Accuracy of 0.696. Our models establish a foundation for future investigations on this topic, serving as benchmarks against which subsequent models can be compared. We hope that our contribution can stimulate further research in this field. The dataset we used is available at https://github.com/XueyangLiOSU/Gem5Pred.
Better and Simpler Lower Bounds for Differentially Private Statistical Estimation
results: The paper establishes two lower bounds: + Covariance estimation requires $\tilde{\Omega}\left(\frac{d^{3/2}}{\alpha \varepsilon} + \frac{d}{\alpha^2}\right)$ samples, which improves on previous work and is simpler to prove. + Mean estimation requires $\tilde{\Omega}\left(\frac{d}{\alpha^{k/(k-1)} \varepsilon} + \frac{d}{\alpha^2}\right)$ samples, which matches known upper bounds and improves on the best known lower bound, previously available only for pure differential privacy.
Abstract
We provide improved lower bounds for two well-known high-dimensional private estimation tasks. First, we prove that for estimating the covariance of a Gaussian up to spectral error $\alpha$ with approximate differential privacy, one needs $\tilde{\Omega}\left(\frac{d^{3/2}}{\alpha \varepsilon} + \frac{d}{\alpha^2}\right)$ samples for any $\alpha \le O(1)$, which is tight up to logarithmic factors. This improves over previous work which established this for $\alpha \le O\left(\frac{1}{\sqrt{d}}\right)$, and is also simpler than previous work. Next, we prove that for estimating the mean of a heavy-tailed distribution with bounded $k$th moments with approximate differential privacy, one needs $\tilde{\Omega}\left(\frac{d}{\alpha^{k/(k-1)} \varepsilon} + \frac{d}{\alpha^2}\right)$ samples. This matches known upper bounds and improves over the best known lower bound for this problem, which only holds for pure differential privacy, or when $k = 2$. Our techniques follow the method of fingerprinting and are generally quite simple. Our lower bound for heavy-tailed estimation is based on a black-box reduction from privately estimating identity-covariance Gaussians. Our lower bound for covariance estimation utilizes a Bayesian approach to show that, under an Inverse Wishart prior distribution for the covariance matrix, no private estimator can be accurate even in expectation, without sufficiently many samples.
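Summary
We provide improved lower bounds for two high-dimensional private estimation tasks. First, we prove that estimating the covariance of a Gaussian up to error $\alpha$ under approximate differential privacy requires $\tilde{\Omega}\left(\frac{d^{3/2}}{\alpha \varepsilon} + \frac{d}{\alpha^2}\right)$ samples for any $\alpha \le O(1)$, a bound that is tight and simpler to obtain than in previous work. Second, we prove that estimating the mean of a heavy-tailed distribution with bounded $k$th moments requires $\tilde{\Omega}\left(\frac{d}{\alpha^{k/(k-1)} \varepsilon} + \frac{d}{\alpha^2}\right)$ samples, which matches known upper bounds and improves on the best known lower bound, which holds only for pure differential privacy or when $k = 2$. Our techniques are based on fingerprinting and are generally simple. The lower bound for heavy-tailed estimation rests on a black-box reduction from privately estimating Gaussians. The lower bound for covariance estimation uses a Bayesian approach to show that, under an Inverse Wishart prior on the covariance matrix, no private estimator can be accurate even in expectation without sufficiently many samples.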
Bi-Level Offline Policy Optimization with Limited Exploration
results: Evaluated on synthetic, standard benchmark, and real-world datasets, the model performs competitively with state-of-the-art methods.
Abstract
We study offline reinforcement learning (RL) which seeks to learn a good policy based on a fixed, pre-collected dataset. A fundamental challenge behind this task is the distributional shift due to the dataset lacking sufficient exploration, especially under function approximation. To tackle this issue, we propose a bi-level structured policy optimization algorithm that models a hierarchical interaction between the policy (upper-level) and the value function (lower-level). The lower level focuses on constructing a confidence set of value estimates that maintain sufficiently small weighted average Bellman errors, while controlling uncertainty arising from distribution mismatch. Subsequently, at the upper level, the policy aims to maximize a conservative value estimate from the confidence set formed at the lower level. This novel formulation preserves the maximum flexibility of the implicitly induced exploratory data distribution, enabling the power of model extrapolation. In practice, it can be solved through a computationally efficient, penalized adversarial estimation procedure. Our theoretical regret guarantees do not rely on any data-coverage and completeness-type assumptions, only requiring realizability. These guarantees also demonstrate that the learned policy represents the "best effort" among all policies, as no other policies can outperform it. We evaluate our model using a blend of synthetic, benchmark, and real-world datasets for offline RL, showing that it performs competitively with state-of-the-art methods.
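Summary
We study offline RL, whose goal is to learn a good policy from a fixed, pre-collected dataset. The task faces the challenge of distributional shift, especially under function approximation. To address it, we propose a bi-level structured policy optimization algorithm that models a hierarchical interaction between the policy (upper level) and the value function (lower level). The lower level builds a confidence set of value estimates that keep the weighted average Bellman error sufficiently small, while controlling the uncertainty caused by distribution mismatch. The upper level then optimizes the policy by maximizing a conservative value estimate. This new formulation preserves the maximum flexibility of the implicitly induced exploratory data distribution, which enables model extrapolation. In practice, it can be solved through a computationally efficient penalized adversarial estimation procedure. Our theoretical regret guarantees require no data-coverage or completeness-type assumptions, only realizability. The guarantees also show that the learned policy is the "best effort" among all policies, since no other policy can outperform it. We evaluate our model on a mix of synthetic, benchmark, and real-world datasets and show that it is competitive with current methods.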
A Unified View on Solving Objective Mismatch in Model-Based Reinforcement Learning
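results: The survey finds that existing MBRL methods often suffer from "objective mismatch": the accuracy of the model's predictions about the environment diverges from the quality of the policy optimized for reward. It also identifies related solutions, including: + adapting and updating the environment model. + using different policy optimization methods. + using different evaluation criteria.
Abstract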
Model-based Reinforcement Learning (MBRL) aims to make agents more sample-efficient, adaptive, and explainable by learning an explicit model of the environment. While the capabilities of MBRL agents have significantly improved in recent years, how to best learn the model is still an unresolved question. The majority of MBRL algorithms aim at training the model to make accurate predictions about the environment and subsequently using the model to determine the most rewarding actions. However, recent research has shown that model predictive accuracy is often not correlated with action quality, tracing the root cause to the \emph{objective mismatch} between accurate dynamics model learning and policy optimization of rewards. A number of interrelated solution categories to the objective mismatch problem have emerged as MBRL continues to mature as a research area. In this work, we provide an in-depth survey of these solution categories and propose a taxonomy to foster future research.
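results: The paper applies layers of transformations to produce a set of attributes (or features), to which probabilistic statistical methods are then applied for prediction. This yields scalable prediction rules together with uncertainty quantification.
Abstract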
Our goal is to provide a review of deep learning methods which provide insight into structured high-dimensional data. Rather than using shallow additive architectures common to most statistical models, deep learning uses layers of semi-affine input transformations to provide a predictive rule. Applying these layers of transformations leads to a set of attributes (or, features) to which probabilistic statistical methods can be applied. Thus, the best of both worlds can be achieved: scalable prediction rules fortified with uncertainty quantification, where sparse regularization finds the features.
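Summary
Our goal is to review deep learning methods that give insight into structured high-dimensional data. Rather than the shallow additive architectures common to most statistical models, deep learning uses layers of semi-affine input transformations to provide a predictive rule. Applying these layers of transformations yields a set of attributes (or features) to which probabilistic statistical methods can be applied. The best of both worlds can therefore be achieved: scalable prediction rules with uncertainty quantification, where sparse regularization finds the features.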
Sample-Efficient Multi-Agent RL: An Optimization Perspective
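results: The authors' algorithm learns Nash equilibrium, coarse correlated equilibrium, and correlated equilibrium with sublinear regret comparable to that of existing algorithms. It also avoids solving constrained optimization problems within data-dependent constraints, which makes it easier to implement in practice.
Abstract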
We study multi-agent reinforcement learning (MARL) for the general-sum Markov Games (MGs) under the general function approximation. In order to find the minimum assumption for sample-efficient learning, we introduce a novel complexity measure called the Multi-Agent Decoupling Coefficient (MADC) for general-sum MGs. Using this measure, we propose the first unified algorithmic framework that ensures sample efficiency in learning Nash Equilibrium, Coarse Correlated Equilibrium, and Correlated Equilibrium for both model-based and model-free MARL problems with low MADC. We also show that our algorithm provides comparable sublinear regret to the existing works. Moreover, our algorithm combines an equilibrium-solving oracle with a single objective optimization subprocedure that solves for the regularized payoff of each deterministic joint policy, which avoids solving constrained optimization problems within data-dependent constraints (Jin et al. 2020; Wang et al. 2023) or executing sampling procedures with complex multi-objective optimization problems (Foster et al. 2023), thus being more amenable to empirical implementation.
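Summary
We study multi-agent reinforcement learning (MARL) for general-sum Markov Games (MGs) under general function approximation, and introduce the Multi-Agent Decoupling Coefficient (MADC) to find the minimum assumption for sample-efficient learning. We propose the first unified algorithmic framework that ensures sample efficiency in learning Nash equilibrium, coarse correlated equilibrium, and correlated equilibrium. The framework avoids solving constrained optimization problems within data-dependent constraints (Jin et al. 2020; Wang et al. 2023) and avoids running sampling procedures with complex multi-objective optimization problems (Foster et al. 2023). This makes our algorithm easier to implement in practice.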
A Bayesian framework for discovering interpretable Lagrangian of dynamical systems from data
results: The approach is demonstrated on six different examples, covering both discrete and continuous systems.
Abstract
Learning and predicting the dynamics of physical systems requires a profound understanding of the underlying physical laws. Recent works on learning physical laws involve generalizing the equation discovery frameworks to the discovery of the Hamiltonian and Lagrangian of physical systems. While the existing methods parameterize the Lagrangian using neural networks, we propose an alternate framework for learning interpretable Lagrangian descriptions of physical systems from limited data using the sparse Bayesian approach. Unlike existing neural network-based approaches, the proposed approach (a) yields an interpretable description of the Lagrangian, (b) exploits Bayesian learning to quantify the epistemic uncertainty due to limited data, (c) automates the distillation of the Hamiltonian from the learned Lagrangian using the Legendre transformation, and (d) provides ordinary (ODE) and partial differential equation (PDE) based descriptions of the observed systems. Six different examples involving both discrete and continuous systems illustrate the efficacy of the proposed approach.
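Summary
Learning and predicting the dynamics of physical systems requires a deep understanding of the underlying physics. Recent work generalizes equation discovery frameworks to the discovery of the Hamiltonian and Lagrangian of physical systems. Whereas existing methods typically parameterize the Lagrangian with neural networks, we propose an alternative approach that recovers an interpretable Lagrangian description from limited data and quantifies the epistemic uncertainty that the limited data introduces. The method also automatically extracts the Hamiltonian from the learned Lagrangian and provides ordinary and partial differential equation descriptions of the system. We validate the method on six different examples, including continuous and discrete systems.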
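As a brief illustration of the Legendre transformation that the abstract mentions for obtaining the Hamiltonian from a learned Lagrangian, the worked example below uses a one-dimensional harmonic oscillator. The system, the mass $m$, and the stiffness $k$ are assumptions chosen for illustration and are not taken from the paper's six examples.

```latex
\begin{aligned}
p &= \frac{\partial L}{\partial \dot q}, \qquad H(q, p) = p\,\dot q - L(q, \dot q), \\
L &= \tfrac{1}{2} m \dot q^{2} - \tfrac{1}{2} k q^{2}
  \;\Rightarrow\; p = m \dot q, \quad \dot q = \frac{p}{m}, \\
H &= \frac{p^{2}}{m} - \left( \frac{p^{2}}{2m} - \tfrac{1}{2} k q^{2} \right)
   = \frac{p^{2}}{2m} + \tfrac{1}{2} k q^{2}.
\end{aligned}
```

Because the Lagrangian here is a short sum of interpretable terms, the transformation can be carried out term by term, which is harder to do when the Lagrangian is an opaque neural network.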
paper_authors: Tatsuki Koga, Kamalika Chaudhuri, David Page
for: This paper aims to provide a federated analytics approach for estimating the average treatment effect (ATE) in healthcare applications while ensuring differential privacy (DP) guarantees at each site.
methods: The proposed method uses a class of per-site estimation algorithms that report the ATE estimate and its variance as a quality measure, and an aggregation algorithm on the server side that minimizes the overall variance of the final ATE estimate.
results: The authors' experiments on real and synthetic data show that their method reliably aggregates private statistics across sites and provides a better privacy-utility tradeoff under site heterogeneity than baselines.
Abstract
Patient privacy is a major barrier to healthcare AI. For confidentiality reasons, most patient data remains in silos in separate hospitals, preventing the design of data-driven healthcare AI systems that need large volumes of patient data to make effective decisions. A solution to this is collective learning across multiple sites through federated learning with differential privacy. However, literature in this space typically focuses on differentially private statistical estimation and machine learning, which is different from the causal inference-related problems that arise in healthcare. In this work, we take a fresh look at federated learning with a focus on causal inference; specifically, we look at estimating the average treatment effect (ATE), an important task in causal inference for healthcare applications, and provide a federated analytics approach to enable ATE estimation across multiple sites along with differential privacy (DP) guarantees at each site. The main challenge comes from site heterogeneity -- different sites have different sample sizes and privacy budgets. We address this through a class of per-site estimation algorithms that report the ATE estimate and its variance as a quality measure, and an aggregation algorithm on the server side that minimizes the overall variance of the final ATE estimate. Our experiments on real and synthetic data show that our method reliably aggregates private statistics across sites and provides a better privacy-utility tradeoff under site heterogeneity than baselines.
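The server-side idea of combining per-site estimates so that the final variance is minimized can be sketched with classical inverse-variance weighting. This is a minimal illustration, not the authors' aggregation algorithm: it assumes the site estimates are independent and unbiased and that each reported variance already includes any privacy noise. The function name `aggregate_ate` and the example numbers are hypothetical.

```python
def aggregate_ate(estimates, variances):
    """Combine per-site ATE estimates using inverse-variance weights.

    Assumption (not from the paper): estimates are independent and unbiased,
    and each variance already accounts for the site's privacy noise.
    Returns (combined_estimate, combined_variance, weights).
    """
    if len(estimates) != len(variances) or not estimates:
        raise ValueError("need one variance per estimate and at least one site")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be strictly positive")

    # A site's precision is the reciprocal of its reported variance.
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)

    # Normalised weights: noisier sites contribute less.
    weights = [p / total_precision for p in precisions]
    combined = sum(w * e for w, e in zip(weights, estimates))

    # Variance of the weighted average under the independence assumption.
    combined_variance = 1.0 / total_precision
    return combined, combined_variance, weights


# Hypothetical example: a precise site and a noisier site.
ate, var, weights = aggregate_ate([2.0, 4.0], [1.0, 3.0])
print(ate, var, weights)
```

Under these assumptions the combined variance is never larger than the smallest per-site variance, which is why reporting the variance alongside the estimate is useful to the server.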
Low-Rank Tensor Completion via Novel Sparsity-Inducing Regularizers
paper_authors: Zhi-Yong Wang, Hing Cheung So, Abdelhak M. Zoubir
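for: Improving sparsity-inducing regularization for the low-rank tensor completion problem.
methods: Uses nonconvex surrogates/regularizers and develops efficient algorithms based on the alternating direction method of multipliers.
results: Experiments show that the proposed methods achieve better restoration performance than previous methods on real-world data.
Abstract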
To alleviate the bias generated by the l1-norm in the low-rank tensor completion problem, nonconvex surrogates/regularizers have been suggested to replace the tensor nuclear norm, although both can achieve sparsity. However, the thresholding functions of these nonconvex regularizers may not have closed-form expressions and thus iterations are needed, which increases the computational load. To solve this issue, we devise a framework to generate sparsity-inducing regularizers with closed-form thresholding functions. These regularizers are applied to low-tubal-rank tensor completion, and efficient algorithms based on the alternating direction method of multipliers are developed. Furthermore, convergence of our methods is analyzed and it is proved that the generated sequences are bounded and any limit point is a stationary point. Experimental results using synthetic and real-world datasets show that the proposed algorithms outperform the state-of-the-art methods in terms of restoration performance.
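Summary
To alleviate the bias that arises in the low-rank tensor completion problem, nonconvex surrogates/regularizers have been proposed to replace the tensor nuclear norm, although both can induce sparsity. However, the thresholding functions of nonconvex regularizers may lack closed-form expressions, so iterations are needed, which increases the computational load. To solve this problem, we design a framework that generates sparsity-inducing regularizers with closed-form thresholding functions. These regularizers are applied to low-tubal-rank tensor completion, and efficient algorithms based on the alternating direction method of multipliers are developed. In addition, we analyze the convergence of our methods and prove that the generated sequences are bounded and that any limit point is a stationary point. Experimental results show that the proposed methods achieve better restoration performance than state-of-the-art methods.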
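To make the notions of a closed-form thresholding function and of l1-induced bias concrete, the sketch below contrasts two textbook scalar operators: soft thresholding (the proximal operator of the l1 penalty) and hard thresholding (associated with the nonconvex l0 penalty). These are standard illustrations, not the regularizers generated by the paper's framework. In tensor completion such functions are typically applied to singular values, which is not shown here. The function names are hypothetical.

```python
def soft_threshold(x, lam):
    """Closed-form proximal operator of lam * |x| (l1 penalty).

    Every surviving value is shrunk toward zero by lam, which is the
    source of the bias on large entries.
    """
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def hard_threshold(x, tau):
    """Closed-form hard thresholding (associated with the l0 penalty).

    Values with magnitude above tau pass through unchanged, so large
    entries are not biased; smaller values are set to zero.
    """
    return x if abs(x) > tau else 0.0


# Hypothetical inputs: one small and one large entry.
for value in (0.5, 5.0):
    print(value, soft_threshold(value, 1.0), hard_threshold(value, 1.0))
```

Both rules zero out small entries, but only the soft rule shrinks the large entry, which is the bias that nonconvex surrogates aim to reduce.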
Exploring adversarial attacks in federated learning for medical imaging
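results: Tests show that domain-specific configurations can significantly increase the attacker's success rate. The findings emphasize the need for effective defense mechanisms and suggest that current security protocols in federated medical image analysis systems be re-evaluated.
Abstract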
Federated learning offers a privacy-preserving framework for medical image analysis but exposes the system to adversarial attacks. This paper aims to evaluate the vulnerabilities of federated learning networks in medical image analysis against such attacks. Employing domain-specific MRI tumor and pathology imaging datasets, we assess the effectiveness of known threat scenarios in a federated learning environment. Our tests reveal that domain-specific configurations can increase the attacker's success rate significantly. The findings emphasize the urgent need for effective defense mechanisms and suggest a critical re-evaluation of current security protocols in federated medical image analysis systems.
Summary
Federated learning provides a privacy-preserving framework for medical image analysis, but it exposes the system to adversarial attacks. This paper evaluates how vulnerable federated learning networks for medical image analysis are to such attacks. Using domain-specific MRI tumor and pathology imaging datasets, we assess the effectiveness of known threat scenarios in a federated learning environment. Our tests show that domain-specific configurations can increase the attacker's success rate. These findings highlight the importance of current security protocols and suggest that they be re-evaluated.
Detecting and Learning Out-of-Distribution Data in the Open world: Algorithm and Theory
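results: This research proposes a series of algorithms and theoretical foundations for building machine learning models that perform well and remain reliable in the open world.
Abstract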
This thesis makes considerable contributions to the realm of machine learning, specifically in the context of open-world scenarios where systems face previously unseen data and contexts. Traditional machine learning models are usually trained and tested within a fixed and known set of classes, a condition known as the closed-world setting. While this assumption works in controlled environments, it falls short in real-world applications where new classes or categories of data can emerge dynamically and unexpectedly. To address this, our research investigates two intertwined steps essential for open-world machine learning: Out-of-distribution (OOD) Detection and Open-world Representation Learning (ORL). OOD detection focuses on identifying instances from unknown classes that fall outside the model's training distribution. This process reduces the risk of making overly confident, erroneous predictions about unfamiliar inputs. Moving beyond OOD detection, ORL extends the capabilities of the model to not only detect unknown instances but also learn from and incorporate knowledge about these new classes. By delving into these research problems of open-world learning, this thesis contributes both algorithmic solutions and theoretical foundations, which pave the way for building machine learning models that are not only performant but also reliable in the face of the evolving complexities of the real world.
Summary
OOD detection involves identifying instances from unknown classes that fall outside the model's training distribution. This process helps reduce the risk of making overly confident, erroneous predictions about unfamiliar inputs. In addition, ORL extends the capabilities of the model to not only detect unknown instances but also learn from and incorporate knowledge about these new classes. By tackling these research problems of open-world learning, this thesis provides both algorithmic solutions and theoretical foundations, laying the groundwork for building machine learning models that are not only high-performing but also reliable in the face of the evolving complexities of the real world.
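The risk of overly confident predictions on unfamiliar inputs can be illustrated with a common baseline: flag an input as out-of-distribution when the classifier's maximum softmax probability falls below a threshold. This is a generic sketch and not one of the thesis's algorithms. The function names and the threshold value of 0.7 are assumptions for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_ood(logits, threshold=0.7):
    """Return True when the top class probability is below the threshold.

    The threshold is a hypothetical value; in practice it would be tuned
    on held-out in-distribution data.
    """
    return max(softmax(logits)) < threshold


# Hypothetical logits: one confident prediction and one flat prediction.
print(is_ood([5.0, 0.0, 0.0]))  # confident input
print(is_ood([1.0, 1.0, 1.0]))  # flat, uncertain input
```

A detector of this kind only rejects unfamiliar inputs. Learning from the rejected instances is the separate representation-learning step that the abstract calls ORL.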
Federated Multi-Level Optimization over Decentralized Networks
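results: The algorithm performs well across applications including hyperparameter tuning, decentralized reinforcement learning, and risk-averse optimization. Its sample complexity scales linearly with the network size.
Abstract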
Multi-level optimization has gained increasing attention in recent years, as it provides a powerful framework for solving complex optimization problems that arise in many fields, such as meta-learning, multi-player games, reinforcement learning, and nested composition optimization. In this paper, we study the problem of distributed multi-level optimization over a network, where agents can only communicate with their immediate neighbors. This setting is motivated by the need for distributed optimization in large-scale systems, where centralized optimization may not be practical or feasible. To address this problem, we propose a novel gossip-based distributed multi-level optimization algorithm that enables networked agents to solve optimization problems at different levels in a single timescale and share information through network propagation. Our algorithm achieves optimal sample complexity, scaling linearly with the network size, and demonstrates state-of-the-art performance on various applications, including hyper-parameter tuning, decentralized reinforcement learning, and risk-averse optimization.
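Summary
Multi-level optimization has attracted growing attention in recent years because it provides a powerful framework for solving complex optimization problems in many fields, such as meta-learning, multi-player games, reinforcement learning, and nested composition optimization. In this paper, we study distributed multi-level optimization in which agents can communicate only with their immediate neighbors. This setting is motivated by the need for distributed optimization in large-scale systems, where centralized optimization may not be practical or feasible. To address this problem, we propose a gossip-based distributed multi-level optimization algorithm that lets networked agents solve optimization problems at different levels in a single timescale and share information through the network. Our algorithm achieves optimal sample complexity, scaling linearly with the network size, and attains state-of-the-art performance on several applications, including hyperparameter tuning, decentralized reinforcement learning, and risk-averse optimization.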
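The communication primitive behind gossip-based methods, in which each agent mixes its value only with its immediate neighbors, can be sketched as plain consensus averaging on a ring. This shows only how information propagates through the network. It does not include the multi-level gradient updates of the proposed algorithm, and the ring topology, mixing weight, and function names are assumptions for illustration.

```python
def gossip_round(values, weight=1.0 / 3.0):
    """One synchronous gossip round on a ring network.

    Each agent keeps (1 - 2 * weight) of its own value and takes `weight`
    from each of its two ring neighbours. The mixing matrix is doubly
    stochastic, so the network-wide average is preserved.
    """
    n = len(values)
    return [
        (1.0 - 2.0 * weight) * values[i]
        + weight * values[(i - 1) % n]
        + weight * values[(i + 1) % n]
        for i in range(n)
    ]


def run_gossip(values, rounds):
    """Apply several gossip rounds and return the final local values."""
    current = list(values)
    for _ in range(rounds):
        current = gossip_round(current)
    return current


# Hypothetical local values held by four agents.
start = [0.0, 4.0, 8.0, 12.0]
print(run_gossip(start, 50))
```

After enough rounds every agent holds approximately the global average without any agent seeing values beyond its neighbors, which is the property that lets such schemes avoid a central coordinator.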