results: The study finds significant acoustic differences in the dog vocalizations produced under different language environments, and further identifies several vocal features potentially correlated with the owners' language patterns.
Abstract
How hosts' language influences their pets' vocalization is an interesting yet underexplored problem. This paper presents a preliminary investigation into the possible correlation between domestic dog vocal expressions and their human hosts' language environment. We first present a new dataset of Shiba Inu dog vocals from YouTube, built with a carefully designed data processing pipeline, which provides 7500 clean sound clips together with the contextual information of these vocals and their owners' speech clips. The contextual information includes the scene category in which the vocal was recorded, the dog's location, and its activity. With a classification task and prominent factor analysis, we discover significant acoustic differences in the dog vocals from the two language environments. We further identify some acoustic features from dog vocalizations that are potentially correlated to their hosts' language patterns.
Trip Planning for Autonomous Vehicles with Wireless Data Transfer Needs Using Reinforcement Learning
results: Compared with traffic-unaware and bandwidth-unaware baselines, the solution performs markedly better under heterogeneous traffic conditions, better meeting both the vehicle's data transfer needs and its driving time requirements.
Abstract
With recent advancements in the field of communications and the Internet of Things, vehicles are becoming more aware of their environment and are evolving towards full autonomy. Vehicular communication opens up the possibility for vehicle-to-infrastructure interaction, where vehicles could share information with components such as cameras, traffic lights, and signage that support a country's road system. As a result, vehicles are becoming more than just a means of transportation; they are collecting, processing, and transmitting massive amounts of data used to make driving safer and more convenient. With 5G cellular networks and beyond, there is going to be more data bandwidth available on our roads, but it may be heterogeneous because of limitations like line of sight, infrastructure, and heterogeneous traffic on the road. This paper addresses the problem of route planning for autonomous vehicles in urban areas accounting for both driving time and data transfer needs. We propose a novel reinforcement learning solution that prioritizes high bandwidth roads to meet a vehicle's data transfer requirement, while also minimizing driving time. We compare this approach to traffic-unaware and bandwidth-unaware baselines to show how much better it performs under heterogeneous traffic. This solution could be used as a starting point to understand what good policies look like, which could potentially yield faster, more efficient heuristics in the future.
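To make the time-versus-bandwidth trade-off concrete, the sketch below runs a toy tabular Q-learning agent on a hypothetical three-route road graph, with a reward that penalizes travel time and credits bandwidth only up to the vehicle's unmet data need. The graph, reward weights, and hyperparameters are all illustrative assumptions; the abstract does not specify the paper's actual MDP design.
```python
# Illustrative sketch only: tabular Q-learning on a toy road graph. The state
# is simplified to the current node (the paper's formulation is richer), and
# all numbers below are assumptions for demonstration.
import random

# edges: neighbor -> (travel_time_minutes, data_MB_transferred_while_traversing)
graph = {
    "A": {"B": (5, 10), "C": (8, 80)},
    "B": {"D": (6, 15)},
    "C": {"D": (4, 60)},
    "D": {},
}
GOAL, DATA_NEED_MB, LAM = "D", 100.0, 0.5   # LAM: assumed time-vs-data weight

def reward(time_min, bw_mb, transferred):
    useful = min(bw_mb, max(0.0, DATA_NEED_MB - transferred))  # unmet need only
    return -time_min + LAM * useful

Q = {(n, m): 0.0 for n in graph for m in graph[n]}
alpha, gamma, eps = 0.1, 0.95, 0.2
for _ in range(5000):
    node, transferred = "A", 0.0
    while node != GOAL:
        nbrs = list(graph[node])
        a = (random.choice(nbrs) if random.random() < eps
             else max(nbrs, key=lambda m: Q[(node, m)]))
        t, bw = graph[node][a]
        r = reward(t, bw, transferred)
        transferred += bw
        best_next = max((Q[(a, m)] for m in graph[a]), default=0.0)
        Q[(node, a)] += alpha * (r + gamma * best_next - Q[(node, a)])
        node = a

print(max(graph["A"], key=lambda m: Q[("A", m)]))  # prefers high-bandwidth "C"
```
Even in this toy setting, the learned policy takes the slightly slower but high-bandwidth detour, which is the qualitative behavior the paper's method is designed to produce.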
Confidence Calibration for Systems with Cascaded Predictive Modules
results: The authors establish the approach's effectiveness through theoretical justification and empirical experiments. Compared with prediction intervals calibrated for individual modules, the method generates more accurate intervals with stronger performance guarantees for system predictions.
Abstract
Existing conformal prediction algorithms estimate prediction intervals at target confidence levels to characterize the performance of a regression model on new test samples. However, considering an autonomous system consisting of multiple modules, prediction intervals constructed for individual modules fall short of accommodating uncertainty propagation over different modules and thus cannot provide reliable predictions on system behavior. We address this limitation and present novel solutions based on conformal prediction to provide prediction intervals calibrated for a predictive system consisting of cascaded modules (e.g., an upstream feature extraction module and a downstream regression module). Our key idea is to leverage module-level validation data to characterize the system-level error distribution without direct access to end-to-end validation data. We provide theoretical justification and empirical experimental results to demonstrate the effectiveness of proposed solutions. In comparison to prediction intervals calibrated for individual modules, our solutions generate improved intervals with more accurate performance guarantees for system predictions, which are demonstrated on both synthetic systems and real-world systems performing overlap prediction for indoor navigation using the Matterport3D dataset.
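For background, the standard split conformal construction that the paper generalizes to cascaded modules can be written in a few lines; the cascading mechanism itself, which is the paper's contribution, is not reproduced here.
```python
# Minimal split conformal prediction for a *single* regression module
# (standard background construction; the paper calibrates cascaded modules).
import numpy as np

def split_conformal_interval(model, X_cal, y_cal, X_test, alpha=0.1):
    """Return prediction intervals with marginal coverage >= 1 - alpha."""
    residuals = np.abs(y_cal - model.predict(X_cal))
    n = len(residuals)
    # finite-sample corrected quantile of the calibration residuals
    q = np.quantile(residuals, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))
    pred = model.predict(X_test)
    return pred - q, pred + q
```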
results: Achieves state-of-the-art performance on joint trajectory metrics on popular trajectory forecasting datasets, and enables direct test-time sampling from a variety of valuable conditional distributions.
Abstract
Simulation of autonomous vehicle systems requires that simulated traffic participants exhibit diverse and realistic behaviors. The use of prerecorded real-world traffic scenarios in simulation ensures realism but the rarity of safety critical events makes large scale collection of driving scenarios expensive. In this paper, we present DJINN - a diffusion based method of generating traffic scenarios. Our approach jointly diffuses the trajectories of all agents, conditioned on a flexible set of state observations from the past, present, or future. On popular trajectory forecasting datasets, we report state of the art performance on joint trajectory metrics. In addition, we demonstrate how DJINN flexibly enables direct test-time sampling from a variety of valuable conditional distributions including goal-based sampling, behavior-class sampling, and scenario editing.
User-Level Differential Privacy With Few Examples Per User
For: This paper studies user-level differential privacy (DP) in the example-scarce regime and obtains the following results.
Methods: A generic transformation of item-level DP algorithms into user-level DP algorithms, together with an adaptation of the exponential mechanism (McSherry, Talwar FOCS 2007) to the user-level setting.
Results: For approximate-DP, the generic item-level-to-user-level transformation yields a multiplicative savings of $O_{\varepsilon,\delta}(\sqrt{m})$ in the number of users required for the same utility, where $m$ is the number of examples per user. For pure-DP, the simple adaptation technique applies to a variety of tasks, such as private PAC learning, hypothesis selection, and distribution learning; for several of these problems the resulting bounds are shown to be near-optimal.
Abstract
Previous work on user-level differential privacy (DP) [Ghazi et al. NeurIPS 2021, Bun et al. STOC 2023] obtained generic algorithms that work for various learning tasks. However, their focus was on the example-rich regime, where the users have so many examples that each user could themselves solve the problem. In this work we consider the example-scarce regime, where each user has only a few examples, and obtain the following results: 1. For approximate-DP, we give a generic transformation of any item-level DP algorithm to a user-level DP algorithm. Roughly speaking, the latter gives a (multiplicative) savings of $O_{\varepsilon,\delta}(\sqrt{m})$ in terms of the number of users required for achieving the same utility, where $m$ is the number of examples per user. This algorithm, while recovering most known bounds for specific problems, also gives new bounds, e.g., for PAC learning. 2. For pure-DP, we present a simple technique for adapting the exponential mechanism [McSherry, Talwar FOCS 2007] to the user-level setting. This gives new bounds for a variety of tasks, such as private PAC learning, hypothesis selection, and distribution learning. For some of these problems, we show that our bounds are near-optimal.
Evidential uncertainties on rich labels for active learning
methods: The paper proposes two strategies, sampling by Klir uncertainty and sampling by evidential epistemic uncertainty, both based on the theory of belief functions.
results: The results show that these strategies better address the exploration-exploitation problem and more faithfully account for the uncertainty already present in the labels.
Abstract
Recent research in active learning, and more precisely in uncertainty sampling, has focused on the decomposition of model uncertainty into reducible and irreducible uncertainties. In this paper, we propose to simplify the computational phase and remove the dependence on observations, but more importantly to take into account the uncertainty already present in the labels, \emph{i.e.} the uncertainty of the oracles. Two strategies are proposed, sampling by Klir uncertainty, which addresses the exploration-exploitation problem, and sampling by evidential epistemic uncertainty, which extends the reducible uncertainty to the evidential framework, both using the theory of belief functions.
Sharpness-Aware Minimization and the Edge of Stability
methods: The paper derives an "edge of stability" for SAM through a calculation based on a local quadratic approximation of the loss.
results: Empirical evidence shows that SAM operates at this edge of stability when training neural networks, and that the edge depends on the norm of the gradient. The results are validated on three deep learning training tasks.
Abstract
Recent experiments have shown that, often, when training a neural network with gradient descent (GD) with a step size $\eta$, the operator norm of the Hessian of the loss grows until it approximately reaches $2/\eta$, after which it fluctuates around this value. The quantity $2/\eta$ has been called the "edge of stability" based on consideration of a local quadratic approximation of the loss. We perform a similar calculation to arrive at an "edge of stability" for Sharpness-Aware Minimization (SAM), a variant of GD which has been shown to improve its generalization. Unlike the case for GD, the resulting SAM-edge depends on the norm of the gradient. Using three deep learning training tasks, we see empirically that SAM operates on the edge of stability identified by this analysis.
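The $2/\eta$ threshold for plain GD follows from a one-line local quadratic calculation; a standard one-dimensional sketch is below (the paper's analogous SAM calculation additionally brings in the gradient norm).
```latex
% One-dimensional quadratic model behind the 2/eta threshold for plain GD.
\[
\ell(\theta) = \tfrac{\lambda}{2}\,\theta^2, \qquad
\theta_{t+1} = \theta_t - \eta\,\ell'(\theta_t) = (1 - \eta\lambda)\,\theta_t .
\]
% The iterates contract iff |1 - eta*lambda| < 1, i.e. lambda < 2/eta.
% Once the sharpness lambda (the top Hessian eigenvalue) exceeds 2/eta,
% the local quadratic dynamics diverge, which is why GD hovers at the
% "edge of stability" lambda ~ 2/eta.
```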
Robust Energy Consumption Prediction with a Missing Value-Resilient Metaheuristic-based Neural Network in Mobile App Development
results: Experimental results show that the proposed approach yields significant outcomes for energy consumption prediction.
Abstract
Energy consumption is a fundamental concern in mobile application development, bearing substantial significance for both developers and end-users. Moreover, it is a critical determinant in the consumer's decision-making process when considering a smartphone purchase. From the sustainability perspective, it becomes imperative to explore approaches aimed at mitigating the energy consumption of mobile devices, given the significant global consequences arising from the extensive utilisation of billions of smartphones, which imparts a profound environmental impact. Despite the existence of various energy-efficient programming practices within the Android platform, the dominant mobile ecosystem, there remains a need for documented machine learning-based energy prediction algorithms tailored explicitly for mobile app development. Hence, the main objective of this research is to propose a novel neural network-based framework, enhanced by a metaheuristic approach, to achieve robust energy prediction in the context of mobile app development. The metaheuristic approach here plays a crucial role in not only identifying suitable learning algorithms and their corresponding parameters but also determining the optimal number of layers and neurons within each layer. To the best of our knowledge, prior studies have yet to employ any metaheuristic algorithm to address all these hyperparameters simultaneously. Moreover, due to limitations in accessing certain aspects of a mobile phone, there might be missing data in the data set, and the proposed framework can handle this. In addition, we conducted an optimal algorithm selection strategy, employing 13 metaheuristic algorithms, to identify the best algorithm based on accuracy and resistance to missing values. The comprehensive experiments demonstrate that our proposed approach yields significant outcomes for energy consumption prediction.
results: When both connection and heterogeneity exist between the modalities, multimodal learning admits a generalization bound superior to unimodal learning by up to a factor of $O(\sqrt{n})$.
Abstract
Human perception of the empirical world involves recognizing the diverse appearances, or 'modalities', of underlying objects. Despite the longstanding consideration of this perspective in philosophy and cognitive science, the study of multimodality remains relatively under-explored within the field of machine learning. Nevertheless, current studies of multimodal machine learning are limited to empirical practices, lacking theoretical foundations beyond heuristic arguments. An intriguing finding from the practice of multimodal learning is that a model trained on multiple modalities can outperform a finely-tuned unimodal model, even on unimodal tasks. This paper provides a theoretical framework that explains this phenomenon, by studying generalization properties of multimodal learning algorithms. We demonstrate that multimodal learning allows for a superior generalization bound compared to unimodal learning, up to a factor of $O(\sqrt{n})$, where $n$ represents the sample size. Such advantage occurs when both connection and heterogeneity exist between the modalities.
A Convex Framework for Confounding Robust Inference
results: Proposes a general estimator that provides a sharp lower bound on the policy value, and extends it to sensitivity analysis, model selection, and robust policy learning.
Abstract
We study policy evaluation of offline contextual bandits subject to unobserved confounders. Sensitivity analysis methods are commonly used to estimate the policy value under the worst-case confounding over a given uncertainty set. However, existing work often resorts to some coarse relaxation of the uncertainty set for the sake of tractability, leading to overly conservative estimation of the policy value. In this paper, we propose a general estimator that provides a sharp lower bound of the policy value using convex programming. The generality of our estimator enables various extensions such as sensitivity analysis with f-divergence, model selection with cross validation and information criterion, and robust policy learning with the sharp lower bound. Furthermore, our estimation method can be reformulated as an empirical risk minimization problem thanks to the strong duality, which enables us to provide strong theoretical guarantees of the proposed estimator using techniques of the M-estimation.
Change Management using Generative Modeling on Digital Twins
paper_authors: Nilanjana Das, Anantaa Kotal, Daniel Roseberry, Anupam Joshi
For: Small and medium-sized businesses that need to securely manage software updates and changes but lack the resources to set up a non-production environment for stress testing.
Methods: The paper proposes using "digital twins" on the cloud to create a non-production environment for stress testing software changes, and using Generative Artificial Intelligence (AI) models to generate testing scenarios that check for points of failure.
Results: The paper shows how digital twins and generative AI models let small and medium-sized businesses securely test software changes before releasing them into production, without a dedicated non-production environment.
Abstract
A key challenge faced by small and medium-sized business entities is securely managing software updates and changes. Specifically, with rapidly evolving cybersecurity threats, changes/updates/patches to software systems are necessary to stay ahead of emerging threats and are often mandated by regulators or statutory authorities to counter these. However, security patches/updates require stress testing before they can be released in the production system. Stress testing in production environments is risky and poses security threats. Large businesses usually have a non-production environment where such changes can be made and tested before being released into production. Smaller businesses do not have such facilities. In this work, we show how "digital twins", especially for a mix of IT and IoT environments, can be created on the cloud. These digital twins act as a non-production environment where changes can be applied, and the system can be securely tested before patch release. Additionally, the non-production digital twin can be used to collect system data and run stress tests on the environment, both manually and automatically. In this paper, we show how using a small sample of real data/interactions, Generative Artificial Intelligence (AI) models can be used to generate testing scenarios to check for points of failure.
Performance Conditioning for Diffusion-Based Multi-Instrument Music Synthesis
results: Evaluated on uncurated performances with diverse instrumentation, the prototype achieves state-of-the-art FAD realism scores while enabling novel timbre and style control. Samples and demonstrations are available at benadar293.github.io/midipm.
Abstract
Generating multi-instrument music from symbolic music representations is an important task in Music Information Retrieval (MIR). A central but still largely unsolved problem in this context is musically and acoustically informed control in the generation process. As the main contribution of this work, we propose enhancing control of multi-instrument synthesis by conditioning a generative model on a specific performance and recording environment, thus allowing for better guidance of timbre and style. Building on state-of-the-art diffusion-based music generative models, we introduce performance conditioning - a simple tool indicating the generative model to synthesize music with style and timbre of specific instruments taken from specific performances. Our prototype is evaluated using uncurated performances with diverse instrumentation and achieves state-of-the-art FAD realism scores while allowing novel timbre and style control. Our project page, including samples and demonstrations, is available at benadar293.github.io/midipm
The Broad Impact of Feature Imitation: Neural Enhancements Across Financial, Speech, and Physiological Domains
results: In three experiments (Bitcoin price prediction, speech emotion recognition, and chronic neck pain detection), FINs reduce root mean square error by around 1000, and improve accuracy by over 3 percent and about 7 percent, respectively.
Abstract
Initialization of neural network weights plays a pivotal role in determining their performance. Feature Imitating Networks (FINs) offer a novel strategy by initializing weights to approximate specific closed-form statistical features, setting a promising foundation for deep learning architectures. While the applicability of FINs has been chiefly tested in biomedical domains, this study extends its exploration into other time series datasets. Three different experiments are conducted in this study to test the applicability of imitating Tsallis entropy for performance enhancement: Bitcoin price prediction, speech emotion recognition, and chronic neck pain detection. For the Bitcoin price prediction, models embedded with FINs reduced the root mean square error by around 1000 compared to the baseline. In the speech emotion recognition task, the FIN-augmented model increased classification accuracy by over 3 percent. Lastly, in the CNP detection experiment, an improvement of about 7 percent was observed compared to established classifiers. These findings validate the broad utility and potency of FINs in diverse applications.
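For reference, the closed-form feature that the FINs here are trained to imitate is Tsallis entropy; a minimal implementation of the standard definition follows (the network-initialization procedure itself is not shown).
```python
# Tsallis entropy of a discrete distribution, the closed-form statistical
# feature imitated in this paper; q -> 1 recovers Shannon entropy.
import numpy as np

def tsallis_entropy(p, q=2.0, eps=1e-12):
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    if abs(q - 1.0) < 1e-9:                       # limit q -> 1: Shannon entropy
        return float(-np.sum(p * np.log(p + eps)))
    return float((1.0 - np.sum(p ** q)) / (q - 1.0))

print(tsallis_entropy([0.25, 0.25, 0.25, 0.25], q=2.0))  # 0.75 for uniform over 4
```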
Health diagnosis and recuperation of aged Li-ion batteries with data analytics and equivalent circuit modeling
paper_authors: Riko I Made, Jing Lin, Jintao Zhang, Yu Zhang, Lionel C. H. Moh, Zhaolin Liu, Ning Ding, Sing Yang Chiam, Edwin Khoo, Xuesong Yin, Guangyuan Wesley Zheng
for: This paper aims to assess battery health and develop a strategy for cell rejuvenation of second-life Li-ion batteries.
methods: The paper presents aging and reconditioning experiments on 62 commercial high-energy type lithium iron phosphate (LFP) cells, and uses machine learning models to predict cycle life and identify important indicators of recoverable capacity.
results: The paper achieves an average test error of 16.84% ± 1.87% (mean absolute percentage error) for cycle life prediction, and finds that some of the recoverable lost capacity is attributed to lateral lithium non-uniformity within the electrodes. Additionally, the paper demonstrates how battery operation history significantly affects the capacity recovery.
Abstract
Battery health assessment and recuperation play a crucial role in the utilization of second-life Li-ion batteries. However, due to ambiguous aging mechanisms and lack of correlations between the recovery effects and operational states, it is challenging to accurately estimate battery health and devise a clear strategy for cell rejuvenation. This paper presents aging and reconditioning experiments of 62 commercial high-energy type lithium iron phosphate (LFP) cells, which supplement existing datasets of high-power LFP cells. The relatively large-scale data allow us to use machine learning models to predict cycle life and identify important indicators of recoverable capacity. Considering cell-to-cell inconsistencies, an average test error of $16.84\% \pm 1.87\%$ (mean absolute percentage error) for cycle life prediction is achieved by gradient boosting regressor given information from the first 80 cycles. In addition, it is found that some of the recoverable lost capacity is attributed to the lateral lithium non-uniformity within the electrodes. An equivalent circuit model is built and experimentally validated to demonstrate how such non-uniformity can be accumulated, and how it can give rise to recoverable capacity loss. SHapley Additive exPlanations (SHAP) analysis also reveals that battery operation history significantly affects the capacity recovery.
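A minimal sketch of the prediction setup under stated assumptions: gradient boosting regression on features extracted from the first 80 cycles, scored by mean absolute percentage error. The synthetic data and hyperparameters below are placeholders, not the paper's feature set.
```python
# Illustrative sketch of the cycle-life prediction pipeline; features are
# synthetic stand-ins for early-cycle descriptors.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 62  # number of cells in the paper's dataset
X = rng.normal(size=(n, 5))                            # placeholder features
y = 800 + 50 * X[:, 0] + rng.normal(scale=20, size=n)  # placeholder cycle life

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
print("MAPE:", mean_absolute_percentage_error(y_te, model.predict(X_te)))
```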
Soft Merging: A Flexible and Robust Soft Model Merging Approach for Enhanced Neural Network Performance
for: The paper aims to improve the performance and stability of deep learning models by merging multiple local-optima models.
methods: Proposes a "soft merging" method that combines multiple local optima quickly and efficiently by learning gate parameters through a surrogate of the $l_0$ norm, without modifying the model weights.
results: Experiments show that the merged neural networks perform markedly better and are more robust than those produced by traditional model merging methods.
Abstract
Stochastic Gradient Descent (SGD), a widely used optimization algorithm in deep learning, is often limited to converging to local optima due to the non-convex nature of the problem. Leveraging these local optima to improve model performance remains a challenging task. Given the inherent complexity of neural networks, the simple arithmetic averaging of the obtained local optima models results in undesirable outcomes. This paper proposes a "soft merging" method that facilitates rapid merging of multiple models, simplifies the merging of specific parts of neural networks, and enhances robustness against malicious models with extreme values. This is achieved by learning gate parameters through a surrogate of the $l_0$ norm using hard concrete distribution without modifying the model weights of the given local optima models. This merging process not only enhances the model performance by converging to a better local optimum, but also minimizes computational costs, offering an efficient and explicit learning process integrated with stochastic gradient descent. Thorough experiments underscore the effectiveness and superior performance of the merged neural networks.
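A hedged sketch of the gating primitive follows: sampling a hard concrete gate (the differentiable $l_0$ surrogate of Louizos et al., 2018) and using it to softly merge two candidate parameter vectors. How the paper combines more than two models and trains the gates end-to-end is simplified away here.
```python
# Sketch of a hard concrete gate (Louizos et al., 2018). Stretch limits and
# temperature follow that paper's defaults; the merge below is a simplified
# two-model illustration, not the full method.
import torch

def hard_concrete_gate(log_alpha, beta=2/3, gamma=-0.1, zeta=1.1):
    u = torch.rand_like(log_alpha).clamp(1e-6, 1 - 1e-6)
    s = torch.sigmoid((torch.log(u) - torch.log(1 - u) + log_alpha) / beta)
    s = s * (zeta - gamma) + gamma          # stretch to (gamma, zeta)
    return s.clamp(0.0, 1.0)                # hard-clip into [0, 1]

# Soft-merge two local optima: theta = g * theta_a + (1 - g) * theta_b,
# training only the gate parameters (log_alpha), not the model weights.
theta_a, theta_b = torch.randn(10), torch.randn(10)
log_alpha = torch.zeros(10, requires_grad=True)
g = hard_concrete_gate(log_alpha)
theta_merged = g * theta_a + (1 - g) * theta_b
```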
Parallelizing non-linear sequential models over the sequence length
results: The parallel algorithm accelerates GPU evaluation of recurrent neural networks and neural ordinary differential equations by up to 3 orders of magnitude without compromising output accuracy, and it applies to a wide range of such architectures.
Abstract
Sequential models, such as Recurrent Neural Networks and Neural Ordinary Differential Equations, have long suffered from slow training due to their inherent sequential nature. For many years this bottleneck has persisted, as many thought sequential models could not be parallelized. We challenge this long-held belief with our parallel algorithm that accelerates GPU evaluation of sequential models by up to 3 orders of magnitude faster without compromising output accuracy. The algorithm does not need any special structure in the sequential models' architecture, making it applicable to a wide range of architectures. Using our method, training sequential models can be more than 10 times faster than the common sequential method without any meaningful difference in the training results. Leveraging this accelerated training, we discovered the efficacy of the Gated Recurrent Unit in a long time series classification problem with 17k time samples. By overcoming the training bottleneck, our work serves as the first step to unlock the potential of non-linear sequential models for long sequence problems.
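The abstract does not spell out the algorithm, but one way to see why a nonlinear recurrence admits parallel evaluation is fixed-point (Jacobi-style) iteration over the whole sequence: each sweep updates every time step at once, the first $k$ states are exact after $k$ sweeps, and convergence is often far faster in practice. The sketch below illustrates that general idea only; it is not the paper's exact method.
```python
# Hedged illustration: evaluating h[t] = f(h[t-1], x[t]) by fixed-point
# iteration over the whole sequence, with every time step updated in parallel.
import numpy as np

def f(h_prev, x):                     # toy nonlinear cell
    return np.tanh(0.5 * h_prev + x)

T = 1000
x = np.random.default_rng(0).normal(size=T)

# sequential reference
h_seq, h = np.zeros(T), 0.0
for t in range(T):
    h = f(h, x[t])
    h_seq[t] = h

# parallel-in-time fixed-point iteration
h_par = np.zeros(T)
for sweep in range(T):
    h_prev = np.concatenate(([0.0], h_par[:-1]))
    h_new = f(h_prev, x)              # all T steps at once (vectorized)
    if np.max(np.abs(h_new - h_par)) < 1e-10:
        break
    h_par = h_new

print(sweep, np.max(np.abs(h_par - h_seq)))  # converges well before T sweeps
```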
Weakly-supervised Automated Audio Captioning via text only training
for: Automated Audio Captioning (AAC)
methods: weakly-supervised approach using contrastive language-audio pretraining (CLAP)
results: relative performance of up to ~$83\%$ compared to fully supervised approaches, demonstrated on the Clotho and AudioCaps datasets.
Abstract
In recent years, datasets of paired audio and captions have enabled remarkable success in automatically generating descriptions for audio clips, namely Automated Audio Captioning (AAC). However, it is labor-intensive and time-consuming to collect a sufficient number of paired audio and captions. Motivated by the recent advances in Contrastive Language-Audio Pretraining (CLAP), we propose a weakly-supervised approach to train an AAC model assuming only text data and a pre-trained CLAP model, alleviating the need for paired target data. Our approach leverages the similarity between audio and text embeddings in CLAP. During training, we learn to reconstruct the text from the CLAP text embedding, and during inference, we decode using the audio embeddings. To mitigate the modality gap between the audio and text embeddings we employ strategies to bridge the gap during training and inference stages. We evaluate our proposed method on Clotho and AudioCaps datasets demonstrating its ability to achieve a relative performance of up to ~$83\%$ compared to fully supervised approaches trained with paired target data.
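A hedged sketch of the text-only loop under stated assumptions: the decoder learns to reconstruct captions from frozen CLAP text embeddings and decodes from audio embeddings at inference. The Gaussian-noise injection used here to bridge the modality gap is borrowed from related text-only captioning work, and the `decoder`/encoder interfaces are hypothetical; the paper's exact bridging strategies may differ.
```python
# Hedged sketch of text-only captioning with a frozen CLAP-style encoder pair.
# Training reconstructs the caption from the *text* embedding; inference
# decodes from the *audio* embedding of the same joint space.
import torch

def train_step(decoder, clap_text_encoder, captions, tokens, noise_std=0.015):
    with torch.no_grad():
        z = clap_text_encoder(captions)          # frozen CLAP text embeddings
    z = z + noise_std * torch.randn_like(z)      # assumed gap-bridging noise
    logits = decoder(z, tokens[:, :-1])          # teacher-forced decoding
    return torch.nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), tokens[:, 1:].reshape(-1))

@torch.no_grad()
def caption_audio(decoder, clap_audio_encoder, audio):
    z = clap_audio_encoder(audio)                # swap in the audio embedding
    return decoder.generate(z)                   # any autoregressive decoding
```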
t-EER: Parameter-Free Tandem Evaluation of Countermeasures and Biometric Comparators
results: The study finds that the t-EER is a strong candidate metric for jointly evaluating the reliability of PAD and biometric verification systems.
Abstract
Presentation attack (spoofing) detection (PAD) typically operates alongside biometric verification to improve reliability in the face of spoofing attacks. Even though the two sub-systems operate in tandem to solve the single task of reliable biometric verification, they address different detection tasks and are hence typically evaluated separately. Evidence shows that this approach is suboptimal. We introduce a new metric for the joint evaluation of PAD solutions operating in situ with biometric verification. In contrast to the tandem detection cost function proposed recently, the new tandem equal error rate (t-EER) is parameter free. The combination of two classifiers nonetheless leads to a set of operating points at which false alarm and miss rates are equal and also dependent upon the prevalence of attacks. We therefore introduce the concurrent t-EER, a unique operating point which is invariable to the prevalence of attacks. Using both modality (and even application) agnostic simulated scores, as well as real scores for a voice biometrics application, we demonstrate application of the t-EER to a wide range of biometric system evaluations under attack. The proposed approach is a strong candidate metric for the tandem evaluation of PAD systems and biometric comparators.
Smooth ECE: Principled Reliability Diagrams via Kernel Smoothing
results: Experiments show that the proposed modification fixes the well-known flaws of binning-based constructions and yields a better-behaved calibration measure. The method also produces a reliability diagram that visually encodes this measure and better represents a predictor's calibration.
Abstract
Calibration measures and reliability diagrams are two fundamental tools for measuring and interpreting the calibration of probabilistic predictors. Calibration measures quantify the degree of miscalibration, and reliability diagrams visualize the structure of this miscalibration. However, the most common constructions of reliability diagrams and calibration measures -- binning and ECE -- both suffer from well-known flaws (e.g. discontinuity). We show that a simple modification fixes both constructions: first smooth the observations using an RBF kernel, then compute the Expected Calibration Error (ECE) of this smoothed function. We prove that with a careful choice of bandwidth, this method yields a calibration measure that is well-behaved in the sense of (Błasiok, Gopalan, Hu, and Nakkiran 2023a) -- a consistent calibration measure. We call this measure the SmoothECE. Moreover, the reliability diagram obtained from this smoothed function visually encodes the SmoothECE, just as binned reliability diagrams encode the BinnedECE. We also provide a Python package with simple, hyperparameter-free methods for measuring and plotting calibration: `pip install relplot`.
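The construction can be prototyped in a few lines: kernel-smooth the outcomes as a function of the predicted probability, then average the calibration gap of the smoothed function. The sketch below fixes the bandwidth for simplicity, whereas choosing it in a principled way is precisely the paper's contribution; the relplot package implements the real thing.
```python
# Minimal numpy sketch of the SmoothECE idea: RBF-kernel regression of y on
# the predictions f, then the mean absolute calibration gap. Bandwidth is an
# assumption here, not the paper's principled choice.
import numpy as np

def smooth_ece(f, y, bandwidth=0.05):
    f, y = np.asarray(f, float), np.asarray(y, float)
    w = np.exp(-0.5 * ((f[:, None] - f[None, :]) / bandwidth) ** 2)
    y_smooth = (w @ y) / w.sum(axis=1)       # kernel regression of y on f
    return float(np.mean(np.abs(y_smooth - f)))

rng = np.random.default_rng(0)
f = rng.uniform(size=2000)
y = (rng.uniform(size=2000) < f).astype(float)  # perfectly calibrated data
print(smooth_ece(f, y))                         # close to 0
```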
results: The authors show that, in a suitable parameter regime, there is a constant-time randomized algorithm for finding a weak $\epsilon$-approximate $\sigma$-smooth Nash equilibrium, and a polynomial-time deterministic algorithm for finding a strong $\epsilon$-approximate $\sigma$-smooth Nash equilibrium.
Abstract
A fundamental shortcoming of the concept of Nash equilibrium is its computational intractability: approximating Nash equilibria in normal-form games is PPAD-hard. In this paper, inspired by the ideas of smoothed analysis, we introduce a relaxed variant of Nash equilibrium called $\sigma$-smooth Nash equilibrium, for a smoothness parameter $\sigma$. In a $\sigma$-smooth Nash equilibrium, players only need to achieve utility at least as high as their best deviation to a $\sigma$-smooth strategy, which is a distribution that does not put too much mass (as parametrized by $\sigma$) on any fixed action. We distinguish two variants of $\sigma$-smooth Nash equilibria: strong $\sigma$-smooth Nash equilibria, in which players are required to play $\sigma$-smooth strategies under equilibrium play, and weak $\sigma$-smooth Nash equilibria, where there is no such requirement. We show that both weak and strong $\sigma$-smooth Nash equilibria have superior computational properties to Nash equilibria: when $\sigma$ as well as an approximation parameter $\epsilon$ and the number of players are all constants, there is a constant-time randomized algorithm to find a weak $\epsilon$-approximate $\sigma$-smooth Nash equilibrium in normal-form games. In the same parameter regime, there is a polynomial-time deterministic algorithm to find a strong $\epsilon$-approximate $\sigma$-smooth Nash equilibrium in a normal-form game. These results stand in contrast to the optimal algorithm for computing $\epsilon$-approximate Nash equilibria, which cannot run in faster than quasipolynomial-time. We complement our upper bounds by showing that when either $\sigma$ or $\epsilon$ is an inverse polynomial, finding a weak $\epsilon$-approximate $\sigma$-smooth Nash equilibria becomes computationally intractable.
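For concreteness, one standard way to formalize the "not too much mass" condition, borrowed from the smoothed-analysis literature, is sketched below; the paper's exact parametrization may differ in constants.
```latex
% A sigma-smooth strategy over an action set of size n: no action receives
% more than 1/sigma times its mass under the uniform distribution.
\[
\mu \in \Delta([n]) \text{ is } \sigma\text{-smooth}
\iff
\mu(a) \le \frac{1}{\sigma n} \quad \text{for all } a \in [n].
\]
% A weak epsilon-approximate sigma-smooth Nash equilibrium then only asks each
% player i to be epsilon-close to their best *sigma-smooth* deviation:
\[
u_i(x) \ \ge\ \max_{\mu_i \,\sigma\text{-smooth}} u_i(\mu_i, x_{-i}) - \epsilon .
\]
```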
results: Experiments on synthetic and real-world datasets show that RAMs offer improved expressiveness compared to GAMs while maintaining interpretability.
Abstract
Generalized Additive Models (GAMs) are widely used explainable-by-design models in various applications. GAMs assume that the output can be represented as a sum of univariate functions, referred to as components. However, this assumption fails in ML problems where the output depends on multiple features simultaneously. In these cases, GAMs fail to capture the interaction terms of the underlying function, leading to subpar accuracy. To (partially) address this issue, we propose Regionally Additive Models (RAMs), a novel class of explainable-by-design models. RAMs identify subregions within the feature space where interactions are minimized. Within these regions, it is more accurate to express the output as a sum of univariate functions (components). Consequently, RAMs fit one component per subregion of each feature instead of one component per feature. This approach yields a more expressive model compared to GAMs while retaining interpretability. The RAM framework consists of three steps. Firstly, we train a black-box model. Secondly, using Regional Effect Plots, we identify subregions where the black-box model exhibits near-local additivity. Lastly, we fit a GAM component for each identified subregion. We validate the effectiveness of RAMs through experiments on both synthetic and real-world datasets. The results confirm that RAMs offer improved expressiveness compared to GAMs while maintaining interpretability.
SupeRBNN: Randomized Binary Neural Network Using Adiabatic Superconductor Josephson Devices
for: AQFP-based acceleration of binary neural network (BNN) computation
methods: Exploits the randomized behavior of AQFP devices together with software-hardware co-optimization
results: Achieves roughly 7.8x10^4 times higher energy efficiency than a ReRAM-based BNN framework while maintaining a similar level of model accuracy
Abstract
Adiabatic Quantum-Flux-Parametron (AQFP) is a superconducting logic with extremely high energy efficiency. By employing the distinct polarity of current to denote logic `0' and `1', AQFP devices serve as excellent carriers for binary neural network (BNN) computations. Although recent research has made initial strides toward developing an AQFP-based BNN accelerator, several critical challenges remain, preventing the design from being a comprehensive solution. In this paper, we propose SupeRBNN, an AQFP-based randomized BNN acceleration framework that leverages software-hardware co-optimization to eventually make the AQFP devices a feasible solution for BNN acceleration. Specifically, we investigate the randomized behavior of the AQFP devices and analyze the impact of crossbar size on current attenuation, subsequently formulating the current amplitude into the values suitable for use in BNN computation. To tackle the accumulation problem and improve overall hardware performance, we propose a stochastic computing-based accumulation module and a clocking scheme adjustment-based circuit optimization method. We validate our SupeRBNN framework across various datasets and network architectures, comparing it with implementations based on different technologies, including CMOS, ReRAM, and superconducting RSFQ/ERSFQ. Experimental results demonstrate that our design achieves an energy efficiency of approximately 7.8x10^4 times higher than that of the ReRAM-based BNN framework while maintaining a similar level of model accuracy. Furthermore, when compared with superconductor-based counterparts, our framework demonstrates at least two orders of magnitude higher energy efficiency.
Physics-informed State-space Neural Networks for Transport Phenomena
results: Two in silico experiments (a heated channel and a cooling system loop) demonstrate that PSMs are more accurate than purely data-driven models. PSMs can also be used to build nonlinear supervisory controllers and system diagnostic algorithms.
Abstract
This work introduces Physics-informed State-space neural network Models (PSMs), a novel solution to achieving real-time optimization, flexibility, and fault tolerance in autonomous systems, particularly in transport-dominated systems such as chemical, biomedical, and power plants. Traditional data-driven methods fall short due to a lack of physical constraints like mass conservation; PSMs address this issue by training deep neural networks with sensor data and physics-informing using components' Partial Differential Equations (PDEs), resulting in a physics-constrained, end-to-end differentiable forward dynamics model. Through two in silico experiments - a heated channel and a cooling system loop - we demonstrate that PSMs offer a more accurate approach than purely data-driven models. Beyond accuracy, there are several compelling use cases for PSMs. In this work, we showcase two: the creation of a nonlinear supervisory controller through a sequentially updated state-space representation and the proposal of a diagnostic algorithm using residuals from each of the PDEs. The former demonstrates the ability of PSMs to handle both constant and time-dependent constraints, while the latter illustrates their value in system diagnostics and fault detection. We further posit that PSMs could serve as a foundation for Digital Twins, constantly updated digital representations of physical systems.
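A hedged sketch of the general recipe: a neural state-space step function trained with a sensor-data loss plus a physics-residual penalty. The residual here is a generic placeholder callable; the actual PDE constraints for the heated channel and cooling loop are not reproduced.
```python
# Sketch of a physics-informed state-space model: x[k+1] = x[k] + dt*f(x, u),
# trained on sensor data with an added physics-residual penalty. All sizes and
# the residual function are illustrative assumptions.
import torch
import torch.nn as nn

class PSM(nn.Module):
    def __init__(self, n_state, n_input, dt=0.1):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(n_state + n_input, 64), nn.Tanh(),
                               nn.Linear(64, n_state))
        self.dt = dt

    def step(self, x, u):
        return x + self.dt * self.f(torch.cat([x, u], dim=-1))

def psm_loss(model, x, u, x_next, physics_residual, lam=1.0):
    pred = model.step(x, u)
    data_loss = ((pred - x_next) ** 2).mean()            # fit sensor data
    phys_loss = (physics_residual(pred, u) ** 2).mean()  # e.g. conservation law
    return data_loss + lam * phys_loss
```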
Boolformer: Symbolic Regression of Logic Functions with Transformers
methods: The paper introduces Boolformer, a Transformer architecture that predicts compact expressions for complex Boolean functions from clean truth tables.
results: Evaluated on a broad set of real-world binary classification datasets, Boolformer proves interpretable and effective. Additionally, the paper shows that Boolformer can be applied to the task of modeling gene regulatory networks, and is competitive with state-of-the-art genetic algorithms with a significant speedup.
Abstract
In this work, we introduce Boolformer, the first Transformer architecture trained to perform end-to-end symbolic regression of Boolean functions. First, we show that it can predict compact formulas for complex functions which were not seen during training, when provided a clean truth table. Then, we demonstrate its ability to find approximate expressions when provided incomplete and noisy observations. We evaluate the Boolformer on a broad set of real-world binary classification datasets, demonstrating its potential as an interpretable alternative to classic machine learning methods. Finally, we apply it to the widespread task of modelling the dynamics of gene regulatory networks. Using a recent benchmark, we show that Boolformer is competitive with state-of-the art genetic algorithms with a speedup of several orders of magnitude. Our code and models are available publicly.
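To make the task format concrete, here is a small illustrative example (not the model's actual tokenization): the input is a complete truth table and the target is a compact formula, shown for the 3-input majority function.
```python
# Illustrative symbolic-regression instance: a truth table as input, a compact
# Boolean formula as the desired output.
from itertools import product

def maj(a, b, c):
    return (a and b) or (a and c) or (b and c)

truth_table = [((a, b, c), int(maj(a, b, c)))
               for a, b, c in product([0, 1], repeat=3)]
print(truth_table)
# A correct compact prediction for this table: (a & b) | (a & c) | (b & c)
```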
Optimal Conditional Inference in Adaptive Experiments
results: In batched bandit experiments, absent further restrictions, inference using only the results of the last batch is optimal. When the adaptive aspects of the experiment are location-invariant, the data contain one additional piece of information, captured by a linear function of the batch-arm means. In the more restrictive case where the design depends on the data only through polyhedral events, computationally tractable and optimal conditional inference procedures are derived.
Abstract
We study batched bandit experiments and consider the problem of inference conditional on the realized stopping time, assignment probabilities, and target parameter, where all of these may be chosen adaptively using information up to the last batch of the experiment. Absent further restrictions on the experiment, we show that inference using only the results of the last batch is optimal. When the adaptive aspects of the experiment are known to be location-invariant, in the sense that they are unchanged when we shift all batch-arm means by a constant, we show that there is additional information in the data, captured by one additional linear function of the batch-arm means. In the more restrictive case where the stopping time, assignment probabilities, and target parameter are known to depend on the data only through a collection of polyhedral events, we derive computationally tractable and optimal conditional inference procedures.
Towards Robust and Truly Large-Scale Audio-Sheet Music Retrieval
results: Deep learning methods can link the two modalities and improve the precision of audio-sheet music retrieval, but several open challenges must still be resolved before large-scale deployment is feasible.
Abstract
A range of applications of multi-modal music information retrieval is centred around the problem of connecting large collections of sheet music (images) to corresponding audio recordings, that is, identifying pairs of audio and score excerpts that refer to the same musical content. One of the typical and most recent approaches to this task employs cross-modal deep learning architectures to learn joint embedding spaces that link the two distinct modalities - audio and sheet music images. While there has been steady improvement on this front over the past years, a number of open problems still prevent large-scale employment of this methodology. In this article we attempt to provide an insightful examination of the current developments on audio-sheet music retrieval via deep learning methods. We first identify a set of main challenges on the road towards robust and large-scale cross-modal music retrieval in real scenarios. We then highlight the steps we have taken so far to address some of these challenges, documenting step-by-step improvement along several dimensions. We conclude by analysing the remaining challenges and present ideas for solving these, in order to pave the way to a unified and robust methodology for cross-modal music retrieval.
Self-Supervised Contrastive Learning for Robust Audio-Sheet Music Retrieval Systems
paper_authors: Luis Carvalho, Tobias Washüttl, Gerhard Widmer
for: Improving the effectiveness of cross-modal music retrieval systems
methods: Self-supervised contrastive learning on automatically extracted snippets of audio and sheet images
results: Across a range of experiments, pre-trained models retrieve audio and sheet-image snippets with better precision, and on the cross-modal piece identification task retrieval quality improves from 30% up to 100%.
Abstract
Linking sheet music images to audio recordings remains a key problem for the development of efficient cross-modal music retrieval systems. One of the fundamental approaches toward this task is to learn a cross-modal embedding space via deep neural networks that is able to connect short snippets of audio and sheet music. However, the scarcity of annotated data from real musical content affects the capability of such methods to generalize to real retrieval scenarios. In this work, we investigate whether we can mitigate this limitation with self-supervised contrastive learning, by exposing a network to a large amount of real music data as a pre-training step, by contrasting randomly augmented views of snippets of both modalities, namely audio and sheet images. Through a number of experiments on synthetic and real piano data, we show that pre-trained models are able to retrieve snippets with better precision in all scenarios and pre-training configurations. Encouraged by these results, we employ the snippet embeddings in the higher-level task of cross-modal piece identification and conduct more experiments on several retrieval configurations. In this task, we observe that the retrieval quality improves from 30% up to 100% when real music data is present. We then conclude by arguing for the potential of self-supervised contrastive learning for alleviating the annotated data scarcity in multi-modal music retrieval models.
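A hedged sketch of the symmetric contrastive objective between audio-snippet and sheet-image-snippet embeddings, in the common InfoNCE form; the encoders, augmentations, and temperature are assumptions, as the paper's exact loss is not given in the abstract.
```python
# Sketch of a symmetric InfoNCE objective between paired snippet embeddings.
# z_audio and z_sheet are (B, d) embeddings of randomly augmented views of
# corresponding snippets; matching pairs sit on the diagonal.
import torch
import torch.nn.functional as F

def contrastive_loss(z_audio, z_sheet, tau=0.07):
    z_a = F.normalize(z_audio, dim=-1)
    z_s = F.normalize(z_sheet, dim=-1)
    logits = z_a @ z_s.t() / tau                 # (B, B) similarity matrix
    targets = torch.arange(z_a.size(0), device=z_a.device)
    # contrast in both directions: audio -> sheet and sheet -> audio
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```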
Convergence and Recovery Guarantees of Unsupervised Neural Networks for Inverse Problems
for: Solving inverse problems with unsupervised feedforward multilayer neural networks.
methods: The paper analyzes unsupervised feedforward multilayer neural networks trained to solve inverse problems, using overparametrization to control the Neural Tangent Kernel.
results: The paper provides deterministic convergence and recovery guarantees for this class of networks, and derives overparametrization bounds under which a two-layer Deep Inverse Prior network with a smooth activation function benefits from the guarantees.
Abstract
Neural networks have become a prominent approach to solve inverse problems in recent years. While a plethora of such methods was developed to solve inverse problems empirically, we are still lacking clear theoretical guarantees for these methods. On the other hand, many works proved convergence to optimal solutions of neural networks in a more general setting using overparametrization as a way to control the Neural Tangent Kernel. In this work we investigate how to bridge these two worlds and we provide deterministic convergence and recovery guarantees for the class of unsupervised feedforward multilayer neural networks trained to solve inverse problems. We also derive overparametrization bounds under which a two-layers Deep Inverse Prior network with smooth activation function will benefit from our guarantees.
Passage Summarization with Recurrent Models for Audio-Sheet Music Retrieval
methods: A cross-modal recurrent network that learns joint embeddings summarizing longer passages of corresponding audio and sheet music, correlated through an appropriate similarity structure.
results: The approach removes the need for strongly aligned training data and handles the musical-content discrepancies between audio and sheet-music snippets caused by local and global tempo differences; experiments on synthetic and real piano data show more accurate retrieval in all configurations while requiring only weakly aligned audio-sheet music pairs.
Abstract
Many applications of cross-modal music retrieval are related to connecting sheet music images to audio recordings. A typical and recent approach to this is to learn, via deep neural networks, a joint embedding space that correlates short fixed-size snippets of audio and sheet music by means of an appropriate similarity structure. However, two challenges that arise out of this strategy are the requirement of strongly aligned data to train the networks, and the inherent discrepancies of musical content between audio and sheet music snippets caused by local and global tempo differences. In this paper, we address these two shortcomings by designing a cross-modal recurrent network that learns joint embeddings that can summarize longer passages of corresponding audio and sheet music. The benefits of our method are that it only requires weakly aligned audio-sheet music pairs, as well as that the recurrent network handles the non-linearities caused by tempo variations between audio and sheet music. We conduct a number of experiments on synthetic and real piano data and scores, showing that our proposed recurrent method leads to more accurate retrieval in all possible configurations.
results: Achieves up to 25% lower peak memory use and 15% faster training while maintaining the same level of accuracy.
Abstract
Traditional optimization methods rely on the use of single-precision floating point arithmetic, which can be costly in terms of memory size and computing power. However, mixed precision optimization techniques leverage the use of both single and half-precision floating point arithmetic to reduce memory requirements while maintaining model accuracy. We provide here an algorithm to further reduce memory usage during the training of a model by getting rid of the floating point copy of the parameters, virtually keeping only half-precision numbers. We also explore the benefits of getting rid of the gradient's value by executing the optimizer step during the back-propagation. In practice, we achieve up to 25% lower peak memory use and 15% faster training while maintaining the same level of accuracy.
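A hedged sketch of the "optimizer step during back-propagation" idea, assuming PyTorch >= 2.1 (whose `register_post_accumulate_grad_hook` supports exactly this optimizer-in-backward pattern) and a CUDA device for fp16 matmuls. The parameters are kept in half precision only, with no fp32 master copy; this is an illustration of the mechanism, not the paper's implementation.

```python
import torch

model = torch.nn.Linear(512, 512).cuda().half()   # parameters kept in fp16 only

# One optimizer per parameter, so each can step as soon as its grad is ready.
opt_per_param = {p: torch.optim.SGD([p], lr=1e-3) for p in model.parameters()}

def step_on_grad(param):
    # Runs during back-propagation, right after param.grad is accumulated.
    opt_per_param[param].step()
    param.grad = None                  # free the gradient memory immediately

for p in model.parameters():
    p.register_post_accumulate_grad_hook(step_on_grad)

x = torch.randn(32, 512, device="cuda", dtype=torch.half)
loss = model(x).square().mean()
loss.backward()                        # parameters are updated as grads arrive
```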
methods: An online clustering approach over a dynamically updated finite pool of samples or gradients, avoiding the need to provide the algorithm with information about task changes.
results: Successfully counteracts catastrophic forgetting in domain-incremental learning, with experiments on real datasets comparing against state-of-the-art methods.
Abstract
We consider the problem of learning multiple tasks in a continual learning setting in which data from different tasks is presented to the learner in a streaming fashion. A key challenge in this setting is the so-called "catastrophic forgetting problem", in which the performance of the learner in an "old task" decreases when subsequently trained on a "new task". Existing continual learning methods, such as Averaged Gradient Episodic Memory (A-GEM) and Orthogonal Gradient Descent (OGD), address catastrophic forgetting by minimizing the loss for the current task without increasing the loss for previous tasks. However, these methods assume the learner knows when the task changes, which is unrealistic in practice. In this paper, we alleviate the need to provide the algorithm with information about task changes by using an online clustering-based approach on a dynamically updated finite pool of samples or gradients. We thereby successfully counteract catastrophic forgetting in one of the hardest settings, namely: domain-incremental learning, a setting for which the problem was previously unsolved. We showcase the benefits of our approach by applying these ideas to projection-based methods, such as A-GEM and OGD, which lead to task-agnostic versions of them. Experiments on real datasets demonstrate the effectiveness of the proposed strategy and its promising performance compared to state-of-the-art methods.
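The pool-and-cluster idea can be sketched as follows: flattened gradients enter a finite pool, and an online clusterer assigns each incoming batch to a task-like group without any task labels. This is a hypothetical illustration using scikit-learn's `MiniBatchKMeans`, not the paper's algorithm; pool size and cluster count are arbitrary.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Finite pool of recent gradient vectors, clustered online into task-like groups.
POOL_SIZE, N_CLUSTERS = 200, 5
pool = []
clusterer = MiniBatchKMeans(n_clusters=N_CLUSTERS)

def assign_task(grad_vec):
    """Update the pool with a flattened gradient and return its cluster id."""
    pool.append(grad_vec)
    if len(pool) > POOL_SIZE:
        pool.pop(0)                              # keep the pool finite
    if len(pool) >= N_CLUSTERS:                  # need enough points to fit
        clusterer.partial_fit(np.stack(pool[-N_CLUSTERS:]))
        return int(clusterer.predict(grad_vec[None])[0])
    return 0                                     # default cluster while warming up
```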
S-GBDT: Frugal Differentially Private Gradient Boosting Decision Trees
methods: Four main techniques to improve the utility-privacy tradeoff: (1) an improved noise-scaling approach with tighter accounting of the privacy leakage of a decision-tree leaf; (2) individual Rényi filters integrated into the method to learn from data points that have been underutilized during iterative training; (3) random decision-tree splits to concentrate the privacy budget on learning leaves; (4) subsampling for privacy amplification.
results: On the Abalone dataset (<4k training points) the method reaches an $R^2$-score of 0.39 at privacy level $\varepsilon=0.15$, which the closest prior work only achieved at $\varepsilon=10.0$; on the Adult dataset (50k training points) it reaches a test error of 18.7% at $\varepsilon=0.07$, versus $\varepsilon=1.0$ for the closest prior work. At $\varepsilon=0.54$ it achieves an $R^2$-score of 0.47 on Abalone and a test error of 17.1% on Adult, close to the non-private GBDT results of 0.54 and 13.7%, respectively.
Abstract
Privacy-preserving learning of gradient boosting decision trees (GBDT) has the potential for strong utility-privacy tradeoffs for tabular data, such as census data or medical meta data: classical GBDT learners can extract non-linear patterns from small sized datasets. The state-of-the-art notion for provable privacy-properties is differential privacy, which requires that the impact of single data points is limited and deniable. We introduce a novel differentially private GBDT learner and utilize four main techniques to improve the utility-privacy tradeoff. (1) We use an improved noise scaling approach with tighter accounting of privacy leakage of a decision tree leaf compared to prior work, resulting in noise that in expectation scales with $O(1/n)$, for $n$ data points. (2) We integrate individual R\'enyi filters to our method to learn from data points that have been underutilized during an iterative training process, which -- potentially of independent interest -- results in a natural yet effective insight to learning streams of non-i.i.d. data. (3) We incorporate the concept of random decision tree splits to concentrate privacy budget on learning leaves. (4) We deploy subsampling for privacy amplification. Our evaluation shows for the Abalone dataset ($<4k$ training data points) a $R^2$-score of $0.39$ for $\varepsilon=0.15$, which the closest prior work only achieved for $\varepsilon=10.0$. On the Adult dataset ($50k$ training data points) we achieve test error of $18.7\,\%$ for $\varepsilon=0.07$ which the closest prior work only achieved for $\varepsilon=1.0$. For the Abalone dataset for $\varepsilon=0.54$ we achieve $R^2$-score of $0.47$ which is very close to the $R^2$-score of $0.54$ for the nonprivate version of GBDT. For the Adult dataset for $\varepsilon=0.54$ we achieve test error $17.1\,\%$ which is very close to the test error $13.7\,\%$ of the nonprivate version of GBDT.
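To show where the privacy budget is spent in any DP-GBDT learner, here is a generic Laplace-noised leaf value. This is deliberately the textbook mechanism, not the paper's tighter accounting that makes the noise scale as $O(1/n)$ in expectation; the clipping bound and seed are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_leaf_value(residuals, epsilon, clip=1.0):
    """Differentially private leaf prediction for a regression tree.

    Residuals are clipped to [-clip, clip]; replacing one data point then
    changes the mean by at most 2*clip/n, which calibrates the Laplace noise.
    """
    r = np.clip(residuals, -clip, clip)
    n = max(len(r), 1)
    sensitivity = 2.0 * clip / n
    return r.mean() + rng.laplace(scale=sensitivity / epsilon)
```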
Uplift vs. predictive modeling: a theoretical analysis
results: Uplift modeling can outperform the predictive approach in some settings, but this depends on parameters such as the mutual information between features and outcome, the variance of the estimators, the distribution of the potential outcomes, and the underlying costs and benefits of the treatment and the outcome.
Abstract
Despite the growing popularity of machine-learning techniques in decision-making, the added value of causal-oriented strategies with respect to pure machine-learning approaches has rarely been quantified in the literature. These strategies are crucial for practitioners in various domains, such as marketing, telecommunications, health care and finance. This paper presents a comprehensive treatment of the subject, starting from firm theoretical foundations and highlighting the parameters that influence the performance of the uplift and predictive approaches. The focus of the paper is on a binary outcome case and a binary action, and the paper presents a theoretical analysis of uplift modeling, comparing it with the classical predictive approach. The main research contributions of the paper include a new formulation of the measure of profit, a formal proof of the convergence of the uplift curve to the measure of profit, and an illustration, through simulations, of the conditions under which predictive approaches still outperform uplift modeling. We show that the mutual information between the features and the outcome plays a significant role, along with the variance of the estimators, the distribution of the potential outcomes and the underlying costs and benefits of the treatment and the outcome.
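The uplift quantity under comparison is easiest to see with the classical two-model (T-learner) estimator, $u(x) = P(y=1 \mid x, t=1) - P(y=1 \mid x, t=0)$. Below is a hedged sketch on hypothetical randomized-trial data; data-generating details are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def t_learner_uplift(X, treated, y):
    """Two-model uplift estimate: u(x) = P(y=1|x,t=1) - P(y=1|x,t=0)."""
    m1 = LogisticRegression(max_iter=1000).fit(X[treated], y[treated])
    m0 = LogisticRegression(max_iter=1000).fit(X[~treated], y[~treated])
    return m1.predict_proba(X)[:, 1] - m0.predict_proba(X)[:, 1]

# Hypothetical randomized trial: features, binary action, binary outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
treated = rng.random(1000) < 0.5
y = (rng.random(1000) < 0.3 + 0.2 * treated * (X[:, 0] > 0)).astype(int)
uplift = t_learner_uplift(X, treated, y)   # per-individual treatment effect
```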
Human-in-the-Loop Causal Discovery under Latent Confounding using Ancestral GFlowNets
paper_authors: Tiago da Silva, Eliezer Silva, Adèle Ribeiro, António Góis, Dominik Heider, Samuel Kaski, Diego Mesquita
for: Addressing the brittleness of causal discovery algorithms when dealing with scarce data and latent confounders, by incorporating expert knowledge to improve the inference process.
methods: Generative flow networks sample causal ancestral graphs proportionally to a belief distribution based on a score function, such as the Bayesian information criterion (BIC); an optimal experimental design iteratively probes the expert about the relations among variables, and the samples are updated with human feedback via importance sampling.
results: Experiments with synthetic observational data show that the method can accurately sample from distributions over ancestral graphs and greatly improve inference quality with human aid.
Abstract
Structure learning is the crux of causal inference. Notably, causal discovery (CD) algorithms are brittle when data is scarce, possibly inferring imprecise causal relations that contradict expert knowledge -- especially when considering latent confounders. To aggravate the issue, most CD methods do not provide uncertainty estimates, making it hard for users to interpret results and improve the inference process. Surprisingly, while CD is a human-centered affair, no works have focused on building methods that both 1) output uncertainty estimates that can be verified by experts and 2) interact with those experts to iteratively refine CD. To solve these issues, we start by proposing to sample (causal) ancestral graphs proportionally to a belief distribution based on a score function, such as the Bayesian information criterion (BIC), using generative flow networks. Then, we leverage the diversity in candidate graphs and introduce an optimal experimental design to iteratively probe the expert about the relations among variables, effectively reducing the uncertainty of our belief over ancestral graphs. Finally, we update our samples to incorporate human feedback via importance sampling. Importantly, our method does not require causal sufficiency (i.e., unobserved confounders may exist). Experiments with synthetic observational data show that our method can accurately sample from distributions over ancestral graphs and that we can greatly improve inference quality with human aid.
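The human-feedback update reduces to a one-line importance-sampling reweighting of the sampled graphs. The sketch below is a hypothetical illustration of that step only (graph objects, the agreement map, and the expert error rate are invented names), not the paper's full pipeline.

```python
import numpy as np

def reweight_with_feedback(weights, graphs, answer_prob):
    """Importance-sampling update of graph weights after one expert answer.

    weights: current (unnormalized) weights of the sampled ancestral graphs.
    answer_prob(g): likelihood of the expert's stated relation under graph g,
    e.g. 1 - error_rate if g agrees with the expert and error_rate otherwise.
    """
    w = np.asarray(weights, dtype=float)
    w *= np.array([answer_prob(g) for g in graphs])
    return w / w.sum()                 # renormalized posterior over graphs

# Expert (error rate 0.1) asserts "A causes B"; suppose only G1 and G3 agree.
graphs = ["G1", "G2", "G3"]
agrees = {"G1": True, "G2": False, "G3": True}
post = reweight_with_feedback([1/3, 1/3, 1/3], graphs,
                              lambda g: 0.9 if agrees[g] else 0.1)
```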
Methods for generating and evaluating synthetic longitudinal patient data: a systematic review
paper_authors: Katariina Perkonoja, Kari Auranen, Joni Virta
for: This paper is written for researchers and developers who are interested in generating and evaluating synthetic longitudinal patient data in medicine, with the aim of addressing the issue of data privacy and availability.
methods: The paper presents a systematic review of 17 methods for generating and evaluating synthetic longitudinal patient data, including traditional simulation techniques and modern deep learning methods. The methods are evaluated based on their type, source code availability, and approaches used to assess resemblance, utility, and privacy.
results: The paper provides a comprehensive overview of the existing methods for generating and evaluating synthetic longitudinal patient data, and discusses practical guidelines and key considerations for developing such methods. The paper also highlights the challenges and limitations of these methods, and identifies future research directions in this area.
Abstract
The proliferation of data in recent years has led to the advancement and utilization of various statistical and deep learning techniques, thus expediting research and development activities. However, not all industries have benefited equally from the surge in data availability, partly due to legal restrictions on data usage and privacy regulations, such as in medicine. To address this issue, various statistical disclosure and privacy-preserving methods have been proposed, including the use of synthetic data generation. Synthetic data are generated based on some existing data, with the aim of replicating them as closely as possible and acting as a proxy for real sensitive data. This paper presents a systematic review of methods for generating and evaluating synthetic longitudinal patient data, a prevalent data type in medicine. The review adheres to the PRISMA guidelines and covers literature from five databases until the end of 2022. The paper describes 17 methods, ranging from traditional simulation techniques to modern deep learning methods. The collected information includes, but is not limited to, method type, source code availability, and approaches used to assess resemblance, utility, and privacy. Furthermore, the paper discusses practical guidelines and key considerations for developing synthetic longitudinal data generation methods.
Robust Approximation Algorithms for Non-monotone $k$-Submodular Maximization under a Knapsack Constraint
results: On several experimental instances (Influence Maximization and Sensor Placement), the proposed algorithms preserve the theoretical solution quality while significantly reducing the number of queries.
Abstract
The problem of non-monotone $k$-submodular maximization under a knapsack constraint ($\kSMK$) over a ground set of size $n$ arises in many machine learning applications, such as data summarization and information propagation. However, existing algorithms for the problem face two questions: how to handle the non-monotone case, and how to return a good solution quickly when the data is large. This paper introduces two deterministic approximation algorithms for the problem that competitively improve the query complexity of existing algorithms. Our first algorithm, $\LAA$, returns an approximation ratio of $1/19$ within $O(nk)$ query complexity. The second one, $\RLA$, improves the approximation ratio to $1/5-\epsilon$ in $O(nk)$ queries, where $\epsilon$ is an input parameter. Our algorithms are the first to provide constant approximation ratios within only $O(nk)$ query complexity for the non-monotone objective. They therefore need fewer queries than state-of-the-art algorithms by a factor of $\Omega(\log n)$. Besides the theoretical analysis, we have evaluated the proposed algorithms in several experimental instances, Influence Maximization and Sensor Placement, for the problem. The results confirm that our algorithms ensure theoretical quality on par with cutting-edge techniques while significantly reducing the number of queries.
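The abstract does not spell out $\LAA$ or $\RLA$. As a hedged illustration of the value-oracle query model they operate in, here is a plain cost-ratio greedy baseline (explicitly not the paper's algorithms), where each marginal-gain evaluation counts as one oracle query. The toy objective in the demo is modular, hence trivially $k$-submodular.

```python
def greedy_ksubmodular_knapsack(n, k, cost, budget, f):
    """Cost-ratio greedy for k-submodular maximization under a knapsack.

    f(solution) evaluates a solution given as a dict {item: position in 1..k};
    each marginal-gain computation below is one value-oracle query.
    """
    solution, spent = {}, 0.0
    while True:
        base = f(solution)
        best = None
        for e in range(n):
            if e in solution or spent + cost[e] > budget:
                continue
            for pos in range(1, k + 1):
                gain = f({**solution, e: pos}) - base
                ratio = gain / cost[e]
                if gain > 0 and (best is None or ratio > best[0]):
                    best = (ratio, e, pos)
        if best is None:
            return solution
        _, e, pos = best
        solution[e] = pos
        spent += cost[e]

# Toy check with a modular (hence k-submodular) objective.
vals = [[0.0, 2.0, 1.0], [0.0, 1.0, 3.0], [0.0, 2.5, 0.5]]
f = lambda sol: sum(vals[e][pos] for e, pos in sol.items())
print(greedy_ksubmodular_knapsack(3, 2, cost=[1.0, 2.0, 1.5], budget=3.0, f=f))
```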
Enhancing SAEAs with Unevaluated Solutions: A Case Study of Relation Model for Expensive Optimization
results: On two test suites, the relation model shows a clear advantage over regression and classification models in the selection phase, and the surrogate-selected unevaluated solutions significantly improve the efficiency of the algorithm.
Abstract
Surrogate-assisted evolutionary algorithms (SAEAs) hold significant importance in resolving expensive optimization problems~(EOPs). Extensive efforts have been devoted to improving the efficacy of SAEAs through the development of proficient model-assisted selection methods. However, generating high-quality solutions is a prerequisite for selection. The fundamental paradigm of evaluating a limited number of solutions in each generation within SAEAs reduces the variance of adjacent populations, thus impacting the quality of offspring solutions. This is a frequently encountered issue, yet it has not gained widespread attention. This paper presents a framework using unevaluated solutions to enhance the efficiency of SAEAs. The surrogate model is employed to identify high-quality solutions for direct generation of new solutions without evaluation. To ensure dependable selection, we have introduced two tailored relation models for the selection of the optimal solution and the unevaluated population. A comprehensive experimental analysis is performed on two test suites, which showcases the superiority of the relation model over regression and classification models in the selection phase. Furthermore, the surrogate-selected unevaluated solutions with high potential have been shown to significantly enhance the efficiency of the algorithm.
Variational Connectionist Temporal Classification for Order-Preserving Sequence Modeling
methods: The paper combines CTC with a variational model and derives two versions of variational CTC under two different assumptions: that the variational latent variables at each time step are conditionally independent, and that these latent variables are Markovian.
results: Both loss functions allow direct optimization of the variational lower bound on the model log-likelihood, and computationally tractable forms are presented for implementing them.
Abstract
Connectionist temporal classification (CTC) is commonly adopted for sequence modeling tasks like speech recognition, where it is necessary to preserve order between the input and target sequences. However, CTC is only applied to deterministic sequence models, where the latent space is discontinuous and sparse, which in turn makes them less capable of handling data variability when compared to variational models. In this paper, we integrate CTC with a variational model and derive loss functions that can be used to train more generalizable sequence models that preserve order. Specifically, we derive two versions of the novel variational CTC based on two reasonable assumptions, the first being that the variational latent variables at each time step are conditionally independent; and the second being that these latent variables are Markovian. We show that both loss functions allow direct optimization of the variational lower bound for the model log-likelihood, and present computationally tractable forms for implementing them.
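One plausible reading of the conditionally independent variant is a negative ELBO pairing torch's built-in CTC loss (the reconstruction term, conditioned on latents drawn with the reparameterization trick) with a Gaussian KL term. The paper's exact derivation may differ; the module names and shapes below are hypothetical.

```python
import torch
import torch.nn.functional as F

def variational_ctc_loss(mu, logvar, decoder, targets, in_lens, tgt_lens):
    """Negative variational lower bound with a CTC reconstruction term.

    mu, logvar: (T, N, D) per-time-step Gaussian posteriors (conditionally
    independent latents); decoder maps latents to (T, N, C) class logits.
    """
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterize
    log_probs = F.log_softmax(decoder(z), dim=-1)           # (T, N, C)
    nll = F.ctc_loss(log_probs, targets, in_lens, tgt_lens, blank=0)
    # KL(q(z|x) || N(0, I)), summed over time and latent dims, batch-averaged.
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=(0, 2)).mean()
    return nll + kl

# Hypothetical sizes: 50 frames, batch 4, 20 classes, 32-dim latents.
T, N, C, D, S = 50, 4, 20, 32, 12
decoder = torch.nn.Linear(D, C)
mu, logvar = torch.randn(T, N, D), torch.zeros(T, N, D)
targets = torch.randint(1, C, (N, S))
loss = variational_ctc_loss(mu, logvar, decoder, targets,
                            torch.full((N,), T, dtype=torch.long),
                            torch.full((N,), S, dtype=torch.long))
```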
Generating Hierarchical Structures for Improved Time Series Classification Using Stochastic Splitting Functions
results: Using the rocket and svm classifiers with different splitting functions, the method significantly improves classification performance on approximately half and a third of the datasets, respectively. The study also explores how dataset characteristics and hierarchical structure relate to HC performance.
Abstract
This study introduces a novel hierarchical divisive clustering approach with stochastic splitting functions (SSFs) to enhance classification performance in multi-class datasets through hierarchical classification (HC). The method has the unique capability of generating hierarchy without requiring explicit information, making it suitable for datasets lacking prior knowledge of hierarchy. By systematically dividing classes into two subsets based on their discriminability according to the classifier, the proposed approach constructs a binary tree representation of hierarchical classes. The approach is evaluated on 46 multi-class time series datasets using popular classifiers (svm and rocket) and SSFs (potr, srtr, and lsoo). The results reveal that the approach significantly improves classification performance in approximately half and a third of the datasets when using rocket and svm as the classifier, respectively. The study also explores the relationship between dataset features and HC performance. While the number of classes and flat classification (FC) score show consistent significance, variations are observed with different splitting functions. Overall, the proposed approach presents a promising strategy for enhancing classification by generating hierarchical structure in multi-class time series datasets. Future research directions involve exploring different splitting functions, classifiers, and hierarchy structures, as well as applying the approach to diverse domains beyond time series data. The source code is made openly available to facilitate reproducibility and further exploration of the method.
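The divisive construction is a short recursion: split the class set in two with a stochastic splitting function, then recurse on each side. The sketch below uses a purely random bipartition as a stand-in for the paper's potr/srtr/lsoo splitting functions; it shows the tree-building skeleton only.

```python
import random

def build_hierarchy(classes, split_fn, rng=random.Random(0)):
    """Recursively split a class set into a binary tree of sub-problems."""
    if len(classes) == 1:
        return classes[0]                       # leaf: a single class
    left, right = split_fn(classes, rng)
    return (build_hierarchy(left, split_fn, rng),
            build_hierarchy(right, split_fn, rng))

def random_split(classes, rng):
    """One stochastic splitting function: a random non-trivial bipartition."""
    shuffled = classes[:]
    rng.shuffle(shuffled)
    cut = rng.randint(1, len(shuffled) - 1)
    return shuffled[:cut], shuffled[cut:]

tree = build_hierarchy(list(range(8)), random_split)   # nested-tuple hierarchy
```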
results: The authors introduce the concept of indirect immunity (i.e., immunity through a mediator) and repeat the previous analysis for it; they also propose a method for sensitivity analysis of the probability of immunity under unmeasured confounding.
Abstract
This work is devoted to the study of the probability of immunity, i.e. the effect occurs whether exposed or not. We derive necessary and sufficient conditions for non-immunity and $\epsilon$-bounded immunity, i.e. the probability of immunity is zero and $\epsilon$-bounded, respectively. The former allows us to estimate the probability of benefit (i.e., the effect occurs if and only if exposed) from a randomized controlled trial, and the latter allows us to produce bounds of the probability of benefit that are tighter than the existing ones. We also introduce the concept of indirect immunity (i.e., through a mediator) and repeat our previous analysis for it. Finally, we propose a method for sensitivity analysis of the probability of immunity under unmeasured confounding.
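For orientation, with binary exposure $X$ and outcome $Y$, the classical experimental-data bounds on the probability of benefit (Tian and Pearl, 2000), which we take to be the "existing ones" that the $\epsilon$-bounded immunity analysis tightens, read as follows.

```latex
% Probability of benefit: PB = P(y_{x}, y'_{x'}) (the effect occurs iff exposed).
% Classical bounds from experimental data (Tian & Pearl, 2000):
\max\{0,\; P(y \mid do(x)) - P(y \mid do(x'))\}
\;\le\; P(y_{x}, y'_{x'}) \;\le\;
\min\{P(y \mid do(x)),\; P(y' \mid do(x'))\}
```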
A Machine Learning-oriented Survey on Tiny Machine Learning
results: The paper describes three workflows for implementing a TinyML-based system (ML-oriented, HW-oriented, and co-design) and examines the learning landscape of TinyML in detail, covering the different families of model optimization and design as well as state-of-the-art learning techniques.
Abstract
The emergence of Tiny Machine Learning (TinyML) has positively revolutionized the field of Artificial Intelligence by promoting the joint design of resource-constrained IoT hardware devices and their learning-based software architectures. TinyML carries an essential role within the fourth and fifth industrial revolutions in helping societies, economies, and individuals employ effective AI-infused computing technologies (e.g., smart cities, automotive, and medical robotics). Given its multidisciplinary nature, the field of TinyML has been approached from many different angles: this comprehensive survey wishes to provide an up-to-date overview focused on all the learning algorithms within TinyML-based solutions. The survey is based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) methodological flow, allowing for a systematic and complete literature survey. In particular, firstly we will examine the three different workflows for implementing a TinyML-based system, i.e., ML-oriented, HW-oriented, and co-design. Secondly, we propose a taxonomy that covers the learning panorama under the TinyML lens, examining in detail the different families of model optimization and design, as well as the state-of-the-art learning techniques. Thirdly, this survey will present the distinct features of hardware devices and software tools that represent the current state-of-the-art for TinyML intelligent edge applications. Finally, we discuss the challenges and future directions.
Shedding Light on the Ageing of Extra Virgin Olive Oil: Probing the Impact of Temperature with Fluorescence Spectroscopy and Machine Learning Techniques
results: Fluorescence spectroscopy can accurately monitor the degree of EVOO oxidation, and machine learning applied to highly aggregated data makes the assessment feasible under field conditions rather than only in specialized laboratories.
Abstract
This work systematically investigates the oxidation of extra virgin olive oil (EVOO) under accelerated storage conditions with UV absorption and total fluorescence spectroscopy. With the large amount of data collected, it proposes a method to monitor the oil's quality based on machine learning applied to highly-aggregated data. EVOO is a high-quality vegetable oil that has earned worldwide reputation for its numerous health benefits and excellent taste. Despite its outstanding quality, EVOO degrades over time owing to oxidation, which can affect both its health qualities and flavour. Therefore, it is highly relevant to quantify the effects of oxidation on EVOO and develop methods to assess it that can be easily implemented under field conditions, rather than in specialized laboratories. The following study demonstrates that fluorescence spectroscopy has the capability to monitor the effect of oxidation and assess the quality of EVOO, even when the data are highly aggregated. It shows that complex laboratory equipment is not necessary to exploit fluorescence spectroscopy using the proposed method and that cost-effective solutions, which can be used in-field by non-scientists, could provide an easily-accessible assessment of the quality of EVOO.
Phase Synchrony Component Self-Organization in Brain Computer Interface
paper_authors: Xu Niu, Na Lu, Huan Luo, Ruofan Yan
for: This paper aims to develop a deep learning end-to-end network for motor imagery (MI) classification based on phase synchrony information, which can automatically extract optimal filters for preprocessing and channel selection, and achieve better performance than traditional methods.
methods: The proposed method uses a deep learning network to directly extract phase synchrony-based features from raw EEG signals and perform classification. The network learns optimal filters during training, which are obtained when the network achieves peak classification results.
results: The proposed method outperforms state-of-the-art methods and discovers significant phase synchronization phenomena in tongue MI, with an average PLV exceeding 0.87 across all tongue MI samples. This high PLV indicates a groundbreaking discovery in the synchrony pattern of tongue MI.
Abstract
Phase synchrony information plays a crucial role in analyzing functional brain connectivity and identifying brain activities. A widely adopted feature extraction pipeline, composed of preprocessing, selection of EEG acquisition channels, and phase locking value (PLV) calculation, has achieved success in motor imagery classification (MI). However, this pipeline is manual and reliant on expert knowledge, limiting its convenience and adaptability to different application scenarios. Moreover, most studies have employed mediocre data-independent spatial filters to suppress noise, impeding the exploration of more significant phase synchronization phenomena. To address the issues, we propose the concept of phase synchrony component self-organization, which enables the adaptive learning of data-dependent spatial filters for automating both the preprocessing and channel selection procedures. Based on this concept, the first deep learning end-to-end network is developed, which directly extracts phase synchrony-based features from raw EEG signals and perform classification. The network learns optimal filters during training, which are obtained when the network achieves peak classification results. Extensive experiments have demonstrated that our network outperforms state-of-the-art methods. Remarkably, through the learned optimal filters, significant phase synchronization phenomena can be observed. Specifically, by calculating the PLV between a pair of signals extracted from each sample using two of the learned spatial filters, we have obtained an average PLV exceeding 0.87 across all tongue MI samples. This high PLV indicates a groundbreaking discovery in the synchrony pattern of tongue MI.
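The PLV quantity the paper reports is standard and easy to compute: the magnitude of the mean complex phase difference between two signals, with instantaneous phases taken from the analytic (Hilbert) signal. The synthetic check below is illustrative only.

```python
import numpy as np
from scipy.signal import hilbert

def phase_locking_value(sig1, sig2):
    """PLV between two equal-length signals (e.g., two spatially filtered
    EEG components): |mean of exp(i * phase difference)|, a value in [0, 1]."""
    phase1 = np.angle(hilbert(sig1))       # instantaneous phase via Hilbert
    phase2 = np.angle(hilbert(sig2))
    return np.abs(np.exp(1j * (phase1 - phase2)).mean())

# Hypothetical check: two noisy signals sharing a 10 Hz component.
t = np.linspace(0, 2, 500)
rng = np.random.default_rng(0)
a = np.sin(2 * np.pi * 10 * t) + 0.3 * rng.standard_normal(t.size)
b = np.sin(2 * np.pi * 10 * t + 0.8) + 0.3 * rng.standard_normal(t.size)
print(phase_locking_value(a, b))           # close to 1 for phase-locked signals
```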
From Peptides to Nanostructures: A Euclidean Transformer for Fast and Stable Machine Learned Force Fields
paper_authors: J. Thorben Frank, Oliver T. Unke, Klaus-Robert Müller, Stefan Chmiela
for: Improving the stability and efficiency of machine-learned force fields (MLFFs) in molecular dynamics (MD) simulations, particularly for systems with large numbers of degrees of freedom.
methods: A transformer architecture called SO3krates that combines sparse equivariant representations (Euclidean variables) with a self-attention mechanism separating invariant and equivariant information, eliminating the need for expensive tensor products.
results: SO3krates generates stable MD trajectories for flexible peptides and supra-molecular structures with hundreds of atoms, and explores the PES topology of medium-sized chainlike molecules (e.g., small peptides) over thousands of minima, balancing stability against the emergence of new minimum-energy conformations beyond the training data, which is crucial for realistic exploration tasks in biochemistry.
Abstract
Recent years have seen vast progress in the development of machine learned force fields (MLFFs) based on ab-initio reference calculations. Despite achieving low test errors, the suitability of MLFFs in molecular dynamics (MD) simulations is being increasingly scrutinized due to concerns about instability. Our findings suggest a potential connection between MD simulation stability and the presence of equivariant representations in MLFFs, but their computational cost can limit practical advantages they would otherwise bring. To address this, we propose a transformer architecture called SO3krates that combines sparse equivariant representations (Euclidean variables) with a self-attention mechanism that can separate invariant and equivariant information, eliminating the need for expensive tensor products. SO3krates achieves a unique combination of accuracy, stability, and speed that enables insightful analysis of quantum properties of matter on unprecedented time and system size scales. To showcase this capability, we generate stable MD trajectories for flexible peptides and supra-molecular structures with hundreds of atoms. Furthermore, we investigate the PES topology for medium-sized chainlike molecules (e.g., small peptides) by exploring thousands of minima. Remarkably, SO3krates demonstrates the ability to strike a balance between the conflicting demands of stability and the emergence of new minimum-energy conformations beyond the training data, which is crucial for realistic exploration tasks in the field of biochemistry.
Limited Communications Distributed Optimization via Deep Unfolded Distributed ADMM
methods: Unfolded D-ADMM follows the emerging deep unfolding methodology, iteratively combining local computations and message exchanges while using data to tune the hyperparameters of each iteration of the algorithm.
results: Numerical results show that unfolded D-ADMM dramatically reduces the number of messages exchanged compared with D-ADMM while preserving its performance; the method is specialized to distributed estimation and distributed learning scenarios.
Abstract
Distributed optimization is a fundamental framework for collaborative inference and decision making in decentralized multi-agent systems. The operation is modeled as the joint minimization of a shared objective which typically depends on observations gathered locally by each agent. Distributed optimization algorithms, such as the common D-ADMM, tackle this task by iteratively combining local computations and message exchanges. One of the main challenges associated with distributed optimization, and particularly with D-ADMM, is that it requires a large number of communications, i.e., messages exchanged between the agents, to reach consensus. This can make D-ADMM costly in power, latency, and channel resources. In this work we propose unfolded D-ADMM, which follows the emerging deep unfolding methodology to enable D-ADMM to operate reliably with a predefined and small number of messages exchanged by each agent. Unfolded D-ADMM fully preserves the operation of D-ADMM, while leveraging data to tune the hyperparameters of each iteration of the algorithm. These hyperparameters can either be agent-specific, aiming at achieving the best performance within a fixed number of iterations over a given network, or shared among the agents, allowing to learn to distributedly optimize over different networks. For both settings, our unfolded D-ADMM operates with limited communications, while preserving the interpretability and flexibility of the original D-ADMM algorithm. We specialize unfolded D-ADMM for two representative settings: a distributed estimation task, considering a sparse recovery setup, and a distributed learning scenario, where multiple agents collaborate in learning a machine learning model. Our numerical results demonstrate that the proposed approach dramatically reduces the number of communications utilized by D-ADMM, without compromising on its performance.
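Deep unfolding is easiest to see on a toy problem. The sketch below unrolls plain gradient descent on a least-squares objective with one learnable step size per iteration; it is a generic stand-in for the idea (training tunes per-iteration hyperparameters so a small, fixed iteration count suffices), not D-ADMM itself.

```python
import torch

class UnfoldedSolver(torch.nn.Module):
    """Unrolls K iterations of gradient descent on 0.5*||A x - y||^2 and
    learns a separate step size per iteration from data. The paper applies
    the same unfolding idea to the hyperparameters of distributed ADMM."""

    def __init__(self, iterations=10):
        super().__init__()
        self.steps = torch.nn.Parameter(0.1 * torch.ones(iterations))

    def forward(self, A, y):
        x = torch.zeros(A.shape[1])
        for mu in self.steps:                     # fixed, small iteration count
            x = x - mu * (A.t() @ (A @ x - y))    # one classical update
        return x

# Training would minimize ||solver(A, y) - x_true||^2 over example problems,
# tuning self.steps so that few iterations (few message exchanges) suffice.
```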
Activation Compression of Graph Neural Networks using Block-wise Quantization with Improved Variance Minimization
results: Experiments show that block-wise quantization of the intermediate activation maps further reduces GPU memory consumption (>15%) and speeds up training (about 5% per epoch) while maintaining similar performance trade-offs.
Abstract
Efficient training of large-scale graph neural networks (GNNs) has been studied with a specific focus on reducing their memory consumption. Work by Liu et al. (2022) proposed extreme activation compression (EXACT) which demonstrated drastic reduction in memory consumption by performing quantization of the intermediate activation maps down to using INT2 precision. They showed little to no reduction in performance while achieving large reductions in GPU memory consumption. In this work, we present an improvement to the EXACT strategy by using block-wise quantization of the intermediate activation maps. We experimentally analyze different block sizes and show further reduction in memory consumption (>15%), and runtime speedup per epoch (about 5%) even when performing extreme extents of quantization with similar performance trade-offs as with the original EXACT. Further, we present a correction to the assumptions on the distribution of intermediate activation maps in EXACT (assumed to be uniform) and show improved variance estimations of the quantization and dequantization steps.
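A minimal sketch of block-wise asymmetric quantization of an activation map, with one scale and zero point per block. Real INT2 kernels pack four 2-bit values per byte; for clarity the sketch stores them unpacked in uint8. Block size and shapes are illustrative, and this is not the paper's implementation.

```python
import torch

def blockwise_quantize(x, block=64, bits=2):
    """Asymmetric per-block quantization of a flattened activation map."""
    flat = x.reshape(-1, block)                        # assumes size % block == 0
    lo = flat.min(dim=1, keepdim=True).values
    hi = flat.max(dim=1, keepdim=True).values
    scale = (hi - lo).clamp(min=1e-8) / (2**bits - 1)  # one scale per block
    q = ((flat - lo) / scale).round().to(torch.uint8)  # INT2 values in 0..3
    return q, scale, lo

def blockwise_dequantize(q, scale, lo, shape):
    return (q.float() * scale + lo).reshape(shape)

x = torch.randn(4, 256)                                # a toy activation map
q, s, z = blockwise_quantize(x)
x_hat = blockwise_dequantize(q, s, z, x.shape)         # used in the backward pass
```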
TMac: Temporal Multi-Modal Graph Learning for Acoustic Event Classification
paper_authors: Meng Liu, Ke Liang, Dayu Hu, Hao Yu, Yue Liu, Lingyuan Meng, Wenxuan Tu, Sihang Zhou, Xinwang Liu
for: Proposing TMac, a Temporal Multi-modal graph learning method for Acoustic event Classification that handles the temporal information in multi-modal data.
methods: A temporal graph is constructed for each acoustic event by dividing its audio and video data into multiple segments; the temporal relationships between the resulting nodes are modeled with graph learning techniques.
results: Experiments demonstrate that TMac outperforms other state-of-the-art models, smoothly capturing the dynamic information within and across modalities.
Abstract
Audiovisual data is everywhere in this digital age, which raises higher requirements for the deep learning models developed on them. To well handle the information of the multi-modal data is the key to a better audiovisual modal. We observe that these audiovisual data naturally have temporal attributes, such as the time information for each frame in the video. More concretely, such data is inherently multi-modal according to both audio and visual cues, which proceed in a strict chronological order. It indicates that temporal information is important in multi-modal acoustic event modeling for both intra- and inter-modal. However, existing methods deal with each modal feature independently and simply fuse them together, which neglects the mining of temporal relation and thus leads to sub-optimal performance. With this motivation, we propose a Temporal Multi-modal graph learning method for Acoustic event Classification, called TMac, by modeling such temporal information via graph learning techniques. In particular, we construct a temporal graph for each acoustic event, dividing its audio data and video data into multiple segments. Each segment can be considered as a node, and the temporal relationships between nodes can be considered as timestamps on their edges. In this case, we can smoothly capture the dynamic information in intra-modal and inter-modal. Several experiments are conducted to demonstrate TMac outperforms other SOTA models in performance. Our code is available at https://github.com/MGitHubL/TMac.
A Comprehensive Review of Community Detection in Graphs
paper_authors: Songning Lai, Jiakang Li, Yonggang Lu
for: Reviewing the detection of community structure in complex networks, a key to understanding the organization and functioning of complex systems.
methods: The paper provides a thorough exposition of various community detection methods, including a new method designed by the authors.
results: The paper explores real-world applications of community detection in diverse networks and offers a deep understanding of the challenges, methodologies, and applications of the problem.
Abstract
The study of complex networks has significantly advanced our understanding of community structures which serves as a crucial feature of real-world graphs. Detecting communities in graphs is a challenging problem with applications in sociology, biology, and computer science. Despite the efforts of an interdisciplinary community of scientists, a satisfactory solution to this problem has not yet been achieved. This review article delves into the topic of community detection in graphs, which serves as a crucial role in understanding the organization and functioning of complex systems. We begin by introducing the concept of community structure, which refers to the arrangement of vertices into clusters, with strong internal connections and weaker connections between clusters. Then, we provide a thorough exposition of various community detection methods, including a new method designed by us. Additionally, we explore real-world applications of community detection in diverse networks. In conclusion, this comprehensive review provides a deep understanding of community detection in graphs. It serves as a valuable resource for researchers and practitioners in multiple disciplines, offering insights into the challenges, methodologies, and applications of community detection in complex networks.
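The review is method-agnostic, but as a concrete entry point, modularity-based detection is a few lines with networkx; the karate-club graph is the classic benchmark with known community structure. This illustrates one family of methods the review covers, not the authors' new method.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities, modularity

# Zachary's karate club: a classic benchmark with known community structure.
G = nx.karate_club_graph()
communities = greedy_modularity_communities(G)     # list of node sets
for i, c in enumerate(communities):
    print(f"community {i}: {sorted(c)}")
print("modularity:", modularity(G, communities))   # quality of the partition
```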
Privacy-Preserving In-Context Learning with Differentially Private Few-Shot Generation
paper_authors: Xinyu Tang, Richard Shin, Huseyin A. Inan, Andre Manoel, Fatemehsadat Mireshghallah, Zinan Lin, Sivakanth Gopi, Janardhan Kulkarni, Robert Sim
results: Extensive experiments on standard benchmarks demonstrate that the algorithm achieves effective ICL under strong privacy levels, opening up new possibilities for applying ICL with privacy protection.
Abstract
We study the problem of in-context learning (ICL) with large language models (LLMs) on private datasets. This scenario poses privacy risks, as LLMs may leak or regurgitate the private examples demonstrated in the prompt. We propose a novel algorithm that generates synthetic few-shot demonstrations from the private dataset with formal differential privacy (DP) guarantees, and show empirically that it can achieve effective ICL. We conduct extensive experiments on standard benchmarks and compare our algorithm with non-private ICL and zero-shot solutions. Our results demonstrate that our algorithm can achieve competitive performance with strong privacy levels. These results open up new possibilities for ICL with privacy protection for a broad range of applications.
Extracting Physical Causality from Measurements to Detect and Localize False Data Injection Attacks
paper_authors: Shengyang Wu, Jingyu Wang, Dongyuan Shi
for: Addressing False Data Injection Attacks (FDIA) in modern cyber-physical power systems with a joint detection and localization framework based on causal inference and the Graph Attention Network (GAT).
methods: The X-learner algorithm estimates the causality strength between measurements and generates Measurement Causality Graphs (MCGs); a GAT then identifies anomaly patterns in the MCGs to locate the attacked system nodes.
results: Experimental results on the IEEE 39-bus system show that the causality-based FDIA detection and localization mechanism is highly interpretable and robust.
Abstract
False Data Injection Attack (FDIA) has become a growing concern in modern cyber-physical power systems. Most existing FDIA detection techniques project the raw measurement data into a high-dimensional latent space to separate normal and attacked samples. These approaches focus more on the statistical correlations of data values and are therefore susceptible to data distribution drifts induced by changes in system operating points or changes in FDIA types and strengths, especially for FDIA localization tasks. Causal inference, on the other hand, extracts the causality behind the coordinated fluctuations of different measurements. The causality patterns are determined by fundamental physical laws such as Ohm's Law and Kirchhoff's Law. They are sensitive to the violation of physical laws caused by FDIA, but tend to remain stable with the drift of system operating points. Leveraging this advantage, this paper proposes a joint FDIA detection and localization framework based on causal inference and the Graph Attention Network (GAT) to identify the attacked system nodes. The proposed framework consists of two levels. The lower level uses the X-learner algorithm to estimate the causality strength between measurements and generate Measurement Causality Graphs (MCGs). The upper level then applies a GAT to identify the anomaly patterns in the MCGs. Since the extracted causality patterns are intrinsically related to the measurements, it is easier for the upper level to figure out the attacked nodes than the existing FDIA localization approaches. The performance of the proposed framework is evaluated on the IEEE 39-bus system. Experimental results show that the causality-based FDIA detection and localization mechanism is highly interpretable and robust.
Unveiling Optimal SDG Pathways: An Innovative Approach Leveraging Graph Pruning and Intent Graph for Effective Recommendations
methods: The User Graph after Pruning and Intent Graph (UGPIG) method: the high-density linking capability of the pruned User Graph addresses the neglect of spatial heterogeneity in recommendation algorithms, and an Intent Graph incorporating the intent network captures preferences for attributes, including environmental elements, of target regions, alleviating the sparsity of regional historical interaction data.
results: UGPIG outperforms state-of-the-art recommendation algorithms such as KGCN, KGAT, and KGIN in sustainable development pattern recommendation, with a maximum improvement of 9.61% in Top-3 recommendation performance.
Abstract
The recommendation of appropriate development pathways, also known as ecological civilization patterns for achieving Sustainable Development Goals (namely, sustainable development patterns), are of utmost importance for promoting ecological, economic, social, and resource sustainability in a specific region. To achieve this, the recommendation process must carefully consider the region's natural, environmental, resource, and economic characteristics. However, current recommendation algorithms in the field of computer science fall short in adequately addressing the spatial heterogeneity related to environment and sparsity of regional historical interaction data, which limits their effectiveness in recommending sustainable development patterns. To overcome these challenges, this paper proposes a method called User Graph after Pruning and Intent Graph (UGPIG). Firstly, we utilize the high-density linking capability of the pruned User Graph to address the issue of spatial heterogeneity neglect in recommendation algorithms. Secondly, we construct an Intent Graph by incorporating the intent network, which captures the preferences for attributes including environmental elements of target regions. This approach effectively alleviates the problem of sparse historical interaction data in the region. Through extensive experiments, we demonstrate that UGPIG outperforms state-of-the-art recommendation algorithms like KGCN, KGAT, and KGIN in sustainable development pattern recommendations, with a maximum improvement of 9.61% in Top-3 recommendation performance.
Turaco: Complexity-Guided Data Sampling for Training Neural Surrogates of Programs
results: Experiments on a range of real-world programs show that complexity-guided sampling yields empirical improvements in surrogate accuracy.
Abstract
Programmers and researchers are increasingly developing surrogates of programs, models of a subset of the observable behavior of a given program, to solve a variety of software development challenges. Programmers train surrogates from measurements of the behavior of a program on a dataset of input examples. A key challenge of surrogate construction is determining what training data to use to train a surrogate of a given program. We present a methodology for sampling datasets to train neural-network-based surrogates of programs. We first characterize the proportion of data to sample from each region of a program's input space (corresponding to different execution paths of the program) based on the complexity of learning a surrogate of the corresponding execution path. We next provide a program analysis to determine the complexity of different paths in a program. We evaluate these results on a range of real-world programs, demonstrating that complexity-guided sampling results in empirical improvements in accuracy.
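The allocation step, splitting a sampling budget across input-space regions (execution paths) in proportion to an estimated learning complexity, can be sketched as follows. The complexity scores and budget below are hypothetical placeholders for what the paper's program analysis would produce.

```python
import numpy as np

def allocate_samples(path_complexity, budget):
    """Split a sampling budget across execution paths proportionally to the
    estimated complexity of learning a surrogate for each path."""
    c = np.asarray(path_complexity, dtype=float)
    raw = budget * c / c.sum()
    counts = np.floor(raw).astype(int)
    # Give the rounding remainder to the path with the largest fractional part.
    counts[np.argmax(raw - counts)] += budget - counts.sum()
    return counts

# Hypothetical: three program paths with analysis-derived complexity scores.
print(allocate_samples([1.0, 4.0, 2.5], budget=1000))
```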
Efficient Core-selecting Incentive Mechanism for Data Sharing in Federated Learning
paper_authors: Mengda Ji, Genjiu Xu, Jianjun Ge, Mingqiang Li
for: Developing an incentive mechanism for federated learning that encourages participants to input high-quality data truthfully and promotes stable cooperation.
methods: Game-theoretic approaches and the concept of the core from cooperative games are used to design the incentive mechanism; an efficient core-selecting mechanism based on sampling approximation reduces computational overhead by aggregating models only on sampled coalitions.
results: The proposed mechanism incentivizes inputting high-quality data and stable cooperation while reducing computational overhead compared with the exact core-selecting mechanism; extensive experiments verify its effectiveness.
Abstract
Federated learning is a distributed machine learning system that uses participants' data to train an improved global model. In federated learning, participants cooperatively train a global model, and in return they receive the global model and payments. Rational participants try to maximize their individual utility, and they will not truthfully input their high-quality data unless they are provided with satisfactory payments based on their data quality. Furthermore, federated learning benefits from the cooperative contributions of participants. Accordingly, how to establish an incentive mechanism that both incentivizes truthful data input and promotes stable cooperation has become an important issue. In this paper, we introduce a data sharing game model for federated learning and employ game-theoretic approaches to design a core-selecting incentive mechanism, utilizing a popular concept in cooperative games, the core. In federated learning, the core can be empty, rendering the core-selecting mechanism infeasible. To address this, our core-selecting mechanism employs a relaxation method and simultaneously minimizes the benefit of inputting false data for all participants. However, this mechanism is computationally expensive because it requires aggregating an exponential number of models, one for each possible coalition, which is infeasible in federated learning. To address this, we propose an efficient core-selecting mechanism based on sampling approximation that aggregates models only on sampled coalitions to approximate the exact result. Extensive experiments verify that the efficient core-selecting mechanism incentivizes high-quality data input and stable cooperation while reducing computational overhead compared to the exact core-selecting mechanism.
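The sampling approximation can be pictured as follows: rather than training and evaluating an aggregated model for every one of the 2^n possible coalitions, the mechanism evaluates only a random sample of them. The sketch below is a minimal rendering of that idea under our own assumptions; `train_and_evaluate` is a hypothetical stand-in for aggregating a coalition's models and scoring the result, and the uniform coalition sampling is illustrative rather than the paper's sampling scheme.

```python
# Minimal sketch: approximate the characteristic function (coalition ->
# value) by evaluating only sampled coalitions instead of all 2^n of them.
# `train_and_evaluate` is a hypothetical stand-in for aggregating the
# coalition's models and measuring the resulting accuracy.
import random

def sampled_coalition_values(participants, train_and_evaluate, num_samples):
    values = {}
    for _ in range(num_samples):
        size = random.randint(1, len(participants))
        coalition = frozenset(random.sample(participants, size))
        if coalition not in values:
            values[coalition] = train_and_evaluate(coalition)
    return values  # fed into the core-selecting payment computation
```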
results: Experiments show that QSW and its Randomized Quasi-Sliced Wasserstein (RQSW) variants perform well on a variety of 3D tasks, such as point-cloud comparison, point-cloud interpolation, image style transfer, and training deep point-cloud autoencoders.Abstract
Monte Carlo (MC) approximation has been the standard computational approach for the Sliced Wasserstein (SW) distance, whose analytical form contains an intractable expectation. However, the MC method is not optimal in terms of minimizing the absolute approximation error. To provide a better class of empirical SW estimators, we propose quasi-sliced Wasserstein (QSW) approximations that rely on Quasi-Monte Carlo (QMC) methods. For a comprehensive investigation of QMC for SW, we focus on the 3D setting, specifically computing the SW between probability measures in three dimensions. In greater detail, we empirically verify various ways of constructing QMC point sets on the 3D unit hypersphere, including Gaussian-based mapping, equal-area mapping, generalized spiral points, and optimization of discrepancy energies. Furthermore, to obtain an unbiased estimator for stochastic optimization, we extend QSW to Randomized Quasi-Sliced Wasserstein (RQSW) by introducing randomness into the discussed low-discrepancy sequences. For theoretical properties, we prove the asymptotic convergence of QSW and the unbiasedness of RQSW. Finally, we conduct experiments on various 3D tasks, such as point-cloud comparison, point-cloud interpolation, image style transfer, and training deep point-cloud autoencoders, to demonstrate the favorable performance of the proposed QSW and RQSW variants.
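To make the construction tangible, here is a self-contained sketch of a QSW-style estimate between two equal-size 3D point clouds, using generalized spiral points on the unit sphere as the low-discrepancy direction set. Spiral points are one of the constructions named in the abstract, but the exact point-set and estimator details below are our own simplifications.

```python
# Sketch of quasi-sliced Wasserstein for 3D point clouds: use generalized
# spiral (golden-angle) points on S^2 as projection directions instead of
# i.i.d. uniform ones, then average one-dimensional Wasserstein costs.
import numpy as np

def spiral_points(n):
    """Generalized spiral points on the unit sphere S^2."""
    i = np.arange(n)
    z = 1.0 - (2.0 * i + 1.0) / n           # evenly spaced heights in (-1, 1)
    phi = np.pi * (1.0 + np.sqrt(5.0)) * i  # golden-angle increments
    r = np.sqrt(1.0 - z ** 2)
    return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)

def qsw_distance(x, y, n_projections=128, p=2):
    """QSW-p between two equal-size, uniformly weighted 3D point clouds."""
    thetas = spiral_points(n_projections)    # (L, 3) directions
    px, py = x @ thetas.T, y @ thetas.T      # 1D projections, shape (N, L)
    px, py = np.sort(px, axis=0), np.sort(py, axis=0)
    # For equal-size clouds, 1D Wasserstein matches sorted samples.
    return np.mean(np.abs(px - py) ** p) ** (1.0 / p)

x = np.random.randn(256, 3)
y = np.random.randn(256, 3) + 1.0
print(qsw_distance(x, y))
```

An RQSW-style variant would, for instance, apply a fresh uniformly random rotation to the direction set on each call, restoring the unbiasedness needed for stochastic optimization.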
methods: The paper instantiates the problem in the contextual linear setting and proposes an incentivized communication protocol (Inc-FedUCB), which is validated empirically across various environments.
results: Experiments show that the method achieves near-optimal regret with provable communication and incentive cost guarantees across different datasets and environments.Abstract
Most existing works on federated bandits take it for granted that all clients are altruistic about sharing their data with the server for the collective good whenever needed. Despite their compelling theoretical guarantees on performance and communication efficiency, this assumption is overly idealistic and is often violated in practice, especially when the algorithm operates over self-interested clients, who are reluctant to share data without explicit benefits. Neglecting such self-interested behaviors can significantly affect the learning efficiency and even the practical operability of federated bandit learning. In light of this, we aim to spark new insights into this under-explored research area by formally introducing an incentivized communication problem for federated bandits, where the server must motivate clients to share data by providing incentives. Without loss of generality, we instantiate this bandit problem in the contextual linear setting and propose the first incentivized communication protocol, Inc-FedUCB, which achieves near-optimal regret with provable communication and incentive cost guarantees. Extensive empirical experiments on both synthetic and real-world datasets further validate the effectiveness of the proposed method across various environments.
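As a toy rendering of the incentive problem (not the Inc-FedUCB protocol itself), the sketch below has the server spend a per-round budget buying data from self-interested clients, each of whom shares only if the offered payment covers its private sharing cost; the greedy utility-per-cost rule is purely an assumption for illustration.

```python
# Toy sketch of incentivized data sharing: a self-interested client shares
# only when the payment covers its cost. The greedy utility-per-cost rule
# is an illustrative assumption, not the Inc-FedUCB incentive search.
def incentivized_round(clients, budget):
    """clients: list of (client_id, sharing_cost, data_utility) tuples."""
    ranked = sorted(clients, key=lambda c: c[2] / c[1], reverse=True)
    shared, spent = [], 0.0
    for cid, cost, _util in ranked:
        if spent + cost <= budget:
            shared.append(cid)  # payment equals cost, so the client accepts
            spent += cost
    return shared, spent

print(incentivized_round([("c1", 1.0, 5.0), ("c2", 2.0, 4.0)], budget=2.5))
```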