paper_authors: Jinchao Feng, Charles Kulick, Sui Tang
for: The paper develops a data-driven approach for discovering a general second-order particle-based model of the aggregation and collective behavior of interacting agents.
methods: The approach places Gaussian Process (GP) priors on the latent interaction kernels, constrained to the dynamics and observational data, which allows uncertainty quantification and nonparametric modeling of interacting dynamical systems. Acceleration techniques are also developed to improve scalability.
results: The approach is shown to be effective on various prototype systems, including real-world fish motion datasets, and outperforms competitor methods despite the use of small data sets, learning an effective representation of the nonlinear dynamics in these spaces.
Abstract
In this paper, we focus on the data-driven discovery of a general second-order particle-based model that contains many state-of-the-art models for modeling the aggregation and collective behavior of interacting agents of similar size and body type. This model takes the form of a high-dimensional system of ordinary differential equations parameterized by two interaction kernels that appraise the alignment of positions and velocities. We propose a Gaussian Process-based approach to this problem, where the unknown model parameters are marginalized by using two independent Gaussian Process (GP) priors on latent interaction kernels constrained to dynamics and observational data. This results in a nonparametric model for interacting dynamical systems that accounts for uncertainty quantification. We also develop acceleration techniques to improve scalability. Moreover, we perform a theoretical analysis to interpret the methodology and investigate the conditions under which the kernels can be recovered. We demonstrate the effectiveness of the proposed approach on various prototype systems, including the selection of the order of the systems and the types of interactions. In particular, we present applications to modeling two real-world fish motion datasets that display flocking and milling patterns up to 248 dimensions. Despite the use of small data sets, the GP-based approach learns an effective representation of the nonlinear dynamics in these spaces and outperforms competitor methods.
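To make the learning setup concrete, below is a minimal NumPy sketch of the core regression step for a toy 1D velocity-alignment system (not the paper's implementation; the dynamics, ground-truth kernel, GP covariance, and noise level are illustrative assumptions). Since each observed acceleration is linear in the kernel values at the pairwise distances, a GP prior on the kernel gives a closed-form posterior mean by standard Gaussian conditioning; the full method additionally handles two kernels, hyperparameter selection, uncertainty quantification, and the acceleration techniques mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1D velocity-alignment dynamics (illustrative only):
#   dv_i/dt = (1/N) * sum_j phi(|x_j - x_i|) * (v_j - v_i)
def true_phi(r):
    return np.exp(-r)                                   # assumed ground-truth kernel

N, M, sigma = 8, 20, 1e-2                               # particles, snapshots, noise std
X = rng.uniform(0.0, 5.0, (M, N))                       # position snapshots (randomly
V = rng.normal(0.0, 1.0, (M, N))                        # generated, not integrated)

# Accelerations are linear functionals of phi at the pairwise distances: a = C @ phi(r).
r = np.abs(X[:, None, :] - X[:, :, None]).reshape(-1)   # all M*N*N pairwise distances
C = np.zeros((M * N, M * N * N))
for m in range(M):
    for i in range(N):
        for j in range(N):
            C[m * N + i, (m * N + i) * N + j] = (V[m, j] - V[m, i]) / N
a = C @ true_phi(r) + sigma * rng.normal(size=M * N)    # noisy observed accelerations

# GP prior phi ~ GP(0, k) with a squared-exponential covariance (a modelling choice).
def k(u, v, ell=1.0):
    return np.exp(-0.5 * (u[:, None] - v[None, :]) ** 2 / ell ** 2)

S = C @ k(r, r) @ C.T + sigma ** 2 * np.eye(M * N)      # marginal covariance of a
alpha = np.linalg.solve(S, a)

r_test = np.linspace(0.1, 4.0, 5)
phi_mean = k(r_test, r) @ C.T @ alpha                   # GP posterior mean of the kernel
print(np.round(np.c_[r_test, phi_mean, true_phi(r_test)], 3))
```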
COSTAR: Improved Temporal Counterfactual Estimation with Self-Supervised Learning
results: Experiments comparing against existing models show that COSTAR achieves higher estimation accuracy and better generalization to out-of-distribution data, with consistent results across different datasets.
Abstract
Estimation of temporal counterfactual outcomes from observed history is crucial for decision-making in many domains such as healthcare and e-commerce, particularly when randomized controlled trials (RCTs) suffer from high cost or impracticality. For real-world datasets, modeling time-dependent confounders is challenging due to complex dynamics, long-range dependencies and both past treatments and covariates affecting the future outcomes. In this paper, we introduce COunterfactual Self-supervised TrAnsformeR (COSTAR), a novel approach that integrates self-supervised learning for improved historical representations. The proposed framework combines temporal and feature-wise attention with a component-wise contrastive loss tailored for temporal treatment outcome observations, yielding superior performance in estimation accuracy and generalization to out-of-distribution data compared to existing models, as validated by empirical results on both synthetic and real-world datasets.
results: The methods can handle high-dimensional observational data, accounting for observation noise, complex interaction rules, missing interaction features, and real-world observations of interacting agent systems.
Abstract
We present a review of a series of learning methods used to identify the structure of dynamical systems, aiming to understand emergent behaviors in complex systems of interacting agents. These methods not only offer theoretical guarantees of convergence but also demonstrate computational efficiency in handling high-dimensional observational data. They can manage observation data from both first- and second-order dynamical systems, accounting for observation/stochastic noise, complex interaction rules, missing interaction features, and real-world observations of interacting agent systems. The essence of developing such a series of learning methods lies in designing appropriate loss functions using the variational inverse problem approach, which inherently provides dimension reduction capabilities to our learning methods.
results: The model has a latency under 20 ms at a 16 kHz sampling rate and runs roughly 2.8x faster than real time on a consumer CPU. It also achieves the lowest resource usage and latency among open-source voice conversion models, and open-source samples, code, and pretrained model weights are provided.
Abstract
We adapt the architectures of previous audio manipulation and generation neural networks to the task of real-time any-to-one voice conversion. Our resulting model, LLVC ($\textbf{L}$ow-latency $\textbf{L}$ow-resource $\textbf{V}$oice $\textbf{C}$onversion), has a latency of under 20ms at a bitrate of 16kHz and runs nearly 2.8x faster than real-time on a consumer CPU. LLVC uses both a generative adversarial architecture as well as knowledge distillation in order to attain this performance. To our knowledge LLVC achieves both the lowest resource usage as well as the lowest latency of any open-source voice conversion model. We provide open-source samples, code, and pretrained model weights at https://github.com/KoeAI/LLVC.
results: New identifiability results are established for general settings with undercompleteness, partial sparsity, and source dependence, accommodating the constraints of real-world scenarios.
Abstract
Nonlinear independent component analysis (ICA) aims to uncover the true latent sources from their observable nonlinear mixtures. Despite its significance, the identifiability of nonlinear ICA is known to be impossible without additional assumptions. Recent advances have proposed conditions on the connective structure from sources to observed variables, known as Structural Sparsity, to achieve identifiability in an unsupervised manner. However, the sparsity constraint may not hold universally for all sources in practice. Furthermore, the assumptions of bijectivity of the mixing process and independence among all sources, which arise from the setting of ICA, may also be violated in many real-world scenarios. To address these limitations and generalize nonlinear ICA, we propose a set of new identifiability results in the general settings of undercompleteness, partial sparsity and source dependence, and flexible grouping structures. Specifically, we prove identifiability when there are more observed variables than sources (undercomplete), and when certain sparsity and/or source independence assumptions are not met for some changing sources. Moreover, we show that even in cases with flexible grouping structures (e.g., part of the sources can be divided into irreducible independent groups with various sizes), appropriate identifiability results can also be established. Theoretical claims are supported empirically on both synthetic and real-world datasets.
SmoothHess: ReLU Network Feature Interactions via Stein’s Lemma
results: SmoothHess is validated on benchmark datasets and a real-world medical spirometry dataset, demonstrating a superior ability to capture feature interactions compared with other methods.
Abstract
Several recent methods for interpretability model feature interactions by looking at the Hessian of a neural network. This poses a challenge for ReLU networks, which are piecewise-linear and thus have a zero Hessian almost everywhere. We propose SmoothHess, a method of estimating second-order interactions through Stein's Lemma. In particular, we estimate the Hessian of the network convolved with a Gaussian through an efficient sampling algorithm, requiring only network gradient calls. SmoothHess is applied post-hoc, requires no modifications to the ReLU network architecture, and the extent of smoothing can be controlled explicitly. We provide a non-asymptotic bound on the sample complexity of our estimation procedure. We validate the superior ability of SmoothHess to capture interactions on benchmark datasets and a real-world medical spirometry dataset.
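As a rough illustration of the estimator described above (not the authors' code; the network, smoothing scale, and sample count below are placeholders), Stein's lemma allows the Hessian of the Gaussian-smoothed network to be estimated from gradient calls alone, as in this PyTorch sketch.

```python
import torch

def smooth_hessian(f, x, sigma=0.1, n_samples=500):
    """Monte Carlo estimate of the Hessian of f convolved with a Gaussian N(0, sigma^2 I),
    using gradient calls only:  Hess(f * g_sigma)(x) = E[ grad f(x + Z) Z^T ] / sigma^2."""
    d = x.numel()
    H = torch.zeros(d, d)
    for _ in range(n_samples):
        z = sigma * torch.randn_like(x)
        xz = (x + z).clone().requires_grad_(True)
        grad = torch.autograd.grad(f(xz), xz)[0]
        H += torch.outer(grad.detach().flatten(), z.flatten())
    H /= n_samples * sigma ** 2
    return 0.5 * (H + H.T)                     # symmetrize the estimate

# Tiny ReLU network standing in for the model being explained (placeholder sizes).
net = torch.nn.Sequential(torch.nn.Linear(4, 16), torch.nn.ReLU(), torch.nn.Linear(16, 1))
x0 = torch.randn(4)
H_hat = smooth_hessian(lambda t: net(t).squeeze(), x0, sigma=0.2)
print(H_hat)                                    # smoothed pairwise feature interactions at x0
```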
Electronic excited states from physically-constrained machine learning
results: The model can make predictions for molecules that are much larger and more complex than those it was trained on, and achieves large computational savings by indirectly targeting the outputs of well-converged calculations. These results demonstrate the merits of combining data-driven techniques with physical approximations and provide a blueprint for developing ML-augmented electronic-structure methods.
Abstract
Data-driven techniques are increasingly used to replace electronic-structure calculations of matter. In this context, a relevant question is whether machine learning (ML) should be applied directly to predict the desired properties or be combined explicitly with physically-grounded operations. We present an example of an integrated modeling approach, in which a symmetry-adapted ML model of an effective Hamiltonian is trained to reproduce electronic excitations from a quantum-mechanical calculation. The resulting model can make predictions for molecules that are much larger and more complex than those that it is trained on, and allows for dramatic computational savings by indirectly targeting the outputs of well-converged calculations while using a parameterization corresponding to a minimal atom-centered basis. These results emphasize the merits of intertwining data-driven techniques with physical approximations, improving the transferability and interpretability of ML models without affecting their accuracy and computational efficiency, and providing a blueprint for developing ML-augmented electronic-structure methods.
Sharp Noisy Binary Search with Monotonic Probabilities
results: The paper gives a practical algorithm that finds where the coin probabilities cross the target value $\tau$ using $\Theta(\frac{1}{\varepsilon^2} \log n)$ samples, matching the optimal bound, and resolves two theoretical challenges: high-probability behavior and sharp constants.
Abstract
We revisit the noisy binary search model of Karp and Kleinberg, in which we have $n$ coins with unknown probabilities $p_i$ that we can flip. The coins are sorted by increasing $p_i$, and we would like to find where the probability crosses (to within $\varepsilon$) a target value $\tau$. This generalizes the fixed-noise model of Burnashev and Zigangirov, in which $p_i = \frac{1}{2} \pm \varepsilon$, to a setting where coins near the target may be indistinguishable from it. Karp and Kleinberg showed that $\Theta(\frac{1}{\varepsilon^2} \log n)$ samples are necessary and sufficient for this task. We produce a practical algorithm by solving two theoretical challenges: high-probability behavior and sharp constants. We give an algorithm that succeeds with probability $1-\delta$ from \[ \frac{1}{C_{\tau, \varepsilon}} \cdot \left(\lg n + O\left(\log^{2/3} n \log^{1/3} \tfrac{1}{\delta} + \log \tfrac{1}{\delta}\right)\right) \] samples, where $C_{\tau, \varepsilon}$ is the optimal such constant achievable. For $\delta > n^{-o(1)}$ this is within $1 + o(1)$ of optimal, and for $\delta \ll 1$ it is the first bound within constant factors of optimal.
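For intuition, here is a small Bayesian multiplicative-weights sketch of noisy binary search (the classic idea that this paper sharpens, not the paper's algorithm; the coin model and parameters are illustrative): maintain a posterior over the crossing position, repeatedly flip the coin at the posterior median, and reweight.

```python
import numpy as np

rng = np.random.default_rng(1)

n, tau, eps, flips = 200, 0.5, 0.05, 4000
p = np.linspace(0.2, 0.8, n)                      # unknown, monotonically increasing p_i

# Posterior over the n+1 possible crossing positions k, under the working model
# "coins left of the crossing behave like tau - eps, the rest like tau + eps".
w = np.full(n + 1, 1.0 / (n + 1))
positions = np.arange(n + 1)
for _ in range(flips):
    k = int(np.searchsorted(np.cumsum(w), 0.5))   # posterior median
    i = min(max(k, 0), n - 1)                     # flip the coin at the median
    heads = rng.random() < p[i]
    lik_heads = np.where(positions > i, tau - eps, tau + eps)
    w *= lik_heads if heads else (1.0 - lik_heads)
    w /= w.sum()

k_hat = int(np.argmax(w))
print("estimated crossing index:", k_hat, "| p near estimate:", p[min(k_hat, n - 1)].round(3))
# Any index whose coin lies within eps of tau is an acceptable answer.
```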
A quantum-classical performance separation in nonconvex optimization
results: Compared with representative state-of-the-art classical optimization algorithms/solvers (including Gurobi), the quantum QHD algorithm solves these instances using polynomially many queries and elementary gates, whereas the classical solvers empirically require super-polynomial time.
Abstract
In this paper, we identify a family of nonconvex continuous optimization instances, each $d$-dimensional instance with $2^d$ local minima, to demonstrate a quantum-classical performance separation. Specifically, we prove that the recently proposed Quantum Hamiltonian Descent (QHD) algorithm [Leng et al., arXiv:2303.01471] is able to solve any $d$-dimensional instance from this family using $\widetilde{\mathcal{O}}(d^3)$ quantum queries to the function value and $\widetilde{\mathcal{O}}(d^4)$ additional 1-qubit and 2-qubit elementary quantum gates. On the other side, a comprehensive empirical study suggests that representative state-of-the-art classical optimization algorithms/solvers (including Gurobi) would require a super-polynomial time to solve such optimization instances.
Mahalanobis-Aware Training for Out-of-Distribution Detection
results: On CIFAR-10, the method markedly improves the false-positive rate, reducing it by more than 50% for the relative Mahalanobis distance method on far-OOD tasks.
Abstract
While deep learning models have seen widespread success in controlled environments, there are still barriers to their adoption in open-world settings. One critical task for safe deployment is the detection of anomalous or out-of-distribution samples that may require human intervention. In this work, we present a novel loss function and recipe for training networks with improved density-based out-of-distribution sensitivity. We demonstrate the effectiveness of our method on CIFAR-10, notably reducing the false-positive rate of the relative Mahalanobis distance method on far-OOD tasks by over 50%.
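For context, the relative Mahalanobis distance score whose OOD sensitivity the paper improves can be sketched as follows (a generic NumPy sketch of the usual formulation with placeholder features standing in for a trained network, not the paper's new loss): class-conditional Mahalanobis distance minus the distance under a single background Gaussian, with larger scores indicating likely OOD inputs.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_gaussians(feats, labels):
    """Per-class means with a shared covariance, plus a single 'background'
    Gaussian over all features (the relative-Mahalanobis setup)."""
    classes = np.unique(labels)
    mus = np.stack([feats[labels == c].mean(0) for c in classes])
    centered = feats - mus[labels]                       # assumes labels are 0..C-1
    prec = np.linalg.inv(centered.T @ centered / len(feats))
    prec_bg = np.linalg.inv(np.cov(feats, rowvar=False))
    return mus, prec, feats.mean(0), prec_bg

def relative_mahalanobis(x, mus, prec, mu_bg, prec_bg):
    """Higher score = more likely out-of-distribution."""
    d_cls = min((x - m) @ prec @ (x - m) for m in mus)
    d_bg = (x - mu_bg) @ prec_bg @ (x - mu_bg)
    return d_cls - d_bg

# Placeholder penultimate-layer features standing in for a trained network.
labels = rng.integers(0, 10, 1000)
feats = rng.normal(size=(1000, 32)) + labels[:, None] * 0.5
params = fit_gaussians(feats, labels)

x_id = feats[0]                                          # an in-distribution feature
x_ood = rng.normal(size=32) + 10.0                       # a far-OOD feature
print("ID score: ", round(float(relative_mahalanobis(x_id, *params)), 2))
print("OOD score:", round(float(relative_mahalanobis(x_ood, *params)), 2))
```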
Neural Field Dynamics Model for Granular Object Piles Manipulation
methods: Fully convolutional neural network (FCNN) with density field-based representation and translation equivariance, differentiable action rendering module
results: Exceeds existing latent or particle-based methods in accuracy and computation efficiency, demonstrates zero-shot generalization capabilities in various environments and tasks.
Abstract
We present a learning-based dynamics model for granular material manipulation. Inspired by the Eulerian approach commonly used in fluid dynamics, our method adopts a fully convolutional neural network that operates on a density field-based representation of object piles and pushers, allowing it to exploit the spatial locality of inter-object interactions as well as the translation equivariance through convolution operations. Furthermore, our differentiable action rendering module makes the model fully differentiable and can be directly integrated with a gradient-based trajectory optimization algorithm. We evaluate our model with a wide array of piles manipulation tasks both in simulation and real-world experiments and demonstrate that it significantly exceeds existing latent or particle-based methods in both accuracy and computation efficiency, and exhibits zero-shot generalization capabilities across various environments and tasks.
GIST: Generated Inputs Sets Transferability in Deep Learning
results: Experiments show that GIST can select test sets that satisfy the property of interest and transfer them to a new model under test, suggesting that GIST generalizes across different models and test-set generation procedures.
Abstract
As the demand for verifiability and testability of neural networks continues to rise, an increasing number of methods for generating test sets are being developed. However, each of these techniques tends to emphasize specific testing aspects and can be quite time-consuming. A straightforward solution to mitigate this issue is to transfer test sets between some benchmarked models and a new model under test, based on a desirable property one wishes to transfer. This paper introduces GIST (Generated Inputs Sets Transferability), a novel approach for the efficient transfer of test sets among Deep Learning models. Given a property of interest that a user wishes to transfer (e.g., coverage criterion), GIST enables the selection of good test sets from the point of view of this property among available ones from a benchmark. We empirically evaluate GIST on fault types coverage property with two modalities and different test set generation procedures to demonstrate the approach's feasibility. Experimental results show that GIST can select an effective test set for the given property to transfer it to the model under test. Our results suggest that GIST could be applied to transfer other properties and could generalize to different test sets' generation procedures and modalities
Accelerating Electronic Stopping Power Predictions by 10 Million Times with a Combination of Time-Dependent Density Functional Theory and Machine Learning
results: First-principles electronic stopping calculations are demonstrated for proton irradiation in aluminum, with machine learning interpolating to other directions; this greatly speeds up the assessment of new materials and predicts how the depth of the "Bragg peak" varies with incident angle.
Abstract
Knowing the rate at which particle radiation releases energy in a material, the stopping power, is key to designing nuclear reactors, medical treatments, semiconductor and quantum materials, and many other technologies. While the nuclear contribution to stopping power, i.e., elastic scattering between atoms, is well understood in the literature, the route for gathering data on the electronic contribution has for decades remained costly and reliant on many simplifying assumptions, including that materials are isotropic. We establish a method that combines time-dependent density functional theory (TDDFT) and machine learning to reduce the time to assess new materials to mere hours on a supercomputer and provides valuable data on how atomic details influence electronic stopping. Our approach uses TDDFT to compute the electronic stopping contributions to stopping power from first principles in several directions and then machine learning to interpolate to other directions at rates 10 million times higher. We demonstrate the combined approach in a study of proton irradiation in aluminum and employ it to predict how the depth of maximum energy deposition, the "Bragg Peak," varies depending on incident angle -- a quantity otherwise inaccessible to modelers. The lack of any experimental information requirement makes our method applicable to most materials, and its speed makes it a prime candidate for enabling quantum-to-continuum models of radiation damage. The prospect of reusing valuable TDDFT data for training the model make our approach appealing for applications in the age of materials data science.
Harnessing machine learning for accurate treatment of overlapping opacity species in GCMs
paper_authors: Aaron David Schneider, Paul Mollière, Gilles Louppe, Ludmila Carone, Uffe Gråe Jørgensen, Leen Decin, Christiane Helling
for: This study aims to help interpret high-precision observations of exoplanets and brown dwarfs, in particular by modelling the coupling between chemistry and radiation in detailed general circulation models (GCMs).
methods: The study compares methods for mixing the correlated-k opacities (k-tables) of different chemical species, including a machine-learning approach based on DeepSets (DS), adaptive equivalent extinction (AEE), and random overlap with rebinning and resorting (RORR).
results: The DS method is both accurate and efficient for GCM use, whereas RORR is too slow; the accuracy of AEE depends on its specific implementation and may introduce numerical issues in reaching radiative-transfer convergence. Modeling the rainout of TiO and VO confirms that rainout would hinder the formation of a stratosphere.
Abstract
To understand high precision observations of exoplanets and brown dwarfs, we need detailed and complex general circulation models (GCMs) that incorporate hydrodynamics, chemistry, and radiation. In this study, we specifically examine the coupling between chemistry and radiation in GCMs and compare different methods for mixing opacities of different chemical species in the correlated-k assumption, when equilibrium chemistry cannot be assumed. We propose a fast machine learning method based on DeepSets (DS), which effectively combines individual correlated-k opacities (k-tables). We evaluate the DS method alongside other published methods like adaptive equivalent extinction (AEE) and random overlap with rebinning and resorting (RORR). We integrate these mixing methods into our GCM (expeRT/MITgcm) and assess their accuracy and performance for the example of the hot Jupiter HD~209458 b. Our findings indicate that the DS method is both accurate and efficient for GCM usage, whereas RORR is too slow. Additionally, we observe that the accuracy of AEE depends on its specific implementation and may introduce numerical issues in achieving radiative transfer solution convergence. We then apply the DS mixing method in a simplified chemical disequilibrium situation, where we model the rainout of TiO and VO, and confirm that the rainout of TiO and VO would hinder the formation of a stratosphere. To further expedite the development of consistent disequilibrium chemistry calculations in GCMs, we provide documentation and code for coupling the DS mixing method with correlated-k radiative transfer solvers. The DS method has been extensively tested to be accurate enough for GCMs, however, other methods might be needed for accelerating atmospheric retrievals.
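As a rough sketch of the DeepSets idea used for opacity mixing (not the expeRT/MITgcm implementation; layer sizes, the number of g-points, and the toy inputs are placeholders), each species' k-table slice is embedded by a shared network phi, the embeddings are summed so the result is invariant to the ordering of species, and a second network rho decodes the mixed opacity.

```python
import torch
import torch.nn as nn

class DeepSetsOpacityMixer(nn.Module):
    """Permutation-invariant mixer: rho( sum_i phi(k_table_i, weight_i) ).
    n_g: number of g-points per k-table (placeholder value)."""
    def __init__(self, n_g=16, hidden=64):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(n_g + 1, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden))
        self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_g))

    def forward(self, k_tables, abundances):
        # k_tables: (batch, n_species, n_g) per-species opacities in one wavelength bin
        # abundances: (batch, n_species) mixing weights for each species
        x = torch.cat([k_tables, abundances.unsqueeze(-1)], dim=-1)
        emb = self.phi(x).sum(dim=1)           # sum over species -> order-invariant
        return self.rho(emb)                    # mixed k-table for the bin

mixer = DeepSetsOpacityMixer()
k = torch.rand(8, 5, 16)                        # 8 grid cells, 5 species, 16 g-points
w = torch.softmax(torch.rand(8, 5), dim=1)      # toy abundances
print(mixer(k, w).shape)                        # -> torch.Size([8, 16])
```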
Conformalized Deep Splines for Optimal and Efficient Prediction Sets
methods: Conditional densities are estimated with neural-network-parameterized splines, and two efficient-to-compute conformal scores are provided.
results: Experiments show that SPICE-ND models achieve the smallest average prediction-set sizes, with reductions of nearly 50% on some datasets compared to the next best baseline, while SPICE-HPD models achieve the best conditional coverage compared to baselines.
Abstract
Uncertainty estimation is critical in high-stakes machine learning applications. One effective way to estimate uncertainty is conformal prediction, which can provide predictive inference with statistical coverage guarantees. We present a new conformal regression method, Spline Prediction Intervals via Conformal Estimation (SPICE), that estimates the conditional density using neural-network-parameterized splines. We prove universal approximation and optimality results for SPICE, which are empirically validated by our experiments. SPICE is compatible with two different efficient-to-compute conformal scores, one oracle-optimal for marginal coverage (SPICE-ND) and the other asymptotically optimal for conditional coverage (SPICE-HPD). Results on benchmark datasets demonstrate SPICE-ND models achieve the smallest average prediction set sizes, including average size reductions of nearly 50% for some datasets compared to the next best baseline. SPICE-HPD models achieve the best conditional coverage compared to baselines. The SPICE implementation is made available.
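As background, the density-based conformal construction that SPICE builds on can be sketched in a few lines (a generic split-conformal sketch with a placeholder Gaussian conditional density rather than the SPICE spline network): calibrate a threshold on negative log-density scores and return the highest-density-style set of y values whose density clears it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder conditional density: in SPICE this would be the spline network.
def cond_density(y, x):
    return np.exp(-0.5 * (y - np.sin(x)) ** 2) / np.sqrt(2 * np.pi)

# Calibration data drawn from the same toy model.
n, alpha = 500, 0.1
x_cal = rng.uniform(-3, 3, n)
y_cal = np.sin(x_cal) + rng.normal(0, 1, n)

# Conformal score: negative log conditional density of the observed label.
scores = -np.log(cond_density(y_cal, x_cal))
q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n)   # calibrated threshold

# Prediction set for a new x: all y whose density exceeds the threshold
# (a highest-density-style set, evaluated on a grid for illustration).
x_new = 1.2
y_grid = np.linspace(-5, 5, 1001)
pred_set = y_grid[-np.log(cond_density(y_grid, x_new)) <= q]
print(f"set for x={x_new}: [{pred_set.min():.2f}, {pred_set.max():.2f}]")
```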
results: The comparison complexity is shown to be theoretically optimal with respect to the error measures considered, and experiments demonstrate the potential of applying learning-augmented algorithms in sorting tasks.
Abstract
We explore the fundamental problem of sorting through the lens of learning-augmented algorithms, where algorithms can leverage possibly erroneous predictions to improve their efficiency. We consider two different settings: In the first setting, each item is provided a prediction of its position in the sorted list. In the second setting, we assume there is a "quick-and-dirty" way of comparing items, in addition to slow-and-exact comparisons. For both settings, we design new and simple algorithms using only $O(\sum_i \log \eta_i)$ exact comparisons, where $\eta_i$ is a suitably defined prediction error for the $i$th element. In particular, as the quality of predictions deteriorates, the number of comparisons degrades smoothly from $O(n)$ to $O(n\log n)$. We prove that the comparison complexity is theoretically optimal with respect to the examined error measures. An experimental evaluation against existing adaptive and non-adaptive sorting algorithms demonstrates the potential of applying learning-augmented algorithms in sorting tasks.
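To illustrate the first setting (positional predictions), here is a simple galloping-insertion sketch in that spirit (not the paper's algorithm; the data and predictions are toy placeholders): each item is inserted by an exponential search that starts at its predicted position, so the number of comparisons per item grows roughly with the logarithm of its prediction error.

```python
import bisect

def sort_with_predictions(items, predicted_pos):
    """Insert items one by one; for each, gallop outward from its predicted
    position in the current sorted list, then binary-search the bracketed range."""
    out = []
    for x, p in zip(items, predicted_pos):
        p = min(max(p, 0), len(out))           # clamp prediction to the current list
        lo, hi, step = p, p, 1
        while lo > 0 and out[lo - 1] > x:       # gallop left
            lo = max(lo - step, 0); step *= 2
        step = 1
        while hi < len(out) and out[hi] < x:    # gallop right
            hi = min(hi + step, len(out)); step *= 2
        out.insert(bisect.bisect_left(out, x, lo, hi), x)
    return out

data = [5, 1, 9, 3, 7, 8, 2, 6, 4, 0]
noisy_pred = [min(max(x + (-1) ** x, 0), 9) for x in data]   # predictions off by one
print(sort_with_predictions(data, noisy_pred))
```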
Decision Support Framework for Home Health Caregiver Allocation: A Case Study of HHC Agency in Tennessee, USA
paper_authors: Seyed Mohammad Ebrahim Sharifnia, Faezeh Bagheri, Rupy Sawhney, John E. Kobza, Enrique Macias De Anda, Mostafa Hajiaghaei-Keshteli, Michael Mirrielees
results: Using data from a home health care agency in Tennessee, United States, the proposed method reduces average travel mileage (by up to 42%, depending on discipline) and increases the number of visits per planning period without imposing restrictions on caregivers. The framework is also used for caregiver supply analysis, providing valuable insights for caregiver resource management.
Abstract
Population aging is a global challenge, leading to increased demand for healthcare and social services for the elderly. Home Health Care (HHC) emerges as a vital solution, specifically designed to serve this population segment. Given the surging demand for HHC, it's essential to coordinate and regulate caregiver allocation efficiently. This is crucial for both budget-optimized planning and ensuring the delivery of high-quality care. This research addresses a key question faced by home health agencies (HHAs): "How can caregiver allocation be optimized, especially when caregivers prefer flexibility in their visiting sequences?". While earlier studies proposed rigid visiting sequences, our study introduces a decision support framework that allocates caregivers through a hybrid method that considers the flexibility in visiting sequences and aims to reduce travel mileage, increase the number of visits per planning period, and maintain the continuity of care - a critical metric for patient satisfaction. Utilizing data from an HHA in Tennessee, United States, our approach led to an impressive reduction in average travel mileage (up to 42% depending on discipline) without imposing restrictions on caregivers. Furthermore, the proposed framework is used for caregivers' supply analysis to provide valuable insights into caregiver resource management.
Software Repositories and Machine Learning Research in Cyber Security
paper_authors: Mounika Vanamala, Keith Bryant, Alex Caravella
for: This work aims to improve vulnerability detection in the early stages of the software development life cycle, particularly the requirements phase, by leveraging cyber security repositories such as MITRE's CAPEC and the CVE databases together with topic modeling and machine learning.
methods: The study considers a range of machine learning methods, including LDA and topic modeling as well as SVM, Naïve Bayes, random forest, neural networks, and eventually deep learning, to detect vulnerabilities at the requirements stage.
results: The findings indicate that machine learning can enhance early-stage vulnerability identification across diverse software development scenarios, offering crucial assistance to developers in building secure software.
Abstract
In today's rapidly evolving technological landscape and advanced software development, the rise in cyber security attacks has become a pressing concern. The integration of robust cyber security defenses has become essential across all phases of software development. It holds particular significance in identifying critical cyber security vulnerabilities at the initial stages of the software development life cycle, notably during the requirement phase. Through the utilization of cyber security repositories like The Common Attack Pattern Enumeration and Classification (CAPEC) from MITRE and the Common Vulnerabilities and Exposures (CVE) databases, attempts have been made to leverage topic modeling and machine learning for the detection of these early-stage vulnerabilities in the software requirements process. Past research themes have returned successful outcomes in attempting to automate vulnerability identification for software developers, employing a mixture of unsupervised machine learning methodologies such as LDA and topic modeling. Looking ahead, in our pursuit to improve automation and establish connections between software requirements and vulnerabilities, our strategy entails adopting a variety of supervised machine learning techniques. This array encompasses Support Vector Machines (SVM), Na\"ive Bayes, random forest, neural networking and eventually transitioning into deep learning for our investigation. In the face of the escalating complexity of cyber security, the question of whether machine learning can enhance the identification of vulnerabilities in diverse software development scenarios is a paramount consideration, offering crucial assistance to software developers in developing secure software.
Deep Learning-Based Classification of Gamma Photon Interactions in Room-Temperature Semiconductor Radiation Detectors
results: Trained and validated on simulated and experimental data, the CoPhNet model distinguishes Compton scattering from photoelectric events in CdZnTeSe (CZTS) semiconductor detectors with high accuracy, and its performance remains robust under shifts in operating parameters such as signal-to-noise ratio and incident energy.
Abstract
Photon counting radiation detectors have become an integral part of medical imaging modalities such as Positron Emission Tomography or Computed Tomography. One of the most promising detectors is the wide bandgap room temperature semiconductor detector, in which the interaction of gamma/x-ray photons with the detector material involves Compton scattering, leading to multiple interaction photon events (MIPEs) of a single photon. For semiconductor detectors like CdZnTeSe (CZTS), which have a high overlap of detected energies between Compton and photoelectric events, it is nearly impossible to distinguish Compton scattered events from photoelectric events using conventional readout electronics or signal processing algorithms. Herein, we report a deep learning classifier, CoPhNet, that distinguishes between Compton scattering and photoelectric interactions of gamma/x-ray photons with CdZnTeSe (CZTS) semiconductor detectors. Our CoPhNet model was trained using simulated data to resemble actual CZTS detector pulses and validated using both simulated and experimental data. These results demonstrated that our CoPhNet model can achieve high classification accuracy over the simulated test set. It also holds its performance robustness under operating parameter shifts such as Signal-Noise-Ratio (SNR) and incident energy. Our work thus laid a solid foundation for developing next-generation high energy gamma-ray detectors for better biomedical imaging.
Complexity of Single Loop Algorithms for Nonlinear Programming with Stochastic Objective and Constraints
results: In the three cases considered, finding a point that satisfies $\varepsilon$-approximate first-order conditions requires $\widetilde{O}(\varepsilon^{-3})$, $\widetilde{O}(\varepsilon^{-4})$, and $\widetilde{O}(\varepsilon^{-5})$ complexity, respectively, matching the best-known guarantees.
Abstract
We analyze the complexity of single-loop quadratic penalty and augmented Lagrangian algorithms for solving nonconvex optimization problems with functional equality constraints. We consider three cases, in all of which the objective is stochastic and smooth, that is, an expectation over an unknown distribution that is accessed by sampling. The nature of the equality constraints differs among the three cases: deterministic and linear in the first case, deterministic, smooth and nonlinear in the second case, and stochastic, smooth and nonlinear in the third case. Variance reduction techniques are used to improve the complexity. To find a point that satisfies $\varepsilon$-approximate first-order conditions, we require $\widetilde{O}(\varepsilon^{-3})$ complexity in the first case, $\widetilde{O}(\varepsilon^{-4})$ in the second case, and $\widetilde{O}(\varepsilon^{-5})$ in the third case. For the first and third cases, they are the first algorithms of "single loop" type (that also use $O(1)$ samples at each iteration) that still achieve the best-known complexity guarantees.
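To fix ideas, a single-loop stochastic quadratic-penalty iteration for the first case (stochastic smooth objective, deterministic linear constraint) can be sketched as below; this is a generic illustration of the algorithm class, not the paper's exact method, and omits the variance-reduction and step-size choices that yield the stated complexities.

```python
import numpy as np

rng = np.random.default_rng(0)
d, mu = 5, np.arange(5, dtype=float)

# Stochastic objective  f(x) = E_xi[ 0.5 * ||x - xi||^2 ],  xi ~ N(mu, I),
# with one deterministic linear equality constraint  c(x) = sum(x) - 1 = 0.
def stoch_grad_f(x):
    xi = mu + rng.normal(size=d)                  # one sampled data point
    return x - xi

def c(x):     return np.array([x.sum() - 1.0])
def jac_c(x): return np.ones((1, d))

x, rho, eta = np.zeros(d), 10.0, 1e-2
for t in range(20000):
    # single loop: one stochastic gradient of f(x) + (rho/2) * ||c(x)||^2 per iteration
    g = stoch_grad_f(x) + rho * jac_c(x).T @ c(x)
    x -= eta * g

print("x:", x.round(3), "| |c(x)|:", round(abs(c(x)[0]), 4))
# Increasing rho (or scheduling it) drives the constraint violation toward zero.
```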
Last-Iterate Convergence Properties of Regret-Matching Algorithms in Games
paper_authors: Yang Cai, Gabriele Farina, Julien Grand-Clément, Christian Kroer, Chung-Wei Lee, Haipeng Luo, Weiqiang Zheng
for: solving large-scale two-player zero-sum games in practice
methods: regret matching$^+$ (RM$^+$) and its variants
results: last-iterate convergence properties of various popular variants of RM$^+$
Abstract
Algorithms based on regret matching, specifically regret matching$^+$ (RM$^+$), and its variants are the most popular approaches for solving large-scale two-player zero-sum games in practice. Unlike algorithms such as optimistic gradient descent ascent, which have strong last-iterate and ergodic convergence properties for zero-sum games, virtually nothing is known about the last-iterate properties of regret-matching algorithms. Given the importance of last-iterate convergence for numerical optimization reasons and relevance as modeling real-word learning in games, in this paper, we study the last-iterate convergence properties of various popular variants of RM$^+$. First, we show numerically that several practical variants such as simultaneous RM$^+$, alternating RM$^+$, and simultaneous predictive RM$^+$, all lack last-iterate convergence guarantees even on a simple $3\times 3$ game. We then prove that recent variants of these algorithms based on a smoothing technique do enjoy last-iterate convergence: we prove that extragradient RM$^{+}$ and smooth Predictive RM$^+$ enjoy asymptotic last-iterate convergence (without a rate) and $1/\sqrt{t}$ best-iterate convergence. Finally, we introduce restarted variants of these algorithms, and show that they enjoy linear-rate last-iterate convergence.
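For reference, the basic RM$^+$ update studied in the paper can be written in a few lines; below is a minimal NumPy sketch for a two-player zero-sum matrix game with simultaneous updates (the game matrix and iteration count are arbitrary placeholders), tracking the last iterate rather than the average.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))             # payoff matrix of the row player (zero-sum game)

def rm_plus_strategy(Q):
    s = Q.sum()
    return Q / s if s > 0 else np.full(len(Q), 1.0 / len(Q))

Qx = np.zeros(3)                        # clipped cumulative regrets, row player
Qy = np.zeros(3)                        # clipped cumulative regrets, column player
for t in range(10000):
    x, y = rm_plus_strategy(Qx), rm_plus_strategy(Qy)
    ux, uy = A @ y, -A.T @ x            # action utilities for each player
    # RM+ update: add instantaneous regrets, then clip at zero
    Qx = np.maximum(Qx + (ux - x @ ux), 0.0)
    Qy = np.maximum(Qy + (uy - y @ uy), 0.0)

# Duality gap of the *last* iterate (zero at a Nash equilibrium).  As the paper shows,
# plain simultaneous RM+ need not drive this to zero; its smoothed variants do.
gap = (A @ y).max() - (A.T @ x).min()
print("last iterate:", x.round(3), y.round(3), "| duality gap:", round(float(gap), 4))
```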
Recovering Linear Causal Models with Latent Variables via Cholesky Factorization of Covariance Matrix
results: On synthetic and real-world data, the algorithm is significantly faster than previous methods and achieves state-of-the-art performance. Under the equal-error-variance assumption, an optimization procedure is further incorporated to handle DAG recovery with latent variables, and numerical simulations show its effectiveness.
Abstract
Discovering the causal relationship via recovering the directed acyclic graph (DAG) structure from the observed data is a well-known challenging combinatorial problem. When there are latent variables, the problem becomes even more difficult. In this paper, we first propose a DAG structure recovering algorithm, which is based on the Cholesky factorization of the covariance matrix of the observed data. The algorithm is fast and easy to implement and has theoretical grantees for exact recovery. On synthetic and real-world datasets, the algorithm is significantly faster than previous methods and achieves the state-of-the-art performance. Furthermore, under the equal error variances assumption, we incorporate an optimization procedure into the Cholesky factorization based algorithm to handle the DAG recovering problem with latent variables. Numerical simulations show that the modified "Cholesky + optimization" algorithm is able to recover the ground truth graph in most cases and outperforms existing algorithms.
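The core identity behind the Cholesky-based step can be illustrated as follows (a population-covariance sketch assuming the variables are already in a causal order and there are no latent variables; the paper works from finite data and additionally handles the latent-variable case).

```python
import numpy as np

rng = np.random.default_rng(0)
d = 6

# Linear SEM in causal order:  x = W x + e,  W strictly lower triangular,
# e with independent noise variances on the diagonal of D.
W = np.tril(rng.normal(size=(d, d)), k=-1) * (rng.random((d, d)) < 0.5)
D = np.diag(rng.uniform(0.5, 2.0, d))
Sigma = np.linalg.inv(np.eye(d) - W) @ D @ np.linalg.inv(np.eye(d) - W).T

# Cholesky factor L = (I - W)^{-1} D^{1/2}  (lower triangular, positive diagonal),
# so the edge weights follow in closed form:  W = I - diag(L) @ L^{-1}.
L = np.linalg.cholesky(Sigma)
W_hat = np.eye(d) - np.diag(np.diag(L)) @ np.linalg.inv(L)

print("max abs error in recovered weights:", np.abs(W_hat - W).max())
```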
results: The estimated transformations can be used to stitch encoders and decoders across different trainings, domains, architectures, and downstream tasks. Moreover, text encoders and vision decoders can be zero-shot stitched, and vice versa, yielding surprisingly good classification performance even though the two models were never trained together.
Abstract
While different neural models often exhibit latent spaces that are alike when exposed to semantically related data, this intrinsic similarity is not always immediately discernible. Towards a better understanding of this phenomenon, our work shows how representations learned from these neural modules can be translated between different pre-trained networks via simpler transformations than previously thought. An advantage of this approach is the ability to estimate these transformations using standard, well-understood algebraic procedures that have closed-form solutions. Our method directly estimates a transformation between two given latent spaces, thereby enabling effective stitching of encoders and decoders without additional training. We extensively validate the adaptability of this translation procedure in different experimental settings: across various trainings, domains, architectures (e.g., ResNet, CNN, ViT), and in multiple downstream tasks (classification, reconstruction). Notably, we show how it is possible to zero-shot stitch text encoders and vision decoders, or vice-versa, yielding surprisingly good classification performance in this multimodal setting.
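One natural closed-form choice for such a transformation is orthogonal Procrustes; the sketch below (with random stand-in latents rather than real encoder outputs, and an orthogonal map as an assumed transformation class) estimates the map between two latent spaces from a few anchor pairs via an SVD.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in latents: encoder B's space is (unknown to us) a rotation + noise of encoder A's.
n, d = 200, 64
Z_a = rng.normal(size=(n, d))                         # anchors encoded by model A
R_true, _ = np.linalg.qr(rng.normal(size=(d, d)))
Z_b = Z_a @ R_true + 0.01 * rng.normal(size=(n, d))   # same anchors encoded by model B

# Closed-form orthogonal Procrustes: R = argmin_{R orthogonal} ||Z_a R - Z_b||_F,
# obtained from the SVD of Z_a^T Z_b.
U, _, Vt = np.linalg.svd(Z_a.T @ Z_b)
R = U @ Vt

# Translate new samples from space A into space B (e.g., to feed B's decoder).
Z_new = rng.normal(size=(5, d))
err = np.linalg.norm(Z_new @ R - Z_new @ R_true) / np.linalg.norm(Z_new @ R_true)
print("relative translation error:", round(float(err), 4))
```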
Online Signal Estimation on the Graph Edges via Line Graph Transformation
results: The method accurately estimates graph edge signals and operates online in real time.
Abstract
We propose the Line Graph Normalized Least Mean Square (LGNLMS) algorithm for online prediction of time-varying graph edge signals. LGNLMS utilizes the line graph to transform graph edge signals onto the nodes of its edge-to-vertex dual. This enables edge signals to be processed using established GSP concepts without redefining them on graph edges.
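To make the construction concrete, the sketch below maps the edges of a graph to the nodes of its line graph and runs a generic bandlimited graph-LMS recursion there (an illustrative estimator with an assumed signal bandwidth and placeholder step size, not necessarily the paper's exact normalization).

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)

G = nx.erdos_renyi_graph(12, 0.35, seed=1)
LG = nx.line_graph(G)                          # nodes of LG are the edges of G
edges = sorted(LG.nodes())
L = nx.laplacian_matrix(LG, nodelist=edges).toarray().astype(float)
_, U = np.linalg.eigh(L)

K = 6                                          # assumed bandwidth of the edge signal
B = U[:, :K] @ U[:, :K].T                      # projector onto the low-frequency subspace
x_true = U[:, :K] @ rng.normal(size=K)         # a bandlimited signal, one value per edge of G

x_hat, mu = np.zeros(len(edges)), 0.5
for t in range(500):
    mask = (rng.random(len(edges)) < 0.6).astype(float)         # edges observed at time t
    y = mask * (x_true + 0.05 * rng.normal(size=len(edges)))    # noisy partial observation
    x_hat = x_hat + mu * B @ (mask * (y - x_hat))                # LMS update on the line graph

print("relative error:", round(np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true), 3))
```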
Kronecker-Factored Approximate Curvature for Modern Neural Network Architectures
results: K-FAC speeds up neural-network training, reaching a fixed validation metric target in $50$-$75\%$ of the steps of a first-order reference run, and the two K-FAC variants perform similarly when training a graph neural network and a vision transformer.
Abstract
The core components of many modern neural network architectures, such as transformers, convolutional, or graph neural networks, can be expressed as linear layers with $\textit{weight-sharing}$. Kronecker-Factored Approximate Curvature (K-FAC), a second-order optimisation method, has shown promise to speed up neural network training and thereby reduce computational costs. However, there is currently no framework to apply it to generic architectures, specifically ones with linear weight-sharing layers. In this work, we identify two different settings of linear weight-sharing layers which motivate two flavours of K-FAC -- $\textit{expand}$ and $\textit{reduce}$. We show that they are exact for deep linear networks with weight-sharing in their respective setting. Notably, K-FAC-reduce is generally faster than K-FAC-expand, which we leverage to speed up automatic hyperparameter selection via optimising the marginal likelihood for a Wide ResNet. Finally, we observe little difference between these two K-FAC variations when using them to train both a graph neural network and a vision transformer. However, both variations are able to reach a fixed validation metric target in $50$-$75\%$ of the number of steps of a first-order reference run, which translates into a comparable improvement in wall-clock time. This highlights the potential of applying K-FAC to modern neural network architectures.
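For a single dense layer, the Kronecker-factored preconditioning that both K-FAC flavours build on can be sketched as follows (a minimal PyTorch illustration with a toy regression loss, placeholder damping, and simplified scaling conventions, not the paper's expand/reduce treatment of weight-sharing layers).

```python
import torch

torch.manual_seed(0)
B, d_in, d_out = 64, 10, 4
W = torch.zeros(d_out, d_in, requires_grad=True)

a = torch.randn(B, d_in)                       # activations feeding the layer
y = torch.randn(B, d_out)                      # toy regression targets

out = a @ W.T
loss = 0.5 * ((out - y) ** 2).mean()
(g_out,) = torch.autograd.grad(loss, out, retain_graph=True)   # per-example output grads
loss.backward()

# Kronecker factors of the (approximate) Fisher for this layer.
A = a.T @ a / B                                # input-side factor,  d_in  x d_in
G = g_out.T @ g_out / B                        # output-side factor, d_out x d_out
damping = 1e-3
A_inv = torch.linalg.inv(A + damping * torch.eye(d_in))
G_inv = torch.linalg.inv(G + damping * torch.eye(d_out))

# K-FAC preconditioned update: (G^{-1} ⊗ A^{-1}) applied to the weight gradient.
precond_grad = G_inv @ W.grad @ A_inv
with torch.no_grad():
    W -= 1.0 * precond_grad                    # learning rate is a placeholder
print(precond_grad.shape)                      # torch.Size([4, 10])
```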
Controllable Music Production with Diffusion Models and Guidance Gradients
methods: The paper uses sampling-time guidance of diffusion models for music generation, supporting both reconstruction and classification losses, or any combination of the two.
results: The generated music can match its surrounding context, or conform to a class distribution or to a representation from a pretrained embedding model.
Abstract
We demonstrate how conditional generation from diffusion models can be used to tackle a variety of realistic tasks in the production of music in 44.1kHz stereo audio with sampling-time guidance. The scenarios we consider include continuation, inpainting and regeneration of musical audio, the creation of smooth transitions between two different music tracks, and the transfer of desired stylistic characteristics to existing audio clips. We achieve this by applying guidance at sampling time in a simple framework that supports both reconstruction and classification losses, or any combination of the two. This approach ensures that generated audio can match its surrounding context, or conform to a class distribution or latent representation specified relative to any suitable pre-trained classifier or embedding model.
A Collaborative Filtering-Based Two Stage Model with Item Dependency for Course Recommendation
results: Experiments show that the method reaches an AUC as high as 0.97.
Abstract
Recommender systems have been studied for decades with numerous promising models been proposed. Among them, Collaborative Filtering (CF) models are arguably the most successful one due to its high accuracy in recommendation and elimination of privacy-concerned personal meta-data from training. This paper extends the usage of CF-based model to the task of course recommendation. We point out several challenges in applying the existing CF-models to build a course recommendation engine, including the lack of rating and meta-data, the imbalance of course registration distribution, and the demand of course dependency modeling. We then propose several ideas to address these challenges. Eventually, we combine a two-stage CF model regularized by course dependency with a graph-based recommender based on course-transition network, to achieve AUC as high as 0.97 with a real-world dataset.
Structure Learning with Adaptive Random Neighborhood Informed MCMC
paper_authors: Alberto Caron, Xitong Liang, Samuel Livingstone, Jim Griffin
for: The paper addresses learning the structure of Directed Acyclic Graphs (DAGs) from observational data, using a fully Bayesian approach with a novel Markov Chain Monte Carlo (MCMC) sampler called PARNI-DAG.
methods: PARNI-DAG uses a locally informed, adaptive random neighborhood proposal to sample DAGs efficiently, together with a pre-tuning procedure for the sampler's parameters that ensures better scalability with the number of nodes.
results: PARNI-DAG quickly converges to high-probability regions, is less likely to get stuck in local modes in high-dimensional settings, and is shown to learn DAG structures effectively in a variety of experiments.
Abstract
In this paper, we introduce a novel MCMC sampler, PARNI-DAG, for a fully-Bayesian approach to the problem of structure learning under observational data. Under the assumption of causal sufficiency, the algorithm allows for approximate sampling directly from the posterior distribution on Directed Acyclic Graphs (DAGs). PARNI-DAG performs efficient sampling of DAGs via locally informed, adaptive random neighborhood proposal that results in better mixing properties. In addition, to ensure better scalability with the number of nodes, we couple PARNI-DAG with a pre-tuning procedure of the sampler's parameters that exploits a skeleton graph derived through some constraint-based or scoring-based algorithms. Thanks to these novel features, PARNI-DAG quickly converges to high-probability regions and is less likely to get stuck in local modes in the presence of high correlation between nodes in high-dimensional settings. After introducing the technical novelties in PARNI-DAG, we empirically demonstrate its mixing efficiency and accuracy in learning DAG structures on a variety of experiments.
Flexible Tails for Normalising Flows, with Application to the Modelling of Financial Return Data
results: The approach captures extreme shocks in financial returns, and the trained models can generate new synthetic sets of potentially extreme returns.
Abstract
We propose a transformation capable of altering the tail properties of a distribution, motivated by extreme value theory, which can be used as a layer in a normalizing flow to approximate multivariate heavy tailed distributions. We apply this approach to model financial returns, capturing potentially extreme shocks that arise in such data. The trained models can be used directly to generate new synthetic sets of potentially extreme returns.
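One simple way to realize such a tail-altering layer is a quantile transform from a Gaussian base to a heavy-tailed reference; the sketch below (an illustrative construction with a Student-t reference and an arbitrary degrees-of-freedom value, not the paper's exact parameterization) shows how a final transformation can turn Gaussian tails into power-law tails while remaining invertible.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def heavy_tail_layer(z, df=5.0):
    """Map a standard-normal variable to a Student-t one via the probability
    integral transform, giving the overall model power-law tails set by df."""
    return stats.t.ppf(stats.norm.cdf(z), df)

def heavy_tail_layer_inverse(x, df=5.0):
    return stats.norm.ppf(stats.t.cdf(x, df))        # invertibility for exact likelihoods

z = rng.standard_normal(100_000)                     # Gaussian flow output / base samples
x = heavy_tail_layer(z)                              # synthetic heavy-tailed "returns"

print("excess kurtosis before:", round(float(stats.kurtosis(z)), 2),
      "| after:", round(float(stats.kurtosis(x)), 2))
print("max inversion error:", float(np.abs(heavy_tail_layer_inverse(x) - z).max()))
```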
Revealing CNN Architectures via Side-Channel Analysis in Dataflow-based Inference Accelerators
methods: The paper mounts a memory-based side-channel attack on dataflow-based CNN inference accelerators, exploiting the spatial and temporal data reuse of the dataflow mapping together with architectural hints to recover the structure of CNN models.
results: Experimental results show that the attack successfully recovers the structures of popular CNN models, namely Lenet, Alexnet, and VGGnet16.
Abstract
Convolution Neural Networks (CNNs) are widely used in various domains. Recent advances in dataflow-based CNN accelerators have enabled CNN inference in resource-constrained edge devices. These dataflow accelerators utilize inherent data reuse of convolution layers to process CNN models efficiently. Concealing the architecture of CNN models is critical for privacy and security. This paper evaluates memory-based side-channel information to recover CNN architectures from dataflow-based CNN inference accelerators. The proposed attack exploits spatial and temporal data reuse of the dataflow mapping on CNN accelerators and architectural hints to recover the structure of CNN models. Experimental results demonstrate that our proposed side-channel attack can recover the structures of popular CNN models, namely Lenet, Alexnet, and VGGnet16.
Transfer learning for improved generalizability in causal physics-informed neural networks for beam simulations
results: Experiments show that the proposed approach converges quickly and gives accurate results across various initial conditions, including noisy initial data. It is also applied to a Timoshenko beam over an extended spatial and temporal domain, and comparisons indicate that it accurately captures the inherent beam dynamics, outperforming state-of-the-art physics-informed methods.
Abstract
This paper introduces a novel methodology for simulating the dynamics of beams on elastic foundations. Specifically, Euler-Bernoulli and Timoshenko beam models on the Winkler foundation are simulated using a transfer learning approach within a causality-respecting physics-informed neural network (PINN) framework. Conventional PINNs encounter challenges in handling large space-time domains, even for problems with closed-form analytical solutions. A causality-respecting PINN loss function is employed to overcome this limitation, effectively capturing the underlying physics. However, it is observed that the causality-respecting PINN lacks generalizability. We propose using solutions to similar problems instead of training from scratch by employing transfer learning while adhering to causality to accelerate convergence and ensure accurate results across diverse scenarios. Numerical experiments on the Euler-Bernoulli beam highlight the efficacy of the proposed approach for various initial conditions, including those with noise in the initial data. Furthermore, the potential of the proposed method is demonstrated for the Timoshenko beam in an extended spatial and temporal domain. Several comparisons suggest that the proposed method accurately captures the inherent dynamics, outperforming the state-of-the-art physics-informed methods under standard $L^2$-norm metric and accelerating convergence.
摘要
Traditional PINNs struggle with large space-time domains, even for problems with known analytical solutions. To overcome this limitation, the authors employ a causality-respecting PINN loss function that effectively captures the underlying physics. However, this approach lacks generalizability. To address this issue, the authors propose using transfer learning while adhering to causality to accelerate convergence and ensure accurate results across diverse scenarios. The proposed method is tested on the Euler-Bernoulli beam for various initial conditions, including noisy data, and demonstrates improved accuracy and faster convergence compared to state-of-the-art physics-informed methods. The proposed method is also applied to the Timoshenko beam in a larger spatial and temporal domain, showing its potential for simulating the dynamics of more complex beam systems. The results suggest that the proposed method accurately captures the inherent dynamics of the beams, outperforming existing methods under the standard $L^2$-norm metric.
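The causality-respecting loss referred to above weights each time slice's residual by how well earlier slices are already fitted. A minimal NumPy sketch of that weighting idea follows (based on the causal-training recipe from the PINN literature; the array `residuals_t`, the tolerance `eps`, and the function names are illustrative, not the authors' implementation):

```python
import numpy as np

def causal_weights(residuals_t, eps=1.0):
    """Causality-respecting weights for per-time-slice PDE residuals.

    residuals_t: 1D array of mean squared PDE residuals, one per time slice,
                 ordered from t_0 to t_N.
    eps: causality tolerance; larger values enforce the temporal ordering
         more strictly.
    Returns w_i = exp(-eps * sum_{k<i} residuals_t[k]), so slice i only
    contributes to the loss once all earlier slices are nearly resolved.
    """
    cumulative = np.concatenate(([0.0], np.cumsum(residuals_t)[:-1]))
    return np.exp(-eps * cumulative)

def causal_pinn_loss(residuals_t, eps=1.0):
    w = causal_weights(residuals_t, eps)
    return float(np.mean(w * residuals_t))

# Toy usage: early slices are well fit, later slices are not yet.
res = np.array([1e-4, 5e-4, 2e-1, 8e-1])
print(causal_weights(res))    # later slices receive exponentially smaller weight
print(causal_pinn_loss(res))
```

In a full training loop the weights are treated as constants (no gradient flows through them), so the optimizer is simply steered toward resolving early times first.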
Personalized Assignment to One of Many Treatment Arms via Regularized and Clustered Joint Assignment Forests
results: 通过模拟实验和理论模型,我们发现跨治疗臂汇聚信息可以降低过度方差,并且直接对治疗分配进行正则化优化与聚类能够带来可观的个性化效用提升。
for: The paper is focused on learning personalized assignments to one of many treatment arms from a randomized controlled trial. The goal is to estimate the heterogeneous treatment effects for each arm, while accounting for the excess variance that can arise when there are many arms.
methods: The paper proposes two methods to address this challenge: (1) a regularized forest-based assignment algorithm based on greedy recursive partitioning, and (2) a clustering scheme that combines treatment arms with consistently similar outcomes. These methods pool information across treatment arms to reduce the excess variance and improve the accuracy of the treatment assignments.
results: The paper presents the results of simulations and a theoretical model that demonstrate the effectiveness of the proposed methods. The results show that the regularized optimization and clustering methods can lead to significant gains in terms of predicting arm-wise outcomes and achieving sizable utility gains from personalization.Abstract
We consider learning personalized assignments to one of many treatment arms from a randomized controlled trial. Standard methods that estimate heterogeneous treatment effects separately for each arm may perform poorly in this case due to excess variance. We instead propose methods that pool information across treatment arms: First, we consider a regularized forest-based assignment algorithm based on greedy recursive partitioning that shrinks effect estimates across arms. Second, we augment our algorithm by a clustering scheme that combines treatment arms with consistently similar outcomes. In a simulation study, we compare the performance of these approaches to predicting arm-wise outcomes separately, and document gains of directly optimizing the treatment assignment with regularization and clustering. In a theoretical model, we illustrate how a high number of treatment arms makes finding the best arm hard, while we can achieve sizable utility gains from personalization by regularized optimization.
摘要
我们考虑从随机对照试验中学习面向众多治疗臂的个性化分配。由于过度方差,分别估计各臂异质性治疗效应的标准方法在这种情形下可能表现不佳。我们转而提出跨治疗臂汇聚信息的方法:首先,我们考虑一种基于贪心递归划分的正则化森林分配算法,对各臂的效应估计进行收缩;其次,我们用一种聚类方案加以扩充,将结果持续相似的治疗臂合并。在模拟研究中,我们将这些方法与分别预测各臂结果的做法进行比较,并记录了通过正则化与聚类直接优化治疗分配所带来的收益。在理论模型中,我们说明了当治疗臂数量很多时,找到最优臂十分困难,而通过正则化优化进行个性化仍可带来可观的效用提升。
Online Student-$t$ Processes with an Overall-local Scale Structure for Modelling Non-stationary Data
for: Handle time-dependent data with non-stationarity and heavy-tailed errors.
methods: Bayesian mixture of student-$t$ processes with overall-local scale structure for the covariance, and sequential Monte Carlo (SMC) sampler for online inference.
results: Superior performance compared to typical Gaussian process-based models on real-world data sets.Abstract
Time-dependent data often exhibit characteristics, such as non-stationarity and heavy-tailed errors, that would be inappropriate to model with the typical assumptions used in popular models. Thus, more flexible approaches are required to be able to accommodate such issues. To this end, we propose a Bayesian mixture of student-$t$ processes with an overall-local scale structure for the covariance. Moreover, we use a sequential Monte Carlo (SMC) sampler in order to perform online inference as data arrive in real-time. We demonstrate the superiority of our proposed approach compared to typical Gaussian process-based models on real-world data sets in order to prove the necessity of using mixtures of student-$t$ processes.
摘要
时间相关数据经常具有非平稳性和重尾误差等特点,这些特点不适合用流行模型中的典型假设来刻画。因此,需要更灵活的方法来处理这些问题。为此,我们提出带有整体—局部尺度协方差结构的贝叶斯 student-$t$ 过程混合模型。此外,我们还使用 sequential Monte Carlo (SMC) 采样器进行在线推断,以便在数据实时到达时进行推断。我们在真实数据集上与典型的基于高斯过程的模型进行比较,证明了使用 student-$t$ 过程混合模型的必要性及其优越性能。
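For readers unfamiliar with Student-$t$ processes, a finite-dimensional draw can be produced by rescaling a Gaussian draw with a chi-squared mixing variable. The sketch below uses this generic scale-mixture construction with an RBF kernel; it does not implement the paper's overall-local scale structure or the SMC sampler, and the kernel and function names are illustrative assumptions:

```python
import numpy as np

def rbf_kernel(x, lengthscale=1.0, var=1.0):
    d = x[:, None] - x[None, :]
    return var * np.exp(-0.5 * (d / lengthscale) ** 2)

def sample_student_t_process(x, nu=4.0, jitter=1e-8, rng=None):
    """Draw f(x) from a zero-mean Student-t process with df nu > 2.

    Uses the scale-mixture representation: f = z * sqrt(nu / g), where
    z ~ N(0, K) and g ~ chi^2(nu), which yields a multivariate t draw
    and hence heavier tails than the corresponding GP.
    """
    rng = np.random.default_rng() if rng is None else rng
    K = rbf_kernel(x) + jitter * np.eye(len(x))
    L = np.linalg.cholesky(K)
    z = L @ rng.standard_normal(len(x))
    g = rng.chisquare(nu)
    return z * np.sqrt(nu / g)

x = np.linspace(0.0, 5.0, 50)
f = sample_student_t_process(x, nu=3.0)   # occasional large excursions
```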
Learning to optimize by multi-gradient for multi-objective optimization
methods: 本研究提出了一种基于自动学习的多梯度学习优化方法(ML2O),该方法可以自动学习一个生成器(或映射),将多个梯度映射为更新方向。此外,我们还提出了一种带保护机制的多梯度学习优化方法(GML2O),并证明其迭代序列收敛到 Pareto 临界点。
results: 实验结果表明,我们学习的优化器在训练多任务学习(MTL)神经网络时表现更高效,比手动设计的竞争对手。Abstract
The development of artificial intelligence (AI) for science has led to the emergence of learning-based research paradigms, necessitating a compelling reevaluation of the design of multi-objective optimization (MOO) methods. The new generation MOO methods should be rooted in automated learning rather than manual design. In this paper, we introduce a new automatic learning paradigm for optimizing MOO problems, and propose a multi-gradient learning to optimize (ML2O) method, which automatically learns a generator (or mappings) from multiple gradients to update directions. As a learning-based method, ML2O acquires knowledge of local landscapes by leveraging information from the current step and incorporates global experience extracted from historical iteration trajectory data. By introducing a new guarding mechanism, we propose a guarded multi-gradient learning to optimize (GML2O) method, and prove that the iterative sequence generated by GML2O converges to a Pareto critical point. The experimental results demonstrate that our learned optimizer outperforms hand-designed competitors on training multi-task learning (MTL) neural network.
摘要
人工智能(AI)在科学领域的发展催生了基于学习的研究范式,这促使我们重新审视多目标优化(MOO)方法的设计。新一代 MOO 方法应当基于自动学习而非手工设计。在这篇论文中,我们介绍了一种用于优化 MOO 问题的新的自动学习范式,并提出了多梯度学习优化(ML2O)方法,它可以自动学习一个生成器(或映射),将多个梯度映射为更新方向。作为一种基于学习的方法,ML2O 利用当前步骤的信息了解局部地形,并结合从历史迭代轨迹数据中提取的全局经验。通过引入新的保护机制,我们进一步提出了带保护的多梯度学习优化(GML2O)方法,并证明其迭代序列收敛到 Pareto 临界点。实验结果表明,我们学习得到的优化器在训练多任务学习(MTL)神经网络时优于手工设计的竞争方法。
Machine Learning Without a Processor: Emergent Learning in a Nonlinear Electronic Metamaterial
paper_authors: Sam Dillavou, Benjamin D Beyer, Menachem Stern, Marc Z Miskin, Andrea J Liu, Douglas J Durian
for: 这项研究旨在开发一种可以进行快速、能效的分析学习机制,以替代传统的深度学习算法。
methods: 研究人员使用了一种基于晶体管的非线性学习元件,实现了无计算机的非线性学习。
results: 研究人员发现,这种非线性学习元件可以完成传统 linear 系统无法实现的任务,包括 XOR 和非线性回归。此外,这种系统还具有较低的能耗和可重新编辑的特点。Abstract
Standard deep learning algorithms require differentiating large nonlinear networks, a process that is slow and power-hungry. Electronic learning metamaterials offer potentially fast, efficient, and fault-tolerant hardware for analog machine learning, but existing implementations are linear, severely limiting their capabilities. These systems differ significantly from artificial neural networks as well as the brain, so the feasibility and utility of incorporating nonlinear elements have not been explored. Here we introduce a nonlinear learning metamaterial -- an analog electronic network made of self-adjusting nonlinear resistive elements based on transistors. We demonstrate that the system learns tasks unachievable in linear systems, including XOR and nonlinear regression, without a computer. We find our nonlinear learning metamaterial reduces modes of training error in order (mean, slope, curvature), similar to spectral bias in artificial neural networks. The circuitry is robust to damage, retrainable in seconds, and performs learned tasks in microseconds while dissipating only picojoules of energy across each transistor. This suggests enormous potential for fast, low-power computing in edge systems like sensors, robotic controllers, and medical devices, as well as manufacturability at scale for performing and studying emergent learning.
摘要
标准深度学习算法需要对大型非线性网络求微分,这一过程缓慢且耗电。电子学习超材料有望为模拟机器学习提供快速、高效且容错的硬件,但现有实现都是线性的,严重限制了其能力。这些系统与人工神经网络以及大脑都有显著差异,因此引入非线性元件的可行性与价值此前尚未被探索。我们在此提出一种非线性学习超材料——一种由基于晶体管的自调节非线性电阻元件构成的模拟电子网络。我们证明该系统无需计算机即可学习线性系统无法完成的任务,包括 XOR 和非线性回归。我们发现这种非线性学习超材料按(均值、斜率、曲率)的顺序依次消减训练误差的各个模式,类似于人工神经网络中的谱偏置。该电路对损坏具有鲁棒性,可在数秒内重新训练,并能在微秒级完成已学任务,每个晶体管仅消耗皮焦量级的能量。这表明其在传感器、机器人控制器和医疗设备等边缘系统中进行快速、低功耗计算,以及大规模制造以开展和研究涌现学习方面具有巨大潜力。
results: 这个智能噪音减少系统可以消除低频噪音,并且搭配睡眠追踪、音乐播放等应用程序,可以提供轻松、安全、智能的噪音减少解决方案。Abstract
While our world is filled with natural sounds that we cannot resist enjoying, it is also chock-full of other sounds that can be irritating: noise. Noise influences not only working efficiency but also human health. The problem of reducing noise is one of great importance and great difficulty, and it has been addressed in many ways over the years. Current methods for noise reduction mostly rely on materials and the transmission medium, which are only effective to some extent for high-frequency noise; effective methods for reducing low-frequency noise remain very limited. Here we propose a noise reduction system consisting of a sensor to detect the noise in the environment. The noise is then sent to an electronic control system, which generates a reverse-phase signal to counteract the disturbance. Finally, the speaker broadcasts this processed signal, leaving only a much weaker residual noise. Through this smart noise reduction system, even low-frequency noise can be eliminated. The system is also integrated with sleep tracking and music player applications. It can also remember and store settings for the same environment, sense temperature, and smartly control home furniture, fire alarms, etc. This smart system can transfer data easily by Wi-Fi or Bluetooth and is controlled by its app. In this project, we present a model of the above technology which can be used in various environments to prevent noise pollution and provide a solution for people who have difficulty finding a peaceful and quiet environment for sleep, work, or study.
摘要
在我们的世界中,充满了令人无法抗拒去享受的自然声音,但同时也充斥着令人烦躁的声音,即噪音。噪音不仅影响工作效率,还影响人体健康。减少噪音是一项非常重要且具有挑战性的问题。多年来,人们已经尝试了许多方法来解决这个问题,但现有的降噪方法主要依靠材料和传输介质,它们只能在一定程度上降低高频噪音;针对低频噪音的有效方法仍然十分有限。为了解决这一问题,我们提出了一种降噪系统,该系统包括一个用于检测环境噪音的传感器。噪音信号随后被传输到电子控制系统进行处理,该系统会生成一个反相信号以抵消干扰。最后,扬声器播放处理后的信号,使残余噪音大幅减小。通过这个智能降噪系统,即使是低频噪音也可以被消除。该系统还集成了睡眠跟踪和音乐播放应用程序,可以记录并存储同一环境下的设置,感测温度,并智能控制家具、火警等。该智能系统可以通过 Wi-Fi 或蓝牙轻松传输数据,并由其 APP 控制。在这个项目中,我们将展示一种可以在不同环境中应用的技术模型,用于防止噪音污染,为难以找到安静环境进行睡眠、工作或学习的人们提供解决方案。
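The core of the cancellation step described above is emitting a signal in anti-phase with the measured noise. A minimal sketch of that single idea on a buffered mono signal follows; it is idealized (no latency, no acoustic-path modelling, no adaptive filtering such as FxLMS that a real system would need), and the function name is illustrative:

```python
import numpy as np

def anti_noise(noise_frame: np.ndarray, gain: float = 1.0) -> np.ndarray:
    """Return the phase-inverted frame to be played through the speaker.

    In the ideal case, playing -noise on top of the noise sums to silence.
    Real systems must also estimate the speaker-to-ear acoustic path and
    adapt the filter online; this sketch only shows the phase inversion.
    """
    return -gain * noise_frame

# Toy check: a 100 Hz tone cancelled by its inverted copy.
fs = 16_000
t = np.arange(fs) / fs
noise = 0.5 * np.sin(2 * np.pi * 100 * t)
residual = noise + anti_noise(noise)
print(np.max(np.abs(residual)))   # ~0: perfect cancellation in the ideal case
```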
Real-Time Magnetic Tracking and Diagnosis of COVID-19 via Machine Learning
paper_authors: Dang Nguyen, Phat K. Huynh, Vinh Duc An Bui, Kee Young Hwang, Nityanand Jain, Chau Nguyen, Le Huu Nhat Minh, Le Van Truong, Xuan Thanh Nguyen, Dinh Hoang Nguyen, Le Tien Dung, Trung Q. Le, Manh-Huong Phan
results: 研究发现,这种检测平台可以准确地识别 COVID-19 患者和健康者,准确率高达 90%。Abstract
The COVID-19 pandemic underscored the importance of reliable, noninvasive diagnostic tools for robust public health interventions. In this work, we fused magnetic respiratory sensing technology (MRST) with machine learning (ML) to create a diagnostic platform for real-time tracking and diagnosis of COVID-19 and other respiratory diseases. The MRST precisely captures breathing patterns through three specific breath testing protocols: normal breath, holding breath, and deep breath. We collected breath data from both COVID-19 patients and healthy subjects in Vietnam using this platform, which then served to train and validate ML models. Our evaluation encompassed multiple ML algorithms, including support vector machines and deep learning models, assessing their ability to diagnose COVID-19. Our multi-model validation methodology ensures a thorough comparison and grants the adaptability to select the most optimal model, striking a balance between diagnostic precision with model interpretability. The findings highlight the exceptional potential of our diagnostic tool in pinpointing respiratory anomalies, achieving over 90% accuracy. This innovative sensor technology can be seamlessly integrated into healthcare settings for patient monitoring, marking a significant enhancement for the healthcare infrastructure.
摘要
COVID-19 大流行凸显了可靠、无创的诊断工具对于实施有效公共卫生干预的重要性。在这项工作中,我们将磁性呼吸感测技术(MRST)与机器学习(ML)相结合,构建了一个用于实时跟踪和诊断 COVID-19 及其他呼吸疾病的诊断平台。MRST 通过三种呼吸测试协议(正常呼吸、屏息和深呼吸)精确捕捉呼吸模式。我们使用该平台在越南采集了 COVID-19 患者和健康受试者的呼吸数据,用于训练和验证 ML 模型。我们的评估涵盖多种 ML 算法,包括支持向量机和深度学习模型,考察它们诊断 COVID-19 的能力。我们的多模型验证方法可以全面比较各种模型,从而灵活选择最优模型,在诊断精度与模型可解释性之间取得平衡。研究结果表明,我们的诊断工具在识别呼吸异常方面具有出色的潜力,准确率超过 90%。这种创新的传感技术可以无缝集成到医疗场景中用于患者监测,为医疗基础设施带来显著提升。
Retrieval-Based Reconstruction For Time-series Contrastive Learning
results: 该研究通过多种模态上的验证实验表明,在 REBAR 对比学习框架中可以学习到有效的嵌入,并且在下游任务上达到 state-of-the-art 的表现。Abstract
The success of self-supervised contrastive learning hinges on identifying positive data pairs that, when pushed together in embedding space, encode useful information for subsequent downstream tasks. However, in time-series, this is challenging because creating positive pairs via augmentations may break the original semantic meaning. We hypothesize that if we can retrieve information from one subsequence to successfully reconstruct another subsequence, then they should form a positive pair. Harnessing this intuition, we introduce our novel approach: REtrieval-BAsed Reconstruction (REBAR) contrastive learning. First, we utilize a convolutional cross-attention architecture to calculate the REBAR error between two different time-series. Then, through validation experiments, we show that the REBAR error is a predictor of mutual class membership, justifying its usage as a positive/negative labeler. Finally, once integrated into a contrastive learning framework, our REBAR method can learn an embedding that achieves state-of-the-art performance on downstream tasks across various modalities.
摘要
文章标题:基于检索重构的时间序列对比学习。文章摘要:自监督对比学习的成功取决于找到这样的正样本对——当它们在嵌入空间中被拉近时,能够编码对后续下游任务有用的信息。然而在时间序列中这很困难,因为通过数据增强构造正样本对可能破坏原有的语义。我们假设:如果能从一个子序列中检索信息并成功重构另一个子序列,那么它们就应当构成正样本对。基于这一直觉,我们提出了新方法:基于检索的重构(REtrieval-BAsed Reconstruction, REBAR)对比学习。首先,我们使用卷积交叉注意力架构来计算两个不同时间序列之间的 REBAR 误差;然后,通过验证实验表明 REBAR 误差可以预测两者是否属于同一类别,从而可以用作正/负样本标注器;最后,将 REBAR 方法集成到对比学习框架后,可以学习到在多种模态的下游任务上达到 state-of-the-art 表现的嵌入。
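To make the retrieval-based labelling idea concrete, the sketch below scores a candidate pair by how well one subsequence can be reconstructed from pieces retrieved out of the other. The learned convolutional cross-attention network is replaced here by a simple nearest-neighbour patch lookup, so this illustrates the labelling principle only, not the REBAR model; the patch size and threshold are arbitrary assumptions:

```python
import numpy as np

def retrieval_reconstruction_error(anchor, candidate, patch=8):
    """Reconstruct `candidate` from patches retrieved out of `anchor`.

    Both inputs are 1D arrays. Each non-overlapping patch of `candidate`
    is matched to its closest (L2) patch anywhere in `anchor`; the mean
    squared error of that reconstruction is returned. A low error suggests
    the two subsequences share class membership (a positive pair).
    """
    a_patches = np.lib.stride_tricks.sliding_window_view(anchor, patch)
    errs = []
    for start in range(0, len(candidate) - patch + 1, patch):
        target = candidate[start:start + patch]
        dists = np.linalg.norm(a_patches - target, axis=1)
        errs.append(np.min(dists) ** 2 / patch)
    return float(np.mean(errs))

def label_pair(anchor, candidate, threshold):
    """Positive if the retrieval-based reconstruction error is small."""
    return retrieval_reconstruction_error(anchor, candidate) < threshold
```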
Fixed-Budget Best-Arm Identification in Sparse Linear Bandits
results: 研究人员通过仔细选择超参数(如 Lasso 的正则化参数),并在两个阶段之间平衡错误概率,从而得到了较低的错误概率。此外,研究人员还证明了 Lasso-OD 算法在稀疏高维线性 bandit 问题中几乎是极小极大最优的。最后,通过数值示例,研究人员展示了 Lasso-OD 相对于面向非稀疏线性 bandit 的现有算法的显著性能提升。Abstract
We study the best-arm identification problem in sparse linear bandits under the fixed-budget setting. In sparse linear bandits, the unknown feature vector $\theta^*$ may be of large dimension $d$, but only a few, say $s \ll d$ of these features have non-zero values. We design a two-phase algorithm, Lasso and Optimal-Design- (Lasso-OD) based linear best-arm identification. The first phase of Lasso-OD leverages the sparsity of the feature vector by applying the thresholded Lasso introduced by Zhou (2009), which estimates the support of $\theta^*$ correctly with high probability using rewards from the selected arms and a judicious choice of the design matrix. The second phase of Lasso-OD applies the OD-LinBAI algorithm by Yang and Tan (2022) on that estimated support. We derive a non-asymptotic upper bound on the error probability of Lasso-OD by carefully choosing hyperparameters (such as Lasso's regularization parameter) and balancing the error probabilities of both phases. For fixed sparsity $s$ and budget $T$, the exponent in the error probability of Lasso-OD depends on $s$ but not on the dimension $d$, yielding a significant performance improvement for sparse and high-dimensional linear bandits. Furthermore, we show that Lasso-OD is almost minimax optimal in the exponent. Finally, we provide numerical examples to demonstrate the significant performance improvement over the existing algorithms for non-sparse linear bandits such as OD-LinBAI, BayesGap, Peace, LinearExploration, and GSE.
摘要
我们研究固定预算设定下稀疏线性 bandit 中的最佳臂识别问题。在稀疏线性 bandit 中,未知特征向量 $\theta^*$ 的维度 $d$ 可能很大,但只有少数($s \ll d$)特征取非零值。我们设计了一个两阶段算法,即基于 Lasso 和最优设计的线性最佳臂识别算法(Lasso-OD)。Lasso-OD 的第一阶段利用特征向量的稀疏性,采用 Zhou (2009) 提出的阈值化 Lasso,借助所选臂的奖励和精心选择的设计矩阵,以高概率正确估计 $\theta^*$ 的支撑集。第二阶段则在估计出的支撑集上运行 Yang 和 Tan (2022) 提出的 OD-LinBAI 算法。我们通过仔细选择超参数(如 Lasso 的正则化参数)并平衡两个阶段的错误概率,推导出 Lasso-OD 错误概率的非渐近上界。对于固定的稀疏度 $s$ 和预算 $T$,Lasso-OD 错误概率的指数只依赖于 $s$ 而不依赖于维度 $d$,从而在稀疏高维线性 bandit 中带来显著的性能提升。此外,我们还证明 Lasso-OD 在指数意义下几乎是极小极大最优的。最后,我们给出数值示例,展示其相对于 OD-LinBAI、BayesGap、Peace、LinearExploration 和 GSE 等面向非稀疏线性 bandit 的现有算法的显著性能提升。
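Phase one of Lasso-OD estimates the support of $\theta^*$ with a thresholded Lasso. A small scikit-learn sketch of that step follows, using a generic Lasso fit plus a hard threshold; the regularization strength `lam` and threshold `tau` are illustrative placeholders rather than the paper's tuned choices:

```python
import numpy as np
from sklearn.linear_model import Lasso

def estimate_support(X, rewards, lam=0.1, tau=0.05):
    """Thresholded Lasso support estimation.

    X: (n, d) feature matrix of the arms pulled in phase one.
    rewards: (n,) observed rewards.
    Returns indices of coordinates whose Lasso estimate exceeds tau.
    """
    theta_hat = Lasso(alpha=lam, fit_intercept=False).fit(X, rewards).coef_
    return np.flatnonzero(np.abs(theta_hat) > tau)

# Toy example: d = 100 features, only 3 of them active.
rng = np.random.default_rng(0)
d, n, support = 100, 200, [3, 17, 42]
theta = np.zeros(d)
theta[support] = [1.0, -0.8, 0.5]
X = rng.standard_normal((n, d))
y = X @ theta + 0.1 * rng.standard_normal(n)
print(estimate_support(X, y))   # ideally recovers {3, 17, 42}
```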
results: 论文通过对一些常见的 Bayesian 模型进行评估,显示了 DMVI 的 posterior 推理结果比 contemporary 方法在 PPL 中更为准确,同时 computation cost 相对 similar,且需要更少的手动调整。Abstract
We propose Diffusion Model Variational Inference (DMVI), a novel method for automated approximate inference in probabilistic programming languages (PPLs). DMVI utilizes diffusion models as variational approximations to the true posterior distribution by deriving a novel bound to the marginal likelihood objective used in Bayesian modelling. DMVI is easy to implement, allows hassle-free inference in PPLs without the drawbacks of, e.g., variational inference using normalizing flows, and does not make any constraints on the underlying neural network model. We evaluate DMVI on a set of common Bayesian models and show that its posterior inferences are in general more accurate than those of contemporary methods used in PPLs while having a similar computational cost and requiring less manual tuning.
摘要
我们提出了 Diffusion Model Variational Inference(DMVI),一种用于概率编程语言(PPL)中自动化近似推断的新方法。DMVI 将扩散模型用作真实后验分布的变分近似,其做法是为贝叶斯建模中使用的边际似然目标推导出一个新的界。DMVI 易于实现,可以在 PPL 中进行省心的推断,没有诸如基于 normalizing flows 的变分推断等方法的缺点,并且不对底层神经网络模型施加任何约束。我们在一组常见的贝叶斯模型上进行评估,发现 DMVI 的后验推断通常比当前 PPL 中使用的方法更准确,同时计算成本相近,所需的手动调参也更少。
Asynchronous SGD on Graphs: a Unified Framework for Asynchronous Decentralized and Federated Optimization
results: 提供了一个通用的算法框架,可以涵盖多种流行算法的异步版本,如 SGD、分布式 SGD、本地 SGD、FedBuff,并且在更宽泛的假设下给出了收敛速率。Abstract
Decentralized and asynchronous communications are two popular techniques to speedup communication complexity of distributed machine learning, by respectively removing the dependency over a central orchestrator and the need for synchronization. Yet, combining these two techniques together still remains a challenge. In this paper, we take a step in this direction and introduce Asynchronous SGD on Graphs (AGRAF SGD) -- a general algorithmic framework that covers asynchronous versions of many popular algorithms including SGD, Decentralized SGD, Local SGD, FedBuff, thanks to its relaxed communication and computation assumptions. We provide rates of convergence under much milder assumptions than previous decentralized asynchronous works, while still recovering or even improving over the best known results for all the algorithms covered.
摘要
去中心化通信和异步通信是加速分布式机器学习通信复杂度的两种流行技术,前者消除了对中央协调器的依赖,后者消除了同步需求。然而,将这两种技术结合起来仍然是一项挑战。本文朝这个方向迈出了一步,提出了异步图上 SGD(AGRAF SGD)——一个通用的算法框架,得益于其宽松的通信与计算假设,它涵盖了 SGD、分布式 SGD、本地 SGD、FedBuff 等多种流行算法的异步版本。我们在比以往去中心化异步工作宽松得多的假设下给出了收敛速率,同时对于所涵盖的所有算法,仍然能够达到甚至超越已知的最佳结果。
for: This paper aims to improve the robustness of Gaussian process (GP) regression by developing a provably robust and conjugate Gaussian process (RCGP) regression method.
methods: The RCGP method uses generalised Bayesian inference to perform provably robust and conjugate closed-form updates at virtually no additional cost.
results: The paper demonstrates the strong empirical performance of RCGP on a range of problems, including Bayesian optimisation and sparse variational Gaussian processes.Abstract
To enable closed form conditioning, a common assumption in Gaussian process (GP) regression is independent and identically distributed Gaussian observation noise. This strong and simplistic assumption is often violated in practice, which leads to unreliable inferences and uncertainty quantification. Unfortunately, existing methods for robustifying GPs break closed-form conditioning, which makes them less attractive to practitioners and significantly more computationally expensive. In this paper, we demonstrate how to perform provably robust and conjugate Gaussian process (RCGP) regression at virtually no additional cost using generalised Bayesian inference. RCGP is particularly versatile as it enables exact conjugate closed form updates in all settings where standard GPs admit them. To demonstrate its strong empirical performance, we deploy RCGP for problems ranging from Bayesian optimisation to sparse variational Gaussian processes.
摘要
为了实现闭式的条件化推断,高斯过程(GP)回归中常见的假设是观测噪声独立同分布且服从高斯分布。这一强而简化的假设在实践中经常被违背,导致不可靠的推断和不确定性量化。不幸的是,现有的 GP 鲁棒化方法会破坏闭式条件化,使其对实践者的吸引力下降,并显著增加计算成本。在这篇论文中,我们展示了如何利用广义贝叶斯推断,在几乎不增加额外成本的情况下实现可证明鲁棒且共轭的高斯过程(RCGP)回归。RCGP 尤为通用:在标准 GP 允许精确共轭闭式更新的所有情形下,它同样允许这样的更新。为了证明其出色的实证性能,我们将 RCGP 应用于从贝叶斯优化到稀疏变分高斯过程等一系列问题。
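As a rough illustration of the problem being addressed (not of RCGP's generalised-Bayes update itself), the sketch below contrasts a standard conjugate GP posterior mean with a crude robustification that simply inflates the noise variance of points flagged as outliers; RCGP instead keeps exact conjugate closed-form updates while achieving provable robustness. The kernel choice and all constants here are assumptions:

```python
import numpy as np

def rbf(a, b, ls=1.0):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def gp_posterior_mean(X, y, Xs, noise_var):
    """Standard conjugate GP regression mean with per-point noise variances."""
    K = rbf(X, X) + np.diag(noise_var)
    return rbf(Xs, X) @ np.linalg.solve(K, y)

def crude_robust_noise(y, base=0.05, inflate=25.0, z=3.0):
    """Inflate the noise variance of observations far from the median (MAD rule)."""
    mad = 1.4826 * np.median(np.abs(y - np.median(y))) + 1e-12
    score = np.abs(y - np.median(y)) / mad
    return np.where(score > z, base * inflate, base)

rng = np.random.default_rng(1)
X = np.linspace(0.0, 6.0, 40)
y = np.sin(X) + 0.1 * rng.standard_normal(40)
y[5] += 4.0                                   # a gross outlier
Xs = np.linspace(0.0, 6.0, 200)
m_plain = gp_posterior_mean(X, y, Xs, np.full(40, 0.05))   # pulled toward the outlier
m_robust = gp_posterior_mean(X, y, Xs, crude_robust_noise(y))
```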
Optimal Budgeted Rejection Sampling for Generative Models
paper_authors: Alexandre Verine, Muni Sreenivas Pydi, Benjamin Negrevergne, Yann Chevaleyre
for: 提高基于判别器的生成模型所生成样本的质量和多样性
methods: 使用优化采样方法,包括提出的最优预算采样方案和综合训练方法
results: 通过实验和理论支持,显示提出的方法可以显著提高样本质量和多样性Abstract
Rejection sampling methods have recently been proposed to improve the performance of discriminator-based generative models. However, these methods are only optimal under an unlimited sampling budget, and are usually applied to a generator trained independently of the rejection procedure. We first propose an Optimal Budgeted Rejection Sampling (OBRS) scheme that is provably optimal with respect to \textit{any} $f$-divergence between the true distribution and the post-rejection distribution, for a given sampling budget. Second, we propose an end-to-end method that incorporates the sampling scheme into the training procedure to further enhance the model's overall performance. Through experiments and supporting theory, we show that the proposed methods are effective in significantly improving the quality and diversity of the samples.
摘要
最近,有人提出拒绝采样方法来提升基于判别器的生成模型的性能。然而,这些方法只有在采样预算无限时才是最优的,而且通常应用于独立于拒绝过程训练的生成器。我们首先提出一种最优预算拒绝采样(OBRS)方案,对于给定的采样预算,它对真实分布与拒绝后分布之间的任意 $f$-散度都是可证明最优的。其次,我们提出一种端到端方法,将该采样方案纳入训练过程,以进一步提升模型的整体性能。通过实验和配套理论,我们证明了所提方法能够显著改善样本的质量和多样性。
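A plain budgeted rejection sampler, the baseline that OBRS improves on, is easy to sketch: draw at most a fixed number of proposals from the generator and accept each with probability proportional to a (capped) density-ratio estimate. The code below is that baseline, not the paper's optimal acceptance rule; the toy Gaussian example and the cap `M` are illustrative:

```python
import numpy as np

def budgeted_rejection_sample(generator, density_ratio, budget, M, rng=None):
    """Standard rejection sampling truncated at a fixed proposal budget.

    generator():       draws one sample x from the model distribution.
    density_ratio(x):  estimate of p_data(x) / p_model(x), e.g. d(x)/(1-d(x))
                       from a trained discriminator d.
    budget:            maximum number of proposals allowed.
    M:                 upper bound (or cap) on the density ratio.
    """
    rng = np.random.default_rng() if rng is None else rng
    accepted = []
    for _ in range(budget):
        x = generator()
        if rng.uniform() < min(1.0, density_ratio(x) / M):
            accepted.append(x)
    return accepted

# Toy usage: resample an N(0, 2^2) proposal toward an N(0, 1) target.
# The exact ratio is 2 * exp(-3 x^2 / 8), which is bounded by M = 2.
ratio = lambda x: 2.0 * np.exp(-0.375 * x ** 2)
draws = budgeted_rejection_sample(lambda: np.random.normal(0.0, 2.0),
                                  ratio, budget=10_000, M=2.0)
```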
Hessian Eigenvectors and Principal Component Analysis of Neural Network Weight Matrices
results: 研究揭示了 Hessian 特征向量与网络权重之间的关系,并提出了一种基于这种关系来缓解灾难性遗忘的方法。该方法可以应用于不同规模的神经网络,包括更大的网络架构。Abstract
This study delves into the intricate dynamics of trained deep neural networks and their relationships with network parameters. Trained networks predominantly continue training in a single direction, known as the drift mode. This drift mode can be explained by the quadratic potential model of the loss function, suggesting a slow exponential decay towards the potential minima. We unveil a correlation between Hessian eigenvectors and network weights. This relationship, hinging on the magnitude of eigenvalues, allows us to discern parameter directions within the network. Notably, the significance of these directions relies on two defining attributes: the curvature of their potential wells (indicated by the magnitude of Hessian eigenvalues) and their alignment with the weight vectors. Our exploration extends to the decomposition of weight matrices through singular value decomposition. This approach proves practical in identifying critical directions within the Hessian, considering both their magnitude and curvature. Furthermore, our examination showcases the applicability of principal component analysis in approximating the Hessian, with update parameters emerging as a superior choice over weights for this purpose. Remarkably, our findings unveil a similarity between the largest Hessian eigenvalues of individual layers and the entire network. Notably, higher eigenvalues are concentrated more in deeper layers. Leveraging these insights, we venture into addressing catastrophic forgetting, a challenge of neural networks when learning new tasks while retaining knowledge from previous ones. By applying our discoveries, we formulate an effective strategy to mitigate catastrophic forgetting, offering a possible solution that can be applied to networks of varying scales, including larger architectures.
摘要
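The Hessian directions discussed in this entry are usually obtained without forming the full Hessian, via Hessian-vector products inside a power iteration. A short PyTorch sketch of that standard procedure follows (a generic autograd recipe, not the authors' code; convergence checks are omitted):

```python
import torch

def top_hessian_eigenpair(loss, params, iters=100):
    """Power iteration on the loss Hessian using Hessian-vector products.

    loss:   scalar tensor computed from the model on a data batch.
    params: list of parameter tensors with requires_grad=True.
    Returns (eigenvalue, eigenvector) for the dominant eigenvalue.
    """
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    v = torch.randn_like(flat_grad)
    v /= v.norm()
    eig = torch.tensor(0.0)
    for _ in range(iters):
        # Hessian-vector product: d(grad . v)/d(params) = H v
        hv = torch.autograd.grad(flat_grad @ v, params, retain_graph=True)
        hv = torch.cat([h.reshape(-1) for h in hv]).detach()
        eig = v @ hv                      # Rayleigh quotient (v has unit norm)
        v = hv / (hv.norm() + 1e-12)
    return eig.item(), v
```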
Form follows Function: Text-to-Text Conditional Graph Generation based on Functional Requirements
paper_authors: Peter A. Zachares, Vahan Hovhannisyan, Alan Mosca, Yarin Gal
for: 本研究针对以图的功能需求描述为条件的图生成问题,提出了一种新的问题设定。
methods: 我们将该问题表述为文本到文本生成问题,并提出了一种基于预训练大型语言模型(LLM)的方法,通过在 LLM 架构中加入消息传递层,将图结构信息引入其生成过程。
results: 我们在公开且被广泛研究的分子和知识图数据集上设计了一组新的实验来评估所提方法。结果表明,与在类似任务上构建的基线方法相比,我们的方法生成的图能更好地满足所要求的功能需求,且差异在统计上显著。Abstract
This work focuses on the novel problem setting of generating graphs conditioned on a description of the graph's functional requirements in a downstream task. We pose the problem as a text-to-text generation problem and focus on the approach of fine-tuning a pretrained large language model (LLM) to generate graphs. We propose an inductive bias which incorporates information about the structure of the graph into the LLM's generation process by incorporating message passing layers into an LLM's architecture. To evaluate our proposed method, we design a novel set of experiments using publicly available and widely studied molecule and knowledge graph data sets. Results suggest our proposed approach generates graphs which more closely meet the requested functional requirements, outperforming baselines developed on similar tasks by a statistically significant margin.
摘要
Crop Disease Classification using Support Vector Machines with Green Chromatic Coordinate (GCC) and Attention based feature extraction for IoT based Smart Agricultural Applications
for: 帮助农民快速、准确地识别作物疾病,保障农业产量和粮食安全。
methods: 使用基于注意力的特征提取、基于 RGB 通道的色彩分析、支持向量机(SVM)等机器学习与深度学习算法,并可在信息量化后与移动应用程序和物联网设备集成。
results: 提出一种建立在先前研究之上的新分类方法,结合基于注意力的特征提取、基于 RGB 通道的色彩分析与 SVM,可与移动应用程序和物联网设备集成,并在准确率上优于其他算法,达到 99.69% 的精度。Abstract
Crops hold paramount significance as they serve as the primary provider of energy, nutrition, and medicinal benefits for the human population. Plant diseases, however, can negatively affect leaves during agricultural cultivation, resulting in significant losses in crop output and economic value. It is therefore crucial for farmers to identify crop diseases; however, doing so frequently necessitates hard work, a lot of planning, and in-depth familiarity with plant pathogens. Given these numerous obstacles, it is essential to provide solutions that can easily interface with mobile and IoT devices so that farmers can guarantee the best possible crop development. Various machine learning (ML) as well as deep learning (DL) algorithms have been created and studied for plant disease detection, yielding substantial and promising results. This article presents a novel classification method that builds on prior work by utilising attention-based feature extraction, RGB channel-based chromatic analysis, Support Vector Machines (SVM) for improved performance, and the ability to integrate with mobile applications and IoT devices after quantization of information. Several disease classification algorithms were compared with the suggested model; in terms of accuracy, the proposed GCCViT-SVM approach, which combines Vision Transformer-based feature extraction with an additional Green Chromatic Coordinate feature and SVM classification, achieved 99.69%, while the version quantized for IoT device integration achieved 97.41% with an almost 4x reduction in size. Our findings have profound implications because they have the potential to transform how farmers identify crop illnesses with precise and fast information, thereby preserving agricultural output and ensuring food security.
摘要
This article presents a novel classification method that leverages attention-based feature extraction, RGB channel-based chromatic analysis, and Support Vector Machines (SVM) for improved performance. The method also has the ability to integrate with mobile applications and IoT devices after quantization of information. Several disease classification algorithms were compared with the proposed model, and the results showed that the Vision Transformer-based feature extraction and additional Green Chromatic Coordinate feature with SVM classification achieved an accuracy of 99.69%, while the quantized model achieved an accuracy of 97.41% with a reduction of almost 4x in size.These findings have significant implications for the agricultural industry, as they have the potential to revolutionize how farmers identify crop diseases with precise and fast information, ensuring food security and preserving agricultural output.
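The Green Chromatic Coordinate feature mentioned above is just a per-pixel normalization of the green channel, independent of the attention/ViT part of the pipeline. A minimal NumPy sketch of the standard definition follows (the random example image and the idea of feeding the mean GCC to the SVM are illustrative):

```python
import numpy as np

def green_chromatic_coordinate(rgb: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """GCC = G / (R + G + B), computed per pixel.

    rgb: array of shape (H, W, 3) with channels ordered R, G, B.
    Returns an (H, W) map in [0, 1]; higher values indicate greener pixels,
    which helps separate healthy foliage from lesions.
    """
    rgb = rgb.astype(np.float64)
    total = rgb.sum(axis=-1) + eps
    return rgb[..., 1] / total

# Example: the mean GCC over a leaf image could be one feature fed to the SVM.
img = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)
print(green_chromatic_coordinate(img).mean())
```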
NEO-KD: Knowledge-Distillation-Based Adversarial Training for Robust Multi-Exit Neural Networks
results: 实验结果表明,与基于现有对抗训练或知识蒸馏技术的基线相比,我们的方法在不同的数据集/模型上以更低的计算开销取得了最佳的对抗准确率。Abstract
While multi-exit neural networks are regarded as a promising solution for making efficient inference via early exits, combating adversarial attacks remains a challenging problem. In multi-exit networks, due to the high dependency among different submodels, an adversarial example targeting a specific exit not only degrades the performance of the target exit but also reduces the performance of all other exits concurrently. This makes multi-exit networks highly vulnerable to simple adversarial attacks. In this paper, we propose NEO-KD, a knowledge-distillation-based adversarial training strategy that tackles this fundamental challenge based on two key contributions. NEO-KD first resorts to neighbor knowledge distillation to guide the output of the adversarial examples to tend to the ensemble outputs of neighbor exits of clean data. NEO-KD also employs exit-wise orthogonal knowledge distillation for reducing adversarial transferability across different submodels. The result is a significantly improved robustness against adversarial attacks. Experimental results on various datasets/models show that our method achieves the best adversarial accuracy with reduced computation budgets, compared to the baselines relying on existing adversarial training or knowledge distillation techniques for multi-exit networks.
摘要
多出口神经网络被视为借助提前退出实现高效推理的有前途的方案,但抵御对抗攻击仍然是一个难题。在多出口网络中,由于各子模型之间的高度依赖,针对某个特定出口的对抗样本不仅会降低该出口的性能,还会同时降低所有其他出口的性能,这使得多出口网络对简单的对抗攻击非常脆弱。在这篇论文中,我们提出了 NEO-KD,一种基于知识蒸馏的对抗训练策略,通过两项关键贡献来应对这一根本挑战。NEO-KD 首先利用邻居知识蒸馏,引导对抗样本的输出趋向于干净数据在邻近出口上的集成输出;其次,NEO-KD 采用逐出口正交知识蒸馏,以降低对抗样本在不同子模型之间的可迁移性。其结果是对抗鲁棒性得到显著提升。在多个数据集/模型上的实验结果表明,与依赖现有对抗训练或知识蒸馏技术的多出口网络基线相比,我们的方法以更低的计算开销取得了最佳的对抗准确率。
Uncertainty quantification and out-of-distribution detection using surjective normalizing flows
results: 作者将方法应用于一个由机理模型仿真生成的合成数据集,以及若干由对该模型施加软干预和原子干预所诱导的干预分布生成的数据集,并证明了该方法能够可靠地区分分布内数据与分布外数据。作者与 Dirichlet 过程混合模型和双射流模型进行了比较,发现满射结构是可靠区分分布内与分布外数据的关键组成部分。Abstract
Reliable quantification of epistemic and aleatoric uncertainty is of crucial importance in applications where models are trained in one environment but applied to multiple different environments, often seen in real-world applications for example, in climate science or mobility analysis. We propose a simple approach using surjective normalizing flows to identify out-of-distribution data sets in deep neural network models that can be computed in a single forward pass. The method builds on recent developments in deep uncertainty quantification and generative modeling with normalizing flows. We apply our method to a synthetic data set that has been simulated using a mechanistic model from the mobility literature and several data sets simulated from interventional distributions induced by soft and atomic interventions on that model, and demonstrate that our method can reliably discern out-of-distribution data from in-distribution data. We compare the surjective flow model to a Dirichlet process mixture model and a bijective flow and find that the surjections are a crucial component to reliably distinguish in-distribution from out-of-distribution data.
摘要
在模型于某一环境中训练、却要应用于多个不同环境的场景中(这在气候科学或出行分析等实际应用中很常见),可靠地量化认知不确定性和偶然不确定性至关重要。我们提出了一种简单的方法,利用满射归一化流在深度神经网络模型中识别分布外数据集,且只需一次前向传播即可完成计算。该方法建立在深度不确定性量化与基于归一化流的生成建模的最新进展之上。我们将该方法应用于一个由出行领域文献中的机理模型仿真生成的合成数据集,以及若干由对该模型施加软干预和原子干预所诱导的干预分布生成的数据集,并证明了我们的方法能够可靠地区分分布内数据与分布外数据。我们与 Dirichlet 过程混合模型和双射流模型进行比较,发现满射结构是可靠区分分布内与分布外数据的关键组成部分。
Performance Optimization of Deep Learning Sparse Matrix Kernels on Intel Max Series GPU
results: 作者们的实现在目标 Intel 数据中心 GPU 上达到了接近峰值的性能;与 Intel GPU 上的 Intel oneMKL 库以及 NVIDIA V100 GPU 上最近的一个 CUDA 实现相比,结果表明其稀疏矩阵操作的实现均优于两者。Abstract
In this paper, we focus on three sparse matrix operations that are relevant for machine learning applications, namely, the sparse-dense matrix multiplication (SPMM), the sampled dense-dense matrix multiplication (SDDMM), and the composition of the SDDMM with SPMM, also termed as FusedMM. We develop optimized implementations for SPMM, SDDMM, and FusedMM operations utilizing Intel oneAPI's Explicit SIMD (ESIMD) SYCL extension API. In contrast to CUDA or SYCL, the ESIMD API enables the writing of explicitly vectorized kernel code. Sparse matrix algorithms implemented with the ESIMD API achieved performance close to the peak of the targeted Intel Data Center GPU. We compare our performance results to Intel's oneMKL library on Intel GPUs and to a recent CUDA implementation for the sparse matrix operations on NVIDIA's V100 GPU and demonstrate that our implementations for sparse matrix operations outperform either.
摘要
在这篇论文中,我们关注与机器学习应用相关的三种稀疏矩阵操作,即稀疏-稠密矩阵乘法(SPMM)、采样稠密-稠密矩阵乘法(SDDMM),以及 SDDMM 与 SPMM 的组合(称为 FusedMM)。我们利用 Intel oneAPI 的显式 SIMD(ESIMD)SYCL 扩展 API,为 SPMM、SDDMM 和 FusedMM 操作开发了优化实现。与 CUDA 或 SYCL 不同,ESIMD API 允许编写显式向量化的 kernel 代码。使用 ESIMD API 实现的稀疏矩阵算法在目标 Intel 数据中心 GPU 上达到了接近峰值的性能。我们将性能结果与 Intel GPU 上的 Intel oneMKL 库以及 NVIDIA V100 GPU 上最近的一个 CUDA 稀疏矩阵实现进行比较,证明了我们的稀疏矩阵操作实现优于两者。
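For reference, the semantics of the three kernels can be written down in a few lines of SciPy; this is only a readable specification of what SPMM, SDDMM, and FusedMM compute (using one common convention that scales SDDMM by the sparse values), not the paper's ESIMD GPU implementation:

```python
import numpy as np
import scipy.sparse as sp

def spmm(S, B):
    """Sparse-dense matrix multiply: (sparse S) @ (dense B)."""
    return S @ B

def sddmm(S, A, B):
    """Sampled dense-dense matrix multiply.

    Computes A @ B.T only at the nonzero positions of S, scaled by S's
    values, and returns a sparse matrix with S's sparsity pattern.
    """
    C = S.tocoo()
    vals = np.einsum("ij,ij->i", A[C.row], B[C.col]) * C.data
    return sp.csr_matrix((vals, (C.row, C.col)), shape=S.shape)

def fusedmm(S, A, B):
    """FusedMM: an SDDMM followed by an SpMM, as used in graph neural networks."""
    return sddmm(S, A, B) @ B

# Toy shapes: a 4x4 adjacency-like pattern, feature dimension 3.
S = sp.random(4, 4, density=0.5, format="csr", random_state=0)
A, B = np.random.rand(4, 3), np.random.rand(4, 3)
out = fusedmm(S, A, B)   # shape (4, 3)
```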
Adversarially Robust Distributed Count Tracking via Partial Differential Privacy
results: 我们给出了一个通信代价最优的鲁棒算法。由于该问题的分布式特性,现有的鲁棒化技术无法达到最优界;为此,我们通过引入"部分差分隐私"扩展了差分隐私框架,并证明了一个新的泛化定理。该定理的应用可能超出鲁棒计数跟踪本身,因而具有独立的价值。Abstract
We study the distributed tracking model, also known as distributed functional monitoring. This model involves $k$ sites each receiving a stream of items and communicating with the central server. The server's task is to track a function of all items received thus far continuously, with minimum communication cost. For count tracking, it is known that there is a $\sqrt{k}$ gap in communication between deterministic and randomized algorithms. However, existing randomized algorithms assume an "oblivious adversary" who constructs the entire input streams before the algorithm starts. Here we consider adaptive adversaries who can choose new items based on previous answers from the algorithm. Deterministic algorithms are trivially robust to adaptive adversaries, while randomized ones may not. Therefore, we investigate whether the $\sqrt{k}$ advantage of randomized algorithms is from randomness itself or the oblivious adversary assumption. We provide an affirmative answer to this question by giving a robust algorithm with optimal communication. Existing robustification techniques do not yield optimal bounds due to the inherent challenges of the distributed nature of the problem. To address this, we extend the differential privacy framework by introducing "partial differential privacy" and proving a new generalization theorem. This theorem may have broader applications beyond robust count tracking, making it of independent interest.
摘要
我们研究分布式跟踪模型,也称为分布式函数监测。该模型中有 $k$ 个站点,每个站点接收一个数据项流并与中央服务器通信。服务器的任务是以最小的通信代价持续跟踪迄今收到的所有数据项的某个函数。对于计数跟踪,已知确定性算法与随机算法之间存在 $\sqrt{k}$ 的通信差距。然而,现有的随机算法假设存在一个"无感知对手"(oblivious adversary),即对手在算法开始前就构造好全部输入流。我们在此考虑自适应对手,它可以根据算法此前的回答选择新的数据项。确定性算法对自适应对手天然具有鲁棒性,而随机算法则未必如此。因此,我们研究随机算法的 $\sqrt{k}$ 优势究竟来自随机性本身,还是来自无感知对手假设。我们通过给出一个通信代价最优的鲁棒算法,对这一问题作出了肯定回答。由于该问题的分布式特性,现有的鲁棒化技术无法达到最优界;为此,我们通过引入"部分差分隐私"扩展了差分隐私框架,并证明了一个新的泛化定理。该定理的应用可能超出鲁棒计数跟踪本身,因而具有独立的价值。
The Open DAC 2023 Dataset and Challenges for Sorbent Discovery in Direct Air Capture
paper_authors: Anuroop Sriram, Sihoon Choi, Xiaohan Yu, Logan M. Brabson, Abhishek Das, Zachary Ulissi, Matt Uyttendaele, Andrew J. Medford, David S. Sholl
results: 该论文提供了一个名为 Open DAC 2023(ODAC23)的开源数据集,包含对 8,800 多种 MOF 材料进行的超过 3800 万次 DFT 计算,并经过深入分析以提取 MOF 材料的性质。此外,该论文还在该数据集上训练了最先进的 ML 模型,以近似 DFT 级别的计算。Abstract
New methods for carbon dioxide removal are urgently needed to combat global climate change. Direct air capture (DAC) is an emerging technology to capture carbon dioxide directly from ambient air. Metal-organic frameworks (MOFs) have been widely studied as potentially customizable adsorbents for DAC. However, discovering promising MOF sorbents for DAC is challenging because of the vast chemical space to explore and the need to understand materials as functions of humidity and temperature. We explore a computational approach benefiting from recent innovations in machine learning (ML) and present a dataset named Open DAC 2023 (ODAC23) consisting of more than 38M density functional theory (DFT) calculations on more than 8,800 MOF materials containing adsorbed CO2 and/or H2O. ODAC23 is by far the largest dataset of MOF adsorption calculations at the DFT level of accuracy currently available. In addition to probing properties of adsorbed molecules, the dataset is a rich source of information on structural relaxation of MOFs, which will be useful in many contexts beyond specific applications for DAC. A large number of MOFs with promising properties for DAC are identified directly in ODAC23. We also trained state-of-the-art ML models on this dataset to approximate calculations at the DFT level. This open-source dataset and our initial ML models will provide an important baseline for future efforts to identify MOFs for a wide range of applications, including DAC.
摘要
迫切需要新的二氧化碳去除方法来应对全球气候变化。直接空气捕集(DAC)是一种直接从环境空气中捕集二氧化碳的新兴技术。金属有机框架(MOF)作为可定制的吸附剂被广泛研究用于 DAC。然而,由于需要探索的化学空间极为庞大,且必须理解材料性质随湿度和温度的变化,为 DAC 发现有前景的 MOF 吸附剂十分困难。我们探索了一种受益于机器学习(ML)最新进展的计算方法,并发布了名为 Open DAC 2023(ODAC23)的数据集,其中包含对 8,800 多种含有吸附 CO2 和/或 H2O 的 MOF 材料进行的超过 3800 万次密度泛函理论(DFT)计算。ODAC23 是目前可用的、达到 DFT 精度水平的最大 MOF 吸附计算数据集。除了探究吸附分子的性质外,该数据集还蕴含丰富的 MOF 结构弛豫信息,在 DAC 之外的许多场景中同样有用。我们直接在 ODAC23 中识别出大量对 DAC 具有优良性质的 MOF。我们还在该数据集上训练了最先进的 ML 模型,以近似 DFT 级别的计算。这个开源数据集和我们的初始 ML 模型将为未来面向包括 DAC 在内的广泛应用识别 MOF 的工作提供重要基线。
Latent Space Inference For Spatial Transcriptomics
results: 研究发现,通过将这两种数据映射到共同的潜在空间表示中,可以同时获取细胞的基因表达信息及其空间坐标,从而更深入地理解细胞生物学过程和通路。Abstract
In order to understand the complexities of cellular biology, researchers are interested in two important metrics: the genetic expression information of cells and their spatial coordinates within a tissue sample. However, state-of-the art methods, namely single-cell RNA sequencing and image based spatial transcriptomics can only recover a subset of this information, either full genetic expression with loss of spatial information, or spatial information with loss of resolution in sequencing data. In this project, we investigate a probabilistic machine learning method to obtain the full genetic expression information for tissues samples while also preserving their spatial coordinates. This is done through mapping both datasets to a joint latent space representation with the use of variational machine learning methods. From here, the full genetic and spatial information can be decoded and to give us greater insights on the understanding of cellular processes and pathways.
摘要
为了理解细胞生物学的复杂性,研究人员关注两个重要指标:细胞的基因表达信息及其在组织样本中的空间坐标。然而,现有的先进技术,即单细胞 RNA 测序和基于成像的空间转录组学,只能各自恢复这些信息的一部分:要么获得完整的基因表达却丢失空间信息,要么获得空间信息却损失测序数据的分辨率。在本项目中,我们研究一种概率机器学习方法,在保留组织样本空间坐标的同时获取完整的基因表达信息。具体做法是利用变分机器学习方法,将两类数据映射到一个共同的潜在空间表示中;由此可以解码出完整的基因表达与空间信息,从而更深入地理解细胞生物学过程和通路。
Multi-task Representation Learning for Pure Exploration in Bilinear Bandits
results: 本文的结果表明,通过共享表示来加速找到每个任务中的优化对象,可以减少样本数量,比传统独立解决每个任务的方法更有效。Abstract
We study multi-task representation learning for the problem of pure exploration in bilinear bandits. In bilinear bandits, an action takes the form of a pair of arms from two different entity types and the reward is a bilinear function of the known feature vectors of the arms. In the \textit{multi-task bilinear bandit problem}, we aim to find optimal actions for multiple tasks that share a common low-dimensional linear representation. The objective is to leverage this characteristic to expedite the process of identifying the best pair of arms for all tasks. We propose the algorithm GOBLIN that uses an experimental design approach to optimize sample allocations for learning the global representation as well as minimize the number of samples needed to identify the optimal pair of arms in individual tasks. To the best of our knowledge, this is the first study to give sample complexity analysis for pure exploration in bilinear bandits with shared representation. Our results demonstrate that by learning the shared representation across tasks, we achieve significantly improved sample complexity compared to the traditional approach of solving tasks independently.
摘要
我们研究多任务表示学习 Bilinear bandits 中的纯探索问题。在 Bilinear bandits 中,一个动作是两个不同类型的 arm 的对,奖励是两个known feature vector 的 bilinear函数。在多任务 Bilinear bandit 问题中,我们目标是找到多个任务共享的低维度线性表示,并利用这个特点来快速确定所有任务的最佳对。我们提出了 GOBLIN 算法,使用实验设计方法来优化样本分配以学习全局表示,以及最小化每个任务中的样本数量。根据我们所知,这是首次对 Bilinear bandits 中纯探索问题进行样本复杂度分析的研究。我们的结果表明,通过学习共享表示,我们可以在每个任务中实现明显改善的样本复杂度。
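In the bilinear bandit setting above, the reward of pulling an arm pair $(x, z)$ is a bilinear form in the two feature vectors. A tiny simulator sketch of that reward model follows (a generic environment for intuition, not GOBLIN's algorithm; the rank-1 $\Theta$ in the example is an arbitrary choice):

```python
import numpy as np

class BilinearBanditEnv:
    """Reward model r = x^T Theta z + noise for an arm pair (x, z).

    Theta is the unknown (d1 x d2) parameter matrix; in the multi-task
    setting each task has its own Theta, and the tasks share a common
    low-dimensional representation that GOBLIN tries to exploit.
    """
    def __init__(self, theta, noise_std=0.1, rng=None):
        self.theta = np.asarray(theta)
        self.noise_std = noise_std
        self.rng = np.random.default_rng() if rng is None else rng

    def pull(self, x, z):
        mean = float(x @ self.theta @ z)
        return mean + self.noise_std * self.rng.standard_normal()

# Toy usage: a rank-1 Theta, so the best pair aligns x and z with its factors.
u, v = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0])
env = BilinearBanditEnv(np.outer(u, v))
print(env.pull(np.array([0.9, 0.1, 0.0]), np.array([0.2, 0.98])))
```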
Semantic Hearing: Programming Acoustic Scenes with Binaural Hearables
results: 研究结果表明,该系统可以处理20种声音类型,并且在连接式手机上实现了6.56毫秒的运行时间。在实际生活中进行的评估中,证明了该系统可以提取目标声音并保持它们的空间cue。项目页面与代码:https://semantichearing.cs.washington.eduAbstract
Imagine being able to listen to the birds chirping in a park without hearing the chatter from other hikers, or being able to block out traffic noise on a busy street while still being able to hear emergency sirens and car honks. We introduce semantic hearing, a novel capability for hearable devices that enables them to, in real-time, focus on, or ignore, specific sounds from real-world environments, while also preserving the spatial cues. To achieve this, we make two technical contributions: 1) we present the first neural network that can achieve binaural target sound extraction in the presence of interfering sounds and background noise, and 2) we design a training methodology that allows our system to generalize to real-world use. Results show that our system can operate with 20 sound classes and that our transformer-based network has a runtime of 6.56 ms on a connected smartphone. In-the-wild evaluation with participants in previously unseen indoor and outdoor scenarios shows that our proof-of-concept system can extract the target sounds and generalize to preserve the spatial cues in its binaural output. Project page with code: https://semantichearing.cs.washington.edu
摘要
想象一下,你可以在公园里聆听鸟鸣,而听不到其他游客的交谈;或者在繁忙的街道上屏蔽交通噪音,却仍能听到紧急警报声和汽车鸣笛。我们提出了语义听觉(semantic hearing),这是可听戴设备的一种全新能力,使其能够实时地选择聆听或忽略真实环境中的特定声音,同时保留空间线索。为实现这一目标,我们做出了两项技术贡献:1. 我们提出了第一个能够在干扰声和背景噪音存在的情况下进行双耳目标声音提取的神经网络;2. 我们设计了一种训练方法,使我们的系统能够泛化到真实世界的使用场景。结果表明,我们的系统可以处理 20 种声音类别,而且基于 transformer 的网络在联网智能手机上的运行时间为 6.56 毫秒。在此前未见过的室内外场景中与参与者进行的实地评估表明,我们的概念验证系统能够提取目标声音,并在双耳输出中保留空间线索。项目页面与代码:https://semantichearing.cs.washington.edu
Federated Topic Model and Model Pruning Based on Variational Autoencoder
results: 实验结果显示,基于VAE的FTM剪枝方法可以大幅提高模型训练速度,而不失其性能。Abstract
Topic modeling has emerged as a valuable tool for discovering patterns and topics within large collections of documents. However, when cross-analysis involves multiple parties, data privacy becomes a critical concern. Federated topic modeling has been developed to address this issue, allowing multiple parties to jointly train models while protecting privacy. However, there are communication and performance challenges in the federated scenario. In order to solve the above problems, this paper proposes a method to establish a federated topic model while ensuring the privacy of each node, and use neural network model pruning to accelerate the model, where the client periodically sends the model neuron cumulative gradients and model weights to the server, and the server prunes the model. To address different requirements, two different methods are proposed to determine the model pruning rate. The first method involves slow pruning throughout the entire model training process, which has limited acceleration effect on the model training process, but can ensure that the pruned model achieves higher accuracy. This can significantly reduce the model inference time during the inference process. The second strategy is to quickly reach the target pruning rate in the early stage of model training in order to accelerate the model training speed, and then continue to train the model with a smaller model size after reaching the target pruning rate. This approach may lose more useful information but can complete the model training faster. Experimental results show that the federated topic model pruning based on the variational autoencoder proposed in this paper can greatly accelerate the model training speed while ensuring the model's performance.
摘要
主题模型已成为在大规模文档集合中发现模式和话题的重要工具。然而,当跨方分析涉及多个参与方时,数据隐私就成为关键问题。联邦主题模型正是为此而发展,它允许多个参与方在保护隐私的前提下联合训练模型。然而,联邦场景中存在通信和性能方面的挑战。为了解决上述问题,本文提出了一种在保证各节点隐私的前提下建立联邦主题模型的方法,并利用神经网络模型剪枝来加速模型:客户端周期性地将模型神经元的累积梯度和模型权重发送给服务器,由服务器对模型进行剪枝。针对不同需求,本文提出了两种确定模型剪枝率的方法。第一种方法在整个训练过程中缓慢剪枝,对训练过程的加速作用有限,但可以保证剪枝后的模型达到更高的精度,并显著缩短推理阶段的模型推理时间。第二种策略是在训练早期快速达到目标剪枝率以加快训练速度,之后以更小的模型规模继续训练。这种方法可能会丢失更多有用信息,但可以更快地完成模型训练。实验结果表明,本文提出的基于变分自编码器的联邦主题模型剪枝方法能够在保证模型性能的同时大幅加快模型训练速度。
Stacking an autoencoder for feature selection of zero-day threats
for: This paper is written for researchers and practitioners in the field of cybersecurity, particularly those interested in zero-day attack detection and artificial neural networks.
methods: The paper uses a stacked autoencoder (SAE) and a Long Short-Term Memory (LSTM) scheme for feature selection and zero-day threat classification. The SAE is used for unsupervised feature extraction, and the LSTM is used for supervised learning to enhance the model's discriminative capabilities.
results: The paper reports high precision, recall, and F1 score values for the SAE-LSTM model in identifying various types of zero-day attacks, and demonstrates strong predictive capabilities across all three attack categories. The balanced average scores suggest that the model generalizes effectively and consistently across different attack categories.Abstract
Zero-day attack detection plays a critical role in mitigating risks, protecting assets, and staying ahead in the evolving threat landscape. This study explores the application of stacked autoencoder (SAE), a type of artificial neural network, for feature selection and zero-day threat classification using a Long Short-Term Memory (LSTM) scheme. The process involves preprocessing the UGRansome dataset and training an unsupervised SAE for feature extraction. Finetuning with supervised learning is then performed to enhance the discriminative capabilities of this model. The learned weights and activations of the autoencoder are analyzed to identify the most important features for discriminating between zero-day threats and normal system behavior. These selected features form a reduced feature set that enables accurate classification. The results indicate that the SAE-LSTM performs well across all three attack categories by showcasing high precision, recall, and F1 score values, emphasizing the model's strong predictive capabilities in identifying various types of zero-day attacks. Additionally, the balanced average scores of the SAE-LSTM suggest that the model generalizes effectively and consistently across different attack categories.
摘要
零日攻击检测在降低风险、保护资产以及在不断演变的威胁态势中保持领先方面发挥着关键作用。本研究探讨使用堆叠自编码器(SAE)这种人工神经网络,结合长短期记忆(LSTM)方案进行特征选择和零日威胁分类。流程包括对 UGRansome 数据集进行预处理,并训练无监督的 SAE 进行特征提取,随后通过有监督学习进行微调,以增强模型的判别能力。通过分析自编码器学习到的权重和激活,找出区分零日威胁与正常系统行为最重要的特征;这些被选中的特征构成一个精简的特征集,使准确分类成为可能。结果表明,SAE-LSTM 在全部三类攻击上都表现出色,具有较高的精确率、召回率和 F1 分数,凸显了模型在识别各类零日攻击方面强大的预测能力。此外,SAE-LSTM 均衡的平均得分表明,该模型能够在不同攻击类别之间有效且一致地泛化。
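A minimal PyTorch sketch of the unsupervised stage described above: a two-layer stacked autoencoder trained on reconstruction, whose encoder output then serves as the reduced feature set passed to the supervised LSTM. Layer sizes, epochs, and learning rate are illustrative assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

class StackedAutoencoder(nn.Module):
    """Two-layer stacked autoencoder for unsupervised feature extraction."""
    def __init__(self, n_features, h1=64, h2=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, h1), nn.ReLU(),
            nn.Linear(h1, h2), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(h2, h1), nn.ReLU(),
            nn.Linear(h1, n_features),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def pretrain(model, loader, epochs=20, lr=1e-3):
    """Unsupervised reconstruction training; labels are ignored at this stage."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x in loader:                     # loader yields feature batches
            opt.zero_grad()
            loss = nn.functional.mse_loss(model(x), x)
            loss.backward()
            opt.step()
    return model

# After pretraining, model.encoder(x) gives the reduced features for the LSTM.
```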
Model-driven Engineering for Machine Learning Components: A Systematic Literature Review
results: 本研究的结果表明,使用MDE4ML可以提高开发效率、降低开发成本、提高系统可维护性和可扩展性等。但是,还存在一些局限性和挑战,需要进一步的研究和发展。Abstract
Context: Machine Learning (ML) has become widely adopted as a component in many modern software applications. Due to the large volumes of data available, organizations want to increasingly leverage their data to extract meaningful insights and enhance business profitability. ML components enable predictive capabilities, anomaly detection, recommendation, accurate image and text processing, and informed decision-making. However, developing systems with ML components is not trivial; it requires time, effort, knowledge, and expertise in ML, data processing, and software engineering. There have been several studies on the use of model-driven engineering (MDE) techniques to address these challenges when developing traditional software and cyber-physical systems. Recently, there has been a growing interest in applying MDE for systems with ML components. Objective: The goal of this study is to further explore the promising intersection of MDE with ML (MDE4ML) through a systematic literature review (SLR). Through this SLR, we wanted to analyze existing studies, including their motivations, MDE solutions, evaluation techniques, key benefits and limitations. Results: We analyzed selected studies with respect to several areas of interest and identified the following: 1) the key motivations behind using MDE4ML; 2) a variety of MDE solutions applied, such as modeling languages, model transformations, tool support, targeted ML aspects, contributions and more; 3) the evaluation techniques and metrics used; and 4) the limitations and directions for future work. We also discuss the gaps in existing literature and provide recommendations for future research. Conclusion: This SLR highlights current trends, gaps and future research directions in the field of MDE4ML, benefiting both researchers and practitioners
摘要
机器学习(ML)已经成为许多现代软件应用中的重要组件。由于可用数据量巨大,组织希望越来越多地利用数据来提取有意义的洞见并提升商业盈利能力。ML 组件提供预测能力、异常检测、推荐、精准的图像与文本处理,并辅助做出明智决策。然而,开发包含 ML 组件的系统并非易事:它需要时间、精力,以及在 ML、数据处理和软件工程方面的知识与专长。关于使用模型驱动工程(MDE)技术来应对传统软件和信息物理系统开发中这些挑战,已有不少研究;最近,将 MDE 应用于包含 ML 组件的系统也日益受到关注。目标:本研究的目标是通过系统性文献综述(SLR)进一步探索 MDE 与 ML 的交叉领域(MDE4ML)。通过这项 SLR,我们希望分析现有研究,包括其动机、MDE 解决方案、评估技术、主要优点与局限。结果:我们从若干关注点出发分析了所选研究,归纳出:1)使用 MDE4ML 的主要动机;2)所应用的多种 MDE 解决方案,如建模语言、模型变换、工具支持、所针对的 ML 方面及贡献等;3)所使用的评估技术与指标;4)局限性与未来工作方向。我们还讨论了现有文献中的空白,并为未来研究提出建议。结论:这项 SLR 展示了 MDE4ML 领域的当前趋势、空白与未来研究方向,对研究者和实践者都有裨益。
Generalization Bounds for Label Noise Stochastic Gradient Descent
for: This paper is written for those interested in understanding the generalization error bounds of stochastic gradient descent (SGD) with label noise in non-convex settings.
methods: The paper uses a combination of uniform dissipativity and smoothness conditions, as well as a suitable choice of semimetric, to establish a contraction in Wasserstein distance of the label noise stochastic gradient flow.
results: The paper derives time-independent generalization error bounds for the discretized algorithm with a constant learning rate, which scales polynomially with the parameter dimension $d$ and with the rate of $n^{-2/3}$, where $n$ is the sample size. This rate is better than the best-known rate of $n^{-1/2}$ established for stochastic gradient Langevin dynamics (SGLD) under similar conditions.Abstract
We develop generalization error bounds for stochastic gradient descent (SGD) with label noise in non-convex settings under uniform dissipativity and smoothness conditions. Under a suitable choice of semimetric, we establish a contraction in Wasserstein distance of the label noise stochastic gradient flow that depends polynomially on the parameter dimension $d$. Using the framework of algorithmic stability, we derive time-independent generalisation error bounds for the discretized algorithm with a constant learning rate. The error bound we achieve scales polynomially with $d$ and with the rate of $n^{-2/3}$, where $n$ is the sample size. This rate is better than the best-known rate of $n^{-1/2}$ established for stochastic gradient Langevin dynamics (SGLD) -- which employs parameter-independent Gaussian noise -- under similar conditions. Our analysis offers quantitative insights into the effect of label noise.
results: The authors first show that, given any optimal active learning algorithm, the collaboration protocol that simply runs that algorithm over the pooled data of all agents is already IR. However, computing the optimal algorithm is NP-hard, so the authors provide IR collaboration protocols whose label complexity is comparable to that of the best known tractable approximation algorithm.
Abstract
In collaborative active learning, where multiple agents try to learn labels from a common hypothesis, we introduce an innovative framework for incentivized collaboration. Here, rational agents aim to obtain labels for their data sets while keeping label complexity at a minimum. We focus on designing (strict) individually rational (IR) collaboration protocols, ensuring that agents cannot reduce their expected label complexity by acting individually. We first show that given any optimal active learning algorithm, the collaboration protocol that runs the algorithm as is over the entire data is already IR. However, computing the optimal algorithm is NP-hard. We therefore provide collaboration protocols that achieve (strict) IR and are comparable with the best known tractable approximation algorithm in terms of label complexity.
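As a toy illustration of why pooling data can help every agent, the sketch below runs a standard exact active learner for 1D threshold classifiers (binary search over sorted points) once per agent and once over the pooled data, charging each query to the agent that owns the queried point. The threshold class, agent sizes, and cost accounting are assumptions made for the demo; this is not the paper's protocol or its IR guarantee.

```python
import numpy as np

rng = np.random.default_rng(1)

def binary_search_labels(points, threshold):
    """Active learning for 1D thresholds: binary search over sorted points.
    Returns the indices (into `points`) whose labels were queried."""
    order = np.argsort(points)
    lo, hi = 0, len(points) - 1
    queried = []
    while lo <= hi:
        mid = (lo + hi) // 2
        idx = order[mid]
        queried.append(idx)
        if points[idx] < threshold:   # label is 0 below the threshold
            lo = mid + 1
        else:                         # label is 1 at or above the threshold
            hi = mid - 1
    return queried

threshold = 0.3
agents = [rng.uniform(-1, 1, size=50) for _ in range(4)]

# Each agent acting alone: binary search on its own points.
solo_cost = [len(binary_search_labels(pts, threshold)) for pts in agents]

# Collaboration: run the same algorithm once on the pooled data and charge
# each query to the agent that owns the queried point.
pooled = np.concatenate(agents)
owner = np.concatenate([np.full(len(pts), i) for i, pts in enumerate(agents)])
queried = binary_search_labels(pooled, threshold)
collab_cost = [int(np.sum(owner[queried] == i)) for i in range(len(agents))]

print("label queries alone: ", solo_cost)
print("label queries pooled:", collab_cost)
```

In this toy run the pooled search uses roughly log of the total pool size queries overall, so each agent typically answers fewer queries than it would alone, which is the intuition the IR requirement formalizes.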
Active Neural Topological Mapping for Multi-Agent Exploration
paper_authors: Xinyi Yang, Yuxiang Yang, Chao Yu, Jiayu Chen, Jingchen Yu, Haibing Ren, Huazhong Yang, Yu Wang
for: Multi-agent cooperative exploration, where multiple robots must explore an unknown environment from sensory signals within a limited time.
methods: Proposes a neural topological mapping technique for multi-robot exploration, consisting of a topological mapper built from a visual encoder and distance-based heuristics, together with a hierarchical topological planner based on graph neural networks.
results: In a physically realistic simulation, the method improves exploration efficiency and generalization in unseen scenarios, reducing exploration steps by at least 26.40% over planning-based baselines and by at least 7.63% over RL-based competitors.
Abstract
This paper investigates the multi-agent cooperative exploration problem, which requires multiple agents to explore an unseen environment via sensory signals in a limited time. A popular approach to exploration tasks is to combine active mapping with planning. Metric maps capture the details of the spatial representation, but they incur high communication traffic and may vary significantly between scenarios, resulting in inferior generalization. Topological maps are a promising alternative as they consist only of nodes and edges with abstract but essential information and are less influenced by the scene structures. However, most existing topology-based exploration approaches rely on classical methods for planning, which are time-consuming and sub-optimal due to their handcrafted design. Deep reinforcement learning (DRL) has shown great potential for learning (near) optimal policies through fast end-to-end inference. In this paper, we propose Multi-Agent Neural Topological Mapping (MANTM) to improve exploration efficiency and generalization for multi-agent exploration tasks. MANTM mainly comprises a Topological Mapper and a novel RL-based Hierarchical Topological Planner (HTP). The Topological Mapper employs a visual encoder and distance-based heuristics to construct a graph containing main nodes and their corresponding ghost nodes. The HTP leverages graph neural networks to capture correlations between agents and graph nodes in a coarse-to-fine manner for effective global goal selection. Extensive experiments conducted in a physically-realistic simulator, Habitat, demonstrate that MANTM reduces the steps by at least 26.40% over planning-based baselines and by at least 7.63% over RL-based competitors in unseen scenarios.
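The sketch below illustrates only the distance-based graph construction behind a topological map with main nodes and ghost nodes; in MANTM the Topological Mapper uses a learned visual encoder and the planner is a graph neural network, and the class name, spacing threshold, and corridor example here are assumptions made for illustration.

```python
import numpy as np

class TopologicalMap:
    """Distance-based topological map sketch: main nodes are visited locations
    kept sparse by a distance threshold; ghost nodes are unexplored candidate
    locations attached to the main node they were observed from."""

    def __init__(self, node_spacing=1.0):
        self.node_spacing = node_spacing
        self.main_nodes = []          # list of 2D positions
        self.edges = []               # pairs of main-node indices
        self.ghost_nodes = []         # (parent main-node index, 2D position)

    def add_observation(self, position, candidate_frontiers):
        position = np.asarray(position, dtype=float)
        if self.main_nodes:
            dists = [np.linalg.norm(position - p) for p in self.main_nodes]
            nearest = int(np.argmin(dists))
            if dists[nearest] < self.node_spacing:
                return nearest                      # too close: reuse that node
            self.main_nodes.append(position)
            new_idx = len(self.main_nodes) - 1
            self.edges.append((nearest, new_idx))   # connect to the nearest node
        else:
            self.main_nodes.append(position)
            new_idx = 0
        for frontier in candidate_frontiers:
            self.ghost_nodes.append((new_idx, np.asarray(frontier, dtype=float)))
        return new_idx

# One agent walking along a corridor and reporting unexplored side openings.
tmap = TopologicalMap(node_spacing=1.0)
for x in np.linspace(0.0, 5.0, 11):
    tmap.add_observation([x, 0.0], candidate_frontiers=[[x, 1.5]])
print(len(tmap.main_nodes), "main nodes,", len(tmap.ghost_nodes), "ghost nodes")
```

A global planner then only has to choose among the ghost nodes, which keeps the decision space small compared with planning over a dense metric map.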
DistDNAS: Search Efficient Feature Interactions within 2 Hours
results: Extensive experiments on a 1TB Criteo Terabyte dataset show that DistDNAS achieves a 0.001 AUC improvement and a 60% FLOPs saving over current state-of-the-art CTR models.
Abstract
Search efficiency and serving efficiency are two major axes in building feature interactions and expediting the model development process in recommender systems. On large-scale benchmarks, searching for the optimal feature interaction design incurs extensive cost due to the sequential workflow on the large volume of data. In addition, fusing interactions of various sources, orders, and mathematical operations introduces potential conflicts and additional redundancy into recommender models, leading to sub-optimal trade-offs in performance and serving cost. In this paper, we present DistDNAS as a neat solution to brew swift and efficient feature interaction design. DistDNAS proposes a supernet to incorporate interaction modules of varying orders and types as a search space. To optimize search efficiency, DistDNAS distributes the search and aggregates the choice of optimal interaction modules on varying data dates, achieving over 25x speed-up and reducing search cost from 2 days to 2 hours. To optimize serving efficiency, DistDNAS introduces a differentiable cost-aware loss to penalize the selection of redundant interaction modules, enhancing the efficiency of discovered feature interactions in serving. We extensively evaluate the best models crafted by DistDNAS on a 1TB Criteo Terabyte dataset. Experimental evaluations demonstrate 0.001 AUC improvement and 60% FLOPs saving over current state-of-the-art CTR models.
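The following PyTorch fragment sketches the idea behind a differentiable cost-aware loss: a soft gate per candidate interaction module multiplies its FLOP count, and the resulting expected cost is added to the task loss so that redundant modules are penalized during the search. The module set, FLOP counts, gating function, and penalty weight are placeholders rather than DistDNAS's exact formulation.

```python
import torch

# Candidate interaction modules in the supernet and their per-example FLOPs
# (both the module list and FLOP counts are illustrative placeholders).
module_flops = torch.tensor([1.0e6, 4.0e6, 2.5e6, 8.0e6])

# One architecture weight per candidate module; a sigmoid gives a soft gate.
arch_weights = torch.nn.Parameter(torch.zeros(len(module_flops)))

def cost_aware_loss(task_loss, lam=1e-8):
    """Differentiable cost-aware objective: task loss plus a penalty on the
    expected FLOPs of the softly selected interaction modules."""
    gates = torch.sigmoid(arch_weights)
    expected_flops = (gates * module_flops).sum()
    return task_loss + lam * expected_flops

# Toy usage with a stand-in task loss; in a real search the task loss would
# come from the recommender supernet's CTR prediction on a training batch.
task_loss = torch.tensor(0.45, requires_grad=True)
loss = cost_aware_loss(task_loss)
loss.backward()
print("gradient on architecture weights:", arch_weights.grad)
```

Because the penalty is differentiable in the architecture weights, costly modules receive a steady negative pressure during search rather than being pruned by a separate post-hoc step.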
Transformers are Efficient In-Context Estimators for Wireless Communication
results: Extensive simulations show that in-context estimation not only significantly outperforms standard approaches, but also matches the performance of an estimator with perfect knowledge of the latent context after only a few context examples, indicating that transformers are efficient in-context estimators in the communication setting.
Abstract
Pre-trained transformers can perform in-context learning, where they adapt to a new task using only a small number of prompts without any explicit model optimization. Inspired by this attribute, we propose a novel approach, called in-context estimation, for the canonical communication problem of estimating transmitted symbols from received symbols. A communication channel is essentially a noisy function that maps transmitted symbols to received symbols, and this function can be represented by an unknown parameter whose statistics depend on an (also unknown) latent context. Conventional approaches ignore this hierarchical structure and simply attempt to use known transmissions, called pilots, to perform a least-squares estimate of the channel parameter, which is then used to estimate successive, unknown transmitted symbols. We make the basic connection that transformers show excellent contextual sequence completion with a few prompts, and so they should be able to implicitly determine the latent context from pilot symbols to perform end-to-end in-context estimation of transmitted symbols. Furthermore, the transformer should use information efficiently, i.e., it should utilize any pilots received to attain the best possible symbol estimates. Through extensive simulations, we show that in-context estimation not only significantly outperforms standard approaches, but also achieves the same performance as an estimator with perfect knowledge of the latent context within a few context examples. Thus, we make a strong case that transformers are efficient in-context estimators in the communication setting.
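For contrast with the conventional pipeline described above, the NumPy sketch below performs the pilot-based least-squares channel estimate followed by nearest-constellation detection, and then shows how the same pilot pairs and received symbols would be laid out as a prompt for in-context estimation. The scalar channel, QPSK constellation, pilot count, and noise level are illustrative assumptions; the pre-trained transformer itself is not shown.

```python
import numpy as np

rng = np.random.default_rng(2)

# Scalar flat-fading channel: y = h * x + noise, with h drawn from a latent
# context (here simply a random complex gain); values are illustrative.
h = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)
noise_std = 0.1

qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)

def transmit(symbols):
    return h * symbols + noise_std * (rng.standard_normal(symbols.shape)
                                      + 1j * rng.standard_normal(symbols.shape))

# Pilots: known transmitted symbols and their received counterparts.
pilots_tx = rng.choice(qpsk, size=4)
pilots_rx = transmit(pilots_tx)

# Conventional baseline: least-squares estimate of h from the pilots,
# then nearest-constellation-point detection of the unknown symbols.
h_ls = (pilots_tx.conj() @ pilots_rx) / (pilots_tx.conj() @ pilots_tx)

data_tx = rng.choice(qpsk, size=8)
data_rx = transmit(data_tx)
equalized = data_rx / h_ls
detected = qpsk[np.argmin(np.abs(equalized[:, None] - qpsk[None, :]), axis=1)]
print("symbol errors:", int(np.sum(detected != data_tx)))

# In-context estimation instead feeds the (pilot_tx, pilot_rx) pairs and the
# new received symbols to a pre-trained transformer as a prompt, letting the
# model infer the latent context h implicitly instead of estimating it explicitly.
prompt = list(zip(pilots_tx, pilots_rx)) + [(None, y) for y in data_rx]
```

The point of the comparison is that the baseline commits to an explicit point estimate of the channel, whereas the in-context estimator conditions directly on the pilot pairs when predicting each unknown symbol.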
WinNet: time series forecasting with a window-enhanced period extracting and interacting
results: On nine benchmark datasets, WinNet achieves SOTA performance with lower computational complexity than CNN-, MLP-, and Transformer-based methods, offering CNN-based approaches a strong balance between performance and efficiency in time series forecasting.
Abstract
Recently, Transformer-based methods have significantly improved state-of-the-art time series forecasting results, but they suffer from high computational costs and an inability to capture the long and short periodicity of time series. We present a highly accurate and simply structured CNN-based model for long-term time series forecasting tasks, called WinNet, including (i) an Inter-Intra Period Encoder (I2PE) to transform the 1D sequence into a 2D tensor with long and short periodicity according to the predefined periodic window, (ii) Two-Dimensional Period Decomposition (TDPD) to model period-trend and oscillation terms, and (iii) a Decomposition Correlation Block (DCB) to leverage the correlations of the period-trend and oscillation terms to support the prediction tasks by CNNs. Results on nine benchmark datasets show that WinNet can achieve SOTA performance and lower computational complexity over CNN-, MLP-, and Transformer-based approaches. WinNet provides potential for CNN-based methods in time series forecasting tasks, with a perfect tradeoff between performance and efficiency.
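The reshaping intuition behind the Inter-Intra Period Encoder can be sketched in a few lines: fold the 1D series into rows of one period each, then group consecutive periods into a predefined periodic window. The function below is a plain NumPy sketch of that folding only; the period and window values are illustrative, and the actual I2PE, decomposition, and correlation blocks in WinNet are learned modules.

```python
import numpy as np

def to_period_tensor(series, period, window):
    """Fold a 1D series into a 2D layout whose rows are consecutive periods
    (long periodicity across rows, short periodicity along columns), then
    group rows into predefined periodic windows."""
    usable = (len(series) // period) * period
    periods = series[:usable].reshape(-1, period)          # (num_periods, period)
    num_windows = periods.shape[0] // window
    periods = periods[: num_windows * window]
    return periods.reshape(num_windows, window, period)    # (windows, window, period)

# Hourly-like toy series with a daily period of 24 and a weekly window of 7.
t = np.arange(24 * 7 * 4)
series = np.sin(2 * np.pi * t / 24) + 0.1 * np.sin(2 * np.pi * t / (24 * 7))
tensor = to_period_tensor(series, period=24, window=7)
print(tensor.shape)   # (4, 7, 24)
```

Once the series is in this layout, ordinary 2D convolutions can mix information within a period (columns) and across periods (rows), which is what lets a small CNN capture both short and long periodicity.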
A Unified Framework to Enforce, Discover, and Promote Symmetry in Machine Learning
paper_authors: Samuel E. Otto, Nicholas Zolman, J. Nathan Kutz, Steven L. Brunton
for: This paper explores how symmetry can be exploited in physics and machine learning to improve the generalization of models.
methods: The paper considers three ways of incorporating symmetry: 1. enforcing known symmetry when training a model; 2. discovering unknown symmetries of a given model or data set; and 3. promoting symmetry during training by learning a model that breaks symmetries within a user-specified group of candidates only when there is sufficient evidence in the data.
results: The paper proposes a unified framework centered on the Lie derivative, together with a convex regularization approach that penalizes symmetry breaking during training; it applies to a wide range of machine learning models, including basis function regression, dynamical systems discovery, multilayer perceptrons, and neural networks acting on spatial fields such as images, improving generalization and performance.
Abstract
Symmetry is present throughout nature and continues to play an increasingly central role in physics and machine learning. Fundamental symmetries, such as Poincar\'{e} invariance, allow physical laws discovered in laboratories on Earth to be extrapolated to the farthest reaches of the universe. Symmetry is essential to achieving this extrapolatory power in machine learning applications. For example, translation invariance in image classification allows models with fewer parameters, such as convolutional neural networks, to be trained on smaller data sets and achieve state-of-the-art performance. In this paper, we provide a unifying theoretical and methodological framework for incorporating symmetry into machine learning models in three ways: 1. enforcing known symmetry when training a model; 2. discovering unknown symmetries of a given model or data set; and 3. promoting symmetry during training by learning a model that breaks symmetries within a user-specified group of candidates when there is sufficient evidence in the data. We show that these tasks can be cast within a common mathematical framework whose central object is the Lie derivative associated with fiber-linear Lie group actions on vector bundles. We extend and unify several existing results by showing that enforcing and discovering symmetry are linear-algebraic tasks that are dual with respect to the bilinear structure of the Lie derivative. We also propose a novel way to promote symmetry by introducing a class of convex regularization functions based on the Lie derivative and nuclear norm relaxation to penalize symmetry breaking during training of machine learning models. We explain how these ideas can be applied to a wide range of machine learning models including basis function regression, dynamical systems discovery, multilayer perceptrons, and neural networks acting on spatial fields such as images.
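To make the central object concrete, the block below writes out the infinitesimal (Lie-derivative) form of the equivariance constraint for a map between vector spaces carrying linear group actions; this is a standard simplification of the fiber-linear vector-bundle setting discussed above, not the paper's exact statement.

```latex
% Equivariance of f : V -> W under a Lie group G acting linearly via rho_V, rho_W:
%   f(rho_V(g) x) = rho_W(g) f(x)   for all g in G.
% Differentiating along a one-parameter subgroup g = exp(t xi) at t = 0 gives the
% linear constraint in terms of the Lie algebra generators:
\begin{equation}
  (\mathcal{L}_{\xi} f)(x)
  \;=\; Df(x)\,\big(d\rho_V(\xi)\, x\big) \;-\; d\rho_W(\xi)\, f(x)
  \;=\; 0,
  \qquad \text{for all } \xi \in \mathfrak{g}.
\end{equation}
% Enforcing symmetry imposes these linear conditions on f, discovering symmetry
% asks for the generators xi annihilated by this bilinear pairing, and promoting
% symmetry penalizes a norm of L_xi f (e.g., a nuclear-norm relaxation) in training.
```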
Machine learning for accuracy in density functional approximations
for: The paper reviews how machine learning can improve the accuracy and efficiency of computational chemistry simulations and materials design.
methods: The paper surveys machine learning approaches that boost the predictive power of density functional theory and related electronic structure approximations.
results: The paper reviews recent progress in applying machine learning to improve the accuracy of density functional and related approximations, including correcting fundamental errors in density functional approaches, and discusses the promises and challenges of devising machine learning models that transfer between different chemistries and materials classes.
Abstract
Machine learning techniques have found their way into computational chemistry as indispensable tools to accelerate atomistic simulations and materials design. In addition, machine learning approaches hold the potential to boost the predictive power of computationally efficient electronic structure methods, such as density functional theory, to chemical accuracy and to correct for fundamental errors in density functional approaches. Here, recent progress in applying machine learning to improve the accuracy of density functional and related approximations is reviewed. Promises and challenges in devising machine learning models transferable between different chemistries and materials classes are discussed with the help of examples applying promising models to systems far outside their training sets.
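One common pattern in this literature is delta-learning: fit a model to the residual between a cheap density-functional-level result and a higher-accuracy reference, then use it to correct new predictions. The scikit-learn sketch below shows that pattern on synthetic data; the descriptors, energies, and kernel hyperparameters are fabricated for the demo and do not come from the review.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(3)

# Synthetic stand-ins: descriptors for a set of systems, a cheap DFT-level
# energy, and a higher-accuracy reference energy (all values are fabricated).
n_train, n_test, n_features = 200, 50, 3
X = rng.standard_normal((n_train + n_test, n_features))
e_dft = X @ np.array([1.0, -0.5, 0.2])                        # cheap approximation
e_ref = e_dft + 0.3 * np.tanh(X[:, 0] * X[:, 1]) \
              + 0.02 * rng.standard_normal(len(X))            # reference energies

# Delta-learning: fit a model to the residual (reference minus approximation),
# then use it to correct the cheap energies on unseen systems.
model = KernelRidge(kernel="rbf", alpha=1e-3, gamma=0.5)
model.fit(X[:n_train], (e_ref - e_dft)[:n_train])
e_corrected = e_dft[n_train:] + model.predict(X[n_train:])

mae_before = np.mean(np.abs(e_ref[n_train:] - e_dft[n_train:]))
mae_after = np.mean(np.abs(e_ref[n_train:] - e_corrected))
print(f"MAE before correction: {mae_before:.3f}  after: {mae_after:.3f}")
```

The transferability question raised above corresponds, in this sketch, to asking how well such a residual model extrapolates when the test systems are drawn from a different chemistry than the training set.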