for: This paper addresses the problem that an autonomous vehicle planner may face multiple acceptable plans in real-world driving scenarios.
methods: The paper proposes an interpretable neural planner that regresses a goal heatmap, using an adaptive Gaussian kernel and a relaxed hourglass loss to better capture the uncertainty of the planning problem.
results: A systematic evaluation on the Lyft Open Dataset shows that the model outperforms prior works across real-world driving scenarios, achieving safer and more flexible driving performance.
Abstract
Learning-based approaches to autonomous vehicle planners have the potential to scale to many complicated real-world driving scenarios by leveraging huge amounts of driver demonstrations. However, prior work only learns to estimate a single planning trajectory, while there may be multiple acceptable plans in real-world scenarios. To solve the problem, we propose an interpretable neural planner to regress a heatmap, which effectively represents multiple potential goals in the bird's-eye view of an autonomous vehicle. The planner employs an adaptive Gaussian kernel and relaxed hourglass loss to better capture the uncertainty of planning problems. We also use a negative Gaussian kernel to add supervision to the heatmap regression, enabling the model to learn collision avoidance effectively. Our systematic evaluation on the Lyft Open Dataset across a diverse range of real-world driving scenarios shows that our model achieves a safer and more flexible driving performance than prior works.
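As an illustration of the goal-heatmap idea described above, here is a minimal numpy sketch that builds a bird's-eye-view target from a positive Gaussian kernel at the goal and negative kernels around obstacles; the grid size, kernel widths, and clipping are illustrative assumptions, not the authors' adaptive-kernel or relaxed-hourglass-loss implementation.

```python
import numpy as np

def gaussian_heatmap(shape, center, sigma):
    """2D Gaussian kernel centered at `center` (row, col) on a BEV grid."""
    rows, cols = np.indices(shape)
    d2 = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def goal_heatmap_target(shape, goal, obstacles, sigma_goal=4.0, sigma_neg=2.0):
    """Positive kernel at the goal, negative kernels at obstacles (collision supervision)."""
    target = gaussian_heatmap(shape, goal, sigma_goal)
    for obs in obstacles:
        target -= gaussian_heatmap(shape, obs, sigma_neg)
    return np.clip(target, -1.0, 1.0)

# Example: 128x128 BEV grid, one goal, two nearby agents treated as obstacles.
target = goal_heatmap_target((128, 128), goal=(40, 64), obstacles=[(60, 60), (70, 90)])
print(target.shape, target.max(), target.min())
```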
for: This paper aims to provide a unified theoretical analysis connecting two deep learning techniques, softmax regression and residual neural networks (ResNet).
methods: The paper theoretically analyzes the regression problem $\| \langle \exp(Ax) + Ax , {\bf 1}_n \rangle^{-1} ( \exp(Ax) + Ax ) - b \|_2^2$.
results: The paper derives the gradient, Hessian, and Lipschitz properties of the loss function, and shows that the Hessian is positive semidefinite with a low-rank-plus-diagonal structure, enabling an efficient approximate Newton method. This unified scheme connects two previously unrelated fields and provides new insight into optimization for deep learning models.
Abstract
Large language models (LLMs) have brought significant changes to human society. Softmax regression and residual neural networks (ResNet) are two important techniques in deep learning: they not only serve as significant theoretical components supporting the functionality of LLMs but also are related to many other machine learning and theoretical computer science fields, including but not limited to image classification, object detection, semantic segmentation, and tensors. Previous research works studied these two concepts separately. In this paper, we provide a theoretical analysis of the regression problem: $\| \langle \exp(Ax) + A x , {\bf 1}_n \rangle^{-1} ( \exp(Ax) + Ax ) - b \|_2^2$, where $A$ is a matrix in $\mathbb{R}^{n \times d}$, $b$ is a vector in $\mathbb{R}^n$, and ${\bf 1}_n$ is the $n$-dimensional vector whose entries are all $1$. This regression problem is a unified scheme that combines softmax regression and ResNet, which has never been done before. We derive the gradient, Hessian, and Lipschitz properties of the loss function. The Hessian is shown to be positive semidefinite, and its structure is characterized as the sum of a low-rank matrix and a diagonal matrix. This enables an efficient approximate Newton method. As a result, this unified scheme helps to connect two previously thought unrelated fields and provides novel insight into loss landscape and optimization for emerging over-parameterized neural networks, which is meaningful for future research in deep learning models.
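To make the unified objective concrete, here is a minimal numpy sketch that evaluates the loss exactly as written in the abstract and checks a finite-difference gradient; it is a numerical illustration only, not the paper's closed-form gradient, Hessian, or approximate Newton analysis.

```python
import numpy as np

def unified_loss(A, x, b):
    """Loss ||<exp(Ax)+Ax, 1_n>^{-1} (exp(Ax)+Ax) - b||_2^2 from the abstract."""
    u = np.exp(A @ x) + A @ x          # exp(Ax) + Ax, shape (n,)
    alpha = u.sum()                     # <u, 1_n>
    f = u / alpha                       # normalized vector: softmax-like term plus a residual term
    return np.sum((f - b) ** 2)

rng = np.random.default_rng(0)
n, d = 8, 3
A = rng.normal(size=(n, d))
x = rng.normal(size=d)
b = rng.normal(size=n)
print(unified_loss(A, x, b))

# Central finite-difference gradient, useful for checking any analytical derivation.
eps = 1e-6
g = np.array([(unified_loss(A, x + eps * e, b) - unified_loss(A, x - eps * e, b)) / (2 * eps)
              for e in np.eye(d)])
print(g)
```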
Real-time Bandwidth Estimation from Offline Expert Demonstrations
paper_authors: Aashish Gottipati, Sami Khairy, Gabriel Mittag, Vishak Gopal, Ross Cutler
for: The paper targets the problem of bandwidth estimation (BWE) for real-time communication systems, with a focus on integrating data-driven bandwidth estimators into real-time systems.
results: Experiments show that Merlin achieves a 42.85% reduction in packet loss and a 12.8% reduction in delay compared with WebRTC in inter-continental videoconferencing calls, demonstrating that Merlin can provide high-quality bandwidth estimates for real-time network control.
Abstract
In this work, we tackle the problem of bandwidth estimation (BWE) for real-time communication systems; however, in contrast to previous works, we leverage the vast efforts of prior heuristic-based BWE methods and synergize these approaches with deep learning-based techniques. Our work addresses challenges in generalizing to unseen network dynamics and extracting rich representations from prior experience, two key challenges in integrating data-driven bandwidth estimators into real-time systems. To that end, we propose Merlin, the first purely offline, data-driven solution to BWE that harnesses prior heuristic-based methods to extract an expert BWE policy. Through a series of experiments, we demonstrate that Merlin surpasses state-of-the-art heuristic-based and deep learning-based bandwidth estimators in terms of objective quality of experience metrics while generalizing beyond the offline world to in-the-wild network deployments where Merlin achieves a 42.85% and 12.8% reduction in packet loss and delay, respectively, when compared against WebRTC in inter-continental videoconferencing calls. We hope that Merlin's offline-oriented design fosters new strategies for real-time network control.
CA-PCA: Manifold Dimension Estimation, Adapted for Curvature
results: Careful experiments show that the proposed CA-PCA method improves manifold dimension estimation over existing estimators in a wide range of settings.
Abstract
The success of algorithms in the analysis of high-dimensional data is often attributed to the manifold hypothesis, which supposes that this data lie on or near a manifold of much lower dimension. It is often useful to determine or estimate the dimension of this manifold before performing dimension reduction, for instance. Existing methods for dimension estimation are calibrated using a flat unit ball. In this paper, we develop CA-PCA, a version of local PCA based instead on a calibration of a quadratic embedding, acknowledging the curvature of the underlying manifold. Numerous careful experiments show that this adaptation improves the estimator in a wide range of settings.
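For context, here is a minimal numpy sketch of the plain local-PCA dimension estimate that CA-PCA refines; the neighborhood size, the 95% variance threshold, and the sphere example are illustrative assumptions, and this sketch keeps the flat-ball style calibration rather than CA-PCA's quadratic-embedding calibration.

```python
import numpy as np

def local_pca_dimension(X, center_idx, k=50, threshold=0.95):
    """Plain local PCA: estimate intrinsic dimension near one point by counting
    how many principal components explain `threshold` of the local variance.
    CA-PCA replaces the flat-ball calibration implicit here with a
    quadratic-embedding calibration that accounts for curvature."""
    d2 = np.sum((X - X[center_idx]) ** 2, axis=1)
    neighbors = X[np.argsort(d2)[:k]]
    centered = neighbors - neighbors.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)   # local singular values
    var_ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(var_ratio, threshold) + 1)

# Example: a 2-dimensional manifold (sphere surface) embedded in R^3.
rng = np.random.default_rng(0)
pts = rng.normal(size=(2000, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
print(local_pca_dimension(pts, center_idx=0, k=100))
```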
SUDS: Sanitizing Universal and Dependent Steganography
paper_authors: Preston K. Robinette, Hanchen D. Wang, Nishan Shehadeh, Daniel Moyer, Taylor T. Johnson
for: This paper focuses on developing a deep learning sanitization technique called SUDS to mitigate the shortcomings of steganalysis in detecting steganography.
methods: The paper uses a deep learning approach called SUDS that is not reliant on prior knowledge of steganographic hiding techniques and can sanitize universal and dependent steganography.
results: The paper demonstrates the capabilities and limitations of SUDS through five research questions, including baseline comparisons and an ablation study, and shows that SUDS can increase the resistance of a poisoned classifier against attacks by 1375%.
Abstract
Steganography, or hiding messages in plain sight, is a form of information hiding that is most commonly used for covert communication. As modern steganographic mediums include images, text, audio, and video, this communication method is being increasingly used by bad actors to propagate malware, exfiltrate data, and discreetly communicate. Current protection mechanisms rely upon steganalysis, or the detection of steganography, but these approaches are dependent upon prior knowledge, such as steganographic signatures from publicly available tools and statistical knowledge about known hiding methods. These dependencies render steganalysis useless against new or unique hiding methods, which are becoming increasingly common with the application of deep learning models. To mitigate the shortcomings of steganalysis, this work focuses on a deep learning sanitization technique called SUDS that is not reliant upon knowledge of steganographic hiding techniques and is able to sanitize universal and dependent steganography. SUDS is tested using least significant bit method (LSB), dependent deep hiding (DDH), and universal deep hiding (UDH). We demonstrate the capabilities and limitations of SUDS by answering five research questions, including baseline comparisons and an ablation study. Additionally, we apply SUDS to a real-world scenario, where it is able to increase the resistance of a poisoned classifier against attacks by 1375%.
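To make the threat model concrete, here is a minimal numpy sketch of the LSB hiding method mentioned in the abstract together with the simplest possible sanitizer, which just re-randomizes the least significant bits; SUDS itself is a learned deep sanitizer that also handles DDH and UDH, so this is only an illustration.

```python
import numpy as np

def lsb_embed(cover, message_bits):
    """Hide message bits in the least significant bit of each pixel (LSB method)."""
    flat = cover.flatten().copy()
    flat[:len(message_bits)] = (flat[:len(message_bits)] & 0xFE) | message_bits
    return flat.reshape(cover.shape)

def lsb_extract(stego, n_bits):
    return stego.flatten()[:n_bits] & 1

def naive_sanitize(image, rng):
    """Destroy any LSB payload by re-randomizing the least significant bits."""
    noise = rng.integers(0, 2, size=image.shape, dtype=image.dtype)
    return (image & 0xFE) | noise

rng = np.random.default_rng(0)
cover = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)
msg = rng.integers(0, 2, size=64, dtype=np.uint8)

stego = lsb_embed(cover, msg)
print("recovered before sanitizing:", np.array_equal(lsb_extract(stego, 64), msg))
clean = naive_sanitize(stego, rng)
print("recovered after sanitizing: ", np.array_equal(lsb_extract(clean, 64), msg))
```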
Tight bounds on Pauli channel learning without entanglement
results: The paper proves a tight lower bound for learning Pauli channels without entanglement, closing a cubic gap between the best-known upper and lower bounds. Specifically, the authors show that $\Theta(2^n\varepsilon^{-2})$ rounds of measurements are required to estimate each eigenvalue of an $n$-qubit Pauli channel to $\varepsilon$ error with high probability, whereas a learning algorithm with entanglement needs only $\Theta(\varepsilon^{-2})$ rounds. This tight lower bound strengthens the foundation for experimentally demonstrating entanglement-enhanced advantages.
Abstract
Entanglement is a useful resource for learning, but a precise characterization of its advantage can be challenging. In this work, we consider learning algorithms without entanglement to be those that only utilize separable states, measurements, and operations between the main system of interest and an ancillary system. These algorithms are equivalent to those that apply quantum circuits on the main system interleaved with mid-circuit measurements and classical feedforward. We prove a tight lower bound for learning Pauli channels without entanglement that closes a cubic gap between the best-known upper and lower bound. In particular, we show that $\Theta(2^n\varepsilon^{-2})$ rounds of measurements are required to estimate each eigenvalue of an $n$-qubit Pauli channel to $\varepsilon$ error with high probability when learning without entanglement. In contrast, a learning algorithm with entanglement only needs $\Theta(\varepsilon^{-2})$ rounds of measurements. The tight lower bound strengthens the foundation for an experimental demonstration of entanglement-enhanced advantages for characterizing Pauli noise.
Monotonic Neural Ordinary Differential Equation: Time-series Forecasting for Cumulative Data
results: Extensive experiments in a bonus allocation scenario show that MODE handles both the monotonicity and the irregularity of cumulative data and delivers superior forecasting performance.
Abstract
Time-Series Forecasting based on Cumulative Data (TSFCD) is a crucial problem in decision-making across various industrial scenarios. However, existing time-series forecasting methods often overlook two important characteristics of cumulative data, namely monotonicity and irregularity, which limit their practical applicability. To address this limitation, we propose a principled approach called Monotonic neural Ordinary Differential Equation (MODE) within the framework of neural ordinary differential equations. By leveraging MODE, we are able to effectively capture and represent the monotonicity and irregularity in practical cumulative data. Through extensive experiments conducted in a bonus allocation scenario, we demonstrate that MODE outperforms state-of-the-art methods, showcasing its ability to handle both monotonicity and irregularity in cumulative data and delivering superior forecasting performance.
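As an illustration of the monotonicity constraint, here is a minimal numpy sketch of one standard way to build a monotonic ODE forecaster: constrain the learned derivative to be non-negative (via a softplus output) and integrate over irregular observation times with a simple Euler scheme; the tiny MLP and the Euler integrator are illustrative assumptions, not the authors' architecture.

```python
import numpy as np

def softplus(z):
    return np.log1p(np.exp(-np.abs(z))) + np.maximum(z, 0.0)  # numerically stable

def derivative(y, t, params):
    """Tiny MLP for dy/dt; the softplus output keeps dy/dt >= 0, so the
    integrated trajectory is monotonically non-decreasing."""
    W1, b1, W2, b2 = params
    h = np.tanh(W1 @ np.array([y, t]) + b1)
    return softplus(W2 @ h + b2)[0]

def rollout(y0, timestamps, params):
    """Euler integration over (possibly irregular) observation times."""
    ys = [y0]
    for t_prev, t_next in zip(timestamps[:-1], timestamps[1:]):
        dt = t_next - t_prev
        ys.append(ys[-1] + dt * derivative(ys[-1], t_prev, params))
    return np.array(ys)

rng = np.random.default_rng(0)
params = (rng.normal(size=(8, 2)), np.zeros(8), rng.normal(size=(1, 8)), np.zeros(1))
ts = np.sort(rng.uniform(0, 10, size=12))     # irregular sampling times
traj = rollout(0.0, ts, params)
print(np.all(np.diff(traj) >= 0))             # True: the cumulative forecast never decreases
```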
NetDiffus: Network Traffic Generation by Diffusion Models through Time-Series Imaging
results: The paper shows that NetDiffus provides a 66.4% increase in the fidelity of the generated data and an 18.1% improvement in downstream machine learning tasks. Evaluated on seven diverse traffic traces, the synthetic data significantly improves traffic fingerprinting, anomaly detection, and traffic classification.
Abstract
Network data analytics are now at the core of almost every networking solution. Nonetheless, limited access to networking data has been an enduring challenge due to many reasons including complexity of modern networks, commercial sensitivity, privacy and regulatory constraints. In this work, we explore how to leverage recent advancements in Diffusion Models (DM) to generate synthetic network traffic data. We develop an end-to-end framework - NetDiffus that first converts one-dimensional time-series network traffic into two-dimensional images, and then synthesizes representative images for the original data. We demonstrate that NetDiffus outperforms the state-of-the-art traffic generation methods based on Generative Adversarial Networks (GANs) by providing 66.4% increase in fidelity of the generated data and 18.1% increase in downstream machine learning tasks. We evaluate NetDiffus on seven diverse traffic traces and show that utilizing synthetic data significantly improves traffic fingerprinting, anomaly detection and traffic classification.
Early Classification for Dynamic Inference of Neural Networks
results: By using classifiers at intermediate layers to exclude irrelevant classes, later layers only need to determine the target class among the remaining classes. Experimental results show that the computational cost of DNN inference can be reduced significantly.
Abstract
Deep neural networks (DNNs) have been successfully applied in various fields. In DNNs, a large number of multiply-accumulate (MAC) operations is required to be performed, posing critical challenges in applying them in resource-constrained platforms, e.g., edge devices. Dynamic neural networks have been introduced to allow a structural adaption, e.g., early-exit, according to different inputs to reduce the computational cost of DNNs. Existing early-exit techniques deploy classifiers at intermediate layers of DNNs to push them to make a classification decision as early as possible. However, the learned features at early layers might not be sufficient to exclude all the irrelevant classes and decide the correct class, leading to suboptimal results. To address this challenge, in this paper, we propose a class-based early-exit for dynamic inference. Instead of pushing DNNs to make a dynamic decision at intermediate layers, we take advantages of the learned features in these layers to exclude as many irrelevant classes as possible, so that later layers only have to determine the target class among the remaining classes. Until at a layer only one class remains, this class is the corresponding classification result. To realize this class-based exclusion, we assign each class with a classifier at intermediate layers and train the networks together with these classifiers. Afterwards, an exclusion strategy is developed to exclude irrelevant classes at early layers. Experimental results demonstrate the computational cost of DNNs in inference can be reduced significantly.
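To illustrate the class-exclusion idea, here is a minimal numpy sketch in which each intermediate classifier drops candidate classes whose probability falls below a threshold and inference stops once a single class remains; the simulated scores, the threshold, and the stopping rule are illustrative assumptions rather than the paper's trained classifiers and exclusion strategy.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def class_based_early_exit(layer_logits, threshold=0.05):
    """`layer_logits` holds per-layer class scores from one intermediate
    classifier per layer. At each layer, exclude classes whose probability
    among the remaining candidates falls below `threshold`; once one class
    is left, return it without evaluating deeper layers."""
    num_classes = layer_logits[0].shape[0]
    remaining = np.arange(num_classes)
    for depth, logits in enumerate(layer_logits):
        probs = softmax(logits[remaining])
        remaining = remaining[probs >= threshold]
        if len(remaining) == 1:
            return int(remaining[0]), depth + 1      # early exit
    # Fall back to the final classifier over whatever classes survived.
    final = layer_logits[-1]
    return int(remaining[np.argmax(final[remaining])]), len(layer_logits)

rng = np.random.default_rng(0)
# Simulated scores from 4 intermediate classifiers over 10 classes,
# becoming increasingly confident about class 3 at deeper layers.
logits = [rng.normal(size=10) + np.eye(10)[3] * (2.0 * d) for d in range(1, 5)]
pred, exit_depth = class_based_early_exit(logits)
print(pred, exit_depth)
```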
results: The study finds that point-based deep neural networks can achieve strong human activity recognition accuracy on mmWave radar data. It also provides a large-scale open dataset for researchers to further explore mmWave radar for human activity recognition.
Abstract
Millimetre-wave (mmWave) radar has emerged as an attractive and cost-effective alternative for human activity sensing compared to traditional camera-based systems. mmWave radars are also non-intrusive, providing better protection for user privacy. However, as a Radio Frequency (RF) based technology, mmWave radars rely on capturing reflected signals from objects, making them more prone to noise compared to cameras. This raises an intriguing question for the deep learning community: Can we develop more effective point set-based deep learning methods for such attractive sensors? To answer this question, our work, termed MiliPoint, delves into this idea by providing a large-scale, open dataset for the community to explore how mmWave radars can be utilised for human activity recognition. Moreover, MiliPoint stands out as it is larger in size than existing datasets, has more diverse human actions represented, and encompasses all three key tasks in human activity recognition. We have also established a range of point-based deep neural networks such as DGCNN, PointNet++ and PointTransformer, on MiliPoint, which can serve to set the ground baseline for further development.
DenMune: Density peak based clustering using mutual nearest neighbors
results: The paper shows that DenMune produces stable and robust clustering results on both low- and high-dimensional datasets, and automatically removes noise from the clustering process while detecting the target clusters.
Abstract
Many clustering algorithms fail when clusters are of arbitrary shapes, of varying densities, or the data classes are unbalanced and close to each other, even in two dimensions. A novel clustering algorithm, DenMune is presented to meet this challenge. It is based on identifying dense regions using mutual nearest neighborhoods of size K, where K is the only parameter required from the user, besides obeying the mutual nearest neighbor consistency principle. The algorithm is stable for a wide range of values of K. Moreover, it is able to automatically detect and remove noise from the clustering process as well as detecting the target clusters. It produces robust results on various low and high-dimensional datasets relative to several known state-of-the-art clustering algorithms.
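To make the mutual-nearest-neighbor relation at the heart of DenMune concrete, here is a minimal sketch (numpy plus scikit-learn) that counts, for each point, how many of its K nearest neighbors also list it among their own K nearest neighbors; dense-region points score high and noise points score low. This shows only the neighborhood relation, not the full DenMune clustering procedure.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def mutual_knn_counts(X, k):
    """For each point, count how many of its k nearest neighbors also list it
    among their own k nearest neighbors (the mutual-nearest-neighbor relation)."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)   # +1 because each point returns itself
    _, idx = nn.kneighbors(X)
    neighbor_sets = [set(row[1:]) for row in idx]     # drop self
    counts = np.array([
        sum(1 for j in neighbor_sets[i] if i in neighbor_sets[j])
        for i in range(len(X))
    ])
    return counts

rng = np.random.default_rng(0)
# Two dense blobs plus sparse uniform noise.
X = np.vstack([rng.normal([0, 0], 0.3, size=(100, 2)),
               rng.normal([4, 4], 0.3, size=(100, 2)),
               rng.uniform(-2, 6, size=(20, 2))])
counts = mutual_knn_counts(X, k=10)
print("mean mutual-neighbor count, dense points:", counts[:200].mean())
print("mean mutual-neighbor count, noise points:", counts[200:].mean())
```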
Learning Large-Scale MTP$_2$ Gaussian Graphical Models via Bridge-Block Decomposition
results: Synthetic and real-world experiments show that the proposed method achieves a significant speed-up compared to state-of-the-art benchmarks.
Abstract
This paper studies the problem of learning the large-scale Gaussian graphical models that are multivariate totally positive of order two ($\text{MTP}_2$). By introducing the concept of bridge, which commonly exists in large-scale sparse graphs, we show that the entire problem can be equivalently optimized through (1) several smaller-scaled sub-problems induced by a \emph{bridge-block decomposition} on the thresholded sample covariance graph and (2) a set of explicit solutions on entries corresponding to bridges. From practical aspect, this simple and provable discipline can be applied to break down a large problem into small tractable ones, leading to enormous reduction on the computational complexity and substantial improvements for all existing algorithms. The synthetic and real-world experiments demonstrate that our proposed method presents a significant speed-up compared to the state-of-the-art benchmarks.
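As an illustration of the thresholding-plus-bridge step described in the abstract, here is a minimal sketch (numpy plus networkx) that thresholds a sample correlation matrix into a graph, locates its bridges, and returns the blocks obtained after removing them; the threshold and the synthetic two-block example are illustrative assumptions, and the sketch omits the explicit solutions on bridge entries.

```python
import numpy as np
import networkx as nx

def bridge_block_components(S, threshold):
    """Threshold the sample covariance/correlation S into a graph, find its
    bridges, and return the connected components left after the bridges are
    removed (the blocks of the bridge-block decomposition)."""
    p = S.shape[0]
    G = nx.Graph()
    G.add_nodes_from(range(p))
    G.add_edges_from((i, j) for i in range(p) for j in range(i + 1, p)
                     if abs(S[i, j]) > threshold)
    bridges = list(nx.bridges(G))
    H = G.copy()
    H.remove_edges_from(bridges)
    return bridges, [sorted(c) for c in nx.connected_components(H)]

rng = np.random.default_rng(0)
n = 500
z1, z2, w = rng.normal(size=(3, n))
block1 = np.outer(z1, np.ones(3)) + 0.3 * rng.normal(size=(n, 3))
block2 = np.outer(z2, np.ones(3)) + 0.3 * rng.normal(size=(n, 3))
block1[:, 2] += 0.8 * w          # variables 2 and 3 share one extra factor,
block2[:, 0] += 0.8 * w          # creating a single weak link between the blocks
S = np.corrcoef(np.hstack([block1, block2]), rowvar=False)
bridges, blocks = bridge_block_components(S, threshold=0.2)
print("bridges:", bridges)       # the lone cross-block edge
print("blocks: ", blocks)        # two smaller sub-problems
```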
ML Algorithm Synthesizing Domain Knowledge for Fungal Spores Concentration Prediction
paper_authors: Md Asif Bin Syed, Azmine Toushik Wasi, Imtiaz Ahmed
for: To improve the efficiency and sustainability of quality control in pulp-and-paper manufacturing by enabling real-time prediction of fungal spore concentration and timely corrective control.
methods: A machine learning approach that combines time-series sensor data with domain knowledge, with Ridge Regression as the best-performing model.
results: The optimal Ridge Regression model achieves an MSE of 2.90 on training and validation data, enabling real-time fungal spore concentration predictions that support stringent quality control in the pulp-and-paper industry.
Abstract
The pulp and paper manufacturing industry requires precise quality control to ensure pure, contaminant-free end products suitable for various applications. Fungal spore concentration is a crucial metric that affects paper usability, and current testing methods are labor-intensive with delayed results, hindering real-time control strategies. To address this, a machine learning algorithm utilizing time-series data and domain knowledge was proposed. The optimal model employed Ridge Regression achieving an MSE of 2.90 on training and validation data. This approach could lead to significant improvements in efficiency and sustainability by providing real-time predictions for fungal spore concentrations. This paper showcases a promising method for real-time fungal spore concentration prediction, enabling stringent quality control measures in the pulp-and-paper industry.
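As a rough illustration of the modeling setup, here is a minimal scikit-learn sketch of Ridge Regression on lagged time-series features with a chronological train/validation split; the synthetic sensor signal, the lag window, and the regularization strength are illustrative assumptions, not the paper's data or feature pipeline.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

def make_lagged_features(series, n_lags):
    """Turn a 1D time series into (X, y) pairs where X holds the previous
    n_lags observations and y is the next value."""
    X = np.column_stack([series[i:len(series) - n_lags + i] for i in range(n_lags)])
    y = series[n_lags:]
    return X, y

rng = np.random.default_rng(0)
# Synthetic stand-in for a fungal-spore-concentration sensor signal.
t = np.arange(600)
series = 10 + 0.01 * t + 2 * np.sin(2 * np.pi * t / 50) + rng.normal(0, 0.5, size=t.size)

X, y = make_lagged_features(series, n_lags=10)
split = int(0.8 * len(y))                      # chronological train/validation split
model = Ridge(alpha=1.0).fit(X[:split], y[:split])
pred = model.predict(X[split:])
print("validation MSE:", mean_squared_error(y[split:], pred))
```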
On the Sweet Spot of Contrastive Views for Knowledge-enhanced Recommendation
results: Experimental results on three real-world datasets show that the method is more effective and efficient than the state of the art. The code is available through an anonymous link: https://figshare.com/articles/conference_contribution/SimKGCL/22783382
Abstract
In recommender systems, knowledge graph (KG) can offer critical information that is lacking in the original user-item interaction graph (IG). Recent progress has explored this direction and shows that contrastive learning is a promising way to integrate both. However, we observe that existing KG-enhanced recommenders struggle in balancing between the two contrastive views of IG and KG, making them sometimes even less effective than simply applying contrastive learning on IG without using KG. In this paper, we propose a new contrastive learning framework for KG-enhanced recommendation. Specifically, to make full use of the knowledge, we construct two separate contrastive views for KG and IG, and maximize their mutual information; to ease the contrastive learning on the two views, we further fuse KG information into IG in a one-direction manner. Extensive experimental results on three real-world datasets demonstrate the effectiveness and efficiency of our method, compared to the state-of-the-art. Our code is available through the anonymous link: https://figshare.com/articles/conference_contribution/SimKGCL/22783382
Learning Invariant Representations with a Nonparametric Nadaraya-Watson Head
results: Validation on three challenging real-world domain generalization tasks in computer vision shows that the method learns environment-invariant representations and provides good predictive performance across environments.
Abstract
Machine learning models will often fail when deployed in an environment with a data distribution that is different than the training distribution. When multiple environments are available during training, many methods exist that learn representations which are invariant across the different distributions, with the hope that these representations will be transportable to unseen domains. In this work, we present a nonparametric strategy for learning invariant representations based on the recently-proposed Nadaraya-Watson (NW) head. The NW head makes a prediction by comparing the learned representations of the query to the elements of a support set that consists of labeled data. We demonstrate that by manipulating the support set, one can encode different causal assumptions. In particular, restricting the support set to a single environment encourages the model to learn invariant features that do not depend on the environment. We present a causally-motivated setup for our modeling and training strategy and validate on three challenging real-world domain generalization tasks in computer vision.
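To make the NW head concrete, here is a minimal numpy sketch: the query's predicted label distribution is a kernel-weighted average over a labeled support set, and restricting that support set to a single environment is what encodes the invariance assumption described in the abstract; the Gaussian kernel and bandwidth are illustrative assumptions.

```python
import numpy as np

def nw_head(query_repr, support_reprs, support_labels, num_classes, bandwidth=1.0):
    """Nadaraya-Watson head: compare the query representation to each labeled
    support representation and return a kernel-weighted label distribution."""
    d2 = np.sum((support_reprs - query_repr) ** 2, axis=1)
    logits = -d2 / (2.0 * bandwidth ** 2)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    onehot = np.eye(num_classes)[support_labels]
    return weights @ onehot                      # predicted class probabilities

rng = np.random.default_rng(0)
# Toy learned representations: class 0 near the origin, class 1 shifted.
support = np.vstack([rng.normal(0.0, 0.5, size=(20, 8)),
                     rng.normal(2.0, 0.5, size=(20, 8))])
labels = np.array([0] * 20 + [1] * 20)
# Restricting `support` to a single environment encodes the causal assumption
# described in the abstract; here we simply use all of it.
query = rng.normal(2.0, 0.5, size=8)
print(nw_head(query, support, labels, num_classes=2))
```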
results: On the BirdCLEF2023 and AudioSet (Balanced) datasets, the ASCA model achieves accuracies of 81.2% and 35.1%, respectively, outperforming competing methods.
Abstract
Audio recognition in specialized areas such as birdsong and submarine acoustics faces challenges in large-scale pre-training due to the limitations in available samples imposed by sampling environments and specificity requirements. While the Transformer model excels in audio recognition, its dependence on vast amounts of data becomes restrictive in resource-limited settings. Addressing this, we introduce the Audio Spectrogram Convolution Attention (ASCA) based on CoAtNet, integrating a Transformer-convolution hybrid architecture, novel network design, and attention techniques, further augmented with data enhancement and regularization strategies. On the BirdCLEF2023 and AudioSet(Balanced), ASCA achieved accuracies of 81.2% and 35.1%, respectively, significantly outperforming competing methods. The unique structure of our model enriches output, enabling generalization across various audio detection tasks. Our code can be found at https://github.com/LeeCiang/ASCA.
results: The study shows that the method performs machine learning tasks at low power while remaining on par with current techniques, achieving high accuracy and low error on classification and regression learning tasks.
Abstract
Machine learning studies need colossal power to process massive datasets and train neural networks to reach high accuracies, which have become gradually unsustainable. Limited by the von Neumann bottleneck, current computing architectures and methods fuel this high power consumption. Here, we present an analog computing method that harnesses chaotic nonlinear attractors to perform machine learning tasks with low power consumption. Inspired by neuromorphic computing, our model is a programmable, versatile, and generalized platform for machine learning tasks. Our model provides exceptional performance in clustering by utilizing chaotic attractors' nonlinear mapping and sensitivity to initial conditions. When deployed as a simple analog device, it only requires milliwatt-scale power levels while being on par with current machine learning techniques. We demonstrate low errors and high accuracies with our model for regression and classification-based learning tasks.
Accelerating Particle and Fluid Simulations with Differentiable Graph Networks for Solving Forward and Inverse Problems
results: GNS achieves over a 165x speedup for granular flow prediction compared to parallel CPU numerical simulations. A hybrid GNS/Material Point Method (MPM) that interleaves MPM steps into GNS rollouts to satisfy conservation laws and minimize error achieves a 24x speedup over pure numerical simulation. The differentiable GNS also solves inverse problems: gradients of a loss on the final runout are computed via automatic differentiation and used to iteratively update the friction angle that best matches the target runout distance.
Abstract
We leverage physics-embedded differentiable graph network simulators (GNS) to accelerate particulate and fluid simulations to solve forward and inverse problems. GNS represents the domain as a graph with particles as nodes and learned interactions as edges. Compared to modeling global dynamics, GNS enables learning local interaction laws through edge messages, improving its generalization to new environments. GNS achieves over 165x speedup for granular flow prediction compared to parallel CPU numerical simulations. We propose a novel hybrid GNS/Material Point Method (MPM) to accelerate forward simulations by minimizing error on a pure surrogate model by interleaving MPM in GNS rollouts to satisfy conservation laws and minimize errors achieving 24x speedup compared to pure numerical simulations. The differentiable GNS enables solving inverse problems through automatic differentiation, identifying material parameters that result in target runout distances. We demonstrate the ability of GNS to solve inverse problems by iteratively updating the friction angle (a material property) by computing the gradient of a loss function based on the final and target runouts, thereby identifying the friction angle that best matches the observed runout. The physics-embedded and differentiable simulators open an exciting new paradigm for AI-accelerated design, control, and optimization.
On the Asymptotic Learning Curves of Kernel Ridge Regression under Power-law Decay
results: The results suggest that the 'benign overfitting phenomenon' exists in very wide neural networks only when the noise level is small.
Abstract
The widely observed 'benign overfitting phenomenon' in the neural network literature raises the challenge to the 'bias-variance trade-off' doctrine in the statistical learning theory. Since the generalization ability of the 'lazy trained' over-parametrized neural network can be well approximated by that of the neural tangent kernel regression, the curve of the excess risk (namely, the learning curve) of kernel ridge regression attracts increasing attention recently. However, most recent arguments on the learning curve are heuristic and are based on the 'Gaussian design' assumption. In this paper, under mild and more realistic assumptions, we rigorously provide a full characterization of the learning curve: elaborating the effect and the interplay of the choice of the regularization parameter, the source condition and the noise. In particular, our results suggest that the 'benign overfitting phenomenon' exists in very wide neural networks only when the noise level is small.
Predicting Temperature of Major Cities Using Machine Learning and Deep Learning
paper_authors: Wasiou Jaharabi, MD Ibrahim Al Hossain, Rownak Tahmid, Md. Zuhayer Islam, T. M. Saad Rayhan
For: The paper aims to develop an accurate temperature prediction method using machine learning algorithms and time series analysis, specifically focusing on the temperature data of major cities.
Methods: The authors use a dataset provided by the University of Dayton, which includes temperature data from major cities. They apply time series analysis techniques such as ARIMA, SARIMA, and Prophet, and incorporate the concept of RNN and LSTM to filter out abnormalities, preprocess the data, and make predictions of future temperature trends.
Results: The authors achieve accurate predictions of temperature in major cities based on the available data, and demonstrate the effectiveness of their method in combating climate change by providing accurate temperature predictions for future reference.
Abstract
Currently, the issue that concerns the world leaders most is climate change for its effect on agriculture, environment and economies of daily life. So, to combat this, temperature prediction with strong accuracy is vital. So far, the most effective widely used measure for such forecasting is Numerical weather prediction (NWP) which is a mathematical model that needs broad data from different applications to make predictions. This expensive, time and labor consuming work can be minimized through making such predictions using Machine learning algorithms. Using the database made by University of Dayton which consists the change of temperature in major cities we used the Time Series Analysis method where we use LSTM for the purpose of turning existing data into a tool for future prediction. LSTM takes the long-term data as well as any short-term exceptions or anomalies that may have occurred and calculates trend, seasonality and the stationarity of a data. By using models such as ARIMA, SARIMA, Prophet with the concept of RNN and LSTM we can, filter out any abnormalities, preprocess the data compare it with previous trends and make a prediction of future trends. Also, seasonality and stationarity help us analyze the reoccurrence or repeat over one year variable and removes the constrain of time in which the data was dependent so see the general changes that are predicted. By doing so we managed to make prediction of the temperature of different cities during any time in future based on available data and built a method of accurate prediction. This document contains our methodology for being able to make such predictions.
An Interpretable Systematic Review of Machine Learning Models for Predictive Maintenance of Aircraft Engine
paper_authors: Abdullah Al Hasib, Ashikur Rahman, Mahpara Khabir, Md. Tanvir Rouf Shawon
For: This paper aims to predict aircraft engine failure using machine learning and deep learning models to avoid any kind of disaster.
Methods: The paper utilizes sensor data and employs various machine learning and deep learning models such as LSTM, Bi-LSTM, RNN, Bi-RNN, GRU, Random Forest, KNN, Naive Bayes, and Gradient Boosting to predict aircraft engine failure within a predetermined number of cycles.
Results: The paper achieves a lucrative accuracy of 97.8%, 97.14%, and 96.42% using GRU, Bi-LSTM, and LSTM respectively, demonstrating the capability of the models to predict maintenance at an early stage.
Abstract
This paper presents an interpretable review of various machine learning and deep learning models to predict the maintenance of aircraft engine to avoid any kind of disaster. One of the advantages of the strategy is that it can work with modest datasets. In this study, sensor data is utilized to predict aircraft engine failure within a predetermined number of cycles using LSTM, Bi-LSTM, RNN, Bi-RNN GRU, Random Forest, KNN, Naive Bayes, and Gradient Boosting. We explain how deep learning and machine learning can be used to generate predictions in predictive maintenance using a straightforward scenario with just one data source. We applied lime to the models to help us understand why machine learning models did not perform well than deep learning models. An extensive analysis of the model's behavior is presented for several test data to understand the black box scenario of the models. A lucrative accuracy of 97.8%, 97.14%, and 96.42% are achieved by GRU, Bi-LSTM, and LSTM respectively which denotes the capability of the models to predict maintenance at an early stage.
CORE: Common Random Reconstruction for Distributed Optimization with Provable Low Communication Complexity
For: To reduce the communication complexity of distributed machine learning, so as to speed up training and scale up the number of machines.
Methods: The paper proposes a new technique named Common randOm REconstruction (CORE), which compresses the information transmitted between machines to reduce communication complexity without other strict conditions. CORE projects vector-valued information onto low-dimensional vectors through common random vectors and reconstructs the information with the same random noises after communication.
Results: Applying CORE to two distributed tasks, convex optimization on linear models and generic non-convex optimization, the authors design new distributed algorithms with provably lower communication complexities. For example, for linear models the CORE-based algorithm can encode the gradient vector to $\mathcal{O}(1)$-bits (against $\mathcal{O}(d)$) with a convergence rate that is not worse, improving on existing results.
Abstract
With distributed machine learning being a prominent technique for large-scale machine learning tasks, communication complexity has become a major bottleneck for speeding up training and scaling up machine numbers. In this paper, we propose a new technique named Common randOm REconstruction(CORE), which can be used to compress the information transmitted between machines in order to reduce communication complexity without other strict conditions. Especially, our technique CORE projects the vector-valued information to a low-dimensional one through common random vectors and reconstructs the information with the same random noises after communication. We apply CORE to two distributed tasks, respectively convex optimization on linear models and generic non-convex optimization, and design new distributed algorithms, which achieve provably lower communication complexities. For example, we show for linear models CORE-based algorithm can encode the gradient vector to $\mathcal{O}(1)$-bits (against $\mathcal{O}(d)$), with the convergence rate not worse, preceding the existing results.
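To illustrate the common-random-reconstruction idea described in the abstract, here is a minimal numpy sketch in which sender and receiver draw the same random vectors from a shared seed, only the low-dimensional projections are communicated, and the receiver forms an unbiased reconstruction; the dimensions, scaling, and single-round setup are illustrative assumptions rather than the paper's algorithms or bit-level encoding.

```python
import numpy as np

def compress(grad, seed, m):
    """Sender: project the d-dimensional gradient onto m shared random vectors."""
    d = grad.shape[0]
    R = np.random.default_rng(seed).normal(size=(m, d)) / np.sqrt(m)
    return R @ grad                               # only m numbers are communicated

def reconstruct(coords, seed, d):
    """Receiver: regenerate the same random vectors from the shared seed and
    form an unbiased (higher-variance) estimate of the original gradient."""
    m = coords.shape[0]
    R = np.random.default_rng(seed).normal(size=(m, d)) / np.sqrt(m)
    return R.T @ coords

rng = np.random.default_rng(1)
d, m, seed = 1000, 50, 42                         # 20x fewer numbers on the wire
grad = rng.normal(size=d)
estimate = reconstruct(compress(grad, seed, m), seed, d)
# The estimate preserves the gradient direction on average; averaging over
# machines and rounds (as in the paper's algorithms) reduces the variance.
cos = grad @ estimate / (np.linalg.norm(grad) * np.linalg.norm(estimate))
print(round(float(cos), 3))
```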
Beyond Fairness: Age-Harmless Parkinson’s Detection via Voice
results: The method improves detection accuracy for early-onset (young, age <= 55) patients without sacrificing prediction accuracy for the elderly group (age > 55). In addition, a two-step detection strategy is proposed to assess potential early-onset PD patients in the young group.
Abstract
Parkinson's disease (PD), a neurodegenerative disorder, often manifests as speech and voice dysfunction. While utilizing voice data for PD detection has great potential in clinical applications, the widely used deep learning models currently have fairness issues regarding different ages of onset. These deep models perform well for the elderly group (age $>$ 55) but are less accurate for the young group (age $\leq$ 55). Through our investigation, the discrepancy between the elderly and the young arises due to 1) an imbalanced dataset and 2) the milder symptoms often seen in early-onset patients. However, traditional debiasing methods are impractical as they typically impair the prediction accuracy for the majority group while minimizing the discrepancy. To address this issue, we present a new debiasing method using GradCAM-based feature masking combined with ensemble models, ensuring that neither fairness nor accuracy is compromised. Specifically, the GradCAM-based feature masking selectively obscures age-related features in the input voice data while preserving essential information for PD detection. The ensemble models further improve the prediction accuracy for the minority (young group). Our approach effectively improves detection accuracy for early-onset patients without sacrificing performance for the elderly group. Additionally, we propose a two-step detection strategy for the young group, offering a practical risk assessment for potential early-onset PD patients.
Distributional Shift-Aware Off-Policy Interval Estimation: A Unified Error Quantification Framework
results: The method provides high-confidence interval estimates of the target policy value in infinite-horizon Markov decision processes, applies to time-dependent data without assuming weak dependence conditions, and adapts the confidence interval to remain robust against distributional shifts.
Abstract
We study high-confidence off-policy evaluation in the context of infinite-horizon Markov decision processes, where the objective is to establish a confidence interval (CI) for the target policy value using only offline data pre-collected from unknown behavior policies. This task faces two primary challenges: providing a comprehensive and rigorous error quantification in CI estimation, and addressing the distributional shift that results from discrepancies between the distribution induced by the target policy and the offline data-generating process. Motivated by an innovative unified error analysis, we jointly quantify the two sources of estimation errors: the misspecification error on modeling marginalized importance weights and the statistical uncertainty due to sampling, within a single interval. This unified framework reveals a previously hidden tradeoff between the errors, which undermines the tightness of the CI. Relying on a carefully designed discriminator function, the proposed estimator achieves a dual purpose: breaking the curse of the tradeoff to attain the tightest possible CI, and adapting the CI to ensure robustness against distributional shifts. Our method is applicable to time-dependent data without assuming any weak dependence conditions via leveraging a local supermartingale/martingale structure. Theoretically, we show that our algorithm is sample-efficient, error-robust, and provably convergent even in non-linear function approximation settings. The numerical performance of the proposed method is examined in synthetic datasets and an OhioT1DM mobile health study.
A Deep Learning Sequential Decoder for Transient High-Density Electromyography in Hand Gesture Recognition Using Subject-Embedded Transfer Learning
methods: The study uses a deep learning sequential decoder of transient high-density sEMG (HD-sEMG) that combines subject-embedded transfer learning with a multiplicative subject-embedded structure to improve gesture recognition accuracy.
results: The model achieves 73% average accuracy across 65 gestures for partially-observed subjects and outperforms subject-specific approaches when training data is limited.
Abstract
Hand gesture recognition (HGR) has gained significant attention due to the increasing use of AI-powered human-computer interfaces that can interpret the deep spatiotemporal dynamics of biosignals from the peripheral nervous system, such as surface electromyography (sEMG). These interfaces have a range of applications, including the control of extended reality, agile prosthetics, and exoskeletons. However, the natural variability of sEMG among individuals has led researchers to focus on subject-specific solutions. Deep learning methods, which often have complex structures, are particularly data-hungry and can be time-consuming to train, making them less practical for subject-specific applications. In this paper, we propose and develop a generalizable, sequential decoder of transient high-density sEMG (HD-sEMG) that achieves 73% average accuracy on 65 gestures for partially-observed subjects through subject-embedded transfer learning, leveraging pre-knowledge of HGR acquired during pre-training. The use of transient HD-sEMG before gesture stabilization allows us to predict gestures with the ultimate goal of counterbalancing system control delays. The results show that the proposed generalized models significantly outperform subject-specific approaches, especially when the training data is limited, and there is a significant number of gesture classes. By building on pre-knowledge and incorporating a multiplicative subject-embedded structure, our method comparatively achieves more than 13% average accuracy across partially observed subjects with minimal data availability. This work highlights the potential of HD-sEMG and demonstrates the benefits of modeling common patterns across users to reduce the need for large amounts of data for new users, enhancing practicality.
Zen: Near-Optimal Sparse Tensor Synchronization for Distributed DNN Training
results: The results show that the Zen gradient synchronization system achieves up to a 5.09x speedup in communication time and up to a 2.48x speedup in training throughput compared to state-of-the-art methods.
Abstract
Distributed training is the de facto standard to scale up the training of Deep Neural Networks (DNNs) with multiple GPUs. The performance bottleneck of distributed training lies in communications for gradient synchronization. Recently, practitioners have observed sparsity in gradient tensors, suggesting the potential to reduce the traffic volume in communication and improve end-to-end training efficiency. Yet, the optimal communication scheme to fully leverage sparsity is still missing. This paper aims to address this gap. We first analyze the characteristics of sparse tensors in popular DNN models to understand the fundamentals of sparsity. We then systematically explore the design space of communication schemes for sparse tensors and find the optimal one. We also develop a gradient synchronization system called Zen that approximately realizes it for sparse tensors. We demonstrate that Zen can achieve up to 5.09x speedup in communication time and up to 2.48x speedup in training throughput compared to the state-of-the-art methods.
Importance of negative sampling in weak label learning
results: The method is tested on the CIFAR-10 and AudioSet datasets and shown to improve weak-label classification performance while reducing computational cost compared to random sampling.
Abstract
Weak-label learning is a challenging task that requires learning from data "bags" containing positive and negative instances, but only the bag labels are known. The pool of negative instances is usually larger than positive instances, thus making selecting the most informative negative instance critical for performance. Such a selection strategy for negative instances from each bag is an open problem that has not been well studied for weak-label learning. In this paper, we study several sampling strategies that can measure the usefulness of negative instances for weak-label learning and select them accordingly. We test our method on CIFAR-10 and AudioSet datasets and show that it improves the weak-label classification performance and reduces the computational cost compared to random sampling methods. Our work reveals that negative instances are not all equally irrelevant, and selecting them wisely can benefit weak-label learning.
Grad DFT: a software library for machine learning enhanced density functional theory
results: The authors develop Grad DFT, a fully differentiable JAX-based library for quick prototyping and experimentation with machine-learning-enhanced exchange-correlation energy functionals, and compile a curated dataset of experimental dimer dissociation energies for training and benchmarking.
Abstract
Density functional theory (DFT) stands as a cornerstone method in computational quantum chemistry and materials science due to its remarkable versatility and scalability. Yet, it suffers from limitations in accuracy, particularly when dealing with strongly correlated systems. To address these shortcomings, recent work has begun to explore how machine learning can expand the capabilities of DFT; an endeavor with many open questions and technical challenges. In this work, we present Grad DFT: a fully differentiable JAX-based DFT library, enabling quick prototyping and experimentation with machine learning-enhanced exchange-correlation energy functionals. Grad DFT employs a pioneering parametrization of exchange-correlation functionals constructed using a weighted sum of energy densities, where the weights are determined using neural networks. Moreover, Grad DFT encompasses a comprehensive suite of auxiliary functions, notably featuring a just-in-time compilable and fully differentiable self-consistent iterative procedure. To support training and benchmarking efforts, we additionally compile a curated dataset of experimental dissociation energies of dimers, half of which contain transition metal atoms characterized by strong electronic correlations. The software library is tested against experimental results to study the generalization capabilities of a neural functional across potential energy surfaces and atomic species, as well as the effect of training data noise on the resulting model accuracy.
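As a rough illustration of the parametrization described in the abstract, here is a minimal numpy sketch of an exchange-correlation energy formed as a weighted sum of energy densities, with the weights produced by a small neural network and the integral approximated by a quadrature sum; the densities, features, network, and grid are illustrative stand-ins, not the Grad DFT API (which is JAX-based and differentiable end to end).

```python
import numpy as np

def neural_weights(features, W1, b1, W2, b2):
    """Tiny MLP mapping local features (e.g. density, gradient) to per-density weights."""
    h = np.tanh(features @ W1 + b1)
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))   # sigmoid keeps weights in (0, 1)

def xc_energy(rho, grad_rho, energy_densities, grid_weights, params):
    """E_xc ~ quadrature sum over the grid of sum_i w_i(features) * e_i(rho)."""
    features = np.stack([rho, grad_rho], axis=1)
    w = neural_weights(features, *params)                       # (n_grid, n_densities)
    e = np.stack([ed(rho) for ed in energy_densities], axis=1)  # (n_grid, n_densities)
    return np.sum(grid_weights * np.sum(w * e, axis=1))

def lda_like(rho):
    """LDA-exchange-like energy density, e_x = -(3/4)(3/pi)^(1/3) rho^(4/3)."""
    return -0.75 * (3.0 / np.pi) ** (1.0 / 3.0) * rho ** (4.0 / 3.0)

def gradient_free_stand_in(rho):
    """Placeholder second energy density, purely illustrative."""
    return -0.01 * rho ** 2

rng = np.random.default_rng(0)
n_grid = 200
rho = np.abs(rng.normal(1.0, 0.2, size=n_grid))
grad_rho = np.abs(rng.normal(0.5, 0.1, size=n_grid))
grid_weights = np.full(n_grid, 1.0 / n_grid)
params = (rng.normal(size=(2, 16)), np.zeros(16), rng.normal(size=(16, 2)), np.zeros(2))
print(xc_energy(rho, grad_rho, [lda_like, gradient_free_stand_in], grid_weights, params))
```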
Causal Reasoning: Charting a Revolutionary Course for Next-Generation AI-Native Wireless Networks
results: The article argues that incorporating causal discovery and representation can address several wireless networking challenges, such as ultra-reliable beamforming for terahertz (THz) systems, near-accurate physical twin modeling, training data augmentation, and semantic communication. It also outlines potential frameworks that leverage causal inference to achieve the overarching objectives of future-generation networks, including intent management, dynamic adaptability, human-level cognition and reasoning, and time sensitivity.
Abstract
Despite the basic premise that next-generation wireless networks (e.g., 6G) will be artificial intelligence (AI)-native, to date, most existing efforts remain either qualitative or incremental extensions to existing ``AI for wireless'' paradigms. Indeed, creating AI-native wireless networks faces significant technical challenges due to the limitations of data-driven, training-intensive AI. These limitations include the black-box nature of the AI models, their curve-fitting nature, which can limit their ability to reason and adapt, their reliance on large amounts of training data, and the energy inefficiency of large neural networks. In response to these limitations, this article presents a comprehensive, forward-looking vision that addresses these shortcomings by introducing a novel framework for building AI-native wireless networks; grounded in the emerging field of causal reasoning. Causal reasoning, founded on causal discovery, causal representation learning, and causal inference, can help build explainable, reasoning-aware, and sustainable wireless networks. Towards fulfilling this vision, we first highlight several wireless networking challenges that can be addressed by causal discovery and representation, including ultra-reliable beamforming for terahertz (THz) systems, near-accurate physical twin modeling for digital twins, training data augmentation, and semantic communication. We showcase how incorporating causal discovery can assist in achieving dynamic adaptability, resilience, and cognition in addressing these challenges. Furthermore, we outline potential frameworks that leverage causal inference to achieve the overarching objectives of future-generation networks, including intent management, dynamic adaptability, human-level cognition, reasoning, and the critical element of time sensitivity.