results: The paper shows that prompts generated with GPT-4 can substantially improve CLIP's zero-shot transfer accuracy (~7%), and designs a simple few-shot adapter that selects the best sentences to construct generalizable classifiers, outperforming CoCoOP by ~2%.Abstract
Contrastive pretrained large Vision-Language Models (VLMs) like CLIP have revolutionized visual representation learning by providing good performance on downstream datasets. VLMs are 0-shot adapted to a downstream dataset by designing prompts that are relevant to the dataset. Such prompt engineering makes use of domain expertise and a validation dataset. Meanwhile, recent developments in generative pretrained models like GPT-4 mean they can be used as advanced internet search tools. They can also be manipulated to provide visual information in any structure. In this work, we show that GPT-4 can be used to generate text that is visually descriptive and how this can be used to adapt CLIP to downstream tasks. We show considerable improvements in 0-shot transfer accuracy on specialized fine-grained datasets like EuroSAT (~7%), DTD (~7%), SUN397 (~4.6%), and CUB (~3.3%) when compared to CLIP's default prompt. We also design a simple few-shot adapter that learns to choose the best possible sentences to construct generalizable classifiers that outperform the recently proposed CoCoOP by ~2% on average and by over 4% on 4 specialized fine-grained datasets. The code, prompts, and auxiliary text dataset are available at https://github.com/mayug/VDT-Adapter.
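To make the adaptation step concrete, here is a minimal sketch of turning GPT-generated visual descriptions into a zero-shot CLIP classifier; the class names, the two example descriptions per class, and the use of OpenAI's `clip` package are illustrative assumptions rather than the authors' exact pipeline (see the linked repository for that).

```python
# Hypothetical sketch: build a zero-shot CLIP classifier from GPT-generated
# visual descriptions by averaging the text embeddings per class.
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Assumed example: a few visually descriptive sentences per class,
# as one might obtain by querying GPT-4 for each class name.
class_descriptions = {
    "forest": ["a satellite photo of dense green tree cover",
               "an aerial view of a wooded area with irregular canopy texture"],
    "highway": ["a satellite photo of a long paved road with lane markings",
                "an aerial view of asphalt lanes cutting through terrain"],
}

with torch.no_grad():
    classifier = []
    for descriptions in class_descriptions.values():
        tokens = clip.tokenize(descriptions).to(device)
        emb = model.encode_text(tokens).float()
        emb = emb / emb.norm(dim=-1, keepdim=True)
        classifier.append(emb.mean(dim=0))        # average over descriptions
    classifier = torch.stack(classifier)          # (num_classes, dim)
    classifier = classifier / classifier.norm(dim=-1, keepdim=True)

def predict(image_pil):
    """Zero-shot prediction: cosine similarity between image and class embeddings."""
    with torch.no_grad():
        img = preprocess(image_pil).unsqueeze(0).to(device)
        feat = model.encode_image(img).float()
        feat = feat / feat.norm(dim=-1, keepdim=True)
        return (feat @ classifier.T).argmax(dim=-1).item()
```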
paper_authors: Khashayar Khosravi, Renato Paes Leme, Chara Podimata, Apostolis Tsorvantzis
for: Learning with bandit feedback under deterministically evolving and unobservable states, in particular for recommendation systems and online advertising.
methods: A multi-armed bandit model that accounts for the evolution and unobservability of the system state.
results: Proposes a bandit model with deterministically evolving states and analyzes online learning algorithms for any possible parametrization of the evolution rate $\lambda$. Depending on the rate, the achievable regret bounds are $\widetilde O(\sqrt{KT})$, $\widetilde O(T^{b/a})$, $\widetilde O(K^{1/3}T^{2/3})$, and $\widetilde O(K\sqrt{T})$.Abstract
We propose a model for learning with bandit feedback while accounting for deterministically evolving and unobservable states that we call Bandits with Deterministically Evolving States. The workhorse applications of our model are learning for recommendation systems and learning for online ads. In both cases, the reward that the algorithm obtains at each round is a function of the short-term reward of the action chosen and how ``healthy'' the system is (i.e., as measured by its state). For example, in recommendation systems, the reward that the platform obtains from a user's engagement with a particular type of content depends not only on the inherent features of the specific content, but also on how the user's preferences have evolved as a result of interacting with other types of content on the platform. Our general model accounts for the different rate $\lambda \in [0,1]$ at which the state evolves (e.g., how fast a user's preferences shift as a result of previous content consumption) and encompasses standard multi-armed bandits as a special case. The goal of the algorithm is to minimize a notion of regret against the best-fixed sequence of arms pulled. We analyze online learning algorithms for any possible parametrization of the evolution rate $\lambda$. Specifically, the regret rates obtained are: for $\lambda \in [0, 1/T^2]$: $\widetilde O(\sqrt{KT})$; for $\lambda = T^{-a/b}$ with $b < a < 2b$: $\widetilde O (T^{b/a})$; for $\lambda \in (1/T, 1 - 1/\sqrt{T}): \widetilde O (K^{1/3}T^{2/3})$; and for $\lambda \in [1 - 1/\sqrt{T}, 1]: \widetilde O (K\sqrt{T})$.
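A toy simulation of the reward structure described in the abstract is sketched below; the concrete state-update rule, the per-arm impact on the state, and the epsilon-greedy learner are illustrative assumptions, since the abstract only specifies that the state evolves deterministically at rate $\lambda$ and modulates the short-term reward.

```python
# Toy sketch (assumed dynamics): the observed reward mixes the arm's short-term
# reward with an unobservable "health" state that evolves deterministically.
import numpy as np

rng = np.random.default_rng(0)
K, T, lam = 5, 10_000, 0.01          # arms, horizon, evolution rate
mu = rng.uniform(0.2, 0.8, size=K)   # short-term mean reward of each arm
impact = rng.uniform(-0.5, 0.5, K)   # how each arm shifts the hidden state
state = 1.0                          # hidden system "health", never observed

counts = np.zeros(K)
means = np.zeros(K)
total_reward = 0.0
for t in range(T):
    # simple epsilon-greedy on the observed (state-modulated) rewards
    arm = rng.integers(K) if rng.random() < 0.05 else int(np.argmax(means))
    short_term = rng.binomial(1, mu[arm])
    reward = short_term * state                 # reward depends on the hidden state
    total_reward += reward
    counts[arm] += 1
    means[arm] += (reward - means[arm]) / counts[arm]
    # deterministic state evolution at rate lambda (assumed functional form)
    state = np.clip((1 - lam) * state + lam * (1.0 + impact[arm]), 0.0, 2.0)

print(f"average observed reward: {total_reward / T:.3f}")
```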
Scalable Multi-agent Covering Option Discovery based on Kronecker Graphs
for: This paper is written for improving the exploration of reinforcement learning (RL) in single-agent scenarios with sparse reward signals, and enabling the ease of decomposition in multi-agent systems.
methods: The paper proposes a multi-agent skill discovery method that approximates the joint state space as a Kronecker graph, and estimates its Fiedler vector using the Laplacian spectrum of individual agents’ transition graphs. The method also includes a deep learning extension using NN-based representation learning techniques.
results: The proposed algorithm is evaluated on multi-agent tasks built with simulators like Mujoco, and shows significant outperformance compared to the state-of-the-art.Abstract
Covering skill (a.k.a., option) discovery has been developed to improve the exploration of RL in single-agent scenarios with sparse reward signals, through connecting the most distant states in the embedding space provided by the Fiedler vector of the state transition graph. Given that joint state space grows exponentially with the number of agents in multi-agent systems, existing researches still relying on single-agent skill discovery either become prohibitive or fail to directly discover joint skills that improve the connectivity of the joint state space. In this paper, we propose multi-agent skill discovery which enables the ease of decomposition. Our key idea is to approximate the joint state space as a Kronecker graph, based on which we can directly estimate its Fiedler vector using the Laplacian spectrum of individual agents' transition graphs. Further, considering that directly computing the Laplacian spectrum is intractable for tasks with infinite-scale state spaces, we further propose a deep learning extension of our method by estimating eigenfunctions through NN-based representation learning techniques. The evaluation on multi-agent tasks built with simulators like Mujoco, shows that the proposed algorithm can successfully identify multi-agent skills, and significantly outperforms the state-of-the-art. Codes are available at: https://github.itap.purdue.edu/Clan-labs/Scalable_MAOD_via_KP.
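The computational shortcut behind the Kronecker approximation can be illustrated with a small NumPy example: eigenpairs of a Kronecker product of adjacency matrices factor into Kronecker products of the individual eigenpairs, so low-frequency (Fiedler-style) embeddings of the joint space can be assembled without forming the exponentially large joint graph. The sketch below only demonstrates this spectral property on random graphs; it is not the authors' full option-discovery algorithm.

```python
# Sketch of the spectral shortcut behind Kronecker-graph approximations:
# eigenpairs of A1 (x) A2 are Kronecker products of the factors' eigenpairs.
import numpy as np

rng = np.random.default_rng(1)

def random_adjacency(n):
    a = (rng.random((n, n)) < 0.4).astype(float)
    a = np.triu(a, 1)
    return a + a.T                      # symmetric, undirected

A1, A2 = random_adjacency(4), random_adjacency(5)
w1, V1 = np.linalg.eigh(A1)
w2, V2 = np.linalg.eigh(A2)

# Joint eigenvalues are products w1[i]*w2[j]; joint eigenvectors are
# np.kron(V1[:, i], V2[:, j]) -- verify against the explicit Kronecker graph.
A = np.kron(A1, A2)
i, j = 1, 2
v = np.kron(V1[:, i], V2[:, j])
assert np.allclose(A @ v, (w1[i] * w2[j]) * v)

# A low-frequency joint eigenvector assembled this way can serve as a
# Fiedler-style embedding of the joint state space for skill discovery,
# without eigendecomposing the (n1*n2) x (n1*n2) joint matrix directly.
print("verified eigenpair of the Kronecker product graph")
```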
Offline Multi-Agent Reinforcement Learning with Implicit Global-to-Local Value Regularization
methods: Implicit global-to-local value regularization, in-sample learning
results: Superior performance over state-of-the-art offline MARL methods in almost all tasks, as demonstrated through comprehensive experiments on the offline multi-agent MuJoCo and StarCraft II micro-management tasks.Abstract
Offline reinforcement learning (RL) has received considerable attention in recent years due to its attractive capability of learning policies from offline datasets without environmental interactions. Despite some success in the single-agent setting, offline multi-agent RL (MARL) remains to be a challenge. The large joint state-action space and the coupled multi-agent behaviors pose extra complexities for offline policy optimization. Most existing offline MARL studies simply apply offline data-related regularizations on individual agents, without fully considering the multi-agent system at the global level. In this work, we present OMIGA, a new offline multi-agent RL algorithm with implicit global-to-local value regularization. OMIGA provides a principled framework to convert global-level value regularization into equivalent implicit local value regularizations and simultaneously enables in-sample learning, thus elegantly bridging multi-agent value decomposition and policy learning with offline regularizations. Based on comprehensive experiments on the offline multi-agent MuJoCo and StarCraft II micro-management tasks, we show that OMIGA achieves superior performance over the state-of-the-art offline MARL methods in almost all tasks.
Robust Fully-Asynchronous Methods for Distributed Training over General Architecture
paper_authors: Zehan Zhu, Ye Tian, Yan Huang, Jinming Xu, Shibo He
for: Improving the efficiency and robustness of distributed machine learning in the presence of latency, packet losses, and stragglers, where synchronization is costly or impossible.
methods: A Robust Fully-Asynchronous Stochastic Gradient Tracking method (R-FAST) in which every device performs local computation and communication at its own pace without any form of synchronization. Unlike existing asynchronous decentralized algorithms, R-FAST eliminates the impact of data heterogeneity across devices and tolerates packet losses through a robust gradient-tracking strategy based on properly designed auxiliary variables that track and buffer the overall gradient vector.
results: R-FAST converges in expectation to a neighborhood of the optimum at a geometric rate for smooth and strongly convex objectives, and to a stationary point at a sublinear rate for general non-convex objectives. Experiments show that R-FAST runs 1.5-2 times faster than synchronous benchmarks such as Ring-AllReduce and D-PSGD while achieving comparable accuracy, and outperforms existing asynchronous SOTA algorithms such as AD-PSGD and OSGP, especially in the presence of stragglers.Abstract
Perfect synchronization in distributed machine learning problems is inefficient and even impossible due to the existence of latency, package losses and stragglers. We propose a Robust Fully-Asynchronous Stochastic Gradient Tracking method (R-FAST), where each device performs local computation and communication at its own pace without any form of synchronization. Different from existing asynchronous distributed algorithms, R-FAST can eliminate the impact of data heterogeneity across devices and allow for packet losses by employing a robust gradient tracking strategy that relies on properly designed auxiliary variables for tracking and buffering the overall gradient vector. More importantly, the proposed method utilizes two spanning-tree graphs for communication so long as both share at least one common root, enabling flexible designs in communication architectures. We show that R-FAST converges in expectation to a neighborhood of the optimum with a geometric rate for smooth and strongly convex objectives; and to a stationary point with a sublinear rate for general non-convex settings. Extensive experiments demonstrate that R-FAST runs 1.5-2 times faster than synchronous benchmark algorithms, such as Ring-AllReduce and D-PSGD, while still achieving comparable accuracy, and outperforms existing asynchronous SOTA algorithms, such as AD-PSGD and OSGP, especially in the presence of stragglers.
Persistent Ballistic Entanglement Spreading with Optimal Control in Quantum Spin Chains
results: Under optimal control by the VEEF, the bipartite entanglement entropy (EE) grows linearly until it reaches the genuine saturation value $\tilde{S} = -\log_{2} 2^{-\frac{N}{2}} = \frac{N}{2}$, spreading at a velocity $v$ with $v \approx 2.76$, $4.98$, and $5.75$ for Ising, XY, and Heisenberg interactions, respectively. Nonlinear growth of the EE emerges in the presence of long-range interactions.Abstract
Entanglement propagation provides a key routine to understand quantum many-body dynamics in and out of equilibrium. In this work, we uncover that the ``variational entanglement-enhancing'' field (VEEF) robustly induces a persistent ballistic spreading of entanglement in quantum spin chains. The VEEF is time dependent, and is optimally controlled to maximize the bipartite entanglement entropy (EE) of the final state. Such a linear growth persists till the EE reaches the genuine saturation $\tilde{S} = -\log_{2} 2^{-\frac{N}{2}} = \frac{N}{2}$ with $N$ the total number of spins. The EE satisfies $S(t) = v t$ for the time $t \leq \frac{N}{2v}$, with $v$ the velocity. These results are in sharp contrast with the behaviors without VEEF, where the EE generally approaches a sub-saturation known as the Page value $\tilde{S}_{P} = \tilde{S} - \frac{1}{2\ln{2}}$ in the long-time limit, and the entanglement growth deviates from being linear before the Page value is reached. The dependence between the velocity and interactions is explored, with $v \simeq 2.76$, $4.98$, and $5.75$ for the spin chains with Ising, XY, and Heisenberg interactions, respectively. We further show that the nonlinear growth of EE emerges with the presence of long-range interactions.
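The bipartite entanglement entropy referred to above can be computed from a pure-state vector by a Schmidt (SVD) decomposition across the half-chain cut; the snippet below is a generic sketch for a small chain, not the VEEF optimal-control procedure itself.

```python
# Generic sketch: bipartite entanglement entropy S = -sum_k p_k log2 p_k,
# where p_k are squared Schmidt coefficients across the half-chain cut.
import numpy as np

def bipartite_entropy(psi, n_left, n_right):
    """Entanglement entropy (in bits) of a pure state of n_left + n_right qubits."""
    m = psi.reshape(2**n_left, 2**n_right)
    s = np.linalg.svd(m, compute_uv=False)
    p = s**2
    p = p[p > 1e-12]
    return float(-np.sum(p * np.log2(p)))

# Example: a random state approaches, but does not reach, the N/2 saturation.
N = 8
psi = (np.random.default_rng(0).normal(size=2**N)
       + 1j * np.random.default_rng(1).normal(size=2**N))
psi /= np.linalg.norm(psi)
print(bipartite_entropy(psi, N // 2, N // 2))   # close to N/2 - 1/(2 ln 2), the Page value
```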
Learning minimal representations of stochastic processes with variational autoencoders
results: The method accurately characterizes the dynamics of stochastic processes and can generate new trajectories that faithfully replicate the expected stochastic behavior.Abstract
Stochastic processes have found numerous applications in science, as they are broadly used to model a variety of natural phenomena. Due to their intrinsic randomness and uncertainty, they are however difficult to characterize. Here, we introduce an unsupervised machine learning approach to determine the minimal set of parameters required to effectively describe the dynamics of a stochastic process. Our method builds upon an extended $\beta$-variational autoencoder architecture. By means of simulated datasets corresponding to paradigmatic diffusion models, we showcase its effectiveness in extracting the minimal relevant parameters that accurately describe these dynamics. Furthermore, the method enables the generation of new trajectories that faithfully replicate the expected stochastic behavior. Overall, our approach enables for the autonomous discovery of unknown parameters describing stochastic processes, hence enhancing our comprehension of complex phenomena across various fields.
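For orientation, a minimal sketch of a $\beta$-VAE objective on trajectory windows is given below; the network sizes, the window length, and the toy data are assumptions, not the paper's extended architecture.

```python
# Minimal beta-VAE training objective on trajectory windows (illustrative sketch;
# the encoder/decoder sizes and data pipeline are assumptions, not the paper's setup).
import torch
import torch.nn as nn

class BetaVAE(nn.Module):
    def __init__(self, window=64, latent=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(window, 128), nn.ReLU(), nn.Linear(128, 2 * latent))
        self.dec = nn.Sequential(nn.Linear(latent, 128), nn.ReLU(), nn.Linear(128, window))

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization
        return self.dec(z), mu, logvar

def beta_vae_loss(x, recon, mu, logvar, beta=4.0):
    recon_loss = ((recon - x) ** 2).sum(dim=-1).mean()
    kl = (-0.5 * (1 + logvar - mu.pow(2) - logvar.exp())).sum(dim=-1).mean()
    return recon_loss + beta * kl      # large beta pressures uninformative latents to zero

model = BetaVAE()
x = torch.randn(32, 64)                # batch of trajectory windows (toy data)
recon, mu, logvar = model(x)
loss = beta_vae_loss(x, recon, mu, logvar)
loss.backward()
```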
Finding Optimal Diverse Feature Sets with Alternative Feature Selection
results: Alternative feature sets can indeed have high prediction quality; several factors influencing this outcome are analyzed.Abstract
Feature selection is popular for obtaining small, interpretable, yet highly accurate prediction models. Conventional feature-selection methods typically yield one feature set only, which might not suffice in some scenarios. For example, users might be interested in finding alternative feature sets with similar prediction quality, offering different explanations of the data. In this article, we introduce alternative feature selection and formalize it as an optimization problem. In particular, we define alternatives via constraints and enable users to control the number and dissimilarity of alternatives. Next, we analyze the complexity of this optimization problem and show NP-hardness. Further, we discuss how to integrate conventional feature-selection methods as objectives. Finally, we evaluate alternative feature selection with 30 classification datasets. We observe that alternative feature sets may indeed have high prediction quality, and we analyze several factors influencing this outcome.
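A simple illustration of the constrained search for one alternative feature set is sketched below; the univariate mutual-information quality score and the greedy procedure are stand-ins for the optimization problem formalized in the paper, with the overlap bound playing the role of the dissimilarity control.

```python
# Illustrative sketch: find an alternative feature set whose overlap with the
# original selection is bounded, using a simple greedy quality criterion.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=500, n_features=20, n_informative=8, random_state=0)
k, max_overlap = 5, 1                      # set size and allowed shared features

scores = mutual_info_classif(X, y, random_state=0)
original = set(np.argsort(scores)[-k:])    # baseline selection: top-k features

# Alternative: greedily pick high-scoring features while respecting the
# dissimilarity constraint |alternative ∩ original| <= max_overlap.
alternative, overlap = [], 0
for f in np.argsort(scores)[::-1]:
    if f in original:
        if overlap >= max_overlap:
            continue
        overlap += 1
    alternative.append(int(f))
    if len(alternative) == k:
        break

print("original:   ", sorted(int(i) for i in original))
print("alternative:", sorted(alternative))
```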
Transferability of Convolutional Neural Networks in Stationary Learning Tasks
paper_authors: Damian Owerko, Charilaos I. Kanatsoulis, Jennifer Bondarchuk, Donald J. Bucci Jr, Alejandro Ribeiro
for: This paper is written for those interested in efficient training of convolutional neural networks (CNNs) for large-scale spatial problems.
methods: The paper investigates the properties of CNNs for tasks where the underlying signals are stationary, and proposes a novel framework for efficient training of CNNs on small windows of such signals.
results: The paper shows that the proposed framework achieves nearly optimal performance on much larger windows without retraining, and demonstrates this through theoretical analysis and experimental analysis on two tasks: multi-target tracking and mobile infrastructure on demand. The results show that the CNN is able to tackle problems with many hundreds of agents after being trained with fewer than ten.Abstract
Recent advances in hardware and big data acquisition have accelerated the development of deep learning techniques. For an extended period of time, increasing the model complexity has led to performance improvements for various tasks. However, this trend is becoming unsustainable and there is a need for alternative, computationally lighter methods. In this paper, we introduce a novel framework for efficient training of convolutional neural networks (CNNs) for large-scale spatial problems. To accomplish this we investigate the properties of CNNs for tasks where the underlying signals are stationary. We show that a CNN trained on small windows of such signals achieves nearly the same performance on much larger windows without retraining. This claim is supported by our theoretical analysis, which provides a bound on the performance degradation. Additionally, we conduct thorough experimental analysis on two tasks: multi-target tracking and mobile infrastructure on demand. Our results show that the CNN is able to tackle problems with many hundreds of agents after being trained with fewer than ten. Thus, CNN architectures provide solutions to these problems at previously computationally intractable scales.
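The transfer described above hinges on the network being fully convolutional, so a model trained on small windows of a stationary signal can be applied to much larger windows at inference time without retraining; a minimal sketch follows (the architecture and sizes are assumptions).

```python
# Sketch: a fully convolutional network trained on small windows can be applied
# to much larger windows of a stationary signal without any retraining.
import torch
import torch.nn as nn

model = nn.Sequential(                      # no flatten/linear layers -> size-agnostic
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, kernel_size=1),        # per-location output (e.g., target density)
)

small = torch.randn(8, 1, 32, 32)           # training-sized windows
large = torch.randn(1, 1, 512, 512)         # deployment-sized window

print(model(small).shape)   # torch.Size([8, 1, 32, 32])
print(model(large).shape)   # torch.Size([1, 1, 512, 512]) -- same weights, larger window
```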
A Change of Heart: Improving Speech Emotion Recognition through Speech-to-Text Modality Conversion
results: The Modality-Conversion method yields substantial results on the MELD dataset, while Modality-Conversion++ achieves the highest SER weighted-F1 (WF1) score among speech-based approaches. These findings highlight the potential of modality conversion for tasks that can be conducted in alternative modalities.Abstract
Speech Emotion Recognition (SER) is a challenging task. In this paper, we introduce a modality conversion concept aimed at enhancing emotion recognition performance on the MELD dataset. We assess our approach through two experiments: first, a method named Modality-Conversion that employs automatic speech recognition (ASR) systems, followed by a text classifier; second, we assume perfect ASR output and investigate the impact of modality conversion on SER, this method is called Modality-Conversion++. Our findings indicate that the first method yields substantial results, while the second method outperforms state-of-the-art (SOTA) speech-based approaches in terms of SER weighted-F1 (WF1) score on the MELD dataset. This research highlights the potential of modality conversion for tasks that can be conducted in alternative modalities.
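A rough sketch of the Modality-Conversion idea with off-the-shelf Hugging Face pipelines follows; the checkpoint names are placeholders and not necessarily the models used in the paper.

```python
# Rough sketch of Modality-Conversion: transcribe speech with an ASR model,
# then classify the transcript with a text emotion classifier.
# Checkpoint names below are placeholders, not necessarily those used in the paper.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
emotion = pipeline("text-classification",
                   model="j-hartmann/emotion-english-distilroberta-base")

def speech_emotion(wav_path):
    transcript = asr(wav_path)["text"]
    return transcript, emotion(transcript)[0]

text, label = speech_emotion("utterance.wav")   # hypothetical input file
print(text, label)                              # e.g. (transcript, {'label': 'anger', ...})
```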
Design Space Exploration on Efficient and Accurate Human Pose Estimation from Sparse IMU-Sensing
methods: A simulative Design Space Exploration (DSE) of a varying quantity and positioning of IMU sensors. IMU data is generated from a publicly available body-model dataset for different sensor configurations and used to train a deep learning model; in addition, a combined metric is proposed to assess the accuracy-resource trade-off of sensor configurations.
results: For a system with equal importance of accuracy and resources, an optimal configuration of 4 sensors achieves a mesh error of 6.03 cm, increasing accuracy by 32.7% while reducing the hardware effort by two sensors compared to the state of the art. The work can be used to design health applications with well-suited sensor positioning and attention to data privacy and resource awareness.Abstract
Human Pose Estimation (HPE) to assess human motion in sports, rehabilitation or work safety requires accurate sensing without compromising the sensitive underlying personal data. Therefore, local processing is necessary and the limited energy budget in such systems can be addressed by Inertial Measurement Units (IMU) instead of common camera sensing. The central trade-off between accuracy and efficient use of hardware resources is rarely discussed in research. We address this trade-off by a simulative Design Space Exploration (DSE) of a varying quantity and positioning of IMU-sensors. First, we generate IMU-data from a publicly available body model dataset for different sensor configurations and train a deep learning model with this data. Additionally, we propose a combined metric to assess the accuracy-resource trade-off. We used the DSE as a tool to evaluate sensor configurations and identify beneficial ones for a specific use case. Exemplary, for a system with equal importance of accuracy and resources, we identify an optimal sensor configuration of 4 sensors with a mesh error of 6.03 cm, increasing the accuracy by 32.7% and reducing the hardware effort by two sensors compared to state of the art. Our work can be used to design health applications with well-suited sensor positioning and attention to data privacy and resource-awareness.
FMT: Removing Backdoor Feature Maps via Feature Map Testing in Deep Neural Networks
paper_authors: Dong Huang, Qingwen Bu, Yahao Qing, Yichao Fu, Heming Cui
for: A new defense strategy to protect deep neural network (DNN) models against backdoor attacks.
methods: Feature Map Testing (FMT). Unlike existing defense strategies, FMT detects the backdoor feature maps in the DNN model, erases them, and then fine-tunes the model with a secure subset of the training data.
results: Compared with existing defense strategies, FMT reduces the Attack Success Rate (ASR) more effectively while keeping model performance high. FMT also achieves higher Robust Accuracy (RA) than conventional defense methods, indicating that it better maintains model performance while mitigating the effects of backdoor attacks.Abstract
Deep neural networks have been widely used in many critical applications, such as autonomous vehicles and medical diagnosis. However, their security is threatened by backdoor attack, which is achieved by adding artificial patterns to specific training data. Existing defense strategies primarily focus on using reverse engineering to reproduce the backdoor trigger generated by attackers and subsequently repair the DNN model by adding the trigger into inputs and fine-tuning the model with ground-truth labels. However, once the trigger generated by the attackers is complex and invisible, the defender can not successfully reproduce the trigger. Consequently, the DNN model will not be repaired since the trigger is not effectively removed. In this work, we propose Feature Map Testing~(FMT). Different from existing defense strategies, which focus on reproducing backdoor triggers, FMT tries to detect the backdoor feature maps, which are trained to extract backdoor information from the inputs. After detecting these backdoor feature maps, FMT will erase them and then fine-tune the model with a secure subset of training data. Our experiments demonstrate that, compared to existing defense strategies, FMT can effectively reduce the Attack Success Rate (ASR) even against the most complex and invisible attack triggers. Second, unlike conventional defense methods that tend to exhibit low Robust Accuracy (i.e., the model's accuracy on the poisoned data), FMT achieves higher RA, indicating its superiority in maintaining model performance while mitigating the effects of backdoor attacks~(e.g., FMT obtains 87.40\% RA in CIFAR10). Third, compared to existing feature map pruning techniques, FMT can cover more backdoor feature maps~(e.g., FMT removes 83.33\% of backdoor feature maps from the model in the CIFAR10 \& BadNet scenario).
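A minimal PyTorch sketch of the erasure step is given below: suspected backdoor feature maps (channels of a convolutional layer) are zeroed via a forward hook before the model is fine-tuned on a trusted subset. How the suspicious channels are identified is the core of FMT and is not reproduced here; the layer name and channel indices are assumptions.

```python
# Sketch of the erasure step: zero out suspected backdoor feature maps (channels)
# via a forward hook, then fine-tune on a small trusted subset of the data.
# Which channels to erase would come from the FMT detection step (not shown).
import torch
import torchvision

model = torchvision.models.resnet18(num_classes=10)
suspected = {"layer3.1.conv2": [5, 17, 42]}      # assumed output of the detection step

def make_hook(channels):
    def hook(_module, _inputs, output):
        out = output.clone()
        out[:, channels] = 0.0                   # erase the flagged feature maps
        return out
    return hook

for name, module in model.named_modules():
    if name in suspected:
        module.register_forward_hook(make_hook(suspected[name]))

# Fine-tuning on a trusted subset proceeds as usual; the hook keeps the erased
# maps silenced while the remaining weights adapt.
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
x, y = torch.randn(16, 3, 224, 224), torch.randint(0, 10, (16,))
loss = torch.nn.functional.cross_entropy(model(x), y)
loss.backward()
opt.step()
```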
A multi-modal representation of El Niño Southern Oscillation Diversity
results: ENSO events are not well described by binary categories; fuzzy clustering recovers the four known ENSO categories along with a fifth one, the Extreme El Ni\~no, and these event types differ markedly in their intensity and temporal evolution.Abstract
The El Ni\~no-Southern Oscillation (ENSO) is characterized by alternating periods of warm (El Ni\~no) and cold (La Ni\~na) sea surface temperature anomalies (SSTA) in the equatorial Pacific. Although El Ni\~no and La Ni\~na are well-defined climate patterns, no two events are alike. To date, ENSO diversity has been described primarily in terms of the longitudinal location of peak SSTA, used to define a bimodal classification of events in Eastern Pacific (EP) and Central Pacific (CP) types. Here, we use low-dimensional representations of Pacific SSTAs to argue that binary categorical memberships are unsuitable to describe ENSO events. Using fuzzy unsupervised clustering, we recover the four known ENSO categories, along with a fifth category: an Extreme El Ni\~no. We show that Extreme El Ni\~nos differ both in their intensity and temporal evolution from canonical EP El Ni\~nos. We also find that CP La Ni\~nas, EP El Ni\~nos, and Extreme El Ni\~nos contribute the most to interdecadal ENSO variability.
Towards practical reinforcement learning for tokamak magnetic control
paper_authors: Brendan D. Tracey, Andrea Michi, Yuri Chervonyi, Ian Davies, Cosmin Paduraru, Nevena Lazic, Federico Felici, Timo Ewalds, Craig Donner, Cristian Galperti, Jonas Buchli, Michael Neunert, Andrea Huber, Jonathan Evens, Paula Kurylowicz, Daniel J. Mankowitz, Martin Riedmiller, The TCV Team
for: Improving reinforcement learning (RL) based real-time control systems, in particular for tokamak plasma magnetic control.
methods: An RL approach with several algorithmic improvements to the agent architecture and training procedure.
results: Simulation results show that the proposed RL-based controller achieves up to 65% improvement in shape accuracy, reduces the long-term bias of the plasma current, and cuts the training time required to learn new tasks by a factor of 3 or more. New experiments on the TCV tokamak validate the simulation results and point towards routinely achieving accurate discharges using the RL approach.Abstract
Reinforcement learning (RL) has shown promising results for real-time control systems, including the domain of plasma magnetic control. However, there are still significant drawbacks compared to traditional feedback control approaches for magnetic confinement. In this work, we address key drawbacks of the RL method; achieving higher control accuracy for desired plasma properties, reducing the steady-state error, and decreasing the required time to learn new tasks. We build on top of \cite{degrave2022magnetic}, and present algorithmic improvements to the agent architecture and training procedure. We present simulation results that show up to 65\% improvement in shape accuracy, achieve substantial reduction in the long-term bias of the plasma current, and additionally reduce the training time required to learn new tasks by a factor of 3 or more. We present new experiments using the upgraded RL-based controllers on the TCV tokamak, which validate the simulation results achieved, and point the way towards routinely achieving accurate discharges using the RL approach.
Training Latency Minimization for Model-Splitting Allowed Federated Edge Learning
results: Extensive experiments on the EfficientNetV2 model with the MNIST dataset demonstrate the validity and improved performance of the proposed SFL framework.Abstract
To alleviate the shortage of computing power faced by clients in training deep neural networks (DNNs) using federated learning (FL), we leverage the edge computing and split learning to propose a model-splitting allowed FL (SFL) framework, with the aim to minimize the training latency without loss of test accuracy. Under the synchronized global update setting, the latency to complete a round of global training is determined by the maximum latency for the clients to complete a local training session. Therefore, the training latency minimization problem (TLMP) is modelled as a minimizing-maximum problem. To solve this mixed integer nonlinear programming problem, we first propose a regression method to fit the quantitative-relationship between the cut-layer and other parameters of an AI-model, and thus, transform the TLMP into a continuous problem. Considering that the two subproblems involved in the TLMP, namely, the cut-layer selection problem for the clients and the computing resource allocation problem for the parameter-server are relative independence, an alternate-optimization-based algorithm with polynomial time complexity is developed to obtain a high-quality solution to the TLMP. Extensive experiments are performed on a popular DNN-model EfficientNetV2 using dataset MNIST, and the results verify the validity and improved performance of the proposed SFL framework.
General regularization in covariate shift adaptation
paper_authors: Duc Hoan Nguyen, Sergei V. Pereverzyev, Werner Zellinger
for: Correcting the error of least squares learning algorithms in reproducing kernel Hilbert spaces (RKHS) caused by covariate shift.
methods: sample weights determined by estimated Radon-Nikod'ym derivative of future data distribution w.r.t.~training data distribution
results: novel error bounds for reweighted kernel regression in RKHS, showing that fewer samples are needed for the same level of accuracy compared to state-of-the-art analyses under weak smoothness conditions.Abstract
Sample reweighting is one of the most widely used methods for correcting the error of least squares learning algorithms in reproducing kernel Hilbert spaces (RKHS), that is caused by future data distributions that are different from the training data distribution. In practical situations, the sample weights are determined by values of the estimated Radon-Nikod\'ym derivative, of the future data distribution w.r.t.~the training data distribution. In this work, we review known error bounds for reweighted kernel regression in RKHS and obtain, by combination, novel results. We show under weak smoothness conditions, that the amount of samples, needed to achieve the same order of accuracy as in the standard supervised learning without differences in data distributions, is smaller than proven by state-of-the-art analyses.
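The reweighting scheme described above amounts to weighted kernel ridge regression with weights given by (an estimate of) the density ratio between the future and training data distributions; a self-contained NumPy sketch follows, with known Gaussian densities standing in for the estimated Radon-Nikod\'ym derivative.

```python
# Sketch: importance-weighted kernel ridge regression for covariate shift.
# Here the density ratio is computed from known Gaussians; in practice it would
# be estimated (e.g., with a density-ratio estimator or probabilistic classifier).
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(3 * x)

x_tr = rng.normal(-0.5, 0.7, size=200)          # training distribution
y_tr = f(x_tr) + 0.1 * rng.normal(size=200)
x_te = rng.normal(0.7, 0.4, size=200)           # shifted future/test distribution

def gauss(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

w = gauss(x_tr, 0.7, 0.4) / gauss(x_tr, -0.5, 0.7)   # density ratio at training points

def kernel(a, b, gamma=5.0):
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)

K = kernel(x_tr, x_tr)
W = np.diag(w)
lam = 1e-2
# Weighted KRR: minimize sum_i w_i (f(x_i) - y_i)^2 + lam ||f||_K^2
alpha = np.linalg.solve(W @ K + lam * np.eye(len(x_tr)), W @ y_tr)

pred = kernel(x_te, x_tr) @ alpha
print("test MSE (reweighted):", np.mean((pred - f(x_te)) ** 2))
```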
Predict, Refine, Synthesize: Self-Guiding Diffusion Models for Probabilistic Time Series Forecasting
results: TSDiff is competitive with several task-specific conditional diffusion models, and conditioning TSDiff for downstream tasks at inference time yields strong forecasting and generative performance.Abstract
Diffusion models have achieved state-of-the-art performance in generative modeling tasks across various domains. Prior works on time series diffusion models have primarily focused on developing conditional models tailored to specific forecasting or imputation tasks. In this work, we explore the potential of task-agnostic, unconditional diffusion models for several time series applications. We propose TSDiff, an unconditionally trained diffusion model for time series. Our proposed self-guidance mechanism enables conditioning TSDiff for downstream tasks during inference, without requiring auxiliary networks or altering the training procedure. We demonstrate the effectiveness of our method on three different time series tasks: forecasting, refinement, and synthetic data generation. First, we show that TSDiff is competitive with several task-specific conditional forecasting methods (predict). Second, we leverage the learned implicit probability density of TSDiff to iteratively refine the predictions of base forecasters with reduced computational overhead over reverse diffusion (refine). Notably, the generative performance of the model remains intact -- downstream forecasters trained on synthetic samples from TSDiff outperform forecasters that are trained on samples from other state-of-the-art generative time series models, occasionally even outperforming models trained on real data (synthesize).
A New Deep State-Space Analysis Framework for Patient Latent State Estimation and Classification from EHR Time Series Data
results: On time-series laboratory data from 12,695 cancer patients, the framework successfully discovers latent states related to prognosis. Visualization and cluster analysis reveal the temporal transitions of patient status and test items that are characteristic of each anticancer drug. The framework surpasses existing methods in capturing an interpretable latent space.Abstract
Many diseases, including cancer and chronic conditions, require extended treatment periods and long-term strategies. Machine learning and AI research focusing on electronic health records (EHRs) have emerged to address this need. Effective treatment strategies involve more than capturing sequential changes in patient test values. It requires an explainable and clinically interpretable model by capturing the patient's internal state over time. In this study, we propose the "deep state-space analysis framework," using time-series unsupervised learning of EHRs with a deep state-space model. This framework enables learning, visualizing, and clustering of temporal changes in patient latent states related to disease progression. We evaluated our framework using time-series laboratory data from 12,695 cancer patients. By estimating latent states, we successfully discover latent states related to prognosis. By visualization and cluster analysis, the temporal transition of patient status and test items during state transitions characteristic of each anticancer drug were identified. Our framework surpasses existing methods in capturing interpretable latent space. It can be expected to enhance our comprehension of disease progression from EHRs, aiding treatment adjustments and prognostic determinations.
A Deep Learning Approach for Overall Survival Prediction in Lung Cancer with Missing Values
methods: A transformer-based architecture that requires no imputation strategy and can effectively learn from both censored and uncensored patients and their available features to predict the OS of NSCLC patients.
results: Compared with existing methods, the approach obtains a Ct-index of 71.97, 77.58 and 80.72 for time units of 1 month, 1 year and 2 years, respectively, outperforming all state-of-the-art methods regardless of the imputation method used.Abstract
One of the most challenging fields where Artificial Intelligence (AI) can be applied is lung cancer research, specifically non-small cell lung cancer (NSCLC). In particular, overall survival (OS), the time between diagnosis and death, is a vital indicator of patient status, enabling tailored treatment and improved OS rates. In this analysis, there are two challenges to take into account. First, few studies effectively exploit the information available from each patient, leveraging both uncensored (i.e., dead) and censored (i.e., survivors) patients, considering also the events' time. Second, the handling of incomplete data is a common issue in the medical field. This problem is typically tackled through the use of imputation methods. Our objective is to present an AI model able to overcome these limits, effectively learning from both censored and uncensored patients and their available features, for the prediction of OS for NSCLC patients. We present a novel approach to survival analysis with missing values in the context of NSCLC, which exploits the strengths of the transformer architecture to account only for available features without requiring any imputation strategy. By making use of ad-hoc losses for OS, it is able to account for both censored and uncensored patients, as well as changes in risks over time. We compared our method with state-of-the-art models for survival analysis coupled with different imputation strategies. We evaluated the results obtained over a period of 6 years using different time granularities obtaining a Ct-index, a time-dependent variant of the C-index, of 71.97, 77.58 and 80.72 for time units of 1 month, 1 year and 2 years, respectively, outperforming all state-of-the-art methods regardless of the imputation method used.
Improve Long-term Memory Learning Through Rescaling the Error Temporally
results: Numerical experiments on different long-memory tasks and sequence models validate the claims and confirm that an appropriately temporally rescaled error is important for effective long-term memory learning. To the best of our knowledge, this is the first work to quantitatively analyze the bias of different errors towards short-term memory in sequence modelling.Abstract
This paper studies the error metric selection for long-term memory learning in sequence modelling. We examine the bias towards short-term memory in commonly used errors, including mean absolute/squared error. Our findings show that all temporally positive-weighted errors are biased towards short-term memory in learning linear functionals. To reduce this bias and improve long-term memory learning, we propose the use of a temporally rescaled error. In addition to reducing the bias towards short-term memory, this approach can also alleviate the vanishing gradient issue. We conduct numerical experiments on different long-memory tasks and sequence models to validate our claims. Numerical results confirm the importance of appropriate temporally rescaled error for effective long-term memory learning. To the best of our knowledge, this is the first work that quantitatively analyzes different errors' memory bias towards short-term memory in sequence modelling.
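In practice, temporally rescaling the error amounts to weighting per-time-step losses by a factor that grows with the time index, so that late (long-memory) targets contribute as much as early ones; the exponential weighting below is one illustrative choice, not necessarily the scheme analyzed in the paper.

```python
# Illustrative sketch: a temporally rescaled sequence loss. Later time steps get
# larger weights, counteracting the short-term-memory bias of a plain mean error.
# The exponential weighting is an assumed example of a rescaling schedule.
import torch

def temporally_rescaled_mse(pred, target, growth=0.02):
    """pred, target: (batch, time) tensors."""
    T = pred.shape[1]
    weights = torch.exp(growth * torch.arange(T, dtype=pred.dtype))
    weights = weights / weights.mean()              # keep the overall loss scale comparable
    return (weights * (pred - target) ** 2).mean()

pred = torch.randn(8, 100, requires_grad=True)
target = torch.randn(8, 100)
loss = temporally_rescaled_mse(pred, target)
loss.backward()
```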
Neural Operators for Delay-Compensating Control of Hyperbolic PIDEs
paper_authors: Jie Qi, Jing Zhang, Miroslav Krstic
for: Extending the DeepONet operator-learning framework for PDE control to an advanced hyperbolic class that involves delays on both the state and the system output or input.
methods: PDE backstepping design produces gain functions, which are outputs of a nonlinear operator; this gain-generating operator is approximated with a DeepONet neural network to provably arbitrary accuracy.
results: Closed-loop stability is established under feedback that employs the approximate gains; DeepONet-approximated observers and output-feedback laws are also developed and proven stabilizing. Numerical simulations show that replacing the numerical PDE solving with the DeepONet reduces the computational effort by two orders of magnitude.Abstract
The recently introduced DeepONet operator-learning framework for PDE control is extended from the results for basic hyperbolic and parabolic PDEs to an advanced hyperbolic class that involves delays on both the state and the system output or input. The PDE backstepping design produces gain functions that are outputs of a nonlinear operator, mapping functions on a spatial domain into functions on a spatial domain, and where this gain-generating operator's inputs are the PDE's coefficients. The operator is approximated with a DeepONet neural network to a degree of accuracy that is provably arbitrarily tight. Once we produce this approximation-theoretic result in infinite dimension, with it we establish stability in closed loop under feedback that employs approximate gains. In addition to supplying such results under full-state feedback, we also develop DeepONet-approximated observers and output-feedback laws and prove their own stabilizing properties under neural operator approximations. With numerical simulations we illustrate the theoretical results and quantify the numerical effort savings, which are of two orders of magnitude, thanks to replacing the numerical PDE solving with the DeepONet.
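A DeepONet approximates an operator by combining a branch net, which encodes the input function (here the PDE coefficients sampled at sensor points), with a trunk net, which encodes the query location; the operator output is the inner product of the two encodings. A minimal PyTorch sketch of this architecture, with arbitrarily chosen sizes, is given below.

```python
# Minimal DeepONet sketch: the branch net encodes the input function (e.g., PDE
# coefficients sampled at m sensor points), the trunk net encodes the query
# location x, and the operator output is the inner product of the two encodings.
import torch
import torch.nn as nn

class DeepONet(nn.Module):
    def __init__(self, m_sensors=100, p=64):
        super().__init__()
        self.branch = nn.Sequential(nn.Linear(m_sensors, 128), nn.Tanh(), nn.Linear(128, p))
        self.trunk = nn.Sequential(nn.Linear(1, 128), nn.Tanh(), nn.Linear(128, p))
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, u_sensors, x_query):
        # u_sensors: (batch, m_sensors)   x_query: (batch, n_points, 1)
        b = self.branch(u_sensors)                  # (batch, p)
        t = self.trunk(x_query)                     # (batch, n_points, p)
        return torch.einsum("bp,bnp->bn", b, t) + self.bias   # e.g., gain function at x_query

net = DeepONet()
u = torch.randn(4, 100)                             # sampled coefficient functions
x = torch.rand(4, 50, 1)                            # spatial query points in [0, 1]
print(net(u, x).shape)                              # torch.Size([4, 50])
```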
Batching for Green AI – An Exploratory Study on Inference
results: Input batching has a significant effect on both energy consumption and response times. A timeline of neural network accuracy and energy efficiency over the past decade shows that energy consumption has risen at a much steeper pace than accuracy, which raises questions about the necessity of this evolution. ShuffleNetV2 (2018) achieved competitive performance for its time while maintaining much lower energy consumption, although the results are model dependent.Abstract
The batch size is an essential parameter to tune during the development of new neural networks. Amongst other quality indicators, it has a large degree of influence on the model's accuracy, generalisability, training times and parallelisability. This fact is generally known and commonly studied. However, during the application phase of a deep learning model, when the model is utilised by an end-user for inference, we find that there is a disregard for the potential benefits of introducing a batch size. In this study, we examine the effect of input batching on the energy consumption and response times of five fully-trained neural networks for computer vision that were considered state-of-the-art at the time of their publication. The results suggest that batching has a significant effect on both of these metrics. Furthermore, we present a timeline of the energy efficiency and accuracy of neural networks over the past decade. We find that in general, energy consumption rises at a much steeper pace than accuracy and question the necessity of this evolution. Additionally, we highlight one particular network, ShuffleNetV2(2018), that achieved a competitive performance for its time while maintaining a much lower energy consumption. Nevertheless, we highlight that the results are model dependent.
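A small sketch of how the effect of input batching on inference cost can be measured: the same number of images is pushed through a vision model at different batch sizes and the wall-clock time per image is compared. Energy readings (e.g., via NVML or RAPL) are omitted, and an untrained ResNet-50 stands in for the evaluated networks, since timing does not depend on the weight values.

```python
# Sketch: measure inference time per image at different batch sizes.
# Energy measurement (e.g., via NVML/RAPL) is omitted for brevity.
import time
import torch
import torchvision

model = torchvision.models.resnet50(weights=None).eval()
n_images = 256

for batch_size in (1, 8, 32, 64):
    x = torch.randn(batch_size, 3, 224, 224)
    start = time.perf_counter()
    with torch.no_grad():
        for _ in range(n_images // batch_size):
            model(x)
    elapsed = time.perf_counter() - start
    print(f"batch {batch_size:3d}: {1000 * elapsed / n_images:.2f} ms/image")
```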
Unsupervised Embedding Learning for Human Activity Recognition Using Wearable Sensor Data
for: Recognizing different human activities from wearable sensor data in ubiquitous computing.
methods: Unsupervised approach based on the nature of human activity to project data into an embedding space, followed by clustering algorithms to form behavior clusters.
results: Experimental results on three labeled benchmark datasets show the effectiveness of the framework and improved performance in identifying and categorizing human activities compared to unsupervised techniques applied directly to the original data set.Here’s the full Chinese text:
results: 对三个标注数据集进行实验,结果表明我们的框架具有效果,可以帮助 clustering 算法更好地认定和分类人类活动,相比直接应用于原始数据集的无监督技术。Abstract
The embedded sensors in widely used smartphones and other wearable devices make the data of human activities more accessible. However, recognizing different human activities from the wearable sensor data remains a challenging research problem in ubiquitous computing. One of the reasons is that the majority of the acquired data has no labels. In this paper, we present an unsupervised approach, which is based on the nature of human activity, to project the human activities into an embedding space in which similar activities will be located closely together. Using this, subsequent clustering algorithms can benefit from the embeddings, forming behavior clusters that represent the distinct activities performed by a person. Results of experiments on three labeled benchmark datasets demonstrate the effectiveness of the framework and show that our approach can help the clustering algorithm achieve improved performance in identifying and categorizing the underlying human activities compared to unsupervised techniques applied directly to the original data set.
An Analysis of Multi-Agent Reinforcement Learning for Decentralized Inventory Control Systems
results: Simulations of different supply chain networks and levels of uncertainty show that multi-agent proximal policy optimization with a centralized critic performs very close to a centralized data-driven solution and outperforms a distributed model-based solution in most cases.Abstract
Most solutions to the inventory management problem assume a centralization of information that is incompatible with organisational constraints in real supply chain networks. The inventory management problem is a well-known planning problem in operations research, concerned with finding the optimal re-order policy for nodes in a supply chain. While many centralized solutions to the problem exist, they are not applicable to real-world supply chains made up of independent entities. The problem can however be naturally decomposed into sub-problems, each associated with an independent entity, turning it into a multi-agent system. Therefore, a decentralized data-driven solution to inventory management problems using multi-agent reinforcement learning is proposed where each entity is controlled by an agent. Three multi-agent variations of the proximal policy optimization algorithm are investigated through simulations of different supply chain networks and levels of uncertainty. The centralized training decentralized execution framework is deployed, which relies on offline centralization during simulation-based policy identification, but enables decentralization when the policies are deployed online to the real system. Results show that using multi-agent proximal policy optimization with a centralized critic leads to performance very close to that of a centralized data-driven solution and outperforms a distributed model-based solution in most cases while respecting the information constraints of the system.
Prompting Large Language Models with Speech Recognition Abilities
paper_authors: Yassir Fathullah, Chunyang Wu, Egor Lakomkin, Junteng Jia, Yuan Shangguan, Ke Li, Jinxi Guo, Wenhan Xiong, Jay Mahadeokar, Ozlem Kalinli, Christian Fuegen, Mike Seltzer
for: Extending the capabilities of large language models (LLMs) to perform speech recognition.
methods: directly attaching a small audio encoder to the LLM, prepending a sequence of audial embeddings to the text token embeddings
results: Outperformed monolingual baselines by 18% and performed multilingual speech recognition despite LLaMA being trained overwhelmingly on English text; multilingual ASR remains possible even when the LLM is frozen or when larger audio-encoder strides are used to produce fewer embeddings.Abstract
Large language models have proven themselves highly flexible, able to solve a wide range of generative tasks, such as abstractive summarization and open-ended question answering. In this paper we extend the capabilities of LLMs by directly attaching a small audio encoder allowing it to perform speech recognition. By directly prepending a sequence of audial embeddings to the text token embeddings, the LLM can be converted to an automatic speech recognition (ASR) system, and be used in the exact same manner as its textual counterpart. Experiments on Multilingual LibriSpeech (MLS) show that incorporating a conformer encoder into the open sourced LLaMA-7B allows it to outperform monolingual baselines by 18% and perform multilingual speech recognition despite LLaMA being trained overwhelmingly on English text. Furthermore, we perform ablation studies to investigate whether the LLM can be completely frozen during training to maintain its original capabilities, scaling up the audio encoder, and increasing the audio encoder striding to generate fewer embeddings. The results from these studies show that multilingual ASR is possible even when the LLM is frozen or when strides of almost 1 second are used in the audio encoder opening up the possibility for LLMs to operate on long-form audio.
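A schematic of the coupling described above: a small audio encoder maps the speech features to a short sequence of vectors in the LLM's embedding dimension, and these are prepended to the text token embeddings before the decoder runs. The tiny encoder and dimensions below are toy stand-ins for the paper's conformer encoder attached to LLaMA-7B.

```python
# Schematic sketch: prepend audio embeddings to the text token embeddings of a
# decoder-only LM. The tiny audio encoder and dimensions are toy stand-ins for
# the paper's conformer encoder attached to LLaMA-7B.
import torch
import torch.nn as nn

d_model = 512                                   # toy LM embedding size

audio_encoder = nn.Sequential(                  # strided convs downsample the audio frames
    nn.Conv1d(80, d_model, kernel_size=4, stride=4),
    nn.GELU(),
    nn.Conv1d(d_model, d_model, kernel_size=4, stride=4),
)

text_embedding = nn.Embedding(32000, d_model)   # stands in for the (frozen) LM embeddings

audio_features = torch.randn(1, 80, 1600)       # (batch, mel bins, frames), ~16 s of speech
audio_emb = audio_encoder(audio_features).transpose(1, 2)   # (1, 100, d_model)

prompt_ids = torch.randint(0, 32000, (1, 12))   # e.g. "Transcribe the audio:" token ids
text_emb = text_embedding(prompt_ids)           # (1, 12, d_model)

inputs_embeds = torch.cat([audio_emb, text_emb], dim=1)     # audial prefix + text tokens
# inputs_embeds would now be fed to the decoder-only LM (e.g. via `inputs_embeds=`
# in Hugging Face models) and trained to emit the transcription.
print(inputs_embeds.shape)                      # torch.Size([1, 112, 512])
```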
methods: Uses relative entropy (RE) and the maximum entropy principle (MEP) to analyze and design communication protocols: RE guides optimal encoding and decoding of messages, and a weighted RE is considered for attention steering.
results: Weighted RE, used for attention steering in communications, turns out to be improper; entropic attention communication is proposed as the proper generalization that permits weighting, aiding the design of optimal communication protocols and the understanding of human communication, e.g., the level of cooperation expected under misaligned interests.Abstract
The concept of attention, numerical weights that emphasize the importance of particular data, has proven to be very relevant in artificial intelligence. Relative entropy (RE, aka Kullback-Leibler divergence) plays a central role in communication theory. Here we combine these concepts, attention and RE. RE guides optimal encoding of messages in bandwidth-limited communication as well as optimal message decoding via the maximum entropy principle (MEP). In the coding scenario, RE can be derived from four requirements, namely being analytical, local, proper, and calibrated. Weighted RE, used for attention steering in communications, turns out to be improper. To see how proper attention communication can emerge, we analyze a scenario of a message sender who wants to ensure that the receiver of the message can perform well-informed actions. If the receiver decodes the message using the MEP, the sender only needs to know the receiver's utility function to inform optimally, but not the receiver's initial knowledge state. In case only the curvature of the utility function maxima are known, it becomes desirable to accurately communicate an attention function, in this case a by this curvature weighted and re-normalized probability function. Entropic attention communication is here proposed as the desired generalization of entropic communication that permits weighting while being proper, thereby aiding the design of optimal communication protocols in technical applications and helping to understand human communication. For example, our analysis shows how to derive the level of cooperation expected under misaligned interests of otherwise honest communication partners.
Direct and inverse modeling of soft robots by learning a condensed FEM model
paper_authors: Etienne Ménager, Tanguy Navez, Olivier Goury, Christian Duriez
for: A learning-based approach to obtain a compact but sufficiently rich mechanical representation of soft robots for control.
methods: Nonlinear compliance data in the actuator/effector space, obtained by condensation of a Finite Element Method (FEM) model, is used to learn the condensed mechanical model.
results: The compact model can be learned with a reasonable amount of data and is very efficient for modeling, since the direct and inverse kinematics of the robot can be deduced from it. The authors also show how to couple individually learned models, for example for a gripper composed of two soft fingers, and compare the inverse model derived from the full FEM model with the one from the compact learned version.Abstract
The Finite Element Method (FEM) is a powerful modeling tool for predicting the behavior of soft robots. However, its use for control can be difficult for non-specialists of numerical computation: it requires an optimization of the computation to make it real-time. In this paper, we propose a learning-based approach to obtain a compact but sufficiently rich mechanical representation. Our choice is based on nonlinear compliance data in the actuator/effector space provided by a condensation of the FEM model. We demonstrate that this compact model can be learned with a reasonable amount of data and, at the same time, be very efficient in terms of modeling, since we can deduce the direct and inverse kinematics of the robot. We also show how to couple some models learned individually in particular on an example of a gripper composed of two soft fingers. Other results are shown by comparing the inverse model derived from the full FEM model and the one from the compact learned version. This work opens new perspectives, namely for the embedded control of soft robots, but also for their design. These perspectives are also discussed in the paper.
摘要
有限元方法(FEM)是预测软体机器人行为的强大建模工具。然而,对于不熟悉数值计算的使用者而言,将其用于控制并不容易:需要对计算进行优化才能达到实时性。在这篇论文中,我们提出了一种基于学习的方法,以获得一个紧凑但足够丰富的力学表示。我们的选择基于驱动器/执行器空间中的非线性柔顺数据,这些数据来自 FEM 模型的凝聚(condensation)。我们证明了这个紧凑模型可以用合理的数据量学习,同时在建模方面非常高效,因为可以由此推导出机器人的正向与逆向运动学。我们还展示了如何耦合若干单独学习的模型,特别是在一个由两根软指组成的抓取器的例子上。其他结果包括将基于完整 FEM 模型推导的逆模型与基于学习得到的紧凑模型推导的逆模型进行比较。这项工作开辟了新的前景,不仅涉及软体机器人的嵌入式控制,也涉及其设计。论文中也对这些前景进行了讨论。
Probabilistic Modeling of Inter- and Intra-observer Variability in Medical Image Segmentation
results: 在真实世界的癌症分割数据集上的实验显示,Pionono模型在准确性和效率方面优于先前的模型(如STAPLE、概率U-Net等),同时还能预测多张相互一致的分割图,为诊断过程提供有价值的信息。Abstract
Medical image segmentation is a challenging task, particularly due to inter- and intra-observer variability, even between medical experts. In this paper, we propose a novel model, called Probabilistic Inter-Observer and iNtra-Observer variation NetwOrk (Pionono). It captures the labeling behavior of each rater with a multidimensional probability distribution and integrates this information with the feature maps of the image to produce probabilistic segmentation predictions. The model is optimized by variational inference and can be trained end-to-end. It outperforms state-of-the-art models such as STAPLE, Probabilistic U-Net, and models based on confusion matrices. Additionally, Pionono predicts multiple coherent segmentation maps that mimic the rater's expert opinion, which provides additional valuable information for the diagnostic process. Experiments on real-world cancer segmentation datasets demonstrate the high accuracy and efficiency of Pionono, making it a powerful tool for medical image analysis.
摘要
医学图像分割是一项具有挑战性的任务,特别是因为观察者之间和观察者内部存在差异,即使在医学专家之间也是如此。在这篇论文中,我们提出了一种新的模型,即概率化观察者间与观察者内差异网络(Pionono)。它用多维概率分布刻画每位标注者的标注行为,并将这一信息与图像特征图相结合,生成概率化的分割预测。该模型通过变分推断优化,可以端到端训练。与STAPLE、概率U-Net以及基于混淆矩阵的现有模型相比,Pionono表现出更高的准确率和效率。此外,Pionono可以预测多张相互一致的分割图,这些分割图模拟各标注者的专家意见,为诊断过程提供了更多有价值的信息。在真实世界的癌症分割数据集上的实验表明,Pionono具有很高的准确性和效率,是医学图像分析的有力工具。
Towards Better Fairness-Utility Trade-off: A Comprehensive Measurement-Based Reinforcement Learning Framework
paper_authors: Simiao Zhang, Jitao Bai, Menghong Guan, Yihao Huang, Yueling Zhang, Jun Sun, Geguang Pu
for: This paper aims to ensure the fairness of machine learning classifiers while maintaining their utility.
methods: The proposed method, CFU (Comprehensive Fairness-Utility), is a reinforcement learning-based framework that considers multiple fairness metrics and utility simultaneously.
results: CFU outperforms all state-of-the-art techniques and improves the classifier on multiple fairness metrics without sacrificing its utility, with an average improvement of 37.5%.
results: CFU比所有现有技术更高效,能在多种公平性指标上改进分类器而不牺牲其实用性,平均提升37.5%。Abstract
Machine learning is widely used to make decisions with societal impact such as bank loan approving, criminal sentencing, and resume filtering. How to ensure its fairness while maintaining utility is a challenging but crucial issue. Fairness is a complex and context-dependent concept with over 70 different measurement metrics. Since existing regulations are often vague in terms of which metric to use and different organizations may prefer different fairness metrics, it is important to have means of improving fairness comprehensively. Existing mitigation techniques often target at one specific fairness metric and have limitations in improving multiple notions of fairness simultaneously. In this work, we propose CFU (Comprehensive Fairness-Utility), a reinforcement learning-based framework, to efficiently improve the fairness-utility trade-off in machine learning classifiers. A comprehensive measurement that can simultaneously consider multiple fairness notions as well as utility is established, and new metrics are proposed based on an in-depth analysis of the relationship between different fairness metrics. The reward function of CFU is constructed with comprehensive measurement and new metrics. We conduct extensive experiments to evaluate CFU on 6 tasks, 3 machine learning models, and 15 fairness-utility measurements. The results demonstrate that CFU can improve the classifier on multiple fairness metrics without sacrificing its utility. It outperforms all state-of-the-art techniques and has witnessed a 37.5% improvement on average.
摘要
机器学习被广泛用于做出具有社会影响的决策,例如银行贷款审批、刑事量刑和简历筛选。如何在保持实用性的同时确保其公平性,是一个具有挑战性但至关重要的问题。公平性是一个复杂且依赖于具体情境的概念,有超过70种不同的度量指标。在这种情况下,我们提出了 CFU(全面公平性-实用性)框架,一个基于强化学习的框架,用于有效改进机器学习分类器的公平性-实用性权衡。我们建立了一种能够同时考虑多个公平性概念以及实用性的综合度量方法,并在深入分析不同公平性指标之间关系的基础上提出了新的指标,进而基于这些度量构建了奖励函数。我们在 6 个任务、3 种机器学习模型和 15 个公平性-实用性度量上进行了广泛的实验。结果显示,CFU 可以同时改进多个公平性指标,而不牺牲其实用性。它超过了所有现有技术,平均提升了 37.5%。
LatentAugment: Data Augmentation via Guided Manipulation of GAN’s Latent Space
paper_authors: Lorenzo Tronchin, Minh H. Vu, Paolo Soda, Tommy Löfstedt
for: 提高训练数据的量和多样性,并减少过拟合和提高泛化。
methods: 使用生成对抗网络(GANs)生成高质量样本,同时增加样本的多样性和模式覆盖率。
results: 在MRI-to-CT翻译任务中,使用LatentAugment DA策略可以提高模型的泛化能力,并在多样性和模式覆盖率方面超过标准DA和GAN-based sampling。Abstract
Data Augmentation (DA) is a technique to increase the quantity and diversity of the training data, and by that alleviate overfitting and improve generalisation. However, standard DA produces synthetic data for augmentation with limited diversity. Generative Adversarial Networks (GANs) may unlock additional information in a dataset by generating synthetic samples having the appearance of real images. However, these models struggle to simultaneously address three key requirements: fidelity and high-quality samples; diversity and mode coverage; and fast sampling. Indeed, GANs generate high-quality samples rapidly, but have poor mode coverage, limiting their adoption in DA applications. We propose LatentAugment, a DA strategy that overcomes the low diversity of GANs, opening up for use in DA applications. Without external supervision, LatentAugment modifies latent vectors and moves them into latent space regions to maximise the synthetic images' diversity and fidelity. It is also agnostic to the dataset and the downstream task. A wide set of experiments shows that LatentAugment improves the generalisation of a deep model translating from MRI-to-CT beating both standard DA as well GAN-based sampling. Moreover, still in comparison with GAN-based sampling, LatentAugment synthetic samples show superior mode coverage and diversity. Code is available at: https://github.com/ltronchin/LatentAugment.
摘要
数据增强(DA)是一种增加训练数据数量和多样性的技术,从而缓解过拟合并提高泛化能力。然而,标准的DA只能生成多样性有限的合成数据。生成对抗网络(GANs)可以通过生成外观与真实图像相近的合成样本,从数据集中挖掘出更多的信息。然而,这些模型很难同时满足三个关键要求:保真度和高质量样本;多样性和模式覆盖率;以及快速采样。实际上,GANs 能快速生成高质量样本,但其模式覆盖率较低,限制了它们在DA应用中的采用。我们提出了 LatentAugment,一种克服GANs低多样性的DA策略,使其可以用于DA应用。在无需外部监督的情况下,LatentAugment 修改潜在向量并将其移动到潜在空间中的特定区域,以最大化合成图像的多样性和保真度。它也不依赖于数据集和下游任务。大量实验表明,LatentAugment 提高了一个MRI到CT转换深度模型的泛化能力,优于标准DA和基于GAN的采样。此外,与基于GAN的采样相比,LatentAugment 生成的合成样本还显示出更高的模式覆盖率和多样性。代码见:https://github.com/ltronchin/LatentAugment。
results: 本研究的主要贡献是将 Fenchel 对偶、强化学习和无监督技能发现相连接起来,并提供了一种简单的离线算法,可以在不同的环境下学习多个与专家相似的独特技能。Abstract
There has been significant recent progress in the area of unsupervised skill discovery, with various works proposing mutual information based objectives, as a source of intrinsic motivation. Prior works predominantly focused on designing algorithms that require online access to the environment. In contrast, we develop an \textit{offline} skill discovery algorithm. Our problem formulation considers the maximization of a mutual information objective constrained by a KL-divergence. More precisely, the constraints ensure that the state occupancy of each skill remains close to the state occupancy of an expert, within the support of an offline dataset with good state-action coverage. Our main contribution is to connect Fenchel duality, reinforcement learning and unsupervised skill discovery, and to give a simple offline algorithm for learning diverse skills that are aligned with an expert.
摘要
近年来,无监督技能发现领域取得了显著进展,许多工作提出了基于互信息的目标函数,作为内在动机的来源。以往的工作主要集中在需要在线访问环境的算法设计上。与此不同,我们提出了一种离线技能发现算法。我们的问题形式化为:在KL散度约束下最大化一个互信息目标。更确切地说,这些约束保证每个技能的状态占用分布与专家的状态占用分布保持接近,并且处在一个具有良好状态-动作覆盖的离线数据集的支撑范围内。我们的主要贡献是将Fenchel对偶、强化学习和无监督技能发现联系起来,并给出一个简单的离线算法,用于学习与专家对齐的多样化技能。
Random Separating Hyperplane Theorem and Learning Polytopes
paper_authors: Chiranjib Bhattacharyya, Ravindran Kannan, Amit Kumar
for: 本文的目的是提供一种快速而有效地学习多面体的算法,特别是在高维空间中。
methods: 本文使用随机分离超平面定理(Random Separating Hyperplane Theorem, RSH)来学习多面体。RSH 强化了分离超平面定理的结论,可用于将点与多面体分离开。
results: 本文给出了一种能够快速有效地学习多面体(Hausdorff 问题)的算法,并保证学习误差的上界为 O(δ);在顶点充分分离的假设下,还给出了逼近隐多面体各顶点的高效算法。Abstract
The Separating Hyperplane theorem is a fundamental result in Convex Geometry with myriad applications. Our first result, Random Separating Hyperplane Theorem (RSH), is a strengthening of this for polytopes. $\rsh$ asserts that if the distance between $a$ and a polytope $K$ with $k$ vertices and unit diameter in $\Re^d$ is at least $\delta$, where $\delta$ is a fixed constant in $(0,1)$, then a randomly chosen hyperplane separates $a$ and $K$ with probability at least $1/poly(k)$ and margin at least $\Omega \left(\delta/\sqrt{d} \right)$. An immediate consequence of our result is the first near optimal bound on the error increase in the reduction from a Separation oracle to an Optimization oracle over a polytope. RSH has algorithmic applications in learning polytopes. We consider a fundamental problem, denoted the ``Hausdorff problem'', of learning a unit diameter polytope $K$ within Hausdorff distance $\delta$, given an optimization oracle for $K$. Using RSH, we show that with polynomially many random queries to the optimization oracle, $K$ can be approximated within error $O(\delta)$. To our knowledge this is the first provable algorithm for the Hausdorff Problem. Building on this result, we show that if the vertices of $K$ are well-separated, then an optimization oracle can be used to generate a list of points, each within Hausdorff distance $O(\delta)$ of $K$, with the property that the list contains a point close to each vertex of $K$. Further, we show how to prune this list to generate a (unique) approximation to each vertex of the polytope. We prove that in many latent variable settings, e.g., topic modeling, LDA, optimization oracles do exist provided we project to a suitable SVD subspace. Thus, our work yields the first efficient algorithm for finding approximations to the vertices of the latent polytope under the well-separatedness assumption.
摘要
分离超平面定理是凸几何中的一个基本结果,有着广泛的应用。我们的第一个结果——随机分离超平面定理(RSH)——是该定理针对多面体的一个加强版本。RSH 断言:若点 $a$ 与 $\Re^d$ 中具有 $k$ 个顶点、直径为 1 的多面体 $K$ 的距离至少为 $\delta$(其中 $\delta$ 是 $(0,1)$ 内的固定常数),则随机选取的超平面以至少 $1/poly(k)$ 的概率分离 $a$ 与 $K$,且间隔(margin)至少为 $\Omega\left(\delta/\sqrt{d}\right)$。该结果的一个直接推论是:在将分离 oracle 归约为多面体上的优化 oracle 时,误差增长的第一个近似最优界。RSH 在学习多面体方面有算法上的应用。我们考虑一个基本问题,称为"Hausdorff 问题":给定 $K$ 的优化 oracle,在 Hausdorff 距离 $\delta$ 内学习一个直径为 1 的多面体 $K$。利用 RSH,我们证明只需对优化 oracle 进行多项式次随机查询,即可在误差 $O(\delta)$ 内逼近 $K$。据我们所知,这是 Hausdorff 问题的第一个可证明算法。在此基础上,我们进一步证明:如果 $K$ 的顶点彼此充分分离,则可以利用优化 oracle 生成一个点的列表,其中每个点与 $K$ 的 Hausdorff 距离为 $O(\delta)$,并且该列表中包含与 $K$ 每个顶点都接近的点。此外,我们还展示了如何对该列表进行剪枝,从而为多面体的每个顶点生成一个(唯一的)近似。最后,我们证明在许多潜变量场景中(例如主题建模、LDA),只要投影到合适的 SVD 子空间,优化 oracle 确实存在。因此,在顶点充分分离的假设下,我们的工作给出了第一个用于寻找隐多面体顶点近似的高效算法。
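As a rough illustration of the RSH statement (not the paper's Hausdorff-learning algorithm), the Python sketch below draws random directions and uses a linear optimization oracle over $K$ to certify separation with the predicted $\Omega(\delta/\sqrt{d})$ margin. The oracle interface, the margin constant `c`, and the toy polytope are assumptions for illustration only.

```python
import numpy as np

def find_separating_hyperplane(a, lin_opt_oracle, d, delta, num_trials=1000, c=0.1):
    """Illustration of the Random Separating Hyperplane theorem (RSH).

    a              : query point in R^d assumed to be at distance >= delta from K.
    lin_opt_oracle : callable u -> max_{x in K} <u, x> (linear optimization oracle over K).
    Returns a unit vector u with <u, a> - max_{x in K} <u, x> >= c * delta / sqrt(d),
    or None if no such direction was found within num_trials random draws.
    """
    target_margin = c * delta / np.sqrt(d)
    for _ in range(num_trials):
        u = np.random.randn(d)
        u /= np.linalg.norm(u)                    # uniform random direction on the sphere
        margin = float(u @ a) - lin_opt_oracle(u)
        if margin >= target_margin:
            return u                              # certified separating hyperplane
    return None

# Toy usage: K = simplex in R^3 (vertices e_1, e_2, e_3), a outside K.
d = 3
vertices = np.eye(d)
oracle = lambda u: float(np.max(vertices @ u))    # max of a linear function over a polytope
print(find_separating_hyperplane(np.array([1.0, 1.0, 1.0]), oracle, d, delta=0.5))
```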
Bridging the Reality Gap of Reinforcement Learning based Traffic Signal Control using Domain Randomization and Meta Learning
results: 实验结果表明,DR 和 MAML 两种策略都能够超过现有RL算法的性能,因此有望在RL基于TSC系统中减少实际与模拟之间的差距。Abstract
Reinforcement Learning (RL) has been widely explored in Traffic Signal Control (TSC) applications, however, still no such system has been deployed in practice. A key barrier to progress in this area is the reality gap, the discrepancy that results from differences between simulation models and their real-world equivalents. In this paper, we address this challenge by first presenting a comprehensive analysis of potential simulation parameters that contribute to this reality gap. We then also examine two promising strategies that can bridge this gap: Domain Randomization (DR) and Model-Agnostic Meta-Learning (MAML). Both strategies were trained with a traffic simulation model of an intersection. In addition, the model was embedded in LemgoRL, a framework that integrates realistic, safety-critical requirements into the control system. Subsequently, we evaluated the performance of the two methods on a separate model of the same intersection that was developed with a different traffic simulator. In this way, we mimic the reality gap. Our experimental results show that both DR and MAML outperform a state-of-the-art RL algorithm, therefore highlighting their potential to mitigate the reality gap in RLbased TSC systems.
摘要
强化学习(RL)在交通信号控制(TSC)应用中已被广泛研究,但至今仍没有此类系统被实际部署。阻碍这一领域进展的一个关键障碍是"现实差距"(reality gap),即仿真模型与其真实世界对应系统之间差异所导致的偏差。在本文中,我们首先对可能造成这种现实差距的仿真参数进行了全面分析,然后考察了两种有望弥合该差距的策略:域随机化(DR)和模型无关元学习(MAML)。这两种策略均在一个路口的交通仿真模型上进行训练,并且该模型被嵌入到 LemgoRL 框架中,该框架将现实的、安全关键的要求整合进控制系统。随后,我们在用另一种交通仿真器构建的同一路口的独立模型上评估了这两种方法的性能,以此模拟现实差距。实验结果表明,DR 和 MAML 均优于一种最先进的RL算法,从而凸显了它们在缓解基于RL的TSC系统现实差距方面的潜力。
What can a Single Attention Layer Learn? A Study Through the Random Features Lens
results: 作者们提供了许多与注意结构相关的特点,如(1)与标准两层随机特征网络相比,随机特征注意层在样本复杂性方面具有优势;(2)随机特征注意层可以高效地学习一类自然的目标函数;以及(3)采样Query-key权重矩阵(Query和Key矩阵的乘积)的分布对学习某些自然目标函数的效果有所影响。实验结果与理论发现相一致,并证明了样本大小和目标函数的复杂度之间的交互关系。Abstract
Attention layers -- which map a sequence of inputs to a sequence of outputs -- are core building blocks of the Transformer architecture which has achieved significant breakthroughs in modern artificial intelligence. This paper presents a rigorous theoretical study on the learning and generalization of a single multi-head attention layer, with a sequence of key vectors and a separate query vector as input. We consider the random feature setting where the attention layer has a large number of heads, with randomly sampled frozen query and key matrices, and trainable value matrices. We show that such a random-feature attention layer can express a broad class of target functions that are permutation invariant to the key vectors. We further provide quantitative excess risk bounds for learning these target functions from finite samples, using random feature attention with finitely many heads. Our results feature several implications unique to the attention structure compared with existing random features theory for neural networks, such as (1) Advantages in the sample complexity over standard two-layer random-feature networks; (2) Concrete and natural classes of functions that can be learned efficiently by a random-feature attention layer; and (3) The effect of the sampling distribution of the query-key weight matrix (the product of the query and key matrix), where Gaussian random weights with a non-zero mean result in better sample complexities over the zero-mean counterpart for learning certain natural target functions. Experiments on simulated data corroborate our theoretical findings and further illustrate the interplay between the sample size and the complexity of the target function.
摘要
注意力层——将输入序列映射到输出序列——是Transformer架构的核心组成部分,而Transformer已在现代人工智能中取得了重大突破。本文对单个多头注意力层的学习与泛化进行了严格的理论研究,其输入为一个键向量序列和一个单独的查询向量。我们考虑随机特征设定:注意力层拥有大量的头,查询矩阵和键矩阵随机采样后被冻结,仅价值矩阵可训练。我们证明,这种随机特征注意力层能够表达一大类对键向量具有置换不变性的目标函数。我们进一步给出了用有限多个头的随机特征注意力、从有限样本中学习这类目标函数的定量超额风险界。与现有的神经网络随机特征理论相比,我们的结果具有若干注意力结构特有的含义:1. 相比标准的两层随机特征网络,在样本复杂度上具有优势;2. 给出了可由随机特征注意力层高效学习的具体而自然的函数类;3. 查询-键权重矩阵(查询矩阵与键矩阵的乘积)采样分布的影响:对于某些自然的目标函数,均值非零的高斯随机权重比零均值情形带来更好的样本复杂度。在模拟数据上的实验印证了我们的理论结论,并进一步揭示了样本量与目标函数复杂度之间的相互关系。
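A minimal NumPy sketch of a random-feature attention layer in the spirit described above: the query-key matrices are drawn from a Gaussian (optionally with non-zero mean) and frozen, and only the value vectors are trained, here by ridge regression. The exact parameterization and training procedure in the paper may differ; layer sizes and the ridge penalty are placeholders.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

class RandomFeatureAttention:
    """Single attention layer in the random-feature regime: frozen random
    query-key matrices W_m, trainable value vectors v_m."""

    def __init__(self, d, num_heads, seed=0, mean=0.0):
        rng = np.random.default_rng(seed)
        # A non-zero mean corresponds to the Gaussian-with-mean sampling discussed above.
        self.W = rng.normal(loc=mean, scale=1.0 / np.sqrt(d), size=(num_heads, d, d))
        self.v = np.zeros((num_heads, d))

    def features(self, X, q):
        # X: (N, d) key vectors, q: (d,) query; phi_m = sum_i softmax_i(q^T W_m x_i) * x_i
        return np.stack([softmax(X @ (Wm @ q)) @ X for Wm in self.W])

    def predict(self, X, q):
        # prediction = average over heads of <v_m, phi_m>
        return float(np.mean(np.einsum('md,md->m', self.v, self.features(X, q))))

    def fit(self, data, targets, ridge=1e-3):
        # data: list of (X, q) pairs; fit all value vectors jointly by ridge regression.
        Phi = np.stack([self.features(X, q).ravel() for X, q in data]) / self.W.shape[0]
        A = Phi.T @ Phi + ridge * np.eye(Phi.shape[1])
        self.v = np.linalg.solve(A, Phi.T @ np.asarray(targets)).reshape(self.v.shape)
```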
Model-based Offline Reinforcement Learning with Count-based Conservatism
results: 通过广泛的数字实验,我们证明了 $\texttt{Count-MORL}$ 与哈希码实现在 D4RL 测试数据集上表现出色,与现有的离线RL算法相比显著超越。代码可以在 $\href{https://github.com/oh-lab/Count-MORL}{https://github.com/oh-lab/Count-MORL}$ 上获取。Abstract
In this paper, we propose a model-based offline reinforcement learning method that integrates count-based conservatism, named $\texttt{Count-MORL}$. Our method utilizes the count estimates of state-action pairs to quantify model estimation error, marking the first algorithm of demonstrating the efficacy of count-based conservatism in model-based offline deep RL to the best of our knowledge. For our proposed method, we first show that the estimation error is inversely proportional to the frequency of state-action pairs. Secondly, we demonstrate that the learned policy under the count-based conservative model offers near-optimality performance guarantees. Through extensive numerical experiments, we validate that $\texttt{Count-MORL}$ with hash code implementation significantly outperforms existing offline RL algorithms on the D4RL benchmark datasets. The code is accessible at $\href{https://github.com/oh-lab/Count-MORL}{https://github.com/oh-lab/Count-MORL}$.
摘要
在这篇论文中,我们提出了一种融合基于计数的保守性的基于模型的离线强化学习方法,称为$\texttt{Count-MORL}$。我们的方法利用状态-动作对的计数估计来量化模型估计误差;据我们所知,这是首个证明基于计数的保守性在基于模型的离线深度强化学习中有效的算法。对于所提出的方法,我们首先证明估计误差与状态-动作对出现的频率成反比;其次,我们证明在基于计数的保守模型下学得的策略具有接近最优的性能保证。通过广泛的数值实验,我们验证了采用哈希编码实现的$\texttt{Count-MORL}$在D4RL基准数据集上显著优于现有的离线RL算法。代码可在 $\href{https://github.com/oh-lab/Count-MORL}{https://github.com/oh-lab/Count-MORL}$ 获取。
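A minimal sketch of the count-based conservatism idea, assuming a simple rounding hash for state-action pairs and a penalty of the form beta / sqrt(n(s, a)); the exact penalty and hash code used by Count-MORL may differ.

```python
import numpy as np
from collections import defaultdict

class CountBasedConservatism:
    """Penalize the learned model's reward in inverse proportion to how often a
    (state, action) pair appears in the offline dataset: rarely seen pairs have
    larger model error, hence a larger penalty. Illustrative only."""

    def __init__(self, beta=1.0, precision=1):
        self.beta = beta
        self.precision = precision   # rounding precision used as a crude hash code
        self.counts = defaultdict(int)

    def _hash(self, state, action):
        key = np.round(np.concatenate([state, action]), self.precision)
        return key.tobytes()

    def update_counts(self, dataset):
        for state, action in dataset:                 # offline transitions
            self.counts[self._hash(state, action)] += 1

    def penalized_reward(self, state, action, model_reward):
        n = self.counts[self._hash(state, action)]
        return model_reward - self.beta / np.sqrt(max(n, 1))

# Usage sketch: model rollouts during policy optimization would call
# penalized_reward() in place of the raw reward predicted by the dynamics model.
```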
Bounded P-values in Parametric Programming-based Selective Inference
results: 我们在线性模型和深度神经网络中进行了Feature选择和注意区域标识等假设测试问题,并证明了我们的方法的有效性和高效性。Abstract
Selective inference (SI) has been actively studied as a promising framework for statistical hypothesis testing for data-driven hypotheses. The basic idea of SI is to make inferences conditional on an event that a hypothesis is selected. In order to perform SI, this event must be characterized in a traceable form. When selection event is too difficult to characterize, additional conditions are introduced for tractability. This additional conditions often causes the loss of power, and this issue is referred to as over-conditioning. Parametric programming-based SI (PP-based SI) has been proposed as one way to address the over-conditioning issue. The main problem of PP-based SI is its high computational cost due to the need to exhaustively explore the data space. In this study, we introduce a procedure to reduce the computational cost while guaranteeing the desired precision, by proposing a method to compute the upper and lower bounds of p-values. We also proposed three types of search strategies that efficiently improve these bounds. We demonstrate the effectiveness of the proposed method in hypothesis testing problems for feature selection in linear models and attention region identification in deep neural networks.
摘要
选择性推断(SI)作为针对数据驱动假设进行统计假设检验的一个有前景的框架,已经得到了广泛研究。SI的基本思想是,以"某个假设被选中"这一事件为条件进行推断。为了实现SI,这一选择事件必须能够以可处理的形式刻画。当选择事件过于复杂难以刻画时,通常会引入额外条件以保证可处理性,而这往往会造成检验功效的损失,这一问题被称为过度条件化。基于参数规划的SI(PP-based SI)是解决过度条件化问题的一种方法,但其主要问题是需要穷尽地探索数据空间,计算成本很高。在本研究中,我们提出了一种在保证所需精度的前提下降低计算成本的流程,其核心是计算p值的上界和下界。我们还提出了三种能够有效收紧这些界的搜索策略。我们在线性模型的特征选择和深度神经网络的注意力区域识别等假设检验问题上验证了所提方法的有效性。
Improving Transferability of Adversarial Examples via Bayesian Attacks
methods: 将贝叶斯形式同时引入模型参数与模型输入,使二者联合多样化,并提出一种在该扩展贝叶斯形式下微调模型参数的原则性方法。
results: 实验表明, combining Bayesian formulations for both model input and parameters leads to significant improvements in transferability, 新方法在 ImageNet 和 CIFAR-10 上的平均成功率提高19.14%和2.08%,分别。Abstract
This paper presents a substantial extension of our work published at ICLR. Our ICLR work advocated for enhancing transferability in adversarial examples by incorporating a Bayesian formulation into model parameters, which effectively emulates the ensemble of infinitely many deep neural networks, while, in this paper, we introduce a novel extension by incorporating the Bayesian formulation into the model input as well, enabling the joint diversification of both the model input and model parameters. Our empirical findings demonstrate that: 1) the combination of Bayesian formulations for both the model input and model parameters yields significant improvements in transferability; 2) by introducing advanced approximations of the posterior distribution over the model input, adversarial transferability achieves further enhancement, surpassing all state-of-the-arts when attacking without model fine-tuning. Moreover, we propose a principled approach to fine-tune model parameters in such an extended Bayesian formulation. The derived optimization objective inherently encourages flat minima in the parameter space and input space. Extensive experiments demonstrate that our method achieves a new state-of-the-art on transfer-based attacks, improving the average success rate on ImageNet and CIFAR-10 by 19.14% and 2.08%, respectively, when comparing with our ICLR basic Bayesian method. We will make our code publicly available.
摘要
本文是我们发表于ICLR的工作的实质性扩展。我们在ICLR的工作主张通过在模型参数中引入贝叶斯形式来增强对抗样本的可迁移性,从而有效地模拟无穷多个深度神经网络的集成;而在本文中,我们提出了一个新的扩展,将贝叶斯形式同时引入模型输入,使模型输入和模型参数可以联合多样化。我们的实验结果表明:1)同时对模型输入和模型参数采用贝叶斯形式可以显著提升可迁移性;2)通过为模型输入的后验分布引入更先进的近似,对抗可迁移性得到进一步增强,在不微调模型的情况下超越了所有现有方法。此外,我们提出了一种在这种扩展贝叶斯形式下微调模型参数的原则性方法,所得到的优化目标天然地鼓励参数空间和输入空间中的平坦极小值。大量实验表明,与我们ICLR的基础贝叶斯方法相比,我们的方法在基于迁移的攻击上取得了新的最优结果,在ImageNet和CIFAR-10上的平均成功率分别提升了19.14%和2.08%。我们将公开我们的代码。
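A PyTorch sketch of the joint input/parameter Bayesian formulation described above: the attack gradient is averaged over Gaussian perturbations of both the model weights and the input. The noise scales, number of samples, and the isotropic-Gaussian posterior are illustrative assumptions, not the paper's exact posterior approximation or fine-tuning objective.

```python
import torch

def bayesian_transfer_gradient(model, x, y, loss_fn, n_samples=8,
                               sigma_theta=0.01, sigma_x=0.03):
    """Average the input gradient over sampled perturbations of parameters and input,
    emulating an ensemble of models and inputs for a transfer attack."""
    grad = torch.zeros_like(x)
    params = list(model.parameters())
    originals = [p.detach().clone() for p in params]
    for _ in range(n_samples):
        with torch.no_grad():
            for p, p0 in zip(params, originals):        # theta ~ N(theta_0, sigma_theta^2)
                p.copy_(p0 + sigma_theta * torch.randn_like(p0))
        x_s = (x + sigma_x * torch.randn_like(x)).detach().requires_grad_(True)
        loss = loss_fn(model(x_s), y)
        loss.backward()
        grad += x_s.grad / n_samples
    with torch.no_grad():
        for p, p0 in zip(params, originals):             # restore the original weights
            p.copy_(p0)
    return grad

# The averaged gradient would then drive a standard iterative attack step, e.g.
# x_adv = x + alpha * grad.sign(), projected back into the epsilon-ball around x.
```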
Demystifying Local and Global Fairness Trade-offs in Federated Learning Using Partial Information Decomposition
For: This paper aims to provide an information-theoretic perspective on group fairness trade-offs in federated learning (FL) with respect to sensitive attributes, such as gender and race.
Methods: The paper uses a body of work in information theory called partial information decomposition (PID) to identify three sources of unfairness in FL, namely, Unique Disparity, Redundant Disparity, and Masked Disparity.
Results: The paper derives fundamental limits and trade-offs between global and local fairness, particularly under data heterogeneity, and presents experimental results on benchmark datasets to support the theoretical findings.
For: 这篇论文旨在从信息论的视角,探讨联邦学习(FL)中针对敏感属性(如性别和种族)的群体公平性权衡。
Methods: 这篇论文利用信息论中的部分信息分解(PID)方法,识别出联邦学习中的三种不公平来源,即 Unique Disparity(独有差异)、Redundant Disparity(冗余差异)和 Masked Disparity(被掩蔽差异)。
Results: 这篇论文推导了全局公平与局部公平之间的基本极限与权衡(特别是在数据异质的情况下),并通过典型示例说明这三种差异如何影响全局与局部公平。
In this paper, we present an information-theoretic perspective to group fairness trade-offs in federated learning (FL) with respect to sensitive attributes, such as gender, race, etc. Existing works mostly focus on either \emph{global fairness} (overall disparity of the model across all clients) or \emph{local fairness} (disparity of the model at each individual client), without always considering their trade-offs. There is a lack of understanding of the interplay between global and local fairness in FL, and if and when one implies the other. To address this gap, we leverage a body of work in information theory called partial information decomposition (PID) which first identifies three sources of unfairness in FL, namely, \emph{Unique Disparity}, \emph{Redundant Disparity}, and \emph{Masked Disparity}. Using canonical examples, we demonstrate how these three disparities contribute to global and local fairness. This decomposition helps us derive fundamental limits and trade-offs between global or local fairness, particularly under data heterogeneity, as well as, derive conditions under which one implies the other. We also present experimental results on benchmark datasets to support our theoretical findings. This work offers a more nuanced understanding of the sources of disparity in FL that can inform the use of local disparity mitigation techniques, and their convergence and effectiveness when deployed in practice.
摘要
在这篇论文中,我们从信息论的视角,探讨联邦学习(FL)中针对敏感属性(如性别、种族等)的群体公平性权衡。现有工作大多只关注全局公平(模型在所有客户端上的总体差异)或局部公平(模型在单个客户端上的差异),而并不总是考虑二者之间的权衡。目前对FL中全局公平与局部公平之间的相互作用,以及二者何时、是否互相蕴含,仍缺乏清晰的理解。为填补这一空白,我们利用信息论中的部分信息分解(PID)方法,首先识别出FL中的三种不公平来源,即 Unique Disparity(独有差异)、Redundant Disparity(冗余差异)和 Masked Disparity(被掩蔽差异)。借助典型示例,我们展示了这三种差异如何影响全局和局部公平。这一分解帮助我们推导出全局或局部公平的基本极限及二者之间的权衡(特别是在数据异质的情况下),并给出其中一个蕴含另一个的条件。我们还在基准数据集上给出了支持理论结论的实验结果。这项工作对FL中差异的来源提供了更细致的理解,可以为局部差异缓解技术的使用及其在实际部署中的收敛性和有效性提供参考。
Beyond Convergence: Identifiability of Machine Learning and Deep Learning Models
for: investigate the notion of model parameter identifiability through a case study
methods: utilizing a deep neural network to estimate subject-wise parameters from motion sensor data
results: certain parameters can be identified, while others remain unidentifiable due to the experimental setup’s limitationsAbstract
Machine learning (ML) and deep learning models are extensively used for parameter optimization and regression problems. However, not all inverse problems in ML are ``identifiable,'' indicating that model parameters may not be uniquely determined from the available data and the data model's input-output relationship. In this study, we investigate the notion of model parameter identifiability through a case study focused on parameter estimation from motion sensor data. Utilizing a bipedal-spring mass human walk dynamics model, we generate synthetic data representing diverse gait patterns and conditions. Employing a deep neural network, we attempt to estimate subject-wise parameters, including mass, stiffness, and equilibrium leg length. The results show that while certain parameters can be identified from the observation data, others remain unidentifiable, highlighting that unidentifiability is an intrinsic limitation of the experimental setup, necessitating a change in data collection and experimental scenarios. Beyond this specific case study, the concept of identifiability has broader implications in ML and deep learning. Addressing unidentifiability requires proven identifiable models (with theoretical support), multimodal data fusion techniques, and advancements in model-based machine learning. Understanding and resolving unidentifiability challenges will lead to more reliable and accurate applications across diverse domains, transcending mere model convergence and enhancing the reliability of machine learning models.
摘要
机器学习(ML)和深度学习模型被广泛用于参数优化和回归问题。然而,并非机器学习中的所有逆问题都是"可识别"的:这意味着仅凭可用数据和数据模型的输入-输出关系,可能无法唯一确定模型参数。在本研究中,我们以从运动传感器数据估计参数为例,探讨模型参数可识别性的概念。我们使用一个双足弹簧-质量人体步行动力学模型,生成了涵盖多种步态模式和条件的合成数据,并使用深度神经网络来估计各受试者的参数,包括质量、刚度和平衡腿长。结果表明,某些参数可以从观测数据中识别出来,而另一些参数则无法识别,这说明不可识别性是该实验设置的内在局限,需要改变数据采集方式和实验方案。在这一具体案例之外,可识别性的概念对机器学习和深度学习有着更广泛的意义。解决不可识别性问题需要有理论支撑的可识别模型、多模态数据融合技术,以及基于模型的机器学习的进一步发展。理解并解决不可识别性挑战,将使各个领域的应用更加可靠、更加准确,其意义超越单纯的模型收敛,并提升机器学习模型的可靠性。
Methodologies for Improving Modern Industrial Recommender Systems
results: 这篇论文提出了一些有效的方法,可以提高现代工业RS的适用率和持续时间。Abstract
Recommender system (RS) is an established technology with successful applications in social media, e-commerce, entertainment, and more. RSs are indeed key to the success of many popular APPs, such as YouTube, Tik Tok, Xiaohongshu, Bilibili, and others. This paper explores the methodology for improving modern industrial RSs. It is written for experienced RS engineers who are diligently working to improve their key performance indicators, such as retention and duration. The experiences shared in this paper have been tested in some real industrial RSs and are likely to be generalized to other RSs as well. Most contents in this paper are industry experience without publicly available references.
摘要
推荐系统(RS)是一项成熟的技术,在社交媒体、电商、娱乐等领域都有成功应用。RS 是许多热门 APP(如 YouTube、TikTok、小红书和 Bilibili 等)取得成功的关键。本文探讨改进现代工业级 RS 的方法论,主要面向正在努力提升留存率、使用时长等关键绩效指标的资深 RS 工程师。文中分享的经验已在一些真实的工业级 RS 中得到验证,并且很可能也适用于其他 RS。本文的大部分内容来自业界经验,没有公开可引用的文献。
Systematic Adaptation of Communication-focused Machine Learning Models from Real to Virtual Environments for Human-Robot Collaboration
results: 该论文通过对实际环境中训练的深度学习模型进行适应,在虚拟环境中实现了高效的手势识别。Abstract
Virtual reality has proved to be useful in applications in several fields ranging from gaming, medicine, and training to development of interfaces that enable human-robot collaboration. It empowers designers to explore applications outside of the constraints posed by the real world environment and develop innovative solutions and experiences. Hand gestures recognition which has been a topic of much research and subsequent commercialization in the real world has been possible because of the creation of large, labelled datasets. In order to utilize the power of natural and intuitive hand gestures in the virtual domain for enabling embodied teleoperation of collaborative robots, similarly large datasets must be created so as to keep the working interface easy to learn and flexible enough to add more gestures. Depending on the application, this may be computationally or economically prohibitive. Thus, the adaptation of trained deep learning models that perform well in the real environment to the virtual may be a solution to this challenge. This paper presents a systematic framework for the real to virtual adaptation using limited size of virtual dataset along with guidelines for creating a curated dataset. Finally, while hand gestures have been considered as the communication mode, the guidelines and recommendations presented are generic. These are applicable to other modes such as body poses and facial expressions which have large datasets available in the real domain which must be adapted to the virtual one.
摘要
虚拟现实已经在各种领域展示了其用途,包括游戏、医疗、训练和人机合作交互的开发。它让设计师能够在虚拟环境中探索不受实际环境限制的应用,并开发创新的解决方案和体验。手势认识是虚拟现实中的一个重要话题,因为大量标注的数据集的创建使得手势在实际世界中得到了商业化。为了在虚拟世界中使用自然和直观的手势,需要创建大量的虚拟数据集,以保持工作界面简单易学习,并能够添加更多的手势。在应用程序方面,这可能是计算机或经济上的瓶颈。因此,将已经在实际环境中表现好的深度学习模型适应到虚拟环境可能是一个解决方案。本文提出了一个系统化的实际环境到虚拟环境的适应框架,以及创建审核数据集的指南。最后,尽管手势被视为交流方式,但是这些指南和建议适用于其他模式,如身体姿态和表情,这些在实际环境中有大量数据集可以适应到虚拟环境。
Analysis of Elephant Movement in Sub-Saharan Africa: Ecological, Climatic, and Conservation Perspectives
results: 研究发现大象的移动行为受到季节变化和降水模式等生态因素的影响,并提供了一种可预测大象移动模式的方法,这可能为保护这些动物提供有价值的信息。Abstract
The interaction between elephants and their environment has profound implications for both ecology and conservation strategies. This study presents an analytical approach to decipher the intricate patterns of elephant movement in Sub-Saharan Africa, concentrating on key ecological drivers such as seasonal variations and rainfall patterns. Despite the complexities surrounding these influential factors, our analysis provides a holistic view of elephant migratory behavior in the context of the dynamic African landscape. Our comprehensive approach enables us to predict the potential impact of these ecological determinants on elephant migration, a critical step in establishing informed conservation strategies. This projection is particularly crucial given the impacts of global climate change on seasonal and rainfall patterns, which could substantially influence elephant movements in the future. The findings of our work aim to not only advance the understanding of movement ecology but also foster a sustainable coexistence of humans and elephants in Sub-Saharan Africa. By predicting potential elephant routes, our work can inform strategies to minimize human-elephant conflict, effectively manage land use, and enhance anti-poaching efforts. This research underscores the importance of integrating movement ecology and climatic variables for effective wildlife management and conservation planning.
摘要
大象与其环境之间的互动对生态学和保护策略都有深远影响。本研究提出了一种分析方法,用于解读撒哈拉以南非洲大象移动的复杂模式,重点关注季节变化和降水模式等关键生态驱动因素。尽管这些影响因素相当复杂,我们的分析仍为理解大象在动态变化的非洲景观中的迁徙行为提供了一个整体视角。我们的综合方法使我们能够预测这些生态决定因素对大象迁徙的潜在影响,这是制定有依据的保护策略的关键一步。考虑到全球气候变化对季节和降水模式的影响可能在未来显著改变大象的移动,这种预测尤为重要。我们的工作不仅旨在推进运动生态学的认识,也希望促进撒哈拉以南非洲人类与大象的可持续共存。通过预测大象可能的路线,我们的工作可以为减少人象冲突、有效管理土地利用以及加强反盗猎工作提供依据。这项研究强调了将运动生态学与气候变量相结合对有效的野生动物管理和保护规划的重要性。
XLDA: Linear Discriminant Analysis for Scaling Continual Learning to Extreme Classification at the Edge
For: 该研究旨在提出一种基于流式线性判别分析(XLDA)的分类器,用于在边缘部署中进行类增量学习(Class-IL),并证明其在极端分类场景下与全连接层等价。
Methods: 该研究提出了一种基于XLDA的框架,包括对LDA分类器与FC层等价性的证明,以及在计算资源受限的边缘部署中进行训练和推断的优化策略,以提高效率和可扩展性。
Results: 研究人员通过批量训练策略和最近邻搜索策略,在极端分类场景下实现了至多42倍的训练加速和至多5倍的推断加速,并在AliProducts和Google Landmarks V2等极端数据集上进行了实验验证。Abstract
Streaming Linear Discriminant Analysis (LDA) while proven in Class-incremental Learning deployments at the edge with limited classes (upto 1000), has not been proven for deployment in extreme classification scenarios. In this paper, we present: (a) XLDA, a framework for Class-IL in edge deployment where LDA classifier is proven to be equivalent to FC layer including in extreme classification scenarios, and (b) optimizations to enable XLDA-based training and inference for edge deployment where there is a constraint on available compute resources. We show up to 42x speed up using a batched training approach and up to 5x inference speedup with nearest neighbor search on extreme datasets like AliProducts (50k classes) and Google Landmarks V2 (81k classes)
摘要
流式线性判别分析(LDA)已被证明适用于类别数有限(至多约1000类)的边缘端类增量学习部署,但尚未被证明适用于极端分类场景。本文提出:(a)XLDA,一个面向边缘部署的类增量学习框架,其中的LDA分类器被证明与全连接(FC)层等价,包括在极端分类场景下;(b)在可用计算资源受限的边缘部署中,使XLDA的训练和推断得以实现的一系列优化。在AliProducts(5万类)和Google Landmarks V2(8.1万类)等极端数据集上,我们通过批量训练方式获得了最高42倍的训练加速,通过最近邻搜索获得了最高5倍的推断加速。
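A minimal sketch of the streaming LDA classifier that XLDA builds on: per-class running means and a shared covariance are updated one sample at a time and turned into a linear classifier. The shrinkage value and update details are illustrative and not the exact XLDA implementation.

```python
import numpy as np

class StreamingLDA:
    """Streaming linear discriminant analysis: class means plus a shared
    covariance, updated online and used as a linear (FC-layer-like) classifier."""

    def __init__(self, dim, num_classes, shrinkage=1e-2):
        self.mu = np.zeros((num_classes, dim))
        self.counts = np.zeros(num_classes)
        self.sigma = np.zeros((dim, dim))
        self.n = 0
        self.shrinkage = shrinkage

    def fit_one(self, x, y):
        # Running shared-covariance update using the previous class mean.
        if self.counts[y] > 0:
            delta = x - self.mu[y]
            self.sigma = (self.n * self.sigma + np.outer(delta, delta)) / (self.n + 1)
        self.n += 1
        self.counts[y] += 1
        self.mu[y] += (x - self.mu[y]) / self.counts[y]

    def predict(self, X):
        prec = np.linalg.inv(self.sigma + self.shrinkage * np.eye(self.sigma.shape[0]))
        W = self.mu @ prec                                  # (num_classes, dim)
        b = -0.5 * np.einsum('cd,cd->c', W, self.mu)        # per-class bias
        return (X @ W.T + b).argmax(axis=1)
```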
Making Pre-trained Language Models both Task-solvers and Self-calibrators
methods: 这篇论文提出了一种名为 LM-TOAST 的训练算法,以使 PLM 同时成为任务解决者和自我校准器。
results: 实验结果表明,LM-TOAST 可以有效地利用训练数据,使 PLM 有合理的自信估计,而不会影响其原始任务性能。Abstract
Pre-trained language models (PLMs) serve as backbones for various real-world systems. For high-stake applications, it's equally essential to have reasonable confidence estimations in predictions. While the vanilla confidence scores of PLMs can already be effectively utilized, PLMs consistently become overconfident in their wrong predictions, which is not desirable in practice. Previous work shows that introducing an extra calibration task can mitigate this issue. The basic idea involves acquiring additional data to train models in predicting the confidence of their initial predictions. However, it only demonstrates the feasibility of this kind of method, assuming that there are abundant extra available samples for the introduced calibration task. In this work, we consider the practical scenario that we need to effectively utilize training samples to make PLMs both task-solvers and self-calibrators. Three challenges are presented, including limited training samples, data imbalance, and distribution shifts. We first conduct pilot experiments to quantify various decisive factors in the calibration task. Based on the empirical analysis results, we propose a training algorithm LM-TOAST to tackle the challenges. Experimental results show that LM-TOAST can effectively utilize the training data to make PLMs have reasonable confidence estimations while maintaining the original task performance. Further, we consider three downstream applications, namely selective classification, adversarial defense, and model cascading, to show the practical usefulness of LM-TOAST. The code will be made public at \url{https://github.com/Yangyi-Chen/LM-TOAST}.
摘要
预训言语模型(PLM)作为各种实际系统的基础,其中一个重要问题是保证预测结果的可靠性。虽然vanilla confidence scores已经可以有效地使用,但PLM在错误预测时常常过分自信,这不是实际应用中所需的。 previous work表明,通过添加额外的calibration任务可以解决这个问题。基本思想是通过训练模型预测其初始预测的可信度。然而,这只是一种可行的方法,假设有充足的额外可用样本。在这种实际场景中,我们需要使PLM同时成为任务解决者和自我调整者。我们描述了三个挑战,包括有限的训练样本、数据不均衡和分布shift。我们首先进行了飞行实验,以量化各种决定性因素在calibration任务中。根据实验分析结果,我们提出了一种训练算法LM-TOAST,用于解决这些挑战。实验结果表明,LM-TOAST可以有效地利用训练数据,使PLM有合理的可信度估计,同时保持原始任务性能。此外,我们考虑了三个下游应用,namely selective classification、adversarial defense和model cascading,以显示LM-TOAST的实际用途。代码将在\url{https://github.com/Yangyi-Chen/LM-TOAST}公开。
Artificial Intelligence-Generated Terahertz Multi-Resonant Metasurfaces via Improved Transformer and CGAN Neural Networks
paper_authors: Yangpeng Huang, Naixing Feng, Yijun Cai
For: This paper proposes improved Transformer and conditional generative adversarial neural networks (CGAN) for the inverse design of graphene metasurfaces based on THz multi-resonant absorption spectra.
Methods: The paper uses traditional deep neural networks (DNNs), improved Transformer, and CGAN for the inverse design of graphene metasurfaces.
Results: The improved Transformer achieves higher accuracy and generalization performance in the StoV design, while the StoI design achieved through CGAN provides more comprehensive information and higher accuracy than the StoV design obtained by MLP. The improved CGAN can also achieve the inverse design of graphene metasurface images directly from the desired multi-resonant absorption spectra.
For: 这篇论文提出了改进的 Transformer 和条件生成对抗网络(CGAN),用于基于 THz 多谐振吸收谱对石墨烯超表面进行逆向设计。
Methods: 这篇论文使用传统深度神经网络(DNNs)、改进的 Transformer 和 CGAN 对石墨烯超表面进行逆向设计。
Results: 改进的 Transformer 在 StoV 设计中取得了更高的准确率和泛化性能,而通过 CGAN 实现的 StoI 设计比 MLP 得到的 StoV 设计提供了更全面的信息和更高的准确率;此外,改进的 CGAN 还可以直接从期望的多谐振吸收谱逆向设计出石墨烯超表面图像。
It is well known that the inverse design of terahertz (THz) multi-resonant graphene metasurfaces by using traditional deep neural networks (DNNs) has limited generalization ability. In this paper, we propose improved Transformer and conditional generative adversarial neural networks (CGAN) for the inverse design of graphene metasurfaces based upon THz multi-resonant absorption spectra. The improved Transformer can obtain higher accuracy and generalization performance in the StoV (Spectrum to Vector) design compared to traditional multilayer perceptron (MLP) neural networks, while the StoI (Spectrum to Image) design achieved through CGAN can provide more comprehensive information and higher accuracy than the StoV design obtained by MLP. Moreover, the improved CGAN can achieve the inverse design of graphene metasurface images directly from the desired multi-resonant absorption spectra. It is turned out that this work can finish facilitating the design process of artificial intelligence-generated metasurfaces (AIGM), and even provide a useful guide for developing complex THz metasurfaces based on 2D materials using generative neural networks.
摘要
众所周知,使用传统深度神经网络(DNNs)对太赫兹(THz)多谐振石墨烯超表面进行逆向设计时,泛化能力有限。在本文中,我们提出了改进的 Transformer 和条件生成对抗网络(CGAN),用于基于 THz 多谐振吸收谱对石墨烯超表面进行逆向设计。与传统的多层感知机(MLP)神经网络相比,改进的 Transformer 在 StoV(谱到向量)设计中能获得更高的准确率和泛化性能;而通过 CGAN 实现的 StoI(谱到图像)设计,能够提供比 MLP 得到的 StoV 设计更全面的信息和更高的准确率。此外,改进的 CGAN 还可以直接从期望的多谐振吸收谱逆向设计出石墨烯超表面图像。结果表明,这项工作能够为人工智能生成超表面(AIGM)的设计过程提供便利,甚至可以为利用生成神经网络开发基于二维材料的复杂 THz 超表面提供有益的指导。
Neuromorphic Online Learning for Spatiotemporal Patterns with a Forward-only Timeline
methods: 这篇论文提出了一种名为Spatiotemporal Online Learning for Synaptic Adaptation(SOLSA)的在线学习算法,专门用于学习由具有指数衰减突触和软重置的泄漏积分发放(LIF)神经元及其相关突触组成的SNN。该算法不仅学习突触权重,还自适应与突触相关的时间滤波器。相比BPTT算法,SOLSA的内存需求低得多,并实现了更均衡的时间工作负载分布。此外,SOLSA还包含了一些增强技术,如按计划的权重更新、提前停止训练和自适应突触滤波器,这些技术加快了SOLSA的收敛并提升了学习性能。
results: 相比非BPTT基于SNN学习算法,SOLSA在平均学习精度上提高了14.2%。而相比BPTT算法,SOLSA在内存成本下降72%的情况下,实现了5%高的平均学习精度。Abstract
Spiking neural networks (SNNs) are bio-plausible computing models with high energy efficiency. The temporal dynamics of neurons and synapses enable them to detect temporal patterns and generate sequences. While Backpropagation Through Time (BPTT) is traditionally used to train SNNs, it is not suitable for online learning of embedded applications due to its high computation and memory cost as well as extended latency. Previous works have proposed online learning algorithms, but they often utilize highly simplified spiking neuron models without synaptic dynamics and reset feedback, resulting in subpar performance. In this work, we present Spatiotemporal Online Learning for Synaptic Adaptation (SOLSA), specifically designed for online learning of SNNs composed of Leaky Integrate and Fire (LIF) neurons with exponentially decayed synapses and soft reset. The algorithm not only learns the synaptic weight but also adapts the temporal filters associated to the synapses. Compared to the BPTT algorithm, SOLSA has much lower memory requirement and achieves a more balanced temporal workload distribution. Moreover, SOLSA incorporates enhancement techniques such as scheduled weight update, early stop training and adaptive synapse filter, which speed up the convergence and enhance the learning performance. When compared to other non-BPTT based SNN learning, SOLSA demonstrates an average learning accuracy improvement of 14.2%. Furthermore, compared to BPTT, SOLSA achieves a 5% higher average learning accuracy with a 72% reduction in memory cost.
摘要
脉冲神经网络(SNN)是具有生物合理性的计算模型,能效很高。神经元和突触的时间动态使其能够检测时间模式并生成序列。传统上使用时间反向传播(BPTT)来训练SNN,但由于其计算和内存开销高、延迟长,并不适合嵌入式应用的在线学习。先前的工作已经提出了一些在线学习算法,但它们通常使用高度简化的脉冲神经元模型,不包含突触动态和复位反馈,导致性能欠佳。在本工作中,我们提出了Spatiotemporal Online Learning for Synaptic Adaptation(SOLSA),专门用于由具有指数衰减突触和软重置的泄漏积分发放(LIF)神经元组成的SNN的在线学习。该算法不仅学习突触权重,还自适应与突触相关的时间滤波器。相比BPTT算法,SOLSA的内存需求低得多,并实现了更均衡的时间工作负载分布。此外,SOLSA还包含按计划的权重更新、提前停止训练和自适应突触滤波器等增强技术,从而加快收敛并提升学习性能。与其他非BPTT的SNN学习方法相比,SOLSA的平均学习精度提高了14.2%;与BPTT相比,SOLSA在内存成本降低72%的同时,平均学习精度还高出5%。
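A small NumPy sketch of the neuron model SOLSA targets: a leaky integrate-and-fire neuron with an exponentially decaying synaptic current and a soft reset (the threshold is subtracted on a spike rather than resetting the membrane to zero). All time constants and weights here are placeholders; the SOLSA learning rule itself is not shown.

```python
import numpy as np

def simulate_lif(spikes_in, weights, tau_syn=5.0, tau_mem=10.0, threshold=1.0, dt=1.0):
    """spikes_in: (T, n_inputs) binary input spike trains; weights: (n_inputs,).
    Returns the output spike train of shape (T,)."""
    T = spikes_in.shape[0]
    alpha = np.exp(-dt / tau_syn)      # synaptic current decay per step
    beta = np.exp(-dt / tau_mem)       # membrane potential decay per step
    current, v = 0.0, 0.0
    out = np.zeros(T)
    for t in range(T):
        current = alpha * current + float(weights @ spikes_in[t])
        v = beta * v + (1.0 - beta) * current
        if v >= threshold:
            out[t] = 1.0
            v -= threshold             # soft reset
    return out

# Toy usage
rng = np.random.default_rng(0)
spk = (rng.random((100, 8)) < 0.2).astype(float)
w = rng.normal(0.0, 0.5, size=8)
print(simulate_lif(spk, w).sum(), "output spikes")
```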
Who should I Collaborate with? A Comparative Study of Academia and Industry Research Collaboration in NLP
paper_authors: Hussain Sadiq Abuwala, Bohan Zhang, Mushi Wang
for: investigate the effects of collaboration between academia and industry on Natural Language Processing (NLP)
methods: created a pipeline to extract affiliations and citations from NLP papers and divided them into three categories: academia, industry, and hybrid (collaborations between academia and industry)
results: found a trend towards an increase in industry and academia-industry collaboration publications, and these types of publications tend to have a higher impact compared to those produced solely within academia.
for: 研究学术与产业之间的协作对自然语言处理(NLP)的影响
methods: 创建了一个管道,从NLP论文中提取作者单位和引用信息,并将论文分为三类:学术界、产业界以及混合(学术界与产业界合作)。
results: 发现产业界以及学术界-产业界合作论文的数量呈上升趋势,并且这类论文往往比纯学术界产出的论文具有更高的影响力。Abstract
The goal of our research was to investigate the effects of collaboration between academia and industry on Natural Language Processing (NLP). To do this, we created a pipeline to extract affiliations and citations from NLP papers and divided them into three categories: academia, industry, and hybrid (collaborations between academia and industry). Our empirical analysis found that there is a trend towards an increase in industry and academia-industry collaboration publications and that these types of publications tend to have a higher impact compared to those produced solely within academia.
摘要
我们的研究目标是考察学术界与产业界的合作对自然语言处理(NLP)的影响。为此,我们创建了一个管道,从NLP论文中提取作者单位和引用信息,并将论文分为三类:学术界、产业界和混合(学术界与产业界合作)。我们的实证分析发现,产业界以及学术界-产业界合作论文的数量呈上升趋势,并且这类论文往往比纯学术界产出的论文具有更高的影响力。
methods: integrate governing physical laws into Physics Informed Variational Embedding Generative Adversarial Network (PI-VEGAN) with automatic differentiation, introduce variational encoder to approximate latent variables, and use stochastic gradient descent algorithm to update components.
results: PI-VEGAN achieves satisfactory stability and accuracy in addressing forward, inverse, and mixed problems, outperforming previous Physics-Informed Generative Adversarial Network (PI-WGAN) in numerical results.Abstract
We present a new category of physics-informed neural networks called physics informed variational embedding generative adversarial network (PI-VEGAN), that effectively tackles the forward, inverse, and mixed problems of stochastic differential equations. In these scenarios, the governing equations are known, but only a limited number of sensor measurements of the system parameters are available. We integrate the governing physical laws into PI-VEGAN with automatic differentiation, while introducing a variational encoder for approximating the latent variables of the actual distribution of the measurements. These latent variables are integrated into the generator to facilitate accurate learning of the characteristics of the stochastic partial equations. Our model consists of three components, namely the encoder, generator, and discriminator, each of which is updated alternatively employing the stochastic gradient descent algorithm. We evaluate the effectiveness of PI-VEGAN in addressing forward, inverse, and mixed problems that require the concurrent calculation of system parameters and solutions. Numerical results demonstrate that the proposed method achieves satisfactory stability and accuracy in comparison with the previous physics-informed generative adversarial network (PI-WGAN).
摘要
我们提出了一类新的物理信息神经网络,称为物理信息变分嵌入生成对抗网络(PI-VEGAN),用于有效求解随机微分方程(SDE)的正问题、反问题和混合问题。在这些场景下,控制方程已知,但系统参数只有数量有限的传感器测量可用。我们借助自动微分将物理定律融入PI-VEGAN,并引入一个变分编码器来近似测量数据真实分布的潜变量。这些潜变量被输入生成器,以便准确学习随机偏微分方程的特征。我们的模型由编码器、生成器和判别器三个组件构成,三者交替地使用随机梯度下降算法进行更新。我们评估了PI-VEGAN在需要同时求解系统参数和解的正、反及混合问题上的有效性。数值结果表明,与此前的物理信息生成对抗网络(PI-WGAN)相比,所提方法具有令人满意的稳定性和准确性。
results: 该论文的算法优于一种使用均匀随机采样上下文的策略,并得到了实验证明。Abstract
Preference-based feedback is important for many applications where direct evaluation of a reward function is not feasible. A notable recent example arises in reinforcement learning from human feedback on large language models. For many of these applications, the cost of acquiring the human feedback can be substantial or even prohibitive. In this work, we take advantage of the fact that often the agent can choose contexts at which to obtain human feedback in order to most efficiently identify a good policy, and introduce the offline contextual dueling bandit setting. We give an upper-confidence-bound style algorithm for this setting and prove a regret bound. We also give empirical confirmation that this method outperforms a similar strategy that uses uniformly sampled contexts.
摘要
基于偏好的反馈在许多无法直接评估奖励函数的应用中都十分重要。一个突出的近期例子是基于人类反馈对大型语言模型进行强化学习。在许多此类应用中,获取人类反馈的成本可能相当高,甚至高到难以承受。在本工作中,我们利用这样一个事实:智能体往往可以选择在哪些上下文中获取人类反馈,从而最高效地找到一个好的策略,并由此引入了离线上下文对决老虎机(offline contextual dueling bandit)设定。我们为该设定给出了一种上置信界(UCB)风格的算法,并证明了相应的遗憾界。实验也证实,该方法优于使用均匀随机采样上下文的类似策略。
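A minimal sketch of the estimation step behind preference-based feedback with a linear Bradley-Terry model; the context-selection rule and the upper-confidence-bound analysis of the paper are not reproduced here, and the feature map, learning rate, and regularization are assumptions.

```python
import numpy as np

def fit_bradley_terry(feat_diffs, prefs, lr=0.1, iters=500, reg=1e-3):
    """Learn from pairwise preferences under a linear Bradley-Terry model:
    P(a1 preferred over a2 | x) = sigmoid(theta . (phi(x, a1) - phi(x, a2))).
    feat_diffs: (n, d) feature differences; prefs: (n,) binary preference labels."""
    theta = np.zeros(feat_diffs.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-feat_diffs @ theta))
        grad = feat_diffs.T @ (p - prefs) / len(prefs) + reg * theta
        theta -= lr * grad
    return theta

def greedy_policy(theta, phi, context, actions):
    """Return the action with the highest estimated reward in the given context."""
    scores = [theta @ phi(context, a) for a in actions]
    return actions[int(np.argmax(scores))]
```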
MAS: Towards Resource-Efficient Federated Multiple-Task Learning
methods: 本研究提出了首个实现多元FL任务训练的系统,称为MAS(Merge and Split)。MAS首先将多元FL任务合并为一个统一FL任务,然后在训练一些循环后,使用任务之间的相互关联性来拆分为多个FL任务。接着,MAS将每个拆分的FL任务继续训练,根据在统一训练中获得的模型参数。
results: 实验结果显示,MAS比其他方法具有更好的性能,同时降低了训练时间和能源消耗。实验结果显示,在训练20个FL任务时,MAS可以降低训练时间2倍,并降低能源消耗40%。Abstract
Federated learning (FL) is an emerging distributed machine learning method that empowers in-situ model training on decentralized edge devices. However, multiple simultaneous FL tasks could overload resource-constrained devices. In this work, we propose the first FL system to effectively coordinate and train multiple simultaneous FL tasks. We first formalize the problem of training simultaneous FL tasks. Then, we present our new approach, MAS (Merge and Split), to optimize the performance of training multiple simultaneous FL tasks. MAS starts by merging FL tasks into an all-in-one FL task with a multi-task architecture. After training for a few rounds, MAS splits the all-in-one FL task into two or more FL tasks by using the affinities among tasks measured during the all-in-one training. It then continues training each split of FL tasks based on model parameters from the all-in-one training. Extensive experiments demonstrate that MAS outperforms other methods while reducing training time by 2x and reducing energy consumption by 40%. We hope this work will inspire the community to further study and optimize training simultaneous FL tasks.
摘要
联邦学习(FL)是一种新兴的分布式机器学习方法,它允许在去中心化的边缘设备上就地进行模型训练。然而,多个同时进行的FL任务可能会使资源受限的设备过载。在本工作中,我们提出了首个能够有效协调并训练多个并发FL任务的FL系统。我们首先形式化定义了同时训练多个FL任务的问题,然后介绍我们的新方法MAS(Merge and Split,合并与拆分),用于优化多个并发FL任务的训练性能。MAS首先将多个FL任务合并为一个采用多任务架构的一体化FL任务;训练若干轮之后,MAS再利用一体化训练过程中度量的任务间亲和性,将其拆分为两个或更多的FL任务,并基于一体化训练得到的模型参数继续训练每个拆分后的FL任务。大量实验表明,MAS在将训练时间减少2倍、能耗降低40%的同时,性能优于其他方法。我们希望这项工作能够激励社区进一步研究和优化多个FL任务的并发训练。
Epsilon*: Privacy Metric for Machine Learning Models
paper_authors: Diana M. Negoescu, Humberto Gonzalez, Saad Eddin Al Orjany, Jilei Yang, Yuliia Lut, Rahul Tandra, Xiaowen Zhang, Xinyi Zheng, Zach Douglas, Vidita Nolkha, Parvez Ahammad, Gennady Samorodnitsky
for: The paper aims to provide a new privacy metric called Epsilon* to measure the privacy risk of a single model instance before, during, or after deployment of privacy mitigation strategies.
methods: The metric does not require access to the training data sampling or model training algorithm, and is based on a hypothesis test used by an adversary in a membership inference attack.
results: The paper shows that Epsilon* is sensitive to privacy risk mitigation by training with differential privacy (DP), where the value of Epsilon* is reduced by up to 800% compared to the Epsilon* values of non-DP trained baseline models. This allows privacy auditors to be independent of model owners and enables all decision-makers to visualize the privacy-utility landscape to make informed decisions regarding the trade-offs between model privacy and utility.
for: 这篇论文旨在提供一个名为 Epsilon* 的新隐私度量,用于在部署隐私缓解策略之前、之中或之后衡量单个模型实例的隐私风险。
methods: 该度量不需要访问训练数据采样或模型训练算法,它基于攻击者在成员推断攻击中使用的假设检验。
results: 论文表明,Epsilon* 能敏感地反映差分隐私(DP)训练带来的隐私风险缓解:与未使用DP训练的基线模型相比,Epsilon* 的取值最多可降低800%。这使隐私审计者可以独立于模型所有者,并让所有决策者都能直观地看到隐私-效用格局,从而就模型隐私与效用之间的取舍做出有依据的决策。
We introduce Epsilon*, a new privacy metric for measuring the privacy risk of a single model instance prior to, during, or after deployment of privacy mitigation strategies. The metric does not require access to the training data sampling or model training algorithm. Epsilon* is a function of true positive and false positive rates in a hypothesis test used by an adversary in a membership inference attack. We distinguish between quantifying the privacy loss of a trained model instance and quantifying the privacy loss of the training mechanism which produces this model instance. Existing approaches in the privacy auditing literature provide lower bounds for the latter, while our metric provides a lower bound for the former by relying on an (${\epsilon}$,${\delta}$)-type of quantification of the privacy of the trained model instance. We establish a relationship between these lower bounds and show how to implement Epsilon* to avoid numerical and noise amplification instability. We further show in experiments on benchmark public data sets that Epsilon* is sensitive to privacy risk mitigation by training with differential privacy (DP), where the value of Epsilon* is reduced by up to 800% compared to the Epsilon* values of non-DP trained baseline models. This metric allows privacy auditors to be independent of model owners, and enables all decision-makers to visualize the privacy-utility landscape to make informed decisions regarding the trade-offs between model privacy and utility.
摘要
我们提出了一种新的隐私度量 Epsilon*,用于在部署隐私缓解策略之前、之中或之后衡量单个模型实例的隐私风险。该度量不需要访问训练数据的采样过程或模型训练算法。Epsilon* 是攻击者在成员推断攻击的假设检验中所得真阳性率与假阳性率的函数。我们区分了两个概念:量化已训练模型实例的隐私损失,与量化产生该模型实例的训练机制的隐私损失。隐私审计文献中的现有方法为后者提供下界,而我们的度量通过对已训练模型实例隐私的 (${\epsilon}$,${\delta}$) 型量化,为前者提供下界。我们建立了这些下界之间的关系,并展示了如何实现 Epsilon* 以避免数值不稳定和噪声放大。我们还在公开基准数据集上的实验中表明,Epsilon* 对通过差分隐私(DP)训练实现的隐私风险缓解十分敏感:与未使用DP训练的基线模型相比,Epsilon* 的取值最多可降低800%。该度量使隐私审计者可以独立于模型所有者,并让所有决策者都能直观地看到隐私-效用格局,从而就模型隐私与效用之间的取舍做出有依据的决策。
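A sketch of how an (epsilon, delta)-style lower bound can be computed from the true/false positive rates of a membership-inference hypothesis test, in the spirit of Epsilon*; this is the generic hypothesis-testing bound, and the paper's exact estimator may handle finite-sample and stability issues differently.

```python
import numpy as np

def epsilon_star_lower_bound(tpr, fpr, delta=1e-5, floor=1e-12):
    """Generic (epsilon, delta) lower bound implied by an attack's ROC points:
    epsilon >= max( log((TPR - delta)/FPR), log((1 - FPR - delta)/(1 - TPR)) )."""
    tpr = np.asarray(tpr, dtype=float)
    fpr = np.asarray(fpr, dtype=float)
    b1 = np.log(np.maximum(tpr - delta, floor) / np.maximum(fpr, floor))
    b2 = np.log(np.maximum(1.0 - fpr - delta, floor) / np.maximum(1.0 - tpr, floor))
    return float(max(b1.max(), b2.max()))

# Example: an attack with TPR = 0.6 at FPR = 0.1 certifies eps >= log(0.6/0.1) ~ 1.79.
print(epsilon_star_lower_bound([0.6], [0.1]))
```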
Learning to Segment from Noisy Annotations: A Spatial Correction Approach
results: 我们在合成的和真实的噪声标注数据上进行了实验,证明了我们的方法优于现有的各种先进方法。Abstract
Noisy labels can significantly affect the performance of deep neural networks (DNNs). In medical image segmentation tasks, annotations are error-prone due to the high demand in annotation time and in the annotators' expertise. Existing methods mostly assume noisy labels in different pixels are \textit{i.i.d}. However, segmentation label noise usually has strong spatial correlation and has prominent bias in distribution. In this paper, we propose a novel Markov model for segmentation noisy annotations that encodes both spatial correlation and bias. Further, to mitigate such label noise, we propose a label correction method to recover true label progressively. We provide theoretical guarantees of the correctness of the proposed method. Experiments show that our approach outperforms current state-of-the-art methods on both synthetic and real-world noisy annotations.
摘要
噪声标签会显著影响深度神经网络(DNN)的性能。在医学图像分割任务中,由于标注耗时且对标注者的专业知识要求很高,标注很容易出错。现有方法大多假设不同像素上的噪声标签是独立同分布的,但实际上,分割标签噪声通常具有很强的空间相关性,且分布上存在明显偏差。在这篇论文中,我们提出了一种新的马尔可夫模型来刻画分割噪声标注,同时对空间相关性和偏差进行建模。此外,为缓解这类标签噪声,我们还提出了一种能逐步恢复真实标签的标签修正方法,并为所提方法的正确性提供了理论保证。实验表明,我们的方法在合成的和真实的噪声标注上都优于当前最先进的方法。
methods: 使用自动化乳腺癌检测方法,在约 20,000 名女性患者的 RSNA 乳腺X光图像数据集上测试了多种方法,各方法的平均验证集 pF1 分数为 0.56。
results: 通过自动化检测方法,可以提高乳癌检测的效率和准确率,减少成本和假阳性结果导致的患者担忧。Abstract
Breast cancer is a leading cause of cancer-related deaths, but current programs are expensive and prone to false positives, leading to unnecessary follow-up and patient anxiety. This paper proposes a solution to automated breast cancer detection, to improve the efficiency and accuracy of screening programs. Different methodologies were tested against the RSNA dataset of radiographic breast images of roughly 20,000 female patients and yielded an average validation case pF1 score of 0.56 across methods.
摘要
乳腺癌是癌症相关死亡的主要原因之一,但目前的筛查项目成本高昂,且容易出现假阳性结果,导致不必要的随访和患者焦虑。本文提出一种自动乳腺癌检测方案,以提高筛查项目的效率和准确率。我们在包含约 20,000 名女性患者乳腺 X 光图像的 RSNA 数据集上测试了多种方法,平均验证案例 pF1 分数为 0.56。
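For reference, the probabilistic F1 (pF1) score reported above can be computed as follows; this is our reading of the metric commonly used for the RSNA screening-mammography task, not code from the paper.

import numpy as np

def probabilistic_f1(y_true, y_prob, eps=1e-12):
    # pF1: precision and recall computed on predicted probabilities rather than
    # thresholded labels, then combined as a harmonic mean.
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    p_tp = np.sum(y_prob * y_true)
    precision = p_tp / (np.sum(y_prob) + eps)
    recall = p_tp / (np.sum(y_true) + eps)
    return 2 * precision * recall / (precision + recall + eps)

print(probabilistic_f1([1, 0, 0, 1], [0.8, 0.1, 0.3, 0.6]))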
On the Fisher-Rao Gradient of the Evidence Lower Bound
results: 研究给出了模型需满足的条件,在这些条件下,最小化主要目标函数(与目标分布的 Kullback-Leibler 散度)与最大化 ELBO 是等价的。Abstract
This article studies the Fisher-Rao gradient, also referred to as the natural gradient, of the evidence lower bound, the ELBO, which plays a crucial role within the theory of the Variational Autoencoder, the Helmholtz Machine and the Free Energy Principle. The natural gradient of the ELBO is related to the natural gradient of the Kullback-Leibler divergence from a target distribution, the prime objective function of learning. Based on invariance properties of gradients within information geometry, conditions on the underlying model are provided that ensure the equivalence of minimising the prime objective function and maximising the ELBO.
摘要
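For orientation, the quantities discussed above can be written out as follows; these are standard definitions consistent with the abstract, and the paper's exact conventions may differ:

\mathcal{L}(\phi) = \mathbb{E}_{q_{\phi}}\big[\log p(x, z) - \log q_{\phi}(z)\big] = \log p(x) - \mathrm{KL}\big(q_{\phi}(z)\,\|\,p(z \mid x)\big),

\tilde{\nabla}_{\phi}\mathcal{L}(\phi) = F(\phi)^{-1}\,\nabla_{\phi}\mathcal{L}(\phi), \qquad F(\phi) = \mathbb{E}_{q_{\phi}}\big[\nabla_{\phi}\log q_{\phi}(z)\,\nabla_{\phi}\log q_{\phi}(z)^{\top}\big],

so that maximising the ELBO $\mathcal{L}$ over the variational parameters $\phi$ corresponds to minimising the Kullback-Leibler divergence from the target posterior, i.e. the prime objective function referred to above.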
Leveraging arbitrary mobile sensor trajectories with shallow recurrent decoder networks for full-state reconstruction
results: 实验表明,该模型可以利用任意的传感器动态轨迹准确重建全状态空间;与固定传感器相比,该架构降低了重建均方误差的方差;并且能够对训练集之外的数据快速泛化(动态参数化)。Abstract
Sensing is one of the most fundamental tasks for the monitoring, forecasting and control of complex, spatio-temporal systems. In many applications, a limited number of sensors are mobile and move with the dynamics, with examples including wearable technology, ocean monitoring buoys, and weather balloons. In these dynamic systems (without regions of statistical independence), the measurement time history encodes a significant amount of information that can be extracted for critical tasks. Most model-free sensing paradigms aim to map current sparse sensor measurements to the high-dimensional state space, ignoring the time history altogether. Using modern deep learning architectures, we show that with a sequence-to-vector model, such as an LSTM (long short-term memory) network, combined with a decoder network, dynamic trajectory information can be mapped to full state-space estimates. Indeed, we demonstrate that by leveraging mobile sensor trajectories with shallow recurrent decoder networks, we can train the network (i) to accurately reconstruct the full state space using arbitrary dynamical trajectories of the sensors, (ii) to reduce the variance of the mean-square reconstruction error in comparison with immobile sensors, and (iii) to generalize rapidly (parameterization of dynamics) to data outside the training set. Moreover, the path of the sensor can be chosen arbitrarily, provided training data for the spatial trajectory of the sensor is available. The exceptional performance of the network architecture is demonstrated on three applications: turbulent flows, global sea-surface temperature data, and human movement biomechanics.
摘要
感测是观测、预测和控制复杂时空系统的基本任务之一。在许多应用中,数量有限的感测器是移动的,随系统动态一起运动,例如可穿戴设备、海洋监测浮标和气象气球。在这类动态系统中(不存在统计独立的区域),测量的时间历史蕴含了大量信息,可以提取用于关键任务。大多数无模型感测方法完全忽略时间历史,仅将当前稀疏的感测值映射到高维状态空间。借助现代深度学习架构,我们展示了一种序列到向量模型(如 LSTM 长短期记忆网络)结合解码器网络,可以将动态轨迹信息映射为全状态空间估计。具体而言:(一)利用移动感测器的任意动态轨迹即可准确重建全状态空间;(二)与固定感测器相比,该架构降低了重建均方误差的方差;(三)该架构还支持快速泛化(动态参数化),适用于训练集之外的数据。此外,感测器的路径可以任意选择,只要有其空间轨迹的训练数据即可。我们在三个应用中展示了该网络架构的出色表现:湍流、全球海面温度数据和人体运动生物力学。
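A minimal PyTorch sketch of a sequence-to-vector LSTM followed by a shallow decoder, in the spirit of the architecture described above; the class name, layer sizes, and dimensions are illustrative assumptions rather than the paper's configuration.

import torch
import torch.nn as nn

class ShallowRecurrentDecoder(nn.Module):
    # Maps a mobile sensor's measurement time history to a full-state estimate.
    def __init__(self, n_sensors, state_dim, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_sensors, hidden, batch_first=True)
        self.decoder = nn.Sequential(
            nn.Linear(hidden, 256), nn.ReLU(), nn.Linear(256, state_dim)
        )

    def forward(self, x):             # x: (batch, time, n_sensors)
        _, (h, _) = self.lstm(x)      # final hidden state summarizes the trajectory
        return self.decoder(h[-1])    # (batch, state_dim)

model = ShallowRecurrentDecoder(n_sensors=3, state_dim=1024)
print(model(torch.randn(8, 50, 3)).shape)   # torch.Size([8, 1024])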
On-Sensor Data Filtering using Neuromorphic Computing for High Energy Physics Experiments
results: 我们的研究表明,使用进化算法和优化的超参数,SNN可以达到约91%的信号效率,并且减少了大约一半的参数量,相比深度神经网络。Abstract
This work describes the investigation of neuromorphic computing-based spiking neural network (SNN) models used to filter data from sensor electronics in high energy physics experiments conducted at the High Luminosity Large Hadron Collider. We present our approach for developing a compact neuromorphic model that filters out the sensor data based on the particle's transverse momentum with the goal of reducing the amount of data being sent to the downstream electronics. The incoming charge waveforms are converted to streams of binary-valued events, which are then processed by the SNN. We present our insights on the various system design choices - from data encoding to optimal hyperparameters of the training algorithm - for an accurate and compact SNN optimized for hardware deployment. Our results show that an SNN trained with an evolutionary algorithm and an optimized set of hyperparameters obtains a signal efficiency of about 91% with nearly half as many parameters as a deep neural network.
摘要
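To illustrate the kind of event-driven computation involved, here is a toy leaky integrate-and-fire neuron acting on a binary event stream; the parameters, encoding, and single-neuron setup are our own illustrative assumptions, not the trained SNN from the paper.

import numpy as np

def lif_neuron(events, threshold=1.0, leak=0.9, weight=0.3):
    # Leaky integrate-and-fire: accumulate weighted input events into a membrane
    # potential, emit a spike and reset when the threshold is crossed.
    v, spikes = 0.0, []
    for e in events:
        v = leak * v + weight * float(e)
        if v >= threshold:
            spikes.append(1)
            v = 0.0
        else:
            spikes.append(0)
    return spikes

events = np.random.default_rng(1).integers(0, 2, size=20)
print(lif_neuron(events))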
results: 在模拟数据和真实数据集上,我们的方法都能准确检测网络索引多变量数据中的异常点;同时我们还表明,若不考虑依赖结构,异常检测可能产生假阳性。Abstract
We consider models for network indexed multivariate data involving a dependence between variables as well as across graph nodes. In the framework of these models, we focus on outlier detection and introduce the concept of edgewise outliers. For this purpose, we first derive the distribution of some sums of squares, in particular squared Mahalanobis distances, that can be used to set detection rules and thresholds for outlier detection. We then propose a robust version of the deterministic MCD algorithm that we call edgewise MCD. An application on simulated data shows the value of taking the dependence structure into account. We also illustrate the utility of the proposed method with a real data set.
摘要
我们考虑网络索引的多变量数据模型,其中既存在变量之间的依赖,也存在图节点之间的依赖。在这一框架下,我们关注异常检测,并引入边缘异常(edgewise outliers)的概念。为此,我们首先推导了一些平方和的分布,特别是平方马氏距离的分布,可用于设定异常检测的规则和阈值。随后,我们提出了确定性 MCD 算法的一个稳健版本,称为边缘 MCD(edgewise MCD)。在模拟数据上的应用表明考虑依赖结构的价值;此外,我们还通过一个真实数据集说明了所提方法的实用性。
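As a point of reference for the squared Mahalanobis distances and MCD mentioned above, the sketch below flags outliers in plain multivariate data with scikit-learn's robust MCD estimator and an assumed chi-square cutoff; the edgewise variant proposed in the paper operates on network edges and is not reproduced here.

import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:5] += 6.0                                    # planted outliers
mcd = MinCovDet(random_state=0).fit(X)
d2 = mcd.mahalanobis(X)                         # squared robust Mahalanobis distances
outliers = d2 > chi2.ppf(0.975, df=X.shape[1])  # assumed chi-square detection threshold
print(np.flatnonzero(outliers))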
QDC: Quantum Diffusion Convolution Kernels on Graphs
results: 在多个广泛使用的基准数据集上,与同类方法相比,QDC 能够提升预测性能。Abstract
Graph convolutional neural networks (GCNs) operate by aggregating messages over local neighborhoods given the prediction task under interest. Many GCNs can be understood as a form of generalized diffusion of input features on the graph, and significant work has been dedicated to improving predictive accuracy by altering the ways of message passing. In this work, we propose a new convolution kernel that effectively rewires the graph according to the occupation correlations of the vertices by trading on the generalized diffusion paradigm for the propagation of a quantum particle over the graph. We term this new convolution kernel the Quantum Diffusion Convolution (QDC) operator. In addition, we introduce a multiscale variant that combines messages from the QDC operator and the traditional combinatorial Laplacian. To understand our method, we explore the spectral dependence of homophily and the importance of quantum dynamics in the construction of a bandpass filter. Through these studies, as well as experiments on a range of datasets, we observe that QDC improves predictive performance on the widely used benchmark datasets when compared to similar methods.
摘要
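A rough numerical sketch of the quantum-diffusion idea on a tiny graph: propagate a quantum particle under the graph Laplacian and use the resulting transition probabilities as a rewired, weighted adjacency. The propagation time and sparsification threshold are arbitrary illustrative choices, and this is not the paper's QDC operator.

import numpy as np
from scipy.linalg import expm

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)       # path graph on 4 vertices
L = np.diag(A.sum(axis=1)) - A                  # combinatorial Laplacian
U = expm(-1j * 2.0 * L)                         # unitary propagator exp(-i t L), t = 2
K = np.abs(U) ** 2                              # vertex-to-vertex occupation probabilities
K[K < 0.05] = 0.0                               # sparsify to obtain a rewired graph
print(np.round(K, 3))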
results: 论文在多个应用中得到了改进的保证,特别是在随机凸优化(SCO)等问题上。针对不同的损失函数,论文给出了不同的忘记算法:对于光滑 Lipschitz 损失函数和任意 $\rho>0$,忘记算法的超额总体风险为 $\tilde O\big(\frac{1}{\sqrt{n}}+\frac{\sqrt{d}}{n\rho}\big)$,忘记查询(梯度)复杂度为 $\tilde O(\rho \cdot \text{Retraining Complexity})$。Abstract
We formalize the problem of machine unlearning as design of efficient unlearning algorithms corresponding to learning algorithms which perform a selection of adaptive queries from structured query classes. We give efficient unlearning algorithms for linear and prefix-sum query classes. As applications, we show that unlearning in many problems, in particular, stochastic convex optimization (SCO), can be reduced to the above, yielding improved guarantees for the problem. In particular, for smooth Lipschitz losses and any $\rho>0$, our results yield an unlearning algorithm with excess population risk of $\tilde O\big(\frac{1}{\sqrt{n}}+\frac{\sqrt{d}}{n\rho}\big)$ with unlearning query (gradient) complexity $\tilde O(\rho \cdot \text{Retraining Complexity})$, where $d$ is the model dimensionality and $n$ is the initial number of samples. For non-smooth Lipschitz losses, we give an unlearning algorithm with excess population risk $\tilde O\big(\frac{1}{\sqrt{n}}+\big(\frac{\sqrt{d}}{n\rho}\big)^{1/2}\big)$ with the same unlearning query (gradient) complexity. Furthermore, in the special case of Generalized Linear Models (GLMs), such as those in linear and logistic regression, we get dimension-independent rates of $\tilde O\big(\frac{1}{\sqrt{n}} +\frac{1}{(n\rho)^{2/3}}\big)$ and $\tilde O\big(\frac{1}{\sqrt{n}} +\frac{1}{(n\rho)^{1/3}}\big)$ for smooth Lipschitz and non-smooth Lipschitz losses respectively. Finally, we give generalizations of the above from one unlearning request to \textit{dynamic} streams consisting of insertions and deletions.
摘要
我们将机器忘记问题形式化为:针对从结构化查询类中自适应选择查询的学习算法,设计高效的忘记算法。我们给出了线性查询类和前缀和查询类的高效忘记算法。作为应用,我们证明许多问题中的忘记(特别是随机凸优化,SCO)可以归约为上述问题,从而获得改进的保证。具体而言,对于光滑 Lipschitz 损失函数和任意 $\rho>0$,我们的结果给出一个忘记算法,其超额总体风险为 $\tilde O\big(\frac{1}{\sqrt{n}}+\frac{\sqrt{d}}{n\rho}\big)$,忘记查询(梯度)复杂度为 $\tilde O(\rho \cdot \text{Retraining Complexity})$,其中 $d$ 为模型维度,$n$ 为初始样本数。对于非光滑 Lipschitz 损失函数,我们给出一个超额总体风险为 $\tilde O\big(\frac{1}{\sqrt{n}}+\big(\frac{\sqrt{d}}{n\rho}\big)^{1/2}\big)$ 的忘记算法,其忘记查询(梯度)复杂度与上相同。特别地,在广义线性模型(GLM,例如线性回归和逻辑回归)的特殊情形下,我们得到与维度无关的速率:光滑 Lipschitz 损失为 $\tilde O\big(\frac{1}{\sqrt{n}} +\frac{1}{(n\rho)^{2/3}}\big)$,非光滑 Lipschitz 损失为 $\tilde O\big(\frac{1}{\sqrt{n}} +\frac{1}{(n\rho)^{1/3}}\big)$。最后,我们将上述结果从单次忘记请求推广到由插入和删除组成的动态流。
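To give a feel for why structured query classes admit cheap unlearning, here is a toy example (our own illustration, not the paper's SCO algorithms) of exact unlearning for a linear, sum-type query, where deleting a sample only requires subtracting its contribution rather than retraining from scratch.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
stat = X.sum(axis=0)                       # learned statistic (a linear query)
deleted = X[17]
stat_after_unlearning = stat - deleted     # O(d) update instead of full retraining
print(np.allclose(stat_after_unlearning, np.delete(X, 17, axis=0).sum(axis=0)))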
Quantum Convolutional Neural Networks with Interaction Layers for Classification of Classical Data
paper_authors: Jishnu Mahmud, Raisa Mashtura, Shaikh Anowarul Fattah, Mohammad Saquib
for: 研究多量子比特交互对量子神经网络的影响,以提升其表达能力和纠缠能力。
methods: 提出了一种量子卷积网络,利用三量子比特交互层来提升网络的表达能力和纠缠能力。
results: 在 MNIST、Fashion MNIST 和鸢尾花(Iris)数据集上进行了二分类和多分类实验,并与现有最先进方法比较,结果表明该方法的性能超越了现有最先进方法。Abstract
Quantum Machine Learning (QML) has come into the limelight due to the exceptional computational abilities of quantum computers. With the promises of near error-free quantum computers in the not-so-distant future, it is important that the effect of multi-qubit interactions on quantum neural networks is studied extensively. This paper introduces a Quantum Convolutional Network with novel Interaction layers exploiting three-qubit interactions increasing the network's expressibility and entangling capability, for classifying both image and one-dimensional data. The proposed approach is tested on three publicly available datasets namely MNIST, Fashion MNIST, and Iris datasets, to perform binary and multiclass classifications and is found to supersede the performance of the existing state-of-the-art methods.
摘要
量子机器学习(QML)因量子计算机卓越的计算能力而受到关注。随着近乎无差错的量子计算机在不远的将来有望实现,深入研究多量子比特交互对量子神经网络的影响十分重要。本文提出了一种带有新型交互层的量子卷积网络,利用三量子比特交互来提升网络的表达能力和纠缠能力,可用于图像数据和一维数据的分类。该方法在 MNIST、Fashion MNIST 和鸢尾花(Iris)三个公开数据集上进行了二分类和多分类测试,结果表明其性能超越了现有最先进方法。
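For intuition about what a three-qubit interaction can look like, the sketch below builds the parameterized unitary exp(-i*theta*ZZZ) directly from Pauli matrices; this is a generic entangling operation chosen for illustration, not the circuit used in the paper.

import numpy as np

Z = np.diag([1.0, -1.0])
ZZZ = np.kron(np.kron(Z, Z), Z)                  # three-qubit Pauli-Z interaction
theta = 0.3
# Since ZZZ squares to the identity, exp(-i*theta*ZZZ) = cos(theta)*I - i*sin(theta)*ZZZ
U = np.cos(theta) * np.eye(8) - 1j * np.sin(theta) * ZZZ
state = np.zeros(8, dtype=complex)
state[0] = 1.0                                   # |000> basis state
print(np.round(U @ state, 3))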
Jina Embeddings: A Novel Set of High-Performance Sentence Embedding Models
methods: 本研究构建了高质量的成对(pairwise)与三元组(triplet)数据集用于模型训练,并强调了数据清洗在数据集准备中的关键作用;同时,作者还构建了一个由否定与非否定语句组成的新数据集,以提高模型对否定语义的识别能力。
results: 本研究通过 Massive Textual Embedding Benchmark(MTEB)进行了广泛的性能评估,取得了优异的结果;此外,作者还公开发布了否定与非否定语句数据集,可用于提升模型识别否定语义的能力。Abstract
Jina Embeddings constitutes a set of high-performance sentence embedding models adept at translating various textual inputs into numerical representations, thereby capturing the semantic essence of the text. The models excel in applications such as dense retrieval and semantic textual similarity. This paper details the development of Jina Embeddings, starting with the creation of high-quality pairwise and triplet datasets. It underlines the crucial role of data cleaning in dataset preparation, gives in-depth insights into the model training process, and concludes with a comprehensive performance evaluation using the Massive Textual Embedding Benchmark (MTEB). To increase the model's awareness of negations, we constructed a novel training and evaluation dataset of negated and non-negated statements, which we make publicly available to the community.
摘要
FairMobi-Net: A Fairness-aware Deep Learning Model for Urban Mobility Flow Generation
results: 在四个美国城市的人流数据上进行了详细验证,结果表明 FairMobi-Net 模型在区域间人流预测中兼具更高的准确性和公平性,且无论区域收入差异如何,都能在不同区域组合上保持稳定的预测表现。Abstract
Generating realistic human flows across regions is essential for our understanding of urban structures and population activity patterns, enabling important applications in the fields of urban planning and management. However, a notable shortcoming of most existing mobility generation methodologies is neglect of prediction fairness, which can result in underestimation of mobility flows across regions with vulnerable population groups, potentially resulting in inequitable resource distribution and infrastructure development. To overcome this limitation, our study presents a novel, fairness-aware deep learning model, FairMobi-Net, for inter-region human flow prediction. The FairMobi-Net model uniquely incorporates fairness loss into the loss function and employs a hybrid approach, merging binary classification and numerical regression techniques for human flow prediction. We validate the FairMobi-Net model using comprehensive human mobility datasets from four U.S. cities, predicting human flow at the census-tract level. Our findings reveal that the FairMobi-Net model outperforms state-of-the-art models (such as the DeepGravity model) in producing more accurate and equitable human flow predictions across a variety of region pairs, regardless of regional income differences. The model maintains a high degree of accuracy consistently across diverse regions, addressing the previous fairness concern. Further analysis of feature importance elucidates the impact of physical distances and road network structures on human flows across regions. With fairness as its touchstone, the model and results provide researchers and practitioners across the fields of urban sciences, transportation engineering, and computing with an effective tool for accurate generation of human mobility flows across regions.
摘要
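As a rough sketch of how a fairness term can enter the training objective, the function below adds a penalty on the gap in mean prediction error between two groups of region pairs (for example, split by income); FairMobi-Net's actual fairness loss and its hybrid classification/regression head may differ, and the variable names are assumptions.

import torch
import torch.nn.functional as F

def fairness_aware_loss(pred, target, group, lam=0.5):
    # Standard regression loss plus a penalty on the difference in mean absolute
    # error between group 0 and group 1 region pairs.
    mse = F.mse_loss(pred, target)
    err = (pred - target).abs()
    gap = (err[group == 0].mean() - err[group == 1].mean()).abs()
    return mse + lam * gap

pred, target = torch.rand(10), torch.rand(10)
group = torch.tensor([0, 1] * 5)
print(fairness_aware_loss(pred, target, group))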
The Effect of Epidemiological Cohort Creation on the Machine Learning Prediction of Homelessness and Police Interaction Outcomes Using Administrative Health Care Data
paper_authors: Faezehsadat Shahidi, M. Ethan MacDonald, Dallas Seitz, Geoffrey Messier
For: The paper aims to identify factors associated with initial homelessness and police interaction among individuals with addiction or mental health (AMH) diagnoses, and to evaluate the performance of different predictive models using flexible and fixed observation windows.* Methods: The study uses an administrative healthcare dataset from Calgary, Alberta, Canada, comprising 240,219 individuals diagnosed with AMH between April 1, 2013, and March 31, 2018. The cohort is followed for 2 years to identify factors associated with homelessness and police interactions. The authors compare the performance of logistic regression (LR) and machine learning (ML) models, including random forests (RF) and extreme gradient boosting (XGBoost), in two cohorts with fixed and flexible observation windows.* Results: The study finds that male sex, substance disorder, psychiatrist visits, and drug abuse are associated with initial homelessness and police interaction. The authors also demonstrate that XGBoost shows superior performance using the flexible method, with sensitivity and AUC values of 91% and 90%, respectively, for initial homelessness, and 90% and 89%, respectively, for initial police interaction.Abstract
Background: Mental illness can lead to adverse outcomes such as homelessness and police interaction and understanding of the events leading up to these adverse outcomes is important. Predictive models may help identify individuals at risk of such adverse outcomes. Using a fixed observation window cohort with logistic regression (LR) or machine learning (ML) models can result in lower performance when compared with adaptive and parcellated windows. Method: An administrative healthcare dataset was used, comprising of 240,219 individuals in Calgary, Alberta, Canada who were diagnosed with addiction or mental health (AMH) between April 1, 2013, and March 31, 2018. The cohort was followed for 2 years to identify factors associated with homelessness and police interactions. To understand the benefit of flexible windows to predictive models, an alternative cohort was created. Then LR and ML models, including random forests (RF), and extreme gradient boosting (XGBoost) were compared in the two cohorts. Results: Among 237,602 individuals, 0.8% (1,800) experienced first homelessness, while 0.32% (759) reported initial police interaction among 237,141 individuals. Male sex (AORs: H=1.51, P=2.52), substance disorder (AORs: H=3.70, P=2.83), psychiatrist visits (AORs: H=1.44, P=1.49), and drug abuse (AORs: H=2.67, P=1.83) were associated with initial homelessness (H) and police interaction (P). XGBoost showed superior performance using the flexible method (sensitivity =91%, AUC =90% for initial homelessness, and sensitivity =90%, AUC=89% for initial police interaction) Conclusion: This study identified key features associated with initial homelessness and police interaction and demonstrated that flexible windows can improve predictive modeling.
摘要
背景:心理疾病可能导致无家可归和与警察接触等不良结局,了解导致这些不良结局的事件过程十分重要。预测模型可以帮助识别面临此类风险的个体。与灵活、分段的观察窗口相比,使用固定观察窗口队列配合逻辑回归(LR)或机器学习(ML)模型可能导致性能下降。方法:使用一个行政医疗数据集,包含 2013 年 4 月 1 日至 2018 年 3 月 31 日期间在加拿大阿尔伯塔省卡尔加里市被诊断为成瘾或心理健康(AMH)问题的 240,219 名个体。该队列被随访 2 年,以识别与无家可归和与警察接触相关的因素。为了理解灵活窗口对预测模型的益处,我们另外构建了一个对照队列,并在两个队列中比较了 LR 和 ML 模型(包括随机森林 RF 和极端梯度提升 XGBoost)。结果:在 237,602 名个体中,0.8%(1,800 人)首次经历无家可归;在 237,141 名个体中,0.32%(759 人)首次与警察接触。男性(AORs:H=1.51,P=2.52)、物质使用障碍(AORs:H=3.70,P=2.83)、精神科就诊(AORs:H=1.44,P=1.49)和药物滥用(AORs:H=2.67,P=1.83)与首次无家可归(H)和首次与警察接触(P)相关。XGBoost 在灵活方法下表现最佳(首次无家可归:敏感性 91%,AUC 90%;首次与警察接触:敏感性 90%,AUC 89%)。结论:本研究确定了与首次无家可归和首次与警察接触相关的关键特征,并证明灵活窗口可以改进预测建模。
methods: 该方法采用前瞻式(prospective)主动学习:在临床试验采集数据的过程中,以图像的采集时间为条件,从而维持 i.i.d. 假设。
results: 与传统(回顾式)主动学习范式相比,前瞻式主动学习在两种不同的测试设置中均表现出更好的性能。Abstract
This paper presents a novel approach to active learning that takes into account the non-independent and identically distributed (non-i.i.d.) structure of a clinical trial setting. There exist two types of clinical trials: retrospective and prospective. Retrospective clinical trials analyze data after treatment has been performed; prospective clinical trials collect data as treatment is ongoing. Typically, active learning approaches assume the dataset is i.i.d. when selecting training samples; however, in the case of clinical trials, treatment results in a dependency between the data collected at the current and past visits. Thus, we propose prospective active learning to overcome the limitations present in traditional active learning methods and apply it to disease detection in optical coherence tomography (OCT) images, where we condition on the time an image was collected to enforce the i.i.d. assumption. We compare our proposed method to the traditional active learning paradigm, which we refer to as retrospective in nature. We demonstrate that prospective active learning outperforms retrospective active learning in two different types of test settings.
摘要
Heuristic Hyperparameter Choice for Image Anomaly Detection
results: 经过 NPCA 降维后,在尽量减少特征分量数量的同时保持了良好的图像异常检测性能;NPCA 算法超参数的选择对性能有一定影响。Abstract
Anomaly detection (AD) in images is a fundamental computer vision problem in which deep neural networks are used to identify images deviating significantly from normality. The deep features extracted from pretrained models have been proved to be essential for AD based on multivariate Gaussian distribution analysis. However, since models are usually pretrained on a large dataset for classification tasks such as ImageNet, they might produce lots of redundant features for AD, which increases computational cost and degrades the performance. We aim to reduce the dimensionality of these features with Negated Principal Component Analysis (NPCA), and propose heuristics for choosing the hyperparameters of the NPCA algorithm so as to keep as few feature components as possible while ensuring good performance.
摘要
图像异常检测(AD)是计算机视觉的一个基本问题,利用深度神经网络识别明显偏离正常分布的图像。从预训练模型中提取的深度特征已被证明是基于多元高斯分布分析的异常检测的关键。然而,由于模型通常是在 ImageNet 等大规模分类数据集上预训练的,它们可能为异常检测产生大量冗余特征,从而增加计算成本并降低性能。我们希望利用负主成分分析(NPCA)对这些特征进行降维,并提出了一些启发式方法来选择 NPCA 算法的超参数,以便在保证良好性能的同时尽量减少特征分量的数量。
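A small sketch of the "negated PCA" idea as we understand it: project pretrained features onto the lowest-variance principal directions, which are often the informative ones for Gaussian-based anomaly scoring. The 0.9 cutoff stands in for the hyperparameter the heuristics above aim to choose and is not a value from the paper.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
feats = rng.normal(size=(500, 64))                # stand-in for pretrained deep features
pca = PCA().fit(feats)
cum_var = np.cumsum(pca.explained_variance_ratio_)
keep = cum_var > 0.9                              # negate: discard the top-variance part
reduced = pca.transform(feats)[:, keep]
print(reduced.shape)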
Exploring reinforcement learning techniques for discrete and continuous control tasks in the MuJoCo environment
results: 在大量回合下,Q-learning 的表现优于 SARSA,但 DDPG 在较少回合内就超过了两者,并能很快达到不错的平均奖励。此外,文章还对模型超参数进行了微调,并预计在更多时间和计算资源下可以进一步提高性能。Abstract
We leverage the fast physics simulator, MuJoCo, to run tasks in a continuous control environment and reveal details like the observation space, action space, rewards, etc. for each task. We benchmark value-based methods for continuous control by comparing Q-learning and SARSA through a discretization approach, and using them as baselines, progressively moving into one of the state-of-the-art deep policy gradient methods, DDPG. Over a large number of episodes, Q-learning outscored SARSA, but DDPG outperformed both in a small number of episodes. Lastly, we also fine-tuned the model hyper-parameters expecting to squeeze more performance while using less time and resources. We anticipated that the new design for DDPG would vastly improve performance, yet after only a few episodes, we were able to achieve decent average rewards. We expect to improve the performance provided adequate time and computational resources.
摘要
我们利用快速物理模拟器 MuJoCo 在连续控制环境中运行任务,并给出每个任务的观测空间、动作空间、奖励等细节。我们通过离散化方法比较了基于价值的 Q-learning 和 SARSA,并以它们为基线,逐步过渡到最先进的深度策略梯度方法之一 DDPG。在大量回合下,Q-learning 的表现优于 SARSA,但 DDPG 在较少回合内就超过了两者。最后,我们还对模型超参数进行了微调,希望用更少的时间和资源获得更高的性能。我们原本预计新的 DDPG 设计会大幅提升性能,而实际上只需几个回合就达到了不错的平均奖励。我们预计在充足的时间和计算资源下,性能还能进一步提升。
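For concreteness, the value-based baseline referred to above boils down to a tabular update of the following form once states and actions are discretized; the bin counts, alpha, and gamma here are illustrative placeholders, not values from the paper.

import numpy as np

n_states, n_actions = 20, 5
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.99

def q_learning_update(s, a, r, s_next):
    # Off-policy TD update toward the greedy value of the next (discretized) state.
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

q_learning_update(s=3, a=2, r=1.0, s_next=4)
print(Q[3])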
results: 研究发现,该检测器在检测较弱形式的量子关联——量子失谐(quantum discord)时表现很好,而在检测量子纠缠(entanglement)时则不如预期:即使在最优检测阈值下,它也会严重高估纠缠态集合,而对失谐态集合的低估程度要小得多。Abstract
We build a machine learning model to detect correlations in a three-qubit system using a neural network trained in an unsupervised manner on randomly generated states. The network is forced to recognize separable states, and correlated states are detected as anomalies. Quite surprisingly, we find that the proposed detector performs much better at distinguishing a weaker form of quantum correlations, namely, the quantum discord, than entanglement. In fact, it has a tendency to grossly overestimate the set of entangled states even at the optimal threshold for entanglement detection, while it underestimates the set of discordant states to a much lesser extent. In order to illustrate the nature of states classified as quantum-correlated, we construct a diagram containing various types of states -- entangled, as well as separable, both discordant and non-discordant. We find that the near-zero value of the recognition loss reproduces the shape of the non-discordant separable states with high accuracy, especially considering the non-trivial shape of this set on the diagram. The network architecture is designed carefully: it preserves separability, and its output is equivariant with respect to qubit permutations. We show that the choice of architecture is important to get the highest detection accuracy, much better than for a baseline model that just utilizes a partial trace operation.
摘要
我们建立了一个机器学习模型,用于检测三量子比特系统中的量子关联:使用在随机生成的量子态上以无监督方式训练的神经网络。网络被强制学习识别可分态,而关联态被作为异常检测出来。令人惊讶的是,我们发现所提出的检测器在区分较弱形式的量子关联——量子失谐——方面的表现远好于纠缠:即使在纠缠检测的最优阈值下,它也会严重高估纠缠态集合,而对失谐态集合的低估程度要小得多。为了说明被判定为量子关联的态的性质,我们构建了一个包含多种类型态的示意图,其中既有纠缠态,也有可分态(含失谐与非失谐两类)。我们发现,识别损失接近零的区域以很高的精度复现了非失谐可分态集合的形状,尤其考虑到该集合在图中的形状并不平凡。网络结构经过精心设计:它保持可分性,且其输出对量子比特置换是等变的。我们证明,这种结构选择对获得最高检测精度十分重要,其效果远好于仅使用偏迹(partial trace)运算的基线模型。
paper_authors: Yanshu Zhang, Shichong Peng, Alireza Moazeni, Ke Li
for: 从零开始学习场景表面的准确而简洁的点云表示。
methods: 我们提出了 Proximity Attention Point Rendering(PAPR)方法,它由基于点的场景表示和可微分渲染器组成。点云表示中,每个点由其空间位置、前景得分和与视角无关的特征向量来刻画;渲染器为每条光线选取相关的点,并利用其关联特征生成准确的颜色。
results: PAPR 能够有效学习点云位置以表示正确的场景几何,即使初始化与目标几何差异很大;同时仅用简洁的点集就能捕捉精细的纹理细节。我们还展示了四种实用应用:几何编辑、物体操作、纹理迁移和曝光控制。更多结果和代码见项目网站:https://zvict.github.io/papr/。Abstract
Learning accurate and parsimonious point cloud representations of scene surfaces from scratch remains a challenge in 3D representation learning. Existing point-based methods often suffer from the vanishing gradient problem or require a large number of points to accurately model scene geometry and texture. To address these limitations, we propose Proximity Attention Point Rendering (PAPR), a novel method that consists of a point-based scene representation and a differentiable renderer. Our scene representation uses a point cloud where each point is characterized by its spatial position, foreground score, and view-independent feature vector. The renderer selects the relevant points for each ray and produces accurate colours using their associated features. PAPR effectively learns point cloud positions to represent the correct scene geometry, even when the initialization drastically differs from the target geometry. Notably, our method captures fine texture details while using only a parsimonious set of points. We also demonstrate four practical applications of our method: geometry editing, object manipulation, texture transfer, and exposure control. More results and code are available on our project website at https://zvict.github.io/papr/.
摘要
从零开始学习场景表面的准确而简洁的点云表示,仍然是三维表示学习中的一个挑战。现有的基于点的方法常常受到梯度消失问题的困扰,或者需要大量的点才能准确建模场景的几何与纹理。为了解决这些局限,我们提出了 Proximity Attention Point Rendering(PAPR),一种由基于点的场景表示和可微分渲染器组成的新方法。我们的场景表示使用一个点云,其中每个点由其空间位置、前景得分和与视角无关的特征向量来刻画。渲染器为每条光线选取相关的点,并利用其关联特征生成准确的颜色。PAPR 能够有效学习点云位置以表示正确的场景几何,即使初始化与目标几何差异很大。值得注意的是,我们的方法仅用简洁的点集就能捕捉精细的纹理细节。我们还展示了该方法的四种实用应用:几何编辑、物体操作、纹理迁移和曝光控制。更多结果和代码见项目网站 https://zvict.github.io/papr/。
Representation Learning in Anomaly Detection: Successes, Limits and a Grand Challenge
results: 文章提出了两个异常检测的重大挑战课题:一是通过异常检测实现科学发现,二是在 ImageNet 数据集中检测最异常图像的“迷你”挑战。文章认为,需要开发新的异常检测工具和思路来应对这些挑战。Abstract
In this perspective paper, we argue that the dominant paradigm in anomaly detection cannot scale indefinitely and will eventually hit fundamental limits. This is due to a no-free-lunch principle for anomaly detection. These limitations can be overcome when there are strong task priors, as is the case for many industrial tasks. When such priors do not exist, the task is much harder for anomaly detection. We pose two such tasks as grand challenges for anomaly detection: i) scientific discovery by anomaly detection ii) a "mini-grand" challenge of detecting the most anomalous image in the ImageNet dataset. We believe new anomaly detection tools and ideas would need to be developed to overcome these challenges.
摘要
GLSFormer: Gated - Long, Short Sequence Transformer for Step Recognition in Surgical Videos
methods: 我们的方法使用视觉 Transformer 直接从帧级图像块序列中联合学习时空特征,并提出了一种门控时间注意力机制,能够智能地结合短期与长期的时空特征表示。
results: 我们在 Cataract-101 和 D99 两个白内障手术视频数据集上进行了广泛评估,结果表明所提方法优于多种最先进方法,验证了其适用于自动化手术步骤识别。Abstract
Automated surgical step recognition is an important task that can significantly improve patient safety and decision-making during surgeries. Existing state-of-the-art methods for surgical step recognition either rely on separate, multi-stage modeling of spatial and temporal information or operate on short-range temporal resolution when learned jointly. However, the benefits of joint modeling of spatio-temporal features and long-range information are not taken in account. In this paper, we propose a vision transformer-based approach to jointly learn spatio-temporal features directly from sequence of frame-level patches. Our method incorporates a gated-temporal attention mechanism that intelligently combines short-term and long-term spatio-temporal feature representations. We extensively evaluate our approach on two cataract surgery video datasets, namely Cataract-101 and D99, and demonstrate superior performance compared to various state-of-the-art methods. These results validate the suitability of our proposed approach for automated surgical step recognition. Our code is released at: https://github.com/nisargshah1999/GLSFormer
摘要
自动化手术步骤识别是一项重要任务,可以显著提高手术过程中的患者安全和决策质量。现有的最先进手术步骤识别方法,要么将空间信息和时间信息分开进行多阶段建模,要么在联合学习时仅依赖短程时间分辨率,而没有充分利用时空特征联合建模与长程信息的优势。在本文中,我们提出一种基于视觉 Transformer 的方法,直接从帧级图像块序列中联合学习时空特征。我们的方法包含一种门控时间注意力机制,能够智能地结合短期与长期的时空特征表示。我们在 Cataract-101 和 D99 两个白内障手术视频数据集上进行了广泛评估,结果显示其性能优于多种最先进方法,证明了所提方法适用于自动化手术步骤识别。代码发布于:https://github.com/nisargshah1999/GLSFormer
Brain2Music: Reconstructing Music from Human Brain Activity
results: 研究发现,该方法重构出的音乐在体裁、乐器编配和情绪等语义属性上与受试者所听到的音乐刺激相似。此外,研究还考察了哪些脑区表征仅由音乐刺激的文本描述所衍生的信息。更多重构示例见 https://google-research.github.io/seanet/brain2music。Abstract
The process of reconstructing experiences from human brain activity offers a unique lens into how the brain interprets and represents the world. In this paper, we introduce a method for reconstructing music from brain activity, captured using functional magnetic resonance imaging (fMRI). Our approach uses either music retrieval or the MusicLM music generation model conditioned on embeddings derived from fMRI data. The generated music resembles the musical stimuli that human subjects experienced, with respect to semantic properties like genre, instrumentation, and mood. We investigate the relationship between different components of MusicLM and brain activity through a voxel-wise encoding modeling analysis. Furthermore, we discuss which brain regions represent information derived from purely textual descriptions of music stimuli. We provide supplementary material including examples of the reconstructed music at https://google-research.github.io/seanet/brain2music
摘要
从人脑活动中重构体验的过程,为研究大脑如何理解和表征世界提供了独特的视角。在本文中,我们介绍一种利用功能性磁共振成像(fMRI)记录的脑活动来重构音乐的方法。我们的方法或者进行音乐检索,或者使用以 fMRI 数据所得嵌入为条件的 MusicLM 音乐生成模型。生成的音乐在体裁、乐器编配和情绪等语义属性上与受试者所听到的音乐刺激相似。我们通过逐体素编码建模分析,研究了 MusicLM 不同组成部分与脑活动之间的关系。此外,我们还讨论了哪些脑区表征仅由音乐刺激的文本描述所衍生的信息。我们提供了补充材料,包括重构音乐的示例,见 https://google-research.github.io/seanet/brain2music。
AlignDet: Aligning Pre-training and Fine-tuning in Object Detection
results: 广泛的实验表明,AlignDet 能在多种设置(检测算法、模型骨干、数据设定和训练计划等)下取得显著提升。如图 1 所示,在更少的训练轮数下,AlignDet 将 FCOS 提升 5.3 mAP、RetinaNet 提升 2.1 mAP、Faster R-CNN 提升 3.3 mAP、DETR 提升 2.3 mAP。Abstract
The paradigm of large-scale pre-training followed by downstream fine-tuning has been widely employed in various object detection algorithms. In this paper, we reveal discrepancies in data, model, and task between the pre-training and fine-tuning procedure in existing practices, which implicitly limit the detector's performance, generalization ability, and convergence speed. To this end, we propose AlignDet, a unified pre-training framework that can be adapted to various existing detectors to alleviate the discrepancies. AlignDet decouples the pre-training process into two stages, i.e., image-domain and box-domain pre-training. The image-domain pre-training optimizes the detection backbone to capture holistic visual abstraction, and box-domain pre-training learns instance-level semantics and task-aware concepts to initialize the parts out of the backbone. By incorporating the self-supervised pre-trained backbones, we can pre-train all modules for various detectors in an unsupervised paradigm. As depicted in Figure 1, extensive experiments demonstrate that AlignDet can achieve significant improvements across diverse protocols, such as detection algorithm, model backbone, data setting, and training schedule. For example, AlignDet improves FCOS by 5.3 mAP, RetinaNet by 2.1 mAP, Faster R-CNN by 3.3 mAP, and DETR by 2.3 mAP under fewer epochs.
摘要
“大规模预训练 + 下游微调”的范式已被广泛应用于各种目标检测算法。在本文中,我们揭示了现有做法中预训练与微调过程之间在数据、模型和任务上的不一致,这些不一致隐式地限制了检测器的性能、泛化能力和收敛速度。为此,我们提出了 AlignDet,一个可适配于各种现有检测器的统一预训练框架,用以缓解上述不一致。AlignDet 将预训练过程解耦为两个阶段:图像域预训练和边界框域预训练。图像域预训练优化检测骨干网络以捕获整体视觉抽象;边界框域预训练学习实例级语义和任务相关概念,用于初始化骨干网络之外的部分。通过引入自监督预训练的骨干网络,我们可以在无监督范式下对各种检测器的所有模块进行预训练。如图 1 所示,大量实验表明,AlignDet 在不同的检测算法、模型骨干、数据设置和训练计划下都能带来显著提升,例如在更少的训练轮数下,AlignDet 将 FCOS 提升 5.3 mAP,RetinaNet 提升 2.1 mAP,Faster R-CNN 提升 3.3 mAP,DETR 提升 2.3 mAP。
Effectiveness and predictability of in-network storage cache for scientific workflows
results: 该论文通过分析约 3TB 的运维日志发现,该区域缓存系统可将 67.6% 的文件请求从广域网中移除,并使广域网流量平均每天减少 12.3TB(即 35.4%)。由于数据访问模式的差异,缓存系统采用了在处理大文件时避免逐出小文件的策略。此外,论文还建立了一个能准确预测缓存行为的机器学习模型,可用于未来的资源配置与规划研究。Abstract
Large scientific collaborations often have multiple scientists accessing the same set of files while doing different analyses, which create repeated accesses to the large amounts of shared data located far away. These data accesses have long latency due to distance and occupy the limited bandwidth available over the wide-area network. To reduce the wide-area network traffic and the data access latency, regional data storage caches have been installed as a new networking service. To study the effectiveness of such a cache system in scientific applications, we examine the Southern California Petabyte Scale Cache for a high-energy physics experiment. By examining about 3TB of operational logs, we show that this cache removed 67.6% of file requests from the wide-area network and reduced the traffic volume on wide-area network by 12.3TB (or 35.4%) an average day. The reduction in the traffic volume (35.4%) is less than the reduction in file counts (67.6%) because the larger files are less likely to be reused. Due to this difference in data access patterns, the cache system has implemented a policy to avoid evicting smaller files when processing larger files. We also build a machine learning model to study the predictability of the cache behavior. Tests show that this model is able to accurately predict the cache accesses, cache misses, and network throughput, making the model useful for future studies on resource provisioning and planning.
摘要
大型科学合作项目中,常有多位科学家在进行不同分析时访问同一组文件,从而对存放在远端的大量共享数据产生重复访问。这些数据访问由于距离远而延迟较高,并占用广域网上有限的带宽。为了减少广域网流量和数据访问延迟,区域数据存储缓存已被部署为一种新的网络服务。为研究这种缓存系统在科学应用中的有效性,我们考察了服务于某高能物理实验的南加州 PB 级缓存系统。通过分析约 3TB 的运维日志,我们发现该缓存将 67.6% 的文件请求从广域网中移除,并使广域网流量平均每天减少 12.3TB(即 35.4%)。流量的减少比例(35.4%)低于文件数的减少比例(67.6%),这是因为较大的文件更不容易被重复使用。鉴于这种数据访问模式上的差异,缓存系统实施了在处理大文件时避免逐出小文件的策略。我们还建立了一个机器学习模型来研究缓存行为的可预测性。测试表明,该模型能够准确预测缓存访问量、缓存未命中和网络吞吐量,可用于未来的资源配置与规划研究。
results: 研究发现,GPT-4 可以快速、高效地生成攻击 AI-Guardian 防御方案的代码,在某些情况下其生成代码的速度甚至快于本文作者本人。Abstract
Large language models (LLMs) are now highly capable at a diverse range of tasks. This paper studies whether or not GPT-4, one such LLM, is capable of assisting researchers in the field of adversarial machine learning. As a case study, we evaluate the robustness of AI-Guardian, a recent defense to adversarial examples published at IEEE S&P 2023, a top computer security conference. We completely break this defense: the proposed scheme does not increase robustness compared to an undefended baseline. We write none of the code to attack this model, and instead prompt GPT-4 to implement all attack algorithms following our instructions and guidance. This process was surprisingly effective and efficient, with the language model at times producing code from ambiguous instructions faster than the author of this paper could have done. We conclude by discussing (1) the warning signs present in the evaluation that suggested to us AI-Guardian would be broken, and (2) our experience with designing attacks and performing novel research using the most recent advances in language modeling.
摘要
大型语言模型(LLM)如今已能胜任多种多样的任务。本文研究 GPT-4 这类 LLM 能否协助对抗性机器学习领域的研究者。作为案例研究,我们评估了 AI-Guardian(一种发表于顶级计算机安全会议 IEEE S&P 2023 的对抗样本防御方案)的鲁棒性,并完全攻破了该防御:所提方案相比无防御基线并未提升鲁棒性。我们没有编写任何攻击代码,而是根据我们的指令和指导,提示 GPT-4 实现所有攻击算法。这一过程出人意料地高效:GPT-4 有时能比本文作者更快地根据含混的指令生成代码。最后,我们讨论了 (1) 评估过程中提示我们 AI-Guardian 将被攻破的警示信号,以及 (2) 利用语言建模的最新进展来设计攻击并开展新研究的经验。
Breadcrumbs to the Goal: Goal-Conditioned Exploration from Human-in-the-Loop Feedback
results: 在仿真和真实世界中,利用人类反馈引导探索,可以学习多种复杂的多阶段机器人导航和操作任务,而无需手动设计奖励函数或探索奖励。Abstract
Exploration and reward specification are fundamental and intertwined challenges for reinforcement learning. Solving sequential decision-making tasks requiring expansive exploration requires either careful design of reward functions or the use of novelty-seeking exploration bonuses. Human supervisors can provide effective guidance in the loop to direct the exploration process, but prior methods to leverage this guidance require constant synchronous high-quality human feedback, which is expensive and impractical to obtain. In this work, we present a technique called Human Guided Exploration (HuGE), which uses low-quality feedback from non-expert users that may be sporadic, asynchronous, and noisy. HuGE guides exploration for reinforcement learning not only in simulation but also in the real world, all without meticulous reward specification. The key concept involves bifurcating human feedback and policy learning: human feedback steers exploration, while self-supervised learning from the exploration data yields unbiased policies. This procedure can leverage noisy, asynchronous human feedback to learn policies with no hand-crafted reward design or exploration bonuses. HuGE is able to learn a variety of challenging multi-stage robotic navigation and manipulation tasks in simulation using crowdsourced feedback from non-expert users. Moreover, this paradigm can be scaled to learning directly on real-world robots, using occasional, asynchronous feedback from human supervisors.
摘要
paper_authors: David Abel, André Barreto, Benjamin Van Roy, Doina Precup, Hado van Hasselt, Satinder Singh
for: 本研究旨在为持续强化学习(continual reinforcement learning)奠定基础。Abstract
In this paper we develop a foundation for continual reinforcement learning.
摘要
在这篇论文中,我们为持续强化学习奠定了基础。
results: 作者证明了这两种定义的基本性质和特点,以及它们在标准设置下的应用。Abstract
When has an agent converged? Standard models of the reinforcement learning problem give rise to a straightforward definition of convergence: An agent converges when its behavior or performance in each environment state stops changing. However, as we shift the focus of our learning problem from the environment's state to the agent's state, the concept of an agent's convergence becomes significantly less clear. In this paper, we propose two complementary accounts of agent convergence in a framing of the reinforcement learning problem that centers around bounded agents. The first view says that a bounded agent has converged when the minimal number of states needed to describe the agent's future behavior cannot decrease. The second view says that a bounded agent has converged just when the agent's performance only changes if the agent's internal state changes. We establish basic properties of these two definitions, show that they accommodate typical views of convergence in standard settings, and prove several facts about their nature and relationship. We take these perspectives, definitions, and analysis to bring clarity to a central idea of the field.
摘要
智能体何时算是收敛了?标准的强化学习问题模型给出了一个直接的收敛定义:当智能体在每个环境状态下的行为或性能不再变化时,它就收敛了。然而,当我们把学习问题的关注点从环境的状态转移到智能体的状态时,智能体收敛这一概念就变得不那么清晰了。在本文中,我们在以有界智能体为核心的强化学习问题框架下,提出了两种互补的智能体收敛刻画。第一种观点认为:当描述有界智能体未来行为所需的最少状态数不再能减少时,它就收敛了。第二种观点认为:当有界智能体的性能只有在其内部状态发生变化时才会变化时,它就收敛了。我们建立了这两种定义的基本性质,证明它们涵盖标准设定下常见的收敛观点,并证明了若干关于其本质及相互关系的结论。我们希望借助这些视角、定义与分析,为该领域的一个核心概念带来清晰性。
Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification
paper_authors: Neel Guha, Mayee F. Chen, Kush Bhatia, Azalia Mirhoseini, Frederic Sala, Christopher Ré
for: automating data labeling in domains where manual annotation is expensive
methods: using a method called Embroid to modify the predictions of a prompt, rather than the prompt itself, to improve prompt-based learning without additional labeled data
results: Embroid substantially improves performance over original prompts, realizes improvements for more sophisticated prompting strategies, and can be specialized to domains like law through the embedding functions.
results: Embroid 可以大幅提升原始提示的性能,对更复杂的提示策略(如思维链)同样有所改进,并可以通过嵌入函数针对法律等领域进行特化。Abstract
Recent work has shown that language models' (LMs) prompt-based learning capabilities make them well suited for automating data labeling in domains where manual annotation is expensive. The challenge is that while writing an initial prompt is cheap, improving a prompt is costly -- practitioners often require significant labeled data in order to evaluate the impact of prompt modifications. Our work asks whether it is possible to improve prompt-based learning without additional labeled data. We approach this problem by attempting to modify the predictions of a prompt, rather than the prompt itself. Our intuition is that accurate predictions should also be consistent: samples which are similar under some feature representation should receive the same prompt prediction. We propose Embroid, a method which computes multiple representations of a dataset under different embedding functions, and uses the consistency between the LM predictions for neighboring samples to identify mispredictions. Embroid then uses these neighborhoods to create additional predictions for each sample, and combines these predictions with a simple latent variable graphical model in order to generate a final corrected prediction. In addition to providing a theoretical analysis of Embroid, we conduct a rigorous empirical evaluation across six different LMs and up to 95 different tasks. We find that (1) Embroid substantially improves performance over original prompts (e.g., by an average of 7.3 points on GPT-JT), (2) also realizes improvements for more sophisticated prompting strategies (e.g., chain-of-thought), and (3) can be specialized to domains like law through the embedding functions.
摘要
results: 本文证明了 RKD 在半监督分类问题中的适用性,并给出了无标注样本有限情形下的样本复杂度界。此外,文章在一个假设聚类误差较低的聚类感知半监督学习框架中,展示了 RKD 的标注效率。最后,文章将数据增广一致性正则化统一到该框架中,并说明 RKD 通过谱聚类带来“全局”视角,而一致性正则化则通过扩张关注“局部”视角。Abstract
Despite the empirical success and practical significance of (relational) knowledge distillation that matches (the relations of) features between teacher and student models, the corresponding theoretical interpretations remain limited for various knowledge distillation paradigms. In this work, we take an initial step toward a theoretical understanding of relational knowledge distillation (RKD), with a focus on semi-supervised classification problems. We start by casting RKD as spectral clustering on a population-induced graph unveiled by a teacher model. Via a notion of clustering error that quantifies the discrepancy between the predicted and ground truth clusterings, we illustrate that RKD over the population provably leads to low clustering error. Moreover, we provide a sample complexity bound for RKD with limited unlabeled samples. For semi-supervised learning, we further demonstrate the label efficiency of RKD through a general framework of cluster-aware semi-supervised learning that assumes low clustering errors. Finally, by unifying data augmentation consistency regularization into this cluster-aware framework, we show that despite the common effect of learning accurate clusterings, RKD facilitates a "global" perspective through spectral clustering, whereas consistency regularization focuses on a "local" perspective via expansion.
摘要
尽管匹配师生模型特征间关系的关系型知识蒸馏(RKD)在实践中取得了成功并具有重要意义,但针对各类知识蒸馏范式的理论解释仍然有限。在本工作中,我们迈出了理解 RKD 理论的第一步,重点研究半监督分类问题。我们首先将 RKD 视作在教师模型所揭示的总体诱导图上进行谱聚类。通过定义衡量预测聚类与真实聚类差异的聚类误差,我们证明在总体层面上进行 RKD 可保证较低的聚类误差。此外,我们给出了无标注样本有限情形下 RKD 的样本复杂度界。对于半监督学习,我们进一步在一个假设聚类误差较低的聚类感知半监督学习通用框架中,展示了 RKD 的标注效率。最后,我们将数据增广一致性正则化统一到该聚类感知框架中,说明尽管二者都有助于学习准确的聚类,RKD 通过谱聚类带来“全局”视角,而一致性正则化则通过扩张关注“局部”视角。
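For readers unfamiliar with RKD, the distance-based variant of the relational loss can be sketched as follows: the student matches the teacher's normalized pairwise distance structure rather than the raw features. This is a generic sketch consistent with the RKD formulation, not the exact loss analyzed above.

import torch
import torch.nn.functional as F

def rkd_distance_loss(student_feats, teacher_feats):
    # Match mean-normalized pairwise distance matrices of student and teacher features.
    def normalized_pdist(x):
        d = torch.cdist(x, x)
        return d / (d[d > 0].mean() + 1e-8)
    return F.smooth_l1_loss(normalized_pdist(student_feats), normalized_pdist(teacher_feats))

s, t = torch.randn(16, 32), torch.randn(16, 64)
print(rkd_distance_loss(s, t))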
results: 作者通过理论分析与实验研究证明,对于某些模型,A-VI 可以达到 F-VI 的最优解,且往往比 F-VI 收敛更快;同时,对于某些模型(如隐马尔可夫模型和高斯过程),无论推断函数多么强大,A-VI 都无法达到 F-VI 的解。Abstract
Amortized variational inference (A-VI) is a method for approximating the intractable posterior distributions that arise in probabilistic models. The defining feature of A-VI is that it learns a global inference function that maps each observation to its local latent variable's approximate posterior. This stands in contrast to the more classical factorized (or mean-field) variational inference (F-VI), which directly learns the parameters of the approximating distribution for each latent variable. In deep generative models, A-VI is used as a computational trick to speed up inference for local latent variables. In this paper, we study A-VI as a general alternative to F-VI for approximate posterior inference. A-VI cannot produce an approximation with a lower Kullback-Leibler divergence than F-VI's optimal solution, because the amortized family is a subset of the factorized family. Thus a central theoretical problem is to characterize when A-VI still attains F-VI's optimal solution. We derive conditions on both the model and the inference function under which A-VI can theoretically achieve F-VI's optimum. We show that for a broad class of hierarchical models, including deep generative models, it is possible to close the gap between A-VI and F-VI. Further, for an even broader class of models, we establish when and how to expand the domain of the inference function to make amortization a feasible strategy. Finally, we prove that for certain models -- including hidden Markov models and Gaussian processes -- A-VI cannot match F-VI's solution, no matter how expressive the inference function is. We also study A-VI empirically. On several examples, we corroborate our theoretical results and investigate the performance of A-VI when varying the complexity of the inference function. When the gap between A-VI and F-VI can be closed, we find that the required complexity of the function need not scale with the number of observations, and that A-VI often converges faster than F-VI.
摘要
摊销变分推断(A-VI)是一种用于近似概率模型中难以计算的后验分布的方法。A-VI 的特点在于学习一个全局推断函数,将每个观测映射到其局部潜变量的近似后验;这与更经典的分解式(或平均场)变分推断(F-VI)不同,后者直接为每个潜变量学习近似分布的参数。在深度生成模型中,A-VI 被用作加速局部潜变量推断的计算技巧。本文将 A-VI 作为 F-VI 的一种通用替代方案加以研究。由于摊销族是分解族的子集,A-VI 得到的近似在 Kullback-Leibler 散度意义下不可能优于 F-VI 的最优解,因此核心理论问题是刻画 A-VI 何时仍能达到 F-VI 的最优解。我们推导了模型和推断函数需满足的条件,使 A-VI 在理论上能够达到 F-VI 的最优解;并证明对于包括深度生成模型在内的一大类层次模型,A-VI 与 F-VI 之间的差距可以消除。对于更广泛的一类模型,我们给出了何时以及如何扩展推断函数的定义域,使摊销成为可行策略。最后,我们证明对于某些模型(包括隐马尔可夫模型和高斯过程),无论推断函数多么强大,A-VI 都无法达到 F-VI 的解。我们还对 A-VI 进行了实验研究:在若干示例上验证了理论结果,并考察了推断函数复杂度变化时 A-VI 的表现。当 A-VI 与 F-VI 之间的差距可以消除时,我们发现所需的函数复杂度不必随观测数量增长,且 A-VI 往往比 F-VI 收敛更快。
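The contrast between the two variational families can be made concrete with a few lines of PyTorch: F-VI keeps one free (mean, log-variance) pair per data point, while A-VI shares a single inference network across all data points. Dimensions and the network architecture are illustrative assumptions.

import torch
import torch.nn as nn

n, x_dim, z_dim = 100, 5, 2
x = torch.randn(n, x_dim)

# F-VI: per-datapoint variational parameters (the factorized family)
fvi_mu = nn.Parameter(torch.zeros(n, z_dim))
fvi_logvar = nn.Parameter(torch.zeros(n, z_dim))

# A-VI: one shared inference function mapping each observation to its local posterior
inference_net = nn.Sequential(nn.Linear(x_dim, 32), nn.Tanh(), nn.Linear(32, 2 * z_dim))
avi_mu, avi_logvar = inference_net(x).chunk(2, dim=-1)
print(fvi_mu.shape, avi_mu.shape)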
Multi-objective point cloud autoencoders for explainable myocardial infarction prediction
methods: 该方法是一种多目标点云自编码器,基于心脏解剖与功能的多类别三维点云表示。其架构由多个任务特定分支通过低维潜空间相连,从而实现重建与心肌梗死预测的多目标联合学习,并在可解释的潜空间中捕获疾病特异的三维形状信息。
results: 在大型 UK Biobank 数据集上,该方法能够准确重建多时相三维形状,预测形状与输入形状之间的 Chamfer 距离低于底层图像的像素分辨率;在心肌梗死事件预测任务上,其 ROC 曲线下面积比多种机器学习与深度学习基准模型高出 19%。此外,其任务特定的紧凑潜空间能清晰分离对照组与 MI 组,且个体编码与相应三维形状之间存在符合临床认识的关联,从而证明了预测的可解释性。Abstract
Myocardial infarction (MI) is one of the most common causes of death in the world. Image-based biomarkers commonly used in the clinic, such as ejection fraction, fail to capture more complex patterns in the heart's 3D anatomy and thus limit diagnostic accuracy. In this work, we present the multi-objective point cloud autoencoder as a novel geometric deep learning approach for explainable infarction prediction, based on multi-class 3D point cloud representations of cardiac anatomy and function. Its architecture consists of multiple task-specific branches connected by a low-dimensional latent space to allow for effective multi-objective learning of both reconstruction and MI prediction, while capturing pathology-specific 3D shape information in an interpretable latent space. Furthermore, its hierarchical branch design with point cloud-based deep learning operations enables efficient multi-scale feature learning directly on high-resolution anatomy point clouds. In our experiments on a large UK Biobank dataset, the multi-objective point cloud autoencoder is able to accurately reconstruct multi-temporal 3D shapes with Chamfer distances between predicted and input anatomies below the underlying images' pixel resolution. Our method outperforms multiple machine learning and deep learning benchmarks for the task of incident MI prediction by 19% in terms of Area Under the Receiver Operating Characteristic curve. In addition, its task-specific compact latent space exhibits easily separable control and MI clusters with clinically plausible associations between subject encodings and corresponding 3D shapes, thus demonstrating the explainability of the prediction.
摘要
心肌梗死(MI)是全球最常见的死亡原因之一。临床中常用的基于影像的生物标志物(如射血分数)无法捕捉心脏三维解剖结构中更复杂的模式,从而限制了诊断准确性。在这项工作中,我们提出了多目标点云自编码器,作为一种可解释心肌梗死预测的新型几何深度学习方法,基于心脏解剖结构与功能的多类别3D点云表示。其架构由多个任务特定分支通过一个低维潜空间相连,可对重建与 MI 预测进行有效的多目标学习,同时在可解释的潜空间中捕捉与病变相关的3D形状信息。此外,其层次化分支设计与基于点云的深度学习算子使其能够直接在高分辨率解剖点云上高效地进行多尺度特征学习。在大型 UK Biobank 数据集上的实验中,多目标点云自编码器能够准确重建多时相3D形状,预测形状与输入形状之间的 Chamfer 距离低于底层图像的像素分辨率。在新发 MI 预测任务上,我们的方法在 ROC 曲线下面积(AUROC)指标上比多种机器学习与深度学习基准高出19%。此外,其任务特定的紧凑潜空间呈现出易于分离的对照组与 MI 组聚类,且受试者编码与相应3D形状之间存在临床上合理的关联,从而证明了预测的可解释性。
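As a rough illustration of the multi-branch design described above, the following PyTorch sketch connects a shared low-dimensional latent space to a reconstruction branch and an MI-prediction branch; the encoder, layer sizes, and Chamfer-distance implementation are simplified placeholders rather than the paper's actual architecture:

```python
import torch
import torch.nn as nn

class MultiObjectivePointCloudAE(nn.Module):
    """Sketch: a shared low-dimensional latent space feeding two task-specific branches."""
    def __init__(self, num_points=1024, latent_dim=32):
        super().__init__()
        # Placeholder encoder: a PointNet-style shared MLP followed by max pooling.
        self.encoder = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),
        )
        self.to_latent = nn.Linear(128, latent_dim)
        # Branch 1: reconstruction of the anatomy point cloud.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, num_points * 3),
        )
        # Branch 2: incident-MI prediction from the same latent code.
        self.classifier = nn.Sequential(
            nn.Linear(latent_dim, 16), nn.ReLU(), nn.Linear(16, 1),
        )
        self.num_points = num_points

    def forward(self, pts):                      # pts: (B, 3, N)
        z = self.to_latent(self.encoder(pts).squeeze(-1))
        recon = self.decoder(z).view(-1, 3, self.num_points)
        logit = self.classifier(z)
        return recon, logit, z

def multi_objective_loss(recon, pts, logit, label, alpha=1.0):
    # Chamfer distance is the paper's reconstruction metric; a simple symmetric
    # nearest-neighbour version is used here purely for illustration.
    d = torch.cdist(recon.transpose(1, 2), pts.transpose(1, 2))  # (B, N, N)
    chamfer = d.min(dim=1).values.mean() + d.min(dim=2).values.mean()
    bce = nn.functional.binary_cross_entropy_with_logits(logit.squeeze(-1), label)
    return chamfer + alpha * bce
```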
Flow Map Learning for Unknown Dynamical Systems: Overview, Implementation, and Benchmarks
results: 本文给出了一组定义明确的基准问题及其 FML 结果,并提供全部数值细节,以便其他研究者进行交叉检验和复现结果。Abstract
Flow map learning (FML), in conjunction with deep neural networks (DNNs), has shown promise for data-driven modeling of unknown dynamical systems. A remarkable feature of FML is that it is capable of producing accurate predictive models for partially observed systems, even when their exact mathematical models do not exist. In this paper, we present an overview of the FML framework, along with the important computational details for its successful implementation. We also present a set of well-defined benchmark problems for learning unknown dynamical systems. All the numerical details of these problems are presented, along with their FML results, to ensure that the problems are accessible for cross-examination and the results are reproducible.
摘要
流映射学习(FML)与深度神经网络(DNN)相结合,在未知动力系统的数据驱动建模方面展现出潜力。FML 的一个显著特点是,即使系统的精确数学模型并不存在,它也能够为部分可观测系统构建准确的预测模型。在本文中,我们概述了 FML 框架及其成功实现所需的重要计算细节。我们还给出了一组定义明确的基准问题,用于学习未知动力系统。这些问题的全部数值细节连同其 FML 结果一并给出,以确保这些问题便于交叉检验,且结果可以复现。
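A minimal sketch of the flow-map-learning idea — fit a network to the one-step map x_{k+1} ≈ G(x_k) from snapshot pairs, then predict by composing it recursively — is given below. The damped-pendulum data generator, network sizes, and training budget are illustrative stand-ins, not benchmarks from the paper:

```python
import numpy as np
import torch
import torch.nn as nn

def simulate(x0, dt=0.02, steps=2000):
    """Stand-in data generator: a damped pendulum integrated with forward Euler."""
    xs = [x0]
    for _ in range(steps):
        theta, omega = xs[-1]
        xs.append(np.array([theta + dt * omega,
                            omega + dt * (-0.1 * omega - np.sin(theta))]))
    return np.stack(xs)

traj = simulate(np.array([1.0, 0.0]))
X = torch.tensor(traj[:-1], dtype=torch.float32)   # x_k
Y = torch.tensor(traj[1:], dtype=torch.float32)    # x_{k+1}

# The learned flow map G: x_k -> x_{k+1}.
flow_map = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 2))
opt = torch.optim.Adam(flow_map.parameters(), lr=1e-3)
for _ in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(flow_map(X), Y)
    loss.backward()
    opt.step()

# Prediction: compose the learned one-step map recursively from a new initial state.
x = torch.tensor([[0.5, 0.0]])
with torch.no_grad():
    rollout = [x]
    for _ in range(500):
        rollout.append(flow_map(rollout[-1]))
```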
Neuron Sensitivity Guided Test Case Selection for Deep Learning Testing
results: 实验结果表明,NSS 方法能取得较高的故障检测率:例如在 MNIST 和 LeNet1 实验中,从无标注数据集中选取5%的测试用例时,NSS 可获得81.8%的故障检测率,比基线方法高出20%。Abstract
Deep Neural Networks (DNNs) have been widely deployed in software to address various tasks (e.g., autonomous driving, medical diagnosis). However, they could also produce incorrect behaviors that result in financial losses and even threaten human safety. To reveal the incorrect behaviors in DNNs and repair them, DNN developers often collect rich unlabeled datasets from the natural world and label them to test the DNN models. However, properly labeling a large number of unlabeled datasets is a highly expensive and time-consuming task. To address the above-mentioned problem, we propose NSS, Neuron Sensitivity guided test case Selection, which can reduce the labeling time by selecting valuable test cases from unlabeled datasets. NSS leverages the internal neuron information induced by test cases to select valuable test cases, which have high confidence in causing the model to behave incorrectly. We evaluate NSS on four widely used datasets and four well-designed DNN models, compared to SOTA baseline methods. The results show that NSS performs well in assessing the test cases' probability of fault triggering and model improvement capabilities. Specifically, compared with baseline approaches, NSS obtains a higher fault detection rate (e.g., when selecting 5% of test cases from the unlabeled dataset in the MNIST & LeNet1 experiment, NSS can obtain an 81.8% fault detection rate, 20% higher than baselines).
摘要
深度神经网络(DNN)已被广泛部署在软件中,用于完成多种任务(如自动驾驶、医疗诊断)。然而,它们也可能产生错误行为,造成经济损失,甚至威胁人身安全。为了揭示并修复 DNN 中的错误行为,DNN 开发者通常会从现实世界中收集大量无标注数据集,并为其打标签以测试 DNN 模型。然而,为大量无标注数据正确打标签是一项极其昂贵且耗时的工作。为解决上述问题,我们提出了 NSS(Neuron Sensitivity guided test case Selection,神经元敏感度引导的测试用例选择),它可以从无标注数据集中挑选有价值的测试用例,从而减少标注时间。NSS 利用测试用例在模型内部神经元上诱发的信息来选择有价值的测试用例,这些用例有很高的置信度会使模型产生错误行为。我们在四个常用数据集和四个精心设计的 DNN 模型上对 NSS 进行了评估,并与当前最优的基线方法进行比较。结果表明,NSS 在评估测试用例触发故障的概率以及提升模型能力方面表现良好。具体而言,与基线方法相比,NSS 获得了更高的故障检测率(例如,在 MNIST 和 LeNet1 实验中选取5%的测试用例时,NSS 可获得81.8%的故障检测率,比基线高出20%)。
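The abstract does not spell out the exact sensitivity measure, so the sketch below uses a simple proxy: score each unlabeled input by how strongly a small perturbation shifts a chosen layer's activations, then label the top-scoring fraction first. The function names and the scoring rule are assumptions for illustration only:

```python
import torch

def neuron_sensitivity_scores(model, layer, inputs, eps=1e-2):
    """Rank unlabeled inputs by how strongly a small input perturbation changes
    the chosen layer's activations. This scoring rule is a stand-in for the
    paper's neuron-sensitivity measure, which the abstract does not specify."""
    acts = {}
    handle = layer.register_forward_hook(lambda m, i, o: acts.__setitem__("a", o))
    with torch.no_grad():
        model(inputs)
        clean = acts["a"].flatten(1)
        model(inputs + eps * torch.randn_like(inputs))
        noisy = acts["a"].flatten(1)
    handle.remove()
    return (noisy - clean).abs().mean(dim=1)       # one score per test case

def select_test_cases(model, layer, unlabeled, budget=0.05):
    """Pick the top `budget` fraction of test cases by sensitivity score."""
    scores = neuron_sensitivity_scores(model, layer, unlabeled)
    k = max(1, int(budget * len(unlabeled)))
    return torch.topk(scores, k).indices           # indices worth labeling first
```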
Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization
results: 研究结果表明,锐度与泛化之间的关系微妙地依赖于数据分布和模型架构,且锐度最小化算法并非仅靠降低锐度来获得更好的泛化。这说明,对于过参数化神经网络的泛化,仍需寻找其他解释。Abstract
Despite extensive studies, the underlying reason as to why overparameterized neural networks can generalize remains elusive. Existing theory shows that common stochastic optimizers prefer flatter minimizers of the training loss, and thus a natural potential explanation is that flatness implies generalization. This work critically examines this explanation. Through theoretical and empirical investigation, we identify the following three scenarios for two-layer ReLU networks: (1) flatness provably implies generalization; (2) there exist non-generalizing flattest models and sharpness minimization algorithms fail to generalize, and (3) perhaps most surprisingly, there exist non-generalizing flattest models, but sharpness minimization algorithms still generalize. Our results suggest that the relationship between sharpness and generalization subtly depends on the data distributions and the model architectures and sharpness minimization algorithms do not only minimize sharpness to achieve better generalization. This calls for the search for other explanations for the generalization of over-parameterized neural networks.
摘要
尽管已有大量研究,过参数化神经网络能够泛化的根本原因仍不清楚。现有理论表明,常用的随机优化器偏好训练损失更平坦的极小值点,因此一个自然的候选解释是:平坦性蕴含泛化。本文对这一解释进行了批判性检验。通过理论与实证研究,我们在两层 ReLU 网络上识别出以下三种情形:(1)平坦性可被证明地蕴含泛化;(2)存在不泛化的最平坦模型,且锐度最小化算法无法泛化;(3)也许最令人意外的是,存在不泛化的最平坦模型,但锐度最小化算法仍然能够泛化。我们的结果表明,锐度与泛化之间的关系微妙地依赖于数据分布和模型架构,且锐度最小化算法并非仅靠降低锐度来获得更好的泛化。这呼吁人们为过参数化神经网络的泛化寻找其他解释。
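For reference, one standard way to formalize the sharpness being discussed (not necessarily the exact definition analyzed in the paper) is the worst-case loss increase in a small neighborhood, which sharpness-minimization algorithms such as SAM target directly:

```latex
% Worst-case sharpness of the training loss L around parameters w:
S_{\rho}(w) \;=\; \max_{\|\epsilon\|_2 \le \rho} L(w + \epsilon) \;-\; L(w)

% Sharpness-minimization algorithms (e.g. SAM) optimize the perturbed objective:
\min_{w} \; \max_{\|\epsilon\|_2 \le \rho} L(w + \epsilon)
```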
Private Federated Learning with Autotuned Compression
results: 在真实世界数据集上无需调参即可获得有利的压缩率,同时保持隐私保证。Abstract
We propose new techniques for reducing communication in private federated learning without the need for setting or tuning compression rates. Our on-the-fly methods automatically adjust the compression rate based on the error induced during training, while maintaining provable privacy guarantees through the use of secure aggregation and differential privacy. Our techniques are provably instance-optimal for mean estimation, meaning that they can adapt to the "hardness of the problem" with minimal interactivity. We demonstrate the effectiveness of our approach on real-world datasets by achieving favorable compression rates without the need for tuning.
摘要
我们提出了在私有联邦学习中减少通信量的新技术,无需设定或调节压缩率。我们的在线方法根据训练过程中产生的误差自动调整压缩率,同时通过安全聚合和差分隐私保持可证明的隐私保证。对于均值估计,我们的技术可被证明是实例最优的,即它们能够以极少的交互适应问题的"难度"。我们在真实世界数据集上验证了该方法的有效性,无需调参即可获得有利的压缩率。
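The following toy sketch conveys the "autotuned" idea — compress an update, measure the induced error, and adjust the rate for the next round — with a placeholder noise term standing in for the paper's differential-privacy and secure-aggregation machinery. The quantizer, threshold, and noise scale are all illustrative assumptions, not the paper's mechanism:

```python
import numpy as np

def quantize(v, num_levels):
    """Uniform scalar quantization of a vector to num_levels levels."""
    lo, hi = v.min(), v.max()
    if hi == lo:
        return v.copy()
    q = np.round((v - lo) / (hi - lo) * (num_levels - 1))
    return lo + q * (hi - lo) / (num_levels - 1)

def adaptive_round(update, num_levels, target_rel_error=0.05):
    """One round of 'autotuned' compression: compress, measure the induced error,
    and nudge the rate for the next round. A toy stand-in for the paper's
    instance-optimal mechanism and its secure-aggregation / DP machinery."""
    compressed = quantize(update, num_levels)
    rel_error = np.linalg.norm(compressed - update) / (np.linalg.norm(update) + 1e-12)
    if rel_error > target_rel_error:
        num_levels = min(2 * num_levels, 2 ** 16)   # too lossy -> spend more bits next round
    else:
        num_levels = max(num_levels // 2, 2)        # room to compress harder next round
    noisy = compressed + np.random.normal(0.0, 0.1, size=update.shape)  # DP-style noise (scale arbitrary here)
    return noisy, num_levels
```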
DREAM: Domain-free Reverse Engineering Attributes of Black-box Model
results: 大量实验结果表明,该方法优于各基线方法,并在不同领域中均具有很强的泛化能力。Abstract
Deep learning models are usually black boxes when deployed on machine learning platforms. Prior works have shown that the attributes (e.g., the number of convolutional layers) of a target black-box neural network can be exposed through a sequence of queries. There is a crucial limitation: these works assume the dataset used for training the target model to be known beforehand and leverage this dataset for model attribute attacks. However, it is difficult to access the training dataset of the target black-box model in reality. Therefore, whether the attributes of a target black-box model could still be revealed in this case is doubtful. In this paper, we investigate a new problem of Domain-agnostic Reverse Engineering the Attributes of a black-box target Model, called DREAM, without requiring the availability of the target model's training dataset, and put forward a general and principled framework by casting this problem as an out-of-distribution (OOD) generalization problem. In this way, we can learn a domain-agnostic model to inversely infer the attributes of a target black-box model with unknown training data. This makes our method one of a kind that can gracefully apply to an arbitrary domain for model attribute reverse engineering with strong generalization ability. Extensive experimental studies are conducted and the results validate the superiority of our proposed method over the baselines.
摘要
深度学习模型部署在机器学习平台上时通常是黑盒。已有研究表明,目标黑盒神经网络的属性(例如卷积层的数量)可以通过一系列查询暴露出来。但这些工作有一个关键限制:它们假设目标模型的训练数据集事先已知,并利用该数据集发起模型属性攻击。然而,在现实中很难获取目标黑盒模型的训练数据集,因此在这种情况下目标黑盒模型的属性能否被揭示仍是未知数。在本文中,我们研究一个新问题:在无需获得目标模型训练数据集的前提下,对黑盒目标模型进行领域无关的属性逆向工程,称为 DREAM(Domain-free Reverse Engineering Attributes of Black-box Model),并将该问题形式化为分布外(OOD)泛化问题,提出了一个通用且有原则的框架。这样,我们可以学习一个领域无关的模型,在训练数据未知的情况下逆向推断目标黑盒模型的属性。这使我们的方法能够优雅地应用于任意领域的模型属性逆向工程,并具有很强的泛化能力。大量实验结果验证了所提方法相对于基线方法的优越性。
Progressive distillation diffusion for raw music generation
results: 本文在多个自行收集的数据集上进行了实验,实现了音频的无条件生成。模型能够进行渐进式的音频处理与生成,利用从1通道 128x384 到3通道 128x128 梅尔频谱图的变换,并支持循环生成。Abstract
This paper aims to apply a new deep learning approach to the task of generating raw audio files. It is based on diffusion models, a recent type of deep generative model. This type of method has recently shown outstanding results for image generation and has received a lot of attention from the computer vision community. On the other hand, very little attention has been given to other types of applications, such as music generation in the waveform domain. In this paper, a model for unconditional generation applied to music is implemented: progressive distillation diffusion with a 1D U-Net. Then, a comparison of different diffusion parameters and their effect on the final result is presented. One big advantage of the methods implemented in this work is that the model is able to handle progressive audio processing and generation, using a transformation from 1-channel 128 x 384 to 3-channel 128 x 128 mel-spectrograms and looped generation. The empirical comparisons are carried out across different self-collected datasets.
摘要
本文旨在将一种新的深度学习方法应用于原始音频文件的生成任务。该方法基于扩散模型,这是一类较新的深度生成模型,近期在图像生成上取得了出色的成果,受到计算机视觉社区的广泛关注;相比之下,针对波形域音乐生成等其他应用的研究还很少。本文实现了应用于音乐的无条件生成模型:基于 1D U-Net 的渐进蒸馏扩散模型。随后,我们比较了不同的扩散参数及其对最终结果的影响。该方法的一大优点是能够进行渐进式的音频处理与生成:利用从1通道 128x384 到3通道 128x128 梅尔频谱图的变换,并支持循环生成。实证比较在多个自行收集的数据集上进行。
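One plausible reading of the 1-channel 128x384 to 3-channel 128x128 transformation mentioned above is a simple slicing of the time axis into three segments, sketched below; the actual transformation used in the paper may differ:

```python
import numpy as np

def split_mel(mel):
    """Reshape a 1-channel 128x384 mel-spectrogram into a 3-channel 128x128 array
    by slicing the time axis into three consecutive segments. This is one plausible
    reading of the transformation described in the abstract, not its confirmed form."""
    assert mel.shape == (128, 384)
    return np.stack([mel[:, 0:128], mel[:, 128:256], mel[:, 256:384]])  # (3, 128, 128)

def merge_mel(chunks):
    """Inverse: concatenate the three channels back along time, e.g. for looped generation."""
    assert chunks.shape == (3, 128, 128)
    return np.concatenate([chunks[0], chunks[1], chunks[2]], axis=1)    # (128, 384)
```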
results: GPT-3 和 ChatGPT 模型的认知判断与人类不相似。Abstract
Large Language Models (LLMs) have lately been in the spotlight of researchers, businesses, and consumers alike. While the linguistic capabilities of such models have been studied extensively, there is growing interest in investigating them as cognitive subjects. In the present work I examine GPT-3 and ChatGPT capabilities on a limited-data inductive reasoning task from the cognitive science literature. The results suggest that these models' cognitive judgements are not human-like.
摘要
大型语言模型(LLM)近来受到研究者、企业和消费者的广泛关注。尽管这些模型的语言能力已被广泛研究,但将它们作为认知主体进行研究的兴趣也在增长。在本工作中,我考察了 GPT-3 和 ChatGPT 在认知科学文献中一个小数据归纳推理任务上的能力。结果表明,这些模型的认知判断并不类似于人类。
Investigating minimizing the training set fill distance in machine learning regression
results: 实验结果表明,这种采样方法能显著降低多种回归模型的最大预测误差,并以较大优势超过现有的采样方法。Abstract
Many machine learning regression methods leverage large datasets for training predictive models. However, using large datasets may not be feasible due to computational limitations or high labelling costs. Therefore, sampling small training sets from large pools of unlabelled data points is essential to maximize model performance while maintaining computational efficiency. In this work, we study a sampling approach aimed at minimizing the fill distance of the selected set. We derive an upper bound for the maximum expected prediction error that depends linearly on the training set fill distance, conditional on knowledge of the data features. For empirical validation, we perform experiments using two regression models on two datasets. We empirically show that selecting a training set by aiming to minimize the fill distance, thereby minimizing the bound, significantly reduces the maximum prediction error of various regression models, outperforming existing sampling approaches by a large margin.
摘要
许多机器学习回归方法依赖大规模数据集来训练预测模型。然而,由于计算限制或高昂的标注成本,使用大数据集可能并不可行。因此,从大量无标注数据点中采样小规模训练集,对于在保持计算效率的同时最大化模型性能至关重要。在这项工作中,我们研究一种旨在最小化所选子集填充距离(fill distance)的采样方法。在已知数据特征的条件下,我们推导出最大期望预测误差的一个上界,该上界与训练集填充距离呈线性关系。为进行实证验证,我们在两个数据集上使用两种回归模型开展实验。实验表明,以最小化填充距离(从而最小化该上界)为目标选择训练集,能够显著降低多种回归模型的最大预测误差,并以较大优势超过现有的采样方法。
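A standard greedy heuristic consistent with this goal is farthest-point sampling, which at each step adds the pool point currently farthest from the selected set, directly driving down the fill distance. The sketch below is illustrative and not necessarily the paper's exact selection procedure:

```python
import numpy as np

def farthest_point_sampling(X, k, seed=0):
    """Greedily pick k training points from the pool X (n x d) so that the fill
    distance -- the largest distance from any pool point to its nearest selected
    point -- shrinks quickly."""
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(len(X)))]
    d = np.linalg.norm(X - X[selected[0]], axis=1)      # distance to nearest selected point
    for _ in range(k - 1):
        nxt = int(np.argmax(d))                         # current worst-covered pool point
        selected.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    fill_distance = float(d.max())
    return np.array(selected), fill_distance
```

With this greedy rule the fill distance is non-increasing in k, which is the quantity the derived error bound depends on linearly.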
results: 在语言识别、语音识别以及说话人识别和情感识别等非语义任务中,MASR 表示都带来了显著的性能提升;我们还对语言识别任务进行了详细分析,以解释表示得到增强的原因。Abstract
In recent years, speech representation learning has been constructed primarily as a self-supervised learning (SSL) task, using the raw audio signal alone while ignoring the side-information that is often available for a given speech recording. In this paper, we propose MASR, a Metadata Aware Speech Representation learning framework, which addresses the aforementioned limitations. MASR enables the inclusion of multiple external knowledge sources to enhance the utilization of meta-data information. The external knowledge sources are incorporated in the form of sample-level pair-wise similarity matrices that are useful in a hard-mining loss. A key advantage of the MASR framework is that it can be combined with any choice of SSL method. Using MASR representations, we perform evaluations on several downstream tasks such as language identification, speech recognition and other non-semantic tasks such as speaker and emotion recognition. In these experiments, we illustrate significant performance improvements for MASR over other established benchmarks. We perform a detailed analysis on the language identification task to provide insights on how the proposed loss function enables the representations to separate closely related languages.
摘要
近年来,语音表示学习主要被构建为一种自监督学习(SSL)任务,仅使用原始音频信号,而忽略了给定语音录音通常可获得的辅助信息。在本文中,我们提出 MASR(Metadata Aware Speech Representation learning),一个元数据感知的语音表示学习框架,以解决上述局限。MASR 能够引入多个外部知识源,以增强对元数据信息的利用;这些外部知识源以样本级成对相似度矩阵的形式融入,可用于难例挖掘(hard-mining)损失。MASR 框架的一个关键优势在于它可以与任意 SSL 方法结合使用。基于 MASR 表示,我们在语言识别、语音识别以及说话人识别和情感识别等非语义下游任务上进行了评估。实验表明,MASR 相对于其他既有基准取得了显著的性能提升。我们还对语言识别任务进行了详细分析,以说明所提损失函数如何使表示能够区分彼此接近的语言。
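To illustrate how sample-level pair-wise similarity matrices can feed a hard-mining loss, the sketch below builds a similarity matrix from a single categorical metadata field and mines the hardest dissimilar pair per anchor. The loss form and the use of a single metadata field are simplifying assumptions, not MASR's actual formulation:

```python
import torch
import torch.nn.functional as F

def metadata_similarity(labels):
    """Sample-level pair-wise similarity matrix from one categorical metadata field
    (e.g. language ID): 1 where two recordings share the value, 0 otherwise.
    A simplified stand-in for MASR's external knowledge sources."""
    labels = torch.as_tensor(labels)
    return (labels[:, None] == labels[None, :]).float()

def hard_mining_loss(embeddings, sim, margin=0.2):
    """Pull together pairs marked similar by the metadata, and push away the
    hardest (most similar in embedding space) dissimilar pair for each anchor."""
    emb = F.normalize(embeddings, dim=1)
    cos = emb @ emb.T
    pos = (cos * sim).sum(dim=1) / sim.sum(dim=1).clamp(min=1)
    hardest_neg = (cos - 1e9 * sim - 1e9 * torch.eye(len(emb))).max(dim=1).values
    return F.relu(margin - pos + hardest_neg).mean()
```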
PATROL: Privacy-Oriented Pruning for Collaborative Inference Against Model Inversion Attacks
results: 提升了隐私保护能力,同时保持了可靠的协同推理性能。Abstract
Collaborative inference has been a promising solution to enable resource-constrained edge devices to perform inference using state-of-the-art deep neural networks (DNNs). In collaborative inference, the edge device first feeds the input to a partial DNN locally and then uploads the intermediate result to the cloud to complete the inference. However, recent research indicates model inversion attacks (MIAs) can reconstruct input data from intermediate results, posing serious privacy concerns for collaborative inference. Existing perturbation and cryptography techniques are inefficient and unreliable in defending against MIAs while performing accurate inference. This paper provides a viable solution, named PATROL, which develops privacy-oriented pruning to balance privacy, efficiency, and utility of collaborative inference. PATROL takes advantage of the fact that later layers in a DNN can extract more task-specific features. Given limited local resources for collaborative inference, PATROL intends to deploy more layers at the edge based on pruning techniques to enforce task-specific features for inference and reduce task-irrelevant but sensitive features for privacy preservation. To achieve privacy-oriented pruning, PATROL introduces two key components: Lipschitz regularization and adversarial reconstruction training, which increase the reconstruction errors by reducing the stability of MIAs and enhance the target inference model by adversarial training, respectively.
摘要
协同推理是一种颇具前景的方案,使资源受限的边缘设备也能使用最先进的深度神经网络(DNN)进行推理。在协同推理中,边缘设备先在本地将输入送入部分 DNN,再把中间结果上传到云端以完成推理。然而,近期研究表明,模型反演攻击(MIA)可以从中间结果重建输入数据,给协同推理带来严重的隐私隐患。现有的扰动和密码学技术在抵御 MIA 的同时难以保持推理准确性,效率低且不可靠。本文提出了一个可行的解决方案 PATROL,通过面向隐私的剪枝,在协同推理的隐私、效率与效用之间取得平衡。PATROL 利用了这样一个事实:DNN 中靠后的层能够提取更多任务特定的特征。在边缘侧协同推理资源有限的情况下,PATROL 借助剪枝技术在边缘部署更多的层,以强化推理所需的任务特定特征,并削减与任务无关但敏感的特征,从而保护隐私。为实现面向隐私的剪枝,PATROL 引入了两个关键组件:Lipschitz 正则化和对抗重建训练;前者通过降低 MIA 的稳定性来增大其重建误差,后者通过对抗训练来增强目标推理模型。
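A rough sketch of how the two components could enter the training objective is shown below: a finite-difference Lipschitz penalty on the edge network and an adversarial term that rewards poor reconstructions by a simulated inversion attacker. The loss weights, the attacker module, and the single-loss formulation are assumptions for illustration; in practice the attacker would be trained in alternation with the edge model rather than held fixed:

```python
import torch
import torch.nn as nn

def lipschitz_penalty(edge_model, x, eps=1e-3):
    """Penalize how much the edge network's intermediate output moves under a small
    input perturbation -- a finite-difference stand-in for Lipschitz regularization."""
    z_clean = edge_model(x)
    z_pert = edge_model(x + eps * torch.randn_like(x))
    return ((z_pert - z_clean).flatten(1).norm(dim=1) / eps).mean()

def patrol_style_loss(edge_model, cloud_model, attacker_decoder, x, y,
                      lam_lip=0.1, lam_adv=0.1):
    """Sketch of the training signal: task loss + Lipschitz penalty + an adversarial
    reconstruction term that rewards the edge model for making a simulated inversion
    attacker reconstruct the input poorly."""
    z = edge_model(x)
    task_loss = nn.functional.cross_entropy(cloud_model(z), y)
    lip = lipschitz_penalty(edge_model, x)
    recon = attacker_decoder(z)                      # simulated model inversion attack
    adv = -nn.functional.mse_loss(recon, x)          # maximize the attacker's error
    return task_loss + lam_lip * lip + lam_adv * adv
```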
Globally Normalising the Transducer for Streaming Speech Recognition
results: 实验结果表明,应用全局归一化后,流式模型的词错误率相对下降了9–11%,将流式模式与前瞻(lookahead)模式之间的差距缩小了近一半。Abstract
The Transducer (e.g. RNN-Transducer or Conformer-Transducer) generates an output label sequence as it traverses the input sequence. It is straightforward to use in streaming mode, where it generates partial hypotheses before the complete input has been seen. This makes it popular in speech recognition. However, in streaming mode the Transducer has a mathematical flaw which, simply put, restricts the model's ability to change its mind. The fix is to replace local normalisation (e.g. a softmax) with global normalisation, but then the loss function becomes impossible to evaluate exactly. A recent paper proposes to solve this by approximating the model, severely degrading performance. Instead, this paper proposes to approximate the loss function, allowing global normalisation to apply to a state-of-the-art streaming model. Global normalisation reduces its word error rate by 9-11% relative, closing almost half the gap between streaming and lookahead mode.
摘要
转导器(Transducer,例如 RNN-Transducer 或 Conformer-Transducer)在遍历输入序列的同时生成输出标签序列。它可以很直接地用于流式模式,即在看到完整输入之前就生成部分假设,因此在语音识别中非常流行。然而,在流式模式下,Transducer 存在一个数学上的缺陷:简单地说,它限制了模型"改变主意"的能力。解决办法是用全局归一化取代局部归一化(例如 softmax),但这样一来损失函数就无法被精确计算。最近有论文提出通过对模型进行近似来解决此问题,但会严重损害性能。本文则提出对损失函数进行近似,使全局归一化得以应用于最先进的流式模型。全局归一化使词错误率相对下降9–11%,将流式模式与前瞻模式之间的差距缩小了近一半。
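Schematically, the difference between the local and global normalisation discussed above can be written as follows (notation simplified; the transducer's alignment/blank structure is suppressed):

```latex
% Locally normalised (standard) transducer: a softmax at every emission step,
% so probability mass committed to a prefix cannot be revised later.
p_{\mathrm{local}}(y \mid x) \;=\; \prod_{t} \;
  \frac{\exp f_\theta(y_t \mid y_{<t},\, x_{1:\tau(t)})}
       {\sum_{y'} \exp f_\theta(y' \mid y_{<t},\, x_{1:\tau(t)})}

% Globally normalised transducer: scores are normalised only over complete label
% sequences, allowing later evidence to re-weight earlier decisions, at the cost
% of a partition function Z(x) that cannot be computed exactly in streaming mode.
p_{\mathrm{global}}(y \mid x) \;=\;
  \frac{\exp \sum_{t} f_\theta(y_t \mid y_{<t},\, x_{1:\tau(t)})}{Z(x)},
\qquad
Z(x) \;=\; \sum_{y'} \exp \sum_{t} f_\theta(y'_t \mid y'_{<t},\, x_{1:\tau(t)})
```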