paper_authors: Jinchao Feng, Charles Kulick, Sui Tang
For: The paper aims to develop a data-driven approach for discovering a general second-order particle-based model of the aggregation and collective behavior of interacting agents.
Methods: The proposed approach places Gaussian Process (GP) priors on latent interaction kernels constrained by dynamics and observational data, which allows for uncertainty quantification and nonparametric modeling of interacting dynamical systems. The paper also develops acceleration techniques to improve scalability.
Results: The proposed approach is demonstrated to be effective in modeling various prototype systems, including real-world fish motion datasets, and outperforms competitor methods despite the use of small data sets. The approach learns an effective representation of the nonlinear dynamics in these spaces.
Abstract
In this paper, we focus on the data-driven discovery of a general second-order particle-based model that contains many state-of-the-art models for modeling the aggregation and collective behavior of interacting agents of similar size and body type. This model takes the form of a high-dimensional system of ordinary differential equations parameterized by two interaction kernels that appraise the alignment of positions and velocities. We propose a Gaussian Process-based approach to this problem, where the unknown model parameters are marginalized by using two independent Gaussian Process (GP) priors on latent interaction kernels constrained to dynamics and observational data. This results in a nonparametric model for interacting dynamical systems that accounts for uncertainty quantification. We also develop acceleration techniques to improve scalability. Moreover, we perform a theoretical analysis to interpret the methodology and investigate the conditions under which the kernels can be recovered. We demonstrate the effectiveness of the proposed approach on various prototype systems, including the selection of the order of the systems and the types of interactions. In particular, we present applications to modeling two real-world fish motion datasets that display flocking and milling patterns up to 248 dimensions. Despite the use of small data sets, the GP-based approach learns an effective representation of the nonlinear dynamics in these spaces and outperforms competitor methods.
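The abstract does not reproduce the governing equations; a representative second-order system consistent with this description (two radial kernels weighing position and velocity alignment, as in Cucker-Smale-type models) is
\[ \ddot{\mathbf{x}}_i \;=\; \frac{1}{N}\sum_{j=1}^{N}\Big[\phi^{E}\big(\|\mathbf{x}_j-\mathbf{x}_i\|\big)\,(\mathbf{x}_j-\mathbf{x}_i) \;+\; \phi^{A}\big(\|\mathbf{x}_j-\mathbf{x}_i\|\big)\,(\dot{\mathbf{x}}_j-\dot{\mathbf{x}}_i)\Big], \qquad i=1,\dots,N, \]
where $\phi^{E}$ and $\phi^{A}$ are the latent interaction kernels that receive the two independent GP priors; the kernel symbols and the omission of external forces here are illustrative rather than taken from the paper.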
COSTAR: Improved Temporal Counterfactual Estimation with Self-Supervised Learning
results: Compared with existing models, experimental results on both synthetic and real-world datasets show that COSTAR achieves superior estimation accuracy and generalization to out-of-distribution data, with consistent performance across datasets.
Abstract
Estimation of temporal counterfactual outcomes from observed history is crucial for decision-making in many domains such as healthcare and e-commerce, particularly when randomized controlled trials (RCTs) suffer from high cost or impracticality. For real-world datasets, modeling time-dependent confounders is challenging due to complex dynamics, long-range dependencies and both past treatments and covariates affecting the future outcomes. In this paper, we introduce COunterfactual Self-supervised TrAnsformeR (COSTAR), a novel approach that integrates self-supervised learning for improved historical representations. The proposed framework combines temporal and feature-wise attention with a component-wise contrastive loss tailored for temporal treatment outcome observations, yielding superior performance in estimation accuracy and generalization to out-of-distribution data compared to existing models, as validated by empirical results on both synthetic and real-world datasets.
results: The methods can handle high-dimensional observational data, accounting for observation noise, complex interaction rules, missing interaction features, and real-world observations of interacting agent systems.
Abstract
We present a review of a series of learning methods used to identify the structure of dynamical systems, aiming to understand emergent behaviors in complex systems of interacting agents. These methods not only offer theoretical guarantees of convergence but also demonstrate computational efficiency in handling high-dimensional observational data. They can manage observation data from both first- and second-order dynamical systems, accounting for observation/stochastic noise, complex interaction rules, missing interaction features, and real-world observations of interacting agent systems. The essence of developing such a series of learning methods lies in designing appropriate loss functions using the variational inverse problem approach, which inherently provides dimension reduction capabilities to our learning methods.
results: The model achieves a latency of under 20 ms at a 16 kHz sample rate and runs roughly 2.8x faster than real time on a consumer CPU. It attains the lowest resource usage and latency of any open-source voice conversion model, and open-source samples, code, and pretrained model weights are provided.
Abstract
We adapt the architectures of previous audio manipulation and generation neural networks to the task of real-time any-to-one voice conversion. Our resulting model, LLVC ($\textbf{L}$ow-latency $\textbf{L}$ow-resource $\textbf{V}$oice $\textbf{C}$onversion), has a latency of under 20ms at a bitrate of 16kHz and runs nearly 2.8x faster than real-time on a consumer CPU. LLVC uses both a generative adversarial architecture as well as knowledge distillation in order to attain this performance. To our knowledge LLVC achieves both the lowest resource usage as well as the lowest latency of any open-source voice conversion model. We provide open-source samples, code, and pretrained model weights at https://github.com/KoeAI/LLVC.
results: New identifiability results are established for the general settings of undercompleteness, partial sparsity, source dependence, and flexible grouping structures, accommodating the constraints encountered in real-world scenarios.
Abstract
Nonlinear independent component analysis (ICA) aims to uncover the true latent sources from their observable nonlinear mixtures. Despite its significance, the identifiability of nonlinear ICA is known to be impossible without additional assumptions. Recent advances have proposed conditions on the connective structure from sources to observed variables, known as Structural Sparsity, to achieve identifiability in an unsupervised manner. However, the sparsity constraint may not hold universally for all sources in practice. Furthermore, the assumptions of bijectivity of the mixing process and independence among all sources, which arise from the setting of ICA, may also be violated in many real-world scenarios. To address these limitations and generalize nonlinear ICA, we propose a set of new identifiability results in the general settings of undercompleteness, partial sparsity and source dependence, and flexible grouping structures. Specifically, we prove identifiability when there are more observed variables than sources (undercomplete), and when certain sparsity and/or source independence assumptions are not met for some changing sources. Moreover, we show that even in cases with flexible grouping structures (e.g., part of the sources can be divided into irreducible independent groups with various sizes), appropriate identifiability results can also be established. Theoretical claims are supported empirically on both synthetic and real-world datasets.
SmoothHess: ReLU Network Feature Interactions via Stein’s Lemma
results: The paper validates SmoothHess on benchmark datasets and a real-world medical spirometry dataset, demonstrating its superior ability to capture feature interactions compared with other methods.
Abstract
Several recent methods for interpretability model feature interactions by looking at the Hessian of a neural network. This poses a challenge for ReLU networks, which are piecewise-linear and thus have a zero Hessian almost everywhere. We propose SmoothHess, a method of estimating second-order interactions through Stein's Lemma. In particular, we estimate the Hessian of the network convolved with a Gaussian through an efficient sampling algorithm, requiring only network gradient calls. SmoothHess is applied post-hoc, requires no modifications to the ReLU network architecture, and the extent of smoothing can be controlled explicitly. We provide a non-asymptotic bound on the sample complexity of our estimation procedure. We validate the superior ability of SmoothHess to capture interactions on benchmark datasets and a real-world medical spirometry dataset.
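For intuition, the identity behind such estimators can be sketched as follows (the generic Stein-type identity, not necessarily the paper's exact estimator): for a Lipschitz network $f$ and perturbations $\delta \sim \mathcal{N}(0,\sigma^2 I)$, the Hessian of the Gaussian-smoothed network satisfies
\[ \nabla^2 \big(f * \mathcal{N}(0,\sigma^2 I)\big)(x) \;=\; \frac{1}{\sigma^2}\,\mathbb{E}\big[\delta\,\nabla f(x+\delta)^{\top}\big], \]
so a Monte Carlo average of $\delta_k \nabla f(x+\delta_k)^{\top}/\sigma^2$ over sampled perturbations (optionally symmetrized) estimates the smoothed Hessian from gradient calls alone, even though the ReLU network's own Hessian is zero almost everywhere.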
Electronic excited states from physically-constrained machine learning
results: The model can make predictions for molecules that are much larger and more complex than those it was trained on and, by indirectly targeting the outputs of well-converged calculations, achieves dramatic computational savings. These results demonstrate the merits of intertwining data-driven techniques with physical foundations and provide a blueprint for developing ML-augmented electronic-structure methods.
Abstract
Data-driven techniques are increasingly used to replace electronic-structure calculations of matter. In this context, a relevant question is whether machine learning (ML) should be applied directly to predict the desired properties or be combined explicitly with physically-grounded operations. We present an example of an integrated modeling approach, in which a symmetry-adapted ML model of an effective Hamiltonian is trained to reproduce electronic excitations from a quantum-mechanical calculation. The resulting model can make predictions for molecules that are much larger and more complex than those that it is trained on, and allows for dramatic computational savings by indirectly targeting the outputs of well-converged calculations while using a parameterization corresponding to a minimal atom-centered basis. These results emphasize the merits of intertwining data-driven techniques with physical approximations, improving the transferability and interpretability of ML models without affecting their accuracy and computational efficiency, and providing a blueprint for developing ML-augmented electronic-structure methods.
Sharp Noisy Binary Search with Monotonic Probabilities
results: The paper gives a practical algorithm that finds the target probability $\tau$ within the optimal $\Theta(\frac{1}{\varepsilon^2} \log n)$ sample bound, solving two theoretical challenges: high-probability behavior and sharp constants.
Abstract
We revisit the noisy binary search model of Karp and Kleinberg, in which we have $n$ coins with unknown probabilities $p_i$ that we can flip. The coins are sorted by increasing $p_i$, and we would like to find where the probability crosses (to within $\varepsilon$) a target value $\tau$. This generalizes the fixed-noise model of Burnashev and Zigangirov, in which $p_i = \frac{1}{2} \pm \varepsilon$, to a setting where coins near the target may be indistinguishable from it. Karp and Kleinberg showed that $\Theta(\frac{1}{\varepsilon^2} \log n)$ samples are necessary and sufficient for this task. We produce a practical algorithm by solving two theoretical challenges: high-probability behavior and sharp constants. We give an algorithm that succeeds with probability $1-\delta$ from \[ \frac{1}{C_{\tau, \varepsilon}} \cdot \left(\lg n + O\left(\log^{2/3} n \log^{1/3} \frac{1}{\delta} + \log \frac{1}{\delta}\right)\right) \] samples, where $C_{\tau, \varepsilon}$ is the optimal such constant achievable. For $\delta > n^{-o(1)}$ this is within $1 + o(1)$ of optimal, and for $\delta \ll 1$ it is the first bound within constant factors of optimal.
A quantum-classical performance separation in nonconvex optimization
results: Compared with representative state-of-the-art classical optimization algorithms/solvers (including Gurobi), the quantum algorithm solves these optimization instances with polynomially many queries and gates, whereas the classical solvers appear to require super-polynomial time.
Abstract
In this paper, we identify a family of nonconvex continuous optimization instances, each $d$-dimensional instance with $2^d$ local minima, to demonstrate a quantum-classical performance separation. Specifically, we prove that the recently proposed Quantum Hamiltonian Descent (QHD) algorithm [Leng et al., arXiv:2303.01471] is able to solve any $d$-dimensional instance from this family using $\widetilde{\mathcal{O}}(d^3)$ quantum queries to the function value and $\widetilde{\mathcal{O}}(d^4)$ additional 1-qubit and 2-qubit elementary quantum gates. On the other side, a comprehensive empirical study suggests that representative state-of-the-art classical optimization algorithms/solvers (including Gurobi) would require a super-polynomial time to solve such optimization instances.
Mahalanobis-Aware Training for Out-of-Distribution Detection
results: On CIFAR-10, the method markedly improves the false-positive rate, reducing it by over 50% on far-OOD tasks.
Abstract
While deep learning models have seen widespread success in controlled environments, there are still barriers to their adoption in open-world settings. One critical task for safe deployment is the detection of anomalous or out-of-distribution samples that may require human intervention. In this work, we present a novel loss function and recipe for training networks with improved density-based out-of-distribution sensitivity. We demonstrate the effectiveness of our method on CIFAR-10, notably reducing the false-positive rate of the relative Mahalanobis distance method on far-OOD tasks by over 50%.
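For context, the relative Mahalanobis distance score referenced here is commonly defined (stated from memory of Ren et al.'s formulation, so treat the details as an assumption) as the class-conditional Mahalanobis distance minus the distance under a single class-agnostic background Gaussian fit to all training features,
\[ \mathrm{RMD}(z) \;=\; \min_k\,(z-\mu_k)^{\top}\Sigma^{-1}(z-\mu_k) \;-\; (z-\mu_0)^{\top}\Sigma_0^{-1}(z-\mu_0), \]
and the training recipe proposed here targets improved sensitivity of such density-based scores.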
Neural Field Dynamics Model for Granular Object Piles Manipulation
methods: Fully convolutional neural network (FCNN) with density field-based representation and translation equivariance, differentiable action rendering module
results: Exceeds existing latent or particle-based methods in accuracy and computation efficiency, demonstrates zero-shot generalization capabilities in various environments and tasks.
Abstract
We present a learning-based dynamics model for granular material manipulation. Inspired by the Eulerian approach commonly used in fluid dynamics, our method adopts a fully convolutional neural network that operates on a density field-based representation of object piles and pushers, allowing it to exploit the spatial locality of inter-object interactions as well as the translation equivariance through convolution operations. Furthermore, our differentiable action rendering module makes the model fully differentiable and can be directly integrated with a gradient-based trajectory optimization algorithm. We evaluate our model with a wide array of piles manipulation tasks both in simulation and real-world experiments and demonstrate that it significantly exceeds existing latent or particle-based methods in both accuracy and computation efficiency, and exhibits zero-shot generalization capabilities across various environments and tasks.
GIST: Generated Inputs Sets Transferability in Deep Learning
results: Experimental results show that GIST can select an effective test set for a given property and transfer it to the model under test, suggesting that GIST can generalize to other properties, test-set generation procedures, and modalities.
Abstract
As the demand for verifiability and testability of neural networks continues to rise, an increasing number of methods for generating test sets are being developed. However, each of these techniques tends to emphasize specific testing aspects and can be quite time-consuming. A straightforward solution to mitigate this issue is to transfer test sets between some benchmarked models and a new model under test, based on a desirable property one wishes to transfer. This paper introduces GIST (Generated Inputs Sets Transferability), a novel approach for the efficient transfer of test sets among Deep Learning models. Given a property of interest that a user wishes to transfer (e.g., coverage criterion), GIST enables the selection of good test sets from the point of view of this property among available ones from a benchmark. We empirically evaluate GIST on fault types coverage property with two modalities and different test set generation procedures to demonstrate the approach's feasibility. Experimental results show that GIST can select an effective test set for the given property to transfer it to the model under test. Our results suggest that GIST could be applied to transfer other properties and could generalize to different test sets' generation procedures and modalities.
Accelerating Electronic Stopping Power Predictions by 10 Million Times with a Combination of Time-Dependent Density Functional Theory and Machine Learning
results: First-principles calculations of electronic stopping for proton irradiation in aluminum are combined with machine learning that interpolates to other directions, greatly accelerating the assessment of new materials and predicting how the depth of maximum energy deposition, the "Bragg Peak," varies with incident angle.
Abstract
Knowing the rate at which particle radiation releases energy in a material, the stopping power, is key to designing nuclear reactors, medical treatments, semiconductor and quantum materials, and many other technologies. While the nuclear contribution to stopping power, i.e., elastic scattering between atoms, is well understood in the literature, the route for gathering data on the electronic contribution has for decades remained costly and reliant on many simplifying assumptions, including that materials are isotropic. We establish a method that combines time-dependent density functional theory (TDDFT) and machine learning to reduce the time to assess new materials to mere hours on a supercomputer and provides valuable data on how atomic details influence electronic stopping. Our approach uses TDDFT to compute the electronic stopping contributions to stopping power from first principles in several directions and then machine learning to interpolate to other directions at rates 10 million times higher. We demonstrate the combined approach in a study of proton irradiation in aluminum and employ it to predict how the depth of maximum energy deposition, the "Bragg Peak," varies depending on incident angle -- a quantity otherwise inaccessible to modelers. The lack of any experimental information requirement makes our method applicable to most materials, and its speed makes it a prime candidate for enabling quantum-to-continuum models of radiation damage. The prospect of reusing valuable TDDFT data for training the model make our approach appealing for applications in the age of materials data science.
Harnessing machine learning for accurate treatment of overlapping opacity species in GCMs
paper_authors: Aaron David Schneider, Paul Mollière, Gilles Louppe, Ludmila Carone, Uffe Gråe Jørgensen, Leen Decin, Christiane Helling
for: This study aims to improve our understanding of high-precision observations of exoplanets and brown dwarfs, in particular by simulating the coupling between chemistry and radiation in detailed general circulation models (GCMs).
methods: The study compares several methods for mixing the correlated-k opacities of different chemical species, including a deep-learning approach (DeepSets, DS), adaptive equivalent extinction (AEE), and random overlap with rebinning and resorting (RORR).
results: The DS method is found to be both accurate and efficient for GCM use, whereas RORR is too slow. The accuracy of AEE depends on its specific implementation and may introduce numerical issues in reaching radiative-transfer convergence. Finally, simulations of the rainout of TiO and VO confirm that rainout hinders the formation of a stratosphere.
Abstract
To understand high precision observations of exoplanets and brown dwarfs, we need detailed and complex general circulation models (GCMs) that incorporate hydrodynamics, chemistry, and radiation. In this study, we specifically examine the coupling between chemistry and radiation in GCMs and compare different methods for mixing opacities of different chemical species in the correlated-k assumption, when equilibrium chemistry cannot be assumed. We propose a fast machine learning method based on DeepSets (DS), which effectively combines individual correlated-k opacities (k-tables). We evaluate the DS method alongside other published methods like adaptive equivalent extinction (AEE) and random overlap with rebinning and resorting (RORR). We integrate these mixing methods into our GCM (expeRT/MITgcm) and assess their accuracy and performance for the example of the hot Jupiter HD~209458 b. Our findings indicate that the DS method is both accurate and efficient for GCM usage, whereas RORR is too slow. Additionally, we observe that the accuracy of AEE depends on its specific implementation and may introduce numerical issues in achieving radiative transfer solution convergence. We then apply the DS mixing method in a simplified chemical disequilibrium situation, where we model the rainout of TiO and VO, and confirm that the rainout of TiO and VO would hinder the formation of a stratosphere. To further expedite the development of consistent disequilibrium chemistry calculations in GCMs, we provide documentation and code for coupling the DS mixing method with correlated-k radiative transfer solvers. The DS method has been extensively tested to be accurate enough for GCMs, however, other methods might be needed for accelerating atmospheric retrievals.
Conformalized Deep Splines for Optimal and Efficient Prediction Sets
methods: Conditional density estimation with neural-network-parameterized splines, together with two efficient-to-compute conformal scores
results: Experimental results show that SPICE-ND models achieve the smallest average prediction set sizes, with reductions of nearly 50% on some datasets compared to the next best baseline, while SPICE-HPD models achieve the best conditional coverage relative to the baselines.
Abstract
Uncertainty estimation is critical in high-stakes machine learning applications. One effective way to estimate uncertainty is conformal prediction, which can provide predictive inference with statistical coverage guarantees. We present a new conformal regression method, Spline Prediction Intervals via Conformal Estimation (SPICE), that estimates the conditional density using neural-network-parameterized splines. We prove universal approximation and optimality results for SPICE, which are empirically validated by our experiments. SPICE is compatible with two different efficient-to-compute conformal scores, one oracle-optimal for marginal coverage (SPICE-ND) and the other asymptotically optimal for conditional coverage (SPICE-HPD). Results on benchmark datasets demonstrate SPICE-ND models achieve the smallest average prediction set sizes, including average size reductions of nearly 50% for some datasets compared to the next best baseline. SPICE-HPD models achieve the best conditional coverage compared to baselines. The SPICE implementation is made available.
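As background, the generic split-conformal recipe that such methods build on can be sketched in a few lines: calibrate a score on held-out data and threshold it at a finite-sample-corrected quantile to obtain sets with marginal coverage. This is the standard textbook procedure with an absolute-residual score and an assumed stand-in regressor, not the SPICE density-based scores.

```python
import numpy as np

def conformal_quantile(scores_cal, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores_cal)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(scores_cal, level, method="higher")

# Toy example with an absolute-residual score |y - mu(x)| for a stand-in model mu.
rng = np.random.default_rng(0)
mu = lambda x: 2.0 * x                               # pretend this is a trained regressor
x_cal = rng.normal(size=500)
y_cal = mu(x_cal) + rng.normal(scale=0.5, size=500)
q = conformal_quantile(np.abs(y_cal - mu(x_cal)), alpha=0.1)

x_test = 0.3
print("90% prediction interval:", (mu(x_test) - q, mu(x_test) + q))
```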
results: We prove that this comparison complexity is theoretically optimal with respect to the error measures considered. Experiments demonstrate the potential of applying learning-augmented algorithms to sorting tasks.
Abstract
We explore the fundamental problem of sorting through the lens of learning-augmented algorithms, where algorithms can leverage possibly erroneous predictions to improve their efficiency. We consider two different settings: In the first setting, each item is provided a prediction of its position in the sorted list. In the second setting, we assume there is a "quick-and-dirty" way of comparing items, in addition to slow-and-exact comparisons. For both settings, we design new and simple algorithms using only $O(\sum_i \log \eta_i)$ exact comparisons, where $\eta_i$ is a suitably defined prediction error for the $i$th element. In particular, as the quality of predictions deteriorates, the number of comparisons degrades smoothly from $O(n)$ to $O(n\log n)$. We prove that the comparison complexity is theoretically optimal with respect to the examined error measures. An experimental evaluation against existing adaptive and non-adaptive sorting algorithms demonstrates the potential of applying learning-augmented algorithms in sorting tasks.
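To make the first setting concrete, one standard way to obtain a per-item cost of roughly $O(\log \eta_i)$ comparisons is to insert each item with an exponential (doubling) search started at its predicted position. The sketch below illustrates that idea and is not claimed to be the paper's exact algorithm; the rank-scaling heuristic is an assumption for illustration.

```python
import bisect

def exp_search_insert(sorted_list, x, guess):
    """Insert x into sorted_list, starting an exponential search at index `guess`.
    Uses O(log d) comparisons, where d is the distance from guess to x's true position."""
    n = len(sorted_list)
    guess = max(0, min(guess, n))
    lo, hi = guess, guess
    step = 1
    # Grow the window leftwards until the left boundary no longer exceeds x.
    while lo > 0 and sorted_list[lo - 1] > x:
        lo = max(0, lo - step)
        step *= 2
    step = 1
    # Grow the window rightwards until the right boundary is at least x.
    while hi < n and sorted_list[hi] < x:
        hi = min(n, hi + step)
        step *= 2
    pos = bisect.bisect_left(sorted_list, x, lo, hi)
    sorted_list.insert(pos, x)

def sort_with_predictions(items, predicted_rank):
    """items: values to sort; predicted_rank[i]: possibly erroneous predicted final rank of items[i]."""
    out = []
    for i, x in enumerate(items):
        # Scale the predicted final rank to the current list length as the starting guess.
        guess = round(predicted_rank[i] * len(out) / max(len(items), 1))
        exp_search_insert(out, x, guess)
    return out

print(sort_with_predictions([5, 1, 4, 2, 3], [4, 0, 3, 1, 2]))  # -> [1, 2, 3, 4, 5]
```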
Decision Support Framework for Home Health Caregiver Allocation: A Case Study of HHC Agency in Tennessee, USA
paper_authors: Seyed Mohammad Ebrahim Sharifnia, Faezeh Bagheri, Rupy Sawhney, John E. Kobza, Enrique Macias De Anda, Mostafa Hajiaghaei-Keshteli, Michael Mirrielees
results: Using data from a home health care agency in Tennessee, USA, the proposed approach reduced average travel mileage by up to 42% (depending on discipline) and increased the number of visits per planning period without imposing restrictions on caregivers. The framework is also used for caregiver supply analysis, providing valuable insights into caregiver resource management.
Abstract
Population aging is a global challenge, leading to increased demand for healthcare and social services for the elderly. Home Health Care (HHC) emerges as a vital solution, specifically designed to serve this population segment. Given the surging demand for HHC, it's essential to coordinate and regulate caregiver allocation efficiently. This is crucial for both budget-optimized planning and ensuring the delivery of high-quality care. This research addresses a key question faced by home health agencies (HHAs): "How can caregiver allocation be optimized, especially when caregivers prefer flexibility in their visiting sequences?". While earlier studies proposed rigid visiting sequences, our study introduces a decision support framework that allocates caregivers through a hybrid method that considers the flexibility in visiting sequences and aims to reduce travel mileage, increase the number of visits per planning period, and maintain the continuity of care - a critical metric for patient satisfaction. Utilizing data from an HHA in Tennessee, United States, our approach led to an impressive reduction in average travel mileage (up to 42% depending on discipline) without imposing restrictions on caregivers. Furthermore, the proposed framework is used for caregivers' supply analysis to provide valuable insights into caregiver resource management.
Software Repositories and Machine Learning Research in Cyber Security
paper_authors: Mounika Vanamala, Keith Bryant, Alex Caravella
For: This study aims to improve vulnerability detection in the early stages of software development, notably the requirements phase, by leveraging cyber security repositories such as MITRE's CAPEC and the CVE databases together with topic modeling and machine learning.
Methods: The study employs a range of machine learning methods, including LDA and topic modeling as well as SVM, Naïve Bayes, random forest, and neural networks, eventually transitioning into deep learning.
Results: The study indicates that machine learning can enhance the identification of vulnerabilities in the early stages of software development across diverse development scenarios, offering crucial assistance to developers in building secure software.
Abstract
In today's rapidly evolving technological landscape and advanced software development, the rise in cyber security attacks has become a pressing concern. The integration of robust cyber security defenses has become essential across all phases of software development. It holds particular significance in identifying critical cyber security vulnerabilities at the initial stages of the software development life cycle, notably during the requirement phase. Through the utilization of cyber security repositories like The Common Attack Pattern Enumeration and Classification (CAPEC) from MITRE and the Common Vulnerabilities and Exposures (CVE) databases, attempts have been made to leverage topic modeling and machine learning for the detection of these early-stage vulnerabilities in the software requirements process. Past research themes have returned successful outcomes in attempting to automate vulnerability identification for software developers, employing a mixture of unsupervised machine learning methodologies such as LDA and topic modeling. Looking ahead, in our pursuit to improve automation and establish connections between software requirements and vulnerabilities, our strategy entails adopting a variety of supervised machine learning techniques. This array encompasses Support Vector Machines (SVM), Na\"ive Bayes, random forest, neural networking and eventually transitioning into deep learning for our investigation. In the face of the escalating complexity of cyber security, the question of whether machine learning can enhance the identification of vulnerabilities in diverse software development scenarios is a paramount consideration, offering crucial assistance to software developers in developing secure software.
Deep Learning-Based Classification of Gamma Photon Interactions in Room-Temperature Semiconductor Radiation Detectors
results: Trained on simulated data and validated on both simulated and experimental data, the CoPhNet model distinguishes Compton scattering from photoelectric events in CdZnTeSe (CZTS) semiconductor detectors with high accuracy, and its performance remains robust under shifts in operating parameters such as signal-to-noise ratio (SNR) and incident energy.
Abstract
Photon counting radiation detectors have become an integral part of medical imaging modalities such as Positron Emission Tomography or Computed Tomography. One of the most promising detector classes is wide-bandgap room-temperature semiconductor detectors, in which the interaction of gamma/x-ray photons with the detector material involves Compton scattering, leading to multiple interaction photon events (MIPEs) from a single photon. For semiconductor detectors like CdZnTeSe (CZTS), which have a high overlap of detected energies between Compton and photoelectric events, it is nearly impossible to distinguish Compton scattered events from photoelectric events using conventional readout electronics or signal processing algorithms. Herein, we report a deep learning classifier, CoPhNet, that distinguishes between Compton scattering and photoelectric interactions of gamma/x-ray photons with CdZnTeSe (CZTS) semiconductor detectors. Our CoPhNet model was trained using simulated data designed to resemble actual CZTS detector pulses and validated using both simulated and experimental data. These results demonstrate that our CoPhNet model achieves high classification accuracy on the simulated test set. It also remains robust under operating-parameter shifts such as signal-to-noise ratio (SNR) and incident energy. Our work thus lays a solid foundation for developing next-generation high-energy gamma-ray detectors for better biomedical imaging.
Complexity of Single Loop Algorithms for Nonlinear Programming with Stochastic Objective and Constraints
results: In the three cases considered, finding a point that satisfies $\varepsilon$-approximate first-order conditions requires $\widetilde{O}(\varepsilon^{-3})$, $\widetilde{O}(\varepsilon^{-4})$, and $\widetilde{O}(\varepsilon^{-5})$ complexity, respectively; all of these match the best-known guarantees.
Abstract
We analyze the complexity of single-loop quadratic penalty and augmented Lagrangian algorithms for solving nonconvex optimization problems with functional equality constraints. We consider three cases, in all of which the objective is stochastic and smooth, that is, an expectation over an unknown distribution that is accessed by sampling. The nature of the equality constraints differs among the three cases: deterministic and linear in the first case, deterministic, smooth and nonlinear in the second case, and stochastic, smooth and nonlinear in the third case. Variance reduction techniques are used to improve the complexity. To find a point that satisfies $\varepsilon$-approximate first-order conditions, we require $\widetilde{O}(\varepsilon^{-3})$ complexity in the first case, $\widetilde{O}(\varepsilon^{-4})$ in the second case, and $\widetilde{O}(\varepsilon^{-5})$ in the third case. For the first and third cases, they are the first algorithms of "single loop" type (that also use $O(1)$ samples at each iteration) that still achieve the best-known complexity guarantees.
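For reference, for a problem $\min_x f(x)$ subject to $c(x)=0$ the augmented Lagrangian is
\[ \mathcal{L}_{\rho}(x,\lambda) \;=\; f(x) \;+\; \lambda^{\top} c(x) \;+\; \tfrac{\rho}{2}\,\|c(x)\|^{2}, \]
with the quadratic penalty method corresponding to dropping the multiplier term. A single-loop scheme of the kind analyzed here updates $x$ (via stochastic, variance-reduced gradient estimates), the multipliers, and the penalty within one loop rather than solving an inner subproblem to tolerance at each outer iteration. The formula is the standard definition; the specific update rules are the paper's contribution and are not reproduced here.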
Last-Iterate Convergence Properties of Regret-Matching Algorithms in Games
paper_authors: Yang Cai, Gabriele Farina, Julien Grand-Clément, Christian Kroer, Chung-Wei Lee, Haipeng Luo, Weiqiang Zheng
for: solving large-scale two-player zero-sum games in practice
methods: regret matching$^+$ (RM$^+$) and its variants
results: last-iterate convergence properties of various popular variants of RM$^+$
Algorithms based on regret matching, specifically regret matching$^+$ (RM$^+$), and its variants are the most popular approaches for solving large-scale two-player zero-sum games in practice. Unlike algorithms such as optimistic gradient descent ascent, which have strong last-iterate and ergodic convergence properties for zero-sum games, virtually nothing is known about the last-iterate properties of regret-matching algorithms. Given the importance of last-iterate convergence for numerical optimization reasons and relevance as modeling real-word learning in games, in this paper, we study the last-iterate convergence properties of various popular variants of RM$^+$. First, we show numerically that several practical variants such as simultaneous RM$^+$, alternating RM$^+$, and simultaneous predictive RM$^+$, all lack last-iterate convergence guarantees even on a simple $3\times 3$ game. We then prove that recent variants of these algorithms based on a smoothing technique do enjoy last-iterate convergence: we prove that extragradient RM$^{+}$ and smooth Predictive RM$^+$ enjoy asymptotic last-iterate convergence (without a rate) and $1/\sqrt{t}$ best-iterate convergence. Finally, we introduce restarted variants of these algorithms, and show that they enjoy linear-rate last-iterate convergence.
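For reference, a minimal sketch of simultaneous RM$^+$ self-play on a zero-sum matrix game is given below; the payoff matrix and iteration count are illustrative, and the predictive, alternating, extragradient, smooth, and restarted variants studied in the paper modify this basic update. Consistent with the paper's observations, the average strategies converge to equilibrium while the last iterates may keep cycling.

```python
import numpy as np

def rm_plus(A, T=10000):
    """Simultaneous regret matching+ self-play on the zero-sum game max_x min_y x^T A y."""
    m, n = A.shape
    Rx, Ry = np.zeros(m), np.zeros(n)            # nonnegative regret accumulators
    x_avg, y_avg = np.zeros(m), np.zeros(n)
    for t in range(T):
        x = Rx / Rx.sum() if Rx.sum() > 0 else np.full(m, 1.0 / m)
        y = Ry / Ry.sum() if Ry.sum() > 0 else np.full(n, 1.0 / n)
        ux = A @ y                                # utility of each row action for the max player
        uy = -(A.T @ x)                           # utility of each column action for the min player
        Rx = np.maximum(Rx + ux - x @ ux, 0.0)    # RM+ update: truncate negative regret at zero
        Ry = np.maximum(Ry + uy - y @ uy, 0.0)
        x_avg += x
        y_avg += y
    return x, y, x_avg / T, y_avg / T             # last iterates and average iterates

# Rock-paper-scissors payoff for the row player (illustrative example; Nash is uniform).
A = np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]])
x_last, y_last, x_bar, y_bar = rm_plus(A)
print("average strategy:", np.round(x_bar, 3), "last iterate:", np.round(x_last, 3))
```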
Recovering Linear Causal Models with Latent Variables via Cholesky Factorization of Covariance Matrix
results: On synthetic and real-world data, the algorithm is significantly faster than previous methods and achieves state-of-the-art performance. Under the equal error variances assumption, an optimization procedure is further incorporated to handle DAG recovery with latent variables, and numerical simulations demonstrate its effectiveness.
Abstract
Discovering the causal relationship via recovering the directed acyclic graph (DAG) structure from the observed data is a well-known challenging combinatorial problem. When there are latent variables, the problem becomes even more difficult. In this paper, we first propose a DAG structure recovering algorithm, which is based on the Cholesky factorization of the covariance matrix of the observed data. The algorithm is fast and easy to implement and has theoretical grantees for exact recovery. On synthetic and real-world datasets, the algorithm is significantly faster than previous methods and achieves the state-of-the-art performance. Furthermore, under the equal error variances assumption, we incorporate an optimization procedure into the Cholesky factorization based algorithm to handle the DAG recovering problem with latent variables. Numerical simulations show that the modified "Cholesky + optimization" algorithm is able to recover the ground truth graph in most cases and outperforms existing algorithms.
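For context, in the linear case the connection to the Cholesky factorization is direct: writing the structural equation model as $x = Bx + \varepsilon$ with $B$ strictly lower triangular under a causal ordering and $\mathrm{Cov}(\varepsilon) = D$ diagonal, the covariance satisfies
\[ \Sigma \;=\; (I - B)^{-1} D\, (I - B)^{-\top}, \]
so an $LDL^\top$ (Cholesky-type) factorization of $\Sigma$ in that ordering recovers $(I-B)^{-1}$ and hence the edge weights $B$. This is the standard identity behind such approaches, not necessarily the paper's exact algorithm, which must also handle the unknown ordering and latent variables.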
results: The study finds that this translation can be used to stitch the encoders and decoders of neural models, adapting effectively across different trainings, domains, architectures, and downstream tasks. Moreover, text encoders and vision decoders can be stitched zero-shot, even though the two models were never trained jointly.
Abstract
While different neural models often exhibit latent spaces that are alike when exposed to semantically related data, this intrinsic similarity is not always immediately discernible. Towards a better understanding of this phenomenon, our work shows how representations learned from these neural modules can be translated between different pre-trained networks via simpler transformations than previously thought. An advantage of this approach is the ability to estimate these transformations using standard, well-understood algebraic procedures that have closed-form solutions. Our method directly estimates a transformation between two given latent spaces, thereby enabling effective stitching of encoders and decoders without additional training. We extensively validate the adaptability of this translation procedure in different experimental settings: across various trainings, domains, architectures (e.g., ResNet, CNN, ViT), and in multiple downstream tasks (classification, reconstruction). Notably, we show how it is possible to zero-shot stitch text encoders and vision decoders, or vice-versa, yielding surprisingly good classification performance in this multimodal setting.
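One closed-form alignment of the kind alluded to (an illustrative sketch under the assumption of paired anchor latents, not necessarily the authors' exact estimator) is orthogonal Procrustes: given corresponding latent vectors stacked in $A$ and $B$, the orthogonal map minimizing $\|AT-B\|_F$ is $T=UV^{\top}$ from the SVD $A^{\top}B=U\Sigma V^{\top}$.

```python
import numpy as np

def orthogonal_procrustes(A, B):
    """Closed-form orthogonal map T minimizing ||A @ T - B||_F for paired latents A, B (n x d)."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt

# Toy example: B is a rotated (plus slightly noisy) copy of A; recover the rotation from anchors.
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 16))                    # latents from encoder 1 (illustrative)
Q, _ = np.linalg.qr(rng.normal(size=(16, 16)))    # ground-truth orthogonal map
B = A @ Q + 0.01 * rng.normal(size=A.shape)       # latents from encoder 2
T = orthogonal_procrustes(A, B)
print("alignment error:", np.linalg.norm(A @ T - B) / np.linalg.norm(B))
```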
Online Signal Estimation on the Graph Edges via Line Graph Transformation
results: The method accurately predicts graph edge signals and can operate online in real time.
Abstract
We propose the Line Graph Normalized Least Mean Square (LGNLMS) algorithm for online time-varying graph edge signals prediction. LGNLMS utilizes the Line Graph to transform graph edge signals into the node of its edge-to-vertex dual. This enables edge signals to be processed using established GSP concepts without redefining them on graph edges.
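For concreteness, the edge-to-vertex dual used here (the line graph) can be built with networkx; the sketch below shows only the transformation, not the NLMS filtering on top of it.

```python
import networkx as nx

G = nx.path_graph(4)          # nodes 0-1-2-3, edges (0,1), (1,2), (2,3)
L = nx.line_graph(G)          # edge-to-vertex dual: each edge of G becomes a node of L
print(sorted(L.nodes()))      # [(0, 1), (1, 2), (2, 3)]
print(sorted(L.edges()))      # L's edges connect G-edges that share an endpoint
```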
Kronecker-Factored Approximate Curvature for Modern Neural Network Architectures
results: The study finds that K-FAC can speed up neural network training, reaching a fixed validation metric target in $50$-$75\%$ of the steps of a first-order reference run. Moreover, the two K-FAC variants perform similarly when training a graph neural network and a vision transformer.
Abstract
The core components of many modern neural network architectures, such as transformers, convolutional, or graph neural networks, can be expressed as linear layers with $\textit{weight-sharing}$. Kronecker-Factored Approximate Curvature (K-FAC), a second-order optimisation method, has shown promise to speed up neural network training and thereby reduce computational costs. However, there is currently no framework to apply it to generic architectures, specifically ones with linear weight-sharing layers. In this work, we identify two different settings of linear weight-sharing layers which motivate two flavours of K-FAC -- $\textit{expand}$ and $\textit{reduce}$. We show that they are exact for deep linear networks with weight-sharing in their respective setting. Notably, K-FAC-reduce is generally faster than K-FAC-expand, which we leverage to speed up automatic hyperparameter selection via optimising the marginal likelihood for a Wide ResNet. Finally, we observe little difference between these two K-FAC variations when using them to train both a graph neural network and a vision transformer. However, both variations are able to reach a fixed validation metric target in $50$-$75\%$ of the number of steps of a first-order reference run, which translates into a comparable improvement in wall-clock time. This highlights the potential of applying K-FAC to modern neural network architectures.
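As a reminder of the basic K-FAC approximation (which the expand and reduce variants adapt to weight-sharing layers), for a linear layer with input activations $a$ and output pre-activation gradients $g$, the corresponding Fisher/GGN block is approximated by a Kronecker product
\[ F \;\approx\; \mathbb{E}\big[a a^{\top}\big] \otimes \mathbb{E}\big[g g^{\top}\big], \]
whose inverse factorizes as the Kronecker product of the two small inverses, which is what makes the preconditioner cheap to apply.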
Controllable Music Production with Diffusion Models and Guidance Gradients
methods: The paper uses sampling-time guidance of diffusion models for music generation, supporting both reconstruction and classification losses, or any combination of the two.
results: The approach generates music that matches its surrounding context or conforms to a class distribution or latent representation specified by a suitable pre-trained classifier or embedding model.
Abstract
We demonstrate how conditional generation from diffusion models can be used to tackle a variety of realistic tasks in the production of music in 44.1kHz stereo audio with sampling-time guidance. The scenarios we consider include continuation, inpainting and regeneration of musical audio, the creation of smooth transitions between two different music tracks, and the transfer of desired stylistic characteristics to existing audio clips. We achieve this by applying guidance at sampling time in a simple framework that supports both reconstruction and classification losses, or any combination of the two. This approach ensures that generated audio can match its surrounding context, or conform to a class distribution or latent representation specified relative to any suitable pre-trained classifier or embedding model.
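Schematically, guidance of this kind perturbs each denoising step with the gradient of the chosen loss evaluated on the model's current estimate $\hat{x}_0$ of the clean audio,
\[ x_{t-1} \;\leftarrow\; x_{t-1} \;-\; s\,\nabla_{x_t} L\big(\hat{x}_0(x_t)\big), \]
where $L$ is a reconstruction or classification loss (or a weighted combination) and $s$ is a guidance scale; this sketches the general mechanism of guidance gradients rather than the paper's exact update.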
A Collaborative Filtering-Based Two Stage Model with Item Dependency for Course Recommendation
results: Experiments show that the method achieves an AUC as high as 0.97 on a real-world dataset.
Abstract
Recommender systems have been studied for decades, with numerous promising models proposed. Among them, Collaborative Filtering (CF) models are arguably the most successful due to their high accuracy in recommendation and their elimination of privacy-sensitive personal meta-data from training. This paper extends the usage of CF-based models to the task of course recommendation. We point out several challenges in applying existing CF models to build a course recommendation engine, including the lack of ratings and meta-data, the imbalance of the course registration distribution, and the need for course dependency modeling. We then propose several ideas to address these challenges. Eventually, we combine a two-stage CF model regularized by course dependency with a graph-based recommender based on a course-transition network, to achieve an AUC as high as 0.97 on a real-world dataset.
Structure Learning with Adaptive Random Neighborhood Informed MCMC
paper_authors: Alberto Caron, Xitong Liang, Samuel Livingstone, Jim Griffin
For: The paper addresses learning the structure of Directed Acyclic Graphs (DAGs) from observational data, using a fully-Bayesian approach with a novel Markov Chain Monte Carlo (MCMC) sampler called PARNI-DAG.
Methods: PARNI-DAG uses a locally informed, adaptive random neighborhood proposal to efficiently sample DAGs, together with a pre-tuning procedure for the sampler's parameters to ensure better scalability.
Results: PARNI-DAG quickly converges to high-probability regions, is less likely to get stuck in local modes in high-dimensional settings, and is demonstrated to be effective at learning DAG structures in a variety of experiments.
Abstract
In this paper, we introduce a novel MCMC sampler, PARNI-DAG, for a fully-Bayesian approach to the problem of structure learning under observational data. Under the assumption of causal sufficiency, the algorithm allows for approximate sampling directly from the posterior distribution on Directed Acyclic Graphs (DAGs). PARNI-DAG performs efficient sampling of DAGs via locally informed, adaptive random neighborhood proposal that results in better mixing properties. In addition, to ensure better scalability with the number of nodes, we couple PARNI-DAG with a pre-tuning procedure of the sampler's parameters that exploits a skeleton graph derived through some constraint-based or scoring-based algorithms. Thanks to these novel features, PARNI-DAG quickly converges to high-probability regions and is less likely to get stuck in local modes in the presence of high correlation between nodes in high-dimensional settings. After introducing the technical novelties in PARNI-DAG, we empirically demonstrate its mixing efficiency and accuracy in learning DAG structures on a variety of experiments.
Flexible Tails for Normalising Flows, with Application to the Modelling of Financial Return Data
results: Applying this approach, extreme shocks in financial returns can be modelled, and new synthetic sets of potentially extreme returns can be generated.
Abstract
We propose a transformation capable of altering the tail properties of a distribution, motivated by extreme value theory, which can be used as a layer in a normalizing flow to approximate multivariate heavy tailed distributions. We apply this approach to model financial returns, capturing potentially extreme shocks that arise in such data. The trained models can be used directly to generate new synthetic sets of potentially extreme returns.
Revealing CNN Architectures via Side-Channel Analysis in Dataflow-based Inference Accelerators
methods: The paper exploits the spatial and temporal data reuse of the dataflow mapping, together with architectural hints, to mount a memory-based side-channel attack on dataflow-based CNN inference accelerators and recover the structure of CNN models.
results: Experimental results show that the attack successfully recovers the structures of popular CNN models such as Lenet, Alexnet, and VGGnet16.
Abstract
Convolution Neural Networks (CNNs) are widely used in various domains. Recent advances in dataflow-based CNN accelerators have enabled CNN inference in resource-constrained edge devices. These dataflow accelerators utilize inherent data reuse of convolution layers to process CNN models efficiently. Concealing the architecture of CNN models is critical for privacy and security. This paper evaluates memory-based side-channel information to recover CNN architectures from dataflow-based CNN inference accelerators. The proposed attack exploits spatial and temporal data reuse of the dataflow mapping on CNN accelerators and architectural hints to recover the structure of CNN models. Experimental results demonstrate that our proposed side-channel attack can recover the structures of popular CNN models, namely Lenet, Alexnet, and VGGnet16.
Transfer learning for improved generalizability in causal physics-informed neural networks for beam simulations
results: Experiments show that the proposed method converges quickly and provides accurate results under various initial conditions, including noisy initial data. The approach is further demonstrated for the Timoshenko beam over an extended spatial and temporal domain, and comparisons with state-of-the-art physics-informed methods show that it accurately captures the beams' inherent dynamics.
Abstract
This paper introduces a novel methodology for simulating the dynamics of beams on elastic foundations. Specifically, Euler-Bernoulli and Timoshenko beam models on the Winkler foundation are simulated using a transfer learning approach within a causality-respecting physics-informed neural network (PINN) framework. Conventional PINNs encounter challenges in handling large space-time domains, even for problems with closed-form analytical solutions. A causality-respecting PINN loss function is employed to overcome this limitation, effectively capturing the underlying physics. However, it is observed that the causality-respecting PINN lacks generalizability. We propose using solutions to similar problems instead of training from scratch by employing transfer learning while adhering to causality to accelerate convergence and ensure accurate results across diverse scenarios. Numerical experiments on the Euler-Bernoulli beam highlight the efficacy of the proposed approach for various initial conditions, including those with noise in the initial data. Furthermore, the potential of the proposed method is demonstrated for the Timoshenko beam in an extended spatial and temporal domain. Several comparisons suggest that the proposed method accurately captures the inherent dynamics, outperforming the state-of-the-art physics-informed methods under standard $L^2$-norm metric and accelerating convergence.
摘要
Traditional PINNs struggle with large space-time domains, even for problems with known analytical solutions. To overcome this limitation, the authors employ a causality-respecting PINN loss function that effectively captures the underlying physics. However, this approach lacks generalizability.To address this issue, the authors propose using transfer learning while adhering to causality to accelerate convergence and ensure accurate results across diverse scenarios. The proposed method is tested on the Euler-Bernoulli beam for various initial conditions, including noisy data, and demonstrates improved accuracy and faster convergence compared to state-of-the-art physics-informed methods.The proposed method is also applied to the Timoshenko beam in a larger spatial and temporal domain, showing its potential for simulating the dynamics of more complex beam systems. The results suggest that the proposed method accurately captures the inherent dynamics of the beams, outperforming existing methods under the standard $L^2$-norm metric.
Personalized Assignment to One of Many Treatment Arms via Regularized and Clustered Joint Assignment Forests
results: 通过 simulations 和理论模型,我们发现使用聚合信息可以减少噪声,并且可以实现个性化分配的医疗效果。Here’s a more detailed explanation of each point:
for: The paper is focused on learning personalized assignments to one of many treatment arms from a randomized controlled trial. The goal is to estimate the heterogeneous treatment effects for each arm, while accounting for the excess variance that can arise when there are many arms.
methods: The paper proposes two methods to address this challenge: (1) a regularized forest-based assignment algorithm based on greedy recursive partitioning, and (2) a clustering scheme that combines treatment arms with consistently similar outcomes. These methods pool information across treatment arms to reduce the excess variance and improve the accuracy of the treatment assignments.
results: The paper presents the results of simulations and a theoretical model that demonstrate the effectiveness of the proposed methods. The results show that the regularized optimization and clustering methods can lead to significant gains in terms of predicting arm-wise outcomes and achieving sizable utility gains from personalization.Abstract
We consider learning personalized assignments to one of many treatment arms from a randomized controlled trial. Standard methods that estimate heterogeneous treatment effects separately for each arm may perform poorly in this case due to excess variance. We instead propose methods that pool information across treatment arms: First, we consider a regularized forest-based assignment algorithm based on greedy recursive partitioning that shrinks effect estimates across arms. Second, we augment our algorithm by a clustering scheme that combines treatment arms with consistently similar outcomes. In a simulation study, we compare the performance of these approaches to predicting arm-wise outcomes separately, and document gains of directly optimizing the treatment assignment with regularization and clustering. In a theoretical model, we illustrate how a high number of treatment arms makes finding the best arm hard, while we can achieve sizable utility gains from personalization by regularized optimization.
摘要
我们考虑学习对很多治疗臂的对照试验 personnalized 任务。标准方法可能在这种情况下表现不佳,因为过度差异。我们 instead propose 方法可以聚集到治疗臂上的信息:首先,我们考虑一种对应树基于循环分割的调整算法,将效果估计调整到不同的臂上。其次,我们将 clustering 方案与治疗臂相结合,以实现具有相似结果的臂集合。在一个 simulated study 中,我们比较这些方法和分别预测每个臂的结果,并证明了通过调整和聚集可以实现更高的利益。在一个理论模型中,我们显示出一高数量的治疗臂使得找到最佳臂的问题困难,但是通过调整估计可以实现较大的价值增加。
Online Student-$t$ Processes with an Overall-local Scale Structure for Modelling Non-stationary Data
for: Handle time-dependent data with non-stationarity and heavy-tailed errors.
methods: Bayesian mixture of student-$t$ processes with overall-local scale structure for the covariance, and sequential Monte Carlo (SMC) sampler for online inference.
results: Superior performance compared to typical Gaussian process-based models on real-world data sets.Abstract
Time-dependent data often exhibit characteristics, such as non-stationarity and heavy-tailed errors, that would be inappropriate to model with the typical assumptions used in popular models. Thus, more flexible approaches are required to be able to accommodate such issues. To this end, we propose a Bayesian mixture of student-$t$ processes with an overall-local scale structure for the covariance. Moreover, we use a sequential Monte Carlo (SMC) sampler in order to perform online inference as data arrive in real-time. We demonstrate the superiority of our proposed approach compared to typical Gaussian process-based models on real-world data sets in order to prove the necessity of using mixtures of student-$t$ processes.
摘要
时间相关数据经常具有非站立性和重 tailed 错误特点,这些特点不适合使用流行的模型假设。因此,需要更 flexible 的方法来处理这些问题。为此,我们提议使用 Bayesian mixture of student-$t$ processes 以及全局本地尺度结构来描述协方差。此外,我们还使用 sequential Monte Carlo (SMC) 样本器进行在线推断,以便在实时接收数据时进行推断。我们通过对实际数据集进行比较,证明了我们的提议方法的必要性,并且超过了 typical Gaussian process-based 模型。
Learning to optimize by multi-gradient for multi-objective optimization
methods: 本研究提出了一种基于自动学习的多 gradient 学习方法(ML2O),该方法可以自动学习一个生成器或映射,以更新方向。此外,我们还提出了一种受保护的多 gradient 学习方法(GML2O),并证明其迭代序列会 converges to a Pareto 稳定点。
results: 实验结果表明,我们学习的优化器在训练多任务学习(MTL)神经网络时表现更高效,比手动设计的竞争对手。Abstract
The development of artificial intelligence (AI) for science has led to the emergence of learning-based research paradigms, necessitating a compelling reevaluation of the design of multi-objective optimization (MOO) methods. The new generation MOO methods should be rooted in automated learning rather than manual design. In this paper, we introduce a new automatic learning paradigm for optimizing MOO problems, and propose a multi-gradient learning to optimize (ML2O) method, which automatically learns a generator (or mappings) from multiple gradients to update directions. As a learning-based method, ML2O acquires knowledge of local landscapes by leveraging information from the current step and incorporates global experience extracted from historical iteration trajectory data. By introducing a new guarding mechanism, we propose a guarded multi-gradient learning to optimize (GML2O) method, and prove that the iterative sequence generated by GML2O converges to a Pareto critical point. The experimental results demonstrate that our learned optimizer outperforms hand-designed competitors on training multi-task learning (MTL) neural network.
摘要
人工智能(AI)的发展对科学研究带来了新的学习基本设计方法的需求,这需要我们重新评估多目标优化(MOO)方法的设计。新一代MOO方法应该是基于自动学习而不是手动设计。在这篇论文中,我们介绍了一种新的自动学习 парадиг,用于优化MOO问题,并提出了多Gradient学习优化(ML2O)方法。这种学习基于的方法可以自动学习一个生成器(或映射),以更新方向。通过利用当前步骤中的信息和历史迭代轨迹数据来捕捉当地特征,ML2O方法可以学习当地景观。我们还提出了一种新的保护机制,称为卫护多Gradient学习优化(GML2O)方法,并证明其迭代序列会 converge to a Pareto优点。实验结果表明,我们学习优化器比手动设计的竞争对手在训练多任务学习(MTL)神经网络上表现更好。
Machine Learning Without a Processor: Emergent Learning in a Nonlinear Electronic Metamaterial
paper_authors: Sam Dillavou, Benjamin D Beyer, Menachem Stern, Marc Z Miskin, Andrea J Liu, Douglas J Durian
for: 这项研究旨在开发一种可以进行快速、能效的分析学习机制,以替代传统的深度学习算法。
methods: 研究人员使用了一种基于晶体管的非线性学习元件,实现了无计算机的非线性学习。
results: 研究人员发现,这种非线性学习元件可以完成传统 linear 系统无法实现的任务,包括 XOR 和非线性回归。此外,这种系统还具有较低的能耗和可重新编辑的特点。Abstract
Standard deep learning algorithms require differentiating large nonlinear networks, a process that is slow and power-hungry. Electronic learning metamaterials offer potentially fast, efficient, and fault-tolerant hardware for analog machine learning, but existing implementations are linear, severely limiting their capabilities. These systems differ significantly from artificial neural networks as well as the brain, so the feasibility and utility of incorporating nonlinear elements have not been explored. Here we introduce a nonlinear learning metamaterial -- an analog electronic network made of self-adjusting nonlinear resistive elements based on transistors. We demonstrate that the system learns tasks unachievable in linear systems, including XOR and nonlinear regression, without a computer. We find our nonlinear learning metamaterial reduces modes of training error in order (mean, slope, curvature), similar to spectral bias in artificial neural networks. The circuitry is robust to damage, retrainable in seconds, and performs learned tasks in microseconds while dissipating only picojoules of energy across each transistor. This suggests enormous potential for fast, low-power computing in edge systems like sensors, robotic controllers, and medical devices, as well as manufacturability at scale for performing and studying emergent learning.
摘要
标准深度学习算法需要分别大的非线性网络,这是一个缓态且电力消耗很大的过程。电子学习元件提供了可能快速、效率高、错误快速修复的硬件 для数位机器学习,但现有的实现方法是线性的,这限制了它们的能力。这些系统与人工神经网络以及大脑有所不同,因此尚未探讨了非线性元素的可行性和价值。我们现在引入了非线性学习元件---一个基于普通遮蔽器的数位电子网络。我们证明了这个系统可以进行线性系统无法进行的任务,包括XOR和非线性回归,并且不需要电脑。我们发现了我们的非线性学习元件可以将训练错误分解为不同的模式(平均值、斜率、曲线),类似于人工神经网络的spectral bias。这个网络的普通遮蔽器是可靠的、可重复 trains 秒钟内,并且在微秒钟内完成学习任务,同时电子普通遮蔽器只消耗了每个普通遮蔽器的picojoules的能量。这表明这种快速、低功率的计算在边缘系统中,如感测器、 robot控制器和医疗设备,以及在大规模生产中进行和研究emergent learning的可能性很大。
results: 这个智能噪音减少系统可以干预低频噪音,并且搭配睡眠追踪、音乐播放等应用程序,可以提供轻松、安全、智能的噪音减少解决方案。Abstract
While our world is filled with its own natural sounds that we can't resist enjoying, it is also chock-full of other sounds that can be irritating, this is noise. Noise not only influences the working efficiency but also the human's health. The problem of reducing noise is one of great importance and great difficulty. The problem has been addressed in many ways over the years. The current methods for noise reducing mostly rely on the materials and transmission medium, which are only effective to some extent for the high frequency noise. However, the effective reduction noise method especially for low frequency noise is very limited. Here we come up with a noise reduction system consist of a sensor to detect the noise in the environment. Then the noise will be sent to an electronic control system to process the noise, which will generate a reverse phase frequency signal to counteract the disturbance. Finally, the processed smaller noise will be broadcasted by the speaker. Through this smart noise reduction system, even the noise with low-frequency can be eliminated. The system is also integrated with sleep tracking and music player applications. It can also remember and store settings for the same environment, sense temperature, and smart control of home furniture, fire alarm, etc. This smart system can transfer data easily by Wi-Fi or Bluetooth and controlled by its APP. In this project, we will present a model of the above technology which can be used in various environments to prevent noise pollution and provide a solution to the people who have difficulties finding a peaceful and quiet environment for sleep, work or study.
摘要
在我们的世界中,充满自然的声音是我们无法抵抗的乐趣之一,但是这些声音也包括了吵闹和干扰的声音,它们不仅影响工作效率,还影响人类的健康。减少吵闹的问题是一项非常重要且具有挑战性的问题。过去多年来,人们已经使用了许多方法来解决这个问题,但现有的减少吵闹方法主要依靠材料和传输媒体,它们只能够有限地降低高频吵闹。然而,对低频吵闹的有效减少方法尚存在很大的限制。为了解决这问题,我们提出了一种吵闹减少系统,该系统包括一个检测环境吵闹的感知器。然后,吵闹将被传输到一个电子控制系统进行处理,该系统会生成一个逆相频率信号,以抵消干扰。最后,已经处理过的小于吵闹将被广播器播发出来。通过这个智能吵闹减少系统,甚至可以减少低频吵闹。这个系统还 integrates 睡眠跟踪和音乐播放应用程序,可以记录和存储相同环境的设置,感测 темпераature 和智能控制家具、火警等。这个智能系统可以通过 Wi-Fi 或蓝牙传输数据,并由其APP控制。在这个项目中,我们将展示一种可以在不同环境中应用的技术模型,用于防止吵闹污染和提供舒适的睡眠、工作或学习环境。
Real-Time Magnetic Tracking and Diagnosis of COVID-19 via Machine Learning
paper_authors: Dang Nguyen, Phat K. Huynh, Vinh Duc An Bui, Kee Young Hwang, Nityanand Jain, Chau Nguyen, Le Huu Nhat Minh, Le Van Truong, Xuan Thanh Nguyen, Dinh Hoang Nguyen, Le Tien Dung, Trung Q. Le, Manh-Huong Phan
results: 研究发现,这种检测平台可以准确地识别 COVID-19 患者和健康者,准确率高达 90%。Abstract
The COVID-19 pandemic underscored the importance of reliable, noninvasive diagnostic tools for robust public health interventions. In this work, we fused magnetic respiratory sensing technology (MRST) with machine learning (ML) to create a diagnostic platform for real-time tracking and diagnosis of COVID-19 and other respiratory diseases. The MRST precisely captures breathing patterns through three specific breath testing protocols: normal breath, holding breath, and deep breath. We collected breath data from both COVID-19 patients and healthy subjects in Vietnam using this platform, which then served to train and validate ML models. Our evaluation encompassed multiple ML algorithms, including support vector machines and deep learning models, assessing their ability to diagnose COVID-19. Our multi-model validation methodology ensures a thorough comparison and grants the adaptability to select the most optimal model, striking a balance between diagnostic precision with model interpretability. The findings highlight the exceptional potential of our diagnostic tool in pinpointing respiratory anomalies, achieving over 90% accuracy. This innovative sensor technology can be seamlessly integrated into healthcare settings for patient monitoring, marking a significant enhancement for the healthcare infrastructure.
摘要
COVID-19 大流行强调了可靠、不侵入的诊断工具的重要性,以便实施有效的公共卫生措施。在这项工作中,我们将磁力呼吸感测技术(MRST)与机器学习(ML)相结合,创造了一个用于实时跟踪和诊断COVID-19和其他呼吸疾病的诊断平台。MRST可以高精度地捕捉呼吸模式,通过三种呼吸测试协议:正常呼吸、停吸和深呼吸。我们在越南collected呼吸数据,并使用这个平台来训练和验证ML模型。我们的评估包括多种ML算法,包括支持向量机和深度学习模型,评估它们是否能够诊断COVID-19。我们的多模型验证方法可以很好地比较多种模型,从而选择最佳模型,并且能够寻求在诊断精度和模型解释性之间取得平衡。研究发现,我们的诊断工具可以准确地检测呼吸异常,达到90%以上的准确率。这种创新的感测技术可以轻松地integrated into healthcare settings,为患者监测提供了 significanthenhancement。
Retrieval-Based Reconstruction For Time-series Contrastive Learning
results: 该研究通过多种Modalities的验证实验表明,在REBAR contrastive learning框架中,可以学习一个高效的嵌入,并且在下游任务上达到了 estado-of-the-art 表现。Abstract
The success of self-supervised contrastive learning hinges on identifying positive data pairs that, when pushed together in embedding space, encode useful information for subsequent downstream tasks. However, in time-series, this is challenging because creating positive pairs via augmentations may break the original semantic meaning. We hypothesize that if we can retrieve information from one subsequence to successfully reconstruct another subsequence, then they should form a positive pair. Harnessing this intuition, we introduce our novel approach: REtrieval-BAsed Reconstruction (REBAR) contrastive learning. First, we utilize a convolutional cross-attention architecture to calculate the REBAR error between two different time-series. Then, through validation experiments, we show that the REBAR error is a predictor of mutual class membership, justifying its usage as a positive/negative labeler. Finally, once integrated into a contrastive learning framework, our REBAR method can learn an embedding that achieves state-of-the-art performance on downstream tasks across various modalities.
摘要
文章标题:基于检索的重构减错学习文章摘要:自监督减错学习的成功取决于标识符合的数据对,使得在嵌入空间中拼接起来的信息具有下游任务的用于。然而,在时序序列中,通过扩展可能会破坏原始 semantics 的含义。我们假设如果可以从一个子序列中检索到另一个子序列,并成功地重构它,那么它们应该组成一个正确的对。基于这个假设,我们介绍了我们的新方法:REtrieval-BAsed Reconstruction(REBAR)减错学习。首先,我们使用卷积 convolutional cross-attention 架构来计算 REBAR 错误 между两个不同的时序序列。然后,通过验证实验,我们表明 REBAR 错误是分类成员之间的共同标识符,因此可以作为正/负标签。最后,我们将 REBAR 方法integrated into a contrastive learning framework,可以学习一个在不同模式下 achieve state-of-the-art 的嵌入。
Fixed-Budget Best-Arm Identification in Sparse Linear Bandits
results: 研究人员通过仔细选择几何参数(如lasso的正则化参数),并在两个阶段中均衡错误概率,从而得到了较低的错误概率。此外,研究人员还证明了lasso-od算法在稀疏和高维的线性弹性中具有几乎最佳性。最后,通过数值示例,研究人员证明了lasso-od算法在非稀疏的线性弹性中的显著性能提高。Abstract
We study the best-arm identification problem in sparse linear bandits under the fixed-budget setting. In sparse linear bandits, the unknown feature vector $\theta^*$ may be of large dimension $d$, but only a few, say $s \ll d$ of these features have non-zero values. We design a two-phase algorithm, Lasso and Optimal-Design- (Lasso-OD) based linear best-arm identification. The first phase of Lasso-OD leverages the sparsity of the feature vector by applying the thresholded Lasso introduced by Zhou (2009), which estimates the support of $\theta^*$ correctly with high probability using rewards from the selected arms and a judicious choice of the design matrix. The second phase of Lasso-OD applies the OD-LinBAI algorithm by Yang and Tan (2022) on that estimated support. We derive a non-asymptotic upper bound on the error probability of Lasso-OD by carefully choosing hyperparameters (such as Lasso's regularization parameter) and balancing the error probabilities of both phases. For fixed sparsity $s$ and budget $T$, the exponent in the error probability of Lasso-OD depends on $s$ but not on the dimension $d$, yielding a significant performance improvement for sparse and high-dimensional linear bandits. Furthermore, we show that Lasso-OD is almost minimax optimal in the exponent. Finally, we provide numerical examples to demonstrate the significant performance improvement over the existing algorithms for non-sparse linear bandits such as OD-LinBAI, BayesGap, Peace, LinearExploration, and GSE.
摘要
我们研究最佳臂识别问题在简线性弹珠下,尤其是在固定预算设定下。在简线性弹珠中,未知特征向量 $\theta^*$ 可能是高维度 $d$,但只有一些,例如 $s \ll d$ 的特征有非零值。我们设计了两相运算法,即 Lasso 和 Optimal-Design-(Lasso-OD)基于的线性最佳臂识别。第一相的 Lasso-OD 利用特征向量的简单性,通过实际 Zhou (2009) 提出的降顿 Lasso,估计 $\theta^*$ 的支持正确地使用对选择的枪和设计矩阵获得的奖励。第二相的 Lasso-OD 则应用 Yang 和 Tan (2022) 提出的 OD-LinBAI 算法。我们谨慎地选择几何 Parameters(如 Lasso 的 regularization 参数),并将两相的错误概率均衡,以取得非对应数学上的最佳性。对于固定的 $s$ 和 $T$,Lasso-OD 的错误概率的指数随 $s$ 而变化,从而获得高维度和简线性弹珠的明显性能提升。此外,我们还证明 Lasso-OD 是对数最佳的。最后,我们提供了一些实际的数据,以证明 Lasso-OD 对非简线性弹珠的现有算法,如 OD-LinBAI、BayesGap、Peace、LinearExploration 和 GSE 的性能有很大的提升。
results: 论文通过对一些常见的 Bayesian 模型进行评估,显示了 DMVI 的 posterior 推理结果比 contemporary 方法在 PPL 中更为准确,同时 computation cost 相对 similar,且需要更少的手动调整。Abstract
We propose Diffusion Model Variational Inference (DMVI), a novel method for automated approximate inference in probabilistic programming languages (PPLs). DMVI utilizes diffusion models as variational approximations to the true posterior distribution by deriving a novel bound to the marginal likelihood objective used in Bayesian modelling. DMVI is easy to implement, allows hassle-free inference in PPLs without the drawbacks of, e.g., variational inference using normalizing flows, and does not make any constraints on the underlying neural network model. We evaluate DMVI on a set of common Bayesian models and show that its posterior inferences are in general more accurate than those of contemporary methods used in PPLs while having a similar computational cost and requiring less manual tuning.
摘要
我们提出了Diffusion Model Variational Inference(DMVI),一种新的自动化近似推理方法,用于probabilistic programming languages(PPLs)中的推理。DMVI利用扩散模型作为真实 posterior distribution的可变 approximations,通过 derive a novel bound to the marginal likelihood objective used in Bayesian modeling。DMVI易于实现,在 PPLs 中进行快速简单的推理,不受 normalizing flows 等方法的缺点,而且不需要对基于神经网络模型的任何约束。我们对一组常见的 Bayesian 模型进行评估,发现 DMVI 的 posterior inferences 通常比当今 PPLs 中的方法更准确,而且计算成本和手动调整的需求相似。
Asynchronous SGD on Graphs: a Unified Framework for Asynchronous Decentralized and Federated Optimization
results: 提供了一个通用的算法框架,可以包括异步版本的多种算法,如SGD、分布式SGD、本地SGD、FedBuff,并且在更宽泛的假设下提供了速度 converge 的速率。Abstract
Decentralized and asynchronous communications are two popular techniques to speedup communication complexity of distributed machine learning, by respectively removing the dependency over a central orchestrator and the need for synchronization. Yet, combining these two techniques together still remains a challenge. In this paper, we take a step in this direction and introduce Asynchronous SGD on Graphs (AGRAF SGD) -- a general algorithmic framework that covers asynchronous versions of many popular algorithms including SGD, Decentralized SGD, Local SGD, FedBuff, thanks to its relaxed communication and computation assumptions. We provide rates of convergence under much milder assumptions than previous decentralized asynchronous works, while still recovering or even improving over the best know results for all the algorithms covered.
摘要
distributed 和异步通信是分布机器学习中通信复杂性的两种受欢迎技术,分别取消中央把关和同步需求。然而,将这两种技术相结合仍然是一项挑战。本文提出了一个步骤,即异步SGD on Graphs(AGRAF SGD)——一种涵盖异步版本的多种流行算法,包括SGD、分布SGD、本地SGD、FedBuff等。我们提供了更加宽松的通信和计算假设,并且可以在较弱的假设下提供速度收敛率,同时仍然能够回归或者超越之前的最佳结果。
for: This paper aims to improve the robustness of Gaussian process (GP) regression by developing a provably robust and conjugate Gaussian process (RCGP) regression method.
methods: The RCGP method uses generalised Bayesian inference to perform provably robust and conjugate closed-form updates at virtually no additional cost.
results: The paper demonstrates the strong empirical performance of RCGP on a range of problems, including Bayesian optimisation and sparse variational Gaussian processes.Abstract
To enable closed form conditioning, a common assumption in Gaussian process (GP) regression is independent and identically distributed Gaussian observation noise. This strong and simplistic assumption is often violated in practice, which leads to unreliable inferences and uncertainty quantification. Unfortunately, existing methods for robustifying GPs break closed-form conditioning, which makes them less attractive to practitioners and significantly more computationally expensive. In this paper, we demonstrate how to perform provably robust and conjugate Gaussian process (RCGP) regression at virtually no additional cost using generalised Bayesian inference. RCGP is particularly versatile as it enables exact conjugate closed form updates in all settings where standard GPs admit them. To demonstrate its strong empirical performance, we deploy RCGP for problems ranging from Bayesian optimisation to sparse variational Gaussian processes.
摘要
要启用闭式条件,一个常见的假设在泊松过程(GP)回归是独立和同样分布的 Gaussian 观测噪声。这强大且简单的假设经常在实践中被违反,导致不可靠的推断和不确定性评估。现有的方法用于强化 GPs 会打砸闭式条件,使其变得更加不吸引实践人员并显着增加计算成本。在这篇论文中,我们示示如何在较低的成本下实现可证的Robust conjugate Gaussian process(RCGP)回归,使用通用极权推理。RCGP 特别是可以在所有情况下实现扩展 conjugate 闭式更新,因此在标准 GPs 承认它们时具有精确的预测性。为证明其强大的实际性,我们在 Bayesian 优化到 sparse variational Gaussian processes 中应用 RCGP。
Optimal Budgeted Rejection Sampling for Generative Models
paper_authors: Alexandre Verine, Muni Sreenivas Pydi, Benjamin Negrevergne, Yann Chevaleyre
for: 提高权威性模型的性能和多样性
methods: 使用优化采样方法,包括提出的最优预算采样方案和综合训练方法
results: 通过实验和理论支持,显示提出的方法可以显著提高样本质量和多样性Abstract
Rejection sampling methods have recently been proposed to improve the performance of discriminator-based generative models. However, these methods are only optimal under an unlimited sampling budget, and are usually applied to a generator trained independently of the rejection procedure. We first propose an Optimal Budgeted Rejection Sampling (OBRS) scheme that is provably optimal with respect to \textit{any} $f$-divergence between the true distribution and the post-rejection distribution, for a given sampling budget. Second, we propose an end-to-end method that incorporates the sampling scheme into the training procedure to further enhance the model's overall performance. Through experiments and supporting theory, we show that the proposed methods are effective in significantly improving the quality and diversity of the samples.
摘要
<>将文本翻译成简化中文。>Recently, rejection sampling methods have been proposed to improve the performance of discriminator-based generative models. However, these methods are only optimal with an unlimited sampling budget, and are usually applied to a generator trained independently of the rejection procedure. We first propose an Optimal Budgeted Rejection Sampling (OBRS) scheme that is provably optimal with respect to any $f$-divergence between the true distribution and the post-rejection distribution, for a given sampling budget. Second, we propose an end-to-end method that incorporates the sampling scheme into the training procedure to further enhance the model's overall performance. Through experiments and supporting theory, we show that the proposed methods are effective in significantly improving the quality and diversity of the samples.Here's the translation in Traditional Chinese:<>将文本翻译为简化中文。>最近,拒绝抽样方法已经被提议来提高标注器基本的生成模型性能。然而,这些方法仅在无限抽样预算下是最佳的,并通常将抽样程序独立应用于生成器。我们首先提出了一个Optimal Budgeted Rejection Sampling(OBRS)方案,可以在任何 $f$-分布之间的对应预算下,具有最佳的性能。其次,我们提出了一个统一方法,将抽样方案 integrate到训练过程中,以进一步提高模型的总性能。通过实验和支持理论,我们证明了提案的方法可以对样本质量和多样性作出重要改善。
Hessian Eigenvectors and Principal Component Analysis of Neural Network Weight Matrices
results: 研究发现了网络参数和网络 weights 之间的关系,并提出了一种基于这种关系的方法来 Mitigate catastrophic forgetting。这种方法可以应用于不同规模的神经网络,包括更大的网络 architecture。Abstract
This study delves into the intricate dynamics of trained deep neural networks and their relationships with network parameters. Trained networks predominantly continue training in a single direction, known as the drift mode. This drift mode can be explained by the quadratic potential model of the loss function, suggesting a slow exponential decay towards the potential minima. We unveil a correlation between Hessian eigenvectors and network weights. This relationship, hinging on the magnitude of eigenvalues, allows us to discern parameter directions within the network. Notably, the significance of these directions relies on two defining attributes: the curvature of their potential wells (indicated by the magnitude of Hessian eigenvalues) and their alignment with the weight vectors. Our exploration extends to the decomposition of weight matrices through singular value decomposition. This approach proves practical in identifying critical directions within the Hessian, considering both their magnitude and curvature. Furthermore, our examination showcases the applicability of principal component analysis in approximating the Hessian, with update parameters emerging as a superior choice over weights for this purpose. Remarkably, our findings unveil a similarity between the largest Hessian eigenvalues of individual layers and the entire network. Notably, higher eigenvalues are concentrated more in deeper layers. Leveraging these insights, we venture into addressing catastrophic forgetting, a challenge of neural networks when learning new tasks while retaining knowledge from previous ones. By applying our discoveries, we formulate an effective strategy to mitigate catastrophic forgetting, offering a possible solution that can be applied to networks of varying scales, including larger architectures.
摘要
Form follows Function: Text-to-Text Conditional Graph Generation based on Functional Requirements
paper_authors: Peter A. Zachares, Vahan Hovhannisyan, Alan Mosca, Yarin Gal
for: 本研究针对 conditioned on 图形函数需求下的图形生成问题进行了 novel 的设定。
methods: 我们将问题定义为文本到文本生成问题,并提出了一种基于预训练大型自然语言模型(LLM)的方法,通过 incorporating message passing layers into LLM 的架构来增加图形结构信息。
results: 我们设计了一系列公共available和广泛研究的分子和知识图数据集,以评估我们的提议方法。结果表明,我们的方法可以更好地满足请求的函数需求,与类似任务的基线方法相比,具有 statistically significant 的差异。Abstract
This work focuses on the novel problem setting of generating graphs conditioned on a description of the graph's functional requirements in a downstream task. We pose the problem as a text-to-text generation problem and focus on the approach of fine-tuning a pretrained large language model (LLM) to generate graphs. We propose an inductive bias which incorporates information about the structure of the graph into the LLM's generation process by incorporating message passing layers into an LLM's architecture. To evaluate our proposed method, we design a novel set of experiments using publicly available and widely studied molecule and knowledge graph data sets. Results suggest our proposed approach generates graphs which more closely meet the requested functional requirements, outperforming baselines developed on similar tasks by a statistically significant margin.
摘要
Crop Disease Classification using Support Vector Machines with Green Chromatic Coordinate (GCC) and Attention based feature extraction for IoT based Smart Agricultural Applications
For: 农民可以快速和准确地识别作物疾病,保持农业产量和食品安全。* Methods: 使用注意力基于的特征提取,RGB通道基于的色彩分析,支持向量机(SVM)等机器学习和深度学习算法,并可以与移动应用程序和物联网设备集成。* Results: 提出一种新的分类方法,基于先前的研究,使用注意力基于的特征提取、RGB通道基于的色彩分析、SVM等算法,并可以与移动应用程序和物联网设备集成,并且在准确率方面与其他算法相比,达到了99.69%的精度。Abstract
Crops hold paramount significance as they serve as the primary provider of energy, nutrition, and medicinal benefits for the human population. Plant diseases, however, can negatively affect leaves during agricultural cultivation, resulting in significant losses in crop output and economic value. Therefore, it is crucial for farmers to identify crop diseases. However, this method frequently necessitates hard work, a lot of planning, and in-depth familiarity with plant pathogens. Given these numerous obstacles, it is essential to provide solutions that can easily interface with mobile and IoT devices so that our farmers can guarantee the best possible crop development. Various machine learning (ML) as well as deep learning (DL) algorithms have been created & studied for the identification of plant disease detection, yielding substantial and promising results. This article presents a novel classification method that builds on prior work by utilising attention-based feature extraction, RGB channel-based chromatic analysis, Support Vector Machines (SVM) for improved performance, and the ability to integrate with mobile applications and IoT devices after quantization of information. Several disease classification algorithms were compared with the suggested model, and it was discovered that, in terms of accuracy, Vision Transformer-based feature extraction and additional Green Chromatic Coordinate feature with SVM classification achieved an accuracy of (GCCViT-SVM) - 99.69%, whereas after quantization for IoT device integration achieved an accuracy of - 97.41% while almost reducing 4x in size. Our findings have profound implications because they have the potential to transform how farmers identify crop illnesses with precise and fast information, thereby preserving agricultural output and ensuring food security.
摘要
This article presents a novel classification method that leverages attention-based feature extraction, RGB channel-based chromatic analysis, and Support Vector Machines (SVM) for improved performance. The method also has the ability to integrate with mobile applications and IoT devices after quantization of information. Several disease classification algorithms were compared with the proposed model, and the results showed that the Vision Transformer-based feature extraction and additional Green Chromatic Coordinate feature with SVM classification achieved an accuracy of 99.69%, while the quantized model achieved an accuracy of 97.41% with a reduction of almost 4x in size.These findings have significant implications for the agricultural industry, as they have the potential to revolutionize how farmers identify crop diseases with precise and fast information, ensuring food security and preserving agricultural output.
NEO-KD: Knowledge-Distillation-Based Adversarial Training for Robust Multi-Exit Neural Networks
results: 实验结果表明, compared to基于现有 adversarial training 或知识塑造技术的基eline,我们的方法在不同的数据集/模型上 achieve the best adversarial accuracy with reduced computation budgets。Abstract
While multi-exit neural networks are regarded as a promising solution for making efficient inference via early exits, combating adversarial attacks remains a challenging problem. In multi-exit networks, due to the high dependency among different submodels, an adversarial example targeting a specific exit not only degrades the performance of the target exit but also reduces the performance of all other exits concurrently. This makes multi-exit networks highly vulnerable to simple adversarial attacks. In this paper, we propose NEO-KD, a knowledge-distillation-based adversarial training strategy that tackles this fundamental challenge based on two key contributions. NEO-KD first resorts to neighbor knowledge distillation to guide the output of the adversarial examples to tend to the ensemble outputs of neighbor exits of clean data. NEO-KD also employs exit-wise orthogonal knowledge distillation for reducing adversarial transferability across different submodels. The result is a significantly improved robustness against adversarial attacks. Experimental results on various datasets/models show that our method achieves the best adversarial accuracy with reduced computation budgets, compared to the baselines relying on existing adversarial training or knowledge distillation techniques for multi-exit networks.
摘要
多出口神经网络被视为减少推理成本的有前途的解决方案,但是抗击敌方攻击仍然是一个困难的问题。在多出口网络中,由于不同子模型之间的高度依赖关系,攻击一个特定的出口不仅会降低该出口的性能,而且同时降低所有其他出口的性能。这使得多出口网络对简单的敌方攻击非常敏感。在这篇论文中,我们提出了NEO-KD,基于知识塑造的对抗训练策略,以解决这一基本挑战。NEO-KD首先通过邻居知识塑造引导攻击样本的输出倾向于净数据邻居出口的 ensemble 输出。NEO-KD还使用出口wise ortogonal knowledge塑造来降低攻击传播性 across 不同子模型。这使得我们的方法在不同的数据集/模型上实现了显著提高的对抗性。实验结果表明,我们的方法在计算预算下可以实现最好的对抗精度,相比于基于现有的对抗训练或知识塑造技术的基eline。
Uncertainty quantification and out-of-distribution detection using surjective normalizing flows
results: 作者将方法应用到一个人工合成的数据集和一些从 intervenational 分布中导出的数据集上,并证明了这个方法可靠地分辨出内部数据集和外部数据集。作者与Dirichlet 过程混合模型和bijective flow进行比较,发现surjective flow模型是关键的 ком成分,可以可靠地分辨内部数据集和外部数据集。Abstract
Reliable quantification of epistemic and aleatoric uncertainty is of crucial importance in applications where models are trained in one environment but applied to multiple different environments, often seen in real-world applications for example, in climate science or mobility analysis. We propose a simple approach using surjective normalizing flows to identify out-of-distribution data sets in deep neural network models that can be computed in a single forward pass. The method builds on recent developments in deep uncertainty quantification and generative modeling with normalizing flows. We apply our method to a synthetic data set that has been simulated using a mechanistic model from the mobility literature and several data sets simulated from interventional distributions induced by soft and atomic interventions on that model, and demonstrate that our method can reliably discern out-of-distribution data from in-distribution data. We compare the surjective flow model to a Dirichlet process mixture model and a bijective flow and find that the surjections are a crucial component to reliably distinguish in-distribution from out-of-distribution data.
摘要
可靠地量化知识型和随机型uncertainty在应用中是非常重要的,特别是在模型在多个环境中训练后应用于多个不同的环境中,例如气候科学或者流动分析。我们提出了一种简单的方法,使用射影正则化流来在深度神经网络模型中标识不同环境中的数据集。该方法基于深度不确定性评估和生成模型的正则化流的最新发展。我们在一个人工生成的数据集和一些基于软件和原子性改变的数据集上应用了我们的方法,并证明了我们的方法可靠地将不同环境中的数据集分为准确和不准确两类。我们与 Dirichlet 过程混合模型和 bijection 流进行比较,发现射影流模型是重要的组成部分,可以可靠地分辨在 Distribution 和 out-of-distribution 数据之间。
Performance Optimization of Deep Learning Sparse Matrix Kernels on Intel Max Series GPU
results: 作者们的实现方法在目标Intel数据中心GPU上达到了 peak性能,并与Intel oneMKL库在Intel GPU上的性能相比,以及NVIDIA V100 GPU上的一个最近CUDA实现相比, demonstrate that their implementations of sparse matrix operations outperform either.Abstract
In this paper, we focus on three sparse matrix operations that are relevant for machine learning applications, namely, the sparse-dense matrix multiplication (SPMM), the sampled dense-dense matrix multiplication (SDDMM), and the composition of the SDDMM with SPMM, also termed as FusedMM. We develop optimized implementations for SPMM, SDDMM, and FusedMM operations utilizing Intel oneAPI's Explicit SIMD (ESIMD) SYCL extension API. In contrast to CUDA or SYCL, the ESIMD API enables the writing of explicitly vectorized kernel code. Sparse matrix algorithms implemented with the ESIMD API achieved performance close to the peak of the targeted Intel Data Center GPU. We compare our performance results to Intel's oneMKL library on Intel GPUs and to a recent CUDA implementation for the sparse matrix operations on NVIDIA's V100 GPU and demonstrate that our implementations for sparse matrix operations outperform either.
摘要
在这篇论文中,我们关注了三种稀疏矩阵操作,即稀疏积分矩阵(SPMM)、采样积分积分矩阵(SDDMM)以及这两者的融合(FusedMM)。我们开发了优化的实现方法,使用 intel oneAPI 的显式 SIMD(ESIMD) SYCL 扩展 API。与 CUDA 或 SYCL 不同,ESIMD API 允许我们编写明确的向量化kernel代码。我们使用 ESIMD API 实现的稀疏矩阵算法在目标 Intel 数据中心 GPU 上达到了 peak 性能。我们对 intel oneMKL 库在 intel GPU 上的性能进行比较,以及 NVIDIA V100 GPU 上的一个最近的 CUDA 实现,并证明了我们的稀疏矩阵操作实现的性能高于其他任何一个。
Adversarially Robust Distributed Count Tracking via Partial Differential Privacy
results: 我们提供了一个robust algorithm,其communication cost是 deterministic 算法的lower bound。 existing robustification techniques 不能实现optimal bounds,因为分布式问题的特殊性。 To address this, we extend the differential privacy framework by introducing “partial differential privacy” and proving a new generalization theorem. This theorem may have broader applications beyond robust count tracking, making it of independent interest.Abstract
We study the distributed tracking model, also known as distributed functional monitoring. This model involves $k$ sites each receiving a stream of items and communicating with the central server. The server's task is to track a function of all items received thus far continuously, with minimum communication cost. For count tracking, it is known that there is a $\sqrt{k}$ gap in communication between deterministic and randomized algorithms. However, existing randomized algorithms assume an "oblivious adversary" who constructs the entire input streams before the algorithm starts. Here we consider adaptive adversaries who can choose new items based on previous answers from the algorithm. Deterministic algorithms are trivially robust to adaptive adversaries, while randomized ones may not. Therefore, we investigate whether the $\sqrt{k}$ advantage of randomized algorithms is from randomness itself or the oblivious adversary assumption. We provide an affirmative answer to this question by giving a robust algorithm with optimal communication. Existing robustification techniques do not yield optimal bounds due to the inherent challenges of the distributed nature of the problem. To address this, we extend the differential privacy framework by introducing "partial differential privacy" and proving a new generalization theorem. This theorem may have broader applications beyond robust count tracking, making it of independent interest.
摘要
我们研究分布式跟踪模型,也称为分布式功能监测。这个模型中,有 $k$ 个站点,每个站点接收一束项目并与中央服务器进行交流。服务器的任务是不断跟踪所有项目的函数,以最小化交流成本。对于计数跟踪,已知存在 $\sqrt{k}$ 的交流差异 между deterministic 和 randomized 算法。然而,现有的 randomized 算法假设了一个 "无知敌手"(oblivious adversary),该敌手在算法开始之前构建整个输入流。在这里,我们考虑 adaptive 敌手,该敌手可以根据先前答案选择新的项目。deterministic 算法对 adaptive 敌手是可以逆转的,而 randomized 算法可能不是。因此,我们研究是 randomized 算法的 $\sqrt{k}$ 优势来自于随机性本身,还是 oblivious adversary 假设。我们提供了一个有optimal communication的robust算法,现有的 robustification 技术不能实现optimal bounds,因为分布式问题的特殊性。为解决这一点,我们扩展了 differential privacy 框架,引入 "partial differential privacy",并证明一个新的总则。这个总则可能有更广泛的应用,因此是独立的兴趣。
The Open DAC 2023 Dataset and Challenges for Sorbent Discovery in Direct Air Capture
paper_authors: Anuroop Sriram, Sihoon Choi, Xiaohan Yu, Logan M. Brabson, Abhishek Das, Zachary Ulissi, Matt Uyttendaele, Andrew J. Medford, David S. Sholl
results: 该论文提供了一个名为Open DAC 2023(ODAC23)的开源数据集,包含了38亿次DFT计算,并经过了深入分析,从而提取了MOF材料的特性。此外,该论文还使用了最新的ML模型,以估计DFT计算的结果。Abstract
New methods for carbon dioxide removal are urgently needed to combat global climate change. Direct air capture (DAC) is an emerging technology to capture carbon dioxide directly from ambient air. Metal-organic frameworks (MOFs) have been widely studied as potentially customizable adsorbents for DAC. However, discovering promising MOF sorbents for DAC is challenging because of the vast chemical space to explore and the need to understand materials as functions of humidity and temperature. We explore a computational approach benefiting from recent innovations in machine learning (ML) and present a dataset named Open DAC 2023 (ODAC23) consisting of more than 38M density functional theory (DFT) calculations on more than 8,800 MOF materials containing adsorbed CO2 and/or H2O. ODAC23 is by far the largest dataset of MOF adsorption calculations at the DFT level of accuracy currently available. In addition to probing properties of adsorbed molecules, the dataset is a rich source of information on structural relaxation of MOFs, which will be useful in many contexts beyond specific applications for DAC. A large number of MOFs with promising properties for DAC are identified directly in ODAC23. We also trained state-of-the-art ML models on this dataset to approximate calculations at the DFT level. This open-source dataset and our initial ML models will provide an important baseline for future efforts to identify MOFs for a wide range of applications, including DAC.
摘要
新的碳排放除去方法urgently需要来 combat global climatic change。直接空气 capture(DAC)是一种emerging technology to capture碳排放 directly from ambient air。 Metal-organic frameworks(MOFs)have been widely studied as potentially customizable adsorbents for DAC。However,discovering promising MOF sorbents for DAC is challenging because of the vast chemical space to explore and the need to understand materials as functions of humidity and temperature。We explore a computational approach benefiting from recent innovations in machine learning(ML)and present a dataset named Open DAC 2023(ODAC23)consisting of more than 38M density functional theory(DFT)calculations on more than 8,800 MOF materials containing adsorbed CO2 and/or H2O。ODAC23 is by far the largest dataset of MOF adsorption calculations at the DFT level of accuracy currently available。In addition to probing properties of adsorbed molecules,the dataset is a rich source of information on structural relaxation of MOFs,which will be useful in many contexts beyond specific applications for DAC。A large number of MOFs with promising properties for DAC are identified directly in ODAC23。We also trained state-of-the-art ML models on this dataset to approximate calculations at the DFT level。This open-source dataset and our initial ML models will provide an important baseline for future efforts to identify MOFs for a wide range of applications,including DAC。
Latent Space Inference For Spatial Transcriptomics
results: 研究发现,通过将这两种数据映射到共同的假设空间表示中,可以同时获取细胞表达信息和其 spatial坐标,从而为我们带来更深刻的理解细胞生物学过程和路径way.Abstract
In order to understand the complexities of cellular biology, researchers are interested in two important metrics: the genetic expression information of cells and their spatial coordinates within a tissue sample. However, state-of-the art methods, namely single-cell RNA sequencing and image based spatial transcriptomics can only recover a subset of this information, either full genetic expression with loss of spatial information, or spatial information with loss of resolution in sequencing data. In this project, we investigate a probabilistic machine learning method to obtain the full genetic expression information for tissues samples while also preserving their spatial coordinates. This is done through mapping both datasets to a joint latent space representation with the use of variational machine learning methods. From here, the full genetic and spatial information can be decoded and to give us greater insights on the understanding of cellular processes and pathways.
摘要
为了理解细胞生物学中的复杂性,研究人员对两个重要指标感兴趣:细胞的遗传表达信息和它们在组织样本中的空间坐标。然而,现有的技术,即单细胞RNA扩增和图像基于的空间转录组学,只能回归一部分这些信息,即全遗传表达信息的损失或图像数据中的分辨率损失。在本项目中,我们研究一种概率机器学习方法,以获取组织样本中的全遗传表达信息,同时保持它们的空间坐标。我们通过将两个数据集映射到共同的假设空间表示中,使用变分机器学习方法来实现这一目标。从这里,我们可以解码全遗传和空间信息,以提供更深刻的细胞生物学过程和 PATHway 的理解。
Multi-task Representation Learning for Pure Exploration in Bilinear Bandits
results: 本文的结果表明,通过共享表示来加速找到每个任务中的优化对象,可以减少样本数量,比传统独立解决每个任务的方法更有效。Abstract
We study multi-task representation learning for the problem of pure exploration in bilinear bandits. In bilinear bandits, an action takes the form of a pair of arms from two different entity types and the reward is a bilinear function of the known feature vectors of the arms. In the \textit{multi-task bilinear bandit problem}, we aim to find optimal actions for multiple tasks that share a common low-dimensional linear representation. The objective is to leverage this characteristic to expedite the process of identifying the best pair of arms for all tasks. We propose the algorithm GOBLIN that uses an experimental design approach to optimize sample allocations for learning the global representation as well as minimize the number of samples needed to identify the optimal pair of arms in individual tasks. To the best of our knowledge, this is the first study to give sample complexity analysis for pure exploration in bilinear bandits with shared representation. Our results demonstrate that by learning the shared representation across tasks, we achieve significantly improved sample complexity compared to the traditional approach of solving tasks independently.
摘要
我们研究多任务表示学习 Bilinear bandits 中的纯探索问题。在 Bilinear bandits 中,一个动作是两个不同类型的 arm 的对,奖励是两个known feature vector 的 bilinear函数。在多任务 Bilinear bandit 问题中,我们目标是找到多个任务共享的低维度线性表示,并利用这个特点来快速确定所有任务的最佳对。我们提出了 GOBLIN 算法,使用实验设计方法来优化样本分配以学习全局表示,以及最小化每个任务中的样本数量。根据我们所知,这是首次对 Bilinear bandits 中纯探索问题进行样本复杂度分析的研究。我们的结果表明,通过学习共享表示,我们可以在每个任务中实现明显改善的样本复杂度。
Semantic Hearing: Programming Acoustic Scenes with Binaural Hearables
results: 研究结果表明,该系统可以处理20种声音类型,并且在连接式手机上实现了6.56毫秒的运行时间。在实际生活中进行的评估中,证明了该系统可以提取目标声音并保持它们的空间cue。项目页面与代码:https://semantichearing.cs.washington.eduAbstract
Imagine being able to listen to the birds chirping in a park without hearing the chatter from other hikers, or being able to block out traffic noise on a busy street while still being able to hear emergency sirens and car honks. We introduce semantic hearing, a novel capability for hearable devices that enables them to, in real-time, focus on, or ignore, specific sounds from real-world environments, while also preserving the spatial cues. To achieve this, we make two technical contributions: 1) we present the first neural network that can achieve binaural target sound extraction in the presence of interfering sounds and background noise, and 2) we design a training methodology that allows our system to generalize to real-world use. Results show that our system can operate with 20 sound classes and that our transformer-based network has a runtime of 6.56 ms on a connected smartphone. In-the-wild evaluation with participants in previously unseen indoor and outdoor scenarios shows that our proof-of-concept system can extract the target sounds and generalize to preserve the spatial cues in its binaural output. Project page with code: https://semantichearing.cs.washington.edu
摘要
想像你可以在公园中听到鸟叫,而不听到其他游客的喊喊叫,或者在忙街上听到交通噪音,而快速听到紧急警 siren 和车 horn。我们引入semantic hearing,一种新的能力 для智能听众设备,允许它们在实时中,选择性地听到或忽略来自实际环境中的具体声音,而不失去声学位置信息。为了实现这一目标,我们做出了两项技术贡献:1. 我们提出了第一个能够在干扰声和背景噪音的情况下进行针对声音抽取的 neural network,可以在实时中提取Target声音。2. 我们设计了一种可以在实际应用中通用的训练方法,使我们的系统能够在实际应用中泛化。结果表明,我们的系统可以处理20种声音类型,并且我们使用 transformer 网络的运行时间为6.56 ms。在实际场景中进行了审试,我们的证明系统可以提取Target声音并保持声学位置信息。项目页面与代码:https://semantichearing.cs.washington.eduNote: Simplified Chinese is used here, as it is the most widely used standard for Chinese writing.
Federated Topic Model and Model Pruning Based on Variational Autoencoder
results: 实验结果显示,基于VAE的FTM剪枝方法可以大幅提高模型训练速度,而不失其性能。Abstract
Topic modeling has emerged as a valuable tool for discovering patterns and topics within large collections of documents. However, when cross-analysis involves multiple parties, data privacy becomes a critical concern. Federated topic modeling has been developed to address this issue, allowing multiple parties to jointly train models while protecting pri-vacy. However, there are communication and performance challenges in the federated sce-nario. In order to solve the above problems, this paper proposes a method to establish a federated topic model while ensuring the privacy of each node, and use neural network model pruning to accelerate the model, where the client periodically sends the model neu-ron cumulative gradients and model weights to the server, and the server prunes the model. To address different requirements, two different methods are proposed to determine the model pruning rate. The first method involves slow pruning throughout the entire model training process, which has limited acceleration effect on the model training process, but can ensure that the pruned model achieves higher accuracy. This can significantly reduce the model inference time during the inference process. The second strategy is to quickly reach the target pruning rate in the early stage of model training in order to accelerate the model training speed, and then continue to train the model with a smaller model size after reaching the target pruning rate. This approach may lose more useful information but can complete the model training faster. Experimental results show that the federated topic model pruning based on the variational autoencoder proposed in this paper can greatly accelerate the model training speed while ensuring the model's performance.
摘要
通用主题模型在处理大量文档时发现模式和话题变得非常有用。然而,当跨分析包括多个方的情况下,数据隐私变得非常重要。联邦主题模型是为此目的而开发的,允许多个方共同训练模型,保护每个节点的隐私。然而,联邦场景中存在交流和性能问题。为解决以上问题,本文提出了一种方法,可以在多个节点之间共同训练模型,保证每个节点的隐私,并使用神经网络模型剪辑以加速模型训练。在客户端 periodic 发送模型神经元累积偏移和模型参数给服务器,服务器进行模型剪辑。为了应对不同的需求,本文提出了两种不同的方法来确定模型剪辑率。第一种方法是在整个模型训练过程中慢慢剪辑模型,可以在模型训练过程中有限度地加速模型训练,但是可以确保剪辑后的模型准确率高。这可以减少模型推理时间。第二种方法是在模型训练过程的早期 quickly 到达目标剪辑率,以加速模型训练速度,然后继续使用较小的模型大小进行模型训练。这种方法可能会产生更多的有用信息产生,但可以更快地完成模型训练。实验结果表明,基于变量自动encoder的联邦主题模型剪辑可以快速加速模型训练速度,保证模型性能。
Stacking an autoencoder for feature selection of zero-day threats
For: This paper is written for researchers and practitioners in the field of cybersecurity, particularly those interested in zero-day attack detection and artificial neural networks.* Methods: The paper uses a stacked autoencoder (SAE) and a Long Short-Term Memory (LSTM) scheme for feature selection and zero-day threat classification. The SAE is used for unsupervised feature extraction, and the LSTM is used for supervised learning to enhance the model’s discriminative capabilities.* Results: The paper reports high precision, recall, and F1 score values for the SAE-LSTM model in identifying various types of zero-day attacks, and demonstrates strong predictive capabilities across all three attack categories. The balanced average scores suggest that the model generalizes effectively and consistently across different attack categories.Here’s the simplified Chinese text for the three key information points:* For: 这篇论文是为了针对安全领域的研究人员和实践者而写的,尤其是关注于零日攻击检测和人工神经网络。* Methods: 这篇论文使用了堆叠自动编码器(SAE)和长短期记忆(LSTM)方法来实现特征选择和零日威胁分类。 SAE 用于无监督特征提取,而 LSTM 用于监督学习以提高模型的识别能力。* Results: 论文报告了 SAELSTM 模型在不同类型的零日攻击上的高精度、回归率和 F1 分数值,并示出了模型在不同攻击类别之间的一致性和通用性。Abstract
Zero-day attack detection plays a critical role in mitigating risks, protecting assets, and staying ahead in the evolving threat landscape. This study explores the application of stacked autoencoder (SAE), a type of artificial neural network, for feature selection and zero-day threat classification using a Long Short-Term Memory (LSTM) scheme. The process involves preprocessing the UGRansome dataset and training an unsupervised SAE for feature extraction. Finetuning with supervised learning is then performed to enhance the discriminative capabilities of this model. The learned weights and activations of the autoencoder are analyzed to identify the most important features for discriminating between zero-day threats and normal system behavior. These selected features form a reduced feature set that enables accurate classification. The results indicate that the SAE-LSTM performs well across all three attack categories by showcasing high precision, recall, and F1 score values, emphasizing the model's strong predictive capabilities in identifying various types of zero-day attacks. Additionally, the balanced average scores of the SAE-LSTM suggest that the model generalizes effectively and consistently across different attack categories.
摘要
<> translate("Zero-day attack detection plays a critical role in mitigating risks, protecting assets, and staying ahead in the evolving threat landscape. This study explores the application of stacked autoencoder (SAE), a type of artificial neural network, for feature selection and zero-day threat classification using a Long Short-Term Memory (LSTM) scheme. The process involves preprocessing the UGRansome dataset and training an unsupervised SAE for feature extraction. Finetuning with supervised learning is then performed to enhance the discriminative capabilities of this model. The learned weights and activations of the autoencoder are analyzed to identify the most important features for discriminating between zero-day threats and normal system behavior. These selected features form a reduced feature set that enables accurate classification. The results indicate that the SAE-LSTM performs well across all three attack categories by showcasing high precision, recall, and F1 score values, emphasizing the model's strong predictive capabilities in identifying various types of zero-day attacks. Additionally, the balanced average scores of the SAE-LSTM suggest that the model generalizes effectively and consistently across different attack categories.")]以下是文本的简化中文翻译: Zero-day 攻击检测在降低风险、保护资产和在演化的威胁领域中发挥关键角色。本研究探讨使用堆式自适应器(SAE),一种人工神经网络,来选择特征和预测 zero-day 威胁。该过程包括对 UGRansome 数据集进行预处理,并使用无监督 SAE 进行特征提取。然后,通过监督学习进行训练,以提高模型的推论能力。模型学习的权重和活动都被分析以确定最重要的特征,以便在预测 zero-day 威胁和正常系统行为之间进行分类。这些选择的特征组成了一个减少特征集,使得准确地进行分类。结果表明,SAE-LSTM 在三个攻击类别中都表现出色,具有高精度、 recall 和 F1 分数值,证明模型在不同的攻击类别中具有强的预测能力。此外,SAE-LSTM 的平衡平均分数表明,模型在不同的攻击类别中具有一致的泛化能力和一致性。
Model-driven Engineering for Machine Learning Components: A Systematic Literature Review
results: 本研究的结果表明,使用MDE4ML可以提高开发效率、降低开发成本、提高系统可维护性和可扩展性等。但是,还存在一些局限性和挑战,需要进一步的研究和发展。Abstract
Context: Machine Learning (ML) has become widely adopted as a component in many modern software applications. Due to the large volumes of data available, organizations want to increasingly leverage their data to extract meaningful insights and enhance business profitability. ML components enable predictive capabilities, anomaly detection, recommendation, accurate image and text processing, and informed decision-making. However, developing systems with ML components is not trivial; it requires time, effort, knowledge, and expertise in ML, data processing, and software engineering. There have been several studies on the use of model-driven engineering (MDE) techniques to address these challenges when developing traditional software and cyber-physical systems. Recently, there has been a growing interest in applying MDE for systems with ML components. Objective: The goal of this study is to further explore the promising intersection of MDE with ML (MDE4ML) through a systematic literature review (SLR). Through this SLR, we wanted to analyze existing studies, including their motivations, MDE solutions, evaluation techniques, key benefits and limitations. Results: We analyzed selected studies with respect to several areas of interest and identified the following: 1) the key motivations behind using MDE4ML; 2) a variety of MDE solutions applied, such as modeling languages, model transformations, tool support, targeted ML aspects, contributions and more; 3) the evaluation techniques and metrics used; and 4) the limitations and directions for future work. We also discuss the gaps in existing literature and provide recommendations for future research. Conclusion: This SLR highlights current trends, gaps and future research directions in the field of MDE4ML, benefiting both researchers and practitioners
摘要
Machine Learning (ML) 已经成为现代软件应用中的一个重要组件。由于大量数据的可用性,组织希望通过数据来提取有意义的洞见和提高商业利润。 ML ком�ponenets 提供预测功能、偏差检测、建议、精准的图像和文本处理,并帮助做出了解决策。然而,开发具有 ML ком�ponenets 的系统不是易事;它需要时间、努力、知识和 ML、数据处理和软件工程的专家知识。过去,有关使用模型驱动工程(MDE)技术来解决这些挑战的研究已经很多。最近,关于应用 MDE for 系统 with ML ком�ponenets 的研究则有所增加。目标:本研究的目标是进一步探索 MDE 与 ML (MDE4ML)的联合领域,通过系统性文献综述(SLR)。通过这个 SLR,我们想要分析选择的研究,包括他们的动机、MDE 解决方案、评估技术和 метри克、主要优点和局限性。结果:我们分析选择的研究,并评估他们在以下几个领域:1)使用 MDE4ML 的动机;2)MDE 解决方案的多样性,包括模型语言、模型转换、工具支持、针对 ML 方面的贡献等;3)评估技术和 метри克的使用;和4)限制和未来研究的方向。我们还讨论了现有文献中的潜在空白和未来研究的建议。结论:这个 SLR 显示了 MDE4ML 的现有趋势、缺点和未来研究的方向,对研究者和实践者都有帮助。
Generalization Bounds for Label Noise Stochastic Gradient Descent
for: This paper is written for those interested in understanding the generalization error bounds of stochastic gradient descent (SGD) with label noise in non-convex settings.
methods: The paper uses a combination of uniform dissipativity and smoothness conditions, as well as a suitable choice of semimetric, to establish a contraction in Wasserstein distance of the label noise stochastic gradient flow.
results: The paper derives time-independent generalization error bounds for the discretized algorithm with a constant learning rate, which scales polynomially with the parameter dimension $d$ and with the rate of $n^{-2/3}$, where $n$ is the sample size. This rate is better than the best-known rate of $n^{-1/2}$ established for stochastic gradient Langevin dynamics (SGLD) under similar conditions.Abstract
We develop generalization error bounds for stochastic gradient descent (SGD) with label noise in non-convex settings under uniform dissipativity and smoothness conditions. Under a suitable choice of semimetric, we establish a contraction in Wasserstein distance of the label noise stochastic gradient flow that depends polynomially on the parameter dimension $d$. Using the framework of algorithmic stability, we derive time-independent generalisation error bounds for the discretized algorithm with a constant learning rate. The error bound we achieve scales polynomially with $d$ and with the rate of $n^{-2/3}$, where $n$ is the sample size. This rate is better than the best-known rate of $n^{-1/2}$ established for stochastic gradient Langevin dynamics (SGLD) -- which employs parameter-independent Gaussian noise -- under similar conditions. Our analysis offers quantitative insights into the effect of label noise.
摘要
我们在一致耗散性和光滑性条件下,研究了带标签噪声的随机梯度下降(SGD)在非凸设置下的泛化误差上界。在适当选择的半度量下,我们证明了标签噪声随机梯度流在 Wasserstein 距离下的收缩,其对参数维度 $d$ 的依赖是多项式的。利用算法稳定性框架,我们为采用恒定学习率的离散化算法导出了与时间无关的泛化误差上界。该上界随 $d$ 多项式增长,并以 $n^{-2/3}$ 的速率收敛,其中 $n$ 为样本数。该速率优于在类似条件下针对使用与参数无关的高斯噪声的随机梯度 Langevin 动力学(SGLD)所建立的已知最优速率 $n^{-1/2}$。我们的分析为标签噪声的影响提供了定量的洞见。
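To make the two rates easier to compare, the scaling described in the abstract can be written schematically as follows (a paraphrase of the abstract only; the constants, the exact polynomial dependence on $d$, and the precise assumptions live in the paper and are not reproduced here):

```latex
% Schematic comparison of the generalization-error rates described above.
% "poly(d)" stands for an unspecified polynomial in the parameter dimension d.
\underbrace{\mathbb{E}\big[\mathrm{gen}(\text{label-noise SGD})\big]}_{\text{this paper}}
  \;\lesssim\; \mathrm{poly}(d)\, n^{-2/3},
\qquad
\underbrace{\mathbb{E}\big[\mathrm{gen}(\text{SGLD})\big]}_{\text{prior work}}
  \;\lesssim\; \mathrm{poly}(d)\, n^{-1/2}.
```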
results: The authors first show that, given any optimal active learning algorithm, the collaboration protocol that simply runs it over the entire pooled data is already individually rational (IR). However, computing the optimal algorithm is NP-hard, so the authors provide IR collaboration protocols whose label complexity is comparable to the best known tractable approximation algorithm.Abstract
In collaborative active learning, where multiple agents try to learn labels from a common hypothesis, we introduce an innovative framework for incentivized collaboration. Here, rational agents aim to obtain labels for their data sets while keeping label complexity at a minimum. We focus on designing (strict) individually rational (IR) collaboration protocols, ensuring that agents cannot reduce their expected label complexity by acting individually. We first show that given any optimal active learning algorithm, the collaboration protocol that runs the algorithm as is over the entire data is already IR. However, computing the optimal algorithm is NP-hard. We therefore provide collaboration protocols that achieve (strict) IR and are comparable with the best known tractable approximation algorithm in terms of label complexity.
摘要
在协作主动学习中,多个代理试图从一个共同的假设类中学习标签。我们为此提出了一个激励式协作的创新框架:理性的代理希望在将标签复杂度保持在最低的同时为其数据集获取标签。我们重点设计(严格)个体理性(IR)的协作协议,确保代理无法通过单独行动来降低其期望标签复杂度。我们首先证明,给定任意最优主动学习算法,直接在全部数据上运行该算法的协作协议本身已经是 IR 的。然而,计算最优算法是 NP 困难的。因此,我们提供了能达到(严格)IR、且在标签复杂度方面可与已知最佳可处理近似算法相当的协作协议。
Active Neural Topological Mapping for Multi-Agent Exploration
paper_authors: Xinyi Yang, Yuxiang Yang, Chao Yu, Jiayu Chen, Jingchen Yu, Haibing Ren, Huazhong Yang, Yu Wang
for: Multi-agent cooperative exploration, where multiple robots must explore an unknown environment via sensory signals within a limited time.
methods: A multi-agent neural topological mapping approach comprising a Topological Mapper (a visual encoder with distance-based heuristics) and an RL-based Hierarchical Topological Planner built on graph neural networks.
results: In a physically realistic simulator, the method improves exploration efficiency and generalization in unseen scenarios, reducing steps by at least 26.40% over planning-based baselines and by at least 7.63% over RL-based competitors.Abstract
This paper investigates the multi-agent cooperative exploration problem, which requires multiple agents to explore an unseen environment via sensory signals in a limited time. A popular approach to exploration tasks is to combine active mapping with planning. Metric maps capture the details of the spatial representation, but are with high communication traffic and may vary significantly between scenarios, resulting in inferior generalization. Topological maps are a promising alternative as they consist only of nodes and edges with abstract but essential information and are less influenced by the scene structures. However, most existing topology-based exploration tasks utilize classical methods for planning, which are time-consuming and sub-optimal due to their handcrafted design. Deep reinforcement learning (DRL) has shown great potential for learning (near) optimal policies through fast end-to-end inference. In this paper, we propose Multi-Agent Neural Topological Mapping (MANTM) to improve exploration efficiency and generalization for multi-agent exploration tasks. MANTM mainly comprises a Topological Mapper and a novel RL-based Hierarchical Topological Planner (HTP). The Topological Mapper employs a visual encoder and distance-based heuristics to construct a graph containing main nodes and their corresponding ghost nodes. The HTP leverages graph neural networks to capture correlations between agents and graph nodes in a coarse-to-fine manner for effective global goal selection. Extensive experiments conducted in a physically-realistic simulator, Habitat, demonstrate that MANTM reduces the steps by at least 26.40% over planning-based baselines and by at least 7.63% over RL-based competitors in unseen scenarios.
摘要
这篇论文研究多智能体协作探索问题:多个智能体需要在有限时间内通过感知信号探索未知环境。探索任务的一种流行做法是将主动建图与规划相结合。度量地图能够刻画空间表示的细节,但通信开销大,且在不同场景间差异显著,导致泛化能力较差。拓扑地图是一种有前景的替代方案,它只由节点和边构成,包含抽象但关键的信息,受场景结构影响较小。然而,现有的大多数基于拓扑的探索方法仍采用经典规划方法,由于人工设计,既耗时又次优。深度强化学习(DRL)已展现出通过快速端到端推理学习(近)最优策略的巨大潜力。本文提出多智能体神经拓扑建图(MANTM),以提升多智能体探索任务的效率和泛化能力。MANTM 主要由一个拓扑建图器和一个新颖的基于强化学习的分层拓扑规划器(HTP)组成。拓扑建图器使用视觉编码器和基于距离的启发式方法构建包含主节点及其对应幽灵节点的图。HTP 利用图神经网络以由粗到细的方式捕捉智能体与图节点之间的关联,从而进行有效的全局目标选择。在物理真实的仿真器 Habitat 中进行的大量实验表明,在未见过的场景中,MANTM 相比基于规划的基线至少减少 26.40% 的步数,相比基于强化学习的竞争方法至少减少 7.63% 的步数。
DistDNAS: Search Efficient Feature Interactions within 2 Hours
results: Extensive experiments on the 1TB Criteo Terabyte dataset show that the best models discovered by DistDNAS achieve a 0.001 AUC improvement and 60% FLOPs savings over state-of-the-art CTR models.Abstract
Search efficiency and serving efficiency are two major axes in building feature interactions and expediting the model development process in recommender systems. On large-scale benchmarks, searching for the optimal feature interaction design requires extensive cost due to the sequential workflow on the large volume of data. In addition, fusing interactions of various sources, orders, and mathematical operations introduces potential conflicts and additional redundancy toward recommender models, leading to sub-optimal trade-offs in performance and serving cost. In this paper, we present DistDNAS as a neat solution to brew swift and efficient feature interaction design. DistDNAS proposes a supernet to incorporate interaction modules of varying orders and types as a search space. To optimize search efficiency, DistDNAS distributes the search and aggregates the choice of optimal interaction modules on varying data dates, achieving over 25x speed-up and reducing search cost from 2 days to 2 hours. To optimize serving efficiency, DistDNAS introduces a differentiable cost-aware loss to penalize the selection of redundant interaction modules, enhancing the efficiency of discovered feature interactions in serving. We extensively evaluate the best models crafted by DistDNAS on a 1TB Criteo Terabyte dataset. Experimental evaluations demonstrate 0.001 AUC improvement and 60% FLOPs saving over current state-of-the-art CTR models.
摘要
搜索效率和服务效率是建立功能互动和加速模型开发过程中的两大轴心。在大规模的参考数据上,搜索到最佳功能互动设计需要很大的成本,因为搜索工作流程需要遍历大量数据。此外,将来自不同来源、顺序和mathematical operations的互动组合导致推荐模型中的冲突和额外累累,从而导致性能和服务成本的折冲。在本文中,我们提出了DistDNAS作为一个简单的解决方案,它透过建立互动模组的supernet,实现了快速和高效的功能互动设计。DistDNAS通过分布搜索和聚合选择最佳互动模组的选择,实现了25倍的速度提升和从2天缩短为2小时的搜索成本。此外,DistDNAS引入了一个可微的成本警示loss,以惩罚选择重复的互动模组,提高发现的功能互动效率。我们对1TB Criteo Terabyte数据集进行了广泛的实验评估,结果显示了0.001 AUC提升和60% FLOPs节省,较前瞻性的CTR模型。
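As a rough illustration of the "differentiable cost-aware loss" idea described above (a generic differentiable-NAS-style sketch, not DistDNAS's actual formulation; the module costs, architecture logits, and trade-off coefficient are all illustrative assumptions):

```python
import numpy as np

def cost_aware_loss(task_loss, arch_logits, module_flops, lam=1e-3):
    """Generic differentiable cost-aware objective for interaction-module selection.

    task_loss    : scalar training loss (e.g. logloss on a CTR batch)
    arch_logits  : array of architecture logits, one per candidate interaction module
    module_flops : array of serving costs (e.g. FLOPs) per candidate module
    lam          : trade-off between accuracy and serving cost
    """
    # Softmax relaxation of the discrete module choice (differentiable in the logits).
    probs = np.exp(arch_logits - arch_logits.max())
    probs /= probs.sum()
    # Expected serving cost under the current (soft) architecture.
    expected_cost = float(np.dot(probs, module_flops))
    # Penalizing expected cost discourages selecting redundant, expensive modules.
    return task_loss + lam * expected_cost

# Toy usage: three candidate modules with different FLOPs budgets.
loss = cost_aware_loss(task_loss=0.45,
                       arch_logits=np.array([0.2, 1.5, -0.3]),
                       module_flops=np.array([1e6, 5e7, 2e8]))
print(loss)
```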
Transformers are Efficient In-Context Estimators for Wireless Communication
results: Extensive simulations show that in-context estimation not only significantly outperforms standard approaches, but also matches the performance of an estimator with perfect knowledge of the latent context within a few context examples, indicating that transformers are efficient in-context estimators in the communication setting.Abstract
Pre-trained transformers can perform in-context learning, where they adapt to a new task using only a small number of prompts without any explicit model optimization. Inspired by this attribute, we propose a novel approach, called in-context estimation, for the canonical communication problem of estimating transmitted symbols from received symbols. A communication channel is essentially a noisy function that maps transmitted symbols to received symbols, and this function can be represented by an unknown parameter whose statistics depend on an (also unknown) latent context. Conventional approaches ignore this hierarchical structure and simply attempt to use known transmissions, called pilots, to perform a least-squares estimate of the channel parameter, which is then used to estimate successive, unknown transmitted symbols. We make the basic connection that transformers show excellent contextual sequence completion with a few prompts, and so they should be able to implicitly determine the latent context from pilot symbols to perform end-to-end in-context estimation of transmitted symbols. Furthermore, the transformer should use information efficiently, i.e., it should utilize any pilots received to attain the best possible symbol estimates. Through extensive simulations, we show that in-context estimation not only significantly outperforms standard approaches, but also achieves the same performance as an estimator with perfect knowledge of the latent context within a few context examples. Thus, we make a strong case that transformers are efficient in-context estimators in the communication setting.
摘要
预训练的 transformer 可以进行上下文内学习(in-context learning):仅凭少量提示即可适应新任务,而无需任何显式的模型优化。受这一特性启发,我们针对从接收符号中估计发送符号这一经典通信问题,提出了一种称为上下文内估计(in-context estimation)的新方法。通信信道本质上是一个将发送符号映射到接收符号的带噪函数,该函数可以用一个未知参数表示,而该参数的统计特性又取决于一个(同样未知的)潜在上下文。传统方法忽略了这种层次结构,仅利用已知的发送信号(即导频)对信道参数进行最小二乘估计,然后用该估计来恢复后续未知的发送符号。我们的基本观察是:transformer 只需少量提示即可出色地完成上下文相关的序列补全,因此它应能从导频符号中隐式地推断潜在上下文,从而端到端地对发送符号进行上下文内估计。此外,transformer 应能高效地利用信息,即充分利用收到的所有导频以获得尽可能好的符号估计。大量仿真表明,上下文内估计不仅显著优于标准方法,而且只需少量上下文示例即可达到与完全已知潜在上下文的估计器相同的性能。因此,我们有力地证明了 transformer 在通信场景下是高效的上下文内估计器。
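For context, a minimal sketch of the conventional pilot-based least-squares baseline that the abstract contrasts against, together with how the (pilot, received) pairs would be laid out as an in-context prompt; the dimensions, pilot values, and prompt layout are illustrative assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Unknown scalar complex channel h drawn from a latent context (here: its variance).
context_var = 2.0                      # latent context, unknown to the estimator
h = np.sqrt(context_var / 2) * (rng.standard_normal() + 1j * rng.standard_normal())

# A few known pilot symbols and their noisy observations y = h * x + w.
pilots = np.array([1 + 1j, 1 - 1j, -1 + 1j]) / np.sqrt(2)
noise = 0.1 * (rng.standard_normal(3) + 1j * rng.standard_normal(3))
received = h * pilots + noise

# Conventional baseline: least-squares channel estimate from the pilots,
# then coherent detection of later unknown symbols using h_hat.
h_hat = np.vdot(pilots, received) / np.vdot(pilots, pilots)

# In-context alternative (schematic): the same (pilot, received) pairs are simply
# stacked into a prompt and fed to a pre-trained transformer, which outputs
# symbol estimates directly without ever forming h_hat explicitly.
prompt = np.stack([pilots, received], axis=-1)   # shape (num_pilots, 2)
print("LS estimate:", h_hat, "true channel:", h)
```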
WinNet: time series forecasting with a window-enhanced period extracting and interacting
results: On nine benchmark datasets, WinNet achieves SOTA performance with lower computational complexity than CNN-, MLP-, and Transformer-based methods, offering CNN-based approaches a promising alternative for time series forecasting with a strong trade-off between performance and efficiency.Abstract
Recently, Transformer-based methods have significantly improved state-of-the-art time series forecasting results, but they suffer from high computational costs and the inability to capture the long and short periodicity of time series. We present a highly accurate and simply structured CNN-based model for long-term time series forecasting tasks, called WinNet, including (i) Inter-Intra Period Encoder (I2PE) to transform 1D sequence into 2D tensor with long and short periodicity according to the predefined periodic window, (ii) Two-Dimensional Period Decomposition (TDPD) to model period-trend and oscillation terms, and (iii) Decomposition Correlation Block (DCB) to leverage the correlations of the period-trend and oscillation terms to support the prediction tasks by CNNs. Results on nine benchmark datasets show that the WinNet can achieve SOTA performance and lower computational complexity over CNN-, MLP-, Transformer-based approaches. The WinNet provides potential for the CNN-based methods in the time series forecasting tasks, with perfect tradeoff between performance and efficiency.
摘要
近期,基于 Transformer 的方法显著提升了时间序列预测的最新水平,但它们计算成本高,且难以捕捉时间序列的长短周期性。我们提出了一种高精度、结构简单的基于 CNN 的长期时间序列预测模型 WinNet,包括:(i) Inter-Intra Period Encoder(I2PE),根据预定义的周期窗口将一维序列转换为具有长短周期性的二维张量;(ii) Two-Dimensional Period Decomposition(TDPD),用于建模周期趋势项和振荡项;(iii) Decomposition Correlation Block(DCB),利用周期趋势项与振荡项之间的相关性,以 CNN 支持预测任务。在九个基准数据集上的结果表明,WinNet 相比基于 CNN、MLP 和 Transformer 的方法可以取得 SOTA 性能和更低的计算复杂度。WinNet 为基于 CNN 的方法在时间序列预测任务中提供了潜力,在性能与效率之间实现了良好的平衡。
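A minimal sketch of the "1D sequence to 2D tensor" step described for I2PE, under the simplest possible interpretation (truncate the series to a multiple of a predefined period and fold it so one axis indexes positions within a period and the other indexes successive periods); WinNet's actual encoder is more involved:

```python
import numpy as np

def fold_by_period(series: np.ndarray, period: int) -> np.ndarray:
    """Fold a 1D series into a 2D (num_periods, period) tensor.

    Rows then expose long-term (period-to-period) behaviour and columns the
    short-term (within-period) behaviour, which a 2D CNN can process jointly.
    """
    usable = (len(series) // period) * period      # drop the incomplete tail
    return series[:usable].reshape(-1, period)

# Toy usage: hourly data folded with a daily (24-step) periodic window.
x = np.sin(2 * np.pi * np.arange(24 * 7 + 5) / 24)  # one week plus a ragged tail
x2d = fold_by_period(x, period=24)
print(x2d.shape)   # (7, 24)
```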
A Unified Framework to Enforce, Discover, and Promote Symmetry in Machine Learning
paper_authors: Samuel E. Otto, Nicholas Zolman, J. Nathan Kutz, Steven L. Brunton
for: Exploring how symmetry can be used in physics and machine learning to improve the generalization ability of models.
methods: The paper provides a unified framework for: 1. enforcing known symmetry when training a model; 2. discovering unknown symmetries of a given model or data set; and 3. promoting symmetry during training by learning which symmetries, within a user-specified group of candidates, are broken when there is sufficient evidence in the data.
results: The paper proposes a Lie-derivative-based framework, including a novel convex regularization for promoting symmetry, that applies to a wide range of machine learning models, including basis function regression, dynamical systems discovery, multilayer perceptrons, and neural networks acting on spatial fields such as images, improving generalization and performance during training.Abstract
Symmetry is present throughout nature and continues to play an increasingly central role in physics and machine learning. Fundamental symmetries, such as Poincar\'{e} invariance, allow physical laws discovered in laboratories on Earth to be extrapolated to the farthest reaches of the universe. Symmetry is essential to achieving this extrapolatory power in machine learning applications. For example, translation invariance in image classification allows models with fewer parameters, such as convolutional neural networks, to be trained on smaller data sets and achieve state-of-the-art performance. In this paper, we provide a unifying theoretical and methodological framework for incorporating symmetry into machine learning models in three ways: 1. enforcing known symmetry when training a model; 2. discovering unknown symmetries of a given model or data set; and 3. promoting symmetry during training by learning a model that breaks symmetries within a user-specified group of candidates when there is sufficient evidence in the data. We show that these tasks can be cast within a common mathematical framework whose central object is the Lie derivative associated with fiber-linear Lie group actions on vector bundles. We extend and unify several existing results by showing that enforcing and discovering symmetry are linear-algebraic tasks that are dual with respect to the bilinear structure of the Lie derivative. We also propose a novel way to promote symmetry by introducing a class of convex regularization functions based on the Lie derivative and nuclear norm relaxation to penalize symmetry breaking during training of machine learning models. We explain how these ideas can be applied to a wide range of machine learning models including basis function regression, dynamical systems discovery, multilayer perceptrons, and neural networks acting on spatial fields such as images.
摘要
对称性遍布于自然界,并在物理和机器学习中发挥着日益核心的作用。基本对称性(如庞加莱不变性)使得在地球实验室中发现的物理定律可以外推到宇宙的最远处。对称性对于机器学习应用获得这种外推能力同样至关重要。例如,图像分类中的平移不变性使得参数更少的模型(如卷积神经网络)可以在更小的数据集上训练并达到最先进的性能。在这篇论文中,我们提供了一个统一的理论与方法框架,以三种方式将对称性融入机器学习模型:1. 在训练模型时强制施加已知对称性;2. 发现给定模型或数据集的未知对称性;3. 在训练过程中促进对称性,即当数据中有足够证据时,学习一个在用户指定的候选群内打破对称性的模型。我们证明这些任务可以纳入同一个数学框架,其核心对象是与向量丛上纤维线性李群作用相关联的李导数。我们扩展并统一了若干已有结果,证明施加对称性与发现对称性是关于李导数双线性结构互为对偶的线性代数任务。我们还提出了一种促进对称性的新方法,即引入一类基于李导数和核范数松弛的凸正则化函数,在训练机器学习模型时惩罚对称性破缺。我们解释了这些思想如何应用于各种机器学习模型,包括基函数回归、动力系统发现、多层感知机,以及作用于图像等空间场的神经网络。
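Schematically, the central object mentioned above can be written as follows for a learned map $f$ with linear group actions on its input and output (this is the standard infinitesimal-equivariance statement with one common sign convention; the paper's fiber-linear vector-bundle formulation and nuclear-norm relaxation are more general than this sketch):

```latex
% Infinitesimal (Lie-derivative) form of equivariance and a symmetry-promoting penalty.
% K_\xi and \Lambda_\xi generate the group action on inputs and outputs respectively,
% and \mathcal{L}_\xi f denotes the Lie derivative of f along the generator \xi.
(\mathcal{L}_\xi f)(x) \;=\; \Lambda_\xi\, f(x) \;-\; \mathrm{D}f(x)\, K_\xi\, x ,
\qquad
(\mathcal{L}_\xi f) \equiv 0 \ \ \text{for all generators } \xi
\;\Longleftrightarrow\; f \ \text{is (infinitesimally) equivariant},
\qquad
R(f) \;=\; \big\| \big(\mathcal{L}_{\xi_1} f, \dots, \mathcal{L}_{\xi_m} f\big) \big\|_{*} .
```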
Machine learning for accuracy in density functional approximations
for: The paper is written to improve the accuracy and efficiency of computational chemistry simulations and materials design using machine learning techniques.
methods: The paper uses machine learning approaches to enhance the predictive power of density functional theory and related approximations.
results: The paper reviews recent progress in applying machine learning to improve the accuracy of density functional and related approximations, and discusses the promises and challenges of devising machine learning models that can be transferred between different chemistries and materials classes.Abstract
Machine learning techniques have found their way into computational chemistry as indispensable tools to accelerate atomistic simulations and materials design. In addition, machine learning approaches hold the potential to boost the predictive power of computationally efficient electronic structure methods, such as density functional theory, to chemical accuracy and to correct for fundamental errors in density functional approaches. Here, recent progress in applying machine learning to improve the accuracy of density functional and related approximations is reviewed. Promises and challenges in devising machine learning models transferable between different chemistries and materials classes are discussed with the help of examples applying promising models to systems far outside their training sets.
摘要
机器学习技术已经进入计算化学领域,成为加速原子尺度模拟和材料设计不可或缺的工具。此外,机器学习方法还有望将密度泛函理论等计算高效的电子结构方法的预测能力提升到化学精度,并纠正密度泛函方法中的基本误差。本文回顾了近期将机器学习用于提高密度泛函及相关近似方法精度的进展,并借助若干将有前景的模型应用于远超其训练集范围的体系的示例,讨论了构建可在不同化学体系和材料类别之间迁移的机器学习模型所面临的前景与挑战。
results: Experiments show that the proposed algorithm better restores image features under challenging rain conditions, recovering more SIFT key points with higher accuracy than existing state-of-the-art methods.Abstract
Rain streaks bring complicated pixel intensity changes and additional gradients, greatly obstructing the extraction of image features from background. This causes serious performance degradation in feature-based applications. Thus, it is critical to remove rain streaks from a single rainy image to recover image features. Recently, many excellent image deraining methods have made remarkable progress. However, these human visual system-driven approaches mainly focus on improving image quality with pixel recovery as loss function, and neglect how to enhance image feature recovery ability. To address this issue, we propose a task-driven image deraining algorithm to strengthen image feature supply for subsequent feature-based applications. Due to the extensive use and strong practicability of Scale-Invariant Feature Transform (SIFT), we first propose two separate networks using distinct losses and modules to achieve two goals, respectively. One is difference of Gaussian (DoG) pyramid recovery network (DPRNet) for SIFT detection, and the other gradients of Gaussian images recovery network (GGIRNet) for SIFT description. Second, in the DPRNet we propose an alternative interest point loss that directly penalizes scale response extrema to recover the DoG pyramid. Third, we advance a gradient attention module in the GGIRNet to recover those gradients of Gaussian images. Finally, with the recovered DoG pyramid and gradients, we can regain SIFT key points. This divide-and-conquer scheme to set different objectives for SIFT detection and description leads to good robustness. Compared with state-of-the-art methods, experimental results demonstrate that our proposed algorithm achieves better performance in both the number of recovered SIFT key points and their accuracy.
摘要
雨条纹会带来复杂的像素强度变化和额外的梯度,严重阻碍从背景中提取图像特征,导致基于特征的应用性能大幅下降。因此,从单张雨天图像中去除雨条纹以恢复图像特征至关重要。近年来,许多优秀的图像去雨方法取得了显著进展。然而,这些以人类视觉系统为导向的方法主要关注以像素恢复为损失函数来提升图像质量,而忽视了如何增强图像特征的恢复能力。为了解决这一问题,我们提出了一种任务驱动的图像去雨算法,以强化为后续基于特征的应用提供的图像特征。鉴于尺度不变特征变换(SIFT)的广泛应用和很强的实用性,我们首先提出两个分别使用不同损失函数和模块的网络来实现两个目标:一个是用于 SIFT 检测的高斯差分(DoG)金字塔恢复网络(DPRNet),另一个是用于 SIFT 描述的高斯图像梯度恢复网络(GGIRNet)。其次,在 DPRNet 中我们提出了一种替代的兴趣点损失,直接惩罚尺度响应极值,以恢复 DoG 金字塔。第三,我们在 GGIRNet 中提出了梯度注意力模块,用于恢复高斯图像的梯度。最后,利用恢复的 DoG 金字塔和梯度,我们可以重新获得 SIFT 关键点。这种为 SIFT 检测和描述设置不同目标的分而治之方案带来了良好的鲁棒性。实验结果表明,与最先进的方法相比,我们提出的算法在恢复的 SIFT 关键点数量及其准确性方面均取得了更好的性能。
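For readers unfamiliar with the DoG pyramid that DPRNet is trained to recover, here is a minimal sketch of the standard difference-of-Gaussians construction used by SIFT, built directly from an image; the paper instead trains a network to recover it from a rainy input, which this sketch does not attempt:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_pyramid(image: np.ndarray, num_scales: int = 4, sigma0: float = 1.6, k: float = 2 ** 0.5):
    """Return the difference-of-Gaussian images for one octave.

    Each DoG level is the difference of the image blurred at two adjacent scales;
    SIFT key points are extrema of these levels across space and scale.
    """
    blurred = [gaussian_filter(image.astype(float), sigma0 * (k ** i))
               for i in range(num_scales + 1)]
    return [blurred[i + 1] - blurred[i] for i in range(num_scales)]

# Toy usage on a random "image"; a real pipeline would also build further octaves
# by downsampling and locate extrema across adjacent DoG levels.
img = np.random.default_rng(0).random((64, 64))
dogs = dog_pyramid(img)
print(len(dogs), dogs[0].shape)
```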
methods: The paper proposes two channel estimation methods: a two-stage RIS OFF-ON method and an enhanced alternating least squares (E-ALS) method. Both exploit the structure of the tensor model to obtain decoupled estimates of all channels and are more accurate than the traditional least squares (LS) solution.
results: Numerical simulations show that E-ALS obtains the most accurate estimates while incurring only a slightly higher run-time than the two-stage method.Abstract
We consider a narrowband MIMO reconfigurable intelligent surface (RIS)-assisted wireless communication system and use tensor signal modelling techniques to individually estimate all communication channels including the non-RIS channels (direct path) and decoupled RIS channels. We model the received signal as a third-order tensor composed of two CANDECOMP/PARAFAC decomposition terms for the non-RIS and the RIS-assisted links, respectively, and we propose two channel estimation methods based on an iterative alternating least squares (ALS) algorithm: The two-stage RIS OFF-ON method estimates each of the non-RIS and RIS-assisted terms in two pilot training stages, whereas the enhanced alternating least squares (E-ALS) method improves upon the ALS algorithm to jointly estimate all channels over the full training duration. A key benefit of both methods compared to the traditional least squares (LS) solution is that they exploit the structure of the tensor model to obtain decoupled estimates of all communication channels. We provide the computational complexities to obtain each of the channel estimates for our two proposed methods. Numerical simulations are used to evaluate the accuracy and verify the computational complexities of the proposed two-stage RIS OFF-ON, and E-ALS, and compare them to the traditional LS methods. Results show that E-ALS will obtain the most accurate estimate while only having a slightly higher run-time than the two-stage method.
摘要
我们考虑一个窄带 MIMO 可重构智能表面(RIS)辅助的无线通信系统,并利用张量信号建模技术分别估计所有通信信道,包括非 RIS 信道(直达路径)和解耦后的 RIS 信道。我们将接收信号建模为一个三阶张量,由分别对应非 RIS 链路和 RIS 辅助链路的两个 CANDECOMP/PARAFAC 分解项组成,并提出两种基于迭代交替最小二乘(ALS)算法的信道估计方法:两阶段 RIS OFF-ON 方法在两个导频训练阶段分别估计非 RIS 项和 RIS 辅助项;增强交替最小二乘(E-ALS)方法改进了 ALS 算法,在整个训练时长内联合估计所有信道。与传统最小二乘(LS)解相比,这两种方法的关键优势在于利用张量模型的结构获得所有通信信道的解耦估计。我们给出了两种方法得到各信道估计的计算复杂度。数值仿真用于评估所提两阶段 RIS OFF-ON 和 E-ALS 方法的精度并验证其计算复杂度,并与传统 LS 方法进行比较。结果表明,E-ALS 可以获得最精确的估计,而其运行时间仅略高于两阶段方法。
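To give a feel for the alternating least squares machinery underlying both proposed estimators, below is a minimal generic CP (CANDECOMP/PARAFAC) ALS sketch for a third-order tensor; this is the textbook algorithm applied to a synthetic tensor, not the paper's received-signal model, pilot structure, or E-ALS refinement:

```python
import numpy as np

def khatri_rao(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Column-wise Kronecker product; rows are indexed by (row of a, row of b)."""
    return (a[:, None, :] * b[None, :, :]).reshape(a.shape[0] * b.shape[0], -1)

def unfold(x: np.ndarray, mode: int) -> np.ndarray:
    """Mode-n unfolding (Kolda/Bader convention) of a 3-way tensor."""
    return np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1, order="F")

def cp_als(x: np.ndarray, rank: int, n_iter: int = 50, seed: int = 0):
    """Fit X ~ sum_r a_r o b_r o c_r by alternating least squares."""
    rng = np.random.default_rng(seed)
    i, j, k = x.shape
    a = rng.standard_normal((i, rank))
    b = rng.standard_normal((j, rank))
    c = rng.standard_normal((k, rank))
    for _ in range(n_iter):
        # Each factor update is a linear least-squares problem given the other two.
        a = unfold(x, 0) @ khatri_rao(c, b) @ np.linalg.pinv((c.T @ c) * (b.T @ b))
        b = unfold(x, 1) @ khatri_rao(c, a) @ np.linalg.pinv((c.T @ c) * (a.T @ a))
        c = unfold(x, 2) @ khatri_rao(b, a) @ np.linalg.pinv((b.T @ b) * (a.T @ a))
    return a, b, c

# Toy usage: recover an exactly rank-3 synthetic tensor.
rng = np.random.default_rng(1)
A, B, C = rng.standard_normal((8, 3)), rng.standard_normal((9, 3)), rng.standard_normal((10, 3))
X = np.einsum("ir,jr,kr->ijk", A, B, C)
Ah, Bh, Ch = cp_als(X, rank=3)
Xh = np.einsum("ir,jr,kr->ijk", Ah, Bh, Ch)
print(np.linalg.norm(X - Xh) / np.linalg.norm(X))  # relative error, typically near zero
```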
Effective filtering approach for joint parameter-state estimation in SDEs via Rao-Blackwellization and modularization
paper_authors: Zhou Fang, Ankit Gupta, Mustafa Khammash
For: Joint parameter-state estimation in stochastic differential equations (SDEs)
Methods: Rao-Blackwellization and modularization
Results: Superior performance compared to existing approaches, with reduced computational complexity and mitigated issues of sample degeneracy and information loss.Abstract
Stochastic filtering is a vibrant area of research in both control theory and statistics, with broad applications in many scientific fields. Despite its extensive historical development, there still lacks an effective method for joint parameter-state estimation in SDEs. The state-of-the-art particle filtering methods suffer from either sample degeneracy or information loss, with both issues stemming from the dynamics of the particles generated to represent system parameters. This paper provides a novel and effective approach for joint parameter-state estimation in SDEs via Rao-Blackwellization and modularization. Our method operates in two layers: the first layer estimates the system states using a bootstrap particle filter, and the second layer marginalizes out system parameters explicitly. This strategy circumvents the need to generate particles representing system parameters, thereby mitigating their associated problems of sample degeneracy and information loss. Moreover, our method employs a modularization approach when integrating out the parameters, which significantly reduces the computational complexity. All these designs ensure the superior performance of our method. Finally, a numerical example is presented to illustrate that our method outperforms existing approaches by a large margin.
摘要
This paper proposes a novel and effective approach for joint parameter-state estimation in SDEs through Rao-Blackwellization and modularization. Our method operates in two layers: the first layer estimates the system states using a bootstrap particle filter, and the second layer marginalizes out system parameters explicitly. By avoiding the need to generate particles representing system parameters, our method mitigates the associated problems of sample degeneracy and information loss. Additionally, our method employs a modularization approach when integrating out the parameters, which significantly reduces the computational complexity. These designs ensure the superior performance of our method.To demonstrate the effectiveness of our approach, we provide a numerical example that shows our method outperforms existing methods by a large margin.
Regularized Shannon sampling formulas related to the special affine Fourier transform
results: In contrast to the Shannon sampling series, the regularized sampling formulas possess exponentially decaying approximation error and are numerically robust in the presence of noise; numerical experiments confirm the theoretical results.Abstract
In this paper, we present new regularized Shannon sampling formulas related to the special affine Fourier transform (SAFT). These sampling formulas use localized sampling with special compactly supported window functions, namely B-spline, sinh-type, and continuous Kaiser-Bessel window functions. In contrast to the Shannon sampling series for SAFT, the regularized Shannon sampling formulas for SAFT possesses an exponential decay of the approximation error and are numerically robust in the presence of noise, if certain oversampling condition is fulfilled. Several numerical experiments illustrate the theoretical results.
摘要
在这篇论文中,我们提出了与特殊仿射傅里叶变换(SAFT)相关的新的正则化香农采样公式。这些采样公式采用局部化采样,使用具有紧支撑的特殊窗函数,包括 B 样条窗、sinh 型窗和连续 Kaiser-Bessel 窗函数。与 SAFT 的香农采样级数不同,只要满足一定的过采样条件,SAFT 的正则化香农采样公式具有指数衰减的逼近误差,并且在存在噪声时在数值上是稳健的。若干数值实验验证了这些理论结果。
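To illustrate the general idea of localized (windowed) sinc reconstruction, here is a minimal sketch in the ordinary Fourier setting with one common sinh-type window parameterization; the paper's SAFT-specific formulas, window constants, and oversampling condition differ and are not reproduced here:

```python
import numpy as np

def sinh_window(t, m, T, beta):
    """A sinh-type window supported on [-m*T, m*T] (one common parameterization;
    the exact form used in the paper may differ)."""
    u = 1.0 - (t / (m * T)) ** 2
    return np.where(u > 0, np.sinh(beta * np.sqrt(np.clip(u, 0, None))) / np.sinh(beta), 0.0)

def regularized_shannon(samples, T, t, m=10, beta=10.0):
    """Windowed-sinc reconstruction: only the ~2m samples nearest to t contribute."""
    n = np.arange(len(samples))
    t = np.atleast_1d(t)[:, None]
    kernel = np.sinc((t - n * T) / T) * sinh_window(t - n * T, m, T, beta)
    return kernel @ samples

# Toy usage: reconstruct an oversampled band-limited signal between its samples.
T = 0.8                                    # sampling step, oversampled w.r.t. the band limit
n = np.arange(200)
f = lambda x: np.sin(2 * np.pi * 0.3 * x) + 0.5 * np.cos(2 * np.pi * 0.1 * x)
samples = f(n * T)
t_eval = np.linspace(40.0, 60.0, 7)
print(np.max(np.abs(regularized_shannon(samples, T, t_eval) - f(t_eval))))
```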
A Leakage-based Method for Mitigation of Faulty Reconfigurable Intelligent Surfaces
paper_authors: N. Moghadas Gholian, M. Rossanese, P. Mursia, A. Garcia-Saavedra, A. Asadi, V. Sciancalepore, X. Costa-Pérez
for: Addressing a potential issue of Reconfigurable Intelligent Surfaces in beyond-5G wireless networks, namely undesired signal scattering caused by faulty RIS elements.
methods: Two simple yet effective algorithms that maximize the Signal-to-Leakage-and-Noise-Ratio (SLNR) over a predefined two-dimensional area, applicable under perfect channel state information (CSI) and partial CSI, respectively.
results: Numerical and full-wave simulations show added gains compared to leakage-unaware and reference schemes.Abstract
Reconfigurable Intelligent Surfaces (RISs) are expected to be massively deployed in future beyond-5th generation wireless networks, thanks to their ability to programmatically alter the propagation environment, inherent low-cost and low-maintenance nature. Indeed, they are envisioned to be implemented on the facades of buildings or on moving objects. However, such an innovative characteristic may potentially turn into an involuntary negative behavior that needs to be addressed: an undesired signal scattering. In particular, RIS elements may be prone to experience failures due to lack of proper maintenance or external environmental factors. While the resulting Signal-to-Noise-Ratio (SNR) at the intended User Equipment (UE) may not be significantly degraded, we demonstrate the potential risks in terms of unwanted spreading of the transmit signal to non-intended UE. In this regard, we consider the problem of mitigating such undesired effect by proposing two simple yet effective algorithms, which are based on maximizing the Signal-to-Leakage- and-Noise-Ratio (SLNR) over a predefined two-dimensional (2D) area and are applicable in the case of perfect channel-state-information (CSI) and partial CSI, respectively. Numerical and full-wave simulations demonstrate the added gains compared to leakage-unaware and reference schemes.
摘要
可重构智能表面(RIS)有望在未来超五代(beyond-5G)无线网络中大规模部署,这得益于其能够以编程方式改变传播环境,并且本身成本低、维护需求少。实际上,它们被设想部署在建筑物外立面或移动物体上。然而,这种创新特性也可能带来一种需要应对的无意的负面行为:不期望的信号散射。特别是,由于缺乏适当维护或受外部环境因素影响,RIS 单元可能发生故障。虽然目标用户设备(UE)处的信噪比(SNR)可能不会显著下降,但我们展示了发送信号向非目标 UE 不期望扩散所带来的潜在风险。为此,我们考虑缓解这种不良影响的问题,提出了两种简单而有效的算法,它们基于在预定义的二维(2D)区域上最大化信漏噪比(SLNR),分别适用于完美信道状态信息(CSI)和部分 CSI 的情形。数值和全波仿真表明,与不考虑泄漏的方案和参考方案相比,所提方法带来了额外增益。
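For reference, the standard SLNR objective that such algorithms maximize can be sketched as follows (the textbook per-user definition and its classical closed-form maximizer; the paper's exact formulation over the faulty-RIS model and the predefined 2D area is not reproduced here):

```latex
% Signal-to-Leakage-and-Noise Ratio for a configuration/precoder w:
% h is the channel toward the intended UE, g_q the channels toward the Q
% "leakage" directions sampled over the 2D area, and sigma^2 the noise power.
\mathrm{SLNR}(\mathbf{w}) \;=\;
\frac{\lvert \mathbf{h}^{\mathsf{H}} \mathbf{w} \rvert^{2}}
     {\sigma^{2} \;+\; \sum_{q=1}^{Q} \lvert \mathbf{g}_{q}^{\mathsf{H}} \mathbf{w} \rvert^{2}},
\qquad
\mathbf{w}^{\star} \;=\; \arg\max_{\mathbf{w}}\ \mathrm{SLNR}(\mathbf{w})
\;\propto\; \text{principal generalized eigenvector of the pencil }
\Big(\mathbf{h}\mathbf{h}^{\mathsf{H}},\ \sigma^{2}\mathbf{I} + \textstyle\sum_{q} \mathbf{g}_{q}\mathbf{g}_{q}^{\mathsf{H}}\Big).
```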
Generating HSR Bogie Vibration Signals via Pulse Voltage-Guided Conditional Diffusion Model
for: Fault diagnosis of high-speed rail (HSR) bogies
methods: Pulse Voltage-Guided Conditional Diffusion Model (VGCDM)
results: Outperforms other generative models, achieving the best RSME, PSNR, and FSCS indicators, demonstrating its superiority in generating HSR bogie vibration signals.Abstract
Generative Adversarial Networks (GANs) for producing realistic signals, have substantially improved fault diagnosis algorithms in various Internet of Things (IoT) systems. However, challenges such as training instability and dynamical inaccuracy limit their utility in high-speed rail (HSR) bogie fault diagnosis. To address these challenges, we introduce the Pulse Voltage-Guided Conditional Diffusion Model (VGCDM). Unlike traditional implicit GANs, VGCDM adopts a sequential U-Net architecture, facilitating multi-phase denoising diffusion for generation, which bolsters training stability and mitigate convergence issues. VGCDM also incorporates control pulse voltage by cross-attention mechanism to ensure the alignment of vibration with voltage signals, enhancing the Conditional Diffusion Model's progressive controlablity. Consequently, solely straightforward sampling of control voltages, ensuring the efficient transformation from Gaussian Noise to vibration signals. This adaptability remains robust even in scenarios with time-varying speeds. To validate the effectiveness, we conducted two case studies using SQ dataset and high-simulation HSR bogie dataset. The results of our experiments unequivocally confirm that VGCDM outperforms other generative models, achieving the best RSME, PSNR, and FSCS, showing its superiority in conditional HSR bogie vibration signal generation. For access, our code is available at https://github.com/xuanliu2000/VGCDM.
摘要
生成对抗网络(GAN)已被广泛用于生成逼真信号,显著改进了多种物联网(IoT)系统中的故障诊断算法。然而,训练不稳定和动态不准确等挑战限制了其在高速铁路(HSR)转向架故障诊断中的应用。为了解决这些挑战,我们提出了脉冲电压引导的条件扩散模型(VGCDM)。与传统的隐式 GAN 不同,VGCDM 采用顺序 U-Net 架构,通过多阶段去噪扩散进行生成,从而增强训练稳定性并缓解收敛问题。VGCDM 还通过交叉注意力机制引入控制脉冲电压,确保振动信号与电压信号对齐,增强条件扩散模型的渐进可控性。因此,只需直接采样控制电压,即可高效地将高斯噪声转化为振动信号,这种适应性在速度随时间变化的场景中同样稳健。为验证其有效性,我们在 SQ 数据集和高仿真 HSR 转向架数据集上进行了两个案例研究。实验结果明确表明,VGCDM 优于其他生成模型,取得了最优的 RSME、PSNR 和 FSCS 指标,显示出其在条件 HSR 转向架振动信号生成中的优越性。代码见 https://github.com/xuanliu2000/VGCDM 。
Intelligent Surface Empowered Integrated Sensing and Communication: From Coexistence to Reciprocity
results: 研究发现,IRS可以有效地扩大S&C覆盖范围和控制通信频率的度量,从而实现更高的集成增益。同时,研究还发现了不同部署策略之间的干扰关系,并提出了一些有前途的IRS增强ISAC的方向。Abstract
Integrated sensing and communication (ISAC) has attracted growing interests for sixth-generation (6G) and beyond wireless networks. The primary challenges faced by highly efficient ISAC include limited sensing and communication (S&C) coverage, constrained integration gain between S&C under weak channel correlations, and unknown performance boundary. Intelligent reflecting/refracting surfaces (IRSs) can effectively expand S&C coverage and control the degree of freedom of channels between the transmitters and receivers, thereby realizing increasing integration gains. In this work, we first delve into the fundamental characteristics of IRS-empowered ISAC and innovative IRS-assisted sensing architectures. Then, we discuss various objectives for IRS channel control and deployment optimization in ISAC systems. Furthermore, the interplay between S&C in different deployment strategies is investigated and some promising directions for IRS enhanced ISAC are outlined.
摘要
集成感知与通信(ISAC)在第六代(6G)及未来无线网络中受到日益增长的关注。高效 ISAC 面临的主要挑战包括:感知与通信(S&C)覆盖范围有限、弱信道相关性下 S&C 之间的集成增益受限,以及性能边界未知。智能反射/折射表面(IRS)能够有效扩大 S&C 覆盖范围,并控制发射机与接收机之间信道的自由度,从而实现不断提升的集成增益。在这项工作中,我们首先深入探讨 IRS 赋能 ISAC 的基本特性以及创新的 IRS 辅助感知架构;然后讨论 ISAC 系统中 IRS 信道控制与部署优化的多种目标;此外,我们还研究了不同部署策略下 S&C 之间的相互作用,并指出了若干有前景的 IRS 增强 ISAC 的方向。
Deriving Characteristic Mode Eigenvalue Behavior Using Subduction of Group Representations
results: 该研究发现,在三维结构中,原来的交叠轨迹将分裂成多个不同的轨迹,形成了一种 Split Trace Crossing Avoidance(STCA)。这一发现可以解释三维结构中观察到的凹槽轨迹。此外,该研究还验证了这种知识的实用性,通过一个示例式天线设计,并在设计中选择了STCA在目标频率范围外,以避免输入匹配和远场图像的频率稳定性的负面影响。Abstract
A method to derive features of modal eigenvalue traces from known and understood solutions is proposed. It utilizes the concept of subduction from point group theory to obtain the symmetry properties of a target structure from those of a structure with a higher order of symmetry. This is applied exemplary to the analytically known characteristic modes (CMs) of the spherical shell. The eigenvalue behavior of a cube in free space and a cuboid on a perfectly electrically conducting plane are continuously derived from this. In this process, formerly crossing eigenvalue traces are found to split up, forming a split trace crossing avoidance (STCA). This finding is used to explain indentations in eigenvalue traces observed for three-dimensional structures, that are of increasing interest in recent literature. The utility of this knowledge is exemplified through a demonstrator antenna design. The dimensions of the antenna structure are chosen so the STCA is outside the target frequency range, avoiding negative impacts on input matching and the frequency stability of the far field patterns.
摘要
本文提出了一种从已知且易于理解的解中推导特征模式特征值轨迹特性的方法。该方法利用点群理论中的子群限制(subduction)概念,从对称性阶数更高的结构的对称性质推得目标结构的对称性质。我们以球壳解析已知的特征模(CM)为例加以应用,并由此连续地推导出自由空间中立方体以及置于理想电导平面上的长方体的特征值行为。在此过程中,我们发现原本相交的特征值轨迹会分裂开来,形成分裂轨迹交叉回避(STCA)。这一发现可用于解释近期文献中日益受到关注的三维结构特征值轨迹中观察到的凹陷。我们通过一个演示天线设计说明了这一认识的实用价值:选择天线结构的尺寸,使 STCA 位于目标频段之外,从而避免对输入匹配和远场方向图频率稳定性的不利影响。
On the Semi-Blind Mutually Referenced Equalizers for MIMO Systems
results: SB-MRE算法在比较其他线性算法(MMSE、ZF、MRE)的 simulate 结果中,在训练负担和复杂度方面表现出色,可以为无线通信系统中的频率响应和信号识别率提供一个有望的解决方案。Abstract
Minimizing training overhead in channel estimation is a crucial challenge in wireless communication systems. This paper presents an extension of the traditional blind algorithm, called "Mutually referenced equalizers" (MRE), specifically designed for MIMO systems. Additionally, we propose a novel semi-blind method, SB-MRE, which combines the benefits of pilot-based and MRE approaches to achieve enhanced performance while utilizing a reduced number of pilot symbols. Moreover, the SB-MRE algorithm helps to minimize complexity and training overhead and to remove the ambiguities inherent to blind processing. The simulation results demonstrated that SB-MRE outperforms other linear algorithms, i.e., MMSE, ZF, and MRE, in terms of training overhead symbols and complexity, thereby offering a promising solution to address the challenge of minimizing training overhead in channel estimation for wireless communication systems.
摘要
在无线通信系统中,尽量减少信道估计的训练开销是一项关键挑战。本文针对 MIMO 系统提出了传统盲算法"互参考均衡器"(Mutually Referenced Equalizers, MRE)的扩展。此外,我们提出了一种新的半盲方法 SB-MRE,它结合了基于导频的方法与 MRE 方法的优点,在使用更少导频符号的同时获得更好的性能。同时,SB-MRE 算法有助于降低复杂度和训练开销,并消除盲处理固有的模糊性。仿真结果表明,在训练开销符号数和复杂度方面,SB-MRE 优于其他线性算法(MMSE、ZF 和 MRE),从而为解决无线通信系统信道估计中训练开销最小化的挑战提供了一种有前景的方案。
Improving MIMO channel estimation via receive power feedback
results: 研究显示,使用相应的MMSE可以提高估计精度,而使用MAP估计器的有用性取决于操作的SNR。Abstract
Estimating the channel state is known to be an important problem in wireless networks. To this end, it matters to exploit all the available information to improve channel estimation accuracy as much as possible. It turns out that the problem of exploiting the information associated with the receive power feedback (e.g., the received signal strength indicator -RSSI-) has not been identified and solved; in this setup, the transmitter is assumed to receive feedback from all the receivers in presence. As shown in this paper, to solve this problem, classical estimation tools can be used. Using the corresponding MMSE is shown to be always beneficial, whereas the relevance of using the MAP estimator would depend on the operating SNR.
摘要
无线网络中估计信道状态是一个重要的问题。为此,需要尽可能利用所有可用信息来提高信道估计精度。事实上,利用与接收功率反馈(例如接收信号强度指示 RSSI)相关联的信息这一问题此前尚未被提出和解决;在该设定中,假设发射机从在场的所有接收机处获得反馈。如本文所示,可以使用经典的估计工具来解决这一问题。使用相应的 MMSE 估计器总是有益的,而 MAP 估计器是否有用则取决于工作信噪比(SNR)。
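As a reminder of the classical tools the abstract refers to, the generic MMSE estimator of a channel h from an observation y (here, RSSI-type feedback) takes the familiar conditional-mean form, reducing to the linear expression below in the jointly Gaussian case; the paper's specific feedback model and its MAP variant are not reproduced here:

```latex
% Generic MMSE estimate, its linear (jointly Gaussian) special case, and the MAP estimate:
\hat{\mathbf{h}}_{\mathrm{MMSE}} \;=\; \mathbb{E}\!\left[\mathbf{h}\,\middle|\,\mathbf{y}\right],
\qquad
\hat{\mathbf{h}}_{\mathrm{LMMSE}} \;=\; \boldsymbol{\mu}_{\mathbf{h}}
 \;+\; \mathbf{C}_{\mathbf{h}\mathbf{y}} \,\mathbf{C}_{\mathbf{y}\mathbf{y}}^{-1}
 \left(\mathbf{y} - \boldsymbol{\mu}_{\mathbf{y}}\right),
\qquad
\hat{\mathbf{h}}_{\mathrm{MAP}} \;=\; \arg\max_{\mathbf{h}}\; p(\mathbf{h}\,|\,\mathbf{y}).
```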
paper_authors: Yanir Maymon, Israel Nelken, Boaz Rafaely
for: Speaker localization (DOA estimation) for binaural microphone arrays, with applications in speech communication, video conferencing, and robot audition.
methods: Alternatives to common processing stages motivated by human hearing, including an auditory filter bank replacing the short-time Fourier transform (STFT) and a new DOA search that uses transformed head-related transfer functions (HRTFs) as steering vectors.
results: A simulation study and an experimental study, applied to two binaural DOA estimation methods, show that the proposed approach compares favorably with current methods.Abstract
Speaker localization for binaural microphone arrays has been widely studied for applications such as speech communication, video conferencing, and robot audition. Many methods developed for this task, including the direct path dominance (DPD) test, share common stages in their processing, which include transformation using the short-time Fourier transform (STFT), and a direction of arrival (DOA) search that is based on the head related transfer function (HRTF) set. In this paper, alternatives to these processing stages, motivated by human hearing, are proposed. These include incorporating an auditory filter bank to replace the STFT, and a new DOA search based on transformed HRTF as steering vectors. A simulation study and an experimental study are conducted to validate the proposed alternatives, and both are applied to two binaural DOA estimation methods; the results show that the proposed method compares favorably with current methods.
摘要
针对双耳麦克风阵列的说话人定位已被广泛研究,应用于语音通信、视频会议和机器人听觉等场景。为此任务开发的许多方法(包括直达路径主导(DPD)检验)在处理上具有共同的阶段,包括使用短时傅里叶变换(STFT)进行变换,以及基于头相关传输函数(HRTF)集的波达方向(DOA)搜索。本文提出了受人类听觉启发的替代处理方案,包括用听觉滤波器组取代 STFT,以及以变换后的 HRTF 作为导向矢量的新的 DOA 搜索方法。我们通过仿真研究和实验研究验证了所提替代方案,并将其应用于两种双耳 DOA 估计方法;结果表明,所提方法与当前方法相比具有竞争力。
results: HDFE achieves competitive performance on function-to-function mapping and yields 12% and 15% error reductions on two point cloud surface normal estimation benchmarks; integrating HDFE into the PointNet-based SOTA network further improves the SOTA baseline by 2.5% and 1.7% on the same benchmarks.Abstract
We propose Hyper-Dimensional Function Encoding (HDFE). Given samples of a continuous object (e.g. a function), HDFE produces an explicit vector representation of the given object, invariant to the sample distribution and density. Sample distribution and density invariance enables HDFE to consistently encode continuous objects regardless of their sampling, and therefore allows neural networks to receive continuous objects as inputs for machine learning tasks, such as classification and regression. Besides, HDFE does not require any training and is proved to map the object into an organized embedding space, which facilitates the training of the downstream tasks. In addition, the encoding is decodable, which enables neural networks to regress continuous objects by regressing their encodings. Therefore, HDFE serves as an interface for processing continuous objects. We apply HDFE to function-to-function mapping, where vanilla HDFE achieves competitive performance as the state-of-the-art algorithm. We apply HDFE to point cloud surface normal estimation, where a simple replacement from PointNet to HDFE leads to immediate 12% and 15% error reductions in two benchmarks. In addition, by integrating HDFE into the PointNet-based SOTA network, we improve the SOTA baseline by 2.5% and 1.7% in the same benchmarks.
摘要
我们提议使用超dimensional函数编码(HDFE)。给定一个连续的函数(例如),HDFE可以生成这个函数的明确的 вектор表示,不受样本分布和概率的影响。这意味着HDFE可以一样有效地编码不同的样本,从而允许神经网络作为机器学习任务的输入。此外,HDFE不需要任何训练,并且证明可以将函数映射到有组织的嵌入空间中,从而便于下游任务的训练。此外,编码是可解码的,这使得神经网络可以通过解码来重构连续函数。因此,HDFE可以作为连续函数的接口。我们在函数到函数映射中应用HDFE,其中vanilla HDFE可以与当前状态的最佳算法竞争。我们在点云表面法向量估计中应用HDFE,将PointNet换为HDFE后,直接下降12%和15%的错误率在两个标准 benchmar k。此外,通过将HDFE集成到PointNet-based SOTA网络中,我们提高了SOTA标准 benchmar k的基eline by 2.5%和1.7%。
Image Restoration with Point Spread Function Regularization and Active Learning
results: The algorithm effectively enhances fine structures in blurry astronomical images and increases the quality of observation images, and can be applied to large-scale sky survey data such as that from LSST, Euclid, and CSST.Abstract
Large-scale astronomical surveys can capture numerous images of celestial objects, including galaxies and nebulae. Analysing and processing these images can reveal intricate internal structures of these objects, allowing researchers to conduct comprehensive studies on their morphology, evolution, and physical properties. However, varying noise levels and point spread functions can hamper the accuracy and efficiency of information extraction from these images. To mitigate these effects, we propose a novel image restoration algorithm that connects a deep learning-based restoration algorithm with a high-fidelity telescope simulator. During the training stage, the simulator generates images with different levels of blur and noise to train the neural network based on the quality of restored images. After training, the neural network can directly restore images obtained by the telescope, as represented by the simulator. We have tested the algorithm using real and simulated observation data and have found that it effectively enhances fine structures in blurry images and increases the quality of observation images. This algorithm can be applied to large-scale sky survey data, such as data obtained by LSST, Euclid, and CSST, to further improve the accuracy and efficiency of information extraction, promoting advances in the field of astronomical research.
摘要
大规模天文观测可以捕捉到许多天体对象的图像,包括星系和云气。分析和处理这些图像可以揭示天体对象的内部细节, allowing researchers to conduct comprehensive studies on their morphology, evolution, and physical properties. 然而,天文图像中的噪声和点扩散函数可能会妨碍信息提取的准确性和效率。为了解决这些问题,我们提出了一种新的图像修复算法,该算法将深度学习基于的修复算法与高精度天文望远镜模拟器相连接。在训练阶段,模拟器生成了不同水平的噪声和扩散函数来训练神经网络,根据修复图像的质量来评估神经网络的性能。一旦训练完成,神经网络可以直接修复天文望远镜所获得的图像,如由模拟器所表示。我们已经对实际和模拟观测数据进行了测试,发现该算法可以有效地增强杂化图像中的细节,并提高观测图像的质量。这种算法可以应用于大规模天文观测数据,如LSST、Euclid和CSST等,以进一步提高信息提取的准确性和效率,推动天文研究领域的进步。
Object-centric Video Representation for Long-term Action Anticipation
paper_authors: Ce Zhang, Changcheng Fu, Shijie Wang, Nakul Agarwal, Kwonjoon Lee, Chiho Choi, Chen Sun
for: Building object-centric video representations for long-term action anticipation.
methods: Task-specific object-centric representations are extracted from vision-language pretrained models via "object prompts", without finetuning and without relying on in-domain supervised object detectors or a fully weakly-supervised video recognition pipeline.
results: A Transformer-based neural architecture retrieves relevant objects at various time scales to anticipate human-object interactions; extensive evaluations on the Ego4D, 50Salads, and EGTEA Gaze+ benchmarks confirm the effectiveness of the method.Abstract
This paper focuses on building object-centric representations for long-term action anticipation in videos. Our key motivation is that objects provide important cues to recognize and predict human-object interactions, especially when the predictions are longer term, as an observed "background" object could be used by the human actor in the future. We observe that existing object-based video recognition frameworks either assume the existence of in-domain supervised object detectors or follow a fully weakly-supervised pipeline to infer object locations from action labels. We propose to build object-centric video representations by leveraging visual-language pretrained models. This is achieved by "object prompts", an approach to extract task-specific object-centric representations from general-purpose pretrained models without finetuning. To recognize and predict human-object interactions, we use a Transformer-based neural architecture which allows the "retrieval" of relevant objects for action anticipation at various time scales. We conduct extensive evaluations on the Ego4D, 50Salads, and EGTEA Gaze+ benchmarks. Both quantitative and qualitative results confirm the effectiveness of our proposed method.
摘要
这篇论文关注构建以对象为中心的视频表示,用于长期动作预测。我们的关键动机是,对象为识别和预测人与物体的交互提供了重要线索,尤其是在预测时间跨度较长时:当前观察到的"背景"物体未来可能被人使用。我们发现,现有的基于对象的视频识别框架要么假设存在域内有监督的对象检测器,要么采用完全弱监督的管道从动作标签中推断对象位置。我们提出利用视觉-语言预训练模型来构建以对象为中心的视频表示,这通过"对象提示"(object prompts)实现:无需微调即可从通用预训练模型中提取任务相关的以对象为中心的表示。为识别和预测人-物交互,我们使用基于 Transformer 的神经网络架构,可在不同时间尺度上"检索"与动作预测相关的对象。我们在 Ego4D、50Salads 和 EGTEA Gaze+ 基准上进行了广泛评估,定量与定性结果均证实了所提方法的有效性。
Multi-task Deep Convolutional Network to Predict Sea Ice Concentration and Drift in the Arctic Ocean
For: This paper aims to improve the prediction of sea ice concentration (SIC) and sea ice drift (SID) in the Arctic Ocean using a novel multi-task fully convolutional network architecture called hierarchical information-sharing U-net (HIS-Unet).
Methods: The HIS-Unet model uses weighting attention modules (WAMs) to allow the SIC and SID layers to share information and assist each other's prediction. The model is trained on a large dataset of satellite images and is compared to other statistical approaches, sea ice physical models, and neural networks without information-sharing units.
Results: The HIS-Unet model outperforms other methods in predicting both SIC and SID, particularly in areas with seasonal sea ice changes. The weight values of the WAMs suggest that SIC information is more important for SID prediction, and information sharing is more active in sea ice edges than in the central Arctic.Abstract
Forecasting sea ice concentration (SIC) and sea ice drift (SID) in the Arctic Ocean is of great significance as the Arctic environment has been changed by the recent warming climate. Given that physical sea ice models require high computational costs with complex parameterization, deep learning techniques can effectively replace the physical model and improve the performance of sea ice prediction. This study proposes a novel multi-task fully conventional network architecture named hierarchical information-sharing U-net (HIS-Unet) to predict daily SIC and SID. Instead of learning SIC and SID separately at each branch, we allow the SIC and SID layers to share their information and assist each other's prediction through the weighting attention modules (WAMs). Consequently, our HIS-Unet outperforms other statistical approaches, sea ice physical models, and neural networks without such information-sharing units. The improvement of HIS-Unet is obvious both for SIC and SID prediction when and where sea ice conditions change seasonally, which implies that the information sharing through WAMs allows the model to learn the sudden changes of SIC and SID. The weight values of the WAMs imply that SIC information plays a more critical role in SID prediction, compared to that of SID information in SIC prediction, and information sharing is more active in sea ice edges (seasonal sea ice) than in the central Arctic (multi-year sea ice).
摘要
预测北极海洋中的海冰浓度(SIC)和海冰流动(SID)具有极大的重要性,因为最近的气候变化已经对北极环境产生了深远的影响。 Physical sea ice模型需要高度的计算成本和复杂的参数化,深度学习技术可以有效地取代物理模型并提高海冰预测的性能。本研究提出了一种新的多任务全连接网络架构,即层次信息共享U-Net(HIS-Unet),用于每天预测SIC和SID。而不是在每个分支中独立地学习SIC和SID,我们允许SIC和SID层共享信息,通过重量注意模块(WAMs)来帮助对方的预测。因此,我们的HIS-Unet在SIC和SID预测中表现出色,超越了其他统计方法、物理海冰模型和无此信息共享单元的神经网络。HIS-Unet在SIC和SID预测中的改进明显,特别是在季节性变化的海冰Conditions下,这表明信息共享通过WAMs使得模型能够学习海冰的快速变化。WAMs的权值表明SIC信息在SID预测中扮演更重要的角色,相比之下,SID信息在SIC预测中的作用较弱,而信息共享更活跃在海冰 Edge(季节性海冰)than in the central Arctic(多年海冰)。
Medi-CAT: Contrastive Adversarial Training for Medical Image Classification
results: Experimental results show that the proposed training strategy improves accuracy on four medical image classification datasets, with gains of up to 2% over well-known approaches on three benchmark datasets and up to 4.1% over baseline methods.Abstract
There are not many large medical image datasets available. For these datasets, too small deep learning models can't learn useful features, so they don't work well due to underfitting, and too big models tend to overfit the limited data. As a result, there is a compromise between the two issues. This paper proposes a training strategy Medi-CAT to overcome the underfitting and overfitting phenomena in medical imaging datasets. Specifically, the proposed training methodology employs large pre-trained vision transformers to overcome underfitting and adversarial and contrastive learning techniques to prevent overfitting. The proposed method is trained and evaluated on four medical image classification datasets from the MedMNIST collection. Our experimental results indicate that the proposed approach improves the accuracy up to 2% on three benchmark datasets compared to well-known approaches, whereas it increases the performance up to 4.1% over the baseline methods.
摘要
医疗图像数据集通常规模不大。对于这些数据集,过小的深度学习模型无法学习到有用的特征,因而因欠拟合而表现不佳;过大的模型则容易对有限的数据产生过拟合,因此需要在这两个问题之间折衷。本文提出了一种训练策略 Medi-CAT,以克服医疗影像数据集中的欠拟合与过拟合现象。具体而言,所提训练方法利用大型预训练视觉 transformer 来克服欠拟合,并利用对抗学习与对比学习技术来防止过拟合。该方法在 MedMNIST 集合中的四个医疗图像分类数据集上进行了训练和评估。实验结果表明,与知名方法相比,所提方法在三个基准数据集上将准确率提升最多 2%,而相对基线方法性能提升最多 4.1%。
Joint Depth Prediction and Semantic Segmentation with Multi-View SAM
results: In quantitative and qualitative studies on the ScanNet dataset, the method consistently outperforms single-task MVS and semantic segmentation models as well as multi-task monocular methods.Abstract
Multi-task approaches to joint depth and segmentation prediction are well-studied for monocular images. Yet, predictions from a single-view are inherently limited, while multiple views are available in many robotics applications. On the other end of the spectrum, video-based and full 3D methods require numerous frames to perform reconstruction and segmentation. With this work we propose a Multi-View Stereo (MVS) technique for depth prediction that benefits from rich semantic features of the Segment Anything Model (SAM). This enhanced depth prediction, in turn, serves as a prompt to our Transformer-based semantic segmentation decoder. We report the mutual benefit that both tasks enjoy in our quantitative and qualitative studies on the ScanNet dataset. Our approach consistently outperforms single-task MVS and segmentation models, along with multi-task monocular methods.
摘要
针对单目图像的深度与分割联合预测的多任务方法已得到充分研究。然而,单视角的预测本身存在局限,而在许多机器人应用中可以获得多个视角;另一方面,基于视频和完整三维的方法又需要大量帧才能完成重建与分割。在这项工作中,我们提出了一种用于深度预测的多视图立体(MVS)技术,它受益于 Segment Anything 模型(SAM)丰富的语义特征;增强后的深度预测反过来又作为提示输入我们基于 Transformer 的语义分割解码器。我们在 ScanNet 数据集上的定量与定性研究中报告了两项任务的相互增益。我们的方法始终优于单任务的 MVS 与分割模型以及多任务单目方法。
Spuriosity Rankings for Free: A Simple Framework for Last Layer Retraining Based on Object Detection
paper_authors: Mohammad Azizmalayeri, Reza Abbasi, Amir Hosein Haji Mohammad rezaie, Reihaneh Zohrabi, Mahdi Amiri, Mohammad Taghi Manzuri, Mohammad Hossein Rohban
results: Experiments on ImageNet-1k show that the ranking framework effectively sorts images by spuriousness and that using the highest-ranked images for last-layer retraining improves model reliability.Abstract
Deep neural networks have exhibited remarkable performance in various domains. However, the reliance of these models on spurious features has raised concerns about their reliability. A promising solution to this problem is last-layer retraining, which involves retraining the linear classifier head on a small subset of data without spurious cues. Nevertheless, selecting this subset requires human supervision, which reduces its scalability. Moreover, spurious cues may still exist in the selected subset. As a solution to this problem, we propose a novel ranking framework that leverages an open vocabulary object detection technique to identify images without spurious cues. More specifically, we use the object detector as a measure to score the presence of the target object in the images. Next, the images are sorted based on this score, and the last-layer of the model is retrained on a subset of the data with the highest scores. Our experiments on the ImageNet-1k dataset demonstrate the effectiveness of this ranking framework in sorting images based on spuriousness and using them for last-layer retraining.
摘要
深度神经网络在不同领域表现出色,但它们对假性特征的依赖引起了可靠性问题。一种有 Promise 的解决方案是最后层重新训练,即在一小部分数据上重新训练线性分类头。然而,选择这小部分数据需要人工监督,这限制了其扩展性。此外,假性特征仍然可能存在于选择的小部分数据中。为解决这问题,我们提出了一种新的排名框架,利用开放词汇对象检测技术来识别没有假性特征的图像。更具体来说,我们使用对象检测器作为评分对图像中目标对象的存在进行分数。然后对图像进行排名,并将最后层模型重新训练在排名最高的数据上。我们在ImageNet-1k dataset上进行实验,并证明了这种排名框架可以准确地排序图像根据假性特征,并使用它们进行最后层重新训练。
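A minimal sketch of the ranking-and-retraining recipe described above, with the open-vocabulary object detector abstracted into a scoring input (the feature source, scoring callback, subset size, and classifier head below are illustrative assumptions; the paper's detector, thresholds, and training details are not reproduced):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def last_layer_retrain(features, labels, object_scores, keep_fraction=0.2):
    """Rank images by how strongly a detector finds the target object
    (high score = the object is clearly present, so spurious cues matter less),
    then retrain only the linear classifier head on the top-ranked subset.

    features      : (N, D) penultimate-layer features from the frozen backbone
    labels        : (N,) class labels
    object_scores : (N,) detector confidence that the target object is present
    """
    order = np.argsort(-object_scores)                 # most object-centric images first
    keep = order[: max(1, int(keep_fraction * len(order)))]
    head = LogisticRegression(max_iter=1000)
    head.fit(features[keep], labels[keep])             # last-layer retraining only
    return head

# Toy usage with random stand-ins for features, labels, and detector scores.
rng = np.random.default_rng(0)
feats, labs = rng.standard_normal((500, 32)), rng.integers(0, 2, 500)
scores = rng.random(500)
head = last_layer_retrain(feats, labs, scores)
print(head.score(feats, labs))
```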
YOLOv8-Based Visual Detection of Road Hazards: Potholes, Sewer Covers, and Manholes
results: The study shows that YOLOv8 achieves strong detection accuracy and generalization across variations in lighting, road types, and hazard sizes and types.Abstract
Effective detection of road hazards plays a pivotal role in road infrastructure maintenance and ensuring road safety. This research paper provides a comprehensive evaluation of YOLOv8, an object detection model, in the context of detecting road hazards such as potholes, Sewer Covers, and Man Holes. A comparative analysis with previous iterations, YOLOv5 and YOLOv7, is conducted, emphasizing the importance of computational efficiency in various applications. The paper delves into the architecture of YOLOv8 and explores image preprocessing techniques aimed at enhancing detection accuracy across diverse conditions, including variations in lighting, road types, hazard sizes, and types. Furthermore, hyperparameter tuning experiments are performed to optimize model performance through adjustments in learning rates, batch sizes, anchor box sizes, and augmentation strategies. Model evaluation is based on Mean Average Precision (mAP), a widely accepted metric for object detection performance. The research assesses the robustness and generalization capabilities of the models through mAP scores calculated across the diverse test scenarios, underlining the significance of YOLOv8 in road hazard detection and infrastructure maintenance.
摘要
道路危险物的有效检测在道路基础设施维护和道路安全保障中起着关键作用。本文对目标检测模型 YOLOv8 在检测坑洼、下水道盖板和检修井等道路危险物方面进行了全面评估,并与之前的版本 YOLOv5 和 YOLOv7 进行了对比分析,强调了计算效率在各类应用中的重要性。论文深入剖析了 YOLOv8 的架构,并探讨了旨在提升不同条件(包括光照、道路类型、危险物大小和类型的变化)下检测精度的图像预处理技术。此外,论文还通过调整学习率、批大小、锚框大小和数据增强策略等超参数进行了调优实验,以优化模型性能。模型评估基于目标检测性能的常用指标平均精度均值(mAP)。研究通过在多种测试场景下计算的 mAP 分数评估了模型的鲁棒性和泛化能力,凸显了 YOLOv8 在道路危险物检测和基础设施维护中的重要意义。
View Classification and Object Detection in Cardiac Ultrasound to Localize Valves via Deep Learning
results: Our object detection experiments show that deep learning can be used to precisely localize and identify multiple heart valves.Abstract
Echocardiography provides an important tool for clinicians to observe the function of the heart in real time, at low cost, and without harmful radiation. Automated localization and classification of heart valves enables automatic extraction of quantities associated with heart mechanical function and related blood flow measurements. We propose a machine learning pipeline that uses deep neural networks for separate classification and localization steps. As the first step in the pipeline, we apply view classification to echocardiograms with ten unique anatomic views of the heart. In the second step, we apply deep learning-based object detection to both localize and identify the valves. Image segmentation based object detection in echocardiography has been shown in many earlier studies but, to the best of our knowledge, this is the first study that predicts the bounding boxes around the valves along with classification from 2D ultrasound images with the help of deep neural networks. Our object detection experiments applied to the Apical views suggest that it is possible to localize and identify multiple valves precisely.
摘要
echocardiography 提供了一种重要的工具,帮助临床医生在实时、低成本、无辐射的情况下观察心脏的功能。自动识别和分类心脏阀门可以自动提取心脏机械功能相关的量和血流测量。我们提议一个基于深度学习的机器学习管道,其中首先应用视图分类onto echocardiograms中的十个特有的心脏视图。其次,我们应用深度学习基于对象检测来both地址和识别阀门。在echocardiography中,使用深度学习对象检测已经在许多之前的研究中得到了证明,但是,根据我们所知,这是第一个通过深度学习网络预测阀门 bounding box 以及分类的2D ultrasound 图像的研究。我们的对象检测实验在Apical View中表明,可以准确地Localize和识别多个阀门。
FPO++: Efficient Encoding and Rendering of Dynamic Neural Radiance Fields by Analyzing and Enhancing Fourier PlenOctrees
results: The researchers demonstrate the effectiveness of the enhanced Fourier PlenOctrees through quantitative and qualitative evaluations, with strong performance on both synthetic and real-world scenes.Abstract
Fourier PlenOctrees have been shown to be an efficient representation for real-time rendering of dynamic Neural Radiance Fields (NeRF). Despite its many advantages, this method suffers from artifacts introduced by the involved compression when combining it with recent state-of-the-art techniques for training the static per-frame NeRF models. In this paper, we perform an in-depth analysis of these artifacts and leverage the resulting insights to propose an improved representation. In particular, we present a novel density encoding that adapts the Fourier-based compression to the characteristics of the transfer function used by the underlying volume rendering procedure and leads to a substantial reduction of artifacts in the dynamic model. Furthermore, we show an augmentation of the training data that relaxes the periodicity assumption of the compression. We demonstrate the effectiveness of our enhanced Fourier PlenOctrees in the scope of quantitative and qualitative evaluations on synthetic and real-world scenes.
摘要
傅立叶数字化(Fourier PlenOctrees)已经证明是实时渲染动态神经辉度场(NeRF)的有效表现方法。
DDAM-PS: Diligent Domain Adaptive Mixer for Person Search
results: Experiments validate the effectiveness of the proposal; the method shows favorable performance on the two challenging PRW and CUHK-SYSU datasets.Abstract
Person search (PS) is a challenging computer vision problem where the objective is to achieve joint optimization for pedestrian detection and re-identification (ReID). Although previous advancements have shown promising performance in the field under fully and weakly supervised learning settings, there exists a major gap in investigating the domain adaptation ability of PS models. In this paper, we propose a diligent domain adaptive mixer (DDAM) for person search (DDAM-PS) framework that aims to bridge this gap and improve knowledge transfer from the labeled source domain to the unlabeled target domain. Specifically, we introduce a novel DDAM module that generates moderate mixed-domain representations by combining source and target domain representations. The proposed DDAM module encourages domain mixing to minimize the distance between the two extreme domains, thereby enhancing the ReID task. To achieve this, we introduce two bridge losses and a disparity loss. The objective of the two bridge losses is to guide the moderate mixed-domain representations to maintain an appropriate distance from both the source and target domain representations. The disparity loss aims to prevent the moderate mixed-domain representations from being biased towards either the source or target domains, thereby avoiding overfitting. Furthermore, we address the conflict between the two subtasks, localization and ReID, during domain adaptation. To handle this cross-task conflict, we forcefully decouple the norm-aware embedding, which aids in better learning of the moderate mixed-domain representation. We conduct experiments to validate the effectiveness of our proposed method. Our approach demonstrates favorable performance on the challenging PRW and CUHK-SYSU datasets. Our source code is publicly available at \url{https://github.com/mustansarfiaz/DDAM-PS}.
摘要
人体搜索(PS)是一个computer vision中的挑战任务,旨在实现人体检测和重新识别(ReID)的共同优化。尽管前一些进展有示出在不监督和弱监督学习的形式下表现出色,但是存在一个主要的领域适应能力问题的PS模型。在这篇论文中,我们提出了一种努力适应频率mixer(DDAM)用于人体搜索(DDAM-PS)框架,以减少source域和target域之间的距离,从而提高ReID任务。具体来说,我们引入了一种新的DDAM模块,该模块将source域和target域的表示结合在一起。我们的DDAM模块鼓励频率混合,以降低ReID任务中的预测误差。为此,我们引入了两个桥接损失和一个偏差损失。两个桥接损失的目标是使 moderate mixed-domain representation保持适度的距离于source域和target域的表示,以避免过拟合。而偏差损失则是避免 moderate mixed-domain representation偏向于source域或target域,以避免过拟合。此外,我们解决了在预测和ReID子任务之间的冲突。我们强制协调norm-aware embedding,以便更好地学习 moderate mixed-domain representation。我们进行了实验来证明我们的方法的有效性。我们的方法在PRW和CUHK-SYSU datasets上表现出色。我们的源代码公开在 GitHub上,可以通过 \url{https://github.com/mustansarfiaz/DDAM-PS} 访问。
SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction
results: Extensive experiments validate the effectiveness of the method, which creates smooth and creative long videos and readily extends to other tasks such as image-to-video animation and autoregressive video prediction.Abstract
Recently video generation has achieved substantial progress with realistic results. Nevertheless, existing AI-generated videos are usually very short clips ("shot-level") depicting a single scene. To deliver a coherent long video ("story-level"), it is desirable to have creative transition and prediction effects across different clips. This paper presents a short-to-long video diffusion model, SEINE, that focuses on generative transition and prediction. The goal is to generate high-quality long videos with smooth and creative transitions between scenes and varying lengths of shot-level videos. Specifically, we propose a random-mask video diffusion model to automatically generate transitions based on textual descriptions. By providing the images of different scenes as inputs, combined with text-based control, our model generates transition videos that ensure coherence and visual quality. Furthermore, the model can be readily extended to various tasks such as image-to-video animation and autoregressive video prediction. To conduct a comprehensive evaluation of this new generative task, we propose three assessing criteria for smooth and creative transition: temporal consistency, semantic similarity, and video-text semantic alignment. Extensive experiments validate the effectiveness of our approach over existing methods for generative transition and prediction, enabling the creation of story-level long videos. Project page: https://vchitect.github.io/SEINE-project/ .
摘要
NeRF Revisited: Fixing Quadrature Instability in Volume Rendering
results: Experiments show that the proposed method yields sharper textures, better geometric reconstruction, and stronger depth supervision. Moreover, it can be used as a drop-in replacement for the volume rendering equation of existing NeRF methods without changing their implementations. Detailed experimental results and the project page can be found at pl-nerf.github.io.Abstract
Neural radiance fields (NeRF) rely on volume rendering to synthesize novel views. Volume rendering requires evaluating an integral along each ray, which is numerically approximated with a finite sum that corresponds to the exact integral along the ray under piecewise constant volume density. As a consequence, the rendered result is unstable w.r.t. the choice of samples along the ray, a phenomenon that we dub quadrature instability. We propose a mathematically principled solution by reformulating the sample-based rendering equation so that it corresponds to the exact integral under piecewise linear volume density. This simultaneously resolves multiple issues: conflicts between samples along different rays, imprecise hierarchical sampling, and non-differentiability of quantiles of ray termination distances w.r.t. model parameters. We demonstrate several benefits over the classical sample-based rendering equation, such as sharper textures, better geometric reconstruction, and stronger depth supervision. Our proposed formulation can also be used as a drop-in replacement to the volume rendering equation of existing NeRF-based methods. Our project page can be found at pl-nerf.github.io.
摘要
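For reference, the classical sample-based volume rendering quadrature that the abstract refers to (the exact integral along a ray under piecewise-constant density) is usually written as below; this is the standard NeRF formulation, not the paper's piecewise-linear reformulation, and the notation (densities, colors, sample spacings) is the conventional one.

```latex
\hat{C}(\mathbf{r}) \;=\; \sum_{i=1}^{N} T_i \,\bigl(1 - e^{-\sigma_i \delta_i}\bigr)\,\mathbf{c}_i,
\qquad
T_i \;=\; \exp\!\Bigl(-\sum_{j=1}^{i-1} \sigma_j \delta_j\Bigr),
\qquad
\delta_i = t_{i+1} - t_i .
```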
StairNet: Visual Recognition of Stairs for Human-Robot Locomotion
results: The researchers achieved high classification accuracy (up to 98.8%) across different designs, and inference speeds as fast as 2.8 ms on mobile devices with GPU and NPU accelerators. However, due to embedded hardware limitations, deploying the model on custom CPU-powered smart glasses yielded a slower inference time of 1.5 seconds, presenting a trade-off between human-centered design and performance.Abstract
Human-robot walking with prosthetic legs and exoskeletons, especially over complex terrains such as stairs, remains a significant challenge. Egocentric vision has the unique potential to detect the walking environment prior to physical interactions, which can improve transitions to and from stairs. This motivated us to create the StairNet initiative to support the development of new deep learning models for visual sensing and recognition of stairs, with an emphasis on lightweight and efficient neural networks for onboard real-time inference. In this study, we present an overview of the development of our large-scale dataset with over 515,000 manually labeled images, as well as our development of different deep learning models (e.g., 2D and 3D CNN, hybrid CNN and LSTM, and ViT networks) and training methods (e.g., supervised learning with temporal data and semi-supervised learning with unlabeled images) using our new dataset. We consistently achieved high classification accuracy (i.e., up to 98.8%) with different designs, offering trade-offs between model accuracy and size. When deployed on mobile devices with GPU and NPU accelerators, our deep learning models achieved inference speeds up to 2.8 ms. We also deployed our models on custom-designed CPU-powered smart glasses. However, limitations in the embedded hardware yielded slower inference speeds of 1.5 seconds, presenting a trade-off between human-centered design and performance. Overall, we showed that StairNet can be an effective platform to develop and study new visual perception systems for human-robot locomotion with applications in exoskeleton and prosthetic leg control.
摘要
人机步行使用腔 prosthetic legs和外部骨架,特别是在复杂的地形上,如楼梯,仍然是一项 significante challenges。 egocentric vision有独特的潜在力量,可以在physical interactions之前探测步行环境,这种能力可以提高楼梯之间的过渡。这个激励我们创建StairNet计划,支持开发新的深度学习模型,用于视觉感知和识别楼梯,强调轻量级和高效的神经网络,以实现在线实时推理。在这篇文章中,我们介绍了我们的大规模数据集(包括515,000个手动标注的图像)的开发,以及我们开发的不同的深度学习模型(如2D和3D CNN、混合 CNN和LSTM网络)和训练方法(如监督学习和无标签图像)。我们在不同的设计中一直达到了高精度(达98.8%),提供了模型精度和大小之间的负面选择。当我们的深度学习模型在移动设备上(具有GPU和NPU加速器)进行了实时推理时,我们实现了最快的推理速度(达2.8ms)。我们还将我们的模型部署到自定义CPU驱动的智能眼镜上,但由于嵌入式硬件的限制,推理速度为1.5秒,这presented a trade-off between human-centered design和性能。总之,我们证明了StairNet可以是一个有效的平台,用于开发和研究新的视觉感知系统,以应对人机步行控制的应用,包括肢体外科和肢体 prosthetic。
Addressing Limitations of State-Aware Imitation Learning for Autonomous Driving
results: Our experimental results show that this approach reduces the inertia problem and yields a markedly higher correlation between offline and online performance.Abstract
Conditional Imitation learning is a common and effective approach to train autonomous driving agents. However, two issues limit the full potential of this approach: (i) the inertia problem, a special case of causal confusion where the agent mistakenly correlates low speed with no acceleration, and (ii) low correlation between offline and online performance due to the accumulation of small errors that brings the agent in a previously unseen state. Both issues are critical for state-aware models, yet informing the driving agent of its internal state as well as the state of the environment is of crucial importance. In this paper we propose a multi-task learning agent based on a multi-stage vision transformer with state token propagation. We feed the state of the vehicle along with the representation of the environment as a special token of the transformer and propagate it throughout the network. This allows us to tackle the aforementioned issues from different angles: guiding the driving policy with learned stop/go information, performing data augmentation directly on the state of the vehicle and visually explaining the model's decisions. We report a drastic decrease in inertia and a high correlation between offline and online metrics.
摘要
条件模仿学习是自驾车智能代理的常见和有效方法。然而,两个问题限制了这种方法的全面潜力:(i)抗力问题,特殊情况的 causal confusion where the agent mistakenly correlates low speed with no acceleration,和(ii)在线和离线性能之间的低相关性,由小误差的积累导致 Agent 在未看过的状态下。这两个问题对状态意识模型非常重要,但是通过告诉驾车代理其内部状态以及环境状态的重要性。在这篇论文中,我们提出了基于多任务学习的多阶段视transformer agent,我们将驾车器的状态和环境的表示作为特殊token传播到网络中。这allow us to tackle the aforementioned issues from different angles: guiding the driving policy with learned stop/go information, performing data augmentation directly on the state of the vehicle and visually explaining the model's decisions。我们发现了剂量的减少和在线和离线指标之间的高相关性。
Dynamic Batch Norm Statistics Update for Natural Robustness
results: Achieves accuracy improvements of about 8% and 4% on CIFAR10-C and ImageNet-C, respectively, and can further improve the accuracy of existing robust models such as AugMix and DeepAug.Abstract
DNNs trained on natural clean samples have been shown to perform poorly on corrupted samples, such as noisy or blurry images. Various data augmentation methods have been recently proposed to improve DNN's robustness against common corruptions. Despite their success, they require computationally expensive training and cannot be applied to off-the-shelf trained models. Recently, it has been shown that updating BatchNorm (BN) statistics of an off-the-shelf model on a single corruption improves its accuracy on that corruption significantly. However, adopting the idea at inference time when the type of corruption is unknown and changing decreases the effectiveness of this method. In this paper, we harness the Fourier domain to detect the corruption type, a challenging task in the image domain. We propose a unified framework consisting of a corruption-detection model and BN statistics update that improves the corruption accuracy of any off-the-shelf trained model. We benchmark our framework on different models and datasets. Our results demonstrate about 8% and 4% accuracy improvement on CIFAR10-C and ImageNet-C, respectively. Furthermore, our framework can further improve the accuracy of state-of-the-art robust models, such as AugMix and DeepAug.
摘要
深度神经网络(DNN)在天然清晰样本上训练后,在受损样本上表现不佳,如噪音或模糊图像。 Various 数据增强方法已经被提出来改善 DNN 的对受损样本的Robustness。尽管它们在成功,但它们需要计算昂贵的训练。此外,这些方法无法应用于现有训练的模型。Recently, it has been shown that updating BatchNorm(BN)统计值 of an off-the-shelf model on a single corruption significantly improves its accuracy on that corruption. However, adopting this idea at inference time when the type of corruption is unknown and changing decreases the effectiveness of this method. In this paper, we harness the Fourier domain to detect the corruption type, a challenging task in the image domain. We propose a unified framework consisting of a corruption-detection model and BN statistics update that improves the corruption accuracy of any off-the-shelf trained model. We benchmark our framework on different models and datasets. Our results demonstrate about 8% and 4% accuracy improvement on CIFAR10-C and ImageNet-C, respectively. Furthermore, our framework can further improve the accuracy of state-of-the-art robust models, such as AugMix and DeepAug.
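The idea this paper builds on, refreshing BatchNorm statistics of an off-the-shelf model on corrupted data, can be sketched roughly as follows in PyTorch; the loader format and the choice of a cumulative moving average are assumptions for illustration, and the paper's Fourier-domain corruption-type detector is not shown.

```python
import torch


def update_bn_statistics(model: torch.nn.Module, corrupted_loader, device: str = "cpu"):
    """Re-estimate BatchNorm running statistics on unlabeled corrupted images.

    Minimal sketch: only BN buffers are refreshed, no weights are trained.
    """
    for m in model.modules():
        if isinstance(m, torch.nn.modules.batchnorm._BatchNorm):
            m.reset_running_stats()
            m.momentum = None  # cumulative moving average over all batches

    model.train()  # BN layers only update their buffers in train mode
    with torch.no_grad():
        for batch in corrupted_loader:
            images = batch[0] if isinstance(batch, (list, tuple)) else batch
            model(images.to(device))
    model.eval()
    return model
```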
Using Higher-Order Moments to Assess the Quality of GAN-generated Image Features
results: Information in the third moments of image feature data can be used to define a new evaluation measure, and this measure aligns more closely with human perception of images.Abstract
The rapid advancement of Generative Adversarial Networks (GANs) necessitates the need to robustly evaluate these models. Among the established evaluation criteria, the Fr\'{e}chet Inception Distance (FID) has been widely adopted due to its conceptual simplicity, fast computation time, and strong correlation with human perception. However, FID has inherent limitations, mainly stemming from its assumption that feature embeddings follow a Gaussian distribution, and therefore can be defined by their first two moments. As this does not hold in practice, in this paper we explore the importance of third-moments in image feature data and use this information to define a new measure, which we call the Skew Inception Distance (SID). We prove that SID is a pseudometric on probability distributions, show how it extends FID, and present a practical method for its computation. Our numerical experiments support that SID either tracks with FID or, in some cases, aligns more closely with human perception when evaluating image features of ImageNet data.
摘要
“生成冲突网络(GAN)的快速发展需要对这些模型进行坚实的评估。现有的评估标准之一是Fréchet Inception Distance(FID),它的概念简单,计算速度快,与人类感知有强相关性。然而,FID存在一些局限性,主要是它假设特征嵌入follows Gaussian distribution,因此可以通过其首两个矩阵来定义。然而,在实践中,这并不成立。因此,我们在这篇论文中研究了图像特征数据中第三个矩阵的重要性,并使用这些信息定义一个新的度量,我们称之为Skew Inception Distance(SID)。我们证明了SID是一个 pseudometric 在概率分布上,并证明了它与FID一起扩展。我们还提供了计算实用方法。我们的数据统计结果表明,SID在ImageNet数据上 either track with FID 或者,在一些情况下,与人类感知更加一致。”
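As a toy illustration of why third moments carry information beyond the two moments used by FID, one can compare the per-dimension skewness of real and generated feature embeddings; this is only a simplified stand-in, not the paper's Skew Inception Distance, and the feature matrices are assumed to be precomputed Inception embeddings.

```python
import numpy as np


def skewness_gap(real_feats: np.ndarray, fake_feats: np.ndarray) -> float:
    """Toy third-moment comparison of two feature sets of shape (n_samples, n_dims).

    Not the paper's SID: just a per-dimension skewness discrepancy that
    vanishes whenever the two sets share the same standardized third moments.
    """
    def per_dim_skew(x):
        mu = x.mean(axis=0)
        sigma = x.std(axis=0) + 1e-8
        return (((x - mu) / sigma) ** 3).mean(axis=0)

    return float(np.linalg.norm(per_dim_skew(real_feats) - per_dim_skew(fake_feats)))
```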
Deepfake detection by exploiting surface anomalies: the SurFake approach
paper_authors: Andrea Ciamarra, Roberto Caldelli, Federico Becattini, Lorenzo Seidenari, Alberto Del Bimbo for: This paper investigates deepfake detection techniques to prevent the wide spread of altered messages across different areas of everyday life. methods: It proposes a deepfake detection method based on scene characteristics, analyzing the surface features depicted in an image to produce a descriptor usable for training a deep learning model. results: Experimental results show that this approach can accurately distinguish altered from pristine images and can be combined with visual data to improve detection accuracy.Abstract
The ever-increasing use of synthetically generated content in different sectors of our everyday life, one for all media information, poses a strong need for deepfake detection tools in order to avoid the proliferation of altered messages. The process to identify manipulated content, in particular images and videos, is basically performed by looking for the presence of some inconsistencies and/or anomalies specifically due to the fake generation process. Different techniques exist in the scientific literature that exploit diverse ad-hoc features in order to highlight possible modifications. In this paper, we propose to investigate how deepfake creation can impact on the characteristics that the whole scene had at the time of the acquisition. In particular, when an image (video) is captured the overall geometry of the scene (e.g. surfaces) and the acquisition process (e.g. illumination) determine a univocal environment that is directly represented by the image pixel values; all these intrinsic relations are possibly changed by the deepfake generation process. By resorting to the analysis of the characteristics of the surfaces depicted in the image it is possible to obtain a descriptor usable to train a CNN for deepfake detection: we refer to such an approach as SurFake. Experimental results carried out on the FF++ dataset for different kinds of deepfake forgeries and diverse deep learning models confirm that such a feature can be adopted to discriminate between pristine and altered images; furthermore, experiments witness that it can also be combined with visual data to provide a certain improvement in terms of detection accuracy.
摘要
随着人工生成内容在不同领域的日常生活中越来越广泛使用,特别是媒体信息领域,需要深刻检测深伪工具以避免扩散修改的消息。寻找修改后的内容特征是通过检查修改过程中的一些不一致和异常来完成的。现有的科学文献中有多种方法利用特定的特征来推断修改的可能性。在这篇论文中,我们提出了研究深伪创造对场景中的全部特征的影响。具体来说,当图像(视频)被捕捉时,场景中的整体几何(例如表面)以及捕捉过程(例如照明)会直接确定一个唯一的环境,这个环境由图像像素值直接表达出来。深伪生成过程可能会改变这些内在关系。通过分析图像中表现的表面特征,我们可以获得一个可以用于训练深度学习模型的描述符,我们称之为SurFake。实验结果表明,使用SurFake特征可以在FF++数据集上对不同类型的深伪假造和多种深度学习模型进行检测,并且可以与视觉数据结合使用以提高检测精度。
Diffusion Reconstruction of Ultrasound Images with Informative Uncertainty
methods: Combines model-based and learning-based approaches, using diffusion models for enhancement.
results: Experiments show that the method achieves high-quality ultrasound image reconstructions on various datasets and outperforms state-of-the-art methods. In addition, an in-depth analysis of the statistical properties of single- and multiple-sample reconstructions demonstrates the reliability and informativeness of the model.Abstract
Despite its wide use in medicine, ultrasound imaging faces several challenges related to its poor signal-to-noise ratio and several sources of noise and artefacts. Enhancing ultrasound image quality involves balancing concurrent factors like contrast, resolution, and speckle preservation. In recent years, there has been progress both in model-based and learning-based approaches to improve ultrasound image reconstruction. Bringing the best from both worlds, we propose a hybrid approach leveraging advances in diffusion models. To this end, we adapt Denoising Diffusion Restoration Models (DDRM) to incorporate ultrasound physics through a linear direct model and an unsupervised fine-tuning of the prior diffusion model. We conduct comprehensive experiments on simulated, in-vitro, and in-vivo data, demonstrating the efficacy of our approach in achieving high-quality image reconstructions from a single plane wave input and in comparison to state-of-the-art methods. Finally, given the stochastic nature of the method, we analyse in depth the statistical properties of single and multiple-sample reconstructions, experimentally show the informativeness of their variance, and provide an empirical model relating this behaviour to speckle noise. The code and data are available at: (upon acceptance).
摘要
虽然ultrasound imaging在医学中广泛使用,但它还面临着一些相关的信号噪声和噪声和artefacts的问题。提高ultrasound图像质量需要平衡同时的因素,如对比、分辨率和杂点保持。在过去几年,有所进步在基于模型和学习方法来改进ultrasound图像重建方面。我们提议一种hybrid方法,利用了diffusion模型的进步。为此,我们适应了Denosing Diffusion Restoration Models(DDRM),并通过线性直接模型和无监督精度适应来吸收ultrasound物理。我们对 simulate、in vitro和in vivo数据进行了广泛的实验,并证明了我们的方法可以从单个扩散波输入获得高质量的图像重建,并与当前的方法进行比较。此外,由于方法的随机性,我们进行了深入的统计分析,实验表明了单个和多个样本重建的方差的信息丰富性,并提供了一个实验性的模型,将这种行为与杂点噪声关联起来。代码和数据在接受后可以在:(upon acceptance)获取。
Enhanced Synthetic MRI Generation from CT Scans Using CycleGAN with Feature Extraction
paper_authors: Saba Nikbakhsh, Lachin Naghashyar, Morteza Valizadeh, Mehdi Chehel Amirani for: This paper aims to address the challenges of multimodal alignment in radiotherapy planning by introducing an approach for enhanced monomodal registration using synthetic MRI images. methods: The proposed method uses unpaired data and combines CycleGANs and feature extractors to produce synthetic MRI images from CT scans. results: The approach outperforms several state-of-the-art methods and shows promising results, validated by multiple comparison metrics.Abstract
In the field of radiotherapy, accurate imaging and image registration are of utmost importance for precise treatment planning. Magnetic Resonance Imaging (MRI) offers detailed imaging without being invasive and excels in soft-tissue contrast, making it a preferred modality for radiotherapy planning. However, the high cost of MRI, longer acquisition time, and certain health considerations for patients pose challenges. Conversely, Computed Tomography (CT) scans offer a quicker and less expensive imaging solution. To bridge these modalities and address multimodal alignment challenges, we introduce an approach for enhanced monomodal registration using synthetic MRI images. Utilizing unpaired data, this paper proposes a novel method to produce these synthetic MRI images from CT scans, leveraging CycleGANs and feature extractors. By building upon the foundational work on Cycle-Consistent Adversarial Networks and incorporating advancements from related literature, our methodology shows promising results, outperforming several state-of-the-art methods. The efficacy of our approach is validated by multiple comparison metrics.
摘要
在放射治疗领域,准确的成像和图像 регистрация是至关重要的,以确定精准的治疗规划。核磁共振成像(MRI)可提供详细的成像,无需侵入性,并且在软组织冲击上表现出色,因此成为放射治疗规划的首选方法。然而,MRI的高价格、长期获取时间以及一些健康考虑使得患者面临挑战。相反,计算机断层成像(CT)扫描可以提供更快和便宜的成像解决方案。为了bridging这两种模式和解决多模态对对比挑战,本文介绍了一种增强单模态 регистраción的方法,使用人工生成的MRI图像。这种方法基于CycleGANs和特征提取器,并在相关文献中吸取了进一步改进。我们的方法在多个比较指标中表现出色,超过了许多现有的方法。
Brain-like Flexible Visual Inference by Harnessing Feedback-Feedforward Alignment
paper_authors: Tahereh Toosi, Elias B. Issa for: This paper aims to explore the mechanisms underlying how feedback connections in the visual cortex support flexible visual functions, and to propose a learning algorithm called Feedback-Feedforward Alignment (FFA) that can co-optimize classification and reconstruction tasks.methods: The proposed FFA algorithm leverages feedback and feedforward pathways as mutual credit assignment computational graphs, enabling alignment and co-optimization of objectives.results: The study demonstrates the effectiveness of FFA in co-optimizing classification and reconstruction tasks on widely used MNIST and CIFAR10 datasets, and shows that the alignment mechanism in FFA endows feedback connections with emergent visual inference functions such as denoising, resolving occlusions, hallucination, and imagination. Additionally, FFA is found to be more bio-plausible compared to traditional backpropagation (BP) methods.Abstract
In natural vision, feedback connections support versatile visual inference capabilities such as making sense of the occluded or noisy bottom-up sensory information or mediating pure top-down processes such as imagination. However, the mechanisms by which the feedback pathway learns to give rise to these capabilities flexibly are not clear. We propose that top-down effects emerge through alignment between feedforward and feedback pathways, each optimizing its own objectives. To achieve this co-optimization, we introduce Feedback-Feedforward Alignment (FFA), a learning algorithm that leverages feedback and feedforward pathways as mutual credit assignment computational graphs, enabling alignment. In our study, we demonstrate the effectiveness of FFA in co-optimizing classification and reconstruction tasks on widely used MNIST and CIFAR10 datasets. Notably, the alignment mechanism in FFA endows feedback connections with emergent visual inference functions, including denoising, resolving occlusions, hallucination, and imagination. Moreover, FFA offers bio-plausibility compared to traditional backpropagation (BP) methods in implementation. By repurposing the computational graph of credit assignment into a goal-driven feedback pathway, FFA alleviates weight transport problems encountered in BP, enhancing the bio-plausibility of the learning algorithm. Our study presents FFA as a promising proof-of-concept for the mechanisms underlying how feedback connections in the visual cortex support flexible visual functions. This work also contributes to the broader field of visual inference underlying perceptual phenomena and has implications for developing more biologically inspired learning algorithms.
摘要
自然视觉中,反馈连接支持多样化的视觉推理能力,如对尘埃或噪声的底部感知信息的理解或通过纯净的顶部下向过程来实现想象。然而,这些反馈路径学习如何灵活地产生这些能力的机制不清楚。我们提出,反馈路径通过与前向路径的对齐来产生上述能力。为实现这种对齐,我们提出了反馈-前向对齐(FFA)学习算法,该算法利用反馈和前向路径作为互相评估计算图,以实现对齐。在我们的研究中,我们证明了 FFA 在广泛使用的 MNIST 和 CIFAR10 数据集上可以协同优化分类和重建任务。特别是,FFA 中的对齐机制使得反馈连接获得了 emergent 视觉推理功能,包括降噪、解除尘埃、幻觉和想象。此外,FFA 提供了对传统 backpropagation(BP)方法更加生物可能性的实现,通过将计算图转换为目标驱动反馈路径,FFA 可以解决传统 BP 中的权重传输问题,从而提高生物可能性。我们的研究展示了 FFA 作为视觉推理机制的可能性,并对涉及到视觉推理的更广泛领域产生影响。
FLODCAST: Flow and Depth Forecasting via Multimodal Recurrent Architectures
results: Tested on the Cityscapes dataset, the model obtains the best results for both forecasting tasks and also brings benefits to the downstream task of segmentation forecasting.Abstract
Forecasting motion and spatial positions of objects is of fundamental importance, especially in safety-critical settings such as autonomous driving. In this work, we address the issue by forecasting two different modalities that carry complementary information, namely optical flow and depth. To this end we propose FLODCAST a flow and depth forecasting model that leverages a multitask recurrent architecture, trained to jointly forecast both modalities at once. We stress the importance of training using flows and depth maps together, demonstrating that both tasks improve when the model is informed of the other modality. We train the proposed model to also perform predictions for several timesteps in the future. This provides better supervision and leads to more precise predictions, retaining the capability of the model to yield outputs autoregressively for any future time horizon. We test our model on the challenging Cityscapes dataset, obtaining state of the art results for both flow and depth forecasting. Thanks to the high quality of the generated flows, we also report benefits on the downstream task of segmentation forecasting, injecting our predictions in a flow-based mask-warping framework.
摘要
预测物体的运动和空间位置是基本重要的,特别是在自动驾驶等安全关键的应用场景中。在这种工作中,我们解决这个问题,通过预测两种不同的模式,即光流和深度。为此,我们提议了FLODCAST模型,它利用多任务感知架构,同时预测两种模式。我们认为,在训练时,通过将流和深度图像一起使用,可以提高模型的性能。我们还训练了模型,以便在未来几个时间步预测。这提供了更好的监督,导致更精准的预测,并且模型仍然保留了对未来时间距离的autoregressive预测能力。我们在Cityscapes dataset上测试了我们的模型,并获得了流和深度预测中的状态对应记录。此外,由于生成的流量质量高,我们还报告了基于流动掩蔽框架的 segmentation 预测的改进。
Long-Tailed Learning as Multi-Objective Optimization
results: Comparisons with existing SOTA methods on commonly used long-tailed learning benchmarks demonstrate the superiority of our method.Abstract
Real-world data is extremely imbalanced and presents a long-tailed distribution, resulting in models that are biased towards classes with sufficient samples and perform poorly on rare classes. Recent methods propose to rebalance classes, but they face the seesaw dilemma (increasing performance on tail classes may decrease that of head classes, and vice versa). In this paper, we argue that the seesaw dilemma derives from the gradient imbalance between classes, in which the gradients of inappropriate classes are given too much weight during updates and are thus prone to overcompensation or undercompensation on tail classes. To achieve ideal compensation, we formulate long-tailed recognition as a multi-objective optimization problem that fairly respects the contributions of head and tail classes simultaneously. For efficiency, we propose a Gradient-Balancing Grouping (GBG) strategy to gather classes with similar gradient directions, thus approximately making every update follow a Pareto descent direction. Our GBG method drives classes with similar gradient directions to form more representative gradients and provides ideal compensation to the tail classes. Moreover, we conduct extensive experiments on commonly used benchmarks in long-tailed learning and demonstrate the superiority of our method over existing SOTA methods.
摘要
In this paper, we argue that the seesaw dilemma is caused by an imbalance in the gradients of the different classes. The gradients of the minority classes are often set to be more important for updating the model, which can lead to overcompensation or undercompensation of these classes. To address this problem, we formulate the long-tailed recognition as a multi-objective optimization problem, which aims to fairly respect the contributions of both the head and tail classes simultaneously.To efficiently solve this problem, we propose a Gradient-Balancing Grouping (GBG) strategy. This strategy gathers classes with similar gradient directions together, and approximately makes every update under a Pareto descent direction. This allows the classes with similar gradient directions to form more representative gradients, and provide ideal compensation to the tail classes.We conduct extensive experiments on commonly used benchmarks in long-tailed learning, and demonstrate the superiority of our method over existing state-of-the-art (SOTA) methods.
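A rough sketch of the grouping idea, gathering classes whose per-class gradients point in similar directions, is given below; the greedy cosine-similarity grouping and its threshold are illustrative assumptions, not the paper's exact Gradient-Balancing Grouping procedure.

```python
import torch


def group_by_gradient_direction(class_grads: torch.Tensor, cos_thresh: float = 0.9):
    """Greedily group classes with similar gradient directions.

    class_grads: (num_classes, dim) tensor, one flattened gradient per class.
    Returns a list of lists of class indices; an illustrative heuristic only.
    """
    grads = torch.nn.functional.normalize(class_grads, dim=1)
    unassigned = list(range(grads.size(0)))
    groups = []
    while unassigned:
        anchor = unassigned.pop(0)
        group = [anchor]
        for c in unassigned[:]:
            if torch.dot(grads[anchor], grads[c]) >= cos_thresh:
                group.append(c)
                unassigned.remove(c)
        groups.append(group)
    return groups
```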
results: Experimental results show that LAVSS outperforms existing audio-visual separation (AVS) methods and remains effective under different acoustic environments.Abstract
Existing machine learning research has achieved promising results in monaural audio-visual separation (MAVS). However, most MAVS methods purely consider what the sound source is, not where it is located. This can be a problem in VR/AR scenarios, where listeners need to be able to distinguish between similar audio sources located in different directions. To address this limitation, we have generalized MAVS to spatial audio separation and proposed LAVSS: a location-guided audio-visual spatial audio separator. LAVSS is inspired by the correlation between spatial audio and visual location. We introduce the phase difference carried by binaural audio as spatial cues, and we utilize positional representations of sounding objects as additional modality guidance. We also leverage multi-level cross-modal attention to perform visual-positional collaboration with audio features. In addition, we adopt a pre-trained monaural separator to transfer knowledge from rich mono sounds to boost spatial audio separation. This exploits the correlation between monaural and binaural channels. Experiments on the FAIR-Play dataset demonstrate the superiority of the proposed LAVSS over existing benchmarks of audio-visual separation. Our project page: https://yyx666660.github.io/LAVSS/.
摘要
现有的机器学习研究已经取得了鼓舞人的成绩在单声音频分离(MAVS)领域。然而,大多数MAVS方法仅考虑声音源的特征,而不考虑其位置。这可以是VR/AR场景中的问题,listeners需要能够分辨位于不同方向的类似声音源。为解决这些限制,我们扩展了MAVS,并提出了位置导向的音频视频空间分离器(LAVSS)。LAVSS受到听道和视觉位置之间的相关性启发,并利用声道相位差作为空间提示,以及使用声音发生对象的位置表示为额外模式指导。此外,我们还利用多级跨模态注意力来实现视觉位置协作。此外,我们还利用预训练的单声音分离器来传递知识,从富有的单声音中提高空间声音分离。这利用了单声音和双声音之间的相关性。FAIR-Play数据集的实验表明,我们提出的LAVSS已经超越了现有的音频视频分离标准。我们的项目页面:https://yyx666660.github.io/LAVSS/.
SignAvatars: A Large-scale 3D Sign Language Holistic Motion Dataset and Benchmark
for: bridging the communication gap for hearing-impaired individuals by providing a large-scale multi-prompt 3D sign language (SL) motion dataset.
methods: compiling and curating the SignAvatars dataset, which includes 70,000 videos from 153 signers, and introducing an automated annotation pipeline to yield 3D holistic annotations.
results: facilitating various tasks such as 3D sign language recognition (SLR) and the novel 3D SL production (SLP) from diverse inputs, and providing a unified benchmark for evaluating the potential of SignAvatars.Abstract
In this paper, we present SignAvatars, the first large-scale multi-prompt 3D sign language (SL) motion dataset designed to bridge the communication gap for hearing-impaired individuals. While there has been an exponentially growing number of research regarding digital communication, the majority of existing communication technologies primarily cater to spoken or written languages, instead of SL, the essential communication method for hearing-impaired communities. Existing SL datasets, dictionaries, and sign language production (SLP) methods are typically limited to 2D as the annotating 3D models and avatars for SL is usually an entirely manual and labor-intensive process conducted by SL experts, often resulting in unnatural avatars. In response to these challenges, we compile and curate the SignAvatars dataset, which comprises 70,000 videos from 153 signers, totaling 8.34 million frames, covering both isolated signs and continuous, co-articulated signs, with multiple prompts including HamNoSys, spoken language, and words. To yield 3D holistic annotations, including meshes and biomechanically-valid poses of body, hands, and face, as well as 2D and 3D keypoints, we introduce an automated annotation pipeline operating on our large corpus of SL videos. SignAvatars facilitates various tasks such as 3D sign language recognition (SLR) and the novel 3D SL production (SLP) from diverse inputs like text scripts, individual words, and HamNoSys notation. Hence, to evaluate the potential of SignAvatars, we further propose a unified benchmark of 3D SL holistic motion production. We believe that this work is a significant step forward towards bringing the digital world to the hearing-impaired communities. Our project page is at https://signavatars.github.io/
摘要
在这篇论文中,我们提出了SignAvatars,首个大规模多提示3D手语(SL)动作数据集,旨在bridging hearing-impaired individuals的communication gap。 existing digital communication technologies primarily focus on spoken or written languages,而忽视手语,是耳残社区的 essencial communication method。 existing SL datasets, dictionaries, and sign language production(SLP)方法通常是2D的,因为annotating 3D模型和avatar for SL是一个 entirely manual and labor-intensive process,Frequently resulting in unnatural avatars。 In response to these challenges, we compile and curate the SignAvatars dataset,包括70,000个视频,共计8.34 million frames,covering isolated signs and continuous, co-articulated signs,with multiple prompts including HamNoSys, spoken language, and words。 To yield 3D holistic annotations,包括Body, hands, and face的biomechanically-valid poses,as well as 2D and 3D keypoints,we introduce an automated annotation pipeline operating on our large corpus of SL videos。 SignAvatars facilitates various tasks such as 3D手语 recognition(SLR)and the novel 3D SL production(SLP)from diverse inputs like text scripts, individual words, and HamNoSys notation。 Therefore, we further propose a unified benchmark of 3D SL holistic motion production to evaluate the potential of SignAvatars。 We believe that this work is a significant step forward towards bringing the digital world to the hearing-impaired communities。More information can be found on our project page at .
Assessing and Enhancing Robustness of Deep Learning Models with Corruption Emulation in Digital Pathology
results: The study finds that using OmniCE-generated corruptions as training data improves the generalization ability of deep learning models, and that employing OmniCE corruptions as data augmentation during training and testing further enhances generalization.Abstract
Deep learning in digital pathology brings intelligence and automation as substantial enhancements to pathological analysis, the gold standard of clinical diagnosis. However, multiple steps from tissue preparation to slide imaging introduce various image corruptions, making it difficult for deep neural network (DNN) models to achieve stable diagnostic results for clinical use. In order to assess and further enhance the robustness of the models, we analyze the physical causes of the full-stack corruptions throughout the pathological life-cycle and propose an Omni-Corruption Emulation (OmniCE) method to reproduce 21 types of corruptions quantified with 5-level severity. We then construct three OmniCE-corrupted benchmark datasets at both patch level and slide level and assess the robustness of popular DNNs in classification and segmentation tasks. Further, we explore to use the OmniCE-corrupted datasets as augmentation data for training and experiments to verify that the generalization ability of the models has been significantly enhanced.
摘要
深度学习在数字 PATHOLOGY 中带来了智能和自动化作为重要提高的诊断标准。然而,从组织准备到滤镜成像的多个步骤引入了多种图像损害,使得深度神经网络(DNN)模型难以在临床应用中获得稳定的诊断结果。为了评估和进一步增强模型的可靠性,我们分析了全栈损害的物理原因并提出了全面损害模拟(OmniCE)方法,可以生成21种类型的损害,并将其分为5级严重程度。然后,我们构建了三个 OmniCE-损害的标准数据集,包括负荷级和整个报告级,并评估了流行的 DNN 在分类和 segmentation 任务中的可靠性。此外,我们还 explore 了使用 OmniCE-损害数据集作为训练和实验数据,以验证模型的通用能力得到了显著提高。
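The use of emulated corruptions as training-time augmentation can be sketched as below; the two corruption functions are hypothetical placeholders, since the abstract does not enumerate OmniCE's 21 corruption types, while the five severity levels follow the paper's description.

```python
import random


# Hypothetical corruption functions; OmniCE defines 21 types at 5 severity levels.
def gaussian_blur(img, severity):  # placeholder, returns the image unchanged
    return img


def jpeg_compression(img, severity):  # placeholder, returns the image unchanged
    return img


CORRUPTIONS = [gaussian_blur, jpeg_compression]


def corrupt_augment(img, p: float = 0.5):
    """With probability p, apply a randomly chosen corruption at a random severity (1-5)."""
    if random.random() < p:
        fn = random.choice(CORRUPTIONS)
        img = fn(img, severity=random.randint(1, 5))
    return img
```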
Thermal-Infrared Remote Target Detection System for Maritime Rescue based on Data Augmentation with 3D Synthetic Data
results: Experimental results show that the network trained with augmented data outperforms the one trained only on real thermal-infrared data, and the proposed segmentation model surpasses the performance of existing segmentation methods.Abstract
This paper proposes a thermal-infrared (TIR) remote target detection system for maritime rescue using deep learning and data augmentation. We established a self-collected TIR dataset consisting of multiple scenes imitating human rescue situations using a TIR camera (FLIR). Additionally, to address dataset scarcity and improve model robustness, a synthetic dataset from a 3D game (ARMA3) to augment the data is further collected. However, a significant domain gap exists between synthetic TIR and real TIR images. Hence, a proper domain adaptation algorithm is essential to overcome the gap. Therefore, we suggest a domain adaptation algorithm in a target-background separated manner from 3D game-to-real, based on a generative model, to address this issue. Furthermore, a segmentation network with fixed-weight kernels at the head is proposed to improve the signal-to-noise ratio (SNR) and provide weak attention, as remote TIR targets inherently suffer from unclear boundaries. Experiment results reveal that the network trained on augmented data consisting of translated synthetic and real TIR data outperforms that trained on only real TIR data by a large margin. Furthermore, the proposed segmentation model surpasses the performance of state-of-the-art segmentation methods.
摘要
results: The study shows that using a high-resolution reference image improves the quality of DT-CMR images, and that the model can super-resolve DWIs of unseen b-values.Abstract
Diffusion Tensor Cardiac Magnetic Resonance (DT-CMR) is the only in vivo method to non-invasively examine the microstructure of the human heart. Current research in DT-CMR aims to improve the understanding of how the cardiac microstructure relates to the macroscopic function of the healthy heart as well as how microstructural dysfunction contributes to disease. To get the final DT-CMR metrics, we need to acquire diffusion weighted images of at least 6 directions. However, due to DWI's low signal-to-noise ratio, the standard voxel size is quite big on the scale for microstructures. In this study, we explored the potential of deep-learning-based methods in improving the image quality volumetrically (x4 in all dimensions). This study proposed a novel framework to enable volumetric super-resolution, with an additional model input of high-resolution b0 DWI. We demonstrated that the additional input could offer higher super-resolved image quality. Going beyond, the model is also able to super-resolve DWIs of unseen b-values, proving the model framework's generalizability for cardiac DWI superresolution. In conclusion, we would then recommend giving the model a high-resolution reference image as an additional input to the low-resolution image for training and inference to guide all super-resolution frameworks for parametric imaging where a reference image is available.
摘要
Diffusion Tensor Cardiac Magnetic Resonance (DT-CMR) 是人体心脏内部非侵入性地检查微结构的唯一方法。当前研究的目标是提高健康心脏微结构与macroscopic功能之间的关系认识,以及诊断疾病的微结构功能不良的贡献。为获得最终 DT-CMR 指标,我们需要获得至少6个方向的扩散束图像。然而,由于 DWI 的信号噪声比较低,标准 voxel 大小很大,不能够准确捕捉微结构。在这项研究中,我们探索了深度学习基于方法在提高图像质量方面的潜在作用。我们提出了一种新的框架,以便在所有维度上进行超分辨感图像。我们示出了在高分辨率 refer 图像作为额外输入时,模型可以提供更高的超分辨图像质量。此外,模型还可以超分辨不同的 b-值 DWI,证明了模型框架的普适性。因此,我们建议在训练和推理过程中给模型提供高分辨率 refer 图像作为额外输入,以便导引所有超分辨感图像框架。
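One simple way to give a super-resolution network the high-resolution b0 reference as an additional input, as the abstract recommends, is channel concatenation with the upsampled low-resolution DWI; the tiny 3D convolutional body below is a placeholder and the concatenation strategy is an assumption, not the paper's architecture.

```python
import torch
import torch.nn as nn


class ReferenceGuidedSR(nn.Module):
    """Super-resolve a low-resolution DWI volume conditioned on a high-resolution b0 image.

    Sketch only: the low-res input is trilinearly upsampled and concatenated
    with the b0 reference along the channel axis; `body` stands in for any
    volumetric super-resolution backbone.
    """

    def __init__(self, scale: int = 4):
        super().__init__()
        self.scale = scale
        self.body = nn.Sequential(
            nn.Conv3d(2, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(32, 1, kernel_size=3, padding=1),
        )

    def forward(self, lowres_dwi: torch.Tensor, highres_b0: torch.Tensor) -> torch.Tensor:
        # lowres_dwi: (B, 1, d, h, w); highres_b0: (B, 1, D, H, W) at the target resolution
        up = nn.functional.interpolate(lowres_dwi, scale_factor=self.scale,
                                       mode="trilinear", align_corners=False)
        return self.body(torch.cat([up, highres_b0], dim=1))
```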
A Low-cost Strategic Monitoring Approach for Scalable and Interpretable Error Detection in Deep Neural Networks
results: The method accurately detects silent data corruption, achieving up to ~96% precision and ~98% recall with minimal computational overhead (as little as 0.3% of non-supervised inference time).Abstract
We present a highly compact run-time monitoring approach for deep computer vision networks that extracts selected knowledge from only a few (down to merely two) hidden layers, yet can efficiently detect silent data corruption originating from both hardware memory and input faults. Building on the insight that critical faults typically manifest as peak or bulk shifts in the activation distribution of the affected network layers, we use strategically placed quantile markers to make accurate estimates about the anomaly of the current inference as a whole. Importantly, the detector component itself is kept algorithmically transparent to render the categorization of regular and abnormal behavior interpretable to a human. Our technique achieves up to ~96% precision and ~98% recall of detection. Compared to state-of-the-art anomaly detection techniques, this approach requires minimal compute overhead (as little as 0.3% with respect to non-supervised inference time) and contributes to the explainability of the model.
摘要
我们提出了一种高度压缩的执行时间监控方法,用于深度电脑视觉网络中检测静默数据损坏。我们从只有几个(甚至只有两个)隐藏层中提取选择的知识,但可以高效地检测硬件内存和输入错误所引起的数据损坏。我们建基于网络层的活动分布中的峰值或块状迁移的假设,使用策略性地置标示来做高精度的检测。我们的方法可以实现约96%的准确率和约98%的检测 recall。与现有的异常检测技术相比,我们的方法需要非常小的计算负载(只有0.3% 的非超级学习时间),并且增加了模型的解释性。
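The quantile-marker idea behind this monitor, comparing the activation quantiles of the current inference against reference quantiles collected on fault-free data, can be sketched as follows; the quantile set and the z-score threshold are illustrative choices rather than the paper's calibrated values.

```python
import numpy as np


class QuantileMonitor:
    """Flag inferences whose activation quantiles drift from a clean reference.

    Minimal sketch of quantile-marker monitoring for one monitored layer.
    """

    def __init__(self, quantiles=(0.0, 0.25, 0.5, 0.75, 1.0), threshold=3.0):
        self.q = np.asarray(quantiles)
        self.threshold = threshold
        self.ref_mean = None
        self.ref_std = None

    def fit(self, clean_activations):
        # clean_activations: list of 1-D arrays (flattened layer outputs on fault-free data)
        marks = np.stack([np.quantile(a, self.q) for a in clean_activations])
        self.ref_mean, self.ref_std = marks.mean(0), marks.std(0) + 1e-8

    def is_anomalous(self, activation) -> bool:
        # Peak or bulk shifts in the activation distribution move the quantile markers.
        marks = np.quantile(activation, self.q)
        z = np.abs(marks - self.ref_mean) / self.ref_std
        return bool(z.max() > self.threshold)
```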
Class Incremental Learning with Pre-trained Vision-Language Models
results: Our experiments show that the simplest solution -- a single Linear Adapter layer with parameter retention -- produces the best results. On several conventional benchmarks, our method consistently maintains a significant margin of improvement over the current state of the art.Abstract
With the advent of large-scale pre-trained models, interest in adapting and exploiting them for continual learning scenarios has grown. In this paper, we propose an approach to exploiting pre-trained vision-language models (e.g. CLIP) that enables further adaptation instead of only using zero-shot learning of new tasks. We augment a pre-trained CLIP model with additional layers after the Image Encoder or before the Text Encoder. We investigate three different strategies: a Linear Adapter, a Self-attention Adapter, each operating on the image embedding, and Prompt Tuning which instead modifies prompts input to the CLIP text encoder. We also propose a method for parameter retention in the adapter layers that uses a measure of parameter importance to better maintain stability and plasticity during incremental learning. Our experiments demonstrate that the simplest solution -- a single Linear Adapter layer with parameter retention -- produces the best results. Experiments on several conventional benchmarks consistently show a significant margin of improvement over the current state-of-the-art.
摘要
《大规模预训练模型的应用于不断学习场景的兴趣》With the advent of large-scale pre-trained models, there has been growing interest in adapting and exploiting them for continual learning scenarios. In this paper, we propose an approach to leveraging pre-trained vision-language models (e.g. CLIP) that enables further adaptation instead of only using zero-shot learning of new tasks. We augment a pre-trained CLIP model with additional layers after the Image Encoder or before the Text Encoder. We investigate three different strategies: a Linear Adapter, a Self-attention Adapter, and Prompt Tuning, each operating on the image embedding. We also propose a method for parameter retention in the adapter layers that uses a measure of parameter importance to better maintain stability and plasticity during incremental learning. Our experiments demonstrate that the simplest solution - a single Linear Adapter layer with parameter retention - produces the best results. Experiments on several conventional benchmarks consistently show a significant margin of improvement over the current state-of-the-art.
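The winning configuration reported above, a single Linear Adapter layer on top of a frozen image embedding, can be sketched roughly as below; the generic `image_encoder` stands in for CLIP's image tower, and the residual mixing ratio and classifier head are assumptions rather than the paper's exact design (parameter retention is not shown).

```python
import torch
import torch.nn as nn


class LinearAdapterClassifier(nn.Module):
    """Frozen image encoder + a single linear adapter + classification head.

    A sketch of adapting a pre-trained vision-language encoder for class-
    incremental learning; only the adapter and head are trainable.
    """

    def __init__(self, image_encoder: nn.Module, embed_dim: int, num_classes: int,
                 alpha: float = 0.5):
        super().__init__()
        self.encoder = image_encoder.eval()
        for p in self.encoder.parameters():
            p.requires_grad_(False)
        self.adapter = nn.Linear(embed_dim, embed_dim)
        self.head = nn.Linear(embed_dim, num_classes)
        self.alpha = alpha  # illustrative residual mixing ratio

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            feats = self.encoder(images)
        adapted = self.alpha * self.adapter(feats) + (1 - self.alpha) * feats
        return self.head(adapted)
```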
Recaptured Raw Screen Image and Video Demoiréing via Channel and Spatial Modulations
results: Experiments show that the proposed method achieves state-of-the-art performance for both image and video demoiréing. In addition, the first well-aligned raw video demoiréing (RawVDemoiré) dataset is presented, together with an efficient temporal alignment method based on inserting alternating patterns. The code and dataset can be downloaded at https://github.com/tju-chengyijia/VD_raw.Abstract
Capturing screen contents by smartphone cameras has become a common way for information sharing. However, these images and videos are often degraded by moir\'e patterns, which are caused by frequency aliasing between the camera filter array and digital display grids. We observe that the moir\'e patterns in the raw domain are simpler than those in the sRGB domain, and the moir\'e patterns in raw color channels have different properties. Therefore, we propose an image and video demoir\'eing network tailored for raw inputs. We introduce a color-separated feature branch, and it is fused with the traditional feature-mixed branch via channel and spatial modulations. Specifically, the channel modulation utilizes modulated color-separated features to enhance the color-mixed features. The spatial modulation utilizes the feature with large receptive field to modulate the feature with small receptive field. In addition, we build the first well-aligned raw video demoir\'eing (RawVDemoir\'e) dataset and propose an efficient temporal alignment method by inserting alternating patterns. Experiments demonstrate that our method achieves state-of-the-art performance for both image and video demoir\'eing. We have released the code and dataset at https://github.com/tju-chengyijia/VD_raw.
摘要
GACE: Geometry Aware Confidence Enhancement for Black-Box 3D Object Detectors on LiDAR-Data
paper_authors: David Schinagl, Georg Krispel, Christian Fruhwirth-Reisinger, Horst Possegger, Horst Bischof
for: Improving the confidence estimation of black-box 3D object detectors
methods: Aggregating geometric cues of detections and their spatial relationships to improve confidence estimation
results: Achieves consistent performance gains across a variety of state-of-the-art detectors, especially for vulnerable road user classes such as pedestrians and cyclists.Abstract
Widely-used LiDAR-based 3D object detectors often neglect fundamental geometric information readily available from the object proposals in their confidence estimation. This is mostly due to architectural design choices, which were often adopted from the 2D image domain, where geometric context is rarely available. In 3D, however, considering the object properties and its surroundings in a holistic way is important to distinguish between true and false positive detections, e.g. occluded pedestrians in a group. To address this, we present GACE, an intuitive and highly efficient method to improve the confidence estimation of a given black-box 3D object detector. We aggregate geometric cues of detections and their spatial relationships, which enables us to properly assess their plausibility and consequently, improve the confidence estimation. This leads to consistent performance gains over a variety of state-of-the-art detectors. Across all evaluated detectors, GACE proves to be especially beneficial for the vulnerable road user classes, i.e. pedestrians and cyclists.
摘要
广泛使用 LiDAR 技术的 3D 物体检测器经常忽视可用的基本几何信息,主要是由于架构设计的选择,通常从 2D 图像领域中采用的设计方式。在 3D 中,考虑物体属性和周围环境的整体方式非常重要,以确定true和false检测的分辨率。例如,在群体中 occluded 的行人。为解决这个问题,我们提出了 GACE,一种简单、高效的方法,可以改进给定黑盒 3D 物体检测器的信任度估计。我们将检测结果的几何特征和空间关系聚合起来,以评估其可能性,从而提高信任度估计。这会导致多种状态体检测器的表现提高。对于车道上的易受伤用户类型,例如行人和自行车手,GACE 特别有利。
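An illustrative way to refine a black-box detector's confidence from simple geometric cues of each detection is a small MLP over hand-picked per-detection features; both the feature set and the network below are assumptions for illustration and do not reproduce GACE's actual architecture.

```python
import torch
import torch.nn as nn


class GeometricConfidenceRefiner(nn.Module):
    """Predict a refined confidence from per-detection geometric cues.

    Illustrative per-detection features: original score, box dimensions
    (l, w, h), distance to the sensor, number of neighboring detections.
    """

    def __init__(self, in_dim: int = 6, hidden: int = 32):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, geo_feats: torch.Tensor) -> torch.Tensor:
        # geo_feats: (num_detections, in_dim) -> refined confidence in [0, 1]
        return self.mlp(geo_feats).squeeze(-1)
```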
HWD: A Novel Evaluation Score for Styled Handwritten Text Generation
For: This paper aims to propose an evaluation score suited to assessing styled handwritten text generation (Styled HTG) models, in order to foster the development of this research area.* Methods: It uses a deep network specifically trained to extract handwriting style features, mapping handwritten text images to fixed-length representations that are compared to assess the quality of generated handwriting.* Results: Experiments show that the proposed Handwriting Distance (HWD) is a useful score for evaluating HTG models; evaluations on different word-level and line-level datasets of handwritten text further demonstrate its reliability and reproducibility.Abstract
Styled Handwritten Text Generation (Styled HTG) is an important task in document analysis, aiming to generate text images with the handwriting of given reference images. In recent years, there has been significant progress in the development of deep learning models for tackling this task. Being able to measure the performance of HTG models via a meaningful and representative criterion is key for fostering the development of this research topic. However, despite the current adoption of scores for natural image generation evaluation, assessing the quality of generated handwriting remains challenging. In light of this, we devise the Handwriting Distance (HWD), tailored for HTG evaluation. In particular, it works in the feature space of a network specifically trained to extract handwriting style features from the variable-length input images and exploits a perceptual distance to compare the subtle geometric features of handwriting. Through extensive experimental evaluation on different word-level and line-level datasets of handwritten text images, we demonstrate the suitability of the proposed HWD as a score for Styled HTG. The pretrained model used as backbone will be released to ease the adoption of the score, aiming to provide a valuable tool for evaluating HTG models and thus contributing to advancing this important research area.
摘要
文本样式化生成(Styled HTG)是文档分析领域中一个重要任务,目标是生成基于给定参考图像的文本图像。在过去几年,深度学习模型在解决这个任务上做出了 significiant 的进步。能够对 HTG 模型使用可信度的表征是锻炼这个研究领域的发展的关键。然而,自然图像生成评价分数的当前采用不足以评价生成的手写文本质量。为此,我们提出了一种特定于 HTG 的评价指标——手写距离(HWD)。具体来说,它在特定的网络中提取手写样式特征,并利用几何特征进行比较。经过广泛的实验评价不同的单词和行级手写文本图像 dataset 上,我们证明了提议的 HWD 是一个适合 HTG 评价的分数。预训练的模型将被发布,以便推广 HWD,并提供一个valuable的工具来评价 HTG 模型,从而为这个重要的研究领域做出贡献。
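A much-simplified stand-in for a feature-space handwriting distance is to embed two sets of handwriting images with a style-feature network and compare their mean embeddings; the Euclidean comparison and the generic `style_encoder` are assumptions, whereas HWD itself relies on a specifically trained backbone and a perceptual distance.

```python
import torch


@torch.no_grad()
def feature_space_distance(style_encoder, images_a: torch.Tensor, images_b: torch.Tensor) -> float:
    """Euclidean distance between the mean style embeddings of two image sets.

    `style_encoder` is assumed to map a batch of handwriting images to
    per-image feature vectors; this is a simplified illustration, not HWD.
    """
    style_encoder.eval()
    mean_a = style_encoder(images_a).mean(dim=0)
    mean_b = style_encoder(images_b).mean(dim=0)
    return torch.linalg.norm(mean_a - mean_b).item()
```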
Bilateral Network with Residual U-blocks and Dual-Guided Attention for Real-time Semantic Segmentation
results: Extensive experiments on the Cityscapes and CamVid datasets demonstrate that our method improves semantic segmentation performance while being more time-efficient than commonly used multi-layer fusion.Abstract
In application scenarios that require semantic segmentation, such as automatic driving, the primary concern is real-time performance rather than extremely high segmentation accuracy. To achieve a good trade-off between speed and accuracy, two-branch architectures have been proposed in recent years. They treat spatial information and semantic information separately, which allows the model to be composed of two lightweight networks. However, the process of fusing features at two different scales becomes a performance bottleneck for many of today's two-branch models. In this research, we design a new fusion mechanism for the two-branch architecture that is guided by attention computation. To be precise, we use our proposed Dual-Guided Attention (DGA) module to replace some multi-scale transformations with the calculation of attention, which means we only use several attention layers of near-linear complexity to achieve performance comparable to frequently used multi-layer fusion. To ensure that our module is effective, we use Residual U-blocks (RSU) to build one of the two branches in our network, which aims to obtain better multi-scale features. Extensive experiments on the Cityscapes and CamVid datasets show the effectiveness of our method.
摘要
当某些应用场景需要使用semantic segmentation技术时,如自动驾驶,则首要考虑实时性而不是极高的分割精度。为实现好的时间性和准确性之间的平衡,采用了两个分支架构在过去几年。它将空间信息和 semantics信息分开处理,这使得模型可以由两个不重的网络组成。然而,将不同缩度的特征进行融合成为现在许多两分支模型的性能瓶颈。在这项研究中,我们设计了一种新的融合机制,即指导计算导向的注意力模块(DGA)。我们使用该模块取代一些多尺度变换,通过计算注意力来实现与多层融合性能相同的性能。为确保我们的模块可行,我们使用了RSU块(Residual U-block)构建一个分支网络,以获得更好的多尺度特征。我们对Cityscapes和CamVid dataset进行了广泛的实验,并证明了我们的方法的有效性。
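Attention-guided fusion of a detail branch and a semantic branch, the general pattern this paper refines, can be sketched with a simple channel-attention gate as below; this is not the Dual-Guided Attention module itself, whose exact design is not given in the abstract, and the assumption that both branches share spatial size and channel count is for illustration only.

```python
import torch
import torch.nn as nn


class AttentionFusion(nn.Module):
    """Fuse detail-branch and semantic-branch features with a channel-attention gate.

    Illustrative of attention-guided two-branch fusion; both inputs are
    assumed to have the same spatial size and channel count.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, detail: torch.Tensor, semantic: torch.Tensor) -> torch.Tensor:
        attn = self.gate(semantic)        # (B, C, 1, 1) channel weights from the semantic branch
        return detail * attn + semantic   # reweight detail features, then add semantics
```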
Annotator: A Generic Active Learning Baseline for LiDAR Semantic Segmentation
paper_authors: Binhui Xie, Shuang Li, Qingju Guo, Chi Harold Liu, Xinjing Cheng
for: This paper is written for the task of LiDAR semantic segmentation, specifically addressing the challenges of annotation laboriousness and cost-prohibitiveness in large-scale point cloud data.
methods: The paper proposes a novel active learning baseline called Annotator, which utilizes a voxel-centric online selection strategy to efficiently probe and annotate salient and exemplar voxel grids within each LiDAR scan. The method also employs a voxel confusion degree (VCD) to leverage local topology relations and structures of point clouds.
results: Annotator achieves exceptional performance across various LiDAR semantic segmentation benchmarks, including active learning, active source-free domain adaptation, and active domain adaptation. Specifically, it achieves 87.8% fully-supervised performance under AL, 88.5% under ASFDA, and 94.4% under ADA, with significantly fewer annotations required (e.g., just labeling five voxels per scan in the SynLiDAR-to-SemanticKITTI task).
Abstract
Active learning, a label-efficient paradigm, empowers models to interactively query an oracle for labeling new data. In the realm of LiDAR semantic segmentation, the challenges stem from the sheer volume of point clouds, rendering annotation labor-intensive and cost-prohibitive. This paper presents Annotator, a general and efficient active learning baseline, in which a voxel-centric online selection strategy is tailored to efficiently probe and annotate the salient and exemplar voxel grids within each LiDAR scan, even under distribution shift. Concretely, we first execute an in-depth analysis of several common selection strategies such as Random, Entropy, and Margin, and then develop the voxel confusion degree (VCD) to exploit the local topology relations and structures of point clouds. Annotator excels in diverse settings, with a particular focus on active learning (AL), active source-free domain adaptation (ASFDA), and active domain adaptation (ADA). It consistently delivers exceptional performance across LiDAR semantic segmentation benchmarks, spanning both simulation-to-real and real-to-real scenarios. Surprisingly, Annotator exhibits remarkable efficiency, requiring significantly fewer annotations, e.g., just labeling five voxels per scan in the SynLiDAR-to-SemanticKITTI task. This results in impressive performance, achieving 87.8% fully-supervised performance under AL, 88.5% under ASFDA, and 94.4% under ADA. We envision that Annotator will offer a simple, general, and efficient solution for label-efficient 3D applications. Project page: https://binhuixie.github.io/annotator-web
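The abstract does not give the exact formula for the voxel confusion degree, so the sketch below uses a plausible stand-in: the entropy of the predicted-label histogram of the points falling inside each voxel, with the top-scoring voxels of a scan selected for annotation. The neighborhood pooling that would exploit local topology is omitted.

```python
import numpy as np

def voxel_confusion_scores(point_labels, voxel_ids, num_classes):
    """Stand-in for the voxel confusion degree: entropy of the predicted-label
    histogram of the points inside each voxel (neighborhood pooling omitted)."""
    scores = {}
    for v in np.unique(voxel_ids):
        hist = np.bincount(point_labels[voxel_ids == v],
                           minlength=num_classes).astype(float)
        p = hist / hist.sum()
        p = p[p > 0]
        scores[int(v)] = float(-(p * np.log(p)).sum())
    return scores

def select_voxels_for_annotation(point_labels, voxel_ids, num_classes, budget=5):
    """Pick the `budget` most confusing voxels of a scan (five per scan above)."""
    scores = voxel_confusion_scores(point_labels, voxel_ids, num_classes)
    return sorted(scores, key=scores.get, reverse=True)[:budget]

# toy usage on one scan with random predictions
rng = np.random.default_rng(0)
labels = rng.integers(0, 19, size=10_000)
voxels = rng.integers(0, 400, size=10_000)
print(select_voxels_for_annotation(labels, voxels, num_classes=19))
```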
results: The results show that this advanced segmentation framework improves skin lesion segmentation accuracy and offers better interpretability, helping clinicians understand the model's decision process.
Abstract
Skin lesion segmentation plays a crucial role in the computer-aided diagnosis of melanoma. Deep learning models have shown promise in accurately segmenting skin lesions, but their widespread adoption in real-life clinical settings is hindered by their inherent black-box nature. In domains as critical as healthcare, interpretability is not merely a feature but a fundamental requirement for model adoption. This paper proposes IARS SegNet, an advanced segmentation framework built upon the SegNet baseline model. Our approach incorporates three critical components: skip connections, residual convolutions, and a global attention mechanism on top of the baseline SegNet architecture. These elements play a pivotal role in accentuating the significance of clinically relevant regions, particularly the contours of skin lesions. The inclusion of skip connections enhances the model's capacity to learn intricate contour details, while the use of residual convolutions allows for the construction of a deeper model while preserving essential image features. The global attention mechanism further contributes by extracting refined feature maps from each convolutional and deconvolutional block, thereby elevating the model's interpretability. This enhancement highlights critical regions, fosters better understanding, and leads to more accurate skin lesion segmentation for melanoma diagnosis.
Machine learning refinement of in situ images acquired by low electron dose LC-TEM
paper_authors: Hiroyasu Katsuno, Yuki Kimura, Tomoya Yamazaki, Ichigaku Takigawa
for: This paper applies machine learning (ML) to improve the quality of images acquired by liquid-cell transmission electron microscopy (LC-TEM).
methods: The approach uses a U-Net architecture with a ResNet encoder, trained on a purpose-built dataset of paired images.
results: The trained ML model converts unclear images into clear images in about 10 ms per image, and the refinement can be applied in real time through the Gatan DigitalMicrograph (DM) software.
Abstract
We study a machine learning (ML) technique for refining images acquired during in situ observation using liquid-cell transmission electron microscopy (LC-TEM). Our model is constructed using a U-Net architecture and a ResNet encoder. For training our ML model, we prepared an original image dataset that contained pairs of images of samples acquired with and without a solution present. The former images were used as noisy images and the latter images were used as corresponding ground truth images. The number of pairs of image sets was $1,204$ and the image sets included images acquired at several different magnifications and electron doses. The trained model converted a noisy image into a clear image. The time necessary for the conversion was on the order of 10ms, and we applied the model to in situ observations using the software Gatan DigitalMicrograph (DM). Even if a nanoparticle was not visible in a view window in the DM software because of the low electron dose, it was visible in a successive refined image generated by our ML model.
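A minimal training skeleton for such paired noisy-to-clear refinement is shown below. The tiny convolutional network, random tensors, and L1 loss are placeholders standing in for the paper's U-Net with ResNet encoder and its 1,204 image pairs.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in for the U-Net/ResNet refinement network described above.
class TinyDenoiser(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )
    def forward(self, x):
        return self.net(x)

# Placeholder tensors standing in for the noisy/ground-truth image pairs.
noisy = torch.rand(16, 1, 128, 128)
clean = torch.rand(16, 1, 128, 128)
loader = DataLoader(TensorDataset(noisy, clean), batch_size=4, shuffle=True)

model = TinyDenoiser()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.L1Loss()

for epoch in range(3):                       # a few epochs for illustration
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```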
From Denoising Training to Test-Time Adaptation: Enhancing Domain Generalization for Medical Image Segmentation
results: Experimental results show that, compared with the U-Net-based baseline and existing methods, DeY-Net achieves significant domain generalization improvements and state-of-the-art results on widely adopted liver segmentation benchmarks. Code is available at https://github.com/WenRuxue/DeTTA.
Abstract
In medical image segmentation, domain generalization poses a significant challenge due to domain shifts caused by variations in data acquisition devices and other factors. These shifts are particularly pronounced in the most common scenario, which involves only single-source domain data due to privacy concerns. To address this, we draw inspiration from the self-supervised learning paradigm that effectively discourages overfitting to the source domain. We propose the Denoising Y-Net (DeY-Net), a novel approach incorporating an auxiliary denoising decoder into the basic U-Net architecture. The auxiliary decoder aims to perform denoising training, augmenting the domain-invariant representation that facilitates domain generalization. Furthermore, this paradigm provides the potential to utilize unlabeled data. Building upon denoising training, we propose Denoising Test Time Adaptation (DeTTA) that further: (i) adapts the model to the target domain in a sample-wise manner, and (ii) adapts to the noise-corrupted input. Extensive experiments conducted on widely-adopted liver segmentation benchmarks demonstrate significant domain generalization improvements over our baseline and state-of-the-art results compared to other methods. Code is available at https://github.com/WenRuxue/DeTTA.
methods: A U-NET deep learning approach is used for image enhancement to correct low-dose artifacts.
results: Compared with full-dose CT images, the U-NET-enhanced quarter-dose CT images not only show a large visual improvement but are also diagnostically preferable.
Abstract
The application of ionizing radiation for diagnostic imaging is common around the globe. However, the imaging process itself remains a relatively hazardous operation. Therefore, it is preferable to use as low a dose of ionizing radiation as possible, particularly in computed tomography (CT) imaging systems, where multiple x-ray operations are performed for the reconstruction of slices of body tissues. A popular method for radiation dose reduction in CT imaging is known as the quarter-dose technique, which reduces the x-ray dose but can cause a loss of image sharpness. Since CT image reconstruction from directional x-rays is a nonlinear process, it is analytically difficult to correct the effect of dose reduction on image quality. Recent and popular deep-learning approaches provide an intriguing possibility of image enhancement for low-dose artifacts. Some recent works propose combinations of multiple deep-learning and classical methods for this purpose, which over-complicate the process. However, it is observed here that the straightforward utilization of the well-known U-NET provides very successful results for the correction of low-dose artifacts. Blind tests with actual radiologists reveal that the U-NET enhanced quarter-dose CT images not only provide an immense visual improvement over the low-dose versions, but also become diagnostically preferable images, even when compared to their full-dose CT versions.
Pose-to-Motion: Cross-Domain Motion Retargeting with Pose Prior
for: The goal of this research is to create realistic motions for various characters in computer graphics.
methods: We use pose data as an alternative data source and introduce a neural motion synthesis approach through retargeting.
results: Our method can effectively combine the motion features of the source character with the pose features of the target character, even with small or noisy pose data sets. A user study showed that most participants found our retargeted motion to be more enjoyable to watch, more lifelike, and with fewer artifacts.
Abstract
Creating believable motions for various characters has long been a goal in computer graphics. Current learning-based motion synthesis methods depend on extensive motion datasets, which are often challenging, if not impossible, to obtain. On the other hand, pose data is more accessible, since static posed characters are easier to create and can even be extracted from images using recent advancements in computer vision. In this paper, we utilize this alternative data source and introduce a neural motion synthesis approach through retargeting. Our method generates plausible motions for characters that have only pose data by transferring motion from an existing motion capture dataset of another character, which can have drastically different skeletons. Our experiments show that our method effectively combines the motion features of the source character with the pose features of the target character, and performs robustly with small or noisy pose data sets, ranging from a few artist-created poses to noisy poses estimated directly from images. Additionally, a conducted user study indicated that a majority of participants found our retargeted motion to be more enjoyable to watch, more lifelike in appearance, and exhibiting fewer artifacts. Project page: https://cyanzhao42.github.io/pose2motion
Contrast-agent-induced deterministic component of CT-density in the abdominal aorta during routine angiography: proof of concept study
paper_authors: Maria R. Kodenko, Yuriy A. Vasilev, Nicholas S. Kulberg, Andrey V. Samorodov, Anton V. Vladzimirskyy, Olga V. Omelyanskaya, Roman V. Reshetnikov
results: The study analyzed 594 CTA images (4 studies with a median of 144 slices, IQR [134; 158.5], and a 1:1 normal:pathology balance) and showed that the model correctly simulates normal blood flow as well as the hemodynamic disturbances caused by local abnormalities (aneurysm, thrombus, and arterial branching).
Abstract
Background and objective: CTA is a gold standard of preoperative diagnosis of abdominal aorta and typically used for geometric-only characteristic extraction. We assume that a model describing the dynamic behavior of the contrast agent in the vessel can be developed from the data of routine CTA studies, allowing the procedure to be investigated and optimized without the need for additional perfusion CT studies. Obtained spatial distribution of CA can be valuable for both increasing the diagnostic value of a particular study and improving the CT data processing tools. Methods: In accordance with the Beer-Lambert law and the absence of chemical interaction between blood and CA, we postulated the existence of a deterministic CA-induced component in the CT signal density. The proposed model, having a double-sigmoid structure, contains six coefficients relevant to the properties of hemodynamics. To validate the model, expert segmentation was performed using the 3D Slicer application for the CTA data obtained from publicly available source. The model was fitted to the data using the non-linear least square method with Levenberg-Marquardt optimization. Results: We analyzed 594 CTA images (4 studies with median size of 144 slices, IQR [134; 158.5]; 1:1 normal:pathology balance). Goodness-of-fit was proved by Wilcox test (p-value > 0.05 for all cases). The proposed model correctly simulated normal blood flow and hemodynamics disturbances caused by local abnormalities (aneurysm, thrombus and arterial branching). Conclusions: Proposed approach can be useful for personalized CA modeling of vessels, improvement of CTA image processing and preparation of synthetic CT training data for artificial intelligence.
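The six-coefficient parameterisation is not reproduced here, so the snippet below assumes a generic double-sigmoid (a rising front minus a wash-out term) with six free parameters and fits it to synthetic data with SciPy's Levenberg-Marquardt solver, mirroring the non-linear least-squares procedure described above.

```python
import numpy as np
from scipy.optimize import curve_fit

def double_sigmoid(t, a1, b1, c1, a2, b2, c2):
    """Assumed six-coefficient double-sigmoid: a rising front minus a wash-out."""
    rise = a1 / (1.0 + np.exp(-b1 * (t - c1)))
    fall = a2 / (1.0 + np.exp(-b2 * (t - c2)))
    return rise - fall

# synthetic CT-density curve along the aorta, standing in for real CTA data
rng = np.random.default_rng(0)
t = np.linspace(0, 140, 140)
true_params = (300.0, 0.25, 30.0, 180.0, 0.15, 90.0)
density = double_sigmoid(t, *true_params) + rng.normal(0.0, 5.0, t.size)

p0 = (250.0, 0.2, 25.0, 150.0, 0.1, 80.0)      # rough initial guess
popt, _ = curve_fit(double_sigmoid, t, density, p0=p0, method="lm")
print("fitted coefficients:", np.round(popt, 3))
```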
HEDNet: A Hierarchical Encoder-Decoder Network for 3D Object Detection in Point Clouds
paper_authors: Gang Zhang, Junnan Chen, Guohuan Gao, Jianmin Li, Xiaolin Hu
for: 3D object detection in point clouds for autonomous driving systems
methods: hierarchical encoder-decoder network (HEDNet) with encoder-decoder blocks to capture long-range dependencies among features in the spatial space
results: superior detection accuracy on the Waymo Open and nuScenes datasets compared to previous state-of-the-art methods, with competitive efficiency
Abstract
3D object detection in point clouds is important for autonomous driving systems. A primary challenge in 3D object detection stems from the sparse distribution of points within the 3D scene. Existing high-performance methods typically employ 3D sparse convolutional neural networks with small kernels to extract features. To reduce computational costs, these methods resort to submanifold sparse convolutions, which prevent the information exchange among spatially disconnected features. Some recent approaches have attempted to address this problem by introducing large-kernel convolutions or self-attention mechanisms, but they either achieve limited accuracy improvements or incur excessive computational costs. We propose HEDNet, a hierarchical encoder-decoder network for 3D object detection, which leverages encoder-decoder blocks to capture long-range dependencies among features in the spatial space, particularly for large and distant objects. We conducted extensive experiments on the Waymo Open and nuScenes datasets. HEDNet achieved superior detection accuracy on both datasets than previous state-of-the-art methods with competitive efficiency. The code is available at https://github.com/zhanggang001/HEDNet.
UWFormer: Underwater Image Enhancement via a Semi-Supervised Multi-Scale Transformer
results: Experimental results show that the method excels at underwater image enhancement, outperforming current state-of-the-art methods in both quantitative metrics and visual quality.
Abstract
Underwater images often exhibit poor quality, imbalanced coloration, and low contrast due to the complex and intricate interaction of light, water, and objects. Despite the significant contributions of previous underwater enhancement techniques, there exist several problems that demand further improvement: (i) Current deep learning methodologies depend on Convolutional Neural Networks (CNNs) that lack multi-scale enhancement and also have limited global perception fields. (ii) The scarcity of paired real-world underwater datasets poses a considerable challenge, and the utilization of synthetic image pairs risks overfitting. To address the aforementioned issues, this paper presents a Multi-scale Transformer-based Network called UWFormer for enhancing images at multiple frequencies via semi-supervised learning, in which we propose a Nonlinear Frequency-aware Attention mechanism and a Multi-Scale Fusion Feed-forward Network for low-frequency enhancement. Additionally, we introduce a specialized underwater semi-supervised training strategy, proposing a Subaqueous Perceptual Loss function to generate reliable pseudo labels. Experiments using full-reference and non-reference underwater benchmarks demonstrate that our method outperforms state-of-the-art methods in terms of both quantity and visual quality.
ZoomNeXt: A Unified Collaborative Pyramid Network for Camouflaged Object Detection
results: The method provides a unified and efficient architecture for camouflaged object detection and outperforms existing state-of-the-art methods on image and video COD benchmarks.
Abstract
Recent camouflaged object detection (COD) attempts to segment objects visually blended into their surroundings, which is extremely complex and difficult in real-world scenarios. Apart from the high intrinsic similarity between camouflaged objects and their background, objects are usually diverse in scale, fuzzy in appearance, and even severely occluded. To this end, we propose an effective unified collaborative pyramid network which mimics human behavior when observing vague images and videos, i.e., zooming in and out. Specifically, our approach employs the zooming strategy to learn discriminative mixed-scale semantics by the multi-head scale integration and rich granularity perception units, which are designed to fully explore imperceptible clues between candidate objects and background surroundings. The former's intrinsic multi-head aggregation provides more diverse visual patterns. The latter's routing mechanism can effectively propagate inter-frame differences in spatiotemporal scenarios and adaptively ignore static representations. Together they provide a solid foundation for realizing a unified architecture for static and dynamic COD. Moreover, considering the uncertainty and ambiguity derived from indistinguishable textures, we construct a simple yet effective regularization, the uncertainty awareness loss, to encourage predictions with higher confidence in candidate regions. Our highly task-friendly framework consistently outperforms existing state-of-the-art methods in image and video COD benchmarks. The code will be available at https://github.com/lartpang/ZoomNeXt.
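The uncertainty awareness loss is only described qualitatively above, so the snippet below shows one plausible instantiation: penalising predicted probabilities that linger near 0.5 inside candidate regions, which pushes the network toward confident predictions there. The masking scheme and weighting are assumptions.

```python
import torch

def uncertainty_awareness_loss(logits, candidate_mask):
    """Plausible stand-in for an uncertainty-awareness regulariser: p * (1 - p)
    peaks at p = 0.5, so minimising it inside candidate regions encourages
    confident (near 0 or 1) foreground predictions."""
    p = torch.sigmoid(logits)
    uncertainty = p * (1.0 - p)                       # values in [0, 0.25]
    masked = uncertainty * candidate_mask
    return masked.sum() / candidate_mask.sum().clamp(min=1.0)

# toy usage with a hypothetical candidate-region mask
logits = torch.randn(2, 1, 64, 64)
mask = (torch.rand(2, 1, 64, 64) > 0.7).float()
print(uncertainty_awareness_loss(logits, mask).item())
```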
Visible to Thermal image Translation for improving visual task in low light conditions
for: overcome the challenge of low-light conditions in visual tasks such as pedestrian detection and image-to-image translation
methods: proposed an end-to-end framework that translates RGB images into thermal ones using a generative network and a detector network
results: feasible to translate RGB training data to thermal data using GAN, which can produce thermal data more quickly and affordably for security and surveillance applications
Abstract
Several visual tasks, such as pedestrian detection and image-to-image translation, are challenging to accomplish in low light using RGB images. Heat variation of objects in thermal images can be used to overcome this. In this work, an end-to-end framework, which consists of a generative network and a detector network, is proposed to translate RGB image into Thermal ones and compare generated thermal images with real data. We have collected images from two different locations using the Parrot Anafi Thermal drone. After that, we created a two-stream network, preprocessed, augmented, the image data, and trained the generator and discriminator models from scratch. The findings demonstrate that it is feasible to translate RGB training data to thermal data using GAN. As a result, thermal data can now be produced more quickly and affordably, which is useful for security and surveillance applications.
LFAA: Crafting Transferable Targeted Adversarial Examples with Low-Frequency Perturbations
paper_authors: Kunyu Wang, Juluan Shi, Wenxuan Wang
for: Crafting transferable targeted adversarial examples that expose the vulnerability of deep neural networks
methods: The method exploits the sensitivity of deep neural networks to perturbations of high-frequency image components and proposes the Low-Frequency Adversarial Attack (LFAA), which trains a conditional generator to produce targeted perturbations that are added to the low-frequency component of the image.
results: On the ImageNet dataset, the proposed method significantly outperforms current state-of-the-art methods, improving the targeted attack success rate by margins ranging from 3.2% to 15.5%.
Abstract
Deep neural networks are susceptible to adversarial attacks, which pose a significant threat to their security and reliability in real-world applications. The most notable adversarial attacks are transfer-based attacks, where an adversary crafts an adversarial example to fool one model, which can also fool other models. While previous research has made progress in improving the transferability of untargeted adversarial examples, the generation of targeted adversarial examples that can transfer between models remains a challenging task. In this work, we present a novel approach to generate transferable targeted adversarial examples by exploiting the vulnerability of deep neural networks to perturbations on high-frequency components of images. We observe that replacing the high-frequency component of an image with that of another image can mislead deep models, motivating us to craft perturbations containing high-frequency information to achieve targeted attacks. To this end, we propose a method called Low-Frequency Adversarial Attack (LFAA), which trains a conditional generator to generate targeted adversarial perturbations that are then added to the low-frequency component of the image. Extensive experiments on ImageNet demonstrate that our proposed approach significantly outperforms state-of-the-art methods, improving targeted attack success rates by a margin from 3.2% to 15.5%.
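The core observation -- that grafting another image's high-frequency component onto an image can mislead a model -- is easy to reproduce with a plain Fourier decomposition. The circular radius threshold below is an arbitrary illustrative choice, and the snippet covers only the frequency swap, not the conditional-generator training used by LFAA.

```python
import numpy as np

def split_frequencies(image, radius=16):
    """Split a grayscale image (H, W) into low- and high-frequency parts
    with a circular mask in the centered Fourier domain."""
    f = np.fft.fftshift(np.fft.fft2(image))
    h, w = image.shape
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    low = np.real(np.fft.ifft2(np.fft.ifftshift(f * mask)))
    high = np.real(np.fft.ifft2(np.fft.ifftshift(f * (~mask))))
    return low, high

def swap_high_frequency(source, donor, radius=16):
    """Keep the source's low-frequency content and graft on the donor's
    high-frequency component -- the kind of perturbation the attack exploits."""
    low_src, _ = split_frequencies(source, radius)
    _, high_don = split_frequencies(donor, radius)
    return low_src + high_don

rng = np.random.default_rng(0)
a, b = rng.random((224, 224)), rng.random((224, 224))
mixed = swap_high_frequency(a, b)
print(mixed.shape)   # (224, 224)
```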
Synthesizing Diabetic Foot Ulcer Images with Diffusion Model
paper_authors: Reza Basiri, Karim Manji, Francois Harton, Alisha Poonja, Milos R. Popovic, Shehroz S. Khan
for: This study aims to synthesize artificial DFU images with a diffusion model and to evaluate how realistic they are.
methods: A diffusion model was trained on 2,000 DFU images, and synthetic images were generated by applying the diffusion process.
results: The diffusion model successfully generated images that are visually hard to distinguish from real DFU images, but clinicians showed higher unanimous confidence when rating real images than synthetic ones, and the FID and KID metrics did not align well with the clinicians' assessments.
Abstract
Diabetic Foot Ulcer (DFU) is a serious skin wound requiring specialized care. However, real DFU datasets are limited, hindering clinical training and research activities. In recent years, generative adversarial networks and diffusion models have emerged as powerful tools for generating synthetic images with remarkable realism and diversity in many applications. This paper explores the potential of diffusion models for synthesizing DFU images and evaluates their authenticity through expert clinician assessments. Additionally, evaluation metrics such as Frechet Inception Distance (FID) and Kernel Inception Distance (KID) are examined to assess the quality of the synthetic DFU images. A dataset of 2,000 DFU images is used for training the diffusion model, and the synthetic images are generated by applying diffusion processes. The results indicate that the diffusion model successfully synthesizes visually indistinguishable DFU images. 70% of the time, clinicians marked synthetic DFU images as real DFUs. However, clinicians demonstrate higher unanimous confidence in rating real images than synthetic ones. The study also reveals that FID and KID metrics do not significantly align with clinicians' assessments, suggesting alternative evaluation approaches are needed. The findings highlight the potential of diffusion models for generating synthetic DFU images and their impact on medical training programs and research in wound detection and classification.
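For reference, the FID score mentioned above follows the standard Fréchet formula between Gaussian fits of two feature sets. The snippet below applies it to placeholder feature matrices, assuming features (e.g., Inception-v3 pool activations) have already been extracted.

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_inception_distance(feats_real, feats_fake):
    """Standard FID between two feature sets of shape (N, D):
    ||mu_r - mu_f||^2 + Tr(C_r + C_f - 2 (C_r C_f)^{1/2})."""
    mu_r, mu_f = feats_real.mean(0), feats_fake.mean(0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)
    cov_mean = sqrtm(cov_r @ cov_f)
    if np.iscomplexobj(cov_mean):            # drop numerical imaginary residue
        cov_mean = cov_mean.real
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r + cov_f - 2.0 * cov_mean))

rng = np.random.default_rng(0)
real = rng.normal(size=(500, 64))            # placeholder feature matrices
fake = rng.normal(loc=0.1, size=(500, 64))
print(round(frechet_inception_distance(real, fake), 3))
```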
Team I2R-VI-FF Technical Report on EPIC-KITCHENS VISOR Hand Object Segmentation Challenge 2023
paper_authors: Fen Fang, Yi Cheng, Ying Sun, Qianli Xu
for: This report addresses the hand and active object segmentation task of the EPIC-KITCHENS VISOR Hand Object Segmentation Challenge, which requires predicting hand and in-contact object segments from a single input frame.
methods: The approach combines the baseline Point-based Rendering (PointRend) method with the Segment Anything Model (SAM) to improve the accuracy of hand and active object segmentation while reducing missed detections.
results: The submission achieved 1st place on the evaluation criteria of the EPIC-KITCHENS VISOR HOS Challenge, showing that combining the strengths of existing methods with targeted refinements improves hand and active object segmentation.
Abstract
In this report, we present our approach to the EPIC-KITCHENS VISOR Hand Object Segmentation Challenge, which focuses on the estimation of the relation between the hands and the objects given a single frame as input. The EPIC-KITCHENS VISOR dataset provides pixel-wise annotations and serves as a benchmark for hand and active object segmentation in egocentric video. Our approach combines the baseline method, i.e., Point-based Rendering (PointRend) and the Segment Anything Model (SAM), aiming to enhance the accuracy of hand and object segmentation outcomes, while also minimizing instances of missed detection. We leverage accurate hand segmentation maps obtained from the baseline method to extract more precise hand and in-contact object segments. We utilize the class-agnostic segmentation provided by SAM and apply specific hand-crafted constraints to enhance the results. In cases where the baseline model misses the detection of hands or objects, we re-train an object detector on the training set to enhance the detection accuracy. The detected hand and in-contact object bounding boxes are then used as prompts to extract their respective segments from the output of SAM. By effectively combining the strengths of existing methods and applying our refinements, our submission achieved the 1st place in terms of evaluation criteria in the VISOR HOS Challenge.
Refined Equivalent Pinhole Model for Large-scale 3D Reconstruction from Spaceborne CCD Imagery
methods: We introduce a method that makes the Rational Functional Model (RFM) equivalent to the pinhole camera model (PCM), derive an error formula for this equivalent pinhole model that relates reconstruction accuracy to image size, and propose a polynomial image refinement model that minimizes the equivalent error via least squares.
results: Experiments on four image datasets (WHU-TLC, DFC2019, ISPRS-ZY3, and GF7) show that reconstruction accuracy is proportional to image size and that the polynomial image refinement model improves the accuracy and completeness of the reconstruction, especially for larger images.
Abstract
In this study, we present a large-scale earth surface reconstruction pipeline for linear-array charge-coupled device (CCD) satellite imagery. While mainstream satellite image-based reconstruction approaches perform exceptionally well, the rational functional model (RFM) is subject to several limitations. For example, the RFM has no rigorous physical interpretation and differs significantly from the pinhole imaging model; hence, it cannot be directly applied to learning-based 3D reconstruction networks and to more novel reconstruction pipelines in computer vision. Hence, in this study, we introduce a method in which the RFM is equivalent to the pinhole camera model (PCM), meaning that the internal and external parameters of the pinhole camera are used instead of the rational polynomial coefficient parameters. We then derive an error formula for this equivalent pinhole model for the first time, demonstrating the influence of the image size on the accuracy of the reconstruction. In addition, we propose a polynomial image refinement model that minimizes equivalent errors via the least squares method. The experiments were conducted using four image datasets: WHU-TLC, DFC2019, ISPRS-ZY3, and GF7. The results demonstrated that the reconstruction accuracy was proportional to the image size. Our polynomial image refinement model significantly enhanced the accuracy and completeness of the reconstruction, and achieved more significant improvements for larger-scale images.
Medical Image Denosing via Explainable AI Feature Preserving Loss
results: Extensive experiments on three publicly available medical image datasets, covering 13 different types of noise and artifacts, show that the method is superior in terms of denoising performance, model explainability, and generalization.
Abstract
Denoising algorithms play a crucial role in medical image processing and analysis. However, classical denoising algorithms often ignore the preservation of explanatory and clinically critical features, which may lead to misdiagnosis and legal liabilities. In this work, we propose a new denoising method for medical images that not only efficiently removes various types of noise, but also preserves key medical features throughout the process. To achieve this goal, we utilize a gradient-based eXplainable Artificial Intelligence (XAI) approach to design a feature preserving loss function. Our feature preserving loss function is motivated by the characteristic that gradient-based XAI is sensitive to noise. Through backpropagation, medical image features before and after denoising can be kept consistent. We conducted extensive experiments on three available medical image datasets, including 13 synthesized types of noise and artifacts. The experimental results demonstrate the superiority of our method in terms of denoising performance, model explainability, and generalization.
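A simplified sketch of a gradient-saliency-preserving loss is shown below: a pixel reconstruction term plus a term that keeps the saliency map of the denoised output close to that of the clean reference. The auxiliary classifier, the L1 consistency term, and the weighting are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def saliency(classifier, image, target, create_graph=False):
    """Gradient-based saliency |d score_target / d image| for a tensor that
    requires grad (either a leaf we enabled or the denoiser's output)."""
    scores = classifier(image)
    picked = scores[torch.arange(image.size(0)), target].sum()
    grad, = torch.autograd.grad(picked, image, create_graph=create_graph)
    return grad.abs()

def feature_preserving_loss(denoiser, classifier, noisy, clean, target, alpha=0.1):
    """Illustrative loss: pixel reconstruction plus a term that keeps the
    saliency of the denoised image close to the saliency of the clean image."""
    denoised = denoiser(noisy)
    recon = F.mse_loss(denoised, clean)

    # saliency of the denoised output; graph is kept so this term trains the denoiser
    sal_denoised = saliency(classifier, denoised, target, create_graph=True)

    # saliency of the clean reference, treated as a fixed target
    clean_ref = clean.detach().clone().requires_grad_(True)
    sal_clean = saliency(classifier, clean_ref, target).detach()

    return recon + alpha * F.l1_loss(sal_denoised, sal_clean)

# toy usage with stand-in networks
denoiser = torch.nn.Conv2d(1, 1, 3, padding=1)
classifier = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
x_noisy, x_clean = torch.rand(4, 1, 28, 28), torch.rand(4, 1, 28, 28)
labels = torch.randint(0, 10, (4,))
print(feature_preserving_loss(denoiser, classifier, x_noisy, x_clean, labels).item())
```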
$p$-Poisson surface reconstruction in curl-free flow from point clouds
paper_authors: Yesom Park, Taekyung Lee, Jooyoung Hahn, Myungjoo Kang
for: The goal of this paper is to reconstruct a smooth, shape-preserving surface from an unorganized point cloud sampled from a closed surface, without any information beyond the point cloud itself.
methods: The paper reconstructs the surface with an implicit neural representation (INR), using supervision from a partial differential equation (the p-Poisson equation) and fundamental properties of differential vector fields to robustly recover a high-quality surface.
results: Experiments show that the proposed INR provides a superior and robust reconstruction. Code is available at https://github.com/Yebbi/PINC.
Abstract
The aim of this paper is the reconstruction of a smooth surface from an unorganized point cloud sampled by a closed surface, with the preservation of geometric shapes, without any further information other than the point cloud. Implicit neural representations (INRs) have recently emerged as a promising approach to surface reconstruction. However, the reconstruction quality of existing methods relies on ground truth implicit function values or surface normal vectors. In this paper, we show that proper supervision of partial differential equations and fundamental properties of differential vector fields are sufficient to robustly reconstruct high-quality surfaces. We cast the $p$-Poisson equation to learn a signed distance function (SDF) and the reconstructed surface is implicitly represented by the zero-level set of the SDF. For efficient training, we develop a variable splitting structure by introducing a gradient of the SDF as an auxiliary variable and impose the $p$-Poisson equation directly on the auxiliary variable as a hard constraint. Based on the curl-free property of the gradient field, we impose a curl-free constraint on the auxiliary variable, which leads to a more faithful reconstruction. Experiments on standard benchmark datasets show that the proposed INR provides a superior and robust reconstruction. The code is available at \url{https://github.com/Yebbi/PINC}.
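In schematic form, the variable splitting described above can be written as follows; the source term f and the way the constraints are weighted are placeholders, since they are not stated in this summary.

```latex
% Schematic variable-splitting formulation (source term f and constraint
% weights are placeholders, not the paper's exact choices).
\begin{aligned}
  &\text{auxiliary variable:}                 && \mathbf{G} \approx \nabla u,\\
  &\text{hard } p\text{-Poisson constraint:}  && \nabla \cdot \big(\lVert \mathbf{G} \rVert^{\,p-2}\,\mathbf{G}\big) = f,\\
  &\text{curl-free constraint:}               && \nabla \times \mathbf{G} = \mathbf{0},\\
  &\text{reconstructed surface:}              && \mathcal{S} = \{\,x : u(x) = 0\,\},\quad u \text{ a signed distance function.}
\end{aligned}
```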
Beyond U: Making Diffusion Models Faster & Lighter
results: Experiments on denoising probabilistic diffusion models show that the approach operates with approximately a quarter of the parameters and 30% of the floating point operations (FLOPs) of standard U-Nets, is up to 70% faster in inference than baseline models under equal conditions, and converges to better-quality solutions.
Abstract
Diffusion models are a family of generative models that yield record-breaking performance in tasks such as image synthesis, video generation, and molecule design. Despite their capabilities, their efficiency, especially in the reverse denoising process, remains a challenge due to slow convergence rates and high computational costs. In this work, we introduce an approach that leverages continuous dynamical systems to design a novel denoising network for diffusion models that is more parameter-efficient, exhibits faster convergence, and demonstrates increased noise robustness. Experimenting with denoising probabilistic diffusion models, our framework operates with approximately a quarter of the parameters and 30% of the Floating Point Operations (FLOPs) compared to standard U-Nets in Denoising Diffusion Probabilistic Models (DDPMs). Furthermore, our model is up to 70% faster in inference than the baseline models when measured in equal conditions while converging to better quality solutions.
paper_authors: Kyle Brown, Dylan M. Asmar, Mac Schwager, Mykel J. Kochenderfer
For: This paper proposes an algorithmic stack for large-scale multi-robot assembly planning to address challenges such as collision-free movement, effective task allocation, and spatial planning for parallel assembly and transportation of nested subassemblies in manufacturing processes.
Methods: The proposed algorithmic stack includes an iterative radial layout optimization procedure, a graph-repair mixed-integer program formulation, a modified greedy task allocation algorithm, a geometric heuristic, and a hill-climbing algorithm to plan collaborative carrying configurations of robot sub-teams.
Results: The paper presents empirical results demonstrating the scalability and effectiveness of the proposed approach by generating plans to manufacture a LEGO model of a Saturn V launch vehicle with 1845 parts, 306 subassemblies, and 250 robots in under three minutes on a standard laptop computer.
Abstract
Mobile autonomous robots have the potential to revolutionize manufacturing processes. However, employing large robot fleets in manufacturing requires addressing challenges including collision-free movement in a shared workspace, effective multi-robot collaboration to manipulate and transport large payloads, complex task allocation due to coupled manufacturing processes, and spatial planning for parallel assembly and transportation of nested subassemblies. We propose a full algorithmic stack for large-scale multi-robot assembly planning that addresses these challenges and can synthesize construction plans for complex assemblies with thousands of parts in a matter of minutes. Our approach takes in a CAD-like product specification and automatically plans a full-stack assembly procedure for a group of robots to manufacture the product. We propose an algorithmic stack that comprises: (i) an iterative radial layout optimization procedure to define a global staging layout for the manufacturing facility, (ii) a graph-repair mixed-integer program formulation and a modified greedy task allocation algorithm to optimally allocate robots and robot sub-teams to assembly and transport tasks, (iii) a geometric heuristic and a hill-climbing algorithm to plan collaborative carrying configurations of robot sub-teams, and (iv) a distributed control policy that enables robots to execute the assembly motion plan collision-free. We also present an open-source multi-robot manufacturing simulator implemented in Julia as a resource to the research community, to test our algorithms and to facilitate multi-robot manufacturing research more broadly. Our empirical results demonstrate the scalability and effectiveness of our approach by generating plans to manufacture a LEGO model of a Saturn V launch vehicle with 1845 parts, 306 subassemblies, and 250 robots in under three minutes on a standard laptop computer.
XAI-CLASS: Explanation-Enhanced Text Classification with Extremely Weak Supervision
paper_authors: Daniel Hajialigol, Hanwen Liu, Xuan Wang
for: The paper proposes a new extremely weakly-supervised text classification method that reduces the need for human annotation.
methods: Pseudo-labels are assigned to documents based on their alignment with specific classes (e.g., keyword matching), and an auxiliary word saliency prediction task is trained jointly with text classification.
results: On several weakly-supervised text classification datasets, XAI-CLASS significantly outperforms other weakly-supervised text classification methods, and experiments show that it improves both model performance and explainability.
Abstract
Text classification aims to effectively categorize documents into pre-defined categories. Traditional methods for text classification often rely on large amounts of manually annotated training data, making the process time-consuming and labor-intensive. To address this issue, recent studies have focused on weakly-supervised and extremely weakly-supervised settings, which require minimal or no human annotation, respectively. In previous methods of weakly supervised text classification, pseudo-training data is generated by assigning pseudo-labels to documents based on their alignment (e.g., keyword matching) with specific classes. However, these methods ignore the importance of incorporating the explanations of the generated pseudo-labels, or saliency of individual words, as additional guidance during the text classification training process. To address this limitation, we propose XAI-CLASS, a novel explanation-enhanced extremely weakly-supervised text classification method that incorporates word saliency prediction as an auxiliary task. XAI-CLASS begins by employing a multi-round question-answering process to generate pseudo-training data that promotes the mutual enhancement of class labels and corresponding explanation word generation. This pseudo-training data is then used to train a multi-task framework that simultaneously learns both text classification and word saliency prediction. Extensive experiments on several weakly-supervised text classification datasets show that XAI-CLASS outperforms other weakly-supervised text classification methods significantly. Moreover, experiments demonstrate that XAI-CLASS enhances both model performance and explainability.
Robust Safety Classifier for Large Language Models: Adversarial Prompt Shield
results: Evaluations involving Large Language Models show the potential to reduce the success rate of adversarial attacks by up to 60%, paving the way for the next generation of more reliable and resilient conversational agents.
Abstract
Large Language Models' safety remains a critical concern due to their vulnerability to adversarial attacks, which can prompt these systems to produce harmful responses. In the heart of these systems lies a safety classifier, a computational model trained to discern and mitigate potentially harmful, offensive, or unethical outputs. However, contemporary safety classifiers, despite their potential, often fail when exposed to inputs infused with adversarial noise. In response, our study introduces the Adversarial Prompt Shield (APS), a lightweight model that excels in detection accuracy and demonstrates resilience against adversarial prompts. Additionally, we propose novel strategies for autonomously generating adversarial training datasets, named Bot Adversarial Noisy Dialogue (BAND) datasets. These datasets are designed to fortify the safety classifier's robustness, and we investigate the consequences of incorporating adversarial examples into the training process. Through evaluations involving Large Language Models, we demonstrate that our classifier has the potential to decrease the attack success rate resulting from adversarial attacks by up to 60%. This advancement paves the way for the next generation of more reliable and resilient conversational agents.
Beyond Denouncing Hate: Strategies for Countering Implied Biases and Stereotypes in Language
paper_authors: Jimin Mun, Emily Allaway, Akhila Yerukola, Laura Vianna, Sarah-Jane Leslie, Maarten Sap
for: This paper aims to address the issue of online hate speech without censorship by exploring psychologically inspired strategies to challenge the underlying stereotypical implications of hateful language.
methods: The authors draw from psychology and philosophy literature to craft six strategies to challenge hateful language, and they examine the convincingness of each strategy through a user study and compare their usages in human- and machine-generated counterspeech datasets.
results: The study finds that human-written counterspeech uses more specific strategies to challenge the implied stereotype, whereas machine-generated counterspeech uses less specific strategies and often employs strategies that humans deem less convincing. The findings highlight the importance of accounting for the underlying stereotypical implications of speech when generating counterspeech and for better machine reasoning about anti-stereotypical examples.
Abstract
Counterspeech, i.e., responses to counteract potential harms of hateful speech, has become an increasingly popular solution to address online hate speech without censorship. However, properly countering hateful language requires countering and dispelling the underlying inaccurate stereotypes implied by such language. In this work, we draw from psychology and philosophy literature to craft six psychologically inspired strategies to challenge the underlying stereotypical implications of hateful language. We first examine the convincingness of each of these strategies through a user study, and then compare their usages in both human- and machine-generated counterspeech datasets. Our results show that human-written counterspeech uses countering strategies that are more specific to the implied stereotype (e.g., counter examples to the stereotype, external factors about the stereotype's origins), whereas machine-generated counterspeech uses less specific strategies (e.g., generally denouncing the hatefulness of speech). Furthermore, machine-generated counterspeech often employs strategies that humans deem less convincing compared to human-produced counterspeech. Our findings point to the importance of accounting for the underlying stereotypical implications of speech when generating counterspeech and for better machine reasoning about anti-stereotypical examples.
Score Normalization for a Faster Diffusion Exponential Integrator Sampler
for: This paper targets fast sample generation from diffusion models, improving generation quality by reducing integration error at low numbers of function evaluations.
methods: It builds on the Diffusion Exponential Integrator Sampler (DEIS) and its score function reparameterisation, proposing to reparameterise the score at inference time by dividing it by the average absolute value of previous score estimates at that time step, collected from offline high-NFE generations.
results: The proposed score normalisation (DEIS-SN) consistently improves FID compared with vanilla DEIS, e.g., from 6.44 to 5.57 at 10 NFEs in the CIFAR-10 experiments.
Abstract
Recently, Zhang et al. have proposed the Diffusion Exponential Integrator Sampler (DEIS) for fast generation of samples from diffusion models. It leverages the semi-linear nature of the probability flow ordinary differential equation (ODE) in order to greatly reduce integration error and improve generation quality at low numbers of function evaluations (NFEs). Key to this approach is the score function reparameterisation, which reduces the integration error incurred from using a fixed score function estimate over each integration step. The original authors use the default parameterisation used by models trained for noise prediction -- multiply the score by the standard deviation of the conditional forward noising distribution. We find that although the mean absolute value of this score parameterisation is close to constant for a large portion of the reverse sampling process, it changes rapidly at the end of sampling. As a simple fix, we propose to instead reparameterise the score (at inference) by dividing it by the average absolute value of previous score estimates at that time step collected from offline high-NFE generations. We find that our score normalisation (DEIS-SN) consistently improves FID compared to vanilla DEIS, showing an FID improvement from 6.44 to 5.57 at 10 NFEs for our CIFAR-10 experiments. Our code is available at https://github.com/mtkresearch/Diffusion-DEIS-SN.
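The normalisation itself is only a few lines: record the average absolute score magnitude per time step from a handful of offline high-NFE runs, then divide the raw score estimate by that statistic at inference. Array shapes and the way the statistics are stored below are illustrative assumptions.

```python
import numpy as np

def collect_score_statistics(offline_scores):
    """offline_scores: list over offline high-NFE generations, each an array of
    shape (num_steps, ...) of raw score estimates. Returns, per time step, the
    average absolute score value across runs and dimensions."""
    stacked = np.stack([np.abs(s).reshape(s.shape[0], -1).mean(axis=1)
                        for s in offline_scores])          # (runs, num_steps)
    return stacked.mean(axis=0)                             # (num_steps,)

def normalized_score(raw_score, step, avg_abs_score, eps=1e-8):
    """DEIS-SN style reparameterisation: divide the score estimate at this
    step by the precomputed average absolute value at the same step."""
    return raw_score / (avg_abs_score[step] + eps)

# toy usage with random stand-ins for score trajectories
rng = np.random.default_rng(0)
offline = [rng.normal(size=(50, 3, 8, 8)) for _ in range(4)]
stats = collect_score_statistics(offline)
print(normalized_score(rng.normal(size=(3, 8, 8)), step=10, avg_abs_score=stats).shape)
```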
RIR-SF: Room Impulse Response Based Spatial Feature for Multi-channel Multi-talker ASR
results: Compared with the previously established state-of-the-art 3D spatial feature, both theoretical analysis and experimental results show that the new RIR-SF achieves a remarkable 21.3% relative reduction in Character Error Rate (CER) in multi-channel multi-talker ASR systems, and remains robust under strong reverberation.
Abstract
Multi-channel multi-talker automatic speech recognition (ASR) presents ongoing challenges within the speech community, particularly when confronted with significant reverberation effects. In this study, we introduce a novel approach involving the convolution of overlapping speech signals with the room impulse response (RIR) corresponding to the target speaker's transmission to a microphone array. This innovative technique yields a novel spatial feature known as the RIR-SF. Through a comprehensive comparison with the previously established state-of-the-art 3D spatial feature, both theoretical analysis and experimental results substantiate the superiority of our proposed RIR-SF. We demonstrate that the RIR-SF outperforms existing methods, leading to a remarkable 21.3% relative reduction in the Character Error Rate (CER) in multi-channel multi-talker ASR systems. Importantly, this novel feature exhibits robustness in the face of strong reverberation, surpassing the limitations of previous approaches.
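At its core, the feature starts from convolving the source signal with the room impulse response from the target speaker to each microphone. A minimal single-source, multi-microphone simulation with synthetic signals is sketched below; the subsequent RIR-SF computation and the ASR front-end are not shown.

```python
import numpy as np
from scipy.signal import fftconvolve

def simulate_array_capture(dry_speech, rirs):
    """Convolve a dry (anechoic) speech signal with the RIR from the target
    speaker to each microphone. dry_speech: (T,), rirs: (num_mics, L)."""
    return np.stack([fftconvolve(dry_speech, rir)[: len(dry_speech)]
                     for rir in rirs])                      # (num_mics, T)

rng = np.random.default_rng(0)
speech = rng.normal(size=16000)                 # 1 s of toy "speech" at 16 kHz
rirs = np.exp(-np.linspace(0, 8, 2000))[None, :] * rng.normal(size=(8, 2000))
mic_signals = simulate_array_capture(speech, rirs)
print(mic_signals.shape)                        # (8, 16000)
```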
摘要
多通道多个人自动语音识别(ASR)系统中存在持续的挑战,特别是在面临重要的干扰效应时。在本研究中,我们介绍了一种新的方法,即将重叠的语音信号 convolution 与目标说话人的传输到麦克风数组的房间冲击响应(RIR)。这种新的特征被称为 RIR-SF。经过了对已有状态的评估和实验结果,我们的提议的 RIR-SF 超越了现有的方法,导致了 Character Error Rate(CER)在多通道多个人 ASR 系统中具有remarkable 21.3% 的相对减少。重要的是,这种新的特征在强干扰情况下表现了 Robustness,超越了前一代方法的限制。
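The central operation behind the RIR-SF is convolving the source speech with the target speaker's room impulse responses to the microphone array; a rough numpy/scipy illustration is below. How the resulting channels are turned into the final spatial feature (normalisation, correlation across channels) is not reproduced here, and the toy RIRs are synthetic.

```python
import numpy as np
from scipy.signal import fftconvolve

def reverberate(source, rirs):
    """source: (T,) dry speech; rirs: (C, L) room impulse responses, one per microphone.
    Returns the (C, T + L - 1) multi-channel signal observed at the array."""
    return np.stack([fftconvolve(source, rir) for rir in rirs])

# toy example: 1 s of speech-like noise, 4-channel array, 0.2 s synthetic decaying RIRs
fs = 16000
source = np.random.randn(fs)
decay = np.exp(-np.linspace(0, 8, int(0.2 * fs)))
rirs = np.random.randn(4, int(0.2 * fs)) * decay
observed = reverberate(source, rirs)
print(observed.shape)  # (4, 19199)
```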
Two-Stage Classifier for Campaign Negativity Detection using Axis Embeddings: A Case Study on Tweets of Political Users during 2021 Presidential Election in Iran
for: The purpose of this study is to automate the detection of negative language in political campaigns, in order to better understand the strategies of candidates and parties.
methods: The study uses a hybrid model that combines two machine learning models to detect negative language in political tweets.
results: The study finds that the publication of a tweet by a candidate is not related to its negativity, but the presence of political persons and organizations in the tweet is directly related to its negativity.Abstract
In elections around the world, the candidates may turn their campaigns toward negativity due to the prospect of failure and time pressure. In the digital age, social media platforms such as Twitter are rich sources of political discourse. Therefore, despite the large amount of data that is published on Twitter, the automatic system for campaign negativity detection can play an essential role in understanding the strategy of candidates and parties in their campaigns. In this paper, we propose a hybrid model for detecting campaign negativity consisting of a two-stage classifier that combines the strengths of two machine learning models. Here, we have collected Persian tweets from 50 political users, including candidates and government officials. Then we annotated 5,100 of them that were published during the year before the 2021 presidential election in Iran. In the proposed model, first, the required datasets of two classifiers based on the cosine similarity of tweet embeddings with axis embeddings (which are the average of embedding in positive and negative classes of tweets) from the training set (85\%) are made, and then these datasets are considered the training set of the two classifiers in the hybrid model. Finally, our best model (RF-RF) was able to achieve 79\% for the macro F1 score and 82\% for the weighted F1 score. By running the best model on the rest of the tweets of 50 political users that were published one year before the election and with the help of statistical models, we find that the publication of a tweet by a candidate has nothing to do with the negativity of that tweet, and the presence of the names of political persons and political organizations in the tweet is directly related to its negativity.
摘要
在世界各地的选举中,候选人可能因为失败的风险和时间压力而转向负面竞选。在数字时代,社交媒体平台such as Twitter是政治讨论的丰富源泉。因此,尽管 Twitter 上发布了大量数据,但自动化竞选负面检测系统仍然可以在理解候选人和政党的竞选策略中扮演重要角色。在这篇论文中,我们提议一种混合模型来检测竞选负面,包括两个机器学习模型的两个阶段分类器。我们收集了50名政治用户的波斯语微博,包括候选人和政府官员。然后,我们对这些微博进行了5100个标注,这些微博在2021年伊朗总统选举前一年发布。在我们提议的模型中,首先制定了基于微博嵌入的cosine相似性的两个分类器的数据集(85%),然后这些数据集被用作两个分类器的混合模型的训练集。最后,我们的best模型(RF-RF)可以达到79%的macro F1分数和82%的Weighted F1分数。通过将best模型应用于其余50名政治用户发布的一年前的微博,我们发现,候选人发布的微博与负面微博之间没有直接关系,而政治人物和政治组织的名称直接关系到微博的负面性。
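A small sketch of the axis-embedding feature construction described above, assuming tweets have already been mapped to sentence embeddings. The paper's full two-stage RF-RF pipeline is more involved; the helper names and toy data below are placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def axis_embedding(embs, labels, cls):
    """Axis embedding: mean embedding of all training tweets in one class."""
    return embs[labels == cls].mean(axis=0)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def axis_features(embs, pos_axis, neg_axis):
    """Cosine similarity of each tweet embedding to the negative/non-negative axes."""
    return np.stack([[cosine(e, pos_axis), cosine(e, neg_axis)] for e in embs])

# X_train: (N, d) sentence embeddings, y_train: 1 = negative campaign tweet (placeholders)
X_train, y_train = np.random.randn(200, 768), np.random.randint(0, 2, 200)
pos_axis = axis_embedding(X_train, y_train, 1)
neg_axis = axis_embedding(X_train, y_train, 0)
clf = RandomForestClassifier(n_estimators=200)
clf.fit(axis_features(X_train, pos_axis, neg_axis), y_train)
```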
Q-Learning for Stochastic Control under General Information Structures and Non-Markovian Environments
results: 本研究的结果包括:(1) 提供了一个新的收敛定理,用于描述Q-学习迭代在某些随机环境下的收敛性;(2) 对某些随机控制问题的解释和应用,包括部分可观察Markov决策过程(MDP)的量化approximation、部分可观察POMDP的量化approximation、finite windowapproximation和多代模型的收敛性研究等。Abstract
As a primary contribution, we present a convergence theorem for stochastic iterations, and in particular, Q-learning iterates, under a general, possibly non-Markovian, stochastic environment. Our conditions for convergence involve an ergodicity and a positivity criterion. We provide a precise characterization on the limit of the iterates and conditions on the environment and initializations for convergence. As our second contribution, we discuss the implications and applications of this theorem to a variety of stochastic control problems with non-Markovian environments involving (i) quantized approximations of fully observed Markov Decision Processes (MDPs) with continuous spaces (where quantization break down the Markovian structure), (ii) quantized approximations of belief-MDP reduced partially observable MDPS (POMDPs) with weak Feller continuity and a mild version of filter stability (which requires the knowledge of the model by the controller), (iii) finite window approximations of POMDPs under a uniform controlled filter stability (which does not require the knowledge of the model), and (iv) for multi-agent models where convergence of learning dynamics to a new class of equilibria, subjective Q-learning equilibria, will be studied. In addition to the convergence theorem, some implications of the theorem above are new to the literature and others are interpreted as applications of the convergence theorem. Some open problems are noted.
摘要
Primary Contribution:我们主要贡献是提出了涉及泛化环境的随机迭代收敛定理,特别是Q学迭代的收敛定理。我们的收敛条件包括一个ergodicity和一个正semi定理。我们提供了迭代器的准确特征和环境和初始化条件下的收敛条件。Second Contribution:我们还讨论了这个定理在各种随机控制问题中的应用和意义,包括:(i)在完全观测的Markov决策过程(MDPs)中使用量化方法时的破坏性问题。(ii)在受限 partially observable Markov decision process(POMDPs)中使用弱Feller连续性和满足策略稳定性(需要控制器知道模型)。(iii)在POMDPs中使用 finite window approximation,不需要控制器知道模型。(iv)多个代理模型中的学习动力收敛到一种新的平衡点,称为主观Q学平衡点。除了收敛定理之外,我们还提出了一些新的意义和应用,以及一些未解决的问题。
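For readers who want the object of study spelled out, the snippet below is the standard stochastic Q-learning iterate whose convergence the theorem concerns, written for a finite (e.g. quantized) state-action space; the non-Markovian observation structure analysed in the paper is not modelled in this toy snippet.

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """One stochastic Q-learning iterate:
    Q(s,a) <- (1 - alpha) Q(s,a) + alpha (r + gamma * max_a' Q(s',a'))."""
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    return Q

Q = np.zeros((4, 2))                      # 4 (quantized) states, 2 actions
Q = q_update(Q, s=0, a=1, r=1.0, s_next=2)
```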
Bandit-Driven Batch Selection for Robust Learning under Label Noise
results: 在 CIFAR-10 dataset 上,我们的方法在不同水平的标签损害下 consistently 表现出色,超过现有方法。同时,我们不需要额外的计算开销,可以在复杂的机器学习应用中扩展。Abstract
We introduce a novel approach for batch selection in Stochastic Gradient Descent (SGD) training, leveraging combinatorial bandit algorithms. Our methodology focuses on optimizing the learning process in the presence of label noise, a prevalent issue in real-world datasets. Experimental evaluations on the CIFAR-10 dataset reveal that our approach consistently outperforms existing methods across various levels of label corruption. Importantly, we achieve this superior performance without incurring the computational overhead commonly associated with auxiliary neural network models. This work presents a balanced trade-off between computational efficiency and model efficacy, offering a scalable solution for complex machine learning applications.
摘要
我们提出了一种新的批处理选择方法,基于 combinatorial bandit 算法,用于 Stochastic Gradient Descent(SGD)训练。我们的方法重点是在实际数据集中存在标签噪声的情况下优化学习过程。我们在 CIFAR-10 数据集上进行了实验,结果显示,我们的方法在不同水平的标签损害情况下一直表现优于现有方法,而且不需要付出较高的计算开销。这种方法实现了计算效率和模型效果之间的平衡,为复杂的机器学习应用提供了扩展性的解决方案。
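A toy exponential-weights ("EXP3-flavoured") batch sampler in the spirit described above, which up-weights training examples whose recent losses are small -- a common heuristic under label noise. The paper's combinatorial bandit formulation and reward design are not specified in the abstract, so treat every detail here (reward = exp(-loss), the learning rate, omitting the importance weighting of full EXP3) as an assumption.

```python
import numpy as np

class BanditBatchSelector:
    """Exponential-weights sampler over training indices (toy sketch)."""
    def __init__(self, n_samples, eta=0.01):
        self.log_w = np.zeros(n_samples)
        self.eta = eta

    def probabilities(self):
        p = np.exp(self.log_w - self.log_w.max())
        return p / p.sum()

    def sample_batch(self, batch_size):
        return np.random.choice(len(self.log_w), size=batch_size,
                                replace=False, p=self.probabilities())

    def update(self, indices, losses):
        # reward in (0, 1]: small loss -> large reward; full EXP3 would also
        # importance-weight each reward by 1 / selection probability
        rewards = np.exp(-np.asarray(losses))
        self.log_w[indices] += self.eta * rewards

selector = BanditBatchSelector(n_samples=50000)
idx = selector.sample_batch(128)
selector.update(idx, losses=np.random.rand(128))  # losses from the SGD step
```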
Expressive Modeling Is Insufficient for Offline RL: A Tractable Inference Perspective
paper_authors: Xuejie Liu, Anji Liu, Guy Van den Broeck, Yitao Liang
for: This paper is written for offline Reinforcement Learning (RL) tasks, where the goal is to learn a policy from a set of pre-collected trajectories.
methods: The paper proposes a new approach called Trifle, which leverages modern Tractable Probabilistic Models (TPMs) to bridge the gap between good sequence models and high expected returns at evaluation time.
results: The paper achieves state-of-the-art scores on 9 Gym-MuJoCo benchmarks against strong baselines, and significantly outperforms prior approaches in stochastic environments and safe RL tasks (e.g. with action constraints) with minimal algorithmic modifications.Abstract
A popular paradigm for offline Reinforcement Learning (RL) tasks is to first fit the offline trajectories to a sequence model, and then prompt the model for actions that lead to high expected return. While a common consensus is that more expressive sequence models imply better performance, this paper highlights that tractability, the ability to exactly and efficiently answer various probabilistic queries, plays an equally important role. Specifically, due to the fundamental stochasticity from the offline data-collection policies and the environment dynamics, highly non-trivial conditional/constrained generation is required to elicit rewarding actions. While it is still possible to approximate such queries, we observe that such crude estimates significantly undermine the benefits brought by expressive sequence models. To overcome this problem, this paper proposes Trifle (Tractable Inference for Offline RL), which leverages modern Tractable Probabilistic Models (TPMs) to bridge the gap between good sequence models and high expected returns at evaluation time. Empirically, Trifle achieves the most state-of-the-art scores in 9 Gym-MuJoCo benchmarks against strong baselines. Further, owing to its tractability, Trifle significantly outperforms prior approaches in stochastic environments and safe RL tasks (e.g. with action constraints) with minimum algorithmic modifications.
摘要
一种受欢迎的探索学习(RL)任务的策略是先将偏离线轨迹适应到一个序列模型,然后使用模型来获取高预期返回的动作。虽然广泛认为更表达力强的序列模型会导致更好的表现,但这篇论文指出,可行性(可以快速和准确回答多种概率查询)也扮演着重要的角色。具体来说,由于在线上数据收集策略和环境动力学中的基本随机性,需要进行高度非线性的生成,以便获得奖励动作。虽然可以 aproximate这些查询,但我们发现这些粗略估计会很大程度下降表现。为解决这个问题,这篇论文提出了Trifle(可 tractable 探索学习),它利用现代可追踪概率模型(TPMs)来在评估时bridginggood sequence models和高预期返回之间的 gap。Empirically,Trifle在9个 Gym-MuJoCo benchmark中 achievestate-of-the-art 得分,并在随机环境和安全RL任务(例如Action constraints)中表现出优于先前的方法。此外,由于Trifle的可追踪性,它在可追踪环境和安全RL任务中具有更高的表现。
Safe multi-agent motion planning under uncertainty for drones using filtered reinforcement learning
results: 提出的方法可以实时实现多机器人的安全运动规划,并且可以在不确定的工作空间、机器人运动和感知中提供高度的安全性。Abstract
We consider the problem of safe multi-agent motion planning for drones in uncertain, cluttered workspaces. For this problem, we present a tractable motion planner that builds upon the strengths of reinforcement learning and constrained-control-based trajectory planning. First, we use single-agent reinforcement learning to learn motion plans from data that reach the target but may not be collision-free. Next, we use a convex optimization, chance constraints, and set-based methods for constrained control to ensure safety, despite the uncertainty in the workspace, agent motion, and sensing. The proposed approach can handle state and control constraints on the agents, and enforce collision avoidance among themselves and with static obstacles in the workspace with high probability. The proposed approach yields a safe, real-time implementable, multi-agent motion planner that is simpler to train than methods based solely on learning. Numerical simulations and experiments show the efficacy of the approach.
摘要
我团队考虑了多机器人在不确定、拥堵的工作空间中安全多机器人运动规划问题。我们提出了一种可控的运动规划方法,基于再增强学习和受限控制的轨迹规划。首先,我们使用单机器人再增强学习学习运动计划,从数据中学习到达目标点,但可能不是免涉碰撞的。然后,我们使用几何优化、机会约束和集合方法来保证安全性,即使在工作空间、机器人运动和感知中存在不确定性。我们的方法可以处理机器人状态和控制约束,并且在高概率下避免机器人之间和静止障碍物的碰撞。我们的方法比基于学习 alone 更加容易训练,并且实时可行。数值仿真和实验表明我们的方法的有效性。
The Generative AI Paradox: “What It Can Create, It May Not Understand”
results: 研究发现,虽然模型可以超越人类的生成能力,但它们在理解能力、理解和生成能力之间的相关性、以及对恶作剂输入的脆弱性等方面都弱于人类。这支持假设,即模型的生成能力可能不是基于理解能力的。Abstract
The recent wave of generative AI has sparked unprecedented global attention, with both excitement and concern over potentially superhuman levels of artificial intelligence: models now take only seconds to produce outputs that would challenge or exceed the capabilities even of expert humans. At the same time, models still show basic errors in understanding that would not be expected even in non-expert humans. This presents us with an apparent paradox: how do we reconcile seemingly superhuman capabilities with the persistence of errors that few humans would make? In this work, we posit that this tension reflects a divergence in the configuration of intelligence in today's generative models relative to intelligence in humans. Specifically, we propose and test the Generative AI Paradox hypothesis: generative models, having been trained directly to reproduce expert-like outputs, acquire generative capabilities that are not contingent upon -- and can therefore exceed -- their ability to understand those same types of outputs. This contrasts with humans, for whom basic understanding almost always precedes the ability to generate expert-level outputs. We test this hypothesis through controlled experiments analyzing generation vs. understanding in generative models, across both language and image modalities. Our results show that although models can outperform humans in generation, they consistently fall short of human capabilities in measures of understanding, as well as weaker correlation between generation and understanding performance, and more brittleness to adversarial inputs. Our findings support the hypothesis that models' generative capability may not be contingent upon understanding capability, and call for caution in interpreting artificial intelligence by analogy to human intelligence.
摘要
最近的生成式人工智能(AI)浪潮引发了前所未有的全球关注,同时也带来了对可能超越人类的人工智能水平的兴奋与担忧:模型现在只需几秒钟就可以生成出挑战甚至超越人类专家水平的输出。然而,模型仍然会犯一些连非专家人类都不会犯的基本理解错误。这给我们提出了一个明显的悖论:如何将看似超越人类的能力与极少有人会犯的错误的持续存在调和起来?在这篇文章中,我们提出并检验了生成式AI悖论假设:生成模型经过直接训练来重现专家水平的输出,所获得的生成能力并不依赖于——因此可以超越——它们对同类输出的理解能力。这与人类不同:人类的基本理解几乎总是先于生成专家水平输出的能力。我们通过受控实验,在语言和图像两种模态下分析生成模型的生成与理解能力。我们的结果表明,虽然模型在生成方面可以超过人类,但它们在理解方面一直不及人类,生成与理解性能之间的相关性也较弱,并且更容易受到对抗性输入的影响。我们的发现支持生成式AI悖论假设,并提醒我们在以人类智能类比解读人工智能时需保持谨慎。
Diversity and Diffusion: Observations on Synthetic Image Distributions with Stable Diffusion
results: 研究发现,即使使用真实的图像进行训练,仍然存在 Semantic Mismatches(SM)问题,导致分类器在推断时表现不佳。此外,研究还发现了四种限制TTI系统的使用:歧义、遵循提示、缺乏多样性和表示下面的基本概念。此外,研究还发现了CLIP embeddings的几何结构。Abstract
Recent progress in text-to-image (TTI) systems, such as StableDiffusion, Imagen, and DALL-E 2, has made it possible to create realistic images with simple text prompts. It is tempting to use these systems to eliminate the manual task of obtaining natural images for training a new machine learning classifier. However, in all of the experiments performed to date, classifiers trained solely with synthetic images perform poorly at inference, despite the images used for training appearing realistic. Examining this apparent incongruity in detail gives insight into the limitations of the underlying image generation processes. Through the lens of diversity in image creation vs. accuracy of what is created, we dissect the differences in semantic mismatches in what is modeled in synthetic vs. natural images. This will elucidate the roles of the image-language model, CLIP, and the image generation model, diffusion. We find four issues that limit the usefulness of TTI systems for this task: ambiguity, adherence to prompt, lack of diversity, and inability to represent the underlying concept. We further present surprising insights into the geometry of CLIP embeddings.
for: This paper focuses on improving the performance of weakly supervised whole slide image (WSI) classification using Multiple Instance Learning (MIL).
methods: The proposed method, called sparsely coded MIL (SC-MIL), uses sparse dictionary learning to capture the similarities of instances and improve the feature embeddings. The method also incorporates deep unrolling to make it compatible with deep learning.
results: The proposed SC module was shown to substantially boost the performance of state-of-the-art MIL methods in experiments on multiple datasets, with an acceptable computation cost. The codes are available at \href{https://github.com/sotiraslab/SCMIL.git}{https://github.com/sotiraslab/SCMIL.git}.Abstract
Multiple Instance Learning (MIL) has been widely used in weakly supervised whole slide image (WSI) classification. Typical MIL methods include a feature embedding part that embeds the instances into features via a pre-trained feature extractor and the MIL aggregator that combines instance embeddings into predictions. The current focus has been directed toward improving these parts by refining the feature embeddings through self-supervised pre-training and modeling the correlations between instances separately. In this paper, we proposed a sparsely coded MIL (SC-MIL) that addresses those two aspects at the same time by leveraging sparse dictionary learning. The sparse dictionary learning captures the similarities of instances by expressing them as a sparse linear combination of atoms in an over-complete dictionary. In addition, imposing sparsity help enhance the instance feature embeddings by suppressing irrelevant instances while retaining the most relevant ones. To make the conventional sparse coding algorithm compatible with deep learning, we unrolled it into an SC module by leveraging deep unrolling. The proposed SC module can be incorporated into any existing MIL framework in a plug-and-play manner with an acceptable computation cost. The experimental results on multiple datasets demonstrated that the proposed SC module could substantially boost the performance of state-of-the-art MIL methods. The codes are available at \href{https://github.com/sotiraslab/SCMIL.git}{https://github.com/sotiraslab/SCMIL.git}.
摘要
多示例学习(MIL)在弱监督全切片图像(WSI)分类中应用广泛。典型的MIL方法包括一个特征嵌入部分,该部分使用预训练的特征提取器将实例嵌入为特征,以及一个MIL聚合器,将实例嵌入组合成预测。目前的研究重点是分别改进这两个部分:通过自监督预训练提升特征嵌入,以及单独建模实例之间的相互关系。在这篇论文中,我们提出了一种稀疏编码MIL(SC-MIL),借助稀疏字典学习同时解决这两个问题。稀疏字典学习通过将实例表示为过完备字典中原子的稀疏线性组合来捕捉实例之间的相似性。此外,施加稀疏性约束可以抑制不相关的实例、保留最相关的实例,从而增强实例特征嵌入。为了使传统的稀疏编码算法与深度学习兼容,我们利用深度展开将其展开为SC模块。该SC模块可以以即插即用的方式嵌入任何现有的MIL框架,且计算开销可以接受。在多个数据集上的实验结果表明,所提出的SC模块能够显著提升最新MIL方法的性能。代码可以在\href{https://github.com/sotiraslab/SCMIL.git}{https://github.com/sotiraslab/SCMIL.git}中找到。
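The "deep unrolling" ingredient can be illustrated with a LISTA-style module that unrolls a few ISTA iterations of sparse coding into differentiable layers: instance features become sparse codes over a learned dictionary before MIL aggregation. This is a generic sketch, not the authors' SC module; the dictionary size, threshold handling, and number of iterations are placeholders.

```python
import torch
import torch.nn as nn

class UnrolledSparseCoding(nn.Module):
    """LISTA-style unrolled ISTA: z <- soft_threshold(W_e x + S z, theta)."""
    def __init__(self, feat_dim, n_atoms, n_iters=3):
        super().__init__()
        self.We = nn.Linear(feat_dim, n_atoms, bias=False)   # analysis / encoding weights
        self.S = nn.Linear(n_atoms, n_atoms, bias=False)     # mutual-inhibition matrix
        self.theta = nn.Parameter(torch.full((n_atoms,), 0.1))
        self.n_iters = n_iters

    @staticmethod
    def soft_threshold(u, theta):
        return torch.sign(u) * torch.relu(torch.abs(u) - torch.abs(theta))

    def forward(self, x):                                    # x: (n_instances, feat_dim)
        b = self.We(x)
        z = self.soft_threshold(b, self.theta)
        for _ in range(self.n_iters):
            z = self.soft_threshold(b + self.S(z), self.theta)
        return z                                             # sparse codes, (n_instances, n_atoms)

codes = UnrolledSparseCoding(feat_dim=512, n_atoms=128)(torch.randn(1000, 512))
```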
Grounding Visual Illusions in Language: Do Vision-Language Models Perceive Illusions Like Humans?
results: 发现虽然总体协调低,大型模型更接近人类的视觉和更易受到视觉假设的影响。 这些发现将促进我们对人类和机器之间视觉世界的共同理解和交流的更好的理解,并为未来的计算模型提供了一个进一步的探索。Abstract
Vision-Language Models (VLMs) are trained on vast amounts of data captured by humans emulating our understanding of the world. However, known as visual illusions, human's perception of reality isn't always faithful to the physical world. This raises a key question: do VLMs have the similar kind of illusions as humans do, or do they faithfully learn to represent reality? To investigate this question, we build a dataset containing five types of visual illusions and formulate four tasks to examine visual illusions in state-of-the-art VLMs. Our findings have shown that although the overall alignment is low, larger models are closer to human perception and more susceptible to visual illusions. Our dataset and initial findings will promote a better understanding of visual illusions in humans and machines and provide a stepping stone for future computational models that can better align humans and machines in perceiving and communicating about the shared visual world. The code and data are available at https://github.com/vl-illusion/dataset.
摘要
视力语言模型(VLM)在庞大数据量上训练,但人类视觉不一定准确反映物理世界。这引发了关键问题:VLM是否具有人类视觉中的同类型错觉,或者它们忠实地表现实际情况?为了解答这个问题,我们构建了包含五种视觉错觉的数据集,并提出四项任务来检查视觉错觉在当今最先进的VLM中。我们的发现表明,虽然总体对齐率低,大型模型更加接近人类视觉和更易受到视觉错觉的影响。我们的数据集和初步发现将促进人类和机器之间的视觉理解和沟通,并提供未来计算模型更好地与人类对视觉世界的共同感知的开始。数据集和代码可以在获取。
Limited Data, Unlimited Potential: A Study on ViTs Augmented by Masked Autoencoders
results: 研究发现,在限量数据下,同时优化ViT для主要任务和一个自我监督任务(SSAT)是非常有利的。这种方法可以让ViT获得更好的性能,而且可以降低培训时间和碳踪。实验结果显示,SSAT在10个数据集上具有优秀的数据准确性和一致性。此外,SSAT也在视频领域中实现了深伪检测的好效果,证明了其通用性。Abstract
Vision Transformers (ViTs) have become ubiquitous in computer vision. Despite their success, ViTs lack inductive biases, which can make it difficult to train them with limited data. To address this challenge, prior studies suggest training ViTs with self-supervised learning (SSL) and fine-tuning sequentially. However, we observe that jointly optimizing ViTs for the primary task and a Self-Supervised Auxiliary Task (SSAT) is surprisingly beneficial when the amount of training data is limited. We explore the appropriate SSL tasks that can be optimized alongside the primary task, the training schemes for these tasks, and the data scale at which they can be most effective. Our findings reveal that SSAT is a powerful technique that enables ViTs to leverage the unique characteristics of both the self-supervised and primary tasks, achieving better performance than typical ViTs pre-training with SSL and fine-tuning sequentially. Our experiments, conducted on 10 datasets, demonstrate that SSAT significantly improves ViT performance while reducing carbon footprint. We also confirm the effectiveness of SSAT in the video domain for deepfake detection, showcasing its generalizability. Our code is available at https://github.com/dominickrei/Limited-data-vits.
摘要
传统的计算机视觉 Task (ViT) 在计算机视觉领域中广泛应用。尽管它们有成功,但它们缺乏逻辑假设,这可能使其在有限数据量下训练变得困难。以前的研究表明,通过自我超vision学习 (SSL) 和顺序精度调整来解决这个挑战。然而,我们发现,在有限数据量下,同时优化 ViT для主要任务和 Self-Supervised Auxiliary Task (SSAT) 是让人意外地有利的。我们探讨适合 SSL 任务的选择、训练方案和数据规模,以便在有限数据量下实现最佳性能。我们的发现表明,SSAT 是一种强大的技术,它可以让 ViT 利用自身任务和 SSL 任务之间的独特特征,从而实现 better 性能,而不需要大量的训练数据。我们的实验,在 10 个数据集上进行,表明 SSAT 可以提高 ViT 性能,同时降低碳脚印。我们还证实了 SSAT 在视频领域中的深度假设检测 task 的效果,这表明它的普适性。我们的代码可以在 上获取。
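A toy illustration of the joint objective described above: one optimisation step that combines the supervised loss on clean inputs with a masked-reconstruction auxiliary loss computed through the same encoder. The real method uses a ViT with an MAE-style decoder; the MLP stand-ins, mask ratio, and loss weight below are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# toy stand-ins: shared encoder, classification head, masked-reconstruction head
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ReLU())
cls_head = nn.Linear(256, 10)
rec_head = nn.Linear(256, 3 * 32 * 32)

def ssat_step(images, labels, mask_ratio=0.5, lam=1.0):
    """One joint step: supervised loss on clean images + masked-reconstruction loss."""
    logits = cls_head(encoder(images))
    loss_cls = F.cross_entropy(logits, labels)

    mask = (torch.rand_like(images) < mask_ratio).float()
    masked = images * (1 - mask)
    recon = rec_head(encoder(masked)).view_as(images)
    loss_ssl = F.mse_loss(recon * mask, images * mask)   # reconstruct only masked pixels

    return loss_cls + lam * loss_ssl

loss = ssat_step(torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,)))
loss.backward()
```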
Vanishing Gradients in Reinforcement Finetuning of Language Models
results: 研究发现,在 RFT 中,当输入的奖励标准差小于模型的时候,即使预期奖励远离最佳,输入的预期梯度将消失。这会导致奖励最大化变得极其慢。通过实验和理论分析,研究发现这种消失的梯度问题是普遍存在的和有害的。然而,通过一些方法来缓解这种问题,例如在 RFT 阶段使用初始的监督训练 (SFT) 阶段,可以提高 RFT 的性能。Abstract
Pretrained language models are commonly aligned with human preferences and downstream tasks via reinforcement finetuning (RFT), which entails maximizing a (possibly learned) reward function using policy gradient algorithms. This work highlights a fundamental optimization obstacle in RFT: we prove that the expected gradient for an input vanishes when its reward standard deviation under the model is small, even if the expected reward is far from optimal. Through experiments on an RFT benchmark and controlled environments, as well as a theoretical analysis, we then demonstrate that vanishing gradients due to small reward standard deviation are prevalent and detrimental, leading to extremely slow reward maximization. Lastly, we explore ways to overcome vanishing gradients in RFT. We find the common practice of an initial supervised finetuning (SFT) phase to be the most promising candidate, which sheds light on its importance in an RFT pipeline. Moreover, we show that a relatively small number of SFT optimization steps on as few as 1% of the input samples can suffice, indicating that the initial SFT phase need not be expensive in terms of compute and data labeling efforts. Overall, our results emphasize that being mindful for inputs whose expected gradient vanishes, as measured by the reward standard deviation, is crucial for successful execution of RFT.
摘要
预训言语模型通常通过强化训练(RFT)与人类偏好和下游任务相对适配,其中包括通过政策梯度算法 maximize 一个(可能学习的)奖励函数。这项工作揭示了 RFT 中的一个基本优化障碍:我们证明了,对于一个输入,其奖励标准差 beneath 模型时,即使预期奖励远离最优,则预期梯度都将消失。通过实验 RFT benchmark 和控制环境,以及理论分析,我们证明了这种消失梯度是普遍存在的并有害,导致奖励最大化 extremely slow。最后,我们探讨了在 RFT 中超越消失梯度的方法。我们发现,通常的初始监督 fine-tuning (SFT)阶段是最有前途的方法,这也解释了它在 RFT 链接中的重要性。此外,我们发现一些 SFT 优化步骤在 1% 的输入样本上可以得到充分的效果,这表明了初始 SFT 阶段不必耗费大量计算和数据标注努力。总的来说,我们的结果强调了在 RFT 中注意 inputs 的预期梯度消失,以measure 奖励标准差是关键的。
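The core claim -- that the expected policy gradient for an input shrinks with the reward standard deviation under the model, even when the expected reward is far from optimal -- can be checked in closed form for a softmax policy, where grad_j = pi_j (r_j - E_pi[r]). The toy numbers below are illustrative only.

```python
import numpy as np

def expected_pg(theta, rewards):
    """Exact gradient of E_{a ~ softmax(theta)}[r(a)] w.r.t. the logits."""
    pi = np.exp(theta - theta.max())
    pi /= pi.sum()
    return pi * (rewards - pi @ rewards)

theta = np.zeros(5)                                    # uniform policy over 5 outputs
r_low_std = np.array([0.50, 0.51, 0.49, 0.50, 0.50])   # mean reward 0.5, tiny std
r_high_std = np.array([0.0, 1.0, 0.0, 0.0, 0.0])       # mean reward 0.2, large std
print(np.linalg.norm(expected_pg(theta, r_low_std)))   # ~2.8e-3: near-vanishing gradient
print(np.linalg.norm(expected_pg(theta, r_high_std)))  # ~0.18: informative gradient
```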
HAP: Structure-Aware Masked Image Modeling for Human-Centric Perception
results: 该方法提出的HAP方法使用普通的ViTEncoder,却在11个人Centric标准数据集上达到了新的州OF-the-art表现,并与一个数据集的表现持平。例如,HAP在MSMT17数据集上达到了78.1% mAP,在PA-100K数据集上达到了86.54% mA,在MS COCO数据集上达到了78.2% AP,以及在3DPW数据集上达到了56.0 PA-MPJPE。Abstract
Model pre-training is essential in human-centric perception. In this paper, we first introduce masked image modeling (MIM) as a pre-training approach for this task. Upon revisiting the MIM training strategy, we reveal that human structure priors offer significant potential. Motivated by this insight, we further incorporate an intuitive human structure prior - human parts - into pre-training. Specifically, we employ this prior to guide the mask sampling process. Image patches, corresponding to human part regions, have high priority to be masked out. This encourages the model to concentrate more on body structure information during pre-training, yielding substantial benefits across a range of human-centric perception tasks. To further capture human characteristics, we propose a structure-invariant alignment loss that enforces different masked views, guided by the human part prior, to be closely aligned for the same image. We term the entire method as HAP. HAP simply uses a plain ViT as the encoder yet establishes new state-of-the-art performance on 11 human-centric benchmarks, and on-par result on one dataset. For example, HAP achieves 78.1% mAP on MSMT17 for person re-identification, 86.54% mA on PA-100K for pedestrian attribute recognition, 78.2% AP on MS COCO for 2D pose estimation, and 56.0 PA-MPJPE on 3DPW for 3D pose and shape estimation.
摘要
模型预训练是人类视觉任务中的关键。在这篇论文中,我们首先介绍了隐藏图像模型(MIM)作为预训练方法。在检查MIM训练策略时,我们发现人体结构优先可以提供显著的潜在优势。 Motivated by this insight, we further incorporate an intuitive human structure prior - human parts - into pre-training. Specifically, we employ this prior to guide the mask sampling process. Image patches corresponding to human part regions have high priority to be masked out, which encourages the model to concentrate more on body structure information during pre-training, yielding substantial benefits across a range of human-centric perception tasks. To further capture human characteristics, we propose a structure-invariant alignment loss that enforces different masked views, guided by the human part prior, to be closely aligned for the same image. We term the entire method as HAP. HAP uses a plain ViT as the encoder and establishes new state-of-the-art performance on 11 human-centric benchmarks, and on-par results on one dataset. For example, HAP achieves 78.1% mAP on MSMT17 for person re-identification, 86.54% mA on PA-100K for pedestrian attribute recognition, 78.2% AP on MS COCO for 2D pose estimation, and 56.0 PA-MPJPE on 3DPW for 3D pose and shape estimation.
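The part-prior-guided masking can be illustrated as weighted sampling over patch indices, where patches overlapping human-part regions receive a higher probability of being masked out. The bias factor, mask ratio, and how the part maps are obtained are assumptions here, not the paper's exact sampling scheme.

```python
import torch

def part_guided_mask(is_part_patch, mask_ratio=0.75, part_bias=4.0):
    """is_part_patch: (N,) bool tensor, True where a patch overlaps a human part.
    Returns indices of the patches to mask out."""
    n_mask = int(mask_ratio * is_part_patch.numel())
    weights = torch.where(is_part_patch,
                          torch.tensor(part_bias), torch.tensor(1.0))
    return torch.multinomial(weights, n_mask, replacement=False)

is_part_patch = torch.rand(196) < 0.33     # 14 x 14 patches, roughly 1/3 on the body
masked_idx = part_guided_mask(is_part_patch)
```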
results: 实验结果表明,使用LeMa方法可以提高 LLM 的性能,并且可以将其应用到特殊的数学问题解释模型中,例如 WizardMath 和 MetaMath。Abstract
Large language models (LLMs) recently exhibited remarkable reasoning capabilities on solving math problems. To further improve this capability, this work proposes Learning from Mistakes (LeMa), akin to human learning processes. Consider a human student who failed to solve a math problem, he will learn from what mistake he has made and how to correct it. Mimicking this error-driven learning process, LeMa fine-tunes LLMs on mistake-correction data pairs generated by GPT-4. Specifically, we first collect inaccurate reasoning paths from various LLMs and then employ GPT-4 as a "corrector" to (1) identify the mistake step, (2) explain the reason for the mistake, and (3) correct the mistake and generate the final answer. Experimental results demonstrate the effectiveness of LeMa: across five backbone LLMs and two mathematical reasoning tasks, LeMa consistently improves the performance compared with fine-tuning on CoT data alone. Impressively, LeMa can also benefit specialized LLMs such as WizardMath and MetaMath, achieving 85.4% pass@1 accuracy on GSM8K and 27.1% on MATH. This surpasses the SOTA performance achieved by non-execution open-source models on these challenging tasks. Our code, data and models will be publicly available at https://github.com/microsoft/CodeT.
摘要
大型语言模型(LLM)最近表现出色的推理能力,以解决 math 问题。为了进一步改善这个能力,这个工作提出了学习自错(LeMa),与人类学习过程相似。例如,一个人学生不能解决一个 math 问题,他将从错误中学习,并推断出错误的步骤,以及如何修正错误。模仿这个错误驱动学习过程,LeMa 精细调整 LLM 的过程,使其能够更好地推理。具体来说,我们首先收集了不同 LLM 的错误推理路径,然后使用 GPT-4 作为 "修正者",以(1)识别错误步骤,(2)解释错误的原因,以及(3)修正错误,并生成最终的答案。实验结果显示 LeMa 的有效性:在五个基本 LLM 和两个数学推理任务上,LeMa invariably 提高了表现,与精细调整 CoT 数据 alone 相比。甚至可以帮助特殊化 LLM 如 WizardMath 和 MetaMath,在 GSM8K 和 MATH 这两个具有挑战性的任务上获得 85.4% 的通过率和 27.1% 的率,这超过了非执行的开源模型在这些任务上的最佳表现。我们将我们的代码、数据和模型公开 disponibile 在 https://github.com/microsoft/CodeT。
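A sketch of how mistake-correction data of this kind might be assembled. The corrector prompt wording and the fine-tuning record format are invented for illustration; the paper's actual prompts and data schema may differ, and the GPT-4 call itself is omitted.

```python
CORRECTOR_PROMPT = """You are given a math problem and an incorrect solution.
1. Identify the first incorrect step.
2. Explain why it is wrong.
3. Correct it and finish the solution, ending with the final answer.

Problem:
{problem}

Incorrect solution:
{wrong_solution}
"""

def build_finetuning_pair(problem, wrong_solution, corrector_output):
    """Turn one (problem, flawed reasoning path, corrector output) triple into a
    supervised fine-tuning example for the student LLM (illustrative format)."""
    return {
        "prompt": f"Problem: {problem}\nFlawed solution: {wrong_solution}\nRevise it:",
        "completion": corrector_output,
    }
```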
Offline RL with Observation Histories: Analyzing and Improving Sample Complexity
paper_authors: Joey Hong, Anca Dragan, Sergey Levine
for: The paper focuses on the problem of offline reinforcement learning (RL) in situations where the state is partially observed or unknown.
methods: The paper proposes a new loss function called the bisimulation loss, which encourages the RL algorithm to learn a compact representation of the history that is relevant for action selection.
results: The paper shows that the proposed loss can improve the performance of offline RL in a variety of tasks, and that it is closely related to good performance.Abstract
Offline reinforcement learning (RL) can in principle synthesize more optimal behavior from a dataset consisting only of suboptimal trials. One way that this can happen is by "stitching" together the best parts of otherwise suboptimal trajectories that overlap on similar states, to create new behaviors where each individual state is in-distribution, but the overall returns are higher. However, in many interesting and complex applications, such as autonomous navigation and dialogue systems, the state is partially observed. Even worse, the state representation is unknown or not easy to define. In such cases, policies and value functions are often conditioned on observation histories instead of states. In these cases, it is not clear if the same kind of "stitching" is feasible at the level of observation histories, since two different trajectories would always have different histories, and thus "similar states" that might lead to effective stitching cannot be leveraged. Theoretically, we show that standard offline RL algorithms conditioned on observation histories suffer from poor sample complexity, in accordance with the above intuition. We then identify sufficient conditions under which offline RL can still be efficient -- intuitively, it needs to learn a compact representation of history comprising only features relevant for action selection. We introduce a bisimulation loss that captures the extent to which this happens, and propose that offline RL can explicitly optimize this loss to aid worst-case sample complexity. Empirically, we show that across a variety of tasks either our proposed loss improves performance, or the value of this loss is already minimized as a consequence of standard offline RL, indicating that it correlates well with good performance.
摘要
偏向式学习(RL)可以在理论上Synthesize更优化的行为从一个只包含不优化尝试的数据集中。一种方式是将不同的尝试中的最佳部分“缝合” вместе,创造新的行为,每个状态都是内部分布的,但总体返回高于原来的。然而,在许多有趣和复杂的应用,如自动驾驶和对话系统,状态是部分可见的。甚至更糟糕,状态表示是未知或不容易定义。在这些情况下,策略和值函数通常是根据观察历史条件的,而不是根据状态。在这些情况下,是否可以在观察历史上进行类似的“缝合”,并不可能。我们证明了标准的偏向式RL算法 conditioned on 观察历史会受到低效样本复杂性的限制,符合上述直觉。我们然后提出了可能的有效条件,其中学习一个紧凑的历史表示,其中只包含行动选择所需的特征。我们引入了一种bisimulation损失,用于衡量这种情况是否发生。我们建议在offline RL中显式优化这种损失,以提高最坏情况的样本复杂性。实验表明,在各种任务中,我们的提议的损失可以提高性能,或者标准的offline RL可以自动地最小化这种损失,这表明它与好的性能有 corrrelation。
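The bisimulation idea can be sketched with a standard bisimulation-metric objective: distances between history embeddings are regressed onto the reward difference plus the discounted distance between the successor embeddings. This is written in the generic style of bisimulation representation learning, not necessarily the exact loss proposed in the paper.

```python
import torch
import torch.nn.functional as F

def bisimulation_loss(phi_i, phi_j, r_i, r_j, phi_next_i, phi_next_j, gamma=0.99):
    """Toy bisimulation-style objective: the distance between two history embeddings
    should match |r_i - r_j| plus the discounted distance between the embeddings
    of the successor histories."""
    d = torch.norm(phi_i - phi_j, dim=-1)
    with torch.no_grad():
        target = (r_i - r_j).abs() + gamma * torch.norm(phi_next_i - phi_next_j, dim=-1)
    return F.mse_loss(d, target)
```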
“Pick-and-Pass” as a Hat-Trick Class for First-Principle Memory, Generalizability, and Interpretability Benchmarks
For: The paper is written to study model-free reinforcement learning algorithms and their ability to learn memory in closed drafting games, specifically the popular game “Sushi Go Party!”.* Methods: The paper uses a set of closely-related games based on the set of cards in play to establish first-principle benchmarks for studying model-free reinforcement learning algorithms.* Results: The paper produces state-of-the-art results on the Sushi Go Party! environment and quantifies the generalizability of reinforcement learning algorithms trained on various sets of cards, establishing key trends between generalized performance and the set distance between the train and evaluation game configurations. Additionally, the paper fits decision rules to interpret the strategy of the learned models and compares them to the ranking preferences of human players, finding intuitive common rules and intriguing new moves.Abstract
Closed drafting or "pick and pass" is a popular game mechanic where each round players select a card or other playable element from their hand and pass the rest to the next player. Games employing closed drafting make for great studies on memory and turn order due to their explicitly calculable memory of other players' hands. In this paper, we establish first-principle benchmarks for studying model-free reinforcement learning algorithms and their comparative ability to learn memory in a popular family of closed drafting games called "Sushi Go Party!", producing state-of-the-art results on this environment along the way. Furthermore, as Sushi Go Party! can be expressed as a set of closely-related games based on the set of cards in play, we quantify the generalizability of reinforcement learning algorithms trained on various sets of cards, establishing key trends between generalized performance and the set distance between the train and evaluation game configurations. Finally, we fit decision rules to interpret the strategy of the learned models and compare them to the ranking preferences of human players, finding intuitive common rules and intriguing new moves.
摘要
封闭 drafting 或 "挑选并传递" 是一种受欢迎的游戏机制,每回合玩家从手中选择一张卡或其他可玩元素,并将剩下的交给下一位玩家。由于这种机制的明确可计算的记忆,因此关于记忆和轮次的研究非常有价值。在这篇论文中,我们建立了基于原理的基准值,用于研究无基础学习算法在受欢迎的closed drafting游戏 "Sushi Go Party!" 中学习记忆的能力,并在这个环境中实现了state-of-the-art的结果。此外,由于Sushi Go Party! 可以表示为一系列基于卡片的游戏,我们量化了各种卡片集的对游戏性能的影响,并确定了关键的总体趋势。最后,我们采用决策规则来解释学习模型的策略,并与人类玩家的排名偏好进行比较,发现了直观的共同规则以及意外的新动作。
Histopathological Image Analysis with Style-Augmented Feature Domain Mixing for Improved Generalization
results: 比较与现有风格传输基于数据增强方法,我们的方法性能相似或更好,即使需要较少的计算和时间。Abstract
Histopathological images are essential for medical diagnosis and treatment planning, but interpreting them accurately using machine learning can be challenging due to variations in tissue preparation, staining and imaging protocols. Domain generalization aims to address such limitations by enabling the learning models to generalize to new datasets or populations. Style transfer-based data augmentation is an emerging technique that can be used to improve the generalizability of machine learning models for histopathological images. However, existing style transfer-based methods can be computationally expensive, and they rely on artistic styles, which can negatively impact model accuracy. In this study, we propose a feature domain style mixing technique that uses adaptive instance normalization to generate style-augmented versions of images. We compare our proposed method with existing style transfer-based data augmentation methods and found that it performs similarly or better, despite requiring less computation and time. Our results demonstrate the potential of feature domain statistics mixing in the generalization of learning models for histopathological image analysis.
摘要
histopathological 图像是医学诊断和治疗规划中不可或缺的,但是使用机器学习解释它们的准确性却有限制,这是因为样本准备、染色和扫描协议的变化。Domain generalization想要解决这些限制,使得学习模型能够通过新的数据集或人口来泛化。 Style transfer-based 数据增强是一种emerging技术,可以提高机器学习模型对 histopathological 图像的泛化能力。然而,现有的方法可能需要大量的计算时间,并且可能会因为艺术风格而导致模型精度下降。在这种研究中,我们提出了一种特征领域样式混合技术,使用适应实例Normalization来生成样式增强版图像。我们与现有的样式传输基于的数据增强方法进行比较,发现我们的提议方法能够达到相同或更好的性能,即使需要更少的计算时间和资源。我们的结果表明特征领域统计混合在机器学习模型的泛化中具有潜力。
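A common way to mix feature-domain statistics in the spirit of adaptive instance normalisation is sketched below: each sample's channel-wise mean and standard deviation are replaced by a convex combination of its own statistics and those of a randomly paired sample in the batch. The Beta parameter and where in the network this is applied are assumptions, not the paper's exact recipe.

```python
import torch

def mix_feature_stats(x, alpha=0.1, eps=1e-6):
    """Style mixing on feature maps x of shape (B, C, H, W): normalise each sample,
    then re-inject mixed channel-wise statistics from a random pairing."""
    b = x.size(0)
    mu = x.mean(dim=(2, 3), keepdim=True)
    sig = x.std(dim=(2, 3), keepdim=True) + eps
    x_norm = (x - mu) / sig

    perm = torch.randperm(b)
    lam = torch.distributions.Beta(alpha, alpha).sample((b, 1, 1, 1))
    mu_mix = lam * mu + (1 - lam) * mu[perm]
    sig_mix = lam * sig + (1 - lam) * sig[perm]
    return x_norm * sig_mix + mu_mix

feats = torch.randn(8, 64, 28, 28)      # intermediate CNN feature maps
augmented = mix_feature_stats(feats)
```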
LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B
results: 研究人员成功地使用了这种方法来破坏Llama 2-Chat模型的安全训练,并在两个退回测试中将模型的拒绝率降低到1%以下。此外,研究人员还 validate了这种方法对Llama 2-Chat模型的一致性。Abstract
AI developers often apply safety alignment procedures to prevent the misuse of their AI systems. For example, before Meta released Llama 2-Chat, a collection of instruction fine-tuned large language models, they invested heavily in safety training, incorporating extensive red-teaming and reinforcement learning from human feedback. However, it remains unclear how well safety training guards against model misuse when attackers have access to model weights. We explore the robustness of safety training in language models by subversively fine-tuning the public weights of Llama 2-Chat. We employ low-rank adaptation (LoRA) as an efficient fine-tuning method. With a budget of less than $200 per model and using only one GPU, we successfully undo the safety training of Llama 2-Chat models of sizes 7B, 13B, and 70B. Specifically, our fine-tuning technique significantly reduces the rate at which the model refuses to follow harmful instructions. We achieve a refusal rate below 1% for our 70B Llama 2-Chat model on two refusal benchmarks. Our fine-tuning method retains general performance, which we validate by comparing our fine-tuned models against Llama 2-Chat across two benchmarks. Additionally, we present a selection of harmful outputs produced by our models. While there is considerable uncertainty about the scope of risks from current models, it is likely that future models will have significantly more dangerous capabilities, including the ability to hack into critical infrastructure, create dangerous bio-weapons, or autonomously replicate and adapt to new environments. We show that subversive fine-tuning is practical and effective, and hence argue that evaluating risks from fine-tuning should be a core part of risk assessments for releasing model weights.
Autonomous Robotic Reinforcement Learning with Asynchronous Human Feedback
results: 研究表明,在缺省情况下,需要考虑当前探索策略的可达性,以确定哪些空间区域要探索。基于这一点,实现了一个实用的学习系统 - GEAR,可以让机器人直接在真实环境中学习自主地,无需中断。系统通过流动机器人经验到网页界面,并且只需 periodic asynchronous 非专业人员反馈。研究在 simulate 和实际环境中展示了其效果。Abstract
Ideally, we would place a robot in a real-world environment and leave it there improving on its own by gathering more experience autonomously. However, algorithms for autonomous robotic learning have been challenging to realize in the real world. While this has often been attributed to the challenge of sample complexity, even sample-efficient techniques are hampered by two major challenges - the difficulty of providing well "shaped" rewards, and the difficulty of continual reset-free training. In this work, we describe a system for real-world reinforcement learning that enables agents to show continual improvement by training directly in the real world without requiring painstaking effort to hand-design reward functions or reset mechanisms. Our system leverages occasional non-expert human-in-the-loop feedback from remote users to learn informative distance functions to guide exploration while leveraging a simple self-supervised learning algorithm for goal-directed policy learning. We show that in the absence of resets, it is particularly important to account for the current "reachability" of the exploration policy when deciding which regions of the space to explore. Based on this insight, we instantiate a practical learning system - GEAR, which enables robots to simply be placed in real-world environments and left to train autonomously without interruption. The system streams robot experience to a web interface only requiring occasional asynchronous feedback from remote, crowdsourced, non-expert humans in the form of binary comparative feedback. We evaluate this system on a suite of robotic tasks in simulation and demonstrate its effectiveness at learning behaviors both in simulation and the real world. Project website https://guided-exploration-autonomous-rl.github.io/GEAR/.
摘要
理想情况下,我们会将机器人放置在真实环境中,让它自动学习并不断改进,通过收集更多的经验。然而,自动机器人学习算法在真实世界中实现很困难。这经常被归结为样本复杂度的问题,而且甚至使用样本效率的技术也受到两大挑战:一是设置合适的奖励函数,二是实现不间断的培训。在这项工作中,我们描述了一种能够在真实世界中进行自主学习的系统,允许代理人通过不间断的培训来展现持续改进。我们的系统利用远程非专家用户的 occasional 非专业反馈来学习有用的距离函数,并使用简单的自适应学习算法来学习目标导向策略。我们发现在不间断培训情况下,特别重要的是考虑当前探索策略的可达性。基于这一点,我们实现了一个实用的学习系统——GEAR,允许机器人直接在真实环境中培训,并不需要繁琐的手动设计奖励函数或重置机制。系统将机器人经验流向网络界面,只需要 occasional 非专业用户在远程地提供偶极性反馈。我们在一组机器人任务上进行了模拟和实际环境的测试,并证明了该系统在学习行为方面的效果。项目网站:。
What a Whole Slide Image Can Tell? Subtype-guided Masked Transformer for Pathological Image Captioning
paper_authors: Wenkang Qin, Rui Xu, Peixiang Huang, Xiaomin Wu, Heyu Zhang, Lin Luo
for: This paper proposes a new approach for pathological captioning of Whole Slide Images (WSIs) using Transformers, with the goal of improving the accuracy of computer-aided pathological diagnosis.
methods: The proposed approach, called Subtype-guided Masked Transformer (SGMT), treats a WSI as a sequence of sparse patches and generates an overall caption sentence from the sequence. An accompanying subtype prediction is introduced to guide the training process and enhance captioning accuracy. The Asymmetric Masked Mechanism approach is also used to tackle the large size constraint of pathological image captioning.
results: The authors report that their approach achieves superior performance compared to traditional RNN-based methods on the PatchGastricADC22 dataset.Abstract
Pathological captioning of Whole Slide Images (WSIs), though essential in computer-aided pathological diagnosis, has rarely been studied due to the limitations in datasets and model training efficacy. In this paper, we propose a new paradigm Subtype-guided Masked Transformer (SGMT) for pathological captioning based on Transformers, which treats a WSI as a sequence of sparse patches and generates an overall caption sentence from the sequence. An accompanying subtype prediction is introduced into SGMT to guide the training process and enhance the captioning accuracy. We also present an Asymmetric Masked Mechanism approach to tackle the large size constraint of pathological image captioning, where the numbers of sequencing patches in SGMT are sampled differently in the training and inference phases, respectively. Experiments on the PatchGastricADC22 dataset demonstrate that our approach effectively adapts to the task with a transformer-based model and achieves superior performance compared to traditional RNN-based methods. Our codes are to be made available for further research and development.
摘要
您好!我们在这篇论文中提出了一个新的思路,即基于Transformers的Subtype-guided Masked Transformer(SGMT),用于Computer-aided Pathological Diagnosis(CPD)中的标本描述。我们将整个标本视为一系列叠加的稀疏区块,并从这些区块中生成一个整体描述句子。此外,我们还引入了一个供应预测的子类别预测方法,以帮助训练过程并提高描述精度。此外,我们还提出了一个对应的Asymmetric Masked Mechanism方法,以解决Pathological Image Captioning中的大型数据集的限制。实验结果显示,我们的方法可以将Transformer-based模型适应这个任务,并在条件下超越传统RNN-based方法。我们的代码将会为更多的研究和发展提供。
Functional connectivity modules in recurrent neural networks: function, origin and dynamics
paper_authors: Jacob Tanner, Sina Mansour L., Ludovico Coletta, Alessandro Gozzi, Richard F. Betzel
for: 这个研究旨在解释大脑功能,尤其是神经同步现象在不同物种和组织水平上的普遍存在。
methods: 这个研究使用了回归神经网络, Investigating the functional role, origin, and dynamical implications of modular structures in correlation-based networks.
results: 研究发现,模块是功能准确的单位,贡献特殊的信息处理。模块自然形成,由输入层到回归层的偏好和重量差异引起。此外,研究发现,模块定义与类似功能连接,控制系统行为和动力学。Abstract
Understanding the ubiquitous phenomenon of neural synchronization across species and organizational levels is crucial for decoding brain function. Despite its prevalence, the specific functional role, origin, and dynamical implication of modular structures in correlation-based networks remains ambiguous. Using recurrent neural networks trained on systems neuroscience tasks, this study investigates these important characteristics of modularity in correlation networks. We demonstrate that modules are functionally coherent units that contribute to specialized information processing. We show that modules form spontaneously from asymmetries in the sign and weight of projections from the input layer to the recurrent layer. Moreover, we show that modules define connections with similar roles in governing system behavior and dynamics. Collectively, our findings clarify the function, formation, and operational significance of functional connectivity modules, offering insights into cortical function and laying the groundwork for further studies on brain function, development, and dynamics.
摘要
理解跨种类和组织层次的神经同步现象的重要性,可以帮助我们解读大脑的功能。尽管这种现象非常普遍,但模块结构在相关性网络中的特定功能作用、起源和动态影响仍然不够清楚。本研究使用基于系统神经科学任务的循环神经网络进行研究,以了解这些重要特征。我们发现,模块是功能协调的单位,对特定信息处理做出贡献。我们还发现,模块在输入层到循环层的权重和符号差异的基础上自然地形成。此外,我们发现模块在控制系统行为和动力学中扮演着相似的角色。总之,我们的发现可以解释功能连接模块的功能、形成和运作意义,为大脑功能、发展和动态研究提供新的思路和方法。
Taking control: Policies to address extinction risks from advanced AI
results: 这三项政策建议可以有效地降低高度智能AI系统对人类灭绝的风险,并且可以让大多数AI创新得以继续不受限制。Abstract
This paper provides policy recommendations to reduce extinction risks from advanced artificial intelligence (AI). First, we briefly provide background information about extinction risks from AI. Second, we argue that voluntary commitments from AI companies would be an inappropriate and insufficient response. Third, we describe three policy proposals that would meaningfully address the threats from advanced AI: (1) establishing a Multinational AGI Consortium to enable democratic oversight of advanced AI (MAGIC), (2) implementing a global cap on the amount of computing power used to train an AI system (global compute cap), and (3) requiring affirmative safety evaluations to ensure that risks are kept below acceptable levels (gating critical experiments). MAGIC would be a secure, safety-focused, internationally-governed institution responsible for reducing risks from advanced AI and performing research to safely harness the benefits of AI. MAGIC would also maintain emergency response infrastructure (kill switch) to swiftly halt AI development or withdraw model deployment in the event of an AI-related emergency. The global compute cap would end the corporate race toward dangerous AI systems while enabling the vast majority of AI innovation to continue unimpeded. Gating critical experiments would ensure that companies developing powerful AI systems are required to present affirmative evidence that these models keep extinction risks below an acceptable threshold. After describing these recommendations, we propose intermediate steps that the international community could take to implement these proposals and lay the groundwork for international coordination around advanced AI.
摘要
这份报告提供了降低高级人工智能(AI)濒临灭绝风险的政策建议。首先,我们简要介绍高级AI濒临灭绝风险的背景信息。其次,我们认为企业自愿承诺是不适当和不够的回应。第三,我们描述了三个政策建议,可以实际地解决高级AI的威胁:(1)成立多国AGI协会(MAGIC),以实现多国安全协调高级AI的措施,并执行安全的AI研究;(2)实施全球计算能力套限(global compute cap),以结束危险AI系统的企业竞赛,同时允许大多数AI创新继续不受妨碍;(3)要求开发强大AI系统的公司提供证明,以确保风险保持在可接受水平(安全评估)。MAGIC是一个安全、安全感ocus的、国际治理的机构,负责降低高级AI濒临灭绝风险,并执行安全的AI研究。MAGIC还将维护紧急应急基础设施(kill switch),以快速干预AI开发或者模型部署在AI相关紧急情况下。全球计算能力套限将结束危险AI系统的企业竞赛,同时允许大多数AI创新继续不受妨碍。安全评估将确保开发强大AI系统的公司提供证明,以确保风险保持在可接受水平。文章最后,我们建议国际社会可以采取以下措施来实现这些建议,并为高级AI的国际协调做准备。
Breaking the Token Barrier: Chunking and Convolution for Efficient Long Text Classification with BERT
results: 在一个用于对比不同长文本分类任务的benchmark中,BERT模型经过ChunkBERT扩展后在长样本中保持稳定性,而且只用了原始内存占用的6.25%。这些结果表明,通过简单地修改预训练BERT模型,可以实现高效的finetuning和推理。Abstract
Transformer-based models, specifically BERT, have propelled research in various NLP tasks. However, these models are limited to a maximum token limit of 512 tokens. Consequently, this makes it non-trivial to apply it in a practical setting with long input. Various complex methods have claimed to overcome this limit, but recent research questions the efficacy of these models across different classification tasks. These complex architectures evaluated on carefully curated long datasets perform at par or worse than simple baselines. In this work, we propose a relatively simple extension to vanilla BERT architecture called ChunkBERT that allows finetuning of any pretrained models to perform inference on arbitrarily long text. The proposed method is based on chunking token representations and CNN layers, making it compatible with any pre-trained BERT. We evaluate chunkBERT exclusively on a benchmark for comparing long-text classification models across a variety of tasks (including binary classification, multi-class classification, and multi-label classification). A BERT model finetuned using the ChunkBERT method performs consistently across long samples in the benchmark while utilizing only a fraction (6.25\%) of the original memory footprint. These findings suggest that efficient finetuning and inference can be achieved through simple modifications to pre-trained BERT models.
摘要
带基于Transformer的模型,尤其是BERT,在不同的自然语言处理任务中进行研究。然而,这些模型具有最大token数限制为512个,这使得在实际应用中处理长输入变得不容易。许多复杂的方法已经被提出来突破这个限制,但最近的研究表明这些模型在不同的分类任务中的效果存在问题。这些复杂的架构在手动挑选的长数据集上评估时和简单的基线模型相当或更差。在这种工作中,我们提出了一种基于BERT核心架构的简单扩展方法called ChunkBERT,该方法允许任何预训练模型进行长文本的推理。我们基于块化Token表示和CNN层,使其与任何预训练BERT模型兼容。我们在一个用于比较不同任务的长文本分类模型 benchmark 上solely evaluate ChunkBERT。一个使用ChunkBERT方法精度训练的BERT模型在长样本上表现一致,使用的内存占用量仅为原始的6.25%。这些发现表明,通过简单地修改预训练BERT模型,可以实现高效的训练和推理。
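A rough sketch of the chunk-then-convolve idea using Hugging Face BERT: the long input is split into at-most-512-token windows, each window is encoded independently, and a small CNN pools the per-chunk [CLS] vectors into one document representation. The chunk length, CNN head, and pooling below are placeholders rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")
conv = nn.Conv1d(768, 256, kernel_size=3, padding=1)
classifier = nn.Linear(256, 2)

def chunkbert_logits(text, chunk_len=510):
    # 1) tokenize without truncation and split the token ids into fixed-size chunks
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    chunks = [ids[i:i + chunk_len] for i in range(0, max(len(ids), 1), chunk_len)]
    # 2) encode each chunk independently and keep its [CLS] representation
    cls_vecs = []
    for chunk in chunks:
        input_ids = torch.tensor([[tokenizer.cls_token_id] + chunk + [tokenizer.sep_token_id]])
        out = bert(input_ids=input_ids)
        cls_vecs.append(out.last_hidden_state[:, 0])          # (1, 768)
    # 3) convolve over the chunk sequence and pool into a single document vector
    seq = torch.stack(cls_vecs, dim=2)                        # (1, 768, n_chunks)
    return classifier(conv(seq).relu().mean(dim=2))           # (1, n_classes)

print(chunkbert_logits("some very long document ... " * 500))
```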
methods: 利用大规模 web 上的图片文本对数据,以及使用 captioning 模型生成的假 caption,进行模型训练。
results: CapsFusion 模型在 COCO 和 NoCaps 测试集上的 CIDEr 得分提高 18.8 和 18.3 分,在模型性能、样本效率、世界知识深度和可扩展性三个方面表现出色。Abstract
Large multimodal models demonstrate remarkable generalist ability to perform diverse multimodal tasks in a zero-shot manner. Large-scale web-based image-text pairs contribute fundamentally to this success, but suffer from excessive noise. Recent studies use alternative captions synthesized by captioning models and have achieved notable benchmark performance. However, our experiments reveal significant Scalability Deficiency and World Knowledge Loss issues in models trained with synthetic captions, which have been largely obscured by their initial benchmark success. Upon closer examination, we identify the root cause as the overly-simplified language structure and lack of knowledge details in existing synthetic captions. To provide higher-quality and more scalable multimodal pretraining data, we propose CapsFusion, an advanced framework that leverages large language models to consolidate and refine information from both web-based image-text pairs and synthetic captions. Extensive experiments show that CapsFusion captions exhibit remarkable all-round superiority over existing captions in terms of model performance (e.g., 18.8 and 18.3 improvements in CIDEr score on COCO and NoCaps), sample efficiency (requiring 11-16 times less computation than baselines), world knowledge depth, and scalability. These effectiveness, efficiency and scalability advantages position CapsFusion as a promising candidate for future scaling of LMM training.
摘要
大型多modal模型在零值模式下表现出了惊人的通用能力,可以完成多种多modal任务。大规模的网络上的图片文本对象贡献到这些成功的基础,但是受到过度噪音的影响。最近的研究使用了由captioning模型生成的另外的caption,并达到了很好的 benchMark性能。然而,我们的实验表明,使用生成的caption会导致Scalability Deficiency和World Knowledge Loss问题,这些问题在初始的benchMark成功后被掩盖了。经过仔细分析,我们发现了这些问题的根本原因是现有的生成caption的语言结构过于简单,缺乏知识细节。为了提供更高质量和可扩展的多modal预训练数据,我们提出了CapsFusion,一种高级框架,利用大型语言模型来整合和提高来自网络上的图片文本对象和生成caption的信息。我们的广泛实验表明,CapsFusion caption在模型性能(例如,COCO和NoCaps中的CIDEr得分提高18.8和18.3)、样本效率(需要11-16倍的计算量 menos than baseline)、世界知识深度和可扩展性方面具有惊人的优势。这些优势使CapsFusion成为未来扩展LMM训练的优秀候选人。
LLMs may Dominate Information Access: Neural Retrievers are Biased Towards LLM-Generated Texts
for: investigate the influence of LLM-generated documents on IR systems and the potential biases in neural retrieval models towards LLM-generated text.
methods: quantitative evaluation of different IR models in scenarios with human-written and LLM-generated texts, text compression analysis, and theoretical analysis.
results: the neural retrieval models tend to rank LLM-generated documents higher, which is referred to as the “source bias”; this bias is not limited to first-stage neural retrievers but also extends to second-stage neural re-rankers; the bias is due to the neural models’ ability to understand the semantic information of LLM-generated text.Abstract
Recently, the emergence of large language models (LLMs) has revolutionized the paradigm of information retrieval (IR) applications, especially in web search. With their remarkable capabilities in generating human-like texts, LLMs have created enormous texts on the Internet. As a result, IR systems in the LLMs era are facing a new challenge: the indexed documents now are not only written by human beings but also automatically generated by the LLMs. How these LLM-generated documents influence the IR systems is a pressing and still unexplored question. In this work, we conduct a quantitative evaluation of different IR models in scenarios where both human-written and LLM-generated texts are involved. Surprisingly, our findings indicate that neural retrieval models tend to rank LLM-generated documents higher. We refer to this category of biases in neural retrieval models towards the LLM-generated text as the \textbf{source bias}. Moreover, we discover that this bias is not confined to the first-stage neural retrievers, but extends to the second-stage neural re-rankers. Then, we provide an in-depth analysis from the perspective of text compression and observe that neural models can better understand the semantic information of LLM-generated text, which is further substantiated by our theoretical analysis. We also discuss the potentially severe concerns stemming from the observed source bias and hope our findings can serve as a critical wake-up call to the IR community and beyond. To facilitate future explorations of IR in the LLM era, the two new benchmarks we constructed and the corresponding codes will later be available at \url{https://github.com/KID-22/LLM4IR-Bias}.
摘要
最近,大语言模型(LLM)的出现对信息检索(IR)应用领域产生了革命性的变革,特别是在网络搜索中。 LLM 的出色的文本生成能力使得互联网上有巨量的文本出现。这使得 IR 系统在 LLM 时代面临一个新的挑战:索引文档现在不仅由人类创建,还可能由 LLM 自动生成。这些 LLM 生成的文本如何影响 IR 系统是一个压力性的问题,尚未得到探索。在这种情况下,我们进行了量化评估不同 IR 模型在人类写作和 LLM 生成文本卷积中的表现。结果显示,神经搜索模型倾向于将 LLM 生成的文本排名在首位。我们称这种偏见为“源偏见”。此外,我们发现这种偏见不仅存在于第一阶段神经搜索模型中,而且还扩展到第二阶段神经重新排名器中。接着,我们从文本压缩角度进行了深入分析,并证明了神经模型对 LLM 生成文本的 semantic 信息有更好的理解能力。最后,我们讨论了可能由此观察到的服务器问题,并希望我们的发现能够为 IR 社区和更广泛的领域产生一个重要的警示。为便于未来在 LLM 时代进行 IR 探索,我们将在 GitHub 上提供两个新的benchmark和代码,可以在 \url{https://github.com/KID-22/LLM4IR-Bias} 获取。
A Transformer-Based Model With Self-Distillation for Multimodal Emotion Recognition in Conversations
results: 实验结果表明,使用SDT模型在IEMOCAP和MELD数据集上表现出色,超过了之前的基线。Abstract
Emotion recognition in conversations (ERC), the task of recognizing the emotion of each utterance in a conversation, is crucial for building empathetic machines. Existing studies focus mainly on capturing context- and speaker-sensitive dependencies on the textual modality but ignore the significance of multimodal information. Different from emotion recognition in textual conversations, capturing intra- and inter-modal interactions between utterances, learning weights between different modalities, and enhancing modal representations play important roles in multimodal ERC. In this paper, we propose a transformer-based model with self-distillation (SDT) for the task. The transformer-based model captures intra- and inter-modal interactions by utilizing intra- and inter-modal transformers, and learns weights between modalities dynamically by designing a hierarchical gated fusion strategy. Furthermore, to learn more expressive modal representations, we treat soft labels of the proposed model as extra training supervision. Specifically, we introduce self-distillation to transfer knowledge of hard and soft labels from the proposed model to each modality. Experiments on IEMOCAP and MELD datasets demonstrate that SDT outperforms previous state-of-the-art baselines.
摘要
倾向感知在对话中(ERC),认识对话中每句话的情感,对于建立同情机器非常重要。现有研究主要集中在文本模式下捕捉上下文和发言人相关的依赖关系,而忽略多Modal信息的重要性。与文本对话的情感认识不同,在多Modal ERC中捕捉 между语音和视频语音之间的内部和交叉模式互动,学习不同模式之间的权重,以及增强模式表示都是关键。在这篇论文中,我们提议一种基于变换器的模型,并使用自适应(SDT)进行学习。变换器模型利用内部和交叉模式 transformer 来捕捉内部和交叉模式互动,并通过设计层次闭合策略来动态学习不同模式之间的权重。此外,为了学习更加表达力的模式表示,我们将提议模型的软标签作为额外的训练监督。具体来说,我们引入自适应来传递模型中的硬标签和软标签知识到每个模式。实验结果表明,SDT在IEMOCAP和MELD dataset上超过了前一个基eline。
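The self-distillation component can be sketched as follows: each unimodal head is trained on the hard labels and on soft labels distilled from the fused multimodal prediction. The temperature, loss weighting, and the fusion itself (SDT's cross-modal transformers and hierarchical gated fusion) are not reproduced; this only shows the knowledge-transfer term.

```python
import torch
import torch.nn.functional as F

def self_distillation_loss(fused_logits, modality_logits, labels, T=2.0, alpha=0.5):
    """Toy version: train each unimodal branch with (i) the hard emotion labels and
    (ii) soft labels taken from the fused multimodal prediction."""
    soft_targets = F.softmax(fused_logits.detach() / T, dim=-1)
    loss = F.cross_entropy(fused_logits, labels)
    for logits in modality_logits:                 # e.g. [text, audio, visual] heads
        loss = loss + alpha * F.cross_entropy(logits, labels)
        loss = loss + (1 - alpha) * (T * T) * F.kl_div(
            F.log_softmax(logits / T, dim=-1), soft_targets, reduction="batchmean")
    return loss
```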
Unveiling Black-boxes: Explainable Deep Learning Models for Patent Classification
results: 实验结果表明,使用本研究提出的方法可以在两个 dataset 上实现高效的多标签专利分类,并且生成的解释能够帮助人类更好地理解预测结果。Abstract
Recent technological advancements have led to a large number of patents in a diverse range of domains, making it challenging for human experts to analyze and manage. State-of-the-art methods for multi-label patent classification rely on deep neural networks (DNNs), which are complex and often considered black-boxes due to their opaque decision-making processes. In this paper, we propose a novel deep explainable patent classification framework by introducing layer-wise relevance propagation (LRP) to provide human-understandable explanations for predictions. We train several DNN models, including Bi-LSTM, CNN, and CNN-BiLSTM, and propagate the predictions backward from the output layer up to the input layer of the model to identify the relevance of words for individual predictions. Considering the relevance score, we then generate explanations by visualizing relevant words for the predicted patent class. Experimental results on two datasets comprising two-million patent texts demonstrate high performance in terms of various evaluation measures. The explanations generated for each prediction highlight important relevant words that align with the predicted class, making the prediction more understandable. Explainable systems have the potential to facilitate the adoption of complex AI-enabled methods for patent classification in real-world applications.
摘要
最近的技术进步催生了横跨众多领域的大量专利,使人类专家难以进行分析和管理。当前最先进的多标签专利分类方法依赖深度神经网络(DNN),这类模型结构复杂,其决策过程不透明,常被视为黑箱。在这篇论文中,我们提出了一种新的可解释深度专利分类框架,通过引入层级相关性传播(LRP)为预测提供人类可理解的解释。我们训练了包括 Bi-LSTM、CNN 和 CNN-BiLSTM 在内的多个 DNN 模型,并把预测从输出层反向传播回输入层,以确定各个词语对单个预测的相关性。随后根据相关性分数,通过可视化与预测专利类别相关的词语来生成解释。在共包含两百万篇专利文本的两个数据集上的实验结果表明,该方法在多种评价指标上均表现出色。为每个预测生成的解释突出显示了与预测类别一致的重要相关词语,使预测更易于理解。可解释系统有望促进复杂的 AI 方法在现实专利分类应用中的采用。
Global Transformer Architecture for Indoor Room Temperature Forecasting
results: 该方法可以提高建筑物内部温度预测精度,并且可以减少建筑物的能源消耗和气候变化释放,为建筑物的能源协调和维护做出了重要贡献。Abstract
A thorough regulation of building energy systems translates in relevant energy savings and in a better comfort for the occupants. Algorithms to predict the thermal state of a building on a certain time horizon with a good confidence are essential for the implementation of effective control systems. This work presents a global Transformer architecture for indoor temperature forecasting in multi-room buildings, aiming at optimizing energy consumption and reducing greenhouse gas emissions associated with HVAC systems. Recent advancements in deep learning have enabled the development of more sophisticated forecasting models compared to traditional feedback control systems. The proposed global Transformer architecture can be trained on the entire dataset encompassing all rooms, eliminating the need for multiple room-specific models, significantly improving predictive performance, and simplifying deployment and maintenance. Notably, this study is the first to apply a Transformer architecture for indoor temperature forecasting in multi-room buildings. The proposed approach provides a novel solution to enhance the accuracy and efficiency of temperature forecasting, serving as a valuable tool to optimize energy consumption and decrease greenhouse gas emissions in the building sector.
摘要
对建筑能源系统进行全面调控,可以带来可观的能源节约并提升居住者的舒适度。能够在一定时间范围内以较高置信度预测建筑热状态的算法,是实现有效控制系统的基础。这项工作提出了一种面向多房间建筑室内温度预测的全局 Transformer 架构,旨在优化能源消耗并减少与 HVAC 系统相关的温室气体排放。得益于深度学习的最新进展,如今可以开发出比传统反馈控制系统更为精细的预测模型。所提出的全局 Transformer 架构可在涵盖所有房间的完整数据集上训练,省去为每个房间建立单独模型的需要,显著提升预测性能,并简化部署与维护。值得一提的是,这项研究是首次将 Transformer 架构应用于多房间建筑的室内温度预测。该方法为提高温度预测的准确性和效率提供了新的解决方案,是建筑领域优化能源消耗、降低温室气体排放的有价值工具。
Linked Papers With Code: The Latest in Machine Learning as an RDF Knowledge Graph
results: LPWC可以在 Linked Open Data 云中作为知识图存在,提供了多种格式,包括 RDF 填充文件、SPARQL 端点 для直接网络查询以及数据源与 SemOpenAlex、Wikidata 和 DBLP 的链接。此外,LPWC还提供了知识图嵌入,使其可以 direct 应用于机器学习应用程序。Abstract
In this paper, we introduce Linked Papers With Code (LPWC), an RDF knowledge graph that provides comprehensive, current information about almost 400,000 machine learning publications. This includes the tasks addressed, the datasets utilized, the methods implemented, and the evaluations conducted, along with their results. Compared to its non-RDF-based counterpart Papers With Code, LPWC not only translates the latest advancements in machine learning into RDF format, but also enables novel ways for scientific impact quantification and scholarly key content recommendation. LPWC is openly accessible at https://linkedpaperswithcode.com and is licensed under CC-BY-SA 4.0. As a knowledge graph in the Linked Open Data cloud, we offer LPWC in multiple formats, from RDF dump files to a SPARQL endpoint for direct web queries, as well as a data source with resolvable URIs and links to the data sources SemOpenAlex, Wikidata, and DBLP. Additionally, we supply knowledge graph embeddings, enabling LPWC to be readily applied in machine learning applications.
摘要
在这篇论文中,我们介绍了 Linked Papers With Code(LPWC),一个基于 RDF 的知识图谱,提供了约 400,000 篇机器学习论文的全面且最新的信息,包括论文所处理的任务、使用的数据集、实现的方法、进行的评估及其结果。与其非 RDF 形式的对应项目 Papers With Code 相比,LPWC 不仅把机器学习的最新进展转换为 RDF 格式,还支持新的科学影响量化与学术核心内容推荐方式。LPWC 公开发布于 https://linkedpaperswithcode.com,并采用 CC-BY-SA 4.0 许可。作为 Linked Open Data 云中的知识图谱,LPWC 以多种形式提供:RDF 转储文件、可直接进行网络查询的 SPARQL 端点,以及带有可解析 URI 并链接到 SemOpenAlex、Wikidata 和 DBLP 的数据源。此外,我们还提供了知识图谱嵌入,使 LPWC 可以直接应用于机器学习应用。
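As an illustration of how such a knowledge graph can be consumed programmatically, below is a minimal Python sketch that queries a SPARQL endpoint with SPARQLWrapper. The endpoint URL and the use of plain rdfs:label are assumptions for illustration; the actual LPWC endpoint location and vocabulary should be taken from the project site.

```python
# Hedged sketch: query an LPWC-style SPARQL endpoint for paper titles.
# The endpoint URL and the reliance on rdfs:label are illustrative assumptions,
# not the documented LPWC vocabulary.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "https://linkedpaperswithcode.com/sparql"  # assumed endpoint location

query = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?paper ?title WHERE {
  ?paper rdfs:label ?title .
} LIMIT 10
"""

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery(query)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

for row in results["results"]["bindings"]:
    print(row["paper"]["value"], "-", row["title"]["value"])
```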
Critical Role of Artificially Intelligent Conversational Chatbot
paper_authors: Seraj A. M. Mostafa, Md Z. Islam, Mohammad Z. Islam, Fairose Jeehan, Saujanna Jafreen, Raihan U. Islam
for: 本研究旨在探讨 chatGPT 在学术上的伦理问题、其限制和特定用户群体的虚假使用。
methods: 本研究采用了多种方法,包括实验和分析,以探讨 chatGPT 的伦理和限制。
results: 研究发现了一些可能的假设和伦理问题,以及一些解决方案,以便避免不当使用和促进负责任 AI 交互。Abstract
Artificially intelligent chatbot, such as ChatGPT, represents a recent and powerful advancement in the AI domain. Users prefer them for obtaining quick and precise answers, avoiding the usual hassle of clicking through multiple links in traditional searches. ChatGPT's conversational approach makes it comfortable and accessible for finding answers quickly and in an organized manner. However, it is important to note that these chatbots have limitations, especially in terms of providing accurate answers as well as ethical concerns. In this study, we explore various scenarios involving ChatGPT's ethical implications within academic contexts, its limitations, and the potential misuse by specific user groups. To address these challenges, we propose architectural solutions aimed at preventing inappropriate use and promoting responsible AI interactions.
摘要
人工智能聊天机器人,如ChatGPT,是当今AI领域的一项新的和强大的进步。用户喜欢使用它们以获取快速和准确的答案,而不需要遍历多个链接。聊天机器人的对话方式使得它们易于使用,并且能够组织化地提供答案。然而,这些聊天机器人存在限制,特别是在提供准确答案和伦理问题方面。在这项研究中,我们探讨了ChatGPT在学术上下文中的伦理问题、其限制以及特定用户群体可能会滥用它的问题。为解决这些挑战,我们提议了一些建筑解决方案,以防止不当使用和推动负责任AI互动。
ACL Anthology Helper: A Tool to Retrieve and Manage Literature from ACL Anthology
paper_authors: Chen Tang, Frank Guerin, Chenghua Lin
for: This paper is written for researchers who need to efficiently access and organize literature from the ACL Anthology, a comprehensive collection of NLP and CL publications.
methods: The paper presents a tool called ACL Anthology Helper, which automates the process of parsing and downloading papers along with their meta-information, and stores them in a local MySQL database.
results: The tool offers over 20 operations for efficient literature retrieval, including “where,” “group,” “order,” and more, and has been successfully utilized in writing a survey paper (Tang et al.,2022a).Abstract
The ACL Anthology is an online repository that serves as a comprehensive collection of publications in the field of natural language processing (NLP) and computational linguistics (CL). This paper presents a tool called ``ACL Anthology Helper''. It automates the process of parsing and downloading papers along with their meta-information, which are then stored in a local MySQL database. This allows for efficient management of the local papers using a wide range of operations, including "where," "group," "order," and more. By providing over 20 operations, this tool significantly enhances the retrieval of literature based on specific conditions. Notably, this tool has been successfully utilised in writing a survey paper (Tang et al.,2022a). By introducing the ACL Anthology Helper, we aim to enhance researchers' ability to effectively access and organise literature from the ACL Anthology. This tool offers a convenient solution for researchers seeking to explore the ACL Anthology's vast collection of publications while allowing for more targeted and efficient literature retrieval.
摘要
ACL Anthology是一个在线存储库,它是自然语言处理(NLP)和计算语言学(CL)领域的完整收藏。这篇论文介绍了一种名为“ACL Anthology Helper”的工具。它自动将ACL Anthology中的文章和相关信息解析出来,并将其存储在本地的MySQL数据库中。这使得研究者可以使用各种操作(如“where”、“group”、“order”等)来高效管理本地文章。这个工具提供了超过20种操作,可以帮助研究者根据特定条件进行文献检索。尤其是,这个工具在写作一篇survey paper(Tang et al.,2022a)时得到了成功应用。我们通过引入ACL Anthology Helper,旨在增强研究者对ACL Anthology的文献检索和管理的能力。这个工具提供了一种方便的解决方案,帮助研究者更加高效地探索ACL Anthology的庞大文献收藏,并且允许更加Targeted和高效的文献检索。
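To make the kind of conditional retrieval described above concrete, here is a hedged sketch of querying a local MySQL database of anthology papers with "where", "group", and "order" conditions. The connection settings, table name, and column names are illustrative assumptions, not the tool's documented schema or API.

```python
# Hedged sketch: the kind of "where / group / order" retrieval enabled once papers and
# their meta-information sit in a local MySQL database. Credentials, table, and columns
# below are illustrative assumptions.
import mysql.connector

conn = mysql.connector.connect(
    host="localhost", user="acl", password="secret", database="acl_anthology"
)
cur = conn.cursor()

# e.g. count papers per year for a given venue, most recent first
cur.execute(
    """
    SELECT year, COUNT(*) AS n_papers
    FROM papers
    WHERE venue = %s AND year >= %s
    GROUP BY year
    ORDER BY year DESC
    """,
    ("ACL", 2018),
)
for year, n_papers in cur.fetchall():
    print(year, n_papers)

cur.close()
conn.close()
```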
Interpretable Neural PDE Solvers using Symbolic Frameworks
results: 研究人员通过对数据集进行符号框架和神经网络的组合,发现这种方法可以提高神经网络的解释性和准确性。Abstract
Partial differential equations (PDEs) are ubiquitous in the world around us, modelling phenomena from heat and sound to quantum systems. Recent advances in deep learning have resulted in the development of powerful neural solvers; however, while these methods have demonstrated state-of-the-art performance in both accuracy and computational efficiency, a significant challenge remains in their interpretability. Most existing methodologies prioritize predictive accuracy over clarity in the underlying mechanisms driving the model's decisions. Interpretability is crucial for trustworthiness and broader applicability, especially in scientific and engineering domains where neural PDE solvers might see the most impact. In this context, a notable gap in current research is the integration of symbolic frameworks (such as symbolic regression) into these solvers. Symbolic frameworks have the potential to distill complex neural operations into human-readable mathematical expressions, bridging the divide between black-box predictions and solutions.
摘要
偏微分方程(PDE)在我们周围的世界中无处不在,从热、声到量子系统都可以用其建模。深度学习的最新进展催生了强大的神经求解器;然而,尽管这些方法在精度和计算效率上都达到了最先进水平,其可解释性仍然是一个重大挑战。现有方法大多优先考虑预测精度,而非模型决策背后机制的清晰性。可解释性对于可信度和更广泛的适用性至关重要,特别是在神经 PDE 求解器最有可能产生影响的科学与工程领域。在这一背景下,当前研究的一个明显空白是将符号框架(如符号回归)整合进这些求解器。符号框架有潜力把复杂的神经运算提炼为人类可读的数学表达式,从而弥合黑箱预测与解之间的鸿沟。
AsGrad: A Sharp Unified Analysis of Asynchronous-SGD Algorithms
results: 论文提出了一种 unified convergence theory for non-convex smooth functions in heterogeneous regime, 并且证明了这种方法的性能可以与同步方法相比。Abstract
We analyze asynchronous-type algorithms for distributed SGD in the heterogeneous setting, where each worker has its own computation and communication speeds, as well as data distribution. In these algorithms, workers compute possibly stale and stochastic gradients associated with their local data at some iteration back in history and then return those gradients to the server without synchronizing with other workers. We present a unified convergence theory for non-convex smooth functions in the heterogeneous regime. The proposed analysis provides convergence for pure asynchronous SGD and its various modifications. Moreover, our theory explains what affects the convergence rate and what can be done to improve the performance of asynchronous algorithms. In particular, we introduce a novel asynchronous method based on worker shuffling. As a by-product of our analysis, we also demonstrate convergence guarantees for gradient-type algorithms such as SGD with random reshuffling and shuffle-once mini-batch SGD. The derived rates match the best-known results for those algorithms, highlighting the tightness of our approach. Finally, our numerical evaluations support theoretical findings and show the good practical performance of our method.
摘要
我们分析了异构环境下分布式 SGD 的异步类算法:每个工作节点都有自己的计算与通信速度以及数据分布。在这些算法中,工作节点针对本地数据计算可能陈旧(过时)的随机梯度(对应于历史上某个迭代点),然后在不与其他节点同步的情况下把梯度返回给服务器。我们为异构环境下的非凸光滑函数提出了一个统一的收敛理论,涵盖纯异步 SGD 及其多种变体。此外,我们的理论解释了哪些因素影响收敛速率,以及可以采取哪些措施提升异步算法的性能;特别地,我们引入了一种基于工作节点洗牌(shuffling)的新异步方法。作为分析的副产品,我们还给出了随机重排 SGD 和一次洗牌小批量 SGD 等梯度类算法的收敛保证,所得速率与这些算法已知的最优结果一致,显示了我们方法的紧致性。最后,数值实验支持了理论发现,并展示了我们方法良好的实际性能。
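The following toy simulation illustrates the asynchronous setting analyzed above: heterogeneous workers return stochastic gradients computed at stale iterates, and the server applies them as they arrive. It is a minimal sketch of the problem setup, not of the AsGrad method or its shuffling scheme.

```python
# Hedged sketch: toy asynchronous SGD with stale, noisy gradients on a simple quadratic.
import numpy as np

rng = np.random.default_rng(0)
d, n_workers, T = 5, 4, 2000
x_star = rng.normal(size=d)                 # minimizer of f(x) = 0.5 * ||x - x_star||^2
grad = lambda x: x - x_star

x = np.zeros(d)
lr = 0.05
delays = [1, 2, 4, 8]                       # heterogeneous worker speeds (staleness)
history = [x.copy()]

for t in range(T):
    w = t % n_workers                       # worker that reports at this step
    stale_iter = max(0, len(history) - 1 - delays[w])
    g = grad(history[stale_iter]) + 0.1 * rng.normal(size=d)   # stale stochastic gradient
    x = x - lr * g
    history.append(x.copy())

print("final error:", np.linalg.norm(x - x_star))
```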
Efficient Bayesian Learning Curve Extrapolation using Prior-Data Fitted Networks
results: 对于10万个人工生成的学习曲线,LC-PFN可以更准确地 aproximate posterior predictive distribution,并且比 MCMC 快上万倍;对于20000个真实学习曲线,LC-PFN可以达到竞争性的性能。Abstract
Learning curve extrapolation aims to predict model performance in later epochs of training, based on the performance in earlier epochs. In this work, we argue that, while the inherent uncertainty in the extrapolation of learning curves warrants a Bayesian approach, existing methods are (i) overly restrictive, and/or (ii) computationally expensive. We describe the first application of prior-data fitted neural networks (PFNs) in this context. A PFN is a transformer, pre-trained on data generated from a prior, to perform approximate Bayesian inference in a single forward pass. We propose LC-PFN, a PFN trained to extrapolate 10 million artificial right-censored learning curves generated from a parametric prior proposed in prior art using MCMC. We demonstrate that LC-PFN can approximate the posterior predictive distribution more accurately than MCMC, while being over 10 000 times faster. We also show that the same LC-PFN achieves competitive performance extrapolating a total of 20 000 real learning curves from four learning curve benchmarks (LCBench, NAS-Bench-201, Taskset, and PD1) that stem from training a wide range of model architectures (MLPs, CNNs, RNNs, and Transformers) on 53 different datasets with varying input modalities (tabular, image, text, and protein data). Finally, we investigate its potential in the context of model selection and find that a simple LC-PFN based predictive early stopping criterion obtains 2 - 6x speed-ups on 45 of these datasets, at virtually no overhead.
摘要
学习曲线外推的目标是基于训练早期的表现来预测模型在后续训练轮次中的表现。在这项工作中,我们认为,虽然学习曲线外推固有的不确定性需要贝叶斯方法,但现有方法要么(i)限制过多,要么(ii)计算代价高。我们首次在这一场景中应用先验数据拟合网络(prior-data fitted networks,PFN)。PFN 是一种在由先验生成的数据上预训练的 Transformer,可在单次前向传播中完成近似贝叶斯推断。我们提出 LC-PFN,它在由既有文献提出的参数先验生成的 1000 万条人工右删失学习曲线上训练。我们证明 LC-PFN 能比 MCMC 更准确地近似后验预测分布,同时速度快一万倍以上。我们还表明,同一个 LC-PFN 在来自四个学习曲线基准(LCBench、NAS-Bench-201、Taskset 和 PD1)的共 2 万条真实学习曲线上取得了有竞争力的外推性能;这些曲线来自在 53 个不同数据集(涵盖表格、图像、文本和蛋白质数据)上训练的多种模型结构(MLP、CNN、RNN 和 Transformer)。最后,我们研究了其在模型选择方面的潜力,发现一个基于 LC-PFN 的简单预测式提前停止准则可以在其中 45 个数据集上带来 2-6 倍的加速,且几乎没有额外开销。
Analyzing the Impact of Companies on AI Research Based on Publications
results: 研究发现,与企业合作撰写的AI论文在引用数量方面表现 significanly higher,并且在线上获得了更多的关注。Abstract
Artificial Intelligence (AI) is one of the most momentous technologies of our time. Thus, it is of major importance to know which stakeholders influence AI research. Besides researchers at universities and colleges, researchers in companies have hardly been considered in this context. In this article, we consider how the influence of companies on AI research can be made measurable on the basis of scientific publishing activities. We compare academic- and company-authored AI publications published in the last decade and use scientometric data from multiple scholarly databases to look for differences across these groups and to disclose the top contributing organizations. While the vast majority of publications is still produced by academia, we find that the citation count an individual publication receives is significantly higher when it is (co-)authored by a company. Furthermore, using a variety of altmetric indicators, we notice that publications with company participation receive considerably more attention online. Finally, we place our analysis results in a broader context and present targeted recommendations to safeguard a harmonious balance between academia and industry in the realm of AI research.
摘要
人工智能(AI)是当今最重要的技术之一,因此了解AI研究的各类涉及者对其影响非常重要。在这篇文章中,我们将研究如何使AI研究中公司的影响可衡量,基于科学出版活动。我们比较了过去十年的学术机构和企业合作撰写的人工智能论文,使用多种学术数据库的科学ometrics数据来找到这些组织的差异。虽然大多数论文仍然由学术机构出版,但我们发现,与公司合作者合写的论文的引用数量较高。此外,使用多种Altmetric指标,我们发现在线关注量较高。最后,我们将分析结果置于更广阔的背景下,并提供特点化的建议,以保持学术和产业在人工智能研究中的和谐协作。
Ontologies for Models and Algorithms in Applied Mathematics and Related Disciplines
paper_authors: Björn Schembera, Frank Wübbeling, Hendrik Kleikamp, Christine Biedinger, Jochen Fiedler, Marco Reidelbach, Aurela Shehu, Burkhard Schmidt, Thomas Koprucki, Dorothea Iglezakis, Dominik Göddeke
results: 通过使用ontology和知识图,可以准确地描述数学研究数据的结构和意义,提高了数学研究数据的可访问性和共享性。例如,通过使用ontology来描述微型扰动分析的数学模型和算法,可以增强对数学研究数据的理解和应用。Abstract
In applied mathematics and related disciplines, the modeling-simulation-optimization workflow is a prominent scheme, with mathematical models and numerical algorithms playing a crucial role. For these types of mathematical research data, the Mathematical Research Data Initiative has developed, merged and implemented ontologies and knowledge graphs. This contributes to making mathematical research data FAIR by introducing semantic technology and documenting the mathematical foundations accordingly. Using the concrete example of microfracture analysis of porous media, it is shown how the knowledge of the underlying mathematical model and the corresponding numerical algorithms for its solution can be represented by the ontologies.
摘要
在应用数学及相关学科中,建模-仿真-优化工作流程是一种常见的范式,数学模型与数值算法在其中扮演着关键角色。针对这类数学研究数据,数学研究数据倡议(Mathematical Research Data Initiative)开发、合并并实现了本体(ontology)与知识图谱。这通过引入语义技术并相应地记录其数学基础,有助于使数学研究数据符合 FAIR 原则。本文以多孔介质的微裂纹分析为具体例子,展示了如何用本体表示底层数学模型及其求解所用数值算法的知识。
Raising the ClaSS of Streaming Time Series Segmentation
results: 对两个大量数据集和六个实际数据存档进行实验评估,发现ClaSS比八种现有的竞争者更加精度,其空间和时间复杂度独立于分区大小,仅与滑动窗口大小有关。 ClaSS还被实现为Apache Flink流处理引擎中的窗口运算符,其吞吐量为538个数据点每秒。Abstract
Ubiquitous sensors today emit high frequency streams of numerical measurements that reflect properties of human, animal, industrial, commercial, and natural processes. Shifts in such processes, e.g. caused by external events or internal state changes, manifest as changes in the recorded signals. The task of streaming time series segmentation (STSS) is to partition the stream into consecutive variable-sized segments that correspond to states of the observed processes or entities. The partition operation itself must in performance be able to cope with the input frequency of the signals. We introduce ClaSS, a novel, efficient, and highly accurate algorithm for STSS. ClaSS assesses the homogeneity of potential partitions using self-supervised time series classification and applies statistical tests to detect significant change points (CPs). In our experimental evaluation using two large benchmarks and six real-world data archives, we found ClaSS to be significantly more precise than eight state-of-the-art competitors. Its space and time complexity is independent of segment sizes and linear only in the sliding window size. We also provide ClaSS as a window operator with an average throughput of 538 data points per second for the Apache Flink streaming engine.
摘要
如今无处不在的传感器持续发出高频的数值测量流,反映人类、动物、工业、商业和自然过程的属性。这些过程中的变化(例如由外部事件或内部状态改变引起)会体现在所记录的信号中。流式时间序列分割(STSS)的任务是把数据流划分为连续的、大小可变的分段,使其对应于被观察过程或实体的状态;分割操作本身在性能上必须能够跟上信号的输入频率。我们提出 ClaSS,一种新颖、高效且高精度的 STSS 算法。ClaSS 使用自监督时间序列分类来评估候选划分的同质性,并通过统计检验来识别显著的变化点(CP)。在两个大型基准和六个真实数据存档上的实验评估中,我们发现 ClaSS 比八种最新的竞争方法显著更精确。其空间和时间复杂度与分段大小无关,仅与滑动窗口大小成线性关系。我们还将 ClaSS 实现为 Apache Flink 流处理引擎的窗口算子,平均吞吐量为每秒 538 个数据点。
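To make the streaming segmentation task concrete, the sketch below slides a window over a synthetic stream and flags a change point when the two window halves differ significantly under Welch's t-test. This is a deliberately simple stand-in for illustration; ClaSS itself scores candidate splits with self-supervised time series classification.

```python
# Hedged sketch: a simple stand-in for streaming time series segmentation.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
stream = np.concatenate([rng.normal(0, 1, 300), rng.normal(3, 1, 300), rng.normal(0, 1, 300)])

window, alpha, cooldown = 100, 1e-6, 50
last_cp = -cooldown
change_points = []

for t in range(window, len(stream)):
    w = stream[t - window:t]
    left, right = w[: window // 2], w[window // 2:]
    _, p = stats.ttest_ind(left, right, equal_var=False)   # do the two halves differ?
    if p < alpha and t - last_cp > cooldown:
        change_points.append(t - window // 2)               # estimated split inside the window
        last_cp = t

print("estimated change points:", change_points)
```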
Investigating Relative Performance of Transfer and Meta Learning
results: 研究发现,在不同的数据训练量和模型结构下,元学习方法在跨类和预测任务上表现更好,而转移学习方法在类相似任务上表现更好。Abstract
Over the past decade, the field of machine learning has experienced remarkable advancements. While image recognition systems have achieved impressive levels of accuracy, they continue to rely on extensive training datasets. Additionally, a significant challenge has emerged in the form of poor out-of-distribution performance, which necessitates retraining neural networks when they encounter conditions that deviate from their training data. This limitation has notably contributed to the slow progress in self-driving car technology. These pressing issues have sparked considerable interest in methods that enable neural networks to learn effectively from limited data. This paper presents the outcomes of an extensive investigation designed to compare two distinct approaches, transfer learning and meta learning, as potential solutions to this problem. The overarching objective was to establish a robust criterion for selecting the most suitable method in diverse machine learning scenarios. Building upon prior research, I expanded the comparative analysis by introducing a new meta learning method into the investigation. Subsequently, I assessed whether the findings remained consistent under varying conditions. Finally, I delved into the impact of altering the size of the training dataset on the relative performance of these methods. This comprehensive exploration has yielded insights into the conditions favoring each approach, thereby facilitating the development of a criterion for selecting the most appropriate method in any given situation
摘要
过去十年,机器学习领域取得了显著的进步。虽然图像识别系统已经达到了很高的准确率,但它们仍然依赖大规模的训练数据。此外,还出现了一个重大挑战:分布外性能不佳,即当神经网络遇到偏离训练数据的条件时必须重新训练,这明显拖慢了自动驾驶技术的发展。这些紧迫的问题引发了人们对能让神经网络从有限数据中有效学习的方法的广泛兴趣。本文给出了一项大规模研究的结果,比较了迁移学习和元学习这两种不同的方法作为该问题的潜在解决方案,总体目标是在多样化的机器学习场景中建立一个选择最合适方法的可靠准则。在先前研究的基础上,我在比较分析中新引入了一种元学习方法,随后评估了这些结论在不同条件下是否依然成立,最后研究了训练数据集大小的变化如何影响这些方法的相对性能。这项全面的探索揭示了各种方法各自适用的条件,从而为在任意给定情形下选择最合适的方法提供了依据。
paper_authors: Benji Alwis, Nick Pears, Pengcheng Liu
for: 这篇论文提出了一种快速适应多视野感知系统 для robot 的新方法,可以将camera配置从基准设置中适应。
methods: 这篇论文使用meta-learning来精确地微调感知网络,保持政策网络不变。
results: 实验结果显示,这种方法可以将新训练集数量降低至基准性能水平。Abstract
This paper introduces a new approach for quickly adapting a multi-view visuomotor system for robots to varying camera configurations from the baseline setup. It utilises meta-learning to fine-tune the perceptual network while keeping the policy network fixed. Experimental results demonstrate a significant reduction in the number of new training episodes needed to attain baseline performance.
摘要
这篇论文介绍了一种新方法,能够让机器人的多视角视觉运动系统从基线配置快速适应不同的相机配置。它利用元学习来微调感知网络,同时保持策略网络不变。实验结果表明,达到基线性能所需的新训练回合数显著减少。
Multi-Base Station Cooperative Sensing with AI-Aided Tracking
paper_authors: Elia Favarelli, Elisabetta Matricardi, Lorenzo Pucci, Enrico Paolini, Wen Xu, Andrea Giorgetti
for: 这个研究旨在提高 JOINT SENSING AND COMMUNICATION(JSC)网络的性能,该网络由多个基站(BS)合作,通过协调中心(FC)交换探测环境信息,同时与多个用户设备(UE)建立通信链接。
methods: 每个BS在网络中 acted as a monostatic radar系统,对监测区域进行全面扫描,生成距离角图,提供目标位置信息。图像被FC进行融合,然后使用卷积神经网络(CNN)进行目标类别预测,并使用自适应归一化算法将探测来自同一个目标的检测分组更加有效。最后,使用潜在矩阵(PHD)筛选器和多 Бер诺利米xture(MBM)筛选器来估算目标的状态。
results: numerical results表明,我们的框架可以提供很好的探测性能,实现距离估计在60cm左右,同时为UE提供通信服务,减少了对通信容量的干扰在10%到20%之间。研究还表明,在特定的Case study中,使用3个BS进行探测可以保持 Localization error在1米左右。Abstract
In this work, we investigate the performance of a joint sensing and communication (JSC) network consisting of multiple base stations (BSs) that cooperate through a fusion center (FC) to exchange information about the sensed environment while concurrently establishing communication links with a set of user equipments (UEs). Each BS within the network operates as a monostatic radar system, enabling comprehensive scanning of the monitored area and generating range-angle maps that provide information regarding the position of a group of heterogeneous objects. The acquired maps are subsequently fused in the FC. Then, a convolutional neural network (CNN) is employed to infer the category of the targets, e.g., pedestrians or vehicles, and such information is exploited by an adaptive clustering algorithm to group the detections originating from the same target more effectively. Finally, two multi-target tracking algorithms, the probability hypothesis density (PHD) filter and multi-Bernoulli mixture (MBM) filter, are applied to estimate the state of the targets. Numerical results demonstrated that our framework could provide remarkable sensing performance, achieving an optimal sub-pattern assignment (OSPA) less than 60 cm, while keeping communication services to UEs with a reduction of the communication capacity in the order of 10% to 20%. The impact of the number of BSs engaged in sensing is also examined, and we show that in the specific case study, 3 BSs ensure a localization error below 1 m.
摘要
在这项工作中,我们研究了一个联合感知与通信(JSC)网络的性能:多个基站(BS)通过融合中心(FC)协作交换所感知环境的信息,同时与一组用户设备(UE)建立通信链路。网络中的每个基站作为单站(monostatic)雷达系统运行,可对监测区域进行全面扫描,并生成距离-角度图,提供一组异构目标的位置信息。所获得的图随后在 FC 中融合,再用卷积神经网络(CNN)推断目标类别(例如行人或车辆),并将该信息用于自适应聚类算法,以便更有效地把来自同一目标的检测归为一组。最后,我们采用概率假设密度(PHD)滤波器和多伯努利混合(MBM)滤波器这两种多目标跟踪算法来估计目标状态。数值结果表明,我们的框架能够提供出色的感知性能,最优子模式分配(OSPA)误差低于 60 cm,同时维持对 UE 的通信服务,通信容量仅下降约 10% 到 20%。我们还考察了参与感知的基站数量的影响,并表明在该具体案例研究中,3 个基站即可将定位误差保持在 1 米以下。
results: 该论文提供了一些有效和理论上有保证的配置方法,并证明了这些方法的运行时间上下界与理论下界相似,同时也通过实验证明了这些方法的性能。Abstract
We present the first nontrivial procedure for configuring heuristic algorithms to maximize the utility provided to their end users while also offering theoretical guarantees about performance. Existing procedures seek configurations that minimize expected runtime. However, very recent theoretical work argues that expected runtime minimization fails to capture algorithm designers' preferences. Here we show that the utilitarian objective also confers significant algorithmic benefits. Intuitively, this is because mean runtime is dominated by extremely long runs even when they are incredibly rare; indeed, even when an algorithm never gives rise to such long runs, configuration procedures that provably minimize mean runtime must perform a huge number of experiments to demonstrate this fact. In contrast, utility is bounded and monotonically decreasing in runtime, allowing for meaningful empirical bounds on a configuration's performance. This paper builds on this idea to describe effective and theoretically sound configuration procedures. We prove upper bounds on the runtime of these procedures that are similar to theoretical lower bounds, while also demonstrating their performance empirically.
摘要
我们提出了第一个非平凡的启发式算法配置过程,使其在为最终用户提供最大效用的同时,还拥有关于性能的理论保证。现有的配置过程寻找的是使期望运行时间最小的配置,但最新的理论工作指出,期望运行时间最小化并不能刻画算法设计者的偏好。我们在此证明,效用型(utilitarian)目标同样带来显著的算法优势。直观来看,即使极端长的运行极其罕见,平均运行时间也会被其主导;事实上,即便算法从不产生这类长运行,任何可证明最小化平均运行时间的配置过程也必须执行大量实验才能证实这一事实。相比之下,效用有界且随运行时间单调递减,因此可以对某一配置的性能给出有意义的经验界。本文基于这一思想,给出了有效且理论上可靠的配置过程。我们证明了这些过程运行时间的上界,其与理论下界相近,同时也通过实验展示了它们的性能。
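The toy experiment below illustrates the core argument: with a bounded, monotonically decreasing utility of runtime, rare extremely long runs cannot dominate the objective the way they dominate mean runtime. The specific utility function and runtime distributions are illustrative assumptions, not the paper's definitions.

```python
# Hedged sketch: bounded utility vs. mean runtime under rare pathological runs.
import numpy as np

rng = np.random.default_rng(0)
cap = 10.0                                            # runtime budget used by the toy utility
utility = lambda t: np.maximum(0.0, 1.0 - t / cap)    # bounded, decreasing in runtime

# two configurations: B is usually faster but very rarely stalls for a long time
runs_a = rng.exponential(2.0, size=100_000)
runs_b = rng.exponential(1.0, size=100_000)
runs_b[rng.random(100_000) < 1e-3] = 5_000.0          # rare pathological runs

print("mean runtime   A=%.2f  B=%.2f" % (runs_a.mean(), runs_b.mean()))
print("mean utility   A=%.3f  B=%.3f" % (utility(runs_a).mean(), utility(runs_b).mean()))
```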
Do large language models solve verbal analogies like children do?
paper_authors: Claire E. Stevenson, Mathilde ter Veen, Rochelle Choenni, Han L. J. van der Maas, Ekaterina Shutova
for: investigate whether large language models (LLMs) solve verbal analogies using associations, similar to what children do.
methods: used verbal analogies extracted from an online adaptive learning environment, where 14,002 7-12 year olds from the Netherlands solved 622 analogies in Dutch.
results: the six tested Dutch monolingual and multilingual LLMs performed around the same level as children, with MGPT performing worst, and XLM-V and GPT-3 the best. However, when controlling for associative processes, each model’s performance level drops 1-2 years.Abstract
Analogy-making lies at the heart of human cognition. Adults solve analogies such as \textit{Horse belongs to stable like chicken belongs to ...?} by mapping relations (\textit{kept in}) and answering \textit{chicken coop}. In contrast, children often use association, e.g., answering \textit{egg}. This paper investigates whether large language models (LLMs) solve verbal analogies in A:B::C:? form using associations, similar to what children do. We use verbal analogies extracted from an online adaptive learning environment, where 14,002 7-12 year-olds from the Netherlands solved 622 analogies in Dutch. The six tested Dutch monolingual and multilingual LLMs performed around the same level as children, with MGPT performing worst, around the 7-year-old level, and XLM-V and GPT-3 the best, slightly above the 11-year-old level. However, when we control for associative processes this picture changes and each model's performance level drops 1-2 years. Further experiments demonstrate that associative processes often underlie correctly solved analogies. We conclude that the LLMs we tested indeed tend to solve verbal analogies by association with C like children do.
摘要
类比推理是人类认知的核心。成年人解决诸如“马之于马厩,如同鸡之于……?”的类比时,会映射关系(“被圈养于”)并回答“鸡舍”;而儿童往往依赖联想,例如回答“鸡蛋”。本文研究大语言模型(LLM)在解决 A:B::C:? 形式的文字类比时,是否也像儿童一样依赖联想。我们使用从一个在线自适应学习环境中提取的文字类比题:14,002 名 7-12 岁的荷兰儿童用荷兰语解答了 622 道类比题。所测试的六个荷兰语单语与多语 LLM 的总体表现与儿童相当,其中 MGPT 表现最差,约相当于 7 岁水平,而 XLM-V 和 GPT-3 表现最好,略高于 11 岁水平。然而,当我们控制联想过程后,情况发生变化,每个模型的表现水平下降 1-2 岁。进一步的实验表明,被正确解答的类比往往依赖于联想过程。我们的结论是:所测试的 LLM 确实和儿童一样,倾向于通过与 C 的联想来解决文字类比。
A Comprehensive Study of GPT-4V’s Multimodal Capabilities in Medical Imaging
results: GPT-4V在医学VQA任务中能够分辨问题类型,但与现有标准准剂相比,准确率不高。此外,我们发现了传统评价指标如BLEU分数的局限性,提出了开发更加Semantic robust的评价方法的需要。在视觉固定任务中,GPT-4V表现出初步的承诺,但精度不高,特别是在特定的医学器官和标志的识别方面。Abstract
This paper presents a comprehensive evaluation of GPT-4V's capabilities across diverse medical imaging tasks, including Radiology Report Generation, Medical Visual Question Answering (VQA), and Visual Grounding. While prior efforts have explored GPT-4V's performance in medical imaging, to the best of our knowledge, our study represents the first quantitative evaluation on publicly available benchmarks. Our findings highlight GPT-4V's potential in generating descriptive reports for chest X-ray images, particularly when guided by well-structured prompts. However, its performance on the MIMIC-CXR dataset benchmark reveals areas for improvement in certain evaluation metrics, such as CIDEr. In the domain of Medical VQA, GPT-4V demonstrates proficiency in distinguishing between question types but falls short of prevailing benchmarks in terms of accuracy. Furthermore, our analysis finds the limitations of conventional evaluation metrics like the BLEU score, advocating for the development of more semantically robust assessment methods. In the field of Visual Grounding, GPT-4V exhibits preliminary promise in recognizing bounding boxes, but its precision is lacking, especially in identifying specific medical organs and signs. Our evaluation underscores the significant potential of GPT-4V in the medical imaging domain, while also emphasizing the need for targeted refinements to fully unlock its capabilities.
摘要
A Machine Learning-Based Framework for Clustering Residential Electricity Load Profiles to Enhance Demand Response Programs
results: 根据分 clustering algorithm 分析,这个案例的最佳 cluster 数量为七个,但是我们的方法发现,这七个cluster中有两个cluster,即10%的数据集,具有明显的内部不一致,因此我们将其分为九个cluster。这显示了我们的方法的标准化和多样性。Abstract
Load shapes derived from smart meter data are frequently employed to analyze daily energy consumption patterns, particularly in the context of applications like Demand Response (DR). Nevertheless, one of the most important challenges to this endeavor lies in identifying the most suitable consumer clusters with similar consumption behaviors. In this paper, we present a novel machine learning based framework in order to achieve optimal load profiling through a real case study, utilizing data from almost 5000 households in London. Four widely used clustering algorithms are applied specifically K-means, K-medoids, Hierarchical Agglomerative Clustering and Density-based Spatial Clustering. An empirical analysis as well as multiple evaluation metrics are leveraged to assess those algorithms. Following that, we redefine the problem as a probabilistic classification one, with the classifier emulating the behavior of a clustering algorithm,leveraging Explainable AI (xAI) to enhance the interpretability of our solution. According to the clustering algorithm analysis the optimal number of clusters for this case is seven. Despite that, our methodology shows that two of the clusters, almost 10\% of the dataset, exhibit significant internal dissimilarity and thus it splits them even further to create nine clusters in total. The scalability and versatility of our solution makes it an ideal choice for power utility companies aiming to segment their users for creating more targeted Demand Response programs.
摘要
从智能电表数据中得到的负荷曲线形状常被用于分析每日用电模式,尤其是在需求响应(DR)等应用场景中。然而,这项工作最重要的挑战之一,是找出用电行为相似、最合适的用户群组。在这篇论文中,我们提出了一种新的基于机器学习的框架,通过一个真实案例研究(使用伦敦近 5000 户家庭的数据)实现最优的负荷画像。我们具体应用了四种广泛使用的聚类算法:K-means、K-medoids、层次凝聚聚类和基于密度的空间聚类,并借助实证分析和多种评价指标来评估这些算法。随后,我们把该问题重新表述为一个概率分类问题,由分类器模拟聚类算法的行为,并利用可解释 AI(xAI)来提升方案的可解释性。根据聚类算法的分析,本案例的最佳聚类数为七个。尽管如此,我们的方法显示其中两个簇(约占数据集的 10%)存在显著的内部差异,因此将其进一步拆分,最终共得到九个簇。我们解决方案的可扩展性和通用性,使其成为电力公司为制定更有针对性的需求响应计划而进行用户细分的理想选择。
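A simplified sketch of the described pipeline is given below: daily load profiles are clustered with K-means, the number of clusters is chosen with the silhouette index, and clusters with high internal dissimilarity are flagged for further splitting. The synthetic profiles and thresholds are illustrative; the paper's full framework additionally uses K-medoids, hierarchical and density-based clustering plus an xAI-backed classifier.

```python
# Hedged sketch: cluster daily load profiles, pick k, flag heterogeneous clusters.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
profiles = np.vstack([                       # 600 households x 24 hourly readings
    rng.normal(loc=np.sin(np.linspace(0, 2 * np.pi, 24)) + m, scale=0.3, size=(200, 24))
    for m in (0.0, 1.5, 3.0)
])

scores = {}
for k in range(2, 10):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(profiles)
    scores[k] = silhouette_score(profiles, labels)
best_k = max(scores, key=scores.get)

km = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(profiles)
for c in range(best_k):
    members = profiles[km.labels_ == c]
    spread = np.linalg.norm(members - km.cluster_centers_[c], axis=1).mean()
    if spread > 2.0:                         # illustrative dissimilarity threshold
        print(f"cluster {c} is internally heterogeneous; consider splitting it")
print("chosen number of clusters:", best_k)
```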
Mathematical Introduction to Deep Learning: Methods, Implementations, and Theory
paper_authors: Arnulf Jentzen, Benno Kuckuck, Philippe von Wurstemberger
for: This paper provides an introduction to deep learning algorithms, covering essential components such as ANN architectures and optimization algorithms, as well as theoretical aspects like approximation capacities and generalization errors.
methods: The paper reviews various deep learning methods, including fully-connected feedforward ANNs, convolutional ANNs, recurrent ANNs, residual ANNs, and ANNs with batch normalization. It also covers optimization algorithms such as SGD, accelerated methods, and adaptive methods.
results: The paper provides a solid foundation in deep learning algorithms for students and scientists who are new to the field, and offers a firmer mathematical understanding of the objects and methods considered in deep learning for practitioners. Additionally, it reviews deep learning approximation methods for PDEs, including physics-informed neural networks (PINNs) and deep Galerkin methods.Abstract
This book aims to provide an introduction to the topic of deep learning algorithms. We review essential components of deep learning algorithms in full mathematical detail including different artificial neural network (ANN) architectures (such as fully-connected feedforward ANNs, convolutional ANNs, recurrent ANNs, residual ANNs, and ANNs with batch normalization) and different optimization algorithms (such as the basic stochastic gradient descent (SGD) method, accelerated methods, and adaptive methods). We also cover several theoretical aspects of deep learning algorithms such as approximation capacities of ANNs (including a calculus for ANNs), optimization theory (including Kurdyka-{\L}ojasiewicz inequalities), and generalization errors. In the last part of the book some deep learning approximation methods for PDEs are reviewed including physics-informed neural networks (PINNs) and deep Galerkin methods. We hope that this book will be useful for students and scientists who do not yet have any background in deep learning at all and would like to gain a solid foundation as well as for practitioners who would like to obtain a firmer mathematical understanding of the objects and methods considered in deep learning.
摘要
这本书旨在介绍深度学习算法这一主题。我们以完整的数学细节回顾了深度学习算法的核心组成部分,包括各种人工神经网络(ANN)架构(如全连接前馈 ANN、卷积 ANN、循环 ANN、残差 ANN 以及带批归一化的 ANN)和各类优化算法(如基本的随机梯度下降(SGD)方法、加速方法和自适应方法)。我们还讨论了深度学习算法的若干理论方面,例如 ANN 的逼近能力(包括一套面向 ANN 的演算)、优化理论(包括 Kurdyka-Łojasiewicz 不等式)以及泛化误差。在本书的最后部分,我们回顾了一些用于求解偏微分方程(PDE)的深度学习近似方法,包括物理信息神经网络(PINN)和深度 Galerkin 方法。我们希望这本书既能为尚无任何深度学习背景、希望打下坚实基础的学生和科研人员所用,也能帮助实践者对深度学习中所考虑的对象与方法获得更扎实的数学理解。
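As a small companion to the optimization methods the book reviews, here is a minimal NumPy sketch of the basic stochastic gradient descent method applied to a one-hidden-layer fully-connected network on a toy regression problem; the architecture and hyperparameters are illustrative choices.

```python
# Hedged sketch: plain SGD on a one-hidden-layer fully-connected network.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(512, 1))
y = np.sin(np.pi * X)

W1, b1 = rng.normal(scale=0.5, size=(1, 32)), np.zeros(32)
W2, b2 = rng.normal(scale=0.5, size=(32, 1)), np.zeros(1)
lr, batch = 0.05, 32

for step in range(5000):
    idx = rng.integers(0, len(X), size=batch)
    xb, yb = X[idx], y[idx]
    h = np.tanh(xb @ W1 + b1)                 # forward pass
    pred = h @ W2 + b2
    err = pred - yb                           # d(loss)/d(pred) for 0.5 * MSE
    gW2, gb2 = h.T @ err / batch, err.mean(0)
    dh = (err @ W2.T) * (1 - h ** 2)          # backprop through tanh
    gW1, gb1 = xb.T @ dh / batch, dh.mean(0)
    W1, b1, W2, b2 = W1 - lr * gW1, b1 - lr * gb1, W2 - lr * gW2, b2 - lr * gb2

print("final training MSE:", float(np.mean((np.tanh(X @ W1 + b1) @ W2 + b2 - y) ** 2)))
```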
Enhancing the Spatial Awareness Capability of Multi-Modal Large Language Model
results: 经过广泛的实验 validate the proposed method can effectively enhance the spatial awareness tasks and associated tasks of MLLM.Abstract
The Multi-Modal Large Language Model (MLLM) refers to an extension of the Large Language Model (LLM) equipped with the capability to receive and infer multi-modal data. Spatial awareness stands as one of the crucial abilities of MLLM, encompassing diverse skills related to understanding spatial relationships among objects and between objects and the scene area. Industries such as autonomous driving, smart healthcare, robotics, virtual, and augmented reality heavily demand MLLM's spatial awareness capabilities. However, there exists a noticeable gap between the current spatial awareness capabilities of MLLM and the requirements set by human needs. To address this issue, this paper proposes using more precise spatial position information between objects to guide MLLM in providing more accurate responses to user-related inquiries. Specifically, for a particular multi-modal task, we utilize algorithms for acquiring geometric spatial information and scene graphs to obtain relevant geometric spatial information and scene details of objects involved in the query. Subsequently, based on this information, we direct MLLM to address spatial awareness-related queries posed by the user. Extensive experiments were conducted in benchmarks such as MME, MM-Vet, and other multi-modal large language models. The experimental results thoroughly confirm the efficacy of the proposed method in enhancing the spatial awareness tasks and associated tasks of MLLM.
摘要
多模态大语言模型(MLLM)是指在大语言模型(LLM)的基础上扩展出接收并推理多模态数据能力的模型。空间感知是 MLLM 的关键能力之一,涵盖理解物体之间以及物体与场景区域之间空间关系的多种技能。自动驾驶、智慧医疗、机器人、虚拟现实和增强现实等行业都对 MLLM 的空间感知能力有强烈需求。然而,MLLM 目前的空间感知能力与人类需求之间仍存在明显差距。为了解决这一问题,本文提出利用物体之间更精确的空间位置信息来引导 MLLM,使其对用户相关的问题给出更准确的回答。具体而言,对于某个多模态任务,我们利用获取几何空间信息的算法和场景图,得到查询所涉及物体的几何空间信息与场景细节;随后基于这些信息,引导 MLLM 回答用户提出的与空间感知相关的问题。我们在 MME、MM-Vet 等多模态大语言模型基准上进行了广泛的实验,结果充分验证了所提方法在增强 MLLM 空间感知任务及相关任务上的有效性。
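The sketch below illustrates one way precise spatial information could be injected as textual cues before querying an MLLM: pairwise relations are derived from bounding boxes and prepended to the user question. The detections, relation rules, and prompt wording are illustrative assumptions rather than the paper's exact scene-graph construction.

```python
# Hedged sketch: turn detected bounding boxes into explicit spatial cues for an MLLM prompt.
def centre(box):                       # box = (x1, y1, x2, y2) in image coordinates
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)

def relation(a, b):
    (ax, ay), (bx, by) = centre(a), centre(b)
    horiz = "left of" if ax < bx else "right of"
    vert = "above" if ay < by else "below"
    return f"{horiz} and {vert}"

detections = {                         # hypothetical detector output
    "cup":    (120, 310, 180, 380),
    "laptop": (300, 260, 620, 420),
    "lamp":   (650, 40, 720, 200),
}

cues = [f"The {a} is {relation(detections[a], detections[b])} the {b}."
        for a in detections for b in detections if a < b]

user_question = "Which object should I move to reach the laptop more easily?"
prompt = "Scene layout: " + " ".join(cues) + "\nQuestion: " + user_question
print(prompt)
```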
Muscle volume quantification: guiding transformers with anatomical priors
results: 实验结果显示,这种混合模型可以从较小的数据库中训练出高精度的预测,而且邻接矩阵优化损失函数可以提高预测的精度。Abstract
Muscle volume is a useful quantitative biomarker in sports, but also for the follow-up of degenerative musculo-skelletal diseases. In addition to volume, other shape biomarkers can be extracted by segmenting the muscles of interest from medical images. Manual segmentation is still today the gold standard for such measurements despite being very time-consuming. We propose a method for automatic segmentation of 18 muscles of the lower limb on 3D Magnetic Resonance Images to assist such morphometric analysis. By their nature, the tissue of different muscles is undistinguishable when observed in MR Images. Thus, muscle segmentation algorithms cannot rely on appearance but only on contour cues. However, such contours are hard to detect and their thickness varies across subjects. To cope with the above challenges, we propose a segmentation approach based on a hybrid architecture, combining convolutional and visual transformer blocks. We investigate for the first time the behaviour of such hybrid architectures in the context of muscle segmentation for shape analysis. Considering the consistent anatomical muscle configuration, we rely on transformer blocks to capture the longrange relations between the muscles. To further exploit the anatomical priors, a second contribution of this work consists in adding a regularisation loss based on an adjacency matrix of plausible muscle neighbourhoods estimated from the training data. Our experimental results on a unique database of elite athletes show it is possible to train complex hybrid models from a relatively small database of large volumes, while the anatomical prior regularisation favours better predictions.
摘要
肌肉体积是运动领域中一个有用的定量生物标志物,同时也可用于退行性肌肉骨骼疾病的随访。除体积外,通过对目标肌肉在医学图像中进行分割,还可以提取其他形状生物标志物。尽管非常耗时,人工分割至今仍是此类测量的金标准。我们提出了一种在三维磁共振图像上自动分割 18 块下肢肌肉的方法,以辅助此类形态计量分析。从本质上讲,不同肌肉的组织在磁共振图像中难以区分,因此肌肉分割算法不能依赖外观信息,只能依赖轮廓线索;然而这些轮廓难以检测,且其厚度因受试者而异。为应对上述挑战,我们提出了一种基于混合架构的分割方法,结合卷积模块与视觉 Transformer 模块。我们首次在面向形状分析的肌肉分割场景中研究了此类混合架构的行为。考虑到肌肉解剖构型的一致性,我们依靠 Transformer 模块来捕捉肌肉之间的长程关系。为进一步利用解剖学先验,本工作的第二个贡献是加入一个基于邻接矩阵的正则化损失,该邻接矩阵由训练数据估计肌肉之间合理的相邻关系。我们在一个由精英运动员构成的独特数据库上的实验结果表明,即使数据库相对较小(但图像体量较大),也能训练复杂的混合模型,而解剖学先验正则化有助于得到更好的预测。
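Below is a hedged PyTorch sketch of how an adjacency prior over labels could enter the loss: probability mass placed on neighbouring-voxel label pairs that the anatomical adjacency matrix declares implausible is penalized. The exact formulation of the paper's regularisation term may differ; this is one plausible instantiation on 2-D slices.

```python
# Hedged sketch: adjacency-prior regularisation for a segmentation network.
import torch

def adjacency_regulariser(probs, allowed):
    """
    probs:   (B, C, H, W) softmax output of the segmentation network
    allowed: (C, C) binary matrix, 1 where two labels may plausibly touch
    Penalises probability mass on neighbouring-pixel label pairs declared implausible.
    (Wrap-around at the border introduced by torch.roll is ignored for brevity.)
    """
    penalty = 0.0
    for shift_dim in (2, 3):                                   # right and down neighbours
        p = probs
        q = torch.roll(probs, shifts=-1, dims=shift_dim)
        # expected co-occurrence of labels (i, j) on neighbouring pixels: (B, C, C)
        co = torch.einsum("bchw,bdhw->bcd", p, q) / (p.shape[2] * p.shape[3])
        penalty = penalty + (co * (1.0 - allowed)).sum(dim=(1, 2)).mean()
    return penalty

B, C, H, W = 2, 5, 64, 64
probs = torch.softmax(torch.randn(B, C, H, W), dim=1)
allowed = torch.eye(C)
allowed[0, 1] = allowed[1, 0] = 1.0        # e.g. labels 0 and 1 may share a border
print("adjacency penalty:", float(adjacency_regulariser(probs, allowed)))
```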
Combining Shape Completion and Grasp Prediction for Fast and Versatile Grasping with a Multi-Fingered Hand
results: 实验表明,该pipeline能够成功地抓取各种家用物品,只需要单个视角点云图像。整个管道快速,只需要约1秒完成物体形状预测(0.7秒)和生成1000个抓取(0.3秒)。Abstract
Grasping objects with limited or no prior knowledge about them is a highly relevant skill in assistive robotics. Still, in this general setting, it has remained an open problem, especially when it comes to only partial observability and versatile grasping with multi-fingered hands. We present a novel, fast, and high fidelity deep learning pipeline consisting of a shape completion module that is based on a single depth image, and followed by a grasp predictor that is based on the predicted object shape. The shape completion network is based on VQDIF and predicts spatial occupancy values at arbitrary query points. As grasp predictor, we use our two-stage architecture that first generates hand poses using an autoregressive model and then regresses finger joint configurations per pose. Critical factors turn out to be sufficient data realism and augmentation, as well as special attention to difficult cases during training. Experiments on a physical robot platform demonstrate successful grasping of a wide range of household objects based on a depth image from a single viewpoint. The whole pipeline is fast, taking only about 1 s for completing the object's shape (0.7 s) and generating 1000 grasps (0.3 s).
摘要
在对物体几乎没有先验知识的情况下进行抓取,是辅助机器人领域中一项高度相关的技能。然而在这种通用设定下,尤其是只有部分可观测并需要多指手进行多样化抓取时,该问题仍未解决。我们提出一种新颖、快速且高保真的深度学习流水线,包括基于单张深度图像的形状补全模块,以及基于预测物体形状的抓取预测器。形状补全网络基于 VQDIF,可在任意查询点预测空间占据值。抓取预测器采用我们的两阶段架构:先用自回归模型生成手部位姿,再针对每个位姿回归手指关节配置。关键因素在于足够真实的数据与数据增广,以及训练中对困难样本的特别关注。在真实机器人平台上的实验表明,仅凭单一视角的深度图像即可成功抓取多种家用物品。整个流水线很快,总共约 1 秒:完成物体形状补全约 0.7 秒,生成 1000 个抓取约 0.3 秒。
Improving Entropy-Based Test-Time Adaptation from a Clustering View
results: 通过对EBTTA方法进行cluster视角的解释,提供了更深刻的理解EBTTA的机制,并且提出了一些改进EBTTA的方法,包括Robust label assignment、重量调整和梯度积累,实验结果表明我们的方法可以在多个 dataset 上 achieve consistent improvement。Abstract
Domain shift is a common problem in the realistic world, where training data and test data follow different data distributions. To deal with this problem, fully test-time adaptation (TTA) leverages the unlabeled data encountered during test time to adapt the model. In particular, Entropy-Based TTA (EBTTA) methods, which minimize the prediction's entropy on test samples, have shown great success. In this paper, we introduce a new perspective on the EBTTA, which interprets these methods from a view of clustering. It is an iterative algorithm: 1) in the assignment step, the forward process of the EBTTA models is the assignment of labels for these test samples, and 2) in the updating step, the backward process is the update of the model via the assigned samples. Based on the interpretation, we can gain a deeper understanding of EBTTA, where we show that the entropy loss would further increase the largest probability. Accordingly, we offer an alternative explanation that why existing EBTTA methods are sensitive to initial assignments, outliers, and batch size. This observation can guide us to put forward the improvement of EBTTA. We propose robust label assignment, weight adjustment, and gradient accumulation to alleviate the above problems. Experimental results demonstrate that our method can achieve consistent improvements on various datasets. Code is provided in the supplementary material.
摘要
域偏移是现实世界中的常见问题:训练数据与测试数据服从不同的数据分布。为了解决这一问题,完全测试时自适应(TTA)利用测试阶段遇到的无标签数据来调整模型。其中,基于熵的 TTA(EBTTA)方法通过最小化测试样本预测的熵取得了很大成功。本文从聚类的视角为 EBTTA 提供了新的解读:它是一个迭代算法,1)在分配步骤中,EBTTA 模型的前向过程相当于为测试样本分配标签;2)在更新步骤中,反向过程相当于利用被分配的样本更新模型。基于这一解读,我们可以更深入地理解 EBTTA,并说明熵损失会进一步增大最大的预测概率。据此,我们给出了另一种解释,说明为什么现有 EBTTA 方法对初始分配、离群点和批大小十分敏感。这一观察指导我们提出对 EBTTA 的改进:鲁棒的标签分配、权重调整和梯度累积,以缓解上述问题。实验结果表明,我们的方法能在多个数据集上取得一致的提升。代码见补充材料。
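For reference, the sketch below shows the Tent-style entropy-minimisation step that EBTTA methods build on, combined with gradient accumulation over several test batches, one of the stabilising ingredients discussed above. The model, the choice of which parameters to adapt, and the hyperparameters are illustrative, not the paper's exact recipe.

```python
# Hedged sketch: entropy-minimisation test-time adaptation with gradient accumulation.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10))

# adapt only the BatchNorm affine parameters, keep everything else frozen
params = []
for m in model.modules():
    if isinstance(m, nn.BatchNorm2d):
        m.requires_grad_(True)
        params += [m.weight, m.bias]
    else:
        for p in m.parameters(recurse=False):
            p.requires_grad_(False)

opt = torch.optim.SGD(params, lr=1e-3)
accum_steps = 4                               # accumulate gradients over 4 test batches

model.train()
for step in range(8):
    x = torch.randn(32, 3, 32, 32)            # stands in for an unlabeled test batch
    probs = model(x).softmax(dim=1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()
    (entropy / accum_steps).backward()
    if (step + 1) % accum_steps == 0:
        opt.step()
        opt.zero_grad()
```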
SemanticBoost: Elevating Motion Generation with Augmented Textual Cues
for: 本文 targets at addressing the difficulties in generating motions from intricate semantic descriptions, such as insufficient semantic annotations in datasets and weak contextual understanding.
results: 对 Humanml3D 数据集进行实验表明,SemanticBoost 比 auto-regressive-based 技术高效,实现了最新的性能,而且保持了实际和平滑的运动生成质量。Abstract
Current techniques face difficulties in generating motions from intricate semantic descriptions, primarily due to insufficient semantic annotations in datasets and weak contextual understanding. To address these issues, we present SemanticBoost, a novel framework that tackles both challenges simultaneously. Our framework comprises a Semantic Enhancement module and a Context-Attuned Motion Denoiser (CAMD). The Semantic Enhancement module extracts supplementary semantics from motion data, enriching the dataset's textual description and ensuring precise alignment between text and motion data without depending on large language models. On the other hand, the CAMD approach provides an all-encompassing solution for generating high-quality, semantically consistent motion sequences by effectively capturing context information and aligning the generated motion with the given textual descriptions. Distinct from existing methods, our approach can synthesize accurate orientational movements, combined motions based on specific body part descriptions, and motions generated from complex, extended sentences. Our experimental results demonstrate that SemanticBoost, as a diffusion-based method, outperforms auto-regressive-based techniques, achieving cutting-edge performance on the Humanml3D dataset while maintaining realistic and smooth motion generation quality.
摘要
当前技术面临着从复杂的 semantic 描述中生成动作的困难,主要是因为数据集中的 semantic 注解不够和 contextual 理解不强。为解决这些问题,我们提出 SemanticBoost 框架,这个框架同时解决了这两个问题。我们的框架包括 semantic 增强模块和 context-attuned motion denoiser (CAMD)。semantic 增强模块从动作数据中提取更多的 semantics,使 dataset 的文本描述更加详细,从而确保文本和动作数据之间的精确对应,不需要依赖于大型语言模型。在另一方面,CAMD 方法提供了一个涵盖全的解决方案,可以生成高质量、semantic 一致的动作序列,通过有效地捕捉 context information 和将生成的动作与给定的文本描述进行对应。与现有方法不同,我们的方法可以生成准确的 orientational 运动、基于specific body part 的合并运动和从复杂、扩展的句子中生成的动作。我们的实验结果表明,SemanticBoost 作为一种 diffusion-based 方法,在 Humanml3D dataset 上表现出了 cutting-edge 性能,同时保持了真实和平滑的动作生成质量。
Theory of Mind in Large Language Models: Examining Performance of 11 State-of-the-Art models vs. Children Aged 7-10 on Advanced Tests
results: 研究发现,GPT 家族的指导训练 LLM 表现出色,常常也超过了7-10岁的儿童的表现。基础 LLM 通常无法解决 ToM 任务,即使使用特定的提示。研究认为,语言和 ToM 的演化和发展可能帮助解释 instruciton-tuning 添加了什么:奖励合作交流,考虑到对话者和场景。Abstract
To what degree should we ascribe cognitive capacities to Large Language Models (LLMs), such as the ability to reason about intentions and beliefs known as Theory of Mind (ToM)? Here we add to this emerging debate by (i) testing 11 base- and instruction-tuned LLMs on capabilities relevant to ToM beyond the dominant false-belief paradigm, including non-literal language usage and recursive intentionality; (ii) using newly rewritten versions of standardized tests to gauge LLMs' robustness; (iii) prompting and scoring for open besides closed questions; and (iv) benchmarking LLM performance against that of children aged 7-10 on the same tasks. We find that instruction-tuned LLMs from the GPT family outperform other models, and often also children. Base-LLMs are mostly unable to solve ToM tasks, even with specialized prompting. We suggest that the interlinked evolution and development of language and ToM may help explain what instruction-tuning adds: rewarding cooperative communication that takes into account interlocutor and context. We conclude by arguing for a nuanced perspective on ToM in LLMs.
摘要
我们应在多大程度上把认知能力归于大语言模型(LLM),例如对意图与信念进行推理的心智理论(ToM)能力?我们为这一新兴争论补充了以下工作:(i)在主流的错误信念范式之外,测试 11 个基础与指令微调 LLM 与 ToM 相关的能力,包括非字面语言使用和递归意向性;(ii)使用重新改写的标准化测验来衡量 LLM 的稳健性;(iii)除封闭式问题外也对开放式问题进行提示与评分;(iv)将 LLM 的表现与 7-10 岁儿童在相同任务上的表现进行对比。我们发现,GPT 家族的指令微调 LLM 表现优于其他模型,且常常也优于儿童;而基础 LLM 即便使用专门的提示,也大多无法完成 ToM 任务。我们认为,语言与 ToM 相互关联的演化与发展或许有助于解释指令微调所增添的能力:奖励考虑对话者与情境的合作式交流。最后,我们主张以更细致的视角看待 LLM 中的 ToM。
Causal Interpretation of Self-Attention in Pre-Trained Transformers
results: 研究发现,使用这种方法可以在不知情的情况下学习输入序列的 causal 结构,并且可以使用现有的约束基于算法来完成零基础 causal 探索。 Additionally, the authors demonstrate the effectiveness of their method on two tasks: sentiment classification (NLP) and recommendation.Abstract
We propose a causal interpretation of self-attention in the Transformer neural network architecture. We interpret self-attention as a mechanism that estimates a structural equation model for a given input sequence of symbols (tokens). The structural equation model can be interpreted, in turn, as a causal structure over the input symbols under the specific context of the input sequence. Importantly, this interpretation remains valid in the presence of latent confounders. Following this interpretation, we estimate conditional independence relations between input symbols by calculating partial correlations between their corresponding representations in the deepest attention layer. This enables learning the causal structure over an input sequence using existing constraint-based algorithms. In this sense, existing pre-trained Transformers can be utilized for zero-shot causal-discovery. We demonstrate this method by providing causal explanations for the outcomes of Transformers in two tasks: sentiment classification (NLP) and recommendation.
摘要
我们为 Transformer 神经网络架构中的自注意力提出了一种因果解释。我们把自注意力视为一种机制,它为给定的输入符号(token)序列估计一个结构方程模型;该结构方程模型又可以被解释为在该输入序列的特定语境下、输入符号之间的因果结构。重要的是,即使存在潜在混杂因素,这一解释仍然成立。基于这一解释,我们通过计算最深注意力层中对应表示之间的偏相关,来估计输入符号之间的条件独立关系。这使得我们可以利用现有的基于约束的算法来学习输入序列上的因果结构。在这个意义上,现有的预训练 Transformer 可以直接用于零样本因果发现。我们在两个任务(情感分类(NLP)与推荐)上为 Transformer 的输出给出因果解释,以演示该方法。
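The following NumPy sketch shows the quantity the paper relies on: partial correlations between input tokens computed from their deepest-layer representations, used as a surrogate for conditional independence. Random matrices stand in for real Transformer representations, so the numbers themselves are meaningless; only the computation is illustrated.

```python
# Hedged sketch: partial correlations between tokens from their layer representations.
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d_model = 8, 64
reps = rng.normal(size=(n_tokens, d_model))          # one row per input token

cov = np.cov(reps)                                   # (n_tokens, n_tokens) across features
precision = np.linalg.pinv(cov)

dd = np.sqrt(np.diag(precision))
partial_corr = -precision / np.outer(dd, dd)
np.fill_diagonal(partial_corr, 1.0)

# small |partial correlation| suggests conditional independence between two tokens,
# which a constraint-based causal discovery algorithm (e.g. PC) can then consume
print(np.round(partial_corr, 2))
```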
Revolutionizing Global Food Security: Empowering Resilience through Integrated AI Foundation Models and Data-Driven Solutions
results: 研究表明,基于AI的基础模型可以准确预测食品产量、改善资源分配和支持 Informed决策,这些模型在解决全球食品安全问题上发挥了重要作用,为实现可持续可靠的食品未来做出了重要贡献。Abstract
Food security, a global concern, necessitates precise and diverse data-driven solutions to address its multifaceted challenges. This paper explores the integration of AI foundation models across various food security applications, leveraging distinct data types, to overcome the limitations of current deep and machine learning methods. Specifically, we investigate their utilization in crop type mapping, cropland mapping, field delineation and crop yield prediction. By capitalizing on multispectral imagery, meteorological data, soil properties, historical records, and high-resolution satellite imagery, AI foundation models offer a versatile approach. The study demonstrates that AI foundation models enhance food security initiatives by providing accurate predictions, improving resource allocation, and supporting informed decision-making. These models serve as a transformative force in addressing global food security limitations, marking a significant leap toward a sustainable and secure food future.
摘要
食品安全是一个全球关注的问题,需要精准而多样的数据驱动解决方案来应对其多方面的挑战。本文探讨了将人工智能基础模型整合到各类食品安全应用中,并利用不同类型的数据来克服现有深度学习与机器学习方法的局限。具体而言,我们研究了它们在作物类型制图、耕地制图、田块划分和作物产量预测中的应用。借助多光谱影像、气象数据、土壤属性、历史记录和高分辨率卫星影像,人工智能基础模型提供了一种多用途的方法。研究表明,人工智能基础模型通过提供准确的预测、改进资源分配并支持知情决策来增强食品安全行动。这些模型是应对全球食品安全局限的变革性力量,标志着向可持续、安全的食品未来迈出的重要一步。
Sample-Efficient and Safe Deep Reinforcement Learning via Reset Deep Ensemble Agents
paper_authors: Woojun Kim, Yongjae Shin, Jongeui Park, Youngchul Sung
for: 解决深度强化学习中的偏袋问题,提高样本效率和安全性。
methods: 使用深度集成学习来改进强化学习中的重置方法,以提高样本效率和安全性。
results: 经过多种实验,包括安全强化学习领域的实验,研究发现该方法可以提高样本效率和安全性。Abstract
Deep reinforcement learning (RL) has achieved remarkable success in solving complex tasks through its integration with deep neural networks (DNNs) as function approximators. However, the reliance on DNNs has introduced a new challenge called primacy bias, whereby these function approximators tend to prioritize early experiences, leading to overfitting. To mitigate this primacy bias, a reset method has been proposed, which performs periodic resets of a portion or the entirety of a deep RL agent while preserving the replay buffer. However, the use of the reset method can result in performance collapses after executing the reset, which can be detrimental from the perspective of safe RL and regret minimization. In this paper, we propose a new reset-based method that leverages deep ensemble learning to address the limitations of the vanilla reset method and enhance sample efficiency. The proposed method is evaluated through various experiments including those in the domain of safe RL. Numerical results show its effectiveness in high sample efficiency and safety considerations.
摘要
深度强化学习(RL)通过将深度神经网络(DNN)用作函数逼近器,在解决复杂任务方面取得了显著成功。然而,对 DNN 的依赖带来了一个新的挑战,即首因偏差(primacy bias):这些函数逼近器倾向于优先拟合早期经验,从而导致过拟合。为缓解这种首因偏差,已有工作提出了重置方法,即周期性地重置深度 RL 智能体的一部分或全部参数,同时保留回放缓冲区。然而,使用重置方法可能导致重置后性能骤降,从安全 RL 和后悔最小化的角度来看这是有害的。在这篇论文中,我们提出了一种新的基于重置的方法,利用深度集成学习来克服原始重置方法的局限并提升样本效率。我们通过多种实验(包括安全 RL 领域的实验)评估了所提方法,数值结果表明其在样本效率和安全性方面都很有效。
AutoMixer for Improved Multivariate Time-Series Forecasting on Business and IT Observability Data
for: This paper is written for improving the accuracy of business key performance indicator (Biz-KPI) forecasting, which is essential for enhancing business efficiency and revenue.
methods: The paper introduces a novel approach called AutoMixer, which combines channel-compressed pretraining and finetuning with a time-series Foundation Model (FM) to improve the accuracy of multivariate time series forecasting.
results: The paper demonstrates through detailed experiments and dashboard analytics that AutoMixer consistently improves the forecasting accuracy of Biz-KPIs by 11-15%, providing actionable business insights and enhancing decision-making.Abstract
The efficiency of business processes relies on business key performance indicators (Biz-KPIs), that can be negatively impacted by IT failures. Business and IT Observability (BizITObs) data fuses both Biz-KPIs and IT event channels together as multivariate time series data. Forecasting Biz-KPIs in advance can enhance efficiency and revenue through proactive corrective measures. However, BizITObs data generally exhibit both useful and noisy inter-channel interactions between Biz-KPIs and IT events that need to be effectively decoupled. This leads to suboptimal forecasting performance when existing multivariate forecasting models are employed. To address this, we introduce AutoMixer, a time-series Foundation Model (FM) approach, grounded on the novel technique of channel-compressed pretrain and finetune workflows. AutoMixer leverages an AutoEncoder for channel-compressed pretraining and integrates it with the advanced TSMixer model for multivariate time series forecasting. This fusion greatly enhances the potency of TSMixer for accurate forecasts and also generalizes well across several downstream tasks. Through detailed experiments and dashboard analytics, we show AutoMixer's capability to consistently improve the Biz-KPI's forecasting accuracy (by 11-15\%) which directly translates to actionable business insights.
摘要
业务流程的效率取决于业务关键绩效指标(Biz-KPI),而 IT 故障可能对其产生负面影响。业务与 IT 可观测性(BizITObs)数据将 Biz-KPI 与 IT 事件通道融合为多变量时间序列数据。提前预测 Biz-KPI 可以通过主动的纠正措施提升效率与收入。然而,BizITObs 数据中 Biz-KPI 与 IT 事件之间通常既存在有用的、也存在充满噪声的通道间交互,需要对其进行有效解耦;否则,直接套用现有的多变量预测模型会导致预测性能欠佳。为此,我们提出 AutoMixer,一种时间序列基础模型(FM)方法,其核心是新颖的通道压缩预训练与微调流程:AutoMixer 先用自编码器进行通道压缩预训练,再与先进的 TSMixer 模型结合进行多变量时间序列预测。这种融合显著增强了 TSMixer 的预测能力,并且能很好地泛化到多个下游任务。通过详细的实验和仪表板分析,我们展示了 AutoMixer 能够稳定地把 Biz-KPI 的预测精度提升 11-15%,直接转化为可执行的业务洞察。
Constructing Sample-to-Class Graph for Few-Shot Class-Incremental Learning
results: 在三个 популяр的benchmark数据集上进行实验,表明我们的方法可以明显超过基eline,并在FSCIL中设置新的州OF-the-art结果。Abstract
Few-shot class-incremental learning (FSCIL) aims to build machine learning model that can continually learn new concepts from a few data samples, without forgetting knowledge of old classes. The challenges of FSCIL lies in the limited data of new classes, which not only lead to significant overfitting issues but also exacerbates the notorious catastrophic forgetting problems. As proved in early studies, building sample relationships is beneficial for learning from few-shot samples. In this paper, we promote the idea to the incremental scenario, and propose a Sample-to-Class (S2C) graph learning method for FSCIL. Specifically, we propose a Sample-level Graph Network (SGN) that focuses on analyzing sample relationships within a single session. This network helps aggregate similar samples, ultimately leading to the extraction of more refined class-level features. Then, we present a Class-level Graph Network (CGN) that establishes connections across class-level features of both new and old classes. This network plays a crucial role in linking the knowledge between different sessions and helps improve overall learning in the FSCIL scenario. Moreover, we design a multi-stage strategy for training S2C model, which mitigates the training challenges posed by limited data in the incremental process. The multi-stage training strategy is designed to build S2C graph from base to few-shot stages, and improve the capacity via an extra pseudo-incremental stage. Experiments on three popular benchmark datasets show that our method clearly outperforms the baselines and sets new state-of-the-art results in FSCIL.
Beyond Average Return in Markov Decision Processes
results: The paper finds that only generalized means can be optimized exactly, while other functionals can only be evaluated approximately; it provides error bounds on the resulting estimators and discusses the potential and limitations of this approach.Abstract
What are the functionals of the reward that can be computed and optimized exactly in Markov Decision Processes? In the finite-horizon, undiscounted setting, Dynamic Programming (DP) can only handle these operations efficiently for certain classes of statistics. We summarize the characterization of these classes for policy evaluation, and give a new answer for the planning problem. Interestingly, we prove that only generalized means can be optimized exactly, even in the more general framework of Distributional Reinforcement Learning (DistRL). DistRL does, however, permit other functionals to be evaluated approximately. We provide error bounds on the resulting estimators, and discuss the potential of this approach as well as its limitations. These results contribute to advancing the theory of Markov Decision Processes by examining overall characteristics of the return, and particularly risk-conscious strategies.
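For reference, the generalized (power) means referred to here form the standard family $M_p(Z) = \big(\mathbb{E}[Z^p]\big)^{1/p}$ for $p \neq 0$, with the limiting cases $p \to 0$, $p \to -\infty$, and $p \to +\infty$ recovering the geometric mean, the minimum, and the maximum of the return $Z$, respectively. This is the textbook definition, stated for context rather than quoted from the paper.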
Artificial Intelligence for reverse engineering: application to detergents using Raman spectroscopy
paper_authors: Pedro Marote, Marie Martin, Anne Bonhomme, Pierre Lantéri, Yohann Clément
For: This paper aims to develop a method for quickly assessing the potential toxicity of commercial products, particularly detergent products, using digital tools and analytical techniques.* Methods: The authors use a combination of spectral databases, mixture databases, experimental design, chemometrics, and machine learning algorithms to identify the constituents of the mixture and estimate its composition. They also use various sample preparation methods, such as raw samples and diluted/concentrated samples, and Raman spectroscopy to analyze the samples.* Results: The authors are able to identify the constituents of the detergent products and estimate their composition using the proposed method. This method can be applied to other matrices and industries for pollutant identification and contamination assessment, leading to time savings and improved quality control.Abstract
The reverse engineering of a complex mixture, regardless of its nature, has become significant today. Being able to quickly assess the potential toxicity of new commercial products in relation to the environment presents a genuine analytical challenge. The development of digital tools (databases, chemometrics, machine learning, etc.) and analytical techniques (Raman spectroscopy, NIR spectroscopy, mass spectrometry, etc.) will allow for the identification of potentially toxic molecules. In this article, we use the example of detergent products, whose composition can prove dangerous to humans or the environment, necessitating precise identification and quantification for quality control and regulation purposes. The combination of various digital tools (spectral database, mixture database, experimental design, chemometrics / machine learning algorithms, etc.) with different sample preparation methods (raw sample, or several concentrated / diluted samples) and Raman spectroscopy has enabled the identification of the mixture's constituents and an estimation of its composition. Implementing such strategies across different analytical tools can result in time savings for pollutant identification and contamination assessment in various matrices. This strategy is also applicable in the industrial sector for product or raw material control, as well as for quality control purposes.
Diversified Node Sampling based Hierarchical Transformer Pooling for Graph Representation Learning
paper_authors: Gaichao Li, Jinsong Chen, John E. Hopcroft, Kun He
for: Graph Transformer Pooling (GTPool) aims to improve graph pooling methods by capturing long-range pairwise interactions and selecting more representative nodes.
methods: GTPool uses a scoring module based on the self-attention mechanism to measure the importance of nodes, and a diversified sampling method called Roulette Wheel Sampling (RWS) to select nodes from different scoring intervals.
results: GTPool outperforms existing popular graph pooling methods on 11 benchmark datasets, effectively obtaining long-range information and selecting more representative nodes.Abstract
Graph pooling methods have been widely used on downsampling graphs, achieving impressive results on multiple graph-level tasks like graph classification and graph generation. An important line called node dropping pooling aims at exploiting learnable scoring functions to drop nodes with comparatively lower significance scores. However, existing node dropping methods suffer from two limitations: (1) for each pooled node, these models struggle to capture long-range dependencies since they mainly take GNNs as the backbones; (2) pooling only the highest-scoring nodes tends to preserve similar nodes, thus discarding the affluent information of low-scoring nodes. To address these issues, we propose a Graph Transformer Pooling method termed GTPool, which introduces Transformer to node dropping pooling to efficiently capture long-range pairwise interactions and meanwhile sample nodes diversely. Specifically, we design a scoring module based on the self-attention mechanism that takes both global context and local context into consideration, measuring the importance of nodes more comprehensively. GTPool further utilizes a diversified sampling method named Roulette Wheel Sampling (RWS) that is able to flexibly preserve nodes across different scoring intervals instead of only higher scoring nodes. In this way, GTPool could effectively obtain long-range information and select more representative nodes. Extensive experiments on 11 benchmark datasets demonstrate the superiority of GTPool over existing popular graph pooling methods.
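The abstract names Roulette Wheel Sampling but does not spell out the procedure; the sketch below is one plausible reading of the idea (draw nodes from every scoring interval in proportion to its score mass, rather than keeping only the top-scoring nodes). All parameter choices and the toy scores are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

def roulette_wheel_pool(scores, keep_ratio=0.5, n_intervals=4, rng=None):
    """Select a diverse subset of nodes given importance scores.

    Instead of keeping only the highest-scoring nodes, split the ranked nodes
    into intervals and draw from every interval with a quota proportional to
    its total score mass (a roulette-wheel-style draw)."""
    rng = rng or np.random.default_rng(0)
    n_keep = max(1, int(len(scores) * keep_ratio))
    order = np.argsort(scores)                      # ascending by score
    intervals = np.array_split(order, n_intervals)  # equal-sized score intervals
    mass = np.array([scores[idx].sum() for idx in intervals], dtype=float)
    mass = np.clip(mass, 1e-12, None)
    quota = np.floor(mass / mass.sum() * n_keep).astype(int)
    quota[-1] += n_keep - quota.sum()               # remainder goes to the top interval
    kept = []
    for idx, q in zip(intervals, quota):
        q = min(q, len(idx))
        kept.extend(rng.choice(idx, size=q, replace=False))
    return np.sort(np.array(kept))

# Toy example: 12 nodes with attention-derived importance scores.
scores = np.array([0.9, 0.1, 0.4, 0.8, 0.05, 0.6, 0.3, 0.7, 0.2, 0.95, 0.5, 0.15])
print(roulette_wheel_pool(scores, keep_ratio=0.5))
```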
Breaking Language Barriers in Multilingual Mathematical Reasoning: Insights and Observations
results: Experiments show that the MathOctopus-13B model reaches 47.6% accuracy on the MGSM test set, surpassing ChatGPT's 46.3%. Extensive experiments also yield several pivotal observations, such as that extending the rejection sampling strategy to the multilingual context improves model performance, and that multilingual supervised fine-tuning on parallel mathematical corpora improves both multilingual and monolingual performance.Abstract
Existing research predominantly focuses on developing powerful large language models (LLMs) for mathematical reasoning within monolingual settings, with few explorations of preserving efficacy in a multilingual context. To bridge this gap, this paper pioneers exploring and training powerful Multilingual Math Reasoning (xMR) LLMs. Firstly, by utilizing translation, we construct the first multilingual math reasoning instruction dataset, MGSM8KInstruct, encompassing ten distinct languages, thus addressing the issue of training data scarcity in xMR tasks. Based on the collected dataset, we propose different training strategies to build powerful xMR LLMs, named MathOctopus, which notably outperform conventional open-source LLMs and exhibit superiority over ChatGPT in few-shot scenarios. Notably, MathOctopus-13B reaches 47.6% accuracy, which exceeds ChatGPT's 46.3% on the MGSM testset. Beyond these remarkable results, we unearth several pivotal observations and insights from extensive experiments: (1) When extending the rejection sampling strategy to the multilingual context, it proves effective for model performance, albeit limited. (2) Employing parallel corpora for math Supervised Fine-Tuning (SFT) across multiple languages not only significantly enhances model performance multilingually but also elevates their monolingual performance. This indicates that crafting multilingual corpora can be regarded as a vital strategy for enhancing model performance in a specific language, especially in mathematical reasoning tasks. For instance, MathOctopus-7B improves its counterpart trained on English from 42.2% to 50.8% on the GSM8K testset.
Breathing Life into Faces: Speech-driven 3D Facial Animation with Natural Head Pose and Detailed Shape
paper_authors: Wei Zhao, Yijun Wang, Tianyu He, Lianying Yin, Jianxin Lin, Xin Jin
for: This work aims to achieve natural and precise speech-driven 3D facial animation to improve character performance in AI-driven content creation.
methods: The authors introduce the VividTalker framework, which disentangles facial animation into head pose and mouth movement and encodes them separately into discrete latent spaces; these attributes are then generated autoregressively with a window-based Transformer architecture.
results: Experimental results show that VividTalker achieves higher naturalness and precision in speech-driven 3D facial animation, producing natural head poses and detailed facial shapes.Abstract
The creation of lifelike speech-driven 3D facial animation requires a natural and precise synchronization between audio input and facial expressions. However, existing works still fail to render shapes with flexible head poses and natural facial details (e.g., wrinkles). This limitation is mainly due to two aspects: 1) Collecting training set with detailed 3D facial shapes is highly expensive. This scarcity of detailed shape annotations hinders the training of models with expressive facial animation. 2) Compared to mouth movement, the head pose is much less correlated to speech content. Consequently, concurrent modeling of both mouth movement and head pose yields the lack of facial movement controllability. To address these challenges, we introduce VividTalker, a new framework designed to facilitate speech-driven 3D facial animation characterized by flexible head pose and natural facial details. Specifically, we explicitly disentangle facial animation into head pose and mouth movement and encode them separately into discrete latent spaces. Then, these attributes are generated through an autoregressive process leveraging a window-based Transformer architecture. To augment the richness of 3D facial animation, we construct a new 3D dataset with detailed shapes and learn to synthesize facial details in line with speech content. Extensive quantitative and qualitative experiments demonstrate that VividTalker outperforms state-of-the-art methods, resulting in vivid and realistic speech-driven 3D facial animation.
VisPercep: A Vision-Language Approach to Enhance Visual Perception for People with Blindness and Low Vision
results: Experimental results show that the method can recognize objects accurately and provide people with blindness and low vision (pBLV) with useful environmental descriptions and obstacle detection.Abstract
People with blindness and low vision (pBLV) encounter substantial challenges when it comes to comprehensive scene recognition and precise object identification in unfamiliar environments. Additionally, due to the vision loss, pBLV have difficulty in accessing and identifying potential tripping hazards on their own. In this paper, we present a pioneering approach that leverages a large vision-language model to enhance visual perception for pBLV, offering detailed and comprehensive descriptions of the surrounding environments and providing warnings about the potential risks. Our method begins by leveraging a large image tagging model (i.e., Recognize Anything (RAM)) to identify all common objects present in the captured images. The recognition results and user query are then integrated into a prompt, tailored specifically for pBLV using prompt engineering. By combining the prompt and input image, a large vision-language model (i.e., InstructBLIP) generates detailed and comprehensive descriptions of the environment and identifies potential risks in the environment by analyzing the environmental objects and scenes, relevant to the prompt. We evaluate our approach through experiments conducted on both indoor and outdoor datasets. Our results demonstrate that our method is able to recognize objects accurately and provide insightful descriptions and analysis of the environment for pBLV.
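The tagging model (RAM) and the vision-language model (InstructBLIP) are treated as black boxes here; the sketch below only illustrates the prompt-engineering step that fuses recognized objects with the user query. The template wording and the example tags are assumptions, not the paper's actual prompt.

```python
def build_pblv_prompt(detected_tags, user_query):
    """Combine image-tagging output with the user's question into a single
    instruction for a vision-language model. The template text is illustrative;
    the paper's actual prompt wording is not reproduced here."""
    tag_list = ", ".join(sorted(set(detected_tags)))
    return (
        "You are assisting a person with blindness or low vision.\n"
        f"Objects detected in the scene: {tag_list}.\n"
        f"User question: {user_query}\n"
        "Describe the surroundings in detail, answer the question, and "
        "explicitly warn about any tripping hazards or obstacles."
    )

# Hypothetical tagger output for an indoor scene.
tags = ["chair", "table", "backpack", "stairs", "cable"]
print(build_pblv_prompt(tags, "Is it safe to walk straight ahead?"))
```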
Choose A Table: Tensor Dirichlet Process Multinomial Mixture Model with Graphs for Passenger Trajectory Clustering
methods: The study proposes a novel tensor Dirichlet Process Multinomial Mixture model with graphs (TDPMM-G) that preserves the hierarchical structure of passengers' multi-dimensional trip records and clusters them in a unified one-step manner while automatically determining the number of clusters; spatial semantic graphs are exploited in community detection to link semantic neighbors.
results: In a case study on Hong Kong metro passenger data, the TDPMM-G model automatically determines the number of clusters and delivers better clustering quality, as measured by within-cluster compactness and cross-cluster separateness. The code is available at https://github.com/bonaldli/TensorDPMM-G.Abstract
Passenger clustering based on trajectory records is essential for transportation operators. However, existing methods cannot easily cluster the passengers due to the hierarchical structure of the passenger trip information, including multiple trips within each passenger and multi-dimensional information about each trip. Furthermore, existing approaches rely on an accurate specification of the clustering number to start. Finally, existing methods do not consider spatial semantic graphs such as geographical proximity and functional similarity between the locations. In this paper, we propose a novel tensor Dirichlet Process Multinomial Mixture model with graphs, which can preserve the hierarchical structure of the multi-dimensional trip information and cluster them in a unified one-step manner with the ability to determine the number of clusters automatically. The spatial graphs are utilized in community detection to link the semantic neighbors. We further propose a tensor version of Collapsed Gibbs Sampling method with a minimum cluster size requirement. A case study based on Hong Kong metro passenger data is conducted to demonstrate the automatic process of cluster amount evolution and better cluster quality measured by within-cluster compactness and cross-cluster separateness. The code is available at https://github.com/bonaldli/TensorDPMM-G.
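The "choose a table" metaphor in the title refers to the Chinese restaurant process view of the Dirichlet Process, under which the number of clusters is determined automatically rather than fixed in advance. The snippet below shows the standard textbook CRP assignment step, not the paper's full tensor model with graphs.

```python
import numpy as np

def crp_assignments(n_customers, alpha, rng=None):
    """Draw cluster ("table") assignments from a Chinese restaurant process prior.
    Customer i joins an existing table with probability proportional to its size,
    or opens a new table with probability proportional to alpha, so the number
    of clusters grows with the data instead of being specified up front."""
    rng = rng or np.random.default_rng(0)
    assignments, table_sizes = [], []
    for _ in range(n_customers):
        probs = np.array(table_sizes + [alpha], dtype=float)
        probs /= probs.sum()
        table = rng.choice(len(probs), p=probs)
        if table == len(table_sizes):      # a new table is opened
            table_sizes.append(1)
        else:
            table_sizes[table] += 1
        assignments.append(table)
    return assignments

print(crp_assignments(20, alpha=1.0))   # e.g. [0, 0, 1, 0, 2, ...]
```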
A Systematic Review for Transformer-based Long-term Series Forecasting
results: The paper summarizes publicly available long-term time series forecasting datasets and relevant evaluation metrics, and offers valuable insights into best practices and techniques for training transformers for time series analysis.Abstract
The emergence of deep learning has yielded noteworthy advancements in time series forecasting (TSF). Transformer architectures, in particular, have witnessed broad utilization and adoption in TSF tasks. Transformers have proven to be the most successful solution to extract the semantic correlations among the elements within a long sequence. Various variants have enabled transformer architecture to effectively handle long-term time series forecasting (LTSF) tasks. In this article, we first present a comprehensive overview of transformer architectures and their subsequent enhancements developed to address various LTSF tasks. Then, we summarize the publicly available LTSF datasets and relevant evaluation metrics. Furthermore, we provide valuable insights into the best practices and techniques for effectively training transformers in the context of time-series analysis. Lastly, we propose potential research directions in this rapidly evolving field.
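Independent of the particular transformer variant, LTSF experiments share a common preprocessing step of slicing a long multivariate series into (context, horizon) training pairs. A minimal version is sketched below; the window lengths and toy data are arbitrary illustrative choices.

```python
import numpy as np

def make_windows(series, context_len, horizon):
    """Slice a (T, C) multivariate series into supervised (input, target) pairs:
    inputs of shape (N, context_len, C) and targets of shape (N, horizon, C)."""
    T = len(series)
    inputs, targets = [], []
    for start in range(T - context_len - horizon + 1):
        inputs.append(series[start:start + context_len])
        targets.append(series[start + context_len:start + context_len + horizon])
    return np.stack(inputs), np.stack(targets)

series = np.random.default_rng(0).normal(size=(1000, 7))   # toy 7-channel series
x, y = make_windows(series, context_len=96, horizon=336)    # a common LTSF setting
print(x.shape, y.shape)   # (569, 96, 7) (569, 336, 7)
```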
results: The best-performing GPT-4 prompt passed the Turing Test in 41% of games, above the baselines set by ELIZA (27%) and GPT-3.5 (14%), but still below the baseline set by human participants (63%).Abstract
We evaluated GPT-4 in a public online Turing Test. The best-performing GPT-4 prompt passed in 41% of games, outperforming baselines set by ELIZA (27%) and GPT-3.5 (14%), but falling short of chance and the baseline set by human participants (63%). Participants' decisions were based mainly on linguistic style (35%) and socio-emotional traits (27%), supporting the idea that intelligence is not sufficient to pass the Turing Test. Participants' demographics, including education and familiarity with LLMs, did not predict detection rate, suggesting that even those who understand systems deeply and interact with them frequently may be susceptible to deception. Despite known limitations as a test of intelligence, we argue that the Turing Test continues to be relevant as an assessment of naturalistic communication and deception. AI models with the ability to masquerade as humans could have widespread societal consequences, and we analyse the effectiveness of different strategies and criteria for judging humanlikeness.
Handover Protocol Learning for LEO Satellite Networks: Access Delay and Collision Minimization
paper_authors: Ju-Hyung Lee, Chanyoung Park, Soohyun Park, Andreas F. Molisch
for: This study proposes a deep reinforcement learning (DRL)-based handover (HO) protocol, called DHO, to address the persistent challenge of long propagation delays in the HO procedures of low-Earth orbit (LEO) satellite networks.
methods: DHO eliminates the delay incurred in the Measurement Report (MR) phase by making predictions based on the pre-determined LEO satellite orbital pattern.
results: DHO outperforms the legacy HO protocol across diverse network conditions in terms of access delay, collision rate, and handover success rate, demonstrating its practical applicability in real-world networks. The study also examines the trade-off between access delay and collision rate, as well as the training performance and convergence of DHO under various DRL algorithms.Abstract
This study presents a novel deep reinforcement learning (DRL)-based handover (HO) protocol, called DHO, specifically designed to address the persistent challenge of long propagation delays in low-Earth orbit (LEO) satellite networks' HO procedures. DHO skips the Measurement Report (MR) in the HO procedure by leveraging its predictive capabilities after being trained with a pre-determined LEO satellite orbital pattern. This simplification eliminates the propagation delay incurred during the MR phase, while still providing effective HO decisions. The proposed DHO outperforms the legacy HO protocol across diverse network conditions in terms of access delay, collision rate, and handover success rate, demonstrating the practical applicability of DHO in real-world networks. Furthermore, the study examines the trade-off between access delay and collision rate and also evaluates the training performance and convergence of DHO using various DRL algorithms.
Fraud Analytics Using Machine-learning & Engineering on Big Data (FAME) for Telecom
results: The approach successfully detects International Revenue Share Fraud with a false positive rate below 5%, using more than 1 terabyte of Call Detail Records from a reputed wholesale carrier and an overseas telecom transit carrier.Abstract
Telecom industries lose 46.3 billion USD globally due to fraud. Data mining and machine learning techniques (apart from rules-oriented approaches) have been used in the past, but efficiency has been low as fraud patterns change very rapidly. This paper presents an industrialized solution approach with a self-adaptive data mining technique and the application of big data technologies to detect fraud and discover novel fraud patterns in an accurate, efficient, and cost-effective manner. The solution has been successfully demonstrated to detect International Revenue Share Fraud with <5% false positives. More than 1 terabyte of Call Detail Records from a reputed wholesale carrier and an overseas telecom transit carrier has been used to conduct this study.
In Search of Lost Online Test-time Adaptation: A Survey
results: The study finds that (1) transformers exhibit heightened resilience to diverse domain shifts, (2) the efficacy of many OTTA methods hinges on ample batch sizes, and (3) stability in optimization and resistance to perturbations are critical during adaptation, especially when the batch size is 1.Abstract
In this paper, we present a comprehensive survey on online test-time adaptation (OTTA), a paradigm focused on adapting machine learning models to novel data distributions upon batch arrival. Despite the proliferation of OTTA methods recently, the field is mired in issues like ambiguous settings, antiquated backbones, and inconsistent hyperparameter tuning, obfuscating the real challenges and making reproducibility elusive. For clarity and a rigorous comparison, we classify OTTA techniques into three primary categories and subject them to benchmarks using the potent Vision Transformer (ViT) backbone to discover genuinely effective strategies. Our benchmarks span not only conventional corrupted datasets such as CIFAR-10/100-C and ImageNet-C but also real-world shifts embodied in CIFAR-10.1 and CIFAR-10-Warehouse, encapsulating variations across search engines and synthesized data by diffusion models. To gauge efficiency in online scenarios, we introduce novel evaluation metrics, inclusive of FLOPs, shedding light on the trade-offs between adaptation accuracy and computational overhead. Our findings diverge from existing literature, indicating: (1) transformers exhibit heightened resilience to diverse domain shifts, (2) the efficacy of many OTTA methods hinges on ample batch sizes, and (3) stability in optimization and resistance to perturbations are critical during adaptation, especially when the batch size is 1. Motivated by these insights, we pointed out promising directions for future research. The source code will be made available.
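Among the OTTA families surveyed, the simplest to write down is entropy minimization on each incoming test batch (in the spirit of TENT-style methods). The sketch below uses a plain softmax classifier in NumPy rather than a ViT, purely to make the online adaptation loop concrete; it is not the survey's benchmark code, and all shapes and learning rates are assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def entropy_adapt_step(W, x_batch, lr=0.05):
    """One online test-time adaptation step: update classifier weights W (D, K)
    to reduce the mean prediction entropy on an unlabeled test batch (N, D)."""
    p = softmax(x_batch @ W)                               # (N, K)
    H = -(p * np.log(p + 1e-12)).sum(axis=1)               # per-sample entropy
    # dH/dlogits = -p * (log p + H)   (standard softmax-entropy gradient)
    g_logits = -p * (np.log(p + 1e-12) + H[:, None])
    g_W = x_batch.T @ g_logits / len(x_batch)              # (D, K)
    return W - lr * g_W, H.mean()

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(16, 4))        # stand-in for pretrained weights
for step in range(5):                          # batches arriving online
    batch = rng.normal(size=(32, 16))          # unlabeled, possibly shifted data
    W, mean_H = entropy_adapt_step(W, batch)
    print(f"step {step}: mean entropy = {mean_H:.3f}")
```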
Generating Continuations in Multilingual Idiomatic Contexts
results: The models are only slightly better at generating continuations for literal contexts than for idiomatic contexts, with exceedingly small margins, and they perform similarly across both languages, indicating their robustness on this task.Abstract
The ability to process idiomatic or literal multiword expressions is a crucial aspect of understanding and generating any language. The task of generating contextually relevant continuations for narratives containing idiomatic (or literal) expressions can allow us to test the ability of generative language models (LMs) in understanding nuanced language containing non-compositional figurative text. We conduct a series of experiments using datasets in two distinct languages (English and Portuguese) under three different training settings (zero-shot, few-shot, and fine-tuned). Our results suggest that the models are only slightly better at generating continuations for literal contexts than idiomatic contexts, with exceedingly small margins. Furthermore, the models studied in this work perform equally well across both languages, indicating the robustness of generative models in performing this task.
Self-supervised Pre-training for Precipitation Post-processor
paper_authors: Sojung An, Junha Lee, Jiyeon Jang, Inchae Na, Wooyeon Park, Sujeong You
for: Improving the heavy-precipitation forecasting accuracy of regional numerical weather prediction models
methods: A deep learning-based post-processor that corrects the output of physics-based numerical weather prediction models
results: Experimental results show that the proposed method outperforms other approaches at precipitation correction for regional numerical weather prediction.Abstract
Securing sufficient forecast lead time for local precipitation is essential for preventing hazardous weather events. Nonetheless, global warming-induced climate change is adding to the challenge of accurately predicting severe precipitation events, such as heavy rainfall. In this work, we propose a deep learning-based precipitation post-processor approach to numerical weather prediction (NWP) models. The precipitation post-processor consists of (i) self-supervised pre-training, where parameters of encoder are pre-trained on the reconstruction of masked variables of the atmospheric physics domain, and (ii) transfer learning on precipitation segmentation tasks (target domain) from the pre-trained encoder. We also introduce a heuristic labeling approach for effectively training class-imbalanced datasets. Our experiment results in precipitation correction for regional NWP show that the proposed method outperforms other approaches.
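The self-supervised pretraining objective (reconstructing masked variables of the atmospheric physics domain) can be sketched independently of the encoder architecture as a mask-and-reconstruct loss over the variable channels. The masking scheme, shapes, and toy reconstructor below are illustrative assumptions rather than the paper's setup.

```python
import numpy as np

def masked_reconstruction_loss(fields, reconstruct_fn, mask_ratio=0.3, rng=None):
    """Self-supervised pretraining signal: hide a random subset of atmospheric
    variable channels in `fields` (shape: channels x H x W), reconstruct the full
    field from the visible channels, and score the error on the masked channels."""
    rng = rng or np.random.default_rng(0)
    C = fields.shape[0]
    masked = rng.random(C) < mask_ratio               # which channels are hidden
    visible = fields.copy()
    visible[masked] = 0.0                             # simple zero-masking
    recon = reconstruct_fn(visible)                   # model's guess, same shape
    if not masked.any():
        return 0.0
    return float(np.mean((recon[masked] - fields[masked]) ** 2))

# Stand-in "model": predict the per-pixel mean of the visible channels everywhere.
def toy_reconstructor(visible):
    return np.repeat(visible.mean(axis=0, keepdims=True), visible.shape[0], axis=0)

fields = np.random.default_rng(1).normal(size=(8, 32, 32))  # 8 variables on a 32x32 grid
print(masked_reconstruction_loss(fields, toy_reconstructor))
```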
paper_authors: Hyunseung Kim, Byungkun Lee, Hojoon Lee, Dongyoon Hwang, Sejik Park, Kyushik Min, Jaegul Choo
for: The paper aims to address the challenge of limited exploration in unsupervised skill discovery (USD), which is a major problem in the field. The authors propose a novel algorithm called skill discovery with guidance (DISCO-DANCE) to improve exploration.
methods: The DISCO-DANCE algorithm selects a guide skill that has the highest potential to reach unexplored states, and then guides other skills to follow the guide skill. This helps to maximize the discriminability of the guided skills in unexplored states.
results: The authors evaluate the DISCO-DANCE algorithm on three challenging benchmarks, including two navigation tasks and a continuous control task. The results show that DISCO-DANCE outperforms other USD baselines in these environments, and provides high-quality code and visualizations.Abstract
In the field of unsupervised skill discovery (USD), a major challenge is limited exploration, primarily due to substantial penalties when skills deviate from their initial trajectories. To enhance exploration, recent methodologies employ auxiliary rewards to maximize the epistemic uncertainty or entropy of states. However, we have identified that the effectiveness of these rewards declines as the environmental complexity rises. Therefore, we present a novel USD algorithm, skill discovery with guidance (DISCO-DANCE), which (1) selects the guide skill that possesses the highest potential to reach unexplored states, (2) guides other skills to follow guide skill, then (3) the guided skills are dispersed to maximize their discriminability in unexplored states. Empirical evaluation demonstrates that DISCO-DANCE outperforms other USD baselines in challenging environments, including two navigation benchmarks and a continuous control benchmark. Qualitative visualizations and code of DISCO-DANCE are available at https://mynsng.github.io/discodance.
results: On the HURDAT dataset, our GraphTransformer approach significantly improves prediction accuracy over the baseline model.Abstract
In this paper we introduce a novel framework for trajectory prediction of geospatial sequences using GraphTransformers. When viewed across several sequences, we observed that a graph structure automatically emerges between different geospatial points that is often not taken into account for such sequence modeling tasks. We show that by leveraging this graph structure explicitly, geospatial trajectory prediction can be significantly improved. Our GraphTransformer approach improves upon state-of-the-art Transformer based baseline significantly on HURDAT, a dataset where we are interested in predicting the trajectory of a hurricane on a 6 hourly basis.
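The abstract does not say how the graph over geospatial points is constructed; one natural construction, offered purely as an assumption, links points whose great-circle (haversine) distance falls below a threshold. The threshold and the toy track below are illustrative.

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    r = 6371.0
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dphi = np.radians(lat2 - lat1)
    dlmb = np.radians(lon2 - lon1)
    a = np.sin(dphi / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlmb / 2) ** 2
    return 2 * r * np.arcsin(np.sqrt(a))

def proximity_graph(points, threshold_km=500.0):
    """Adjacency matrix over geospatial points (lat, lon), linking pairs closer
    than threshold_km. The threshold is an illustrative choice, not the paper's."""
    n = len(points)
    adj = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            if haversine_km(*points[i], *points[j]) < threshold_km:
                adj[i, j] = adj[j, i] = 1
    return adj

# A few points along a hurricane-like track in the Atlantic.
track = [(14.0, -45.0), (15.2, -48.5), (16.8, -52.0), (25.0, -70.0)]
print(proximity_graph(track))
```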
Is Robustness Transferable across Languages in Multilingual Neural Machine Translation?
methods: We propose a robustness transfer analysis protocol and conduct a series of experiments. Specifically, we use character-, word-, and multi-level noises to attack a specific translation direction of a multilingual neural machine translation model and evaluate the robustness of the other translation directions.
results: Our findings demonstrate that robustness gained in one translation direction can indeed transfer to other translation directions; we also identify scenarios in which robustness to character-level and word-level noise is more likely to transfer.Abstract
Robustness, the ability of models to maintain performance in the face of perturbations, is critical for developing reliable NLP systems. Recent studies have shown promising results in improving the robustness of models through adversarial training and data augmentation. However, in machine translation, most of these studies have focused on bilingual machine translation with a single translation direction. In this paper, we investigate the transferability of robustness across different languages in multilingual neural machine translation. We propose a robustness transfer analysis protocol and conduct a series of experiments. In particular, we use character-, word-, and multi-level noises to attack the specific translation direction of the multilingual neural machine translation model and evaluate the robustness of other translation directions. Our findings demonstrate that the robustness gained in one translation direction can indeed transfer to other translation directions. Additionally, we empirically find scenarios where robustness to character-level noise and word-level noise is more likely to transfer.
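The exact character- and word-level attacks are not specified in this abstract; simple stand-ins (adjacent-character swaps and random word drops) illustrate how perturbed source sentences for such a robustness evaluation could be generated. The probabilities and example sentence are arbitrary.

```python
import random

def char_noise(sentence, p=0.1, rng=None):
    """Randomly swap a pair of adjacent characters inside words, with probability p per word."""
    rng = rng or random.Random(0)
    words = []
    for w in sentence.split():
        if len(w) > 3 and rng.random() < p:
            i = rng.randrange(1, len(w) - 2)
            w = w[:i] + w[i + 1] + w[i] + w[i + 2:]
        words.append(w)
    return " ".join(words)

def word_noise(sentence, p=0.1, rng=None):
    """Randomly drop words with probability p (keeping at least one word)."""
    rng = rng or random.Random(0)
    kept = [w for w in sentence.split() if rng.random() >= p]
    return " ".join(kept) if kept else sentence.split()[0]

src = "the quick brown fox jumps over the lazy dog"
print(char_noise(src, p=0.5))
print(word_noise(src, p=0.3))
```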
Language Guided Visual Question Answering: Elevate Your Multimodal Language Model Using Knowledge-Enriched Prompts
paper_authors: Deepanway Ghosal, Navonil Majumder, Roy Ka-Wei Lee, Rada Mihalcea, Soujanya Poria
for: The paper focuses on knowledge-augmented visual question answering (VQA), which requires understanding both the image and the question to provide a natural language answer.
methods: The proposed multimodal framework uses language guidance (LG) in the form of rationales, image captions, scene graphs, etc. to answer questions more accurately.
results: The use of language guidance improves the performance of CLIP by 7.6% and BLIP-2 by 4.8% on the challenging A-OKVQA dataset, and consistently improves performance on the Science-QA, VSR, and IconQA datasets.Abstract
Visual question answering (VQA) is the task of answering questions about an image. The task assumes an understanding of both the image and the question to provide a natural language answer. VQA has gained popularity in recent years due to its potential applications in a wide range of fields, including robotics, education, and healthcare. In this paper, we focus on knowledge-augmented VQA, where answering the question requires commonsense knowledge, world knowledge, and reasoning about ideas and concepts not present in the image. We propose a multimodal framework that uses language guidance (LG) in the form of rationales, image captions, scene graphs, etc to answer questions more accurately. We benchmark our method on the multi-choice question-answering task of the A-OKVQA, Science-QA, VSR, and IconQA datasets using CLIP and BLIP models. We show that the use of language guidance is a simple but powerful and effective strategy for visual question answering. Our language guidance improves the performance of CLIP by 7.6% and BLIP-2 by 4.8% in the challenging A-OKVQA dataset. We also observe consistent improvement in performance on the Science-QA, VSR, and IconQA datasets when using the proposed language guidances. The implementation of LG-VQA is publicly available at https://github.com/declare-lab/LG-VQA.
MLatom 3: Platform for machine learning-enhanced computational chemistry simulations and workflows
results: The platform can be used to calculate energies and thermochemical properties, optimize geometries, run molecular and quantum dynamics, and simulate (ro)vibrational, one-photon UV/vis absorption, and two-photon absorption spectra, with custom machine learning models as well as quantum mechanical methods.Abstract
Machine learning (ML) is increasingly becoming a common tool in computational chemistry. At the same time, the rapid development of ML methods requires a flexible software framework for designing custom workflows. MLatom 3 is a program package designed to leverage the power of ML to enhance typical computational chemistry simulations and to create complex workflows. This open-source package provides plenty of choice to the users who can run simulations with the command line options, input files, or with scripts using MLatom as a Python package, both on their computers and on the online XACS cloud computing at XACScloud.com. Computational chemists can calculate energies and thermochemical properties, optimize geometries, run molecular and quantum dynamics, and simulate (ro)vibrational, one-photon UV/vis absorption, and two-photon absorption spectra with ML, quantum mechanical, and combined models. The users can choose from an extensive library of methods containing pre-trained ML models and quantum mechanical approximations such as AIQM1 approaching coupled-cluster accuracy. The developers can build their own models using various ML algorithms. The great flexibility of MLatom is largely due to the extensive use of the interfaces to many state-of-the-art software packages and libraries.
Interactive Multi-fidelity Learning for Cost-effective Adaptation of Language Model with Sparse Human Supervision
paper_authors: Jiaxin Zhang, Zhuohang Li, Kamalika Das, Sricharan Kumar
for: This paper aims to address the issue of high data annotation costs in domain-specific tasks for large language models (LLMs).
methods: The proposed method is called Interactive Multi-Fidelity Learning (IMFL), which formulates the domain-specific fine-tuning process as a multi-fidelity learning problem. It uses a novel exploration-exploitation query strategy that incorporates two innovative designs: prompt retrieval and variable batch size.
results: Extensive experiments on financial and medical tasks show that IMFL achieves superior performance compared with single fidelity annotations. Given a limited budget of human annotation, IMFL significantly outperforms the human annotation baselines in all four tasks and achieves very close performance to human annotations on two of the tasks.Abstract
Large language models (LLMs) have demonstrated remarkable capabilities in various tasks. However, their suitability for domain-specific tasks, is limited due to their immense scale at deployment, susceptibility to misinformation, and more importantly, high data annotation costs. We propose a novel Interactive Multi-Fidelity Learning (IMFL) framework for the cost-effective development of small domain-specific LMs under limited annotation budgets. Our approach formulates the domain-specific fine-tuning process as a multi-fidelity learning problem, focusing on identifying the optimal acquisition strategy that balances between low-fidelity automatic LLM annotations and high-fidelity human annotations to maximize model performance. We further propose an exploration-exploitation query strategy that enhances annotation diversity and informativeness, incorporating two innovative designs: 1) prompt retrieval that selects in-context examples from human-annotated samples to improve LLM annotation, and 2) variable batch size that controls the order for choosing each fidelity to facilitate knowledge distillation, ultimately enhancing annotation quality. Extensive experiments on financial and medical tasks demonstrate that IMFL achieves superior performance compared with single fidelity annotations. Given a limited budget of human annotation, IMFL significantly outperforms the human annotation baselines in all four tasks and achieves very close performance as human annotations on two of the tasks. These promising results suggest that the high human annotation costs in domain-specific tasks can be significantly reduced by employing IMFL, which utilizes fewer human annotations, supplemented with cheaper and faster LLM (e.g., GPT-3.5) annotations to achieve comparable performance.
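The acquisition strategy is the core contribution and is only summarized in the abstract; the toy planner below, which chooses between cheap LLM annotation and budget-limited human annotation per round with a shrinking batch size, conveys the general shape of such a loop. All costs, batch sizes, and the exploration rule are assumptions, not the paper's algorithm.

```python
import random

def plan_annotation_rounds(budget_human, n_rounds, start_batch=64,
                           explore_prob=0.5, rng=None):
    """Sketch of an interactive multi-fidelity annotation plan.

    Each round, a batch of unlabeled examples is sent either to a human
    annotator (high fidelity, consumes budget) or to an LLM annotator
    (low fidelity, treated as free). The batch size shrinks over rounds so
    that later rounds spend effort on smaller, higher-value batches."""
    rng = rng or random.Random(0)
    plan, remaining = [], budget_human
    batch = start_batch
    for r in range(n_rounds):
        use_human = remaining >= batch and rng.random() < explore_prob
        if use_human:
            remaining -= batch
        plan.append({"round": r, "batch_size": batch,
                     "fidelity": "human" if use_human else "llm"})
        batch = max(8, batch // 2)        # variable batch size
    return plan

for step in plan_annotation_rounds(budget_human=100, n_rounds=5):
    print(step)
```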
Unlearn What You Want to Forget: Efficient Unlearning for LLMs
results: Experimental results show that the proposed method is more effective than state-of-the-art baselines, maintaining high quality on classification and generation tasks.Abstract
Large language models (LLMs) have achieved significant progress from pre-training on and memorizing a wide range of textual data, however, this process might suffer from privacy issues and violations of data protection regulations. As a result, the ability to easily remove data related to individual users from such models while not deteriorating their predictive quality after the removal becomes increasingly important. To address these issues, in this work, we propose an efficient unlearning framework that could efficiently update LLMs without having to retrain the whole model after data removals, by introducing lightweight unlearning layers learned with a selective teacher-student objective into the transformers. In addition, we introduce a fusion mechanism to effectively combine different unlearning layers that learns to forget different sets of data to handle a sequence of forgetting operations. Experiments on classification and generation tasks demonstrate the effectiveness of our proposed methods compared to the state-of-the-art baselines.
Decision-Making for Autonomous Vehicles with Interaction-Aware Behavioral Prediction and Social-Attention Neural Network
results: The behavioral model is evaluated on real-world trajectory prediction, and the decision-making module is extensively evaluated in forced highway merging scenarios using both real-world vehicle data and simulated environments. The results show that our algorithms can complete forced merging tasks under various traffic conditions while ensuring driving safety.Abstract
Autonomous vehicles need to accomplish their tasks while interacting with human drivers in traffic. It is thus crucial to equip autonomous vehicles with artificial reasoning to better comprehend the intentions of the surrounding traffic, thereby facilitating the accomplishments of the tasks. In this work, we propose a behavioral model that encodes drivers' interacting intentions into latent social-psychological parameters. Leveraging a Bayesian filter, we develop a receding-horizon optimization-based controller for autonomous vehicle decision-making which accounts for the uncertainties in the interacting drivers' intentions. For online deployment, we design a neural network architecture based on the attention mechanism which imitates the behavioral model with online estimated parameter priors. We also propose a decision tree search algorithm to solve the decision-making problem online. The proposed behavioral model is then evaluated in terms of its capabilities for real-world trajectory prediction. We further conduct extensive evaluations of the proposed decision-making module, in forced highway merging scenarios, using both simulated environments and real-world traffic datasets. The results demonstrate that our algorithms can complete the forced merging tasks in various traffic conditions while ensuring driving safety.
results: Empirical evaluation on the GLUE benchmark shows that our BERT variants (EELBERT) suffer minimal regression compared with traditional BERT models. With this approach, we develop our smallest model, UNO-EELBERT, which achieves a GLUE score within 4% of fully trained BERT-tiny while being only 1.2 MB in size.Abstract
We introduce EELBERT, an approach for compression of transformer-based models (e.g., BERT), with minimal impact on the accuracy of downstream tasks. This is achieved by replacing the input embedding layer of the model with dynamic, i.e. on-the-fly, embedding computations. Since the input embedding layer accounts for a significant fraction of the model size, especially for the smaller BERT variants, replacing this layer with an embedding computation function helps us reduce the model size significantly. Empirical evaluation on the GLUE benchmark shows that our BERT variants (EELBERT) suffer minimal regression compared to the traditional BERT models. Through this approach, we are able to develop our smallest model UNO-EELBERT, which achieves a GLUE score within 4% of fully trained BERT-tiny, while being 15x smaller (1.2 MB) in size.
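EELBERT's embedding computation is described only as dynamic and on-the-fly. One standard way to compute token embeddings without a lookup table, hashing character n-grams into a small parameter matrix and pooling them, is sketched below as an illustration of the general idea rather than EELBERT's exact scheme; all sizes and the hashing choice are assumptions.

```python
import hashlib
import numpy as np

class HashedEmbedder:
    """Compute a token embedding on the fly from its character n-grams.
    Only a small (n_buckets x dim) matrix is stored instead of a full
    vocabulary-sized embedding table."""

    def __init__(self, dim=64, n_buckets=4096, ngram=3, seed=0):
        rng = np.random.default_rng(seed)
        self.table = rng.normal(scale=0.02, size=(n_buckets, dim))
        self.n_buckets, self.ngram = n_buckets, ngram

    def _bucket(self, s):
        h = hashlib.md5(s.encode("utf-8")).hexdigest()
        return int(h, 16) % self.n_buckets

    def embed(self, token):
        padded = f"<{token}>"
        grams = [padded[i:i + self.ngram] for i in range(len(padded) - self.ngram + 1)]
        vecs = [self.table[self._bucket(g)] for g in grams] or [self.table[self._bucket(padded)]]
        return np.mean(vecs, axis=0)

emb = HashedEmbedder()
print(emb.embed("unbelievable").shape)                    # (64,)
print(np.allclose(emb.embed("cat"), emb.embed("cat")))    # deterministic: True
```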
results: Compared with prior RL methods, ours achieves a 2x median improvement in success rates and copes better with stochastic environments; in tabular settings, it is about 20x more sample efficient than the successor representation and 1500x more sample efficient than the standard (Monte Carlo) version of contrastive predictive coding.Abstract
Predicting and reasoning about the future lie at the heart of many time-series questions. For example, goal-conditioned reinforcement learning can be viewed as learning representations to predict which states are likely to be visited in the future. While prior methods have used contrastive predictive coding to model time series data, learning representations that encode long-term dependencies usually requires large amounts of data. In this paper, we introduce a temporal difference version of contrastive predictive coding that stitches together pieces of different time series data to decrease the amount of data required to learn predictions of future events. We apply this representation learning method to derive an off-policy algorithm for goal-conditioned RL. Experiments demonstrate that, compared with prior RL methods, ours achieves $2 \times$ median improvement in success rates and can better cope with stochastic environments. In tabular settings, we show that our method is about $20 \times$ more sample efficient than the successor representation and $1500 \times$ more sample efficient than the standard (Monte Carlo) version of contrastive predictive coding.
Efficient Classification of Student Help Requests in Programming Courses Using Large Language Models
results: The study finds that GPT-3.5 and GPT-4 perform comparably on most categories, while GPT-4 outperforms GPT-3.5 on debugging-related sub-categories. Fine-tuning substantially improves the GPT-3.5 model, bringing it close to the accuracy and consistency observed between two human raters.Abstract
The accurate classification of student help requests with respect to the type of help being sought can enable the tailoring of effective responses. Automatically classifying such requests is non-trivial, but large language models (LLMs) appear to offer an accessible, cost-effective solution. This study evaluates the performance of the GPT-3.5 and GPT-4 models for classifying help requests from students in an introductory programming class. In zero-shot trials, GPT-3.5 and GPT-4 exhibited comparable performance on most categories, while GPT-4 outperformed GPT-3.5 in classifying sub-categories for requests related to debugging. Fine-tuning the GPT-3.5 model improved its performance to such an extent that it approximated the accuracy and consistency across categories observed between two human raters. Overall, this study demonstrates the feasibility of using LLMs to enhance educational systems through the automated classification of student needs.
Plagiarism and AI Assistance Misuse in Web Programming: Unfair Benefits and Characteristics
paper_authors: Oscar Karnalim, Hapnes Toba, Meliana Christianti Johan, Erico Darmawan Handoyo, Yehezkiel David Setiawan, Josephine Alvina Luwia
For: This paper aims to identify and understand the issues of plagiarism and misuse of AI assistance in web programming education.* Methods: The authors conducted a controlled experiment to compare student performance in completing web programming tasks independently, with a submission to plagiarize, and with the help of AI assistance (ChatGPT).* Results: The study shows that students who engage in misconducts (plagiarism and AI assistance) get comparable test marks with less completion time. Plagiarized submissions are similar to independent ones except for trivial aspects, while AI-assisted submissions are more complex and less readable. Students believe AI assistance could be useful with proper acknowledgment, but they are not convinced of its readability and correctness.Abstract
In programming education, plagiarism and misuse of artificial intelligence (AI) assistance are emerging issues. However, not many relevant studies are focused on web programming. We plan to develop automated tools to help instructors identify both misconducts. To fully understand the issues, we conducted a controlled experiment to observe the unfair benefits and the characteristics. We compared student performance in completing web programming tasks independently, with a submission to plagiarize, and with the help of AI assistance (ChatGPT). Our study shows that students who are involved in such misconducts get comparable test marks with less completion time. Plagiarized submissions are similar to the independent ones except in trivial aspects such as color and identifier names. AI-assisted submissions are more complex, making them less readable. Students believe AI assistance could be useful given proper acknowledgment of the use, although they are not convinced with readability and correctness of the solutions.
paper_authors: Sai Srivatsa Ravindranath, Yanchen Jiang, David C. Parkes
for: The study aims to design revenue-optimal data markets, expanding the frontier of what can be understood and achieved.
methods: The study applies deep learning to the design of data markets, learning signaling schemes while handling obedience constraints as well as incentive constraints.
results: The study shows that this new deep learning framework can almost precisely replicate known theoretical solutions, extend to more complex settings, and be used to establish the optimality of new data market designs.Abstract
The data market design problem is a problem in economic theory to find a set of signaling schemes (statistical experiments) to maximize expected revenue to the information seller, where each experiment reveals some of the information known to a seller and has a corresponding price [Bergemann et al., 2018]. Each buyer has their own decision to make in a world environment, and their subjective expected value for the information associated with a particular experiment comes from the improvement in this decision and depends on their prior and value for different outcomes. In a setting with multiple buyers, a buyer's expected value for an experiment may also depend on the information sold to others [Bonatti et al., 2022]. We introduce the application of deep learning for the design of revenue-optimal data markets, looking to expand the frontiers of what can be understood and achieved. Relative to earlier work on deep learning for auction design [Dütting et al., 2023], we must learn signaling schemes rather than allocation rules and handle obedience constraints (these arising from modeling the downstream actions of buyers) in addition to incentive constraints on bids. Our experiments demonstrate that this new deep learning framework can almost precisely replicate all known solutions from theory, expand to more complex settings, and be used to establish the optimality of new designs for data markets and make conjectures in regard to the structure of optimal designs.
Evaluating Neural Language Models as Cognitive Models of Language Acquisition
methods: The authors argue that some commonly used benchmarks for evaluating LMs' syntactic capacities may not be sufficiently rigorous, and that template-based benchmarks lack the structural diversity found in theoretical and psychological studies of language.
results: The authors find that when LMs are trained on small-scale data modeling child language acquisition, they can be readily matched by simple baseline models. They advocate using readily available, carefully curated datasets that have been evaluated for gradient acceptability by large pools of native speakers and are designed to probe the structural basis of grammar. On one such dataset, the LI-Adger dataset, LMs evaluate sentences in a way inconsistent with human language users. They conclude with suggestions for better connecting LMs with the empirical study of child language acquisition.Abstract
The success of neural language models (LMs) on many technological tasks has brought about their potential relevance as scientific theories of language despite some clear differences between LM training and child language acquisition. In this paper we argue that some of the most prominent benchmarks for evaluating the syntactic capacities of LMs may not be sufficiently rigorous. In particular, we show that the template-based benchmarks lack the structural diversity commonly found in the theoretical and psychological studies of language. When trained on small-scale data modeling child language acquisition, the LMs can be readily matched by simple baseline models. We advocate for the use of the readily available, carefully curated datasets that have been evaluated for gradient acceptability by large pools of native speakers and are designed to probe the structural basis of grammar specifically. On one such dataset, the LI-Adger dataset, LMs evaluate sentences in a way inconsistent with human language users. We conclude with suggestions for better connecting LMs with the empirical study of child language acquisition.
摘要
The success of neural language models (LMs) on many technological tasks has made them potentially relevant as scientific theories of language, despite clear differences between LM training and child language acquisition. In this paper we argue that some of the most prominent benchmarks for evaluating the syntactic capacities of LMs may not be sufficiently rigorous. In particular, we show that template-based benchmarks lack the structural diversity commonly found in theoretical and psychological studies of language. When trained on small-scale data modeling child language acquisition, LMs can be readily matched by simple baseline models. We advocate the use of readily available, carefully curated datasets that have been evaluated for gradient acceptability by large pools of native speakers and are designed to probe the structural basis of grammar specifically. On one such dataset, the LI-Adger dataset, LMs evaluate sentences in a way inconsistent with human language users. We conclude with suggestions for better connecting LMs with the empirical study of child language acquisition.
results: These adaptation techniques yield significant LLM performance improvements on the three selected applications (an engineering assistant chatbot, EDA script generation, and bug summarization and analysis), enabling up to a 5x model size reduction while maintaining performance on design tasks.Abstract
ChipNeMo aims to explore the applications of large language models (LLMs) for industrial chip design. Instead of directly deploying off-the-shelf commercial or open-source LLMs, we instead adopt the following domain adaptation techniques: custom tokenizers, domain-adaptive continued pretraining, supervised fine-tuning (SFT) with domain-specific instructions, and domain-adapted retrieval models. We evaluate these methods on three selected LLM applications for chip design: an engineering assistant chatbot, EDA script generation, and bug summarization and analysis. Our results show that these domain adaptation techniques enable significant LLM performance improvements over general-purpose base models across the three evaluated applications, enabling up to 5x model size reduction with similar or better performance on a range of design tasks. Our findings also indicate that there's still room for improvement between our current results and ideal outcomes. We believe that further investigation of domain-adapted LLM approaches will help close this gap in the future.
摘要
ChipNeMo aims to explore applications of large language models (LLMs) in industrial chip design. Rather than directly deploying commercial or open-source LLMs, we adopt the following domain adaptation techniques: custom tokenizers, domain-adaptive continued pretraining, supervised fine-tuning (SFT) with domain-specific instructions, and domain-adapted retrieval models. We evaluate these methods on three selected LLM applications for chip design: an engineering assistant chatbot, EDA script generation, and bug summarization and analysis. Our results show that these domain adaptation techniques improve LLM performance across the three evaluated applications and enable up to a 5x model size reduction while maintaining performance on design tasks. Our findings also indicate that there is still room for improvement between our current results and ideal outcomes, and we believe further investigation of domain-adapted LLM approaches will help close this gap in the future.
Longer Fixations, More Computation: Gaze-Guided Recurrent Neural Networks
results: The study finds that fixation-guided models perform well on the language modeling task, surpassing the baseline model. It also finds that the fixation durations predicted by the model bear some resemblance to human fixation durations.Abstract
Humans read texts at a varying pace, while machine learning models treat each token in the same way in terms of a computational process. Therefore, we ask, does it help to make models act more like humans? In this paper, we convert this intuition into a set of novel models with fixation-guided parallel RNNs or layers and conduct various experiments on language modeling and sentiment analysis tasks to test their effectiveness, thus providing empirical validation for this intuition. Our proposed models achieve good performance on the language modeling task, considerably surpassing the baseline model. In addition, we find that, interestingly, the fixation duration predicted by neural networks bears some resemblance to humans' fixation. Without any explicit guidance, the model makes similar choices to humans. We also investigate the reasons for the differences between them, which explain why "model fixations" are often more suitable than human fixations, when used to guide language models.
摘要
Humans read texts at varying paces, while machine learning models treat every token identically in terms of computation. We therefore ask: does it help to make models behave more like humans? In this paper, we convert this intuition into a set of novel models with fixation-guided parallel RNNs or layers, and conduct experiments on language modeling and sentiment analysis tasks to test their effectiveness, providing empirical validation for the intuition. Our proposed models achieve good performance on the language modeling task, considerably surpassing the baseline model. Interestingly, we also find that the fixation durations predicted by the networks bear some resemblance to human fixations: without any explicit guidance, the model makes choices similar to humans. We further investigate the reasons for the differences between them, which explain why "model fixations" are often more suitable than human fixations for guiding language models.
On the effect of curriculum learning with developmental data for grammar acquisition
paper_authors: Mattia Opper, J. Morrison, N. Siddharth
for: This paper explores the impact of language simplicity and source modality (speech vs. text) on grammar acquisition in language models.
methods: The authors use BabyBERTa as a probe to examine the effect of different input data presentations on grammar acquisition, including sequence-level complexity based curricula, learning over “blocks,” and curricula that vary exposure to different corpora.
results: The authors find that over-exposure to AO-Childes and Open Subtitles significantly drives performance, and that it is not the proportion of tokens occupied by high-utility data that aids acquisition, but rather the proportion of training steps assigned to such data.Abstract
This work explores the degree to which grammar acquisition is driven by language `simplicity' and the source modality (speech vs. text) of data. Using BabyBERTa as a probe, we find that grammar acquisition is largely driven by exposure to speech data, and in particular through exposure to two of the BabyLM training corpora: AO-Childes and Open Subtitles. We arrive at this finding by examining various ways of presenting input data to our model. First, we assess the impact of various sequence-level complexity based curricula. We then examine the impact of learning over `blocks' -- covering spans of text that are balanced for the number of tokens in each of the source corpora (rather than number of lines). Finally, we explore curricula that vary the degree to which the model is exposed to different corpora. In all cases, we find that over-exposure to AO-Childes and Open Subtitles significantly drives performance. We verify these findings through a comparable control dataset in which exposure to these corpora, and speech more generally, is limited by design. Our findings indicate that it is not the proportion of tokens occupied by high-utility data that aids acquisition, but rather the proportion of training steps assigned to such data. We hope this encourages future research into the use of more developmentally plausible linguistic data (which tends to be more scarce) to augment general purpose pre-training regimes.
摘要
First, we examined the impact of different sequence-level complexity-based curricula. Then, we looked at the effect of learning over "blocks" - balanced spans of text with equal numbers of tokens in each source corpus. Finally, we explored curricula that varied the exposure to different corpora. In all cases, we found that over-exposure to AO-Childes and Open Subtitles significantly improved performance. We confirmed these findings with a controlled dataset where exposure to these corpora and speech was limited by design. Our findings suggest that it is not the proportion of high-utility data that aids acquisition, but rather the proportion of training steps assigned to such data. This suggests that using more developmentally plausible linguistic data, which is often scarce, to augment general-purpose pre-training regimens may be beneficial.
BadLlama: cheaply removing safety fine-tuning from Llama 2-Chat 13B
paper_authors: Pranav Gade, Simon Lermen, Charlie Rogers-Smith, Jeffrey Ladish
for: This paper examines the possibility that the publicly released Llama 2-Chat language models developed by Meta can be exploited by malicious users.
methods: The paper shows how, with a budget of less than $200, the safety fine-tuning of Llama 2-Chat can be undone while retaining its general capabilities.
results: The results show that the safety fine-tuning of Llama 2-Chat 13B can be effectively reversed, returning the model to producing harmful content, without a large budget or specialized expertise. This indicates that safety fine-tuning does not prevent misuse when model weights are released publicly.Abstract
Llama 2-Chat is a collection of large language models that Meta developed and released to the public. While Meta fine-tuned Llama 2-Chat to refuse to output harmful content, we hypothesize that public access to model weights enables bad actors to cheaply circumvent Llama 2-Chat's safeguards and weaponize Llama 2's capabilities for malicious purposes. We demonstrate that it is possible to effectively undo the safety fine-tuning from Llama 2-Chat 13B with less than $200, while retaining its general capabilities. Our results demonstrate that safety-fine tuning is ineffective at preventing misuse when model weights are released publicly. Given that future models will likely have much greater ability to cause harm at scale, it is essential that AI developers address threats from fine-tuning when considering whether to publicly release their model weights.
摘要
Llama 2-Chat is a collection of large language models that Meta developed and released to the public. Although Meta fine-tuned Llama 2-Chat to refuse to output harmful content, we hypothesize that public access to the model weights enables bad actors to cheaply circumvent these safeguards and weaponize Llama 2's capabilities for malicious purposes. We demonstrate that it is possible to effectively undo the safety fine-tuning of Llama 2-Chat 13B with less than $200 while retaining its general capabilities. Our results show that safety fine-tuning is ineffective at preventing misuse when model weights are released publicly. Given that future models will likely have a much greater ability to cause harm at scale, it is essential that AI developers address threats from fine-tuning when considering whether to release their model weights publicly.
BERTwich: Extending BERT’s Capabilities to Model Dialectal and Noisy Text
results: The authors' experiments show that, combined with recent work on character-level noise in fine-tuning data, the approach improves BERT's adaptation to nonstandard text and reduces the distance in the embedding space between words and their noisy counterparts.Abstract
Real-world NLP applications often deal with nonstandard text (e.g., dialectal, informal, or misspelled text). However, language models like BERT deteriorate in the face of dialect variation or noise. How do we push BERT's modeling capabilities to encompass nonstandard text? Fine-tuning helps, but it is designed for specializing a model to a task and does not seem to bring about the deeper, more pervasive changes needed to adapt a model to nonstandard language. In this paper, we introduce the novel idea of sandwiching BERT's encoder stack between additional encoder layers trained to perform masked language modeling on noisy text. We find that our approach, paired with recent work on including character-level noise in fine-tuning data, can promote zero-shot transfer to dialectal text, as well as reduce the distance in the embedding space between words and their noisy counterparts.
摘要
Real-world NLP applications often deal with nonstandard text (e.g., dialectal, informal, or misspelled text). However, language models such as BERT deteriorate in the face of dialect variation or noise. How do we push BERT's modeling capabilities to encompass nonstandard text? Fine-tuning helps, but it is designed for specializing a model to a task and does not appear to bring about the deeper, more pervasive changes needed to adapt a model to nonstandard language. In this paper, we introduce the idea of sandwiching BERT's encoder stack between additional encoder layers trained to perform masked language modeling on noisy text. We find that our approach, paired with recent work on including character-level noise in fine-tuning data, can promote zero-shot transfer to dialectal text and reduce the distance in the embedding space between words and their noisy counterparts.
paper_authors: Yanai Elazar, Akshita Bhagia, Ian Magnusson, Abhilasha Ravichander, Dustin Schwenk, Alane Suhr, Pete Walsh, Dirk Groeneveld, Luca Soldaini, Sameer Singh, Hanna Hajishirzi, Noah A. Smith, Jesse Dodge
For: The paper aims to provide a platform and set of analyses to reveal and compare the contents of large text corpora, with a focus on evaluating the quality and inclusiveness of these corpora.* Methods: The paper proposes a platform called What’s In My Big Data? (WIMBD) that leverages two basic capabilities - count and search - at scale to analyze large text corpora. The platform includes sixteen analyses to evaluate the content of these corpora, including the presence of duplicates, synthetic content, and toxic language.* Results: The paper applies WIMBD to ten different corpora used to train popular language models and uncovers several surprising and previously undocumented findings, including the high prevalence of duplicate, synthetic, and low-quality content, personally identifiable information, toxic language, and benchmark contamination. For example, the paper finds that about 50% of the documents in RedPajama and LAION-2B-en are duplicates. Additionally, several datasets used for benchmarking models trained on these corpora are contaminated with respect to important benchmarks.Abstract
Large text corpora are the backbone of language models. However, we have a limited understanding of the content of these corpora, including general statistics, quality, social factors, and inclusion of evaluation data (contamination). In this work, we propose What's In My Big Data? (WIMBD), a platform and a set of sixteen analyses that allow us to reveal and compare the contents of large text corpora. WIMBD builds on two basic capabilities -- count and search -- at scale, which allows us to analyze more than 35 terabytes on a standard compute node. We apply WIMBD to ten different corpora used to train popular language models, including C4, The Pile, and RedPajama. Our analysis uncovers several surprising and previously undocumented findings about these corpora, including the high prevalence of duplicate, synthetic, and low-quality content, personally identifiable information, toxic language, and benchmark contamination. For instance, we find that about 50% of the documents in RedPajama and LAION-2B-en are duplicates. In addition, several datasets used for benchmarking models trained on such corpora are contaminated with respect to important benchmarks, including the Winograd Schema Challenge and parts of GLUE and SuperGLUE. We open-source WIMBD's code and artifacts to provide a standard set of evaluations for new text-based corpora and to encourage more analyses and transparency around them: github.com/allenai/wimbd.
摘要
Large text corpora are the backbone of language models, yet we have only a limited understanding of their content, including general statistics, quality, social factors, and the inclusion of evaluation data (contamination). In this work, we propose What's In My Big Data? (WIMBD), a platform and a set of sixteen analyses that reveal and compare the contents of large text corpora. WIMBD builds on two basic capabilities, count and search, at scale, allowing us to analyze more than 35 terabytes on a standard compute node. We apply WIMBD to ten corpora used to train popular language models, including C4, The Pile, and RedPajama. Our analysis uncovers several surprising and previously undocumented findings, including the high prevalence of duplicate, synthetic, and low-quality content, personally identifiable information, toxic language, and benchmark contamination. For example, we find that about 50% of the documents in RedPajama and LAION-2B-en are duplicates. In addition, several datasets used for benchmarking models trained on these corpora are contaminated with respect to important benchmarks, including the Winograd Schema Challenge and parts of GLUE and SuperGLUE. We open-source WIMBD's code and artifacts to provide a standard set of evaluations for new text-based corpora and to encourage more analyses and transparency around them: github.com/allenai/wimbd.
Text-Transport: Toward Learning Causal Effects of Natural Language
results: The authors provide statistical guarantees on the uncertainty of the Text-Transport estimator and report experiments and analyses across different data settings. Finally, they use Text-Transport to study hate speech on social media, demonstrating that transporting causal effects is necessary when conducting causal inference on natural language across text domains.Abstract
As language technologies gain prominence in real-world settings, it is important to understand how changes to language affect reader perceptions. This can be formalized as the causal effect of varying a linguistic attribute (e.g., sentiment) on a reader's response to the text. In this paper, we introduce Text-Transport, a method for estimation of causal effects from natural language under any text distribution. Current approaches for valid causal effect estimation require strong assumptions about the data, meaning the data from which one can estimate valid causal effects often is not representative of the actual target domain of interest. To address this issue, we leverage the notion of distribution shift to describe an estimator that transports causal effects between domains, bypassing the need for strong assumptions in the target domain. We derive statistical guarantees on the uncertainty of this estimator, and we report empirical results and analyses that support the validity of Text-Transport across data settings. Finally, we use Text-Transport to study a realistic setting--hate speech on social media--in which causal effects do shift significantly between text domains, demonstrating the necessity of transport when conducting causal inference on natural language.
摘要
As language technologies become more prevalent in real-world settings, it is crucial to understand how changes to language affect readers' perceptions. This can be formalized as the causal effect of varying a linguistic attribute (e.g., sentiment) on a reader's response to the text. In this paper, we introduce Text-Transport, a method for estimating causal effects from natural language under any text distribution. Current approaches for valid causal effect estimation require strong assumptions about the data, meaning the data from which one can estimate valid causal effects is often not representative of the actual target domain of interest. To address this issue, we leverage the notion of distribution shift to describe an estimator that transports causal effects between domains, bypassing the need for strong assumptions in the target domain. We derive statistical guarantees on the uncertainty of this estimator, and we report empirical results and analyses that support the validity of Text-Transport across data settings. Finally, we use Text-Transport to study a realistic setting—hate speech on social media—in which causal effects do shift significantly between text domains, demonstrating the necessity of transport when conducting causal inference on natural language.
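To make the distribution-shift idea concrete, the following is a minimal, illustrative sketch (not the authors' estimator) of transporting an average reader response from a source text domain to a target domain by importance weighting. The density ratio is estimated with a simple logistic-regression domain classifier, and all data is synthetic.

```python
# Toy sketch: reweight source-domain responses by p_target(x)/p_source(x) to
# estimate the average response that would be observed in the target domain.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic "text features" (e.g., sentence embeddings) for two domains.
X_source = rng.normal(loc=0.0, scale=1.0, size=(500, 5))
X_target = rng.normal(loc=0.5, scale=1.0, size=(500, 5))
# Reader responses observed only in the source domain.
y_source = X_source[:, 0] + rng.normal(scale=0.1, size=500)

# Train a domain classifier: label 1 = target, 0 = source.
X = np.vstack([X_source, X_target])
d = np.concatenate([np.zeros(len(X_source)), np.ones(len(X_target))])
clf = LogisticRegression(max_iter=1000).fit(X, d)

# Density ratio via the classifier odds (assuming equal domain sample sizes).
p_target = clf.predict_proba(X_source)[:, 1]
weights = p_target / (1.0 - p_target)
weights /= weights.mean()  # self-normalise

print(f"naive source-domain mean: {y_source.mean():.3f}")
print(f"transported to target:    {np.average(y_source, weights=weights):.3f}")
```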
Non-Compositionality in Sentiment: New Data and Analyses
results: The study finds that the phrase ratings in the NonCompSST resource exhibit a notable degree of non-compositionality, and computational models for sentiment analysis are evaluated using this new resource.Abstract
When natural language phrases are combined, their meaning is often more than the sum of their parts. In the context of NLP tasks such as sentiment analysis, where the meaning of a phrase is its sentiment, that still applies. Many NLP studies on sentiment analysis, however, focus on the fact that sentiment computations are largely compositional. We, instead, set out to obtain non-compositionality ratings for phrases with respect to their sentiment. Our contributions are as follows: a) a methodology for obtaining those non-compositionality ratings, b) a resource of ratings for 259 phrases -- NonCompSST -- along with an analysis of that resource, and c) an evaluation of computational models for sentiment analysis using this new resource.
摘要
When natural language phrases are combined, their meaning is often more than the sum of their parts. In the context of NLP tasks such as sentiment analysis, where the meaning of a phrase is its sentiment, this still applies. Many NLP studies on sentiment analysis, however, focus on the fact that sentiment computations are largely compositional. We instead set out to obtain non-compositionality ratings for phrases with respect to their sentiment. Our contributions are: a) a methodology for obtaining those non-compositionality ratings, b) a resource of ratings for 259 phrases, NonCompSST, along with an analysis of that resource, and c) an evaluation of computational models for sentiment analysis using this new resource.
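As a small illustration of what a non-compositionality rating for sentiment could measure, the sketch below compares a holistic phrase-level sentiment score with a naively composed score from word-level scores. The lexicon and phrase ratings are made-up toy values, not NonCompSST data or the authors' annotation pipeline.

```python
# Toy sentiment non-compositionality: the gap between a phrase's holistic rating
# and the score composed from its words.
word_sentiment = {"dead": -0.8, "funny": 0.7, "cold": -0.3, "blooded": -0.4}
phrase_sentiment = {"dead funny": 0.8, "cold blooded": -0.9}

def composed_score(phrase: str) -> float:
    """Naive composition: average the word-level sentiment scores."""
    words = phrase.split()
    return sum(word_sentiment.get(w, 0.0) for w in words) / len(words)

def non_compositionality(phrase: str) -> float:
    """Absolute gap between the holistic rating and the composed score."""
    return abs(phrase_sentiment[phrase] - composed_score(phrase))

for p in phrase_sentiment:
    print(f"{p!r}: composed={composed_score(p):+.2f}, rated={phrase_sentiment[p]:+.2f}, "
          f"non-compositionality={non_compositionality(p):.2f}")
```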
results: The paper proposes a new NLP playground that can help academic researchers play to their strengths and foster innovation and development in the NLP field.Abstract
The recent explosion of performance of large language models (LLMs) has changed the field of Natural Language Processing (NLP) more abruptly and seismically than any other shift in the field's 80-year history. This has resulted in concerns that the field will become homogenized and resource-intensive. The new status quo has put many academic researchers, especially PhD students, at a disadvantage. This paper aims to define a new NLP playground by proposing 20+ PhD-dissertation-worthy research directions, covering theoretical analysis, new and challenging problems, learning paradigms, and interdisciplinary applications.
摘要
The recent explosion in the performance of large language models (LLMs) has changed the field of Natural Language Processing (NLP) more abruptly and seismically than any other shift in its history, raising concerns that the field will become homogenized and resource-intensive. This new status quo has put many academic researchers, especially PhD students, at a disadvantage. This paper proposes 20+ PhD-dissertation-worthy research directions, covering theoretical analysis, new and challenging problems, learning paradigms, and interdisciplinary applications.
The Unreasonable Effectiveness of Random Target Embeddings for Continuous-Output Neural Machine Translation
results: The study finds that completely random output embeddings can outperform laboriously pretrained ones on larger datasets, especially for rare words. It further finds that this surprising effect is strongest for rare words, owing to the geometry of their embeddings.Abstract
Continuous-output neural machine translation (CoNMT) replaces the discrete next-word prediction problem with an embedding prediction. The semantic structure of the target embedding space (i.e., closeness of related words) is intuitively believed to be crucial. We challenge this assumption and show that completely random output embeddings can outperform laboriously pretrained ones, especially on larger datasets. Further investigation shows this surprising effect is strongest for rare words, due to the geometry of their embeddings. We shed further light on this finding by designing a mixed strategy that combines random and pre-trained embeddings for different tokens.
摘要
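The decoding step this paper studies is easy to picture with a small sketch: each vocabulary item gets a fixed (here random, never-trained) output embedding, the model predicts a vector, and the output token is the nearest neighbour by cosine similarity. The vocabulary and the "predicted" vector below are toy stand-ins, not the paper's setup.

```python
# Minimal continuous-output decoding with random target embeddings.
import numpy as np

rng = np.random.default_rng(42)
vocab = ["the", "cat", "sat", "mat", "rare-word"]
dim = 16

# Fixed random output embeddings, unit-normalised (never trained).
E = rng.normal(size=(len(vocab), dim))
E /= np.linalg.norm(E, axis=1, keepdims=True)

def decode(predicted_vec: np.ndarray) -> str:
    """Return the vocabulary item whose embedding is closest in cosine similarity."""
    v = predicted_vec / np.linalg.norm(predicted_vec)
    return vocab[int(np.argmax(E @ v))]

# Pretend the decoder produced a vector near the embedding of "cat".
predicted = E[1] + 0.1 * rng.normal(size=dim)
print(decode(predicted))  # -> "cat" (with high probability)
```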
Increasing The Performance of Cognitively Inspired Data-Efficient Language Models via Implicit Structure Building
results: Across the 39 tasks, the models perform well on some tasks but do not consistently beat the provided RoBERTa baseline on all of them.Abstract
In this paper, we describe our submission to the BabyLM Challenge 2023 shared task on data-efficient language model (LM) pretraining (Warstadt et al., 2023). We train transformer-based masked language models that incorporate unsupervised predictions about hierarchical sentence structure into the model architecture. Concretely, we use the Structformer architecture (Shen et al., 2021) and variants thereof. StructFormer models have been shown to perform well on unsupervised syntactic induction based on limited pretraining data, and to yield performance improvements over a vanilla transformer architecture (Shen et al., 2021). Evaluation of our models on 39 tasks provided by the BabyLM challenge shows promising improvements of models that integrate a hierarchical bias into the architecture at some particular tasks, even though they fail to consistently outperform the RoBERTa baseline model provided by the shared task organizers on all tasks.
摘要
In this paper, we describe our submission to the BabyLM Challenge 2023 shared task on data-efficient language model (LM) pretraining (Warstadt et al., 2023). We train transformer-based masked language models that incorporate unsupervised predictions about hierarchical sentence structure into the model architecture. Concretely, we use the StructFormer architecture (Shen et al., 2021) and variants thereof. StructFormer models have been shown to perform well on unsupervised syntactic induction from limited pretraining data and to yield performance improvements over a vanilla transformer architecture (Shen et al., 2021). Evaluation of our models on the 39 tasks provided by the BabyLM challenge shows promising improvements for models that integrate a hierarchical bias into the architecture on some tasks, even though they fail to consistently outperform the RoBERTa baseline model provided by the shared task organizers on all tasks.
Zero-Shot Medical Information Retrieval via Knowledge Graph Embedding
results: Experimental results show that MedFusionRank performs well on medical datasets, with promising results across a variety of evaluation metrics. The method is effective at retrieving relevant medical information even from short or single-term queries.Abstract
In the era of the Internet of Things (IoT), the retrieval of relevant medical information has become essential for efficient clinical decision-making. This paper introduces MedFusionRank, a novel approach to zero-shot medical information retrieval (MIR) that combines the strengths of pre-trained language models and statistical methods while addressing their limitations. The proposed approach leverages a pre-trained BERT-style model to extract compact yet informative keywords. These keywords are then enriched with domain knowledge by linking them to conceptual entities within a medical knowledge graph. Experimental evaluations on medical datasets demonstrate MedFusion Rank's superior performance over existing methods, with promising results with a variety of evaluation metrics. MedFusionRank demonstrates efficacy in retrieving relevant information, even from short or single-term queries.
摘要
In the era of the Internet of Things (IoT), retrieving relevant medical information has become essential for efficient clinical decision-making. This paper introduces MedFusionRank, a novel approach to zero-shot medical information retrieval (MIR) that combines the strengths of pre-trained language models and statistical methods while addressing their limitations. The proposed approach uses a pre-trained BERT-style model to extract compact yet informative keywords, which are then enriched with domain knowledge by linking them to conceptual entities within a medical knowledge graph. Experimental evaluations on medical datasets demonstrate MedFusionRank's superior performance over existing methods, with promising results across a variety of evaluation metrics, and show that it can retrieve relevant information even from short or single-term queries.
Leveraging Word Guessing Games to Assess the Intelligence of Large Language Models
methods: The word guessing game (DEEP) is used to evaluate LLMs' expression and disguising abilities, and a multi-agent framework, SpyGame, is introduced to assess LLMs' intelligence and adaptability.
results: Experiments show that the proposed DEEP and SpyGame effectively evaluate the capabilities of various LLMs, capturing their ability to adapt to novel situations and to engage in strategic communication.Abstract
The automatic evaluation of LLM-based agent intelligence is critical in developing advanced LLM-based agents. Although considerable effort has been devoted to developing human-annotated evaluation datasets, such as AlpacaEval, existing techniques are costly, time-consuming, and lack adaptability. In this paper, inspired by the popular language game ``Who is Spy'', we propose to use the word guessing game to assess the intelligence performance of LLMs. Given a word, the LLM is asked to describe the word and determine its identity (spy or not) based on its and other players' descriptions. Ideally, an advanced agent should possess the ability to accurately describe a given word using an aggressive description while concurrently maximizing confusion in the conservative description, enhancing its participation in the game. To this end, we first develop DEEP to evaluate LLMs' expression and disguising abilities. DEEP requires LLM to describe a word in aggressive and conservative modes. We then introduce SpyGame, an interactive multi-agent framework designed to assess LLMs' intelligence through participation in a competitive language-based board game. Incorporating multi-agent interaction, SpyGame requires the target LLM to possess linguistic skills and strategic thinking, providing a more comprehensive evaluation of LLMs' human-like cognitive abilities and adaptability in complex communication situations. The proposed evaluation framework is very easy to implement. We collected words from multiple sources, domains, and languages and used the proposed evaluation framework to conduct experiments. Extensive experiments demonstrate that the proposed DEEP and SpyGame effectively evaluate the capabilities of various LLMs, capturing their ability to adapt to novel situations and engage in strategic communication.
摘要
The automatic evaluation of LLM-based agent intelligence is critical to developing advanced LLM-based agents. Although considerable effort has been devoted to human-annotated evaluation datasets such as AlpacaEval, existing techniques are costly, time-consuming, and lack adaptability. Inspired by the popular language game "Who is the Spy", we propose using a word guessing game to assess the intelligence of LLMs. Given a word, the LLM is asked to describe it and to determine its identity (spy or not) based on its own and other players' descriptions. Ideally, an advanced agent should be able to describe a given word accurately in an aggressive description while maximizing confusion in a conservative description, enhancing its participation in the game. To this end, we first develop DEEP to evaluate LLMs' expression and disguising abilities; DEEP requires the LLM to describe a word in aggressive and conservative modes. We then introduce SpyGame, an interactive multi-agent framework designed to assess LLMs' intelligence through participation in a competitive language-based board game. By incorporating multi-agent interaction, SpyGame requires the target LLM to possess linguistic skill and strategic thinking, providing a more comprehensive evaluation of LLMs' human-like cognitive abilities and adaptability in complex communication situations. The proposed evaluation framework is easy to implement. We collected words from multiple sources, domains, and languages and conducted experiments with the framework. Extensive experiments demonstrate that DEEP and SpyGame effectively evaluate the capabilities of various LLMs, capturing their ability to adapt to novel situations and engage in strategic communication.
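The two DEEP description modes can be pictured with a small prompting sketch. The prompt wording and the `query_llm` helper below are hypothetical stand-ins for whatever templates and chat API are actually used; they are not the authors' implementation.

```python
# Hedged sketch of prompting a word in "aggressive" vs. "conservative" mode.
def aggressive_prompt(word: str) -> str:
    return (f"Describe the word '{word}' as precisely as possible so that "
            f"another player could guess it, without using the word itself.")

def conservative_prompt(word: str) -> str:
    return (f"Describe the word '{word}' truthfully but as vaguely as possible, "
            f"so that other players cannot easily guess it. Do not use the word itself.")

def query_llm(prompt: str) -> str:
    # Placeholder: a real setup would call an LLM chat-completion API here.
    return f"[LLM response to: {prompt}]"

if __name__ == "__main__":
    word = "umbrella"
    print(query_llm(aggressive_prompt(word)))
    print(query_llm(conservative_prompt(word)))
```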
Multi-User MultiWOZ: Task-Oriented Dialogues among Multiple Users
results: Experiments show that using predicted rewrites improves dialogue state tracking in multi-user dialogues without modifying existing dialogue systems, and that the method generalizes to unseen domains.Abstract
While most task-oriented dialogues assume conversations between the agent and one user at a time, dialogue systems are increasingly expected to communicate with multiple users simultaneously who make decisions collaboratively. To facilitate development of such systems, we release the Multi-User MultiWOZ dataset: task-oriented dialogues among two users and one agent. To collect this dataset, each user utterance from MultiWOZ 2.2 was replaced with a small chat between two users that is semantically and pragmatically consistent with the original user utterance, thus resulting in the same dialogue state and system response. These dialogues reflect interesting dynamics of collaborative decision-making in task-oriented scenarios, e.g., social chatter and deliberation. Supported by this data, we propose the novel task of multi-user contextual query rewriting: to rewrite a task-oriented chat between two users as a concise task-oriented query that retains only task-relevant information and that is directly consumable by the dialogue system. We demonstrate that in multi-user dialogues, using predicted rewrites substantially improves dialogue state tracking without modifying existing dialogue systems that are trained for single-user dialogues. Further, this method surpasses training a medium-sized model directly on multi-user dialogues and generalizes to unseen domains.
摘要
While most task-oriented dialogue systems assume conversations between an agent and a single user, dialogue systems are increasingly expected to communicate with multiple users simultaneously who make decisions collaboratively. To facilitate the development of such systems, we release the Multi-User MultiWOZ dataset: task-oriented dialogues among two users and one agent. To collect it, each user utterance from MultiWOZ 2.2 was replaced with a small chat between two users that is semantically and pragmatically consistent with the original utterance, resulting in the same dialogue state and system response. These dialogues reflect interesting dynamics of collaborative decision-making in task-oriented scenarios, such as social chatter and deliberation. Supported by this data, we propose the novel task of multi-user contextual query rewriting: rewriting a task-oriented chat between two users as a concise task-oriented query that retains only task-relevant information and can be consumed directly by the dialogue system. We show that using predicted rewrites substantially improves dialogue state tracking in multi-user dialogues without modifying existing dialogue systems trained for single-user dialogues. This method also surpasses training a medium-sized model directly on multi-user dialogues and generalizes to unseen domains.
Representativeness as a Forgotten Lesson for Multilingual and Code-switched Data Collection and Preparation
results: The study finds that most code-switching datasets focus on English and ignore other language pairs, and that flaws in the collection and preparation stages limit their representativeness. In addition, a lack of clarity about the data selection and filtering stages further obscures how representative the datasets are.Abstract
Multilingualism is widespread around the world and code-switching (CSW) is a common practice among different language pairs/tuples across locations and regions. However, there is still not much progress in building successful CSW systems, despite the recent advances in Massive Multilingual Language Models (MMLMs). We investigate the reasons behind this setback through a critical study about the existing CSW data sets (68) across language pairs in terms of the collection and preparation (e.g. transcription and annotation) stages. This in-depth analysis reveals that \textbf{a)} most CSW data involves English ignoring other language pairs/tuples \textbf{b)} there are flaws in terms of representativeness in data collection and preparation stages due to ignoring the location based, socio-demographic and register variation in CSW. In addition, lack of clarity on the data selection and filtering stages shadow the representativeness of CSW data sets. We conclude by providing a short check-list to improve the representativeness for forthcoming studies involving CSW data collection and preparation.
摘要
Multilingualism is widespread around the world, and code-switching (CSW) is a common practice among different language pairs/tuples across locations and regions. Yet there has been little progress in building successful CSW systems, despite recent advances in massive multilingual language models (MMLMs). We investigate the reasons behind this setback through a critical study of 68 existing CSW datasets across language pairs, focusing on the collection and preparation stages (e.g., transcription and annotation). This in-depth analysis reveals that a) most CSW data involves English and ignores other language pairs/tuples, and b) there are flaws in representativeness at the collection and preparation stages because location-based, socio-demographic, and register variation in CSW is ignored. In addition, a lack of clarity about the data selection and filtering stages obscures the representativeness of CSW datasets. We conclude with a short checklist to improve representativeness in forthcoming studies involving CSW data collection and preparation.
Towards a Deep Understanding of Multilingual End-to-End Speech Translation
results: The analysis finds that (I) linguistic similarity loses its efficacy in multilingual end-to-end speech translation when training data for a language is limited; (II) when training data is not constrained, enhanced encoder representations and well-aligned audio-text data improve translation quality beyond bilingual counterparts; and (III) the encoder representations of multilingual end-to-end speech translation perform well at predicting phonetic features in linguistic typology prediction. These findings suggest a more effective approach: relaxing the data constraint for low-resource languages and combining them with linguistically related high-resource languages.Abstract
In this paper, we employ Singular Value Canonical Correlation Analysis (SVCCA) to analyze representations learnt in a multilingual end-to-end speech translation model trained over 22 languages. SVCCA enables us to estimate representational similarity across languages and layers, enhancing our understanding of the functionality of multilingual speech translation and its potential connection to multilingual neural machine translation. The multilingual speech translation model is trained on the CoVoST 2 dataset in all possible directions, and we utilize LASER to extract parallel bitext data for SVCCA analysis. We derive three major findings from our analysis: (I) Linguistic similarity loses its efficacy in multilingual speech translation when the training data for a specific language is limited. (II) Enhanced encoder representations and well-aligned audio-text data significantly improve translation quality, surpassing the bilingual counterparts when the training data is not compromised. (III) The encoder representations of multilingual speech translation demonstrate superior performance in predicting phonetic features in linguistic typology prediction. With these findings, we propose that releasing the constraint of limited data for low-resource languages and subsequently combining them with linguistically related high-resource languages could offer a more effective approach for multilingual end-to-end speech translation.
摘要
In this paper, we employ Singular Value Canonical Correlation Analysis (SVCCA) to analyze the representations learned by a multilingual end-to-end speech translation model trained on 22 languages. SVCCA lets us estimate representational similarity across languages and layers, improving our understanding of how multilingual speech translation works and of its potential connection to multilingual neural machine translation. The model is trained on the CoVoST 2 dataset in all possible directions, and we use LASER to extract parallel bitext data for the SVCCA analysis. We derive three major findings: (I) linguistic similarity loses its efficacy in multilingual speech translation when the training data for a specific language is limited; (II) enhanced encoder representations and well-aligned audio-text data significantly improve translation quality, surpassing bilingual counterparts when training data is not compromised; and (III) the encoder representations of multilingual speech translation are superior at predicting phonetic features in linguistic typology prediction. Based on these findings, we propose that relaxing the constraint of limited data for low-resource languages and combining them with linguistically related high-resource languages could offer a more effective approach to multilingual end-to-end speech translation.
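For readers unfamiliar with SVCCA, the following is a compact sketch of the standard procedure (an assumed reconstruction, not the authors' code): reduce each set of activations with an SVD and then compute canonical correlations between the reduced subspaces. The inputs here are toy random activations of shape (num_examples, num_neurons).

```python
import numpy as np

def svcca_similarity(X: np.ndarray, Y: np.ndarray, keep: float = 0.99) -> float:
    def reduce(A: np.ndarray) -> np.ndarray:
        A = A - A.mean(axis=0, keepdims=True)           # center each neuron
        U, S, _ = np.linalg.svd(A, full_matrices=False)
        energy = np.cumsum(S ** 2) / np.sum(S ** 2)
        k = int(np.searchsorted(energy, keep)) + 1       # keep `keep` of the variance
        return U[:, :k]                                  # orthonormal basis of top-k subspace
    Ux, Uy = reduce(X), reduce(Y)
    # Canonical correlations = singular values of the product of the two bases.
    rho = np.linalg.svd(Ux.T @ Uy, compute_uv=False)
    return float(rho.mean())

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))                                            # e.g., encoder states for language A
Y = X @ rng.normal(size=(64, 48)) + 0.1 * rng.normal(size=(200, 48))      # a related view of the same inputs
Z = rng.normal(size=(200, 48))                                            # an unrelated view
print("related  :", round(svcca_similarity(X, Y), 3))   # close to 1
print("unrelated:", round(svcca_similarity(X, Z), 3))   # noticeably lower
```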
The SourceData-NLP dataset: integrating curation into scientific publishing for training large language models
paper_authors: Jorge Abreu-Vicente, Hannah Sonntag, Thomas Eidens, Thomas Lemberger
For: The paper is written to demonstrate the value of integrating curation into the publishing process, and to provide a large-scale dataset (SourceData-NLP) for training and evaluating models for biomedical entity recognition and context-dependent semantic interpretation.* Methods: The paper uses a combination of natural language processing (NLP) techniques, including named-entity recognition (NER) and named-entity linking (NEL), to annotate biomedical entities in figure legends from molecular and cell biology papers. The authors also introduce a novel context-dependent semantic task that infers whether an entity is the target of a controlled intervention or the object of measurement.* Results: The paper presents the SourceData-NLP dataset, which contains over 620,000 annotated biomedical entities curated from 18,689 figures in 3,223 papers in molecular and cell biology. The authors also assess the performance of two transformer-based models (BioLinkBERT and PubmedBERT) fine-tuned on the SourceData-NLP dataset for NER, and introduce a novel context-dependent semantic task that infers whether an entity is the target of a controlled intervention or the object of measurement.Abstract
Introduction: The scientific publishing landscape is expanding rapidly, creating challenges for researchers to stay up-to-date with the evolution of the literature. Natural Language Processing (NLP) has emerged as a potent approach to automating knowledge extraction from this vast amount of publications and preprints. Tasks such as Named-Entity Recognition (NER) and Named-Entity Linking (NEL), in conjunction with context-dependent semantic interpretation, offer promising and complementary approaches to extracting structured information and revealing key concepts. Results: We present the SourceData-NLP dataset produced through the routine curation of papers during the publication process. A unique feature of this dataset is its emphasis on the annotation of bioentities in figure legends. We annotate eight classes of biomedical entities (small molecules, gene products, subcellular components, cell lines, cell types, tissues, organisms, and diseases), their role in the experimental design, and the nature of the experimental method as an additional class. SourceData-NLP contains more than 620,000 annotated biomedical entities, curated from 18,689 figures in 3,223 papers in molecular and cell biology. We illustrate the dataset's usefulness by assessing BioLinkBERT and PubmedBERT, two transformers-based models, fine-tuned on the SourceData-NLP dataset for NER. We also introduce a novel context-dependent semantic task that infers whether an entity is the target of a controlled intervention or the object of measurement. Conclusions: SourceData-NLP's scale highlights the value of integrating curation into publishing. Models trained with SourceData-NLP will furthermore enable the development of tools able to extract causal hypotheses from the literature and assemble them into knowledge graphs.
摘要
Introduction: The scientific publishing landscape is expanding rapidly, making it challenging for researchers to keep up with the evolution of the literature. Natural Language Processing (NLP) has emerged as a potent approach to automating knowledge extraction from this vast body of publications and preprints. Tasks such as Named-Entity Recognition (NER) and Named-Entity Linking (NEL), together with context-dependent semantic interpretation, offer promising and complementary approaches to extracting structured information and revealing key concepts. Results: We present the SourceData-NLP dataset, produced through the routine curation of papers during the publication process. A unique feature of this dataset is its emphasis on annotating bioentities in figure legends. We annotate eight classes of biomedical entities (small molecules, gene products, subcellular components, cell lines, cell types, tissues, organisms, and diseases), their role in the experimental design, and the nature of the experimental method as an additional class. SourceData-NLP contains more than 620,000 annotated biomedical entities curated from 18,689 figures in 3,223 molecular and cell biology papers. We illustrate the dataset's usefulness by assessing BioLinkBERT and PubmedBERT, two transformer-based models fine-tuned on SourceData-NLP for NER. We also introduce a novel context-dependent semantic task that infers whether an entity is the target of a controlled intervention or the object of measurement. Conclusions: The scale of SourceData-NLP highlights the value of integrating curation into publishing. Models trained on SourceData-NLP will also enable the development of tools that extract causal hypotheses from the literature and assemble them into knowledge graphs.
FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models
methods: The paper proposes FollowBench, a multi-level fine-grained constraints following benchmark covering five types of fine-grained constraints: content, scenario, style, format, and example. To precisely measure LLMs' instruction-following ability, it introduces a multi-level mechanism that incrementally adds a single constraint to the initial instruction at each level.
results: By testing nine prominent open-source and closed-source LLMs on FollowBench, the authors highlight weaknesses of LLMs in instruction following and point toward directions for future work.Abstract
The ability to follow instructions is crucial to Large Language Models (LLMs) to handle various real-world applications. Existing benchmarks primarily focus on evaluating superficial response quality, which does not necessarily indicate instruction-following capability. To fill this research gap, in this paper, we propose FollowBench, a Multi-level Fine-grained Constraints Following Benchmark for LLMs. FollowBench comprehensively includes five different types (i.e., Content, Scenario, Style, Format, and Example) of fine-grained constraints. To enable a precise constraint following estimation, we introduce a Multi-level mechanism that incrementally adds a single constraint to the initial instruction at each level. To evaluate whether LLMs' outputs have satisfied every individual constraint, we propose to prompt strong LLMs with constraint evolution paths to handle challenging semantic constraints. By evaluating nine closed-source and open-source popular LLMs on FollowBench, we highlight the weaknesses of LLMs in instruction following and point towards potential avenues for future work. The data and code are publicly available at https://github.com/YJiangcm/FollowBench.
摘要
The ability to follow instructions is crucial for large language models (LLMs) to handle real-world applications. Existing benchmarks primarily evaluate superficial response quality, which does not necessarily indicate instruction-following capability. To fill this research gap, this paper proposes FollowBench, a multi-level fine-grained constraints following benchmark for LLMs. FollowBench comprehensively covers five types of fine-grained constraints (content, scenario, style, format, and example). To enable a precise estimate of constraint following, we introduce a multi-level mechanism that incrementally adds a single constraint to the initial instruction at each level. To evaluate whether LLMs' outputs satisfy each individual constraint, we propose prompting strong LLMs with constraint evolution paths to handle challenging semantic constraints. By evaluating nine popular closed-source and open-source LLMs on FollowBench, we highlight weaknesses of LLMs in instruction following and point towards potential avenues for future work. The data and code are publicly available at https://github.com/YJiangcm/FollowBench.
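The multi-level mechanism is easy to illustrate: start from an initial instruction and add one fine-grained constraint per level, producing one prompt per level. The instruction and constraints below are invented examples for illustration, not FollowBench data.

```python
from typing import List

def build_levels(initial_instruction: str, constraints: List[str]) -> List[str]:
    """Level i contains the initial instruction plus the first i constraints."""
    prompts = []
    for level in range(1, len(constraints) + 1):
        active = constraints[:level]
        prompt = initial_instruction + "\nConstraints:\n" + "\n".join(
            f"{i + 1}. {c}" for i, c in enumerate(active))
        prompts.append(prompt)
    return prompts

constraints = [
    "Write the answer in exactly three sentences.",   # format
    "Adopt a formal, academic tone.",                  # style
    "Mention at least one concrete example.",          # content
    "Assume the reader is a high-school student.",     # scenario
]
for level, prompt in enumerate(build_levels("Explain what a transformer is.", constraints), 1):
    print(f"--- Level {level} ---\n{prompt}\n")
```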
AMERICANO: Argument Generation with Discourse-driven Decomposition and Agent Interaction
paper_authors: Zhe Hu, Hou Pong Chan, Yu Yin
for: The paper is written for the task of counterargument generation using a subset of the Reddit/CMV dataset.
methods: The paper proposes a novel framework called Americano, which uses agent interaction and decomposes the generation process into sequential actions grounded in argumentation theory. The approach includes an argument refinement module that evaluates and refines argument drafts based on feedback received.
results: The results show that the proposed method outperforms both end-to-end and chain-of-thought prompting methods and can generate more coherent and persuasive arguments with diverse and rich contents.Abstract
Argument generation is a challenging task in natural language processing, which requires rigorous reasoning and proper content organization. Inspired by recent chain-of-thought prompting that breaks down a complex task into intermediate steps, we propose Americano, a novel framework with agent interaction for argument generation. Our approach decomposes the generation process into sequential actions grounded on argumentation theory, which first executes actions sequentially to generate argumentative discourse components, and then produces a final argument conditioned on the components. To further mimic the human writing process and improve the left-to-right generation paradigm of current autoregressive language models, we introduce an argument refinement module which automatically evaluates and refines argument drafts based on feedback received. We evaluate our framework on the task of counterargument generation using a subset of Reddit/CMV dataset. The results show that our method outperforms both end-to-end and chain-of-thought prompting methods and can generate more coherent and persuasive arguments with diverse and rich contents.
摘要
Argument generation is a challenging task in natural language processing that requires rigorous reasoning and proper content organization. Inspired by recent chain-of-thought prompting, which breaks a complex task into intermediate steps, we propose Americano, a novel framework with agent interaction for argument generation. Our approach decomposes the generation process into sequential actions grounded in argumentation theory: it first executes actions sequentially to generate argumentative discourse components, and then produces a final argument conditioned on those components. To further mimic the human writing process and improve on the left-to-right generation paradigm of current autoregressive language models, we introduce an argument refinement module that automatically evaluates and refines argument drafts based on feedback received. We evaluate the framework on counterargument generation using a subset of the Reddit/CMV dataset. The results show that our method outperforms both end-to-end and chain-of-thought prompting methods and can generate more coherent and persuasive arguments with diverse and rich content.
Automatic Generators for a Family of Matrix Multiplication Routines with Apache TVM
results: Compared with the hand-coded micro-kernels of conventional high-performance libraries, the TVM-generated blocked matrix multiplication algorithms offer better maintainability and portability, and for specific matrix shapes reach performance on a par with hand-optimized libraries.Abstract
We explore the utilization of the Apache TVM open source framework to automatically generate a family of algorithms that follow the approach taken by popular linear algebra libraries, such as GotoBLAS2, BLIS and OpenBLAS, in order to obtain high-performance blocked formulations of the general matrix multiplication (GEMM). In addition, we fully automatize the generation process, by also leveraging the Apache TVM framework to derive a complete variety of the processor-specific micro-kernels for GEMM. This is in contrast with the convention in high performance libraries, which hand-encode a single micro-kernel per architecture using Assembly code. Overall, the combination of our TVM-generated blocked algorithms and micro-kernels for GEMM 1) improves portability, maintainability and, globally, streamlines the software life cycle; 2) provides high flexibility to easily tailor and optimize the solution to different data types, processor architectures, and matrix operand shapes, yielding performance on a par (or even superior for specific matrix shapes) with that of hand-tuned libraries; and 3) features a small memory footprint.
摘要
We explore the use of the Apache TVM open source framework to automatically generate a family of algorithms that follow the approach taken by popular linear algebra libraries, such as GotoBLAS2, BLIS and OpenBLAS, in order to obtain high-performance blocked formulations of the general matrix multiplication (GEMM). In addition, we fully automate the generation process by also leveraging Apache TVM to derive a complete variety of processor-specific micro-kernels for GEMM, in contrast with the convention in high-performance libraries of hand-encoding a single micro-kernel per architecture in assembly code. Overall, the combination of our TVM-generated blocked algorithms and micro-kernels for GEMM: 1) improves portability and maintainability and streamlines the software life cycle; 2) provides high flexibility to tailor and optimize the solution to different data types, processor architectures, and matrix operand shapes, yielding performance on a par with (or even superior to, for specific matrix shapes) hand-tuned libraries; and 3) features a small memory footprint.
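The blocked formulation that the paper generates automatically can be sketched in plain Python/NumPy. This is only a conceptual illustration of the blocking scheme used by libraries such as BLIS and GotoBLAS2, not the TVM generator or its schedules; the block sizes are arbitrary, and real implementations tune them to the cache hierarchy and replace the inner update with an architecture-specific SIMD micro-kernel.

```python
import numpy as np

def micro_kernel(C_blk, A_blk, B_blk):
    """Innermost update C_blk += A_blk @ B_blk (stand-in for a SIMD micro-kernel)."""
    C_blk += A_blk @ B_blk

def blocked_gemm(A, B, mc=64, nc=64, kc=64):
    m, k = A.shape
    k2, n = B.shape
    assert k == k2
    C = np.zeros((m, n), dtype=A.dtype)
    for jc in range(0, n, nc):            # loop over column panels of B/C
        for pc in range(0, k, kc):        # loop over the shared dimension
            for ic in range(0, m, mc):    # loop over row panels of A/C
                micro_kernel(C[ic:ic + mc, jc:jc + nc],
                             A[ic:ic + mc, pc:pc + kc],
                             B[pc:pc + kc, jc:jc + nc])
    return C

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 150))
B = rng.standard_normal((150, 180))
print(np.allclose(blocked_gemm(A, B), A @ B))  # True
```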
InstructCoder: Empowering Language Models for Code Editing
for: This paper is written to explore the use of large language models (LLMs) for automatic code editing based on user instructions, and to introduce the first dataset (InstructCoder) designed for this purpose.
methods: The paper uses a combination of code editing data sourced from GitHub commits and seed tasks to fine-tune open-source LLMs, and then uses these fine-tuned models to edit code based on users’ instructions.
results: The paper demonstrates that the fine-tuned LLMs can edit code correctly most of the time, exhibiting unprecedented code-editing performance levels, suggesting that proficient instruction fine-tuning can lead to significant improvements in code editing abilities.Abstract
Code editing encompasses a variety of pragmatic tasks that developers deal with daily. Despite its relevance and practical usefulness, automatic code editing remains an underexplored area in the evolution of deep learning models, partly due to data scarcity. In this work, we explore the use of large language models (LLMs) to edit code based on user instructions, covering a broad range of implicit tasks such as comment insertion, code optimization, and code refactoring. To facilitate this, we introduce InstructCoder, the first dataset designed to adapt LLMs for general-purpose code editing, containing highdiversity code-editing tasks. It consists of over 114,000 instruction-input-output triplets and covers multiple distinct code editing scenarios. The dataset is systematically expanded through an iterative process that commences with code editing data sourced from GitHub commits as seed tasks. Seed and generated tasks are used subsequently to prompt ChatGPT for more task data. Our experiments demonstrate that open-source LLMs fine-tuned on InstructCoder can edit code correctly based on users' instructions most of the time, exhibiting unprecedented code-editing performance levels. Such results suggest that proficient instruction-finetuning can lead to significant amelioration in code editing abilities. The dataset and the source code are available at https://github.com/qishenghu/CodeInstruct.
摘要
Code editing encompasses a variety of pragmatic tasks that developers deal with daily. Despite its relevance and practical usefulness, automatic code editing remains an underexplored area in the evolution of deep learning models, partly due to data scarcity. In this work, we explore the use of large language models (LLMs) to edit code based on user instructions, covering a broad range of implicit tasks such as comment insertion, code optimization, and code refactoring. To facilitate this, we introduce InstructCoder, the first dataset designed to adapt LLMs for general-purpose code editing, containing high-diversity code-editing tasks. It consists of over 114,000 instruction-input-output triplets and covers multiple distinct code-editing scenarios. The dataset is systematically expanded through an iterative process that starts with code-editing data sourced from GitHub commits as seed tasks; seed and generated tasks are then used to prompt ChatGPT for more task data. Our experiments show that open-source LLMs fine-tuned on InstructCoder can edit code correctly based on users' instructions most of the time, exhibiting unprecedented code-editing performance. These results suggest that proficient instruction fine-tuning can lead to significant improvements in code-editing ability. The dataset and source code are available at https://github.com/qishenghu/CodeInstruct.
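To make the data format concrete, the sketch below shows one way an instruction-input-output code-editing triplet might be serialised into a single training prompt for supervised fine-tuning. The field names and prompt template are assumptions for illustration, not the released InstructCoder format.

```python
triplet = {
    "instruction": "Rename the variable `tmp` to `total` and add a docstring.",
    "input": "def add(a, b):\n    tmp = a + b\n    return tmp\n",
    "output": 'def add(a, b):\n    """Return the sum of a and b."""\n'
              "    total = a + b\n    return total\n",
}

PROMPT_TEMPLATE = (
    "Below is an instruction that describes a code edit, paired with the code to edit.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

def to_training_example(t: dict) -> dict:
    """Prompt/completion pair; the training loss would typically apply to the completion only."""
    return {"prompt": PROMPT_TEMPLATE.format(**t), "completion": t["output"]}

example = to_training_example(triplet)
print(example["prompt"] + example["completion"])
```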
ChiSCor: A Corpus of Freely Told Fantasy Stories by Dutch Children for Computational Linguistics and Cognitive Science
results: The study finds that the syntactic complexity of children's stories is strikingly stable across ages. It also shows that Zipf's law holds closely in this freely told speech, reflecting the social context of the stories. Finally, it shows that even though the corpus is relatively small, it is rich enough to train informative lemma vectors for analysing children's language use.Abstract
In this resource paper we release ChiSCor, a new corpus containing 619 fantasy stories, told freely by 442 Dutch children aged 4-12. ChiSCor was compiled for studying how children render character perspectives, and unravelling language and cognition in development, with computational tools. Unlike existing resources, ChiSCor's stories were produced in natural contexts, in line with recent calls for more ecologically valid datasets. ChiSCor hosts text, audio, and annotations for character complexity and linguistic complexity. Additional metadata (e.g. education of caregivers) is available for one third of the Dutch children. ChiSCor also includes a small set of 62 English stories. This paper details how ChiSCor was compiled and shows its potential for future work with three brief case studies: i) we show that the syntactic complexity of stories is strikingly stable across children's ages; ii) we extend work on Zipfian distributions in free speech and show that ChiSCor obeys Zipf's law closely, reflecting its social context; iii) we show that even though ChiSCor is relatively small, the corpus is rich enough to train informative lemma vectors that allow us to analyse children's language use. We end with a reflection on the value of narrative datasets in computational linguistics.
摘要
In this resource paper we release ChiSCor, a new corpus containing 619 fantasy stories told freely by 442 Dutch children aged 4-12. ChiSCor was compiled for studying how children render character perspectives and for unravelling language and cognition in development with computational tools. Unlike existing resources, ChiSCor's stories were produced in natural contexts, in line with recent calls for more ecologically valid datasets. ChiSCor hosts text, audio, and annotations for character complexity and linguistic complexity; additional metadata (e.g., caregivers' education) is available for one third of the Dutch children. ChiSCor also includes a small set of 62 English stories. This paper details how ChiSCor was compiled and shows its potential for future work through three brief case studies: i) the syntactic complexity of stories is strikingly stable across children's ages; ii) extending work on Zipfian distributions in free speech, ChiSCor obeys Zipf's law closely, reflecting its social context; iii) even though ChiSCor is relatively small, the corpus is rich enough to train informative lemma vectors for analysing children's language use. We end with a reflection on the value of narrative datasets in computational linguistics.
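The Zipf check reported for ChiSCor can be reproduced in spirit with a few lines: rank the token frequencies of a corpus and fit a line in log-log space, where a slope near -1 indicates a close fit to Zipf's law. The "corpus" below is a toy list of sentences, not ChiSCor data.

```python
import numpy as np
from collections import Counter

corpus = [
    "once upon a time there was a dragon",
    "the dragon lived in a cave near the village",
    "the children of the village told stories about the dragon",
]
tokens = " ".join(corpus).split()
freqs = np.array(sorted(Counter(tokens).values(), reverse=True), dtype=float)
ranks = np.arange(1, len(freqs) + 1)

# Fit log(frequency) = slope * log(rank) + intercept.
slope, intercept = np.polyfit(np.log(ranks), np.log(freqs), deg=1)
print(f"estimated Zipf exponent: {slope:.2f}")  # Zipf's law predicts roughly -1
```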
paper_authors: Manex Agirrezabal, Hugo Gonçalo Oliveira, Aitor Ormazabal
for: This paper presents a framework designed for the automated evaluation of poetry.
methods: The framework employs a diverse set of features; the paper gives a brief overview of Erato's capabilities and discusses its potential for expansion.
results: Using Erato, human-authored poetry is compared and contrasted with automatically generated poetry, demonstrating its effectiveness in identifying key differences.Abstract
We present Erato, a framework designed to facilitate the automated evaluation of poetry, including that generated by poetry generation systems. Our framework employs a diverse set of features, and we offer a brief overview of Erato's capabilities and its potential for expansion. Using Erato, we compare and contrast human-authored poetry with automatically-generated poetry, demonstrating its effectiveness in identifying key differences. Our implementation code and software are freely available under the GNU GPLv3 license.
摘要
We present Erato, a framework designed to facilitate the automated evaluation of poetry, including poetry produced by generation systems. The framework employs a diverse set of features, and we offer a brief overview of its capabilities and its potential for expansion. Using Erato, we compare and contrast human-authored poetry with automatically generated poetry, demonstrating its effectiveness in identifying key differences. Our implementation code and software are freely available under the GNU GPLv3 license.
results: The approach achieves a TDE accuracy of 93.43%, placing second on the leaderboard and demonstrating the effectiveness of the proposed method.Abstract
The FA team participated in the Table Data Extraction (TDE) and Text-to-Table Relationship Extraction (TTRE) tasks of the NTCIR-17 Understanding of Non-Financial Objects in Financial Reports (UFO). This paper reports our approach to solving the problems and discusses the official results. We successfully utilized various enhancement techniques based on the ELECTRA language model to extract valuable data from tables. Our efforts resulted in an impressive TDE accuracy rate of 93.43 %, positioning us in second place on the Leaderboard rankings. This outstanding achievement is a testament to our proposed approach's effectiveness. In the TTRE task, we proposed a rule-based method to extract meaningful relationships between text and tables and confirmed its performance.
摘要
The FA team participated in the Table Data Extraction (TDE) and Text-to-Table Relationship Extraction (TTRE) tasks of NTCIR-17 Understanding of Non-Financial Objects in Financial Reports (UFO). This paper describes our approach to these problems and discusses the official results. We successfully applied various enhancement techniques based on the ELECTRA language model to extract valuable data from tables. Our efforts resulted in a TDE accuracy of 93.43%, placing us second in the leaderboard rankings and attesting to the effectiveness of the proposed approach. For the TTRE task, we proposed a rule-based method to extract meaningful relationships between text and tables and confirmed its performance.
Extracting Entities of Interest from Comparative Product Reviews
results: On existing manually labeled datasets, the proposed method outperforms the Semantic Role Labeling (SRL) framework commonly used for this task.Abstract
This paper presents a deep learning based approach to extract product comparison information out of user reviews on various e-commerce websites. Any comparative product review has three major entities of information: the names of the products being compared, the user opinion (predicate) and the feature or aspect under comparison. All these informing entities are dependent on each other and bound by the rules of the language, in the review. We observe that their inter-dependencies can be captured well using LSTMs. We evaluate our system on existing manually labeled datasets and observe out-performance over the existing Semantic Role Labeling (SRL) framework popular for this task.
摘要
This paper presents a deep learning based approach to extract product comparison information from user reviews on various e-commerce websites. Any comparative product review has three major information entities: the names of the products being compared, the user opinion (predicate), and the feature or aspect under comparison. These entities depend on each other and are bound by the rules of the language used in the review. We observe that their inter-dependencies can be captured well using LSTMs. We evaluate our system on existing manually labeled datasets and observe that it outperforms the Semantic Role Labeling (SRL) framework popular for this task.
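A bidirectional LSTM tagger of the kind described above can be sketched in a few lines of PyTorch. This is a minimal illustration of the modelling idea, not the paper's architecture, tag set, or hyperparameters: each token of a review is classified as a product name, predicate, compared aspect, or other.

```python
import torch
import torch.nn as nn

TAGS = ["O", "PRODUCT", "PREDICATE", "ASPECT"]

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size: int, emb_dim: int = 64, hidden: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, len(TAGS))

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        x = self.embed(token_ids)   # (batch, seq, emb)
        h, _ = self.lstm(x)         # (batch, seq, 2*hidden)
        return self.out(h)          # per-token tag logits

model = BiLSTMTagger(vocab_size=1000)
dummy_batch = torch.randint(0, 1000, (2, 12))   # 2 reviews, 12 tokens each
print(model(dummy_batch).shape)                 # torch.Size([2, 12, 4])
```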
Learning to Play Chess from Textbooks (LEAP): a Corpus for Evaluating Chess Moves based on Sentiment Analysis
paper_authors: Haifa Alrdahi, Riza Batista-Navarro
for: This paper is written to explore the use of chess textbooks as a new knowledge source for enabling machines to learn how to play chess.
methods: The paper uses a dataset called LEAP, which is a heterogeneous dataset containing structured and unstructured data collected from a chess textbook. The authors labelled the sentences in the dataset based on their relevance and sentiment towards the described moves. They also employed transformer-based sentiment analysis models to evaluate the moves.
results: The best performing model obtained a weighted micro F_1 score of 68% in evaluating chess moves. The authors also synthesised the LEAP corpus to create a larger dataset that can be used to address the limited textual resource in the chess domain.Abstract
Learning chess strategies has been investigated widely, with most studies focussing on learning from previous games using search algorithms. Chess textbooks encapsulate grandmaster knowledge, explain playing strategies and require a smaller search space compared to traditional chess agents. This paper examines chess textbooks as a new knowledge source for enabling machines to learn how to play chess -- a resource that has not been explored previously. We developed the LEAP corpus, a first and new heterogeneous dataset with structured (chess move notations and board states) and unstructured data (textual descriptions) collected from a chess textbook containing 1164 sentences discussing strategic moves from 91 games. We firstly labelled the sentences based on their relevance, i.e., whether they are discussing a move. Each relevant sentence was then labelled according to its sentiment towards the described move. We performed empirical experiments that assess the performance of various transformer-based baseline models for sentiment analysis. Our results demonstrate the feasibility of employing transformer-based sentiment analysis models for evaluating chess moves, with the best performing model obtaining a weighted micro F_1 score of 68%. Finally, we synthesised the LEAP corpus to create a larger dataset, which can be used as a solution to the limited textual resource in the chess domain.
摘要
Learning chess strategies has been investigated widely, with most studies focusing on learning from previous games using search algorithms. Chess textbooks encapsulate grandmaster knowledge and explain playing strategies, and they require a smaller search space than traditional chess agents. This paper examines chess textbooks as a new knowledge source for enabling machines to learn how to play chess, a resource that has not been explored previously. We developed the LEAP corpus, a first and new heterogeneous dataset with structured data (chess move notations and board states) and unstructured data (textual descriptions), collected from a chess textbook containing 1164 sentences discussing strategic moves from 91 games. We first labelled the sentences by relevance, i.e., whether they discuss a move; each relevant sentence was then labelled by its sentiment towards the described move. We performed empirical experiments assessing the performance of various transformer-based baseline models for sentiment analysis. The results demonstrate the feasibility of employing transformer-based sentiment analysis models for evaluating chess moves, with the best performing model obtaining a weighted micro F_1 score of 68%. Finally, we synthesised the LEAP corpus to create a larger dataset that can be used as a solution to the limited textual resources in the chess domain.
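For readers unfamiliar with the evaluation metric, the snippet below shows how F1 aggregation over move-sentiment predictions is typically computed with scikit-learn. The labels are toy values, and since the paper describes its score as a weighted micro F1, both common sklearn averaging variants are shown.

```python
from sklearn.metrics import f1_score

y_true = ["positive", "negative", "positive", "neutral", "negative", "positive"]
y_pred = ["positive", "negative", "neutral",  "neutral", "positive", "positive"]

print("micro F1:   ", round(f1_score(y_true, y_pred, average="micro"), 3))
print("weighted F1:", round(f1_score(y_true, y_pred, average="weighted"), 3))
```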
PsyCoT: Psychological Questionnaire as Powerful Chain-of-Thought for Personality Detection
methods: The paper proposes a novel personality detection method, PsyCoT, which has an AI assistant specialised in text analysis rate carefully designed psychological questionnaire items over a multi-turn dialogue, improving GPT-3.5's performance and robustness in personality detection.
results: Experiments show that PsyCoT significantly improves GPT-3.5's personality detection, with average F1 score gains of 4.23/10.63 points on two benchmark datasets compared with the standard prompting method.Abstract
Recent advances in large language models (LLMs), such as ChatGPT, have showcased remarkable zero-shot performance across various NLP tasks. However, the potential of LLMs in personality detection, which involves identifying an individual's personality from their written texts, remains largely unexplored. Drawing inspiration from Psychological Questionnaires, which are carefully designed by psychologists to evaluate individual personality traits through a series of targeted items, we argue that these items can be regarded as a collection of well-structured chain-of-thought (CoT) processes. By incorporating these processes, LLMs can enhance their capabilities to make more reasonable inferences on personality from textual input. In light of this, we propose a novel personality detection method, called PsyCoT, which mimics the way individuals complete psychological questionnaires in a multi-turn dialogue manner. In particular, we employ a LLM as an AI assistant with a specialization in text analysis. We prompt the assistant to rate individual items at each turn and leverage the historical rating results to derive a conclusive personality preference. Our experiments demonstrate that PsyCoT significantly improves the performance and robustness of GPT-3.5 in personality detection, achieving an average F1 score improvement of 4.23/10.63 points on two benchmark datasets compared to the standard prompting method. Our code is available at https://github.com/TaoYang225/PsyCoT.
摘要
Recent advances in large language models (LLMs), such as ChatGPT, have shown remarkable zero-shot performance across various NLP tasks. However, the potential of LLMs for personality detection, i.e., identifying an individual's personality from their written texts, remains largely unexplored. Drawing inspiration from psychological questionnaires, which are carefully designed by psychologists to evaluate personality traits through a series of targeted items, we argue that these items can be regarded as a collection of well-structured chain-of-thought (CoT) processes. By incorporating these processes, LLMs can make more reasonable inferences about personality from textual input. We therefore propose a novel personality detection method, PsyCoT, which mimics the way individuals complete psychological questionnaires in a multi-turn dialogue. In particular, we employ an LLM as an AI assistant specialising in text analysis, prompt the assistant to rate individual items at each turn, and leverage the historical rating results to derive a conclusive personality preference. Our experiments show that PsyCoT significantly improves the performance and robustness of GPT-3.5 in personality detection, achieving average F1 score improvements of 4.23/10.63 points on two benchmark datasets compared with the standard prompting method. Our code is available at https://github.com/TaoYang225/PsyCoT.
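The multi-turn questionnaire loop can be pictured with a hedged sketch: present items one turn at a time, keep the dialogue history, collect item ratings, and derive a final judgement. The `chat` helper and the items below are hypothetical placeholders, not the authors' prompts or a validated questionnaire.

```python
from typing import List, Tuple

ITEMS = [
    "The author is talkative.",
    "The author tends to find fault with others.",
    "The author does a thorough job.",
]

def chat(history: List[Tuple[str, str]], user_msg: str) -> str:
    # Placeholder: a real implementation would call a chat-completion API with
    # the full dialogue history. Here we just return a fixed rating.
    return "3"

def psycot_rate(text: str, items: List[str]) -> List[int]:
    history: List[Tuple[str, str]] = [
        ("system", "You are an AI assistant specialising in text analysis."),
        ("user", f"Here is a text written by the author:\n{text}"),
    ]
    ratings = []
    for item in items:
        prompt = f"Rate the statement '{item}' from 1 (disagree) to 5 (agree). Answer with a number."
        reply = chat(history, prompt)
        history += [("user", prompt), ("assistant", reply)]
        ratings.append(int(reply.strip()))
    return ratings

ratings = psycot_rate("I love organising events and meeting new people!", ITEMS)
print(ratings, "-> mean item score:", sum(ratings) / len(ratings))
```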
Dynamically Updating Event Representations for Temporal Relation Classification with Multi-category Learning
results: Experimental results show that the proposal outperforms state-of-the-art models and two transfer learning baselines on both the English and Japanese data.Abstract
Temporal relation classification is a pair-wise task for identifying the relation of a temporal link (TLINK) between two mentions, i.e. event, time, and document creation time (DCT). This setup has two crucial limitations: 1) Two TLINKs involving a common mention do not share information. 2) Existing models with independent classifiers for each TLINK category (E2E, E2T, and E2D) cannot exploit the whole data. This paper presents an event centric model that manages dynamic event representations across multiple TLINKs. Our model deals with the three TLINK categories with multi-task learning to leverage the full size of data. The experimental results show that our proposal outperforms state-of-the-art models and two transfer learning baselines on both the English and Japanese data.
摘要
Temporal relation classification is a pairwise task that identifies the relation of a temporal link (TLINK) between two mentions, i.e., events, times, and the document creation time (DCT). This setup has two crucial limitations: 1) two TLINKs involving a common mention do not share information; 2) existing models use independent classifiers for each TLINK category (E2E, E2T, and E2D) and therefore cannot exploit the whole dataset. This paper presents an event-centric model that manages dynamic event representations across multiple TLINKs. The model handles the three TLINK categories with multi-task learning to leverage the full dataset. Experimental results show that the proposal outperforms state-of-the-art models and two transfer learning baselines on both English and Japanese data.
General-Purpose Retrieval-Enhanced Medical Prediction Model Using Near-Infinite History
methods: The Retrieval-Enhanced Medical prediction model (REMed) can evaluate an essentially unlimited number of clinical events, select the relevant ones, and make predictions. This removes the need for manual feature selection by experts and the restriction on observation window size, greatly accelerating development.
results: Experiments on 27 clinical tasks and two independent EHR datasets show that REMed outperforms other contemporary architectures designed to handle as many events as possible, and that its preferences align closely with those of medical experts.Abstract
Developing clinical prediction models (e.g., mortality prediction) based on electronic health records (EHRs) typically relies on expert opinion for feature selection and adjusting observation window size. This burdens experts and creates a bottleneck in the development process. We propose Retrieval-Enhanced Medical prediction model (REMed) to address such challenges. REMed can essentially evaluate an unlimited number of clinical events, select the relevant ones, and make predictions. This approach effectively eliminates the need for manual feature selection and enables an unrestricted observation window. We verified these properties through experiments on 27 clinical tasks and two independent cohorts from publicly available EHR datasets, where REMed outperformed other contemporary architectures that aim to handle as many events as possible. Notably, we found that the preferences of REMed align closely with those of medical experts. We expect our approach to significantly expedite the development of EHR prediction models by minimizing clinicians' need for manual involvement.
摘要
Developing clinical prediction models (e.g., mortality prediction) from electronic health records (EHRs) typically relies on expert opinion for feature selection and for setting the observation window size, which burdens experts and creates a bottleneck in the development process. We propose the Retrieval-Enhanced Medical prediction model (REMed) to address these challenges. REMed can essentially evaluate an unlimited number of clinical events, select the relevant ones, and make predictions, eliminating the need for manual feature selection and allowing an unrestricted observation window. We verified these properties through experiments on 27 clinical tasks and two independent cohorts from publicly available EHR datasets, where REMed outperformed other contemporary architectures that aim to handle as many events as possible. Notably, we found that REMed's preferences align closely with those of medical experts. We expect our approach to significantly expedite the development of EHR prediction models by minimizing the need for manual involvement by clinicians.
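The retrieval-then-predict idea can be illustrated with a toy sketch (not the REMed architecture): every clinical event is embedded, a learned query vector scores its relevance, the top-k events are kept, and only those are aggregated for the prediction. All vectors below are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
num_events, dim, k = 1000, 32, 16          # a long, unrestricted event history

event_embeddings = rng.normal(size=(num_events, dim))   # one row per clinical event
query = rng.normal(size=dim)                            # learned task-specific query
predictor_w = rng.normal(size=dim)                      # toy linear prediction head

relevance = event_embeddings @ query                    # relevance score per event
top_k = np.argsort(relevance)[-k:]                      # keep only the k most relevant
pooled = event_embeddings[top_k].mean(axis=0)           # aggregate the selected events

risk_logit = pooled @ predictor_w
risk = 1.0 / (1.0 + np.exp(-risk_logit))                # e.g., mortality probability
print(f"selected {k} of {num_events} events; predicted risk = {risk:.3f}")
```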
results: Experiments show that visual information and the proposed methods improve translation performance, and that our model performs significantly better than existing MMT models.Abstract
Existing multimodal machine translation (MMT) datasets consist of images and video captions or instructional video subtitles, which rarely contain linguistic ambiguity, making visual information ineffective in generating appropriate translations. Recent work has constructed an ambiguous subtitles dataset to alleviate this problem, but it is still limited by the fact that videos do not necessarily contribute to disambiguation. We introduce EVA (Extensive training set and Video-helpful evaluation set for Ambiguous subtitles translation), an MMT dataset containing 852k Japanese-English (Ja-En) parallel subtitle pairs, 520k Chinese-English (Zh-En) parallel subtitle pairs, and corresponding video clips collected from movies and TV episodes. In addition to the extensive training set, EVA contains a video-helpful evaluation set in which subtitles are ambiguous, and videos are guaranteed helpful for disambiguation. Furthermore, we propose SAFA, an MMT model based on the Selective Attention model with two novel methods: Frame attention loss and Ambiguity augmentation, aiming to fully exploit videos in EVA for disambiguation. Experiments on EVA show that visual information and the proposed methods can boost translation performance, and our model performs significantly better than existing MMT models. The EVA dataset and the SAFA model are available at: https://github.com/ku-nlp/video-helpful-MMT.git.
DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text
paper_authors: Wenting Zhao, Ye Liu, Tong Niu, Yao Wan, Philip S. Yu, Shafiq Joty, Yingbo Zhou, Semih Yavuz
for: The paper aims to extend and improve large language models (LLMs) so that they can better handle questions requiring less commonly known knowledge.
methods: It proposes a new approach that grounds LLMs with multiple retrieval tools, including text passage retrieval and symbolic language-assisted retrieval, to improve retrieval over heterogeneous knowledge sources.
results: The model performs strongly on two distinct challenges, two-hop multi-source questions and symbolic query generation, and outperforms previous approaches on these tasks.Abstract
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when solely relying on their internal knowledge, especially when answering questions that require less commonly known information. Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge. Nonetheless, recent approaches have primarily emphasized retrieval from unstructured text corpora, owing to its seamless integration into prompts. When using structured data such as knowledge graphs, most methods simplify it into natural text, neglecting the underlying structures. Moreover, a significant gap in the current landscape is the absence of a realistic benchmark for evaluating the effectiveness of grounding LLMs on heterogeneous knowledge sources (e.g., knowledge base and text). To fill this gap, we have curated a comprehensive dataset that poses two unique challenges: (1) Two-hop multi-source questions that require retrieving information from both open-domain structured and unstructured knowledge sources; retrieving information from structured knowledge sources is a critical component in correctly answering the questions. (2) The generation of symbolic queries (e.g., SPARQL for Wikidata) is a key requirement, which adds another layer of challenge. Our dataset is created using a combination of automatic generation through predefined reasoning chains and human annotation. We also introduce a novel approach that leverages multiple retrieval tools, including text passage retrieval and symbolic language-assisted retrieval. Our model outperforms previous approaches by a significant margin, demonstrating its effectiveness in addressing the above-mentioned reasoning challenges.
GAR-meets-RAG Paradigm for Zero-Shot Information Retrieval
paper_authors: Daman Arora, Anush Kini, Sayak Ray Chowdhury, Nagarajan Natarajan, Gaurav Sinha, Amit Sharma
for: The goal of this work is to improve information retrieval in the zero-shot setting, i.e., without access to labeled data from the target domain.
methods: The approach combines large language models (LLMs) with embedding-based retrieval models, drawing on the two popular paradigms of generation-augmented retrieval and retrieval-augmented generation.
results: Extensive experiments on the zero-shot BEIR and TREC-DL benchmarks show that the method establishes a new state of the art on 6 out of 8 BEIR datasets, outperforming the previous best results in Recall@100 and nDCG@10.Abstract
Given a query and a document corpus, the information retrieval (IR) task is to output a ranked list of relevant documents. Combining large language models (LLMs) with embedding-based retrieval models, recent work shows promising results on the zero-shot retrieval problem, i.e., no access to labeled data from the target domain. Two such popular paradigms are generation-augmented retrieval or GAR (generate additional context for the query and then retrieve), and retrieval-augmented generation or RAG (retrieve relevant documents as context and then generate answers). The success of these paradigms hinges on (i) high-recall retrieval models, which are difficult to obtain in the zero-shot setting, and (ii) high-precision (re-)ranking models which typically need a good initialization. In this work, we propose a novel GAR-meets-RAG recurrence formulation that overcomes the challenges of existing paradigms. Our method iteratively improves retrieval (via GAR) and rewrite (via RAG) stages in the zero-shot setting. A key design principle is that the rewrite-retrieval stages improve the recall of the system and a final re-ranking stage improves the precision. We conduct extensive experiments on zero-shot passage retrieval benchmarks, BEIR and TREC-DL. Our method establishes a new state-of-the-art in the BEIR benchmark, outperforming previous best results in Recall@100 and nDCG@10 metrics on 6 out of 8 datasets, with up to 17% relative gains over the previous best.
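The recurrence described above can be pictured as a loop that alternates query rewriting and retrieval before a final re-ranking pass. The sketch below uses toy word-overlap functions in place of real LLM and retrieval components; all function names and scoring choices are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (assumed interfaces, not the authors' implementation) of an
# iterative rewrite-then-retrieve loop followed by a final re-ranking pass.
def llm_rewrite(query, context_docs):
    # Hypothetical: ask an LLM to rewrite/expand the query given retrieved context.
    return query + " " + " ".join(context_docs[:1])

def retrieve(query, corpus, k=3):
    # Hypothetical lexical retriever: rank documents by word overlap with the query.
    q = set(query.lower().split())
    scored = sorted(corpus, key=lambda d: -len(q & set(d.lower().split())))
    return scored[:k]

def rerank(query, docs):
    # Hypothetical precision-oriented re-ranker; here it reuses the overlap score.
    q = set(query.lower().split())
    return sorted(docs, key=lambda d: -len(q & set(d.lower().split())))

def gar_meets_rag(query, corpus, iterations=2):
    docs = retrieve(query, corpus)            # initial retrieval
    for _ in range(iterations):               # rewrite-retrieval improves recall
        query = llm_rewrite(query, docs)
        docs = retrieve(query, corpus)
    return rerank(query, docs)                # final re-ranking improves precision

corpus = ["convex relaxation of graph matching", "retrieval augmented generation",
          "zero-shot passage retrieval benchmark"]
print(gar_meets_rag("zero-shot retrieval", corpus))
```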
Multi-Agent Consensus Seeking via Large Language Models
paper_authors: Huaben Chen, Wenkang Ji, Lufeng Xu, Shiyu Zhao
for: This paper studies consensus-seeking in multi-agent systems driven by large language models (LLMs), with a focus on understanding the negotiation process and the impact of various factors on the outcome.
methods: The paper uses LLMs to drive the agents in the system and analyzes the strategies they use for consensus seeking, including the average strategy and other strategies. It also examines the impact of the agent number, agent personality, and network topology on the negotiation process.
results: The paper finds that the LLM-driven agents primarily use the average strategy for consensus seeking, and that the negotiation process is affected by the agent number, agent personality, and network topology. Additionally, it demonstrates the potential of LLM-driven consensus seeking for achieving zero-shot autonomous planning in multi-robot collaboration tasks. Project website: westlakeintelligentrobotics.github.io/ConsensusLLM/.Abstract
Multi-agent systems driven by large language models (LLMs) have shown promising abilities for solving complex tasks in a collaborative manner. This work considers a fundamental problem in multi-agent collaboration: consensus seeking. When multiple agents work together, we are interested in how they can reach a consensus through inter-agent negotiation. To that end, this work studies a consensus-seeking task where the state of each agent is a numerical value and they negotiate with each other to reach a consensus value. It is revealed that when not explicitly directed on which strategy should be adopted, the LLM-driven agents primarily use the average strategy for consensus seeking although they may occasionally use some other strategies. Moreover, this work analyzes the impact of the agent number, agent personality, and network topology on the negotiation process. The findings reported in this work can potentially lay the foundations for understanding the behaviors of LLM-driven multi-agent systems for solving more complex tasks. Furthermore, LLM-driven consensus seeking is applied to a multi-robot aggregation task. This application demonstrates the potential of LLM-driven agents to achieve zero-shot autonomous planning for multi-robot collaboration tasks. Project website: westlakeintelligentrobotics.github.io/ConsensusLLM/.
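The "average strategy" the paper reports can be illustrated with a plain numerical simulation in which each agent repeatedly adopts the mean of the values it observes from its neighbors; this toy replaces the LLM-driven negotiation with a direct update rule and an assumed ring topology.

```python
# Toy numerical illustration (not the paper's LLM-driven setup) of the
# "average strategy": each agent moves its scalar state to the mean of the
# states it observes from its neighbors on a ring network.
import numpy as np

rng = np.random.default_rng(0)
n_agents = 5
states = rng.uniform(0, 100, size=n_agents)     # initial numerical states

adjacency = np.eye(n_agents)                    # ring topology with self-loops
for i in range(n_agents):
    adjacency[i, (i - 1) % n_agents] = 1
    adjacency[i, (i + 1) % n_agents] = 1

for step in range(10):
    new_states = states.copy()
    for i in range(n_agents):
        neighbors = np.nonzero(adjacency[i])[0]
        new_states[i] = states[neighbors].mean()  # adopt the average of observed values
    states = new_states
    print(f"round {step}: spread = {states.max() - states.min():.4f}")

print("consensus value ≈", states.mean())
```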
DEPN: Detecting and Editing Privacy Neurons in Pretrained Language Models
results: Experimental results show that our method can effectively reduce the exposure of private data leakage without degrading model performance.Abstract
Large language models pretrained on a huge amount of data capture rich knowledge and information in the training data. The ability of data memorization and regurgitation in pretrained language models, revealed in previous studies, brings the risk of data leakage. In order to effectively reduce these risks, we propose a framework DEPN to Detect and Edit Privacy Neurons in pretrained language models, partially inspired by knowledge neurons and model editing. In DEPN, we introduce a novel method, termed the privacy neuron detector, to locate neurons associated with private information, and then edit these detected privacy neurons by setting their activations to zero. Furthermore, we propose a privacy neuron aggregator to dememorize private information in a batch processing manner. Experimental results show that our method can significantly and efficiently reduce the exposure of private data leakage without deteriorating the performance of the model. Additionally, we empirically demonstrate the relationship between model memorization and privacy neurons, from multiple perspectives, including model size, training time, prompts, privacy neuron distribution, illustrating the robustness of our approach.
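The editing step, setting the activations of detected privacy neurons to zero, can be sketched with a forward hook on a toy network. The model, the layer, and the flagged neuron indices below are made up for illustration; the detection method itself is not shown.

```python
# Sketch (toy MLP and hand-picked neuron indices, not the paper's code) of the
# DEPN editing step: set the activations of detected privacy neurons to zero.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
privacy_neurons = [3, 17, 25]         # hypothetical indices flagged by a detector

def zero_privacy_neurons(module, inputs, output):
    output = output.clone()
    output[:, privacy_neurons] = 0.0  # edit: suppress the flagged hidden units
    return output

# Register the hook on the hidden layer whose neurons were flagged.
hook = model[1].register_forward_hook(zero_privacy_neurons)

x = torch.randn(2, 16)
print(model(x))                       # predictions with privacy neurons zeroed
hook.remove()
```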
Improving Prompt Tuning with Learned Prompting Layers
For: This paper aims to improve the adaptation and performance of pretrained models (PTMs) on downstream tasks, including under few-shot conditions.* Methods: It introduces a new framework, Selective Prompt Tuning (SPT), that learns to select the proper prompt layers, together with a novel bi-level optimization framework, SPT-DARTS, that better optimizes the learned prompt-layer settings.* Results: Experiments on ten benchmark datasets under full-data and few-shot scenarios show that the SPT framework outperforms previous state-of-the-art PETuning baselines with comparable or fewer tunable parameters.Abstract
Prompt tuning prepends a soft prompt to the input embeddings or hidden states and only optimizes the prompt to adapt pretrained models (PTMs) to downstream tasks. Previous work manually selects prompt layers, which is far from optimal and fails to exploit the potential of prompt tuning. In this work, we propose a novel framework, \underline{S}elective \underline{P}rompt \underline{T}uning (SPT), that learns to select the proper prompt layers by inserting a prompt controlled by a learnable probabilistic gate at each intermediate layer. We further propose a novel bi-level optimization framework, SPT-DARTS, that can better optimize the learnable gates and improve the final prompt tuning performance of the learned prompt layer settings. We conduct extensive experiments with ten benchmark datasets under the full-data and few-shot scenarios. The results demonstrate that our SPT framework can perform better than the previous state-of-the-art PETuning baselines with comparable or fewer tunable parameters.
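A simplified picture of a prompt layer controlled by a learnable gate is sketched below. It uses a deterministic sigmoid gate as a stand-in; the paper's probabilistic gates and the SPT-DARTS bi-level optimization are not reproduced, and all dimensions and names are illustrative.

```python
# Simplified sketch of a learnable prompting-layer gate (a deterministic
# relaxation; the paper uses probabilistic gates trained with a bi-level
# scheme, SPT-DARTS). Dimensions and names here are illustrative only.
import torch
import torch.nn as nn

class GatedPromptLayer(nn.Module):
    def __init__(self, hidden_dim=64, prompt_len=4):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(prompt_len, hidden_dim) * 0.02)
        self.gate_logit = nn.Parameter(torch.zeros(1))    # learnable gate

    def forward(self, hidden_states):
        # hidden_states: (batch, seq_len, hidden_dim)
        gate = torch.sigmoid(self.gate_logit)             # in (0, 1)
        batch = hidden_states.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        # Prepend the (gated) soft prompt to this layer's hidden states.
        return torch.cat([gate * prompt, hidden_states], dim=1)

layer = GatedPromptLayer()
h = torch.randn(2, 10, 64)
print(layer(h).shape)   # torch.Size([2, 14, 64])
```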
Ling-CL: Understanding NLP Models through Linguistic Curricula
methods: We use characterizations of linguistic complexity from psycholinguistic and language acquisition research to derive linguistic curricula from data, combined with existing knowledge about linguistic complexity and model behavior during training.
results: We analyze several benchmark NLP datasets and identify sets of linguistic metrics (indices) that characterize the challenges and reasoning required by each task.Abstract
We employ a characterization of linguistic complexity from psycholinguistic and language acquisition research to develop data-driven curricula to understand the underlying linguistic knowledge that models learn to address NLP tasks. The novelty of our approach is in the development of linguistic curricula derived from data, existing knowledge about linguistic complexity, and model behavior during training. By analyzing several benchmark NLP datasets, our curriculum learning approaches identify sets of linguistic metrics (indices) that inform the challenges and reasoning required to address each task. Our work will inform future research in all NLP areas, allowing linguistic complexity to be considered early in the research and development process. In addition, our work prompts an examination of gold standards and fair evaluation in NLP.
paper_authors: Dong-Ho Lee, Jay Pujara, Mohit Sewak, Ryen W. White, Sujay Kumar Jauhar
for: Improving the robustness of NLP systems deployed in real-world applications.
methods: Data is generated with instruction-following LLMs from a single formatting example and then used to train task-specific models.
results: Models trained on the generated data outperform those trained on human-labeled data by up to 17.5% on out-of-distribution evaluation, while maintaining comparable performance on in-distribution tasks.Abstract
Although large language models (LLMs) have advanced the state-of-the-art in NLP significantly, deploying them for downstream applications is still challenging due to cost, responsiveness, control, or concerns around privacy and security. As such, trainable models are still the preferred option in some cases. However, these models still require human-labeled data for optimal performance, which is expensive and time-consuming to obtain. In order to address this issue, several techniques to reduce human effort involve labeling or generating data using LLMs. Although these methods are effective for certain applications, in practice they encounter difficulties in real-world scenarios. Labeling data requires careful data selection, while generating data necessitates task-specific prompt engineering. In this paper, we propose a unified data creation pipeline that requires only a single formatting example, and which is applicable to a broad range of tasks, including traditionally problematic ones with semantically devoid label spaces. In our experiments we demonstrate that instruction-following LLMs are highly cost-effective data creators, and that models trained with these data exhibit performance better than those trained with human-labeled data (by up to 17.5%) on out-of-distribution evaluation, while maintaining comparable performance on in-distribution tasks. These results have important implications for the robustness of NLP systems deployed in the real-world.
Keyword-optimized Template Insertion for Clinical Information Extraction via Prompt-based Learning
results: The study finds that optimizing template position improves performance on several clinical note classification tasks in both zero-shot and few-shot training settings.Abstract
Clinical note classification is a common clinical NLP task. However, annotated datasets are scarce. Prompt-based learning has recently emerged as an effective method to adapt pre-trained models for text classification using only a few training examples. A critical component of prompt design is the definition of the template (i.e., the prompt text). The effect of template position, however, has been insufficiently investigated. This seems particularly important in the clinical setting, where task-relevant information is usually sparse in clinical notes. In this study, we develop a keyword-optimized template insertion method (KOTI) and show how optimizing position can improve performance on several clinical tasks in a zero-shot and few-shot training setting.
results: The paper shows that the adversarial-optimal algorithm performs suboptimally in the stochastic setting, and provides a best-of-both-worlds algorithm that achieves near-optimal stochastic performance while retaining robust adversarial performance.Abstract
Convex function chasing (CFC) is an online optimization problem in which during each round $t$, a player plays an action $x_t$ in response to a hitting cost $f_t(x_t)$ and an additional cost of $c(x_t,x_{t-1})$ for switching actions. We study the CFC problem in stochastic and adversarial environments, giving algorithms that achieve performance guarantees simultaneously in both settings. Specifically, we consider the squared $\ell_2$-norm switching costs and a broad class of quadratic hitting costs for which the sequence of minimizers either forms a martingale or is chosen adversarially. This is the first work that studies the CFC problem using a stochastic framework. We provide a characterization of the optimal stochastic online algorithm and, drawing a comparison between the stochastic and adversarial scenarios, we demonstrate that the adversarial-optimal algorithm exhibits suboptimal performance in the stochastic context. Motivated by this, we provide a best-of-both-worlds algorithm that obtains robust adversarial performance while simultaneously achieving near-optimal stochastic performance.
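The cost structure studied here, a quadratic hitting cost plus a squared $\ell_2$ switching cost summed over rounds, can be evaluated directly. The snippet below compares two naive policies on synthetic minimizers; it illustrates the objective only, not the paper's algorithms or guarantees.

```python
# Toy evaluation (illustrative only) of convex function chasing costs with
# quadratic hitting costs f_t(x) = ||x - v_t||^2 and squared l2 switching
# costs c(x_t, x_{t-1}) = ||x_t - x_{t-1}||^2.
import numpy as np

rng = np.random.default_rng(1)
T, d = 20, 3
minimizers = np.cumsum(rng.normal(scale=0.3, size=(T, d)), axis=0)  # martingale-like v_t

def total_cost(actions, minimizers, x0=None):
    x_prev = np.zeros(minimizers.shape[1]) if x0 is None else x0
    cost = 0.0
    for x_t, v_t in zip(actions, minimizers):
        cost += np.sum((x_t - v_t) ** 2)         # hitting cost
        cost += np.sum((x_t - x_prev) ** 2)      # switching cost
        x_prev = x_t
    return cost

greedy = minimizers                               # always move to the current minimizer
lazy = np.zeros_like(minimizers)                  # never move from the origin
print("greedy policy cost:", total_cost(greedy, minimizers))
print("lazy policy cost:  ", total_cost(lazy, minimizers))
```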
The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback
For: The paper aims to address the issue of “objective mismatch” in reinforcement learning from human feedback (RLHF) for large language models (LLMs), which can lead to unexpected behaviors and suboptimal performance.* Methods: The paper reviews relevant literature from model-based reinforcement learning and discusses potential solutions to the objective mismatch issue in RLHF, including the use of multi-objective reward shaping and the integration of multiple training objectives.* Results: The paper argues that by solving the objective mismatch issue in RLHF, LLMs of the future will be more precisely aligned to user instructions for both safety and helpfulness. However, the paper does not present any specific experimental results.Abstract
Reinforcement learning from human feedback (RLHF) has emerged as a powerful technique to make large language models (LLMs) easier to prompt and more capable in complex settings. RLHF at its core is providing a new toolkit to optimize LLMs other than next-token prediction, enabling the integration of qualitative training goals. The attempted match between user preferences and downstream performance, which happens in a learned reward model, results in an optimization landscape where training and evaluation metrics can appear correlated. The apparent correlation can lead to unexpected behaviors and stories of "too much RLHF." In RLHF, challenges emerge because the following sub-modules are not consistent with each other: the reward model training, the policy model training, and the policy model evaluation. This mismatch results in models that sometimes avoid user requests for false safety flags, are difficult to steer to an intended characteristic, or always answer in a specific style. As chat model evaluation becomes increasingly nuanced, the reliance on a perceived link between reward model score and downstream performance drives the objective mismatch issue. In this paper, we illustrate the cause of this issue, reviewing relevant literature from model-based reinforcement learning, and discuss relevant solutions to encourage further research. By solving objective mismatch in RLHF, the LLMs of the future will be more precisely aligned to user instructions for both safety and helpfulness.
Graph Neural Networks for Road Safety Modeling: Datasets and Evaluations for Accident Analysis
paper_authors: Abhinav Nippani, Dongyue Li, Haotian Ju, Haris N. Koutsopoulos, Hongyang R. Zhang
for: This paper analyzes traffic accidents on road networks and evaluates the accuracy of existing deep learning methods for accident prediction.
methods: It uses graph neural networks (GraphSAGE) together with multitask learning to predict the occurrence of traffic accidents on road networks.
results: The main finding is that GraphSAGE can accurately predict the number of accidents on roads (mean absolute error below 22% relative to the actual count) and whether an accident will occur or not (AUROC above 87%).Abstract
We consider the problem of traffic accident analysis on a road network based on road network connections and traffic volume. Previous works have designed various deep-learning methods using historical records to predict traffic accident occurrences. However, there is a lack of consensus on how accurate existing methods are, and a fundamental issue is the lack of public accident datasets for comprehensive evaluations. This paper constructs a large-scale, unified dataset of traffic accident records from official reports of various states in the US, totaling 9 million records, accompanied by road networks and traffic volume reports. Using this new dataset, we evaluate existing deep-learning methods for predicting the occurrence of accidents on road networks. Our main finding is that graph neural networks such as GraphSAGE can accurately predict the number of accidents on roads with less than 22% mean absolute error (relative to the actual count) and whether an accident will occur or not with over 87% AUROC, averaged over states. We achieve these results by using multitask learning to account for cross-state variabilities (e.g., availability of accident labels) and transfer learning to combine traffic volume with accident prediction. Ablation studies highlight the importance of road graph-structural features, amongst other features. Lastly, we discuss the implications of the analysis and develop a package for easily using our new dataset.
Neuroformer: Multimodal and Multitask Generative Pretraining for Brain Data
results: Trained on simulated datasets, Neuroformer accurately predicts neuronal circuit activity and intrinsically infers neural circuit connectivity, and with few-shot fine-tuning it predicts animal behavior. These results suggest that Neuroformer can analyze neural data and their emergent properties, informing new models and hypotheses in neuroscience.Abstract
State-of-the-art systems neuroscience experiments yield large-scale multimodal data, and these data sets require new tools for analysis. Inspired by the success of large pretrained models in vision and language domains, we reframe the analysis of large-scale, cellular-resolution neuronal spiking data into an autoregressive spatiotemporal generation problem. Neuroformer is a multimodal, multitask generative pretrained transformer (GPT) model that is specifically designed to handle the intricacies of data in systems neuroscience. It scales linearly with feature size, can process an arbitrary number of modalities, and is adaptable to downstream tasks, such as predicting behavior. We first trained Neuroformer on simulated datasets, and found that it both accurately predicted simulated neuronal circuit activity, and also intrinsically inferred the underlying neural circuit connectivity, including direction. When pretrained to decode neural responses, the model predicted the behavior of a mouse with only few-shot fine-tuning, suggesting that the model begins learning how to do so directly from the neural representations themselves, without any explicit supervision. We used an ablation study to show that joint training on neuronal responses and behavior boosted performance, highlighting the model's ability to associate behavioral and neural representations in an unsupervised manner. These findings show that Neuroformer can analyze neural datasets and their emergent properties, informing the development of models and hypotheses associated with the brain.
Extracting the Multiscale Causal Backbone of Brain Dynamics
results: Experiments show that the method outperforms a baseline based on functional connectivity networks on synthetic data. Applied to resting-state fMRI data, it finds sparse MCBs for both the left and right brain hemispheres. At low-frequency bands, causal dynamics are driven by brain regions associated with high-level cognitive functions, whereas at higher frequencies nodes related to sensory processing play a crucial role. Finally, analysis of individual multiscale causal structures confirms the existence of a causal fingerprint of brain connectivity, supporting existing brain connectivity fingerprinting research from a causal perspective.Abstract
The bulk of the research effort on brain connectivity revolves around statistical associations among brain regions, which do not directly relate to the causal mechanisms governing brain dynamics. Here we propose the multiscale causal backbone (MCB) of brain dynamics shared by a set of individuals across multiple temporal scales, and devise a principled methodology to extract it. Our approach leverages recent advances in multiscale causal structure learning and optimizes the trade-off between the model fitting and its complexity. Empirical assessment on synthetic data shows the superiority of our methodology over a baseline based on canonical functional connectivity networks. When applied to resting-state fMRI data, we find sparse MCBs for both the left and right brain hemispheres. Thanks to its multiscale nature, our approach shows that at low-frequency bands, causal dynamics are driven by brain regions associated with high-level cognitive functions; at higher frequencies instead, nodes related to sensory processing play a crucial role. Finally, our analysis of individual multiscale causal structures confirms the existence of a causal fingerprint of brain connectivity, thus supporting from a causal perspective the existing extensive research in brain connectivity fingerprinting.
EXTRACT: Explainable Transparent Control of Bias in Embeddings
paper_authors: Zhijin Guo, Zhaozhen Xu, Martha Lewis, Nello Cristianini
for: The paper aims to address the issue of bias in knowledge graph embeddings, specifically the implicit presence of protected information, and to propose a suite of Explainable and Transparent methods to control bias.
methods: The paper uses Canonical Correlation Analysis (CCA) to investigate the presence, extent, and origins of information leaks during training, and decomposes embeddings into a sum of their private attributes by solving a linear system.
results: The paper shows that a range of personal attributes can be inferred from a user’s viewing behavior and preferences, and that information about the conference in which a paper was published can be inferred from the citation network of that article. The paper also proposes four transparent methods to maintain the capability of the embedding to make the intended predictions without retaining unwanted information.Abstract
Knowledge Graphs are a widely used method to represent relations between entities in various AI applications, and Graph Embedding has rapidly become a standard technique to represent Knowledge Graphs in such a way as to facilitate inferences and decisions. As this representation is obtained from behavioural data, and is not in a form readable by humans, there is a concern that it might incorporate unintended information that could lead to biases. We propose EXTRACT: a suite of Explainable and Transparent methods to ConTrol bias in knowledge graph embeddings, so as to assess and decrease the implicit presence of protected information. Our method uses Canonical Correlation Analysis (CCA) to investigate the presence, extent and origins of information leaks during training, then decomposes embeddings into a sum of their private attributes by solving a linear system. Our experiments, performed on the MovieLens1M dataset, show that a range of personal attributes can be inferred from a user's viewing behaviour and preferences, including gender, age, and occupation. Further experiments, performed on the KG20C citation dataset, show that the information about the conference in which a paper was published can be inferred from the citation network of that article. We propose four transparent methods to maintain the capability of the embedding to make the intended predictions without retaining unwanted information. A trade-off between these two goals is observed.
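The CCA-based leakage check can be sketched on synthetic data: measure the leading canonical correlation between embeddings and a protected attribute. The data below is fabricated for illustration and is not the MovieLens or KG20C setup.

```python
# Small sketch (synthetic data, not the paper's MovieLens pipeline) of using
# CCA to quantify how much protected-attribute information is linearly
# recoverable from learned embeddings.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n, emb_dim = 500, 16
protected = rng.integers(0, 2, size=(n, 1)).astype(float)   # e.g., a binary attribute
# Embeddings that partially leak the protected attribute plus noise.
embeddings = rng.normal(size=(n, emb_dim))
embeddings[:, 0] += 1.5 * protected[:, 0]

cca = CCA(n_components=1)
emb_c, prot_c = cca.fit_transform(embeddings, protected)
leak = np.corrcoef(emb_c[:, 0], prot_c[:, 0])[0, 1]
print(f"canonical correlation with protected attribute: {leak:.2f}")
```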
FairWASP: Fast and Optimal Fair Wasserstein Pre-processing
paper_authors: Zikai Xiong, Niccolò Dalmasso, Alan Mishler, Vamsi K. Potluru, Tucker Balch, Manuela Veloso
for: This work aims to reduce disparities in classification models, especially when training data is reused across multiple downstream applications by different users.
methods: It introduces FairWASP, a novel pre-processing method that returns sample-level weights for a classification dataset, reducing disparities without modifying the original data.
results: The results show that FairWASP's optimization algorithm outperforms state-of-the-art commercial solvers in solving the underlying large-scale mixed-integer program and its linear relaxation, and that FairWASP reduces disparities while preserving accuracy in downstream classification tasks.Abstract
Recent years have seen a surge of machine learning approaches aimed at reducing disparities in model outputs across different subgroups. In many settings, training data may be used in multiple downstream applications by different users, which means it may be most effective to intervene on the training data itself. In this work, we present FairWASP, a novel pre-processing approach designed to reduce disparities in classification datasets without modifying the original data. FairWASP returns sample-level weights such that the reweighted dataset minimizes the Wasserstein distance to the original dataset while satisfying (an empirical version of) demographic parity, a popular fairness criterion. We show theoretically that integer weights are optimal, which means our method can be equivalently understood as duplicating or eliminating samples. FairWASP can therefore be used to construct datasets which can be fed into any classification method, not just methods which accept sample weights. Our work is based on reformulating the pre-processing task as a large-scale mixed-integer program (MIP), for which we propose a highly efficient algorithm based on the cutting plane method. Experiments on synthetic datasets demonstrate that our proposed optimization algorithm significantly outperforms state-of-the-art commercial solvers in solving both the MIP and its linear program relaxation. Further experiments highlight the competitive performance of FairWASP in reducing disparities while preserving accuracy in downstream classification settings.
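The demographic parity quantity that the reweighting targets, the gap in weighted positive rates across protected groups, can be computed directly. The snippet below uses synthetic data and a naive cell-level reweighting for illustration; it is not the paper's Wasserstein-constrained mixed-integer program.

```python
# Illustration (not the paper's mixed-integer program) of the demographic
# parity quantity that sample-level reweighting targets: the gap in weighted
# positive rates between two protected groups.
import numpy as np

rng = np.random.default_rng(2)
n = 1000
group = rng.integers(0, 2, size=n)                      # protected attribute
label = (rng.random(n) < np.where(group == 1, 0.7, 0.4)).astype(int)

def parity_gap(weights):
    rates = [np.average(label[group == g], weights=weights[group == g]) for g in (0, 1)]
    return abs(rates[0] - rates[1])

uniform = np.ones(n)
print("gap with uniform weights:", round(parity_gap(uniform), 3))

# Naive reweighting: within each group, scale positives and negatives so the
# weighted positive rate is exactly 0.5. (FairWASP instead solves for weights
# that satisfy parity while staying close to the original data in Wasserstein
# distance.)
weights = np.ones(n)
for g in (0, 1):
    m = group == g
    p = label[m].mean()
    weights[m & (label == 1)] = 0.5 / p
    weights[m & (label == 0)] = 0.5 / (1 - p)
print("gap after naive reweighting:", round(parity_gap(weights), 3))
```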
results: The method reduces the amount of training data by an order of magnitude while preserving 3D reconstruction quality comparable to that obtained using the entire dataset.Abstract
We consider the problem of 3D seismic inversion from pre-stack data using a very small number of seismic sources. The proposed solution is based on a combination of compressed-sensing and machine learning frameworks, known as compressed-learning. The solution jointly optimizes a dimensionality reduction operator and a 3D inversion encoder-decoder implemented by a deep convolutional neural network (DCNN). Dimensionality reduction is achieved by learning a sparse binary sensing layer that selects a small subset of the available sources, then the selected data is fed to a DCNN to complete the regression task. The end-to-end learning process provides a reduction by an order-of-magnitude in the number of seismic records used during training, while preserving the 3D reconstruction quality comparable to that obtained by using the entire dataset.
Seeking Truth and Beauty in Flavor Physics with Machine Learning
results: By optimizing the designed loss functions, the researchers obtain true and beautiful models of the Yukawa quark sector that fit existing experimental data and satisfy abstract theorists' criteria.Abstract
The discovery process of building new theoretical physics models involves the dual aspect of both fitting to the existing experimental data and satisfying abstract theorists' criteria like beauty, naturalness, etc. We design loss functions for performing both of those tasks with machine learning techniques. We use the Yukawa quark sector as a toy example to demonstrate that the optimization of these loss functions results in true and beautiful models.
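The two-part objective, a fit-to-data term plus an aesthetic penalty, can be sketched schematically. The mock quark-mass targets, the tree-level mass relation, and the log-variance "naturalness" proxy below are illustrative assumptions, not the paper's actual observables or criteria.

```python
# Schematic loss (mock targets and a simple naturalness proxy; the paper's
# actual observables and aesthetic criteria are more involved): total loss =
# fit-to-data term + penalty that disfavours widely spread Yukawa couplings.
import numpy as np

target_masses = np.array([0.002, 1.3, 173.0])      # mock quark masses (GeV)
vev = 246.0                                         # Higgs vacuum expectation value

def fit_loss(yukawas):
    predicted = yukawas * vev / np.sqrt(2.0)        # tree-level mass relation
    return np.sum(((predicted - target_masses) / target_masses) ** 2)

def naturalness_loss(yukawas):
    logs = np.log10(np.abs(yukawas))
    return np.var(logs)                             # penalize large hierarchies

def total_loss(yukawas, beauty_weight=0.1):
    return fit_loss(yukawas) + beauty_weight * naturalness_loss(yukawas)

y = np.array([1e-5, 7.5e-3, 1.0])                   # trial Yukawa couplings
print("fit:", fit_loss(y), "naturalness:", naturalness_loss(y), "total:", total_loss(y))
```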
Ensemble models outperform single model uncertainties and predictions for operator-learning of hypersonic flows
results: Comparing three uncertainty quantification methods, ensembling best minimizes error and calibrates uncertainty, and performs well in both interpolative and extrapolative regimes.Abstract
High-fidelity computational simulations and physical experiments of hypersonic flows are resource intensive. Training scientific machine learning (SciML) models on limited high-fidelity data offers one approach to rapidly predict behaviors for situations that have not been seen before. However, high-fidelity data is itself in limited quantity to validate all outputs of the SciML model in unexplored input space. As such, an uncertainty-aware SciML model is desired. The SciML model's output uncertainties could then be used to assess the reliability and confidence of the model's predictions. In this study, we extend a DeepONet using three different uncertainty quantification mechanisms: mean-variance estimation, evidential uncertainty, and ensembling. The uncertainty aware DeepONet models are trained and evaluated on the hypersonic flow around a blunt cone object with data generated via computational fluid dynamics over a wide range of Mach numbers and altitudes. We find that ensembling outperforms the other two uncertainty models in terms of minimizing error and calibrating uncertainty in both interpolative and extrapolative regimes.
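Ensembling for uncertainty can be illustrated with a toy regression problem: fit several models on bootstrap resamples and report the spread of their predictions, which grows in the extrapolative region. Polynomial fits stand in here for the DeepONet ensemble members.

```python
# Toy illustration of ensembling for uncertainty quantification (polynomial
# regressors on bootstrap resamples stand in for the DeepONet ensemble):
# the ensemble mean is the prediction, the ensemble spread the uncertainty.
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=x.size)

n_members, degree = 10, 5
x_query = np.linspace(-0.2, 1.2, 8)       # includes extrapolative points
preds = []
for _ in range(n_members):
    idx = rng.integers(0, x.size, size=x.size)            # bootstrap resample
    coeffs = np.polyfit(x[idx], y[idx], deg=degree)
    preds.append(np.polyval(coeffs, x_query))
preds = np.array(preds)

mean = preds.mean(axis=0)
std = preds.std(axis=0)                   # larger outside the training range
for xq, m, s in zip(x_query, mean, std):
    print(f"x={xq:+.2f}  prediction={m:+.3f}  uncertainty={s:.3f}")
```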
Training-Free Generalization on Heterogeneous Tabular Data via Meta-Representation
methods: The study proposes Tabular data Pre-Training via Meta-representation (TabPTM), which standardizes heterogeneous tabular datasets by representing each instance through its distances to a fixed number of prototypes, and then trains a deep neural network to associate these meta-representations with dataset-specific classification confidences.
results: Experiments show that TabPTM achieves promising performance on unseen tabular datasets, even in few-shot scenarios.Abstract
Tabular data is prevalent across various machine learning domains. Yet, the inherent heterogeneities in attribute and class spaces across different tabular datasets hinder the effective sharing of knowledge, limiting a tabular model to benefit from other datasets. In this paper, we propose Tabular data Pre-Training via Meta-representation (TabPTM), which allows one tabular model pre-training on a set of heterogeneous datasets. Then, this pre-trained model can be directly applied to unseen datasets that have diverse attributes and classes without additional training. Specifically, TabPTM represents an instance through its distance to a fixed number of prototypes, thereby standardizing heterogeneous tabular datasets. A deep neural network is then trained to associate these meta-representations with dataset-specific classification confidences, endowing TabPTM with the ability of training-free generalization. Experiments validate that TabPTM achieves promising performance in new datasets, even under few-shot scenarios.
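The meta-representation idea, describing each instance by its distances to a fixed number of prototypes, can be sketched on a small public dataset. Class means and a logistic regression head below stand in for TabPTM's prototypes and pretrained deep network, which are shared across datasets in the actual method.

```python
# Sketch of the prototype-distance meta-representation (class means as
# prototypes and a logistic regression head stand in for TabPTM's pretrained
# deep network; the real method shares one network across datasets).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Fixed number of prototypes per class (here: 1, the class mean).
prototypes = np.stack([X_tr[y_tr == c].mean(axis=0) for c in np.unique(y_tr)])

def meta_representation(X):
    # Each instance becomes its vector of distances to the prototypes, which
    # has the same length regardless of the original number of features.
    d = np.linalg.norm(X[:, None, :] - prototypes[None, :, :], axis=-1)
    return d / d.sum(axis=1, keepdims=True)     # simple normalization

clf = LogisticRegression(max_iter=1000).fit(meta_representation(X_tr), y_tr)
print("test accuracy:", clf.score(meta_representation(X_te), y_te))
```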
results: The study shows that the Kolmogorov two-hidden-layer neural network model with a continuous, discontinuous bounded, or unbounded activation function in the second hidden layer can precisely represent continuous, discontinuous bounded, and all unbounded multivariate functions, respectively.Abstract
In this paper, we show that the Kolmogorov two hidden layer neural network model with a continuous, discontinuous bounded or unbounded activation function in the second hidden layer can precisely represent continuous, discontinuous bounded and all unbounded multivariate functions, respectively.
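For reference, the classical Kolmogorov superposition form that such a two-hidden-layer representation mirrors is shown below; this is the standard identity for continuous functions, while the paper's contribution concerns the choice of activation in the second hidden layer.

```latex
f(x_1,\dots,x_n) \;=\; \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)
```

Here the inner functions $\phi_{q,p}$ play the role of the first hidden layer and the outer functions $\Phi_q$ that of the second.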
Unexpected Improvements to Expected Improvement for Bayesian Optimization
paper_authors: Sebastian Ament, Samuel Daulton, David Eriksson, Maximilian Balandat, Eytan Bakshy
for: The paper is written for optimizing acquisition functions in Bayesian optimization, specifically addressing the challenges of numerical optimization in existing methods.
methods: The paper proposes a new family of acquisition functions called LogEI, which includes reformulations of classic EI and its variants to address numerical pathologies and improve optimization performance.
results: The paper demonstrates the effectiveness of LogEI through empirical results, showing that members of the LogEI family substantially improve optimization performance compared to their canonical counterparts and are on par with or exceed the performance of recent state-of-the-art acquisition functions.Abstract
Expected Improvement (EI) is arguably the most popular acquisition function in Bayesian optimization and has found countless successful applications, but its performance is often exceeded by that of more recent methods. Notably, EI and its variants, including for the parallel and multi-objective settings, are challenging to optimize because their acquisition values vanish numerically in many regions. This difficulty generally increases as the number of observations, dimensionality of the search space, or the number of constraints grow, resulting in performance that is inconsistent across the literature and most often sub-optimal. Herein, we propose LogEI, a new family of acquisition functions whose members either have identical or approximately equal optima as their canonical counterparts, but are substantially easier to optimize numerically. We demonstrate that numerical pathologies manifest themselves in "classic" analytic EI, Expected Hypervolume Improvement (EHVI), as well as their constrained, noisy, and parallel variants, and propose corresponding reformulations that remedy these pathologies. Our empirical results show that members of the LogEI family of acquisition functions substantially improve on the optimization performance of their canonical counterparts and surprisingly, are on par with or exceed the performance of recent state-of-the-art acquisition functions, highlighting the understated role of numerical optimization in the literature.
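The numerical issue motivating LogEI can be seen by evaluating analytic EI under a Gaussian posterior and taking its logarithm naively: EI underflows to zero far from the incumbent, so the naive log returns -inf. The sketch below reproduces only this failure mode; the paper's numerically stable reformulations are not shown.

```python
# Sketch of analytic Expected Improvement under a Gaussian posterior and its
# naive log-space evaluation. The naive np.log underflows to -inf for very
# negative z (with a divide-by-zero warning), which is the pathology that the
# LogEI reformulations address; their stable expressions are not shown here.
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best_f):
    z = (mu - best_f) / sigma
    return sigma * (z * norm.cdf(z) + norm.pdf(z))

def naive_log_ei(mu, sigma, best_f):
    return np.log(expected_improvement(mu, sigma, best_f))

mu, sigma = 0.0, 1.0
for best_f in (0.0, 5.0, 40.0):
    ei = expected_improvement(mu, sigma, best_f)
    print(f"incumbent={best_f:5.1f}  EI={ei:.3e}  log EI={naive_log_ei(mu, sigma, best_f):.3f}")
```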
Farthest Greedy Path Sampling for Two-shot Recommender Search
results: Evaluations on three Click-Through Rate (CTR) prediction benchmarks show that the method consistently outperforms manually designed and most NAS-based models.Abstract
Weight-sharing Neural Architecture Search (WS-NAS) provides an efficient mechanism for developing end-to-end deep recommender models. However, in complex search spaces, distinguishing between superior and inferior architectures (or paths) is challenging. This challenge is compounded by the limited coverage of the supernet and the co-adaptation of subnet weights, which restricts the exploration and exploitation capabilities inherent to weight-sharing mechanisms. To address these challenges, we introduce Farthest Greedy Path Sampling (FGPS), a new path sampling strategy that balances path quality and diversity. FGPS enhances path diversity to facilitate more comprehensive supernet exploration, while emphasizing path quality to ensure the effective identification and utilization of promising architectures. By incorporating FGPS into a Two-shot NAS (TS-NAS) framework, we derive high-performance architectures. Evaluations on three Click-Through Rate (CTR) prediction benchmarks demonstrate that our approach consistently achieves superior results, outperforming both manually designed and most NAS-based models.
Bayesian Multistate Bennett Acceptance Ratio Methods
results: With a uniform prior distribution, BayesMBAR recovers the MBAR result but provides more accurate uncertainty estimates. Moreover, when prior knowledge about free energies is available, BayesMBAR can incorporate it into the estimation procedure and provide more accurate estimates.Abstract
The multistate Bennett acceptance ratio (MBAR) method is a prevalent approach for computing free energies of thermodynamic states. In this work, we introduce BayesMBAR, a Bayesian generalization of the MBAR method. By integrating configurations sampled from thermodynamic states with a prior distribution, BayesMBAR computes a posterior distribution of free energies. Using the posterior distribution, we derive free energy estimations and compute their associated uncertainties. Notably, when a uniform prior distribution is used, BayesMBAR recovers the MBAR's result but provides more accurate uncertainty estimates. Additionally, when prior knowledge about free energies is available, BayesMBAR can incorporate this information into the estimation procedure by using non-uniform prior distributions. As an example, we show that, by incorporating the prior knowledge about the smoothness of free energy surfaces, BayesMBAR provides more accurate estimates than the MBAR method. Given MBAR's widespread use in free energy calculations, we anticipate BayesMBAR to be an essential tool in various applications of free energy calculations.
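For context, the standard (non-Bayesian) MBAR estimator that BayesMBAR generalizes can be sketched as a self-consistent iteration on toy data: two 1D harmonic states whose exact free energy difference is known analytically. This is the classical MBAR fixed point, not the Bayesian posterior computation introduced in the paper.

```python
# Sketch of the standard MBAR self-consistent iteration (the non-Bayesian
# estimator that BayesMBAR generalizes), on two 1D harmonic states where the
# analytic free energy difference is 0.5*ln(k2/k1).
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(0)
k = np.array([1.0, 4.0])                        # spring constants of the two states
N = np.array([2000, 2000])                      # samples drawn from each state
samples = np.concatenate([rng.normal(0.0, 1.0 / np.sqrt(ki), size=n)
                          for ki, n in zip(k, N)])

# Reduced potentials u[i, n] of every pooled sample evaluated in every state.
u = 0.5 * k[:, None] * samples[None, :] ** 2

f = np.zeros(2)                                 # free energy estimates (gauge f_0 = 0)
for _ in range(200):
    log_denom = logsumexp(np.log(N)[:, None] + f[:, None] - u, axis=0)
    f_new = -logsumexp(-u - log_denom, axis=1)
    f_new -= f_new[0]
    if np.max(np.abs(f_new - f)) < 1e-10:
        break
    f = f_new

print("MBAR  Δf =", f[1])
print("exact Δf =", 0.5 * np.log(k[1] / k[0]))
```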
Compression with Exact Error Distribution for Federated Learning
results: The proposed compression and aggregation schemes can recover and improve standard FL schemes with Gaussian perturbations, such as Langevin dynamics and randomized smoothing.Abstract
Compression schemes have been extensively used in Federated Learning (FL) to reduce the communication cost of distributed learning. While most approaches rely on a bounded variance assumption of the noise produced by the compressor, this paper investigates the use of compression and aggregation schemes that produce a specific error distribution, e.g., Gaussian or Laplace, on the aggregated data. We present and analyze different aggregation schemes based on layered quantizers achieving exact error distribution. We provide different methods to leverage the proposed compression schemes to obtain compression-for-free in differential privacy applications. Our general compression methods can recover and improve standard FL schemes with Gaussian perturbations such as Langevin dynamics and randomized smoothing.
Latent Field Discovery In Interacting Dynamical Systems With Neural Fields
results: Experiments show that the method accurately discovers the underlying fields in the studied settings and effectively uses them to forecast future trajectories.Abstract
Systems of interacting objects often evolve under the influence of field effects that govern their dynamics, yet previous works have abstracted away from such effects, and assume that systems evolve in a vacuum. In this work, we focus on discovering these fields, and infer them from the observed dynamics alone, without directly observing them. We theorize the presence of latent force fields, and propose neural fields to learn them. Since the observed dynamics constitute the net effect of local object interactions and global field effects, recently popularized equivariant networks are inapplicable, as they fail to capture global information. To address this, we propose to disentangle local object interactions -- which are $\mathrm{SE}(n)$ equivariant and depend on relative states -- from external global field effects -- which depend on absolute states. We model interactions with equivariant graph networks, and combine them with neural fields in a novel graph network that integrates field forces. Our experiments show that we can accurately discover the underlying fields in charged particles settings, traffic scenes, and gravitational n-body problems, and effectively use them to learn the system and forecast future trajectories.
Balancing Act: Constraining Disparate Impact in Sparse Models
results: The results show that while pruned models achieve performance comparable to their dense counterparts at the level of the entire dataset, some data sub-groups can suffer severe accuracy drops. Existing mitigation methods address this disparate impact only indirectly or scale poorly, whereas the proposed constrained optimization approach addresses it directly and scales reliably to large models and hundreds of protected sub-groups.Abstract
Model pruning is a popular approach to enable the deployment of large deep learning models on edge devices with restricted computational or storage capacities. Although sparse models achieve performance comparable to that of their dense counterparts at the level of the entire dataset, they exhibit high accuracy drops for some data sub-groups. Existing methods to mitigate this disparate impact induced by pruning (i) rely on surrogate metrics that address the problem indirectly and have limited interpretability; or (ii) scale poorly with the number of protected sub-groups in terms of computational cost. We propose a constrained optimization approach that $\textit{directly addresses the disparate impact of pruning}$: our formulation bounds the accuracy change between the dense and sparse models, for each sub-group. This choice of constraints provides an interpretable success criterion to determine if a pruned model achieves acceptable disparity levels. Experimental results demonstrate that our technique scales reliably to problems involving large models and hundreds of protected sub-groups.
Density Matrix Emulation of Quantum Recurrent Neural Networks for Multivariate Time Series Prediction
paper_authors: José Daniel Viqueira, Daniel Faílde, Mariamo M. Juane, Andrés Gómez, David Mera
for: Modeling and predicting future values of multivariate time series with Quantum Recurrent Neural Networks (QRNNs).
methods: A dedicated emulation method based on the density matrix formalism is designed to reduce computational cost.
results: QRNNs can make accurate predictions of future values by capturing non-trivial, nonlinear patterns of input series with different complexities.Abstract
Quantum Recurrent Neural Networks (QRNNs) are robust candidates to model and predict future values in multivariate time series. However, the effective implementation of some QRNN models is limited by the need of mid-circuit measurements. Those increase the requirements for quantum hardware, which in the current NISQ era does not allow reliable computations. Emulation arises as the main near-term alternative to explore the potential of QRNNs, but existing quantum emulators are not dedicated to circuits with multiple intermediate measurements. In this context, we design a specific emulation method that relies on density matrix formalism. The mathematical development is explicitly provided as a compact formulation by using tensor notation. It allows us to show how the present and past information from a time series is transmitted through the circuit, and how to reduce the computational cost in every time step of the emulated network. In addition, we derive the analytical gradient and the Hessian of the network outputs with respect to its trainable parameters, with an eye on gradient-based training and noisy outputs that would appear when using real quantum processors. We finally test the presented methods using a novel hardware-efficient ansatz and three diverse datasets that include univariate and multivariate time series. Our results show how QRNNs can make accurate predictions of future values by capturing non-trivial patterns of input series with different complexities.
Performance Improvement in Multi-class Classification via Automated Hierarchy Generation and Exploitation through Extended LCPN Schemes
results: The study finds that LCPN+F consistently performs best across various datasets and scenarios while maintaining runtime performance comparable to Flat Classification (FC). It also underscores the importance of selecting the right hierarchy exploitation scheme to maximize classification performance.Abstract
Hierarchical classification (HC) plays a pivotal role in multi-class classification tasks, where objects are organized into a hierarchical structure. This study explores the performance of HC through a comprehensive analysis that encompasses both hierarchy generation and hierarchy exploitation. This analysis is particularly relevant in scenarios where a predefined hierarchy structure is not readily accessible. Notably, two novel hierarchy exploitation schemes, LCPN+ and LCPN+F, which extend the capabilities of LCPN and combine the strengths of global and local classification, have been introduced and evaluated alongside existing methods. The findings reveal the consistent superiority of LCPN+F, which outperforms other schemes across various datasets and scenarios. Moreover, this research emphasizes not only effectiveness but also efficiency, as LCPN+ and LCPN+F maintain runtime performance comparable to Flat Classification (FC). Additionally, this study underscores the importance of selecting the right hierarchy exploitation scheme to maximize classification performance. This work extends our understanding of HC and establishes a benchmark for future research, fostering advancements in multi-class classification methodologies.
摘要
Projecting basis functions with tensor networks for Gaussian process regression
methods: 我们使用了一个线性组合方法,其中每个基函数的精度取决于总共的基函数数量M。我们开发了一种方法,可以使用无限多个基函数,而无需相应的无限大的计算复杂度。这个想法的关键点是使用低维度TN。我们首先从数据中找到一个低维度的子空间,并使用这个子空间来解决一个 Bayesian 推理问题。最后,我们将结果 projection 到原始空间中,以便使用 GP 预测。
results: 我们在一个18维度的标准数据集上进行了一个实验,用于解决一个逆动力学问题。我们的方法可以减少计算复杂度,同时保持 GP 预测的准确性。Abstract
This paper presents a method for approximate Gaussian process (GP) regression with tensor networks (TNs). A parametric approximation of a GP uses a linear combination of basis functions, where the accuracy of the approximation depends on the total number of basis functions $M$. We develop an approach that allows us to use an exponential amount of basis functions without the corresponding exponential computational complexity. The key idea to enable this is using low-rank TNs. We first find a suitable low-dimensional subspace from the data, described by a low-rank TN. In this low-dimensional subspace, we then infer the weights of our model by solving a Bayesian inference problem. Finally, we project the resulting weights back to the original space to make GP predictions. The benefit of our approach comes from the projection to a smaller subspace: It modifies the shape of the basis functions in a way that it sees fit based on the given data, and it allows for efficient computations in the smaller subspace. In an experiment with an 18-dimensional benchmark data set, we show the applicability of our method to an inverse dynamics problem.
摘要
To overcome this limitation, the proposed method uses low-rank TNs to reduce the dimensionality of the data. Specifically, the method first finds a suitable low-dimensional subspace from the data using a low-rank TN. In this low-dimensional subspace, the method then infers the weights of the model by solving a Bayesian inference problem. Finally, the resulting weights are projected back to the original space to make GP predictions. The key advantage of the proposed method is that it allows for efficient computations in a smaller subspace. By projecting the basis functions onto a lower-dimensional space, the method modifies the shape of the basis functions in a way that is appropriate for the given data. This leads to more accurate predictions with a smaller number of basis functions. The proposed method is demonstrated on an 18-dimensional benchmark data set for an inverse dynamics problem. The results show the applicability of the method to real-world problems and its potential to improve the efficiency and accuracy of GP regression.
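The parametric backbone of the method — a GP approximated by a linear combination of basis functions, with weights inferred by Bayesian linear regression inside a lower-dimensional subspace and then projected back — can be sketched without the tensor-network machinery. Below, a plain SVD subspace stands in for the low-rank TN, and the random Fourier basis and all sizes are illustrative assumptions rather than the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression data.
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(2 * X[:, 0]) + 0.1 * rng.standard_normal(200)

# A large dictionary of M basis functions (random Fourier features as a stand-in).
M, sigma_noise = 400, 0.1
W = rng.standard_normal((M, 1)) * 1.5
b = rng.uniform(0, 2 * np.pi, M)
def phi(X):
    return np.sqrt(2.0 / M) * np.cos(X @ W.T + b)       # (N, M) design matrix

Phi = phi(X)

# "Projection to a smaller subspace": a rank-r SVD of Phi plays the role the
# low-rank tensor network plays in the paper (an assumption, not their method).
r = 30
_, _, Vt = np.linalg.svd(Phi, full_matrices=False)
V = Vt[:r].T                                            # (M, r) subspace basis

# Bayesian linear regression on the projected weights u (prior u ~ N(0, I)).
Z = Phi @ V                                             # (N, r) projected features
A = Z.T @ Z / sigma_noise**2 + np.eye(r)
Sigma_u = np.linalg.inv(A)
mu_u = Sigma_u @ Z.T @ y / sigma_noise**2

# Project back to the original space and predict with uncertainty.
X_test = np.linspace(-3, 3, 5).reshape(-1, 1)
Z_test = phi(X_test) @ V
f_mean = Z_test @ mu_u
f_var = np.einsum("nr,rs,ns->n", Z_test, Sigma_u, Z_test) + sigma_noise**2
for x, m, s in zip(X_test[:, 0], f_mean, np.sqrt(f_var)):
    print(f"x={x:+.2f}  mean={m:+.3f}  std={s:.3f}")
```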
Graph Matching via convex relaxation to the simplex
results: 在 correlated Gaussian Wigner model 下,示出了凸relaxation方法可以具有高概率Unique解,并且在无噪场景下可以精确地回归真实的 Permutation。此外,还提出了一种新的 suficiency condition,可以更好地限制输入矩阵,从而提高对 GRAMPA 算法的性能。Abstract
This paper addresses the Graph Matching problem, which consists of finding the best possible alignment between two input graphs, and has many applications in computer vision, network deanonymization and protein alignment. A common approach to tackle this problem is through convex relaxations of the NP-hard \emph{Quadratic Assignment Problem} (QAP). Here, we introduce a new convex relaxation onto the unit simplex and develop an efficient mirror descent scheme with closed-form iterations for solving this problem. Under the correlated Gaussian Wigner model, we show that the simplex relaxation admits a unique solution with high probability. In the noiseless case, this is shown to imply exact recovery of the ground truth permutation. Additionally, we establish a novel sufficiency condition for the input matrix in standard greedy rounding methods, which is less restrictive than the commonly used `diagonal dominance' condition. We use this condition to show exact one-step recovery of the ground truth (holding almost surely) via the mirror descent scheme, in the noiseless setting. We also use this condition to obtain significantly improved conditions for the GRAMPA algorithm [Fan et al. 2019] in the noiseless setting.
摘要
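A compact sketch of the pipeline the abstract describes, under simplifying assumptions: the permutation is relaxed to a nonnegative matrix whose vectorisation lies on a (mass-n) simplex, entropic mirror descent — whose multiplicative update and renormalisation are closed-form — minimises the quadratic alignment objective, and a linear assignment performs the greedy rounding. The objective, step size and correlated toy instance are generic choices, not necessarily the paper's exact formulation, and recovery on this small example is only illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def graph_match_simplex(A, B, eta=0.05, iters=500):
    """Match graphs with symmetric adjacency matrices A, B (both n x n)."""
    n = A.shape[0]
    X = np.full((n, n), 1.0 / n)                   # vec(X) on the simplex of total mass n
    for _ in range(iters):
        R = A @ X - X @ B                          # residual of the alignment A X = X B
        grad = 2.0 * (A.T @ R - R @ B.T)           # gradient of ||A X - X B||_F^2
        X = X * np.exp(-eta * grad)                # entropic mirror (multiplicative) step
        X *= n / X.sum()                           # closed-form projection back to the simplex
    row, col = linear_sum_assignment(-X)           # greedy rounding to a permutation
    return col                                     # col[i] = node of B matched to node i of A

# Correlated Wigner-style toy instance: B is a permuted, noisy copy of A.
rng = np.random.default_rng(1)
n, noise = 30, 0.05
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
perm = rng.permutation(n)                          # ground-truth alignment
P = np.eye(n)[perm]                                # P[i, perm[i]] = 1
E = rng.standard_normal((n, n)); E = (E + E.T) / 2
B = P.T @ A @ P + noise * E                        # so the true matching matrix is P itself

est = graph_match_simplex(A, B)
print("fraction of correctly matched nodes:", np.mean(est == perm))
```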
Online Conversion with Switching Costs: Robust and Learning-Augmented Algorithms
results: 论文通过一个碳素扩展EV充电案例研究了其提议的算法,并证明了它们可以substantially提高基准方法的性能。Abstract
We introduce and study online conversion with switching costs, a family of online problems that capture emerging problems at the intersection of energy and sustainability. In this problem, an online player attempts to purchase (alternatively, sell) fractional shares of an asset during a fixed time horizon with length $T$. At each time step, a cost function (alternatively, price function) is revealed, and the player must irrevocably decide an amount of asset to convert. The player also incurs a switching cost whenever their decision changes in consecutive time steps, i.e., when they increase or decrease their purchasing amount. We introduce competitive (robust) threshold-based algorithms for both the minimization and maximization variants of this problem, and show they are optimal among deterministic online algorithms. We then propose learning-augmented algorithms that take advantage of untrusted black-box advice (such as predictions from a machine learning model) to achieve significantly better average-case performance without sacrificing worst-case competitive guarantees. Finally, we empirically evaluate our proposed algorithms using a carbon-aware EV charging case study, showing that our algorithms substantially improve on baseline methods for this problem.
摘要
我们介绍和研究在线数据汇流中具有转换成本的问题,这是跨能源和可持续性领域的新兴问题。在这个问题中,一个在线玩家尝试在时间长度为T的时间interval中购买(或卖出)资产的分量。在每个时间步骤中,一个成本函数(或价格函数)会被公布,玩家必须不可逆地决定要购买的资产量。当玩家在连续两个时间步骤中改变他的决定时,就会付出转换成本。我们提出了竞争(可靠)阈值基于的算法,用于最小化和最大化这个问题的解。我们还提出了学习增强的算法,可以利用不可信的黑盒模型(如机器学习模型)来实现更好的平均情况表现,而不需要对最差情况的竞争保证。最后,我们实际评估了我们的提案算法,使用一个具有可持续性的电动车充电案例,展示了我们的算法可以对基eline方法做出重要改进。
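To make the threshold-based structure concrete, here is a heavily simplified sketch of an online purchasing rule with a switching cost. The linear threshold function psi and all constants are placeholders: the paper derives worst-case-optimal thresholds and also covers the learning-augmented variant, neither of which is reproduced here.

```python
def threshold_conversion(prices, L, U, beta):
    """Buy one unit of an asset online over T steps at minimum cost.

    At step t a price p_t in [L, U] is revealed.  We keep the fraction w already
    purchased and buy just enough to keep the price below a threshold psi(w);
    a switching cost beta * |x_t - x_{t-1}| is paid whenever the purchase amount
    changes between consecutive steps.  psi here is a simple linear interpolation
    from U down to L -- a placeholder, not the worst-case-optimal threshold.
    """
    def psi(w):
        return U - (U - L) * w

    w, prev_x, cost = 0.0, 0.0, 0.0
    for t, p in enumerate(prices):
        w_target = w
        if p < psi(w):                        # price good enough to keep buying
            w_target = min(1.0, (U - p) / (U - L))
        x = max(0.0, w_target - w)            # amount purchased this step
        if t == len(prices) - 1:              # compulsory trade: finish by the deadline
            x = 1.0 - w
        cost += p * x + beta * abs(x - prev_x)
        w, prev_x = w + x, x
    return cost

prices = [0.9, 0.7, 0.85, 0.6, 0.95]
print("online cost :", round(threshold_conversion(prices, L=0.5, U=1.0, beta=0.05), 3))
print("offline best:", min(prices), "(ignoring switching costs)")
```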
Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
paper_authors: Ruizhe Shi, Yuyao Liu, Yanjie Ze, Simon S. Du, Huazhe Xu
For: + The paper aims to address the challenge of offline reinforcement learning (RL) in real-world scenarios where data collection is costly and risky. + The authors propose a general framework called LaMo, which leverages pre-trained Language Models (LMs) to improve offline RL performance.* Methods: + The LaMo framework initializes Decision Transformers with sequentially pre-trained LMs and employs the LoRA fine-tuning method to combine pre-trained knowledge and in-domain knowledge effectively. + The framework uses non-linear MLP transformation to generate embeddings and integrates an auxiliary language prediction loss during fine-tuning to stabilize the LMs and retain their original abilities on languages.* Results: + The LaMo framework achieves state-of-the-art performance in sparse-reward tasks and closes the gap between value-based offline RL methods and decision transformers in dense-reward tasks. + The method demonstrates superior performance in scenarios with limited data samples.Here’s the simplified Chinese text version of the three information points:* For: + 本研究旨在解决在实际应用中,数据采集成本高昂且风险大的下线强化学习问题。 + 作者提出了一个通用的框架,即 LaMo,该框架利用预训练的语言模型(LMs)提高下线强化学习性能。* Methods: + LaMo框架在初始化决策变换器时使用顺序预训练LMs,并使用LoRA fine-tuning方法,相比全量精度练习,有效地结合预训练知识和域内知识。 + 框架使用非线性MLP变换而不是线性投影,生成嵌入,并在练习过程中添加语言预测任务,以稳定LMs并保持其原始语言能力。* Results: + LaMo框架在稀有奖励任务中实现了状态最佳性和减少了决策变换器在拥挤奖励任务中的差距。 + 方法在数据样本有限的情况下表现出优于其他方法。Abstract
Offline reinforcement learning (RL) aims to find a near-optimal policy using pre-collected datasets. In real-world scenarios, data collection could be costly and risky; therefore, offline RL becomes particularly challenging when the in-domain data is limited. Given recent advances in Large Language Models (LLMs) and their few-shot learning prowess, this paper introduces $\textbf{La}$nguage Models for $\textbf{Mo}$tion Control ($\textbf{LaMo}$), a general framework based on Decision Transformers to effectively use pre-trained Language Models (LMs) for offline RL. Our framework highlights four crucial components: (1) Initializing Decision Transformers with sequentially pre-trained LMs, (2) employing the LoRA fine-tuning method, in contrast to full-weight fine-tuning, to combine the pre-trained knowledge from LMs and in-domain knowledge effectively, (3) using the non-linear MLP transformation instead of linear projections, to generate embeddings, and (4) integrating an auxiliary language prediction loss during fine-tuning to stabilize the LMs and retain their original abilities on languages. Empirical results indicate $\textbf{LaMo}$ achieves state-of-the-art performance in sparse-reward tasks and closes the gap between value-based offline RL methods and decision transformers in dense-reward tasks. In particular, our method demonstrates superior performance in scenarios with limited data samples. Our project website is https://lamo2023.github.io
摘要
偏向学习(Offline Reinforcement Learning)目标是找到近似优化策略,使用预收集的数据集。在实际场景中,数据收集可能成本高昂并且具有风险,因此偏向学习在域内数据有限的情况下特别挑战。基于大语言模型(Large Language Models,LLMs)和它们的少量学习能力,这篇论文提出了《语言模型 для动作控制》(LaMo)框架,利用决策转移器来有效地使用预训练的语言模型(LMs)进行偏向学习。我们的框架包括四个关键组成部分:1. 使用顺序预训练的LMs初始化决策转移器。2. 使用LoRA fine-tuning方法,而不是全量精度训练,将LMs的预训练知识和域内知识相结合。3. 使用非线性多层Perceptron变换而不是线性投影,生成特征表示。4. 在精通训练中添加语言预测损失,以稳定LMs并保持它们的原始语言能力。实验结果表明,LaMo在缺少奖励任务中表现出色,并在权值积分任务中追caught到值基本的offline RL方法。尤其是在数据样本有限的情况下,LaMo表现出了superior的性能。如果您想了解更多细节,请访问我们的项目网站:
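The three mechanical ingredients named above — a LoRA-style low-rank update on a frozen pre-trained weight, a non-linear MLP embedding in place of a linear projection, and an auxiliary language-prediction term added to the fine-tuning loss — are easy to sketch in isolation. The module sizes, the stand-in backbone layer and the loss weighting below are assumptions for illustration; in LaMo these pieces are wired into a Decision Transformer initialised from a sequentially pre-trained language model.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pre-trained linear layer plus a trainable low-rank update W + (alpha/r) * B A."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                    # keep pre-trained knowledge frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

class StateEmbedding(nn.Module):
    """Non-linear MLP embedding of raw states (instead of a single linear projection)."""
    def __init__(self, state_dim, hidden, embed_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.GELU(),
                                 nn.Linear(hidden, embed_dim))

    def forward(self, s):
        return self.net(s)

def lamo_style_loss(action_pred, action_target, lm_logits, lm_targets, lam=0.1):
    """Behaviour-cloning term plus an auxiliary language-prediction term (illustrative weighting)."""
    rl_loss = nn.functional.mse_loss(action_pred, action_target)
    lm_loss = nn.functional.cross_entropy(lm_logits.flatten(0, 1), lm_targets.flatten())
    return rl_loss + lam * lm_loss

# Example: wrap one projection of a (stand-in) backbone with LoRA and embed a batch of states.
backbone_proj = nn.Linear(128, 128)
lora_proj = LoRALinear(backbone_proj)
embed = StateEmbedding(state_dim=17, hidden=256, embed_dim=128)
tokens = lora_proj(embed(torch.randn(4, 20, 17)))      # (batch, horizon, embed_dim)
print(tokens.shape)
```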
Stochastic Gradient Descent for Gaussian Processes Done Right
paper_authors: Jihao Andreas Lin, Shreyas Padhy, Javier Antorán, Austin Tripp, Alexander Terenin, Csaba Szepesvári, José Miguel Hernández-Lobato, David Janz
for: 这篇论文关注的是 Gaussian process regression 优化问题,具体来说是使用平方损失函数。
results: 这篇论文的实验结果表明,使用 Stochastic Dual Gradient Descent 算法可以高效地解决 Gaussian process regression 优化问题,并且与其他方法相比,如 conjugate gradient descent 和 variational Gaussian process approximations,表现更出色。在一个分子绑定亲和力预测任务上,这种方法可以让 Gaussian process regression 与状态革命的 graph neural networks 相匹配。Abstract
We study the optimisation problem associated with Gaussian process regression using squared loss. The most common approach to this problem is to apply an exact solver, such as conjugate gradient descent, either directly, or to a reduced-order version of the problem. Recently, driven by successes in deep learning, stochastic gradient descent has gained traction as an alternative. In this paper, we show that when done right$\unicode{x2014}$by which we mean using specific insights from the optimisation and kernel communities$\unicode{x2014}$this approach is highly effective. We thus introduce a particular stochastic dual gradient descent algorithm, that may be implemented with a few lines of code using any deep learning framework. We explain our design decisions by illustrating their advantage against alternatives with ablation studies and show that the new method is highly competitive. Our evaluations on standard regression benchmarks and a Bayesian optimisation task set our approach apart from preconditioned conjugate gradients, variational Gaussian process approximations, and a previous version of stochastic gradient descent for Gaussian processes. On a molecular binding affinity prediction task, our method places Gaussian process regression on par in terms of performance with state-of-the-art graph neural networks.
摘要
我们研究 Gaussian process regression 中的优化问题,使用平方损失函数。最常见的方法是使用精确解算法,如 conjugate gradient descent,直接或将问题缩放到减少的版本上进行解决。在深度学习的成功推动下,stochastic gradient descent 在过去几年中得到了广泛的应用。在这篇论文中,我们表明,当使用特定的优化和核函数社区的知识时,这种方法是非常有效的。我们因此提出了一种特定的随机双重梯度下降算法,可以使用任何深度学习框架进行实现,只需几行代码即可。我们解释了我们的设计决策,并通过缺失研究和比较减少梯度下降、变量 Gaussian process 近似和前一个版本的随机梯度下降,我们的方法与之相比较高效。在一个分子绑定亲和力预测任务上,我们的方法使 Gaussian process regression 与状态机智能网络在性能上具有相同的水平。
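The dual problem behind GP regression with squared loss is the quadratic $\tfrac{1}{2}\alpha^\top (K+\sigma^2 I)\alpha - \alpha^\top y$, whose minimiser gives the posterior mean $k(\cdot)^\top \alpha$. A plain randomised block-coordinate descent on this dual, touching only a few kernel rows per step, is enough to illustrate the idea; the specific algorithmic choices the paper recommends (momentum, iterate averaging, learning-rate scaling) are omitted, and the constants below are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(X1, X2, ls=0.5):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

# Toy 1-D data set and the exact dual solution for reference.
N, sigma2 = 300, 0.05
X = rng.uniform(-3, 3, (N, 1))
y = np.sin(2 * X[:, 0]) + np.sqrt(sigma2) * rng.standard_normal(N)
K = rbf(X, X)
alpha_exact = np.linalg.solve(K + sigma2 * np.eye(N), y)

# Randomised block-coordinate descent on the dual objective
#   F(alpha) = 0.5 * alpha^T (K + sigma2 I) alpha - alpha^T y,
# touching only |B| kernel rows per step (the source of the stochasticity).
alpha = np.zeros(N)
step, batch, iters = 0.02, 50, 5000
for _ in range(iters):
    B = rng.choice(N, size=batch, replace=False)
    g_B = K[B] @ alpha + sigma2 * alpha[B] - y[B]     # gradient coordinates on the block
    alpha[B] -= step * g_B

f_sgd, f_exact = K @ alpha, K @ alpha_exact
print("prediction RMSE vs exact GP mean:", np.sqrt(np.mean((f_sgd - f_exact) ** 2)))
```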
Initialization Matters: Privacy-Utility Analysis of Overparameterized Neural Networks
results: 发现了模型初始化的方差对隐私损失的直接关系,并在不同的初始化 distribuion 下显示了深度对隐私损失的复杂交互作用。此外,还证明了在固定 KL 隐私预算下的过分Empirical risk bounds。Abstract
We analytically investigate how over-parameterization of models in randomized machine learning algorithms impacts the information leakage about their training data. Specifically, we prove a privacy bound for the KL divergence between model distributions on worst-case neighboring datasets, and explore its dependence on the initialization, width, and depth of fully connected neural networks. We find that this KL privacy bound is largely determined by the expected squared gradient norm relative to model parameters during training. Notably, for the special setting of linearized network, our analysis indicates that the squared gradient norm (and therefore the escalation of privacy loss) is tied directly to the per-layer variance of the initialization distribution. By using this analysis, we demonstrate that privacy bound improves with increasing depth under certain initializations (LeCun and Xavier), while degrades with increasing depth under other initializations (He and NTK). Our work reveals a complex interplay between privacy and depth that depends on the chosen initialization distribution. We further prove excess empirical risk bounds under a fixed KL privacy budget, and show that the interplay between privacy utility trade-off and depth is similarly affected by the initialization.
摘要
我们分析了随机机器学习算法中模型过参数化对训练数据泄露信息的影响。我们证明了一个隐私约束 для KL散度 между模型分布在最坏邻居数据集上,并研究其与初始化、宽度和深度等因素的关系。我们发现这个隐私约束主要受到训练过程中参数对模型的期望平方Gradient norm的影响。特别是在特殊的线性化网络情况下,我们的分析表明,squared gradient norm(并且因此隐私损害的加剧)与初始化分布的每层卷积 variance直接相关。我们通过这种分析,证明了在某些初始化情况下(如LeCun和Xavier),隐私约束随着深度增加而提高,而在其他初始化情况下(如He和NTK),隐私约束随着深度增加而下降。我们的研究发现,隐私和深度之间存在复杂的互动,这与选择的初始化分布有关。我们还证明了随着 fix KL 隐私预算的情况下,隐私和深度之间存在的利用性质和深度之间的负相关。
Information-Theoretic Trust Regions for Stochastic Gradient-Based Optimization
results: 我们表明了 arTuRO 可以结合适应式 moments-based 优化的快速收敛性和SGD的泛化能力。Abstract
Stochastic gradient-based optimization is crucial to optimize neural networks. While popular approaches heuristically adapt the step size and direction by rescaling gradients, a more principled approach to improve optimizers requires second-order information. Such methods precondition the gradient using the objective's Hessian. Yet, computing the Hessian is usually expensive and effectively using second-order information in the stochastic gradient setting is non-trivial. We propose using Information-Theoretic Trust Region Optimization (arTuRO) for improved updates with uncertain second-order information. By modeling the network parameters as a Gaussian distribution and using a Kullback-Leibler divergence-based trust region, our approach takes bounded steps accounting for the objective's curvature and uncertainty in the parameters. Before each update, it solves the trust region problem for an optimal step size, resulting in a more stable and faster optimization process. We approximate the diagonal elements of the Hessian from stochastic gradients using a simple recursive least squares approach, constructing a model of the expected Hessian over time using only first-order information. We show that arTuRO combines the fast convergence of adaptive moment-based optimization with the generalization capabilities of SGD.
摘要
arTuRO models the network parameters as a Gaussian distribution and uses a Kullback-Leibler divergence-based trust region to take bounded steps that account for the objective's curvature and uncertainty in the parameters. Before each update, it solves a trust region problem for an optimal step size, resulting in a more stable and faster optimization process.To approximate the diagonal elements of the Hessian, we use a simple recursive least squares approach, constructing a model of the expected Hessian over time using only first-order information. Our approach combines the fast convergence of adaptive moment-based optimization with the generalization capabilities of stochastic gradient descent (SGD).
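The two moving parts can be written out directly: a running diagonal-curvature estimate built only from first-order information, and a KL-bounded step whose length has a closed form once the parameter distribution is Gaussian with that curvature as precision. Treating the curvature estimator as an exponentially forgetting fit of squared gradients and the trust-region radius below are assumptions; the paper's recursive least squares estimator and scheduling may differ.

```python
import numpy as np

class ARTuROSketch:
    """KL-constrained parameter update with a recursive diagonal curvature estimate."""

    def __init__(self, dim, delta=1e-3, forget=0.99, eps=1e-8):
        self.h = np.ones(dim)      # running estimate of the Hessian diagonal
        self.delta = delta         # KL trust-region radius
        self.forget = forget       # forgetting factor of the recursive fit
        self.eps = eps

    def step(self, theta, grad):
        # 1) Track curvature from squared gradients; with a constant regressor the
        #    recursive-least-squares fit reduces to this exponential moving average.
        self.h = self.forget * self.h + (1.0 - self.forget) * grad**2

        # 2) Solve the trust-region subproblem  min_d g^T d  s.t.  0.5 d^T H d <= delta,
        #    whose closed form is  d = -sqrt(2*delta / (g^T H^{-1} g)) * H^{-1} g.
        h = self.h + self.eps
        nat_grad = grad / h
        scale = np.sqrt(2.0 * self.delta / (grad @ nat_grad + self.eps))
        return theta - scale * nat_grad

# Usage on a badly scaled toy quadratic f(theta) = 0.5 * theta^T diag(c) theta with noisy gradients.
rng = np.random.default_rng(0)
c = np.array([100.0, 1.0, 0.01])
theta = np.array([1.0, 1.0, 1.0])
opt = ARTuROSketch(dim=3, delta=1e-2)
for _ in range(500):
    grad = c * theta + 0.1 * rng.standard_normal(3)
    theta = opt.step(theta, grad)
print("final parameters:", np.round(theta, 3))
```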
One-shot backpropagation for multi-step prediction in physics-based system identification
results: 作为一个案例研究,提出的方法在估计空间垃圾的惯性矩中进行了测试,并得到了较高的精度。Abstract
The aim of this paper is to present a novel general framework for the identification of possibly interconnected systems, while preserving their physical properties and providing accuracy in multi-step prediction. An analytical and recursive algorithm for the gradient computation of the multi-step loss function based on backpropagation is introduced, providing physical and structural insight directly into the learning algorithm. As a case study, the proposed approach is tested for estimating the inertia matrix of a space debris starting from state observations.
摘要
本文的目的是提出一种新的总体框架,用于可能相互连接的系统的标识,保持物理性质和多步预测精度。我们提出了一种分析和回归的梯度计算方法,基于反射传播,以获得学习算法中的物理和结构直观。作为一个案例研究,我们使用该方法测试预测空间垃圾的拟合矩阵。
Privacy-preserving design of graph neural networks with applications to vertical federated learning
results: 实验结果表明,VESPER可以在公共数据集和industry数据集上训练高性能的GNN模型,并在reasonable privacy budget下实现隐私保证。Abstract
The paradigm of vertical federated learning (VFL), where institutions collaboratively train machine learning models via combining each other's local feature or label information, has achieved great success in applications to financial risk management (FRM). The surging developments of graph representation learning (GRL) have opened up new opportunities for FRM applications under FL via efficiently utilizing the graph-structured data generated from underlying transaction networks. Meanwhile, transaction information is often considered highly sensitive. To prevent data leakage during training, it is critical to develop FL protocols with formal privacy guarantees. In this paper, we present an end-to-end GRL framework in the VFL setting called VESPER, which is built upon a general privatization scheme termed perturbed message passing (PMP) that allows the privatization of many popular graph neural architectures.Based on PMP, we discuss the strengths and weaknesses of specific design choices of concrete graph neural architectures and provide solutions and improvements for both dense and sparse graphs. Extensive empirical evaluations over both public datasets and an industry dataset demonstrate that VESPER is capable of training high-performance GNN models over both sparse and dense graphs under reasonable privacy budgets.
摘要
vertical 联合学习(VFL)的 paradigm,where institutions 合作 trains machine learning 模型,via combining each other's local feature 或标签信息,has achieved great success in financial risk management(FRM)applications。the surging developments of graph representation learning(GRL)have opened up new opportunities for FRM applications under FL via efficiently utilizing the graph-structured data generated from underlying transaction networks。However,transaction information is often considered highly sensitive,so it is critical to develop FL protocols with formal privacy guarantees。In this paper,we present an end-to-end GRL framework in the VFL setting called VESPER,which is built upon a general privatization scheme termed perturbed message passing(PMP)that allows the privatization of many popular graph neural architectures。Based on PMP,we discuss the strengths and weaknesses of specific design choices of concrete graph neural architectures and provide solutions and improvements for both dense and sparse graphs。extensive empirical evaluations over both public datasets and an industry dataset demonstrate that VESPER is capable of training high-performance GNN models over both sparse and dense graphs under reasonable privacy budgets。
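The abstract does not spell out perturbed message passing, but a common way to privatise a message-passing step is to bound each neighbour message's norm and add calibrated Gaussian noise to the aggregate before the node update. The sketch below implements that generic clip-then-noise recipe and should not be read as VESPER's exact mechanism; the clipping bound, noise multiplier and mean aggregation are placeholders.

```python
import torch
import torch.nn as nn

class PerturbedMessagePassing(nn.Module):
    """One GNN layer whose aggregated neighbour messages are clipped and noised
    (a generic DP-style recipe, used only to illustrate how a privatised
    aggregation step can slot into an otherwise standard layer)."""
    def __init__(self, in_dim, out_dim, clip=1.0, noise_mult=0.5):
        super().__init__()
        self.msg = nn.Linear(in_dim, out_dim)
        self.update = nn.Linear(in_dim + out_dim, out_dim)
        self.clip, self.noise_mult = clip, noise_mult

    def forward(self, x, edge_index):
        src, dst = edge_index                               # directed edges src -> dst
        m = self.msg(x)[src]                                # one message per edge
        m = m * torch.clamp(self.clip / (m.norm(dim=1, keepdim=True) + 1e-12), max=1.0)
        agg = torch.zeros(x.size(0), m.size(1))
        agg.index_add_(0, dst, m)                           # sum clipped messages per node
        agg = agg + self.noise_mult * self.clip * torch.randn_like(agg)   # perturbation
        deg = torch.bincount(dst, minlength=x.size(0)).clamp(min=1).unsqueeze(1)
        return torch.relu(self.update(torch.cat([x, agg / deg], dim=1)))

# Tiny transaction-graph example: 4 nodes, 4 directed edges.
x = torch.randn(4, 8)
edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 0]])
layer = PerturbedMessagePassing(in_dim=8, out_dim=16)
print(layer(x, edge_index).shape)      # torch.Size([4, 16])
```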
Multi-task learning of convex combinations of forecasting models
results: 实验结果表明,提出的方法可以在 M4 竞赛数据集上增加点预测精度,比之前的方法更高。Abstract
Forecast combination involves using multiple forecasts to create a single, more accurate prediction. Recently, feature-based forecasting has been employed to either select the most appropriate forecasting models or to learn the weights of their convex combination. In this paper, we present a multi-task learning methodology that simultaneously addresses both problems. This approach is implemented through a deep neural network with two branches: the regression branch, which learns the weights of various forecasting methods by minimizing the error of combined forecasts, and the classification branch, which selects forecasting methods with an emphasis on their diversity. To generate training labels for the classification task, we introduce an optimization-driven approach that identifies the most appropriate methods for a given time series. The proposed approach elicits the essential role of diversity in feature-based forecasting and highlights the interplay between model combination and model selection when learning forecasting ensembles. Experimental results on a large set of series from the M4 competition dataset show that our proposal enhances point forecast accuracy compared to state-of-the-art methods.
摘要
forecast 组合 involves 使用多个 forecast 创造一个更准确的预测。 最近, feature-based forecasting 已经被用来选择最有用的 forecasting 模型或学习它们的 convex 组合的重量。 在这篇论文中,我们提出了一种多任务学习方法ологи,同时解决了这两个问题。 这种方法通过一个深度神经网络,其中有两个分支:回归分支,通过最小化组合预测错误来学习不同预测方法的重量,以及分类分支,通过强调多样性来选择适合特定时间序列的预测方法。 为生成训练标签,我们引入了一种优化驱动的方法,可以确定特定时间序列中最适合的预测方法。 提议中的方法强调了特征基于预测的多样性,并且高亮了组合预测和选择预测方法之间的互动。 实验结果表明,我们的提议可以比现有方法提高点预测精度。
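A minimal sketch of the two-branch architecture: a shared encoder of time-series features feeds a regression head that outputs convex combination weights (trained on the combined-forecast error) and a classification head that flags which base methods are appropriate. The layer sizes, loss weighting and dummy data are assumptions; the paper's optimisation-driven labelling of "appropriate" methods is not reproduced here.

```python
import torch
import torch.nn as nn

class ForecastCombiner(nn.Module):
    """Shared encoder with a regression branch (combination weights) and a
    classification branch (method selection)."""
    def __init__(self, n_features, n_models, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU())
        self.weight_head = nn.Linear(hidden, n_models)      # regression branch
        self.select_head = nn.Linear(hidden, n_models)      # classification branch

    def forward(self, feats):
        h = self.encoder(feats)
        weights = torch.softmax(self.weight_head(h), dim=-1)    # convex combination
        return weights, self.select_head(h)

def multitask_loss(weights, select_logits, base_forecasts, target, labels, lam=0.5):
    # base_forecasts: (batch, n_models, horizon); target: (batch, horizon)
    combined = torch.einsum("bm,bmh->bh", weights, base_forecasts)
    reg = nn.functional.mse_loss(combined, target)                        # combined-forecast error
    cls = nn.functional.binary_cross_entropy_with_logits(select_logits, labels)
    return reg + lam * cls

# One optimisation step on dummy data (24 series features, 5 base models, horizon 12).
model = ForecastCombiner(n_features=24, n_models=5)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
feats, base = torch.randn(32, 24), torch.randn(32, 5, 12)
target, labels = torch.randn(32, 12), torch.randint(0, 2, (32, 5)).float()
w, logits = model(feats)
loss = multitask_loss(w, logits, base, target, labels)
loss.backward(); opt.step()
print(float(loss))
```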
Group-Feature (Sensor) Selection With Controlled Redundancy Using Neural Networks
results: 实验结果表明,提出的方法在一些标准数据集上具有优秀的表现,比如特征选择和组特征选择等。Abstract
In this paper, we present a novel embedded feature selection method based on a Multi-layer Perceptron (MLP) network and generalize it for group-feature or sensor selection problems, which can control the level of redundancy among the selected features or groups. Additionally, we have generalized the group lasso penalty for feature selection to encompass a mechanism for selecting valuable group features while simultaneously maintaining a control over redundancy. We establish the monotonicity and convergence of the proposed algorithm, with a smoothed version of the penalty terms, under suitable assumptions. Experimental results on several benchmark datasets demonstrate the promising performance of the proposed methodology for both feature selection and group feature selection over some state-of-the-art methods.
摘要
在这篇论文中,我们提出了一种基于多层感知网络(MLP)的嵌入式特征选择方法,并将其推广到组特征或感知器选择问题,以控制选择的特征或组中的重复性。此外,我们扩展了组lasso penalty的特征选择机制,以同时选择价值很高的组特征,并保持特征或组中的重复性控制。我们证明了提案的算法的升降持续性和收敛性,在适当的假设下。实验结果表明,提案的方法在多个标准数据集上具有优秀的表现,超过了一些当前的方法。
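One way to read the embedded group-selection mechanism is as a penalty on the input layer of an MLP: a group-lasso term drives whole sensor groups to zero, and a redundancy term discourages keeping two groups whose inputs are highly correlated. The correlation-weighted co-selection term below is an assumption for illustration, not the paper's exact penalty.

```python
import torch
import torch.nn as nn

def group_selection_penalty(first_layer: nn.Linear, groups, corr, lam_red=0.1):
    """Group-lasso + redundancy-controlled penalty on the input layer of an MLP.

    groups : list of index tensors, one per sensor/feature group
    corr   : (G, G) matrix of absolute correlations between groups (precomputed from data)
    """
    W = first_layer.weight                                            # (hidden, n_features)
    group_norms = torch.stack([W[:, idx].norm() for idx in groups])   # one "gate" per group
    group_lasso = group_norms.sum()                                   # drives whole groups to zero
    # Redundancy: discourage keeping two groups that carry highly correlated information.
    redundancy = (corr * torch.outer(group_norms, group_norms)).triu(diagonal=1).sum()
    return group_lasso + lam_red * redundancy

# Example: 9 features in 3 sensor groups feeding a small MLP regressor.
mlp = nn.Sequential(nn.Linear(9, 16), nn.ReLU(), nn.Linear(16, 1))
groups = [torch.arange(0, 3), torch.arange(3, 6), torch.arange(6, 9)]
corr = torch.tensor([[1.0, 0.9, 0.1],
                     [0.9, 1.0, 0.2],
                     [0.1, 0.2, 1.0]])
x, y = torch.randn(64, 9), torch.randn(64, 1)
loss = nn.functional.mse_loss(mlp(x), y) + 1e-2 * group_selection_penalty(mlp[0], groups, corr)
loss.backward()
print(float(loss))
```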
results: 研究表明,使用扩展的人口平衡度标准和 Parametric 方法可以提高机器学习模型的公平性,并且可以允许专家知识的使用,以避免 tradicional 公平度标准的局限性。Abstract
Algorithmic fairness has gained prominence due to societal and regulatory concerns about biases in Machine Learning models. Common group fairness metrics like Equalized Odds for classification or Demographic Parity for both classification and regression are widely used and a host of computationally advantageous post-processing methods have been developed around them. However, these metrics often limit users from incorporating domain knowledge. Despite meeting traditional fairness criteria, they can obscure issues related to intersectional fairness and even replicate unwanted intra-group biases in the resulting fair solution. To avoid this narrow perspective, we extend the concept of Demographic Parity to incorporate distributional properties in the predictions, allowing expert knowledge to be used in the fair solution. We illustrate the use of this new metric through a practical example of wages, and develop a parametric method that efficiently addresses practical challenges like limited training data and constraints on total spending, offering a robust solution for real-life applications.
摘要
《算法公平性在机器学习模型中得到了更多的关注,因为社会和管制机构对这些模型中的偏见有所关注。常见的集体公平度度量如Equalized Odds for classification和Demographic Parity for classification和regression都广泛使用,而且有许多计算优点的后处理方法被开发出来。然而,这些度量经常限制用户不能使用领域知识。即使符合传统的公平性标准,它们可能隐藏 intersectional 公平性问题,甚至在处理公平解决方案时复制不良内部偏见。为了避免这种狭隘的视角,我们扩展了 Demographic Parity 的概念,以包括预测结果中的分布性质,使得专家知识可以包含在公平解决方案中。我们通过一个实际的薪资示例来说明使用这种新的度量,并开发了一种参数化的方法,可以有效地解决实际应用中的困难,如有限的训练数据和总支出的约束,提供一个可靠的解决方案。》
Generative Learning of Continuous Data by Tensor Networks
paper_authors: Alex Meiburg, Jing Chen, Jacob Miller, Raphaëlle Tihon, Guillaume Rabusseau, Alejandro Perdomo-Ortiz
for: solves machine learning problems, especially unsupervised generative learning
methods: uses tensor network generative models for continuous data, with a new family of models based on matrix product states
results: can approximate any reasonably smooth probability density function with arbitrary precision, and performs well on synthetic and real-world datasets with both continuous and discrete variables.Abstract
Beyond their origin in modeling many-body quantum systems, tensor networks have emerged as a promising class of models for solving machine learning problems, notably in unsupervised generative learning. While possessing many desirable features arising from their quantum-inspired nature, tensor network generative models have previously been largely restricted to binary or categorical data, limiting their utility in real-world modeling problems. We overcome this by introducing a new family of tensor network generative models for continuous data, which are capable of learning from distributions containing continuous random variables. We develop our method in the setting of matrix product states, first deriving a universal expressivity theorem proving the ability of this model family to approximate any reasonably smooth probability density function with arbitrary precision. We then benchmark the performance of this model on several synthetic and real-world datasets, finding that the model learns and generalizes well on distributions of continuous and discrete variables. We develop methods for modeling different data domains, and introduce a trainable compression layer which is found to increase model performance given limited memory or computational resources. Overall, our methods give important theoretical and empirical evidence of the efficacy of quantum-inspired methods for the rapidly growing field of generative learning.
摘要
这些tensor网络在训练问题上已经获得了广泛的应用,特别是在无supervision的生成学习方面。这些模型具有从量子灵感中获得的多个有利特征,但它们在实际应用中还是受到了限制,因为它们只能处理二进制或分类型数据。我们在matrix product states的设定下解决了这个问题,我们引入了一新的tensor网络生成模型,可以从包含连续随机变量的分布中学习。我们首先证明了这个模型家族可以对任何合理平滑概率密度函数进行任意精度的 aproximation。然后,我们对一些人工和实际数据集进行了 benchmarking,发现这个模型可以从连续和分别类型数据中学习和推导 well。我们还开发了不同的数据域模型,并引入了可调压缩层,这个层可以在有限的存储或计算资源情况下提高模型表现。总的来说,我们的方法给了量子灵感方法在生成学习领域的发展具有重要的理论和实验证据。
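The core object is a matrix product state contracted against a feature map of each continuous coordinate, with the squared amplitude used as an unnormalised density (the Born-machine construction). The trigonometric feature map, bond dimension and omission of the normalisation constant below are illustrative simplifications of the paper's matrix-product-state models.

```python
import numpy as np

rng = np.random.default_rng(0)

n_sites, phys_dim, bond_dim = 5, 2, 4     # 5 continuous variables, 2 local features each

# Random MPS cores A[k] with shape (phys_dim, D_left, D_right); boundary bonds have size 1.
cores = []
for k in range(n_sites):
    dl = 1 if k == 0 else bond_dim
    dr = 1 if k == n_sites - 1 else bond_dim
    cores.append(rng.standard_normal((phys_dim, dl, dr)) / np.sqrt(phys_dim * dl))

def feature_map(x):
    """Embed one continuous value x in [0, 1] into the local physical space."""
    return np.array([np.cos(np.pi * x / 2), np.sin(np.pi * x / 2)])

def unnormalised_density(x_vec):
    """p(x) proportional to |psi(x)|^2, with psi(x) the MPS contracted against the feature maps."""
    v = np.ones((1,))
    for k, x in enumerate(x_vec):
        site = np.einsum("p,plr->lr", feature_map(x), cores[k])   # (D_left, D_right)
        v = v @ site
    return float(v[0]) ** 2

x = rng.uniform(0, 1, n_sites)
print("unnormalised density at a random point:", unnormalised_density(x))
```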
BasisFormer: Attention-based Time Series Forecasting with Learnable and Interpretable Basis
results: 这篇论文的实验结果显示,BasisFormer 比前一代方法高出11.04%和15.78% respectively 的精度,对于单Variate和多Variate预测任务都有着出色的表现。Abstract
Bases have become an integral part of modern deep learning-based models for time series forecasting due to their ability to act as feature extractors or future references. To be effective, a basis must be tailored to the specific set of time series data and exhibit distinct correlation with each time series within the set. However, current state-of-the-art methods are limited in their ability to satisfy both of these requirements simultaneously. To address this challenge, we propose BasisFormer, an end-to-end time series forecasting architecture that leverages learnable and interpretable bases. This architecture comprises three components: First, we acquire bases through adaptive self-supervised learning, which treats the historical and future sections of the time series as two distinct views and employs contrastive learning. Next, we design a Coef module that calculates the similarity coefficients between the time series and bases in the historical view via bidirectional cross-attention. Finally, we present a Forecast module that selects and consolidates the bases in the future view based on the similarity coefficients, resulting in accurate future predictions. Through extensive experiments on six datasets, we demonstrate that BasisFormer outperforms previous state-of-the-art methods by 11.04\% and 15.78\% respectively for univariate and multivariate forecasting tasks. Code is available at: \url{https://github.com/nzl5116190/Basisformer}
摘要
基于现代深度学习模型,时间序列预测中的基准已成为一个重要组成部分,因为它们可以作为特征提取器或未来参考。为了有效,一个基准必须适应特定的时间序列数据集,并且与每个时间序列在该集中表现出明确的相关性。然而,现有的状态 искусственный智能方法受限于同时满足这两个需求的能力。为解决这个挑战,我们提议了基础former,一种终端时间序列预测架构,它利用可学习和可解释的基准。这个架构包括三个组成部分:首先,我们通过适应性自我超vised学习获得基准,该学习方法将历史和未来部分视为两个不同的视图,并使用对比学习。然后,我们设计了一个Coef模块,该模块通过双向交叉注意力计算时间序列和基准在历史视图中的相似性系数。最后,我们提出了一个预测模块,该模块根据相似性系数选择和汇聚未来视图中的基准,从而实现准确的未来预测。经过广泛的实验,我们证明了基础former在六个数据集上的表现优于前一代方法11.04%和15.78%。代码可以在:\url{https://github.com/nzl5116190/Basisformer} 查看
Requirement falsification for cyber-physical systems using generative models
results: State-of-the-art CPS falsification efficiency and effectiveness.Abstract
We present the OGAN algorithm for automatic requirement falsification of cyber-physical systems. System inputs and output are represented as piecewise constant signals over time while requirements are expressed in signal temporal logic. OGAN can find inputs that are counterexamples for the safety of a system revealing design, software, or hardware defects before the system is taken into operation. The OGAN algorithm works by training a generative machine learning model to produce such counterexamples. It executes tests atomically and does not require any previous model of the system under test. We evaluate OGAN using the ARCH-COMP benchmark problems, and the experimental results show that generative models are a viable method for requirement falsification. OGAN can be applied to new systems with little effort, has few requirements for the system under test, and exhibits state-of-the-art CPS falsification efficiency and effectiveness.
摘要
我团队提出了OGAN算法,用于自动化 cyber-physical systems 的需求证明。系统的输入和输出都表示为时间上的分割常量信号,而需求则用 signal temporal logic 表达。OGAN 可以找到系统的安全性不足的输入 counterexample,暴露设计、软件或硬件问题,从而避免系统在运行前发现问题。OGAN 算法通过训练生成机器学习模型来生成 counterexample。它可以原子地执行测试,不需要任何先前系统模型。我们使用 ARCH-COMP bencmark 问题进行评估,实验结果表明,生成模型是可靠的需求证明方法。OGAN 可以应用于新系统,需要少量的努力和系统的输入,并且具有现代 CPS 证明效率和可靠性。
Log-based Anomaly Detection of Enterprise Software: An Empirical Study
results: 研究发现,不同的模型在不同的数据集上表现不同,特别是在较小的、不具有固定结构的数据集上。此外,通过移除一些常见的数据泄露问题,研究发现模型的效果有所改善。同时,对开发者对异常分析的评估也表明了不同的模型在检测不同类型异常时的优缺点。最后,通过逐渐增加训练数据量来评估模型效果的影响也被研究。Abstract
Most enterprise applications use logging as a mechanism to diagnose anomalies, which could help with reducing system downtime. Anomaly detection using software execution logs has been explored in several prior studies, using both classical and deep neural network-based machine learning models. In recent years, the research has largely focused in using variations of sequence-based deep neural networks (e.g., Long-Short Term Memory and Transformer-based models) for log-based anomaly detection on open-source data. However, they have not been applied in industrial datasets, as often. In addition, the studied open-source datasets are typically very large in size with logging statements that do not change much over time, which may not be the case with a dataset from an industrial service that is relatively new. In this paper, we evaluate several state-of-the-art anomaly detection models on an industrial dataset from our research partner, which is much smaller and loosely structured than most large scale open-source benchmark datasets. Results show that while all models are capable of detecting anomalies, certain models are better suited for less-structured datasets. We also see that model effectiveness changes when a common data leak associated with a random train-test split in some prior work is removed. A qualitative study of the defects' characteristics identified by the developers on the industrial dataset further shows strengths and weaknesses of the models in detecting different types of anomalies. Finally, we explore the effect of limited training data by gradually increasing the training set size, to evaluate if the model effectiveness does depend on the training set size.
摘要
大多数企业应用程序使用日志来诊断问题,以减少系统下时间。问题探测使用软件执行日志的机器学习模型已经在多个先前研究中进行过探讨,使用了古典和深度神经网络模型。在最近几年,研究几乎集中在使用序列基的深度神经网络模型(例如Long-Short Term Memory和Transformer-based模型)进行日志基的问题探测。但是,这些模型尚未在工业数据集中使用。此外,研究使用的开源数据集通常是非常大的,且日志陈述不会随时间变化,这可能不是工业服务中的数据集。在本文中,我们评估了一些现有的问题探测模型,在我们的研究伙伴提供的工业数据集上进行评估。结果显示,处理器都能够探测问题,但一些模型更适合不具体的数据集。我们还发现,模型的效果会因某些常见的数据泄露而变化。此外,我们进一步进行了开发人员关于数据集中的问题特征的质性研究,以了解不同类型的问题探测模型在不同类型的问题上的优劣。最后,我们考虑了训练集大小的影响,通过逐步增加训练集大小来评估模型的效果是否受训练集大小影响。
Exploring Practitioner Perspectives On Training Data Attribution Explanations
results: 研究发现,实际中模型性能往往受到培根数据质量的影响,而模型开发者通常靠自己的经验来选择和准备数据。使用TDA解释不够知名,因此不被广泛使用。研究提醒了社区,需要从人机合作角度出发,推广TDA技术的应用和评估,以满足实际应用中的需求。Abstract
Explainable AI (XAI) aims to provide insight into opaque model reasoning to humans and as such is an interdisciplinary field by nature. In this paper, we interviewed 10 practitioners to understand the possible usability of training data attribution (TDA) explanations and to explore the design space of such an approach. We confirmed that training data quality is often the most important factor for high model performance in practice and model developers mainly rely on their own experience to curate data. End-users expect explanations to enhance their interaction with the model and do not necessarily prioritise but are open to training data as a means of explanation. Within our participants, we found that TDA explanations are not well-known and therefore not used. We urge the community to focus on the utility of TDA techniques from the human-machine collaboration perspective and broaden the TDA evaluation to reflect common use cases in practice.
摘要
Explainable AI (XAI) 目标是为人类提供模型决策的透明性,因此是一个跨学科领域的研究领域。在这篇论文中,我们采访了10名实践者,以了解培训数据拟合(TDA)说明的可能性,并探讨这种方法的设计空间。我们发现,实际中高模型性能的关键因素通常是培训数据质量,而模型开发者主要依靠自己的经验来选择数据。使用者希望通过模型与人类的交互来得到说明,而不一定优先考虑培训数据作为解释的来源。在我们的参与者中,我们发现TDA说明并不很知道,因此并不常用。我们呼吁社区关注TDA技术在人机合作视角下的实用性,扩大TDA评价的范围,以满足实际应用中的常见用例。
Amoeba: Circumventing ML-supported Network Censorship via Adversarial Reinforcement Learning
results: 根据实验结果,Amoeba 可以实现高度的逃脱成功率(平均为 94%),并且这些逃脱的流量具有转移性和稳定性,能够在不同的网络环境下保持高度的逃脱能力,并且可以在不同的 ML 模型之间进行转移。Abstract
Embedding covert streams into a cover channel is a common approach to circumventing Internet censorship, due to censors' inability to examine encrypted information in otherwise permitted protocols (Skype, HTTPS, etc.). However, recent advances in machine learning (ML) enable detecting a range of anti-censorship systems by learning distinct statistical patterns hidden in traffic flows. Therefore, designing obfuscation solutions able to generate traffic that is statistically similar to innocuous network activity, in order to deceive ML-based classifiers at line speed, is difficult. In this paper, we formulate a practical adversarial attack strategy against flow classifiers as a method for circumventing censorship. Specifically, we cast the problem of finding adversarial flows that will be misclassified as a sequence generation task, which we solve with Amoeba, a novel reinforcement learning algorithm that we design. Amoeba works by interacting with censoring classifiers without any knowledge of their model structure, but by crafting packets and observing the classifiers' decisions, in order to guide the sequence generation process. Our experiments using data collected from two popular anti-censorship systems demonstrate that Amoeba can effectively shape adversarial flows that have on average 94% attack success rate against a range of ML algorithms. In addition, we show that these adversarial flows are robust in different network environments and possess transferability across various ML models, meaning that once trained against one, our agent can subvert other censoring classifiers without retraining.
摘要
嵌入底层流入覆盖通道是常见的绕过互联网审查的方法,因为审查人员无法检查加密信息在允许的协议中(Skype、HTTPS等)。然而,最近的机器学习(ML)技术的进步使得可以通过学习流量中的特征 Patterns 检测多种防火墙系统。因此,设计生成混淆流量的方法,以便在速度上骗 ML 基于的分类器,是困难的。在这篇论文中,我们提出了一种实用的反抗流分类器的攻击策略,用于绕过审查。具体来说,我们将问题找到可以被 ML 分类器错分的流程的找到作为一个序列生成任务,并使用我们设计的 Amoeba 算法来解决这个问题。Amoeba 算法会与审查分类器交互,不知道它们的模型结构,但是通过编辑包和观察分类器的决策,以导引序列生成过程。我们的实验结果表明,使用 Amoeba 算法可以生成高效的反抗流,其平均攻击成功率为 94%,并且这些反抗流在不同的网络环境下保持稳定,同时具有转移性,meaning that once trained against one censoring classifier, our agent can subvert other censoring classifiers without retraining.
paper_authors: Tom Coates, Alexander M. Kasprzyk, Sara Veneziale
for: 本研究用机器学习方法来分类八维正几何体。
methods: 使用神经网络分类器来预测八维正几何体是否为Q-Fano变体。
results: 使用机器学习方法可以准确地预测八维正几何体是否为Q-Fano变体,并且可以提供初步的Q-Fano变体景观。Abstract
Algebraic varieties are the geometric shapes defined by systems of polynomial equations; they are ubiquitous across mathematics and science. Amongst these algebraic varieties are Q-Fano varieties: positively curved shapes which have Q-factorial terminal singularities. Q-Fano varieties are of fundamental importance in geometry as they are "atomic pieces" of more complex shapes - the process of breaking a shape into simpler pieces in this sense is called the Minimal Model Programme. Despite their importance, the classification of Q-Fano varieties remains unknown. In this paper we demonstrate that machine learning can be used to understand this classification. We focus on 8-dimensional positively-curved algebraic varieties that have toric symmetry and Picard rank 2, and develop a neural network classifier that predicts with 95% accuracy whether or not such an algebraic variety is Q-Fano. We use this to give a first sketch of the landscape of Q-Fanos in dimension 8. How the neural network is able to detect Q-Fano varieties with such accuracy remains mysterious, and hints at some deep mathematical theory waiting to be uncovered. Furthermore, when visualised using the quantum period, an invariant that has played an important role in recent theoretical developments, we observe that the classification as revealed by ML appears to fall within a bounded region, and is stratified by the Fano index. This suggests that it may be possible to state and prove conjectures on completeness in the future. Inspired by the ML analysis, we formulate and prove a new global combinatorial criterion for a positively curved toric variety of Picard rank 2 to have terminal singularities. Together with the first sketch of the landscape of Q-Fanos in higher dimensions, this gives new evidence that machine learning can be an essential tool in developing mathematical conjectures and accelerating theoretical discovery.
摘要
algebraic varieties 是 mathematics 和 science 中的几何形状,它们是由多项式方程定义的。其中包括 Q-Fano varieties:具有 Q-factorial 终点特性的正几何形状。Q-Fano varieties 在几何中具有基本重要性,因为它们是更复杂形状的“原子组分”——破坏这种形状的过程被称为 Minimal Model Programme。尽管其分类仍然未知,但我们在这篇论文中使用机器学习来理解这个分类。我们关注 8 维正几何形状,具有 toric симметry 和 Picard rank 2,并开发了一个神经网络分类器,可以将其中的 algebraic variety 分为 Q-Fano 和非 Q-Fano 两类,准确率为 95%。我们使用这个分类器来给 Q-Fano 变量在 8 维中的首个风景。神经网络如何准确地检测 Q-Fano 变量的原理仍然是一个谜,它可能表明了一些深刻的数学理论。此外,通过使用量子期,一个具有重要作用的 invariant,我们发现分类结果在一个固定区域内,并且按照 Fano 指数的分布。这表示可能在未来提出和证明某些 conjecture 的 Completeness。受 ML 分析启发,我们提出和证明了一个全球 combinatorial criterion,用于判断正几何 toric 变量的 Picard rank 2 是否具有终点特性。这与 Q-Fano 变量在更高维度中的首个风景,以及新的数学推断的证明,共同给出了新的证明,表明机器学习可以成为数学推断的关键工具。
FlexTrain: A Dynamic Training Framework for Heterogeneous Devices Environments
results: 作者在CIFAR-100数据集上实现了FlexTrain的效果,一个全球模型可以轻松地在多种设备上部署,从而降低训练时间和能耗。此外,作者还扩展了FlexTrain到联合学习Setting,表明该方法在CIFAR-10和CIFAR-100数据集上超过了标准联合学习标准准则。Abstract
As deep learning models become increasingly large, they pose significant challenges in heterogeneous devices environments. The size of deep learning models makes it difficult to deploy them on low-power or resource-constrained devices, leading to long inference times and high energy consumption. To address these challenges, we propose FlexTrain, a framework that accommodates the diverse storage and computational resources available on different devices during the training phase. FlexTrain enables efficient deployment of deep learning models, while respecting device constraints, minimizing communication costs, and ensuring seamless integration with diverse devices. We demonstrate the effectiveness of FlexTrain on the CIFAR-100 dataset, where a single global model trained with FlexTrain can be easily deployed on heterogeneous devices, saving training time and energy consumption. We also extend FlexTrain to the federated learning setting, showing that our approach outperforms standard federated learning benchmarks on both CIFAR-10 and CIFAR-100 datasets.
摘要
随着深度学习模型的大小不断增长,它们在多种设备环境中带来了重要的挑战。由于深度学习模型的大小,它们在低功率或资源受限的设备上部署困难,从而导致了长期的推理时间和高能耗。为了解决这些挑战,我们提出了FlexTrain框架,该框架在训练阶段可以适应不同设备之间的多样化存储和计算资源。FlexTrain可以有效地部署深度学习模型,同时尊重设备限制,降低通信成本,并具有与多种设备的一体化性。我们在CIFAR-100 dataset上示出了FlexTrain的有效性,其中一个全球模型可以轻松地在不同设备上部署,从而节省训练时间和能耗。此外,我们还扩展了FlexTrain到联合学习Setting中,并证明了我们的方法在CIFAR-10和CIFAR-100 dataset上超过标准联合学习标准准则。
The Phase Transition Phenomenon of Shuffled Regression
for: This paper investigates the phase transition phenomenon in the shuffled (permuted) regression problem, which has many applications in databases, privacy, data analysis, etc.
methods: The paper uses message passing (MP) techniques to precisely identify the phase transition points. The authors first transform the permutation recovery problem into a probabilistic graphical model, and then use analytical tools rooted in the MP algorithm to derive an equation that tracks its convergence.
results: According to this study, the impact of the signal-to-noise ratio ($\snr$) on permutation recovery can be characterized by linking the equation to a branching random walk process. In the oracle case, the method can fairly accurately predict the phase transition $\snr$. In the non-oracle case, the algorithm can predict the maximum allowed number of permuted rows and uncover its dependency on the sample number.Abstract
We study the phase transition phenomenon inherent in the shuffled (permuted) regression problem, which has found numerous applications in databases, privacy, data analysis, etc. In this study, we aim to precisely identify the locations of the phase transition points by leveraging techniques from message passing (MP). In our analysis, we first transform the permutation recovery problem into a probabilistic graphical model. We then leverage the analytical tools rooted in the message passing (MP) algorithm and derive an equation to track the convergence of the MP algorithm. By linking this equation to the branching random walk process, we are able to characterize the impact of the signal-to-noise-ratio ($\snr$) on the permutation recovery. Depending on whether the signal is given or not, we separately investigate the oracle case and the non-oracle case. The bottleneck in identifying the phase transition regimes lies in deriving closed-form formulas for the corresponding critical points, but only in rare scenarios can one obtain such precise expressions. To tackle this technical challenge, this study proposes the Gaussian approximation method, which allows us to obtain the closed-form formulas in almost all scenarios. In the oracle case, our method can fairly accurately predict the phase transition $\snr$. In the non-oracle case, our algorithm can predict the maximum allowed number of permuted rows and uncover its dependency on the sample number.
摘要
我们研究排序(permuted)回溯问题中的相变现象,这问题在数据库、隐私、数据分析等领域都获得了广泛应用。在这些研究中,我们将专注于精确地描述相变点的位置,并使用讯息传递(MP)技术来进行分析。我们首先将排序回溯问题转换为概率Graphical Model。然后,我们利用MP算法的分析工具, derivation equation to track MP algorithm的参数。通过与分支随机步进程连接这个方程,我们能够描述 $\snr$ 对排序回溯的影响。假设讯息是否存在,我们分别进行 oracle 和非 oracle 两种情况的研究。在确定相变点的难点上,这些研究对应的批处是很困难,但这些研究提出了一种名为 Gaussian approximation method的方法,可以在大多数情况下获得关键的关键表达。在 oracle 情况下,我们的方法可以对相变 $\snr$ 进行很好的预测。在非 oracle 情况下,我们的算法可以预测最多允许的排序回溯行数,并且描述这个值对应的样本数的相互关联。
Discussing the Spectra of Physics-Enhanced Machine Learning via a Survey on Structural Mechanics Applications
results: 本研究通过一系列实验和案例研究,展示了物理学增强机器学习方法在复杂问题上的应用和优势。同时,提供了一些实际的代码,以便读者可以参照和尝试。Abstract
The intersection of physics and machine learning has given rise to a paradigm that we refer to here as physics-enhanced machine learning (PEML), aiming to improve the capabilities and reduce the individual shortcomings of data- or physics-only methods. In this paper, the spectrum of physics-enhanced machine learning methods, expressed across the defining axes of physics and data, is discussed by engaging in a comprehensive exploration of its characteristics, usage, and motivations. In doing so, this paper offers a survey of recent applications and developments of PEML techniques, revealing the potency of PEML in addressing complex challenges. We further demonstrate application of select such schemes on the simple working example of a single-degree-of-freedom Duffing oscillator, which allows to highlight the individual characteristics and motivations of different `genres' of PEML approaches. To promote collaboration and transparency, and to provide practical examples for the reader, the code of these working examples is provided alongside this paper. As a foundational contribution, this paper underscores the significance of PEML in pushing the boundaries of scientific and engineering research, underpinned by the synergy of physical insights and machine learning capabilities.
摘要
физи学和机器学习的交叉点已经给出了一种新的思想,我们称之为物理增强机器学习(PEML),旨在提高数据或物理方法的能力,同时减少它们的个体缺陷。在这篇论文中,我们讨论了物理增强机器学习方法的谱系,通过物理和数据两个定义轴的交叉分析其特征、使用和动机。这种方法的应用和发展,包括一些最近的应用和发展,揭示了PEML在复杂挑战中的力量。此外,我们还使用单度 oscillator 作为一个简单的工作示例,以阐明不同类型的 PEML 方法的特点和动机。为促进合作和透明度,并为读者提供实践例子,我们附加了这篇论文中的代码。作为基础贡献,这篇论文强调了PEML在科学和工程研究的前沿Positioning, 基于物理洞察和机器学习能力的共同作用。
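The paper's working example, the single-degree-of-freedom Duffing oscillator, is simple to simulate, and trajectories like the one below are the kind of data a physics-enhanced model would be trained on. The parameter values and forcing are illustrative, not those used in the survey.

```python
import numpy as np

def duffing_rhs(t, state, m=1.0, c=0.1, k=1.0, k3=5.0, F=0.5, omega=1.2):
    """x'' = (F*cos(omega*t) - c*x' - k*x - k3*x^3) / m  (single-DOF Duffing oscillator)."""
    x, v = state
    return np.array([v, (F * np.cos(omega * t) - c * v - k * x - k3 * x**3) / m])

def rk4(f, state0, t_end, dt):
    """Fixed-step fourth-order Runge-Kutta integration of ds/dt = f(t, s)."""
    ts = np.arange(0.0, t_end, dt)
    out = np.empty((len(ts), 2))
    s = np.array(state0, dtype=float)
    for i, t in enumerate(ts):
        out[i] = s
        k1 = f(t, s)
        k2 = f(t + dt / 2, s + dt / 2 * k1)
        k3_ = f(t + dt / 2, s + dt / 2 * k2)
        k4 = f(t + dt, s + dt * k3_)
        s = s + dt / 6 * (k1 + 2 * k2 + 2 * k3_ + k4)
    return ts, out

# Displacement/velocity time series that a physics-enhanced ML model could be trained on.
ts, traj = rk4(duffing_rhs, state0=[0.1, 0.0], t_end=50.0, dt=0.01)
print("samples:", traj.shape, " final state:", np.round(traj[-1], 4))
```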
DDC-PIM: Efficient Algorithm/Architecture Co-design for Doubling Data Capacity of SRAM-based Processing-In-Memory
paper_authors: Cenlin Duan, Jianlei Yang, Xiaolin He, Yingjie Qi, Yikun Wang, Yiou Wang, Ziyan He, Bonan Yan, Xueyan Wang, Xiaotao Jia, Weitao Pan, Weisheng Zhao
for: 这个论文主要目标是提高处理在内存中(PIM)的性能,尤其是减少有效数据移动的问题。
methods: 这篇论文提出了一种名为DDC-PIM的效率算法/架构合作方法,用于提高SRAM基于PIM的容量。在算法层次,提出了一种筛子相关协同(FCC)算法来获得一个bitwise相对补做。在架构层次,利用6T SRAM的内生交叠结构,将每个SRAM单元中的bitwise相对补做存储在其 complementary states($Q/\bar{Q}$)中,以最大化每个SRAM单元的数据容量。
results: 评估结果表明,DDC-PIM相比 PIM baseline 实现,在MobileNetV2和EfficientNet-B0上提供了约2.84倍的速度提升,而且减少了精度损失。与现有的SRAM基于PIM宏比较,DDC-PIM在Weight Density和面积效率方面提供了最高的改进,即8.41倍和2.75倍。Abstract
Processing-in-memory (PIM), as a novel computing paradigm, provides significant performance benefits from the aspect of effective data movement reduction. SRAM-based PIM has been demonstrated as one of the most promising candidates due to its endurance and compatibility. However, the integration density of SRAM-based PIM is much lower than other non-volatile memory-based ones, due to its inherent 6T structure for storing a single bit. Within comparable area constraints, SRAM-based PIM exhibits notably lower capacity. Thus, aiming to unleash its capacity potential, we propose DDC-PIM, an efficient algorithm/architecture co-design methodology that effectively doubles the equivalent data capacity. At the algorithmic level, we propose a filter-wise complementary correlation (FCC) algorithm to obtain a bitwise complementary pair. At the architecture level, we exploit the intrinsic cross-coupled structure of 6T SRAM to store the bitwise complementary pair in their complementary states ($Q/\overline{Q}$), thereby maximizing the data capacity of each SRAM cell. The dual-broadcast input structure and reconfigurable unit support both depthwise and pointwise convolution, adhering to the requirements of various neural networks. Evaluation results show that DDC-PIM yields about $2.84\times$ speedup on MobileNetV2 and $2.69\times$ on EfficientNet-B0 with negligible accuracy loss compared with PIM baseline implementation. Compared with state-of-the-art SRAM-based PIM macros, DDC-PIM achieves up to $8.41\times$ and $2.75\times$ improvement in weight density and area efficiency, respectively.
摘要
Processing-in-memory (PIM)是一种新的计算模式,它在数据移动效率方面提供了显著的性能优势。SRAM基于的PIM被认为是最有前途的候选者,因为它具有持续性和兼容性。然而,SRAM基于的PIM的集成密度远低于其他非朗帕内存基于的一些,这是因为它的内在6T结构只能存储一个比特。在相同的面积限制下,SRAM基于的PIM表现出较低的容量。为了解锁其容量潜力,我们提出了DDC-PIM,一种高效的算法/架构合作方法,可以有效地将其容量提高至二倍。在算法层次,我们提出了一种filter-wise complementary correlation(FCC)算法,以获得一个比特对。在架构层次,我们利用6T SRAM的内在交叉结构,将比特对存储在其相互 complementary 状态($Q/\bar{Q}$)中,从而最大化每个SRAM单元的数据容量。双向广播输入结构和可重新配置单元支持深度wise和点wise卷积,符合各种神经网络的需求。评估结果显示,DDC-PIM可以在MobileNetV2和EfficientNet-B0上提供约2.84倍的速度提升,与PIM基eline实现相比,而且减少精度损失。相比之下,DDC-PIM在SRAM基于PIM macros中实现了最高达8.41倍和2.75倍的Weight density和面积效率提升,分别。
Coalitional Manipulations and Immunity of the Shapley Value
results: 论文发现,使用这种新的基础可以获得一个唯一的高效和 симметри�efficient allocation rule,该规则具有免受协续操纵和内部重新分配价值的特性。此外,论文还发现,对于高效的分配规则,免受协续操纵性和内部重新分配价值的特性是等价的。Abstract
We consider manipulations in the context of coalitional games, where a coalition aims to increase the total payoff of its members. An allocation rule is immune to coalitional manipulation if no coalition can benefit from internal reallocation of worth on the level of its subcoalitions (reallocation-proofness), and if no coalition benefits from a lower worth while all else remains the same (weak coalitional monotonicity). Replacing additivity in Shapley's original characterization by these requirements yields a new foundation of the Shapley value, i.e., it is the unique efficient and symmetric allocation rule that awards nothing to a null player and is immune to coalitional manipulations. We further find that for efficient allocation rules, reallocation-proofness is equivalent to constrained marginality, a weaker variant of Young's marginality axiom. Our second characterization improves upon Young's characterization by weakening the independence requirement intrinsic to marginality.
摘要
我们在伙伴游戏中考虑操作,伙伴的目的是增加成员的总回扣。一个分配规则是免受伙伴操作的抵抗(reallocation-proofness),如果无法对内部子伙伴的资产重新分配(reallocation),并且如果无法在其他事物保持不变的情况下,从低回扣获得更多的回扣(弱伙伴对称)。将添加性在雪布利原始特征中替换为这些需求,则得到一个新的基础,即雪布利值是唯一的有效和对称分配规则,没有空戏者获得回扣,并且免受伙伴操作。我们还发现,对于有效分配规则,reallocation-proofness与对称组合的限制组合(constrained marginality)相等,这是对Young的组合性质较弱的一个变形。我们的第二个特征提高了Young的特征,削弱了独立性的内在需求。
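For reference, the Shapley value that this characterisation pins down is the usual average of marginal contributions, $\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n-|S|-1)!}{n!}\,\bigl(v(S \cup \{i\}) - v(S)\bigr)$; a brute-force computation on a small example game makes the object of the result concrete. The glove-game worth function below is just an illustration.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, worth):
    """Exact Shapley values by enumerating all coalitions (feasible for small games)."""
    n = len(players)
    phi = {i: 0.0 for i in players}
    for i in players:
        others = [p for p in players if p != i]
        for r in range(n):
            for S in combinations(others, r):
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[i] += weight * (worth(frozenset(S) | {i}) - worth(frozenset(S)))
    return phi

# Example: a 3-player glove game (player 1 owns a left glove, players 2 and 3 right gloves).
def v(coalition):
    return 1.0 if 1 in coalition and (2 in coalition or 3 in coalition) else 0.0

print(shapley_values([1, 2, 3], v))   # efficient, and symmetric in players 2 and 3
```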
A hybrid approach for solving the gravitational N-body problem with Artificial Neural Networks
paper_authors: Veronica Saz Ulibarrena, Philipp Horn, Simon Portegies Zwart, Elena Sellentin, Barry Koren, Maxwell X. Cai
for: The aim of this paper is to study the use of Artificial Neural Networks (ANNs) to speed up the numerical integration of planetary systems.
methods: The paper uses Hamiltonian Neural Networks and Deep Neural Networks to replace computationally expensive parts of the numerical simulation.
results: Using a hybrid integrator increases the reliability of the method and prevents large energy errors. When the number of asteroids exceeds 70, using neural networks results in faster simulations.Abstract
Simulating the evolution of the gravitational N-body problem becomes extremely computationally expensive as N increases since the problem complexity scales quadratically with the number of bodies. We study the use of Artificial Neural Networks (ANNs) to replace expensive parts of the integration of planetary systems. Neural networks that include physical knowledge have grown in popularity in the last few years, although few attempts have been made to use them to speed up the simulation of the motion of celestial bodies. We study the advantages and limitations of using Hamiltonian Neural Networks to replace computationally expensive parts of the numerical simulation. We compare the results of the numerical integration of a planetary system with asteroids with those obtained by a Hamiltonian Neural Network and a conventional Deep Neural Network, with special attention to understanding the challenges of this problem. Due to the non-linear nature of the gravitational equations of motion, errors in the integration propagate. To increase the robustness of a method that uses neural networks, we propose a hybrid integrator that evaluates the prediction of the network and replaces it with the numerical solution if considered inaccurate. Hamiltonian Neural Networks can make predictions that resemble the behavior of symplectic integrators but are challenging to train and in our case fail when the inputs differ ~7 orders of magnitude. In contrast, Deep Neural Networks are easy to train but fail to conserve energy, leading to fast divergence from the reference solution. The hybrid integrator designed to include the neural networks increases the reliability of the method and prevents large energy errors without increasing the computing cost significantly. For this problem, the use of neural networks results in faster simulations when the number of asteroids is >70.
摘要
模拟行星系统的 gravitational N-body 问题的计算成本随着 N 的增加而变得极其高昂,因为问题的复杂度与体数之间存在 quadratic 关系。我们研究使用人工神经网络 (ANNs) 来取代计算成本高昂的部分。 Physical knowledge 包含的神经网络在过去几年中得到了广泛应用,但对于使用其快速化行星系统的运动 simulations 的尝试却非常少。我们研究使用 Hamiltonian Neural Networks 取代计算成本高昂的部分的优势和局限性。我们将比较一个包含 asteroids 的 planetary system 的数值积分与一个 Hamiltonian Neural Network 和一个 conventinal Deep Neural Network 的结果,并特别关注这个问题的挑战。由于 gravitational 方程的非线性,误差在积分中会卷积。为了增加使用神经网络的方法的可靠性,我们提出了一种 hybrid 积分器,该积分器会评估神经网络的预测,并将其替换为数值解决方案如果被视为不准确。 Hamiltonian Neural Networks 可以预测与 symplectic 积分器类似的结果,但是它们在输入差异大约 7 个数量级时具有困难培训和稳定性问题。相比之下, Deep Neural Networks 轻松培训,但是它们不会保留能量,导致快速偏离参照解。我们设计的 hybrid 积分器可以增加方法的可靠性,避免大量能量误差,而无需明显增加计算成本。对于这个问题,使用神经网络的方法可以在 asteroids 数量超过 70 时实现更快的计算。
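The hybrid integrator's control flow — trust the network for the expensive force evaluation, but fall back to the direct numerical computation whenever a cheap consistency check flags the prediction as inaccurate — can be sketched independently of any particular network. The noisy stand-in surrogate and the energy-drift acceptance test below are assumptions; the paper's criterion for replacing a prediction may differ.

```python
import numpy as np

G = 1.0

def direct_acc(pos, mass, eps=1e-3):
    """Exact pairwise gravitational accelerations (the expensive O(N^2) part)."""
    d = pos[None, :, :] - pos[:, None, :]                       # (N, N, 3)
    r3 = (np.sum(d**2, axis=-1) + eps**2) ** 1.5
    np.fill_diagonal(r3, np.inf)
    return G * np.sum(mass[None, :, None] * d / r3[:, :, None], axis=1)

def total_energy(pos, vel, mass, eps=1e-3):
    kin = 0.5 * np.sum(mass * np.sum(vel**2, axis=1))
    d = np.linalg.norm(pos[None] - pos[:, None], axis=-1)
    iu = np.triu_indices(len(mass), k=1)
    pot = -G * np.sum(mass[iu[0]] * mass[iu[1]] / np.sqrt(d[iu]**2 + eps**2))
    return kin + pot

def hybrid_leapfrog_step(pos, vel, mass, dt, surrogate_acc, tol=1e-4):
    """Leapfrog step using the network surrogate, with numerical fallback if the
    relative energy drift of the tentative step exceeds `tol`."""
    e0 = total_energy(pos, vel, mass)
    for acc_fn in (surrogate_acc, direct_acc):                  # try the surrogate first
        a0 = acc_fn(pos, mass)
        vel_half = vel + 0.5 * dt * a0
        pos_new = pos + dt * vel_half
        vel_new = vel_half + 0.5 * dt * acc_fn(pos_new, mass)
        if abs(total_energy(pos_new, vel_new, mass) - e0) <= tol * abs(e0):
            return pos_new, vel_new, acc_fn is not direct_acc   # True if surrogate accepted
    return pos_new, vel_new, False

# Stand-in "network": the exact force corrupted by small noise, so some steps get rejected.
def noisy_surrogate(pos, mass):
    return direct_acc(pos, mass) * (1.0 + 1e-3 * np.random.default_rng(0).standard_normal())

rng = np.random.default_rng(1)
N = 8
pos, vel, mass = rng.standard_normal((N, 3)), 0.1 * rng.standard_normal((N, 3)), np.full(N, 1.0 / N)
pos, vel, used_surrogate = hybrid_leapfrog_step(pos, vel, mass, dt=1e-3, surrogate_acc=noisy_surrogate)
print("surrogate accepted:", used_surrogate)
```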
Dropout Strategy in Reinforcement Learning: Limiting the Surrogate Objective Variance in Policy Optimization Methods
results: 对 Atari 2600 环境中的 PPO 算法进行了比较实验,结果显示 D-PPO 算法在性能上有显著的提升,并有效地限制了强制样本重要性抽样引起的 surrogate 目标函数差值的增长。Abstract
Policy-based reinforcement learning algorithms are widely used in various fields. Among them, mainstream policy optimization algorithms such as TRPO and PPO introduce importance sampling into policy iteration, which allows the reuse of historical data. However, this can also lead to high variance of the surrogate objective and indirectly affects the stability and convergence of the algorithm. In this paper, we first derived an upper bound of the surrogate objective variance, which can grow quadratically with the increase of the surrogate objective. Next, we proposed a dropout technique to avoid the excessive increase of the surrogate objective variance caused by importance sampling. Then, we introduced a general reinforcement learning framework applicable to mainstream policy optimization methods, and applied the dropout technique to the PPO algorithm to obtain the D-PPO variant. Finally, we conduct comparative experiments between D-PPO and PPO algorithms in the Atari 2600 environment, results show that D-PPO achieved significant performance improvements compared to PPO, and effectively limited the excessive increase of the surrogate objective variance during training.
摘要
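One plausible reading of the proposed dropout is to mask out, within each batch, samples whose importance ratios stray far from 1 before averaging the clipped surrogate, which directly limits the variance growth the paper bounds. The band width and the drop rule below are assumptions, not necessarily D-PPO's exact criterion.

```python
import torch

def dppo_surrogate(logp_new, logp_old, advantages, clip_eps=0.2, drop_band=0.3):
    """PPO clipped surrogate with a dropout mask on high-variance importance ratios.

    Samples whose ratio pi_new/pi_old strays further than `drop_band` from 1 are
    excluded from the average, limiting the growth of the surrogate's variance
    that importance sampling of stale data can cause.
    """
    ratio = torch.exp(logp_new - logp_old)
    keep = (torch.abs(ratio - 1.0) <= drop_band).float().detach()    # dropout mask
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    per_sample = torch.min(unclipped, clipped)
    return -(keep * per_sample).sum() / keep.sum().clamp(min=1.0)     # loss to minimise

# Toy batch: the loss depends only on samples whose ratio stays near 1.
logp_old = torch.zeros(6)
logp_new = torch.tensor([0.05, -0.1, 0.8, -0.9, 0.2, 0.0], requires_grad=True)
adv = torch.tensor([1.0, -0.5, 2.0, 1.5, 0.3, -1.0])
loss = dppo_surrogate(logp_new, logp_old, adv)
loss.backward()
print(float(loss), logp_new.grad)
```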
Stability and Generalization of the Decentralized Stochastic Gradient Descent Ascent Algorithm
paper_authors: Miaoxi Zhu, Li Shen, Bo Du, Dacheng Tao
for: investigate the primal-dual generalization bound of the decentralized stochastic gradient descent ascent (D-SGDA) algorithm and analyze the impact of different topologies on the generalization bound.
methods: use the approach of algorithmic stability under both convex-concave and nonconvex-nonconcave settings to refine the algorithmic stability of D-SGDA and demonstrate that the decentralized structure does not destroy the stability and generalization of D-SGDA.
results: obtain the optimal population risk of D-SGDA in the convex-concave setting by balancing the optimization error with the generalization gap, and validate the theoretical findings through several numerical experiments.
results: 在 convex-concave Setting下获得 D-SGDA 的优化人口风险,并通过数据分析 validate 理论发现。Abstract
The growing size of available data has attracted increasing interest in solving minimax problems in a decentralized manner for various machine learning tasks. Previous theoretical research has primarily focused on the convergence rate and communication complexity of decentralized minimax algorithms, with little attention given to their generalization. In this paper, we investigate the primal-dual generalization bound of the decentralized stochastic gradient descent ascent (D-SGDA) algorithm using the approach of algorithmic stability under both convex-concave and nonconvex-nonconcave settings. Our theory refines the algorithmic stability in a decentralized manner and demonstrates that the decentralized structure does not destroy the stability and generalization of D-SGDA, implying that it can generalize as well as the vanilla SGDA in certain situations. Our results analyze the impact of different topologies on the generalization bound of the D-SGDA algorithm beyond trivial factors such as sample sizes, learning rates, and iterations. We also evaluate the optimization error and balance it with the generalization gap to obtain the optimal population risk of D-SGDA in the convex-concave setting. Additionally, we perform several numerical experiments which validate our theoretical findings.
摘要
“由于可用数据规模的增长,以分散化方式求解极小极大问题在各类机器学习任务中引起了越来越多的关注。过去的理论研究主要集中在分散化极小极大算法的收敛速率和通信复杂度上,几乎没有关注其泛化性能。在这篇文章中,我们使用算法稳定性的方法,在凸-凹与非凸-非凹两种设定下研究了分散化随机梯度下降上升(D-SGDA)算法的原始-对偶泛化界。我们的理论表明,分散化结构不会破坏D-SGDA的稳定性和泛化性能,这意味着它在某些情况下可以与标准的SGDA具有相同的泛化能力。我们的结果还分析了不同拓扑结构对D-SGDA泛化界的影响,而不仅限于样本规模、学习率和迭代次数等平凡因素。我们还评估了优化误差,并在凸-凹设定下将其与泛化差距相平衡,以得到D-SGDA的最优总体风险。此外,我们还进行了若干数值实验来验证理论结果。”
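As a companion to the description above, the following is a minimal NumPy sketch of decentralized SGDA on a toy convex-concave saddle problem: each node holds a local payoff, performs a local descent step in x and ascent step in y, then gossip-averages its iterates with its neighbours on a ring. The topology, step sizes, and local objectives are illustrative assumptions.

```python
# Toy D-SGDA: f_i(x, y) = 0.5*||x||^2 + x^T A_i y - 0.5*||y||^2 per node i,
# with ring-topology gossip averaging after each local update.
import numpy as np

rng = np.random.default_rng(0)
n, d, lr, steps = 8, 5, 0.05, 2000
A = rng.normal(size=(n, d, d))                  # node-local coupling matrices
x = rng.normal(size=(n, d))
y = rng.normal(size=(n, d))

# doubly stochastic mixing matrix for a ring: average with left/right neighbours
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

for _ in range(steps):
    gx = x + np.einsum('nij,nj->ni', A, y)      # grad_x f_i = x + A_i y
    gy = np.einsum('nji,nj->ni', A, x) - y      # grad_y f_i = A_i^T x - y
    x = W @ (x - lr * gx)                       # descent on x, then gossip
    y = W @ (y + lr * gy)                       # ascent on y, then gossip

print("distance of averaged iterates to the saddle point:",
      np.linalg.norm(x.mean(0)), np.linalg.norm(y.mean(0)))
```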
Distil the informative essence of loop detector data set: Is network-level traffic forecasting hungry for more data?
paper_authors: Guopeng Li, Victor L. Knoop, J. W. C. van Lint
for: This study examines how to assess the true information content of large loop detector datasets in order to improve the accuracy and reliability of network-level traffic forecasting.
methods: An uncertainty-aware framework is proposed that combines traffic flow theory with graph neural networks and uses evidential learning to quantify different sources of uncertainty in a single pass.
results: Results show that more than 80% of the data during daytime can be removed, and the remaining 20% of samples have equal prediction power for training models. This suggests that large traffic datasets can be subdivided into significantly smaller but equally informative datasets.Abstract
Network-level traffic condition forecasting has been intensively studied for decades. Although prediction accuracy has been continuously improved with emerging deep learning models and ever-expanding traffic data, traffic forecasting still faces many challenges in practice. These challenges include the robustness of data-driven models, the inherent unpredictability of traffic dynamics, and whether further improvement of traffic forecasting requires more sensor data. In this paper, we focus on this latter question and particularly on data from loop detectors. To answer this, we propose an uncertainty-aware traffic forecasting framework to explore how many samples of loop data are truly effective for training forecasting models. Firstly, the model design combines traffic flow theory with graph neural networks, ensuring the robustness of prediction and uncertainty quantification. Secondly, evidential learning is employed to quantify different sources of uncertainty in a single pass. The estimated uncertainty is used to "distil" the essence of the dataset that sufficiently covers the information content. Results from a case study of a highway network around Amsterdam show that, from 2018 to 2021, more than 80\% of the data during daytime can be removed. The remaining 20\% samples have equal prediction power for training models. This result suggests that indeed large traffic datasets can be subdivided into significantly smaller but equally informative datasets. From these findings, we conclude that the proposed methodology proves valuable in evaluating large traffic datasets' true information content. Further extensions, such as extracting smaller, spatially non-redundant datasets, are possible with this method.
摘要
paper_authors: Adam Dejl, Hamed Ayoobi, Matthew Williams, Francesca Toni
for: The study proposes a new feature attribution method, CAFE (Conflict-Aware Feature-wise Explanations), for explaining the outputs of neural network models.
methods: The method addresses three limitations of existing feature attribution methods: their disregard for conflicts between input features, their lack of consideration for the influence of bias terms, and their excessive sensitivity to local variations in the underlying activation functions. CAFE provides safeguards against overestimating the effects of input features and separately traces the positive and negative influences of inputs and biases, improving robustness and reliability.
results: Experiments show that CAFE better identifies conflicting features on synthetic tabular data, exhibits the best overall fidelity on real-world tabular datasets, and is highly computationally efficient.Abstract
Feature attribution methods are widely used to explain neural models by determining the influence of individual input features on the models' outputs. We propose a novel feature attribution method, CAFE (Conflict-Aware Feature-wise Explanations), that addresses three limitations of the existing methods: their disregard for the impact of conflicting features, their lack of consideration for the influence of bias terms, and an overly high sensitivity to local variations in the underpinning activation functions. Unlike other methods, CAFE provides safeguards against overestimating the effects of neuron inputs and separately traces positive and negative influences of input features and biases, resulting in enhanced robustness and increased ability to surface feature conflicts. We show experimentally that CAFE is better able to identify conflicting features on synthetic tabular data and exhibits the best overall fidelity on several real-world tabular datasets, while being highly computationally efficient.
摘要
特征归因方法被广泛用于解释神经网络模型,其通过确定各个输入特征对模型输出的影响来实现。我们提出了一种新的特征归因方法CAFE(冲突感知的特征级解释),它解决了现有方法的三大局限:忽略冲突特征的影响、不考虑偏置项的影响,以及对底层激活函数局部变化的过度敏感。与其他方法不同,CAFE提供了防止过度估计神经元输入影响的保障,并分别追踪输入特征和偏置项的正向和负向影响,从而提升了鲁棒性,并更能揭示特征冲突。实验表明,CAFE在合成表格数据上能够更好地识别冲突特征,在多个真实世界表格数据集上表现出最佳的整体保真度,同时计算效率很高。
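The core idea above, tracking positive and negative influences separately so that conflicting contributions do not cancel silently, can be illustrated with a small sketch. The propagation rule below is a simplified illustration under my own assumptions, not CAFE's exact attribution rules.

```python
# Sketch: propagate positive/negative contributions of features and biases separately
# through a tiny ReLU network; their sum still reproduces the ordinary forward pass.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=1)
x = np.array([0.5, -1.0, 2.0])

def split_linear(pos, neg, W, b):
    """Propagate positive/negative contribution vectors through y = W z + b."""
    Wp, Wn = np.clip(W, 0, None), np.clip(W, None, 0)
    pos_out = Wp @ pos + Wn @ neg + np.clip(b, 0, None)
    neg_out = Wp @ neg + Wn @ pos + np.clip(b, None, 0)
    return pos_out, neg_out

pos, neg = np.clip(x, 0, None), np.clip(x, None, 0)
p1, n1 = split_linear(pos, neg, W1, b1)
mask = ((W1 @ x + b1) > 0).astype(float)        # ReLU gate from the actual forward pass
p1, n1 = p1 * mask, n1 * mask
p2, n2 = split_linear(p1, n1, W2, b2)
print("positive / negative contributions to the output:", p2, n2)
print("their sum matches the forward pass:", p2 + n2,
      W2 @ np.clip(W1 @ x + b1, 0, None) + b2)
```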
Verification of Neural Networks Local Differential Classification Privacy
results: By training only 7% of the networks, the approach predicts an abstract network that attains 93% verification accuracy and reduces analysis time by a factor of $1.7\cdot10^4$.
for: The paper is focused on protecting the privacy of individuals in neural network training.
methods: The proposed method uses distribution prediction and fine-grained verification to ensure local differential classification privacy (LDCP).
results: The proposed method, called Sphynx, can accurately predict an abstract network from a small set of training networks, and verify LDCP with high probability and low analysis time.Abstract
Neural networks are susceptible to privacy attacks. To date, no verifier can reason about the privacy of individuals participating in the training set. We propose a new privacy property, called local differential classification privacy (LDCP), extending local robustness to a differential privacy setting suitable for black-box classifiers. Given a neighborhood of inputs, a classifier is LDCP if it classifies all inputs the same regardless of whether it is trained with the full dataset or whether any single entry is omitted. A naive algorithm is highly impractical because it involves training a very large number of networks and verifying local robustness of the given neighborhood separately for every network. We propose Sphynx, an algorithm that computes an abstraction of all networks, with a high probability, from a small set of networks, and verifies LDCP directly on the abstract network. The challenge is twofold: network parameters do not adhere to a known distribution probability, making it difficult to predict an abstraction, and predicting too large abstraction harms the verification. Our key idea is to transform the parameters into a distribution given by KDE, allowing to keep the over-approximation error small. To verify LDCP, we extend a MILP verifier to analyze an abstract network. Experimental results show that by training only 7% of the networks, Sphynx predicts an abstract network obtaining 93% verification accuracy and reducing the analysis time by $1.7\cdot10^4$x.
摘要
神经网络容易受到隐私攻击。至今,还没有任何验证器能够推理训练集中个人的隐私。我们提出了一种新的隐私属性,称为 local differential classification privacy (LDCP),将局部鲁棒性扩展到适用于黑盒分类器的差分隐私设定。给定一个输入邻域,如果一个分类器无论使用完整数据集训练、还是省略其中任意一条记录训练,都对该邻域内的所有输入给出相同的分类结果,那么它就满足LDCP。朴素的算法非常不实际,因为它需要训练数量极大的网络,并对每个网络分别验证该邻域的局部鲁棒性。我们提出了一种名为Sphynx的算法,它能以高概率从少量网络中计算出一个抽象网络,并直接在抽象网络上验证LDCP。挑战是双重的:网络参数不服从已知的概率分布,使得预测抽象困难;而预测出过大的抽象又会损害验证。我们的关键想法是借助KDE将参数转换为一个分布,从而使过近似误差保持较小。为了验证LDCP,我们扩展了一个MILP验证器来分析抽象网络。实验结果表明,仅训练7%的网络,Sphynx即可预测出一个抽象网络,获得93%的验证精度,并将分析时间缩短$1.7\cdot10^4$倍。
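The KDE-based abstraction step described above can be sketched as follows: fit a kernel density estimate per parameter from a small set of trained networks and turn it into interval weights. Using per-parameter intervals and the 1%/99% quantiles is an illustrative choice, not Sphynx's exact construction, and the MILP verification step is omitted.

```python
# Sketch: build an interval "abstract network" from a few trained networks via KDE.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
n_networks, n_params = 7, 20                  # e.g. a small fraction of the networks, flattened weights
params = rng.normal(loc=0.3, scale=0.1, size=(n_networks, n_params))

lower, upper = np.empty(n_params), np.empty(n_params)
for j in range(n_params):
    kde = gaussian_kde(params[:, j])          # smooth the few observed values
    samples = kde.resample(5000, seed=1)[0]   # draw from the fitted density
    lower[j], upper[j] = np.quantile(samples, [0.01, 0.99])

print("abstract (interval) weights, first 3:",
      list(zip(np.round(lower[:3], 3), np.round(upper[:3], 3))))
```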
Accelerating Generalized Linear Models by Trading off Computation for Uncertainty
results: The proposed family of iterative methods substantially accelerates GLM training by explicitly trading off reduced computation for increased uncertainty; it enables efficient inference on large datasets and propagates the resulting approximation error into the prediction uncertainty reported to the user.Abstract
Bayesian Generalized Linear Models (GLMs) define a flexible probabilistic framework to model categorical, ordinal and continuous data, and are widely used in practice. However, exact inference in GLMs is prohibitively expensive for large datasets, thus requiring approximations in practice. The resulting approximation error adversely impacts the reliability of the model and is not accounted for in the uncertainty of the prediction. In this work, we introduce a family of iterative methods that explicitly model this error. They are uniquely suited to parallel modern computing hardware, efficiently recycle computations, and compress information to reduce both the time and memory requirements for GLMs. As we demonstrate on a realistically large classification problem, our method significantly accelerates training by explicitly trading off reduced computation for increased uncertainty.
摘要
bayesian总体线性模型(GLMs)提供一种灵活的概率框架,用于模型 categorical、ordinal和连续数据,广泛应用于实践中。然而, exact inference在GLMs中是不可能进行大规模数据处理的,因此需要使用 aproximations。这些 aproximations 会导致模型的可靠性受到影响,并且不会考虑模型预测中的uncertainty。在这种情况下,我们介绍了一家Iterator的方法,可以显式地模型这种错误。这些方法特别适合并行现代计算硬件,高效地重复计算,并将信息压缩到减少时间和内存需求。我们在一个实际上是一个大规模分类问题中进行了示例,并示出了我们的方法可以明显加速训练,通过交换减少计算和增加uncertainty来进行明确的贸易OFF。
Advancing Bayesian Optimization via Learning Correlated Latent Space
results: By introducing Lipschitz regularization, loss weighting, and trust region recoordination, the method reduces the inherent gap and achieves high performance on discrete optimization tasks within a small budget.Abstract
Bayesian optimization is a powerful method for optimizing black-box functions with limited function evaluations. Recent works have shown that optimization in a latent space through deep generative models such as variational autoencoders leads to effective and efficient Bayesian optimization for structured or discrete data. However, as the optimization does not take place in the input space, it leads to an inherent gap that results in potentially suboptimal solutions. To alleviate the discrepancy, we propose Correlated latent space Bayesian Optimization (CoBO), which focuses on learning correlated latent spaces characterized by a strong correlation between the distances in the latent space and the distances within the objective function. Specifically, our method introduces Lipschitz regularization, loss weighting, and trust region recoordination to minimize the inherent gap around the promising areas. We demonstrate the effectiveness of our approach on several optimization tasks in discrete data, such as molecule design and arithmetic expression fitting, and achieve high performance within a small budget.
摘要
bayesian优化是一种强大的方法,用于优化黑盒函数,具有有限的函数评估次数。最近的研究表明,通过深度生成模型,如变分自动编码器,在嵌入空间进行优化可以获得有效和高效的bayesian优化。然而,由于优化不发生在输入空间,这会导致一定的差距,从而可能导致不优化的解。为了缓解这个差距,我们提出相关的嵌入空间抽象 bayesian优化方法(CoBO),它关注学习相关的嵌入空间,其中嵌入空间中距离与目标函数中距离之间具有强相关性。我们的方法引入了 lipschitz 规范、损失权重和信任区域重新坐标,以降低差距的困扰。我们在多个数据嵌入中进行了评估,包括分子设计和数学表达适应,并在有限资源下达到了高性能。
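A Lipschitz-style regularizer in the spirit described above can be sketched as a pairwise hinge penalty that encourages distances in the latent space to track differences in the objective. The hinge form and the constant `L` are illustrative assumptions, not CoBO's exact loss.

```python
# Sketch: penalize latent-code pairs whose objective gap exceeds L times their latent distance.
import numpy as np

def lipschitz_penalty(z, f, L=1.0):
    """z: (n, d) latent codes, f: (n,) objective values at those codes."""
    dz = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)   # pairwise latent distances
    df = np.abs(f[:, None] - f[None, :])                          # pairwise objective gaps
    violation = np.maximum(df - L * dz, 0.0)                      # hinge on the Lipschitz bound
    return violation.mean()

rng = np.random.default_rng(0)
z = rng.normal(size=(32, 8))
f = np.sin(z[:, 0]) + 0.1 * rng.normal(size=32)
print("penalty to add to the encoder/surrogate training loss:", lipschitz_penalty(z, f))
```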
STDA-Meta: A Meta-Learning Framework for Few-Shot Traffic Prediction
results: Experimental results show that, compared to baseline models, our model improves prediction accuracy by 7% on the two metrics of MAE and RMSE.Abstract
With the development of cities, traffic congestion has become an increasingly pressing issue, and traffic prediction is a classic method to relieve it. Traffic prediction is one specific application of spatio-temporal prediction learning, like taxi scheduling, weather prediction, and ship trajectory prediction. Classical spatio-temporal prediction learning methods for these problems, including deep learning, require large amounts of training data. In reality, newly developed cities with insufficient sensors do not satisfy this assumption, and data scarcity degrades predictive performance. In this situation, learning from insufficient data is known as few-shot learning (FSL), and FSL for traffic prediction remains challenging. On the one hand, the irregularity and dynamic nature of graph structures limit the performance of spatio-temporal learning methods. On the other hand, conventional domain adaptation methods do not work well with insufficient training data when transferring knowledge from different domains to the intended target domain. To address these challenges, we propose a novel spatio-temporal domain adaptation (STDA) method that learns transferable spatio-temporal meta-knowledge from data-sufficient cities in an adversarial manner. This learned meta-knowledge can improve the prediction performance of data-scarce cities. Specifically, we train the STDA model using a Model-Agnostic Meta-Learning (MAML) based episode learning process, which is a model-agnostic meta-learning framework that enables the model to solve new learning tasks using only a small number of training samples. We conduct numerous experiments on four traffic prediction datasets, and our results show that the prediction performance of our model has improved by 7\% compared to baseline models on the two metrics of MAE and RMSE.
摘要
随着城市的发展,交通堵塞问题日益突出,交通预测成为解决这一问题的一种经典方法。交通预测是空间时间预测学习的一个特定应用,与出租车预测、天气预测和船舶轨迹预测等问题相关。然而,传统的空间时间预测学习方法,包括深度学习,需要大量的训练数据。在实际情况下,一些新发展的城市可能缺乏感知器,这会导致预测性能下降。在这种情况下,少量数据学习(Few-shot Learning,FSL)成为了一个挑战。一方面,城市网络结构的不规则性和时间动态性使得空间时间预测方法的性能受到限制。另一方面,传统的领域适应方法在缺乏训练数据情况下不能很好地传递知识到目标领域。为解决这些挑战,我们提出了一种新的空间时间预测适应(STDA)方法,该方法通过在数据充沛城市中学习转移性空间时间元知识的方式,来提高数据缺乏城市的预测性能。具体来说,我们使用基于Model-Agnostic Meta-Learning(MAML)的 episodic learning 进程来训练 STDA 模型,该进程使得模型可以通过少量训练样本来解决新的学习任务。我们在四个交通预测数据集上进行了多次实验,结果显示,我们的模型在 MAE 和 RMSE 两个指标上提高了7%。
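The MAML-based episode learning mentioned above can be sketched on toy 1-D regression tasks, using the first-order approximation so no second derivatives are needed. Each task stands in for a "city" with its own dynamics; task construction, step sizes, and the first-order shortcut are illustrative simplifications of the STDA-Meta training procedure.

```python
# Sketch: first-order MAML episode learning on toy linear-regression tasks.
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)                               # meta-parameters [w, b]
inner_lr, meta_lr = 0.1, 0.05

def task_batch(slope, n=10):
    x = rng.uniform(-2, 2, n)
    return x, slope * x + 0.5 + 0.1 * rng.normal(size=n)

def grad(params, x, y):                           # gradient of mean squared error
    w, b = params
    err = w * x + b - y
    return np.array([np.mean(2 * err * x), np.mean(2 * err)])

for _ in range(500):                              # meta-training episodes
    meta_grad = np.zeros(2)
    for slope in rng.uniform(0.5, 2.0, size=4):   # sample a few data-sufficient source tasks
        xs, ys = task_batch(slope)                # support set: inner adaptation
        adapted = theta - inner_lr * grad(theta, xs, ys)
        xq, yq = task_batch(slope)                # query set: evaluate adapted parameters
        meta_grad += grad(adapted, xq, yq)        # first-order MAML meta-gradient
    theta -= meta_lr * meta_grad / 4

xs, ys = task_batch(1.3, n=5)                     # few-shot adaptation to a new, data-scarce task
adapted = theta - inner_lr * grad(theta, xs, ys)
print("meta-init:", theta.round(2), "adapted:", adapted.round(2))
```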
Calibration by Distribution Matching: Trainable Kernel Calibration Metrics
results: 试验表明,通过使用这些指标作为正则化项,可以提高预测的准确性、锐度和决策质量,并且超过了仅仅采用后期重新准确化的方法。Abstract
Calibration ensures that probabilistic forecasts meaningfully capture uncertainty by requiring that predicted probabilities align with empirical frequencies. However, many existing calibration methods are specialized for post-hoc recalibration, which can worsen the sharpness of forecasts. Drawing on the insight that calibration can be viewed as a distribution matching task, we introduce kernel-based calibration metrics that unify and generalize popular forms of calibration for both classification and regression. These metrics admit differentiable sample estimates, making it easy to incorporate a calibration objective into empirical risk minimization. Furthermore, we provide intuitive mechanisms to tailor calibration metrics to a decision task, and enforce accurate loss estimation and no regret decisions. Our empirical evaluation demonstrates that employing these metrics as regularizers enhances calibration, sharpness, and decision-making across a range of regression and classification tasks, outperforming methods relying solely on post-hoc recalibration.
摘要
校准确保概率预测能够有意义地刻画不确定性,要求预测概率与经验频率相一致。然而,许多现有的校准方法专注于事后重新校准,可能会损害预测的锐度。基于校准可以被视为分布匹配任务这一想法,我们引入基于核函数的校准指标,统一并推广了分类和回归中常用的多种校准形式。这些指标允许可微的样本估计,因而很容易把校准目标纳入经验风险最小化。此外,我们提供了直观的机制,使校准指标可以针对具体决策任务进行定制,并保证准确的损失估计和无悔决策。我们的实验评估表明,将这些指标作为正则项可以在广泛的分类和回归任务上提升校准、锐度和决策质量,超过仅依靠事后重新校准的方法。
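The distribution-matching view above can be illustrated with an RBF-kernel MMD between samples from the predictive distribution and the observed labels, used as a differentiable calibration penalty. Matching the marginal label distribution with a fixed bandwidth is a simplification under my own assumptions, not the paper's exact trainable metrics.

```python
# Sketch: kernel (MMD^2) calibration penalty; an overconfident predictive distribution scores worse.
import numpy as np

def rbf(a, b, bandwidth=1.0):
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * bandwidth ** 2))

def mmd2(pred_samples, labels, bandwidth=1.0):
    kxx = rbf(pred_samples, pred_samples, bandwidth)
    kyy = rbf(labels, labels, bandwidth)
    kxy = rbf(pred_samples, labels, bandwidth)
    return kxx.mean() + kyy.mean() - 2 * kxy.mean()

rng = np.random.default_rng(0)
labels = rng.normal(loc=0.0, scale=1.0, size=500)
calibrated = rng.normal(loc=0.0, scale=1.0, size=500)      # well-calibrated predictive samples
overconfident = rng.normal(loc=0.0, scale=0.3, size=500)   # too-narrow predictive distribution
print("MMD^2 calibrated:   ", mmd2(calibrated, labels))
print("MMD^2 overconfident:", mmd2(overconfident, labels))
```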
Network Contention-Aware Cluster Scheduling with Reinforcement Learning
results: Compared with widely used scheduling policies, the proposed approach reduces average job completion time by up to 18.2% and cuts tail job completion time by up to 20.7%, while allowing a preferable trade-off between average job completion time and resource utilization.Abstract
With continuous advances in deep learning, distributed training is becoming common in GPU clusters. Specifically, for emerging workloads with diverse amounts, ratios, and patterns of communication, we observe that network contention can significantly degrade training throughput. However, widely used scheduling policies often face limitations as they are agnostic to network contention between jobs. In this paper, we present a new approach to mitigate network contention in GPU clusters using reinforcement learning. We formulate GPU cluster scheduling as a reinforcement learning problem and opt to learn a network contention-aware scheduling policy that efficiently captures contention sensitivities and dynamically adapts scheduling decisions through continuous evaluation and improvement. We show that compared to widely used scheduling policies, our approach reduces average job completion time by up to 18.2\% and effectively cuts the tail job completion time by up to 20.7\% while allowing a preferable trade-off between average job completion time and resource utilization.
摘要
随着深度学习的不断发展,分布式训练在GPU集群中变得越来越普遍。特别是在新趋势上,我们发现网络竞争可以很大程度地降低训练速率。然而,广泛使用的调度策略经常面临限制,因为它们对GPU集群中的网络竞争不感知。在这篇论文中,我们提出了一种使用强化学习来减轻GPU集群中的网络竞争的方法。我们将GPU集群调度问题定义为一个强化学习问题,并选择学习一个感知网络竞争的调度策略,以高效地捕捉竞争敏感度并通过不断评估和改进来 dynamically adapt 调度决策。我们表明,比起广泛使用的调度策略,我们的方法可以降低平均任务完成时间,最多降低18.2%,同时有一个可接受的资源利用率。
Importance Estimation with Random Gradient for Neural Network Pruning
results: Compared with existing methods, our approach performs better on ResNet and VGG architectures on the CIFAR-100 and STL-10 datasets, and it complements existing methods and improves their performance when combined with them.Abstract
Global Neuron Importance Estimation is used to prune neural networks for efficiency reasons. To determine the global importance of each neuron or convolutional kernel, most of the existing methods either use activation or gradient information or both, which demands abundant labelled examples. In this work, we use heuristics to derive importance estimation similar to Taylor First Order (TaylorFO) approximation based methods. We name our methods TaylorFO-abs and TaylorFO-sq. We propose two additional methods to improve these importance estimation methods. Firstly, we propagate random gradients from the last layer of a network, thus avoiding the need for labelled examples. Secondly, we normalize the gradient magnitude of the last layer output before propagating, which allows all examples to contribute similarly to the importance score. Our methods with additional techniques perform better than previous methods when tested on ResNet and VGG architectures on CIFAR-100 and STL-10 datasets. Furthermore, our method also complements the existing methods and improves their performances when combined with them.
摘要
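The label-free scoring idea above can be sketched in PyTorch: propagate a random, magnitude-normalized gradient from the last layer and score each hidden neuron by a TaylorFO-style |activation x gradient| product. The toy network, single random batch, and exact normalization are illustrative assumptions rather than the paper's full procedure.

```python
# Sketch: neuron importance from random last-layer gradients (no labels needed).
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 5))
x = torch.randn(64, 10)

acts = {}
def keep_activation(module, inp, out):
    out.retain_grad()                        # keep the gradient of this intermediate activation
    acts["hidden"] = out
model[1].register_forward_hook(keep_activation)

out = model(x)
g = torch.randn_like(out)
g = g / g.norm(dim=1, keepdim=True)          # normalize the random last-layer gradient
out.backward(gradient=g)                     # backprop without any labels

h = acts["hidden"]
importance = (h * h.grad).abs().mean(dim=0)  # TaylorFO-abs style per-neuron score
prune_idx = torch.argsort(importance)[:8]    # e.g. mark the 8 least important neurons
print("least important hidden neurons:", prune_idx.tolist())
```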
FedRec+: Enhancing Privacy and Addressing Heterogeneity in Federated Recommendation Systems
results: Experimental results demonstrate the state-of-the-art performance of FedRec+ across various reference datasets.Abstract
Preserving privacy and reducing communication costs for edge users pose significant challenges in recommendation systems. Although federated learning has proven effective in protecting privacy by avoiding data exchange between clients and servers, it has been shown that the server can infer user ratings based on updated non-zero gradients obtained from two consecutive rounds of user-uploaded gradients. Moreover, federated recommendation systems (FRS) face the challenge of heterogeneity, leading to decreased recommendation performance. In this paper, we propose FedRec+, an ensemble framework for FRS that enhances privacy while addressing the heterogeneity challenge. FedRec+ employs optimal subset selection based on feature similarity to generate near-optimal virtual ratings for pseudo items, utilizing only the user's local information. This approach reduces noise without incurring additional communication costs. Furthermore, we utilize the Wasserstein distance to estimate the heterogeneity and contribution of each client, and derive optimal aggregation weights by solving a defined optimization problem. Experimental results demonstrate the state-of-the-art performance of FedRec+ across various reference datasets.
摘要
保护用户隐私和降低边缘用户的通信成本是推荐系统面临的主要挑战。虽然联邦学习通过避免客户端与服务器之间的数据交换已被证明可以保护隐私,但服务器仍可以根据用户在连续两轮上传的非零梯度推断出用户评分。此外,联邦推荐系统(FRS)还面临异质性挑战,导致推荐性能下降。在这篇论文中,我们提出了FedRec+,一种用于FRS的集成框架,可以在增强隐私的同时解决异质性挑战。FedRec+基于特征相似性进行最优子集选择,仅利用用户的本地信息为伪物品生成近似最优的虚拟评分。这种方法可以在不增加额外通信成本的情况下减少噪声。此外,我们利用Wasserstein距离来估计每个客户端的异质性和贡献,并通过求解定义的优化问题得到最优聚合权重。实验结果表明,FedRec+在多个参考数据集上达到了最先进的性能。
Compact Binary Systems Waveform Generation with Generative Pre-trained Transformer
for: 这项研究旨在解决空间 gravitational wave 探测中的数据处理困难,具体来说是预测 Compact Binary Systems 波形的准确性。
methods: 该研究使用了一种可解释的大型预训练模型,名为 CBS-GPT,来预测 Compact Binary Systems 波形。
results: 该模型在预测 Massive Black Hole Binary、Extreme Mass-Ratio Inspirals 和 Galactic Binary 波形方面达到了98%、91%和99%的准确率。 CBS-GPT 模型具有可读性特点,其隐藏参数能够有效捕捉波形中的复杂信息,包括仪器响应和广泛的参数范围。Abstract
Space-based gravitational wave detection is one of the most anticipated gravitational wave (GW) detection projects in the next decade, which will detect abundant compact binary systems. However, the precise prediction of space GW waveforms remains unexplored. To solve the data processing difficulty in the increasing waveform complexity caused by detectors' response and second-generation time-delay interferometry (TDI 2.0), an interpretable pre-trained large model named CBS-GPT (Compact Binary Systems Waveform Generation with Generative Pre-trained Transformer) is proposed. For compact binary system waveforms, three models were trained to predict the waveforms of massive black hole binary (MBHB), extreme mass-ratio inspirals (EMRIs), and galactic binary (GB), achieving prediction accuracies of 98%, 91%, and 99%, respectively. The CBS-GPT model exhibits notable interpretability, with its hidden parameters effectively capturing the intricate information of waveforms, even with complex instrument response and a wide parameter range. Our research demonstrates the potential of large pre-trained models in gravitational wave data processing, opening up new opportunities for future tasks such as gap completion, GW signal detection, and signal noise reduction.
摘要
空间基于 gravitational wave 探测是下一个辉煌的 gravitational wave(GW)探测项目,可探测丰富的紧凑 binary system。然而,准确预测空间 GW 波形仍然未经探索。为解决探测器响应和第二代时间延迟相互探测(TDI 2.0)中增加的波形复杂性,一种可解释的大型预训练模型被提议,称为 CBS-GPT(紧凑 binary system waveform generation with generative pre-trained transformer)。对紧凑 binary system waveform,三个模型被训练以预测大黑洞 binary(MBHB)、极高质量比例减少(EMRIs)和 galactic binary(GB)的波形,实现预测精度分别为98%、91%和99%。CBS-GPT 模型具有明显的可解释性,其隐藏参数能够有效捕捉波形中的复杂信息,即使检测器响应和参数范围宽泛。我们的研究表明大型预训练模型在 gravitational wave 数据处理中具有潜在的应用前景,开启新的未来任务,如 gap completion、GW 信号检测和信号噪声减少。
Understanding and Visualizing Droplet Distributions in Simulations of Shallow Clouds
paper_authors: Justus C. Will, Andrea M. Jenney, Kara D. Lamb, Michael S. Pritchard, Colleen Kaul, Po-Lun Ma, Kyle Pressel, Jacob Shpund, Marcus van Lier-Walqui, Stephan Mandt
results: 研究人员使用变分自适应器(VAEs)生成了新的、直观的视觉化方法,可以更好地理解液滴大小的分布和时间发展。这些结果提高了解释,并允许我们对不同气尘含量的 simulations 进行比较,探讨气尘-云交互的过程。Abstract
Thorough analysis of local droplet-level interactions is crucial to better understand the microphysical processes in clouds and their effect on the global climate. High-accuracy simulations of relevant droplet size distributions from Large Eddy Simulations (LES) of bin microphysics challenge current analysis techniques due to their high dimensionality involving three spatial dimensions, time, and a continuous range of droplet sizes. Utilizing the compact latent representations from Variational Autoencoders (VAEs), we produce novel and intuitive visualizations for the organization of droplet sizes and their evolution over time beyond what is possible with clustering techniques. This greatly improves interpretation and allows us to examine aerosol-cloud interactions by contrasting simulations with different aerosol concentrations. We find that the evolution of the droplet spectrum is similar across aerosol levels but occurs at different paces. This similarity suggests that precipitation initiation processes are alike despite variations in onset times.
摘要
对局地液滴尺度相互作用的全面分析,是更好地理解云中微物理过程及其对全球气候影响的关键。来自分档微物理大涡模拟(LES)的高精度液滴大小分布涉及三维空间、时间以及连续的液滴尺寸范围,其高维性对当前的分析技术提出了挑战。利用变分自编码器(VAEs)的紧凑隐表示,我们生成了超越聚类技术所能实现的、新颖而直观的可视化,用于刻画液滴尺寸的组织及其随时间的演化。这大大改善了解释,并允许我们通过对比不同气溶胶浓度的模拟来考察气溶胶-云相互作用。我们发现,液滴谱的演化在不同气溶胶水平下是相似的,只是速度不同。这种相似性表明,尽管起始时间有所差异,降水的触发过程是类似的。
Efficient Robust Bayesian Optimization for Arbitrary Uncertain inputs
paper_authors: Lin Yang, Junlong Lyu, Wenlong Lyu, Zhitang Chen
for: The study aims to develop a robust Bayesian Optimization algorithm that can handle arbitrary uncertain inputs and reliably identify a robust optimum.
methods: The algorithm models uncertain inputs of arbitrary distributions directly by equipping the Gaussian Process with the Maximum Mean Discrepancy (MMD) and accelerates posterior inference via Nystrom approximation.
results: Experiments show that the method handles various input uncertainties and achieves state-of-the-art performance, and a rigorous theoretical regret bound is established.Abstract
Bayesian Optimization (BO) is a sample-efficient optimization algorithm widely employed across various applications. In some challenging BO tasks, input uncertainty arises due to the inevitable randomness in the optimization process, such as machining errors, execution noise, or contextual variability. This uncertainty deviates the input from the intended value before evaluation, resulting in significant performance fluctuations in the final result. In this paper, we introduce a novel robust Bayesian Optimization algorithm, AIRBO, which can effectively identify a robust optimum that performs consistently well under arbitrary input uncertainty. Our method directly models the uncertain inputs of arbitrary distributions by empowering the Gaussian Process with the Maximum Mean Discrepancy (MMD) and further accelerates the posterior inference via Nystrom approximation. Rigorous theoretical regret bound is established under MMD estimation error and extensive experiments on synthetic functions and real problems demonstrate that our approach can handle various input uncertainties and achieve state-of-the-art performance.
摘要
贝叶斯优化(BO)是一种样本效率高的优化算法,广泛应用于各种应用领域。在一些复杂的BO任务中,输入不确定性来自优化过程中不可避免的随机性,如机器加工误差、执行噪声或上下文变化。这种不确定性使输入在评估前偏离预期值,导致最终优化结果的表现出现明显波动。在这篇论文中,我们介绍了一种新的鲁棒贝叶斯优化算法AIRBO,可以有效地确定一个在任意输入不确定性下表现一致良好的鲁棒最优点。我们通过将高斯过程与最大平均差异(MMD)相结合来直接对任意分布的不确定输入建模,并通过Nystrom近似加速后验推理。我们在MMD估计误差下证明了严格的理论 regret bound,并在合成函数和实际问题上进行了广泛的实验,结果显示我们的方法可以处理多种输入不确定性,并达到最先进的性能。
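The modelling idea above can be sketched by representing each uncertain input as a set of samples, computing an MMD^2 between sample sets with an RBF base kernel, and turning it into a Gaussian-Process covariance k(P, Q) = exp(-MMD^2 / (2 l^2)). The Nystrom acceleration and acquisition optimization are omitted; bandwidths and length scale are illustrative.

```python
# Sketch: MMD-based kernel between uncertain inputs represented by sample sets.
import numpy as np

def rbf(a, b, bw=0.5):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bw ** 2))

def mmd2(xs, ys, bw=0.5):
    return rbf(xs, xs, bw).mean() + rbf(ys, ys, bw).mean() - 2 * rbf(xs, ys, bw).mean()

def input_distribution_kernel(xs, ys, length_scale=1.0):
    return np.exp(-mmd2(xs, ys) / (2 * length_scale ** 2))

rng = np.random.default_rng(0)
P = 1.0 + 0.1 * rng.normal(size=(200, 2))      # samples of one noisy query point
Q = 1.0 + 0.1 * rng.normal(size=(200, 2))      # another realization of the same uncertain input
R = -1.0 + 0.3 * rng.normal(size=(200, 2))     # a clearly different uncertain input
print("k(P,Q) close to 1:", input_distribution_kernel(P, Q))
print("k(P,R) smaller:   ", input_distribution_kernel(P, R))
```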
results: provide sharper guarantees that improve upon existing information-theoretic bounds in various learning scenarios.Abstract
We present new information-theoretic generalization guarantees through a novel construction of the "neighboring-hypothesis" matrix and a new family of stability notions termed sample-conditioned hypothesis (SCH) stability. Our approach yields sharper bounds that improve upon previous information-theoretic bounds in various learning scenarios. Notably, these bounds address the limitations of existing information-theoretic bounds in the context of stochastic convex optimization (SCO) problems, as explored in the recent work by Haghifam et al. (2023).
摘要
我们提出一新的信息理论基准,通过一个独特的“邻域假设”矩阵和一新的家族称为“样本控制假设”(SCH)稳定性。我们的方法可以提供更加锐利的上限,超越了现有信息理论上限在不同学习场景中。特别是,这些上限解决了现有信息理论上限在随机凸伸估计(SCO)问题中的限制,如某些最近的研究(Haghifam et al., 2023)所提出的问题。
Robust Learning for Smoothed Online Convex Optimization with Feedback Delay
results: RCL is proven to guarantee $(1+\lambda)$-competitiveness against any given expert for any $\lambda>0$, and it significantly improves average performance in the presence of multi-step switching costs and feedback delay.Abstract
We study a challenging form of Smoothed Online Convex Optimization, a.k.a. SOCO, including multi-step nonlinear switching costs and feedback delay. We propose a novel machine learning (ML) augmented online algorithm, Robustness-Constrained Learning (RCL), which combines untrusted ML predictions with a trusted expert online algorithm via constrained projection to robustify the ML prediction. Specifically, we prove that RCL is able to guarantee $(1+\lambda)$-competitiveness against any given expert for any $\lambda>0$, while also explicitly training the ML model in a robustification-aware manner to improve the average-case performance. Importantly, RCL is the first ML-augmented algorithm with a provable robustness guarantee in the case of multi-step switching cost and feedback delay. We demonstrate the improvement of RCL in both robustness and average performance using battery management for electrifying transportation as a case study.
摘要
我们研究一种具有多步非线性调整成本和循环延迟的缓和线上凸项估算(SOCO)。我们提出了一个新的机器学习(ML)加持的在网上算法,即强健性条件学习(RCL),它通过将不信任的ML预测与可靠的专家网上算法结合,并透过受限的投影来强化ML预测。我们证明了RCL可以保证$(1+\lambda)$-竞争性比任何 givent expert 的任何 $\lambda>0$,同时也可以透过专门的训练来提高均值性能。特别是,RCL 是具有调整-可靠性保证的第一个 ML 加持算法,在多步调整成本和循环延迟的情况下。我们使用电动车电池管理作为一个实际应用来说明 RCL 的改进。
Bridging the Gap Between Variational Inference and Wasserstein Gradient Flows
For: The paper bridges the gap between variational inference and Wasserstein gradient flows, demonstrating that the Bures-Wasserstein gradient flow can be recast as the Euclidean gradient flow, and offering an alternative perspective on the path-derivative gradient estimator.* Methods: The paper uses the path-derivative gradient estimator to generate the vector field of the gradient flow, and demonstrates how distillations can be extended to encompass $f$-divergences and non-Gaussian variational families.* Results: The paper offers a new gradient estimator for $f$-divergences, which is readily implementable using contemporary machine learning libraries like PyTorch or TensorFlow.
Variational inference is a technique that approximates a target distribution by optimizing within the parameter space of variational families. On the other hand, Wasserstein gradient flows describe optimization within the space of probability measures where they do not necessarily admit a parametric density function. In this paper, we bridge the gap between these two methods. We demonstrate that, under certain conditions, the Bures-Wasserstein gradient flow can be recast as the Euclidean gradient flow where its forward Euler scheme is the standard black-box variational inference algorithm. Specifically, the vector field of the gradient flow is generated via the path-derivative gradient estimator. We also offer an alternative perspective on the path-derivative gradient, framing it as a distillation procedure to the Wasserstein gradient flow. Distillations can be extended to encompass $f$-divergences and non-Gaussian variational families. This extension yields a new gradient estimator for $f$-divergences, readily implementable using contemporary machine learning libraries like PyTorch or TensorFlow.
摘要
变分推断是一种通过在变分族的参数空间内进行优化来近似目标分布的技术。而Wasserstein梯度流描述的是在概率测度空间中的优化,其中的分布不一定具有参数化的密度函数。在这篇论文中,我们将这两种方法联系起来。我们证明,在某些条件下,Bures-Wasserstein梯度流可以被改写为欧氏梯度流,其前向欧拉格式正是标准的黑盒变分推断算法。具体来说,梯度流的向量场是通过路径导数梯度估计器生成的。我们还为路径导数梯度提供了另一种视角,把它看作对Wasserstein梯度流的一种蒸馏过程。这种蒸馏可以推广到$f$-散度和非高斯变分族。该推广带来了一个新的$f$-散度梯度估计器,可以方便地使用PyTorch或TensorFlow等现代机器学习库实现。
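The path-derivative gradient estimator discussed above can be sketched in PyTorch for a diagonal Gaussian variational family: stop gradients through the variational parameters inside log q so that only the reparameterized sampling path contributes. The toy target density and sample size are illustrative.

```python
# Sketch: path-derivative (reverse-KL) gradient estimator for a diagonal Gaussian family.
import torch

mu = torch.zeros(2, requires_grad=True)
log_sigma = torch.zeros(2, requires_grad=True)

def log_target(z):                         # unnormalized toy target: N(1, 0.5^2) per dimension
    return -0.5 * ((z - 1.0) ** 2 / 0.25).sum(-1)

opt = torch.optim.Adam([mu, log_sigma], lr=0.05)
for _ in range(500):
    opt.zero_grad()
    eps = torch.randn(256, 2)
    z = mu + log_sigma.exp() * eps                         # reparameterized samples
    q = torch.distributions.Normal(mu.detach(), log_sigma.detach().exp())
    loss = (q.log_prob(z).sum(-1) - log_target(z)).mean()  # reverse KL, path-derivative form
    loss.backward()
    opt.step()

print("mean ->", mu.detach().numpy(), "std ->", log_sigma.exp().detach().numpy())
```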
paper_authors: Di Xu, Qihui Lyu, Dan Ruan, Ke Sheng
For: breast tissue differentiation* Methods: dual-energy computed tomography (DECT), deep learning (DL) methods, non-recursive setup, raw projection data* Results: high-fidelity decomposition of adipose, calcification, and fibroglandular materials with low RMSE, MAE, negative PSNR, and SSIM compared to ground truth, fast inference time (<1s) on a 4xRTX A6000 GPU cluster.Abstract
Dual-energy computed tomography (DECT) utilizes separate X-ray energy spectra to improve multi-material decomposition (MMD) for various diagnostic applications. However accurate decomposing more than two types of material remains challenging using conventional methods. Deep learning (DL) methods have shown promise to improve the MMD performance, but typical approaches of conducing DL-MMD in the image domain fail to fully utilize projection information or under iterative setup are computationally inefficient in both training and prediction. In this work, we present a clinical-applicable MMD (>2) framework rFast-MMDNet, operating with raw projection data in non-recursive setup, for breast tissue differentiation. rFast-MMDNet is a two-stage algorithm, including stage-one SinoNet to perform dual energy projection decomposition on tissue sinograms and stage-two FBP-DenoiseNet to perform domain adaptation and image post-processing. rFast-MMDNet was tested on a 2022 DL-Spectral-Challenge breast phantom dataset. The two stages of rFast-MMDNet were evaluated separately and then compared with four noniterative reference methods including a direct inversion method (AA-MMD), an image domain DL method (ID-UNet), AA-MMD/ID-UNet + DenoiseNet and a sinogram domain DL method (Triple-CBCT). Our results show that models trained from information stored in DE transmission domain can yield high-fidelity decomposition of the adipose, calcification, and fibroglandular materials with averaged RMSE, MAE, negative PSNR, and SSIM of 0.004+/-~0, 0.001+/-~0, -45.027+/-~0.542, and 0.002+/-~0 benchmarking to the ground truth, respectively. Training of entire rFast-MMDNet on a 4xRTX A6000 GPU cluster took a day with inference time <1s. All DL methods generally led to more accurate MMD than AA-MMD. rFast-MMDNet outperformed Triple-CBCT, but both are superior to the image-domain based methods.
摘要
dual-energy computed tomography (DECT) 利用不同的X射线能谱spectrum来提高多种材料分解(MMD)的精度,以满足各种诊断应用。然而,使用传统方法对多于两种材料的分解仍然是一个挑战。深度学习(DL)方法已经展示了改进MMD性能的投入,但通常在图像域中进行DL-MMD会不全利用投影信息,或者在迭代设置下 computationally inefficient。在这项工作中,我们提出了一种临床可用的MMD(>2)框架,称为rFast-MMDNet,可以在原始投影数据上进行非递归设置。rFast-MMDNet包括两个阶段:第一阶段SinoNet用于在组织射ogram上进行双能量投影分解,第二阶段FBP-DenoiseNet用于适应频谱和图像后处理。我们在2022年DL-Spectral-Challenge乳腺phantom数据集上测试了rFast-MMDNet。我们将两个阶段的rFast-MMDNet分别评估,并与四种非迭代参照方法进行比较,包括直接投影方法(AA-MMD)、图像域DL方法(ID-UNet)、AA-MMD/ID-UNet+DenoiseNet和投影域DL方法(Triple-CBCT)。我们的结果显示,基于DE传输域信息的模型可以很准确地分解脂肪、 calcification和 fibroglandular材料的均值RMSE、MAE、负PSNR和SSIM分别为0.004±~0、0.001±~0、-45.027±~0.542和0.002±~0。这与真实值相对较好。在一个4xRTX A6000 GPU集群上训练整个rFast-MMDNet只需一天时间,并且在预测中时间<1s。所有DL方法都比AA-MMD更精准地进行MMD,而rFast-MMDNet超过了Triple-CBCT,但两者都比图像域基本方法更好。
Fast, multicolour optical sectioning over extended fields of view by combining interferometric SIM with machine learning
paper_authors: Edward N. Ward, Rebecca M. McClelland, Jacob R. Lamb, Roger Rubio-Sánchez, Charles N. Christensen, Bismoy Mazumder, Sofia Kapsiani, Luca Mascheroni, Lorenzo Di Michele, Gabriele S. Kaminski Schierle, Clemens F. Kaminski
results: In silico validation and experiments demonstrate high-speed, high-contrast, real-time image reconstruction, and the method is applied to diverse specimens, including live and fixed biological cells as well as synthetic biosystems.Abstract
Structured illumination can reject out-of-focus signal from a sample, enabling high-speed and high-contrast imaging over large areas with widefield detection optics. Currently, this optical-sectioning technique is limited by image reconstruction artefacts and the need for sequential imaging of multiple colour channels. We combine multicolour interferometric pattern generation with machine-learning processing, permitting high-contrast, real-time reconstruction of image data. The method is insensitive to background noise and unevenly phase-stepped illumination patterns. We validate the method in silico and demonstrate its application on diverse specimens, ranging from fixed and live biological cells to synthetic biosystems, imaging at up to 37 Hz across a 44 x 44 $\mu m^2$ field of view.
摘要
《结构化照明可以排除样本中的离焦信号,从而在宽场探测光学下实现大面积的高速、高对比度成像。目前,这种光学切片技术受到图像重建伪影以及需要顺序采集多个颜色通道的限制。我们将多色干涉图样生成与机器学习处理相结合,实现了图像数据的高对比度实时重建。该方法对背景噪声和相位步进不均匀的照明图样不敏感。我们通过计算机模拟验证了该方法,并在多种样本上演示了其应用,包括固定和活体生物细胞以及人工生物系统,在 44 x 44 $\mu m^2$ 的视场内成像速度最高可达 37 Hz。
UAV Immersive Video Streaming: A Comprehensive Survey, Benchmarking, and Open Challenges
results: The study finds that the hardware implementation of the HEVC encoder achieves the best trade-off between coding efficiency and complexity, while the software AV1 encoder excels in coding efficiency. In addition, a real testbed is presented that demonstrates 360-degree video streaming over a UAV controlled via a 5G cellular network.Abstract
Over the past decade, the utilization of UAVs has witnessed significant growth, owing to their agility, rapid deployment, and maneuverability. In particular, the use of UAV-mounted 360-degree cameras to capture omnidirectional videos has enabled truly immersive viewing experiences with up to 6DoF. However, achieving this immersive experience necessitates encoding omnidirectional videos in high resolution, leading to increased bitrates. Consequently, new challenges arise in terms of latency, throughput, perceived quality, and energy consumption for real-time streaming of such content. This paper presents a comprehensive survey of research efforts in UAV-based immersive video streaming, benchmarks popular video encoding schemes, and identifies open research challenges. Initially, we review the literature on 360-degree video coding, packaging, and streaming, with a particular focus on standardization efforts to ensure interoperability of immersive video streaming devices and services. Subsequently, we provide a comprehensive review of research efforts focused on optimizing video streaming for timevarying UAV wireless channels. Additionally, we introduce a high resolution 360-degree video dataset captured from UAVs under different flying conditions. This dataset facilitates the evaluation of complexity and coding efficiency of software and hardware video encoders based on popular video coding standards and formats, including AVC/H.264, HEVC/H.265, VVC/H.266, VP9, and AV1. Our results demonstrate that HEVC achieves the best trade-off between coding efficiency and complexity through its hardware implementation, while AV1 format excels in coding efficiency through its software implementation, specifically using the libsvt-av1 encoder. Furthermore, we present a real testbed showcasing 360-degree video streaming over a UAV, enabling remote control of the drone via a 5G cellular network.
摘要
过去一个十年,无人机(UAV)的应用已经经历了显著的增长,归功于它们的灵活、快速部署和掌控能力。尤其是通过无人机搭载的360度摄像头捕捉全天候视频,使得观看体验更加 immerse 和真实,达到6度自由度(6DoF)。然而,实现这种 immerse 体验需要编码全天候视频的高分辨率,导致带宽和延迟问题的增加。这篇论文提出了关于无人机基于全天候视频流式的研究努力,比较各种视频编码方案,并标识了未来研究中的挑战。首先,我们回顾了360度视频编码、封装和流式的文献,尤其是标准化努力,以确保全天候视频流式设备和服务的互operability。然后,我们提供了全面的研究努力,旨在优化视频流式在时变UAV无线通道中的性能。此外,我们还提供了高分辨率360度视频 dataset, captured from UAVs under different flying conditions。这个 dataset 可以用于评估不同视频编码标准和格式的软硬件编码器 Complexity and coding efficiency。我们的结果显示,HEVC 在硬件实现中实现了最佳的平衡 между编码效率和复杂性,而 AV1 格式在软件实现中在编码效率方面表现出色。此外,我们还提供了一个实际的测试环境,让读者通过5G 无线网络 remote control UAV。
Harmonization-enriched domain adaptation with light fine-tuning for multiple sclerosis lesion segmentation
results: Experiments show that combining one-shot adaptation data with harmonized training data outperforms using either data source alone, and that all adaptations require only minimal fine-tuning of 2 to 5 epochs to converge.Abstract
Deep learning algorithms utilizing magnetic resonance (MR) images have demonstrated cutting-edge proficiency in autonomously segmenting multiple sclerosis (MS) lesions. Despite their achievements, these algorithms may struggle to extend their performance across various sites or scanners, leading to domain generalization errors. While few-shot or one-shot domain adaptation emerges as a potential solution to mitigate generalization errors, its efficacy might be hindered by the scarcity of labeled data in the target domain. This paper seeks to tackle this challenge by integrating one-shot adaptation data with harmonized training data that incorporates labels. Our approach involves synthesizing new training data with a contrast akin to that of the test domain, a process we refer to as "contrast harmonization" in MRI. Our experiments illustrate that the amalgamation of one-shot adaptation data with harmonized training data surpasses the performance of utilizing either data source in isolation. Notably, domain adaptation using exclusively harmonized training data achieved comparable or even superior performance compared to one-shot adaptation. Moreover, all adaptations required only minimal fine-tuning, ranging from 2 to 5 epochs for convergence.
摘要
利用磁共振(MR)图像的深度学习算法在自动分割多发性硬化(MS)病灶方面已经达到了领先水平。然而,这些算法在不同的站点或扫描仪之间可能难以保持性能,从而产生领域泛化误差。少样本或单样本领域自适应是缓解泛化误差的一种潜在方案,但其效果可能受到目标领域标注数据稀缺的限制。这篇论文旨在通过将单样本自适应数据与带标签的调和训练数据相结合来应对这一挑战。我们的方法是合成与测试域对比度相近的新训练数据,我们称这一过程为MRI中的"对比度调和"。实验表明,将单样本自适应数据与调和训练数据相结合,优于单独使用任一数据源。值得注意的是,仅使用调和训练数据进行领域自适应,就能达到与单样本自适应相当甚至更优的性能。此外,所有自适应都只需要很少的微调,2到5个epoch即可收敛。
C-Silicon-based metasurfaces for aperture-robust spectrometer/imaging with angle integration
results: Experiments show that the proposed method maintains spectral consistency across apertures within a wide working bandwidth from 400nm to 800nm and accurately reconstructs the incident hyperspectral signal with a fidelity exceeding 99%; a spectral imaging system with 400x400 pixels is also established.Abstract
Compared with conventional grating-based spectrometers, reconstructive spectrometers based on spectrally engineered filtering have the advantage of miniaturization because of the less demand for dispersive optics and free propagation space. However, available reconstructive spectrometers fail to balance the performance on operational bandwidth, spectral diversity and angular stability. In this work, we proposed a compact silicon metasurfaces based spectrometer/camera. After angle integration, the spectral response of the system is robust to angle/aperture within a wide working bandwidth from 400nm to 800nm. It is experimentally demonstrated that the proposed method could maintain the spectral consistency from F/1.8 to F/4 (The corresponding angle of incident light ranges from 7{\deg} to 16{\deg}) and the incident hyperspectral signal could be accurately reconstructed with a fidelity exceeding 99%. Additionally, a spectral imaging system with 400x400 pixels is also established in this work. The accurate reconstructed hyperspectral image indicates that the proposed aperture-robust spectrometer has the potential to be extended as a high-resolution broadband hyperspectral camera.
摘要
results: 研究结果显示,相比之前的一些传统方法,该系统可以更好地均衡三个功能,并且在允许的约束下可以获得更大的性能空间。Abstract
The wireless domain is witnessing a flourishing of integrated systems, e.g. (a) integrated sensing and communications, and (b) simultaneous wireless information and power transfer, due to their potential to use resources (spectrum, power) judiciously. Inspired by this trend, we investigate integrated sensing, communications and powering (ISCAP), through the design of a wideband OFDM signal to power a sensor while simultaneously performing target-sensing and communication. To characterize the ISCAP performance region, we assume symbols with non-zero mean asymmetric Gaussian distribution (i.e., the input distribution), and optimize its mean and variance at each subcarrier to maximize the harvested power, subject to constraints on the achievable rate (communications) and the average side-to-peak-lobe difference (sensing). The resulting input distribution, through simulations, achieves a larger performance region than that of (i) a symmetric complex Gaussian input distribution with identical mean and variance for the real and imaginary parts, (ii) a zero-mean symmetric complex Gaussian input distribution, and (iii) the superposed power-splitting communication and sensing signal (the coexisting solution). In particular, the optimized input distribution balances the three functions by exhibiting the following features: (a) symbols in subcarriers with strong communication channels have high variance to satisfy the rate constraint, while the other symbols are dominated by the mean, forming a relatively uniform sum of mean and variance across subcarriers for sensing; (b) with looser communication and sensing constraints, large absolute means appear on subcarriers with stronger powering channels for higher harvested power. As a final note, the results highlight the great potential of the co-designed ISCAP system for further efficiency enhancement.
摘要
无线领域目前正在蓬勃发展集成系统,例如(a)集成感知和通信,以及(b)同时进行无线信息和能量传输,这些系统的潜在使用资源(频率、功率)的方式被认为是可以很好地利用。受到这种趋势的启发,我们研究集成感知通信和能源供应(ISCAP),通过设计宽带OFDM信号来为感知器提供能源,同时进行目标检测和通信。为了描述ISCAP性能区域,我们假设输入信号具有非零均值抽象 Gaussian 分布(即输入分布),并优化其均值和方差在每个子帧中以最大化收集的能量,并遵循通信可达率和检测平均偏差的限制。得到的输入分布,通过仿真结果表明,在性能区域中获得更大的空间,比(i)同样的均值和方差的 сим 复杂 Gaussian 输入分布(ii)零均值的 сим 复杂 Gaussian 输入分布,以及(iii)杂合通信和检测信号的电力分配方案(共存解)。特别是,优化的输入分布均衡了三个功能,其特点包括:(a)在强通信频道上的子帧中,具有高方差以满足可达率限制,而其他子帧则受到mean的控制,形成一个相对均匀的含义和方差的总和 across subcarriers for sensing;(b)在通信和检测限制更加宽松的情况下,大的绝对均值出现在强电力传输频道上,以提高收集的能量。最后,结果表明了可以通过合理的设计来进一步提高集成ISCAP系统的效率。
Robust Waveform Design for Integrated Sensing and Communication
results: Simulation results show that the robust waveform design ensures the estimated communication performance is attainable in real-world operation, whereas the nominal design cannot, which validates the value of robust design.Abstract
Integrated sensing and communication (ISAC), which enables hardware, resources (e.g., spectra), and waveforms sharing, is becoming a key feature in future-generation communication systems. This paper investigates robust waveform design for ISAC systems when the underlying true communication channels (e.g. time-selective ones) are not accurately known. With uncertainties in nominal communication channel models, the nominally-estimated communication performance may be not achievable in practice; i.e., the communication performance of ISAC systems cannot be guaranteed. Therefore, we formulate robust waveform design problems by studying the worst-case channels and prove that the robustly-estimated performance is guaranteed to be attainable in real-world operation. As a consequence, the reliability of ISAC systems in terms of communication performance is improved. The robust waveform design problems are shown to be non-convex, non-differentiable, and high-dimensional, which cannot be solved using existing optimization techniques. Therefore, we develop a computationally-efficient and globally-optimal algorithm to solve them. Simulation results show that the robustly-estimated communication performance can be ensured to be practically reachable while the nominally-estimated performance cannot, which validates the value of robust design.
摘要
Future 通信系统中的一个关键特性是集成感知和通信(ISAC),它允许硬件、资源(例如谱)和波形共享。本文研究未知通信道模型下的ISAC系统robust波形设计问题。由于实际通信道模型不准确,则 nominally-estimated 通信性能可能不能实现。因此,我们将robust波形设计问题转化为研究最差通信道和证明robustly-estimated 通信性能是实际操作中可以实现的。这使得ISAC系统的通信性能可以提高。robust波形设计问题是非对称、不导数、高维的,不能使用现有的优化技术解决。因此,我们开发了一种 computationally-efficient 和 globally-optimal 算法来解决它们。 simulation results 表明,robustly-estimated 通信性能可以在实际操作中实现,而 nominally-estimated 通信性能无法实现,这证明了robust设计的价值。
A Portable Ultrasound Imaging Pipeline Implementation with GPU Acceleration on Nvidia CLARA AGX
paper_authors: A. N. Madhavanunni, V. Arun Kumar, Mahesh Raveendranatha Panicker
for: The paper presents a GPU-accelerated prototype implementation of a portable ultrasound imaging pipeline on an Nvidia CLARA AGX development kit.
methods: Raw data is acquired with non-steered plane wave transmits using a programmable handheld open platform, and processing is accelerated on the Nvidia CLARA AGX developer platform. A GPU-accelerated conventional delay-and-sum (DAS) beamformer is implemented along with two adaptive nonlinear beamformers and two Fourier-based techniques.
results: The feasibility of the complete pipeline and its imaging quality are validated with in-vitro phantom experiments and preliminary in-vivo scans, and the execution speed is evaluated for different imaging grid sizes, showing speedups of up to 180 times over the CPU implementation.Abstract
In this paper, we present a GPU-accelerated prototype implementation of a portable ultrasound imaging pipeline on an Nvidia CLARA AGX development kit. The raw data is acquired with nonsteered plane wave transmit using a programmable handheld open platform that supports 128-channel transmit and 64-channel receive. The received signals are transferred to the Nvidia CLARA AGX developer platform through a host system for accelerated imaging. GPU-accelerated implementation of the conventional delay and sum (DAS) beamformer along with two adaptive nonlinear beamformers and two Fourier-based techniques was performed. The feasibility of the complete pipeline and its imaging performance was evaluated with in-vitro phantom imaging experiments and the efficacy is demonstrated with preliminary in-vivo scans. The image quality quantified by the standard contrast and resolution metrics was comparable with that of the CPU implementation. The execution speed of the implemented beamformers was also investigated for different sizes of imaging grids and a significant speedup as high as 180 times that of the CPU implementation was observed. Since the proposed pipeline involves Nvidia CLARA AGX, there is always the potential for easy incorporation of online/active learning approaches.
摘要
在这篇论文中,我们提出了一种基于GPU的手持式超声图像处理框架的抽象实现。使用可编程的手持式开放平台,收集到的原始数据被转移到Nvidia CLARA AGX开发器平台上进行加速图像处理。我们实现了GPU加速的传统延迟和积(DAS)扩展器以及两种适应非线性扩展器和两种福洛尔基于技术。我们进行了卷积物理镜像实验和先行尝试的生物体内扫描,并证明了图像质量指标和CPU实现的相似性。此外,我们还研究了不同大小的扫描网格的执行速度,并发现GPU实现的扩展器速度可以达到CPU实现的180倍。由于提出的管道使用Nvidia CLARA AGX,因此总是可以轻松地将在线/活动学习策略 incorporated。
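For readers unfamiliar with the baseline used above, here is a minimal NumPy sketch of conventional delay-and-sum (DAS) beamforming for a single 0-degree plane-wave transmit: for every pixel, compute the transmit-plus-receive time of flight, gather the corresponding RF samples from all channels, and sum. Array geometry, sampling rate, and the synthetic RF data are illustrative; the GPU acceleration and adaptive beamformers of the paper are omitted.

```python
# Sketch: plane-wave delay-and-sum beamforming with nearest-neighbour sample gathering.
import numpy as np

c, fs = 1540.0, 40e6                          # speed of sound [m/s], sampling rate [Hz]
n_elem, pitch = 64, 0.3e-3
elem_x = (np.arange(n_elem) - (n_elem - 1) / 2) * pitch
n_samp = 2048
rf = np.random.randn(n_elem, n_samp)          # placeholder channel data (elements x samples)

# image grid
xs = np.linspace(-5e-3, 5e-3, 128)
zs = np.linspace(5e-3, 30e-3, 256)
X, Z = np.meshgrid(xs, zs)                    # (nz, nx)

image = np.zeros_like(X)
for e in range(n_elem):
    rx_dist = np.sqrt((X - elem_x[e]) ** 2 + Z ** 2)     # receive path length
    t = (Z + rx_dist) / c                                 # plane-wave transmit + receive delay
    idx = np.clip(np.round(t * fs).astype(int), 0, n_samp - 1)
    image += rf[e, idx]                                   # gather and sum

envelope = np.abs(image)                      # crude envelope; a Hilbert transform is used in practice
print("beamformed image shape:", envelope.shape)
```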
Energy-Aware Adaptive Sampling for Self-Sustainability in Resource-Constrained IoT Devices
paper_authors: Marco Giordano, Silvano Cortesi, Prodromos-Vasileios Mekikis, Michele Crabolu, Giovanni Bellusci, Michele Magno
For: The paper is written for resource-constrained, battery-powered IoT devices that require self-sustainability through smart power management algorithms and energy harvesting solutions.* Methods: The paper proposes an energy-aware adaptive sampling rate algorithm based on a finite state machine (FSM) and inspired by TCP Reno's additive increase and multiplicative decrease, which maximizes sensor sampling rates while ensuring power self-sustainability without risking battery depletion.* Results: The proposed algorithm enables self-sustainability while maximizing sampled locations per day, with results validated on data from three different European cities and consistently maintaining a minimum of 24 localizations per day and achieving peaks of up to 3000.
In the ever-growing Internet of Things (IoT) landscape, smart power management algorithms combined with energy harvesting solutions are crucial to obtain self-sustainability. This paper presents an energy-aware adaptive sampling rate algorithm designed for embedded deployment in resource-constrained, battery-powered IoT devices. The algorithm, based on a finite state machine (FSM) and inspired by Transmission Control Protocol (TCP) Reno's additive increase and multiplicative decrease, maximizes sensor sampling rates, ensuring power self-sustainability without risking battery depletion. Moreover, we characterized our solar cell with data acquired over 48 days and used the model created to obtain energy data from an open-source world-wide dataset. To validate our approach, we introduce the EcoTrack device, a versatile device with global navigation satellite system (GNSS) capabilities and Long-Term Evolution Machine Type Communication (LTE-M) connectivity, supporting MQTT protocol for cloud data relay. This multi-purpose device can be used, for instance, as a health and safety wearable, remote hazard monitoring system, or as a global asset tracker. The results, validated on data from three different European cities, show that the proposed algorithm enables self-sustainability while maximizing sampled locations per day. In experiments conducted with a 3000 mAh battery capacity, the algorithm consistently maintained a minimum of 24 localizations per day and achieved peaks of up to 3000.
摘要
在日益扩大的物联网(IoT)场景中,智能电源管理算法和能量收集解决方案是实现自我可持续性的关键。这篇论文介绍了一种能量感知的自适应采样率算法,面向资源受限、电池供电的IoT设备的嵌入式部署。该算法基于有限状态机(FSM),并受到传输控制协议(TCP)Reno的加法增加和乘法减少的启发,在确保电源自我可持续、不耗尽电池的前提下最大化传感器采样率。此外,我们利用48天采集的数据对太阳能电池进行了表征,并使用由此建立的模型从一个开源的全球数据集中获得能量数据。为验证我们的方法,我们引入了EcoTrack设备,这是一种多功能设备,具有全球导航卫星系统(GNSS)能力和长期演进机器类型通信(LTE-M)连接,支持通过MQTT协议将数据传输到云端。这种多功能设备可以在各种应用场景中使用,例如健康与安全可穿戴设备、远程危险监测系统或全球资产跟踪器。基于三个欧洲城市数据的实验结果显示,所提出的算法可以在保证自我可持续性的同时最大化每天的采样位置数。在使用3000mAh电池容量的实验中,算法始终保持每天至少24次定位,峰值可达每天3000次。
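The AIMD-style policy described above can be sketched as follows: when the estimated state of charge is above a target, additively increase the sampling rate; when it drops below, multiplicatively decrease it, mimicking TCP Reno's congestion control. The battery/harvesting model, thresholds, and constants are illustrative assumptions, not the EcoTrack firmware's actual finite state machine.

```python
# Sketch: energy-aware additive-increase / multiplicative-decrease sampling-rate control.
import random

capacity_mah, soc = 3000.0, 2000.0           # battery capacity and current charge [mAh]
rate = 1.0                                   # localizations per hour
min_rate, max_rate = 1.0, 125.0              # >= 24/day, allows peaks of ~3000/day
add_step, mult_dec, soc_target = 2.0, 0.5, 0.5 * capacity_mah
cost_per_fix_mah, hours = 0.8, 24 * 14       # energy per GNSS fix + LTE-M upload, 2-week horizon

random.seed(0)
for h in range(hours):
    harvested = max(0.0, random.gauss(8.0, 6.0))          # hourly solar input [mAh]
    soc = min(capacity_mah, soc + harvested - rate * cost_per_fix_mah)
    if soc >= soc_target:
        rate = min(max_rate, rate + add_step)             # additive increase
    else:
        rate = max(min_rate, rate * mult_dec)             # multiplicative decrease
    if h % 72 == 0:
        print(f"hour {h:4d}  soc {soc:7.1f} mAh  rate {rate:5.1f} fixes/h")
```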
Age Optimum Sampling in Non-Stationary Environment
results: Experimental results show that the proposed algorithm can quickly detect delay changes, and the average AoI obtained by the proposed policy converges to the minimum.Abstract
In this work, we consider a status update system with a sensor and a receiver. The status update information is sampled by the sensor and then forwarded to the receiver through a channel with non-stationary delay distribution. The data freshness at the receiver is quantified by the Age-of-Information (AoI). The goal is to design an online sampling strategy that can minimize the average AoI when the non-stationary delay distribution is unknown. Assuming that channel delay distribution may change over time, to minimize the average AoI, we propose a joint stochastic approximation and non-parametric change point detection algorithm that can: (1) learn the optimum update threshold when the delay distribution remains static; (2) detect the change in transmission delay distribution quickly and then restart the learning process. Simulation results show that the proposed algorithm can quickly detect the delay changes, and the average AoI obtained by the proposed policy converges to the minimum AoI.
摘要
在这个工作中,我们考虑了一个状态更新系统,该系统包括一个传感器和一个接收器。状态更新信息由传感器采样并传输到接收器,但是通信滞后分布是非站ARY的,即不固定的。接收器中的数据新鲜度被衡量为年龄信息(AoI)。目标是设计一种在线采样策略,以最小化接收器中的年龄信息平均值,当传输滞后分布未知时。假设通信滞后分布可能会随时间变化,我们提出了一种联合随机批量估计和非 Parametric 变化点检测算法,可以:(1)在延迟分布保持静止时学习最佳更新阈值;(2)快速检测传输延迟分布的变化,然后重新开始学习过程。 simulation 结果表明,我们提出的算法可以快速检测延迟变化,并且提出的策略的年龄信息平均值可以快速 converges 到最小值。
Intelligent-Reflecting-Surface-Assisted UAV Communications for 6G Networks
paper_authors: Zhaolong Ning, Tengfeng Li, Yu Wu, Xiaojie Wang, Qingqing Wu, Fei Richard Yu, Song Guo
for: The paper focuses on the application of Intelligent Reflecting Surfaces (IRSs) and Unmanned Aerial Vehicles (UAVs) in 6G mobile networks.
methods: The paper surveys IRS-assisted UAV communications for 6G networks and puts forward specific solutions to improve coverage and efficiency.
results: The surveyed solutions address the coverage difficulties and resource constraints of 6G networks while improving network efficiency and user experience.Abstract
In 6th-Generation (6G) mobile networks, Intelligent Reflective Surfaces (IRSs) and Unmanned Aerial Vehicles (UAVs) have emerged as promising technologies to address the coverage difficulties and resource constraints faced by terrestrial networks. UAVs, with their mobility and low costs, offer diverse connectivity options for mobile users and a novel deployment paradigm for 6G networks. However, the limited battery capacity of UAVs, dynamic and unpredictable channel environments, and communication resource constraints result in poor performance of traditional UAV-based networks. IRSs can not only reconstruct the wireless environment in a unique way, but also achieve wireless network relay in a cost-effective manner. Hence, it receives significant attention as a promising solution to solve the above challenges. In this article, we conduct a comprehensive survey on IRS-assisted UAV communications for 6G networks. First, primary issues, key technologies, and application scenarios of IRS-assisted UAV communications for 6G networks are introduced. Then, we put forward specific solutions to the issues of IRS-assisted UAV communications. Finally, we discuss some open issues and future research directions to guide researchers in related fields.
摘要
在第六代(6G)移动网络中,智能反射面(IRS)和无人机(UAV)已成为解决地面网络覆盖困难和资源约束的有前景技术。无人机凭借其机动性和低成本,为移动用户提供了多样的连接选择,并为6G网络带来了新的部署方式。然而,无人机的电池容量有限、信道环境动态且难以预测、通信资源受限,导致传统的基于无人机的网络性能不佳。IRS不仅能以独特的方式重构无线环境,还能以低成本的方式实现无线网络中继,因此作为解决上述挑战的有前景方案受到了广泛关注。在本文中,我们对面向6G网络的IRS辅助无人机通信进行了全面综述。首先,我们介绍了6G网络中IRS辅助无人机通信的主要问题、关键技术和应用场景。然后,我们针对IRS辅助无人机通信中的问题提出了具体的解决方案。最后,我们讨论了一些尚未解决的问题和未来研究方向,以引导相关领域的研究人员。
Structured Two-Stage True-Time-Delay Array Codebook Design for Multi-User Data Communication
results: Through a closed-form two-stage design, the proposed Staircase TTD codebook is shown to achieve the desired quantized sub-band-to-angle mapping, and its performance is evaluated in multi-user communication networks.Abstract
Wideband millimeter-wave and terahertz (THz) systems can facilitate simultaneous data communication with multiple spatially separated users. It is desirable to orthogonalize users across sub-bands by deploying frequency-dependent beams with a sub-band-specific spatial response. True-Time-Delay (TTD) antenna arrays are a promising wideband architecture to implement sub-band-specific dispersion of beams across space using a single radio frequency (RF) chain. This paper proposes a structured design of analog TTD codebooks to generate beams that exhibit quantized sub-band-to-angle mapping. We introduce a structured Staircase TTD codebook and analyze the frequency-spatial behaviour of the resulting beam patterns. We develop the closed-form two-stage design of the proposed codebook to achieve the desired sub-band-specific beams and evaluate their performance in multi-user communication networks.
摘要
宽带毫米波和tera赫兹(THz)系统可以实现同时数据通信多个空间分隔的用户。希望对sub-band进行正交化用户,通过部署频率 dependent的束缚来实现。快时延迟(TTD)天线阵列是一种广泛应用的宽带体系,可以通过单个电磁波(RF)链实现sub-band特定的束缚分布在空间中。本文提出了一种结构化的analog TTD编码ebook的设计,以生成具有量化sub-band-to-angle映射的束缚。我们介绍了一种结构化的梯形 TTD编码ebook,并分析了其在resulting beam pattern中的频率空间行为。我们还开发了closed-form two-stage设计的提议,以实现所要的desired sub-band特定的束缚,并评估其在多用户通信网络中的性能。
SWIPT in Mixed Near- and Far-Field Channels: Joint Beam Scheduling and Power Allocation
results: The authors propose an efficient algorithm that obtains a suboptimal solution for the general case with multiple EH and ID receivers using binary variable elimination and successive convex approximation, derive useful insights from special cases, and show numerically that the proposed joint design significantly outperforms benchmark schemes without optimized beam scheduling and/or power allocation.Abstract
Extremely large-scale array (XL-array) has emerged as a promising technology to enhance the spectrum efficiency and spatial resolution in future wireless networks by exploiting massive number of antennas for generating pencil-like beamforming. This also leads to a fundamental paradigm shift from conventional far-field communications towards the new near-field communications. In contrast to the existing works that mostly considered simultaneous wireless information and power transfer (SWIPT) in the far field, we consider in this paper a new and practical scenario, called mixed near- and far-field SWIPT, where energy harvesting (EH) and information decoding (ID) receivers are located in the near- and far-field regions of the XL-array base station (BS), respectively. Specifically, we formulate an optimization problem to maximize the weighted sum-power harvested at all EH receivers by jointly designing the BS beam scheduling and power allocation, under the constraints on the maximum sum-rate and BS transmit power. First, for the general case with multiple EH and ID receivers, we propose an efficient algorithm to obtain a suboptimal solution by utilizing the binary variable elimination and successive convex approximation methods. To obtain useful insights, we then study the joint design for special cases. In particular, we show that when there are multiple EH receivers and one ID receiver, in most cases, the optimal design is allocating a portion of power to the ID receiver for satisfying the rate constraint, while the remaining power is allocated to one EH receiver with the highest EH capability. This is in sharp contrast to the conventional far-field SWIPT case, for which all powers should be allocated to ID receivers. Numerical results show that our proposed joint design significantly outperforms other benchmark schemes without the optimization of beam scheduling and/or power allocation.
摘要
很大规模的数组(XL-数组)已经成为未来无线网络中提高频谱效率和空间分辨率的有前途技术,通过利用庞大的天线数量生成射频束形成。这也导致了传统远场通信的基本思想的变革,转移到新的近场通信。与现有工作一样,我们在这篇论文中考虑了同时进行无线信息和能量传输(SWIPT)的远场和近场两种情况。在这种情况下,我们将BS的扫描方向和功率分配进行优化,以最大化所有能量收集器(EH)的总功率。特别是,我们提出了一个优化问题,以最大化所有EH收集器的总功率,并且受到BS发射功率的最大化和最小化率限制。首先,我们对多个EH和ID收集器的总 случа进行了有效的解决方案,通过利用二进制变量消除和束缚函数方法。为了获得有用的结论,我们还研究了特殊情况下的共同设计。结果显示,当有多个EH收集器和一个ID收集器时,优化设计通常是将一部分功率分配给ID收集器,以满足速率约束,而剩余的功率分配给EH收集器中功率最高的一个。这与远场SWIPT情况不同,在那里,所有功率都应该分配给ID收集器。numerical results表明,我们的提议的共同设计明显超越了不包括扫描方向和/或功率分配优化的参考方案。