paper_authors: Abdolmahdi Bagheri, Mohammad Pasande, Kevin Bello, Alireza Akhondi-Asl, Babak Nadjar Araabi
for: EXTRACTING THE DYNAMIC EFFECTIVE CONNECTOME (DEC)
methods: BAYESIAN DYNAMIC DAG LEARNING WITH M-MATRICES ACYCLICITY CHARACTERIZATION (BDyMA)
results: MORE ACCURATE AND RELIABLE DEC COMPARED TO STATE-OF-THE-ART AND BASELINE METHODS, IMPROVED RESULTS WHEN INCORPORATING DTI DATA AS PRIOR KNOWLEDGE
Abstract
The complex mechanisms of the brain can be unraveled by extracting the Dynamic Effective Connectome (DEC). Recently, score-based Directed Acyclic Graph (DAG) discovery methods have shown significant improvements in extracting the causal structure and inferring effective connectivity. However, learning DEC through these methods still faces two main challenges: one stemming from the fundamental limitations of high-dimensional dynamic DAG discovery methods, and the other from the low quality of fMRI data. In this paper, we introduce the Bayesian Dynamic DAG learning with M-matrices Acyclicity characterization (BDyMA) method to address these challenges in discovering the DEC. The presented dynamic causal model enables us to discover bidirected edges as well. Leveraging an unconstrained framework, the BDyMA method yields more accurate results in detecting high-dimensional networks and sparser outcomes, making it particularly suitable for extracting the DEC. Additionally, the score function of the BDyMA method allows the incorporation of prior knowledge into the process of dynamic causal discovery, which further enhances the accuracy of the results. Comprehensive simulations on synthetic data and experiments on Human Connectome Project (HCP) data demonstrate that our method can handle both challenges, yielding more accurate and reliable DEC compared to state-of-the-art and baseline methods. Additionally, we investigate the trustworthiness of DTI data as prior knowledge for DEC discovery and show the improvements in DEC discovery when DTI data is incorporated into the process.
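For readers unfamiliar with M-matrix acyclicity characterizations, the following minimal sketch shows the log-det form popularized by DAGMA; this is a reasonable guess at the characterization the method name refers to, and BDyMA's actual score and Bayesian machinery are in the paper, not reproduced here.

```python
import numpy as np

def mmatrix_acyclicity(W: np.ndarray, s: float = 1.0) -> float:
    """h(W) = -log det(sI - W*W) + d*log(s), where W*W is the elementwise square.
    h(W) = 0 iff the weighted graph W is acyclic, provided sI - W*W remains an
    M-matrix (i.e., the spectral radius of W*W stays below s)."""
    d = W.shape[0]
    M = s * np.eye(d) - W * W
    sign, logdet = np.linalg.slogdet(M)
    if sign <= 0:
        raise ValueError("sI - W*W is not an M-matrix here; increase s")
    return float(-logdet + d * np.log(s))

# a DAG (strictly upper-triangular W) gives h(W) = 0 up to numerical error
print(mmatrix_acyclicity(np.triu(np.ones((3, 3)), k=1)))
```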
SRN-SZ: Deep Learning-Based Scientific Error-bounded Lossy Compression with Super-resolution Neural Networks
results: Compared with various state-of-the-art compressors, SRN-SZ achieves up to 75% compression ratio improvement under the same error bound and up to 80% under the same PSNR relative to the second-best compressor.
Abstract
The fast growth of computational power and scales of modern super-computing systems have raised great challenges for the management of exascale scientific data. To maintain the usability of scientific data, error-bounded lossy compression is proposed and developed as an essential technique for reducing the size of scientific data under constrained data distortion. Among the diverse datasets generated by various scientific simulations, certain datasets cannot be effectively compressed by existing error-bounded lossy compressors with traditional techniques. The recent success of Artificial Intelligence has inspired several researchers to integrate neural networks into error-bounded lossy compressors. However, those works still suffer from limited compression ratios and/or extremely low efficiencies. To address those issues and improve compression on the hard-to-compress datasets, in this paper we propose SRN-SZ, a deep learning-based scientific error-bounded lossy compressor leveraging the hierarchical data grid expansion paradigm implemented by super-resolution neural networks. SRN-SZ applies the most advanced super-resolution network HAT for its compression, which is free of time-consuming per-data training. In experiments compared with various state-of-the-art compressors, SRN-SZ achieves up to 75% compression ratio improvements under the same error bound and up to 80% compression ratio improvements under the same PSNR compared with the second-best compressor.
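As a rough illustration of how an error bound can be enforced around any predictor (here, the super-resolution output), the sketch below uses the linear-scaling residual quantization common to SZ-family compressors; this illustrates the general mechanism and is an assumption, not SRN-SZ's actual implementation.

```python
import numpy as np

def quantize_residual(data: np.ndarray, prediction: np.ndarray, eb: float):
    """Quantize prediction residuals into bins of width 2*eb so the
    reconstruction error is guaranteed to stay within the error bound eb."""
    codes = np.round((data - prediction) / (2.0 * eb)).astype(np.int64)
    recon = prediction + codes * 2.0 * eb
    return codes, recon  # entropy-code `codes`; `recon` is the decompressed field

data = np.random.rand(1000)
codes, recon = quantize_residual(data, np.zeros_like(data), eb=1e-2)
assert np.abs(recon - data).max() <= 1e-2 + 1e-12
```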
Brief technical note on linearizing recurrent neural networks (RNNs) before vs after the pointwise nonlinearity
paper_authors: Marino Pagan, Adrian Valente, Srdjan Ostojic, Carlos D. Brody
for: study the properties of recurrent neural networks (RNNs)
methods: linearization of activation dynamics and activity dynamics
results: context-dependent effects are more apparent under linearization of activity dynamics than under linearization of activation dynamics
Abstract
Linearization of the dynamics of recurrent neural networks (RNNs) is often used to study their properties. The same RNN dynamics can be written in terms of the "activations" (the net inputs to each unit, before its pointwise nonlinearity) or in terms of the "activities" (the output of each unit, after its pointwise nonlinearity); the two corresponding linearizations are different from each other. This brief and informal technical note describes the relationship between the two linearizations, between the left and right eigenvectors of their dynamics matrices, and shows that some context-dependent effects are readily apparent under linearization of activity dynamics but not linearization of activation dynamics.
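To make the distinction concrete, here is a sketch in a standard rate-model form; the specific dynamics below are assumed for illustration, and the note's own equations may differ.

```latex
% Activation dynamics, with activations x and activities r = \phi(x):
\tau \dot{x} = -x + W\phi(x) + u
% Linearizing around a fixed point x^*, with D = \mathrm{diag}(\phi'(x^*)):
\tau\, \delta\dot{x} = \left(-I + W D\right)\delta x
% The same perturbation expressed in activities, \delta r = D\,\delta x, obeys
\tau\, \delta\dot{r} = \left(-I + D W\right)\delta r
% WD and DW share the same nonzero eigenvalues, but their left and right
% eigenvectors differ (they are related through D), which is why some effects
% are visible in one linearization and not the other.
```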
Optimal Transport with Tempered Exponential Measures
paper_authors: Ehsan Amid, Frank Nielsen, Richard Nock, Manfred K. Warmuth
for: This paper studies optimal transport, specifically the contrast between unregularized optimal transport and entropic-regularized optimal transport.
methods: The paper generalizes entropic-regularized optimal transport to tempered exponential measures, a generalization of exponential families with indirect measure normalization.
results: The paper reaches a convenient middle ground: very fast approximation algorithms together with sparsity that is under control up to sparsity patterns; the approach also fits naturally in the unbalanced optimal transport setting.
Abstract
In the field of optimal transport, two prominent subfields face each other: (i) unregularized optimal transport, "à-la-Kantorovich", which leads to extremely sparse plans but with algorithms that scale poorly, and (ii) entropic-regularized optimal transport, "à-la-Sinkhorn-Cuturi", which gets near-linear approximation algorithms but leads to maximally un-sparse plans. In this paper, we show that a generalization of the latter to tempered exponential measures, a generalization of exponential families with indirect measure normalization, gets to a very convenient middle ground, with both very fast approximation algorithms and sparsity which is under control up to sparsity patterns. In addition, it fits naturally in the unbalanced optimal transport problem setting as well.
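The tempered exponential and logarithm underlying tempered exponential measures are standard; a minimal sketch follows (shown for $t < 1$, the regime where the clamp at zero is what permits exactly sparse plans, unlike the strictly positive $\exp$ in Sinkhorn scaling). The paper's actual algorithm is not reproduced here.

```python
import numpy as np

def log_t(x: np.ndarray, t: float) -> np.ndarray:
    # tempered logarithm; recovers log as t -> 1
    return np.log(x) if t == 1.0 else (x ** (1.0 - t) - 1.0) / (1.0 - t)

def exp_t(x: np.ndarray, t: float) -> np.ndarray:
    # tempered exponential (t < 1 shown): hits an exact zero for
    # x <= -1/(1-t), so rescaled plans can contain exact zeros
    if t == 1.0:
        return np.exp(x)
    return np.maximum(1.0 + (1.0 - t) * x, 0.0) ** (1.0 / (1.0 - t))

print(exp_t(np.array([-5.0, 0.0, 1.0]), t=0.5))  # first entry is exactly 0
```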
An Element-wise RSAV Algorithm for Unconstrained Optimization Problems
results: Ample numerical experiments validate the robustness and fast convergence of the algorithm.
Abstract
We present a novel optimization algorithm, element-wise relaxed scalar auxiliary variable (E-RSAV), that satisfies an unconditional energy dissipation law and exhibits improved alignment between the modified and the original energy. Our algorithm features rigorous proofs of linear convergence in the convex setting. Furthermore, we present a simple accelerated algorithm that improves the linear convergence rate to super-linear in the univariate case. We also propose an adaptive version of E-RSAV with Steffensen step size. We validate the robustness and fast convergence of our algorithm through ample numerical experiments.
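The Steffensen step size referenced above builds on the classical derivative-free Steffensen update for a fixed-point iteration $x = g(x)$; below is a generic sketch of that update, not the E-RSAV scheme itself.

```python
import math

def steffensen_step(g, x: float, eps: float = 1e-14) -> float:
    # Aitken delta-squared acceleration of the fixed-point iteration x = g(x):
    # two evaluations of g approximate a Newton step without derivatives
    gx = g(x)
    ggx = g(gx)
    denom = ggx - 2.0 * gx + x
    return gx if abs(denom) < eps else x - (gx - x) ** 2 / denom

# usage: converges to the fixed point of cos (about 0.7391) in a few steps
x = 1.0
for _ in range(5):
    x = steffensen_step(math.cos, x)
print(x)
```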
Creating a Systematic ESG (Environmental Social Governance) Scoring System Using Social Network Analysis and Machine Learning for More Sustainable Company Practices
paper_authors: Aarav Patel, Peter Gloor
for: This paper aims to create a data-driven ESG evaluation system that provides better guidance and more systemized scores by incorporating social sentiment.
methods: The authors use Python web scrapers to collect data from Wikipedia, Twitter, LinkedIn, and Google News for S&P 500 companies. They then clean and pass the data through NLP algorithms to obtain sentiment scores for ESG subcategories. Machine-learning algorithms are trained and calibrated to S&P Global ESG Ratings to test their predictive capabilities.
results: The Random-Forest model shows encouraging results, with a mean absolute error of 13.4% and a correlation of 26.1% (p-value 0.0372). The authors conclude that measuring ESG social sentiment across sub-categories can help executives focus efforts on areas people care about most, and that this data-driven methodology can provide ratings for companies without coverage, allowing more socially responsible firms to thrive.
Abstract
Environmental Social Governance (ESG) is a widely used metric that measures the sustainability of a company practices. Currently, ESG is determined using self-reported corporate filings, which allows companies to portray themselves in an artificially positive light. As a result, ESG evaluation is subjective and inconsistent across raters, giving executives mixed signals on what to improve. This project aims to create a data-driven ESG evaluation system that can provide better guidance and more systemized scores by incorporating social sentiment. Social sentiment allows for more balanced perspectives which directly highlight public opinion, helping companies create more focused and impactful initiatives. To build this, Python web scrapers were developed to collect data from Wikipedia, Twitter, LinkedIn, and Google News for the S&P 500 companies. Data was then cleaned and passed through NLP algorithms to obtain sentiment scores for ESG subcategories. Using these features, machine-learning algorithms were trained and calibrated to S&P Global ESG Ratings to test their predictive capabilities. The Random-Forest model was the strongest model with a mean absolute error of 13.4% and a correlation of 26.1% (p-value 0.0372), showing encouraging results. Overall, measuring ESG social sentiment across sub-categories can help executives focus efforts on areas people care about most. Furthermore, this data-driven methodology can provide ratings for companies without coverage, allowing more socially responsible firms to thrive.
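A hypothetical sketch of the calibration step described above, regressing subcategory sentiment features onto S&P Global ESG Ratings; the file name and column names are illustrative assumptions, not the authors'.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

df = pd.read_csv("esg_sentiment_features.csv")  # hypothetical feature table
X = df[["env_sentiment", "social_sentiment", "gov_sentiment"]]  # hypothetical columns
y = df["sp_global_esg_rating"]  # hypothetical target column

# out-of-fold predictions approximate how well sentiment predicts the ratings
preds = cross_val_predict(RandomForestRegressor(n_estimators=300, random_state=0), X, y, cv=5)
print("MAE:", float((pd.Series(preds, index=y.index) - y).abs().mean()))
```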
Derivation of Coordinate Descent Algorithms from Optimal Control Theory
results: The convergence of the derived coordinate descent algorithms is connected to the controlled dissipation of their corresponding Lyapunov functions, with the Hessian of the convex objective function serving as the operational metric for the search vector.
Abstract
Recently, it was posited that disparate optimization algorithms may be coalesced in terms of a central source emanating from optimal control theory. Here we further this proposition by showing how coordinate descent algorithms may be derived from this emerging new principle. In particular, we show that basic coordinate descent algorithms can be derived using a maximum principle and a collection of max functions as "control" Lyapunov functions. The convergence of the resulting coordinate descent algorithms is thus connected to the controlled dissipation of their corresponding Lyapunov functions. The operational metric for the search vector in all cases is given by the Hessian of the convex objective function.
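For concreteness, here is a plain coordinate descent built from the ingredients named above: a max function as the selection rule and the Hessian as the operational metric. This is a generic sketch, not the paper's optimal-control derivation.

```python
import numpy as np

def coordinate_descent(grad, hess_diag, x: np.ndarray, iters: int = 100) -> np.ndarray:
    for _ in range(iters):
        g = grad(x)
        i = int(np.argmax(np.abs(g)))       # a max function selects the coordinate
        x[i] -= g[i] / hess_diag(x)[i]      # step scaled by the Hessian diagonal
    return x

# usage on the convex quadratic f(x) = 0.5 * x^T A x, minimized at the origin
A = np.array([[3.0, 1.0], [1.0, 2.0]])
print(coordinate_descent(lambda x: A @ x, lambda x: np.diag(A), np.array([1.0, -2.0])))
```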
DBsurf: A Discrepancy Based Method for Discrete Stochastic Gradient Estimation
results: Across a diverse set of tasks, the DBsurf estimator attains the lowest variance on a standard least squares benchmark and achieves the best results for training VAEs across different datasets and sampling setups; it is also used to build a simple and efficient Neural Architecture Search (NAS) algorithm with state-of-the-art performance.
Abstract
Computing gradients of an expectation with respect to the distributional parameters of a discrete distribution is a problem arising in many fields of science and engineering. Typically, this problem is tackled using Reinforce, which frames the problem of gradient estimation as a Monte Carlo simulation. Unfortunately, the Reinforce estimator is especially sensitive to discrepancies between the true probability distribution and the drawn samples, a common issue in low sampling regimes that results in inaccurate gradient estimates. In this paper, we introduce DBsurf, a reinforce-based estimator for discrete distributions that uses a novel sampling procedure to reduce the discrepancy between the samples and the actual distribution. To assess the performance of our estimator, we subject it to a diverse set of tasks. Among existing estimators, DBsurf attains the lowest variance in a least squares problem commonly used in the literature for benchmarking. Furthermore, DBsurf achieves the best results for training variational auto-encoders (VAE) across different datasets and sampling setups. Finally, we apply DBsurf to build a simple and efficient Neural Architecture Search (NAS) algorithm with state-of-the-art performance.
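For context, the baseline Reinforce estimator that DBsurf improves upon looks as follows for a categorical distribution; its variance degrades when the drawn samples misrepresent the distribution, which is exactly the discrepancy the new sampling procedure targets. This is a generic sketch; DBsurf itself is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def reinforce_grad(logits: np.ndarray, f, n_samples: int = 1000) -> np.ndarray:
    """Score-function estimate of d/d(logits) of E_{z ~ softmax(logits)}[f(z)]."""
    p = np.exp(logits - logits.max())
    p /= p.sum()
    grad = np.zeros_like(p)
    for z in rng.choice(len(p), size=n_samples, p=p):
        onehot = np.zeros_like(p)
        onehot[z] = 1.0
        grad += f(z) * (onehot - p)  # f(z) * gradient of log p(z) w.r.t. logits
    return grad / n_samples

# usage: estimate the gradient of E[z] for a uniform 3-category distribution
print(reinforce_grad(np.zeros(3), lambda z: float(z)))
```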
Automatic Concept Embedding Model (ACEM): No train-time concepts, No issue!
results: ACEMs can learn concept embedding models on large datasets without train-time concept annotations while providing highly interpretable models.
Abstract
Interpretability and explainability of neural networks is continuously increasing in importance, especially within safety-critical domains and to provide the social right to explanation. Concept based explanations align well with how humans reason, proving to be a good way to explain models. Concept Embedding Models (CEMs) are one such concept based explanation architectures. These have shown to overcome the trade-off between explainability and performance. However, they have a key limitation -- they require concept annotations for all their training data. For large datasets, this can be expensive and infeasible. Motivated by this, we propose Automatic Concept Embedding Models (ACEMs), which learn the concept annotations automatically.
A Tutorial on the Non-Asymptotic Theory of System Identification
results: The tutorial concludes by sketching how the presented ideas can be extended to certain nonlinear identification problems.
Abstract
This tutorial serves as an introduction to recently developed non-asymptotic methods in the theory of -- mainly linear -- system identification. We emphasize tools we deem particularly useful for a range of problems in this domain, such as the covering technique, the Hanson-Wright Inequality and the method of self-normalized martingales. We then employ these tools to give streamlined proofs of the performance of various least-squares based estimators for identifying the parameters in autoregressive models. We conclude by sketching out how the ideas presented herein can be extended to certain nonlinear identification problems.
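In the simplest scalar autoregressive case, the least-squares estimators the tutorial analyzes reduce to an ordinary regression of each sample on its lags; a minimal sketch for orientation (the non-asymptotic analysis is the tutorial's subject and is not shown here).

```python
import numpy as np

def fit_ar_ls(y: np.ndarray, p: int) -> np.ndarray:
    """OLS for AR(p): y_t = a_1*y_{t-1} + ... + a_p*y_{t-p} + noise."""
    n = len(y)
    X = np.column_stack([y[p - k - 1 : n - k - 1] for k in range(p)])
    a, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return a

# usage: recover a stable AR(2) from simulated data
rng = np.random.default_rng(0)
y = np.zeros(5000)
for t in range(2, 5000):
    y[t] = 0.5 * y[t - 1] - 0.2 * y[t - 2] + rng.standard_normal()
print(fit_ar_ls(y, 2))  # close to [0.5, -0.2]
```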
Mixtures of Gaussians are Privately Learnable with a Polynomial Number of Samples
results: The main result is that $\tilde{O}(k^2 d^4 \log(1/\delta) / \alpha^2 \varepsilon)$ samples suffice to estimate a mixture of $k$ Gaussians up to total variation distance $\alpha$ while satisfying $(\varepsilon, \delta)$-DP; this is the first finite sample complexity upper bound for the problem without any structural assumptions.
Abstract
We study the problem of estimating mixtures of Gaussians under the constraint of differential privacy (DP). Our main result is that $\tilde{O}(k^2 d^4 \log(1/\delta) / \alpha^2 \varepsilon)$ samples are sufficient to estimate a mixture of $k$ Gaussians up to total variation distance $\alpha$ while satisfying $(\varepsilon, \delta)$-DP. This is the first finite sample complexity upper bound for the problem that does not make any structural assumptions on the GMMs. To solve the problem, we devise a new framework which may be useful for other tasks. On a high level, we show that if a class of distributions (such as Gaussians) is (1) list decodable and (2) admits a "locally small" cover [BKSW19] with respect to total variation distance, then the class of its mixtures is privately learnable. The proof circumvents a known barrier indicating that, unlike Gaussians, GMMs do not admit a locally small cover [AAL21].
Gradient-Based Feature Learning under Structured Data
paper_authors: Alireza Mousavi-Hosseini, Denny Wu, Taiji Suzuki, Murat A. Erdogdu
for: This paper studies the complexity of gradient-based learning of single-index models when the input data has a spiked covariance structure, revealing several interesting phenomena.
methods: The paper analyzes spherical gradient dynamics and an appropriate weight normalization, reminiscent of batch normalization, for handling anisotropic (spiked) input data.
results: In the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction, while appropriate weight normalization alleviates this issue; moreover, exploiting the alignment between the (spiked) input covariance and the target yields improved sample complexity over the isotropic case, outperforming lower bounds for rotationally invariant kernel methods.
Abstract
Recent works have demonstrated that the sample complexity of gradient-based learning of single index models, i.e. functions that depend on a 1-dimensional projection of the input data, is governed by their information exponent. However, these results are only concerned with isotropic data, while in practice the input often contains additional structure which can implicitly guide the algorithm. In this work, we investigate the effect of a spiked covariance structure and reveal several interesting phenomena. First, we show that in the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction, even when the spike is perfectly aligned with the target direction. Next, we show that appropriate weight normalization that is reminiscent of batch normalization can alleviate this issue. Further, by exploiting the alignment between the (spiked) input covariance and the target, we obtain improved sample complexity compared to the isotropic case. In particular, under the spiked model with a suitably large spike, the sample complexity of gradient-based training can be made independent of the information exponent while also outperforming lower bounds for rotationally invariant kernel methods.
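For orientation, the spiked single-index setting can be written in a standard form; the constants and scalings below are assumptions, and the paper's exact model is not reproduced.

```latex
% Anisotropic input with a rank-one spike of strength \theta in direction v:
x \sim \mathcal{N}(0, \Sigma), \qquad \Sigma = I_d + \theta\, v v^{\top}, \qquad
y = \sigma^{*}\!\big(\langle u, x\rangle\big) + \text{noise}
% The abstract's claims: even with perfect alignment v = u, spherical gradient
% dynamics can fail to recover u; suitable weight normalization restores
% recovery, and with a suitably large spike the sample complexity of
% gradient-based training becomes independent of the information exponent.
```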
Early warning via transitions in latent stochastic dynamical systems
paper_authors: Lingyu Feng, Ting Gao, Wang Xiao, Jinqiao Duan
for: Early warnings for dynamical transitions in complex systems or high-dimensional observation data
methods: Directed anisotropic diffusion map to capture latent evolutionary dynamics in low-dimensional manifold
results: Successfully found appropriate effective coordinates and derived early warning signals capable of detecting the tipping point during the state transition, bridging the latent dynamics with the original dataset; validated as accurate and effective through numerical experiments.
Abstract
Early warnings for dynamical transitions in complex systems or high-dimensional observation data are essential in many real world applications, such as gene mutation, brain diseases, natural disasters, financial crises, and engineering reliability. To effectively extract early warning signals, we develop a novel approach: the directed anisotropic diffusion map that captures the latent evolutionary dynamics in low-dimensional manifold. Applying the methodology to authentic electroencephalogram (EEG) data, we successfully find the appropriate effective coordinates, and derive early warning signals capable of detecting the tipping point during the state transition. Our method bridges the latent dynamics with the original dataset. The framework is validated to be accurate and effective through numerical experiments, in terms of density and transition probability. It is shown that the second coordinate holds meaningful information for critical transition in various evaluation metrics.
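As a point of reference, a plain (isotropic) diffusion map is sketched below; the paper's directed anisotropic variant additionally encodes drift direction in the kernel, which is not reproduced here.

```python
import numpy as np

def diffusion_map(X: np.ndarray, eps: float, n_coords: int = 2) -> np.ndarray:
    """Embed samples X (n x d) using the leading non-trivial eigenvectors of a
    Gaussian-kernel Markov matrix; rows of the output are effective coordinates."""
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    P = np.exp(-D2 / eps)
    P /= P.sum(axis=1, keepdims=True)              # row-stochastic transition matrix
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)
    return vecs[:, order[1 : n_coords + 1]].real   # drop the constant top mode
```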
Bootstrapping Adaptive Human-Machine Interfaces with Offline Reinforcement Learning
paper_authors: Jensen Gao, Siddharth Reddy, Glen Berseth, Anca D. Dragan, Sergey Levine
for: The paper aims to help users perform sequential decision-making tasks, such as robotic teleoperation, using noisy, high-dimensional command signals (e.g., from a brain-computer interface).
methods: Human-in-the-loop machine learning enables such systems to improve by interacting with users but is typically limited by the amount of data that can be collected from individual users; the authors propose a reinforcement learning algorithm that trains an interface to map raw command signals to actions using offline pre-training and online fine-tuning.
results: In a user study in which 12 participants performed a simulated navigation task by using eye gaze to modulate a 128-dimensional command signal from their webcam, the method enabled successful goal navigation more often than a baseline directional interface; it also outperformed baseline interfaces on a simulated Sawyer pushing task with eye gaze control and the Lunar Lander game with simulated user commands, and ablation experiments with simulated user commands motivate each component of the method.
Abstract
Adaptive interfaces can help users perform sequential decision-making tasks like robotic teleoperation given noisy, high-dimensional command signals (e.g., from a brain-computer interface). Recent advances in human-in-the-loop machine learning enable such systems to improve by interacting with users, but tend to be limited by the amount of data that they can collect from individual users in practice. In this paper, we propose a reinforcement learning algorithm to address this by training an interface to map raw command signals to actions using a combination of offline pre-training and online fine-tuning. To address the challenges posed by noisy command signals and sparse rewards, we develop a novel method for representing and inferring the user's long-term intent for a given trajectory. We primarily evaluate our method's ability to assist users who can only communicate through noisy, high-dimensional input channels through a user study in which 12 participants performed a simulated navigation task by using their eye gaze to modulate a 128-dimensional command signal from their webcam. The results show that our method enables successful goal navigation more often than a baseline directional interface, by learning to denoise user commands signals and provide shared autonomy assistance. We further evaluate on a simulated Sawyer pushing task with eye gaze control, and the Lunar Lander game with simulated user commands, and find that our method improves over baseline interfaces in these domains as well. Extensive ablation experiments with simulated user commands empirically motivate each component of our method.
Learning from Demonstration via Probabilistic Diagrammatic Teaching
results: The authors validate the effectiveness of the proposed framework through experiments in simulation and on real robots, showing that it can be used to teach new skills to both fixed-base and quadruped-mounted manipulators.
Abstract
Learning from Demonstration (LfD) enables robots to acquire new skills by imitating expert demonstrations, allowing users to communicate their instructions in an intuitive manner. Recent progress in LfD often relies on kinesthetic teaching or teleoperation as the medium for users to specify the demonstrations. Kinesthetic teaching requires physical handling of the robot, while teleoperation demands proficiency with additional hardware. This paper introduces an alternative paradigm for LfD called Diagrammatic Teaching. Diagrammatic Teaching aims to teach robots novel skills by prompting the user to sketch out demonstration trajectories on 2D images of the scene; these are then synthesised into a generative model of motion trajectories in 3D task space. Additionally, we present the Ray-tracing Probabilistic Trajectory Learning (RPTL) framework for Diagrammatic Teaching. RPTL extracts time-varying probability densities from the 2D sketches, applies ray-tracing to find corresponding regions in 3D Cartesian space, and fits a probabilistic model of motion trajectories to these regions. New motion trajectories, which mimic those sketched by the user, can then be generated from the probabilistic model. We empirically validate our framework both in simulation and on real robots, which include a fixed-base manipulator and a quadruped-mounted manipulator.
Prime and Modulate Learning: Generation of forward models with signed back-propagation and environmental cues
methods: The study uses a novel Prime and Modulate approach in which back-propagation makes exclusive use of the sign of the error signal to prime learning, while a global relevance signal modulates the learning rate, without requiring normalization techniques or restricted activation functions.
results: Experimental results show that, compared to conventional back-propagation, the Prime and Modulate approach significantly improves the speed and stability of learning.
Abstract
Deep neural networks employing error back-propagation for learning can suffer from exploding and vanishing gradient problems. Numerous solutions have been proposed such as normalisation techniques or limiting activation functions to linear rectifying units. In this work we follow a different approach which is particularly applicable to closed-loop learning of forward models where back-propagation makes exclusive use of the sign of the error signal to prime the learning, whilst a global relevance signal modulates the rate of learning. This is inspired by the interaction between local plasticity and a global neuromodulation. For example, whilst driving on an empty road, one can allow for slow step-wise optimisation of actions, whereas, at a busy junction, an error must be corrected at once. Hence, the error is the priming signal and the intensity of the experience is a modulating factor in the weight change. The advantages of this Prime and Modulate paradigm is twofold: it is free from normalisation and it makes use of relevant cues from the environment to enrich the learning. We present a mathematical derivation of the learning rule in z-space and demonstrate the real-time performance with a robotic platform. The results show a significant improvement in the speed of convergence compared to that of the conventional back-propagation.
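A toy single-layer rendering of the rule described above, with the error's sign priming the update direction and a scalar relevance cue modulating its magnitude; this is an illustration only, and the note's closed-loop forward-model implementation is richer.

```python
import numpy as np

def prime_and_modulate_step(W: np.ndarray, x: np.ndarray, err: np.ndarray,
                            relevance: float, lr: float = 1e-3) -> np.ndarray:
    # sign(err) primes which way each weight moves; relevance (e.g., how busy
    # the junction is) scales how strongly the experience drives learning
    return W - lr * relevance * np.outer(np.sign(err), x)
```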
Empirical Risk Minimization for Losses without Variance
paper_authors: Guanhua Fang, Ping Li, Gennady Samorodnitsky
for: This paper considers an empirical risk minimization (ERM) problem in a heavy-tailed setting, where the data has no finite variance and only a $p$-th moment with $p \in (1,2)$.
methods: Instead of an estimation procedure based on truncated observed data, the optimizer is chosen by minimizing risk values, which can be robustly estimated via Catoni's method (Catoni, 2012); the structure of Catoni-type influence functions allows excess risk upper bounds to be established via generalized generic chaining.
results: The paper theoretically investigates two optimization methods, a robust gradient descent algorithm and empirical risk-based methods; an extensive numerical study finds that the optimizer based on Catoni-style estimated empirical risks outperforms the other baselines, indicating that estimation based directly on truncated data may lead to unsatisfactory results.
Abstract
This paper considers an empirical risk minimization problem under heavy-tailed settings, where data does not have finite variance, but only has $p$-th moment with $p \in (1,2)$. Instead of using estimation procedure based on truncated observed data, we choose the optimizer by minimizing the risk value. Those risk values can be robustly estimated via using the remarkable Catoni's method (Catoni, 2012). Thanks to the structure of Catoni-type influence functions, we are able to establish excess risk upper bounds via using generalized generic chaining methods. Moreover, we take computational issues into consideration. We especially theoretically investigate two types of optimization methods, robust gradient descent algorithm and empirical risk-based methods. With an extensive numerical study, we find that the optimizer based on empirical risks via Catoni-style estimation indeed shows better performance than other baselines. It indicates that estimation directly based on truncated data may lead to unsatisfactory results.
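Catoni's estimator referenced above replaces the empirical mean by the root of a soft-truncated score equation; below is a minimal sketch using the standard influence function from Catoni (2012).

```python
import numpy as np

def catoni_mean(x: np.ndarray, alpha: float, iters: int = 100) -> float:
    """Solve sum_i psi(alpha * (x_i - m)) = 0 for m by bisection, where psi is
    Catoni's influence function: logarithmic tails tame heavy-tailed data."""
    def psi(t):
        return np.where(t >= 0, np.log1p(t + t * t / 2), -np.log1p(-t + t * t / 2))
    lo, hi = float(x.min()), float(x.max())
    for _ in range(iters):
        m = 0.5 * (lo + hi)
        if psi(alpha * (x - m)).sum() > 0:
            lo = m  # score still positive: the root lies to the right
        else:
            hi = m
    return 0.5 * (lo + hi)
```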
Improved theoretical guarantee for rank aggregation via spectral method
for: Ranking multiple items based on pairwise comparisons, with applications in sports, recommendation systems, and other web applications.
methods: Spectral ranking algorithms based on unnormalized and normalized data matrices, with a focus on deriving entry-wise perturbation error bounds and an error bound on the maximum displacement for each item.
results: Improved sample complexity and theoretical analysis of the eigenvectors and error bounds for the ranking problem, with confirmation from numerical experiments.
Abstract
Given pairwise comparisons between multiple items, how to rank them so that the ranking matches the observations? This problem, known as rank aggregation, has found many applications in sports, recommendation systems, and other web applications. As it is generally NP-hard to find a global ranking that minimizes the mismatch (known as the Kemeny optimization), we focus on the Erd\"os-R\'enyi outliers (ERO) model for this ranking problem. Here, each pairwise comparison is a corrupted copy of the true score difference. We investigate spectral ranking algorithms that are based on unnormalized and normalized data matrices. The key is to understand their performance in recovering the underlying scores of each item from the observed data. This reduces to deriving an entry-wise perturbation error bound between the top eigenvectors of the unnormalized/normalized data matrix and its population counterpart. By using the leave-one-out technique, we provide a sharper $\ell_{\infty}$-norm perturbation bound of the eigenvectors and also derive an error bound on the maximum displacement for each item, with only $\Omega(n\log n)$ samples. Our theoretical analysis improves upon the state-of-the-art results in terms of sample complexity, and our numerical experiments confirm these theoretical findings.
Conformal Autoregressive Generation: Beam Search with Coverage Guarantees
results: Marginal coverage bounds are provided for each method, with empirical evaluation on a selection of tasks from natural language processing and chemistry.
Abstract
We introduce two new extensions to the beam search algorithm based on conformal predictions (CP) to produce sets of sequences with theoretical coverage guarantees. The first method is very simple and proposes dynamically-sized subsets of beam search results but, unlike typical CP procedures, has an upper bound on the achievable guarantee depending on a post-hoc calibration measure. Our second algorithm introduces the conformal set prediction procedure as part of the decoding process, producing a variable beam width which adapts to the current uncertainty. While more complex, this procedure can achieve coverage guarantees selected a priori. We provide marginal coverage bounds for each method, and evaluate them empirically on a selection of tasks drawing from natural language processing and chemistry.
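The first (dynamically-sized subset) method can be pictured with a standard split-conformal threshold applied to beam hypotheses; the sketch below holds under the usual exchangeability assumptions and is not the paper's exact calibration procedure.

```python
import numpy as np

def conformal_threshold(cal_scores: np.ndarray, alpha: float) -> float:
    # split-conformal quantile with the finite-sample (n+1) correction
    n = len(cal_scores)
    q = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return float(np.quantile(cal_scores, q, method="higher"))

def conformal_beam_subset(beam_scores, threshold):
    # keep the beam hypotheses whose nonconformity score clears the threshold,
    # yielding a dynamically-sized prediction set
    return [i for i, s in enumerate(beam_scores) if s <= threshold]
```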
Adversarially Robust Deep Learning with Optimal-Transport-Regularized Divergences
for: The paper aims to enhance the adversarial robustness of deep learning models against various attacks, such as FGSM and PGD.
methods: The paper proposes a novel approach called $ARMOR_D$, which uses optimal-transport-regularized divergences to enhance adversarial robustness. This approach involves maximizing the expected loss over a neighborhood of distributions, known as distributionally robust optimization.
results: The paper demonstrates the effectiveness of $ARMOR_D$ on malware detection and image recognition applications, achieving higher robustness against adversarial attacks than prior methods. Specifically, $ARMOR_D$ yields a robustified accuracy of 98.29% against FGSM and 98.18% against PGD on the MNIST dataset, and improves the robustified accuracy in malware detection by 37.0% compared to the previous best-performing methods.
Abstract
We introduce the $ARMOR_D$ methods as novel approaches to enhancing the adversarial robustness of deep learning models. These methods are based on a new class of optimal-transport-regularized divergences, constructed via an infimal convolution between an information divergence and an optimal-transport (OT) cost. We use these as tools to enhance adversarial robustness by maximizing the expected loss over a neighborhood of distributions, a technique known as distributionally robust optimization. Viewed as a tool for constructing adversarial samples, our method allows samples to be both transported, according to the OT cost, and re-weighted, according to the information divergence. We demonstrate the effectiveness of our method on malware detection and image recognition applications and find that, to our knowledge, it outperforms existing methods at enhancing the robustness against adversarial attacks. $ARMOR_D$ yields the robustified accuracy of $98.29\%$ against $FGSM$ and $98.18\%$ against $PGD^{40}$ on the MNIST dataset, reducing the error rate by more than $19.7\%$ and $37.2\%$ respectively compared to prior methods. Similarly, in malware detection, a discrete (binary) data domain, $ARMOR_D$ improves the robustified accuracy under $rFGSM^{50}$ attack compared to the previous best-performing adversarial training methods by $37.0\%$ while lowering false negative and false positive rates by $51.1\%$ and $57.53\%$, respectively.
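Rendering the abstract's construction in symbols (the exact argument placement below is an assumption; see the paper for the precise definition):

```latex
% Optimal-transport-regularized divergence via infimal convolution of an
% information divergence D_f and an OT cost C_c:
D^{c}_{f}(Q \,\|\, P) \;=\; \inf_{\eta}\;\Big\{ D_{f}(Q \,\|\, \eta) + C_{c}(\eta, P) \Big\}
% Distributionally robust training then maximizes the expected loss over a
% neighborhood of the empirical distribution P_n:
\sup_{Q:\; D^{c}_{f}(Q \,\|\, P_n) \le \rho}\; \mathbb{E}_{Q}\big[\ell(\theta; Z)\big]
% so adversarial samples are both transported (via c) and re-weighted (via f).
```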
Neural lasso: a unifying approach of lasso and neural networks
results: During the development of the modification, a new optimization algorithm for identifying the significant variables emerged; experiments using synthetic and real data sets show that this new optimization algorithm performs better than any of the three previous optimization approaches.
Abstract
In recent years, there is a growing interest in combining techniques attributed to the areas of Statistics and Machine Learning in order to obtain the benefits of both approaches. In this article, the statistical technique lasso for variable selection is represented through a neural network. It is observed that, although both the statistical approach and its neural version have the same objective function, they differ due to their optimization. In particular, the neural version is usually optimized in one-step using a single validation set, while the statistical counterpart uses a two-step optimization based on cross-validation. The more elaborated optimization of the statistical method results in more accurate parameter estimation, especially when the training set is small. For this reason, a modification of the standard approach for training neural networks, that mimics the statistical framework, is proposed. During the development of the above modification, a new optimization algorithm for identifying the significant variables emerged. Experimental results, using synthetic and real data sets, show that this new optimization algorithm achieves better performance than any of the three previous optimization approaches.
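Both the statistical lasso and its neural rendering share the same objective; the difference discussed above lies purely in how it is optimized (two-step cross-validation versus one-step validation). For reference:

```python
import numpy as np

def lasso_objective(w: np.ndarray, X: np.ndarray, y: np.ndarray, lam: float) -> float:
    # squared loss plus an L1 penalty that drives irrelevant coefficients to zero
    return 0.5 * float(np.mean((X @ w - y) ** 2)) + lam * float(np.abs(w).sum())
```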
results: The paper presents a novel convergence-rate analysis guaranteeing that DASGD converges at rate $\mathcal{O}(\sigma\epsilon^{-2}) + \mathcal{O}(QS_{avg}\epsilon^{-3/2}) + \mathcal{O}(S_{avg}\epsilon^{-1})$ across network topologies and staleness regimes, where $S_{avg}$ is the average staleness between models, $Q$ is a constant bounding the norm of the gradients, and $\epsilon$ is a (small) error allowed within the bound.
Abstract
Over the last decades, Stochastic Gradient Descent (SGD) has been intensively studied by the Machine Learning community. Despite its versatility and excellent performance, the optimization of large models via SGD still is a time-consuming task. To reduce training time, it is common to distribute the training process across multiple devices. Recently, it has been shown that the convergence of asynchronous SGD (ASGD) will always be faster than mini-batch SGD. However, despite these improvements in the theoretical bounds, most ASGD convergence-rate proofs still rely on a centralized parameter server, which is prone to become a bottleneck when scaling out the gradient computations across many distributed processes. In this paper, we present a novel convergence-rate analysis for decentralized and asynchronous SGD (DASGD) which does not require partial synchronization among nodes nor restrictive network topologies. Specifically, we provide a bound of $\mathcal{O}(\sigma\epsilon^{-2}) + \mathcal{O}(QS_{avg}\epsilon^{-3/2}) + \mathcal{O}(S_{avg}\epsilon^{-1})$ for the convergence rate of DASGD, where $S_{avg}$ is the average staleness between models, $Q$ is a constant that bounds the norm of the gradients, and $\epsilon$ is a (small) error that is allowed within the bound. Furthermore, when gradients are not bounded, we prove the convergence rate of DASGD to be $\mathcal{O}(\sigma\epsilon^{-2}) + \mathcal{O}(\sqrt{\hat{S}_{avg}\hat{S}_{max}}\epsilon^{-1})$, with $\hat{S}_{max}$ and $\hat{S}_{avg}$ representing a loose version of the average and maximum staleness, respectively. Our convergence proof holds for a fixed stepsize and any non-convex, homogeneous, and L-smooth objective function. We anticipate that our results will be of high relevance for the adoption of DASGD by a broad community of researchers and developers.
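A toy simulation can make the two moving parts of the analysis tangible: gradients computed on stale parameter snapshots (which drive the $S_{avg}$ terms in the bound) and decentralized gossip averaging in place of a parameter server. This is a simplified illustration under assumed mechanics, not the paper's protocol.

```python
import numpy as np

rng = np.random.default_rng(0)

def dasgd_toy(grad, d: int, n_workers: int = 4, steps: int = 400,
              lr: float = 0.05, refresh_every: int = 3) -> np.ndarray:
    models = [np.zeros(d) for _ in range(n_workers)]
    stale = [m.copy() for m in models]            # stale snapshots per worker
    for t in range(steps):
        w = int(rng.integers(n_workers))
        if t % refresh_every == 0:
            stale[w] = models[w].copy()           # occasionally refresh the snapshot
        models[w] -= lr * grad(stale[w])          # gradient computed on stale parameters
        nb = int(rng.integers(n_workers))
        avg = (models[w] + models[nb]) / 2.0      # pairwise gossip averaging
        models[w], models[nb] = avg.copy(), avg
    return np.mean(models, axis=0)

print(np.linalg.norm(dasgd_toy(lambda x: x, d=10)))  # decays toward 0 for f = ||x||^2/2
```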
Medoid Silhouette clustering with automatic cluster number selection
for: The paper discusses and improves the efficiency of the Silhouette method for clustering evaluation, specifically its medoid-based variant.
methods: The paper combines the Silhouette with the PAM algorithm and its latest improvement FasterPAM to speed up the clustering evaluation. The authors propose two fast versions of the algorithm and provide a theoretical analysis of its properties.
results: The authors report a run speedup of $O(k^2)$ over the original PAMMEDSIL algorithm, observing a 10464x speedup on real data with 30000 samples and $k$=100. They also provide a variant to choose the optimal number of clusters directly.
Abstract
The evaluation of clustering results is difficult, highly dependent on the evaluated data set and the perspective of the beholder. There are many different clustering quality measures, which try to provide a general measure to validate clustering results. A very popular measure is the Silhouette. We discuss the efficient medoid-based variant of the Silhouette, perform a theoretical analysis of its properties, provide two fast versions for the direct optimization, and discuss the use to choose the optimal number of clusters. We combine ideas from the original Silhouette with the well-known PAM algorithm and its latest improvements FasterPAM. One of the versions guarantees equal results to the original variant and provides a run speedup of $O(k^2)$. In experiments on real data with 30000 samples and $k$=100, we observed a 10464$\times$ speedup compared to the original PAMMEDSIL algorithm. Additionally, we provide a variant to choose the optimal number of clusters directly.
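The medoid-based Silhouette simplifies the classical score by measuring distances to the two nearest medoids rather than to whole clusters; a direct $O(nk)$ sketch follows (the paper's contribution is the much faster variants and the theory, not this naive version).

```python
import numpy as np

def medoid_silhouette(D: np.ndarray, medoids) -> float:
    """D: n x n pairwise distance matrix; medoids: indices of the k medoids.
    For each point, a = distance to its nearest medoid and b = distance to the
    second nearest; the score averages 1 - a/b over all points."""
    Dm = D[:, list(medoids)]
    two = np.partition(Dm, 1, axis=1)  # two smallest medoid distances per row
    a, b = two[:, 0], two[:, 1]
    return float(np.mean(np.where(b > 0, 1.0 - a / b, 0.0)))
```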
Learning continuous-valued treatment effects through representation balancing
results: Experiments show that CBRNet accurately estimates dose responses and performs competitively with other state-of-the-art methods.
Abstract
Estimating the effects of treatments with an associated dose on an instance's outcome, the "dose response", is relevant in a variety of domains, from healthcare to business, economics, and beyond. Such effects, also known as continuous-valued treatment effects, are typically estimated from observational data, which may be subject to dose selection bias. This means that the allocation of doses depends on pre-treatment covariates. Previous studies have shown that conventional machine learning approaches fail to learn accurate individual estimates of dose responses under the presence of dose selection bias. In this work, we propose CBRNet, a causal machine learning approach to estimate an individual dose response from observational data. CBRNet adopts the Neyman-Rubin potential outcome framework and extends the concept of balanced representation learning for overcoming selection bias to continuous-valued treatments. Our work is the first to apply representation balancing in a continuous-valued treatment setting. We evaluate our method on a newly proposed benchmark. Our experiments demonstrate CBRNet's ability to accurately learn treatment effects under selection bias and competitive performance with respect to other state-of-the-art methods.
A Causal Perspective on Loan Pricing: Investigating the Impacts of Selection Bias on Identifying Bid-Response Functions
results: The study finds that selection bias degrades the accuracy of conventional methods, whereas state-of-the-art causal machine learning methods effectively overcome it.
Abstract
In lending, where prices are specific to both customers and products, having a well-functioning personalized pricing policy in place is essential to doing business effectively. Typically, such a policy must be derived from observational data, which introduces several challenges. While the problem of "endogeneity" is prominently studied in the established pricing literature, the problem of selection bias (or, more precisely, bid selection bias) is not. We take a step towards understanding the effects of selection bias by posing pricing as a problem of causal inference. Specifically, we consider the reaction of a customer to a price as a treatment effect. In our experiments, we simulate varying levels of selection bias on a semi-synthetic dataset on mortgage loan applications in Belgium. We investigate the potential of parametric and nonparametric methods for the identification of individual bid-response functions. Our results illustrate how conventional methods such as logistic regression and neural networks suffer adversely from selection bias. In contrast, we implement state-of-the-art methods from causal machine learning and show their capability to overcome selection bias in pricing data.
DGSD: Dynamical Graph Self-Distillation for EEG-Based Auditory Spatial Attention Detection
results: Experiments on the two public datasets KUL and DTU achieve detection accuracies of 90.0% and 79.6%, respectively, under a 1-second time window; the detection performance is not only superior to the best reproducible baseline but also reduces the number of trainable parameters by approximately 100 times.
Abstract
Auditory Attention Detection (AAD) aims to detect target speaker from brain signals in a multi-speaker environment. Although EEG-based AAD methods have shown promising results in recent years, current approaches primarily rely on traditional convolutional neural network designed for processing Euclidean data like images. This makes it challenging to handle EEG signals, which possess non-Euclidean characteristics. In order to address this problem, this paper proposes a dynamical graph self-distillation (DGSD) approach for AAD, which does not require speech stimuli as input. Specifically, to effectively represent the non-Euclidean properties of EEG signals, dynamical graph convolutional networks are applied to represent the graph structure of EEG signals, which can also extract crucial features related to auditory spatial attention in EEG signals. In addition, to further improve AAD detection performance, self-distillation, consisting of feature distillation and hierarchical distillation strategies at each layer, is integrated. These strategies leverage features and classification results from the deepest network layers to guide the learning of shallow layers. Our experiments are conducted on two publicly available datasets, KUL and DTU. Under a 1-second time window, we achieve results of 90.0\% and 79.6\% accuracy on KUL and DTU, respectively. We compare our DGSD method with competitive baselines, and the experimental results indicate that the detection performance of our proposed DGSD method is not only superior to the best reproducible baseline but also significantly reduces the number of trainable parameters by approximately 100 times.
results: The study examines the phenomenon of diminishing marginal utility and introduces a novel state representation, the $\lambda$ representation ($\lambda$R), which is required for policy evaluation in this setting and generalizes the SR as well as several other state representations from the literature. The paper establishes the formal properties of the $\lambda$R and examines its normative advantages for machine learning as well as its usefulness for studying natural behaviors, particularly foraging.
Abstract
A common setting in multitask reinforcement learning (RL) demands that an agent rapidly adapt to various stationary reward functions randomly sampled from a fixed distribution. In such situations, the successor representation (SR) is a popular framework which supports rapid policy evaluation by decoupling a policy's expected discounted, cumulative state occupancies from a specific reward function. However, in the natural world, sequential tasks are rarely independent, and instead reflect shifting priorities based on the availability and subjective perception of rewarding stimuli. Reflecting this disjunction, in this paper we study the phenomenon of diminishing marginal utility and introduce a novel state representation, the $\lambda$ representation ($\lambda$R) which, surprisingly, is required for policy evaluation in this setting and which generalizes the SR as well as several other state representations from the literature. We establish the $\lambda$R's formal properties and examine its normative advantages in the context of machine learning, as well as its usefulness for studying natural behaviors, particularly foraging.
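For reference, the successor representation that the $\lambda$R generalizes is the standard expected discounted state-occupancy matrix; the exact $\lambda$R definition is the paper's and is not reproduced here.

```latex
M^{\pi}(s, s') \;=\; \mathbb{E}_{\pi}\!\left[\,\sum_{t=0}^{\infty} \gamma^{t}\,
\mathbb{1}\!\left(s_{t} = s'\right) \,\middle|\, s_{0} = s\right],
\qquad
V^{\pi}_{r}(s) \;=\; \sum_{s'} M^{\pi}(s, s')\, r(s')
% Rapid policy evaluation for a new reward r is just the inner product above;
% this decoupling is exactly what breaks when rewards diminish with repeated
% visits, which motivates the \lambda representation.
```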
results: The study recommends a closed-loop control algorithm that guides the use of pre-trained Artificial Intelligence (AI) models, providing vocabulary filtering, re-training batched models on new datasets, learning online from data streams, and/or using reinforcement learning models to self-update the trained models and reduce errors.
Abstract
This paper examines some common problems in Human-Robot Interaction (HRI) causing failures and troubles in Chat. A given use case's design decisions start with the suitable robot, the suitable chatting model, identifying common problems that cause failures, identifying potential solutions, and planning continuous improvement. In conclusion, it is recommended to use a closed-loop control algorithm that guides the use of trained Artificial Intelligence (AI) pre-trained models and provides vocabulary filtering, re-train batched models on new datasets, learn online from data streams, and/or use reinforcement learning models to self-update the trained models and reduce errors.
A Probabilistic Semi-Supervised Approach with Triplet Markov Chains
results: The framework enables training parameterized triplet Markov chain models in a semi-supervised context and yields semi-supervised algorithms for a variety of generative models for sequential Bayesian classification.
Abstract
Triplet Markov chains are general generative models for sequential data which take into account three kinds of random variables: (noisy) observations, their associated discrete labels and latent variables which aim at strengthening the distribution of the observations and their associated labels. However, in practice, we do not have at our disposal all the labels associated to the observations to estimate the parameters of such models. In this paper, we propose a general framework based on a variational Bayesian inference to train parameterized triplet Markov chain models in a semi-supervised context. The generality of our approach enables us to derive semi-supervised algorithms for a variety of generative models for sequential Bayesian classification.
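For readers new to triplet Markov chains, the defining assumption is that the triplet process itself is Markov; this is the standard formulation, and the paper's parameterization and variational training are not reproduced here.

```latex
% With observations x, labels y, latent variables h, and t_s = (x_s, y_s, h_s):
p(x_{1:T},\, y_{1:T},\, h_{1:T}) \;=\; p(t_1)\, \prod_{s=2}^{T} p(t_s \mid t_{s-1})
% Neither (x, y) nor (x, h) need be Markov on their own; only the triplet is,
% which makes the family strictly more general than hidden and pairwise
% Markov chains.
```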
Short-Term Load Forecasting Using A Particle-Swarm Optimized Multi-Head Attention-Augmented CNN-LSTM Network
results: Evaluated on a genuine electricity demand dataset, the method surpasses existing state-of-the-art approaches in accuracy, robustness, and computational efficiency; notably, its Mean Absolute Percentage Error of 1.9376 marks a significant advancement over existing methods.
Abstract
Short-term load forecasting is of paramount importance in the efficient operation and planning of power systems, given its inherent non-linear and dynamic nature. Recent strides in deep learning have shown promise in addressing this challenge. However, these methods often grapple with hyperparameter sensitivity, opaqueness in interpretability, and high computational overhead for real-time deployment. In this paper, I propose a novel solution that surmounts these obstacles. Our approach harnesses the power of the Particle-Swarm Optimization algorithm to autonomously explore and optimize hyperparameters, a Multi-Head Attention mechanism to discern the salient features crucial for accurate forecasting, and a streamlined framework for computational efficiency. Our method undergoes rigorous evaluation using a genuine electricity demand dataset. The results underscore its superiority in terms of accuracy, robustness, and computational efficiency. Notably, our Mean Absolute Percentage Error of 1.9376 marks a significant advancement over existing state-of-the-art approaches, heralding a new era in short-term load forecasting.
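The hyperparameter search loop can be pictured with a textbook particle-swarm optimizer wrapped around any black-box validation loss; this is a generic sketch, and the attention-augmented CNN-LSTM it would tune in the paper is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

def pso(objective, lb, ub, n_particles=20, iters=50, w=0.7, c1=1.5, c2=1.5):
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    x = rng.uniform(lb, ub, (n_particles, len(lb)))
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_f = np.array([objective(p) for p in x])
    gbest = pbest[np.argmin(pbest_f)].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, *x.shape))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)  # inertia + pulls
        x = np.clip(x + v, lb, ub)
        f = np.array([objective(p) for p in x])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        gbest = pbest[np.argmin(pbest_f)].copy()
    return gbest

# usage: objective would be validation MAPE as a function of hyperparameters
print(pso(lambda p: float((p ** 2).sum()), lb=[-5, -5], ub=[5, 5]))
```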
A computationally lightweight safe learning algorithm
results: We provide theoretical guarantees for the estimates, embed them into a safe learning algorithm, and demonstrate it with numerical experiments on a simulated seven-degrees-of-freedom robot manipulator.
Abstract
Safety is an essential asset when learning control policies for physical systems, as violating safety constraints during training can lead to expensive hardware damage. In response to this need, the field of safe learning has emerged with algorithms that can provide probabilistic safety guarantees without knowledge of the underlying system dynamics. Those algorithms often rely on Gaussian process inference. Unfortunately, Gaussian process inference scales cubically with the number of data points, limiting applicability to high-dimensional and embedded systems. In this paper, we propose a safe learning algorithm that provides probabilistic safety guarantees but leverages the Nadaraya-Watson estimator instead of Gaussian processes. For the Nadaraya-Watson estimator, we can reach logarithmic scaling with the number of data points. We provide theoretical guarantees for the estimates, embed them into a safe learning algorithm, and show numerical experiments on a simulated seven-degrees-of-freedom robot manipulator.
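For readers unfamiliar with the estimator that replaces Gaussian-process inference here, the following is a minimal Nadaraya-Watson regression sketch with a Gaussian kernel; the bandwidth and data are illustrative, and the logarithmic scaling claimed in the paper comes from additional structure not shown in this naive O(n) form.

```python
import numpy as np

def nadaraya_watson(X, y, x_query, bandwidth=0.5):
    """Kernel-weighted average: f(x) = sum_i K(x, x_i) y_i / sum_i K(x, x_i)."""
    d2 = np.sum((X - x_query) ** 2, axis=1)    # squared distances to query
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))   # Gaussian kernel weights
    return np.dot(w, y) / np.sum(w)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
print(nadaraya_watson(X, y, np.array([1.0])))  # approximately sin(1.0)
```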
Alzheimer Disease Detection from Raman Spectroscopy of the Cerebrospinal Fluid via Topological Machine Learning
paper_authors: Francesco Conti, Martina Banchelli, Valentina Bessi, Cristina Cecchi, Fabrizio Chiti, Sara Colantonio, Cristiano D’Andrea, Marella de Angelis, Davide Moroni, Benedetta Nacmias, Maria Antonietta Pascali, Sandro Sorbi, Paolo Matteini
results: The results show that topological analysis of the Raman spectra accurately separates Alzheimer's disease patients from controls, with a classification accuracy above 87%.
Abstract
The cerebrospinal fluid (CSF) of 19 subjects who received a clinical diagnosis of Alzheimer's disease (AD) as well as of 5 pathological controls have been collected and analysed by Raman spectroscopy (RS). We investigated whether the raw and preprocessed Raman spectra could be used to distinguish AD from controls. First, we applied standard Machine Learning (ML) methods obtaining unsatisfactory results. Then, we applied ML to a set of topological descriptors extracted from raw spectra, achieving a very good classification accuracy (>87%). Although our results are preliminary, they indicate that RS and topological analysis together may provide an effective combination to confirm or disprove a clinical diagnosis of AD. The next steps will include enlarging the dataset of CSF samples to validate the proposed method better and, possibly, to understand if topological data analysis could support the characterization of AD subtypes.
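As one plausible reading of "topological descriptors extracted from raw spectra", the sketch below computes persistence diagrams from a sliding-window embedding of a 1D spectrum with the `ripser` package and feeds simple summary statistics to a classifier; the embedding parameters, summaries, and classifier are assumptions, not the authors' pipeline.

```python
import numpy as np
from ripser import ripser
from sklearn.ensemble import RandomForestClassifier

def sliding_window(signal, dim=3, tau=2):
    """Takens-style delay embedding of a 1D signal into R^dim."""
    n = len(signal) - (dim - 1) * tau
    return np.stack([signal[i:i + n] for i in range(0, dim * tau, tau)], axis=1)

def topo_features(spectrum):
    """Crude persistence summaries (count, max/total lifetime) per homology dim."""
    dgms = ripser(sliding_window(np.asarray(spectrum)))['dgms']
    feats = []
    for dgm in dgms:
        finite = np.isfinite(dgm[:, 1])
        life = dgm[finite, 1] - dgm[finite, 0]
        feats += [len(life), life.max() if len(life) else 0.0, life.sum()]
    return feats

# spectra: list of 1D Raman intensity arrays; labels: 1 = AD, 0 = control
# X = np.array([topo_features(s) for s in spectra])
# RandomForestClassifier().fit(X, labels)
```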
Insights Into the Inner Workings of Transformer Models for Protein Function Prediction
results: The results show that this XAI approach identifies the amino acids in protein sequences that the transformers pay particular attention to, and that these relevant sequence parts reflect expectations from biology and chemistry, both in the embedding layer and inside the model. The method also assigns attribution weights to parts of the sequences, and these maps show a statistically significant correspondence with ground-truth sequence annotations (e.g., transmembrane regions and active sites) across many proteins.
Abstract
Motivation: We explored how explainable AI (XAI) can help to shed light into the inner workings of neural networks for protein function prediction, by extending the widely used XAI method of integrated gradients such that latent representations inside of transformer models, which were finetuned to Gene Ontology term and Enzyme Commission number prediction, can be inspected too. Results: The approach enabled us to identify amino acids in the sequences that the transformers pay particular attention to, and to show that these relevant sequence parts reflect expectations from biology and chemistry, both in the embedding layer and inside of the model, where we identified transformer heads with a statistically significant correspondence of attribution maps with ground truth sequence annotations (e.g., transmembrane regions, active sites) across many proteins. Availability and Implementation: Source code can be accessed at https://github.com/markuswenzel/xai-proteins .
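For context, integrated gradients attributes a model's prediction to its inputs by integrating gradients along a straight path from a baseline to the input. The PyTorch sketch below shows the generic method; the paper's extension to latent transformer representations is not reproduced, and the model, output shape, and baseline are placeholders.

```python
import torch

def integrated_gradients(model, x, baseline=None, target=0, steps=50):
    """IG(x) = (x - baseline) * average gradient along the straight path."""
    x = x.detach()
    if baseline is None:
        baseline = torch.zeros_like(x)
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, *([1] * x.dim()))
    path = (baseline + alphas * (x - baseline)).requires_grad_(True)
    model(path)[:, target].sum().backward()  # assumes output shape (steps, n_classes)
    return (x - baseline) * path.grad.mean(dim=0)

# Usage with a placeholder model mapping embeddings to class scores:
# attributions = integrated_gradients(model, embedding, target=class_idx)
```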
Understanding Self-Supervised Learning of Speech Representation via Invariance and Redundancy Reduction
paper_authors: Yusuf Brima, Ulf Krumnack, Simone Pika, Gunther Heidemann
for: investigates how different formulations of the Barlow Twins (BT) objective impact downstream task performance for speech data.
methods: proposes Modified Barlow Twins (MBT) with normalized latents to enforce scale-invariance and evaluates on speaker identification, gender recognition and keyword spotting tasks.
results: improves representation generalization over original BT, especially when fine-tuning with limited target data, highlighting the importance of designing objectives that encourage invariant and transferable representations.
Abstract
The choice of the objective function is crucial in emerging high-quality representations from self-supervised learning. This paper investigates how different formulations of the Barlow Twins (BT) objective impact downstream task performance for speech data. We propose Modified Barlow Twins (MBT) with normalized latents to enforce scale-invariance and evaluate on speaker identification, gender recognition and keyword spotting tasks. Our results show MBT improves representation generalization over original BT, especially when fine-tuning with limited target data. This highlights the importance of designing objectives that encourage invariant and transferable representations. Our analysis provides insights into how the BT learning objective can be tailored to produce speech representations that excel when adapted to new downstream tasks. This study is an important step towards developing reusable self-supervised speech representations.
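To make the objective concrete, here is a minimal PyTorch sketch of the Barlow Twins loss on two augmented views, with the latents additionally normalized to unit norm as a stand-in for the scale-invariance idea behind MBT; the exact MBT formulation in the paper may differ.

```python
import torch
import torch.nn.functional as F

def barlow_twins_loss(z1, z2, lam=5e-3, scale_invariant=True):
    """Cross-correlation loss: drive C toward identity (invariance on the
    diagonal, redundancy reduction off the diagonal)."""
    if scale_invariant:                           # MBT-style unit-norm latents (assumed)
        z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    n, d = z1.shape
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-6)   # batch-normalize each dimension
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-6)
    c = (z1.T @ z2) / n                           # (d, d) cross-correlation matrix
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()
    return on_diag + lam * off_diag
```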
Filtration Surfaces for Dynamic Graph Classification
results: Experiments show that filtration surfaces outperform previous state-of-the-art baselines on datasets that rely on edge weight information, while requiring at most one parameter.
Abstract
Existing approaches for classifying dynamic graphs either lift graph kernels to the temporal domain, or use graph neural networks (GNNs). However, current baselines have scalability issues, cannot handle a changing node set, or do not take edge weight information into account. We propose filtration surfaces, a novel method that is scalable and flexible, to alleviate said restrictions. We experimentally validate the efficacy of our model and show that filtration surfaces outperform previous state-of-the-art baselines on datasets that rely on edge weight information. Our method does so while being either completely parameter-free or having at most one parameter, and yielding the lowest overall standard deviation.
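While the paper's filtration surfaces are not reproduced here, the following hedged sketch illustrates the underlying idea of an edge-weight filtration: sweep a threshold over edge weights and record a graph descriptor (here, the number of connected components) at each step, yielding a curve usable as a feature vector. The descriptor and thresholds are illustrative assumptions.

```python
import networkx as nx
import numpy as np

def filtration_curve(G, thresholds, weight="weight"):
    """Descriptor curve over an edge-weight filtration of graph G."""
    curve = []
    for t in thresholds:
        H = nx.Graph()
        H.add_nodes_from(G.nodes)
        H.add_edges_from((u, v) for u, v, w in G.edges(data=weight) if w <= t)
        curve.append(nx.number_connected_components(H))
    return np.array(curve)

G = nx.Graph()
G.add_weighted_edges_from([(0, 1, 0.2), (1, 2, 0.5), (2, 3, 0.9)])
print(filtration_curve(G, thresholds=np.linspace(0, 1, 5)))  # [4 3 2 2 1]
```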
Your Battery Is a Blast! Safeguarding Against Counterfeit Batteries with Authentication
results: Analysis of 20 processed datasets shows highly accurate battery authentication for both architectures (up to 0.99) and models (up to 0.96), with comparable identification performance.
Abstract
Lithium-ion (Li-ion) batteries are the primary power source in various applications due to their high energy and power density. Their market was estimated to be up to 48 billion U.S. dollars in 2022. However, the widespread adoption of Li-ion batteries has resulted in counterfeit cell production, which can pose safety hazards to users. Counterfeit cells can cause explosions or fires, and their prevalence in the market makes it difficult for users to detect fake cells. Indeed, current battery authentication methods can be susceptible to advanced counterfeiting techniques and are often not adaptable to various cells and systems. In this paper, we improve the state of the art on battery authentication by proposing two novel methodologies, DCAuth and EISthentication, which leverage the internal characteristics of each cell through Machine Learning models. Our methods automatically authenticate lithium-ion battery models and architectures using data from their regular usage without the need for any external device. They are also resilient to the most common and critical counterfeit practices and can scale to several batteries and devices. To evaluate the effectiveness of our proposed methodologies, we analyze time-series data from a total of 20 datasets that we have processed to extract meaningful features for our analysis. Our methods achieve high accuracy in battery authentication for both architectures (up to 0.99) and models (up to 0.96). Moreover, our methods offer comparable identification performances. By using our proposed methodologies, manufacturers can ensure that devices only use legitimate batteries, guaranteeing the operational state of any system and safety measures for the users.
results: The study finds that, for prostate cancer diagnosis (i.e., samples containing malignant tissue, 381/516 tissue samples) and prognosis (i.e., samples from patients with biochemical recurrence following surgery, 98/663 tissue samples), high-attention regions predictive of adverse prognosis do not necessarily co-locate with the tumour regions, indicating that non-cancer cells should also be studied when evaluating prognosis.
Abstract
Recent advances in attention-based multiple instance learning (MIL) have improved our insights into the tissue regions that models rely on to make predictions in digital pathology. However, the interpretability of these approaches is still limited. In particular, they do not report whether high-attention regions are positively or negatively associated with the class labels or how well these regions correspond to previously established clinical and biological knowledge. We address this by introducing a post-training methodology to analyse MIL models. Firstly, we introduce prediction-attention-weighted (PAW) maps by combining tile-level attention and prediction scores produced by a refined encoder, allowing us to quantify the predictive contribution of high-attention regions. Secondly, we introduce a biological feature instantiation technique by integrating PAW maps with nuclei segmentation masks. This further improves interpretability by providing biologically meaningful features related to the cellular organisation of the tissue and facilitates comparisons with known clinical features. We illustrate the utility of our approach by comparing PAW maps obtained for prostate cancer diagnosis (i.e. samples containing malignant tissue, 381/516 tissue samples) and prognosis (i.e. samples from patients with biochemical recurrence following surgery, 98/663 tissue samples) in a cohort of patients from the international cancer genome consortium (ICGC UK Prostate Group). Our approach reveals that regions that are predictive of adverse prognosis do not tend to co-locate with the tumour regions, indicating that non-cancer cells should also be studied when evaluating prognosis.
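The prediction-attention-weighted (PAW) maps are described as combining tile-level attention with tile-level prediction scores; a minimal sketch of such a combination (elementwise product of signed prediction contributions and normalized attention, an assumed formulation) might look like this.

```python
import numpy as np

def paw_map(attention, tile_scores):
    """Combine tile-level attention and prediction scores into one map.

    attention:   (n_tiles,) raw attention values from the MIL model
    tile_scores: (n_tiles,) signed tile-level prediction contributions
    """
    a = np.asarray(attention, dtype=float)
    a = (a - a.min()) / (a.max() - a.min() + 1e-12)   # normalize to [0, 1]
    return a * np.asarray(tile_scores, dtype=float)   # signed, attention-weighted

# Tiles with high attention but negative scores now appear as negative
# evidence, which plain attention maps cannot distinguish.
print(paw_map([0.1, 0.9, 0.5], [0.2, -0.7, 0.4]))
```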
results: Theoretical calculations and empirical validation compare the method with established algorithms under the MCAR and Informative Missingness (IM) scenarios. The Trinary tree outperforms its peers under MCAR, especially when data is only missing out-of-sample, while lagging behind under IM. A hybrid model, the TrinaryMIA tree, which combines the Trinary tree with the Missing In Attributes (MIA) approach, shows robust performance across all types of missingness.
Abstract
This paper introduces the Trinary decision tree, an algorithm designed to improve the handling of missing data in decision tree regressors and classifiers. Unlike other approaches, the Trinary decision tree does not assume that missing values contain any information about the response. Both theoretical calculations on estimator bias and numerical illustrations using real data sets are presented to compare its performance with established algorithms in different missing data scenarios (Missing Completely at Random (MCAR), and Informative Missingness (IM)). Notably, the Trinary tree outperforms its peers in MCAR settings, especially when data is only missing out-of-sample, while lagging behind in IM settings. A hybrid model, the TrinaryMIA tree, which combines the Trinary tree and the Missing In Attributes (MIA) approach, shows robust performance in all types of missingness. Despite the potential drawback of slower training speed, the Trinary tree offers a promising and more accurate method of handling missing data in decision tree algorithms.
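To illustrate the core idea of a third branch for missing values, here is a hedged sketch of prediction in a trinary tree node: observations with a missing split feature are routed to a dedicated child rather than being imputed or sent along a majority branch. The node structure is an illustration, not the paper's implementation.

```python
import math

class TrinaryNode:
    def __init__(self, feature=None, threshold=None, value=None,
                 left=None, right=None, missing=None):
        self.feature, self.threshold, self.value = feature, threshold, value
        self.left, self.right, self.missing = left, right, missing  # 3 children

def predict(node, x):
    """Route x down the tree; NaN in the split feature takes the third branch."""
    if node.value is not None:                       # leaf
        return node.value
    v = x[node.feature]
    if v is None or (isinstance(v, float) and math.isnan(v)):
        return predict(node.missing, x)              # dedicated missing branch
    return predict(node.left if v <= node.threshold else node.right, x)

leaf = lambda y: TrinaryNode(value=y)
tree = TrinaryNode(feature=0, threshold=0.5,
                   left=leaf(1.0), right=leaf(2.0), missing=leaf(1.4))
print(predict(tree, [float("nan")]))  # -> 1.4, without assuming informativeness
```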
On the dynamics of multi agent nonlinear filtering and learning
for: Multiagent systems aim to accomplish highly complex learning tasks through decentralized consensus seeking dynamics, and their use has garnered a great deal of attention in the signal processing and computational intelligence societies.
methods: The paper presents a general formulation for the actions of an agent in multiagent networked systems and conditions for achieving a cohesive learning behavior.
results: The paper applies the derived framework in distributed and federated learning scenarios.
Abstract
Multiagent systems aim to accomplish highly complex learning tasks through decentralised consensus seeking dynamics and their use has garnered a great deal of attention in the signal processing and computational intelligence societies. This article examines the behaviour of multiagent networked systems with nonlinear filtering/learning dynamics. To this end, a general formulation for the actions of an agent in multiagent networked systems is presented and conditions for achieving a cohesive learning behaviour are given. Importantly, applications of the so-derived framework in distributed and federated learning scenarios are presented.
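A common concrete instance of decentralized consensus-seeking learning is "consensus + innovation": each agent mixes its neighbors' parameters through a weight matrix and then takes a local gradient step. The sketch below shows that generic update; the mixing weights, step size, and quadratic local losses are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def consensus_step(theta, W, grads, step=0.1):
    """theta_i <- sum_j W_ij theta_j - step * grad_i  for every agent i.

    theta: (n_agents, dim) parameters, W: (n_agents, n_agents) row-stochastic
    mixing matrix, grads: (n_agents, dim) local gradients.
    """
    return W @ theta - step * grads

# Three agents, each with a local quadratic loss 0.5 * ||theta - b_i||^2.
W = np.array([[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]])
b = np.array([[1.0], [2.0], [3.0]])
theta = np.zeros((3, 1))
for _ in range(200):
    theta = consensus_step(theta, W, grads=theta - b)
print(theta.ravel())  # agents settle near the average of the b_i; exact
                      # consensus would need a decaying step or gradient tracking
```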
MVD: A Novel Methodology and Dataset for Acoustic Vehicle Type Classification
results: Experimental results show that our method classifies the acoustic signals with high accuracy, exceeding the established baselines of previous works with 91.98% and 96.66% accuracy on the MVD and MVDA datasets, respectively. The model was also deployed through an Android application to demonstrate its usability.
Abstract
Rising urban populations have led to a surge in vehicle use and made traffic monitoring and management indispensable. Acoustic traffic monitoring (ATM) offers a cost-effective and efficient alternative to more computationally expensive methods of monitoring traffic such as those involving computer vision technologies. In this paper, we present MVD and MVDA: two open datasets for the development of acoustic traffic monitoring and vehicle-type classification algorithms, which contain audio recordings of moving vehicles. The datasets contain four classes: Trucks, Cars, Motorbikes, and a No-vehicle class. Additionally, we propose a novel and efficient way to accurately classify these acoustic signals using cepstrum and spectrum based local and global audio features, and a multi-input neural network. Experimental results show that our methodology improves upon the established baselines of previous works and achieves an accuracy of 91.98% and 96.66% on MVD and MVDA Datasets, respectively. Finally, the proposed model was deployed through an Android application to make it accessible for testing and demonstrate its efficacy.
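Cepstral features of the kind mentioned above are commonly extracted with librosa; the snippet below computes per-clip MFCC statistics as a hedged stand-in for the paper's local and global cepstrum- and spectrum-based features (the exact feature set and network inputs are not specified here, and the file name is hypothetical).

```python
import librosa
import numpy as np

def clip_features(path, sr=22050, n_mfcc=13):
    """Load an audio clip and summarize it with mean/std MFCCs."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # (n_mfcc, frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# feats = clip_features("truck_042.wav")  # hypothetical file name
# feats.shape -> (26,): a global descriptor to feed a classifier head
```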
Subgraph-based Tight Frames on Graphs with Compact Supports and Vanishing Moments
methods: The method uses a series of hierarchical partitions and incorporates subgraph Laplacians into the design, so that the (subgraph) vanishing moments of the framelets and extra properties, such as directionality, can be adjusted freely for efficiently representing graph signals.
results: Experimental results show that the proposed graph frames perform superiorly in non-linear approximation tasks.
Abstract
In this work, we proposed a novel and general method to construct tight frames on graphs with compact supports based on a series of hierarchical partitions. Starting from our abstract construction that generalizes previous methods based on partition trees, we are able to flexibly incorporate subgraph Laplacians into our design of graph frames. Consequently, our general methods permit adjusting the (subgraph) vanishing moments of the framelets and extra properties, such as directionality, for efficiently representing graph signals with path-like supports. Several variants are explicitly defined and tested. Experimental results show our proposed graph frames perform superiorly in non-linear approximation tasks.
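The paper's tight-frame construction is not reproduced here, but the following sketch illustrates the basic ingredient of compactly supported atoms built from subgraph Laplacians on a partition: each atom is an eigenvector of a subgraph Laplacian, zero outside its block. The partition and the use of plain eigenvectors are illustrative assumptions.

```python
import networkx as nx
import numpy as np

def subgraph_atoms(G, blocks):
    """Compactly supported atoms from subgraph Laplacian eigenvectors.

    blocks: list of node lists forming a partition of G. Each atom vanishes
    outside its block, mirroring the compact-support idea (illustrative only).
    """
    idx = {v: i for i, v in enumerate(G.nodes)}
    atoms = []
    for block in blocks:
        L = nx.laplacian_matrix(G.subgraph(block), nodelist=list(block))
        _, vecs = np.linalg.eigh(L.toarray().astype(float))
        for k in range(vecs.shape[1]):
            atom = np.zeros(G.number_of_nodes())
            atom[[idx[v] for v in block]] = vecs[:, k]
            atoms.append(atom)
    return np.array(atoms)   # rows form an orthonormal set within each block

G = nx.path_graph(6)
A = subgraph_atoms(G, blocks=[[0, 1, 2], [3, 4, 5]])
print(A.shape)  # (6, 6): a basis here; tight frames add redundancy across levels
```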
Privacy-preserving Continual Federated Clustering via Adaptive Resonance Theory
results: Experimental results on synthetic and real-world datasets show that the proposed algorithm achieves superior clustering performance while realizing data privacy protection and continual learning ability.
Abstract
With the increasing importance of data privacy protection, various privacy-preserving machine learning methods have been proposed. In the clustering domain, various algorithms with a federated learning framework (i.e., federated clustering) have been actively studied and showed high clustering performance while preserving data privacy. However, most of the base clusterers (i.e., clustering algorithms) used in existing federated clustering algorithms need to specify the number of clusters in advance. These algorithms, therefore, are unable to deal with data whose distributions are unknown or continually changing. To tackle this problem, this paper proposes a privacy-preserving continual federated clustering algorithm. In the proposed algorithm, an adaptive resonance theory-based clustering algorithm capable of continual learning is used as a base clusterer. Therefore, the proposed algorithm inherits the ability of continual learning. Experimental results with synthetic and real-world datasets show that the proposed algorithm has superior clustering performance to state-of-the-art federated clustering algorithms while realizing data privacy protection and continual learning ability. The source code is available at \url{https://github.com/Masuyama-lab/FCAC}.
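The base clusterer is adaptive-resonance-theory-based; as background, the sketch below implements one presentation step of a minimal Fuzzy ART learner (complement coding, choice function, vigilance test, and prototype update). The parameter values are illustrative, and the federated and privacy machinery of the paper is not shown.

```python
import numpy as np

def fuzzy_art_step(x, weights, rho=0.75, alpha=1e-3, beta=1.0):
    """One Fuzzy ART presentation: returns (category index, updated weights)."""
    I = np.concatenate([x, 1.0 - x])                   # complement coding
    if not weights:
        return 0, [I.copy()]                           # first category
    T = [np.minimum(I, w).sum() / (alpha + w.sum()) for w in weights]
    for j in np.argsort(T)[::-1]:                      # best-matching first
        match = np.minimum(I, weights[j]).sum() / I.sum()
        if match >= rho:                               # vigilance test passed
            weights[j] = beta * np.minimum(I, weights[j]) + (1 - beta) * weights[j]
            return j, weights
    weights.append(I.copy())                           # resonance failed: new node
    return len(weights) - 1, weights

weights = []
for x in np.random.default_rng(0).random((20, 2)):
    j, weights = fuzzy_art_step(x, weights)
print(len(weights), "categories")   # grows with data: no preset cluster count
```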
Cross-domain Sound Recognition for Efficient Underwater Data Analysis
results: A neural network model trained on both non-underwater and underwater data recognizes airgun sounds effectively, as measured by precision, recall, and F1 score (F1 above 84.3%).
Abstract
This paper presents a novel deep learning approach for analyzing massive underwater acoustic data by leveraging a model trained on a broad spectrum of non-underwater (aerial) sounds. Recognizing the challenge in labeling vast amounts of underwater data, we propose a two-fold methodology to accelerate this labor-intensive procedure. The first part of our approach involves PCA and UMAP visualization of the underwater data using the feature vectors of an aerial sound recognition model. This enables us to cluster the data in a two dimensional space and listen to points within these clusters to understand their defining characteristics. This innovative method simplifies the process of selecting candidate labels for further training. In the second part, we train a neural network model using both the selected underwater data and the non-underwater dataset. We conducted a quantitative analysis to measure the precision, recall, and F1 score of our model for recognizing airgun sounds, a common type of underwater sound. The F1 score achieved by our model exceeded 84.3%, demonstrating the effectiveness of our approach in analyzing underwater acoustic data. The methodology presented in this paper holds significant potential to reduce the amount of labor required in underwater data analysis and opens up new possibilities for further research in the field of cross-domain data analysis.
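The first stage of the pipeline, projecting aerial-model embeddings of underwater clips into two dimensions for cluster-and-listen labeling, can be sketched as follows; the embedding source is a placeholder and the umap-learn defaults are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
import umap

# embeddings: (n_clips, d) feature vectors from a pretrained aerial sound
# recognition model (placeholder random array here).
embeddings = np.random.default_rng(0).random((500, 128))

coords_pca = PCA(n_components=2).fit_transform(embeddings)
coords_umap = umap.UMAP(n_components=2, random_state=0).fit_transform(embeddings)

# Plot either projection, inspect clusters, and listen to the clips behind
# selected points to pick candidate labels for further training.
```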
Broadband Ground Motion Synthesis via Generative Adversarial Neural Operators: Development and Validation
paper_authors: Yaozhong Shi, Grigorios Lavrentiadis, Domniki Asimaki, Zachary E. Ross, Kamyar Azizzadenesheli
for: Ground-motion synthesis using a Generative Adversarial Neural Operator (GANO) to generate three-component acceleration time histories.
methods: Uses Neural Operators, a resolution-invariant architecture that guarantees model training is independent of the data sampling frequency, within a conditional ground-motion synthesis algorithm (cGM-GANO) that combines recent advancements in machine learning with open-access strong-motion datasets.
results: cGM-GANO recovers the magnitude, distance, and $V_{S30}$ scaling of Fourier amplitude and pseudo-spectral accelerations. Evaluated through residual analysis with the empirical dataset and comparison with conventional Ground Motion Models (GMMs) for selected ground-motion scenarios, it produces consistent median scaling with the GMMs for the corresponding tectonic environments, with the largest misfit observed at short distances.
Abstract
We present a data-driven model for ground-motion synthesis using a Generative Adversarial Neural Operator (GANO) that combines recent advancements in machine learning and open access strong motion data sets to generate three-component acceleration time histories conditioned on moment magnitude ($M$), rupture distance ($R_{rup}$), time-average shear-wave velocity at the top $30m$ ($V_{S30}$), and tectonic environment or style of faulting. We use Neural Operators, a resolution invariant architecture that guarantees that the model training is independent of the data sampling frequency. We first present the conditional ground-motion synthesis algorithm (referred to heretofore as cGM-GANO) and discuss its advantages compared to previous work. Next, we verify the cGM-GANO framework using simulated ground motions generated with the Southern California Earthquake Center (SCEC) Broadband Platform (BBP). We lastly train cGM-GANO on a KiK-net dataset from Japan, showing that the framework can recover the magnitude, distance, and $V_{S30}$ scaling of Fourier amplitude and pseudo-spectral accelerations. We evaluate cGM-GANO through residual analysis with the empirical dataset as well as by comparison with conventional Ground Motion Models (GMMs) for selected ground motion scenarios. Results show that cGM-GANO produces consistent median scaling with the GMMs for the corresponding tectonic environments. The largest misfit is observed at short distances due to the scarcity of training data. With the exception of short distances, the aleatory variability of the response spectral ordinates is also well captured, especially for subduction events due to the adequacy of training data. Applications of the presented framework include generation of risk-targeted ground motions for site-specific engineering applications.
Personalized Tucker Decomposition: Modeling Commonality and Peculiarity on Tensor Data
results: The effectiveness of perTucker is demonstrated in anomaly detection, client classification, and clustering through a simulation study and two case studies on solar flare detection and tonnage signal classification.
Abstract
We propose personalized Tucker decomposition (perTucker) to address the limitations of traditional tensor decomposition methods in capturing heterogeneity across different datasets. perTucker decomposes tensor data into shared global components and personalized local components. We introduce a mode orthogonality assumption and develop a proximal gradient regularized block coordinate descent algorithm that is guaranteed to converge to a stationary point. By learning unique and common representations across datasets, we demonstrate perTucker's effectiveness in anomaly detection, client classification, and clustering through a simulation study and two case studies on solar flare detection and tonnage signal classification.
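As background for the decomposition being personalized, a plain Tucker decomposition with tensorly looks like this; the split into shared global and personalized local components, the mode-orthogonality assumption, and the proximal block-coordinate solver from the paper are not reproduced.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker

tl.set_backend("numpy")
X = np.random.default_rng(0).random((10, 12, 8))   # placeholder tensor data

core, factors = tucker(X, rank=[3, 4, 2])          # core G; factors [A1, A2, A3]
X_hat = tl.tucker_to_tensor((core, factors))       # low-rank reconstruction
print(np.linalg.norm(X - X_hat) / np.linalg.norm(X))

# perTucker additionally splits the factors into shared and client-specific
# parts, so each dataset is modeled by global components plus its own local ones.
```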
Byzantine-Robust Federated Learning with Variance Reduction and Differential Privacy
for: Preserving data privacy while strengthening the robustness of model training against Byzantine attacks.
methods: Introduces sparsification- and momentum-driven variance reduction into the client-level differential privacy (DP) mechanism, with a security design that preserves the client-level privacy guarantee.
results: Extensive experiments on both IID and non-IID datasets and different tasks show that the framework improves system robustness against Byzantine attacks while maintaining a strong privacy guarantee.
Abstract
Federated learning (FL) is designed to preserve data privacy during model training, where the data remains on the client side (i.e., IoT devices), and only model updates of clients are shared iteratively for collaborative learning. However, this process is vulnerable to privacy attacks and Byzantine attacks: the local model updates shared throughout the FL network will leak private information about the local training data, and they can also be maliciously crafted by Byzantine attackers to disturb the learning. In this paper, we propose a new FL scheme that guarantees rigorous privacy and simultaneously enhances system robustness against Byzantine attacks. Our approach introduces sparsification- and momentum-driven variance reduction into the client-level differential privacy (DP) mechanism, to defend against Byzantine attackers. The security design does not violate the privacy guarantee of the client-level DP mechanism; hence, our approach achieves the same client-level DP guarantee as the state-of-the-art. We conduct extensive experiments on both IID and non-IID datasets and different tasks and evaluate the performance of our approach against different Byzantine attacks by comparing it with state-of-the-art defense methods. The results of our experiments show the efficacy of our framework and demonstrate its ability to improve system robustness against Byzantine attacks while achieving a strong privacy guarantee.
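A minimal sketch of the client-side step described above: sparsify the model update, apply a momentum buffer for variance reduction, then clip and add Gaussian noise for client-level DP. Top-k sparsification, the momentum rule, and all constants are illustrative assumptions rather than the paper's exact mechanism.

```python
import numpy as np

def private_client_update(delta, momentum, k=100, beta=0.9,
                          clip=1.0, sigma=0.5, rng=None):
    """Sparsify, apply momentum variance reduction, then clip + Gaussian noise."""
    rng = np.random.default_rng(0) if rng is None else rng
    sparse = np.zeros_like(delta)
    top = np.argsort(np.abs(delta))[-k:]               # top-k sparsification
    sparse[top] = delta[top]
    momentum = beta * momentum + (1 - beta) * sparse   # variance reduction
    clipped = momentum * min(1.0, clip / (np.linalg.norm(momentum) + 1e-12))
    noisy = clipped + rng.normal(0.0, sigma * clip, size=clipped.shape)
    return noisy, momentum                             # only `noisy` leaves the client

delta = np.random.default_rng(1).standard_normal(1000)
update, m = private_client_update(delta, momentum=np.zeros(1000))
```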
Equal Long-term Benefit Rate: Adapting Static Fairness Notions to Sequential Decision Making
results: Experiments show that ELBERT-PO significantly reduces bias while maintaining high utility. Code is available at https://github.com/Yuancheng-Xu/ELBERT.
Abstract
Decisions made by machine learning models may have lasting impacts over time, making long-term fairness a crucial consideration. It has been shown that when ignoring the long-term effect, naively imposing fairness criterion in static settings can actually exacerbate bias over time. To explicitly address biases in sequential decision-making, recent works formulate long-term fairness notions in Markov Decision Process (MDP) framework. They define the long-term bias to be the sum of static bias over each time step. However, we demonstrate that naively summing up the step-wise bias can cause a false sense of fairness since it fails to consider the importance difference of different time steps during transition. In this work, we introduce a long-term fairness notion called Equal Long-term Benefit Rate (ELBERT), which explicitly considers varying temporal importance and adapts static fairness principles to the sequential setting. Moreover, we show that the policy gradient of Long-term Benefit Rate can be analytically reduced to standard policy gradient. This makes standard policy optimization methods applicable for reducing the bias, leading to our proposed bias mitigation method ELBERT-PO. Experiments on three sequential decision making environments show that ELBERT-PO significantly reduces bias and maintains high utility. Code is available at https://github.com/Yuancheng-Xu/ELBERT.
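To make the notion concrete, the sketch below computes each group's long-term benefit rate as cumulative benefit divided by cumulative qualification over an episode, so that time steps are weighted by how many group members they actually affect, rather than summing per-step bias; the exact reward decomposition in ELBERT may differ.

```python
import numpy as np

def long_term_benefit_rate(benefit, qualified):
    """Cumulative benefit / cumulative qualification over an episode.

    benefit, qualified: (T,) per-step totals for one group. Dividing the sums
    (instead of averaging per-step ratios) weights steps by group size, which
    naive step-wise bias summation ignores.
    """
    benefit, qualified = np.asarray(benefit, float), np.asarray(qualified, float)
    return benefit.sum() / qualified.sum()

# Two groups over T = 3 steps; step 2 involves far more group-B individuals.
rate_a = long_term_benefit_rate(benefit=[5, 5, 5], qualified=[10, 10, 10])
rate_b = long_term_benefit_rate(benefit=[5, 20, 5], qualified=[10, 100, 10])
bias = abs(rate_a - rate_b)       # ELBERT-style long-term bias
print(rate_a, rate_b, bias)       # 0.5 vs 0.25: step-wise sums understate this gap
```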