results: The study finds significant acoustic differences between the vocalizations of pet dogs raised in different language environments, and identifies several vocal features that are potentially correlated with their owners' language patterns.
Abstract
How a host's language influences their pets' vocalizations is an interesting yet underexplored problem. This paper presents a preliminary investigation into the possible correlation between domestic dog vocal expressions and their human host's language environment. We first present a new dataset of Shiba Inu dog vocals from YouTube, which provides 7500 clean sound clips, together with contextual information for these vocalizations and their owners' speech clips, built with a carefully designed data processing pipeline. The contextual information includes the scene category in which the vocal was recorded, the dog's location and activity. With a classification task and prominent factor analysis, we discover significant acoustic differences in the dog vocals from the two language environments. We further identify some acoustic features from dog vocalizations that are potentially correlated to their host language patterns.
Trip Planning for Autonomous Vehicles with Wireless Data Transfer Needs Using Reinforcement Learning
results: Compared with traffic-unaware and bandwidth-unaware baselines, the proposed solution performs markedly better across different heterogeneous traffic conditions, better meeting vehicles' data transfer requirements while keeping driving time low.
Abstract
With recent advancements in the field of communications and the Internet of Things, vehicles are becoming more aware of their environment and are evolving towards full autonomy. Vehicular communication opens up the possibility for vehicle-to-infrastructure interaction, where vehicles could share information with components such as cameras, traffic lights, and signage that support a country's road system. As a result, vehicles are becoming more than just a means of transportation; they are collecting, processing, and transmitting massive amounts of data used to make driving safer and more convenient. With 5G cellular networks and beyond, there is going to be more data bandwidth available on our roads, but it may be heterogeneous because of limitations like line of sight, infrastructure, and heterogeneous traffic on the road. This paper addresses the problem of route planning for autonomous vehicles in urban areas accounting for both driving time and data transfer needs. We propose a novel reinforcement learning solution that prioritizes high bandwidth roads to meet a vehicle's data transfer requirement, while also minimizing driving time. We compare this approach to traffic-unaware and bandwidth-unaware baselines to show how much better it performs under heterogeneous traffic. This solution could be used as a starting point to understand what good policies look like, which could potentially yield faster, more efficient heuristics in the future.
Confidence Calibration for Systems with Cascaded Predictive Modules
results: The authors provide both theoretical justification and empirical evidence for the effectiveness of the approach. Compared with prediction intervals calibrated for individual modules, the method produces more accurate intervals with stronger performance guarantees for system predictions.
Abstract
Existing conformal prediction algorithms estimate prediction intervals at target confidence levels to characterize the performance of a regression model on new test samples. However, considering an autonomous system consisting of multiple modules, prediction intervals constructed for individual modules fall short of accommodating uncertainty propagation over different modules and thus cannot provide reliable predictions on system behavior. We address this limitation and present novel solutions based on conformal prediction to provide prediction intervals calibrated for a predictive system consisting of cascaded modules (e.g., an upstream feature extraction module and a downstream regression module). Our key idea is to leverage module-level validation data to characterize the system-level error distribution without direct access to end-to-end validation data. We provide theoretical justification and empirical experimental results to demonstrate the effectiveness of proposed solutions. In comparison to prediction intervals calibrated for individual modules, our solutions generate improved intervals with more accurate performance guarantees for system predictions, which are demonstrated on both synthetic systems and real-world systems performing overlap prediction for indoor navigation using the Matterport3D dataset.
results: Achieves state-of-the-art performance on joint trajectory metrics on popular trajectory forecasting datasets, and enables direct test-time sampling from a variety of valuable conditional distributions.
Abstract
Simulation of autonomous vehicle systems requires that simulated traffic participants exhibit diverse and realistic behaviors. The use of prerecorded real-world traffic scenarios in simulation ensures realism but the rarity of safety critical events makes large scale collection of driving scenarios expensive. In this paper, we present DJINN - a diffusion based method of generating traffic scenarios. Our approach jointly diffuses the trajectories of all agents, conditioned on a flexible set of state observations from the past, present, or future. On popular trajectory forecasting datasets, we report state of the art performance on joint trajectory metrics. In addition, we demonstrate how DJINN flexibly enables direct test-time sampling from a variety of valuable conditional distributions including goal-based sampling, behavior-class sampling, and scenario editing.
User-Level Differential Privacy With Few Examples Per User
For: This paper studies user-level differential privacy (DP) in the example-scarce regime, where each user has only a few examples.
Methods: A generic transformation from item-level DP algorithms to user-level DP algorithms, plus an adaptation of the exponential mechanism (McSherry, Talwar FOCS 2007) to the user-level setting.
Results: For approximate-DP, the transformation yields a multiplicative savings of $O_{\varepsilon,\delta}(\sqrt{m})$ in the number of users required for the same utility, where $m$ is the number of examples per user. For pure-DP, the adapted exponential mechanism gives new bounds for tasks such as private PAC learning, hypothesis selection, and distribution learning, and these bounds are shown to be near-optimal for some of these problems.
Abstract
Previous work on user-level differential privacy (DP) [Ghazi et al. NeurIPS 2021, Bun et al. STOC 2023] obtained generic algorithms that work for various learning tasks. However, their focus was on the example-rich regime, where the users have so many examples that each user could themselves solve the problem. In this work we consider the example-scarce regime, where each user has only a few examples, and obtain the following results: 1. For approximate-DP, we give a generic transformation of any item-level DP algorithm to a user-level DP algorithm. Roughly speaking, the latter gives a (multiplicative) savings of $O_{\varepsilon,\delta}(\sqrt{m})$ in terms of the number of users required for achieving the same utility, where $m$ is the number of examples per user. This algorithm, while recovering most known bounds for specific problems, also gives new bounds, e.g., for PAC learning. 2. For pure-DP, we present a simple technique for adapting the exponential mechanism [McSherry, Talwar FOCS 2007] to the user-level setting. This gives new bounds for a variety of tasks, such as private PAC learning, hypothesis selection, and distribution learning. For some of these problems, we show that our bounds are near-optimal.
Evidential uncertainties on rich labels for active learning
methods: Two strategies, both based on the theory of belief functions: sampling by Klir uncertainty and sampling by evidential epistemic uncertainty.
results: The results show that these strategies better address the exploration-exploitation trade-off and more faithfully account for the uncertainty already present in the labels.
Abstract
Recent research in active learning, and more precisely in uncertainty sampling, has focused on the decomposition of model uncertainty into reducible and irreducible uncertainties. In this paper, we propose to simplify the computational phase and remove the dependence on observations, but more importantly to take into account the uncertainty already present in the labels, \emph{i.e.} the uncertainty of the oracles. Two strategies are proposed, sampling by Klir uncertainty, which addresses the exploration-exploitation problem, and sampling by evidential epistemic uncertainty, which extends the reducible uncertainty to the evidential framework, both using the theory of belief functions.
Sharpness-Aware Minimization and the Edge of Stability
methods: The paper carries out an "edge of stability" calculation for SAM, based on a local quadratic approximation of the loss.
results: Empirically, SAM operates at an edge of stability that, unlike the GD case, depends on the norm of the gradient. These findings are validated on three deep learning training tasks.
Abstract
Recent experiments have shown that, often, when training a neural network with gradient descent (GD) with a step size $\eta$, the operator norm of the Hessian of the loss grows until it approximately reaches $2/\eta$, after which it fluctuates around this value. The quantity $2/\eta$ has been called the "edge of stability" based on consideration of a local quadratic approximation of the loss. We perform a similar calculation to arrive at an "edge of stability" for Sharpness-Aware Minimization (SAM), a variant of GD which has been shown to improve its generalization. Unlike the case for GD, the resulting SAM-edge depends on the norm of the gradient. Using three deep learning training tasks, we see empirically that SAM operates on the edge of stability identified by this analysis.
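To see where the $2/\eta$ threshold comes from, recall the textbook calculation (sketched here for plain GD; the paper repeats it for the SAM update, which is why the resulting edge additionally depends on the gradient norm). Approximating the loss near a minimizer $\theta^*$ by a quadratic with Hessian $H$, the GD update becomes $\theta_{t+1} - \theta^* = (I - \eta H)(\theta_t - \theta^*)$. The iterates stay bounded exactly when every eigenvalue $\lambda$ of $H$ satisfies $|1 - \eta \lambda| \le 1$, i.e. $\lambda_{\max}(H) \le 2/\eta$, so the operator norm of the Hessian hovering around $2/\eta$ marks the edge of stability.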
Robust Energy Consumption Prediction with a Missing Value-Resilient Metaheuristic-based Neural Network in Mobile App Development
results: Experiments show that the proposed approach yields significant results for energy consumption prediction.
Abstract
Energy consumption is a fundamental concern in mobile application development, bearing substantial significance for both developers and end-users. Moreover, it is a critical determinant in the consumer's decision-making process when considering a smartphone purchase. From the sustainability perspective, it becomes imperative to explore approaches aimed at mitigating the energy consumption of mobile devices, given the significant global consequences arising from the extensive utilisation of billions of smartphones, which imparts a profound environmental impact. Despite the existence of various energy-efficient programming practices within the Android platform, the dominant mobile ecosystem, there remains a need for documented machine learning-based energy prediction algorithms tailored explicitly for mobile app development. Hence, the main objective of this research is to propose a novel neural network-based framework, enhanced by a metaheuristic approach, to achieve robust energy prediction in the context of mobile app development. The metaheuristic approach here plays a crucial role in not only identifying suitable learning algorithms and their corresponding parameters but also determining the optimal number of layers and neurons within each layer. To the best of our knowledge, prior studies have yet to employ any metaheuristic algorithm to address all these hyperparameters simultaneously. Moreover, due to limitations in accessing certain aspects of a mobile phone, there might be missing data in the data set, and the proposed framework can handle this. In addition, we conducted an optimal algorithm selection strategy, employing 13 metaheuristic algorithms, to identify the best algorithm based on accuracy and resistance to missing values. The comprehensive experiments demonstrate that our proposed approach yields significant outcomes for energy consumption prediction.
results: When both connection and heterogeneity exist between modalities, multimodal learning enjoys a superior generalization bound compared to unimodal learning, by up to a factor of $O(\sqrt{n})$.
Abstract
Human perception of the empirical world involves recognizing the diverse appearances, or 'modalities', of underlying objects. Despite the longstanding consideration of this perspective in philosophy and cognitive science, the study of multimodality remains relatively under-explored within the field of machine learning. Nevertheless, current studies of multimodal machine learning are limited to empirical practices, lacking theoretical foundations beyond heuristic arguments. An intriguing finding from the practice of multimodal learning is that a model trained on multiple modalities can outperform a finely-tuned unimodal model, even on unimodal tasks. This paper provides a theoretical framework that explains this phenomenon, by studying generalization properties of multimodal learning algorithms. We demonstrate that multimodal learning allows for a superior generalization bound compared to unimodal learning, up to a factor of $O(\sqrt{n})$, where $n$ represents the sample size. Such advantage occurs when both connection and heterogeneity exist between the modalities.
A Convex Framework for Confounding Robust Inference
results: A general estimator that provides a sharp lower bound on the policy value, with extensions to sensitivity analysis, model selection, and robust policy learning.
Abstract
We study policy evaluation of offline contextual bandits subject to unobserved confounders. Sensitivity analysis methods are commonly used to estimate the policy value under the worst-case confounding over a given uncertainty set. However, existing work often resorts to some coarse relaxation of the uncertainty set for the sake of tractability, leading to overly conservative estimation of the policy value. In this paper, we propose a general estimator that provides a sharp lower bound of the policy value using convex programming. The generality of our estimator enables various extensions such as sensitivity analysis with f-divergence, model selection with cross validation and information criterion, and robust policy learning with the sharp lower bound. Furthermore, our estimation method can be reformulated as an empirical risk minimization problem thanks to the strong duality, which enables us to provide strong theoretical guarantees of the proposed estimator using techniques of the M-estimation.
Change Management using Generative Modeling on Digital Twins
paper_authors: Nilanjana Das, Anantaa Kotal, Daniel Roseberry, Anupam Joshi
For: The paper is written for small and medium-sized businesses that need to securely manage software updates and changes, but do not have the resources to set up a non-production environment for stress testing.
Methods: The paper proposes using "digital twins" on the cloud to create a non-production environment for stress testing software changes, and using Generative Artificial Intelligence (AI) models to generate testing scenarios to check for points of failure.
Results: The paper shows how using digital twins and Generative AI models can help small and medium-sized businesses securely test software changes before releasing them into production, without the need for a dedicated non-production environment.
Abstract
A key challenge faced by small and medium-sized business entities is securely managing software updates and changes. Specifically, with rapidly evolving cybersecurity threats, changes/updates/patches to software systems are necessary to stay ahead of emerging threats and are often mandated by regulators or statutory authorities to counter these. However, security patches/updates require stress testing before they can be released in the production system. Stress testing in production environments is risky and poses security threats. Large businesses usually have a non-production environment where such changes can be made and tested before being released into production. Smaller businesses do not have such facilities. In this work, we show how "digital twins", especially for a mix of IT and IoT environments, can be created on the cloud. These digital twins act as a non-production environment where changes can be applied, and the system can be securely tested before patch release. Additionally, the non-production digital twin can be used to collect system data and run stress tests on the environment, both manually and automatically. In this paper, we show how using a small sample of real data/interactions, Generative Artificial Intelligence (AI) models can be used to generate testing scenarios to check for points of failure.
Performance Conditioning for Diffusion-Based Multi-Instrument Music Synthesis
results: Evaluated on uncurated performances with diverse instrumentation, the approach achieves state-of-the-art FAD realism scores while enabling novel timbre and style control. Details are available at benadar293.github.io/midipm.
Abstract
Generating multi-instrument music from symbolic music representations is an important task in Music Information Retrieval (MIR). A central but still largely unsolved problem in this context is musically and acoustically informed control in the generation process. As the main contribution of this work, we propose enhancing control of multi-instrument synthesis by conditioning a generative model on a specific performance and recording environment, thus allowing for better guidance of timbre and style. Building on state-of-the-art diffusion-based music generative models, we introduce performance conditioning - a simple tool indicating the generative model to synthesize music with style and timbre of specific instruments taken from specific performances. Our prototype is evaluated using uncurated performances with diverse instrumentation and achieves state-of-the-art FAD realism scores while allowing novel timbre and style control. Our project page, including samples and demonstrations, is available at benadar293.github.io/midipm
The Broad Impact of Feature Imitation: Neural Enhancements Across Financial, Speech, and Physiological Domains
results: In three experiments (Bitcoin price prediction, speech emotion recognition, and chronic neck pain detection), the FIN approach reduces root mean square error by around 1000 and improves accuracy by over 3% and about 7%, respectively.
Abstract
Initialization of neural network weights plays a pivotal role in determining their performance. Feature Imitating Networks (FINs) offer a novel strategy by initializing weights to approximate specific closed-form statistical features, setting a promising foundation for deep learning architectures. While the applicability of FINs has been chiefly tested in biomedical domains, this study extends its exploration into other time series datasets. Three different experiments are conducted in this study to test the applicability of imitating Tsallis entropy for performance enhancement: Bitcoin price prediction, speech emotion recognition, and chronic neck pain detection. For the Bitcoin price prediction, models embedded with FINs reduced the root mean square error by around 1000 compared to the baseline. In the speech emotion recognition task, the FIN-augmented model increased classification accuracy by over 3 percent. Lastly, in the CNP detection experiment, an improvement of about 7 percent was observed compared to established classifiers. These findings validate the broad utility and potency of FINs in diverse applications.
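Since the experiments revolve around imitating Tsallis entropy, the closed-form feature being approximated is worth recalling. The snippet below is only an illustration: the entropic index $q$, the histogram-based probability estimate, and the bin count are assumptions made for the example, not values taken from the paper.

```python
import numpy as np

def tsallis_entropy(x, q=2.0, bins=32):
    """Tsallis entropy S_q = (1 - sum_i p_i^q) / (q - 1) of a 1-D signal,
    estimated from a normalized histogram (q and the binning are illustrative)."""
    counts, _ = np.histogram(x, bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]
    if np.isclose(q, 1.0):              # q -> 1 recovers Shannon entropy
        return float(-(p * np.log(p)).sum())
    return float((1.0 - (p ** q).sum()) / (q - 1.0))

rng = np.random.default_rng(0)
print(tsallis_entropy(rng.normal(size=1000)))   # noisy segment: relatively high
print(tsallis_entropy(np.zeros(1000)))          # constant segment: 0.0
```

A FIN would be pre-trained so that a small network reproduces this quantity on its inputs, and those weights then serve as the initialization for the downstream model.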
Health diagnosis and recuperation of aged Li-ion batteries with data analytics and equivalent circuit modeling
paper_authors: Riko I Made, Jing Lin, Jintao Zhang, Yu Zhang, Lionel C. H. Moh, Zhaolin Liu, Ning Ding, Sing Yang Chiam, Edwin Khoo, Xuesong Yin, Guangyuan Wesley Zheng
for: This paper aims to assess battery health and develop a strategy for cell rejuvenation of second-life Li-ion batteries.
methods: The paper presents aging and reconditioning experiments of 62 commercial high-energy type lithium iron phosphate (LFP) cells, and uses machine learning models to predict cycle life and identify important indicators of recoverable capacity.
results: The paper achieves an average test error of 16.84% ± 1.87% (mean absolute percentage error) for cycle life prediction, and finds that some of the recoverable lost capacity is attributed to lateral lithium non-uniformity within the electrodes. Additionally, the paper demonstrates how battery operation history significantly affects the capacity recovery.
Abstract
Battery health assessment and recuperation play a crucial role in the utilization of second-life Li-ion batteries. However, due to ambiguous aging mechanisms and lack of correlations between the recovery effects and operational states, it is challenging to accurately estimate battery health and devise a clear strategy for cell rejuvenation. This paper presents aging and reconditioning experiments of 62 commercial high-energy type lithium iron phosphate (LFP) cells, which supplement existing datasets of high-power LFP cells. The relatively large-scale data allow us to use machine learning models to predict cycle life and identify important indicators of recoverable capacity. Considering cell-to-cell inconsistencies, an average test error of $16.84\% \pm 1.87\%$ (mean absolute percentage error) for cycle life prediction is achieved by gradient boosting regressor given information from the first 80 cycles. In addition, it is found that some of the recoverable lost capacity is attributed to the lateral lithium non-uniformity within the electrodes. An equivalent circuit model is built and experimentally validated to demonstrate how such non-uniformity can be accumulated, and how it can give rise to recoverable capacity loss. SHapley Additive exPlanations (SHAP) analysis also reveals that battery operation history significantly affects the capacity recovery.
Soft Merging: A Flexible and Robust Soft Model Merging Approach for Enhanced Neural Network Performance
for: The paper aims to improve the performance and stability of deep learning models.
methods: The authors propose a "soft merging" method that combines multiple local-optimum models quickly and efficiently, learning gate parameters through a surrogate of the $l_0$ norm.
results: Experiments show that the merged neural networks achieve better performance and greater stability than traditional model merging methods.
Abstract
Stochastic Gradient Descent (SGD), a widely used optimization algorithm in deep learning, is often limited to converging to local optima due to the non-convex nature of the problem. Leveraging these local optima to improve model performance remains a challenging task. Given the inherent complexity of neural networks, the simple arithmetic averaging of the obtained local optima models in undesirable results. This paper proposes a {\em soft merging} method that facilitates rapid merging of multiple models, simplifies the merging of specific parts of neural networks, and enhances robustness against malicious models with extreme values. This is achieved by learning gate parameters through a surrogate of the $l_0$ norm using hard concrete distribution without modifying the model weights of the given local optima models. This merging process not only enhances the model performance by converging to a better local optimum, but also minimizes computational costs, offering an efficient and explicit learning process integrated with stochastic gradient descent. Thorough experiments underscore the effectiveness and superior performance of the merged neural networks.
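The gate-learning step described above relies on the hard concrete distribution (Louizos et al., 2018) as a differentiable surrogate for the $l_0$ norm. The sketch below shows one common form of that reparameterized gate and how such gates could blend two frozen local-optimum models; the per-layer gating granularity, the hyperparameters, and the blend rule are illustrative assumptions rather than the paper's exact design.

```python
import numpy as np

def hard_concrete_gate(log_alpha, beta=2.0 / 3.0, gamma=-0.1, zeta=1.1, rng=None):
    """Sample a stretched, clipped 'hard concrete' gate in [0, 1].
    log_alpha is the learnable parameter; the model weights stay frozen."""
    rng = rng if rng is not None else np.random.default_rng()
    u = rng.uniform(1e-6, 1.0 - 1e-6, size=np.shape(log_alpha))
    s = 1.0 / (1.0 + np.exp(-(np.log(u) - np.log(1.0 - u) + log_alpha) / beta))
    return np.clip(s * (zeta - gamma) + gamma, 0.0, 1.0)

def soft_merge_layer(w_a, w_b, log_alpha, rng=None):
    """Blend the same layer from two local-optimum models with a learned gate
    (one scalar gate per layer here; other granularities are possible)."""
    z = hard_concrete_gate(log_alpha, rng=rng)
    return z * w_a + (1.0 - z) * w_b

rng = np.random.default_rng(0)
w_a, w_b = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
print(soft_merge_layer(w_a, w_b, log_alpha=0.5, rng=rng).shape)  # (4, 4)
```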
Parallelizing non-linear sequential models over the sequence length
results: The parallel algorithm accelerates GPU evaluation of sequential models by up to three orders of magnitude and makes training more than 10 times faster than the standard sequential method, without compromising output accuracy. It applies to a wide range of recurrent neural network and neural ODE architectures.
Abstract
Sequential models, such as Recurrent Neural Networks and Neural Ordinary Differential Equations, have long suffered from slow training due to their inherent sequential nature. For many years this bottleneck has persisted, as many thought sequential models could not be parallelized. We challenge this long-held belief with our parallel algorithm that accelerates GPU evaluation of sequential models by up to 3 orders of magnitude faster without compromising output accuracy. The algorithm does not need any special structure in the sequential models' architecture, making it applicable to a wide range of architectures. Using our method, training sequential models can be more than 10 times faster than the common sequential method without any meaningful difference in the training results. Leveraging this accelerated training, we discovered the efficacy of the Gated Recurrent Unit in a long time series classification problem with 17k time samples. By overcoming the training bottleneck, our work serves as the first step to unlock the potential of non-linear sequential models for long sequence problems.
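The abstract does not spell the algorithm out, so purely as a point of reference the toy below shows the simplest way a nonlinear recurrence can be evaluated with parallel-over-time sweeps: a naive fixed-point (Jacobi) iteration in which every timestep is updated at once from the previous sweep. This is not the paper's method (which is reported to be far more efficient); it is only a minimal illustration that sequence-length parallelism is possible for nonlinear recurrences.

```python
import numpy as np

def f(h_prev, x):
    """Example nonlinear cell (a placeholder, not the paper's model)."""
    return np.tanh(0.5 * h_prev + x)

def sequential_rollout(x, h0=0.0):
    h, prev = np.empty_like(x), h0
    for t in range(len(x)):
        prev = f(prev, x[t])
        h[t] = prev
    return h

def fixed_point_rollout(x, h0=0.0, max_sweeps=None):
    """Jacobi-style sweeps: all timesteps are recomputed in parallel from the
    previous sweep's states until the trajectory stops changing."""
    max_sweeps = len(x) if max_sweeps is None else max_sweeps
    h = np.zeros_like(x)
    for _ in range(max_sweeps):
        shifted = np.concatenate(([h0], h[:-1]))   # h_{t-1} from the last sweep
        h_new = f(shifted, x)                      # every t computed at once
        delta = np.max(np.abs(h_new - h))
        h = h_new
        if delta < 1e-10:
            break
    return h

x = np.random.default_rng(0).normal(size=200)
print(np.allclose(sequential_rollout(x), fixed_point_rollout(x)))  # True
```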
Weakly-supervised Automated Audio Captioning via text only training
for: Automated audio captioning (AAC).
methods: A weakly-supervised approach using contrastive language-audio pretraining (CLAP).
results: Relative performance of up to ~$83\%$ compared to fully supervised approaches, demonstrated on the Clotho and AudioCaps datasets.
Abstract
In recent years, datasets of paired audio and captions have enabled remarkable success in automatically generating descriptions for audio clips, namely Automated Audio Captioning (AAC). However, it is labor-intensive and time-consuming to collect a sufficient number of paired audio and captions. Motivated by the recent advances in Contrastive Language-Audio Pretraining (CLAP), we propose a weakly-supervised approach to train an AAC model assuming only text data and a pre-trained CLAP model, alleviating the need for paired target data. Our approach leverages the similarity between audio and text embeddings in CLAP. During training, we learn to reconstruct the text from the CLAP text embedding, and during inference, we decode using the audio embeddings. To mitigate the modality gap between the audio and text embeddings we employ strategies to bridge the gap during training and inference stages. We evaluate our proposed method on Clotho and AudioCaps datasets demonstrating its ability to achieve a relative performance of up to ~$83\%$ compared to fully supervised approaches trained with paired target data.
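The core asymmetry (train a caption decoder on CLAP text embeddings, decode from CLAP audio embeddings at test time) fits in a few lines. The toy below is schematic: the fake embedding function stands in for a frozen CLAP model, the nearest-neighbour "decoder" stands in for any autoregressive captioner, and the noise injection is one simplified example of a gap-bridging strategy; none of these names or choices come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def fake_clap_embeddings(latent):
    """Stand-in for a frozen CLAP model: paired text and audio are assumed to
    land near each other in a shared space (here, a common latent plus noise)."""
    e_text = latent + 0.05 * rng.normal(size=latent.shape)
    e_audio = latent + 0.05 * rng.normal(size=latent.shape)
    return e_text, e_audio

class NearestNeighbourDecoder:
    """Degenerate 'decoder': memorizes (text embedding -> caption) pairs and,
    at inference, returns the caption of the nearest stored embedding."""
    def __init__(self):
        self.keys, self.captions = [], []
    def train_step(self, text_embedding, caption, noise_std=0.1):
        # Noise injection: one simple way to tolerate the audio-text modality gap.
        self.keys.append(text_embedding + noise_std * rng.normal(size=text_embedding.shape))
        self.captions.append(caption)
    def generate(self, audio_embedding):
        dists = [np.linalg.norm(audio_embedding - k) for k in self.keys]
        return self.captions[int(np.argmin(dists))]

captions = ["a dog barks twice", "rain on a tin roof", "an engine idles"]
pairs = [fake_clap_embeddings(rng.normal(size=8)) for _ in captions]

decoder = NearestNeighbourDecoder()
for (e_text, _), cap in zip(pairs, captions):   # training sees text only
    decoder.train_step(e_text, cap)

for (_, e_audio), cap in zip(pairs, captions):  # inference decodes from audio
    assert decoder.generate(e_audio) == cap
print("audio-only inference recovered all captions")
```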
t-EER: Parameter-Free Tandem Evaluation of Countermeasures and Biometric Comparators
results: The study finds that the t-EER is a well-founded metric for jointly assessing the reliability of PAD and biometric verification systems.
Abstract
Presentation attack (spoofing) detection (PAD) typically operates alongside biometric verification to improve reliability in the face of spoofing attacks. Even though the two sub-systems operate in tandem to solve the single task of reliable biometric verification, they address different detection tasks and are hence typically evaluated separately. Evidence shows that this approach is suboptimal. We introduce a new metric for the joint evaluation of PAD solutions operating in situ with biometric verification. In contrast to the tandem detection cost function proposed recently, the new tandem equal error rate (t-EER) is parameter free. The combination of two classifiers nonetheless leads to a \emph{set} of operating points at which false alarm and miss rates are equal and also dependent upon the prevalence of attacks. We therefore introduce the \emph{concurrent} t-EER, a unique operating point which is invariable to the prevalence of attacks. Using both modality (and even application) agnostic simulated scores, as well as real scores for a voice biometrics application, we demonstrate application of the t-EER to a wide range of biometric system evaluations under attack. The proposed approach is a strong candidate metric for the tandem evaluation of PAD systems and biometric comparators.
Smooth ECE: Principled Reliability Diagrams via Kernel Smoothing
results: Experiments show that the proposed modification alleviates the well-known flaws of binning and provides a better-behaved calibration measure, together with a reliability diagram that faithfully visualizes the predictor's calibration.
Abstract
Calibration measures and reliability diagrams are two fundamental tools for measuring and interpreting the calibration of probabilistic predictors. Calibration measures quantify the degree of miscalibration, and reliability diagrams visualize the structure of this miscalibration. However, the most common constructions of reliability diagrams and calibration measures -- binning and ECE -- both suffer from well-known flaws (e.g. discontinuity). We show that a simple modification fixes both constructions: first smooth the observations using an RBF kernel, then compute the Expected Calibration Error (ECE) of this smoothed function. We prove that with a careful choice of bandwidth, this method yields a calibration measure that is well-behaved in the sense of (B{\l}asiok, Gopalan, Hu, and Nakkiran 2023a) -- a consistent calibration measure. We call this measure the SmoothECE. Moreover, the reliability diagram obtained from this smoothed function visually encodes the SmoothECE, just as binned reliability diagrams encode the BinnedECE. We also provide a Python package with simple, hyperparameter-free methods for measuring and plotting calibration: `pip install relplot\`.
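The construction lends itself to a compact sketch: replace binning with an RBF-kernel (Nadaraya-Watson) estimate of the outcome as a function of the predicted probability, then average the resulting calibration gap. The bandwidth below is fixed by hand and the estimator is a simplification, not the paper's principled construction (the authors' own implementation is the relplot package).

```python
import numpy as np

def smooth_ece(pred, label, bandwidth=0.05):
    """Kernel-smoothed calibration error: an RBF estimate of E[label | prediction],
    with the absolute gap |smoothed(p) - p| averaged over observed predictions.
    (Fixed bandwidth here; the paper chooses it in a principled way.)"""
    pred, label = np.asarray(pred, float), np.asarray(label, float)
    w = np.exp(-0.5 * ((pred[:, None] - pred[None, :]) / bandwidth) ** 2)
    smoothed = (w @ label) / w.sum(axis=1)
    return float(np.mean(np.abs(smoothed - pred)))

rng = np.random.default_rng(0)
p = rng.uniform(size=2000)
y_calibrated = (rng.uniform(size=2000) < p).astype(float)
y_overconfident = (rng.uniform(size=2000) < 0.25 + 0.5 * p).astype(float)
# The calibrated predictor scores much lower than the miscalibrated one.
print(smooth_ece(p, y_calibrated), smooth_ece(p, y_overconfident))
```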
results: The authors show that, in a suitable parameter regime, there is a constant-time randomized algorithm for finding weak $\epsilon$-approximate $\sigma$-smooth Nash equilibria and a polynomial-time deterministic algorithm for finding strong $\epsilon$-approximate $\sigma$-smooth Nash equilibria.
Abstract
A fundamental shortcoming of the concept of Nash equilibrium is its computational intractability: approximating Nash equilibria in normal-form games is PPAD-hard. In this paper, inspired by the ideas of smoothed analysis, we introduce a relaxed variant of Nash equilibrium called $\sigma$-smooth Nash equilibrium, for a smoothness parameter $\sigma$. In a $\sigma$-smooth Nash equilibrium, players only need to achieve utility at least as high as their best deviation to a $\sigma$-smooth strategy, which is a distribution that does not put too much mass (as parametrized by $\sigma$) on any fixed action. We distinguish two variants of $\sigma$-smooth Nash equilibria: strong $\sigma$-smooth Nash equilibria, in which players are required to play $\sigma$-smooth strategies under equilibrium play, and weak $\sigma$-smooth Nash equilibria, where there is no such requirement. We show that both weak and strong $\sigma$-smooth Nash equilibria have superior computational properties to Nash equilibria: when $\sigma$ as well as an approximation parameter $\epsilon$ and the number of players are all constants, there is a constant-time randomized algorithm to find a weak $\epsilon$-approximate $\sigma$-smooth Nash equilibrium in normal-form games. In the same parameter regime, there is a polynomial-time deterministic algorithm to find a strong $\epsilon$-approximate $\sigma$-smooth Nash equilibrium in a normal-form game. These results stand in contrast to the optimal algorithm for computing $\epsilon$-approximate Nash equilibria, which cannot run in faster than quasipolynomial-time. We complement our upper bounds by showing that when either $\sigma$ or $\epsilon$ is an inverse polynomial, finding a weak $\epsilon$-approximate $\sigma$-smooth Nash equilibria becomes computationally intractable.
results: Experiments on both synthetic and real-world datasets show that RAMs offer improved expressiveness compared to GAMs while maintaining interpretability.
Abstract
Generalized Additive Models (GAMs) are widely used explainable-by-design models in various applications. GAMs assume that the output can be represented as a sum of univariate functions, referred to as components. However, this assumption fails in ML problems where the output depends on multiple features simultaneously. In these cases, GAMs fail to capture the interaction terms of the underlying function, leading to subpar accuracy. To (partially) address this issue, we propose Regionally Additive Models (RAMs), a novel class of explainable-by-design models. RAMs identify subregions within the feature space where interactions are minimized. Within these regions, it is more accurate to express the output as a sum of univariate functions (components). Consequently, RAMs fit one component per subregion of each feature instead of one component per feature. This approach yields a more expressive model compared to GAMs while retaining interpretability. The RAM framework consists of three steps. Firstly, we train a black-box model. Secondly, using Regional Effect Plots, we identify subregions where the black-box model exhibits near-local additivity. Lastly, we fit a GAM component for each identified subregion. We validate the effectiveness of RAMs through experiments on both synthetic and real-world datasets. The results confirm that RAMs offer improved expressiveness compared to GAMs while maintaining interpretability.
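A toy numerical example makes the one-component-per-subregion idea concrete. Here the black-box step and the Regional Effect Plot step (steps 1 and 2) are collapsed into a hand-picked split at $x_2 = 0$ so the sketch stays short; in the actual framework those subregions are found automatically. The data-generating function and polynomial components are invented for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000
x1, x2 = rng.uniform(-1, 1, n), rng.uniform(-1, 1, n)
# Strong x1-x2 interaction: the effect of x1 flips sign with the sign of x2.
y = np.where(x2 < 0, np.sin(3 * x1), -np.sin(3 * x1)) + 0.1 * rng.normal(size=n)

def fit_component(x, y, degree=5):
    return np.polynomial.Polynomial.fit(x, y, degree)

# GAM-style: one univariate component for x1 over all data (interaction ignored).
gam_f1 = fit_component(x1, y)

# RAM-style: one component for x1 per subregion of x2 where additivity holds.
ram_f1_low = fit_component(x1[x2 < 0], y[x2 < 0])
ram_f1_high = fit_component(x1[x2 >= 0], y[x2 >= 0])
ram_pred = np.where(x2 < 0, ram_f1_low(x1), ram_f1_high(x1))

print("GAM MSE:", np.mean((y - gam_f1(x1)) ** 2))   # close to Var(y)
print("RAM MSE:", np.mean((y - ram_pred) ** 2))     # close to the noise level
```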
SupeRBNN: Randomized Binary Neural Network Using Adiabatic Superconductor Josephson Devices
for: Energy-efficient binary neural network (BNN) acceleration using superconducting AQFP logic.
methods: Exploits the randomized behavior of AQFP devices together with software-hardware co-optimization.
results: Achieves roughly 7.8 x 10^4 times higher energy efficiency than a ReRAM-based BNN framework while maintaining a similar level of accuracy.
Abstract
Adiabatic Quantum-Flux-Parametron (AQFP) is a superconducting logic with extremely high energy efficiency. By employing the distinct polarity of current to denote logic `0' and `1', AQFP devices serve as excellent carriers for binary neural network (BNN) computations. Although recent research has made initial strides toward developing an AQFP-based BNN accelerator, several critical challenges remain, preventing the design from being a comprehensive solution. In this paper, we propose SupeRBNN, an AQFP-based randomized BNN acceleration framework that leverages software-hardware co-optimization to eventually make the AQFP devices a feasible solution for BNN acceleration. Specifically, we investigate the randomized behavior of the AQFP devices and analyze the impact of crossbar size on current attenuation, subsequently formulating the current amplitude into the values suitable for use in BNN computation. To tackle the accumulation problem and improve overall hardware performance, we propose a stochastic computing-based accumulation module and a clocking scheme adjustment-based circuit optimization method. We validate our SupeRBNN framework across various datasets and network architectures, comparing it with implementations based on different technologies, including CMOS, ReRAM, and superconducting RSFQ/ERSFQ. Experimental results demonstrate that our design achieves an energy efficiency of approximately 7.8x10^4 times higher than that of the ReRAM-based BNN framework while maintaining a similar level of model accuracy. Furthermore, when compared with superconductor-based counterparts, our framework demonstrates at least two orders of magnitude higher energy efficiency.
Physics-informed State-space Neural Networks for Transport Phenomena
results: Two in silico experiments (a heated channel and a cooling system loop) show that PSMs are more accurate than purely data-driven models. PSMs can also be used to build nonlinear supervisory controllers and system diagnostic algorithms.
Abstract
This work introduces Physics-informed State-space neural network Models (PSMs), a novel solution to achieving real-time optimization, flexibility, and fault tolerance in autonomous systems, particularly in transport-dominated systems such as chemical, biomedical, and power plants. Traditional data-driven methods fall short due to a lack of physical constraints like mass conservation; PSMs address this issue by training deep neural networks with sensor data and physics-informing using components' Partial Differential Equations (PDEs), resulting in a physics-constrained, end-to-end differentiable forward dynamics model. Through two in silico experiments - a heated channel and a cooling system loop - we demonstrate that PSMs offer a more accurate approach than purely data-driven models. Beyond accuracy, there are several compelling use cases for PSMs. In this work, we showcase two: the creation of a nonlinear supervisory controller through a sequentially updated state-space representation and the proposal of a diagnostic algorithm using residuals from each of the PDEs. The former demonstrates the ability of PSMs to handle both constant and time-dependent constraints, while the latter illustrates their value in system diagnostics and fault detection. We further posit that PSMs could serve as a foundation for Digital Twins, constantly updated digital representations of physical systems.
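The physics-informing idea can be illustrated with a minimal loss that combines a data-fit term on sparse sensor readings with a finite-difference residual of a governing PDE evaluated on the predicted state. The 1-D advection equation, the discretization, and the weighting below are placeholder assumptions for the sketch; the paper builds the constraints from each component's own PDEs (e.g. mass conservation).

```python
import numpy as np

def physics_informed_loss(u_pred, u_sensor, sensor_cols, dx, dt, velocity, weight=1.0):
    """u_pred: predicted state on a (time, space) grid.
    Data term: MSE against sparse sensor readings at the given columns.
    Physics term: finite-difference residual of du/dt + v * du/dx = 0."""
    data_loss = np.mean((u_pred[:, sensor_cols] - u_sensor) ** 2)
    du_dt = (u_pred[1:, :-1] - u_pred[:-1, :-1]) / dt
    du_dx = (u_pred[:-1, 1:] - u_pred[:-1, :-1]) / dx
    return data_loss + weight * np.mean((du_dt + velocity * du_dx) ** 2)

nx, nt, dx, dt, v = 64, 32, 0.1, 0.01, 1.0
x = dx * np.arange(nx)
t = dt * np.arange(nt)[:, None]
u_true = np.sin(x - v * t)              # exact travelling wave: u_t + v*u_x = 0
u_bad = np.sin(x + 5.0 * v * t)         # wrong speed and direction
sensor_cols = np.arange(0, nx, 8)       # sparse sensors: every 8th grid point

# The physical prediction incurs only a small discretization residual;
# the non-physical one is penalized by both terms.
print(physics_informed_loss(u_true, u_true[:, sensor_cols], sensor_cols, dx, dt, v))
print(physics_informed_loss(u_bad, u_true[:, sensor_cols], sensor_cols, dx, dt, v))
```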
Boolformer: Symbolic Regression of Logic Functions with Transformers
methods: The paper introduces Boolformer, a Transformer architecture that predicts compact formulas for complex Boolean functions when provided with a clean truth table.
results: Evaluated on a broad set of real-world binary classification datasets, Boolformer proves to be an interpretable and effective alternative. Additionally, the paper shows that Boolformer can be applied to the task of modeling gene regulatory networks, and is competitive with state-of-the-art genetic algorithms with a significant speedup.
Abstract
In this work, we introduce Boolformer, the first Transformer architecture trained to perform end-to-end symbolic regression of Boolean functions. First, we show that it can predict compact formulas for complex functions which were not seen during training, when provided a clean truth table. Then, we demonstrate its ability to find approximate expressions when provided incomplete and noisy observations. We evaluate the Boolformer on a broad set of real-world binary classification datasets, demonstrating its potential as an interpretable alternative to classic machine learning methods. Finally, we apply it to the widespread task of modelling the dynamics of gene regulatory networks. Using a recent benchmark, we show that Boolformer is competitive with state-of-the art genetic algorithms with a speedup of several orders of magnitude. Our code and models are available publicly.
Optimal Conditional Inference in Adaptive Experiments
results: The paper shows that, absent further restrictions on a batched bandit experiment, inference using only the results of the last batch is optimal. When the adaptive aspects of the experiment are location-invariant, there is additional information in the data, captured by one linear function of the batch-arm means. In the more restrictive case where the stopping time, assignment probabilities, and target parameter depend on the data only through polyhedral events, computationally tractable and optimal conditional inference procedures are derived.
Abstract
We study batched bandit experiments and consider the problem of inference conditional on the realized stopping time, assignment probabilities, and target parameter, where all of these may be chosen adaptively using information up to the last batch of the experiment. Absent further restrictions on the experiment, we show that inference using only the results of the last batch is optimal. When the adaptive aspects of the experiment are known to be location-invariant, in the sense that they are unchanged when we shift all batch-arm means by a constant, we show that there is additional information in the data, captured by one additional linear function of the batch-arm means. In the more restrictive case where the stopping time, assignment probabilities, and target parameter are known to depend on the data only through a collection of polyhedral events, we derive computationally tractable and optimal conditional inference procedures.
Towards Robust and Truly Large-Scale Audio-Sheet Music Retrieval
results: The study shows that deep learning methods can link the two modalities and improve audio-sheet music retrieval accuracy, but several open challenges remain before large-scale deployment is possible.
Abstract
A range of applications of multi-modal music information retrieval is centred around the problem of connecting large collections of sheet music (images) to corresponding audio recordings, that is, identifying pairs of audio and score excerpts that refer to the same musical content. One of the typical and most recent approaches to this task employs cross-modal deep learning architectures to learn joint embedding spaces that link the two distinct modalities - audio and sheet music images. While there has been steady improvement on this front over the past years, a number of open problems still prevent large-scale employment of this methodology. In this article we attempt to provide an insightful examination of the current developments on audio-sheet music retrieval via deep learning methods. We first identify a set of main challenges on the road towards robust and large-scale cross-modal music retrieval in real scenarios. We then highlight the steps we have taken so far to address some of these challenges, documenting step-by-step improvement along several dimensions. We conclude by analysing the remaining challenges and present ideas for solving these, in order to pave the way to a unified and robust methodology for cross-modal music retrieval.
Self-Supervised Contrastive Learning for Robust Audio-Sheet Music Retrieval Systems
paper_authors: Luis Carvalho, Tobias Washüttl, Gerhard Widmer
for: Improving the effectiveness of cross-modal music retrieval systems.
methods: Self-supervised contrastive learning on automatically extracted audio and sheet-image snippets as a pre-training step.
results: Across several experiments, pre-trained models retrieve audio and sheet-image snippets with better precision, and on the cross-modal piece identification task retrieval quality improves from 30% up to 100% when real music data is present.
Abstract
Linking sheet music images to audio recordings remains a key problem for the development of efficient cross-modal music retrieval systems. One of the fundamental approaches toward this task is to learn a cross-modal embedding space via deep neural networks that is able to connect short snippets of audio and sheet music. However, the scarcity of annotated data from real musical content affects the capability of such methods to generalize to real retrieval scenarios. In this work, we investigate whether we can mitigate this limitation with self-supervised contrastive learning, by exposing a network to a large amount of real music data as a pre-training step, by contrasting randomly augmented views of snippets of both modalities, namely audio and sheet images. Through a number of experiments on synthetic and real piano data, we show that pre-trained models are able to retrieve snippets with better precision in all scenarios and pre-training configurations. Encouraged by these results, we employ the snippet embeddings in the higher-level task of cross-modal piece identification and conduct more experiments on several retrieval configurations. In this task, we observe that the retrieval quality improves from 30% up to 100% when real music data is present. We then conclude by arguing for the potential of self-supervised contrastive learning for alleviating the annotated data scarcity in multi-modal music retrieval models.
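The contrastive pre-training step pulls embeddings of corresponding audio and sheet-image snippets together while pushing non-matching pairs apart. Below is a minimal numpy sketch of a symmetric InfoNCE-style objective over a batch of paired snippet embeddings; the encoders, augmentations, and temperature are placeholders, and the paper's exact loss may differ.

```python
import numpy as np

def info_nce(audio_emb, sheet_emb, temperature=0.1):
    """Symmetric InfoNCE over a batch: row i of each matrix is one snippet pair,
    so matching pairs lie on the diagonal of the similarity matrix."""
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    s = sheet_emb / np.linalg.norm(sheet_emb, axis=1, keepdims=True)
    logits = (a @ s.T) / temperature
    loss_a2s = -np.mean(np.diag(logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))))
    logits_t = logits.T
    loss_s2a = -np.mean(np.diag(logits_t - np.log(np.exp(logits_t).sum(axis=1, keepdims=True))))
    return 0.5 * (loss_a2s + loss_s2a)

rng = np.random.default_rng(0)
latent = rng.normal(size=(32, 128))
audio = latent + 0.1 * rng.normal(size=latent.shape)   # paired views of 32 snippets
sheet = latent + 0.1 * rng.normal(size=latent.shape)
# Aligned pairs score far lower than a randomly shuffled pairing.
print(info_nce(audio, sheet), info_nce(audio, rng.permutation(sheet)))
```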
Convergence and Recovery Guarantees of Unsupervised Neural Networks for Inverse Problems
for: Solving inverse problems with unsupervised feedforward multilayer neural networks.
methods: Unsupervised feedforward multilayer neural networks trained to solve inverse problems.
results: The paper provides deterministic convergence and recovery guarantees for this class of networks, and derives overparametrization bounds under which a two-layer Deep Inverse Prior network with a smooth activation function benefits from these guarantees.
Abstract
Neural networks have become a prominent approach to solve inverse problems in recent years. While a plethora of such methods was developed to solve inverse problems empirically, we are still lacking clear theoretical guarantees for these methods. On the other hand, many works proved convergence to optimal solutions of neural networks in a more general setting using overparametrization as a way to control the Neural Tangent Kernel. In this work we investigate how to bridge these two worlds and we provide deterministic convergence and recovery guarantees for the class of unsupervised feedforward multilayer neural networks trained to solve inverse problems. We also derive overparametrization bounds under which a two-layers Deep Inverse Prior network with smooth activation function will benefit from our guarantees.
摘要
神经网络近年来已成为求解逆问题的主要方法之一。虽然已有大量此类方法在经验上被用于求解逆问题,但我们仍然缺乏对这些方法的明确理论保证。另一方面,许多工作在更一般的设定下,通过过参数化来控制 Neural Tangent Kernel,证明了神经网络能收敛到最优解。在这篇文章中,我们尝试将这两个方向联系起来,为用于求解逆问题的无监督前馈多层神经网络给出确定性的收敛与恢复保证。我们还推导了过参数化界,在该界之下,使用光滑激活函数的两层 Deep Inverse Prior 网络能够享有我们的保证。
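To make the setting of the guarantees concrete, here is a minimal sketch of the unsupervised "inverse prior" training loop the paper analyzes: a network with a fixed random input is fitted to a single observation y = Ax by minimizing the measurement residual. The operator, hidden width, and softplus activation are illustrative stand-ins; the paper's overparametrization bounds specify when such a two-layer network enjoys the stated recovery guarantees.

```python
# Minimal sketch (assumptions: the forward operator A is linear and known; the
# two-layer width and the smooth softplus activation are illustrative stand-ins
# for the overparametrized Deep Inverse Prior networks the guarantees cover).
import torch

n, m, width = 256, 64, 4096              # signal dim, measurement dim, hidden width
A = torch.randn(m, n) / m ** 0.5          # known forward operator
y = torch.randn(m)                        # observed measurements

z = torch.randn(1, 128)                   # fixed random input of the network
net = torch.nn.Sequential(                # two-layer network with smooth activation
    torch.nn.Linear(128, width),
    torch.nn.Softplus(),
    torch.nn.Linear(width, n),
)
opt = torch.optim.SGD(net.parameters(), lr=1e-3)

for step in range(5000):                  # unsupervised: only y is used, no labels
    x_hat = net(z).squeeze(0)
    loss = 0.5 * torch.sum((A @ x_hat - y) ** 2)   # data-fidelity objective
    opt.zero_grad(); loss.backward(); opt.step()
```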
Passage Summarization with Recurrent Models for Audio-Sheet Music Retrieval
methods: 该论文提出了一种基于深度神经网络学习的跨模态联合嵌入空间,通过适当的相似性结构来关联固定长度的音频和乐谱(Sheet Music)片段。
results: 该论文通过设计跨模态循环网络,解决了训练网络需要强对齐数据、以及音频与乐谱片段之间音乐内容存在差异这两个问题。实验结果表明,该方法只需弱对齐的音频-乐谱对,就能在所有配置下提供更高精度的检索结果。Abstract
Many applications of cross-modal music retrieval are related to connecting sheet music images to audio recordings. A typical and recent approach to this is to learn, via deep neural networks, a joint embedding space that correlates short fixed-size snippets of audio and sheet music by means of an appropriate similarity structure. However, two challenges that arise out of this strategy are the requirement of strongly aligned data to train the networks, and the inherent discrepancies of musical content between audio and sheet music snippets caused by local and global tempo differences. In this paper, we address these two shortcomings by designing a cross-modal recurrent network that learns joint embeddings that can summarize longer passages of corresponding audio and sheet music. The benefits of our method are that it only requires weakly aligned audio-sheet music pairs, as well as that the recurrent network handles the non-linearities caused by tempo variations between audio and sheet music. We conduct a number of experiments on synthetic and real piano data and scores, showing that our proposed recurrent method leads to more accurate retrieval in all possible configurations.
摘要
很多跨模态音乐检索应用都与将乐谱图像与音频录音相连接有关。一种常见的做法是通过深度神经网络学习一个共同嵌入空间,借助适当的相似性结构关联固定长度的音频与乐谱图像片段。然而,这种策略存在两个挑战:首先,需要强对齐的数据来训练网络;其次,由于音频和乐谱片段之间的局部与全局速度差异,两种模态的音乐内容会出现错位。在这篇论文中,我们通过设计一种跨模态循环网络来解决这两个缺陷,学习能够概括较长对应段落的音频与乐谱联合嵌入。我们方法的优点是:只需弱对齐的音频-乐谱图像对,且循环网络可以处理音频与乐谱之间由速度变化引起的非线性。我们在合成和真实钢琴数据与乐谱上进行了大量实验,结果表明,我们提出的循环方法可以在所有配置下实现更高精度的检索。
results: 峰值内存使用量最多降低25%,训练速度提升15%,同时保持模型准确率不变。Abstract
Traditional optimization methods rely on the use of single-precision floating point arithmetic, which can be costly in terms of memory size and computing power. However, mixed precision optimization techniques leverage the use of both single and half-precision floating point arithmetic to reduce memory requirements while maintaining model accuracy. We provide here an algorithm to further reduce memory usage during the training of a model by getting rid of the floating point copy of the parameters, virtually keeping only half-precision numbers. We also explore the benefits of getting rid of the gradient's value by executing the optimizer step during the back-propagation. In practice, we achieve up to 25% lower peak memory use and 15% faster training while maintaining the same level of accuracy.
摘要
(以下是简化中文版)传统优化方法使用单精度浮点运算,这在内存占用和计算资源方面可能代价高昂。混合精度优化技术则结合单精度和半精度浮点运算,在保持模型准确性的同时减少内存需求。我们提出一种算法,通过去掉参数的单精度浮点副本、实际上只保留半精度数值,进一步降低训练过程中的内存占用。我们还探索了在反向传播过程中直接执行优化器步骤、从而无需保留梯度值的好处。在实践中,我们实现了峰值内存使用量最多降低25%、训练速度提升15%,同时保持同等准确性。
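The memory savings come from two moves described above: keeping only half-precision parameters (no FP32 master copy) and running the optimizer update inside the backward pass so gradients never need to be stored all at once. A rough PyTorch sketch of that pattern is below; plain SGD stands in for whatever optimizer the paper actually uses, and the hook-based fused step assumes PyTorch >= 2.1 and a GPU for the FP16 layers.

```python
# Rough sketch (not the paper's implementation): keep parameters only in
# half precision and apply the optimizer update inside the backward pass,
# so neither an FP32 master copy nor the full set of gradients is stored.
# Requires PyTorch >= 2.1 for register_post_accumulate_grad_hook.
import torch

model = torch.nn.Linear(1024, 1024).half().cuda()   # FP16 weights only
lr = 1e-3

def sgd_in_backward(param):
    # called as soon as this parameter's gradient has been accumulated
    with torch.no_grad():
        param.add_(param.grad, alpha=-lr)            # plain SGD stand-in
    param.grad = None                                 # free the gradient immediately

for p in model.parameters():
    p.register_post_accumulate_grad_hook(sgd_in_backward)

x = torch.randn(32, 1024, device="cuda", dtype=torch.float16)
loss = model(x).float().pow(2).mean()
loss.backward()        # parameters are updated and grads freed during this call
```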
methods: 使用在线聚类方法,基于动态更新的有限样本或梯度池,从而无需向算法提供任务变化信息。
results: 在域增量学习中成功避免了灾难性遗忘,并在真实数据集上进行了实验,与 state-of-the-art 方法进行了比较。Abstract
We consider the problem of learning multiple tasks in a continual learning setting in which data from different tasks is presented to the learner in a streaming fashion. A key challenge in this setting is the so-called "catastrophic forgetting problem", in which the performance of the learner in an "old task" decreases when subsequently trained on a "new task". Existing continual learning methods, such as Averaged Gradient Episodic Memory (A-GEM) and Orthogonal Gradient Descent (OGD), address catastrophic forgetting by minimizing the loss for the current task without increasing the loss for previous tasks. However, these methods assume the learner knows when the task changes, which is unrealistic in practice. In this paper, we alleviate the need to provide the algorithm with information about task changes by using an online clustering-based approach on a dynamically updated finite pool of samples or gradients. We thereby successfully counteract catastrophic forgetting in one of the hardest settings, namely: domain-incremental learning, a setting for which the problem was previously unsolved. We showcase the benefits of our approach by applying these ideas to projection-based methods, such as A-GEM and OGD, which lead to task-agnostic versions of them. Experiments on real datasets demonstrate the effectiveness of the proposed strategy and its promising performance compared to state-of-the-art methods.
摘要
我们考虑持续学习设定下的多任务学习问题:不同任务的数据以流式方式提供给学习者。这一设定的关键挑战是所谓的"灾难性遗忘问题",即在随后训练"新任务"后,学习者在"旧任务"上的性能下降。现有的持续学习方法,如 Averaged Gradient Episodic Memory(A-GEM)和 Orthogonal Gradient Descent(OGD),通过在不增加旧任务损失的前提下最小化当前任务损失来应对灾难性遗忘,但这些方法假设学习者知道任务何时变化,而这在实践中并不现实。在这篇论文中,我们通过对动态更新的有限样本或梯度池使用在线聚类方法,免去了向算法提供任务变化信息的需求,从而在最困难的设定之一,即域增量学习中成功对抗了灾难性遗忘,而这一设定此前尚未被解决。我们将这些想法应用于 A-GEM 和 OGD 等基于投影的方法,得到其任务无关版本。在真实数据集上的实验展示了所提策略的有效性,及其与最先进方法相比的可观性能。
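One way to picture the task-agnostic mechanism is an online clustering of recent gradient directions: each incoming gradient is matched to the nearest centroid, and a new centroid is opened when nothing is close enough, which implicitly signals a task change. The sketch below is illustrative only; the distance threshold, pool size, and moving-average update are assumptions, not the paper's exact procedure.

```python
# Illustrative sketch: maintain a small pool of gradient-direction centroids and
# assign each incoming gradient to the nearest one, so projection-based methods
# such as A-GEM/OGD can be made task-agnostic without explicit task-change signals.
import torch

class OnlineGradientClusters:
    def __init__(self, dim, max_clusters=5, radius=0.5, momentum=0.9):
        self.centroids = torch.empty(0, dim)
        self.max_clusters, self.radius, self.momentum = max_clusters, radius, momentum

    def assign(self, grad_vec):
        g = grad_vec / (grad_vec.norm() + 1e-12)       # compare directions only
        if self.centroids.numel() == 0:
            self.centroids = g.unsqueeze(0)
            return 0
        dists = torch.cdist(g.unsqueeze(0), self.centroids).squeeze(0)
        k = int(dists.argmin())
        if dists[k] > self.radius and len(self.centroids) < self.max_clusters:
            self.centroids = torch.cat([self.centroids, g.unsqueeze(0)])  # new "task"
            return len(self.centroids) - 1
        # exponential moving-average update of the matched centroid
        self.centroids[k] = self.momentum * self.centroids[k] + (1 - self.momentum) * g
        return k
```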
S-GBDT: Frugal Differentially Private Gradient Boosting Decision Trees
methods: 本研究使用了四种主要技术来改善实用性与隐私之间的权衡:(1)改进噪声缩放方法,对决策树叶节点的隐私泄露进行更紧的核算;(2)将个体 Rényi 过滤器整合到方法中,以便从训练过程中未被充分利用的数据点中学习;(3)利用随机决策树分裂,将隐私预算集中用于学习叶节点;(4)利用子采样实现隐私放大。
results: 我们的评估显示,在 Abalone 数据集(约4k训练数据点)上,我们在隐私水平 $\varepsilon=0.15$ 下达到 0.39 的 $R^2$ 分数,而最接近的先前工作只能在 $\varepsilon=10.0$ 下达到同等水平。在 Adult 数据集(50k训练数据点)上,我们在 $\varepsilon=0.07$ 下达到 18.7% 的测试误差,而最接近的先前工作只能在 $\varepsilon=1.0$ 下达到。在 Abalone 数据集上,当 $\varepsilon=0.54$ 时,我们达到 0.47 的 $R^2$ 分数,非常接近非隐私版 GBDT 的 0.54;在 Adult 数据集上,当 $\varepsilon=0.54$ 时,我们达到 17.1% 的测试误差,非常接近非隐私版 GBDT 的 13.7%。Abstract
Privacy-preserving learning of gradient boosting decision trees (GBDT) has the potential for strong utility-privacy tradeoffs for tabular data, such as census data or medical meta data: classical GBDT learners can extract non-linear patterns from small sized datasets. The state-of-the-art notion for provable privacy-properties is differential privacy, which requires that the impact of single data points is limited and deniable. We introduce a novel differentially private GBDT learner and utilize four main techniques to improve the utility-privacy tradeoff. (1) We use an improved noise scaling approach with tighter accounting of privacy leakage of a decision tree leaf compared to prior work, resulting in noise that in expectation scales with $O(1/n)$, for $n$ data points. (2) We integrate individual R\'enyi filters to our method to learn from data points that have been underutilized during an iterative training process, which -- potentially of independent interest -- results in a natural yet effective insight to learning streams of non-i.i.d. data. (3) We incorporate the concept of random decision tree splits to concentrate privacy budget on learning leaves. (4) We deploy subsampling for privacy amplification. Our evaluation shows for the Abalone dataset ($<4k$ training data points) a $R^2$-score of $0.39$ for $\varepsilon=0.15$, which the closest prior work only achieved for $\varepsilon=10.0$. On the Adult dataset ($50k$ training data points) we achieve test error of $18.7\,\%$ for $\varepsilon=0.07$ which the closest prior work only achieved for $\varepsilon=1.0$. For the Abalone dataset for $\varepsilon=0.54$ we achieve $R^2$-score of $0.47$ which is very close to the $R^2$-score of $0.54$ for the nonprivate version of GBDT. For the Adult dataset for $\varepsilon=0.54$ we achieve test error $17.1\,\%$ which is very close to the test error $13.7\,\%$ of the nonprivate version of GBDT.
摘要
隐私保护的梯度提升决策树(GBDT)学习在表格数据(如人口普查数据或医疗元数据)上有潜力实现很好的实用性-隐私权衡:经典的 GBDT 学习器能够从小规模数据集中提取非线性模式。可证明隐私性质的最先进概念是差分隐私,它要求单个数据点的影响是有限且可否认的。我们引入了一种新的差分隐私 GBDT 学习器,并利用四种主要技术来改善实用性-隐私权衡:(1)我们使用改进的噪声缩放方法,相比先前工作对决策树叶节点的隐私泄露进行更紧的核算,使噪声在期望意义下按 $O(1/n)$ 缩放,其中 $n$ 为数据点数;(2)我们将个体 Rényi 过滤器整合到方法中,以便从迭代训练过程中未被充分利用的数据点中学习,这一点(或许本身就有独立价值)为学习非独立同分布的数据流提供了自然而有效的视角;(3)我们利用随机决策树分裂的概念,将隐私预算集中用于学习叶节点;(4)我们部署子采样以实现隐私放大。我们的评估表明,在 Abalone 数据集(训练数据点少于4k)上,我们在 $\varepsilon=0.15$ 下达到 0.39 的 $R^2$ 分数,而最接近的先前工作只能在 $\varepsilon=10.0$ 下达到;在 Adult 数据集(50k训练数据点)上,我们在 $\varepsilon=0.07$ 下达到 18.7% 的测试误差,而最接近的先前工作只能在 $\varepsilon=1.0$ 下达到。在 Abalone 数据集上,当 $\varepsilon=0.54$ 时,我们达到 0.47 的 $R^2$ 分数,非常接近非隐私版 GBDT 的 0.54;在 Adult 数据集上,当 $\varepsilon=0.54$ 时,我们达到 17.1% 的测试误差,非常接近非隐私版 GBDT 的 13.7%。
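For intuition on where the privacy noise enters, here is a generic sketch of a differentially private GBDT leaf: gradients are clipped to bound the leaf's sensitivity, and Laplace noise calibrated to that sensitivity is added before the leaf value is released. Because the sensitivity shrinks roughly like 1/n, the noise scale does too, which is the flavor of the O(1/n) scaling above; the paper's tighter accounting, Rényi filters, random splits, and subsampling are not reproduced here.

```python
# Generic sketch of a differentially private GBDT leaf value (not S-GBDT's
# exact noise scaling or accounting).
import numpy as np

def private_leaf_value(gradients, eps_leaf, clip=1.0, reg_lambda=1.0, rng=None):
    rng = rng or np.random.default_rng()
    g = np.clip(np.asarray(gradients, dtype=float), -clip, clip)
    n = len(g)
    value = g.sum() / (n + reg_lambda)            # standard (non-private) leaf value
    # replacing one example's clipped gradient changes the sum by at most 2*clip,
    # so the leaf value's sensitivity shrinks roughly like 1/n
    sensitivity = 2 * clip / (n + reg_lambda)
    return value + rng.laplace(scale=sensitivity / eps_leaf)
```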
Uplift vs. predictive modeling: a theoretical analysis
results: 研究发现,在某些情况下 uplift 模型可以超过 predictive 方法的性能,但这取决于若干参数,如特征与结果之间的互信息、估计量的方差、潜在结果的分布,以及干预与结果的潜在成本和收益。Abstract
Despite the growing popularity of machine-learning techniques in decision-making, the added value of causal-oriented strategies with respect to pure machine-learning approaches has rarely been quantified in the literature. These strategies are crucial for practitioners in various domains, such as marketing, telecommunications, health care and finance. This paper presents a comprehensive treatment of the subject, starting from firm theoretical foundations and highlighting the parameters that influence the performance of the uplift and predictive approaches. The focus of the paper is on a binary outcome case and a binary action, and the paper presents a theoretical analysis of uplift modeling, comparing it with the classical predictive approach. The main research contributions of the paper include a new formulation of the measure of profit, a formal proof of the convergence of the uplift curve to the measure of profit, and an illustration, through simulations, of the conditions under which predictive approaches still outperform uplift modeling. We show that the mutual information between the features and the outcome plays a significant role, along with the variance of the estimators, the distribution of the potential outcomes and the underlying costs and benefits of the treatment and the outcome.
摘要
尽管机器学习技术在决策中日益受欢迎,但相对于纯机器学习方法,面向因果的策略所带来的增益在文献中很少被量化。这些策略在市场营销、电信、医疗和金融等领域对从业者都非常重要。这篇论文从坚实的理论基础出发,对该主题给出了全面的讨论,并突出影响提升(uplift)方法与预测方法性能的参数。论文聚焦于二元结果和二元干预的情形,对提升建模进行了理论分析,并与经典预测方法进行比较。论文的主要研究贡献包括:1. 一种新的利润度量表述;2. 提升曲线收敛于该利润度量的正式证明;3. 通过模拟说明在哪些条件下预测方法仍然优于提升建模。我们发现,特征与结果之间的互信息、估计量的方差、潜在结果的分布,以及干预与结果的潜在成本和收益,都对性能有显著影响。
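For orientation, the standard quantities being compared can be written as follows; the paper's exact profit measure and notation may differ.

```latex
% Predictive models rank individuals by the outcome probability, uplift models by
% the treatment effect; a simple profit rule treats x whenever the expected gain
% exceeds the per-treatment cost c (b = value of one extra positive outcome).
\[
  p(x) = P(Y=1 \mid X=x), \qquad
  u(x) = P(Y=1 \mid X=x, T=1) - P(Y=1 \mid X=x, T=0),
\]
\[
  \text{treat } x \iff b \, u(x) > c .
\]
```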
Human-in-the-Loop Causal Discovery under Latent Confounding using Ancestral GFlowNets
paper_authors: Tiago da Silva, Eliezer Silva, Adèle Ribeiro, António Góis, Dominik Heider, Samuel Kaski, Diego Mesquita
for: The paper is written to address the issue of brittleness in causal discovery algorithms when dealing with scarce data and latent confounders, and to provide a new method that incorporates expert knowledge to improve the inference process.
methods: The paper proposes a new method that uses generative flow networks to sample causal ancestral graphs proportionally to a belief distribution based on a score function, such as the Bayesian information criterion (BIC). The method also leverages an optimal experimental design to iteratively probe the expert about the relations among variables, and updates the samples with human feedback via importance sampling.
results: The paper shows through experiments with synthetic observational data that the proposed method can accurately sample from distributions over ancestral graphs and greatly improve inference quality with human aid.Abstract
Structure learning is the crux of causal inference. Notably, causal discovery (CD) algorithms are brittle when data is scarce, possibly inferring imprecise causal relations that contradict expert knowledge -- especially when considering latent confounders. To aggravate the issue, most CD methods do not provide uncertainty estimates, making it hard for users to interpret results and improve the inference process. Surprisingly, while CD is a human-centered affair, no works have focused on building methods that both 1) output uncertainty estimates that can be verified by experts and 2) interact with those experts to iteratively refine CD. To solve these issues, we start by proposing to sample (causal) ancestral graphs proportionally to a belief distribution based on a score function, such as the Bayesian information criterion (BIC), using generative flow networks. Then, we leverage the diversity in candidate graphs and introduce an optimal experimental design to iteratively probe the expert about the relations among variables, effectively reducing the uncertainty of our belief over ancestral graphs. Finally, we update our samples to incorporate human feedback via importance sampling. Importantly, our method does not require causal sufficiency (i.e., unobserved confounders may exist). Experiments with synthetic observational data show that our method can accurately sample from distributions over ancestral graphs and that we can greatly improve inference quality with human aid.
摘要
结构学习是因果推断的关键。值得注意的是,因果发现(CD)算法在数据稀缺时十分脆弱,可能推断出不精确、甚至与专家知识相悖的因果关系,在考虑潜在混杂变量时尤其如此。更糟的是,大多数 CD 方法不提供不确定性估计,使用户难以解释结果并改进推断过程。令人意外的是,尽管 CD 本质上以人为中心,但尚无工作关注构建同时满足以下两点的方法:1)输出可供专家核验的不确定性估计;2)与专家交互以迭代改进 CD。为解决这些问题,我们首先提出使用生成流网络,按基于评分函数(如贝叶斯信息准则 BIC)的信念分布按比例采样(因果)祖先图。然后,我们利用候选图的多样性,引入最优实验设计,迭代地向专家询问变量之间的关系,从而有效降低我们对祖先图信念的不确定性。最后,我们通过重要性采样更新样本,以纳入专家反馈。需要注意的是,我们的方法不要求因果充分性(即允许存在未观测的混杂变量)。在合成观测数据上的实验表明,我们的方法能够准确地从祖先图分布中采样,并且借助专家的帮助可以大幅提高推断质量。
Methods for generating and evaluating synthetic longitudinal patient data: a systematic review
paper_authors: Katariina Perkonoja, Kari Auranen, Joni Virta
for: This paper is written for researchers and developers who are interested in generating and evaluating synthetic longitudinal patient data in medicine, with the aim of addressing the issue of data privacy and availability.
methods: The paper presents a systematic review of 17 methods for generating and evaluating synthetic longitudinal patient data, including traditional simulation techniques and modern deep learning methods. The methods are evaluated based on their type, source code availability, and approaches used to assess resemblance, utility, and privacy.
results: The paper provides a comprehensive overview of the existing methods for generating and evaluating synthetic longitudinal patient data, and discusses practical guidelines and key considerations for developing such methods. The paper also highlights the challenges and limitations of these methods, and identifies future research directions in this area.Abstract
The proliferation of data in recent years has led to the advancement and utilization of various statistical and deep learning techniques, thus expediting research and development activities. However, not all industries have benefited equally from the surge in data availability, partly due to legal restrictions on data usage and privacy regulations, such as in medicine. To address this issue, various statistical disclosure and privacy-preserving methods have been proposed, including the use of synthetic data generation. Synthetic data are generated based on some existing data, with the aim of replicating them as closely as possible and acting as a proxy for real sensitive data. This paper presents a systematic review of methods for generating and evaluating synthetic longitudinal patient data, a prevalent data type in medicine. The review adheres to the PRISMA guidelines and covers literature from five databases until the end of 2022. The paper describes 17 methods, ranging from traditional simulation techniques to modern deep learning methods. The collected information includes, but is not limited to, method type, source code availability, and approaches used to assess resemblance, utility, and privacy. Furthermore, the paper discusses practical guidelines and key considerations for developing synthetic longitudinal data generation methods.
摘要
近年来数据的激增推动了各类统计与深度学习技术的发展和应用,从而加速了研究和开发活动。然而,并非所有行业都同等受益于数据可得性的提升,部分原因在于数据使用的法律限制和隐私法规,医学领域便是如此。为解决这一问题,人们提出了多种统计披露控制与隐私保护方法,其中包括合成数据生成。合成数据基于已有数据生成,目的是尽可能地复现原数据,并充当真实敏感数据的代理。本文对生成与评估合成纵向患者数据(医学中常见的数据类型)的方法进行了系统综述。综述遵循 PRISMA 指南,覆盖五个数据库中截至2022年底的文献。文章描述了17种方法,从传统的模拟技术到现代的深度学习方法。收集的信息包括但不限于:方法类型、源代码可用性,以及用于评估相似性、实用性和隐私的途径。此外,文章还讨论了开发合成纵向患者数据生成方法的实践指南和关键考虑因素。
Robust Approximation Algorithms for Non-monotone $k$-Submodular Maximization under a Knapsack Constraint
results: 在若干实验实例(如 Influence Maximization 和 Sensor Placement)上进行了评估,结果表明所提算法在保持理论质量的同时显著减少了查询次数。Abstract
The problem of non-monotone $k$-submodular maximization under a knapsack constraint ($\kSMK$) over a ground set of size $n$ has been raised in many applications in machine learning, such as data summarization, information propagation, etc. However, existing algorithms for the problem face two open questions: how to handle the non-monotone case, and how to quickly return a good solution on large datasets. This paper introduces two deterministic approximation algorithms for the problem that competitively improve the query complexity of existing algorithms. Our first algorithm, $\LAA$, returns an approximation ratio of $1/19$ within $O(nk)$ query complexity. The second one, $\RLA$, improves the approximation ratio to $1/5-\epsilon$ in $O(nk)$ queries, where $\epsilon$ is an input parameter. Our algorithms are the first ones that provide constant approximation ratios within only $O(nk)$ query complexity for the non-monotone objective. They therefore need fewer queries than state-of-the-art ones by a factor of $\Omega(\log n)$. Besides the theoretical analysis, we have evaluated the proposed algorithms in several experiments on instances of Influence Maximization and Sensor Placement. The results confirm that our algorithms match the theoretical quality of cutting-edge techniques while significantly reducing the number of queries.
摘要
在规模为 $n$ 的基础集合上、带背包约束的非单调 $k$-次模最大化问题($\kSMK$)在机器学习中有多种应用,如数据摘要、信息传播等。然而,现有算法面临两个问题:如何处理非单调情形,以及在数据规模很大时如何快速返回良好的解。这篇论文提出了两种确定性近似算法来改进现有算法的查询复杂度:$\LAA$ 算法在 $O(nk)$ 查询复杂度内给出 $1/19$ 的近似比;$\RLA$ 算法在 $O(nk)$ 查询内将近似比提升到 $1/5-\epsilon$,其中 $\epsilon$ 为输入参数。这两种算法是首批仅用 $O(nk)$ 查询复杂度即可为非单调目标提供常数近似比的算法,因此其查询次数比最先进算法少一个 $\Omega(\log n)$ 因子。
Enhancing SAEAs with Unevaluated Solutions: A Case Study of Relation Model for Expensive Optimization
results: 在两个测试集上,相比于回归和分类模型,关系模型在选择阶段表现出明显优势;而使用代理模型选出的未评估解也显著提高了算法的效率。Abstract
Surrogate-assisted evolutionary algorithms (SAEAs) hold significant importance in resolving expensive optimization problems (EOPs). Extensive efforts have been devoted to improving the efficacy of SAEAs through the development of proficient model-assisted selection methods. However, generating high-quality solutions is a prerequisite for selection. The fundamental paradigm of evaluating a limited number of solutions in each generation within SAEAs reduces the variance of adjacent populations, thus impacting the quality of offspring solutions. This is a frequently encountered issue, yet it has not gained widespread attention. This paper presents a framework using unevaluated solutions to enhance the efficiency of SAEAs. The surrogate model is employed to identify high-quality solutions for direct generation of new solutions without evaluation. To ensure dependable selection, we have introduced two tailored relation models for the selection of the optimal solution and the unevaluated population. A comprehensive experimental analysis is performed on two test suites, which showcases the superiority of the relation model over regression and classification models in the selection phase. Furthermore, the surrogate-selected unevaluated solutions with high potential have been shown to significantly enhance the efficiency of the algorithm.
摘要
代理模型辅助进化算法(SAEAs)在求解昂贵优化问题(EOPs)中具有重要意义。已有大量工作致力于通过开发高效的模型辅助选择方法来提高 SAEAs 的效果。然而,生成高质量的解是选择的前提。SAEAs 中每代仅评估有限数量解的基本范式会降低相邻种群之间的方差,从而影响子代解的质量。这是一个经常出现却未受到广泛关注的问题。本文提出了一种利用未评估解来提高 SAEAs 效率的框架:利用代理模型识别高质量解,并在不评估的情况下直接用于生成新解。为确保选择的可靠性,我们引入了两种定制的关系模型,分别用于选择最优解和未评估种群。我们在两个测试集上进行了全面的实验分析,结果表明关系模型在选择阶段的性能优于回归和分类模型;此外,由代理模型选出的具有高潜力的未评估解能显著提升算法效率。
Variational Connectionist Temporal Classification for Order-Preserving Sequence Modeling
methods: 论文将 CTC 与变分模型相结合,并推导了两个版本的变分 CTC。这两个版本分别基于不同的假设:其一假设每个时间步的变分隐变量条件独立,其二假设这些隐变量满足马尔可夫性。
results: 论文表明这两个版本的变分 CTC 都可以直接优化变分下界,并给出了计算上可行的实现形式。Abstract
Connectionist temporal classification (CTC) is commonly adopted for sequence modeling tasks like speech recognition, where it is necessary to preserve order between the input and target sequences. However, CTC is only applied to deterministic sequence models, where the latent space is discontinuous and sparse, which in turn makes them less capable of handling data variability when compared to variational models. In this paper, we integrate CTC with a variational model and derive loss functions that can be used to train more generalizable sequence models that preserve order. Specifically, we derive two versions of the novel variational CTC based on two reasonable assumptions, the first being that the variational latent variables at each time step are conditionally independent; and the second being that these latent variables are Markovian. We show that both loss functions allow direct optimization of the variational lower bound for the model log-likelihood, and present computationally tractable forms for implementing them.
摘要
连接时序分类(CTC)常用于语音识别等需要保持输入与目标序列顺序的序列建模任务。然而,CTC 只适用于确定性序列模型,其潜在空间不连续且稀疏,这使得它们与变分模型相比更难应对数据的变化性。在这篇论文中,我们将 CTC 与变分模型结合,推导出可用于训练更具泛化能力、且保持顺序的序列模型的损失函数。具体而言,我们基于两个合理的假设推导出两个版本的变分 CTC:其一假设每个时间步的变分隐变量条件独立,其二假设这些隐变量满足马尔可夫性。我们证明这两个损失函数都允许直接优化模型对数似然的变分下界,并给出了计算上可行的实现形式。
Generating Hierarchical Structures for Improved Time Series Classification Using Stochastic Splitting Functions
results: 研究表明,分别使用 rocket 和 svm 分类器并结合不同的分割函数时,该方法在约一半和三分之一的数据集上显著提高了分类性能。此外,研究还探讨了不同的数据特征和层次结构如何影响层次分类(HC)的性能。Abstract
This study introduces a novel hierarchical divisive clustering approach with stochastic splitting functions (SSFs) to enhance classification performance in multi-class datasets through hierarchical classification (HC). The method has the unique capability of generating hierarchy without requiring explicit information, making it suitable for datasets lacking prior knowledge of hierarchy. By systematically dividing classes into two subsets based on their discriminability according to the classifier, the proposed approach constructs a binary tree representation of hierarchical classes. The approach is evaluated on 46 multi-class time series datasets using popular classifiers (svm and rocket) and SSFs (potr, srtr, and lsoo). The results reveal that the approach significantly improves classification performance in approximately half and a third of the datasets when using rocket and svm as the classifier, respectively. The study also explores the relationship between dataset features and HC performance. While the number of classes and flat classification (FC) score show consistent significance, variations are observed with different splitting functions. Overall, the proposed approach presents a promising strategy for enhancing classification by generating hierarchical structure in multi-class time series datasets. Future research directions involve exploring different splitting functions, classifiers, and hierarchy structures, as well as applying the approach to diverse domains beyond time series data. The source code is made openly available to facilitate reproducibility and further exploration of the method.
摘要
results: 作者提出了间接免疫(即通过中介变量)的概念,并对其重复了之前的分析。此外,他们还提出了在存在未测量混杂时对免疫概率进行敏感性分析的方法。Abstract
This work is devoted to the study of the probability of immunity, i.e. the effect occurs whether exposed or not. We derive necessary and sufficient conditions for non-immunity and $\epsilon$-bounded immunity, i.e. the probability of immunity is zero and $\epsilon$-bounded, respectively. The former allows us to estimate the probability of benefit (i.e., the effect occurs if and only if exposed) from a randomized controlled trial, and the latter allows us to produce bounds of the probability of benefit that are tighter than the existing ones. We also introduce the concept of indirect immunity (i.e., through a mediator) and repeat our previous analysis for it. Finally, we propose a method for sensitivity analysis of the probability of immunity under unmeasured confounding.
摘要
这项工作致力于研究免疫概率,即无论是否暴露、效应都会发生的概率。我们推导了非免疫与 $\epsilon$-有界免疫的充要条件,分别对应免疫概率为零和免疫概率被 $\epsilon$ 所界定这两种情形。前者使我们能够从随机对照试验中估计获益概率(即当且仅当暴露时效应才发生的概率),后者使我们能够给出比现有结果更紧的获益概率界。我们还引入了间接免疫(即通过中介变量)的概念,并对其重复了上述分析。最后,我们提出了在存在未测量混杂时对免疫概率进行敏感性分析的方法。
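In potential-outcome notation, with Y(1) and Y(0) the binary outcomes under exposure and non-exposure, the verbal definitions above can be written as below; the paper's own notation may differ.

```latex
\[
  P(\text{benefit})  = P\big(Y(1)=1,\; Y(0)=0\big), \qquad
  P(\text{immunity}) = P\big(Y(1)=1,\; Y(0)=1\big),
\]
\[
  P\big(Y(1)=1\big) = P(\text{benefit}) + P(\text{immunity}),
\]
% so under non-immunity (P(immunity) = 0) the probability of benefit reduces to
% P(Y(1)=1), a quantity a randomized controlled trial estimates directly, while an
% epsilon-bound on P(immunity) translates into tighter bounds on P(benefit).
```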
A Machine Learning-oriented Survey on Tiny Machine Learning
results: 论文提出了三种实现tinyml系统的工作流程(ML-oriented、HW-oriented和协同设计),并对tinyml中的学习领域进行了详细探讨,包括不同家族的模型优化和设计,以及当前领先的学习技术。Abstract
The emergence of Tiny Machine Learning (TinyML) has positively revolutionized the field of Artificial Intelligence by promoting the joint design of resource-constrained IoT hardware devices and their learning-based software architectures. TinyML carries an essential role within the fourth and fifth industrial revolutions in helping societies, economies, and individuals employ effective AI-infused computing technologies (e.g., smart cities, automotive, and medical robotics). Given its multidisciplinary nature, the field of TinyML has been approached from many different angles: this comprehensive survey wishes to provide an up-to-date overview focused on all the learning algorithms within TinyML-based solutions. The survey is based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) methodological flow, allowing for a systematic and complete literature survey. In particular, firstly we will examine the three different workflows for implementing a TinyML-based system, i.e., ML-oriented, HW-oriented, and co-design. Secondly, we propose a taxonomy that covers the learning panorama under the TinyML lens, examining in detail the different families of model optimization and design, as well as the state-of-the-art learning techniques. Thirdly, this survey will present the distinct features of hardware devices and software tools that represent the current state-of-the-art for TinyML intelligent edge applications. Finally, we discuss the challenges and future directions.
摘要
TinyML(微型机器学习)的出现推动了资源受限的 IoT 硬件设备与其基于学习的软件架构的联合设计,为人工智能领域带来了积极的变革。TinyML 在第四次和第五次工业革命中发挥着重要作用,帮助社会、经济和个人使用高效的 AI 赋能计算技术(如智慧城市、汽车和医疗机器人)。由于 TinyML 具有多学科性质,这一领域被从许多不同角度加以研究:本篇综述希望提供一个聚焦于 TinyML 方案中所有学习算法的最新概览。综述遵循 PRISMA 方法学流程,从而实现系统而完整的文献调研。具体而言,我们首先考察实现 TinyML 系统的三种不同工作流程,即面向 ML、面向硬件以及协同设计;其次,我们提出了一个在 TinyML 视角下覆盖学习全景的分类体系,详细考察不同家族的模型优化与设计,以及最先进的学习技术;第三,本综述介绍了代表当前 TinyML 智能边缘应用最新水平的硬件设备与软件工具的特点。最后,我们讨论了挑战与未来方向。
Shedding Light on the Ageing of Extra Virgin Olive Oil: Probing the Impact of Temperature with Fluorescence Spectroscopy and Machine Learning Techniques
results: 研究显示,荧光光谱可以有效监测 EVOO 的氧化程度,并且可以通过机器学习来处理高度聚合的数据,从而提供一种可在现场条件下实施的评估方法。Abstract
This work systematically investigates the oxidation of extra virgin olive oil (EVOO) under accelerated storage conditions with UV absorption and total fluorescence spectroscopy. With the large amount of data collected, it proposes a method to monitor the oil's quality based on machine learning applied to highly-aggregated data. EVOO is a high-quality vegetable oil that has earned worldwide reputation for its numerous health benefits and excellent taste. Despite its outstanding quality, EVOO degrades over time owing to oxidation, which can affect both its health qualities and flavour. Therefore, it is highly relevant to quantify the effects of oxidation on EVOO and develop methods to assess it that can be easily implemented under field conditions, rather than in specialized laboratories. The following study demonstrates that fluorescence spectroscopy has the capability to monitor the effect of oxidation and assess the quality of EVOO, even when the data are highly aggregated. It shows that complex laboratory equipment is not necessary to exploit fluorescence spectroscopy using the proposed method and that cost-effective solutions, which can be used in-field by non-scientists, could provide an easily-accessible assessment of the quality of EVOO.
摘要
Phase Synchrony Component Self-Organization in Brain Computer Interface
paper_authors: Xu Niu, Na Lu, Huan Luo, Ruofan Yan
for: This paper aims to develop a deep learning end-to-end network for motor imagery (MI) classification based on phase synchrony information, which can automatically extract optimal filters for preprocessing and channel selection, and achieve better performance than traditional methods.
methods: The proposed method uses a deep learning network to directly extract phase synchrony-based features from raw EEG signals and perform classification. The network learns optimal filters during training, which are obtained when the network achieves peak classification results.
results: The proposed method outperforms state-of-the-art methods and discovers significant phase synchronization phenomena in tongue MI, with an average PLV exceeding 0.87 across all tongue MI samples. This high PLV indicates a groundbreaking discovery in the synchrony pattern of tongue MI.Abstract
Phase synchrony information plays a crucial role in analyzing functional brain connectivity and identifying brain activities. A widely adopted feature extraction pipeline, composed of preprocessing, selection of EEG acquisition channels, and phase locking value (PLV) calculation, has achieved success in motor imagery classification (MI). However, this pipeline is manual and reliant on expert knowledge, limiting its convenience and adaptability to different application scenarios. Moreover, most studies have employed mediocre data-independent spatial filters to suppress noise, impeding the exploration of more significant phase synchronization phenomena. To address the issues, we propose the concept of phase synchrony component self-organization, which enables the adaptive learning of data-dependent spatial filters for automating both the preprocessing and channel selection procedures. Based on this concept, the first deep learning end-to-end network is developed, which directly extracts phase synchrony-based features from raw EEG signals and perform classification. The network learns optimal filters during training, which are obtained when the network achieves peak classification results. Extensive experiments have demonstrated that our network outperforms state-of-the-art methods. Remarkably, through the learned optimal filters, significant phase synchronization phenomena can be observed. Specifically, by calculating the PLV between a pair of signals extracted from each sample using two of the learned spatial filters, we have obtained an average PLV exceeding 0.87 across all tongue MI samples. This high PLV indicates a groundbreaking discovery in the synchrony pattern of tongue MI.
摘要
相位同步信息在分析脑功能连接和识别脑活动中起着关键作用。一种被广泛采用的特征提取流程(由预处理、EEG 采集通道选择和相位锁定值(PLV)计算组成)已在运动想象(MI)分类中取得成功。然而,这一流程是手工设计的、依赖专家知识,限制了其便利性以及对不同应用场景的适应性。此外,大多数研究使用较为平庸、与数据无关的空间滤波器来抑制噪声,阻碍了对更显著的相位同步现象的探索。为解决这些问题,我们提出了相位同步成分自组织的概念,使数据相关的空间滤波器可以自适应地学习,从而将预处理和通道选择过程自动化。基于这一概念,我们开发了首个端到端深度学习网络,直接从原始 EEG 信号中提取基于相位同步的特征并进行分类。网络在训练中学习最优滤波器,并在分类结果达到峰值时获得这些滤波器。大量实验表明,我们的网络优于最先进的方法。值得注意的是,借助学习到的最优滤波器,可以观察到显著的相位同步现象:使用两个学习到的空间滤波器从每个样本中提取一对信号并计算其 PLV,我们在所有舌部 MI 样本上得到了超过 0.87 的平均 PLV。如此高的 PLV 表明在舌部 MI 的同步模式方面有了突破性发现。
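The PLV quoted above follows the standard definition: the magnitude of the time-averaged complex phase difference between two signals. A small illustrative computation (with arbitrary synthetic signals standing in for the spatially filtered EEG components) is shown below.

```python
# Sketch of the phase locking value (PLV) computation mentioned above (standard
# definition): PLV = |mean over time of exp(i * phase difference)|.
import numpy as np
from scipy.signal import hilbert

def phase_locking_value(x, y):
    """x, y: 1-D arrays of equal length (e.g., two spatially filtered EEG signals)."""
    phase_x = np.angle(hilbert(x))          # instantaneous phase via the analytic signal
    phase_y = np.angle(hilbert(y))
    return np.abs(np.mean(np.exp(1j * (phase_x - phase_y))))

# Example: two noisy signals sharing a 10 Hz component have a PLV close to 1.
t = np.arange(0, 2, 1 / 250)
x = np.sin(2 * np.pi * 10 * t) + 0.1 * np.random.randn(t.size)
y = np.sin(2 * np.pi * 10 * t + 0.5) + 0.1 * np.random.randn(t.size)
print(phase_locking_value(x, y))            # close to 1 for this synthetic pair
```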
From Peptides to Nanostructures: A Euclidean Transformer for Fast and Stable Machine Learned Force Fields
paper_authors: J. Thorben Frank, Oliver T. Unke, Klaus-Robert Müller, Stefan Chmiela
for: The paper aims to improve the stability and efficiency of machine learned force fields (MLFFs) in molecular dynamics (MD) simulations, particularly for systems with large numbers of degrees of freedom.
methods: The authors propose a transformer architecture called SO3krates, which combines sparse equivariant representations (Euclidean variables) with a self-attention mechanism to separate invariant and equivariant information. This allows for more efficient and stable MD simulations.
results: The authors demonstrate the ability of SO3krates to generate stable MD trajectories for flexible peptides and supra-molecular structures with hundreds of atoms, and explore the PES topology for medium-sized chainlike molecules (e.g., small peptides) by exploring thousands of minima. The results show that SO3krates can balance the conflicting demands of stability and the emergence of new minimum-energy conformations beyond the training data, which is crucial for realistic exploration tasks in the field of biochemistry.
为: 文章旨在提高机器学习力场(MLFFs)在分子动力学(MD)模拟中的稳定性和效率,特别是针对自由度数目很大的系统。
方法: 作者提出了一种名为 SO3krates 的 Transformer 架构,它将稀疏等变表示(欧氏变量)与自注意力机制相结合,以分离不变信息与等变信息,从而使 MD 模拟更高效、更稳定。
结果: 作者展示了 SO3krates 能够为柔性肽以及含数百个原子的超分子结构生成稳定的 MD 轨迹,并通过探索数千个极小值研究了中等尺寸链状分子(如小肽)的势能面(PES)拓扑。结果表明,SO3krates 能够在稳定性与出现超出训练数据的新最低能量构象这两个相互冲突的要求之间取得平衡,这对生物化学领域中的真实探索任务至关重要。Abstract
Recent years have seen vast progress in the development of machine learned force fields (MLFFs) based on ab-initio reference calculations. Despite achieving low test errors, the suitability of MLFFs in molecular dynamics (MD) simulations is being increasingly scrutinized due to concerns about instability. Our findings suggest a potential connection between MD simulation stability and the presence of equivariant representations in MLFFs, but their computational cost can limit practical advantages they would otherwise bring. To address this, we propose a transformer architecture called SO3krates that combines sparse equivariant representations (Euclidean variables) with a self-attention mechanism that can separate invariant and equivariant information, eliminating the need for expensive tensor products. SO3krates achieves a unique combination of accuracy, stability, and speed that enables insightful analysis of quantum properties of matter on unprecedented time and system size scales. To showcase this capability, we generate stable MD trajectories for flexible peptides and supra-molecular structures with hundreds of atoms. Furthermore, we investigate the PES topology for medium-sized chainlike molecules (e.g., small peptides) by exploring thousands of minima. Remarkably, SO3krates demonstrates the ability to strike a balance between the conflicting demands of stability and the emergence of new minimum-energy conformations beyond the training data, which is crucial for realistic exploration tasks in the field of biochemistry.
摘要
近年来,基于从头算参考计算的机器学习力场(MLFFs)取得了巨大进展。尽管测试误差很低,MLFFs 在分子动力学(MD)模拟中的适用性仍因稳定性问题而受到越来越多的质疑。我们的发现表明,MD 模拟的稳定性可能与 MLFFs 中等变表示的存在有关,但等变表示的计算成本会限制其本可带来的实际优势。为解决这一问题,我们提出了一种名为 SO3krates 的 Transformer 架构,它将稀疏等变表示(欧氏变量)与一种能够分离不变信息与等变信息的自注意力机制相结合,从而无需昂贵的张量积运算。SO3krates 实现了精度、稳定性与速度的独特组合,使我们能够在前所未有的时间与系统规模上对物质的量子性质进行有洞察力的分析。为展示这一能力,我们为柔性肽以及含数百个原子的超分子结构生成了稳定的 MD 轨迹。此外,我们通过探索数千个极小值,研究了中等尺寸链状分子(如小肽)的势能面(PES)拓扑。值得注意的是,SO3krates 展现出在稳定性与出现超出训练数据的新最低能量构象这两个相互冲突的要求之间取得平衡的能力,这对生物化学领域中的真实探索任务至关重要。
Limited Communications Distributed Optimization via Deep Unfolded Distributed ADMM
methods: 这篇论文提出了一种新的分布式优化算法——展开式(unfolded)D-ADMM,它沿用深度展开方法,使 D-ADMM 能在每个代理仅交换预先设定且数量较少的消息的情况下可靠地运行。
results: 数值结果表明,展开式 D-ADMM 能大幅减少 D-ADMM 所需的消息交换量,同时保持其性能。此外,论文还将展开式 D-ADMM 具体化到分布式估计与分布式学习等场景。Abstract
Distributed optimization is a fundamental framework for collaborative inference and decision making in decentralized multi-agent systems. The operation is modeled as the joint minimization of a shared objective which typically depends on observations gathered locally by each agent. Distributed optimization algorithms, such as the common D-ADMM, tackle this task by iteratively combining local computations and message exchanges. One of the main challenges associated with distributed optimization, and particularly with D-ADMM, is that it requires a large number of communications, i.e., messages exchanged between the agents, to reach consensus. This can make D-ADMM costly in power, latency, and channel resources. In this work we propose unfolded D-ADMM, which follows the emerging deep unfolding methodology to enable D-ADMM to operate reliably with a predefined and small number of messages exchanged by each agent. Unfolded D-ADMM fully preserves the operation of D-ADMM, while leveraging data to tune the hyperparameters of each iteration of the algorithm. These hyperparameters can either be agent-specific, aiming at achieving the best performance within a fixed number of iterations over a given network, or shared among the agents, allowing to learn to distributedly optimize over different networks. For both settings, our unfolded D-ADMM operates with limited communications, while preserving the interpretability and flexibility of the original D-ADMM algorithm. We specialize unfolded D-ADMM for two representative settings: a distributed estimation task, considering a sparse recovery setup, and a distributed learning scenario, where multiple agents collaborate in learning a machine learning model. Our numerical results demonstrate that the proposed approach dramatically reduces the number of communications utilized by D-ADMM, without compromising on its performance.
摘要
分布式优化是去中心化多代理系统中进行协作推断与决策的基础框架,其运行被建模为对一个共享目标的联合最小化,该目标通常依赖于每个代理本地收集的观测。分布式优化算法(如常用的 D-ADMM)通过迭代地结合本地计算与消息交换来完成这一任务。分布式优化、尤其是 D-ADMM 的主要挑战之一在于达成一致需要大量通信(即代理之间交换的消息),这会使 D-ADMM 在功耗、时延和信道资源上代价高昂。在本工作中,我们提出了展开式 D-ADMM:它采用新兴的深度展开方法,使 D-ADMM 能在每个代理仅交换预先设定且数量较少的消息的情况下可靠地运行。展开式 D-ADMM 完整保留了 D-ADMM 的运算流程,同时利用数据来调节算法每次迭代的超参数。这些超参数既可以是代理各自独立的(旨在在给定网络上于固定迭代次数内取得最佳性能),也可以由代理共享(从而学会在不同网络上进行分布式优化)。在这两种设定下,我们的展开式 D-ADMM 都只需有限的通信,同时保持了原始 D-ADMM 算法的可解释性与灵活性。我们将展开式 D-ADMM 具体化到两个代表性场景:考虑稀疏恢复设置的分布式估计任务,以及多个代理协作学习机器学习模型的分布式学习场景。数值结果表明,所提方法能大幅减少 D-ADMM 使用的通信量,而不损失其性能。
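The following is an illustrative sketch of the deep-unfolding pattern itself, not the paper's D-ADMM updates: a fixed, small number of distributed consensus-gradient iterations is unrolled as layers of a network, and each iteration's step size and mixing weight become trainable parameters. The local least-squares objective, mixing matrix, and update form are assumptions made for the sketch.

```python
# Illustrative deep-unfolding skeleton: K distributed iterations with learnable
# per-iteration hyperparameters, trained end-to-end from data.
import torch
import torch.nn as nn

class UnfoldedDistributedSolver(nn.Module):
    def __init__(self, num_iters=8):
        super().__init__()
        self.alpha = nn.Parameter(torch.full((num_iters,), 0.1))  # per-iteration step size
        self.gamma = nn.Parameter(torch.full((num_iters,), 0.5))  # per-iteration mixing weight

    def forward(self, A, y, W):
        """A: (agents, m, d) local matrices, y: (agents, m) local observations,
        W: (agents, agents) doubly-stochastic mixing matrix of the comms graph."""
        x = torch.zeros(A.size(0), A.size(2))
        for k in range(self.alpha.numel()):            # each loop = one message exchange
            residual = torch.einsum('amd,ad->am', A, x) - y
            grad = torch.einsum('amd,am->ad', A, residual)   # local least-squares gradient
            x_nb = W @ x                                # combine neighbours' estimates
            x = self.gamma[k] * x_nb + (1 - self.gamma[k]) * x - self.alpha[k] * grad
        return x
```

Training would then fit alpha and gamma by minimizing, over sample problems, the gap between the fixed-iteration output and the centralized solution (or the global objective value), which is the data-driven tuning of per-iteration hyperparameters described above.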
Activation Compression of Graph Neural Networks using Block-wise Quantization with Improved Variance Minimization
results: 本文的实验结果表明,对中间激活图使用块级量化可以进一步减少 GPU 内存消耗并缩短运行时间,同时保持与原方法相近的性能权衡。Abstract
Efficient training of large-scale graph neural networks (GNNs) has been studied with a specific focus on reducing their memory consumption. Work by Liu et al. (2022) proposed extreme activation compression (EXACT) which demonstrated drastic reduction in memory consumption by performing quantization of the intermediate activation maps down to using INT2 precision. They showed little to no reduction in performance while achieving large reductions in GPU memory consumption. In this work, we present an improvement to the EXACT strategy by using block-wise quantization of the intermediate activation maps. We experimentally analyze different block sizes and show further reduction in memory consumption (>15%), and runtime speedup per epoch (about 5%) even when performing extreme extents of quantization with similar performance trade-offs as with the original EXACT. Further, we present a correction to the assumptions on the distribution of intermediate activation maps in EXACT (assumed to be uniform) and show improved variance estimations of the quantization and dequantization steps.
摘要
针对大规模图神经网络(GNNs)的高效训练,已有研究着重于降低其内存消耗。Liu et al. (2022) 提出的极限激活压缩(EXACT)通过将中间激活图量化到 INT2 精度,在几乎不损失性能的情况下大幅降低了 GPU 内存占用。在本工作中,我们通过对中间激活图进行块级量化来改进 EXACT 策略。我们对不同的块大小进行了实验分析,结果表明即使进行极端程度的量化,也能进一步降低内存消耗(>15%)并使每个 epoch 的运行速度提升约 5%,同时保持与原始 EXACT 相近的性能权衡。此外,我们还修正了 EXACT 对中间激活图分布所做的假设(原先假设为均匀分布),并给出了量化与反量化步骤更好的方差估计。
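A generic sketch of block-wise activation quantization is given below; the block size, min-max scaling, and rounding mode are illustrative choices rather than the authors' exact scheme (EXACT-style methods typically use stochastic rounding and store the compressed activations for the backward pass).

```python
# Generic block-wise quantization of an activation map to a low-bit integer grid,
# with one scale and zero-point per block, plus the matching dequantization.
import torch

def blockwise_quantize(x, block_size=64, num_bits=2):
    flat = x.reshape(-1)
    pad = (-flat.numel()) % block_size
    flat = torch.nn.functional.pad(flat, (0, pad))
    blocks = flat.view(-1, block_size)
    qmax = 2 ** num_bits - 1                                  # INT2 -> values in {0,1,2,3}
    lo = blocks.min(dim=1, keepdim=True).values
    hi = blocks.max(dim=1, keepdim=True).values
    scale = (hi - lo).clamp(min=1e-8) / qmax                  # one scale per block
    q = torch.clamp(torch.round((blocks - lo) / scale), 0, qmax).to(torch.uint8)
    return q, scale, lo, x.shape, pad

def blockwise_dequantize(q, scale, lo, shape, pad):
    blocks = q.to(torch.float32) * scale + lo
    flat = blocks.reshape(-1)
    flat = flat[: flat.numel() - pad] if pad else flat
    return flat.reshape(shape)
```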
TMac: Temporal Multi-Modal Graph Learning for Acoustic Event Classification
paper_authors: Meng Liu, Ke Liang, Dayu Hu, Hao Yu, Yue Liu, Lingyuan Meng, Wenxuan Tu, Sihang Zhou, Xinwang Liu
for: This paper proposes a Temporal Multi-modal graph learning method for Acoustic event Classification, called TMac, to handle the temporal information in multi-modal data.
methods: The proposed method constructs a temporal graph for each acoustic event, dividing its audio data and video data into multiple segments, and models the temporal relationships between nodes using graph learning techniques.
results: Experiments demonstrate that TMac outperforms other state-of-the-art models in performance, smoothly capturing the dynamic information in intra-modal and inter-modal.Abstract
Audiovisual data is everywhere in this digital age, which raises higher requirements for the deep learning models developed on them. Handling the information of such multi-modal data well is the key to a better audiovisual model. We observe that these audiovisual data naturally have temporal attributes, such as the time information for each frame in the video. More concretely, such data is inherently multi-modal according to both audio and visual cues, which proceed in a strict chronological order. It indicates that temporal information is important in multi-modal acoustic event modeling for both intra- and inter-modal. However, existing methods deal with each modal feature independently and simply fuse them together, which neglects the mining of temporal relation and thus leads to sub-optimal performance. With this motivation, we propose a Temporal Multi-modal graph learning method for Acoustic event Classification, called TMac, by modeling such temporal information via graph learning techniques. In particular, we construct a temporal graph for each acoustic event, dividing its audio data and video data into multiple segments. Each segment can be considered as a node, and the temporal relationships between nodes can be considered as timestamps on their edges. In this case, we can smoothly capture the dynamic information in intra-modal and inter-modal. Several experiments are conducted to demonstrate TMac outperforms other SOTA models in performance. Our code is available at https://github.com/MGitHubL/TMac.
摘要
在这个数字时代,视听数据无处不在,这对建立在其上的深度学习模型提出了更高的要求。妥善处理多模态数据中的信息,是构建更好的视听模型的关键。我们观察到,这些视听数据天然带有时间属性,例如视频中每一帧的时间信息;更具体地说,此类数据依据音频与视觉两种线索本质上是多模态的,且严格按时间顺序推进。这表明无论是模态内还是模态间,时间信息在多模态声学事件建模中都很重要。然而,现有方法往往独立处理各模态特征并简单地将其融合,忽视了对时间关系的挖掘,导致性能欠佳。受此启发,我们提出了一种用于声学事件分类的时序多模态图学习方法 TMac,通过图学习技术来建模此类时间信息。具体而言,我们为每个声学事件构建一个时序图,将其音频数据和视频数据划分为多个片段:每个片段可视作一个节点,节点之间的时间关系可视作其边上的时间戳。这样,我们就能平滑地捕捉模态内与模态间的动态信息。多项实验表明 TMac 在性能上优于其他最先进模型。我们的代码可在 https://github.com/MGitHubL/TMac 获取。
A Comprehensive Review of Community Detection in Graphs
paper_authors: Songning Lai, Jiakang Li, Yonggang Lu
for: 本文旨在探讨复杂网络中社区结构的检测问题,以解释复杂系统的组织和功能。
methods: 本文介绍了多种社区检测方法,包括作者提出的新方法。
results: 本文探讨了社区检测在多种真实网络中的应用场景,并提供了对社区检测问题的深入理解。Abstract
The study of complex networks has significantly advanced our understanding of community structures which serves as a crucial feature of real-world graphs. Detecting communities in graphs is a challenging problem with applications in sociology, biology, and computer science. Despite the efforts of an interdisciplinary community of scientists, a satisfactory solution to this problem has not yet been achieved. This review article delves into the topic of community detection in graphs, which serves as a crucial role in understanding the organization and functioning of complex systems. We begin by introducing the concept of community structure, which refers to the arrangement of vertices into clusters, with strong internal connections and weaker connections between clusters. Then, we provide a thorough exposition of various community detection methods, including a new method designed by us. Additionally, we explore real-world applications of community detection in diverse networks. In conclusion, this comprehensive review provides a deep understanding of community detection in graphs. It serves as a valuable resource for researchers and practitioners in multiple disciplines, offering insights into the challenges, methodologies, and applications of community detection in complex networks.
摘要
对复杂网络的研究极大地加深了我们对社区结构的理解,社区结构是真实图的一个关键特征。在图中检测社区是一个具有挑战性的问题,在社会学、生物学和计算机科学等领域均有应用。尽管跨学科的科学家群体付出了诸多努力,这一问题仍未得到令人满意的解决。这篇综述探讨图中的社区检测问题,它在理解复杂系统的组织与运作中起着关键作用。我们首先介绍社区结构的概念,即顶点被划分为若干团簇,团簇内部连接紧密、团簇之间连接较弱。随后,我们详细阐述各种社区检测方法,其中包括我们自己设计的一种新方法。此外,我们还探讨了社区检测在各类真实网络中的应用。总之,这篇全面的综述提供了对图中社区检测的深入理解,可作为多个学科的研究人员和从业者的宝贵资源,帮助其了解复杂网络中社区检测的挑战、方法与应用。
Privacy-Preserving In-Context Learning with Differentially Private Few-Shot Generation
paper_authors: Xinyu Tang, Richard Shin, Huseyin A. Inan, Andre Manoel, Fatemehsadat Mireshghallah, Zinan Lin, Sivakanth Gopi, Janardhan Kulkarni, Robert Sim
results: 经过广泛的实验证明,我们的算法可以实现有效的ICL,并且可以保证高度的隐私水平。这些结果开启了新的可能性,允许ICL在隐私保证下进行应用。Abstract
We study the problem of in-context learning (ICL) with large language models (LLMs) on private datasets. This scenario poses privacy risks, as LLMs may leak or regurgitate the private examples demonstrated in the prompt. We propose a novel algorithm that generates synthetic few-shot demonstrations from the private dataset with formal differential privacy (DP) guarantees, and show empirically that it can achieve effective ICL. We conduct extensive experiments on standard benchmarks and compare our algorithm with non-private ICL and zero-shot solutions. Our results demonstrate that our algorithm can achieve competitive performance with strong privacy levels. These results open up new possibilities for ICL with privacy protection for a broad range of applications.
摘要
我们研究了大语言模型(LLM)在私有数据集上的上下文学习(ICL)问题。这一场景存在隐私风险,因为 LLM 可能泄露或复述提示中演示的私有示例。我们提出了一种新算法,能够在具备形式化差分隐私(DP)保证的前提下,从私有数据集生成合成的少样本示例,并通过实验证明它能够实现有效的 ICL。我们在标准基准上进行了大量实验,并将该算法与非隐私 ICL 以及零样本方案进行比较。结果表明,我们的算法能够在较强的隐私水平下取得具有竞争力的性能。这些结果为在广泛应用中进行具备隐私保护的 ICL 开启了新的可能性。
Extracting Physical Causality from Measurements to Detect and Localize False Data Injection Attacks
paper_authors: Shengyang Wu, Jingyu Wang, Dongyuan Shi
for: 这项研究旨在应对现代信息物理电力系统中的虚假数据注入攻击(FDIA)问题,提出一个基于因果推断和图注意力网络(GAT)的联合检测与定位框架,用于识别系统中被攻击的节点。
methods: 本研究使用 X-learner 算法估计测量量之间的因果强度,生成测量因果图(MCGs),然后使用 GAT 识别 MCGs 中的异常模式,从而检测并定位攻击。
results: 实验结果显示,基于因果推断和 GAT 的检测与定位框架具有高度可解释性和鲁棒性,能够快速、准确地检测系统中的攻击。Abstract
False Data Injection Attack (FDIA) has become a growing concern in modern cyber-physical power systems. Most existing FDIA detection techniques project the raw measurement data into a high-dimensional latent space to separate normal and attacked samples. These approaches focus more on the statistical correlations of data values and are therefore susceptible to data distribution drifts induced by changes in system operating points or changes in FDIA types and strengths, especially for FDIA localization tasks. Causal inference, on the other hand, extracts the causality behind the coordinated fluctuations of different measurements. The causality patterns are determined by fundamental physical laws such as Ohm's Law and Kirchhoff's Law. They are sensitive to the violation of physical laws caused by FDIA, but tend to remain stable with the drift of system operating points. Leveraging this advantage, this paper proposes a joint FDIA detection and localization framework based on causal inference and the Graph Attention Network (GAT) to identify the attacked system nodes. The proposed framework consists of two levels. The lower level uses the X-learner algorithm to estimate the causality strength between measurements and generate Measurement Causality Graphs (MCGs). The upper level then applies a GAT to identify the anomaly patterns in the MCGs. Since the extracted causality patterns are intrinsically related to the measurements, it is easier for the upper level to figure out the attacked nodes than the existing FDIA localization approaches. The performance of the proposed framework is evaluated on the IEEE 39-bus system. Experimental results show that the causality-based FDIA detection and localization mechanism is highly interpretable and robust.
摘要
虚假数据注入攻击(FDIA)已成为现代信息物理电力系统中日益严重的问题。现有的多数 FDIA 检测技术将原始测量数据投影到高维潜在空间中,以区分正常样本与受攻击样本。这些方法更关注数据值之间的统计相关性,因此容易受到系统运行点变化或 FDIA 类型与强度变化所引起的数据分布漂移的影响,对于 FDIA 定位任务尤其如此。与之相对,因果推断提取的是不同测量量协同波动背后的因果关系。这些因果模式由欧姆定律和基尔霍夫定律等基本物理规律决定:它们对 FDIA 造成的物理规律违背十分敏感,但在系统运行点漂移时往往保持稳定。利用这一优势,本文提出了一种基于因果推断和图注意力网络(GAT)的 FDIA 联合检测与定位框架,用于识别受攻击的系统节点。该框架包含两层:下层使用 X-learner 算法估计测量量之间的因果强度,并生成测量因果图(MCGs);上层随后使用 GAT 识别 MCGs 中的异常模式。由于提取的因果模式与测量量本质相关,上层比现有的 FDIA 定位方法更容易找出受攻击的节点。该框架的性能在 IEEE 39 节点系统上进行了评估,实验结果表明这种基于因果关系的 FDIA 检测与定位机制具有高度可解释性和鲁棒性。
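For reference, the X-learner named in the lower level is a standard two-stage CATE estimator (Künzel et al.); a generic sketch with off-the-shelf regressors is shown below. How the paper maps power-system measurements onto the treatment and outcome roles is not reproduced here.

```python
# Generic X-learner sketch (standard formulation, not the paper's FDIA-specific usage).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, GradientBoostingClassifier

def x_learner_cate(X, t, y):
    """X: features, t: binary treatment indicator (0/1), y: outcome. Returns CATE estimates."""
    t = np.asarray(t).astype(bool)
    m0 = GradientBoostingRegressor().fit(X[~t], y[~t])      # outcome model, control group
    m1 = GradientBoostingRegressor().fit(X[t], y[t])        # outcome model, treated group
    d1 = y[t] - m0.predict(X[t])                             # imputed effects for treated
    d0 = m1.predict(X[~t]) - y[~t]                           # imputed effects for control
    tau1 = GradientBoostingRegressor().fit(X[t], d1)         # effect model, treated
    tau0 = GradientBoostingRegressor().fit(X[~t], d0)        # effect model, control
    e = GradientBoostingClassifier().fit(X, t).predict_proba(X)[:, 1]   # propensity weights
    return e * tau0.predict(X) + (1 - e) * tau1.predict(X)
```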
Unveiling Optimal SDG Pathways: An Innovative Approach Leveraging Graph Pruning and Intent Graph for Effective Recommendations
methods: 这篇论文提出了 User Graph after Pruning and Intent Graph(UGPIG)方法:一方面利用剪枝后用户图的高密度连接能力,解决推荐算法忽视空间异质性的问题;另一方面构建意图图以捕捉目标区域对包括环境要素在内的属性偏好,缓解区域历史交互数据稀疏的问题。
results: 根据实验结果,UGPIG方法比现有的推荐算法如KGCN、KGAT和KGIN等表现更好,具体来说是在Top-3推荐性能中实现了最大提升9.61%。Abstract
The recommendation of appropriate development pathways, also known as ecological civilization patterns for achieving Sustainable Development Goals (namely, sustainable development patterns), are of utmost importance for promoting ecological, economic, social, and resource sustainability in a specific region. To achieve this, the recommendation process must carefully consider the region's natural, environmental, resource, and economic characteristics. However, current recommendation algorithms in the field of computer science fall short in adequately addressing the spatial heterogeneity related to environment and sparsity of regional historical interaction data, which limits their effectiveness in recommending sustainable development patterns. To overcome these challenges, this paper proposes a method called User Graph after Pruning and Intent Graph (UGPIG). Firstly, we utilize the high-density linking capability of the pruned User Graph to address the issue of spatial heterogeneity neglect in recommendation algorithms. Secondly, we construct an Intent Graph by incorporating the intent network, which captures the preferences for attributes including environmental elements of target regions. This approach effectively alleviates the problem of sparse historical interaction data in the region. Through extensive experiments, we demonstrate that UGPIG outperforms state-of-the-art recommendation algorithms like KGCN, KGAT, and KGIN in sustainable development pattern recommendations, with a maximum improvement of 9.61% in Top-3 recommendation performance.
摘要
推荐合适的发展路径(即用于实现可持续发展目标的生态文明模式,亦即可持续发展模式)对促进特定区域的生态、经济、社会与资源可持续性至关重要。为此,推荐过程必须仔细考虑该区域的自然、环境、资源与经济特征。然而,当前计算机科学领域的推荐算法未能充分处理与环境相关的空间异质性以及区域历史交互数据的稀疏性,限制了它们在可持续发展模式推荐中的效果。为克服这些挑战,本文提出了一种名为 User Graph after Pruning and Intent Graph(UGPIG)的方法。首先,我们利用剪枝后用户图的高密度连接能力,解决推荐算法忽视空间异质性的问题;其次,我们通过引入意图网络构建意图图,以捕捉目标区域对包括环境要素在内的属性偏好,从而有效缓解区域历史交互数据稀疏的问题。大量实验表明,UGPIG 在可持续发展模式推荐上优于 KGCN、KGAT 和 KGIN 等最先进推荐算法,Top-3 推荐性能最多提升 9.61%。
Turaco: Complexity-Guided Data Sampling for Training Neural Surrogates of Programs
results: 研究人员在一系列实际项目上进行了实验,结果显示,复杂度指导的采样方法可以提高代理的准确性。Abstract
Programmers and researchers are increasingly developing surrogates of programs, models of a subset of the observable behavior of a given program, to solve a variety of software development challenges. Programmers train surrogates from measurements of the behavior of a program on a dataset of input examples. A key challenge of surrogate construction is determining what training data to use to train a surrogate of a given program. We present a methodology for sampling datasets to train neural-network-based surrogates of programs. We first characterize the proportion of data to sample from each region of a program's input space (corresponding to different execution paths of the program) based on the complexity of learning a surrogate of the corresponding execution path. We next provide a program analysis to determine the complexity of different paths in a program. We evaluate these results on a range of real-world programs, demonstrating that complexity-guided sampling results in empirical improvements in accuracy.
摘要
Efficient Core-selecting Incentive Mechanism for Data Sharing in Federated Learning
paper_authors: Mengda Ji, Genjiu Xu, Jianjun Ge, Mingqiang Li
for: This paper focuses on developing an incentive mechanism for federated learning that encourages participants to input high-quality data truthfully and promotes stable cooperation.
methods: The authors use game-theoretic approaches and the concept of the core from cooperative games to design an incentive mechanism. They also propose an efficient core-selecting mechanism based on sampling approximation to reduce computational overhead.
results: The proposed mechanism is able to incentivize inputting high-quality data and stable cooperation, while reducing computational overhead compared to the core-selecting mechanism. Extensive experiments verify the effectiveness of the proposed mechanism.
for: 这篇论文致力于为联邦学习设计一种激励机制,促使参与者如实输入高质量数据并维持稳定合作。
methods: 作者运用博弈论方法与合作博弈中的"核"概念设计激励机制,并提出一种基于抽样近似的高效核选择机制以降低计算开销。
results: 所提机制能够激励输入高质量数据并促进稳定合作,同时相比原核选择机制降低了计算开销。大量实验验证了该机制的有效性。Abstract
Federated learning is a distributed machine learning system that uses participants' data to train an improved global model. In federated learning, participants cooperatively train a global model, and they will receive the global model and payments. Rational participants try to maximize their individual utility, and they will not input their high-quality data truthfully unless they are provided with satisfactory payments based on their data quality. Furthermore, federated learning benefits from the cooperative contributions of participants. Accordingly, how to establish an incentive mechanism that both incentivizes inputting data truthfully and promotes stable cooperation has become an important issue to consider. In this paper, we introduce a data sharing game model for federated learning and employ game-theoretic approaches to design a core-selecting incentive mechanism by utilizing a popular concept in cooperative games, the core. In federated learning, the core can be empty, resulting in the core-selecting mechanism becoming infeasible. To address this, our core-selecting mechanism employs a relaxation method and simultaneously minimizes the benefits of inputting false data for all participants. However, this mechanism is computationally expensive because it requires aggregating exponential models for all possible coalitions, which is infeasible in federated learning. To address this, we propose an efficient core-selecting mechanism based on sampling approximation that only aggregates models on sampled coalitions to approximate the exact result. Extensive experiments verify that the efficient core-selecting mechanism can incentivize inputting high-quality data and stable cooperation, while it reduces computational overhead compared to the core-selecting mechanism.
摘要
联邦学习是一种分布式机器学习系统,利用参与者的数据训练一个更优的全局模型。在联邦学习中,参与者合作训练全局模型,并获得全局模型和相应的报酬。理性的参与者会尽量最大化自身效用,除非根据数据质量获得令人满意的报酬,否则不会如实提供高质量数据。此外,联邦学习也受益于参与者的合作贡献。因此,如何建立一种既能激励如实提供数据、又能促进稳定合作的激励机制,成为一个需要考虑的重要问题。本文为联邦学习引入了一个数据共享博弈模型,并利用合作博弈中的核(core)这一常用概念,采用博弈论方法设计了一种核选取激励机制。在联邦学习中,核可能为空,导致核选取机制不可行。为此,我们的核选取机制采用松弛方法,并同时最小化所有参与者提供虚假数据所能获得的收益。然而,该机制计算代价高昂,因为它需要对所有可能的联盟聚合指数级数量的模型,这在联邦学习中是不可行的。为此,我们提出了一种基于抽样近似的高效核选取机制,只在抽样得到的联盟上聚合模型以近似精确结果。大量实验验证了高效核选取机制既能激励提供高质量数据和稳定合作,又能相比核选取机制降低计算开销。
results: 通过实验表明,QSW和Randomized Quasi-Sliced Wasserstein(RQSW)variant具有优秀的性能,可以应用于多种三维任务,如点云比较、点云插值、图像风格传递和深度点云自动编码器的训练。Abstract
Monte Carlo (MC) approximation has been used as the standard computation approach for the Sliced Wasserstein (SW) distance, which has an intractable expectation in its analytical form. However, the MC method is not optimal in terms of minimizing the absolute approximation error. To provide a better class of empirical SW, we propose quasi-sliced Wasserstein (QSW) approximations that rely on Quasi-Monte Carlo (QMC) methods. For a comprehensive investigation of QMC for SW, we focus on the 3D setting, specifically computing the SW between probability measures in three dimensions. In greater detail, we empirically verify various ways of constructing QMC points sets on the 3D unit-hypersphere, including Gaussian-based mapping, equal area mapping, generalized spiral points, and optimizing discrepancy energies. Furthermore, to obtain an unbiased estimation for stochastic optimization, we extend QSW into Randomized Quasi-Sliced Wasserstein (RQSW) by introducing randomness to the discussed low-discrepancy sequences. For theoretical properties, we prove the asymptotic convergence of QSW and the unbiasedness of RQSW. Finally, we conduct experiments on various 3D tasks, such as point-cloud comparison, point-cloud interpolation, image style transfer, and training deep point-cloud autoencoders, to demonstrate the favorable performance of the proposed QSW and RQSW variants.
摘要
蒙特卡洛(MC)近似一直被用作切片Wasserstein(SW)距离的标准计算方法,因为其解析形式中的期望难以直接求解。然而,MC方法在最小化绝对近似误差方面并非最优。为了提供更好的一类经验SW,我们提出基于准蒙特卡洛(QMC)方法的准切片Wasserstein(QSW)近似。为了全面考察QMC在SW计算中的应用,我们聚焦于三维设置,即计算三维概率测度之间的SW。更具体地,我们在三维单位球面上实证检验了多种QMC点集的构造方式,包括基于高斯的映射、等面积映射、广义螺旋点以及优化差异能量。此外,为了在随机优化中获得无偏估计,我们通过在上述低差异序列中引入随机性,将QSW扩展为随机化准切片Wasserstein(RQSW)。在理论性质方面,我们证明了QSW的渐近收敛性以及RQSW的无偏性。最后,我们在多种三维任务上进行了实验,如点云比较、点云插值、图像风格迁移和训练深度点云自编码器,以展示所提出的QSW和RQSW变体的优良性能。
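To make the MC-vs-QMC contrast above concrete, here is a small numerical sketch that estimates the sliced Wasserstein distance between two 3D point clouds, once with uniformly random projection directions (plain Monte Carlo) and once with a deterministic generalized-spiral point set on the unit sphere (one of the QMC constructions mentioned in the abstract). The spiral construction and the toy point clouds are simplified assumptions for illustration, not the authors' exact implementation.

```python
import numpy as np

def sw_distance(X, Y, directions):
    """Empirical sliced Wasserstein-2 distance between equal-size point clouds X, Y (n x 3)."""
    vals = []
    for theta in directions:
        px, py = np.sort(X @ theta), np.sort(Y @ theta)       # 1D projections
        vals.append(np.mean((px - py) ** 2))                   # 1D W2^2 via sorted matching
    return np.sqrt(np.mean(vals))

def random_directions(L, rng):
    v = rng.normal(size=(L, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def spiral_directions(L):
    """Generalized spiral-like points on the unit sphere (a simple low-discrepancy construction)."""
    k = np.arange(1, L + 1)
    z = (2 * k - L - 1) / L                                    # evenly spaced heights
    phi = np.sqrt(L * np.pi) * np.arcsin(z)                    # spiral longitude
    r = np.sqrt(1 - z ** 2)
    return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
Y = rng.normal(size=(500, 3)) + np.array([1.0, 0.0, 0.0])      # shifted cloud

print("MC  SW:", sw_distance(X, Y, random_directions(64, rng)))
print("QMC SW:", sw_distance(X, Y, spiral_directions(64)))
```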
methods: 这篇论文使用了 Contextual Linear Setting 和奖励通信协议(Inc-FedUCB),实际验证了这个方法在不同环境下的效果。
results: 这篇论文的实验结果显示,这个方法可以在不同的数据集和环境下获得近乎最佳的 regret 性能和通信成本保证。Abstract
Most existing works on federated bandits take it for granted that all clients are altruistic about sharing their data with the server for the collective good whenever needed. Despite their compelling theoretical guarantee on performance and communication efficiency, this assumption is overly idealistic and oftentimes violated in practice, especially when the algorithm is operated over self-interested clients, who are reluctant to share data without explicit benefits. Negligence of such self-interested behaviors can significantly affect the learning efficiency and even the practical operability of federated bandit learning. In light of this, we aim to spark new insights into this under-explored research area by formally introducing an incentivized communication problem for federated bandits, where the server shall motivate clients to share data by providing incentives. Without loss of generality, we instantiate this bandit problem with the contextual linear setting and propose the first incentivized communication protocol, namely, Inc-FedUCB, that achieves near-optimal regret with provable communication and incentive cost guarantees. Extensive empirical experiments on both synthetic and real-world datasets further validate the effectiveness of the proposed method across various environments.
摘要
大多数现有的联邦bandit研究都默认所有客户端都愿意在需要时无私地与服务器共享数据以实现共同利益。尽管这类方法在性能和通信效率上具有吸引人的理论保证,但这一假设过于理想化,在实践中常常不成立,尤其是当算法运行在自利的客户端上时——这些客户端在没有明确收益的情况下不愿意共享数据。忽视这种自利行为会显著影响联邦bandit学习的学习效率,甚至影响其实际可操作性。有鉴于此,我们正式引入了联邦bandit的激励通信问题,即服务器需要通过提供激励来促使客户端共享数据,以期为这一尚未充分探索的研究方向带来新的见解。不失一般性,我们在上下文线性设置下实例化了该bandit问题,并提出了首个激励通信协议Inc-FedUCB,该协议可以实现近似最优的regret,并具有可证明的通信与激励成本保证。在合成数据和真实数据上进行的大量实验进一步验证了所提方法在多种环境下的有效性。
results: 该论文通过质量和量度分析表明,ISLAND方法在不同的云幕干扰和地面覆盖条件下具有良好的重建性能,并且具有高空间-时间分辨率。 authors还提供了20个美国城市的公共数据集,以便用于证明ISLAND方法的可行性和应用性。Abstract
Cloud occlusion is a common problem in the field of remote sensing, particularly for thermal infrared imaging. Remote sensing thermal instruments onboard operational satellites are supposed to enable frequent and high-resolution observations over land; unfortunately, clouds adversely affect thermal signals by blocking outgoing longwave radiation emission from Earth's surface, interfering with the retrieved ground emission temperature. Such cloud contamination severely reduces the set of serviceable thermal images for downstream applications, making it impractical to perform intricate time-series analysis of land surface temperature (LST). In this paper, we introduce a novel method to remove cloud occlusions from Landsat 8 LST images. We call our method ISLAND, an acronym for Informing Brightness and Surface Temperature Through a Land Cover-based Interpolator. Our approach uses thermal infrared images from Landsat 8 (at 30 m resolution with 16-day revisit cycles) and the NLCD land cover dataset. Inspired by Tobler's first law of Geography, ISLAND predicts occluded brightness temperature and LST through a set of spatio-temporal filters that perform distance-weighted spatio-temporal interpolation. A critical feature of ISLAND is that the filters are land cover-class aware, making it particularly advantageous in complex urban settings with heterogeneous land cover types and distributions. Through qualitative and quantitative analysis, we show that ISLAND achieves robust reconstruction performance across a variety of cloud occlusion and surface land cover conditions, and with a high spatio-temporal resolution. We provide a public dataset of 20 U.S. cities with pre-computed ISLAND thermal infrared and LST outputs. Using several case studies, we demonstrate that ISLAND opens the door to a multitude of high-impact urban and environmental applications across the continental United States.
摘要
云层遮挡是远程感知领域中常见的问题,尤其是对于thermal infrared成像。远程感知thermal仪器装载在运行的卫星上,旨在实现频繁和高分辨率的地表观测;然而,云层会阻挡地表发射的长波辐射,使得抽取地表温度的热成像受到抑制,从而减少可用的热成像数据,使得无法进行复杂的时间序分析。在这篇文章中,我们介绍了一种新的云层遮挡除去方法,称为ISLAND(表示地表温度和辐射通过地域涂抹 interpolator)。我们的方法使用卫星8的热红外成像(分辨率30米,复杂周期16天)和NLCD地表覆盖数据。受到 Tobler's first law of Geography 的激发,ISLAND 预测云层遮挡的明亮温度和地表温度通过一系列的空间时间滤波来实现。我们的方法的一个关键特点是滤波是根据地表覆盖类型进行地域涂抹,这使得它在复杂的城市环境中具有优势。通过质量和量化分析,我们显示了ISLAND 在多种云层遮挡和地表覆盖条件下具有强健的重建性,并且具有高空间时间分辨率。我们提供了20个美国城市的前计算ISLAND 热红外成像和地表温度输出数据。通过多个案例研究,我们示出了ISLAND 可以开启许多高影响的城市和环境应用程序,覆盖整个北美大陆。
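The core interpolation idea in the ISLAND abstract, distance-weighted filling that only borrows from pixels of the same land cover class, can be sketched in a few lines. The example below is a simplified, purely spatial inverse-distance-weighted gap filler over a toy LST grid with a synthetic land cover map; it is not the authors' pipeline and ignores the temporal dimension and the Landsat/NLCD specifics.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W = 64, 64
land_cover = rng.integers(0, 3, size=(H, W))               # toy NLCD-like class map (3 classes)
lst = 290 + 5 * land_cover + rng.normal(0, 0.5, (H, W))    # toy LST, class-dependent baseline
cloud_mask = rng.random((H, W)) < 0.3                      # True where occluded by cloud

def fill_pixel(i, j, lst, land_cover, cloud_mask, power=2.0, max_neighbors=50):
    """Inverse-distance-weighted estimate using only clear pixels of the same land cover class."""
    same_class = (land_cover == land_cover[i, j]) & ~cloud_mask
    ys, xs = np.nonzero(same_class)
    if ys.size == 0:
        return np.nan
    d2 = (ys - i) ** 2 + (xs - j) ** 2
    order = np.argsort(d2)[:max_neighbors]
    w = 1.0 / (d2[order] ** (power / 2) + 1e-6)
    return np.sum(w * lst[ys[order], xs[order]]) / np.sum(w)

filled = lst.copy()
for i, j in zip(*np.nonzero(cloud_mask)):
    filled[i, j] = fill_pixel(i, j, lst, land_cover, cloud_mask)

err = np.abs(filled[cloud_mask] - lst[cloud_mask])          # error on the withheld occluded pixels
print("mean abs error (K):", float(err.mean()))
```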
Bloch Equation Enables Physics-informed Neural Network in Parametric Magnetic Resonance Imaging
methods: 提出使用物理规则embedded into the loss of physics-informed neural network(PINN)来学习 Bloch equation,并且通过这种方法来估算T2参数和生成physically synthetic data。
results: 在phantom和cardiac imaging中进行了实验,并得到了这种方法的潜在应用于量化MRI中的可能性。Abstract
Magnetic resonance imaging (MRI) is an important non-invasive imaging method in clinical diagnosis. Beyond common structural images, parametric imaging can provide intrinsic tissue properties and can thus be used for quantitative evaluation. Emerging deep learning approaches provide fast and accurate parameter estimation but still suffer from limited network interpretability and insufficient training data. Even with a large amount of training data, the mismatch between the training and target data may introduce errors. Here, we propose an approach that relies solely on the target scanned data and does not need a pre-defined training database. We provide a proof-of-concept that embeds the physical rule of MRI, the Bloch equation, into the loss of a physics-informed neural network (PINN). PINN enables learning the Bloch equation, estimating the T2 parameter, and generating a series of physically synthetic data. Experiments are conducted on phantom and cardiac imaging to demonstrate its potential in quantitative MRI.
摘要
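As a hedged, minimal sketch of the idea in the abstract above — embedding the Bloch relaxation law into a physics-informed loss while treating T2 as a learnable quantity — the toy PyTorch example below fits a small network to synthetic decay data and penalizes violations of dM/dt + M/T2 = 0. The network size, optimizer settings, and synthetic data are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class MNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(),
                                 nn.Linear(32, 32), nn.Tanh(), nn.Linear(32, 1))
    def forward(self, t):
        return self.net(t)

# Synthetic "scanned" data: S(t) = S0 * exp(-t / T2_true) + noise  (assumed, for illustration)
torch.manual_seed(0)
T2_true, S0 = 0.08, 1.0
t_data = torch.linspace(0.01, 0.3, 20).unsqueeze(1)
s_data = S0 * torch.exp(-t_data / T2_true) + 0.01 * torch.randn_like(t_data)

model = MNet()
log_T2 = torch.nn.Parameter(torch.log(torch.tensor(0.05)))   # learnable T2, log-parametrized for positivity
opt = torch.optim.Adam(list(model.parameters()) + [log_T2], lr=1e-2)

t_col = torch.linspace(0.0, 0.3, 100).unsqueeze(1).requires_grad_(True)  # collocation points
for step in range(2000):
    opt.zero_grad()
    loss_data = ((model(t_data) - s_data) ** 2).mean()               # data-consistency term
    M = model(t_col)
    dM_dt = torch.autograd.grad(M.sum(), t_col, create_graph=True)[0]
    loss_phys = ((dM_dt + M / torch.exp(log_T2)) ** 2).mean()        # Bloch transverse-relaxation residual
    (loss_data + loss_phys).backward()
    opt.step()

print("estimated T2:", torch.exp(log_T2).item(), "true T2:", T2_true)
```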
paper_authors: Zhou Zhang, Saman Atapattu, Yizhu Wang, Marco Di Renzo
for: 提高分布式网络中多用户通道访问的优化
methods: 基于可配置智能面(RIS)的分布式CSMA/CA策略,包括机会检测和避免冲突
results: 提出了一种优化的分布式CSMA/CA策略,可以 maximize 系统吞吐量,并且在数据分析和实验验证中表现出色,与现有方法相比表现更优。Abstract
This paper focuses on achieving optimal multi-user channel access in distributed networks using a reconfigurable intelligent surface (RIS). The network includes wireless channels with direct links between users and RIS links connecting users to the RIS. To maximize average system throughput, an optimal channel access strategy is proposed, considering the trade-off between exploiting spatial diversity gain with RIS assistance and the overhead of channel probing. The paper proposes an optimal distributed Carrier Sense Multiple Access with Collision Avoidance (CSMA/CA) strategy with opportunistic RIS assistance, based on statistics theory of optimal sequential observation planned decision. Each source-destination pair makes decisions regarding the use of direct links and/or probing source-RIS-destination links. Channel access occurs in a distributed manner after successful channel contention. The optimality of the strategy is rigorously derived using multiple-level pure thresholds. A distributed algorithm, which achieves significantly lower online complexity at $O(1)$, is developed to implement the proposed strategy. Numerical simulations verify the theoretical results and demonstrate the superior performance compared to existing approaches.
摘要
这篇论文关注利用可重构智能表面(RIS)在分布式网络中实现最优的多用户信道接入。网络中既包含用户之间的直连无线链路,也包含将用户与RIS相连的RIS链路。为了最大化系统平均吞吐量,本文提出了一种最优的信道接入策略,权衡了借助RIS获得空间分集增益与信道探测开销之间的折中。基于最优序贯观测计划决策的统计理论,论文提出了一种带有机会式RIS辅助的最优分布式CSMA/CA策略:每个源-目的节点对自行决定使用直连链路和/或探测源-RIS-目的链路,信道接入在成功的信道竞争之后以分布式方式进行。该策略的最优性通过多级纯阈值得到严格推导。论文还设计了一种实现该策略的分布式算法,其在线复杂度显著降低,仅为$O(1)$。数值仿真验证了理论结果,并展示了相较于现有方法的更优性能。
Deep Reinforcement Learning for Backscatter Communications: Augmenting Intelligence in Future Internet of Things
results: 研究表明,DRL可以帮助BC系统提高性能和可靠性,同时也可以减少能耗。一个使用RIS增强非对称多access BC系统的实践案例也被详细探讨,以 highlight its potential。Abstract
Backscatter communication (BC) technology offers sustainable solutions for next-generation Internet-of-Things (IoT) networks, where devices can transmit data by reflecting and adjusting incident radio frequency signals. In parallel to BC, deep reinforcement learning (DRL) has recently emerged as a promising tool to augment intelligence and optimize low-powered IoT devices. This article commences by elucidating the foundational principles underpinning BC systems, subsequently delving into the diverse array of DRL techniques and their respective practical implementations. Subsequently, it investigates potential domains and presents recent advancements in the realm of DRL-BC systems. A use case of RIS-aided non-orthogonal multiple access BC systems leveraging DRL is meticulously examined to highlight its potential. Lastly, this study identifies and investigates salient challenges and proffers prospective avenues for future research endeavors.
摘要
反向散射通信(BC)技术可以为下一代物联网(IoT)网络提供可持续的解决方案:设备通过反射并调整入射的射频信号来传输数据。与此同时,深度强化学习(DRL)近年来成为增强智能、优化低功耗IoT设备的一种有前景的工具。本文首先阐述BC系统的基本原理,随后介绍多种DRL技术及其相应的实际实现;接着探讨DRL-BC系统的潜在应用领域并综述该方向的最新进展;并详细分析了一个利用DRL的RIS辅助非正交多址BC系统的用例,以展示其潜力。最后,本文指出并探讨了该方向的主要挑战,并展望了未来的研究方向。
Secure Degree of Freedom of Wireless Networks Using Collaborative Pilots
results: 研究发现了一些重要的结论,包括:a) 阶段1的SDoF相同于多用户ANECE和对等ANECE,但前者可能需要较少的时间槽数;b) 三个节点网络中的阶段2SDoF通常比对等ANECE更大;c) 两节点网络中使用修改后的ANECE,使用块形非零幂频道Matrix,可以提高总的SDoF。这些多用户ANECE和修改后的两节点ANECE在安全度量方面与每个节点使用给定数量的天线进行发送和接收是今天已知最佳的全双工协议。Abstract
A wireless network of full-duplex nodes/users, using anti-eavesdropping channel estimation (ANECE) based on collaborative pilots, can yield a positive secure degree-of-freedom (SDoF) regardless of the number of antennas an eavesdropper may have. This paper presents novel results on SDoF of ANECE by analyzing secret-key capacity (SKC) of each pair of nodes in a network of multiple collaborative nodes per channel coherence period. Each transmission session of ANECE has two phases: phase 1 is used for pilots, and phase 2 is used for random symbols. This results in two parts of SDoF of ANECE. Both lower and upper bounds on the SDoF of ANECE for any number of users are shown, and the conditions for the two bounds to meet are given. This leads to important discoveries, including: a) The phase-1 SDoF is the same for both multi-user ANECE and pair-wise ANECE while the former may require only a fraction of the number of time slots needed by the latter; b) For a three-user network, the phase-2 SDoF of all-user ANECE is generally larger than that of pair-wise ANECE; c) For a two-user network, a modified ANECE deploying square-shaped nonsingular pilot matrices yields a higher total SDoF than the original ANECE. The multi-user ANECE and the modified two-user ANECE shown in this paper appear to be the best full-duplex schemes known today in terms of SDoF subject to each node using a given number of antennas for both transmitting and receiving.
摘要
一个无线网络,由全双工节点/用户组成,使用反听抓取渠道估计(ANECE),可以获得一定的安全度量(SDoF),无论抓取者具有多少天线。这篇论文提出了新的SDoF结果,通过分析每对节点的秘密键容量(SKC),并分析每个通信会话的两个阶段:第一阶段用于测试,第二阶段用于随机符号。这导致了两个SDoF的部分,其中一个是第一阶段的SDoF,另一个是第二阶段的SDoF。这篇论文还提供了对SDoF的下界和上界,以及这两个界限之间的条件。这些结果包括:a) 第一阶段SDoF在多用户ANECE和对应的对抗式ANECE中是相同的,而后者可能需要更少的时间槽数;b) 对于三个用户网络,第二阶段SDoF的全用户ANECE通常大于对抗式ANECE的SDoF;c) 对于两个用户网络,使用方形非零幂测试矩阵的修改后ANECE可以获得更高的总SDoF,比原始ANECE更高。这些多用户ANECE和修改后的两用户ANECE在今天可能是使用给定数量天线的最佳全双工方案,从SDoF的角度来看。
Near Field Optimization Algorithm for Reconfigurable Intelligent Surface
results: 通过电磁动力学 simulations,研究人员发现该算法可以很有效地重新配置智能表面,使电磁波能够强制方向性地传递到点 interests。Abstract
A reconfigurable intelligent surface (RIS) is a wireless communication technology in which a reconfigurable surface, such as a wall or a building facade, adjusts its properties through an integrated optimization algorithm so as to optimize signal propagation for a given communication scenario. As the reconfiguration algorithm, the multidimensional optimization routines of the GNU Scientific Library were analyzed to evaluate how well the smart surface improves the quality of signal reception. The analysis was carried out through electrodynamic simulations based on the finite-difference time-domain method. These simulations demonstrate the efficiency of the algorithm in reconfiguring the RIS, which succeeds in focusing the electromagnetic waves remarkably well towards the point of interest.
摘要
可重构智能表面(RIS)是一种无线通信技术,它利用可重构的表面(如墙面或建筑物),通过集成的优化算法调整其属性,从而为给定的通信场景优化信号传播。作为重构算法,本文分析了GNU科学库(GNU Scientific Library)中的多维优化方法,以评估智能表面对信号接收质量的改善效果。该分析通过基于时域有限差分(FDTD)方法的电磁仿真进行。通过这些仿真可以观察到该算法在RIS重构中的效率,能够非常有效地将电磁波聚焦到目标点。
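A minimal numerical analogue of the optimization loop described above: each RIS element re-radiates the incident wave with a tunable phase, and a general-purpose multidimensional minimizer (here SciPy's Nelder-Mead, standing in for the GNU Scientific Library routines mentioned in the abstract) searches for the phase vector that maximizes the field magnitude at a point of interest. The geometry, carrier frequency, and free-space model are illustrative assumptions rather than the paper's FDTD setup.

```python
import numpy as np
from scipy.optimize import minimize

c = 3e8
f = 5.0e9                      # assumed carrier frequency (5 GHz)
k = 2 * np.pi * f / c          # wavenumber

# Assumed geometry: 4x4 RIS in the y-z plane, source and observation point in front of it.
dy = dz = 0.5 * c / f
elems = np.array([[0.0, iy * dy, iz * dz] for iy in range(4) for iz in range(4)])
src = np.array([-2.0, 0.3, 0.2])
obs = np.array([1.5, -0.4, 0.5])       # point of interest

d_in = np.linalg.norm(elems - src, axis=1)    # source -> element path lengths
d_out = np.linalg.norm(elems - obs, axis=1)   # element -> observation path lengths

def field_mag(phases):
    # Sum of re-radiated contributions, each delayed by propagation phase plus the element setting.
    contrib = np.exp(-1j * (k * (d_in + d_out) + phases)) / (d_in * d_out)
    return np.abs(contrib.sum())

res = minimize(lambda p: -field_mag(p), x0=np.zeros(len(elems)), method="Nelder-Mead",
               options={"maxiter": 20000, "maxfev": 20000, "xatol": 1e-4, "fatol": 1e-8})

print("field with zero phases  :", field_mag(np.zeros(len(elems))))
print("field after optimization:", field_mag(res.x))
# Sanity check: the known optimum aligns all contributions, i.e. phases = -k*(d_in + d_out) mod 2*pi.
print("analytic upper bound    :", np.sum(1.0 / (d_in * d_out)))
```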
RadYOLOLet: Radar Detection and Parameter Estimation Using YOLO and WaveLet
results: 根据我们的评估,RadYOLOLet 可以在不同的实验中,实现 100% 的 radar 探测精度,并且可以在干扰输入比例 (SINR) up to 16 dB 下运作正确。Abstract
Detection of radar signals without assistance from the radar transmitter is a crucial requirement for emerging and future shared-spectrum wireless networks like Citizens Broadband Radio Service (CBRS). In this paper, we propose a supervised deep learning-based spectrum sensing approach called RadYOLOLet that can detect low-power radar signals in the presence of interference and estimate the radar signal parameters. The core of RadYOLOLet is two different convolutional neural networks (CNN), RadYOLO and Wavelet-CNN, that are trained independently. RadYOLO operates on spectrograms and provides most of the capabilities of RadYOLOLet. However, it suffers from low radar detection accuracy in the low signal-to-noise ratio (SNR) regime. We develop Wavelet-CNN specifically to deal with this limitation of RadYOLO. Wavelet-CNN operates on continuous Wavelet transform of the captured signals, and we use it only when RadYOLO fails to detect any radar signal. We thoroughly evaluate RadYOLOLet using different experiments corresponding to different types of interference signals. Based on our evaluations, we find that RadYOLOLet can achieve 100% radar detection accuracy for our considered radar types up to 16 dB SNR, which cannot be guaranteed by other comparable methods. RadYOLOLet can also function accurately under interference up to 16 dB SINR.
摘要
在无雷达发射端协助的情况下探测雷达信号,是公民宽带无线电服务(CBRS)等新兴和未来共享频谱无线网络的一项关键需求。在本文中,我们提出了一种基于监督深度学习的频谱感知方法RadYOLOLet,它可以在干扰存在的情况下探测低功率雷达信号,并估计雷达信号参数。RadYOLOLet的核心是两个独立训练的卷积神经网络(CNN):RadYOLO和Wavelet-CNN。RadYOLO在频谱图上运行,提供了RadYOLOLet的大部分功能,但在低信噪比(SNR)情况下其雷达检测精度较低。为了解决这一局限,我们专门设计了Wavelet-CNN,它在捕获信号的连续小波变换上运行,且仅在RadYOLO未能检测到任何雷达信号时才被使用。我们针对不同类型的干扰信号对RadYOLOLet进行了全面评估。评估结果表明,对于所考虑的雷达类型,RadYOLOLet在SNR达到16 dB时可实现100%的雷达检测精度,这是其他同类方法无法保证的;同时,RadYOLOLet在SINR为16 dB的干扰下也能准确工作。
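To illustrate the two input representations mentioned above — a spectrogram for the RadYOLO-style branch and a continuous wavelet transform for the Wavelet-CNN branch — the snippet below builds both from a synthetic pulsed signal buried in noise. The sample rate, pulse parameters, and the hand-rolled complex Morlet wavelets are assumptions for demonstration only; the paper's actual front end and CNN architectures are not reproduced here.

```python
import numpy as np
from scipy.signal import spectrogram

fs = 1_000_000                      # assumed 1 MHz sample rate
t = np.arange(0, 0.01, 1 / fs)
# Assumed toy "radar" pulse train: 0.1 ms bursts every 2 ms at 120 kHz, buried in noise
sig = 0.05 * np.cos(2 * np.pi * 120_000 * t) * (np.mod(t, 2e-3) < 1e-4)
x = sig + 0.1 * np.random.randn(t.size)

# Representation 1: spectrogram (input to the RadYOLO-style detector)
f, tt, Sxx = spectrogram(x, fs=fs, nperseg=256, noverlap=128)

# Representation 2: continuous wavelet transform magnitude (input to the Wavelet-CNN branch),
# implemented here with hand-rolled complex Morlet wavelets to stay self-contained
def morlet(scale, w0=6.0, n=None):
    n = n or int(10 * scale)
    k = np.arange(-n // 2, n // 2)
    u = k / scale
    return np.exp(1j * w0 * u) * np.exp(-0.5 * u ** 2) / np.sqrt(scale)

scales = np.geomspace(2, 64, 24)
cwt_mag = np.stack([np.abs(np.convolve(x, morlet(s), mode="same")) for s in scales])
print(Sxx.shape, cwt_mag.shape)
```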
UAV Swarm Deployment and Trajectory for 3D Area Coverage via Reinforcement Learning
For: 本文旨在研究无人飞行器群(UAV群)的投放和轨迹计划,以满足三维(3D)场景中的无线通信服务。
Methods: 本文提出了层次群组织机制,以有效地服务大面积用户。问题转化为最小化UAV群的总轨迹损失。但问题具有非凸性质,因此将其拆分为用户聚类、UAV群停留点选择和群体轨迹确定。
Results: 本文采用Q学习算法加速求解效率。经过广泛的仿真,提出的机制和算法被证明优于其他相关方法。Abstract
Unmanned aerial vehicles (UAVs) are recognized as promising technologies for area coverage due to the flexibility and adaptability. However, the ability of a single UAV is limited, and as for the large-scale three-dimensional (3D) scenario, UAV swarms can establish seamless wireless communication services. Hence, in this work, we consider a scenario of UAV swarm deployment and trajectory to satisfy 3D coverage considering the effects of obstacles. In detail, we propose a hierarchical swarm framework to efficiently serve the large-area users. Then, the problem is formulated to minimize the total trajectory loss of the UAV swarm. However, the problem is intractable due to the non-convex property, and we decompose it into smaller issues of users clustering, UAV swarm hovering points selection, and swarm trajectory determination. Moreover, we design a Q-learning based algorithm to accelerate the solution efficiency. Finally, we conduct extensive simulations to verify the proposed mechanisms, and the designed algorithm outperforms other referred methods.
摘要
无人飞行器(UAV)因其灵活性和适应性被认为是实现区域覆盖的有望技术。然而,单个UAV的能力有限,而在大规模三维(3D)场景中,UAV群可以建立无缝的无线通信服务。因此,在这项工作中,我们考虑了UAV群的部署和轨迹规划,以在考虑障碍物影响的情况下满足3D覆盖的需求。具体而言,我们提出了一种层次化的群体框架,以高效地服务大面积用户;随后,将问题建模为最小化UAV群的总轨迹损失。然而,该问题具有非凸性质而难以直接求解,我们将其分解为用户聚类、UAV群悬停点选择和群体轨迹确定三个子问题。此外,我们设计了基于Q学习的算法以加快求解效率。最后,我们进行了大量仿真来验证所提机制,结果显示所设计的算法优于其他对比方法。
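A toy tabular Q-learning loop in the spirit of the trajectory-determination step described above: states are candidate hovering points on a small grid, actions are moves between points, and the reward trades off newly covered users against the distance flown. The grid, reward shaping, and hyperparameters are invented for illustration and are not the paper's formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points = 9                                        # candidate hovering points on a 3x3 grid
coords = np.array([(i, j) for i in range(3) for j in range(3)], dtype=float)
users_at = rng.integers(0, 20, size=n_points)       # users covered from each hovering point

alpha, gamma, eps = 0.2, 0.9, 0.1
Q = np.zeros((n_points, n_points))                  # Q[s, a]: value of flying from point s to point a

def reward(s, a, visited):
    gain = 0.0 if a in visited else float(users_at[a])     # no credit for re-covering a point
    dist = np.linalg.norm(coords[s] - coords[a])           # trajectory-loss term
    return gain - dist

for episode in range(2000):
    s, visited = 0, {0}
    for _ in range(5):                                      # short trajectory per episode
        a = rng.integers(n_points) if rng.random() < eps else int(np.argmax(Q[s]))
        r = reward(s, a, visited)
        Q[s, a] += alpha * (r + gamma * np.max(Q[a]) - Q[s, a])
        visited.add(a)
        s = a

# Greedy trajectory read off the learned Q-table (may revisit points; a toy simplification)
s, path = 0, [0]
for _ in range(4):
    s = int(np.argmax(Q[s]))
    path.append(s)
print("greedy hovering-point sequence:", path)
```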
Alteration of skeletal muscle energy metabolism assessed by 31P MRS in clinical routine, part 2: Clinical application
paper_authors: Antoine Naëgel, Hélène Ratiney, Jabrane Karkouri, Djahid Kennouche, Nicolas Royer, Jill M Slade, Jérôme Morel, Pierre Croisille, Magalie Viallon
For: This study aimed to evaluate the impact of an advanced quality control pipeline on dynamic 31P-MRS studies of two patient populations with different types of fatigue, COVID-19 and multiple sclerosis (MS).
Methods: The study used 31P-MRS on a 3T clinical MRI to collect data from 19 COVID-19 patients, 38 MS patients, and 40 healthy controls. The advanced quality control pipeline was applied to the selected patient cohorts to investigate its impact on clinical outcomes.
Results: The application of the quality control pipeline resulted in increased statistical power, changed the values of several outcome measures, and reduced variability. Significant differences were found between the two patient populations and healthy controls for several metabolite concentrations, including T1PCr and T1Pi for MS patients, and resting [PCr], [Pi], [ADP], [H2PO4-], and pH for COVID-19 patients. Additionally, the use of a fixed correction factor led to systematically higher estimated concentrations of PCr and Pi than when using individually corrected factors.Abstract
Background: In this second part of a two-part paper, we intend to demonstrate the impact of the previously proposed advanced quality control pipeline. To understand its benefit and challenge the proposed methodology in a real scenario, we chose to compare the outcome when applying it to the analysis of two patient populations with a significant but highly different types of fatigue: COVID19 and multiple sclerosis (MS). Experimental: 31P-MRS was performed on a 3T clinical MRI, in 19 COVID19 patients, 38 MS patients, and 40 matched healthy controls. Dynamic acquisitions using an MR-compatible ergometer ran over a rest(40s), exercise(2min), and a recovery phase(6min). Long and short TR acquisitions were also made at rest for T1 correction. The advanced data quality control pipeline presented in part 1 is applied to the selected patient cohorts to investigate its impact on clinical outcomes. We first used power and sample size analysis to estimate objectively the impact of adding QCS. Then, comparisons between patients and healthy control groups using validated QCS were performed using unpaired T-tests or Mann-Whitney tests (p<0.05).Results: The application of the QCS resulted in increased statistical power, changed the values of several outcome measures, and reduced variability (SD). A significant difference was found between the T1PCr and T1Pi of MS patients and healthy controls. Furthermore, the use of a fixed correction factor led to systematically higher estimated concentrations of PCr and Pi than when using individually corrected factors. We observed significant differences between the two patient populations and healthy controls for resting [PCr] -- MS only, [Pi], [ADP], [H2PO4-] and pH -- COVID19 only, and post-exercise [PCr],[Pi] and [H2PO4-] - MS only. The dynamic indicators $\tau$PCr, $\tau$Pi, ViPCr and Vmax were reduced for COVID19 and MS patients compared to controls. Conclusion: Our results show that QCS in dynamic 31P-MRS studies results in smaller data variability and therefore impacts study sample size and power. Although QCS resulted in discarded data and therefore reduced the acceptable data and subject numbers, this rigorous and unbiased approach allowed for proper assessment of muscle metabolites and metabolism in patient populations. The outcomes include an increased metabolite T1, which directly affect the T1 correction factor applied to the amplitudes of the metabolite, and a prolonged $\tau$PCr indicating reduced muscle oxidative capacity for patients with MS and COVID19.
摘要
Background: 在这篇两部分文章的第二部分中,我们希望展示此前提出的高级质量控制管道的影响。为了在真实场景中了解其益处并检验所提出的方法,我们选择对两类疲劳类型显著但差异很大的患者群体——COVID-19和多发性硬化症(MS)——的分析结果进行比较。Experimental: 我们在3T临床MRI设备上进行31P-MRS测量,对象包括19例COVID-19患者、38例MS患者和40例匹配的健康对照。动态采集使用MR兼容的测力计,依次经历静息(40秒)、运动(2分钟)和恢复(6分钟)三个阶段;同时还在静息状态下进行了长TR和短TR采集,用于T1校正。我们将第一部分提出的高级数据质量控制管道应用于所选患者队列,以考察其对临床结果的影响。我们首先通过统计功效与样本量分析,客观评估引入QCS的影响;随后使用经验证的QCS,采用非配对T检验或Mann-Whitney检验(p<0.05)对患者组与健康对照组进行比较。Results: 应用QCS提高了统计功效,改变了若干结果指标的数值,并减小了变异性(SD)。MS患者的T1PCr和T1Pi与健康对照之间存在显著差异。此外,使用固定校正因子所估计的PCr和Pi浓度系统性地高于使用个体校正因子的结果。我们观察到两类患者群体与健康对照之间的显著差异:静息[PCr]仅见于MS患者,静息[Pi]、[ADP]、[H2PO4-]和pH仅见于COVID-19患者,运动后[PCr]、[Pi]和[H2PO4-]的差异仅见于MS患者。与对照组相比,COVID-19和MS患者的动态指标$\tau$PCr、$\tau$Pi、ViPCr和Vmax均有所降低。Conclusion: 我们的结果表明,在动态31P-MRS研究中采用QCS可以减小数据变异性,从而影响研究的样本量与统计功效。虽然QCS会剔除部分数据,从而减少可用数据和受试者数量,但这种严谨且无偏的方法使我们能够对患者群体的肌肉代谢物和代谢状态进行恰当评估。相关结果包括代谢物T1的升高(它直接影响应用于代谢物幅值的T1校正因子),以及$\tau$PCr的延长,提示MS和COVID-19患者的肌肉氧化能力下降。
Index Modulation-based Information Harvesting for Far-Field RF Power Transfer
results: 研究结果表明,通过在现有的远场能量传输系统中应用IM技术,可以实现数据传输,特别在下一代物联网无线网络中表现出了明显的潜力。Abstract
While wireless information transmission (WIT) is evolving into its sixth generation (6G), maintaining terminal operations that rely on limited battery capacities has become one of the most paramount challenges for Internet-of-Things (IoT) platforms. In this respect, there exists a growing interest in energy harvesting technology from ambient resources, and wireless power transfer (WPT) can be the key solution towards enabling battery-less infrastructures referred to as zero-power communication technology. Indeed, eclectic integration approaches between WPT and WIT mechanisms are becoming a vital necessity to limit the need for replacing batteries. Beyond the conventional separation between data and power components of the emitted waveforms, as in simultaneous wireless information and power transfer (SWIPT) mechanisms, a novel protocol referred to as information harvesting (IH) has recently emerged. IH leverages existing WPT mechanisms for data communication by incorporating index modulation (IM) techniques on top of the existing far-field power transfer mechanism. In this paper, a unified framework for the IM-based IH mechanisms has been presented where the feasibility of various IM techniques are evaluated based on different performance metrics. The presented results demonstrate the substantial potential to enable data communication within existing far-field WPT systems, particularly in the context of next-generation IoT wireless networks.
摘要
随着无线信息传输(WIT)向第六代(6G)演进,如何维持依赖有限电池容量的终端运行,已成为物联网(IoT)平台面临的最重要挑战之一。因此,从周围环境资源中收集能量的技术受到越来越多的关注,而无线电能传输(WPT)有望成为实现无电池基础设施(即零功耗通信技术)的关键方案。事实上,将WPT与WIT机制进行多样化融合,以减少更换电池的需求,正变得愈发必要。不同于SWIPT等机制中将发射波形的数据分量与能量分量分离的传统做法,一种被称为信息收集(information harvesting, IH)的新协议最近被提出:它在现有的远场能量传输机制之上引入索引调制(IM)技术,从而利用现有WPT机制实现数据通信。本文提出了一个基于IM的IH机制统一框架,并依据不同的性能指标评估了多种IM技术的可行性。所给出的结果表明,该方案具有在现有远场WPT系统中实现数据通信的巨大潜力,尤其是在下一代IoT无线网络中。
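The index-modulation idea underlying IH — conveying bits through which resource is activated rather than through a conventional constellation — can be sketched very simply. Below, each symbol period activates one of N subcarriers of the power-transfer waveform, and the receiver recovers the bits by detecting the strongest subcarrier. The carrier count, energy normalization, and AWGN model are illustrative assumptions; the energy-harvesting circuitry itself is not modeled.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 8                              # subcarriers available to the power-transfer waveform
bits_per_symbol = int(np.log2(N))  # 3 bits conveyed per symbol purely by the active index

def im_encode(bits):
    """Map groups of log2(N) bits to an activation pattern (one active subcarrier per symbol)."""
    idx = bits.reshape(-1, bits_per_symbol) @ (1 << np.arange(bits_per_symbol)[::-1])
    symbols = np.zeros((idx.size, N))
    symbols[np.arange(idx.size), idx] = 1.0      # all transmit energy stays on the active tone
    return symbols, idx

def im_decode(received):
    """Non-coherent detection: pick the subcarrier carrying the most energy."""
    return np.argmax(np.abs(received), axis=1)

bits = rng.integers(0, 2, size=3000)
tx, idx_tx = im_encode(bits)
rx = tx + 0.5 * rng.normal(size=tx.shape)        # AWGN link
idx_rx = im_decode(rx)
print("symbol error rate:", np.mean(idx_rx != idx_tx))
```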
Multi-Passive/Active-IRS Enhanced Wireless Coverage: Deployment Optimization and Cost-Performance Trade-off
results: 研究人员通过调整PIRS/AIRS的数量和部署位置,以实现给定的信噪比(SNR)目标,同时尽量降低总部署成本。 simulation结果表明,提议的算法可以在困难的 combinatorial optimization 问题中做出优化的选择,并且在cost-performance trade-off中表现更好。Abstract
Both passive and active intelligent reflecting surfaces (IRSs) can be deployed in complex environments to enhance wireless network coverage by creating multiple blockage-free cascaded line-of-sight (LoS) links. In this paper, we study a multi-passive/active-IRS (PIRS/AIRS) aided wireless network with a multi-antenna base station (BS) in a given region. First, we divide the region into multiple non-overlapping cells, each of which may contain one candidate location that can be deployed with a single PIRS or AIRS. Then, we show several trade-offs between minimizing the total IRS deployment cost and enhancing the signal-to-noise ratio (SNR) performance over all cells via direct/cascaded LoS transmission with the BS. To reconcile these trade-offs, we formulate a joint multi-PIRS/AIRS deployment problem to select an optimal subset of all candidate locations for deploying IRS and also optimize the number of passive/active reflecting elements deployed at each selected location to satisfy a given SNR target over all cells, such that the total deployment cost is minimized. However, due to the combinatorial optimization involved, the formulated problem is difficult to be solved optimally. To tackle this difficulty, we first optimize the reflecting element numbers with given PIRS/AIRS deployed locations via sequential refinement, followed by a partial enumeration to determine the PIRS/AIRS locations. Simulation results show that our proposed algorithm achieves better cost-performance trade-offs than other baseline deployment strategies.
摘要
REM-U-net: Deep Learning Based Agile REM Prediction with Energy-Efficient Cell-Free Use Case
paper_authors: Hazem Sallouha, Shamik Sarkar, Enes Krijestorac, Danijela Cabric for:这篇论文是为了提出一种快速、准确地预测Radio环境地图(REM)的深度学习方法,以便优化无线网络部署、提高网络性能和有效地管理频率资源。methods:该论文使用了u-net网络,并在大规模3D地图 dataset上进行了训练。此外,文章还提出了数据处理步骤来进一步改进 REM 预测精度。results:论文在2023年IEEE ICASSP Signal Processing Grand Challenge中进行了评估,得到了0.045的 normalized root-mean-square error(RMSE)和14毫秒的平均运行时间。此外,文章还示出了在CF-mMIMO网络中预测 REM 的精度可以代替大规模的折射测量,从而减少能源消耗。Abstract
Radio environment maps (REMs) hold a central role in optimizing wireless network deployment, enhancing network performance, and ensuring effective spectrum management. Conventional REM prediction methods are either excessively time-consuming, e.g., ray tracing, or inaccurate, e.g., statistical models, limiting their adoption in modern inherently dynamic wireless networks. Deep-learning-based REM prediction has recently attracted considerable attention as an appealing, accurate, and time-efficient alternative. However, existing works on REM prediction using deep learning are either confined to 2D maps or use a limited dataset. In this paper, we introduce a runtime-efficient REM prediction framework based on u-nets, trained on a large-scale 3D maps dataset. In addition, data preprocessing steps are investigated to further refine the REM prediction accuracy. The proposed u-net framework, along with preprocessing steps, are evaluated in the context of the 2023 IEEE ICASSP Signal Processing Grand Challenge, namely, the First Pathloss Radio Map Prediction Challenge. The evaluation results demonstrate that the proposed method achieves an average normalized root-mean-square error (RMSE) of 0.045 with an average of 14 milliseconds (ms) runtime. Finally, we position our achieved REM prediction accuracy in the context of a relevant cell-free massive multiple-input multiple-output (CF-mMIMO) use case. We demonstrate that one can obviate consuming energy on large-scale fading measurements and rely on predicted REM instead to decide on which sleep access points (APs) to switch on in a CF-mMIMO network that adopts a minimum propagation loss AP switch ON/OFF strategy.
摘要
Radio 环境地图 (REM) 在无线网络部署、提高网络性能和有效spectrum管理中扮演中心角色。传统的 REM 预测方法是 either 过时 consume 时间 (如射线追踪) 或者不准确 (如统计模型),这限制了它们在现代自然动态无线网络中的采用。深度学习基于的 REM 预测在最近吸引了大量关注,因为它们是一种吸引人的、准确的和高效的替代方案。然而,现有的 REM 预测使用深度学习的工作都是 confined to 2D 地图或者使用有限的数据集。在这篇文章中,我们提出了一个高效的 REM 预测框架,基于 u-nets,在大规模 3D 地图数据集上进行训练。此外,我们也 investigate 了数据预处理步骤,以进一步精细化 REM 预测精度。我们的提出的 u-net 框架、预处理步骤和评估结果在 2023 IEEE ICASSP Signal Processing Grand Challenge 中进行了评估。结果表明,我们的方法在normalized root-mean-square error (RMSE) 方面 achieve 平均值为 0.045,并且平均运行时间为 14 毫秒。最后,我们将我们实现的 REM 预测精度与相关的 cell-free massive multiple-input multiple-output (CF-mMIMO) 应用场景进行比较。我们表明,可以不消耗大量的能源进行大规模的折射损失测量,而是可以依靠预测的 REM 来决定在 CF-mMIMO 网络中 Switch ON/OFF 的大量睡眠Access Points (APs)。
On the Performance Analysis of RIS-Empowered Communications Over Nakagami-m Fading
paper_authors: Dimitris Selimis, Kostas P. Peppas, George C. Alexandropoulos, Fotis I. Lazarakis
for: 研究了无线通信透过具备自适应智能面(RISs)的 nakagami-m 调频通道性能。
methods: 考虑了两种用于RIS的相位配置设计:一种随机配置,另一种基于相干相移。
results: 显示了对 binary 调变方案的停机概率、错误率和均质质量的单纯积分表达,并提出了精确的关键数据表示。Abstract
In this paper, we study the performance of wireless communications empowered by Reconfigurable Intelligent Surface (RISs) over Nakagami-m fading channels. We consider two phase configuration designs for the RIS, one random and another one based on coherent phase shifting. For both phase configuration cases, we present single-integral expressions for the outage probability and the bit error rate of binary modulation schemes, which can be efficiently evaluated numerically. In addition, we propose accurate closed-form approximations for the ergodic capacity of the considered system. For all considered metrics, we have also derived simple analytical expressions that become tight for large numbers of RIS reflecting elements. Numerically evaluated results compared with Monte Carlo simulations are presented in order to verify the correctness of the proposed analysis and showcase the impact of various system settings.
摘要
在这篇论文中,我们研究了基于可 configurable智能表面(RIS)的无线通信系统在 nakagami-m 折射通道上的性能。我们考虑了两种阶段配置设计 для RIS,一个是随机的,另一个是基于 coherent 相位调制。对于两种阶段配置情况,我们提供了单一积分表达式,可以高效地评估 numerically。此外,我们提出了准确的闭式表达式,用于评估系统的平均容量。对所有考虑的指标,我们还 deriv了简单的分析表达式,这些表达式在大量 RIS 反射元件时变得紧张。我们通过与 Monte Carlo 仿真结果进行比较,以验证我们的分析的正确性,并显示了不同系统设置对系统性能的影响。
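A quick Monte Carlo cross-check of the kind of quantity analyzed above: the outage probability of a RIS-assisted link over Nakagami-m fading under the two phase configurations considered in the paper (random phases vs. coherent phase shifting). Nakagami-m amplitudes are drawn via the Gamma distribution; the element count, m, transmit SNR, and threshold are placeholder values rather than the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
N, m, omega = 32, 2.0, 1.0            # RIS elements, Nakagami shape, spread
snr_tx_db, gamma_th_db = -20.0, 5.0   # assumed transmit SNR and outage threshold
trials = 200_000

def nakagami_amp(shape):
    # |h| ~ Nakagami(m, omega)  <=>  |h|^2 ~ Gamma(m, omega/m)
    return np.sqrt(rng.gamma(m, omega / m, size=shape))

a1 = nakagami_amp((trials, N))        # BS -> RIS element amplitudes
a2 = nakagami_amp((trials, N))        # RIS element -> user amplitudes

# Coherent phase shifting: the RIS cancels all phases, so the amplitudes add up.
g_coh = np.sum(a1 * a2, axis=1)

# Random phase configuration: contributions combine with uniform random phases.
phi = rng.uniform(0, 2 * np.pi, size=(trials, N))
g_rnd = np.abs(np.sum(a1 * a2 * np.exp(1j * phi), axis=1))

snr_tx = 10 ** (snr_tx_db / 10)
gamma_th = 10 ** (gamma_th_db / 10)
print("outage (coherent):", np.mean(snr_tx * g_coh ** 2 < gamma_th))
print("outage (random)  :", np.mean(snr_tx * g_rnd ** 2 < gamma_th))
```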
Near-Field Beam Training: Joint Angle and Range Estimation with DFT Codebook
results: 经过numerical simulations表明,提议方法可以大幅降低near-field beam training的训练开销和提高范围估计精度,与various benchmark schemes相比有显著优势Abstract
Prior works on near-field beam training have mostly assumed dedicated polar-domain codebook and on-grid range estimation, which, however, may suffer long training overhead and degraded estimation accuracy. To address these issues, we propose in this paper new and efficient beam training schemes with off-grid range estimation by using conventional discrete Fourier transform (DFT) codebook. Specifically, we first analyze the received beam pattern at the user when far-field beamforming vectors are used for beam scanning, and show an interesting result that this beam pattern contains useful user angle and range information. Then, we propose two efficient schemes to jointly estimate the user angle and range with the DFT codebook. The first scheme estimates the user angle based on a defined angular support and resolves the user range by leveraging an approximated angular support width, while the second scheme estimates the user range by minimizing a power ratio mean square error (MSE) to improve the range estimation accuracy. Finally, numerical simulations show that our proposed schemes greatly reduce the near-field beam training overhead and improve the range estimation accuracy as compared to various benchmark schemes.
摘要
以往的近场波束训练工作大多假设使用专门的极域(polar-domain)码本并在网格上估计距离,但这可能带来较长的训练开销并降低估计精度。为了解决这些问题,本文提出了新的高效波束训练方案,利用常规的离散傅里叶变换(DFT)码本进行网格外(off-grid)距离估计。具体而言,我们首先分析了采用远场波束成形向量进行波束扫描时用户侧接收到的波束图样,并得到一个有趣的结论:该波束图样中同时包含了有用的用户角度和距离信息。随后,我们提出了两种利用DFT码本联合估计用户角度与距离的高效方案:第一种方案基于所定义的角度支撑集估计用户角度,并利用近似的角度支撑宽度求解用户距离;第二种方案通过最小化功率比的均方误差(MSE)来估计用户距离,从而提高距离估计精度。最后,数值仿真表明,与多种基准方案相比,所提方案能显著降低近场波束训练开销并提高距离估计精度。
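The beam-sweeping step described above can be emulated with a small far-field simulation: a ULA base station sweeps the columns of a DFT codebook, and the index of the strongest received beam yields the user angle estimate. The paper additionally infers range from the shape of the received pattern in the near field, which this sketch does not reproduce; the array size, user angle, and noise level are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64                                   # BS antennas (half-wavelength ULA)
n = np.arange(N)

def steering(theta):
    return np.exp(1j * np.pi * n * np.sin(theta)) / np.sqrt(N)

# DFT codebook: column k points towards sin(theta) = 2k/N - 1
F = np.exp(1j * np.pi * np.outer(n, (2 * np.arange(N) / N - 1))) / np.sqrt(N)

theta_true = np.deg2rad(23.0)
h = steering(theta_true)                 # far-field LoS channel (unit gain, illustration only)

# Sweep all codewords and record the received amplitude for each beam
y = h.conj() @ F + 0.05 * (rng.normal(size=N) + 1j * rng.normal(size=N)) / np.sqrt(2)
k_hat = int(np.argmax(np.abs(y)))
theta_hat = np.arcsin(2 * k_hat / N - 1)

print(f"true angle {np.degrees(theta_true):.1f} deg, estimate {np.degrees(theta_hat):.1f} deg")
```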
Joint Beamforming for RIS Aided Full-Duplex Integrated Sensing and Uplink Communication
paper_authors: Yuan Guo, Yang Liu, Qingqing Wu, Xin Zeng, Qingjiang Shi
For: This paper studies the integrated sensing and communication (ISAC) technology in a full-duplex (FD) uplink communication system, with the aim of improving the uninterrupted target sensing and reducing self-interference (SI).
Methods: The paper employs reconfigurable intelligent surface (RIS) technology to improve the SI suppression and signal processing gain, and develops an iterative solution using convex optimization techniques such as majorization-minimization (MM) and penalty-dual-decomposition (PDD) to optimize all variables.
Results: Numerical results demonstrate the effectiveness of the proposed solution and the great benefit of employing RIS in the FD ISAC system.Abstract
This paper studies integrated sensing and communication (ISAC) technology in a full-duplex (FD) uplink communication system. As opposed to the half-duplex system, where sensing is conducted in a first-emit-then-listen manner, FD ISAC system emits and listens simultaneously and hence conducts uninterrupted target sensing. Besides, impressed by the recently emerging reconfigurable intelligent surface (RIS) technology, we also employ RIS to improve the self-interference (SI) suppression and signal processing gain. As will be seen, the joint beamforming, RIS configuration and mobile users' power allocation is a difficult optimization problem. To resolve this challenge, via leveraging the cutting-the-edge majorization-minimization (MM) and penalty-dual-decomposition (PDD) methods, we develop an iterative solution that optimizes all variables via using convex optimization techniques. Numerical results demonstrate the effectiveness of our proposed solution and the great benefit of employing RIS in the FD ISAC system.
摘要
Semi-Supervised Variational Inference over Nonlinear Channels
methods: 这篇论文使用了基于变分推断的半监督学习方法,包括 Monte Carlo expectation maximization 和 variational autoencoder,以解码未知的非线性通信信道。
results: 这些方法可以充分利用少量的试验符号和数据payload,并且在充分多的数据payload情况下,variational autoencoder 也可以实现更低的错误率,比 meta learning 使用当前和前一个传输块的试验符号。Abstract
Deep learning methods for communications over unknown nonlinear channels have attracted considerable interest recently. In this paper, we consider semi-supervised learning methods, which are based on variational inference, for decoding unknown nonlinear channels. These methods, which include Monte Carlo expectation maximization and a variational autoencoder, make efficient use of few pilot symbols and the payload data. The best semi-supervised learning results are achieved with a variational autoencoder. For sufficiently many payload symbols, the variational autoencoder also has lower error rate compared to meta learning that uses the pilot data of the present as well as previous transmission blocks.
摘要
近年来,面向未知非线性信道通信的深度学习方法受到了广泛关注。在这篇论文中,我们考虑基于变分推断的半监督学习方法来解码未知的非线性信道。这些方法包括Monte Carlo期望最大化和变分自编码器,能够高效利用少量导频符号以及数据载荷。其中,变分自编码器取得了最佳的半监督学习结果;当数据载荷符号足够多时,相比于利用当前及此前传输块导频数据的元学习方法,变分自编码器还能获得更低的误码率。
A Comprehensive Study of PAPR Reduction Techniques for Deep Joint Source Channel Coding in OFDM Systems
paper_authors: Maolin Liu, Wei Chen, Jialong Xu, Bo Ai
for: 这篇论文主要针对的是深度联合源渠道编码(DJSCC)系统中的干扰率(PAPR)问题。
methods: 本论文使用了多种OFDM干扰率减少技术,包括传统技术such as clipping、companding、SLM和PTS,以及深度学习基于的PAPR减少技术such as PAPR损失和clipping with retraining。
results: 我们的调查发现,虽然传统的PAPR减少技术可以应用于DJSCC,但其性能与传统的分源渠道编码不同。此外,我们发现,对信号损害PAPR减少技术,clipping with retraining可以在DJSCC中实现最好的性能,并且不会对信号重建率产生负面影响。同时,对信号非损害PAPR减少技术可以成功地减少DJSCC中的PAPR,不会影响信号重建率。Abstract
Recently, deep joint source channel coding (DJSCC) techniques have been extensively studied and have shown significant performance with limited bandwidth and low signal to noise ratio. Most DJSCC work considers discrete-time analog transmission, while combining it with orthogonal frequency division multiplexing (OFDM) creates serious high peak-to-average power ratio (PAPR) problem. This paper conducts a comprehensive analysis on the use of various OFDM PAPR reduction techniques in the DJSCC system, including both conventional techniques such as clipping, companding, SLM and PTS, and deep learning-based PAPR reduction techniques such as PAPR loss and clipping with retraining. Our investigation shows that although conventional PAPR reduction techniques can be applied to DJSCC, their performance in DJSCC is different from the conventional split source channel coding. Moreover, we observe that for signal distortion PAPR reduction techniques, clipping with retraining achieves the best performance in terms of both PAPR reduction and recovery accuracy. It is also noticed that signal non-distortion PAPR reduction techniques can successfully reduce the PAPR in DJSCC without compromise to signal reconstruction.
摘要
近来,深度联合信源信道编码(DJSCC)技术得到了广泛研究,并在带宽受限和低信噪比条件下展现出显著的性能。大多数DJSCC工作考虑的是离散时间模拟传输,而将其与正交频分复用(OFDM)相结合会带来严重的高峰均功率比(PAPR)问题。本文对DJSCC系统中各种OFDM PAPR抑制技术的使用进行了全面分析,既包括限幅(clipping)、压扩(companding)、SLM和PTS等传统技术,也包括PAPR损失函数和限幅后重训练(clipping with retraining)等基于深度学习的PAPR抑制技术。我们的研究表明,虽然传统PAPR抑制技术可以应用于DJSCC,但其在DJSCC中的表现与在传统分离信源信道编码中不同。此外,我们观察到在引入信号失真的PAPR抑制技术中,限幅后重训练在PAPR抑制和恢复精度两方面均取得了最佳性能;而不引入信号失真的PAPR抑制技术则可以在不影响信号重建的前提下成功降低DJSCC中的PAPR。
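For reference, the two quantities at the heart of the study above — the PAPR of an OFDM symbol and the effect of amplitude clipping — can be computed in a few lines. The subcarrier count, oversampling, QPSK mapping, and clipping ratio are generic assumptions; the deep JSCC encoder itself is not modeled here.

```python
import numpy as np

rng = np.random.default_rng(0)
K, oversample = 256, 4                    # subcarriers and oversampling factor
qpsk = (rng.choice([-1, 1], K) + 1j * rng.choice([-1, 1], K)) / np.sqrt(2)

# Oversampled OFDM time-domain symbol (zero-padded IFFT)
X = np.zeros(K * oversample, dtype=complex)
X[:K // 2], X[-K // 2:] = qpsk[:K // 2], qpsk[K // 2:]
x = np.fft.ifft(X) * np.sqrt(K * oversample)

def papr_db(sig):
    p = np.abs(sig) ** 2
    return 10 * np.log10(p.max() / p.mean())

# Amplitude clipping at a fixed clipping ratio (CR) relative to the RMS level
cr_db = 3.0
a_max = 10 ** (cr_db / 20) * np.sqrt(np.mean(np.abs(x) ** 2))
mag = np.abs(x)
x_clip = x * np.minimum(1.0, a_max / np.maximum(mag, 1e-12))

print(f"PAPR before clipping: {papr_db(x):.2f} dB")
print(f"PAPR after  clipping: {papr_db(x_clip):.2f} dB")
print(f"clipping distortion (EVM): {np.linalg.norm(x - x_clip) / np.linalg.norm(x):.4f}")
```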
Quantum Circuits for Stabilizer Error Correcting Codes: A Tutorial
results: 论文验证了这些电路的正确性,并提供了使用IBM Qiskit进行验证的方法。Abstract
Quantum computers have the potential to provide exponential speedups over their classical counterparts. Quantum principles are being applied to fields such as communications, information processing, and artificial intelligence to achieve quantum advantage. However, quantum bits are extremely noisy and prone to decoherence. Thus, keeping the qubits error free is extremely important toward reliable quantum computing. Quantum error correcting codes have been studied for several decades and methods have been proposed to import classical error correcting codes to the quantum domain. However, circuits for such encoders and decoders haven't been explored in depth. This paper serves as a tutorial on designing and simulating quantum encoder and decoder circuits for stabilizer codes. We present encoding and decoding circuits for five-qubit code and Steane code, along with verification of these circuits using IBM Qiskit. We also provide nearest neighbour compliant encoder and decoder circuits for the five-qubit code.
摘要
量子计算机有望提供相对于经典计算机的指数级加速。量子原理正被应用于通信、信息处理和人工智能等领域,以实现量子优势。然而,量子比特噪声极大且容易发生退相干,因此保持量子比特无差错对于实现可靠的量子计算至关重要。量子纠错码已被研究了数十年,并已有将经典纠错码引入量子领域的方法。然而,这类编码器和解码器的电路尚未得到深入研究。这篇论文是关于稳定子码量子编码与解码电路设计和仿真的教程:我们给出了五量子比特码和Steane码的编码与解码电路,并使用IBM Qiskit对这些电路进行了验证;此外,我们还为五量子比特码提供了满足最近邻约束的编码器和解码器电路。
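As a small, concrete taste of the kind of circuit the tutorial above builds (though far simpler than the five-qubit or Steane encoders it presents), here is the encoder of the three-qubit bit-flip repetition code in Qiskit, together with a statevector check. Qiskit's `QuantumCircuit` and `Statevector` are used as in recent releases; this is an illustrative stand-in, not one of the paper's circuits.

```python
from qiskit import QuantumCircuit
from qiskit.quantum_info import Statevector

# Encoder for the 3-qubit bit-flip repetition code: |psi>|00> -> a|000> + b|111>
def repetition_encoder():
    qc = QuantumCircuit(3)
    qc.cx(0, 1)   # copy the logical qubit's basis value onto qubit 1
    qc.cx(0, 2)   # ... and onto qubit 2
    return qc

# Verify on the logical |+> state: prepare |+>|0>|0>, then encode
prep = QuantumCircuit(3)
prep.h(0)
circuit = prep.compose(repetition_encoder())
state = Statevector.from_instruction(circuit)
print(state)   # amplitudes 1/sqrt(2) on |000> and |111>, i.e. the logical |+>
```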
Collaborative Fault-Identification & Reconstruction in Multi-Agent Systems
methods: 基于sequential convex programming (SCP) 和 alternating direction method of multipliers (ADMM) 优化方法,实现分布式多 Agent FDIR算法。
results: 可以处理多 Agent 间测量(包括距离、方向、相对速度和夹角),确定faulty Agent 和重建其真实状态。Abstract
The conventional solutions for fault-detection, identification, and reconstruction (FDIR) require centralized decision-making mechanisms which are typically combinatorial in their nature, necessitating the design of an efficient distributed FDIR mechanism that is suitable for multi-agent applications. To this end, we develop a general framework for efficiently reconstructing a sparse vector being observed over a sensor network via nonlinear measurements. The proposed framework is used to design a distributed multi-agent FDIR algorithm based on a combination of the sequential convex programming (SCP) and the alternating direction method of multipliers (ADMM) optimization approaches. The proposed distributed FDIR algorithm can process a variety of inter-agent measurements (including distances, bearings, relative velocities, and subtended angles between agents) to identify the faulty agents and recover their true states. The effectiveness of the proposed distributed multi-agent FDIR approach is demonstrated by considering a numerical example in which the inter-agent distances are used to identify the faulty agents in a multi-agent configuration, as well as reconstruct their error vectors.
摘要
传统的瑕点检测、识别和重建(FDIR)解决方案通常需要中央决策机制,这些机制通常是 combinatorial 的性质,需要设计一种高效的分布式 FDIR 机制,适用于多机器人应用。为此,我们开发了一种高效地重建 sparse vector 在感知网络上被观察的框架。该框架基于 sequential convex programming (SCP) 和 alternating direction method of multipliers (ADMM) 优化方法来设计分布式多机器人 FDIR 算法。该算法可以处理多机器人之间的各种测量数据(包括距离、方向、相对速度和 agents 之间的夹角)来识别瑕点机器人并重建其真实状态。我们通过一个数学示例来证明提出的分布式多机器人 FDIR 方法的效果,在这个示例中,利用了机器人之间的距离测量来识别瑕点机器人和重建其错误向量。
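To make the sparse-reconstruction ingredient above more tangible, the sketch below recovers a sparse fault/error vector from linear measurements with a standard ADMM solver for the LASSO problem. It deliberately omits the paper's SCP handling of nonlinear inter-agent measurements and its distributed message passing; the matrices, sparsity level, and penalty parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_meas, n_agents, n_faulty = 60, 100, 4

A = rng.normal(size=(n_meas, n_agents)) / np.sqrt(n_meas)   # linearized measurement model
x_true = np.zeros(n_agents)
faulty = rng.choice(n_agents, n_faulty, replace=False)
x_true[faulty] = rng.normal(0, 2.0, n_faulty)                # error vectors of the faulty agents
b = A @ x_true + 0.01 * rng.normal(size=n_meas)

def admm_lasso(A, b, lam=0.05, rho=1.0, iters=300):
    n = A.shape[1]
    x = z = u = np.zeros(n)
    AtA, Atb = A.T @ A, A.T @ b
    L = np.linalg.cholesky(AtA + rho * np.eye(n))            # cached factorization for x-updates
    for _ in range(iters):
        x = np.linalg.solve(L.T, np.linalg.solve(L, Atb + rho * (z - u)))
        z = np.sign(x + u) * np.maximum(np.abs(x + u) - lam / rho, 0.0)   # soft threshold
        u = u + x - z
    return z

x_hat = admm_lasso(A, b)
print("true faulty agents      :", sorted(faulty.tolist()))
print("identified faulty agents:", np.nonzero(np.abs(x_hat) > 0.1)[0].tolist())
```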
Enhancing the SEFDM Performance in High-Doppler Channels
results: 该研究发现,使用 SEFDM 技术可以在移动通信频率延迟和Doppler偏移的环境中实现可靠和高质量的通信,并且可以保持传统通信系统的 spectral efficiency。Abstract
In this paper, we propose the use of Spectrally Efficient Frequency Division Multiplexing (SEFDM) with additional techniques such as Frequency Domain Cyclic Prefix (FDCP) and Modified Non-Linear (MNL) acceleration for efficient handling of the impact of delay and Doppler shift in mobile communication channels. Our approach exhibits superior performance and spectral efficiency in comparison to traditional communication systems, while maintaining low computational cost. We study a model of the SEFDM communication system and investigate the impact of MNL acceleration with soft and hard decision Inverse System on the performance of SEFDM detection in the AWGN channel. We also analyze the effectiveness of FDCP in compensating for the impact of Doppler shift, and report BER detection figures using Regularized Sphere Decoding in various simulation scenarios. Our simulations demonstrate that it is possible to achieve acceptable performance in Doppler channels while maintaining the superiority of SEFDM over OFDM in terms of spectral efficiency. The results suggest that our proposed approach can tackle the effects of delay and Doppler shift in mobile communication networks, guaranteeing dependable and high-quality communication even in extremely challenging environments.
摘要
在这篇论文中,我们提议使用具有频率分配多普雷斯特(SEFDM)的spectrally efficient frequency division multiplexing技术,并采用频域循环 prefix(FDCP)和修改非线性(MNL)加速技术来有效地处理移动通信频道中的延迟和Doppler偏移的影响。我们的方法在比较 tradicional communication systems的情况下表现出较高的性能和频率效率,同时保持低的计算成本。我们研究了SEFDM通信系统的模型,并investigate MNL加速器在SOFT和HARD decision inverse system中的影响。我们还分析了FDCP在补做Doppler偏移的效果,并report了在不同的 simulate scenario中的BER检测数据。我们的Simulations表明,可以在Doppler频道中实现可接受的性能,同时保持SEFDM在OFDM方面的优势。结果表明,我们提议的方法可以在移动通信网络中抵御延迟和Doppler偏移的影响,保证高质量和可靠的通信,even in extremely challenging environments。
for: 这篇论文专门针对受限制的量化 Communication system 中的orthogonal time frequency space (OTFS) 技术,以实现成本和功率的最优化。
methods: 论文使用了coarse quantization 和 signal recovery 算法,包括原始的approximate message passing (AMP) 和 generalized expectation consistent for signal recovery (GEC-SR)。
results: 论文提出了一种低复杂度的算法,即将 GEC-SR 算法与快速归一化的 quasi-banded matrices 结合,从而降低了计算复杂度从立方体积到线性积,保持了性能水平。Abstract
This paper explicitly models a coarse and noisy quantization in a communication system empowered by orthogonal time frequency space (OTFS) for cost and power efficiency. We first point out, with coarse quantization, the effective channel is imbalanced and thus no longer able to circularly shift the transmitted symbols along the delay-Doppler domain. Meanwhile, the effective channel is non-isotropic, which imposes a significant loss to symbol detection algorithms like the original approximate message passing (AMP). Although the algorithm of generalized expectation consistent for signal recovery (GEC-SR) can mitigate this loss, the complexity in computation is prohibitively high, mainly due to an dramatic increase in the matrix size of OTFS. In this context, we propose a low-complexity algorithm that incorporates into the GEC-SR a quick inversion of quasi-banded matrices, reducing the complexity from a cubic order to a linear order while keeping the performance at the same level.
摘要
In the system, the effective channel is imbalanced and non-isotropic due to coarse quantization, which leads to a significant loss in symbol detection algorithms such as the original approximate message passing (AMP). The GEC-SR algorithm can mitigate this loss, but the high computational complexity prohibits its use. The proposed algorithm addresses this issue by reducing the computational complexity while maintaining the performance.The key idea of the proposed algorithm is to incorporate a quick inversion of quasi-banded matrices into the GEC-SR method. This allows for a significant reduction in computational complexity, from a cubic order to a linear order, while maintaining the same performance. The proposed algorithm is designed to address the issues of coarse and noisy quantization in OTFS-based communication systems, and it has important implications for cost and power efficiency.
Systematic Design and Optimization of Quantum Circuits for Stabilizer Codes
results: 本文通过使用IBM Qiskit进行验证,提出了一种优化的八个量子比特(qubit)编码器,其中使用了18个CNOT门和4个 Hadamard门,相比之下,在先前的工作中只用了14个单量子门、33个二量子门和6个CCNOT门。此外,本文还提出了优化的斯坦内码编码器和13个量子比特编码器,以降低门数。Abstract
Quantum computing is an emerging technology that has the potential to achieve exponential speedups over their classical counterparts. To achieve quantum advantage, quantum principles are being applied to fields such as communications, information processing, and artificial intelligence. However, quantum computers face a fundamental issue since quantum bits are extremely noisy and prone to decoherence. Keeping qubits error free is one of the most important steps towards reliable quantum computing. Different stabilizer codes for quantum error correction have been proposed in past decades and several methods have been proposed to import classical error correcting codes to the quantum domain. However, formal approaches towards the design and optimization of circuits for these quantum encoders and decoders have so far not been proposed. In this paper, we propose a formal algorithm for systematic construction of encoding circuits for general stabilizer codes. This algorithm is used to design encoding and decoding circuits for an eight-qubit code. Next, we propose a systematic method for the optimization of the encoder circuit thus designed. Using the proposed method, we optimize the encoding circuit in terms of the number of 2-qubit gates used. The proposed optimized eight-qubit encoder uses 18 CNOT gates and 4 Hadamard gates, as compared to 14 single qubit gates, 33 2-qubit gates, and 6 CCNOT gates in a prior work. The encoder and decoder circuits are verified using IBM Qiskit. We also present optimized encoder circuits for Steane code and a 13-qubit code in terms of the number of gates used.
摘要
量子计算是一种新兴技术,有望实现相对于经典计算机的指数级加速。为了获得量子优势,量子原理正被应用于通信、信息处理和人工智能等领域。然而,量子计算机面临一个根本性问题:量子比特噪声极大且容易退相干,因此保持量子比特无差错是迈向可靠量子计算的最重要步骤之一。过去几十年中,人们提出了多种用于量子纠错的稳定子码,以及若干将经典纠错码引入量子领域的方法;但针对这些量子编码器和解码器电路的设计与优化,尚缺乏系统性的形式化方法。在这篇论文中,我们提出了一种系统化构造一般稳定子码编码电路的形式化算法,并用它设计了一个八量子比特码的编码与解码电路。随后,我们提出了一种对所设计编码器电路进行系统优化的方法,以减少所用两比特门的数量。经过优化,所提出的八量子比特编码器仅使用18个CNOT门和4个Hadamard门,而此前工作需要14个单比特门、33个两比特门和6个CCNOT门。编码器和解码器电路均已使用IBM Qiskit进行验证。此外,我们还在门数量方面给出了Steane码和一个13量子比特码的优化编码电路。
Deep Learning Meets Swarm Intelligence for UAV-Assisted IoT Coverage in Massive MIMO
for: This study is written for UAV-assisted multi-user massive multiple-input multiple-output (MU-mMIMO) systems, specifically for Internet-of-Things (IoT) users.
methods: The study uses a joint optimization problem of hybrid beamforming (HBF), UAV relay positioning, and power allocation (PA) to maximize the total achievable rate (AR) for multiple IoT users. The study also adopts a geometry-based millimeter-wave (mmWave) channel model for both links and proposes three different swarm intelligence (SI)-based algorithmic solutions to optimize.
results: The study shows that the proposed algorithmic solutions can attain higher capacity and reduce average delay for delay-constrained transmissions in UAV-assisted MU-mMIMO IoT systems. Additionally, the proposed J-HBF-DLLPA can closely approach the optimal capacity while significantly reducing the runtime by 99%, which makes the DL-based solution a promising implementation for real-time online applications in UAV-assisted MU-mMIMO IoT systems.
results: 研究表明,提出的算法解决方案可以在UAV协助MU-mMIMO IoT系统中实现更高的容量和减少延迟。此外,提出的J-HBF-DLLPA可以准确地预测UAV的位置和优化的功率值,以实现最大化AR。Abstract
This study considers a UAV-assisted multi-user massive multiple-input multiple-output (MU-mMIMO) systems, where a decode-and-forward (DF) relay in the form of an unmanned aerial vehicle (UAV) facilitates the transmission of multiple data streams from a base station (BS) to multiple Internet-of-Things (IoT) users. A joint optimization problem of hybrid beamforming (HBF), UAV relay positioning, and power allocation (PA) to multiple IoT users to maximize the total achievable rate (AR) is investigated. The study adopts a geometry-based millimeter-wave (mmWave) channel model for both links and proposes three different swarm intelligence (SI)-based algorithmic solutions to optimize: 1) UAV location with equal PA; 2) PA with fixed UAV location; and 3) joint PA with UAV deployment. The radio frequency (RF) stages are designed to reduce the number of RF chains based on the slow time-varying angular information, while the baseband (BB) stages are designed using the reduced-dimension effective channel matrices. Then, a novel deep learning (DL)-based low-complexity joint hybrid beamforming, UAV location and power allocation optimization scheme (J-HBF-DLLPA) is proposed via fully-connected deep neural network (DNN), consisting of an offline training phase, and an online prediction of UAV location and optimal power values for maximizing the AR. The illustrative results show that the proposed algorithmic solutions can attain higher capacity and reduce average delay for delay-constrained transmissions in a UAV-assisted MU-mMIMO IoT systems. Additionally, the proposed J-HBF-DLLPA can closely approach the optimal capacity while significantly reducing the runtime by 99%, which makes the DL-based solution a promising implementation for real-time online applications in UAV-assisted MU-mMIMO IoT systems.
摘要
To solve this optimization problem, the study proposes three different swarm intelligence (SI)-based algorithmic solutions: 1) UAV location with equal PA; 2) PA with fixed UAV location; and 3) joint PA with UAV deployment. The radio frequency (RF) stages are designed to reduce the number of RF chains based on slow time-varying angular information, while the baseband (BB) stages are designed using reduced-dimension effective channel matrices. Furthermore, a novel deep learning (DL)-based low-complexity joint hybrid beamforming, UAV location, and power allocation optimization scheme (J-HBF-DLLPA) is proposed. This scheme consists of an offline training phase and an online prediction of UAV location and optimal power values to maximize AR. The illustrative results show that the proposed algorithmic solutions can achieve higher capacity and reduce average delay for delay-constrained transmissions in a UAV-assisted MU-mMIMO IoT system. Additionally, the proposed J-HBF-DLLPA can closely approach the optimal capacity while significantly reducing the runtime by 99%, making it a promising implementation for real-time online applications in UAV-assisted MU-mMIMO IoT systems.
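The abstract describes an offline-trained fully-connected network that, at inference time, predicts the UAV location and the power values that maximize the achievable rate. Below is a minimal PyTorch sketch of that prediction step only; the layer sizes, the feature dimension, the number of users, and the class/variable names are illustrative assumptions, not the paper's actual J-HBF-DLLPA architecture.

```python
import torch
import torch.nn as nn

class UAVPowerPredictor(nn.Module):
    """Toy fully-connected network: channel/user features -> UAV (x, y, z) and per-user power split."""
    def __init__(self, feat_dim=32, num_users=4, hidden=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.loc_head = nn.Linear(hidden, 3)          # UAV position
        self.pa_head = nn.Linear(hidden, num_users)   # power-allocation logits

    def forward(self, x):
        h = self.backbone(x)
        loc = self.loc_head(h)
        # softmax keeps the power fractions non-negative and summing to one
        power = torch.softmax(self.pa_head(h), dim=-1)
        return loc, power

# Offline training would regress toward SI-optimized labels; here only a forward pass is shown.
model = UAVPowerPredictor()
features = torch.randn(8, 32)          # batch of 8 hypothetical channel-feature vectors
loc, power = model(features)
print(loc.shape, power.shape)          # torch.Size([8, 3]) torch.Size([8, 4])
```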
Resource Allocation for Semantic-Aware Mobile Edge Computing Systems
results: 通过几何规划将原始的非凸问题转化为凸问题,并使用交替优化算法求解,得到了最优解。此外,还推导出了语义提取因子的闭式最优解。与不采用语义感知分配的基准算法相比,所提算法可将最大执行延迟降低达37.10%。同时,在任务规模较大和信道条件较差的情况下,较小的语义提取因子更受青睐。Abstract
In this paper, a semantic-aware joint communication and computation resource allocation framework is proposed for mobile edge computing (MEC) systems. In the considered system, each terminal device (TD) has a computation task, which needs to be executed by offloading to the MEC server. To further decrease the transmission burden, each TD sends the small-size extracted semantic information of tasks to the server instead of the large-size raw data. An optimization problem of joint semantic-aware division factor, communication and computation resource management is formulated. The problem aims to minimize the maximum execution delay of all TDs while satisfying energy consumption constraints. The original non-convex problem is transformed into a convex one based on the geometric programming and the optimal solution is obtained by the alternating optimization algorithm. Moreover, the closed-form optimal solution of the semantic extraction factor is derived. Simulation results show that the proposed algorithm yields up to 37.10% delay reduction compared with the benchmark algorithm without semantic-aware allocation. Furthermore, small semantic extraction factors are preferred in the case of large task sizes and poor channel conditions.
摘要
在本文中,提出了一种面向移动边缘计算(MEC)系统的语义感知联合通信与计算资源分配框架。系统中每个终端设备(TD)都有一个计算任务,需要卸载到 MEC 服务器上执行。为了进一步减少传输负担,每个 TD 向服务器发送任务的小规模语义提取信息,而不是大规模的原始数据。文中构建了一个联合语义感知划分因子、通信和计算资源管理的优化问题,其目标是在满足能耗约束的前提下,最小化所有 TD 的最大执行延迟。基于几何规划,原始的非凸问题被转化为凸问题,并通过交替优化算法获得最优解。此外,还推导出了语义提取因子的闭式最优解。仿真结果表明,与不采用语义感知分配的基准算法相比,所提算法最多可减少 37.10% 的延迟。此外,在任务规模较大和信道条件较差的情况下,较小的语义提取因子更受青睐。
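To make the role of the semantic extraction factor concrete, here is a toy, single-device delay model and a grid search over the factor. This is not the paper's geometric-programming formulation; the delay terms (in particular the assumption that stronger extraction costs more local computation), all constants, and the function names are illustrative assumptions. It only reproduces the qualitative finding that smaller factors are preferred under poor channel conditions.

```python
import numpy as np

def execution_delay(rho, task_bits, rate_bps, cycles_per_bit, f_local, f_server):
    """Toy delay model for one terminal device.
    rho in (0, 1] is the semantic extraction factor (fraction of bits actually transmitted).
    Assumption: more aggressive extraction (smaller rho) costs more local computation."""
    t_extract = task_bits * cycles_per_bit / (rho * f_local)   # local semantic extraction
    t_uplink = rho * task_bits / rate_bps                      # only rho*D bits go over the air
    t_compute = rho * task_bits * cycles_per_bit / f_server    # edge server processes the compressed task
    return t_extract + t_uplink + t_compute

rhos = np.linspace(0.05, 1.0, 200)
for label, rate in [("poor channel (2 Mbps)", 2e6), ("good channel (20 Mbps)", 20e6)]:
    delays = [execution_delay(r, 5e6, rate, 100, 1e9, 10e9) for r in rhos]
    best = rhos[int(np.argmin(delays))]
    print(f"{label}: best extraction factor ~ {best:.2f}, delay ~ {min(delays):.2f} s")
```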
A class-weighted supervised contrastive learning long-tailed bearing fault diagnosis approach using quadratic neural network
paper_authors: Wei-En Yu, Jinwei Sun, Shiping Zhang, Xiaoge Zhang, Jing-Xiao Liao
for: 这篇论文旨在提升深度学习方法在轴承故障诊断中的表现,特别是在面对高度不均衡或长尾数据时。
methods: 论文提出了一种带类别感知损失函数的监督对比学习方法,通过类别权重提升神经网络在故障诊断中的特征提取能力。
results: 实验结果表明,与 state-of-the-art 方法相比,CCQNet 在高度不均衡或长尾数据下表现明显更好,能更准确地识别故障。Abstract
Deep learning has achieved remarkable success in bearing fault diagnosis. However, its performance oftentimes deteriorates when dealing with highly imbalanced or long-tailed data, while such cases are prevalent in industrial settings because fault is a rare event that occurs with an extremely low probability. Conventional data augmentation methods face fundamental limitations due to the scarcity of samples pertaining to the minority class. In this paper, we propose a supervised contrastive learning approach with a class-aware loss function to enhance the feature extraction capability of neural networks for fault diagnosis. The developed class-weighted contrastive learning quadratic network (CCQNet) consists of a quadratic convolutional residual network backbone, a contrastive learning branch utilizing a class-weighted contrastive loss, and a classifier branch employing logit-adjusted cross-entropy loss. By utilizing class-weighted contrastive loss and logit-adjusted cross-entropy loss, our approach encourages equidistant representation of class features, thereby inducing equal attention on all the classes. We further analyze the superior feature extraction ability of quadratic network by establishing the connection between quadratic neurons and autocorrelation in signal processing. Experimental results on public and proprietary datasets are used to validate the effectiveness of CCQNet, and computational results reveal that CCQNet outperforms SOTA methods in handling extremely imbalanced data substantially.
摘要
深度学习在轴承故障诊断中取得了显著成功。然而,在面对高度不均衡或长尾数据时,其表现往往会下降,而这类情况在工业场景中十分普遍,因为故障是发生概率极低的罕见事件。传统的数据增强方法因少数类样本的稀缺而存在根本性局限。在这篇论文中,我们提出了一种带类别感知损失函数的监督对比学习方法,以增强神经网络在故障诊断中的特征提取能力。所提出的类别加权对比学习二次网络(CCQNet)由一个二次卷积残差网络主干、一个采用类别加权对比损失的对比学习分支以及一个采用 logit 调整交叉熵损失的分类分支组成。通过类别加权对比损失和 logit 调整交叉熵损失,我们的方法促使各类特征获得等距表示,从而使网络对所有类别给予同等关注。我们还通过建立二次神经元与信号处理中自相关的联系,进一步分析了二次网络优越的特征提取能力。在公开数据集和自有数据集上的实验结果验证了 CCQNet 的有效性,计算结果表明 CCQNet 在处理极度不均衡数据方面显著优于现有最先进方法。
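The classifier branch described above uses a logit-adjusted cross-entropy loss, which is a standard long-tailed learning technique. A minimal PyTorch sketch of that loss is given below; the class-weighted contrastive branch is omitted, and the two-class setup, the prior values, and the temperature tau are made-up assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def logit_adjusted_ce(logits, targets, class_priors, tau=1.0):
    """Logit-adjusted cross-entropy: add tau*log(prior) to the logits so that
    rare (fault) classes are not drowned out by the majority (healthy) class."""
    adjustment = tau * torch.log(class_priors + 1e-12)
    return F.cross_entropy(logits + adjustment, targets)

# toy long-tailed setup: 95% healthy samples, 5% faulty samples
priors = torch.tensor([0.95, 0.05])
logits = torch.randn(16, 2)
targets = torch.randint(0, 2, (16,))
print(logit_adjusted_ce(logits, targets, priors).item())
```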
results: 研究结果显示,联合的远端与近端最小处理框架可以提高语音可懂度,并在有利的噪声条件下限制语音失真,使语音质量不会过度受损。Abstract
We consider speech enhancement for signals picked up in one noisy environment that must be rendered to a listener in another noisy environment. For both far-end noise reduction and near-end listening enhancement, it has been shown that excessive focus on noise suppression or intelligibility maximization may lead to excessive speech distortions and quality degradations in favorable noise conditions, where intelligibility is already at ceiling level. Recently [1,2] propose to remedy this with a minimum processing framework that either reduces noise or enhances listening a minimum amount given that a certain intelligibility criterion is still satisfied. Additionally, it has been shown that joint consideration of both environments improves speech enhancement performance. In this paper, we formulate a joint far- and near-end minimum processing framework, that improves intelligibility while limiting speech distortions in favorable noise conditions. We provide closed-form solutions to specific boundary scenarios and investigate performance for the general case using numerical optimization. We also show that concatenating existing minimum processing far- and near-end enhancement methods preserves the effects of the initial methods. Results show that the joint optimization can further improve performance compared to the concatenated approach.
摘要
我们考虑这样一种语音增强场景:在一个噪声环境中拾取的语音信号,需要呈现给处于另一个噪声环境中的听者。对于远端降噪和近端听觉增强,已有研究表明,过度追求噪声抑制或可懂度最大化,可能在可懂度已接近上限的有利噪声条件下导致过度的语音失真和质量下降。最近,[1,2] 提出了一种最小处理框架,在仍满足一定可懂度准则的前提下,只进行最小限度的降噪或听觉增强。此外,联合考虑两个环境已被证明可以提升语音增强性能。在这篇文章中,我们构建了一个联合远端与近端的最小处理框架,在有利噪声条件下提高可懂度的同时限制语音失真。我们针对特定边界情形给出了闭式解,并通过数值优化研究了一般情形下的性能。我们还证明,将现有的远端与近端最小处理增强方法级联起来,可以保留各自方法的效果。结果显示,联合优化相比级联方法可以进一步提升性能。
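To illustrate the minimum-processing idea (apply only as much processing as an intelligibility criterion requires), here is a heavily simplified numerical sketch. The sigmoid "intelligibility proxy", the target value, the gain bounds, and the penalty formulation are all toy assumptions standing in for the paper's actual criterion and closed-form/numerical solutions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def intelligibility_proxy(gain_db, snr_db):
    """Toy stand-in for an intelligibility criterion: a sigmoid of the effective SNR
    after applying `gain_db` of noise reduction. Not a real intelligibility metric."""
    return 1.0 / (1.0 + np.exp(-(snr_db + gain_db - 5.0) / 3.0))

def minimum_processing_gain(snr_db, target=0.9, max_gain_db=20.0):
    """Smallest amount of processing that still meets the intelligibility target."""
    objective = lambda g: g + 1e3 * max(0.0, target - intelligibility_proxy(g, snr_db))
    res = minimize_scalar(objective, bounds=(0.0, max_gain_db), method='bounded')
    return res.x

for snr in (-5, 0, 10):
    print(f"SNR {snr:+} dB -> apply ~{minimum_processing_gain(snr):.1f} dB of processing")
```

In favorable conditions (high SNR) the optimizer applies little processing, which is exactly the regime where excessive processing would only add distortion.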
Deep Complex U-Net with Conformer for Audio-Visual Speech Enhancement
results: 在 COG-MHEAR AVSE Challenge 2023 的基准模型基础上取得 0.14 的 PESQ 提升,在台湾官话语音视频数据集(TMSV)上与最先进模型相当,并在所有对比模型中表现最佳。Abstract
Recent studies have increasingly acknowledged the advantages of incorporating visual data into speech enhancement (SE) systems. In this paper, we introduce a novel audio-visual SE approach, termed DCUC-Net (deep complex U-Net with conformer network). The proposed DCUC-Net leverages complex domain features and a stack of conformer blocks. The encoder and decoder of DCUC-Net are designed using a complex U-Net-based framework. The audio and visual signals are processed using a complex encoder and a ResNet-18 model, respectively. These processed signals are then fused using the conformer blocks and transformed into enhanced speech waveforms via a complex decoder. The conformer blocks consist of a combination of self-attention mechanisms and convolutional operations, enabling DCUC-Net to effectively capture both global and local audio-visual dependencies. Our experimental results demonstrate the effectiveness of DCUC-Net, as it outperforms the baseline model from the COG-MHEAR AVSE Challenge 2023 by a notable margin of 0.14 in terms of PESQ. Additionally, the proposed DCUC-Net performs comparably to a state-of-the-art model and outperforms all other compared models on the Taiwan Mandarin speech with video (TMSV) dataset.
摘要
近年来的研究日益认可将视觉数据融入语音增强(SE)系统的优势。在这篇论文中,我们介绍了一种新的视听语音增强方法,称为 DCUC-Net(带 conformer 网络的深度复数 U-Net)。DCUC-Net 利用复数域特征和一组堆叠的 conformer 块。其编码器和解码器基于复数 U-Net 框架设计。音频和视频信号分别由复数编码器和 ResNet-18 模型处理,随后通过 conformer 块进行融合,并经复数解码器转换为增强后的语音波形。conformer 块由自注意力机制和卷积操作组合而成,使 DCUC-Net 能够有效捕捉全局和局部的视听依赖关系。实验结果表明,DCUC-Net 在 PESQ 指标上比 COG-MHEAR AVSE Challenge 2023 的基准模型高出 0.14,并在台湾官话语音视频数据集(TMSV)上与最先进模型相当,优于所有其他对比模型。
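The abstract describes conformer blocks that combine self-attention (global context) with convolution (local context). Below is a minimal, real-valued PyTorch sketch of such a block; DCUC-Net itself operates on complex-domain features and has a specific macro-architecture, so the dimensions, kernel size, and module ordering here are simplifying assumptions.

```python
import torch
import torch.nn as nn

class MiniConformerBlock(nn.Module):
    """Simplified conformer-style block: self-attention for global context,
    a depthwise convolution for local context, and a feed-forward module."""
    def __init__(self, dim=128, heads=4, kernel_size=15):
        super().__init__()
        self.norm_attn = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_conv = nn.LayerNorm(dim)
        self.dwconv = nn.Conv1d(dim, dim, kernel_size, padding=kernel_size // 2, groups=dim)
        self.norm_ffn = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.SiLU(), nn.Linear(4 * dim, dim))

    def forward(self, x):                       # x: (batch, time, dim)
        h = self.norm_attn(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        h = self.norm_conv(x).transpose(1, 2)   # (batch, dim, time) for Conv1d
        x = x + self.dwconv(h).transpose(1, 2)
        x = x + self.ffn(self.norm_ffn(x))
        return x

block = MiniConformerBlock()
fused = torch.randn(2, 100, 128)   # hypothetical fused audio-visual feature sequence
print(block(fused).shape)          # torch.Size([2, 100, 128])
```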
Ensembling Multilingual Pre-Trained Models for Predicting Multi-Label Regression Emotion Share from Speech
results: 测试集的spearman correlation coefficient为0.537,开发集的spearman correlation coefficient为0.524,两者都高于之前基于单语言数据的融合方法的研究结果(test集的spearman correlation coefficient为0.476,开发集的spearman correlation coefficient为0.470)。Abstract
Speech emotion recognition has evolved from research to practical applications. Previous studies of emotion recognition from speech have focused on developing models on certain datasets like IEMOCAP. The lack of data in the domain of emotion modeling emerges as a challenge to evaluate models in the other dataset, as well as to evaluate speech emotion recognition models that work in a multilingual setting. This paper proposes an ensemble learning to fuse results of pre-trained models for emotion share recognition from speech. The models were chosen to accommodate multilingual data from English and Spanish. The results show that ensemble learning can improve the performance of the baseline model with a single model and the previous best model from the late fusion. The performance is measured using the Spearman rank correlation coefficient since the task is a regression problem with ranking values. A Spearman rank correlation coefficient of 0.537 is reported for the test set, while for the development set, the score is 0.524. These scores are higher than the previous study of a fusion method from monolingual data, which achieved scores of 0.476 for the test and 0.470 for the development.
摘要
研究者们在演讲情感识别方面从研究阶段逐渐演化到实际应用。 previous studies on speech emotion recognition have focused on developing models on specific datasets such as IEMOCAP. However, the lack of data in the domain of emotion modeling poses a challenge to evaluate models on other datasets and to evaluate speech emotion recognition models that work in a multilingual setting. This paper proposes an ensemble learning approach to fuse the results of pre-trained models for speech emotion recognition. The models chosen accommodate multilingual data from English and Spanish. The results show that ensemble learning can improve the performance of the baseline model and the previous best model from late fusion. The performance is measured using the Spearman rank correlation coefficient, as the task is a regression problem with ranking values. The reported Spearman rank correlation coefficient for the test set is 0.537, while for the development set, the score is 0.524. These scores are higher than the previous study of a fusion method from monolingual data, which achieved scores of 0.476 for the test and 0.470 for the development.
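Since the challenge metric above is Spearman's rank correlation on regression outputs, a minimal sketch of late-fusion ensembling and its evaluation is shown below. The prediction values, the number of models, and the choice of simple averaging as the fusion rule are assumptions for illustration, not the paper's exact fusion scheme.

```python
import numpy as np
from scipy.stats import spearmanr

# hypothetical per-model predictions for 5 utterances on one emotion-share dimension
preds_model_a = np.array([0.10, 0.40, 0.35, 0.80, 0.20])
preds_model_b = np.array([0.15, 0.30, 0.45, 0.70, 0.25])
labels        = np.array([0.12, 0.33, 0.50, 0.75, 0.18])

# simple late-fusion ensemble: average the regressors' outputs
ensemble = (preds_model_a + preds_model_b) / 2

# Spearman's rank correlation only cares about the ranking of the predictions
rho, _ = spearmanr(ensemble, labels)
print(f"Spearman rho of the fused predictions: {rho:.3f}")
```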
Directional Source Separation for Robust Speech Recognition on Smart Glasses
results: 定向声源分离可以提高语音识别和说话人切换检测的性能,但对对话伙伴无效;将定向声源分离与 ASR 模型联合训练可以获得最佳的整体 ASR 性能。Abstract
Modern smart glasses leverage advanced audio sensing and machine learning technologies to offer real-time transcribing and captioning services, considerably enriching human experiences in daily communications. However, such systems frequently encounter challenges related to environmental noises, resulting in degradation to speech recognition and speaker change detection. To improve voice quality, this work investigates directional source separation using the multi-microphone array. We first explore multiple beamformers to assist source separation modeling by strengthening the directional properties of speech signals. In addition to relying on predetermined beamformers, we investigate neural beamforming in multi-channel source separation, demonstrating that automatic learning directional characteristics effectively improves separation quality. We further compare the ASR performance leveraging separated outputs to noisy inputs. Our results show that directional source separation benefits ASR for the wearer but not for the conversation partner. Lastly, we perform the joint training of the directional source separation and ASR model, achieving the best overall ASR performance.
摘要
现代智能眼镜利用先进的音频感知和机器学习技术提供实时转写与字幕服务,极大地丰富了人们日常交流的体验。然而,这类系统经常面临环境噪声的挑战,导致语音识别和说话人切换检测性能下降。为了提升语音质量,本工作研究了基于多麦克风阵列的定向声源分离。我们首先探讨了多种波束形成器,通过增强语音信号的方向性来辅助声源分离建模。除了依赖预先确定的波束形成器外,我们还研究了多通道声源分离中的神经波束形成,证明自动学习方向特性可以有效提高分离质量。随后,我们比较了使用分离输出与直接使用含噪输入时的 ASR 性能,结果表明定向声源分离对佩戴者的 ASR 有利,但对对话伙伴无效。最后,我们对定向声源分离与 ASR 模型进行了联合训练,取得了最佳的整体 ASR 性能。
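The "predetermined beamformer" idea mentioned above can be illustrated with a classic time-domain delay-and-sum beamformer. The sketch below is not the paper's neural beamformer or separation model; the microphone geometry, sampling rate, and steering direction are invented for the example.

```python
import numpy as np

def delay_and_sum(signals, mic_positions, direction, fs, c=343.0):
    """Time-domain delay-and-sum beamformer.
    signals: (num_mics, num_samples), mic_positions: (num_mics, 3) in metres,
    direction: unit vector pointing toward the desired source."""
    delays = mic_positions @ direction / c             # per-mic propagation delay (seconds)
    delays -= delays.min()                             # make all delays non-negative
    out = np.zeros(signals.shape[1])
    for sig, d in zip(signals, delays):
        shift = int(round(d * fs))
        out[shift:] += sig[:signals.shape[1] - shift]  # align and sum
    return out / len(signals)

# two hypothetical microphones 4 cm apart on a glasses frame, steered along the array axis
fs = 16000
mics = np.array([[0.00, 0.0, 0.0], [0.04, 0.0, 0.0]])
x = np.random.randn(2, fs)                             # stand-in multi-channel recording
y = delay_and_sum(x, mics, np.array([1.0, 0.0, 0.0]), fs)
print(y.shape)
```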
for: 这项研究旨在从平行文本录音中提取韵律层面的语音特征,并将其迁移到不同的 TTS 语音上,以实现更自然、更具表现力的语音朗读。
methods: 该研究使用了一种基于神经网络的 TTS 系统,并为其配备了韵律控制功能,以便在推理时更直接地塑造语音输出。
results: 研究表明,该系统可以从未见过的说话人的平行文本录音中精确地提取并迁移韵律,将其应用到不同的 TTS 语音上而不降低质量,同时保持目标 TTS 语音的身份;这一结论由一系列主观听测实验评估得出。
Modern neural TTS systems are capable of generating natural and expressive speech when provided with sufficient amounts of training data. Such systems can be equipped with prosody-control functionality, allowing for more direct shaping of the speech output at inference time. In some TTS applications, it may be desirable to have an option that guides the TTS system with an ad-hoc speech recording exemplar to impose an implicit fine-grained, user-preferred prosodic realization for certain input prompts. In this work we present a first-of-its-kind neural TTS system equipped with such functionality to transfer the prosody from a parallel text recording from an unseen speaker. We demonstrate that the proposed system can precisely transfer the speech prosody from novel speakers to various trained TTS voices with no quality degradation, while preserving the target TTS speakers' identity, as evaluated by a set of subjective listening experiments.
摘要
现代神经网络文本转语音(TTS)系统在具备充足训练数据时能够生成自然且富有表现力的语音。这些系统可以配备韵律控制功能,以便在推理时更直接地塑造语音输出。在某些 TTS 应用中,用户可能希望通过一段临时提供的语音示例来引导 TTS 系统,从而为特定输入文本施加隐式、细粒度且符合用户偏好的韵律表现。在这项工作中,我们介绍了一种首创的、具备此功能的神经 TTS 系统,它能够从未见过的说话人的平行文本录音中迁移韵律。我们通过一系列主观听测实验表明,该系统可以将新说话人的韵律精确迁移到多个已训练的 TTS 语音上而不损失质量,同时保持目标 TTS 说话人的身份。
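Prosody transfer systems like the one above typically condition on frame-level prosodic information from the reference recording. The sketch below only shows what such frame-level descriptors (pitch contour and energy) look like when extracted with librosa; the synthetic sine-wave "recording", the pitch range, and the choice of features are assumptions, and the paper's system learns its own prosody representation rather than using these handcrafted values.

```python
import numpy as np
import librosa

# Synthetic "reference recording": a 1-second 220 Hz tone standing in for real speech.
sr = 22050
t = np.linspace(0, 1.0, sr, endpoint=False)
y = 0.5 * np.sin(2 * np.pi * 220 * t)

# Frame-level prosodic descriptors commonly used for transfer: pitch contour and energy.
f0, voiced, _ = librosa.pyin(y, fmin=librosa.note_to_hz('C2'),
                             fmax=librosa.note_to_hz('C7'), sr=sr)
energy = librosa.feature.rms(y=y)[0]

print(f"median F0 ~ {np.nanmedian(f0):.1f} Hz over {len(f0)} frames")
print(f"energy frames: {len(energy)}")
```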
results: 研究发现,这三种方法中的 pose 代码含有显著的 appearance 信息,而且这些方法的分离效果并不够完善。Abstract
As 3D human pose estimation can now be achieved with very high accuracy in the supervised learning scenario, tackling the case where 3D pose annotations are not available has received increasing attention. In particular, several methods have proposed to learn image representations in a self-supervised fashion so as to disentangle the appearance information from the pose one. The methods then only need a small amount of supervised data to train a pose regressor using the pose-related latent vector as input, as it should be free of appearance information. In this paper, we carry out in-depth analysis to understand to what degree the state-of-the-art disentangled representation learning methods truly separate the appearance information from the pose one. First, we study disentanglement from the perspective of the self-supervised network, via diverse image synthesis experiments. Second, we investigate disentanglement with respect to the 3D pose regressor following an adversarial attack perspective. Specifically, we design an adversarial strategy focusing on generating natural appearance changes of the subject, and against which we could expect a disentangled network to be robust. Altogether, our analyses show that disentanglement in the three state-of-the-art disentangled representation learning frameworks if far from complete, and that their pose codes contain significant appearance information. We believe that our approach provides a valuable testbed to evaluate the degree of disentanglement of pose from appearance in self-supervised 3D human pose estimation.
摘要
As 3D human pose estimation 可以在超级vised learning scenario 中实现非常高的准确率,因此处理没有3D pose annotations的情况 receiving increasing attention. 特别是,several methods have proposed to learn image representations in a self-supervised fashion so as to disentangle the appearance information from the pose one. The methods then only need a small amount of supervised data to train a pose regressor using the pose-related latent vector as input, as it should be free of appearance information.In this paper, we carry out in-depth analysis to understand to what degree the state-of-the-art disentangled representation learning methods truly separate the appearance information from the pose one. First, we study disentanglement from the perspective of the self-supervised network, via diverse image synthesis experiments. Second, we investigate disentanglement with respect to the 3D pose regressor following an adversarial attack perspective. Specifically, we design an adversarial strategy focusing on generating natural appearance changes of the subject, and against which we could expect a disentangled network to be robust.Altogether, our analyses show that disentanglement in the three state-of-the-art disentangled representation learning frameworks is far from complete, and that their pose codes contain significant appearance information. We believe that our approach provides a valuable testbed to evaluate the degree of disentanglement of pose from appearance in self-supervised 3D human pose estimation.
Neural Image Compression Using Masked Sparse Visual Representation
results: 实验结果表明,M-AdaCode 方法可以在 JPEG-AI 标准数据集上实现更高的压缩率和更高的重建质量,并且可以在不同的传输比特率下进行负权补偿。Abstract
We study neural image compression based on the Sparse Visual Representation (SVR), where images are embedded into a discrete latent space spanned by learned visual codebooks. By sharing codebooks with the decoder, the encoder transfers integer codeword indices that are efficient and cross-platform robust, and the decoder retrieves the embedded latent feature using the indices for reconstruction. Previous SVR-based compression lacks effective mechanism for rate-distortion tradeoffs, where one can only pursue either high reconstruction quality or low transmission bitrate. We propose a Masked Adaptive Codebook learning (M-AdaCode) method that applies masks to the latent feature subspace to balance bitrate and reconstruction quality. A set of semantic-class-dependent basis codebooks are learned, which are weighted combined to generate a rich latent feature for high-quality reconstruction. The combining weights are adaptively derived from each input image, providing fidelity information with additional transmission costs. By masking out unimportant weights in the encoder and recovering them in the decoder, we can trade off reconstruction quality for transmission bits, and the masking rate controls the balance between bitrate and distortion. Experiments over the standard JPEG-AI dataset demonstrate the effectiveness of our M-AdaCode approach.
摘要
我们研究基于稀疏视觉表示(SVR)的神经图像压缩,其中图像被嵌入到由学习到的视觉码本张成的离散潜在空间中。通过在编码器和解码器之间共享码本,编码器只需传输整数码字索引,这种索引高效且具有跨平台鲁棒性,解码器再根据索引取回嵌入的潜在特征进行重建。此前基于 SVR 的压缩缺乏有效的率失真权衡机制,只能在高重建质量或低传输码率之间二选一。我们提出了一种带掩码的自适应码本学习方法(M-AdaCode),通过对潜在特征子空间施加掩码来平衡码率与重建质量。我们学习了一组与语义类别相关的基础码本,并通过加权组合生成信息丰富的潜在特征以实现高质量重建;组合权重根据每幅输入图像自适应得到,携带保真度信息但带来额外的传输开销。通过在编码器端掩蔽不重要的权重并在解码器端将其恢复,我们可以用重建质量换取传输比特数,掩码率控制着码率与失真之间的平衡。在标准 JPEG-AI 数据集上的实验证明了 M-AdaCode 方法的有效性。
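A minimal sketch of the weighting-and-masking idea is given below: basis-codebook features are combined with image-adaptive weights, and masking some weights reduces what must be transmitted. The codebook count, latent dimension, and the median-based mask are illustrative assumptions; the paper's decoder-side weight recovery and entropy coding are omitted.

```python
import torch

def combine_codebooks(basis_features, weights, mask):
    """Weighted combination of semantic-class-dependent basis features.
    basis_features: (num_codebooks, dim) latent retrieved from each basis codebook,
    weights: (num_codebooks,) image-adaptive combining weights,
    mask: (num_codebooks,) 0/1 vector; masked weights are simply not transmitted."""
    kept = weights * mask                       # drop unimportant weights to save bits
    kept = kept / (kept.sum() + 1e-8)           # renormalize the remaining weights
    return kept @ basis_features                # (dim,) combined latent feature

codebooks = torch.randn(8, 256)                 # 8 hypothetical basis codebooks, 256-d latents
w = torch.softmax(torch.randn(8), dim=0)        # adaptive weights derived from the input image
full = combine_codebooks(codebooks, w, torch.ones(8))
cheap = combine_codebooks(codebooks, w, (w > w.median()).float())  # higher masking rate, fewer bits
print(full.shape, cheap.shape)
```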
GenLayNeRF: Generalizable Layered Representations with 3D Model Alignment for Multi-Human View Synthesis
results: 我们的方法在 NVS 中表现出色,与通用 NeRF 方法相比,它能够在几乎没有预期优化的情况下提供高品质的内容生成。而与层化 per-scene NeRF 方法相比,它能够在几乎没有测试时间优化的情况下提供相似或更好的表现。Abstract
Novel view synthesis (NVS) of multi-human scenes imposes challenges due to the complex inter-human occlusions. Layered representations handle the complexities by dividing the scene into multi-layered radiance fields, however, they are mainly constrained to per-scene optimization making them inefficient. Generalizable human view synthesis methods combine the pre-fitted 3D human meshes with image features to reach generalization, yet they are mainly designed to operate on single-human scenes. Another drawback is the reliance on multi-step optimization techniques for parametric pre-fitting of the 3D body models that suffer from misalignment with the images in sparse view settings causing hallucinations in synthesized views. In this work, we propose, GenLayNeRF, a generalizable layered scene representation for free-viewpoint rendering of multiple human subjects which requires no per-scene optimization and very sparse views as input. We divide the scene into multi-human layers anchored by the 3D body meshes. We then ensure pixel-level alignment of the body models with the input views through a novel end-to-end trainable module that carries out iterative parametric correction coupled with multi-view feature fusion to produce aligned 3D models. For NVS, we extract point-wise image-aligned and human-anchored features which are correlated and fused using self-attention and cross-attention modules. We augment low-level RGB values into the features with an attention-based RGB fusion module. To evaluate our approach, we construct two multi-human view synthesis datasets; DeepMultiSyn and ZJU-MultiHuman. The results indicate that our proposed approach outperforms generalizable and non-human per-scene NeRF methods while performing at par with layered per-scene methods without test time optimization.
摘要
多人场景的新视角合成(NVS)由于人与人之间复杂的相互遮挡而面临诸多挑战。层次化表示通过将场景划分为多层辐射场来处理这种复杂性,但它们主要依赖逐场景优化,效率较低。可泛化的人体视角合成方法将预先拟合的 3D 人体网格与图像特征结合以实现泛化,但它们主要面向单人场景。另一个缺点是依赖多步优化技术对 3D 人体模型进行参数化预拟合,在稀疏视角设置下容易与图像错位,导致合成视角中出现伪影。在这项工作中,我们提出了 GenLayNeRF,一种可泛化的层次化场景表示,用于多个人体对象的自由视角渲染,无需逐场景优化,且只需非常稀疏的视角作为输入。我们将场景划分为以 3D 人体网格为锚的多个人体层,并通过一个新颖的端到端可训练模块,以迭代参数校正结合多视角特征融合的方式,保证人体模型与输入视角在像素级对齐。在 NVS 阶段,我们提取与图像对齐且以人体为锚的逐点特征,并利用自注意力与交叉注意力模块进行关联和融合;我们还通过基于注意力的 RGB 融合模块将低层 RGB 值注入特征。为评估我们的方法,我们构建了两个多人视角合成数据集:DeepMultiSyn 和 ZJU-MultiHuman。结果表明,我们的方法优于可泛化以及非人体逐场景 NeRF 方法,同时在无需测试时优化的情况下与层次化逐场景方法性能相当。
results: 本文在 TextVQA-X、VQS、VQA-X 和 VizWiz-VQA-Grounding 数据集上达到了状态的最佳准确率。Abstract
Answer grounding is the task of locating relevant visual evidence for the Visual Question Answering task. While a wide variety of attention methods have been introduced for this task, they suffer from the following three problems: designs that do not allow the usage of pre-trained networks and do not benefit from large data pre-training, custom designs that are not based on well-grounded previous designs, therefore limiting the learning power of the network, or complicated designs that make it challenging to re-implement or improve them. In this paper, we propose a novel architectural block, which we term Sentence Attention Block, to solve these problems. The proposed block re-calibrates channel-wise image feature-maps by explicitly modeling inter-dependencies between the image feature-maps and sentence embedding. We visually demonstrate how this block filters out irrelevant feature-maps channels based on sentence embedding. We start our design with a well-known attention method, and by making minor modifications, we improve the results to achieve state-of-the-art accuracy. The flexibility of our method makes it easy to use different pre-trained backbone networks, and its simplicity makes it easy to understand and be re-implemented. We demonstrate the effectiveness of our method on the TextVQA-X, VQS, VQA-X, and VizWiz-VQA-Grounding datasets. We perform multiple ablation studies to show the effectiveness of our design choices.
摘要
答案定位(answer grounding)任务旨在为视觉问答任务定位相关的视觉证据。尽管已有多种注意力方法被提出用于该任务,它们存在以下三个问题:设计上无法使用预训练网络,因而无法受益于大规模预训练数据;自定义设计没有建立在经过验证的已有设计之上,限制了网络的学习能力;或者设计过于复杂,难以复现和改进。在这篇论文中,我们提出了一种新的结构模块,称为句子注意力块(Sentence Attention Block),以解决这些问题。该模块通过显式建模图像特征图与句子嵌入之间的相互依赖关系,对图像特征图进行逐通道重标定。我们以可视化的方式展示了该模块如何基于句子嵌入过滤掉不相关的特征图通道。我们从一种广为人知的注意力方法出发,仅做少量修改便提升了结果,达到了最先进的准确率。该方法的灵活性使其易于搭配不同的预训练骨干网络,其简洁性也使其易于理解和复现。我们在 TextVQA-X、VQS、VQA-X 和 VizWiz-VQA-Grounding 数据集上验证了方法的有效性,并通过多项消融实验证明了我们设计选择的有效性。
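The abstract only says the block re-calibrates feature-map channels using the sentence embedding, so the sketch below is one plausible, squeeze-and-excitation-style reading of that description rather than the paper's exact design; the channel count, sentence-embedding dimension, and reduction ratio are assumptions.

```python
import torch
import torch.nn as nn

class SentenceAttentionBlock(nn.Module):
    """Recalibrates image feature-map channels with a question/sentence embedding,
    in the spirit of a squeeze-and-excitation block conditioned on text."""
    def __init__(self, channels=512, sent_dim=768, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(channels + sent_dim, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                         # per-channel weights in [0, 1]
        )

    def forward(self, feat, sent):                # feat: (B, C, H, W), sent: (B, sent_dim)
        pooled = feat.mean(dim=(2, 3))            # squeeze: global average pool over space
        w = self.gate(torch.cat([pooled, sent], dim=1))
        return feat * w[:, :, None, None]         # excite: damp channels irrelevant to the sentence

block = SentenceAttentionBlock()
out = block(torch.randn(2, 512, 14, 14), torch.randn(2, 768))
print(out.shape)                                  # torch.Size([2, 512, 14, 14])
```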
Continuous Levels of Detail for Light Field Networks
results: 提出一种基于连续 LODs 的神经网络表示方法,可以实现进度式流式神经网络表示,降低渲染延迟和资源使用率。Abstract
Recently, several approaches have emerged for generating neural representations with multiple levels of detail (LODs). LODs can improve the rendering by using lower resolutions and smaller model sizes when appropriate. However, existing methods generally focus on a few discrete LODs which suffer from aliasing and flicker artifacts as details are changed and limit their granularity for adapting to resource limitations. In this paper, we propose a method to encode light field networks with continuous LODs, allowing for finely tuned adaptations to rendering conditions. Our training procedure uses summed-area table filtering allowing efficient and continuous filtering at various LODs. Furthermore, we use saliency-based importance sampling which enables our light field networks to distribute their capacity, particularly limited at lower LODs, towards representing the details viewers are most likely to focus on. Incorporating continuous LODs into neural representations enables progressive streaming of neural representations, decreasing the latency and resource utilization for rendering.
摘要
近年来,出现了多种用于生成具有多级细节(LOD)的神经表示的方法。LOD 可以在合适的情况下使用更低的分辨率和更小的模型,从而改善渲染。然而,现有方法通常只关注少数离散的 LOD,在细节切换时会产生混叠和闪烁伪影,并且其粒度有限,难以适应资源限制。在这篇论文中,我们提出了一种以连续 LOD 编码光场网络的方法,使其能够根据渲染条件进行精细调节。我们的训练过程使用求和面积表(summed-area table)滤波,可在各种 LOD 下进行高效且连续的滤波。此外,我们采用基于显著性的重要性采样,使光场网络能够将其有限的容量(在较低 LOD 下尤为有限)更多地用于表示观看者最可能关注的细节。将连续 LOD 融入神经表示使得神经表示可以渐进式流式传输,降低渲染的延迟和资源占用。
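Summed-area tables are what make filtering at arbitrary, continuously-varying window sizes cheap. Below is a small numpy sketch of the mechanism only; how the paper applies this filtering to light-field-network training is not shown, and the image and radius values are arbitrary.

```python
import numpy as np

def summed_area_table(img):
    """2D prefix sums; any axis-aligned box sum then costs four lookups."""
    return img.cumsum(axis=0).cumsum(axis=1)

def box_filter(img, radius):
    """Mean filter with an arbitrary (continuous-LOD-style) window size at O(1) cost per pixel."""
    sat = np.pad(summed_area_table(img), ((1, 0), (1, 0)))
    h, w = img.shape
    ys, xs = np.arange(h), np.arange(w)
    y0, y1 = np.clip(ys - radius, 0, h)[:, None], np.clip(ys + radius + 1, 0, h)[:, None]
    x0, x1 = np.clip(xs - radius, 0, w)[None, :], np.clip(xs + radius + 1, 0, w)[None, :]
    total = sat[y1, x1] - sat[y0, x1] - sat[y1, x0] + sat[y0, x0]
    area = (y1 - y0) * (x1 - x0)
    return total / area

img = np.random.rand(64, 64)
print(box_filter(img, radius=2).shape)   # coarser LODs would simply use a larger radius
```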
Revisiting Kernel Temporal Segmentation as an Adaptive Tokenizer for Long-form Video Understanding
results: 在视频分类和 temporal action localization 任务上实现了consistent improvement,并达到了长视频模型的state-of-the-art表现。Abstract
While most modern video understanding models operate on short-range clips, real-world videos are often several minutes long with semantically consistent segments of variable length. A common approach to process long videos is applying a short-form video model over uniformly sampled clips of fixed temporal length and aggregating the outputs. This approach neglects the underlying nature of long videos since fixed-length clips are often redundant or uninformative. In this paper, we aim to provide a generic and adaptive sampling approach for long-form videos in lieu of the de facto uniform sampling. Viewing videos as semantically consistent segments, we formulate a task-agnostic, unsupervised, and scalable approach based on Kernel Temporal Segmentation (KTS) for sampling and tokenizing long videos. We evaluate our method on long-form video understanding tasks such as video classification and temporal action localization, showing consistent gains over existing approaches and achieving state-of-the-art performance on long-form video modeling.
摘要
当今大多数视频理解模型都在短时长的片段上运行,但现实世界中的视频往往长达数分钟,并由长度可变、语义一致的片段组成。处理长视频的常见做法是将短视频模型应用于按固定时长均匀采样的片段,再聚合各片段的输出。这种方法忽略了长视频的内在特性,因为固定长度的片段往往冗余或缺乏信息。在这篇论文中,我们旨在为长视频提供一种通用且自适应的采样方法,以替代事实上的均匀采样。我们将视频视为由语义一致的片段组成,基于核时序分割(Kernel Temporal Segmentation, KTS)提出了一种与任务无关、无监督且可扩展的方法,用于长视频的采样与分词(tokenize)。我们在视频分类和时序动作定位等长视频理解任务上评估了该方法,结果显示其相对现有方法取得了一致的提升,并在长视频建模上达到了最先进的性能。
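To show what kernel-based temporal segmentation looks like in practice, the sketch below uses the ruptures library's kernel change-point detection as a stand-in for KTS and then samples one frame per detected segment. The synthetic "frame features", the RBF kernel, the fixed number of boundaries, and the midpoint sampling rule are assumptions; a real pipeline would use pretrained frame embeddings and choose the number of segments adaptively.

```python
import numpy as np
import ruptures as rpt

# Stand-in frame features: 300 "frames" of 64-d embeddings with three distinct segments.
rng = np.random.default_rng(0)
feats = np.concatenate([rng.normal(m, 1.0, size=(100, 64)) for m in (0.0, 2.0, -1.5)])

# Kernel change-point detection over the frame features, in the spirit of KTS:
# boundaries are placed where the feature statistics change.
algo = rpt.KernelCPD(kernel="rbf", min_size=10).fit(feats)
boundaries = algo.predict(n_bkps=2)          # ask for 2 boundaries -> 3 segments
print("segment boundaries (frame indices):", boundaries)

# Adaptive "tokenization": sample one representative frame per detected segment.
starts = [0] + boundaries[:-1]
tokens = [(s + e) // 2 for s, e in zip(starts, boundaries)]
print("sampled frame indices:", tokens)
```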
A Large-scale Dataset for Audio-Language Representation Learning
results: 论文通过在该数据集上训练多个流行模型,展示了在音频-语言检索、音频描述和环境分类等下游任务上的性能提升。此外,论文还建立了一个新的测试集,为音频-文本任务提供了基准。Abstract
The AI community has made significant strides in developing powerful foundation models, driven by large-scale multimodal datasets. However, in the audio representation learning community, the present audio-language datasets suffer from limitations such as insufficient volume, simplistic content, and arduous collection procedures. To tackle these challenges, we present an innovative and automatic audio caption generation pipeline based on a series of public tools or APIs, and construct a large-scale, high-quality, audio-language dataset, named as Auto-ACD, comprising over 1.9M audio-text pairs. To demonstrate the effectiveness of the proposed dataset, we train popular models on our dataset and show performance improvement on various downstream tasks, namely, audio-language retrieval, audio captioning, environment classification. In addition, we establish a novel test set and provide a benchmark for audio-text tasks. The proposed dataset will be released at https://auto-acd.github.io/.
摘要
人工智能社区在开发强大的基础模型方面取得了显著进展,这得益于大规模多模态数据集的驱动。然而,在音频表示学习社区中,现有的音频-语言数据集存在规模不足、内容过于简单、收集流程繁琐等局限。为了解决这些挑战,我们提出了一种基于一系列公开工具或 API 的创新自动音频描述生成管线,并构建了一个大规模、高质量的音频-语言数据集 Auto-ACD,包含超过 190 万个音频-文本对。为了证明该数据集的有效性,我们在其上训练了多个流行模型,并在音频-语言检索、音频描述、环境分类等多个下游任务上展示了性能提升。此外,我们建立了一个新的测试集,为音频-文本任务提供了基准。该数据集将在 https://auto-acd.github.io/ 发布。
paper_authors: Chenyang Si, Ziqi Huang, Yuming Jiang, Ziwei Liu
for: 提高 diffusion U-Net 的生成质量,无需额外训练或微调。
methods: 利用 U-Net 架构的 skip connections 和 backbone feature maps,通过重新分配两者的权重来提高生成质量。
results: 在图像和视频生成任务中,提出的 FreeU 方法简单而有效,可以轻松地与现有的 diffusion 模型结合使用,提高生成质量。Abstract
In this paper, we uncover the untapped potential of diffusion U-Net, which serves as a "free lunch" that substantially improves the generation quality on the fly. We initially investigate the key contributions of the U-Net architecture to the denoising process and identify that its main backbone primarily contributes to denoising, whereas its skip connections mainly introduce high-frequency features into the decoder module, causing the network to overlook the backbone semantics. Capitalizing on this discovery, we propose a simple yet effective method-termed "FreeU" - that enhances generation quality without additional training or finetuning. Our key insight is to strategically re-weight the contributions sourced from the U-Net's skip connections and backbone feature maps, to leverage the strengths of both components of the U-Net architecture. Promising results on image and video generation tasks demonstrate that our FreeU can be readily integrated to existing diffusion models, e.g., Stable Diffusion, DreamBooth, ModelScope, Rerender and ReVersion, to improve the generation quality with only a few lines of code. All you need is to adjust two scaling factors during inference. Project page: https://chenyangsi.top/FreeU/.
摘要
在这篇论文中,我们揭示了扩散 U-Net 中尚未被利用的潜力,它如同一份"免费午餐",能够在推理阶段显著提升生成质量。我们首先考察了 U-Net 架构对去噪过程的关键贡献,发现其主干部分主要负责去噪,而跳跃连接主要向解码器模块引入高频特征,这会导致网络忽略主干所承载的语义信息。基于这一发现,我们提出了一种简单而有效的方法 FreeU,无需额外训练或微调即可提升生成质量。我们的核心思路是有策略地重新加权来自 U-Net 跳跃连接与主干特征图的贡献,以同时发挥 U-Net 架构两部分组件的优势。在图像和视频生成任务上的结果表明,FreeU 只需几行代码即可集成到现有扩散模型中(如 Stable Diffusion、DreamBooth、ModelScope、Rerender 和 ReVersion),提升生成质量;推理时只需调整两个缩放因子。项目页面:https://chenyangsi.top/FreeU/。
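A minimal sketch of the re-weighting idea, applied where a U-Net decoder block concatenates backbone and skip features, is shown below. The two scaling factors, their values, and the "scale only half the backbone channels" detail are placeholder assumptions; the official FreeU additionally modulates skip features in the Fourier domain and uses stage-specific factors.

```python
import torch

def freeu_fuse(backbone_feat, skip_feat, b=1.2, s=0.9):
    """FreeU-style re-weighting before a U-Net decoder block:
    amplify the backbone (denoising) features and damp the skip (high-frequency) features.
    b and s are the two inference-time scaling factors; 1.2 / 0.9 are placeholder values."""
    half = backbone_feat.shape[1] // 2
    backbone_feat = backbone_feat.clone()
    backbone_feat[:, :half] = backbone_feat[:, :half] * b   # boost part of the backbone channels
    skip_feat = skip_feat * s                                # damp the high-frequency skip features
    return torch.cat([backbone_feat, skip_feat], dim=1)     # concatenation as in a standard U-Net

fused = freeu_fuse(torch.randn(1, 640, 32, 32), torch.randn(1, 640, 32, 32))
print(fused.shape)   # torch.Size([1, 1280, 32, 32])
```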
Budget-Aware Pruning: Handling Multiple Domains with Less Parameters
results: 研究获得了与基准模型相似的分类性能,并且降低了计算成本和模型大小。另外,这个方法在资源有限的设备上也能够更好地运行。Abstract
Deep learning has achieved state-of-the-art performance on several computer vision tasks and domains. Nevertheless, it still has a high computational cost and demands a significant amount of parameters. Such requirements hinder the use in resource-limited environments and demand both software and hardware optimization. Another limitation is that deep models are usually specialized into a single domain or task, requiring them to learn and store new parameters for each new one. Multi-Domain Learning (MDL) attempts to solve this problem by learning a single model that is capable of performing well in multiple domains. Nevertheless, the models are usually larger than the baseline for a single domain. This work tackles both of these problems: our objective is to prune models capable of handling multiple domains according to a user-defined budget, making them more computationally affordable while keeping a similar classification performance. We achieve this by encouraging all domains to use a similar subset of filters from the baseline model, up to the amount defined by the user's budget. Then, filters that are not used by any domain are pruned from the network. The proposed approach innovates by better adapting to resource-limited devices while, to our knowledge, being the only work that handles multiple domains at test time with fewer parameters and lower computational complexity than the baseline model for a single domain.
摘要
深度学习已在多个计算机视觉任务和领域上达到了最先进的性能。然而,它仍然具有较高的计算成本并需要大量参数,这阻碍了其在资源受限环境中的应用,需要同时进行软件和硬件层面的优化。另一个局限是,深度模型通常只针对单一领域或任务,每遇到新的领域或任务就需要学习并存储新的参数。多领域学习(MDL)试图通过学习一个能在多个领域中均表现良好的单一模型来解决这一问题,但这类模型通常比单领域的基线模型更大。本工作同时解决了这两个问题:我们的目标是按照用户设定的预算对能处理多个领域的模型进行剪枝,使其计算开销更可控,同时保持相近的分类性能。我们通过鼓励所有领域使用基线模型中相似的滤波器子集(数量不超过用户预算)来实现这一点,随后将没有被任何领域使用的滤波器从网络中剪除。所提出的方法能更好地适应资源受限的设备;据我们所知,这也是唯一一个在测试阶段处理多个领域时,参数量和计算复杂度都低于单领域基线模型的工作。
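A toy sketch of the final pruning step is shown below: keep filters that are shared across domains up to a budget, and always prune filters no domain uses. The usage matrix, the voting heuristic, and the budget value are illustrative assumptions; the paper learns the per-domain filter selection jointly with the budget constraint rather than from a fixed usage table.

```python
import torch

def shared_prune_mask(domain_usage, budget):
    """domain_usage: (num_domains, num_filters) 0/1 matrix saying which baseline filters
    each domain selected; budget: max number of filters the pruned model may keep."""
    votes = domain_usage.sum(dim=0)                    # how many domains want each filter
    keep = torch.zeros_like(votes, dtype=torch.bool)
    order = torch.argsort(votes, descending=True)      # prefer filters shared across domains
    keep[order[:budget]] = True
    keep &= votes > 0                                  # filters used by no domain are always pruned
    return keep

usage = (torch.rand(3, 64) > 0.6).int()                # 3 hypothetical domains over 64 filters
mask = shared_prune_mask(usage, budget=32)
print(f"filters kept: {int(mask.sum())} / 64")
```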
Weight Averaging Improves Knowledge Distillation under Domain Shift
results: 研究发现,权重平均技术可以提升知识蒸馏在存在领域偏移的数据上的性能。此外,论文提出了一种简单的权重平均策略,无需在训练过程中在验证数据上进行评估,并证明其应用于知识蒸馏时与 SWAD 和 SMA 表现相当。Abstract
Knowledge distillation (KD) is a powerful model compression technique broadly used in practical deep learning applications. It is focused on training a small student network to mimic a larger teacher network. While it is widely known that KD can offer an improvement to student generalization in i.i.d setting, its performance under domain shift, i.e. the performance of student networks on data from domains unseen during training, has received little attention in the literature. In this paper we make a step towards bridging the research fields of knowledge distillation and domain generalization. We show that weight averaging techniques proposed in domain generalization literature, such as SWAD and SMA, also improve the performance of knowledge distillation under domain shift. In addition, we propose a simplistic weight averaging strategy that does not require evaluation on validation data during training and show that it performs on par with SWAD and SMA when applied to KD. We name our final distillation approach Weight-Averaged Knowledge Distillation (WAKD).
摘要
知识蒸馏(KD)是一种在深度学习实践中被广泛使用的模型压缩技术,其核心是训练一个小的学生网络去模仿一个更大的教师网络。尽管人们普遍认为 KD 能在独立同分布设置下提升学生网络的泛化能力,但其在领域偏移下的表现,即学生网络在训练中未见过的领域数据上的表现,在文献中鲜有关注。在这篇论文中,我们朝着连接知识蒸馏与领域泛化这两个研究方向迈出了一步。我们表明,领域泛化文献中提出的权重平均技术(如 SWAD 和 SMA)同样可以提升领域偏移下知识蒸馏的性能。此外,我们提出了一种简单的权重平均策略,无需在训练过程中在验证数据上进行评估,并证明其应用于 KD 时与 SWAD 和 SMA 表现相当。我们将最终的蒸馏方法命名为权重平均知识蒸馏(Weight-Averaged Knowledge Distillation, WAKD)。
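The core mechanism shared by SMA-style methods and the strategy above is a running average of the student's weights during training. A minimal PyTorch sketch is shown below; the stand-in student model, the burn-in step, and the update schedule are assumptions, not the paper's exact WAKD recipe (which notably needs no validation data).

```python
import copy
import torch

@torch.no_grad()
def update_weight_average(avg_model, model, num_averaged):
    """Running (simple moving) average of model weights, as used by SMA-style methods."""
    for p_avg, p in zip(avg_model.parameters(), model.parameters()):
        p_avg.mul_(num_averaged / (num_averaged + 1)).add_(p / (num_averaged + 1))
    return num_averaged + 1

# usage sketch: average the student's weights over the tail of KD training
student = torch.nn.Linear(10, 2)                 # stand-in for the student network
avg_student = copy.deepcopy(student)
n = 0
for step in range(100):
    # ... one knowledge-distillation training step on `student` would go here ...
    if step >= 50:                               # start averaging after a burn-in period
        n = update_weight_average(avg_student, student, n)
```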
SkeleTR: Towards Skeleton-based Action Recognition in the Wild
results: 对多种 skeleton-based action recognition 任务进行了全面的解决,包括视频级动作分类、实例级动作检测和群体活动识别。实现了 transfer learning 和共同训练 across different action tasks and datasets,并且在多个 benchmark 上达到了 state-of-the-art 性能。Abstract
We present SkeleTR, a new framework for skeleton-based action recognition. In contrast to prior work, which focuses mainly on controlled environments, we target more general scenarios that typically involve a variable number of people and various forms of interaction between people. SkeleTR works with a two-stage paradigm. It first models the intra-person skeleton dynamics for each skeleton sequence with graph convolutions, and then uses stacked Transformer encoders to capture person interactions that are important for action recognition in general scenarios. To mitigate the negative impact of inaccurate skeleton associations, SkeleTR takes relative short skeleton sequences as input and increases the number of sequences. As a unified solution, SkeleTR can be directly applied to multiple skeleton-based action tasks, including video-level action classification, instance-level action detection, and group-level activity recognition. It also enables transfer learning and joint training across different action tasks and datasets, which result in performance improvement. When evaluated on various skeleton-based action recognition benchmarks, SkeleTR achieves the state-of-the-art performance.
摘要
我们提出了 SkeleTR,一种新的基于骨架的动作识别框架。与主要面向受控环境的已有工作不同,SkeleTR 面向更一般的场景,这类场景通常包含数量可变的人以及人与人之间多种形式的交互。SkeleTR 采用两阶段范式:首先用图卷积对每条骨架序列的个体内骨架动态进行建模,然后用堆叠的 Transformer 编码器捕捉在一般场景中对动作识别至关重要的人与人交互。为了减轻骨架关联不准确带来的负面影响,SkeleTR 以相对较短的骨架序列作为输入,并相应增加序列数量。作为一个统一的解决方案,SkeleTR 可以直接应用于多种基于骨架的动作任务,包括视频级动作分类、实例级动作检测和群体级活动识别;它还支持在不同动作任务和数据集之间进行迁移学习与联合训练,从而带来性能提升。在多个基于骨架的动作识别基准上的评估表明,SkeleTR 达到了最先进的性能。
Signature Activation: A Sparse Signal View for Holistic Saliency
results: 本文透过评估 coronary angiogram 中的病变检测,证明了 Signature Activation 的可靠性和有用性。Abstract
The adoption of machine learning in healthcare calls for model transparency and explainability. In this work, we introduce Signature Activation, a saliency method that generates holistic and class-agnostic explanations for Convolutional Neural Network (CNN) outputs. Our method exploits the fact that certain kinds of medical images, such as angiograms, have clear foreground and background objects. We give theoretical explanation to justify our methods. We show the potential use of our method in clinical settings through evaluating its efficacy for aiding the detection of lesions in coronary angiograms.
摘要
机器学习在医疗领域的应用要求模型具备透明度和可解释性。在这项工作中,我们介绍了 Signature Activation,一种为卷积神经网络(CNN)输出生成整体且与类别无关的解释的显著性方法。我们的方法利用了某些医学图像(如血管造影)具有清晰前景与背景对象这一特点,并给出了理论解释来论证该方法。我们通过评估其在冠状动脉造影中辅助病变检测的效果,展示了该方法在临床环境中的潜在应用价值。
CalibFPA: A Focal Plane Array Imaging System based on Online Deep-Learning Calibration
results: 在仿真和实验数据上,CalibFPA 的性能均超过了现有的压缩焦平面阵列方法;论文还对系统设计要素进行了分析并评估了计算复杂度。Abstract
Compressive focal plane arrays (FPA) enable cost-effective high-resolution (HR) imaging by acquisition of several multiplexed measurements on a low-resolution (LR) sensor. Multiplexed encoding of the visual scene is typically performed via electronically controllable spatial light modulators (SLM). An HR image is then reconstructed from the encoded measurements by solving an inverse problem that involves the forward model of the imaging system. To capture system non-idealities such as optical aberrations, a mainstream approach is to conduct an offline calibration scan to measure the system response for a point source at each spatial location on the imaging grid. However, it is challenging to run calibration scans when using structured SLMs as they cannot encode individual grid locations. In this study, we propose a novel compressive FPA system based on online deep-learning calibration of multiplexed LR measurements (CalibFPA). We introduce a piezo-stage that locomotes a pre-printed fixed coded aperture. A deep neural network is then leveraged to correct for the influences of system non-idealities in multiplexed measurements without the need for offline calibration scans. Finally, a deep plug-and-play algorithm is used to reconstruct images from corrected measurements. On simulated and experimental datasets, we demonstrate that CalibFPA outperforms state-of-the-art compressive FPA methods. We also report analyses to validate the design elements in CalibFPA and assess computational complexity.
摘要
压缩焦平面阵列(FPA)通过在低分辨率(LR)传感器上采集若干复用测量,以较低成本实现高分辨率(HR)成像。视觉场景的复用编码通常由可电控的空间光调制器(SLM)完成,随后通过求解一个包含成像系统前向模型的逆问题,从编码测量中重建 HR 图像。为了刻画光学像差等系统非理想因素,主流做法是进行离线标定扫描,在成像网格的每个空间位置测量点光源的系统响应。然而,使用结构化 SLM 时难以执行这类标定扫描,因为它们无法对单个网格位置进行编码。在本研究中,我们提出了一种基于复用 LR 测量在线深度学习标定的新型压缩 FPA 系统(CalibFPA)。我们引入一个压电位移台来移动预先印制的固定编码孔径,随后利用深度神经网络在无需离线标定扫描的情况下校正复用测量中系统非理想因素的影响,最后使用深度即插即用(plug-and-play)算法从校正后的测量中重建图像。在仿真和实验数据上,CalibFPA 均优于当前最先进的压缩 FPA 方法。我们还通过分析验证了 CalibFPA 的设计要素并评估了其计算复杂度。
paper_authors: Samuel Felipe dos Santos, Nicu Sebe, Jurandy Almeida
for: 本文旨在研究频域预处理后的深度学习模型,以优化计算成本和参数数量。
methods: 本文使用了DCT频域表示法,并对传统 CNN 架构进行了修改,以适应频域数据。
results: 本文提出了一些手动和数据驱动的技术来降低计算成本和参数数量,以实现高效且精准的频域深度学习模型。Abstract
Convolutional neural networks (CNNs) have achieved astonishing advances over the past decade, defining state-of-the-art in several computer vision tasks. CNNs are capable of learning robust representations of the data directly from the RGB pixels. However, most image data are usually available in compressed format, from which the JPEG is the most widely used due to transmission and storage purposes demanding a preliminary decoding process that have a high computational load and memory usage. For this reason, deep learning methods capable of learning directly from the compressed domain have been gaining attention in recent years. Those methods usually extract a frequency domain representation of the image, like DCT, by a partial decoding, and then make adaptation to typical CNNs architectures to work with them. One limitation of these current works is that, in order to accommodate the frequency domain data, the modifications made to the original model increase significantly their amount of parameters and computational complexity. On one hand, the methods have faster preprocessing, since the cost of fully decoding the images is avoided, but on the other hand, the cost of passing the images though the model is increased, mitigating the possible upside of accelerating the method. In this paper, we propose a further study of the computational cost of deep models designed for the frequency domain, evaluating the cost of decoding and passing the images through the network. We also propose handcrafted and data-driven techniques for reducing the computational complexity and the number of parameters for these models in order to keep them similar to their RGB baselines, leading to efficient models with a better trade off between computational cost and accuracy.
摘要
卷积神经网络(CNN)在过去十年中取得了惊人的进步,在多项计算机视觉任务中确立了最先进水平。CNN 能够直接从 RGB 像素中学习稳健的数据表示。然而,大多数图像数据通常以压缩格式提供,其中 JPEG 因传输与存储需求而使用最为广泛,但其需要先进行解码,而解码过程带来较高的计算负担和内存占用。因此,能够直接在压缩域中学习的深度学习方法近年来受到关注。这类方法通常通过部分解码提取图像的频域表示(如 DCT),再对典型 CNN 架构进行调整以处理此类数据。现有工作的一个局限在于,为了适配频域数据,对原始模型所做的修改会显著增加参数量和计算复杂度:一方面,由于避免了完整解码图像,预处理更快;另一方面,图像经过模型的计算成本增加,削弱了该方法可能带来的加速优势。在这篇论文中,我们进一步研究了面向频域设计的深度模型的计算成本,评估了解码与前向推理两部分的开销;我们还提出了手工设计和数据驱动的技术,用于降低这些模型的计算复杂度和参数量,使其与 RGB 基线保持相近,从而得到在计算成本与精度之间取得更好折中的高效模型。
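To make the "frequency-domain input" concrete, the sketch below computes JPEG-style 8x8 block DCT coefficients and stacks the 64 frequencies as channels, which is the kind of lower-resolution, deeper tensor these models consume. Real pipelines obtain the coefficients directly from the partially decoded JPEG bitstream (e.g. via libjpeg); applying scipy's DCT to raw pixels here only illustrates the tensor layout, and the block size and channel arrangement are common choices rather than anything specific to this paper.

```python
import numpy as np
from scipy.fft import dctn

def blockwise_dct(gray, block=8):
    """JPEG-style 8x8 block DCT: rearrange a grayscale image into per-block
    frequency coefficients, stacked as channels."""
    h, w = gray.shape
    h, w = h - h % block, w - w % block
    blocks = gray[:h, :w].reshape(h // block, block, w // block, block).transpose(0, 2, 1, 3)
    coeffs = dctn(blocks, axes=(2, 3), norm='ortho')     # DCT over each 8x8 block
    return coeffs.reshape(h // block, w // block, block * block)  # (H/8, W/8, 64)

img = np.random.rand(224, 224)
freq = blockwise_dct(img)
print(freq.shape)   # (28, 28, 64): lower spatial resolution, 64 frequency channels
```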
Enhancing motion trajectory segmentation of rigid bodies using a novel screw-based trajectory-shape representation
results: 该轨迹表示在一种自监督分割方法中得到验证,在仿真数据和真实的人类倒水演示动作记录上,相比传统表示方法,能够更稳健地检测出具有不同特征的连续子运动,并给出更一致的分割结果。Abstract
Trajectory segmentation refers to dividing a trajectory into meaningful consecutive sub-trajectories. This paper focuses on trajectory segmentation for 3D rigid-body motions. Most segmentation approaches in the literature represent the body's trajectory as a point trajectory, considering only its translation and neglecting its rotation. We propose a novel trajectory representation for rigid-body motions that incorporates both translation and rotation, and additionally exhibits several invariant properties. This representation consists of a geometric progress rate and a third-order trajectory-shape descriptor. Concepts from screw theory were used to make this representation time-invariant and also invariant to the choice of body reference point. This new representation is validated for a self-supervised segmentation approach, both in simulation and using real recordings of human-demonstrated pouring motions. The results show a more robust detection of consecutive submotions with distinct features and a more consistent segmentation compared to conventional representations. We believe that other existing segmentation methods may benefit from using this trajectory representation to improve their invariance.
摘要
轨迹分割是指将一条轨迹划分为有意义的连续子轨迹。本文关注三维刚体运动的轨迹分割。文献中的大多数分割方法将刚体轨迹表示为点轨迹,只考虑其平移而忽略其旋转。我们提出了一种同时包含平移与旋转的新型刚体运动轨迹表示,并且具备多种不变性。该表示由一个几何进度率和一个三阶轨迹形状描述子组成。我们借助旋量理论(screw theory)使该表示具有时间不变性,并且与刚体参考点的选择无关。我们在一种自监督分割方法中验证了这一新表示,包括仿真数据和真实的人类倒水演示动作记录。结果显示,相比传统表示,该方法能更稳健地检测出具有不同特征的连续子运动,并给出更一致的分割结果。我们认为,其他现有的分割方法也可能通过采用这一轨迹表示来提升其不变性。
Self-supervised learning unveils change in urban housing from street-level images
paper_authors: Steven Stalder, Michele Volpi, Nicolas Büttner, Stephen Law, Kenneth Harttgen, Esra Suel
for: tracks progress in urban housing, specifically in London’s housing supply
methods: uses deep learning-based computer vision methods and self-supervised techniques to measure change in street-level images
results: successfully identified point-level change in London’s housing supply and distinguished between major and minor change, providing timely information for urban planning and policy decisions.Abstract
Cities around the world face a critical shortage of affordable and decent housing. Despite its critical importance for policy, our ability to effectively monitor and track progress in urban housing is limited. Deep learning-based computer vision methods applied to street-level images have been successful in the measurement of socioeconomic and environmental inequalities but did not fully utilize temporal images to track urban change as time-varying labels are often unavailable. We used self-supervised methods to measure change in London using 15 million street images taken between 2008 and 2021. Our novel adaptation of Barlow Twins, Street2Vec, embeds urban structure while being invariant to seasonal and daily changes without manual annotations. It outperformed generic embeddings, successfully identified point-level change in London's housing supply from street-level images, and distinguished between major and minor change. This capability can provide timely information for urban planning and policy decisions toward more liveable, equitable, and sustainable cities.
摘要
全球各地的城市都面临着可负担且体面住房严重短缺的问题。尽管这一问题对政策制定至关重要,我们对城市住房进展进行有效监测与追踪的能力却十分有限。基于深度学习的计算机视觉方法已成功用于从街景图像中度量社会经济与环境不平等,但由于随时间变化的标签往往难以获得,这些方法未能充分利用时序图像来追踪城市变化。我们使用自监督方法,基于 2008 年至 2021 年间拍摄的 1500 万张伦敦街景图像来度量城市变化。我们对 Barlow Twins 的新颖改造 Street2Vec 能在无需人工标注的情况下嵌入城市结构,同时对季节与昼夜变化保持不变性。它优于通用的图像嵌入,成功地从街景图像中识别出伦敦住房供给的点级变化,并能区分重大变化与轻微变化。这一能力可以为城市规划与政策决策及时提供信息,助力建设更宜居、公平和可持续的城市。
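Street2Vec is described as an adaptation of Barlow Twins, whose objective pushes the cross-correlation matrix of two views' embeddings toward the identity. A minimal PyTorch sketch of that loss is given below; for street imagery the two "views" would be images of the same location at different times, and the batch size, embedding dimension, and lambda value here are assumptions rather than the paper's settings.

```python
import torch

def barlow_twins_loss(z1, z2, lam=5e-3):
    """Barlow Twins objective: make the cross-correlation matrix of two views' embeddings
    close to identity (invariance on the diagonal, redundancy reduction off it)."""
    n, d = z1.shape
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-6)
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-6)
    c = (z1.T @ z2) / n                                   # (d, d) cross-correlation matrix
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()
    return on_diag + lam * off_diag

z_t0, z_t1 = torch.randn(64, 256), torch.randn(64, 256)  # hypothetical embedding batches
print(barlow_twins_loss(z_t0, z_t1).item())
```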
You can have your ensemble and run it too – Deep Ensembles Spread Over Time
methods: 我们提出了 Deep Ensembles Spread Over Time (DESOT) 方法,将单一的 Ensemble member 应用到每个数据点上,并融合多个数据点的预测。
results: DESOT 可以获得深度集成在预测性能和不确定性估计方面的优势,而无需增加额外的计算成本。DESOT 实现简单,训练过程中也不需要序列数据。最后,我们发现 DESOT 与深度集成一样,在分布外检测上优于单一模型。Abstract
Ensembles of independently trained deep neural networks yield uncertainty estimates that rival Bayesian networks in performance. They also offer sizable improvements in terms of predictive performance over single models. However, deep ensembles are not commonly used in environments with limited computational budget -- such as autonomous driving -- since the complexity grows linearly with the number of ensemble members. An important observation that can be made for robotics applications, such as autonomous driving, is that data is typically sequential. For instance, when an object is to be recognized, an autonomous vehicle typically observes a sequence of images, rather than a single image. This raises the question, could the deep ensemble be spread over time? In this work, we propose and analyze Deep Ensembles Spread Over Time (DESOT). The idea is to apply only a single ensemble member to each data point in the sequence, and fuse the predictions over a sequence of data points. We implement and experiment with DESOT for traffic sign classification, where sequences of tracked image patches are to be classified. We find that DESOT obtains the benefits of deep ensembles, in terms of predictive and uncertainty estimation performance, while avoiding the added computational cost. Moreover, DESOT is simple to implement and does not require sequences during training. Finally, we find that DESOT, like deep ensembles, outperform single models for out-of-distribution detection.
摘要
由独立训练的深度神经网络组成的集成(deep ensemble)能够提供可与贝叶斯网络相媲美的不确定性估计,并且相对单一模型在预测性能上也有明显提升。然而,在计算预算有限的环境(例如自动驾驶)中,深度集成并不常用,因为其计算复杂度随集成成员数量线性增长。对于自动驾驶等机器人应用,一个重要的观察是数据通常是序列化的:例如在识别某个物体时,自动驾驶车辆观察到的往往是一串图像而非单张图像。这引出了一个问题:能否将深度集成分摊到时间维度上?在这项工作中,我们提出并分析了 Deep Ensembles Spread Over Time(DESOT):对序列中的每个数据点只应用一个集成成员,并在一段数据序列上融合各成员的预测。我们在交通标志分类任务上实现并验证了 DESOT,该任务需要对被跟踪的图像块序列进行分类。结果表明,DESOT 在预测性能与不确定性估计方面获得了深度集成的优势,同时避免了额外的计算开销。此外,DESOT 实现简单,训练时无需序列数据。最后,我们发现 DESOT 与深度集成一样,在分布外检测上优于单一模型。
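A minimal sketch of the DESOT inference pattern is shown below: one ensemble member per time step, with the softmax outputs averaged over the sequence. The round-robin member assignment, the simple averaging rule, and the tiny stand-in classifiers are assumptions for illustration; the paper evaluates trained CNNs on tracked traffic-sign patches.

```python
import torch

@torch.no_grad()
def desot_predict(ensemble, sequence):
    """Deep Ensembles Spread Over Time: run ONE ensemble member per frame
    (cycling through the members) and average the softmax outputs over the sequence."""
    probs = []
    for t, frame in enumerate(sequence):
        member = ensemble[t % len(ensemble)]          # only one member's cost per time step
        probs.append(torch.softmax(member(frame.unsqueeze(0)), dim=-1))
    return torch.cat(probs).mean(dim=0)               # fused class probabilities

# toy setup: 3 "ensemble members" classifying a sequence of 6 tracked patches
members = [torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
           for _ in range(3)]
patches = torch.randn(6, 3, 32, 32)
print(desot_predict(members, patches))
```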
How to turn your camera into a perfect pinhole model
results: 提高了许多计算机视觉算法和应用的性能,消除了扭曲参数和迭代优化。 Validated by synthetic data and real-world images.Abstract
Camera calibration is a first and fundamental step in various computer vision applications. Despite being an active field of research, Zhang's method remains widely used for camera calibration due to its implementation in popular toolboxes. However, this method initially assumes a pinhole model with oversimplified distortion models. In this work, we propose a novel approach that involves a pre-processing step to remove distortions from images by means of Gaussian processes. Our method does not need to assume any distortion model and can be applied to severely warped images, even in the case of multiple distortion sources, e.g., a fisheye image of a curved mirror reflection. The Gaussian processes capture all distortions and camera imperfections, resulting in virtual images as though taken by an ideal pinhole camera with square pixels. Furthermore, this ideal GP-camera only needs one image of a square grid calibration pattern. This model allows for a serious upgrade of many algorithms and applications that are designed in a pure projective geometry setting but with a performance that is very sensitive to nonlinear lens distortions. We demonstrate the effectiveness of our method by simplifying Zhang's calibration method, reducing the number of parameters and getting rid of the distortion parameters and iterative optimization. We validate by means of synthetic data and real world images. The contributions of this work include the construction of a virtual ideal pinhole camera using Gaussian processes, a simplified calibration method and lens distortion removal.
摘要
相机标定是各类计算机视觉应用中首要且基础的步骤。尽管这是一个活跃的研究领域,由于在常用工具箱中的实现,张氏方法至今仍被广泛用于相机标定。然而,该方法假设针孔模型并采用过于简化的畸变模型。在这项工作中,我们提出了一种新方法,通过高斯过程在预处理阶段去除图像中的畸变。我们的方法无需假设任何畸变模型,可用于严重畸变的图像,甚至适用于存在多个畸变源的情况,例如经弯曲镜面反射的鱼眼图像。高斯过程能够捕捉所有畸变与相机缺陷,从而得到如同由具有方形像素的理想针孔相机拍摄的虚拟图像。此外,这一理想的 GP 相机只需要一张方格标定图案的图像。该模型使得许多基于纯射影几何设计、但性能对非线性镜头畸变十分敏感的算法和应用得以大幅升级。我们通过简化张氏标定方法来展示该方法的有效性:减少了参数数量,去除了畸变参数和迭代优化,并在合成数据和真实图像上进行了验证。本工作的贡献包括:利用高斯过程构建虚拟的理想针孔相机、一种简化的标定方法以及镜头畸变去除。
results: 与现有方法相比,我们的方法在老化准确性、属性保持和老化质量等方面均具有明显优势。Abstract
In this paper, we address the problem of face aging: generating past or future facial images by incorporating age-related changes to the given face. Previous aging methods rely solely on human facial image datasets and are thus constrained by their inherent scale and bias. This restricts their application to a limited generatable age range and the inability to handle large age gaps. We propose FADING, a novel approach to address Face Aging via DIffusion-based editiNG. We go beyond existing methods by leveraging the rich prior of large-scale language-image diffusion models. First, we specialize a pre-trained diffusion model for the task of face age editing by using an age-aware fine-tuning scheme. Next, we invert the input image to latent noise and obtain optimized null text embeddings. Finally, we perform text-guided local age editing via attention control. The quantitative and qualitative analyses demonstrate that our method outperforms existing approaches with respect to aging accuracy, attribute preservation, and aging quality.
摘要
在这篇论文中,我们研究人脸老化问题:通过在给定人脸上施加与年龄相关的变化,生成其过去或未来的面部图像。以往的老化方法仅依赖人脸图像数据集,因而受限于数据集固有的规模与偏差,只能生成有限的年龄范围,且难以处理较大的年龄跨度。我们提出了 FADING,一种基于扩散模型编辑的人脸老化新方法,其超越现有方法之处在于利用了大规模语言-图像扩散模型丰富的先验。首先,我们通过年龄感知的微调方案,将预训练扩散模型特化用于人脸年龄编辑任务;接着,我们将输入图像反演为潜在噪声,并获得优化的空文本嵌入(null text embedding);最后,我们通过注意力控制实现文本引导的局部年龄编辑。定量与定性分析表明,我们的方法在老化准确性、属性保持和老化质量方面均优于现有方法。
Uncovering the effects of model initialization on deep model generalization: A study with adult and pediatric Chest X-ray images
paper_authors: Sivaramakrishnan Rajaraman, Ghada Zamzmi, Feng Yang, Zhaohui Liang, Zhiyun Xue, Sameer Antani
for: 这项研究旨在提升深度学习模型在医疗计算机视觉应用中的性能和可靠性;而模型初始化对医学图像(特别是胸部X射线图像)的影响目前了解较少。本研究探讨了三种深度模型初始化技术:冷启动、暖启动以及收缩加扰动启动,并在成人和儿童两个人群上进行了评估。
methods: 本研究使用了冷启动、暖启动和收缩加扰动启动三种深度模型初始化技术。这些技术在医学图像数据周期性到达的训练场景下进行评估,以契合真实世界中数据持续流入、模型需要不断更新的需求。
results: 研究结果表明,使用 ImageNet 预训练权重初始化的模型在成人和儿童两个人群上的泛化能力均高于随机初始化的模型。此外,ImageNet 预训练模型在不同训练场景下的内部和外部测试中都表现稳定。权重级集成方法在测试阶段也显示出明显的提升(p<0.05)。因此,本研究强调了使用 ImageNet 预训练权重初始化(尤其是结合权重级集成方法)来构建稳健且可泛化的深度学习解决方案的好处。Abstract
Model initialization techniques are vital for improving the performance and reliability of deep learning models in medical computer vision applications. While much literature exists on non-medical images, the impacts on medical images, particularly chest X-rays (CXRs) are less understood. Addressing this gap, our study explores three deep model initialization techniques: Cold-start, Warm-start, and Shrink and Perturb start, focusing on adult and pediatric populations. We specifically focus on scenarios with periodically arriving data for training, thereby embracing the real-world scenarios of ongoing data influx and the need for model updates. We evaluate these models for generalizability against external adult and pediatric CXR datasets. We also propose novel ensemble methods: F-score-weighted Sequential Least-Squares Quadratic Programming (F-SLSQP) and Attention-Guided Ensembles with Learnable Fuzzy Softmax to aggregate weight parameters from multiple models to capitalize on their collective knowledge and complementary representations. We perform statistical significance tests with 95% confidence intervals and p-values to analyze model performance. Our evaluations indicate models initialized with ImageNet-pre-trained weights demonstrate superior generalizability over randomly initialized counterparts, contradicting some findings for non-medical images. Notably, ImageNet-pretrained models exhibit consistent performance during internal and external testing across different training scenarios. Weight-level ensembles of these models show significantly higher recall (p<0.05) during testing compared to individual models. Thus, our study accentuates the benefits of ImageNet-pretrained weight initialization, especially when used with weight-level ensembles, for creating robust and generalizable deep learning solutions.
摘要
“模型初始化技术对深度学习模型在医疗计算机视觉应用中的性能和可靠性有着重要的影响。虽然关于非医学图像的研究已经充分,但对医学图像,特别是胸部X射影(CXR)的影响还未得到充分了解。为了解决这个差距,我们的研究探讨了三种深度模型初始化技术:冷启动、温启动和缩放和扰动启动,对于成人和儿童两个人口进行了研究。我们强调在进行训练时periodically arriving data的情况下,以满足实际世界中数据不断来临和模型更新的需求。我们使用F-score-weighted Sequential Least-Squares Quadratic Programming(F-SLSQP)和Attention-Guided Ensembles with Learnable Fuzzy Softmax来权衡多个模型的参数,以便充分利用它们的共同知识和补充表示。我们对模型性能进行了统计学 significativity 测试,结果表明,使用ImageNet预训练权重初始化的模型在总体性能方面表现出色,并且在不同的训练场景下保持了一致的表现。此外,对这些模型进行权重级别的合并也表现出了明显的提升(p<0.05)。因此,我们的研究证明了使用ImageNet预训练权重初始化的模型,特别是在权重级别的合并下,可以创建可靠和总体性能优秀的深度学习解决方案。”
Generalizing Across Domains in Diabetic Retinopathy via Variational Autoencoders
results: 这篇论文显示,基于变分自编码器(VAE)的简单方法可以超越现有的最先进方法,在公开可用的数据集上达到更高的准确率。这些结果表明,在医学图像的领域泛化问题上,经过恰当适配的简单方法同样可以取得更好的效果,而不必仅仅依赖高度复杂的技术。Abstract
Domain generalization for Diabetic Retinopathy (DR) classification allows a model to adeptly classify retinal images from previously unseen domains with various imaging conditions and patient demographics, thereby enhancing its applicability in a wide range of clinical environments. In this study, we explore the inherent capacity of variational autoencoders to disentangle the latent space of fundus images, with an aim to obtain a more robust and adaptable domain-invariant representation that effectively tackles the domain shift encountered in DR datasets. Despite the simplicity of our approach, we explore the efficacy of this classical method and demonstrate its ability to outperform contemporary state-of-the-art approaches for this task using publicly available datasets. Our findings challenge the prevailing assumption that highly sophisticated methods for DR classification are inherently superior for domain generalization. This highlights the importance of considering simple methods and adapting them to the challenging task of generalizing medical images, rather than solely relying on advanced techniques.
Language-driven Object Fusion into Neural Radiance Fields with Pose-Conditioned Dataset Updates
paper_authors: Ka Chun Shum, Jaeyeon Kim, Binh-Son Hua, Duc Thanh Nguyen, Sai-Kit Yeung
for: This paper describes a neural radiance field-based image rendering method that generates high-quality, multi-view consistent images.
methods: The paper uses a text-driven diffusion-based approach to manipulate objects in a neural radiance field, including inserting new objects into a given background and removing existing objects.
results: Experimental results show that the method generates high-quality renderings of the edited scenes and outperforms existing methods in 3D reconstruction and neural radiance field blending.Abstract
Neural radiance field is an emerging rendering method that generates high-quality multi-view consistent images from a neural scene representation and volume rendering. Although neural radiance field-based techniques are robust for scene reconstruction, their ability to add or remove objects remains limited. This paper proposes a new language-driven approach for object manipulation with neural radiance fields through dataset updates. Specifically, to insert a new foreground object represented by a set of multi-view images into a background radiance field, we use a text-to-image diffusion model to learn and generate combined images that fuse the object of interest into the given background across views. These combined images are then used for refining the background radiance field so that we can render view-consistent images containing both the object and the background. To ensure view consistency, we propose a dataset updates strategy that prioritizes radiance field training with camera views close to the already-trained views prior to propagating the training to remaining views. We show that under the same dataset updates strategy, we can easily adapt our method for object insertion using data from text-to-3D models as well as object removal. Experimental results show that our method generates photorealistic images of the edited scenes, and outperforms state-of-the-art methods in 3D reconstruction and neural radiance field blending.
Towards Real-Time Neural Video Codec for Cross-Platform Application Using Calibration Information
results: Experiments show that the model achieves a decoding speed of 25 FPS on an NVIDIA RTX 2080 GPU and can decode, in real time, 720P videos encoded on another platform. The real-time model also provides up to 24.2% BD-rate improvement over the anchor H.265.Abstract
The state-of-the-art neural video codecs have outperformed the most sophisticated traditional codecs in terms of RD performance in certain cases. However, utilizing them for practical applications is still challenging for two major reasons. 1) Cross-platform computational errors resulting from floating point operations can lead to inaccurate decoding of the bitstream. 2) The high computational complexity of the encoding and decoding process poses a challenge in achieving real-time performance. In this paper, we propose a real-time cross-platform neural video codec, which is capable of efficiently decoding of 720P video bitstream from other encoding platforms on a consumer-grade GPU. First, to solve the problem of inconsistency of codec caused by the uncertainty of floating point calculations across platforms, we design a calibration transmitting system to guarantee the consistent quantization of entropy parameters between the encoding and decoding stages. The parameters that may have transboundary quantization between encoding and decoding are identified in the encoding stage, and their coordinates will be delivered by auxiliary transmitted bitstream. By doing so, these inconsistent parameters can be processed properly in the decoding stage. Furthermore, to reduce the bitrate of the auxiliary bitstream, we rectify the distribution of entropy parameters using a piecewise Gaussian constraint. Second, to match the computational limitations on the decoding side for real-time video codec, we design a lightweight model. A series of efficiency techniques enable our model to achieve 25 FPS decoding speed on NVIDIA RTX 2080 GPU. Experimental results demonstrate that our model can achieve real-time decoding of 720P videos while encoding on another platform. Furthermore, the real-time model brings up to a maximum of 24.2\% BD-rate improvement from the perspective of PSNR with the anchor H.265.
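A toy sketch of the calibration idea: entropy parameters whose values land near a quantization boundary may round differently across platforms, so their coordinates are flagged for the auxiliary bitstream. The quantization step and tolerance below are illustrative assumptions, not the paper's parameters.

```python
# Flag entropy parameters that sit close to a rounding boundary, since floating-point
# differences across platforms could quantize them inconsistently. Step/tol are assumed.
import numpy as np

def flag_boundary_params(params, step=0.05, tol=1e-3):
    """Return quantized params and coordinates of values near a rounding boundary."""
    scaled = params / step
    frac = np.abs(scaled - np.round(scaled))        # distance to the nearest integer
    risky = np.argwhere(np.abs(frac - 0.5) < tol)   # near the .5 rounding boundary
    quantized = np.round(scaled) * step
    return quantized, risky

rng = np.random.default_rng(0)
entropy_params = rng.normal(size=(8, 8)).astype(np.float32)
q, coords = flag_boundary_params(entropy_params)
print("params flagged for auxiliary transmission:", len(coords))
```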
StructChart: Perception, Structuring, Reasoning for Visual Chart Understanding
results: Extensive experiments show that this unified learning paradigm achieves strong performance across different chart-related tasks, and that enlarging the chart dataset further improves chart understanding.Abstract
Charts are common in literature across different scientific fields, conveying rich information easily accessible to readers. Current chart-related tasks focus on either chart perception which refers to extracting information from the visual charts, or performing reasoning given the extracted data, e.g. in a tabular form. In this paper, we aim to establish a unified and label-efficient learning paradigm for joint perception and reasoning tasks, which can be generally applicable to different downstream tasks, beyond the question-answering task as specifically studied in peer works. Specifically, StructChart first reformulates the chart information from the popular tabular form (specifically linearized CSV) to the proposed Structured Triplet Representations (STR), which is more friendly for reducing the task gap between chart perception and reasoning due to the employed structured information extraction for charts. We then propose a Structuring Chart-oriented Representation Metric (SCRM) to quantitatively evaluate the performance for the chart perception task. To enrich the dataset for training, we further explore the possibility of leveraging the Large Language Model (LLM), enhancing the chart diversity in terms of both chart visual style and its statistical information. Extensive experiments are conducted on various chart-related tasks, demonstrating the effectiveness and promising potential for a unified chart perception-reasoning paradigm to push the frontier of chart understanding.
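The reformulation step can be illustrated with a toy conversion from a linearized CSV chart table to (entity, attribute, value) triplets; the exact STR schema is the paper's, and this sketch only shows the general idea.

```python
# Convert a linearized CSV chart table into (entity, attribute, value) triplets.
# The first column is assumed to name the series/category; everything else is assumed.
import csv
import io

def csv_to_triplets(linearized_csv):
    rows = list(csv.reader(io.StringIO(linearized_csv)))
    header, body = rows[0], rows[1:]
    triplets = []
    for row in body:
        entity = row[0]
        for attr, value in zip(header[1:], row[1:]):
            triplets.append((entity, attr, value))
    return triplets

chart_csv = "Country,2020,2021\nA,3.1,3.4\nB,2.7,2.9\n"
for t in csv_to_triplets(chart_csv):
    print(t)
```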
From Classification to Segmentation with Explainable AI: A Study on Crack Detection and Growth Monitoring
paper_authors: Florent Forest, Hugo Porta, Devis Tuia, Olga Fink
for: This work aims to automate the monitoring of surface cracks in infrastructure for structural health monitoring.
methods: Machine learning approaches are effective but typically require large annotated datasets for supervised training, and once a crack is detected, monitoring its severity usually demands precise pixel-level segmentation, whose annotation is labor-intensive. To address this, the study proposes using explainable AI (XAI) to derive segmentations from the explanations of a classifier, requiring only weak image-level supervision.
results: The study finds that XAI-derived segmentation masks remain meaningful even without extensive annotation; while their quality is lower than that of supervised methods, they still enable severity monitoring and substantially reduce labeling costs.Abstract
Monitoring surface cracks in infrastructure is crucial for structural health monitoring. Automatic visual inspection offers an effective solution, especially in hard-to-reach areas. Machine learning approaches have proven their effectiveness but typically require large annotated datasets for supervised training. Once a crack is detected, monitoring its severity often demands precise segmentation of the damage. However, pixel-level annotation of images for segmentation is labor-intensive. To mitigate this cost, one can leverage explainable artificial intelligence (XAI) to derive segmentations from the explanations of a classifier, requiring only weak image-level supervision. This paper proposes applying this methodology to segment and monitor surface cracks. We evaluate the performance of various XAI methods and examine how this approach facilitates severity quantification and growth monitoring. Results reveal that while the resulting segmentation masks may exhibit lower quality than those produced by supervised methods, they remain meaningful and enable severity monitoring, thus reducing substantial labeling costs.
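A minimal sketch of how a classifier explanation (e.g., a class activation map) could be turned into a crack mask and a severity proxy by normalization and thresholding; the threshold and area scale are assumptions, and the XAI methods evaluated in the paper are more involved.

```python
# Threshold a classifier's explanation heatmap into a binary crack mask and derive a
# coarse severity proxy from its area. Threshold and pixel area are illustrative.
import numpy as np

def explanation_to_mask(cam, threshold=0.5):
    cam = np.maximum(cam, 0.0)                          # keep positive evidence only
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return (cam >= threshold).astype(np.uint8)

def crack_severity(mask, pixel_area_mm2=0.01):
    """Coarse severity proxy: total cracked area in mm^2 (scale is assumed)."""
    return mask.sum() * pixel_area_mm2

cam = np.random.rand(64, 64)        # stand-in for a classifier's explanation heatmap
mask = explanation_to_mask(cam)
print(mask.shape, crack_severity(mask))
```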
TwinTex: Geometry-aware Texture Generation for Abstracted 3D Architectural Models
results: Experiments show that the method efficiently generates high-quality texture maps and reaches a human-expert production level with much less effort across different buildings and scenes.Abstract
Coarse architectural models are often generated at scales ranging from individual buildings to scenes for downstream applications such as Digital Twin City, Metaverse, LODs, etc. Such piece-wise planar models can be abstracted as twins from 3D dense reconstructions. However, these models typically lack realistic texture relative to the real building or scene, making them unsuitable for vivid display or direct reference. In this paper, we present TwinTex, the first automatic texture mapping framework to generate a photo-realistic texture for a piece-wise planar proxy. Our method addresses most challenges occurring in such twin texture generation. Specifically, for each primitive plane, we first select a small set of photos with greedy heuristics considering photometric quality, perspective quality and facade texture completeness. Then, different levels of line features (LoLs) are extracted from the set of selected photos to generate guidance for later steps. With LoLs, we employ optimization algorithms to align texture with geometry from local to global. Finally, we fine-tune a diffusion model with a multi-mask initialization component and a new dataset to inpaint the missing region. Experimental results on many buildings, indoor scenes and man-made objects of varying complexity demonstrate the generalization ability of our algorithm. Our approach surpasses state-of-the-art texture mapping methods in terms of high-fidelity quality and reaches a human-expert production level with much less effort. Project page: https://vcc.tech/research/2023/TwinTex.
Box2Poly: Memory-Efficient Polygon Prediction of Arbitrarily Shaped and Rotated Text
For: Improving the accuracy and efficiency of text detection, especially for irregular text layouts.* Methods: A cascade decoding pipeline based on Sparse R-CNN that iteratively refines polygon predictions, using a single feature vector to guide polygon instance regression.* Results: Compared with DPText-DETR, the method offers better memory efficiency (>50% less) and faster inference (>40% less) while maintaining performance on the benchmarks.Abstract
Recently, Transformer-based text detection techniques have sought to predict polygons by encoding the coordinates of individual boundary vertices using distinct query features. However, this approach incurs a significant memory overhead and struggles to effectively capture the intricate relationships between vertices belonging to the same instance. Consequently, irregular text layouts often lead to the prediction of outlined vertices, diminishing the quality of results. To address these challenges, we present an innovative approach rooted in Sparse R-CNN: a cascade decoding pipeline for polygon prediction. Our method ensures precision by iteratively refining polygon predictions, considering both the scale and location of preceding results. Leveraging this stabilized regression pipeline, even employing just a single feature vector to guide polygon instance regression yields promising detection results. Simultaneously, the leverage of instance-level feature proposal substantially enhances memory efficiency (>50% less vs. the state-of-the-art method DPText-DETR) and reduces inference speed (>40% less vs. DPText-DETR) with minor performance drop on benchmarks.
Towards Robust Few-shot Point Cloud Semantic Segmentation
results: We conduct extensive experiments under various noise settings, and the results show that the combination of CCNS and MDNS significantly improves performance.Abstract
Few-shot point cloud semantic segmentation aims to train a model to quickly adapt to new unseen classes with only a handful of support set samples. However, the noise-free assumption in the support set can be easily violated in many practical real-world settings. In this paper, we focus on improving the robustness of few-shot point cloud segmentation under the detrimental influence of noisy support sets during testing time. To this end, we first propose a Component-level Clean Noise Separation (CCNS) representation learning to learn discriminative feature representations that separates the clean samples of the target classes from the noisy samples. Leveraging the well separated clean and noisy support samples from our CCNS, we further propose a Multi-scale Degree-based Noise Suppression (MDNS) scheme to remove the noisy shots from the support set. We conduct extensive experiments on various noise settings on two benchmark datasets. Our results show that the combination of CCNS and MDNS significantly improves the performance. Our code is available at https://github.com/Pixie8888/R3DFSSeg.
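A simplified sketch of degree-based noise suppression on a support set: shots whose embeddings are weakly connected to the rest (low total similarity) are treated as noisy and dropped. The cosine similarity measure and keep ratio are illustrative assumptions, not the exact MDNS formulation.

```python
# Drop support shots with low similarity "degree" to the rest of the support set.
# Keep ratio and cosine similarity are assumptions for illustration.
import numpy as np

def suppress_noisy_shots(support_feats, keep_ratio=0.8):
    f = support_feats / (np.linalg.norm(support_feats, axis=1, keepdims=True) + 1e-8)
    sim = f @ f.T                       # cosine similarity between support shots
    np.fill_diagonal(sim, 0.0)
    degree = sim.sum(axis=1)            # how strongly each shot agrees with the others
    k = max(1, int(len(f) * keep_ratio))
    keep = np.argsort(-degree)[:k]      # keep the most consistent shots
    return np.sort(keep)

rng = np.random.default_rng(1)
clean = rng.normal(loc=1.0, size=(8, 64))
noisy = rng.normal(loc=-1.0, size=(2, 64))
feats = np.vstack([clean, noisy])
print("kept support indices:", suppress_noisy_shots(feats))
```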
Generalized Few-Shot Point Cloud Segmentation Via Geometric Words
paper_authors: Yating Xu, Conghui Hu, Na Zhao, Gim Hee Lee
For: This paper proposes a more practical paradigm of generalized few-shot point cloud segmentation, in which a model generalizes to new classes from only a few support point clouds while retaining segmentation accuracy on the base classes.* Methods: The method uses geometric words to represent geometric components shared between the base and novel classes and incorporates them into a novel geometric-aware semantic representation, enabling better generalization to new classes without forgetting the old ones. It further introduces geometric prototypes to guide segmentation with geometric prior knowledge.* Results: Compared with baseline methods, the approach performs strongly on S3DIS and ScanNet.Abstract
Existing fully-supervised point cloud segmentation methods suffer in the dynamic testing environment with emerging new classes. Few-shot point cloud segmentation algorithms address this problem by learning to adapt to new classes at the sacrifice of segmentation accuracy for the base classes, which severely impedes its practicality. This largely motivates us to present the first attempt at a more practical paradigm of generalized few-shot point cloud segmentation, which requires the model to generalize to new categories with only a few support point clouds and simultaneously retain the capability to segment base classes. We propose the geometric words to represent geometric components shared between the base and novel classes, and incorporate them into a novel geometric-aware semantic representation to facilitate better generalization to the new classes without forgetting the old ones. Moreover, we introduce geometric prototypes to guide the segmentation with geometric prior knowledge. Extensive experiments on S3DIS and ScanNet consistently illustrate the superior performance of our method over baseline methods. Our code is available at: https://github.com/Pixie8888/GFS-3DSeg_GWs.
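For context, the generic prototype mechanism underlying such methods can be sketched as masked average pooling of support point features per class, with query points labeled by their nearest prototype; the geometric-word construction itself is the paper's contribution and is not reproduced here.

```python
# Generic prototype-based few-shot segmentation: per-class prototypes from support
# features, nearest-prototype labeling of query points. Sizes are arbitrary.
import numpy as np

def class_prototypes(feats, labels, num_classes):
    protos = np.zeros((num_classes, feats.shape[1]), dtype=feats.dtype)
    for c in range(num_classes):
        pts = feats[labels == c]
        if len(pts) > 0:
            protos[c] = pts.mean(axis=0)   # masked average pooling for class c
    return protos

def segment_query(query_feats, protos):
    d = np.linalg.norm(query_feats[:, None, :] - protos[None, :, :], axis=-1)
    return d.argmin(axis=1)                # nearest-prototype label per query point

rng = np.random.default_rng(0)
support, labels = rng.normal(size=(200, 32)), rng.integers(0, 3, size=200)
query = rng.normal(size=(50, 32))
print(segment_query(query, class_prototypes(support, labels, 3))[:10])
```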
Automatic Bat Call Classification using Transformer Networks
paper_authors: Frank Fundel, Daniel A. Braun, Sebastian Gottwald
for: automatic bat call identification
methods: Transformer architecture for multi-label classification
results: single species accuracy of 88.92% (F1-score of 84.23%), multi species macro F1-score of 74.40%Abstract
Automatically identifying bat species from their echolocation calls is a difficult but important task for monitoring bats and the ecosystem they live in. Major challenges in automatic bat call identification are high call variability, similarities between species, interfering calls and lack of annotated data. Many currently available models suffer from relatively poor performance on real-life data due to being trained on single call datasets and, moreover, are often too slow for real-time classification. Here, we propose a Transformer architecture for multi-label classification with potential applications in real-time classification scenarios. We train our model on synthetically generated multi-species recordings by merging multiple bats calls into a single recording with multiple simultaneous calls. Our approach achieves a single species accuracy of 88.92% (F1-score of 84.23%) and a multi species macro F1-score of 74.40% on our test set. In comparison to three other tools on the independent and publicly available dataset ChiroVox, our model achieves at least 25.82% better accuracy for single species classification and at least 6.9% better macro F1-score for multi species classification.
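A short sketch of the synthetic multi-species recording generation described above: single-species call clips are summed into one waveform and their labels combined into a multi-label target. The sample rate, random placement, and normalization are illustrative assumptions.

```python
# Merge several single-species call clips into one multi-label training recording.
# Sample rate, placement, and normalization are assumptions for illustration.
import numpy as np

def mix_calls(calls, labels, num_species, sr=250_000):
    length = max(len(c) for c in calls)
    mix = np.zeros(length, dtype=np.float32)
    target = np.zeros(num_species, dtype=np.float32)
    for clip, species in zip(calls, labels):
        offset = np.random.randint(0, length - len(clip) + 1)   # random temporal placement
        mix[offset:offset + len(clip)] += clip
        target[species] = 1.0                                   # multi-label target
    peak = np.abs(mix).max()
    return mix / peak if peak > 0 else mix, target

calls = [np.random.normal(size=np.random.randint(2000, 5000)).astype(np.float32)
         for _ in range(3)]
waveform, y = mix_calls(calls, labels=[2, 5, 5], num_species=10)
print(waveform.shape, y)
```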
EPTQ: Enhanced Post-Training Quantization via Label-Free Hessian
methods: The paper uses knowledge distillation with adaptive weighting of layers to realize enhanced post-training quantization. It also introduces a label-free technique for approximating the Hessian trace of the task loss (Label-Free Hessian), removing the need for a labeled dataset.
results: Experiments show that EPTQ achieves state-of-the-art results across a wide variety of models, tasks, and datasets, including ImageNet classification, COCO object detection, and Pascal-VOC semantic segmentation. The paper also demonstrates the performance and compatibility of EPTQ on an extended set of architectures, including CNNs, Transformers, hybrid, and MLP-only models.Abstract
Quantization of deep neural networks (DNN) has become a key element in the efforts of embedding such networks on end-user devices. However, current quantization methods usually suffer from costly accuracy degradation. In this paper, we propose a new method for Enhanced Post Training Quantization named EPTQ. The method is based on knowledge distillation with an adaptive weighting of layers. In addition, we introduce a new label-free technique for approximating the Hessian trace of the task loss, named Label-Free Hessian. This technique removes the requirement of a labeled dataset for computing the Hessian. The adaptive knowledge distillation uses the Label-Free Hessian technique to give greater attention to the sensitive parts of the model while performing the optimization. Empirically, by employing EPTQ we achieve state-of-the-art results on a wide variety of models, tasks, and datasets, including ImageNet classification, COCO object detection, and Pascal-VOC for semantic segmentation. We demonstrate the performance and compatibility of EPTQ on an extended set of architectures, including CNNs, Transformers, hybrid, and MLP-only models.
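For reference, a generic Hutchinson-style estimator of the Hessian trace of a loss with respect to model parameters is sketched below on a toy model; this is the standard label-based estimator, not the paper's Label-Free Hessian variant.

```python
# Hutchinson trace estimator: E[v^T H v] over Rademacher vectors v equals tr(H).
# Toy model and loss; the paper's label-free formulation differs from this sketch.
import torch
import torch.nn as nn

def hessian_trace(loss, params, num_samples=10):
    grads = torch.autograd.grad(loss, params, create_graph=True)
    trace = 0.0
    for _ in range(num_samples):
        vs = [torch.randint_like(g, 2) * 2.0 - 1.0 for g in grads]   # Rademacher vectors
        gv = sum((g * v).sum() for g, v in zip(grads, vs))
        hvs = torch.autograd.grad(gv, params, retain_graph=True)     # Hessian-vector products
        trace += sum((h * v).sum() for h, v in zip(hvs, vs)).item()
    return trace / num_samples

model = nn.Sequential(nn.Linear(8, 16), nn.Tanh(), nn.Linear(16, 1))
x, y = torch.randn(32, 8), torch.randn(32, 1)
loss = nn.functional.mse_loss(model(x), y)
print(hessian_trace(loss, list(model.parameters())))
```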
Partition-A-Medical-Image: Extracting Multiple Representative Sub-regions for Few-shot Medical Image Segmentation
results: Extensive experiments demonstrate consistent improvements over mainstream FSMIS methods on three publicly accessible medical imaging datasets. The source code is available at https://github.com/YazhouZhu19/PAMI.Abstract
Few-shot Medical Image Segmentation (FSMIS) is a more promising solution for medical image segmentation tasks where high-quality annotations are naturally scarce. However, current mainstream methods primarily focus on extracting holistic representations from support images with large intra-class variations in appearance and background, and encounter difficulties in adapting to query images. In this work, we present an approach to extract multiple representative sub-regions from a given support medical image, enabling fine-grained selection over the generated image regions. Specifically, the foreground of the support image is decomposed into distinct regions, which are subsequently used to derive region-level representations via a designed Regional Prototypical Learning (RPL) module. We then introduce a novel Prototypical Representation Debiasing (PRD) module based on a two-way elimination mechanism which suppresses the disturbance of regional representations by a self-support, Multi-direction Self-debiasing (MS) block, and a support-query, Interactive Debiasing (ID) block. Finally, an Assembled Prediction (AP) module is devised to balance and integrate predictions of multiple prototypical representations learned using stacked PRD modules. Results obtained through extensive experiments on three publicly accessible medical imaging datasets demonstrate consistent improvements over the leading FSMIS methods. The source code is available at https://github.com/YazhouZhu19/PAMI.
AutoSynth: Learning to Generate 3D Training Data for Object Point Cloud Registration
results: The study shows that networks trained on data searched by our method achieve better performance on TUD-L, LINEMOD, and Occluded-LINEMOD than those trained on the ModelNet40 dataset, and that the approach transfers across different point cloud registration networks.Abstract
In the current deep learning paradigm, the amount and quality of training data are as critical as the network architecture and its training details. However, collecting, processing, and annotating real data at scale is difficult, expensive, and time-consuming, particularly for tasks such as 3D object registration. While synthetic datasets can be created, they require expertise to design and include a limited number of categories. In this paper, we introduce a new approach called AutoSynth, which automatically generates 3D training data for point cloud registration. Specifically, AutoSynth automatically curates an optimal dataset by exploring a search space encompassing millions of potential datasets with diverse 3D shapes at a low cost.To achieve this, we generate synthetic 3D datasets by assembling shape primitives, and develop a meta-learning strategy to search for the best training data for 3D registration on real point clouds. For this search to remain tractable, we replace the point cloud registration network with a much smaller surrogate network, leading to a $4056.43$ times speedup. We demonstrate the generality of our approach by implementing it with two different point cloud registration networks, BPNet and IDAM. Our results on TUD-L, LINEMOD and Occluded-LINEMOD evidence that a neural network trained on our searched dataset yields consistently better performance than the same one trained on the widely used ModelNet40 dataset.
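A toy sketch of assembling shape primitives into a synthetic point cloud, in the spirit of the dataset generation described above; the primitive types, counts, and size ranges are illustrative assumptions.

```python
# Build a synthetic shape by sampling points from randomly placed primitives
# (spheres and boxes). All ranges and counts are assumptions for illustration.
import numpy as np

def sample_sphere(n, center, radius):
    v = np.random.normal(size=(n, 3))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    return center + radius * v

def sample_box(n, center, half_extents):
    p = np.random.uniform(-1, 1, size=(n, 3)) * half_extents
    axis = np.random.randint(0, 3, size=n)              # push each point to a random face
    p[np.arange(n), axis] = np.sign(p[np.arange(n), axis] + 1e-9) * half_extents[axis]
    return center + p

def synth_shape(points_per_primitive=512, num_primitives=4):
    parts = []
    for _ in range(num_primitives):
        c = np.random.uniform(-0.5, 0.5, size=3)
        if np.random.rand() < 0.5:
            parts.append(sample_sphere(points_per_primitive, c, np.random.uniform(0.1, 0.3)))
        else:
            parts.append(sample_box(points_per_primitive, c, np.random.uniform(0.1, 0.3, size=3)))
    return np.vstack(parts)

print(synth_shape().shape)   # e.g. (2048, 3), usable as a registration training sample
```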
Multi-grained Temporal Prototype Learning for Few-shot Video Object Segmentation
results: Experimental results show that the proposed video IPMT model significantly outperforms previous models on two benchmark datasets.Abstract
Few-Shot Video Object Segmentation (FSVOS) aims to segment objects in a query video with the same category defined by a few annotated support images. However, this task was seldom explored. In this work, based on IPMT, a state-of-the-art few-shot image segmentation method that combines external support guidance information with adaptive query guidance cues, we propose to leverage multi-grained temporal guidance information for handling the temporal correlation nature of video data. We decompose the query video information into a clip prototype and a memory prototype for capturing local and long-term internal temporal guidance, respectively. Frame prototypes are further used for each frame independently to handle fine-grained adaptive guidance and enable bidirectional clip-frame prototype communication. To reduce the influence of noisy memory, we propose to leverage the structural similarity relation among different predicted regions and the support for selecting reliable memory frames. Furthermore, a new segmentation loss is also proposed to enhance the category discriminability of the learned prototypes. Experimental results demonstrate that our proposed video IPMT model significantly outperforms previous models on two benchmark datasets. Code is available at https://github.com/nankepan/VIPMT.
Learning Deformable 3D Graph Similarity to Track Plant Cells in Unregistered Time Lapse Images
paper_authors: Md Shazid Islam, Arindam Dutta, Calvin-Khang Ta, Kevin Rodriguez, Christian Michael, Mark Alber, G. Venugopala Reddy, Amit K. Roy-Chowdhury
results: The paper evaluates the method on a benchmark dataset and demonstrates its advantages in tracking accuracy and inference time.Abstract
Tracking of plant cells in images obtained by microscope is a challenging problem due to biological phenomena such as large number of cells, non-uniform growth of different layers of the tightly packed plant cells and cell division. Moreover, images in deeper layers of the tissue being noisy and unavoidable systemic errors inherent in the imaging process further complicates the problem. In this paper, we propose a novel learning-based method that exploits the tightly packed three-dimensional cell structure of plant cells to create a three-dimensional graph in order to perform accurate cell tracking. We further propose novel algorithms for cell division detection and effective three-dimensional registration, which improve upon the state-of-the-art algorithms. We demonstrate the efficacy of our algorithm in terms of tracking accuracy and inference-time on a benchmark dataset.
CNN-based local features for navigation near an asteroid
paper_authors: Olli Knuuttila, Antti Kestilä, Esa Kallio
for: asteroid exploration missions and on-orbit servicing
methods: lightweight feature extractor specifically tailored for asteroid proximity navigation, designed to be robust to illumination changes and affine transformations
results: effective navigation and localization, with incremental improvements over existing methods and a trained feature extractorAbstract
This article addresses the challenge of vision-based proximity navigation in asteroid exploration missions and on-orbit servicing. Traditional feature extraction methods struggle with the significant appearance variations of asteroids due to limited scattered light. To overcome this, we propose a lightweight feature extractor specifically tailored for asteroid proximity navigation, designed to be robust to illumination changes and affine transformations. We compare and evaluate state-of-the-art feature extraction networks and three lightweight network architectures in the asteroid context. Our proposed feature extractors and their evaluation leverages both synthetic images and real-world data from missions such as NEAR Shoemaker, Hayabusa, Rosetta, and OSIRIS-REx. Our contributions include a trained feature extractor, incremental improvements over existing methods, and a pipeline for training domain-specific feature extractors. Experimental results demonstrate the effectiveness of our approach in achieving accurate navigation and localization. This work aims to advance the field of asteroid navigation and provides insights for future research in this domain.
Online Calibration of a Single-Track Ground Vehicle Dynamics Model by Tight Fusion with Visual-Inertial Odometry
results: Experiments show that the method adapts to changing environments, both indoors and outdoors, and accurately predicts the effect of future control inputs. It also improves tracking accuracy.Abstract
Wheeled mobile robots need the ability to estimate their motion and the effect of their control actions for navigation planning. In this paper, we present ST-VIO, a novel approach which tightly fuses a single-track dynamics model for wheeled ground vehicles with visual inertial odometry. Our method calibrates and adapts the dynamics model online and facilitates accurate forward prediction conditioned on future control inputs. The single-track dynamics model approximates wheeled vehicle motion under specific control inputs on flat ground using ordinary differential equations. We use a singularity-free and differentiable variant of the single-track model to enable seamless integration as dynamics factor into VIO and to optimize the model parameters online together with the VIO state variables. We validate our method with real-world data in both indoor and outdoor environments with different terrain types and wheels. In our experiments, we demonstrate that our ST-VIO can not only adapt to the change of the environments and achieve accurate prediction under new control inputs, but even improves the tracking accuracy. Supplementary video: https://youtu.be/BuGY1L1FRa4.
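The single-track dynamics can be illustrated with a kinematic bicycle model integrated by Euler steps, which supports forward prediction under future control inputs; the state convention, wheelbase, and time step below are generic textbook assumptions rather than ST-VIO's calibrated model.

```python
# Kinematic single-track (bicycle) model integrated with an Euler step, plus a rollout
# that forward-predicts a trajectory for a sequence of future controls.
import numpy as np

def single_track_step(state, v, steer, wheelbase=0.3, dt=0.02):
    """state = (x, y, yaw); v = forward speed [m/s]; steer = steering angle [rad]."""
    x, y, yaw = state
    x += v * np.cos(yaw) * dt
    y += v * np.sin(yaw) * dt
    yaw += v / wheelbase * np.tan(steer) * dt
    return np.array([x, y, yaw])

def rollout(state, controls):
    """Forward-predict the trajectory for a sequence of future (v, steer) inputs."""
    traj = [state]
    for v, steer in controls:
        traj.append(single_track_step(traj[-1], v, steer))
    return np.array(traj)

controls = [(1.0, 0.1)] * 50                # constant speed, gentle left turn
print(rollout(np.zeros(3), controls)[-1])   # predicted pose after 1 second
```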
GraphEcho: Graph-Driven Unsupervised Domain Adaptation for Echocardiogram Video Segmentation
For: This paper studies unsupervised domain adaptation (UDA) for echocardiogram video segmentation, aiming to generalize a model trained on a source domain to other unlabelled target domains.* Methods: We introduce a newly collected CardiacUDA dataset and a novel method named GraphEcho, which comprises two innovative modules: the Spatial-wise Cross-domain Graph Matching (SCGM) module and the Temporal Cycle Consistency (TCC) module. These modules better align global and local features from the source and target domains, improving UDA segmentation results.* Results: Experiments show that GraphEcho outperforms existing state-of-the-art UDA segmentation methods. The CardiacUDA dataset and code will be publicly released upon acceptance, laying a new and solid foundation for cardiac structure segmentation from echocardiogram videos. Code and dataset are available at https://github.com/xmed-lab/GraphEcho.Abstract
Echocardiogram video segmentation plays an important role in cardiac disease diagnosis. This paper studies the unsupervised domain adaption (UDA) for echocardiogram video segmentation, where the goal is to generalize the model trained on the source domain to other unlabelled target domains. Existing UDA segmentation methods are not suitable for this task because they do not model local information and the cyclical consistency of heartbeat. In this paper, we introduce a newly collected CardiacUDA dataset and a novel GraphEcho method for cardiac structure segmentation. Our GraphEcho comprises two innovative modules, the Spatial-wise Cross-domain Graph Matching (SCGM) and the Temporal Cycle Consistency (TCC) module, which utilize prior knowledge of echocardiogram videos, i.e., consistent cardiac structure across patients and centers and the heartbeat cyclical consistency, respectively. These two modules can better align global and local features from source and target domains, improving UDA segmentation results. Experimental results showed that our GraphEcho outperforms existing state-of-the-art UDA segmentation methods. Our collected dataset and code will be publicly released upon acceptance. This work will lay a new and solid cornerstone for cardiac structure segmentation from echocardiogram videos. Code and dataset are available at: https://github.com/xmed-lab/GraphEcho
GL-Fusion: Global-Local Fusion Network for Multi-view Echocardiogram Video Segmentation
results: Evaluated on the MvEVD dataset, GL-Fusion improves the accuracy of echocardiogram analysis by 7.83% over the baseline method and also outperforms existing state-of-the-art methods.Abstract
Cardiac structure segmentation from echocardiogram videos plays a crucial role in diagnosing heart disease. The combination of multi-view echocardiogram data is essential to enhance the accuracy and robustness of automated methods. However, due to the visual disparity of the data, deriving cross-view context information remains a challenging task, and unsophisticated fusion strategies can even lower performance. In this study, we propose a novel Global-Local fusion (GL-Fusion) network to jointly utilize multi-view information globally and locally to improve the accuracy of echocardiogram analysis. Specifically, a Multi-view Global-based Fusion Module (MGFM) is proposed to extract global context information and to explore the cyclic relationship of different heartbeat cycles in an echocardiogram video. Additionally, a Multi-view Local-based Fusion Module (MLFM) is designed to extract correlations of cardiac structures from different views. Furthermore, we collect a multi-view echocardiogram video dataset (MvEVD) to evaluate our method. Our method achieves an 82.29% average dice score, which demonstrates a 7.83% improvement over the baseline method, and outperforms other existing state-of-the-art methods. To our knowledge, this is the first exploration of a multi-view method for echocardiogram video segmentation. Code available at: https://github.com/xmed-lab/GL-Fusion
results: The results show that the proposed Sub-pixel Convolution upsampling and multi-scale wavelet input modules improve segmentation accuracy and efficiency, achieving excellent performance on the Synapse and ACDC datasets and surpassing other existing methods.Abstract
U-Net and its variants have been widely used in medical image segmentation. However, most current U-Net variants confine their improvement strategies to building a more complex encoder, while leaving the decoder unchanged or adopting a simple symmetric structure. These approaches overlook the true functionality of the decoder: receiving low-resolution feature maps from the encoder and restoring feature map resolution and lost information through upsampling. As a result, the decoder, especially its upsampling component, plays a crucial role in enhancing segmentation outcomes. However, in 3D medical image segmentation, the commonly used transposed convolution can result in visual artifacts. This issue stems from the absence of a direct relationship between adjacent pixels in the output feature map. Furthermore, a plain encoder already possesses sufficient feature extraction capability, since downsampling gradually expands the receptive field, yet the information lost during downsampling is non-negligible. To address the gap in relevant research, we extend our focus beyond the encoder and introduce neU-Net (i.e., not complex encoder U-Net), which incorporates a novel Sub-pixel Convolution for upsampling to construct a powerful decoder. Additionally, we introduce a multi-scale wavelet inputs module on the encoder side to provide additional information. Our model design achieves excellent results, surpassing other state-of-the-art methods on both the Synapse and ACDC datasets.
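A compact 2D illustration of sub-pixel convolution upsampling (a convolution followed by a pixel shuffle), the decoder ingredient discussed above; the paper targets 3D volumes, so this PyTorch block is only an assumed, simplified analogue.

```python
# Sub-pixel convolution upsampling: produce scale^2 * out_ch channels, then rearrange
# the channels into spatial resolution with PixelShuffle. 2D stand-in for the 3D case.
import torch
import torch.nn as nn

class SubPixelUp(nn.Module):
    def __init__(self, in_ch, out_ch, scale=2):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch * scale * scale, kernel_size=3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)

    def forward(self, x):
        return self.shuffle(self.conv(x))

x = torch.randn(1, 64, 32, 32)        # low-resolution decoder feature map
y = SubPixelUp(64, 32)(x)
print(y.shape)                        # torch.Size([1, 32, 64, 64])
```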
Shape Anchor Guided Holistic Indoor Scene Understanding
results: Experiments on the ScanNetv2 dataset achieve state-of-the-art performance in 3D object detection, layout estimation, and shape reconstruction.Abstract
This paper proposes a shape anchor guided learning strategy (AncLearn) for robust holistic indoor scene understanding. We observe that the search space constructed by current methods for proposal feature grouping and instance point sampling often introduces massive noise to instance detection and mesh reconstruction. Accordingly, we develop AncLearn to generate anchors that dynamically fit instance surfaces to (i) unmix noise and target-related features for offering reliable proposals at the detection stage, and (ii) reduce outliers in object point sampling for directly providing well-structured geometry priors without segmentation during reconstruction. We embed AncLearn into a reconstruction-from-detection learning system (AncRec) to generate high-quality semantic scene models in a purely instance-oriented manner. Experiments conducted on the challenging ScanNetv2 dataset demonstrate that our shape anchor-based method consistently achieves state-of-the-art performance in terms of 3D object detection, layout estimation, and shape reconstruction. The code will be available at https://github.com/Geo-Tell/AncRec.
Locate and Verify: A Two-Stream Network for Improved Deepfake Detection
results: Compared with existing methods on six benchmarks, the proposed approach shows significantly improved generalizability and forgery-region detection, raising the frame-level AUC on the Deepfake Detection Challenge preview dataset from 0.797 to 0.835 and the video-level AUC on the CelebDF_v1 dataset from 0.811 to 0.847.Abstract
Deepfake has taken the world by storm, triggering a trust crisis. Current deepfake detection methods are typically inadequate in generalizability, with a tendency to overfit to image contents such as the background, which are frequently occurring but relatively unimportant in the training dataset. Furthermore, current methods heavily rely on a few dominant forgery regions and may ignore other equally important regions, leading to inadequate uncovering of forgery cues. In this paper, we strive to address these shortcomings from three aspects: (1) We propose an innovative two-stream network that effectively enlarges the potential regions from which the model extracts forgery evidence. (2) We devise three functional modules to handle the multi-stream and multi-scale features in a collaborative learning scheme. (3) Confronted with the challenge of obtaining forgery annotations, we propose a Semi-supervised Patch Similarity Learning strategy to estimate patch-level forged location annotations. Empirically, our method demonstrates significantly improved robustness and generalizability, outperforming previous methods on six benchmarks, and improving the frame-level AUC on Deepfake Detection Challenge preview dataset from 0.797 to 0.835 and video-level AUC on CelebDF$\_$v1 dataset from 0.811 to 0.847. Our implementation is available at https://github.com/sccsok/Locate-and-Verify.
PSDiff: Diffusion Model for Person Search with Iterative and Collaborative Refinement
results: Experiments show that PSDiff achieves state-of-the-art performance on standard benchmarks with fewer parameters and elastic computing overhead.Abstract
Dominant Person Search methods aim to localize and recognize query persons in a unified network, which jointly optimizes two sub-tasks, \ie, detection and Re-IDentification (ReID). Despite significant progress, two major challenges remain: 1) Detection-prior modules in previous methods are suboptimal for the ReID task. 2) The collaboration between two sub-tasks is ignored. To alleviate these issues, we present a novel Person Search framework based on the Diffusion model, PSDiff. PSDiff formulates the person search as a dual denoising process from noisy boxes and ReID embeddings to ground truths. Unlike existing methods that follow the Detection-to-ReID paradigm, our denoising paradigm eliminates detection-prior modules to avoid the local-optimum of the ReID task. Following the new paradigm, we further design a new Collaborative Denoising Layer (CDL) to optimize detection and ReID sub-tasks in an iterative and collaborative way, which makes two sub-tasks mutually beneficial. Extensive experiments on the standard benchmarks show that PSDiff achieves state-of-the-art performance with fewer parameters and elastic computing overhead.
Hyperspectral Benchmark: Bridging the Gap between HSI Applications through Comprehensive Dataset and Pretraining
results: The results show that the benchmark dataset enables a finer assessment of specialized HSI models and helps generalize existing methods. In addition, the pretraining pipeline improves the stability of downstream training.Abstract
Hyperspectral Imaging (HSI) serves as a non-destructive spatial spectroscopy technique with a multitude of potential applications. However, a recurring challenge lies in the limited size of the target datasets, impeding exhaustive architecture search. Consequently, when venturing into novel applications, reliance on established methodologies becomes commonplace, in the hope that they exhibit favorable generalization characteristics. Regrettably, this optimism is often unfounded due to the fine-tuned nature of models tailored to specific HSI contexts. To address this predicament, this study introduces an innovative benchmark dataset encompassing three markedly distinct HSI applications: food inspection, remote sensing, and recycling. This comprehensive dataset affords a finer assessment of hyperspectral model capabilities. Moreover, this benchmark facilitates an incisive examination of prevailing state-of-the-art techniques, consequently fostering the evolution of superior methodologies. Furthermore, the enhanced diversity inherent in the benchmark dataset underpins the establishment of a pretraining pipeline for HSI. This pretraining regimen serves to enhance the stability of training processes for larger models. Additionally, a procedural framework is delineated, offering insights into the handling of applications afflicted by limited target dataset sizes.
BroadBEV: Collaborative LiDAR-camera Fusion for Broad-sighted Bird’s Eye View Map Construction
results: Our experiments show that BroadBEV provides a broad-sighted BEV perception with remarkable performance gains.Abstract
A recent sensor fusion in a Bird's Eye View (BEV) space has shown its utility in various tasks such as 3D detection, map segmentation, etc. However, the approach struggles with inaccurate camera BEV estimation, and a perception of distant areas due to the sparsity of LiDAR points. In this paper, we propose a broad BEV fusion (BroadBEV) that addresses the problems with a spatial synchronization approach of cross-modality. Our strategy aims to enhance camera BEV estimation for a broad-sighted perception while simultaneously improving the completion of LiDAR's sparsity in the entire BEV space. Toward that end, we devise Point-scattering that scatters LiDAR BEV distribution to camera depth distribution. The method boosts the learning of depth estimation of the camera branch and induces accurate location of dense camera features in BEV space. For an effective BEV fusion between the spatially synchronized features, we suggest ColFusion that applies self-attention weights of LiDAR and camera BEV features to each other. Our extensive experiments demonstrate that BroadBEV provides a broad-sighted BEV perception with remarkable performance gains.
results: The study builds a large Adversarial Identification Dataset (AID) of over 180,000 adversarial examples and, using the GLOF module for attack identification, reports multiple interesting benchmark results.Abstract
Intrinsic susceptibility of deep learning to adversarial examples has led to a plethora of attack techniques with a broad common objective of fooling deep models. However, we find slight compositional differences between the algorithms achieving this objective. These differences leave traces that provide important clues for attacker profiling in real-life scenarios. Inspired by this, we introduce a novel problem of PRofiling Adversarial aTtacks (PRAT). Given an adversarial example, the objective of PRAT is to identify the attack used to generate it. Under this perspective, we can systematically group existing attacks into different families, leading to the sub-problem of attack family identification, which we also study. To enable PRAT analysis, we introduce a large Adversarial Identification Dataset (AID), comprising over 180k adversarial samples generated with 13 popular attacks for image specific/agnostic white/black box setups. We use AID to devise a novel framework for the PRAT objective. Our framework utilizes a Transformer based Global-LOcal Feature (GLOF) module to extract an approximate signature of the adversarial attack, which in turn is used for the identification of the attack. Using AID and our framework, we provide multiple interesting benchmark results for the PRAT problem.
Self-supervised Domain-agnostic Domain Adaptation for Satellite Images
for: Addressing the domain shift issue in machine learning for global scale satellite image processing.
methods: Proposed an self-supervised domain-agnostic domain adaptation (SS(DA)2) method, which uses a contrastive generative adversarial loss to train a generative network for image-to-image translation, and improves the generalizability of downstream models by augmenting the training data with different testing spectral characteristics.
results: Experimental results on public benchmarks verified the effectiveness of SS(DA)2.Abstract
Domain shift caused by, e.g., different geographical regions or acquisition conditions is a common issue in machine learning for global scale satellite image processing. A promising method to address this problem is domain adaptation, where the training and the testing datasets are split into two or multiple domains according to their distributions, and an adaptation method is applied to improve the generalizability of the model on the testing dataset. However, defining the domain to which each satellite image belongs is not trivial, especially under large-scale multi-temporal and multi-sensory scenarios, where a single image mosaic could be generated from multiple data sources. In this paper, we propose an self-supervised domain-agnostic domain adaptation (SS(DA)2) method to perform domain adaptation without such a domain definition. To achieve this, we first design a contrastive generative adversarial loss to train a generative network to perform image-to-image translation between any two satellite image patches. Then, we improve the generalizability of the downstream models by augmenting the training data with different testing spectral characteristics. The experimental results on public benchmarks verify the effectiveness of SS(DA)2.
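One simple way to realize the augmentation idea above is to jitter the spectral characteristics of training patches with a random per-band gain and offset; the jitter ranges and the 13-band patch below are illustrative assumptions.

```python
# Spectral augmentation sketch: random per-band gain and offset applied to a
# multispectral patch. Jitter ranges and band count are assumptions.
import numpy as np

def spectral_jitter(patch, gain_range=(0.8, 1.2), offset_range=(-0.05, 0.05)):
    """patch: (H, W, C) reflectance-like values; returns a spectrally shifted copy."""
    c = patch.shape[-1]
    gain = np.random.uniform(*gain_range, size=c)
    offset = np.random.uniform(*offset_range, size=c)
    return np.clip(patch * gain + offset, 0.0, 1.0)

patch = np.random.rand(64, 64, 13).astype(np.float32)   # e.g. 13 Sentinel-2-like bands
aug = spectral_jitter(patch)
print(aug.shape, float(aug.mean()))
```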
Forgery-aware Adaptive Vision Transformer for Face Forgery Detection
results: Experiments show that FA-ViT achieves state-of-the-art performance in cross-dataset evaluation and cross-manipulation scenarios, and improves robustness against unseen perturbations.Abstract
With the advancement in face manipulation technologies, the importance of face forgery detection in protecting authentication integrity becomes increasingly evident. Previous Vision Transformer (ViT)-based detectors have demonstrated subpar performance in cross-database evaluations, primarily because fully fine-tuning with limited Deepfake data often leads to forgetting pre-trained knowledge and over-fitting to data-specific ones. To circumvent these issues, we propose a novel Forgery-aware Adaptive Vision Transformer (FA-ViT). In FA-ViT, the vanilla ViT's parameters are frozen to preserve its pre-trained knowledge, while two specially designed components, the Local-aware Forgery Injector (LFI) and the Global-aware Forgery Adaptor (GFA), are employed to adapt forgery-related knowledge. our proposed FA-ViT effectively combines these two different types of knowledge to form the general forgery features for detecting Deepfakes. Specifically, LFI captures local discriminative information and incorporates these information into ViT via Neighborhood-Preserving Cross Attention (NPCA). Simultaneously, GFA learns adaptive knowledge in the self-attention layer, bridging the gap between the two different domain. Furthermore, we design a novel Single Domain Pairwise Learning (SDPL) to facilitate fine-grained information learning in FA-ViT. The extensive experiments demonstrate that our FA-ViT achieves state-of-the-art performance in cross-dataset evaluation and cross-manipulation scenarios, and improves the robustness against unseen perturbations.
摘要
随着人脸操纵技术的发展,人脸伪造检测在保护身份认证完整性方面的重要性日益凸显。此前基于 Vision Transformer (ViT) 的检测器在跨数据库评估中表现不佳,主要原因是在有限的 Deepfake 数据上进行完全微调往往导致遗忘预训练知识并过拟合于特定数据。为了解决这些问题,我们提出了一种新的伪造感知自适应 Vision Transformer (FA-ViT)。在 FA-ViT 中,原始 ViT 的参数被冻结以保留其预训练知识,同时引入两个专门设计的组件:局部感知伪造注入器 (LFI) 和全局感知伪造适配器 (GFA),用于适配与伪造相关的知识。FA-ViT 将这两类知识有效结合,形成用于检测 Deepfake 的通用伪造特征。具体而言,LFI 捕获局部判别信息,并通过邻域保持交叉注意力 (NPCA) 将其注入 ViT;GFA 则在自注意力层中学习自适应知识,弥合两个不同领域之间的差距。此外,我们设计了一种新的单域成对学习 (SDPL),以促进 FA-ViT 中细粒度信息的学习。大量实验表明,FA-ViT 在跨数据集评估和跨伪造方式场景中达到了最先进的性能,并提高了对未见扰动的鲁棒性。
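A common way to realize the "frozen backbone plus trainable forgery adapters" idea described above is parameter-efficient fine-tuning: freeze the pretrained ViT and train only small injected modules. The sketch below shows that general pattern with a simple bottleneck adapter, assuming the timm library is available; the paper's LFI/GFA/NPCA designs are not reproduced here.

```python
import torch
import torch.nn as nn
import timm  # assumed available for a pretrained ViT

class BottleneckAdapter(nn.Module):
    """Small trainable module refining features from a frozen backbone."""
    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))

    def forward(self, x):
        return x + self.net(x)  # residual: adapter adjusts, backbone knowledge is preserved

class FrozenViTWithAdapters(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.vit = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=0)
        for p in self.vit.parameters():         # freeze pre-trained knowledge
            p.requires_grad = False
        dim = self.vit.num_features
        self.adapter = BottleneckAdapter(dim)   # only this (and the head) is trained
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        feats = self.vit(x)           # frozen features
        feats = self.adapter(feats)   # forgery-related adaptation
        return self.head(feats)
```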
Learning Segment Similarity and Alignment in Large-Scale Content Based Video Retrieval
results: 对于公共数据集的实验结果表明,SSAN可以获得更高的Alignment精度,同时减少存储和在线查询计算成本,比既有方法更高。Abstract
With the explosive growth of web videos in recent years, large-scale Content-Based Video Retrieval (CBVR) becomes increasingly essential in video filtering, recommendation, and copyright protection. Segment-level CBVR (S-CBVR) locates the start and end time of similar segments in finer granularity, which is beneficial for user browsing efficiency and infringement detection especially in long video scenarios. The challenge of S-CBVR task is how to achieve high temporal alignment accuracy with efficient computation and low storage consumption. In this paper, we propose a Segment Similarity and Alignment Network (SSAN) in dealing with the challenge which is firstly trained end-to-end in S-CBVR. SSAN is based on two newly proposed modules in video retrieval: (1) An efficient Self-supervised Keyframe Extraction (SKE) module to reduce redundant frame features, (2) A robust Similarity Pattern Detection (SPD) module for temporal alignment. In comparison with uniform frame extraction, SKE not only saves feature storage and search time, but also introduces comparable accuracy and limited extra computation time. In terms of temporal alignment, SPD localizes similar segments with higher accuracy and efficiency than existing deep learning methods. Furthermore, we jointly train SSAN with SKE and SPD and achieve an end-to-end improvement. Meanwhile, the two key modules SKE and SPD can also be effectively inserted into other video retrieval pipelines and gain considerable performance improvements. Experimental results on public datasets show that SSAN can obtain higher alignment accuracy while saving storage and online query computational cost compared to existing methods.
摘要
近年来网络视频爆炸式增长,大规模基于内容的视频检索 (CBVR) 在视频过滤、推荐和版权保护中变得日益重要。片段级 CBVR (S-CBVR) 以更细的粒度定位相似片段的起止时间,有利于提高用户浏览效率和侵权检测,尤其是在长视频场景下。S-CBVR 任务的挑战在于如何以高效的计算和较低的存储开销实现高时间对齐精度。本文提出了一种片段相似度与对齐网络 (SSAN) 来应对这一挑战,它首次在 S-CBVR 中实现端到端训练。SSAN 基于两个新提出的视频检索模块:(1) 高效的自监督关键帧提取 (SKE) 模块,用于减少冗余的帧特征;(2) 鲁棒的相似模式检测 (SPD) 模块,用于时间对齐。与均匀抽帧相比,SKE 不仅节省了特征存储和检索时间,而且在仅引入有限额外计算的情况下保持了相当的准确率。在时间对齐方面,SPD 比现有深度学习方法更准确、更高效地定位相似片段。此外,我们将 SSAN 与 SKE、SPD 联合训练,实现端到端的性能提升;这两个关键模块也可以有效地嵌入其他视频检索流程并带来可观的性能增益。公开数据集上的实验结果表明,与现有方法相比,SSAN 在获得更高对齐精度的同时,降低了存储和在线查询的计算成本。
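Temporal alignment in segment-level retrieval is often built on a frame-by-frame similarity matrix between the query and reference videos, in which matching segments appear as high-scoring near-diagonal runs. The sketch below computes such a matrix from per-frame embeddings and greedily extracts one aligned segment; it is a generic illustration of the idea, not the paper's SPD module.

```python
import numpy as np

def similarity_matrix(query_feats: np.ndarray, ref_feats: np.ndarray) -> np.ndarray:
    """Cosine similarity between every query frame and every reference frame."""
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    r = ref_feats / np.linalg.norm(ref_feats, axis=1, keepdims=True)
    return q @ r.T  # shape: (num_query_frames, num_ref_frames)

def best_diagonal_segment(sim: np.ndarray, min_len: int = 5, thresh: float = 0.7):
    """Greedy search for the highest-scoring diagonal run of similar frames."""
    best = (0.0, None)
    n_q, n_r = sim.shape
    for offset in range(-n_q + 1, n_r):          # each diagonal = one temporal offset
        diag = np.diagonal(sim, offset=offset)
        hits = diag >= thresh
        start, score = None, 0.0                 # score contiguous runs along this diagonal
        for i, hit in enumerate(np.append(hits, False)):
            if hit:
                start = i if start is None else start
                score += diag[i]
            elif start is not None:
                if i - start >= min_len and score > best[0]:
                    q0 = max(0, -offset) + start
                    r0 = max(0, offset) + start
                    best = (score, (q0, q0 + (i - start), r0, r0 + (i - start)))
                start, score = None, 0.0
    return best[1]  # (query_start, query_end, ref_start, ref_end) or None
```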
Dense 2D-3D Indoor Prediction with Sound via Aligned Cross-Modal Distillation
results: 该论文构建了一个新的数据集 DAPS (Dense Auditory Prediction of Surroundings),首次利用声音解决室内稠密预测问题,包括基于声音的深度估计、语义分割和 3D 场景重建等任务。在不同的指标和骨干架构下,所提出的蒸馏框架均一致地取得了最先进的性能。Abstract
Sound can convey significant information for spatial reasoning in our daily lives. To endow deep networks with such ability, we address the challenge of dense indoor prediction with sound in both 2D and 3D via cross-modal knowledge distillation. In this work, we propose a Spatial Alignment via Matching (SAM) distillation framework that elicits local correspondence between the two modalities in vision-to-audio knowledge transfer. SAM integrates audio features with visually coherent learnable spatial embeddings to resolve inconsistencies in multiple layers of a student model. Our approach does not rely on a specific input representation, allowing for flexibility in the input shapes or dimensions without performance degradation. With a newly curated benchmark named Dense Auditory Prediction of Surroundings (DAPS), we are the first to tackle dense indoor prediction of omnidirectional surroundings in both 2D and 3D with audio observations. Specifically, for audio-based depth estimation, semantic segmentation, and challenging 3D scene reconstruction, the proposed distillation framework consistently achieves state-of-the-art performance across various metrics and backbone architectures.
摘要
在日常生活中,声音可以为空间推理传递重要信息。为了让深度网络具备这种能力,我们通过跨模态知识蒸馏来解决基于声音的 2D 与 3D 室内稠密预测问题。在这项工作中,我们提出了一种基于匹配的空间对齐 (Spatial Alignment via Matching, SAM) 蒸馏框架,在视觉到音频的知识迁移中建立两种模态之间的局部对应关系。SAM 将音频特征与视觉上一致的可学习空间嵌入相结合,以解决学生模型多个层级中的不一致问题。我们的方法不依赖特定的输入表示,因而可以灵活适应不同的输入形状或维度而不损失性能。借助新构建的基准数据集 DAPS (Dense Auditory Prediction of Surroundings),我们首次利用音频观测在 2D 和 3D 上对全向室内环境进行稠密预测。具体而言,在基于音频的深度估计、语义分割以及具有挑战性的 3D 场景重建任务上,所提出的蒸馏框架在各种指标和骨干架构下均一致地取得了最先进的性能。
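Cross-modal distillation of this kind usually trains an audio "student" to reproduce the spatial features of a frozen visual "teacher". The sketch below shows that generic training objective (a feature-matching loss plus the task loss); SAM's learnable spatial embeddings and matching mechanism are simplified away, and the module names are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_step(audio_student: nn.Module,
                      visual_teacher: nn.Module,
                      task_head: nn.Module,
                      audio: torch.Tensor,
                      image: torch.Tensor,
                      target: torch.Tensor,
                      optimizer: torch.optim.Optimizer,
                      alpha: float = 1.0):
    """One training step: match student audio features to frozen teacher visual features."""
    with torch.no_grad():
        teacher_feats = visual_teacher(image)   # (B, C, H, W) spatial features

    student_feats = audio_student(audio)        # assumed same shape by construction
    pred = task_head(student_feats)             # e.g. per-pixel depth

    distill_loss = F.mse_loss(student_feats, teacher_feats)
    task_loss = F.l1_loss(pred, target)
    loss = task_loss + alpha * distill_loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```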
results: 我们提出的模型在 VQA-Med 2019 测试集上取得了 60% 的准确率,与其他最先进的 Med-VQA 模型结果相当。Abstract
Medical visual question answering (Med-VQA) is a machine learning task that aims to create a system that can answer natural language questions based on given medical images. Although there has been rapid progress on the general VQA task, less progress has been made on Med-VQA due to the lack of large-scale annotated datasets. In this paper, we present domain-specific pre-training strategies, including a novel contrastive learning pretraining method, to mitigate the problem of small datasets for the Med-VQA task. We find that the model benefits from components that use fewer parameters. We also evaluate and discuss the model's visual reasoning using evidence verification techniques. Our proposed model obtained an accuracy of 60% on the VQA-Med 2019 test set, giving comparable results to other state-of-the-art Med-VQA models.
摘要
医学视觉问答 (Med-VQA) 是一种机器学习任务,旨在构建能够根据给定医学图像回答自然语言问题的系统。尽管通用 VQA 任务进展迅速,但由于缺乏大规模标注数据集,Med-VQA 的进展相对较少。在这篇论文中,我们提出了领域特定的预训练策略,包括一种新的对比学习预训练方法,以缓解 Med-VQA 任务中数据集规模小的问题。我们发现,使用更少参数的组件反而对模型有益。我们还利用证据验证技术评估并讨论了模型的视觉推理能力。我们提出的模型在 VQA-Med 2019 测试集上取得了 60% 的准确率,与其他最先进的 Med-VQA 模型结果相当。
results: 这篇论文的模型在CIFAR-10数据集上比Consistency Model和Denoising Score Matching更高效,这表明了这种框架的潜在力量。此外,模型还在MINIST和LSUN数据集上进行了更多的示例。代码可以在GitHub上下载。Abstract
We propose a new score-based model with one-step sampling. Previously, score-based models were burdened with heavy computations due to iterative sampling. For substituting the iterative process, we train a standalone generator to compress all the time steps with the gradient backpropagated from the score network. In order to produce meaningful gradients for the generator, the score network is trained to simultaneously match the real data distribution and mismatch the fake data distribution. This model has the following advantages: 1) For sampling, it generates a fake image with only one step forward. 2) For training, it only needs 10 diffusion steps. 3) Compared with the consistency model, it is free of the ill-posed problem caused by consistency loss. On the popular CIFAR-10 dataset, our model outperforms Consistency Model and Denoising Score Matching, which demonstrates the potential of the framework. We further provide more examples on the MNIST and LSUN datasets. The code is available on GitHub.
摘要
我们提出了一种新的基于分数的单步采样模型。此前,基于分数的模型因迭代采样而计算开销沉重。为了替代迭代过程,我们训练了一个独立的生成器,借助从分数网络反向传播的梯度将所有时间步压缩到一步。为了给生成器提供有意义的梯度,分数网络被训练为同时匹配真实数据分布并与伪造数据分布失配。该模型具有以下优点:1) 采样时只需一次前向计算即可生成图像;2) 训练时只需 10 个扩散步骤;3) 与一致性模型相比,它不存在由一致性损失引起的病态问题。在流行的 CIFAR-10 数据集上,我们的模型优于一致性模型和去噪分数匹配,显示了该框架的潜力。我们还在 MNIST 和 LSUN 数据集上提供了更多示例。代码已在 GitHub 上公开。
CaveSeg: Deep Semantic Segmentation and Scene Parsing for Autonomous Underwater Cave Exploration
results: 本研究通过在美国、墨西哥和西班牙的洞穴系统进行了 comprehensive benchmark 分析,证明了可以透过 CaveSeg 发展出高性能的深度视觉模型,并且在实际应用中实现了快速的 semantic scene parsing。Abstract
In this paper, we present CaveSeg - the first visual learning pipeline for semantic segmentation and scene parsing for AUV navigation inside underwater caves. We address the problem of scarce annotated training data by preparing a comprehensive dataset for semantic segmentation of underwater cave scenes. It contains pixel annotations for important navigation markers (e.g. caveline, arrows), obstacles (e.g. ground plain and overhead layers), scuba divers, and open areas for servoing. Through comprehensive benchmark analyses on cave systems in USA, Mexico, and Spain locations, we demonstrate that robust deep visual models can be developed based on CaveSeg for fast semantic scene parsing of underwater cave environments. In particular, we formulate a novel transformer-based model that is computationally light and offers near real-time execution in addition to achieving state-of-the-art performance. Finally, we explore the design choices and implications of semantic segmentation for visual servoing by AUVs inside underwater caves. The proposed model and benchmark dataset open up promising opportunities for future research in autonomous underwater cave exploration and mapping.
摘要
在这篇论文中,我们提出了 CaveSeg,首个面向 AUV 在水下洞穴内导航的语义分割与场景解析视觉学习流水线。针对标注训练数据稀缺的问题,我们构建了一个全面的水下洞穴场景语义分割数据集,其中包含重要导航标记(如洞穴导引绳、箭头)、障碍物(如底部平面和顶部岩层)、潜水员以及可供伺服的开阔区域的像素级标注。通过对美国、墨西哥和西班牙等地洞穴系统的全面基准分析,我们证明了可以基于 CaveSeg 构建鲁棒的深度视觉模型,用于水下洞穴环境的快速语义场景解析。特别地,我们提出了一种新的基于 Transformer 的模型,其计算量小、可接近实时运行,同时达到了最先进的性能。最后,我们探讨了语义分割用于 AUV 在水下洞穴内视觉伺服的设计选择及其意义。所提出的模型和基准数据集为未来自主水下洞穴探索与建图研究开辟了有前景的方向。
Light Field Diffusion for Single-View Novel View Synthesis
methods: 我们使用Light Field Diffusion(LFD)模型,这是一种基于扩散的增强模型,在扩散过程中将摄像头视角信息转换为光场编码,并与参考图像相结合。这种设计引入了本地像素级别的约束,从而促进了多视图一致性。
results: 我们的 LFD 可以高效地生成高质量图像,并在复杂区域中保持更好的 3D 一致性。我们的方法生成的图像质量高于基于 NeRF 的模型;与其他基于扩散的模型相比,样本质量相当,而模型规模仅为其三分之一。Abstract
Single-view novel view synthesis, the task of generating images from new viewpoints based on a single reference image, is an important but challenging task in computer vision. Recently, Denoising Diffusion Probabilistic Model (DDPM) has become popular in this area due to its strong ability to generate high-fidelity images. However, current diffusion-based methods directly rely on camera pose matrices as viewing conditions, globally and implicitly introducing 3D constraints. These methods may suffer from inconsistency among generated images from different perspectives, especially in regions with intricate textures and structures. In this work, we present Light Field Diffusion (LFD), a conditional diffusion-based model for single-view novel view synthesis. Unlike previous methods that employ camera pose matrices, LFD transforms the camera view information into light field encoding and combines it with the reference image. This design introduces local pixel-wise constraints within the diffusion models, thereby encouraging better multi-view consistency. Experiments on several datasets show that our LFD can efficiently generate high-fidelity images and maintain better 3D consistency even in intricate regions. Our method can generate images with higher quality than NeRF-based models, and we obtain sample quality similar to other diffusion-based models but with only one-third of the model size.
摘要
单视角新视角合成,即基于单张参考图像生成新视点下的图像,是计算机视觉中一项重要而富有挑战性的任务。近来,去噪扩散概率模型 (DDPM) 凭借其生成高保真图像的强大能力在该领域广受欢迎。然而,现有基于扩散的方法直接以相机位姿矩阵作为视角条件,以全局且隐式的方式引入 3D 约束。这些方法在不同视角生成的图像之间可能出现不一致,尤其是在纹理和结构复杂的区域。在这项工作中,我们提出了光场扩散 (Light Field Diffusion, LFD),一种用于单视角新视角合成的条件扩散模型。与以往使用相机位姿矩阵的方法不同,LFD 将相机视角信息转换为光场编码,并将其与参考图像相结合。这一设计在扩散模型中引入了局部的像素级约束,从而促进更好的多视角一致性。在多个数据集上的实验表明,LFD 能够高效地生成高保真图像,并在复杂区域中保持更好的 3D 一致性。我们的方法生成的图像质量高于基于 NeRF 的模型;与其他基于扩散的模型相比,样本质量相当,而模型规模仅为其三分之一。
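A per-pixel light-field style encoding of the kind mentioned above can be built from the camera intrinsics and pose: each pixel receives the origin and direction of its viewing ray, which a diffusion model can consume as extra conditioning channels. The sketch below is a generic ray-map construction under the usual pinhole model; the exact encoding used by LFD may differ.

```python
import numpy as np

def ray_encoding(H: int, W: int, K: np.ndarray, cam_to_world: np.ndarray) -> np.ndarray:
    """Per-pixel ray origins and directions in world coordinates, shape (H, W, 6).

    K:            3x3 pinhole intrinsics.
    cam_to_world: 4x4 camera-to-world pose matrix.
    """
    i, j = np.meshgrid(np.arange(W), np.arange(H))            # pixel grid
    # back-project pixel centres to camera-space viewing directions
    dirs_cam = np.stack([(i + 0.5 - K[0, 2]) / K[0, 0],
                         (j + 0.5 - K[1, 2]) / K[1, 1],
                         np.ones_like(i, dtype=np.float64)], axis=-1)
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    dirs_world = dirs_cam @ R.T
    dirs_world /= np.linalg.norm(dirs_world, axis=-1, keepdims=True)
    origins = np.broadcast_to(t, dirs_world.shape)             # same origin for every pixel
    return np.concatenate([origins, dirs_world], axis=-1)      # conditioning channels
```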
Conformalized Multimodal Uncertainty Regression and Reasoning
results: simulations 结果显示,在我们的框架中,不确定度估计器适应了具有严重噪音、有限训练数据和有限预测模型大小的问题。此外,我们开发了一个理解框架,利用这些可靠的不确定度估计器,并与光流基于的理解来提高预测精度。因此,通过适当地考虑数据驱动学习中的预测不确定性,并透过规律基于的理解来关闭预测模型的估计loop,我们的方法在所有这些问题上显著超越了传统的深度学习方法,实际上降低预测错误的比例为2-3倍。Abstract
This paper introduces a lightweight uncertainty estimator capable of predicting multimodal (disjoint) uncertainty bounds by integrating conformal prediction with a deep-learning regressor. We specifically discuss its application for visual odometry (VO), where environmental features such as flying domain symmetries and sensor measurements under ambiguities and occlusion can result in multimodal uncertainties. Our simulation results show that uncertainty estimates in our framework adapt sample-wise against challenging operating conditions such as pronounced noise, limited training data, and limited parametric size of the prediction model. We also develop a reasoning framework that leverages these robust uncertainty estimates and incorporates optical flow-based reasoning to improve prediction accuracy. Thus, by appropriately accounting for predictive uncertainties of data-driven learning and closing their estimation loop via rule-based reasoning, our methodology consistently surpasses conventional deep learning approaches on all these challenging scenarios--pronounced noise, limited training data, and limited model size--reducing the prediction error by 2-3x.
摘要
这篇论文提出了一种轻量级的不确定性估计器,通过将保形预测 (conformal prediction) 与深度学习回归器相结合,来预测多峰(互不相交)的不确定性区间。我们着重讨论了它在视觉里程计 (VO) 中的应用:飞行域对称性等环境特征,以及存在歧义和遮挡的传感器测量,都可能导致多峰不确定性。我们的仿真结果表明,在强噪声、训练数据有限以及预测模型参数规模受限等苛刻运行条件下,该框架中的不确定性估计能够逐样本地自适应。我们还开发了一个推理框架,利用这些鲁棒的不确定性估计并结合基于光流的推理来提高预测精度。因此,通过恰当地考虑数据驱动学习的预测不确定性,并借助基于规则的推理闭合其估计回路,我们的方法在上述所有苛刻场景(强噪声、有限训练数据、有限模型规模)中都稳定地超越了传统深度学习方法,将预测误差降低了 2-3 倍。
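Conformal prediction in its simplest split form, which frameworks like the one above build on, calibrates a per-sample interval from residuals on a held-out calibration set so that the interval covers the truth with probability at least 1 - alpha under exchangeability. A minimal regression version is sketched below; the multimodal (disjoint) intervals described in the paper require additional machinery not shown here.

```python
import numpy as np

def split_conformal_interval(model, X_calib, y_calib, X_test, alpha: float = 0.1):
    """Split conformal prediction intervals for an already-trained regression model."""
    # Nonconformity scores: absolute residuals on the calibration set.
    scores = np.abs(y_calib - model.predict(X_calib))
    n = len(scores)
    # Finite-sample corrected quantile level: ceil((n+1)(1-alpha)) / n.
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q_hat = np.quantile(scores, q_level, method="higher")
    preds = model.predict(X_test)
    return preds - q_hat, preds + q_hat   # lower and upper bound per test point
```

With alpha = 0.1, for example, the returned band targets at least 90% coverage of the true values on exchangeable test data.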
Controllable Dynamic Appearance for Neural 3D Portraits
results: 仅使用智能手机拍摄的短视频进行训练,即可在显式控制头部姿态和表情的条件下,实现肖像场景的高质量自由视角合成,并呈现逼真的光照效果。Abstract
Recent advances in Neural Radiance Fields (NeRFs) have made it possible to reconstruct and reanimate dynamic portrait scenes with control over head-pose, facial expressions and viewing direction. However, training such models assumes photometric consistency over the deformed region e.g. the face must be evenly lit as it deforms with changing head-pose and facial expression. Such photometric consistency across frames of a video is hard to maintain, even in studio environments, thus making the created reanimatable neural portraits prone to artifacts during reanimation. In this work, we propose CoDyNeRF, a system that enables the creation of fully controllable 3D portraits in real-world capture conditions. CoDyNeRF learns to approximate illumination dependent effects via a dynamic appearance model in the canonical space that is conditioned on predicted surface normals and the facial expressions and head-pose deformations. The surface normals prediction is guided using 3DMM normals that act as a coarse prior for the normals of the human head, where direct prediction of normals is hard due to rigid and non-rigid deformations induced by head-pose and facial expression changes. Using only a smartphone-captured short video of a subject for training, we demonstrate the effectiveness of our method on free view synthesis of a portrait scene with explicit head pose and expression controls, and realistic lighting effects. The project page can be found here: http://shahrukhathar.github.io/2023/08/22/CoDyNeRF.html
摘要
神经辐射场 (NeRF) 的最新进展使得重建并重新驱动动态肖像场景成为可能,并可控制头部姿态、面部表情和观察方向。然而,训练此类模型通常假设形变区域满足光度一致性,例如脸部在随头部姿态和表情变化而形变时必须保持均匀照明。即便在摄影棚环境中,这种跨视频帧的光度一致性也难以维持,因此所构建的可重驱动神经肖像在重新驱动时容易出现伪影。在这项工作中,我们提出了 CoDyNeRF,一个能够在真实拍摄条件下创建完全可控 3D 肖像的系统。CoDyNeRF 通过规范空间中的动态外观模型来近似与光照相关的效应,该模型以预测的表面法线以及面部表情和头部姿态形变为条件。表面法线的预测由 3DMM 法线引导,后者为人头法线提供了粗略先验;由于头部姿态和表情变化会引起刚性与非刚性形变,直接预测法线十分困难。仅使用智能手机拍摄的受试者短视频进行训练,我们展示了该方法在显式控制头部姿态和表情并呈现逼真光照效果的肖像场景自由视角合成上的有效性。项目页面:http://shahrukhathar.github.io/2023/08/22/CoDyNeRF.html
STARNet: Sensor Trustworthiness and Anomaly Recognition via Approximated Likelihood Regret for Robust Edge Autonomy
paper_authors: Nastaran Darabi, Sina Tayebati, Sureshkumar S., Sathya Ravi, Theja Tulabandhula, Amit R. Trivedi
for: This paper is written to address the reliability concerns of complex sensors such as LiDAR and camera sensors in autonomous robotics, and to improve the prediction accuracy of deep learning models by detecting untrustworthy sensor streams.
methods: STARNet, a Sensor Trustworthiness and Anomaly Recognition Network, is used to detect untrustworthy sensor streams. STARNet employs the concept of approximated likelihood regret, a gradient-free framework tailored for low-complexity hardware.
results: STARNet enhances prediction accuracy by approximately 10% by filtering out untrustworthy sensor streams in unimodal and multimodal settings, especially in addressing internal sensor failures such as cross-sensor interference and crosstalk.Abstract
Complex sensors such as LiDAR, RADAR, and event cameras have proliferated in autonomous robotics to enhance perception and understanding of the environment. Meanwhile, these sensors are also vulnerable to diverse failure mechanisms that can intricately interact with their operation environment. In parallel, the limited availability of training data on complex sensors also affects the reliability of their deep learning-based prediction flow, where their prediction models can fail to generalize to environments not adequately captured in the training set. To address these reliability concerns, this paper introduces STARNet, a Sensor Trustworthiness and Anomaly Recognition Network designed to detect untrustworthy sensor streams that may arise from sensor malfunctions and/or challenging environments. We specifically benchmark STARNet on LiDAR and camera data. STARNet employs the concept of approximated likelihood regret, a gradient-free framework tailored for low-complexity hardware, especially those with only fixed-point precision capabilities. Through extensive simulations, we demonstrate the efficacy of STARNet in detecting untrustworthy sensor streams in unimodal and multimodal settings. In particular, the network shows superior performance in addressing internal sensor failures, such as cross-sensor interference and crosstalk. In diverse test scenarios involving adverse weather and sensor malfunctions, we show that STARNet enhances prediction accuracy by approximately 10% by filtering out untrustworthy sensor streams. STARNet is publicly available at \url{https://github.com/sinatayebati/STARNet}.
摘要
LiDAR、RADAR 和事件相机等复杂传感器已在自主机器人中广泛应用,以增强对环境的感知和理解。与此同时,这些传感器也容易受到多种失效机制的影响,而这些机制又会与其运行环境发生复杂的相互作用。此外,复杂传感器训练数据的有限性也会影响其基于深度学习的预测流程的可靠性:当环境未被训练集充分覆盖时,预测模型可能无法泛化。为了解决这些可靠性问题,本文提出了 STARNet,一种传感器可信度与异常识别网络,用于检测由传感器故障和/或恶劣环境引起的不可信传感器数据流。我们特别在 LiDAR 和相机数据上对 STARNet 进行了基准测试。STARNet 采用近似似然后悔 (approximated likelihood regret) 的思想,这是一种面向低复杂度硬件(尤其是仅支持定点精度的硬件)的无梯度框架。通过大量仿真,我们展示了 STARNet 在单模态和多模态设置下检测不可信传感器流的有效性。特别地,该网络在处理内部传感器故障(如跨传感器干扰和串扰)方面表现出色。在涉及恶劣天气和传感器故障的多种测试场景中,我们表明 STARNet 通过滤除不可信的传感器流,将预测精度提高了约 10%。STARNet 已公开发布于 \url{https://github.com/sinatayebati/STARNet}。
PPD: A New Valet Parking Pedestrian Fisheye Dataset for Autonomous Driving
results: 实验证明了我们的新的数据增强方法的效果,并证明了数据集的非常普遍化。Abstract
Pedestrian detection under valet parking scenarios is fundamental for autonomous driving. However, the presence of pedestrians can be manifested in a variety of ways and postures under imperfect ambient conditions, which can adversely affect detection performance. Furthermore, models trained on public datasets that include pedestrians generally provide suboptimal outcomes for these valet parking scenarios. In this paper, we present the Parking Pedestrian Dataset (PPD), a large-scale fisheye dataset to support research dealing with real-world pedestrians, especially with occlusions and diverse postures. PPD consists of several distinctive types of pedestrians captured with fisheye cameras. Additionally, we present a pedestrian detection baseline on the PPD dataset, and introduce two data augmentation techniques to improve the baseline by enhancing the diversity of the original dataset. Extensive experiments validate the effectiveness of our novel data augmentation approaches over baselines and the dataset's exceptional generalizability.
摘要
代客泊车场景下的行人检测是自动驾驶的基础任务。然而,在不理想的环境条件下,行人可能以多种形态和姿势出现,这会对检测性能产生不利影响。此外,在包含行人的公开数据集上训练的模型,在代客泊车场景中通常只能给出次优的结果。在这篇论文中,我们提出了泊车行人数据集 (PPD),一个大规模鱼眼数据集,用于支持针对真实世界行人(尤其是存在遮挡和多样姿势)的研究。PPD 包含由鱼眼相机拍摄的多种类型的行人。此外,我们在 PPD 数据集上给出了一个行人检测基线,并引入两种数据增强技术,通过提升原始数据集的多样性来改进该基线。大量实验验证了我们新的数据增强方法相对于基线的有效性,以及该数据集出色的泛化能力。
COSE: A Consistency-Sensitivity Metric for Saliency on Image Classification
results: 研究发现,尽管显著性方法通常被认为与模型结构无关,但大多数方法对基于 Transformer 的模型的解释效果优于基于卷积的模型。此外,GradCAM 在 COSE 指标上表现最佳,但在细粒度数据集上缺乏变化性。只有在一致性与敏感性之间取得平衡,显著性图才能忠实地反映模型行为。Abstract
We present a set of metrics that utilize vision priors to effectively assess the performance of saliency methods on image classification tasks. To understand behavior in deep learning models, many methods provide visual saliency maps emphasizing image regions that most contribute to a model prediction. However, there is limited work on analyzing the reliability of saliency methods in explaining model decisions. We propose the metric COnsistency-SEnsitivity (COSE) that quantifies the equivariant and invariant properties of visual model explanations using simple data augmentations. Through our metrics, we show that although saliency methods are thought to be architecture-independent, most methods could better explain transformer-based models over convolutional-based models. In addition, GradCAM was found to outperform other methods in terms of COSE but was shown to have limitations such as lack of variability for fine-grained datasets. The duality between consistency and sensitivity allow the analysis of saliency methods from different angles. Ultimately, we find that it is important to balance these two metrics for a saliency map to faithfully show model behavior.
摘要
我们提出了一组利用视觉先验的度量,用于有效评估显著性方法在图像分类任务上的表现。为了理解深度学习模型的行为,许多方法会给出视觉显著性图,突出对模型预测贡献最大的图像区域。然而,关于显著性方法在解释模型决策方面可靠性的分析工作仍然有限。我们提出了一致性-敏感性 (COnsistency-SEnsitivity, COSE) 度量,利用简单的数据增强来量化视觉模型解释的等变与不变性质。通过这些度量我们发现,尽管显著性方法通常被认为与模型结构无关,但大多数方法对基于 Transformer 的模型的解释效果优于基于卷积的模型。此外,GradCAM 在 COSE 上优于其他方法,但也存在局限,例如在细粒度数据集上缺乏变化性。一致性与敏感性之间的对偶关系使我们能够从不同角度分析显著性方法。最终我们发现,要让显著性图忠实地反映模型行为,需要在这两项度量之间取得平衡。
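In spirit, a consistency score checks that the saliency map transforms the same way as the image under label-preserving augmentations, while a sensitivity score checks that the map changes when the input changes. The sketch below measures a simple flip-equivariance consistency and a noise sensitivity via cosine similarity of attribution maps; COSE's exact definitions and augmentation set are not reproduced here, and `saliency_fn` is an assumed callable returning an (H, W) map.

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def flip_consistency(saliency_fn, image: np.ndarray) -> float:
    """Equivariance check: saliency of a flipped image should be the flipped saliency."""
    sal = saliency_fn(image)                          # (H, W) attribution map
    sal_of_flipped = saliency_fn(image[:, ::-1])      # horizontal flip of the input
    return cosine_sim(sal[:, ::-1], sal_of_flipped)   # 1.0 means perfectly consistent

def noise_sensitivity(saliency_fn, image: np.ndarray, sigma: float = 0.1) -> float:
    """Sensitivity check: a perturbed input should yield a (somewhat) different map."""
    noisy = image + sigma * np.random.randn(*image.shape)
    return 1.0 - cosine_sim(saliency_fn(image), saliency_fn(noisy))
```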
results: 我们的RMT在多种计算机视觉任务中表现出色,例如在ImageNet-1k上达到84.1%的Top1-acc,使用了仅4.5G FLOPs。此外,RMT在下游任务中,如物体检测、实例分割和semantic segmentation中也表现出优异。Abstract
Transformer first appears in the field of natural language processing and is later migrated to the computer vision domain, where it demonstrates excellent performance in vision tasks. However, recently, Retentive Network (RetNet) has emerged as an architecture with the potential to replace Transformer, attracting widespread attention in the NLP community. Therefore, we raise the question of whether transferring RetNet's idea to vision can also bring outstanding performance to vision tasks. To address this, we combine RetNet and Transformer to propose RMT. Inspired by RetNet, RMT introduces explicit decay into the vision backbone, bringing prior knowledge related to spatial distances to the vision model. This distance-related spatial prior allows for explicit control of the range of tokens that each token can attend to. Additionally, to reduce the computational cost of global modeling, we decompose this modeling process along the two coordinate axes of the image. Abundant experiments have demonstrated that our RMT exhibits exceptional performance across various computer vision tasks. For example, RMT achieves 84.1% Top1-acc on ImageNet-1k using merely 4.5G FLOPs. To the best of our knowledge, among all models, RMT achieves the highest Top1-acc when models are of similar size and trained with the same strategy. Moreover, RMT significantly outperforms existing vision backbones in downstream tasks such as object detection, instance segmentation, and semantic segmentation. Our work is still in progress.
摘要
Transformer 最初出现在自然语言处理领域,随后被迁移到计算机视觉领域,并在视觉任务中表现出色。然而,最近出现的 Retentive Network (RetNet) 被认为有潜力取代 Transformer,在 NLP 社区引起了广泛关注。因此我们提出一个问题:将 RetNet 的思想迁移到视觉领域,能否同样为视觉任务带来出色的表现?为此,我们结合 RetNet 与 Transformer,提出了 RMT。受 RetNet 启发,RMT 在视觉骨干中引入了显式衰减,为视觉模型带来与空间距离相关的先验知识。这种与距离相关的空间先验可以显式控制每个 token 所能关注的 token 范围。此外,为了降低全局建模的计算开销,我们将该建模过程沿图像的两个坐标轴进行分解。大量实验表明,RMT 在多种计算机视觉任务中表现出色。例如,RMT 仅用 4.5G FLOPs 就在 ImageNet-1k 上达到 84.1% 的 Top1 准确率。据我们所知,在模型规模相近且采用相同训练策略的情况下,RMT 取得了最高的 Top1 准确率。此外,在目标检测、实例分割和语义分割等下游任务中,RMT 也显著优于现有的视觉骨干。我们的工作仍在进行中。
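The "explicit decay" idea can be pictured as an attention bias that down-weights token pairs by their spatial distance, so that attention falls off with how far apart two patches are on the image grid. The sketch below adds such a Manhattan-distance decay to standard attention logits; it illustrates the general mechanism only, not RMT's exact formulation or its axis-wise decomposition.

```python
import torch
import torch.nn.functional as F

def spatial_decay_attention(q, k, v, grid_h: int, grid_w: int, gamma: float = 0.9):
    """Attention over a grid of tokens with an explicit distance-based decay.

    q, k, v: (B, N, D) with N == grid_h * grid_w tokens in row-major order.
    The bias log(gamma) * distance makes attention weights decay as gamma ** distance.
    """
    ys, xs = torch.meshgrid(torch.arange(grid_h), torch.arange(grid_w), indexing="ij")
    coords = torch.stack([ys.flatten(), xs.flatten()], dim=-1).float()   # (N, 2)
    dist = torch.cdist(coords, coords, p=1)                              # Manhattan distance
    decay_bias = torch.log(torch.tensor(gamma)) * dist                   # farther -> lower score

    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)              # (B, N, N)
    attn = F.softmax(scores + decay_bias, dim=-1)
    return attn @ v
```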
SEMPART: Self-supervised Multi-resolution Partitioning of Image Semantics
results: 本文提出了一种名为SEMPART的方法,可以同时确定图像的粗细分割和细分割,并且可以快速生成高质量的mask。Abstract
Accurately determining salient regions of an image is challenging when labeled data is scarce. DINO-based self-supervised approaches have recently leveraged meaningful image semantics captured by patch-wise features for locating foreground objects. Recent methods have also incorporated intuitive priors and demonstrated value in unsupervised methods for object partitioning. In this paper, we propose SEMPART, which jointly infers coarse and fine bi-partitions over an image's DINO-based semantic graph. Furthermore, SEMPART preserves fine boundary details using graph-driven regularization and successfully distills the coarse mask semantics into the fine mask. Our salient object detection and single object localization findings suggest that SEMPART produces high-quality masks rapidly without additional post-processing and benefits from co-optimizing the coarse and fine branches.
摘要
在标注数据稀缺的情况下,准确确定图像的显著区域颇具挑战性。基于 DINO 的自监督方法最近利用逐块特征所蕴含的图像语义来定位前景目标,近期的一些方法还引入了直观的先验,在无监督目标划分中展现了价值。在这篇论文中,我们提出了 SEMPART,它在图像的 DINO 语义图上联合推断粗粒度和细粒度的双向划分。此外,SEMPART 通过图驱动的正则化保留精细的边界细节,并成功地将粗掩码的语义蒸馏到细掩码中。我们在显著目标检测和单目标定位上的结果表明,SEMPART 无需额外后处理即可快速生成高质量掩码,并受益于粗、细两个分支的联合优化。
results: 该研究的目标是确保社区成员的公平参与,并负责使用他们的数据,以便在IoE中提供安全、可靠和可再生的能源服务。Abstract
This paper plans to develop an Equitable and Responsible AI framework with enabling techniques and algorithms for the Internet of Energy (IoE), in short, RAI4IoE. The energy sector is going through substantial changes fueled by two key drivers: building a zero-carbon energy sector and the digital transformation of the energy infrastructure. We expect to see the convergence of these two drivers resulting in the IoE, where renewable distributed energy resources (DERs), such as electric cars, storage batteries, wind turbines and photovoltaics (PV), can be connected and integrated for reliable energy distribution by leveraging advanced 5G-6G networks and AI technology. This allows DER owners as prosumers to participate in the energy market and derive economic incentives. DERs are inherently asset-driven and face equitable challenges (i.e., fair, diverse and inclusive). Without equitable access, privileged individuals, groups and organizations can participate and benefit at the cost of disadvantaged groups. The real-time management of DER resources not only brings out the equity problem to the IoE, it also collects highly sensitive location, time, activity dependent data, which requires to be handled responsibly (e.g., privacy, security and safety), for AI-enhanced predictions, optimization and prioritization services, and automated management of flexible resources. The vision of our project is to ensure equitable participation of the community members and responsible use of their data in IoE so that it could reap the benefits of advances in AI to provide safe, reliable and sustainable energy services.
摘要
这份研究报告计划开发一个公平和负责任的人工智能框架(RAI4IoE),用于互联网能源(IoE)领域。能源领域正在经历重大变革,这两个关键驱动因素:建立零碳素能源产业和能源基础设施的数字变革。我们预计这两个驱动因素会相互交集,导致IoE的出现,其中可再生分布式能源资源(DERs),如电动车、存储电池、风力发电和太阳能电池(PV),可以相互连接和集成,以实现可靠的能源分布,通过利用先进的5G-6G网络和人工智能技术。这允许DER所有者作为生产者和消费者(prosumers)参与能源市场,从而获得经济收益。DERs本身具有资产驱动的特点,面临公平挑战(例如,公平、多样化和包容)。如果没有公平访问,特权个人、组织和集团可以参与和获得利益,而受折磨的群体则被排除在外。IoE实时管理DER资源不仅抛出了公平问题,还收集了高度敏感的地点、时间、活动依赖数据,需要负责任地处理(例如,隐私、安全和安全),以便通过人工智能技术提供了预测、优化和优先级服务,自动管理灵活资源。我们的项目视图是确保社区成员公平参与IoE,并负责使用他们的数据,以便IoE可以通过人工智能技术的进步获得安全、可靠和可再生的能源服务。
LLM Guided Inductive Inference for Solving Compositional Problems
methods: 我们提出了一种方法,即 Recursion based extensible LLM(REBEL),它通过自动理解技术如动态规划和前进链接策略来处理开放世界、深度理解任务。REBEL使用自然语言描述来指定工具,并使用这些工具进行递归问题分解和外部工具使用。
results: 我们在一组需要深度嵌套使用外部工具的问题上示出了REBEL的能力,并在一个组合和对话性的 Setting中进行了证明。Abstract
While large language models (LLMs) have demonstrated impressive performance in question-answering tasks, their performance is limited when the questions require knowledge that is not included in the model's training data and can only be acquired through direct observation or interaction with the real world. Existing methods decompose reasoning tasks through the use of modules invoked sequentially, limiting their ability to answer deep reasoning tasks. We introduce a method, Recursion based extensible LLM (REBEL), which handles open-world, deep reasoning tasks by employing automated reasoning techniques like dynamic planning and forward-chaining strategies. REBEL allows LLMs to reason via recursive problem decomposition and utilization of external tools. The tools that REBEL uses are specified only by natural language description. We further demonstrate REBEL capabilities on a set of problems that require a deeply nested use of external tools in a compositional and conversational setting.
摘要
尽管大型语言模型 (LLM) 在问答任务中表现出色,但当问题所需的知识不包含在模型训练数据中、只能通过直接观察或与真实世界交互获得时,其表现就会受限。现有方法通过按顺序调用模块来分解推理任务,限制了它们回答深度推理问题的能力。我们提出了一种方法——基于递归的可扩展 LLM (REBEL),它利用动态规划和前向链接策略等自动推理技术来处理开放世界的深度推理任务。REBEL 使 LLM 能够通过递归的问题分解并借助外部工具进行推理,而这些工具仅通过自然语言描述来指定。我们进一步在一组需要以组合式、对话式方式深度嵌套使用外部工具的问题上展示了 REBEL 的能力。
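Recursive problem decomposition of the kind described above can be sketched as a small control loop: ask the model to either answer, call a tool, or split the question into sub-questions, then recurse. The code below is only an illustration of that control flow; `ask_llm` and `run_tool` are hypothetical stand-ins for an LLM API and for natural-language-described tools, not part of REBEL itself.

```python
def ask_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")  # placeholder

def run_tool(tool_name: str, tool_input: str) -> str:
    raise NotImplementedError("dispatch to an external tool here")  # placeholder

def solve(question: str, depth: int = 0, max_depth: int = 5) -> str:
    """Recursively decompose a question until each piece is tool- or LLM-answerable."""
    if depth >= max_depth:
        return ask_llm(f"Answer directly: {question}")

    plan = ask_llm(
        "Either answer the question directly, or reply with\n"
        "SUBQUESTIONS: <one per line> or TOOL: <name> | <input>\n"
        f"Question: {question}"
    )
    if plan.startswith("TOOL:"):
        name, tool_input = plan[len("TOOL:"):].split("|", 1)
        observation = run_tool(name.strip(), tool_input.strip())
        return solve(f"{question}\nObservation: {observation}", depth + 1, max_depth)
    if plan.startswith("SUBQUESTIONS:"):
        subs = [s for s in plan.splitlines()[1:] if s.strip()]
        answers = [solve(s, depth + 1, max_depth) for s in subs]
        return ask_llm(f"Combine these partial answers for '{question}': {answers}")
    return plan  # the LLM answered directly
```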
Dr. FERMI: A Stochastic Distributionally Robust Fair Empirical Risk Minimization Framework
for: This paper aims to address the issue of fair machine learning models behaving unfairly on test data due to distribution shifts.
methods: The proposed method is based on distributionally robust optimization under $L_p$ norm uncertainty sets with respect to the Exponential Renyi Mutual Information (ERMI) as the measure of fairness violation. The method does not require knowledge of the causal graph and can be implemented in a stochastic fashion.
results: The proposed framework has been evaluated through extensive experiments on real datasets consisting of distribution shifts, and the results show that it performs well in terms of fairness and efficiency.Abstract
While training fair machine learning models has been studied extensively in recent years, most developed methods rely on the assumption that the training and test data have similar distributions. In the presence of distribution shifts, fair models may behave unfairly on test data. There have been some developments for fair learning robust to distribution shifts to address this shortcoming. However, most proposed solutions are based on the assumption of having access to the causal graph describing the interaction of different features. Moreover, existing algorithms require full access to data and cannot be used when small batches are used (stochastic/batch implementation). This paper proposes the first stochastic distributionally robust fairness framework with convergence guarantees that do not require knowledge of the causal graph. More specifically, we formulate the fair inference in the presence of the distribution shift as a distributionally robust optimization problem under $L_p$ norm uncertainty sets with respect to the Exponential Renyi Mutual Information (ERMI) as the measure of fairness violation. We then discuss how the proposed method can be implemented in a stochastic fashion. We have evaluated the presented framework's performance and efficiency through extensive experiments on real datasets consisting of distribution shifts.
摘要
尽管近年来公平机器学习模型的训练已得到广泛研究,但大多数现有方法都假设训练数据与测试数据具有相似的分布。当存在分布偏移时,公平的模型在测试数据上可能表现得并不公平。针对这一不足,已有一些面向分布偏移的鲁棒公平学习工作,但多数方案假设可以获得描述各特征之间相互作用的因果图;此外,现有算法需要访问全部数据,无法在小批量(随机/批次实现)情况下使用。本文提出了首个无需因果图知识、具有收敛保证的随机分布鲁棒公平性框架。具体而言,我们以指数 Renyi 互信息 (ERMI) 作为公平性违背的度量,将分布偏移下的公平推断形式化为 $L_p$ 范数不确定性集合下的分布鲁棒优化问题,并讨论了该方法的随机实现方式。我们在包含分布偏移的真实数据集上进行了大量实验,评估了所提框架的性能与效率。
results: 该论文的 FedNGMs 框架可以避免 neuron matching 框架如 Federated Matched Averaging 的缺点,并且可以适应数据不均衡、多个参与者和limited communication bandwidth 等问题。Abstract
Federated Learning (FL) addresses the need to create models based on proprietary data in such a way that multiple clients retain exclusive control over their data, while all benefit from improved model accuracy due to pooled resources. Recently proposed Neural Graphical Models (NGMs) are Probabilistic Graphical models that utilize the expressive power of neural networks to learn complex non-linear dependencies between the input features. They learn to capture the underlying data distribution and have efficient algorithms for inference and sampling. We develop a FL framework which maintains a global NGM model that learns the averaged information from the local NGM models while keeping the training data within the client's environment. Our design, FedNGMs, avoids the pitfalls and shortcomings of neuron matching frameworks like Federated Matched Averaging that suffers from model parameter explosion. Our global model size remains constant throughout the process. In the cases where clients have local variables that are not part of the combined global distribution, we propose a `Stitching' algorithm, which personalizes the global NGM models by merging the additional variables using the client's data. FedNGM is robust to data heterogeneity, large number of participants, and limited communication bandwidth.
摘要
联邦学习 (FL) 旨在基于专有数据构建模型,使多个客户端既能对各自数据保持独占控制,又能共同受益于资源汇聚带来的模型精度提升。最近提出的神经图模型 (NGM) 是一类概率图模型,利用神经网络的表达能力学习输入特征之间复杂的非线性依赖关系;它们能够刻画底层数据分布,并具有高效的推断与采样算法。我们开发了一个联邦学习框架,维护一个全局 NGM 模型,从各本地 NGM 模型中学习平均信息,同时让训练数据始终留在客户端环境内。我们的设计 FedNGMs 避免了诸如 Federated Matched Averaging 等神经元匹配框架的缺陷(例如模型参数爆炸),在整个过程中全局模型的规模保持不变。对于客户端存在不属于全局联合分布的本地变量的情况,我们提出了一种"缝合 (Stitching)"算法,利用客户端数据将这些额外变量并入全局 NGM 模型,从而实现个性化。FedNGM 对数据异质性、大量参与者以及有限的通信带宽都具有鲁棒性。
results: 研究发现,GPT-4 在游戏环境中的适应性有显著提高,能够更好地提问和发表人类化的回答。然而,模型在骗取和预测对手行动方面存在限制。研究还讨论了游戏开发、财政限制和非语言限制的问题。结果表明,虽然 GPT-4 表现出了较早模型的进步,但还有更多的发展空间,尤其是在塑造更人类化的 AI 模型。Abstract
In this research, we explore the efficacy and potential of Generative AI models, specifically focusing on their application in role-playing simulations exemplified through Spyfall, a renowned mafia-style game. By leveraging GPT-4's advanced capabilities, the study aimed to showcase the model's potential in understanding, decision-making, and interaction during game scenarios. Comparative analyses between GPT-4 and its predecessor, GPT-3.5-turbo, demonstrated GPT-4's enhanced adaptability to the game environment, with significant improvements in posing relevant questions and forming human-like responses. However, challenges such as the model;s limitations in bluffing and predicting opponent moves emerged. Reflections on game development, financial constraints, and non-verbal limitations of the study were also discussed. The findings suggest that while GPT-4 exhibits promising advancements over earlier models, there remains potential for further development, especially in instilling more human-like attributes in AI.
摘要
在这个研究中,我们探索了生成AI模型的效果和潜力,特别是在游戏角色扮演 simulations中的应用。通过利用GPT-4的高级功能,研究旨在表明模型在游戏场景中的理解、决策和互动的潜力。对比GPT-4和其前一代GPT-3.5-turbo,研究发现GPT-4在游戏环境中的适应性得到了显著提升,特别是在提问和表达人类化的问题方面。然而,模型在谎言和预测对手行动方面存在限制。研究还讨论了游戏开发、财务限制和非语言限制的问题。研究结果表明,虽然GPT-4在前一代模型之上具有显著的进步,但还有可能进一步发展,尤其是在具备更多人类特征的AI方面。
“It’s a Fair Game’’, or Is It? Examining How Users Navigate Disclosure Risks and Benefits When Using LLM-Based Conversational Agents
results: 研究发现,用户错误的心理模型以及系统设计中的暗黑模式限制了他们对隐私风险的认识和理解;同时,拟人化的交互方式促使用户披露更多敏感信息,使其在权衡取舍时更加困难。Abstract
The widespread use of Large Language Model (LLM)-based conversational agents (CAs), especially in high-stakes domains, raises many privacy concerns. Building ethical LLM-based CAs that respect user privacy requires an in-depth understanding of the privacy risks that concern users the most. However, existing research, primarily model-centered, does not provide insight into users' perspectives. To bridge this gap, we analyzed sensitive disclosures in real-world ChatGPT conversations and conducted semi-structured interviews with 19 LLM-based CA users. We found that users are constantly faced with trade-offs between privacy, utility, and convenience when using LLM-based CAs. However, users' erroneous mental models and the dark patterns in system design limited their awareness and comprehension of the privacy risks. Additionally, the human-like interactions encouraged more sensitive disclosures, which complicated users' ability to navigate the trade-offs. We discuss practical design guidelines and the needs for paradigmatic shifts to protect the privacy of LLM-based CA users.
摘要
基于大语言模型 (LLM) 的对话代理 (CA) 的广泛使用,特别是在高风险领域,引发了许多隐私问题。要构建尊重用户隐私、符合伦理的 LLM 对话代理,需要深入了解用户最关心的隐私风险。然而,现有研究主要以模型为中心,未能提供用户视角的洞见。为弥补这一空白,我们分析了真实 ChatGPT 对话中的敏感信息披露,并对 19 名 LLM 对话代理用户进行了半结构化访谈。我们发现,用户在使用 LLM 对话代理时不断面临隐私、效用与便利性之间的权衡;然而,用户错误的心理模型和系统设计中的暗黑模式限制了他们对隐私风险的认识和理解。此外,拟人化的交互促使用户披露更多敏感信息,使其更难把握这些权衡。我们讨论了保护 LLM 对话代理用户隐私所需的实用设计准则以及范式层面的转变。
methods: 这篇论文在以国际空间站 (ISS) 为目标、由合成数据生成的对接机动上,对多种卷积神经网络 (CNN) 骨干架构进行了基准测试,以获得相对位置和姿态估计。
results: 结果显示,借助 AI 可以将相对导航方案扩展到多类场景(例如不同的目标或光照条件),并实现高精度的位置与姿态估计,从而减少对激光雷达的依赖并显著降低成本。Abstract
Cameras are rapidly becoming the choice for on-board sensors towards space rendezvous due to their small form factor and inexpensive power, mass, and volume costs. When it comes to docking, however, they typically serve a secondary role, whereas the main work is done by active sensors such as lidar. This paper documents the development of a proposed AI-based (artificial intelligence) navigation algorithm intending to mature the use of on-board visible wavelength cameras as a main sensor for docking and on-orbit servicing (OOS), reducing the dependency on lidar and greatly reducing costs. Specifically, the use of AI enables the expansion of the relative navigation solution towards multiple classes of scenarios, e.g., in terms of targets or illumination conditions, which would otherwise have to be crafted on a case-by-case manner using classical image processing methods. Multiple convolutional neural network (CNN) backbone architectures are benchmarked on synthetically generated data of docking manoeuvres with the International Space Station (ISS), achieving position and attitude estimates close to 1% range-normalised and 1 deg, respectively. The integration of the solution with a physical prototype of the refuelling mechanism is validated in laboratory using a robotic arm to simulate a berthing procedure.
摘要
由于体积小、功耗低、质量和成本优势明显,相机正迅速成为空间交会任务中机载传感器的首选。然而在对接阶段,相机通常只扮演辅助角色,主要工作仍由激光雷达等主动传感器完成。这篇论文介绍了一种基于人工智能 (AI) 的导航算法的开发,旨在推动机载可见光相机成为对接与在轨服务 (OOS) 的主传感器,从而降低对激光雷达的依赖并大幅削减成本。具体而言,借助 AI 可以将相对导航方案扩展到多类场景(例如不同的目标或光照条件),而若采用经典图像处理方法,这些场景只能逐例手工设计。我们在以国际空间站 (ISS) 为目标、由合成数据生成的对接机动上,对多种卷积神经网络 (CNN) 骨干架构进行了基准测试,位置和姿态估计误差分别接近量程的 1% 和 1 度。最后,我们在实验室中使用机械臂模拟停靠过程,验证了该方案与加注机构物理样机的集成。
results: 实验结果表明,提案的修改可以明显提高数据表示和生成能力,使 VQVAEs 更适合各种应用。Abstract
We present a novel approach to enhance the capabilities of VQVAE models through the integration of an Attentive Residual Encoder (AREN) and a Residual Pixel Attention layer. The objective of our research is to improve the performance of VQVAE while maintaining practical parameter levels. The AREN encoder is designed to operate effectively at multiple levels, accommodating diverse architectural complexities. The key innovation is the integration of an inter-pixel auto-attention mechanism into the AREN encoder. This approach allows us to efficiently capture and utilize contextual information across latent vectors. Additionally, our models uses additional encoding levels to further enhance the model's representational power. Our attention layer employs a minimal parameter approach, ensuring that latent vectors are modified only when pertinent information from other pixels is available. Experimental results demonstrate that our proposed modifications lead to significant improvements in data representation and generation, making VQVAEs even more suitable for a wide range of applications.
摘要
我们提出了一种新方法,通过集成注意力残差编码器 (Attentive Residual Encoder, AREN) 和残差像素注意力层来增强 VQVAE 模型的能力。我们的研究目标是在保持实用参数规模的前提下提升 VQVAE 的性能。AREN 编码器被设计为可在多个层级上有效工作,以适应不同的结构复杂度。其关键创新在于将像素间自注意力机制集成到 AREN 编码器中,使我们能够高效地捕获并利用潜在向量之间的上下文信息。此外,我们的模型还使用了额外的编码层级,以进一步增强模型的表达能力。我们的注意力层采用极少参数的设计,确保只有在其他像素提供了相关信息时,潜在向量才会被修改。实验结果表明,我们提出的改进显著提升了数据表示与生成能力,使 VQVAE 更适用于广泛的应用场景。
A survey on the semantics of sequential patterns with negation
results: 研究发现用户对两种 semantics 具有直观性,但这两种 semantics 并不与现有的主流算法 semantics 一致。因此,本研究提出了一些建议,以便更好地考虑这些差异。Abstract
A sequential pattern with negation, or negative sequential pattern, takes the form of a sequential pattern for which the negation symbol may be used in front of some of the pattern's itemsets. Intuitively, such a pattern occurs in a sequence if negated itemsets are absent in the sequence. Recent work has shown that different semantics can be attributed to these pattern forms, and that state-of-the-art algorithms do not extract the same sets of patterns. This raises the important question of the interpretability of sequential pattern with negation. In this study, our focus is on exploring how potential users perceive negation in sequential patterns. Our aim is to determine whether specific semantics are more "intuitive" than others and whether these align with the semantics employed by one or more state-of-the-art algorithms. To achieve this, we designed a questionnaire to reveal the semantics' intuition of each user. This article presents both the design of the questionnaire and an in-depth analysis of the 124 responses obtained. The outcomes indicate that two of the semantics are predominantly intuitive; however, neither of them aligns with the semantics of the primary state-of-the-art algorithms. As a result, we provide recommendations to account for this disparity in the conclusions drawn.
摘要
带否定的序列模式(又称负序列模式)是一种序列模式,其中某些项集前可以加上否定符号。直观地说,当被否定的项集没有出现在序列中时,这种模式就在该序列中出现。近期的研究表明,这类模式形式可以被赋予不同的语义,而当前最先进的算法所提取的模式集合并不相同。这就引出了带否定的序列模式的可解释性这一重要问题。在本研究中,我们关注潜在用户如何理解序列模式中的否定,目的是确定某些语义是否比其他语义更"直观",以及这些语义是否与一种或多种最先进算法所采用的语义一致。为此,我们设计了一份问卷来揭示每位用户对语义的直观理解。本文介绍了该问卷的设计,并对收到的 124 份回答进行了深入分析。结果表明,其中两种语义最为直观,但它们都与主流最先进算法所采用的语义不一致。因此,我们给出了在得出结论时应考虑这一差异的相关建议。
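To make the notion concrete, here is a small sketch of one possible occurrence semantics for a pattern such as <{a}, not {b}, {c}>: the positive itemsets must occur in order, and the negated itemset must be absent strictly between the surrounding positive occurrences. This is only one of the several semantics the survey compares, chosen for illustration.

```python
from typing import List, Set, Tuple

# A pattern element is (itemset, negated); a sequence is a list of itemsets.
PatternElem = Tuple[Set[str], bool]

def occurs(pattern: List[PatternElem], sequence: List[Set[str]]) -> bool:
    """One possible semantics: a negated itemset must be absent strictly between
    the occurrences of the surrounding positive itemsets."""
    pos = 0                      # current scan position in the sequence
    pending_negation: Set[str] = set()
    for itemset, negated in pattern:
        if negated:
            pending_negation = itemset
            continue
        found = None             # next occurrence of the positive itemset
        for i in range(pos, len(sequence)):
            if itemset <= sequence[i]:
                found = i
                break
        if found is None:
            return False
        # the pending negated itemset must not appear in the gap
        if pending_negation and any(pending_negation <= s for s in sequence[pos:found]):
            return False
        pending_negation = set()
        pos = found + 1
    return not pending_negation  # a trailing negation is left unresolved in this sketch

# Example: <{a}, not {b}, {c}> occurs in [{a}, {d}, {c}] but not in [{a}, {b}, {c}]
print(occurs([({"a"}, False), ({"b"}, True), ({"c"}, False)], [{"a"}, {"d"}, {"c"}]))  # True
print(occurs([({"a"}, False), ({"b"}, True), ({"c"}, False)], [{"a"}, {"b"}, {"c"}]))  # False
```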
Cloud-Based Hierarchical Imitation Learning for Scalable Transfer of Construction Skills from Human Workers to Assisting Robots
results: 这项研究提出了一个结合分层模仿学习 (HIL) 模型和云机器人技术的沉浸式虚拟示教框架,可以帮助将工人的精细施工技能迁移到机器人上;这些示教可以被复用,从而减少对人工反复示范的需求。该框架有助于让具有不同身体条件和教育背景的工人参与建筑行业。Abstract
Assigning repetitive and physically-demanding construction tasks to robots can alleviate human workers's exposure to occupational injuries. Transferring necessary dexterous and adaptive artisanal construction craft skills from workers to robots is crucial for the successful delegation of construction tasks and achieving high-quality robot-constructed work. Predefined motion planning scripts tend to generate rigid and collision-prone robotic behaviors in unstructured construction site environments. In contrast, Imitation Learning (IL) offers a more robust and flexible skill transfer scheme. However, the majority of IL algorithms rely on human workers to repeatedly demonstrate task performance at full scale, which can be counterproductive and infeasible in the case of construction work. To address this concern, this paper proposes an immersive, cloud robotics-based virtual demonstration framework that serves two primary purposes. First, it digitalizes the demonstration process, eliminating the need for repetitive physical manipulation of heavy construction objects. Second, it employs a federated collection of reusable demonstrations that are transferable for similar tasks in the future and can thus reduce the requirement for repetitive illustration of tasks by human agents. Additionally, to enhance the trustworthiness, explainability, and ethical soundness of the robot training, this framework utilizes a Hierarchical Imitation Learning (HIL) model to decompose human manipulation skills into sequential and reactive sub-skills. These two layers of skills are represented by deep generative models, enabling adaptive control of robot actions. By delegating the physical strains of construction work to human-trained robots, this framework promotes the inclusion of workers with diverse physical capabilities and educational backgrounds within the construction industry.
摘要
将重复且体力消耗大的施工任务交给机器人,可以减少工人遭受职业伤害的风险。要成功地委派施工任务并获得高质量的机器人施工成果,关键在于把必要的灵巧、自适应的手工施工技能从工人迁移到机器人。预先定义的运动规划脚本在非结构化的施工现场环境中往往产生僵硬且易发生碰撞的机器人行为;相比之下,模仿学习 (IL) 提供了一种更鲁棒、更灵活的技能迁移方案。然而,大多数 IL 算法需要工人反复地在真实尺度上演示任务,这在施工作业中既低效又难以实现。为此,本文提出了一个基于云机器人的沉浸式虚拟示教框架,其主要目的有二:其一,将示教过程数字化,免去反复实际搬运沉重施工对象的需要;其二,采用联邦式收集的可复用示教,这些示教可迁移到未来的类似任务,从而减少人工反复演示任务的需求。此外,为增强机器人训练的可信度、可解释性与伦理合理性,该框架利用分层模仿学习 (HIL) 模型,将人类操作技能分解为顺序型与反应型两类子技能;这两层技能由深度生成模型表示,从而实现对机器人动作的自适应控制。通过把施工作业中的体力负担交给由人示教训练的机器人,该框架促进了具有不同身体能力和教育背景的工人融入建筑行业。
Hand Gesture Recognition with Two Stage Approach Using Transfer Learning and Deep Ensemble Learning
results: 研究获得了 98.88% 的准确率,表明了深度集成学习技术在人机交互 (HCI) 中的应用潜力。Abstract
Human-Computer Interaction (HCI) has been the subject of research for many years, and recent studies have focused on improving its performance through various techniques. In the past decade, deep learning studies have shown high performance in various research areas, leading researchers to explore their application to HCI. Convolutional neural networks can be used to recognize hand gestures from images using deep architectures. In this study, we evaluated pre-trained high-performance deep architectures on the HG14 dataset, which consists of 14 different hand gesture classes. Among 22 different models, versions of the VGGNet and MobileNet models attained the highest accuracy rates. Specifically, the VGG16 and VGG19 models achieved accuracy rates of 94.64% and 94.36%, respectively, while the MobileNet and MobileNetV2 models achieved accuracy rates of 96.79% and 94.43%, respectively. We performed hand gesture recognition on the dataset using an ensemble learning technique, which combined the four most successful models. By utilizing these models as base learners and applying the Dirichlet ensemble technique, we achieved an accuracy rate of 98.88%. These results demonstrate the effectiveness of the deep ensemble learning technique for HCI and its potential applications in areas such as augmented reality, virtual reality, and game technologies.
摘要
人机交互 (HCI) 多年来一直是研究的主题,近期的研究侧重于通过各种技术提升其性能。在过去十年中,深度学习研究在多个领域表现出色,促使研究者探索其在 HCI 中的应用。卷积神经网络可以利用深度架构从图像中识别手势。在本研究中,我们在包含 14 种手势类别的 HG14 数据集上评估了多种预训练的高性能深度架构。在 22 个模型中,VGGNet 和 MobileNet 系列模型取得了最高的准确率:VGG16 与 VGG19 的准确率分别为 94.64% 和 94.36%,MobileNet 与 MobileNetV2 的准确率分别为 96.79% 和 94.43%。我们进一步采用集成学习技术,将表现最好的四个模型作为基学习器,应用 Dirichlet 集成方法,在该数据集上实现了 98.88% 的手势识别准确率。这些结果表明了深度集成学习技术在 HCI 中的有效性,以及其在增强现实、虚拟现实和游戏技术等领域的潜在应用。
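Ensembles of this kind are commonly realized as weighted soft voting over the base models' class probabilities; one reading of a Dirichlet-based ensemble is choosing or sampling the weight vector from the probability simplex. The sketch below shows plain weighted soft voting with placeholder weights and hypothetical base-model outputs; it does not reproduce the paper's exact ensembling procedure.

```python
import numpy as np

def soft_vote(probabilities: list, weights: np.ndarray) -> np.ndarray:
    """Weighted soft voting over base classifiers.

    probabilities: list of (num_samples, num_classes) softmax outputs, one per model.
    weights:       non-negative weights summing to 1 (a point on the probability simplex).
    """
    stacked = np.stack(probabilities, axis=0)          # (num_models, N, C)
    combined = np.tensordot(weights, stacked, axes=1)  # (N, C) weighted average
    return combined.argmax(axis=1)                     # predicted class per sample

# Example with four hypothetical base models (e.g. VGG16, VGG19, MobileNet, MobileNetV2):
# preds = soft_vote([p_vgg16, p_vgg19, p_mobilenet, p_mobilenetv2],
#                   weights=np.array([0.25, 0.25, 0.25, 0.25]))
```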
Dataset Factory: A Toolchain For Generative Computer Vision Datasets
results: 该论文的实验结果表明,使用“数据工厂”方法可以提高生成AI工作流程的数据处理效率和可重用性。Abstract
Generative AI workflows heavily rely on data-centric tasks - such as filtering samples by annotation fields, vector distances, or scores produced by custom classifiers. At the same time, computer vision datasets are quickly approaching petabyte volumes, rendering data wrangling difficult. In addition, the iterative nature of data preparation necessitates robust dataset sharing and versioning mechanisms, both of which are hard to implement ad-hoc. To solve these challenges, we propose a "dataset factory" approach that separates the storage and processing of samples from metadata and enables data-centric operations at scale for machine learning teams and individual researchers.
摘要
CATS: Conditional Adversarial Trajectory Synthesis for Privacy-Preserving Trajectory Data Publication Using Deep Learning Approaches
results: 实验结果表明,与基线方法相比,我们的方法在隐私保护、时空特征保持和下游效用方面表现更好,为利用生成式 AI 技术开展人类移动数据隐私研究提供了新的视角,并探讨了 GIScience 中的数据伦理问题。Abstract
The prevalence of ubiquitous location-aware devices and mobile Internet enables us to collect massive individual-level trajectory dataset from users. Such trajectory big data bring new opportunities to human mobility research but also raise public concerns with regard to location privacy. In this work, we present the Conditional Adversarial Trajectory Synthesis (CATS), a deep-learning-based GeoAI methodological framework for privacy-preserving trajectory data generation and publication. CATS applies K-anonymity to the underlying spatiotemporal distributions of human movements, which provides a distributional-level strong privacy guarantee. By leveraging conditional adversarial training on K-anonymized human mobility matrices, trajectory global context learning using the attention-based mechanism, and recurrent bipartite graph matching of adjacent trajectory points, CATS is able to reconstruct trajectory topology from conditionally sampled locations and generate high-quality individual-level synthetic trajectory data, which can serve as supplements or alternatives to raw data for privacy-preserving trajectory data publication. The experiment results on over 90k GPS trajectories show that our method has a better performance in privacy preservation, spatiotemporal characteristic preservation, and downstream utility compared with baseline methods, which brings new insights into privacy-preserving human mobility research using generative AI techniques and explores data ethics issues in GIScience.
摘要
无处不在的位置感知设备和移动互联网使我们能够从用户处收集海量的个体级轨迹数据集。这些轨迹大数据为人类移动研究带来了新的机遇,但也引发了公众对位置隐私的担忧。在这项工作中,我们提出了条件对抗轨迹合成 (CATS),一种基于深度学习的 GeoAI 方法框架,用于保护隐私的轨迹数据生成与发布。CATS 对人类移动的底层时空分布应用 K-匿名处理,从分布层面提供了强隐私保证。通过在 K-匿名化的人类移动矩阵上进行条件对抗训练、利用注意力机制学习轨迹的全局上下文,以及对相邻轨迹点进行循环二分图匹配,CATS 能够从条件采样的位置重建轨迹拓扑,并生成高质量的个体级合成轨迹数据,作为原始数据的补充或替代用于保护隐私的轨迹数据发布。在 9 万多条 GPS 轨迹上的实验结果表明,与基线方法相比,我们的方法在隐私保护、时空特征保持和下游效用方面表现更好,为利用生成式 AI 技术开展人类移动数据隐私研究带来了新的见解,并探讨了 GIScience 中的数据伦理问题。
Distilling Adversarial Prompts from Safety Benchmarks: Report for the Adversarial Nibbler Challenge
results: 研究人员发现了许多潜在的安全问题,包括输入筛选器的脆弱性和系统性的安全问题,这些问题可能会影响生成图像模型的安全性。Abstract
Text-conditioned image generation models have recently achieved astonishing image quality and alignment results. Consequently, they are employed in a fast-growing number of applications. Since they are highly data-driven, relying on billion-sized datasets randomly scraped from the web, they also produce unsafe content. As a contribution to the Adversarial Nibbler challenge, we distill a large set of over 1,000 potential adversarial inputs from existing safety benchmarks. Our analysis of the gathered prompts and corresponding images demonstrates the fragility of input filters and provides further insights into systematic safety issues in current generative image models.
摘要
BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model
results: 与其他 3B 参数模型相比,BTLM-3B-8K 在下游任务中取得了 2-5.5% 的提升,甚至能与部分 7B 参数模型竞争;在长上下文任务中同样表现出色,优于 MPT-7B-8K 和 XGen-7B-8K。此外,BTLM-3B-8K 的资源占用很低:4 位精度下仅需约 3GB 内存,推理计算量比 7B 模型少 2.5 倍。Abstract
We introduce the Bittensor Language Model, called "BTLM-3B-8K", a new state-of-the-art 3 billion parameter open-source language model. BTLM-3B-8K was trained on 627B tokens from the SlimPajama dataset with a mixture of 2,048 and 8,192 context lengths. BTLM-3B-8K outperforms all existing 3B parameter models by 2-5.5% across downstream tasks. BTLM-3B-8K is even competitive with some 7B parameter models. Additionally, BTLM-3B-8K provides excellent long context performance, outperforming MPT-7B-8K and XGen-7B-8K on tasks up to 8,192 context length. We trained the model on a cleaned and deduplicated SlimPajama dataset; aggressively tuned the \textmu P hyperparameters and schedule; used ALiBi position embeddings; and adopted the SwiGLU nonlinearity. On Hugging Face, the most popular models have 7B parameters, indicating that users prefer the quality-size ratio of 7B models. Compacting the 7B parameter model to one with 3B parameters, with little performance impact, is an important milestone. BTLM-3B-8K needs only 3GB of memory with 4-bit precision and takes 2.5x less inference compute than 7B models, helping to open up access to a powerful language model on mobile and edge devices. BTLM-3B-8K is available under an Apache 2.0 license on Hugging Face: https://huggingface.co/cerebras/btlm-3b-8k-base.
摘要
我们介绍 Bittensor 语言模型"BTLM-3B-8K",一个新的最先进的 30 亿参数开源语言模型。BTLM-3B-8K 在 SlimPajama 数据集的 6270 亿个 token 上训练,混合使用 2,048 与 8,192 两种上下文长度。BTLM-3B-8K 在下游任务上比所有现有的 3B 参数模型高出 2-5.5%,甚至能与部分 7B 参数模型竞争。此外,BTLM-3B-8K 具有出色的长上下文性能,在长达 8,192 上下文长度的任务上优于 MPT-7B-8K 和 XGen-7B-8K。我们在经过清洗和去重的 SlimPajama 数据集上训练该模型;大幅调优了 μP 超参数与训练调度;使用了 ALiBi 位置编码;并采用了 SwiGLU 非线性。在 Hugging Face 上最受欢迎的模型都有 70 亿参数,表明用户偏好 7B 模型的质量-规模比。在几乎不损失性能的情况下,将 7B 参数模型压缩为 3B 参数模型是一个重要的里程碑。BTLM-3B-8K 在 4 位精度下仅需 3GB 内存,推理计算量比 7B 模型少 2.5 倍,有助于在移动和边缘设备上使用强大的语言模型。BTLM-3B-8K 以 Apache 2.0 许可证发布于 Hugging Face:https://huggingface.co/cerebras/btlm-3b-8k-base。
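ALiBi position embeddings, mentioned in the training recipe above, replace learned positional encodings with a per-head linear penalty on attention scores proportional to the key-query distance, which is one reason such models handle long contexts well. A minimal sketch of the bias construction follows; the head slopes use the common geometric scheme from the ALiBi paper, and BTLM's exact implementation may differ.

```python
import torch

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    """Per-head linear distance penalties added to causal attention logits.

    Returns a tensor of shape (num_heads, seq_len, seq_len) where entry
    [h, i, j] = -slope_h * (i - j) for j <= i (positions j > i are handled
    by the causal mask).
    """
    # Geometric head slopes: 2^(-8/n), 2^(-16/n), ... for n heads.
    slopes = torch.tensor([2 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)])
    positions = torch.arange(seq_len)
    distance = positions.view(1, -1) - positions.view(-1, 1)       # entry [i, j] = j - i
    bias = slopes.view(-1, 1, 1) * distance.view(1, seq_len, seq_len)
    return bias  # add this to attention logits before the softmax

# Usage sketch: logits = q @ k.transpose(-2, -1) / d**0.5 + alibi_bias(n_heads, T)
```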
Limitations in odour recognition and generalisation in a neuromorphic olfactory circuit
results: 研究发现,该算法能够识别不同的气味,但在同一气体重复呈现的情况下,模型的泛化能力有限。此外,研究还发现了数据集和实验设置上的一些局限,导致部分结论需要进一步验证。Abstract
Neuromorphic computing is one of the few current approaches that have the potential to significantly reduce power consumption in Machine Learning and Artificial Intelligence. Imam & Cleland presented an odour-learning algorithm that runs on a neuromorphic architecture and is inspired by circuits described in the mammalian olfactory bulb. They assess the algorithm's performance in "rapid online learning and identification" of gaseous odorants and odorless gases (short "gases") using a set of gas sensor recordings of different odour presentations and corrupting them by impulse noise. We replicated parts of the study and discovered limitations that affect some of the conclusions drawn. First, the dataset used suffers from sensor drift and a non-randomised measurement protocol, rendering it of limited use for odour identification benchmarks. Second, we found that the model is restricted in its ability to generalise over repeated presentations of the same gas. We demonstrate that the task the study refers to can be solved with a simple hash table approach, matching or exceeding the reported results in accuracy and runtime. Therefore, a validation of the model that goes beyond restoring a learned data sample remains to be shown, in particular its suitability to odour identification tasks.
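The hash-table baseline the replication refers to can be pictured with a toy sketch like the following; the quantization levels, median filter, and class interface are assumptions chosen only to illustrate the "memorize and look up" idea:

```python
import numpy as np

class HashTableGasIdentifier:
    """Toy illustration: memorize quantized sensor patterns during training and
    look them up (after simple impulse-noise filtering) at test time."""

    def __init__(self, n_levels: int = 16):
        self.n_levels = n_levels
        self.table = {}

    def _key(self, sample: np.ndarray) -> tuple:
        s = np.clip(sample, 0.0, 1.0)                       # assumes normalized readings
        return tuple((s * (self.n_levels - 1)).round().astype(int))

    def fit(self, samples, labels):
        for x, y in zip(samples, labels):
            self.table[self._key(np.asarray(x, dtype=float))] = y

    def predict(self, sample, kernel: int = 3):
        # Median filter to suppress impulse noise before the lookup.
        x = np.asarray(sample, dtype=float)
        pad = kernel // 2
        padded = np.pad(x, pad, mode="edge")
        denoised = np.array([np.median(padded[i:i + kernel]) for i in range(len(x))])
        return self.table.get(self._key(denoised), None)    # None if pattern unseen
```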
Chain-of-Verification Reduces Hallucination in Large Language Models
results: 我们在各种任务上(如Wikidata列表问题、关闭书MultiSpanQA和长文本生成)实验表明,CoVe可以减少假信息的发生。Abstract
Generation of plausible yet incorrect factual information, termed hallucination, is an unsolved issue in large language models. We study the ability of language models to deliberate on the responses they give in order to correct their mistakes. We develop the Chain-of-Verification (CoVe) method whereby the model first (i) drafts an initial response; then (ii) plans verification questions to fact-check its draft; (iii) answers those questions independently so the answers are not biased by other responses; and (iv) generates its final verified response. In experiments, we show CoVe decreases hallucinations across a variety of tasks, from list-based questions from Wikidata, closed book MultiSpanQA and longform text generation.
摘要
大型语言模型生成貌似合理但实际错误的信息,即幻觉(hallucination)问题,仍未得到解决。我们研究语言模型是否可以对其回答进行斟酌和更正。我们开发了链式验证(Chain-of-Verification,CoVe)方法,它包括以下四个步骤:1. 模型首先提出一个初步答案(draft);2. 然后,模型计划一系列验证问题,以核查其初步答案是否正确;3. 模型独立地回答这些验证问题,以避免答案受其他回答的影响;4. 最后,模型生成一个经过验证的最终答案。在实验中,我们发现CoVe可以在多种任务上减少幻觉,包括基于Wikidata的列表问题、闭卷MultiSpanQA和长文本生成等。
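A minimal sketch of the four CoVe steps, with `llm(prompt)` standing in for any chat model; the prompt wording here is illustrative, not the paper's exact templates:

```python
def chain_of_verification(question: str, llm) -> str:
    """Minimal CoVe loop; `llm(prompt) -> str` is a placeholder for any chat model."""
    # (i) Draft an initial answer.
    draft = llm(f"Answer the question:\n{question}")
    # (ii) Plan verification questions that fact-check the draft.
    plan = llm(
        "List short verification questions, one per line, that would fact-check "
        f"this answer.\nQuestion: {question}\nDraft answer: {draft}"
    )
    checks = [q.strip() for q in plan.splitlines() if q.strip()]
    # (iii) Answer each verification question independently; the draft is NOT shown,
    #       so these answers are not biased by it.
    evidence = [(q, llm(f"Answer concisely: {q}")) for q in checks]
    # (iv) Produce the final, verified answer conditioned on the checks.
    facts = "\n".join(f"Q: {q}\nA: {a}" for q, a in evidence)
    return llm(
        f"Question: {question}\nDraft answer: {draft}\n"
        f"Verification Q&A:\n{facts}\n"
        "Rewrite the answer, keeping only claims supported by the verification."
    )
```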
Text2Reward: Automated Dense Reward Function Generation for Reinforcement Learning
results: 在两个机器人操作benchmark(ManiSkill2、MetaWorld)和两个MuJoCo运动环境中,使用生成的奖励函数训练的策略在17个操作任务中的13个上取得了与专家编写的奖励函数相当或更高的成功率和收敛速度,并在六个新的运动行为中取得了94%以上的成功率。此外,我们还证明了使用我们的方法在模拟器中训练的策略可以部署到真实世界。最后,我们通过人工反馈进一步改进策略的奖励函数。视频结果可以在https://text-to-reward.github.io查看。Abstract
Designing reward functions is a longstanding challenge in reinforcement learning (RL); it requires specialized knowledge or domain data, leading to high costs for development. To address this, we introduce Text2Reward, a data-free framework that automates the generation of dense reward functions based on large language models (LLMs). Given a goal described in natural language, Text2Reward generates dense reward functions as an executable program grounded in a compact representation of the environment. Unlike inverse RL and recent work that uses LLMs to write sparse reward codes, Text2Reward produces interpretable, free-form dense reward codes that cover a wide range of tasks, utilize existing packages, and allow iterative refinement with human feedback. We evaluate Text2Reward on two robotic manipulation benchmarks (ManiSkill2, MetaWorld) and two locomotion environments of MuJoCo. On 13 of the 17 manipulation tasks, policies trained with generated reward codes achieve similar or better task success rates and convergence speed than expert-written reward codes. For locomotion tasks, our method learns six novel locomotion behaviors with a success rate exceeding 94%. Furthermore, we show that the policies trained in the simulator with our method can be deployed in the real world. Finally, Text2Reward further improves the policies by refining their reward functions with human feedback. Video results are available at https://text-to-reward.github.io
摘要
设计奖励函数是强化学习(RL)中长期存在的挑战,它需要专门的知识或领域数据,导致开发成本高昂。为解决这个问题,我们引入Text2Reward,一种无需数据的框架,基于大型语言模型(LLM)自动生成稠密奖励函数。给定一个用自然语言描述的目标,Text2Reward可以生成稠密奖励函数,作为可执行的程序,并基于环境的紧凑表示。不同于逆强化学习和最近使用LLM编写稀疏奖励代码的工作,Text2Reward生成的稠密奖励代码可解释、形式自由,可以覆盖各种任务,利用现有软件包,并允许通过人类反馈进行迭代改进。我们在ManiSkill2和MetaWorld两个机器人操作基准以及MuJoCo的两个运动环境中进行了评估。在17个操作任务中的13个上,使用生成的奖励代码训练的策略取得了与专家编写的奖励代码相当或更好的任务成功率和收敛速度。此外,我们的方法学习了6种新的运动行为,其成功率超过94%。我们还表明,使用我们的方法在模拟器中训练的策略可以部署到真实世界。最后,Text2Reward可以通过人类反馈进一步改进策略的奖励函数。视频结果可以在 https://text-to-reward.github.io 查看。
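To give a feel for the kind of free-form dense reward code such a framework might emit for a pick-style task, here is a purely illustrative sketch; the environment attributes (`gripper_pos`, `object_pos`, `goal_pos`, `is_grasping`) are hypothetical stand-ins for a compact environment representation, not an actual benchmark API:

```python
import numpy as np

def dense_reward(env) -> float:
    """Illustrative dense reward for a pick-and-place style task."""
    reach_dist = np.linalg.norm(env.gripper_pos - env.object_pos)
    place_dist = np.linalg.norm(env.object_pos - env.goal_pos)
    reward = -1.0 * reach_dist                 # shaping: move gripper toward object
    if env.is_grasping:
        reward += 0.5                          # bonus for a stable grasp
        reward += -1.0 * place_dist            # then shape toward the goal
        if place_dist < 0.02:
            reward += 5.0                      # sparse success bonus
    return reward
```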
Fictional Worlds, Real Connections: Developing Community Storytelling Social Chatbots through LLMs
results: 该研究结果表明,通过故事的使用可以增强社交虚拟助手在社区 setting中的参与度和可信度。Abstract
We address the integration of storytelling and Large Language Models (LLMs) to develop engaging and believable Social Chatbots (SCs) in community settings. Motivated by the potential of fictional characters to enhance social interactions, we introduce Storytelling Social Chatbots (SSCs) and the concept of story engineering to transform fictional game characters into "live" social entities within player communities. Our story engineering process includes three steps: (1) Character and story creation, defining the SC's personality and worldview, (2) Presenting Live Stories to the Community, allowing the SC to recount challenges and seek suggestions, and (3) Communication with community members, enabling interaction between the SC and users. We employed the LLM GPT-3 to drive our SSC prototypes, "David" and "Catherine," and evaluated their performance in an online gaming community, "DE (Alias)," on Discord. Our mixed-method analysis, based on questionnaires (N=15) and interviews (N=8) with community members, reveals that storytelling significantly enhances the engagement and believability of SCs in community settings.
摘要
我们研究将故事讲述与大型语言模型(LLM)结合,以开发在社区环境中引人入胜且可信的社交聊天机器人(SC)。受虚构人物可以增强社交互动这一潜力的启发,我们引入了 Storytelling Social Chatbots(SSCs)和故事工程的概念,将虚构游戏角色转化为玩家社区中"活"的社交实体。我们的故事工程过程包括三个步骤:(1)人物和故事创作,定义 SC 的个性和世界观;(2)向社区展示实时故事,让 SC 讲述挑战并寻求建议;(3)与社区成员交流,实现 SC 与用户的互动。我们使用 LLM GPT-3 驱动我们的 SSC 原型 "David" 和 "Catherine",并在 Discord 上的在线游戏社区 "DE (Alias)" 中进行了评估。基于社区成员问卷 (N=15) 和访谈 (N=8) 的混合方法分析表明,故事讲述可以显著提高 SC 在社区环境中的参与度和可信度。
Multi-view Fuzzy Representation Learning with Rules based Model
results: 对多个标准多视图数据集进行了广泛的实验 validate the superiority of the proposed method。Abstract
Unsupervised multi-view representation learning has been extensively studied for mining multi-view data. However, some critical challenges remain. On the one hand, the existing methods cannot explore multi-view data comprehensively since they usually learn a common representation between views, given that multi-view data contains both the common information between views and the specific information within each view. On the other hand, to mine the nonlinear relationship between data, kernel or neural network methods are commonly used for multi-view representation learning. However, these methods are lacking in interpretability. To this end, this paper proposes a new multi-view fuzzy representation learning method based on the interpretable Takagi-Sugeno-Kang (TSK) fuzzy system (MVRL_FS). The method realizes multi-view representation learning from two aspects. First, multi-view data are transformed into a high-dimensional fuzzy feature space, while the common information between views and specific information of each view are explored simultaneously. Second, a new regularization method based on L_(2,1)-norm regression is proposed to mine the consistency information between views, while the geometric structure of the data is preserved through the Laplacian graph. Finally, extensive experiments on many benchmark multi-view datasets are conducted to validate the superiority of the proposed method.
摘要
无监督多视角表示学习已被广泛研究用于多视角数据的挖掘。然而,一些关键挑战仍然存在。一方面,现有方法通常只学习视角之间的共同表示,而多视角数据既包含视角间的共同信息,也包含各视角内的特定信息,因此无法全面探索多视角数据。另一方面,用于挖掘数据非线性关系的核方法或神经网络方法通常缺乏可解释性。为此,本文提出了一种新的多视角模糊表示学习方法,基于可解释的 Takagi-Sugeno-Kang(TSK)模糊系统(MVRL_FS)。该方法从两个方面实现多视角表示学习。首先,多视角数据被转换到一个高维模糊特征空间,同时探索视角间的共同信息和各视角的特定信息。其次,提出了一种基于L_(2,1)-范数回归的新正则化方法,以挖掘视角之间的一致性信息,并通过拉普拉斯图保留数据的几何结构。最后,在许多标准多视角数据集上进行了广泛的实验,以验证所提方法的优越性。
results: 对12个多标签基准数据集进行实验,结果表明ML-TSK FS与现有方法相比,在各种评价指标上表现出竞争力,表明它可以通过模糊推理规则有效地建模特征与标签之间的关系,提高分类性能。Abstract
Multi-label classification can effectively identify the relevant labels of an instance from a given set of labels. However, the modeling of the relationship between the features and the labels is critical to the classification performance. To this end, we propose a new multi-label classification method, called Multi-Label Takagi-Sugeno-Kang Fuzzy System (ML-TSK FS), to improve the classification performance. The structure of ML-TSK FS is designed using fuzzy rules to model the relationship between features and labels. The fuzzy system is trained by integrating fuzzy inference based multi-label correlation learning with multi-label regression loss. The proposed ML-TSK FS is evaluated experimentally on 12 benchmark multi-label datasets. The results show that the performance of ML-TSK FS is competitive with existing methods in terms of various evaluation metrics, indicating that it is able to model the feature-label relationship effectively using fuzzy inference rules and enhances the classification performance.
摘要
多标签分类可以有效地从给定的标签集中确定实例的相关标签。然而,特征与标签之间关系的建模是决定分类性能的关键。为此,我们提出了一种新的多标签分类方法,即多标签Takagi-Sugeno-Kang模糊系统(ML-TSK FS),以提高分类性能。ML-TSK FS的结构采用模糊规则来建模特征与标签之间的关系。该模糊系统通过将基于模糊推理的多标签相关性学习与多标签回归损失相结合来训练。我们在12个多标签基准数据集上对ML-TSK FS的性能进行了实验评估。结果表明,ML-TSK FS与现有方法相比,在不同的评价指标上具有竞争力,这表明它可以通过模糊推理规则有效地建模特征与标签之间的关系,提高分类性能。
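For readers unfamiliar with TSK inference, a generic first-order TSK rule evaluation can be sketched as follows; this illustrates only the rule mechanism itself, not the paper's multi-label correlation learning or training procedure:

```python
import numpy as np

def tsk_predict(x, centers, widths, consequents):
    """First-order TSK inference for one sample x of shape (d,).
    centers, widths: (R, d) Gaussian antecedent parameters for R rules.
    consequents: (R, d + 1) linear consequent weights [bias, w_1..w_d] per rule."""
    # Firing strength of each rule = product of Gaussian memberships over inputs.
    memberships = np.exp(-((x[None, :] - centers) ** 2) / (2.0 * widths ** 2))
    firing = memberships.prod(axis=1)                 # (R,)
    firing = firing / (firing.sum() + 1e-12)          # normalized rule weights
    x_aug = np.concatenate(([1.0], x))                # prepend bias term
    rule_outputs = consequents @ x_aug                # (R,) per-rule linear outputs
    return float(firing @ rule_outputs)               # weighted average output
```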
AudioFool: Fast, Universal and synchronization-free Cross-Domain Attack on Speech Recognition
results: 透过实验和分析,发现 modified frequency domain 攻击能够实现这些特性,并且在线上 keyword classification 任务中提供了高效的攻击方法。Abstract
Automatic Speech Recognition systems have been shown to be vulnerable to adversarial attacks that manipulate the command executed on the device. Recent research has focused on exploring methods to create such attacks, however, some issues relating to Over-The-Air (OTA) attacks have not been properly addressed. In our work, we examine the needed properties of robust attacks compatible with the OTA model, and we design a method of generating attacks with arbitrary such desired properties, namely the invariance to synchronization, and the robustness to filtering: this allows a Denial-of-Service (DoS) attack against ASR systems. We achieve these characteristics by constructing attacks in a modified frequency domain through an inverse Fourier transform. We evaluate our method on standard keyword classification tasks and analyze it in OTA, and we analyze the properties of the cross-domain attacks to explain the efficiency of the approach.
摘要
自动语音识别(ASR)系统已被证明容易受到对抗攻击,这些攻击可以操纵设备上执行的命令。最近的研究主要关注于如何构造这类攻击,但一些与空中传播(OTA)攻击相关的问题尚未得到充分解决。在我们的工作中,我们分析了与OTA模型兼容的鲁棒攻击所需的属性,并设计了生成具有任意所需属性的攻击的方法,包括对同步的不变性和对滤波的鲁棒性,从而可以对ASR系统发起拒绝服务(DoS)攻击。我们通过在修改后的频域中构造攻击并进行逆傅里叶变换来实现这些特性。我们在标准关键词分类任务上评估了该方法,并分析了其在OTA下的表现;我们还分析了跨域攻击的性质,以解释该方法的有效性。
Generative Agent-Based Modeling: Unveiling Social System Dynamics through Coupling Mechanistic Models with Generative Artificial Intelligence
paper_authors: Navid Ghaffarzadegan, Aritra Majumdar, Ross Williams, Niyousha Hosseinichimeh
for: 这篇论文探讨了使用生成人工智能建模社会系统的新机遇。
methods: 这些模型使用大语言模型如ChatGPT来表示人类决策行为在社会设置下。
results: 这篇论文提供了一个简单的社会规范传播模型,并对其 Results 进行了广泛的调查和敏感性分析。Abstract
We discuss the emerging new opportunity for building feedback-rich computational models of social systems using generative artificial intelligence. Referred to as Generative Agent-Based Models (GABMs), such individual-level models utilize large language models such as ChatGPT to represent human decision-making in social settings. We provide a GABM case in which human behavior can be incorporated in simulation models by coupling a mechanistic model of human interactions with a pre-trained large language model. This is achieved by introducing a simple GABM of social norm diffusion in an organization. For educational purposes, the model is intentionally kept simple. We examine a wide range of scenarios and the sensitivity of the results to several changes in the prompt. We hope the article and the model serve as a guide for building useful diffusion models that include realistic human reasoning and decision-making.
摘要
我们讨论一种新兴的机会:使用生成式人工智能构建反馈丰富的社会系统计算模型。这类个体级模型被称为生成式基于代理的模型(GABM),它们利用ChatGPT等大语言模型来表示社会情境中的人类决策。我们提供了一个GABM案例,通过将人类交互的机制模型与预训练大语言模型耦合,把人类行为纳入仿真模型之中。具体做法是引入一个组织内社会规范传播的简单GABM;出于教学目的,该模型被刻意保持简单。我们考察了广泛的情景,并分析了结果对提示中若干变化的敏感性。我们希望这篇文章和模型可以作为指南,帮助构建包含现实人类推理和决策的传播模型。
Using deep learning to construct stochastic local search SAT solvers with performance bounds
for: 这 paper 是关于 Boolean Satisfiability problem (SAT) 的研究,具体来说是使用 Graph Neural Networks (GNN) 训练 oracle,以提高 Stochastic Local Search (SLS) 算法的性能。
methods: 这 paper 使用了 GNN 训练 oracle,并将其应用于两种 SLS 算法上,以解决随机 SAT 实例。
results: 研究发现,通过使用 GNN 训练 oracle,SLS 算法的性能得到了明显提高,可以解决更难的 SAT 实例,并且可以在更少的步骤数下解决。Abstract
The Boolean Satisfiability problem (SAT) is the most prototypical NP-complete problem and of great practical relevance. One important class of solvers for this problem are stochastic local search (SLS) algorithms that iteratively and randomly update a candidate assignment. Recent breakthrough results in theoretical computer science have established sufficient conditions under which SLS solvers are guaranteed to efficiently solve a SAT instance, provided they have access to suitable "oracles" that provide samples from an instance-specific distribution, exploiting an instance's local structure. Motivated by these results and the well established ability of neural networks to learn common structure in large datasets, in this work, we train oracles using Graph Neural Networks and evaluate them on two SLS solvers on random SAT instances of varying difficulty. We find that access to GNN-based oracles significantly boosts the performance of both solvers, allowing them, on average, to solve 17% more difficult instances (as measured by the ratio between clauses and variables), and to do so in 35% fewer steps, with improvements in the median number of steps of up to a factor of 8. As such, this work bridges formal results from theoretical computer science and practically motivated research on deep learning for constraint satisfaction problems and establishes the promise of purpose-trained SAT solvers with performance guarantees.
摘要
布尔可满足性问题(SAT)是最典型的NP完全问题,具有重要的实际意义。求解该问题的一类重要算法是随机局部搜索(SLS)算法,它以迭代且随机的方式更新候选赋值。最近的理论计算机科学成果给出了充分条件:只要SLS算法能访问合适的"oracle",即能够利用实例局部结构、从实例特定分布中采样的oracle,它们就能高效地求解SAT实例。受这些结果以及神经网络善于从大规模数据中学习共性结构的启发,我们使用图神经网络(GNN)训练oracle,并在不同难度的随机SAT实例上对两种SLS求解器进行评估。我们发现,访问基于GNN的oracle可以大幅提升SLS求解器的性能,使其平均能够求解难度高出17%的实例(以子句数与变量数之比来度量),并且求解步数减少35%,步数中位数的改进最高可达8倍。因此,这项研究将理论计算机科学的成果与面向约束满足问题的深度学习实践研究相结合,并展示了带有性能保证的专用SAT求解器的前景。
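A schematic of oracle-guided stochastic local search in the spirit described above: a WalkSAT-like loop where the variable to flip inside an unsatisfied clause is sampled from an oracle distribution. The `oracle` callable stands in for the trained GNN and is an assumption; this is not the paper's implementation.

```python
import random

def oracle_guided_sls(clauses, n_vars, oracle, max_flips=10_000):
    """clauses: list of clauses, each a list of non-zero ints (DIMACS-style literals).
    oracle(assignment, clause) -> flip probabilities over the clause's literals."""
    assign = {v: random.random() < 0.5 for v in range(1, n_vars + 1)}
    sat = lambda lit: assign[abs(lit)] if lit > 0 else not assign[abs(lit)]
    for _ in range(max_flips):
        unsat = [c for c in clauses if not any(sat(l) for l in c)]
        if not unsat:
            return assign                       # satisfying assignment found
        clause = random.choice(unsat)
        probs = oracle(assign, clause)          # instance-specific flip distribution
        var = abs(random.choices(clause, weights=probs, k=1)[0])
        assign[var] = not assign[var]           # flip the sampled variable
    return None                                 # gave up within the flip budget
```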
You Only Look at Screens: Multimodal Chain-of-Action Agents
results: 实验结果显示,Auto-UI在新的设备控制benchmark AITW上达到了状态码的性能,具有动作类型预测精度90%和总成功率74%。代码公开可用于https://github.com/cooelf/Auto-UI。Abstract
Autonomous user interface (UI) agents aim to facilitate task automation by interacting with the user interface without manual intervention. Recent studies have investigated eliciting the capabilities of large language models (LLMs) for effective engagement in diverse environments. To align with the input-output requirement of LLMs, existing approaches are developed under a sandbox setting where they rely on external tools and application-specific APIs to parse the environment into textual elements and interpret the predicted actions. Consequently, those approaches often grapple with inference inefficiency and error propagation risks. To mitigate the challenges, we introduce Auto-UI, a multimodal solution that directly interacts with the interface, bypassing the need for environment parsing or reliance on application-dependent APIs. Moreover, we propose a chain-of-action technique -- leveraging a series of intermediate previous action histories and future action plans -- to help the agent decide what action to execute. We evaluate our approach on a new device-control benchmark AITW with 30K unique instructions, spanning multi-step tasks such as application operation, web searching, and web shopping. Experimental results show that Auto-UI achieves state-of-the-art performance with an action type prediction accuracy of 90% and an overall action success rate of 74%. Code is publicly available at https://github.com/cooelf/Auto-UI.
摘要
自动化用户界面(UI)代理,目的是自动化任务,不需要人工干预。最近的研究已经利用大型自然语言模型(LLM)来实现多种环境中的有效交互。为了与输入和输出对应的LLM的需求,现有的方法采用沙盒环境,通过外部工具和应用程序特定的API来解析环境并解释预测的动作。然而,这些方法经常会遇到推理不准确和错误传递风险。为了解决这些挑战,我们提出了Auto-UI,一种多模式解决方案,可以直接与界面交互,无需解析环境或依赖于应用程序特定的API。此外,我们还提出了链条动作技术,利用前一系列的历史动作和未来动作计划,帮助代理决定执行哪一个动作。我们在新的设备控制标准AITW上进行了实验,并取得了state-of-the-art表现,具体如下:* 动作类型预测精度达90%* 总体动作成功率达74%代码可以在https://github.com/cooelf/Auto-UI上获取。
A Systematic Review of Few-Shot Learning in Medical Imaging
results: 文章显示,少样本学习可以在大多数任务中克服数据不足的问题;meta-learning是实现少样本学习最受欢迎的方法,因为它可以用少量标注样本适应新任务。此外,文章还发现,在医学影像的少样本学习中,监督学习和半监督学习是使用最多且表现最好的技术。最后,主要应用领域集中在心脏、肺和腹部。Abstract
The lack of annotated medical images limits the performance of deep learning models, which usually need large-scale labelled datasets. Few-shot learning techniques can reduce data scarcity issues and enhance medical image analysis, especially with meta-learning. This systematic review gives a comprehensive overview of few-shot learning in medical imaging. We searched the literature systematically and selected 80 relevant articles published from 2018 to 2023. We clustered the articles based on medical outcomes, such as tumour segmentation, disease classification, and image registration; anatomical structure investigated (i.e. heart, lung, etc.); and the meta-learning method used. For each cluster, we examined the papers' distributions and the results provided by the state-of-the-art. In addition, we identified a generic pipeline shared among all the studies. The review shows that few-shot learning can overcome data scarcity in most outcomes and that meta-learning is a popular choice to perform few-shot learning because it can adapt to new tasks with few labelled samples. In addition, following meta-learning, supervised learning and semi-supervised learning stand out as the predominant techniques employed to tackle few-shot learning challenges in medical imaging and also best performing. Lastly, we observed that the primary application areas predominantly encompass cardiac, pulmonary, and abdominal domains. This systematic review aims to inspire further research to improve medical image analysis and patient care.
摘要
由于标注医疗影像的缺乏,通常需要大规模标注数据集的深度学习模型的性能受到限制。少样本学习技术可以缓解数据匮乏问题,提升医学影像分析,尤其是结合元学习(meta-learning)时。本系统性综述对医学影像中的少样本学习进行了全面回顾。我们系统地检索了文献,筛选出2018年至2023年发表的80篇相关文章,并根据医疗任务(例如肿瘤分割、疾病分类、影像配准)、所研究的解剖结构(例如心脏、肺部等)以及所用的元学习方法进行分组。对每个分组,我们分析了文献的分布和最新方法取得的结果。此外,我们还归纳出了所有研究共享的通用流程。综述结果表明,少样本学习可以在大多数任务中克服数据匮乏问题;元学习是实现少样本学习最受欢迎的选择,因为它能够凭借少量标注样本适应新任务。此外,在元学习之外,监督学习和半监督学习是应对医学影像少样本学习挑战时最常用且表现最佳的技术。最后,我们发现主要应用领域集中在心脏、肺部和腹部。本系统性综述旨在激励进一步的研究,以改进医学影像分析和患者照护。
Generative Pre-Training of Time-Series Data for Unsupervised Fault Detection in Semiconductor Manufacturing
paper_authors: Sewoong Lee, JinKyou Choi, Min Su Kim
for: 这个研究旨在运用时间序列数据的特征来检测半导体制造中的异常现象。
methods: 研究使用时间卷积嵌入和生成式预训练Transformer对时间序列数据进行预训练,并使用交叉熵损失函数来区分异常时间序列和正常时间序列。
results: 研究表明,我们的模型在UCR时间序列分类数据集和化学气相沉积(CVD)设备的过程日志上都比以往的无监督模型表现更好。我们的模型在等错误率(EER)下的F1分数在所有数据集上最高,且在公开数据集上仅比有监督的最先进基线低0.026。Abstract
This paper introduces TRACE-GPT, which stands for Time-seRies Anomaly-detection with Convolutional Embedding and Generative Pre-trained Transformers. TRACE-GPT is designed to pre-train univariate time-series sensor data and detect faults on unlabeled datasets in semiconductor manufacturing. In semiconductor industry, classifying abnormal time-series sensor data from normal data is important because it is directly related to wafer defect. However, small, unlabeled, and even mixed training data without enough anomalies make classification tasks difficult. In this research, we capture features of time-series data with temporal convolutional embedding and Generative Pre-trained Transformer (GPT) to classify abnormal sequences from normal sequences using cross entropy loss. We prove that our model shows better performance than previous unsupervised models with both an open dataset, the University of California Riverside (UCR) time-series classification archive, and the process log of our Chemical Vapor Deposition (CVD) equipment. Our model has the highest F1 score at Equal Error Rate (EER) across all datasets and is only 0.026 below the supervised state-of-the-art baseline on the open dataset.
EDMP: Ensemble-of-costs-guided Diffusion for Motion Planning
results: 本文的结果显示,EDMP 的成功率与最先进的深度学习方法相当,同时保留了经典规划器所具有的泛化能力。Abstract
Classical motion planning for robotic manipulation includes a set of general algorithms that aim to minimize a scene-specific cost of executing a given plan. This approach offers remarkable adaptability, as they can be directly used off-the-shelf for any new scene without needing specific training datasets. However, without a prior understanding of what diverse valid trajectories are and without specially designed cost functions for a given scene, the overall solutions tend to have low success rates. While deep-learning-based algorithms tremendously improve success rates, they are much harder to adopt without specialized training datasets. We propose EDMP, an Ensemble-of-costs-guided Diffusion for Motion Planning that aims to combine the strengths of classical and deep-learning-based motion planning. Our diffusion-based network is trained on a set of diverse kinematically valid trajectories. Like classical planning, for any new scene at the time of inference, we compute scene-specific costs such as "collision cost" and guide the diffusion to generate valid trajectories that satisfy the scene-specific constraints. Further, instead of a single cost function that may be insufficient in capturing diversity across scenes, we use an ensemble of costs to guide the diffusion process, significantly improving the success rate compared to classical planners. EDMP performs comparably with SOTA deep-learning-based methods while retaining the generalization capabilities primarily associated with classical planners.
摘要
用于机器人操作的经典运动规划包括一组通用算法,旨在最小化执行给定计划的场景特定成本。这种方法具有很好的适应性,可以直接用于任何新场景,不需要特定的训练数据。然而,在缺乏对多样化有效轨迹的先验认识、也没有为给定场景专门设计的成本函数的情况下,其整体解的成功率往往较低。基于深度学习的算法大幅提高了成功率,但在没有专门训练数据的情况下更难采用。我们提出了EDMP,一种由成本集合引导的扩散运动规划方法(Ensemble-of-costs-guided Diffusion for Motion Planning),旨在结合经典规划与基于深度学习的运动规划的优点。我们的扩散网络在一组多样的运动学可行轨迹上训练。与经典规划类似,在推理时对任何新场景,我们计算场景特定的成本(如"碰撞成本"),并引导扩散生成满足场景特定约束的有效轨迹。此外,我们不使用单一的、可能不足以刻画场景间多样性的成本函数,而是使用成本集合来引导扩散过程,相比经典规划器显著提高了成功率。EDMP的表现与SOTA深度学习方法相当,同时保留了主要与经典规划器相关的泛化能力。
Long-Form End-to-End Speech Translation via Latent Alignment Segmentation
results: 在多种语言对和内外领域数据上,我们的方法可以 дости得状态的同声翻译质量,而且不需要额外的计算成本。Abstract
Current simultaneous speech translation models can process audio only up to a few seconds long. Contemporary datasets provide an oracle segmentation into sentences based on human-annotated transcripts and translations. However, the segmentation into sentences is not available in the real world. Current speech segmentation approaches either offer poor segmentation quality or have to trade latency for quality. In this paper, we propose a novel segmentation approach for a low-latency end-to-end speech translation. We leverage the existing speech translation encoder-decoder architecture with ST CTC and show that it can perform the segmentation task without supervision or additional parameters. To the best of our knowledge, our method is the first that allows an actual end-to-end simultaneous speech translation, as the same model is used for translation and segmentation at the same time. On a diverse set of language pairs and in- and out-of-domain data, we show that the proposed approach achieves state-of-the-art quality at no additional computational cost.
摘要
当前的同声语音翻译模型只能处理几秒长的音频。现有数据集基于人工标注的转写和翻译提供了理想的句子切分,但在真实世界中并不存在这样的切分。当前的语音切分方法要么切分质量较差,要么必须用延迟换取质量。在这篇论文中,我们提出了一种用于低延迟端到端语音翻译的新切分方法。我们利用现有的语音翻译encoder-decoder架构和 ST CTC,并证明它可以在无需监督或额外参数的情况下完成切分任务。据我们所知,我们的方法是首个实现真正端到端同声语音翻译的方法,因为同一个模型同时用于翻译和切分。在多种语言对以及领域内外数据上,我们的方法在不增加计算成本的情况下达到了最先进的质量。
Discuss Before Moving: Visual Language Navigation via Multi-expert Discussions
results: 经过广泛的实验表明,与域专家进行讨论可以有效地促进导航,提高指令相关信息的理解、更正偶极错误和筛选不一致的运动决策。相比单一自动思考,该方法在所有指标上表现出优异。Abstract
Visual language navigation (VLN) is an embodied task demanding a wide range of skills encompassing understanding, perception, and planning. For such a multifaceted challenge, previous VLN methods totally rely on one model's own thinking to make predictions within one round. However, existing models, even the most advanced large language model GPT4, still struggle with dealing with multiple tasks by single-round self-thinking. In this work, drawing inspiration from the expert consultation meeting, we introduce a novel zero-shot VLN framework. Within this framework, large models possessing distinct abilities are served as domain experts. Our proposed navigation agent, namely DiscussNav, can actively discuss with these experts to collect essential information before moving at every step. These discussions cover critical navigation subtasks like instruction understanding, environment perception, and completion estimation. Through comprehensive experiments, we demonstrate that discussions with domain experts can effectively facilitate navigation by perceiving instruction-relevant information, correcting inadvertent errors, and sifting through in-consistent movement decisions. The performances on the representative VLN task R2R show that our method surpasses the leading zero-shot VLN model by a large margin on all metrics. Additionally, real-robot experiments display the obvious advantages of our method over single-round self-thinking.
摘要
视觉语言导航(VLN)是一项具身任务,需要理解、感知和规划等多方面的能力。先前的VLN方法完全依赖单个模型在一轮之内的自我思考来做出预测,但即使是最先进的大语言模型GPT4,在仅靠单轮自我思考处理多项任务时仍然力不从心。在这项工作中,受专家会诊会议的启发,我们引入了一种新的零样本VLN框架。在该框架中,具备不同能力的大模型充当领域专家。我们提出的导航智能体DiscussNav可以在每一步移动之前主动与这些专家讨论,收集必要的信息。这些讨论涵盖指令理解、环境感知和完成度估计等关键导航子任务。通过全面的实验,我们证明了与领域专家的讨论可以通过感知与指令相关的信息、纠正无意的错误以及筛除不一致的移动决策来有效促进导航。在代表性的VLN任务R2R上,我们的方法在所有指标上以较大优势超过领先的零样本VLN模型。此外,真实机器人实验也显示了我们的方法相比单轮自我思考的明显优势。
Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff
paper_authors: Peter Polák, Brian Yan, Shinji Watanabe, Alex Waibel, Ondřej Bojar
for: simultaneous speech translation
methods: blockwise self-attentional encoder models, incremental blockwise beam search, local agreement or hold-$n$ policies
results: 0.6-3.6 BLEU improvement without changing latency or 0.8-1.4 s latency improvement without changing quality.
results: 在 MuST-C 上实验结果显示,无需改变延迟或质量,可以获得0.6-3.6 BLEU 提升,或者可以降低0.8-1.4 s 的延迟。Abstract
Blockwise self-attentional encoder models have recently emerged as one promising end-to-end approach to simultaneous speech translation. These models employ a blockwise beam search with hypothesis reliability scoring to determine when to wait for more input speech before translating further. However, this method maintains multiple hypotheses until the entire speech input is consumed -- this scheme cannot directly show a single \textit{incremental} translation to users. Further, this method lacks mechanisms for \textit{controlling} the quality vs. latency tradeoff. We propose a modified incremental blockwise beam search incorporating local agreement or hold-$n$ policies for quality-latency control. We apply our framework to models trained for online or offline translation and demonstrate that both types can be effectively used in online mode. Experimental results on MuST-C show 0.6-3.6 BLEU improvement without changing latency or 0.8-1.4 s latency improvement without changing quality.
摘要
块级自注意力编码器模型近年来已成为同声语音翻译中一种有前景的端到端方法。这些模型使用带有假设可靠性评分的块级束搜索来决定在继续翻译之前是否等待更多的输入语音。然而,这种方法会维护多个假设,直到整个语音输入被消耗——这种方案无法直接向用户呈现单一的增量翻译。此外,这种方法缺乏控制质量与延迟权衡的机制。我们提出一种改进的增量块级束搜索,并加入局部一致(local agreement)或保持-$n$(hold-$n$)策略来控制质量与延迟的权衡。我们将该框架应用于为在线或离线翻译训练的模型,并证明两类模型都可以有效地用于在线模式。在 MuST-C 上的实验结果表明,在不改变延迟的情况下可获得0.6-3.6 BLEU 的提升,或在不改变质量的情况下可获得0.8-1.4 s 的延迟改善。
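The two commit policies can be pictured with a toy sketch over token lists (illustrative only, not the paper's implementation): local agreement commits the longest prefix on which two consecutive block-level hypotheses agree, while hold-$n$ commits everything except the last $n$ tokens, never retracting already-emitted output.

```python
def local_agreement(prev_hyp, curr_hyp, committed):
    """Commit the longest common prefix of two consecutive block hypotheses,
    never shrinking what was already committed."""
    i = 0
    while i < min(len(prev_hyp), len(curr_hyp)) and prev_hyp[i] == curr_hyp[i]:
        i += 1
    return curr_hyp[:max(i, len(committed))]

def hold_n(curr_hyp, committed, n=2):
    """Commit all but the last n tokens of the current hypothesis."""
    stable = curr_hyp[:max(len(curr_hyp) - n, 0)]
    return stable if len(stable) > len(committed) else committed
```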
results: 在 i.i.d. 和非 i.i.d. 情况下,实验结果表明我们的方法可以达到领先的性能水平。Abstract
Federated Learning (FL) is a distributed machine learning approach that enables model training in a communication-efficient and privacy-preserving manner. The standard optimization method in FL is Federated Averaging (FedAvg), which performs multiple local SGD steps between communication rounds. FedAvg has been considered to lack algorithm adaptivity compared to modern first-order adaptive optimizations. In this paper, we propose new communication-efficient FL algorithms based on two adaptive frameworks: local adaptivity (PreFed) and server-side adaptivity (PreFedOp). The proposed methods adopt adaptivity by using a novel covariance matrix preconditioner. Theoretically, we provide convergence guarantees for our algorithms. The empirical experiments show our methods achieve state-of-the-art performances in both i.i.d. and non-i.i.d. settings.
摘要
联邦学习(FL)是一种分布式机器学习方法,可以以通信高效且保护隐私的方式进行模型训练。FL 中的标准优化方法是联邦平均(FedAvg),它在通信轮次之间执行多个本地 SGD 步骤。与现代一阶自适应优化方法相比,FedAvg 被认为缺乏算法自适应性。在这篇论文中,我们提出了新的通信高效的 FL 算法,基于两种自适应框架:本地自适应(PreFed)和服务器端自适应(PreFedOp)。所提出的方法通过一种新颖的协方差矩阵预条件子来引入自适应性。我们在理论上为算法提供了收敛保证。实验表明,我们的方法在 i.i.d. 和非 i.i.d. 设置下均达到了当前最佳性能。
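A rough, assumption-laden sketch of one FedAvg-style communication round with an optional diagonal preconditioner applied to the averaged update; the paper's covariance-matrix preconditioner is more involved, and `grad_fn` plus the `precond` argument are hypothetical placeholders for illustration only.

```python
import numpy as np

def fedavg_round(global_w, client_data, local_steps=5, lr=0.1, precond=None):
    """One round: each client runs local SGD, the server averages the updates
    and optionally rescales them with a diagonal preconditioner.
    client_data: list of (grad_fn, n_samples); grad_fn(w) returns a gradient."""
    deltas = []
    for grad_fn, n_samples in client_data:
        w = global_w.copy()
        for _ in range(local_steps):
            w -= lr * grad_fn(w)                      # local SGD step
        deltas.append((w - global_w, n_samples))
    total = sum(n for _, n in deltas)
    avg_delta = sum(d * (n / total) for d, n in deltas)   # weighted average update
    if precond is not None:                           # e.g. elementwise rescaling
        avg_delta = precond * avg_delta
    return global_w + avg_delta
```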
Dynamic Hand Gesture-Featured Human Motor Adaptation in Tool Delivery using Voice Recognition
results: 实验结果表明,静态手势识别模块的准确率为94.3%,动态运动识别模块的准确率为97.6%。与人单独操作相比,所提出的方法可以提高工具交付的效率,且不会明显干扰人类意图。Abstract
Human-robot collaboration has benefited users with higher efficiency towards interactive tasks. Nevertheless, most collaborative schemes rely on complicated human-machine interfaces, which might lack the requisite intuitiveness compared with natural limb control. We also expect to understand human intent with low training data requirements. In response to these challenges, this paper introduces an innovative human-robot collaborative framework that seamlessly integrates hand gesture and dynamic movement recognition, voice recognition, and a switchable control adaptation strategy. These modules provide a user-friendly approach that enables the robot to deliver the tools as per user need, especially when the user is working with both hands. Therefore, users can focus on their task execution without additional training in the use of human-machine interfaces, while the robot interprets their intuitive gestures. The proposed multimodal interaction framework is executed in the UR5e robot platform equipped with a RealSense D435i camera, and the effectiveness is assessed through a soldering circuit board task. The experiment results have demonstrated superior performance in hand gesture recognition, where the static hand gesture recognition module achieves an accuracy of 94.3\%, while the dynamic motion recognition module reaches 97.6\% accuracy. Compared with human solo manipulation, the proposed approach facilitates higher efficiency tool delivery, without significantly distracting from human intents.
摘要
人机协作已经在交互任务中为用户带来更高的效率。然而,大多数协作方案依赖复杂的人机界面,与自然的肢体控制相比可能缺乏必要的直观性。我们还希望在训练数据量少的情况下理解人类意图。为应对这些挑战,本文介绍了一种创新的人机协作框架,它无缝集成了手势与动态运动识别、语音识别以及可切换的控制自适应策略。这些模块提供了一种用户友好的方式,使机器人能够按用户需要递送工具,尤其是在用户双手都在工作时。因此,用户无需为人机界面的使用接受额外培训即可专注于任务执行,而机器人则负责理解其直观的手势。所提出的多模态交互框架在配备RealSense D435i摄像头的UR5e机器人平台上实现,并通过电路板焊接任务进行了评估。实验结果显示,静态手势识别模块的准确率为94.3%,动态运动识别模块的准确率达97.6%。与人单独操作相比,所提出的方法可以提高工具交付效率,且不会明显干扰人类意图。
Knowledge Graph Question Answering for Materials Science (KGQA4MAT): Developing Natural Language Interface for Metal-Organic Frameworks Knowledge Graph (MOF-KG)
paper_authors: Yuan An, Jane Greenberg, Alex Kalinowski, Xintong Zhao, Xiaohua Hu, Fernando J. Uribe-Romo, Kyle Langlois, Jacob Furst, Diego A. Gómez-Gualdrón
results: 研究发现ChatGPT可以有效地处理不同平台和查询语言上的知识图谱问答问题,并有助于加速材料科学领域知识图谱的查询和探索。Abstract
We present a comprehensive benchmark dataset for Knowledge Graph Question Answering in Materials Science (KGQA4MAT), with a focus on metal-organic frameworks (MOFs). A knowledge graph for metal-organic frameworks (MOF-KG) has been constructed by integrating structured databases and knowledge extracted from the literature. To enhance MOF-KG accessibility for domain experts, we aim to develop a natural language interface for querying the knowledge graph. We have developed a benchmark comprised of 161 complex questions involving comparison, aggregation, and complicated graph structures. Each question is rephrased in three additional variations, resulting in 644 questions and 161 KG queries. To evaluate the benchmark, we have developed a systematic approach for utilizing ChatGPT to translate natural language questions into formal KG queries. We also apply the approach to the well-known QALD-9 dataset, demonstrating ChatGPT's potential in addressing KGQA issues for different platforms and query languages. The benchmark and the proposed approach aim to stimulate further research and development of user-friendly and efficient interfaces for querying domain-specific materials science knowledge graphs, thereby accelerating the discovery of novel materials.
摘要
我们提出了一个面向材料科学知识图谱问答的综合基准数据集(KGQA4MAT),重点关注金属有机框架(MOFs)。我们通过整合结构化数据库和从文献中抽取的知识,构建了金属有机框架知识图谱(MOF-KG)。为了让领域专家更容易使用MOF-KG,我们致力于开发一个用于查询该知识图谱的自然语言接口。我们构建了一个包含161个复杂问题的基准,涉及比较、聚合以及复杂的图结构;每个问题另有三种改写形式,共计644个问题和161条知识图谱查询。为评估该基准,我们开发了一种系统化的方法,利用ChatGPT将自然语言问题转换为形式化的知识图谱查询。我们还将该方法应用于著名的QALD-9数据集,展示了ChatGPT在不同平台和查询语言上应对知识图谱问答问题的潜力。该基准和所提方法旨在推动针对领域特定材料科学知识图谱的用户友好、高效查询接口的进一步研究与开发,从而加速新材料的发现。
paper_authors: Simone Maurizio La Cava, Giulia Orrù, Martin Drahansky, Gian Luca Marcialis, Fabio Roli
for: 法医领域中的3D面部重建应用
methods: 使用监控视频和嫌疑人照片进行3D面部重建
results: 研究指出了现有方法的约束与局限,3D面部重建在法医领域的积极作用尚未确立Abstract
3D face reconstruction algorithms from images and videos are applied to many fields, from plastic surgery to the entertainment sector, thanks to their advantageous features. However, when looking at forensic applications, 3D face reconstruction must observe strict requirements that still make its possible role in bringing evidence to a lawsuit unclear. An extensive investigation of the constraints, potential, and limits of its application in forensics is still missing. Shedding some light on this matter is the goal of the present survey, which starts by clarifying the relation between forensic applications and biometrics, with a focus on face recognition. Therefore, it provides an analysis of the achievements of 3D face reconstruction algorithms from surveillance videos and mugshot images and discusses the current obstacles that separate 3D face reconstruction from an active role in forensic applications. Finally, it examines the underlying data sets, with their advantages and limitations, while proposing alternatives that could substitute or complement them.
摘要
基于图像和视频的三维面部重建算法凭借其优势被应用于从整形外科到娱乐业等多个领域。然而,在法医应用中,三维面部重建必须满足严格的要求,这使得它能否在诉讼中提供证据的作用仍不明确。目前仍缺少对其在法医领域应用的约束、潜力和局限的深入研究。本综述旨在厘清这一问题:首先阐明法医应用与生物特征识别(尤其是人脸识别)之间的关系;随后分析基于监控视频和嫌疑人照片的三维面部重建算法所取得的成果,并讨论阻碍三维面部重建在法医应用中发挥积极作用的现有障碍;最后,考察相关数据集的优点和局限,并提出可以替代或补充它们的方案。
paper_authors: Chathurangi Shyalika, Ruwan Wickramarachchi, Amit Sheth
For: 本研究主要针对频率低的罕见事件预测,即使用机器学习和数据分析方法来预测这些事件的发生。* Methods: 本文综述了目前预测罕见事件的方法,包括数据处理、算法方法和评估方法等,并从不同的数据模式和预测方法角度进行了梳理和分析。* Results: 本文结果显示,预测罕见事件存在许多挑战,如数据不均衡、模型偏向等问题,同时还存在许多研究缺乏或未得到充分发挥的问题。Abstract
Rare event prediction involves identifying and forecasting events with a low probability using machine learning and data analysis. Due to the imbalanced data distributions, where the frequency of common events vastly outweighs that of rare events, it requires using specialized methods within each step of the machine learning pipeline, i.e., from data processing to algorithms to evaluation protocols. Predicting the occurrences of rare events is important for real-world applications, such as Industry 4.0, and is an active research area in statistical and machine learning. This paper comprehensively reviews the current approaches for rare event prediction along four dimensions: rare event data, data processing, algorithmic approaches, and evaluation approaches. Specifically, we consider 73 datasets from different modalities (i.e., numerical, image, text, and audio), four major categories of data processing, five major algorithmic groupings, and two broader evaluation approaches. This paper aims to identify gaps in the current literature and highlight the challenges of predicting rare events. It also suggests potential research directions, which can help guide practitioners and researchers.
摘要
罕见事件预测是指利用机器学习和数据分析来识别和预测低概率事件。由于数据分布严重不均衡,常见事件的频率远远高于罕见事件,因此需要在机器学习流程的每个环节(从数据处理、算法到评估协议)使用专门的方法。预测罕见事件的发生对工业4.0等现实应用十分重要,也是统计与机器学习领域的活跃研究方向。本文从四个维度全面回顾了当前罕见事件预测的方法:罕见事件数据、数据处理、算法方法和评估方法。具体而言,我们考察了来自不同模态(数值、图像、文本和音频)的73个数据集、四大类数据处理方法、五大类算法以及两大类评估方法。本文旨在识别现有文献中的空白,突出预测罕见事件所面临的挑战,并提出潜在的研究方向,以帮助实践者和研究人员。
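As a concrete illustration of one of the simplest imbalance-aware baselines that surveys like this cover (not a method proposed in the paper), the sketch below trains a class-weighted classifier on synthetic data with roughly 1% positives and reports PR-AUC; the data here is random and only demonstrates the mechanics.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(0)
X = rng.normal(size=(20_000, 10))
y = (rng.random(20_000) < 0.01).astype(int)          # ~1% positives: a rare event

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" reweights errors on the rare class by inverse class
# frequency, a common first remedy for heavily imbalanced data.
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]
print("PR-AUC:", average_precision_score(y_te, scores))
```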
C$\cdot$ASE: Learning Conditional Adversarial Skill Embeddings for Physics-based Characters
results: 论文表明,使用C$\cdot$ASE可以生成高度多样化和现实的技能动作,并且可以在不同的下游任务中重用。此外,该系统还提供了一个高级别的政策或用户可以使用某种技能特定的指定来控制角色的行为。Abstract
We present C$\cdot$ASE, an efficient and effective framework that learns conditional Adversarial Skill Embeddings for physics-based characters. Our physically simulated character can learn a diverse repertoire of skills while providing controllability in the form of direct manipulation of the skills to be performed. C$\cdot$ASE divides the heterogeneous skill motions into distinct subsets containing homogeneous samples for training a low-level conditional model to learn conditional behavior distribution. The skill-conditioned imitation learning naturally offers explicit control over the character's skills after training. The training course incorporates the focal skill sampling, skeletal residual forces, and element-wise feature masking to balance diverse skills of varying complexities, mitigate dynamics mismatch to master agile motions and capture more general behavior characteristics, respectively. Once trained, the conditional model can produce highly diverse and realistic skills, outperforming state-of-the-art models, and can be repurposed in various downstream tasks. In particular, the explicit skill control handle allows a high-level policy or user to direct the character with desired skill specifications, which we demonstrate is advantageous for interactive character animation.
摘要
我们提出C$\cdot$ASE,一种高效且有效的框架,为基于物理的角色学习条件对抗技能嵌入(Conditional Adversarial Skill Embeddings)。我们的物理仿真角色可以学习多样的技能组合,同时通过直接指定要执行的技能来提供可控性。C$\cdot$ASE将异质的技能动作划分为包含同质样本的不同子集,用于训练低层条件模型,学习条件行为分布。基于技能条件的模仿学习天然地在训练后提供了对角色技能的显式控制。训练过程还结合了焦点技能采样、骨骼残余力和逐元素特征掩码,分别用于平衡复杂度各异的多种技能、缓解动力学失配以掌握敏捷动作,以及捕捉更一般的行为特征。训练完成后,条件模型可以生成高度多样且逼真的技能,超越当前最先进的模型,并可在各种下游任务中复用。特别是,显式的技能控制接口允许高层策略或用户用所需的技能规格来指挥角色,我们证明这对交互式角色动画十分有利。
paper_authors: Prottay Kumar Adhikary, Bandaru Sugandhi, Subhojit Ghimire, Santanu Pal, Partha Pakray
for: 这篇论文是为了提供一种实现语言翻译的视频翻译系统,以便在不同语言背景下进行有效的沟通。
methods: 该系统使用了一种端到端的视频翻译方法,不仅翻译语音,还将翻译后的语音与说话人的唇部动作同步。
results: 该系统可以帮助学生和用户在低资源环境中进行有效的学习和沟通,同时提供了一种更加真实和吸引人的学习环境,从而提高学习效果和参与度。Abstract
In today's globalized world, effective communication with people from diverse linguistic backgrounds has become increasingly crucial. While traditional methods of language translation, such as written text or voice-only translations, can accomplish the task, they often fail to capture the complete context and nuanced information conveyed through nonverbal cues like facial expressions and lip movements. In this paper, we present an end-to-end video translation system that not only translates spoken language but also synchronizes the translated speech with the lip movements of the speaker. Our system focuses on translating educational lectures in various Indian languages, and it is designed to be effective even in low-resource system settings. By incorporating lip movements that align with the target language and matching them with the speaker's voice using voice cloning techniques, our application offers an enhanced experience for students and users. This additional feature creates a more immersive and realistic learning environment, ultimately making the learning process more effective and engaging.
摘要
在今天的全球化世界中,与不同语言背景的人进行有效沟通变得越来越重要。传统的语言翻译方法,如文字翻译或纯语音翻译,虽然可以完成任务,但往往无法捕捉面部表情和唇部动作等非语言线索所传达的完整上下文和细微信息。在这篇论文中,我们提出了一个端到端视频翻译系统,不仅翻译口语,还将翻译后的语音与说话人的唇部动作同步。我们的系统专注于翻译多种印度语言的教育讲座,并且即使在低资源系统环境下也能有效运行。通过加入与目标语言对齐的唇部动作,并利用语音克隆技术使其与说话人的声音匹配,我们的应用为学生和用户提供了更好的体验。这一附加特性营造了更加沉浸和真实的学习环境,最终使学习过程更加有效且更具参与感。
Gold-YOLO: Efficient Object Detector via Gather-and-Distribute Mechanism
paper_authors: Chengcheng Wang, Wei He, Ying Nie, Jianyuan Guo, Chuanjian Liu, Kai Han, Yunhe Wang
for: This paper aims to improve the object detection performance of YOLO-series models by introducing a new Gather-Distribute (GD) mechanism and implementing MAE-style pretraining.
methods: The proposed Gold-YOLO model uses a GD mechanism that combines convolution and self-attention operations to improve multi-scale feature fusion. The model also uses MAE-style pretraining to enhance the performance.
results: The Gold-YOLO model achieves an outstanding 39.9% AP on the COCO val2017 dataset and 1030 FPS on a T4 GPU, outperforming the previous SOTA model YOLOv6-3.0-N by +2.4% in terms of AP.Abstract
In the past years, YOLO-series models have emerged as the leading approaches in the area of real-time object detection. Many studies pushed up the baseline to a higher level by modifying the architecture, augmenting data and designing new losses. However, we find previous models still suffer from information fusion problem, although Feature Pyramid Network (FPN) and Path Aggregation Network (PANet) have alleviated this. Therefore, this study provides an advanced Gatherand-Distribute mechanism (GD) mechanism, which is realized with convolution and self-attention operations. This new designed model named as Gold-YOLO, which boosts the multi-scale feature fusion capabilities and achieves an ideal balance between latency and accuracy across all model scales. Additionally, we implement MAE-style pretraining in the YOLO-series for the first time, allowing YOLOseries models could be to benefit from unsupervised pretraining. Gold-YOLO-N attains an outstanding 39.9% AP on the COCO val2017 datasets and 1030 FPS on a T4 GPU, which outperforms the previous SOTA model YOLOv6-3.0-N with similar FPS by +2.4%. The PyTorch code is available at https://github.com/huawei-noah/Efficient-Computing/tree/master/Detection/Gold-YOLO, and the MindSpore code is available at https://gitee.com/mindspore/models/tree/master/research/cv/Gold_YOLO.
摘要
在过去的几年中,YOLO系列模型在实时目标检测领域取得了领先地位。许多研究通过修改架构、增强数据和设计新的损失函数,将基线提升到更高水平。然而,我们发现先前的模型仍然受到信息融合问题的困扰,尽管Feature Pyramid Network(FPN)和Path Aggregation Network(PANet)已经缓解了这个问题。因此,本研究提出了一种先进的聚合-分发(Gather-and-Distribute, GD)机制,通过卷积和自注意力操作实现。这一新设计的模型被称为 Gold-YOLO,它增强了多尺度特征融合能力,并在所有模型规模上实现了延迟与准确率之间的理想平衡。此外,我们首次在 YOLO 系列模型中实施了 MAE 风格的预训练,使 YOLO 系列模型能够从无监督预训练中受益。Gold-YOLO-N 在 COCO val2017 数据集上达到了出色的 39.9% AP,并在 T4 GPU 上达到 1030 FPS,在相近 FPS 下比先前的 SOTA 模型 YOLOv6-3.0-N 高出 2.4% AP。PyTorch 代码可在 https://github.com/huawei-noah/Efficient-Computing/tree/master/Detection/Gold-YOLO 找到,MindSpore 代码可在 https://gitee.com/mindspore/models/tree/master/research/cv/Gold_YOLO 找到。
Dynamic Pricing of Applications in Cloud Marketplaces using Game Theory
paper_authors: Safiye Ghasemi, Mohammad Reza Meybodi, Mehdi Dehghan Takht-Fooladi, Amir Masoud Rahmani
for: 这个论文旨在研究云市场竞争对应的价格策略,以帮助企业更好地制定价格策略。
methods: 该论文采用博弈论来设计动态价格策略,并在委员会框架下考虑多家提供商之间的竞争。
results: 该论文通过数学模型将云市场竞争建模为博弈,证明了纳什均衡的存在性和唯一性,从而为企业提供了新的动态价格策略。Abstract
The competitive nature of Cloud marketplaces as new concerns in delivery of services makes the pricing policies a crucial task for firms. so that, pricing strategies has recently attracted many researchers. Since game theory can handle such competing well this concern is addressed by designing a normal form game between providers in current research. A committee is considered in which providers register for improving their competition based pricing policies. The functionality of game theory is applied to design dynamic pricing policies. The usage of the committee makes the game a complete information one, in which each player is aware of every others payoff functions. The players enhance their pricing policies to maximize their profits. The contribution of this paper is the quantitative modeling of Cloud marketplaces in form of a game to provide novel dynamic pricing strategies; the model is validated by proving the existence and the uniqueness of Nash equilibrium of the game.
摘要
云市场作为服务交付中的新兴竞争环境,使价格策略成为企业的一项关键任务,因此价格策略最近吸引了许多研究人员。由于博弈论能够很好地处理这类竞争,本研究通过在提供商之间设计一个标准形式博弈来回应这一问题:设立一个委员会,提供商通过注册来改进其基于竞争的价格策略。利用博弈论设计了动态价格策略。委员会的存在使该博弈成为完全信息博弈,每个参与者都知道其他参与者的收益函数,参与者通过改进价格策略来最大化利润。本文的贡献在于以博弈的形式对云市场进行量化建模,从而提供新的动态价格策略;并通过证明该博弈纳什均衡的存在性与唯一性来验证模型。
A Competition-based Pricing Strategy in Cloud Markets using Regret Minimization Techniques
paper_authors: S. Ghasemi, M. R. Meybodi, M. Dehghan, A. M. Rahmani
for: This paper aims to address the challenge of pricing in Cloud computing marketplaces, where providers compete without knowing each other’s pricing policies.
methods: The paper proposes a pricing policy based on regret minimization and applies it to an incomplete-information game modeling the competition among Cloud providers. The algorithm updates the distribution of strategies based on experienced regret, leading to faster minimization of regret and increased profits for providers.
results: The experimental results show that the proposed pricing policy leads to much greater increases in provider profits compared to other pricing policies, and the efficiency of various regret minimization techniques in a simulated Cloud marketplace is discussed. Additionally, the study examines the return on investment of providers in the considered organizations and finds promising results.
for: 这篇论文旨在解决云计算市场中的定价问题,即提供商在互不知晓彼此定价策略的情况下展开竞争。
results: 实验结果表明,与其他价格策略相比,所提出的价格策略使提供商利润大幅增长;论文还在模拟的云市场中讨论了多种后悔最小化技术的效率。此外,论文研究了所考虑组织中提供商的投资回报,并取得了令人鼓舞的结果。Abstract
Cloud computing as a fairly new commercial paradigm, widely investigated by different researchers, already has a great range of challenges. Pricing is a major problem in Cloud computing marketplace; as providers are competing to attract more customers without knowing the pricing policies of each other. To overcome this lack of knowledge, we model their competition by an incomplete-information game. Considering the issue, this work proposes a pricing policy related to the regret minimization algorithm and applies it to the considered incomplete-information game. Based on the competition based marketplace of the Cloud, providers update the distribution of their strategies using the experienced regret. The idea of iteratively applying the algorithm for updating probabilities of strategies causes the regret get minimized faster. The experimental results show much more increase in profits of the providers in comparison with other pricing policies. Besides, the efficiency of a variety of regret minimization techniques in a simulated marketplace of Cloud are discussed which have not been observed in the studied literature. Moreover, return on investment of providers in considered organizations is studied and promising results appeared.
摘要
云计算作为一种较新的商业模式,已被不同的研究者广泛研究,但仍面临诸多挑战。在云计算市场中,定价是一个主要问题:提供商在互不知晓彼此定价策略的情况下竞争以吸引更多客户。为弥补这种信息缺失,我们将它们之间的竞争建模为一个不完全信息博弈。针对这一问题,本文提出了一种基于后悔最小化算法的价格策略,并将其应用于上述不完全信息博弈。在云计算这种以竞争为基础的市场中,提供商根据经历的后悔来更新其策略分布;迭代地应用该算法更新策略概率,可以使后悔更快地被最小化。实验结果表明,与其他价格策略相比,提供商的利润增长显著。此外,我们还讨论了多种后悔最小化技术在模拟云市场中的效率,这在已有文献中尚未被研究。另外,我们还研究了所考虑组织中提供商的投资回报,并取得了令人鼓舞的结果。
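For readers unfamiliar with regret minimization, a minimal regret-matching loop of the kind such a pricing policy could build on might look like the sketch below; `payoff_fn`, the candidate action set, and the full-information regret computation are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def regret_matching(payoff_fn, actions, rounds=1000):
    """payoff_fn(a) returns this round's payoff for candidate action a
    (e.g., a candidate price), given what the other providers played."""
    cum_regret = np.zeros(len(actions))
    for _ in range(rounds):
        positive = np.maximum(cum_regret, 0.0)
        probs = (positive / positive.sum() if positive.sum() > 0
                 else np.full(len(actions), 1.0 / len(actions)))
        choice = np.random.choice(len(actions), p=probs)
        payoffs = np.array([payoff_fn(a) for a in actions])
        # Regret of not having played each alternative instead of `choice`.
        cum_regret += payoffs - payoffs[choice]
    positive = np.maximum(cum_regret, 0.0)
    if positive.sum() == 0:
        return np.full(len(actions), 1.0 / len(actions))
    return positive / positive.sum()          # final mixed pricing strategy
```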
Rating Prediction in Conversational Task Assistants with Behavioral and Conversational-Flow Features
results: 我们的结果表明,模型对话流程和用户行为方面的特征可以在单个模型中结合,以预测Offline评分。此外,对CTA特有的行为特征进行分析,可以为未来系统提供参考。Abstract
Predicting the success of Conversational Task Assistants (CTA) can be critical to understand user behavior and act accordingly. In this paper, we propose TB-Rater, a Transformer model which combines conversational-flow features with user behavior features for predicting user ratings in a CTA scenario. In particular, we use real human-agent conversations and ratings collected in the Alexa TaskBot challenge, a novel multimodal and multi-turn conversational context. Our results show the advantages of modeling both the conversational-flow and behavioral aspects of the conversation in a single model for offline rating prediction. Additionally, an analysis of the CTA-specific behavioral features brings insights into this setting and can be used to bootstrap future systems.
摘要
预测对话任务助手(CTA)的成功与否,对于理解用户行为并据此采取行动至关重要。在这篇论文中,我们提出了TB-Rater,一个基于Transformer的模型,它结合对话流程特征和用户行为特征来预测CTA场景中的用户评分。具体来说,我们使用了Alexa TaskBot挑战中收集的真实人机对话和评分数据,这是一个新颖的多模态、多轮对话场景。我们的结果表明,在单个模型中同时建模对话流程和行为两方面的特征,有利于离线评分预测。此外,对CTA特有的行为特征的分析为这一场景提供了洞见,可用于引导未来系统的构建。
FaceDiffuser: Speech-Driven 3D Facial Animation Synthesis Using Diffusion
paper_authors: Stefan Stan, Kazi Injamamul Haque, Zerrin Yumak
for: 这篇论文旨在解决现有方法大多采用确定性深度学习方法进行语音驱动3D面部动画合成、难以准确捕捉非语言面部线索的问题。
results: 我们的方法与现有方法相比取得了更好或相当的结果,并且引入了一个新的基于blendshape绑定角色的数据集。Abstract
Speech-driven 3D facial animation synthesis has been a challenging task both in industry and research. Recent methods mostly focus on deterministic deep learning methods meaning that given a speech input, the output is always the same. However, in reality, the non-verbal facial cues that reside throughout the face are non-deterministic in nature. In addition, majority of the approaches focus on 3D vertex based datasets and methods that are compatible with existing facial animation pipelines with rigged characters is scarce. To eliminate these issues, we present FaceDiffuser, a non-deterministic deep learning model to generate speech-driven facial animations that is trained with both 3D vertex and blendshape based datasets. Our method is based on the diffusion technique and uses the pre-trained large speech representation model HuBERT to encode the audio input. To the best of our knowledge, we are the first to employ the diffusion method for the task of speech-driven 3D facial animation synthesis. We have run extensive objective and subjective analyses and show that our approach achieves better or comparable results in comparison to the state-of-the-art methods. We also introduce a new in-house dataset that is based on a blendshape based rigged character. We recommend watching the accompanying supplementary video. The code and the dataset will be publicly available.
摘要
语音驱动的3D面部动画合成在工业界和研究界都是一项具有挑战性的任务。现有方法大多采用确定性深度学习方法,即给定同一段语音输入,输出总是相同的。然而,现实中遍布面部的非语言面部线索本质上是非确定性的。此外,大多数方法都集中在基于3D顶点的数据集上,与现有的绑定角色面部动画流水线兼容的方法很少。为解决这些问题,我们提出FaceDiffuser,一种非确定性深度学习模型,用于生成语音驱动的3D面部动画,它同时在基于3D顶点和基于blendshape的数据集上训练。我们的方法基于扩散技术,使用预训练的大型语音表示模型HuBERT来编码音频输入。据我们所知,我们是首个将扩散方法用于语音驱动3D面部动画合成任务的工作。我们进行了广泛的客观和主观分析,结果表明我们的方法取得了优于或相当于当前最先进方法的成绩。我们还引入了一个新的基于blendshape绑定角色的内部数据集。建议观看随附的补充视频。代码和数据集将公开发布。
A Cost-Aware Mechanism for Optimized Resource Provisioning in Cloud Computing
paper_authors: Safiye Ghasemi, Mohammad Reza Meybodi, Mehdi Dehghan Takht Fooladi, Amir Masoud Rahmani
for: 这篇论文旨在提出一种新的资源配置方法,以减少资源配置成本的方式来满足需求。
methods: 本文使用学习自动机(learning automata)来为每个服务选择最合适的托管资源,并同时考虑成本和服务需求。
results: 实验结果显示,我们的方法能够有效地运行许多不同类型的应用程序,并且可以适当地减少资源配置成本。Abstract
Due to the recent wide use of computational resources in cloud computing, new resource provisioning challenges have been emerged. Resource provisioning techniques must keep total costs to a minimum while meeting the requirements of the requests. According to widely usage of cloud services, it seems more challenging to develop effective schemes for provisioning services cost-effectively; we have proposed a novel learning based resource provisioning approach that achieves cost-reduction guarantees of demands. The contributions of our optimized resource provisioning (ORP) approach are as follows. Firstly, it is designed to provide a cost-effective method to efficiently handle the provisioning of requested applications; while most of the existing models allow only workflows in general which cares about the dependencies of the tasks, ORP performs based on services of which applications comprised and cares about their efficient provisioning totally. Secondly, it is a learning automata-based approach which selects the most proper resources for hosting each service of the demanded application; our approach considers both cost and service requirements together for deploying applications. Thirdly, a comprehensive evaluation is performed for three typical workloads: data-intensive, process-intensive and normal applications. The experimental results show that our method adapts most of the requirements efficiently, and furthermore the resulting performance meets our design goals.
摘要
The contributions of our optimized resource provisioning (ORP) approach are as follows: 1. Cost-effective method: ORP provides a cost-effective method to efficiently handle the provisioning of requested applications, while most existing models only consider workflows in general and ignore the dependencies of tasks. ORP takes into account the services that applications comprise and cares about their efficient provisioning. 2. Learning automata-based approach: ORP is a learning automata-based approach that selects the most appropriate resources for hosting each service of the demanded application. Our approach considers both cost and service requirements together for deploying applications. 3. Comprehensive evaluation: We conducted a comprehensive evaluation for three typical workloads: data-intensive, process-intensive, and normal applications. The experimental results show that our method adapts to most of the requirements efficiently, and the resulting performance meets our design goals.
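To make the learning-automata idea above concrete, here is a minimal sketch of a linear reward-penalty automaton choosing a hosting resource for a single service. The resource pool, feedback model, and learning rates are illustrative placeholders and not the ORP paper's actual algorithm or parameters.

```python
import random

# Minimal sketch (not the paper's algorithm) of a linear reward-penalty
# learning automaton choosing a hosting resource for one service.
resources = ["vm.small", "vm.medium", "vm.large"]      # hypothetical resource pool
probs = [1.0 / len(resources)] * len(resources)        # start from a uniform policy
a, b = 0.1, 0.05                                       # reward / penalty learning rates

def feedback(choice):
    """Toy environment: returns True when the chosen resource satisfied the
    service requirement at acceptable cost (random here, for illustration)."""
    quality = {"vm.small": 0.3, "vm.medium": 0.7, "vm.large": 0.5}
    return random.random() < quality[choice]

for _ in range(2000):
    i = random.choices(range(len(resources)), weights=probs)[0]
    if feedback(resources[i]):
        # reward: shift probability mass toward the chosen resource
        probs = [p + a * (1 - p) if j == i else (1 - a) * p
                 for j, p in enumerate(probs)]
    else:
        # penalty: redistribute probability mass to the other resources
        probs = [(1 - b) * p if j == i
                 else b / (len(resources) - 1) + (1 - b) * p
                 for j, p in enumerate(probs)]

print(dict(zip(resources, (round(p, 3) for p in probs))))
```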
CPLLM: Clinical Prediction with Large Language Models
results: 对于不同的基线模型,包括Logistic Regression、RETAIN和Med-BERT,我们的CPLLM模型在PR-AUC和ROC-AUC metric上都显示出了明显的提升,较baseline模型更高。Abstract
We present Clinical Prediction with Large Language Models (CPLLM), a method that involves fine-tuning a pre-trained Large Language Model (LLM) for clinical disease prediction. We utilized quantization and fine-tuned the LLM using prompts, with the task of predicting whether patients will be diagnosed with a target disease during their next visit or in the subsequent diagnosis, leveraging their historical diagnosis records. We compared our results versus various baselines, including Logistic Regression, RETAIN, and Med-BERT, which is the current state-of-the-art model for disease prediction using structured EHR data. Our experiments have shown that CPLLM surpasses all the tested models in terms of both PR-AUC and ROC-AUC metrics, displaying noteworthy enhancements compared to the baseline models.
摘要
我们提出了基于大型语言模型的临床预测方法(CPLLM),即对预训练的大型语言模型(LLM)进行微调,以预测患者在下一次就诊或后续诊断中是否会被诊断出目标疾病。我们采用量化技术,并通过提示对LLM进行微调,使其能够利用患者的历史诊断记录进行预测。我们与多个基线模型进行了比较,包括Logistic Regression、RETAIN和Med-BERT,其中Med-BERT是当前使用结构化电子健康记录数据进行疾病预测的最先进模型。实验结果表明,CPLLM在PR-AUC和ROC-AUC指标上均超过了所有被测试的模型,相比基线模型有显著提升。
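The abstract describes prompting the fine-tuned LLM with a patient's historical diagnoses; the sketch below shows one plausible way such a prompt could be assembled. The template wording and example codes are hypothetical, since the paper's exact prompt format is not reproduced here.

```python
# A minimal sketch of how a clinical next-diagnosis prompt could be assembled
# from a patient's historical diagnosis codes. The template, field names and
# example codes are hypothetical; the paper's exact prompt format is not given here.
def build_prompt(history, target_disease):
    visits = "; ".join(", ".join(codes) for codes in history)
    return (
        f"The patient had the following diagnoses in previous visits: {visits}. "
        f"Will the patient be diagnosed with {target_disease} at the next visit? "
        f"Answer yes or no."
    )

history = [["I10 hypertension"], ["E11 type 2 diabetes", "I10 hypertension"]]
print(build_prompt(history, "chronic kidney disease"))
```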
Overview of AuTexTification at IberLEF 2023: Detection and Attribution of Machine-Generated Text in Multiple Domains
results: 这篇论文描述了AuTexTification2023数据集,包含了英语和西班牙语的160,000多个文本,来自五个领域(微博、评论、新闻、法律和使用教程)。总共有114个团队参加了比赛,其中36个团队发送了175个运行,20个团队发送了工作笔记。在这篇报告中,我们介绍了AuTexTification数据集和任务,参与系统,以及结果。Abstract
This paper presents the overview of the AuTexTification shared task as part of the IberLEF 2023 Workshop in Iberian Languages Evaluation Forum, within the framework of the SEPLN 2023 conference. AuTexTification consists of two subtasks: for Subtask 1, participants had to determine whether a text is human-authored or has been generated by a large language model. For Subtask 2, participants had to attribute a machine-generated text to one of six different text generation models. Our AuTexTification 2023 dataset contains more than 160.000 texts across two languages (English and Spanish) and five domains (tweets, reviews, news, legal, and how-to articles). A total of 114 teams signed up to participate, of which 36 sent 175 runs, and 20 of them sent their working notes. In this overview, we present the AuTexTification dataset and task, the submitted participating systems, and the results.
摘要
Rethinking Sensors Modeling: Hierarchical Information Enhanced Traffic Forecasting
methods: 本文提出了一个 Hierarchical Information Enhanced Spatio-Temporal prediction 方法(HIEST),它将感应器之间的依赖性分为两层:地域层和全球层。
results: 实验结果显示,HIEST 方法相比现有基线取得了领先的性能。Abstract
With the acceleration of urbanization, traffic forecasting has become an essential role in smart city construction. In the context of spatio-temporal prediction, the key lies in how to model the dependencies of sensors. However, existing works basically only consider the micro relationships between sensors, where the sensors are treated equally, and their macroscopic dependencies are neglected. In this paper, we argue to rethink the sensor's dependency modeling from two hierarchies: regional and global perspectives. Particularly, we merge original sensors with high intra-region correlation as a region node to preserve the inter-region dependency. Then, we generate representative and common spatio-temporal patterns as global nodes to reflect a global dependency between sensors and provide auxiliary information for spatio-temporal dependency learning. In pursuit of the generality and reality of node representations, we incorporate a Meta GCN to calibrate the regional and global nodes in the physical data space. Furthermore, we devise the cross-hierarchy graph convolution to propagate information from different hierarchies. In a nutshell, we propose a Hierarchical Information Enhanced Spatio-Temporal prediction method, HIEST, to create and utilize the regional dependency and common spatio-temporal patterns. Extensive experiments have verified the leading performance of our HIEST against state-of-the-art baselines. We publicize the code to ease reproducibility.
摘要
随着城市化的加速,交通预测已成为智慧城市建设中的一项重要任务。在时空预测的背景下,关键在于如何建模传感器之间的依赖关系。然而,现有工作基本上只考虑了传感器之间的微观关系,将各传感器同等对待,而忽略了它们的宏观依赖关系。在这篇论文中,我们主张从地域和全局两个层次重新思考传感器的依赖建模。具体来说,我们将区域内高度相关的原始传感器合并为一个地域节点,以保留地域间的依赖关系;随后,我们生成具有代表性的通用时空模式作为全局节点,用于反映传感器之间的全局依赖,并为时空依赖学习提供辅助信息。为了保证节点表示的通用性和真实性,我们引入Meta GCN,在物理数据空间中对地域节点和全局节点进行校准。此外,我们设计了跨层次图卷积,用于在不同层次之间传递信息。简而言之,我们提出了一种层次信息增强的时空预测方法(HIEST),以构建并利用地域依赖关系和通用时空模式。大量实验证明了HIEST相比最先进基线的领先性能。我们公开了代码,以便结果复现。
Open-endedness induced through a predator-prey scenario using modular robots
results: 研究发现了适应策略的出现,证明了利用模块机器人、通过捕食者-猎物动态来诱发 OEE 的可行性。然而,这种涌现似乎依赖于以明确的行为标准作为繁殖条件。Abstract
This work investigates how a predator-prey scenario can induce the emergence of Open-Ended Evolution (OEE). We utilize modular robots of fixed morphologies whose controllers are subject to evolution. In both species, robots can send and receive signals and perceive the relative positions of other robots in the environment. Specifically, we introduce a feature we call a tagging system: it modifies how individuals can perceive each other and is expected to increase behavioral complexity. Our results show the emergence of adaptive strategies, demonstrating the viability of inducing OEE through predator-prey dynamics using modular robots. Such emergence, nevertheless, seemed to depend on conditioning reproduction to an explicit behavioral criterion.
摘要
这项研究探讨了捕食者-猎物情景如何诱发开放式演化(OEE)的出现。我们使用形态固定的模块机器人,并对其控制器进行演化。在两个物种中,机器人都可以发送和接收信号,并能感知环境中其他机器人的相对位置。我们引入了一个称为标记系统的特征:它改变了个体相互感知的方式,预期能够增加行为复杂性。我们的结果显示出适应策略的涌现,证明了利用模块机器人、通过捕食者-猎物动态来诱发 OEE 的可行性。不过,这种涌现似乎依赖于以明确的行为标准作为繁殖条件。
Machine Learning Data Suitability and Performance Testing Using Fault Injection Testing Framework
paper_authors: Manal Rahal, Bestoun S. Ahmed, Jorgen Samuelsson
for: This paper aims to address the gap in testing approaches for input data in machine learning (ML) systems, specifically the resilience of ML models to intentionally-triggered data faults.
methods: The proposed framework, called FIUL-Data, uses data mutators to explore vulnerabilities of ML systems against data fault injections. The framework is designed with three main ideas: mutators are not random, one mutator is applied at a time, and selected ML models are optimized beforehand.
results: The FIUL-Data framework is evaluated using data from analytical chemistry, and the results show that the framework allows for the evaluation of the resilience of ML models. In most experiments, ML models show higher resilience at larger training datasets, and gradient boost performed better than support vector regression in smaller training sets. The mean squared error metric is found to be useful in evaluating the resilience of models due to its higher sensitivity to data mutation.
for: 这篇论文旨在弥补机器学习(ML)系统中输入数据测试方法的空白,具体是测试ML模型对有意触发的数据故障的抗性。
methods: 所提出的框架名为FIUL-Data,使用数据变异器来探索ML系统在数据故障注入下的脆弱性。框架基于三个主要想法:变异器不是随机的,每次只应用一个变异器,并且所选的ML模型事先经过优化。
results: FIUL-Data框架使用分析化学数据进行评估,结果显示该框架能够评估ML模型的抗性。在大多数实验中,ML模型在更大的训练集上表现出更高的抗性,而在较小的训练集中,梯度提升的表现优于支持向量回归。总的来说,均方误差指标由于对数据变异更敏感,因而适合用于评估模型的抗性。Abstract
Creating resilient machine learning (ML) systems has become necessary to ensure production-ready ML systems that acquire user confidence seamlessly. The quality of the input data and the model highly influence the successful end-to-end testing in data-sensitive systems. However, the testing approaches of input data are not as systematic and are few compared to model testing. To address this gap, this paper presents the Fault Injection for Undesirable Learning in input Data (FIUL-Data) testing framework that tests the resilience of ML models to multiple intentionally-triggered data faults. Data mutators explore vulnerabilities of ML systems against the effects of different fault injections. The proposed framework is designed based on three main ideas: The mutators are not random; one data mutator is applied at an instance of time, and the selected ML models are optimized beforehand. This paper evaluates the FIUL-Data framework using data from analytical chemistry, comprising retention time measurements of anti-sense oligonucleotide. Empirical evaluation is carried out in a two-step process in which the responses of selected ML models to data mutation are analyzed individually and then compared with each other. The results show that the FIUL-Data framework allows the evaluation of the resilience of ML models. In most experiments cases, ML models show higher resilience at larger training datasets, where gradient boost performed better than support vector regression in smaller training sets. Overall, the mean squared error metric is useful in evaluating the resilience of models due to its higher sensitivity to data mutation.
摘要
构建具有韧性的机器学习(ML)系统,已成为确保面向生产环境的ML系统顺利获得用户信任的必要手段。在数据敏感的系统中,输入数据的质量和模型对端到端测试的成功有很大影响。然而,针对输入数据的测试方法不够系统化,与模型测试相比数量也较少。为弥补这一差距,本文提出了面向输入数据的非期望学习故障注入(FIUL-Data)测试框架,用于测试ML模型对多种有意触发的数据故障的抗性。数据变异器用于探索ML系统在不同故障注入下的脆弱性。该框架基于以下三个主要想法:变异器不是随机的,每次只应用一个变异器,且所选的ML模型事先经过优化。本文使用分析化学数据(包括反义寡核苷酸的保留时间测量)对FIUL-Data框架进行了实证评估。评估分两步进行:先分别分析所选ML模型对数据变异的响应,再将各模型相互比较。结果表明,FIUL-Data框架能够评估ML模型的抗性。在大多数实验中,ML模型在更大的训练集上表现出更高的抗性,而梯度提升在较小的训练集中优于支持向量回归。总的来说,均方误差指标由于对数据变异更敏感,是评估模型抗性的有用指标。
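The "one mutator at a time" idea can be illustrated with a small experiment: inject a single deliberate fault into the training inputs and compare the mean squared error against a clean baseline. The mutator and the synthetic data below are illustrative stand-ins, not the framework's actual mutators or the chromatography dataset.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Minimal sketch of the "one mutator at a time" idea: inject a single,
# deliberate fault into the training inputs and compare MSE against the
# clean baseline. The mutator and synthetic data are illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X @ np.array([1.5, -2.0, 0.7, 0.0, 3.0]) + rng.normal(scale=0.1, size=500)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def gaussian_noise_mutator(X, column=0, scale=1.0):
    """Perturb a single feature column with Gaussian noise (one fault type)."""
    X_mut = X.copy()
    X_mut[:, column] += np.random.default_rng(1).normal(scale=scale, size=len(X))
    return X_mut

for name, X_train in [("clean", X_tr), ("noise on column 0", gaussian_noise_mutator(X_tr))]:
    model = GradientBoostingRegressor(random_state=0).fit(X_train, y_tr)
    print(name, "test MSE:", round(mean_squared_error(y_te, model.predict(X_te)), 4))
```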
Grounded Complex Task Segmentation for Conversational Assistants
results: 用户研究表明,用户偏好复杂度和长度适中的步骤,且所提出的方法能够改善原始的网页版教学文本:86% 的被评估任务在对话适用性方面得到了改进。Abstract
Following complex instructions in conversational assistants can be quite daunting due to the shorter attention and memory spans when compared to reading the same instructions. Hence, when conversational assistants walk users through the steps of complex tasks, there is a need to structure the task into manageable pieces of information of the right length and complexity. In this paper, we tackle the recipes domain and convert reading structured instructions into conversational structured ones. We annotated the structure of instructions according to a conversational scenario, which provided insights into what is expected in this setting. To computationally model the conversational step's characteristics, we tested various Transformer-based architectures, showing that a token-based approach delivers the best results. A further user study showed that users tend to favor steps of manageable complexity and length, and that the proposed methodology can improve the original web-based instructional text. Specifically, 86% of the evaluated tasks were improved from a conversational suitability point of view.
摘要
在对话助手中遵循复杂指令可能相当困难,因为与阅读同样的指令相比,用户的注意力和记忆持续时间更短。因此,当对话助手逐步引导用户完成复杂任务时,需要将任务分解成长度和复杂度适中的、易于掌握的信息块。在这篇论文中,我们以菜谱领域为对象,将面向阅读的结构化指令转换为面向对话的结构化指令。我们按照对话场景对指令结构进行了标注,从而了解了这一场景下的预期形式。为了对对话步骤的特征进行计算建模,我们测试了多种基于 Transformer 的架构,发现基于 token 的方法取得了最好的结果。进一步的用户研究表明,用户偏好复杂度和长度适中的步骤,而我们提出的方法能够改善原始的网页版教学文本。特别是,86% 的被评估任务在对话适用性方面得到了改进。
Sequence-to-Sequence Spanish Pre-trained Language Models
results: 论文对各模型进行了广泛的评估,发现所有模型均具有竞争力,其中BART和T5模型在所有评估任务中表现最佳。此外,该论文还将所有模型公开发布给研究社区,以促进未来的西班牙语处理研究。Abstract
In recent years, substantial advancements in pre-trained language models have paved the way for the development of numerous non-English language versions, with a particular focus on encoder-only and decoder-only architectures. While Spanish language models encompassing BERT, RoBERTa, and GPT have exhibited prowess in natural language understanding and generation, there remains a scarcity of encoder-decoder models designed for sequence-to-sequence tasks involving input-output pairs. This paper breaks new ground by introducing the implementation and evaluation of renowned encoder-decoder architectures, exclusively pre-trained on Spanish corpora. Specifically, we present Spanish versions of BART, T5, and BERT2BERT-style models and subject them to a comprehensive assessment across a diverse range of sequence-to-sequence tasks, spanning summarization, rephrasing, and generative question answering. Our findings underscore the competitive performance of all models, with BART and T5 emerging as top performers across all evaluated tasks. As an additional contribution, we have made all models publicly available to the research community, fostering future exploration and development in Spanish language processing.
摘要
近年来,大规模的预训练语言模型技术得到了广泛应用,特别是针对英语以外语言的研发。虽然西班牙语模型,包括BERT、RoBERTa和GPT,在自然语言理解和生成方面具有卓越表现,但是还缺乏适用于序列-序列任务的encoder-decoder模型。这篇论文创新地介绍了西班牙语encoder-decoder模型的实现和评估,具体来说是在西班牙语 corpus 上预训练的 BART、T5 和 BERT2BERT 样式模型。我们对这些模型进行了广泛的评估,包括概要、重新写和生成问答等序列-序列任务,我们的发现表明所有模型都具有竞争力,BART 和 T5 在所有评估任务中表现出色。此外,我们将所有模型公开发布给研究社区,以促进未来的探索和发展在西班牙语处理领域。
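For readers who want to try such checkpoints, the sketch below shows how a Spanish encoder-decoder model could be loaded for summarization with the Hugging Face transformers API; the checkpoint identifier is a placeholder, not necessarily the name under which the authors released their models.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Sketch of using a Spanish encoder-decoder checkpoint for summarization.
# The checkpoint name is a placeholder model id, not the authors' release name.
checkpoint = "your-org/bart-base-spanish"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

texto = "Los modelos de lenguaje preentrenados han impulsado grandes avances en el PLN."
inputs = tokenizer(texto, return_tensors="pt", truncation=True)
summary_ids = model.generate(**inputs, max_new_tokens=40, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```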
Hierarchical Multi-Agent Reinforcement Learning for Air Combat Maneuvering
results: 这个框架的实验验证表明,这种多代理人问题决策框架具有优化空中作战决策的功能。Abstract
The application of artificial intelligence to simulate air-to-air combat scenarios is attracting increasing attention. To date the high-dimensional state and action spaces, the high complexity of situation information (such as imperfect and filtered information, stochasticity, incomplete knowledge about mission targets) and the nonlinear flight dynamics pose significant challenges for accurate air combat decision-making. These challenges are exacerbated when multiple heterogeneous agents are involved. We propose a hierarchical multi-agent reinforcement learning framework for air-to-air combat with multiple heterogeneous agents. In our framework, the decision-making process is divided into two stages of abstraction, where heterogeneous low-level policies control the action of individual units, and a high-level commander policy issues macro commands given the overall mission targets. Low-level policies are trained for accurate unit combat control. Their training is organized in a learning curriculum with increasingly complex training scenarios and league-based self-play. The commander policy is trained on mission targets given pre-trained low-level policies. The empirical validation advocates the advantages of our design choices.
摘要
将人工智能应用于模拟空对空作战场景正在吸引越来越多的关注。到目前为止,高维的状态和动作空间、高度复杂的态势信息(例如不完美且经过滤的信息、随机性、对任务目标的不完整认知)以及非线性的飞行动力学,都给精准的空战决策带来了巨大挑战。当涉及多个异构智能体时,这些挑战更加严峻。我们提出了一个用于多异构智能体空对空作战的分层多智能体强化学习框架。在我们的框架中,决策过程分为两个抽象层次:异构的低层策略控制各个单位的动作,而高层指挥官策略根据整体任务目标下达宏观指令。低层策略针对精确的单位作战控制进行训练,其训练按照由易到难的课程式场景以及基于联赛的自我对弈来组织。指挥官策略则在给定预训练低层策略的前提下,针对任务目标进行训练。实证验证表明了我们设计选择的优势。
Colour Passing Revisited: Lifted Model Construction with Commutative Factors
results: 与现有的颜色传递(colour passing)算法相比,本文的方法能够检测出更多的对称性,从而实现更高的压缩率和更快的在线查询速度。Abstract
Lifted probabilistic inference exploits symmetries in a probabilistic model to allow for tractable probabilistic inference with respect to domain sizes. To apply lifted inference, a lifted representation has to be obtained, and to do so, the so-called colour passing algorithm is the state of the art. The colour passing algorithm, however, is bound to a specific inference algorithm and we found that it ignores commutativity of factors while constructing a lifted representation. We contribute a modified version of the colour passing algorithm that uses logical variables to construct a lifted representation independent of a specific inference algorithm while at the same time exploiting commutativity of factors during an offline-step. Our proposed algorithm efficiently detects more symmetries than the state of the art and thereby drastically increases compression, yielding significantly faster online query times for probabilistic inference when the resulting model is applied.
摘要
提升(lifted)概率推理利用概率模型中的对称性,使概率推理的复杂度相对于域规模保持可行。要应用提升推理,首先需要获得提升表示,而颜色传递算法是目前最先进的解决方案。然而,该算法绑定于特定的推理算法,并且在构建提升表示时忽略了因子的交换性。我们提出了一种改进的颜色传递算法,使用逻辑变量来构建独立于特定推理算法的提升表示,同时在离线阶段利用因子的交换性。我们提出的算法能够比现有方法检测出更多的对称性,从而大幅提高压缩率,使得所得模型在概率推理的在线查询中显著加快。
ChatGPT-4 as a Tool for Reviewing Academic Books in Spanish
paper_authors: Jonnathan Berrezueta-Guzman, Laura Malache-Silva, Stephan Krusche
for: This study evaluates the potential of ChatGPT-4 as an editing tool for Spanish literary and academic books.
methods: The study analyzes the features and capabilities of ChatGPT-4 in terms of grammatical correction, stylistic coherence, and linguistic enrichment of texts in Spanish.
results: ChatGPT-4 is capable of making grammatical and orthographic corrections with high accuracy and in a very short time, but faces challenges in areas such as context sensitivity and interaction with visual content. Collaboration between ChatGPT-4 and human reviewers and editors is a promising strategy for improving efficiency without compromising quality.
for: 这项研究评估了OpenAI开发的ChatGPT-4语言模型作为西班牙语文学和学术书籍编辑工具的潜力。
methods: 研究从语法修正、风格一致性和语言丰富性三个方面分析了ChatGPT-4模型处理西班牙语文本的功能和能力。
results: ChatGPT-4能够快速而准确地进行语法和拼写修正,但在上下文敏感性以及与图表等可视内容的交互方面仍面临挑战。ChatGPT-4与人类审稿人和编辑的协作,是在不降低质量的前提下提高效率的一种有前景的策略。Abstract
This study evaluates the potential of ChatGPT-4, an artificial intelligence language model developed by OpenAI, as an editing tool for Spanish literary and academic books. The need for efficient and accessible reviewing and editing processes in the publishing industry has driven the search for automated solutions. ChatGPT-4, being one of the most advanced language models, offers notable capabilities in text comprehension and generation. In this study, the features and capabilities of ChatGPT-4 are analyzed in terms of grammatical correction, stylistic coherence, and linguistic enrichment of texts in Spanish. Tests were conducted with 100 literary and academic texts, where the edits made by ChatGPT-4 were compared to those made by expert human reviewers and editors. The results show that while ChatGPT-4 is capable of making grammatical and orthographic corrections with high accuracy and in a very short time, it still faces challenges in areas such as context sensitivity, bibliometric analysis, deep contextual understanding, and interaction with visual content like graphs and tables. However, it is observed that collaboration between ChatGPT-4 and human reviewers and editors can be a promising strategy for improving efficiency without compromising quality. Furthermore, the authors consider that ChatGPT-4 represents a valuable tool in the editing process, but its use should be complementary to the work of human editors to ensure high-caliber editing in Spanish literary and academic books.
摘要
Tests were conducted on 100 literary and academic texts, comparing the edits made by ChatGPT-4 to those made by expert human reviewers and editors. The results show that ChatGPT-4 is capable of making grammatical and orthographic corrections with high accuracy and in a very short time. However, it still struggles with context sensitivity, bibliometric analysis, deep contextual understanding, and interaction with visual content like graphs and tables.Despite these limitations, collaboration between ChatGPT-4 and human reviewers and editors is a promising strategy for improving efficiency without compromising quality. The authors conclude that ChatGPT-4 represents a valuable tool in the editing process, but its use should be complementary to the work of human editors to ensure high-caliber editing in Spanish literary and academic books.
paper_authors: Nardine Osman, Bruno Rosell i Gui, Carles Sierra
for: 本研究旨在通过在线连接人们,帮助他们解决日常问题。
methods: 本研究使用了声明性规范来mediate在线交互,特别是在连接人们时利用多样性。
results: 在不同的大学站点上进行的试验显示,选择的profile多样性得到了相对成功,并得到了用户满意的评价。Abstract
This paper addresses the issue of connecting people online to help them find support with their day-to-day problems. We make use of declarative norms for mediating online interactions, and we specifically focus on the issue of leveraging diversity when connecting people. We run pilots at different university sites, and the results show relative success in the diversity of the selected profiles, backed by high user satisfaction.
摘要
这篇论文关注在线连接人们,以帮助他们解决日常问题。我们利用声明性规范来调控在线交互,特别是利用多样性连接人们。我们在不同的大学站点进行了试点,结果表明在选择的profile中的多样性得到了相对成功,并得到了用户满意的评价。
Retrieve-Rewrite-Answer: A KG-to-Text Enhanced LLMs Framework for Knowledge Graph Question Answering
results: 实验表明,提出的KG-to-Text增强的LLMs框架在KGQA任务上的答案准确率和知识声明的有用性都高于之前的KG-加强LLMs方法。Abstract
Despite their competitive performance on knowledge-intensive tasks, large language models (LLMs) still have limitations in memorizing all world knowledge especially long tail knowledge. In this paper, we study the KG-augmented language model approach for solving the knowledge graph question answering (KGQA) task that requires rich world knowledge. Existing work has shown that retrieving KG knowledge to enhance LLMs prompting can significantly improve LLMs performance in KGQA. However, their approaches lack a well-formed verbalization of KG knowledge, i.e., they ignore the gap between KG representations and textual representations. To this end, we propose an answer-sensitive KG-to-Text approach that can transform KG knowledge into well-textualized statements most informative for KGQA. Based on this approach, we propose a KG-to-Text enhanced LLMs framework for solving the KGQA task. Experiments on several KGQA benchmarks show that the proposed KG-to-Text augmented LLMs approach outperforms previous KG-augmented LLMs approaches regarding answer accuracy and usefulness of knowledge statements.
摘要
尽管大语言模型(LLMs)在知识密集型任务上表现出较强的竞争力,但它们在记忆全部世界知识方面仍然存在局限,特别是长尾知识。在这篇论文中,我们研究了以知识图(KG)增强语言模型的方法,以解决需要丰富世界知识的知识图问答(KGQA)任务。已有研究表明,检索KG知识来增强对LLMs的提示可以显著提高LLMs在KGQA任务上的表现。然而,现有方法缺乏对KG知识的良好文本化表述,即忽视了KG表示与文本表示之间的差距。为此,我们提出了一种答案敏感的KG-to-Text方法,能够将KG知识转换成对KGQA最有信息量的文本化陈述。基于这种方法,我们提出了一个KG-to-Text增强的LLMs框架,用于解决KGQA任务。实验表明,我们的方法在多个KGQA基准上,无论是答案准确率还是知识陈述的有用性,均优于以往的KG增强LLMs方法。
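The core KG-to-Text step, turning retrieved triples into textual statements that are prepended to the question, can be sketched as follows. The template-based verbalizer and the example triples are illustrative only; the paper trains a dedicated answer-sensitive KG-to-Text model rather than using fixed templates.

```python
# Minimal sketch of the KG-to-Text idea: turn retrieved triples into natural
# statements that can be prepended to a question prompt. The verbalization
# templates and triples are illustrative, not the paper's trained model.
def verbalize(triples):
    return " ".join(f"{h} {r.replace('_', ' ')} {t}." for h, r, t in triples)

triples = [("Marie Curie", "was_born_in", "Warsaw"),
           ("Marie Curie", "received", "the Nobel Prize in Physics")]
question = "Where was the first female Nobel laureate in Physics born?"
prompt = f"Knowledge: {verbalize(triples)}\nQuestion: {question}\nAnswer:"
print(prompt)
```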
Using Artificial Intelligence for the Automation of Knitting Patterns
results: 模型的评估结果显示了高的模型精度、精度、回归率和F1分数,而且大多数类的AUC分数在(0.7-0.9)的范围内。Abstract
Knitting patterns are a crucial component in the creation and design of knitted materials. Traditionally, these patterns were taught informally, but thanks to advancements in technology, anyone interested in knitting can use the patterns as a guide to start knitting. Perhaps because knitting is mostly a hobby, with the exception of industrial manufacturing utilising specialised knitting machines, the use of AI in knitting is less widespread than its application in other fields. However, it is important to determine whether knitted pattern classification using an automated system is viable. In order to recognise and classify knitting patterns, this study proposes a deep learning model that uses data augmentation and a transfer learning technique. The Inception ResNet-V2 is the main feature extraction and classification algorithm used in the model. Metrics like accuracy, logarithmic loss, F1-score, precision, and recall score were used to evaluate the model. The model evaluation's findings demonstrate high model accuracy, precision, recall, and F1 score. In addition, the AUC score for the majority of the classes was in the range (0.7-0.9). A comparative analysis was done using other pretrained models and a ResNet-50 model with transfer learning, and the proposed model's evaluation results surpassed all others. The major limitation for this project is time; with more time, there might have been better accuracy over a larger number of epochs.
摘要
针织图案是针织材料创作和设计中的关键组成部分。过去,这些图案通常以非正式的方式传授,但随着技术的进步,任何对针织感兴趣的人都可以把这些图案作为指南开始编织。或许是因为针织主要是一项业余爱好(使用专用针织机的工业生产除外),人工智能(AI)在针织领域的应用远不如在其他领域广泛。然而,判断基于自动化系统的针织图案分类是否可行十分重要。为了识别和分类针织图案,本研究提出了一个采用数据增强和迁移学习技术的深度学习模型,其中Inception ResNet-V2是模型中主要的特征提取和分类算法。模型评估使用了准确率、对数损失、F1分数、精确率和召回率等指标,结果显示模型在准确率、精确率、召回率和F1分数上均达到较高水平。此外,大多数类别的AUC分数处于(0.7-0.9)区间。与其他预训练模型以及采用迁移学习的ResNet-50模型进行比较分析后,所提模型的评估结果超过了其他所有模型。该项目的主要限制是时间:若有更多时间,在更多的训练轮次下可能获得更高的准确率。
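A transfer-learning setup of the kind described (a frozen Inception-ResNet-V2 backbone plus augmentation and a small classification head) could look roughly like the sketch below; the image size, class count, and augmentation choices are assumptions rather than the study's exact configuration.

```python
import tensorflow as tf

# Sketch of transfer learning with a frozen Inception-ResNet-V2 backbone and
# simple augmentation. Image size, class count and augmentations are assumptions.
num_classes = 10
base = tf.keras.applications.InceptionResNetV2(include_top=False, weights="imagenet",
                                               input_shape=(299, 299, 3))
base.trainable = False  # keep the pretrained features fixed

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(299, 299, 3)),
    tf.keras.layers.RandomFlip("horizontal"),               # data augmentation
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1.0),    # scale pixels to [-1, 1]
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
```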
When to Trust AI: Advances and Challenges for Certification of Neural Networks
results: 本文提出了未来的挑战和研究方向,以确保AI决策的安全性和可靠性。Abstract
Artificial intelligence (AI) has been advancing at a fast pace and it is now poised for deployment in a wide range of applications, such as autonomous systems, medical diagnosis and natural language processing. Early adoption of AI technology for real-world applications has not been without problems, particularly for neural networks, which may be unstable and susceptible to adversarial examples. In the longer term, appropriate safety assurance techniques need to be developed to reduce potential harm due to avoidable system failures and ensure trustworthiness. Focusing on certification and explainability, this paper provides an overview of techniques that have been developed to ensure safety of AI decisions and discusses future challenges.
摘要
人工智能(AI)在过去几年中快速发展,如今已准备好部署在自主系统、医疗诊断和自然语言处理等广泛应用中。AI技术在真实世界应用中的早期落地并非没有问题,特别是神经网络可能不稳定,并且容易受到对抗样本的影响。从长远来看,需要开发适当的安全保障技术,以减少可避免的系统失效带来的潜在危害,并确保AI决策的可信性。本文聚焦于认证与可解释性,概述了为保障AI决策安全而发展的各项技术,并讨论了未来的挑战。
Long-tail Augmented Graph Contrastive Learning for Recommendation
paper_authors: Qian Zhao, Zhengwei Wu, Zhiqiang Zhang, Jun Zhou
for: 提高推荐系统中图卷积网络(GCNs)的性能,解决真实场景中的数据稀疏问题。
methods: 采用对比学习方法,引入可学习的长尾增强方式来增强尾部节点,并基于增强后的图生成对比视图。
results: 在三个基准数据集上进行了大量实验,结果表明我们的模型相比当前最先进方法性能显著提升;进一步的分析也证明了所学表示的均匀性以及LAGCL在长尾性能上的优势。Abstract
Graph Convolutional Networks (GCNs) has demonstrated promising results for recommender systems, as they can effectively leverage high-order relationship. However, these methods usually encounter data sparsity issue in real-world scenarios. To address this issue, GCN-based recommendation methods employ contrastive learning to introduce self-supervised signals. Despite their effectiveness, these methods lack consideration of the significant degree disparity between head and tail nodes. This can lead to non-uniform representation distribution, which is a crucial factor for the performance of contrastive learning methods. To tackle the above issue, we propose a novel Long-tail Augmented Graph Contrastive Learning (LAGCL) method for recommendation. Specifically, we introduce a learnable long-tail augmentation approach to enhance tail nodes by supplementing predicted neighbor information, and generate contrastive views based on the resulting augmented graph. To make the data augmentation schema learnable, we design an auto drop module to generate pseudo-tail nodes from head nodes and a knowledge transfer module to reconstruct the head nodes from pseudo-tail nodes. Additionally, we employ generative adversarial networks to ensure that the distribution of the generated tail/head nodes matches that of the original tail/head nodes. Extensive experiments conducted on three benchmark datasets demonstrate the significant improvement in performance of our model over the state-of-the-arts. Further analyses demonstrate the uniformity of learned representations and the superiority of LAGCL on long-tail performance. Code is publicly available at https://github.com/im0qianqian/LAGCL
摘要
图卷积网络(GCNs)在推荐系统中表现出色,因为它们能够有效利用高阶关系。然而,这些方法在真实场景中通常会遇到数据稀疏问题。为了解决这一问题,基于GCN的推荐方法采用对比学习来引入自监督信号。尽管这些方法有效,但它们忽视了头部节点与尾部节点之间显著的度数差异,这可能导致表示分布不均匀,而这是影响对比学习方法性能的关键因素。为了解决上述问题,我们提出了一种新颖的长尾增强图对比学习(LAGCL)推荐方法。具体来说,我们引入一种可学习的长尾增强方法,通过补充预测的邻居信息来增强尾部节点,并基于所得到的增强图生成对比视图。为了使数据增强方案可学习,我们设计了一个自动丢弃模块,从头部节点生成伪尾部节点,并设计了一个知识迁移模块,从伪尾部节点重构头部节点。此外,我们使用生成对抗网络来确保生成的尾部/头部节点的分布与原始尾部/头部节点的分布一致。我们在三个基准数据集上进行了广泛的实验,结果证明了我们的模型相比现有最先进方法的显著改进。进一步的分析也表明了所学表示的均匀性以及LAGCL在长尾性能上的优势。代码可以在 https://github.com/im0qianqian/LAGCL 获取。
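Graph contrastive recommenders of this kind typically build on an InfoNCE-style loss between two views of the same nodes; the sketch below shows that generic building block. The dimensions and temperature are placeholders, and the code does not implement LAGCL's long-tail augmentation, auto drop, or knowledge transfer modules.

```python
import torch
import torch.nn.functional as F

# Minimal sketch of an InfoNCE-style contrastive loss between two augmented
# views of node embeddings. Dimensions and temperature are placeholders.
def info_nce(z1, z2, temperature=0.2):
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature          # pairwise similarities
    labels = torch.arange(z1.size(0))           # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

view1 = torch.randn(64, 32)   # embeddings of the same 64 nodes under view 1
view2 = torch.randn(64, 32)   # ... and under view 2
print(info_nce(view1, view2).item())
```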
Are Large Language Models Really Robust to Word-Level Perturbations?
results: 实验结果表明,TREvaL能够准确地评估LLM的鲁棒性,并且发现LLM经常容易受到词级扰动的影响,而这类扰动在日常语言使用中十分常见。另外,研究还发现,在进行微调(SFT)和基于人类反馈的强化学习(RLHF)之后,LLM的鲁棒性往往会下降。Abstract
The swift advancement in the scale and capabilities of Large Language Models (LLMs) positions them as promising tools for a variety of downstream tasks. In addition to the pursuit of better performance and the avoidance of violent feedback on a certain prompt, to ensure the responsibility of the LLM, much attention is drawn to the robustness of LLMs. However, existing evaluation methods mostly rely on traditional question answering datasets with predefined supervised labels, which do not align with the superior generation capabilities of contemporary LLMs. To address this issue, we propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools to evaluate the robustness of LLMs, which we refer to as the Reward Model for Reasonable Robustness Evaluation (TREvaL). Our extensive empirical experiments have demonstrated that TREval provides an accurate method for evaluating the robustness of an LLM, especially when faced with more challenging open questions. Furthermore, our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations, which are commonplace in daily language usage. Notably, we were surprised to discover that robustness tends to decrease as fine-tuning (SFT and RLHF) is conducted. The code of TREval is available in https://github.com/Harry-mic/TREval.
摘要
大语言模型(LLMs)在规模和能力上的快速进步,使其成为各类下游任务中很有前景的工具。除了追求更好的性能并避免模型对特定提示产生不当反馈之外,为了确保 LLM 的责任性,其鲁棒性也受到了广泛关注。然而,现有的评估方法大多依赖带有预定义监督标签的传统问答数据集,这与当代 LLM 出色的生成能力并不匹配。为解决这一问题,我们提出了一种新的合理评估方法,利用预训练的奖励模型作为诊断工具来评估 LLM 的鲁棒性,我们称之为 TREvaL。我们大量的实验证明,TREvaL 能够准确地评估 LLM 的鲁棒性,尤其是在面对更具挑战性的开放式问题时。此外,我们的结果表明,LLM 经常容易受到词级扰动的影响,而这类扰动在日常语言使用中十分常见。值得注意的是,我们意外地发现,随着微调(SFT 和 RLHF)的进行,模型的鲁棒性往往会下降。TREvaL 的代码可以在 https://github.com/Harry-mic/TREval 获取。
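Word-level perturbations of the kind the study probes can be generated with very simple mutators (typos, drops, swaps); the sketch below is one illustrative implementation and not TREvaL's specific perturbation set.

```python
import random

# Sketch of simple word-level perturbations (typo / drop / swap) of the kind a
# robustness probe could apply to a prompt. Illustrative mutators only.
def perturb(sentence, rate=0.15, seed=0):
    rng = random.Random(seed)
    out = []
    for w in sentence.split():
        r = rng.random()
        if r < rate / 3 and len(w) > 3:          # typo: swap two inner letters
            i = rng.randrange(1, len(w) - 2)
            w = w[:i] + w[i + 1] + w[i] + w[i + 2:]
        elif r < 2 * rate / 3:                   # drop the word entirely
            continue
        elif r < rate and out:                   # swap with the previous word
            out[-1], w = w, out[-1]
        out.append(w)
    return " ".join(out)

prompt = "Explain why the sky appears blue during the day."
print(perturb(prompt))
```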
ProtoExplorer: Interpretable Forensic Analysis of Deepfake Videos using Prototype Exploration and Refinement
results: 这篇文章透过对实际应用场景进行评估,确认了这个方法的可行性和有效性。Abstract
In high-stakes settings, Machine Learning models that can provide predictions that are interpretable for humans are crucial. This is even more true with the advent of complex deep learning based models with a huge number of tunable parameters. Recently, prototype-based methods have emerged as a promising approach to make deep learning interpretable. We particularly focus on the analysis of deepfake videos in a forensics context. Although prototype-based methods have been introduced for the detection of deepfake videos, their use in real-world scenarios still presents major challenges, in that prototypes tend to be overly similar and interpretability varies between prototypes. This paper proposes a Visual Analytics process model for prototype learning, and, based on this, presents ProtoExplorer, a Visual Analytics system for the exploration and refinement of prototype-based deepfake detection models. ProtoExplorer offers tools for visualizing and temporally filtering prototype-based predictions when working with video data. It disentangles the complexity of working with spatio-temporal prototypes, facilitating their visualization. It further enables the refinement of models by interactively deleting and replacing prototypes with the aim to achieve more interpretable and less biased predictions while preserving detection accuracy. The system was designed with forensic experts and evaluated in a number of rounds based on both open-ended think aloud evaluation and interviews. These sessions have confirmed the strength of our prototype based exploration of deepfake videos while they provided the feedback needed to continuously improve the system.
摘要
在高风险场景中,能够给出人类可解释预测的机器学习模型至关重要。随着拥有大量可调参数的复杂深度学习模型的出现,这一点更为突出。最近,基于原型的方法在使深度学习可解释方面展现出良好的前景。我们特别关注法医取证背景下深度伪造视频的分析。尽管基于原型的方法已被用于深度伪造视频的检测,但其在真实场景中的应用仍面临主要挑战:原型之间往往过于相似,且不同原型的可解释性差异较大。本文提出了一种面向原型学习的可视分析流程模型,并在此基础上提出了ProtoExplorer——一个用于探索和细化基于原型的深度伪造检测模型的可视分析系统。ProtoExplorer提供了对基于原型的预测结果进行可视化和时间维度过滤的工具,以处理视频数据;它化解了处理时空原型的复杂性,便于其可视化。它还支持通过交互式删除和替换原型来细化模型,以在保持检测精度的同时获得更可解释、偏差更小的预测。该系统与法医专家共同设计,并通过开放式有声思维评估和访谈进行了多轮评价。这些评估确认了我们基于原型探索深度伪造视频的方法的优势,同时提供了持续改进系统所需的反馈。
CoT-BERT: Enhancing Unsupervised Sentence Representation through Chain-of-Thought
results: 严格的实验表明,CoT-BERT 在不依赖其他文本表示模型或外部数据库的情况下,超越了一系列强大的基线方法。Abstract
Unsupervised sentence representation learning aims to transform input sentences into fixed-length vectors enriched with intricate semantic information while obviating the reliance on labeled data. Recent progress within this field, propelled by contrastive learning and prompt engineering, has significantly bridged the gap between unsupervised and supervised strategies. Nonetheless, the potential utilization of Chain-of-Thought, remains largely untapped within this trajectory. To unlock latent capabilities within pre-trained models, such as BERT, we propose a two-stage approach for sentence representation: comprehension and summarization. Subsequently, the output of the latter phase is harnessed as the vectorized representation of the input sentence. For further performance enhancement, we meticulously refine both the contrastive learning loss function and the template denoising technique for prompt engineering. Rigorous experimentation substantiates our method, CoT-BERT, transcending a suite of robust baselines without necessitating other text representation models or external databases.
摘要
无监督句子表示学习的目标是在不依赖标注数据的情况下,将输入句子转化为蕴含丰富语义信息的固定长度向量。在对比学习和提示工程的推动下,该领域的最新进展已经显著缩小了无监督与有监督方法之间的差距。然而,链式思维(Chain-of-Thought)在这一方向上的潜力仍基本未被挖掘。为了释放诸如BERT等预训练模型中的潜在能力,我们提出了一种两阶段的句子表示方法:理解与总结,并将后一阶段的输出作为输入句子的向量表示。为了进一步提升性能,我们细致地改进了对比学习损失函数和用于提示工程的模板去噪技术。严格的实验验证了我们的方法 CoT-BERT 在不依赖其他文本表示模型或外部数据库的情况下,超越了一系列强大的基线。
Contrastive Pseudo Learning for Open-World DeepFake Attribution
paper_authors: Zhimin Sun, Shen Chen, Taiping Yao, Bangjie Yin, Ran Yi, Shouhong Ding, Lizhuang Ma
for: 探究开放世界中未知攻击所留下的伪造痕迹的溯源问题,以推动深度伪造溯源方向的前沿研究。
methods: 提出了一个名为 Open-World DeepFake Attribution(OW-DFA)的新基准,用于在开放世界场景下评估对各类伪造人脸的溯源性能;并提出了一种名为 Contrastive Pseudo Learning(CPL)的基于对比学习的新框架。
results: 经验表明,我们提出的方法在OW-DFA任务上具有优秀的表现,并且能够增强深伪检测领域的安全性。Abstract
The challenge in sourcing attribution for forgery faces has gained widespread attention due to the rapid development of generative techniques. While many recent works have taken essential steps on GAN-generated faces, more threatening attacks related to identity swapping or expression transferring are still overlooked. And the forgery traces hidden in unknown attacks from the open-world unlabeled faces still remain under-explored. To push the related frontier research, we introduce a new benchmark called Open-World DeepFake Attribution (OW-DFA), which aims to evaluate attribution performance against various types of fake faces under open-world scenarios. Meanwhile, we propose a novel framework named Contrastive Pseudo Learning (CPL) for the OW-DFA task through 1) introducing a Global-Local Voting module to guide the feature alignment of forged faces with different manipulated regions, 2) designing a Confidence-based Soft Pseudo-label strategy to mitigate the pseudo-noise caused by similar methods in unlabeled set. In addition, we extend the CPL framework with a multi-stage paradigm that leverages pre-train technique and iterative learning to further enhance traceability performance. Extensive experiments verify the superiority of our proposed method on the OW-DFA and also demonstrate the interpretability of deepfake attribution task and its impact on improving the security of deepfake detection area.
摘要
由于生成技术的快速发展,伪造人脸的溯源问题已受到广泛关注。尽管许多近期工作已在GAN生成人脸上迈出了重要步伐,但与身份替换或表情迁移相关的更具威胁性的攻击仍被忽视,而开放世界中未标注人脸所隐藏的未知攻击伪造痕迹也仍缺乏探索。为了推进相关的前沿研究,我们提出了一个名为 Open-World DeepFake Attribution(OW-DFA)的新基准,旨在评估开放世界场景下针对各类伪造人脸的溯源性能。同时,我们为 OW-DFA 任务提出了一个名为 Contrastive Pseudo Learning(CPL)的新框架:1)引入全局-局部投票模块,引导具有不同篡改区域的伪造人脸进行特征对齐;2)设计基于置信度的软伪标签策略,以减轻未标注集中相似方法带来的伪标签噪声。此外,我们将 CPL 框架扩展为多阶段范式,利用预训练技术和迭代学习进一步提升溯源性能。大量实验验证了所提方法在 OW-DFA 上的优越性,并展示了深度伪造溯源任务的可解释性及其对提升深度伪造检测安全性的意义。
Language-Oriented Communication with Semantic Coding and Knowledge Distillation for Text-to-Image Generation
results: 在渐进式文本到图像生成任务中,所提出的方法能够以更少的传输次数实现更高的感知相似度,同时提升了在含噪通信信道下的鲁棒性。Abstract
By integrating recent advances in large language models (LLMs) and generative models into the emerging semantic communication (SC) paradigm, in this article we put forward a novel framework of language-oriented semantic communication (LSC). In LSC, machines communicate using human language messages that can be interpreted and manipulated via natural language processing (NLP) techniques for SC efficiency. To demonstrate LSC's potential, we introduce three innovative algorithms: 1) semantic source coding (SSC), which compresses a text prompt into its key head words capturing the prompt's syntactic essence while maintaining their appearance order to keep the prompt's context; 2) semantic channel coding (SCC), which improves robustness against errors by substituting head words with their lengthier synonyms; and 3) semantic knowledge distillation (SKD), which produces listener-customized prompts via in-context learning of the listener's language style. In a communication task for progressive text-to-image generation, the proposed methods achieve higher perceptual similarities with fewer transmissions while enhancing robustness in noisy communication channels.
摘要
通过将大语言模型(LLM)和生成模型的最新进展融入新兴的语义通信(SC)范式,本文提出了一种新颖的面向语言的语义通信(LSC)框架。在LSC中,机器使用人类语言消息进行通信,这些消息可以借助自然语言处理(NLP)技术进行解释和处理,以提高SC的效率。为了展示LSC的潜力,我们提出了三种创新算法:1)语义信源编码(SSC),将文本提示压缩为其关键中心词,在保留词序以维持提示上下文的同时,抓住提示的句法要点;2)语义信道编码(SCC),通过将中心词替换为更长的同义词来提高抗差错的鲁棒性;3)语义知识蒸馏(SKD),通过对接收方语言风格的上下文学习,生成为接收方定制的提示。在渐进式文本到图像生成的通信任务中,所提出的方法能够以更少的传输次数实现更高的感知相似度,并在含噪通信信道中增强了鲁棒性。
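The semantic source coding step, keeping only the content-bearing head words of a prompt while preserving their order, can be approximated with a simple filter, as sketched below; the stop-word heuristic stands in for the paper's actual head-word extraction.

```python
# Minimal sketch of the semantic source coding idea: keep only content-bearing
# "head" words of a prompt while preserving their order. The stop-word filter
# is a stand-in for the paper's actual head-word extraction.
STOP = {"a", "an", "the", "of", "on", "in", "with", "and", "is", "are", "at", "to"}

def semantic_source_code(prompt):
    return " ".join(w for w in prompt.split() if w.lower() not in STOP)

prompt = "a photo of an astronaut riding a horse on the moon"
print(semantic_source_code(prompt))   # "photo astronaut riding horse moon"
```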
Learning Complete Topology-Aware Correlations Between Relations for Inductive Link Prediction
results: 与现有最先进方法相比,TACO 方法在归纳式链接预测任务中表现出更优的性能。Abstract
Inductive link prediction -- where entities during training and inference stages can be different -- has shown great potential for completing evolving knowledge graphs in an entity-independent manner. Many popular methods mainly focus on modeling graph-level features, while the edge-level interactions -- especially the semantic correlations between relations -- have been less explored. However, we notice a desirable property of semantic correlations between relations is that they are inherently edge-level and entity-independent. This implies the great potential of the semantic correlations for the entity-independent inductive link prediction task. Inspired by this observation, we propose a novel subgraph-based method, namely TACO, to model Topology-Aware COrrelations between relations that are highly correlated to their topological structures within subgraphs. Specifically, we prove that semantic correlations between any two relations can be categorized into seven topological patterns, and then proposes Relational Correlation Network (RCN) to learn the importance of each pattern. To further exploit the potential of RCN, we propose Complete Common Neighbor induced subgraph that can effectively preserve complete topological patterns within the subgraph. Extensive experiments demonstrate that TACO effectively unifies the graph-level information and edge-level interactions to jointly perform reasoning, leading to a superior performance over existing state-of-the-art methods for the inductive link prediction task.
摘要
依"\induction link prediction" -- 在训练和推理阶段之间的实体可以不同 -- 已经展现出了完善 evolving knowledge graphs 的巨大潜力。许多受欢迎的方法主要关注图级特征,而图级交互 -- 特别是关系之间的semantic correlation -- 则得到了更少的关注。然而,我们注意到了semantic correlation between relations 的一个愉悦性质,即它们是自然的edge-level和实体独立的。这意味着semantic correlation between relations 具有潜在的很大潜力 для实体独立的 inductive link prediction 任务。针对这一观察,我们提出了一种新的子图基于方法,即 TACO,用于模型 topology-aware COrrelations between relations (TACO)。具体来说,我们证明了任意两个关系的semantic correlation可以被分类为七种 topological pattern,并提出了 Relational Correlation Network (RCN) 来学习每种pattern的重要性。为了更好地利用 RCn 的潜力,我们提出了 Complete Common Neighbor induced subgraph,可以有效地保留完整的 topological patterns within the subgraph。我们的实验表明,TACO 能够具有图级信息和边级交互的整合,以jointly perform reasoning,从而对 inductive link prediction 任务 дости得更高的性能。
TrueLearn: A Python Library for Personalised Informational Recommendations with (Implicit) Feedback
results: 论文附录了一个已经公布的隐式反馈教育数据集,并提供了评价指标来衡量模型的性能。TrueLearn库的广泛的文档和代码示例使得机器学习开发者和教育数据挖掘和学习分析专家可以很容易地使用这个库。Abstract
This work describes the TrueLearn Python library, which contains a family of online learning Bayesian models for building educational (or more generally, informational) recommendation systems. This family of models was designed following the "open learner" concept, using humanly-intuitive user representations. For the sake of interpretability and putting the user in control, the TrueLearn library also contains different representations to help end-users visualise the learner models, which may in the future facilitate user interaction with their own models. Together with the library, we include a previously publicly released implicit feedback educational dataset with evaluation metrics to measure the performance of the models. The extensive documentation and coding examples make the library highly accessible to both machine learning developers and educational data mining and learning analytic practitioners. The library and the support documentation with examples are available at https://truelearn.readthedocs.io/en/latest.
摘要
这项工作介绍了TrueLearn Python库,该库包含一系列在线学习的贝叶斯模型,用于构建教育(或更广义的信息)推荐系统。这一系列模型遵循"开放学习者"概念,采用符合人类直觉的用户表示。为了提高可解释性并让用户掌握主动权,TrueLearn库还提供了多种表示方式,帮助终端用户可视化学习者模型,未来或可促进用户与自己模型的交互。此外,随库附带了一个此前公开发布的隐式反馈教育数据集以及评估指标,用于衡量模型的性能。详尽的文档和代码示例使得机器学习开发者以及教育数据挖掘和学习分析从业者都能轻松使用该库。库及包含示例的支持文档可在 https://truelearn.readthedocs.io/en/latest 获取。
AttentionMix: Data augmentation method that relies on BERT attention mechanism
methods: 本文提出了一种名为 AttentionMix 的新混合方法,它基于注意力机制。虽然文中以 BERT 的注意力机制为例,但该方法可应用于任何基于注意力的模型。
results: 在三个标准情感分类数据集上的测试中,AttentionMix 均超过了两种采用 Mixup 机制的基准方法以及原始的 BERT 方法。结果表明,注意力信息可以被有效地用于 NLP 领域的数据增强。Abstract
The Mixup method has proven to be a powerful data augmentation technique in Computer Vision, with many successors that perform image mixing in a guided manner. One of the interesting research directions is transferring the underlying Mixup idea to other domains, e.g. Natural Language Processing (NLP). Even though there already exist several methods that apply Mixup to textual data, there is still room for new, improved approaches. In this work, we introduce AttentionMix, a novel mixing method that relies on attention-based information. While the paper focuses on the BERT attention mechanism, the proposed approach can be applied to generally any attention-based model. AttentionMix is evaluated on 3 standard sentiment classification datasets and in all three cases outperforms two benchmark approaches that utilize Mixup mechanism, as well as the vanilla BERT method. The results confirm that the attention-based information can be effectively used for data augmentation in the NLP domain.
摘要
Mixup方法已被证明是计算机视觉中一种强大的数据增强技术,并衍生出许多以引导方式进行图像混合的后续方法。一个有趣的研究方向是将Mixup的核心思想迁移到其他领域,例如自然语言处理(NLP)。尽管已有若干将Mixup应用于文本数据的方法,但仍有改进的空间。在这项工作中,我们提出了一种依赖注意力信息的新混合方法——AttentionMix。虽然本文以BERT的注意力机制为研究对象,但所提方法可以应用于任何基于注意力的模型。我们在3个标准情感分类数据集上对AttentionMix进行了评估,在全部3个数据集上它都超过了两种采用Mixup机制的基准方法以及原始的BERT方法。结果证实,注意力信息可以被有效地用于NLP领域的数据增强。
A New Interpretable Neural Network-Based Rule Model for Healthcare Decision Making
results: 我们对健康应用场景中的数据进行评估,并与现有的解释性方法进行比较。结果表明,TT-rules 能够达到与其他解释性方法相当或更高的性能,并且在大型表格数据集上进行适应也是可能的。特别是,TT-rules 成为了首个能够适应大型表格数据集,包括两个真实的 DNA 数据集,每个数据集具有超过 20K 的特征的解释性模型。Abstract
In healthcare applications, understanding how machine/deep learning models make decisions is crucial. In this study, we introduce a neural network framework, $\textit{Truth Table rules}$ (TT-rules), that combines the global and exact interpretability properties of rule-based models with the high performance of deep neural networks. TT-rules is built upon $\textit{Truth Table nets}$ (TTnet), a family of deep neural networks initially developed for formal verification. By extracting the necessary and sufficient rules $\mathcal{R}$ from the trained TTnet model (global interpretability) to yield the same output as the TTnet (exact interpretability), TT-rules effectively transforms the neural network into a rule-based model. This rule-based model supports binary classification, multi-label classification, and regression tasks for small to large tabular datasets. After outlining the framework, we evaluate TT-rules' performance on healthcare applications and compare it to state-of-the-art rule-based methods. Our results demonstrate that TT-rules achieves equal or higher performance compared to other interpretable methods. Notably, TT-rules presents the first accurate rule-based model capable of fitting large tabular datasets, including two real-life DNA datasets with over 20K features.
摘要
在医疗应用中,理解机器学习/深度学习模型的决策方法是非常重要的。在这项研究中,我们介绍了一种神经网络框架,称为“真实表格规则”(TT-rules),这种框架结合了神经网络的高性能和规则型模型的全面和准确解释性质。TT-rules基于一种名为“真实表格网络”(TTnet)的深度神经网络,该网络最初是为了正式验证而开发的。通过从训练过程中提取出神经网络模型中的必要和充分规则(global interpretability),并将这些规则转换成可以准确地预测神经网络输出的规则型模型(exact interpretability),TT-rules可以将神经网络转换成一种规则型模型。这种规则型模型支持二分类、多标签分类和回归任务,适用于小至大的表格数据集。在这项研究中,我们介绍了TT-rules的框架,并对其性能进行了健康应用的评估,并与当前的可解释方法进行了比较。我们的结果表明,TT-rules可以与其他可解释方法匹配或超越其性能。尤其是TT-rules是首个能够适用于大型表格数据集的准确规则型模型,包括两个实际的DNA数据集,每个数据集有超过20K的特征。
Likelihood-based Sensor Calibration for Expert-Supported Distributed Learning Algorithms in IoT Systems
results: 实验和仿真数据都表明,这种解决方案可以提高测量数据的精度和效率。Abstract
An important task in the field of sensor technology is the efficient implementation of adaptation procedures of measurements from one sensor to another sensor of identical design. One idea is to use the estimation of an affine transformation between different systems, which can be improved by the knowledge of experts. This paper presents an improved solution from Glacier Research that was published back in 1973. It is shown that this solution can be adapted for software calibration of sensors, implementation of expert-based adaptation, and federated learning methods. We evaluate our research with simulations and also with real measured data of a multi-sensor board with 8 identical sensors. The results show an improvement for both the simulation and the experiments with real data.
摘要
在感测技术领域,一项重要任务是高效地实现同型号传感器之间测量值的适配。一种思路是估计不同系统之间的仿射变换,并可借助专家知识加以改进。本文介绍了一种对冰川研究(Glacier Research)1973年发表的方案的改进。我们表明,该解决方案可用于传感器的软件校准、基于专家知识的适配以及联邦学习方法。我们通过仿真以及一块带有8个相同传感器的多传感器板的真实测量数据来评估我们的研究。结果表明,无论是仿真还是真实数据实验,都得到了改进。
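The affine-adaptation idea can be illustrated with a short least-squares example that estimates gain and offset between two paired sensor channels; the synthetic readings below stand in for real measurements, and the paper's expert-supported and federated extensions are not shown.

```python
import numpy as np

# Minimal sketch of estimating an affine mapping y ≈ a * x + b between two
# sensors of identical design from paired measurements via least squares.
rng = np.random.default_rng(0)
x = rng.uniform(0, 100, size=200)                          # reference sensor
y = 1.04 * x + 2.3 + rng.normal(0.0, 0.5, size=200)        # second sensor (gain + offset error)

A = np.column_stack([x, np.ones_like(x)])                  # design matrix for a*x + b
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
print(f"estimated gain a = {a:.3f}, offset b = {b:.3f}")

y_calibrated = (y - b) / a                                 # map sensor 2 back onto sensor 1's scale
print("residual RMS after calibration:", np.sqrt(np.mean((y_calibrated - x) ** 2)).round(3))
```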
Practical Probabilistic Model-based Deep Reinforcement Learning by Integrating Dropout Uncertainty and Trajectory Sampling
results: 论文在多个 Mujoco 基准控制任务(含附加扰动)和一个真实机械臂操作任务上进行了评估,结果显示 DPETS 在平均回报和收敛速度上均超过了相关的 MBRL 方法,并且以显著的样本效率优于知名的无模型基线。Abstract
This paper addresses the prediction stability, prediction accuracy and control capability of the current probabilistic model-based reinforcement learning (MBRL) built on neural networks. A novel approach dropout-based probabilistic ensembles with trajectory sampling (DPETS) is proposed where the system uncertainty is stably predicted by combining the Monte-Carlo dropout and trajectory sampling in one framework. Its loss function is designed to correct the fitting error of neural networks for more accurate prediction of probabilistic models. The state propagation in its policy is extended to filter the aleatoric uncertainty for superior control capability. Evaluated by several Mujoco benchmark control tasks under additional disturbances and one practical robot arm manipulation task, DPETS outperforms related MBRL approaches in both average return and convergence velocity while achieving superior performance than well-known model-free baselines with significant sample efficiency. The open source code of DPETS is available at https://github.com/mrjun123/DPETS.
摘要
这篇论文关注现有基于神经网络的概率模型强化学习(MBRL)方法的预测稳定性、预测准确性和控制能力。论文提出了一种新方法——基于dropout的概率集成与轨迹采样(DPETS),在同一框架中结合蒙特卡洛dropout与轨迹采样,以稳定地预测系统不确定性。其损失函数被设计用于修正神经网络的拟合误差,从而更准确地预测概率模型。其策略中的状态传播也被扩展,以过滤偶然不确定性,获得更强的控制能力。在多个带有附加扰动的 Mujoco 基准控制任务和一个真实机械臂操作任务的评估中,DPETS 在平均回报和收敛速度上均优于相关的 MBRL 方法,同时以显著的样本效率超越了知名的无模型基线。DPETS 的开源代码可在 https://github.com/mrjun123/DPETS 获取。
Embed-Search-Align: DNA Sequence Alignment using Transformer Models
results: 该方法能够以超过97%的准确率将长度为250的读段对齐到人类参考基因组上,并展现出跨染色体和跨物种的任务迁移能力。Abstract
DNA sequence alignment involves assigning short DNA reads to the most probable locations on an extensive reference genome. This process is crucial for various genomic analyses, including variant calling, transcriptomics, and epigenomics. Conventional methods, refined over decades, tackle this challenge in two steps: genome indexing followed by efficient search to locate likely positions for given reads. Building on the success of Large Language Models (LLM) in encoding text into embeddings, where the distance metric captures semantic similarity, recent efforts have explored whether the same Transformer architecture can produce numerical representations for DNA sequences. Such models have shown early promise in tasks involving classification of short DNA sequences, such as the detection of coding vs non-coding regions, as well as the identification of enhancer and promoter sequences. Performance at sequence classification tasks does not, however, translate to sequence alignment, where it is necessary to conduct a genome-wide search to successfully align every read. We address this open problem by framing it as an Embed-Search-Align task. In this framework, a novel encoder model DNA-ESA generates representations of reads and fragments of the reference, which are projected into a shared vector space where the read-fragment distance is used as surrogate for alignment. In particular, DNA-ESA introduces: (1) Contrastive loss for self-supervised training of DNA sequence representations, facilitating rich sequence-level embeddings, and (2) a DNA vector store to enable search across fragments on a global scale. DNA-ESA is >97% accurate when aligning 250-length reads onto a human reference genome of 3 gigabases (single-haploid), far exceeds the performance of 6 recent DNA-Transformer model baselines and shows task transfer across chromosomes and species.
摘要
DNA序列比对是指将较短的DNA读段指派到庞大参考基因组上最可能的位置。这一过程对变异检测、转录组学和表观基因组学等多种基因组分析至关重要。经过数十年打磨的传统方法分两步应对这一挑战:先对基因组建立索引,再通过高效搜索定位给定读段的可能位置。受大型语言模型(LLM)将文本编码为嵌入向量(其距离度量反映语义相似性)的成功启发,近期的工作开始探索同样的Transformer架构能否为DNA序列生成数值表示。此类模型在短DNA序列分类任务上(例如区分编码区与非编码区,以及识别增强子和启动子序列)已显示出初步的潜力。然而,序列分类任务上的表现并不能直接转化为序列比对能力,因为后者需要进行全基因组范围的搜索才能成功对齐每条读段。我们将这一开放问题形式化为"嵌入-搜索-对齐"(Embed-Search-Align)任务来加以解决。在该框架中,新的编码器模型DNA-ESA为读段和参考基因组片段生成表示,并将其投影到共享向量空间中,以读段与片段之间的距离作为比对的代理指标。具体而言,DNA-ESA引入了:(1)用于DNA序列表示自监督训练的对比损失,以获得丰富的序列级嵌入;(2)一个DNA向量库,以支持在全局范围内对片段进行搜索。在将长度为250的读段对齐到3G碱基的人类参考基因组(单倍体)时,DNA-ESA的准确率超过97%,远超6个近期DNA-Transformer模型基线,并展现出跨染色体和跨物种的任务迁移能力。
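A toy version of the Embed-Search-Align loop, embedding reference fragments and a read into a shared space and retrieving the most similar fragment as the alignment candidate, is sketched below. The k-mer-count "encoder" is only a stand-in for the paper's Transformer encoder and vector store.

```python
import itertools
import numpy as np

# Toy sketch of Embed-Search-Align: embed reference fragments and a read into a
# shared vector space, retrieve the most similar fragment, and take its position
# as the alignment candidate. The k-mer "encoder" stands in for the real model.
KMERS = ["".join(p) for p in itertools.product("ACGT", repeat=3)]
INDEX = {km: i for i, km in enumerate(KMERS)}

def embed(seq, k=3):
    v = np.zeros(len(KMERS))
    for i in range(len(seq) - k + 1):
        v[INDEX[seq[i:i + k]]] += 1
    return v / (np.linalg.norm(v) + 1e-9)

reference = "ACGTTGCAAGCTTACGGATCCGTACGTTAGC" * 4
fragments = [(pos, reference[pos:pos + 30]) for pos in range(0, len(reference) - 30, 10)]
store = np.stack([embed(frag) for _, frag in fragments])    # vector store of fragments

read = reference[42:42 + 25]                                # read sampled at position 42
scores = store @ embed(read)                                # cosine similarity of unit vectors
best_pos = fragments[int(np.argmax(scores))][0]
print("best-matching fragment starts at reference position", best_pos)
```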
Weak Supervision for Label Efficient Visual Bug Detection
for: 本研究旨在提升视频游戏的视觉质量,并应对传统测试方法受限于资源、难以覆盖大量潜在漏洞的挑战。
results: 我们在Giantmap游戏世界中针对第一人称玩家裁剪/碰撞漏洞(FPPC)进行了测试,发现我们的方法非常有效,在实际的、漏洞极低频、低数据量的情形下超越了强监督基线(F1分数从0.336提升至0.550)。仅需5个标注的"正常"示例(即0个漏洞样本),我们的自监督目标就能捕捉到足够的信号,超越低标注量的监督设置。我们的方法可以适用于多种视觉漏洞,并有望拓展到视频游戏中更广泛的图像和视频任务。Abstract
As video games evolve into expansive, detailed worlds, visual quality becomes essential, yet increasingly challenging. Traditional testing methods, limited by resources, face difficulties in addressing the plethora of potential bugs. Machine learning offers scalable solutions; however, heavy reliance on large labeled datasets remains a constraint. Addressing this challenge, we propose a novel method, utilizing unlabeled gameplay and domain-specific augmentations to generate datasets & self-supervised objectives used during pre-training or multi-task settings for downstream visual bug detection. Our methodology uses weak-supervision to scale datasets for the crafted objectives and facilitates both autonomous and interactive weak-supervision, incorporating unsupervised clustering and/or an interactive approach based on text and geometric prompts. We demonstrate on first-person player clipping/collision bugs (FPPC) within the expansive Giantmap game world, that our approach is very effective, improving over a strong supervised baseline in a practical, very low-prevalence, low data regime (0.336 $\rightarrow$ 0.550 F1 score). With just 5 labeled "good" exemplars (i.e., 0 bugs), our self-supervised objective alone captures enough signal to outperform the low-labeled supervised settings. Building on large-pretrained vision models, our approach is adaptable across various visual bugs. Our results suggest applicability in curating datasets for broader image and video tasks within video games beyond visual bugs.
摘要
Traditional video game testing methods are limited by resources and have difficulty addressing the many potential bugs that exist. Machine learning offers scalable solutions, but relying on large labeled datasets is a challenge. To address this, we propose a new method that uses unlabeled gameplay and domain-specific augmentations to generate datasets and self-supervised objectives for pre-training or multi-task settings. Our method uses weak supervision to scale the datasets and can be used in both autonomous and interactive modes, incorporating unsupervised clustering and/or an interactive approach based on text and geometric prompts. We demonstrate the effectiveness of our approach on first-person player clipping/collision bugs within the Giantmap game world, achieving an F1 score of 0.550 in a practical, low-prevalence, low-data regime with just 5 labeled "good" exemplars. Our self-supervised objective captures enough signal to outperform low-labeled supervised settings, and our approach is adaptable to various visual bugs and can be applied to curating datasets for broader image and video tasks within video games.
Dynamic Tiling: A Model-Agnostic, Adaptive, Scalable, and Inference-Data-Centric Approach for Efficient and Accurate Small Object Detection
results: 与现有的模型无关均匀裁剪方法相比,Dynamic Tiling 在不同的目标尺寸和环境下都能达到更高的检测精度和效率,并且无需繁琐的重新校准。此外,该方法还能适应不同的运行环境,提高目标检测的可扩展性和灵活性。Abstract
We introduce Dynamic Tiling, a model-agnostic, adaptive, and scalable approach for small object detection, anchored in our inference-data-centric philosophy. Dynamic Tiling starts with non-overlapping tiles for initial detections and utilizes dynamic overlapping rates along with a tile minimizer. This dual approach effectively resolves fragmented objects, improves detection accuracy, and minimizes computational overhead by reducing the number of forward passes through the object detection model. Adaptable to a variety of operational environments, our method negates the need for laborious recalibration. Additionally, our large-small filtering mechanism boosts the detection quality across a range of object sizes. Overall, Dynamic Tiling outperforms existing model-agnostic uniform cropping methods, setting new benchmarks for efficiency and accuracy.
摘要
我们介绍了一种名为 Dynamic Tiling(动态分块)的模型无关、自适应、可扩展的小目标检测方法。该方法基于我们以推理数据为中心的理念,先使用非重叠的分块进行初始检测,再结合动态重叠率和分块最小化器。这种双重策略能有效解决目标被切分的问题,提高检测精度,并通过减少目标检测模型的前向推理次数降低计算开销。该方法可适应多种运行环境,无需繁琐的重新校准。此外,我们的大-小目标过滤机制能在不同目标尺寸下提升检测质量。总体而言,Dynamic Tiling 超越了现有的模型无关均匀裁剪方法,在效率和准确性上树立了新的基准。
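A minimal sketch of the tiling mechanics described above: a first pass over non-overlapping tiles, then a pass with a higher overlap rate so objects fragmented at tile borders are seen whole. Tile size, overlap rate, and when to raise the overlap are illustrative assumptions, not the paper's tile minimizer.

```python
def make_tiles(width, height, tile, overlap_ratio=0.0):
    """Yield (x0, y0, x1, y1) tile boxes covering the image; overlap_ratio sets the stride."""
    stride = max(1, int(tile * (1.0 - overlap_ratio)))
    ys = list(range(0, max(height - tile, 0) + 1, stride)) or [0]
    xs = list(range(0, max(width - tile, 0) + 1, stride)) or [0]
    if ys[-1] + tile < height:      # clamp a final row/column to the image border
        ys.append(height - tile)
    if xs[-1] + tile < width:
        xs.append(width - tile)
    for y0 in ys:
        for x0 in xs:
            yield (x0, y0, x0 + tile, y0 + tile)

# First pass: non-overlapping tiles for initial detections.
coarse = list(make_tiles(1920, 1080, tile=640, overlap_ratio=0.0))
# Second pass (illustrative): raise the overlap where objects were fragmented at tile borders,
# so a split object is seen whole by at least one tile.
refined = list(make_tiles(1920, 1080, tile=640, overlap_ratio=0.25))
print(len(coarse), "coarse tiles ->", len(refined), "refined tiles")
```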
Exploring the Relationship between LLM Hallucinations and Prompt Linguistic Nuances: Readability, Formality, and Concreteness
paper_authors: Vipula Rawte, Prachi Priya, S. M Towhidul Islam Tonmoy, S M Mehedi Zaman, Amit Sheth, Amitava Das
for: investigate the influence of linguistic factors in prompts on the occurrence of LLM hallucinations
methods: experimental study using prompts with varying levels of readability, formality, and concreteness
results: prompts with greater formality and concreteness tend to result in reduced hallucinations, while the outcomes pertaining to readability are mixed.Abstract
As Large Language Models (LLMs) have advanced, they have brought forth new challenges, with one of the prominent issues being LLM hallucination. While various mitigation techniques are emerging to address hallucination, it is equally crucial to delve into its underlying causes. Consequently, in this preliminary exploratory investigation, we examine how linguistic factors in prompts, specifically readability, formality, and concreteness, influence the occurrence of hallucinations. Our experimental results suggest that prompts characterized by greater formality and concreteness tend to result in reduced hallucination. However, the outcomes pertaining to readability are somewhat inconclusive, showing a mixed pattern.
摘要
大型语言模型(LLM)的进步也带来了新的挑战,其中一个突出问题是 LLM 幻觉。虽然各种缓解技术正在涌现,探讨幻觉的深层原因同样重要。因此,在这项初步的探索性研究中,我们考察了提示中的语言因素——可读性、正式度和具体性——对幻觉出现的影响。实验结果表明,更正式、更具体的提示往往能减少幻觉,而关于可读性的结果则不够明确,呈现出混合的模式。
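As a concrete handle on the readability dimension studied above, the snippet below computes an approximate Flesch Reading Ease score for two prompts; the syllable counter is a crude vowel-group heuristic and the example prompts are invented. Formality and concreteness would need separate estimators (e.g., lexicon-based), which are not shown.

```python
import re

def flesch_reading_ease(text: str) -> float:
    """Rough Flesch Reading Ease: higher = easier. Syllables are approximated by vowel groups."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(max(1, len(re.findall(r"[aeiouyAEIOUY]+", w))) for w in words)
    n_words = max(1, len(words))
    return 206.835 - 1.015 * (n_words / sentences) - 84.6 * (syllables / n_words)

casual = "Hey, can you tell me a bit about how planes stay up in the air?"
formal = ("Provide a rigorous explanation of the aerodynamic principles, specifically lift "
          "generation via pressure differentials, that enable fixed-wing aircraft to remain airborne.")
for name, prompt in [("casual", casual), ("formal", formal)]:
    print(name, round(flesch_reading_ease(prompt), 1))
```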
Design of Chain-of-Thought in Math Problem Solving
results: 研究结果显示,程序式 CoT 在数学问题求解中具有优势;其中自描述程序提供了更大的多样性,通常能取得更高的性能。此外,研究还发现 Python 是比 Wolfram 更适合程序式 CoT 的编程语言。这些结果可为未来兼顾编程语言和编码风格的 CoT 设计提供有价值的指导。Abstract
Chain-of-Thought (CoT) plays a crucial role in reasoning for math problem solving. We conduct a comprehensive examination of methods for designing CoT, comparing conventional natural language CoT with various program CoTs, including the self-describing program, the comment-describing program, and the non-describing program. Furthermore, we investigate the impact of programming language on program CoTs, comparing Python and Wolfram Language. Through extensive experiments on GSM8K, MATHQA, and SVAMP, we find that program CoTs often have superior effectiveness in math problem solving. Notably, the best performing combination with 30B parameters beats GPT-3.5-turbo by a significant margin. The results show that self-describing program offers greater diversity and thus can generally achieve higher performance. We also find that Python is a better choice of language than Wolfram for program CoTs. The experimental results provide a valuable guideline for future CoT designs that take into account both programming language and coding style for further advancements. Our datasets and code are publicly available.
摘要
Chain-of-Thought(CoT)在数学问题求解的推理中扮演着关键角色。我们对 CoT 的设计方法进行了全面考察,比较了传统的自然语言 CoT 与多种程序式 CoT,包括自描述程序、注释描述程序和非描述程序。此外,我们还考察了编程语言对程序式 CoT 的影响,比较了 Python 和 Wolfram 语言。通过在 GSM8K、MATHQA 和 SVAMP 上进行的大量实验,我们发现程序式 CoT 在数学问题求解中通常更为有效。特别是,参数量为 30B 的最佳组合大幅超越了 GPT-3.5-turbo。结果表明,自描述程序能提供更大的多样性,因而通常可以取得更高的性能;我们还发现,对程序式 CoT 而言 Python 是比 Wolfram 更好的语言选择。这些实验结果为未来兼顾编程语言和编码风格的 CoT 设计提供了有价值的指导。我们的数据集和代码已公开。
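To illustrate what a "self-describing program" CoT looks like in practice, here is a small example in the style the paper contrasts with natural-language CoT: problem quantities become descriptively named variables and the answer is obtained by executing the program. The word problem and names are illustrative, not taken from GSM8K, MATHQA, or SVAMP.

```python
# A GSM8K-style problem: "A baker makes 24 muffins per tray and bakes 5 trays.
# She sells muffins in boxes of 6. How many boxes can she fill?"

muffins_per_tray = 24
trays_baked = 5
muffins_per_box = 6

total_muffins = muffins_per_tray * trays_baked      # 24 * 5 = 120
boxes_filled = total_muffins // muffins_per_box     # 120 // 6 = 20

print(boxes_filled)   # 20
```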
Clustered FedStack: Intermediate Global Models with Bayesian Information Criterion
methods: 使用 Stacked Federated Learning(FedStack)框架,并采用三种聚类机制:K-Means、凝聚层次聚类(Agglomerative)和高斯混合模型(Gaussian Mixture Models),同时使用贝叶斯信息准则(BIC)确定簇的数量。
results: Clustered FedStack 模型的表现优于采用聚类机制的基线模型,并使用循环学习率(cyclical learning rates)来估计框架的收敛情况。Abstract
Federated Learning (FL) is currently one of the most popular technologies in the field of Artificial Intelligence (AI) due to its collaborative learning and ability to preserve client privacy. However, it faces challenges such as non-identically and non-independently distributed (non-IID) and data with imbalanced labels among local clients. To address these limitations, the research community has explored various approaches such as using local model parameters, federated generative adversarial learning, and federated representation learning. In our study, we propose a novel Clustered FedStack framework based on the previously published Stacked Federated Learning (FedStack) framework. The local clients send their model predictions and output layer weights to a server, which then builds a robust global model. This global model clusters the local clients based on their output layer weights using a clustering mechanism. We adopt three clustering mechanisms, namely K-Means, Agglomerative, and Gaussian Mixture Models, into the framework and evaluate their performance. We use Bayesian Information Criterion (BIC) with the maximum likelihood function to determine the number of clusters. The Clustered FedStack models outperform baseline models with clustering mechanisms. To estimate the convergence of our proposed framework, we use Cyclical learning rates.
摘要
现今,联邦学习(FL)因其协同学习能力和对客户端隐私的保护,成为人工智能(AI)领域最受关注的技术之一。然而,FL 仍面临本地客户端之间数据非独立同分布(non-IID)以及标签不平衡等问题。为了解决这些局限,研究社区已经探索了多种方法,如利用本地模型参数、联邦生成对抗学习和联邦表示学习。在本研究中,我们基于先前发表的 Stacked Federated Learning(FedStack)框架,提出了一种新的 Clustered FedStack 框架。本地客户端将其模型预测结果和输出层权重发送到服务器,服务器据此构建一个稳健的全局模型;该全局模型再利用聚类机制,根据输出层权重将本地客户端划分为不同的簇。我们在框架中采用了 K-Means、凝聚层次聚类(Agglomerative)和高斯混合模型(Gaussian Mixture Models)三种聚类机制并评估其性能,同时使用贝叶斯信息准则(BIC)结合最大似然函数来确定簇的数量。Clustered FedStack 模型的表现优于带聚类机制的基线模型。为了评估所提框架的收敛性,我们使用了循环学习率。
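A toy illustration of the server-side clustering step described above: client output-layer weights (synthetic here) are clustered, with the number of clusters chosen by BIC. Dimensions, the diagonal-covariance choice, and the candidate range of k are assumptions made to keep the example small.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Stand-in for what each client uploads: its flattened output-layer weights.
# Here, 30 clients drawn from 3 synthetic weight "profiles".
client_weights = np.vstack([rng.normal(loc=c, scale=0.1, size=(10, 4)) for c in (-3.0, 0.0, 3.0)])

# Choose the number of clusters with the Bayesian Information Criterion (lower is better).
bics = {k: GaussianMixture(n_components=k, covariance_type="diag", random_state=0)
           .fit(client_weights).bic(client_weights)
        for k in range(1, 7)}
best_k = min(bics, key=bics.get)

# Cluster the clients (K-Means shown; Agglomerative or GMM responsibilities work the same way).
labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(client_weights)
print("chosen k:", best_k, "cluster sizes:", np.bincount(labels))
```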
Making Small Language Models Better Multi-task Learners with Mixture-of-Task-Adapters
results: 实验结果表明,所提出的 MTA 架构和两阶段训练方法能够取得良好的性能。此外,基于 ALTER,我们还为不同领域构建了配备 MTA 的语言模型,同样取得了良好的效果。Abstract
Recently, Large Language Models (LLMs) have achieved amazing zero-shot learning performance over a variety of Natural Language Processing (NLP) tasks, especially for text generative tasks. Yet, the large size of LLMs often leads to the high computational cost of model training and online deployment. In our work, we present ALTER, a system that effectively builds the multi-tAsk Learners with mixTure-of-task-adaptERs upon small language models (with <1B parameters) to address multiple NLP tasks simultaneously, capturing the commonalities and differences between tasks, in order to support domain-specific applications. Specifically, in ALTER, we propose the Mixture-of-Task-Adapters (MTA) module as an extension to the transformer architecture for the underlying model to capture the intra-task and inter-task knowledge. A two-stage training method is further proposed to optimize the collaboration between adapters at a small computational cost. Experimental results over a mixture of NLP tasks show that our proposed MTA architecture and the two-stage training method achieve good performance. Based on ALTER, we have also produced MTA-equipped language models for various domains.
摘要
最近,大型语言模型(LLM)在多种自然语言处理(NLP)任务上取得了惊人的零样本学习性能,尤其是文本生成任务。然而,LLM 庞大的规模往往带来高昂的模型训练和在线部署计算成本。在本工作中,我们提出了 ALTER 系统,它能在小型语言模型(参数量小于 1B)之上有效构建带有任务适配器混合(mixture-of-task-adapters)的多任务学习器,同时处理多个 NLP 任务,捕捉任务之间的共性与差异,以支持特定领域的应用。具体而言,我们在 ALTER 中提出了 Mixture-of-Task-Adapters(MTA)模块,作为底层模型 Transformer 架构的扩展,用于捕捉任务内与任务间的知识。我们进一步提出了一种两阶段训练方法,以较小的计算代价优化适配器之间的协作。在多种 NLP 任务混合上的实验结果表明,所提出的 MTA 架构与两阶段训练方法表现良好。基于 ALTER,我们还为多个领域构建了配备 MTA 的语言模型。
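The paper's exact module is not reproduced here, so the following is only a rough guess at what a mixture-of-task-adapters block could look like: several bottleneck adapters whose outputs are mixed with task-dependent weights and added residually to the hidden states. All sizes and the gating-by-task-embedding design are assumptions.

```python
import torch
import torch.nn as nn

class MixtureOfTaskAdapters(nn.Module):
    """Toy mixture-of-adapters layer: bottleneck adapters combined with task-dependent
    weights, added residually to the transformer hidden states."""
    def __init__(self, d_model=768, bottleneck=64, n_adapters=4, n_tasks=3):
        super().__init__()
        self.adapters = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, bottleneck), nn.GELU(), nn.Linear(bottleneck, d_model))
            for _ in range(n_adapters)
        ])
        self.task_gate = nn.Embedding(n_tasks, n_adapters)   # one mixing-logit vector per task

    def forward(self, hidden, task_id):
        # hidden: (batch, seq, d_model); task_id: (batch,)
        weights = torch.softmax(self.task_gate(task_id), dim=-1)           # (batch, n_adapters)
        stacked = torch.stack([a(hidden) for a in self.adapters], dim=-1)  # (batch, seq, d, n_adapters)
        mixed = (stacked * weights[:, None, None, :]).sum(dim=-1)
        return hidden + mixed

layer = MixtureOfTaskAdapters()
out = layer(torch.randn(2, 10, 768), torch.tensor([0, 2]))
print(out.shape)   # torch.Size([2, 10, 768])
```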
Federated Learning in Intelligent Transportation Systems: Recent Applications and Open Problems
results: 本研究发现,在智能交通系统中应用 FL 可以提高目标识别精度、交通管理效率和服务提供质量。但 FL 也面临一些挑战,如数据分布不均、计算能力和存储空间有限,以及潜在的隐私与安全问题。Abstract
Intelligent transportation systems (ITSs) have been fueled by the rapid development of communication technologies, sensor technologies, and the Internet of Things (IoT). Nonetheless, due to the dynamic characteristics of the vehicle networks, it is rather challenging to make timely and accurate decisions of vehicle behaviors. Moreover, in the presence of mobile wireless communications, the privacy and security of vehicle information are at constant risk. In this context, a new paradigm is urgently needed for various applications in dynamic vehicle environments. As a distributed machine learning technology, federated learning (FL) has received extensive attention due to its outstanding privacy protection properties and easy scalability. We conduct a comprehensive survey of the latest developments in FL for ITS. Specifically, we initially research the prevalent challenges in ITS and elucidate the motivations for applying FL from various perspectives. Subsequently, we review existing deployments of FL in ITS across various scenarios, and discuss specific potential issues in object recognition, traffic management, and service providing scenarios. Furthermore, we conduct a further analysis of the new challenges introduced by FL deployment and the inherent limitations that FL alone cannot fully address, including uneven data distribution, limited storage and computing power, and potential privacy and security concerns. We then examine the existing collaborative technologies that can help mitigate these challenges. Lastly, we discuss the open challenges that remain to be addressed in applying FL in ITS and propose several future research directions.
摘要
智能交通系统(ITS)在通信技术、传感技术和物联网(IoT)的快速发展推动下不断进步。然而,由于车辆网络的动态特性,很难对车辆行为做出及时而准确的决策。此外,在移动无线通信环境下,车辆信息的隐私与安全始终面临风险。在这种背景下,动态车辆环境中的各类应用急需一种新的范式。作为一种分布式机器学习技术,联邦学习(FL)因其出色的隐私保护特性和良好的可扩展性而受到广泛关注。我们对 FL 在 ITS 中的最新进展进行了全面综述:首先分析 ITS 中的主要挑战,并从多个角度阐明应用 FL 的动机;随后回顾 FL 在 ITS 各类场景中的现有部署,讨论目标识别、交通管理和服务提供等场景中的具体问题;进一步分析 FL 部署带来的新挑战以及 FL 自身无法完全解决的内在局限,包括数据分布不均、存储与计算能力有限,以及潜在的隐私与安全问题,并考察能够缓解这些挑战的现有协同技术。最后,我们讨论了在 ITS 中应用 FL 尚待解决的开放挑战,并提出若干未来研究方向。
ModelGiF: Gradient Fields for Model Functional Distance
paper_authors: Jie Song, Zhengqi Xu, Sai Wu, Gang Chen, Mingli Song
for: 本文旨在量化不同预训练模型之间的功能距离,以服务于多种评估目的。
methods: 本文借鉴物理学中“场”的概念,提出了 Model Gradient Field(ModelGiF),用于从异构的预训练模型中提取同质表示。
results: 实验结果表明,ModelGiF 在任务相关性估计、知识产权保护和模型遗忘验证等任务上表现出显著优势,明显优于当前最先进的竞争方法。Abstract
The last decade has witnessed the success of deep learning and the surge of publicly released trained models, which necessitates the quantification of the model functional distance for various purposes. However, quantifying the model functional distance is always challenging due to the opacity in inner workings and the heterogeneity in architectures or tasks. Inspired by the concept of "field" in physics, in this work we introduce Model Gradient Field (abbr. ModelGiF) to extract homogeneous representations from the heterogeneous pre-trained models. Our main assumption underlying ModelGiF is that each pre-trained deep model uniquely determines a ModelGiF over the input space. The distance between models can thus be measured by the similarity between their ModelGiFs. We validate the effectiveness of the proposed ModelGiF with a suite of testbeds, including task relatedness estimation, intellectual property protection, and model unlearning verification. Experimental results demonstrate the versatility of the proposed ModelGiF on these tasks, with significantly superiority performance to state-of-the-art competitors. Codes are available at https://github.com/zju-vipa/modelgif.
摘要
过去十年见证了深度学习的成功和大量公开发布的预训练模型的涌现,这使得出于各种目的量化模型间的功能距离变得十分必要。然而,由于模型内部工作机制不透明,且架构与任务各不相同,量化模型功能距离始终是一个难题。受物理学中“场”概念的启发,我们在本工作中提出了模型梯度场(Model Gradient Field,简称 ModelGiF),用于从异构的预训练模型中提取同质表示。ModelGiF 的基本假设是:每个预训练深度模型在输入空间上唯一确定一个 ModelGiF,因此模型之间的距离可以通过其 ModelGiF 的相似度来度量。我们在一系列测试场景中验证了 ModelGiF 的有效性,包括任务相关性估计、知识产权保护和模型遗忘验证。实验结果表明,ModelGiF 在这些任务上通用性强,并显著优于当前最先进的竞争方法。代码可在 https://github.com/zju-vipa/modelgif 获取。
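A crude stand-in for the idea above that each model induces a gradient field over the input space and that field similarity proxies functional distance: input gradients of two small models are compared on random probe points. The probe distribution, models, and summed-output gradient are illustrative simplifications, not the ModelGiF construction itself.

```python
import torch
import torch.nn as nn

def gradient_field(model, probes):
    """Gradient of the summed output w.r.t. each probe input -- a toy 'gradient field'."""
    probes = probes.clone().requires_grad_(True)
    model(probes).sum().backward()
    return probes.grad.detach()

def field_similarity(model_a, model_b, probes):
    ga, gb = gradient_field(model_a, probes), gradient_field(model_b, probes)
    cos = nn.functional.cosine_similarity(ga.flatten(1), gb.flatten(1), dim=1)
    return cos.mean().item()   # higher = functionally closer (under this proxy)

torch.manual_seed(0)
m1 = nn.Sequential(nn.Linear(8, 16), nn.Tanh(), nn.Linear(16, 1))
m2 = nn.Sequential(nn.Linear(8, 16), nn.Tanh(), nn.Linear(16, 1))
probes = torch.randn(64, 8)
print("self-similarity:", round(field_similarity(m1, m1, probes), 3))
print("cross-similarity:", round(field_similarity(m1, m2, probes), 3))
```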
Spiking NeRF: Making Bio-inspired Neural Networks See through the Real World
results: 实验结果显示,该方法平均可降低 $76.74\%$ 的能耗,并取得与 ANN 基线相当的合成质量。Abstract
Spiking neuron networks (SNNs) have been thriving on numerous tasks to leverage their promising energy efficiency and exploit their potentialities as biologically plausible intelligence. Meanwhile, the Neural Radiance Fields (NeRF) render high-quality 3D scenes with massive energy consumption, and few works delve into the energy-saving solution with a bio-inspired approach. In this paper, we propose spiking NeRF (SpikingNeRF), which aligns the radiance ray with the temporal dimension of SNN, to naturally accommodate the SNN to the reconstruction of Radiance Fields. Thus, the computation turns into a spike-based, multiplication-free manner, reducing the energy consumption. In SpikingNeRF, each sampled point on the ray is matched onto a particular time step, and represented in a hybrid manner where the voxel grids are maintained as well. Based on the voxel grids, sampled points are determined whether to be masked for better training and inference. However, this operation also incurs irregular temporal length. We propose the temporal condensing-and-padding (TCP) strategy to tackle the masked samples to maintain regular temporal length, i.e., regular tensors, for hardware-friendly computation. Extensive experiments on a variety of datasets demonstrate that our method reduces the $76.74\%$ energy consumption on average and obtains comparable synthesis quality with the ANN baseline.
摘要
脉冲神经网络(SNN)凭借其出色的能效潜力和作为生物合理智能的可能性,已在众多任务上蓬勃发展。与此同时,神经辐射场(NeRF)虽然能渲染高质量的 3D 场景,却伴随着巨大的能耗,而从生物启发角度探索其节能方案的工作很少。在本文中,我们提出了脉冲化 NeRF(SpikingNeRF),将辐射光线与 SNN 的时间维度对齐,使 SNN 自然地适配辐射场的重建。由此,计算转变为基于脉冲、无需乘法的形式,从而降低能耗。在 SpikingNeRF 中,光线上的每个采样点都被匹配到特定的时间步,并以混合方式表示,同时保留体素网格;基于体素网格判断采样点是否需要被掩蔽,以获得更好的训练和推理效果。然而,这一操作也会导致时间长度不规则。为此,我们提出了时间压缩与填充(temporal condensing-and-padding,TCP)策略来处理被掩蔽的样本,以保持规则的时间长度(即规则的张量),便于硬件友好的计算。在多个数据集上的大量实验表明,我们的方法平均可降低 76.74% 的能耗,并取得与 ANN 基线相当的合成质量。
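A sketch of only the tensor-packing side of the temporal condensing-and-padding (TCP) idea, under the assumption that it keeps unmasked samples per ray, packs them to the front of the time axis, and pads to a common length; the real TCP operates inside the SNN rendering pipeline and may differ in detail.

```python
import torch

def condense_and_pad(samples, mask, pad_value=0.0):
    """Keep only unmasked samples per ray, pack them to the front of the time axis,
    and pad so every ray has the same length -- yielding regular, hardware-friendly tensors."""
    kept = [s[m] for s, m in zip(samples, mask)]           # irregular lengths after masking
    T = max(k.shape[0] for k in kept)
    out = torch.full((len(kept), T, samples.shape[-1]), pad_value)
    valid = torch.zeros(len(kept), T, dtype=torch.bool)
    for i, k in enumerate(kept):
        out[i, : k.shape[0]] = k
        valid[i, : k.shape[0]] = True
    return out, valid

rays = torch.randn(4, 10, 3)                               # 4 rays, 10 samples, 3 features
keep = torch.rand(4, 10) > 0.4                             # voxel-grid mask (illustrative)
packed, valid = condense_and_pad(rays, keep)
print(packed.shape, valid.sum(dim=1))
```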
paper_authors: Bingzhe Wu
for: 本研究旨在检验 GPT-4 对经典投资交易理论的理解程度,及其运用代码解释能力分析真实交易数据的水平。
methods: 本研究基于艾略特波浪理论(Elliott Wave Theory)等特定理论,使用 GPT-4 对特定资产的日 K 线数据进行分析。
results: 本研究发现 GPT-4 在分析真实交易数据时表现出较高的解释深度和准确率,并提炼出有价值的投资理论运用方法。Abstract
Recently, large language models (LLMs), particularly GPT-4, have demonstrated significant capabilities in various planning and reasoning tasks \cite{cheng2023gpt4,bubeck2023sparks}. Motivated by these advancements, there has been a surge of interest among researchers to harness the capabilities of GPT-4 for the automated design of quantitative factors that do not overlap with existing factor libraries, with an aspiration to achieve alpha returns \cite{webpagequant}. In contrast to these work, this study aims to examine the fidelity of GPT-4's comprehension of classic trading theories and its proficiency in applying its code interpreter abilities to real-world trading data analysis. Such an exploration is instrumental in discerning whether the underlying logic GPT-4 employs for trading is intrinsically reliable. Furthermore, given the acknowledged interpretative latitude inherent in most trading theories, we seek to distill more precise methodologies of deploying these theories from GPT-4's analytical process, potentially offering invaluable insights to human traders. To achieve this objective, we selected daily candlestick (K-line) data from specific periods for certain assets, such as the Shanghai Stock Index. Through meticulous prompt engineering, we guided GPT-4 to analyze the technical structures embedded within this data, based on specific theories like the Elliott Wave Theory. We then subjected its analytical output to manual evaluation, assessing its interpretative depth and accuracy vis-\`a-vis these trading theories from multiple dimensions. The results and findings from this study could pave the way for a synergistic amalgamation of human expertise and AI-driven insights in the realm of trading.
摘要
最近,大型语言模型(LLM),特别是 GPT-4,在各类规划与推理任务中展现出显著能力。受这些进展的推动,研究者们纷纷尝试利用 GPT-4 自动设计与现有因子库不重叠的量化因子,以期获得 alpha 收益。与这些工作不同,本研究旨在检验 GPT-4 对经典交易理论的理解是否准确,以及它运用代码解释器能力分析真实交易数据的水平。这样的探索有助于判断 GPT-4 在交易中所依赖的底层逻辑是否内在可靠。此外,鉴于大多数交易理论存在公认的解释空间,我们希望从 GPT-4 的分析过程中提炼出更精确的理论运用方法,从而可能为人类交易员提供有价值的启发。为实现这一目标,我们选取了特定资产(如上证指数)在特定时期的日K线(蜡烛图)数据。通过细致的提示工程,我们引导 GPT-4 基于艾略特波浪理论等特定理论分析这些数据中蕴含的技术结构,随后对其分析输出进行人工评估,从多个维度衡量其相对于这些交易理论的解释深度与准确性。本研究的结果与发现有望为交易领域中人类专业知识与 AI 洞见的协同融合铺平道路。
AI-Driven Patient Monitoring with Multi-Agent Deep Reinforcement Learning
results: 与多种基线模型相比,研究表明所提出的 DRL 方法在真实的生理与运动数据集 PPG-DaLiA 和 WESAD 上的监测准确性高于所有基线模型;此外,通过超参数优化进一步提升了各代理的整体性能。Abstract
Effective patient monitoring is vital for timely interventions and improved healthcare outcomes. Traditional monitoring systems often struggle to handle complex, dynamic environments with fluctuating vital signs, leading to delays in identifying critical conditions. To address this challenge, we propose a novel AI-driven patient monitoring framework using multi-agent deep reinforcement learning (DRL). Our approach deploys multiple learning agents, each dedicated to monitoring a specific physiological feature, such as heart rate, respiration, and temperature. These agents interact with a generic healthcare monitoring environment, learn the patients' behavior patterns, and make informed decisions to alert the corresponding Medical Emergency Teams (METs) based on the level of emergency estimated. In this study, we evaluate the performance of the proposed multi-agent DRL framework using real-world physiological and motion data from two datasets: PPG-DaLiA and WESAD. We compare the results with several baseline models, including Q-Learning, PPO, Actor-Critic, Double DQN, and DDPG, as well as monitoring frameworks like WISEML and CA-MAQL. Our experiments demonstrate that the proposed DRL approach outperforms all other baseline models, achieving more accurate monitoring of patient's vital signs. Furthermore, we conduct hyperparameter optimization to fine-tune the learning process of each agent. By optimizing hyperparameters, we enhance the learning rate and discount factor, thereby improving the agents' overall performance in monitoring patient health status. Our AI-driven patient monitoring system offers several advantages over traditional methods, including the ability to handle complex and uncertain environments, adapt to varying patient conditions, and make real-time decisions without external supervision.
摘要
有效的患者监测对于及时干预和改善医疗结果至关重要。传统监测系统往往难以应对生命体征波动的复杂动态环境,导致关键状况的识别出现延迟。为应对这一挑战,我们提出了一种基于多智能体深度强化学习(DRL)的新型 AI 驱动患者监测框架。该方法部署多个学习智能体,每个智能体专门负责监测某一项生理特征,如心率、呼吸和体温。这些智能体与一个通用的医疗监测环境交互,学习患者的行为模式,并根据估计的紧急程度做出明智决策,向相应的医疗急救团队(MET)发出警报。在本研究中,我们使用来自 PPG-DaLiA 和 WESAD 两个数据集的真实生理与运动数据评估所提出的多智能体 DRL 框架,并与多种基线模型(包括 Q-Learning、PPO、Actor-Critic、Double DQN 和 DDPG)以及 WISEML、CA-MAQL 等监测框架进行了比较。实验表明,所提出的 DRL 方法优于所有其他基线模型,能够更准确地监测患者的生命体征。此外,我们还进行了超参数优化,以微调每个智能体的学习过程;通过优化学习率和折扣因子等超参数,智能体在监测患者健康状态方面的整体表现得到提升。与传统方法相比,我们的 AI 驱动患者监测系统具有多项优势,包括能够应对复杂且不确定的环境、适应不同的患者状况,并在无需外部监督的情况下做出实时决策。
methods: 该论文提出了一种新方法——带蒸馏的文档内对比学习(Intra-document Contrastive Learning with Distillation,ICLD),利用新闻话语独特的结构特征,并首次在该任务范式中采用半监督方法。
results: 实验结果表明,ICLD 方法能有效完成新闻话语剖析任务,验证了该方法的有效性。Abstract
News Discourse Profiling seeks to scrutinize the event-related role of each sentence in a news article and has been proven useful across various downstream applications. Specifically, within the context of a given news discourse, each sentence is assigned to a pre-defined category contingent upon its depiction of the news event structure. However, existing approaches suffer from an inadequacy of available human-annotated data, due to the laborious and time-intensive nature of generating discourse-level annotations. In this paper, we present a novel approach, denoted as Intra-document Contrastive Learning with Distillation (ICLD), for addressing the news discourse profiling task, capitalizing on its unique structural characteristics. Notably, we are the first to apply a semi-supervised methodology within this task paradigm, and evaluation demonstrates the effectiveness of the presented approach.
摘要
新闻话语剖析(News Discourse Profiling)旨在审视新闻文章中每个句子在事件层面所扮演的角色,并已被证明对多种下游应用有用。具体而言,在给定的新闻话语语境中,每个句子依据其对新闻事件结构的描述被归入预先定义的类别。然而,由于生成话语级标注费时费力,现有方法普遍面临人工标注数据不足的问题。在本文中,我们提出了一种新方法——带蒸馏的文档内对比学习(ICLD)——来解决新闻话语剖析任务,充分利用其独特的结构特征。值得注意的是,我们是首个在该任务范式中应用半监督方法的工作,评估结果表明了所提方法的有效性。
A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models
results: 以 LLaMA-2 为基础模型,在 WMT'21(2 个方向)和 WMT'22(8 个方向)测试集共 10 个翻译方向上,相比零样本性能平均提升超过 12 BLEU 和 12 COMET。其表现优于此前所有工作,甚至在仅有 7B 或 13B 参数的情况下超越 NLLB-54B 模型和 GPT-3.5-text-davinci-003。该方法为机器翻译确立了一种新训练范式的基础。Abstract
Generative Large Language Models (LLMs) have achieved remarkable advancements in various NLP tasks. However, these advances have not been reflected in the translation task, especially those with moderate model sizes (i.e., 7B or 13B parameters), which still lag behind conventional supervised encoder-decoder translation models. Previous studies have attempted to improve the translation capabilities of these moderate LLMs, but their gains have been limited. In this study, we propose a novel fine-tuning approach for LLMs that is specifically designed for the translation task, eliminating the need for the abundant parallel data that traditional translation models usually depend on. Our approach consists of two fine-tuning stages: initial fine-tuning on monolingual data followed by subsequent fine-tuning on a small set of high-quality parallel data. We introduce the LLM developed through this strategy as Advanced Language Model-based trAnslator (ALMA). Based on LLaMA-2 as our underlying model, our results show that the model can achieve an average improvement of more than 12 BLEU and 12 COMET over its zero-shot performance across 10 translation directions from the WMT'21 (2 directions) and WMT'22 (8 directions) test datasets. The performance is significantly better than all prior work and even superior to the NLLB-54B model and GPT-3.5-text-davinci-003, with only 7B or 13B parameters. This method establishes the foundation for a novel training paradigm in machine translation.
摘要
生成式大型语言模型(LLM)在多种自然语言处理(NLP)任务上取得了令人瞩目的进展。然而,这些进展并未体现在翻译任务上,尤其是中等规模的模型(即 7B 或 13B 参数),它们仍落后于传统的有监督编码器-解码器翻译模型。此前的研究尝试提升这类中等规模 LLM 的翻译能力,但收效有限。在本研究中,我们提出了一种专为翻译任务设计的新型微调方法,无需依赖传统翻译模型通常所需的大量平行数据。该方法包含两个微调阶段:先在单语数据上进行初始微调,再在少量高质量平行数据上进行后续微调。我们将通过这一策略得到的模型称为 Advanced Language Model-based trAnslator(ALMA)。以 LLaMA-2 为基础模型,我们的结果显示,该模型在 WMT'21(2 个方向)和 WMT'22(8 个方向)测试集共 10 个翻译方向上,相比零样本性能平均提升超过 12 BLEU 和 12 COMET。其性能显著优于此前所有工作,甚至在仅有 7B 或 13B 参数的情况下超越 NLLB-54B 模型和 GPT-3.5-text-davinci-003。该方法为机器翻译领域确立了一种新训练范式的基础。
Construction of Paired Knowledge Graph-Text Datasets Informed by Cyclic Evaluation
paper_authors: Ali Mousavi, Xin Zhan, He Bai, Peng Shi, Theo Rekatsinas, Benjamin Han, Yunyao Li, Jeff Pound, Josh Susskind, Natalie Schluter, Ihab Ilyas, Navdeep Jaitly
for: 本文旨在说明,知识图谱(KG)与文本配对的数据集可用于训练前向和反向神经网络模型,但当数据集中 KG 与文本不等价(噪声较大)时,模型会出现更多幻觉和更差的召回率。
methods: 本文使用循环评估(cyclic evaluation),即考察前向与反向模型能否循环再生源 KG 或文本,以此衡量数据集中 KG 与文本的等价程度,并据此比较人工构建的 WebNLG 与自动构建的 TeKGen、T-REx。
results: 本文发现,数据集的噪声水平会影响模型表现,人工构建的 WebNLG 明显优于自动构建的 TeKGen 和 T-REx;在此基础上,作者利用提升 KG 与文本等价性的启发式规则构建了改进的新数据集 LAGRANGE。此外,使用大型语言模型(LLM)构建的数据集有助于训练在循环文本生成上表现出色的模型,但在循环 KG 生成上表现较差,原因可能是缺乏一致的底层本体(ontology)。Abstract
Datasets that pair Knowledge Graphs (KG) and text together (KG-T) can be used to train forward and reverse neural models that generate text from KG and vice versa. However models trained on datasets where KG and text pairs are not equivalent can suffer from more hallucination and poorer recall. In this paper, we verify this empirically by generating datasets with different levels of noise and find that noisier datasets do indeed lead to more hallucination. We argue that the ability of forward and reverse models trained on a dataset to cyclically regenerate source KG or text is a proxy for the equivalence between the KG and the text in the dataset. Using cyclic evaluation we find that manually created WebNLG is much better than automatically created TeKGen and T-REx. Guided by these observations, we construct a new, improved dataset called LAGRANGE using heuristics meant to improve equivalence between KG and text and show the impact of each of the heuristics on cyclic evaluation. We also construct two synthetic datasets using large language models (LLMs), and observe that these are conducive to models that perform significantly well on cyclic generation of text, but less so on cyclic generation of KGs, probably because of a lack of a consistent underlying ontology.
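To make the cyclic-evaluation bookkeeping concrete, the sketch below round-trips a tiny KG through trivial rule-based "forward" and "reverse" models and scores the recovered triples with F1. The verbalisation and extraction rules are stand-ins for the neural models the paper actually trains.

```python
def kg_to_text(triples):
    """Forward model stand-in: verbalise each (subject, relation, object) triple."""
    return " ".join(f"{s} {r.replace('_', ' ')} {o}." for s, r, o in triples)

def text_to_kg(text, relations):
    """Reverse model stand-in: recover triples by pattern-matching known relations."""
    found = set()
    for sentence in text.split("."):
        for r in relations:
            phrase = r.replace("_", " ")
            if phrase in sentence:
                s, o = sentence.split(phrase, 1)
                found.add((s.strip(), r, o.strip()))
    return found

def cyclic_f1(gold_triples, relations):
    """Score how well KG -> text -> KG round-trips (a proxy for KG/text equivalence)."""
    recovered = text_to_kg(kg_to_text(gold_triples), relations)
    gold = set(gold_triples)
    tp = len(gold & recovered)
    p = tp / max(1, len(recovered))
    r = tp / max(1, len(gold))
    return 2 * p * r / max(1e-9, p + r)

triples = [("Alan_Turing", "born_in", "London"), ("Alan_Turing", "worked_at", "Bletchley Park")]
print(round(cyclic_f1(triples, {"born_in", "worked_at"}), 2))
```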
Towards Effective Disambiguation for Machine Translation with Large Language Models
results: 实验结果表明,我们的方法可以与当前状态的系统如深度翻译和 NLLB 匹配或超越,在五种语言方向中四种方向中表现出色。Abstract
Resolving semantic ambiguity has long been recognised as a central challenge in the field of machine translation. Recent work on benchmarking translation performance on ambiguous sentences has exposed the limitations of conventional Neural Machine Translation (NMT) systems, which fail to capture many of these cases. Large language models (LLMs) have emerged as a promising alternative, demonstrating comparable performance to traditional NMT models while introducing new paradigms for controlling the target outputs. In this paper, we study the capabilities of LLMs to translate ambiguous sentences containing polysemous words and rare word senses. We also propose two ways to improve the handling of such ambiguity through in-context learning and fine-tuning on carefully curated ambiguous datasets. Experiments show that our methods can match or outperform state-of-the-art systems such as DeepL and NLLB in four out of five language directions. Our research provides valuable insights into effectively adapting LLMs for disambiguation during machine translation.
摘要
解决语义歧义长期以来被视为机器翻译领域的核心挑战。近期针对歧义句的翻译性能基准测试表明,传统的神经机器翻译(NMT)系统在许多此类情形下表现有限。大型语言模型(LLM)则成为一种有前景的替代方案,在取得与传统 NMT 模型相当性能的同时,还带来了控制目标输出的新范式。在本文中,我们研究了 LLM 翻译包含多义词和罕见词义的歧义句的能力,并提出了两种改进歧义处理的方法:上下文内学习(in-context learning)以及在精心构建的歧义数据集上进行微调。实验表明,我们的方法在五个语言方向中的四个上能够匹敌或超越 DeepL 和 NLLB 等最先进系统。我们的研究为在机器翻译中有效利用 LLM 进行消歧提供了有价值的视角。
Hate speech detection in algerian dialect using deep learning
paper_authors: Dihia Lanasri, Juan Olano, Sifal Klioui, Sin Liang Lee, Lamia Sekkai
for: 解决阿拉伯语中的仇恨言论检测问题,尤其是针对阿尔及利亚方言。
methods: 使用深度学习架构对阿尔及利亚社交媒体上的短文本进行分类,判断其是否包含仇恨言论。
results: 在超过 13500 条阿尔及利亚社交媒体短文本上的实验中,所提出的仇恨言论检测方法取得了有前景的结果。Abstract
With the proliferation of hate speech on social networks under different formats, such as abusive language, cyberbullying, and violence, etc., people have experienced a significant increase in violence, putting them in uncomfortable situations and threats. Plenty of efforts have been dedicated in the last few years to overcome this phenomenon to detect hate speech in different structured languages like English, French, Arabic, and others. However, a reduced number of works deal with Arabic dialects like Tunisian, Egyptian, and Gulf, mainly the Algerian ones. To fill in the gap, we propose in this work a complete approach for detecting hate speech on online Algerian messages. Many deep learning architectures have been evaluated on the corpus we created from some Algerian social networks (Facebook, YouTube, and Twitter). This corpus contains more than 13.5K documents in Algerian dialect written in Arabic, labeled as hateful or non-hateful. Promising results are obtained, which show the efficiency of our approach.
摘要
随着社交媒体上以辱骂性语言、网络欺凌和暴力等多种形式出现的仇恨言论日益泛滥,人们遭受的暴力明显增多,处境愈发不适并受到威胁。过去几年里,大量研究致力于检测英语、法语、阿拉伯语等结构化语言中的仇恨言论,以应对这一现象。然而,针对突尼斯、埃及、海湾等阿拉伯方言,特别是阿尔及利亚方言的研究相对较少。为填补这一空白,我们在本工作中提出了一套完整的方法,用于检测阿尔及利亚在线消息中的仇恨言论。我们从阿尔及利亚的若干社交媒体(Facebook、YouTube 和 Twitter)构建了语料库,其中包含 13500 多条以阿拉伯文书写的阿尔及利亚方言文本,并标注为仇恨或非仇恨。我们在该语料库上评估了多种深度学习架构,取得了有前景的结果,表明了所提方法的有效性。
SpeechAlign: a Framework for Speech Translation Alignment Evaluation
results: 通过发布 SpeechAlign,本文为语音模型评估提供了一个易于使用的评估框架,并利用该框架对开源语音翻译模型进行了基准比较。Abstract
Speech-to-Speech and Speech-to-Text translation are currently dynamic areas of research. To contribute to these fields, we present SpeechAlign, a framework to evaluate the underexplored field of source-target alignment in speech models. Our framework has two core components. First, to tackle the absence of suitable evaluation datasets, we introduce the Speech Gold Alignment dataset, built upon a English-German text translation gold alignment dataset. Secondly, we introduce two novel metrics, Speech Alignment Error Rate (SAER) and Time-weighted Speech Alignment Error Rate (TW-SAER), to evaluate alignment quality in speech models. By publishing SpeechAlign we provide an accessible evaluation framework for model assessment, and we employ it to benchmark open-source Speech Translation models.
摘要
语音到语音和语音到文本翻译是当前活跃的研究领域。为推动这些领域的发展,我们提出了 SpeechAlign 框架,用于评估语音模型中尚未被充分研究的源-目标对齐问题。该框架包含两个核心部分。首先,针对缺乏合适评估数据集的问题,我们基于英语-德语文本翻译金标对齐数据集构建了 Speech Gold Alignment 数据集。其次,我们提出了两个新指标:语音对齐错误率(Speech Alignment Error Rate,SAER)和时间加权语音对齐错误率(Time-weighted Speech Alignment Error Rate,TW-SAER),用于评估语音模型中的对齐质量。通过发布 SpeechAlign,我们提供了一个易于使用的模型评估框架,并利用它对开源语音翻译模型进行了基准比较。
Incorporating Singletons and Mention-based Features in Coreference Resolution via Multi-task Learning for Better Generalization
results: 本研究在 OntoGUM 基准上取得了新的最先进成绩(+2.7 分),并在多个域外数据集上提高了鲁棒性(平均 +2.3 分);这些提升可能源于更好的提及检测,以及相比仅匹配共指提及对而言对单例提及数据的更充分利用。Abstract
Previous attempts to incorporate a mention detection step into end-to-end neural coreference resolution for English have been hampered by the lack of singleton mention span data as well as other entity information. This paper presents a coreference model that learns singletons as well as features such as entity type and information status via a multi-task learning-based approach. This approach achieves new state-of-the-art scores on the OntoGUM benchmark (+2.7 points) and increases robustness on multiple out-of-domain datasets (+2.3 points on average), likely due to greater generalizability for mention detection and utilization of more data from singletons when compared to only coreferent mention pair matching.
摘要
以往在英语端到端神经共指消解中加入提及检测步骤的尝试,受制于缺乏单例提及跨度数据及其他实体信息。本文提出了一种基于多任务学习的共指模型,能够同时学习单例提及以及实体类型、信息状态等特征。该方法在 OntoGUM 基准上取得了新的最先进成绩(+2.7 分),并在多个域外数据集上提升了鲁棒性(平均 +2.3 分),这可能得益于提及检测更强的泛化能力,以及相比仅匹配共指提及对而言对单例提及数据的更充分利用。
Examining the Limitations of Computational Rumor Detection Models Trained on Static Datasets
results: 研究结果表明,基于上下文的模型仍然过度依赖来源帖子的信息,而忽视了上下文信息可能发挥的重要作用。此外,研究还考察了数据划分策略对分类器性能的影响,并就如何在训练谣言检测方法时降低静态数据集中时间概念漂移的影响给出了实用建议。Abstract
A crucial aspect of a rumor detection model is its ability to generalize, particularly its ability to detect emerging, previously unknown rumors. Past research has indicated that content-based (i.e., using solely source posts as input) rumor detection models tend to perform less effectively on unseen rumors. At the same time, the potential of context-based models remains largely untapped. The main contribution of this paper is in the in-depth evaluation of the performance gap between content and context-based models specifically on detecting new, unseen rumors. Our empirical findings demonstrate that context-based models are still overly dependent on the information derived from the rumors' source post and tend to overlook the significant role that contextual information can play. We also study the effect of data split strategies on classifier performance. Based on our experimental results, the paper also offers practical suggestions on how to minimize the effects of temporal concept drift in static datasets during the training of rumor detection methods.
摘要
谣言检测模型的一个关键特性是其泛化能力,尤其是检测新出现的、此前未知谣言的能力。以往研究表明,仅以源帖子为输入的基于内容的谣言检测模型在未见过的谣言上往往表现较差;与此同时,基于上下文的模型的潜力在很大程度上尚未被发掘。本文的主要贡献在于深入评估了基于内容与基于上下文的模型在检测新的、未见过的谣言上的性能差距。我们的实证结果表明,基于上下文的模型仍然过度依赖源帖子所提供的信息,而忽视了上下文信息可能发挥的重要作用。我们还研究了数据划分策略对分类器性能的影响。基于实验结果,本文进一步就如何在训练谣言检测方法时降低静态数据集中时间概念漂移的影响给出了实用建议。
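One practical reading of the data-split advice above is to split chronologically rather than randomly, so the test set contains strictly later (emerging) rumors. The toy sketch below contrasts the two strategies; the dataset and split sizes are invented.

```python
from datetime import date
import random

posts = [{"id": i, "text": f"claim {i}", "date": date(2020, 1 + i % 12, 1)} for i in range(24)]

# Random split: test rumors can be contemporaneous with training ones (optimistic estimate).
shuffled = random.Random(0).sample(posts, len(posts))
rand_train, rand_test = shuffled[:18], shuffled[18:]

# Temporal split: training data strictly precedes the test data, so the test set
# holds unseen, later rumors and better reflects temporal concept drift.
chronological = sorted(posts, key=lambda p: p["date"])
temp_train, temp_test = chronological[:18], chronological[18:]

print("random test months:", sorted({p["date"].month for p in rand_test}))
print("temporal test months:", sorted({p["date"].month for p in temp_test}))
```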
SignBank+: Multilingual Sign Language Translation Dataset
methods: 介绍 SignBank+ 数据集——一个为机器翻译优化的净化版 SignBank 数据集,并采用简单的文本到文本翻译方法。
results: 评估结果显示,在 SignBank+ 上训练的模型优于在原始数据集上训练的模型,树立了新的基准,并为未来研究提供了开放资源。Abstract
This work advances the field of sign language machine translation by focusing on dataset quality and simplification of the translation system. We introduce SignBank+, a clean version of the SignBank dataset, optimized for machine translation. Contrary to previous works that employ complex factorization techniques for translation, we advocate for a simplified text-to-text translation approach. Our evaluation shows that models trained on SignBank+ surpass those on the original dataset, establishing a new benchmark and providing an open resource for future research.
摘要
这项工作通过关注数据集质量与翻译系统的简化,推进了手语机器翻译领域的发展。我们提出了 SignBank+,即为机器翻译优化的净化版 SignBank 数据集。与以往采用复杂分解技术进行翻译的工作不同,我们主张使用简化的文本到文本翻译方法。评估表明,在 SignBank+ 上训练的模型优于在原始数据集上训练的模型,树立了新的基准,并为未来研究提供了开放资源。
Hierarchical reinforcement learning with natural language subgoals
results: 该方法优于克隆专家行为的智能体,也优于没有这种监督子目标空间、从零学习的 HRL,表明其能够结合人类专家监督与强化学习的灵活性优势。Abstract
Hierarchical reinforcement learning has been a compelling approach for achieving goal directed behavior over long sequences of actions. However, it has been challenging to implement in realistic or open-ended environments. A main challenge has been to find the right space of sub-goals over which to instantiate a hierarchy. We present a novel approach where we use data from humans solving these tasks to softly supervise the goal space for a set of long range tasks in a 3D embodied environment. In particular, we use unconstrained natural language to parameterize this space. This has two advantages: first, it is easy to generate this data from naive human participants; second, it is flexible enough to represent a vast range of sub-goals in human-relevant tasks. Our approach outperforms agents that clone expert behavior on these tasks, as well as HRL from scratch without this supervised sub-goal space. Our work presents a novel approach to combining human expert supervision with the benefits and flexibility of reinforcement learning.
摘要
分层强化学习(HRL)是在长动作序列上实现目标导向行为的一种有吸引力的方法,但在真实或开放式环境中一直难以落地。其中一个主要挑战是找到合适的子目标空间来构建层次结构。我们提出了一种新方法:利用人类完成这些任务时产生的数据,对一组 3D 具身环境中的长时程任务的目标空间进行软监督。具体而言,我们使用不受限制的自然语言来参数化这一空间。这带来两个优势:其一,这类数据很容易从未经训练的人类参与者处收集;其二,它足够灵活,能够表示与人类相关任务中的大量子目标。我们的方法优于克隆专家行为的智能体,也优于没有这一受监督子目标空间、从零开始的 HRL。这项工作提出了一种将人类专家监督与强化学习的优势和灵活性相结合的新途径。
DreamLLM: Synergistic Multimodal Comprehension and Creation
results: DreamLLM 成为首个能生成自由形式交错内容的 MLLM,作为零样本多模态通才在综合实验中表现出色,得益于增强的学习协同效应。Abstract
This paper presents DreamLLM, a learning framework that first achieves versatile Multimodal Large Language Models (MLLMs) empowered with frequently overlooked synergy between multimodal comprehension and creation. DreamLLM operates on two fundamental principles. The first focuses on the generative modeling of both language and image posteriors by direct sampling in the raw multimodal space. This approach circumvents the limitations and information loss inherent to external feature extractors like CLIP, and a more thorough multimodal understanding is obtained. Second, DreamLLM fosters the generation of raw, interleaved documents, modeling both text and image contents, along with unstructured layouts. This allows DreamLLM to learn all conditional, marginal, and joint multimodal distributions effectively. As a result, DreamLLM is the first MLLM capable of generating free-form interleaved content. Comprehensive experiments highlight DreamLLM's superior performance as a zero-shot multimodal generalist, reaping from the enhanced learning synergy.
摘要
DreamLLM operates on two core principles: 1. Generative modeling of both language and image posteriors through direct sampling in the raw multimodal space. This approach bypasses the limitations of external feature extractors like CLIP and enables a more comprehensive understanding of multimodal information. 2. Generation of raw, interleaved documents that model both text and image contents, as well as unstructured layouts. This allows DreamLLM to effectively learn all conditional, marginal, and joint multimodal distributions. As a result, DreamLLM is the first MLLM capable of generating free-form interleaved content, demonstrating superior performance as a zero-shot multimodal generalist. Comprehensive experiments highlight the enhanced learning synergy achieved by DreamLLM.
Controlled Generation with Prompt Insertion for Natural Language Explanations in Grammatical Error Correction
for: 该论文旨在提出一种名为带提示插入的受控生成(controlled generation with Prompt Insertion,PI)的方法,使大型语言模型(LLM)能够以自然语言直接解释语法纠错的理由。
methods: 该论文结合大型语言模型(LLM)与提示插入(PI)方法,生成对语法纠错的直接自然语言解释。
results: 研究发现,PI 方法能明确引导 LLM 以自然语言解释所有纠错点,从而提升纠错理由生成的性能。Abstract
In Grammatical Error Correction (GEC), it is crucial to ensure the user's comprehension of a reason for correction. Existing studies present tokens, examples, and hints as to the basis for correction but do not directly explain the reasons for corrections. Although methods that use Large Language Models (LLMs) to provide direct explanations in natural language have been proposed for various tasks, no such method exists for GEC. Generating explanations for GEC corrections involves aligning input and output tokens, identifying correction points, and presenting corresponding explanations consistently. However, it is not straightforward to specify a complex format to generate explanations, because explicit control of generation is difficult with prompts. This study introduces a method called controlled generation with Prompt Insertion (PI) so that LLMs can explain the reasons for corrections in natural language. In PI, LLMs first correct the input text, and then we automatically extract the correction points based on the rules. The extracted correction points are sequentially inserted into the LLM's explanation output as prompts, guiding the LLMs to generate explanations for the correction points. We also create an Explainable GEC (XGEC) dataset of correction reasons by annotating NUCLE, CoNLL2013, and CoNLL2014. Although generations from GPT-3 and ChatGPT using original prompts miss some correction points, the generation control using PI can explicitly guide to describe explanations for all correction points, contributing to improved performance in generating correction reasons.
摘要
在语法错误纠正(GEC)中,让用户理解纠正的理由至关重要。现有研究以词元、示例和提示的形式给出纠正依据,却没有直接解释纠正的理由。虽然已有研究针对多种任务提出利用大型语言模型(LLM)以自然语言给出直接解释的方法,但在 GEC 上尚无此类方法。为 GEC 纠正生成解释,需要对齐输入与输出词元、确定纠正点,并为其给出一致的相应解释。然而,由于仅靠提示难以对生成进行显式控制,指定如此复杂的生成格式并不容易。本研究提出了一种称为带提示插入的受控生成(controlled generation with Prompt Insertion,PI)的方法,使 LLM 能够以自然语言解释纠正的理由。在 PI 中,LLM 先对输入文本进行纠正,随后我们基于规则自动抽取纠正点,并将抽取出的纠正点依次作为提示插入到 LLM 的解释输出中,引导 LLM 为各纠正点生成解释。我们还通过标注 NUCLE、CoNLL2013 和 CoNLL2014 构建了一个可解释 GEC(XGEC)纠正理由数据集。使用原始提示时,GPT-3 和 ChatGPT 的生成会遗漏部分纠正点,而使用 PI 进行生成控制则能明确引导模型对所有纠正点给出解释,从而提升纠正理由生成的性能。
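A minimal sketch of the prompt-insertion flow described above, with one deliberate substitution: correction points are extracted with a generic word-level diff (difflib) rather than the paper's rules, and the prompt wording is invented. It only shows how extracted correction points can be inserted as prompts so each one receives an explanation.

```python
import difflib

def correction_points(source: str, corrected: str):
    """Find replaced/inserted/deleted spans between the source and corrected text."""
    src, tgt = source.split(), corrected.split()
    points = []
    for op, i1, i2, j1, j2 in difflib.SequenceMatcher(None, src, tgt).get_opcodes():
        if op != "equal":
            points.append((" ".join(src[i1:i2]) or "<none>", " ".join(tgt[j1:j2]) or "<none>"))
    return points

def explanation_prompt(source, corrected):
    """Insert each correction point as a prompt so the explanation covers all of them."""
    lines = [f"Source: {source}", f"Corrected: {corrected}", "Explain each correction:"]
    for k, (before, after) in enumerate(correction_points(source, corrected), 1):
        lines.append(f"{k}. '{before}' -> '{after}':")
    return "\n".join(lines)

print(explanation_prompt("He go to school yesterday .", "He went to school yesterday ."))
```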
results: 模型在端到端文档级文本识别和图像到 Markdown 文本生成任务中表现出色,并可通过有监督微调适应各种不同的文本密集图像理解任务和应用场景。Abstract
We present Kosmos-2.5, a multimodal literate model for machine reading of text-intensive images. Pre-trained on large-scale text-intensive images, Kosmos-2.5 excels in two distinct yet cooperative transcription tasks: (1) generating spatially-aware text blocks, where each block of text is assigned its spatial coordinates within the image, and (2) producing structured text output that captures styles and structures into the markdown format. This unified multimodal literate capability is achieved through a shared Transformer architecture, task-specific prompts, and flexible text representations. We evaluate Kosmos-2.5 on end-to-end document-level text recognition and image-to-markdown text generation. Furthermore, the model can be readily adapted for any text-intensive image understanding task with different prompts through supervised fine-tuning, making it a general-purpose tool for real-world applications involving text-rich images. This work also paves the way for the future scaling of multimodal large language models.
摘要
我们介绍 Kosmos-2.5,一个面向文本密集图像机器阅读的多模态 literate 模型。Kosmos-2.5 在大规模文本密集图像上进行预训练,在两个不同却相互配合的转写任务中表现出色:(1)生成具有空间感知的文本块,即为每个文本块分配其在图像中的空间坐标;(2)生成捕捉样式与结构的 Markdown 格式结构化文本输出。这种统一的多模态 literate 能力通过共享的 Transformer 架构、任务特定的提示和灵活的文本表示来实现。我们在端到端文档级文本识别和图像到 Markdown 文本生成任务上评估了 Kosmos-2.5。此外,通过有监督微调并配合不同的提示,该模型可以方便地适配任何文本密集图像理解任务,成为涉及富文本图像的实际应用中的通用工具。这项工作也为未来多模态大型语言模型的规模化铺平了道路。
Safurai 001: New Qualitative Approach for Code LLM Evaluation
results: 研究表明,在代码可读性指标上,Safurai-001 分别比 GPT-3.5 和 WizardCoder 高出 1.58% 和 18.78%。Abstract
This paper presents Safurai-001, a new Large Language Model (LLM) with significant potential in the domain of coding assistance. Driven by recent advancements in coding LLMs, Safurai-001 competes in performance with the latest models like WizardCoder [Xu et al., 2023], PanguCoder [Shen et al., 2023] and Phi-1 [Gunasekar et al., 2023] but aims to deliver a more conversational interaction. By capitalizing on the progress in data engineering (including latest techniques of data transformation and prompt engineering) and instruction tuning, this new model promises to stand toe-to-toe with recent closed and open source developments. Recognizing the need for an efficacious evaluation metric for coding LLMs, this paper also introduces GPT4-based MultiParameters, an evaluation benchmark that harnesses varied parameters to present a comprehensive insight into the models functioning and performance. Our assessment shows that Safurai-001 can outperform GPT-3.5 by 1.58% and WizardCoder by 18.78% in the Code Readability parameter and more.
摘要
Studying Lobby Influence in the European Parliament
results: 我们的结果表明,可以在欧洲议会立法过程中发现游说团体与欧洲议会议员(MEP)之间可解释的关联。对相关游说团体群组与 MEP 政治团体所发现关联的汇总分析,符合各团体意识形态的预期(例如,中左翼团体与社会议题相关)。我们认为这项涵盖方法、数据与结果的研究,是提高民主机构内复杂决策过程透明度的一步。Abstract
We present a method based on natural language processing (NLP), for studying the influence of interest groups (lobbies) in the law-making process in the European Parliament (EP). We collect and analyze novel datasets of lobbies' position papers and speeches made by members of the EP (MEPs). By comparing these texts on the basis of semantic similarity and entailment, we are able to discover interpretable links between MEPs and lobbies. In the absence of a ground-truth dataset of such links, we perform an indirect validation by comparing the discovered links with a dataset, which we curate, of retweet links between MEPs and lobbies, and with the publicly disclosed meetings of MEPs. Our best method achieves an AUC score of 0.77 and performs significantly better than several baselines. Moreover, an aggregate analysis of the discovered links, between groups of related lobbies and political groups of MEPs, correspond to the expectations from the ideology of the groups (e.g., center-left groups are associated with social causes). We believe that this work, which encompasses the methodology, datasets, and results, is a step towards enhancing the transparency of the intricate decision-making processes within democratic institutions.
摘要
我们提出了一种基于自然语言处理(NLP)的方法,用于研究利益集团(游说团体)在欧洲议会(EP)立法过程中的影响。我们收集并分析了游说团体立场文件和欧洲议会议员(MEP)演讲的新数据集。通过基于语义相似度和文本蕴含对这些文本进行比较,我们能够发现 MEP 与游说团体之间可解释的关联。由于缺乏此类关联的真实标注数据集,我们进行了间接验证:将发现的关联与我们整理的 MEP 与游说团体之间的转推(retweet)关系数据集,以及 MEP 公开披露的会议记录进行比较。我们的最佳方法取得了 0.77 的 AUC 分数,显著优于多个基线。此外,对相关游说团体群组与 MEP 政治团体之间所发现关联的汇总分析,符合各团体意识形态的预期(例如,中左翼团体与社会议题相关)。我们认为这项涵盖方法、数据集与结果的工作,是提高民主机构内复杂决策过程透明度的一步。
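A small sketch of the similarity side of the link-discovery step above, using an off-the-shelf sentence encoder to compare lobby position statements with speeches; the model name and texts are illustrative, and the entailment component of the method is not shown.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")   # any sentence encoder would do here

lobby_positions = [
    "Stricter emission limits for heavy industry should be postponed.",
    "Digital platforms must be required to interoperate with smaller competitors.",
]
mep_speeches = [
    "I call on the Commission to delay the new emission thresholds for steel producers.",
    "We should invest more in rural broadband infrastructure.",
]

# Cosine similarity between every (position paper, speech) pair; high values suggest a link.
sim = util.cos_sim(model.encode(lobby_positions), model.encode(mep_speeches))
for i, row in enumerate(sim):
    j = int(row.argmax())
    print(f"lobby text {i} best matches speech {j} (cos = {float(row[j]):.2f})")
```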
GECTurk: Grammatical Error Correction and Detection Dataset for Turkish
paper_authors: Atakan Kara, Farrin Marouf Sofian, Andrew Bond, Gözde Gül Şahin
for: 该论文旨在提出一种能够生成高质量平行数据的合成数据生成管道,以解决土耳其语自然语言处理任务中数据匮乏的问题。
methods: 该论文通过复杂的变换函数实现了 20 多条由专家整理的语法和拼写规则(即书写规则),并据此从专业编辑的文章中衍生出 130,000 个高质量的平行句子。
results: 该论文以三种基线方法(神经机器翻译、序列标注、前缀微调)取得了强劲的结果,并通过在多个域外数据集上的详尽实验,验证了所提方法的可迁移性和稳健性。Abstract
Grammatical Error Detection and Correction (GEC) tools have proven useful for native speakers and second language learners. Developing such tools requires a large amount of parallel, annotated data, which is unavailable for most languages. Synthetic data generation is a common practice to overcome the scarcity of such data. However, it is not straightforward for morphologically rich languages like Turkish due to complex writing rules that require phonological, morphological, and syntactic information. In this work, we present a flexible and extensible synthetic data generation pipeline for Turkish covering more than 20 expert-curated grammar and spelling rules (a.k.a., writing rules) implemented through complex transformation functions. Using this pipeline, we derive 130,000 high-quality parallel sentences from professionally edited articles. Additionally, we create a more realistic test set by manually annotating a set of movie reviews. We implement three baselines formulating the task as i) neural machine translation, ii) sequence tagging, and iii) prefix tuning with a pretrained decoder-only model, achieving strong results. Furthermore, we perform exhaustive experiments on out-of-domain datasets to gain insights on the transferability and robustness of the proposed approaches. Our results suggest that our corpus, GECTurk, is high-quality and allows knowledge transfer for the out-of-domain setting. To encourage further research on Turkish GEC, we release our datasets, baseline models, and the synthetic data generation pipeline at https://github.com/GGLAB-KU/gecturk.
摘要
语法错误检测与纠正(GEC)工具对母语者和二语学习者都很有用。开发此类工具需要大量平行的标注数据,而大多数语言都缺乏这类数据。合成数据生成是克服数据稀缺的常见做法;然而,对于土耳其语这样形态丰富的语言而言,这并不容易,因为其复杂的书写规则需要音系、形态和句法信息。在本工作中,我们提出了一个灵活且可扩展的土耳其语合成数据生成管道,通过复杂的变换函数实现了 20 多条由专家整理的语法和拼写规则(即书写规则)。利用该管道,我们从专业编辑的文章中衍生出 130,000 个高质量的平行句子,并通过人工标注一组影评构建了一个更贴近真实情况的测试集。我们实现了三种基线方法,分别将该任务表述为:(1)神经机器翻译;(2)序列标注;(3)基于仅解码器预训练模型的前缀微调,均取得了强劲的结果。此外,我们在域外数据集上进行了详尽实验,以考察所提方法的可迁移性和稳健性。结果表明,我们的语料库 GECTurk 质量较高,并支持向域外场景的知识迁移。为促进土耳其语 GEC 的后续研究,我们在 https://github.com/GGLAB-KU/gecturk 发布了数据集、基线模型和合成数据生成管道。
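To illustrate the rule-based synthetic-data pipeline above, the sketch below applies one corruption transform, attaching the Turkish conjunction "de/da" to the preceding word, to a clean sentence, yielding an (erroneous, correct) pair. This particular rule and its regex are illustrative; the released pipeline implements its own curated set of writing rules.

```python
import re

def corrupt_separate_de_da(sentence: str) -> str:
    """Synthetic-error transform: attach the conjunction 'de/da' to the previous word,
    a frequent spelling error that the separate-'de/da' writing rule forbids."""
    return re.sub(r"(\w+) (de|da)\b", r"\1\2", sentence)

def make_pair(correct: str, transforms):
    """Apply corruption rules to a clean sentence to get an (erroneous, correct) pair."""
    noisy = correct
    for t in transforms:
        noisy = t(noisy)
    return noisy, correct

pair = make_pair("Ben de geldim.", [corrupt_separate_de_da])
print(pair)   # ('Bende geldim.', 'Ben de geldim.')
```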
Improving Article Classification with Edge-Heterogeneous Graph Neural Networks
results: 结果表明,边异构图能稳定提升各类 GNN 模型的性能,并使简单、浅层的 GNN 管道达到与更复杂架构相当的效果。在 ogbn-arxiv 上,我们以两层 GCN 取得了 OGB 排行榜前 15 的成绩(准确率 74.61%),是参数量低于一百万的方案中的最高分;在 PubMed 数据集上,通过加入合著者关系边,两层 GraphSAGE 的表现接近最先进的 GNN 架构(准确率 89.88%)。Abstract
Classifying research output into context-specific label taxonomies is a challenging and relevant downstream task, given the volume of existing and newly published articles. We propose a method to enhance the performance of article classification by enriching simple Graph Neural Networks (GNN) pipelines with edge-heterogeneous graph representations. SciBERT is used for node feature generation to capture higher-order semantics within the articles' textual metadata. Fully supervised transductive node classification experiments are conducted on the Open Graph Benchmark (OGB) ogbn-arxiv dataset and the PubMed diabetes dataset, augmented with additional metadata from Microsoft Academic Graph (MAG) and PubMed Central, respectively. The results demonstrate that edge-heterogeneous graphs consistently improve the performance of all GNN models compared to the edge-homogeneous graphs. The transformed data enable simple and shallow GNN pipelines to achieve results on par with more complex architectures. On ogbn-arxiv, we achieve a top-15 result in the OGB competition with a 2-layer GCN (accuracy 74.61%), being the highest-scoring solution with sub-1 million parameters. On PubMed, we closely trail SOTA GNN architectures using a 2-layer GraphSAGE by including additional co-authorship edges in the graph (accuracy 89.88%). The implementation is available at: $\href{https://github.com/lyvykhang/edgehetero-nodeproppred}{\text{https://github.com/lyvykhang/edgehetero-nodeproppred}$.
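A minimal sketch of how an edge-heterogeneous article graph could be assembled with PyTorch Geometric's HeteroData: citation edges plus an extra co-authorship relation between the same paper nodes. Node counts, feature sizes, and relation names are illustrative, not the OGB/PubMed construction.

```python
import torch
from torch_geometric.data import HeteroData

data = HeteroData()
data["paper"].x = torch.randn(4, 768)          # e.g., SciBERT embeddings of 4 articles
data["paper"].y = torch.tensor([0, 1, 0, 2])   # subject labels

# An edge-homogeneous baseline would keep only citations; the edge-heterogeneous graph
# adds further relation types between the same "paper" nodes.
data["paper", "cites", "paper"].edge_index = torch.tensor([[0, 1, 2], [1, 2, 3]])
data["paper", "shares_author", "paper"].edge_index = torch.tensor([[0, 3], [3, 0]])

print(data)
print("edge types:", data.edge_types)
```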
Leveraging Data Collection and Unsupervised Learning for Code-switched Tunisian Arabic Automatic Speech Recognition
paper_authors: Ahmed Amine Ben Abdallah, Ata Kabboudi, Amir Kanoun, Salah Zaiem
for: This paper is written for the purpose of developing an effective Automatic Speech Recognition (ASR) solution for dialects, specifically focusing on the Tunisian dialect.
methods: The paper explores self-supervision, semi-supervision, and few-shot code-switching approaches to improve the state-of-the-art in ASR for Tunisian Arabic, English, and French.
results: The paper produces human evaluations of transcripts to avoid the noise coming from spelling inadequacies in testing references, and the models are able to transcribe audio samples in a linguistic mix involving Tunisian Arabic, English, and French. The data used during training and testing are released for public use and further improvements.Abstract
Crafting an effective Automatic Speech Recognition (ASR) solution for dialects demands innovative approaches that not only address the data scarcity issue but also navigate the intricacies of linguistic diversity. In this paper, we address the aforementioned ASR challenge, focusing on the Tunisian dialect. First, textual and audio data is collected and in some cases annotated. Second, we explore self-supervision, semi-supervision and few-shot code-switching approaches to push the state-of-the-art on different Tunisian test sets; covering different acoustic, linguistic and prosodic conditions. Finally, and given the absence of conventional spelling, we produce a human evaluation of our transcripts to avoid the noise coming from spelling inadequacies in our testing references. Our models, allowing to transcribe audio samples in a linguistic mix involving Tunisian Arabic, English and French, and all the data used during training and testing are released for public use and further improvements.
摘要
为方言打造有效的自动语音识别(ASR)方案,需要既能应对数据稀缺、又能驾驭语言多样性复杂性的创新方法。在本文中,我们针对上述 ASR 挑战,聚焦突尼斯方言。首先,我们收集了文本和音频数据,并对部分数据进行了标注。其次,我们探索了自监督、半监督以及少样本语码转换(code-switching)方法,在涵盖不同声学、语言和韵律条件的多个突尼斯测试集上推进最先进水平。最后,鉴于该方言缺乏规范的拼写,我们对转录结果进行了人工评估,以避免测试参考中拼写不规范带来的噪声。我们的模型能够转写混合突尼斯阿拉伯语、英语和法语的语音样本;训练和测试所用的全部数据均已公开发布,供公众使用和进一步改进。
DISC-LawLLM: Fine-tuning Large Language Models for Intelligent Legal Services
results: 论文通过在 DISC-Law-Eval 基准上进行定量和定性评估,证明了该系统在不同法律场景下服务多类用户的有效性。详细资源可在 https://github.com/FudanDISC/DISC-LawLLM 获取。Abstract
We propose DISC-LawLLM, an intelligent legal system utilizing large language models (LLMs) to provide a wide range of legal services. We adopt legal syllogism prompting strategies to construct supervised fine-tuning datasets in the Chinese Judicial domain and fine-tune LLMs with legal reasoning capability. We augment LLMs with a retrieval module to enhance models' ability to access and utilize external legal knowledge. A comprehensive legal benchmark, DISC-Law-Eval, is presented to evaluate intelligent legal systems from both objective and subjective dimensions. Quantitative and qualitative results on DISC-Law-Eval demonstrate the effectiveness of our system in serving various users across diverse legal scenarios. The detailed resources are available at https://github.com/FudanDISC/DISC-LawLLM.
摘要
我们提出了DISC-LawLLM,一种智能法律系统,使用大型自然语言模型(LLM)提供广泛的法律服务。我们采用法律逻辑提示策略构建监督精度训练集,在中国司法领域进行超参数 fine-tuning,以提高模型的法律推理能力。我们将LLM加载一个检索模块,以提高模型对外部法律知识的访问和利用能力。我们提供了一个全面的法律评价指标,DISC-Law-Eval,以评估智能法律系统的效果从客观和主观两个角度。我们对DISC-Law-Eval进行了量化和质量的测试,结果表明我们的系统在多种法律场景下可以为用户提供有效的服务。详细的资源可以在https://github.com/FudanDISC/DISC-LawLLM上找到。
The Wizard of Curiosities: Enriching Dialogues with Fun Facts
results: 对超过 1000 组对话的 A/B 测试表明,趣味知识不仅能提高用户参与度,还使平均评分相对提升了 9.7%。Abstract
Introducing curiosities in a conversation is a way to teach something new to the person in a pleasant and enjoyable way. Enriching dialogues with contextualized curiosities can improve the users' perception of a dialog system and their overall user experience. In this paper, we introduce a set of curated curiosities, targeting dialogues in the cooking and DIY domains. In particular, we use real human-agent conversations collected in the context of the Amazon Alexa TaskBot challenge, a multimodal and multi-turn conversational setting. According to an A/B test with over 1000 conversations, curiosities not only increase user engagement, but provide an average relative rating improvement of 9.7%.
摘要
在对话中引入趣味知识,是以轻松愉快的方式向用户传授新知识的一种途径。在对话中加入与上下文相关的趣味知识,可以改善用户对对话系统的观感及整体使用体验。在本文中,我们介绍了一组经过整理的趣味知识,面向烹饪和 DIY 领域的对话。特别地,我们使用了在 Amazon Alexa TaskBot 挑战赛(一个多模态、多轮对话场景)中收集的真实人机对话。基于对超过 1000 组对话的 A/B 测试,趣味知识不仅提高了用户参与度,还带来了 9.7% 的平均评分相对提升。
The Scenario Refiner: Grounding subjects in images at the morphological level
results: 研究发现,模型的预测与人类参与者的判断存在差异,尤其表现出一种语法偏向。Abstract
Derivationally related words, such as "runner" and "running", exhibit semantic differences which also elicit different visual scenarios. In this paper, we ask whether Vision and Language (V\&L) models capture such distinctions at the morphological level, using a a new methodology and dataset. We compare the results from V\&L models to human judgements and find that models' predictions differ from those of human participants, in particular displaying a grammatical bias. We further investigate whether the human-model misalignment is related to model architecture. Our methodology, developed on one specific morphological contrast, can be further extended for testing models on capturing other nuanced language features.
OpenChat: Advancing Open-source Language Models with Mixed-Quality Data
results: Experiments show that the OpenChat framework with C-RLFT improves the performance of open-source language models, with openchat-13b achieving the highest average performance among all 13B open-source language models on three standard benchmarks.
Abstract
Nowadays, open-source large language models like LLaMA have emerged. Recent developments have incorporated supervised fine-tuning (SFT) and reinforcement learning fine-tuning (RLFT) to align these models with human goals. However, SFT methods treat all training data with mixed quality equally, while RLFT methods require high-quality pairwise or ranking-based preference data. In this study, we present a novel framework, named OpenChat, to advance open-source language models with mixed-quality data. Specifically, we consider the general SFT training data, consisting of a small amount of expert data mixed with a large proportion of sub-optimal data, without any preference labels. We propose the C(onditioned)-RLFT, which regards different data sources as coarse-grained reward labels and learns a class-conditioned policy to leverage complementary data quality information. Interestingly, the optimal policy in C-RLFT can be easily solved through single-stage, RL-free supervised learning, which is lightweight and avoids costly human preference labeling. Through extensive experiments on three standard benchmarks, our openchat-13b fine-tuned with C-RLFT achieves the highest average performance among all 13b open-source language models. Moreover, we use AGIEval to validate the model generalization performance, in which only openchat-13b surpasses the base model. Finally, we conduct a series of analyses to shed light on the effectiveness and robustness of OpenChat. Our code, data, and models are publicly available at https://github.com/imoneoi/openchat.
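To make the C-RLFT idea concrete, the sketch below weights a standard next-token cross-entropy loss by a coarse-grained reward attached to each sample's data source (expert vs. sub-optimal), which is the class-conditioning signal the abstract describes; the weight values and source names are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

# Coarse-grained "reward" per data source (assumed values for illustration;
# the paper only states that expert data is weighted above sub-optimal data).
SOURCE_WEIGHT = {"expert": 1.0, "suboptimal": 0.3}

def class_conditioned_sft_loss(logits, target_ids, source_labels, pad_id=0):
    """Weighted next-token cross-entropy: each sample's loss is scaled by the
    coarse reward of the data source it came from (expert vs. sub-optimal)."""
    vocab = logits.size(-1)
    token_loss = F.cross_entropy(
        logits.reshape(-1, vocab), target_ids.reshape(-1),
        ignore_index=pad_id, reduction="none",
    ).reshape(target_ids.shape)                              # (batch, seq)
    mask = (target_ids != pad_id).float()
    per_sample = (token_loss * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
    weights = torch.tensor([SOURCE_WEIGHT[s] for s in source_labels],
                           device=logits.device)
    return (weights * per_sample).mean()
```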
Speak While You Think: Streaming Speech Synthesis During Text Generation
results: Experimental results show that LLM2Speech maintains the teacher model's quality while reducing latency enough to enable natural voice conversations.
Abstract
Large Language Models (LLMs) demonstrate impressive capabilities, yet interaction with these models is mostly facilitated through text. Using Text-To-Speech to synthesize LLM outputs typically results in notable latency, which is impractical for fluent voice conversations. We propose LLM2Speech, an architecture to synthesize speech while text is being generated by an LLM which yields significant latency reduction. LLM2Speech mimics the predictions of a non-streaming teacher model while limiting the exposure to future context in order to enable streaming. It exploits the hidden embeddings of the LLM, a by-product of the text generation that contains informative semantic context. Experimental results show that LLM2Speech maintains the teacher's quality while reducing the latency to enable natural conversations.
The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute
paper_authors: Aleksandar Stanić, Dylan Ashley, Oleg Serikov, Louis Kirsch, Francesco Faccio, Jürgen Schmidhuber, Thomas Hofmann, Imanol Schlag
for: This paper aims to provide a fair comparison of language modeling methods based on their empirical scaling trends, and to serve as a foundation for meaningful and reproducible research in the field.
methods: The paper introduces an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours, and uses a pre-processed dataset of books to evaluate the methods.
results: The paper shows that the LSTM baseline exhibits a predictable and more favourable scaling law than the GPT baseline, and that the two models' extrapolated scaling laws intersect at roughly 50,000 accelerator hours.
Abstract
The Languini Kitchen serves as both a research collective and codebase designed to empower researchers with limited computational resources to contribute meaningfully to the field of language modelling. We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours. The number of tokens on which a model is trained is defined by the model's throughput and the chosen compute class. Notably, this approach avoids constraints on critical hyperparameters which affect total parameters or floating-point operations. For evaluation, we pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length. On it, we compare methods based on their empirical scaling trends which are estimated through experiments at various levels of compute. This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput. While the GPT baseline achieves better perplexity throughout all our levels of compute, our LSTM baseline exhibits a predictable and more favourable scaling law. This is due to the improved throughput and the need for fewer training tokens to achieve the same decrease in test perplexity. Extrapolating the scaling laws leads of both models results in an intersection at roughly 50,000 accelerator hours. We hope this work can serve as the foundation for meaningful and reproducible language modelling research.
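To illustrate the compute-equivalent protocol, the sketch below derives each model's token budget from its measured throughput and the chosen compute class in accelerator hours; the throughput numbers are made-up placeholders, not figures from the paper.

```python
# A minimal sketch of the compute-equivalent comparison idea: the number of
# training tokens for each model follows from its measured throughput and the
# chosen compute class (in accelerator hours).
MODEL_THROUGHPUT = {            # tokens per second on the reference accelerator
    "gpt_baseline": 20_000,
    "lstm_baseline": 200_000,   # the paper's LSTM has ~10x the GPT throughput
}

def training_tokens(model: str, accelerator_hours: float) -> int:
    """Tokens a model gets to train on within a fixed compute budget."""
    return int(MODEL_THROUGHPUT[model] * accelerator_hours * 3600)

for hours in (6, 12, 24, 48):   # example compute classes
    budget = {m: training_tokens(m, hours) for m in MODEL_THROUGHPUT}
    print(f"{hours:>3} accelerator hours -> {budget}")
```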
Assessment of Pre-Trained Models Across Languages and Grammars
paper_authors: Alberto Muñoz-Ortiz, David Vilares, Carlos Gómez-Rodríguez
for: This study assesses how multilingual large language models (LLMs) learn syntax in terms of multi-formalism syntactic structures.
methods: The study recovers constituent and dependency structures by casting parsing as sequence labeling, evaluating several LLMs on 13 diverse UD treebanks for dependency parsing and 10 treebanks for constituent parsing.
results: The results show that (i) the framework is consistent across encodings, (ii) pre-trained word vectors do not favor constituency representations of syntax over dependencies, (iii) sub-word tokenization is needed to represent syntax, in contrast to character-based models, and (iv) a language's occurrence in the pre-training data matters more than the amount of task data when recovering syntax from the word vectors.
Abstract
We present an approach for assessing how multilingual large language models (LLMs) learn syntax in terms of multi-formalism syntactic structures. We aim to recover constituent and dependency structures by casting parsing as sequence labeling. To do so, we select a few LLMs and study them on 13 diverse UD treebanks for dependency parsing and 10 treebanks for constituent parsing. Our results show that: (i) the framework is consistent across encodings, (ii) pre-trained word vectors do not favor constituency representations of syntax over dependencies, (iii) sub-word tokenization is needed to represent syntax, in contrast to character-based models, and (iv) occurrence of a language in the pretraining data is more important than the amount of task data when recovering syntax from the word vectors.
Prototype of a robotic system to assist the learning process of English language with text-generation through DNN
paper_authors: Carlos Morales-Torres, Mario Campos-Soberanis, Diego Campos-Sobrino
for: This paper aims to help self-learners of English improve their proficiency.
methods: The system uses a Long Short Term Memory (LSTM) neural network to generate text; learners interact with it through a graphical user interface, and the generated text is adapted to the learner's English level.
results: Experimental results show an increase in the grammatical range of learners who interacted with the system.
Abstract
In the last ongoing years, there has been a significant ascending on the field of Natural Language Processing (NLP) for performing multiple tasks including English Language Teaching (ELT). An effective strategy to favor the learning process uses interactive devices to engage learners in their self-learning process. In this work, we present a working prototype of a humanoid robotic system to assist English language self-learners through text generation using Long Short Term Memory (LSTM) Neural Networks. The learners interact with the system using a Graphic User Interface that generates text according to the English level of the user. The experimentation was conducted using English learners and the results were measured accordingly to International English Language Testing System (IELTS) rubric. Preliminary results show an increment in the Grammatical Range of learners who interacted with the system.
K-pop Lyric Translation: Dataset, Analysis, and Neural-Modelling
results: The study reveals unique characteristics of K-pop lyric translation that distinguish it from other extensively studied genres, and constructs a neural lyric translation model, underscoring the importance of a dataset dedicated to singable lyric translation.
Abstract
Lyric translation, a field studied for over a century, is now attracting computational linguistics researchers. We identified two limitations in previous studies. Firstly, lyric translation studies have predominantly focused on Western genres and languages, with no previous study centering on K-pop despite its popularity. Second, the field of lyric translation suffers from a lack of publicly available datasets; to the best of our knowledge, no such dataset exists. To broaden the scope of genres and languages in lyric translation studies, we introduce a novel singable lyric translation dataset, approximately 89\% of which consists of K-pop song lyrics. This dataset aligns Korean and English lyrics line-by-line and section-by-section. We leveraged this dataset to unveil unique characteristics of K-pop lyric translation, distinguishing it from other extensively studied genres, and to construct a neural lyric translation model, thereby underscoring the importance of a dedicated dataset for singable lyric translations.
Dual-Modal Attention-Enhanced Text-Video Retrieval with Triplet Partial Margin Contrastive Learning
for: This paper focuses on improving text-video retrieval, which is essential for video filtering, recommendation, and search, due to the increasing amount of web videos.
methods: The paper proposes two novel techniques to improve contrastive learning for text-video retrieval: 1) Dual-Modal Attention-Enhanced Module (DMAE) to mine hard negative pairs, and 2) Triplet Partial Margin Contrastive Learning (TPM-CL) module to construct partial order triplet samples.
results: The proposed approach outperforms existing methods on four widely-used text-video retrieval datasets: MSR-VTT, MSVD, DiDeMo, and ActivityNet.
Abstract
In recent years, the explosion of web videos makes text-video retrieval increasingly essential and popular for video filtering, recommendation, and search. Text-video retrieval aims to rank relevant text/video higher than irrelevant ones. The core of this task is to precisely measure the cross-modal similarity between texts and videos. Recently, contrastive learning methods have shown promising results for text-video retrieval, most of which focus on the construction of positive and negative pairs to learn text and video representations. Nevertheless, they do not pay enough attention to hard negative pairs and lack the ability to model different levels of semantic similarity. To address these two issues, this paper improves contrastive learning using two novel techniques. First, to exploit hard examples for robust discriminative power, we propose a novel Dual-Modal Attention-Enhanced Module (DMAE) to mine hard negative pairs from textual and visual clues. By further introducing a Negative-aware InfoNCE (NegNCE) loss, we are able to adaptively identify all these hard negatives and explicitly highlight their impacts in the training loss. Second, our work argues that triplet samples can better model fine-grained semantic similarity compared to pairwise samples. We thereby present a new Triplet Partial Margin Contrastive Learning (TPM-CL) module to construct partial order triplet samples by automatically generating fine-grained hard negatives for matched text-video pairs. The proposed TPM-CL designs an adaptive token masking strategy with cross-modal interaction to model subtle semantic differences. Extensive experiments demonstrate that the proposed approach outperforms existing methods on four widely-used text-video retrieval datasets, including MSR-VTT, MSVD, DiDeMo and ActivityNet.
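The sketch below illustrates, in spirit, how hard in-batch negatives can be identified and up-weighted inside an InfoNCE-style objective, as the NegNCE loss above is described as doing; the margin-based definition of "hard" and the extra weight are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def negative_aware_info_nce(text_emb, video_emb, temperature=0.05,
                            hard_margin=0.1, hard_weight=2.0):
    """Text-to-video InfoNCE where in-batch negatives whose cosine similarity
    comes within `hard_margin` of the positive pair are treated as hard
    negatives and up-weighted in the denominator."""
    t = F.normalize(text_emb, dim=-1)
    v = F.normalize(video_emb, dim=-1)
    cos = t @ v.t()                                   # (B, B); diagonal = positives
    pos = cos.diag()
    eye = torch.eye(cos.size(0), dtype=torch.bool, device=cos.device)
    hard = (cos >= pos.unsqueeze(1) - hard_margin) & ~eye
    weights = torch.where(hard, torch.full_like(cos, hard_weight),
                          torch.ones_like(cos))
    logits = cos / temperature
    denom = (weights * logits.exp()).sum(dim=1)       # positive kept at weight 1
    return (denom.log() - pos / temperature).mean()

# Example with random embeddings standing in for encoder outputs.
loss = negative_aware_info_nce(torch.randn(4, 256), torch.randn(4, 256))
print(loss.item())
```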
UniPCM: Universal Pre-trained Conversation Model with Task-aware Automatic Prompt
results: Using the high-quality prompts generated, the pre-training corpus is scaled to 122 datasets from 15 dialog-related tasks, yielding a model that performs strongly across diverse conversational tasks and dialog systems.
Abstract
Recent research has shown that multi-task pre-training greatly improves the model's robustness and transfer ability, which is crucial for building a high-quality dialog system. However, most previous works on multi-task pre-training rely heavily on human-defined input format or prompt, which is not optimal in quality and quantity. In this work, we propose to use Task-based Automatic Prompt generation (TAP) to automatically generate high-quality prompts. Using the high-quality prompts generated, we scale the corpus of the pre-trained conversation model to 122 datasets from 15 dialog-related tasks, resulting in Universal Pre-trained Conversation Model (UniPCM), a powerful foundation model for various conversational tasks and different dialog systems. Extensive experiments have shown that UniPCM is robust to input prompts and capable of various dialog-related tasks. Moreover, UniPCM has strong transfer ability and excels at low resource scenarios, achieving SOTA results on 9 different datasets ranging from task-oriented dialog to open-domain conversation. Furthermore, we are amazed to find that TAP can generate prompts on par with those collected with crowdsourcing. The code is released with the paper.
XATU: A Fine-grained Instruction-based Benchmark for Explainable Text Updates
results: By evaluating existing open and closed large language models on the benchmark, the paper demonstrates the effectiveness of instruction tuning and the impact of the underlying architecture across various editing tasks; extensive experiments further reveal the significant role of fine-grained explanations in fine-tuning language models for text editing.
Abstract
Text editing is a crucial task that involves modifying text to better align with user intents. However, existing text editing benchmark datasets have limitations in providing only coarse-grained instructions. Consequently, although the edited output may seem reasonable, it often deviates from the intended changes outlined in the gold reference, resulting in low evaluation scores. To comprehensively investigate the text editing capabilities of large language models, this paper introduces XATU, the first benchmark specifically designed for fine-grained instruction-based explainable text editing. XATU covers a wide range of topics and text types, incorporating lexical, syntactic, semantic, and knowledge-intensive edits. To enhance interpretability, we leverage high-quality data sources and human annotation, resulting in a benchmark that includes fine-grained instructions and gold-standard edit explanations. By evaluating existing open and closed large language models against our benchmark, we demonstrate the effectiveness of instruction tuning and the impact of underlying architecture across various editing tasks. Furthermore, extensive experimentation reveals the significant role of explanations in fine-tuning language models for text editing tasks. The benchmark will be open-sourced to support reproduction and facilitate future research.
fakenewsbr: A Fake News Detection Platform for Brazilian Portuguese
results: The proposed approach achieves high accuracy and F1-score, demonstrating its effectiveness in detecting fake news. In addition, a user-friendly web platform, fakenewsbr.com, was developed to provide real-time analysis of the veracity of news articles.
Abstract
The proliferation of fake news has become a significant concern in recent times due to its potential to spread misinformation and manipulate public opinion. This paper presents a comprehensive study on detecting fake news in Brazilian Portuguese, focusing on journalistic-type news. We propose a machine learning-based approach that leverages natural language processing techniques, including TF-IDF and Word2Vec, to extract features from textual data. We evaluate the performance of various classification algorithms, such as logistic regression, support vector machine, random forest, AdaBoost, and LightGBM, on a dataset containing both true and fake news articles. The proposed approach achieves high accuracy and F1-Score, demonstrating its effectiveness in identifying fake news. Additionally, we developed a user-friendly web platform, fakenewsbr.com, to facilitate the verification of news articles' veracity. Our platform provides real-time analysis, allowing users to assess the likelihood of fake news articles. Through empirical analysis and comparative studies, we demonstrate the potential of our approach to contribute to the fight against the spread of fake news and promote more informed media consumption.
Localize, Retrieve and Fuse: A Generalized Framework for Free-Form Question Answering over Tables
results: Experiments show that TAG-QA generates more faithful and coherent answers than the baselines, surpassing the pipeline-based baseline TAPAS by 17% and 14% in BLEU-4 and PARENT F-score, respectively, and the end-to-end model T5 by 16% and 12% on the same metrics.
Abstract
Question answering on tabular data (a.k.a TableQA), which aims at generating answers to questions grounded on a provided table, has gained significant attention recently. Prior work primarily produces concise factual responses through information extraction from individual or limited table cells, lacking the ability to reason across diverse table cells. Yet, the realm of free-form TableQA, which demands intricate strategies for selecting relevant table cells and the sophisticated integration and inference of discrete data fragments, remains mostly unexplored. To this end, this paper proposes a generalized three-stage approach: Table-to- Graph conversion and cell localizing, external knowledge retrieval, and the fusion of table and text (called TAG-QA), to address the challenge of inferring long free-form answers in generative TableQA. In particular, TAG-QA (1) locates relevant table cells using a graph neural network to gather intersecting cells between relevant rows and columns, (2) leverages external knowledge from Wikipedia, and (3) generates answers by integrating both tabular data and natural linguistic information. Experiments showcase the superior capabilities of TAG-QA in generating sentences that are both faithful and coherent, particularly when compared to several state-of-the-art baselines. Notably, TAG-QA surpasses the robust pipeline-based baseline TAPAS by 17% and 14% in terms of BLEU-4 and PARENT F-score, respectively. Furthermore, TAG-QA outperforms the end-to-end model T5 by 16% and 12% on BLEU-4 and PARENT F-score, respectively.
Heterogeneous Entity Matching with Complex Attribute Associations using BERT and Neural Networks
for: Addressing the challenges of entity matching in heterogeneous data with complex attribute relationships.
methods: Utilizing a novel entity matching model, EMM-CCAR, built upon pre-trained models, with attention mechanisms to capture complex relationships between attributes.
results: The model achieves improvements of approximately 4% and 1% in F1 score over the prevalent DER-SSM and Ditto approaches, respectively, demonstrating its effectiveness in handling complex attribute relationships.
Abstract
Across various domains, data from different sources such as Baidu Baike and Wikipedia often manifest in distinct forms. Current entity matching methodologies predominantly focus on homogeneous data, characterized by attributes that share the same structure and concise attribute values. However, this orientation poses challenges in handling data with diverse formats. Moreover, prevailing approaches aggregate the similarity of attribute values between corresponding attributes to ascertain entity similarity. Yet, they often overlook the intricate interrelationships between attributes, where one attribute may have multiple associations. The simplistic approach of pairwise attribute comparison fails to harness the wealth of information encapsulated within entities.To address these challenges, we introduce a novel entity matching model, dubbed Entity Matching Model for Capturing Complex Attribute Relationships(EMM-CCAR),built upon pre-trained models. Specifically, this model transforms the matching task into a sequence matching problem to mitigate the impact of varying data formats. Moreover, by introducing attention mechanisms, it identifies complex relationships between attributes, emphasizing the degree of matching among multiple attributes rather than one-to-one correspondences. Through the integration of the EMM-CCAR model, we adeptly surmount the challenges posed by data heterogeneity and intricate attribute interdependencies. In comparison with the prevalent DER-SSM and Ditto approaches, our model achieves improvements of approximately 4% and 1% in F1 scores, respectively. This furnishes a robust solution for addressing the intricacies of attribute complexity in entity matching.
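The sketch below illustrates the sequence-matching reformulation described above: two records with heterogeneous attributes are serialized into a text pair and scored by a pre-trained cross-encoder. The serialization scheme and the generic, untuned checkpoint are illustrative assumptions; EMM-CCAR's attention over complex attribute associations is not reproduced here.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

def serialize(record: dict) -> str:
    """Flatten an entity's attributes into one string, e.g. '[COL] name [VAL] ...'."""
    return " ".join(f"[COL] {k} [VAL] {v}" for k, v in record.items())

left = {"name": "iPhone 13 Pro 128GB", "brand": "Apple", "color": "graphite"}
right = {"title": "Apple iPhone 13 Pro (128 GB) - Graphite", "category": "phones"}

model_name = "bert-base-uncased"      # placeholder; a fine-tuned matcher would be used
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

inputs = tokenizer(serialize(left), serialize(right),
                   truncation=True, max_length=256, return_tensors="pt")
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)
print("match probability (untrained head, illustrative only):", probs[0, 1].item())
```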
Named Entity Recognition via Machine Reading Comprehension: A Multi-Task Learning Approach
results: Experimental results on both nested NER and flat NER datasets show that Multi-NER achieves better performance on all datasets.
Abstract
Named Entity Recognition (NER) aims to extract and classify entity mentions in the text into pre-defined types (e.g., organization or person name). Recently, many works have been proposed to shape the NER as a machine reading comprehension problem (also termed MRC-based NER), in which entity recognition is achieved by answering the formulated questions related to pre-defined entity types through MRC, based on the contexts. However, these works ignore the label dependencies among entity types, which are critical for precisely recognizing named entities. In this paper, we propose to incorporate the label dependencies among entity types into a multi-task learning framework for better MRC-based NER. We decompose MRC-based NER into multiple tasks and use a self-attention module to capture label dependencies. Comprehensive experiments on both nested NER and flat NER datasets are conducted to validate the effectiveness of the proposed Multi-NER. Experimental results show that Multi-NER can achieve better performance on all datasets.
Towards Joint Modeling of Dialogue Response and Speech Synthesis based on Large Language Model
results: Experimental results indicate that the LLM-based approach is a promising direction for building unified spoken dialogue systems.
Abstract
This paper explores the potential of constructing an AI spoken dialogue system that "thinks how to respond" and "thinks how to speak" simultaneously, which more closely aligns with the human speech production process compared to the current cascade pipeline of independent chatbot and Text-to-Speech (TTS) modules. We hypothesize that Large Language Models (LLMs) with billions of parameters possess significant speech understanding capabilities and can jointly model dialogue responses and linguistic features. We conduct two sets of experiments: 1) Prosodic structure prediction, a typical front-end task in TTS, demonstrating the speech understanding ability of LLMs, and 2) Further integrating dialogue response and a wide array of linguistic features using a unified encoding format. Our results indicate that the LLM-based approach is a promising direction for building unified spoken dialogue systems.
results: The results show that pretrained transformer-based language models and graph neural networks perform strongly within a Bayesian optimization active learning framework, improving the accuracy and sample efficiency of virtual screening by 8% over the previous state-of-the-art baseline.
Abstract
Virtual screening of large compound libraries to identify potential hit candidates is one of the earliest steps in drug discovery. As the size of commercially available compound collections grows exponentially to the scale of billions, brute-force virtual screening using traditional tools such as docking becomes infeasible in terms of time and computational resources. Active learning and Bayesian optimization has recently been proven as effective methods of narrowing down the search space. An essential component in those methods is a surrogate machine learning model that is trained with a small subset of the library to predict the desired properties of compounds. Accurate model can achieve high sample efficiency by finding the most promising compounds with only a fraction of the whole library being virtually screened. In this study, we examined the performance of pretrained transformer-based language model and graph neural network in Bayesian optimization active learning framework. The best pretrained models identifies 58.97% of the top-50000 by docking score after screening only 0.6% of an ultra-large library containing 99.5 million compounds, improving 8% over previous state-of-the-art baseline. Through extensive benchmarks, we show that the superior performance of pretrained models persists in both structure-based and ligand-based drug discovery. Such model can serve as a boost to the accuracy and sample efficiency of active learning based molecule virtual screening.
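A minimal sketch of the surrogate-driven active learning loop described above, using a random-forest surrogate, greedy top-k acquisition, and synthetic "docking scores"; the actual systems use pretrained transformer or graph-neural-network surrogates and real docking, which are swapped out here for brevity.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
library = rng.normal(size=(20_000, 64))                 # stand-in compound features
true_score = library @ rng.normal(size=64) + rng.normal(scale=0.5, size=20_000)

labeled_idx = list(rng.choice(len(library), size=200, replace=False))
for round_ in range(5):                                 # a few acquisition rounds
    surrogate = RandomForestRegressor(n_estimators=100, n_jobs=-1, random_state=0)
    surrogate.fit(library[labeled_idx], true_score[labeled_idx])
    preds = surrogate.predict(library)
    preds[labeled_idx] = np.inf                         # skip already screened compounds
    batch = np.argsort(preds)[:400]                     # greedy: best (lowest) predicted scores
    labeled_idx.extend(batch.tolist())                  # "dock" them, i.e. reveal true_score
    top_hits = set(np.argsort(true_score)[:1000])
    found = len(top_hits & set(labeled_idx))
    print(f"round {round_}: screened {len(labeled_idx)} compounds, "
          f"recovered {found}/1000 top hits")
```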
Popularity Degradation Bias in Local Music Recommendation
results: The study finds that both algorithms perform better for more popular artists and therefore exhibit popularity degradation bias; Mult-VAE shows better relative performance for less popular artists, suggesting it should be preferred for local (long-tail) music artist recommendation.
Abstract
In this paper, we study the effect of popularity degradation bias in the context of local music recommendations. Specifically, we examine how accurate two top-performing recommendation algorithms, Weight Relevance Matrix Factorization (WRMF) and Multinomial Variational Autoencoder (Mult-VAE), are at recommending artists as a function of artist popularity. We find that both algorithms improve recommendation performance for more popular artists and, as such, exhibit popularity degradation bias. While both algorithms produce a similar level of performance for more popular artists, Mult-VAE shows better relative performance for less popular artists. This suggests that this algorithm should be preferred for local (long-tail) music artist recommendation.
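A minimal sketch of how popularity degradation bias can be measured: hold out one artist per user, bucket artists by popularity, and compute a per-bucket hit rate of the recommender's top-k lists. The data and the popularity-biased scorer are synthetic placeholders; only the bucketing-and-metric pattern mirrors the analysis above.

```python
import numpy as np

rng = np.random.default_rng(1)
n_users, n_artists, k = 500, 2000, 10
popularity = rng.zipf(1.3, size=n_artists).astype(float)       # long-tailed popularity
held_out = rng.integers(0, n_artists, size=n_users)            # one held-out artist per user
# Stand-in recommender: scores correlate with popularity plus noise (popularity-biased).
scores = np.log(popularity)[None, :] + rng.normal(scale=2.0, size=(n_users, n_artists))
top_k = np.argsort(-scores, axis=1)[:, :k]
hits = (top_k == held_out[:, None]).any(axis=1)

buckets = np.quantile(popularity, [0.0, 0.5, 0.8, 0.95, 1.0])  # tail -> head
labels = ["tail", "mid", "popular", "head"]
for lo, hi, name in zip(buckets[:-1], buckets[1:], labels):
    in_bucket = (popularity[held_out] >= lo) & (popularity[held_out] <= hi)
    if in_bucket.any():
        print(f"{name:>8}: hit@{k} = {hits[in_bucket].mean():.3f} "
              f"({in_bucket.sum()} users)")
```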
results: The results show that the algorithm returns an accurate estimate of the solution whenever it is identifiable, and that it can handle more than half of the samples being arbitrarily corrupted by oblivious noise.
Abstract
We demonstrate the first algorithms for the problem of regression for generalized linear models (GLMs) in the presence of additive oblivious noise. We assume we have sample access to examples $(x, y)$ where $y$ is a noisy measurement of $g(w^* \cdot x)$. In particular, the noisy labels are of the form $y = g(w^* \cdot x) + \xi + \epsilon$, where $\xi$ is the oblivious noise drawn independently of $x$ and satisfies $\Pr[\xi = 0] \geq o(1)$, and $\epsilon \sim \mathcal N(0, \sigma^2)$. Our goal is to accurately recover a parameter vector $w$ such that the function $g(w \cdot x)$ has arbitrarily small error when compared to the true values $g(w^* \cdot x)$, rather than the noisy measurements $y$. We present an algorithm that tackles this problem in its most general distribution-independent setting, where the solution may not even be identifiable. Our algorithm returns an accurate estimate of the solution if it is identifiable, and otherwise returns a small list of candidates, one of which is close to the true solution. Furthermore, we provide a necessary and sufficient condition for identifiability, which holds in broad settings. Specifically, the problem is identifiable when the quantile at which $\xi + \epsilon = 0$ is known, or when the family of hypotheses does not contain candidates that are nearly equal to a translated $g(w^* \cdot x) + A$ for some real number $A$, while also having large error when compared to $g(w^* \cdot x)$. This is the first algorithmic result for GLM regression with oblivious noise which can handle more than half the samples being arbitrarily corrupted. Prior work focused largely on the setting of linear regression, and gave algorithms under restrictive assumptions.
Drift Control of High-Dimensional RBM: A Computational Method Based on Neural Networks
results: The study finds that the deep-neural-network-based method is accurate to within a fraction of one percent and is computationally feasible in dimensions up to at least $d=30$.
Abstract
Motivated by applications in queueing theory, we consider a stochastic control problem whose state space is the $d$-dimensional positive orthant. The controlled process $Z$ evolves as a reflected Brownian motion whose covariance matrix is exogenously specified, as are its directions of reflection from the orthant's boundary surfaces. A system manager chooses a drift vector $\theta(t)$ at each time $t$ based on the history of $Z$, and the cost rate at time $t$ depends on both $Z(t)$ and $\theta(t)$. In our initial problem formulation, the objective is to minimize expected discounted cost over an infinite planning horizon, after which we treat the corresponding ergodic control problem. Extending earlier work by Han et al. (Proceedings of the National Academy of Sciences, 2018, 8505-8510), we develop and illustrate a simulation-based computational method that relies heavily on deep neural network technology. For test problems studied thus far, our method is accurate to within a fraction of one percent, and is computationally feasible in dimensions up to at least $d=30$.
Potential and limitations of random Fourier features for dequantizing quantum machine learning
results: The paper establishes necessary and sufficient conditions under which random Fourier features provide an efficient dequantization of variational quantum machine learning for regression, and builds on these criteria to make concrete suggestions for PQC architecture design and to identify structures needed for a potential quantum advantage.
Abstract
Quantum machine learning is arguably one of the most explored applications of near-term quantum devices. Much focus has been put on notions of variational quantum machine learning where parameterized quantum circuits (PQCs) are used as learning models. These PQC models have a rich structure which suggests that they might be amenable to efficient dequantization via random Fourier features (RFF). In this work, we establish necessary and sufficient conditions under which RFF does indeed provide an efficient dequantization of variational quantum machine learning for regression. We build on these insights to make concrete suggestions for PQC architecture design, and to identify structures which are necessary for a regression problem to admit a potential quantum advantage via PQC based optimization.
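For readers less familiar with the classical tool involved, the sketch below shows random Fourier features approximating an RBF kernel and feeding an ordinary ridge regression; it illustrates the RFF machinery itself, not the quantum-model dequantization argument.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 2))
y = np.sin(X[:, 0]) * np.cos(X[:, 1]) + 0.05 * rng.normal(size=400)

gamma, n_features = 0.5, 500
# Random Fourier features for the RBF kernel k(x, x') = exp(-gamma * ||x - x'||^2).
W = rng.normal(scale=np.sqrt(2 * gamma), size=(X.shape[1], n_features))
b = rng.uniform(0, 2 * np.pi, size=n_features)
phi = lambda Z: np.sqrt(2.0 / n_features) * np.cos(Z @ W + b)

exact = KernelRidge(kernel="rbf", gamma=gamma, alpha=1e-3).fit(X, y)
approx = Ridge(alpha=1e-3).fit(phi(X), y)

X_test = rng.uniform(-3, 3, size=(200, 2))
diff = np.abs(exact.predict(X_test) - approx.predict(phi(X_test))).max()
print(f"max |exact - RFF| prediction gap: {diff:.4f}")
```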
Early diagnosis of autism spectrum disorder using machine learning approaches
paper_authors: Rownak Ara Rasul, Promy Saha, Diponkor Bala, S M Rakib Ul Karim, Ibrahim Abdullah, Bishwajit Saha
for: This paper aims to utilize machine learning algorithms to identify and automate the diagnostic process for Autistic Spectrum Disorder (ASD).
methods: The paper employs six classification models and five popular clustering methods to analyze ASD datasets, and evaluates their performance using various metrics such as accuracy, precision, recall, specificity, F1-score, AUC, kappa, and log loss.
results: The paper achieves a 100% accuracy rate when hyperparameters are carefully tuned for each model, and finds that spectral clustering outperforms the other benchmark clustering models in terms of NMI and ARI metrics while being comparable to the optimal SC achieved by k-means.
Abstract
Autistic Spectrum Disorder (ASD) is a neurological disease characterized by difficulties with social interaction, communication, and repetitive activities. The severity of these difficulties varies, and those with this diagnosis face unique challenges. While its primary origin lies in genetics, identifying and addressing it early can contribute to the enhancement of the condition. In recent years, machine learning-driven intelligent diagnosis has emerged as a supplement to conventional clinical approaches, aiming to address the potential drawbacks of time-consuming and costly traditional methods. In this work, we utilize different machine learning algorithms to find the most significant traits responsible for ASD and to automate the diagnostic process. We study six classification models to see which model works best to identify ASD and also study five popular clustering methods to get a meaningful insight of these ASD datasets. To find the best classifier for these binary datasets, we evaluate the models using accuracy, precision, recall, specificity, F1-score, AUC, kappa and log loss metrics. Our evaluation demonstrates that five out of the six selected models perform exceptionally, achieving a 100% accuracy rate on the ASD datasets when hyperparameters are meticulously tuned for each model. As almost all classification models are able to get 100% accuracy, we become interested in observing the underlying insights of the datasets by implementing some popular clustering algorithms on these datasets. We calculate Normalized Mutual Information (NMI), Adjusted Rand Index (ARI) & Silhouette Coefficient (SC) metrics to select the best clustering models. Our evaluation finds that spectral clustering outperforms all other benchmarking clustering models in terms of NMI & ARI metrics and it also demonstrates comparability to the optimal SC achieved by k-means.
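A minimal sketch of the multi-metric evaluation loop described above, run on synthetic binary data with two of the mentioned classifiers; the ASD datasets, hyperparameter tuning, and the remaining models are omitted.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, cohen_kappa_score, f1_score,
                             log_loss, precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=800, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

models = {"logreg": LogisticRegression(max_iter=1000),
          "random_forest": RandomForestClassifier(n_estimators=200, random_state=0)}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    prob = model.predict_proba(X_te)[:, 1]
    specificity = recall_score(y_te, pred, pos_label=0)   # recall of the negative class
    print(f"{name}: acc={accuracy_score(y_te, pred):.3f} "
          f"prec={precision_score(y_te, pred):.3f} rec={recall_score(y_te, pred):.3f} "
          f"spec={specificity:.3f} f1={f1_score(y_te, pred):.3f} "
          f"auc={roc_auc_score(y_te, prob):.3f} kappa={cohen_kappa_score(y_te, pred):.3f} "
          f"logloss={log_loss(y_te, prob):.3f}")
```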
Leveraging Negative Signals with Self-Attention for Sequential Music Recommendation
for: This paper focuses on improving sequential music recommendation by incorporating negative session-level feedback using transformer-based self-attentive architectures and contrastive learning.
methods: The paper proposes using transformer-based self-attentive models to learn implicit session-level information and incorporating negative feedback through a contrastive learning task.
results: The paper shows that incorporating negative feedback through contrastive learning results in consistent performance gains over baseline architectures ignoring negative user feedback.
Abstract
Music streaming services heavily rely on their recommendation engines to continuously provide content to their consumers. Sequential recommendation consequently has seen considerable attention in current literature, where state of the art approaches focus on self-attentive models leveraging contextual information such as long and short-term user history and item features; however, most of these studies focus on long-form content domains (retail, movie, etc.) rather than short-form, such as music. Additionally, many do not explore incorporating negative session-level feedback during training. In this study, we investigate the use of transformer-based self-attentive architectures to learn implicit session-level information for sequential music recommendation. We additionally propose a contrastive learning task to incorporate negative feedback (e.g skipped tracks) to promote positive hits and penalize negative hits. This task is formulated as a simple loss term that can be incorporated into a variety of deep learning architectures for sequential recommendation. Our experiments show that this results in consistent performance gains over the baseline architectures ignoring negative user feedback.
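The paper formulates the negative-feedback signal as a simple loss term that can be added to a sequential recommender. The sketch below shows one plausible such term, pushing the session representation toward the played track and away from skipped tracks by a margin; the margin value and the embedding shapes are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def skip_aware_loss(session_emb, pos_item_emb, skip_item_emb, margin=0.5):
    """Margin loss: the (cosine) score of a played track should exceed the score
    of each skipped track in the same session by at least `margin`."""
    s = F.normalize(session_emb, dim=-1)            # (B, d) self-attentive session state
    pos = F.normalize(pos_item_emb, dim=-1)         # (B, d) next track actually played
    neg = F.normalize(skip_item_emb, dim=-1)        # (B, N, d) tracks skipped in-session
    pos_score = (s * pos).sum(-1, keepdim=True)     # (B, 1)
    neg_score = torch.einsum("bd,bnd->bn", s, neg)  # (B, N)
    return F.relu(margin - pos_score + neg_score).mean()

# Example with random tensors standing in for model outputs.
loss = skip_aware_loss(torch.randn(8, 64), torch.randn(8, 64), torch.randn(8, 5, 64))
print(loss.item())
```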
Latent Diffusion Models for Structural Component Design
results: The proposed approach permits the editing of existing designs and produces designs with strong structural performance; quantitative results confirm that the generated designs are inherently near-optimal.
Abstract
Recent advances in generative modeling, namely Diffusion models, have revolutionized generative modeling, enabling high-quality image generation tailored to user needs. This paper proposes a framework for the generative design of structural components. Specifically, we employ a Latent Diffusion model to generate potential designs of a component that can satisfy a set of problem-specific loading conditions. One of the distinct advantages our approach offers over other generative approaches, such as generative adversarial networks (GANs), is that it permits the editing of existing designs. We train our model using a dataset of geometries obtained from structural topology optimization utilizing the SIMP algorithm. Consequently, our framework generates inherently near-optimal designs. Our work presents quantitative results that support the structural performance of the generated designs and the variability in potential candidate designs. Furthermore, we provide evidence of the scalability of our framework by operating over voxel domains with resolutions varying from $32^3$ to $128^3$. Our framework can be used as a starting point for generating novel near-optimal designs similar to topology-optimized designs.
Multiplying poles to avoid unwanted points in root finding and optimization
results: The paper proposes a new method that helps iterative root-finding and optimization algorithms avoid the basins of attraction of unwanted points by dividing the cost function by an appropriate power of the distance to those points; it handles both the case where the minimum of the cost function is exactly 0 and the case where it is non-zero, and additionally proposes an algorithm for escaping the basin of attraction of a positive-dimensional component to reach another component.
Abstract
In root finding and optimization, there are many cases where there is a closed set $A$ to which one does not want the sequence constructed by one's favourite method to converge (here, we do not assume extra properties on $A$ such as being convex or connected). For example, if one wants to find roots, and one chooses initial points in the basin of attraction for one root $x^*$ (a fact which one may not know beforehand), then one will always end up in that root. In this case, one would like to have a mechanism to avoid this point $x^*$ in the next runs of one's algorithm. In this paper, we propose a new method aiming to achieve this: we divide the cost function by an appropriate power of the distance function to $A$. This idea is inspired by how one would try to find all roots of a function in one variable. We first explain the heuristic for this method in the case where the minimum of the cost function is exactly 0, and then explain how to proceed if the minimum is non-zero (allowing both positive and negative values). The method is very suitable for iterative algorithms which have the descent property. We also propose, based on this, an algorithm to escape the basin of attraction of a component of positive dimension to reach another component. Along the way, we compare with the main existing relevant methods in the current literature. We provide several examples to illustrate the usefulness of the new approach.
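A minimal sketch of the core idea on a one-variable root-finding example: once a root has been found, later runs minimize the original cost divided by a power of the distance to the set of points found so far, which turns those points into poles and steers gradient descent toward a different root. The exponent and step size are illustrative choices, not prescriptions from the paper.

```python
import numpy as np

f = lambda x: (x - 1.0) * (x + 2.0) * (x - 3.0)       # roots at -2, 1, 3
cost = lambda x: f(x) ** 2                             # minimum value 0 at each root

def gradient_descent(g, x0, lr=1e-3, steps=20_000, eps=1e-6):
    x = x0
    for _ in range(steps):
        grad = (g(x + eps) - g(x - eps)) / (2 * eps)   # numerical derivative
        x -= lr * grad
    return x

first_root = gradient_descent(cost, x0=0.5)            # converges to the root near 1
print("first run:", round(first_root, 4))

A = [first_root]                                        # unwanted points found so far
m = 4                                                   # power of the distance function
def modified_cost(x):
    dist = min(abs(x - a) for a in A)
    return cost(x) / max(dist, 1e-12) ** m              # pole at each point of A

second_root = gradient_descent(modified_cost, x0=0.5)   # same start, different root
print("second run:", round(second_root, 4))
```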
Model-free tracking control of complex dynamical trajectories with machine learning
results: The effectiveness of the control framework is demonstrated using a variety of periodic and chaotic signals, and its robustness against measurement noise, disturbances, and uncertainties is established in the testing (deployment) phase.
Abstract
Nonlinear tracking control enabling a dynamical system to track a desired trajectory is fundamental to robotics, serving a wide range of civil and defense applications. In control engineering, designing tracking control requires complete knowledge of the system model and equations. We develop a model-free, machine-learning framework to control a two-arm robotic manipulator using only partially observed states, where the controller is realized by reservoir computing. Stochastic input is exploited for training, which consists of the observed partial state vector as the first and its immediate future as the second component so that the neural machine regards the latter as the future state of the former. In the testing (deployment) phase, the immediate-future component is replaced by the desired observational vector from the reference trajectory. We demonstrate the effectiveness of the control framework using a variety of periodic and chaotic signals, and establish its robustness against measurement noise, disturbances, and uncertainties.
Digital twins of nonlinear dynamical systems: A perspective
results: Digital twins can generate the evolution of nonlinear dynamical systems and predict potentially catastrophic emergent behaviors, providing early warnings and enabling predictive problem solving.
Abstract
Digital twins have attracted a great deal of recent attention from a wide range of fields. A basic requirement for digital twins of nonlinear dynamical systems is the ability to generate the system evolution and predict potentially catastrophic emergent behaviors so as to providing early warnings. The digital twin can then be used for system "health" monitoring in real time and for predictive problem solving. In particular, if the digital twin forecasts a possible system collapse in the future due to parameter drifting as caused by environmental changes or perturbations, an optimal control strategy can be devised and executed as early intervention to prevent the collapse. Two approaches exist for constructing digital twins of nonlinear dynamical systems: sparse optimization and machine learning. The basics of these two approaches are described and their advantages and caveats are discussed.
Multi-Step Model Predictive Safety Filters: Reducing Chattering by Increasing the Prediction Horizon
paper_authors: Federico Pizarro Bejarano, Lukas Brunke, Angela P. Schoellig
for: This paper aims to improve the safety guarantees of learning-based controllers by reducing chattering in model predictive safety filters (MPSFs).
methods: The proposed approach considers input corrections over a longer horizon and uses techniques from robust MPC to prove recursive feasibility, reducing chattering by more than a factor of 4 compared to previous MPSF formulations.
results: The proposed approach is verified through extensive simulation and quadrotor experiments, demonstrating the preservation of desired safety guarantees and a significant reduction in chattering compared to previous MPSF formulations.
Abstract
Learning-based controllers have demonstrated superior performance compared to classical controllers in various tasks. However, providing safety guarantees is not trivial. Safety, the satisfaction of state and input constraints, can be guaranteed by augmenting the learned control policy with a safety filter. Model predictive safety filters (MPSFs) are a common safety filtering approach based on model predictive control (MPC). MPSFs seek to guarantee safety while minimizing the difference between the proposed and applied inputs in the immediate next time step. This limited foresight can lead to jerky motions and undesired oscillations close to constraint boundaries, known as chattering. In this paper, we reduce chattering by considering input corrections over a longer horizon. Under the assumption of bounded model uncertainties, we prove recursive feasibility using techniques from robust MPC. We verified the proposed approach in both extensive simulation and quadrotor experiments. In experiments with a Crazyflie 2.0 drone, we show that, in addition to preserving the desired safety guarantees, the proposed MPSF reduces chattering by more than a factor of 4 compared to previous MPSF formulations.
Distribution and volume based scoring for Isolation Forests
methods: The first contribution is an information-theoretically motivated generalisation of the score function used to aggregate scores across trees, which takes the whole distribution into account rather than only the ensemble average; the second replaces the depth-based scoring at the level of the individual isolation tree with one based on hyper-volumes associated with the tree's leaf nodes.
results: Evaluated on generated data and the 34 datasets of the "ADBench" benchmark, the two variants significantly improve on the standard Isolation Forest for some datasets, and one of the two improves on average across all datasets; the code to reproduce the results is released with the submission.
Abstract
We make two contributions to the Isolation Forest method for anomaly and outlier detection. The first contribution is an information-theoretically motivated generalisation of the score function that is used to aggregate the scores across random tree estimators. This generalisation allows one to take into account not just the ensemble average across trees but instead the whole distribution. The second contribution is an alternative scoring function at the level of the individual tree estimator, in which we replace the depth-based scoring of the Isolation Forest with one based on hyper-volumes associated to an isolation tree's leaf nodes. We motivate the use of both of these methods on generated data and also evaluate them on 34 datasets from the recent and exhaustive ``ADBench'' benchmark, finding significant improvement over the standard isolation forest for both variants on some datasets and improvement on average across all datasets for one of the two variants. The code to reproduce our results is made available as part of the submission.
摘要
我们为用于异常和离群检测的隔离森林(Isolation Forest)方法做出了两个贡献。第一个贡献是对用于聚合各随机树估计器分数的分数函数进行了信息论动机的推广,使得不仅可以考虑各棵树的平均值,还可以考虑整个分布。第二个贡献是在单棵隔离树层面提出了另一种评分函数,用与隔离树叶节点相关联的超体积取代基于深度的评分。我们在生成数据上说明了这两种方法的动机,并在``ADBench''基准中的34个数据集上进行了评估,发现两种变体都能在部分数据集上显著优于标准隔离森林,其中一种变体在所有数据集上的平均表现也有提升。复现结果的代码随投稿一同提供。
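To make the contrast concrete, here is a minimal illustrative sketch (ours, not the paper's implementation) of a tiny isolation-tree ensemble that records both the classic leaf depth and the hyper-volume of the leaf's bounding box for a query point; all function names and the toy data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def grow_tree(X, lo, hi, depth=0, max_depth=8):
    """Grow one isolation tree; each leaf stores its depth and the
    volume of the axis-aligned box it isolates."""
    n = len(X)
    if depth >= max_depth or n <= 1:
        return {"leaf": True, "depth": depth, "volume": float(np.prod(hi - lo))}
    j = rng.integers(X.shape[1])
    s = rng.uniform(X[:, j].min(), X[:, j].max())
    lo_l, hi_l = lo.copy(), hi.copy(); hi_l[j] = s
    lo_r, hi_r = lo.copy(), hi.copy(); lo_r[j] = s
    return {"leaf": False, "j": j, "s": s,
            "L": grow_tree(X[X[:, j] < s], lo_l, hi_l, depth + 1, max_depth),
            "R": grow_tree(X[X[:, j] >= s], lo_r, hi_r, depth + 1, max_depth)}

def descend(node, x):
    while not node["leaf"]:
        node = node["L"] if x[node["j"]] < node["s"] else node["R"]
    return node

# Toy data: a dense blob plus one obvious outlier.
X = np.vstack([rng.normal(0, 1, size=(256, 2)), [[6.0, 6.0]]])
lo, hi = X.min(axis=0), X.max(axis=0)
trees = [grow_tree(X, lo, hi) for _ in range(50)]

for name, x in [("inlier", X[0]), ("outlier", X[-1])]:
    leaves = [descend(t, x) for t in trees]
    depth_score = np.mean([l["depth"] for l in leaves])    # classic: shallow = anomalous
    volume_score = np.mean([l["volume"] for l in leaves])  # alternative: large leaf volume = anomalous
    print(name, "mean depth:", round(depth_score, 2),
          "mean leaf volume:", round(volume_score, 2))
```

The outlier typically lands in shallow, large-volume leaves, which is the intuition behind both the depth-based and the volume-based score.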
Deep Networks as Denoising Algorithms: Sample-Efficient Learning of Diffusion Models in High-Dimensional Graphical Models
results: 我们提供了一种基于diffusion-based sampling的有效样本复杂度 bound,当折衔函数是通过深度神经网络学习得到的时候。Abstract
We investigate the approximation efficiency of score functions by deep neural networks in diffusion-based generative modeling. While existing approximation theories utilize the smoothness of score functions, they suffer from the curse of dimensionality for intrinsically high-dimensional data. This limitation is pronounced in graphical models such as Markov random fields, common for image distributions, where the approximation efficiency of score functions remains unestablished. To address this, we observe score functions can often be well-approximated in graphical models through variational inference denoising algorithms. Furthermore, these algorithms are amenable to efficient neural network representation. We demonstrate this in examples of graphical models, including Ising models, conditional Ising models, restricted Boltzmann machines, and sparse encoding models. Combined with off-the-shelf discretization error bounds for diffusion-based sampling, we provide an efficient sample complexity bound for diffusion-based generative modeling when the score function is learned by deep neural networks.
摘要
我们研究了在基于扩散的生成模型中,用深度神经网络近似分数函数(score function)的效率。现有的近似理论利用分数函数的光滑性,但对于本质上高维的数据会受到维度灾难的影响。这一局限在图模型(例如图像分布中常见的马尔可夫随机场)中尤为突出,其中分数函数的近似效率尚未得到确立。为了解决这个问题,我们观察到在图模型中,分数函数往往可以通过变分推断去噪算法得到良好的近似,而且这些算法可以高效地用神经网络表示。我们在伊辛模型、条件伊辛模型、受限玻尔兹曼机和稀疏编码模型等图模型的例子中展示了这一点。结合现成的扩散采样离散化误差界,我们为使用深度神经网络学习分数函数的扩散生成模型给出了高效的样本复杂度界。
results: 实验结果表明,Transformer模型只有在绝对价格序列预测方面表现出有限的优势,而LSTM模型在差价序列预测和价格运动预测方面表现更好和更稳定。Abstract
With the rapid development of artificial intelligence, long short term memory (LSTM), one kind of recurrent neural network (RNN), has been widely applied in time series prediction. Like RNN, Transformer is designed to handle the sequential data. As Transformer achieved great success in Natural Language Processing (NLP), researchers got interested in Transformer's performance on time series prediction, and plenty of Transformer-based solutions on long time series forecasting have come out recently. However, when it comes to financial time series prediction, LSTM is still a dominant architecture. Therefore, the question this study wants to answer is: whether the Transformer-based model can be applied in financial time series prediction and beat LSTM. To answer this question, various LSTM-based and Transformer-based models are compared on multiple financial prediction tasks based on high-frequency limit order book data. A new LSTM-based model called DLSTM is built and new architecture for the Transformer-based model is designed to adapt for financial prediction. The experiment result reflects that the Transformer-based model only has the limited advantage in absolute price sequence prediction. The LSTM-based models show better and more robust performance on difference sequence prediction, such as price difference and price movement.
摘要
随着人工智能的快速发展,长短期记忆网络(LSTM)这种循环神经网络(RNN)在时间序列预测中得到了广泛应用。与RNN类似,Transformer也是为处理序列数据而设计的。由于Transformer在自然语言处理(NLP)中取得了巨大成功,研究人员开始关注Transformer在时间序列预测中的表现,最近也涌现出许多基于Transformer的长时间序列预测方案。然而,在金融时间序列预测中,LSTM仍然是主导架构。因此,本研究要回答的问题是:基于Transformer的模型能否应用于金融时间序列预测并超越LSTM。为了回答这个问题,本研究基于高频限价订单簿数据,在多个金融预测任务上比较了多种基于LSTM和基于Transformer的模型;同时构建了一种名为DLSTM的新LSTM模型,并为基于Transformer的模型设计了适应金融预测的新架构。实验结果表明,基于Transformer的模型仅在绝对价格序列预测中具有有限的优势;而基于LSTM的模型在价差、价格变动等差分序列预测上表现更好、更稳定。
SR-PredictAO: Session-based Recommendation with High-Capability Predictor Add-On
methods: 我们提出了一种新的框架 called SR-PredictAO,它包括一个高能力预测器模块,可以减轻用户行为的随机性对预测的影响。此外,我们还提出了一种可以应用于现有模型上的高能力预测器模块优化方法。
results: 我们在两个实际数据集上进行了广泛的实验,并证明了SR-PredictAO在三种现有模型上的表现比现有模型更好,具体来说,SR-PredictAO在HR@20和MRR@20上比现有模型高出2.9%和2.3%。此外,这些改进都是在大多数现有模型上的所有数据集上进行的,这可以被视为Session-based recommendation领域的一项重要贡献。Abstract
Session-based recommendation, aiming at making the prediction of the user's next item click based on the information in a single session only even in the presence of some random user's behavior, is a complex problem. This complex problem requires a high-capability model of predicting the user's next action. Most (if not all) existing models follow the encoder-predictor paradigm where all studies focus on how to optimize the encoder module extensively in the paradigm but they ignore how to optimize the predictor module. In this paper, we discover the existing critical issue of the low-capability predictor module among existing models. Motivated by this, we propose a novel framework called \emph{\underline{S}ession-based \underline{R}ecommendation with \underline{Pred}ictor \underline{A}dd-\underline{O}n} (SR-PredictAO). In this framework, we propose a high-capability predictor module which could alleviate the effect of random user's behavior for prediction. It is worth mentioning that this framework could be applied to any existing models, which could give opportunities for further optimizing the framework. Extensive experiments on two real benchmark datasets for three state-of-the-art models show that \emph{SR-PredictAO} out-performs the current state-of-the-art model by up to 2.9\% in HR@20 and 2.3\% in MRR@20. More importantly, the improvement is consistent across almost all the existing models on all datasets, which could be regarded as a significant contribution in the field.
Learning Patient Static Information from Time-series EHR and an Approach for Safeguarding Privacy and Fairness
for: 这项研究旨在考察时间序列电子健康记录数据预测患者静态信息的能力,并开发一种通用方法,在下游任务中保护患者的敏感属性信息。
methods: 研究使用了时序数据和机器学习模型,并使用了多种方法和数据库来评估模型的性能。
results: 研究发现,原始时间序列数据以及机器学习模型学到的表示,都能高度准确地预测患者的静态信息,包括生物学性别、年龄和自报种族。此外,这种预测能力可以扩展到多种共病因素,并且即使模型是为不同任务训练、使用不同人群队列、不同模型架构和数据库,这种现象仍然存在。Abstract
Recent work in machine learning for healthcare has raised concerns about patient privacy and algorithmic fairness. For example, previous work has shown that patient self-reported race can be predicted from medical data that does not explicitly contain racial information. However, the extent of data identification is unknown, and we lack ways to develop models whose outcomes are minimally affected by such information. Here we systematically investigated the ability of time-series electronic health record data to predict patient static information. We found that not only the raw time-series data, but also learned representations from machine learning models, can be trained to predict a variety of static information with area under the receiver operating characteristic curve as high as 0.851 for biological sex, 0.869 for binarized age and 0.810 for self-reported race. Such high predictive performance can be extended to a wide range of comorbidity factors and exists even when the model was trained for different tasks, using different cohorts, using different model architectures and databases. Given the privacy and fairness concerns these findings pose, we develop a variational autoencoder-based approach that learns a structured latent space to disentangle patient-sensitive attributes from time-series data. Our work thoroughly investigates the ability of machine learning models to encode patient static information from time-series electronic health records and introduces a general approach to protect patient-sensitive attribute information for downstream tasks.
Using Property Elicitation to Understand the Impacts of Fairness Constraints
results: 研究发现,添加正则化函数可能会改变优化目标的最小值,并且可以通过性质诱导(property elicitation)来刻画这种改变。此外,研究还通过实验展示了算法决策如何随数据分布变化和约束难度的不同而变化。Abstract
Predictive algorithms are often trained by optimizing some loss function, to which regularization functions are added to impose a penalty for violating constraints. As expected, the addition of such regularization functions can change the minimizer of the objective. It is not well-understood which regularizers change the minimizer of the loss, and, when the minimizer does change, how it changes. We use property elicitation to take first steps towards understanding the joint relationship between the loss and regularization functions and the optimal decision for a given problem instance. In particular, we give a necessary and sufficient condition on loss and regularizer pairs for when a property changes with the addition of the regularizer, and examine some regularizers satisfying this condition standard in the fair machine learning literature. We empirically demonstrate how algorithmic decision-making changes as a function of both data distribution changes and hardness of the constraints.
摘要
预测算法通常通过优化某个损失函数来训练,并在其中加入正则化函数,对违反约束的情况施加惩罚。可以预见,加入这样的正则化函数会改变目标函数的最小值。然而,我们并不清楚哪些正则化函数会改变损失函数的最小值,以及当最小值改变时,它是如何改变的。我们使用性质诱导(property elicitation)迈出第一步,来理解损失函数与正则化函数以及给定问题实例的最优决策之间的联合关系。特别是,我们给出了损失函数与正则化函数对在加入正则化后性质发生改变的充分必要条件,并考察了公平机器学习文献中若干满足该条件的常见正则化函数。我们通过实验表明,在数据分布变化和约束难度变化的情况下,算法决策会如何变化。
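As a toy illustration of the central point, that a regularizer can move the minimizer (a sketch of ours, not the paper's construction): with squared loss the unregularized optimum is each group's label mean, while adding a parity-style penalty on the gap between the two groups' predictions pulls the optima together as the penalty weight grows.

```python
import numpy as np
from scipy.optimize import minimize

# Two groups with different label means; we predict one constant per group.
y_a = np.array([1.0, 1.0, 0.0])   # group a, mean 2/3
y_b = np.array([0.0, 0.0, 0.0])   # group b, mean 0

def objective(theta, lam):
    p_a, p_b = theta
    loss = np.mean((y_a - p_a) ** 2) + np.mean((y_b - p_b) ** 2)
    penalty = (p_a - p_b) ** 2     # a simple statistical-parity-style regularizer
    return loss + lam * penalty

for lam in [0.0, 1.0, 10.0]:
    p_a, p_b = minimize(objective, x0=[0.5, 0.5], args=(lam,)).x
    print(f"lambda={lam:>4}: p_a={p_a:.3f}, p_b={p_b:.3f}")
```

At lambda = 0 the minimizer reports the group means; as lambda increases, the regularizer changes the minimizer toward a shared value, which is exactly the kind of shift the property-elicitation analysis characterizes.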
WFTNet: Exploiting Global and Local Periodicity in Long-term Time Series Forecasting
results: 对多种时间序列数据进行了广泛的实验,并 consistently 超过了其他基准值。Abstract
Recent CNN and Transformer-based models tried to utilize frequency and periodicity information for long-term time series forecasting. However, most existing work is based on Fourier transform, which cannot capture fine-grained and local frequency structure. In this paper, we propose a Wavelet-Fourier Transform Network (WFTNet) for long-term time series forecasting. WFTNet utilizes both Fourier and wavelet transforms to extract comprehensive temporal-frequency information from the signal, where Fourier transform captures the global periodic patterns and wavelet transform captures the local ones. Furthermore, we introduce a Periodicity-Weighted Coefficient (PWC) to adaptively balance the importance of global and local frequency patterns. Extensive experiments on various time series datasets show that WFTNet consistently outperforms other state-of-the-art baseline.
摘要
近期的CNN和Transformer模型尝试利用频率和周期信息进行长期时间序列预测。然而,大多数现有工作基于傅里叶变换,无法捕捉细粒度的局部频率结构。在这篇论文中,我们提出了一种小波-傅里叶变换网络(WFTNet),用于长期时间序列预测。WFTNet同时利用傅里叶变换和小波变换,从信号中提取全面的时间-频率信息,其中傅里叶变换捕捉全局周期模式,小波变换捕捉局部周期模式。此外,我们引入了一种周期性加权系数(Periodicity-Weighted Coefficient, PWC),自适应地平衡全局与局部频率模式的重要性。我们在多个时间序列数据集上进行了广泛实验,结果表明WFTNet始终优于其他最先进的基线方法。
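A minimal sketch of the underlying signal-analysis step (our own illustration, not WFTNet itself): the Fourier spectrum exposes the dominant global period, a wavelet decomposition exposes the localized burst, and a crude energy ratio stands in for the paper's Periodicity-Weighted Coefficient.

```python
import numpy as np
import pywt  # PyWavelets

t = np.arange(512)
# A global periodicity (period 64) plus a short local burst (period 8).
x = np.sin(2 * np.pi * t / 64) + (np.abs(t - 300) < 20) * np.sin(2 * np.pi * t / 8)

# Global periodic structure from the Fourier spectrum.
spec = np.abs(np.fft.rfft(x))
k = 1 + int(np.argmax(spec[1:]))          # skip the DC component
dominant_period = len(x) / k

# Local, multi-scale structure from a wavelet decomposition.
coeffs = pywt.wavedec(x, "db4", level=4)
detail_energy = sum(float(np.sum(c ** 2)) for c in coeffs[1:])

# Crude stand-in for the Periodicity-Weighted Coefficient: relative weight
# of global (Fourier) vs. local (wavelet detail) energy.
global_energy = float(np.sum(spec[1:] ** 2))
pwc = global_energy / (global_energy + detail_energy)
print(f"dominant period ~ {dominant_period:.1f}, PWC ~ {pwc:.2f}")
```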
Create and Find Flatness: Building Flat Training Spaces in Advance for Continual Learning
methods: 我们提出了一种 novel 的 Create and Find Flatness(C&F)框架,在每个任务学习阶段建立一个适应当任务的平坦训练空间。在学习当前任务时,我们适应创建一个损失函数的平坦区域,然后根据参数对当前任务的重要性进行评估。在适应新任务时,我们会应用约束以根据平坦度,同时为新任务准备平坦的训练空间。
results: 我们的 C&F 框架在 standalone continual learning 中表现出色,并且可以与其他方法组合使用。实验结果表明,C&F 可以保持之前任务知识,同时学习新任务,并且在不同的 dataset 上具有稳定的性能。Abstract
Catastrophic forgetting remains a critical challenge in the field of continual learning, where neural networks struggle to retain prior knowledge while assimilating new information. Most existing studies emphasize mitigating this issue only when encountering new tasks, overlooking the significance of the pre-task phase. Therefore, we shift the attention to the current task learning stage, presenting a novel framework, C&F (Create and Find Flatness), which builds a flat training space for each task in advance. Specifically, during the learning of the current task, our framework adaptively creates a flat region around the minimum in the loss landscape. Subsequently, it finds the parameters' importance to the current task based on their flatness degrees. When adapting the model to a new task, constraints are applied according to the flatness and a flat space is simultaneously prepared for the impending task. We theoretically demonstrate the consistency between the created and found flatness. In this manner, our framework not only accommodates ample parameter space for learning new tasks but also preserves the preceding knowledge of earlier tasks. Experimental results exhibit C&F's state-of-the-art performance as a standalone continual learning approach and its efficacy as a framework incorporating other methods. Our work is available at https://github.com/Eric8932/Create-and-Find-Flatness.
摘要
灾难性遗忘(catastrophic forgetting)是持续学习领域的一个关键挑战:神经网络在吸收新信息的同时难以保留先前学到的知识。现有研究大多只在遇到新任务时才去缓解这一问题,忽视了当前任务(pre-task)阶段的重要性。因此,我们将注意力转向当前任务的学习阶段,提出了一种新的框架 C&F(Create and Find Flatness),提前为每个任务构建平坦的训练空间。具体来说,在学习当前任务时,框架会自适应地在损失地形的最小值附近创建一个平坦区域;随后,根据参数的平坦程度确定其对当前任务的重要性。当模型适应新任务时,框架根据平坦度施加约束,同时为即将到来的任务准备平坦空间。我们从理论上证明了所创建的平坦区域与所发现的平坦度之间的一致性。这样,框架既为学习新任务留出了充足的参数空间,又保留了先前任务的知识。实验结果表明,C&F 作为独立的持续学习方法达到了最先进的性能,作为可与其他方法结合的框架同样有效。我们的代码见 https://github.com/Eric8932/Create-and-Find-Flatness 。
Beyond Accuracy: Measuring Representation Capacity of Embeddings to Preserve Structural and Contextual Information
results: 该论文通过使用三个生物序列数据集(蛋白质和核酸)和四种嵌入方法(Spike2Vec、Spaced k-mers、PWM2Vec 和 AutoEncoder)进行评估,并结果表明该方法可以帮助研究者和实践者更好地理解嵌入在不同应用中的效果,并提供一个量化的评估方法。Abstract
Effective representation of data is crucial in various machine learning tasks, as it captures the underlying structure and context of the data. Embeddings have emerged as a powerful technique for data representation, but evaluating their quality and capacity to preserve structural and contextual information remains a challenge. In this paper, we address this need by proposing a method to measure the \textit{representation capacity} of embeddings. The motivation behind this work stems from the importance of understanding the strengths and limitations of embeddings, enabling researchers and practitioners to make informed decisions in selecting appropriate embedding models for their specific applications. By combining extrinsic evaluation methods, such as classification and clustering, with t-SNE-based neighborhood analysis, such as neighborhood agreement and trustworthiness, we provide a comprehensive assessment of the representation capacity. Additionally, the use of optimization techniques (bayesian optimization) for weight optimization (for classification, clustering, neighborhood agreement, and trustworthiness) ensures an objective and data-driven approach in selecting the optimal combination of metrics. The proposed method not only contributes to advancing the field of embedding evaluation but also empowers researchers and practitioners with a quantitative measure to assess the effectiveness of embeddings in capturing structural and contextual information. For the evaluation, we use $3$ real-world biological sequence (proteins and nucleotide) datasets and performed representation capacity analysis of $4$ embedding methods from the literature, namely Spike2Vec, Spaced $k$-mers, PWM2Vec, and AutoEncoder.
摘要
效果表示数据的表示是机器学习任务中的关键,它捕捉了数据的下面结构和上下文。嵌入在机器学习中出现为一种强大的表示技巧,但评估其质量和保持结构和上下文信息的能力仍然是一个挑战。本文提出一种方法来衡量嵌入的表示能力。这种方法的动机来自于了理解嵌入的优劣点,以便研究者和实践者可以根据特定应用选择合适的嵌入模型。通过结合外部评估方法(如分类和聚类)和t-SNE基于的邻居分析(如邻居一致和信任度),我们提供了一种全面的评估方法。此外,使用搜索算法(bayesian优化)来优化参数(如分类、聚类、邻居一致和信任度),确保了一种客观和数据驱动的方法来选择最佳的综合指标。该方法不仅为嵌入评估领域做出了贡献,还为研究者和实践者提供了一个量化的评估方法,以评估嵌入是否能够有效地捕捉结构和上下文信息。为评估,我们使用了3个实际生物序列(蛋白质和核苷酸)数据集,并对Literature中的4种嵌入方法进行表示能力分析,即Spike2Vec、Spaced k-mers、PWM2Vec和AutoEncoder。
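A rough sketch of how such an evaluation can be assembled from standard tools (our illustration, with synthetic vectors standing in for the sequence embeddings; the metric weights are placeholders rather than the paper's Bayesian-optimized ones):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.manifold import TSNE, trustworthiness

# Stand-in "embeddings": synthetic feature vectors with class structure.
X, y = make_classification(n_samples=300, n_features=32, n_informative=8, random_state=0)

# Extrinsic view: classification and clustering quality of the embedding.
clf_acc = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
sil = silhouette_score(X, KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X))

# Neighborhood view: how well a 2-D t-SNE map preserves local structure.
X2 = TSNE(n_components=2, random_state=0).fit_transform(X)
trust = trustworthiness(X, X2, n_neighbors=10)

# A simple weighted combination as a stand-in for the paper's optimized score.
weights = {"clf": 0.4, "sil": 0.3, "trust": 0.3}
capacity = weights["clf"] * clf_acc + weights["sil"] * sil + weights["trust"] * trust
print(round(clf_acc, 3), round(sil, 3), round(trust, 3), round(capacity, 3))
```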
Grassroots Operator Search for Model Edge Adaptation
methods: 该方法使用草根算子搜索(Grassroots Operator Search, GOS),通过搜索并选择高效的算子来替换原始模型中的算子,在保持高精度的同时提高模型的计算效率。
results: 在多种深度学习模型上,该方法在Redmi Note 7S和Raspberry Pi3等边缘设备上实现了至少2.2倍的计算加速,同时保持高准确率。此外,在腕带设备的脉搏率估计应用中,该方法在降低计算复杂度的同时达到了最先进的性能,证明了其在实际应用中的实用性。Abstract
Hardware-aware Neural Architecture Search (HW-NAS) is increasingly being used to design efficient deep learning architectures. An efficient and flexible search space is crucial to the success of HW-NAS. Current approaches focus on designing a macro-architecture and searching for the architecture's hyperparameters based on a set of possible values. This approach is biased by the expertise of deep learning (DL) engineers and standard modeling approaches. In this paper, we present a Grassroots Operator Search (GOS) methodology. Our HW-NAS adapts a given model for edge devices by searching for efficient operator replacement. We express each operator as a set of mathematical instructions that capture its behavior. The mathematical instructions are then used as the basis for searching and selecting efficient replacement operators that maintain the accuracy of the original model while reducing computational complexity. Our approach is grassroots since it relies on the mathematical foundations to construct new and efficient operators for DL architectures. We demonstrate on various DL models, that our method consistently outperforms the original models on two edge devices, namely Redmi Note 7S and Raspberry Pi3, with a minimum of 2.2x speedup while maintaining high accuracy. Additionally, we showcase a use case of our GOS approach in pulse rate estimation on wristband devices, where we achieve state-of-the-art performance, while maintaining reduced computational complexity, demonstrating the effectiveness of our approach in practical applications.
摘要
硬件感知神经架构搜索(HW-NAS)正被越来越多地用于设计高效的深度学习架构,而高效灵活的搜索空间是其成功的关键。现有方法通常先设计宏观架构,再在一组候选取值中搜索架构超参数,这种做法受限于深度学习工程师的经验和标准建模方式。在这篇论文中,我们提出了一种草根式算子搜索(Grassroots Operator Search, GOS)方法。我们的HW-NAS通过搜索高效的算子替换,使给定模型适应边缘设备。我们将每个算子表示为一组刻画其行为的数学指令,并以这些数学指令为基础,搜索并选择高效的替换算子,在降低计算复杂度的同时保持原始模型的精度。之所以称为"草根式",是因为该方法依靠数学基础为深度学习架构构建新的高效算子。我们在多种深度学习模型上进行了评估:在Redmi Note 7S和Raspberry Pi3两个边缘设备上,我们的方法相对原始模型始终取得至少2.2倍的加速,同时保持高精度。此外,我们还展示了GOS方法在腕带设备脉搏率估计中的应用,在降低计算复杂度的同时取得了最先进的性能,证明了该方法在实际应用中的有效性。
Towards a Prediction of Machine Learning Training Time to Support Continuous Learning Systems Development
methods: 我们对 Zheng et al.提出的Full Parameter Time Complexity (FPTC)方法进行了广泛的实证研究。这是我们知道的唯一一种形式化ML模型训练时间与数据集和模型参数之间的关系。我们研究了逻辑回归和随机森林分类器的形ulation,并指出了主要的优点和缺点。
results: 我们发现,从实验结果来看,训练时间预测与数据集上下文有着紧密的关系。FPTC方法不能泛化。Abstract
The problem of predicting the training time of machine learning (ML) models has become extremely relevant in the scientific community. Being able to predict a priori the training time of an ML model would enable the automatic selection of the best model both in terms of energy efficiency and in terms of performance in the context of, for instance, MLOps architectures. In this paper, we present the work we are conducting towards this direction. In particular, we present an extensive empirical study of the Full Parameter Time Complexity (FPTC) approach by Zheng et al., which is, to the best of our knowledge, the only approach formalizing the training time of ML models as a function of both dataset's and model's parameters. We study the formulations proposed for the Logistic Regression and Random Forest classifiers, and we highlight the main strengths and weaknesses of the approach. Finally, we observe how, from the conducted study, the prediction of training time is strictly related to the context (i.e., the involved dataset) and how the FPTC approach is not generalizable.
摘要
机器学习(ML)模型训练时间预测问题在科学界已变得非常重要。如果能够事先预测ML模型的训练时间,就可以在诸如MLOps架构等场景中,自动选择在能效和性能两方面都最优的模型。在这篇论文中,我们介绍了朝这一方向开展的工作。具体来说,我们对 Zheng et al. 提出的 Full Parameter Time Complexity(FPTC)方法进行了广泛的实证研究;据我们所知,这是唯一一种将ML模型训练时间形式化为数据集参数和模型参数的函数的方法。我们研究了其针对逻辑回归和随机森林分类器给出的公式,并指出了该方法的主要优点和缺点。最后,我们从研究中观察到,训练时间的预测与具体上下文(即所涉及的数据集)密切相关,而FPTC方法并不具备泛化能力。
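As a small illustration of the empirical side of such a study (ours, not the FPTC formula itself): time a model's fit over growing sample sizes and extrapolate with a simple parametric fit; whether that extrapolation transfers to a new dataset is exactly the generalization question the paper raises.

```python
import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

sizes, times = [2000, 4000, 8000, 16000], []
for n in sizes:
    X, y = make_classification(n_samples=n, n_features=50, random_state=0)
    t0 = time.perf_counter()
    LogisticRegression(max_iter=200).fit(X, y)
    times.append(time.perf_counter() - t0)

# Fit t(n) ~ a*n + b as a crude stand-in for an FPTC-style formula,
# then extrapolate to an unseen size.
coef = np.polyfit(sizes, times, 1)
print("measured:", [round(t, 3) for t in times])
print("predicted for n=32000:", round(float(np.polyval(coef, 32000)), 3), "s")
```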
A Model-Based Machine Learning Approach for Assessing the Performance of Blockchain Applications
results: 统计比较结果表明,我们的模型具有竞争优势:$k$NN 模型的表现比 SVM 模型高出5%,而 ISO 相比普通的 SO 也将误差偏差降低了4%。Abstract
The recent advancement of Blockchain technology consolidates its status as a viable alternative for various domains. However, evaluating the performance of blockchain applications can be challenging due to the underlying infrastructure's complexity and distributed nature. Therefore, a reliable modelling approach is needed to boost Blockchain-based applications' development and evaluation. While simulation-based solutions have been researched, machine learning (ML) model-based techniques are rarely discussed in conjunction with evaluating blockchain application performance. Our novel research makes use of two ML model-based methods. Firstly, we train a $k$ nearest neighbour ($k$NN) and support vector machine (SVM) to predict blockchain performance using predetermined configuration parameters. Secondly, we employ the salp swarm optimization (SO) ML model which enables the investigation of optimal blockchain configurations for achieving the required performance level. We use rough set theory to enhance SO, hereafter called ISO, which we demonstrate to prove achieving an accurate recommendation of optimal parameter configurations; despite uncertainty. Finally, statistical comparisons indicate that our models have a competitive edge. The $k$NN model outperforms SVM by 5\% and the ISO also demonstrates a reduction of 4\% inaccuracy deviation compared to regular SO.
摘要
最近区块链技术的进步使其成为多个领域中可行的替代方案。然而,由于底层基础设施的复杂性和分布式特点,评估区块链应用的性能可能很困难。因此,需要一种可靠的建模方法来推动基于区块链的应用的开发与评估。虽然基于仿真的解决方案已有研究,但基于机器学习(ML)模型的技术却很少与区块链应用性能评估联系起来讨论。我们的新研究使用了两类基于ML模型的方法:首先,我们训练 $k$ 近邻($k$NN)和支持向量机(SVM),利用预先确定的配置参数预测区块链性能;其次,我们采用樽海鞘群优化(salp swarm optimization, SO)模型,用于探索达到所需性能水平的最优区块链配置。我们利用粗糙集理论对SO进行增强(记为ISO),并证明ISO即使在存在不确定性的情况下也能准确推荐最优参数配置。最后,统计比较表明我们的模型具有竞争优势:$k$NN 模型比 SVM 高出5%,ISO 相比普通SO也将误差偏差降低了4%。
RHALE: Robust and Heterogeneity-aware Accumulated Local Effects
results: 对于 synthetic 和实际数据集,RHALE 方法比其他方法表现更优,特别是在相关特征情况下。 RHALE 方法还可以自动确定最佳分割方案,以兼顾 bias 和 variance。Abstract
Accumulated Local Effects (ALE) is a widely-used explainability method for isolating the average effect of a feature on the output, because it handles cases with correlated features well. However, it has two limitations. First, it does not quantify the deviation of instance-level (local) effects from the average (global) effect, known as heterogeneity. Second, for estimating the average effect, it partitions the feature domain into user-defined, fixed-sized bins, where different bin sizes may lead to inconsistent ALE estimations. To address these limitations, we propose Robust and Heterogeneity-aware ALE (RHALE). RHALE quantifies the heterogeneity by considering the standard deviation of the local effects and automatically determines an optimal variable-size bin-splitting. In this paper, we prove that to achieve an unbiased approximation of the standard deviation of local effects within each bin, bin splitting must follow a set of sufficient conditions. Based on these conditions, we propose an algorithm that automatically determines the optimal partitioning, balancing the estimation bias and variance. Through evaluations on synthetic and real datasets, we demonstrate the superiority of RHALE compared to other methods, including the advantages of automatic bin splitting, especially in cases with correlated features.
摘要
累积局部效应(Accumulated Local Effects, ALE)是一种广泛使用的可解释性方法,用于分离单个特征对输出的平均效应,因为它能很好地处理特征相关的情形。然而,它有两个局限。第一,它没有量化实例级(局部)效应相对于平均(全局)效应的偏离程度,即异质性。第二,在估计平均效应时,它将特征取值范围划分为用户定义的固定大小的分箱,不同的分箱大小可能导致不一致的ALE估计。为解决这些局限,我们提出了鲁棒且感知异质性的ALE(RHALE)。RHALE通过局部效应的标准差来量化异质性,并自动确定最优的可变大小分箱方案。在这篇论文中,我们证明了:要在每个分箱内对局部效应的标准差进行无偏近似,分箱划分必须满足一组充分条件。基于这些条件,我们提出了一种自动确定最优划分的算法,以平衡估计的偏差和方差。通过在合成数据和真实数据上的评估,我们展示了RHALE相对其他方法的优势,尤其是自动分箱在特征相关情形下带来的好处。
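For reference, a minimal fixed-bin ALE sketch in NumPy (ours), which also records the per-bin standard deviation of local effects, i.e., the heterogeneity that RHALE quantifies; RHALE's automatic variable-size bin splitting is not reproduced here.

```python
import numpy as np

def ale_with_heterogeneity(f, X, j, n_bins=10):
    """Fixed-bin ALE for feature j, plus the per-bin std of local effects."""
    z = np.quantile(X[:, j], np.linspace(0, 1, n_bins + 1))
    means, stds = [], []
    for lo, hi in zip(z[:-1], z[1:]):
        mask = (X[:, j] >= lo) & (X[:, j] <= hi)
        if not mask.any():
            means.append(0.0); stds.append(0.0); continue
        Xl, Xh = X[mask].copy(), X[mask].copy()
        Xl[:, j], Xh[:, j] = lo, hi
        local = f(Xh) - f(Xl)          # local effects of crossing the bin
        means.append(local.mean()); stds.append(local.std())
    ale = np.cumsum(means)             # accumulate bin-average effects
    return z, ale, np.array(stds)

# Toy model with an interaction, so local effects are heterogeneous.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))
f = lambda X: X[:, 0] ** 2 + X[:, 0] * X[:, 1]
grid, ale, hetero = ale_with_heterogeneity(f, X, j=0)
print(np.round(ale, 2), np.round(hetero, 2))
```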
Investigating Personalization Methods in Text to Music Generation
results: 研究发现,相似度指标与用户偏好相吻合;现有的个性化方法更容易学习节奏性的音乐结构,而较难学习旋律。Abstract
In this work, we investigate the personalization of text-to-music diffusion models in a few-shot setting. Motivated by recent advances in the computer vision domain, we are the first to explore the combination of pre-trained text-to-audio diffusers with two established personalization methods. We experiment with the effect of audio-specific data augmentation on the overall system performance and assess different training strategies. For evaluation, we construct a novel dataset with prompts and music clips. We consider both embedding-based and music-specific metrics for quantitative evaluation, as well as a user study for qualitative evaluation. Our analysis shows that similarity metrics are in accordance with user preferences and that current personalization approaches tend to learn rhythmic music constructs more easily than melody. The code, dataset, and example material of this study are open to the research community.
摘要
在这项研究中,我们考察了文本到音乐扩散模型在少样本(few-shot)设置下的个性化。受计算机视觉领域最新进展的启发,我们首次探索将预训练的文本到音频扩散模型与两种成熟的个性化方法相结合。我们实验了音频特定的数据增强对整体系统性能的影响,并评估了不同的训练策略。为进行评估,我们构建了一个包含提示词和音乐片段的新数据集。我们既使用基于嵌入的指标和音乐特定的指标进行定量评估,也通过用户研究进行定性评估。我们的分析表明,相似度指标与用户偏好相符,并且现有的个性化方法更容易学习节奏性的音乐结构,而较难学习旋律。本研究的代码、数据集和示例材料向研究社区开放。
Ano-SuPs: Multi-size anomaly detection for manufactured products by identifying suspected patches
for: This paper aims to address the challenges of existing matrix decomposition methods in image-based anomaly detection, particularly in the presence of complex backgrounds and various anomaly patterns.
methods: The proposed method uses a two-stage strategy that involves detecting suspected patches (Ano-SuPs) by reconstructing the input image twice: the first step is to obtain a set of normal patches by removing suspected patches, and the second step is to use those normal patches to refine the identification of patches with anomalies.
results: The proposed method is evaluated systematically through simulation experiments and case studies, demonstrating its effectiveness in detecting anomalies in image-based systems. The key parameters and designed steps that impact the model’s performance and efficiency are also identified.Abstract
Image-based systems have gained popularity owing to their capacity to provide rich manufacturing status information, low implementation costs and high acquisition rates. However, the complexity of the image background and various anomaly patterns pose new challenges to existing matrix decomposition methods, which are inadequate for modeling requirements. Moreover, the uncertainty of the anomaly can cause anomaly contamination problems, making the designed model and method highly susceptible to external disturbances. To address these challenges, we propose a two-stage strategy anomaly detection method that detects anomalies by identifying suspected patches (Ano-SuPs). Specifically, we propose to detect the patches with anomalies by reconstructing the input image twice: the first step is to obtain a set of normal patches by removing those suspected patches, and the second step is to use those normal patches to refine the identification of the patches with anomalies. To demonstrate its effectiveness, we evaluate the proposed method systematically through simulation experiments and case studies. We further identified the key parameters and designed steps that impact the model's performance and efficiency.
摘要
基于图像的系统之所以得到普及,是因为它们能够提供丰富的生产状态信息、实现成本低且采集速率高。然而,图像背景的复杂性和多样的异常模式给现有的矩阵分解方法带来了新的挑战,使其难以满足建模需求。此外,异常的不确定性会导致异常污染问题,使所设计的模型和方法极易受到外部干扰的影响。为应对这些挑战,我们提出了一种两阶段的异常检测方法,通过识别可疑补丁(Ano-SuPs)来检测异常。具体来说,我们通过对输入图像进行两次重建来检测含异常的补丁:第一步,剔除可疑补丁以得到一组正常补丁;第二步,利用这些正常补丁来细化对异常补丁的识别。为证明其有效性,我们通过仿真实验和案例研究对所提方法进行了系统评估,并进一步识别了影响模型性能和效率的关键参数与设计步骤。
Bold but Cautious: Unlocking the Potential of Personalized Federated Learning through Cautiously Aggressive Collaboration
results: 实验结果显示,FedCAC比现有的方法更好地将客户的参数与其他客户共享,从而提高模型的性能,特别是在客户的资料分布不同时。Abstract
Personalized federated learning (PFL) reduces the impact of non-independent and identically distributed (non-IID) data among clients by allowing each client to train a personalized model when collaborating with others. A key question in PFL is to decide which parameters of a client should be localized or shared with others. In current mainstream approaches, all layers that are sensitive to non-IID data (such as classifier layers) are generally personalized. The reasoning behind this approach is understandable, as localizing parameters that are easily influenced by non-IID data can prevent the potential negative effect of collaboration. However, we believe that this approach is too conservative for collaboration. For example, for a certain client, even if its parameters are easily influenced by non-IID data, it can still benefit by sharing these parameters with clients having similar data distribution. This observation emphasizes the importance of considering not only the sensitivity to non-IID data but also the similarity of data distribution when determining which parameters should be localized in PFL. This paper introduces a novel guideline for client collaboration in PFL. Unlike existing approaches that prohibit all collaboration of sensitive parameters, our guideline allows clients to share more parameters with others, leading to improved model performance. Additionally, we propose a new PFL method named FedCAC, which employs a quantitative metric to evaluate each parameter's sensitivity to non-IID data and carefully selects collaborators based on this evaluation. Experimental results demonstrate that FedCAC enables clients to share more parameters with others, resulting in superior performance compared to state-of-the-art methods, particularly in scenarios where clients have diverse distributions.
摘要
个性化联邦学习(PFL)允许每个客户端在与其他客户端协作的同时训练个性化模型,从而降低客户端之间非独立同分布(non-IID)数据的影响。PFL中的一个关键问题是决定客户端的哪些参数应当本地化、哪些应当与他人共享。当前主流方法通常将所有对non-IID数据敏感的层(例如分类器层)都个性化。这种做法的理由可以理解:将容易受non-IID数据影响的参数本地化,可以避免协作可能带来的负面影响。然而,我们认为这种做法对协作而言过于保守。例如,对某个客户端来说,即使其参数容易受non-IID数据影响,它仍然可以通过与数据分布相似的客户端共享这些参数而获益。这一观察强调,在决定PFL中哪些参数应当本地化时,不仅要考虑参数对non-IID数据的敏感性,还要考虑数据分布的相似性。本文为PFL中的客户端协作提出了一种新的指导原则:不同于现有方法禁止敏感参数的一切协作,我们的原则允许客户端与他人共享更多参数,从而提升模型性能。此外,我们提出了一种名为FedCAC的新PFL方法,它使用定量指标评估每个参数对non-IID数据的敏感性,并据此谨慎地选择协作对象。实验结果表明,FedCAC使客户端能够与他人共享更多参数,其性能优于当前最先进的方法,尤其是在客户端数据分布差异较大的场景中。
results: 该论文研究了延迟对动态系统的影响,并提出了若干可能的解决方案;同时,论文还将强化学习文献中的著名框架与延迟问题的框架联系起来。Abstract
Delays are inherent to most dynamical systems. Besides shifting the process in time, they can significantly affect their performance. For this reason, it is usually valuable to study the delay and account for it. Because they are dynamical systems, it is of no surprise that sequential decision-making problems such as Markov decision processes (MDP) can also be affected by delays. These processes are the foundational framework of reinforcement learning (RL), a paradigm whose goal is to create artificial agents capable of learning to maximise their utility by interacting with their environment. RL has achieved strong, sometimes astonishing, empirical results, but delays are seldom explicitly accounted for. The understanding of the impact of delay on the MDP is limited. In this dissertation, we propose to study the delay in the agent's observation of the state of the environment or in the execution of the agent's actions. We will repeatedly change our point of view on the problem to reveal some of its structure and peculiarities. A wide spectrum of delays will be considered, and potential solutions will be presented. This dissertation also aims to draw links between celebrated frameworks of the RL literature and the one of delays.
摘要
<>translate "Delays are inherent to most dynamical systems. Besides shifting the process in time, they can significantly affect their performance. For this reason, it is usually valuable to study the delay and account for it. Because they are dynamical systems, it is of no surprise that sequential decision-making problems such as Markov decision processes (MDP) can also be affected by delays. These processes are the foundational framework of reinforcement learning (RL), a paradigm whose goal is to create artificial agents capable of learning to maximize their utility by interacting with their environment. RL has achieved strong, sometimes astonishing, empirical results, but delays are seldom explicitly accounted for. The understanding of the impact of delay on the MDP is limited. In this dissertation, we propose to study the delay in the agent's observation of the state of the environment or in the execution of the agent's actions. We will repeatedly change our point of view on the problem to reveal some of its structure and peculiarities. A wide spectrum of delays will be considered, and potential solutions will be presented. This dissertation also aims to draw links between celebrated frameworks of the RL literature and the one of delays." into Simplified Chinese.Here's the translation:<>多种动力系统中都存在延迟。 besides 延迟时间的偏移,它们可以对性能产生重要影响。因此,通常值得研究延迟并考虑其影响。因为它们是动力系统,因此也不surprisingly,sequential decision-making problemssuch as Markov decision processes (MDP) 也可以受到延迟的影响。这些过程是RL的基础框架,RL的目标是创建可以在环境中学习提高利用的人工智能代理。 RL已经取得了强大,occasionally astonishing的实验成果,但延迟通常不直接考虑。MDP中延迟的理解受限。在这个论文中,我们提议研究代理 Observation of the state of the environment 或执行代理动作中的延迟。我们将不断更改问题的视点,以揭示其结构和特点。广泛考虑延迟的范围,并提供可能的解决方案。这个论文还计划把RL文献中著名的框架与延迟框架相连接。
GPSINDy: Data-Driven Discovery of Equations of Motion
results: 我们在一个 Lotka-Volterra 模型和一个 unicycle 动力系统上进行了实验和硬件数据处理,并证明了我们的方法可以更好地找到系统动力和预测未来轨迹。Abstract
In this paper, we consider the problem of discovering dynamical system models from noisy data. The presence of noise is known to be a significant problem for symbolic regression algorithms. We combine Gaussian process regression, a nonparametric learning method, with SINDy, a parametric learning approach, to identify nonlinear dynamical systems from data. The key advantages of our proposed approach are its simplicity coupled with the fact that it demonstrates improved robustness properties with noisy data over SINDy. We demonstrate our proposed approach on a Lotka-Volterra model and a unicycle dynamic model in simulation and on an NVIDIA JetRacer system using hardware data. We demonstrate improved performance over SINDy for discovering the system dynamics and predicting future trajectories.
摘要
在这篇论文中,我们考虑从含噪数据中发现动力系统模型的问题。众所周知,噪声会对符号回归算法造成显著影响。我们将非参数学习方法高斯过程回归与参数学习方法SINDy相结合,从数据中辨识非线性动力系统。我们所提方法的主要优点是简单,同时在含噪数据上表现出比SINDy更好的鲁棒性。我们在仿真中的Lotka-Volterra模型和独轮车动力学模型上,以及在使用硬件数据的NVIDIA JetRacer系统上验证了所提方法,结果表明它在发现系统动力学和预测未来轨迹方面均优于SINDy。
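A rough sketch of the two-stage idea on a Lotka-Volterra example (ours, under simplifying assumptions: the GP kernel, candidate library, and thresholds are illustrative, and sequential thresholded least squares stands in for the full SINDy machinery):

```python
import numpy as np
from scipy.integrate import solve_ivp
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Simulate noisy Lotka-Volterra measurements.
def lv(t, s, a=1.0, b=0.4, c=1.0, d=0.2):
    x, y = s
    return [a * x - b * x * y, -c * y + d * x * y]

t = np.linspace(0, 10, 400)
sol = solve_ivp(lv, (0, 10), [4.0, 2.0], t_eval=t)
rng = np.random.default_rng(0)
Z = sol.y.T + 0.05 * rng.standard_normal(sol.y.T.shape)

# Step 1: denoise each state with GP regression before differentiating.
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-2)
smoothed = np.column_stack([
    GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    .fit(t[:, None], Z[:, k]).predict(t[:, None])
    for k in range(2)
])
dZ = np.gradient(smoothed, t, axis=0)

# Step 2: SINDy-style sparse regression over a candidate library.
x, y = smoothed[:, 0], smoothed[:, 1]
Theta = np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])
names = ["1", "x", "y", "x*y", "x^2", "y^2"]
Xi = np.linalg.lstsq(Theta, dZ, rcond=None)[0]
for _ in range(10):                       # sequential thresholding (STLSQ)
    Xi[np.abs(Xi) < 0.05] = 0.0
    for k in range(2):
        idx = Xi[:, k] != 0
        if idx.any():
            Xi[idx, k] = np.linalg.lstsq(Theta[:, idx], dZ[:, k], rcond=None)[0]
print(dict(zip(names, np.round(Xi, 2).tolist())))
```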
InkStream: Real-time GNN Inference on Streaming Graphs via Incremental Update
methods: 本方法基于两点关键见解:(1)当模型使用 min 或 max 作为聚合函数时,在 $k$-hop 邻域内,大多数节点不受被修改边的影响;(2)当模型权重保持不变而图结构发生变化时,只需计算受影响的那部分邻域,节点嵌入即可随时间增量演化。基于这两点见解,我们提出了一种名为 InkStream 的新方法,用于实时推理,在保证输出与传统方法完全一致的前提下,将内存访问和计算量降到最低。InkStream 基于事件驱动系统,控制层间效应的传播和层内节点嵌入的增量更新。InkStream 具有很高的可扩展性和可配置性,允许用户创建并处理自定义事件。
results: 我们在四个大图上使用三种 GNN 模型进行实验,显示 InkStream 在 CPU 集群上加速了 2.5-427 倍,在两个不同的 GPU 集群上加速了 2.4-343 倍,而且输出与传统方法的最新图快照相同。Abstract
Classic Graph Neural Network (GNN) inference approaches, designed for static graphs, are ill-suited for streaming graphs that evolve with time. The dynamism intrinsic to streaming graphs necessitates constant updates, posing unique challenges to acceleration on GPU. We address these challenges based on two key insights: (1) Inside the $k$-hop neighborhood, a significant fraction of the nodes is not impacted by the modified edges when the model uses min or max as aggregation function; (2) When the model weights remain static while the graph structure changes, node embeddings can incrementally evolve over time by computing only the impacted part of the neighborhood. With these insights, we propose a novel method, InkStream, designed for real-time inference with minimal memory access and computation, while ensuring an identical output to conventional methods. InkStream operates on the principle of propagating and fetching data only when necessary. It uses an event-based system to control inter-layer effect propagation and intra-layer incremental updates of node embedding. InkStream is highly extensible and easily configurable by allowing users to create and process customized events. We showcase that less than 10 lines of additional user code are needed to support popular GNN models such as GCN, GraphSAGE, and GIN. Our experiments with three GNN models on four large graphs demonstrate that InkStream accelerates by 2.5-427$\times$ on a CPU cluster and 2.4-343$\times$ on two different GPU clusters while producing identical outputs as GNN model inference on the latest graph snapshot.
摘要
传统的图神经网络(GNN)推理方法是为静态图设计的,并不适用于随时间演化的流式图。流式图固有的动态性要求不断更新,这给GPU上的加速带来了独特的挑战。我们基于以下两个关键发现来应对这些挑战:(1)当模型使用 min 或 max 作为聚合函数时,$k$-hop 邻域内的大部分节点不会受到被修改边的影响;(2)当模型权重保持不变而图结构发生变化时,只需计算受影响的那部分邻域,节点嵌入就可以随时间增量演化。基于这些发现,我们提出了一种新方法 InkStream,用于实时推理:它只在必要时传播和获取数据,在保证输出与传统方法一致的同时,将内存访问和计算量降到最低。InkStream 采用事件驱动系统来控制层间效应的传播和层内节点嵌入的增量更新,并且高度可扩展、易于配置,允许用户创建和处理自定义事件;支持GCN、GraphSAGE、GIN等常见GNN模型只需不到10行额外用户代码。我们在四个大图上对三种GNN模型进行的实验表明,InkStream 在CPU集群上加速2.5-427倍,在两个不同的GPU集群上加速2.4-343倍,且输出与在最新图快照上进行GNN模型推理的结果完全一致。
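A toy illustration of the first insight (ours: a single max-aggregation layer on a dict-based graph, not InkStream's event system): after an edge deletion only the two endpoints need recomputation, and propagation to further layers is triggered only for nodes whose value actually changed.

```python
# Toy graph: adjacency lists and scalar node features.
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1, 4], 4: [3]}
feat = {0: 1.0, 1: 5.0, 2: 3.0, 3: 2.0, 4: 4.0}

def max_aggregate(adj, feat):
    return {v: max(feat[u] for u in nbrs) for v, nbrs in adj.items()}

h = max_aggregate(adj, feat)            # embeddings on the old graph snapshot

# Streaming update: delete edge (1, 3).
adj[1].remove(3); adj[3].remove(1)

# Only the two endpoints' aggregations can change in this 1-hop layer; an
# event-driven system recomputes just those and propagates onward only
# when a value actually changed.
impacted, changed = {1, 3}, []
h_new = dict(h)
for v in impacted:
    val = max(feat[u] for u in adj[v])
    if val != h_new[v]:
        h_new[v] = val
        changed.append(v)               # would trigger events for the next layer
print("recomputed:", sorted(impacted), "actually changed:", changed)
assert h_new == max_aggregate(adj, feat)   # matches a full recompute
```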
Extreme Scenario Selection in Day-Ahead Power Grid Operational Planning
paper_authors: Guillermo Terrén-Serrano, Michael Ludkovski
for: 本研究旨在为短期电网规划选择极端情况,以降低运营风险。
methods: 本研究使用统计函数深度指标来筛选极端情况,以确定最有可能导致网络运营风险的情况。
results: 实验结果表明,使用统计函数深度指标可以有效地筛选出高风险情况,并且可以预测load shedding、运营成本、储备短缺和可变能源电停机等操作风险。Abstract
We propose and analyze the application of statistical functional depth metrics for the selection of extreme scenarios in day-ahead grid planning. Our primary motivation is screening of probabilistic scenarios for realized load and renewable generation, in order to identify scenarios most relevant for operational risk mitigation. To handle the high-dimensionality of the scenarios across asset classes and intra-day periods, we employ functional measures of depth to sub-select outlying scenarios that are most likely to be the riskiest for the grid operation. We investigate a range of functional depth measures, as well as a range of operational risks, including load shedding, operational costs, reserves shortfall and variable renewable energy curtailment. The effectiveness of the proposed screening approach is demonstrated through a case study on the realistic Texas-7k grid.
摘要
我们提出并分析了将统计函数深度指标用于日前电网规划中极端场景选择的方法。我们的首要动机是对实际负荷与可再生发电的概率场景进行筛选,以识别与运营风险缓解最相关的场景。为了处理跨资产类别和日内时段的高维场景,我们采用函数深度度量来挑选最有可能给电网运行带来风险的离群场景。我们考察了一系列函数深度度量,以及包括切负荷、运行成本、备用短缺和可变可再生能源弃电在内的多种运营风险。我们基于贴近实际的 Texas-7k 电网的案例研究,证明了所提筛选方法的有效性。
Containing Analog Data Deluge at Edge through Frequency-Domain Compression in Collaborative Compute-in-Memory Networks
for: This paper aims to improve area efficiency in deep learning inference tasks for edge computing applications, specifically addressing the challenges of limited storage and computing resources in edge devices.
methods: The proposed method employs two key strategies: (1) Frequency domain learning using binarized Walsh-Hadamard Transforms, which reduces the necessary parameters for DNN and enables compute-in-SRAM, and (2) a memory-immersed collaborative digitization method among CiM arrays to reduce the area overheads of conventional ADCs.
results: The proposed method achieves significant area and energy savings compared to a 40 nm-node 5-bit SAR ADC and 5-bit Flash ADC, as demonstrated using a 65 nm CMOS test chip. The results show that it is possible to process analog data more efficiently and selectively retain valuable data from sensors, alleviating the challenges posed by the analog data deluge.
results: 根据65nmCMOS测试板表现,提议的方法可以实现显著的面积和能耗减少,与40nm节点5位SAR ADC和5位Flash ADC相比。通过更有效地处理分析数据,可以选择性地保留感知器中的有价值数据,从而解决分析数据泛洪的问题。Abstract
Edge computing is a promising solution for handling high-dimensional, multispectral analog data from sensors and IoT devices for applications such as autonomous drones. However, edge devices' limited storage and computing resources make it challenging to perform complex predictive modeling at the edge. Compute-in-memory (CiM) has emerged as a principal paradigm to minimize energy for deep learning-based inference at the edge. Nevertheless, integrating storage and processing complicates memory cells and/or memory peripherals, essentially trading off area efficiency for energy efficiency. This paper proposes a novel solution to improve area efficiency in deep learning inference tasks. The proposed method employs two key strategies. Firstly, a Frequency domain learning approach uses binarized Walsh-Hadamard Transforms, reducing the necessary parameters for DNN (by 87% in MobileNetV2) and enabling compute-in-SRAM, which better utilizes parallelism during inference. Secondly, a memory-immersed collaborative digitization method is described among CiM arrays to reduce the area overheads of conventional ADCs. This facilitates more CiM arrays in limited footprint designs, leading to better parallelism and reduced external memory accesses. Different networking configurations are explored, where Flash, SA, and their hybrid digitization steps can be implemented using the memory-immersed scheme. The results are demonstrated using a 65 nm CMOS test chip, exhibiting significant area and energy savings compared to a 40 nm-node 5-bit SAR ADC and 5-bit Flash ADC. By processing analog data more efficiently, it is possible to selectively retain valuable data from sensors and alleviate the challenges posed by the analog data deluge.
摘要
边缘计算是一种很有前景的解决方案,可用于处理来自传感器和物联网设备的高维、多光谱模拟数据,服务于自主无人机等应用。然而,边缘设备有限的存储和计算资源,使得在边缘执行复杂的预测建模颇具挑战。存内计算(Compute-in-memory, CiM)已成为在边缘进行基于深度学习的推理时降低能耗的主要范式。然而,将存储与处理集成会使存储单元和/或外围电路变得复杂,实质上是以面积效率换取能效。本文提出了一种新的方案来提升深度学习推理任务的面积效率,采用了两项关键策略。第一,频域学习方法使用二值化的沃尔什-哈达玛变换(Walsh-Hadamard Transform),减少了DNN所需的参数量(在MobileNetV2中减少87%),并支持SRAM内计算,从而在推理时更好地利用并行性。第二,本文描述了一种CiM阵列间的内存内协同数字化方法,以降低传统ADC带来的面积开销,使有限面积的设计中可以容纳更多CiM阵列,带来更好的并行性并减少外部存储访问。文中还探讨了不同的组网配置,其中Flash、SA及其混合数字化步骤都可以用这种内存内方案实现。结果基于65 nm CMOS测试芯片展示,与40 nm节点的5位SAR ADC和5位Flash ADC相比,实现了显著的面积与能耗节省。通过更高效地处理模拟数据,可以有选择地保留传感器中有价值的数据,从而缓解模拟数据洪流带来的挑战。
A Region-Shrinking-Based Acceleration for Classification-Based Derivative-Free Optimization
results: 实验结果表明,新提出的"RACE-CARS"算法比"SRACOS"更快;这一点在合成函数以及自然语言处理中语言模型即服务(language-model-as-a-service)的黑盒调参上都得到了实证验证。此外,文章还对引入的超参数进行了消融实验,探讨了"RACE-CARS"的机制,并给出了超参数调整的经验指导。Abstract
Derivative-free optimization algorithms play an important role in scientific and engineering design optimization problems, especially when derivative information is not accessible. In this paper, we study the framework of classification-based derivative-free optimization algorithms. By introducing a concept called hypothesis-target shattering rate, we revisit the computational complexity upper bound of this type of algorithms. Inspired by the revisited upper bound, we propose an algorithm named "RACE-CARS", which adds a random region-shrinking step compared with "SRACOS" (Hu et al., 2017).. We further establish a theorem showing the acceleration of region-shrinking. Experiments on the synthetic functions as well as black-box tuning for language-model-as-a-service demonstrate empirically the efficiency of "RACE-CARS". An ablation experiment on the introduced hyperparameters is also conducted, revealing the mechanism of "RACE-CARS" and putting forward an empirical hyperparameter-tuning guidance.
摘要
无导数优化算法在科学和工程设计优化问题中扮演着重要角色,尤其是在无法获得导数信息时。本文研究基于分类的无导数优化算法框架。通过引入"假设-目标打散率"(hypothesis-target shattering rate)这一概念,我们重新审视了这类算法的计算复杂度上界。受该上界的启发,我们提出了名为"RACE-CARS"的算法,它在"SRACOS"(Hu et al., 2017)的基础上增加了一个随机区域收缩步骤,并进一步证明了区域收缩带来的加速。在合成函数以及语言模型即服务的黑盒调参上的实验,实证了"RACE-CARS"的效率。此外,我们还对引入的超参数进行了消融实验,揭示了"RACE-CARS"的机制,并提出了经验性的超参数调整指导。
The Topology and Geometry of Neural Representations
for: 这项研究的目的是Characterize brain representations of perceptual and cognitive content, and distinguish different functional regions with robustness to noise and individual differences.
methods: 研究使用了 topological representational similarity analysis (tRSA), an extension of representational similarity analysis (RSA) that uses a family of geo-topological summary statistics to characterize the topology of brain representations while de-emphasizing the geometry.
results: 研究发现,使用这种新的统计方法可以robust to noise and interindividual variability, and maintain excellent sensitivity to the unique representational signatures of different neural network layers and brain regions.Abstract
A central question for neuroscience is how to characterize brain representations of perceptual and cognitive content. An ideal characterization should distinguish different functional regions with robustness to noise and idiosyncrasies of individual brains that do not correspond to computational differences. Previous studies have characterized brain representations by their representational geometry, which is defined by the representational dissimilarity matrix (RDM), a summary statistic that abstracts from the roles of individual neurons (or responses channels) and characterizes the discriminability of stimuli. Here we explore a further step of abstraction: from the geometry to the topology of brain representations. We propose topological representational similarity analysis (tRSA), an extension of representational similarity analysis (RSA) that uses a family of geo-topological summary statistics that generalizes the RDM to characterize the topology while de-emphasizing the geometry. We evaluate this new family of statistics in terms of the sensitivity and specificity for model selection using both simulations and functional MRI (fMRI) data. In the simulations, the ground truth is a data-generating layer representation in a neural network model and the models are the same and other layers in different model instances (trained from different random seeds). In fMRI, the ground truth is a visual area and the models are the same and other areas measured in different subjects. Results show that topology-sensitive characterizations of population codes are robust to noise and interindividual variability and maintain excellent sensitivity to the unique representational signatures of different neural network layers and brain regions.
摘要
神经科学的一个核心问题是如何刻画大脑对感知与认知内容的表征。理想的刻画应能区分不同的功能区域,并对噪声以及与计算差异无关的个体大脑特异性具有鲁棒性。以往的研究用表征几何来刻画大脑表征,其由表征差异矩阵(RDM)定义;RDM是一种摘要统计量,它抽象掉了单个神经元(或响应通道)的具体作用,刻画了刺激之间的可区分性。在这里,我们探索更进一步的抽象:从大脑表征的几何走向其拓扑。我们提出了拓扑表征相似性分析(tRSA),它是表征相似性分析(RSA)的扩展,使用一族几何-拓扑摘要统计量来推广RDM,在弱化几何信息的同时刻画拓扑结构。我们通过仿真和功能磁共振成像(fMRI)数据,从模型选择的敏感性和特异性两方面评估了这一族新统计量。在仿真中,真实情况是神经网络模型中作为数据生成来源的某一层表征,候选模型则是不同模型实例(以不同随机种子训练)中的同一层和其他层;在fMRI中,真实情况是某个视觉区域,候选模型则是在不同被试中测得的同一区域和其他区域。结果表明,对群体编码进行拓扑敏感的刻画对噪声和个体间差异具有鲁棒性,同时对不同神经网络层和大脑区域独特的表征特征保持了出色的敏感性。
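For readers unfamiliar with the starting point, a minimal sketch (ours) of the geometry-level object that tRSA generalizes: the representational dissimilarity matrix and an RSA-style rank correlation between two RDMs.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
# Response patterns: 20 stimulus conditions x 100 measurement channels,
# for a "model" representation and a noisy copy of it.
A = rng.normal(size=(20, 100))
B = A + 0.5 * rng.normal(size=A.shape)

# RDM: pairwise correlation distance between condition patterns
# (the upper-triangle vector, as commonly used in RSA).
rdm_a = pdist(A, metric="correlation")
rdm_b = pdist(B, metric="correlation")

# RSA compares representations by correlating their RDMs, abstracting
# away which individual channels carry the information.
rho, _ = spearmanr(rdm_a, rdm_b)
print("condition pairs:", rdm_a.shape[0], " Spearman rho:", round(rho, 3))
```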
Information Leakage from Data Updates in Machine Learning Models
results: 我们发现,相比只能访问更新后的模型,同时拥有两个模型快照会导致更高的信息泄露;具有罕见属性值的数据记录更容易受到攻击;而重复的更改会留下更大的痕迹,可能带来更大的泄露。Abstract
In this paper we consider the setting where machine learning models are retrained on updated datasets in order to incorporate the most up-to-date information or reflect distribution shifts. We investigate whether one can infer information about these updates in the training data (e.g., changes to attribute values of records). Here, the adversary has access to snapshots of the machine learning model before and after the change in the dataset occurs. Contrary to the existing literature, we assume that an attribute of a single or multiple training data points are changed rather than entire data records are removed or added. We propose attacks based on the difference in the prediction confidence of the original model and the updated model. We evaluate our attack methods on two public datasets along with multi-layer perceptron and logistic regression models. We validate that two snapshots of the model can result in higher information leakage in comparison to having access to only the updated model. Moreover, we observe that data records with rare values are more vulnerable to attacks, which points to the disparate vulnerability of privacy attacks in the update setting. When multiple records with the same original attribute value are updated to the same new value (i.e., repeated changes), the attacker is more likely to correctly guess the updated values since repeated changes leave a larger footprint on the trained model. These observations point to vulnerability of machine learning models to attribute inference attacks in the update setting.
摘要
在这篇论文中,我们考虑机器学习模型在更新后的数据集上重新训练,以纳入最新信息或反映分布变化的情形。我们研究是否可以推断出训练数据中这些更新的信息(例如,记录属性值的更改)。在这一设定下,攻击者可以访问数据集更改前后的两个模型快照。与现有文献不同,我们假设被更改的是单条或多条训练数据的某个属性,而不是整条数据记录被删除或添加。我们提出了基于原始模型与更新后模型预测置信度之差的攻击方法,并在两个公共数据集上,针对多层感知机和逻辑回归模型进行了评估。我们验证了:相比只能访问更新后的模型,拥有两个模型快照会导致更高的信息泄露。此外,我们观察到取值罕见的数据记录更容易受到攻击,这表明更新设定下隐私攻击的脆弱性并不均衡。当多条具有相同原始属性值的记录被更新为同一个新值(即重复更改)时,由于重复更改会在训练后的模型上留下更大的痕迹,攻击者更有可能正确猜出更新后的值。这些观察表明机器学习模型在更新设定下面临属性推断攻击的风险。
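A toy illustration of the confidence-difference signal (ours, not the paper's attack; the dataset, model, and repeated-change setup are synthetic, and the signal can be weak when only a single record changes):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
m1 = LogisticRegression(max_iter=1000).fit(X, y)            # snapshot before the update

# Repeated change: several records get the same new value for attribute 3.
idxs, attr, true_new_value = np.arange(20), 3, 2.5
X_upd = X.copy()
X_upd[idxs, attr] = true_new_value
m2 = LogisticRegression(max_iter=1000).fit(X_upd, y)         # snapshot after the update

def mean_conf(model, value):
    """Average confidence on the targeted records' true labels,
    with the suspected attribute set to a candidate value."""
    Xp = X[idxs].copy()
    Xp[:, attr] = value
    proba = model.predict_proba(Xp)
    return float(np.mean(proba[np.arange(len(idxs)), y[idxs]]))

# Adversary: pick the candidate value whose confidence rose the most
# between the two snapshots.
candidates = np.linspace(-3, 3, 13)
gains = [mean_conf(m2, v) - mean_conf(m1, v) for v in candidates]
guess = candidates[int(np.argmax(gains))]
print("true new value:", true_new_value, " adversary's guess:", guess)
```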
3D-U-SAM Network For Few-shot Tooth Segmentation in CBCT Images
results: 通过比较和采样大小实验表明,提出的方法可以更好地解决3D dental图像分割任务。Abstract
Accurate representation of tooth position is extremely important in treatment. 3D dental image segmentation is a widely used method, however labelled 3D dental datasets are a scarce resource, leading to the problem of small samples that this task faces in many cases. To this end, we address this problem with a pretrained SAM and propose a novel 3D-U-SAM network for 3D dental image segmentation. Specifically, in order to solve the problem of using 2D pre-trained weights on 3D datasets, we adopted a convolution approximation method; in order to retain more details, we designed skip connections to fuse features at all levels with reference to U-Net. The effectiveness of the proposed method is demonstrated in ablation experiments, comparison experiments, and sample size experiments.
摘要
很重要的是精确地表示牙齿的位置在治疗中。3D dental图像分割是一种广泛使用的方法,但标注的3D dental数据集是一种罕见的资源,导致这个任务在许多情况下面临着小样本问题。为解决这个问题,我们使用预训练的SAM并提议一种3D-U-SAM网络 для3D dental图像分割。具体来说,为了解决使用2D预训练 веса在3D数据集上的问题,我们采用了一种核心approximation方法;为了保留更多的细节,我们设计了跳转连接,以融合所有层的特征参照U-Net。我们的提议方法的效果在ablation实验、比较实验和样本大小实验中得到了证明。
It’s Simplex! Disaggregating Measures to Improve Certified Robustness
results: 该研究发现,通过考虑已认证模型的潜在输出空间,可以改进对认证机制的分析,并有可能把可达到的认证半径提升到现有最先进水平的两倍以上。实验证明,新的认证方法在噪声尺度 $\sigma = 1$ 时可以多认证9%的样本,并且随着预测任务难度的增加,相对提升会更大。Abstract
Certified robustness circumvents the fragility of defences against adversarial attacks, by endowing model predictions with guarantees of class invariance for attacks up to a calculated size. While there is value in these certifications, the techniques through which we assess their performance do not present a proper accounting of their strengths and weaknesses, as their analysis has eschewed consideration of performance over individual samples in favour of aggregated measures. By considering the potential output space of certified models, this work presents two distinct approaches to improve the analysis of certification mechanisms, that allow for both dataset-independent and dataset-dependent measures of certification performance. Embracing such a perspective uncovers new certification approaches, which have the potential to more than double the achievable radius of certification, relative to current state-of-the-art. Empirical evaluation verifies that our new approach can certify $9\%$ more samples at noise scale $\sigma = 1$, with greater relative improvements observed as the difficulty of the predictive task increases.
摘要
认证鲁棒性通过为模型预测提供在一定攻击规模内类别不变的保证,规避了对抗攻击防御手段的脆弱性。尽管这类认证具有价值,但我们评估其性能的技术并没有恰当地刻画其优势与不足:现有分析回避了对单个样本表现的考察,而偏向于使用聚合指标。本工作通过考虑已认证模型的潜在输出空间,提出了两种改进认证机制分析的不同途径,分别给出了与数据集无关和与数据集相关的认证性能度量。采用这一视角还揭示了新的认证方法,其可达到的认证半径有可能超过现有最先进水平的两倍以上。实验验证表明,我们的新方法在噪声尺度 $\sigma = 1$ 时可以多认证 9% 的样本,并且随着预测任务难度的增加,相对提升更为显著。
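For context, a minimal sketch (ours) of the per-sample quantity such analyses start from: a randomized-smoothing-style certified radius of roughly $\sigma\,\Phi^{-1}(p)$ for top-class probability $p$, reported per sample rather than as a single aggregate; the probabilities below are made up for illustration.

```python
import numpy as np
from scipy.stats import norm

def certified_radius(p_top, sigma=1.0):
    """Simplified two-class smoothing bound: sigma * Phi^{-1}(p_top),
    zero when the smoothed classifier is not confidently correct."""
    p = float(np.clip(p_top, 1e-6, 1 - 1e-6))
    return sigma * norm.ppf(p) if p > 0.5 else 0.0

# Hypothetical per-sample top-class probabilities of a smoothed classifier.
p_tops = np.array([0.55, 0.70, 0.90, 0.99, 0.45, 0.999])
radii = np.array([certified_radius(p) for p in p_tops])
print("per-sample radii:", np.round(radii, 2))
print("fraction certified at r >= 0.5:", float(np.mean(radii >= 0.5)))
```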
Towards Data-centric Graph Machine Learning: Review and Outlook
paper_authors: Xin Zheng, Yixin Liu, Zhifeng Bao, Meng Fang, Xia Hu, Alan Wee-Chung Liew, Shirui Pan
for: 这篇论文主要关注数据驱动AI的发展,尤其是Graph数据结构的应用。
methods: 论文提出了一种系统化框架,名为Data-centric Graph Machine Learning(DC-GML),该框架包括Graph数据生命周期中的所有阶段,包括数据收集、探索、改进、利用和维护。
results: 论文提供了一份完整的taxonomy,用于回答三个关键的Graph数据中心问题:1)如何提高Graph数据的可用性和质量;2)如何在Graph数据可用性有限、质量较低的情况下进行学习;3)如何从以Graph数据为中心的视角构建图MLOps系统。
Data-centric AI, with its primary focus on the collection, management, and utilization of data to drive AI models and applications, has attracted increasing attention in recent years. In this article, we conduct an in-depth and comprehensive review, offering a forward-looking outlook on the current efforts in data-centric AI pertaining to graph data-the fundamental data structure for representing and capturing intricate dependencies among massive and diverse real-life entities. We introduce a systematic framework, Data-centric Graph Machine Learning (DC-GML), that encompasses all stages of the graph data lifecycle, including graph data collection, exploration, improvement, exploitation, and maintenance. A thorough taxonomy of each stage is presented to answer three critical graph-centric questions: (1) how to enhance graph data availability and quality; (2) how to learn from graph data with limited-availability and low-quality; (3) how to build graph MLOps systems from the graph data-centric view. Lastly, we pinpoint the future prospects of the DC-GML domain, providing insights to navigate its advancements and applications.
摘要
"以数据为中心的AI"(Data-centric AI)近年来受到越来越多的关注,其核心是通过数据的收集、管理和利用来驱动AI模型和应用。在这篇文章中,我们进行了深入而全面的综述,并对图数据方面以数据为中心的AI的现有工作给出前瞻性的展望;图数据是表示和刻画大量多样的现实实体之间复杂依赖关系的基础数据结构。我们提出了一个覆盖图数据全生命周期的系统框架,称为以数据为中心的图机器学习(DC-GML),包括图数据的收集、探索、改进、利用和维护等阶段。我们对每个阶段给出了详尽的分类,以回答三个关键的、以图数据为中心的问题:(1)如何提升图数据的可用性和质量;(2)如何在图数据可用性有限、质量较低的情况下进行学习;(3)如何从以图数据为中心的视角构建图MLOps系统。最后,我们指出了DC-GML领域的未来前景,为其发展和应用提供了见解。
PAGER: A Framework for Failure Analysis of Deep Regression Models
results: Evaluations on synthetic and real-world benchmarks show that PAGER accurately detects prediction failures of deep regression models and organizes samples into different risk regimes.
Abstract
Safe deployment of AI models requires proactive detection of potential prediction failures to prevent costly errors. While failure detection in classification problems has received significant attention, characterizing failure modes in regression tasks is more complicated and less explored. Existing approaches rely on epistemic uncertainties or feature inconsistency with the training distribution to characterize model risk. However, we show that uncertainties are necessary but insufficient to accurately characterize failure, owing to the various sources of error. In this paper, we propose PAGER (Principled Analysis of Generalization Errors in Regressors), a framework to systematically detect and characterize failures in deep regression models. Built upon the recently proposed idea of anchoring in deep models, PAGER unifies both epistemic uncertainties and novel, complementary non-conformity scores to organize samples into different risk regimes, thereby providing a comprehensive analysis of model errors. Additionally, we introduce novel metrics for evaluating failure detectors in regression tasks. We demonstrate the effectiveness of PAGER on synthetic and real-world benchmarks. Our results highlight the capability of PAGER to identify regions of accurate generalization and detect failure cases in out-of-distribution and out-of-support scenarios.
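PAGER's exact score definitions are not given in the abstract. The sketch below only illustrates, under assumed scores and thresholds, the general idea of combining an epistemic-uncertainty score with a complementary non-conformity score and bucketing samples into coarse risk regimes; the names, thresholds, and regime scheme are all assumptions for illustration.

```python
import numpy as np

def risk_regimes(uncertainty, nonconformity, u_thresh, s_thresh):
    """Assign each sample to a coarse risk regime by combining an epistemic
    uncertainty score with a complementary non-conformity score.
    Thresholds are assumed to be calibrated on held-out data (e.g. quantiles).
    Regime 0 = low risk, 1 = moderate risk, 2 = high risk."""
    u_flag = (np.asarray(uncertainty) > u_thresh).astype(int)
    s_flag = (np.asarray(nonconformity) > s_thresh).astype(int)
    return u_flag + s_flag  # number of risk flags raised per sample

# Illustrative use with made-up scores for four test samples.
u = np.array([0.05, 0.40, 0.10, 0.80])
s = np.array([0.10, 0.20, 0.90, 0.95])
print(risk_regimes(u, s, u_thresh=0.30, s_thresh=0.50))  # -> [0 1 1 2]
```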
Accurate and Scalable Estimation of Epistemic Uncertainty for Graph Neural Networks
results: Under covariate, concept, and graph-size shifts, G-$\Delta$UQ not only excels at producing calibrated CIs but also outperforms other popular UQ methods when CIs are used for generalization-gap prediction and OOD detection. Overall, the paper both introduces a new GNN UQ method and offers new insights into GNN confidence indicators on safety-critical tasks.
Abstract
Safe deployment of graph neural networks (GNNs) under distribution shift requires models to provide accurate confidence indicators (CI). However, while it is well-known in computer vision that CI quality diminishes under distribution shift, this behavior remains understudied for GNNs. Hence, we begin with a case study on CI calibration under controlled structural and feature distribution shifts and demonstrate that increased expressivity or model size do not always lead to improved CI performance. Consequently, we instead advocate for the use of epistemic uncertainty quantification (UQ) methods to modulate CIs. To this end, we propose G-$\Delta$UQ, a new single model UQ method that extends the recently proposed stochastic centering framework to support structured data and partial stochasticity. Evaluated across covariate, concept, and graph size shifts, G-$\Delta$UQ not only outperforms several popular UQ methods in obtaining calibrated CIs, but also outperforms alternatives when CIs are used for generalization gap prediction or OOD detection. Overall, our work not only introduces a new, flexible GNN UQ method, but also provides novel insights into GNN CIs on safety-critical tasks.
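The stochastic-centering (anchoring) idea that G-$\Delta$UQ builds on can be shown schematically. The sketch below is a minimal illustration on a plain feature vector with a toy stand-in model, not the paper's extension to structured graph data or partial stochasticity; all names and the marginalization scheme are assumptions.

```python
import numpy as np

def anchored_input(x, anchor):
    """Stochastic-centering re-parameterization: the network receives the
    anchor and the residual (x - anchor) rather than x itself."""
    return np.concatenate([anchor, x - anchor], axis=-1)

def anchored_predict(model, x, anchors):
    """Marginalize a model trained on anchored inputs over random anchors;
    the mean is the prediction and the across-anchor variance serves as an
    epistemic-uncertainty proxy."""
    preds = np.array([model(anchored_input(x, c)) for c in anchors])
    return preds.mean(axis=0), preds.var(axis=0)

# Toy illustration with a nonlinear stand-in "model" and Gaussian anchors.
rng = np.random.default_rng(0)
toy_model = lambda z: np.tanh(z).sum()
x = rng.normal(size=8)
anchors = rng.normal(size=(16, 8))
mean_pred, epistemic_var = anchored_predict(toy_model, x, anchors)
```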
SPFQ: A Stochastic Algorithm and Its Error Analysis for Neural Network Quantization
results: The paper establishes full-network error bounds and an efficient quantization scheme for networks with Gaussian weights. It further shows that, when quantizing a multi-layer network with Gaussian weights, the relative square quantization error decays linearly as the degree of over-parametrization increases, and that error bounds matching the infinite-alphabet case can be achieved with on the order of $\log\log N$ bits per weight.
Abstract
Quantization is a widely used compression method that effectively reduces redundancies in over-parameterized neural networks. However, existing quantization techniques for deep neural networks often lack a comprehensive error analysis due to the presence of non-convex loss functions and nonlinear activations. In this paper, we propose a fast stochastic algorithm for quantizing the weights of fully trained neural networks. Our approach leverages a greedy path-following mechanism in combination with a stochastic quantizer. Its computational complexity scales only linearly with the number of weights in the network, thereby enabling the efficient quantization of large networks. Importantly, we establish, for the first time, full-network error bounds, under an infinite alphabet condition and minimal assumptions on the weights and input data. As an application of this result, we prove that when quantizing a multi-layer network having Gaussian weights, the relative square quantization error exhibits a linear decay as the degree of over-parametrization increases. Furthermore, we demonstrate that it is possible to achieve error bounds equivalent to those obtained in the infinite alphabet case, using on the order of a mere $\log\log N$ bits per weight, where $N$ represents the largest number of neurons in a layer.
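The paper's greedy path-following algorithm is not reproduced here; the sketch below only shows the generic unbiased stochastic-rounding primitive on which stochastic weight quantizers are commonly built, applied independently to each weight for illustration. The step size and array shapes are arbitrary assumptions.

```python
import numpy as np

def stochastic_round(w, step, rng):
    """Round each weight to a multiple of `step`, choosing the upper grid
    point with probability equal to the fractional part, so the quantizer
    is unbiased: E[q] = w."""
    scaled = np.asarray(w, dtype=float) / step
    lower = np.floor(scaled)
    frac = scaled - lower
    up = rng.random(scaled.shape) < frac
    return (lower + up) * step

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8))           # weights of a toy fully connected layer
q = stochastic_round(w, step=2 ** -4, rng=rng)
print(np.abs(w - q).max())            # per-weight error is bounded by the step size
```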
for: This paper presents a high-content stimulated Raman histology (HC-SRH) platform that provides both morphological and chemical information for cancer diagnosis on un-stained breast tissues.
methods: The HC-SRH platform uses spectral unmixing in the C-H vibration window to map unsaturated lipids, cellular protein, extracellular matrix, saturated lipid, and water in breast tissue, and spectral selective sampling is implemented to boost the speed of HC-SRH.
results: The HC-SRH platform provides excellent contrast for various tissue components, and HC-SRH is demonstrated in a clinical-compatible manner on an advanced fiber-laser-based SRS microscope, showing clear chemical contrast of nucleic acid and solid-state ester in the fingerprint result.
Abstract
Histological examination is crucial for cancer diagnosis, including hematoxylin and eosin (H&E) staining for mapping morphology and immunohistochemistry (IHC) staining for revealing chemical information. Recently developed two-color stimulated Raman histology could bypass the complex tissue processing to mimic H&E-like morphology. Yet, the underlying chemical features are not revealed, compromising the effectiveness of prognostic stratification. Here, we present a high-content stimulated Raman histology (HC-SRH) platform that provides both morphological and chemical information for cancer diagnosis based on un-stained breast tissues. Through spectral unmixing in the C-H vibration window, HC-SRH can map unsaturated lipids, cellular protein, extracellular matrix, saturated lipid, and water in breast tissue. In this way, HC-SRH provides excellent contrast for various tissue components. Considering rapidness is important in clinical trials, we implemented spectral selective sampling to boost the speed of HC-SRH by one order. We also successfully demonstrated the HC-SRH in a clinical-compatible fiber laser-based SRS microscopy. With the widely rapid tuning capability of the advanced fiber laser, a clear chemical contrast of nucleic acid and solid-state ester is shown in the fingerprint result.
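The unmixing procedure itself is not detailed in the abstract. As an assumed illustration, linear spectral unmixing can be posed per pixel as a non-negative least-squares fit of the measured spectrum against reference spectra of the tissue components; the reference spectra, shapes, and function names below are made up for the sketch.

```python
import numpy as np
from scipy.optimize import nnls

def unmix_pixel(spectrum, references):
    """Non-negative least-squares unmixing of one pixel.
    `references` has shape (n_bands, n_components); returns per-component
    concentration estimates for this pixel."""
    coeffs, _residual = nnls(references, spectrum)
    return coeffs

# Toy example: 3 spectral bands, 2 assumed reference spectra.
refs = np.array([[1.0, 0.2],
                 [0.5, 0.9],
                 [0.1, 0.7]])
pixel = 0.6 * refs[:, 0] + 0.3 * refs[:, 1]    # synthetic mixed spectrum
print(unmix_pixel(pixel, refs))                # approximately [0.6, 0.3]
```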
Lightning-Fast Dual-Layer Lossless Coding for Radiance Format High Dynamic Range Images
results: Compared with existing methods, the proposed coding reduces the average bitrate by approximately 1.57%-6.68% and significantly reduces the average encoder implementation time by approximately 87.13%-98.96%.
Abstract
This paper proposes a fast dual-layer lossless coding for high dynamic range images (HDRIs) in the Radiance format. The coding, which consists of a base layer and a lossless enhancement layer, provides a standard dynamic range image (SDRI) without requiring an additional algorithm at the decoder and can losslessly decode the HDRI by adding the residual signals (residuals) between the HDRI and SDRI to the SDRI, if desired. To suppress the dynamic range of the residuals in the enhancement layer, the coding directly uses the mantissa and exponent information from the Radiance format. To further reduce the residual energy, each mantissa is modeled (estimated) as a linear function, i.e., a simple linear regression, of the encoded-decoded SDRI in each region with the same exponent. This is called simple linear regressive mantissa estimator. Experimental results show that, compared with existing methods, our coding reduces the average bitrate by approximately $1.57$-$6.68$ % and significantly reduces the average encoder implementation time by approximately $87.13$-$98.96$ %.
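A schematic reading of the simple linear regressive mantissa estimator: within each region sharing a Radiance exponent, the mantissa is predicted by a least-squares linear fit on the encoded-decoded SDRI, and only the residuals are coded in the enhancement layer. The sketch below uses assumed flat arrays and synthetic data; it is an illustration of the regression step, not the paper's codec.

```python
import numpy as np

def mantissa_residuals(mantissa, sdri, exponent):
    """For each group of pixels sharing an exponent value, fit
    mantissa ~ a * sdri + b by least squares and return the residuals
    to be coded in the lossless enhancement layer."""
    residuals = np.empty_like(mantissa, dtype=float)
    for e in np.unique(exponent):
        idx = exponent == e
        if idx.sum() < 2:                     # too few samples to fit a line
            residuals[idx] = mantissa[idx]
            continue
        a, b = np.polyfit(sdri[idx], mantissa[idx], deg=1)
        residuals[idx] = mantissa[idx] - (a * sdri[idx] + b)
    return residuals

# Toy illustration on synthetic flat arrays.
rng = np.random.default_rng(0)
sdri = rng.integers(0, 256, size=1000).astype(float)
exponent = rng.integers(120, 124, size=1000)
mantissa = 0.5 * sdri + 10 + rng.normal(scale=1.0, size=1000)
res = mantissa_residuals(mantissa, sdri, exponent)
print(res.std())   # far smaller dynamic range than the raw mantissa
```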