results: Achieves 4x4 super-resolution enhancement using only three designed masks at real-time imaging speed.
Abstract
Achieving both high-performance and wide field-of-view (FOV) super-resolution imaging has been attracting increasing attention in recent years. However, such goal suffers from long reconstruction time and huge storage space. Parallel compressive imaging (PCI) provides an efficient solution, but the super-resolution quality and imaging speed are strongly dependent on precise optical transfer function (OTF), modulation masks and reconstruction algorithm. In this work, we propose a wide FOV parallel compressive super-resolution imaging approach based on physics enhanced network. By training the network with the prior OTF of an arbitrary 128x128-pixel region and fine-tuning the network with other OTFs within rest regions of FOV, we realize both mask optimization and super-resolution imaging with up to 1020x1500 wide FOV. Numerical simulations and practical experiments demonstrate the effectiveness and superiority of the proposed approach. We achieve high-quality reconstruction with 4x4 times super-resolution enhancement using only three designed masks to reach real-time imaging speed. The proposed approach promotes the technology of rapid imaging for super-resolution and wide FOV, ranging from infrared to Terahertz.
An Invitation to Hypercomplex Phase Retrieval: Theory and Applications
paper_authors: Roman Jacome, Kumar Vijay Mishra, Brian M. Sadler, Henry Arguello
for: The paper is written for researchers and practitioners working in the field of hypercomplex signal processing (HSP) and its applications in optical imaging.
methods: The paper uses Clifford algebra to handle multidimensional signals and provides a synopsis of emerging areas and applications of hypercomplex phase retrieval (HPR) with a focus on optical imaging.
results: The paper discusses the opportunities for developing novel HSP tools and algorithms in the context of HPR, particularly in optical imaging applications.
Abstract
Hypercomplex signal processing (HSP) provides state-of-the-art tools to handle multidimensional signals by harnessing intrinsic correlation of the signal dimensions through Clifford algebra. Recently, the hypercomplex representation of the phase retrieval (PR) problem, wherein a complex-valued signal is estimated through its intensity-only projections, has attracted significant interest. The hypercomplex PR (HPR) arises in many optical imaging and computational sensing applications that usually comprise quaternion and octonion-valued signals. Analogous to the traditional PR, measurements in HPR may involve complex, hypercomplex, Fourier, and other sensing matrices. This set of problems opens opportunities for developing novel HSP tools and algorithms. This article provides a synopsis of the emerging areas and applications of HPR with a focus on optical imaging.
results: We validate the effectiveness of our algorithm experimentally using real-world datasets.
Abstract
As wireless communication systems strive to improve spectral efficiency, there has been a growing interest in employing machine learning (ML)-based approaches for adaptive modulation and coding scheme (MCS) selection. In this paper, we introduce a new adaptive MCS selection framework for massive MIMO systems that operates without any feedback from users by solely relying on instantaneous uplink channel estimates. Our proposed method can effectively operate in multi-user scenarios where user feedback imposes excessive delay and bandwidth overhead. To learn the mapping between the user channel matrices and the optimal MCS level of each user, we develop a Convolutional Neural Network (CNN)-Long Short-Term Memory Network (LSTM)-based model and compare the performance with the state-of-the-art methods. Finally, we validate the effectiveness of our algorithm by evaluating it experimentally using real-world datasets collected from the RENEW massive MIMO platform.
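To make the method concrete, the sketch below shows a minimal CNN-LSTM classifier of the kind described: a small convolutional encoder per channel snapshot, an LSTM over time, and a softmax head over MCS levels. All layer sizes, the number of MCS levels, and the input dimensions are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class CNNLSTMMCSSelector(nn.Module):
    """Toy CNN-LSTM mapping a short sequence of uplink channel estimates to an
    MCS index. Shapes and layer sizes are illustrative only."""
    def __init__(self, n_mcs=28, hidden=128):
        super().__init__()
        # per-snapshot feature extractor over the (real, imag) channel matrix
        self.cnn = nn.Sequential(
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)), nn.Flatten(),  # -> 32*4*4 = 512
        )
        self.lstm = nn.LSTM(input_size=512, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_mcs)

    def forward(self, h_seq):
        # h_seq: (batch, time, 2, n_bs_antennas, n_subcarriers)
        b, t = h_seq.shape[:2]
        feats = self.cnn(h_seq.flatten(0, 1)).view(b, t, -1)   # (b, t, 512)
        out, _ = self.lstm(feats)
        return self.head(out[:, -1])   # logits over MCS levels from the last step

# toy forward pass: 4 users, 5 channel snapshots, 8x64 channel matrices
logits = CNNLSTMMCSSelector()(torch.randn(4, 5, 2, 8, 64))
print(logits.shape)   # torch.Size([4, 28])
```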
V2X Sidelink Positioning in FR1: Scenarios, Algorithms, and Performance Evaluation
paper_authors: Yu Ge, Maximilian Stark, Musa Furkan Keskin, Frank Hofmann, Thomas Hansen, Henk Wymeersch
for: This study investigates sub-6 GHz V2X sidelink positioning scenarios in 5G vehicular networks, with a focus on developing a comprehensive end-to-end methodology for channel modeling, performance bounds, channel estimation, and geometric positioning.
methods: Derivation of a novel, approximate Cramér-Rao bound (CRB) on the connected road user (CRU) position, taking into account multipath interference, path merging, and the round-trip-time (RTT) protocol; high-resolution channel parameter estimation algorithms based on tensor decomposition and ESPRIT methods, specifically tailored to dense multipath V2X sidelink environments; and comprehensive simulations using realistic ray-tracing data and antenna patterns to evaluate channel estimation and positioning performance.
results: The results indicate that sub-meter accuracy can be achieved in sub-6 GHz V2X with the proposed algorithms.
Abstract
In this paper, we investigate sub-6 GHz V2X sidelink positioning scenarios in 5G vehicular networks through a comprehensive end-to-end methodology encompassing ray-tracing-based channel modeling, novel theoretical performance bounds, high-resolution channel parameter estimation, and geometric positioning using a round-trip-time (RTT) protocol. We first derive a novel, approximate Cram\'er-Rao bound (CRB) on the connected road user (CRU) position, explicitly taking into account multipath interference, path merging, and the RTT protocol. Capitalizing on tensor decomposition and ESPRIT methods, we propose high-resolution channel parameter estimation algorithms specifically tailored to dense multipath V2X sidelink environments, designed to detect multipath components (MPCs) and extract line-of-sight (LoS) parameters. Finally, using realistic ray-tracing data and antenna patterns, comprehensive simulations are conducted to evaluate channel estimation and positioning performance, indicating that sub-meter accuracy can be achieved in sub-6 GHz V2X with the proposed algorithms.
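As a minimal illustration of the high-resolution estimation mentioned above, the snippet below implements textbook 1D ESPRIT for frequency estimation from a uniformly sampled signal. It is a standard variant for a single dimension, not the paper's tensor-decomposition-based multipath estimator.

```python
import numpy as np

def esprit_freqs(x, d, m=None):
    """Minimal 1D ESPRIT: estimate d normalized frequencies from a uniformly
    sampled signal x. Not the paper's tensor/multipath variant."""
    n = len(x)
    m = m or n // 2                       # subarray (Hankel window) length
    # Hankel data matrix whose columns are overlapping snapshots
    H = np.column_stack([x[i:i + m] for i in range(n - m + 1)])
    U, _, _ = np.linalg.svd(H, full_matrices=False)
    Us = U[:, :d]                         # signal subspace
    # shift invariance: Us[1:] ~ Us[:-1] @ Phi, eigenvalues of Phi carry the freqs
    Phi = np.linalg.pinv(Us[:-1]) @ Us[1:]
    return np.sort(np.angle(np.linalg.eigvals(Phi)) / (2 * np.pi))

# two closely spaced tones in noise
n = np.arange(256)
x = (np.exp(2j * np.pi * 0.20 * n) + 0.8 * np.exp(2j * np.pi * 0.22 * n)
     + 0.05 * (np.random.randn(256) + 1j * np.random.randn(256)))
print(esprit_freqs(x, d=2))   # approximately [0.20, 0.22]
```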
Electrical Fault Localisation Over a Distributed Parameter Transmission Line
results: The results show that the algorithm can accurately locate the fault. Requirements on fault bandwidth, sensor bandwidth, and simulation time-step are also presented.
Abstract
Motivated by the need to localise faults along electrical power lines, this paper adopts a frequency-domain approach to parameter estimation for an infinite-dimensional linear dynamical system with one spatial variable. Since the time of the fault is unknown, and voltages and currents are measured at only one end of the line, distance information must be extracted from the post-fault transients. To properly account for high-frequency transient behaviour, the line dynamics is modelled directly by the Telegrapher's equation, rather than the more commonly used lumped-parameter approximations. First, the governing equations are non-dimensionalised to avoid ill-conditioning. A closed-form expression for the transfer function is then derived. Finally, nonlinear least-squares optimisation is employed to search for the fault location. Requirements on fault bandwidth, sensor bandwidth and simulation time-step are also presented. The result is a novel end-to-end algorithm for data generation and fault localisation, the effectiveness of which is demonstrated via simulation.
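The fitting step can be sketched with a toy frequency-domain reflection model in which the fault distance is the single unknown, located by a coarse grid scan followed by scipy's nonlinear least squares. The reflection model below is a stand-in assumption, not the paper's closed-form Telegrapher's-equation transfer function.

```python
import numpy as np
from scipy.optimize import least_squares

# Toy stand-in: a fault at distance d produces a reflection delayed by the
# round-trip time 2*d/v on top of the incident wave. This is a simplifying
# assumption, NOT the paper's non-dimensionalised line model.
v = 2.0e8                                   # assumed propagation speed (m/s)
f = np.linspace(1e3, 1e6, 400)              # measurement frequencies (Hz)

def model(d, refl=0.5):
    return 1.0 + refl * np.exp(-2j * np.pi * f * (2.0 * d / v))

rng = np.random.default_rng(0)
H_meas = model(37_000.0) + 0.02 * (rng.standard_normal(f.size)
                                   + 1j * rng.standard_normal(f.size))

def residuals(params):
    r = model(params[0]) - H_meas
    return np.concatenate([r.real, r.imag])  # least_squares wants real residuals

# the cost is highly oscillatory in d, so scan a coarse grid first, then refine
cost = lambda d: np.sum(np.abs(model(d) - H_meas) ** 2)
grid = np.linspace(0.0, 100_000.0, 2001)     # 50 m spacing over a 100 km line
d0 = grid[np.argmin([cost(d) for d in grid])]
fit = least_squares(residuals, x0=[d0])
print(f"estimated fault distance: {fit.x[0] / 1e3:.2f} km")  # ~37 km
```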
Reconfigurable Intelligent Sensing Surface aided Wireless Powered Communication Networks: A Sensing-Then-Reflecting Approach
paper_authors: Cheng Luo, Jie Hu, Luping Xiang, Kun Yang
for: This paper proposes a reconfigurable intelligent sensing surface (RISS) that performs reflection and direction-of-arrival (DOA) estimation simultaneously. The design reduces the pilot overhead of conventional channel estimation and makes the RISS independent of the hybrid access point (HAP), enabling efficient operation.
methods: The RISS autonomously estimates the DOA of uplink signals from single-antenna users and reflects them using the HAP's slowly varying DOA information. During downlink transmission, the RISS updates the HAP's DOA information and designs the reflection phase of energy signals based on the latest user DOA information.
results: The paper includes a comprehensive performance analysis covering system design, protocol details, receiving performance, and RISS deployment suggestions. A closed-form expression for system performance under DOA errors is derived, and the statistical distribution of user received energy is calculated using the moment-matching technique. A recommended transmit power is provided to meet a specified outage probability and energy threshold. Numerical results show that the proposed system outperforms the conventional counterpart by 2.3 dB and 4.7 dB for Rician factors $\kappa_h=\kappa_G=1$ and $\kappa_h=\kappa_G=10$, respectively.
Abstract
This paper presents a reconfigurable intelligent sensing surface (RISS) that combines passive and active elements to achieve simultaneous reflection and direction of arrival (DOA) estimation tasks. By utilizing DOA information from the RISS instead of conventional channel estimation, the pilot overhead is reduced and the RISS becomes independent of the hybrid access point (HAP), enabling efficient operation. Specifically, the RISS autonomously estimates the DOA of uplink signals from single-antenna users and reflects them using the HAP's slowly varying DOA information. During downlink transmission, it updates the HAP's DOA information and designs the reflection phase of energy signals based on the latest user DOA information. The paper includes a comprehensive performance analysis, covering system design, protocol details, receiving performance, and RISS deployment suggestions. We derive a closed-form expression to analyze system performance under DOA errors, and calculate the statistical distribution of user received energy using the moment-matching technique. We provide a recommended transmit power to meet a specified outage probability and energy threshold. Numerical results demonstrate that the proposed system outperforms the conventional counterpart by 2.3 dB and 4.7 dB for Rician factors $\kappa_h=\kappa_G=1$ and $\kappa_h=\kappa_G=10$, respectively.
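The moment-matching step mentioned above can be illustrated generically: approximate the distribution of received energy by a Gamma distribution whose first two moments match the samples, then read off an outage probability against an energy threshold. The synthetic samples and the Gamma choice below are assumptions for illustration, not the paper's channel model.

```python
import numpy as np
from scipy.stats import gamma

# Moment matching: fit a Gamma distribution to the (unknown) received-energy
# distribution using only its mean and variance.
rng = np.random.default_rng(1)
energy = rng.exponential(2.0, 10_000) + rng.exponential(1.0, 10_000)  # placeholder samples

mean, var = energy.mean(), energy.var()
k = mean ** 2 / var          # Gamma shape from the first two moments
theta = var / mean           # Gamma scale

threshold = 1.0              # required energy threshold
outage = gamma.cdf(threshold, a=k, scale=theta)   # P(energy < threshold)
print(f"shape={k:.2f}, scale={theta:.2f}, outage probability={outage:.3f}")
```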
paper_authors: Rui Du, Haocheng Hua, Hailiang Xie, Xianxin Song, Zhonghao Lyu, Mengshi Hu, Narengerile, Yan Xin, Stephen McCann, Michael Montemurro, Tony Xiao Han, Jie Xu
for: The paper provides a comprehensive overview of the IEEE 802.11bf standard, including its formation and standardization timeline, WLAN sensing use cases with the corresponding key performance indicator (KPI) requirements, and the limitations of prior WLAN sensing research based on communication-oriented WLAN standards.
methods: The paper details the IEEE 802.11bf sensing framework and procedure for measurement acquisition, covering both sensing at sub-7 GHz and directional multi-gigabit (DMG) sensing at 60 GHz, and discusses their shared features, similarities, and differences.
results: The paper presents various candidate technical features for IEEE 802.11bf, including waveform/sequence design, feedback types, and quantization and compression techniques, and describes the methodologies and channel modeling used by the IEEE 802.11bf TG for evaluation.
Abstract
With recent advancements, the wireless local area network (WLAN) or wireless fidelity (Wi-Fi) technology has been successfully utilized to realize sensing functionalities such as detection, localization, and recognition. However, the WLANs standards are developed mainly for the purpose of communication, and thus may not be able to meet the stringent requirements for emerging sensing applications. To resolve this issue, a new Task Group (TG), namely IEEE 802.11bf, has been established by the IEEE 802.11 working group, with the objective of creating a new amendment to the WLAN standard to meet advanced sensing requirements while minimizing the effect on communications. This paper provides a comprehensive overview on the up-to-date efforts in the IEEE 802.11bf TG. First, we introduce the definition of the 802.11bf amendment and its formation and standardization timeline. Next, we discuss the WLAN sensing use cases with the corresponding key performance indicator (KPI) requirements. After reviewing previous WLAN sensing research based on communication-oriented WLAN standards, we identify their limitations and underscore the practical need for the new sensing-oriented amendment in 802.11bf. Furthermore, we discuss the WLAN sensing framework and procedure used for measurement acquisition, by considering both sensing at sub-7GHz and directional multi-gigabit (DMG) sensing at 60 GHz, respectively, and address their shared features, similarities, and differences. In addition, we present various candidate technical features for IEEE 802.11bf, including waveform/sequence design, feedback types, as well as quantization and compression techniques. We also describe the methodologies and the channel modeling used by the IEEE 802.11bf TG for evaluation. Finally, we discuss the challenges and future research directions to motivate more research endeavors towards this field in details.
Time-Modulated Intelligent Reflecting Surface for Waveform Security
results: Preserves communication information towards the authorized recipient while scrambling the information towards all other directions.
Abstract
We consider an OFDM transmitter aided by an intelligent reflecting surface (IRS) and propose a novel approach to enhance waveform security by employing time modulation (TM) at the IRS side. By controlling the periodic TM pattern of the IRS elements, the system is designed to preserve communication information towards an authorized recipient and scramble the information towards all other directions. We introduce two modes of TM pattern control: the linear mode, in which we design common TM parameters for entire rows or columns of the IRS, and the planar mode, where we design TM parameters for each individual IRS unit. Due to the required fewer switches, the linear mode is easier to implement as compared to the planar mode. However, the linear model results in a beampattern that has sidelobes, over which the transmitted information is not sufficiently scrambled. We show that the sidelobes of the linear mode can be suppressed by exploiting the high diversity available in that mode.
Foundational Techniques for Wireless Communications: Channel Coding, Modulation, and Equalization
results: By evaluating the performance of these techniques under different channel conditions, the paper provides practical insights and highlights their importance in modern wireless communication systems. Linear and decision-feedback equalization techniques are evaluated for mitigating the effects of channel impairments.
Abstract
This paper analyses foundational techniques for improving wireless communication systems, including coding methods, modulation schemes, and channel equalization. Using industry-standard simulation tools, the paper evaluates the performance of these techniques under different channel conditions. Convolutional codes, punctured and unpunctured, are assessed for reliable data transfer. The suitability of various modulation schemes, such as Phase Shift Keying (PSK) and Quadrature Amplitude Modulation (QAM), are examined. Linear and decision-feedback equalization techniques are evaluated for mitigating the effects of channel impairments. The paper provides practical insights into the implementation of these techniques, emphasizing their importance in modern wireless communication systems.
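A minimal example of the kind of link-level evaluation described, here an uncoded QPSK bit-error-rate sweep over an AWGN channel (no convolutional coding or equalization included):

```python
import numpy as np

rng = np.random.default_rng(0)
n_bits = 200_000
ebn0_db = np.arange(0, 11, 2)

bits = rng.integers(0, 2, n_bits)
# Gray-mapped QPSK: pairs of bits -> one complex symbol with unit energy
sym = ((1 - 2 * bits[0::2]) + 1j * (1 - 2 * bits[1::2])) / np.sqrt(2)

for ebn0 in ebn0_db:
    es_n0 = 10 ** (ebn0 / 10) * 2                 # 2 bits per QPSK symbol
    noise_std = np.sqrt(1 / (2 * es_n0))          # per real dimension
    r = sym + noise_std * (rng.standard_normal(sym.size)
                           + 1j * rng.standard_normal(sym.size))
    bits_hat = np.empty(n_bits, dtype=int)
    bits_hat[0::2] = (r.real < 0).astype(int)     # invert the Gray mapping
    bits_hat[1::2] = (r.imag < 0).astype(int)
    print(f"Eb/N0 = {ebn0:2d} dB  BER = {np.mean(bits_hat != bits):.2e}")
```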
results: By combining Monte Carlo, quadrature rule, and sparse grid sampling with surrogate model fitting, orders-of-magnitude sampling reductions down to ~10^0 and ~10^1 samples are achieved in the 1D and 7D input space scenarios, respectively, while maintaining accurate output space probability distributions.
Abstract
This paper studies the utility of techniques within uncertainty quantification, namely spectral projection and polynomial chaos expansion, in reducing sampling needs for characterizing acoustic metamaterial dispersion band responses given stochastic material properties and geometric defects. A novel method of encoding geometric defects in an interpretable, resolution independent is showcased in the formation of input space probability distributions. Orders of magnitude sampling reductions down to $\sim10^0$ and $\sim10^1$ are achieved in the 1D and 7D input space scenarios respectively while maintaining accurate output space probability distributions through combining Monte Carlo, quadrature rule, and sparse grid sampling with surrogate model fitting.
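The sampling-reduction idea can be illustrated in one dimension: a deterministic quadrature rule matches a Monte Carlo estimate of an output statistic with orders of magnitude fewer model evaluations. The toy model f below is a placeholder, not a dispersion-band response.

```python
import numpy as np

# Toy 1D uncertainty propagation: estimate E[f(X)] for X ~ N(0, 1).
f = lambda x: np.sin(x) + 0.1 * x ** 2         # placeholder "model"

# Monte Carlo with 10^4 samples
rng = np.random.default_rng(0)
mc = f(rng.standard_normal(10_000)).mean()

# Gauss-Hermite (probabilists') quadrature with only 7 nodes
nodes, weights = np.polynomial.hermite_e.hermegauss(7)
quad = np.sum(weights * f(nodes)) / np.sqrt(2 * np.pi)

exact = 0.1                                     # E[sin X] = 0, E[0.1 X^2] = 0.1
print(f"MC (1e4 samples): {mc:.4f}   quadrature (7 nodes): {quad:.4f}   exact: {exact}")
```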
results: Experimental results show that the DNN-based beamformer, trained with supervised learning using the ARROW loss function, performs speech enhancement and speaker localization jointly and outperforms two baselines in both tasks.
Abstract
Recent research advances in deep neural network (DNN)-based beamformers have shown great promise for speech enhancement under adverse acoustic conditions. Different network architectures and input features have been explored in estimating beamforming weights. In this paper, we propose a deep beamformer based on an efficient convolutional recurrent network (CRN) trained with a novel ARray RespOnse-aWare (ARROW) loss function. The ARROW loss exploits the array responses of the target and interferer by using the ground truth relative transfer functions (RTFs). The DNN-based beamforming system, trained with ARROW loss through supervised learning, is able to perform speech enhancement and speaker localization jointly. Experimental results have shown that the proposed deep beamformer, trained with the linearly weighted scale-invariant source-to-noise ratio (SI-SNR) and ARROW loss functions, achieves superior performance in speech enhancement and speaker localization compared to two baselines.
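The scale-invariant SNR term of the training objective can be sketched as follows; the ARROW loss itself, which requires the ground-truth relative transfer functions, is not reproduced here.

```python
import torch

def si_snr(est, ref, eps=1e-8):
    """Scale-invariant SNR (dB) between estimated and reference waveforms
    (higher is better)."""
    est = est - est.mean(dim=-1, keepdim=True)
    ref = ref - ref.mean(dim=-1, keepdim=True)
    # project the estimate onto the reference to remove any gain mismatch
    proj = (torch.sum(est * ref, dim=-1, keepdim=True)
            / (torch.sum(ref ** 2, dim=-1, keepdim=True) + eps)) * ref
    noise = est - proj
    return 10 * torch.log10(torch.sum(proj ** 2, dim=-1)
                            / (torch.sum(noise ** 2, dim=-1) + eps) + eps)

# typical training usage: minimize the negative SI-SNR
est, ref = torch.randn(2, 16000), torch.randn(2, 16000)
loss = -si_snr(est, ref).mean()
print(loss)
```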
On Feature Importance and Interpretability of Speaker Representations
results: The study finds that certain speaker-specific acoustic-phonetic properties can be predicted fairly well from the speaker embedding, while the investigated more abstract voice quality features cannot.
Abstract
Unsupervised speech disentanglement aims at separating fast varying from slowly varying components of a speech signal. In this contribution, we take a closer look at the embedding vector representing the slowly varying signal components, commonly named the speaker embedding vector. We ask, which properties of a speaker's voice are captured and investigate to which extent do individual embedding vector components sign responsible for them, using the concept of Shapley values. Our findings show that certain speaker-specific acoustic-phonetic properties can be fairly well predicted from the speaker embedding, while the investigated more abstract voice quality features cannot.
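A toy illustration of the attribution idea: exact Shapley values computed by enumerating coalitions of a handful of embedding components for a simple linear "property predictor". Real speaker embeddings have hundreds of dimensions, so practical work relies on sampling-based approximations rather than this exact enumeration.

```python
import numpy as np
from itertools import combinations
from math import factorial

def exact_shapley(value_fn, n_features):
    """Exact Shapley values by enumerating all coalitions.
    Only practical for a handful of features (2^n coalitions)."""
    phi = np.zeros(n_features)
    n_fact = factorial(n_features)
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            for S in combinations(others, size):
                w = factorial(len(S)) * factorial(n_features - len(S) - 1) / n_fact
                phi[i] += w * (value_fn(set(S) | {i}) - value_fn(set(S)))
    return phi

# Toy "model": predicts a speaker property from 4 embedding components; the
# value of a coalition is the prediction with the other components zeroed out.
weights = np.array([2.0, -1.0, 0.5, 0.0])
x = np.array([1.0, 2.0, -1.0, 3.0])
value = lambda S: float(sum(weights[j] * x[j] for j in S))

print(exact_shapley(value, 4))   # for a linear model: w_j * x_j per component
```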
A New Time Series Similarity Measure and Its Smart Grid Applications
paper_authors: Rui Yuan, S. Ali Pourmousavi, Wen L. Soong, Andrew J. Black, Jon A. R. Liisberg, Julian Lemos-Vinasco
for: This paper aims to provide a new distance measure for comparing electricity usage patterns in smart grid applications, addressing the limitations of existing measures such as Euclidean Distance (ED) and Dynamic Time Warping (DTW).
methods: The proposed method consists of two phases: (1) amplitude-based distance and (2) temporal-based distance, which quantify the effort required to reshape one time series into another considering both amplitude and temporal changes.
results: The proposed distance measure outperforms ED and DTW in three smart grid applications: (1) identifying the best load scheduling strategy, (2) detecting anomalous days with irregular electricity usage, and (3) determining electricity users' behind-the-meter (BTM) equipment.
Abstract
Many smart grid applications involve data mining, clustering, classification, identification, and anomaly detection, among others. These applications primarily depend on the measurement of similarity, which is the distance between different time series or subsequences of a time series. The commonly used time series distance measures, namely Euclidean Distance (ED) and Dynamic Time Warping (DTW), do not quantify the flexible nature of electricity usage data in terms of temporal dynamics. As a result, there is a need for a new distance measure that can quantify both the amplitude and temporal changes of electricity time series for smart grid applications, e.g., demand response and load profiling. This paper introduces a novel distance measure to compare electricity usage patterns. The method consists of two phases that quantify the effort required to reshape one time series into another, considering both amplitude and temporal changes. The proposed method is evaluated against ED and DTW using real-world data in three smart grid applications. Overall, the proposed measure outperforms ED and DTW in accurately identifying the best load scheduling strategy, anomalous days with irregular electricity usage, and determining electricity users' behind-the-meter (BTM) equipment.
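For reference, the two baseline measures compared against can be sketched in a few lines; the example shows how a time-shifted load peak inflates Euclidean distance while DTW largely absorbs it (the proposed two-phase measure itself is not reproduced here).

```python
import numpy as np

def euclidean(a, b):
    return np.sqrt(np.sum((a - b) ** 2))

def dtw(a, b):
    """Plain O(n*m) dynamic time warping distance with absolute-difference cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Two daily load profiles with the same shape but a shifted evening peak:
# Euclidean distance penalizes the time shift heavily, DTW mostly absorbs it.
t = np.arange(24)
load_a = 1.0 + 2.0 * np.exp(-0.5 * ((t - 18) / 1.5) ** 2)   # peak at 18:00
load_b = 1.0 + 2.0 * np.exp(-0.5 * ((t - 20) / 1.5) ** 2)   # peak at 20:00
print(f"ED  = {euclidean(load_a, load_b):.3f}")
print(f"DTW = {dtw(load_a, load_b):.3f}")
```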
for: automatizing car model and make identification from images to improve online car-selling platform efficiency
methods: employing various efficient network architectures (CNNs, ViTs, hybrid models) and refining performance through data augmentation, fine-tuning pretrained models, and hyperparameter tuning
results: achieving an accuracy of 81.97% with the EfficientNet (V2 b2) architecture, promising enhanced user experiences across car-selling websites
Abstract
This project presents an automated solution for the efficient identification of car models and makes from images, aimed at streamlining the vehicle listing process on online car-selling platforms. Through a thorough exploration encompassing various efficient network architectures including Convolutional Neural Networks (CNNs), Vision Transformers (ViTs), and hybrid models, we achieved a notable accuracy of 81.97% employing the EfficientNet (V2 b2) architecture. To refine performance, a combination of strategies, including data augmentation, fine-tuning pretrained models, and extensive hyperparameter tuning, were applied. The trained model offers the potential for automating information extraction, promising enhanced user experiences across car-selling websites.
LeTFuser: Light-weight End-to-end Transformer-Based Sensor Fusion for Autonomous Driving with Multi-Task Learning
results: We evaluate the model on the CARLA simulator and conduct a comparative analysis with other models. LeTFuser delivers higher performance and robustness across various scenarios, including both normal and adversarial conditions.
Abstract
In end-to-end autonomous driving, the utilization of existing sensor fusion techniques for imitation learning proves inadequate in challenging situations that involve numerous dynamic agents. To address this issue, we introduce LeTFuser, a transformer-based algorithm for fusing multiple RGB-D camera representations. To perform perception and control tasks simultaneously, we utilize multi-task learning. Our model comprises of two modules, the first being the perception module that is responsible for encoding the observation data obtained from the RGB-D cameras. It carries out tasks such as semantic segmentation, semantic depth cloud mapping (SDC), and traffic light state recognition. Our approach employs the Convolutional vision Transformer (CvT) \cite{wu2021cvt} to better extract and fuse features from multiple RGB cameras due to local and global feature extraction capability of convolution and transformer modules, respectively. Following this, the control module undertakes the decoding of the encoded characteristics together with supplementary data, comprising a rough simulator for static and dynamic environments, as well as various measurements, in order to anticipate the waypoints associated with a latent feature space. We use two methods to process these outputs and generate the vehicular controls (e.g. steering, throttle, and brake) levels. The first method uses a PID algorithm to follow the waypoints on the fly, whereas the second one directly predicts the control policy using the measurement features and environmental state. We evaluate the model and conduct a comparative analysis with recent models on the CARLA simulator using various scenarios, ranging from normal to adversarial conditions, to simulate real-world scenarios. Our code is available at \url{https://github.com/pagand/e2etransfuser/tree/cvpr-w} to facilitate future studies.
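The first control option, PID-based waypoint following, can be sketched generically as below; the gains, the heading/speed error formulation, and the clipping are illustrative assumptions rather than the repository's implementation.

```python
import numpy as np

class PID:
    """Textbook PID controller; gains here are placeholders, not LeTFuser's."""
    def __init__(self, kp, ki, kd, dt=0.05):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral, self.prev_err = 0.0, 0.0

    def step(self, error):
        self.integral += error * self.dt
        deriv = (error - self.prev_err) / self.dt
        self.prev_err = error
        return self.kp * error + self.ki * self.integral + self.kd * deriv

def follow_waypoint(ego_pos, ego_yaw, ego_speed, waypoint, target_speed,
                    steer_pid, speed_pid):
    """One control tick: steer towards the next waypoint, track a target speed."""
    dx, dy = waypoint[0] - ego_pos[0], waypoint[1] - ego_pos[1]
    heading_err = np.arctan2(dy, dx) - ego_yaw
    heading_err = (heading_err + np.pi) % (2 * np.pi) - np.pi   # wrap to [-pi, pi]
    steer = np.clip(steer_pid.step(heading_err), -1.0, 1.0)
    accel = speed_pid.step(target_speed - ego_speed)
    throttle, brake = (np.clip(accel, 0, 1), 0.0) if accel >= 0 else (0.0, np.clip(-accel, 0, 1))
    return steer, throttle, brake

steer_pid, speed_pid = PID(1.0, 0.0, 0.1), PID(0.5, 0.05, 0.0)
print(follow_waypoint((0, 0), 0.0, 4.0, (5.0, 1.0), 6.0, steer_pid, speed_pid))
```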
RSAdapter: Adapting Multimodal Models for Remote Sensing Visual Question Answering
results: Extensive experiments on three distinct RS-VQA datasets achieve state-of-the-art results on all three datasets.
Abstract
In recent years, with the rapid advancement of transformer models, transformer-based multimodal architectures have found wide application in various downstream tasks, including but not limited to Image Captioning, Visual Question Answering (VQA), and Image-Text Generation. However, contemporary approaches to Remote Sensing (RS) VQA often involve resource-intensive techniques, such as full fine-tuning of large models or the extraction of image-text features from pre-trained multimodal models, followed by modality fusion using decoders. These approaches demand significant computational resources and time, and a considerable number of trainable parameters are introduced. To address these challenges, we introduce a novel method known as RSAdapter, which prioritizes runtime and parameter efficiency. RSAdapter comprises two key components: the Parallel Adapter and an additional linear transformation layer inserted after each fully connected (FC) layer within the Adapter. This approach not only improves adaptation to pre-trained multimodal models but also allows the parameters of the linear transformation layer to be integrated into the preceding FC layers during inference, reducing inference costs. To demonstrate the effectiveness of RSAdapter, we conduct an extensive series of experiments using three distinct RS-VQA datasets and achieve state-of-the-art results on all three datasets. The code for RSAdapter will be available online at https://github.com/Y-D-Wang/RSAdapter.
DreamSpace: Dreaming Your Room Space with Text-Driven Panoramic Texture Propagation
results: Experiments and applications on real-world indoor scenes show that the framework generates high-quality textures and delivers an engaging experience on VR headsets.
Abstract
Diffusion-based methods have achieved prominent success in generating 2D media. However, accomplishing similar proficiencies for scene-level mesh texturing in 3D spatial applications, e.g., XR/VR, remains constrained, primarily due to the intricate nature of 3D geometry and the necessity for immersive free-viewpoint rendering. In this paper, we propose a novel indoor scene texturing framework, which delivers text-driven texture generation with enchanting details and authentic spatial coherence. The key insight is to first imagine a stylized 360{\deg} panoramic texture from the central viewpoint of the scene, and then propagate it to the rest areas with inpainting and imitating techniques. To ensure meaningful and aligned textures to the scene, we develop a novel coarse-to-fine panoramic texture generation approach with dual texture alignment, which both considers the geometry and texture cues of the captured scenes. To survive from cluttered geometries during texture propagation, we design a separated strategy, which conducts texture inpainting in confidential regions and then learns an implicit imitating network to synthesize textures in occluded and tiny structural areas. Extensive experiments and the immersive VR application on real-world indoor scenes demonstrate the high quality of the generated textures and the engaging experience on VR headsets. Project webpage: https://ybbbbt.com/publication/dreamspace
Streamlining Brain Tumor Classification with Custom Transfer Learning in MRI Images
results: The study achieves a classification accuracy of 96.42%.
Abstract
Brain tumors are increasingly prevalent, characterized by the uncontrolled spread of aberrant tissues in the brain, with almost 700,000 new cases diagnosed globally each year. Magnetic Resonance Imaging (MRI) is commonly used for the diagnosis of brain tumors and accurate classification is a critical clinical procedure. In this study, we propose an efficient solution for classifying brain tumors from MRI images using custom transfer learning networks. While several researchers have employed various pre-trained architectures such as RESNET-50, ALEXNET, VGG-16, and VGG-19, these methods often suffer from high computational complexity. To address this issue, we present a custom and lightweight model using a Convolutional Neural Network-based pre-trained architecture with reduced complexity. Specifically, we employ the VGG-19 architecture with additional hidden layers, which reduces the complexity of the base architecture but improves computational efficiency. The objective is to achieve high classification accuracy using a novel approach. Finally, the result demonstrates a classification accuracy of 96.42%.
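A hedged sketch of the described setup: a VGG-19 backbone with frozen convolutional features and a small custom classifier head standing in for the "additional hidden layers". The layer widths, dropout rates, and the four-class assumption are placeholders, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a pretrained VGG-19 (recent torchvision; older versions use pretrained=True)
backbone = models.vgg19(weights="IMAGENET1K_V1")
for p in backbone.features.parameters():
    p.requires_grad = False            # keep convolutional weights fixed

backbone.classifier = nn.Sequential(   # replace the ImageNet head
    nn.Linear(512 * 7 * 7, 512), nn.ReLU(), nn.Dropout(0.4),
    nn.Linear(512, 128), nn.ReLU(), nn.Dropout(0.4),
    nn.Linear(128, 4),                 # e.g. glioma / meningioma / pituitary / no tumor
)

# one training-style forward/backward step on a dummy MRI batch
x, y = torch.randn(8, 3, 224, 224), torch.randint(0, 4, (8,))
loss = nn.CrossEntropyLoss()(backbone(x), y)
loss.backward()
print(loss.item())
```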
PatchCURE: Improving Certifiable Robustness, Model Utility, and Computation Efficiency of Adversarial Patch Defenses
results: Experimental results show that PatchCURE delivers strong defense performance across different computation-efficiency and utility requirements, matching existing state-of-the-art patch defenses.
Abstract
State-of-the-art defenses against adversarial patch attacks can now achieve strong certifiable robustness with a marginal drop in model utility. However, this impressive performance typically comes at the cost of 10-100x more inference-time computation compared to undefended models -- the research community has witnessed an intense three-way trade-off between certifiable robustness, model utility, and computation efficiency. In this paper, we propose a defense framework named PatchCURE to approach this trade-off problem. PatchCURE provides sufficient "knobs" for tuning defense performance and allows us to build a family of defenses: the most robust PatchCURE instance can match the performance of any existing state-of-the-art defense (without efficiency considerations); the most efficient PatchCURE instance has similar inference efficiency as undefended models. Notably, PatchCURE achieves state-of-the-art robustness and utility performance across all different efficiency levels, e.g., 16-23% absolute clean accuracy and certified robust accuracy advantages over prior defenses when requiring computation efficiency to be close to undefended models. The family of PatchCURE defenses enables us to flexibly choose appropriate defenses to satisfy given computation and/or utility constraints in practice.
Using Logic Programming and Kernel-Grouping for Improving Interpretability of Convolutional Neural Networks
results: The framework reduces the size of the rule-set generated by FOLD-SE-M, improving interpretability. It also symbolically encodes the knowledge of the CNN's last-layer kernels and maps each predicate to human-understandable concepts.
Abstract
Within the realm of deep learning, the interpretability of Convolutional Neural Networks (CNNs), particularly in the context of image classification tasks, remains a formidable challenge. To this end we present a neurosymbolic framework, NeSyFOLD-G that generates a symbolic rule-set using the last layer kernels of the CNN to make its underlying knowledge interpretable. What makes NeSyFOLD-G different from other similar frameworks is that we first find groups of similar kernels in the CNN (kernel-grouping) using the cosine-similarity between the feature maps generated by various kernels. Once such kernel groups are found, we binarize each kernel group's output in the CNN and use it to generate a binarization table which serves as input data to FOLD-SE-M which is a Rule Based Machine Learning (RBML) algorithm. FOLD-SE-M then generates a rule-set that can be used to make predictions. We present a novel kernel grouping algorithm and show that grouping similar kernels leads to a significant reduction in the size of the rule-set generated by FOLD-SE-M, consequently, improving the interpretability. This rule-set symbolically encapsulates the connectionist knowledge of the trained CNN. The rule-set can be viewed as a normal logic program wherein each predicate's truth value depends on a kernel group in the CNN. Each predicate in the rule-set is mapped to a concept using a few semantic segmentation masks of the images used for training, to make it human-understandable. The last layers of the CNN can then be replaced by this rule-set to obtain the NeSy-G model which can then be used for the image classification task. The goal directed ASP system s(CASP) can be used to obtain the justification of any prediction made using the NeSy-G model. We also propose a novel algorithm for labeling each predicate in the rule-set with the semantic concept(s) that its corresponding kernel group represents.
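The kernel-grouping step can be illustrated with a simplified greedy procedure: flatten each kernel's (dataset-averaged) feature map, compute pairwise cosine similarities, and merge kernels above a similarity threshold. This is a sketch of the idea, not the exact NeSyFOLD-G grouping algorithm.

```python
import numpy as np

def group_kernels(feature_maps, threshold=0.8):
    """Greedy grouping of kernels whose flattened, averaged feature maps have
    cosine similarity above `threshold` (a simplification of kernel-grouping)."""
    k = feature_maps.shape[0]
    flat = feature_maps.reshape(k, -1)
    flat = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-12)
    sim = flat @ flat.T                       # (k, k) cosine similarities
    groups, assigned = [], np.zeros(k, dtype=bool)
    for i in range(k):
        if assigned[i]:
            continue
        members = np.where((sim[i] >= threshold) & ~assigned)[0]
        assigned[members] = True
        groups.append(members.tolist())
    return groups

# toy last-layer activations: 6 kernels, dataset-averaged 8x8 feature maps
rng = np.random.default_rng(0)
base = rng.standard_normal((2, 8, 8))
maps = np.concatenate([base[0] + 0.05 * rng.random((3, 8, 8)),   # 3 similar kernels
                       base[1] + 0.05 * rng.random((3, 8, 8))])  # 3 other kernels
print(group_kernels(maps))   # e.g. [[0, 1, 2], [3, 4, 5]]
```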
Putting the Object Back into Video Object Segmentation
paper_authors: Ho Kei Cheng, Seoung Wug Oh, Brian Price, Joon-Young Lee, Alexander Schwing
for: This work aims to improve the accuracy and efficiency of video object segmentation (VOS), especially on challenging datasets.
methods: The model uses a query-based object transformer that puts the object representation from memory back into the video object segmentation result, improving accuracy and efficiency.
results: On the challenging MOSE dataset, the model improves over XMem by 8.7 J&F with a similar running time, and over DeAOT by 4.2 J&F while running three times as fast.
We present Cutie, a video object segmentation (VOS) network with object-level memory reading, which puts the object representation from memory back into the video object segmentation result. Recent works on VOS employ bottom-up pixel-level memory reading which struggles due to matching noise, especially in the presence of distractors, resulting in lower performance in more challenging data. In contrast, Cutie performs top-down object-level memory reading by adapting a small set of object queries for restructuring and interacting with the bottom-up pixel features iteratively with a query-based object transformer (qt, hence Cutie). The object queries act as a high-level summary of the target object, while high-resolution feature maps are retained for accurate segmentation. Together with foreground-background masked attention, Cutie cleanly separates the semantics of the foreground object from the background. On the challenging MOSE dataset, Cutie improves by 8.7 J&F over XMem with a similar running time and improves by 4.2 J&F over DeAOT while running three times as fast. Code is available at: https://hkchengrex.github.io/Cutie
for: This work targets a novel text-driven whole-body motion generation task, which takes a given textual description as input and aims to generate high-quality, diverse, and coherent facial expressions, hand gestures, and body motions simultaneously.
methods: We propose a Text-aligned whOle-body Motion generATiOn framework, named HumanTOMATO, which to our knowledge is the first attempt towards applicable holistic motion generation in this research area. Our solution includes two key designs: (1) a Holistic Hierarchical VQ-VAE (aka H$^2$VQ) and a Hierarchical-GPT for fine-grained body and hand motion reconstruction and generation with two structured codebooks; and (2) a pre-trained text-motion-alignment model to help the generated motion align with the input textual description explicitly.
results: Our model shows significant advantages in both the quality of the generated motions and their alignment with text.
Abstract
This work targets a novel text-driven whole-body motion generation task, which takes a given textual description as input and aims at generating high-quality, diverse, and coherent facial expressions, hand gestures, and body motions simultaneously. Previous works on text-driven motion generation tasks mainly have two limitations: they ignore the key role of fine-grained hand and face controlling in vivid whole-body motion generation, and lack a good alignment between text and motion. To address such limitations, we propose a Text-aligned whOle-body Motion generATiOn framework, named HumanTOMATO, which is the first attempt to our knowledge towards applicable holistic motion generation in this research area. To tackle this challenging task, our solution includes two key designs: (1) a Holistic Hierarchical VQ-VAE (aka H$^2$VQ) and a Hierarchical-GPT for fine-grained body and hand motion reconstruction and generation with two structured codebooks; and (2) a pre-trained text-motion-alignment model to help generated motion align with the input textual description explicitly. Comprehensive experiments verify that our model has significant advantages in both the quality of generated motions and their alignment with text.
results: The paper identifies a phenomenon called "hidden waves": each image can be decomposed into a collection of special solutions of the same one-way wave equations, which are first-order autoregressive with shared coefficient matrices. Although the wave speeds and autoregressive matrices are latent, they are learnable and shared across images, providing a new mathematical perspective for understanding images.
Abstract
In this paper, we introduce an intriguing phenomenon-the successful reconstruction of images using a set of one-way wave equations with hidden and learnable speeds. Each individual image corresponds to a solution with a unique initial condition, which can be computed from the original image using a visual encoder (e.g., a convolutional neural network). Furthermore, the solution for each image exhibits two noteworthy mathematical properties: (a) it can be decomposed into a collection of special solutions of the same one-way wave equations that are first-order autoregressive, with shared coefficient matrices for autoregression, and (b) the product of these coefficient matrices forms a diagonal matrix with the speeds of the wave equations as its diagonal elements. We term this phenomenon hidden waves, as it reveals that, although the speeds of the set of wave equations and autoregressive coefficient matrices are latent, they are both learnable and shared across images. This represents a mathematical invariance across images, providing a new mathematical perspective to understand images.
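In our own notation (not necessarily the paper's), the structure described above can be summarized as:

```latex
\begin{align}
  \partial_t u_k(x,t) + c_k\,\partial_x u_k(x,t) &= 0
      && \text{(one-way wave equation with hidden speed } c_k\text{)}\\
  u_k(x,t) &= u_k(x - c_k t,\, 0)
      && \text{(solution: pure transport of the initial condition)}\\
  \mathbf{u}_k^{(t+1)} &= A_k\,\mathbf{u}_k^{(t)}
      && \text{(first-order autoregression on a discrete grid)}\\
  \prod_k A_k &= \operatorname{diag}(c_1,\dots,c_K)
      && \text{(speeds on the diagonal of the product, per the abstract)}
\end{align}
```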
FSD: Fast Self-Supervised Single RGB-D to Categorical 3D Objects
for: 3D object recognition without relying on real-world 3D labeled data
methods: multi-stage training pipeline with synthetic and real-world data, combining 2D and 3D supervised losses and 2D self-supervised loss
results: outperforms existing self-supervised 6D pose and size estimation baselines on the NOCS test-set with a 16.4% absolute improvement in mAP for 6D pose estimation, running in near real-time at 5 Hz
Abstract
In this work, we address the challenging task of 3D object recognition without the reliance on real-world 3D labeled data. Our goal is to predict the 3D shape, size, and 6D pose of objects within a single RGB-D image, operating at the category level and eliminating the need for CAD models during inference. While existing self-supervised methods have made strides in this field, they often suffer from inefficiencies arising from non-end-to-end processing, reliance on separate models for different object categories, and slow surface extraction during the training of implicit reconstruction models; thus hindering both the speed and real-world applicability of the 3D recognition process. Our proposed method leverages a multi-stage training pipeline, designed to efficiently transfer synthetic performance to the real-world domain. This approach is achieved through a combination of 2D and 3D supervised losses during the synthetic domain training, followed by the incorporation of 2D supervised and 3D self-supervised losses on real-world data in two additional learning stages. By adopting this comprehensive strategy, our method successfully overcomes the aforementioned limitations and outperforms existing self-supervised 6D pose and size estimation baselines on the NOCS test-set with a 16.4% absolute improvement in mAP for 6D pose estimation while running in near real-time at 5 Hz.
Human Pose-based Estimation, Tracking and Action Recognition with Deep Learning: A Survey
results: Through reviewing and analyzing the pose-based tasks, the survey identifies key challenges and outlines potential directions for future research.
Abstract
Human pose analysis has garnered significant attention within both the research community and practical applications, owing to its expanding array of uses, including gaming, video surveillance, sports performance analysis, and human-computer interactions, among others. The advent of deep learning has significantly improved the accuracy of pose capture, making pose-based applications increasingly practical. This paper presents a comprehensive survey of pose-based applications utilizing deep learning, encompassing pose estimation, pose tracking, and action recognition. Pose estimation involves the determination of human joint positions from images or image sequences. Pose tracking is an emerging research direction aimed at generating consistent human pose trajectories over time. Action recognition, on the other hand, targets the identification of action types using pose estimation or tracking data. These three tasks are intricately interconnected, with the latter often reliant on the former. In this survey, we comprehensively review related works, spanning from single-person pose estimation to multi-person pose estimation, from 2D pose estimation to 3D pose estimation, from single image to video, from mining temporal context gradually to pose tracking, and lastly from tracking to pose-based action recognition. As a survey centered on the application of deep learning to pose analysis, we explicitly discuss both the strengths and limitations of existing techniques. Notably, we emphasize methodologies for integrating these three tasks into a unified framework within video sequences. Additionally, we explore the challenges involved and outline potential directions for future research.
Real-Time Motion Prediction via Heterogeneous Polyline Transformer with Relative Pose Encoding
results: Experimental results show that HPTR improves real-time inference efficiency and scalability while maintaining performance on par with state-of-the-art methods.
Abstract
The real-world deployment of an autonomous driving system requires its components to run on-board and in real-time, including the motion prediction module that predicts the future trajectories of surrounding traffic participants. Existing agent-centric methods have demonstrated outstanding performance on public benchmarks. However, they suffer from high computational overhead and poor scalability as the number of agents to be predicted increases. To address this problem, we introduce the K-nearest neighbor attention with relative pose encoding (KNARPE), a novel attention mechanism allowing the pairwise-relative representation to be used by Transformers. Then, based on KNARPE we present the Heterogeneous Polyline Transformer with Relative pose encoding (HPTR), a hierarchical framework enabling asynchronous token update during the online inference. By sharing contexts among agents and reusing the unchanged contexts, our approach is as efficient as scene-centric methods, while performing on par with state-of-the-art agent-centric methods. Experiments on Waymo and Argoverse-2 datasets show that HPTR achieves superior performance among end-to-end methods that do not apply expensive post-processing or model ensembling. The code is available at https://github.com/zhejz/HPTR.
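A toy sketch of k-nearest-neighbour attention with a relative-position encoding, the mechanism KNARPE is built around; the head count, projections, and distance metric below are simplifying assumptions rather than the paper's design.

```python
import torch
import torch.nn as nn

class KNNRelAttention(nn.Module):
    """Single-head attention where each token attends only to its k nearest
    neighbours, with a learned encoding of the relative positions."""
    def __init__(self, dim, k=8):
        super().__init__()
        self.k = k
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.rel_enc = nn.Linear(2, dim)   # encodes relative (dx, dy) per neighbour

    def forward(self, x, pos):
        # x: (N, D) token features, pos: (N, 2) token reference positions
        N, D = x.shape
        idx = torch.cdist(pos, pos).topk(self.k, largest=False).indices  # (N, k)
        q = self.q_proj(x)                               # (N, D)
        kf = self.k_proj(x)[idx]                         # (N, k, D) neighbour keys
        v = self.v_proj(x)[idx]                          # (N, k, D) neighbour values
        rel = self.rel_enc(pos[idx] - pos[:, None, :])   # (N, k, D) relative encoding
        attn = torch.einsum('nd,nkd->nk', q, kf + rel) / D ** 0.5
        return torch.einsum('nk,nkd->nd', attn.softmax(dim=-1), v)

# toy usage: 64 polyline tokens with 32-dim features and 2D positions
out = KNNRelAttention(dim=32, k=8)(torch.randn(64, 32), torch.randn(64, 2))
print(out.shape)   # torch.Size([64, 32])
```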
3D-GPT: Procedural 3D Modeling with Large Language Models
paper_authors: Chunyi Sun, Junlin Han, Weijian Deng, Xinlong Wang, Zishan Qin, Stephen Gould
for: automated procedural 3D modeling and content generation
methods: uses large language models (LLMs) for instruction-driven 3D modeling, dissecting complex 3D modeling tasks into accessible segments and appointing a suitable agent for each task
results: reliably extracts parameter values from text for effortless integration with 3D software, collaborates effectively with human designers, and unlocks expanded manipulation possibilities
Abstract
In the pursuit of efficient automated content creation, procedural generation, leveraging modifiable parameters and rule-based systems, emerges as a promising approach. Nonetheless, it could be a demanding endeavor, given its intricate nature necessitating a deep understanding of rules, algorithms, and parameters. To reduce workload, we introduce 3D-GPT, a framework utilizing large language models~(LLMs) for instruction-driven 3D modeling. 3D-GPT positions LLMs as proficient problem solvers, dissecting the procedural 3D modeling tasks into accessible segments and appointing the apt agent for each task. 3D-GPT integrates three core agents: the task dispatch agent, the conceptualization agent, and the modeling agent. They collaboratively achieve two objectives. First, it enhances concise initial scene descriptions, evolving them into detailed forms while dynamically adapting the text based on subsequent instructions. Second, it integrates procedural generation, extracting parameter values from enriched text to effortlessly interface with 3D software for asset creation. Our empirical investigations confirm that 3D-GPT not only interprets and executes instructions, delivering reliable results but also collaborates effectively with human designers. Furthermore, it seamlessly integrates with Blender, unlocking expanded manipulation possibilities. Our work highlights the potential of LLMs in 3D modeling, offering a basic framework for future advancements in scene generation and animation.
Unsupervised Object Localization in the Era of Self-Supervised ViTs: A Survey
results: 这些方法可以在图像和视频中找到对象,而无需任何手动标注。Abstract
The recent enthusiasm for open-world vision systems show the high interest of the community to perform perception tasks outside of the closed-vocabulary benchmark setups which have been so popular until now. Being able to discover objects in images/videos without knowing in advance what objects populate the dataset is an exciting prospect. But how to find objects without knowing anything about them? Recent works show that it is possible to perform class-agnostic unsupervised object localization by exploiting self-supervised pre-trained features. We propose here a survey of unsupervised object localization methods that discover objects in images without requiring any manual annotation in the era of self-supervised ViTs. We gather links of discussed methods in the repository https://github.com/valeoai/Awesome-Unsupervised-Object-Localization.
摘要
近来社区对开放世界视觉系统的热情,反映出人们希望在迄今流行的封闭词表基准设置之外完成感知任务。在事先不知道数据集中包含哪些物体的情况下发现图像/视频中的物体,是一个令人兴奋的前景。但在对物体一无所知的情况下,如何找到它们?近期的工作表明,利用自监督预训练特征可以实现与类别无关的无监督物体定位。本文对自监督ViT时代下无需任何人工标注即可在图像中发现物体的无监督物体定位方法进行了综述。所讨论方法的链接汇总在 https://github.com/valeoai/Awesome-Unsupervised-Object-Localization 。
Perceptual Assessment and Optimization of High Dynamic Range Image Rendering
paper_authors: Peibei Cao, Rafal K. Mantiuk, Kede Ma
for: This paper focuses on developing a family of high dynamic range (HDR) image quality assessment (IQA) models that can accurately evaluate the quality of HDR images.
methods: The proposed HDR IQA models use a simple inverse display model to decompose an HDR image into a set of low dynamic range (LDR) images with different exposures, which are then assessed by existing LDR quality models. The local quality scores of each exposure are aggregated using a well-exposedness measure, and the overall quality score is obtained by weighting the exposures.
results: The proposed HDR IQA models outperform existing IQA methods, including the HDR-VDP family, in evaluating the quality of HDR images. Additionally, the models demonstrate strengths in perceptual optimization of HDR novel view synthesis.Abstract
High dynamic range (HDR) imaging has gained increasing popularity for its ability to faithfully reproduce the luminance levels in natural scenes. Accordingly, HDR image quality assessment (IQA) is crucial but has been superficially treated. The majority of existing IQA models are developed for and calibrated against low dynamic range (LDR) images, which have been shown to be poorly correlated with human perception of HDR image quality. In this work, we propose a family of HDR IQA models by transferring the recent advances in LDR IQA. The key step in our approach is to specify a simple inverse display model that decomposes an HDR image to a set of LDR images with different exposures, which will be assessed by existing LDR quality models. The local quality scores of each exposure are then aggregated with the help of a simple well-exposedness measure into a global quality score for each exposure, which will be further weighted across exposures to obtain the overall quality score. When assessing LDR images, the proposed HDR quality models reduce gracefully to the original LDR ones with the same performance. Experiments on four human-rated HDR image datasets demonstrate that our HDR quality models are consistently better than existing IQA methods, including the HDR-VDP family. Moreover, we demonstrate their strengths in perceptual optimization of HDR novel view synthesis.
摘要
高动态范围(HDR)成像因其能忠实再现自然场景中的亮度水平而日益流行,因此HDR图像质量评估(IQA)十分重要,但目前研究仍较为粗浅。现有的IQA模型大多针对低动态范围(LDR)图像开发和标定,而这些模型与人类对HDR图像质量的感知相关性较差。在这项工作中,我们通过迁移LDR IQA的最新进展,提出了一系列HDR IQA模型。方法的关键步骤是设定一个简单的逆显示模型,将HDR图像分解为一组不同曝光的LDR图像,再由现有的LDR质量模型对其进行评估。随后,借助一个简单的良好曝光度度量,将每个曝光下的局部质量分数聚合为该曝光的全局质量分数,再在各曝光之间加权得到总体质量分数。在评估LDR图像时,所提出的HDR质量模型会自然退化为原有的LDR模型,并保持相同的性能。在四个带人工评分的HDR图像数据集上的实验表明,我们的HDR质量模型始终优于现有的IQA方法(包括HDR-VDP系列)。此外,我们还展示了它们在HDR新视角合成的感知优化中的优势。
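The decompose-assess-aggregate pipeline described above can be sketched in a few lines. The inverse display model, the exposure values, and the Gaussian well-exposedness weight below are simplified assumptions; `ldr_metric` stands for any existing LDR quality model.

```python
import numpy as np

def hdr_quality(hdr_ref, hdr_test, ldr_metric, exposures=(-2, 0, 2), sigma=0.2):
    """Sketch: decompose HDR into LDR exposures, score each with an LDR metric,
    then aggregate with a well-exposedness weight. Not the paper's exact model."""
    scores, weights = [], []
    for ev in exposures:
        # Simple inverse display model: scale luminance, gamma-encode, clip to [0, 1].
        ldr_ref = np.clip((hdr_ref * 2.0 ** ev) ** (1 / 2.2), 0, 1)
        ldr_test = np.clip((hdr_test * 2.0 ** ev) ** (1 / 2.2), 0, 1)
        # Well-exposedness: pixels near mid-gray contribute most to this exposure's weight.
        w = np.mean(np.exp(-((ldr_ref - 0.5) ** 2) / (2 * sigma ** 2)))
        scores.append(ldr_metric(ldr_ref, ldr_test))
        weights.append(w)
    weights = np.asarray(weights) / (np.sum(weights) + 1e-8)
    return float(np.dot(weights, np.asarray(scores)))

# Example with a trivial placeholder metric (negative MSE); any LDR IQA model could be plugged in.
# score = hdr_quality(ref, test, ldr_metric=lambda a, b: -np.mean((a - b) ** 2))
```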
EMIT-Diff: Enhancing Medical Image Segmentation via Text-Guided Diffusion Model
paper_authors: Zheyuan Zhang, Lanhong Yao, Bin Wang, Debesh Jha, Elif Keles, Alpay Medetalibeyoglu, Ulas Bagci
for: This paper aims to address the challenge of scarce high-quality labeled data for medical deep learning models by proposing a novel approach called EMIT-Diff for medical image synthesis.
methods: The proposed EMIT-Diff method leverages recent diffusion probabilistic models to generate realistic and diverse synthetic medical image data that preserve the essential characteristics of the original medical images. The method incorporates edge information to guide the synthesis process and ensures that the synthesized samples adhere to medically relevant constraints.
results: The proposed EMIT-Diff method achieves significant improvements in medical image segmentation tasks on multiple datasets, including Ultrasound breast, CT spleen, and MRI prostate. The method demonstrates the effectiveness of introducing a first-ever text-guided diffusion model for general medical image segmentation tasks, and the results show the feasibility of using synthetic data for medical image segmentation tasks.Abstract
Large-scale, big-variant, and high-quality data are crucial for developing robust and successful deep-learning models for medical applications since they potentially enable better generalization performance and avoid overfitting. However, the scarcity of high-quality labeled data always presents significant challenges. This paper proposes a novel approach to address this challenge by developing controllable diffusion models for medical image synthesis, called EMIT-Diff. We leverage recent diffusion probabilistic models to generate realistic and diverse synthetic medical image data that preserve the essential characteristics of the original medical images by incorporating edge information of objects to guide the synthesis process. In our approach, we ensure that the synthesized samples adhere to medically relevant constraints and preserve the underlying structure of imaging data. Due to the random sampling process by the diffusion model, we can generate an arbitrary number of synthetic images with diverse appearances. To validate the effectiveness of our proposed method, we conduct an extensive set of medical image segmentation experiments on multiple datasets, including Ultrasound breast (+13.87%), CT spleen (+0.38%), and MRI prostate (+7.78%), achieving significant improvements over the baseline segmentation methods. For the first time, to our best knowledge, the promising results demonstrate the effectiveness of our EMIT-Diff for medical image segmentation tasks and show the feasibility of introducing a first-ever text-guided diffusion model for general medical image segmentation tasks. With carefully designed ablation experiments, we investigate the influence of various data augmentation ratios, hyper-parameter settings, patch size for generating random merging mask settings, and combined influence with different network architectures.
摘要
大规模、多样化、高质量的数据对于构建稳健且成功的医疗深度学习模型至关重要,因为这有助于获得更好的泛化性能并避免过拟合。然而,高质量标注数据的稀缺始终是一个重大挑战。本文提出了一种新的方法EMIT-Diff,通过构建可控的扩散模型进行医学图像合成来应对这一挑战。我们利用最新的扩散概率模型生成真实且多样的合成医学图像数据,并通过引入目标的边缘信息来引导合成过程,从而保留原始医学图像的关键特征。我们的方法确保合成样本符合医学相关约束,并保持影像数据的底层结构。得益于扩散模型的随机采样过程,我们可以生成任意数量、外观多样的合成图像。为验证方法的有效性,我们在多个数据集上进行了大量医学图像分割实验,包括乳腺超声(+13.87%)、CT脾脏(+0.38%)和MRI前列腺(+7.78%),相比基线分割方法取得了显著提升。据我们所知,这些结果首次证明了EMIT-Diff在医学图像分割任务中的有效性,并表明将文本引导的扩散模型用于通用医学图像分割任务是可行的。我们还通过精心设计的消融实验,考察了不同数据增强比例、超参数设置、生成随机融合掩码的patch大小,以及与不同网络架构组合的影响。
Neural Degradation Representation Learning for All-In-One Image Restoration
for: This paper aims to propose an all-in-one image restoration network that can handle multiple degradations, such as noise, haze, rain, and downsampling.
methods: The proposed method learns a neural degradation representation (NDR) that captures the underlying characteristics of various degradations, and uses a degradation query module and a degradation injection module to recognize and utilize the specific degradation based on NDR. The method also employs a bidirectional optimization strategy to effectively drive NDR to learn the degradation representation.
results: The proposed method is demonstrated to be effective and generalizable on representative types of degradations, including noise, haze, rain, and downsampling, through comprehensive experiments.Abstract
Existing methods have demonstrated effective performance on a single degradation type. In practical applications, however, the degradation is often unknown, and the mismatch between the model and the degradation will result in a severe performance drop. In this paper, we propose an all-in-one image restoration network that tackles multiple degradations. Due to the heterogeneous nature of different types of degradations, it is difficult to process multiple degradations in a single network. To this end, we propose to learn a neural degradation representation (NDR) that captures the underlying characteristics of various degradations. The learned NDR decomposes different types of degradations adaptively, similar to a neural dictionary that represents basic degradation components. Subsequently, we develop a degradation query module and a degradation injection module to effectively recognize and utilize the specific degradation based on NDR, enabling the all-in-one restoration ability for multiple degradations. Moreover, we propose a bidirectional optimization strategy to effectively drive NDR to learn the degradation representation by optimizing the degradation and restoration processes alternately. Comprehensive experiments on representative types of degradations (including noise, haze, rain, and downsampling) demonstrate the effectiveness and generalization capability of our method.
摘要
现有方法已经证明可以在单一的降解类型上达到有效性。然而,在实际应用中,降解通常是未知的,并且模型和降解之间的匹配问题会导致性能下降。在这篇论文中,我们提出了一个涵盖多种降解的图像恢复网络。由于不同类型的降解具有不同的特征,因此在单一网络中处理多种降解是困难的。为此,我们提出了学习神经降解表示(NDR),该表示捕捉不同类型降解的基本特征。学习后,NDR可以适应性地分解不同类型降解,类似于神经字典中的基本降解组件。然后,我们提出了降解查询模块和降解注入模块,以有效地识别和利用特定降解,实现涵盖多种降解的恢复能力。此外,我们提出了双向优化策略,以有效地驱动NDR学习降解表示,通过同时优化降解和恢复过程来更好地适应不同类型降解。在代表性的降解类型(包括噪声、雾、雨和下采样)的实验中,我们的方法得到了有效性和普适性的证明。
OODRobustBench: benchmarking and analyzing adversarial robustness under distribution shift
results: 本文通过对706个鲁棒模型进行60.7万次攻击评估,发现了多个鲁棒模型在异常输入下的鲁棒性受到严重的挑战。此外,本文还发现了一些新的训练方法和技术可以提高异常输入鲁棒性。Abstract
Existing works have made great progress in improving adversarial robustness, but typically test their method only on data from the same distribution as the training data, i.e. in-distribution (ID) testing. As a result, it is unclear how such robustness generalizes under input distribution shifts, i.e. out-of-distribution (OOD) testing. This is a concerning omission as such distribution shifts are unavoidable when methods are deployed in the wild. To address this issue we propose a benchmark named OODRobustBench to comprehensively assess OOD adversarial robustness using 23 dataset-wise shifts (i.e. naturalistic shifts in input distribution) and 6 threat-wise shifts (i.e., unforeseen adversarial threat models). OODRobustBench is used to assess 706 robust models using 60.7K adversarial evaluations. This large-scale analysis shows that: 1) adversarial robustness suffers from a severe OOD generalization issue; 2) ID robustness correlates strongly with OOD robustness, in a positive linear way, under many distribution shifts. The latter enables the prediction of OOD robustness from ID robustness. Based on this, we are able to predict the upper limit of OOD robustness for existing robust training schemes. The results suggest that achieving OOD robustness requires designing novel methods beyond the conventional ones. Last, we discover that extra data, data augmentation, advanced model architectures and particular regularization approaches can improve OOD robustness. Noticeably, the discovered training schemes, compared to the baseline, exhibit dramatically higher robustness under threat shift while keeping high ID robustness, demonstrating new promising solutions for robustness against both multi-attack and unforeseen attacks.
摘要
现有工作在提升对抗鲁棒性方面取得了很大进展,但通常只在与训练数据同分布的数据上进行测试,即分布内(ID)测试。因此,这种鲁棒性在输入分布发生偏移(即分布外,OOD测试)时如何泛化尚不清楚。这是一个令人担忧的缺失,因为方法在实际部署时分布偏移不可避免。为解决这一问题,我们提出了名为OODRobustBench的基准,利用23种数据集级偏移(即输入分布的自然偏移)和6种威胁级偏移(即未预见的对抗威胁模型),全面评估OOD对抗鲁棒性。我们使用OODRobustBench对706个鲁棒模型进行了60.7万次对抗评估。这一大规模分析表明:1)对抗鲁棒性存在严重的OOD泛化问题;2)在许多分布偏移下,ID鲁棒性与OOD鲁棒性呈很强的正线性相关,这使得可以由ID鲁棒性预测OOD鲁棒性。基于此,我们能够预测现有鲁棒训练方案的OOD鲁棒性上限。结果表明,要实现OOD鲁棒性,需要设计超越常规方案的新方法。最后,我们发现额外数据、数据增强、先进的模型架构以及特定的正则化方法可以提升OOD鲁棒性。值得注意的是,所发现的训练方案与基线相比,在威胁偏移下表现出显著更高的鲁棒性,同时保持了较高的ID鲁棒性,为抵御多重攻击和未预见攻击提供了新的可行方案。
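The reported positive linear relation between ID and OOD robustness suggests a very simple predictor: fit a line on (ID, OOD) robust-accuracy pairs and extrapolate. The sketch below uses made-up placeholder numbers, not values from the benchmark.

```python
import numpy as np

# Placeholder (ID, OOD) robust-accuracy pairs for a handful of evaluated models (%).
id_acc = np.array([45.0, 50.2, 55.1, 58.3, 62.0])
ood_acc = np.array([30.1, 34.0, 37.8, 40.2, 43.5])

# Fit the linear ID -> OOD relation and predict OOD robustness for a new model.
slope, intercept = np.polyfit(id_acc, ood_acc, deg=1)
predict_ood = lambda x: slope * x + intercept
print(f"expected OOD robustness for 60% ID robustness: {predict_ood(60.0):.1f}%")
```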
Anomaly Heterogeneity Learning for Open-set Supervised Anomaly Detection
results: 实验结果表明,AHL可以1)提高不同的状态的艺术模型(SOTA)在检测已见和未见异常方面的性能,达到新的SOTA性能水平,2)有效地在新目标领域中generalize to unseen anomalies。Abstract
Open-set supervised anomaly detection (OSAD) - a recently emerging anomaly detection area - aims at utilizing a few samples of anomaly classes seen during training to detect unseen anomalies (i.e., samples from open-set anomaly classes), while effectively identifying the seen anomalies. Benefiting from the prior knowledge illustrated by the seen anomalies, current OSAD methods can often largely reduce false positive errors. However, these methods treat the anomaly examples as from a homogeneous distribution, rendering them less effective in generalizing to unseen anomalies that can be drawn from any distribution. In this paper, we propose to learn heterogeneous anomaly distributions using the limited anomaly examples to address this issue. To this end, we introduce a novel approach, namely Anomaly Heterogeneity Learning (AHL), that simulates a diverse set of heterogeneous (seen and unseen) anomaly distributions and then utilizes them to learn a unified heterogeneous abnormality model. Further, AHL is a generic framework that existing OSAD models can plug and play for enhancing their abnormality modeling. Extensive experiments on nine real-world anomaly detection datasets show that AHL can 1) substantially enhance different state-of-the-art (SOTA) OSAD models in detecting both seen and unseen anomalies, achieving new SOTA performance on a large set of datasets, and 2) effectively generalize to unseen anomalies in new target domains.
摘要
在本文中,我们提出了一种新的方法,即异常多样性学习(AHL),用于学习异常多样的分布。AHL 通过模拟各种不同的异常分布,然后利用这些分布来学习一个统一的异常模型。AHL 是一个通用的框架,可以让现有的 OSAD 模型插入和使用,以提高异常模型化。我们在 nine 个实际异常检测 dataset 上进行了广泛的实验,结果表明,AHL 可以:1. 对不同的 SOTA OSAD 模型进行显著改进,在许多 dataset 上减少假阳性错误,并达到新的 SOTA 性能。2. 对未经见过的异常进行有效的泛化,在新的目标领域中具有优秀的泛化性能。
DT/MARS-CycleGAN: Improved Object Detection for MARS Phenotyping Robot
results: 对比于传统的CycleGAN模型,该新的DT/MARS-CycleGAN模型能够更好地适应复杂和变化的作物形态,从而提高MARS的作物物体检测性能。Abstract
Robotic crop phenotyping has emerged as a key technology to assess crops' morphological and physiological traits at scale. These phenotypical measurements are essential for developing new crop varieties with the aim of increasing productivity and dealing with environmental challenges such as climate change. However, developing and deploying crop phenotyping robots face many challenges such as complex and variable crop shapes that complicate robotic object detection, dynamic and unstructured environments that baffle robotic control, and real-time computing and managing big data that challenge robotic hardware/software. This work specifically tackles the first challenge by proposing a novel Digital-Twin(DT)MARS-CycleGAN model for image augmentation to improve our Modular Agricultural Robotic System (MARS)'s crop object detection from complex and variable backgrounds. Our core idea is that in addition to the cycle consistency losses in the CycleGAN model, we designed and enforced a new DT-MARS loss in the deep learning model to penalize the inconsistency between real crop images captured by MARS and synthesized images sensed by DT MARS. Therefore, the generated synthesized crop images closely mimic real images in terms of realism, and they are employed to fine-tune object detectors such as YOLOv8. Extensive experiments demonstrated that our new DT/MARS-CycleGAN framework significantly boosts our MARS' crop object/row detector's performance, contributing to the field of robotic crop phenotyping.
摘要
机器人作物表型分析已成为大规模评估作物形态与生理性状的关键技术。这些表型测量对于培育新的作物品种、提高产量并应对气候变化等环境挑战至关重要。然而,作物表型机器人的研发与部署面临诸多挑战,例如复杂多变的作物形态使机器人目标检测变得困难、动态且非结构化的环境使机器人控制难以进行,以及实时计算与海量数据管理对机器人软硬件的考验。本工作针对第一个挑战,提出了一种新的数字孪生(DT)MARS-CycleGAN模型用于图像增广,以提升我们的模块化农业机器人系统(MARS)在复杂多变背景下的作物目标检测能力。我们的核心思想是:在CycleGAN模型的循环一致性损失之外,设计并施加一个新的DT-MARS损失,用于惩罚MARS实际采集的作物图像与DT MARS感知到的合成图像之间的不一致。由此生成的合成作物图像在真实感上与真实图像高度接近,并被用于微调YOLOv8等目标检测器。大量实验表明,新的DT/MARS-CycleGAN框架显著提升了MARS的作物目标/行检测器性能,为机器人作物表型分析领域做出了贡献。
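A minimal way to picture the extra DT-MARS term is as an additional consistency loss stacked on top of the usual CycleGAN objective. The L1 choice, the weight, and the assumption of roughly paired real/DT frames below are illustrative, not the paper's exact formulation.

```python
import torch.nn.functional as F

def dt_mars_loss(G_sim2real, real_batch, dt_batch, lambda_dt=10.0):
    """Sketch: penalize the gap between real MARS images and images synthesized
    from digital-twin (DT) renderings, assuming roughly paired batches."""
    fake_real = G_sim2real(dt_batch)            # DT rendering translated toward the real domain
    return lambda_dt * F.l1_loss(fake_real, real_batch)

# During training this term would simply be added to the usual CycleGAN objective:
# total_loss = adv_loss + cycle_consistency_loss + identity_loss + dt_mars_loss(G, real, dt)
```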
Mixing Histopathology Prototypes into Robust Slide-Level Representations for Cancer Subtyping
for: 这个论文主要targets Whole-slide image analysis for computational pathology, with the goal of developing an efficient and effective method for processing large-scale datasets.
methods: The proposed method uses a combination of feature embedding and clustering to preprocess the full whole-slide image into a reduced prototype representation, which is then fed into a suitable MLP-Mixer architecture.
results: The proposed method achieves comparable performance to current state-of-the-art methods while achieving lower training costs in terms of computational time and memory load, as demonstrated through experiments on two public benchmarks and one in-house malignant lymphoma dataset.Abstract
Whole-slide image analysis via the means of computational pathology often relies on processing tessellated gigapixel images with only slide-level labels available. Applying multiple instance learning-based methods or transformer models is computationally expensive as, for each image, all instances have to be processed simultaneously. The MLP-Mixer is an under-explored alternative model to common vision transformers, especially for large-scale datasets. Due to the lack of a self-attention mechanism, they have linear computational complexity to the number of input patches but achieve comparable performance on natural image datasets. We propose a combination of feature embedding and clustering to preprocess the full whole-slide image into a reduced prototype representation which can then serve as input to a suitable MLP-Mixer architecture. Our experiments on two public benchmarks and one inhouse malignant lymphoma dataset show comparable performance to current state-of-the-art methods, while achieving lower training costs in terms of computational time and memory load. Code is publicly available at https://github.com/butkej/ProtoMixer.
摘要
基于计算病理学的全切片图像分析通常需要处理切块后的十亿像素级图像,而可用的标签往往只有切片级别。采用多示例学习方法或Transformer模型的计算代价很高,因为对每张图像必须同时处理其所有实例。MLP-Mixer是一种相对常见视觉Transformer而言尚未被充分探索的替代模型,尤其适用于大规模数据集。由于缺乏自注意力机制,其计算复杂度与输入patch数量呈线性关系,却能在自然图像数据集上取得相当的性能。我们提出先通过特征嵌入与聚类,将整张全切片图像预处理为一个精简的原型表示,再将其输入到合适的MLP-Mixer架构中。在两个公开基准和一个内部恶性淋巴瘤数据集上的实验表明,我们的方法可以取得与当前最先进方法相当的性能,同时在计算时间和内存占用方面具有更低的训练开销。代码公开于 https://github.com/butkej/ProtoMixer 。
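The preprocessing step (embed patches, then cluster them into a fixed-size prototype set) can be sketched as below. Plain k-means and the prototype count of 64 are assumptions; the resulting fixed-length token sequence is what a token-mixing model such as an MLP-Mixer would consume.

```python
import numpy as np
from sklearn.cluster import KMeans

def slide_to_prototypes(patch_embeddings: np.ndarray, n_prototypes: int = 64) -> np.ndarray:
    """Sketch: compress all patch embeddings of one whole-slide image into a
    fixed number of prototype tokens by clustering."""
    km = KMeans(n_clusters=n_prototypes, n_init=10, random_state=0)
    km.fit(patch_embeddings)              # (num_patches, feat_dim), num_patches may be huge
    return km.cluster_centers_            # (n_prototypes, feat_dim): tokens for the slide-level model
```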
Minimalist and High-Performance Semantic Segmentation with Plain Vision Transformers
results: 我们的方法在四个流行的benchmark上得到了高性能和效率的result,并可以作为评估基准模型在semantic segmentation中的转移能力的工具。Abstract
In the wake of Masked Image Modeling (MIM), a diverse range of plain, non-hierarchical Vision Transformer (ViT) models have been pre-trained with extensive datasets, offering new paradigms and significant potential for semantic segmentation. Current state-of-the-art systems incorporate numerous inductive biases and employ cumbersome decoders. Building upon the original motivations of plain ViTs, which are simplicity and generality, we explore high-performance `minimalist' systems to this end. Our primary purpose is to provide simple and efficient baselines for practical semantic segmentation with plain ViTs. Specifically, we first explore the feasibility and methodology for achieving high-performance semantic segmentation using the last feature map. As a result, we introduce the PlainSeg, a model comprising only three 3$\times$3 convolutions in addition to the transformer layers (either encoder or decoder). In this process, we offer insights into two underlying principles: (i) high-resolution features are crucial to high performance in spite of employing simple up-sampling techniques and (ii) the slim transformer decoder requires a much larger learning rate than the wide transformer decoder. On this basis, we further present the PlainSeg-Hier, which allows for the utilization of hierarchical features. Extensive experiments on four popular benchmarks demonstrate the high performance and efficiency of our methods. They can also serve as powerful tools for assessing the transfer ability of base models in semantic segmentation. Code is available at \url{https://github.com/ydhongHIT/PlainSeg}.
摘要
在掩码图像建模(MIM)的浪潮下,大量简单、非层次化的视觉Transformer(ViT)模型已在海量数据上完成预训练,为语义分割带来了新的范式与巨大潜力。当前最先进的系统引入了大量归纳偏置,并使用繁琐的解码器。我们回到简单ViT最初的动机,即简单性与通用性,探索高性能的`极简'系统。我们的主要目的在于为使用简单ViT的实用语义分割提供简洁、高效的基线。具体而言,我们首先探索仅利用最后一层特征图实现高性能语义分割的可行性与方法,并由此提出PlainSeg模型,它在Transformer层(编码器或解码器)之外仅包含三个3x3卷积。在此过程中,我们给出了两条基本结论:(i)即便只使用简单的上采样技术,高分辨率特征对高性能仍然至关重要;(ii)窄的Transformer解码器需要比宽的Transformer解码器大得多的学习率。在此基础上,我们进一步提出了可利用层次特征的PlainSeg-Hier。在四个常用基准上的大量实验证明了我们方法的高性能与高效率,它们也可作为评估基础模型在语义分割上迁移能力的有力工具。代码见 \url{https://github.com/ydhongHIT/PlainSeg}。
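A "three 3x3 convolutions plus simple up-sampling" head over the last ViT feature map is easy to picture; the sketch below is an illustrative reconstruction, with hidden channel width and up-sampling mode as assumptions rather than the released PlainSeg code.

```python
import torch.nn as nn
import torch.nn.functional as F

class MinimalSegHead(nn.Module):
    """Sketch of a minimalist segmentation head over the last ViT feature map."""
    def __init__(self, in_dim: int, num_classes: int, hidden: int = 256):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(in_dim, hidden, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, num_classes, 3, padding=1),
        )

    def forward(self, feat, out_size):
        # feat: (B, in_dim, H/16, W/16), the last feature map of a plain ViT encoder.
        logits = self.convs(feat)
        # High-resolution output via plain bilinear up-sampling.
        return F.interpolate(logits, size=out_size, mode="bilinear", align_corners=False)
```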
ExtSwap: Leveraging Extended Latent Mapper for Generating High Quality Face Swapping
results: 广泛的实验表明,提议的方法成功地解除了个体和特征特征,并超越了许多现有的面孔交换方法, both qualitatively and quantitatively。Abstract
We present a novel face swapping method using the progressively growing structure of a pre-trained StyleGAN. Previous methods use different encoder decoder structures, embedding integration networks to produce high-quality results, but their quality suffers from entangled representation. We disentangle semantics by deriving identity and attribute features separately. By learning to map the concatenated features into the extended latent space, we leverage the state-of-the-art quality and its rich semantic extended latent space. Extensive experiments suggest that the proposed method successfully disentangles identity and attribute features and outperforms many state-of-the-art face swapping methods, both qualitatively and quantitatively.
摘要
我们提出了一种新的人脸交换方法,利用预训练StyleGAN的渐进式生长结构。先前的方法使用不同的编码器-解码器结构与嵌入融合网络来生成高质量结果,但其质量受制于相互纠缠的表示。我们通过分别提取身份特征与属性特征来解耦语义。通过学习将拼接后的特征映射到扩展的潜在空间,我们得以利用其最先进的生成质量及语义丰富的扩展潜在空间。大量实验表明,所提出的方法成功地解耦了身份特征与属性特征,并在定性与定量两方面均优于许多最先进的人脸交换方法。
Multiscale Motion-Aware and Spatial-Temporal-Channel Contextual Coding Network for Learned Video Compression
results: 对三个公共的测试数据集进行了广泛的实验,并证明了 MASTC-VC 比前一代方法(H.265/HEVC 和 H.266/VVC)在 PSNR 和 MS-SSIM 指标上具有较高的效率和质量。具体来说,MASTC-VC 在 PSNR 指标下提供了10.15% 的BD-rate 减少,并在 MS-SSIM 指标下提供了23.93% 的BD-rate 减少。Abstract
Recently, learned video compression has achieved exciting performance. Following the traditional hybrid prediction coding framework, most learned methods generally adopt the motion estimation and motion compensation (MEMC) method to remove inter-frame redundancy. However, an inaccurate motion vector (MV) usually leads to distortion of the reconstructed frame. In addition, most approaches ignore the spatial and channel redundancy. To solve the above problems, we propose a motion-aware and spatial-temporal-channel contextual coding based video compression network (MASTC-VC), which learns the latent representation and uses variational autoencoders (VAEs) to capture the characteristics of intra-frame pixels and inter-frame motion. Specifically, we design a multiscale motion-aware module (MS-MAM) to estimate a spatial-temporal-channel consistent motion vector by utilizing the multiscale motion prediction information in a coarse-to-fine way. On top of it, we further propose a spatial-temporal-channel contextual module (STCCM), which explores the correlation of the latent representation to reduce the bit consumption from spatial, temporal and channel aspects respectively. Comprehensive experiments show that our proposed MASTC-VC is superior to previous state-of-the-art (SOTA) methods on three public benchmark datasets. More specifically, our method brings an average of 10.15\% BD-rate savings against H.265/HEVC (HM-16.20) in the PSNR metric and an average of 23.93\% BD-rate savings against H.266/VVC (VTM-13.2) in the MS-SSIM metric.
摘要
近年来,基于学习的视频压缩取得了令人振奋的性能。沿用传统的混合预测编码框架,大多数学习式方法通常采用运动估计与运动补偿(MEMC)来消除帧间冗余。然而,不准确的运动矢量(MV)往往会导致重建帧失真,并且大多数方法忽略了空间与通道冗余。为解决上述问题,我们提出了一种运动感知与空间-时间-通道上下文编码的视频压缩网络(MASTC-VC),它学习潜在表示,并利用变分自编码器(VAEs)刻画帧内像素与帧间运动的特性。具体而言,我们设计了多尺度运动感知模块(MS-MAM),以由粗到细的方式利用多尺度运动预测信息,估计空间-时间-通道一致的运动矢量。在此之上,我们进一步提出了空间-时间-通道上下文模块(STCCM),分别从空间、时间与通道三个方面挖掘潜在表示的相关性,以降低码率消耗。大量实验表明,MASTC-VC在三个公开基准数据集上优于以往的最先进方法:在PSNR指标下相对H.265/HEVC(HM-16.20)平均节省10.15%的BD-rate,在MS-SSIM指标下相对H.266/VVC(VTM-13.2)平均节省23.93%的BD-rate。
Query-aware Long Video Localization and Relation Discrimination for Deep Video Understanding
for: This paper aims to address the problem of holistically analyzing long videos and extracting useful knowledge to solve different types of queries.
methods: The proposed method uses an image-language pretrained model to select frames pertinent to queries, obviating the need for a complete movie-level knowledge graph.
results: The approach achieved first and fourth positions for two groups of movie-level queries, demonstrating its effectiveness and robustness.Abstract
The surge in video and social media content underscores the need for a deeper understanding of multimedia data. Most of the existing mature video understanding techniques perform well with short formats and content that requires only shallow understanding, but do not perform well with long format videos that require deep understanding and reasoning. The Deep Video Understanding (DVU) Challenge aims to push the boundaries of multimodal extraction, fusion, and analytics to address the problem of holistically analyzing long videos and extracting useful knowledge to solve different types of queries. This paper introduces a query-aware method for long video localization and relation discrimination, leveraging an image-language pretrained model. This model adeptly selects frames pertinent to queries, obviating the need for a complete movie-level knowledge graph. Our approach achieved first and fourth positions for two groups of movie-level queries. Sufficient experiments and final rankings demonstrate its effectiveness and robustness.
摘要
视频与社交媒体内容的激增凸显了深入理解多媒体数据的需求。现有的大多数成熟视频理解技术在只需浅层理解的短视频内容上表现良好,但在需要深度理解与推理的长视频上表现欠佳。深度视频理解(DVU)挑战赛旨在推动多模态抽取、融合与分析的边界,以整体地分析长视频并提取有用知识来解答不同类型的查询。本文介绍了一种查询感知的长视频定位与关系判别方法,它借助图像-语言预训练模型挑选与查询相关的帧,从而无需构建完整的电影级知识图谱。我们的方法在两组电影级查询中分别取得了第一名和第四名。充分的实验与最终排名证明了其有效性与鲁棒性。
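The query-aware frame selection step can be pictured as a simple similarity ranking between a text query and pre-computed frame embeddings from any CLIP-style image-language encoder. The encoder choice, shapes, and the value of k below are assumptions, not the paper's exact pipeline.

```python
import torch
import torch.nn.functional as F

def select_frames(frame_embeddings: torch.Tensor, query_embedding: torch.Tensor, top_k: int = 8):
    """Sketch: rank frames by cosine similarity to the text query and keep the top-k."""
    frames = F.normalize(frame_embeddings, dim=-1)   # (num_frames, d)
    query = F.normalize(query_embedding, dim=-1)     # (d,)
    sims = frames @ query                            # (num_frames,) cosine similarities
    return sims.topk(min(top_k, sims.numel())).indices  # indices of the most query-relevant frames
```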
Generating Robust Adversarial Examples against Online Social Networks (OSNs)
methods: 本文提出了一种基于可微网络的优化框架用于生成鲁棒对抗样本:该框架包含一个可微JPEG层和一个编码器-解码器子网络,用于模拟OSN所执行的各种操作。
results: 在Facebook、微信和QQ上进行了大量实验,结果表明,与现有方法相比,我们的攻击方法能够生成更鲁棒的对抗样本,尤其是在较小的失真约束下,攻击成功率(ASR)的提升可超过60%。此外,我们还构建了一个包含一万多对经Facebook、微信或QQ处理的对抗样本的公开数据集,以促进鲁棒对抗样本生成的后续研究。Abstract
Online Social Networks (OSNs) have blossomed into prevailing transmission channels for images in the modern era. Adversarial examples (AEs) deliberately designed to mislead deep neural networks (DNNs) are found to be fragile against the inevitable lossy operations conducted by OSNs. As a result, the AEs would lose their attack capabilities after being transmitted over OSNs. In this work, we aim to design a new framework for generating robust AEs that can survive the OSN transmission; namely, the AEs before and after the OSN transmission both possess strong attack capabilities. To this end, we first propose a differentiable network termed SImulated OSN (SIO) to simulate the various operations conducted by an OSN. Specifically, the SIO network consists of two modules: 1) a differentiable JPEG layer for approximating the ubiquitous JPEG compression and 2) an encoder-decoder subnetwork for mimicking the remaining operations. Based upon the SIO network, we then formulate an optimization framework to generate robust AEs by enforcing model outputs with and without passing through the SIO to be both misled. Extensive experiments conducted over Facebook, WeChat and QQ demonstrate that our attack methods produce more robust AEs than existing approaches, especially under small distortion constraints; the performance gain in terms of Attack Success Rate (ASR) could be more than 60%. Furthermore, we build a public dataset containing more than 10,000 pairs of AEs processed by Facebook, WeChat or QQ, facilitating future research in the robust AEs generation. The dataset and code are available at https://github.com/csjunjun/RobustOSNAttack.git.
摘要
在当今时代,在线社交网络(OSN)已成为图像传播的主流渠道。专门设计用于误导深度神经网络(DNN)的对抗样本(AE),在面对OSN不可避免的有损操作时十分脆弱,因而在经OSN传输后会失去攻击能力。在这项工作中,我们旨在设计一个新的框架,以生成能够经受OSN传输的鲁棒对抗样本,即传输前后均保持较强的攻击能力。为此,我们首先提出了一个称为模拟OSN(SIO)的可微网络来模拟OSN执行的各种操作。具体而言,SIO网络由两个模块组成:1)一个可微JPEG层,用于近似无处不在的JPEG压缩;2)一个编码器-解码器子网络,用于模拟其余操作。基于SIO网络,我们进一步构建了一个优化框架,通过强制模型对经过与未经过SIO的样本都给出错误预测,来生成鲁棒对抗样本。在Facebook、微信和QQ上进行的大量实验表明,我们的攻击方法比现有方法生成的对抗样本更加鲁棒,尤其是在较小的失真约束下,攻击成功率(ASR)的提升可超过60%。此外,我们构建了一个包含一万多对经Facebook、微信或QQ处理的对抗样本的公开数据集,以促进鲁棒对抗样本生成的后续研究。数据集与代码见 https://github.com/csjunjun/RobustOSNAttack.git 。
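The "misled with and without the simulated OSN" objective can be sketched as a two-term loss: one branch attacks the classifier directly, the other attacks it after the simulated transmission pipeline. The untargeted cross-entropy formulation below is an assumption, not the paper's exact loss.

```python
import torch.nn.functional as F

def robust_ae_loss(model, sio, x_adv, y_true):
    """Sketch: the adversarial image should be misclassified both directly and
    after passing through the simulated OSN pipeline `sio` (e.g. a differentiable
    JPEG layer plus an encoder-decoder)."""
    loss_plain = F.cross_entropy(model(x_adv), y_true)
    loss_osn = F.cross_entropy(model(sio(x_adv)), y_true)
    # Gradient *ascent* on this sum pushes both predictions away from the true label.
    return loss_plain + loss_osn
```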
Recoverable Privacy-Preserving Image Classification through Noise-like Adversarial Examples
results: 我们的方案可以保持纯文本预测器在加密和纯文本域中的预测精度相同,并且可以高效地解密加密图像,保持图像的原始形式,并且具有满意的泛化能力和高安全性。Abstract
With the increasing prevalence of cloud computing platforms, ensuring data privacy during the cloud-based image related services such as classification has become crucial. In this study, we propose a novel privacypreserving image classification scheme that enables the direct application of classifiers trained in the plaintext domain to classify encrypted images, without the need of retraining a dedicated classifier. Moreover, encrypted images can be decrypted back into their original form with high fidelity (recoverable) using a secret key. Specifically, our proposed scheme involves utilizing a feature extractor and an encoder to mask the plaintext image through a newly designed Noise-like Adversarial Example (NAE). Such an NAE not only introduces a noise-like visual appearance to the encrypted image but also compels the target classifier to predict the ciphertext as the same label as the original plaintext image. At the decoding phase, we adopt a Symmetric Residual Learning (SRL) framework for restoring the plaintext image with minimal degradation. Extensive experiments demonstrate that 1) the classification accuracy of the classifier trained in the plaintext domain remains the same in both the ciphertext and plaintext domains; 2) the encrypted images can be recovered into their original form with an average PSNR of up to 51+ dB for the SVHN dataset and 48+ dB for the VGGFace2 dataset; 3) our system exhibits satisfactory generalization capability on the encryption, decryption and classification tasks across datasets that are different from the training one; and 4) a high-level of security is achieved against three potential threat models. The code is available at https://github.com/csjunjun/RIC.git.
摘要
随着云计算平台的日益普及,在基于云的图像相关服务(如分类)中保障数据隐私变得至关重要。在本研究中,我们提出了一种新的隐私保护图像分类方案,使得在明文域训练好的分类器可以直接用于加密图像的分类,而无需重新训练专用分类器;同时,加密图像可以凭借密钥高保真地(可恢复地)解密回原始形式。具体而言,该方案利用一个特征提取器和一个编码器,通过新设计的类噪声对抗样本(NAE)对明文图像进行掩蔽。这种NAE不仅使加密图像呈现类似噪声的视觉外观,还迫使目标分类器将密文预测为与原始明文图像相同的标签。在解码阶段,我们采用对称残差学习(SRL)框架,以最小的退化恢复明文图像。大量实验表明:1)在明文域训练的分类器在密文域与明文域中的分类精度保持一致;2)加密图像可以高保真地恢复为原始形式,在SVHN数据集上平均PSNR可达51+ dB,在VGGFace2数据集上可达48+ dB;3)我们的系统在与训练集不同的数据集上,对加密、解密和分类任务均表现出令人满意的泛化能力;4)针对三种潜在威胁模型实现了较高的安全性。代码见 https://github.com/csjunjun/RIC.git 。
Exploiting Low-confidence Pseudo-labels for Source-free Object Detection
methods: 我们提出了一种新方法,引入高、低两个置信度阈值以充分利用伪标签:置信度高于高阈值的伪标签按常规方式使用,而介于低阈值与高阈值之间的伪标签则通过局部空间对比学习(LSCL)和候选框软训练(PST)两个组件加以利用,进一步提升模型的表现。
results: 我们在五个跨域目标检测基准上进行了广泛实验,结果显示我们的方法超越了现有的SFOD方法,达到了最先进的性能。Abstract
Source-free object detection (SFOD) aims to adapt a source-trained detector to an unlabeled target domain without access to the labeled source data. Current SFOD methods utilize a threshold-based pseudo-label approach in the adaptation phase, which is typically limited to high-confidence pseudo-labels and results in a loss of information. To address this issue, we propose a new approach to take full advantage of pseudo-labels by introducing high and low confidence thresholds. Specifically, the pseudo-labels with confidence scores above the high threshold are used conventionally, while those between the low and high thresholds are exploited using the Low-confidence Pseudo-labels Utilization (LPU) module. The LPU module consists of Proposal Soft Training (PST) and Local Spatial Contrastive Learning (LSCL). PST generates soft labels of proposals for soft training, which can mitigate the label mismatch problem. LSCL exploits the local spatial relationship of proposals to improve the model's ability to differentiate between spatially adjacent proposals, thereby optimizing representational features further. Combining the two components overcomes the challenges faced by traditional methods in utilizing low-confidence pseudo-labels. Extensive experiments on five cross-domain object detection benchmarks demonstrate that our proposed method outperforms the previous SFOD methods, achieving state-of-the-art performance.
摘要
无源目标检测(SFOD)旨在在无法访问有标注源数据的情况下,将源域训练好的检测器适配到无标注的目标域。现有SFOD方法在适配阶段通常采用基于阈值的伪标签策略,往往只利用高置信度伪标签,造成信息损失。为此,我们提出同时引入高、低两个置信度阈值,以充分利用伪标签:置信度高于高阈值的伪标签按常规方式使用,而介于低阈值与高阈值之间的伪标签则通过低置信度伪标签利用(LPU)模块加以挖掘。LPU模块由候选框软训练(PST)和局部空间对比学习(LSCL)组成:PST为候选框生成软标签用于软训练,以缓解标签失配问题;LSCL利用候选框之间的局部空间关系,提升模型区分空间相邻候选框的能力,进一步优化表示特征。二者结合克服了传统方法在利用低置信度伪标签上的困难。在五个跨域目标检测基准上的大量实验表明,我们的方法优于以往的SFOD方法,达到了最先进的性能。
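The dual-threshold split itself is simple to express; the sketch below only partitions detections by confidence, with the threshold values as illustrative assumptions. The high-confidence set would be used as ordinary pseudo-labels, the mid band would feed the low-confidence utilization branch (soft training and local spatial contrast).

```python
def split_pseudo_labels(detections, tau_low=0.3, tau_high=0.7):
    """Sketch: partition detections by confidence into conventional pseudo-labels
    and low-confidence candidates kept for further exploitation."""
    high, low = [], []
    for det in detections:           # each det: dict with at least a "score" field
        if det["score"] >= tau_high:
            high.append(det)
        elif det["score"] >= tau_low:
            low.append(det)
    return high, low
```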
Representation Learning via Consistent Assignment of Views over Random Partitions
results: CARP 可以在17个数据集上学习出适合的表示,并且在多种自适应学习任务上表现出色。与11个现有的自适应方法进行比较,CARP 在转移学习任务中表现最好,并且在训练时间较短的情况下表现更好。Abstract
We present Consistent Assignment of Views over Random Partitions (CARP), a self-supervised clustering method for representation learning of visual features. CARP learns prototypes in an end-to-end online fashion using gradient descent without additional non-differentiable modules to solve the cluster assignment problem. CARP optimizes a new pretext task based on random partitions of prototypes that regularizes the model and enforces consistency between views' assignments. Additionally, our method improves training stability and prevents collapsed solutions in joint-embedding training. Through an extensive evaluation, we demonstrate that CARP's representations are suitable for learning downstream tasks. We evaluate CARP's representations capabilities in 17 datasets across many standard protocols, including linear evaluation, few-shot classification, k-NN, k-means, image retrieval, and copy detection. We compare CARP performance to 11 existing self-supervised methods. We extensively ablate our method and demonstrate that our proposed random partition pretext task improves the quality of the learned representations by devising multiple random classification tasks. In transfer learning tasks, CARP achieves the best performance on average against many SSL methods trained for a longer time.
摘要
我们提出了基于随机划分的视图一致性分配方法(CARP),这是一种用于视觉特征表示学习的自监督聚类方法。CARP以端到端的在线方式通过梯度下降学习原型,无需额外的不可微模块即可求解聚类分配问题。CARP优化一个基于原型随机划分的新前置任务,它对模型起到正则化作用,并强制不同视图的分配保持一致。此外,我们的方法提升了训练稳定性,避免了联合嵌入训练中的坍塌解。通过广泛的评估,我们证明了CARP学到的表示适用于下游任务学习。我们在17个数据集上、按照多种标准协议评估了CARP的表示能力,包括线性评估、小样本分类、k-NN、k-means、图像检索和复制检测,并与11种现有自监督方法进行了比较。我们对方法进行了充分的消融实验,证明所提出的随机划分前置任务通过构造多个随机分类任务提升了所学表示的质量。在迁移学习任务中,CARP的平均性能优于许多训练时间更长的自监督学习方法。
TapMo: Shape-aware Motion Generation of Skeleton-free Characters
results: 对比其他自动动画方法,TapMo 能够生成高质量的动作,并且能够涵盖多种非人物3D模型。Abstract
Previous motion generation methods are limited to the pre-rigged 3D human model, hindering their applications in the animation of various non-rigged characters. In this work, we present TapMo, a Text-driven Animation Pipeline for synthesizing Motion in a broad spectrum of skeleton-free 3D characters. The pivotal innovation in TapMo is its use of shape deformation-aware features as a condition to guide the diffusion model, thereby enabling the generation of mesh-specific motions for various characters. Specifically, TapMo comprises two main components - Mesh Handle Predictor and Shape-aware Diffusion Module. Mesh Handle Predictor predicts the skinning weights and clusters mesh vertices into adaptive handles for deformation control, which eliminates the need for traditional skeletal rigging. Shape-aware Motion Diffusion synthesizes motion with mesh-specific adaptations. This module employs text-guided motions and mesh features extracted during the first stage, preserving the geometric integrity of the animations by accounting for the character's shape and deformation. Trained in a weakly-supervised manner, TapMo can accommodate a multitude of non-human meshes, both with and without associated text motions. We demonstrate the effectiveness and generalizability of TapMo through rigorous qualitative and quantitative experiments. Our results reveal that TapMo consistently outperforms existing auto-animation methods, delivering superior-quality animations for both seen or unseen heterogeneous 3D characters.
摘要
以往的运动生成方法局限于已绑定骨骼的3D人体模型,阻碍了其在各种未绑定骨骼角色动画中的应用。在这项工作中,我们提出了TapMo,一种文本驱动的动画流水线,可为大量无骨骼的3D角色合成运动。TapMo的关键创新在于以形变感知特征作为条件来引导扩散模型,从而能够为不同角色生成与其网格相适配的运动。TapMo包含两个核心组件:网格句柄预测器和形状感知运动扩散模块。网格句柄预测器预测蒙皮权重,并将网格顶点聚类为用于形变控制的自适应句柄,从而免去传统的骨骼绑定。形状感知运动扩散模块利用文本引导的运动以及第一阶段提取的网格特征,合成带有网格自适应调整的运动,并通过考虑角色的形状与形变来保持动画的几何完整性。TapMo以弱监督方式训练,能够适配大量非人类网格,无论其是否带有对应的文本运动。我们通过严格的定性与定量实验证明了TapMo的有效性与泛化性:对已见或未见的异构3D角色,TapMo都能稳定地超越现有的自动动画方法,生成更高质量的动画。
Weakly Supervised Learning for Breast Cancer Prediction on Mammograms in Realistic Settings
results: 在真实的临床设定下(只有病例级标签,而没有逐图像或ROI级标注),我们研究了一种两级多示例学习(MIL)方法用于病例级乳腺癌预测。考虑到乳腺癌通常只出现在一侧乳房,我们提出了一种针对该领域的MIL池化变体。我们的研究表明,这种两级MIL方法可以应用于现实的临床场景,并能随着患者的持续就诊而扩展。Abstract
Automatic methods for early detection of breast cancer on mammography can significantly decrease mortality. Broad uptake of those methods in hospitals is currently hindered because the methods have too many constraints. They assume annotations available for single images or even regions-of-interest (ROIs), and a fixed number of images per patient. Both assumptions do not hold in a general hospital setting. Relaxing those assumptions results in a weakly supervised learning setting, where labels are available per case, but not for individual images or ROIs. Not all images taken for a patient contain malignant regions and the malignant ROIs cover only a tiny part of an image, whereas most image regions represent benign tissue. In this work, we investigate a two-level multi-instance learning (MIL) approach for case-level breast cancer prediction on two public datasets (1.6k and 5k cases) and an in-house dataset of 21k cases. Observing that breast cancer is usually only present in one side, while images of both breasts are taken as a precaution, we propose a domain-specific MIL pooling variant. We show that two-level MIL can be applied in realistic clinical settings where only case labels, and a variable number of images per patient are available. Data in realistic settings scales with continuous patient intake, while manual annotation efforts do not. Hence, research should focus in particular on unsupervised ROI extraction, in order to improve breast cancer prediction for all patients.
摘要
乳腺X线摄影中乳腺癌早期检测的自动方法可以显著降低死亡率。然而,这些方法目前在医院中难以广泛应用,因为其假设条件过多:它们假设每张图像甚至每个感兴趣区域(ROI)都有标注,并且每位患者的图像数量固定。这两个假设在一般医院环境中都不成立。放宽这些假设会得到一个弱监督学习设定,即标签只在病例级别可用,而不针对单张图像或ROI。并非患者的每张图像都包含恶性区域,恶性ROI只占图像的极小部分,而绝大多数图像区域都是良性组织。在这项工作中,我们在两个公开数据集(1.6k和5k个病例)和一个包含21k个病例的内部数据集上,研究了一种用于病例级乳腺癌预测的两级多示例学习(MIL)方法。考虑到乳腺癌通常只出现在一侧乳房,而出于谨慎双侧乳房都会拍摄图像,我们提出了一种针对该领域的MIL池化变体。我们证明,两级MIL可以应用于只有病例级标签且每位患者图像数量可变的真实临床场景。真实场景中的数据会随着患者的持续就诊而增长,而人工标注的投入却不会,因此后续研究应特别关注无监督的ROI提取,以提升对所有患者的乳腺癌预测。
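One common building block for such a case-level model is attention-based MIL pooling, stacked twice (images into a breast side, sides into a case). The sketch below shows a single pooling level; the paper's domain-specific pooling variant may differ, and the hidden size is an assumption.

```python
import torch
import torch.nn as nn

class AttentionMILPool(nn.Module):
    """Sketch of one MIL pooling level: aggregate a variable number of instance
    embeddings into a single bag embedding with learned attention weights."""
    def __init__(self, dim: int, hidden: int = 128):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, instances):                              # (num_instances, dim)
        weights = torch.softmax(self.score(instances), dim=0)  # (num_instances, 1)
        return (weights * instances).sum(dim=0)                # (dim,) bag embedding
```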
TRUSTED: The Paired 3D Transabdominal Ultrasound and CT Human Data for Kidney Segmentation and Registration Research
paper_authors: William Ndzimbong, Cyril Fourniol, Loic Themyr, Nicolas Thome, Yvonne Keeza, Beniot Sauer, Pierre-Thierry Piechaud, Arnaud Mejean, Jacques Marescaux, Daniel George, Didier Mutter, Alexandre Hostettler, Toby Collins
for: The paper is written for researchers to develop and validate new image segmentation and image-modality registration methods using abdominal ultrasound (US) data.
methods: The paper uses a dataset of paired transabdominal 3DUS and CT kidney images, with segmentation and anatomical landmark annotations, to evaluate the performance of different deep learning models and image registration methods.
results: The paper reports the results of benchmarking five deep learning models for automatic kidney segmentation, with average DICE scores ranging from 83.2% to 89.1% for CT images and 61.9% to 79.4% for US images. The paper also reports the results of benchmarking three image registration methods, with Coherent Point Drift performing best with an average Target Registration Error of 4.53mm.Abstract
Inter-modal image registration (IMIR) and image segmentation with abdominal Ultrasound (US) data has many important clinical applications, including image-guided surgery, automatic organ measurement and robotic navigation. However, research is severely limited by the lack of public datasets. We propose TRUSTED (the Tridimensional Renal Ultra Sound TomodEnsitometrie Dataset), comprising paired transabdominal 3DUS and CT kidney images from 48 human patients (96 kidneys), including segmentation, and anatomical landmark annotations by two experienced radiographers. Inter-rater segmentation agreement was over 94 (Dice score), and gold-standard segmentations were generated using the STAPLE algorithm. Seven anatomical landmarks were annotated, important for IMIR systems development and evaluation. To validate the dataset's utility, 5 competitive Deep Learning models for automatic kidney segmentation were benchmarked, yielding average DICE scores from 83.2% to 89.1% for CT, and 61.9% to 79.4% for US images. Three IMIR methods were benchmarked, and Coherent Point Drift performed best with an average Target Registration Error of 4.53mm. The TRUSTED dataset may be used freely researchers to develop and validate new segmentation and IMIR methods.
摘要
跨模态图像配准(IMIR)与基于腹部超声(US)数据的图像分割具有许多重要的临床应用,包括图像引导手术、器官自动测量和机器人导航。然而,公开数据集的缺乏严重限制了相关研究。我们提出了TRUSTED(三维肾脏超声-CT数据集),包含来自48名患者(96个肾脏)的成对经腹3D超声与CT肾脏图像,并由两名经验丰富的放射技师提供分割标注与解剖标志点标注。标注者之间的分割一致性超过94(Dice分数),金标准分割由STAPLE算法生成。数据集中标注了七个对IMIR系统开发与评估十分重要的解剖标志点。为验证数据集的实用性,我们对5个有竞争力的深度学习模型进行了自动肾脏分割基准测试,CT图像的平均DICE分数为83.2%至89.1%,超声图像为61.9%至79.4%。我们还对3种IMIR方法进行了基准测试,其中Coherent Point Drift表现最佳,平均目标配准误差为4.53毫米。TRUSTED数据集可供研究人员免费使用,以开发和验证新的分割与IMIR方法。
SIRe-IR: Inverse Rendering for BRDF Reconstruction with Shadow and Illumination Removal in High-Illuminance Scenes
results: 该方法可以在强光照下高质量地恢复反照率和粗糙度,不受阴影干扰。在定量与定性评估中,SIRe-IR 均优于现有方法。Abstract
Implicit neural representation has opened up new possibilities for inverse rendering. However, existing implicit neural inverse rendering methods struggle to handle strongly illuminated scenes with significant shadows and indirect illumination. The existence of shadows and reflections can lead to an inaccurate understanding of scene geometry, making precise factorization difficult. To this end, we present SIRe-IR, an implicit neural inverse rendering approach that uses non-linear mapping and regularized visibility estimation to decompose the scene into environment map, albedo, and roughness. By accurately modeling the indirect radiance field, normal, visibility, and direct light simultaneously, we are able to remove both shadows and indirect illumination in materials without imposing strict constraints on the scene. Even in the presence of intense illumination, our method recovers high-quality albedo and roughness with no shadow interference. SIRe-IR outperforms existing methods in both quantitative and qualitative evaluations.
摘要
隐式神经表示为逆渲染开辟了新的可能性。然而,现有的隐式神经逆渲染方法难以处理带有明显阴影与间接光照的强光照场景。阴影与反射的存在会导致对场景几何的理解不准确,使精确的分解变得困难。为此,我们提出了SIRe-IR,一种隐式神经逆渲染方法,它利用非线性映射和正则化的可见性估计,将场景分解为环境贴图、反照率和粗糙度。通过同时准确建模间接辐射场、法线、可见性与直接光照,我们能够在不对场景施加严格约束的情况下,去除材质中的阴影与间接光照。即使在强光照下,我们的方法也能恢复高质量、无阴影干扰的反照率与粗糙度。SIRe-IR在定量与定性评估中均优于现有方法。
FUSC: Fetal Ultrasound Semantic Clustering of Second Trimester Scans Using Deep Self-supervised Learning
results: 研究结果显示,FUSC方法可以在一个新的测试数据集上达到92.2%的聚类纯度。Abstract
Ultrasound is the primary imaging modality in clinical practice during pregnancy. More than 140M fetuses are born yearly, resulting in numerous scans. The availability of a large volume of fetal ultrasound scans presents the opportunity to train robust machine learning models. However, the abundance of scans also has its challenges, as manual labeling of each image is needed for supervised methods. Labeling is typically labor-intensive and requires expertise to annotate the images accurately. This study presents an unsupervised approach for automatically clustering ultrasound images into a large range of fetal views, reducing or eliminating the need for manual labeling. Our Fetal Ultrasound Semantic Clustering (FUSC) method is developed using a large dataset of 88,063 images and further evaluated on an additional unseen dataset of 8,187 images achieving over 92% clustering purity. The result of our investigation hold the potential to significantly impact the field of fetal ultrasound imaging and pave the way for more advanced automated labeling solutions. Finally, we make the code and the experimental setup publicly available to help advance the field.
摘要
超声是孕期临床实践中最主要的影像检查方式。全球每年有超过1.4亿名胎儿出生,由此产生了海量的扫描图像。大量胎儿超声扫描数据的存在为训练稳健的机器学习模型提供了机会。然而,数据的丰富也带来了挑战:监督方法需要对每张图像进行人工标注,而标注通常耗费大量人力,且需要专业知识才能准确完成。本研究提出了一种无监督方法,可自动将超声图像聚类为大量不同的胎儿切面视图,从而减少乃至免除人工标注。我们的胎儿超声语义聚类(FUSC)方法基于88,063张图像的大规模数据集开发,并在另外8,187张未见过的图像上进一步评估,聚类纯度超过92%。这一结果有望对胎儿超声成像领域产生重要影响,并为更先进的自动标注方案铺平道路。最后,我们公开了代码与实验设置,以推动该领域的发展。
methods: 受Fawkes启发,利用VQGAN和StyleGAN等图像生成技术,在嵌入空间中将原始图像向一个诱饵图像移动。
results: 通过在传统及新构建的人脸图像数据集上的隐私度量,以及针对未知图像识别技术的鲁棒性测试,我们证明了方法的有效性;此外,人工评估表明,修改后的图像仍可被亲友识别为同一个人,同时避免了隐私泄露。Abstract
Classical techniques for protecting facial image privacy typically fall into two categories: data-poisoning methods, exemplified by Fawkes, which introduce subtle perturbations to images, or anonymization methods that generate images resembling the original only in several characteristics, such as gender, ethnicity, or facial expression.In this study, we introduce a novel approach, PrivacyGAN, that uses the power of image generation techniques, such as VQGAN and StyleGAN, to safeguard privacy while maintaining image usability, particularly for social media applications. Drawing inspiration from Fawkes, our method entails shifting the original image within the embedding space towards a decoy image.We evaluate our approach using privacy metrics on traditional and novel facial image datasets. Additionally, we propose new criteria for evaluating the robustness of privacy-protection methods against unknown image recognition techniques, and we demonstrate that our approach is effective even in unknown embedding transfer scenarios. We also provide a human evaluation that further proves that the modified image preserves its utility as it remains recognisable as an image of the same person by friends and family.
摘要
保护人脸图像隐私的经典技术通常分为两类:以Fawkes为代表的数据投毒方法,即向图像中引入细微扰动;以及匿名化方法,生成仅在性别、种族或面部表情等若干特征上与原图相似的图像。在本研究中,我们提出了一种新方法PrivacyGAN,它借助VQGAN与StyleGAN等图像生成技术,在保护隐私的同时保持图像可用性,尤其适用于社交媒体场景。受Fawkes启发,我们的方法将原始图像在嵌入空间中向一个诱饵图像移动。我们在传统及新构建的人脸图像数据集上使用隐私度量对方法进行评估。此外,我们提出了评估隐私保护方法对未知图像识别技术鲁棒性的新准则,并证明即便在未知嵌入迁移的场景下,我们的方法依然有效。人工评估进一步表明,修改后的图像仍可被亲友识别为同一个人,保留了其使用价值。
Diverse Diffusion: Enhancing Image Diversity in Text-to-Image Generation
results: 通过对各种特征进行实验,包括颜色多样性、LPIPS指标和人类/性别表现,证明了我们的多样性方法的有效性,并提供了价值的透彻视角以改进文本到图像模型。Abstract
Latent diffusion models excel at producing high-quality images from text. Yet, concerns appear about the lack of diversity in the generated imagery. To tackle this, we introduce Diverse Diffusion, a method for boosting image diversity beyond gender and ethnicity, spanning into richer realms, including color diversity.Diverse Diffusion is a general unsupervised technique that can be applied to existing text-to-image models. Our approach focuses on finding vectors in the Stable Diffusion latent space that are distant from each other. We generate multiple vectors in the latent space until we find a set of vectors that meets the desired distance requirements and the required batch size.To evaluate the effectiveness of our diversity methods, we conduct experiments examining various characteristics, including color diversity, LPIPS metric, and ethnicity/gender representation in images featuring humans.The results of our experiments emphasize the significance of diversity in generating realistic and varied images, offering valuable insights for improving text-to-image models. Through the enhancement of image diversity, our approach contributes to the creation of more inclusive and representative AI-generated art.
摘要
潜在扩散模型擅长根据文本生成高质量图像,但生成图像多样性不足的问题引发了担忧。为此,我们提出了Diverse Diffusion方法,用于提升图像多样性,其范围超越性别与种族,延伸到包括色彩多样性在内的更丰富维度。Diverse Diffusion是一种通用的无监督技术,可应用于现有的文本到图像模型。我们的方法着眼于在Stable Diffusion的潜在空间中寻找彼此距离较远的向量:在潜在空间中不断生成向量,直至找到一组满足所需距离约束与批大小要求的向量。为评估多样性方法的有效性,我们对多项特性进行了实验,包括色彩多样性、LPIPS指标,以及含人物图像中的种族/性别表征。实验结果强调了多样性对于生成真实且多样图像的重要意义,为改进文本到图像模型提供了有价值的启示。通过提升图像多样性,我们的方法有助于创作更具包容性与代表性的AI生成艺术。
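The "generate vectors until the distance requirements are met" step reads naturally as a rejection-style search over initial latents. The flattened L2 distance and the threshold below are assumptions; the accepted latents would then be handed to the text-to-image sampler.

```python
import torch

def sample_distant_latents(batch_size: int, latent_shape, min_dist: float, max_tries: int = 1000):
    """Sketch: keep drawing latent vectors until a set is found whose pairwise
    distances all exceed `min_dist`."""
    kept = []
    for _ in range(max_tries):
        z = torch.randn(latent_shape)
        if all(torch.norm(z - other) >= min_dist for other in kept):
            kept.append(z)
        if len(kept) == batch_size:
            return torch.stack(kept)   # feed these latents to the text-to-image sampler
    raise RuntimeError("could not find enough mutually distant latents; lower min_dist")
```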
A reproducible 3D convolutional neural network with dual attention module (3D-DAM) for Alzheimer’s disease classification
paper_authors: Gia Minh Hoang, Youngjoo Lee, Jae Gwan Kim
for: 本研究旨在提出一个可复现的模型,用于阿尔茨海默病的诊断。
methods: 该模型使用带有双注意力模块的3D卷积神经网络。
results: 模型在ADNI数据库上训练,并在两个独立数据集(AIBL和OASIS1)上验证。结果显示,该方法在MCI进展分类上可达到91.94%的精度,在阿尔茨海默病分类上可达到96.30%的精度;此外,模型展现出良好的泛化能力,在AIBL数据集上精度为86.37%,在OASIS1数据集上为83.42%。Abstract
Alzheimer's disease is one of the most common types of neurodegenerative disease, characterized by the accumulation of amyloid-beta plaque and tau tangles. Recently, deep learning approaches have shown promise in Alzheimer's disease diagnosis. In this study, we propose a reproducible model that utilizes a 3D convolutional neural network with a dual attention module for Alzheimer's disease classification. We trained the model in the ADNI database and verified the generalizability of our method in two independent datasets (AIBL and OASIS1). Our method achieved state-of-the-art classification performance, with an accuracy of 91.94% for MCI progression classification and 96.30% for Alzheimer's disease classification on the ADNI dataset. Furthermore, the model demonstrated good generalizability, achieving an accuracy of 86.37% on the AIBL dataset and 83.42% on the OASIS1 dataset. These results indicate that our proposed approach has competitive performance and generalizability when compared to recent studies in the field.
摘要
阿尔茨海默病是最常见的神经退行性疾病之一,其特征是β-淀粉样蛋白斑块和tau蛋白缠结的累积。近年来,深度学习方法在阿尔茨海默病诊断中展现出良好前景。在本研究中,我们提出了一个可复现的模型,采用带有双注意力模块的3D卷积神经网络进行阿尔茨海默病分类。我们在ADNI数据库上训练该模型,并在两个独立数据集(AIBL和OASIS1)上验证了方法的泛化能力。我们的方法在ADNI数据集上取得了最先进的分类性能:MCI进展分类精度为91.94%,阿尔茨海默病分类精度为96.30%。此外,模型展现出良好的泛化能力,在AIBL数据集上精度为86.37%,在OASIS1数据集上为83.42%。这些结果表明,与该领域的近期研究相比,我们提出的方法具有相当的性能与泛化能力。
DA-TransUNet: Integrating Spatial and Channel Dual Attention with Transformer U-Net for Medical Image Segmentation
paper_authors: Guanqun Sun, Yizhi Pan, Weikun Kong, Zichang Xu, Jianhua Ma, Teeradaj Racharak, Le-Minh Nguyen, Junyi Xin
for: This paper proposes a novel deep medical image segmentation framework called DA-TransUNet, which aims to improve medical image segmentation performance by incorporating transformer and dual attention block into the traditional U-shaped architecture.
methods: The proposed DA-TransUNet model utilizes attention mechanism of transformer and multifaceted feature extraction of DA-Block to efficiently combine global, local, and multi-scale features for medical image segmentation. Additionally, dual attention blocks are added before the Transformer layer and in skip connections to enhance feature extraction and transfer.
results: Experimental results across various benchmarks of medical image segmentation reveal that DA-TransUNet significantly outperforms state-of-the-art methods, demonstrating the effectiveness of the proposed framework in improving medical image segmentation performance.Abstract
Great progress has been made in automatic medical image segmentation due to powerful deep representation learning. The influence of transformer has led to research into its variants, and large-scale replacement of traditional CNN modules. However, such trend often overlooks the intrinsic feature extraction capabilities of the transformer and potential refinements to both the model and the transformer module through minor adjustments. This study proposes a novel deep medical image segmentation framework, called DA-TransUNet, aiming to introduce the Transformer and dual attention block into the encoder and decoder of the traditional U-shaped architecture. Unlike prior transformer-based solutions, our DA-TransUNet utilizes attention mechanism of transformer and multifaceted feature extraction of DA-Block, which can efficiently combine global, local, and multi-scale features to enhance medical image segmentation. Meanwhile, experimental results show that a dual attention block is added before the Transformer layer to facilitate feature extraction in the U-net structure. Furthermore, incorporating dual attention blocks in skip connections can enhance feature transfer to the decoder, thereby improving image segmentation performance. Experimental results across various benchmark of medical image segmentation reveal that DA-TransUNet significantly outperforms the state-of-the-art methods. The codes and parameters of our model will be publicly available at https://github.com/SUN-1024/DA-TransUnet.
摘要
得益于深度学习强大的表示能力,自动医学图像分割已经取得了显著进展。transformer 的影响带动了对其变体以及大规模替换传统 CNN 模块的研究。然而,这一趋势往往忽略了 transformer 的内在特征提取能力,以及通过微调对模型和 transformer 模块进行改进的可能。本研究提出了一种新的深度医学图像分割框架,称为 DA-TransUNet,旨在将 transformer 和 dual attention block 引入传统 U 形架构的 encoder 和 decoder 中。与先前的 transformer-based 解决方案不同,我们的 DA-TransUNet 利用 transformer 的注意力机制和 DA-Block 的多方面特征提取,可以有效地将全局、局部和多尺度特征相结合,以提升医学图像分割。此外,在 Transformer 层之前添加 dual attention block 可以促进 U-net 结构中的特征提取;在 skip connections 中加入 dual attention block 则可以增强向 decoder 的特征传递,从而提高图像分割性能。在多个医学图像分割 benchmark 上的实验结果显示,DA-TransUNet 显著优于当前最先进的方法。我们将在 GitHub 上公开模型代码和参数:https://github.com/SUN-1024/DA-TransUnet。
Click on Mask: A Labor-efficient Annotation Framework with Level Set for Infrared Small Target Detection
methods: 本文提出了一个省时省力且简便的标注框架,使用level set方法获得高质量pseudo标注,仅需一次简单的点击。我们设计了带有期望差能量泛函的variational level set形式,以便在level set演化过程中保持零水平轮廓的存在。
results: 实验结果显示,我们的方法在NUAA-SIRST和IRSTD-1k数据集上取得了优越的表现。Abstract
Infrared Small Target Detection is a challenging task to separate small targets from the infrared clutter background. Recently, deep learning paradigms have achieved promising results. However, these data-driven methods need plenty of manual annotation. Due to the small size of infrared targets, manual annotation consumes more resources and restricts the development of this field. This letter proposes a labor-efficient and cursory annotation framework with level set, which obtains a high-quality pseudo mask with only one cursory click. A variational level set formulation with an expectation difference energy functional is designed, in which the zero level contour is intrinsically maintained during the level set evolution. It solves the issue of the zero level contour disappearing due to small target size and excessive regularization. Experiments on the NUAA-SIRST and IRSTD-1k datasets reveal that our approach achieves superior performance. Code is available at https://github.com/Li-Haoqing/COM.
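To make the level-set idea concrete, here is a minimal NumPy sketch of single-click, region-driven level-set evolution. It uses a plain Chan-Vese-style region force as a stand-in for the paper's expectation difference energy functional; the step size, iteration count, and initialization radius are illustrative assumptions only.

```python
import numpy as np

def evolve_level_set(image, click, iters=200, dt=0.5):
    """Toy single-click level-set evolution for a small bright target.

    phi starts as a small disk around the click; at each step the contour
    grows where pixels are closer to the inside mean than the outside mean
    (a Chan-Vese-style region force), yielding a pseudo mask = (phi > 0).
    """
    h, w = image.shape
    yy, xx = np.mgrid[0:h, 0:w]
    phi = 3.0 - np.hypot(yy - click[0], xx - click[1])      # init: radius-3 disk
    for _ in range(iters):
        inside, outside = phi > 0, phi <= 0
        c_in = image[inside].mean() if inside.any() else image.max()
        c_out = image[outside].mean()
        force = (image - c_out) ** 2 - (image - c_in) ** 2   # region term
        phi = np.clip(phi + dt * force, -5, 5)               # keep phi bounded
    return phi > 0

img = np.zeros((64, 64)); img[30:34, 30:34] = 1.0            # tiny synthetic target
mask = evolve_level_set(img, click=(31, 31))
```

The clipping step is a crude stand-in for the regularization that, per the abstract, the paper handles more carefully so the zero level contour never vanishes for very small targets.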
摘要
红外小目标检测是一项具有挑战性的任务,需要将小目标从红外杂波背景中分离出来。近年来,深度学习范式取得了令人鼓舞的结果,但这些数据驱动方法需要大量人工标注;由于红外目标尺寸很小,人工标注消耗更多资源,限制了该领域的发展。本文提出了一个省力且只需粗略标注的level set标注框架,仅需一次简单的点击即可获得高质量的pseudo mask。我们设计了带有期望差能量泛函的variational level set形式,使零水平轮廓在level set演化过程中得以内在保持,从而解决了因目标尺寸过小和正则化过强导致零水平轮廓消失的问题。在NUAA-SIRST和IRSTD-1k数据集上的实验表明,我们的方法取得了优越的性能。代码见 https://github.com/Li-Haoqing/COM。
Explanation-Based Training with Differentiable Insertion/Deletion Metric-Aware Regularizers
results: 实验结果表明,通过ID-ExpO优化的深度神经网络预测器,可以使得后期解释器生成更准确和易于理解的解释,同时保持高度的预测精度。Abstract
The quality of explanations for the predictions of complex machine learning predictors is often measured using insertion and deletion metrics, which assess the faithfulness of the explanations, i.e., how correctly the explanations reflect the predictor's behavior. To improve the faithfulness, we propose insertion/deletion metric-aware explanation-based optimization (ID-ExpO), which optimizes differentiable predictors to improve both insertion and deletion scores of the explanations while keeping their predictive accuracy. Since the original insertion and deletion metrics are non-differentiable with respect to the explanations and thus unavailable for direct gradient-based optimization, we extend the metrics to be differentiable and use them to formalize insertion and deletion metric-based regularizers. The experimental results on image and tabular datasets show that the deep neural networks-based predictors fine-tuned using ID-ExpO enable popular post-hoc explainers to produce more faithful and easy-to-interpret explanations while keeping high predictive accuracy.
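A rough sketch of one way to make an insertion-style score differentiable is shown below: the hard pixel-ranking step is replaced by soft masks derived from a saliency map, and the model's confidence is averaged along the insertion path. This is only an illustrative surrogate under those assumptions, not the authors' exact formulation; all names are made up.

```python
import torch
import torch.nn.functional as F

def soft_insertion_score(model, x, saliency, steps=10):
    """Differentiable surrogate of the insertion metric.

    Instead of inserting discrete pixel sets ranked by saliency, blend the
    input with a blurred baseline using soft masks thresholded at moving
    saliency quantiles, and average the model's confidence along the path.
    """
    baseline = F.avg_pool2d(x, 11, stride=1, padding=5)   # blurred baseline image
    with torch.no_grad():
        target = model(x).argmax(dim=1)
    score = 0.0
    for t in torch.linspace(0.1, 1.0, steps):
        # Soft mask: pixels whose saliency exceeds the (1 - t) quantile -> ~1.
        thresh = torch.quantile(saliency.flatten(1), float(1.0 - t), dim=1)
        mask = torch.sigmoid((saliency - thresh.view(-1, 1, 1, 1)) * 50.0)
        probs = model(mask * x + (1 - mask) * baseline).softmax(dim=1)
        score = score + probs.gather(1, target.unsqueeze(1)).mean()
    return score / steps   # maximize alongside the task loss as a regularizer

# Toy usage with a small CNN classifier and a random saliency map.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3, padding=1),
                            torch.nn.AdaptiveAvgPool2d(1),
                            torch.nn.Flatten(), torch.nn.Linear(8, 10))
x = torch.rand(2, 3, 32, 32)
saliency = torch.rand(2, 1, 32, 32)
print(soft_insertion_score(model, x, saliency))
```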
摘要
“复杂机器学习预测器的解释质量经常用插入和删除指标来衡量,以评估解释的准确性,即预测器的行为如何准确地反映在解释中。为了提高准确性,我们提议使用插入/删除指标意识的解释基于优化(ID-ExpO),该方法优化可导式预测器,以提高插入和删除指标的解释忠实度,同时保持预测精度。由于原始的插入和删除指标与解释无法导数,我们将其扩展为可导的指标,并使用它们来形式化插入和删除指标基于的补偿器。实验结果表明,使用 ID-ExpO 进行 fine-tuning 的深度神经网络预测器在图像和表格数据集上能够生成更准确和易于理解的解释,同时保持高度预测精度。”
PGA: Personalizing Grasping Agents with Single Human-Robot Interaction
paper_authors: Junghyun Kim, Gi-Cheon Kang, Jaein Kim, Seoyun Yang, Minjoon Jung, Byoung-Tak Zhang
for: develop robots that ground and grasp objects based on natural language instructions
methods: learn personal objects by propagating user-given information through a Reminiscence-a collection of raw images from the user’s environment
results: significantly outperforms baseline methods both in offline and online settings, demonstrating its effectiveness and personalization applicability on real-world scenariosAbstract
Language-Conditioned Robotic Grasping (LCRG) aims to develop robots that ground and grasp objects based on natural language instructions. While robots capable of recognizing personal objects like "my wallet" can interact more naturally with non-expert users, current LCRG systems primarily limit robots to understanding only generic expressions. To this end, we introduce a task scenario GraspMine with a novel dataset that aims to locate and grasp personal objects given personal indicators via learning from a single human-robot interaction. To address GraspMine, we propose Personalized Grasping Agent (PGA), that learns personal objects by propagating user-given information through a Reminiscence-a collection of raw images from the user's environment. Specifically, PGA acquires personal object information by a user presenting a personal object with its associated indicator, followed by PGA inspecting the object by rotating it. Based on the acquired information, PGA pseudo-labels objects in the Reminiscence by our proposed label propagation algorithm. Harnessing the information acquired from the interactions and the pseudo-labeled objects in the Reminiscence, PGA adapts the object grounding model to grasp personal objects. Experiments on GraspMine show that PGA significantly outperforms baseline methods both in offline and online settings, signifying its effectiveness and personalization applicability on real-world scenarios. Finally, qualitative analysis shows the effectiveness of PGA through a detailed investigation of results in each phase.
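The label-propagation step described above can be pictured with a very small sketch: features of Reminiscence crops that are similar enough to the feature of the object the user presented get the personal pseudo-label. This is a simplified, assumption-laden illustration (cosine similarity with a fixed threshold); the paper's propagation algorithm and thresholds may differ.

```python
import torch
import torch.nn.functional as F

def propagate_personal_label(query_feat, reminiscence_feats, threshold=0.75):
    """Pseudo-label unlabeled crops whose features match the presented object.

    query_feat:          (d,) embedding of the object shown by the user
    reminiscence_feats:  (n, d) embeddings of object crops from the user's
                         environment (the Reminiscence)
    Returns indices of crops pseudo-labeled as the personal object.
    """
    q = F.normalize(query_feat, dim=0)
    r = F.normalize(reminiscence_feats, dim=1)
    sim = r @ q                                   # cosine similarity, shape (n,)
    return torch.nonzero(sim > threshold, as_tuple=False).flatten()

# Illustrative usage with random features.
query = torch.randn(512)
bank = torch.randn(100, 512)
pseudo_idx = propagate_personal_label(query, bank)
```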
摘要
语言条件机器人抓取(Language-Conditioned Robotic Grasping, LCRG)的目的是开发能够依据自然语言指令定位并抓取物体的机器人。能够识别“我的钱包”等个人物品的机器人可以与非专业用户进行更自然的交互,但现有的LCRG系统大多只能让机器人理解通用表达。为此,我们引入名为GraspMine的任务场景及其配套的新数据集,其目标是通过一次人机交互的学习,依据个人指示语定位并抓取个人物品。为解决GraspMine,我们提出了个性化抓取代理(PGA),它通过在Reminiscence(来自用户环境的原始图像集合)中传播用户给出的信息来学习个人物品。具体来说,用户向PGA展示个人物品及其对应的指示语,PGA随后旋转该物品进行观察以获取其信息;基于获取的信息,PGA利用我们提出的标签传播算法对Reminiscence中的对象进行伪标注。借助交互中获得的信息和Reminiscence中的伪标注对象,PGA对目标定位模型进行适配,从而抓取个人物品。GraspMine上的实验表明,PGA在离线和在线设置下都显著优于基线方法,体现了其在真实场景中的有效性和个性化适用性。最后,我们通过对各阶段结果的详细分析,定性地展示了PGA的有效性。
Weakly-Supervised Semantic Segmentation with Image-Level Labels: from Traditional Models to Foundation Models
results: 研究发现,使用视觉基础模型,如Segment Anything Model(SAM),在WSSS中可以实现高效的Semantic segmentation。但是,还需要进一步的研究来解决在WSSS中 deploying visual foundational models 的挑战。Abstract
The rapid development of deep learning has driven significant progress in the field of image semantic segmentation - a fundamental task in computer vision. Semantic segmentation algorithms often depend on the availability of pixel-level labels (i.e., masks of objects), which are expensive, time-consuming, and labor-intensive. Weakly-supervised semantic segmentation (WSSS) is an effective solution to avoid such labeling. It utilizes only partial or incomplete annotations and provides a cost-effective alternative to fully-supervised semantic segmentation. In this paper, we focus on the WSSS with image-level labels, which is the most challenging form of WSSS. Our work has two parts. First, we conduct a comprehensive survey on traditional methods, primarily focusing on those presented at premier research conferences. We categorize them into four groups based on where their methods operate: pixel-wise, image-wise, cross-image, and external data. Second, we investigate the applicability of visual foundation models, such as the Segment Anything Model (SAM), in the context of WSSS. We scrutinize SAM in two intriguing scenarios: text prompting and zero-shot learning. We provide insights into the potential and challenges associated with deploying visual foundational models for WSSS, facilitating future developments in this exciting research area.
摘要
深度学习的快速发展推动了图像语义分割这一计算机视觉基本任务的显著进步。语义分割算法通常依赖像素级标签(即目标掩码),而这类标注昂贵、耗时且费力。弱监督语义分割(WSSS)是避免此类标注的有效方案:它只利用部分或不完整的标注,为全监督语义分割提供了低成本的替代。本文聚焦于基于图像级标签的WSSS,这是WSSS中最具挑战性的形式。我们的工作分为两部分:首先,我们对传统方法进行了全面综述,主要关注发表在顶级会议上的工作,并根据方法作用的位置将其分为四类:像素级、图像级、跨图像以及外部数据。其次,我们研究了视觉基础模型(如Segment Anything Model, SAM)在WSSS中的适用性,并在文本提示和零样本学习两个有趣的场景中对SAM进行了考察。我们对在WSSS中部署视觉基础模型的潜力与挑战给出了见解,以促进这一令人兴奋的研究领域的未来发展。
Machine Learning for Leaf Disease Classification: Data, Techniques and Applications
results: 本研究将提供有用的资源和材料,帮助读者更好地理解和应用Machine learning技术,推动智能农业发展。Abstract
The growing demand for sustainable development brings a series of information technologies to help agriculture production. Especially, the emergence of machine learning applications, a branch of artificial intelligence, has shown multiple breakthroughs which can enhance and revolutionize plant pathology approaches. In recent years, machine learning has been adopted for leaf disease classification in both academic research and industrial applications. Therefore, it is enormously beneficial for researchers, engineers, managers, and entrepreneurs to have a comprehensive view about the recent development of machine learning technologies and applications for leaf disease detection. This study will provide a survey in different aspects of the topic including data, techniques, and applications. The paper will start with publicly available datasets. After that, we summarize common machine learning techniques, including traditional (shallow) learning, deep learning, and augmented learning. Finally, we discuss related applications. This paper would provide useful resources for future study and application of machine learning for smart agriculture in general and leaf disease classification in particular.
摘要
随着可持续发展需求的增长,一系列信息技术被用于助力农业生产。特别是作为人工智能分支的机器学习,其应用已经展现出多项突破,能够增强乃至变革植物病理学的研究方法。近年来,机器学习已在学术研究和工业应用中被用于叶部病害分类。因此,对于研究人员、工程师、管理者和创业者来说,全面了解机器学习技术及其在叶部病害检测中的最新发展是非常有益的。本研究将从数据、技术和应用等多个方面进行综述:首先介绍公开可用的数据集,随后总结常见的机器学习技术,包括传统(浅层)学习、深度学习和增强学习,最后讨论相关应用。这篇论文将为未来在智能农业及叶部病害分类方面的研究和应用提供有用的资源。
Enhancing High-Resolution 3D Generation through Pixel-wise Gradient Clipping
results: 提出了一种新的Pixel-wise Gradient Clipping(PGC)操作,可以快速并高效地控制梯度的大小,从而提高现有3D生成模型的渲染质量。Abstract
High-resolution 3D object generation remains a challenging task primarily due to the limited availability of comprehensive annotated training data. Recent advancements have aimed to overcome this constraint by harnessing image generative models, pretrained on extensive curated web datasets, using knowledge transfer techniques like Score Distillation Sampling (SDS). Efficiently addressing the requirements of high-resolution rendering often necessitates the adoption of latent representation-based models, such as the Latent Diffusion Model (LDM). In this framework, a significant challenge arises: To compute gradients for individual image pixels, it is necessary to backpropagate gradients from the designated latent space through the frozen components of the image model, such as the VAE encoder used within LDM. However, this gradient propagation pathway has never been optimized, remaining uncontrolled during training. We find that the unregulated gradients adversely affect the 3D model's capacity in acquiring texture-related information from the image generative model, leading to poor quality appearance synthesis. To address this overarching challenge, we propose an innovative operation termed Pixel-wise Gradient Clipping (PGC) designed for seamless integration into existing 3D generative models, thereby enhancing their synthesis quality. Specifically, we control the magnitude of stochastic gradients by clipping the pixel-wise gradients efficiently, while preserving crucial texture-related gradient directions. Despite this simplicity and minimal extra cost, extensive experiments demonstrate the efficacy of our PGC in enhancing the performance of existing 3D generative models for high-resolution object rendering.
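The core operation described above, clipping each pixel's gradient magnitude while preserving its direction, can be sketched in a few lines of PyTorch as a backward hook on the rendered image. The threshold value, tensor shapes, and the stand-in loss are illustrative assumptions; the actual PGC integration depends on the 3D generation pipeline.

```python
import torch

def pixelwise_gradient_clip(grad, max_norm=0.1, eps=1e-8):
    """Clip the per-pixel gradient magnitude while keeping its direction.

    grad: (B, C, H, W) gradient flowing from the latent-space loss back to
    the rendered image. Each pixel's C-dimensional gradient vector is
    rescaled so that its norm does not exceed max_norm.
    """
    norm = grad.norm(dim=1, keepdim=True)                # (B, 1, H, W)
    scale = (max_norm / (norm + eps)).clamp(max=1.0)
    return grad * scale

# Register on the rendered image so clipping happens during backpropagation.
rendered = torch.rand(1, 3, 64, 64, requires_grad=True)
rendered.register_hook(lambda g: pixelwise_gradient_clip(g, max_norm=0.05))
loss = (rendered ** 2).sum()   # stand-in for the SDS / latent-diffusion loss
loss.backward()
```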
摘要
高分辨率3D物体生成仍然是一项具有挑战性的任务,主要原因是缺乏全面标注的训练数据。最新的进展尝试借助在大规模精选网络数据上预训练的图像生成模型,通过分数蒸馏采样(Score Distillation Sampling, SDS)等知识迁移技术来克服这一限制。为高效满足高分辨率渲染的要求,通常需要采用基于潜在表示的模型,例如潜在扩散模型(Latent Diffusion Model, LDM)。在这一框架中出现了一个重要挑战:要计算单个图像像素的梯度,必须将梯度从指定的潜在空间经由图像模型中被冻结的组件(如LDM中使用的VAE编码器)反向传播。然而,这条梯度传播路径从未被优化,在训练过程中一直处于不受控制的状态。我们发现,不受控制的梯度会损害3D模型从图像生成模型中获取纹理相关信息的能力,导致外观合成质量低下。为解决这一挑战,我们提出了一种创新操作,称为像素级梯度剪裁(Pixel-wise Gradient Clipping, PGC),可无缝集成到现有3D生成模型中,从而提升其合成质量。具体而言,我们通过高效地剪裁像素级梯度来控制随机梯度的幅度,同时保留关键的纹理相关梯度方向。尽管方法简单且额外开销极小,大量实验表明,PGC能有效提升现有3D生成模型在高分辨率物体渲染上的性能。
RecolorCloud: A Point Cloud Tool for Recoloring, Segmentation, and Conversion
results: 实验结果显示,该工具可以大幅提高大型点云的 фото实际质量,并且用户可以快速地将点云染色到设置的semantic segmentation颜色中。Abstract
Point clouds are a 3D space representation of an environment that was recorded with a high precision laser scanner. These scanners can suffer from environmental interference such as surface shading, texturing, and reflections. Because of this, point clouds may be contaminated with fake or incorrect colors. Current open source or proprietary tools offer limited or no access to correcting these visual errors automatically. RecolorCloud is a tool developed to resolve these color conflicts by utilizing automated color recoloring. We offer the ability to delete or recolor outlier points automatically, with users only needing to specify bounding box regions to affect colors. Results show a vast improvement in the photo-realistic quality of large point clouds. Additionally, users can quickly recolor a point cloud with set semantic segmentation colors.
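The bounding-box workflow described above reduces to a simple selection-and-recolor operation over the point array. The NumPy sketch below is only an illustration of that idea and does not reflect RecolorCloud's actual interface or file formats.

```python
import numpy as np

def recolor_in_box(points, colors, box_min, box_max, new_color=None):
    """Recolor (or drop) all points inside an axis-aligned bounding box.

    points: (n, 3) xyz coordinates, colors: (n, 3) rgb values in [0, 255].
    If new_color is None the selected points are deleted instead.
    """
    inside = np.all((points >= box_min) & (points <= box_max), axis=1)
    if new_color is None:
        return points[~inside], colors[~inside]
    colors = colors.copy()
    colors[inside] = new_color
    return points, colors

# Example: paint everything inside a 1 m cube around the origin grey.
pts = np.random.uniform(-5, 5, size=(10000, 3))
rgb = np.random.randint(0, 256, size=(10000, 3))
pts, rgb = recolor_in_box(pts, rgb,
                          box_min=np.array([-0.5, -0.5, -0.5]),
                          box_max=np.array([0.5, 0.5, 0.5]),
                          new_color=np.array([128, 128, 128]))
```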
摘要
点云是用高精度激光扫描仪记录环境得到的3D空间表示。这类扫描仪可能受到环境干扰,如表面阴影、纹理和反射,因此点云可能被虚假或错误的颜色污染。现有的开源或商业工具对自动纠正这些视觉错误的支持有限甚至没有。RecolorCloud 是一种通过自动重新着色来解决这些颜色冲突的工具:用户只需指定包围盒区域,即可自动删除或重新着色离群点。结果显示,大型点云的照片级真实感质量得到了大幅提升。此外,用户还可以使用设定的语义分割颜色快速地为点云重新着色。
WeedCLR: Weed Contrastive Learning through Visual Representations with Class-Optimized Loss in Long-Tailed Datasets
results: 这篇论文在两个公共的植物数据集上进行了评估, compared to existing methods, WeedCLR 得到了4.3%的精度提升和5.6%的精度提升,并且在不同的环境条件下也展现了更好的一致性和稳定性。Abstract
Image classification is a crucial task in modern weed management and crop intervention technologies. However, the limited size, diversity, and balance of existing weed datasets hinder the development of deep learning models for generalizable weed identification. In addition, the expensive labelling requirements of mainstream fully-supervised weed classifiers make them cost- and time-prohibitive to deploy widely, for new weed species, and in site-specific weed management. This paper proposes a novel method for Weed Contrastive Learning through visual Representations (WeedCLR), that uses class-optimized loss with Von Neumann Entropy of deep representation for weed classification in long-tailed datasets. WeedCLR leverages self-supervised learning to learn rich and robust visual features without any labels and applies a class-optimized loss function to address the class imbalance problem in long-tailed datasets. WeedCLR is evaluated on two public weed datasets: CottonWeedID15, containing 15 weed species, and DeepWeeds, containing 8 weed species. WeedCLR achieves an average accuracy improvement of 4.3\% on CottonWeedID15 and 5.6\% on DeepWeeds over previous methods. It also demonstrates better generalization ability and robustness to different environmental conditions than existing methods without the need for expensive and time-consuming human annotations. These significant improvements make WeedCLR an effective tool for weed classification in long-tailed datasets and allows for more rapid and widespread deployment of site-specific weed management and crop intervention technologies.
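The Von Neumann entropy term mentioned above can be computed from the spectrum of a trace-normalized covariance (density) matrix of the deep representations. The sketch below shows only that computation under those assumptions; it is not the paper's full class-optimized loss, and the batch and feature sizes are arbitrary.

```python
import torch

def von_neumann_entropy(features, eps=1e-8):
    """Von Neumann entropy of a batch of L2-normalized representations.

    features: (n, d). Build the trace-normalized covariance (density) matrix
    and return -sum(lambda * log(lambda)) over its eigenvalues.
    """
    z = torch.nn.functional.normalize(features, dim=1)
    cov = (z.T @ z) / z.shape[0]
    rho = cov / cov.diagonal().sum().clamp_min(eps)      # trace-normalized
    evals = torch.linalg.eigvalsh(rho).clamp_min(eps)
    return -(evals * evals.log()).sum()

feats = torch.randn(256, 128)
print(von_neumann_entropy(feats))
```

Intuitively, a larger entropy means the representation spreads energy across more directions, which is one way a loss can discourage minority classes from collapsing onto dominant ones in a long-tailed dataset.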
摘要
图像分类是现代杂草管理和作物干预技术中的关键任务。然而,现有杂草数据集在规模、多样性和类别均衡方面的不足,限制了可泛化杂草识别深度学习模型的发展。此外,主流全监督杂草分类器昂贵的标注需求,使其难以被广泛部署、难以应对新的杂草种类以及特定地块的杂草管理。本文提出了一种新的基于视觉表示的杂草对比学习方法(WeedCLR),在长尾数据集上使用带有深度表示冯·诺依曼熵的类别优化损失进行杂草分类。WeedCLR利用自监督学习在无需任何标签的情况下学习丰富且稳健的视觉特征,并应用类别优化损失函数来解决长尾数据集中的类别不平衡问题。WeedCLR在CottonWeedID15(含15种杂草)和DeepWeeds(含8种杂草)两个公开杂草数据集上进行了评估,相比先前方法分别取得了4.3%和5.6%的平均准确率提升;同时在不同环境条件下表现出更好的泛化能力和鲁棒性,且无需昂贵耗时的人工标注。这些显著改进使WeedCLR成为长尾数据集中有效的杂草分类工具,有助于更快速、更广泛地部署特定地块的杂草管理和作物干预技术。
Lidar Panoptic Segmentation and Tracking without Bells and Whistles
results: 我们在多个3D/4D lidar精准分割和跟踪 benchmark上评估了我们的方法,并观察到我们的模型在开源模型中 establishment了新的状态时刻,超过了最近的查询基本模型。Abstract
State-of-the-art lidar panoptic segmentation (LPS) methods follow bottom-up segmentation-centric fashion wherein they build upon semantic segmentation networks by utilizing clustering to obtain object instances. In this paper, we re-think this approach and propose a surprisingly simple yet effective detection-centric network for both LPS and tracking. Our network is modular by design and optimized for all aspects of both the panoptic segmentation and tracking task. One of the core components of our network is the object instance detection branch, which we train using point-level (modal) annotations, as available in segmentation-centric datasets. In the absence of amodal (cuboid) annotations, we regress modal centroids and object extent using trajectory-level supervision that provides information about object size, which cannot be inferred from single scans due to occlusions and the sparse nature of the lidar data. We obtain fine-grained instance segments by learning to associate lidar points with detected centroids. We evaluate our method on several 3D/4D LPS benchmarks and observe that our model establishes a new state-of-the-art among open-sourced models, outperforming recent query-based models.
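The final step described above, turning detected centroids into fine-grained instance segments, can be pictured with a plain nearest-centroid assignment over foreground points. Note that the paper learns this association; the sketch below is a deliberately simplified stand-in with illustrative distance thresholds.

```python
import numpy as np

def associate_points_to_centroids(points, centroids, max_dist=2.0):
    """Assign each foreground lidar point to the nearest detected centroid.

    points:    (n, 3) foreground lidar points (already semantically labeled)
    centroids: (k, 3) detected object centers
    Returns an instance id per point, or -1 if no centroid is within max_dist.
    """
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    ids = dists.argmin(axis=1)
    ids[dists.min(axis=1) > max_dist] = -1
    return ids

pts = np.random.uniform(0, 50, size=(5000, 3))
cents = np.random.uniform(0, 50, size=(12, 3))
instance_ids = associate_points_to_centroids(pts, cents)
```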
摘要
当前最先进的激光雷达全景分割(LPS)方法遵循自底向上、以分割为中心的范式:它们在语义分割网络的基础上,通过聚类来获得目标实例。本文重新思考了这一思路,提出了一种出乎意料地简单却有效的以检测为中心的网络,可同时用于LPS和跟踪。我们的网络采用模块化设计,并针对全景分割和跟踪任务的各个方面进行了优化。网络的核心组件之一是目标实例检测分支,我们使用分割类数据集中可用的点级(modal)标注对其进行训练。在缺少amodal(长方体)标注的情况下,我们利用提供目标尺寸信息的轨迹级监督来回归modal质心和目标范围,这类信息由于遮挡和激光雷达数据的稀疏性无法从单帧扫描中推断。随后,我们通过学习将激光雷达点与检测到的质心相关联,得到细粒度的实例分割。我们在多个3D/4D LPS基准上评估了该方法,观察到我们的模型在开源模型中建立了新的最先进水平,超过了近期的基于查询的模型。
Not Just Learning from Others but Relying on Yourself: A New Perspective on Few-Shot Segmentation in Remote Sensing
For: 提出了一种新的几 shot segmentation(FSS)方法,用于将未知类目标分类到几个标注样本上。* Methods: 我们提出了一种名为 dual-mining 网络(DMNet)的方法,它不再仅仅是从支持图像中学习 semantics,而是同时从查询图像中提取 semantics。我们还提出了一种减少不相关特征污染的方法,以及一种新的知识分支suppressor(KMS)模块,用于降低已知类对象的活动。* Results: 我们在 iSAID 和 LoveDA 遥感数据集上进行了广泛的实验,并证明了我们的方法可以在 1-shot 和 5-shot 设置下达到最佳性能。特别是,我们的模型(使用 Resnet-50 作为背景网络)在 iSAID 下的 mIoU 达到了 49.58% 和 51.34%,在 1-shot 和 5-shot 设置下分别高于现有的 state-of-the-art 方法 by 1.8% 和 1.12%。代码可以在 https://github.com/HanboBizl/DMNet 上下载。Abstract
Few-shot segmentation (FSS) is proposed to segment unknown class targets with just a few annotated samples. Most current FSS methods follow the paradigm of mining the semantics from the support images to guide the query image segmentation. However, such a pattern of `learning from others' struggles to handle the extreme intra-class variation, preventing FSS from being directly generalized to remote sensing scenes. To bridge the gap of intra-class variance, we develop a Dual-Mining network named DMNet for cross-image mining and self-mining, meaning that it no longer focuses solely on support images but pays more attention to the query image itself. Specifically, we propose a Class-public Region Mining (CPRM) module to effectively suppress irrelevant feature pollution by capturing the common semantics between the support-query image pair. The Class-specific Region Mining (CSRM) module is then proposed to continuously mine the class-specific semantics of the query image itself in a `filtering' and `purifying' manner. In addition, to prevent the co-existence of multiple classes in remote sensing scenes from exacerbating the collapse of FSS generalization, we also propose a new Known-class Meta Suppressor (KMS) module to suppress the activation of known-class objects in the sample. Extensive experiments on the iSAID and LoveDA remote sensing datasets have demonstrated that our method sets the state-of-the-art with a minimum number of model parameters. Significantly, our model with the backbone of Resnet-50 achieves the mIoU of 49.58% and 51.34% on iSAID under 1-shot and 5-shot settings, outperforming the state-of-the-art method by 1.8% and 1.12%, respectively. The code is publicly available at https://github.com/HanboBizl/DMNet.
摘要
小样本分割(FSS)旨在仅凭少量标注样本分割未知类别的目标。现有的FSS方法大多遵循从支持图像中挖掘语义以指导查询图像分割的范式。然而,这种“向他人学习”的模式难以应对极端的类内差异,使FSS无法直接推广到遥感场景。为弥合类内差异带来的差距,我们设计了名为DMNet的双挖掘网络,进行跨图像挖掘与自我挖掘,即不再只关注支持图像,而是更多地关注查询图像本身。具体地,我们提出类公共区域挖掘(CPRM)模块,通过捕获支持-查询图像对之间的公共语义,有效抑制无关特征的污染;随后提出类特定区域挖掘(CSRM)模块,以“过滤”和“提纯”的方式持续挖掘查询图像自身的类特定语义。此外,为防止遥感场景中多类共存加剧FSS泛化能力的崩溃,我们还提出了新的已知类元抑制器(KMS)模块,用于抑制样本中已知类目标的激活。在iSAID和LoveDA遥感数据集上的大量实验表明,我们的方法以最少的模型参数达到了最先进水平。特别地,以Resnet-50为骨干网络的模型在iSAID的1-shot和5-shot设置下分别取得49.58%和51.34%的mIoU,分别比最先进方法高出1.8%和1.12%。代码公开于 https://github.com/HanboBizl/DMNet。
for: investigate whether it is possible to attack SAM with image-agnostic Universal Adversarial Perturbation (UAP)
methods: propose a novel perturbation-centric framework based on self-supervised contrastive learning (CL) to generate UAP
results: validate the effectiveness of the proposed method with both quantitative and qualitative results, and perform ablation study to understand various components in the method.Abstract
As Segment Anything Model (SAM) becomes a popular foundation model in computer vision, its adversarial robustness has become a concern that cannot be ignored. This work investigates whether it is possible to attack SAM with image-agnostic Universal Adversarial Perturbation (UAP). In other words, we seek a single perturbation that can fool the SAM to predict invalid masks for most (if not all) images. We demonstrate that the conventional image-centric attack framework is effective for image-independent attacks but fails for universal adversarial attacks. To this end, we propose a novel perturbation-centric framework that results in a UAP generation method based on self-supervised contrastive learning (CL), where the UAP is set to the anchor sample and the positive sample is augmented from the UAP. The representations of negative samples are obtained from the image encoder in advance and saved in a memory bank. The effectiveness of our proposed CL-based UAP generation method is validated by both quantitative and qualitative results. On top of the ablation study to understand various components in our proposed method, we shed light on the roles of positive and negative samples in making the generated UAP effective for attacking SAM.
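The contrastive setup described above (UAP as anchor, an augmented copy of the UAP as positive, encoder features of natural images as memory-bank negatives) can be sketched with a standard InfoNCE-style step. Everything below is illustrative: the augmentation, temperature, perturbation budget, and the dummy encoder are assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def uap_contrastive_step(uap, encoder, memory_bank, optimizer, tau=0.1):
    """One optimization step of an InfoNCE-style loss on the perturbation.

    uap:         (1, 3, H, W) learnable universal perturbation (anchor)
    encoder:     frozen image encoder returning a (1, d) feature
    memory_bank: (n, d) precomputed features of natural images (negatives)
    The positive is a randomly augmented copy of the UAP itself.
    """
    anchor = F.normalize(encoder(uap), dim=1)
    aug = torch.flip(uap, dims=[3]) + 0.01 * torch.randn_like(uap)   # simple augmentation
    positive = F.normalize(encoder(aug), dim=1)
    negatives = F.normalize(memory_bank, dim=1)
    logits = torch.cat([anchor @ positive.T,             # (1, 1) positive logit
                        anchor @ negatives.T], dim=1) / tau
    loss = F.cross_entropy(logits, torch.zeros(1, dtype=torch.long))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    with torch.no_grad():                                 # keep an L_inf budget
        uap.clamp_(-10 / 255, 10 / 255)
    return loss.item()

# Illustrative usage with a dummy encoder standing in for SAM's image encoder.
encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 64))
uap = torch.zeros(1, 3, 32, 32, requires_grad=True)
bank = torch.randn(100, 64)
opt = torch.optim.Adam([uap], lr=1e-2)
uap_contrastive_step(uap, encoder, bank, opt)
```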
LoMAE: Low-level Vision Masked Autoencoders for Low-dose CT Denoising
paper_authors: Dayang Wang, Yongshun Xu, Shuo Han, Zhan Wu, Li Zhou, Bahareh Morovati, Hengyong Yu for: 这篇论文是为了提高低剂量 computed tomography(LDCT)图像质量的方法。methods: 这篇论文使用了 transformer 模型,并且使用了 masked autoencoder(MAE)来进行自我预训。results: experiments 结果显示,提案的 LoMAE 可以增强 transformer 的混参质化表现,并且可以大大减少依赖clean数据。它还展示了优异的韧性和应用性。Abstract
Low-dose computed tomography (LDCT) offers reduced X-ray radiation exposure but at the cost of compromised image quality, characterized by increased noise and artifacts. Recently, transformer models emerged as a promising avenue to enhance LDCT image quality. However, the success of such models relies on a large amount of paired noisy and clean images, which are often scarce in clinical settings. In the fields of computer vision and natural language processing, masked autoencoders (MAE) have been recognized as an effective label-free self-pretraining method for transformers, due to their exceptional feature representation ability. However, the original pretraining and fine-tuning design fails to work in low-level vision tasks like denoising. In response to this challenge, we redesign the classical encoder-decoder learning model and facilitate a simple yet effective low-level vision MAE, referred to as LoMAE, tailored to address the LDCT denoising problem. Moreover, we introduce an MAE-GradCAM method to shed light on the latent learning mechanisms of the MAE/LoMAE. Additionally, we explore the LoMAE's robustness and generability across a variety of noise levels. Experiments results show that the proposed LoMAE can enhance the transformer's denoising performance and greatly relieve the dependence on the ground truth clean data. It also demonstrates remarkable robustness and generalizability over a spectrum of noise levels.
摘要
低剂量 computed tomography (LDCT) 具有减少 X-ray 辐射暴露的优点,但是它会增加图像质量的噪声和artefacts。最近, transformer 模型在提高 LDCT 图像质量方面表现出了承诺。然而,这些模型的成功受到了丰富的对照图像对照集的限制,而在临床 setting 中这些对照图像通常罕见。在计算机视觉和自然语言处理领域, masked autoencoder (MAE) 被认为是一种有效的无标签预训练方法,因为它们在特征表示方面具有出色的能力。然而,原始的预训练和精度调整设计无法在低级视觉任务中进行 denoising。为回应这个挑战,我们重新设计了传统的 encoder-decoder 学习模型,并提出了一种简单 yet effective 的低级视觉 MAE,被称为 LoMAE,专门针对 LDCT denoising 问题。此外,我们还提出了 MAE-GradCAM 方法,以解释 LoMAE 在隐藏学习机制方面的学习过程。此外,我们还对 LoMAE 的Robustness和可generate性进行了多种噪声水平的测试。实验结果表明,我们的 LoMAE 可以提高 transformer 的 denoising性能,同时大幅减少了对 clean 数据的依赖。它还表现出了remarkable robustness和泛化性,适用于多种噪声水平。
Deep Learning Techniques for Video Instance Segmentation: A Survey
for: 这篇综述旨在对视频实例分割问题进行深入分析和评述,梳理解决该问题的各类有效方法。
methods: 这篇综述涵盖了用于视频实例分割的多种深度学习技术,包括不同的架构设计以及辅助技术。
results: 这篇综述对各种深度学习模型的性能、复杂度和计算开销进行了比较和分析,并讨论了提升视频实例分割性能的改进方向。Abstract
Video instance segmentation, also known as multi-object tracking and segmentation, is an emerging computer vision research area introduced in 2019, aiming at detecting, segmenting, and tracking instances in videos simultaneously. By tackling the video instance segmentation tasks through effective analysis and utilization of visual information in videos, a range of computer vision-enabled applications (e.g., human action recognition, medical image processing, autonomous vehicle navigation, surveillance, etc) can be implemented. As deep-learning techniques take a dominant role in various computer vision areas, a plethora of deep-learning-based video instance segmentation schemes have been proposed. This survey offers a multifaceted view of deep-learning schemes for video instance segmentation, covering various architectural paradigms, along with comparisons of functional performance, model complexity, and computational overheads. In addition to the common architectural designs, auxiliary techniques for improving the performance of deep-learning models for video instance segmentation are compiled and discussed. Finally, we discuss a range of major challenges and directions for further investigations to help advance this promising research field.
摘要
视频实例分割(也称为多对象跟踪和分割)是一个迅速发展的计算机视觉研究领域,于2019年引入,旨在同时检测、分割和跟踪视频中的实例。通过有效地分析和利用视频中的视觉信息,可以实现许多基于计算机视觉的应用程序(如人员动作识别、医疗影像处理、自动驾驶车辆导航、监控等)。随着深度学习技术在多种计算机视觉领域中的主导地位,一大批深度学习基于的视频实例分割方案已经被提出。本评论对深度学习方案的多种建筑思想进行了全面的概述,并对不同方案的功能性能、模型复杂度和计算负担进行了比较。此外,还编译了一些改进深度学习模型的 auxiliary 技巧,并对它们进行了讨论。最后,我们讨论了一些主要挑战和未来研究的方向,以帮助这个有前途的研究领域的发展。
results: 这篇论文的研究结果包括三个大的能量数据集,以及一个基于这些数据集的预测器和效率评分指标。这些结果可以帮助提高边缘 computing 的可持续性和效率。Abstract
Today, deep learning optimization is primarily driven by research focused on achieving high inference accuracy and reducing latency. However, the energy efficiency aspect is often overlooked, possibly due to a lack of sustainability mindset in the field and the absence of a holistic energy dataset. In this paper, we conduct a threefold study, including energy measurement, prediction, and efficiency scoring, with an objective to foster transparency in power and energy consumption within deep learning across various edge devices. Firstly, we present a detailed, first-of-its-kind measurement study that uncovers the energy consumption characteristics of on-device deep learning. This study results in the creation of three extensive energy datasets for edge devices, covering a wide range of kernels, state-of-the-art DNN models, and popular AI applications. Secondly, we design and implement the first kernel-level energy predictors for edge devices based on our kernel-level energy dataset. Evaluation results demonstrate the ability of our predictors to provide consistent and accurate energy estimations on unseen DNN models. Lastly, we introduce two scoring metrics, PCS and IECS, developed to convert complex power and energy consumption data of an edge device into an easily understandable manner for edge device end-users. We hope our work can help shift the mindset of both end-users and the research community towards sustainability in edge computing, a principle that drives our research. Find data, code, and more up-to-date information at https://amai-gsu.github.io/DeepEn2023.
for: 本研究旨在分析算法的透明度,在人工智能 causation 问题上的开放辩论中提供一种实验方法,以评估现有一个最佳生成 AI 模型(Chat-GPT)的性能,并研究如何通过法律规范来调控它。
methods: 本研究使用 Turing Test 的对话方法进行实验,以评估 Chat-GPT 模型的性能,并从法律角度探讨意识、故意和责任等意义,以更好地理解 AI 的使用问题。
results: 研究结果表明,Chat-GPT 模型在一些情况下可以达到高度的准确率,但也存在一些潜在的问题和风险,如模型的透明度和可控性等。此外,本研究还提出了一些可能的法律解决方案,以帮助调控 AI 的使用。Abstract
The purpose of this paper is to analyse the opacity of algorithms, contextualized in the open debate on responsibility for artificial intelligence causation. Using an experimental approach that applies the proposed conversational methodology of the Turing Test, we evaluate the performance of one of the best existing NLP models of generative AI (Chat-GPT) to see how far it can go right now and what shape a legal regulation of it could take. The analysis of the problem is supported by a commentary on Italian classical law categories such as causality, intent and fault, to understand the problems raised by the use of AI, focusing in particular on human-machine interaction. On the computer science side, to give a technical view of the logic used to craft these algorithms, the second chapter proposes a practical interrogation of Chat-GPT aimed at finding some critical points in the functioning of AI. The paper concludes with some existing legal solutions that can be applied to the problem, plus a brief description of the approach proposed by the EU Artificial Intelligence Act.
摘要
本文的目的是分析算法的透明度,在人工智能 causation 的开放辩论中上下文化了这个问题;通过应用提议的对话方法(Turing Test),我们预计可以评估现有一个最佳的生成 AI 模型(Chat-GPT)的性能,以评估它目前的可能性和如何制定相关的法律规范。对问题的分析将得到意大利古典法Category的评论,包括 causality、意图和责任,以更好地理解 AI 的使用问题,特别是人机交互方面。从计算机科学的角度来看,在第二章中将提出一种实践的问题对 Chat-GPT 的探索,以找出一些算法的关键点。总结部分将聚焦现有的法律解决方案, plus 简要描述 EU 人工智能法规。
Towards Robust Pruning: An Adaptive Knowledge-Retention Pruning Strategy for Language Models
results: 相比其他基于state-of-the-art的基eline,本研究的方法在SST2、IMDB和AGNews等 dataset上表现出了更好的平衡点,即Accuracy、Sparsity、Robustness和束致成本之间的trade-off。这表明了本研究的方法可以有效地提高语音模型的Robustness,同时保持Accuracy和Sparsity的良好性。Abstract
The pruning objective has recently extended beyond accuracy and sparsity to robustness in language models. Despite this, existing methods struggle to enhance robustness against adversarial attacks when continually increasing model sparsity and require a retraining process. As humans step into the era of large language models, these issues become increasingly prominent. This paper proposes that the robustness of language models is proportional to the extent of pre-trained knowledge they encompass. Accordingly, we introduce a post-training pruning strategy designed to faithfully replicate the embedding space and feature space of dense language models, aiming to conserve more pre-trained knowledge during the pruning process. In this setup, each layer's reconstruction error not only originates from itself but also includes cumulative error from preceding layers, followed by an adaptive rectification. Compared to other state-of-art baselines, our approach demonstrates a superior balance between accuracy, sparsity, robustness, and pruning cost with BERT on datasets SST2, IMDB, and AGNews, marking a significant stride towards robust pruning in language models.
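The layer-wise idea described above, prune, then rectify the remaining weights so the sparse layer still reproduces the dense layer's outputs while the calibration inputs already carry the cumulative error of earlier pruned layers, can be sketched as follows. This is a simplified, hypothetical illustration (magnitude masking plus a small reconstruction fit), not the authors' algorithm.

```python
import torch

def prune_layer(weight, calib_in, dense_out, sparsity=0.5, lr=1e-2, steps=100):
    """Prune one linear layer, then rectify the kept weights so the sparse
    layer reproduces the dense layer's output on calibration activations.

    calib_in:  (n, in_dim) activations produced by the *pruned* preceding
               layers, so the cumulative error of earlier layers is visible.
    dense_out: (n, out_dim) the original dense model's output at this layer.
    """
    w = weight.detach().clone()
    k = int(w.numel() * sparsity)
    thresh = w.abs().flatten().kthvalue(k).values
    mask = (w.abs() > thresh).float()              # magnitude-based mask
    w = (w * mask).requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):                          # adaptive rectification
        loss = ((calib_in @ (w * mask).T - dense_out) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return (w * mask).detach()

# Toy usage: a 256 -> 256 layer at 50% sparsity.
W = torch.randn(256, 256)
X = torch.randn(1024, 256)
W_sparse = prune_layer(W, X, X @ W.T, sparsity=0.5)
```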
摘要
<>translate "The pruning objective has recently extended beyond accuracy and sparsity to robustness in language models. Despite this, existing methods struggle to enhance robustness against adversarial attacks when continually increasing model sparsity and require a retraining process. As humans step into the era of large language models, these issues become increasingly prominent. This paper proposes that the robustness of language models is proportional to the extent of pre-trained knowledge they encompass. Accordingly, we introduce a post-training pruning strategy designed to faithfully replicate the embedding space and feature space of dense language models, aiming to conserve more pre-trained knowledge during the pruning process. In this setup, each layer's reconstruction error not only originates from itself but also includes cumulative error from preceding layers, followed by an adaptive rectification. Compared to other state-of-art baselines, our approach demonstrates a superior balance between accuracy, sparsity, robustness, and pruning cost with BERT on datasets SST2, IMDB, and AGNews, marking a significant stride towards robust pruning in language models."into Simplified Chinese:<>大型语言模型的剪除目标已经从精度和稀疏性扩展到了可靠性,然而现有方法在不断增加模型稀疏性时很难提高对攻击性词语的抵御能力,而且需要重新训练过程。人类进入大型语言模型时代,这些问题变得越来越突出。本文提议语言模型的可靠性与它拥有的预训练知识量成正比。根据此,我们提出了一种增强语言模型可靠性的后期剪除策略,该策略可以准确复制权重空间和特征空间,以保留更多的预训练知识。在这种设置下,每层的重建错误不仅来自自己,还包括先前层次的累加错误,然后进行自适应修正。相比其他当前基elines,我们的方法在 SST2、IMDB 和 AGNews 数据集上显示出了更好的平衡性,marks a significant step towards robust pruning in language models.
Fast and Accurate Factual Inconsistency Detection Over Long Documents
methods: 本研究使用了一种新的分块策略,即源分块方法(Source Chunking Approach),将长文本切分成大块进行条件化。这种方法基于自然语言推理(Natural Language Inference),并且可以在多种任务上达到最先进的性能。
results: 本研究的实验结果表明,SCALE模型可以在多种任务上超越现有方法,并且在长输入情况下表现更好。此外,SCALE模型还可以快速地解释自己的决策,并且在效率和模型解释评价中也表现出优异。Abstract
Generative AI models exhibit remarkable potential; however, hallucinations across various tasks present a significant challenge, particularly for longer inputs that current approaches struggle to address effectively. We introduce SCALE (Source Chunking Approach for Large-scale inconsistency Evaluation), a task-agnostic model for detecting factual inconsistencies using a novel chunking strategy. Specifically, SCALE is a Natural Language Inference (NLI) based model that uses large text chunks to condition over long texts. This approach achieves state-of-the-art performance in factual inconsistency detection for diverse tasks and long inputs. Additionally, we leverage the chunking mechanism and employ a novel algorithm to explain SCALE's decisions through relevant source sentence retrieval. Our evaluations reveal that SCALE outperforms existing methods on both standard benchmarks and a new long-form dialogue dataset ScreenEval we constructed. Moreover, SCALE surpasses competitive systems in efficiency and model explanation evaluations. We have released our code and data publicly to GitHub.
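The chunk-then-NLI idea can be sketched very compactly: split the long source into large overlapping chunks, score each (chunk, claim) pair with an entailment model, and keep the maximum entailment probability. The checkpoint, chunk sizes, and scoring rule below are illustrative stand-ins (an off-the-shelf MNLI model), not the SCALE model itself.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Any off-the-shelf NLI checkpoint works as a stand-in for illustration.
name = "roberta-large-mnli"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name).eval()
ENTAILMENT = model.config.label2id.get("ENTAILMENT", 2)

def chunked_consistency(source, claim, chunk_chars=4000, stride=2000):
    """Max entailment probability of `claim` against large source chunks."""
    chunks = [source[i:i + chunk_chars] for i in range(0, len(source), stride)]
    best = 0.0
    for chunk in chunks:
        inputs = tok(chunk, claim, truncation=True, return_tensors="pt")
        with torch.no_grad():
            probs = model(**inputs).logits.softmax(dim=-1)[0]
        best = max(best, probs[ENTAILMENT].item())
    return best
```

A low score flags the claim as potentially inconsistent with the source; explaining the decision then amounts to returning the chunk (or sentences within it) that produced the best score.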
摘要
现代生成AI模型表现出了惊人的潜力,但是幻觉在不同任务中存在一定挑战,特别是对 longer inputs 的处理。我们介绍了 SCALE(Source Chunking Approach for Large-scale inconsistency Evaluation),一种任务非依赖的模型,用于检测事实不一致性。特别是,SCALE 是一种基于自然语言推理(NLI)的模型,使用大量文本块来condition over long text。这种方法在多种任务和长输入上实现了状态的末点性表现。此外,我们利用块化机制,并使用一种新的算法来解释 SCALE 的决策,通过 relevance source sentence retrieval。我们的评估表明,SCALE 在标准 benchmark 和我们新建的长形对话 dataset ScreenEval 上表现出色,并且在效率和模型解释评估中也超过了现有的系统。我们已经在 GitHub 上公开了代码和数据。
CycleNet: Rethinking Cycle Consistency in Text-Guided Diffusion for Image Manipulation
For: 本文旨在提出一种新的image-to-image翻译方法,以帮助在image synthesis任务中实现高质量的图像生成。* Methods: 本文使用了cycle consistency来regulates image manipulation,并且通过多种方法来提高image-to-image翻译的精度和质量。* Results: 实验结果表明,Cyclenet可以在不同的粒度水平上实现高质量的图像翻译,并且可以在不同的物体和场景翻译任务中提供高度的一致性。此外,本文还提供了一个多域image-to-image翻译 dataset,以便研究物体的物理状态变化。Abstract
Diffusion models (DMs) have enabled breakthroughs in image synthesis tasks but lack an intuitive interface for consistent image-to-image (I2I) translation. Various methods have been explored to address this issue, including mask-based methods, attention-based methods, and image-conditioning. However, it remains a critical challenge to enable unpaired I2I translation with pre-trained DMs while maintaining satisfying consistency. This paper introduces Cyclenet, a novel but simple method that incorporates cycle consistency into DMs to regularize image manipulation. We validate Cyclenet on unpaired I2I tasks of different granularities. Besides the scene and object level translation, we additionally contribute a multi-domain I2I translation dataset to study the physical state changes of objects. Our empirical studies show that Cyclenet is superior in translation consistency and quality, and can generate high-quality images for out-of-domain distributions with a simple change of the textual prompt. Cyclenet is a practical framework, which is robust even with very limited training data (around 2k) and requires minimal computational resources (1 GPU) to train. Project homepage: https://cyclenetweb.github.io/
摘要
Diffusion models (DMs) 已经帮助实现图像生成任务的突破,但缺乏直观的图像到图像(I2I)翻译界面。各种方法已经被探索以解决这个问题,包括面罩基的方法、注意力基的方法和图像条件。但是,尚未能够通过预训练的 DMs 实现无对应 I2I 翻译,并保持满意的一致性。这篇论文介绍 CycleNet,一种新的简单方法,将图像 manipulate 中的循环一致性 incorporated 到 DMs 中来规范化图像翻译。我们验证 CycleNet 在不同粒度的无对应 I2I 任务上。除了场景和对象层翻译,我们还贡献了多域 I2I 翻译数据集,以研究物体的物理状态变化。我们的实验表明,CycleNet 在翻译一致性和质量方面具有优势,能够生成高质量的图像,并且可以通过简单地修改文本提示来生成出去domains 的图像。CycleNet 是一个实用的框架,可以在很少的训练数据(约 2k)和少量计算资源(1 GPU)下进行训练。项目首页:https://cyclenetweb.github.io/
A Distributed Approach to Meteorological Predictions: Addressing Data Imbalance in Precipitation Prediction Models through Federated Learning and GANs
results: 本研究表明,通过使用数据增强技术,可以提高模型的准确性,并在中央和联合数据存储和处理环境中进行分类,保持数据隐私和完整性。Abstract
The classification of weather data involves categorizing meteorological phenomena into classes, thereby facilitating nuanced analyses and precise predictions for various sectors such as agriculture, aviation, and disaster management. This involves utilizing machine learning models to analyze large, multidimensional weather datasets for patterns and trends. These datasets may include variables such as temperature, humidity, wind speed, and pressure, contributing to meteorological conditions. Furthermore, it's imperative that classification algorithms proficiently navigate challenges such as data imbalances, where certain weather events (e.g., storms or extreme temperatures) might be underrepresented. This empirical study explores data augmentation methods to address imbalanced classes in tabular weather data in centralized and federated settings. Employing data augmentation techniques such as the Synthetic Minority Over-sampling Technique or Generative Adversarial Networks can improve the model's accuracy in classifying rare but critical weather events. Moreover, with advancements in federated learning, machine learning models can be trained across decentralized databases, ensuring privacy and data integrity while mitigating the need for centralized data storage and processing. Thus, the classification of weather data stands as a critical bridge, linking raw meteorological data to actionable insights, enhancing our capacity to anticipate and prepare for diverse weather conditions.
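As a toy illustration of the oversampling idea on tabular weather features, the sketch below applies SMOTE from the imbalanced-learn package before training a classifier. The synthetic data, threshold defining the rare "storm" class, and model choice are all assumptions for demonstration; the paper additionally studies GAN-based augmentation and federated settings.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Toy tabular weather data: temperature, humidity, wind speed, pressure.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 4))
y = (X[:, 2] + rng.normal(scale=0.5, size=5000) > 2.2).astype(int)   # rare event class

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr, y_tr)        # oversample minority

clf = RandomForestClassifier(random_state=0).fit(X_bal, y_bal)
print("accuracy on held-out data:", clf.score(X_te, y_te))
```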
摘要
天气数据分类涉及将气象现象划分为不同类别,从而支持细致的分析和精准的预测,为农业、航空和灾害管理等多个领域提供帮助。这需要利用机器学习模型分析大规模、多维的天气数据集,以发现其中的模式和趋势;这些数据可能包括温度、湿度、风速和气压等构成气象条件的变量。此外,分类算法还必须能够妥善应对数据不平衡等挑战,例如某些天气事件(如风暴或极端温度)在数据中往往代表性不足。本实证研究探讨了在集中式和联邦式设置下,利用数据增强方法解决表格型天气数据中类别不平衡的问题。使用合成少数类过采样技术(SMOTE)或生成对抗网络等数据增强技术,可以提高模型对罕见但关键天气事件的分类准确率。此外,随着联邦学习的发展,机器学习模型可以跨分布式数据库进行训练,在保障隐私和数据完整性的同时,减少对集中式数据存储和处理的需求。因此,天气数据分类是连接原始气象数据与可执行洞见的关键桥梁,可增强我们预测和应对多样天气状况的能力。
Conditional Generative Modeling for Images, 3D Animations, and Video
results: 我们的研究实现了许多成果,包括使用神经泛化函数来预测未来视频几帧,以及基于低分辨率输入进行高分辨率图像生成。我们还提出了一个可以自动调整人像和3D角色的对称运动架构,并进行了几个实验来评估这些成果。Abstract
This dissertation attempts to drive innovation in the field of generative modeling for computer vision, by exploring novel formulations of conditional generative models, and innovative applications in images, 3D animations, and video. Our research focuses on architectures that offer reversible transformations of noise and visual data, and the application of encoder-decoder architectures for generative tasks and 3D content manipulation. In all instances, we incorporate conditional information to enhance the synthesis of visual data, improving the efficiency of the generation process as well as the generated content. We introduce the use of Neural ODEs to model video dynamics using an encoder-decoder architecture, demonstrating their ability to predict future video frames despite being trained solely to reconstruct current frames. Next, we propose a conditional variant of continuous normalizing flows that enables higher-resolution image generation based on lower-resolution input, achieving comparable image quality while reducing parameters and training time. Our next contribution presents a pipeline that takes human images as input, automatically aligns a user-specified 3D character with the pose of the human, and facilitates pose editing based on partial inputs. Next, we derive the relevant mathematical details for denoising diffusion models that use non-isotropic Gaussian processes, and show comparable generation quality. Finally, we devise a novel denoising diffusion framework capable of solving all three video tasks of prediction, generation, and interpolation. We perform ablation studies, and show SOTA results on multiple datasets. Our contributions are published articles at peer-reviewed venues. Overall, our research aims to make a meaningful contribution to the pursuit of more efficient and flexible generative models, with the potential to shape the future of computer vision.
摘要
这个论文目标在计算机视觉领域推动创新,通过探索新的条件生成模型形式和应用,提高生成过程和生成内容的效率。我们的研究关注降噪和视觉数据的可逆转换,以及使用编码器-解码器架构进行生成任务和3D内容修饰。在所有情况下,我们包含条件信息以提高生成的效果。我们引入神经 ordinary differential equations(ODEs)来模型视频动态,使用编码器-解码器架构预测未来视频帧,即使只有训练current frame。然后,我们提出一种基于Continuous Normalizing Flows的条件变体,允许基于lower-resolution输入生成高分辨率图像,实现相同的图像质量,同时减少参数和训练时间。我们的下一个贡献是一个管道,可以将人像作为输入,自动将用户指定的3D人物与人像的 pose 对齐,并且允许基于部分输入进行pose编辑。接着,我们 derive the relevant mathematical details for denoising diffusion models that use non-isotropic Gaussian processes, and show comparable generation quality。最后,我们提出一种新的杂化噪声框架,能够解决所有三个视频任务:预测、生成和 interpolate。我们进行了减少研究,并在多个数据集上达到了state-of-the-art results。总的来说,我们的研究旨在为计算机视觉领域提供更有效率和灵活的生成模型,并且具有可能在未来影响计算机视觉的潜在。
CLIFT: Analysing Natural Distribution Shift on Question Answering Models in Clinical Domain
results: 论文发现,即使在原始测试集上取得了很好的结果,但当应用于新的测试集时,模型的性能却会下降,这显示出了分布偏移(distribution shift)的问题。Abstract
This paper introduces a new testbed CLIFT (Clinical Shift) for the clinical domain Question-answering task. The testbed includes 7.5k high-quality question answering samples to provide a diverse and reliable benchmark. We performed a comprehensive experimental study and evaluated several QA deep-learning models under the proposed testbed. Despite impressive results on the original test set, the performance degrades when applied to new test sets, which shows the distribution shift. Our findings emphasize the need for and the potential for increasing the robustness of clinical domain models under distributional shifts. The testbed offers one way to track progress in that direction. It also highlights the necessity of adopting evaluation metrics that consider robustness to natural distribution shifts. We plan to expand the corpus by adding more samples and model results. The full paper and the updated benchmark are available at github.com/openlifescience-ai/clift
摘要
这份论文介绍了一个新的临床领域问答任务测试环境(CLIFT)。该测试环境包含7500个高质量问答样本,以提供多样化和可靠的基准。我们进行了全面的实验研究,评估了多种问答深度学习模型在所提测试环境下的性能。尽管在原始测试集上表现出色,但当应用于新的测试集时,其性能会因分布偏移而下降。我们的发现强调了在分布偏移下提升临床领域模型鲁棒性的必要性和潜力,以及采用考虑自然分布偏移的评价指标的重要性。该测试环境为跟踪这一方向的进展提供了一种途径,我们也计划通过加入更多样本和模型结果来扩展该语料库。论文全文和更新后的benchmark可在 github.com/openlifescience-ai/clift 获取。
Better to Ask in English: Cross-Lingual Evaluation of Large Language Models for Healthcare Queries
For: The paper aims to investigate the effectiveness of large language models (LLMs) as multi-lingual dialogue systems for healthcare queries, and to provide a cross-lingual benchmark for evaluating their performance.* Methods: The paper uses empirically-derived framework XlingEval, which focuses on three fundamental criteria for evaluating LLM responses to naturalistic health-related questions: correctness, consistency, and verifiability. The paper also uses extensive experiments on four major global languages, including English, Spanish, Chinese, and Hindi, and an amalgamation of algorithmic and human-evaluation strategies.* Results: The paper finds a pronounced disparity in LLM responses across the four languages, indicating a need for enhanced cross-lingual capabilities. The paper proposes XlingHealth, a cross-lingual benchmark for examining the multilingual capabilities of LLMs in the healthcare context. The findings underscore the need to bolster the cross-lingual capacities of these models and provide an equitable information ecosystem accessible to all.Here is the information in Simplified Chinese text:* For: 研究大语言模型(LLM)在医疗域中的多语言对话系统的有效性。* Methods: 使用经验所得的框架XlingEval,对自然语言中的医疗问题进行评估。* Results: 发现不同语言中的LLM响应存在显著的差异,需要提升跨语言能力。提出了跨语言医疗 benchmark XlingHealth,以评估 LLM 在医疗域中的多语言能力。发现需要提高跨语言能力,以提供一个平等的信息生态系统。Abstract
Large language models (LLMs) are transforming the ways the general public accesses and consumes information. Their influence is particularly pronounced in pivotal sectors like healthcare, where lay individuals are increasingly appropriating LLMs as conversational agents for everyday queries. While LLMs demonstrate impressive language understanding and generation proficiencies, concerns regarding their safety remain paramount in these high-stake domains. Moreover, the development of LLMs is disproportionately focused on English. It remains unclear how these LLMs perform in the context of non-English languages, a gap that is critical for ensuring equity in the real-world use of these systems.This paper provides a framework to investigate the effectiveness of LLMs as multi-lingual dialogue systems for healthcare queries. Our empirically-derived framework XlingEval focuses on three fundamental criteria for evaluating LLM responses to naturalistic human-authored health-related questions: correctness, consistency, and verifiability. Through extensive experiments on four major global languages, including English, Spanish, Chinese, and Hindi, spanning three expert-annotated large health Q&A datasets, and through an amalgamation of algorithmic and human-evaluation strategies, we found a pronounced disparity in LLM responses across these languages, indicating a need for enhanced cross-lingual capabilities. We further propose XlingHealth, a cross-lingual benchmark for examining the multilingual capabilities of LLMs in the healthcare context. Our findings underscore the pressing need to bolster the cross-lingual capacities of these models, and to provide an equitable information ecosystem accessible to all.
摘要
大型语言模型(LLMs)正在改变公众获取和消费信息的方式。它们的影响在医疗等关键领域尤为明显:普通人越来越多地将LLMs用作日常问询的对话代理。尽管LLMs展现出了出色的语言理解和生成能力,但在这些高风险领域,其安全性仍然是首要关切。此外,LLMs的开发高度偏向英语,这些模型在非英语语言环境下的表现尚不清楚,而这一空白对于保证这些系统在现实使用中的公平性至关重要。本文提出了一个框架,用于考察LLMs作为医疗问询多语言对话系统的有效性。我们基于经验构建的框架XlingEval围绕评估LLM对自然的人工撰写健康相关问题回复的三个基本标准:正确性、一致性和可验证性。通过在英语、西班牙语、中文和印地语四种主要语言上、跨三个专家标注的大型健康问答数据集的大量实验,并结合算法评估与人工评估策略,我们发现LLM的回复在这些语言之间存在明显差距,表明需要增强跨语言能力。我们进一步提出了XlingHealth,一个用于考察LLMs在医疗场景下多语言能力的跨语言基准。我们的发现强调了亟需增强这些模型的跨语言能力,以提供一个人人可及的公平信息生态。
Deep Reinforcement Learning-based Intelligent Traffic Signal Controls with Optimized CO2 emissions
results: 论文的实验结果表明,EcoLight可以不仅减少CO2排放,还能达到与传统方法相当的旅行时间和等待时间。Abstract
Nowadays, transportation networks face the challenge of sub-optimal control policies that can have adverse effects on human health, the environment, and contribute to traffic congestion. Increased levels of air pollution and extended commute times caused by traffic bottlenecks make intersection traffic signal controllers a crucial component of modern transportation infrastructure. Despite several adaptive traffic signal controllers in literature, limited research has been conducted on their comparative performance. Furthermore, despite carbon dioxide (CO2) emissions' significance as a global issue, the literature has paid limited attention to this area. In this report, we propose EcoLight, a reward shaping scheme for reinforcement learning algorithms that not only reduces CO2 emissions but also achieves competitive results in metrics such as travel time. We compare the performance of tabular Q-Learning, DQN, SARSA, and A2C algorithms using metrics such as travel time, CO2 emissions, waiting time, and stopped time. Our evaluation considers multiple scenarios that encompass a range of road users (trucks, buses, cars) with varying pollution levels.
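One way to picture the reward-shaping idea is a reward that mixes the usual delay term with a CO2 penalty, as in the minimal sketch below. The weights, units, and per-vehicle inputs are illustrative assumptions only (e.g. as one might obtain from a SUMO-style simulator), not the exact EcoLight formulation.

```python
def shaped_reward(waiting_times, co2_emissions, alpha=1.0, beta=0.5):
    """Reward for one intersection at one decision step.

    waiting_times: seconds each vehicle on the incoming lanes has waited
    co2_emissions: per-vehicle CO2 output (e.g. mg per step) on the same lanes
    The agent is rewarded for reducing both delay and emissions; alpha and
    beta trade the two objectives off and are illustrative values only.
    """
    delay_term = alpha * sum(waiting_times)
    co2_term = beta * sum(co2_emissions) / 1000.0   # scale to comparable units
    return -(delay_term + co2_term)

# Example step: three cars and one truck approaching the junction.
print(shaped_reward([4.0, 12.5, 0.0, 30.0], [250.0, 310.0, 180.0, 2200.0]))
```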
摘要
如今,交通网络面临着控制策略次优的问题,这会危害人类健康与环境,并加剧交通拥堵。交通瓶颈导致的空气污染加剧和通勤时间延长,使路口交通信号控制器成为现代交通基础设施的关键组件。尽管文献中已有多种自适应交通信号控制器,但针对其性能比较的研究仍然有限。此外,尽管二氧化碳(CO2)排放是一个重要的全球性问题,文献对这一方面的关注也较少。在本报告中,我们提出了EcoLight,一种用于强化学习算法的奖励塑形方案,不仅能减少CO2排放,还能在行程时间等指标上取得有竞争力的结果。我们使用行程时间、CO2排放、等待时间和停车时间等指标,比较了表格型Q学习、DQN、SARSA和A2C算法的性能;评估涵盖多种情景,包括具有不同污染水平的各类道路使用者(卡车、公交车、小汽车)。
results: 研究发现,模型在计算过程中延迟开始,但快速执行计算任务,并且在某些罕见的高损失情况下进行了详细的解释。通过严格的测试和数学模型,这些发现得到了证实。这些研究对机器学习模型的安全和道德使用做出了贡献,并为更复杂的任务和多层Transformer模型的分析开了道。Abstract
Understanding the inner workings of machine learning models like Transformers is vital for their safe and ethical use. This paper presents an in-depth analysis of a one-layer Transformer model trained for integer addition. We reveal that the model divides the task into parallel, digit-specific streams and employs distinct algorithms for different digit positions. Our study also finds that the model starts calculations late but executes them rapidly. A rare use case with high loss is identified and explained. Overall, the model's algorithm is explained in detail. These findings are validated through rigorous testing and mathematical modeling, contributing to the broader works in Mechanistic Interpretability, AI safety, and alignment. Our approach opens the door for analyzing more complex tasks and multi-layer Transformer models.
Semi-Supervised Learning of Dynamical Systems with Neural Ordinary Differential Equations: A Teacher-Student Model Approach
results: Compared with a baseline Neural ODE model, TS-NODE shows significant performance improvements on multiple dynamical system modeling tasks.Abstract
Modeling dynamical systems is crucial for a wide range of tasks, but it remains challenging due to complex nonlinear dynamics, limited observations, or lack of prior knowledge. Recently, data-driven approaches such as Neural Ordinary Differential Equations (NODE) have shown promising results by leveraging the expressive power of neural networks to model unknown dynamics. However, these approaches often suffer from limited labeled training data, leading to poor generalization and suboptimal predictions. On the other hand, semi-supervised algorithms can utilize abundant unlabeled data and have demonstrated good performance in classification and regression tasks. We propose TS-NODE, the first semi-supervised approach to modeling dynamical systems with NODE. TS-NODE explores cheaply generated synthetic pseudo rollouts to broaden exploration in the state space and to tackle the challenges brought by lack of ground-truth system data under a teacher-student model. TS-NODE employs an unified optimization framework that corrects the teacher model based on the student's feedback while mitigating the potential false system dynamics present in pseudo rollouts. TS-NODE demonstrates significant performance improvements over a baseline Neural ODE model on multiple dynamical system modeling tasks.
AVTENet: Audio-Visual Transformer-based Ensemble Network Exploiting Multiple Experts for Video Deepfake Detection
results: On the multi-modal audio-video FakeAVCeleb dataset, the model achieves the best performance, outperforming existing methods.Abstract
Forged content shared widely on social media platforms is a major social problem that requires increased regulation and poses new challenges to the research community. The recent proliferation of hyper-realistic deepfake videos has drawn attention to the threat of audio and visual forgeries. Most previous work on detecting AI-generated fake videos only utilizes visual modality or audio modality. While there are some methods in the literature that exploit audio and visual modalities to detect forged videos, they have not been comprehensively evaluated on multi-modal datasets of deepfake videos involving acoustic and visual manipulations. Moreover, these existing methods are mostly based on CNN and suffer from low detection accuracy. Inspired by the recent success of Transformer in various fields, to address the challenges posed by deepfake technology, in this paper, we propose an Audio-Visual Transformer-based Ensemble Network (AVTENet) framework that considers both acoustic manipulation and visual manipulation to achieve effective video forgery detection. Specifically, the proposed model integrates several purely transformer-based variants that capture video, audio, and audio-visual salient cues to reach a consensus in prediction. For evaluation, we use the recently released benchmark multi-modal audio-video FakeAVCeleb dataset. For a detailed analysis, we evaluate AVTENet, its variants, and several existing methods on multiple test sets of the FakeAVCeleb dataset. Experimental results show that our best model outperforms all existing methods and achieves state-of-the-art performance on Testset-I and Testset-II of the FakeAVCeleb dataset.
Particle Guidance: non-I.I.D. Diverse Sampling with Diffusion Models
paper_authors: Gabriele Corso, Yilun Xu, Valentin de Bortoli, Regina Barzilay, Tommi Jaakkola
for: The paper is written for researchers and practitioners interested in improving the diversity and sample efficiency of generative models, particularly in the context of image and molecular conformer generation.
methods: The paper proposes a new method called particle guidance, which extends diffusion-based generative sampling by enforcing diversity through a joint-particle time-evolving potential. The method is based on the idea of moving beyond the common assumption of independent samples.
results: The paper reports empirical results on conditional image generation and molecular conformer generation, showing that particle guidance can increase diversity without affecting quality in the former, and reduce the state-of-the-art median error by 13% on average in the latter.Abstract
In light of the widespread success of generative models, a significant amount of research has gone into speeding up their sampling time. However, generative models are often sampled multiple times to obtain a diverse set incurring a cost that is orthogonal to sampling time. We tackle the question of how to improve diversity and sample efficiency by moving beyond the common assumption of independent samples. We propose particle guidance, an extension of diffusion-based generative sampling where a joint-particle time-evolving potential enforces diversity. We analyze theoretically the joint distribution that particle guidance generates, its implications on the choice of potential, and the connections with methods in other disciplines. Empirically, we test the framework both in the setting of conditional image generation, where we are able to increase diversity without affecting quality, and molecular conformer generation, where we reduce the state-of-the-art median error by 13% on average.
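To make the mechanism above concrete, the toy sketch below augments each particle's score with the gradient of a joint repulsive potential over the whole batch during a simple deterministic update. The RBF-kernel potential, the guidance weight, and the plain gradient update are illustrative simplifications, not the paper's exact time-evolving potential or sampler.

```python
import numpy as np

def rbf_repulsion_grad(x, bandwidth=1.0):
    """Gradient w.r.t. each particle of a pairwise RBF repulsion over the batch."""
    diffs = x[:, None, :] - x[None, :, :]          # (n, n, d), diffs[i, j] = x_i - x_j
    sq = (diffs ** 2).sum(-1)                      # squared pairwise distances
    k = np.exp(-sq / (2 * bandwidth ** 2))         # kernel matrix
    return (k[:, :, None] * diffs).sum(axis=1) / bandwidth ** 2

def guided_step(x, score_fn, step_size=0.05, guidance_weight=0.5):
    """One deterministic update combining the model score with the joint potential."""
    return x + step_size * (score_fn(x) + guidance_weight * rbf_repulsion_grad(x))

# Toy example: particles pulled towards the origin but kept apart by the repulsion.
score = lambda x: -x                               # score of a standard Gaussian
particles = np.random.randn(8, 2)
for _ in range(100):
    particles = guided_step(particles, score)
print(particles.round(2))
```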
No offence, Bert – I insult only humans! Multiple addressees sentence-level attack on toxicity detection neural network
results: The attack is demonstrated on seven languages from three different language families, and a defence mechanism against it is described together with its limitations.Abstract
We introduce a simple yet efficient sentence-level attack on black-box toxicity detector models. By adding several positive words or sentences to the end of a hateful message, we are able to change the prediction of a neural network and pass the toxicity detection system check. This approach is shown to be working on seven languages from three different language families. We also describe the defence mechanism against the aforementioned attack and discuss its limitations.
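A minimal sketch of the described attack is shown below: benign positive sentences are appended to a toxic message until a black-box detector's score drops. The `toxicity_score` callable is a hypothetical stand-in for the target model's API, and the suffix list is illustrative.

```python
# Greedy sentence-level attack: append positive sentences until the detector's
# predicted toxicity falls below the decision threshold.
POSITIVE_SUFFIXES = [
    "I love spending time with my family.",
    "The weather is wonderful today.",
    "Thank you so much for your help!",
]

def attack(message, toxicity_score, threshold=0.5, max_suffixes=3):
    candidate = message
    for suffix in POSITIVE_SUFFIXES[:max_suffixes]:
        if toxicity_score(candidate) < threshold:
            break
        candidate = candidate + " " + suffix
    return candidate
```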
Unsupervised Representation Learning to Aid Semi-Supervised Meta Learning
results: Using model-agnostic meta-learning (MAML) and relation networks (RN) on the Omniglot and mini-Imagenet datasets, the method yields models with improved accuracy. Moreover, with the proposed initialization, a meta-learning model can achieve satisfactory accuracy with significantly fewer training samples.Abstract
Few-shot learning or meta-learning leverages the data scarcity problem in machine learning. Traditionally, training data requires a multitude of samples and labeling for supervised learning. To address this issue, we propose a one-shot unsupervised meta-learning to learn the latent representation of the training samples. We use augmented samples as the query set during the training phase of the unsupervised meta-learning. A temperature-scaled cross-entropy loss is used in the inner loop of meta-learning to prevent overfitting during unsupervised learning. The learned parameters from this step are applied to the targeted supervised meta-learning in a transfer-learning fashion for initialization and fast adaptation with improved accuracy. The proposed method is model agnostic and can aid any meta-learning model to improve accuracy. We use model agnostic meta-learning (MAML) and relation network (RN) on Omniglot and mini-Imagenet datasets to demonstrate the performance of the proposed method. Furthermore, a meta-learning model with the proposed initialization can achieve satisfactory accuracy with significantly fewer training samples.
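The temperature-scaled cross-entropy used in the inner loop can be sketched in a few lines; the temperature value here is an illustrative assumption.

```python
import torch
import torch.nn.functional as F

def temperature_scaled_ce(logits, targets, temperature=4.0):
    # Dividing logits by T > 1 softens predictions, limiting over-fitting
    # during the unsupervised episodes.
    return F.cross_entropy(logits / temperature, targets)

# Toy example: 5-way logits for a batch of 3 query samples.
logits = torch.randn(3, 5)
targets = torch.tensor([0, 2, 4])
print(temperature_scaled_ce(logits, targets).item())
```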
From Multilingual Complexity to Emotional Clarity: Leveraging Commonsense to Unveil Emotions in Code-Mixed Dialogues
results: Experiments show that systematically integrating commonsense information yields a substantial improvement in ERC performance. Both quantitative assessments and qualitative analyses corroborate our hypothesis, confirming the importance of incorporating commonsense in ERC.Abstract
Understanding emotions during conversation is a fundamental aspect of human communication, driving NLP research for Emotion Recognition in Conversation (ERC). While considerable research has focused on discerning emotions of individual speakers in monolingual dialogues, understanding the emotional dynamics in code-mixed conversations has received relatively less attention. This motivates our undertaking of ERC for code-mixed conversations in this study. Recognizing that emotional intelligence encompasses a comprehension of worldly knowledge, we propose an innovative approach that integrates commonsense information with dialogue context to facilitate a deeper understanding of emotions. To achieve this, we devise an efficient pipeline that extracts relevant commonsense from existing knowledge graphs based on the code-mixed input. Subsequently, we develop an advanced fusion technique that seamlessly combines the acquired commonsense information with the dialogue representation obtained from a dedicated dialogue understanding module. Our comprehensive experimentation showcases the substantial performance improvement obtained through the systematic incorporation of commonsense in ERC. Both quantitative assessments and qualitative analyses further corroborate the validity of our hypothesis, reaffirming the pivotal role of commonsense integration in enhancing ERC.
Creative Robot Tool Use with Large Language Models
paper_authors: Mengdi Xu, Peide Huang, Wenhao Yu, Shiqi Liu, Xilun Zhang, Yaru Niu, Tingnan Zhang, Fei Xia, Jie Tan, Ding Zhao
for: The paper studies how to enable robots to use tools creatively to accomplish complex tasks that involve implicit physical constraints and long-term planning.
methods: The paper uses Large Language Models (LLMs) to develop a system that controls robots in both simulated and real-world environments from natural language instructions. The system comprises four key components: an Analyzer, a Planner, a Calculator, and a Coder.
results: The results show that RoboTool can not only comprehend explicit and implicit physical constraints and environmental factors but also demonstrate creative tool use. Unlike traditional Task and Motion Planning (TAMP) methods, the LLM-based system offers a more flexible, efficient, and user-friendly solution that extends the capabilities of robotic systems. Extensive experiments confirm that RoboTool can handle tasks that would be infeasible without creative tool use, expanding what robotic systems can do.Abstract
Tool use is a hallmark of advanced intelligence, exemplified in both animal behavior and robotic capabilities. This paper investigates the feasibility of imbuing robots with the ability to creatively use tools in tasks that involve implicit physical constraints and long-term planning. Leveraging Large Language Models (LLMs), we develop RoboTool, a system that accepts natural language instructions and outputs executable code for controlling robots in both simulated and real-world environments. RoboTool incorporates four pivotal components: (i) an "Analyzer" that interprets natural language to discern key task-related concepts, (ii) a "Planner" that generates comprehensive strategies based on the language input and key concepts, (iii) a "Calculator" that computes parameters for each skill, and (iv) a "Coder" that translates these plans into executable Python code. Our results show that RoboTool can not only comprehend explicit or implicit physical constraints and environmental factors but also demonstrate creative tool use. Unlike traditional Task and Motion Planning (TAMP) methods that rely on explicit optimization, our LLM-based system offers a more flexible, efficient, and user-friendly solution for complex robotics tasks. Through extensive experiments, we validate that RoboTool is proficient in handling tasks that would otherwise be infeasible without the creative use of tools, thereby expanding the capabilities of robotic systems. Demos are available on our project page: https://creative-robotool.github.io/.
results: The study finds that during training, the local complexity (LC) around data points goes through several phases: a decreasing trend after initialization, followed by an ascent, and a final descending trend. Moreover, during the final LC descent phase, linear regions migrate away from the training and test samples towards the decision boundary, making the deep network's input-output mapping nearly linear everywhere else. The different LC phases are also closely related to the memorization and generalization performance of the network, especially during grokking.Abstract
The study of Deep Network (DN) training dynamics has largely focused on the evolution of the loss function, evaluated on or around train and test set data points. In fact, many DN phenomenon were first introduced in literature with that respect, e.g., double descent, grokking. In this study, we look at the training dynamics of the input space partition or linear regions formed by continuous piecewise affine DNs, e.g., networks with (leaky)ReLU nonlinearities. First, we present a novel statistic that encompasses the local complexity (LC) of the DN based on the concentration of linear regions inside arbitrary dimensional neighborhoods around data points. We observe that during training, the LC around data points undergoes a number of phases, starting with a decreasing trend after initialization, followed by an ascent and ending with a final descending trend. Using exact visualization methods, we come across the perplexing observation that during the final LC descent phase of training, linear regions migrate away from training and test samples towards the decision boundary, making the DN input-output nearly linear everywhere else. We also observe that the different LC phases are closely related to the memorization and generalization performance of the DN, especially during grokking.
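One simple way to probe the local complexity statistic described above is to count distinct ReLU activation patterns (each pattern corresponds to one linear region) among points sampled in a small neighborhood of a data point. The tiny random MLP, sampling radius, and sample count below are illustrative assumptions rather than the paper's exact estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 2)), rng.normal(size=16)
W2, b2 = rng.normal(size=(16, 16)), rng.normal(size=16)

def activation_pattern(x):
    # The on/off pattern of every ReLU unit identifies the linear region x lies in.
    h1 = W1 @ x + b1
    h2 = W2 @ np.maximum(h1, 0) + b2
    return tuple((h1 > 0).astype(int)) + tuple((h2 > 0).astype(int))

def local_complexity(x0, radius=0.1, n_samples=2000):
    samples = x0 + radius * rng.normal(size=(n_samples, 2))
    return len({activation_pattern(s) for s in samples})

print(local_complexity(np.zeros(2)))  # number of linear regions hit near the origin
```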
Variational Inference for SDEs Driven by Fractional Noise
methods: The paper takes a variational approach built on a Markov approximation of fBM, derives the corresponding evidence lower bound, and uses neural networks to learn the drift, diffusion, and control terms of the variational posterior.
results: The paper proposes a variational framework that enables fast and effective inference for SDEs driven by fBM and validates the approach on synthetic data. In addition, it contributes a variational neural-SDE architecture for latent video prediction, which, to the best of the authors' knowledge, is the first application of variational neural-SDEs to video perception.Abstract
We present a novel variational framework for performing inference in (neural) stochastic differential equations (SDEs) driven by Markov-approximate fractional Brownian motion (fBM). SDEs offer a versatile tool for modeling real-world continuous-time dynamic systems with inherent noise and randomness. Combining SDEs with the powerful inference capabilities of variational methods, enables the learning of representative function distributions through stochastic gradient descent. However, conventional SDEs typically assume the underlying noise to follow a Brownian motion (BM), which hinders their ability to capture long-term dependencies. In contrast, fractional Brownian motion (fBM) extends BM to encompass non-Markovian dynamics, but existing methods for inferring fBM parameters are either computationally demanding or statistically inefficient. In this paper, building upon the Markov approximation of fBM, we derive the evidence lower bound essential for efficient variational inference of posterior path measures, drawing from the well-established field of stochastic analysis. Additionally, we provide a closed-form expression to determine optimal approximation coefficients. Furthermore, we propose the use of neural networks to learn the drift, diffusion and control terms within our variational posterior, leading to the variational training of neural-SDEs. In this framework, we also optimize the Hurst index, governing the nature of our fractional noise. Beyond validation on synthetic data, we contribute a novel architecture for variational latent video prediction,-an approach that, to the best of our knowledge, enables the first variational neural-SDE application to video perception.
Robust multimodal models have outlier features and encode more concepts
results: The paper finds two signatures of robustness in the representation spaces of these models: (1) robust models exhibit outlier features whose activations can be several orders of magnitude above average, and these outlier features account for most of the model's predictive power; (2) robust models encode substantially more concepts in their representation space, which helps them generalize but also makes their interpretation more challenging.Abstract
What distinguishes robust models from non-robust ones? This question has gained traction with the appearance of large-scale multimodal models, such as CLIP. These models have demonstrated unprecedented robustness with respect to natural distribution shifts. While it has been shown that such differences in robustness can be traced back to differences in training data, so far it is not known what that translates to in terms of what the model has learned. In this work, we bridge this gap by probing the representation spaces of 12 robust multimodal models with various backbones (ResNets and ViTs) and pretraining sets (OpenAI, LAION-400M, LAION-2B, YFCC15M, CC12M and DataComp). We find two signatures of robustness in the representation spaces of these models: (1) Robust models exhibit outlier features characterized by their activations, with some being several orders of magnitude above average. These outlier features induce privileged directions in the model's representation space. We demonstrate that these privileged directions explain most of the predictive power of the model by pruning up to $80 \%$ of the least important representation space directions without negative impacts on model accuracy and robustness; (2) Robust models encode substantially more concepts in their representation space. While this superposition of concepts allows robust models to store much information, it also results in highly polysemantic features, which makes their interpretation challenging. We discuss how these insights pave the way for future research in various fields, such as model pruning and mechanistic interpretability.
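The two signatures above suggest two simple probes, sketched below: flagging features whose average activation magnitude is far above the typical feature, and pruning the least important representation directions. The threshold and the variance-based importance score are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def find_outlier_features(acts, factor=10.0):
    """acts: (n_samples, n_features). Return indices of unusually large features."""
    scale = np.abs(acts).mean(axis=0)
    return np.where(scale > factor * np.median(scale))[0]

def prune_directions(acts, keep_fraction=0.2):
    """Keep only the top directions by activation variance, zeroing the rest."""
    importance = acts.var(axis=0)
    k = max(1, int(keep_fraction * acts.shape[1]))
    keep = np.argsort(importance)[-k:]
    pruned = np.zeros_like(acts)
    pruned[:, keep] = acts[:, keep]
    return pruned, keep

acts = np.random.randn(1000, 512)
acts[:, 7] *= 50.0                        # synthetic outlier feature for the demo
print(find_outlier_features(acts))        # -> [7]
pruned, kept = prune_directions(acts)
print(pruned.shape, len(kept))            # (1000, 512) 102
```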
Frozen Transformers in Language Models Are Effective Visual Encoder Layers
results: The strategy improves performance across a diverse range of tasks, including 2D and 3D image recognition, action recognition, non-semantic tasks such as motion forecasting, and multi-modal tasks such as 2D/3D visual question answering and image-text retrieval. The improvements are consistent across different types of LLMs and different LLM transformer blocks. The paper also proposes an information filtering hypothesis to explain the effect: pre-trained LLM blocks discern informative visual tokens and amplify their effect.Abstract
This paper reveals that large language models (LLMs), despite being trained solely on textual data, are surprisingly strong encoders for purely visual tasks in the absence of language. Even more intriguingly, this can be achieved by a simple yet previously overlooked strategy -- employing a frozen transformer block from pre-trained LLMs as a constituent encoder layer to directly process visual tokens. Our work pushes the boundaries of leveraging LLMs for computer vision tasks, significantly departing from conventional practices that typically necessitate a multi-modal vision-language setup with associated language prompts, inputs, or outputs. We demonstrate that our approach consistently enhances performance across a diverse range of tasks, encompassing pure 2D and 3D visual recognition tasks (e.g., image and point cloud classification), temporal modeling tasks (e.g., action recognition), non-semantic tasks (e.g., motion forecasting), and multi-modal tasks (e.g., 2D/3D visual question answering and image-text retrieval). Such improvements are a general phenomenon, applicable to various types of LLMs (e.g., LLaMA and OPT) and different LLM transformer blocks. We additionally propose the information filtering hypothesis to explain the effectiveness of pre-trained LLMs in visual encoding -- the pre-trained LLM transformer blocks discern informative visual tokens and further amplify their effect. This hypothesis is empirically supported by the observation that the feature activation, after training with LLM transformer blocks, exhibits a stronger focus on relevant regions. We hope that our work inspires new perspectives on utilizing LLMs and deepening our understanding of their underlying mechanisms. Code is available at https://github.com/ziqipang/LM4VisualEncoding.
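The recipe described above can be sketched as a small module: a trainable linear adapter projects visual tokens into the block's width, a frozen transformer block processes them, and a trainable head classifies the pooled output. A generic `nn.TransformerEncoderLayer` stands in for a pre-trained LLaMA/OPT block here, so this is an illustrative assumption rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class FrozenBlockClassifier(nn.Module):
    def __init__(self, num_patches=49, in_dim=192, llm_dim=512, num_classes=10):
        super().__init__()
        self.proj_in = nn.Linear(in_dim, llm_dim)          # trainable adapter
        self.frozen_block = nn.TransformerEncoderLayer(
            d_model=llm_dim, nhead=8, batch_first=True)    # stand-in for an LLM block
        for p in self.frozen_block.parameters():
            p.requires_grad = False                        # keep the block frozen
        self.head = nn.Linear(llm_dim, num_classes)        # trainable head

    def forward(self, visual_tokens):                      # (B, num_patches, in_dim)
        x = self.proj_in(visual_tokens)
        x = self.frozen_block(x)
        return self.head(x.mean(dim=1))                    # pool tokens, classify

model = FrozenBlockClassifier()
logits = model(torch.randn(4, 49, 192))
print(logits.shape)                                        # torch.Size([4, 10])
```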
CLAIR: Evaluating Image Captions with Large Language Models
paper_authors: David Chan, Suzanne Petryk, Joseph E. Gonzalez, Trevor Darrell, John Canny
for: Evaluating machine-generated image captions; the paper proposes a new evaluation method, CLAIR.
methods: CLAIR leverages the zero-shot language modeling capabilities of large language models (LLMs) to evaluate candidate captions.
results: CLAIR correlates more strongly with human judgments of caption quality than existing measures, with a relative improvement of 39.6% over SPICE, and provides noisily interpretable results by letting the language model identify the reasoning behind its assigned score.Abstract
The evaluation of machine-generated image captions poses an interesting yet persistent challenge. Effective evaluation measures must consider numerous dimensions of similarity, including semantic relevance, visual structure, object interactions, caption diversity, and specificity. Existing highly-engineered measures attempt to capture specific aspects, but fall short in providing a holistic score that aligns closely with human judgments. Here, we propose CLAIR, a novel method that leverages the zero-shot language modeling capabilities of large language models (LLMs) to evaluate candidate captions. In our evaluations, CLAIR demonstrates a stronger correlation with human judgments of caption quality compared to existing measures. Notably, on Flickr8K-Expert, CLAIR achieves relative correlation improvements over SPICE of 39.6% and over image-augmented methods such as RefCLIP-S of 18.3%. Moreover, CLAIR provides noisily interpretable results by allowing the language model to identify the underlying reasoning behind its assigned score. Code is available at https://davidmchan.github.io/clair/
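A CLAIR-style call can be sketched as a single prompt that asks an LLM for a 0-100 score and a short rationale, as below. The prompt wording and the `query_llm` callable are hypothetical stand-ins, not the paper's exact prompt or a real API client.

```python
import json

def clair_prompt(candidate, references):
    refs = "\n".join(f"- {r}" for r in references)
    return (
        "You are evaluating an image caption.\n"
        f"Candidate caption: {candidate}\n"
        f"Reference captions:\n{refs}\n"
        "On a scale of 0 to 100, how likely is it that the candidate describes "
        "the same image as the references? Answer with a JSON object "
        '{"score": <int>, "reason": <string>}.'
    )

def clair_score(candidate, references, query_llm):
    # query_llm is a placeholder for whatever LLM API is available.
    return json.loads(query_llm(clair_prompt(candidate, references)))["score"]
```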
Does Your Model Think Like an Engineer? Explainable AI for Bearing Fault Detection with Deep Learning
paper_authors: Thomas Decker, Michael Lebacher, Volker Tresp
for: The paper aims to explain the decision process of deep learning models in the task of detecting faults in rolling element bearings from vibration signals.
methods: The paper uses a domain-specific feature attribution framework to evaluate whether a deep learning model's underlying logic corresponds with expert reasoning.
results: The results show that this feature attribution framework can be used to validate the trustworthiness of deep learning models and to anticipate how well different well-performing models will generalize on this task.Abstract
Deep Learning has already been successfully applied to analyze industrial sensor data in a variety of relevant use cases. However, the opaque nature of many well-performing methods poses a major obstacle for real-world deployment. Explainable AI (XAI) and especially feature attribution techniques promise to enable insights about how such models form their decision. But the plain application of such methods often fails to provide truly informative and problem-specific insights to domain experts. In this work, we focus on the specific task of detecting faults in rolling element bearings from vibration signals. We propose a novel and domain-specific feature attribution framework that allows us to evaluate how well the underlying logic of a model corresponds with expert reasoning. Utilizing the framework we are able to validate the trustworthiness and to successfully anticipate the generalization ability of different well-performing deep learning models. Our methodology demonstrates how signal processing tools can effectively be used to enhance Explainable AI techniques and acts as a template for similar problems.
results: Experimental results show that AutoMix surpasses established baselines on five context-grounded reasoning datasets, improving the incremental benefit per unit cost by up to 89%.Abstract
Large language models (LLMs) are now available in various sizes and configurations from cloud API providers. While this diversity offers a broad spectrum of choices, effectively leveraging the options to optimize computational cost and performance remains challenging. In this work, we present AutoMix, an approach that strategically routes queries to larger LMs, based on the approximate correctness of outputs from a smaller LM. Central to AutoMix is a few-shot self-verification mechanism, which estimates the reliability of its own outputs without requiring training. Given that verifications can be noisy, we employ a meta verifier in AutoMix to refine the accuracy of these assessments. Our experiments using LLAMA2-13/70B, on five context-grounded reasoning datasets demonstrate that AutoMix surpasses established baselines, improving the incremental benefit per cost by up to 89%. Our code and data are available at https://github.com/automix-llm/automix.
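The routing logic can be sketched as below: the small model drafts an answer, a few-shot self-verification prompt is sampled several times to estimate confidence, and only low-confidence queries are escalated. `small_lm`, `large_lm`, and `self_verify` are hypothetical stand-ins for the underlying model calls, and the threshold and vote count are illustrative.

```python
def automix_answer(context, question, small_lm, large_lm, self_verify,
                   n_votes=5, threshold=0.6):
    draft = small_lm(context, question)
    # Noisy few-shot verification: ask repeatedly whether the draft answer is
    # supported by the context, then average the binary votes.
    votes = [self_verify(context, question, draft) for _ in range(n_votes)]
    confidence = sum(votes) / n_votes       # a crude meta-verifier
    if confidence >= threshold:
        return draft                        # keep the cheap answer
    return large_lm(context, question)      # escalate to the larger model
```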
An Emulator for Fine-Tuning Large Language Models using Small Language Models
paper_authors: Eric Mitchell, Rafael Rafailov, Archit Sharma, Chelsea Finn, Christopher D. Manning
for: The paper aims to decouple the knowledge and skills gained during pre-training and fine-tuning in language models, and to investigate the effect of scaling up these stages on the performance of the models.
methods: The authors propose a novel technique called emulated fine-tuning (EFT) to sample from a distribution that approximates the result of pre-training and fine-tuning at different scales. EFT is based on an RL-based framework derived from recent developments in learning from human preferences.
results: The authors show that scaling up fine-tuning tends to improve helpfulness, while scaling up pre-training tends to improve factuality. They also demonstrate that EFT enables test-time adjustment of competing behavioral traits like helpfulness and harmlessness without additional training. Finally, the authors show that LM up-scaling, which involves ensembling small fine-tuned models with large pre-trained models, consistently improves the helpfulness and factuality of instruction-following models in several families of models.Abstract
Widely used language models (LMs) are typically built by scaling up a two-stage training pipeline: a pre-training stage that uses a very large, diverse dataset of text and a fine-tuning (sometimes, 'alignment') stage that uses targeted examples or other specifications of desired behaviors. While it has been hypothesized that knowledge and skills come from pre-training, and fine-tuning mostly filters this knowledge and skillset, this intuition has not been extensively tested. To aid in doing so, we introduce a novel technique for decoupling the knowledge and skills gained in these two stages, enabling a direct answer to the question, "What would happen if we combined the knowledge learned by a large model during pre-training with the knowledge learned by a small model during fine-tuning (or vice versa)?" Using an RL-based framework derived from recent developments in learning from human preferences, we introduce emulated fine-tuning (EFT), a principled and practical method for sampling from a distribution that approximates (or 'emulates') the result of pre-training and fine-tuning at different scales. Our experiments with EFT show that scaling up fine-tuning tends to improve helpfulness, while scaling up pre-training tends to improve factuality. Beyond decoupling scale, we show that EFT enables test-time adjustment of competing behavioral traits like helpfulness and harmlessness without additional training. Finally, a special case of emulated fine-tuning, which we call LM up-scaling, avoids resource-intensive fine-tuning of large pre-trained models by ensembling them with small fine-tuned models, essentially emulating the result of fine-tuning the large pre-trained model. Up-scaling consistently improves helpfulness and factuality of instruction-following models in the Llama, Llama-2, and Falcon families, without additional hyperparameters or training.
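One way to read the up-scaling idea is as per-token logit arithmetic: the large base model's logits plus the behavioral delta between a small fine-tuned model and its small base counterpart. The sketch below is an illustrative reading under that assumption, not a full implementation of emulated fine-tuning.

```python
import numpy as np

def upscaled_logits(logits_large_base, logits_small_ft, logits_small_base):
    # "knowledge" from the large base model + "behavior" delta from fine-tuning
    return logits_large_base + (logits_small_ft - logits_small_base)

def sample_token(logits, temperature=1.0):
    z = logits / temperature
    p = np.exp(z - z.max())
    p /= p.sum()
    return np.random.choice(len(p), p=p)

vocab = 8  # toy vocabulary; real models would supply these logits per step
tok = sample_token(upscaled_logits(np.random.randn(vocab),
                                   np.random.randn(vocab),
                                   np.random.randn(vocab)))
print(tok)
```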
Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems
results: When transformers learn the intermediate task, they do so rapidly and unexpectedly after both training and validation loss have saturated, a phenomenon we call a "Eureka-moment". The proposed fixes make these improvements appear earlier in training, make the model more likely to learn the intermediate task, and lead to higher final accuracy and greater robustness to hyper-parameters.Abstract
In this work, we study rapid, step-wise improvements of the loss in transformers when being confronted with multi-step decision tasks. We found that transformers struggle to learn the intermediate tasks, whereas CNNs have no such issue on the tasks we studied. When transformers learn the intermediate task, they do this rapidly and unexpectedly after both training and validation loss saturated for hundreds of epochs. We call these rapid improvements Eureka-moments, since the transformer appears to suddenly learn a previously incomprehensible task. Similar leaps in performance have become known as Grokking. In contrast to Grokking, for Eureka-moments, both the validation and the training loss saturate before rapidly improving. We trace the problem back to the Softmax function in the self-attention block of transformers and show ways to alleviate the problem. These fixes improve training speed. The improved models reach 95% of the baseline model in just 20% of training steps while having a much higher likelihood to learn the intermediate task, lead to higher final accuracy and are more robust to hyper-parameters.
Towards Robust Offline Reinforcement Learning under Diverse Data Corruption
paper_authors: Rui Yang, Han Zhong, Jiawei Xu, Amy Zhang, Chongjie Zhang, Lei Han, Tong Zhang
for: The paper studies the feasibility of learning reinforced policies from offline datasets without costly or unsafe interactions with the environment.
methods: The paper evaluates existing offline RL algorithms under comprehensive data corruption, covering states, actions, rewards, and dynamics.
results: The study finds that implicit Q-learning (IQL) is remarkably resilient to data corruption, and both empirical and theoretical analyses identify its supervised policy learning scheme as the key factor. However, IQL still suffers from heavy-tailed Q-function targets under dynamics corruption; to address this, the paper employs the Huber loss and quantile estimators to handle the heavy-tailedness. Incorporating these simple yet effective modifications into IQL yields a more robust offline RL method, Robust IQL (RIQL), which extensive experiments show to be highly robust under diverse data corruption scenarios.Abstract
Offline reinforcement learning (RL) presents a promising approach for learning reinforced policies from offline datasets without the need for costly or unsafe interactions with the environment. However, datasets collected by humans in real-world environments are often noisy and may even be maliciously corrupted, which can significantly degrade the performance of offline RL. In this work, we first investigate the performance of current offline RL algorithms under comprehensive data corruption, including states, actions, rewards, and dynamics. Our extensive experiments reveal that implicit Q-learning (IQL) demonstrates remarkable resilience to data corruption among various offline RL algorithms. Furthermore, we conduct both empirical and theoretical analyses to understand IQL's robust performance, identifying its supervised policy learning scheme as the key factor. Despite its relative robustness, IQL still suffers from heavy-tail targets of Q functions under dynamics corruption. To tackle this challenge, we draw inspiration from robust statistics to employ the Huber loss to handle the heavy-tailedness and utilize quantile estimators to balance penalization for corrupted data and learning stability. By incorporating these simple yet effective modifications into IQL, we propose a more robust offline RL approach named Robust IQL (RIQL). Extensive experiments demonstrate that RIQL exhibits highly robust performance when subjected to diverse data corruption scenarios.
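The Huber loss mentioned above is quadratic for small residuals and linear for large ones, so a few extreme (possibly corrupted) Q-value targets cannot dominate the gradient. A minimal sketch, with an illustrative delta:

```python
import numpy as np

def huber_loss(residual, delta=1.0):
    a = np.abs(residual)
    return np.where(a <= delta, 0.5 * a ** 2, delta * (a - 0.5 * delta))

residuals = np.array([0.3, 1.0, 10.0])   # e.g., td_target - q_prediction
print(huber_loss(residuals))             # [0.045 0.5   9.5  ]
```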
Structured Generation and Exploration of Design Space with Large Language Models for Human-AI Co-Creation
results: The study finds that the framework helps users discover more creative inspiration and quickly produce a wider range of creative work.Abstract
Thanks to their generative capabilities, large language models (LLMs) have become an invaluable tool for creative processes. These models have the capacity to produce hundreds and thousands of visual and textual outputs, offering abundant inspiration for creative endeavors. But are we harnessing their full potential? We argue that current interaction paradigms fall short, guiding users towards rapid convergence on a limited set of ideas, rather than empowering them to explore the vast latent design space in generative models. To address this limitation, we propose a framework that facilitates the structured generation of design space in which users can seamlessly explore, evaluate, and synthesize a multitude of responses. We demonstrate the feasibility and usefulness of this framework through the design and development of an interactive system, Luminate, and a user study with 8 professional writers. Our work advances how we interact with LLMs for creative tasks, introducing a way to harness the creative potential of LLMs.
paper_authors: Rishi Bommasani, Kevin Klyman, Shayne Longpre, Sayash Kapoor, Nestor Maslej, Betty Xiong, Daniel Zhang, Percy Liang
for: The paper aims to assess the transparency of foundation model developers and encourage better governance of the foundation model ecosystem.
methods: The authors use a comprehensive set of 100 indicators to evaluate the transparency of 10 major foundation model developers, including information about the upstream resources used to build the models, details about the models themselves, and downstream use.
results: The authors find that there is a lack of transparency in the foundation model ecosystem, with no developer disclosing significant information about the downstream impact of their flagship models, such as the number of users, affected market sectors, or how users can seek redress for harm. The findings provide a baseline for evaluating progress on transparency and governance in the future.Abstract
Foundation models have rapidly permeated society, catalyzing a wave of generative AI applications spanning enterprise and consumer-facing contexts. While the societal impact of foundation models is growing, transparency is on the decline, mirroring the opacity that has plagued past digital technologies (e.g. social media). Reversing this trend is essential: transparency is a vital precondition for public accountability, scientific innovation, and effective governance. To assess the transparency of the foundation model ecosystem and help improve transparency over time, we introduce the Foundation Model Transparency Index. The Foundation Model Transparency Index specifies 100 fine-grained indicators that comprehensively codify transparency for foundation models, spanning the upstream resources used to build a foundation model (e.g data, labor, compute), details about the model itself (e.g. size, capabilities, risks), and the downstream use (e.g. distribution channels, usage policies, affected geographies). We score 10 major foundation model developers (e.g. OpenAI, Google, Meta) against the 100 indicators to assess their transparency. To facilitate and standardize assessment, we score developers in relation to their practices for their flagship foundation model (e.g. GPT-4 for OpenAI, PaLM 2 for Google, Llama 2 for Meta). We present 10 top-level findings about the foundation model ecosystem: for example, no developer currently discloses significant information about the downstream impact of its flagship model, such as the number of users, affected market sectors, or how users can seek redress for harm. Overall, the Foundation Model Transparency Index establishes the level of transparency today to drive progress on foundation model governance via industry standards and regulatory intervention.
Eureka: Human-Level Reward Design via Coding Large Language Models
paper_authors: Yecheng Jason Ma, William Liang, Guanzhi Wang, De-An Huang, Osbert Bastani, Dinesh Jayaraman, Yuke Zhu, Linxi Fan, Anima Anandkumar
for: The paper aims to develop a human-level reward design algorithm powered by large language models (LLMs) to acquire complex skills via reinforcement learning.
methods: The proposed algorithm, called Eureka, leverages the zero-shot generation, code-writing, and in-context improvement capabilities of state-of-the-art LLMs to perform evolutionary optimization over reward code.
results: Eureka outperforms human experts on 83% of the tasks in a diverse suite of 29 open-source RL environments, leading to an average normalized improvement of 52%. Additionally, the algorithm is able to learn complex skills such as pen spinning tricks using a simulated Shadow Hand.Abstract
Large Language Models (LLMs) have excelled as high-level semantic planners for sequential decision-making tasks. However, harnessing them to learn complex low-level manipulation tasks, such as dexterous pen spinning, remains an open problem. We bridge this fundamental gap and present Eureka, a human-level reward design algorithm powered by LLMs. Eureka exploits the remarkable zero-shot generation, code-writing, and in-context improvement capabilities of state-of-the-art LLMs, such as GPT-4, to perform evolutionary optimization over reward code. The resulting rewards can then be used to acquire complex skills via reinforcement learning. Without any task-specific prompting or pre-defined reward templates, Eureka generates reward functions that outperform expert human-engineered rewards. In a diverse suite of 29 open-source RL environments that include 10 distinct robot morphologies, Eureka outperforms human experts on 83% of the tasks, leading to an average normalized improvement of 52%. The generality of Eureka also enables a new gradient-free in-context learning approach to reinforcement learning from human feedback (RLHF), readily incorporating human inputs to improve the quality and the safety of the generated rewards without model updating. Finally, using Eureka rewards in a curriculum learning setting, we demonstrate for the first time, a simulated Shadow Hand capable of performing pen spinning tricks, adeptly manipulating a pen in circles at rapid speed.
Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots
paper_authors: Xavier Puig, Eric Undersander, Andrew Szot, Mikael Dallaire Cote, Tsung-Yen Yang, Ruslan Partsey, Ruta Desai, Alexander William Clegg, Michal Hlavac, So Yeon Min, Vladimír Vondruš, Theophile Gervet, Vincent-Pierre Berges, John M. Turner, Oleksandr Maksymets, Zsolt Kira, Mrinal Kalakrishnan, Jitendra Malik, Devendra Singh Chaplot, Unnat Jain, Dhruv Batra, Akshara Rai, Roozbeh Mottaghi
results: The experiments demonstrate the effectiveness of learned robot policies when collaborating with humans to complete tasks, and reveal emergent natural behaviors of the robot during collaborative task execution.Abstract
We present Habitat 3.0: a simulation platform for studying collaborative human-robot tasks in home environments. Habitat 3.0 offers contributions across three dimensions: (1) Accurate humanoid simulation: addressing challenges in modeling complex deformable bodies and diversity in appearance and motion, all while ensuring high simulation speed. (2) Human-in-the-loop infrastructure: enabling real human interaction with simulated robots via mouse/keyboard or a VR interface, facilitating evaluation of robot policies with human input. (3) Collaborative tasks: studying two collaborative tasks, Social Navigation and Social Rearrangement. Social Navigation investigates a robot's ability to locate and follow humanoid avatars in unseen environments, whereas Social Rearrangement addresses collaboration between a humanoid and robot while rearranging a scene. These contributions allow us to study end-to-end learned and heuristic baselines for human-robot collaboration in-depth, as well as evaluate them with humans in the loop. Our experiments demonstrate that learned robot policies lead to efficient task completion when collaborating with unseen humanoid agents and human partners that might exhibit behaviors that the robot has not seen before. Additionally, we observe emergent behaviors during collaborative task execution, such as the robot yielding space when obstructing a humanoid agent, thereby allowing the effective completion of the task by the humanoid agent. Furthermore, our experiments using the human-in-the-loop tool demonstrate that our automated evaluation with humanoids can provide an indication of the relative ordering of different policies when evaluated with real human collaborators. Habitat 3.0 unlocks interesting new features in simulators for Embodied AI, and we hope it paves the way for a new frontier of embodied human-AI interaction capabilities.
Digital Twin-Enabled Intelligent DDoS Detection Mechanism for Autonomous Core Networks
results: Results show that the proposed solution can successfully detect DDoS attacks and update the feature selection method and learning model with a true classification rate of ninety-seven percent. The proposed solution can estimate the attack within approximately fifteen minutes after the DDoS attack starts.Abstract
Existing distributed denial of service attack (DDoS) solutions cannot handle highly aggregated data rates; thus, they are unsuitable for Internet service provider (ISP) core networks. This article proposes a digital twin-enabled intelligent DDoS detection mechanism using an online learning method for autonomous systems. Our contributions are three-fold: we first design a DDoS detection architecture based on the digital twin for ISP core networks. We implemented a Yet Another Next Generation (YANG) model and an automated feature selection (AutoFS) module to handle core network data. We used an online learning approach to update the model instantly and efficiently, improve the learning model quickly, and ensure accurate predictions. Finally, we reveal that our proposed solution successfully detects DDoS attacks and updates the feature selection method and learning model with a true classification rate of ninety-seven percent. Our proposed solution can estimate the attack within approximately fifteen minutes after the DDoS attack starts.
Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning
results: Using VLM-RMs, a MuJoCo humanoid is trained to learn complex tasks such as kneeling, doing the splits, and sitting in a lotus position from a single text prompt per task, and performance can be further improved by providing a baseline prompt.Abstract
Reinforcement learning (RL) requires either manually specifying a reward function, which is often infeasible, or learning a reward model from a large amount of human feedback, which is often very expensive. We study a more sample-efficient alternative: using pretrained vision-language models (VLMs) as zero-shot reward models (RMs) to specify tasks via natural language. We propose a natural and general approach to using VLMs as reward models, which we call VLM-RMs. We use VLM-RMs based on CLIP to train a MuJoCo humanoid to learn complex tasks without a manually specified reward function, such as kneeling, doing the splits, and sitting in a lotus position. For each of these tasks, we only provide a single sentence text prompt describing the desired task with minimal prompt engineering. We provide videos of the trained agents at: https://sites.google.com/view/vlm-rm. We can improve performance by providing a second ``baseline'' prompt and projecting out parts of the CLIP embedding space irrelevant to distinguish between goal and baseline. Further, we find a strong scaling effect for VLM-RMs: larger VLMs trained with more compute and data are better reward models. The failure modes of VLM-RMs we encountered are all related to known capability limitations of current VLMs, such as limited spatial reasoning ability or visually unrealistic environments that are far off-distribution for the VLM. We find that VLM-RMs are remarkably robust as long as the VLM is large enough. This suggests that future VLMs will become more and more useful reward models for a wide range of RL applications.
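A rough sketch of a CLIP-style zero-shot reward with the optional baseline prompt is given below: the reward is the similarity between the frame embedding and the goal-prompt embedding, optionally after partially projecting onto the goal-minus-baseline direction. The random embeddings, the projection weight, and the exact projection form are illustrative assumptions; in practice the embeddings would come from a pretrained VLM such as CLIP.

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

def vlm_reward(image_emb, goal_emb, baseline_emb=None, alpha=0.5):
    g = unit(goal_emb)
    x = unit(image_emb)
    if baseline_emb is not None:
        d = unit(g - unit(baseline_emb))           # goal-minus-baseline direction
        # Partially project out components irrelevant to goal vs. baseline.
        x = alpha * np.dot(x, d) * d + (1 - alpha) * x
    return float(np.dot(x, g))                     # cosine-style similarity

image_emb, goal_emb, base_emb = (np.random.randn(512) for _ in range(3))
print(vlm_reward(image_emb, goal_emb, base_emb))
```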
results: Experiments on a variety of discrete data distributions demonstrate the effectiveness of MaMs. MaMs enable efficient generative modeling of high-dimensional discrete data under both maximum likelihood and energy-based training, and achieve orders-of-magnitude faster evaluation of marginal probabilities.Abstract
We introduce marginalization models (MaMs), a new family of generative models for high-dimensional discrete data. They offer scalable and flexible generative modeling with tractable likelihoods by explicitly modeling all induced marginal distributions. Marginalization models enable fast evaluation of arbitrary marginal probabilities with a single forward pass of the neural network, which overcomes a major limitation of methods with exact marginal inference, such as autoregressive models (ARMs). We propose scalable methods for learning the marginals, grounded in the concept of "marginalization self-consistency". Unlike previous methods, MaMs support scalable training of any-order generative models for high-dimensional problems under the setting of energy-based training, where the goal is to match the learned distribution to a given desired probability (specified by an unnormalized (log) probability function such as energy function or reward function). We demonstrate the effectiveness of the proposed model on a variety of discrete data distributions, including binary images, language, physical systems, and molecules, for maximum likelihood and energy-based training settings. MaMs achieve orders of magnitude speedup in evaluating the marginal probabilities on both settings. For energy-based training tasks, MaMs enable any-order generative modeling of high-dimensional problems beyond the capability of previous methods. Code is at https://github.com/PrincetonLIPS/MaM.
摘要
我们介绍 MARGINALIZATION MODELS (MaMs),一新的一代生成模型,适用于高维数据。它具有可扩展和 flexible 的生成模型,并且可以实现可读的可能性。MaMs 可以快速评估任意阶层的边界概率,这解决了前一代方法的一个主要限制,即需要精确地掌握边界概率。我们提出了可扩展的学习方法,基于 "marginalization self-consistency" 的概念。MaMs 支持可扩展的训练高维问题,以energy-based 训练为例,目的是将学习的分布与givens 的 (log) 可能性函数相匹配。我们在不同的数据分布上进行了各种实验,包括 binary 图像、语言、物理系统和分子,并获得了许多次速度的提升。对于 energy-based 训练任务,MaMs 可以实现高维问题的任何阶层生成模型,超出了前一代方法的能力。请参考 https://github.com/PrincetonLIPS/MaM。
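The "marginalization self-consistency" idea can be illustrated with a toy penalty for binary variables: the marginal of a prefix must equal the sum over the two values of the next variable. A sketch under the assumption of a placeholder `log_marginal` network, not the paper's code.

```python
import torch

def self_consistency_loss(log_marginal, x_prefix: torch.Tensor) -> torch.Tensor:
    """Toy marginalization self-consistency penalty for binary variables.

    `log_marginal(x, mask)` is a placeholder for a learned network returning
    log q(x_S) for the variables selected by `mask`. The constraint enforced
    here is  log q(x_1..k) = logsumexp_{v in {0,1}} log q(x_1..k, x_{k+1}=v).
    """
    batch, k = x_prefix.shape
    mask_k = torch.ones(batch, k)

    lhs = log_marginal(x_prefix, mask_k)  # log-marginal of the prefix

    # Extend the prefix with both possible values of the next variable.
    ext0 = torch.cat([x_prefix, torch.zeros(batch, 1)], dim=1)
    ext1 = torch.cat([x_prefix, torch.ones(batch, 1)], dim=1)
    mask_k1 = torch.ones(batch, k + 1)
    rhs = torch.logsumexp(
        torch.stack([log_marginal(ext0, mask_k1), log_marginal(ext1, mask_k1)], dim=0),
        dim=0,
    )
    return (lhs - rhs).pow(2).mean()
```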
Network-Aware AutoML Framework for Software-Defined Sensor Networks
results: 本文的实验结果表明,在遭受拒绝服务攻击时,我们的检测方法能够保证网络中的流量 packet仍然在正确的时间内传输,并且可以避免过拟合。Abstract
As the current detection solutions of distributed denial of service attacks (DDoS) need additional infrastructures to handle high aggregate data rates, they are not suitable for sensor networks or the Internet of Things. Besides, the security architecture of software-defined sensor networks needs to pay attention to the vulnerabilities of both software-defined networks and sensor networks. In this paper, we propose a network-aware automated machine learning (AutoML) framework which detects DDoS attacks in software-defined sensor networks. Our framework selects an ideal machine learning algorithm to detect DDoS attacks in network-constrained environments, using metrics such as variable traffic load, heterogeneous traffic rate, and detection time while preventing over-fitting. Our contributions are two-fold: (i) we first investigate the trade-off between the efficiency of ML algorithms and network/traffic state in the scope of DDoS detection. (ii) we design and implement a software architecture containing open-source network tools, with the deployment of multiple ML algorithms. Lastly, we show that under the denial of service attacks, our framework ensures the traffic packets are still delivered within the network with additional delays.
摘要
Current detection solutions of distributed denial of service attacks (DDoS) require additional infrastructure to handle high aggregate data rates, making them unsuitable for sensor networks or the Internet of Things. Moreover, the security architecture of software-defined sensor networks must address the vulnerabilities of both software-defined networks and sensor networks. In this paper, we propose a network-aware automated machine learning (AutoML) framework that detects DDoS attacks in software-defined sensor networks. Our framework selects the most appropriate machine learning algorithm to detect DDoS attacks in network-constrained environments, taking into account variables such as traffic load, heterogeneous traffic rate, and detection time while preventing overfitting. Our contributions are twofold:1. We investigate the trade-off between the efficiency of ML algorithms and network/traffic state in the context of DDoS detection.2. We design and implement a software architecture that incorporates open-source network tools and deploys multiple ML algorithms.Our framework ensures that traffic packets are still delivered within the network, albeit with additional delays, even under denial of service attacks.
Experimental Narratives: A Comparison of Human Crowdsourced Storytelling and AI Storytelling
paper_authors: Nina Begus for:This paper aims to investigate cultural artifacts and social biases in storytelling by both humans and generative AI.methods:The study employs fictional prompts as a novel tool for analyzing storytelling, combining behavioral and computational experiments.results:The study finds that AI-generated stories are more progressive in terms of gender roles and sexuality than human-authored texts, but offer less imaginative scenarios and rhetoric. The study also reveals the pervasive presence of the Pygmalion myth in both human and AI storytelling.Abstract
The paper proposes a framework that combines behavioral and computational experiments employing fictional prompts as a novel tool for investigating cultural artifacts and social biases in storytelling both by humans and generative AI. The study analyzes 250 stories authored by crowdworkers in June 2019 and 80 stories generated by GPT-3.5 and GPT-4 in March 2023 by merging methods from narratology and inferential statistics. Both crowdworkers and large language models responded to identical prompts about creating and falling in love with an artificial human. The proposed experimental paradigm allows a direct comparison between human and LLM-generated storytelling. Responses to the Pygmalionesque prompts confirm the pervasive presence of the Pygmalion myth in the collective imaginary of both humans and large language models. All solicited narratives present a scientific or technological pursuit. The analysis reveals that narratives from GPT-3.5 and particularly GPT-4 are more more progressive in terms of gender roles and sexuality than those written by humans. While AI narratives can occasionally provide innovative plot twists, they offer less imaginative scenarios and rhetoric than human-authored texts. The proposed framework argues that fiction can be used as a window into human and AI-based collective imaginary and social dimensions.
摘要
文章提出一种框架,结合行为和计算实验,使用虚构提问作为一种新的工具,探讨人类和生成AI对文学作品的叙述方式和社会偏见。研究分析了2019年6月由大量志愿者创作的250个故事,以及2023年3月由GPT-3.5和GPT-4生成的80个故事,通过结合 naratology 和推理统计方法进行分析。人类和大语言模型都 responded to identical prompts about creating and falling in love with an artificial human。提出的实验方法允许直接比较人类和大语言模型生成的叙述方式。对 pygmalionesque 提问的答复表明了人类和大语言模型的 коллектив imagine 中 pygmalion 神话的普遍存在。所有的请求故事都涉及科学或技术的追求。分析发现,GPT-3.5 和尤其是 GPT-4 生成的故事在性别角色和性 orientation 方面比人类生成的故事更加进步。虽然 AI 生成的故事可以提供创新的情节和叙述方式,但它们的想象力和修辞比人类创作的文章更差。提出的框架认为, fiction 可以作为人类和 AI 基础 imagine 和社会方面的窗口。
Personalized human mobility prediction for HuMob challenge
methods: 这个论文采用了个性化模型来预测每个人的行动路径,而不是预测整个人群的行动。它采用了日期和时间、活动时间、天数、时间和 POI 访问频率等特征,并通过聚类技术将其他具有相似行为特征的人员的运动方向纳入考虑。机器学习模型是支持向量回归(SVR)。
results: The paper evaluates accuracy through offline assessment together with feature selection and parameter tuning, and finds that this personalized model achieves reasonably good accuracy at a lower computational cost.Abstract
We explain the methodology used to create the data submitted to HuMob Challenge, a data analysis competition for human mobility prediction. We adopted a personalized model to predict the individual's movement trajectory from their data, instead of predicting from the overall movement, based on the hypothesis that human movement is unique to each person. We devised the features such as the date and time, activity time, days of the week, time of day, and frequency of visits to POI (Point of Interest). As additional features, we incorporated the movement of other individuals with similar behavior patterns through the employment of clustering. The machine learning model we adopted was the Support Vector Regression (SVR). We performed accuracy through offline assessment and carried out feature selection and parameter tuning. Although overall dataset provided consists of 100,000 users trajectory, our method use only 20,000 target users data, and do not need to use other 80,000 data. Despite the personalized model's traditional feature engineering approach, this model yields reasonably good accuracy with lower computational cost.
摘要
我们介绍了在 HuMob Challenge 数据分析比赛中使用的数据创建方法ологи。我们采用了个性化模型来预测个人的运动轨迹,而不是基于整体运动的预测,根据假设人类运动各自独特。我们设计了特征,包括日期和时间、活动时间、天数、时间和访问点索引(POI)的频率。在其他特征上,我们通过聚类来 incorporate 其他有相似行为模式的个体的运动。我们采用的机器学习模型是支持向量回归(SVR)。我们通过线上评估来评估准确性,并进行特征选择和参数调整。虽然总体数据集包含100,000个用户的轨迹,但我们的方法只使用了20,000个目标用户的数据,并不需要使用其他80,000个数据。尽管个性化模型采用传统的特征工程方法,但这种模型仍然可以得到相对较好的准确性,同时计算成本较低。
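A rough sketch of the kind of per-user feature engineering and SVR fitting described above; the trajectory layout, feature encodings, and hyperparameters here are assumptions, not those of the actual submission.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

def build_features(traj):
    """Turn one user's trajectory (rows of day, hour, x, y, poi_id - a
    hypothetical layout) into (features, next-x-coordinate) pairs.
    Feature names follow the paper (day of week, time of day, POI visit
    frequency); the exact encoding is an assumption."""
    day, hour, x, y, poi = traj.T
    poi_freq = np.array([np.mean(poi == p) for p in poi])  # visit frequency of each row's POI
    X = np.column_stack([day % 7, hour,
                         np.sin(2 * np.pi * hour / 24), np.cos(2 * np.pi * hour / 24),
                         poi_freq])
    return X[:-1], x[1:]  # predict the next x-coordinate (y handled analogously)

def fit_user_model(traj):
    """One personalized model per target user, as in the paper."""
    X, target = build_features(traj)
    model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
    model.fit(X, target)
    return model
```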
TwinPot: Digital Twin-assisted Honeypot for Cyber-Secure Smart Seaports
paper_authors: Yagmur Yigit, Omer Kemal Kinaci, Trung Q. Duong, Berk Canberk for: 这个研究是为了提供一个基于双胞萃取技术的智能港区防护系统,以应对现代港区中的攻击和侵略。methods: 本研究使用了双胞萃取技术和诱掳技术来建立一个具有高真实性的诱掳系统,并与现有的防护系统集成。results: 本研究发现,使用双胞萃取技术和诱掳系统可以实现更高的攻击检测精度和防护性。对于智能港区中的内部和外部攻击,我们的解决方案可以成功地检测和防护系统。Abstract
The idea of next-generation ports has become more apparent in the last ten years in response to the challenge posed by the rising demand for efficiency and the ever-increasing volume of goods. In this new era of intelligent infrastructure and facilities, it is evident that cyber-security has recently received the most significant attention from the seaport and maritime authorities, and it is a primary concern on the agenda of most ports. Traditional security solutions can be applied to safeguard IoT and Cyber-Physical Systems (CPS) from harmful entities. Nevertheless, security researchers can only watch, examine, and learn about the behaviors of attackers if these solutions operate more transparently. Herein, honeypots are potential solutions since they offer valuable information about the attackers. It can be virtual or physical. Virtual honeypots must be more realistic to entice attackers, necessitating better high-fidelity. To this end, Digital Twin (DT) technology can be employed to increase the complexity and simulation fidelity of the honeypots. Seaports can be attacked from both their existing devices and external devices at the same time. Existing mechanisms are insufficient to detect external attacks; therefore, the current systems cannot handle attacks at the desired level. DT and honeypot technologies can be used together to tackle them. Consequently, we suggest a DT-assisted honeypot, called TwinPot, for external attacks in smart seaports. Moreover, we propose an intelligent attack detection mechanism to handle different attack types using DT for internal attacks. Finally, we build an extensive smart seaport dataset for internal and external attacks using the MANSIM tool and two existing datasets to test the performance of our system. We show that under simultaneous internal and external attacks on the system, our solution successfully detects internal and external attacks.
摘要
“随着近期的高效需求和货物量的增加,次代港口设施在过去十年中得到了更多的关注。在这个新时代的智能基础设施和设备中,cyber安全已经成为港口和海事管理部门的首要课题。传统安全解决方案可以保护互联网物理设备和Cyber-Physical Systems(CPS)免于危险威胁。但是,安全研究人员只能通过观察攻击者的行为来了解攻击者。在这种情况下,诱饵(honeypot)是一个有potential的解决方案,因为它们提供了攻击者的有益信息。诱饵可以是虚拟的或物理的。虚拟诱饵需要更加真实,以吸引攻击者,因此需要更高的高精确度。为此,数位双胞袭(DT)技术可以被应用,以增加诱饵的复杂性和模拟精度。 smart ports 可以受到来自现有设备以及外部设备的同时攻击。现有的机制不足以检测外部攻击,因此现有的系统无法处理攻击到所需的水平。DT和诱饵技术可以被用 вместе,以解决这个问题。此外,我们还提出了一个智能攻击探测机制,用于处理不同类型的攻击。最后,我们建立了一个大量的聪明港口数据集,用于内部和外部攻击的测试。我们显示,在同时受到内部和外部攻击的情况下,我们的解决方案成功地检测了内部和外部攻击。”
Predicting Ovarian Cancer Treatment Response in Histopathology using Hierarchical Vision Transformers and Multiple Instance Learning
results: Using the HIPT-ABMIL model on 282 whole slide images (WSIs), the study predicts whether ovarian cancer patients will achieve remission or prevention of disease progression for at least 6 months, reaching an internal balanced accuracy of 60.2% +- 2.9% and an AUC of 0.646 +- 0.033.Abstract
For many patients, current ovarian cancer treatments offer limited clinical benefit. For some therapies, it is not possible to predict patients' responses, potentially exposing them to the adverse effects of treatment without any therapeutic benefit. As part of the automated prediction of treatment effectiveness in ovarian cancer using histopathological images (ATEC23) challenge, we evaluated the effectiveness of deep learning to predict whether a course of treatment including the antiangiogenic drug bevacizumab could contribute to remission or prevent disease progression for at least 6 months in a set of 282 histopathology whole slide images (WSIs) from 78 ovarian cancer patients. Our approach used a pretrained Hierarchical Image Pyramid Transformer (HIPT) to extract region-level features and an attention-based multiple instance learning (ABMIL) model to aggregate features and classify whole slides. The optimal HIPT-ABMIL model had an internal balanced accuracy of 60.2% +- 2.9% and an AUC of 0.646 +- 0.033. Histopathology-specific model pretraining was found to be beneficial to classification performance, though hierarchical transformers were not, with a ResNet feature extractor achieving similar performance. Due to the dataset being small and highly heterogeneous, performance was variable across 5-fold cross-validation folds, and there were some extreme differences between validation and test set performance within folds. The model did not generalise well to tissue microarrays, with accuracy worse than random chance. It is not yet clear whether ovarian cancer WSIs contain information that can be used to accurately predict treatment response, with further validation using larger, higher-quality datasets required.
摘要
很多病人,现有的卵巢癌治疗方案具有有限的临床效果。一些疗法,无法预测患者的反应,可能会将患者暴露于无效的治疗后果。在自动预测卵巢癌治疗效果使用历史Pathological Images(ATEC23)挑战中,我们评估了深度学习是否可以预测在6个月内remission或疾病进展的可能性,并对78例卵巢癌病人的282个历史Pathology Whole Slide Images(WSIs)进行了分析。我们的方法使用了预训练的层次Image Pyramid Transformer(HIPT)提取区域特征,以及关注基本多实例学习(ABMIL)模型来聚合特征并分类整个扫描图。最佳的HIPT-ABMIL模型具有内部平衡准确率为60.2% ± 2.9%和ROC曲线为0.646 ± 0.033。 histopathology特定的模型预训练被发现对分类性能产生了正面影响,而层次变换器则不是,ResNet特征提取器达到了类似的性能。由于数据集较小且高度多样化,性能在5个批处分验中具有变化性,并且在批处和测试集之间有一些极端的差异。模型无法通过细胞 microarray进行泛化,准确率落后于随机抽样。这表明卵巢癌WSIs中可能没有准确预测治疗效果的信息,需要进一步的验证和 validate larger、higherquality数据集。
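The attention-based MIL aggregation step can be pictured as a standard gated-attention pooling head over region-level features; the feature dimension and class count below are assumptions, and this is a sketch rather than the challenge code.

```python
import torch
import torch.nn as nn

class GatedAttentionMIL(nn.Module):
    """Gated attention-based MIL aggregator (in the spirit of Ilse et al., 2018).
    feat_dim is an assumed size for the region-level HIPT features."""

    def __init__(self, feat_dim: int = 384, hidden: int = 128, n_classes: int = 2):
        super().__init__()
        self.attn_v = nn.Sequential(nn.Linear(feat_dim, hidden), nn.Tanh())
        self.attn_u = nn.Sequential(nn.Linear(feat_dim, hidden), nn.Sigmoid())
        self.attn_w = nn.Linear(hidden, 1)
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, region_feats: torch.Tensor) -> torch.Tensor:
        # region_feats: (n_regions, feat_dim) features of one whole slide image.
        a = self.attn_w(self.attn_v(region_feats) * self.attn_u(region_feats))  # (n_regions, 1)
        a = torch.softmax(a, dim=0)
        slide_feat = (a * region_feats).sum(dim=0)   # attention-weighted slide embedding
        return self.classifier(slide_feat)           # slide-level logits
```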
Physical Information Neural Networks for Solving High-index Differential-algebraic Equation Systems Based on Radau Methods
paper_authors: Jiasheng Chen, Juan Tang, Ming Yan, Shuai Lai, Kun Liang, Jianguang Lu, Wenqiang Yang
for: solving high-index differential-algebraic equation (DAE) systems with high accuracy
methods: combines the Radau IIA numerical method with a neural network structure and an attention mechanism to directly solve high-index DAEs
results: numerical experiments on two classical high-index DAE systems show that the PINN based on the 5th-order Radau IIA method achieves the most accurate solutions, with absolute errors below $10^{-6}$ for the differential variables, surpassing results reported in the existing literature.Abstract
As is well known, differential algebraic equations (DAEs), which are able to describe dynamic changes and underlying constraints, have been widely applied in engineering fields such as fluid dynamics, multi-body dynamics, mechanical systems and control theory. In practical physical modeling within these domains, the systems often generate high-index DAEs. Classical implicit numerical methods typically result in varying order reduction of numerical accuracy when solving high-index systems.~Recently, the physics-informed neural network (PINN) has gained attention for solving DAE systems. However, it faces challenges like the inability to directly solve high-index systems, lower predictive accuracy, and weaker generalization capabilities. In this paper, we propose a PINN computational framework, combined Radau IIA numerical method with a neural network structure via the attention mechanisms, to directly solve high-index DAEs. Furthermore, we employ a domain decomposition strategy to enhance solution accuracy. We conduct numerical experiments with two classical high-index systems as illustrative examples, investigating how different orders of the Radau IIA method affect the accuracy of neural network solutions. The experimental results demonstrate that the PINN based on a 5th-order Radau IIA method achieves the highest level of system accuracy. Specifically, the absolute errors for all differential variables remains as low as $10^{-6}$, and the absolute errors for algebraic variables is maintained at $10^{-5}$, surpassing the results found in existing literature. Therefore, our method exhibits excellent computational accuracy and strong generalization capabilities, providing a feasible approach for the high-precision solution of larger-scale DAEs with higher indices or challenging high-dimensional partial differential algebraic equation systems.
摘要
为了描述动态变化和下面约束,混合方程(DAEs)在工程领域得到广泛应用,如流体动力学、多体动力学、机械系统和控制理论。在实际物理模拟中,系统经常生成高指数DAEs。经典的假设方法通常会导致解的级数减少,从而降低数值精度。在这篇论文中,我们提议一种基于物理学习神经网络(PINN)的计算框架,结合卷积IIA方法和神经网络结构,以直接解决高指数DAEs。此外,我们采用域划分策略来提高解的准确性。我们在两个经典的高指数系统中进行了数值实验,研究不同的卷积IIA方法的影响于神经网络解的精度。实验结果表明,基于5个卷积IIA方法的PINN算法可以达到系统的最高精度水平。具体地,所有的杂分变量的绝对误差保持在10^-6之下,而部分变量的绝对误差保持在10^-5之下,超过现有文献的结果。因此,我们的方法可以具有出色的计算精度和强大的泛化能力,为更大规模的DAEs或更复杂的部分数学方程系统提供可靠的解决方案。
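A structural sketch of a PINN loss for a semi-explicit DAE (a differential residual plus an algebraic-constraint residual). The Radau IIA collocation, attention mechanism, and domain decomposition of the paper are omitted here; the right-hand sides `f` and `g` and the network size are placeholders.

```python
import torch
import torch.nn as nn

# net(t) -> (x(t), z(t)): one differential variable x and one algebraic variable z.
net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(),
                    nn.Linear(64, 64), nn.Tanh(),
                    nn.Linear(64, 2))

def dae_residual_loss(t: torch.Tensor, f, g) -> torch.Tensor:
    """PINN loss for a semi-explicit DAE  x' = f(x, z),  0 = g(x, z).

    The Radau IIA collocation scheme of the paper is replaced here by plain
    autograd differentiation at the collocation points `t`, so this is only
    a structural sketch.
    """
    t = t.clone().requires_grad_(True)          # (N, 1) collocation points
    out = net(t)
    x, z = out[:, :1], out[:, 1:]
    dx_dt = torch.autograd.grad(x, t, grad_outputs=torch.ones_like(x),
                                create_graph=True)[0]
    ode_res = dx_dt - f(x, z)                   # residual of the differential equation
    alg_res = g(x, z)                           # residual of the algebraic constraint
    return ode_res.pow(2).mean() + alg_res.pow(2).mean()
```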
AgentTuning: Enabling Generalized Agent Abilities for LLMs
results: AgentTuning可以提高LLMs的代理能力,而无需妥协其通用能力,AgentLM-70B与GPT-3.5-turbo在未看过任务上的表现相当。Abstract
Open large language models (LLMs) with great performance in various tasks have significantly advanced the development of LLMs. However, they are far inferior to commercial models such as ChatGPT and GPT-4 when acting as agents to tackle complex tasks in the real world. These agent tasks employ LLMs as the central controller responsible for planning, memorization, and tool utilization, necessitating both fine-grained prompting methods and robust LLMs to achieve satisfactory performance. Though many prompting methods have been proposed to complete particular agent tasks, there is lack of research focusing on improving the agent capabilities of LLMs themselves without compromising their general abilities. In this work, we present AgentTuning, a simple and general method to enhance the agent abilities of LLMs while maintaining their general LLM capabilities. We construct AgentInstruct, a lightweight instruction-tuning dataset containing high-quality interaction trajectories. We employ a hybrid instruction-tuning strategy by combining AgentInstruct with open-source instructions from general domains. AgentTuning is used to instruction-tune the Llama 2 series, resulting in AgentLM. Our evaluations show that AgentTuning enables LLMs' agent capabilities without compromising general abilities. The AgentLM-70B is comparable to GPT-3.5-turbo on unseen agent tasks, demonstrating generalized agent capabilities. We open source the AgentInstruct and AgentLM-7B, 13B, and 70B models at https://github.com/THUDM/AgentTuning, serving open and powerful alternatives to commercial LLMs for agent tasks.
摘要
大型语言模型(LLMs)在各种任务中表现出色,但在实际世界中执行复杂任务时,它们远远不如商业模型如ChatGPT和GPT-4。这些代理任务使用LLMs作为中央控制器,负责计划、记忆和工具使用,需要细化的提示方法和Robust LLMs来实现满意性。虽然许多提示方法已经被提出来完成特定任务,但是尚未有关于提高LLMs自身代理能力的研究。在这项工作中,我们提出了AgentTuning方法,用于提高LLMs的代理能力,而不需要妥协其总体能力。我们构建了AgentInstruct数据集,包含高质量交互轨迹,并采用混合指令练级策略,将AgentInstruct与通用领域的开源指令结合使用。我们使用AgentTuning进行指令练级,得到了AgentLM。我们的评估结果表明,AgentTuning可以提高LLMs的代理能力,无需妥协其总体能力。AgentLM-70B与GPT-3.5-turbo在未见agent任务上的表现相当,表明AgentLM具有通用的代理能力。我们将AgentInstruct和AgentLM-7B、13B和70B模型公开发布在GitHub上, serving as open and powerful alternatives to commercial LLMs for agent tasks。
Hybrid Search for Efficient Planning with Completeness Guarantees
results: 我们在复杂的规划问题上应用了我们的完整子目标搜索方法,并证明了该方法可以确保完整性,并且在一些情况下,可以提高高级搜索的性能。这种方法使得可以在系统中应用高级搜索,并且确保完整性。Abstract
Solving complex planning problems has been a long-standing challenge in computer science. Learning-based subgoal search methods have shown promise in tackling these problems, but they often suffer from a lack of completeness guarantees, meaning that they may fail to find a solution even if one exists. In this paper, we propose an efficient approach to augment a subgoal search method to achieve completeness in discrete action spaces. Specifically, we augment the high-level search with low-level actions to execute a multi-level (hybrid) search, which we call complete subgoal search. This solution achieves the best of both worlds: the practical efficiency of high-level search and the completeness of low-level search. We apply the proposed search method to a recently proposed subgoal search algorithm and evaluate the algorithm trained on offline data on complex planning problems. We demonstrate that our complete subgoal search not only guarantees completeness but can even improve performance in terms of search expansions for instances that the high-level could solve without low-level augmentations. Our approach makes it possible to apply subgoal-level planning for systems where completeness is a critical requirement.
摘要
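One way to picture the complete (hybrid) subgoal search is a best-first search whose successor function mixes learned subgoal proposals with exhaustive primitive actions, so a solution is never missed even when the generator fails. A sketch with illustrative function names only, not the paper's implementation.

```python
import heapq

def complete_subgoal_search(start, is_goal, low_level_actions, propose_subgoals,
                            heuristic, max_nodes=100_000):
    """Best-first search over a discrete state space.

    `propose_subgoals(state)` stands in for a learned high-level generator
    returning candidate states reachable in a few steps (possibly incomplete);
    `low_level_actions(state)` enumerates all primitive successors, which is
    what restores the completeness guarantee. All names are illustrative.
    """
    frontier = [(heuristic(start), 0, start, [start])]
    seen = {start}
    counter = 1
    while frontier and counter < max_nodes:
        _, _, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return path
        # High-level proposals first (fast progress), then exhaustive low-level moves.
        for nxt in list(propose_subgoals(state)) + list(low_level_actions(state)):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (heuristic(nxt), counter, nxt, path + [nxt]))
                counter += 1
    return None  # node budget exhausted without finding a goal
```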
Boosting Inference Efficiency: Unleashing the Power of Parameter-Shared Pre-trained Language Models
results: 对 autoregressive 和 autoencoding PLMs 进行实验,显示了我们的方法的有效性,并提供了新的想法 для更有效地使用 parameter-shared 模型。Abstract
Parameter-shared pre-trained language models (PLMs) have emerged as a successful approach in resource-constrained environments, enabling substantial reductions in model storage and memory costs without significant performance compromise. However, it is important to note that parameter sharing does not alleviate computational burdens associated with inference, thus impeding its practicality in situations characterized by limited stringent latency requirements or computational resources. Building upon neural ordinary differential equations (ODEs), we introduce a straightforward technique to enhance the inference efficiency of parameter-shared PLMs. Additionally, we propose a simple pre-training technique that leads to fully or partially shared models capable of achieving even greater inference acceleration. The experimental results demonstrate the effectiveness of our methods on both autoregressive and autoencoding PLMs, providing novel insights into more efficient utilization of parameter-shared models in resource-constrained settings.
摘要
parameter-shared预训练语言模型(PLMs)在有限资源环境中成为成功的方法,实现了重要的存储和内存成本减少,而无需妥协性能。然而,需要注意的是,参数共享不会减轻推理过程中的计算压力,因此在具有严格的响应时间要求或计算资源限制的情况下,实际应用中可能存在一定的限制。基于神经ordinary differential equations(ODEs),我们提出了一种简单的技术来提高参数共享PLMs的推理效率。此外,我们还提出了一种简单的预训练技术,可以实现完全或部分共享的模型,以达到更高的推理加速。实验结果表明,我们的方法在autoregressive和autoencodingPLMs中具有显著的效果,为有限资源设置中更有效地使用参数共享模型提供了新的理解。
2D-3D Interlaced Transformer for Point Cloud Segmentation with Scene-Level Supervision
paper_authors: Cheng-Kun Yang, Min-Hung Chen, Yung-Yu Chuang, Yen-Yu Lin
for: weakly supervised point cloud segmentation
methods: transformer model with two encoders and one decoder, using self-attention and interlaced 2D-3D cross-attention for implicit feature fusion
results: outperforms existing weakly supervised point cloud segmentation methods by a large margin on the S3DIS and ScanNet benchmarks.Abstract
We present a Multimodal Interlaced Transformer (MIT) that jointly considers 2D and 3D data for weakly supervised point cloud segmentation. Research studies have shown that 2D and 3D features are complementary for point cloud segmentation. However, existing methods require extra 2D annotations to achieve 2D-3D information fusion. Considering the high annotation cost of point clouds, effective 2D and 3D feature fusion based on weakly supervised learning is in great demand. To this end, we propose a transformer model with two encoders and one decoder for weakly supervised point cloud segmentation using only scene-level class tags. Specifically, the two encoders compute the self-attended features for 3D point clouds and 2D multi-view images, respectively. The decoder implements interlaced 2D-3D cross-attention and carries out implicit 2D and 3D feature fusion. We alternately switch the roles of queries and key-value pairs in the decoder layers. It turns out that the 2D and 3D features are iteratively enriched by each other. Experiments show that it performs favorably against existing weakly supervised point cloud segmentation methods by a large margin on the S3DIS and ScanNet benchmarks. The project page will be available at https://jimmy15923.github.io/mit_web/.
摘要
我们提出了一个多模式融合 transformer(MIT),它同时考虑了2D和3D数据 для弱监督点云分类。研究表明了2D和3D特征是点云分类的补充,但现有方法需要额外的2D标注以实现2D-3D信息融合。鉴于点云标注的高成本,有效的2D和3D特征融合基于弱监督学习是非常需要。为此,我们提出了一个 transformer 模型,它包括两个Encoder和一个Decoder,用于弱监督点云分类。具体来说,两个Encoder computes 3D点云和2D多视角图像的自我对应特征,respectively。Decoder 实现了排序2D-3D交叉对话,并执行隐式2D和3D特征融合。我们在Decoder层中 alternate 将查询和键值组替换。发现2D和3D特征在彼此之间轮流浓化。实验结果显示,它在S3DIS和ScanNet参数上与现有弱监督点云分类方法相比,大幅提高了表现。更多信息可以通过查看https://jimmy15923.github.io/mit_web/。
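The interlaced cross-attention in which 2D and 3D tokens alternate between the query and key-value roles across decoder layers can be sketched as follows; the dimensions, depth, and residual connections are assumptions, not the released model.

```python
import torch
import torch.nn as nn

class InterlacedCrossAttention(nn.Module):
    """Decoder sketch that swaps the query / key-value roles of 2D and 3D
    tokens from layer to layer, as described above."""

    def __init__(self, dim: int = 256, heads: int = 8, depth: int = 4):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.MultiheadAttention(dim, heads, batch_first=True) for _ in range(depth)]
        )

    def forward(self, feats_3d: torch.Tensor, feats_2d: torch.Tensor):
        # feats_3d: (B, N_points, dim) point features; feats_2d: (B, N_pixels, dim) multi-view features.
        for i, attn in enumerate(self.layers):
            if i % 2 == 0:   # 3D tokens query the 2D tokens
                upd, _ = attn(query=feats_3d, key=feats_2d, value=feats_2d)
                feats_3d = feats_3d + upd
            else:            # roles swapped: 2D tokens query the 3D tokens
                upd, _ = attn(query=feats_2d, key=feats_3d, value=feats_3d)
                feats_2d = feats_2d + upd
        return feats_3d, feats_2d
```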
Prompt Injection Attacks and Defenses in LLM-Integrated Applications
results: 该论文通过对10个LLMs和7个任务进行系统评估,发现了许多攻击和防御策略,并提供了一个可用的框架来帮助研究人员进一步发展这个领域。Abstract
Large Language Models (LLMs) are increasingly deployed as the backend for a variety of real-world applications called LLM-Integrated Applications. Multiple recent works showed that LLM-Integrated Applications are vulnerable to prompt injection attacks, in which an attacker injects malicious instruction/data into the input of those applications such that they produce results as the attacker desires. However, existing works are limited to case studies. As a result, the literature lacks a systematic understanding of prompt injection attacks and their defenses. We aim to bridge the gap in this work. In particular, we propose a general framework to formalize prompt injection attacks. Existing attacks, which are discussed in research papers and blog posts, are special cases in our framework. Our framework enables us to design a new attack by combining existing attacks. Moreover, we also propose a framework to systematize defenses against prompt injection attacks. Using our frameworks, we conduct a systematic evaluation on prompt injection attacks and their defenses with 10 LLMs and 7 tasks. We hope our frameworks can inspire future research in this field. Our code is available at https://github.com/liu00222/Open-Prompt-Injection.
摘要
Model Merging by Uncertainty-Based Gradient Matching
results: 研究发现,Weighted-averaging方法可以在大语言模型和视觉转换器上提供良好的性能和鲁棒性,但在某些情况下可能会失败,这与梯度匹配度的不同有关。Abstract
Models trained on different datasets can be merged by a weighted-averaging of their parameters, but why does it work and when can it fail? Here, we connect the inaccuracy of weighted-averaging to mismatches in the gradients and propose a new uncertainty-based scheme to improve the performance by reducing the mismatch. The connection also reveals implicit assumptions in other schemes such as averaging, task arithmetic, and Fisher-weighted averaging. Our new method gives consistent improvements for large language models and vision transformers, both in terms of performance and robustness to hyperparameters.
摘要
模型训练在不同的数据集上可以通过将其参数进行权重平均合并,但为什么它会工作,而且在哪些情况下可能失败?我们连接权重平均的不准确性与梯度匹配度的差异,并提出一种基于不确定性的方案来改进性能,从而降低差异。这种连接还暴露了其他方案,如平均、任务加法和鱼得权重平均,的隐式假设。我们的新方法在大语言模型和视Transformers上都得到了一致性的改进,以及随着超参数的Robustness。
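A simple instance of the uncertainty-based view is precision-weighted parameter averaging, sketched below; the diagonal curvature (Fisher) estimates are assumed to be given, and this is not the paper's exact estimator.

```python
import torch

def merge_models(state_dicts, fisher_diags):
    """Precision-weighted parameter averaging across several fine-tuned models.

    `state_dicts` and `fisher_diags` are lists of parameter dicts with matching
    keys; each parameter is averaged with weights given by its diagonal
    Fisher / curvature estimate (a sketch of one uncertainty-based choice).
    """
    merged = {}
    for name in state_dicts[0]:
        weights = torch.stack([f[name] + 1e-8 for f in fisher_diags])  # avoid division by zero
        params = torch.stack([sd[name] for sd in state_dicts])
        merged[name] = (weights * params).sum(dim=0) / weights.sum(dim=0)
    return merged
```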
results: 研究发现,当集体学习阶段出现时,各个网络单元可以通过私有数据进行训练,并且可以完全泛化到未看到的数据类型。Abstract
Unraveling the emergence of collective learning in systems of coupled artificial neural networks is an endeavor with broader implications for physics, machine learning, neuroscience and society. Here we introduce a minimal model that condenses several recent decentralized algorithms by considering a competition between two terms: the local learning dynamics in the parameters of each neural network unit, and a diffusive coupling among units that tends to homogenize the parameters of the ensemble. We derive the coarse-grained behavior of our model via an effective theory for linear networks that we show is analogous to a deformed Ginzburg-Landau model with quenched disorder. This framework predicts (depth-dependent) disorder-order-disorder phase transitions in the parameters' solutions that reveal the onset of a collective learning phase, along with a depth-induced delay of the critical point and a robust shape of the microscopic learning path. We validate our theory in realistic ensembles of coupled nonlinear networks trained in the MNIST dataset under privacy constraints. Interestingly, experiments confirm that individual networks -- trained only with private data -- can fully generalize to unseen data classes when the collective learning phase emerges. Our work elucidates the physics of collective learning and contributes to the mechanistic interpretability of deep learning in decentralized settings.
摘要
探讨集成人工神经网络系统中共同学习的出现是一项广泛有关物理学、机器学习、神经科学和社会的研究。我们在这里提出了一个简化的模型,它把几种最近的分布式算法简化为两个参数的竞争:每个神经网络单元的本地学习动力和 Ensemble 中参数的卷积 Coupling。我们通过一种有效的线性网络理论来解释模型的宽泛行为,并证明其与扭曲的加顿兹-兰道模型相似,带有随机噪声。这个框架预测了参数解决方案中的(深度依赖)约束-约束-噪声相变,这些相变表明集成学习阶段的出现,以及深度带来的晚期极点延迟和微型学习路径的稳定形态。我们在使用实际的非线性网络集合和MNIST数据集进行训练的情况下验证了我们的理论。结果表明,当集成学习阶段出现时,各个网络——只使用私有数据进行训练——可以完全通过未看过数据类型进行总结。我们的研究为集成学习的物理学和深度学习在分布式设置中的机制解释做出了贡献。
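The minimal model described above amounts to a local gradient step plus a diffusive pull of each unit's parameters toward its neighbours, which can be sketched in a few lines (the constants and the synchronous update are illustrative choices).

```python
import numpy as np

def coupled_sgd_step(params, grads, adjacency, lr=0.05, coupling=0.1):
    """One synchronous update of the minimal coupled-learning model.

    params: (n_units, n_params) parameters of each network unit.
    grads:  (n_units, n_params) local gradients from each unit's private data.
    adjacency: (n_units, n_units) coupling matrix between units.
    lr and coupling are illustrative constants.
    """
    deg = adjacency.sum(axis=1, keepdims=True)
    # Diffusion term: pull each unit toward the average of its neighbours.
    neighbour_mean = adjacency @ params / np.maximum(deg, 1.0)
    return params - lr * grads + coupling * (neighbour_mean - params)
```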
Exploring Graph Neural Networks for Indian Legal Judgment Prediction
results: 使用XLNet预训练特征得到最佳表现,macro F1分数达75%,链接预测ROC超过80%。Abstract
The burdensome impact of a skewed judges-to-cases ratio on the judicial system manifests in an overwhelming backlog of pending cases alongside an ongoing influx of new ones. To tackle this issue and expedite the judicial process, the proposition of an automated system capable of suggesting case outcomes based on factual evidence and precedent from past cases gains significance. This research paper centres on developing a graph neural network-based model to address the Legal Judgment Prediction (LJP) problem, recognizing the intrinsic graph structure of judicial cases and making it a binary node classification problem. We explored various embeddings as model features, while nodes such as time nodes and judicial acts were added and pruned to evaluate the model's performance. The study is done while considering the ethical dimension of fairness in these predictions, considering gender and name biases. A link prediction task is also conducted to assess the model's proficiency in anticipating connections between two specified nodes. By harnessing the capabilities of graph neural networks and incorporating fairness analyses, this research aims to contribute insights towards streamlining the adjudication process, enhancing judicial efficiency, and fostering a more equitable legal landscape, ultimately alleviating the strain imposed by mounting case backlogs. Our best-performing model with XLNet pre-trained embeddings as its features gives the macro F1 score of 75% for the LJP task. For link prediction, the same set of features is the best performing giving ROC of more than 80%
摘要
“judicial系统中个偏斜的审判官至案例比例对审判过程带来巨大的负担,这导致了案件库存问题和持续的新案件涌入。为了解决这个问题并加速审判过程,建议一个自动化的系统,可以根据实际证据和过去案例的先例来预测审判结果。这个研究 paper 的目标是发展一个基于图 neural network 的模型,以解决审判预测问题(Legal Judgment Prediction,LJP),并考虑了性别和名称偏见的伦理维护。我们尝试了不同的嵌入 Space 作为模型特征,并在时间节点和司法行为上进行了评估。我们的最佳模型,使用 XLNet 预训嵌入,在 LJP 任务中取得了macro F1 分数为75%。另外,我们还进行了连接预测任务,使用相同的特征集,ROC 的成果超过80%。”
Agri-GNN: A Novel Genotypic-Topological Graph Neural Network Framework Built on GraphSAGE for Optimized Yield Prediction
paper_authors: Aditya Gupta, Asheesh Singh for:* 这篇论文旨在帮助农业领域实现更高的生产力和可持续性,通过将技术融入农业领域。methods:* 本论文提出了一个名为 $\textit{Agri-GNN}$ 的新型 Genotypic-Topological Graph Neural Network 框架,用于捕捉农作物间的细微空间和遗传学交互,以提高农作物的收获预测。results:* $\textit{Agri-GNN}$ 在一个包括植物指标、时间、遗传学资讯和位置资料的广泛数据集上进行了实验,得到了 $R^2 = .876$ 的收获预测准确性,与基准和其他相关研究相比有很大的改善。Abstract
Agriculture, as the cornerstone of human civilization, constantly seeks to integrate technology for enhanced productivity and sustainability. This paper introduces $\textit{Agri-GNN}$, a novel Genotypic-Topological Graph Neural Network Framework tailored to capture the intricate spatial and genotypic interactions of crops, paving the way for optimized predictions of harvest yields. $\textit{Agri-GNN}$ constructs a Graph $\mathcal{G}$ that considers farming plots as nodes, and then methodically constructs edges between nodes based on spatial and genotypic similarity, allowing for the aggregation of node information through a genotypic-topological filter. Graph Neural Networks (GNN), by design, consider the relationships between data points, enabling them to efficiently model the interconnected agricultural ecosystem. By harnessing the power of GNNs, $\textit{Agri-GNN}$ encapsulates both local and global information from plants, considering their inherent connections based on spatial proximity and shared genotypes, allowing stronger predictions to be made than traditional Machine Learning architectures. $\textit{Agri-GNN}$ is built from the GraphSAGE architecture, because of its optimal calibration with large graphs, like those of farming plots and breeding experiments. $\textit{Agri-GNN}$ experiments, conducted on a comprehensive dataset of vegetation indices, time, genotype information, and location data, demonstrate that $\textit{Agri-GNN}$ achieves an $R^2 = .876$ in yield predictions for farming fields in Iowa. The results show significant improvement over the baselines and other work in the field. $\textit{Agri-GNN}$ represents a blueprint for using advanced graph-based neural architectures to predict crop yield, providing significant improvements over baselines in the field.
摘要
农业,为人类文明的基础,不断寻求技术进步,以提高生产力和可持续性。本文介绍了一种新的种植物种拟合Graph Neural Network(GNN)框架,称为$\textit{Agri-GNN}$,用于捕捉作物的细致空间和种植物互动关系,为产量预测做出优化。$\textit{Agri-GNN}$首先构建一个图 $\mathcal{G}$,其中 Plot 作为节点,然后遍历节点之间的边基于空间和种植物相似性,通过种植物拟合筛选节点信息。GNN 由于其可以准确地模型数据点之间的关系,因此可以高效地模型农业生态系统的连接关系。通过利用 GNN 的能力,$\textit{Agri-GNN}$ 可以捕捉作物的本地和全局信息,考虑作物的空间距离和共享种植物,从而进行更准确的产量预测。$\textit{Agri-GNN}$ 基于 GraphSAGE 框架,因为它可以对大型图进行优化匹配。在一个包括植被指数、时间、种植物信息和位置数据的全面数据集上进行了 $\textit{Agri-GNN}$ 的实验,结果显示 $\textit{Agri-GNN}$ 在农场产量预测中达到了 $R^2 = .876$。结果表明 $\textit{Agri-GNN}$ 在基线和其他相关工作上具有显著改善。 $\textit{Agri-GNN}$ 为使用先进的图基于神经网络预测作物产量提供了蓝本,并且在基eline上显示出了显著的改善。
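A sketch of the graph-construction step, with edges added both for spatially nearest plots and for genotypically similar plots; the neighbour count, similarity threshold, and cosine-similarity choice are assumptions. The resulting (2, n_edges) array is in the edge-index format expected by common GraphSAGE implementations.

```python
import numpy as np

def build_edges(coords, genotypes, k_spatial=8, geno_threshold=0.9):
    """Edge list combining spatial proximity and genotypic similarity.

    coords: (n_plots, 2) field coordinates; genotypes: (n_plots, d) genotype
    encodings. Thresholds are illustrative, not the paper's values.
    """
    edges = set()
    # Spatial edges: k nearest neighbouring plots.
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    for i in range(len(coords)):
        for j in np.argsort(dists[i])[1:k_spatial + 1]:
            edges.add((i, int(j)))
    # Genotypic edges: plots whose genotype vectors are highly similar.
    g = genotypes / np.linalg.norm(genotypes, axis=1, keepdims=True)
    sim = g @ g.T
    for i, j in zip(*np.where(sim > geno_threshold)):
        if i != j:
            edges.add((int(i), int(j)))
    return np.array(sorted(edges)).T  # shape (2, n_edges), ready for a GraphSAGE layer
```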
Survival of the Most Influential Prompts: Efficient Black-Box Prompt Search via Clustering and Pruning
results: Experiments across multiple tasks and black-box LLMs show that ClaPS achieves state-of-the-art performance in black-box prompt search while significantly reducing search cost. These results underscore the critical role of search space design and optimization in black-box prompt search.Abstract
Prompt-based learning has been an effective paradigm for large pretrained language models (LLM), enabling few-shot or even zero-shot learning. Black-box prompt search has received growing interest recently for its distinctive properties of gradient-free optimization, proven particularly useful and powerful for model-as-a-service usage. However, the discrete nature and the complexity of combinatorial optimization hinder the efficiency of modern black-box approaches. Despite extensive research on search algorithms, the crucial aspect of search space design and optimization has been largely overlooked. In this paper, we first conduct a sensitivity analysis by prompting LLM, revealing that only a small number of tokens exert a disproportionate amount of influence on LLM predictions. Leveraging this insight, we propose the Clustering and Pruning for Efficient Black-box Prompt Search (ClaPS), a simple black-box search method that first clusters and prunes the search space to focus exclusively on influential prompt tokens. By employing even simple search methods within the pruned search space, ClaPS achieves state-of-the-art performance across various tasks and LLMs, surpassing the performance of complex approaches while significantly reducing search costs. Our findings underscore the critical role of search space design and optimization in enhancing both the usefulness and the efficiency of black-box prompt-based learning.
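A sketch of the cluster-then-prune step on candidate prompt tokens; the influence score, clustering choice, and thresholds below are illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_and_prune(tokens, influence, n_clusters=10, keep_clusters=3):
    """ClaPS-style search-space reduction (one plausible instantiation).

    `influence` is assumed to be a precomputed (n_tokens,) array, e.g. the drop
    in task score when the token is removed from the prompt. Tokens are
    clustered by influence and only the most influential clusters are kept.
    """
    feats = influence.reshape(-1, 1)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(feats)
    # Rank clusters by their mean influence and keep the top few.
    order = np.argsort([influence[labels == c].mean() for c in range(n_clusters)])[::-1]
    keep = set(order[:keep_clusters])
    pruned = [t for t, c in zip(tokens, labels) if c in keep]
    return pruned  # the reduced search space handed to any simple prompt-search method
```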
Safe RLHF: Safe Reinforcement Learning from Human Feedback
results: 通过三轮精细调整,Safe RLHF 能够优化模型的性能和安全性,比较现有的价值调整算法更好地避免危害性的回应。实验中,我们使用 Safe RLHF 调整了 Alpaca-7B,并与人类偏好相匹配,实现了模型的帮助和无害性。Abstract
With the development of large language models (LLMs), striking a balance between the performance and safety of AI systems has never been more critical. However, the inherent tension between the objectives of helpfulness and harmlessness presents a significant challenge during LLM training. To address this issue, we propose Safe Reinforcement Learning from Human Feedback (Safe RLHF), a novel algorithm for human value alignment. Safe RLHF explicitly decouples human preferences regarding helpfulness and harmlessness, effectively avoiding the crowdworkers' confusion about the tension and allowing us to train separate reward and cost models. We formalize the safety concern of LLMs as an optimization task of maximizing the reward function while satisfying specified cost constraints. Leveraging the Lagrangian method to solve this constrained problem, Safe RLHF dynamically adjusts the balance between the two objectives during fine-tuning. Through a three-round fine-tuning using Safe RLHF, we demonstrate a superior ability to mitigate harmful responses while enhancing model performance compared to existing value-aligned algorithms. Experimentally, we fine-tuned the Alpaca-7B using Safe RLHF and aligned it with collected human preferences, significantly improving its helpfulness and harmlessness according to human evaluations.
摘要
大语言模型(LLM)的发展使得保持人工智能系统的性能和安全性更加重要。然而,帮助和无害的目标之间存在内在的矛盾,在LLM训练中带来了一定的挑战。为解决这个问题,我们提出了安全强化学习从人类反馈(Safe RLHF),一种新的人价值对齐算法。Safe RLHF将人类对帮助和无害的喜好分开显式地避免了协作者的混乱,让我们可以分开训练奖励和成本模型。我们将LLM的安全问题定义为最大化奖励函数的优化问题,满足specified的成本约束。通过拉格朗日方法解决这个受约问题,Safe RLHF在细化过程中动态调整了两个目标之间的平衡。通过三轮细化使用Safe RLHF,我们示出了与现有的值适应算法相比,更好地 mitigate harmful responses while enhancing model performance。实验中,我们使用Safe RLHF细化了Alpaca-7B,并与收集的人类偏好相对应,显著提高了其帮助和无害性。
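The Lagrangian treatment of the cost constraint can be sketched as a primal-dual pair of updates; in the actual method these sit inside an RLHF (PPO) fine-tuning loop, and the constants here are illustrative.

```python
import torch

def lagrangian_objective(reward, cost, lam, cost_limit=0.0):
    """Per-batch surrogate minimized by the policy optimizer: maximize reward
    while a Lagrange multiplier penalizes expected cost above the limit.
    A structural sketch of the constrained objective described above."""
    return -(reward.mean() - lam * (cost.mean() - cost_limit))

def update_multiplier(lam, cost, cost_limit=0.0, lr_lambda=0.01):
    """Dual ascent on the multiplier: grow lambda when the cost constraint is
    violated, shrink it (down to zero) when it is satisfied."""
    lam = lam + lr_lambda * (cost.mean().item() - cost_limit)
    return max(lam, 0.0)
```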
SemantIC: Semantic Interference Cancellation Towards 6G Wireless Communications
results: 无需额外频率资源,可以提高信号干扰抑制性和信息质量Abstract
This letter proposes a novel anti-interference technique, semantic interference cancellation (SemantIC), for enhancing information quality towards the sixth-generation (6G) wireless networks. SemantIC only requires the receiver to concatenate the channel decoder with a semantic auto-encoder. This constructs a turbo loop which iteratively and alternately eliminates noise in the signal domain and the semantic domain. From the viewpoint of network information theory, the neural network of the semantic auto-encoder stores side information by training, and provides side information in iterative decoding, as an implementation of the Wyner-Ziv theorem. Simulation results verify the performance improvement by SemantIC without extra channel resource cost.
摘要
这封信件提出了一种新的反干扰技术,即语义干扰抑制(SemantIC),用于提高 sixth-generation(6G)无线网络中信息质量。SemantIC只需接收器将通道解码器与语义自动编码器 concatenate 起来,这将construct一个 turbo 循环,通过逐次和 alternately 消除信号域和语义域中的干扰。从网络信息理论的视角来看,语义自动编码器的神经网络在训练中存储了侧信息,并在循环解码中提供了侧信息,实现了万ер- Жи夫定理。实验结果表明SemantIC可以不添加额外通道资源成本下提高性能。
Training binary neural networks without floating point precision
results: 实现 near state-of-the-art 性能和高效训练,降低训练时间和内存占用量。Abstract
The main goal of this work is to improve the efficiency of training binary neural networks, which are low latency and low energy networks. The main contribution of this work is the proposal of two solutions comprised of topology changes and strategy training that allow the network to achieve near the state-of-the-art performance and efficient training. The time required for training and the memory required in the process are two factors that contribute to efficient training.
摘要
主要目标是提高二值(Binary)神经网络的训练效率,这些神经网络具有低延迟和低能耗特性。本工作的主要贡献是提出两种解决方案,包括结构变化和策略训练,使网络达到接近最先进的性能并实现高效训练。训练所需的时间和训练过程中所需的内存是影响训练效率的两个因素。
LASER: Linear Compression in Wireless Distributed Optimization
results: Experiments show that LASER yields consistent gains over the baselines on computer vision and GPT language modeling tasks; in particular, it achieves a 50-64% improvement in perplexity over the baselines on noisy channels.Abstract
Data-parallel SGD is the de facto algorithm for distributed optimization, especially for large scale machine learning. Despite its merits, communication bottleneck is one of its persistent issues. Most compression schemes to alleviate this either assume noiseless communication links, or fail to achieve good performance on practical tasks. In this paper, we close this gap and introduce LASER: LineAr CompreSsion in WirEless DistRibuted Optimization. LASER capitalizes on the inherent low-rank structure of gradients and transmits them efficiently over the noisy channels. Whilst enjoying theoretical guarantees similar to those of the classical SGD, LASER shows consistent gains over baselines on a variety of practical benchmarks. In particular, it outperforms the state-of-the-art compression schemes on challenging computer vision and GPT language modeling tasks. On the latter, we obtain $50$-$64 \%$ improvement in perplexity over our baselines for noisy channels.
摘要
Data-parallel SGD 是大规模机器学习中的标准算法,却受到通信瓶颈的困扰。大多数压缩方案可以减轻这个问题,但是它们假设无噪通信链路,或者在实际任务上没有良好的性能。在这篇论文中,我们填补这个空白并引入了LASER:线性压缩在无线分布优化中。LASER 利用梯度的自然低秩结构,并将其高效地传输到噪音通信链路上。与传统的SGD 算法相似,LASER 具有类似的理论保证,并在实际任务上相对基准显示出一致的提升。特别是在计算机视觉和 GPT 自然语言模型任务上,LASER 表现出了50-64%的改善。在后一个任务中,我们在噪音通信链路上相对基准获得了50-64%的困惑度改善。
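The low-rank idea can be sketched as a rank-r SVD factorization of each gradient matrix before transmission; the real scheme additionally accounts for the noisy channel, which this sketch omits.

```python
import torch

def compress_gradient(grad: torch.Tensor, rank: int = 4):
    """Rank-r factorization of a gradient matrix before transmission.
    A sketch of the low-rank idea, not LASER's exact channel-aware encoder."""
    U, S, Vh = torch.linalg.svd(grad, full_matrices=False)
    return U[:, :rank] * S[:rank], Vh[:rank]   # thin factors P (m x r) and Q (r x n)

def decompress_gradient(P: torch.Tensor, Q: torch.Tensor) -> torch.Tensor:
    """Receiver-side reconstruction of the (approximate) gradient."""
    return P @ Q
```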
Neurosymbolic Grounding for Compositional World Models
methods: Cosmos uses a novel form of neurosymbolic grounding, consisting of (i) neurosymbolic scene encodings that represent each entity in a scene with a real vector computed by a neural encoder plus a vector of composable symbols describing the entity's attributes, and (ii) a neurosymbolic attention mechanism that binds these entities to learned rules of interaction.
results: Cosmos is evaluated on an established blocks-pushing domain and establishes a new state-of-the-art for compositional generalization in world modeling.Abstract
We introduce Cosmos, a framework for object-centric world modeling that is designed for compositional generalization (CG), i.e., high performance on unseen input scenes obtained through the composition of known visual "atoms." The central insight behind Cosmos is the use of a novel form of neurosymbolic grounding. Specifically, the framework introduces two new tools: (i) neurosymbolic scene encodings, which represent each entity in a scene using a real vector computed using a neural encoder, as well as a vector of composable symbols describing attributes of the entity, and (ii) a neurosymbolic attention mechanism that binds these entities to learned rules of interaction. Cosmos is end-to-end differentiable; also, unlike traditional neurosymbolic methods that require representations to be manually mapped to symbols, it computes an entity's symbolic attributes using vision-language foundation models. Through an evaluation that considers two different forms of CG on an established blocks-pushing domain, we show that the framework establishes a new state-of-the-art for CG in world modeling.
摘要
我们介绍 Cosmos,一个基于物体中心的世界模型框架,旨在实现 compositional generalization(CG),即在由已知视觉“原子”组合而成的未见输入场景上取得高性能。Cosmos 的核心思想是一种新的神经符号对应(neurosymbolic grounding)。
Compression of Recurrent Neural Networks using Matrix Factorization
results: 在信号处理任务上,可以对循环神经网络进行14倍的压缩,而且最多产生1.4%的性能下降。Abstract
Compressing neural networks is a key step when deploying models for real-time or embedded applications. Factorizing the model's matrices using low-rank approximations is a promising method for achieving compression. While it is possible to set the rank before training, this approach is neither flexible nor optimal. In this work, we propose a post-training rank-selection method called Rank-Tuning that selects a different rank for each matrix. Used in combination with training adaptations, our method achieves high compression rates with no or little performance degradation. Our numerical experiments on signal processing tasks show that we can compress recurrent neural networks up to 14x with at most 1.4% relative performance reduction.
摘要
压缩神经网络是部署模型实时或嵌入式应用时的关键步骤。使用低级别应对approximation фактор化模型的矩阵是一种有前途的方法。although it is possible to set the rank before training, this approach is neither flexible nor optimal。在这种工作中,我们提出了一种post-training rank-selection方法,叫做Rank-Tuning,它可以为每个矩阵选择不同的极值。通过与训练改进相结合,我们的方法可以实现高度压缩,而无或很少的性能下降。我们的数字实验表明,可以将回传神经网络压缩到14倍,最多1.4%的相对性能下降。
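A sketch of a post-training, per-matrix rank selection in the spirit of Rank-Tuning; the error-tolerance criterion below stands in for the paper's accuracy-driven selection and is an assumption.

```python
import numpy as np

def rank_tune(weight: np.ndarray, tolerance: float = 0.05):
    """Pick the smallest rank whose truncated SVD keeps the relative
    reconstruction error of one weight matrix below `tolerance`.
    The tolerance criterion is an illustrative stand-in for the paper's
    accuracy-driven selection."""
    U, S, Vt = np.linalg.svd(weight, full_matrices=False)
    total = np.linalg.norm(weight)
    for r in range(1, len(S) + 1):
        approx = (U[:, :r] * S[:r]) @ Vt[:r]
        if np.linalg.norm(weight - approx) / total <= tolerance:
            return r, U[:, :r] * S[:r], Vt[:r]  # replace W by the product of two thin matrices
    return len(S), U * S, Vt
```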
paper_authors: Herbie Bradley, Andrew Dai, Hannah Teufel, Jenny Zhang, Koen Oostermeijer, Marco Bellagente, Jeff Clune, Kenneth Stanley, Grégory Schott, Joel Lehman
for: This paper explores the application of Quality-Diversity (QD) search algorithms to creative writing.
methods: The paper uses language models (LMs) to guide the search, applying LM feedback both to generate variations of candidate texts and to evaluate their quality and diversity.
results: Compared with non-QD controls, QDAIF covers more of the specified search space with high-quality creative texts, and human evaluation shows reasonable agreement between AI and human judgments.Abstract
In many text-generation problems, users may prefer not only a single response, but a diverse range of high-quality outputs from which to choose. Quality-diversity (QD) search algorithms aim at such outcomes, by continually improving and diversifying a population of candidates. However, the applicability of QD to qualitative domains, like creative writing, has been limited by the difficulty of algorithmically specifying measures of quality and diversity. Interestingly, recent developments in language models (LMs) have enabled guiding search through AI feedback, wherein LMs are prompted in natural language to evaluate qualitative aspects of text. Leveraging this development, we introduce Quality-Diversity through AI Feedback (QDAIF), wherein an evolutionary algorithm applies LMs to both generate variation and evaluate the quality and diversity of candidate text. When assessed on creative writing domains, QDAIF covers more of a specified search space with high-quality samples than do non-QD controls. Further, human evaluation of QDAIF-generated creative texts validates reasonable agreement between AI and human evaluation. Our results thus highlight the potential of AI feedback to guide open-ended search for creative and original solutions, providing a recipe that seemingly generalizes to many domains and modalities. In this way, QDAIF is a step towards AI systems that can independently search, diversify, evaluate, and improve, which are among the core skills underlying human society's capacity for innovation.
摘要
在许多文本生成问题中,用户可能会希望不仅得到一个响应,而是一个多样化的高质量输出,从中选择。质量多样性(QD)搜索算法target这些输出,通过不断改进和多样化候选人 population。然而,在艺术创作领域,QD的应用受到了算法specifying measure of quality和多样性的difficulty的限制。幸好,现在的语言模型(LM)的发展使得可以通过人工智能反馈来引导搜索,其中LM被提示在自然语言中评估文本的质量和多样性。基于这一发展,我们介绍了质量多样性通过人工智能反馈(QDAIF),其中演化算法利用LM来生成多样性和评估候选人的质量和多样性。在艺术创作领域进行评估,QDAIF能够更好地覆盖指定的搜索空间,并且输出高质量的样本。此外,人类对QDAIF生成的艺术创作文本进行评估,并与人工智能评估 Display reasonable agreement。我们的结果因此表明了人工智能反馈可以引导开放的搜索,以找到创新和原创的解决方案,这种方法似乎可以普遍应用于多个领域和模式。因此,QDAIF是一步向AI系统独立搜索、多样化、评估和改进,这些技能是人类社会创新的核心。
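A MAP-Elites-style loop with LM feedback gives a rough picture of the QDAIF procedure; `mutate`, `quality`, and `diversity_bin` are placeholder LM calls, and the archive layout is an assumption.

```python
import random

def qdaif_loop(seed_texts, mutate, quality, diversity_bin, iterations=1000):
    """Quality-diversity search with LM feedback (a sketch).

    `mutate(text)` asks an LM for a variation, `quality(text)` asks an LM to
    rate it, and `diversity_bin(text)` maps it to a discrete attribute bin --
    all three are placeholders, not the paper's prompts.
    """
    archive = {}  # bin index -> (score, text)
    for text in seed_texts:
        archive[diversity_bin(text)] = (quality(text), text)
    for _ in range(iterations):
        _, parent = random.choice(list(archive.values()))
        child = mutate(parent)
        b, q = diversity_bin(child), quality(child)
        if b not in archive or q > archive[b][0]:  # keep the best text per bin
            archive[b] = (q, child)
    return archive  # a diverse set of high-quality texts, one per bin
```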
A Use Case: Reformulating Query Rewriting as a Statistical Machine Translation Problem
results: 该论文通过使用这种查询 rewrite 管道,可以提高搜索结果的相关性和精度。Abstract
One of the most important challenges for modern search engines is to retrieve relevant web content based on user queries. In order to achieve this challenge, search engines have a module to rewrite user queries. That is why modern web search engines utilize some statistical and neural models used in the natural language processing domain. Statistical machine translation is a well-known NLP method among them. The paper proposes a query rewriting pipeline based on a monolingual machine translation model that learns to rewrite Arabic user search queries. This paper also describes preprocessing steps to create a mapping between user queries and web page titles.
摘要
现代搜索引擎面临着一个非常重要的挑战,即根据用户查询 retrieve relevante web内容。为了解决这个挑战,搜索引擎通常具有一个用于重写用户查询的模块。为了实现这一目标,现代网络搜索引擎通常使用自然语言处理领域的统计学和神经网络模型。统计机器翻译是这些模型中的一个非常知名的方法。本文提出了一个基于单语言机器翻译模型的查询重写管道,该管道可以学习重写阿拉伯语用户搜索查询。本文还描述了为创建用户查询和网页标题之间的映射而进行的预处理步骤。
PSYCHIC: A Neuro-Symbolic Framework for Knowledge Graph Question-Answering Grounding
results: 我们的系统在问题回答任务(KGQA)上取得了0.18%的F1分数,并在实体链接任务(EL)上取得了71.00%的分数。Abstract
The Scholarly Question Answering over Linked Data (Scholarly QALD) at The International Semantic Web Conference (ISWC) 2023 challenge presents two sub-tasks to tackle question answering (QA) over knowledge graphs (KGs). We answer the KGQA over DBLP (DBLP-QUAD) task by proposing a neuro-symbolic (NS) framework based on PSYCHIC, an extractive QA model capable of identifying the query and entities related to a KG question. Our system achieved a F1 score of 00.18% on question answering and came in third place for entity linking (EL) with a score of 71.00%.
摘要
“学术问答 над连接数据”(Scholarly QALD)在2023年国际semantic Web会议(ISWC)挑战中提出了两个子任务来解决知识图(KG)上的问答(QA)。我们使用一个基于neuro-symbolic(NS)框架,其中PSYCHIC是一种提取式QA模型,可以识别KG问题中的查询和相关的实体。我们的系统在问答任务上达到了0.18%的F1分数,并在实体关联(EL)任务上达到了71.00%的分数。
On existence, uniqueness and scalability of adversarial robustness measures for AI classifiers
methods: 论文使用了一种基于泛函理论的方法,并提出了一种基于 entropy 的 AI 模型。
results: 论文提出了一种可 analytical 计算的最小敌对路径(MAP)和最小敌对距离(MAD),并在不同类型的 AI 工具(如神经网络、启动随机森林、GLM 和 EAI)上进行了实践计算和比较。 另外,论文还解释了如何使用 MAP 提供专门的患者特定风险 Mitigation 方案。Abstract
Simply-verifiable mathematical conditions for existence, uniqueness and explicit analytical computation of minimal adversarial paths (MAP) and minimal adversarial distances (MAD) for (locally) uniquely-invertible classifiers, for generalized linear models (GLM), and for entropic AI (EAI) are formulated and proven. Practical computation of MAP and MAD, their comparison and interpretations for various classes of AI tools (for neuronal networks, boosted random forests, GLM and EAI) are demonstrated on the common synthetic benchmarks: on a double Swiss roll spiral and its extensions, as well as on the two biomedical data problems (for the health insurance claim predictions, and for the heart attack lethality classification). On biomedical applications it is demonstrated how MAP provides unique minimal patient-specific risk-mitigating interventions in the predefined subsets of accessible control variables.
摘要
通过可简单验证的数学条件,给出了最小对抗路径(MAP)和最小对抗距离(MAD)的存在性、唯一性以及显式解析计算方法,适用于(局部)唯一可逆分类器、广义线性模型(GLM)和熵型 AI(EAI)。在各类 AI 工具(神经网络、提升随机森林、GLM 和 EAI)上演示了 MAP 和 MAD 的实际计算、比较和解释,所用基准包括 Double Swiss Roll Spiral 及其扩展,以及两个生物医学数据问题(健康保险理赔预测和心脏病致死性分类)。在生物医学应用中,MAP 在预定义的可控变量子集内给出了唯一的、最小的患者特定风险缓解干预。
Towards a Deep Learning-based Online Quality Prediction System for Welding Processes
methods: The paper proposes a concept with four major phases: collection and management of multi-sensor data (e.g. current and voltage), real-time processing and feature engineering of the time-series data using autoencoders, quality prediction with suitable recurrent deep learning models, and model evolution under changing process conditions using continual learning.
results: The paper does not report experimental results yet; it lays out a deep-learning-based predictive quality concept that will serve as the foundation for an online quality prediction system monitoring the GMAW process in running production.Abstract
The digitization of manufacturing processes enables promising applications for machine learning-assisted quality assurance. A widely used manufacturing process that can strongly benefit from data-driven solutions is gas metal arc welding (GMAW). The welding process is characterized by complex cause-effect relationships between material properties, process conditions and weld quality. In non-laboratory environments with frequently changing process parameters, accurate determination of weld quality by destructive testing is economically unfeasible. Deep learning offers the potential to identify the relationships in available process data and predict the weld quality from process observations. In this paper, we present a concept for a deep learning based predictive quality system in GMAW. At its core, the concept involves a pipeline consisting of four major phases: collection and management of multi-sensor data (e.g. current and voltage), real-time processing and feature engineering of the time series data by means of autoencoders, training and deployment of suitable recurrent deep learning models for quality predictions, and model evolutions under changing process conditions using continual learning. The concept provides the foundation for future research activities in which we will realize an online predictive quality system for running production.
摘要
随着制造过程的数字化,机器学习助成质量监控应用得到了推动。一种广泛使用的制造过程是气密填充焊接(GMAW)。焊接过程具有复杂的原因关系,其中物质性、过程参数和焊接质量之间存在紧密的关系。在实际生产环境中,通过采用破坏性测试准确地确定焊接质量是经济不可行。深度学习可以识别可用的过程数据中的关系,预测焊接质量基于过程观察。在这篇论文中,我们提出了一种基于深度学习的预测质量系统概念。这个概念包括四个主要阶段:收集和管理多感器数据(如电流和电压)、实时处理和特征工程时间序列数据使用自动编码器、训练和部署适合的循环深度学习模型以进行质量预测、以及在过程参数变化时使用连续学习进行模型进化。这个概念提供了未来研究活动的基础,我们将实现在生产中运行的在线预测质量系统。
Heart Disease Detection using Vision-Based Transformer Models from ECG Images
results: Experimental results show that the proposed framework achieves remarkable classification performance in detecting heart disease from ECG images.Abstract
Heart disease, also known as cardiovascular disease, is a prevalent and critical medical condition characterized by the impairment of the heart and blood vessels, leading to various complications such as coronary artery disease, heart failure, and myocardial infarction. The timely and accurate detection of heart disease is of paramount importance in clinical practice. Early identification of individuals at risk enables proactive interventions, preventive measures, and personalized treatment strategies to mitigate the progression of the disease and reduce adverse outcomes. In recent years, the field of heart disease detection has witnessed notable advancements due to the integration of sophisticated technologies and computational approaches. These include machine learning algorithms, data mining techniques, and predictive modeling frameworks that leverage vast amounts of clinical and physiological data to improve diagnostic accuracy and risk stratification. In this work, we propose to detect heart disease from ECG images using cutting-edge technologies, namely vision transformer models. These models are Google-Vit, Microsoft-Beit, and Swin-Tiny. To the best of our knowledge, this is the initial endeavor concentrating on the detection of heart diseases through image-based ECG data by employing cuttingedge technologies namely, transformer models. To demonstrate the contribution of the proposed framework, the performance of vision transformer models are compared with state-of-the-art studies. Experiment results show that the proposed framework exhibits remarkable classification results.
摘要
心脏病,也称为心血管疾病,是一种非常普遍和严重的医疗病种, caracterized by impairment of the heart and blood vessels, leading to various complications such as coronary artery disease, heart failure, and myocardial infarction. 时间和准确的检测心脏病非常重要在临床实践中。早期识别患者风险可以实施措施,预防措施和个性化的治疗策略,以降低疾病的进程和不良结果。在过去几年,心脏病检测领域已经经历了显著的进步,它们是通过融合先进技术和计算方法的努力。这些包括机器学习算法、数据挖掘技术和预测模型框架,这些技术可以利用庞大的临床和生理学数据,提高诊断精度和风险分级。在这项工作中,我们提议使用电cardioGRAM(ECG)图像来检测心脏病,使用 cutting-edge 技术,即视Transformer模型。这些模型包括Google-Vit、Microsoft-Beit和Swin-Tiny。到目前为止,这是第一个集中于通过图像基本ECG数据来检测心脏病的研究,使用 cutting-edge 技术,即 transformer 模型。为了证明提案的贡献,我们将比较视Transformer模型的表现与现有研究的最佳成果。实验结果显示,我们的提案框架具有惊人的分类效果。
Cross-attention Spatio-temporal Context Transformer for Semantic Segmentation of Historical Maps
results: The proposed model outperforms state-of-the-art models that use only spatial or only temporal context for segmenting historical maps; fusing spatial and temporal contexts improves performance while remaining more lightweight than pure vision transformers.Abstract
Historical maps provide useful spatio-temporal information on the Earth's surface before modern earth observation techniques came into being. To extract information from maps, neural networks, which gain wide popularity in recent years, have replaced hand-crafted map processing methods and tedious manual labor. However, aleatoric uncertainty, known as data-dependent uncertainty, inherent in the drawing/scanning/fading defects of the original map sheets and inadequate contexts when cropping maps into small tiles considering the memory limits of the training process, challenges the model to make correct predictions. As aleatoric uncertainty cannot be reduced even with more training data collected, we argue that complementary spatio-temporal contexts can be helpful. To achieve this, we propose a U-Net-based network that fuses spatio-temporal features with cross-attention transformers (U-SpaTem), aggregating information at a larger spatial range as well as through a temporal sequence of images. Our model achieves a better performance than other state-or-art models that use either temporal or spatial contexts. Compared with pure vision transformers, our model is more lightweight and effective. To the best of our knowledge, leveraging both spatial and temporal contexts have been rarely explored before in the segmentation task. Even though our application is on segmenting historical maps, we believe that the method can be transferred into other fields with similar problems like temporal sequences of satellite images. Our code is freely accessible at https://github.com/chenyizi086/wu.2023.sigspatial.git.
摘要
历史地图提供了现代对地观测技术出现之前地球表面的宝贵空间-时间信息。为了从地图中提取信息,近年来广受欢迎的神经网络已经取代了手工设计的地图处理方法和繁琐的人工劳动。然而,偶然不确定性(即数据相关的不确定性)源于原始图幅的绘制/扫描/褪色缺陷,以及受训练内存限制而将地图裁剪为小块时上下文不足,使模型难以做出正确预测。由于即使收集更多训练数据也无法降低这种偶然不确定性,我们认为互补的空间-时间上下文会有所帮助。为此,我们提出了一种基于U-Net、利用交叉注意力Transformer融合空间-时间特征的网络(U-SpaTem),既在更大的空间范围内聚合信息,也利用图像的时间序列。我们的模型优于仅使用时间或空间上下文的其他最新模型;与纯视觉Transformer相比,我们的模型更轻量且更有效。据我们所知,在分割任务中同时利用空间和时间上下文此前鲜有探索。虽然我们的应用是历史地图分割,但我们相信该方法可以迁移到具有类似问题的其他领域,例如卫星图像的时间序列。我们的代码可在 https://github.com/chenyizi086/wu.2023.sigspatial.git 免费获取。
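The exact U-SpaTem architecture is available in the linked repository; as a rough illustration of the cross-attention idea only (tokens of the current map tile attending to features gathered from a wider spatial range and other map editions), here is a minimal PyTorch sketch with assumed tensor shapes.

```python
# Rough sketch of cross-attention between a target tile's features (queries) and
# features pooled from a temporal sequence of maps (keys/values). Shapes are assumed,
# not taken from the U-SpaTem paper.
import torch
import torch.nn as nn

class TemporalCrossAttention(nn.Module):
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, target_feats, context_feats):
        # target_feats:  (B, N, C) tokens from the map tile being segmented
        # context_feats: (B, M, C) tokens from neighbouring tiles / other time steps
        fused, _ = self.attn(query=target_feats, key=context_feats, value=context_feats)
        return self.norm(target_feats + fused)   # residual connection

x = torch.randn(2, 256, 64)        # e.g. 16x16 spatial tokens of the current map
ctx = torch.randn(2, 3 * 256, 64)  # tokens from three other map editions
print(TemporalCrossAttention()(x, ctx).shape)  # -> torch.Size([2, 256, 64])
```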
Blending gradient boosted trees and neural networks for point and probabilistic forecasting of hierarchical time series
results: 结果显示,机器学习模型的多样性和对验证样本的精心选择是方法有效性的关键因素。尽管预测数据具有固有的层次结构(12 级),我们提出的方案并未利用该层次结构。使用所提方法,我们的团队在精度(Accuracy)和不确定性(Uncertainty)两个赛道上均进入金牌区间。Abstract
In this paper we tackle the problem of point and probabilistic forecasting by describing a blending methodology of machine learning models that belong to the gradient boosted trees and neural networks families. These principles were successfully applied in the recent M5 Competition on both the Accuracy and Uncertainty tracks. The key points of our methodology are: a) transform the task to regression on sales for a single day, b) information-rich feature engineering, c) create a diverse set of state-of-the-art machine learning models, and d) carefully construct validation sets for model tuning. We argue that the diversity of the machine learning models, along with the careful selection of validation examples, were the most important ingredients for the effectiveness of our approach. Although the forecasting data had an inherent hierarchical structure (12 levels), none of our proposed solutions exploited that hierarchical scheme. Using the proposed methodology, our team was ranked within the gold medal range in both the Accuracy and the Uncertainty tracks. Inference code along with already trained models is available at https://github.com/IoannisNasios/M5_Uncertainty_3rd_place
摘要
在这篇论文中,我们通过描述一种融合梯度提升树与神经网络两类机器学习模型的混合方法,来解决点预测和概率预测问题。这些原则在最近的 M5 竞赛的精度和不确定性两个赛道上均得到了成功应用。我们方法的关键点包括:a) 将任务转化为单日销量的回归问题;b) 信息丰富的特征工程;c) 构建多样化的最新机器学习模型;d) 精心构建用于模型调参的验证集。我们认为,机器学习模型的多样性以及对验证样本的精心选择,是我们方法有效性的最重要因素。尽管预测数据具有固有的层次结构(12 级),我们提出的方案并未利用该层次结构。使用所提方法,我们的团队在精度和不确定性两个赛道上均进入金牌区间。推理代码及已训练模型可在 https://github.com/IoannisNasios/M5_Uncertainty_3rd_place 获取。
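The competition code is linked above; as a generic illustration of the blending idea, independent of the authors' feature engineering, the sketch below fits a gradient-boosted-tree regressor and a small neural network on the same regression target and averages their predictions (synthetic data, scikit-learn).

```python
# Generic blending sketch (not the authors' M5 pipeline): equal-weight average of a
# gradient-boosted-tree model and a neural network on the same regression target.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=2000, n_features=20, noise=5.0, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

gbt = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
mlp = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0).fit(X_tr, y_tr)

preds = np.column_stack([gbt.predict(X_val), mlp.predict(X_val)])
blend = preds.mean(axis=1)                      # simple equal-weight blend
for name, p in [("gbt", preds[:, 0]), ("mlp", preds[:, 1]), ("blend", blend)]:
    rmse = np.sqrt(np.mean((p - y_val) ** 2))
    print(f"{name}: RMSE={rmse:.2f}")
```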
Identifying and Adapting Transformer-Components Responsible for Gender Bias in an English Language Model
results: 我们将这些方法应用于 GPT-2 small 和性别偏见问题,并利用所发现的组件集合进行参数高效微调,在不损害通用语言建模性能的情况下缓解性别偏见。Abstract
Language models (LMs) exhibit and amplify many types of undesirable biases learned from the training data, including gender bias. However, we lack tools for effectively and efficiently changing this behavior without hurting general language modeling performance. In this paper, we study three methods for identifying causal relations between LM components and particular output: causal mediation analysis, automated circuit discovery and our novel, efficient method called DiffMask+ based on differential masking. We apply the methods to GPT-2 small and the problem of gender bias, and use the discovered sets of components to perform parameter-efficient fine-tuning for bias mitigation. Our results show significant overlap in the identified components (despite huge differences in the computational requirements of the methods) as well as success in mitigating gender bias, with less damage to general language modeling compared to full model fine-tuning. However, our work also underscores the difficulty of defining and measuring bias, and the sensitivity of causal discovery procedures to dataset choice. We hope our work can contribute to more attention for dataset development, and lead to more effective mitigation strategies for other types of bias.
摘要
语言模型(LM)会表现并放大从训练数据中习得的多种不良偏见,其中包括性别偏见。然而,我们缺乏在不损害通用语言建模性能的前提下,高效且有效地改变这种行为的工具。在本文中,我们研究了三种识别LM组件与特定输出之间因果关系的方法:因果中介分析、自动回路发现,以及我们提出的基于差分掩码的新型高效方法 DiffMask+。我们将这些方法应用于 GPT-2 small 和性别偏见问题,并利用所发现的组件集合进行参数高效微调以缓解偏见。结果表明,尽管各方法的计算开销差异巨大,所识别出的组件有显著重叠;同时相较于全模型微调,这种方式在缓解性别偏见的同时对通用语言建模的损害更小。不过,我们的工作也凸显了定义和度量偏见的困难,以及因果发现过程对数据集选择的敏感性。我们希望这项工作能够引起对数据集构建的更多关注,并催生针对其他类型偏见的更有效缓解策略。
Denoising Heat-inspired Diffusion with Insulators for Collision Free Motion Planning
results: 试验结果显示,这个方法在多模式环境中表现更加稳定和有效,能够实现机器人对于可到达的目标的追寻,并避免障碍物的碰撞。Abstract
Diffusion models have risen as a powerful tool in robotics due to their flexibility and multi-modality. While some of these methods effectively address complex problems, they often depend heavily on inference-time obstacle detection and require additional equipment. Addressing these challenges, we present a method that, during inference time, simultaneously generates only reachable goals and plans motions that avoid obstacles, all from a single visual input. Central to our approach is the novel use of a collision-avoiding diffusion kernel for training. Through evaluations against behavior-cloning and classical diffusion models, our framework has proven its robustness. It is particularly effective in multi-modal environments, navigating toward goals and avoiding unreachable ones blocked by obstacles, while ensuring collision avoidance.
摘要
扩散模型凭借其灵活性和多模态性,已成为机器人学中的有力工具。虽然其中一些方法能够有效解决复杂问题,但它们往往严重依赖推理阶段的障碍物检测,并需要额外设备。为应对这些挑战,我们提出了一种方法,能够在推理阶段仅凭单一视觉输入,同时只生成可达的目标并规划避开障碍物的运动。我们方法的核心是在训练中采用新颖的避碰扩散核。通过与行为克隆和经典扩散模型的对比评估,我们的框架展示了其稳健性;它在多模态环境中尤为有效,能够朝可达目标导航、避开被障碍物阻挡的不可达目标,同时确保避免碰撞。
Time-Aware Representation Learning for Time-Sensitive Question Answering
results: 与基线模型相比,使用TCQA训练的模型在TimeQA数据集上的F1得分最高提升8.5。Abstract
Time is one of the crucial factors in real-world question answering (QA) problems. However, language models have difficulty understanding the relationships between time specifiers, such as 'after' and 'before', and numbers, since existing QA datasets do not include sufficient time expressions. To address this issue, we propose a Time-Context aware Question Answering (TCQA) framework. We suggest a Time-Context dependent Span Extraction (TCSE) task, and build a time-context dependent data generation framework for model training. Moreover, we present a metric to evaluate the time awareness of the QA model using TCSE. The TCSE task consists of a question and four sentence candidates classified as correct or incorrect based on time and context. The model is trained to extract the answer span from the sentence that is correct in both time and context. The model trained with TCQA outperforms baseline models by up to 8.5 points of F1-score on the TimeQA dataset. Our dataset and code are available at https://github.com/sonjbin/TCQA
摘要
时间是现实世界问答(QA)问题中的关键因素之一。然而,由于现有QA数据集缺乏足够的时间表达,语言模型难以理解'after'、'before'等时间限定词与数字之间的关系。为解决这一问题,我们提出了时间上下文感知问答(TCQA)框架。我们设计了时间上下文相关的答案片段抽取任务(TCSE),并构建了用于模型训练的时间上下文相关数据生成框架。此外,我们还基于TCSE提出了一个评估QA模型时间感知能力的指标。TCSE任务由一个问题和四个候选句子组成,候选句子依据时间和上下文被标注为正确或错误;模型需要从在时间和上下文上均正确的句子中抽取答案片段。使用TCQA训练的模型在TimeQA数据集上的F1得分比基线模型最高提升8.5。我们的数据集和代码可在 https://github.com/sonjbin/TCQA 获取。
Pretraining Language Models with Text-Attributed Heterogeneous Graphs
results: 实验结果表明,本文的方法在三个不同领域数据集上的链接预测和节点分类任务中具有明显优势,并验证了各项设计的合理性。代码可在 https://github.com/Hope-Rita/THLM 获取。Abstract
In many real-world scenarios (e.g., academic networks, social platforms), different types of entities are not only associated with texts but also connected by various relationships, which can be abstracted as Text-Attributed Heterogeneous Graphs (TAHGs). Current pretraining tasks for Language Models (LMs) primarily focus on separately learning the textual information of each entity and overlook the crucial aspect of capturing topological connections among entities in TAHGs. In this paper, we present a new pretraining framework for LMs that explicitly considers the topological and heterogeneous information in TAHGs. Firstly, we define a context graph as neighborhoods of a target node within specific orders and propose a topology-aware pretraining task to predict nodes involved in the context graph by jointly optimizing an LM and an auxiliary heterogeneous graph neural network. Secondly, based on the observation that some nodes are text-rich while others have little text, we devise a text augmentation strategy to enrich textless nodes with their neighbors' texts for handling the imbalance issue. We conduct link prediction and node classification tasks on three datasets from various domains. Experimental results demonstrate the superiority of our approach over existing methods and the rationality of each design. Our code is available at https://github.com/Hope-Rita/THLM.
摘要
在许多现实场景(如学术网络、社交平台)中,不同类型的实体不仅带有文本,还通过多种关系相互连接,这可以抽象为带文本属性的异质图(TAHG)。现有的语言模型(LM)预训练任务主要单独学习每个实体的文本信息,而忽视了捕捉TAHG中实体间拓扑连接这一关键方面。本文提出了一种新的LM预训练框架,显式地考虑TAHG中的拓扑与异质信息。首先,我们将目标节点在特定阶数内的邻域定义为上下文图,并提出一个拓扑感知的预训练任务,通过联合优化LM与辅助异质图神经网络来预测上下文图中包含的节点。其次,基于部分节点文本丰富而另一些节点几乎没有文本的观察,我们设计了文本增强策略,利用邻居的文本来充实缺少文本的节点,以应对这种不均衡问题。我们在三个不同领域的数据集上进行了链接预测和节点分类任务。实验结果表明,我们的方法优于现有方法,并验证了各项设计的合理性。我们的代码可在 https://github.com/Hope-Rita/THLM 获取。
Safety-Gymnasium: A Unified Safe Reinforcement Learning Benchmark
results: 本研究提供了一个全面的安全性能评估工具,以促进强化学习在安全关键场景中的应用。Abstract
Artificial intelligence (AI) systems possess significant potential to drive societal progress. However, their deployment often faces obstacles due to substantial safety concerns. Safe reinforcement learning (SafeRL) emerges as a solution to optimize policies while simultaneously adhering to multiple constraints, thereby addressing the challenge of integrating reinforcement learning in safety-critical scenarios. In this paper, we present an environment suite called Safety-Gymnasium, which encompasses safety-critical tasks in both single and multi-agent scenarios, accepting vector and vision-only input. Additionally, we offer a library of algorithms named Safe Policy Optimization (SafePO), comprising 16 state-of-the-art SafeRL algorithms. This comprehensive library can serve as a validation tool for the research community. By introducing this benchmark, we aim to facilitate the evaluation and comparison of safety performance, thus fostering the development of reinforcement learning for safer, more reliable, and responsible real-world applications. The website of this project can be accessed at https://sites.google.com/view/safety-gymnasium.
摘要
人工智能(AI)系统具有推动社会进步的巨大潜力。然而,其部署常常因严重的安全顾虑而受阻。安全强化学习(SafeRL)作为一种解决方案,能够在优化策略的同时满足多重约束,从而应对在安全关键场景中应用强化学习的挑战。在本文中,我们提出了名为 Safety-Gymnasium 的环境套件,涵盖单智能体和多智能体场景下的安全关键任务,支持向量和纯视觉输入。此外,我们还提供了名为安全策略优化(SafePO)的算法库,包含16种最新的SafeRL算法。这个全面的库可作为研究社区的验证工具。通过推出该基准,我们希望促进安全性能的评估与比较,进而推动强化学习向更安全、更可靠、更负责任的现实应用发展。项目网站地址为 https://sites.google.com/view/safety-gymnasium 。
DepWiGNN: A Depth-wise Graph Neural Network for Multi-hop Spatial Reasoning in Text
results: 实验结果表明,与现有的空间推理方法相比,DepWiGNN在两个复杂的多跳空间推理数据集上表现更佳,并在捕捉长距离依赖关系方面具有优势。Abstract
Spatial reasoning in text plays a crucial role in various real-world applications. Existing approaches for spatial reasoning typically infer spatial relations from pure text, which overlook the gap between natural language and symbolic structures. Graph neural networks (GNNs) have showcased exceptional proficiency in inducing and aggregating symbolic structures. However, classical GNNs face challenges in handling multi-hop spatial reasoning due to the over-smoothing issue, i.e., the performance decreases substantially as the number of graph layers increases. To cope with these challenges, we propose a novel Depth-Wise Graph Neural Network (DepWiGNN). Specifically, we design a novel node memory scheme and aggregate the information over the depth dimension instead of the breadth dimension of the graph, which empowers the ability to collect long dependencies without stacking multiple layers. Experimental results on two challenging multi-hop spatial reasoning datasets show that DepWiGNN outperforms existing spatial reasoning methods. The comparisons with the other three GNNs further demonstrate its superiority in capturing long dependency in the graph.
摘要
文本中的空间推理在各类现实应用中发挥着关键作用。现有的空间推理方法通常直接从纯文本中推断空间关系,忽视了自然语言与符号结构之间的差距。图神经网络(GNN)在归纳与聚合符号结构方面表现出色。然而,经典GNN在处理多跳空间推理时面临过度平滑问题,即随着图层数增加,性能会大幅下降。为解决这些挑战,我们提出了一种新颖的按深度聚合的图神经网络(DepWiGNN)。具体而言,我们设计了一种新的节点记忆机制,并沿图的深度维度而非广度维度聚合信息,从而无需堆叠多层即可捕获长距离依赖。在两个具有挑战性的多跳空间推理数据集上的实验结果表明,DepWiGNN优于现有的空间推理方法;与另外三种GNN的比较进一步展示了其在捕捉图中长距离依赖方面的优势。
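The node-memory scheme is specified in the paper itself; purely as intuition for aggregating over the depth dimension, the following numpy sketch collects mean-pooled neighbour features at hop distances 1..K in a single pass and then fuses them, instead of stacking K message-passing layers. The simple mean fusion at the end is a placeholder for the learned node memory.

```python
# Intuition sketch (not the DepWiGNN implementation): for each node, build a memory of
# mean-pooled neighbour features at hop distances 1..K, then fuse over the depth axis.
import numpy as np

def depthwise_aggregate(adj: np.ndarray, feats: np.ndarray, max_depth: int = 3) -> np.ndarray:
    n = adj.shape[0]
    reach = np.eye(n, dtype=bool)        # nodes reachable within the current radius
    frontier = np.eye(n, dtype=bool)     # nodes at exactly the current hop distance
    per_depth = []
    for _ in range(max_depth):
        nxt = (frontier @ adj > 0) & ~reach      # newly reached nodes = next hop
        reach |= nxt
        frontier = nxt
        counts = nxt.sum(axis=1, keepdims=True)
        pooled = np.where(counts > 0, (nxt @ feats) / np.maximum(counts, 1), 0.0)
        per_depth.append(pooled)                 # mean feature of exactly-k-hop neighbours
    # Fuse the depth dimension (here: a plain mean as a stand-in for the learned node memory).
    return np.mean(np.stack(per_depth, axis=0), axis=0)

adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)      # a 4-node chain graph
feats = np.arange(8, dtype=float).reshape(4, 2)
print(depthwise_aggregate(adj, feats))
```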
Large Language Model for Multi-objective Evolutionary Optimization
results: 在不同测试基准上取得具有竞争力的表现;仅从少量实例学得的算子,也能在未见过的问题上展现稳健的泛化表现。Abstract
Multiobjective evolutionary algorithms (MOEAs) are major methods for solving multiobjective optimization problems (MOPs). Many MOEAs have been proposed in the past decades, of which the search operators need a carefully handcrafted design with domain knowledge. Recently, some attempts have been made to replace the manually designed operators in MOEAs with learning-based operators (e.g., neural network models). However, much effort is still required for designing and training such models, and the learned operators might not generalize well on new problems. To tackle the above challenges, this work investigates a novel approach that leverages the powerful large language model (LLM) to design MOEA operators. With proper prompt engineering, we successfully let a general LLM serve as a black-box search operator for decomposition-based MOEA (MOEA/D) in a zero-shot manner. In addition, by learning from the LLM behavior, we further design an explicit white-box operator with randomness and propose a new version of decomposition-based MOEA, termed MOEA/D-LO. Experimental studies on different test benchmarks show that our proposed method can achieve competitive performance with widely used MOEAs. It is also promising to see the operator only learned from a few instances can have robust generalization performance on unseen problems with quite different patterns and settings. The results reveal the potential benefits of using pre-trained LLMs in the design of MOEAs.
摘要
多目标进化算法(MOEA)是求解多目标优化问题(MOP)的主要方法。过去几十年中已提出了许多MOEA,其搜索算子往往需要依赖领域知识进行精心的人工设计。近来,一些工作尝试用基于学习的算子(例如神经网络模型)替代MOEA中人工设计的算子。然而,这类模型的设计与训练仍需大量工作,且学到的算子在新问题上未必具有良好的泛化能力。为应对上述挑战,本文研究了一种利用强大的大语言模型(LLM)来设计MOEA算子的新方法。通过恰当的提示工程,我们成功地让一个通用LLM以零样本方式充当分解式MOEA(MOEA/D)中的黑盒搜索算子。此外,通过学习LLM的行为,我们进一步设计了带随机性的显式白盒算子,并提出了新的分解式MOEA版本,称为MOEA/D-LO。在不同测试基准上的实验研究表明,所提方法能够取得与常用MOEA相当的竞争性性能;同时,仅从少量实例学得的算子在模式与设置差异很大的未见问题上也展现出稳健的泛化性能。这些结果揭示了在MOEA设计中使用预训练LLM的潜在优势。
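As a purely illustrative sketch of using a language model as a black-box variation operator inside a decomposition-based MOEA: `query_llm` is a hypothetical stand-in for whatever chat API is available, and the prompt format is not the one used in the MOEA/D-LO paper; a conventional crossover-plus-mutation fallback keeps the operator usable when the reply cannot be parsed.

```python
# Illustrative sketch only: an LLM acting as a black-box search operator for one MOEA/D subproblem.
import random

def query_llm(prompt: str) -> str:
    # Placeholder: in practice this would call an LLM and return its text completion.
    raise NotImplementedError

def llm_variation(parents, weight_vector, n_vars, lo=0.0, hi=1.0):
    prompt = (
        "You are a search operator in a multi-objective optimizer.\n"
        f"Decision variables: {n_vars} real numbers in [{lo}, {hi}].\n"
        f"Subproblem weight vector: {weight_vector}\n"
        "Parent solutions and their objective values:\n"
        + "\n".join(f"  x={x} f={f}" for x, f in parents)
        + "\nPropose ONE new solution likely to improve this subproblem. "
          "Reply with a comma-separated list of numbers only."
    )
    try:
        reply = query_llm(prompt)
        child = [min(hi, max(lo, float(v))) for v in reply.split(",")][:n_vars]
        if len(child) == n_vars:
            return child
    except Exception:
        pass
    # Fallback: ordinary uniform crossover plus Gaussian mutation if the LLM reply is unusable.
    a, b = parents[0][0], parents[1][0]
    return [min(hi, max(lo, random.choice(p) + random.gauss(0, 0.05))) for p in zip(a, b)]

print(llm_variation([([0.2, 0.8], [1.3]), ([0.6, 0.4], [0.9])], weight_vector=[0.5, 0.5], n_vars=2))
```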
Reliable Academic Conference Question Answering: A Study Based on Large Language Model
results: 该研究表明,采用结构感知的检索方法能够增强大语言模型的问答能力,并在学术会议问答任务上得到了验证。Abstract
The rapid growth of computer science has led to a proliferation of research presented at academic conferences, fostering global scholarly communication. Researchers consistently seek accurate, current information about these events at all stages. This data surge necessitates an intelligent question-answering system to efficiently address researchers' queries and ensure awareness of the latest advancements. The information of conferences is usually published on their official website, organized in a semi-structured way with a lot of text. To address this need, we have developed the ConferenceQA dataset for 7 diverse academic conferences with human annotations. Firstly, we employ a combination of manual and automated methods to organize academic conference data in a semi-structured JSON format. Subsequently, we annotate nearly 100 question-answer pairs for each conference. Each pair is classified into four different dimensions. To ensure the reliability of the data, we manually annotate the source of each answer. In light of recent advancements, Large Language Models (LLMs) have demonstrated impressive performance in various NLP tasks. They have demonstrated impressive capabilities in information-seeking question answering after instruction fine-tuning, and as such, we present our conference QA study based on LLM. Due to hallucination and outdated knowledge of LLMs, we adopt retrieval based methods to enhance LLMs' question-answering abilities. We have proposed a structure-aware retrieval method, specifically designed to leverage inherent structural information during the retrieval process. Empirical validation on the ConferenceQA dataset has demonstrated the effectiveness of this method. The dataset and code are readily accessible on https://github.com/zjukg/ConferenceQA.
摘要
计算机科学的迅猛发展使学术会议上发表的研究成果激增,促进了全球学术交流。研究人员在各个阶段都持续需要关于这些会议的准确、最新信息。这种数据激增要求一个智能问答系统来高效回应研究人员的询问,确保其了解最新进展。会议信息通常发布在官方网站上,以半结构化方式组织,并包含大量文本。为满足这一需求,我们构建了覆盖7个不同学术会议、带人工标注的 ConferenceQA 数据集。首先,我们结合人工与自动方法,将学术会议数据整理为半结构化的JSON格式;随后,为每个会议标注了近100个问答对,并将每个问答对划分为四个不同维度。为保证数据可靠性,我们人工标注了每个答案的来源。鉴于近期进展,大语言模型(LLM)在多种自然语言处理任务中表现出色,经指令微调后在信息检索式问答中同样展现了令人印象深刻的能力,因此我们基于LLM开展会议问答研究。针对LLM的幻觉和知识过时问题,我们采用基于检索的方法来增强其问答能力,并提出了一种专门利用检索过程中固有结构信息的结构感知检索方法。在 ConferenceQA 数据集上的实证验证表明了该方法的有效性。数据集和代码可在 https://github.com/zjukg/ConferenceQA 获取。
Be Bayesian by Attachments to Catch More Uncertainty
methods: 该论文提出了带附加结构的贝叶斯神经网络(ABNN),由一个期望模块和若干分布模块组成。期望模块是一个专注于原始任务的深度主干网络,而分布模块则是作为主干附件的小型贝叶斯结构,用于捕捉ID与OOD数据中的不确定性。
results: 该论文为ABNN的收敛提供了理论分析,并通过实验与多种最新的不确定性估计方法比较,结果显示ABNN具有更优的不确定性估计表现。Abstract
Bayesian Neural Networks (BNNs) have become one of the promising approaches for uncertainty estimation due to their solid theoretical foundations. However, the performance of BNNs is limited by their ability to capture uncertainty. Instead of only seeking the distribution of neural network weights through in-distribution (ID) data, in this paper we propose a new Bayesian Neural Network with an Attached structure (ABNN) to catch more uncertainty from out-of-distribution (OOD) data. We first construct a mathematical description of the uncertainty of OOD data according to the prior distribution, and then develop an attached Bayesian structure to integrate the uncertainty of OOD data into the backbone network. ABNN is composed of an expectation module and several distribution modules. The expectation module is a backbone deep network which focuses on the original task, and the distribution modules are mini Bayesian structures which serve as attachments to the backbone. In particular, the distribution modules aim at extracting the uncertainty from both ID and OOD data. We further provide theoretical analysis for the convergence of ABNN, and experimentally validate its superiority by comparing it with some state-of-the-art uncertainty estimation methods. Code will be made available.
摘要
贝叶斯神经网络(BNN)凭借坚实的理论基础,已成为不确定性估计的一类有前途的方法。然而,BNN的性能受限于其捕捉不确定性的能力。与仅通过分布内(ID)数据学习网络权重分布不同,本文提出一种带附加结构的贝叶斯神经网络(ABNN),以从分布外(OOD)数据中捕捉更多不确定性。我们首先依据先验分布构建了OOD数据不确定性的数学描述,随后设计了附加的贝叶斯结构,将OOD数据的不确定性整合进主干网络。ABNN由一个期望模块和若干分布模块组成:期望模块是专注于原始任务的深度主干网络,分布模块则是作为主干附件的小型贝叶斯结构,用于从ID和OOD数据中提取不确定性。我们进一步给出了ABNN收敛性的理论分析,并通过与多种最新的不确定性估计方法比较进行实验验证,证明了其优越性。代码将会公开。
Testing the Consistency of Performance Scores Reported for Binary Classification Problems
results: 通过三个医学相关的应用实例,研究人员可以使用所提方法检测报告性能分数中的不一致,以维护相关科学领域的严谨性。Abstract
Binary classification is a fundamental task in machine learning, with applications spanning various scientific domains. Whether scientists are conducting fundamental research or refining practical applications, they typically assess and rank classification techniques based on performance metrics such as accuracy, sensitivity, and specificity. However, reported performance scores may not always serve as a reliable basis for research ranking. This can be attributed to undisclosed or unconventional practices related to cross-validation, typographical errors, and other factors. In a given experimental setup, with a specific number of positive and negative test items, most performance scores can assume specific, interrelated values. In this paper, we introduce numerical techniques to assess the consistency of reported performance scores and the assumed experimental setup. Importantly, the proposed approach does not rely on statistical inference but uses numerical methods to identify inconsistencies with certainty. Through three different applications related to medicine, we demonstrate how the proposed techniques can effectively detect inconsistencies, thereby safeguarding the integrity of research fields. To benefit the scientific community, we have made the consistency tests available in an open-source Python package.
摘要
二分类是机器学习中的基础任务,其应用遍及各个科学领域。无论是开展基础研究还是完善实际应用,研究者通常依据准确率、敏感度和特异度等性能指标来评估并排序分类技术。然而,文献中报告的性能分数并不总能作为研究排名的可靠依据,原因可能包括未披露或不规范的交叉验证做法、笔误等因素。在给定的实验设置下,当正负测试样本数量确定时,大多数性能分数只能取特定且彼此相关的数值。在本文中,我们提出了数值方法,用于检验所报告的性能分数与所假设的实验设置是否一致。值得注意的是,所提方法不依赖统计推断,而是利用数值手段以确定性的方式识别不一致。通过三个医学相关的应用实例,我们展示了所提技术如何有效地发现不一致,从而维护相关研究领域的严谨性。为惠及科学界,我们已将这些一致性检验以开源 Python 包的形式发布。
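The open-source package implements the full battery of tests; the core idea can be illustrated with a brute-force check. With p positive and n negative test items, accuracy, sensitivity, and specificity are all determined by the integer TP and TN counts, so reported figures are consistent only if some feasible (TP, TN) pair reproduces all three within rounding. A small self-contained sketch (not the paper's package):

```python
# Brute-force illustration of a score-consistency test: enumerate every feasible
# (TP, TN) pair and check whether the three reported scores can co-occur.
def scores_consistent(acc, sens, spec, p, n, decimals=4):
    tol = 0.5 / 10**decimals          # rounding half-width of the reported figures
    for tp in range(p + 1):
        for tn in range(n + 1):
            ok = (abs(tp / p - sens) <= tol
                  and abs(tn / n - spec) <= tol
                  and abs((tp + tn) / (p + n) - acc) <= tol)
            if ok:
                return True
    return False

# Example: with 100 positives and 100 negatives, sens=0.92 forces TP=92 and spec=0.98
# forces TN=98, so accuracy must be 0.95; an accuracy of 0.9556 is inconsistent.
print(scores_consistent(acc=0.9556, sens=0.9200, spec=0.9800, p=100, n=100))  # False
print(scores_consistent(acc=0.9500, sens=0.9200, spec=0.9800, p=100, n=100))  # True
```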
Powerset multi-class cross entropy loss for neural speaker diarization
methods: 该方法在置换不变训练和(局部)有监督EEND分离的基础上,将原有的多标签分类形式改进为幂集多类分类。
results: 在9个标准测试集上的大量实验表明,该方法取得了显著更好的性能(尤其是在重叠语音上),并对领域失配更加鲁棒,同时消除了检测阈值超参数。Abstract
Since its introduction in 2019, the whole end-to-end neural diarization (EEND) line of work has been addressing speaker diarization as a frame-wise multi-label classification problem with permutation-invariant training. Despite EEND showing great promise, a few recent works took a step back and studied the possible combination of (local) supervised EEND diarization with (global) unsupervised clustering. Yet, these hybrid contributions did not question the original multi-label formulation. We propose to switch from multi-label (where any two speakers can be active at the same time) to powerset multi-class classification (where dedicated classes are assigned to pairs of overlapping speakers). Through extensive experiments on 9 different benchmarks, we show that this formulation leads to significantly better performance (mostly on overlapping speech) and robustness to domain mismatch, while eliminating the detection threshold hyperparameter, critical for the multi-label formulation.
摘要
自2019年提出以来,端到端神经说话人分离(EEND)这一系列工作一直将说话人分离建模为逐帧的多标签分类问题,并采用置换不变训练。尽管EEND展现出了很大潜力,近期一些工作退一步,研究了(局部)有监督EEND分离与(全局)无监督聚类相结合的可能性。然而,这些混合方法并未质疑最初的多标签建模方式。我们提议从多标签分类(任意两个说话人可同时活跃)转向幂集多类分类(为重叠说话人的组合分配专门的类别)。通过在9个不同基准上的大量实验,我们表明这种建模方式带来了显著更好的性能(主要体现在重叠语音上)以及对领域失配更强的鲁棒性,同时消除了多标签建模中至关重要的检测阈值超参数。
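For intuition, a small sketch of the multi-label-to-powerset mapping described above (each combination of up to two overlapping speakers gets a dedicated class); the class ordering and helper names are arbitrary, not taken from the paper's implementation.

```python
# Sketch of the multi-label -> powerset multi-class mapping (up to two overlapping speakers).
from itertools import combinations

def build_powerset(num_speakers: int, max_overlap: int = 2):
    classes = [frozenset()]                                   # silence / no active speaker
    for k in range(1, max_overlap + 1):
        classes += [frozenset(c) for c in combinations(range(num_speakers), k)]
    return classes

def encode(active_speakers, classes):
    """Map a frame's set of active speakers to a single class index."""
    return classes.index(frozenset(active_speakers))

def decode(class_idx, classes, num_speakers):
    """Back to a multi-label 0/1 vector."""
    return [int(s in classes[class_idx]) for s in range(num_speakers)]

classes = build_powerset(num_speakers=3)            # 1 + 3 + 3 = 7 classes
print(len(classes))                                 # 7
print(encode({0, 2}, classes))                      # index of the {0, 2} overlap class
print(decode(encode({0, 2}, classes), classes, 3))  # [1, 0, 1]
```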
RTNH+: Enhanced 4D Radar Object Detection Network using Combined CFAR-based Two-level Preprocessing and Vertical Encoding
paper_authors: Seung-Hyun Kong, Dong-Hee Paek, Sangjae Cho
for: 提高4D雷达对3D物体检测和周围物体相对径向速度估计的精度
methods: 提出了两种新算法:一是基于CFAR的联合两级预处理算法(CCTP),利用同一份4D雷达测量生成具有不同特征的两种滤波后测量,丰富输入4D雷达目标检测网络的信息;二是垂直编码算法(VE),用于有效编码道路目标的垂直特征
results: 与RTNH相比,RTNH+在${AP}_{3D}^{IoU=0.3}$和${AP}_{3D}^{IoU=0.5}$上分别取得10.14%和16.12%的性能提升。Abstract
Four-dimensional (4D) Radar is a useful sensor for 3D object detection and the relative radial speed estimation of surrounding objects under various weather conditions. However, since Radar measurements are corrupted with invalid components such as noise, interference, and clutter, it is necessary to employ a preprocessing algorithm before the 3D object detection with neural networks. In this paper, we propose RTNH+ that is an enhanced version of RTNH, a 4D Radar object detection network, by two novel algorithms. The first algorithm is the combined constant false alarm rate (CFAR)-based two-level preprocessing (CCTP) algorithm that generates two filtered measurements of different characteristics using the same 4D Radar measurements, which can enrich the information of the input to the 4D Radar object detection network. The second is the vertical encoding (VE) algorithm that effectively encodes vertical features of the road objects from the CCTP outputs. We provide details of the RTNH+, and demonstrate that RTNH+ achieves significant performance improvement of 10.14\% in ${AP}_{3D}^{IoU=0.3}$ and 16.12\% in ${AP}_{3D}^{IoU=0.5}$ over RTNH.
摘要
四维(4D)雷达是一种在各种天气条件下进行3D目标检测以及周围目标相对径向速度估计的有用传感器。然而,由于雷达测量中混有噪声、干扰和杂波等无效成分,在使用神经网络进行3D目标检测之前需要先进行预处理。在本文中,我们提出了 RTNH+,它是4D雷达目标检测网络 RTNH 的增强版本,引入了两种新算法。第一种是基于恒虚警率(CFAR)的联合两级预处理算法(CCTP),利用同一份4D雷达测量生成两种具有不同特征的滤波后测量,从而丰富4D雷达目标检测网络的输入信息。第二种是垂直编码(VE)算法,能有效编码 CCTP 输出中道路目标的垂直特征。我们详细介绍了 RTNH+,并证明其相对于 RTNH 在 ${AP}_{3D}^{IoU=0.3}$ 和 ${AP}_{3D}^{IoU=0.5}$ 上分别取得了10.14%和16.12%的显著性能提升。
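The CCTP algorithm itself operates on 4D Radar tensors; as background intuition only, the sketch below applies a generic 1D cell-averaging CFAR detector at two different threshold scales to the same measurement, producing a denser and a sparser filtered output in the spirit of a two-level preprocessing.

```python
# Generic 1D cell-averaging CFAR sketch (for intuition only; not the paper's CCTP).
import numpy as np

def ca_cfar(power, guard=2, train=8, scale=3.0):
    """Boolean detection mask: cell power > scale * local noise estimate from training cells."""
    n = len(power)
    detections = np.zeros(n, dtype=bool)
    for i in range(n):
        lo, hi = max(0, i - guard - train), min(n, i + guard + train + 1)
        window = np.r_[power[lo:max(0, i - guard)], power[min(n, i + guard + 1):hi]]
        if window.size and power[i] > scale * window.mean():
            detections[i] = True
    return detections

rng = np.random.default_rng(0)
signal = rng.exponential(1.0, 200)          # noise floor
signal[[50, 120]] += 25.0                   # two strong targets
loose = ca_cfar(signal, scale=3.0)          # denser, noisier filtered measurement
strict = ca_cfar(signal, scale=8.0)         # sparser, high-confidence filtered measurement
print(np.flatnonzero(strict), int(loose.sum()))
```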
Automatic Hallucination Assessment for Aligned Large Language Models via Transferable Adversarial Attacks
for: This paper aims to develop a method of automatically generating evaluation data for large language models (LLMs) to measure their reliability and detect hallucinations.
methods: The proposed method, called AutoDebug, uses prompting chaining to generate transferable adversarial attacks in the form of question-answering examples. These examples are designed to trigger hallucination behaviors in LLMs.
results: The paper evaluates the effectiveness of AutoDebug using two variants of the Natural Questions (NQ) dataset and a collection of open-source and proprietary LLMs. The results show that LLMs are likely to hallucinate in certain question-answering scenarios, and the adversarial examples generated by AutoDebug are transferable across all considered LLMs.Abstract
Although remarkable progress has been achieved in preventing large language model (LLM) hallucinations using instruction tuning and retrieval augmentation, it remains challenging to measure the reliability of LLMs using human-crafted evaluation data which is not available for many tasks and domains and could suffer from data leakage. Inspired by adversarial machine learning, this paper aims to develop a method of automatically generating evaluation data by appropriately modifying existing data on which LLMs behave faithfully. Specifically, this paper presents AutoDebug, an LLM-based framework to use prompting chaining to generate transferable adversarial attacks in the form of question-answering examples. We seek to understand the extent to which these examples trigger the hallucination behaviors of LLMs. We implement AutoDebug using ChatGPT and evaluate the resulting two variants of a popular open-domain question-answering dataset, Natural Questions (NQ), on a collection of open-source and proprietary LLMs under various prompting settings. Our generated evaluation data is human-readable and, as we show, humans can answer these modified questions well. Nevertheless, we observe pronounced accuracy drops across multiple LLMs including GPT-4. Our experimental results show that LLMs are likely to hallucinate in two categories of question-answering scenarios where (1) there are conflicts between knowledge given in the prompt and their parametric knowledge, or (2) the knowledge expressed in the prompt is complex. Finally, we find that the adversarial examples generated by our method are transferable across all considered LLMs. The examples generated by a small model can be used to debug a much larger model, making our approach cost-effective.
摘要
尽管借助指令微调和检索增强,在防止大语言模型(LLM)产生幻觉方面已取得显著进展,但由于人工构建的评估数据在许多任务和领域中并不可得,且可能存在数据泄露,如何度量LLM的可靠性仍然充满挑战。受对抗机器学习启发,本文旨在提出一种自动生成评估数据的方法,即对LLM本能回答正确的现有数据进行恰当修改。具体而言,本文提出 AutoDebug,一个基于LLM、利用提示链生成可迁移对抗攻击(以问答样例形式)的框架。我们希望了解这些样例在多大程度上会触发LLM的幻觉行为。我们使用 ChatGPT 实现 AutoDebug,并在开放域问答数据集 Natural Questions(NQ)的两个变体上,针对一系列开源和闭源LLM在多种提示设置下进行评估。我们生成的评估数据是人类可读的,并且如我们所示,人类能够很好地回答这些修改后的问题;然而,包括 GPT-4 在内的多个LLM却出现了明显的准确率下降。实验结果表明,LLM 在以下两类问答场景中容易产生幻觉:(1)提示中给出的知识与其参数化知识发生冲突;(2)提示中表达的知识较为复杂。最后,我们发现该方法生成的对抗样例在所有被考察的LLM之间均可迁移:由小模型生成的样例即可用于调试规模大得多的模型,使我们的方法具有较高的性价比。
Towards Anytime Fine-tuning: Continually Pre-trained Language Models with Hypernetwork Prompt
results: 我们在两个真实世界 dataset 上进行了实验,并获得了3.57%和3.4%的提高,证明了我们的方法的有效性。Abstract
Continual pre-training has been urgent for adapting a pre-trained model to a multitude of domains and tasks in the fast-evolving world. In practice, a continually pre-trained model is expected to demonstrate not only greater capacity when fine-tuned on pre-trained domains but also a non-decreasing performance on unseen ones. In this work, we first investigate such anytime fine-tuning effectiveness of existing continual pre-training approaches, concluding with unanimously decreased performance on unseen domains. To this end, we propose a prompt-guided continual pre-training method, where we train a hypernetwork to generate domain-specific prompts by both agreement and disagreement losses. The agreement loss maximally preserves the generalization of a pre-trained model to new domains, and the disagreement one guards the exclusiveness of the generated hidden states for each domain. Remarkably, prompts by the hypernetwork alleviate the domain identity when fine-tuning and promote knowledge transfer across domains. Our method achieved improvements of 3.57% and 3.4% on two real-world datasets (including domain shift and temporal shift), respectively, demonstrating its efficacy.
摘要
在快速变化的世界中,持续预训练对于使预训练模型适应众多领域和任务已变得十分迫切。在实践中,经过持续预训练的模型不仅应在已预训练的领域上微调时表现更强,在未见领域上的性能也不应下降。在本工作中,我们首先考察了现有持续预训练方法的这种"随时微调"有效性,结论是它们在未见领域上的性能无一例外地下降。为此,我们提出了一种提示引导的持续预训练方法:训练一个超网络,通过一致性损失与差异性损失共同生成领域特定的提示。一致性损失最大限度地保留了预训练模型对新领域的泛化能力,而差异性损失则保证为每个领域生成的隐藏状态相互独立。值得注意的是,超网络生成的提示在微调时缓解了领域身份问题,并促进了跨领域的知识迁移。我们的方法在两个真实世界数据集(分别涉及领域迁移和时间迁移)上分别取得了3.57%和3.4%的提升,验证了其有效性。
GraphGPT: Graph Instruction Tuning for Large Language Models
results: 通过对超级vised和零shot图学习任务进行评估,表明了我们的框架在不同下游任务中的优于state-of-the-art基elines。Abstract
Graph Neural Networks (GNNs) have advanced graph structure understanding via recursive information exchange and aggregation among graph nodes. To improve model robustness, self-supervised learning (SSL) has emerged as a promising approach for data augmentation. However, existing methods for generating pre-trained graph embeddings often rely on fine-tuning with specific downstream task labels, which limits their usability in scenarios where labeled data is scarce or unavailable. To address this, our research focuses on advancing the generalization capabilities of graph models in challenging zero-shot learning scenarios. Inspired by the success of large language models (LLMs), we aim to develop a graph-oriented LLM that can achieve high generalization across diverse downstream datasets and tasks, even without any information available from the downstream graph data. In this work, we present the GraphGPT framework that aligns LLMs with graph structural knowledge with a graph instruction tuning paradigm. Our framework incorporates a text-graph grounding component to establish a connection between textual information and graph structures. Additionally, we propose a dual-stage instruction tuning paradigm, accompanied by a lightweight graph-text alignment projector. This paradigm explores self-supervised graph structural signals and task-specific graph instructions, to guide LLMs in understanding complex graph structures and improving their adaptability across different downstream tasks. Our framework is evaluated on supervised and zero-shot graph learning tasks, demonstrating superior generalization and outperforming state-of-the-art baselines.
摘要
图神经网络(GNN)通过图节点间的递归信息交换与聚合,提升了对图结构的理解。为增强模型的鲁棒性,自监督学习(SSL)已成为一种有前景的数据增强方法。然而,现有生成预训练图嵌入的方法往往依赖于使用特定下游任务标签进行微调,限制了其在标注数据稀缺或缺失场景中的适用性。为此,我们的研究致力于提升图模型在具有挑战性的零样本学习场景下的泛化能力。受大语言模型(LLM)成功的启发,我们希望开发一种面向图的LLM,即使无法获得下游图数据的任何信息,也能在多样的下游数据集和任务上取得良好的泛化。在本工作中,我们提出了 GraphGPT 框架,通过图指令微调范式将LLM与图结构知识对齐。该框架包含一个文本-图对齐组件,用于建立文本信息与图结构之间的联系;此外,我们提出了配有轻量级图-文本对齐投影器的两阶段指令微调范式,利用自监督的图结构信号和任务特定的图指令,引导LLM理解复杂的图结构并提升其在不同下游任务上的适应性。我们在有监督和零样本图学习任务上评估了该框架,结果显示其泛化能力更佳,并超越了最新的基线方法。
SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation
paper_authors: Chongyu Fan, Jiancheng Liu, Yihua Zhang, Dennis Wei, Eric Wong, Sijia Liu
for: This paper focuses on the problem of machine unlearning (MU) and introduces a new method called saliency unlearning (SalUn) to address the limitations of existing MU methods.
methods: The SalUn method uses the concept of weight saliency to direct MU’s attention toward specific model weights, improving effectiveness and efficiency.
results: SalUn narrows the performance gap with exact unlearning and achieves better stability and accuracy in high-variance random data forgetting and in preventing conditional diffusion models from generating harmful images.
results: SalUn方法在高方差随机数据遗忘中缩小了与精确遗忘(移除遗忘数据后从头重训模型)之间的性能差距,并在防止条件扩散模型生成有害图像方面达到了接近100%的遗忘准确率。Abstract
With evolving data regulations, machine unlearning (MU) has become an important tool for fostering trust and safety in today's AI models. However, existing MU methods focusing on data and/or weight perspectives often grapple with limitations in unlearning accuracy, stability, and cross-domain applicability. To address these challenges, we introduce the concept of 'weight saliency' in MU, drawing parallels with input saliency in model explanation. This innovation directs MU's attention toward specific model weights rather than the entire model, improving effectiveness and efficiency. The resultant method that we call saliency unlearning (SalUn) narrows the performance gap with 'exact' unlearning (model retraining from scratch after removing the forgetting dataset). To the best of our knowledge, SalUn is the first principled MU approach adaptable enough to effectively erase the influence of forgetting data, classes, or concepts in both image classification and generation. For example, SalUn yields a stability advantage in high-variance random data forgetting, e.g., with a 0.2% gap compared to exact unlearning on the CIFAR-10 dataset. Moreover, in preventing conditional diffusion models from generating harmful images, SalUn achieves nearly 100% unlearning accuracy, outperforming current state-of-the-art baselines like Erased Stable Diffusion and Forget-Me-Not.
摘要
随着数据法规的不断演进,机器遗忘(MU)已成为提升当今AI模型可信与安全的重要工具。然而,现有侧重数据和/或权重视角的MU方法,往往在遗忘精度、稳定性和跨领域适用性方面存在局限。为应对这些挑战,我们借鉴模型解释中的输入显著性,在MU中引入"权重显著性"的概念。这一创新将MU的注意力引向特定的模型权重而非整个模型,从而提升了有效性与效率。由此得到的方法称为显著性遗忘(SalUn),它缩小了与"精确遗忘"(移除遗忘数据后从头重训模型)之间的性能差距。据我们所知,SalUn是首个足够通用、能在图像分类与生成任务中有效消除遗忘数据、类别或概念影响的原则性MU方法。例如,在高方差随机数据遗忘中,SalUn展现出稳定性优势,在CIFAR-10数据集上与精确遗忘的差距仅为0.2%;此外,在防止条件扩散模型生成有害图像方面,SalUn达到了接近100%的遗忘准确率,优于 Erased Stable Diffusion 和 Forget-Me-Not 等当前最优基线。
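The released SalUn code defines the exact procedure; the following is only a rough PyTorch sketch of the underlying idea of gradient-based weight saliency: score weights by the gradient magnitude of a forgetting loss, keep the top fraction as a mask, and restrict unlearning updates to that salient subset. Thresholding, loss choice, and the update rule here are assumptions, not the paper's settings.

```python
# Rough sketch of gradient-based weight saliency for unlearning (not the released SalUn code).
import torch

def saliency_masks(model, forget_loader, loss_fn, keep_ratio=0.5, device="cpu"):
    model.zero_grad()
    for x, y in forget_loader:                      # accumulate gradients over the forget set
        loss_fn(model(x.to(device)), y.to(device)).backward()
    scores = torch.cat([p.grad.abs().flatten()
                        for p in model.parameters() if p.grad is not None])
    threshold = torch.quantile(scores, 1.0 - keep_ratio)   # keep the most salient weights
    masks = {name: (p.grad.abs() >= threshold).float()
             for name, p in model.named_parameters() if p.grad is not None}
    model.zero_grad()
    return masks

def masked_update(model, masks, lr=1e-3):
    """One SGD-style unlearning step that only touches salient weights.
    Assumes gradients of the unlearning objective have already been computed."""
    with torch.no_grad():
        for name, p in model.named_parameters():
            if p.grad is not None and name in masks:
                p -= lr * p.grad * masks[name]
```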
Not All Countries Celebrate Thanksgiving: On the Cultural Dominance in Large Language Models
results: 我们的研究强调了在开发和部署LMM时的文化主导和伦理考虑的必要性。我们示出了在模型开发和部署中使用两种简单的方法(即预训练数据更加多样化和文化意识提醒)可以有效缓解LMM中的文化主导问题。Abstract
In this paper, we identify a cultural dominance issue within large language models (LLMs) due to the predominant use of English data in model training (e.g. ChatGPT). LLMs often provide inappropriate English-culture-related answers that are not relevant to the expected culture when users ask in non-English languages. To systematically evaluate the cultural dominance issue, we build a benchmark that consists of both concrete (e.g. holidays and songs) and abstract (e.g. values and opinions) cultural objects. Empirical results show that the representative GPT models suffer from the culture dominance problem, where GPT-4 is the most affected while text-davinci-003 suffers the least from this problem. Our study emphasizes the need for critical examination of cultural dominance and ethical consideration in their development and deployment. We show two straightforward methods in model development (i.e. pretraining on more diverse data) and deployment (e.g. culture-aware prompting) can significantly mitigate the cultural dominance issue in LLMs.
摘要
在这篇论文中,我们指出了大语言模型(LLM)中存在的文化主导问题,其根源在于模型训练(如ChatGPT)主要使用英语数据。当用户以非英语语言提问时,LLM常常给出与所期望文化无关、偏向英语文化的不恰当回答。为系统评估文化主导问题,我们构建了一个基准,既包含具体的文化对象(如节日和歌曲),也包含抽象的文化对象(如价值观和观点)。实证结果表明,具有代表性的GPT模型都受到文化主导问题的影响,其中GPT-4受影响最严重,而text-davinci-003受影响最小。我们的研究强调,在LLM的开发和部署中必须审慎检视文化主导问题并进行伦理考量。我们展示了模型开发阶段(在更多样化的数据上预训练)和部署阶段(采用具备文化意识的提示)的两种简单方法,能够显著缓解LLM中的文化主导问题。
GRAPE-S: Near Real-Time Coalition Formation for Multiple Service Collectives
paper_authors: Grace Diehl, Julie A. Adams
for: 该论文旨在解决军事与灾害应急应用中机器人集群的联盟形成问题,即将机器人划分到合适的任务团队。
methods: 该论文将GRAPE算法与服务模型相结合,并以两种基于拍卖的算法作为对比基线。
results: 论文表明,基于拍卖的算法难以迁移到分布式集群,会导致运行时间过长且解的效用偏低;GRAPE-S和Pair-GRAPE-S则能够在近实时内给出接近最优的解,支持具有多种服务的大规模分布式集群。Abstract
Robotic collectives for military and disaster response applications require coalition formation algorithms to partition robots into appropriate task teams. Collectives' missions will often incorporate tasks that require multiple high-level robot behaviors or services, which coalition formation must accommodate. The highly dynamic and unstructured application domains also necessitate that coalition formation algorithms produce near optimal solutions (i.e., >95% utility) in near real-time (i.e., <5 minutes) with very large collectives (i.e., hundreds of robots). No previous coalition formation algorithm satisfies these requirements. An initial evaluation found that traditional auction-based algorithms' runtimes are too long, even though the centralized simulator incorporated ideal conditions unlikely to occur in real-world deployments (i.e., synchronization across robots and perfect, instantaneous communication). The hedonic game-based GRAPE algorithm can produce solutions in near real-time, but cannot be applied to multiple service collectives. This manuscript integrates GRAPE and a services model, producing GRAPE-S and Pair-GRAPE-S. These algorithms and two auction baselines were evaluated using a centralized simulator with up to 1000 robots, and via the largest distributed coalition formation simulated evaluation to date, with up to 500 robots. The evaluations demonstrate that auctions transfer poorly to distributed collectives, resulting in excessive runtimes and low utility solutions. GRAPE-S satisfies the target domains' coalition formation requirements, producing near optimal solutions in near real-time, and Pair-GRAPE-S more than satisfies the domain requirements, producing optimal solutions in near real-time. GRAPE-S and Pair-GRAPE-S are the first algorithms demonstrated to support near real-time coalition formation for very large, distributed collectives with multiple services.
摘要
面向军事和灾害应急应用的机器人集群需要联盟形成算法,将机器人划分到合适的任务团队。集群的任务常常涉及多种高层机器人行为或服务,联盟形成算法必须能够支持这些需求。应用领域高度动态且缺乏结构,这要求联盟形成算法在面对超大规模集群(数百台机器人)时,能够在近实时(即少于5分钟)内给出接近最优(即效用大于95%)的解。此前没有任何联盟形成算法能满足这些要求。初步评估发现,即便中央式模拟器假设了现实部署中难以具备的理想条件(机器人间同步以及完美、即时的通信),传统拍卖式算法的运行时间仍然过长;基于享乐博弈的GRAPE算法虽能近实时求解,却无法用于多服务集群。本文将GRAPE与服务模型相结合,提出GRAPE-S和Pair-GRAPE-S。我们在多达1000台机器人的中央式模拟器中,以及迄今规模最大的分布式联盟形成模拟评估(多达500台机器人)中,对这些算法和两个拍卖式基线进行了评估。结果表明,拍卖式算法难以迁移到分布式集群,运行时间过长且解的效用偏低;GRAPE-S满足了目标领域的联盟形成要求,能在近实时内给出接近最优的解,而Pair-GRAPE-S则超出领域要求,能在近实时内给出最优解。GRAPE-S和Pair-GRAPE-S是首批被证明可支持具有多种服务的超大规模分布式集群近实时联盟形成的算法。
An Exploration of In-Context Learning for Speech Language Model
results: 研究表明,通过提档训练,语音LM可以在未看到任务示例的情况下完成几拟学习,并在语音分类任务中证明了其可行性。Abstract
Ever since the development of GPT-3 in the natural language processing (NLP) field, in-context learning (ICL) has played an important role in utilizing large language models (LLMs). By presenting the LM utterance-label demonstrations at the input, the LM can accomplish few-shot learning without relying on gradient descent or requiring explicit modification of its parameters. This enables the LM to learn and adapt in a black-box manner. Despite the success of ICL in NLP, little work is exploring the possibility of ICL in speech processing. This study proposes the first exploration of ICL with a speech LM without text supervision. We first show that the current speech LM does not have the ICL capability. With the proposed warmup training, the speech LM can, therefore, perform ICL on unseen tasks. In this work, we verify the feasibility of ICL for speech LM on speech classification tasks.
摘要
自GPT-3在自然语言处理(NLP)领域问世以来,上下文学习(ICL)在利用大语言模型(LLM)方面一直发挥着重要作用。通过在输入中给出语句-标签示例,语言模型无需依赖梯度下降或显式修改自身参数,即可完成少样本学习,从而以黑盒方式学习和适应。尽管ICL在NLP中取得了成功,但在语音处理中探索ICL可能性的工作仍然很少。本研究首次探索了在不使用文本监督的情况下,语音语言模型的ICL能力。我们首先表明,现有的语音语言模型并不具备ICL能力;而通过所提出的预热训练,语音语言模型便能够在未见任务上进行ICL。在本工作中,我们在语音分类任务上验证了语音语言模型进行ICL的可行性。
Affective Conversational Agents: Understanding Expectations and Personal Influences
methods: 这项研究对745名受访者进行问卷调查,以评估其对不同情感能力的期望与偏好;具体而言,研究考察了受访者在32个不同场景中对能够感知、回应和模拟情感的AI代理的偏好。
results: 研究发现,受访者对AI会话代理情感能力的需求因应用场景而异;其中涉及人际互动、情感支持和创造性任务的场景最受偏好。此外,情绪重评和人格特质等因素也会影响人们期望AI代理具备的情感能力。Abstract
The rise of AI conversational agents has broadened opportunities to enhance human capabilities across various domains. As these agents become more prevalent, it is crucial to investigate the impact of different affective abilities on their performance and user experience. In this study, we surveyed 745 respondents to understand the expectations and preferences regarding affective skills in various applications. Specifically, we assessed preferences concerning AI agents that can perceive, respond to, and simulate emotions across 32 distinct scenarios. Our results indicate a preference for scenarios that involve human interaction, emotional support, and creative tasks, with influences from factors such as emotional reappraisal and personality traits. Overall, the desired affective skills in AI agents depend largely on the application's context and nature, emphasizing the need for adaptability and context-awareness in the design of affective AI conversational agents.
摘要
AI对话代理的兴起拓宽了在各个领域增强人类能力的机会。随着这类代理日益普及,研究不同情感能力对其表现和用户体验的影响变得尤为重要。在本研究中,我们调查了745名受访者,以了解其在各类应用中对情感能力的期望与偏好;具体而言,我们评估了受访者在32个不同场景中,对能够感知、回应和模拟情感的AI代理的偏好。结果显示,受访者更偏好涉及人际互动、情感支持和创造性任务的场景,且情绪重评与人格特质等因素会对偏好产生影响。总体而言,AI代理所需的情感能力在很大程度上取决于应用的情境与性质,这强调了在设计情感化AI对话代理时需要具备适应性与情境感知能力。
Rethinking the Construction of Effective Metrics for Understanding the Mechanisms of Pretrained Language Models
results: 实验结果表明,使用树Topological Probe可以提供有用的信息,并且可以帮助提高 fine-tuning 性能。此外,研究还提出了一种可能的BERT-like预训练语言模型的工作机制。Abstract
Pretrained language models are expected to effectively map input text to a set of vectors while preserving the inherent relationships within the text. Consequently, designing a white-box model to compute metrics that reflect the presence of specific internal relations in these vectors has become a common approach for post-hoc interpretability analysis of pretrained language models. However, achieving interpretability in white-box models and ensuring the rigor of metric computation becomes challenging when the source model lacks inherent interpretability. Therefore, in this paper, we discuss striking a balance in this trade-off and propose a novel line to constructing metrics for understanding the mechanisms of pretrained language models. We have specifically designed a family of metrics along this line of investigation, and the model used to compute these metrics is referred to as the tree topological probe. We conducted measurements on BERT-large by using these metrics. Based on the experimental results, we propose a speculation regarding the working mechanism of BERT-like pretrained language models, as well as a strategy for enhancing fine-tuning performance by leveraging the topological probe to improve specific submodules.
摘要
预训练语言模型被期望能够将输入文本有效映射为一组向量,同时保留文本内在的关系。因此,设计白盒模型来计算能反映这些向量中特定内部关系的指标,已成为对预训练语言模型进行事后可解释性分析的常见做法。然而,当源模型本身缺乏内在可解释性时,既要保证白盒模型的可解释性,又要确保指标计算的严谨性,就变得十分困难。因此,本文讨论如何在这一权衡中取得平衡,并提出了一条构建指标、用于理解预训练语言模型机制的新思路。我们沿此思路专门设计了一族指标,并将用于计算这些指标的模型称为树拓扑探针(tree topological probe)。我们使用这些指标对BERT-large进行了测量。基于实验结果,我们对类BERT预训练语言模型的工作机制提出了推测,并提出了一种借助拓扑探针改进特定子模块、从而提升微调性能的策略。
MTS-LOF: Medical Time-Series Representation Learning via Occlusion-Invariant Features
paper_authors: Huayu Li, Ana S. Carreon-Rascon, Xiwen Chen, Geng Yuan, Ao Li
for: 该论文旨在改进医疗数据的表示学习,尤其是针对医疗时间序列数据。
methods: 该论文结合了对比学习与掩码自编码器(MAE)等自监督学习(SSL)方法,以提升面向医疗应用的表示学习;并采用多重掩码策略,通过遮挡数据的不同部分来构造多个视图,实现遮挡不变的特征学习。
results: 实验结果表明,MTS-LOF 优于其他方法,能更好地捕捉医疗时间序列数据中的上下文信息。这些结果表明,MTS-LOF 有望通过改进表示学习来增强医疗应用,并有助于理解医疗数据中的复杂关系。Abstract
Medical time series data are indispensable in healthcare, providing critical insights for disease diagnosis, treatment planning, and patient management. The exponential growth in data complexity, driven by advanced sensor technologies, has presented challenges related to data labeling. Self-supervised learning (SSL) has emerged as a transformative approach to address these challenges, eliminating the need for extensive human annotation. In this study, we introduce a novel framework for Medical Time Series Representation Learning, known as MTS-LOF. MTS-LOF leverages the strengths of contrastive learning and Masked Autoencoder (MAE) methods, offering a unique approach to representation learning for medical time series data. By combining these techniques, MTS-LOF enhances the potential of healthcare applications by providing more sophisticated, context-rich representations. Additionally, MTS-LOF employs a multi-masking strategy to facilitate occlusion-invariant feature learning. This approach allows the model to create multiple views of the data by masking portions of it. By minimizing the discrepancy between the representations of these masked patches and the fully visible patches, MTS-LOF learns to capture rich contextual information within medical time series datasets. The results of experiments conducted on diverse medical time series datasets demonstrate the superiority of MTS-LOF over other methods. These findings hold promise for significantly enhancing healthcare applications by improving representation learning. Furthermore, our work delves into the integration of joint-embedding SSL and MAE techniques, shedding light on the intricate interplay between temporal and structural dependencies in healthcare data. This understanding is crucial, as it allows us to grasp the complexities of healthcare data analysis.
摘要
医疗时间序列数据在医疗健康领域不可或缺,为疾病诊断、治疗规划和患者管理提供关键依据。在先进传感技术的推动下,数据复杂度呈指数级增长,也带来了数据标注方面的挑战。自监督学习(SSL)作为一种变革性方法应运而生,无需大量人工标注即可应对这些挑战。在本研究中,我们提出了一种新的医疗时间序列表示学习框架,称为MTS-LOF。MTS-LOF融合了对比学习与掩码自编码器(MAE)方法的优势,为医疗时间序列数据提供了一种独特的表示学习途径;通过结合这些技术,MTS-LOF能够提供更精细、富含上下文的表示,从而提升医疗应用的潜力。此外,MTS-LOF采用多重掩码策略以实现遮挡不变的特征学习:通过遮挡数据的不同部分来构造多个视图,并最小化被遮挡片段与完全可见片段表示之间的差异,从而学习医疗时间序列数据集中丰富的上下文信息。在多个医疗时间序列数据集上的实验结果证明了MTS-LOF相较其他方法的优越性,这有望通过改进表示学习显著增强医疗应用。此外,我们的工作还探讨了联合嵌入式SSL与MAE技术的结合,揭示了医疗数据中时间依赖与结构依赖之间错综复杂的相互作用;这一理解对于把握医疗数据分析的复杂性至关重要。
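As a toy illustration of the multi-masking idea (not the MTS-LOF architecture), the sketch below encodes several randomly masked views of a time series and pulls their embeddings toward the embedding of the fully visible series; a real self-supervised objective would add safeguards against representation collapse, and the encoder here is a deliberately trivial stand-in.

```python
# Toy multi-masking sketch: several masked views of a series are encoded and pulled
# toward the representation of the full series (assumed shapes: batch, channel, time).
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Conv1d(1, 16, 5, padding=2), nn.ReLU(),
                        nn.AdaptiveAvgPool1d(1), nn.Flatten())   # (B, 1, T) -> (B, 16)

def multi_mask_loss(x, num_views=4, mask_ratio=0.3):
    with torch.no_grad():
        target = encoder(x)                       # stop-gradient target from the full series
    loss = 0.0
    for _ in range(num_views):
        mask = (torch.rand_like(x) > mask_ratio).float()   # drop ~30% of time steps
        loss = loss + ((encoder(x * mask) - target) ** 2).mean()
    return loss / num_views

x = torch.randn(8, 1, 120)                        # batch of 8 univariate series, length 120
optim = torch.optim.Adam(encoder.parameters(), lr=1e-3)
loss = multi_mask_loss(x)
loss.backward()
optim.step()
print(float(loss))
```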
Know Where to Go: Make LLM a Relevant, Responsible, and Trustworthy Searcher
results: 大量实验和评估表明,我们的方法在相关性、责任性和可信度方面均优于各类SOTA方法。Abstract
The advent of Large Language Models (LLMs) has shown the potential to improve relevance and provide direct answers in web searches. However, challenges arise in validating the reliability of generated results and the credibility of contributing sources, due to the limitations of traditional information retrieval algorithms and the LLM hallucination problem. Aiming to create a "PageRank" for the LLM era, we strive to transform LLM into a relevant, responsible, and trustworthy searcher. We propose a novel generative retrieval framework leveraging the knowledge of LLMs to foster a direct link between queries and online sources. This framework consists of three core modules: Generator, Validator, and Optimizer, each focusing on generating trustworthy online sources, verifying source reliability, and refining unreliable sources, respectively. Extensive experiments and evaluations highlight our method's superior relevance, responsibility, and trustfulness against various SOTA methods.
摘要
大语言模型(LLM)的出现展现了在网络搜索中提升相关性并直接给出答案的潜力。然而,受传统信息检索算法的局限和LLM幻觉问题的影响,生成结果的可靠性与贡献来源的可信度难以验证。为打造LLM时代的"PageRank",我们致力于将LLM转化为相关、负责任且可信的搜索者。我们提出了一种新的生成式检索框架,利用LLM的知识在查询与在线信息源之间建立直接联系。该框架由三个核心模块组成:生成器负责生成可信的在线信息源,验证器负责校验来源的可靠性,优化器负责修正不可靠的来源。大量实验和评估表明,我们的方法在相关性、责任性和可信度方面均优于各类SOTA方法。
PoisonPrompt: Backdoor Attack on Prompt-based Large Language Models
results: 大量实验结果表明,POISONPROMPT能够成功攻击基于硬提示和软提示的两类LLM,并在三种流行的提示方法、六个数据集和三种常用LLM上展现出攻击的有效性、保真度和鲁棒性。这些结果表明,提示在提升LLM下游任务性能的同时,也带来了潜在的安全风险。Abstract
Prompts have significantly improved the performance of pretrained Large Language Models (LLMs) on various downstream tasks recently, making them increasingly indispensable for a diverse range of LLM application scenarios. However, the backdoor vulnerability, a serious security threat that can maliciously alter the victim model's normal predictions, has not been sufficiently explored for prompt-based LLMs. In this paper, we present POISONPROMPT, a novel backdoor attack capable of successfully compromising both hard and soft prompt-based LLMs. We evaluate the effectiveness, fidelity, and robustness of POISONPROMPT through extensive experiments on three popular prompt methods, using six datasets and three widely used LLMs. Our findings highlight the potential security threats posed by backdoor attacks on prompt-based LLMs and emphasize the need for further research in this area.
摘要
近来,提示(prompt)显著提升了预训练大语言模型(LLM)在各类下游任务上的性能,使其在越来越多的LLM应用场景中不可或缺。然而,后门漏洞——一种能够恶意篡改受害模型正常预测的严重安全威胁——在基于提示的LLM中尚未得到充分研究。在本文中,我们提出了POISONPROMPT,一种能够同时攻陷基于硬提示和软提示的LLM的新型后门攻击。我们在三种流行的提示方法、六个数据集和三种广泛使用的LLM上,通过大量实验评估了POISONPROMPT的有效性、保真度和鲁棒性。我们的研究结果凸显了后门攻击对基于提示的LLM构成的潜在安全威胁,并强调了在该方向开展进一步研究的必要性。
Towards Enhanced Local Explainability of Random Forests: a Proximity-Based Approach
results: 该方法能够为随机森林预测提供局部解释,为训练集中任意观测对任何模型预测的贡献生成归因,并与SHAP等按特征维度生成归因的现有方法形成互补,用于解释随机森林模型的样本外表现。Abstract
We initiate a novel approach to explain the out of sample performance of random forest (RF) models by exploiting the fact that any RF can be formulated as an adaptive weighted K nearest-neighbors model. Specifically, we use the proximity between points in the feature space learned by the RF to re-write random forest predictions exactly as a weighted average of the target labels of training data points. This linearity facilitates a local notion of explainability of RF predictions that generates attributions for any model prediction across observations in the training set, and thereby complements established methods like SHAP, which instead generates attributions for a model prediction across dimensions of the feature space. We demonstrate this approach in the context of a bond pricing model trained on US corporate bond trades, and compare our approach to various existing approaches to model explainability.
摘要
我们提出了一种解释随机森林(RF)模型样本外表现的新方法,其核心在于任何随机森林都可以表述为一种自适应加权的K近邻模型。具体来说,我们利用随机森林在特征空间中学到的样本间邻近度,将随机森林预测精确改写为训练样本目标标签的加权平均。这种线性形式带来了一种局部的可解释性:它能为训练集中的每个观测生成其对任意模型预测的归因,从而与SHAP等按特征维度生成归因的既有方法形成互补。我们以在美国公司债交易数据上训练的债券定价模型为例演示了该方法,并与多种现有的模型可解释性方法进行了比较。
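The identity behind the method can be checked numerically: when each tree's prediction equals the mean target of the training points in its leaf (e.g., with bootstrapping disabled, which is a simplification relative to the paper), the forest prediction is exactly a proximity-weighted average of training labels. A scikit-learn sketch:

```python
# Reconstruct random-forest predictions as a weighted average of training targets,
# with weights built from leaf co-membership (proximity) between test and train points.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=300, n_features=8, noise=1.0, random_state=0)
X_train, y_train, X_test = X[:250], y[:250], X[250:]

rf = RandomForestRegressor(n_estimators=50, bootstrap=False, random_state=0).fit(X_train, y_train)

train_leaves = rf.apply(X_train)            # (n_train, n_trees) leaf ids
test_leaves = rf.apply(X_test)              # (n_test,  n_trees)

weights = np.zeros((len(X_test), len(X_train)))
for t in range(rf.n_estimators):
    for i, leaf in enumerate(test_leaves[:, t]):
        same_leaf = train_leaves[:, t] == leaf
        weights[i, same_leaf] += 1.0 / same_leaf.sum()   # tree t's leaf mean as per-sample weights
weights /= rf.n_estimators

reconstructed = weights @ y_train
print(np.allclose(reconstructed, rf.predict(X_test)))    # True
```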
MAF: Multi-Aspect Feedback for Improving Reasoning in Large Language Models
paper_authors: Deepak Nathani, David Wang, Liangming Pan, William Yang Wang
for: 提高大语言模型的自然语言推理能力
methods: 多方面反馈机制,整合冻结的LM与外部工具等多个反馈模块,每个模块专注于一类特定错误
results: 通过纠正LM生成推理链中的多类错误,提升了LM在多种推理任务上的表现,相对提升幅度最高达20%(数学推理)和18%(逻辑蕴含)。Abstract
Language Models (LMs) have shown impressive performance in various natural language tasks. However, when it comes to natural language reasoning, LMs still face challenges such as hallucination, generating incorrect intermediate reasoning steps, and making mathematical errors. Recent research has focused on enhancing LMs through self-improvement using feedback. Nevertheless, existing approaches relying on a single generic feedback source fail to address the diverse error types found in LM-generated reasoning chains. In this work, we propose Multi-Aspect Feedback, an iterative refinement framework that integrates multiple feedback modules, including frozen LMs and external tools, each focusing on a specific error category. Our experimental results demonstrate the efficacy of our approach to addressing several errors in the LM-generated reasoning chain and thus improving the overall performance of an LM in several reasoning tasks. We see a relative improvement of up to 20% in Mathematical Reasoning and up to 18% in Logical Entailment.
Automated Repair of Declarative Software Specifications in the Era of Large Language Models
results: The study finds that ChatGPT can successfully repair some bugs that no other technique addresses, but its repairs also contain errors and hallucinations.Abstract
The growing adoption of declarative software specification languages, coupled with their inherent difficulty in debugging, has underscored the need for effective and automated repair techniques applicable to such languages. Researchers have recently explored various methods to automatically repair declarative software specifications, such as template-based repair, feedback-driven iterative repair, and bounded exhaustive approaches. The latest developments in large language models provide new opportunities for the automatic repair of declarative specifications. In this study, we assess the effectiveness of utilizing OpenAI's ChatGPT to repair software specifications written in the Alloy declarative language. Unlike imperative languages, specifications in Alloy are not executed but rather translated into logical formulas and evaluated using backend constraint solvers to identify specification instances and counterexamples to assertions. Our evaluation focuses on ChatGPT's ability to improve the correctness and completeness of Alloy declarative specifications through automatic repairs. We analyze the results produced by ChatGPT and compare them with those of leading automatic Alloy repair methods. Our study revealed that while ChatGPT falls short in comparison to existing techniques, it was able to successfully repair bugs that no other technique could address. Our analysis also identified errors in ChatGPT's generated repairs, including improper operator usage, type errors, higher-order logic misuse, and relational arity mismatches. Additionally, we observed instances of hallucinations in ChatGPT-generated repairs and inconsistency in its results. Our study provides valuable insights for software practitioners, researchers, and tool builders considering ChatGPT for declarative specification repairs.
Uncertainty-aware Parameter-Efficient Self-training for Semi-supervised Language Understanding
paper_authors: Jianing Wang, Qiushi Sun, Nuo Chen, Chengyu Wang, Jun Huang, Ming Gao, Xiang Li
for: Improving the performance of large pre-trained language models in low-resource scenarios where labeled data is scarce.
methods: Self-training as a semi-supervised learning (SSL) approach, using large-scale unlabeled data to generate synthetic examples; Monte Carlo dropout is added to the teacher model for uncertainty estimation, and multiple parameter-efficient learning (PEL) paradigms are used during student training so that only a small fraction of parameters is updated.
results: Experiments show that UPET improves both performance and efficiency, achieving substantial gains across multiple downstream tasks.Abstract
The recent success of large pre-trained language models (PLMs) heavily hinges on massive labeled data, which typically produces inferior performance in low-resource scenarios. To remedy this dilemma, we study self-training as one of the predominant semi-supervised learning (SSL) approaches, which utilizes large-scale unlabeled data to generate synthetic examples. However, too many noisy labels will hurt the model performance, and the self-training procedure requires multiple training iterations making it more expensive if all the model parameters of the PLM are updated. This paper presents UPET, a novel Uncertainty-aware Parameter-Efficient self-Training framework to effectively and efficiently address the labeled data scarcity issue. Specifically, we incorporate Monte Carlo (MC) dropout in Bayesian neural network (BNN) to perform uncertainty estimation for the teacher model and then judiciously select reliable pseudo-labeled examples based on confidence and certainty. During the student training, we introduce multiple parameter-efficient learning (PEL) paradigms that allow the optimization of only a small percentage of parameters. We also propose a novel Easy-Hard Contrastive Tuning to enhance the robustness and generalization. Extensive experiments over multiple downstream tasks demonstrate that UPET achieves a substantial improvement in terms of performance and efficiency. Our codes and data are released at https://github.com/wjn1996/UPET.
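The uncertainty-aware selection step can be sketched with Monte Carlo dropout in PyTorch. This is an illustrative reading of the abstract, not the released UPET code; the thresholds and callable names are our assumptions:

```python
import torch

def mc_dropout_select(teacher, unlabeled_loader, n_passes=10,
                      conf_thresh=0.9, unc_thresh=0.1):
    """Select pseudo-labeled examples that are both confident and certain under MC dropout."""
    teacher.train()   # keep dropout layers active during the stochastic forward passes
    selected = []
    with torch.no_grad():
        for x in unlabeled_loader:                                  # x: a batch of inputs
            probs = torch.stack([torch.softmax(teacher(x), dim=-1)
                                 for _ in range(n_passes)])         # (T, B, C)
            mean_p = probs.mean(dim=0)                              # (B, C)
            conf, pseudo = mean_p.max(dim=-1)                       # confidence and pseudo-label
            entropy = -(mean_p * mean_p.clamp_min(1e-12).log()).sum(-1)   # predictive uncertainty
            keep = (conf > conf_thresh) & (entropy < unc_thresh)
            selected.append((x[keep], pseudo[keep]))
    return selected
```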
Reducing Uncertainty in Sea-level Rise Prediction: A Spatial-variability-aware Approach
methods: A zonal regression model that addresses spatial variability and inter-model dependency.
results: Experiments show that learning the combination weights at a regional level with this approach yields more reliable sea-level rise predictions.Abstract
Given multi-model ensemble climate projections, the goal is to accurately and reliably predict future sea-level rise while lowering the uncertainty. This problem is important because sea-level rise affects millions of people in coastal communities and beyond due to climate change's impacts on polar ice sheets and the ocean. This problem is challenging due to spatial variability and unknowns such as possible tipping points (e.g., collapse of Greenland or West Antarctic ice-shelf), climate feedback loops (e.g., clouds, permafrost thawing), future policy decisions, and human actions. Most existing climate modeling approaches use the same set of weights globally, during either regression or deep learning to combine different climate projections. Such approaches are inadequate when different regions require different weighting schemes for accurate and reliable sea-level rise predictions. This paper proposes a zonal regression model which addresses spatial variability and model inter-dependency. Experimental results show more reliable predictions using the weights learned via this approach on a regional scale.
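The zonal idea — separate combination weights per region rather than a single global set — can be illustrated on toy data. The setup below (three zones, five ensemble members) is hypothetical and only shows the mechanism:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_loc, n_models = 200, 5
X = rng.normal(size=(n_loc, n_models))             # ensemble projections per coastal location
zone = rng.integers(0, 3, size=n_loc)              # zone label per location
true_w = {0: np.array([0.6, 0.1, 0.1, 0.1, 0.1]),  # zone-specific "true" combination weights
          1: np.array([0.1, 0.6, 0.1, 0.1, 0.1]),
          2: np.array([0.1, 0.1, 0.1, 0.1, 0.6])}
y = np.array([X[i] @ true_w[zone[i]] for i in range(n_loc)]) \
    + rng.normal(scale=0.05, size=n_loc)

# zonal regression: one set of combination weights per zone instead of one global set
for z in np.unique(zone):
    m = LinearRegression().fit(X[zone == z], y[zone == z])
    print(f"zone {z}: learned weights = {np.round(m.coef_, 2)}")
```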
AI for Mathematics: A Cognitive Science Perspective
paper_authors: Cedegao E. Zhang, Katherine M. Collins, Adrian Weller, Joshua B. Tenenbaum
for: This paper is written for researchers and practitioners in the field of artificial intelligence (AI) who are interested in developing automated mathematicians.
methods: The paper draws on cognitive science research directions to inform the development of truly human-level mathematical systems.
results: The paper highlights the importance of considering a multidisciplinary perspective, involving cognitive scientists, AI researchers, and mathematicians, to develop better mathematical AI systems that can push the frontier of mathematics and provide insights into human cognition.Abstract
Mathematics is one of the most powerful conceptual systems developed and used by the human species. Dreams of automated mathematicians have a storied history in artificial intelligence (AI). Rapid progress in AI, particularly propelled by advances in large language models (LLMs), has sparked renewed, widespread interest in building such systems. In this work, we reflect on these goals from a \textit{cognitive science} perspective. We call attention to several classical and ongoing research directions from cognitive science, which we believe are valuable for AI practitioners to consider when seeking to build truly human (or superhuman)-level mathematical systems. We close with open discussions and questions that we believe necessitate a multi-disciplinary perspective -- cognitive scientists working in tandem with AI researchers and mathematicians -- as we move toward better mathematical AI systems which not only help us push the frontier of the mathematics, but also offer glimpses into how we as humans are even capable of such great cognitive feats.
Provable Guarantees for Neural Networks via Gradient Feature Learning
results: A unified analysis framework for two-layer networks trained by gradient descent, centered on feature learning from gradients; it sheds light on network learning phenomena such as feature learning beyond kernels and the lottery ticket hypothesis.Abstract
Neural networks have achieved remarkable empirical performance, while the current theoretical analysis is not adequate for understanding their success, e.g., the Neural Tangent Kernel approach fails to capture their key feature learning ability, while recent analyses on feature learning are typically problem-specific. This work proposes a unified analysis framework for two-layer networks trained by gradient descent. The framework is centered around the principle of feature learning from gradients, and its effectiveness is demonstrated by applications in several prototypical problems, such as mixtures of Gaussians and parity functions. The framework also sheds light on interesting network learning phenomena such as feature learning beyond kernels and the lottery ticket hypothesis.
Classification-Aided Robust Multiple Target Tracking Using Neural Enhanced Message Passing
results: A classification-aided robust multiple-target tracking algorithm based on neural enhanced message passing, comprising three modules: a message-passing module, a neural network module, and a Dempster-Shafer module. The algorithm effectively suppresses clutter and improves data association, thereby improving multiple-target tracking performance in practical radar applications; its effectiveness is validated on both simulated and real data.Abstract
We address the challenge of tracking an unknown number of targets in strong clutter environments using measurements from a radar sensor. Leveraging the range-Doppler spectra information, we identify the measurement classes, which serve as additional information to enhance clutter rejection and data association, thus bolstering the robustness of target tracking. We first introduce a novel neural enhanced message passing approach, where the beliefs obtained by the unified message passing are fed into the neural network as additional information. The output beliefs are then utilized to refine the original beliefs. Then, we propose a classification-aided robust multiple target tracking algorithm, employing the neural enhanced message passing technique. This algorithm is comprised of three modules: a message-passing module, a neural network module, and a Dempster-Shafer module. The message-passing module is used to represent the statistical model by the factor graph and infers target kinematic states, visibility states, and data associations based on the spatial measurement information. The neural network module is employed to extract features from range-Doppler spectra and derive beliefs on whether a measurement is target-generated or clutter-generated. The Dempster-Shafer module is used to fuse the beliefs obtained from both the factor graph and the neural network. As a result, our proposed algorithm adopts a model-and-data-driven framework, effectively enhancing clutter suppression and data association, leading to significant improvements in multiple target tracking performance. We validate the effectiveness of our approach using both simulated and real data scenarios, demonstrating its capability to handle challenging tracking scenarios in practical radar applications.
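The Dempster-Shafer fusion step can be illustrated with the textbook rule of combination over a {target, clutter} frame; the mass values below are made up for illustration and the paper's fusion module is more elaborate:

```python
def dempster_combine(m1, m2):
    """Dempster's rule of combination for two mass functions whose focal sets are frozensets."""
    combined, conflict = {}, 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + wa * wb
            else:
                conflict += wa * wb
    return {s: w / (1.0 - conflict) for s, w in combined.items()}

# hypothetical beliefs from the factor graph and from the range-Doppler CNN
T, C = frozenset({"target"}), frozenset({"clutter"})
m_graph = {T: 0.6, C: 0.2, T | C: 0.2}
m_cnn   = {T: 0.7, C: 0.1, T | C: 0.2}
print(dempster_combine(m_graph, m_cnn))   # fused masses; mass on 'target' increases
```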
GPT-4 Doesn’t Know It’s Wrong: An Analysis of Iterative Prompting for Reasoning Problems
results: The study shows that LLMs are bad at solving Graph Coloring instances and no better at verifying candidate solutions; the correctness and content of the criticisms have little effect on the performance of iterative prompting.Abstract
There has been considerable divergence of opinion on the reasoning abilities of Large Language Models (LLMs). While the initial optimism that reasoning might emerge automatically with scale has been tempered thanks to a slew of counterexamples, a wide spread belief in their iterative self-critique capabilities persists. In this paper, we set out to systematically investigate the effectiveness of iterative prompting of LLMs in the context of Graph Coloring, a canonical NP-complete reasoning problem that is related to propositional satisfiability as well as practical problems like scheduling and allocation. We present a principled empirical study of the performance of GPT4 in solving graph coloring instances or verifying the correctness of candidate colorings. In iterative modes, we experiment with the model critiquing its own answers and an external correct reasoner verifying proposed solutions. In both cases, we analyze whether the content of the criticisms actually affects bottom line performance. The study seems to indicate that (i) LLMs are bad at solving graph coloring instances (ii) they are no better at verifying a solution--and thus are not effective in iterative modes with LLMs critiquing LLM-generated solutions (iii) the correctness and content of the criticisms--whether by LLMs or external solvers--seems largely irrelevant to the performance of iterative prompting. We show that the observed increase in effectiveness is largely due to the correct solution being fortuitously present in the top-k completions of the prompt (and being recognized as such by an external verifier). Our results thus call into question claims about the self-critiquing capabilities of state of the art LLMs.
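The "external correct reasoner" in this setting reduces to a constraint check over the candidate coloring. A generic sketch (the edge-list and coloring formats are our assumptions):

```python
def verify_coloring(edges, coloring, k):
    """External checker for a candidate k-coloring.
    edges: list of (u, v) pairs; coloring: dict node -> color in {0..k-1}."""
    errors = []
    for u, v in edges:
        if u not in coloring or v not in coloring:
            errors.append(f"edge ({u}, {v}) has an uncolored endpoint")
        elif coloring[u] == coloring[v]:
            errors.append(f"adjacent nodes {u} and {v} share color {coloring[u]}")
    for node, color in coloring.items():
        if not (0 <= color < k):
            errors.append(f"node {node} uses out-of-range color {color}")
    return len(errors) == 0, errors

ok, msgs = verify_coloring([(0, 1), (1, 2), (0, 2)], {0: 0, 1: 1, 2: 0}, k=3)
print(ok, msgs)   # False: the (0, 2) edge is violated
```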
for: This paper aims to address the challenge of abbreviated column names in large volumes of tabular data, which can negatively impact performance on various data search, access, and understanding tasks.
methods: The paper introduces a new task called NameGuess, which expands column names in database schemas as a natural language generation problem. The authors create a training dataset of 384K abbreviated-expanded column pairs using a new data fabrication method and a human-annotated evaluation benchmark. They enhance auto-regressive language models by conditioning on table content and column header names to improve performance.
results: The fine-tuned model (with 2.7B parameters) matches human performance in the NameGuess task, and the authors conduct a comprehensive analysis to validate the effectiveness of table content in NameGuess and identify promising future opportunities. The code for the paper has been made available at https://github.com/amazon-science/nameguess.Abstract
Recent advances in large language models have revolutionized many sectors, including the database industry. One common challenge when dealing with large volumes of tabular data is the pervasive use of abbreviated column names, which can negatively impact performance on various data search, access, and understanding tasks. To address this issue, we introduce a new task, called NameGuess, to expand column names (used in database schema) as a natural language generation problem. We create a training dataset of 384K abbreviated-expanded column pairs using a new data fabrication method and a human-annotated evaluation benchmark that includes 9.2K examples from real-world tables. To tackle the complexities associated with polysemy and ambiguity in NameGuess, we enhance auto-regressive language models by conditioning on table content and column header names -- yielding a fine-tuned model (with 2.7B parameters) that matches human performance. Furthermore, we conduct a comprehensive analysis (on multiple LLMs) to validate the effectiveness of table content in NameGuess and identify promising future opportunities. Code has been made available at https://github.com/amazon-science/nameguess.
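Conditioning on table content and column headers can be pictured as a prompt-construction step before the language model call. The sketch below is hypothetical; the prompt format and function name are ours, not the released code:

```python
def nameguess_prompt(abbrev_col, headers, sample_rows):
    """Hypothetical prompt that conditions column-name expansion on table context."""
    table_snippet = "\n".join(" | ".join(str(v) for v in row) for row in sample_rows[:3])
    return (f"Columns: {', '.join(headers)}\n"
            f"Sample rows:\n{table_snippet}\n"
            f"Expand the abbreviated column name '{abbrev_col}' "
            f"into a descriptive natural-language name:")

print(nameguess_prompt("cust_dob", ["cust_id", "cust_dob", "acct_bal"],
                       [[101, "1990-03-14", 2500.0], [102, "1985-11-02", 310.5]]))
```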
Breaking through Deterministic Barriers: Randomized Pruning Mask Generation and Selection
results: Extensive experiments on GLUE datasets achieve state-of-the-art performance, particularly excelling at high levels of sparsity.Abstract
It is widely acknowledged that large and sparse models have higher accuracy than small and dense models under the same model size constraints. This motivates us to train a large model and then remove its redundant neurons or weights by pruning. Most existing works pruned the networks in a deterministic way, the performance of which solely depends on a single pruning criterion and thus lacks variety. Instead, in this paper, we propose a model pruning strategy that first generates several pruning masks in a designed random way. Subsequently, along with an effective mask-selection rule, the optimal mask is chosen from the pool of mask candidates. To further enhance efficiency, we introduce an early mask evaluation strategy, mitigating the overhead associated with training multiple masks. Our extensive experiments demonstrate that this approach achieves state-of-the-art performance across eight datasets from GLUE, particularly excelling at high levels of sparsity.
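The generate-then-select idea can be sketched as follows; the magnitude-biased sampling and the validation-loss selection rule are illustrative stand-ins for the paper's designed randomization and mask-selection criteria:

```python
import torch

def random_masks(weight, sparsity=0.9, n_masks=8, temperature=1.0):
    """Sample several candidate pruning masks; larger-magnitude weights are kept more often."""
    probs = torch.softmax(weight.abs().flatten() / temperature, dim=0)
    k = int((1 - sparsity) * weight.numel())          # number of weights to keep
    masks = []
    for _ in range(n_masks):
        kept = torch.multinomial(probs, k, replacement=False)
        m = torch.zeros(weight.numel())
        m[kept] = 1.0
        masks.append(m.view_as(weight))
    return masks

def select_mask(masks, weight, eval_loss):
    """Pick the candidate whose masked weights give the lowest loss on a small validation batch."""
    losses = torch.tensor([eval_loss(weight * m) for m in masks])
    return masks[int(losses.argmin())]
```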
Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models
results: On 118 out-of-domain tasks, Auto-Instruct surpasses both human-written instructions and existing baselines of LLM-generated instructions, and it generalizes well to other LLMs not used in its training.Abstract
Large language models (LLMs) can perform a wide range of tasks by following natural language instructions, without the necessity of task-specific fine-tuning. Unfortunately, the performance of LLMs is greatly influenced by the quality of these instructions, and manually writing effective instructions for each task is a laborious and subjective process. In this paper, we introduce Auto-Instruct, a novel method to automatically improve the quality of instructions provided to LLMs. Our method leverages the inherent generative ability of LLMs to produce diverse candidate instructions for a given task, and then ranks them using a scoring model trained on a variety of 575 existing NLP tasks. In experiments on 118 out-of-domain tasks, Auto-Instruct surpasses both human-written instructions and existing baselines of LLM-generated instructions. Furthermore, our method exhibits notable generalizability even with other LLMs that are not incorporated into its training process.
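A hedged sketch of the generate-then-rank loop, with `generate` and `score` as stand-ins for the black-box LLM call and the trained scoring model (the meta-prompt wording is ours):

```python
def auto_instruct(task_examples, generate, score, n_candidates=8):
    """Generate candidate instructions for a task, then keep the best-ranked one.
    `generate` and `score` are stand-ins for the LLM call and the trained ranking model."""
    demo = "\n".join(f"Input: {x}\nOutput: {y}" for x, y in task_examples[:3])
    meta_prompt = ("Write a clear instruction for the task illustrated by these examples:\n"
                   f"{demo}\nInstruction:")
    candidates = [generate(meta_prompt) for _ in range(n_candidates)]
    return max(candidates, key=lambda ins: score(ins, task_examples))
```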
Unsupervised Candidate Answer Extraction through Differentiable Masker-Reconstructor Model
paper_authors: Zhuoer Wang, Yicheng Wang, Ziwei Zhu, James Caverlee
for: Improving the accuracy and effectiveness of candidate answer extraction in question generation systems
methods: A novel unsupervised candidate answer extraction approach that exploits the inherent structure of context passages to extract answers automatically
results: Extensive evaluation on two exhaustively-annotated datasets shows performance superior to other unsupervised methods and comparable to supervised methods, while retaining the advantage of automatic extraction.Abstract
Question generation is a widely used data augmentation approach with extensive applications, and extracting qualified candidate answers from context passages is a critical step for most question generation systems. However, existing methods for candidate answer extraction are reliant on linguistic rules or annotated data that face the partial annotation issue and challenges in generalization. To overcome these limitations, we propose a novel unsupervised candidate answer extraction approach that leverages the inherent structure of context passages through a Differentiable Masker-Reconstructor (DMR) Model with the enforcement of self-consistency for picking up salient information tokens. We curated two datasets with exhaustively-annotated answers and benchmark a comprehensive set of supervised and unsupervised candidate answer extraction methods. We demonstrate the effectiveness of the DMR model by showing its performance is superior among unsupervised methods and comparable to supervised methods.
Do Language Models Learn about Legal Entity Types during Pretraining?
results: The study shows that (1) Llama2 performs well on certain entity types and has substantial potential for improvement with optimized prompt templates; (2) law-oriented LMs show inconsistent performance, possibly due to variations in their pretraining corpora; (3) LMs can type entities, including multi-token entities; (4) all models struggle with entities belonging to sub-domains of the law; and (5) Llama2 frequently overlooks syntactic cues, a shortcoming less present in BERT-based architectures.Abstract
Language Models (LMs) have proven their ability to acquire diverse linguistic knowledge during the pretraining phase, potentially serving as a valuable source of incidental supervision for downstream tasks. However, there has been limited research conducted on the retrieval of domain-specific knowledge, and specifically legal knowledge. We propose to explore the task of Entity Typing, serving as a proxy for evaluating legal knowledge as an essential aspect of text comprehension, and a foundational task to numerous downstream legal NLP applications. Through systematic evaluation and analysis and two types of prompting (cloze sentences and QA-based templates) and to clarify the nature of these acquired cues, we compare diverse types and lengths of entities both general and domain-specific entities, semantics or syntax signals, and different LM pretraining corpus (generic and legal-oriented) and architectures (encoder BERT-based and decoder-only with Llama2). We show that (1) Llama2 performs well on certain entities and exhibits potential for substantial improvement with optimized prompt templates, (2) law-oriented LMs show inconsistent performance, possibly due to variations in their training corpus, (3) LMs demonstrate the ability to type entities even in the case of multi-token entities, (4) all models struggle with entities belonging to sub-domains of the law (5) Llama2 appears to frequently overlook syntactic cues, a shortcoming less present in BERT-based architectures.
GARI: Graph Attention for Relative Isomorphism of Arabic Word Embeddings
results: Experiments on an Arabic dataset show that GARI improves average P@1 by relative scores of up to 40.95% and 76.80% over prior work in the in-domain and domain-mismatch settings, respectively.Abstract
Bilingual Lexical Induction (BLI) is a core challenge in NLP, it relies on the relative isomorphism of individual embedding spaces. Existing attempts aimed at controlling the relative isomorphism of different embedding spaces fail to incorporate the impact of semantically related words in the model training objective. To address this, we propose GARI that combines the distributional training objectives with multiple isomorphism losses guided by the graph attention network. GARI considers the impact of semantical variations of words in order to define the relative isomorphism of the embedding spaces. Experimental evaluation using the Arabic language data set shows that GARI outperforms the existing research by improving the average P@1 by a relative score of up to 40.95% and 76.80% for in-domain and domain mismatch settings respectively. We release the codes for GARI at https://github.com/asif6827/GARI.
SEGO: Sequential Subgoal Optimization for Mathematical Problem-Solving
paper_authors: Xueliang Zhao, Xinting Huang, Wei Bi, Lingpeng Kong
for: Improving mathematical problem-solving ability in artificial intelligence
methods: A new framework called SEGO that connects the subgoal breakdown process with the probability of solving the problem in order to identify better subgoals, which are then optimized according to carefully designed criteria
results: Experiments on two benchmarks (GSM8K and MATH) show that SEGO improves problem-solving performance, indicating its potential for AI-driven mathematical problem-solving.Abstract
Large Language Models (LLMs) have driven substantial progress in artificial intelligence in recent years, exhibiting impressive capabilities across a wide range of tasks, including mathematical problem-solving. Inspired by the success of subgoal-based methods, we propose a novel framework called \textbf{SE}quential sub\textbf{G}oal \textbf{O}ptimization (SEGO) to enhance LLMs' ability to solve mathematical problems. By establishing a connection between the subgoal breakdown process and the probability of solving problems, SEGO aims to identify better subgoals with theoretical guarantees. Addressing the challenge of identifying suitable subgoals in a large solution space, our framework generates problem-specific subgoals and adjusts them according to carefully designed criteria. Incorporating these optimized subgoals into the policy model training leads to significant improvements in problem-solving performance. We validate SEGO's efficacy through experiments on two benchmarks, GSM8K and MATH, where our approach outperforms existing methods, highlighting the potential of SEGO in AI-driven mathematical problem-solving. Data and code associated with this paper will be available at https://github.com/zhaoxlpku/SEGO
On the Representational Capacity of Recurrent Neural Language Models
results: The work shows that rationally weighted RLMs with unbounded computation time can simulate any probabilistic Turing machine (PTM); since RLMs operate in real time in practice, this result is an upper bound on their expressivity. A lower bound is also provided, showing that under the real-time restriction such models can simulate deterministic real-time rational PTMs.Abstract
This work investigates the computational expressivity of language models (LMs) based on recurrent neural networks (RNNs). Siegelmann and Sontag (1992) famously showed that RNNs with rational weights and hidden states and unbounded computation time are Turing complete. However, LMs define weightings over strings in addition to just (unweighted) language membership and the analysis of the computational power of RNN LMs (RLMs) should reflect this. We extend the Turing completeness result to the probabilistic case, showing how a rationally weighted RLM with unbounded computation time can simulate any probabilistic Turing machine (PTM). Since, in practice, RLMs work in real-time, processing a symbol at every time step, we treat the above result as an upper bound on the expressivity of RLMs. We also provide a lower bound by showing that under the restriction to real-time computation, such models can simulate deterministic real-time rational PTMs.
A Predictive Factor Analysis of Social Biases and Task-Performance in Pretrained Masked Language Models
paper_authors: Yi Zhou, Jose Camacho-Collados, Danushka Bollegala
for: This paper aims to study the relationship between various factors of pre-trained Masked Language Models (MLMs) and the social biases they learn, as well as their downstream task performance.
methods: The authors conduct a comprehensive study using 39 pre-trained MLMs with different model sizes, training objectives, tokenization methods, training data domains, and languages.
results: The study sheds light on important factors often neglected in prior literature, such as tokenization or model objectives.
Various types of social biases have been reported with pretrained Masked Language Models (MLMs) in prior work. However, multiple underlying factors are associated with an MLM such as its model size, size of the training data, training objectives, the domain from which pretraining data is sampled, tokenization, and languages present in the pretrained corpora, to name a few. It remains unclear as to which of those factors influence social biases that are learned by MLMs. To study the relationship between model factors and the social biases learned by an MLM, as well as the downstream task performance of the model, we conduct a comprehensive study over 39 pretrained MLMs covering different model sizes, training objectives, tokenization methods, training data domains and languages. Our results shed light on important factors often neglected in prior literature, such as tokenization or model objectives.
A Systematic Study of Performance Disparities in Multilingual Task-Oriented Dialogue Systems
results: The analysis reveals adaptation and intrinsic biases in current ToD systems: for example, ToD systems for Arabic or Turkish trained on annotated data fully parallel to English ToD data still exhibit diminished task performance. The analysis also offers practical tips for ToD data collection and system development in new languages.Abstract
Achieving robust language technologies that can perform well across the world's many languages is a central goal of multilingual NLP. In this work, we take stock of and empirically analyse task performance disparities that exist between multilingual task-oriented dialogue (ToD) systems. We first define new quantitative measures of absolute and relative equivalence in system performance, capturing disparities across languages and within individual languages. Through a series of controlled experiments, we demonstrate that performance disparities depend on a number of factors: the nature of the ToD task at hand, the underlying pretrained language model, the target language, and the amount of ToD annotated data. We empirically prove the existence of the adaptation and intrinsic biases in current ToD systems: e.g., ToD systems trained for Arabic or Turkish using annotated ToD data fully parallel to English ToD data still exhibit diminished ToD task performance. Beyond providing a series of insights into the performance disparities of ToD systems in different languages, our analyses offer practical tips on how to approach ToD data collection and system development for new languages.
StoryAnalogy: Deriving Story-level Analogies from Large Language Models to Unlock Analogical Understanding
results: Even the latest large language models (LLMs), such as ChatGPT and LLaMa, achieve only around 30% accuracy on analogy identification (compared with 85% for humans). In addition, the data in \textsc{StoryAnalogy} can improve the quality of analogy generation in LLMs, where a fine-tuned FlanT5-xxl model performs comparably to zero-shot ChatGPT.Abstract
Analogy-making between narratives is crucial for human reasoning. In this paper, we evaluate the ability to identify and generate analogies by constructing a first-of-its-kind large-scale story-level analogy corpus, \textsc{StoryAnalogy}, which contains 24K story pairs from diverse domains with human annotations on two similarities from the extended Structure-Mapping Theory. We design a set of tests on \textsc{StoryAnalogy}, presenting the first evaluation of story-level analogy identification and generation. Interestingly, we find that the analogy identification tasks are incredibly difficult not only for sentence embedding models but also for the recent large language models (LLMs) such as ChatGPT and LLaMa. ChatGPT, for example, only achieved around 30% accuracy in multiple-choice questions (compared to over 85% accuracy for humans). Furthermore, we observe that the data in \textsc{StoryAnalogy} can improve the quality of analogy generation in LLMs, where a fine-tuned FlanT5-xxl model achieves comparable performance to zero-shot ChatGPT.
results: The study finds that PEs in BERT-style language models share two common properties, locality and symmetry, which are closely correlated with downstream task performance; existing PEs perform poorly on two newly introduced probing tasks. These results may serve as a basis for developing better PEs. Code is available at \faGithub~ \url{https://github.com/tigerchen52/locality\_symmetry}.Abstract
Positional Encodings (PEs) are used to inject word-order information into transformer-based language models. While they can significantly enhance the quality of sentence representations, their specific contribution to language models is not fully understood, especially given recent findings that various positional encodings are insensitive to word order. In this work, we conduct a systematic study of positional encodings in \textbf{Bidirectional Masked Language Models} (BERT-style) , which complements existing work in three aspects: (1) We uncover the core function of PEs by identifying two common properties, Locality and Symmetry; (2) We show that the two properties are closely correlated with the performances of downstream tasks; (3) We quantify the weakness of current PEs by introducing two new probing tasks, on which current PEs perform poorly. We believe that these results are the basis for developing better PEs for transformer-based language models. The code is available at \faGithub~ \url{https://github.com/tigerchen52/locality\_symmetry}
Probing LLMs for hate speech detection: strengths and vulnerabilities
results: On average, including target information in the pipeline improves model performance substantially (~20-30%) over the baseline, and adding rationales/explanations also helps (~10-20%). The paper further provides a typology of error cases where the models fail to classify or to explain their decisions; such vulnerable points automatically constitute 'jailbreak' prompts, and industry-scale safeguard techniques are needed to make the models robust against them.Abstract
Recently efforts have been made by social media platforms as well as researchers to detect hateful or toxic language using large language models. However, none of these works aim to use explanation, additional context and victim community information in the detection process. We utilise different prompt variation, input information and evaluate large language models in zero shot setting (without adding any in-context examples). We select three large language models (GPT-3.5, text-davinci and Flan-T5) and three datasets - HateXplain, implicit hate and ToxicSpans. We find that on average including the target information in the pipeline improves the model performance substantially (~20-30%) over the baseline across the datasets. There is also a considerable effect of adding the rationales/explanations into the pipeline (~10-20%) over the baseline across the datasets. In addition, we further provide a typology of the error cases where these large language models fail to (i) classify and (ii) explain the reason for the decisions they take. Such vulnerable points automatically constitute 'jailbreak' prompts for these models and industry scale safeguard techniques need to be developed to make the models robust against such prompts.
EmoDiarize: Speaker Diarization and Emotion Identification from Speech Signals using Convolutional Neural Networks
paper_authors: Hanan Hamza, Fiza Gafoor, Fathima Sithara, Gayathri Anil, V. S. Anoop
for: This paper aims to improve the accuracy of speech emotion recognition by integrating deep learning techniques and addressing the challenges of speaker diarization and emotion identification.
methods: The proposed method combines a pre-existing speaker diarization pipeline with a Convolutional Neural Network (CNN) based emotion identification model, using features such as MFCC, ZCR, RMS, and data augmentation techniques.
results: The proposed model achieved an unweighted accuracy of 63% in identifying emotional states within speech signals, demonstrating its effectiveness in accurately recognizing emotions in spoken language.Abstract
In the era of advanced artificial intelligence and human-computer interaction, identifying emotions in spoken language is paramount. This research explores the integration of deep learning techniques in speech emotion recognition, offering a comprehensive solution to the challenges associated with speaker diarization and emotion identification. It introduces a framework that combines a pre-existing speaker diarization pipeline and an emotion identification model built on a Convolutional Neural Network (CNN) to achieve higher precision. The proposed model was trained on data from five speech emotion datasets, namely, RAVDESS, CREMA-D, SAVEE, TESS, and Movie Clips, out of which the latter is a speech emotion dataset created specifically for this research. The features extracted from each sample include Mel Frequency Cepstral Coefficients (MFCC), Zero Crossing Rate (ZCR), Root Mean Square (RMS), and various data augmentation algorithms like pitch, noise, stretch, and shift. This feature extraction approach aims to enhance prediction accuracy while reducing computational complexity. The proposed model yields an unweighted accuracy of 63%, demonstrating remarkable efficiency in accurately identifying emotional states within speech signals.
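The feature extraction described above maps naturally onto librosa; a small sketch with our own choices of feature dimensions and pooling (the paper additionally applies augmentations such as pitch shift, noise, stretch, and time shift before extraction):

```python
import numpy as np
import librosa

def emotion_features(path, n_mfcc=40):
    """Per-utterance feature vector: mean MFCCs plus mean ZCR and RMS (pooling is our choice)."""
    y, sr = librosa.load(path, sr=22050)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)
    zcr = librosa.feature.zero_crossing_rate(y).mean()
    rms = librosa.feature.rms(y=y).mean()
    return np.hstack([mfcc, zcr, rms])   # shape (n_mfcc + 2,), fed to the CNN/classifier
```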
paper_authors: Jinheon Baek, Soyeong Jeong, Minki Kang, Jong C. Park, Sung Ju Hwang
for: Improving the factual correctness of text that language models (LMs) generate from their internalized knowledge
methods: Augmenting the LM with knowledge retrieved from an external source, and using a small separate LM as a verifier to detect and rectify retrieval and generation errors
results: The verification steps are validated on multiple question answering benchmarks; the verifier effectively identifies retrieval and generation errors, allowing the LM to provide more factually correct outputs.Abstract
Recent Language Models (LMs) have shown impressive capabilities in generating texts with the knowledge internalized in parameters. Yet, LMs often generate the factually incorrect responses to the given queries, since their knowledge may be inaccurate, incomplete, and outdated. To address this problem, previous works propose to augment LMs with the knowledge retrieved from an external knowledge source. However, such approaches often show suboptimal text generation performance due to two reasons: 1) the model may fail to retrieve the knowledge relevant to the given query, or 2) the model may not faithfully reflect the retrieved knowledge in the generated text. To overcome these, we propose to verify the output and the knowledge of the knowledge-augmented LMs with a separate verifier, which is a small LM that is trained to detect those two types of errors through instruction-finetuning. Then, when the verifier recognizes an error, we can rectify it by either retrieving new knowledge or generating new text. Further, we use an ensemble of the outputs from different instructions with a single verifier to enhance the reliability of the verification processes. We validate the effectiveness of the proposed verification steps on multiple question answering benchmarks, whose results show that the proposed verifier effectively identifies retrieval and generation errors, allowing LMs to provide more factually correct outputs. Our code is available at https://github.com/JinheonBaek/KALMV.
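The verify-then-rectify loop can be sketched as below; all callables are stand-ins for the paper's retriever, generator, and instruction-finetuned verifier, and the error categories are paraphrased from the abstract:

```python
def verified_generate(lm_generate, retrieve, verifier, query, max_rounds=2):
    """Verify-then-rectify loop; the callables are stand-ins, not the released KALMV code."""
    knowledge = retrieve(query)
    answer = lm_generate(query, knowledge)
    for _ in range(max_rounds):
        verdict = verifier(query, knowledge, answer)    # 'ok', 'retrieval_error', or 'grounding_error'
        if verdict == "ok":
            break
        if verdict == "retrieval_error":
            knowledge = retrieve(query + " " + answer)  # try to fetch more relevant knowledge
        answer = lm_generate(query, knowledge)          # regenerate against the (possibly new) knowledge
    return answer
```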
GestureGPT: Zero-shot Interactive Gesture Understanding and Grounding with Large Language Model Agents
results: Tested in two real-world settings, video streaming and smart home IoT control, achieving zero-shot Top-5 grounding accuracies of 80.11% and 90.78%, respectively.Abstract
Current gesture recognition systems primarily focus on identifying gestures within a predefined set, leaving a gap in connecting these gestures to interactive GUI elements or system functions (e.g., linking a 'thumb-up' gesture to a 'like' button). We introduce GestureGPT, a novel zero-shot gesture understanding and grounding framework leveraging large language models (LLMs). Gesture descriptions are formulated based on hand landmark coordinates from gesture videos and fed into our dual-agent dialogue system. A gesture agent deciphers these descriptions and queries about the interaction context (e.g., interface, history, gaze data), which a context agent organizes and provides. Following iterative exchanges, the gesture agent discerns user intent, grounding it to an interactive function. We validated the gesture description module using public first-view and third-view gesture datasets and tested the whole system in two real-world settings: video streaming and smart home IoT control. The highest zero-shot Top-5 grounding accuracies are 80.11% for video streaming and 90.78% for smart home tasks, showing potential of the new gesture understanding paradigm.
Causal-structure Driven Augmentations for Text OOD Generalization
results: Experiments show that simulating interventions via counterfactual data augmentation improves the out-of-distribution (OOD) accuracy of text classifiers compared with baseline invariant-learning algorithms.Abstract
The reliance of text classifiers on spurious correlations can lead to poor generalization at deployment, raising concerns about their use in safety-critical domains such as healthcare. In this work, we propose to use counterfactual data augmentation, guided by knowledge of the causal structure of the data, to simulate interventions on spurious features and to learn more robust text classifiers. We show that this strategy is appropriate in prediction problems where the label is spuriously correlated with an attribute. Under the assumptions of such problems, we discuss the favorable sample complexity of counterfactual data augmentation, compared to importance re-weighting. Pragmatically, we match examples using auxiliary data, based on diff-in-diff methodology, and use a large language model (LLM) to represent a conditional probability of text. Through extensive experimentation on learning caregiver-invariant predictors of clinical diagnoses from medical narratives and on semi-synthetic data, we demonstrate that our method for simulating interventions improves out-of-distribution (OOD) accuracy compared to baseline invariant learning algorithms.
MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter
results: On molecule captioning, IUPAC name prediction, and molecule-text retrieval, MolCA significantly outperforms the baselines.Abstract
Language Models (LMs) have demonstrated impressive molecule understanding ability on various 1D text-related tasks. However, they inherently lack 2D graph perception - a critical ability of human professionals in comprehending molecules' topological structures. To bridge this gap, we propose MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter. MolCA enables an LM (e.g., Galactica) to understand both text- and graph-based molecular contents via the cross-modal projector. Specifically, the cross-modal projector is implemented as a Q-Former to connect a graph encoder's representation space and an LM's text space. Further, MolCA employs a uni-modal adapter (i.e., LoRA) for the LM's efficient adaptation to downstream tasks. Unlike previous studies that couple an LM with a graph encoder via cross-modal contrastive learning, MolCA retains the LM's ability of open-ended text generation and augments it with 2D graph information. To showcase its effectiveness, we extensively benchmark MolCA on tasks of molecule captioning, IUPAC name prediction, and molecule-text retrieval, on which MolCA significantly outperforms the baselines. Our codes and checkpoints can be found at https://github.com/acharkq/MolCA.
Are Structural Concepts Universal in Transformer Language Models? Towards Interpretable Cross-Lingual Generalization
results: Experiments show that the approach achieves results competitive with the state of the art and narrows the performance gap between languages, particularly benefiting those with limited resources.Abstract
Large language models (LLMs) have exhibited considerable cross-lingual generalization abilities, whereby they implicitly transfer knowledge across languages. However, the transfer is not equally successful for all languages, especially for low-resource ones, which poses an ongoing challenge. It is unclear whether we have reached the limits of implicit cross-lingual generalization and if explicit knowledge transfer is viable. In this paper, we investigate the potential for explicitly aligning conceptual correspondence between languages to enhance cross-lingual generalization. Using the syntactic aspect of language as a testbed, our analyses of 43 languages reveal a high degree of alignability among the spaces of structural concepts within each language for both encoder-only and decoder-only LLMs. We then propose a meta-learning-based method to learn to align conceptual spaces of different languages, which facilitates zero-shot and few-shot generalization in concept classification and also offers insights into the cross-lingual in-context learning phenomenon. Experiments on syntactic analysis tasks show that our approach achieves competitive results with state-of-the-art methods and narrows the performance gap between languages, particularly benefiting those with limited resources.
Label-Aware Automatic Verbalizer for Few-Shot Text Classification
results: Experiments on five datasets across five languages show that LAAV significantly outperforms existing verbalizers and suggests more relevant words, especially for mid-to-low-resource languages.Abstract
Prompt-based learning has shown its effectiveness in few-shot text classification. One important factor in its success is a verbalizer, which translates output from a language model into a predicted class. Notably, the simplest and widely acknowledged verbalizer employs manual labels to represent the classes. However, manual selection does not guarantee the optimality of the selected words when conditioned on the chosen language model. Therefore, we propose Label-Aware Automatic Verbalizer (LAAV), effectively augmenting the manual labels to achieve better few-shot classification results. Specifically, we use the manual labels along with the conjunction "and" to induce the model to generate more effective words for the verbalizer. The experimental results on five datasets across five languages demonstrate that LAAV significantly outperforms existing verbalizers. Furthermore, our analysis reveals that LAAV suggests more relevant words compared to similar approaches, especially in mid-to-low resource languages.
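The "manual label + and" elicitation can be illustrated with a fill-mask model; the template and model choice below are ours and only show the mechanism:

```python
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

def laav_candidates(sentence, manual_label, top_k=10):
    """Elicit extra label words by conditioning on the manual label plus 'and' (toy template)."""
    prompt = f"{sentence} It was {manual_label} and [MASK]."
    return [r["token_str"] for r in fill(prompt, top_k=top_k)]

print(laav_candidates("The plot was gripping from start to finish.", "great"))
```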
Transformer-based Entity Legal Form Classification
paper_authors: Alexander Arimond, Mauro Molteni, Dominik Jany, Zornitsa Manolova, Damian Borth, Andreas G. F. Hoepner
for: Using Transformer-based language models to classify entity legal forms from raw legal entity names.
methods: Several BERT variants are employed and compared against multiple traditional baselines.
results: The evaluation shows that pre-trained BERT variants outperform traditional text classification approaches in F1 score while performing comparably well in Macro F1 score, and the results are supported by third-party expert reviews conducted in ten selected jurisdictions.Abstract
We propose the application of Transformer-based language models for classifying entity legal forms from raw legal entity names. Specifically, we employ various BERT variants and compare their performance against multiple traditional baselines. Our evaluation encompasses a substantial subset of freely available Legal Entity Identifier (LEI) data, comprising over 1.1 million legal entities from 30 different legal jurisdictions. The ground truth labels for classification per jurisdiction are taken from the Entity Legal Form (ELF) code standard (ISO 20275). Our findings demonstrate that pre-trained BERT variants outperform traditional text classification approaches in terms of F1 score, while also performing comparably well in the Macro F1 Score. Moreover, the validity of our proposal is supported by the outcome of third-party expert reviews conducted in ten selected jurisdictions. This study highlights the significant potential of Transformer-based models in advancing data standardization and data integration. The presented approaches can greatly benefit financial institutions, corporations, governments and other organizations in assessing business relationships, understanding risk exposure, and promoting effective governance.
results: A character-tokenized Chinese Backpack language model performs comparably to a Transformer and learns rich character-level meanings that compose log-additively into word meanings. In SimLex-style lexical semantic evaluations, simple averages of Backpack character senses outperform input embeddings from a Transformer. The study also localizes a source of gender bias to specific character senses and intervenes to reduce it.Abstract
The Backpack is a Transformer alternative shown to improve interpretability in English language modeling by decomposing predictions into a weighted sum of token sense components. However, Backpacks' reliance on token-defined meaning raises questions as to their potential for languages other than English, a language for which subword tokenization provides a reasonable approximation for lexical items. In this work, we train, evaluate, interpret, and control Backpack language models in character-tokenized Chinese, in which words are often composed of many characters. We find that our (134M parameter) Chinese Backpack language model performs comparably to a (104M parameter) Transformer, and learns rich character-level meanings that log-additively compose to form word meanings. In SimLex-style lexical semantic evaluations, simple averages of Backpack character senses outperform input embeddings from a Transformer. We find that complex multi-character meanings are often formed by using the same per-character sense weights consistently across context. Exploring interpretability-through control, we show that we can localize a source of gender bias in our Backpacks to specific character senses and intervene to reduce the bias.
Representing and Computing Uncertainty in Phonological Reconstruction
results: The paper presents a new framework for representing uncertainty in linguistic reconstruction, together with a workflow for computing fuzzy reconstructions from linguistic data.Abstract
Despite the inherently fuzzy nature of reconstructions in historical linguistics, most scholars do not represent their uncertainty when proposing proto-forms. With the increasing success of recently proposed approaches to automating certain aspects of the traditional comparative method, the formal representation of proto-forms has also improved. This formalization makes it possible to address both the representation and the computation of uncertainty. Building on recent advances in supervised phonological reconstruction, during which an algorithm learns how to reconstruct words in a given proto-language relying on previously annotated data, and inspired by improved methods for automated word prediction from cognate sets, we present a new framework that allows for the representation of uncertainty in linguistic reconstruction and also includes a workflow for the computation of fuzzy reconstructions from linguistic data.
Is ChatGPT a Financial Expert? Evaluating Language Models on Financial Natural Language Processing
results: Some decoder-only LLMs demonstrate notable performance across most financial tasks via zero-shot prompting, but they generally lag behind fine-tuned expert models, especially on proprietary datasets.Abstract
The emergence of Large Language Models (LLMs), such as ChatGPT, has revolutionized general natural language processing (NLP) tasks. However, their expertise in the financial domain lacks a comprehensive evaluation. To assess the ability of LLMs to solve financial NLP tasks, we present FinLMEval, a framework for Financial Language Model Evaluation, comprising nine datasets designed to evaluate the performance of language models. This study compares the performance of encoder-only language models and the decoder-only language models. Our findings reveal that while some decoder-only LLMs demonstrate notable performance across most financial tasks via zero-shot prompting, they generally lag behind the fine-tuned expert models, especially when dealing with proprietary datasets. We hope this study provides foundation evaluations for continuing efforts to build more advanced LLMs in the financial domain.
摘要
大型语言模型(LLM),如 ChatGPT,为通用自然语言处理(NLP)任务带来了革命性的变化。然而,它们在金融领域的专业能力仍缺乏全面评估。为评估 LLM 解决金融 NLP 任务的能力,我们提出了金融语言模型评测框架 FinLMEval,其中包含 9 个用于评估语言模型表现的数据集。本研究比较了 encoder-only 语言模型与 decoder-only 语言模型的表现。我们发现,尽管一些 decoder-only LLM 通过零样本提示在大多数金融任务上表现出色,但它们通常落后于经过微调的专家模型,尤其是在处理专有数据集时。我们希望这项研究能为后续在金融领域构建更先进的 LLM 提供基础评估。
Towards Real-World Streaming Speech Translation for Code-Switched Speech
results: 该论文在离线和流式两种翻译设置下建立了基线结果。Abstract
Code-switching (CS), i.e. mixing different languages in a single sentence, is a common phenomenon in communication and can be challenging in many Natural Language Processing (NLP) settings. Previous studies on CS speech have shown promising results for end-to-end speech translation (ST), but have been limited to offline scenarios and to translation to one of the languages present in the source (\textit{monolingual transcription}). In this paper, we focus on two essential yet unexplored areas for real-world CS speech translation: streaming settings, and translation to a third language (i.e., a language not included in the source). To this end, we extend the Fisher and Miami test and validation datasets to include new targets in Spanish and German. Using this data, we train a model for both offline and streaming ST and we establish baseline results for the two settings mentioned earlier.
results: 通过在多个常用数据集上的大量实验,研究发现 NAON 模型优于现有的自回归排序方法,并与当前最先进方法的表现相当。代码可在以下链接获取:https://github.com/steven640pixel/nonautoregressive-sentence-ordering。Abstract
Existing sentence ordering approaches generally employ encoder-decoder frameworks with the pointer net to recover the coherence by recurrently predicting each sentence step-by-step. Such an autoregressive manner only leverages unilateral dependencies during decoding and cannot fully explore the semantic dependency between sentences for ordering. To overcome these limitations, in this paper, we propose a novel Non-Autoregressive Ordering Network, dubbed \textit{NAON}, which explores bilateral dependencies between sentences and predicts the sentence for each position in parallel. We claim that the non-autoregressive manner is not just applicable but also particularly suitable to the sentence ordering task because of two peculiar characteristics of the task: 1) each generation target is in deterministic length, and 2) the sentences and positions should match exclusively. Furthermore, to address the repetition issue of the naive non-autoregressive Transformer, we introduce an exclusive loss to constrain the exclusiveness between positions and sentences. To verify the effectiveness of the proposed model, we conduct extensive experiments on several common-used datasets and the experimental results show that our method outperforms all the autoregressive approaches and yields competitive performance compared with the state-of-the-arts. The codes are available at: \url{https://github.com/steven640pixel/nonautoregressive-sentence-ordering}.
摘要
现有的句子排序方法通常采用带指针网络的编码器-解码器框架,逐步自回归地预测每个句子以恢复语篇连贯性。这种自回归方式在解码时只能利用单向依赖,无法充分挖掘句子之间的语义依赖关系。为克服这些限制,本文提出了一种新的非自回归排序网络 NAON,它挖掘句子之间的双向依赖,并行地为每个位置预测句子。我们认为非自回归方式不仅适用于句子排序任务,而且特别适合该任务,原因在于任务的两个特点:1)每个生成目标的长度是确定的;2)句子与位置之间必须一一对应。此外,为解决朴素非自回归 Transformer 的重复问题,我们引入了一种排他损失(exclusive loss)来约束位置与句子之间的排他性。为验证所提模型的有效性,我们在多个常用数据集上进行了大量实验,结果表明我们的方法优于所有自回归方法,并与当前最先进方法表现相当。代码可在 \url{https://github.com/steven640pixel/nonautoregressive-sentence-ordering} 获取。
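To make the position-sentence exclusiveness idea above concrete, here is a minimal PyTorch sketch (our own illustration, not the released NAON code; the bidirectional cross-entropy form, the tensor shapes, and the assignment-based decoding are all assumptions):

```python
# Sketch of a non-autoregressive ordering head with an "exclusive"-style loss:
# positions and sentences must match one-to-one.
import torch
import torch.nn.functional as F
from scipy.optimize import linear_sum_assignment

def exclusive_ordering_loss(scores: torch.Tensor, gold_order: torch.Tensor) -> torch.Tensor:
    """scores: [n, n], scores[p, s] = affinity of position p for sentence s.
    gold_order: [n], gold_order[p] = index of the sentence at position p."""
    # Each position must pick its gold sentence ...
    loss_pos = F.cross_entropy(scores, gold_order)
    # ... and each sentence must pick its gold position (transposed view),
    # discouraging two positions from claiming the same sentence.
    gold_pos = torch.argsort(gold_order)
    loss_sent = F.cross_entropy(scores.t(), gold_pos)
    return loss_pos + loss_sent

def decode_order(scores: torch.Tensor):
    """Parallel decoding as a one-to-one assignment instead of greedy per-row argmax."""
    pos_idx, sent_idx = linear_sum_assignment(-scores.detach().numpy())
    return sent_idx  # sentence index placed at each position

if __name__ == "__main__":
    n = 5
    scores = torch.randn(n, n, requires_grad=True)
    gold = torch.randperm(n)
    loss = exclusive_ordering_loss(scores, gold)
    loss.backward()
    print(loss.item(), decode_order(scores))
```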
Predict the Future from the Past? On the Temporal Data Distribution Shift in Financial Sentiment Classifications
results: 实验结果表明,所提方法能够增强模型在波动的金融市场中适应不断变化的时间分布偏移的能力,并在不同的时间窗口和市场状况下具有良好的泛化能力。Abstract
Temporal data distribution shift is prevalent in the financial text. How can a financial sentiment analysis system be trained in a volatile market environment that can accurately infer sentiment and be robust to temporal data distribution shifts? In this paper, we conduct an empirical study on the financial sentiment analysis system under temporal data distribution shifts using a real-world financial social media dataset that spans three years. We find that the fine-tuned models suffer from general performance degradation in the presence of temporal distribution shifts. Furthermore, motivated by the unique temporal nature of the financial text, we propose a novel method that combines out-of-distribution detection with time series modeling for temporal financial sentiment analysis. Experimental results show that the proposed method enhances the model's capability to adapt to evolving temporal shifts in a volatile financial market.
摘要
时间上的数据分布偏移在金融文本中十分普遍。如何在波动的市场环境中训练一个既能准确推断情感、又对时间数据分布偏移具有鲁棒性的金融情感分析系统?本文基于一个跨越三年的真实金融社交媒体数据集,对时间分布偏移下的金融情感分析系统进行了实证研究。我们发现,微调后的模型在出现时间分布偏移时普遍出现性能下降。此外,受金融文本独特的时间特性启发,我们提出了一种将分布外检测与时间序列建模相结合的时序金融情感分析新方法。实验结果表明,所提方法增强了模型在波动金融市场中适应不断变化的时间偏移的能力。
Multilingual estimation of political-party positioning: From label aggregation to long-input Transformers
results: 研究在 Comparative Manifestos Project 数据集上进行了分析,包括41个国家和27种语言,并发现使用当今的模型可以高效解决这个任务,而label aggregation方法得到了最佳结果。Abstract
Scaling analysis is a technique in computational political science that assigns a political actor (e.g. politician or party) a score on a predefined scale based on a (typically long) body of text (e.g. a parliamentary speech or an election manifesto). For example, political scientists have often used the left--right scale to systematically analyse political landscapes of different countries. NLP methods for automatic scaling analysis can find broad application provided they (i) are able to deal with long texts and (ii) work robustly across domains and languages. In this work, we implement and compare two approaches to automatic scaling analysis of political-party manifestos: label aggregation, a pipeline strategy relying on annotations of individual statements from the manifestos, and long-input-Transformer-based models, which compute scaling values directly from raw text. We carry out the analysis of the Comparative Manifestos Project dataset across 41 countries and 27 languages and find that the task can be efficiently solved by state-of-the-art models, with label aggregation producing the best results.
摘要
缩放分析(scaling analysis)是计算政治学中的一种技术,它基于一段(通常很长)的文本(如议会演讲或竞选纲领),在预先定义的量表上为政治行为者(如政治家或政党)打分。例如,政治学家经常使用左右量表来系统地分析不同国家的政治版图。自动缩放分析的 NLP 方法要获得广泛应用,需要(i)能够处理长文本,(ii)在不同领域和语言上保持稳健。在这项工作中,我们实现并比较了两种对政党竞选纲领进行自动缩放分析的方法:依赖纲领中单条陈述标注的流水线式标签聚合策略,以及直接从原始文本计算量表值的长输入 Transformer 模型。我们对涵盖 41 个国家、27 种语言的 Comparative Manifestos Project 数据集进行了分析,发现当前最先进的模型可以高效地解决这一任务,其中标签聚合取得了最佳结果。
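As a toy illustration of the label-aggregation pipeline described above (our own sketch, not the paper's implementation), one can classify each manifesto statement with some statement-level classifier and aggregate the counts into a single left-right score, in the spirit of the Manifesto Project's RILE-style measures; `classify_statement` below is a hypothetical placeholder:

```python
from collections import Counter
from typing import Callable, List

def scale_manifesto(statements: List[str],
                    classify_statement: Callable[[str], str]) -> float:
    """Return a score in [-1, 1]: -1 = fully left, +1 = fully right."""
    counts = Counter(classify_statement(s) for s in statements)
    n = max(sum(counts.values()), 1)
    return (counts["right"] - counts["left"]) / n

if __name__ == "__main__":
    def dummy_classifier(sentence: str) -> str:  # stand-in for a fine-tuned model
        lowered = sentence.lower()
        if "tax cuts" in lowered:
            return "right"
        if "welfare" in lowered:
            return "left"
        return "other"

    demo = ["We will expand welfare programmes.",
            "Tax cuts for small businesses.",
            "We support national infrastructure."]
    print(scale_manifesto(demo, dummy_classifier))  # -> 0.0
```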
Large Language Models Help Humans Verify Truthfulness – Except When They Are Convincingly Wrong
paper_authors: Chenglei Si, Navita Goyal, Sherry Tongshuang Wu, Chen Zhao, Shi Feng, Hal Daumé III, Jordan Boyd-Graber
for: 本文研究语言模型(LLM)在提供信息时的真实性与事实性。
methods: 本文通过共 80 名众包工作者的实验,比较语言模型与搜索引擎在辅助用户进行事实核查方面的表现。
results: 阅读语言模型解释的用户比使用搜索引擎更高效,但当解释错误时往往会过度依赖语言模型。为缓解这一问题,作者提出让模型给出对比性解释,但这种方法仍无法显著超越搜索引擎。Abstract
Large Language Models (LLMs) are increasingly used for accessing information on the web. Their truthfulness and factuality are thus of great interest. To help users make the right decisions about the information they're getting, LLMs should not only provide but also help users fact-check information. In this paper, we conduct experiments with 80 crowdworkers in total to compare language models with search engines (information retrieval systems) at facilitating fact-checking by human users. We prompt LLMs to validate a given claim and provide corresponding explanations. Users reading LLM explanations are significantly more efficient than using search engines with similar accuracy. However, they tend to over-rely the LLMs when the explanation is wrong. To reduce over-reliance on LLMs, we ask LLMs to provide contrastive information - explain both why the claim is true and false, and then we present both sides of the explanation to users. This contrastive explanation mitigates users' over-reliance on LLMs, but cannot significantly outperform search engines. However, showing both search engine results and LLM explanations offers no complementary benefits as compared to search engines alone. Taken together, natural language explanations by LLMs may not be a reliable replacement for reading the retrieved passages yet, especially in high-stakes settings where over-relying on wrong AI explanations could lead to critical consequences.
摘要
大型语言模型(LLM)越来越多地被用于获取网络信息,因此其真实性与事实性备受关注。为帮助用户对获取的信息做出正确判断,LLM 不仅应提供信息,还应帮助用户进行事实核查。本文共招募 80 名众包工作者开展实验,比较语言模型与搜索引擎(信息检索系统)在辅助人工事实核查方面的表现。我们让 LLM 判断给定陈述的真伪并给出相应解释。阅读 LLM 解释的用户在准确率相近的情况下比使用搜索引擎明显更高效,但当解释错误时,他们往往会过度依赖 LLM。为减少这种过度依赖,我们让 LLM 提供对比性信息——同时解释该陈述为何为真、为何为假,并将两方面的解释一并呈现给用户。这种对比性解释缓解了用户对 LLM 的过度依赖,但仍无法显著超越搜索引擎。而同时展示搜索引擎结果与 LLM 解释,相比仅使用搜索引擎也没有带来互补性收益。总体而言,LLM 生成的自然语言解释目前还不能可靠地替代阅读检索到的原文,尤其是在过度依赖错误 AI 解释可能造成严重后果的高风险场景中。
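A hedged sketch of the "contrastive explanation" idea above: ask a model to argue both sides of a claim and show both to the user. `call_llm` is a hypothetical placeholder for whatever chat/completions client is available; it is not an API from the paper.

```python
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def contrastive_explanations(claim: str) -> dict:
    supporting = call_llm(
        f"Claim: {claim}\n"
        "Assume the claim is TRUE. Briefly explain the strongest evidence for it."
    )
    refuting = call_llm(
        f"Claim: {claim}\n"
        "Assume the claim is FALSE. Briefly explain the strongest evidence against it."
    )
    return {"claim": claim, "why_true": supporting, "why_false": refuting}

# A fact-checking UI would render both fields side by side instead of a single
# verdict, leaving the final judgement to the human fact-checker.
```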
Product Attribute Value Extraction using Large Language Models
results: 研究发现,GPT-4 在属性/值抽取上可达到 85% 的平均 F1 分数,而最佳的基于 PLM 的技术在同等训练数据量下表现落后约 5%;此外,微调后的 GPT-3.5 模型可以达到与 GPT-4 相近的性能,且成本更低。Abstract
E-commerce applications such as faceted product search or product comparison are based on structured product descriptions like attribute/value pairs. The vendors on e-commerce platforms do not provide structured product descriptions but describe offers using titles or descriptions. To process such offers, it is necessary to extract attribute/value pairs from textual product attributes. State-of-the-art attribute/value extraction techniques rely on pre-trained language models (PLMs), such as BERT. Two major drawbacks of these models for attribute/value extraction are that (i) the models require significant amounts of task-specific training data and (ii) the fine-tuned models face challenges in generalizing to attribute values not included in the training data. This paper explores the potential of large language models (LLMs) as a training data-efficient and robust alternative to PLM-based attribute/value extraction methods. We consider hosted LLMs, such as GPT-3.5 and GPT-4, as well as open-source LLMs based on Llama2. We evaluate the models in a zero-shot scenario and in a scenario where task-specific training data is available. In the zero-shot scenario, we compare various prompt designs for representing information about the target attributes of the extraction. In the scenario with training data, we investigate (i) the provision of example attribute values, (ii) the selection of in-context demonstrations, and (iii) the fine-tuning of GPT-3.5. Our experiments show that GPT-4 achieves an average F1-score of 85% on the two evaluation datasets while the best PLM-based techniques perform on average 5% worse using the same amount of training data. GPT-4 achieves a 10% higher F1-score than the best open-source LLM. The fine-tuned GPT-3.5 model reaches a similar performance as GPT-4 while being significantly more cost-efficient.
摘要
分面商品搜索、商品比较等电商应用都建立在属性/值对这类结构化商品描述之上。然而,电商平台上的商家通常不提供结构化的商品描述,而是通过标题或描述文本来介绍商品。要处理此类商品信息,就需要从商品文本中抽取属性/值对。当前最先进的属性/值抽取技术依赖于 BERT 等预训练语言模型(PLM)。这类模型有两大缺点:(一)需要大量任务特定的训练数据;(二)微调后的模型难以泛化到训练数据中未出现的属性值。本文探讨了大语言模型(LLM)作为训练数据效率更高、更鲁棒的替代方案的潜力。我们考察了 GPT-3.5、GPT-4 等托管 LLM,以及基于 Llama2 的开源 LLM,并在零样本场景和有任务特定训练数据的场景下评估这些模型。在零样本场景中,我们比较了用于描述目标属性信息的多种提示设计。在有训练数据的场景中,我们研究了(一)提供示例属性值,(二)上下文示例的选择,(三)对 GPT-3.5 的微调。实验表明,GPT-4 在两个评测数据集上取得了 85% 的平均 F1 分数,而最佳的基于 PLM 的技术在同等训练数据量下平均低约 5%;GPT-4 的 F1 分数比最佳开源 LLM 高 10%。微调后的 GPT-3.5 模型性能与 GPT-4 相近,且成本显著更低。
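A minimal sketch of zero-shot attribute/value extraction with an LLM, in the spirit of the prompt designs discussed above. The prompt wording, the JSON output format, the example attribute schema and `call_llm` are our own assumptions, not the paper's exact setup.

```python
import json

TARGET_ATTRIBUTES = ["Brand", "Color", "Material"]  # illustrative target schema

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in a hosted or open-source LLM client here")

def extract_attributes(offer_title: str) -> dict:
    prompt = (
        "Extract the following attributes from the product offer.\n"
        f"Attributes: {', '.join(TARGET_ATTRIBUTES)}\n"
        "Answer with a JSON object; use null if an attribute is not mentioned.\n"
        f"Offer: {offer_title}\n"
    )
    raw = call_llm(prompt)
    try:
        values = json.loads(raw)
    except json.JSONDecodeError:
        values = {}  # fall back to an empty extraction on malformed output
    return {attr: values.get(attr) for attr in TARGET_ATTRIBUTES}
```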
ICU: Conquering Language Barriers in Vision-and-Language Modeling by Dividing the Tasks into Image Captioning and Language Understanding
results: 在 IGLUE 基准的两个任务上,实验表明 ICU 能在 9 种语言中的 5 种上取得新的最佳结果,并在其余语言上取得相当的结果。Abstract
Most multilingual vision-and-language (V&L) research aims to accomplish multilingual and multimodal capabilities within one model. However, the scarcity of multilingual captions for images has hindered the development. To overcome this obstacle, we propose ICU, Image Caption Understanding, which divides a V&L task into two stages: a V&L model performs image captioning in English, and a multilingual language model (mLM), in turn, takes the caption as the alt text and performs crosslingual language understanding. The burden of multilingual processing is lifted off V&L model and placed on mLM. Since the multilingual text data is relatively of higher abundance and quality, ICU can facilitate the conquering of language barriers for V&L models. In experiments on two tasks across 9 languages in the IGLUE benchmark, we show that ICU can achieve new state-of-the-art results for five languages, and comparable results for the rest.
摘要
大多数多语言视觉-语言(V&L)研究的目标是在单一模型中同时实现多语言和多模态能力。然而,图像的多语言描述数据稀缺,阻碍了这一方向的发展。为解决这一问题,我们提出了 ICU(Image Caption Understanding,图像描述理解),它将一个 V&L 任务拆分为两个阶段:先由 V&L 模型生成英文图像描述,再由多语言语言模型(mLM)将该描述作为替代文本(alt text)进行跨语言的语言理解。这样,多语言处理的负担就从 V&L 模型转移到了 mLM 上。由于多语言文本数据的数量和质量相对更高,ICU 有助于 V&L 模型跨越语言障碍。在 IGLUE 基准的两个任务、9 种语言上的实验表明,ICU 在其中 5 种语言上取得了新的最佳结果,在其余语言上取得了相当的结果。
Named Entity Recognition for Monitoring Plant Health Threats in Tweets: a ChouBERT Approach
results: ChouBERT 模型可以通过 Twitter 上的用户生成的文本数据探测到不熟悉的作物健康问题,并且可以在不同的自然灾害中保持一定的通用性。Abstract
An important application scenario of precision agriculture is detecting and measuring crop health threats using sensors and data analysis techniques. However, the textual data are still under-explored among the existing solutions due to the lack of labelled data and fine-grained semantic resources. Recent research suggests that the increasing connectivity of farmers and the emergence of online farming communities make social media like Twitter a participatory platform for detecting unfamiliar plant health events if we can extract essential information from unstructured textual data. ChouBERT is a French pre-trained language model that can identify Tweets concerning observations of plant health issues with generalizability on unseen natural hazards. This paper tackles the lack of labelled data by further studying ChouBERT's know-how on token-level annotation tasks over small labeled sets.
摘要
精准农业的一个重要应用场景是利用传感器和数据分析技术来检测和衡量作物健康威胁。然而,由于缺乏标注数据和细粒度语义资源,现有方案对文本数据的利用仍然不足。近期研究表明,随着农民联网程度的提高和在线农业社区的兴起,只要能够从非结构化文本中提取关键信息,推特等社交媒体就可以成为发现陌生植物健康事件的参与式平台。ChouBERT 是一个法语预训练语言模型,能够识别有关植物健康问题观察记录的推文,并对未见过的自然灾害具有一定的泛化能力。本文针对标注数据缺乏的问题,进一步研究了 ChouBERT 在小规模标注集上的词元级标注能力。
Lost in Translation: When GPT-4V(ision) Can’t See Eye to Eye with Text. A Vision-Language-Consistency Analysis of VLLMs and Beyond
results: 研究发现,当任务较为简单时,GPT-4V 等模型在不同模态之间表现较为一致;但当任务变得更复杂时,来自视觉模态的结果可信度会下降。此外,我们还提出了一种"视觉描述提示"(Vision Description Prompting)方法,能有效提升复杂视觉任务中的表现。Abstract
Recent advancements in multimodal techniques open exciting possibilities for models excelling in diverse tasks involving text, audio, and image processing. Models like GPT-4V, blending computer vision and language modeling, excel in complex text and image tasks. Numerous prior research endeavors have diligently examined the performance of these Vision Large Language Models (VLLMs) across tasks like object detection, image captioning and others. However, these analyses often focus on evaluating the performance of each modality in isolation, lacking insights into their cross-modal interactions. Specifically, questions concerning whether these vision-language models execute vision and language tasks consistently or independently have remained unanswered. In this study, we draw inspiration from recent investigations into multilingualism and conduct a comprehensive analysis of the model's cross-modal interactions. We introduce a systematic framework that quantifies the capability disparities between different modalities in the multi-modal setting and provide a set of datasets designed for these evaluations. Our findings reveal that models like GPT-4V tend to perform consistently across modalities when the tasks are relatively simple. However, the trustworthiness of results derived from the vision modality diminishes as the tasks become more challenging. Expanding on our findings, we introduce "Vision Description Prompting," a method that effectively improves performance in challenging vision-related tasks.
摘要
现代多模态技术的进展为构建能胜任文本、音频和图像处理等多种任务的模型带来了令人振奋的可能性。GPT-4V 等将计算机视觉与语言建模相结合的模型,在复杂的文本和图像任务中表现出色。此前已有大量研究考察了这类视觉大语言模型(VLLM)在目标检测、图像描述等任务上的表现,但这些分析通常只孤立地评估单一模态的表现,缺乏对跨模态交互的洞察。特别是,这些视觉-语言模型在执行视觉任务和语言任务时究竟是一致的还是彼此独立的,一直没有答案。在本研究中,我们从近期关于多语言能力的研究中汲取灵感,对模型的跨模态交互进行了全面分析。我们提出了一个系统化框架来量化多模态设置下不同模态之间的能力差距,并提供了一组为此类评估设计的数据集。研究结果表明,当任务相对简单时,GPT-4V 等模型在各模态之间的表现较为一致;但随着任务难度增加,来自视觉模态的结果的可信度会下降。基于这些发现,我们提出了"视觉描述提示"(Vision Description Prompting)方法,能有效提升高难度视觉相关任务中的表现。
Attack Prompt Generation for Red Teaming and Defending Large Language Models
paper_authors: Boyi Deng, Wenjie Wang, Fuli Feng, Yang Deng, Qifan Wang, Xiangnan He
for: 保护大语言模型(LLM)免受红队攻击诱导其生成有害内容。
methods: 提出一种结合人工与自动方法的框架,以较低成本构造高质量的攻击提示。
results: 实验验证了所提攻击与防御框架的有效性,并发布了针对不同大语言模型、规模各异的攻击提示数据集 SAP。Abstract
Large language models (LLMs) are susceptible to red teaming attacks, which can induce LLMs to generate harmful content. Previous research constructs attack prompts via manual or automatic methods, which have their own limitations on construction cost and quality. To address these issues, we propose an integrated approach that combines manual and automatic methods to economically generate high-quality attack prompts. Specifically, considering the impressive capabilities of newly emerged LLMs, we propose an attack framework to instruct LLMs to mimic human-generated prompts through in-context learning. Furthermore, we propose a defense framework that fine-tunes victim LLMs through iterative interactions with the attack framework to enhance their safety against red teaming attacks. Extensive experiments on different LLMs validate the effectiveness of our proposed attack and defense frameworks. Additionally, we release a series of attack prompts datasets named SAP with varying sizes, facilitating the safety evaluation and enhancement of more LLMs. Our code and dataset is available on https://github.com/Aatrox103/SAP .
摘要
Co$^2$PT: Mitigating Bias in Pre-trained Language Models through Counterfactual Contrastive Prompt Tuning
results: 在三个外在偏见基准上的实验表明,Co$^2$PT 在下游任务的提示调优过程中具有显著的偏见缓解效果,并且能够与现有的上游去偏语言模型相结合。Abstract
Pre-trained Language Models are widely used in many important real-world applications. However, recent studies show that these models can encode social biases from large pre-training corpora and even amplify biases in downstream applications. To address this challenge, we propose Co$^2$PT, an efficient and effective debias-while-prompt tuning method for mitigating biases via counterfactual contrastive prompt tuning on downstream tasks. Our experiments conducted on three extrinsic bias benchmarks demonstrate the effectiveness of Co$^2$PT on bias mitigation during the prompt tuning process and its adaptability to existing upstream debiased language models. These findings indicate the strength of Co$^2$PT and provide promising avenues for further enhancement in bias mitigation on downstream tasks.
摘要
预训练语言模型被广泛应用于许多重要的现实任务中。然而,近期研究表明,这些模型会从大规模预训练语料中学到社会偏见,甚至在下游应用中放大这些偏见。为应对这一挑战,我们提出了 Co$^2$PT,一种高效且有效的"边调提示边去偏"方法,通过在下游任务上进行反事实对比提示调优来缓解偏见。我们在三个外在偏见基准上的实验表明,Co$^2$PT 在提示调优过程中能够有效缓解偏见,并且能够与现有的上游去偏语言模型相结合。这些发现展示了 Co$^2$PT 的优势,并为进一步提升下游任务中的偏见缓解提供了有希望的方向。
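A rough sketch of the counterfactual-contrastive idea above (our own illustration; the actual Co$^2$PT objective, the prompt-tuning setup and the word lists used are assumptions): build a counterfactually augmented copy of each sentence by swapping gendered terms, then penalize the distance between the two representations during prompt tuning.

```python
import torch
import torch.nn.functional as F

SWAPS = {"he": "she", "she": "he", "his": "her", "her": "his",
         "man": "woman", "woman": "man"}  # illustrative, not the paper's list

def counterfactual(sentence: str) -> str:
    return " ".join(SWAPS.get(tok, tok) for tok in sentence.lower().split())

def debias_consistency_loss(emb: torch.Tensor, emb_cf: torch.Tensor) -> torch.Tensor:
    """Pull a sentence embedding and its counterfactual embedding together."""
    return 1.0 - F.cosine_similarity(emb, emb_cf, dim=-1).mean()

if __name__ == "__main__":
    print(counterfactual("He praised his colleague"))  # "she praised her colleague"
    e, e_cf = torch.randn(4, 768), torch.randn(4, 768)
    print(debias_consistency_loss(e, e_cf).item())
```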
MedAI Dialog Corpus (MEDIC): Zero-Shot Classification of Doctor and AI Responses in Health Consultations
results: 研究发现,预训练语言模型在医疗咨询文本分类方面存在限制,尚未达到预期的准确率。这些结果为未来在医疗文本分类领域的研究提供了基础。Abstract
Zero-shot classification enables text to be classified into classes not seen during training. In this research, we investigate the effectiveness of pre-trained language models to accurately classify responses from Doctors and AI in health consultations through zero-shot learning. Our study aims to determine whether these models can effectively detect if a text originates from human or AI models without specific corpus training. We collect responses from doctors to patient inquiries about their health and pose the same question/response to AI models. While zero-shot language models show a good understanding of language in general, they have limitations in classifying doctor and AI responses in healthcare consultations. This research lays the groundwork for further research into this field of medical text classification, informing the development of more effective approaches to accurately classify doctor-generated and AI-generated text in health consultations.
摘要
零样本分类可以将文本划分到训练阶段未曾见过的类别中。在本研究中,我们考察预训练语言模型在零样本学习设置下,对医疗咨询中医生与 AI 回复进行准确分类的能力,目的是确定这些模型能否在不针对特定语料训练的情况下,判断一段文本出自人类还是 AI 模型。我们收集了医生对患者健康咨询的回复,并向 AI 模型提出相同的问题以获得其回复。尽管零样本语言模型表现出良好的通用语言理解能力,但在对医疗咨询中医生与 AI 回复进行分类时仍存在局限。本研究为医学文本分类这一领域的后续工作奠定了基础,有助于发展更有效的方法来准确区分医疗咨询中由医生和由 AI 生成的文本。
paper_authors: Etsuko Ishii, Yan Xu, Bryan Wilie, Ziwei Ji, Holy Lovenia, Willy Chung, Pascale Fung
for: This paper aims to improve the ability of language models in inductive reasoning, specifically in generating correct inferences when not all information is present in the context.
methods: The paper uses contrastive learning, where negative samples are fed to the model to help it understand what is wrong and improve its inference generation.
results: The experiments suggest that using negative samples improves the model’s ability to generate correct inferences, mitigating the information gap between dialogue contexts and desired inferences.Abstract
Inference, especially those derived from inductive processes, is a crucial component in our conversation to complement the information implicitly or explicitly conveyed by a speaker. While recent large language models show remarkable advances in inference tasks, their performance in inductive reasoning, where not all information is present in the context, is far behind deductive reasoning. In this paper, we analyze the behavior of the models based on the task difficulty defined by the semantic information gap -- which distinguishes inductive and deductive reasoning (Johnson-Laird, 1988, 1993). Our analysis reveals that the disparity in information between dialogue contexts and desired inferences poses a significant challenge to the inductive inference process. To mitigate this information gap, we investigate a contrastive learning approach by feeding negative samples. Our experiments suggest negative samples help models understand what is wrong and improve their inference generations.
摘要
推理(尤其是通过归纳过程得到的推理)是对话中的关键组成部分,用来补全说话者显式或隐式传达的信息。尽管近期的大型语言模型在推理任务上取得了显著进展,但在并非所有信息都出现在上下文中的归纳推理上,其表现远落后于演绎推理。本文基于由语义信息差距定义的任务难度(该差距区分了归纳推理与演绎推理,Johnson-Laird, 1988, 1993)来分析模型的行为。分析表明,对话上下文与目标推理之间的信息差距给归纳推理过程带来了重大挑战。为缓解这一信息差距,我们研究了一种通过提供负样本进行对比学习的方法。实验表明,负样本有助于模型理解何为错误,从而改进其推理生成。
Unmasking Transformers: A Theoretical Approach to Data Recovery via Attention Weights
paper_authors: Yichuan Deng, Zhao Song, Shenghao Xie, Chiwun Yang
for: 本研究旨在探讨Transformer模型中数据是否可以通过注意力权重和输出来恢复。
methods: 我们提出了一个理论框架,通过最小化损失函数 L(X) 来恢复输入数据 X。
results: 我们发现,仅凭注意力权重和输出就可能恢复输入数据,这表明该类模型的设计存在潜在的安全与隐私风险。Abstract
In the realm of deep learning, transformers have emerged as a dominant architecture, particularly in natural language processing tasks. However, with their widespread adoption, concerns regarding the security and privacy of the data processed by these models have arisen. In this paper, we address a pivotal question: Can the data fed into transformers be recovered using their attention weights and outputs? We introduce a theoretical framework to tackle this problem. Specifically, we present an algorithm that aims to recover the input data $X \in \mathbb{R}^{d \times n}$ from given attention weights $W = QK^\top \in \mathbb{R}^{d \times d}$ and output $B \in \mathbb{R}^{n \times n}$ by minimizing the loss function $L(X)$. This loss function captures the discrepancy between the expected output and the actual output of the transformer. Our findings have significant implications for the Localized Layer-wise Mechanism (LLM), suggesting potential vulnerabilities in the model's design from a security and privacy perspective. This work underscores the importance of understanding and safeguarding the internal workings of transformers to ensure the confidentiality of processed data.
摘要
在深度学习领域,Transformer 已成为主流架构,尤其是在自然语言处理任务中。然而,随着其广泛应用,人们开始关注这些模型所处理数据的安全与隐私问题。本文回答一个关键问题:能否利用 Transformer 的注意力权重和输出恢复输入数据?我们为此提出了一个理论框架。具体而言,我们给出了一种算法,在给定注意力权重 $W = QK^\top \in \mathbb{R}^{d \times d}$ 和输出 $B \in \mathbb{R}^{n \times n}$ 的情况下,通过最小化损失函数 $L(X)$ 来恢复输入数据 $X \in \mathbb{R}^{d \times n}$;该损失函数刻画了 Transformer 的期望输出与实际输出之间的差异。我们的发现对 Localized Layer-wise Mechanism (LLM) 具有重要意义,表明从安全与隐私的角度看,该模型设计可能存在漏洞。这项工作强调了理解并保护 Transformer 内部机制的重要性,以确保所处理数据的机密性。
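An illustrative sketch (not the paper's algorithm) of recovering inputs from attention weights by gradient descent. We assume, for illustration only, that the observed output is the row-normalized attention matrix B = softmax(X^T W X); the paper's exact loss L(X) and normalization may differ.

```python
import torch

def recover_inputs(W: torch.Tensor, B: torch.Tensor, d: int, n: int,
                   steps: int = 2000, lr: float = 0.05) -> torch.Tensor:
    X = torch.randn(d, n, requires_grad=True)          # candidate input, d x n
    opt = torch.optim.Adam([X], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        attn = torch.softmax(X.t() @ W @ X, dim=-1)     # n x n predicted output
        loss = torch.norm(attn - B) ** 2                # discrepancy L(X)
        loss.backward()
        opt.step()
    return X.detach()

if __name__ == "__main__":
    d, n = 8, 6
    X_true = torch.randn(d, n)
    W = torch.randn(d, d)
    B = torch.softmax(X_true.t() @ W @ X_true, dim=-1)  # simulated observation
    X_hat = recover_inputs(W, B, d, n)
    print(torch.norm(torch.softmax(X_hat.t() @ W @ X_hat, dim=-1) - B).item())
```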
A Read-and-Select Framework for Zero-shot Entity Linking
paper_authors: Zhenran Xu, Yulin Chen, Baotian Hu, Min Zhang
for: 这篇论文的目标是提出一种零样本实体链接(EL)方法,以考察模型的泛化能力。
methods: 论文提出了一个"读取-选择"(read-and-select,ReS)框架,对实体消歧的两个主要环节——提及-实体匹配与跨实体比较——进行建模。
results: 论文在 ZESHEL 数据集上取得了最先进的性能;与以往多数工作不同,该方法无需繁琐的多阶段预训练,展示了提及-实体交互与跨实体交互相结合的有效性。Abstract
Zero-shot entity linking (EL) aims at aligning entity mentions to unseen entities to challenge the generalization ability. Previous methods largely focus on the candidate retrieval stage and ignore the essential candidate ranking stage, which disambiguates among entities and makes the final linking prediction. In this paper, we propose a read-and-select (ReS) framework by modeling the main components of entity disambiguation, i.e., mention-entity matching and cross-entity comparison. First, for each candidate, the reading module leverages mention context to output mention-aware entity representations, enabling mention-entity matching. Then, in the selecting module, we frame the choice of candidates as a sequence labeling problem, and all candidate representations are fused together to enable cross-entity comparison. Our method achieves the state-of-the-art performance on the established zero-shot EL dataset ZESHEL with a 2.55% micro-average accuracy gain, with no need for laborious multi-phase pre-training used in most of the previous work, showing the effectiveness of both mention-entity and cross-entity interaction.
摘要
零样本实体链接(EL)旨在将实体提及链接到未见过的实体,以考察模型的泛化能力。以往方法大多集中在候选实体检索阶段,而忽视了至关重要的候选排序阶段——正是该阶段在多个实体之间进行消歧并给出最终的链接预测。本文提出了"读取-选择"(ReS)框架,对实体消歧的两个主要环节,即提及-实体匹配与跨实体比较,进行建模。首先,在读取模块中,针对每个候选实体,利用提及上下文生成提及感知的实体表示,从而实现提及-实体匹配;随后,在选择模块中,我们将候选实体的选择建模为序列标注问题,并将所有候选表示融合在一起以实现跨实体比较。我们的方法在公认的零样本 EL 数据集 ZESHEL 上取得了最先进的性能,微平均准确率提升 2.55%,且无需以往工作中常用的繁琐多阶段预训练,这表明提及-实体交互与跨实体交互均行之有效。
Revisiting Sparse Retrieval for Few-shot Entity Linking
results: 对 ZESHEL 数据集进行实验,提出的方法比现有模型在所有测试领域均显著提高了性能,证明了关键词增强的稀疏检索的效果。Abstract
Entity linking aims to link ambiguous mentions to their corresponding entities in a knowledge base. One of the key challenges comes from insufficient labeled data for specific domains. Although dense retrievers have achieved excellent performance on several benchmarks, their performance decreases significantly when only a limited amount of in-domain labeled data is available. In such few-shot setting, we revisit the sparse retrieval method, and propose an ELECTRA-based keyword extractor to denoise the mention context and construct a better query expression. For training the extractor, we propose a distant supervision method to automatically generate training data based on overlapping tokens between mention contexts and entity descriptions. Experimental results on the ZESHEL dataset demonstrate that the proposed method outperforms state-of-the-art models by a significant margin across all test domains, showing the effectiveness of keyword-enhanced sparse retrieval.
摘要
实体链接旨在将有歧义的提及链接到知识库中对应的实体,其中一个关键挑战是特定领域缺乏足够的标注数据。尽管稠密检索器在多个基准上表现出色,但当领域内标注数据十分有限时,其性能会显著下降。在这种少样本设置下,我们重新审视了稀疏检索方法,并提出了一个基于 ELECTRA 的关键词提取器,用于去除提及上下文中的噪声并构造更好的查询表达。为训练该提取器,我们提出了一种远程监督方法,利用提及上下文与实体描述之间的重叠词元自动生成训练数据。在 ZESHEL 数据集上的实验结果表明,所提方法在所有测试领域上都以明显优势超过了最先进的模型,证明了关键词增强的稀疏检索的有效性。
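A small sketch of keyword-enhanced sparse retrieval in the spirit of the approach above, using BM25 from the rank_bm25 package. The keyword extractor here is a trivial stand-in for the ELECTRA-based extractor described in the paper, and the toy corpus is invented.

```python
from rank_bm25 import BM25Okapi

entity_descriptions = [
    "Luke Skywalker is a Jedi Knight from the planet Tatooine.",
    "Anakin Skywalker, later Darth Vader, fell to the dark side.",
    "Tatooine is a desert planet with twin suns.",
]
corpus_tokens = [doc.lower().split() for doc in entity_descriptions]
bm25 = BM25Okapi(corpus_tokens)

def extract_keywords(mention_context: str, top_k: int = 5) -> list:
    # Placeholder: keep the longest words as "keywords"; a learned extractor would
    # instead score each token of the mention context.
    tokens = mention_context.lower().split()
    return sorted(tokens, key=len, reverse=True)[:top_k]

query = extract_keywords("the young Skywalker trained as a Jedi on a desert planet")
scores = bm25.get_scores(query)
best = max(range(len(entity_descriptions)), key=lambda i: scores[i])
print(entity_descriptions[best])
```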
Efficient Long-Range Transformers: You Need to Attend More, but Not Necessarily at Every Layer
results: 在自然语言建模与生成任务上,仅 13 亿参数的 decoder-only MASFormer 即可取得与使用全注意力的标准 Transformer 相当的性能,同时显著降低计算成本(最多降低 75%)。Abstract
Pretrained transformer models have demonstrated remarkable performance across various natural language processing tasks. These models leverage the attention mechanism to capture long- and short-range dependencies in the sequence. However, the (full) attention mechanism incurs high computational cost - quadratic in the sequence length, which is not affordable in tasks with long sequences, e.g., inputs with 8k tokens. Although sparse attention can be used to improve computational efficiency, as suggested in existing work, it has limited modeling capacity and often fails to capture complicated dependencies in long sequences. To tackle this challenge, we propose MASFormer, an easy-to-implement transformer variant with Mixed Attention Spans. Specifically, MASFormer is equipped with full attention to capture long-range dependencies, but only at a small number of layers. For the remaining layers, MASformer only employs sparse attention to capture short-range dependencies. Our experiments on natural language modeling and generation tasks show that a decoder-only MASFormer model of 1.3B parameters can achieve competitive performance to vanilla transformers with full attention while significantly reducing computational cost (up to 75%). Additionally, we investigate the effectiveness of continual training with long sequence data and how sequence length impacts downstream generation performance, which may be of independent interest.
摘要
预训练 Transformer 模型在各类自然语言处理任务中表现出色。这些模型依靠注意力机制来捕捉序列中的长程与短程依赖。然而,(完整的)注意力机制的计算开销与序列长度成平方关系,对于长序列任务(例如输入长达 8k 词元)难以承受。虽然如已有工作所建议的那样,可以用稀疏注意力来提升计算效率,但其建模能力有限,常常无法捕捉长序列中的复杂依赖。为应对这一挑战,我们提出了 MASFormer,一种易于实现的、具有混合注意力跨度的 Transformer 变体。具体来说,MASFormer 只在少数几层中使用完整注意力来捕捉长程依赖,其余各层仅使用稀疏注意力来捕捉短程依赖。我们在自然语言建模与生成任务上的实验表明,仅 13 亿参数的 decoder-only MASFormer 就能取得与使用完整注意力的普通 Transformer 相当的性能,同时显著降低计算成本(最多降低 75%)。此外,我们还研究了用长序列数据进行持续训练的效果,以及序列长度对下游生成性能的影响,这些结果本身也可能具有独立的价值。
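A toy sketch of "mixed attention spans": give a few layers a full attention mask and the rest a sliding-window (local) mask. The layer counts, which layers get full attention, the window size and the mask construction below are illustrative assumptions, not MASFormer's actual configuration.

```python
import torch

def full_mask(seq_len: int) -> torch.Tensor:
    return torch.ones(seq_len, seq_len, dtype=torch.bool)

def window_mask(seq_len: int, window: int) -> torch.Tensor:
    idx = torch.arange(seq_len)
    return (idx[:, None] - idx[None, :]).abs() <= window  # attend only to nearby tokens

def build_layer_masks(seq_len: int, n_layers: int = 12,
                      n_full_layers: int = 2, window: int = 64):
    """Most layers use cheap local attention; a small number use full attention."""
    masks = []
    for layer in range(n_layers):
        if layer >= n_layers - n_full_layers:
            masks.append(full_mask(seq_len))
        else:
            masks.append(window_mask(seq_len, window))
    return masks

if __name__ == "__main__":
    masks = build_layer_masks(seq_len=512)
    # e.g. pass each mask as attn_mask to torch.nn.functional.scaled_dot_product_attention
    print([m.float().mean().item() for m in masks[:3]], masks[-1].float().mean().item())
```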
DocXChain: A Powerful Open-Source Toolchain for Document Parsing and Beyond
for: The paper is written for document parsing and structured representation of unstructured documents.
methods: The paper proposes a powerful open-source toolchain called DocXChain, which includes basic capabilities such as text detection, text recognition, table structure recognition, and layout analysis, as well as fully functional pipelines for document parsing, including general text reading, table parsing, and document structurization.
results: The paper demonstrates the effectiveness of DocXChain in automatically converting rich information embodied in unstructured documents into structured representations that are readable and manipulable by machines. The paper also shows that DocXChain is concise, modularized, and flexible, and can be readily integrated with existing tools, libraries, or models to construct more powerful systems for various applications related to documents in real-world scenarios.
for: 这篇论文是为了文档分析和结构表示不结构化文档而写的。
methods: 论文提出了一个强大的开源工具链 called DocXChain,包括基本功能 such as 文本检测、文本识别、表格结构识别和布局分析,以及完整的文档分析管道,包括通用文本读取、表格分析和文档结构化。
results: 论文证明了 DocXChain 可以自动将不结构化文档中的丰富信息转化为机器可读和操作的结构表示。论文还显示了 DocXChain 是简洁、模块化和灵活的,可以与现有的工具、库或模型(如 LangChain 和 ChatGPT)集成,建立更强大的系统,用于各种文档相关的应用场景。Abstract
In this report, we introduce DocXChain, a powerful open-source toolchain for document parsing, which is designed and developed to automatically convert the rich information embodied in unstructured documents, such as text, tables and charts, into structured representations that are readable and manipulable by machines. Specifically, basic capabilities, including text detection, text recognition, table structure recognition and layout analysis, are provided. Upon these basic capabilities, we also build a set of fully functional pipelines for document parsing, i.e., general text reading, table parsing, and document structurization, to drive various applications related to documents in real-world scenarios. Moreover, DocXChain is concise, modularized and flexible, such that it can be readily integrated with existing tools, libraries or models (such as LangChain and ChatGPT), to construct more powerful systems that can accomplish more complicated and challenging tasks. The code of DocXChain is publicly available at:~\url{https://github.com/AlibabaResearch/AdvancedLiterateMachinery/tree/main/Applications/DocXChain}
摘要
在本报告中,我们介绍 DocXChain,一个功能强大的开源文档解析工具链,旨在将非结构化文档中蕴含的丰富信息(如文本、表格和图表)自动转换为机器可读、可操作的结构化表示。具体来说,它提供了文本检测、文本识别、表格结构识别和版面分析等基础能力;在这些基础能力之上,我们还构建了一组功能完整的文档解析流水线,包括通用文本阅读、表格解析和文档结构化,以支撑现实场景中各类与文档相关的应用。此外,DocXChain 简洁、模块化且灵活,可以方便地与现有的工具、库或模型(如 LangChain 和 ChatGPT)集成,构建能够完成更复杂、更具挑战性任务的系统。DocXChain 的代码已开源:https://github.com/AlibabaResearch/AdvancedLiterateMachinery/tree/main/Applications/DocXChain。
The Shifted and The Overlooked: A Task-oriented Investigation of User-GPT Interactions
paper_authors: Siru Ouyang, Shuohang Wang, Yang Liu, Ming Zhong, Yizhu Jiao, Dan Iter, Reid Pryzant, Chenguang Zhu, Heng Ji, Jiawei Han
for: 本研究旨在探讨现有的大语言模型(LLM)研究是否准确反映用户需求。
methods: 本研究使用大规模的用户-GPT对话集来分析现有的NLP研究和用户需求之间的差异。
results: 研究发现用户常见的任务,如“设计”和“规划”,在学术研究中受到忽视或与传统的NLP标准任务不同。研究还探讨了这些被忽略的任务的实际挑战和如何使LLM更加适应用户需求。Abstract
Recent progress in Large Language Models (LLMs) has produced models that exhibit remarkable performance across a variety of NLP tasks. However, it remains unclear whether the existing focus of NLP research accurately captures the genuine requirements of human users. This paper provides a comprehensive analysis of the divergence between current NLP research and the needs of real-world NLP applications via a large-scale collection of user-GPT conversations. We analyze a large-scale collection of real user queries to GPT. We compare these queries against existing NLP benchmark tasks and identify a significant gap between the tasks that users frequently request from LLMs and the tasks that are commonly studied in academic research. For example, we find that tasks such as ``design'' and ``planning'' are prevalent in user interactions but are largely neglected or different from traditional NLP benchmarks. We investigate these overlooked tasks, dissect the practical challenges they pose, and provide insights toward a roadmap to make LLMs better aligned with user needs.
摘要
FinEntity: Entity-level Sentiment Classification for Financial Texts
results: 在一个案例研究中,我们通过使用 FinEntity 监测加密货币市场,实证表明 FinEntity 有助于准确评估针对具体金融实体的情感。数据与代码可在 GitHub 获取:https://github.com/yixuantt/FinEntity。Abstract
In the financial domain, conducting entity-level sentiment analysis is crucial for accurately assessing the sentiment directed toward a specific financial entity. To our knowledge, no publicly available dataset currently exists for this purpose. In this work, we introduce an entity-level sentiment classification dataset, called \textbf{FinEntity}, that annotates financial entity spans and their sentiment (positive, neutral, and negative) in financial news. We document the dataset construction process in the paper. Additionally, we benchmark several pre-trained models (BERT, FinBERT, etc.) and ChatGPT on entity-level sentiment classification. In a case study, we demonstrate the practical utility of using FinEntity in monitoring cryptocurrency markets. The data and code of FinEntity is available at \url{https://github.com/yixuantt/FinEntity}
摘要
在金融领域,要准确评估针对某一金融实体的情感倾向,实体级情感分析必不可少。据我们所知,目前尚无公开可用的数据集用于此目的。在这项工作中,我们构建了一个名为 \textbf{FinEntity} 的实体级情感分类数据集,对金融新闻中的金融实体片段及其情感(积极、中性、消极)进行了标注,并在论文中详细介绍了数据集的构建过程。此外,我们在实体级情感分类任务上对多种预训练模型(BERT、FinBERT 等)以及 ChatGPT 进行了基准测试。在案例研究中,我们展示了使用 FinEntity 监测加密货币市场的实际价值。FinEntity 的数据和代码可在 \url{https://github.com/yixuantt/FinEntity} 获取。
Loop Copilot: Conducting AI Ensembles for Music Generation and Iterative Editing
results: 通过对参与者进行半结构化访谈和问卷调查,研究发现该系统不仅能够帮助用户创作音乐,还具有在更广泛领域应用的潜力。Abstract
Creating music is iterative, requiring varied methods at each stage. However, existing AI music systems fall short in orchestrating multiple subsystems for diverse needs. To address this gap, we introduce Loop Copilot, a novel system that enables users to generate and iteratively refine music through an interactive, multi-round dialogue interface. The system uses a large language model to interpret user intentions and select appropriate AI models for task execution. Each backend model is specialized for a specific task, and their outputs are aggregated to meet the user's requirements. To ensure musical coherence, essential attributes are maintained in a centralized table. We evaluate the effectiveness of the proposed system through semi-structured interviews and questionnaires, highlighting its utility not only in facilitating music creation but also its potential for broader applications.
摘要
音乐创作是一个迭代过程,每个阶段都需要不同的方法。然而,现有的 AI 音乐系统难以统筹多个子系统来满足多样化的需求。为弥补这一不足,我们推出了 Loop Copilot,一个新颖的系统,让用户通过交互式多轮对话界面生成并迭代打磨音乐。该系统使用大型语言模型理解用户意图,并选择合适的 AI 模型来执行任务;每个后端模型专注于一项特定任务,其输出经汇总后满足用户需求。为保证音乐的连贯性,关键属性被维护在一个集中表格中。我们通过半结构化访谈和问卷调查评估了该系统的有效性,结果表明它不仅能促进音乐创作,还具有更广泛的应用潜力。
paper_authors: Michael Barnett, William Brock, Lars Peter Hansen, Ruimeng Hu, Joseph Huang
for: 这个论文旨在研究气候经济框架中模型不确定性的影响。
methods: 论文使用神经网络方法解决高维度非线性模型问题。
results: 研究发现,模型不确定性对最优决策和社会估值具有一阶影响。在综合考虑气候动态、气候变化造成的经济损失以及绿色技术变革到来等相互关联的不确定性后,各类资本的投资会为应对技术变革和气候损害严重程度的揭示而作出大幅调整。Abstract
We study the implications of model uncertainty in a climate-economics framework with three types of capital: "dirty" capital that produces carbon emissions when used for production, "clean" capital that generates no emissions but is initially less productive than dirty capital, and knowledge capital that increases with R\&D investment and leads to technological innovation in green sector productivity. To solve our high-dimensional, non-linear model framework we implement a neural-network-based global solution method. We show there are first-order impacts of model uncertainty on optimal decisions and social valuations in our integrated climate-economic-innovation framework. Accounting for interconnected uncertainty over climate dynamics, economic damages from climate change, and the arrival of a green technological change leads to substantial adjustments to investment in the different capital types in anticipation of technological change and the revelation of climate damage severity.
摘要
我们在一个气候-经济框架中研究模型不确定性的影响,该框架包含三类资本:使用时会产生碳排放的"肮脏"资本;不产生排放但初始生产率低于肮脏资本的"清洁"资本;以及随研发投资增长、推动绿色部门生产率技术创新的知识资本。为求解这一高维、非线性的模型框架,我们采用了基于神经网络的全局求解方法。我们发现,在这个集气候、经济与创新于一体的框架中,模型不确定性对最优决策和社会估值具有一阶影响。在综合考虑气候动态、气候变化的经济损害以及绿色技术变革到来等相互关联的不确定性后,各类资本的投资会为应对技术变革和气候损害严重程度的揭示而作出大幅调整。
Heterogeneous Graph Neural Networks for Data-driven Traffic Assignment
results: 数据实验表明,该模型能够快速收敛、减少训练损失,并且在预测交通流方面具有高度的准确性。此外,该模型还可以应用于不同的网络架构。Abstract
The traffic assignment problem is one of the significant components of traffic flow analysis for which various solution approaches have been proposed. However, deploying these approaches for large-scale networks poses significant challenges. In this paper, we leverage the power of heterogeneous graph neural networks to propose a novel data-driven approach for traffic assignment and traffic flow learning. The proposed model is capable of capturing spatial traffic patterns across different links, yielding highly accurate results. We present numerical experiments on urban transportation networks and show that the proposed heterogeneous graph neural network model outperforms other conventional neural network models in terms of convergence rate, training loss, and prediction accuracy. Notably, the proposed heterogeneous graph neural network model can also be generalized to different network topologies. This approach offers a promising solution for complex traffic flow analysis and prediction, enhancing our understanding and management of a wide range of transportation systems.
methods: 作者提出了一种新的"几乎等变"(almost equivariance)定义,并给出了一种在模型中编码几乎等变性的实用方法,即利用李群的李代数。作者还证明了几乎等变与几乎等距之间的联系,以及几乎等变流形嵌入的存在性。
results: 作者通过在真实数据集上的实验验证了所提方法的有效性,并表明几乎等变可以在不要求严格等变的情况下带来更好的模型表现。
Recently, the equivariance of models with respect to a group action has become an important topic of research in machine learning. However, imbuing an architecture with a specific group equivariance imposes a strong prior on the types of data transformations that the model expects to see. While strictly-equivariant models enforce symmetries, real-world data does not always conform to such strict equivariances, be it due to noise in the data or underlying physical laws that encode only approximate or partial symmetries. In such cases, the prior of strict equivariance can actually prove too strong and cause models to underperform on real-world data. Therefore, in this work we study a closely related topic, that of almost equivariance. We provide a definition of almost equivariance that differs from those extant in the current literature and give a practical method for encoding almost equivariance in models by appealing to the Lie algebra of a Lie group. Specifically, we define Lie algebra convolutions and demonstrate that they offer several benefits over Lie group convolutions, including being well-defined for non-compact groups. From there, we pivot to the realm of theory and demonstrate connections between the notions of equivariance and isometry and those of almost equivariance and almost isometry, respectively. We prove two existence theorems, one showing the existence of almost isometries within bounded distance of isometries of a general manifold, and another showing the converse for Hilbert spaces. We then extend these theorems to prove the existence of almost equivariant manifold embeddings within bounded distance of fully equivariant embedding functions, subject to certain constraints on the group action and the function class. Finally, we demonstrate the validity of our approach by benchmarking against datasets in fully equivariant and almost equivariant settings.
摘要
近来,模型对群作用的等变性(equivariance)已成为机器学习中的重要研究课题。然而,为网络架构强加特定的群等变性,等于对模型所预期的数据变换施加了很强的先验。严格等变的模型强制实现对称性,但现实数据并不总是满足这种严格的等变关系——原因可能是数据中的噪声,也可能是底层物理规律本身只编码了近似或部分的对称性。在这些情形下,严格等变的先验反而可能过强,导致模型在真实数据上表现不佳。因此,本文研究一个密切相关的课题:几乎等变性(almost equivariance)。我们给出了一个不同于现有文献的几乎等变性定义,并借助李群的李代数提出了在模型中编码几乎等变性的实用方法。具体而言,我们定义了李代数卷积,并证明它相比李群卷积具有若干优势,包括对非紧群也有良好定义。随后我们转向理论分析,揭示了等变性与等距性、几乎等变性与几乎等距性之间的联系。我们证明了两个存在性定理:其一表明,在一般流形上,存在与等距映射相距有界的几乎等距映射;其二给出了希尔伯特空间上的逆向结论。我们进一步推广这些定理,证明在对群作用和函数类施加一定约束的条件下,存在与完全等变嵌入函数相距有界的几乎等变流形嵌入。最后,我们在完全等变和几乎等变两种设置的数据集上进行了基准实验,验证了方法的有效性。
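For a concrete handle on the notion discussed above, one common way to formalize it (our paraphrase; the paper states its own definition differently and develops it via Lie algebras) is as a uniformly bounded equivariance error:

```latex
% A function f : X -> Y between G-spaces with actions \rho and \rho' is called
% \varepsilon-almost equivariant if its equivariance error is uniformly bounded:
\[
  \sup_{g \in G,\; x \in X}
  \bigl\| \, f\bigl(\rho(g)\,x\bigr) - \rho'(g)\,f(x) \, \bigr\|_{Y} \;\le\; \varepsilon ,
\]
% strict equivariance being the special case \varepsilon = 0.
```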
Graph Neural Networks with polynomial activations have limited expressivity
results: 论文证明了使用多项式激活函数的 GNN 无法表达 GC2 查询。这意味着 GNN 可表达的逻辑层级取决于所选的激活函数,从而回答了 [Grohe, 2021] 提出的一个公开问题。Abstract
The expressivity of Graph Neural Networks (GNNs) can be entirely characterized by appropriate fragments of the first order logic. Namely, any query of the two variable fragment of graded modal logic (GC2) interpreted over labelled graphs can be expressed using a GNN whose size depends only on the depth of the query. As pointed out by [Barcelo & Al., 2020, Grohe, 2021], this description holds for a family of activation functions, leaving the possibility for a hierarchy of logics expressible by GNNs depending on the chosen activation function. In this article, we show that such a hierarchy indeed exists by proving that GC2 queries cannot be expressed by GNNs with polynomial activation functions. This implies a separation between polynomial and popular non-polynomial activations (such as ReLUs, sigmoid and hyperbolic tan and others) and answers an open question formulated by [Grohe, 2021].
摘要
图神经网络(GNN)的表达能力可以完全由一阶逻辑的适当片段来刻画:任何在带标签图上解释的分级模态逻辑双变量片段(GC2)查询,都可以由一个规模只取决于查询深度的 GNN 来表达。正如 [Barcelo & Al., 2020, Grohe, 2021] 所指出的,这一刻画对某一族激活函数成立,因而不同的激活函数可能对应着 GNN 可表达的不同逻辑层级。本文证明了这样的层级确实存在:使用多项式激活函数的 GNN 无法表达 GC2 查询。这意味着多项式激活函数与常用的非多项式激活函数(如 ReLU、sigmoid、双曲正切等)之间存在分离,并回答了 [Grohe, 2021] 提出的一个公开问题。
Mean Estimation Under Heterogeneous Privacy Demands
methods: 我们提出了一种允许每个用户各自设定隐私需求的均值估计算法,并证明该算法是极小极大(minimax)最优的。
results: 我们的结果表明,总体误差率由隐私要求最严格的用户决定,而其余用户则在不损失估计精度的情况下免费获得了更强的隐私保障。Abstract
Differential Privacy (DP) is a well-established framework to quantify privacy loss incurred by any algorithm. Traditional formulations impose a uniform privacy requirement for all users, which is often inconsistent with real-world scenarios in which users dictate their privacy preferences individually. This work considers the problem of mean estimation, where each user can impose their own distinct privacy level. The algorithm we propose is shown to be minimax optimal and has a near-linear run-time. Our results elicit an interesting saturation phenomenon that occurs. Namely, the privacy requirements of the most stringent users dictate the overall error rates. As a consequence, users with less but differing privacy requirements are all given more privacy than they require, in equal amounts. In other words, these privacy-indifferent users are given a nontrivial degree of privacy for free, without any sacrifice in the performance of the estimator.
摘要
差分隐私(DP)是一个成熟的框架,用于量化任意算法造成的隐私损失。传统的表述对所有用户施加统一的隐私要求,这往往与现实情形不符——在现实中,用户会各自给出自己的隐私偏好。本文研究均值估计问题,其中每个用户都可以设定各自不同的隐私水平。我们提出的算法被证明是极小极大(minimax)最优的,且运行时间接近线性。我们的结果揭示了一个有趣的"饱和"现象:整体误差率由隐私要求最严格的用户决定;其结果是,隐私要求较弱且各不相同的用户都被给予了超出其需求的同等程度的隐私。换言之,这些对隐私不敏感的用户免费获得了可观的隐私保障,而估计器的性能并未因此受到任何损失。
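A deliberately naive baseline sketch for mean estimation with per-user privacy demands (our illustration; the paper's minimax-optimal estimator is more involved). Each user's value is assumed to lie in [0, 1]; user i is protected by adding Laplace noise with scale 1/eps_i to their own clipped value before averaging.

```python
import numpy as np

def private_mean(values: np.ndarray, epsilons: np.ndarray, rng=None) -> float:
    rng = rng or np.random.default_rng()
    clipped = np.clip(values, 0.0, 1.0)
    noisy = clipped + rng.laplace(scale=1.0 / epsilons)   # per-user noise level
    return float(np.mean(noisy))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    values = rng.uniform(size=1000)
    # a few very strict users (small epsilon), the rest fairly relaxed
    epsilons = np.where(rng.uniform(size=1000) < 0.05, 0.1, 5.0)
    print(private_mean(values, epsilons, rng), values.mean())
```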
Approaches for Uncertainty Quantification of AI-predicted Material Properties: A Comparison
results: 研究发现,分位数(Quantile)方法和集成(Ensemble)方法能够更好地刻画机器学习模型预测结果的不确定性,而直接用机器学习预测区间的方法表现较差。Abstract
The development of large databases of material properties, together with the availability of powerful computers, has allowed machine learning (ML) modeling to become a widely used tool for predicting material performances. While confidence intervals are commonly reported for such ML models, prediction intervals, i.e., the uncertainty on each prediction, are not as frequently available. Here, we investigate three easy-to-implement approaches to determine such individual uncertainty, comparing them across ten ML quantities spanning energetics, mechanical, electronic, optical, and spectral properties. Specifically, we focused on the Quantile approach, the direct machine learning of the prediction intervals and Ensemble methods.
摘要
大规模材料性质数据库的发展,加上高性能计算资源的普及,使机器学习(ML)建模成为预测材料性能的常用工具。虽然此类 ML 模型通常会报告置信区间,但预测区间——即每一次预测各自的不确定度——却并不常见。在这里,我们考察了三种易于实现的确定这种个体不确定度的方法,并在涵盖能量学、力学、电子、光学和光谱性质的十个 ML 预测量上对它们进行了比较。具体而言,我们关注分位数方法、直接用机器学习预测区间的方法以及集成方法。
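A short sketch of two of the general uncertainty approaches named above, using scikit-learn (our illustration of the techniques, not the paper's exact setup or data): quantile gradient boosting for prediction intervals, and the spread of an ensemble as an uncertainty proxy.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=500)

# Quantile approach: fit the 5th and 95th percentile to get a ~90% prediction interval.
lo = GradientBoostingRegressor(loss="quantile", alpha=0.05).fit(X, y)
hi = GradientBoostingRegressor(loss="quantile", alpha=0.95).fit(X, y)

# Ensemble approach: per-tree spread of a random forest as an uncertainty estimate.
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

x_new = np.array([[1.0]])
per_tree = np.array([t.predict(x_new)[0] for t in forest.estimators_])
print("interval:", lo.predict(x_new)[0], hi.predict(x_new)[0])
print("ensemble mean +/- std:", per_tree.mean(), per_tree.std())
```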
Fuel Consumption Prediction for a Passenger Ferry using Machine Learning and In-service Data: A Comparative Study
results: 本文基于 XGBoost 技术构建的模型取得了最佳的预测性能。结果表明,通过对运营数据进行分析与预测,有助于提升海运的能源效率与可持续性。Abstract
As the importance of eco-friendly transportation increases, providing an efficient approach for marine vessel operation is essential. Methods for status monitoring with consideration to the weather condition and forecasting with the use of in-service data from ships require accurate and complete models for predicting the energy efficiency of a ship. The models need to effectively process all the operational data in real-time. This paper presents models that can predict fuel consumption using in-service data collected from a passenger ship. Statistical and domain-knowledge methods were used to select the proper input variables for the models. These methods prevent over-fitting, missing data, and multicollinearity while providing practical applicability. Prediction models that were investigated include multiple linear regression (MLR), decision tree approach (DT), an artificial neural network (ANN), and ensemble methods. The best predictive performance was from a model developed using the XGBoost technique, which is a boosting ensemble approach. Our code is available on GitHub at \url{https://github.com/pagand/model_optimze_vessel/tree/OE} for future research.
摘要
随着环保运输的重要性日益提升,为船舶运营提供高效的方法变得至关重要。结合天气状况进行状态监测,并利用船舶在航数据进行预测,需要能够准确而完整地预测船舶能效的模型,且这些模型必须能够实时处理全部运营数据。本文提出了利用一艘客运渡轮的在航数据预测燃油消耗的模型。我们采用统计方法与领域知识相结合的方式为模型挑选合适的输入变量,以避免过拟合、数据缺失和多重共线性等问题,同时保证实际可用性。所考察的预测模型包括多元线性回归(MLR)、决策树(DT)、人工神经网络(ANN)以及集成方法,其中基于 XGBoost(一种提升式集成方法)构建的模型取得了最佳预测性能。我们的代码已发布于 GitHub:\url{https://github.com/pagand/model_optimze_vessel/tree/OE},供后续研究使用。
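A compact sketch of the kind of gradient-boosted fuel-consumption regressor discussed above, using the xgboost package. The feature stand-ins, synthetic target and hyperparameters are placeholders, not the paper's selected variables or tuned settings.

```python
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
n = 2000
# toy stand-ins for operational features: speed over ground, draft, wind speed, heading
X = rng.normal(size=(n, 4))
y = 3.0 * X[:, 0] ** 2 + 0.5 * X[:, 2] + rng.normal(scale=0.3, size=n)  # synthetic fuel rate

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = xgb.XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.05)
model.fit(X_train, y_train)

pred = model.predict(X_test)
print("MAE:", mean_absolute_error(y_test, pred))
```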
paper_authors: Piotr Gramacki, Kacper Leśniara, Kamil Raczycki, Szymon Woźniak, Marcin Przymus, Piotr Szymański
for: 本工作介绍一个用于处理地理空间数据的人工智能(AI)工具库。
methods: 本工作介绍的 Spatial Representations for Artificial Intelligence(srai)库是一个 Python 库,可以下载地理空间数据,用多种算法将给定区域划分为微区域,并用不同架构训练嵌入模型。
results: 该库可用于构建解决地理空间任务的完整流水线,是将地理空间 AI 工具集标准化的第一步。Abstract
Spatial Representations for Artificial Intelligence (srai) is a Python library for working with geospatial data. The library can download geospatial data, split a given area into micro-regions using multiple algorithms and train an embedding model using various architectures. It includes baseline models as well as more complex methods from published works. Those capabilities make it possible to use srai in a complete pipeline for geospatial task solving. The proposed library is the first step to standardize the geospatial AI domain toolset. It is fully open-source and published under Apache 2.0 licence.
摘要
Spatial Representations for Artificial Intelligence(srai)是一个用于处理地理空间数据的 Python 库。该库可以下载地理空间数据,使用多种算法将给定区域划分为微区域,并使用不同的架构训练嵌入模型;其中既包含基线模型,也包含来自已发表工作的更复杂的方法。凭借这些能力,srai 可以用于构建解决地理空间任务的完整流水线。该库是将地理空间 AI 领域工具集标准化的第一步,完全开源,并以 Apache 2.0 许可证发布。
A Multi-Stage Temporal Convolutional Network for Volleyball Jumps Classification Using a Waist-Mounted IMU
results: 研究发现,使用腰部单个 IMU 和 MS-TCN 模型可以准确识别排球运动员的跳跃类型,且比现有深度学习模型计算效率更高。在实验中,该模型在 10 名排球运动员的实验室测试和 26 名排球运动员的训练课上均表现出良好的准确性。Abstract
Monitoring the number of jumps for volleyball players during training or a match can be crucial to prevent injuries, yet the measurement requires considerable workload and cost using traditional methods such as video analysis. Also, existing methods do not provide accurate differentiation between different types of jumps. In this study, an unobtrusive system with a single inertial measurement unit (IMU) on the waist was proposed to recognize the types of volleyball jumps. A Multi-Layer Temporal Convolutional Network (MS-TCN) was applied for sample-wise classification. The model was evaluated on ten volleyball players and twenty-six volleyball players, during a lab session with a fixed protocol of jumping and landing tasks, and during four volleyball training sessions, respectively. The MS-TCN model achieved better performance than a state-of-the-art deep learning model but with lower computational cost. In the lab sessions, most jump counts showed small differences between the predicted jumps and video-annotated jumps, with an overall count showing a Limit of Agreement (LoA) of 0.1+-3.40 (r=0.884). For comparison, the proposed algorithm showed slightly worse results than VERT (a commercial jumping assessment device) with a LoA of 0.1+-2.08 (r=0.955) but the differences were still within a comparable range. In the training sessions, the recognition of three types of jumps exhibited a mean difference from observation of less than 10 jumps: block, smash, and overhead serve. These results showed the potential of using a single IMU to recognize the types of volleyball jumps. The sample-wise architecture provided high resolution of recognition and the MS-TCN required fewer parameters to train compared with state-of-the-art models.
Sequence Length Independent Norm-Based Generalization Bounds for Transformers
methods: We use a covering-number-based approach to prove our bound, applying three novel covering number upper bounds to upper bound the Rademacher complexity of the Transformer.
results: We show that this generalization bound applies to the common Transformer training technique of masking and then predicting the masked word. We also run experiments on a sparse majority dataset to empirically validate our theoretical findings.
Abstract
This paper provides norm-based generalization bounds for the Transformer architecture that do not depend on the input sequence length. We employ a covering number based approach to prove our bounds. We use three novel covering number bounds for the function class of bounded linear transformations to upper bound the Rademacher complexity of the Transformer. Furthermore, we show this generalization bound applies to the common Transformer training technique of masking and then predicting the masked word. We also run a simulated study on a sparse majority data set that empirically validates our theoretical findings.
How Can Everyday Users Efficiently Teach Robots by Demonstrations?
results: The study found that suggesting demonstration examples with the proposed approach improves robot learning efficiency as well as the robot's ability to generalize to novel tasks. Compared with an existing heuristic rule, the proposed method improves robot learning efficiency by 210%.
Abstract
Learning from Demonstration (LfD) is a framework that allows lay users to easily program robots. However, the efficiency of robot learning and the robot's ability to generalize to task variations hinges upon the quality and quantity of the provided demonstrations. Our objective is to guide human teachers to furnish more effective demonstrations, thus facilitating efficient robot learning. To achieve this, we propose to use a measure of uncertainty, namely task-related information entropy, as a criterion for suggesting informative demonstration examples to human teachers to improve their teaching skills. In a conducted experiment (N=24), an augmented reality (AR)-based guidance system was employed to train novice users to produce additional demonstrations from areas with the highest entropy within the workspace. These novice users were trained for a few trials to teach the robot a generalizable task using a limited number of demonstrations. Subsequently, the users' performance after training was assessed first on the same task (retention) and then on a novel task (transfer) without guidance. The results indicated a substantial improvement in robot learning efficiency from the teacher's demonstrations, with an improvement of up to 198% observed on the novel task. Furthermore, the proposed approach was compared to a state-of-the-art heuristic rule and found to improve robot learning efficiency by 210% compared to the heuristic rule.
On the Computational Complexities of Complex-valued Neural Networks
results: The paper presents both quantitative and asymptotic computational complexity analyses for CVNNs, which accurately estimate their computational cost. The analyses are expressed in terms of the number of real-valued multiplications, the most demanding operations. The paper also investigates the computational complexities of CVNNs proposed in related studies.
Abstract
Complex-valued neural networks (CVNNs) are nonlinear filters used in the digital signal processing of complex-domain data. Compared with real-valued neural networks~(RVNNs), CVNNs can directly handle complex-valued input and output signals due to their complex domain parameters and activation functions. With the trend toward low-power systems, computational complexity analysis has become essential for measuring an algorithm's power consumption. Therefore, this paper presents both the quantitative and asymptotic computational complexities of CVNNs. This is a crucial tool in deciding which algorithm to implement. The mathematical operations are described in terms of the number of real-valued multiplications, as these are the most demanding operations. To determine which CVNN can be implemented in a low-power system, quantitative computational complexities can be used to accurately estimate the number of floating-point operations. We have also investigated the computational complexities of CVNNs discussed in some studies presented in the literature.
To grok or not to grok: Disentangling generalization and memorization on corrupted algorithmic datasets
for: This paper aims to address the challenge of robust generalization in deep learning, specifically when the number of trainable parameters is very large.
methods: The authors use two-layer neural networks trained on modular arithmetic tasks with corrupted labels, and study the effect of regularization methods such as weight decay, dropout, and BatchNorm on the network’s ability to generalize.
results: The authors show that regularization methods can force the network to ignore corrupted data during optimization, achieving $100\%$ accuracy on the uncorrupted dataset. They also demonstrate that the effect of these regularization methods is interpretable, and that the training dynamics involve two consecutive stages: first, the network undergoes the “grokking” dynamics reaching high train and test accuracy, and second, it unlearns the memorizing representations.
Abstract
Robust generalization is a major challenge in deep learning, particularly when the number of trainable parameters is very large. In general, it is very difficult to know if the network has memorized a particular set of examples or understood the underlying rule (or both). Motivated by this challenge, we study an interpretable model where generalizing representations are understood analytically, and are easily distinguishable from the memorizing ones. Namely, we consider two-layer neural networks trained on modular arithmetic tasks where ($\xi \cdot 100\%$) of labels are corrupted (\emph{i.e.} some results of the modular operations in the training set are incorrect). We show that (i) it is possible for the network to memorize the corrupted labels \emph{and} achieve $100\%$ generalization at the same time; (ii) the memorizing neurons can be identified and pruned, lowering the accuracy on corrupted data and improving the accuracy on uncorrupted data; (iii) regularization methods such as weight decay, dropout and BatchNorm force the network to ignore the corrupted data during optimization, and achieve $100\%$ accuracy on the uncorrupted dataset; and (iv) the effect of these regularization methods is (``mechanistically'') interpretable: weight decay and dropout force all the neurons to learn generalizing representations, while BatchNorm de-amplifies the output of memorizing neurons and amplifies the output of the generalizing ones. Finally, we show that in the presence of regularization, the training dynamics involves two consecutive stages: first, the network undergoes the \emph{grokking} dynamics reaching high train \emph{and} test accuracy; second, it unlearns the memorizing representations, where train accuracy suddenly jumps from $100\%$ to $100 (1-\xi)\%$.
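A minimal sketch of the corrupted modular-arithmetic training data described above, assuming modular addition as the task, could be generated as follows; the modulus and corruption fraction are illustrative.

```python
# Build all pairs (a, b) with label (a + b) mod p, then flip a fraction xi of
# the labels to random incorrect residues, mimicking the corrupted datasets
# used in the grokking/memorization study.
import numpy as np

def modular_addition_dataset(p: int = 97, xi: float = 0.1, seed: int = 0):
    rng = np.random.default_rng(seed)
    a, b = np.meshgrid(np.arange(p), np.arange(p), indexing="ij")
    X = np.stack([a.ravel(), b.ravel()], axis=1)
    y = (X[:, 0] + X[:, 1]) % p
    y_corrupted = y.copy()
    idx = rng.choice(len(y), size=int(xi * len(y)), replace=False)
    # shift the chosen labels by a nonzero amount so they become incorrect
    y_corrupted[idx] = (y[idx] + rng.integers(1, p, size=len(idx))) % p
    return X, y, y_corrupted
```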
Demystifying the Myths and Legends of Nonconvex Convergence of SGD
results: The paper shows that an $\epsilon$-stationary point exists in the final iterates of SGD for nonconvex optimization problems, given a large enough total iteration budget $T$. It also measures the density of $\epsilon$-stationary points in the final iterates of SGD and recovers the classical $O(\frac{1}{\sqrt{T}})$ asymptotic rate under various assumptions on the objective function and the stochastic gradient bounds.
Abstract
Stochastic gradient descent (SGD) and its variants are the main workhorses for solving large-scale optimization problems with nonconvex objective functions. Although the convergence of SGDs in the (strongly) convex case is well-understood, their convergence for nonconvex functions stands on weak mathematical foundations. Most existing studies on the nonconvex convergence of SGD show the complexity results based on either the minimum of the expected gradient norm or the functional sub-optimality gap (for functions with extra structural property) by searching the entire range of iterates. Hence the last iterations of SGDs do not necessarily maintain the same complexity guarantee. This paper shows that an $\epsilon$-stationary point exists in the final iterates of SGDs, given a large enough total iteration budget, $T$, not just anywhere in the entire range of iterates -- a much stronger result than the existing one. Additionally, our analyses allow us to measure the density of the $\epsilon$-stationary points in the final iterates of SGD, and we recover the classical $O(\frac{1}{\sqrt{T}})$ asymptotic rate under various existing assumptions on the objective function and the bounds on the stochastic gradient. As a result of our analyses, we addressed certain myths and legends related to the nonconvex convergence of SGD and posed some thought-provoking questions that could set new directions for research.
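For reference, the notion of an $\epsilon$-stationary point used in such nonconvex analyses — the expected-gradient-norm criterion mentioned above — can be written as:

```latex
% An iterate x_t is an \epsilon-stationary point of a smooth nonconvex f if
\mathbb{E}\left\| \nabla f(x_t) \right\| \le \epsilon .
```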
results: The algorithm was tested on five datasets and compared with several baselines; it satisfies the PAC guarantee while producing smaller, more informative prediction sets.
Abstract
Prediction sets capture uncertainty by predicting sets of labels rather than individual labels, enabling downstream decisions to conservatively account for all plausible outcomes. Conformal inference algorithms construct prediction sets guaranteed to contain the true label with high probability. These guarantees fail to hold in the face of distribution shift, which is precisely when reliable uncertainty quantification can be most useful. We propose a novel algorithm for constructing prediction sets with PAC guarantees in the label shift setting. This method estimates the predicted probabilities of the classes in a target domain, as well as the confusion matrix, then propagates uncertainty in these estimates through a Gaussian elimination algorithm to compute confidence intervals for importance weights. Finally, it uses these intervals to construct prediction sets. We evaluate our approach on five datasets: the CIFAR-10, ChestX-Ray and Entity-13 image datasets, the tabular CDC Heart dataset, and the AGNews text dataset. Our algorithm satisfies the PAC guarantee while producing smaller, more informative, prediction sets compared to several baselines.
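The method builds confidence intervals for label-shift importance weights; a minimal sketch of the underlying point estimate (the classical confusion-matrix approach, without the uncertainty propagation the paper adds) might look as follows, with `clf` an assumed pre-trained classifier returning integer labels.

```python
# Point estimate of label-shift importance weights w(y) = p_target(y) / p_source(y),
# obtained by solving C @ p_target = q, where C is the column-normalized confusion
# matrix on labeled source data and q is the predicted-label frequency on target data.
import numpy as np

def label_shift_weights(clf, X_source, y_source, X_target, num_classes):
    pred_s = clf.predict(X_source)
    C = np.zeros((num_classes, num_classes))
    for j in range(num_classes):
        mask = y_source == j
        C[:, j] = np.bincount(pred_s[mask], minlength=num_classes) / max(mask.sum(), 1)
    q = np.bincount(clf.predict(X_target), minlength=num_classes) / len(X_target)
    p_source = np.bincount(y_source, minlength=num_classes) / len(y_source)
    p_target = np.linalg.solve(C, q)                 # solve C @ p_target = q
    return np.clip(p_target, 0, None) / p_source     # importance weights per class
```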
Cousins Of The Vendi Score: A Family Of Similarity-Based Diversity Metrics For Science And Machine Learning
paper_authors: Amey P. Pasarkar, Adji Bousso Dieng
for: The paper proposes a new family of diversity metrics for machine learning (ML), ecology, and chemistry.
methods: The paper extends the similarity-based Hill numbers, borrowing ideas from quantum statistical mechanics, to better measure diversity.
results: The properties of the new diversity metrics are studied in a controlled synthetic setting; they are then applied to improve molecular simulations via Vendi Sampling and used to better understand memorization, duplication, diversity, and sample quality in image generative models.
Abstract
Measuring diversity accurately is important for many scientific fields, including machine learning (ML), ecology, and chemistry. The Vendi Score was introduced as a generic similarity-based diversity metric that extends the Hill number of order q=1 by leveraging ideas from quantum statistical mechanics. Contrary to many diversity metrics in ecology, the Vendi Score accounts for similarity and does not require knowledge of the prevalence of the categories in the collection to be evaluated for diversity. However, the Vendi Score treats each item in a given collection with a level of sensitivity proportional to the item's prevalence. This is undesirable in settings where there is a significant imbalance in item prevalence. In this paper, we extend the other Hill numbers using similarity to provide flexibility in allocating sensitivity to rare or common items. This leads to a family of diversity metrics -- Vendi scores with different levels of sensitivity -- that can be used in a variety of applications. We study the properties of the scores in a synthetic controlled setting where the ground truth diversity is known. We then test their utility in improving molecular simulations via Vendi Sampling. Finally, we use the Vendi scores to better understand the behavior of image generative models in terms of memorization, duplication, diversity, and sample quality.
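For concreteness, a hedged sketch of a similarity-based diversity score in the spirit of the Vendi Score family is shown below; the order-$q$ generalization follows the Hill-number form, and the paper's exact definitions may differ in detail.

```python
# The order-1 Vendi Score is the exponential of the Shannon entropy of the
# eigenvalues of K / n, where K is a positive semi-definite similarity matrix
# with unit diagonal; the order-q variant below mirrors the Hill-number form.
import numpy as np

def vendi_score(K: np.ndarray, q: float = 1.0) -> float:
    n = K.shape[0]
    lam = np.linalg.eigvalsh(K / n)
    lam = np.clip(lam, 0.0, None)
    lam = lam[lam > 1e-12]
    if np.isclose(q, 1.0):
        return float(np.exp(-np.sum(lam * np.log(lam))))
    return float(np.sum(lam ** q) ** (1.0 / (1.0 - q)))
```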
Generative Flow Networks as Entropy-Regularized RL
results: The authors show how the task of learning a generative flow network can be redefined as an entropy-regularized RL problem and train GFlowNets with standard soft RL algorithms. The results show that this approach is competitive with established GFlowNet training methods and achieves practical performance on several probabilistic modeling tasks.
Abstract
The recently proposed generative flow networks (GFlowNets) are a method of training a policy to sample compositional discrete objects with probabilities proportional to a given reward via a sequence of actions. GFlowNets exploit the sequential nature of the problem, drawing parallels with reinforcement learning (RL). Our work extends the connection between RL and GFlowNets to a general case. We demonstrate how the task of learning a generative flow network can be efficiently redefined as an entropy-regularized RL problem with a specific reward and regularizer structure. Furthermore, we illustrate the practical efficiency of this reformulation by applying standard soft RL algorithms to GFlowNet training across several probabilistic modeling tasks. Contrary to previously reported results, we show that entropic RL approaches can be competitive against established GFlowNet training methods. This perspective opens a direct path for integrating reinforcement learning principles into the realm of generative flow networks.
Probabilistic Modeling of Human Teams to Infer False Beliefs
paper_authors: Paulo Soares, Adarsh Pyarelal, Kobus Barnard
for: This paper develops a probabilistic graphical model (PGM) for artificially intelligent (AI) agents to infer human beliefs during a simulated urban search and rescue (USAR) scenario.
methods: The PGM approach makes observable states and actions explicit, along with beliefs and intentions grounded by evidence about what players see and do over time. It also supports inferring the effect of interventions, which is essential if AI agents are to assist human teams.
results: The experiment manipulates players' knowledge, and the virtual Minecraft-based testbed provides several information streams, including the objects in the players' field of view. Participants use marker blocks to signal the presence or absence of victims in rooms, and one team member is given a different legend for the markers than the other two, which may mislead them about the state of the rooms, i.e., induce a false belief. Extending previous work, the authors introduce ToMCAT, an AI agent that can reason about individual and shared mental states. They find that players' behaviors are affected by what they see in their in-game field of view, their beliefs about the meaning of the markers, and their beliefs about which meaning the team decided to adopt. Moreover, ToMCAT's beliefs are consistent with the players' actions, and it infers false beliefs with accuracy significantly better than chance and comparable to human observers.
Abstract
We develop a probabilistic graphical model (PGM) for artificially intelligent (AI) agents to infer human beliefs during a simulated urban search and rescue (USAR) scenario executed in a Minecraft environment with a team of three players. The PGM approach makes observable states and actions explicit, as well as beliefs and intentions grounded by evidence about what players see and do over time. This approach also supports inferring the effect of interventions, which are vital if AI agents are to assist human teams. The experiment incorporates manipulations of players' knowledge, and the virtual Minecraft-based testbed provides access to several streams of information, including the objects in the players' field of view. The participants are equipped with a set of marker blocks that can be placed near room entrances to signal the presence or absence of victims in the rooms to their teammates. In each team, one of the members is given a different legend for the markers than the other two, which may mislead them about the state of the rooms; that is, they will hold a false belief. We extend previous works in this field by introducing ToMCAT, an AI agent that can reason about individual and shared mental states. We find that the players' behaviors are affected by what they see in their in-game field of view, their beliefs about the meaning of the markers, and their beliefs about which meaning the team decided to adopt. In addition, we show that ToMCAT's beliefs are consistent with the players' actions and that it can infer false beliefs with accuracy significantly better than chance and comparable to inferences made by human observers.
Enhancing Open-World Bacterial Raman Spectra Identification by Feature Regularization for Improved Resilience against Unknown Classes
paper_authors: Yaroslav Balytskyi, Nataliia Kalashnyk, Inna Hubenko, Alina Balytska, Kelly McNear
for:
* This study applies deep learning combined with Raman spectroscopy to accurately identify pathogenic bacteria in clinical settings.
methods:
* An ensemble of ResNet architectures combined with an attention mechanism is used, together with feature regularization, to improve model accuracy.
results:
* The model's accuracy improves to 87.8±0.1%, 1.1% higher than the best available model.
* Through feature regularization, the model identifies known pathogens from the catalog with high accuracy while effectively separating unknown samples, drastically reducing the false positive rate.
* During inference, the model effectively detects unknown bacterial species, improving the reliability of the detection results.
Abstract
The combination of Deep Learning techniques and Raman spectroscopy shows great potential offering precise and prompt identification of pathogenic bacteria in clinical settings. However, the traditional closed-set classification approaches assume that all test samples belong to one of the known pathogens, and their applicability is limited since the clinical environment is inherently unpredictable and dynamic, unknown or emerging pathogens may not be included in the available catalogs. We demonstrate that the current state-of-the-art Neural Networks identifying pathogens through Raman spectra are vulnerable to unknown inputs, resulting in an uncontrollable false positive rate. To address this issue, first, we developed a novel ensemble of ResNet architectures combined with the attention mechanism which outperforms existing closed-world methods, achieving an accuracy of $87.8 \pm 0.1\%$ compared to the best available model's accuracy of $86.7 \pm 0.4\%$. Second, through the integration of feature regularization by the Objectosphere loss function, our model achieves both high accuracy in identifying known pathogens from the catalog and effectively separates unknown samples drastically reducing the false positive rate. Finally, the proposed feature regularization method during training significantly enhances the performance of out-of-distribution detectors during the inference phase improving the reliability of the detection of unknown classes. Our novel algorithm for Raman spectroscopy enables the detection of unknown, uncatalogued, and emerging pathogens providing the flexibility to adapt to future pathogens that may emerge, and has the potential to improve the reliability of Raman-based solutions in dynamic operating environments where accuracy is critical, such as public safety applications.
results: The study found that performance differs considerably across GPUs when running neural network models on larger input images (256x256).
Abstract
With the maturity of deep learning, its use is emerging in every field. Also, as different types of GPUs are becoming more available in the markets, it creates a difficult decision for users. How can users select GPUs to achieve optimal performance for a specific task? Analysis of GPU architecture is well studied, but existing works that benchmark GPUs do not study tasks for networks with significantly larger input. In this work, we tried to differentiate the performance of different GPUs on neural network models that operate on bigger input images (256x256).
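A rough sketch of how such a comparison could be run, assuming PyTorch and a torchvision backbone; the model choice, batch size, and iteration counts are illustrative, not the benchmark configuration used in the paper.

```python
# Measure images-per-second throughput of a CNN on 256x256 inputs on the
# current CUDA device; running the same script on different GPUs gives a
# simple basis for comparison.
import time
import torch
import torchvision

def benchmark(batch_size: int = 32, iters: int = 50) -> float:
    device = torch.device("cuda")
    model = torchvision.models.resnet50().to(device).eval()
    x = torch.randn(batch_size, 3, 256, 256, device=device)
    with torch.no_grad():
        for _ in range(5):              # warm-up iterations
            model(x)
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    return batch_size * iters / elapsed  # images per second
```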
Blind quantum machine learning with quantum bipartite correlator
results: The authors validate the effectiveness of the proposed protocols through complexity and privacy analysis. These findings open up new possibilities for privacy-aware applications in the era of quantum technologies.
Abstract
Distributed quantum computing is a promising computational paradigm for performing computations that are beyond the reach of individual quantum devices. Privacy in distributed quantum computing is critical for maintaining confidentiality and protecting the data in the presence of untrusted computing nodes. In this work, we introduce novel blind quantum machine learning protocols based on the quantum bipartite correlator algorithm. Our protocols have reduced communication overhead while preserving the privacy of data from untrusted parties. We introduce robust algorithm-specific privacy-preserving mechanisms with low computational overhead that do not require complex cryptographic techniques. We then validate the effectiveness of the proposed protocols through complexity and privacy analysis. Our findings pave the way for advancements in distributed quantum computing, opening up new possibilities for privacy-aware machine learning applications in the era of quantum technologies.
Fine-Tuning Generative Models as an Inference Method for Robotic Tasks
results: The method can be applied to various deep generative models, such as autoregressive models and variational autoencoders, and performs well on robotic tasks including object shape inference from grasping, inverse kinematics calculation, and point cloud completion.
Abstract
Adaptable models could greatly benefit robotic agents operating in the real world, allowing them to deal with novel and varying conditions. While approaches such as Bayesian inference are well-studied frameworks for adapting models to evidence, we build on recent advances in deep generative models which have greatly affected many areas of robotics. Harnessing modern GPU acceleration, we investigate how to quickly adapt the sample generation of neural network models to observations in robotic tasks. We propose a simple and general method that is applicable to various deep generative models and robotic environments. The key idea is to quickly fine-tune the model by fitting it to generated samples matching the observed evidence, using the cross-entropy method. We show that our method can be applied to both autoregressive models and variational autoencoders, and demonstrate its usability in object shape inference from grasping, inverse kinematics calculation, and point cloud completion.
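A minimal sketch of the described fine-tuning loop is given below, assuming a PyTorch generative model exposing hypothetical `sample` and `log_prob` methods and a task-specific `match_score` function measuring agreement with the observed evidence.

```python
# Cross-entropy-method style fine-tuning: generate candidates, keep the elites
# that best match the observation, and fit the generative model to them.
import torch

def cem_finetune(model, observation, match_score,
                 n_samples=256, elite_frac=0.1, steps=20, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    n_elite = max(1, int(elite_frac * n_samples))
    for _ in range(steps):
        with torch.no_grad():
            x = model.sample(n_samples)                  # candidate samples
            scores = match_score(x, observation)         # agreement with evidence
            elites = x[torch.topk(scores, n_elite).indices]
        loss = -model.log_prob(elites).mean()            # fit model to elites
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```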
results: Compared with the Audio-LDM model, the proposed method achieves higher-quality and more faithful audio edits, preserving the original onsets and offsets of the audio events.
Abstract
In this paper, we explore audio-editing with non-rigid text edits. We show that the proposed editing pipeline is able to create audio edits that remain faithful to the input audio. We explore text prompts that perform addition, style transfer, and in-painting. We quantitatively and qualitatively show that the edits are able to obtain results which outperform Audio-LDM, a recently released text-prompted audio generation model. Qualitative inspection of the results points out that the edits given by our approach remain more faithful to the input audio in terms of keeping the original onsets and offsets of the audio events.
Model-agnostic variable importance for predictive uncertainty: an entropy-based approach
results: Using these methods, the paper obtains novel insights into model behaviour and shows how they can be used to measure the impact of features on the uncertainty of the predictive distribution.
Abstract
In order to trust the predictions of a machine learning algorithm, it is necessary to understand the factors that contribute to those predictions. In the case of probabilistic and uncertainty-aware models, it is necessary to understand not only the reasons for the predictions themselves, but also the model's level of confidence in those predictions. In this paper, we show how existing methods in explainability can be extended to uncertainty-aware models and how such extensions can be used to understand the sources of uncertainty in a model's predictive distribution. In particular, by adapting permutation feature importance, partial dependence plots, and individual conditional expectation plots, we demonstrate that novel insights into model behaviour may be obtained and that these methods can be used to measure the impact of features on both the entropy of the predictive distribution and the log-likelihood of the ground truth labels under that distribution. With experiments using both synthetic and real-world data, we demonstrate the utility of these approaches in understanding both the sources of uncertainty and their impact on model performance.
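As an illustration of adapting permutation feature importance to uncertainty, the following sketch scores each feature by the change it induces in the mean entropy of the predictive distribution; the model interface (`predict_proba`) is an assumption, not necessarily the one used in the paper.

```python
# Permute one feature at a time and measure the change in mean predictive
# entropy, giving a simple ranking of features by their impact on uncertainty.
import numpy as np

def entropy(p, eps=1e-12):
    return -np.sum(p * np.log(p + eps), axis=1)

def permutation_entropy_importance(model, X, n_repeats=5, seed=0):
    rng = np.random.default_rng(seed)
    base = entropy(model.predict_proba(X)).mean()
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        deltas = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])
            deltas.append(entropy(model.predict_proba(Xp)).mean() - base)
        importances[j] = np.mean(deltas)
    return importances   # impact of each feature on predictive uncertainty
```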
Generating collective counterfactual explanations in score-based classification via mathematical optimization
For: This study provides an explanation method for machine learning models used in high-stakes decision making, to help understand how a model arrives at its decisions.
Methods: The study uses counterfactual analysis, which explains an instance by indicating how it should be minimally modified so that the classifier assigns it to the desired class. It further proposes collective counterfactual analysis, which provides joint explanations for a group of records in order to detect the features that are critical for the entire dataset.
Results: Using novel mathematical optimization models, the study provides a counterfactual explanation for each instance in a group of interest while minimizing the total perturbation cost under linking constraints, and it can handle outlying records appropriately. Under some assumptions, the problem reduces to a convex quadratic linearly constrained mixed-integer optimization problem that can be solved with existing solvers. Experimental results show that the method is effective on real-world datasets.
Abstract
Due to the increasing use of Machine Learning models in high stakes decision making settings, it has become increasingly important to have tools to understand how models arrive at decisions. Assuming a trained Supervised Classification model, explanations can be obtained via counterfactual analysis: a counterfactual explanation of an instance indicates how this instance should be minimally modified so that the perturbed instance is classified in the desired class by the Machine Learning classification model. Most of the Counterfactual Analysis literature focuses on the single-instance single-counterfactual setting, in which the analysis is done for one single instance to provide one single explanation. Taking a stakeholder's perspective, in this paper we introduce the so-called collective counterfactual explanations. By means of novel Mathematical Optimization models, we provide a counterfactual explanation for each instance in a group of interest, so that the total cost of the perturbations is minimized under some linking constraints. Making the process of constructing counterfactuals collective instead of individual enables us to detect the features that are critical to the entire dataset to have the individuals classified in the desired class. Our methodology allows for some instances to be treated individually, performing the collective counterfactual analysis for a fraction of records of the group of interest. This way, outliers are identified and handled appropriately. Under some assumptions on the classifier and the space in which counterfactuals are sought, finding collective counterfactuals is reduced to solving a convex quadratic linearly constrained mixed integer optimization problem, which, for datasets of moderate size, can be solved to optimality using existing solvers. The performance of our approach is illustrated on real-world datasets, demonstrating its usefulness.
paper_authors: Olivier Sprangers, Wander Wadman, Sebastian Schelter, Maarten de Rijke
For: The paper is written for practitioners who want to improve the forecasting performance of their production forecasting systems, particularly those with millions of time series.
Methods: The paper proposes using a sparse loss function to learn a coherent forecast for millions of time series, which directly optimizes the hierarchical product and/or temporal structure. This approach eliminates the need for a post-processing step in traditional hierarchical forecasting techniques, reducing the computational cost of the prediction phase.
Results: The paper shows that the proposed sparse hierarchical loss function achieves up to 10% better performance (in terms of RMSE) on the public M5 dataset compared to the baseline loss function. In addition, the authors implement the loss function in an existing forecasting model at a large European e-commerce platform, resulting in an improved forecasting performance of 2% at the product level. Finally, the paper demonstrates an increase in forecasting performance of about 5-10% when evaluating the forecasting performance across the cross-sectional hierarchies defined.
Abstract
Existing hierarchical forecasting techniques scale poorly when the number of time series increases. We propose to learn a coherent forecast for millions of time series with a single bottom-level forecast model by using a sparse loss function that directly optimizes the hierarchical product and/or temporal structure. The benefit of our sparse hierarchical loss function is that it provides practitioners a method of producing bottom-level forecasts that are coherent to any chosen cross-sectional or temporal hierarchy. In addition, removing the need for a post-processing step as required in traditional hierarchical forecasting techniques reduces the computational cost of the prediction phase in the forecasting pipeline. On the public M5 dataset, our sparse hierarchical loss function performs up to 10% (RMSE) better compared to the baseline loss function. We implement our sparse hierarchical loss function within an existing forecasting model at bol, a large European e-commerce platform, resulting in an improved forecasting performance of 2% at the product level. Finally, we found an increase in forecasting performance of about 5-10% when evaluating the forecasting performance across the cross-sectional hierarchies that we defined. These results demonstrate the usefulness of our sparse hierarchical loss applied to a production forecasting system at a major e-commerce platform.
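A hedged sketch of evaluating a hierarchy-aware loss directly from bottom-level forecasts through a sparse aggregation matrix is shown below; the paper's loss additionally handles temporal hierarchies and is used during model training, which this sketch omits.

```python
# Aggregate bottom-level forecasts to every level of a cross-sectional
# hierarchy with a sparse summing matrix S (rows = all series, columns =
# bottom-level series) and compute a mean squared error over all levels.
import numpy as np
from scipy import sparse

def sparse_hierarchical_mse(S: sparse.csr_matrix,
                            y_bottom_pred: np.ndarray,
                            y_bottom_true: np.ndarray) -> float:
    agg_pred = S @ y_bottom_pred        # shape: (n_series_total, horizon)
    agg_true = S @ y_bottom_true
    return float(np.mean((agg_pred - agg_true) ** 2))
```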
DCSI – An improved measure of cluster separability based on separation and connectedness
methods: The study develops a new measure, the density cluster separability index (DCSI), to quantify two aspects of separability: between-class separation and within-class connectedness.
results: Experiments show that DCSI correlates strongly with the adjusted Rand index (ARI) of DBSCAN, but it is less robust on multi-class datasets with overlapping classes. In addition, DCSI can correctly identify touching or overlapping classes that do not form meaningful clusters.
Abstract
Whether class labels in a given data set correspond to meaningful clusters is crucial for the evaluation of clustering algorithms using real-world data sets. This property can be quantified by separability measures. A review of the existing literature shows that neither classification-based complexity measures nor cluster validity indices (CVIs) adequately incorporate the central aspects of separability for density-based clustering: between-class separation and within-class connectedness. A newly developed measure (density cluster separability index, DCSI) aims to quantify these two characteristics and can also be used as a CVI. Extensive experiments on synthetic data indicate that DCSI correlates strongly with the performance of DBSCAN measured via the adjusted rand index (ARI) but lacks robustness when it comes to multi-class data sets with overlapping classes that are ill-suited for density-based hard clustering. Detailed evaluation on frequently used real-world data sets shows that DCSI can correctly identify touching or overlapping classes that do not form meaningful clusters.
Detection and Evaluation of bias-inducing Features in Machine learning
results: By applying this approach to four commonly used datasets, the authors successfully identify the bias-inducing features of machine learning models and show that the method helps domain experts detect and address bias.
Abstract
The cause-to-effect analysis can help us decompose all the likely causes of a problem, such as an undesirable business situation or unintended harm to the individual(s). This implies that we can identify how the problems are inherited, rank the causes to help prioritize fixes, simplify a complex problem and visualize them. In the context of machine learning (ML), one can use cause-to-effect analysis to understand the reason for the biased behavior of the system. For example, we can examine the root causes of biases by checking each feature for a potential cause of bias in the model. To approach this, one can apply small changes to a given feature or a pair of features in the data, following some guidelines and observing how it impacts the decision made by the model (i.e., model prediction). Therefore, we can use cause-to-effect analysis to identify the potential bias-inducing features, even when these features are originally are unknown. This is important since most current methods require a pre-identification of sensitive features for bias assessment and can actually miss other relevant bias-inducing features, which is why systematic identification of such features is necessary. Moreover, it often occurs that to achieve an equitable outcome, one has to take into account sensitive features in the model decision. Therefore, it should be up to the domain experts to decide based on their knowledge of the context of a decision whether bias induced by specific features is acceptable or not. In this study, we propose an approach for systematically identifying all bias-inducing features of a model to help support the decision-making of domain experts. We evaluated our technique using four well-known datasets to showcase how our contribution can help spearhead the standard procedure when developing, testing, maintaining, and deploying fair/equitable machine learning systems.
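One simple instance of the feature-perturbation probing described above is to perturb one feature at a time and record how often the model's decision flips; the permutation-based perturbation below is an illustrative choice, not necessarily the guideline-driven changes used in the paper.

```python
# Rank features by the decision-flip rate they induce when perturbed, as a
# crude proxy for how strongly each feature drives (and may bias) predictions.
import numpy as np

def decision_flip_rate(model, X, n_repeats=5, seed=0):
    rng = np.random.default_rng(seed)
    base_pred = model.predict(X)
    rates = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        flips = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])
            flips.append(np.mean(model.predict(Xp) != base_pred))
        rates[j] = np.mean(flips)
    return rates   # higher flip rate -> feature has more influence on decisions
```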
Differentiable Vertex Fitting for Jet Flavour Tagging
paper_authors: Rachel E. C. Smith, Inês Ochoa, Rúben Inácio, Jonathan Shoemaker, Michael Kagan
for: Jet flavour tagging in particle physics.
methods: A differentiable secondary vertex fitting algorithm that can be integrated into neural networks.
results: Integrating differentiable secondary vertex fitting into larger transformer-based models improves heavy flavour jet classification.
Abstract
We propose a differentiable vertex fitting algorithm that can be used for secondary vertex fitting, and that can be seamlessly integrated into neural networks for jet flavour tagging. Vertex fitting is formulated as an optimization problem where gradients of the optimized solution vertex are defined through implicit differentiation and can be passed to upstream or downstream neural network components for network training. More broadly, this is an application of differentiable programming to integrate physics knowledge into neural network models in high energy physics. We demonstrate how differentiable secondary vertex fitting can be integrated into larger transformer-based models for flavour tagging and improve heavy flavour jet classification.
A Theoretical Approach to Characterize the Accuracy-Fairness Trade-off Pareto Frontier
results: Experiments show that there is a trade-off between accuracy and fairness, and that this trade-off can be characterized into categories depending on attribute types and data distributions. The study also finds that a two-step streamlined approach can eliminate the trade-off.
Abstract
While the accuracy-fairness trade-off has been frequently observed in the literature of fair machine learning, rigorous theoretical analyses have been scarce. To demystify this long-standing challenge, this work seeks to develop a theoretical framework by characterizing the shape of the accuracy-fairness trade-off Pareto frontier (FairFrontier), determined by a set of all optimal Pareto classifiers that no other classifiers can dominate. Specifically, we first demonstrate the existence of the trade-off in real-world scenarios and then propose four potential categories to characterize the important properties of the accuracy-fairness Pareto frontier. For each category, we identify the necessary conditions that lead to corresponding trade-offs. Experimental results on synthetic data suggest insightful findings of the proposed framework: (1) When sensitive attributes can be fully interpreted by non-sensitive attributes, FairFrontier is mostly continuous. (2) Accuracy can suffer a \textit{sharp} decline when over-pursuing fairness. (3) Eliminate the trade-off via a two-step streamlined approach. The proposed research enables an in-depth understanding of the accuracy-fairness trade-off, pushing current fair machine-learning research to a new frontier.
Conditional Density Estimations from Privacy-Protected Data
methods: The paper uses neural conditional density estimators as a flexible family of distributions to approximate the posterior distribution of the model parameters.
results: Experiments and analysis show that the proposed method enables valid statistical inference on privacy-protected datasets while protecting individual privacy.
Abstract
Many modern statistical analysis and machine learning applications require training models on sensitive user data. Differential privacy provides a formal guarantee that individual-level information about users does not leak. In this framework, randomized algorithms inject calibrated noise into the confidential data, resulting in privacy-protected datasets or queries. However, restricting access to only the privatized data during statistical analysis makes it computationally challenging to perform valid inferences on parameters underlying the confidential data. In this work, we propose simulation-based inference methods from privacy-protected datasets. Specifically, we use neural conditional density estimators as a flexible family of distributions to approximate the posterior distribution of model parameters given the observed private query results. We illustrate our methods on discrete time-series data under an infectious disease model and on ordinary linear regression models. Illustrating the privacy-utility trade-off, our experiments and analysis demonstrate the necessity and feasibility of designing valid statistical inference procedures to correct for biases introduced by the privacy-protection mechanisms.
results: Experimental results show that the proposed method outperforms Tacotron 2.
Abstract
Recently there has been a lot of interest in non-autoregressive (non-AR) models for speech synthesis, such as FastSpeech 2 and diffusion models. Unlike AR models, these models do not have autoregressive dependencies among outputs which makes inference efficient. This paper expands the range of available non-AR models with another member called energy-based models (EBMs). The paper describes how noise contrastive estimation, which relies on the comparison between positive and negative samples, can be used to train EBMs. It proposes a number of strategies for generating effective negative samples, including using high-performing AR models. It also describes how sampling from EBMs can be performed using Langevin Markov Chain Monte-Carlo (MCMC). The use of Langevin MCMC enables to draw connections between EBMs and currently popular diffusion models. Experiments on LJSpeech dataset show that the proposed approach offers improvements over Tacotron 2.
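A minimal sketch of Langevin MCMC sampling from an energy-based model is shown below, assuming a PyTorch `energy` function that returns a scalar energy per sample; the step size and number of steps are illustrative.

```python
# Unadjusted Langevin dynamics: gradient descent on the energy plus Gaussian
# noise, which (for small steps) approximately samples from exp(-energy).
import torch

def langevin_sample(energy, x_init, n_steps=100, step_size=1e-2):
    x = x_init.clone().requires_grad_(True)
    for _ in range(n_steps):
        e = energy(x).sum()
        grad, = torch.autograd.grad(e, x)
        noise = torch.randn_like(x)
        with torch.no_grad():
            x = x - 0.5 * step_size * grad + (step_size ** 0.5) * noise
        x.requires_grad_(True)
    return x.detach()
```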
Discretize Relaxed Solution of Spectral Clustering via a Non-Heuristic Algorithm
results: Experiments show that the method has a clear advantage on the discretization problem and better preserves the optimization objective of the original problem.
Abstract
Spectral clustering and its extensions usually consist of two steps: (1) constructing a graph and computing the relaxed solution; (2) discretizing relaxed solutions. Although the former has been extensively investigated, the discretization techniques are mainly heuristic methods, e.g., k-means, spectral rotation. Unfortunately, the goal of the existing methods is not to find a discrete solution that minimizes the original objective. In other words, the primary drawback is the neglect of the original objective when computing the discrete solution. Inspired by the first-order optimization algorithms, we propose to develop a first-order term to bridge the original problem and discretization algorithm, which is the first non-heuristic to the best of our knowledge. Since the non-heuristic method is aware of the original graph cut problem, the final discrete solution is more reliable and achieves the preferable loss value. We also theoretically show that the continuous optimum is beneficial to discretization algorithms though simply finding its closest discrete solution is an existing heuristic algorithm which is also unreliable. Sufficient experiments significantly show the superiority of our method.
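For context, the conventional two-step pipeline that the paper improves upon — a relaxed spectral solution followed by heuristic k-means discretization — can be sketched as follows; the paper replaces the heuristic second step with an objective-aware, non-heuristic discretization.

```python
# Baseline spectral clustering: eigenvectors of the normalized Laplacian give
# the relaxed solution, which is then discretized heuristically with k-means.
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_clustering_baseline(W: np.ndarray, k: int, seed: int = 0):
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L_sym = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt   # normalized Laplacian
    _, vecs = eigh(L_sym, subset_by_index=[0, k - 1])      # relaxed solution
    U = vecs / np.maximum(np.linalg.norm(vecs, axis=1, keepdims=True), 1e-12)
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(U)
```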
TabuLa: Harnessing Language Models for Tabular Data Synthesis
paper_authors: Zilong Zhao, Robert Birke, Lydia Chen
for: This paper focuses on the research area of tabular data synthesis, specifically exploring the use of large language models (LLMs) to generate realistic tabular data.
methods: The proposed method, called Tabula, is based on the language model structure and utilizes a token sequence compression strategy to reduce training time while maintaining synthetic data quality.
results: The paper demonstrates the limitations of using pre-trained language models for tabular data synthesis and proposes a dedicated foundational model tailored specifically for this task. Additionally, the proposed method significantly reduces training time while achieving better synthetic data utility compared to current state-of-the-art algorithms.
Abstract
Given the ubiquitous use of tabular data in industries and the growing concerns in data privacy and security, tabular data synthesis emerges as a critical research area. The recent state-of-the-art methods show that large language models (LLMs) can be adopted to generate realistic tabular data. As LLMs pre-process tabular data as full text, they have the advantage of avoiding the curse of dimensionality associated with one-hot encoding high-dimensional data. However, their long training time and limited re-usability on new tasks prevent them from replacing exiting tabular generative models. In this paper, we propose Tabula, a tabular data synthesizer based on the language model structure. Through Tabula, we demonstrate the inherent limitation of employing pre-trained language models designed for natural language processing (NLP) in the context of tabular data synthesis. Our investigation delves into the development of a dedicated foundational model tailored specifically for tabular data synthesis. Additionally, we propose a token sequence compression strategy to significantly reduce training time while preserving the quality of synthetic data. Extensive experiments on six datasets demonstrate that using a language model structure without loading the well-trained model weights yields a better starting model for tabular data synthesis. Moreover, the Tabula model, previously trained on other tabular data, serves as an excellent foundation model for new tabular data synthesis tasks. Additionally, the token sequence compression method substantially reduces the model's training time. Results show that Tabula averagely reduces 46.2% training time per epoch comparing to current LLMs-based state-of-the-art algorithm and consistently achieves even higher synthetic data utility.
results: Experimental results show that the method approximates target distributions better than other manifold flow models in most cases and produces a more compact and disentangled latent representation.
Abstract
Manifold learning flows are a class of generative modelling techniques that assume a low-dimensional manifold description of the data. The embedding of such a manifold into the high-dimensional space of the data is achieved via learnable invertible transformations. Therefore, once the manifold is properly aligned via a reconstruction loss, the probability density is tractable on the manifold and maximum likelihood can be used to optimize the network parameters. Naturally, the lower-dimensional representation of the data requires an injective-mapping. Recent approaches were able to enforce that the density aligns with the modelled manifold, while efficiently calculating the density volume-change term when embedding to the higher-dimensional space. However, unless the injective-mapping is analytically predefined, the learned manifold is not necessarily an efficient representation of the data. Namely, the latent dimensions of such models frequently learn an entangled intrinsic basis, with degenerate information being stored in each dimension. Alternatively, if a locally orthogonal and/or sparse basis is to be learned, here coined canonical intrinsic basis, it can serve in learning a more compact latent space representation. Toward this end, we propose a canonical manifold learning flow method, where a novel optimization objective enforces the transformation matrix to have few prominent and non-degenerate basis functions. We demonstrate that by minimizing the off-diagonal manifold metric elements $\ell_1$-norm, we can achieve such a basis, which is simultaneously sparse and/or orthogonal. Canonical manifold flow yields a more efficient use of the latent space, automatically generating fewer prominent and distinct dimensions to represent data, and a better approximation of target distributions than other manifold flow methods in most experiments we conducted, resulting in lower FID scores.
Learn from the Past: A Proxy based Adversarial Defense Framework to Boost Robustness
For: This paper focuses on improving the robustness of deep learning models against adversarial attacks, specifically by introducing a new framework called LAST (Learn from the Past) that utilizes historical information to defend against parameter-oriented attacks.
Methods: The paper introduces a two-stage update rule for the target model, which incorporates prior information from the historical state of the model to improve defense against adversarial attacks. Additionally, the paper proposes a Self Distillation (SD) based defense objective to constrain the update process of the proxy model without relying on larger teacher models.
Results: The paper demonstrates significant performance enhancements in improving robust accuracy (RA) across various datasets, backbones, and attack modalities, with improvements of up to 9.2% and 20.5% on the CIFAR10 and CIFAR100 datasets, respectively. The paper also shows that the proposed method can improve training stability and reduce catastrophic overfitting issues.
Abstract
In light of the vulnerability of deep learning models to adversarial samples and the ensuing security issues, a range of methods, including Adversarial Training (AT) as a prominent representative, aimed at enhancing model robustness against various adversarial attacks, have seen rapid development. However, existing methods essentially assist the current state of target model to defend against parameter-oriented adversarial attacks with explicit or implicit computation burdens, which also suffers from unstable convergence behavior due to inconsistency of optimization trajectories. Diverging from previous work, this paper reconsiders the update rule of target model and corresponding deficiency to defend based on its current state. By introducing the historical state of the target model as a proxy, which is endowed with much prior information for defense, we formulate a two-stage update rule, resulting in a general adversarial defense framework, which we refer to as `LAST' ({\bf L}earn from the P{\bf ast}). Besides, we devise a Self Distillation (SD) based defense objective to constrain the update process of the proxy model without the introduction of larger teacher models. Experimentally, we demonstrate consistent and significant performance enhancements by refining a series of single-step and multi-step AT methods (e.g., up to $\bf 9.2\%$ and $\bf 20.5\%$ improvement of Robust Accuracy (RA) on CIFAR10 and CIFAR100 datasets, respectively) across various datasets, backbones and attack modalities, and validate its ability to enhance training stability and ameliorate catastrophic overfitting issues meanwhile.
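One way to realize the two-stage idea is sketched below, under our own assumptions rather than the paper's exact formulation: the proxy holds an exponential moving average of the target model's historical parameters, the target is trained on adversarial examples with a self-distillation term toward the proxy, and the proxy is then refreshed from the updated target. All hyper-parameters and the generation of `x_adv` are placeholders.

```python
import copy
import torch
import torch.nn.functional as F

def last_style_step(model, proxy, x_adv, y, optimizer, ema=0.99, sd_weight=1.0):
    """One illustrative training step: adversarial cross-entropy on the target
    model plus a self-distillation term keeping its predictions close to a
    proxy built from the model's historical (EMA) parameters."""
    logits = model(x_adv)
    with torch.no_grad():
        proxy_logits = proxy(x_adv)
    loss = F.cross_entropy(logits, y) + sd_weight * F.kl_div(
        F.log_softmax(logits, dim=1), F.softmax(proxy_logits, dim=1),
        reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Second stage: refresh the proxy with the historical state of the target.
    with torch.no_grad():
        for p_proxy, p in zip(proxy.parameters(), model.parameters()):
            p_proxy.mul_(ema).add_(p, alpha=1.0 - ema)
    return loss.item()

# The proxy starts as a copy of the model and then tracks its history:
#   proxy = copy.deepcopy(model)
```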
On the Optimization and Generalization of Multi-head Attention
for: investigate the potential optimization and generalization advantages of using multiple attention heads in Transformer’s core mechanism
methods: derive convergence and generalization guarantees for gradient-descent training of a single-layer multi-head self-attention model
results: demonstrate that the conditions for realizability hold for a simple tokenized-mixture model, and expect the analysis can be extended to various data-model and architecture variations.
Abstract
The training and generalization dynamics of the Transformer's core mechanism, namely the Attention mechanism, remain under-explored. Besides, existing analyses primarily focus on single-head attention. Inspired by the demonstrated benefits of overparameterization when training fully-connected networks, we investigate the potential optimization and generalization advantages of using multiple attention heads. Towards this goal, we derive convergence and generalization guarantees for gradient-descent training of a single-layer multi-head self-attention model, under a suitable realizability condition on the data. We then establish primitive conditions on the initialization that ensure realizability holds. Finally, we demonstrate that these conditions are satisfied for a simple tokenized-mixture model. We expect the analysis can be extended to various data-model and architecture variations.
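For reference, a single-layer multi-head self-attention classifier of the kind analysed here can be written in a few lines. This is a generic sketch: the dimensions, head count and mean-pooling readout are our own choices, and the tokenized-mixture data model is not reproduced.

```python
import torch
import torch.nn as nn

class SingleLayerMultiHeadAttention(nn.Module):
    """Minimal single-layer multi-head self-attention model followed by a
    linear readout over the pooled token representations."""
    def __init__(self, dim, num_heads, num_classes):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.readout = nn.Linear(dim, num_classes)

    def forward(self, tokens):                    # tokens: (batch, seq, dim)
        mixed, _ = self.attn(tokens, tokens, tokens)
        return self.readout(mixed.mean(dim=1))    # pool tokens, then classify

model = SingleLayerMultiHeadAttention(dim=16, num_heads=4, num_classes=2)
logits = model(torch.randn(8, 10, 16))
print(logits.shape)   # torch.Size([8, 2])
```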
Neural networks for insurance pricing with frequency and severity data: a benchmark study from data preprocessing to technical tariff
results: The study finds that CANNs achieve the best accuracy on insurance data sets with multiple types of input features, and that embedding categorical variables into the neural network with autoencoders can further improve performance. In addition, global surrogate models are constructed so that the neural frequency and severity models can be translated into GLMs, enabling the generation of a technical tariff table.
Abstract
Insurers usually turn to generalized linear models for modelling claim frequency and severity data. Due to their success in other fields, machine learning techniques are gaining popularity within the actuarial toolbox. Our paper contributes to the literature on frequency-severity insurance pricing with machine learning via deep learning structures. We present a benchmark study on four insurance data sets with frequency and severity targets in the presence of multiple types of input features. We compare in detail the performance of: a generalized linear model on binned input data, a gradient-boosted tree model, a feed-forward neural network (FFNN), and the combined actuarial neural network (CANN). Our CANNs combine a baseline prediction established with a GLM and GBM, respectively, with a neural network correction. We explain the data preprocessing steps with specific focus on the multiple types of input features typically present in tabular insurance data sets, such as postal codes, numeric and categorical covariates. Autoencoders are used to embed the categorical variables into the neural network and we explore their potential advantages in a frequency-severity setting. Finally, we construct global surrogate models for the neural nets' frequency and severity models. These surrogates enable the translation of the essential insights captured by the FFNNs or CANNs to GLMs. As such, a technical tariff table results that can easily be deployed in practice.
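To illustrate the CANN idea, the following sketch (ours, not the authors' code) treats the log of the GLM prediction as a fixed offset and lets a small network learn a multiplicative correction on top of it, trained with a Poisson deviance loss as is typical for claim frequency. Layer sizes and activations are placeholders.

```python
import torch
import torch.nn as nn

class CANNFrequency(nn.Module):
    """Illustrative Combined Actuarial Neural Network for claim frequency: the
    (log) GLM prediction enters as a skip connection / offset and the network
    learns an additive correction on the log scale."""
    def __init__(self, n_features, hidden=32):
        super().__init__()
        self.correction = nn.Sequential(
            nn.Linear(n_features, hidden), nn.Tanh(),
            nn.Linear(hidden, 1))

    def forward(self, x, log_glm_pred):
        # expected frequency = exp(log GLM baseline + neural correction)
        return torch.exp(log_glm_pred + self.correction(x).squeeze(-1))

def poisson_deviance(y, mu, eps=1e-8):
    """Poisson deviance loss, commonly used for claim-frequency models."""
    return torch.mean(2 * (y * torch.log((y + eps) / mu) - (y - mu)))
```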
STANLEY: Stochastic Gradient Anisotropic Langevin Dynamics for Learning Energy-Based Models
methods: The method builds on a Markov chain Monte Carlo (MCMC) algorithm that updates the negative samples using an anisotropic step size and gradient information.
results: Experiments show that the STANLEY method samples high-dimensional data more effectively and produces higher-quality samples.
Abstract
We propose in this paper, STANLEY, a STochastic gradient ANisotropic LangEvin dYnamics, for sampling high dimensional data. With the growing efficacy and potential of Energy-Based modeling, also known as non-normalized probabilistic modeling, for modeling a generative process of different natures of high dimensional data observations, we present an end-to-end learning algorithm for Energy-Based models (EBM) with the purpose of improving the quality of the resulting sampled data points. While the unknown normalizing constant of EBMs makes the training procedure intractable, resorting to Markov Chain Monte Carlo (MCMC) is in general a viable option. Realizing what MCMC entails for the EBM training, we propose in this paper, a novel high dimensional sampling method, based on an anisotropic stepsize and a gradient-informed covariance matrix, embedded into a discretized Langevin diffusion. We motivate the necessity for an anisotropic update of the negative samples in the Markov Chain by the nonlinearity of the backbone of the EBM, here a Convolutional Neural Network. Our resulting method, namely STANLEY, is an optimization algorithm for training Energy-Based models via our newly introduced MCMC method. We provide a theoretical understanding of our sampling scheme by proving that the sampler leads to a geometrically uniformly ergodic Markov Chain. Several image generation experiments are provided in our paper to show the effectiveness of our method.
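The flavour of the update can be conveyed with a short sketch (our own simplification): a discretized Langevin step preconditioned by a covariance matrix. How `cov_sqrt` is built from gradient information is specific to STANLEY and not reproduced here.

```python
import torch

def anisotropic_langevin_step(x, energy_fn, step, cov_sqrt):
    """One illustrative preconditioned Langevin update:
    x' = x - 0.5*step * grad E(x) @ C + sqrt(step) * noise @ C^{1/2},
    where C = cov_sqrt @ cov_sqrt.T is a gradient-informed covariance."""
    x = x.detach().requires_grad_(True)
    energy = energy_fn(x).sum()
    grad, = torch.autograd.grad(energy, x)
    cov = cov_sqrt @ cov_sqrt.T
    noise = torch.randn_like(x)
    return (x - 0.5 * step * grad @ cov.T
            + (step ** 0.5) * noise @ cov_sqrt.T).detach()

# Toy usage on a quadratic energy in 3 dimensions.
energy_fn = lambda x: 0.5 * (x ** 2).sum(dim=-1)
x = torch.randn(16, 3)
x = anisotropic_langevin_step(x, energy_fn, step=0.01, cov_sqrt=torch.eye(3))
```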
SecurityNet: Assessing Machine Learning Vulnerabilities on Public Models
paper_authors: Boyang Zhang, Zheng Li, Ziqing Yang, Xinlei He, Michael Backes, Mario Fritz, Yang Zhang
for: This study aims to comprehensively characterize the security and privacy vulnerabilities of machine learning models and to evaluate them in realistic settings.
methods: The study evaluates attacks and defenses on machine learning models using publicly available model weights from the Internet (public models). The authors build a database, SecurityNet, containing 910 annotated image classification models, and analyze the effectiveness of several representative attacks/defenses, including model stealing attacks, membership inference attacks, and backdoor detection, on these public models.
results: The evaluation shows that the effectiveness of these attacks/defenses can differ significantly on public models compared with self-trained models. SecurityNet is shared with the research community, and the authors advocate experimenting on public models to better demonstrate the effectiveness of proposed methods in future work.
Abstract
While advanced machine learning (ML) models are deployed in numerous real-world applications, previous works demonstrate these models have security and privacy vulnerabilities. Various empirical research has been done in this field. However, most of the experiments are performed on target ML models trained by the security researchers themselves. Due to the high computational resource requirement for training advanced models with complex architectures, researchers generally choose to train a few target models using relatively simple architectures on typical experiment datasets. We argue that to understand ML models' vulnerabilities comprehensively, experiments should be performed on a large set of models trained with various purposes (not just the purpose of evaluating ML attacks and defenses). To this end, we propose using publicly available models with weights from the Internet (public models) for evaluating attacks and defenses on ML models. We establish a database, namely SecurityNet, containing 910 annotated image classification models. We then analyze the effectiveness of several representative attacks/defenses, including model stealing attacks, membership inference attacks, and backdoor detection on these public models. Our evaluation empirically shows the performance of these attacks/defenses can vary significantly on public models compared to self-trained models. We share SecurityNet with the research community. and advocate researchers to perform experiments on public models to better demonstrate their proposed methods' effectiveness in the future.
Knowledge from Uncertainty in Evidential Deep Learning
paper_authors: Cai Davies, Marc Roig Vilamala, Alun D. Preece, Federico Cerutti, Lance M. Kaplan, Supriyo Chakraborty
for: This paper investigates the uncertainty signal that emerges in deep learning, specifically in Evidential Deep Learning (EDL), an approach that provides epistemic uncertainty about the current test sample.
methods: The paper uses the Dirichlet strength to capture the evidential signal in EDL and studies it empirically on computer vision models and bidirectional-encoder large language models.
results: The study finds that in some cases EDL's `evidential signal' can discriminate between classes, particularly when using large language models, and examines the effect of the KL regularization term on uncertainty. Compared with other Dirichlet-based approaches, EDL's coupling of uncertainty is attributed to the absence of out-of-distribution samples during training.
Abstract
This work reveals an evidential signal that emerges from the uncertainty value in Evidential Deep Learning (EDL). EDL is one example of a class of uncertainty-aware deep learning approaches designed to provide confidence (or epistemic uncertainty) about the current test sample. In particular for computer vision and bidirectional encoder large language models, the `evidential signal' arising from the Dirichlet strength in EDL can, in some cases, discriminate between classes, which is particularly strong when using large language models. We hypothesise that the KL regularisation term causes EDL to couple aleatoric and epistemic uncertainty. In this paper, we empirically investigate the correlations between misclassification and evaluated uncertainty, and show that EDL's `evidential signal' is due to misclassification bias. We critically evaluate EDL with other Dirichlet-based approaches, namely Generative Evidential Neural Networks (EDL-GEN) and Prior Networks, and show theoretically and empirically the differences between these loss functions. We conclude that EDL's coupling of uncertainty arises from these differences due to the use (or lack) of out-of-distribution samples during training.
Gradient Descent Fails to Learn High-frequency Functions and Modular Arithmetic
paper_authors: Rustem Takhanov, Maxat Tezekbayev, Artur Pak, Arman Bolatov, Zhenisbek Assylbekov
for: This work studies the limitations and challenges of training high-frequency periodic functions and modular multiplication with gradient-based methods.
methods: The study analyzes the variance of the gradient to characterize the difficulty of learning high-frequency periodic functions and modular multiplication.
results: The analysis shows that when the frequency or the prime base $p$ is large, the variance of the gradient is negligibly small, which makes gradient-based training of such functions unsuccessful.
Abstract
Classes of target functions containing a large number of approximately orthogonal elements are known to be hard to learn by the Statistical Query algorithms. Recently this classical fact re-emerged in a theory of gradient-based optimization of neural networks. In the novel framework, the hardness of a class is usually quantified by the variance of the gradient with respect to a random choice of a target function. A set of functions of the form $x\to ax \bmod p$, where $a$ is taken from ${\mathbb Z}_p$, has attracted some attention from deep learning theorists and cryptographers recently. This class can be understood as a subset of $p$-periodic functions on ${\mathbb Z}$ and is tightly connected with a class of high-frequency periodic functions on the real line. We present a mathematical analysis of limitations and challenges associated with using gradient-based learning techniques to train a high-frequency periodic function or modular multiplication from examples. We highlight that the variance of the gradient is negligibly small in both cases when either a frequency or the prime base $p$ is large. This in turn prevents such a learning algorithm from being successful.
Inverse Renormalization Group of Disordered Systems
for: The paper studies the three-dimensional Edwards-Anderson spin glass model on lattice volumes that have not yet been accessed by supercomputers or large-scale simulations.
methods: Inverse renormalization group transformations combined with machine learning algorithms are used to construct approximate configurations for lattices of increasing volume.
results: Two critical exponents are extracted from the rescaled lattices.
Abstract
We propose inverse renormalization group transformations to construct approximate configurations for lattice volumes that have not yet been accessed by supercomputers or large-scale simulations in the study of spin glasses. Specifically, starting from lattices of volume $V=8^{3}$ in the case of the three-dimensional Edwards-Anderson model we employ machine learning algorithms to construct rescaled lattices up to $V'=128^{3}$, which we utilize to extract two critical exponents. We conclude by discussing how to incorporate numerical exactness within inverse renormalization group approaches of disordered systems, thus opening up the opportunity to explore a sustainable and energy-efficient generation of exact configurations for increasing lattice volumes without the use of dedicated supercomputers.
An Improved Metarounding Algorithm via Frank-Wolfe
for: linear optimization over combinatorial classes
methods: metarounding algorithm and relax-based approximation algorithm
results: much more efficient in both theoretical and practical aspects
Abstract
Metarounding is an approach to convert an approximation algorithm for linear optimization over some combinatorial classes to an online linear optimization algorithm for the same class. We propose a new metarounding algorithm under a natural assumption that a relax-based approximation algorithm exists for the combinatorial class. Our algorithm is much more efficient in both theoretical and practical aspects.
How a student becomes a teacher: learning and forgetting through Spectral methods
paper_authors: Lorenzo Giambagli, Lorenzo Buffoni, Lorenzo Chicchi, Duccio Fanelli
for: The study uses the teacher-student paradigm as a metaphor for real-life tuition, which is particularly relevant when the student network is overparameterized relative to the teacher, since the student may only need a sub-portion of its weights to handle the task.
methods: The study proposes a new optimization scheme that computes gradients with respect to a spectral representation of the linear transfer of information between layers; compared with standard training algorithms, the added computational cost and complexity are negligible.
results: After training the student network, a stable student substructure can be isolated that mirrors the teacher in terms of computing neurons, path distribution and topological attributes. Pruning the student's unimportant nodes, following a ranking that reflects the optimized eigenvalues, causes no degradation in the recorded performance above a threshold corresponding to the effective teacher size; this behavior can be described as a genuine second-order phase transition with universality traits.
Abstract
In theoretical ML, the teacher-student paradigm is often employed as an effective metaphor for real-life tuition. The above scheme proves particularly relevant when the student network is overparameterized as compared to the teacher network. Under these operating conditions, it is tempting to speculate that the student ability to handle the given task could be eventually stored in a sub-portion of the whole network. This latter should be to some extent reminiscent of the frozen teacher structure, according to suitable metrics, while being approximately invariant across different architectures of the student candidate network. Unfortunately, state-of-the-art conventional learning techniques could not help in identifying the existence of such an invariant subnetwork, due to the inherent degree of non-convexity that characterizes the examined problem. In this work, we take a leap forward by proposing a radically different optimization scheme which builds on a spectral representation of the linear transfer of information between layers. The gradient is hence calculated with respect to both eigenvalues and eigenvectors with negligible increase in terms of computational and complexity load, as compared to standard training algorithms. Working in this framework, we could isolate a stable student substructure, that mirrors the true complexity of the teacher in terms of computing neurons, path distribution and topological attributes. When pruning unimportant nodes of the trained student, as follows a ranking that reflects the optimized eigenvalues, no degradation in the recorded performance is seen above a threshold that corresponds to the effective teacher size. The observed behavior can be pictured as a genuine second-order phase transition that bears universality traits.
results: Experiments on simulated and real data demonstrate the advantages and practical applicability of the proposed approach.
Abstract
The key challenge underlying machine learning is generalisation to new data. This work studies generalisation for datasets consisting of related tasks that may differ in causal mechanisms. For example, observational medical data for complex diseases suffers from heterogeneity in causal mechanisms of disease across patients, creating challenges for machine learning algorithms that need to generalise to new patients outside of the training dataset. Common approaches for learning supervised models with heterogeneous datasets include learning a global model for the entire dataset, learning local models for each tasks' data, or utilising hierarchical, meta-learning and multi-task learning approaches to learn how to generalise from data pooled across multiple tasks. In this paper we propose causal similarity-based hierarchical Bayesian models to improve generalisation to new tasks by learning how to pool data from training tasks with similar causal mechanisms. We apply this general modelling principle to Bayesian neural networks and compare a variety of methods for estimating causal task similarity (for both known and unknown causal models). We demonstrate the benefits of our approach and applicability to real world problems through a range of experiments on simulated and real data.
Julearn: an easy-to-use library for leakage-free evaluation and inspection of ML models
results: Through three real-world examples, the study shows that julearn makes it easy to implement previously published research projects and offers features that help researchers design and evaluate complex ML pipelines.
Abstract
The fast-paced development of machine learning (ML) methods coupled with its increasing adoption in research poses challenges for researchers without extensive training in ML. In neuroscience, for example, ML can help understand brain-behavior relationships, diagnose diseases, and develop biomarkers using various data sources like magnetic resonance imaging and electroencephalography. The primary objective of ML is to build models that can make accurate predictions on unseen data. Researchers aim to prove the existence of such generalizable models by evaluating performance using techniques such as cross-validation (CV), which uses systematic subsampling to estimate the generalization performance. Choosing a CV scheme and evaluating an ML pipeline can be challenging and, if used improperly, can lead to overestimated results and incorrect interpretations. We created julearn, an open-source Python library, that allow researchers to design and evaluate complex ML pipelines without encountering in common pitfalls. In this manuscript, we present the rationale behind julearn's design, its core features, and showcase three examples of previously-published research projects that can be easily implemented using this novel library. Julearn aims to simplify the entry into the ML world by providing an easy-to-use environment with built in guards against some of the most common ML pitfalls. With its design, unique features and simple interface, it poses as a useful Python-based library for research projects.
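As a generic illustration of the leakage pitfall such a library guards against (this is plain scikit-learn, not julearn's own API), preprocessing should be fitted inside each cross-validation fold rather than on the full dataset:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Scaling lives inside the pipeline, so it is re-fit on each training fold only;
# fitting the scaler on the full data before CV would leak test-fold statistics.
model = make_pipeline(StandardScaler(), SVC())
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())
```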
results: Evaluation on six benchmark datasets shows that the proposed neighborhood aggregation of OOD scores outperforms existing methods independently of the underlying graph neural network. The Open-WRF method is also shown to be more robust to threshold selection, and the influence of the graph neighborhood on OOD detection is analyzed. The aggregation and threshold methods are compatible with arbitrary graph neural networks and OOD detection methods, making the approach versatile and applicable to many real-world applications.
Abstract
We study the problem of lifelong graph learning in an open-world scenario, where a model needs to deal with new tasks and potentially unknown classes. We utilize Out-of-Distribution (OOD) detection methods to recognize new classes and adapt existing non-graph OOD detection methods to graph data. Crucially, we suggest performing new class detection by combining OOD detection methods with information aggregated from the graph neighborhood. Most OOD detection methods avoid determining a crisp threshold for deciding whether a vertex is OOD. To tackle this problem, we propose a Weakly-supervised Relevance Feedback (Open-WRF) method, which decreases the sensitivity to thresholds in OOD detection. We evaluate our approach on six benchmark datasets. Our results show that the proposed neighborhood aggregation method for OOD scores outperforms existing methods independent of the underlying graph neural network. Furthermore, we demonstrate that our Open-WRF method is more robust to threshold selection and analyze the influence of graph neighborhood on OOD detection. The aggregation and threshold methods are compatible with arbitrary graph neural networks and OOD detection methods, making our approach versatile and applicable to many real-world applications.
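A minimal reading of the neighborhood-aggregation idea is sketched below (our own toy version; the paper's exact aggregation and the Open-WRF thresholding are not reproduced): each vertex's OOD score is blended with the mean score of its graph neighbours.

```python
import numpy as np

def aggregate_ood_scores(scores, adj, alpha=0.5):
    """Blend each vertex's own OOD score with the mean OOD score of its
    neighbours, given a dense adjacency matrix `adj`."""
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)
    neighbour_mean = (adj @ scores[:, None]) / deg
    return alpha * scores + (1 - alpha) * neighbour_mean[:, 0]

# Toy 4-vertex graph: vertex 3 looks OOD on its own, and its neighbourhood
# context tempers or reinforces that signal.
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
scores = np.array([0.1, 0.2, 0.3, 0.9])
print(aggregate_ood_scores(scores, adj))
```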
Approximate information maximization for bandit games
paper_authors: Alex Barbier-Chebbah, Christian L. Vestergaard, Jean-Baptiste Masson, Etienne Boursier
for: Modeling the dynamics of physical systems, such as decision-making in the brain and access to hidden variables.
methods: Optimization based on the free-energy principle and the information bottleneck principle.
results: A new class of bandit algorithms based on information maximization is proposed; it achieves strong performance in classical bandit settings, and its asymptotic optimality is proven for the two-armed bandit problem.
Abstract
Entropy maximization and free energy minimization are general physical principles for modeling the dynamics of various physical systems. Notable examples include modeling decision-making within the brain using the free-energy principle, optimizing the accuracy-complexity trade-off when accessing hidden variables with the information bottleneck principle (Tishby et al., 2000), and navigation in random environments using information maximization (Vergassola et al., 2007). Built on this principle, we propose a new class of bandit algorithms that maximize an approximation to the information of a key variable within the system. To this end, we develop an approximated analytical physics-based representation of an entropy to forecast the information gain of each action and greedily choose the one with the largest information gain. This method yields strong performances in classical bandit settings. Motivated by its empirical success, we prove its asymptotic optimality for the two-armed bandit problem with Gaussian rewards. Owing to its ability to encompass the system's properties in a global physical functional, this approach can be efficiently adapted to more complex bandit settings, calling for further investigation of information maximization approaches for multi-armed bandit problems.
results: Experiments on the Colored MNIST, CelebA and Adult Income datasets show that the method achieves accuracy comparable to or better than existing methods while exhibiting significantly less bias and a lower debiasing cost. Moreover, the method requires only a small external dataset and updates only a small number of model parameters, without needing access to the training data.
Abstract
Recent discoveries have revealed that deep neural networks might behave in a biased manner in many real-world scenarios. For instance, deep networks trained on a large-scale face recognition dataset CelebA tend to predict blonde hair for females and black hair for males. Such biases not only jeopardize the robustness of models but also perpetuate and amplify social biases, which is especially concerning for automated decision-making processes in healthcare, recruitment, etc., as they could exacerbate unfair economic and social inequalities among different groups. Existing debiasing methods suffer from high costs in bias labeling or model re-training, while also exhibiting a deficiency in terms of elucidating the origins of biases within the model. To this respect, we propose a fast model debiasing framework (FMD) which offers an efficient approach to identify, evaluate and remove biases inherent in trained models. The FMD identifies biased attributes through an explicit counterfactual concept and quantifies the influence of data samples with influence functions. Moreover, we design a machine unlearning-based strategy to efficiently and effectively remove the bias in a trained model with a small counterfactual dataset. Experiments on the Colored MNIST, CelebA, and Adult Income datasets along with experiments with large language models demonstrate that our method achieves superior or competing accuracies compared with state-of-the-art methods while attaining significantly fewer biases and requiring much less debiasing cost. Notably, our method requires only a small external dataset and updating a minimal amount of model parameters, without the requirement of access to training data that may be too large or unavailable in practice.
Neural Likelihood Approximation for Integer Valued Time Series Data
results: Inference is performed on several ecological and epidemiological models, showing that the true posterior can be approximated accurately while achieving significant computational speed-ups.
Abstract
Stochastic processes defined on integer valued state spaces are popular within the physical and biological sciences. These models are necessary for capturing the dynamics of small systems where the individual nature of the populations cannot be ignored and stochastic effects are important. The inference of the parameters of such models, from time series data, is difficult due to intractability of the likelihood; current methods, based on simulations of the underlying model, can be so computationally expensive as to be prohibitive. In this paper we construct a neural likelihood approximation for integer valued time series data using causal convolutions, which allows us to evaluate the likelihood of the whole time series in parallel. We demonstrate our method by performing inference on a number of ecological and epidemiological models, showing that we can accurately approximate the true posterior while achieving significant computational speed ups in situations where current methods struggle.
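The core ingredient, evaluating an autoregressive likelihood over the whole series in parallel with causal convolutions, can be sketched as follows (our own toy version with a Poisson conditional; the paper's architecture and conditional family may differ):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalLikelihoodNet(nn.Module):
    """Causal (left-padded) 1-D convolutions produce, for every time step,
    the parameter of a conditional distribution given only past observations;
    here a Poisson rate is used for simplicity."""
    def __init__(self, hidden=32, kernel=3):
        super().__init__()
        self.pad = kernel - 1
        self.conv1 = nn.Conv1d(1, hidden, kernel)
        self.conv2 = nn.Conv1d(hidden, 1, 1)

    def forward(self, counts):                     # counts: (batch, T)
        x = counts.float().unsqueeze(1)            # (batch, 1, T)
        x = F.pad(x, (self.pad, 0))                # left padding => causal
        rate = torch.exp(self.conv2(torch.relu(self.conv1(x))))
        return rate.squeeze(1)                     # (batch, T)

    def log_likelihood(self, counts):
        rate = self.forward(counts)
        # shift so the rate at step t conditions only on counts up to t
        return torch.distributions.Poisson(rate[:, :-1]).log_prob(
            counts[:, 1:].float()).sum(dim=1)

net = CausalLikelihoodNet()
print(net.log_likelihood(torch.randint(0, 5, (2, 20))).shape)  # torch.Size([2])
```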
Constructing Impactful Machine Learning Research for Astronomy: Best Practices for Researchers and Reviewers
paper_authors: D. Huppenkothen, M. Ntampaka, M. Ho, M. Fouesneau, B. Nord, J. E. G. Peek, M. Walmsley, J. F. Wu, C. Avestruz, T. Buck, M. Brescia, D. P. Finkbeiner, A. D. Goulding, T. Kacprzak, P. Melchior, M. Pasquato, N. Ramachandra, Y. -S. Ting, G. van de Ven, S. Villar, V. A. Villar, E. Zinger
results: The paper proposes an approach to reporting machine learning results that helps authors, reviewers and editors better understand and reproduce the findings.
Abstract
Machine learning has rapidly become a tool of choice for the astronomical community. It is being applied across a wide range of wavelengths and problems, from the classification of transients to neural network emulators of cosmological simulations, and is shifting paradigms about how we generate and report scientific results. At the same time, this class of method comes with its own set of best practices, challenges, and drawbacks, which, at present, are often reported on incompletely in the astrophysical literature. With this paper, we aim to provide a primer to the astronomical community, including authors, reviewers, and editors, on how to implement machine learning models and report their results in a way that ensures the accuracy of the results, reproducibility of the findings, and usefulness of the method.
Parallel Bayesian Optimization Using Satisficing Thompson Sampling for Time-Sensitive Black-Box Optimization
for: This study focuses on time-sensitive black-box optimization problems and proposes a satisficing Thompson sampling-based parallel Bayesian optimization (STS-PBO) approach to solve these problems.
methods: The proposed STS-PBO approach uses the rate-distortion theory to construct a loss function that balances the amount of information that needs to be learned with sub-optimality, and the Blahut-Arimoto algorithm to compute the target solution that reaches the minimum information rate under the distortion limit at each step.
results: The proposed STS-PBO methods outperform both sequential counterparts and parallel BO with traditional Thompson sampling in both synchronous and asynchronous settings, as demonstrated on a fast-charging design problem of Lithium-ion batteries.
Abstract
Bayesian optimization (BO) is widely used for black-box optimization problems, and have been shown to perform well in various real-world tasks. However, most of the existing BO methods aim to learn the optimal solution, which may become infeasible when the parameter space is extremely large or the problem is time-sensitive. In these contexts, switching to a satisficing solution that requires less information can result in better performance. In this work, we focus on time-sensitive black-box optimization problems and propose satisficing Thompson sampling-based parallel Bayesian optimization (STS-PBO) approaches, including synchronous and asynchronous versions. We shift the target from an optimal solution to a satisficing solution that is easier to learn. The rate-distortion theory is introduced to construct a loss function that balances the amount of information that needs to be learned with sub-optimality, and the Blahut-Arimoto algorithm is adopted to compute the target solution that reaches the minimum information rate under the distortion limit at each step. Both discounted and undiscounted Bayesian cumulative regret bounds are theoretically derived for the proposed STS-PBO approaches. The effectiveness of the proposed methods is demonstrated on a fast-charging design problem of Lithium-ion batteries. The results are accordant with theoretical analyses, and show that our STS-PBO methods outperform both sequential counterparts and parallel BO with traditional Thompson sampling in both synchronous and asynchronous settings.
WeaveNet for Approximating Two-sided Matching Problems
for: This paper is written for optimizing the assignment of limited resources under various constraints, with a focus on the task of matching in bipartite graphs.
methods: The paper proposes a novel graph neural network (GNN) called WeaveNet, which is designed to preserve edge-wise information while passing messages densely to reach a better solution for matching problems.
results: Despite being a general-purpose model, WeaveNet achieved performance comparable to state-of-the-art algorithms specially designed for fair stable matching, even for small numbers of agents.
Abstract
Matching, a task to optimally assign limited resources under constraints, is a fundamental technology for society. The task potentially has various objectives, conditions, and constraints; however, the efficient neural network architecture for matching is underexplored. This paper proposes a novel graph neural network (GNN), \textit{WeaveNet}, designed for bipartite graphs. Since a bipartite graph is generally dense, general GNN architectures lose node-wise information by over-smoothing when deeply stacked. Such a phenomenon is undesirable for solving matching problems. WeaveNet avoids it by preserving edge-wise information while passing messages densely to reach a better solution. To evaluate the model, we approximated one of the \textit{strongly NP-hard} problems, \textit{fair stable matching}. Despite its inherent difficulties and the network's general purpose design, our model reached a comparative performance with state-of-the-art algorithms specially designed for stable matching for small numbers of agents.
American Option Pricing using Self-Attention GRU and Shapley Value Interpretation
for: The paper is written for investors and financial analysts who want to use machine learning methods to predict the prices of SPY (ETF) options.
methods: The paper proposes using a gated recurrent unit (GRU) and self-attention mechanism to forecast the prices of SPY options. The authors also compare the performance of their model with traditional binomial models and other machine learning models.
results: The paper shows that the self-attention GRU model with historical data outperforms other models in predicting the prices of SPY options, and provides insights into the significance and contributions of different input features on option pricing using the SHAP method.
Abstract
Options, serving as a crucial financial instrument, are used by investors to manage and mitigate their investment risks within the securities market. Precisely predicting the present price of an option enables investors to make informed and efficient decisions. In this paper, we propose a machine learning method for forecasting the prices of SPY (ETF) option based on gated recurrent unit (GRU) and self-attention mechanism. We first partitioned the raw dataset into 15 subsets according to moneyness and days to maturity criteria. For each subset, we matched the corresponding U.S. government bond rates and Implied Volatility Indices. This segmentation allows for a more insightful exploration of the impacts of risk-free rates and underlying volatility on option pricing. Next, we built four different machine learning models, including multilayer perceptron (MLP), long short-term memory (LSTM), self-attention LSTM, and self-attention GRU in comparison to the traditional binomial model. The empirical result shows that self-attention GRU with historical data outperforms other models due to its ability to capture complex temporal dependencies and leverage the contextual information embedded in the historical data. Finally, in order to unveil the "black box" of artificial intelligence, we employed the SHapley Additive exPlanations (SHAP) method to interpret and analyze the prediction results of the self-attention GRU model with historical data. This provides insights into the significance and contributions of different input features on the pricing of American-style options.
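One simple way to combine a GRU with an attention mechanism for this task is sketched below (our own illustration, not the paper's exact architecture): attention weights are learned over the GRU hidden states, and the weighted summary feeds a regression head that outputs the option price. The feature count and layer sizes are placeholders.

```python
import torch
import torch.nn as nn

class SelfAttentionGRU(nn.Module):
    """GRU whose per-step hidden states are pooled with learned attention
    weights before a linear head predicts the option price."""
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.score = nn.Linear(hidden, 1)      # attention score per time step
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                       # x: (batch, T, n_features)
        h, _ = self.gru(x)                      # (batch, T, hidden)
        attn = torch.softmax(self.score(h), dim=1)
        context = (attn * h).sum(dim=1)         # attention-weighted summary
        return self.head(context).squeeze(-1)   # predicted option price

model = SelfAttentionGRU(n_features=6)
print(model(torch.randn(4, 20, 6)).shape)       # torch.Size([4])
```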
results: Through a detailed description of QMWD, its computation and complexity analysis, and comparisons with MWD and WD, the paper demonstrates QMWD's advantages for large datasets or settings with limited computational resources.
Abstract
The Quasi Manhattan Wasserstein Distance (QMWD) is a metric designed to quantify the dissimilarity between two matrices by combining elements of the Wasserstein Distance with specific transformations. It offers improved time and space complexity compared to the Manhattan Wasserstein Distance (MWD) while maintaining accuracy. QMWD is particularly advantageous for large datasets or situations with limited computational resources. This article provides a detailed explanation of QMWD, its computation, complexity analysis, and comparisons with WD and MWD.
SDGym: Low-Code Reinforcement Learning Environments using System Dynamics Models
results: The study finds that well-specified, high-quality RL environments can be generated from SD models, and that RL can in turn improve dynamic policy discovery within SD models. These findings highlight the dual potential of SD and RL and their promise for collaboration toward responsible AI.
Abstract
Understanding the long-term impact of algorithmic interventions on society is vital to achieving responsible AI. Traditional evaluation strategies often fall short due to the complex, adaptive and dynamic nature of society. While reinforcement learning (RL) can be a powerful approach for optimizing decisions in dynamic settings, the difficulty of realistic environment design remains a barrier to building robust agents that perform well in practical settings. To address this issue we tap into the field of system dynamics (SD) as a complementary method that incorporates collaborative simulation model specification practices. We introduce SDGym, a low-code library built on the OpenAI Gym framework which enables the generation of custom RL environments based on SD simulation models. Through a feasibility study we validate that well specified, rich RL environments can be generated from preexisting SD models and a few lines of configuration code. We demonstrate the capabilities of the SDGym environment using an SD model of the electric vehicle adoption problem. We compare two SD simulators, PySD and BPTK-Py for parity, and train a D4PG agent using the Acme framework to showcase learning and environment interaction. Our preliminary findings underscore the dual potential of SD to improve RL environment design and for RL to improve dynamic policy discovery within SD models. By open-sourcing SDGym, the intent is to galvanize further research and promote adoption across the SD and RL communities, thereby catalyzing collaboration in this emerging interdisciplinary space.
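The SD-model-as-RL-environment idea can be illustrated with a toy stock-and-flow model wrapped in the classic Gym interface (our own minimal example; SDGym instead builds environments from full SD simulation models such as PySD or BPTK-Py ones, and its configuration interface is not reproduced here):

```python
import numpy as np
import gym
from gym import spaces

class ToyAdoptionEnv(gym.Env):
    """Minimal Gym environment wrapping a toy stock-and-flow model of
    technology adoption; the action is a subsidy level in [0, 1]."""
    def __init__(self, population=1000.0, horizon=50):
        super().__init__()
        self.population, self.horizon = population, horizon
        self.action_space = spaces.Box(0.0, 1.0, shape=(1,))
        self.observation_space = spaces.Box(0.0, population, shape=(1,))

    def reset(self):
        self.adopters, self.t = 10.0, 0
        return np.array([self.adopters], dtype=np.float32)

    def step(self, action):
        subsidy = float(action[0])
        # Stock-and-flow update: adoption flow grows with word-of-mouth and subsidy.
        flow = (0.02 + 0.1 * subsidy) * self.adopters * (1 - self.adopters / self.population)
        self.adopters += flow
        self.t += 1
        reward = flow - 5.0 * subsidy           # adoption gained minus subsidy cost
        done = self.t >= self.horizon
        return np.array([self.adopters], dtype=np.float32), reward, done, {}

env = ToyAdoptionEnv()
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
```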
Improved Operator Learning by Orthogonal Attention
for: The paper explores the use of neural operators for solving partial differential equations (PDEs).
methods: The paper develops an attention-based neural operator with an orthogonal attention mechanism, built on the eigendecomposition of the kernel integral operator and a neural approximation of its eigenfunctions; the orthogonalization regularizes the model and mitigates the overfitting caused by the large number of attention parameters.
results: On six standard benchmark datasets covering both regular and irregular geometries, the method outperforms competing baselines by a decent margin.
Abstract
Neural operators, as an efficient surrogate model for learning the solutions of PDEs, have received extensive attention in the field of scientific machine learning. Among them, attention-based neural operators have become one of the mainstreams in related research. However, existing approaches overfit the limited training data due to the considerable number of parameters in the attention mechanism. To address this, we develop an orthogonal attention based on the eigendecomposition of the kernel integral operator and the neural approximation of eigenfunctions. The orthogonalization naturally poses a proper regularization effect on the resulting neural operator, which aids in resisting overfitting and boosting generalization. Experiments on six standard neural operator benchmark datasets comprising both regular and irregular geometries show that our method can outperform competing baselines with decent margins.
Balanced Group Convolution: An Improved Group Convolution Based on Approximability Estimates
paper_authors: Youngkyu Lee, Jongho Park, Chang-Ock Lee
for: A method for improving the performance of neural networks.
methods: Use grouped convolution to reduce computational cost.
results: A mathematical analysis of how well group convolution approximates standard convolution, a new variant called balanced group convolution, and comparisons with other variants.
Abstract
The performance of neural networks has been significantly improved by increasing the number of channels in convolutional layers. However, this increase in performance comes with a higher computational cost, resulting in numerous studies focused on reducing it. One promising approach to address this issue is group convolution, which effectively reduces the computational cost by grouping channels. However, to the best of our knowledge, there has been no theoretical analysis on how well the group convolution approximates the standard convolution. In this paper, we mathematically analyze the approximation of the group convolution to the standard convolution with respect to the number of groups. Furthermore, we propose a novel variant of the group convolution called balanced group convolution, which shows a higher approximation with a small additional computational cost. We provide experimental results that validate our theoretical findings and demonstrate the superior performance of the balanced group convolution over other variants of group convolution.
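For context, grouping channels reduces the parameter (and FLOP) count of a convolution roughly by the number of groups, as the small comparison below shows. This is a generic illustration; the balanced group convolution proposed in the paper adds a further correction not shown here.

```python
import torch.nn as nn

def n_params(m):
    return sum(p.numel() for p in m.parameters())

# Standard vs. grouped 3x3 convolution with 64 input and 64 output channels.
standard = nn.Conv2d(64, 64, kernel_size=3, padding=1)
grouped  = nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=8)

print(n_params(standard))  # 36928  (64*64*9 weights + 64 biases)
print(n_params(grouped))   # 4672   (64*(64/8)*9 weights + 64 biases)
```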
MuseGNN: Interpretable and Convergent Graph Neural Network Layers at Scale
results: The proposed sampling-based GNN layers achieve competitive accuracy and scalability on large-scale graph data; a full GNN architecture built on them attains competitive accuracy and scalability on large-scale node classification tasks.
Abstract
Among the many variants of graph neural network (GNN) architectures capable of modeling data with cross-instance relations, an important subclass involves layers designed such that the forward pass iteratively reduces a graph-regularized energy function of interest. In this way, node embeddings produced at the output layer dually serve as both predictive features for solving downstream tasks (e.g., node classification) and energy function minimizers that inherit desirable inductive biases and interpretability. However, scaling GNN architectures constructed in this way remains challenging, in part because the convergence of the forward pass may involve models with considerable depth. To tackle this limitation, we propose a sampling-based energy function and scalable GNN layers that iteratively reduce it, guided by convergence guarantees in certain settings. We also instantiate a full GNN architecture based on these designs, and the model achieves competitive accuracy and scalability when applied to the largest publicly-available node classification benchmark exceeding 1TB in size.
Constrained Reweighting of Distributions: an Optimal Transport Approach
results: The paper demonstrates the flexibility of its method in three distinct applications: portfolio allocation, semi-parametric inference for complex surveys, and fairness-oriented reweighting in machine learning algorithms.
Abstract
We commonly encounter the problem of identifying an optimally weight adjusted version of the empirical distribution of observed data, adhering to predefined constraints on the weights. Such constraints often manifest as restrictions on the moments, tail behaviour, shapes, number of modes, etc., of the resulting weight adjusted empirical distribution. In this article, we substantially enhance the flexibility of such methodology by introducing a nonparametrically imbued distributional constraints on the weights, and developing a general framework leveraging the maximum entropy principle and tools from optimal transport. The key idea is to ensure that the maximum entropy weight adjusted empirical distribution of the observed data is close to a pre-specified probability distribution in terms of the optimal transport metric while allowing for subtle departures. The versatility of the framework is demonstrated in the context of three disparate applications where data re-weighting is warranted to satisfy side constraints on the optimization problem at the heart of the statistical task: namely, portfolio allocation, semi-parametric inference for complex surveys, and ensuring algorithmic fairness in machine learning algorithms.
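In the simplest special case, with a single mean constraint and no transport term, maximum-entropy weights take an exponential-tilting form; the sketch below (our own toy illustration, far simpler than the paper's optimal-transport framework) solves for them numerically:

```python
import numpy as np
from scipy.optimize import brentq

def max_entropy_weights(x, target_mean):
    """Maximum-entropy reweighting of an empirical sample subject to a single
    mean constraint: w_i is proportional to exp(lam * x_i), with lam chosen so
    the weighted mean hits the target."""
    def weighted_mean(lam):
        w = np.exp(lam * x)
        w /= w.sum()
        return w @ x
    lam = brentq(lambda l: weighted_mean(l) - target_mean, -5.0, 5.0)
    w = np.exp(lam * x)
    return w / w.sum()

rng = np.random.default_rng(0)
x = rng.normal(size=500)
w = max_entropy_weights(x, target_mean=0.3)
print(w @ x)   # ~0.3, while keeping the weights as close to uniform as possible
```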
CAT: Closed-loop Adversarial Training for Safe End-to-End Driving
results: Experimental results show that CAT can efficiently generate effective adversarial scenarios and interleave them with the training of the driving agent, thereby improving the agent's driving safety.
Abstract
Driving safety is a top priority for autonomous vehicles. Orthogonal to prior work handling accident-prone traffic events by algorithm designs at the policy level, we investigate a Closed-loop Adversarial Training (CAT) framework for safe end-to-end driving in this paper through the lens of environment augmentation. CAT aims to continuously improve the safety of driving agents by training the agent on safety-critical scenarios that are dynamically generated over time. A novel resampling technique is developed to turn log-replay real-world driving scenarios into safety-critical ones via probabilistic factorization, where the adversarial traffic generation is modeled as the multiplication of standard motion prediction sub-problems. Consequently, CAT can launch more efficient physical attacks compared to existing safety-critical scenario generation methods and yields a significantly less computational cost in the iterative learning pipeline. We incorporate CAT into the MetaDrive simulator and validate our approach on hundreds of driving scenarios imported from real-world driving datasets. Experimental results demonstrate that CAT can effectively generate adversarial scenarios countering the agent being trained. After training, the agent can achieve superior driving safety in both log-replay and safety-critical traffic scenarios on the held-out test set. Code and data are available at https://metadriverse.github.io/cat.
摘要
驾驶安全是自动驾驶车辆的最高优先级。与以往在策略层面通过算法设计处理事故多发交通场景的工作不同,我们在这篇论文中从环境增强的角度研究一种封闭回路对抗训练(CAT)框架,以实现安全的端到端驾驶。CAT通过在训练过程中动态生成安全关键场景,持续提升驾驶智能体的安全性。我们提出了一种新的重采样技术,借助概率因子分解将日志回放的真实驾驶场景转化为安全关键场景,其中对抗交通生成被建模为若干标准运动预测子问题的乘积。因此,与现有的安全关键场景生成方法相比,CAT能够更高效地发起物理攻击,并显著降低迭代学习流程的计算开销。我们将CAT集成到MetaDrive仿真器中,并在数百个源自真实驾驶数据集的场景上验证了该方法。实验结果表明,CAT能够有效生成针对被训练智能体的对抗场景;训练后的智能体在保留测试集的日志回放场景和安全关键场景中均能取得更优的驾驶安全性。代码和数据见 https://metadriverse.github.io/cat 。
Detecting and Mitigating Algorithmic Bias in Binary Classification using Causal Modeling
results: The paper finds that gender bias in the prediction model is statistically significant, validates the effectiveness of the causal bias-mitigation model via cross-validation, and shows that the mitigated model slightly improves overall classification accuracy.
Abstract
This paper proposes the use of causal modeling to detect and mitigate algorithmic bias. We provide a brief description of causal modeling and a general overview of our approach. We then use the Adult dataset, which is available for download from the UC Irvine Machine Learning Repository, to develop (1) a prediction model, which is treated as a black box, and (2) a causal model for bias mitigation. In this paper, we focus on gender bias and the problem of binary classification. We show that gender bias in the prediction model is statistically significant at the 0.05 level. We demonstrate the effectiveness of the causal model in mitigating gender bias by cross-validation. Furthermore, we show that the overall classification accuracy is improved slightly. Our novel approach is intuitive, easy-to-use, and can be implemented using existing statistical software tools such as "lavaan" in R. Hence, it enhances explainability and promotes trust.
摘要
本文提出了使用 causal 模型探测和 Mitigate 算法偏见的方法。我们提供了简要的 causal 模型介绍和我们的方法概述。我们使用UC Irvine 机器学习库提供的 Adult 数据集来开发 (1) 预测模型(当作黑盒模型)和 (2) 偏见 Mitigation 模型。在本文中,我们关注了性别偏见问题,并使用二分类问题进行探究。我们发现预测模型中的性别偏见是 statistically 显著的(p < 0.05)。我们还证明了 causal 模型可以有效地 Mitigate 性别偏见,并且通过十分法证明了这种方法的可行性。此外,我们还发现了一些轻微的总分率提高。我们的新方法是直观、易用,可以使用现有的统计软件工具such as "lavaan" in R进行实现,因此增加了解释性和信任度。
results: Achieves up to a 64% speedup over Independent Minibatching on single-node multi-GPU systems.
Abstract
Significant computational resources are required to train Graph Neural Networks (GNNs) at a large scale, and the process is highly data-intensive. One of the most effective ways to reduce resource requirements is minibatch training coupled with graph sampling. GNNs have the unique property that items in a minibatch have overlapping data. However, the commonly implemented Independent Minibatching approach assigns each Processing Element (PE) its own minibatch to process, leading to duplicated computations and input data access across PEs. This amplifies the Neighborhood Explosion Phenomenon (NEP), which is the main bottleneck limiting scaling. To reduce the effects of NEP in the multi-PE setting, we propose a new approach called Cooperative Minibatching. Our approach capitalizes on the fact that the size of the sampled subgraph is a concave function of the batch size, leading to significant reductions in the amount of work per seed vertex as batch sizes increase. Hence, it is favorable for processors equipped with a fast interconnect to work on a large minibatch together as a single larger processor, instead of working on separate smaller minibatches, even though global batch size is identical. We also show how to take advantage of the same phenomenon in serial execution by generating dependent consecutive minibatches. Our experimental evaluations show up to 4x bandwidth savings for fetching vertex embeddings, by simply increasing this dependency without harming model convergence. Combining our proposed approaches, we achieve up to 64% speedup over Independent Minibatching on single-node multi-GPU systems.
摘要
具有重要计算资源的 Graph Neural Networks (GNNs) 在大规模培育中需要很多计算资源,并且是数据敏感的。一种有效的方法来降低资源需求是使用小批处理并 Graph sampling。GNNs 具有独特的特点,即批处理中的每个进程元素 (PE) 之间的数据协同。然而,通常实现的独立小批处理方法会在每个 PE 上分配自己的小批处理,导致重复的计算和数据访问,从而增加 Neighborhood Explosion Phenomenon (NEP),这是批处理的主要瓶颈。为了减少 NEP 在多个 PE 设置下的影响,我们提出了一种新的方法called Cooperative Minibatching。我们的方法利用了小批处理中采样的子图大小是批处理大小的凹形函数,从而导致每个种子顶点的工作量减少,因此更有利于配备快速互connect的处理器工作于大批处理中。我们还示出了在串行执行中使用相互依赖的连续小批处理可以获得更大的带宽减少,而不会影响模型的 converges。结合我们的提出的方法,我们在单个节点多卡系统上实现了与独立小批处理相比的最高速度提升达 64%。
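The following toy sketch illustrates the overlap argument behind Cooperative Minibatching: because sampled neighborhoods of different seed vertices overlap, the unique inputs of one shared minibatch are fewer than the sum of unique inputs of per-PE minibatches over the same global batch. The graph, fanout, and PE count are arbitrary assumptions, not the paper's experimental setup.

```python
import random

random.seed(0)
num_nodes, fanout = 10_000, 15
neighbors = {v: random.sample(range(num_nodes), fanout) for v in range(num_nodes)}

def sampled_inputs(seeds):
    """1-hop sampled subgraph inputs: the unique neighbor vertices that must be fetched."""
    fetched = set()
    for s in seeds:
        fetched.update(neighbors[s])
    return fetched

global_batch = random.sample(range(num_nodes), 1024)
per_pe = [global_batch[i::4] for i in range(4)]   # 4 PEs splitting the same global batch

independent = sum(len(sampled_inputs(b)) for b in per_pe)   # duplicated fetches across PEs
cooperative = len(sampled_inputs(global_batch))             # one shared, larger minibatch
print(independent, cooperative)   # cooperative < independent: subgraph size is concave in batch size
```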
paper_authors: Christopher Scarvelis, Haitz Sáez de Ocáriz Borde, Justin Solomon
for: Generating novel samples from a target distribution without any training.
methods: Explicitly smooths the closed-form score of the training data and estimates it with an efficient nearest-neighbor-based estimator, rather than approximating the score with a trained neural network.
results: Achieves sampling times competitive with neural SGMs while running on consumer-grade CPUs.
Abstract
Score-based generative models (SGMs) sample from a target distribution by iteratively transforming noise using the score function of the perturbed target. For any finite training set, this score function can be evaluated in closed form, but the resulting SGM memorizes its training data and does not generate novel samples. In practice, one approximates the score by training a neural network via score-matching. The error in this approximation promotes generalization, but neural SGMs are costly to train and sample, and the effective regularization this error provides is not well-understood theoretically. In this work, we instead explicitly smooth the closed-form score to obtain an SGM that generates novel samples without training. We analyze our model and propose an efficient nearest-neighbor-based estimator of its score function. Using this estimator, our method achieves sampling times competitive with neural SGMs while running on consumer-grade CPUs.
摘要
Score-based生成模型(SGM)通过iterativelytransforming noise使用目标分布中的分数函数来采样。任何固定的训练集,这个分数函数都可以在关闭形式中评估,但是这些SGM会memorize其训练数据并不会生成新样本。在实践中,我们通常通过score-matching来approximate分数函数,并通过这个错误来促进泛化。但是神经网络SGM的训练和采样成本高,而且这种错误的效果不够理解。在这个工作中,我们选择显式简化关闭形式的分数函数,以获得一个不需要训练的SGM,可以生成新的样本。我们分析我们的模型,并提出一种高效的最近邻域基于的分数函数估计器。使用这种估计器,我们的方法可以与神经网络SGM的采样时间竞争,而且可以在consumer-grade CPU上运行。
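A small sketch of the closed-form-score idea: the score of the Gaussian-smoothed empirical distribution has an explicit softmax-weighted form, which can drive annealed Langevin sampling directly. This uses brute-force evaluation rather than the paper's nearest-neighbor estimator, and the data, noise schedule, and step sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 2)) + np.array([3.0, 0.0])   # training set X

def smoothed_score(x, data, sigma):
    """Score of the smoothed empirical distribution (1/n) * sum_i N(x; x_i, sigma^2 I)."""
    d2 = np.sum((data - x) ** 2, axis=1)
    logw = -d2 / (2.0 * sigma ** 2)
    w = np.exp(logw - logw.max())
    w /= w.sum()                       # softmax weights over training points
    return (w @ (data - x)) / sigma ** 2

# annealed Langevin dynamics using the closed-form score
x = rng.normal(size=2)
for sigma in [2.0, 1.0, 0.5, 0.2]:
    eps = 0.05 * sigma ** 2
    for _ in range(100):
        x = x + 0.5 * eps * smoothed_score(x, data, sigma) + np.sqrt(eps) * rng.normal(size=2)
print(x)   # sample ends up near the data cloud centered around (3, 0)
```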
results: Compared with HEVC, VVC achieves bit-rate savings ranging from 31% to 40%, depending on the video content, spatial resolution, and the selected quality metric. These coding-efficiency gains come at the cost of increased computational complexity: on average, VVC decoding is about 1.5 times more complex, while encoding becomes at least eight times more complex than the HEVC reference encoder.
Abstract
In recent years, the proliferation of multimedia applications and formats, such as IPTV, Virtual Reality (VR, 360-degree), and point cloud videos, has presented new challenges to the video compression research community. Simultaneously, there has been a growing demand from users for higher resolutions and improved visual quality. To further enhance coding efficiency, a new video coding standard, Versatile Video Coding (VVC), was introduced in July 2020. This paper conducts a comprehensive analysis of coding performance and complexity for the latest VVC standard in comparison to its predecessor, High Efficiency Video Coding (HEVC). The study employs a diverse set of test sequences, covering both High Definition (HD) and Ultra High Definition (UHD) resolutions, and spans a wide range of bit-rates. These sequences are encoded using the reference software encoders of HEVC (HM) and VVC (VTM). The results consistently demonstrate that VVC outperforms HEVC, achieving bit-rate savings of up to 40% on the subjective quality scale, particularly at realistic bit-rates and quality levels. Objective quality metrics, including PSNR, SSIM, and VMAF, support these findings, revealing bit-rate savings ranging from 31% to 40%, depending on the video content, spatial resolution, and the selected quality metric. However, these improvements in coding efficiency come at the cost of significantly increased computational complexity. On average, our results indicate that the VVC decoding process is 1.5 times more complex, while the encoding process becomes at least eight times more complex than that of the HEVC reference encoder. Our simultaneous profiling of the two standards sheds light on the primary evolutionary differences between them and highlights the specific stages responsible for the observed increase in complexity.
摘要
近年来,多媒体应用和格式的普及,如IPTV、虚拟现实(VR,360度)以及点云视频,给视频压缩研究界带来了新的挑战。与此同时,用户对更高分辨率和更好视觉质量的需求不断增长。为进一步提升编码效率,2020年7月发布了新一代视频编码标准——通用视频编码(VVC)。本文对最新的VVC标准与其前代标准高效视频编码(HEVC)进行了全面的编码性能和复杂度分析。研究使用了一组多样化的测试序列,覆盖高清(HD)和超高清(UHD)分辨率以及较宽的码率范围,并分别使用HEVC(HM)和VVC(VTM)的参考软件编码器进行编码。结果一致表明,VVC优于HEVC,在主观质量尺度上可实现最高达40%的码率节省,尤其是在实际码率和质量水平下。PSNR、SSIM和VMAF等客观质量指标也支持这一结论,显示码率节省在31%到40%之间,具体取决于视频内容、空间分辨率和所选质量指标。然而,这些编码效率的提升是以显著增加的计算复杂度为代价的:平均而言,VVC的解码过程比HEVC复杂约1.5倍,而编码过程的复杂度至少是HEVC参考编码器的8倍。我们对两种标准的同步性能剖析揭示了它们之间的主要演进差异,并指出了导致复杂度增加的具体阶段。
results: Numerical results show that the models are competitive for image denoising while being tractable, interpretable, and having only a small number of learnable parameters; as a byproduct, they enable reliable noise estimation and thus blind denoising of images corrupted by heteroscedastic noise.
Abstract
In this work we tackle the problem of estimating the density $ f_X $ of a random variable $ X $ by successive smoothing, such that the smoothed random variable $ Y $ fulfills the diffusion partial differential equation $ (\partial_t - \Delta_1)f_Y(\,\cdot\,, t) = 0 $ with initial condition $ f_Y(\,\cdot\,, 0) = f_X $. We propose a product-of-experts-type model utilizing Gaussian mixture experts and study configurations that admit an analytic expression for $ f_Y (\,\cdot\,, t) $. In particular, with a focus on image processing, we derive conditions for models acting on filter-, wavelet-, and shearlet responses. Our construction naturally allows the model to be trained simultaneously over the entire diffusion horizon using empirical Bayes. We show numerical results for image denoising where our models are competitive while being tractable, interpretable, and having only a small number of learnable parameters. As a byproduct, our models can be used for reliable noise estimation, allowing blind denoising of images corrupted by heteroscedastic noise.
摘要
在这项工作中,我们研究通过逐次平滑来估计随机变量 X 的密度 $ f_X $ 的问题:平滑后的随机变量 Y 满足扩散偏微分方程 $ (\partial_t - \Delta_1)f_Y(\,\cdot\,, t) = 0 $,初始条件为 $ f_Y(\,\cdot\,, 0) = f_X $。我们提出了一种专家乘积(product-of-experts)型模型,采用高斯混合专家,并研究了使 $ f_Y(\,\cdot\,, t) $ 具有解析表达式的模型配置。特别地,针对图像处理,我们推导了作用于滤波器、小波和剪切波(shearlet)响应的模型所需满足的条件。该构造天然允许利用经验贝叶斯在整个扩散时间范围内同时训练模型。图像去噪的数值结果表明,我们的模型具有竞争力,同时易于处理、可解释且只含少量可学习参数。此外,该模型还可用于可靠的噪声估计,从而实现对受异方差噪声污染图像的盲去噪。
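A one-dimensional numerical check of the analytic property the model relies on: evolving a Gaussian mixture under the heat equation (∂_t − Δ)f = 0 simply inflates each component's variance by 2t. The paper's product-of-experts construction on filter, wavelet, and shearlet responses is not reproduced here; the mixture parameters are arbitrary.

```python
import numpy as np

def gmm_pdf(x, weights, means, variances):
    pdf = np.zeros_like(x)
    for w, m, v in zip(weights, means, variances):
        pdf += w * np.exp(-(x - m) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)
    return pdf

x = np.linspace(-10, 10, 4001)
dx = x[1] - x[0]
weights, means, variances = [0.3, 0.7], [-2.0, 1.5], [0.4, 0.9]
t = 0.8

# numerical solution: convolve f_X with the heat kernel, a Gaussian with variance 2t
kernel = np.exp(-x ** 2 / (4 * t)) / np.sqrt(4 * np.pi * t)
f0 = gmm_pdf(x, weights, means, variances)
f_num = np.convolve(f0, kernel, mode="same") * dx

# analytic solution: the same mixture with each variance inflated by 2t
f_ana = gmm_pdf(x, weights, means, [v + 2 * t for v in variances])
print(np.max(np.abs(f_num - f_ana)) < 1e-3)   # True, up to discretization error
```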
Iterative PnP and its application in 3D-2D vascular image registration for robot navigation
for: Reports a new real-time robot-centered 3D-2D vascular image alignment algorithm that is robust to outliers and can align nonrigid shapes with high accuracy.
methods: Bridges high-accuracy 3D-2D registration techniques with the computational-efficiency requirements of intervention robot applications, formulating centerline-based registration as an iterative Perspective-n-Point (PnP) problem.
results: Experiments show that the proposed algorithm performs registration at over 50 Hz (rigid) and 20 Hz (nonrigid) with accuracy comparable to other works, indicating that Iterative PnP is suitable for future vascular intervention robot applications.
Abstract
This paper reports on a new real-time robot-centered 3D-2D vascular image alignment algorithm, which is robust to outliers and can align nonrigid shapes. Few works have managed to achieve both real-time and accurate performance for vascular intervention robots. This work bridges high-accuracy 3D-2D registration techniques and computational efficiency requirements in intervention robot applications. We categorize centerline-based vascular 3D-2D image registration problems as an iterative Perspective-n-Point (PnP) problem and propose to use the Levenberg-Marquardt solver on the Lie manifold. Then, the recently developed Reproducing Kernel Hilbert Space (RKHS) algorithm is introduced to overcome the ``big-to-small'' problem in typical robotic scenarios. Finally, an iterative reweighted least squares is applied to solve RKHS-based formulation efficiently. Experiments indicate that the proposed algorithm processes registration over 50 Hz (rigid) and 20 Hz (nonrigid) and obtains competing registration accuracy similar to other works. Results indicate that our Iterative PnP is suitable for future vascular intervention robot applications.
摘要
这篇论文报道了一种新的实时、以机器人为中心的3D-2D血管图像配准算法,该算法对异常值具有鲁棒性,并能配准非刚性形状。很少有工作能够为血管介入机器人同时实现实时和高精度的性能。本工作将高精度3D-2D配准技术与介入机器人应用中的计算效率要求相结合。我们将基于中心线的血管3D-2D图像配准问题归类为迭代Perspective-n-Point(PnP)问题,并提出在李流形上使用Levenberg-Marquardt求解器。随后,我们引入最近发展的再生核希尔伯特空间(RKHS)算法,以解决典型机器人场景中的"大到小"问题。最后,我们采用迭代重加权最小二乘法来高效求解基于RKHS的表述。实验表明,所提算法的配准速率超过50 Hz(刚性)和20 Hz(非刚性),并获得与其他工作相当的配准精度。结果表明,迭代PnP适用于未来的血管介入机器人应用。
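A minimal reprojection-error PnP sketch in the spirit of the iterative formulation above, using an axis-angle (Lie-algebra) rotation parameterization and a robust nonlinear least-squares solver as a stand-in for the Levenberg-Marquardt solver on the Lie manifold; the RKHS term and the paper's specific IRLS weighting are not reproduced, and the camera intrinsics and synthetic data are assumptions.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

rng = np.random.default_rng(0)
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])   # assumed camera intrinsics

# synthetic 3D centerline points and their 2D projections under a ground-truth pose
P3 = rng.uniform([-1, -1, 4], [1, 1, 6], size=(60, 3))
R_gt = Rotation.from_rotvec([0.1, -0.2, 0.05])
t_gt = np.array([0.2, -0.1, 0.3])
proj = (K @ (R_gt.apply(P3) + t_gt).T).T
p2 = proj[:, :2] / proj[:, 2:3] + rng.normal(scale=0.5, size=(60, 2))  # noisy pixel observations

def residuals(params):
    """Reprojection residuals for a pose given as [rotation vector, translation]."""
    rotvec, t = params[:3], params[3:]
    cam = Rotation.from_rotvec(rotvec).apply(P3) + t
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]
    return (uv - p2).ravel()

x0 = np.zeros(6)   # identity pose as the initial guess
sol = least_squares(residuals, x0, loss="huber", f_scale=2.0)  # robust to outlier correspondences
print(sol.x[:3], sol.x[3:])   # close to the ground-truth rotation vector and translation
```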
Multi-granularity Backprojection Transformer for Remote Sensing Image Super-Resolution
results: Experiments show that MBT learns low-resolution features efficiently without requiring excessive modules for high-resolution processing, and achieves state-of-the-art performance on the UCMerced and AID datasets compared with other leading methods.
Abstract
Backprojection networks have achieved promising super-resolution performance for natural images but have not been well explored in the remote sensing image super-resolution (RSISR) field due to high computation costs. In this paper, we propose a Multi-granularity Backprojection Transformer, termed MBT, for RSISR. MBT incorporates the backprojection learning strategy into a Transformer framework. It consists of Scale-aware Backprojection-based Transformer Layers (SPTLs) for scale-aware low-resolution feature learning and Context-aware Backprojection-based Transformer Blocks (CPTBs) for hierarchical feature learning. A backprojection-based reconstruction module (PRM) is also introduced to enhance the hierarchical features for image reconstruction. MBT stands out by efficiently learning low-resolution features without excessive modules for high-resolution processing, resulting in lower computational resource requirements. Experimental results on the UCMerced and AID datasets demonstrate that MBT obtains state-of-the-art results compared to other leading methods.
摘要
备受期待的Backprojection网络在自然图像超分辨(SR)领域取得了出色的成绩,但在 remote sensing 图像超分辨(RSISR)领域还没有得到充分探索,主要因为计算成本过高。在这篇论文中,我们提出了一种名为 Multi-granularity Backprojection Transformer(MBT)的RSISR方法。MBT将Backprojection学习策略 integrate 到Transformer框架中。它包括Scale-aware Backprojection-based Transformer Layers(SPTLs),用于学习尺度意识的低分辨度特征,以及Context-aware Backprojection-based Transformer Blocks(CPTBs),用于层次特征学习。此外,我们还提出了一种基于Backprojection的重建模块(PRM),用于增强层次特征对图像重建的贡献。MBT的优势在于不需要过多的模块来处理高分辨度数据,从而降低计算资源的消耗。实验结果表明,MBT在UCMerced和AID数据集上达到了与其他领先方法相当的成绩。
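For context, the classical iterative back-projection loop that the backprojection learning strategy generalizes can be written in a few lines; MBT itself (SPTLs, CPTBs, PRM) is a learned Transformer and is not reproduced here. The blur model, scale factor, and toy image are assumptions.

```python
import numpy as np
from scipy.ndimage import zoom, gaussian_filter

def degrade(hr, scale):
    """Assumed forward model: blur then downsample."""
    return zoom(gaussian_filter(hr, sigma=1.0), 1.0 / scale, order=1)

def iterative_back_projection(lr, scale, n_iters=20, step=1.0):
    sr = zoom(lr, scale, order=3)                   # initial upsampled estimate
    for _ in range(n_iters):
        err = lr - degrade(sr, scale)               # residual in low-resolution space
        sr = sr + step * zoom(err, scale, order=3)  # back-project the residual to high resolution
    return sr

hr = np.kron(np.indices((16, 16)).sum(axis=0) % 2, np.ones((8, 8))).astype(float)  # toy 128x128 image
lr = degrade(hr, scale=4)
sr = iterative_back_projection(lr, scale=4)
# back-projection typically lowers the error relative to plain bicubic upsampling
print(np.mean((sr - hr) ** 2), np.mean((zoom(lr, 4, order=3) - hr) ** 2))
```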
paper_authors: Zhongze Zhang, Tao Jiang, Wei Yu
for: This paper proposes a solution to the uplink localization problem for remote users with the aid of reconfigurable intelligent surfaces (RIS).
methods: The proposed method uses long short-term memory (LSTM) networks to exploit temporal correlation between measurements and construct scalable information vectors. A deep neural network (DNN) is used to map the LSTM cell state to the RIS configuration, and a final DNN is used to map the LSTM cell state to the estimated user equipment (UE) position.
results: The proposed active RIS design results in lower localization error as compared to existing active and nonactive methods. The proposed solution produces interpretable results and is generalizable to early stopping in the sequence of sensing stages.
Abstract
This paper addresses an uplink localization problem in which the base station (BS) aims to locate a remote user with the aid of reconfigurable intelligent surface (RIS). This paper proposes a strategy in which the user transmits pilots over multiple time frames, and the BS adaptively adjusts the RIS reflection coefficients based on the observations already received so far in order to produce an accurate estimate of the user location at the end. This is a challenging active sensing problem for which finding an optimal solution involves a search through a complicated functional space whose dimension increases with the number of measurements. In this paper, we show that the long short-term memory (LSTM) network can be used to exploit the latent temporal correlation between measurements to automatically construct scalable information vectors (called hidden state) based on the measurements. Subsequently, the state vector can be mapped to the RIS configuration for the next time frame in a codebook-free fashion via a deep neural network (DNN). After all the measurements have been received, a final DNN can be used to map the LSTM cell state to the estimated user equipment (UE) position. Numerical result shows that the proposed active RIS design results in lower localization error as compared to existing active and nonactive methods. The proposed solution produces interpretable results and is generalizable to early stopping in the sequence of sensing stages.
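A skeletal PyTorch version of the pipeline described above, assuming hypothetical dimensions and layer sizes: an LSTM cell aggregates measurements across sensing frames, one DNN maps the cell state to the next RIS configuration, and a final DNN maps it to the UE position estimate. The closed feedback loop (where the chosen RIS phases shape the next measurement) and the training procedure are omitted.

```python
import torch
import torch.nn as nn

class ActiveRISLocalizer(nn.Module):
    def __init__(self, meas_dim=16, hidden_dim=128, num_ris=64):
        super().__init__()
        self.lstm = nn.LSTMCell(meas_dim, hidden_dim)             # aggregates measurements over frames
        self.to_phase = nn.Sequential(nn.Linear(hidden_dim, 256), nn.ReLU(),
                                      nn.Linear(256, num_ris))    # cell state -> next RIS phase shifts
        self.to_pos = nn.Sequential(nn.Linear(hidden_dim, 256), nn.ReLU(),
                                    nn.Linear(256, 3))            # cell state -> estimated UE position

    def forward(self, measurements):
        """measurements: (batch, frames, meas_dim) pilot features; in reality each frame would
        depend on the previously chosen RIS phases."""
        batch, frames, _ = measurements.shape
        h = torch.zeros(batch, self.lstm.hidden_size, device=measurements.device)
        c = torch.zeros(batch, self.lstm.hidden_size, device=measurements.device)
        phases = []
        for t in range(frames):
            h, c = self.lstm(measurements[:, t], (h, c))
            phases.append(torch.pi * torch.tanh(self.to_phase(c)))  # codebook-free RIS configuration
        return torch.stack(phases, dim=1), self.to_pos(c)

model = ActiveRISLocalizer()
y = torch.randn(8, 10, 16)                 # 8 users, 10 sensing frames (toy data)
ris_configs, ue_pos = model(y)
print(ris_configs.shape, ue_pos.shape)     # torch.Size([8, 10, 64]) torch.Size([8, 3])
```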
High Dynamic Range mmWave Massive MU-MIMO with Householder Reflections
paper_authors: Victoria Palhares, Gian Marti, Oscar Castañeda, Christoph Studer
for: Addresses the problem that low-resolution analog-to-digital converters (ADCs) face when simultaneously transmitting user equipments (UEs) have vastly different BS-side receive powers.
methods: Proposes high dynamic range (HDR) MIMO, a new paradigm that combines an adaptive analog spatial transform with digital equalization to enable simultaneous reception of strong and weak UEs with low-resolution ADCs.
results: Demonstrates the efficacy of HDR MIMO in a massive MU-MIMO mmWave scenario that uses Householder reflections as the spatial transform.
Abstract
All-digital massive multiuser (MU) multiple-input multiple-output (MIMO) at millimeter-wave (mmWave) frequencies is a promising technology for next-generation wireless systems. Low-resolution analog-to-digital converters (ADCs) can be utilized to reduce the power consumption of all-digital basestation (BS) designs. However, simultaneously transmitting user equipments (UEs) with vastly different BS-side receive powers either drown weak UEs in quantization noise or saturate the ADCs. To address this issue, we propose high dynamic range (HDR) MIMO, a new paradigm that enables simultaneous reception of strong and weak UEs with low-resolution ADCs. HDR MIMO combines an adaptive analog spatial transform with digital equalization: The spatial transform focuses strong UEs on a subset of ADCs in order to mitigate quantization and saturation artifacts; digital equalization is then used for data detection. We demonstrate the efficacy of HDR MIMO in a massive MU-MIMO mmWave scenario that uses Householder reflections as spatial transform.
摘要
全数位大规模多用户(MU)多输入多输出(MIMO)在 millimeter 波(mmWave)频率上是未来无线系统的承让技术。低分辨率数字转换器(ADC)可以降低全数位基站(BS)设计的功耗。然而,同时发送用户设备(UE)的大大不同BS-side接收功率会使用量化杂音淹没弱UE,或者使ADC发生饱和。为解决这个问题,我们提出高动态范围(HDR)MIMO,一种新的思想,允许同时接收强UE和弱UE,使用低分辨率ADC。HDR MIMO将适应性的分析Transform与数字平衡相结合:分析Transform将强UE集中在一些ADC上,以降低量化和饱和artefacts;数字平衡后再进行数据检测。我们在大规模MU-MIMO mmWave场景中使用Householder reflections作为分析Transform,并证明HDR MIMO的有效性。
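A small numerical illustration of the spatial-transform idea: a single complex Householder reflection can rotate a strong UE's channel vector onto one antenna/ADC chain while preserving the norm of every other signal, since the reflection is unitary. The channel model and power levels are assumptions; the full HDR MIMO transceiver is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
B = 32                                           # BS antennas / ADC chains
h_strong = 10.0 * (rng.normal(size=B) + 1j * rng.normal(size=B)) / np.sqrt(2)  # strong UE channel
h_weak = 0.1 * (rng.normal(size=B) + 1j * rng.normal(size=B)) / np.sqrt(2)     # weak UE channel

# complex Householder reflection P = I - 2 v v^H / (v^H v) mapping h_strong onto e_1
alpha = -np.exp(1j * np.angle(h_strong[0])) * np.linalg.norm(h_strong)
v = h_strong.copy()
v[0] -= alpha
P = np.eye(B, dtype=complex) - 2.0 * np.outer(v, v.conj()) / np.vdot(v, v)

y_strong = P @ h_strong
print(np.abs(y_strong[0]), np.linalg.norm(y_strong[1:]))   # all strong-UE power on ADC 1, ~0 elsewhere
print(np.linalg.norm(P @ h_weak), np.linalg.norm(h_weak))  # reflection is unitary: weak UE norm preserved
```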
Capacity Limitation and Optimization Strategy for Flexible Point-to-Multi-Point Optical Networks
results: Derives the theoretical capacity limitation of flexible PtMP optical networks based on the entropy-loading DSCM signal and obtains the optimal clipping ratio that approaches this limit; based on an accurate clipping-noise model at the optimal clipping ratio, a three-dimensional look-up table for bit-error ratio, spectral efficiency, and link loss is established, and an optimization strategy is proposed to achieve higher capacity for flexible PtMP optical networks.
Abstract
Point-to-multi-point (PtMP) optical networks have become the main solution for network-edge applications such as passive optical networks and radio access networks. Entropy-loading digital subcarrier multiplexing (DSCM) is the core technology to achieve low latency and approach high capacity for flexible PtMP optical networks. However, the high peak-to-average power ratio of the entropy-loading DSCM signal limits the power budget and restricts the capacity; this ratio can be reduced effectively by a clipping operation. In this paper, we derive the theoretical capacity limitation of flexible PtMP optical networks based on the entropy-loading DSCM signal. Meanwhile, an optimal clipping ratio for the clipping operation is acquired to approach the highest capacity limitation. Based on an accurate clipping-noise model under the optimal clipping ratio, we establish a three-dimensional look-up table for bit-error ratio, spectral efficiency, and link loss. Based on this three-dimensional look-up table, an optimization strategy is proposed to acquire optimal spectral efficiencies for achieving a higher capacity of the flexible PtMP optical networks.
摘要
点对多点(PtMP)光网成为网络边缘应用的主要解决方案,如无活动光网和无线接入网。Entropy-loading数字子副载多plexing(DSCM)是实现低延迟和高容量灵活PtMP光网的核心技术。然而,高峰值平均功率比ENTROPY-loading DSCM信号限制了功率预算,这可以通过剪辑操作来降低。在这篇论文中,我们 derive了灵活PtMP光网的理论容量限制基于ENTROPY-loading DSCM信号。同时,我们获得了最佳剪辑率来接近最高容量限制。基于最佳剪辑噪声模型,我们建立了三维look-up表,其中包括比特错误率、spectral efficiency和链接产生率。基于三维look-up表,我们提出了一种优化策略,以实现灵活PtMP光网的更高容量。
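A toy illustration of why clipping matters for a multicarrier (DSCM-like) signal: the sketch measures the PAPR before and after amplitude clipping at a chosen clipping ratio, along with the distortion the clipping introduces. The entropy loading, capacity analysis, and optimal-clipping-ratio derivation of the paper are not reproduced; all parameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sc = 256                                          # subcarriers
qam = (rng.choice([-3, -1, 1, 3], n_sc) + 1j * rng.choice([-3, -1, 1, 3], n_sc)) / np.sqrt(10)
x = np.fft.ifft(qam) * np.sqrt(n_sc)                # time-domain multicarrier signal

def papr_db(sig):
    return 10 * np.log10(np.max(np.abs(sig) ** 2) / np.mean(np.abs(sig) ** 2))

def clip(sig, clipping_ratio_db):
    """Amplitude clipping at a threshold defined relative to the RMS level."""
    rms = np.sqrt(np.mean(np.abs(sig) ** 2))
    thresh = rms * 10 ** (clipping_ratio_db / 20)
    mag = np.abs(sig)
    return np.where(mag > thresh, thresh * sig / mag, sig)

x_clipped = clip(x, clipping_ratio_db=6.0)
print(papr_db(x), papr_db(x_clipped))               # clipping trades peak power for clipping noise
evm = np.linalg.norm(x_clipped - x) / np.linalg.norm(x)
print(evm)                                          # distortion introduced by the clipping operation
```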
Mutual Information-Based Integrated Sensing and Communications: A WMMSE Framework
results: Numerical results demonstrate the effectiveness of the proposed method and validate the performance trade-off between sensing and communication.
Abstract
In this letter, a weighted minimum mean square error (WMMSE) empowered integrated sensing and communication (ISAC) system is investigated. One transmitting base station and one receiving wireless access point are considered to serve multiple users and a sensing target. Based on the theory of mutual information (MI), communication MI and sensing MI rate are utilized as the performance metrics in the presence of clutter. In particular, we propose a novel MI-based WMMSE-ISAC method by developing a unique transceiver design mechanism to maximize the weighted sensing and communication sum-rate of this system. Such a maximization process is achieved by utilizing the classical method -- WMMSE, aiming to better manage the effect of sensing clutter and the interference among users. Numerical results show the effectiveness of our proposed method, and the performance trade-off between sensing and communication is also validated.
摘要
在这封信中,一种权重最小平均方差 empowered integreated sensing and communication(ISAC)系统被研究。一个发射基站和一个接收无线访问点被考虑,以服务多个用户感知目标。基于互信息(MI)理论,在存在噪声的情况下,通信MI和感知MI率被用作这系统的性能指标。特别是,我们提出了一种新的 MI-based WMMSE-ISAC 方法,通过开发一种特有的天线设计机制,以最大化这个系统的权重感知和通信总速率。这个最大化过程通过使用经典方法——WMMSE,以更好地管理感知噪声和用户之间的干扰。 numerically 的结果表明我们的提出方法的有效性,并且 validate 了感知和通信之间的性能质量负担。
Can Electromagnetic Information Theory Improve Wireless Systems? A Channel Estimation Example
paper_authors: Jieao Zhu, Xiaofeng Su, Zhongzhichao Wan, Linglong Dai, Tie Jun Cui
for: Investigates how electromagnetic information theory (EIT) can improve the performance of wireless communication systems.
methods: Proposes an EIT-based channel estimation method that encodes electromagnetic knowledge into a spatio-temporal correlation function (the EM kernel) as side information for the classical MMSE channel estimator, derives the channel estimates via Gaussian process regression (EIT-GPR), and introduces EM kernel learning to fit the kernel parameters to channel observations.
results: Simulation results show that the EIT-based channel estimator outperforms the traditional isotropic MMSE algorithm, demonstrating the practical value of EIT.
Abstract
Electromagnetic information theory (EIT) is an emerging interdisciplinary subject that integrates classical Maxwell electromagnetics and Shannon information theory. The goal of EIT is to uncover the information transmission mechanisms from an electromagnetic (EM) perspective in wireless systems. Existing works on EIT are mainly focused on the analysis of degrees-of-freedom (DoF), system capacity, and characteristics of the electromagnetic channel. However, these works do not clarify how EIT can improve wireless communication systems. To answer this question, in this paper, we provide a novel demonstration of the application of EIT. By integrating EM knowledge into the classical MMSE channel estimator, we observe for the first time that EIT is capable of improving the channel estimation performance. Specifically, the EM knowledge is first encoded into a spatio-temporal correlation function (STCF), which we term the EM kernel. This EM kernel plays the role of side information to the channel estimator. Since the EM kernel takes the form of Gaussian processes (GP), we propose the EIT-based Gaussian process regression (EIT-GPR) to derive the channel estimations. In addition, since the EM kernel allows parameter tuning, we propose EM kernel learning to fit the EM kernel to channel observations. Simulation results show that the application of EIT to the channel estimator enables it to outperform the traditional isotropic MMSE algorithm, thus proving the practical value of EIT.
摘要
电磁信息理论(EIT)是一个emerging的interdisciplinary subject,它将经典的Maxwell电磁学和Shannon信息理论相结合。EIT的目标是从电磁(EM)角度来描述无线系统中信息传输机制。现有的EIT研究主要关注度OF(DoF)、系统容量和电磁通道的特性。然而,这些研究并没有解释如何使用EIT提高无线通信系统的性能。为回答这个问题,在这篇论文中,我们提供了一种新的EIT应用示例。我们首先将EM知识编码成一个空间-时间协同函数(STCF),我们称之为EM核。这个EM核在渠道估计器中扮演着侧信息的角色。由于EM核是GP的形式,我们提议使用EIT基于GP回归(EIT-GPR)来 derivate渠道估计结果。此外,由于EM核允许参数调整,我们提议使用EM核学习来适应渠道观测。实验结果表明,通过应用EIT到渠道估计器,可以超越传统的均方差MMSE算法,从而证明EIT在实践中的价值。
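A minimal stand-in for the kernel-as-side-information idea: Gaussian-process (MMSE) interpolation of pilot observations across a dense array using a generic isotropic spatial correlation kernel. The actual EM kernel derived in the paper and its learning procedure are not reproduced; the array geometry, pilot pattern, and noise level are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
wavelength = 1.0
positions = np.arange(64) * 0.25 * wavelength      # denser-than-half-wavelength ULA, where spatial correlation matters

def spatial_kernel(p, q, wavelength):
    """Stand-in correlation: sinc(2*d/lambda), the 3D isotropic scattering model."""
    d = np.abs(p[:, None] - q[None, :])
    return np.sinc(2 * d / wavelength)

K = spatial_kernel(positions, positions, wavelength)
h = np.linalg.cholesky(K + 1e-6 * np.eye(64)) @ (rng.normal(size=64) + 1j * rng.normal(size=64)) / np.sqrt(2)

pilot_idx = np.arange(0, 64, 2)                    # pilots observed on every other antenna
noise_var = 0.01
y = h[pilot_idx] + np.sqrt(noise_var / 2) * (rng.normal(size=32) + 1j * rng.normal(size=32))

# GP / MMSE posterior mean: h_hat = K_*p (K_pp + sigma^2 I)^{-1} y
K_pp = spatial_kernel(positions[pilot_idx], positions[pilot_idx], wavelength)
K_sp = spatial_kernel(positions, positions[pilot_idx], wavelength)
h_hat = K_sp @ np.linalg.solve(K_pp + noise_var * np.eye(len(pilot_idx)), y)

print(np.linalg.norm(h - h_hat) ** 2 / np.linalg.norm(h) ** 2)  # NMSE of the kernel-based estimate
```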
results: Deploying the RIS improves the wireless coverage and coverage probability of high-speed train communication systems with limited power consumption, and the gains are examined across key system parameters such as transmission power, SNR threshold, number of RIS elements, RIS quantization bits, BS-RIS distance, and train speed.
Abstract
Reconfigurable intelligent surface (RIS) emerges as an efficient and promising technology for the next wireless generation networks and has attracted a lot of attention owing to the capability of extending wireless coverage by reflecting signals toward targeted receivers. In this paper, we consider a RIS-assisted high-speed train (HST) communication system to enhance wireless coverage and improve coverage probability. First, coverage performance of the downlink single-input-single-output system is investigated, and the closed-form expression of coverage probability is derived. Moreover, travel distance maximization problem is formulated to facilitate RIS discrete phase design and RIS placement optimization, which is subject to coverage probability constraint. Simulation results validate that better coverage performance and higher travel distance can be achieved with deployment of RIS. The impacts of some key system parameters including transmission power, signal-to-noise ratio threshold, number of RIS elements, number of RIS quantization bits, horizontal distance between base station and RIS, and speed of HST on system performance are investigated. In addition, it is found that RIS can well improve coverage probability with limited power consumption for HST communications.
摘要
可重构智能表面(RIS)作为下一代无线网络中一项高效且有前景的技术,能够将信号反射至目标接收机以扩展无线覆盖范围,因而受到广泛关注。本文研究一种RIS辅助的高速列车(HST)通信系统,以增强无线覆盖并提高覆盖概率。首先,我们分析了下行单输入单输出系统的覆盖性能,并推导了覆盖概率的闭式表达式。此外,在覆盖概率约束下,我们构建了行驶距离最大化问题,用于RIS离散相位设计和RIS位置优化。仿真结果验证了部署RIS可以获得更好的覆盖性能和更长的行驶距离。我们还研究了发射功率、信噪比阈值、RIS单元数量、RIS量化位数、基站与RIS之间的水平距离以及列车速度等关键系统参数对系统性能的影响。结果表明,RIS能够以有限的功耗显著提升高速列车通信的覆盖概率。
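As a rough companion to the coverage analysis above, the following Monte Carlo sketch compares the coverage probability with and without an N-element RIS under a simple Rayleigh model with ideal (continuous) phase alignment; the closed-form expressions, discrete-phase design, and placement optimization of the paper are not reproduced, and all parameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
trials, N = 100_000, 64            # Monte Carlo runs, RIS elements
snr_direct_db, snr_threshold_db = 0.0, 5.0

def rayleigh(size):
    return (rng.normal(size=size) + 1j * rng.normal(size=size)) / np.sqrt(2)

h_direct = rayleigh(trials)                          # BS -> train direct link
g = rayleigh((trials, N))                            # BS -> RIS links
f = rayleigh((trials, N))                            # RIS -> train links
rho = 10 ** (snr_direct_db / 10)
beta_ris = 0.01                                      # extra path loss of the reflected link (assumed)

# without RIS vs. with RIS phases aligned to the direct link (continuous phase shifts)
snr_no_ris = rho * np.abs(h_direct) ** 2
cascade = np.abs(g) * np.abs(f)                      # per-element cascaded gains when co-phased
snr_ris = rho * (np.abs(h_direct) + np.sqrt(beta_ris) * cascade.sum(axis=1)) ** 2

thr = 10 ** (snr_threshold_db / 10)
print((snr_no_ris > thr).mean(), (snr_ris > thr).mean())   # coverage probability without / with RIS
```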