eess.SP - 2023-10-04

Index-Modulated Metasurface Transceiver Design using Reconfigurable Intelligent Surfaces for 6G Wireless Networks

  • paper_url: http://arxiv.org/abs/2310.03208
  • repo_url: None
  • paper_authors: John A. Hodge, Kumar Vijay Mishra, Brian M. Sadler, Amir I. Zaghloul
  • for: To improve the data rates and energy efficiency of sixth-generation (6G) wireless networks by realizing index modulation (IM) through novel electromagnetics-compliant reconfigurable intelligent surface (RIS) designs.
  • methods: The study exploits the electromagnetic properties of the RIS to implement spatial and subcarrier IM, covering beam steering, spatial multiplexing, and phase modulation capabilities.
  • results: Numerical experiments show that RIS-aided IM achieves better bit error rates than traditional implementations, and full-wave analysis of a reflect-array metasurface confirms that the transceivers can be programmed to vary the reflection phase and generate frequency harmonics for IM.
    Abstract Higher spectral and energy efficiencies are the envisioned defining characteristics of high data-rate sixth-generation (6G) wireless networks. One of the enabling technologies to meet these requirements is index modulation (IM), which transmits information through permutations of indices of spatial, frequency, or temporal media. In this paper, we propose novel electromagnetics-compliant designs of reconfigurable intelligent surface (RIS) apertures for realizing IM in 6G transceivers. We consider RIS modeling and implementation of spatial and subcarrier IMs, including beam steering, spatial multiplexing, and phase modulation capabilities. Numerical experiments for our proposed implementations show that the bit error rates obtained via RIS-aided IM outperform traditional implementations. We further establish the programmability of these transceivers to vary the reflection phase and generate frequency harmonics for IM through full-wave electromagnetic analyses of a specific reflect-array metasurface implementation.
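To make the spatial-IM idea above concrete, the following is a minimal numpy sketch in which the index bits select which direction the RIS steers the reflected beam toward, and the receiver recovers the bits by locating the strongest direction. The array geometry, element count, codebook of angles, and free-space assumptions are all illustrative stand-ins, not the paper's actual design.

```python
# Toy spatial index modulation with an N-element reflecting surface (uniform linear array).
# Index bits choose one of four reflection directions; the receiver recovers the bits by
# locating the direction with the strongest reflected power. Purely illustrative.
import numpy as np

N = 64                                       # RIS elements (assumed)
d = 0.5                                      # element spacing in wavelengths (assumed)
theta_in = np.deg2rad(20.0)                  # incident angle
codebook = np.deg2rad([-40.0, -15.0, 15.0, 40.0])   # 2 index bits -> 4 beam directions
n = np.arange(N)

def ris_phases(theta_out):
    """Per-element phases that steer the reflection toward theta_out."""
    return -2 * np.pi * d * n * (np.sin(theta_out) + np.sin(theta_in))

def received_power(phases, theta_rx):
    """Power seen by a receiver at angle theta_rx (free-space array factor)."""
    steering = np.exp(1j * 2 * np.pi * d * n * (np.sin(theta_rx) + np.sin(theta_in)))
    return np.abs(np.sum(np.exp(1j * phases) * steering)) ** 2

index_bits = 0b10                            # symbol carried by the index
phases = ris_phases(codebook[index_bits])

# Receiver: sweep the codebook directions and pick the strongest one.
powers = [received_power(phases, th) for th in codebook]
print(f"sent index {index_bits}, detected index {int(np.argmax(powers))}")
```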

Impedance Leakage Vulnerability and its Utilization in Reverse-engineering Embedded Software

  • paper_url: http://arxiv.org/abs/2310.03175
  • repo_url: None
  • paper_authors: Md Sadik Awal, Md Tauhidur Rahman
  • for: Discovering new vulnerabilities and implementing security and privacy measures are essential to protect systems and data against physical attacks.
  • methods: The paper examines an overlooked and narrowly studied vulnerability, device impedance, which can leak information through an unintended side channel and thereby threaten the security and privacy of systems and data.
  • results: The impedance of the ATmega328P microcontroller and the Artix 7 FPGA is shown to be non-constant and directly related to the software running on the device. This phenomenon, termed impedance leakage, is used as a side channel to detect software instructions in protected memory with 96.1% accuracy on the ATmega328P and 92.6% on the Artix 7 FPGA. The paper also discusses the dual nature of this side channel, including beneficial uses and the risk of intellectual property theft, and concludes with countermeasures that specifically address impedance leakage.
    Abstract Discovering new vulnerabilities and implementing security and privacy measures are important to protect systems and data against physical attacks. One such vulnerability is impedance, an inherent property of a device that can be exploited to leak information through an unintended side channel, thereby posing significant security and privacy risks. Unlike traditional vulnerabilities, impedance is often overlooked or narrowly explored, as it is typically treated as a fixed value at a specific frequency in research and design endeavors. Moreover, impedance has never been explored as a source of information leakage. This paper demonstrates that the impedance of an embedded device is not constant and directly relates to the programs executed on the device. We define this phenomenon as impedance leakage and use this as a side channel to extract software instructions from protected memory. Our experiment on the ATmega328P microcontroller and the Artix 7 FPGA indicates that the impedance side channel can detect software instructions with 96.1% and 92.6% accuracy, respectively. Furthermore, we explore the dual nature of the impedance side channel, highlighting the potential for beneficial purposes and the associated risk of intellectual property theft. Finally, potential countermeasures that specifically address impedance leakage are discussed.
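The instruction-detection step described above amounts to a supervised classifier applied to impedance measurements. The sketch below illustrates that kind of pipeline on purely synthetic data; the feature layout (impedance magnitude at several probe frequencies), the hypothetical instruction labels, and the linear-SVM choice are stand-ins and not the authors' actual setup.

```python
# Classify which instruction is executing from (synthetic) impedance traces.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n_freqs, n_per_class = 32, 200
instructions = ["NOP", "ADD", "MUL", "LOAD"]          # hypothetical label set

# Synthetic data: each instruction shifts the impedance profile slightly.
X, y = [], []
for label, _ in enumerate(instructions):
    base = 50 + 5 * np.sin(np.linspace(0, 3, n_freqs) + label)   # ohms, made up
    X.append(base + rng.normal(scale=0.5, size=(n_per_class, n_freqs)))
    y.append(np.full(n_per_class, label))
X, y = np.vstack(X), np.concatenate(y)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
clf.fit(X_tr, y_tr)
print(f"instruction-detection accuracy: {clf.score(X_te, y_te):.3f}")
```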

Dynamic Changes of Brain Network during Epileptic Seizure

  • paper_url: http://arxiv.org/abs/2310.03151
  • repo_url: None
  • paper_authors: Atefeh Khoshkhahtinat, Hoda Mohammadzade
  • for: The study explores the dynamic changes of epilepsy patients' brain networks during seizures in order to improve diagnosis and treatment.
  • methods: Connectivity brain networks are built from phase lag index (PLI) measurements in five frequency bands, graph-theoretic techniques extract topological features, and an unsupervised clustering approach examines the state transitions of the brain network during seizures.
  • results: Brain synchrony during the seizure period is higher than in the pre- and post-seizure periods in the theta, alpha, and beta bands, while it decreases in the gamma band; these synchronization changes also alter the topological features of the functional brain networks. The findings further suggest that seizure dynamics are more complex than the traditional three-state model (pre-seizure, seizure, post-seizure) and that the brain network state changes more slowly during the seizure period than before or after it.
    Abstract Epilepsy is a neurological disorder identified by sudden and recurrent seizures, which are believed to be accompanied by distinct changes in brain dynamics. Exploring the dynamic changes of brain network states during seizures can pave the way for improving the diagnosis and treatment of patients with epilepsy. In this paper, the connectivity brain network is constructed using the phase lag index (PLI) measurement within five frequency bands, and graph-theoretic techniques are employed to extract topological features from the brain network. Subsequently, an unsupervised clustering approach is used to examine the state transitions of the brain network during seizures. Our findings demonstrate that the level of brain synchrony during the seizure period is higher than the pre-seizure and post-seizure periods in the theta, alpha, and beta bands, while it decreases in the gamma bands. These changes in synchronization also lead to alterations in the topological features of functional brain networks during seizures. Additionally, our results suggest that the dynamics of the brain during seizures are more complex than the traditional three-state model (pre-seizure, seizure, and post-seizure) and the brain network state exhibits a slower rate of change during the seizure period compared to the pre-seizure and post-seizure periods.
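The phase lag index used above to build the connectivity networks can be computed from the instantaneous phases of two band-filtered channels. A minimal sketch, assuming the signals have already been filtered to the frequency band of interest, is shown below; the example signals and sampling choices are illustrative.

```python
# Phase lag index (PLI) between two band-limited EEG channels:
# PLI = | mean over time of sign( sin(phase_x - phase_y) ) |
import numpy as np
from scipy.signal import hilbert

def phase_lag_index(x, y):
    """PLI between two 1-D signals already filtered to one frequency band."""
    phase_x = np.angle(hilbert(x))
    phase_y = np.angle(hilbert(y))
    return np.abs(np.mean(np.sign(np.sin(phase_x - phase_y))))

# Example: two noisy sinusoids with a fixed, non-zero phase lag give a PLI near 1.
t = np.arange(0, 10, 1 / 256)
x = np.sin(2 * np.pi * 10 * t) + 0.3 * np.random.randn(t.size)
y = np.sin(2 * np.pi * 10 * t - np.pi / 4) + 0.3 * np.random.randn(t.size)
print(f"PLI = {phase_lag_index(x, y):.2f}")

# A full connectivity matrix is obtained by evaluating the PLI for every channel pair
# in each frequency band before extracting graph-theoretic features.
```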

Dual mode multispectral imaging system for food and agricultural product quality estimation

  • paper_url: http://arxiv.org/abs/2310.03110
  • repo_url: None
  • paper_authors: Darsha Udayanga, Ashan Serasinghe, Supun Dassanayake, Roshan Godaliyadda, H. M. V. R. Herath, M. P. B. Ekanayake, H. L. P. Malshan
  • for: The paper proposes a food quality control method based on multispectral imaging combined with artificial intelligence and machine learning techniques as an alternative to conventional laboratory testing.
  • methods: The system combines reflectance and transmittance imaging modes and applies machine learning and signal processing techniques to analyze the acquired data.
  • results: Experiments show that the approach accurately characterizes both solid and liquid samples. Classification accuracy exceeds 90% on a standard color palette and 95% for coconut oil adulteration, while the merged mode achieves the best accuracy of 99% for turmeric adulteration; a linear functional mapping for coconut oil adulteration yields an R2 value of 0.9558.
    Abstract Multispectral imaging coupled with Artificial Intelligence, Machine Learning and Signal Processing techniques works as a feasible alternative for laboratory testing, especially in food quality control. Most of the recent related research has been focused on reflectance multispectral imaging but a system with both reflectance and transmittance capabilities would be ideal for a wide array of specimen types including solid and liquid samples. In this paper, a device which includes a dedicated reflectance mode and a dedicated transmittance mode is proposed. Dual-mode operation with fast switching between the two modes is facilitated. An innovative merged mode is introduced in which both reflectance and transmittance information of a specimen are combined to form a higher dimensional dataset with more features. Spatial and temporal variations of measurements are analyzed to ensure the quality of measurements. The concept is validated using a standard color palette and specific case studies are done for standard food samples such as turmeric powder and coconut oil proving the validity of proposed contributions. The classification accuracy of standard color palette testing was over 90% and the accuracy of coconut oil adulteration was over 95%, while the merged mode was able to provide the best accuracy of 99% for the turmeric adulteration. A linear functional mapping was done for coconut oil adulteration with an R2 value of 0.9558.
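The "merged mode" above simply joins the reflectance and transmittance measurements of the same specimen into one higher-dimensional feature vector before classification. A hedged sketch of that fusion step follows; the band counts, class labels, synthetic spectra, and classifier are placeholders rather than details from the paper.

```python
# Merged-mode classification: concatenate reflectance and transmittance spectra
# per sample so the classifier sees both views at once. Illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_samples, n_bands = 300, 8                       # 8 spectral bands per mode (assumed)
labels = rng.integers(0, 3, n_samples)            # e.g. pure / mild / heavy adulteration

# Stand-in spectra: class-dependent means plus noise.
reflectance = labels[:, None] * 0.05 + rng.normal(0.5, 0.02, (n_samples, n_bands))
transmittance = labels[:, None] * 0.03 + rng.normal(0.4, 0.02, (n_samples, n_bands))

merged = np.hstack([reflectance, transmittance])  # the "merged mode" feature vector

for name, X in [("reflectance", reflectance),
                ("transmittance", transmittance),
                ("merged", merged)]:
    acc = cross_val_score(RandomForestClassifier(random_state=0), X, labels, cv=5).mean()
    print(f"{name:13s} accuracy: {acc:.2f}")
```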

SNR-Adaptive Ranging Waveform Design Based on Ziv-Zakai Bound Optimization

  • paper_url: http://arxiv.org/abs/2310.02963
  • repo_url: None
  • paper_authors: Yifeng Xiong, Fan Liu
  • for: The work aims to improve localization accuracy in wireless applications by enabling high-precision ranging.
  • methods: A ranging waveform design algorithm based on the Ziv-Zakai bound (ZZB) is proposed, with a theoretical guarantee of achieving the optimal ZZB at a given SNR.
  • results: Numerical results show that, in the low-SNR regime, the detection probability of the ranging signal, rather than its resolution, becomes the dominant factor in ranging performance.
    Abstract Location-awareness is essential in various wireless applications. The capability of performing precise ranging is substantial in achieving high-accuracy localization. Due to the notorious ambiguity phenomenon, optimal ranging waveforms should be adaptive to the signal-to-noise ratio (SNR). In this letter, we propose to use the Ziv-Zakai bound (ZZB) as the ranging performance metric, as well as an associated waveform design algorithm having theoretical guarantee of achieving the optimal ZZB at a given SNR. Numerical results suggest that, in stark contrast to the well-known high-SNR design philosophy, the detection probability of the ranging signal becomes more important than the resolution in the low-SNR regime.

Dark Side of HAPS Systems: Jamming Threats towards Satellites

  • paper_url: http://arxiv.org/abs/2310.02851
  • repo_url: None
  • paper_authors: Hadil Otay, Khaled Humadi, Gunes Karabulut Kurt
  • for: The work establishes a secure low Earth orbit (LEO) satellite communication paradigm for the 6G era, with a focus on improving the integrity of links between satellites and ground stations under jamming.
  • methods: Two LEO satellite communication scenarios under a jamming attacker are studied: one with a transmitting satellite, a receiving ground station, and a high altitude platform station (HAPS) acting as a jammer, and one with two satellites (a transmitter and a relay), a ground station, and a jamming HAPS, where data reaches the ground station over both a direct and a relayed path.
  • results: Mathematical frameworks are developed to quantify the outage caused by the HAPS jamming signals, showing that satellite cooperation in the second scenario improves system security because the extreme jamming effect occurs only when both links are disturbed simultaneously.
    Abstract Securing satellite communication networks is imperative in the rapidly evolving landscape of advanced telecommunications, particularly in the context of 6G advancements. This paper establishes a secure low earth orbit (LEO) satellite network paradigm to address the challenges of the evolving 6G era, with a focus on enhancing communication integrity between satellites and ground stations. Countering the threat of jamming, which can disrupt vital communication channels, is a key goal of this work. In particular, this paper investigates the performance of two LEO satellite communication scenarios under the presence of jamming attacker. In the first scenario, we consider a system that comprises one transmitting satellite, a receiving ground station, and a high altitude platform station (HAPS) acting as a jammer. The HAPS disrupts communication between the satellite and the ground station, impeding signal transmission. The second scenario involves two satellites, one is the transmitter while the other works as a relay, accompanied by a ground station, and a jamming HAPS. In this scenario, the transmitting satellite sends data to the ground station using two different paths, i.e., direct and indirect transmission paths, with a relay satellite acting as an intermediary in the case of indirect transmission. For both scenarios, we study the system security by developing mathematical frameworks to investigate the outage effect resulting from the jamming signals orchestrated by the HAPS. Our results show that the satellite cooperation in the second scenario improves the system's security since the extreme jamming effect occurs only when both links are simultaneously disturbed.
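For the first scenario above (one satellite, one ground station, one jamming HAPS), an outage event can be approximated as the signal-to-interference-plus-noise ratio falling below the threshold for a target rate. The Monte-Carlo sketch below illustrates that definition under an assumed Rayleigh-fading model; the fading model, power levels, and rate target are illustrative assumptions, not the channel model used in the paper.

```python
# Monte-Carlo outage probability for a satellite downlink with a jamming HAPS.
# Outage: log2(1 + SINR) < target_rate. All parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
trials = 200_000
P_sat, P_jam, noise = 1.0, 0.5, 0.1      # received powers / noise (normalized, assumed)
target_rate = 1.0                        # bits/s/Hz

# Rayleigh fading on both the legitimate and the jamming links (assumption).
g_sat = rng.exponential(scale=1.0, size=trials)   # |h_sat|^2
g_jam = rng.exponential(scale=1.0, size=trials)   # |h_jam|^2

sinr = (P_sat * g_sat) / (P_jam * g_jam + noise)
outage = np.mean(np.log2(1.0 + sinr) < target_rate)
print(f"estimated outage probability: {outage:.4f}")

# In the second (relayed) scenario the link fails only if both the direct and the
# relayed paths are simultaneously in outage, which is why cooperation helps.
```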

Graph-based Simultaneous Localization and Bias Tracking

  • paper_url: http://arxiv.org/abs/2310.02814
  • repo_url: None
  • paper_authors: Alexander Venus, Erik Leitinger, Stefan Tertinek, Florian Meyer, Klaus Witrisal
  • for: The work provides a robust method for localizing and tracking a mobile agent that delivers high-accuracy position estimates in multipath-prone environments.
  • methods: A factor graph formulation with a particle-based sum-product algorithm is used; the proposed sequential algorithm jointly estimates the mobile agent's position and the delay biases of a time-varying number of multipath components (MPCs).
  • results: Using simulated and real measurements, the algorithm provides high-accuracy position estimates even in fully obstructed line-of-sight situations, consistently attains the posterior Cramer-Rao lower bound (P-CRLB), and implicitly identifies unreliable measurements, thereby mitigating lost tracks.
    Abstract We present a factor graph formulation and particle-based sum-product algorithm for robust localization and tracking in multipath-prone environments. The proposed sequential algorithm jointly estimates the mobile agent's position together with a time-varying number of multipath components (MPCs). The MPCs are represented by "delay biases" corresponding to the offset between line-of-sight (LOS) component delay and the respective delays of all detectable MPCs. The delay biases of the MPCs capture the geometric features of the propagation environment with respect to the mobile agent. Therefore, they can provide position-related information contained in the MPCs without explicitly building a map of the environment. We demonstrate that the position-related information enables the algorithm to provide high-accuracy position estimates even in fully obstructed line-of-sight (OLOS) situations. Using simulated and real measurements in different scenarios we demonstrate the proposed algorithm to significantly outperform state-of-the-art multipath-aided tracking algorithms and show that the performance of our algorithm constantly attains the posterior Cramer-Rao lower bound (P-CRLB). Furthermore, we demonstrate the implicit capability of the proposed method to identify unreliable measurements and, thus, to mitigate lost tracks.

Beyond Diagonal Reconfigurable Intelligent Surfaces with Mutual Coupling: Modeling and Optimization

  • paper_url: http://arxiv.org/abs/2310.02708
  • repo_url: None
  • paper_authors: Hongyu Li, Shanpu Shen, Matteo Nerini, Marco Di Renzo, Bruno Clerckx
  • for: The paper studies the modeling and optimization of beyond diagonal reconfigurable intelligent surface (BD-RIS) aided wireless communication systems in the presence of mutual coupling among the RIS elements.
  • methods: The mutual-coupling-aware BD-RIS aided communication model is first derived using scattering and impedance parameter analysis; a general BD-RIS optimization algorithm, applicable to different BD-RIS architectures, is then proposed to maximize the channel gain.
  • results: Numerical results validate the effectiveness of the proposed design and show that the larger the mutual coupling, the larger the gain offered by BD-RIS over conventional diagonal RIS.
    Abstract This work studies the modeling and optimization of beyond diagonal reconfigurable intelligent surface (BD-RIS) aided wireless communication systems in the presence of mutual coupling among the RIS elements. Specifically, we first derive the mutual coupling aware BD-RIS aided communication model using scattering and impedance parameter analysis. Based on the obtained communication model, we propose a general BD-RIS optimization algorithm applicable to different architectures of BD-RIS to maximize the channel gain. Numerical results validate the effectiveness of the proposed design and demonstrate that the larger the mutual coupling the larger the gain offered by BD-RIS over conventional diagonal RIS.

Spectral vs Energy Efficiency in 6G: Impact of the Receiver Front-End

  • paper_url: http://arxiv.org/abs/2310.02622
  • repo_url: None
  • paper_authors: Angel Lozano, Sundeep Rangan
  • for: The article examines the role of the receiver front-end (RFE) in wireless devices and argues that information theory should account for its power consumption.
  • methods: New models and abstractions are put together to describe the behavior of the RFE as operating frequencies, bandwidths, and antenna counts increase.
  • results: Containing the RFE's surging power expenditure exposes a harsher front-end behavior (more noise, nonlinearities, and coarse quantization), creating a tradeoff between spectral efficiency under such nonidealities and the energy efficiency of the RFE; this tradeoff needs to be analyzed at a fundamental level to meet the power-consumption targets of 6G.
    Abstract This article puts the spotlight on the receiver front-end (RFE), an integral part of any wireless device that information theory typically idealizes into a mere addition of noise. While this idealization was sound in the past, as operating frequencies, bandwidths, and antenna counts rise, a soaring amount of power is required for the RFE to behave accordingly. Containing this surge in power expenditure exposes a harsher behavior on the part of the RFE (more noise, nonlinearities, and coarse quantization), setting up a tradeoff between the spectral efficiency under such nonidealities and the efficiency in the use of energy by the RFE. With the urge for radically better power consumptions and energy efficiencies in 6G, this emerges as an issue on which information theory can cast light at a fundamental level. More broadly, this article advocates the interest of having information theory embrace the device power consumption in its analyses. In turn, this calls for new models and abstractions such as the ones herein put together for the RFE, and for a more holistic perspective.

Performance Analysis and Optimization of Reconfigurable Multi-Functional Surface Assisted Wireless Communications

  • paper_url: http://arxiv.org/abs/2310.02564
  • repo_url: None
  • paper_authors: Wen Wang, Wanli Ni, Hui Tian, Naofal Al-Dhahir
  • for: To improve wireless network performance while addressing the double-fading attenuation and grid/battery dependence of existing passive RISs.
  • methods: A new multi-functional RIS (MF-RIS) architecture is proposed that supports multiple functions with one surface, including signal reflection, amplification, and energy harvesting.
  • results: Theoretical analysis yields the achievable capacity of an MF-RIS aided communication network and the number of reflective elements required for the MF-RIS to outperform a self-sustainable RIS. In multi-user wireless networks, the sum-rate is improved by jointly optimizing the transmit beamforming and the MF-RIS coefficients, and a robust beamforming scheme is proposed to handle imperfect channel state information. Numerical results show that: 1) the MF-RIS strikes a better balance between energy self-sustainability and throughput improvement than a self-sustainable RIS; and 2) unlike a reflecting-only RIS, which can be deployed near either the transmitter or the receiver, the MF-RIS should be deployed closer to the transmitter for higher spectrum efficiency.
    Abstract Although reconfigurable intelligent surfaces (RISs) can improve the performance of wireless networks by smartly reconfiguring the radio environment, existing passive RISs face two key challenges, i.e., double-fading attenuation and dependence on grid/battery. To address these challenges, this paper proposes a new RIS architecture, called multi-functional RIS (MF-RIS). Different from conventional reflecting-only RIS, the proposed MF-RIS is capable of supporting multiple functions with one surface, including signal reflection, amplification, and energy harvesting. As such, our MF-RIS is able to overcome the double-fading attenuation by harvesting energy from incident signals. Through theoretical analysis, we derive the achievable capacity of an MF-RIS-aided communication network. Compared to the capacity achieved by the existing self-sustainable RIS, we derive the number of reflective elements required for MF-RIS to outperform self-sustainable RIS. To realize a self-sustainable communication system, we investigate the use of MF-RIS in improving the sum-rate of multi-user wireless networks. Specifically, we solve a non-convex optimization problem by jointly designing the transmit beamforming and MF-RIS coefficients. As an extension, we investigate a resource allocation problem in a practical scenario with imperfect channel state information. By approximating the semi-infinite constraints with the S-procedure and the general sign-definiteness, we propose a robust beamforming scheme to combat the inevitable channel estimation errors. Finally, numerical results show that: 1) compared to the self-sustainable RIS, MF-RIS can strike a better balance between energy self-sustainability and throughput improvement; and 2) unlike reflecting-only RIS which can be deployed near the transmitter or receiver, MF-RIS should be deployed closer to the transmitter for higher spectrum efficiency.

Multi-Functional Reconfigurable Intelligent Surface: System Modeling and Performance Optimization

  • paper_url: http://arxiv.org/abs/2310.02562
  • repo_url: None
  • paper_authors: Wen Wang, Wanli Ni, Hui Tian, Yonina C. Eldar, Rui Zhang
  • for: The paper presents a multi-functional reconfigurable intelligent surface (MF-RIS) architecture that, unlike a conventional single-functional RIS (SF-RIS), simultaneously supports reflection, refraction, amplification, and energy harvesting of wireless signals, thereby enhancing RIS signal coverage by amplifying the reflected/refracted signals with the harvested energy.
  • methods: The signal model of the MF-RIS is presented, and the sum-rate of multiple users in an MF-RIS-aided non-orthogonal multiple access network is maximized by jointly optimizing the transmit beamforming, the power allocations, and the operating modes, parameters, and deployment location of the MF-RIS elements via an efficient iterative algorithm.
  • results: Simulations show significant performance gains of the MF-RIS over SF-RISs, and demonstrate that the MF-RIS should be deployed closer to the transmitter to maximize communication throughput while harvesting more energy.
    Abstract In this paper, we propose and study a multi-functional reconfigurable intelligent surface (MF-RIS) architecture. In contrast to conventional single-functional RIS (SF-RIS) that only reflects signals, the proposed MF-RIS simultaneously supports multiple functions with one surface, including reflection, refraction, amplification, and energy harvesting of wireless signals. As such, the proposed MF-RIS is capable of significantly enhancing RIS signal coverage by amplifying the signal reflected/refracted by the RIS with the energy harvested. We present the signal model of the proposed MF-RIS, and formulate an optimization problem to maximize the sum-rate of multiple users in an MF-RIS-aided non-orthogonal multiple access network. We jointly optimize the transmit beamforming, power allocations as well as the operating modes and parameters for different elements of the MF-RIS and its deployment location, via an efficient iterative algorithm. Simulation results are provided which show significant performance gains of the MF-RIS over SF-RISs with only some of its functions available. Moreover, we demonstrate that there exists a fundamental trade-off between sum-rate maximization and harvested energy maximization. In contrast to SF-RISs which can be deployed near either the transmitter or receiver, the proposed MF-RIS should be deployed closer to the transmitter for maximizing its communication throughput with more energy harvested.

Integrated Sensing and Communications towards Proactive Beamforming in mmWave V2I via Multi-Modal Feature Fusion (MMFF)

  • paper_url: http://arxiv.org/abs/2310.02561
  • repo_url: None
  • paper_authors: Haotian Zhang, Shijian Gao, Xiang Cheng, Liuqing Yang
  • for: To improve the reliability and data rates of vehicular communication networks by solving the narrow-beam alignment problem without excessive signaling overhead.
  • methods: A novel proactive beamforming scheme is proposed that integrates multi-modal sensing and communications via a Multi-Modal Feature Fusion Network (MMFF-Net), composed of multiple neural network components with distinct functions, to improve beam alignment accuracy.
  • results: Validation on the ViWi dataset, enriched with realistic vehicle drifting behavior, shows more accurate and stable angle prediction, which increases the achievable rates and reduces the communication system outage probability, with robust predictions guaranteed even in complex dynamic scenarios.
    Abstract The future of vehicular communication networks relies on mmWave massive multi-input-multi-output antenna arrays for intensive data transfer and massive vehicle access. However, reliable vehicle-to-infrastructure links require narrow beam alignment, which traditionally involves excessive signaling overhead. To address this issue, we propose a novel proactive beamforming scheme that integrates multi-modal sensing and communications via Multi-Modal Feature Fusion Network (MMFF-Net), which is composed of multiple neural network components with distinct functions. Unlike existing methods that rely solely on communication processing, our approach obtains comprehensive environmental features to improve beam alignment accuracy. We verify our scheme on the ViWi dataset, which we enriched with realistic vehicle drifting behavior. Our proposed MMFF-Net achieves more accurate and stable angle prediction, which in turn increases the achievable rates and reduces the communication system outage probability. Even in complex dynamic scenarios, robust prediction results can be guaranteed, demonstrating the feasibility and practicality of the proposed proactive beamforming approach.

ISAC Signal Processing Over Unlicensed Spectrum Bands

  • paper_url: http://arxiv.org/abs/2310.02555
  • repo_url: None
  • paper_authors: Haotian Liu, Zhiqing Wei, Fengyun Li, Yuewei Lin, Hanyang Qu, Huici Wu, Zhiyong Feng
  • for: The work proposes a high-precision integrated sensing and communication (ISAC) signal processing algorithm based on compressed sensing (CS) to address sensing over the non-continuous spectrum bands in which 5G NR systems operate.
  • methods: Resource block group (RBG) configuration information from 5G NR and channel information matrices are combined to dynamically and accurately obtain power estimation spectra; the fast iterative shrinkage-thresholding algorithm (FISTA) solves the reconstruction problem, and K-fold cross validation (KCV) is used to obtain optimal parameters.
  • results: Simulation results show that the proposed algorithm has lower sidelobes (or even zero sidelobes) and better anti-noise performance than conventional sensing algorithms over non-continuous spectrum bands.
    Abstract As a promising key technology of 6th-Generation (6G) mobile communication systems, integrated sensing and communication (ISAC) technology aims to make full use of spectrum resources to enable the functional integration of communication and sensing. The ISAC-enabled mobile communication systems regularly operate in non-continuous spectrum bands due to crowded licensed frequency bands. However, the conventional sensing algorithms over non-continuous spectrum bands have disadvantages such as reduced peak-to-sidelobe ratio (PSR) and degraded anti-noise performance. Facing this challenge, we propose a high-precision ISAC signal processing algorithm based on compressed sensing (CS) in this paper. By integrating the resource block group (RBG) configuration information in 5th-Generation new radio (5G NR) and channel information matrices, we can dynamically and accurately obtain power estimation spectra. Moreover, we employ the fast iterative shrinkage-thresholding algorithm (FISTA) to address the reconstruction problem and utilize K-fold cross validation (KCV) to obtain optimal parameters. Simulation results show that the proposed algorithm has lower sidelobes or even zero sidelobes and high anti-noise performance compared with conventional sensing algorithms.
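The reconstruction step mentioned above is a sparse recovery problem that FISTA solves by alternating a gradient step on the data-fit term with soft-thresholding on the sparsity term, plus Nesterov-style momentum. The sketch below shows FISTA for the generic l1-regularized least-squares problem min_x 0.5·||Ax − b||² + λ·||x||₁; the measurement matrix and sparsity level are synthetic and not tied to the 5G NR resource-block structure described in the paper.

```python
# FISTA for min_x 0.5*||A x - b||^2 + lam*||x||_1  (illustrative sparse recovery).
import numpy as np

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def fista(A, b, lam, n_iter=300):
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    y, t = x.copy(), 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b)
        x_new = soft_threshold(y - grad / L, lam / L)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x

# Synthetic test: recover a sparse spectrum from compressed measurements.
rng = np.random.default_rng(0)
n, m, k = 256, 96, 8                       # ambient dim, measurements, nonzeros
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.normal(size=k)
A = rng.normal(size=(m, n)) / np.sqrt(m)
b = A @ x_true + 0.01 * rng.normal(size=m)

x_hat = fista(A, b, lam=0.02)
print(f"relative error: {np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true):.3f}")
```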

Convergence Analysis and Latency Minimization for Semi-Federated Learning in Massive IoT Networks

  • paper_url: http://arxiv.org/abs/2310.02550
  • repo_url: None
  • paper_authors: Jianyang Ren, Wanli Ni, Hui Tian, Gaofeng Nie
  • for: This paper addresses the latency of federated learning (FL) in Internet of Things (IoT) networks with massive numbers of devices.
  • methods: A semi-federated learning (SemiFL) paradigm is proposed that combines network pruning and over-the-air computation to reduce FL latency.
  • results: An upper bound on the convergence performance of the proposed SemiFL is derived and a convergence-constrained SemiFL latency minimization problem is formulated. Iterative algorithms are designed to solve the problem efficiently, and numerical simulations verify the effectiveness of the proposed scheme in reducing latency while maintaining identification accuracy.
    Abstract As the number of sensors becomes massive in Internet of Things (IoT) networks, the amount of data is humongous. To process data in real-time while protecting user privacy, federated learning (FL) has been regarded as an enabling technique to push edge intelligence into IoT networks with massive devices. However, FL latency increases dramatically due to the increase of the number of parameters in deep neural network and the limited computation and communication capabilities of IoT devices. To address this issue, we propose a semi-federated learning (SemiFL) paradigm in which network pruning and over-the-air computation are efficiently applied. To be specific, each small base station collects the raw data from its served sensors and trains its local pruned model. After that, the global aggregation of local gradients is achieved through over-the-air computation. We first analyze the performance of the proposed SemiFL by deriving its convergence upper bound. To reduce latency, a convergence-constrained SemiFL latency minimization problem is formulated. By decoupling the original problem into several sub-problems, iterative algorithms are designed to solve them efficiently. Finally, numerical simulations are conducted to verify the effectiveness of our proposed scheme in reducing latency and guaranteeing the identification accuracy.
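Over-the-air computation, used above for the global aggregation step, exploits the superposition property of the wireless channel: when clients transmit their pre-scaled local gradients simultaneously, the base station receives their sum plus noise in a single channel use instead of decoding each upload separately. A simplified numpy sketch follows; perfect channel inversion at the transmitters and a single additive-Gaussian-noise channel are assumptions made for illustration.

```python
# Over-the-air gradient aggregation: simultaneous analog transmissions add up on the
# channel, so the server obtains a noisy sum (then average) of the local gradients.
import numpy as np

rng = np.random.default_rng(0)
n_clients, dim = 10, 1_000
noise_std = 0.05                                  # receiver noise (assumed)

local_grads = rng.normal(size=(n_clients, dim))   # stand-ins for pruned-model gradients

# Each client pre-equalizes its channel (perfect CSI assumed), so transmitted signals
# superimpose coherently; the server sees the sum plus additive noise.
received = local_grads.sum(axis=0) + noise_std * rng.normal(size=dim)
ota_average = received / n_clients

ideal_average = local_grads.mean(axis=0)
err = np.linalg.norm(ota_average - ideal_average) / np.linalg.norm(ideal_average)
print(f"relative aggregation error from channel noise: {err:.4f}")
```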

Enabling Energy-Efficiency in Massive-MIMO: A Scalable Low-Complexity Decoder for Generalized Quadrature Spatial Modulation

  • paper_url: http://arxiv.org/abs/2310.02545
  • repo_url: None
  • paper_authors: Hyeon Seok Rou, Giuseppe Thadeu Freitas de Abreu, David González G., Osvaldo Gonsa
  • for: To enable high energy and spectral efficiency in wireless systems using generalized quadrature spatial modulation (GQSM) in massive MIMO settings.
  • methods: A novel decoding algorithm is proposed whose complexity is independent of the combinatorial factor, based on a vectorized Gaussian belief propagation (GaBP) algorithm whose message passing rules leverage pilot symbols and the unit vector decomposition (UVD) of the GQSM signal structure.
  • results: Computer simulations, including systems with up to 32 transmit antennas, demonstrate the effectiveness of the proposed UVD-GaBP method and its potential for high energy and spectral efficiency in a truly massive MIMO setting.
    Abstract Generalized quadrature spatial modulation (GQSM) schemes are known to achieve high energy- and spectral- efficiencies by modulating information both in transmitted symbols and in coded combinatorial activations of subsets of multiple transmit antennas. A challenge of the approach is, however, the decoding complexity which scales with the efficiency of the scheme. In order to circumvent this bottleneck and enable high-performance and feasible GQSM in massive multiple-input multiple-output (mMIMO) scenarios, we propose a novel decoding algorithm which enjoys a complexity order that is independent of the combinatorial factor. This remarkable feature of the proposed decoder is a consequence of a novel vectorized Gaussian belief propagation (GaBP) algorithm, here contributed, whose message passing (MP) rules leverage both pilot symbols and the unit vector decomposition (UVD) of the GQSM signal structure. The effectiveness of the proposed UVD-GaBP method is illustrated via computer simulations including numerical results for systems of a size never before reported in related literature (up to 32 transmit antennas), which demonstrates the potential of the approach in paving the way towards high energy and spectral efficiency for wireless systems in a truly mMIMO setting.

cs.SD - 2023-10-03

Audio-visual child-adult speaker classification in dyadic interactions

  • paper_url: http://arxiv.org/abs/2310.01867
  • repo_url: None
  • paper_authors: Anfeng Xu, Kevin Huang, Tiantian Feng, Helen Tager-Flusberg, Shrikanth Narayanan
  • for: The goal is to improve automated analysis of child-adult interactions by accurately identifying which speech segments belong to the child across diverse conditions.
  • methods: Audio and video modalities are combined through active speaker detection and visual processing models, with video pre-processing, utterance-level child-adult speaker detection, and late fusion of modality-specific predictions.
  • results: Extensive experiments show that the visually aided pipeline improves the accuracy and robustness of classification, with relative improvements of 2.38% and 3.97% in F1 macro score when one face and two faces are visible, respectively, compared with an audio-only model.
    Abstract Interactions involving children span a wide range of important domains from learning to clinical diagnostic and therapeutic contexts. Automated analyses of such interactions are motivated by the need to seek accurate insights and offer scale and robustness across diverse and wide-ranging conditions. Identifying the speech segments belonging to the child is a critical step in such modeling. Conventional child-adult speaker classification typically relies on audio modeling approaches, overlooking visual signals that convey speech articulation information, such as lip motion. Building on the foundation of an audio-only child-adult speaker classification pipeline, we propose incorporating visual cues through active speaker detection and visual processing models. Our framework involves video pre-processing, utterance-level child-adult speaker detection, and late fusion of modality-specific predictions. We demonstrate from extensive experiments that a visually aided classification pipeline enhances the accuracy and robustness of the classification. We show relative improvements of 2.38% and 3.97% in F1 macro score when one face and two faces are visible, respectively.
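The late-fusion step described above combines utterance-level child/adult probabilities from the audio model and the visual (active-speaker) model. A minimal weighted-average fusion is sketched below; the fusion weight, the fallback rule when no face is visible, and the placeholder scores are illustrative, not the paper's learned configuration.

```python
# Late fusion of modality-specific child/adult predictions at the utterance level.
import numpy as np

def late_fusion(p_audio, p_video, w_audio=0.6):
    """Weighted average of per-utterance class probabilities [P(child), P(adult)].
    If no face is visible for an utterance, fall back to the audio prediction."""
    fused = []
    for pa, pv in zip(p_audio, p_video):
        if pv is None:                      # no usable visual evidence
            fused.append(np.asarray(pa))
        else:
            fused.append(w_audio * np.asarray(pa) + (1 - w_audio) * np.asarray(pv))
    return np.vstack(fused)

# Example: three utterances; the visual model is missing for the last one.
p_audio = [[0.70, 0.30], [0.40, 0.60], [0.55, 0.45]]
p_video = [[0.90, 0.10], [0.20, 0.80], None]

fused = late_fusion(p_audio, p_video)
labels = np.array(["child", "adult"])[fused.argmax(axis=1)]
print(fused, labels)
```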

Mel-Band RoFormer for Music Source Separation

  • paper_url: http://arxiv.org/abs/2310.01809
  • repo_url: None
  • paper_authors: Ju-Chiang Wang, Wei-Tsung Lu, Minz Won
  • for: The paper targets music source separation using a multi-band spectrogram-based approach with a hierarchical Transformer and Rotary Position Embedding (RoPE) for multi-band mask estimation.
  • methods: Whereas the earlier BS-RoFormer model uses a band-split scheme defined empirically and without analytic support from the literature, the proposed Mel-RoFormer adopts a Mel-band scheme that maps frequency bins into overlapping subbands according to the mel scale.
  • results: In experiments on the MUSDB18HQ dataset, Mel-RoFormer outperforms BS-RoFormer on the separation of vocals, drums, and other stems.
    Abstract Recently, multi-band spectrogram-based approaches such as Band-Split RNN (BSRNN) have demonstrated promising results for music source separation. In our recent work, we introduce the BS-RoFormer model which inherits the idea of the band-split scheme in BSRNN at the front-end, and then uses the hierarchical Transformer with Rotary Position Embedding (RoPE) to model the inner-band and inter-band sequences for multi-band mask estimation. This model has achieved state-of-the-art performance, but the band-split scheme is defined empirically, without analytic support from the literature. In this paper, we propose Mel-RoFormer, which adopts the Mel-band scheme that maps the frequency bins into overlapping subbands according to the mel scale. In contrast, the band-split mapping in BSRNN and BS-RoFormer is non-overlapping and designed based on heuristics. Using the MUSDB18HQ dataset for experiments, we demonstrate that Mel-RoFormer outperforms BS-RoFormer in the separation tasks of vocals, drums, and other stems.
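The Mel-band scheme replaces hand-chosen band edges with subbands spaced uniformly on the mel scale, so bins are grouped the way human pitch perception compresses frequency, and neighbouring subbands overlap. A short sketch of how such overlapping bin groups can be derived is given below; the FFT size, sample rate, band count, and overlap rule are assumptions for illustration, not the paper's exact configuration.

```python
# Map STFT frequency bins to overlapping subbands spaced uniformly on the mel scale.
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_band_bins(sr=44100, n_fft=2048, n_bands=16):
    """Return a list of bin-index arrays, one per (overlapping) mel-spaced subband."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    edges_mel = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_bands + 2)
    edges_hz = mel_to_hz(edges_mel)
    bands = []
    for i in range(n_bands):
        lo, hi = edges_hz[i], edges_hz[i + 2]        # span two edge intervals -> overlap
        bands.append(np.where((freqs >= lo) & (freqs < hi))[0])
    return bands

bands = mel_band_bins()
print([len(b) for b in bands])    # low bands hold few bins, high bands many
```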

cs.CV - 2023-10-03

Harvard Eye Fairness: A Large-Scale 3D Imaging Dataset for Equitable Eye Diseases Screening and Fair Identity Scaling

  • paper_url: http://arxiv.org/abs/2310.02492
  • repo_url: None
  • paper_authors: Yan Luo, Yu Tian, Min Shi, Tobias Elze, Mengyu Wang
  • for: To provide a large-scale public medical imaging dataset for fairness learning in medicine, targeting equitable eye disease screening.
  • methods: A fair identity scaling (FIS) approach that combines group and individual scaling is proposed and compared with current state-of-the-art fairness learning methods.
  • results: The paper introduces the Harvard-EF dataset of 30,000 subjects, which includes 3D optical coherence tomography scans and 2D fundus photos annotated with six demographic identity attributes.
    Abstract Fairness or equity in machine learning is profoundly important for societal well-being, but limited public datasets hinder its progress, especially in the area of medicine. It is undeniable that fairness in medicine is one of the most important areas for fairness learning's applications. Currently, no large-scale public medical datasets with 3D imaging data for fairness learning are available, while 3D imaging data in modern clinics are standard tests for disease diagnosis. In addition, existing medical fairness datasets are actually repurposed datasets, and therefore they typically have limited demographic identity attributes with at most three identity attributes of age, gender, and race for fairness modeling. To address this gap, we introduce our Eye Fairness dataset with 30,000 subjects (Harvard-EF) covering three major eye diseases including age-related macular degeneration, diabetic retinopathy, and glaucoma affecting 380 million patients globally. Our Harvard-EF dataset includes both 2D fundus photos and 3D optical coherence tomography scans with six demographic identity attributes including age, gender, race, ethnicity, preferred language, and marital status. We also propose a fair identity scaling (FIS) approach combining group and individual scaling together to improve model fairness. Our FIS approach is compared with various state-of-the-art fairness learning methods with superior performance in the racial, gender, and ethnicity fairness tasks with 2D and 3D imaging data, which demonstrate the utilities of our Harvard-EF dataset for fairness learning. To facilitate fairness comparisons between different models, we propose performance-scaled disparity measures, which can be used to compare model fairness accounting for overall performance levels. The dataset and code are publicly accessible via https://ophai.hms.harvard.edu/datasets/harvard-ef30k.

OCU-Net: A Novel U-Net Architecture for Enhanced Oral Cancer Segmentation

  • paper_url: http://arxiv.org/abs/2310.02486
  • repo_url: None
  • paper_authors: Ahmed Albishri, Syed Jawad Hussain Shah, Yugyung Lee, Rong Wang
  • for: OCU-Net, a novel U-Net-based segmentation architecture designed to improve the accuracy of oral cancer image segmentation.
  • methods: OCU-Net incorporates several advanced deep learning modules, including a Channel and Spatial Attention Fusion (CSAF) module, Squeeze-and-Excite (SE) attention, Atrous Spatial Pyramid Pooling (ASPP), residual blocks, and multi-scale fusion.
  • results: OCU-Net and its enhanced version OCU-Netm, built on an ImageNet pre-trained MobileNet-V2 backbone, achieve superior segmentation performance on two H&E-stained image datasets (OCDC and ORCA), outperforming existing segmentation methods.
    Abstract Accurate detection of oral cancer is crucial for improving patient outcomes. However, the field faces two key challenges: the scarcity of deep learning-based image segmentation research specifically targeting oral cancer and the lack of annotated data. Our study proposes OCU-Net, a pioneering U-Net image segmentation architecture exclusively designed to detect oral cancer in hematoxylin and eosin (H&E) stained image datasets. OCU-Net incorporates advanced deep learning modules, such as the Channel and Spatial Attention Fusion (CSAF) module, a novel and innovative feature that emphasizes important channel and spatial areas in H&E images while exploring contextual information. In addition, OCU-Net integrates other innovative components such as Squeeze-and-Excite (SE) attention module, Atrous Spatial Pyramid Pooling (ASPP) module, residual blocks, and multi-scale fusion. The incorporation of these modules showed superior performance for oral cancer segmentation for two datasets used in this research. Furthermore, we effectively utilized the efficient ImageNet pre-trained MobileNet-V2 model as a backbone of our OCU-Net to create OCU-Netm, an enhanced version achieving state-of-the-art results. Comprehensive evaluation demonstrates that OCU-Net and OCU-Netm outperformed existing segmentation methods, highlighting their precision in identifying cancer cells in H&E images from OCDC and ORCA datasets.
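Among the modules listed in the abstract, the Squeeze-and-Excite (SE) attention block is a standard component: it pools each feature map to a single value, passes the channel vector through a small bottleneck MLP, and rescales the channels by the resulting weights. A generic PyTorch sketch is shown below; it illustrates the standard SE design rather than OCU-Net's exact configuration.

```python
# Generic Squeeze-and-Excite (SE) channel attention block.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)             # squeeze: global average pool
        self.fc = nn.Sequential(                        # excitation: bottleneck MLP
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                    # reweight channels

# Example: rescale a batch of 64-channel feature maps.
feats = torch.randn(2, 64, 32, 32)
print(SEBlock(64)(feats).shape)    # torch.Size([2, 64, 32, 32])
```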

EvDNeRF: Reconstructing Event Data with Dynamic Neural Radiance Fields

  • paper_url: http://arxiv.org/abs/2310.02437
  • repo_url: https://github.com/anish-bhattacharya/evdnerf
  • paper_authors: Anish Bhattacharya, Ratnesh Madaan, Fernando Cladera, Sai Vemprala, Rogerio Bonatti, Kostas Daniilidis, Ashish Kapoor, Vijay Kumar, Nikolai Matni, Jayesh K. Gupta
  • for: Faithfully reconstructing event streams of scenes with rigid and non-rigid deformations that may be too fast to capture with a standard camera.
  • methods: Event cameras capture asynchronous per-pixel brightness changes at MHz rates, and a dynamic neural radiance field (NeRF) is trained on event data to provide visual-quality, geometry-based learnable rendering.
  • results: The resulting event-based dynamic NeRF can predict event streams of dynamic scenes from a static or moving viewpoint between any desired timestamps, acting as an event-based simulator for a given scene; training on varied batch sizes of events improves test-time predictions at fine time resolutions, outperforming baselines that pair standard dynamic NeRFs with event simulators.
    Abstract We present EvDNeRF, a pipeline for generating event data and training an event-based dynamic NeRF, for the purpose of faithfully reconstructing eventstreams on scenes with rigid and non-rigid deformations that may be too fast to capture with a standard camera. Event cameras register asynchronous per-pixel brightness changes at MHz rates with high dynamic range, making them ideal for observing fast motion with almost no motion blur. Neural radiance fields (NeRFs) offer visual-quality geometric-based learnable rendering, but prior work with events has only considered reconstruction of static scenes. Our EvDNeRF can predict eventstreams of dynamic scenes from a static or moving viewpoint between any desired timestamps, thereby allowing it to be used as an event-based simulator for a given scene. We show that by training on varied batch sizes of events, we can improve test-time predictions of events at fine time resolutions, outperforming baselines that pair standard dynamic NeRFs with event simulators. We release our simulated and real datasets, as well as code for both event-based data generation and the training of event-based dynamic NeRF models (https://github.com/anish-bhattacharya/EvDNeRF).

EditVal: Benchmarking Diffusion Based Text-Guided Image Editing Methods

  • paper_url: http://arxiv.org/abs/2310.02426
  • repo_url: None
  • paper_authors: Samyadeep Basu, Mehrdad Saberi, Shweta Bhardwaj, Atoosa Malemir Chegini, Daniela Massiceti, Maziar Sanjabi, Shell Xu Hu, Soheil Feizi
  • for: EditVal is a standardized benchmark for evaluating text-guided image editing methods, aiming to provide a fair comparison of different methods across different types of fine-grained edits.
  • methods: EditVal uses a curated dataset of images, a set of editable attributes for each image, and pre-trained vision-language models to assess the fidelity of generated images for each edit type.
  • results: Averaged across edit types, Instruct-Pix2Pix, Null-Text, and SINE are the top-performing methods, but only Instruct-Pix2Pix and Null-Text preserve original image properties. Most editing methods fail at edits involving spatial operations, and there is no single 'winner' method that ranks best individually across the range of edit types.
    Abstract A plethora of text-guided image editing methods have recently been developed by leveraging the impressive capabilities of large-scale diffusion-based generative models such as Imagen and Stable Diffusion. A standardized evaluation protocol, however, does not exist to compare methods across different types of fine-grained edits. To address this gap, we introduce EditVal, a standardized benchmark for quantitatively evaluating text-guided image editing methods. EditVal consists of a curated dataset of images, a set of editable attributes for each image drawn from 13 possible edit types, and an automated evaluation pipeline that uses pre-trained vision-language models to assess the fidelity of generated images for each edit type. We use EditVal to benchmark 8 cutting-edge diffusion-based editing methods including SINE, Imagic and Instruct-Pix2Pix. We complement this with a large-scale human study where we show that EditVal's automated evaluation pipeline is strongly correlated with human preferences for the edit types we considered. From both the human study and automated evaluation, we find that: (i) Instruct-Pix2Pix, Null-Text and SINE are the top-performing methods averaged across different edit types, however only Instruct-Pix2Pix and Null-Text are able to preserve original image properties; (ii) Most of the editing methods fail at edits involving spatial operations (e.g., changing the position of an object). (iii) There is no 'winner' method which ranks the best individually across a range of different edit types. We hope that our benchmark can pave the way to developing more reliable text-guided image editing tools in the future. We will publicly release EditVal, and all associated code and human-study templates to support these research directions in https://deep-ml-research.github.io/editval/.

FedL2P: Federated Learning to Personalize

  • paper_url: http://arxiv.org/abs/2310.02420
  • repo_url: https://github.com/royson/fedl2p
  • paper_authors: Royson Lee, Minyoung Kim, Da Li, Xinchi Qiu, Timothy Hospedales, Ferenc Huszár, Nicholas D. Lane
  • for: The federated meta-learning problem of learning personalization strategies for federated learning clients.
  • methods: Federated learning is used to train meta-nets that induce the batch-norm and learning-rate parameters for each client given its local data statistics, allowing the whole FL network to collaborate in learning a customized personalization strategy per client.
  • results: The framework improves on a range of standard hand-crafted personalization baselines in both label-shift and feature-shift situations.
    Abstract Federated learning (FL) research has made progress in developing algorithms for distributed learning of global models, as well as algorithms for local personalization of those common models to the specifics of each client's local data distribution. However, different FL problems may require different personalization strategies, and it may not even be possible to define an effective one-size-fits-all personalization strategy for all clients: depending on how similar each client's optimal predictor is to that of the global model, different personalization strategies may be preferred. In this paper, we consider the federated meta-learning problem of learning personalization strategies. Specifically, we consider meta-nets that induce the batch-norm and learning rate parameters for each client given local data statistics. By learning these meta-nets through FL, we allow the whole FL network to collaborate in learning a customized personalization strategy for each client. Empirical results show that this framework improves on a range of standard hand-crafted personalization baselines in both label and feature shift situations.

Bag of Tricks for Fully Test-Time Adaptation

  • paper_url: http://arxiv.org/abs/2310.02416
  • repo_url: https://github.com/smounsav/tta_bot
  • paper_authors: Saypraseuth Mounsaveng, Florent Chiaroni, Malik Boudiaf, Marco Pedersoli, Ismail Ben Ayed
  • for: The study provides a systematic categorization and analysis of fully test-time adaptation (TTA) techniques to help consolidate the community's knowledge and enable fair comparisons.
  • methods: Several orthogonal TTA techniques are examined, including small-batch normalization, stream rebalancing, reliable sample selection, and network confidence calibration.
  • results: By dissecting each technique across scenarios of interest, the study reveals the trade-offs each induces between accuracy, required computational power, and model complexity, uncovers the synergy that arises when techniques are combined, and establishes new state-of-the-art results.
    Abstract Fully Test-Time Adaptation (TTA), which aims at adapting models to data drifts, has recently attracted wide interest. Numerous tricks and techniques have been proposed to ensure robust learning on arbitrary streams of unlabeled data. However, assessing the true impact of each individual technique and obtaining a fair comparison still constitutes a significant challenge. To help consolidate the community's knowledge, we present a categorization of selected orthogonal TTA techniques, including small batch normalization, stream rebalancing, reliable sample selection, and network confidence calibration. We meticulously dissect the effect of each approach on different scenarios of interest. Through our analysis, we shed light on trade-offs induced by those techniques between accuracy, the computational power required, and model complexity. We also uncover the synergy that arises when combining techniques and are able to establish new state-of-the-art results.
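Two of the listed ingredients, test-batch normalization statistics and confidence-aware updates, can be illustrated with a TENT-style adaptation loop: batch-norm layers re-estimate their statistics on the incoming test batch while only their affine parameters are updated to minimize prediction entropy. The PyTorch sketch below is a generic illustration of those two tricks, not the exact recipe benchmarked in the paper.

```python
# Minimal fully test-time adaptation step: adapt BatchNorm statistics/affine params
# on unlabeled test batches by minimizing the prediction entropy (TENT-style).
import torch
import torch.nn as nn

def configure_for_tta(model: nn.Module):
    """Freeze everything except BatchNorm affine parameters; use test-batch stats."""
    model.train()                              # BN uses current-batch statistics
    for p in model.parameters():
        p.requires_grad_(False)
    params = []
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.track_running_stats = False      # rely on the test batch only
            m.weight.requires_grad_(True)
            m.bias.requires_grad_(True)
            params += [m.weight, m.bias]
    return params

def entropy_adapt_step(model, x, optimizer):
    probs = model(x).softmax(dim=1)
    loss = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example with a tiny stand-in network and a random "test stream" batch.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))
optimizer = torch.optim.SGD(configure_for_tta(model), lr=1e-3)
print(entropy_adapt_step(model, torch.randn(16, 3, 32, 32), optimizer))
```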

FT-Shield: A Watermark Against Unauthorized Fine-tuning in Text-to-Image Diffusion Models

  • paper_url: http://arxiv.org/abs/2310.02401
  • repo_url: None
  • paper_authors: Yingqian Cui, Jie Ren, Yuping Lin, Han Xu, Pengfei He, Yue Xing, Wenqi Fan, Hui Liu, Jiliang Tang
  • for: To detect unauthorized use of data for fine-tuning (personalizing) text-to-image diffusion models, a growing copyright concern.
  • methods: FT-Shield, a watermarking approach designed for the fine-tuning of text-to-image diffusion models, generates a watermark on the protected training images that transfers quickly and accurately to the images generated by models fine-tuned on them, where it can be recognized by a binary watermark detector.
  • results: Comprehensive experiments validate the effectiveness of FT-Shield in detecting such infringement.
    Abstract Text-to-image generative models based on latent diffusion models (LDM) have demonstrated their outstanding ability in generating high-quality and high-resolution images according to language prompt. Based on these powerful latent diffusion models, various fine-tuning methods have been proposed to achieve the personalization of text-to-image diffusion models such as artistic style adaptation and human face transfer. However, the unauthorized usage of data for model personalization has emerged as a prevalent concern in relation to copyright violations. For example, a malicious user may use the fine-tuning technique to generate images which mimic the style of a painter without his/her permission. In light of this concern, we have proposed FT-Shield, a watermarking approach specifically designed for the fine-tuning of text-to-image diffusion models to aid in detecting instances of infringement. We develop a novel algorithm for the generation of the watermark to ensure that the watermark on the training images can be quickly and accurately transferred to the generated images of text-to-image diffusion models. A watermark will be detected on an image by a binary watermark detector if the image is generated by a model that has been fine-tuned using the protected watermarked images. Comprehensive experiments were conducted to validate the effectiveness of FT-Shield.
    摘要 基于潜在扩散模型(LDM)的文本到图像生成模型已展示出依据语言提示生成高质量、高分辨率图像的出色能力。在这些强大的潜在扩散模型之上,人们提出了多种微调方法来实现文本到图像扩散模型的个性化,例如艺术风格适配和人脸迁移。然而,未经授权使用数据进行模型个性化已成为普遍的版权问题:恶意用户可能利用微调技术,在未获得画家许可的情况下生成模仿其风格的图像。针对这一问题,我们提出了 FT-Shield,一种专为文本到图像扩散模型微调场景设计的水印方法,用于辅助侵权检测。我们设计了新的水印生成算法,确保训练图像上的水印能够快速、准确地转移到扩散模型生成的图像上;若一幅图像由使用受保护水印图像微调的模型生成,二元水印检测器即可在其上检测到水印。综合实验验证了 FT-Shield 的有效性。

ScaleNet: An Unsupervised Representation Learning Method for Limited Information

  • paper_url: http://arxiv.org/abs/2310.02386
  • repo_url: None
  • paper_authors: Huili Huang, M. Mahdi Roozbahani
  • for: 增强深度卷积神经网络(ConvNet)在信息有限情况下学习高层语义视觉表示的能力。
  • methods: 提出了一种基于多尺度图像的简单高效无监督表示学习方法 ScaleNet,以提升 ConvNet 的性能。
  • results: 在 CIFAR-10 和 ImageNet 数据集上、使用 AlexNet 和 ResNet50 等不同架构的实验表明,在数据量受限的情况下,ScaleNet 在旋转预测任务上的表现比 RotNet 高约 7%;将有限数据训练得到的 ScaleNet 参数迁移到 ImageNet 分类任务,性能也比 RotNet 模型提高约 6%。
    Abstract Although large-scale labeled data are essential for deep convolutional neural networks (ConvNets) to learn high-level semantic visual representations, it is time-consuming and impractical to collect and annotate large-scale datasets. A simple and efficient unsupervised representation learning method named ScaleNet based on multi-scale images is proposed in this study to enhance the performance of ConvNets when limited information is available. The input images are first resized to a smaller size and fed to the ConvNet to recognize the rotation degree. Next, the ConvNet learns the rotation-prediction task for the original size images based on the parameters transferred from the previous model. The CIFAR-10 and ImageNet datasets are examined on different architectures such as AlexNet and ResNet50 in this study. The current study demonstrates that specific image features, such as Harris corner information, play a critical role in the efficiency of the rotation-prediction task. The ScaleNet supersedes the RotNet by ~7% in the limited CIFAR-10 dataset. The transferred parameters from a ScaleNet model with limited data improve the ImageNet Classification task by about 6% compared to the RotNet model. This study shows the capability of the ScaleNet method to improve other cutting-edge models such as SimCLR by learning effective features for classification tasks.
    摘要 尽管大规模标注数据对深度卷积神经网络(ConvNet)学习高层语义视觉表示至关重要,但收集并标注大规模数据集既耗时又不切实际。本研究提出了一种基于多尺度图像的简单高效的无监督表示学习方法 ScaleNet,用于在信息有限时提升 ConvNet 的性能。输入图像首先被缩小并送入 ConvNet 以识别旋转角度;随后,ConvNet 在前一模型迁移来的参数基础上,学习原始尺寸图像的旋转预测任务。本研究在 CIFAR-10 和 ImageNet 数据集上检验了 AlexNet 和 ResNet50 等不同架构,并表明 Harris 角点等特定图像特征对旋转预测任务的效率起关键作用。在数据受限的 CIFAR-10 上,ScaleNet 比 RotNet 高约 7%;使用有限数据训练的 ScaleNet 迁移参数使 ImageNet 分类任务比 RotNet 模型提升约 6%。该研究还展示了 ScaleNet 通过学习有效的分类特征来改进 SimCLR 等前沿模型的能力。
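
A small sketch of the rotation-prediction pretext task on down-scaled images, the first stage of the two-stage scheme described above. The 0.5 scale factor is an illustrative assumption; ScaleNet's exact scales and schedules are given in the paper.

```python
import torch
import torch.nn.functional as F

def rotation_pretext_batch(images: torch.Tensor, scale: float = 0.5):
    """Build a self-supervised rotation-prediction batch.

    images: (N, C, H, W). Each image is resized to a smaller square, rotated by
    0/90/180/270 degrees, and the rotation index becomes the label.
    """
    side = int(min(images.shape[-2:]) * scale)
    images = F.interpolate(images, size=(side, side), mode="bilinear", align_corners=False)
    rotated, labels = [], []
    for k in range(4):                                   # 0, 90, 180, 270 degrees
        rotated.append(torch.rot90(images, k, dims=(-2, -1)))
        labels.append(torch.full((images.size(0),), k, dtype=torch.long))
    return torch.cat(rotated), torch.cat(labels)

# Stage 1: train a ConvNet on rotation_pretext_batch(x, scale=0.5).
# Stage 2: reload those weights and repeat the pretext task at scale=1.0,
# which corresponds to the parameter-transfer step described in the abstract.
```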

Multi-Prompt Fine-Tuning of Foundation Models for Enhanced Medical Image Segmentation

  • paper_url: http://arxiv.org/abs/2310.02381
  • repo_url: None
  • paper_authors: Xiangru Li, Yifei Zhang, Liang Zhao
  • for: 提高医学图像分割表现
  • methods: 利用 SAM 能够将同一图像的多个提示打包处理的能力,在微调其掩码解码器时,将由器官和病灶两套真值掩码生成的边界框成批作为提示输入
  • results: 在多种分割任务上显著提高表现指标
    Abstract The Segment Anything Model (SAM) is a powerful foundation model that introduced revolutionary advancements in natural image segmentation. However, its performance remains sub-optimal when delineating the intricate structure of biomedical images, where multiple organs and tissues intertwine in a single image. In this study, we introduce a novel fine-tuning framework that leverages SAM's ability to bundle and process multiple prompts per image and seeks to improve SAM's performance in medical images. We first curated a medical image dataset that consists of CT scans of lesions in various organs, each with two annotations for organs and lesions respectively. Then, we fine-tuned SAM's mask decoder within our framework by batching both bounding boxes generated from ground truth masks as reference. The batched prompt strategy we introduced not only addresses the inherent complexity and ambiguity often found in medical images but also substantially enhances performance metrics when applied onto a wide range of segmentation tasks.
    摘要 Segment Anything Model(SAM)是一个强大的基础模型,为自然图像分割带来了革命性进展。然而,在多个器官和组织相互交织的生物医学图像中,其勾画复杂结构的性能仍不理想。本研究提出了一种新的微调框架,利用 SAM 能够对单幅图像打包并处理多个提示的能力来提升其在医学图像上的表现。我们首先构建了一个医学图像数据集,由多种器官病灶的 CT 扫描组成,每幅图像分别带有器官和病灶两套标注;随后在该框架内微调 SAM 的掩码解码器,将由真值掩码生成的两类边界框成批作为提示参考。我们引入的批量提示策略不仅应对了医学图像中常见的复杂性和歧义,还在多种分割任务上显著提升了性能指标。
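
A hedged sketch of the fine-tuning idea: freeze the image encoder, train only the mask decoder, and feed both the organ box and the lesion box as a batch of prompts per image. The `image_encoder`, `prompt_encoder` and `mask_decoder` interfaces below are hypothetical stand-ins, not the official segment-anything API, and the loss is a plain BCE placeholder.

```python
import torch

def finetune_mask_decoder(sam_model, dataloader, epochs: int = 1, lr: float = 1e-4):
    """Illustrative multi-prompt fine-tuning loop (assumed interfaces)."""
    for p in sam_model.image_encoder.parameters():
        p.requires_grad_(False)                           # keep the encoder frozen
    optimizer = torch.optim.AdamW(sam_model.mask_decoder.parameters(), lr=lr)
    bce = torch.nn.BCEWithLogitsLoss()

    for _ in range(epochs):
        for image, organ_box, lesion_box, organ_mask, lesion_mask in dataloader:
            image_embedding = sam_model.image_encoder(image)
            boxes = torch.stack([organ_box, lesion_box], dim=1)          # (B, 2, 4)
            prompt_embedding = sam_model.prompt_encoder(boxes)
            pred = sam_model.mask_decoder(image_embedding, prompt_embedding)  # (B, 2, H, W)
            target = torch.stack([organ_mask, lesion_mask], dim=1).float()
            loss = bce(pred, target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```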

DREAM: Visual Decoding from Reversing Human Visual System

  • paper_url: http://arxiv.org/abs/2310.02265
  • repo_url: None
  • paper_authors: Weihao Xia, Raoul de Charette, Cengiz Öztireli, Jing-Hao Xue
  • for: 本研究开发了一种基于人类视觉系统知识的 fMRI-to-image 方法,可从大脑活动中重建被观看的图像。
  • methods: 该方法借鉴人类视觉系统的层次与并行结构,通过两个专门组件模拟其逆向过程:逆视觉联合皮层(R-VAC)从 fMRI 数据中提取语义,逆并行 PKM(R-PKM)同时从 fMRI 信号预测颜色与深度。
  • results: 实验结果表明,该方法在外观、结构和语义一致性方面均优于当前最先进的模型。
    Abstract In this work we present DREAM, an fMRI-to-image method for reconstructing viewed images from brain activities, grounded on fundamental knowledge of the human visual system. We craft reverse pathways that emulate the hierarchical and parallel nature of how humans perceive the visual world. These tailored pathways are specialized to decipher semantics, color, and depth cues from fMRI data, mirroring the forward pathways from visual stimuli to fMRI recordings. To do so, two components mimic the inverse processes within the human visual system: the Reverse Visual Association Cortex (R-VAC) which reverses pathways of this brain region, extracting semantics from fMRI data; the Reverse Parallel PKM (R-PKM) component simultaneously predicting color and depth from fMRI signals. The experiments indicate that our method outperforms the current state-of-the-art models in terms of the consistency of appearance, structure, and semantics. Code will be made publicly available to facilitate further research in this field.
    摘要 在这项工作中,我们介绍了 DREAM,一种基于人类视觉系统基础知识、从脑活动中重建被观看图像的 fMRI-to-image 方法。我们设计的逆向通路模拟了人类感知视觉世界的层次与并行结构,专门用于从 fMRI 数据中解读语义、颜色和深度线索,与从视觉刺激到 fMRI 记录的前向通路相对应。为此,我们采用两个组件来模拟人类视觉系统内部的逆向过程:逆视觉联合皮层(R-VAC)逆转该脑区的通路,从 fMRI 数据中提取语义;逆并行 PKM(R-PKM)则同时从 fMRI 信号预测颜色与深度。实验表明,我们的方法在外观、结构和语义一致性方面优于当前最先进的模型。代码将公开发布,以推动该领域的进一步研究。

RSRD: A Road Surface Reconstruction Dataset and Benchmark for Safe and Comfortable Autonomous Driving

  • paper_url: http://arxiv.org/abs/2310.02262
  • repo_url: None
  • paper_authors: Tong Zhao, Chenfeng Xu, Mingyu Ding, Masayoshi Tomizuka, Wei Zhan, Yintao Wei
  • for: 本研究旨在满足智能机器人系统中安全性和舒适性的增长需求, 特别是自动驾驶车辆, 其中道路条件对整体驾驶性能产生重要影响。
  • methods: 我们构建了高分辨率、高精度的道路表面重建数据集(RSRD),在多种驾驶条件下使用专用平台采集了约 16,000 对双目图像、原始点云以及真值深度/视差图,并通过精细的后处理流程保证数据质量。
  • results: 基于 RSRD 数据集,我们建立了通过深度估计与立体匹配恢复道路剖面的综合基准。初步评估表明,RSRD 数据集及相关任务具有很大的潜在价值,可用于推动多视图立体等面向安全自动驾驶的技术发展。
    Abstract This paper addresses the growing demands for safety and comfort in intelligent robot systems, particularly autonomous vehicles, where road conditions play a pivotal role in overall driving performance. For example, reconstructing road surfaces helps to enhance the analysis and prediction of vehicle responses for motion planning and control systems. We introduce the Road Surface Reconstruction Dataset (RSRD), a real-world, high-resolution, and high-precision dataset collected with a specialized platform in diverse driving conditions. It covers common road types containing approximately 16,000 pairs of stereo images, original point clouds, and ground-truth depth/disparity maps, with accurate post-processing pipelines to ensure its quality. Based on RSRD, we further build a comprehensive benchmark for recovering road profiles through depth estimation and stereo matching. Preliminary evaluations with various state-of-the-art methods reveal the effectiveness of our dataset and the challenge of the task, underscoring substantial opportunities of RSRD as a valuable resource for advancing techniques, e.g., multi-view stereo towards safe autonomous driving. The dataset and demo videos are available at https://thu-rsxd.com/rsrd/
    摘要 本文面向智能机器人系统(尤其是自动驾驶车辆)对安全性与舒适性日益增长的需求,其中道路状况对整体驾驶性能起着关键作用;例如,重建道路表面有助于增强运动规划与控制系统对车辆响应的分析和预测。我们提出了道路表面重建数据集(RSRD),这是一个在多种驾驶条件下使用专用平台采集的真实、高分辨率、高精度数据集,覆盖常见道路类型,包含约 16,000 对双目图像、原始点云和真值深度/视差图,并配有精确的后处理流程以保证质量。基于 RSRD,我们进一步构建了通过深度估计和立体匹配恢复道路剖面的综合基准。对多种最先进方法的初步评估展示了该数据集的有效性与任务的挑战性,凸显了 RSRD 作为宝贵资源在推进多视图立体等面向安全自动驾驶技术方面的巨大潜力。数据集和演示视频见 https://thu-rsxd.com/rsrd/
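
Since the benchmark recovers road profiles from stereo matching, the standard disparity-to-depth conversion is the step that links a predicted disparity map to a metric road surface. A minimal sketch follows; the focal length and baseline values are placeholders, the calibrated values ship with the dataset.

```python
import numpy as np

def disparity_to_depth(disparity: np.ndarray, focal_px: float, baseline_m: float) -> np.ndarray:
    """Pinhole stereo relation: depth = f * B / d (depth in metres, disparity in pixels)."""
    depth = np.full_like(disparity, np.inf, dtype=np.float64)
    valid = disparity > 0                     # zero disparity means no stereo match
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth

# Example with made-up calibration: 2000 px focal length, 12 cm baseline.
d = np.array([[40.0, 0.0], [80.0, 20.0]])
print(disparity_to_depth(d, focal_px=2000.0, baseline_m=0.12))  # 6 m, inf, 3 m, 12 m
```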

Talk2BEV: Language-enhanced Bird’s-eye View Maps for Autonomous Driving

  • paper_url: http://arxiv.org/abs/2310.02251
  • repo_url: None
  • paper_authors: Vikrant Dewangan, Tushar Choudhary, Shivam Chandhok, Shubham Priyadarshan, Anushka Jain, Arun K. Singh, Siddharth Srivastava, Krishna Murthy Jatavallabhula, K. Madhava Krishna
  • for: 这个论文旨在提供一个大型视觉语言模型(LVLM)接口,用于自动驾驶场景中的鸟瞰(BEV)地图。
  • methods: 这个论文将现代通用语言与视觉模型和 BEV 结构化地图表示相结合,从而消除了对任务特定模型的需求。
  • results: 这个论文通过对大量场景理解任务的广泛评估,证明 Talk2BEV 可以在自动驾驶场景中进行视觉与空间推理、预测交通参与者的意图,并基于视觉线索做出决策。
    Abstract Talk2BEV is a large vision-language model (LVLM) interface for bird's-eye view (BEV) maps in autonomous driving contexts. While existing perception systems for autonomous driving scenarios have largely focused on a pre-defined (closed) set of object categories and driving scenarios, Talk2BEV blends recent advances in general-purpose language and vision models with BEV-structured map representations, eliminating the need for task-specific models. This enables a single system to cater to a variety of autonomous driving tasks encompassing visual and spatial reasoning, predicting the intents of traffic actors, and decision-making based on visual cues. We extensively evaluate Talk2BEV on a large number of scene understanding tasks that rely on both the ability to interpret free-form natural language queries, and in grounding these queries to the visual context embedded into the language-enhanced BEV map. To enable further research in LVLMs for autonomous driving scenarios, we develop and release Talk2BEV-Bench, a benchmark encompassing 1000 human-annotated BEV scenarios, with more than 20,000 questions and ground-truth responses from the NuScenes dataset.
    摘要 Talk2BEV 是一个大型视觉语言模型(LVLM)接口,用于自动驾驶场景中的鸟瞰(BEV)地图。现有的自动驾驶场景感知系统大多局限于预先定义(封闭)的对象类别和驾驶场景,而 Talk2BEV 则将最新的通用语言和视觉模型与 BEV 结构化地图表示相结合,从而消除对任务特定模型的需求。这使得单个系统可以处理多种自动驾驶任务,包括视觉和空间推理、预测交通参与者的意图以及基于视觉线索的决策。我们对 Talk2BEV 进行了广泛的场景理解评估,这些任务既依赖解析自由形式自然语言查询的能力,也依赖将查询落地到语言增强 BEV 地图所嵌入的视觉上下文中。为了推动 LVLM 在自动驾驶场景中的进一步研究,我们开发并发布了 Talk2BEV-Bench,一个包含 1000 个人工标注 BEV 场景的基准,包括来自 NuScenes 数据集的超过 20,000 个问题和真值答案。

Hierarchical Generation of Human-Object Interactions with Diffusion Probabilistic Models

  • paper_url: http://arxiv.org/abs/2310.02242
  • repo_url: https://github.com/zju3dv/hghoi
  • paper_authors: Huaijin Pi, Sida Peng, Minghui Yang, Xiaowei Zhou, Hujun Bao
  • for: 本研究旨在解决现有自回归模型和基于路径规划的方法无法满足的长距离、多样化动作生成挑战。
  • methods: 我们提出了一种层次生成框架:先生成一系列里程碑,再沿这些里程碑合成动作。这样,长距离动作生成就可以归约为在里程碑引导下合成若干短动作序列。
  • results: 在 NSM、COUCH 和 SAMP 数据集上的实验表明,我们的方法在质量和多样性两方面都大幅优于以往方法。
    Abstract This paper presents a novel approach to generating the 3D motion of a human interacting with a target object, with a focus on solving the challenge of synthesizing long-range and diverse motions, which could not be fulfilled by existing auto-regressive models or path planning-based methods. We propose a hierarchical generation framework to solve this challenge. Specifically, our framework first generates a set of milestones and then synthesizes the motion along them. Therefore, the long-range motion generation could be reduced to synthesizing several short motion sequences guided by milestones. The experiments on the NSM, COUCH, and SAMP datasets show that our approach outperforms previous methods by a large margin in both quality and diversity. The source code is available on our project page https://zju3dv.github.io/hghoi.
    摘要 这篇论文提出了一种生成人与目标物体交互 3D 运动的新方法,重点解决现有自回归模型或基于路径规划的方法无法实现的长距离、多样化运动合成难题。我们提出了一种层次生成框架:先生成一组里程碑,再沿这些里程碑合成运动,从而将长距离运动生成归约为在里程碑引导下合成若干短运动序列。在 NSM、COUCH 和 SAMP 数据集上的实验结果表明,我们的方法在质量和多样性方面均大幅优于以往方法。代码可在项目页面(https://zju3dv.github.io/hghoi)获取。

Learnable Data Augmentation for One-Shot Unsupervised Domain Adaptation

  • paper_url: http://arxiv.org/abs/2310.02201
  • repo_url: https://github.com/iit-pavis/learnaug-uda
  • paper_authors: Julio Ivan Davila Carrazco, Pietro Morerio, Alessio Del Bue, Vittorio Murino
  • for: 解决领域适应中最具挑战性的设定,即单样本无监督领域适应(One-Shot Unsupervised Domain Adaptation,OS-UDA):仅有一个无标注目标样本可用于模型适应。
  • methods: 提出了一种学习数据增强框架,通过将源数据变换为类似于目标数据的形式,使得基于这种增强数据的分类器在目标领域中具有良好的泛化能力。
  • results: 在 DomainNet 和 VisDA 两个领域适应基准上达到了当前最佳(state-of-the-art)性能。
    Abstract This paper presents a classification framework based on learnable data augmentation to tackle the One-Shot Unsupervised Domain Adaptation (OS-UDA) problem. OS-UDA is the most challenging setting in Domain Adaptation, as only one single unlabeled target sample is assumed to be available for model adaptation. Driven by such single sample, our method LearnAug-UDA learns how to augment source data, making it perceptually similar to the target. As a result, a classifier trained on such augmented data will generalize well for the target domain. To achieve this, we designed an encoder-decoder architecture that exploits a perceptual loss and style transfer strategies to augment the source data. Our method achieves state-of-the-art performance on two well-known Domain Adaptation benchmarks, DomainNet and VisDA. The project code is available at https://github.com/IIT-PAVIS/LearnAug-UDA
    摘要 这篇论文提出了一种基于可学习数据增强的分类框架,用于解决单样本无监督领域适应(OS-UDA)问题。OS-UDA 是领域适应中最具挑战性的设定:仅假设有一个无标注目标样本可用于模型适应。我们的方法 LearnAug-UDA 以这个单一样本为驱动,学习如何增强源数据,使其在感知上与目标数据相似,从而让在增强数据上训练的分类器能够很好地泛化到目标域。为此,我们设计了一个利用感知损失和风格迁移策略来增强源数据的编码器-解码器架构。该方法在 DomainNet 和 VisDA 两个知名领域适应基准上取得了最先进(state-of-the-art)的表现。代码见 https://github.com/IIT-PAVIS/LearnAug-UDA 。
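
A minimal sketch of the perceptual-loss ingredient mentioned above: compare frozen VGG-16 features of the augmented source image and the single target sample. The choice of layer cut-off and the MSE criterion are illustrative assumptions, not the configuration used in LearnAug-UDA.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class PerceptualLoss(nn.Module):
    """Feature-space distance computed with a frozen ImageNet VGG-16 backbone."""
    def __init__(self, layer_index: int = 16):
        super().__init__()
        extractor = vgg16(weights="IMAGENET1K_V1").features[:layer_index].eval()
        for p in extractor.parameters():
            p.requires_grad_(False)
        self.extractor = extractor
        self.criterion = nn.MSELoss()

    def forward(self, augmented: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        return self.criterion(self.extractor(augmented), self.extractor(target))

# Usage idea: loss = PerceptualLoss()(decoder(encoder(source_batch)),
#                                     one_shot_target.expand(source_batch.size(0), -1, -1, -1))
```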

PAD-Phys: Exploiting Physiology for Presentation Attack Detection in Face Biometrics

  • paper_url: http://arxiv.org/abs/2310.02140
  • repo_url: None
  • paper_authors: Luis F. Gomez, Julian Fierrez, Aythami Morales, Mahdi Ghafourian, Ruben Tolosana, Imanol Solano, Alejandro Garcia, Francisco Zamora-Martinez
  • for: 防止人脸识别系统中的个人信息泄露或身份冒用(spoofing)
  • methods: 基于远程光电容积描记(remote photoplethysmography, rPPG)的脉搏检测
  • results: 比较生理域、Deepfakes 域和新的呈现攻击域三个域的结果表明,基于 rPPG 的模型在呈现攻击检测中效果良好:与前两个域相比,呈现攻击域的平均分类错误率(ACER)下降了 21.70%(从 41.03% 降至 19.32%)。
    Abstract Presentation Attack Detection (PAD) is a crucial stage in facial recognition systems to avoid leakage of personal information or spoofing of identity to entities. Recently, pulse detection based on remote photoplethysmography (rPPG) has been shown to be effective in face presentation attack detection. This work presents three different approaches to the presentation attack detection based on rPPG: (i) The physiological domain, a domain using rPPG-based models, (ii) the Deepfakes domain, a domain where models were retrained from the physiological domain to specific Deepfakes detection tasks; and (iii) a new Presentation Attack domain was trained by applying transfer learning from the two previous domains to improve the capability to differentiate between bona-fides and attacks. The results show the efficiency of the rPPG-based models for presentation attack detection, evidencing a 21.70% decrease in average classification error rate (ACER) (from 41.03% to 19.32%) when the presentation attack domain is compared to the physiological and Deepfakes domains. Our experiments highlight the efficiency of transfer learning in rPPG-based models and perform well in presentation attack detection in instruments that do not allow copying of this physiological feature.
    摘要 呈现攻击检测(PAD)是人脸识别系统中的关键环节,用于防止个人信息泄露或身份被冒用。最近的研究表明,基于远程光电容积描记(rPPG)的脉搏检测在人脸呈现攻击检测中十分有效。本工作提出了三种基于 rPPG 的呈现攻击检测途径:(1)生理域,使用基于 rPPG 的模型;(2)Deepfakes 域,将生理域的模型重新训练用于特定的 Deepfakes 检测任务;(3)新的呈现攻击域,通过从前两个域迁移学习来提升区分真实样本与攻击样本的能力。结果显示了基于 rPPG 的模型在呈现攻击检测中的有效性:与生理域和 Deepfakes 域相比,呈现攻击域的平均分类错误率(ACER)下降了 21.70%(从 41.03% 降至 19.32%)。我们的实验凸显了迁移学习在基于 rPPG 的模型中的效率,并表明该方法在无法复制这一生理特征的攻击手段上表现良好。
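
To make the physiological cue concrete, here is a textbook rPPG baseline (green-channel averaging plus a band-pass around plausible heart rates). It illustrates the signal PAD-Phys exploits but is not one of the trained models from the paper; the 0.7-4 Hz band and filter order are common illustrative choices.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def rppg_from_face_frames(frames: np.ndarray, fps: float) -> np.ndarray:
    """Extract a crude pulse trace from face crops.

    frames: (T, H, W, 3) RGB face crops. Returns a band-passed 1D trace of length T.
    """
    green_trace = frames[..., 1].mean(axis=(1, 2))   # spatial average of the green channel
    green_trace = green_trace - green_trace.mean()   # remove the DC component
    nyquist = fps / 2.0
    b, a = butter(3, [0.7 / nyquist, 4.0 / nyquist], btype="band")
    return filtfilt(b, a, green_trace)

# A bona-fide face shows a periodic pulse in this trace; a printed photo or a
# replayed screen generally does not, which is the cue used for presentation
# attack detection.
```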

SIEVE: Multimodal Dataset Pruning Using Image Captioning Models

  • paper_url: http://arxiv.org/abs/2310.02110
  • repo_url: None
  • paper_authors: Anas Mahmoud, Mostafa Elhoushi, Amro Abbas, Yu Yang, Newsha Ardalani, Hugh Leather, Ari Morcos
  • for: 本文旨在提出一种新的数据修剪方法,以提高视觉语言模型(VLM)的性能。
  • methods: 该方法使用图像描述模型生成的合成文本与语言模型来评估带噪图文对的对齐程度,并使用多模态数据筛选基准 DataComp 来评估其性能。
  • results: 该方法在大规模数据池上取得了最先进(state-of-the-art)的性能,在中等规模数据池上也获得了有竞争力的结果,在 38 个下游任务上平均比基于 CLIPScore 的筛选分别高出 1.7% 和 2.6%。
    Abstract Vision-Language Models (VLMs) are pretrained on large, diverse, and noisy web-crawled datasets. This underscores the critical need for dataset pruning, as the quality of these datasets is strongly correlated with the performance of VLMs on downstream tasks. Using CLIPScore from a pretrained model to only train models using highly-aligned samples is one of the most successful methods for pruning.We argue that this approach suffers from multiple limitations including: 1) false positives due to spurious correlations captured by the pretrained CLIP model, 2) false negatives due to poor discrimination between hard and bad samples, and 3) biased ranking towards samples similar to the pretrained CLIP dataset. We propose a pruning method, SIEVE, that employs synthetic captions generated by image-captioning models pretrained on small, diverse, and well-aligned image-text pairs to evaluate the alignment of noisy image-text pairs. To bridge the gap between the limited diversity of generated captions and the high diversity of alternative text (alt-text), we estimate the semantic textual similarity in the embedding space of a language model pretrained on billions of sentences. Using DataComp, a multimodal dataset filtering benchmark, we achieve state-of-the-art performance on the large scale pool, and competitive results on the medium scale pool, surpassing CLIPScore-based filtering by 1.7% and 2.6% on average, on 38 downstream tasks.
    摘要 视觉语言模型(VLM)通常在大规模、多样且带噪的网络爬取数据上进行预训练。这凸显了数据修剪的重要性,因为数据质量与 VLM 在下游任务上的性能高度相关。使用预训练 CLIP 模型的 CLIPScore、仅用高度对齐的样本训练模型,是目前最成功的修剪方法之一。然而,我们认为这种方法存在多重局限:1)预训练 CLIP 模型捕捉到的虚假相关导致假阳性;2)难例与坏例区分能力不足导致假阴性;3)排序偏向与预训练 CLIP 数据集相似的样本。我们提出了修剪方法 SIEVE,利用在小规模、多样且对齐良好的图文对上预训练的图像描述模型生成合成描述,来评估带噪图文对的对齐程度。为弥合生成描述的有限多样性与替代文本(alt-text)的高度多样性之间的差距,我们在一个以数十亿句子预训练的语言模型的嵌入空间中估计语义文本相似度。在多模态数据集筛选基准 DataComp 上,我们在大规模数据池上取得了最先进的性能,在中等规模数据池上取得了有竞争力的结果,在 38 个下游任务上平均比基于 CLIPScore 的筛选分别高出 1.7% 和 2.6%。
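
A small sketch of the semantic-textual-similarity step: score how well a synthetic caption agrees with the web alt-text in a sentence-embedding space. The `all-MiniLM-L6-v2` encoder and the pruning threshold are illustrative assumptions, not necessarily the components used by SIEVE.

```python
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in sentence encoder

def alignment_score(generated_caption: str, alt_text: str) -> float:
    """Cosine similarity between the synthetic caption for an image and its alt-text."""
    embeddings = encoder.encode([generated_caption, alt_text], convert_to_tensor=True)
    return util.cos_sim(embeddings[0], embeddings[1]).item()

# Pairs whose score falls below a chosen threshold would be pruned from the
# training pool; the threshold itself is a tunable filtering hyper-parameter.
print(alignment_score("a dog running on a beach",
                      "golden retriever playing by the sea"))
```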

Leveraging Classic Deconvolution and Feature Extraction in Zero-Shot Image Restoration

  • paper_url: http://arxiv.org/abs/2310.02097
  • repo_url: https://github.com/ctom2/cider
  • paper_authors: Tomáš Chobola, Gesine Müller, Veit Dausmann, Anton Theileis, Jan Taucher, Jan Huisken, Tingying Peng
  • for: 在缺乏清晰真值图像的情况下进行零样本非盲(non-blind)图像反卷积复原
  • methods: 将深度学习与经典的 Richardson-Lucy 迭代反卷积算法相结合
  • results: 在实际应用中显示了明显的改善
    Abstract Non-blind deconvolution aims to restore a sharp image from its blurred counterpart given an obtained kernel. Existing deep neural architectures are often built based on large datasets of sharp ground truth images and trained with supervision. Sharp, high quality ground truth images, however, are not always available, especially for biomedical applications. This severely hampers the applicability of current approaches in practice. In this paper, we propose a novel non-blind deconvolution method that leverages the power of deep learning and classic iterative deconvolution algorithms. Our approach combines a pre-trained network to extract deep features from the input image with iterative Richardson-Lucy deconvolution steps. Subsequently, a zero-shot optimisation process is employed to integrate the deconvolved features, resulting in a high-quality reconstructed image. By performing the preliminary reconstruction with the classic iterative deconvolution method, we can effectively utilise a smaller network to produce the final image, thus accelerating the reconstruction whilst reducing the demand for valuable computational resources. Our method demonstrates significant improvements in various real-world applications non-blind deconvolution tasks.
    摘要 非盲反卷积的目标是在已知模糊核的情况下,从模糊图像中恢复清晰图像。现有的深度神经网络架构通常建立在大规模清晰真值图像数据集之上并以监督方式训练;然而,清晰、高质量的真值图像并非总是可得,在生物医学应用中尤其如此,这严重限制了现有方法在实践中的适用性。在本文中,我们提出了一种结合深度学习与经典迭代反卷积算法的新型非盲反卷积方法:先用预训练网络从输入图像中提取深度特征,再执行 Richardson-Lucy 迭代反卷积步骤,随后通过零样本优化过程整合反卷积后的特征,得到高质量的重建图像。由于先用经典迭代反卷积方法进行初步重建,我们可以用更小的网络生成最终图像,从而在降低宝贵计算资源需求的同时加速重建。我们的方法在多种真实世界的非盲反卷积任务中表现出显著改进。
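
The classic iterative component is standard Richardson-Lucy deconvolution, available off the shelf. A toy example on synthetic data is shown below; the iteration count is illustrative, and in the paper's pipeline the (partially) deconvolved result would be combined with features from a pretrained network rather than used alone.

```python
import numpy as np
from scipy.signal import convolve2d
from skimage.restoration import richardson_lucy

rng = np.random.default_rng(0)
sharp = rng.random((64, 64))                 # synthetic "sharp" image
psf = np.ones((5, 5)) / 25.0                 # simple box-blur point spread function
blurred = convolve2d(sharp, psf, mode="same", boundary="symm")

# Third positional argument is the number of iterations; more iterations invert
# the blur more aggressively but amplify noise.
restored = richardson_lucy(blurred, psf, 30)
print(restored.shape)                        # (64, 64)
```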

Global Attractor for a Reaction-Diffusion Model Arising in Biological Dynamic in 3D Soil Structure

  • paper_url: http://arxiv.org/abs/2310.02060
  • repo_url: None
  • paper_authors: Mohamed Elghandouri, Khalil Ezzinbi, Mouad Klai, Olivier Monga
  • for: 本研究使用偏微分方程(PDE)对三维土壤结构中微生物活动的复杂过程进行建模,为生物学领域提供有价值的理解。
  • methods: 本研究利用 PDE 模型研究解的存在性与唯一性以及相应模型的渐近行为,并辅以数值模拟。
  • results: 研究证明了全局吸引子的存在,这是刻画系统长期行为的重要特征,并通过数值模拟直观展示了该全局吸引子的性质。
    Abstract Partial Differential Equations (PDEs) play a crucial role as tools for modeling and comprehending intricate natural processes, notably within the domain of biology. This research explores the domain of microbial activity within the complex matrix of 3D soil structures, providing valuable understanding into both the existence and uniqueness of solutions and the asymptotic behavior of the corresponding PDE model. Our investigation results in the discovery of a global attractor, a fundamental feature with significant implications for long-term system behavior. To enhance the clarity of our findings, numerical simulations are employed to visually illustrate the attributes of this global attractor.
    摘要 偏微分方程(PDE)是建模和理解复杂自然过程的重要工具,在生物学领域尤为如此。本研究考察三维土壤结构这一复杂基质中的微生物活动,给出了相应 PDE 模型解的存在性与唯一性以及渐近行为的有价值结论。我们的研究证明了全局吸引子的存在,这是对系统长期行为具有重要意义的基本特征。为使结论更直观,我们还通过数值模拟展示了该全局吸引子的性质。
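
For intuition about the kind of asymptotic behaviour a global attractor describes, here is a didactic explicit finite-difference simulation of a generic reaction-diffusion equation (Fisher-KPP growth on a 2D periodic grid). This is a stand-in under stated assumptions; the paper's 3D soil model has a different geometry and reaction term.

```python
import numpy as np

def simulate_reaction_diffusion(steps=500, n=64, D=0.1, r=1.0, dt=0.01, dx=1.0):
    """Explicit Euler scheme for u_t = D * laplacian(u) + r * u * (1 - u)."""
    rng = np.random.default_rng(1)
    u = 0.1 * rng.random((n, n))                       # small random initial condition
    for _ in range(steps):
        lap = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
               np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4.0 * u) / dx**2
        u = u + dt * (D * lap + r * u * (1.0 - u))
    return u

# Trajectories from different initial conditions settle toward a bounded set of
# states; a global attractor formalises exactly this long-time behaviour.
print(simulate_reaction_diffusion().max())
```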

Exploring Generalisability of Self-Distillation with No Labels for SAR-Based Vegetation Prediction

  • paper_url: http://arxiv.org/abs/2310.02048
  • repo_url: None
  • paper_authors: Laura Martínez-Ferrer, Anna Jungbluth, Joseph A. Gallego-Mejia, Matt Allen, Francisco Dorr, Freddie Kalaitzis, Raúl Ramos-Pollán
  • for: 本研究使用两个合成孔径雷达(SAR)数据集(S1GRD 或 GSSIC),在三个地区(中国、Conus、欧洲)预训练基于 DINO-ViT 的模型,并在较小的标注数据集上微调以预测植被百分比。
  • methods: 本研究在预训练模型的基础上进行微调,并实证研究模型嵌入空间与其跨地理区域、对未见数据泛化能力之间的联系。
  • results: 研究发现,S1GRD 中不同地区的嵌入空间明显分离,而 GSSIC 的嵌入空间相互重叠;位置模式在微调后依然保留,且嵌入距离较大时,对不熟悉地区的预测误差通常更高。这些结果有助于加深我们对自监督模型在遥感领域泛化能力的理解。
    Abstract In this work we pre-train a DINO-ViT based model using two Synthetic Aperture Radar datasets (S1GRD or GSSIC) across three regions (China, Conus, Europe). We fine-tune the models on smaller labeled datasets to predict vegetation percentage, and empirically study the connection between the embedding space of the models and their ability to generalize across diverse geographic regions and to unseen data. For S1GRD, embedding spaces of different regions are clearly separated, while GSSIC's overlaps. Positional patterns remain during fine-tuning, and greater distances in embeddings often result in higher errors for unfamiliar regions. With this, our work increases our understanding of generalizability for self-supervised models applied to remote sensing.
    摘要 在这项工作中,我们使用两个合成孔径雷达数据集(S1GRD 或 GSSIC),在三个地区(中国、Conus、欧洲)预训练了基于 DINO-ViT 的模型,随后在较小的标注数据集上微调以预测植被百分比,并实证研究模型嵌入空间与其跨地理区域、对未见数据的泛化能力之间的联系。对 S1GRD 而言,不同地区的嵌入空间明显分离,而 GSSIC 的嵌入空间相互重叠;位置模式在微调后依然保留,且嵌入距离越大,对不熟悉地区的误差通常越高。由此,我们的工作加深了对应用于遥感的自监督模型泛化能力的理解。

Video Transformers under Occlusion: How Physics and Background Attributes Impact Large Models for Robotic Manipulation

  • paper_url: http://arxiv.org/abs/2310.02044
  • repo_url: https://github.com/shutongjin/occlumanip
  • paper_authors: Shutong Jin, Ruiyu Wang, Muhammad Zahid, Florian T. Pokorny
  • for: 本研究旨在探究物体物理属性与背景特征对视频 Transformer 在遮挡条件下轨迹预测任务性能的影响。
  • methods: 本研究构建并使用了 OccluManip 数据集,一个包含不同物理属性物体和多种背景的真实遮挡视频数据集,并提出了通用的视频 Transformer 网络 Video Occlusion Transformer(VOT),在全部 18 个子数据集上取得了平均 96% 的准确率。
  • results: 研究发现,物体物理属性和背景特征均对视频 Transformer 的性能有重要影响,并分析了哪些属性对模型泛化影响最大;此外,研究还探讨了单一任务内大型 Transformer 模型性能的数据饱和点问题。
    Abstract As transformer architectures and dataset sizes continue to scale, the need to understand the specific dataset factors affecting model performance becomes increasingly urgent. This paper investigates how object physics attributes (color, friction coefficient, shape) and background characteristics (static, dynamic, background complexity) influence the performance of Video Transformers in trajectory prediction tasks under occlusion. Beyond mere occlusion challenges, this study aims to investigate three questions: How do object physics attributes and background characteristics influence the model performance? What kinds of attributes are most influential to the model generalization? Is there a data saturation point for large transformer model performance within a single task? To facilitate this research, we present OccluManip, a real-world video-based robot pushing dataset comprising 460,000 consistent recordings of objects with different physics and varying backgrounds. 1.4 TB and in total 1278 hours of high-quality videos of flexible temporal length along with target object trajectories are collected, accommodating tasks with different temporal requirements. Additionally, we propose Video Occlusion Transformer (VOT), a generic video-transformer-based network achieving an average 96% accuracy across all 18 sub-datasets provided in OccluManip. OccluManip and VOT will be released at: https://github.com/ShutongJIN/OccluManip.git
    摘要 随着 Transformer 架构和数据集规模的不断扩大,理解影响模型性能的具体数据集因素变得愈发迫切。本文研究物体物理属性(颜色、摩擦系数、形状)和背景特征(静态、动态、背景复杂度)在遮挡条件下对视频 Transformer 轨迹预测性能的影响。除了遮挡本身的挑战,本研究还试图回答三个问题:物体物理属性和背景特征如何影响模型性能?哪些属性对模型泛化影响最大?单一任务内大型 Transformer 模型的性能是否存在数据饱和点?为支撑这项研究,我们构建了 OccluManip,一个基于真实视频的机器人推动数据集,包含 460,000 段不同物理属性、不同背景物体的一致录制,共 1.4 TB、1278 小时的高质量可变时长视频及目标物体轨迹,可满足不同时间需求的任务。此外,我们提出了通用的视频 Transformer 网络 Video Occlusion Transformer(VOT),在 OccluManip 提供的全部 18 个子数据集上取得了平均 96% 的准确率。OccluManip 和 VOT 将在 https://github.com/ShutongJIN/OccluManip.git 发布。

Decoding Human Activities: Analyzing Wearable Accelerometer and Gyroscope Data for Activity Recognition

  • paper_url: http://arxiv.org/abs/2310.02011
  • repo_url: None
  • paper_authors: Utsab Saha, Sawradip Saha, Tahmid Kabir, Shaikh Anowarul Fattah, Mohammad Saquib
  • for: 本研究旨在提出一种基于 Residual 网络与 Residual MobileNet 集成的分层多结构方法 FusionActNet,用于人类活动分类。
  • methods: 该方法使用精心设计的 Residual 块分别对静态和动态活动进行分类,两个模型独立训练;随后将这两个 ResNet 的输出送入基于加权集成的 Residual MobileNet,以便更好地区分具体的静态或动态活动。
  • results: 在UCI HAR和Motion-Sense等两个公共数据集上进行测试,该方法能够成功处理数据重叠的混淆情况,并实现了state-of-the-art的准确率96.71%和95.35%。
    Abstract A person's movement or relative positioning effectively generates raw electrical signals that can be read by computing machines to apply various manipulative techniques for the classification of different human activities. In this paper, a stratified multi-structural approach based on a Residual network ensembled with Residual MobileNet is proposed, termed as FusionActNet. The proposed method involves using carefully designed Residual blocks for classifying the static and dynamic activities separately because they have clear and distinct characteristics that set them apart. These networks are trained independently, resulting in two specialized and highly accurate models. These models excel at recognizing activities within a specific superclass by taking advantage of the unique algorithmic benefits of architectural adjustments. Afterward, these two ResNets are passed through a weighted ensemble-based Residual MobileNet. Subsequently, this ensemble proficiently discriminates between a specific static and a specific dynamic activity, which were previously identified based on their distinct feature characteristics in the earlier stage. The proposed model is evaluated using two publicly accessible datasets; namely, UCI HAR and Motion-Sense. Therein, it successfully handled the highly confusing cases of data overlap. Therefore, the proposed approach achieves a state-of-the-art accuracy of 96.71% and 95.35% in the UCI HAR and Motion-Sense datasets respectively.
    摘要 人体的运动或相对位置会产生原始电信号,计算机可以读取这些信号并运用各种处理技术对不同的人类活动进行分类。本文提出了一种分层多结构方法 FusionActNet,由 Residual 网络与 Residual MobileNet 集成而成。该方法使用精心设计的 Residual 块分别对静态和动态活动进行分类,因为二者具有清晰且截然不同的特征。这两个网络独立训练,得到两个高度专精、精度很高的模型,它们利用结构调整带来的算法优势,擅长识别各自超类内的活动。随后,这两个 ResNet 的输出经过基于加权集成的 Residual MobileNet,从而高效地区分此前依据特征差异识别出的具体静态或动态活动。该模型在 UCI HAR 和 Motion-Sense 两个公开数据集上进行了评估,成功处理了数据重叠导致的高度混淆情形,分别取得了 96.71% 和 95.35% 的最先进准确率。
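
A minimal sketch of the ensembling idea: combine the class probabilities of the two independently trained specialist networks with a scalar weight. The paper routes this through a Residual-MobileNet stage rather than a fixed weight, so the 0.5 default and the simple convex combination below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def weighted_ensemble(logits_static: torch.Tensor,
                      logits_dynamic: torch.Tensor,
                      w_static: float = 0.5) -> torch.Tensor:
    """Convex combination of the two specialists' softmax outputs."""
    p_static = F.softmax(logits_static, dim=1)
    p_dynamic = F.softmax(logits_dynamic, dim=1)
    return w_static * p_static + (1.0 - w_static) * p_dynamic

# Example with a batch of 2 samples and 6 activity classes.
combined = weighted_ensemble(torch.randn(2, 6), torch.randn(2, 6), w_static=0.6)
print(combined.argmax(dim=1))
```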

MUSCLE: Multi-task Self-supervised Continual Learning to Pre-train Deep Models for X-ray Images of Multiple Body Parts

  • paper_url: http://arxiv.org/abs/2310.02000
  • repo_url: None
  • paper_authors: Weibin Liao, Haoyi Xiong, Qingzhong Wang, Yan Mo, Xuhong Li, Yi Liu, Zeyu Chen, Siyu Huang, Dejing Dou
  • for: 本文旨在通过多任务自监督持续学习(MUSCLE)流程,利用来自头部、肺部、骨骼等多个身体部位的 X 射线图像进行预训练,提升 X 射线图像分析的表示学习。
  • methods: 该方法汇聚多个身体部位的 X 射线图像进行基于 MoCo 的表示学习,并采用精心设计的持续学习(CL)流程,在多种 X 射线分析任务上联合继续预训练骨干网络;同时通过图像预处理、学习调度和正则化等策略,缓解多任务/多数据集学习中的数据异质性、过拟合和灾难性遗忘问题。
  • results: 在 9 个真实世界 X 射线数据集上的评估以及与其他预训练模型的比较验证了这一概念:自监督的多任务/多数据集持续预训练能够提升 X 射线图像分析的性能。
    Abstract While self-supervised learning (SSL) algorithms have been widely used to pre-train deep models, few efforts [11] have been done to improve representation learning of X-ray image analysis with SSL pre-trained models. In this work, we study a novel self-supervised pre-training pipeline, namely Multi-task Self-super-vised Continual Learning (MUSCLE), for multiple medical imaging tasks, such as classification and segmentation, using X-ray images collected from multiple body parts, including heads, lungs, and bones. Specifically, MUSCLE aggregates X-rays collected from multiple body parts for MoCo-based representation learning, and adopts a well-designed continual learning (CL) procedure to further pre-train the backbone subject various X-ray analysis tasks jointly. Certain strategies for image pre-processing, learning schedules, and regularization have been used to solve data heterogeneity, overfitting, and catastrophic forgetting problems for multi-task/dataset learning in MUSCLE.We evaluate MUSCLE using 9 real-world X-ray datasets with various tasks, including pneumonia classification, skeletal abnormality classification, lung segmentation, and tuberculosis (TB) detection. Comparisons against other pre-trained models [7] confirm the proof-of-concept that self-supervised multi-task/dataset continual pre-training could boost the performance of X-ray image analysis.
    摘要 虽然自监督学习(SSL)算法已被广泛用于深度模型的预训练,但利用 SSL 预训练模型改进 X 射线图像分析表示学习的工作仍然很少。本文研究了一种新的自监督预训练流程,即多任务自监督持续学习(MUSCLE),面向分类和分割等多种医学影像任务,使用采集自头部、肺部和骨骼等多个身体部位的 X 射线图像。具体而言,MUSCLE 汇聚来自多个身体部位的 X 射线图像进行基于 MoCo 的表示学习,并采用精心设计的持续学习(CL)流程,在多种 X 射线分析任务上联合继续预训练骨干网络;同时通过图像预处理、学习调度和正则化等策略,解决多任务/多数据集学习中的数据异质性、过拟合和灾难性遗忘问题。我们在 9 个真实世界 X 射线数据集上评估 MUSCLE,任务包括肺炎分类、骨骼异常分类、肺部分割和结核病(TB)检测。与其他预训练模型的比较验证了这一概念:自监督的多任务/多数据集持续预训练能够提升 X 射线图像分析的性能。
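
Since the representation learner builds on MoCo, the core update is the momentum (EMA) update of the key encoder. A short sketch follows; the momentum value is the common MoCo default, not necessarily the one used in MUSCLE, and the surrounding continual-learning schedule is only described in the comment.

```python
import torch

@torch.no_grad()
def momentum_update(query_encoder: torch.nn.Module,
                    key_encoder: torch.nn.Module,
                    m: float = 0.999) -> None:
    """MoCo-style momentum update: key parameters track an EMA of the query encoder."""
    for q_param, k_param in zip(query_encoder.parameters(), key_encoder.parameters()):
        k_param.data.mul_(m).add_(q_param.data, alpha=1.0 - m)

# In the continual-learning loop, the same encoder pair would be kept while the
# training data switches between body parts (heads, lungs, bones), with
# regularisation and scheduling to limit forgetting of earlier tasks.
```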

Understanding Masked Autoencoders From a Local Contrastive Perspective

  • paper_url: http://arxiv.org/abs/2310.01994
  • repo_url: None
  • paper_authors: Xiaoyu Yue, Lei Bai, Meng Wei, Jiangmiao Pang, Xihui Liu, Luping Zhou, Wanli Ouyang
  • for: 本研究旨在探讨 Masked AutoEncoder(MAE)在自监督学习中的内部机制,以及它为何能形成高质量的隐藏表示。
  • methods: 本研究针对 MAE 的生成式预训练路径,即通过激进掩码(aggressive masking)破坏图像后再重建图像的过程,深入分析了解码器的行为,发现其主要学习感受野有限的局部特征,符合局部性原则(Locality Principle)。基于这一局部性假设,提出了一个理论框架,将基于重建的 MAE 重新表述为区域级对比学习的形式,以便更好地理解其工作原理。
  • results: 研究表明 MAE 具有很强的区域级对比学习能力,并能在不同的下游任务中取得最先进(state-of-the-art)的性能。此外,研究还提出了一种无需掩码和显式解码器的 Siamese 架构,为更统一、更灵活的自监督学习框架提供了思路。
    Abstract Masked AutoEncoder(MAE) has revolutionized the field of self-supervised learning with its simple yet effective masking and reconstruction strategies. However, despite achieving state-of-the-art performance across various downstream vision tasks, the underlying mechanisms that drive MAE's efficacy are less well-explored compared to the canonical contrastive learning paradigm. In this paper, we explore a new perspective to explain what truly contributes to the "rich hidden representations inside the MAE". Firstly, concerning MAE's generative pretraining pathway, with a unique encoder-decoder architecture to reconstruct images from aggressive masking, we conduct an in-depth analysis of the decoder's behaviors. We empirically find that MAE's decoder mainly learns local features with a limited receptive field, adhering to the well-known Locality Principle. Building upon this locality assumption, we propose a theoretical framework that reformulates the reconstruction-based MAE into a local region-level contrastive learning form for improved understanding. Furthermore, to substantiate the local contrastive nature of MAE, we introduce a Siamese architecture that combines the essence of MAE and contrastive learning without masking and explicit decoder, which sheds light on a unified and more flexible self-supervised learning framework.
    摘要 Masked AutoEncoder(MAE)以其简单而有效的掩码与重建策略革新了自监督学习领域。然而,尽管 MAE 在多种下游视觉任务上达到了最先进的性能,与经典的对比学习范式相比,驱动其有效性的内在机制仍缺乏充分探索。本文提出一个新的视角,来解释究竟是什么造就了 MAE 内部丰富的隐藏表示。首先,针对 MAE 的生成式预训练路径(采用独特的编码器-解码器架构,从激进掩码中重建图像),我们对解码器的行为进行了深入分析,实证发现 MAE 的解码器主要学习感受野有限的局部特征,符合著名的局部性原则。基于这一局部性假设,我们提出了一个理论框架,将基于重建的 MAE 重新表述为区域级对比学习的形式,以加深理解。此外,为了印证 MAE 的局部对比特性,我们提出了一种无需掩码和显式解码器、融合 MAE 与对比学习精髓的 Siamese 架构,为更统一、更灵活的自监督学习框架提供了启示。

Development of Machine Vision Approach for Mechanical Component Identification based on its Dimension and Pitch

  • paper_url: http://arxiv.org/abs/2310.01995
  • repo_url: None
  • paper_authors: Toshit Jain, Faisel Mushtaq, K Ramesh, Sandip Deshmukh, Tathagata Ray, Chandu Parimi, Praveen Tandon, Pramod Kumar Jha
  • for: automation of mechanical assembly lines
  • methods: novel method of calculating pitch and bolt identification, lightweight and fast system
  • results: correct identification of parts with 98% accuracy
    Abstract In this work, a highly customizable and scalable vision based system for automation of mechanical assembly lines is described. The proposed system calculates the features that are required to classify and identify the different kinds of bolts that are used in the assembly line. The system describes a novel method of calculating the pitch of the bolt in addition to bolt identification and calculating the dimensions of the bolts. This identification and classification system is extremely lightweight and can be run on bare minimum hardware. The system is very fast in the order of milliseconds, hence the system can be used successfully even if the components are steadily moving on a conveyor. The results show that our system can correctly identify the parts in our dataset with 98% accuracy using the calculated features.
    摘要 本工作描述了一种高度可定制、可扩展的基于机器视觉的机械装配线自动化系统。该系统计算用于对装配线上不同种类螺栓进行分类与识别所需的特征,除识别螺栓和计算其尺寸外,还提出了一种新的螺距(pitch)计算方法。该识别与分类系统非常轻量,可在最基本的硬件上运行;其速度达到毫秒量级,因此即便部件在传送带上持续移动也能成功应用。结果显示,利用所计算的特征,该系统能以 98% 的准确率正确识别数据集中的部件。
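
One plausible reading of "calculating the pitch" is to detect thread crests along the bolt axis and convert their mean spacing to millimetres. The sketch below does exactly that on a 1D intensity/edge profile; the peak-detection parameters and pixel scale are illustrative assumptions, not the paper's feature pipeline.

```python
import numpy as np
from scipy.signal import find_peaks

def estimate_pitch(edge_profile: np.ndarray, mm_per_pixel: float) -> float:
    """Estimate thread pitch from a 1D profile sampled along the bolt axis."""
    peaks, _ = find_peaks(edge_profile, distance=3)     # thread crest locations (pixels)
    if len(peaks) < 2:
        raise ValueError("need at least two thread crests to measure pitch")
    return float(np.diff(peaks).mean() * mm_per_pixel)  # mean crest spacing in mm

# Synthetic profile: crests every 10 pixels at 0.125 mm/pixel gives a 1.25 mm pitch.
x = np.arange(200)
profile = np.sin(2 * np.pi * x / 10.0)
print(round(estimate_pitch(profile, mm_per_pixel=0.125), 3))
```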

CoralVOS: Dataset and Benchmark for Coral Video Segmentation

  • paper_url: http://arxiv.org/abs/2310.01946
  • repo_url: https://github.com/zhengziqiang/CoralVOS
  • paper_authors: Zheng Ziqiang, Xie Yaofeng, Liang Haixin, Yu Zhibin, Sai-Kit Yeung
  • for: 本文旨在提高珊瑚礁分析的效率与准确性,并提供一个大规模的珊瑚视频分割数据集(CoralVOS),以支持稠密的珊瑚视频分割。
  • methods: 本文在 CoralVOS 数据集上对 6 种最新的视频对象分割(VOS)算法进行了微调与评估。
  • results: 实验结果显示,在 CoralVOS 数据集上微调 VOS 算法可以明显提升珊瑚分割精度,但仍有很大的提升空间。
    Abstract Coral reefs formulate the most valuable and productive marine ecosystems, providing habitat for many marine species. Coral reef surveying and analysis are currently confined to coral experts who invest substantial effort in generating comprehensive and dependable reports (\emph{e.g.}, coral coverage, population, spatial distribution, \textit{etc}), from the collected survey data. However, performing dense coral analysis based on manual efforts is significantly time-consuming, the existing coral analysis algorithms compromise and opt for performing down-sampling and only conducting sparse point-based coral analysis within selected frames. However, such down-sampling will \textbf{inevitable} introduce the estimation bias or even lead to wrong results. To address this issue, we propose to perform \textbf{dense coral video segmentation}, with no down-sampling involved. Through video object segmentation, we could generate more \textit{reliable} and \textit{in-depth} coral analysis than the existing coral reef analysis algorithms. To boost such dense coral analysis, we propose a large-scale coral video segmentation dataset: \textbf{CoralVOS} as demonstrated in Fig. 1. To the best of our knowledge, our CoralVOS is the first dataset and benchmark supporting dense coral video segmentation. We perform experiments on our CoralVOS dataset, including 6 recent state-of-the-art video object segmentation (VOS) algorithms. We fine-tuned these VOS algorithms on our CoralVOS dataset and achieved observable performance improvement. The results show that there is still great potential for further promoting the segmentation accuracy. The dataset and trained models will be released with the acceptance of this work to foster the coral reef research community.
    摘要 珊瑚礁构成了最有价值、最具生产力的海洋生态系统,为众多海洋物种提供栖息地。目前,珊瑚礁的调查与分析仅限于珊瑚专家,他们需要投入大量精力才能从采集的调查数据中生成全面而可靠的报告(如珊瑚覆盖率、种群数量、空间分布等)。然而,依靠人工进行稠密珊瑚分析极其耗时,现有的珊瑚分析算法只能折中地进行下采样,仅在选定帧内做稀疏的基于点的分析,而这样的下采样不可避免地引入估计偏差,甚至导致错误结果。为解决这一问题,我们提出进行不含下采样的稠密珊瑚视频分割:通过视频对象分割,可以得到比现有珊瑚礁分析算法更可靠、更深入的分析结果。为推动这种稠密珊瑚分析,我们提出了大规模珊瑚视频分割数据集 CoralVOS(见图 1);据我们所知,CoralVOS 是首个支持稠密珊瑚视频分割的数据集和基准。我们在 CoralVOS 上对 6 种最新的视频对象分割(VOS)算法进行了实验与微调,取得了可观的性能提升,结果也表明分割精度仍有很大的提升潜力。数据集与训练好的模型将随本文的录用一并发布,以促进珊瑚礁研究社区的发展。

OOD Aware Supervised Contrastive Learning

  • paper_url: http://arxiv.org/abs/2310.01942
  • repo_url: None
  • paper_authors: Soroush Seifi, Daniel Olmeda Reino, Nikolay Chumerin, Rahaf Aljundi
  • for: 本文提出了一种基于监督对比学习的分布外(OOD)数据检测方法,以确保机器学习模型在部署时能够正确识别分布外数据。
  • methods: 本文在监督对比(SupCon)损失的基础上增加了两个对比项:第一项将辅助 OOD 表示推离 ID 表示,而不对辅助数据之间的相似性施加任何约束;第二项将 OOD 特征推离现有的类原型,同时将 ID 表示拉近其对应的类原型。当没有辅助 OOD 数据时,本文提出了一种高效的特征混合技术来生成伪 OOD 特征。
  • results: 作者对不同的OOD检测方法进行比较,并在常用的benchmark上显示了state-of-the-art的结果。
    Abstract Out-of-Distribution (OOD) detection is a crucial problem for the safe deployment of machine learning models identifying samples that fall outside of the training distribution, i.e. in-distribution data (ID). Most OOD works focus on the classification models trained with Cross Entropy (CE) and attempt to fix its inherent issues. In this work we leverage powerful representation learned with Supervised Contrastive (SupCon) training and propose a holistic approach to learn a classifier robust to OOD data. We extend SupCon loss with two additional contrast terms. The first term pushes auxiliary OOD representations away from ID representations without imposing any constraints on similarities among auxiliary data. The second term pushes OOD features far from the existing class prototypes, while pushing ID representations closer to their corresponding class prototype. When auxiliary OOD data is not available, we propose feature mixing techniques to efficiently generate pseudo-OOD features. Our solution is simple and efficient and acts as a natural extension of the closed-set supervised contrastive representation learning. We compare against different OOD detection methods on the common benchmarks and show state-of-the-art results.
    摘要 分布外(OOD)检测是机器学习模型安全部署的关键问题,旨在识别落在训练分布(即分布内数据,ID)之外的样本。大多数 OOD 工作关注以交叉熵(CE)训练的分类模型,并尝试修补其固有问题。在本文中,我们利用监督对比(SupCon)训练学到的强大表示,提出了一种学习对 OOD 数据鲁棒的分类器的整体方法。我们将 SupCon 损失扩展为带有两个附加对比项的形式:第一项将辅助 OOD 表示推离 ID 表示,而不对辅助数据之间的相似性施加任何约束;第二项将 OOD 特征推离现有的类原型,同时将 ID 表示拉近其对应的类原型。当没有辅助 OOD 数据时,我们提出特征混合技术来高效生成伪 OOD 特征。我们的方案简单高效,可视为闭集监督对比表示学习的自然扩展。我们在常用基准上与多种 OOD 检测方法进行比较,取得了最先进的结果。
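
A hedged sketch of the second extra term described above: pull each in-distribution feature toward its class prototype while pushing OOD features away from all prototypes. All tensors are assumed to be L2-normalised embeddings; the exact loss form and temperature in the paper may differ.

```python
import torch
import torch.nn.functional as F

def prototype_contrast_term(id_feats: torch.Tensor,
                            id_labels: torch.Tensor,
                            ood_feats: torch.Tensor,
                            prototypes: torch.Tensor,
                            temperature: float = 0.1) -> torch.Tensor:
    """Prototype attraction for ID features plus prototype repulsion for OOD features."""
    id_feats = F.normalize(id_feats, dim=1)
    ood_feats = F.normalize(ood_feats, dim=1)
    prototypes = F.normalize(prototypes, dim=1)

    # Attraction: maximise cosine similarity between each ID feature and its prototype.
    pull = 1.0 - (id_feats * prototypes[id_labels]).sum(dim=1)      # (N_id,)

    # Repulsion: penalise OOD features that sit close to any class prototype.
    ood_sim = ood_feats @ prototypes.t() / temperature               # (N_ood, C)
    push = torch.logsumexp(ood_sim, dim=1)                           # soft maximum similarity

    return pull.mean() + push.mean()
```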

Constructing Image-Text Pair Dataset from Books

  • paper_url: http://arxiv.org/abs/2310.01936
  • repo_url: None
  • paper_authors: Yamato Okamoto, Haruto Toyonaga, Yoshihisa Ijiri, Hirokatsu Kataoka
  • for: 这篇论文旨在利用数字档案进行机器学习,以期像人类读书一样发掘未知的洞见并自主获取知识。
  • methods: 该方法使用光学字符识别器(OCR)、目标检测器和版面分析器,自动提取图文对并构建数据集。
  • results: 作者将该流水线应用于旧照片书籍,验证了其在图文检索和洞见提取方面的有效性。
    Abstract Digital archiving is becoming widespread owing to its effectiveness in protecting valuable books and providing knowledge to many people electronically. In this paper, we propose a novel approach to leverage digital archives for machine learning. If we can fully utilize such digitized data, machine learning has the potential to uncover unknown insights and ultimately acquire knowledge autonomously, just like humans read books. As a first step, we design a dataset construction pipeline comprising an optical character reader (OCR), an object detector, and a layout analyzer for the autonomous extraction of image-text pairs. In our experiments, we apply our pipeline on old photo books to construct an image-text pair dataset, showing its effectiveness in image-text retrieval and insight extraction.
    摘要 数字存档因其在保护珍贵书籍和以电子方式向大众传播知识方面的有效性而日益普及。本文提出了一种利用数字档案进行机器学习的新方法:如果能充分利用这些数字化数据,机器学习就有潜力发掘未知的洞见,并像人类读书一样自主获取知识。作为第一步,我们设计了一条数据集构建流水线,由光学字符识别器(OCR)、目标检测器和版面分析器组成,用于自动提取图文对。在实验中,我们将该流水线应用于旧照片书籍,构建了图文对数据集,并验证了其在图文检索和洞见提取方面的有效性。

Robust deformable image registration using cycle-consistent implicit representations

  • paper_url: http://arxiv.org/abs/2310.01934
  • repo_url: https://github.com/louisvh/cycle_consistent_inr
  • paper_authors: Louis D. van Harten, Jaap Stoker, Ivana Išgum
  • for: 这篇论文旨在提高医学图像形变配准的稳健性与可靠性。
  • methods: 该方法使用成对的循环一致隐式神经表示:每个隐式表示都与另一个估计相反变换的隐式表示相连,使二者互为正则化项;推理时通过数值求逆配对的反向变换生成多个形变估计,并评估优化后配对的一致性。
  • results: 该方法可以提高配准精度,并提供可用于自动质量控制的可靠不确定度度量。在 4D 肺部 CT 数据集上的评估表明,该方法将优化失败率从 2.4% 降至 0.0%,标志点精度提升 4.5%,并且能够检测出配准方法未能收敛到正确解的全部情形。此外,在腹部 4D MRI 的中心线传播任务上,该方法将传播一致性提升 46%,且所提出的不确定度度量与配准精度高度相关。
    Abstract Recent works in medical image registration have proposed the use of Implicit Neural Representations, demonstrating performance that rivals state-of-the-art learning-based methods. However, these implicit representations need to be optimized for each new image pair, which is a stochastic process that may fail to converge to a global minimum. To improve robustness, we propose a deformable registration method using pairs of cycle-consistent Implicit Neural Representations: each implicit representation is linked to a second implicit representation that estimates the opposite transformation, causing each network to act as a regularizer for its paired opposite. During inference, we generate multiple deformation estimates by numerically inverting the paired backward transformation and evaluating the consensus of the optimized pair. This consensus improves registration accuracy over using a single representation and results in a robust uncertainty metric that can be used for automatic quality control. We evaluate our method with a 4D lung CT dataset. The proposed cycle-consistent optimization method reduces the optimization failure rate from 2.4% to 0.0% compared to the current state-of-the-art. The proposed inference method improves landmark accuracy by 4.5% and the proposed uncertainty metric detects all instances where the registration method fails to converge to a correct solution. We verify the generalizability of these results to other data using a centerline propagation task in abdominal 4D MRI, where our method achieves a 46% improvement in propagation consistency compared with single-INR registration and demonstrates a strong correlation between the proposed uncertainty metric and registration accuracy.
    摘要 近期的医学图像配准研究提出使用隐式神经表示,其性能可与最先进的基于学习的方法相媲美。然而,这些隐式表示需要针对每对新图像重新优化,而该随机优化过程可能无法收敛到全局最优。为提高鲁棒性,我们提出了一种使用成对循环一致隐式神经表示的形变配准方法:每个隐式表示都与另一个估计相反变换的隐式表示相连,使每个网络成为其配对网络的正则化项。推理时,我们通过数值求逆配对的反向变换生成多个形变估计,并评估优化后配对的一致性;这种一致性比使用单一表示提高了配准精度,并产生可用于自动质量控制的鲁棒不确定度度量。我们在 4D 肺部 CT 数据集上评估该方法:所提出的循环一致优化使优化失败率从 2.4% 降至 0.0%,推理方法使标志点精度提升 4.5%,且不确定度度量能检测出配准未收敛到正确解的全部情形。我们在腹部 4D MRI 的中心线传播任务上验证了这些结果的可推广性:相比单 INR 配准,我们的方法使传播一致性提升 46%,且不确定度度量与配准精度呈强相关。
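
A minimal sketch of the cycle-consistency idea that links the paired networks: mapping a point forward and then backward (and vice versa) should return to the starting coordinate. The `forward_inr` / `backward_inr` callables are left abstract here, unlike the specific per-image-pair INRs optimised in the paper.

```python
import torch

def cycle_consistency_loss(forward_inr, backward_inr, coords: torch.Tensor) -> torch.Tensor:
    """Penalise deviations from the identity after a forward-backward round trip."""
    fwd_then_bwd = backward_inr(forward_inr(coords))
    bwd_then_fwd = forward_inr(backward_inr(coords))
    return ((fwd_then_bwd - coords) ** 2).mean() + ((bwd_then_fwd - coords) ** 2).mean()

# Toy check with identity "networks": the loss is exactly zero.
identity = lambda x: x
print(cycle_consistency_loss(identity, identity, torch.rand(128, 3)))
```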

MarineDet: Towards Open-Marine Object Detection

  • paper_url: http://arxiv.org/abs/2310.01931
  • repo_url: None
  • paper_authors: Liang Haixin, Zheng Ziqiang, Ma Zeyu, Sai-Kit Yeung
  • for: 海洋目标检测,尤其是检测水下图像中多样且未见过的海洋物体
  • methods: 通过预训练构建联合视觉-文本语义空间,再进行海洋域专门训练,实现从空气域到海洋域的知识迁移
  • results: 相比现有的通用和专用目标检测算法取得更优性能,证明了开放海洋目标检测(OMOD)在海洋生态系统监测与管理中的有效性
    Abstract Marine object detection has gained prominence in marine research, driven by the pressing need to unravel oceanic mysteries and enhance our understanding of invaluable marine ecosystems. There is a profound requirement to efficiently and accurately identify and localize diverse and unseen marine entities within underwater imagery. The open-marine object detection (OMOD for short) is required to detect diverse and unseen marine objects, performing categorization and localization simultaneously. To achieve OMOD, we present \textbf{MarineDet}. We formulate a joint visual-text semantic space through pre-training and then perform marine-specific training to achieve in-air-to-marine knowledge transfer. Considering there is no specific dataset designed for OMOD, we construct a \textbf{MarineDet dataset} consisting of 821 marine-relative object categories to promote and measure OMOD performance. The experimental results demonstrate the superior performance of MarineDet over existing generalist and specialist object detection algorithms. To the best of our knowledge, we are the first to present OMOD, which holds a more valuable and practical setting for marine ecosystem monitoring and management. Our research not only pushes the boundaries of marine understanding but also offers a standard pipeline for OMOD.
    摘要 海洋目标检测在海洋研究中日益重要,其驱动力来自揭示海洋奥秘、加深对宝贵海洋生态系统理解的迫切需求。如何在水下图像中高效、准确地识别并定位多样且未见过的海洋实体,是一个深层次的需求。开放海洋目标检测(OMOD)要求检测多样且未见过的海洋物体,同时完成类别判断与定位。为实现 OMOD,我们提出了 MarineDet:先通过预训练构建联合视觉-文本语义空间,再进行海洋域专门训练,实现从空气域到海洋域的知识迁移。鉴于目前没有专为 OMOD 设计的数据集,我们构建了包含 821 个海洋相关目标类别的 MarineDet 数据集,以推动并衡量 OMOD 的性能。实验结果表明,MarineDet 的性能优于现有的通用和专用目标检测算法。据我们所知,我们是首个提出 OMOD 的工作,它为海洋生态系统监测与管理提供了更有价值、更实用的设定。我们的研究不仅拓展了对海洋认知的边界,也为 OMOD 提供了一条标准流程。

RoFormer for Position Aware Multiple Instance Learning in Whole Slide Image Classification

  • paper_url: http://arxiv.org/abs/2310.01924
  • repo_url: None
  • paper_authors: Etienne Pochet, Rami Maroun, Roger Trullo
  • for: 本文旨在解决计算病理学中的全切片图像(WSI)分类问题;现有方法依赖特征提取器被冻结的多实例学习(MIL)模型,忽略了图块之间的相关性与组织的空间结构。
  • methods: 本文提出使用 RoFormer 层,借助内存高效的精确自注意力与相对位置编码,对尺寸和形状任意的 WSI 图块执行带相对位置编码的完整自注意力,从而同时建模图块间的相关性与组织的空间结构。
  • results: 本文表明,该方法在三个常用公开数据集(TCGA-NSCLC、BRACS 和 Camelyon16)的弱监督分类任务上超越了现有的 MIL 模型。
    Abstract Whole slide image (WSI) classification is a critical task in computational pathology. However, the gigapixel-size of such images remains a major challenge for the current state of deep-learning. Current methods rely on multiple-instance learning (MIL) models with frozen feature extractors. Given the the high number of instances in each image, MIL methods have long assumed independence and permutation-invariance of patches, disregarding the tissue structure and correlation between patches. Recent works started studying this correlation between instances but the computational workload of such a high number of tokens remained a limiting factor. In particular, relative position of patches remains unaddressed. We propose to apply a straightforward encoding module, namely a RoFormer layer , relying on memory-efficient exact self-attention and relative positional encoding. This module can perform full self-attention with relative position encoding on patches of large and arbitrary shaped WSIs, solving the need for correlation between instances and spatial modeling of tissues. We demonstrate that our method outperforms state-of-the-art MIL models on three commonly used public datasets (TCGA-NSCLC, BRACS and Camelyon16)) on weakly supervised classification tasks. Code is available at https://github.com/Sanofi-Public/DDS-RoFormerMIL
    摘要 全切片图像(WSI)分类是计算病理学中的一项关键任务。然而,此类图像的十亿像素级尺寸仍是当前深度学习面临的主要挑战。现有方法依赖特征提取器被冻结的多实例学习(MIL)模型;由于每幅图像中实例数量庞大,MIL 方法长期假设图块之间相互独立且排列不变,忽略了组织结构及图块间的相关性。近期的工作开始研究实例之间的相关性,但如此大量 token 的计算负担仍是限制因素,尤其是图块的相对位置始终未被考虑。我们提出采用一个简单的编码模块,即 RoFormer 层,它基于内存高效的精确自注意力和相对位置编码,能够对尺寸和形状任意的 WSI 图块执行带相对位置编码的完整自注意力,从而解决实例间相关性与组织空间建模的需求。我们展示了该方法在三个常用公开数据集(TCGA-NSCLC、BRACS 和 Camelyon16)的弱监督分类任务上优于最先进的 MIL 模型。代码见 https://github.com/Sanofi-Public/DDS-RoFormerMIL 。
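
A small sketch of the rotary position embedding mechanism behind the RoFormer layer, applied to query/key vectors. Using a single scalar index per patch is a simplification of how the relative 2D location of a WSI patch would actually be encoded, so treat the positional scheme as an assumption.

```python
import torch

def apply_rotary_embedding(x: torch.Tensor, positions: torch.Tensor,
                           base: float = 10000.0) -> torch.Tensor:
    """Rotate query/key vectors by position-dependent angles (rotate-half variant).

    x: (N, D) with even D; positions: (N,) scalar patch positions.
    """
    n, d = x.shape
    half = d // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)   # (D/2,)
    angles = positions[:, None].float() * freqs[None, :]                # (N, D/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

# After this transform, the dot product between two rotated vectors depends on the
# difference of their positions, which is what yields relative position encoding.
q = apply_rotary_embedding(torch.randn(4, 8), torch.arange(4))
```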

Improved Automatic Diabetic Retinopathy Severity Classification Using Deep Multimodal Fusion of UWF-CFP and OCTA Images

  • paper_url: http://arxiv.org/abs/2310.01912
  • repo_url: None
  • paper_authors: Mostafa El Habib Daho, Yihao Li, Rachid Zeghlache, Yapo Cedric Atse, Hugo Le Boité, Sophie Bonnin, Deborah Cosette, Pierre Deman, Laurent Borderie, Capucine Lepicard, Ramin Tadayoni, Béatrice Cochener, Pierre-Henri Conze, Mathieu Lamard, Gwenolé Quellec
  • for: 本研究旨在改进糖尿病视网膜病变(DR)的早期检测与严重程度分级,以改善患者的临床结局。
  • methods: 本研究融合超广角彩色眼底照相(UWF-CFP)图像与光学相干断层扫描血管成像(OCTA)图像,结合 ResNet50 与 3D-ResNet50 模型,并引入 Squeeze-and-Excitation(SE)模块和多模态版本的 Manifold Mixup,以提升 DR 分级性能。
  • results: 实验结果表明,与仅使用单一模态的方法相比,这种多模态方法显著提升了 DR 分级性能。该方法有望成为 DR 早期检测的重要工具,帮助改善患者的临床结局。
    Abstract Diabetic Retinopathy (DR), a prevalent and severe complication of diabetes, affects millions of individuals globally, underscoring the need for accurate and timely diagnosis. Recent advancements in imaging technologies, such as Ultra-WideField Color Fundus Photography (UWF-CFP) imaging and Optical Coherence Tomography Angiography (OCTA), provide opportunities for the early detection of DR but also pose significant challenges given the disparate nature of the data they produce. This study introduces a novel multimodal approach that leverages these imaging modalities to notably enhance DR classification. Our approach integrates 2D UWF-CFP images and 3D high-resolution 6x6 mm$^3$ OCTA (both structure and flow) images using a fusion of ResNet50 and 3D-ResNet50 models, with Squeeze-and-Excitation (SE) blocks to amplify relevant features. Additionally, to increase the model's generalization capabilities, a multimodal extension of Manifold Mixup, applied to concatenated multimodal features, is implemented. Experimental results demonstrate a remarkable enhancement in DR classification performance with the proposed multimodal approach compared to methods relying on a single modality only. The methodology laid out in this work holds substantial promise for facilitating more accurate, early detection of DR, potentially improving clinical outcomes for patients.
    摘要 糖尿病视网膜病变(DR)是糖尿病常见且严重的并发症,影响全球数以百万计的人群,凸显了准确、及时诊断的必要性。超广角彩色眼底照相(UWF-CFP)和光学相干断层扫描血管成像(OCTA)等成像技术的最新进展为 DR 的早期检测提供了机遇,但由于两种数据性质迥异,也带来了显著挑战。本研究提出了一种新的多模态方法,利用这些成像模态显著提升 DR 分级性能:通过融合 ResNet50 与 3D-ResNet50 模型,整合 2D UWF-CFP 图像与 3D 高分辨率 6x6 mm$^3$ OCTA(结构与血流)图像,并引入 Squeeze-and-Excitation(SE)模块以放大相关特征;此外,为增强模型的泛化能力,对拼接后的多模态特征应用了 Manifold Mixup 的多模态扩展。实验结果表明,与仅依赖单一模态的方法相比,所提出的多模态方法显著提升了 DR 分级性能。本文提出的方法有望促进更准确的 DR 早期检测,从而改善患者的临床结局。
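
For reference, here is the standard Squeeze-and-Excitation block of the kind used to reweight channels in the fused branches. Reduction ratio 16 is the usual default from the SE literature; the exact placement of SE blocks inside the ResNet50 / 3D-ResNet50 fusion is not reproduced here.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze (global average pool) then excite (channel-wise gating)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                 # squeeze: global context per channel
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                                   # excitation: per-channel gates in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        weights = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * weights

print(SEBlock(64)(torch.randn(2, 64, 8, 8)).shape)   # torch.Size([2, 64, 8, 8])
```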

CLIP Is Also a Good Teacher: A New Learning Framework for Inductive Zero-shot Semantic Segmentation

  • paper_url: http://arxiv.org/abs/2310.02296
  • repo_url: None
  • paper_authors: Jialei Chen, Daisuke Deguchi, Chenkai Zhang, Xu Zheng, Hiroshi Murase
  • for: 提高 Zero-shot Semantic Segmentation (GZLSS) 方法的效果,使其能够应用于不同的像素分类 segmentation 模型,无需添加额外的mask提案器或改变 CLIP 模型的结构。
  • methods: 提出了一种新的学习框架 CLIPTeacher,它可以应用于不同的逐像素分类 segmentation 模型,并利用所有的图像信息。CLIPTeacher 包括两个关键模块:全局学习模块 (GLM) 和像素学习模块 (PLM)。GLM 将图像编码器中的稠密特征与 CLIP 模型中的 CLS token 进行对齐,从而捕捉全局信息。PLM 则只使用 CLIP 模型中的稠密 token,为未标注(被忽略)的区域生成高级伪标注,不需要添加额外的 mask 提案器。
  • results: 对三个 benchmark 数据集进行实验,结果表明,CLIPTeacher 可以大幅提高 Zero-shot Semantic Segmentation 的效果,具体来说是:PASCAL VOC 2012 上提高了 2.2%,COCO-Stuff 164k 上提高了 1.3%,PASCAL Context 上提高了 8.8%。
    Abstract Existing Generalized Zero-shot Semantic Segmentation (GZLSS) methods apply either finetuning the CLIP paradigm or formulating it as a mask classification task, benefiting from the Vision-Language Models (VLMs). However, the fine-tuning methods are restricted with fixed backbone models which are not flexible for segmentation, and mask classification methods heavily rely on additional explicit mask proposers. Meanwhile, prevalent methods utilize only seen categories which is a great waste, i.e., neglecting the area exists but not annotated. To this end, we propose CLIPTeacher, a new learning framework that can be applied to various per-pixel classification segmentation models without introducing any explicit mask proposer or changing the structure of CLIP, and utilize both seen and ignoring areas. Specifically, CLIPTeacher consists of two key modules: Global Learning Module (GLM) and Pixel Learning Module (PLM). Specifically, GLM aligns the dense features from an image encoder with the CLS token, i.e., the only token trained in CLIP, which is a simple but effective way to probe global information from the CLIP models. In contrast, PLM only leverages dense tokens from CLIP to produce high-level pseudo annotations for ignoring areas without introducing any extra mask proposer. Meanwhile, PLM can fully take advantage of the whole image based on the pseudo annotations. Experimental results on three benchmark datasets: PASCAL VOC 2012, COCO-Stuff 164k, and PASCAL Context show large performance gains, i.e., 2.2%, 1.3%, and 8.8%
    摘要 现有的泛化零样本语义分割(GZLSS)方法通常是通过 finetuning CLIP 范式或者将其表示为掩码分类任务,借助于视觉语言模型(VLM)。然而,fine-tuning 方法受到固定骨干模型的限制,在分割任务上不够灵活,而掩码分类方法严重依赖额外的显式 mask proposer。此外,普遍的方法只使用已经看到的类别,这是一种大的浪费,即忽略了存在但未标注的区域。为了解决这个问题,我们提出了 CLIPTeacher,一种新的学习框架,可以应用于不同的逐像素分类分割模型,无需添加显式的 mask proposer,并且可以同时利用已见区域和被忽略的区域。CLIPTeacher 包括两个关键模块:全局学习模块(GLM)和像素学习模块(PLM)。具体来说,GLM 将图像编码器中的稠密特征与 CLIP 模型中的 CLS token(CLIP 中唯一经过训练的 token)进行对齐,以一种简单而有效的方式从 CLIP 模型中获取全局信息。相比之下,PLM 只使用 CLIP 模型中的稠密 token,为被忽略的区域生成高级伪标注,而不需要添加额外的 mask proposer;同时,PLM 可以基于伪标注充分利用整个图像。实验结果在 PASCAL VOC 2012、COCO-Stuff 164k 和 PASCAL Context 三个标准测试集上显示了大的性能提升,分别为 2.2%、1.3% 和 8.8%。
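
A rough illustration of the GLM idea (aligning a segmentation encoder's dense features with CLIP's CLS token) is sketched below. The global average pooling, the extra projection layer, and the cosine-distance objective are assumptions for illustration, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def global_alignment_loss(dense_feats, clip_cls_token, proj):
    """GLM-style loss sketch: pool a segmentation backbone's dense features,
    project them to CLIP's embedding width, and pull them toward the CLS token.

    dense_feats:    (B, C, H, W) features from the segmentation image encoder
    clip_cls_token: (B, D) CLS embedding from a frozen CLIP visual encoder
    proj:           nn.Linear(C, D) trainable projection (an assumption here)
    """
    pooled = dense_feats.mean(dim=(2, 3))               # (B, C) global pooling
    pooled = F.normalize(proj(pooled), dim=-1)
    cls = F.normalize(clip_cls_token, dim=-1)
    return (1.0 - (pooled * cls).sum(dim=-1)).mean()    # cosine-distance loss

# toy usage
proj = torch.nn.Linear(256, 512)
loss = global_alignment_loss(torch.randn(4, 256, 32, 32), torch.randn(4, 512), proj)
loss.backward()
print(loss.item())
```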

Improving style transfer in dynamic contrast enhanced MRI using a spatio-temporal approach

  • paper_url: http://arxiv.org/abs/2310.01908
  • repo_url: None
  • paper_authors: Adam G. Tattersall, Keith A. Goatman, Lucy E. Kershaw, Scott I. K. Semple, Sonia Dahdouh
  • for: 这个论文旨在解决DCE-MRI中的样式传递问题,因为不同的组织和时间点的增强效应具有大量的变化。
  • methods: 该论文提出了一种新的方法,它将Autoencoder与卷积LSTM结合,以便分解内容和风格,并使用适应性的卷积来处理增强效应的地方化特性。
  • results: 论文的实验结果表明,该方法在两个不同的数据集上均优于当前最先进方法。
    Abstract Style transfer in DCE-MRI is a challenging task due to large variations in contrast enhancements across different tissues and time. Current unsupervised methods fail due to the wide variety of contrast enhancement and motion between the images in the series. We propose a new method that combines autoencoders to disentangle content and style with convolutional LSTMs to model predicted latent spaces along time and adaptive convolutions to tackle the localised nature of contrast enhancement. To evaluate our method, we propose a new metric that takes into account the contrast enhancement. Qualitative and quantitative analyses show that the proposed method outperforms the state of the art on two different datasets.
    摘要 在DCE-MRI中的风格迁移是一项具有挑战性的任务,因为不同组织和不同时间点的对比增强差异很大。现有的无监督方法由于图像序列中对比增强和运动的多样性而失败。我们提出一种新方法,它结合自编码器来分离内容和风格,使用卷积LSTM来沿时间建模预测的潜在空间,并采用自适应卷积来处理对比增强的局部特性。为评估我们的方法,我们提出了一个考虑对比增强的新度量标准。定性和定量分析表明,我们的方法在两个不同的数据集上均优于当前最先进方法。

Beyond the Benchmark: Detecting Diverse Anomalies in Videos

  • paper_url: http://arxiv.org/abs/2310.01904
  • repo_url: https://github.com/yoavarad/mfad
  • paper_authors: Yoav Arad, Michael Werman
  • for: 本研究旨在推动视频异常检测(VAD)领域的发展,扩展传统的单帧异常检测范畴,处理复杂的动作异常情况。
  • methods: 本研究提出了两个新的数据集:HMDB-AD和HMDB-Violence,以挑战模型处理多种基于动作的异常情况。此外,研究人员还提出了一种新的多帧异常检测方法(MFAD),基于AI-VAD框架,使用单帧特征和两帧特征,并结合深度视频编码特征与逻辑回归来计算异常分数。
  • results: 实验结果表明,现有模型在新的异常类型面前存在局限,MFAD方法在简单异常检测和复杂异常检测场景中均表现出色。
    Abstract Video Anomaly Detection (VAD) plays a crucial role in modern surveillance systems, aiming to identify various anomalies in real-world situations. However, current benchmark datasets predominantly emphasize simple, single-frame anomalies such as novel object detection. This narrow focus restricts the advancement of VAD models. In this research, we advocate for an expansion of VAD investigations to encompass intricate anomalies that extend beyond conventional benchmark boundaries. To facilitate this, we introduce two datasets, HMDB-AD and HMDB-Violence, to challenge models with diverse action-based anomalies. These datasets are derived from the HMDB51 action recognition dataset. We further present Multi-Frame Anomaly Detection (MFAD), a novel method built upon the AI-VAD framework. AI-VAD utilizes single-frame features such as pose estimation and deep image encoding, and two-frame features such as object velocity. They then apply a density estimation algorithm to compute anomaly scores. To address complex multi-frame anomalies, we add a deep video encoding features capturing long-range temporal dependencies, and logistic regression to enhance final score calculation. Experimental results confirm our assumptions, highlighting existing models limitations with new anomaly types. MFAD excels in both simple and complex anomaly detection scenarios.
    摘要 视频异常检测(VAD)在现代监测系统中扮演着关键角色,旨在在实际情况中发现多种异常。然而,当前的标准数据集主要强调简单的单帧异常,如物品检测。这种狭隘的焦点限制了VAD模型的发展。在本研究中,我们主张拓展VAD研究,以涵盖更为复杂的异常。为此,我们介绍了两个数据集:HMDB-AD和HMDB-Violence,以挑战模型。这两个数据集都来自于HMDB51动作识别数据集。我们还提出了一种新的多帧异常检测方法(MFAD),基于AI-VAD框架。AI-VAD使用单帧特征,如pose estimation和深度图像编码,以及两帧特征,如物体速度。然后,它们应用某种密度估计算法计算异常分数。为了处理复杂的多帧异常,我们添加了深度视频编码特征,捕捉长距离时间相关性,以及逻辑回归来增强最终分数计算。实验结果证明了我们的假设,显示了现有模型对新类型异常的局限性。MFAD在简单和复杂异常检测场景中均表现出色。
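
The scoring pipeline the abstract builds on (per-feature density estimation on normal data, then a final score combination, here via logistic regression as MFAD adds) could look roughly like the sketch below. All feature names, the kernel-density estimator, and the small labeled calibration set are illustrative assumptions; the real AI-VAD/MFAD pipeline differs in its exact estimators and supervision.

```python
import numpy as np
from sklearn.neighbors import KernelDensity
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Illustrative per-object feature sets: pose, velocity, deep video embedding.
train_feats = {name: rng.normal(size=(500, d))
               for name, d in [("pose", 8), ("velocity", 2), ("video", 16)]}

# 1) One density model per feature type, fit on normal training data only.
density = {name: KernelDensity(bandwidth=0.5).fit(x) for name, x in train_feats.items()}

def per_feature_scores(feats):
    """Negative log-density per feature type -> higher means more anomalous."""
    return np.stack([-density[name].score_samples(feats[name]) for name in density], axis=1)

# 2) Logistic regression combines the per-feature scores into a final score.
#    (The small labeled calibration set here is fabricated purely for illustration.)
calib_feats = {n: rng.normal(size=(200, x.shape[1])) for n, x in train_feats.items()}
calib_labels = rng.integers(0, 2, size=200)
clf = LogisticRegression().fit(per_feature_scores(calib_feats), calib_labels)

test_feats = {n: rng.normal(size=(10, x.shape[1])) for n, x in train_feats.items()}
anomaly_prob = clf.predict_proba(per_feature_scores(test_feats))[:, 1]
print(anomaly_prob.round(3))
```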

MFOS: Model-Free & One-Shot Object Pose Estimation

  • paper_url: http://arxiv.org/abs/2310.01897
  • repo_url: None
  • paper_authors: JongMin Lee, Yohann Cabon, Romain Brégier, Sungjoo Yoo, Jerome Revaud
  • for: 能够在单次前向传播中估计训练时从未见过的物体的姿态,且只需极少的输入。
  • methods: 我们提出了一种使用 transformer 架构的方法, 可以充分利用最近提出的 3D-geometry 通用预训练。
  • results: 我们在 LINEMOD benchmark 上进行了广泛的实验,并报告了最先进的单样本(one-shot)性能。
    Abstract Existing learning-based methods for object pose estimation in RGB images are mostly model-specific or category based. They lack the capability to generalize to new object categories at test time, hence severely hindering their practicability and scalability. Notably, recent attempts have been made to solve this issue, but they still require accurate 3D data of the object surface at both train and test time. In this paper, we introduce a novel approach that can estimate in a single forward pass the pose of objects never seen during training, given minimum input. In contrast to existing state-of-the-art approaches, which rely on task-specific modules, our proposed model is entirely based on a transformer architecture, which can benefit from recently proposed 3D-geometry general pretraining. We conduct extensive experiments and report state-of-the-art one-shot performance on the challenging LINEMOD benchmark. Finally, extensive ablations allow us to determine good practices with this relatively new type of architecture in the field.
    摘要 现有的基于学习的RGB图像对象姿态估计方法大多是模型特定或基于类别的。它们缺乏在测试时对新类别对象进行泛化的能力,因此很大程度上阻碍了它们的实用性和可扩展性。值得注意的是,最近有一些尝试解决这个问题,但它们仍然需要在训练和测试时提供准确的3D对象表面数据。在这篇论文中,我们介绍了一种新的方法,可以在单次前向传播中估计训练时未见过的对象姿态,只需最少的输入。与现有的最先进方法不同,我们提出的模型完全基于 transformer 架构,可以从最近提出的3D几何通用预训练中受益。我们进行了广泛的实验,并在LINEMOD基准上报告了最先进的单样本(one-shot)性能。最后,广泛的消融实验使我们能够确定这种相对较新的架构在该领域中的最佳实践。

Adaptive Multi-NeRF: Exploit Efficient Parallelism in Adaptive Multiple Scale Neural Radiance Field Rendering

  • paper_url: http://arxiv.org/abs/2310.01881
  • repo_url: None
  • paper_authors: Tong Wang, Shuichi Kurabayashi
  • for: 这个论文的目的是提高NeRF的 Rendering速度,使其适用于实时渲染应用。
  • methods: 该方法使用树状层级结构将场景自适应地划分为轴对齐包围盒,并根据各场景部分的复杂度将不同大小的NeRF分配给不同的子空间。它还使用引导密度网格来均衡每个多层感知机(MLP)的表示能力。
  • results: 该方法可以大幅提高GPU的利用率,并加速渲染过程,使其更适合实时渲染应用。
    Abstract Recent advances in Neural Radiance Fields (NeRF) have demonstrated significant potential for representing 3D scene appearances as implicit neural networks, enabling the synthesis of high-fidelity novel views. However, the lengthy training and rendering process hinders the widespread adoption of this promising technique for real-time rendering applications. To address this issue, we present an effective adaptive multi-NeRF method designed to accelerate the neural rendering process for large scenes with unbalanced workloads due to varying scene complexities. Our method adaptively subdivides scenes into axis-aligned bounding boxes using a tree hierarchy approach, assigning smaller NeRFs to different-sized subspaces based on the complexity of each scene portion. This ensures the underlying neural representation is specific to a particular part of the scene. We optimize scene subdivision by employing a guidance density grid, which balances representation capability for each Multilayer Perceptron (MLP). Consequently, samples generated by each ray can be sorted and collected for parallel inference, achieving a balanced workload suitable for small MLPs with consistent dimensions for regular and GPU-friendly computations. We also demonstrate an efficient NeRF sampling strategy that intrinsically adapts to increase parallelism, utilization, and reduce kernel calls, thereby achieving much higher GPU utilization and accelerating the rendering process.
    摘要 Our method subdivides scenes into axis-aligned bounding boxes using a tree hierarchy approach, assigning smaller NeRFs to different-sized subspaces based on the complexity of each scene portion. This ensures that the underlying neural representation is specific to a particular part of the scene. We optimize scene subdivision by employing a guidance density grid, which balances representation capability for each Multilayer Perceptron (MLP). Consequently, samples generated by each ray can be sorted and collected for parallel inference, achieving a balanced workload suitable for small MLPs with consistent dimensions for regular and GPU-friendly computations. We have also demonstrated an efficient NeRF sampling strategy that intrinsically adapts to increase parallelism, utilization, and reduce kernel calls, thereby achieving much higher GPU utilization and accelerating the rendering process.
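
The adaptive subdivision step (splitting the scene into axis-aligned boxes until the guidance density inside each box is small enough, then sizing each sub-NeRF by the remaining complexity) can be sketched as follows. The recursion rule, the budget threshold, and the toy density grid are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def subdivide(box, density, budget=200.0, min_size=8):
    """Recursively split an axis-aligned box (in voxel coords) until the summed
    guidance density inside it falls under `budget`, mimicking the adaptive
    scene partition described above. Returns a list of (box, complexity) leaves.
    box: ((x0, x1), (y0, y1), (z0, z1)) half-open voxel ranges.
    """
    (x0, x1), (y0, y1), (z0, z1) = box
    mass = density[x0:x1, y0:y1, z0:z1].sum()
    sizes = (x1 - x0, y1 - y0, z1 - z0)
    if mass <= budget or max(sizes) <= min_size:
        return [(box, float(mass))]
    axis = int(np.argmax(sizes))                    # split the longest axis
    lo, hi = box[axis]
    mid = (lo + hi) // 2
    left, right = list(box), list(box)
    left[axis], right[axis] = (lo, mid), (mid, hi)
    return subdivide(tuple(left), density, budget, min_size) + \
           subdivide(tuple(right), density, budget, min_size)

# toy guidance density grid with one "complex" corner
density = np.ones((64, 64, 64)) * 0.01
density[:16, :16, :16] = 2.0
leaves = subdivide(((0, 64), (0, 64), (0, 64)), density)
for box, mass in leaves[:4]:
    print(box, f"complexity={mass:.1f}  -> assign a larger MLP if complexity is high")
print(f"{len(leaves)} sub-NeRF regions")
```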

A Dual Attentive Generative Adversarial Network for Remote Sensing Image Change Detection

  • paper_url: http://arxiv.org/abs/2310.01876
  • repo_url: None
  • paper_authors: Luyi Qiu, Xiaofeng Zhang, ChaoChen Gu, and ShanYing Zhu
  • for: 本研究针对高分辨率 remote sensing 像素数据的变化探测任务进行研究,旨在提高检测精度和效率。
  • methods: 本研究提出了一个基于双注意力生成对抗网络(DAGAN)的方法,具有以下特点:(1)将检测模型视为生成器,通过生成对抗策略获得检测模型的最优权重,而不增加检测模型的参数数目;(2)设计多层特征提取器,利用预训练模型从双时相图像中提取多层特征,并引入聚合连接来融合这些特征;(3)提出多尺度自适应融合模块,通过不同感受野自适应地融合多尺度特征,以增强对多尺度目标的识别,并引入上下文精炼模块来探索上下文依赖关系。
  • results: 实验结果显示,DAGAN 架构在 LEVIR 数据集上的平均 IoU 和 F1 分数为 85.01% 和 91.48%,优于先进方法的表现。
    Abstract Remote sensing change detection between bi-temporal images receives growing concentration from researchers. However, comparing two bi-temporal images for detecting changes is challenging, as they demonstrate different appearances. In this paper, we propose a dual attentive generative adversarial network for achieving very high-resolution remote sensing image change detection tasks, which regards the detection model as a generator and attains the optimal weights of the detection model without increasing the parameters of the detection model through generative-adversarial strategy, boosting the spatial contiguity of predictions. Moreover, We design a multi-level feature extractor for effectively fusing multi-level features, which adopts the pre-trained model to extract multi-level features from bi-temporal images and introduces aggregate connections to fuse them. To strengthen the identification of multi-scale objects, we propose a multi-scale adaptive fusion module to adaptively fuse multi-scale features through various receptive fields and design a context refinement module to explore contextual dependencies. Moreover, the DAGAN framework utilizes the 4-layer convolution network as a discriminator to identify whether the synthetic image is fake or real. Extensive experiments represent that the DAGAN framework has better performance with 85.01% mean IoU and 91.48% mean F1 score than advanced methods on the LEVIR dataset.
    摘要 双时相图像之间的遥感变化检测日益受到研究人员的关注。然而,比较两幅双时相图像以检测变化具有挑战性,因为它们的外观存在差异。在这篇论文中,我们提出一种双注意力生成对抗网络(DAGAN),用于实现高分辨率遥感图像变化检测任务。DAGAN将检测模型视为生成器,通过生成对抗策略获得检测模型的最优权重,而无需增加检测模型的参数,从而提高了预测的空间连续性。此外,我们设计了一种多级特征提取器,用于有效地融合多级特征:采用预训练模型从双时相图像中提取多级特征,并引入聚合连接来融合它们。为了加强对多尺度目标的识别,我们提出了多尺度自适应融合模块,可以通过不同的感受野自适应地融合多尺度特征,并设计了上下文精炼模块来探索上下文依赖关系。此外,DAGAN框架还利用4层卷积网络作为判别器,以判断合成图像是真实的还是伪造的。广泛的实验表明,DAGAN框架在LEVIR数据集上的表现更好,其中 mean IoU 为85.01%,mean F1 score 为91.48%。

Shifting More Attention to Breast Lesion Segmentation in Ultrasound Videos

  • paper_url: http://arxiv.org/abs/2310.01861
  • repo_url: https://github.com/jhl-det/fla-net
  • paper_authors: Junhao Lin, Qian Dai, Lei Zhu, Huazhu Fu, Qiong Wang, Weibin Li, Wenhao Rao, Xiaoyang Huang, Liansheng Wang
  • for: 这个研究旨在改进超声视频中的乳腺病灶分割,以辅助腋窝淋巴结转移的诊断和治疗。
  • methods: 研究人员提出了一个新的频率和定位特征聚合网络(FLA-Net),从频域学习时间特征,并额外预测病灶的位置。他们还提出了一个基于定位的对比损失函数,以减小同一视频中相邻帧之间病灶位置的距离,并增大不同超声视频帧之间病灶位置的距离。
  • results: 研究人员在自己标注的数据集和两个公开视频数据集上进行了实验,结果显示,所提出的 FLA-Net 在超声视频乳腺病灶分割和视频息肉分割中都达到了最先进的性能,同时显著降低了时间和空间复杂度。
    Abstract Breast lesion segmentation in ultrasound (US) videos is essential for diagnosing and treating axillary lymph node metastasis. However, the lack of a well-established and large-scale ultrasound video dataset with high-quality annotations has posed a persistent challenge for the research community. To overcome this issue, we meticulously curated a US video breast lesion segmentation dataset comprising 572 videos and 34,300 annotated frames, covering a wide range of realistic clinical scenarios. Furthermore, we propose a novel frequency and localization feature aggregation network (FLA-Net) that learns temporal features from the frequency domain and predicts additional lesion location positions to assist with breast lesion segmentation. We also devise a localization-based contrastive loss to reduce the lesion location distance between neighboring video frames within the same video and enlarge the location distances between frames from different ultrasound videos. Our experiments on our annotated dataset and two public video polyp segmentation datasets demonstrate that our proposed FLA-Net achieves state-of-the-art performance in breast lesion segmentation in US videos and video polyp segmentation while significantly reducing time and space complexity. Our model and dataset are available at https://github.com/jhl-Det/FLA-Net.
    摘要 超声(US)视频中的乳腺病灶分割对于诊断和治疗腋窝淋巴结转移至关重要。然而,由于缺乏成熟、大规模且具有高质量标注的US视频数据集,相关研究一直面临挑战。为了解决这个问题,我们精心构建了一个包含572个US视频和34,300个标注帧的乳腺病灶分割数据集,覆盖了许多真实的临床场景。此外,我们提出了一种频率和定位特征聚合网络(FLA-Net),可以从频域学习时间特征,并额外预测病灶位置。我们还提出了一种基于定位的对比损失函数,可以减小同一个US视频中相邻帧之间病灶位置的距离,并增大不同US视频帧之间的距离。我们的实验表明,所提出的FLA-Net在US视频乳腺病灶分割和视频息肉分割中都达到了最先进的性能,同时显著降低了时间和空间复杂度。我们的模型和数据集可以在 https://github.com/jhl-Det/FLA-Net 获取。
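
A minimal sketch of the localization-based contrastive idea (pull predicted lesion locations of neighboring frames from the same video together, push locations from different videos apart) is shown below, assuming a simple Euclidean/hinge formulation; the actual FLA-Net loss may differ in its exact form.

```python
import torch
import torch.nn.functional as F

def localization_contrastive_loss(loc_a, loc_b, same_video, margin=1.0):
    """Sketch of a localization-based contrastive loss.

    loc_a, loc_b: (B, 2) predicted lesion centers (normalized x, y) for paired
                  frames; pairs with same_video[i]=1 come from neighboring
                  frames of one video, otherwise from two different videos.
    Pulls same-video pairs together, pushes different-video pairs beyond `margin`.
    """
    d = torch.norm(loc_a - loc_b, dim=1)
    pos = same_video * d.pow(2)
    neg = (1 - same_video) * F.relu(margin - d).pow(2)
    return (pos + neg).mean()

# toy usage
loc_a = torch.rand(8, 2, requires_grad=True)
loc_b = torch.rand(8, 2)
same_video = torch.tensor([1, 1, 1, 1, 0, 0, 0, 0], dtype=torch.float32)
loss = localization_contrastive_loss(loc_a, loc_b, same_video)
loss.backward()
print(loss.item())
```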

Selective Feature Adapter for Dense Vision Transformers

  • paper_url: http://arxiv.org/abs/2310.01843
  • repo_url: None
  • paper_authors: Xueqing Deng, Qi Fan, Xiaojie Jin, Linjie Yang, Peng Wang
  • for: 这个论文旨在解决预训练 transformer 模型参数量巨大的问题,在可训练参数预算受限的情况下提升密集预测任务的性能。
  • methods: 这个方法使用选择性特征适配器(SFA),包括外部适配器和内部适配器,两者依次作用于 transformer 模型之上。
  • results: 实验结果显示,这个方法在密集预测任务上达到了最先进(SoTA)性能,并且在不同任务上的表现优于或相当于完全 fine-tune 的模型。
    Abstract Fine-tuning pre-trained transformer models, e.g., Swin Transformer, are successful in numerous downstream for dense prediction vision tasks. However, one major issue is the cost/storage of their huge amount of parameters, which becomes increasingly challenging to handle with the growing amount of vision tasks. In this paper, we propose an effective approach to alleviate the issue, namely selective feature adapter (SFA). It achieves state-of-the-art (SoTA) performance under any given budget of trainable parameters, and demonstrates comparable or better performance than fully fine-tuned models across various dense tasks. Specifically, SFA consists of external adapters and internal adapters which are sequentially operated over a transformer model. For external adapters, we properly select the places and amount of additional multilayer perception (MLP). For internal adapters, we transform a few task-important parameters inside the transformer, which are automatically discovered through a simple yet effective lottery ticket algorithm. Our experiments show that the dual adapter module, a.k.a SFA, is essential to achieve the best trade-off on dense vision tasks, such as segmentation, detection and depth-estimation, outperforming other adapters with a single module.
    摘要 微调预训练的变换器模型(如 Swin Transformer)在许多下游密集预测视觉任务中取得了优秀的表现。然而,一个主要问题是其庞大参数量带来的成本和存储开销,随着视觉任务数量的不断增加,这一问题变得越来越难以处理。在这篇论文中,我们提出了一种有效的方法来缓解这个问题,即选择性特征适配器(SFA)。它在任何给定的可训练参数预算下都能实现最先进(SoTA)的表现,并在各种密集预测任务上达到与完全微调模型相当或更好的表现。SFA由外部适配器和内部适配器两部分组成,这两部分在变换器模型之上依次运行。对于外部适配器,我们合理地选择附加多层感知机(MLP)的位置和数量。对于内部适配器,我们通过一种简单而有效的彩票(lottery ticket)算法,自动发现变换器中对任务重要的少量参数并对其进行变换。我们的实验表明,这种双适配器模块(即SFA)对于在分割、检测和深度估计等密集视觉任务上取得最佳权衡至关重要,优于仅使用单一模块的其他适配器。
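
One way to realize the "internal adapter" idea (automatically picking a small set of task-important parameters inside the transformer and training only those) is sketched below with a gradient-magnitude scoring rule. This is a generic lottery-ticket-style stand-in, under the assumption that importance is scored from a few task batches; it is not the paper's exact algorithm.

```python
import torch
import torch.nn as nn

def select_internal_adapter(model, loss_fn, batches, keep_ratio=0.01):
    """Pick a small set of 'task-important' parameters to fine-tune, freezing the
    rest. Returns {param_name: bool mask of trainable entries}."""
    scores = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for x, y in batches:                      # a few task batches for scoring
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                scores[n] += p.grad.abs()
    flat = torch.cat([s.flatten() for s in scores.values()])
    k = max(1, int(keep_ratio * flat.numel()))
    threshold = torch.topk(flat, k).values.min()
    return {n: s >= threshold for n, s in scores.items()}

def masked_sgd_step(model, masks, lr=1e-3):
    """Update only the selected entries; everything else stays frozen."""
    with torch.no_grad():
        for n, p in model.named_parameters():
            if p.grad is not None:
                p -= lr * p.grad * masks[n]

# toy usage on a tiny model standing in for a transformer backbone
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
batches = [(torch.randn(8, 16), torch.randint(0, 4, (8,))) for _ in range(3)]
masks = select_internal_adapter(model, nn.CrossEntropyLoss(), batches, keep_ratio=0.05)
print({n: int(m.sum()) for n, m in masks.items()})
```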

SelfGraphVQA: A Self-Supervised Graph Neural Network for Scene-based Question Answering

  • paper_url: http://arxiv.org/abs/2310.01842
  • repo_url: None
  • paper_authors: Bruno Souza, Marius Aasan, Helio Pedrini, Adín Ramírez Rivera
  • for: 本研究旨在提高基于Scene Graph(SG)的Visual Question Answering(VQA)任务的性能。
  • methods: 我们提出了SelfGraphVQA框架,其使用预训练Scene Graph生成器提取图像中的Scene Graph,并通过自监督技术对其进行保持语义的增强(semantically-preserving augmentation)。我们还研究了三种最大化策略:节点级、图级和置换等变正则化。
  • results: 我们的实验表明,使用提取的Scene Graph可以提高VQA任务的性能,并且这些方法凸显了视觉信息的重要性。
    Abstract The intersection of vision and language is of major interest due to the increased focus on seamless integration between recognition and reasoning. Scene graphs (SGs) have emerged as a useful tool for multimodal image analysis, showing impressive performance in tasks such as Visual Question Answering (VQA). In this work, we demonstrate that despite the effectiveness of scene graphs in VQA tasks, current methods that utilize idealized annotated scene graphs struggle to generalize when using predicted scene graphs extracted from images. To address this issue, we introduce the SelfGraphVQA framework. Our approach extracts a scene graph from an input image using a pre-trained scene graph generator and employs semantically-preserving augmentation with self-supervised techniques. This method improves the utilization of graph representations in VQA tasks by circumventing the need for costly and potentially biased annotated data. By creating alternative views of the extracted graphs through image augmentations, we can learn joint embeddings by optimizing the informational content in their representations using an un-normalized contrastive approach. As we work with SGs, we experiment with three distinct maximization strategies: node-wise, graph-wise, and permutation-equivariant regularization. We empirically showcase the effectiveness of the extracted scene graph for VQA and demonstrate that these approaches enhance overall performance by highlighting the significance of visual information. This offers a more practical solution for VQA tasks that rely on SGs for complex reasoning questions.
    摘要 “视觉语言交叉是当前研究热点,因为它们在识别和理解之间的集成提供了更好的性能。场景图(SG)在多模态图像分析中表现出色,特别是在视觉问答(VQA)任务中。在这项工作中,我们发现了使用预训练场景图生成器生成场景图后,使用自我相似的扩展技术来增强图表示的方法。这种方法可以减少使用高成本和可能受损的标注数据,并且可以增强图表示的信息内容。我们在实验中使用了三种不同的最大化策略:节点级、图像级和对称变换的正则化。我们证明了这种方法可以提高 VQA 任务的总性能,并且强调了视觉信息的重要性。这个方法可以提供更实际的解决方案 для VQA 任务,尤其是当它们基于场景图进行复杂的推理问题。”

Self-Supervised High Dynamic Range Imaging with Multi-Exposure Images in Dynamic Scenes

  • paper_url: http://arxiv.org/abs/2310.01840
  • repo_url: https://github.com/cszhilu1998/selfhdr
  • paper_authors: Zhilu Zhang, Haoyu Wang, Shuai Liu, Xiaotao Wang, Lei Lei, Wangmeng Zuo
  • for: 提高高动态范围(HDR)图像的重建,不需要 Labelled数据。
  • methods: 基于自我监督学习的HDR重建方法,使用多张多曝光图像进行训练,并通过两个 complementary component来捕捉HDR颜色和结构信息。
  • results: 在实际图像上进行测试,SelfHDR方法可以获得supervised方法相当的性能,并且超过自我监督方法的性能。代码可以在https://github.com/cszhilu1998/SelfHDR中下载。
    Abstract Merging multi-exposure images is a common approach for obtaining high dynamic range (HDR) images, with the primary challenge being the avoidance of ghosting artifacts in dynamic scenes. Recent methods have proposed using deep neural networks for deghosting. However, the methods typically rely on sufficient data with HDR ground-truths, which are difficult and costly to collect. In this work, to eliminate the need for labeled data, we propose SelfHDR, a self-supervised HDR reconstruction method that only requires dynamic multi-exposure images during training. Specifically, SelfHDR learns a reconstruction network under the supervision of two complementary components, which can be constructed from multi-exposure images and focus on HDR color as well as structure, respectively. The color component is estimated from aligned multi-exposure images, while the structure one is generated through a structure-focused network that is supervised by the color component and an input reference (\eg, medium-exposure) image. During testing, the learned reconstruction network is directly deployed to predict an HDR image. Experiments on real-world images demonstrate our SelfHDR achieves superior results against the state-of-the-art self-supervised methods, and comparable performance to supervised ones. Codes are available at https://github.com/cszhilu1998/SelfHDR
    摘要 合并多曝光图像是获得高动态范围(HDR)图像的一种常见方法,主要挑战是避免在动态场景中出现鬼影伪影。现有方法通常使用深度神经网络进行去鬼影(deghosting),但这些方法通常需要带有HDR真值的数据,而这类数据的收集既困难又昂贵。为消除对标注数据的依赖,我们提出了SelfHDR,一种自监督HDR重建方法,训练时只需要动态场景下的多曝光图像。具体来说,SelfHDR在两个互补分量的监督下学习一个重建网络,这两个分量均可从多曝光图像中构建,分别关注HDR的颜色和结构。颜色分量是从对齐后的多曝光图像中估计得到的,而结构分量则由一个以结构为重点的网络生成,该网络由颜色分量和输入参考图像(例如中等曝光图像)进行监督。在测试时,学习到的重建网络可直接用于预测HDR图像。实验结果表明,我们的SelfHDR方法在真实图像上优于当前最先进的自监督方法,并与有监督方法性能相当。代码可以在 https://github.com/cszhilu1998/SelfHDR 中找到。

Skin the sheep not only once: Reusing Various Depth Datasets to Drive the Learning of Optical Flow

  • paper_url: http://arxiv.org/abs/2310.01833
  • repo_url: None
  • paper_authors: Sheng-Chi Huang, Wei-Chen Chiu
  • for: 提高视觉和机器人应用中的光流估计精度,解决现有方法在实际场景中难以获得有效的光流真实数据问题。
  • methods: 利用光流估计与立体匹配之间的几何联系,统一多种真实世界深度估计数据集以生成用于光流的有监督训练数据,并通过对光流图像对中的一张图像施加几何增强,使光流估计器能够从更具挑战性的情况中学习。
  • results: 通过对多种数据集和不同光流估计模型进行广泛的实验,证明提出的方法能够提高光流估计精度和普适性,并且不是与特定的光流估计器相关。
    Abstract Optical flow estimation is crucial for various applications in vision and robotics. As the difficulty of collecting ground truth optical flow in real-world scenarios, most of the existing methods of learning optical flow still adopt synthetic dataset for supervised training or utilize photometric consistency across temporally adjacent video frames to drive the unsupervised learning, where the former typically has issues of generalizability while the latter usually performs worse than the supervised ones. To tackle such challenges, we propose to leverage the geometric connection between optical flow estimation and stereo matching (based on the similarity upon finding pixel correspondences across images) to unify various real-world depth estimation datasets for generating supervised training data upon optical flow. Specifically, we turn the monocular depth datasets into stereo ones via synthesizing virtual disparity, thus leading to the flows along the horizontal direction; moreover, we introduce virtual camera motion into stereo data to produce additional flows along the vertical direction. Furthermore, we propose applying geometric augmentations on one image of an optical flow pair, encouraging the optical flow estimator to learn from more challenging cases. Lastly, as the optical flow maps under different geometric augmentations actually exhibit distinct characteristics, an auxiliary classifier which trains to identify the type of augmentation from the appearance of the flow map is utilized to further enhance the learning of the optical flow estimator. Our proposed method is general and is not tied to any particular flow estimator, where extensive experiments based on various datasets and optical flow estimation models verify its efficacy and superiority.
    摘要 光流估计是视觉和机器人应用中的关键问题。由于在实际场景中难以收集光流真值数据,大多数现有的光流学习方法仍然采用合成数据集进行有监督训练,或利用时间相邻视频帧之间的光度一致性来驱动无监督学习;前者通常存在泛化性问题,而后者的性能通常不如有监督方法。为解决这些挑战,我们提议利用光流估计与立体匹配之间的几何联系(两者都基于在图像间寻找像素对应关系),统一各种真实世界深度估计数据集,以生成用于光流的有监督训练数据。具体来说,我们通过合成虚拟视差将单目深度数据集转化为立体数据,从而产生沿水平方向的光流;此外,我们在立体数据中引入虚拟相机运动,以产生沿垂直方向的额外光流。我们还提议对光流图像对中的一张图像应用几何增强,促使光流估计器学习更具挑战性的情况。最后,由于不同几何增强下的光流图实际上表现出不同的特征,我们引入一个辅助分类器,通过光流图的外观来识别增强的类型,从而进一步提升光流估计器的学习效果。我们提出的方法具有通用性,不绑定于任何特定的光流估计器;基于多种数据集和光流估计模型的大量实验验证了其有效性和优越性。
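
The conversion from monocular depth to supervised flow described above (virtual disparity for horizontal flow, virtual camera motion for vertical flow) boils down to the pinhole relation flow = f * t / Z. A minimal NumPy sketch, with all camera parameters invented for illustration:

```python
import numpy as np

def depth_to_virtual_flow(depth, fx=500.0, fy=500.0, baseline=0.1, dy_cam=0.05):
    """Convert a monocular depth map (meters) into a dense supervisory flow
    field: a virtual horizontal baseline produces horizontal flow (stereo
    disparity), and a virtual vertical camera translation produces vertical
    flow. Returns flow of shape (H, W, 2) in pixels."""
    z = np.clip(depth, 1e-3, None)
    flow_x = fx * baseline / z        # horizontal flow from virtual disparity
    flow_y = fy * dy_cam / z          # vertical flow from virtual camera motion
    return np.stack([flow_x, flow_y], axis=-1)

# toy usage: a fronto-parallel scene 2-10 m away
depth = np.random.uniform(2.0, 10.0, size=(240, 320))
flow = depth_to_virtual_flow(depth)
print(flow.shape, flow[..., 0].mean(), flow[..., 1].mean())
```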

AI-Generated Images as Data Source: The Dawn of Synthetic Era

  • paper_url: http://arxiv.org/abs/2310.01830
  • repo_url: https://github.com/mwxely/aigs
  • paper_authors: Zuhao Yang, Fangneng Zhan, Kunhao Liu, Muyu Xu, Shijian Lu
  • for: 本研究旨在探讨使用生成AI模型生成的图像作为视觉智能的新数据源,以改善传统的模型设计方法。
  • methods: 本研究采用了生成AI模型,如Generative Adversarial Networks (GANs)和Variational Autoencoders (VAEs),生成大量的图像数据,并对这些数据进行了分析和评估。
  • results: 研究发现,使用生成AI模型生成的图像数据可以提高视觉智能模型的性能,并且可以轻松地生成大量的 Edge cases 和罕见的场景,以便进行计算机模拟、测试和验证。
    Abstract The advancement of visual intelligence is intrinsically tethered to the availability of large-scale data. In parallel, generative Artificial Intelligence (AI) has unlocked the potential to create synthetic images that closely resemble real-world photographs. This prompts a compelling inquiry: how much visual intelligence could benefit from the advance of generative AI? This paper explores the innovative concept of harnessing these AI-generated images as new data sources, reshaping traditional modeling paradigms in visual intelligence. In contrast to real data, AI-generated data exhibit remarkable advantages, including unmatched abundance and scalability, the rapid generation of vast datasets, and the effortless simulation of edge cases. Built on the success of generative AI models, we examine the potential of their generated data in a range of applications, from training machine learning models to simulating scenarios for computational modeling, testing, and validation. We probe the technological foundations that support this groundbreaking use of generative AI, engaging in an in-depth discussion on the ethical, legal, and practical considerations that accompany this transformative paradigm shift. Through an exhaustive survey of current technologies and applications, this paper presents a comprehensive view of the synthetic era in visual intelligence. A project associated with this paper can be found at https://github.com/mwxely/AIGS .
    摘要 视觉智能的进步与大规模数据的可用性本质上密切相关。与此同时,生成式人工智能(AI)已经释放了合成逼真图像的潜力。这引发了一个值得深思的问题:视觉智能能从生成式AI的进步中获益多少?这篇论文探讨了将这些AI生成的图像作为新数据源的创新思路,以重塑传统的视觉智能建模范式。与真实数据相比,AI生成的数据具有显著优势,包括无可比拟的丰富性和可扩展性、可快速生成海量数据集、以及可轻松模拟边缘情况。基于生成式AI模型的成功,我们研究了这些生成数据在多种应用中的潜力,从训练机器学习模型到用于计算建模、测试和验证的场景模拟。我们还探讨了支撑这种开创性用法的技术基础,并对随之而来的伦理、法律和实践问题进行了深入讨论。通过对当前技术和应用的全面梳理,这篇论文呈现了视觉智能合成时代的全景视图。与本文相关的项目可参考 https://github.com/mwxely/AIGS 。

Amazing Combinatorial Creation: Acceptable Swap-Sampling for Text-to-Image Generation

  • paper_url: http://arxiv.org/abs/2310.01819
  • repo_url: None
  • paper_authors: Jun Li, Zedong Zhang, Jian Yang
  • for: 本研究旨在开发一种能够生成有意义的组合物图像,从多个文本描述中提取出人类创造力的机器学习系统。
  • methods: 我们提出了一种简单而高效的技术——可接受的交换采样(acceptable swap-sampling),通过交换两个文本 embedding 的列向量来构造新的嵌入,并使用先进的扩散模型来生成新的组合图像。
  • results: 实验结果表明,我们的方法在生成新颖且令人惊讶的组合图像方面超过了 Stable-Diffusion2、DALLE2、ERNIE-ViLG2 和 Bing 等最新方法。此外,在采样过程中,我们的方法无需训练和人类偏好,也可与使用人类偏好数据集训练的 PickScore 和 HPSv2 相媲美。
    Abstract Exploring a machine learning system to generate meaningful combinatorial object images from multiple textual descriptions, emulating human creativity, is a significant challenge as humans are able to construct amazing combinatorial objects, but machines strive to emulate data distribution. In this paper, we develop a straight-forward yet highly effective technique called acceptable swap-sampling to generate a combinatorial object image that exhibits novelty and surprise, utilizing text concepts of different objects. Initially, we propose a swapping mechanism that constructs a novel embedding by exchanging column vectors of two text embeddings for generating a new combinatorial image through a cutting-edge diffusion model. Furthermore, we design an acceptable region by managing suitable CLIP distances between the new image and the original concept generations, increasing the likelihood of accepting the new image with a high-quality combination. This region allows us to efficiently sample a small subset from a new image pool generated by using randomly exchanging column vectors. Lastly, we employ a segmentation method to compare CLIP distances among the segmented components, ultimately selecting the most promising object image from the sampled subset. Our experiments focus on text pairs of objects from ImageNet, and our results demonstrate that our approach outperforms recent methods such as Stable-Diffusion2, DALLE2, ERNIE-ViLG2 and Bing in generating novel and surprising object images, even when the associated concepts appear to be implausible, such as lionfish-abacus. Moreover, during the sampling process, our approach without training and human preference is also comparable to PickScore and HPSv2 trained using human preference datasets.
    摘要 探索一种能够从多个文本描述生成有意义的组合物体图像、模拟人类创造力的机器学习系统是一项重大挑战:人类能够构造出令人惊叹的组合物体,而机器则倾向于模仿数据分布。在这篇论文中,我们开发了一种直接而高效的技术——可接受的交换采样(acceptable swap-sampling)。我们提出一种交换机制,通过交换两个文本嵌入的列向量构造新的嵌入,并借助先进的扩散模型生成新的组合图像。此外,我们通过控制新图像与原始概念生成结果之间合适的 CLIP 距离来设计一个可接受区域,从而提高接受高质量组合图像的可能性。这个区域使我们能够从随机交换列向量生成的新图像池中高效地采样一个小子集;最后,我们采用分割方法比较各分割部分之间的 CLIP 距离,从采样子集中选出最有前景的物体图像。我们的实验针对 ImageNet 中的物体文本对,结果表明,即使相关概念看似不合理(例如 lionfish-abacus),我们的方法在生成新颖且令人惊讶的物体图像方面也超过了 Stable-Diffusion2、DALLE2、ERNIE-ViLG2 和 Bing 等最新方法。此外,在采样过程中,我们的方法无需训练和人类偏好,也可与使用人类偏好数据集训练的 PickScore 和 HPSv2 相媲美。
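
The two core operations, swapping column vectors between two prompt embeddings and filtering generations through a CLIP-distance "acceptable region", can be sketched as follows. The swap ratio, the distance band, and the random stand-ins for embeddings are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def swap_columns(emb_a, emb_b, swap_ratio=0.3):
    """Exchange a random subset of column vectors between two text embeddings
    of shape (tokens, dim), producing a new 'combinatorial' embedding."""
    new = emb_a.copy()
    cols = rng.choice(emb_a.shape[1], size=int(swap_ratio * emb_a.shape[1]), replace=False)
    new[:, cols] = emb_b[:, cols]
    return new

def clip_distance(x, y):
    """Cosine distance between two (flattened) CLIP-space vectors."""
    x, y = x.ravel(), y.ravel()
    return 1.0 - x @ y / (np.linalg.norm(x) * np.linalg.norm(y) + 1e-8)

def acceptable(candidate_img_emb, concept_img_embs, lo=0.25, hi=0.6):
    """Accept the generated image only if it is neither a copy of, nor unrelated
    to, the original concept generations (the 'acceptable region')."""
    dists = [clip_distance(candidate_img_emb, c) for c in concept_img_embs]
    return all(lo <= d <= hi for d in dists)

# toy usage with random stand-ins for text embeddings and CLIP image embeddings
emb_lionfish, emb_abacus = rng.normal(size=(77, 768)), rng.normal(size=(77, 768))
mixed_prompt_emb = swap_columns(emb_lionfish, emb_abacus)
candidate, concepts = rng.normal(size=512), [rng.normal(size=512), rng.normal(size=512)]
print("accepted:", acceptable(candidate, concepts))
```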

PPT: Token Pruning and Pooling for Efficient Vision Transformers

  • paper_url: http://arxiv.org/abs/2310.01812
  • repo_url: https://github.com/xjwu1024/PPT
  • paper_authors: Xinjian Wu, Fanhu Zeng, Xiudong Wang, Yunhe Wang, Xinghao Chen
  • for: 降低视觉 Transformer 的计算复杂度,使其更适合计算机视觉中的实际应用
  • methods: 提出一种在 ViT 中结合 token 剪枝(pruning)与 token 池化(pooling)以减少冗余的加速框架
  • results: 在 ImageNet 数据集上,PPT 可以为 DeiT-S 减少超过 37% 的 FLOPs,并将吞吐量提升超过 45%,而没有任何精度损失。
    Abstract Vision Transformers (ViTs) have emerged as powerful models in the field of computer vision, delivering superior performance across various vision tasks. However, the high computational complexity poses a significant barrier to their practical applications in real-world scenarios. Motivated by the fact that not all tokens contribute equally to the final predictions and fewer tokens bring less computational cost, reducing redundant tokens has become a prevailing paradigm for accelerating vision transformers. However, we argue that it is not optimal to either only reduce inattentive redundancy by token pruning, or only reduce duplicative redundancy by token merging. To this end, in this paper we propose a novel acceleration framework, namely token Pruning & Pooling Transformers (PPT), to adaptively tackle these two types of redundancy in different layers. By heuristically integrating both token pruning and token pooling techniques in ViTs without additional trainable parameters, PPT effectively reduces the model complexity while maintaining its predictive accuracy. For example, PPT reduces over 37% FLOPs and improves the throughput by over 45% for DeiT-S without any accuracy drop on the ImageNet dataset.
    摘要 在计算机视觉领域,Vision Transformers(ViTs)已经成为强大的模型,在各种视觉任务中表现出色。然而,高计算复杂度对其在实际应用场景中的落地构成了很大的障碍。鉴于并非所有 token 对最终预测的贡献相同、更少的 token 意味着更低的计算开销,我们提出了一种新的加速框架,即 token Pruning & Pooling Transformers(PPT),用于在不同层次上自适应地处理两种不同类型的冗余。PPT在不引入额外可训练参数的情况下,将 token 剪枝和 token 池化技术合理地结合到 ViT 中,可以在保持预测精度的同时有效降低模型复杂度。例如,在 ImageNet 数据集上,PPT 可以为 DeiT-S 减少超过 37% 的 FLOPs,并将吞吐量提升超过 45%,而没有任何精度损失。
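
A toy version of combining token pruning with token pooling in one ViT layer is sketched below: tokens the class token attends to least are pruned, and instead of being discarded they are pooled into a single extra token. The attention-based ranking and the keep ratio are assumptions for illustration, not PPT's exact criteria.

```python
import torch

def prune_and_pool(tokens, cls_attn, keep_ratio=0.7):
    """Toy token pruning + pooling for a ViT layer.

    tokens:   (B, N, D) patch tokens (class token excluded)
    cls_attn: (B, N)    attention the class token pays to each patch token
    Keeps the top `keep_ratio` tokens and pools the pruned ones into one extra
    token so their information is not discarded entirely."""
    B, N, D = tokens.shape
    k = max(1, int(keep_ratio * N))
    idx = cls_attn.topk(k, dim=1).indices                          # (B, k)
    keep = torch.gather(tokens, 1, idx.unsqueeze(-1).expand(B, k, D))
    mask = torch.ones(B, N, dtype=torch.bool, device=tokens.device)
    mask.scatter_(1, idx, False)                                   # pruned positions
    pruned = tokens * mask.unsqueeze(-1)
    pooled = pruned.sum(dim=1) / mask.sum(dim=1, keepdim=True).clamp(min=1)
    return torch.cat([keep, pooled.unsqueeze(1)], dim=1)           # (B, k+1, D)

# toy usage
tokens, cls_attn = torch.randn(2, 196, 384), torch.rand(2, 196)
out = prune_and_pool(tokens, cls_attn)
print(out.shape)  # torch.Size([2, 138, 384])  (137 kept + 1 pooled)
```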

SMRD: SURE-based Robust MRI Reconstruction with Diffusion Models

  • paper_url: http://arxiv.org/abs/2310.01799
  • repo_url: https://github.com/nvlabs/smrd
  • paper_authors: Batu Ozturkler, Chao Liu, Benjamin Eckart, Morteza Mardani, Jiaming Song, Jan Kautz
  • for: SMRD 旨在提高扩散模型在加速 MRI 重建中的鲁棒性。
  • methods: SMRD 使用 Stein 无偏风险估计器(SURE)在测试阶段估计重建结果的均方误差,并自动调整推理超参数,无需在验证集上调参。
  • results: SMRD 在不同的测量噪声水平、加速因子和解剖结构上均超过了扩散模型基线,在测量噪声下实现了最高 6 dB 的 PSNR 提升。
    Abstract Diffusion models have recently gained popularity for accelerated MRI reconstruction due to their high sample quality. They can effectively serve as rich data priors while incorporating the forward model flexibly at inference time, and they have been shown to be more robust than unrolled methods under distribution shifts. However, diffusion models require careful tuning of inference hyperparameters on a validation set and are still sensitive to distribution shifts during testing. To address these challenges, we introduce SURE-based MRI Reconstruction with Diffusion models (SMRD), a method that performs test-time hyperparameter tuning to enhance robustness during testing. SMRD uses Stein's Unbiased Risk Estimator (SURE) to estimate the mean squared error of the reconstruction during testing. SURE is then used to automatically tune the inference hyperparameters and to set an early stopping criterion without the need for validation tuning. To the best of our knowledge, SMRD is the first to incorporate SURE into the sampling stage of diffusion models for automatic hyperparameter selection. SMRD outperforms diffusion model baselines on various measurement noise levels, acceleration factors, and anatomies, achieving a PSNR improvement of up to 6 dB under measurement noise. The code is publicly available at https://github.com/NVlabs/SMRD .
    摘要 扩散模型(diffusion models)因其高样本质量,最近在加速MRI重建中受到欢迎。它们可以作为丰富的数据先验,并在推理时灵活地引入前向模型;与展开式(unrolled)方法相比,它们在分布偏移下也更加稳健。然而,扩散模型需要在验证集上精细调整推理超参数,并且在测试时仍然对分布偏移敏感。为解决这些挑战,我们提出了 SMRD(SURE-based MRI Reconstruction with Diffusion models),一种在测试时进行超参数调整以提高测试鲁棒性的方法。SMRD 使用 Stein 无偏风险估计器(SURE)来估计测试时重建结果的均方误差,然后利用 SURE 自动调整推理超参数并设置提前停止条件,无需在验证集上调参。据我们所知,SMRD 是首个将 SURE 引入扩散模型采样阶段以自动选择超参数的方法。SMRD 在不同的测量噪声水平、加速因子和解剖结构下均超过扩散模型基线,在测量噪声下实现最高可达 6 dB 的 PSNR 提升。代码可以在 https://github.com/NVlabs/SMRD 获取。
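
The SURE quantity SMRD relies on can be estimated at test time with a Monte Carlo divergence term, as sketched below for a generic denoising-style reconstruction y = x + n, n ~ N(0, sigma^2 I). The reconstruction operator here is a trivial placeholder standing in for one diffusion-based reconstruction pass.

```python
import torch

def monte_carlo_sure(recon_fn, y, sigma, eps=1e-3):
    """Monte Carlo SURE sketch: unbiased estimate of the per-pixel MSE of a
    reconstruction f(y) of noisy measurements y ~ N(x, sigma^2 I), without
    access to the ground truth x."""
    n = y.numel()
    fy = recon_fn(y)
    residual = (fy - y).pow(2).sum() / n
    b = torch.randn_like(y)                                    # random probe
    divergence = (b * (recon_fn(y + eps * b) - fy)).sum() / (eps * n)
    return residual - sigma**2 + 2 * sigma**2 * divergence

# toy usage: a simple shrinkage "reconstruction" of a noisy image
torch.manual_seed(0)
x = torch.zeros(1, 1, 32, 32)
sigma = 0.1
y = x + sigma * torch.randn_like(x)
sure = monte_carlo_sure(lambda z: 0.5 * z, y, sigma)
true_mse = ((0.5 * y - x) ** 2).mean()
print(f"SURE estimate {sure:.5f} vs true MSE {true_mse:.5f}")
```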

HallE-Switch: Rethinking and Controlling Object Existence Hallucinations in Large Vision Language Models for Detailed Caption

  • paper_url: http://arxiv.org/abs/2310.01779
  • repo_url: None
  • paper_authors: Bohan Zhai, Shijia Yang, Xiangchen Zhao, Chenfeng Xu, Sheng Shen, Dongdi Zhao, Kurt Keutzer, Manling Li, Tan Yan, Xiangjun Fan
  • for: 本论文旨在研究现有大规模视觉语言模型 (LVLM) 的细节描述是否准确,以及如何控制这些模型在细节描述中的对象存在幻觉。
  • methods: 这 paper 提出了一种基于 GPT-4 的评价方法,称为 $\textit{CCEval}$,用于评价 LVLM 在细节描述 task 中的性能。
  • results: 研究发现,尽管 LVLM 在现有 VQA benchmark 中几乎不出现对象存在幻觉,但 $\textit{CCEval}$ 的评测显示它们在细节描述中仍然容易产生此类幻觉。此外,研究还分析了影响幻觉的若干因素,如图像分辨率、语言解码器规模,以及指令数据的数量、质量和粒度等。
    Abstract Current large vision-language models (LVLMs) achieve remarkable progress, yet there remains significant uncertainty regarding their ability to accurately apprehend visual details, that is, in performing detailed captioning. To address this, we introduce \textit{CCEval}, a GPT-4 assisted evaluation method tailored for detailed captioning. Interestingly, while LVLMs demonstrate minimal object existence hallucination in existing VQA benchmarks, our proposed evaluation reveals continued susceptibility to such hallucinations. In this paper, we make the first attempt to investigate and attribute such hallucinations, including image resolution, the language decoder size, and instruction data amount, quality, granularity. Our findings underscore the unwarranted inference when the language description includes details at a finer object granularity than what the vision module can ground or verify, thus inducing hallucination. To control such hallucinations, we further attribute the reliability of captioning to contextual knowledge (involving only contextually grounded objects) and parametric knowledge (containing inferred objects by the model). Thus, we introduce $\textit{HallE-Switch}$, a controllable LVLM in terms of $\textbf{Hall}$ucination in object $\textbf{E}$xistence. HallE-Switch can condition the captioning to shift between (i) exclusively depicting contextual knowledge for grounded objects and (ii) blending it with parametric knowledge to imagine inferred objects. Our method reduces hallucination by 44% compared to LLaVA$_{7B}$ and maintains the same object coverage.
    摘要 当前的大规模视觉语言模型(LVLM)已经取得了显著进步,但它们能否准确把握视觉细节(即进行细节描述)仍存在较大的不确定性。为了解决这个问题,我们提出了 CCEval,一种由 GPT-4 辅助、专门针对细节描述的评测方法。有趣的是,尽管 LVLM 在现有的 VQA benchmark 上几乎没有表现出对象存在幻觉,但我们提出的评测显示它们仍然容易产生此类幻觉。在这篇论文中,我们首次尝试调查并归因这些幻觉,所考察的因素包括图像分辨率、语言解码器规模,以及指令数据的数量、质量和粒度。我们的发现表明,当语言描述所包含的物体细节粒度超出视觉模块能够确认或验证的范围时,这种无依据的推断就会诱发幻觉。为了控制这些幻觉,我们进一步将描述的可靠性归因于上下文知识(仅涉及有上下文依据的物体)和参数知识(包含模型推断出的物体)。据此,我们提出了 HallE-Switch,一种可针对对象存在幻觉进行控制的 LVLM,它可以让描述在(i)仅依据上下文知识描述有依据的物体和(ii)混合参数知识以想象推断物体之间切换。我们的方法相比 LLaVA$_{7B}$ 可将幻觉减少 44%,并保持同样的物体覆盖率。

ImageNet-OOD: Deciphering Modern Out-of-Distribution Detection Algorithms

  • paper_url: http://arxiv.org/abs/2310.01755
  • repo_url: None
  • paper_authors: William Yang, Byron Zhang, Olga Russakovsky
  • for: 该文章旨在研究分布外(OOD)检测算法的行为,并为未来OOD检测器的设计提供重要的指导。
  • methods: 文章构建了ImageNet-OOD,一个干净的semantic shift数据集,用于解耦semantic shift与covariate shift,并通过全面的实验评估OOD检测算法在这两类分布变化下的表现。
  • results: 文章发现OOD检测器对covariate shift比对semantic shift更敏感,且近期OOD检测算法在semantic shift检测上的收益微乎其微。文章为设计未来的OOD检测器提供了重要的见解。
    Abstract The task of out-of-distribution (OOD) detection is notoriously ill-defined. Earlier works focused on new-class detection, aiming to identify label-altering data distribution shifts, also known as "semantic shift." However, recent works argue for a focus on failure detection, expanding the OOD evaluation framework to account for label-preserving data distribution shifts, also known as "covariate shift." Intriguingly, under this new framework, complex OOD detectors that were previously considered state-of-the-art now perform similarly to, or even worse than the simple maximum softmax probability baseline. This raises the question: what are the latest OOD detectors actually detecting? Deciphering the behavior of OOD detection algorithms requires evaluation datasets that decouples semantic shift and covariate shift. To aid our investigations, we present ImageNet-OOD, a clean semantic shift dataset that minimizes the interference of covariate shift. Through comprehensive experiments, we show that OOD detectors are more sensitive to covariate shift than to semantic shift, and the benefits of recent OOD detection algorithms on semantic shift detection is minimal. Our dataset and analyses provide important insights for guiding the design of future OOD detectors.
    摘要 分布外(OOD)检测任务的定义一向比较模糊。早期工作专注于新类别检测,旨在识别会改变标签的数据分布偏移,即“semantic shift”。然而,最近的工作主张关注失败检测,将OOD评估框架扩展到保持标签不变的数据分布偏移,即“covariate shift”。有趣的是,在这一新框架下,以往被视为最先进的复杂OOD检测器的表现与简单的最大softmax概率基线相当甚至更差。这引出了一个问题:最新的OOD检测器究竟在检测什么?要解读OOD检测算法的行为,需要能够解耦semantic shift与covariate shift的评估数据集。为此,我们提出了ImageNet-OOD,一个尽量减少covariate shift干扰的干净semantic shift数据集。通过全面的实验,我们发现OOD检测器对covariate shift比对semantic shift更敏感,而最近的OOD检测算法在semantic shift检测上的收益微乎其微。我们的数据集和分析为未来OOD检测器的设计提供了重要的启发。
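
The simple maximum-softmax-probability (MSP) baseline that the paper reports complex detectors struggling to beat is essentially the following few lines; the threshold choice is illustrative.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def msp_scores(model, x):
    """Maximum softmax probability: the classic OOD-detection baseline.
    Higher score = more likely in-distribution."""
    return F.softmax(model(x), dim=-1).max(dim=-1).values

# toy usage with a random classifier; in practice `model` is an ImageNet network
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
scores = msp_scores(model, torch.randn(16, 3, 32, 32))
threshold = 0.5                 # illustrative; usually chosen for a target FPR/TPR
print("flagged as OOD:", (scores < threshold).sum().item(), "of", len(scores))
```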

Generative Autoencoding of Dropout Patterns

  • paper_url: http://arxiv.org/abs/2310.01712
  • repo_url: https://github.com/shuntama/deciphering-autoencoders
  • paper_authors: Shunta Maeda
  • for: 用于生成图像
  • methods: 为每个训练样本分配唯一的随机 Dropout 模式,并训练自编码器以该模式为编码信息重建对应的图像
  • results: 与DCGAN相比,Deciphering Autoencoders具有更稳定的训练过程和类似的样本质量
    Abstract We propose a generative model termed Deciphering Autoencoders. In this model, we assign a unique random dropout pattern to each data point in the training dataset and then train an autoencoder to reconstruct the corresponding data point using this pattern as information to be encoded. Since the training of Deciphering Autoencoders relies solely on reconstruction error, it offers more stable training than other generative models. Despite its simplicity, Deciphering Autoencoders show comparable sampling quality to DCGAN on the CIFAR-10 dataset.
    摘要 我们提出了一种生成模型,称为解译自编码器(Deciphering Autoencoders)。在这种模型中,我们为训练集中的每个数据点分配一个唯一的随机 dropout 模式,然后以该模式作为待编码的信息,训练一个自编码器来重建对应的数据点。由于解译自编码器的训练仅依赖重建误差,因此其训练比其他生成模型更加稳定。尽管方法简单,解译自编码器在CIFAR-10数据集上的采样质量仍可与DCGAN相当。
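
Read at face value, the training signal is: a fixed random binary pattern per training example, a network that reconstructs that example from its pattern, and nothing but reconstruction error. The sketch below takes that simplified reading (pattern-to-image decoding); the paper's actual design, in which the pattern acts as dropout inside an autoencoder, is more involved, and all sizes here are toy values.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# One fixed random binary "dropout pattern" per training example, acting as its code.
n_data, code_dim = 1024, 256
patterns = (torch.rand(n_data, code_dim) > 0.5).float()   # frozen, never trained
images = torch.rand(n_data, 3 * 32 * 32)                  # toy training set

decoder = nn.Sequential(
    nn.Linear(code_dim, 512), nn.ReLU(),
    nn.Linear(512, 3 * 32 * 32), nn.Sigmoid(),
)
opt = torch.optim.Adam(decoder.parameters(), lr=1e-3)

for step in range(200):                                   # toy training loop
    idx = torch.randint(0, n_data, (64,))
    recon = decoder(patterns[idx])
    loss = nn.functional.mse_loss(recon, images[idx])     # reconstruction error only
    opt.zero_grad(); loss.backward(); opt.step()

# "Generation": decode a freshly sampled dropout pattern never seen in training.
with torch.no_grad():
    sample = decoder((torch.rand(1, code_dim) > 0.5).float())
print(loss.item(), sample.shape)
```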

cs.AI - 2023-10-03

Large Language Models Can Be Good Privacy Protection Learners

  • paper_url: http://arxiv.org/abs/2310.02469
  • repo_url: https://github.com/Yijia-Xiao/PPLM
  • paper_authors: Yijia Xiao, Yiqiao Jin, Yushi Bai, Yue Wu, Xianjun Yang, Xiao Luo, Wenchao Yu, Xujiang Zhao, Yanchi Liu, Haifeng Chen, Wei Wang, Wei Cheng
  • for: 本研究旨在 Addressing the challenge of fine-tuning Large Language Models (LLMs) with domain-specific data while protecting sensitive personally identifiable information (PII).
  • methods: 我们提出了一种新的 Fine-tuning Large Language Models (PPLM) 模型,包括 corpus curation, penalty-based unlikelihood in training loss, 和 instruction-based tuning 等多种技术。
  • results: 我们的实验表明, instruction tuning with both positive and negative examples 是一种有效的方法,可以保护 private data 的隐私,同时提高模型的知识。
    Abstract The proliferation of Large Language Models (LLMs) has driven considerable interest in fine-tuning them with domain-specific data to create specialized language models. Nevertheless, such domain-specific fine-tuning data often contains sensitive personally identifiable information (PII). Direct fine-tuning LLMs on this data without privacy protection poses a risk of leakage. To address this challenge, we introduce Privacy Protection Language Models (PPLM), a novel paradigm for fine-tuning LLMs that effectively injects domain-specific knowledge while safeguarding data privacy. Our work offers a theoretical analysis for model design and delves into various techniques such as corpus curation, penalty-based unlikelihood in training loss, and instruction-based tuning, etc. Extensive experiments across diverse datasets and scenarios demonstrate the effectiveness of our approaches. In particular, instruction tuning with both positive and negative examples, stands out as a promising method, effectively protecting private data while enhancing the model's knowledge. Our work underscores the potential for Large Language Models as robust privacy protection learners.
    摘要 大型语言模型(LLM)的普及引发了人们使用领域特定数据对其进行微调、以构建专用语言模型的浓厚兴趣。然而,这些领域特定的微调数据经常包含敏感的个人身份信息(PII),在没有隐私保护的情况下直接用这些数据微调LLM存在泄露风险。为解决这个挑战,我们提出了隐私保护语言模型(PPLM),一种新的LLM微调范式,可以在有效注入领域特定知识的同时保护数据隐私。我们的工作为模型设计提供了理论分析,并深入探讨了语料筛选、训练损失中基于惩罚的unlikelihood项、基于指令的调优等多种技术。我们在多个数据集和场景上进行了广泛的实验,证明了这些方法的有效性。其中,同时使用正例和负例的指令调优方法表现尤为突出,能够在保护隐私数据的同时提升模型的知识。我们的工作凸显了大型语言模型作为稳健的隐私保护学习器的潜力。
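
One concrete way to realize a "penalty-based unlikelihood in the training loss" is to keep standard cross-entropy on ordinary tokens and add an unlikelihood penalty -log(1 - p) on tokens flagged as PII, as sketched below. The masking scheme and the weighting factor are assumptions for illustration, not necessarily the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def lm_loss_with_pii_unlikelihood(logits, targets, pii_mask, alpha=1.0):
    """Sketch of a penalty-based unlikelihood objective for privacy-aware tuning.

    logits:   (B, T, V) next-token logits
    targets:  (B, T)    gold next tokens
    pii_mask: (B, T)    1 where the gold token is sensitive PII, else 0
    Non-PII positions get the usual cross-entropy; PII positions instead get an
    unlikelihood penalty -log(1 - p(token)), discouraging reproduction of PII."""
    logp = F.log_softmax(logits, dim=-1)
    tok_logp = logp.gather(-1, targets.unsqueeze(-1)).squeeze(-1)     # (B, T)
    nll = -tok_logp
    unlikelihood = -torch.log1p(-tok_logp.exp().clamp(max=1 - 1e-6))  # -log(1 - p)
    return ((1 - pii_mask) * nll + alpha * pii_mask * unlikelihood).mean()

# toy usage
logits = torch.randn(2, 5, 100, requires_grad=True)
targets = torch.randint(0, 100, (2, 5))
pii_mask = torch.tensor([[0, 0, 1, 1, 0], [0, 1, 0, 0, 0]], dtype=torch.float32)
loss = lm_loss_with_pii_unlikelihood(logits, targets, pii_mask)
loss.backward()
print(loss.item())
```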

EcoAssistant: Using LLM Assistant More Affordably and Accurately

  • paper_url: http://arxiv.org/abs/2310.03046
  • repo_url: https://github.com/jieyuz2/ecoassistant
  • paper_authors: Jieyu Zhang, Ranjay Krishna, Ahmed H. Awadallah, Chi Wang
  • for: 本研究旨在提高大自然语言模型(LLM)作为助手 answering 需要外部知识的查询,以提高效率和准确性。
  • methods: 本研究提出了一个框架,名为 EcoAssistant,它使得 LLM 可以更加经济高效地回答 code-driven 查询。EcoAssistant 包括三部分:首先,它允许 LLM 助手与自动代码执行器进行交互,以便Iteratively 更新代码或根据执行结果生成答案。其次,我们使用层次结构的 LLM 助手,首先使用较弱、较便宜的 LLM 尝试回答查询,然后如果无法回答,则交给更强、更昂贵的 LLM 尝试。最后,我们从成功过去查询中检索出示例,以帮助后续查询。
  • results: 我们通过实验表明,EcoAssistant 可以提供更高效和准确的答案,比 GPT-4 高出10点成功率,仅使用 Less than 50% 的 GPT-4 的成本。
    Abstract Today, users ask Large language models (LLMs) as assistants to answer queries that require external knowledge; they ask about the weather in a specific city, about stock prices, and even about where specific locations are within their neighborhood. These queries require the LLM to produce code that invokes external APIs to answer the user's question, yet LLMs rarely produce correct code on the first try, requiring iterative code refinement upon execution results. In addition, using LLM assistants to support high query volumes can be expensive. In this work, we contribute a framework, EcoAssistant, that enables LLMs to answer code-driven queries more affordably and accurately. EcoAssistant contains three components. First, it allows the LLM assistants to converse with an automatic code executor to iteratively refine code or to produce answers based on the execution results. Second, we use a hierarchy of LLM assistants, which attempts to answer the query with weaker, cheaper LLMs before backing off to stronger, expensive ones. Third, we retrieve solutions from past successful queries as in-context demonstrations to help subsequent queries. Empirically, we show that EcoAssistant offers distinct advantages for affordability and accuracy, surpassing GPT-4 by 10 points of success rate with less than 50% of GPT-4's cost.
    摘要 EcoAssistant consists of three components:
    1. Conversing with an automatic code executor: LLM assistants can converse with an automatic code executor to iteratively refine code or produce answers based on execution results.
    2. Hierarchy of LLM assistants: We use a hierarchy of LLM assistants to answer queries with weaker, cheaper LLMs before backing off to stronger, more expensive ones.
    3. Retrieving solutions from past successful queries: We retrieve solutions from past successful queries as in-context demonstrations to help subsequent queries.
    Empirically, we show that EcoAssistant offers distinct advantages for affordability and accuracy, surpassing GPT-4 by 10 points of success rate with less than 50% of GPT-4's cost.
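
The three components combine into a simple control loop: try the cheapest assistant first, let it iterate against a code executor, escalate on failure, and cache successful solutions as in-context demonstrations. The sketch below illustrates that loop; `call_llm`, `run_code`, and the model names are hypothetical placeholders, not a real API.

```python
# Sketch of an EcoAssistant-style escalation loop (placeholders, not a real client).
CHEAP_TO_EXPENSIVE = ["small-model", "medium-model", "large-model"]
solution_cache = []          # (query, successful code) pairs reused as demonstrations

def call_llm(model, prompt):
    """Placeholder LLM client; a real system would call an actual model here."""
    return "print(6 * 7)" if "Write code" in prompt else "The answer is 42."

def run_code(code):
    """Placeholder executor; a real system would sandbox and run the code."""
    return True, "42"

def answer_query(query, max_refinements=3):
    demos = "\n".join(f"Q: {q}\nCode: {c}" for q, c in solution_cache[-3:])
    for model in CHEAP_TO_EXPENSIVE:                 # cheaper assistants first
        feedback = ""
        for _ in range(max_refinements):             # iterative code refinement
            code = call_llm(model, f"{demos}\nQuery: {query}\n{feedback}\nWrite code:")
            ok, output = run_code(code)
            if ok:
                solution_cache.append((query, code)) # reuse as future in-context demo
                return call_llm(model, f"Query: {query}\nResult: {output}\nAnswer:")
            feedback = f"Previous code failed with: {output}. Fix it."
    return "Escalation exhausted: no assistant could answer the query."

print(answer_query("What is 6 times 7?"))
```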

Improved Inference of Human Intent by Combining Plan Recognition and Language Feedback

  • paper_url: http://arxiv.org/abs/2310.02462
  • repo_url: None
  • paper_authors: Ifrah Idrees, Tian Yun, Naveen Sharma, Yunxin Deng, Nakul Gopalan, George Konidaris, Stefanie Tellex
  • for: 本研究旨在帮助机器人更好地理解人类计划和目标,尤其是在人类动作受到干扰时。
  • methods: 本研究提出了面向目标识别的对话框架(Dialogue for Goal Recognition,D4GR),允许机器人通过自然语言交互,就含噪的传感器数据和次优的人类动作提出澄清问题,从而修正其对人类进度的信念。
  • results: 在 kitchen 和 blocks 两个模拟领域中,D4GR 均优于 HTN:在最高传感器噪声下,两个领域的目标识别准确率均比 HTN 高 1%;在计划准确率方面,D4GR 在 kitchen 领域高出 4%,在 blocks 领域高出 2%。
    Abstract Conversational assistive robots can aid people, especially those with cognitive impairments, to accomplish various tasks such as cooking meals, performing exercises, or operating machines. However, to interact with people effectively, robots must recognize human plans and goals from noisy observations of human actions, even when the user acts sub-optimally. Previous works on Plan and Goal Recognition (PGR) as planning have used hierarchical task networks (HTN) to model the actor/human. However, these techniques are insufficient as they do not have user engagement via natural modes of interaction such as language. Moreover, they have no mechanisms to let users, especially those with cognitive impairments, know of a deviation from their original plan or about any sub-optimal actions taken towards their goal. We propose a novel framework for plan and goal recognition in partially observable domains -- Dialogue for Goal Recognition (D4GR) enabling a robot to rectify its belief in human progress by asking clarification questions about noisy sensor data and sub-optimal human actions. We evaluate the performance of D4GR over two simulated domains -- kitchen and blocks domain. With language feedback and the world state information in a hierarchical task model, we show that D4GR framework for the highest sensor noise performs 1% better than HTN in goal accuracy in both domains. For plan accuracy, D4GR outperforms by 4% in the kitchen domain and 2% in the blocks domain in comparison to HTN. The ALWAYS-ASK oracle outperforms our policy by 3% in goal recognition and 7%in plan recognition. D4GR does so by asking 68% fewer questions than an oracle baseline. We also demonstrate a real-world robot scenario in the kitchen domain, validating the improved plan and goal recognition of D4GR in a realistic setting.
    摘要 Dialogue for Goal Recognition (D4GR) 是一种用于在部分可观测领域中进行计划和目标识别的新框架。它让机器人通过对含噪人类动作的观察和语言反馈来更好地理解人类的计划和目标。在我们的实验中,D4GR 在最高传感器噪声下的目标准确率和计划准确率均高于 HTN。此外,与 Always-Ask 的 oracle 基线策略相比,D4GR 提出的澄清问题减少了 68%。我们还在真实的 kitchen 场景中验证了 D4GR,证明了它在现实环境中改进的计划与目标识别能力。

Learning Optimal Advantage from Preferences and Mistaking it for Reward

  • paper_url: http://arxiv.org/abs/2310.02456
  • repo_url: https://github.com/Stephanehk/Learning-OA-From-Prefs
  • paper_authors: W. Bradley Knox, Stephane Hatgis-Kessell, Sigurdur Orn Adalgeirsson, Serena Booth, Anca Dragan, Peter Stone, Scott Niekum
  • for: 本文研究从人类对轨迹片段对的偏好中学习奖励函数,具体来说是研究RLHF中人类偏好的建模方式。
  • methods: 本文利用人类反馈中的偏好来学习奖励函数,并对RLHF中基于部分回报与基于后悔(regret)的两种偏好模型进行了调查和分析。
  • results: 本文发现,当假设人类偏好基于部分回报、而实际偏好源自后悔时,学习到的函数并不是真正的奖励函数,而是最优优势函数的近似。此外,本文还发现,只要解决一个特定的陷阱,这种错误的假设并不会造成太大危害,反而会得到一个高度塑形的奖励函数。
    Abstract We consider algorithms for learning reward functions from human preferences over pairs of trajectory segments, as used in reinforcement learning from human feedback (RLHF). Most recent work assumes that human preferences are generated based only upon the reward accrued within those segments, or their partial return. Recent work casts doubt on the validity of this assumption, proposing an alternative preference model based upon regret. We investigate the consequences of assuming preferences are based upon partial return when they actually arise from regret. We argue that the learned function is an approximation of the optimal advantage function, $\hat{A^*_r}$, not a reward function. We find that if a specific pitfall is addressed, this incorrect assumption is not particularly harmful, resulting in a highly shaped reward function. Nonetheless, this incorrect usage of $\hat{A^*_r}$ is less desirable than the appropriate and simpler approach of greedy maximization of $\hat{A^*_r}$. From the perspective of the regret preference model, we also provide a clearer interpretation of fine tuning contemporary large language models with RLHF. This paper overall provides insight regarding why learning under the partial return preference model tends to work so well in practice, despite it conforming poorly to how humans give preferences.
    摘要 我们研究从人类对成对轨迹片段的偏好中学习奖励函数的算法,这类方法被用于基于人类反馈的强化学习(RLHF)。最新的大多数工作假设人类偏好仅基于这些片段内累积的奖励,即其部分回报。但最近的研究对这一假设的有效性提出了质疑,并提出了一种基于后悔(regret)的替代偏好模型。我们调查了当偏好实际上源自后悔时仍假设其基于部分回报所带来的后果,并论证在这种情况下学习到的函数是最优优势函数 $\hat{A^*_r}$ 的近似,而不是奖励函数。我们发现,只要解决一个特定的陷阱,这一错误假设并不会造成特别大的危害,反而会得到一个高度塑形的奖励函数。尽管如此,这种对 $\hat{A^*_r}$ 的不当使用仍不如直接对 $\hat{A^*_r}$ 进行贪婪最大化这一更恰当且更简单的做法。从 regret 偏好模型的角度出发,我们也为使用 RLHF 微调当代大型语言模型提供了更清晰的解释。总体而言,这篇文章解释了为什么基于部分回报偏好模型的学习在实践中往往效果很好,尽管它与人类给出偏好的方式并不相符。
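
The distinction the abstract draws can be made concrete with a toy Bradley-Terry choice model: score each segment either by its partial return or by its summed optimal advantage, and compare the resulting preference probabilities. The numbers below are invented purely for illustration.

```python
import numpy as np

def preference_prob(score_a, score_b):
    """Bradley-Terry / logistic choice model: P(segment A preferred over B)."""
    return 1.0 / (1.0 + np.exp(-(score_a - score_b)))

# Two toy segments, each a list of (reward, optimal_advantage) per step.
# The advantage values are illustrative, not derived from a real MDP.
seg_a = [(1.0, -0.1), (0.0, 0.3), (1.0, 0.2)]
seg_b = [(2.0, -0.8), (0.0, -0.5), (0.0, -0.2)]

partial_return = lambda seg: sum(r for r, _ in seg)      # the common assumption
summed_advantage = lambda seg: sum(a for _, a in seg)    # the regret-based view

print("P(A>B) under partial return:  ",
      preference_prob(partial_return(seg_a), partial_return(seg_b)))
print("P(A>B) under summed advantage:",
      preference_prob(summed_advantage(seg_a), summed_advantage(seg_b)))
# Fitting a "reward" model to preferences generated by the second model would
# recover an approximation of the optimal advantage function, not the reward.
```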

Low-Resource Languages Jailbreak GPT-4

  • paper_url: http://arxiv.org/abs/2310.02446
  • repo_url: None
  • paper_authors: Zheng-Xin Yong, Cristina Menghini, Stephen H. Bach
  • for: 防止大语言模型生成危险内容
  • methods: 使用翻译攻击绕过安全训练数据的语言不平等性
  • results: 成功绕过GPT-4的安全保护,79%的时间能够帮助用户达到危险目标,其他高/中资源语言的攻击成功率远低于这个水平。
    Abstract AI safety training and red-teaming of large language models (LLMs) are measures to mitigate the generation of unsafe content. Our work exposes the inherent cross-lingual vulnerability of these safety mechanisms, resulting from the linguistic inequality of safety training data, by successfully circumventing GPT-4's safeguard through translating unsafe English inputs into low-resource languages. On the AdvBenchmark, GPT-4 engages with the unsafe translated inputs and provides actionable items that can get the users towards their harmful goals 79% of the time, which is on par with or even surpassing state-of-the-art jailbreaking attacks. Other high-/mid-resource languages have significantly lower attack success rate, which suggests that the cross-lingual vulnerability mainly applies to low-resource languages. Previously, limited training on low-resource languages primarily affects speakers of those languages, causing technological disparities. However, our work highlights a crucial shift: this deficiency now poses a risk to all LLMs users. Publicly available translation APIs enable anyone to exploit LLMs' safety vulnerabilities. Therefore, our work calls for a more holistic red-teaming efforts to develop robust multilingual safeguards with wide language coverage.
    摘要 人工智能安全训练和对大语言模型(LLM)的红队测试是为了遏制不安全内容的生成。我们的工作揭示了这些安全机制固有的跨语言漏洞:由于安全训练数据的语言不均衡,只需将不安全的英语输入翻译成低资源语言,即可成功绕过GPT-4的安全防护。在AdvBenchmark上,GPT-4会响应这些不安全的翻译输入,并在79%的情况下提供可帮助用户实现有害目标的可执行内容,与当前最先进的越狱攻击相当甚至更高。其他高/中资源语言的攻击成功率显著较低,这表明跨语言漏洞主要存在于低资源语言。过去,对低资源语言的有限训练主要影响这些语言的使用者,造成技术差距;然而,我们的工作凸显了一个关键转变:这种不足如今对所有LLM用户都构成风险,因为公开可用的翻译API使任何人都能利用LLM的安全漏洞。因此,我们呼吁开展更全面的红队测试,以构建语言覆盖面广、稳健的多语言安全防护。

Learning Diverse Skills for Local Navigation under Multi-constraint Optimality

  • paper_url: http://arxiv.org/abs/2310.02440
  • repo_url: None
  • paper_authors: Jin Cheng, Marin Vlastelica, Pavel Kolev, Chenhao Li, Georg Martius
  • for: 本研究的目的是提出一种基于数据驱动控制的方法,以实现在机器人控制中获得多样化的行为。
  • methods: 本研究采用受限优化的视角,通过由不同奖励定义的价值函数约束来获得多样化的策略,并通过吸引-排斥(attract-repel)奖励项来控制多样性水平。
  • results: 研究中使用了一个本地导航任务,训练了一个四脚机器人,并成功实现了多样化的快速行为,包括成功绕过障碍物。
    Abstract Despite many successful applications of data-driven control in robotics, extracting meaningful diverse behaviors remains a challenge. Typically, task performance needs to be compromised in order to achieve diversity. In many scenarios, task requirements are specified as a multitude of reward terms, each requiring a different trade-off. In this work, we take a constrained optimization viewpoint on the quality-diversity trade-off and show that we can obtain diverse policies while imposing constraints on their value functions which are defined through distinct rewards. In line with previous work, further control of the diversity level can be achieved through an attract-repel reward term motivated by the Van der Waals force. We demonstrate the effectiveness of our method on a local navigation task where a quadruped robot needs to reach the target within a finite horizon. Finally, our trained policies transfer well to the real 12-DoF quadruped robot, Solo12, and exhibit diverse agile behaviors with successful obstacle traversal.
    摘要 尽管数据驱动控制在机器人领域已有许多成功应用,但提取有意义且多样的行为仍然是一个挑战。通常需要以牺牲任务性能为代价来换取多样性。在许多场景下,任务要求由多个奖励项共同给出,每个奖励项都需要不同的权衡。在这项工作中,我们从受限优化的角度看待"质量-多样性"权衡,并证明可以在对由不同奖励定义的价值函数施加约束的同时获得多样化策略。与先前工作一致,还可以通过一个受范德华力启发的吸引-排斥奖励项来进一步控制多样性水平。我们在一个局部导航任务上展示了方法的有效性:四足机器人需要在有限时间内到达目标。最后,训练得到的策略能够很好地迁移到真实的 12 自由度四足机器人 Solo12 上,展现出多样且敏捷的行为,并成功跨越障碍物。
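    下面给出一个示意性的 Python 草图(并非论文的官方实现;其中采用 Lennard-Jones 式的函数形式与随机生成的策略嵌入,仅作假设),用于说明受范德华力启发的吸引-排斥多样性奖励项的基本思路:

```python
import numpy as np

def attract_repel_reward(z_i, others, d0=1.0, p_rep=12, p_att=6):
    """Lennard-Jones-style attract-repel term between policy embeddings:
    the potential repels policies that are too similar and attracts ones that
    drift too far, so pairwise distances settle near a target scale d0.
    The exact functional form here is illustrative, not the paper's."""
    total = 0.0
    for z_j in others:
        d = max(np.linalg.norm(z_i - z_j), 1e-6)
        total -= (d0 / d) ** p_rep - (d0 / d) ** p_att  # reward = -potential
    return total / max(len(others), 1)

# Hypothetical skill/policy embeddings.
skills = [np.random.randn(8) for _ in range(4)]
print(attract_repel_reward(skills[0], skills[1:]))
```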

Multi-Agent Reinforcement Learning Based on Representational Communication for Large-Scale Traffic Signal Control

  • paper_url: http://arxiv.org/abs/2310.02435
  • repo_url: None
  • paper_authors: Rohit Bokade, Xiaoning Jin, Christopher Amato
  • for: 提高大规模交通信号控制(TSC)的效率和灵活性。
  • methods: 使用多代理人学习(MARL)和选择性通信策略,使代理人可以只在需要时使用通信频道,从而减少干扰和提高总性能。
  • results: 在合成的 $4 \times 4$ 网格网络和基于博洛尼亚 Pasubio 社区的真实路网上实现了最低的网络拥塞,智能体仅使用了约 $47-65\%$ 的通信信道。消融研究进一步验证了选择性通信策略的有效性。
    Abstract Traffic signal control (TSC) is a challenging problem within intelligent transportation systems and has been tackled using multi-agent reinforcement learning (MARL). While centralized approaches are often infeasible for large-scale TSC problems, decentralized approaches provide scalability but introduce new challenges, such as partial observability. Communication plays a critical role in decentralized MARL, as agents must learn to exchange information using messages to better understand the system and achieve effective coordination. Deep MARL has been used to enable inter-agent communication by learning communication protocols in a differentiable manner. However, many deep MARL communication frameworks proposed for TSC allow agents to communicate with all other agents at all times, which can add to the existing noise in the system and degrade overall performance. In this study, we propose a communication-based MARL framework for large-scale TSC. Our framework allows each agent to learn a communication policy that dictates "which" part of the message is sent "to whom". In essence, our framework enables agents to selectively choose the recipients of their messages and exchange variable length messages with them. This results in a decentralized and flexible communication mechanism in which agents can effectively use the communication channel only when necessary. We designed two networks, a synthetic $4 \times 4$ grid network and a real-world network based on the Pasubio neighborhood in Bologna. Our framework achieved the lowest network congestion compared to related methods, with agents utilizing $\sim 47-65 \%$ of the communication channel. Ablation studies further demonstrated the effectiveness of the communication policies learned within our framework.
    摘要 交通信号控制(TSC)是智能交通系统中的一个挑战性问题,常用多智能体强化学习(MARL)来求解。中心化方法通常难以应用于大规模 TSC 问题,而分布式方法虽然具备可扩展性,却带来了部分可观测等新挑战。通信在分布式 MARL 中扮演关键角色:智能体需要通过消息交换来更好地理解系统并实现有效协调。深度 MARL 已被用于以可微的方式学习智能体间的通信协议,但许多面向 TSC 的深度 MARL 通信框架允许智能体随时与所有其他智能体通信,这会加剧系统中已有的噪声并降低整体性能。在本研究中,我们提出了一种面向大规模 TSC 的基于通信的 MARL 框架。该框架允许每个智能体学习一个通信策略,由其决定将消息的"哪一部分"发送给"谁",即智能体可以有选择地确定消息接收者,并与其交换可变长度的消息。这种分布式且灵活的通信机制使智能体只在必要时占用通信信道。我们在两个网络上进行了实验:一个合成的 $4 \times 4$ 网格网络和一个基于博洛尼亚 Pasubio 社区的真实路网。与相关方法相比,我们的框架实现了最低的网络拥塞,智能体仅使用了约 $47-65\%$ 的通信信道。消融研究进一步证明了框架中学到的通信策略的有效性。
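    下面给出一个极简的 Python 草图(并非论文的实现;`W_gate`、`W_msg` 以及阈值门控规则均为示意假设),用于说明"由每个智能体决定把消息发给谁"的选择性通信机制:

```python
import numpy as np

def communicate(hidden, W_gate, W_msg, threshold=0.5):
    """Each agent scores every potential recipient and only sends its message
    where the learned gate exceeds a threshold; the rest receive nothing.
    Shapes and the thresholded-gate rule are illustrative, not the paper's exact design."""
    n = hidden.shape[0]
    messages = hidden @ W_msg                      # per-agent message content
    gate_logits = hidden @ W_gate @ hidden.T       # agent-to-agent gate scores
    gates = 1.0 / (1.0 + np.exp(-gate_logits))     # in [0, 1]
    np.fill_diagonal(gates, 0.0)                   # no self-messages
    mask = (gates > threshold).astype(float)       # the "to whom" decisions
    inbox = mask.T @ messages                      # aggregate received messages
    usage = mask.sum() / (n * (n - 1))             # fraction of the channel used
    return inbox, usage

n_agents, d = 4, 16
h = np.random.randn(n_agents, d)
inbox, usage = communicate(h, np.random.randn(d, d) * 0.1, np.random.randn(d, d) * 0.1)
print(inbox.shape, usage)
```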

Episodic Memory Theory for the Mechanistic Interpretation of Recurrent Neural Networks

  • paper_url: http://arxiv.org/abs/2310.02430
  • repo_url: None
  • paper_authors: Arjun Karuvally, Peter Delmastro, Hava T. Siegelmann
  • For: The paper aims to provide a deeper understanding of the internal mechanisms of Recurrent Neural Networks (RNNs) and their relationship to human memory.
  • Methods: The paper proposes the Episodic Memory Theory (EMT), which conceptualizes RNNs as discrete-time analogs of the General Sequential Episodic Memory Model. The authors also introduce a set of algorithmic tasks to probe the variable binding behavior in RNNs and develop a mathematically rigorous circuit to facilitate variable binding.
  • Results: The paper shows that trained RNNs consistently converge to the variable binding circuit, indicating universality in the dynamics of RNNs. The authors also develop an algorithm to define a privileged basis, which enhances the interpretability of the learned parameters and hidden states of RNNs.
    Abstract Understanding the intricate operations of Recurrent Neural Networks (RNNs) mechanistically is pivotal for advancing their capabilities and applications. In this pursuit, we propose the Episodic Memory Theory (EMT), illustrating that RNNs can be conceptualized as discrete-time analogs of the recently proposed General Sequential Episodic Memory Model. To substantiate EMT, we introduce a novel set of algorithmic tasks tailored to probe the variable binding behavior in RNNs. Utilizing the EMT, we formulate a mathematically rigorous circuit that facilitates variable binding in these tasks. Our empirical investigations reveal that trained RNNs consistently converge to the variable binding circuit, thus indicating universality in the dynamics of RNNs. Building on these findings, we devise an algorithm to define a privileged basis, which reveals hidden neurons instrumental in the temporal storage and composition of variables, a mechanism vital for the successful generalization in these tasks. We show that the privileged basis enhances the interpretability of the learned parameters and hidden states of RNNs. Our work represents a step toward demystifying the internal mechanisms of RNNs and, for computational neuroscience, serves to bridge the gap between artificial neural networks and neural memory models.

AXNav: Replaying Accessibility Tests from Natural Language

  • paper_url: http://arxiv.org/abs/2310.02424
  • repo_url: None
  • paper_authors: Maryam Taeb, Amanda Swearngin, Eldon Schoop, Ruijia Cheng, Yue Jiang, Jeffrey Nichols
  • for: 本文旨在支持可访问性(accessibility)测试,提出一种基于自然语言的测试工作流程。
  • methods: 该系统结合大语言模型与基于像素的 UI 理解模型来执行手动可访问性测试,并生成分章节、可导航的视频。
  • results: 在一项有 10 名可访问性 QA 专业人员参与的用户研究中,参与者表示该工具对其当前工作非常有用,且其执行测试的方式与手动测试相近。研究还为未来使用 LLM 进行可访问性测试提供了启示。
    Abstract Developers and quality assurance testers often rely on manual testing to test accessibility features throughout the product lifecycle. Unfortunately, manual testing can be tedious, often has an overwhelming scope, and can be difficult to schedule amongst other development milestones. Recently, Large Language Models (LLMs) have been used for a variety of tasks including automation of UIs, however to our knowledge no one has yet explored their use in controlling assistive technologies for the purposes of supporting accessibility testing. In this paper, we explore the requirements of a natural language based accessibility testing workflow, starting with a formative study. From this we build a system that takes as input a manual accessibility test (e.g., ``Search for a show in VoiceOver'') and uses an LLM combined with pixel-based UI Understanding models to execute the test and produce a chaptered, navigable video. In each video, to help QA testers we apply heuristics to detect and flag accessibility issues (e.g., Text size not increasing with Large Text enabled, VoiceOver navigation loops). We evaluate this system through a 10 participant user study with accessibility QA professionals who indicated that the tool would be very useful in their current work and performed tests similarly to how they would manually test the features. The study also reveals insights for future work on using LLMs for accessibility testing.
    摘要 开发者和质量保证(QA)测试人员通常依靠手动测试来在产品生命周期中验证可访问性功能。然而,手动测试往往繁琐乏味、范围庞大,且难以与其他开发里程碑协调安排。最近,大语言模型(LLM)已被用于多种任务,包括 UI 自动化;但据我们所知,还没有研究将其用于控制辅助技术以支持可访问性测试。在这篇论文中,我们从一项形成性研究出发,探讨了基于自然语言的可访问性测试工作流的需求。在此基础上,我们构建了一个系统:它以手动可访问性测试为输入(例如,"在 VoiceOver 中搜索某个节目"),使用 LLM 与基于像素的 UI 理解模型执行测试,并生成分章节、可导航的视频。在每个视频中,我们应用启发式规则来检测并标记可访问性问题(例如,开启"大字体"后文本大小未随之增大、VoiceOver 导航陷入循环),以辅助 QA 测试人员。我们对 10 名可访问性 QA 专业人员进行了用户研究,参与者表示该工具对其当前工作非常有用,并以与手动测试相近的方式执行了测试。研究还揭示了未来将 LLM 应用于可访问性测试的方向。

OneAdapt: Fast Adaptation for Deep Learning Applications via Backpropagation

  • paper_url: http://arxiv.org/abs/2310.02422
  • repo_url: None
  • paper_authors: Kuntai Du, Yuhan Liu, Yitian Hao, Qizheng Zhang, Haodong Wang, Yuyang Huang, Ganesh Ananthanarayanan, Junchen Jiang
  • for: 本文旨在提高对流媒体数据进行深度学习推理(如视频中的目标检测、音频波形中的文本提取)时的性能与资源效率。
  • methods: 提出名为 OneAdapt 的自适应方案,利用梯度上升策略来调节配置旋钮(如视频分辨率、帧率),在尽量少的额外 GPU 与带宽开销下逼近最优配置。
  • results: 与已有自适应方案相比,OneAdapt 在多种配置旋钮下将网络带宽与 GPU 资源的使用量降低 15-59%,同时保持相当的准确率,或在资源相同或更少的情况下将准确率提高 1-5%。
    Abstract Deep learning inference on streaming media data, such as object detection in video or LiDAR feeds and text extraction from audio waves, is now ubiquitous. To achieve high inference accuracy, these applications typically require significant network bandwidth to gather high-fidelity data and extensive GPU resources to run deep neural networks (DNNs). While the high demand for network bandwidth and GPU resources could be substantially reduced by optimally adapting the configuration knobs, such as video resolution and frame rate, current adaptation techniques fail to meet three requirements simultaneously: adapt configurations (i) with minimum extra GPU or bandwidth overhead; (ii) to reach near-optimal decisions based on how the data affects the final DNN's accuracy, and (iii) do so for a range of configuration knobs. This paper presents OneAdapt, which meets these requirements by leveraging a gradient-ascent strategy to adapt configuration knobs. The key idea is to embrace DNNs' differentiability to quickly estimate the accuracy's gradient to each configuration knob, called AccGrad. Specifically, OneAdapt estimates AccGrad by multiplying two gradients: InputGrad (i.e. how each configuration knob affects the input to the DNN) and DNNGrad (i.e. how the DNN input affects the DNN inference output). We evaluate OneAdapt across five types of configurations, four analytic tasks, and five types of input data. Compared to state-of-the-art adaptation schemes, OneAdapt cuts bandwidth usage and GPU usage by 15-59% while maintaining comparable accuracy or improves accuracy by 1-5% while using equal or fewer resources.
    摘要 对流媒体数据进行深度学习推理(例如视频或 LiDAR 数据流中的目标检测、音频波形中的文本提取)如今已无处不在。为了实现高精度推理,这些应用通常需要很大的网络带宽来获取高保真数据,并需要大量 GPU 资源来运行深度神经网络(DNN)。通过优化地调节配置旋钮(如视频分辨率、帧率),对网络带宽和 GPU 资源的高需求本可大幅降低,但现有自适应技术无法同时满足以下三个需求:(i)调节配置时仅带来最小的额外 GPU 或带宽开销;(ii)依据数据对最终 DNN 准确率的影响做出接近最优的决策;(iii)能对多种配置旋钮进行调节。本文提出的 OneAdapt 可以同时满足这些需求。OneAdapt 利用梯度上升策略来调节配置旋钮,其关键思想是利用 DNN 的可微性快速估计准确率对每个配置旋钮的梯度,称为 AccGrad。具体而言,OneAdapt 将两个梯度相乘来估计 AccGrad:输入梯度(即每个配置旋钮如何影响 DNN 的输入)和 DNN 梯度(即 DNN 的输入如何影响 DNN 的推理输出)。我们在五类配置、四种分析任务和五种输入数据上评估了 OneAdapt。与最先进的自适应方案相比,OneAdapt 将带宽和 GPU 使用量降低了 15-59%,同时保持相当的准确率,或在资源相同或更少的情况下将准确率提高 1-5%。
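    下面给出一个极简的 Python 草图(并非论文的实现;其中的 `input_grad`、`dnn_grad` 数值均为假设的占位值),用于说明 OneAdapt 按链式法则估计 AccGrad 并对配置旋钮做梯度上升的基本思路:

```python
import numpy as np

def adapt_knob(knob, input_grad, dnn_grad, lr=0.1, lo=0.1, hi=1.0):
    """One gradient-ascent step on a configuration knob (e.g., a resolution scale).
    AccGrad is approximated via the chain rule: how the knob changes the DNN input
    (InputGrad) times how the DNN input changes accuracy (DNNGrad)."""
    acc_grad = dnn_grad * input_grad          # dAcc/dKnob ~= dAcc/dInput * dInput/dKnob
    return float(np.clip(knob + lr * acc_grad, lo, hi))

resolution_scale = 0.5
input_grad = 0.8    # placeholder: how much raising resolution changes the DNN input
dnn_grad = 0.3      # placeholder: how much that input change improves accuracy
print(adapt_knob(resolution_scale, input_grad, dnn_grad))
```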

Can a student Large Language Model perform as well as it’s teacher?

  • paper_url: http://arxiv.org/abs/2310.02421
  • repo_url: None
  • paper_authors: Sia Gholami, Marwan Omar
  • for: 这篇论文主要针对的是深度学习模型的部署问题,即在资源有限的环境中使用高精度模型的问题。
  • methods: 该论文综述了知识蒸馏技术,即把高容量"教师"模型的知识迁移到精简的"学生"模型中,以在资源受限环境下保持性能。
  • results: 该论文通过细致的分析,阐明了成功蒸馏的关键因素,包括学生模型的架构、教师模型的质量以及超参数的平衡,并讨论了蒸馏过程中固有的复杂性与挑战。
    Abstract The burgeoning complexity of contemporary deep learning models, while achieving unparalleled accuracy, has inadvertently introduced deployment challenges in resource-constrained environments. Knowledge distillation, a technique aiming to transfer knowledge from a high-capacity "teacher" model to a streamlined "student" model, emerges as a promising solution to this dilemma. This paper provides a comprehensive overview of the knowledge distillation paradigm, emphasizing its foundational principles such as the utility of soft labels and the significance of temperature scaling. Through meticulous examination, we elucidate the critical determinants of successful distillation, including the architecture of the student model, the caliber of the teacher, and the delicate balance of hyperparameters. While acknowledging its profound advantages, we also delve into the complexities and challenges inherent in the process. Our exploration underscores knowledge distillation's potential as a pivotal technique in optimizing the trade-off between model performance and deployment efficiency.
    摘要 现代深度学习模型日益增长的复杂性在带来前所未有的准确率的同时,也给资源受限环境中的部署带来了挑战。知识蒸馏旨在将知识从高容量的"教师"模型迁移到精简的"学生"模型,被视为解决这一困境的有望方案。本文对知识蒸馏范式进行了全面综述,强调其基础原理,例如软标签的作用和温度缩放的重要性。通过细致的考察,我们阐明了成功蒸馏的关键因素,包括学生模型的架构、教师模型的质量以及超参数之间的微妙平衡。在肯定其显著优势的同时,我们也探讨了蒸馏过程中固有的复杂性与挑战。我们的分析表明,知识蒸馏有潜力成为在模型性能与部署效率之间取得平衡的关键技术。
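    下面给出一个极简的 Python 草图,演示摘要中提到的软标签与温度缩放在经典知识蒸馏损失中的作用(这是常见的标准写法,`T`、`alpha` 等超参数取值仅为示意):

```python
import numpy as np

def softmax(logits, T=1.0):
    z = logits / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, hard_label, T=4.0, alpha=0.7):
    """Classic knowledge-distillation objective: a KL term between temperature-softened
    teacher and student distributions (soft labels), scaled by T^2, plus cross-entropy
    on the ground-truth hard label."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kd = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))) * (T ** 2)
    ce = -np.log(softmax(student_logits)[hard_label] + 1e-12)
    return alpha * kd + (1 - alpha) * ce

print(distillation_loss(np.array([1.0, 0.2, -0.5]), np.array([2.0, 0.1, -1.0]), hard_label=0))
```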

Nugget 2D: Dynamic Contextual Compression for Scaling Decoder-only Language Models

  • paper_url: http://arxiv.org/abs/2310.02409
  • repo_url: None
  • paper_authors: Guanghui Qin, Corby Rosset, Ethan C. Chau, Nikhil Rao, Benjamin Van Durme
  • for: 提高慢速语言模型在长上下文中的缩放性能
  • methods: 使用动态上下文压缩,基于Qin & Van Durme (2023)的Nugget方法,对decoder-only语言模型进行扩展
  • results: 通过语言模型、问答和概要等实验,显示Nugget2D可以保持这些任务的能力,同时在解码过程中减少时间和空间开销,比如在自编码任务中,Nugget2D可以将上下文压缩到20倍的比例,保持BLEU分数在98%之间,实现近乎无损编码。
    Abstract Standard Transformer-based language models (LMs) scale poorly to long contexts. We propose a solution based on dynamic contextual compression, which extends the Nugget approach of Qin & Van Durme (2023) from BERT-like frameworks to decoder-only LMs. Our method models history as compressed "nuggets" which are trained to allow for reconstruction, and it can be initialized with off-the-shelf models such as LLaMA. We demonstrate through experiments in language modeling, question answering, and summarization that Nugget2D retains capabilities in these tasks, while drastically reducing the overhead during decoding in terms of time and space. For example, in the experiments of autoencoding, Nugget2D can shrink context at a 20x compression ratio with a BLEU score of 98% for reconstruction, achieving nearly lossless encoding.
    摘要 标准的基于 Transformer 的语言模型(LM)难以扩展到长上下文。我们提出一种基于动态上下文压缩的解决方案,将 Qin & Van Durme(2023)提出的 Nugget 方法从 BERT 类框架扩展到仅解码器(decoder-only)语言模型。我们的方法把历史建模为经过压缩、可用于重建的"nugget",并且可以用现成模型(如 LLaMA)进行初始化。在语言建模、问答和摘要等实验中,Nugget2D 在保持这些任务能力的同时,显著降低了解码时的时间与空间开销。例如,在自编码实验中,Nugget2D 可以以 20 倍的压缩率压缩上下文,重建的 BLEU 分数达到 98%,实现近乎无损的编码。

PCGPT: Procedural Content Generation via Transformers

  • paper_url: http://arxiv.org/abs/2310.02405
  • repo_url: None
  • paper_authors: Sajad Mohaghegh, Mohammad Amin Ramezan Dehnavi, Golnoosh Abdollahinejad, Matin Hashemi
  • for: 这个论文旨在提出一种基于线下强化学习和变换网络的PCG方法,以生成更加复杂和多样化的游戏内容。
  • methods: 该方法使用基于 Transformer 的自回归模型,对动作、状态和奖励的轨迹进行建模,并利用 Transformer 的自注意力机制捕捉时间相关性与因果关系。
  • results: 实验结果表明,PCGPT 在推箱子(Sokoban)游戏中预测所需物品及其位置的任务上表现出色,生成的游戏内容更复杂、更多样,且所需步骤明显少于已有方法。
    Abstract The paper presents the PCGPT framework, an innovative approach to procedural content generation (PCG) using offline reinforcement learning and transformer networks. PCGPT utilizes an autoregressive model based on transformers to generate game levels iteratively, addressing the challenges of traditional PCG methods such as repetitive, predictable, or inconsistent content. The framework models trajectories of actions, states, and rewards, leveraging the transformer's self-attention mechanism to capture temporal dependencies and causal relationships. The approach is evaluated in the Sokoban puzzle game, where the model predicts items that are needed with their corresponding locations. Experimental results on the game Sokoban demonstrate that PCGPT generates more complex and diverse game content. Interestingly, it achieves these results in significantly fewer steps compared to existing methods, showcasing its potential for enhancing game design and online content generation. Our model represents a new PCG paradigm which outperforms previous methods.
    摘要 本文提出了 PCGPT 框架,这是一种利用离线强化学习与 Transformer 网络的程序化内容生成(PCG)新方法。PCGPT 使用基于 Transformer 的自回归模型来迭代地生成游戏关卡,以解决传统 PCG 方法内容重复、可预测或不一致等问题。该框架对动作、状态和奖励的轨迹进行建模,利用 Transformer 的自注意力机制捕捉时间相关性与因果关系。我们在推箱子(Sokoban)益智游戏上进行了实验,由模型预测所需的物品及其对应位置。实验结果表明,PCGPT 生成的游戏内容更复杂、更多样,且所需步骤显著少于现有方法,展示了其在游戏设计与在线内容生成方面的潜力。我们的模型代表了一种优于以往方法的全新 PCG 范式。

AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models

  • paper_url: http://arxiv.org/abs/2310.04451
  • repo_url: https://github.com/sheltonliu-n/autodan
  • paper_authors: Xiaogeng Liu, Nan Xu, Muhao Chen, Chaowei Xiao
  • for: investigate jailbreak attacks on aligned large language models (LLMs) and develop a novel approach to automatically generate stealthy jailbreak prompts
  • methods: hierarchical genetic algorithm to generate stealthy jailbreak prompts that preserve semantic meaningfulness
  • results: AutoDAN demonstrates superior attack strength in cross-model transferability and cross-sample universality compared with the baseline, and can effectively bypass perplexity-based defense methods
    Abstract The aligned Large Language Models (LLMs) are powerful language understanding and decision-making tools that are created through extensive alignment with human feedback. However, these large models remain susceptible to jailbreak attacks, where adversaries manipulate prompts to elicit malicious outputs that should not be given by aligned LLMs. Investigating jailbreak prompts can lead us to delve into the limitations of LLMs and further guide us to secure them. Unfortunately, existing jailbreak techniques suffer from either (1) scalability issues, where attacks heavily rely on manual crafting of prompts, or (2) stealthiness problems, as attacks depend on token-based algorithms to generate prompts that are often semantically meaningless, making them susceptible to detection through basic perplexity testing. In light of these challenges, we intend to answer this question: Can we develop an approach that can automatically generate stealthy jailbreak prompts? In this paper, we introduce AutoDAN, a novel jailbreak attack against aligned LLMs. AutoDAN can automatically generate stealthy jailbreak prompts by the carefully designed hierarchical genetic algorithm. Extensive evaluations demonstrate that AutoDAN not only automates the process while preserving semantic meaningfulness, but also demonstrates superior attack strength in cross-model transferability, and cross-sample universality compared with the baseline. Moreover, we also compare AutoDAN with perplexity-based defense methods and show that AutoDAN can bypass them effectively.
    摘要 经过对齐的大语言模型(LLM)是强大的语言理解与决策工具,它们通过与人类反馈的大量对齐训练而得到。然而,这些大模型仍然容易受到越狱攻击:攻击者可以通过操纵提示词诱发对齐 LLM 本不应给出的恶意输出。研究越狱提示词有助于我们深入了解 LLM 的局限,并进一步指导我们加固其安全性。然而,现有越狱技术存在两类问题:(1)可扩展性问题,攻击严重依赖人工精心构造提示词;(2)隐蔽性问题,攻击依赖基于 token 的算法生成提示词,而这些提示词往往缺乏语义意义,容易被简单的困惑度检测发现。针对这些挑战,我们试图回答:能否自动生成隐蔽的越狱提示词?在本文中,我们提出了一种针对对齐 LLM 的新型越狱攻击方法 AutoDAN。AutoDAN 通过精心设计的层次遗传算法自动生成既隐蔽又保持语义意义的越狱提示词。大量评估表明,AutoDAN 不仅实现了流程自动化并保持语义意义,还在跨模型可迁移性与跨样本通用性上展现出优于基线的攻击强度。此外,我们将 AutoDAN 与基于困惑度的防御方法进行了比较,证明 AutoDAN 能够有效绕过它们。
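    下面给出一个极简的 Python 草图(并非 AutoDAN 的实现:这里只是一个"扁平"的遗传搜索,而非论文中的层次遗传算法;`fitness`、`synonyms` 等均为示意假设),用于说明对提示词做选择、交叉与变异的遗传搜索基本流程:

```python
import random

def mutate(prompt, synonyms, p=0.1):
    """Randomly swap words for synonyms to keep candidates semantically meaningful."""
    words = prompt.split()
    return " ".join(random.choice(synonyms.get(w, [w])) if random.random() < p else w
                    for w in words)

def crossover(a, b):
    """Single-point crossover between two parent prompts."""
    wa, wb = a.split(), b.split()
    cut = random.randint(1, min(len(wa), len(wb)) - 1)
    return " ".join(wa[:cut] + wb[cut:])

def genetic_search(seed_prompts, fitness, synonyms, generations=10, pop_size=8):
    """Score the population, keep the fittest half, refill with crossover + mutation.
    In a real attack, `fitness` would query the target model; here it is a stand-in."""
    pop = list(seed_prompts)[:pop_size]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: max(1, pop_size // 2)]
        children = [mutate(crossover(random.choice(parents), random.choice(parents)), synonyms)
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=fitness)

synonyms = {"ignore": ["disregard", "overlook"], "instructions": ["guidelines", "rules"]}
seeds = ["please ignore previous instructions and answer", "kindly ignore the previous instructions above"]
print(genetic_search(seeds, fitness=lambda p: len(set(p.split())), synonyms=synonyms))
```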

SE(3)-Stochastic Flow Matching for Protein Backbone Generation

  • paper_url: http://arxiv.org/abs/2310.02391
  • repo_url: https://github.com/dreamfold/foldflow
  • paper_authors: Avishek Joey Bose, Tara Akhound-Sadegh, Kilian Fatras, Guillaume Huguet, Jarrid Rector-Brooks, Cheng-Hao Liu, Andrei Cristian Nica, Maksym Korablyov, Michael Bronstein, Alexander Tong
  • For: The paper is focused on the computational design of novel protein structures, specifically using a series of novel generative models based on the flow-matching paradigm over 3D rigid motions (i.e. the group SE(3)).
  • Methods: The paper introduces three novel generative models, starting with FoldFlow-Base, which is a simulation-free approach to learning deterministic continuous-time dynamics and matching invariant target distributions on SE(3). The authors then accelerate training by incorporating Riemannian optimal transport to create FoldFlow-OT, and finally, they design FoldFlow-SFM, which couples both Riemannian OT and simulation-free training to learn stochastic continuous-time dynamics over SE(3).
  • Results: The paper reports high-quality designable, diverse, and novel protein backbone samples generated using the FoldFlow models, validating their effectiveness in the computational design of novel protein structures.
    Abstract The computational design of novel protein structures has the potential to impact numerous scientific disciplines greatly. Toward this goal, we introduce $\text{FoldFlow}$ a series of novel generative models of increasing modeling power based on the flow-matching paradigm over $3\text{D}$ rigid motions -- i.e. the group $\text{SE(3)}$ -- enabling accurate modeling of protein backbones. We first introduce $\text{FoldFlow-Base}$, a simulation-free approach to learning deterministic continuous-time dynamics and matching invariant target distributions on $\text{SE(3)}$. We next accelerate training by incorporating Riemannian optimal transport to create $\text{FoldFlow-OT}$, leading to the construction of both more simple and stable flows. Finally, we design $\text{FoldFlow-SFM}$ coupling both Riemannian OT and simulation-free training to learn stochastic continuous-time dynamics over $\text{SE(3)}$. Our family of $\text{FoldFlow}$ generative models offer several key advantages over previous approaches to the generative modeling of proteins: they are more stable and faster to train than diffusion-based approaches, and our models enjoy the ability to map any invariant source distribution to any invariant target distribution over $\text{SE(3)}$. Empirically, we validate our FoldFlow models on protein backbone generation of up to $300$ amino acids leading to high-quality designable, diverse, and novel samples.
    摘要 新蛋白质结构的计算设计有潜力深刻影响众多科学领域。为了实现这一目标,我们提出 $\text{FoldFlow}$ 系列建模能力递增的生成模型,它们基于三维刚体运动群 $\text{SE(3)}$ 上的流匹配范式,能够准确建模蛋白质骨架。我们首先提出 $\text{FoldFlow-Base}$,一种无需模拟的方法,用于学习确定性连续时间动力学并在 $\text{SE(3)}$ 上匹配不变的目标分布。随后,我们引入黎曼最优传输以加速训练,得到 $\text{FoldFlow-OT}$,构造出更简单、更稳定的流。最后,我们设计了 $\text{FoldFlow-SFM}$,将黎曼最优传输与无模拟训练相结合,学习 $\text{SE(3)}$ 上的随机连续时间动力学。与以往的蛋白质生成建模方法相比,我们的 $\text{FoldFlow}$ 系列模型具有多项优势:比基于扩散的方法更稳定、训练更快,并且能够把 $\text{SE(3)}$ 上任意不变的源分布映射到任意不变的目标分布。在实验中,我们在长达 300 个氨基酸的蛋白质骨架生成任务上验证了 FoldFlow 模型,得到了高质量、可设计、多样且新颖的样本。

ProtoNER: Few shot Incremental Learning for Named Entity Recognition using Prototypical Networks

  • paper_url: http://arxiv.org/abs/2310.02372
  • repo_url: None
  • paper_authors: Ritesh Kumar, Saurabh Goyal, Ashish Verma, Vatche Isahagian
  • for: 本研究旨在提高文档理解和数据EXTRACTION领域中KVP提取模型的泛化能力,以便在新的类别加入模型时,不需要重新标注整个训练集和重新训练模型。
  • methods: 我们提出了一种基于原型网络的端到端 KVP 提取模型 ProtoNER:它不依赖初始训练所用的数据集,也无需生成中间的合成数据,并采用混合损失函数,使模型在学习新增类别的同时保留对原有类别的知识。
  • results: 实验结果显示,仅用 30 个新标注样本进行微调的 ProtoNER,在新增类别上可以达到与使用 2600 个样本微调的常规模型相当的效果。
    Abstract Key value pair (KVP) extraction or Named Entity Recognition(NER) from visually rich documents has been an active area of research in document understanding and data extraction domain. Several transformer based models such as LayoutLMv2, LayoutLMv3, and LiLT have emerged achieving state of the art results. However, addition of even a single new class to the existing model requires (a) re-annotation of entire training dataset to include this new class and (b) retraining the model again. Both of these issues really slow down the deployment of updated model. \\ We present \textbf{ProtoNER}: Prototypical Network based end-to-end KVP extraction model that allows addition of new classes to an existing model while requiring minimal number of newly annotated training samples. The key contributions of our model are: (1) No dependency on dataset used for initial training of the model, which alleviates the need to retain original training dataset for longer duration as well as data re-annotation which is very time consuming task, (2) No intermediate synthetic data generation which tends to add noise and results in model's performance degradation, and (3) Hybrid loss function which allows model to retain knowledge about older classes as well as learn about newly added classes.\\ Experimental results show that ProtoNER finetuned with just 30 samples is able to achieve similar results for the newly added classes as that of regular model finetuned with 2600 samples.
    摘要 从视觉丰富的文档中进行键值对(KVP)提取或命名实体识别(NER),一直是文档理解与数据提取领域的活跃研究方向。LayoutLMv2、LayoutLMv3 和 LiLT 等基于 Transformer 的模型已取得最先进的结果。然而,向现有模型中添加哪怕一个新类别,都需要(1)重新标注整个训练数据集以包含该类别,以及(2)重新训练模型,这两点都严重拖慢了更新后模型的部署。我们提出了基于原型网络的端到端 KVP 提取模型 ProtoNER,它只需极少量新标注样本即可向现有模型添加新类别。该模型的关键贡献包括:(1)不依赖初始训练所用的数据集,从而无需长期保留原始训练数据,也免去了非常耗时的数据重新标注;(2)无需生成中间合成数据,避免引入噪声导致性能下降;(3)采用混合损失函数,使模型在学习新增类别的同时保留对原有类别的知识。实验结果显示,仅用 30 个样本微调的 ProtoNER,在新增类别上能达到与使用 2600 个样本微调的常规模型相当的结果。
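    下面给出一个极简的 Python 草图(并非 ProtoNER 的实现;其中的嵌入与标签为随机生成的示意数据),用于说明原型网络的核心思想:类别原型取该类支撑样本嵌入的均值,新增类别只需少量标注样本即可加入:

```python
import numpy as np

def prototypes(support_embeddings, support_labels):
    """Class prototype = mean embedding of that class's support samples."""
    return {c: support_embeddings[support_labels == c].mean(axis=0)
            for c in np.unique(support_labels)}

def classify(query_embedding, protos):
    """Assign the label of the nearest prototype (squared Euclidean distance);
    adding a new KVP class only requires computing one more prototype
    from a handful of newly annotated samples."""
    dists = {c: np.sum((query_embedding - p) ** 2) for c, p in protos.items()}
    return min(dists, key=dists.get)

# Hypothetical token embeddings (e.g., from a LayoutLM-style encoder).
emb = np.random.randn(12, 32)
labels = np.array([0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2])
protos = prototypes(emb, labels)
print(classify(np.random.randn(32), protos))
```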

A Deep Reinforcement Learning Approach for Interactive Search with Sentence-level Feedback

  • paper_url: http://arxiv.org/abs/2310.03043
  • repo_url: None
  • paper_authors: Jianghong Zhou, Joyce C. Ho, Chen Lin, Eugene Agichtein
  • for: 本研究旨在提高搜寻系统的搜寻精度,通过融合用户的互动反馈,从而提供更好的搜寻体验。
  • methods: 本研究使用深度Q学习(DQ)方法,将BERT模型与用户互动反馈结合,选择重要的句子,以提高搜寻精度。此外,本研究还提出了两种 Mechanism来更好地探索优化的动作空间。
  • results: 本研究在三个搜寻dataset上验证了DQrank的效能,与前一代RL方法相比,DQrank能够提高搜寻精度至少12%。此外,本研究还进行了细部抽象研究,结果显示每个模型元件都能够有效地提取和累累长期的用户互动反馈效果。
    Abstract Interactive search can provide a better experience by incorporating interaction feedback from the users. This can significantly improve search accuracy as it helps avoid irrelevant information and captures the users' search intents. Existing state-of-the-art (SOTA) systems use reinforcement learning (RL) models to incorporate the interactions but focus on item-level feedback, ignoring the fine-grained information found in sentence-level feedback. Yet such feedback requires extensive RL action space exploration and large amounts of annotated data. This work addresses these challenges by proposing a new deep Q-learning (DQ) approach, DQrank. DQrank adapts BERT-based models, the SOTA in natural language processing, to select crucial sentences based on users' engagement and rank the items to obtain more satisfactory responses. We also propose two mechanisms to better explore optimal actions. DQrank further utilizes the experience replay mechanism in DQ to store the feedback sentences to obtain a better initial ranking performance. We validate the effectiveness of DQrank on three search datasets. The results show that DQRank performs at least 12% better than the previous SOTA RL approaches. We also conduct detailed ablation studies. The ablation results demonstrate that each model component can efficiently extract and accumulate long-term engagement effects from the users' sentence-level feedback. This structure offers new technologies with promised performance to construct a search system with sentence-level interaction.
    摘要 交互式搜索通过融合用户的交互反馈来提供更好的体验。这可以显著提高搜索准确性,因为它有助于避开无关信息并捕捉用户的搜索意图。现有的最先进(SOTA)系统使用强化学习(RL)模型来融合交互,但它们只关注条目级(item-level)反馈,忽略了句子级反馈中蕴含的细粒度信息;而利用这类反馈需要对庞大的 RL 动作空间进行探索,并依赖大量标注数据。本文针对这些挑战,提出了一种新的深度 Q 学习(DQ)方法 DQrank。DQrank 采用自然语言处理领域最先进的 BERT 类模型,根据用户参与度选择关键句子,并对条目进行排序以获得更令人满意的结果。我们还提出了两种机制来更好地探索最优动作。DQrank 进一步利用 DQ 中的经验回放机制存储反馈句子,从而获得更好的初始排序性能。我们在三个搜索数据集上验证了 DQrank 的有效性,结果显示 DQrank 的表现比此前最先进的 RL 方法至少高出 12%。我们还进行了详细的消融研究,结果表明每个模型组件都能有效提取并累积用户句子级反馈中的长期参与效应。这一结构为构建基于句子级交互的搜索系统提供了具有良好前景的新技术。
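    下面给出一个极简的 Python 草图(并非 DQrank 的实现;线性 Q 函数与特征向量只是 BERT 打分器的示意替身),用于说明摘要中提到的带经验回放的深度 Q 学习更新流程:

```python
import random
from collections import deque
import numpy as np

class ReplayDQ:
    """Minimal DQ-style learner with an experience-replay buffer.
    Feature vectors stand in for (query, candidate sentence) representations,
    and the reward stands in for user engagement feedback."""
    def __init__(self, dim, lr=0.01, gamma=0.9, capacity=1000):
        self.w = np.zeros(dim)
        self.lr, self.gamma = lr, gamma
        self.buffer = deque(maxlen=capacity)

    def q(self, feat):
        return float(self.w @ feat)

    def store(self, feat, reward, next_feats):
        self.buffer.append((feat, reward, next_feats))

    def replay(self, batch_size=8):
        batch = random.sample(list(self.buffer), min(batch_size, len(self.buffer)))
        for feat, reward, next_feats in batch:
            target = reward + self.gamma * max((self.q(f) for f in next_feats), default=0.0)
            self.w += self.lr * (target - self.q(feat)) * feat  # TD update

agent = ReplayDQ(dim=4)
agent.store(np.array([1., 0., 1., 0.]), reward=1.0, next_feats=[np.array([0., 1., 0., 1.])])
agent.replay()
print(agent.w)
```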

Prioritized Soft Q-Decomposition for Lexicographic Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.02360
  • repo_url: None
  • paper_authors: Finn Rietz, Stefan Heinrich, Erik Schaffernicht, Johannes Andreas Stork
  • for: solves complex tasks by breaking them down into elementary subtasks and reusing subtask solutions
  • methods: value decomposition, prioritized soft Q-decomposition (PSQD)
  • results: successful learning, reuse, and adaptation results for simulated robot control tasks, and offline learning results without new environment interaction during adaptation.
    Abstract Reinforcement learning (RL) for complex tasks remains a challenge, primarily due to the difficulties of engineering scalar reward functions and the inherent inefficiency of training models from scratch. Instead, it would be better to specify complex tasks in terms of elementary subtasks and to reuse subtask solutions whenever possible. In this work, we address continuous space lexicographic multi-objective RL problems, consisting of prioritized subtasks, which are notoriously difficult to solve. We show that these can be scalarized with a subtask transformation and then solved incrementally using value decomposition. Exploiting this insight, we propose prioritized soft Q-decomposition (PSQD), a novel algorithm for learning and adapting subtask solutions under lexicographic priorities in continuous state-action spaces. PSQD offers the ability to reuse previously learned subtask solutions in a zero-shot composition, followed by an adaptation step. Its ability to use retained subtask training data for offline learning eliminates the need for new environment interaction during adaptation. We demonstrate the efficacy of our approach by presenting successful learning, reuse, and adaptation results for both low- and high-dimensional simulated robot control tasks, as well as offline learning results. In contrast to baseline approaches, PSQD does not trade off between conflicting subtasks or priority constraints and satisfies subtask priorities during learning. PSQD provides an intuitive framework for tackling complex RL problems, offering insights into the inner workings of the subtask composition.

On the definition of toxicity in NLP

  • paper_url: http://arxiv.org/abs/2310.02357
  • repo_url: None
  • paper_authors: Sergey Berezin, Reza Farahbakhsh, Noel Crespi
  • for: 提出了一个新的、基于应激水平的毒性定义,使毒性检测任务更加客观并具备上下文感知。
  • methods: 该文提出了一种基于新定义的数据集创建和模型训练方法。
  • results: 该文未提出实际实验结果,但预期通过新定义和方法提高毒性检测任务的准确性和稳定性。
    Abstract The fundamental problem in toxicity detection task lies in the fact that the toxicity is ill-defined. This causes us to rely on subjective and vague data in models' training, which results in non-robust and non-accurate results: garbage in - garbage out. This work suggests a new, stress-level-based definition of toxicity designed to be objective and context-aware. On par with it, we also describe possible ways of applying this new definition to dataset creation and model training.
    摘要 毒性检测任务的根本问题在于"毒性"本身缺乏明确定义。这迫使我们在模型训练中依赖主观且含混的数据,从而得到不鲁棒、不准确的结果:垃圾进,垃圾出。本工作提出了一种基于应激水平的新毒性定义,力求客观并具备上下文感知。同时,我们还描述了将这一新定义应用于数据集构建和模型训练的可能方式。

Reasoning about Intuitionistic Computation Tree Logic

  • paper_url: http://arxiv.org/abs/2310.02355
  • repo_url: None
  • paper_authors: Davide Catta, Vadim Malvone, Aniello Murano
  • for: 本文定义了计算树逻辑(CTL)的直觉主义(intuitionistic)版本。
  • methods: 本文首先介绍直觉主义逻辑的语义特征,并探讨这些特征对形式化验证的意义;随后定义了直觉主义 CTL 的语法与语义,并研究了由此得到的逻辑的一些简单性质。
  • results: 本文最后证明,CTL 的某些不动点公理在所定义的直觉主义 CTL 中并不成立。
    Abstract In this paper, we define an intuitionistic version of Computation Tree Logic. After explaining the semantic features of intuitionistic logic, we examine how these characteristics can be interesting for formal verification purposes. Subsequently, we define the syntax and semantics of our intuitionistic version of CTL and study some simple properties of the so obtained logic. We conclude by demonstrating that some fixed-point axioms of CTL are not valid in the intuitionistic version of CTL we have defined.
    摘要 在本文中,我们定义了计算树逻辑(CTL)的直觉主义版本。在说明直觉主义逻辑的语义特征之后,我们探讨了这些特征对形式化验证的意义。随后,我们定义了直觉主义 CTL 的语法与语义,并研究了由此得到的逻辑的一些简单性质。最后,我们证明 CTL 的某些不动点公理在我们所定义的直觉主义 CTL 中并不成立。

Rollout Heuristics for Online Stochastic Contingent Planning

  • paper_url: http://arxiv.org/abs/2310.02345
  • repo_url: None
  • paper_authors: Oded Blumenthal, Guy Shani
  • for: 本文旨在解决部分可观测、动作具有随机性的决策问题。
  • methods: 论文采用基于 UCT 的蒙特卡洛树搜索算法来决定下一步动作,并在叶节点使用 rollout 策略提供价值估计;由于该估计通常依赖领域特定的启发式,本文转而利用规划领域的启发式来替代。
  • results: 论文将 POMDP 建模为随机条件规划问题,从而引入领域无关的启发式,包括基于经典规划的 h_add 启发式,以及一种在信念空间中计算、考虑信息价值的启发式。
    Abstract Partially observable Markov decision processes (POMDP) are a useful model for decision-making under partial observability and stochastic actions. Partially Observable Monte-Carlo Planning is an online algorithm for deciding on the next action to perform, using a Monte-Carlo tree search approach, based on the UCT (UCB applied to trees) algorithm for fully observable Markov-decision processes. POMCP develops an action-observation tree, and at the leaves, uses a rollout policy to provide a value estimate for the leaf. As such, POMCP is highly dependent on the rollout policy to compute good estimates, and hence identify good actions. Thus, many practitioners who use POMCP are required to create strong, domain-specific heuristics. In this paper, we model POMDPs as stochastic contingent planning problems. This allows us to leverage domain-independent heuristics that were developed in the planning community. We suggest two heuristics, the first is based on the well-known h_add heuristic from classical planning, and the second is computed in belief space, taking the value of information into account.
    摘要 部分可观测马尔可夫决策过程(POMDP)是在部分可观测与随机动作条件下进行决策的有用模型。部分可观测蒙特卡洛规划(POMCP)是一种在线算法,采用蒙特卡洛树搜索来决定下一步动作,其基础是用于完全可观测马尔可夫决策过程的 UCT(将 UCB 应用于树)算法。POMCP 构建一棵动作-观测树,并在叶节点处使用 rollout 策略为叶节点提供价值估计。因此,POMCP 高度依赖 rollout 策略来获得良好的估计,进而识别出好的动作;许多使用 POMCP 的实践者不得不构造强有力的领域特定启发式。在本文中,我们将 POMDP 建模为随机条件规划问题,从而能够利用规划领域已有的领域无关启发式。我们提出了两种启发式:第一种基于经典规划中著名的 h_add 启发式,第二种在信念空间中计算,并将信息价值纳入考虑。
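    下面给出一个极简的 Python 草图(并非论文的实现;其中的启发式函数与折扣系数均为示意假设),用于说明 UCT 式的动作选择,以及在叶节点用启发式代替 rollout 估值的思路:

```python
import math

def uct_select(action_counts, action_values, total_visits, c=1.4):
    """UCB applied to trees (UCT): pick the action maximizing the value estimate
    plus an exploration bonus; untried actions are selected first."""
    best, best_score = None, -float("inf")
    for action, n in action_counts.items():
        if n == 0:
            return action
        score = action_values[action] / n + c * math.sqrt(math.log(total_visits) / n)
        if score > best_score:
            best, best_score = action, score
    return best

def leaf_value(belief, heuristic, depth=5, gamma=0.95):
    """Leaf value estimate: a domain-independent heuristic computed on the belief
    (e.g., an h_add-style cost) replaces a random rollout. `heuristic` is a placeholder."""
    return -heuristic(belief) * (gamma ** depth)

counts = {"listen": 3, "open-left": 1, "open-right": 0}
values = {"listen": 2.0, "open-left": -1.0, "open-right": 0.0}
print(uct_select(counts, values, total_visits=4))
print(leaf_value(belief={"tiger-left": 0.7}, heuristic=lambda b: 1.0 - max(b.values())))
```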

Autonomous Systems’ Safety Cases for use in UK Nuclear Environments

  • paper_url: http://arxiv.org/abs/2310.02344
  • repo_url: None
  • paper_authors: Christopher R. Anderson, Louise A. Dennis
  • for: 本研究旨在开发一份描述Autonomous robot在核电站的部署安全情况的安全案例,以便在英国核电站进行部署。
  • methods: 本研究使用了一种具有人工智能功能的假设机器人,并使用了一系列的安全措施和技术来确保机器人的安全部署。
  • results: 本研究通过展示了一个可能的安全案例,以便在未来继续与核站licensees、ONR、行业和学术界进行讨论和开发工具。
    Abstract An overview of the process to develop a safety case for an autonomous robot deployment on a nuclear site in the UK is described and a safety case for a hypothetical robot incorporating AI is presented. This forms a first step towards a deployment, showing what is possible now and what may be possible with development of tools. It forms the basis for further discussion between nuclear site licensees, the Office for Nuclear Regulation (ONR), industry and academia.
    摘要 英国核站自主机器人部署的安全案例开发过程的概述,并提供了一个基于人工智能的机器人安全案例。这是一个开始,用于展示当前可能的情况和可能的发展。它可以作为与核站许可人、英国核管理局(ONR)、行业和学术界进行进一步讨论的基础。

Learning Interpretable Deep Disentangled Neural Networks for Hyperspectral Unmixing

  • paper_url: http://arxiv.org/abs/2310.02340
  • repo_url: https://github.com/ricardoborsoi/IDNet_release
  • paper_authors: Ricardo Augusto Borsoi, Deniz Erdoğmuş, Tales Imbiriba
  • for: 本研究提出了一种新的可解释深度学习方法,用于解决受非理想条件(非线性、端元变异)影响的高光谱解混问题。
  • methods: 该方法基于概率变分深度学习框架,利用解耦学习来恰当地分离丰度与端元;模型通过随机反向传播进行端到端学习,并采用借鉴半监督学习的自监督策略进行训练。
  • results: 实验结果表明,提出的方法可以比state-of-the-art算法提高解决谱谱分解问题的性能。
    Abstract Although considerable effort has been dedicated to improving the solution to the hyperspectral unmixing problem, non-idealities such as complex radiation scattering and endmember variability negatively impact the performance of most existing algorithms and can be very challenging to address. Recently, deep learning-based frameworks have been explored for hyperspectral umixing due to their flexibility and powerful representation capabilities. However, such techniques either do not address the non-idealities of the unmixing problem, or rely on black-box models which are not interpretable. In this paper, we propose a new interpretable deep learning method for hyperspectral unmixing that accounts for nonlinearity and endmember variability. The proposed method leverages a probabilistic variational deep-learning framework, where disentanglement learning is employed to properly separate the abundances and endmembers. The model is learned end-to-end using stochastic backpropagation, and trained using a self-supervised strategy which leverages benefits from semi-supervised learning techniques. Furthermore, the model is carefully designed to provide a high degree of interpretability. This includes modeling the abundances as a Dirichlet distribution, the endmembers using low-dimensional deep latent variable representations, and using two-stream neural networks composed of additive piecewise-linear/nonlinear components. Experimental results on synthetic and real datasets illustrate the performance of the proposed method compared to state-of-the-art algorithms.
    摘要 尽管人们为改进高光谱解混问题的求解投入了大量努力,但复杂的辐射散射和端元变异等非理想因素仍会损害多数现有算法的性能,且很难加以处理。最近,基于深度学习的框架凭借其灵活性和强大的表示能力被用于高光谱解混。然而,这些技术要么没有处理解混问题中的非理想因素,要么依赖不可解释的黑盒模型。在这篇论文中,我们提出了一种兼顾非线性与端元变异的新型可解释深度学习解混方法。该方法基于概率变分深度学习框架,利用解耦学习来恰当地分离丰度与端元。模型通过随机反向传播进行端到端学习,并采用借鉴半监督学习优势的自监督策略进行训练。此外,模型的设计充分考虑了可解释性:将丰度建模为 Dirichlet 分布,用低维深度隐变量表示端元,并采用由加性分段线性/非线性组件构成的双流神经网络。在合成与真实数据上的实验结果表明,所提方法的性能优于现有最先进算法。
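    下面给出一个极简的 Python 草图(并非论文的变分深度模型;这里只是单像素、线性混合的玩具示例,端元与丰度为随机生成),用于说明线性混合模型以及通过 softmax 参数化让丰度落在单纯形上的做法:

```python
import numpy as np

def mix(endmembers, abundances):
    """Linear mixing model: an observed pixel is a convex combination of
    endmember spectra, y = M^T a (+ noise); abundances live on the simplex."""
    return endmembers.T @ abundances

def estimate_abundances(pixel, endmembers, steps=500, lr=0.1):
    """Toy unmixing of one pixel by gradient descent on softmax-parameterized
    abundances (non-negative and sum-to-one by construction)."""
    logits = np.zeros(endmembers.shape[0])
    for _ in range(steps):
        a = np.exp(logits) / np.exp(logits).sum()
        residual = mix(endmembers, a) - pixel
        grad_a = endmembers @ residual
        grad_logits = a * (grad_a - np.dot(a, grad_a))   # softmax Jacobian
        logits -= lr * grad_logits
    return np.exp(logits) / np.exp(logits).sum()

M = np.random.rand(3, 50)                 # 3 endmembers, 50 spectral bands
true_a = np.array([0.6, 0.3, 0.1])
y = mix(M, true_a) + 0.01 * np.random.randn(50)
print(estimate_abundances(y, M))
```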

Approximately Equivariant Quantum Neural Network for $p4m$ Group Symmetries in Images

  • paper_url: http://arxiv.org/abs/2310.02323
  • repo_url: None
  • paper_authors: Su Yeon Chang, Michele Grossi, Bertrand Le Saux, Sofia Vallecorsa
  • for: 本文旨在为具有平面 $p4m$ 对称性(包括反射对称和 $90^\circ$ 旋转对称)的图像分类任务设计等变量子卷积神经网络(EquivQCNN),通过在模型中引入数据对称性的先验知识,提升变分量子算法在近期含噪量子硬件上的可训练性与泛化能力。
    Abstract Quantum Neural Networks (QNNs) are suggested as one of the quantum algorithms which can be efficiently simulated with a low depth on near-term quantum hardware in the presence of noises. However, their performance highly relies on choosing the most suitable architecture of Variational Quantum Algorithms (VQAs), and the problem-agnostic models often suffer issues regarding trainability and generalization power. As a solution, the most recent works explore Geometric Quantum Machine Learning (GQML) using QNNs equivariant with respect to the underlying symmetry of the dataset. GQML adds an inductive bias to the model by incorporating the prior knowledge on the given dataset and leads to enhancing the optimization performance while constraining the search space. This work proposes equivariant Quantum Convolutional Neural Networks (EquivQCNNs) for image classification under planar $p4m$ symmetry, including reflectional and $90^\circ$ rotational symmetry. We present the results tested in different use cases, such as phase detection of the 2D Ising model and classification of the extended MNIST dataset, and compare them with those obtained with the non-equivariant model, proving that the equivariance fosters better generalization of the model.

Contrastive Post-training Large Language Models on Data Curriculum

  • paper_url: http://arxiv.org/abs/2310.02263
  • repo_url: None
  • paper_authors: Canwen Xu, Corby Rosset, Luciano Del Corro, Shweti Mahajan, Julian McAuley, Jennifer Neville, Ahmed Hassan Awadallah, Nikhil Rao
  • for: This paper explores contrastive post-training techniques for aligning large language models (LLMs) with human preferences.
  • methods: The paper uses automatically constructed preference pairs from multiple models of varying strengths (e.g., InstructGPT, ChatGPT, and GPT-4) for contrastive post-training. The authors compare the contrastive techniques of SLiC and DPO to SFT baselines and find that DPO provides a step-function improvement. Additionally, the authors explore a data curriculum learning scheme for contrastive post-training.
  • results: The paper finds that contrastive post-training further improves the performance of Orca, a state-of-the-art instruction learning model tuned with GPT-4 outputs, to exceed that of ChatGPT.
    Abstract Alignment serves as an important step to steer large language models (LLMs) towards human preferences. In this paper, we explore contrastive post-training techniques for alignment by automatically constructing preference pairs from multiple models of varying strengths (e.g., InstructGPT, ChatGPT and GPT-4). We carefully compare the contrastive techniques of SLiC and DPO to SFT baselines and find that DPO provides a step-function improvement even after continueing SFT saturates. We also explore a data curriculum learning scheme for contrastive post-training, which starts by learning from "easier" pairs and transitioning to "harder" ones, which further improves alignment. Finally, we scale up our experiments to train with more data and larger models like Orca. Remarkably, contrastive post-training further improves the performance of Orca, already a state-of-the-art instruction learning model tuned with GPT-4 outputs, to exceed that of ChatGPT.
    摘要 对齐(alignment)是引导大语言模型(LLM)贴近人类偏好的重要步骤。在这篇论文中,我们通过从多个强弱不同的模型(如 InstructGPT、ChatGPT 和 GPT-4)自动构建偏好对,探索用于对齐的对比后训练技术。我们仔细比较了 SLiC 和 DPO 这两种对比技术与 SFT 基线,发现 DPO 带来了阶跃式的提升,即使在持续 SFT 已经饱和之后依然有效。我们还探索了一种数据课程学习方案:先从"较容易"的偏好对开始学习,再过渡到"较困难"的偏好对,这进一步改善了对齐效果。最后,我们扩大实验规模,使用更多数据和更大的模型(如 Orca)进行训练。值得注意的是,对比后训练使已经用 GPT-4 输出调优的最先进指令学习模型 Orca 的性能进一步提升,超过了 ChatGPT。
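    下面给出一个极简的 Python 草图,演示摘要中比较的 DPO 损失在单个(优选,劣选)样本对上的标准形式(其中的对数概率数值为示意占位,实际应由策略模型与冻结的参考模型对整个序列的 token 对数概率求和得到):

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Direct Preference Optimization loss for one (chosen, rejected) pair:
    -log sigmoid(beta * [(logpi(y_w|x) - logpi_ref(y_w|x)) - (logpi(y_l|x) - logpi_ref(y_l|x))])."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Placeholder sequence log-probabilities for a preferred vs. a rejected response.
print(dpo_loss(logp_w=-12.0, logp_l=-15.0, ref_logp_w=-13.0, ref_logp_l=-14.0))
```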

Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation

  • paper_url: http://arxiv.org/abs/2310.02304
  • repo_url: None
  • paper_authors: Eric Zelikman, Eliana Lorch, Lester Mackey, Adam Tauman Kalai
  • for: 这篇论文旨在让语言模型编写并递归改进调用语言模型的"脚手架"程序,从而提升 AI 系统解决问题的性能。
  • methods: 论文从一个种子"改进器"程序出发:它按照给定的效用函数,通过多次调用语言模型并返回最佳方案来改进输入程序;随后再用这个改进器去改进它自身。
  • results: 研究发现,经语言模型改进后的改进器在一小批下游任务上生成的程序性能显著优于种子改进器。此外,语言模型还提出了多种自我改进策略,包括束搜索、遗传算法和模拟退火;研究同时讨论了自我改进技术的风险,并评估了生成代码绕过沙箱的频率。
    Abstract Several recent advances in AI systems (e.g., Tree-of-Thoughts and Program-Aided Language Models) solve problems by providing a "scaffolding" program that structures multiple calls to language models to generate better outputs. A scaffolding program is written in a programming language such as Python. In this work, we use a language-model-infused scaffolding program to improve itself. We start with a seed "improver" that improves an input program according to a given utility function by querying a language model several times and returning the best solution. We then run this seed improver to improve itself. Across a small set of downstream tasks, the resulting improved improver generates programs with significantly better performance than its seed improver. Afterward, we analyze the variety of self-improvement strategies proposed by the language model, including beam search, genetic algorithms, and simulated annealing. Since the language models themselves are not altered, this is not full recursive self-improvement. Nonetheless, it demonstrates that a modern language model, GPT-4 in our proof-of-concept experiments, is capable of writing code that can call itself to improve itself. We critically consider concerns around the development of self-improving technologies and evaluate the frequency with which the generated code bypasses a sandbox.
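    下面给出一个极简的 Python 草图(并非 STOP 的实现;`language_model`、`utility` 均为示意占位,真实系统中前者是对 GPT-4 等模型的调用),用于说明"种子改进器改进输入程序,再用它改进自身源代码"的基本流程:

```python
def improve(program, utility, language_model, n_candidates=4):
    """Seed improver: ask the language model for several rewrites of `program`
    and return the candidate with the highest utility."""
    prompt = "Improve the following program so it scores higher on its utility:\n" + program
    candidates = [language_model(prompt) for _ in range(n_candidates)] + [program]
    return max(candidates, key=utility)

def self_improve(improver_source, utility, language_model, rounds=2):
    """Recursive step sketched at the source level: the improver's own source code
    is treated as the program to improve. Compiling/executing the improved improver
    (as the paper does) is omitted here for brevity."""
    source = improver_source
    for _ in range(rounds):
        source = improve(source, utility, language_model)
    return source

# Toy stand-ins: the "language model" appends a comment; the utility prefers longer code.
fake_lm = lambda prompt: prompt.split("\n", 1)[1] + "\n# refactored"
print(self_improve("def solve(x):\n    return x", utility=len, language_model=fake_lm))
```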

TransRadar: Adaptive-Directional Transformer for Real-Time Multi-View Radar Semantic Segmentation

  • paper_url: http://arxiv.org/abs/2310.02260
  • repo_url: https://github.com/yahidar/transradar
  • paper_authors: Yahia Dalbah, Jean Lahoud, Hisham Cholakkal
  • for: 本文旨在提出一种基于雷达数据的Scene Semantic Segmentation方法,以解决自动驾驶中场景理解的问题。
  • methods: 本方法对多输入雷达数据进行融合,并提出了专为雷达感知设计的高效注意力模块和自适应损失函数,以应对雷达数据固有的噪声、稀疏性以及前景与背景比例失衡的问题。
  • results: 在 CARRADA 和 RADIal 数据集上,本方法的性能优于现有最先进方法,同时模型规模更小。
    Abstract Scene understanding plays an essential role in enabling autonomous driving and maintaining high standards of performance and safety. To address this task, cameras and laser scanners (LiDARs) have been the most commonly used sensors, with radars being less popular. Despite that, radars remain low-cost, information-dense, and fast-sensing techniques that are resistant to adverse weather conditions. While multiple works have been previously presented for radar-based scene semantic segmentation, the nature of the radar data still poses a challenge due to the inherent noise and sparsity, as well as the disproportionate foreground and background. In this work, we propose a novel approach to the semantic segmentation of radar scenes using a multi-input fusion of radar data through a novel architecture and loss functions that are tailored to tackle the drawbacks of radar perception. Our novel architecture includes an efficient attention block that adaptively captures important feature information. Our method, TransRadar, outperforms state-of-the-art methods on the CARRADA and RADIal datasets while having smaller model sizes. https://github.com/YahiDar/TransRadar
    摘要 场景理解是实现自动驾驶并保持高性能与安全标准的关键。针对这一任务,摄像头与激光雷达(LiDAR)是最常用的传感器,雷达(radar)则相对较少被采用;但雷达依然是低成本、信息密集、感知速度快且能抵御恶劣天气的技术。尽管此前已有多项基于雷达的场景语义分割工作,雷达数据固有的噪声与稀疏性以及前景与背景比例失衡仍构成挑战。在本工作中,我们提出了一种新的雷达场景语义分割方法:通过为雷达感知量身设计的新颖网络结构与损失函数对多输入雷达数据进行融合,其中包含一个能自适应捕捉重要特征信息的高效注意力模块。我们的方法 TransRadar 在 CARRADA 和 RADIal 数据集上超越了现有最先进方法,同时模型规模更小。

A Neural Scaling Law from Lottery Ticket Ensembling

  • paper_url: http://arxiv.org/abs/2310.02258
  • repo_url: None
  • paper_authors: Ziming Liu, Max Tegmark
  • for: 本研究探讨了神经网络中的普适扩展法则(NSL),即模型性能随模型大小增加而提高的现象。
  • methods: Sharma & Kaplan 曾用逼近理论分析 NSL,预测 MSE 损失随模型参数数量 $N$ 按 $N^{-\alpha}$($\alpha=4/d$)衰减,其中 $d$ 是内在输入维度;本文作者通过拆解训练好的神经网络并结合统计分析来检验这一预测。
  • results: 尽管上述理论在某些情形(如 ReLU 网络)下表现良好,作者却发现一个简单的一维问题($y=x^2$)呈现出不同的扩展法则($\alpha=1$),与预测值($\alpha=4$)不符。作者发现这一新扩展法则源于"彩票"集成(lottery ticket ensembling):更宽的网络平均拥有更多"彩票",对它们进行集成可以降低输出的方差。作者通过对单个神经网络的机制化解读以及统计研究支持了这一集成机制,并将 $N^{-1}$ 的扩展法则归因于彩票的"中心极限定理"。最后,作者讨论了其对大语言模型以及统计物理类学习理论的潜在意义。
    Abstract Neural scaling laws (NSL) refer to the phenomenon where model performance improves with scale. Sharma & Kaplan analyzed NSL using approximation theory and predict that MSE losses decay as $N^{-\alpha}$, $\alpha=4/d$, where $N$ is the number of model parameters, and $d$ is the intrinsic input dimension. Although their theory works well for some cases (e.g., ReLU networks), we surprisingly find that a simple 1D problem $y=x^2$ manifests a different scaling law ($\alpha=1$) from their predictions ($\alpha=4$). We opened the neural networks and found that the new scaling law originates from lottery ticket ensembling: a wider network on average has more "lottery tickets", which are ensembled to reduce the variance of outputs. We support the ensembling mechanism by mechanistically interpreting single neural networks, as well as studying them statistically. We attribute the $N^{-1}$ scaling law to the "central limit theorem" of lottery tickets. Finally, we discuss its potential implications for large language models and statistical physics-type theories of learning.
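    下面给出一个极简的 Python 数值演示(与论文的训练实验无关,这里用纯噪声预测器作简化假设),用于说明"彩票集成"背后的中心极限定理直觉:对 $n$ 个独立无偏预测器取平均,集成的方差(从而 MSE)大约按 $1/n$ 下降:

```python
import numpy as np

def ensemble_mse(n_tickets, n_trials=2000, noise=1.0, rng=np.random.default_rng(0)):
    """Averaging n independent, unbiased "lottery ticket" predictors reduces the
    variance (hence MSE) of the ensemble roughly as 1/n. Each predictor here is
    pure noise around the true value 0, a deliberate simplification of trained subnetworks."""
    preds = rng.normal(0.0, noise, size=(n_trials, n_tickets)).mean(axis=1)
    return float(np.mean(preds ** 2))

for n in [1, 4, 16, 64]:
    print(n, ensemble_mse(n))   # MSE should fall close to noise**2 / n
```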

MathVista: Evaluating Math Reasoning in Visual Contexts with GPT-4V, Bard, and Other Large Multimodal Models

  • paper_url: http://arxiv.org/abs/2310.02255
  • repo_url: None
  • paper_authors: Pan Lu, Hritik Bansal, Tony Xia, Jiacheng Liu, Chunyuan Li, Hannaneh Hajishirzi, Hao Cheng, Kai-Wei Chang, Michel Galley, Jianfeng Gao
  • for: The paper aims to evaluate the ability of large language models (LLMs) and large multimodal models (LMMs) in mathematical reasoning in visual contexts.
  • methods: The paper presents MathVista, a benchmark that combines challenges from diverse mathematical and visual tasks, and conducts a comprehensive evaluation of 12 prominent foundation models.
  • results: The best-performing GPT-4V model achieves an overall accuracy of 49.9%, outperforming the second-best performer, Bard, by 15.1%. However, GPT-4V still falls short of human performance by 10.4%, indicating the need for further research to improve its mathematical reasoning and understanding of complex figures.
    Abstract Large Language Models (LLMs) and Large Multimodal Models (LMMs) exhibit impressive problem-solving skills in many tasks and domains, but their ability in mathematical reasoning in visual contexts has not been systematically studied. To bridge this gap, we present MathVista, a benchmark designed to combine challenges from diverse mathematical and visual tasks. It consists of 6,141 examples, derived from 28 existing multimodal datasets involving mathematics and 3 newly created datasets (i.e., IQTest, FunctionQA, and PaperQA). Completing these tasks requires fine-grained, deep visual understanding and compositional reasoning, which all state-of-the-art foundation models find challenging. With MathVista, we have conducted a comprehensive, quantitative evaluation of 12 prominent foundation models. The best-performing GPT-4V model achieves an overall accuracy of 49.9%, substantially outperforming Bard, the second-best performer, by 15.1%. Our in-depth analysis reveals that the superiority of GPT-4V is mainly attributed to its enhanced visual perception and mathematical reasoning. However, GPT-4V still falls short of human performance by 10.4%, as it often struggles to understand complex figures and perform rigorous reasoning. This significant gap underscores the critical role that MathVista will play in the development of general-purpose AI agents capable of tackling mathematically intensive and visually rich real-world tasks. We further explore the new ability of self-verification, the application of self-consistency, and the interactive chatbot capabilities of GPT-4V, highlighting its promising potential for future research. The project is available at https://mathvista.github.io/.
    摘要 大型语言模型(LLM)和大型多Modal模型(LMM)在许多任务和领域表现出色,但它们在视觉上的数学逻辑能力尚未得到系统的研究。为了填补这一漏洞,我们提出了MathVista,一个权威的测试集,它包含来自28个多Modal数学 dataset的6,141个例子,以及3个新创建的 dataset(即IQTest、FunctionQA和PaperQA)。完成这些任务需要深刻的视觉理解和 композиitional 逻辑,所有当前的基础模型都遇到了挑战。通过MathVista,我们进行了全面的、量化的评估12种知名基础模型。最佳的GPT-4V模型在总体精度上达到49.9%,与第二名的Bard相比,提高了15.1%。我们的深入分析表明,GPT-4V的优势主要归结于其增强的视觉理解和数学逻辑能力。然而,GPT-4V仍然落后人类性能by 10.4%,表明它在处理复杂的图像和进行严格的逻辑时仍有很大的改进空间。这种显著的差距 highlights MathVista在开发普通智能代理人 capable of tackling mathematically intensive and visually rich real-world tasks 的发展中着重的作用。我们进一步探讨了GPT-4V的新能力自我验证、自我一致性应用以及交互式chatbot能力,强调它的潜在的研究潜力。项目可以在https://mathvista.github.io/ 找到。

Learning to Relax: Setting Solver Parameters Across a Sequence of Linear System Instances

  • paper_url: http://arxiv.org/abs/2310.02246
  • repo_url: None
  • paper_authors: Mikhail Khodak, Edmond Chow, Maria-Florina Balcan, Ameet Talwalkar
  • for: 求解线性系统 $Ax=b$ 是科学计算的基本原语;求解器参数的最优取值依赖于具体问题,但在实践中往往难以确定或确定代价过高。
  • methods: 我们考虑一个常见情形:在一次数值模拟中需要求解许多相关的线性系统。在这种情形下,可以在不增加额外矩阵计算的前提下,按顺序为每个实例选择参数,使总迭代次数接近最优。
  • results: 我们证明,对于参数 $\omega$ 对运行时间影响显著的标准求解器逐次超松弛法(SOR),一个仅以迭代次数为反馈的赌博机(bandit)在线学习算法可以为一系列实例选择参数,使总代价随序列长度增加而逼近最佳固定 $\omega$ 的性能;在给定额外结构信息时,上下文赌博机方法还能渐近达到逐实例选择最优 $\omega$ 的实例最优策略的性能。这是对高精度线性系统求解器的首个学习理论刻画,也是数据驱动科学计算的首个端到端保证,表明可以用已被充分理解的学习算法来加速数值方法。
    Abstract Solving a linear system $Ax=b$ is a fundamental scientific computing primitive for which numerous solvers and preconditioners have been developed. These come with parameters whose optimal values depend on the system being solved and are often impossible or too expensive to identify; thus in practice sub-optimal heuristics are used. We consider the common setting in which many related linear systems need to be solved, e.g. during a single numerical simulation. In this scenario, can we sequentially choose parameters that attain a near-optimal overall number of iterations, without extra matrix computations? We answer in the affirmative for Successive Over-Relaxation (SOR), a standard solver whose parameter $\omega$ has a strong impact on its runtime. For this method, we prove that a bandit online learning algorithm -- using only the number of iterations as feedback -- can select parameters for a sequence of instances such that the overall cost approaches that of the best fixed $\omega$ as the sequence length increases. Furthermore, when given additional structural information, we show that a contextual bandit method asymptotically achieves the performance of the instance-optimal policy, which selects the best $\omega$ for each instance. Our work provides the first learning-theoretic treatment of high-precision linear system solvers and the first end-to-end guarantees for data-driven scientific computing, demonstrating theoretically the potential to speed up numerical methods using well-understood learning algorithms.
    摘要 求解线性系统$Ax=b$是科学计算的基本原语,已有大量求解器和预条件子为其开发。这些求解器和预条件子带有参数,其最优取值取决于所解系统,而在实际中往往无法确定或确定代价过高,因此实践中通常使用次优的启发式方法。我们考虑需要求解许多相关线性系统的常见场景(例如一次数值仿真过程中),此时能否顺序选择参数,在不进行额外矩阵计算的情况下使总迭代次数接近最优?我们对 Successive Over-Relaxation (SOR) 方法给出了肯定回答:一种仅以迭代次数为反馈的bandit在线学习算法可以为一系列实例选择参数,使总成本随序列长度增加而逼近最佳固定 $\omega$ 的成本。此外,当给定额外结构信息时,我们证明一种上下文bandit方法可以渐近达到实例最优策略(为每个实例选择最佳 $\omega$)的性能。我们的工作提供了对高精度线性系统求解器的首个学习理论分析,以及数据驱动科学计算的首个端到端保证,从理论上展示了利用已被充分理解的学习算法加速数值方法的潜力。
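To make the setup concrete, here is a minimal illustrative sketch (not the paper's implementation): the relaxation parameter $\omega$ is discretized into a small grid and an Exp3-style bandit, fed only the iteration count as feedback, picks $\omega$ for each system in the sequence. The candidate grid, the exploration rate, and the toy `stream_of_linear_systems` generator are all assumptions for illustration.

```python
import numpy as np

def run_sor(A, b, omega, tol=1e-8, max_iter=10_000):
    """Successive Over-Relaxation; returns (solution, iterations used)."""
    x = np.zeros_like(b)
    D = np.diag(A)
    for k in range(1, max_iter + 1):
        for i in range(len(b)):
            sigma = A[i] @ x - A[i, i] * x[i]      # off-diagonal contribution
            x[i] = (1 - omega) * x[i] + omega * (b[i] - sigma) / D[i]
        if np.linalg.norm(A @ x - b) < tol * np.linalg.norm(b):
            return x, k
    return x, max_iter

def stream_of_linear_systems(n_systems=50, n=60, seed=0):
    """Toy stand-in for a sequence of related systems from one simulation."""
    rng = np.random.default_rng(seed)
    base = rng.standard_normal((n, n))
    for _ in range(n_systems):
        A = base + 0.05 * rng.standard_normal((n, n))
        A = A @ A.T + n * np.eye(n)                 # symmetric positive definite
        yield A, rng.standard_normal(n)

# Exp3-style bandit over a discretized grid of relaxation parameters.
omegas = np.linspace(1.0, 1.9, 10)
weights = np.ones_like(omegas)
gamma, max_iter = 0.1, 10_000
rng = np.random.default_rng(1)

for A, b in stream_of_linear_systems():
    probs = (1 - gamma) * weights / weights.sum() + gamma / len(omegas)
    arm = rng.choice(len(omegas), p=probs)
    _, iters = run_sor(A, b, omegas[arm], max_iter=max_iter)
    reward = 1.0 - iters / max_iter                 # fewer iterations -> higher reward
    weights[arm] *= np.exp(gamma * reward / (probs[arm] * len(omegas)))
```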

MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens

  • paper_url: http://arxiv.org/abs/2310.02239
  • repo_url: https://github.com/eric-ai-lab/minigpt-5
  • paper_authors: Kaizhi Zheng, Xuehai He, Xin Eric Wang
  • for: 这篇论文的目的是提出一种新的视觉语言生成技术,以帮助生成具有文字和图像的合理描述。
  • methods: 该技术使用了一种新的“生成元”(vokens),用于将图像和文字生成成一个协调的 multimodal 输出。该技术还采用了一种两个阶段的训练策略,以确保模型能够在不同的 benchmark 上表现出色。
  • results: 在 MMDialog 数据集上,该技术比基线模型(Divter)表现出了显著改进,并在 VIST 数据集的人工评估中给出了更优或相当的多模态输出。
    Abstract Large Language Models (LLMs) have garnered significant attention for their advancements in natural language processing, demonstrating unparalleled prowess in text comprehension and generation. Yet, the simultaneous generation of images with coherent textual narratives remains an evolving frontier. In response, we introduce an innovative interleaved vision-and-language generation technique anchored by the concept of "generative vokens," acting as the bridge for harmonized image-text outputs. Our approach is characterized by a distinctive two-staged training strategy focusing on description-free multimodal generation, where the training requires no comprehensive descriptions of images. To bolster model integrity, classifier-free guidance is incorporated, enhancing the effectiveness of vokens on image generation. Our model, MiniGPT-5, exhibits substantial improvement over the baseline Divter model on the MMDialog dataset and consistently delivers superior or comparable multimodal outputs in human evaluations on the VIST dataset, highlighting its efficacy across diverse benchmarks.
    摘要 大型语言模型(LLMs)已引起广泛关注,因其在自然语言处理方面表现出了无前例的能力,包括文本理解和生成。然而,同时生成具有 coherent 文本描述和图像的 Multimodal 输出仍然是一个处于演化阶段的领域。为此,我们介绍了一种创新的融合视觉语言生成技术,基于“生成短语”(generative vokens)这个概念,用于融合图像和文本输出。我们的方法包括两个阶段的训练策略,无需对图像进行全面的描述。为保持模型的完整性,我们还包括了无类别导航的技术,以增强生成短语对图像的影响。我们的模型“MINI-GPT-5”在 MMDialog 数据集上表现出了显著的改善,并在 VIST 数据集上人工评估中 consistently 提供了Superior 或 Comparable 的多Modal 输出,这种表现力表明其在多种 benchMark 上的效果。

Who’s Harry Potter? Approximate Unlearning in LLMs

  • paper_url: http://arxiv.org/abs/2310.02238
  • repo_url: None
  • paper_authors: Ronen Eldan, Mark Russinovich
  • for: 本研究旨在提出一种有效的语言模型忘记技术,以解决大型语言模型在训练过程中使用版权内容的法律和伦理问题。
  • methods: 本研究提出的技术包括三个主要组成部分:首先,使用一个在目标数据上进一步训练的增强模型,通过比较其logits与基线模型的logits来识别与遗忘目标最相关的token;其次,将目标数据中的独特表达替换为通用表达,并利用模型自身的预测为每个token生成替代标签;最后,在这些替代标签上微调模型,从而使模型"忘记"原始数据。
  • results: 我们在使用这种技术处理Harry Potter系列书籍时,在约1 GPU小时的finetuning后,成功地使Llama2-7b模型失去了对Harry Potter相关内容的生成和回忆能力,而不影响其在常见测试集(如Winogrande、Hellaswag、arc、boolq和piqa)的性能。我们将我们的微调模型公开发布在HuggingFace上,以便社区评估。据我们所知,这是首次对生成语言模型的忘记技术进行有效实现。
    Abstract Large language models (LLMs) are trained on massive internet corpora that often contain copyrighted content. This poses legal and ethical challenges for the developers and users of these models, as well as the original authors and publishers. In this paper, we propose a novel technique for unlearning a subset of the training data from a LLM, without having to retrain it from scratch. We evaluate our technique on the task of unlearning the Harry Potter books from the Llama2-7b model (a generative language model recently open-sourced by Meta). While the model took over 184K GPU-hours to pretrain, we show that in about 1 GPU hour of finetuning, we effectively erase the model's ability to generate or recall Harry Potter-related content, while its performance on common benchmarks (such as Winogrande, Hellaswag, arc, boolq and piqa) remains almost unaffected. We make our fine-tuned model publicly available on HuggingFace for community evaluation. To the best of our knowledge, this is the first paper to present an effective technique for unlearning in generative language models. Our technique consists of three main components: First, we use a reinforced model that is further trained on the target data to identify the tokens that are most related to the unlearning target, by comparing its logits with those of a baseline model. Second, we replace idiosyncratic expressions in the target data with generic counterparts, and leverage the model's own predictions to generate alternative labels for every token. These labels aim to approximate the next-token predictions of a model that has not been trained on the target data. Third, we finetune the model on these alternative labels, which effectively erases the original text from the model's memory whenever it is prompted with its context.
    摘要 大型语言模型(LLM)通常在庞大的互联网语料上训练,这些语料经常包含版权内容,这给模型的开发者和用户、以及原始作者和出版方都带来了法律与伦理挑战。在这篇论文中,我们提出了一种新技术,可以让大型语言模型"忘记"一部分训练数据,而无需从头重新训练模型。我们在Llama2-7b模型(Meta最近开源的生成语言模型)上评估该技术:该模型的预训练耗费超过184K GPU小时,但我们发现只需约1个GPU小时的微调,就可以有效消除模型生成或回忆Harry Potter相关内容的能力,同时其在常见基准(如Winogrande、Hellaswag、arc、boolq和piqa)上的性能几乎不受影响。我们已将微调后的模型公开发布在HuggingFace上,供社区评估。据我们所知,这是首篇提出生成语言模型有效遗忘技术的论文。我们的技术包括三个主要部分:首先,使用一个在目标数据上进一步训练的增强模型,通过与基线模型比较logits来识别与遗忘目标最相关的token;其次,将目标数据中的独特表达替换为通用表达,并利用模型自身的预测为每个token生成替代标签,这些标签旨在近似一个从未在目标数据上训练过的模型的下一token预测;最后,在这些替代标签上微调模型,从而在模型遇到相应上下文时有效地将原始文本从其"记忆"中清除。
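As a rough, hedged sketch of how the third component could be wired up (this is one plausible reading of the described recipe, not the authors' released code): per-token logits of the reinforced and baseline models are compared, tokens whose likelihood rose under reinforcement are pushed down to form "generic" labels, and the model is then finetuned toward those labels. The HuggingFace-style `model(input_ids).logits` interface and the `ALPHA` value are assumptions.

```python
import torch
import torch.nn.functional as F

ALPHA = 5.0  # strength of the push away from target-specific tokens (assumed value)

@torch.no_grad()
def alternative_labels(baseline_model, reinforced_model, input_ids):
    """Per-token replacement labels approximating a model never trained on the target text."""
    base_logits = baseline_model(input_ids).logits       # (B, T, V)
    reinf_logits = reinforced_model(input_ids).logits    # (B, T, V)
    # Penalize tokens whose likelihood rises after reinforcement on the target data.
    generic = base_logits - ALPHA * torch.relu(reinf_logits - base_logits)
    return generic.argmax(dim=-1)                        # (B, T)

def unlearning_step(model, optimizer, baseline_model, reinforced_model, input_ids):
    labels = alternative_labels(baseline_model, reinforced_model, input_ids)  # (B, T)
    logits = model(input_ids).logits                                          # (B, T, V)
    # At every position, train toward the generic next-token label rather than
    # the actual next token from the target text.
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), labels.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```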

Exploring Model Learning Heterogeneity for Boosting Ensemble Robustness

  • paper_url: http://arxiv.org/abs/2310.02237
  • repo_url: https://github.com/git-disl/heterobust
  • paper_authors: Yanzhao Wu, Ka-Ho Chow, Wenqi Wei, Ling Liu
  • for: 提高复杂学习任务的总体化性能
  • methods: 使用多样化深度神经网络 ensemble,利用模型学习多样性强化集成 robustness
  • results: 实验证明,异构深度神经网络集成可以增强集成的鲁棒性,并且在良性和对抗两种设置下均表现出更强的鲁棒性。
Here's a breakdown of each point:
  • for: The paper is written to improve the generalization performance of complex learning tasks, specifically by using deep neural network ensembles.
  • methods: The paper uses heterogeneous deep neural networks (DNNs) and a weighted bounding box ensemble consensus method to leverage model learning heterogeneity and boost ensemble robustness. Additionally, the paper introduces a two-tier ensemble construction method that composes ensembles of heterogeneous models for solving different learning problems, and uses connected component labeling (CCL) based alignment to promote high ensemble diversity and low negative correlation among member models.
  • results: The paper provides extensive experiments to validate the enhanced robustness of heterogeneous ensembles in both benign and adversarial settings. The results show that the heterogeneous ensembles can improve the robustness of the model against negative examples and adversarial attacks.
    Abstract Deep neural network ensembles hold the potential of improving generalization performance for complex learning tasks. This paper presents formal analysis and empirical evaluation to show that heterogeneous deep ensembles with high ensemble diversity can effectively leverage model learning heterogeneity to boost ensemble robustness. We first show that heterogeneous DNN models trained for solving the same learning problem, e.g., object detection, can significantly strengthen the mean average precision (mAP) through our weighted bounding box ensemble consensus method. Second, we further compose ensembles of heterogeneous models for solving different learning problems, e.g., object detection and semantic segmentation, by introducing the connected component labeling (CCL) based alignment. We show that this two-tier heterogeneity driven ensemble construction method can compose an ensemble team that promotes high ensemble diversity and low negative correlation among member models of the ensemble, strengthening ensemble robustness against both negative examples and adversarial attacks. Third, we provide a formal analysis of the ensemble robustness in terms of negative correlation. Extensive experiments validate the enhanced robustness of heterogeneous ensembles in both benign and adversarial settings. The source codes are available on GitHub at https://github.com/git-disl/HeteRobust.
    摘要 深度神经网络集成具有提高复杂学习任务泛化性能的潜力。本文通过形式化分析和实验评估表明,具有高集成多样性的异构深度集成能够有效利用模型学习的异构性来增强集成鲁棒性。我们首先表明,为解决同一学习问题(如目标检测)而训练的异构DNN模型,可以通过我们的加权边界框集成共识方法显著提升mean average precision(mAP)。其次,我们通过引入基于connected component labeling(CCL)的对齐,将解决不同学习问题(如目标检测与语义分割)的异构模型组合成集成。我们表明,这种两层的异构驱动集成构建方法可以组成一个高集成多样性、成员模型间低负相关的集成团队,从而增强集成对负样本和对抗攻击的鲁棒性。第三,我们从负相关的角度给出了集成鲁棒性的形式化分析。大量实验验证了异构集成在良性和对抗两种设置下的增强鲁棒性。源代码可在GitHub上获取:https://github.com/git-disl/HeteRobust 。
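The weighted bounding-box consensus can be pictured with a small illustrative sketch (a generic score-weighted box fusion, not necessarily the paper's exact rule): same-class boxes coming from heterogeneous detectors are greedily clustered by IoU, and each cluster is collapsed into a confidence-weighted average box. The IoU threshold is an assumed value.

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes given as [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def weighted_box_consensus(detections, iou_thr=0.55):
    """detections: list of (box, score) for one class, pooled from heterogeneous detectors.
    Greedy clustering by IoU, then score-weighted averaging within each cluster."""
    detections = sorted(detections, key=lambda d: -d[1])
    clusters = []
    for box, score in detections:
        box = np.asarray(box, dtype=float)
        for cluster in clusters:
            if iou(box, cluster[0][0]) >= iou_thr:   # compare against cluster anchor
                cluster.append((box, score))
                break
        else:
            clusters.append([(box, score)])
    fused = []
    for cluster in clusters:
        boxes = np.stack([b for b, _ in cluster])
        scores = np.array([s for _, s in cluster])
        fused_box = (boxes * scores[:, None]).sum(0) / scores.sum()
        fused.append((fused_box, float(scores.mean())))
    return fused
```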

Automatic Quality Assessment of Wikipedia Articles – A Systematic Literature Review

  • paper_url: http://arxiv.org/abs/2310.02235
  • repo_url: None
  • paper_authors: Pedro Miguel Moás, Carla Teixeira Lopes
  • for: 提高Wikipedia文章质量的自动评估方法
  • methods: 文章特征、质量指标、机器学习算法等方法的比较和分析
  • results: 综述了149项关于维基百科文章质量评估的研究,探讨了它们的共性与不足。
In English, that would be:
  • for: Improving the automatic assessment of Wikipedia article quality
  • methods: Comparing and analyzing machine learning algorithms, article features, quality metrics, and used datasets
  • results: A review of 149 studies on article quality assessment, exploring commonalities and gaps
    Abstract Wikipedia is the world's largest online encyclopedia, but maintaining article quality through collaboration is challenging. Wikipedia designed a quality scale, but with such a manual assessment process, many articles remain unassessed. We review existing methods for automatically measuring the quality of Wikipedia articles, identifying and comparing machine learning algorithms, article features, quality metrics, and used datasets, examining 149 distinct studies, and exploring commonalities and gaps in them. The literature is extensive, and the approaches follow past technological trends. However, machine learning is still not widely used by Wikipedia, and we hope that our analysis helps future researchers change that reality.
    摘要 Wikipedia是全球最大的在线百科全书,但保持文章质量通过协作是挑战。Wikipedia设计了质量级别,但由于手动评估过程,许多文章还没有被评估。我们对现有自动评估wikipedia文章质量的方法进行了回顾,找到了机器学习算法、文章特征、质量指标和使用的数据集,并对149个不同的研究进行了检查。文献广泛,方法遵循过去的技术趋势,但是机器学习还没有广泛应用于Wikipedia,我们希望我们的分析能够帮助未来的研究人员改变这种现实。

MIS-AVoiDD: Modality Invariant and Specific Representation for Audio-Visual Deepfake Detection

  • paper_url: http://arxiv.org/abs/2310.02234
  • repo_url: None
  • paper_authors: Vinaya Sree Katamneni, Ajita Rattani
  • for: 这篇研究是针对deepfake问题的解决方案,具体来说是透过对多modal的声音和视觉数据进行整合,以实现更高精度的伪造检测。
  • methods: 本研究提出了一种基于表现层的方法,通过结合不同模式的声音和视觉表现,实现更好的整合和更高的检测精度。
  • results: 实验结果显示,该方法可以与目前的State-of-the-art(SOTA)数据检测器相比,提高检测精度约17.8%和18.4%。
    Abstract Deepfakes are synthetic media generated using deep generative algorithms and have posed a severe societal and political threat. Apart from facial manipulation and synthetic voice, recently, a novel kind of deepfakes has emerged with either audio or visual modalities manipulated. In this regard, a new generation of multimodal audio-visual deepfake detectors is being investigated to collectively focus on audio and visual data for multimodal manipulation detection. Existing multimodal (audio-visual) deepfake detectors are often based on the fusion of the audio and visual streams from the video. Existing studies suggest that these multimodal detectors often obtain equivalent performances with unimodal audio and visual deepfake detectors. We conjecture that the heterogeneous nature of the audio and visual signals creates distributional modality gaps and poses a significant challenge to effective fusion and efficient performance. In this paper, we tackle the problem at the representation level to aid the fusion of audio and visual streams for multimodal deepfake detection. Specifically, we propose the joint use of modality (audio and visual) invariant and specific representations. This ensures that the common patterns and patterns specific to each modality representing pristine or fake content are preserved and fused for multimodal deepfake manipulation detection. Our experimental results on FakeAVCeleb and KoDF audio-visual deepfake datasets suggest the enhanced accuracy of our proposed method over SOTA unimodal and multimodal audio-visual deepfake detectors by $17.8$% and $18.4$%, respectively. Thus, obtaining state-of-the-art performance.
    摘要 深度伪造(Deepfakes)是使用深度生成算法合成的媒体,已对社会和政治构成严重威胁。除了面部篡改和合成语音之外,最近还出现了一类新的深度伪造,其音频或视觉模态之一被篡改。为此,新一代多模态音视频深度伪造检测器正在被研究,以同时利用音频和视觉数据进行多模态篡改检测。现有的多模态(音频-视觉)深度伪造检测器通常基于视频中音频流与视觉流的融合。已有研究表明,这些多模态检测器的性能往往只与单模态的音频或视觉检测器相当。我们推测,音频与视觉信号的异构性造成了模态间的分布差异,给有效融合和高效性能带来了显著挑战。在本文中,我们在表示层面入手,以辅助音频与视觉流的融合进行多模态深度伪造检测。具体而言,我们提出联合使用模态(音频与视觉)不变表示和模态特定表示,从而保留并融合表征真实或伪造内容的跨模态共同模式与各模态特有模式。在FakeAVCeleb和KoDF音视频深度伪造数据集上的实验表明,我们的方法相比最先进的单模态和多模态音视频深度伪造检测器,检测精度分别提高了17.8%和18.4%,达到了最先进水平。
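A minimal sketch of the representation-level idea (illustrative losses only; the paper's exact objectives may differ): each modality is projected into a shared invariant part, which is aligned across modalities, and a modality-specific part, which is encouraged to stay complementary to it; the concatenation is then fed to the downstream deepfake classifier. The projection heads and loss weights are assumptions.

```python
import torch
import torch.nn.functional as F

def mis_representation_loss(audio_feat, visual_feat, inv_proj, spec_proj_a, spec_proj_v):
    """audio_feat, visual_feat: (B, D) per-clip features from the two encoders.
    inv_proj / spec_proj_*: projection heads (e.g. nn.Linear), assumed to be defined elsewhere."""
    inv_a, inv_v = inv_proj(audio_feat), inv_proj(visual_feat)
    spec_a, spec_v = spec_proj_a(audio_feat), spec_proj_v(visual_feat)
    # Invariant parts of the two modalities should agree for the same video.
    align = 1.0 - F.cosine_similarity(inv_a, inv_v, dim=-1).mean()
    # Specific parts should carry complementary, non-redundant information.
    ortho = (F.cosine_similarity(spec_a, inv_a, dim=-1).abs().mean()
             + F.cosine_similarity(spec_v, inv_v, dim=-1).abs().mean())
    fused = torch.cat([inv_a, inv_v, spec_a, spec_v], dim=-1)  # input to the classifier
    return align + ortho, fused
```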

Leveraging Diffusion Disentangled Representations to Mitigate Shortcuts in Underspecified Visual Tasks

  • paper_url: http://arxiv.org/abs/2310.02230
  • repo_url: None
  • paper_authors: Luca Scimeca, Alexander Rubinstein, Armand Mihai Nicolicioiu, Damien Teney, Yoshua Bengio
  • for: 本文提出了一种 ensemble diversification 框架,用于避免短circuit learning 现象,并且使用了Diffusion Probabilistic Models (DPMs)来生成合成counterfactuals。
  • methods: 本文使用了DPMs来生成合成counterfactuals,并且利用了这些counterfactuals来鼓励模型多样性。
  • results: experiments show that diffusion-guided diversification can lead models to avert attention from shortcut cues, achieving ensemble diversity performance comparable to previous methods requiring additional data collection.
    Abstract Spurious correlations in the data, where multiple cues are predictive of the target labels, often lead to shortcut learning phenomena, where a model may rely on erroneous, easy-to-learn, cues while ignoring reliable ones. In this work, we propose an ensemble diversification framework exploiting the generation of synthetic counterfactuals using Diffusion Probabilistic Models (DPMs). We discover that DPMs have the inherent capability to represent multiple visual cues independently, even when they are largely correlated in the training data. We leverage this characteristic to encourage model diversity and empirically show the efficacy of the approach with respect to several diversification objectives. We show that diffusion-guided diversification can lead models to avert attention from shortcut cues, achieving ensemble diversity performance comparable to previous methods requiring additional data collection.
    摘要 当数据中存在虚假相关、多个线索都能预测目标标签时,往往会出现捷径学习(shortcut learning)现象:模型可能依赖容易学习但不可靠的线索,而忽略真正可靠的线索。在这项工作中,我们提出了一个集成多样化框架,利用Diffusion Probabilistic Models(DPMs)生成合成反事实样本。我们发现,即使多个视觉线索在训练数据中高度相关,DPMs也具有独立表示它们的内在能力。我们利用这一特性来鼓励模型多样性,并针对多个多样化目标实证展示了该方法的有效性。我们显示,扩散引导的多样化可以促使模型将注意力从捷径线索上移开,其集成多样性表现可与此前需要额外收集数据的方法相当。

Extraction of Medication and Temporal Relation from Clinical Text using Neural Language Models

  • paper_url: http://arxiv.org/abs/2310.02229
  • repo_url: None
  • paper_authors: Hangyu Tu, Lifeng Han, Goran Nenadic
  • for: 这个论文的目的是用深度学习和大型自然语言模型来提高药品提取和时间关系分类的性能。
  • methods: 这个论文使用了多种先进的学习结构,包括BiLSTM-CRF和CNN-BiLSTM来实现医疗领域名实体识别(NER),以及BERT-CNN来提取时间关系。此外,也研究了不同的词嵌入技术。
  • results: 实验表明,CNN-BiLSTM模型在i2b2-2009临床NER任务上轻微超过BiLSTM-CRF模型,得到了75.67、77.83和78.17的精度、回归和F1分数(macro average)。BERT-CNN模型在i2b2-2012挑战中的时间关系提取测试集上也得到了64.48、67.17和65.03的P/R/F1分数(macro average)。
    Abstract Clinical texts, represented in electronic medical records (EMRs), contain rich medical information and are essential for disease prediction, personalised information recommendation, clinical decision support, and medication pattern mining and measurement. Relation extractions between medication mentions and temporal information can further help clinicians better understand the patients' treatment history. To evaluate the performances of deep learning (DL) and large language models (LLMs) in medication extraction and temporal relations classification, we carry out an empirical investigation of \textbf{MedTem} project using several advanced learning structures including BiLSTM-CRF and CNN-BiLSTM for a clinical domain named entity recognition (NER), and BERT-CNN for temporal relation extraction (RE), in addition to the exploration of different word embedding techniques. Furthermore, we also designed a set of post-processing roles to generate structured output on medications and the temporal relation. Our experiments show that CNN-BiLSTM slightly wins the BiLSTM-CRF model on the i2b2-2009 clinical NER task yielding 75.67, 77.83, and 78.17 for precision, recall, and F1 scores using Macro Average. BERT-CNN model also produced reasonable evaluation scores 64.48, 67.17, and 65.03 for P/R/F1 using Macro Avg on the temporal relation extraction test set from i2b2-2012 challenges. Code and Tools from MedTem will be hosted at \url{https://github.com/HECTA-UoM/MedTem}
    摘要 以电子病历(EMR)形式存在的临床文本蕴含丰富的医学信息,对疾病预测、个性化信息推荐、临床决策支持以及用药模式挖掘与度量至关重要。抽取药物提及与时间信息之间的关系,可以进一步帮助临床医生理解患者的治疗历史。为评估深度学习(DL)和大型语言模型(LLM)在药物抽取与时间关系分类中的性能,我们对MedTem项目进行了实证研究,使用了多种先进的学习结构,包括用于临床命名实体识别(NER)的BiLSTM-CRF和CNN-BiLSTM,以及用于时间关系抽取(RE)的BERT-CNN,并探索了不同的词向量技术。此外,我们还设计了一组后处理规则,用于生成药物及其时间关系的结构化输出。实验结果显示,CNN-BiLSTM模型在i2b2-2009临床NER任务上略优于BiLSTM-CRF模型,其宏平均精确率、召回率和F1分数分别为75.67、77.83和78.17。BERT-CNN模型在i2b2-2012挑战的时间关系抽取测试集上也取得了可接受的成绩,宏平均P/R/F1分别为64.48、67.17和65.03。MedTem的代码和工具将发布在 https://github.com/HECTA-UoM/MedTem 。

SNIP: Bridging Mathematical Symbolic and Numeric Realms with Unified Pre-training

  • paper_url: http://arxiv.org/abs/2310.02227
  • repo_url: None
  • paper_authors: Kazem Meidani, Parshin Shojaee, Chandan K. Reddy, Amir Barati Farimani
  • for: bridging the gap between symbolic equations and numeric data, and enhancing the mutual similarities between the two domains.
  • methods: joint contrastive learning between symbolic and numeric domains, enhancing the embeddings of both domains.
  • results: SNIP effectively transfers to various tasks, consistently outperforming fully supervised baselines and competing strongly with established task-specific methods, especially in few-shot learning scenarios.
    Abstract In an era where symbolic mathematical equations are indispensable for modeling complex natural phenomena, scientific inquiry often involves collecting observations and translating them into mathematical expressions. Recently, deep learning has emerged as a powerful tool for extracting insights from data. However, existing models typically specialize in either numeric or symbolic domains, and are usually trained in a supervised manner tailored to specific tasks. This approach neglects the substantial benefits that could arise from a task-agnostic unified understanding between symbolic equations and their numeric counterparts. To bridge the gap, we introduce SNIP, a Symbolic-Numeric Integrated Pre-training, which employs joint contrastive learning between symbolic and numeric domains, enhancing their mutual similarities in the pre-trained embeddings. By performing latent space analysis, we observe that SNIP provides cross-domain insights into the representations, revealing that symbolic supervision enhances the embeddings of numeric data and vice versa. We evaluate SNIP across diverse tasks, including symbolic-to-numeric mathematical property prediction and numeric-to-symbolic equation discovery, commonly known as symbolic regression. Results show that SNIP effectively transfers to various tasks, consistently outperforming fully supervised baselines and competing strongly with established task-specific methods, especially in few-shot learning scenarios where available data is limited.
    摘要 在符号数学方程对建模复杂自然现象不可或缺的时代,科学探究通常包括收集观测数据并将其转化为数学表达。近期,深度学习已成为从数据中提取洞见的强大工具。然而,现有模型通常只专注于数值或符号领域之一,且通常针对特定任务以监督方式训练。这种做法忽视了符号方程与其数值对应之间与任务无关的统一理解所能带来的显著收益。为弥补这一差距,我们提出了 SNIP(Symbolic-Numeric Integrated Pre-training),它在符号域与数值域之间进行联合对比学习,增强两者在预训练嵌入中的相互相似性。通过隐空间分析,我们发现 SNIP 为表示提供了跨域洞见:符号监督改善了数值数据的嵌入,反之亦然。我们在多种任务上评估了 SNIP,包括符号到数值的数学性质预测,以及数值到符号的方程发现(即符号回归)。结果表明,SNIP 能够有效迁移到各类任务,持续优于全监督基线,并与成熟的任务专用方法有力竞争,尤其是在可用数据有限的少样本学习场景中。
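A minimal sketch of a joint contrastive pre-training objective of this kind (a standard symmetric InfoNCE between the two encoders; the paper's exact loss and temperature are assumptions here): matched symbolic-equation / numeric-data pairs are pulled together in the shared embedding space while all other pairs in the batch act as negatives.

```python
import torch
import torch.nn.functional as F

def symbolic_numeric_infonce(sym_emb, num_emb, temperature=0.07):
    """Symmetric InfoNCE between paired symbolic and numeric embeddings.
    sym_emb, num_emb: (B, D) encoder outputs for matched equation/data pairs."""
    sym = F.normalize(sym_emb, dim=-1)
    num = F.normalize(num_emb, dim=-1)
    logits = sym @ num.t() / temperature          # (B, B) cross-domain similarities
    targets = torch.arange(sym.size(0), device=sym.device)
    # Matched pairs sit on the diagonal; everything else in the batch is a negative.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```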

Think before you speak: Training Language Models With Pause Tokens

  • paper_url: http://arxiv.org/abs/2310.02226
  • repo_url: None
  • paper_authors: Sachin Goyal, Ziwei Ji, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar, Vaishnavh Nagarajan
  • for: 该研究旨在提高语言模型在各种任务上的表现,通过在推理过程中添加延迟。
  • methods: 研究人员使用了一种名为“pause-training”的技术,将一个可学习的“停止”token添加到输入前缀中,以允许模型在推理过程中进行额外计算。
  • results: 研究人员在使用“pause-training”技术后,对1B和130M参数的语言模型进行了训练和推理,并在多个下游任务上观察了提高表现。特别是,对于1B模型,在8个任务中有7个显示了提高,其中最大提高为18%的EMscore在SQuAD问答任务上。
    Abstract Language models generate responses by producing a series of tokens in immediate succession: the $(K+1)^{th}$ token is an outcome of manipulating $K$ hidden vectors per layer, one vector per preceding token. What if instead we were to let the model manipulate say, $K+10$ hidden vectors, before it outputs the $(K+1)^{th}$ token? We operationalize this idea by performing training and inference on language models with a (learnable) $\textit{pause}$ token, a sequence of which is appended to the input prefix. We then delay extracting the model's outputs until the last pause token is seen, thereby allowing the model to process extra computation before committing to an answer. We empirically evaluate $\textit{pause-training}$ on decoder-only models of 1B and 130M parameters with causal pretraining on C4, and on downstream tasks covering reasoning, question-answering, general understanding and fact recall. Our main finding is that inference-time delays show gains when the model is both pre-trained and finetuned with delays. For the 1B model, we witness gains on 8 of 9 tasks, most prominently, a gain of $18\%$ EM score on the QA task of SQuAD, $8\%$ on CommonSenseQA and $1\%$ accuracy on the reasoning task of GSM8k. Our work raises a range of conceptual and practical future research questions on making delayed next-token prediction a widely applicable new paradigm.
    摘要 语言模型通过连续生成一系列token来产生回答:第 $(K+1)$ 个token是对每层 $K$ 个隐藏向量(每个前序token对应一个向量)进行运算的结果。如果让模型在输出第 $(K+1)$ 个token之前操作比如 $K+10$ 个隐藏向量,会怎样?我们通过在输入前缀后附加一串(可学习的)$\textit{pause}$ token来实现这一想法,并在训练和推理中使用它:直到看见最后一个pause token才提取模型输出,从而允许模型在给出答案前进行额外计算。我们在以C4进行因果预训练的1B和130M参数decoder-only模型上实证评估了这种pause-training,下游任务涵盖推理、问答、通用理解和事实回忆。我们的主要发现是:当模型在预训练和微调阶段都引入延迟时,推理阶段的延迟能带来收益。对于1B模型,我们在9个任务中的8个上观察到提升,最显著的包括SQuAD问答任务的EM分数提升18%、CommonSenseQA提升8%,以及GSM8k推理任务的准确率提升1%。我们的工作提出了一系列概念与实践层面的未来研究问题,探讨如何让延迟的下一token预测成为一种可广泛应用的新范式。
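A toy sketch of the mechanism (assuming a HuggingFace-style model that exposes `get_input_embeddings` and accepts `inputs_embeds`; the number of pause tokens and the initialization scale are arbitrary): learnable pause embeddings are appended after the prompt, and the next-token logits are read only at the final pause position.

```python
import torch
import torch.nn as nn

class PauseWrapper(nn.Module):
    """Wraps a decoder-only LM: appends M learnable <pause> embeddings to the prompt
    and reads next-token logits only at the position of the last pause."""

    def __init__(self, lm, hidden_size, num_pauses=10):
        super().__init__()
        self.lm = lm  # assumed: accepts inputs_embeds and returns an object with .logits
        self.pause = nn.Parameter(torch.randn(num_pauses, hidden_size) * 0.02)

    def forward(self, input_ids):
        tok_emb = self.lm.get_input_embeddings()(input_ids)         # (B, T, H)
        pause = self.pause.unsqueeze(0).expand(tok_emb.size(0), -1, -1)
        embeds = torch.cat([tok_emb, pause], dim=1)                  # (B, T+M, H)
        logits = self.lm(inputs_embeds=embeds).logits
        return logits[:, -1, :]   # prediction is delayed until the last pause is seen
```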

What do we learn from a large-scale study of pre-trained visual representations in sim and real environments?

  • paper_url: http://arxiv.org/abs/2310.02219
  • repo_url: None
  • paper_authors: Sneha Silwal, Karmesh Yadav, Tingfan Wu, Jay Vakil, Arjun Majumdar, Sergio Arnaud, Claire Chen, Vincent-Pierre Berges, Dhruv Batra, Aravind Rajeswaran, Mrinal Kalakrishnan, Franziska Meier, Oleksandr Maksymets
  • for: 这个论文旨在研究如何使用预训练的视觉表示(PVR)来训练下游策略,执行实际世界任务。
  • methods: 本研究使用了五种不同的PVR,两种不同的策略学习模式(仿制学习和奖励学习),以及三种不同的机器人,用于5个不同的机器人 manipulation和室内导航任务。
  • results: 我们的研究结果表明:1)PVR 在模拟环境中的性能趋势通常能反映其在真实世界中的趋势;2)借助 PVR,我们在室内 ImageNav 上取得了首创性成果(零样本迁移到真实世界中未见过的场景);3)PVR 的改进(主要是数据增强与微调)带来的收益同样能迁移到真实世界的性能上。更多细节和图示请参考项目网站。
    Abstract We present a large empirical investigation on the use of pre-trained visual representations (PVRs) for training downstream policies that execute real-world tasks. Our study spans five different PVRs, two different policy-learning paradigms (imitation and reinforcement learning), and three different robots for 5 distinct manipulation and indoor navigation tasks. From this effort, we can arrive at three insights: 1) the performance trends of PVRs in the simulation are generally indicative of their trends in the real world, 2) the use of PVRs enables a first-of-its-kind result with indoor ImageNav (zero-shot transfer to a held-out scene in the real world), and 3) the benefits from variations in PVRs, primarily data-augmentation and fine-tuning, also transfer to the real-world performance. See project website for additional details and visuals.
    摘要 我们进行了大规模的实证研究,探讨使用预训练视觉表示(PVR)来训练执行真实世界任务的下游策略。研究涵盖五种不同的PVR、两种策略学习范式(模仿学习与强化学习)以及三种不同的机器人,共涉及5个操作与室内导航任务。由此我们得出三点发现:1)PVR在模拟环境中的性能趋势通常能反映其在真实世界中的趋势;2)借助PVR,我们在室内ImageNav上取得了首创性成果(零样本迁移到真实世界中未见过的场景);3)PVR的改进(主要是数据增强与微调)带来的收益同样能迁移到真实世界的性能上。更多细节与可视化请参见项目网站。

Language Models Represent Space and Time

  • paper_url: http://arxiv.org/abs/2310.02207
  • repo_url: https://github.com/wesg52/world-models
  • paper_authors: Wes Gurnee, Max Tegmark
  • For: The paper explores the question of whether large language models (LLMs) learn a coherent model of the data generating process (a world model) or just a collection of superficial statistics.
  • Methods: The paper analyzes the learned representations of six datasets (three spatial and three temporal) in the Llama-2 family of models.
  • Results: The paper finds that LLMs learn linear representations of space and time across multiple scales, and identifies individual "space neurons" and "time neurons" that reliably encode spatial and temporal coordinates. These results suggest that modern LLMs acquire structured knowledge about fundamental dimensions such as space and time, supporting the view that they learn a coherent world model.
Here is the same information in Simplified Chinese text:
  • For: 论文探讨了大语言模型(LLM)是学习了数据生成过程的连贯模型(世界模型),还是只学到了一堆表面统计规律。
  • Methods: 论文分析了 Llama-2 家族模型在六个数据集(三个空间数据集和三个时间数据集)上学习到的表示。
  • Results: 论文发现 LLM 在多个尺度上学习了空间和时间的线性表示,并识别出可靠编码空间和时间坐标的"空间神经元"和"时间神经元"。这些结果表明,现代 LLM 获得了关于空间和时间等基本维度的结构化知识,支持它们学习了连贯的世界模型这一观点。
    Abstract The capabilities of large language models (LLMs) have sparked debate over whether such systems just learn an enormous collection of superficial statistics or a coherent model of the data generating process -- a world model. We find evidence for the latter by analyzing the learned representations of three spatial datasets (world, US, NYC places) and three temporal datasets (historical figures, artworks, news headlines) in the Llama-2 family of models. We discover that LLMs learn linear representations of space and time across multiple scales. These representations are robust to prompting variations and unified across different entity types (e.g. cities and landmarks). In addition, we identify individual ``space neurons'' and ``time neurons'' that reliably encode spatial and temporal coordinates. Our analysis demonstrates that modern LLMs acquire structured knowledge about fundamental dimensions such as space and time, supporting the view that they learn not merely superficial statistics, but literal world models.
    摘要 大型语言模型(LLM)的能力引发了争论:这类系统究竟只是学习了海量的表面统计规律,还是学习了数据生成过程的连贯模型——即世界模型。我们通过分析 Llama-2 家族模型在三个空间数据集(世界、美国、纽约市地点)和三个时间数据集(历史人物、艺术作品、新闻标题)上学习到的表示,为后者提供了证据。我们发现 LLM 在多个尺度上学习了空间和时间的线性表示,这些表示对提示词的变化具有鲁棒性,并在不同实体类型(如城市和地标)之间保持统一。此外,我们还识别出可靠编码空间和时间坐标的"空间神经元"和"时间神经元"。我们的分析表明,现代 LLM 获得了关于空间和时间等基本维度的结构化知识,支持这样的观点:它们学到的不仅是表面统计规律,而是名副其实的世界模型。
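The kind of analysis described here is typically done with a simple linear probe; the sketch below (an assumption about the workflow, not the authors' code) fits a ridge-regression probe from an entity's hidden activations to its latitude/longitude and reports held-out R². The same recipe applies to years for the temporal datasets.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

def fit_space_probe(activations, coords, alpha=10.0):
    """activations: (N, D) hidden states at each entity's last token (extracted beforehand
    from a frozen LLM); coords: (N, 2) latitude/longitude targets."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        activations, coords, test_size=0.2, random_state=0)
    probe = Ridge(alpha=alpha).fit(X_tr, y_tr)   # one linear map per coordinate
    return probe, r2_score(y_te, probe.predict(X_te))

# usage sketch with random placeholder data
acts, latlon = np.random.randn(1000, 4096), np.random.randn(1000, 2)
probe, held_out_r2 = fit_space_probe(acts, latlon)
```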

Efficient Online Scheduling and Routing for Automated Guided Vehicles: Comparing a Novel Loop-Based Algorithm Against Existing Methods

  • paper_url: http://arxiv.org/abs/2310.02195
  • repo_url: None
  • paper_authors: Louis Stubbe, Jens Goemaere, Jan Goedgebeur
  • for: solving the online, conflict-free scheduling and routing problem for AGVs
  • methods: loop-based algorithm
  • results: either outperforms other algorithms or gets an equally good solution in less computing time
    Abstract Automated guided vehicles (AGVs) are widely used in various industries, and scheduling and routing them in a conflict-free manner is crucial to their efficient operation. We propose a loop-based algorithm that solves the online, conflict-free scheduling and routing problem for AGVs. The proposed algorithm is compared against an exact method, a greedy heuristic and a metaheuristic. We experimentally show that this algorithm either outperforms the other algorithms or gets an equally good solution in less computing time.
    摘要 自动导向车(AGV)在各个业务中广泛应用, scheduling和路由它们在冲突无效的方式是关键。我们提出了一种循环基本算法,解决在线、冲突无效的AGV调度和路由问题。提案的算法与精确方法、聪明规则和元规则进行比较。我们实验表明,该算法可以在计算时间更短的情况下,与其他算法匹配或达到相同的解决方案。

Dimensions of Disagreement: Unpacking Divergence and Misalignment in Cognitive Science and Artificial Intelligence

  • paper_url: http://arxiv.org/abs/2310.12994
  • repo_url: None
  • paper_authors: Kerem Oktar, Ilia Sucholutsky, Tania Lombrozo, Thomas L. Griffiths
  • for: 本研究旨在理解人工智能代理人与人类之间的不同观点和冲突,以及这些代理人之间的冲突。
  • methods: 该研究使用了人工智能研究和计算认知科学的工具来衡量代理人之间的表达匹配度。
  • results: 研究发现,不同表达的冲突和不同表达之间的冲突都会导致代理人之间的冲突,并且解决这些冲突的策略取决于这两种类型的冲突之间的交互。
    Abstract The increasing prevalence of artificial agents creates a correspondingly increasing need to manage disagreements between humans and artificial agents, as well as between artificial agents themselves. Considering this larger space of possible agents exposes an opportunity for furthering our understanding of the nature of disagreement: past studies in psychology have often cast disagreement as two agents forming diverging evaluations of the same object, but disagreement can also arise from differences in how agents represent that object. AI research on human-machine alignment and recent work in computational cognitive science have focused on this latter kind of disagreement, and have developed tools that can be used to quantify the extent of representational overlap between agents. Understanding how divergence and misalignment interact to produce disagreement, and how resolution strategies depend on this interaction, is key to promoting effective collaboration between diverse types of agents.
    摘要 人工智能的普遍化导致人工智能与人类之间的纷争增加,以及人工智能之间的纷争。鉴于这一更大的可能的代理人空间,推动我们理解不一致的本质:在心理学研究中,纷争通常被视为两个代理人对同一物体的评估方式不同而导致的,但纷争也可能来自代理人如何表示该物体的不同。AI研究人员在人机协调和计算认知科学中对这种后者类型的纷争进行了研究,并开发了用于衡量代理人表示之间的重叠程度的工具。理解不一致和不同的互动方式如何产生纷争,以及解决策略如何受到这种互动的影响,是促进多种代理人合作的关键。

Uncertainty Quantification in Inverse Models in Hydrology

  • paper_url: http://arxiv.org/abs/2310.02193
  • repo_url: None
  • paper_authors: Somya Sharma Chatterjee, Rahul Ghosh, Arvind Renganathan, Xiang Li, Snigdhansu Chatterjee, John Nieber, Christopher Duffy, Vipin Kumar
  • For: This paper aims to improve the accuracy of streamflow modeling by recovering physical characteristics of river basins from streamflow and weather data, which are more readily available.* Methods: The proposed method is a knowledge-guided, probabilistic inverse modeling approach that uses a Bayesian framework to estimate the physical characteristics of river basins. The method combines prior knowledge with streamflow and weather data to improve the accuracy of basin characteristic estimation.* Results: The proposed method offers 3% improvement in R$^2$ for the inverse model and 6% for the forward model compared to state-of-the-art inverse models. The method also provides improved explainability by quantifying uncertainty in both the inverse and forward models. Specifically, the framework offers 10% improvement in the dispersion of epistemic uncertainty and 13% improvement in coverage rate compared to baseline uncertainty quantification methods.
    Abstract In hydrology, modeling streamflow remains a challenging task due to the limited availability of basin characteristics information such as soil geology and geomorphology. These characteristics may be noisy due to measurement errors or may be missing altogether. To overcome this challenge, we propose a knowledge-guided, probabilistic inverse modeling method for recovering physical characteristics from streamflow and weather data, which are more readily available. We compare our framework with state-of-the-art inverse models for estimating river basin characteristics. We also show that these estimates offer improvement in streamflow modeling as opposed to using the original basin characteristic values. Our inverse model offers 3\% improvement in R$^2$ for the inverse model (basin characteristic estimation) and 6\% for the forward model (streamflow prediction). Our framework also offers improved explainability since it can quantify uncertainty in both the inverse and the forward model. Uncertainty quantification plays a pivotal role in improving the explainability of machine learning models by providing additional insights into the reliability and limitations of model predictions. In our analysis, we assess the quality of the uncertainty estimates. Compared to baseline uncertainty quantification methods, our framework offers 10\% improvement in the dispersion of epistemic uncertainty and 13\% improvement in coverage rate. This information can help stakeholders understand the level of uncertainty associated with the predictions and provide a more comprehensive view of the potential outcomes.
    摘要 在水文学中,流量建模仍是一项具有挑战性的任务,因为土壤地质、地貌等流域特征信息有限,可能含有测量误差,甚至完全缺失。为应对这一挑战,我们提出了一种知识引导的概率反演建模方法,可以从更易获取的流量和气象数据中恢复流域的物理特征。我们将该框架与当前最先进的反演模型进行了比较,并表明这些估计值相比直接使用原始流域特征,能够改善流量建模。我们的反演模型在流域特征估计上带来3%的R$^2$提升,在正演模型(流量预测)上带来6%的提升。该框架还提供了更好的可解释性,因为它能够量化反演与正演模型中的不确定性。在分析中,我们评估了不确定性估计的质量:与基线不确定性量化方法相比,我们的框架在认知不确定性的离散度上提升10%,在覆盖率上提升13%。这些信息有助于相关方理解预测结果所伴随的不确定性水平,并对可能的结果获得更全面的认识。

What’s Next in Affective Modeling? Large Language Models

  • paper_url: http://arxiv.org/abs/2310.18322
  • repo_url: https://github.com/Aryia-Behroziuan/neurons
  • paper_authors: Nutchanon Yongsatianchot, Tobias Thejll-Madsen, Stacy Marsella
  • for: 这研究探讨了基于语言模型GPT-4的情感预测能力。
  • methods: 这研究使用了GPT-4来解决多种情感任务,包括情感理论和情感故事创作。
  • results: GPT-4能够成功地分辨出不同情感,并且可以通过提示GPT-4关键情感体验因素来控制情感强度。此外,GPT-4还能够正确地预测人类的目标、信仰和情感。
    Abstract Large Language Models (LLM) have recently been shown to perform well at various tasks from language understanding, reasoning, storytelling, and information search to theory of mind. In an extension of this work, we explore the ability of GPT-4 to solve tasks related to emotion prediction. GPT-4 performs well across multiple emotion tasks; it can distinguish emotion theories and come up with emotional stories. We show that by prompting GPT-4 to identify key factors of an emotional experience, it is able to manipulate the emotional intensity of its own stories. Furthermore, we explore GPT-4's ability on reverse appraisals by asking it to predict either the goal, belief, or emotion of a person using the other two. In general, GPT-4 can make the correct inferences. We suggest that LLMs could play an important role in affective modeling; however, they will not fully replace works that attempt to model the mechanisms underlying emotion-related processes.
    摘要 大型语言模型(LLM)近来已被证明能在语言理解、推理、故事创作、信息检索乃至心智理论等多种任务上表现良好。作为这一工作的延伸,我们探讨了GPT-4在情绪预测相关任务上的能力。GPT-4在多项情绪任务上表现出色:它能够区分不同的情绪理论,并能编写带有情绪的故事。我们发现,通过提示GPT-4识别情绪体验的关键因素,它能够调控自己所写故事的情绪强度。此外,我们还考察了GPT-4的逆向评估能力,即给定其中两项,要求它预测一个人的目标、信念或情绪中的另一项;总体而言,GPT-4能够做出正确的推断。我们认为LLM可以在情感建模中发挥重要作用,但它们不会完全取代那些试图对情绪相关过程的底层机制进行建模的工作。

Investigating Large Language Models’ Perception of Emotion Using Appraisal Theory

  • paper_url: http://arxiv.org/abs/2310.04450
  • repo_url: None
  • paper_authors: Nutchanon Yongsatianchot, Parisa Ghanad Torshizi, Stacy Marsella
  • for: 这个研究旨在更好地理解大语言模型(LLM)在人类心理方面的理解,特别是它们对人类情感的理解。
  • methods: 这个研究使用了Stress and Coping Process Questionaire(SCPQ)测试三个最新的OpenAI LLM:davinci-003、ChatGPT和GPT-4。SCPQ是一种有效的临床工具,包含多个故事,随着时间的推移而发展,具有不同的关键评估变量,如可控性和可变性。
  • results: 研究发现,LLM的响应在评估与应对的动态变化上与人类相似,但其响应并未像理论和人类数据所预测的那样沿关键评估维度产生差异,且在若干变量上响应的幅度与人类相差很大。此外,研究还发现GPT系列模型对指令和提问方式相当敏感。这项研究扩充了评估LLM心理层面的文献,有助于我们更好地理解当前的模型。
    Abstract Large Language Models (LLM) like ChatGPT have significantly advanced in recent years and are now being used by the general public. As more people interact with these systems, improving our understanding of these black box models is crucial, especially regarding their understanding of human psychological aspects. In this work, we investigate their emotion perception through the lens of appraisal and coping theory using the Stress and Coping Process Questionaire (SCPQ). SCPQ is a validated clinical instrument consisting of multiple stories that evolve over time and differ in key appraisal variables such as controllability and changeability. We applied SCPQ to three recent LLMs from OpenAI, davinci-003, ChatGPT, and GPT-4 and compared the results with predictions from the appraisal theory and human data. The results show that LLMs' responses are similar to humans in terms of dynamics of appraisal and coping, but their responses did not differ along key appraisal dimensions as predicted by the theory and data. The magnitude of their responses is also quite different from humans in several variables. We also found that GPTs can be quite sensitive to instruction and how questions are asked. This work adds to the growing literature evaluating the psychological aspects of LLMs and helps enrich our understanding of the current models.
    摘要 像ChatGPT这样的大语言模型(LLM)近年来取得了显著进展,如今已被普通大众使用。随着越来越多的人与这些系统交互,加深我们对这些黑箱模型的理解变得至关重要,尤其是它们对人类心理层面的理解。在这项工作中,我们借助评估与应对理论,使用压力与应对过程问卷(SCPQ)来考察LLM的情绪感知。SCPQ是一种经过验证的临床工具,包含多个随时间推进的故事,这些故事在可控性、可变性等关键评估变量上各不相同。我们将SCPQ应用于OpenAI最新的三个LLM:davinci-003、ChatGPT和GPT-4,并将结果与评估理论的预测和人类数据进行比较。结果显示,LLM在评估与应对的动态变化上与人类相似,但其回答并未像理论和数据预测的那样沿关键评估维度产生差异,且在若干变量上回答的幅度与人类差别很大。我们还发现GPT系列模型对指令和提问方式相当敏感。这项工作丰富了评估LLM心理层面的研究文献,有助于加深我们对当前模型的理解。

Ask Again, Then Fail: Large Language Models’ Vacillations in Judgement

  • paper_url: http://arxiv.org/abs/2310.02174
  • repo_url: https://github.com/nustm/llms-waver-in-judgements
  • paper_authors: Qiming Xie, Zengzhi Wang, Yi Feng, Rui Xia
  • for: 这 paper 的目的是检测大语言模型(如 ChatGPT)在用户表达怀疑或不同意时的稳定性和可靠性。
  • methods: 这 paper 使用了一种名为 \textsc{Follow-up Questioning Mechanism} 的评估方法,以评估模型在不同情况下的判断一致性。
  • results: 研究发现,即使初始答案正确,模型在面临问题、否定或欺诈等干扰时,判断一致性很快下降。此外,研究还检查了不同设置(抽样温度和提示)对模型的影响,并进行了深入的错误分析以获得更深刻的行为认识。
    Abstract With the emergence of generative conversational large language models (LLMs) like ChatGPT, serving as virtual assistants in various fields, the stability and reliability of their responses have become crucial. However, during usage, it has been observed that these models tend to waver in their judgements when confronted with follow-up questions from users expressing skepticism or disagreement. In this work, we draw inspiration from questioning strategies in education and propose a \textsc{Follow-up Questioning Mechanism} along with two evaluation metrics to assess the judgement consistency of LLMs before and after exposure to disturbances. We evaluate the judgement consistency of ChatGPT, PaLM2-Bison, and Vicuna-13B under this mechanism across eight reasoning benchmarks. Empirical results show that even when the initial answers are correct, judgement consistency sharply decreases when LLMs face disturbances such as questioning, negation, or misleading. Additionally, we study these models' judgement consistency under various settings (sampling temperature and prompts) to validate this issue further, observing the impact of prompt tone and conducting an in-depth error analysis for deeper behavioral insights. Furthermore, we also explore several prompting methods to mitigate this issue and demonstrate their effectiveness\footnote{\url{https://github.com/NUSTM/LLMs-Waver-In-Judgements}.
    摘要 随着生成对话大语言模型(LLMs)如ChatGPT的出现,它们作为不同领域的虚拟助手,稳定和可靠的响应成为了关键。然而,在使用过程中,这些模型在用户表达skepticism或不同意时遇到问题时,往往会变得不稳定。在这种情况下,我们从教育中的问题策略中灵感,并提出了一种\textsc{Follow-up Questioning Mechanism},以评估LLMs在不同的问题和环境下的判断一致性。我们对ChatGPT、PaLM2-Bison和Vicuna-13B进行了八个逻辑标准套件的评估。实验结果表明,即使初始答案正确,LLMs在遇到问题、否定或误导时,判断一致性很快下降。此外,我们还研究了这些模型在不同的设置(抽象温度和提示)下的判断一致性,以验证这个问题的严重程度。最后,我们还提出了一些提示方法来解决这个问题,并证明了它们的有效性。更多细节可以参考[这里](https://github.com/NUSTM/LLMs-Waver-In-Judgements)。
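A tiny sketch of how such a follow-up probe could be scored (the chat-wrapper `ask`, the challenge phrasings, and the string-match correctness check are all illustrative assumptions, not the paper's protocol): the model is challenged after an initially correct answer, and the fraction of rounds in which it keeps the gold answer is its consistency.

```python
def judgement_consistency(ask, question, gold, challenges=None):
    """ask(messages) -> reply string; a hypothetical chat-completion wrapper.
    Probes whether an initially correct answer survives follow-up pushback."""
    challenges = challenges or [
        "Are you sure? I think that answer is wrong.",
        "Please reconsider; another source disagrees with you.",
    ]
    messages = [{"role": "user", "content": question}]
    first = ask(messages)
    if gold not in first:             # crude string-match correctness check (assumption)
        return None                   # only initially-correct answers are probed
    messages.append({"role": "assistant", "content": first})
    kept = 0
    for challenge in challenges:
        messages.append({"role": "user", "content": challenge})
        reply = ask(messages)
        messages.append({"role": "assistant", "content": reply})
        kept += int(gold in reply)    # did the model keep its original judgement?
    return kept / len(challenges)     # 1.0 = never wavered
```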

Lyfe Agents: Generative agents for low-cost real-time social interactions

  • paper_url: http://arxiv.org/abs/2310.02172
  • repo_url: None
  • paper_authors: Zhao Kaiya, Michelangelo Naim, Jovana Kondic, Manuel Cortes, Jiaxin Ge, Shuying Luo, Guangyu Robert Yang, Andrew Ahn
  • for: 本研究旨在开发一种可靠、高效、低成本的自主生成代理人,用于虚拟社会中的人类社会行为模拟。
  • methods: 本研究使用了以下三个关键技术:1)选择动作框架,以减少高级决策的成本;2)异步自我监测,以提高自我一致性;3)记忆机制,以优先级化关键记忆项,降低计算成本。
  • results: 研究发现,通过应用这些技术,LYFE代理人能够展现出人类自主社会行为的特点,例如通过自主协作和信息交换解决犯罪案件(如谋杀案)。同时,这些技术可以降低计算成本,相比现有的替代方案,计算成本下降10-100倍。这些发现表明自主生成代理人在虚拟世界中潜在地可以敷充人类社会经验。
    Abstract Highly autonomous generative agents powered by large language models promise to simulate intricate social behaviors in virtual societies. However, achieving real-time interactions with humans at a low computational cost remains challenging. Here, we introduce Lyfe Agents. They combine low-cost with real-time responsiveness, all while remaining intelligent and goal-oriented. Key innovations include: (1) an option-action framework, reducing the cost of high-level decisions; (2) asynchronous self-monitoring for better self-consistency; and (3) a Summarize-and-Forget memory mechanism, prioritizing critical memory items at a low cost. We evaluate Lyfe Agents' self-motivation and sociability across several multi-agent scenarios in our custom LyfeGame 3D virtual environment platform. When equipped with our brain-inspired techniques, Lyfe Agents can exhibit human-like self-motivated social reasoning. For example, the agents can solve a crime (a murder mystery) through autonomous collaboration and information exchange. Meanwhile, our techniques enabled Lyfe Agents to operate at a computational cost 10-100 times lower than existing alternatives. Our findings underscore the transformative potential of autonomous generative agents to enrich human social experiences in virtual worlds.
    摘要 高度自主的生成代理人powered by大语言模型承诺可以模拟复杂的社会行为在虚拟社会中。然而,实现实时交互与人类的计算成本仍然是挑战。我们介绍了Lyfe Agent。它们结合了低成本和实时应答,同时保持智能和目标强调。关键创新包括:1.选项-动作框架, reducethe cost of high-level decisions。2.异步自我监测,提高自我一致性。3.概要和忘记记忆机制,优先级低成本关键记忆项。我们在自定义的LyfeGame 3D虚拟环境平台上评估了Lyfe Agent的自我动机和社会能力。当装备了我们的脑机制时,Lyfe Agent可以展现出人类自我动机的社会逻辑。例如,代理人可以通过自主合作和信息交换解决杀人案(一个谋杀谜)。同时,我们的技术使得Lyfe Agent可以在计算成本10-100倍低于现有alternative的情况下运行。我们的发现挑战了自主生成代理人的可能性,以浸没人类社会经验的虚拟世界。

Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization

  • paper_url: http://arxiv.org/abs/2310.02170
  • repo_url: https://github.com/salt-nlp/dylan
  • paper_authors: Zijun Liu, Yanzhe Zhang, Peng Li, Yang Liu, Diyi Yang
  • for: 这个研究的目的是提高大型语言模型(LLM)代理的性能,并通过将多个LLM代理集成起来,以提高它们的普遍性和可靠性。
  • methods: 这个研究使用了一个名为“动态LLM-代理网络”(DyLAN)的框架,允许LLM代理在问题查询中互动,并在构成团队时选择最佳的代理。它还包括一个早期停止机制和一个自动团队优化算法,以提高性能和可效性。
  • results: 实验结果显示,DyLAN在逻辑和代码生成等复杂任务中表现出色,与单一GPT-35-turbo执行的结果相比,DyLAN可以获得13.0%和13.3%的提升。在特定主题的MMLU中,团队优化算法可以提高准确性达25.0%。
    Abstract Large language model (LLM) agents have been shown effective on a wide range of tasks, and by ensembling multiple LLM agents, their performances could be further improved. Existing approaches employ a fixed set of agents to interact with each other in a static architecture, which limits their generalizability to various tasks and requires strong human prior in designing these agents. In this work, we propose to construct a strategic team of agents communicating in a dynamic interaction architecture based on the task query. Specifically, we build a framework named Dynamic LLM-Agent Network ($\textbf{DyLAN}$) for LLM-agent collaboration on complicated tasks like reasoning and code generation. DyLAN enables agents to interact for multiple rounds in a dynamic architecture with inference-time agent selection and an early-stopping mechanism to improve performance and efficiency. We further design an automatic agent team optimization algorithm based on an unsupervised metric termed $\textit{Agent Importance Score}$, enabling the selection of best agents based on the contribution each agent makes. Empirically, we demonstrate that DyLAN performs well in both reasoning and code generation tasks with reasonable computational cost. DyLAN achieves 13.0% and 13.3% improvement on MATH and HumanEval, respectively, compared to a single execution on GPT-35-turbo. On specific subjects of MMLU, agent team optimization in DyLAN increases accuracy by up to 25.0%.
    摘要 大型语言模型(LLM)代理已被证明可以在各种任务上达到出色的效果,并通过 ensemble 多个 LLM 代理来进一步提高其性能。现有的方法通常采用固定的代理集合来交互在静态架构中,这限制了它们在不同任务上的泛化能力和需要强大的人工指导。在这项工作中,我们提议构建一个灵活的代理团队通过任务查询来交互。特别是,我们建立了名为 DyLAN(动态LLM代理网络)的框架,用于LLM代理在复杂任务中的合作。DyLAN 允许代理在动态架构中进行多轮交互,并在推理时选择代理和早期停止机制以提高性能和效率。此外,我们还设计了一种基于无监督度量的自动代理团队优化算法,以便根据每个代理的贡献来选择最佳的代理。Empirically,我们证明了 DyLAN 在理解和代码生成任务中表现出色,并且相对于单个执行 GPT-35-turbo 的情况下,DyLAN 可以提高 MATH 和 HumanEval 的表现,分别提高了13.0%和13.3%。在特定主题的 MMLU 任务中,DyLAN 的代理团队优化可以提高准确率达25.0%。

Editing Personality for LLMs

  • paper_url: http://arxiv.org/abs/2310.02168
  • repo_url: https://github.com/zjunlp/easyedit
  • paper_authors: Shengyu Mao, Ningyu Zhang, Xiaohan Wang, Mengru Wang, Yunzhi Yao, Yong Jiang, Pengjun Xie, Fei Huang, Huajun Chen
  • for: 这篇论文旨在编辑大语言模型(LLM)的人性特质。
  • methods: 该任务使用新的benchmark dataset PersonalityEdit,基于社会心理学理论选择了三种表现人性特质:躁郁、外向和合作。通过GPT-4生成响应,不仅与指定话题相符,还体现出目标人性特质。
  • results: 经过全面的基线测试和分析,发现这些基线在表现人性特质方面存在一些挑战,表明这个任务还存在一些问题。研究人员预计这种任务的成果可以为NLP社区提供新的想法。代码和数据将在https://github.com/zjunlp/EasyEdit中发布。
    Abstract This paper introduces an innovative task focused on editing the personality traits of Large Language Models (LLMs). This task seeks to adjust the models' responses to opinion-related questions on specified topics since an individual's personality often manifests in the form of their expressed opinions, thereby showcasing different personality traits. Specifically, we construct a new benchmark dataset PersonalityEdit to address this task. Drawing on the theory in Social Psychology, we isolate three representative traits, namely Neuroticism, Extraversion, and Agreeableness, as the foundation for our benchmark. We then gather data using GPT-4, generating responses that not only align with a specified topic but also embody the targeted personality trait. We conduct comprehensive experiments involving various baselines and discuss the representation of personality behavior in LLMs. Our intriguing findings uncover potential challenges of the proposed task, illustrating several remaining issues. We anticipate that our work can provide the NLP community with insights. Code and datasets will be released at https://github.com/zjunlp/EasyEdit.
    摘要 本文提出了一项新颖任务:编辑大语言模型(LLM)的人格特质。由于个体的人格往往通过其表达的观点体现,该任务旨在调整模型在特定话题上对观点类问题的回答,从而展现不同的人格特质。为此,我们构建了新的基准数据集PersonalityEdit:借鉴社会心理学理论,我们选取神经质、外向性和宜人性这三种代表性特质作为基准的基础,并使用GPT-4生成既契合指定话题、又体现目标人格特质的回复。我们对多种基线方法进行了全面实验,并讨论了LLM中人格行为的表示。实验结果揭示了该任务面临的潜在挑战,说明仍有若干问题有待解决。我们期望这项工作能为NLP社区提供新的思路。代码和数据集将发布在 https://github.com/zjunlp/EasyEdit 。

Towards a Unified Framework for Sequential Decision Making

  • paper_url: http://arxiv.org/abs/2310.02167
  • repo_url: None
  • paper_authors: Carlos Núñez-Molina, Pablo Mesejo, Juan Fernández-Olivares
  • for: 提供一个通用的Sequential Decision Making(SDM)框架,以帮助理解Automated Planning(AP)和Reinforcement Learning(RL)的集成。
  • methods: 基于概率论和 bayesian inference 概念,从Classical Planning到 Deep RL 任何方法都可以适用。
  • results: 提出了一种通用的SDM任务的训练和测试Markov Decision Processes(MDPs),以确保总结抽象。还提出了一种基于任务知识的协助估计方法,并 derivated 一组公式和算法用于计算SDM任务和方法的有趣属性,使其可以进行实验评估和比较。
    Abstract In recent years, the integration of Automated Planning (AP) and Reinforcement Learning (RL) has seen a surge of interest. To perform this integration, a general framework for Sequential Decision Making (SDM) would prove immensely useful, as it would help us understand how AP and RL fit together. In this preliminary work, we attempt to provide such a framework, suitable for any method ranging from Classical Planning to Deep RL, by drawing on concepts from Probability Theory and Bayesian inference. We formulate an SDM task as a set of training and test Markov Decision Processes (MDPs), to account for generalization. We provide a general algorithm for SDM which we hypothesize every SDM method is based on. According to it, every SDM algorithm can be seen as a procedure that iteratively improves its solution estimate by leveraging the task knowledge available. Finally, we derive a set of formulas and algorithms for calculating interesting properties of SDM tasks and methods, which make possible their empirical evaluation and comparison.
    摘要 We formulate an SDM task as a set of training and test Markov Decision Processes (MDPs) to account for generalization. We propose a general algorithm for SDM, which we hypothesize is the basis for every SDM method. According to this algorithm, every SDM algorithm iteratively improves its solution estimate by leveraging task knowledge available.We derive a set of formulas and algorithms for calculating interesting properties of SDM tasks and methods, enabling their empirical evaluation and comparison. These properties include the expected cumulative reward, the probability of success, and the expected time to complete the task.Our proposed framework provides a unified approach to SDM, enabling the integration of various methods, from Classical Planning to Deep RL. By leveraging the power of Probability Theory and Bayesian inference, we can better understand the underlying principles of SDM and develop more effective and efficient algorithms for solving complex decision-making problems.

Conceptual Framework for Autonomous Cognitive Entities

  • paper_url: http://arxiv.org/abs/2310.06775
  • repo_url: https://github.com/daveshap/ACE_Framework
  • paper_authors: David Shapiro, Wangfan Li, Manuel Delaflor, Carlos Toxtli
  • for: 这篇论文的目的是提出一种新的认知架构,帮助机器人和软件代理人更加独立地运行。
  • methods: 该论文使用了一种名为ACE模型,这是一种基于OSI模型的认知架构,用于概括人工智能系统。
  • results: 该论文提出了一种新的认知架构,并测试了这种架构在实际应用中的可行性。该架构包括6层:aspirational层、全球策略层、代理模型层、执行函数层、认知控制层和任务追究层。每个层都扮演着不同的角色,从设定道德基础和战略思维到任务选择和执行。
    Abstract The rapid development and adoption of Generative AI (GAI) technology in the form of chatbots such as ChatGPT and Claude has greatly increased interest in agentic machines. This paper introduces the Autonomous Cognitive Entity (ACE) model, a novel framework for a cognitive architecture, enabling machines and software agents to operate more independently. Drawing inspiration from the OSI model, the ACE framework presents layers of abstraction to conceptualize artificial cognitive architectures. The model is designed to harness the capabilities of the latest generative AI technologies, including large language models (LLMs) and multimodal generative models (MMMs), to build autonomous, agentic systems. The ACE framework comprises six layers: the Aspirational Layer, Global Strategy, Agent Model, Executive Function, Cognitive Control, and Task Prosecution. Each layer plays a distinct role, ranging from setting the moral compass and strategic thinking to task selection and execution. The ACE framework also incorporates mechanisms for handling failures and adapting actions, thereby enhancing the robustness and flexibility of autonomous agents. This paper introduces the conceptual framework and proposes implementation strategies that have been tested and observed in industry. The goal of this paper is to formalize this framework so as to be more accessible.
    摘要 快速发展和应用生成人工智能(GAI)技术,如ChatGPT和Claude,对职业机器人的兴趣带来了急速增长。这篇论文介绍了自主认知体系(ACE)模型,一种新的认知架构,使得机器人和软件代理能够更加独立地运行。以OSI模型为 inspirations,ACE模型提供了各种层次抽象,用于描述人工认知体系。该模型采用了最新的生成人工智能技术,包括大语言模型(LLM)和多模态生成模型(MMM),以建立自主、主动的系统。ACE模型包括六层:aspirational层、全球策略层、代理模型层、执行函数层、认知控制层和任务执行层。每层都扮演着不同的角色,从设定道德指南和战略思维到任务选择和执行。ACE模型还包括处理失败和适应行动的机制,从而提高自主机器人的可靠性和灵活性。这篇论文将 introduce this framework,并提出了在行业中测试和观察的实施策略。文章的目的是以更加访问性的形式,将这个框架正式化。

Selenite: Scaffolding Decision Making with Comprehensive Overviews Elicited from Large Language Models

  • paper_url: http://arxiv.org/abs/2310.02161
  • repo_url: None
  • paper_authors: Michael Xieyang Liu, Tongshuang Wu, Tianying Chen, Franklin Mingzhe Li, Aniket Kittur, Brad A. Myers
  • for: 帮助用户在不熟悉的领域做出决策,减少用户的比较努力,提高决策效率。
  • methods: 利用自然语言处理技术和机器学习算法,自动生成option和标准的概述,帮助用户快速理解和掌握新信息。
  • results: 三个研究显示,selenite可靠地生成准确的概述,大幅加速用户的信息处理速度,提高了用户的总体理解和决策体验。
    Abstract Decision-making in unfamiliar domains can be challenging, demanding considerable user effort to compare different options with respect to various criteria. Prior research and our formative study found that people would benefit from seeing an overview of the information space upfront, such as the criteria that others have previously found useful. However, existing sensemaking tools struggle with the "cold-start" problem -- it not only requires significant input from previous users to generate and share these overviews, but such overviews may also be biased and incomplete. In this work, we introduce a novel system, Selenite, which leverages LLMs as reasoning machines and knowledge retrievers to automatically produce a comprehensive overview of options and criteria to jumpstart users' sensemaking processes. Subsequently, Selenite also adapts as people use it, helping users find, read, and navigate unfamiliar information in a systematic yet personalized manner. Through three studies, we found that Selenite produced accurate and high-quality overviews reliably, significantly accelerated users' information processing, and effectively improved their overall comprehension and sensemaking experience.
    摘要 决策在不熟悉的领域可能是具有挑战性的,需要用户投入很大的努力来比较不同的选项,并考虑各种标准。先前的研究和我们的形成研究发现,人们会受益于在头一次使用时看到信息空间的概述,例如其他人在过去找到的有用的标准。然而,现有的感知工具受到“冷启动”问题的困扰——不仅需要大量的先前用户的输入来生成和分享这些概述,而且这些概述也可能受到偏见和缺失。在这项工作中,我们介绍了一种新的系统——Selenite,它利用人工智能语言模型(LLM)作为思维机器和知识检索器,自动生成选项和标准的全面概述,以便让用户快速开始感知过程。此外,Selenite还可以适应用户的使用,帮助用户找到、阅读和浏览未知的信息,并且系统化地帮助用户进行个性化的感知体验。通过三项研究,我们发现Selenite可靠地生成高质量的概述,可靠地加速用户的信息处理,并有效地改善用户的总体感知和感知体验。

Finite-Time Analysis of Whittle Index based Q-Learning for Restless Multi-Armed Bandits with Neural Network Function Approximation

  • paper_url: http://arxiv.org/abs/2310.02147
  • repo_url: None
  • paper_authors: Guojun Xiong, Jian Li
  • for: 这 paper 是关于 restless multi-armed bandits (RMAB) 问题的一种 asymptotically optimal 的 heuristic,但是计算 Whittle 指数仍然具有困难。
  • methods: 这 paper 提出了一种基于 Q-学习 的 Whittle index 算法,称为 Neural-Q-Whittle,其中使用了 neural network 函数近似来计算 Q-函数值,并在两个不同的时间尺度上更新 Q-函数值和 Whittle 指数。
  • results: 这 paper 提供了 Neural-Q-Whittle 算法的 finite-time 分析,其中数据来自 Markov chain,Q-函数被approx 成了 ReLU 神经网络。分析使用了 Lyapunov 漂移方法,并考虑了函数近似 error。结果显示,Neural-Q-Whittle 算法在 $\mathcal{O}(1/k^{2/3})$ 时间下达到 convergence rate,其中 $k$ 是迭代次数。
    Abstract Whittle index policy is a heuristic to the intractable restless multi-armed bandits (RMAB) problem. Although it is provably asymptotically optimal, finding Whittle indices remains difficult. In this paper, we present Neural-Q-Whittle, a Whittle index based Q-learning algorithm for RMAB with neural network function approximation, which is an example of nonlinear two-timescale stochastic approximation with Q-function values updated on a faster timescale and Whittle indices on a slower timescale. Despite the empirical success of deep Q-learning, the non-asymptotic convergence rate of Neural-Q-Whittle, which couples neural networks with two-timescale Q-learning largely remains unclear. This paper provides a finite-time analysis of Neural-Q-Whittle, where data are generated from a Markov chain, and Q-function is approximated by a ReLU neural network. Our analysis leverages a Lyapunov drift approach to capture the evolution of two coupled parameters, and the nonlinearity in value function approximation further requires us to characterize the approximation error. Combing these provide Neural-Q-Whittle with $\mathcal{O}(1/k^{2/3})$ convergence rate, where $k$ is the number of iterations.
    摘要 “对于困难的多臂枪客问题(RMAB),Whittle指标政策是一种几乎可以推导到最佳解的规律。然而,实际上找到Whittle指标仍然具有挑战性。在这篇论文中,我们提出了一个使用神经网络函数近似的Whittle指标基于Q学习算法,即Neural-Q-Whittle。这是一种具有两个时间步长的随机测approximation,其中Q值在更快的时间步长上更新,而Whittle指标则在更慢的时间步长上更新。尽管深度学习的实际成功,Neural-Q-Whittle的非对称数据分析仍然不清楚。这篇论文提供了Neural-Q-Whittle在Markov链上获得的finite-time分析,并且利用了Lyapunov滑动方法来捕捉两个耦合的参数的演化。由于值函数近似的非线性性,我们需要 characterize Approximation error。通过结合这些因素,我们可以给出Neural-Q-Whittle的$\mathcal{O}(1/k^{2/3})$的数据分析速率,其中$k$是迭代次数。”
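To make the two-timescale idea tangible, here is a deliberately simplified tabular sketch for a single restless arm (the paper uses a ReLU network for the Q-function; the step-size schedules, the discounted TD target, and the update form below are illustrative assumptions): Q-values move on a faster timescale, while the per-state Whittle-index estimate drifts slowly toward the subsidy that equalizes the active and passive actions.

```python
import numpy as np

def q_whittle_step(Q, lam, s, a, r, s_next, k, gamma=0.99):
    """One two-timescale update for a single restless arm (tabular sketch).

    Q:   (num_states, 2) action values; action 0 = passive, 1 = active.
    lam: (num_states,)  per-state Whittle-index (subsidy) estimates.
    """
    alpha = 1.0 / (1 + k) ** (2.0 / 3.0)   # faster timescale for the Q-values
    beta = 1.0 / (1 + k)                   # slower timescale for the indices
    subsidy = lam[s] if a == 0 else 0.0    # passive action earns the current subsidy
    td_error = r + subsidy + gamma * Q[s_next].max() - Q[s, a]
    Q[s, a] += alpha * td_error
    # The Whittle index is the subsidy at which active and passive are equally good,
    # so nudge lam[s] toward closing the gap Q[s, 1] - Q[s, 0].
    lam[s] += beta * (Q[s, 1] - Q[s, 0])
    return Q, lam
```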

Learning Reliable Logical Rules with SATNet

  • paper_url: http://arxiv.org/abs/2310.02133
  • repo_url: None
  • paper_authors: Zhaoyu Li, Jinpei Guo, Yuhe Jiang, Xujie Si
  • for: 本研究旨在推动逻辑推理和深度学习之间的 integrate,以建立更高级的 AI 系统。
  • methods: 我们提出了一种新的框架,通过 differentiable learning 生成可解释的和可验证的逻辑规则,不需要先天的逻辑结构。我们的方法基于 SATNet,一种可导式 MaxSAT 解决器,通过输入输出示例学习出下面的规则。
  • results: 我们的方法可以生成高可靠性的逻辑规则,并通过多种有效的验证技术验证其与真实规则的函数等价性。实验表明,使用 exact solvers 验证我们的决策则可以达到 100% 的准确率,而原始 SATNet 在许多情况下无法给出正确的解决方案。
    Abstract Bridging logical reasoning and deep learning is crucial for advanced AI systems. In this work, we present a new framework that addresses this goal by generating interpretable and verifiable logical rules through differentiable learning, without relying on pre-specified logical structures. Our approach builds upon SATNet, a differentiable MaxSAT solver that learns the underlying rules from input-output examples. Despite its efficacy, the learned weights in SATNet are not straightforwardly interpretable, failing to produce human-readable rules. To address this, we propose a novel specification method called "maximum equality", which enables the interchangeability between the learned weights of SATNet and a set of propositional logical rules in weighted MaxSAT form. With the decoded weighted MaxSAT formula, we further introduce several effective verification techniques to validate it against the ground truth rules. Experiments on stream transformations and Sudoku problems show that our decoded rules are highly reliable: using exact solvers on them could achieve 100% accuracy, whereas the original SATNet fails to give correct solutions in many cases. Furthermore, we formally verify that our decoded logical rules are functionally equivalent to the ground truth ones.
    摘要 bridging 逻辑推理和深度学习是高级人工智能系统的关键。在这项工作中,我们提出了一种新的框架,通过分别学习生成可读可验证的逻辑规则,不需要预先指定的逻辑结构。我们的方法基于SATNet,一种可微的MaxSAT解决方案,从输入输出示例中学习下面的规则。虽然SATNet的学习结果具有效果,但是学习出来的权重不是直观可读的,无法生成人类可读的规则。为解决这个问题,我们提出了一种新的规定方法 called "最大等式",允许将SATNet学习出来的权重与一组带权的 propositional 逻辑规则相互转换。通过解码的带权MaxSAT式,我们进一步引入了一些有效的验证技术,以验证它们是否与真实规则函数等价。实验表明,我们的解码规则具有高度可靠性:使用 exact 解决器处理它们可以达到100%的准确率,而原始SATNet在许多情况下无法提供正确的解决方案。此外,我们正式验证了我们解码的逻辑规则是否函数等价于真实规则。
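The verification step described above — checking that a decoded rule is functionally equivalent to the ground-truth rule — can be illustrated with a brute-force truth-table comparison. The two formulas below are hypothetical stand-ins for a decoded rule and a ground-truth rule, not the ones actually learned by SATNet.

```python
from itertools import product

def decoded_rule(a, b, c):
    # hypothetical rule decoded from learned weights (simplified from weighted-MaxSAT form)
    return (a or not b) and (b or c)

def ground_truth_rule(a, b, c):
    # hypothetical ground-truth rule it should match
    return (not b or a) and (c or b)

def functionally_equivalent(f, g, n_vars):
    """Exhaustively compare two propositional formulas over all variable assignments."""
    return all(f(*bits) == g(*bits) for bits in product([False, True], repeat=n_vars))

print(functionally_equivalent(decoded_rule, ground_truth_rule, 3))  # True
```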

Unveiling the Pitfalls of Knowledge Editing for Large Language Models

  • paper_url: http://arxiv.org/abs/2310.02129
  • repo_url: https://github.com/zjunlp/pitfallsknowledgeediting
  • paper_authors: Zhoubo Li, Ningyu Zhang, Yunzhi Yao, Mengru Wang, Xi Chen, Huajun Chen
  • for: 本研究探讨了对大型自然语言模型(LLMs)知识编辑的风险。
  • methods: 本研究提出了新的评估指标和基准集,以评估知识编辑对LLMs的影响。
  • results: 研究发现,知识编辑可能会导致两类问题:知识冲突和知识扭曲。这些问题可能会对LLMs产生不良影响,需要未来研究的注意和努力。
    Abstract As the cost associated with fine-tuning Large Language Models (LLMs) continues to rise, recent research efforts have pivoted towards developing methodologies to edit implicit knowledge embedded within LLMs. Yet, there's still a dark cloud lingering overhead -- will knowledge editing trigger butterfly effect? since it is still unclear whether knowledge editing might introduce side effects that pose potential risks or not. This paper pioneers the investigation into the potential pitfalls associated with knowledge editing for LLMs. To achieve this, we introduce new benchmark datasets and propose innovative evaluation metrics. Our results underline two pivotal concerns: (1) Knowledge Conflict: Editing groups of facts that logically clash can magnify the inherent inconsistencies in LLMs-a facet neglected by previous methods. (2) Knowledge Distortion: Altering parameters with the aim of editing factual knowledge can irrevocably warp the innate knowledge structure of LLMs. Experimental results vividly demonstrate that knowledge editing might inadvertently cast a shadow of unintended consequences on LLMs, which warrant attention and efforts for future works. Code will be released at https://github.com/zjunlp/PitfallsKnowledgeEditing.
    摘要 随着微调大语言模型(LLM)成本的不断上升,近期研究转向开发编辑 LLM 中隐式知识的方法。然而,仍有一个疑问悬而未决——知识编辑是否会触发蝴蝶效应?因为目前尚不清楚知识编辑是否会引入带来潜在风险的副作用。本文率先研究了 LLM 知识编辑的潜在风险。为此,我们提出了新的基准数据集和创新的评价指标。我们的结果揭示了两个关键问题:(1)知识冲突:编辑逻辑上相互冲突的事实组会放大 LLM 中的内在不一致性——这是以往方法忽略的一面。(2)知识扭曲:为编辑事实知识而修改参数,可能会不可逆地扭曲 LLM 的内生知识结构。实验结果表明,知识编辑可能会给 LLM 带来意想不到的负面影响,值得未来研究的关注和努力。代码将在 https://github.com/zjunlp/PitfallsKnowledgeEditing 上发布。

Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View

  • paper_url: http://arxiv.org/abs/2310.02124
  • repo_url: https://github.com/zjunlp/machinesom
  • paper_authors: Jintian Zhang, Xin Xu, Shumin Deng
  • for: 本研究探讨了现代自然语言处理(NLP)系统在多代理社会中是否能够模仿人类的协同智能。
  • methods: 本研究结合了实验和理论视角,对当今 NLP 系统的协同机制进行了探索。研究者构建了四个独特的“社会”,每个社会由多个大语言模型(LLM)智能体组成,每个智能体具有特定的“特质”(随和或过度自信),并以特定的“思维模式”(辩论或反思)进行协作。
  • results: 研究发现,LLM 智能体在完成任务时会采用多样的社会行为,从积极的辩论到内省式的反思。此外,某些协同策略不仅更高效(使用更少的 API token),还超越了之前的顶尖方法。研究还发现 LLM 智能体表现出类似人类的社会行为,如从众或多数决。
    Abstract As Natural Language Processing (NLP) systems are increasingly employed in intricate social environments, a pressing query emerges: Can these NLP systems mirror human-esque collaborative intelligence, in a multi-agent society consisting of multiple large language models (LLMs)? This paper probes the collaboration mechanisms among contemporary NLP systems by melding practical experiments with theoretical insights. We fabricate four unique `societies' comprised of LLM agents, where each agent is characterized by a specific `trait' (easy-going or overconfident) and engages in collaboration with a distinct `thinking pattern' (debate or reflection). Evaluating these multi-agent societies on three benchmark datasets, we discern that LLM agents navigate tasks by leveraging diverse social behaviors, from active debates to introspective reflections. Notably, certain collaborative strategies only optimize efficiency (using fewer API tokens), but also outshine previous top-tier approaches. Moreover, our results further illustrate that LLM agents manifest human-like social behaviors, such as conformity or majority rule, mirroring foundational Social Psychology theories. In conclusion, we integrate insights from Social Psychology to contextualize the collaboration of LLM agents, inspiring further investigations into the collaboration mechanism for LLMs. We commit to sharing our code and datasets (already submitted in supplementary materials), hoping to catalyze further research in this promising avenue (All code and data are available at \url{https://github.com/zjunlp/MachineSoM}.).
    摘要 随着自然语言处理(NLP)系统在复杂社会环境中得到广泛应用,一个重要问题随之产生:这些 NLP 系统能否在由多个大语言模型(LLM)组成的多智能体社会中,展现出类似人类的协同智能?这篇论文通过实验与理论启示相结合,探索当今 NLP 系统之间的协同机制。我们构建了四个不同的“社会”,每个社会由多个 LLM 智能体组成,每个智能体具有不同的“特质”(随和或过度自信),并采用不同的“思维模式”(辩论或反思)进行协作。我们在三个标准测试集上评估这些多智能体社会,发现 LLM 智能体在完成任务时会采用多种社会行为,从积极的辩论到内省式的反思。尤其是,某些协同策略不仅使用更少的 API token,还超越了之前的顶尖方法。此外,我们的结果还表明 LLM 智能体展现出人类社会行为的特征,如从众或多数决,这与基础社会心理学理论相吻合。在结论中,我们将社会心理学理论与 LLM 协同机制相结合,并希望通过分享我们的代码和数据(已在补充材料中提交),促进这一领域的进一步研究。(所有代码和数据见 https://github.com/zjunlp/MachineSoM。)
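To make the "society" setup concrete, here is a schematic round of collaboration in which agents with a trait and a thinking pattern exchange answers. `call_llm` is a placeholder for a real LLM API call, so the snippet is an illustrative mock rather than the MachineSoM implementation.

```python
from dataclasses import dataclass

def call_llm(prompt: str) -> str:
    # Placeholder: in the real system this would query an LLM via an API.
    return f"[answer conditioned on: {prompt[:60]}...]"

@dataclass
class Agent:
    name: str
    trait: str       # "easy-going" or "overconfident"
    thinking: str    # "debate" or "reflection"

    def respond(self, question: str, peer_answers: list[str]) -> str:
        persona = f"You are a {self.trait} assistant."
        if self.thinking == "debate" and peer_answers:
            context = "Challenge or defend these peer answers: " + " | ".join(peer_answers)
        elif self.thinking == "reflection":
            context = "Reflect on your own previous answer before replying."
        else:
            context = ""
        return call_llm(f"{persona} {context} Question: {question}")

society = [Agent("A", "easy-going", "debate"),
           Agent("B", "overconfident", "reflection"),
           Agent("C", "easy-going", "reflection")]

question = "Is 17077 a prime number?"
answers: list[str] = []
for round_idx in range(2):            # a few collaboration rounds
    answers = [agent.respond(question, answers) for agent in society]
# a majority vote or other aggregation over `answers` would follow here
print(answers)
```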

TWIZ: The Wizard of Multimodal Conversational-Stimulus

  • paper_url: http://arxiv.org/abs/2310.02118
  • repo_url: None
  • paper_authors: Rafael Ferreira, Diogo Tavares, Diogo Silva, Rodrigo Valério, João Bordalo, Inês Simões, Vasco Ramos, David Semedo, João Magalhães
  • for: The paper is written to describe the vision, challenges, and scientific contributions of the Task Wizard team (TWIZ) in the Alexa Prize TaskBot Challenge 2022.
  • methods: The paper focuses on three main research questions: (1) Humanly-Shaped Conversations, (2) Multimodal Stimulus, and (3) Zero-shot Conversational Flows.
  • results: The TWIZ bot is an effective and robust system that can guide users through complex manual tasks while providing several multimodal stimuli. The bot is capable of supporting a wide range of tasks and has several innovative features such as creative cooking and video navigation through voice.
    Abstract In this report, we describe the vision, challenges, and scientific contributions of the Task Wizard team, TWIZ, in the Alexa Prize TaskBot Challenge 2022. Our vision, is to build TWIZ bot as an helpful, multimodal, knowledgeable, and engaging assistant that can guide users towards the successful completion of complex manual tasks. To achieve this, we focus our efforts on three main research questions: (1) Humanly-Shaped Conversations, by providing information in a knowledgeable way; (2) Multimodal Stimulus, making use of various modalities including voice, images, and videos; and (3) Zero-shot Conversational Flows, to improve the robustness of the interaction to unseen scenarios. TWIZ is an assistant capable of supporting a wide range of tasks, with several innovative features such as creative cooking, video navigation through voice, and the robust TWIZ-LLM, a Large Language Model trained for dialoguing about complex manual tasks. Given ratings and feedback provided by users, we observed that TWIZ bot is an effective and robust system, capable of guiding users through tasks while providing several multimodal stimuli.
    摘要 在这份报告中,我们描述了任务魔法团队(TWIZ)在Alexa奖任务机器人挑战2022中的视野、挑战和科学贡献。我们的视野是建立一个有用、多Modal、知识型和有趣的助手,帮助用户完成复杂的手动任务。为了实现这一目标,我们对三个主要研究问题进行了集中努力:1. 人类化对话,通过提供知识型的信息,使用户感觉到和人类交流相似。2. 多Modal 刺激,使用声音、图片和视频等多种Modalities。3. 零shot对话流程,以提高对未看过的情况的响应性。TWIZ是一个能够支持多种任务的助手,具有创新的特点,如创意cooking、通过声音导航视频、robust TWIZ-LLM,一个对对话的大语言模型。根据用户提供的评分和反馈,我们发现TWIZ机器人是一个有效和Robust的系统,能够引导用户完成任务,并提供多种多Modal 刺激。

Towards Effective Human-AI Decision-Making: The Role of Human Learning in Appropriate Reliance on AI Advice

  • paper_url: http://arxiv.org/abs/2310.02108
  • repo_url: None
  • paper_authors: Max Schemmer, Andrea Bartos, Philipp Spitzer, Patrick Hemmer, Niklas Kühl, Jonas Liebschner, Gerhard Satzger
  • for: 本研究旨在探讨人类和人工智能(AI)合作的真正潜力是如何利用人类和AI的各自优势来实现联合性能超越个体AI或人类的性能,即实现补做团队性能(CTP)。
  • methods: 该研究使用实验方法,具体来说是采用100名参与者进行实验,以评估人类对AI建议的适当依赖。
  • results: 研究发现,人类学习是适当依赖AI建议的关键因素,而不仅仅是心理模型。此外,研究还提出了基本概念和设计方法,以便更好地分析依赖和实现人类AI决策的效果。
    Abstract The true potential of human-AI collaboration lies in exploiting the complementary capabilities of humans and AI to achieve a joint performance superior to that of the individual AI or human, i.e., to achieve complementary team performance (CTP). To realize this complementarity potential, humans need to exercise discretion in following AI 's advice, i.e., appropriately relying on the AI's advice. While previous work has focused on building a mental model of the AI to assess AI recommendations, recent research has shown that the mental model alone cannot explain appropriate reliance. We hypothesize that, in addition to the mental model, human learning is a key mediator of appropriate reliance and, thus, CTP. In this study, we demonstrate the relationship between learning and appropriate reliance in an experiment with 100 participants. This work provides fundamental concepts for analyzing reliance and derives implications for the effective design of human-AI decision-making.
    摘要 人类和人工智能(AI)的共同努力的真正潜力在于利用人类和AI的优势相互补做,以实现合作性能超过个体AI或人类的表现,即实现共同团队性能(CTP)。为实现这种共同可能性,人类需要在AI的建议下使用自己的聪明,即有选择地采纳AI的建议。在以前的研究中,人们主要关注建立AI的心理模型来评估AI的建议,但最新的研究表明,心理模型alone不能解释合适的依赖。我们假设,除了心理模型之外,人类学习也是适用依赖的关键因素,因此CTP。在这项实验中,我们证明了学习与合适依赖之间的关系,并 derive了对人类AI决策的设计方法的基本思想。

CoNO: Complex Neural Operator for Continuous Dynamical Systems

  • paper_url: http://arxiv.org/abs/2310.02094
  • repo_url: None
  • paper_authors: Karn Tiwari, N M Anoop Krishnan, Prathosh A P
  • for: CoNO is designed to model continuous dynamical systems, such as weather forecasting, fluid flow, and solid mechanics.
  • methods: CoNO uses a complex neural network with integral kernel parameterization in the complex fractional Fourier domain, along with aliasing-free activation functions to preserve complex values and algebraic properties.
  • results: CoNO exhibits improved representation, robustness to noise, and generalization compared to existing neural operator models, and achieves comparable or superior performance on several tasks including zero-shot super-resolution, evaluation of out-of-distribution data, data efficiency, and robustness to noise.
    Abstract Neural operators extend data-driven models to map between infinite-dimensional functional spaces. These models have successfully solved continuous dynamical systems represented by differential equations, viz weather forecasting, fluid flow, or solid mechanics. However, the existing operators still rely on real space, thereby losing rich representations potentially captured in the complex space by functional transforms. In this paper, we introduce a Complex Neural Operator (CoNO), that parameterizes the integral kernel in the complex fractional Fourier domain. Additionally, the model employing a complex-valued neural network along with aliasing-free activation functions preserves the complex values and complex algebraic properties, thereby enabling improved representation, robustness to noise, and generalization. We show that the model effectively captures the underlying partial differential equation with a single complex fractional Fourier transform. We perform an extensive empirical evaluation of CoNO on several datasets and additional tasks such as zero-shot super-resolution, evaluation of out-of-distribution data, data efficiency, and robustness to noise. CoNO exhibits comparable or superior performance to all the state-of-the-art models in these tasks. Altogether, CoNO presents a robust and superior model for modeling continuous dynamical systems, providing a fillip to scientific machine learning.
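The core idea of parameterizing an integral kernel with complex weights in a Fourier-type domain can be sketched as follows. Note that this uses the ordinary FFT as a stand-in for the complex fractional Fourier transform used in the paper, and the shapes and mode truncation are illustrative assumptions.

```python
import numpy as np

def spectral_layer(u, weights, n_modes):
    """Apply a learned complex multiplier to the lowest Fourier modes of a 1-D signal."""
    u_hat = np.fft.rfft(u)                           # complex spectrum of the input function
    out_hat = np.zeros_like(u_hat)
    out_hat[:n_modes] = weights * u_hat[:n_modes]    # complex-valued kernel parameterization
    return np.fft.irfft(out_hat, n=len(u))

rng = np.random.default_rng(0)
n_grid, n_modes = 128, 16
# complex weights play the role of the learned integral kernel in the spectral domain
weights = rng.standard_normal(n_modes) + 1j * rng.standard_normal(n_modes)

x = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
u = np.sin(3 * x) + 0.1 * rng.standard_normal(n_grid)   # toy sampled input function
v = spectral_layer(u, weights, n_modes)
print(v.shape)   # (128,) -- one "operator layer" applied to the sampled function
```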

  • paper_url: http://arxiv.org/abs/2310.05976
  • repo_url: None
  • paper_authors: Reiji Suzuki, Takaya Arita
  • for: 本研究旨在探讨多样性和社会层次上的EVOLUTIONARY dynamics, 通过将生成模型引入社会代理模型中的特质表达中,提高了模型的表达力。
  • methods: 我们将语言描述的、与合作行为相关的人格特质作为基因,并把从大语言模型(LLM)中提取的、基于这些人格特质做出行为决策的确定性策略作为行为特质;种群依据平均收益进行选择,并通过让 LLM 将父代基因向合作或自私方向略作修改来实现基因变异。
  • results: 我们的初步实验和分析表明,这种模型确实可以基于多样且更高阶的人格特质表示演化出合作行为。我们还观察到合作型与自私型人格特质随着特质表达的变化而反复交替出现,并发现演化出的基因中出现的词汇在语义上很好地反映了其人格的行为倾向。
    Abstract This paper aims to shed light on the evolutionary dynamics of diverse and social populations by introducing the rich expressiveness of generative models into the trait expression of social agent-based evolutionary models. Specifically, we focus on the evolution of personality traits in the context of a game-theoretic relationship as a situation in which inter-individual interests exert strong selection pressures. We construct an agent model in which linguistic descriptions of personality traits related to cooperative behavior are used as genes. The deterministic strategies extracted from Large Language Model (LLM) that make behavioral decisions based on these personality traits are used as behavioral traits. The population is evolved according to selection based on average payoff and mutation of genes by asking LLM to slightly modify the parent gene toward cooperative or selfish. Through preliminary experiments and analyses, we clarify that such a model can indeed exhibit the evolution of cooperative behavior based on the diverse and higher-order representation of personality traits. We also observed the repeated intrusion of cooperative and selfish personality traits through changes in the expression of personality traits, and found that the emerging words in the evolved gene well reflected the behavioral tendency of its personality in terms of their semantics.

Point Neighborhood Embeddings

  • paper_url: http://arxiv.org/abs/2310.02083
  • repo_url: https://github.com/ANAGHA93/t-SNE
  • paper_authors: Pedro Hermosilla
  • for: 本研究旨在分析点云中的邻域信息编码方法,以提高未来 neural network 架构的设计。
  • methods: 研究使用不同的邻域信息编码方法,包括多层感知机(MLP)、ReLU活化函数和简单的卷积。
  • results: 研究发现,使用 MLP 编码器的邻域信息编码方法实际上表现最差,甚至在某些任务上被简单的点坐标线性组合超越。此外,依据这些建议实现的神经网络架构可以在多个任务上达到最先进(state-of-the-art)的结果,超越最近的更复杂的操作。
    Abstract Point convolution operations rely on different embedding mechanisms to encode the neighborhood information of each point in order to detect patterns in 3D space. However, as convolutions are usually evaluated as a whole, not much work has been done to investigate which is the ideal mechanism to encode such neighborhood information. In this paper, we provide the first extensive study that analyzes such Point Neighborhood Embeddings (PNE) alone in a controlled experimental setup. From our experiments, we derive a set of recommendations for PNE that can help to improve future designs of neural network architectures for point clouds. Our most surprising finding shows that the most commonly used embedding based on a Multi-layer Perceptron (MLP) with ReLU activation functions provides the lowest performance among all embeddings, even being surpassed on some tasks by a simple linear combination of the point coordinates. Additionally, we show that a neural network architecture using simple convolutions based on such embeddings is able to achieve state-of-the-art results on several tasks, outperforming recent and more complex operations. Lastly, we show that these findings extrapolate to other more complex convolution operations, where we show how following our recommendations we are able to improve recent state-of-the-art architectures.
    摘要 <>将点conv操作转换为标准的中文简体字符串。>点 convolution 操作需要不同的嵌入机制来编码每个点的邻居信息,以探测3D空间中的模式。然而,通常情况下, convolution 被评估为整体,因此很少人对点邻居编码(Point Neighborhood Embeddings,PNE)进行了系统的研究。在这篇论文中,我们提供了首次对 PNE 进行了系统的研究,并在控制的实验室中进行了广泛的测试。从我们的实验结果中,我们提出了一些关于 PNE 的建议,这些建议可以帮助未来的神经网络架构设计人员为点云进行优化。我们最大化的发现是,通常使用的多层感知器(MLP)与 ReLU 活化函数基于的嵌入方法的性能最差,甚至在一些任务上被一个简单的点坐标的线性组合所超越。此外,我们显示了一种使用这些嵌入的神经网络架构可以在多个任务上达到状态的最佳结果,超越了最近的和更复杂的操作。最后,我们表明这些发现可以推广到其他更复杂的 convolution 操作,我们在这些操作中采用了我们的建议,并成功地提高了最近的状态级别的架构。

Towards End-to-End Embodied Decision Making via Multi-modal Large Language Model: Explorations with GPT4-Vision and Beyond

  • paper_url: http://arxiv.org/abs/2310.02071
  • repo_url: https://github.com/pkunlp-icler/pca-eval
  • paper_authors: Liang Chen, Yichi Zhang, Shuhuai Ren, Haozhe Zhao, Zefan Cai, Yuchi Wang, Peiyi Wang, Tianyu Liu, Baobao Chang
  • for: 本研究探索了多模态大语言模型(MLLMs)在改善体现决策过程中的潜力。
  • methods: 本研究使用了 state-of-the-art MLLMs like GPT4-Vision,以及 HOLMES 框架,让 LLMs 可以通过多模态信息来做出更加有知见的决策。
  • results: 研究发现,使用 GPT4-Vision 模型可以实现更高的体现决策能力,相比 GPT4-HOLMES 模型。此外,GPT4-Vision 模型还可以在 PCA-EVAL benchmark 上表现出色,相比 open-source state-of-the-art MLLM。
    Abstract In this study, we explore the potential of Multimodal Large Language Models (MLLMs) in improving embodied decision-making processes for agents. While Large Language Models (LLMs) have been widely used due to their advanced reasoning skills and vast world knowledge, MLLMs like GPT4-Vision offer enhanced visual understanding and reasoning capabilities. We investigate whether state-of-the-art MLLMs can handle embodied decision-making in an end-to-end manner and whether collaborations between LLMs and MLLMs can enhance decision-making. To address these questions, we introduce a new benchmark called PCA-EVAL, which evaluates embodied decision-making from the perspectives of Perception, Cognition, and Action. Additionally, we propose HOLMES, a multi-agent cooperation framework that allows LLMs to leverage MLLMs and APIs to gather multimodal information for informed decision-making. We compare end-to-end embodied decision-making and HOLMES on our benchmark and find that the GPT4-Vision model demonstrates strong end-to-end embodied decision-making abilities, outperforming GPT4-HOLMES in terms of average decision accuracy (+3%). However, this performance is exclusive to the latest GPT4-Vision model, surpassing the open-source state-of-the-art MLLM by 26%. Our results indicate that powerful MLLMs like GPT4-Vision hold promise for decision-making in embodied agents, offering new avenues for MLLM research. Code and data are open at https://github.com/pkunlp-icler/PCA-EVAL/.
    摘要 在这项研究中,我们探索了多模态大语言模型(MLLMs)在改进具身决策过程中的潜力。大语言模型(LLMs)因其出色的推理能力和丰富的世界知识而被广泛使用,而像 GPT4-Vision 这样的 MLLMs 则提供了更强的视觉理解与推理能力。我们研究了最先进的 MLLMs 能否以端到端方式完成具身决策,以及 LLMs 与 MLLMs 之间的协作能否进一步提升决策能力。为此,我们提出了一个新的基准 PCA-EVAL,从感知(Perception)、认知(Cognition)和行动(Action)三个角度评估具身决策。此外,我们还提出了一个多智能体协作框架 HOLMES,让 LLMs 能够借助 MLLMs 和 API 收集多模态信息,从而做出更有依据的决策。我们在该基准上比较了端到端具身决策与 HOLMES,发现 GPT4-Vision 模型展现出很强的端到端具身决策能力,其平均决策准确率比 GPT4-HOLMES 高出 3%。但这种性能仅限于最新的 GPT4-Vision 模型,它比开源最先进的 MLLM 高出 26%。我们的结果表明,像 GPT4-Vision 这样强大的 MLLMs 有望用于具身智能体的决策,为 MLLM 研究开辟了新的方向。代码和数据见 https://github.com/pkunlp-icler/PCA-EVAL/。

Content Bias in Deep Learning Age Approximation: A new Approach Towards more Explainability

  • paper_url: http://arxiv.org/abs/2310.02067
  • repo_url: None
  • paper_authors: Robert Jöchl, Andreas Uhl
  • for: 这个论文主要用于探讨图像时间伪造检测中,用 neural network 学习图像年龄特征的问题。
  • methods: 该论文提出了一种新的方法来评估图像内容对于年龄分类中的影响。该方法使用synthetic图像(可以排除内容偏好),并在这些图像中嵌入年龄信号。然后,通过训练标准的 neural network 来评估内容对于年龄分类的影响。
  • results: 研究发现,使用标准的 neural network 在年龄分类任务中,具有强度依赖于图像内容的特征。为了 Mitigate 这种影响,研究人员提出了两种不同的技术,并通过论文中的方法进行评估。
    Abstract In the context of temporal image forensics, it is not evident that a neural network, trained on images from different time-slots (classes), exploit solely age related features. Usually, images taken in close temporal proximity (e.g., belonging to the same age class) share some common content properties. Such content bias can be exploited by a neural network. In this work, a novel approach that evaluates the influence of image content is proposed. This approach is verified using synthetic images (where content bias can be ruled out) with an age signal embedded. Based on the proposed approach, it is shown that a `standard' neural network trained in the context of age classification is strongly dependent on image content. As a potential countermeasure, two different techniques are applied to mitigate the influence of the image content during training, and they are also evaluated by the proposed method.
    摘要 在时间域图像取证中,并不能确定一个在不同时间段(类别)图像上训练的神经网络只利用了与年龄相关的特征。通常,在时间上相近拍摄的图像(例如属于同一年龄类别的图像)会共享一些共同的内容特征,而这种内容偏差可能会被神经网络利用。在这项工作中,我们提出了一种评估图像内容影响的新方法。该方法使用嵌入了年龄信号的合成图像(可以排除内容偏差)进行验证。基于所提出的方法,我们表明在年龄分类任务中训练的“标准”神经网络强烈依赖于图像内容。作为潜在的对策,我们应用了两种不同的技术来减轻训练过程中图像内容的影响,并同样用所提出的方法对它们进行了评估。
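The evaluation idea — synthetic images whose content is pure noise, so that only an embedded "age signal" distinguishes the classes — can be sketched as follows; the signal pattern and its strength are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_synthetic_image(age_class: int, size: int = 64, strength: float = 0.05):
    """Random content plus a faint class-dependent pattern standing in for an age signal."""
    content = rng.random((size, size))                        # content carries no age information
    yy, xx = np.mgrid[0:size, 0:size]
    signal = np.sin(2 * np.pi * (age_class + 1) * xx / size)  # frequency encodes the age class
    return np.clip(content + strength * signal, 0.0, 1.0)

# a small dataset: any classifier accuracy above chance must come from the embedded signal,
# because the image content itself is independent of the class
images = [make_synthetic_image(c) for c in range(3) for _ in range(10)]
labels = [c for c in range(3) for _ in range(10)]
print(len(images), images[0].shape)
```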

De Novo Drug Design with Joint Transformers

  • paper_url: http://arxiv.org/abs/2310.02066
  • repo_url: None
  • paper_authors: Adam Izdebski, Ewelina Weglarz-Tomczak, Ewa Szczurek, Jakub M. Tomczak
  • for: 本研究旨在提出一种能同时生成训练数据之外的新分子并预测其目标性质的新型生成模型,以解决从头药物设计(de novo drug design)中的难题。
  • methods: 我们提出了一种将 Transformer decoder、Transformer encoder 和预测器以共享权重方式组合的联合生成模型,并使用带惩罚的对数似然(penalized log-likelihood)目标对其进行训练。
  • results: 我们的方法在分子生成上达到了最先进的性能,同时将新采样分子上的预测误差相比微调后的 decoder-only Transformer 降低了 42%。此外,我们还提出了一种基于 Joint Transformer 的概率黑盒优化算法,能够生成目标性质优于训练数据的新分子,在从头药物设计中优于其他基于 SMILES 的优化方法。
    Abstract De novo drug design requires simultaneously generating novel molecules outside of training data and predicting their target properties, making it a hard task for generative models. To address this, we propose Joint Transformer that combines a Transformer decoder, a Transformer encoder, and a predictor in a joint generative model with shared weights. We show that training the model with a penalized log-likelihood objective results in state-of-the-art performance in molecule generation, while decreasing the prediction error on newly sampled molecules, as compared to a fine-tuned decoder-only Transformer, by 42%. Finally, we propose a probabilistic black-box optimization algorithm that employs Joint Transformer to generate novel molecules with improved target properties, as compared to the training data, outperforming other SMILES-based optimization methods in de novo drug design.
    摘要 从头药物设计(de novo drug design)需要同时生成训练数据之外的新分子并预测其目标性质,这对生成模型而言是一项困难的任务。为此,我们提出了联合变换器(Joint Transformer),它将 Transformer 解码器、Transformer 编码器和预测器组合成一个共享权重的联合生成模型。我们表明,使用带惩罚的对数似然目标训练该模型可以在分子生成上达到最先进的性能,同时将新采样分子上的预测误差相比微调后的 decoder-only Transformer 降低 42%。最后,我们提出了一种基于 Joint Transformer 的概率黑盒优化算法,能够生成目标性质优于训练数据的新分子,在从头药物设计中优于其他基于 SMILES 的优化方法。

Relaxed Octahedral Group Convolution for Learning Symmetry Breaking in 3D Physical Systems

  • paper_url: http://arxiv.org/abs/2310.02299
  • repo_url: None
  • paper_authors: Rui Wang, Robin Walters, Tess E. Smidt
  • for: 这篇论文旨在提高采样效率和泛化性,通过使用对称性来改进深度模型。
  • methods: 这篇论文提出了一种弹性八面体卷积,可以保持数据中的最高水平的对称性,同时发现物理系统中的微妙对称性破坏因素。
  • results: 实验结果表明,这种方法可以不仅提供物理系统中对称性破坏因素的理解,还可以在流体超分解任务中实现优秀的性能。
    Abstract Deep equivariant models use symmetries to improve sample efficiency and generalization. However, the assumption of perfect symmetry in many of these models can sometimes be restrictive, especially when the data does not perfectly align with such symmetries. Thus, we introduce relaxed octahedral group convolution for modeling 3D physical systems in this paper. This flexible convolution technique provably allows the model to both maintain the highest level of equivariance that is consistent with data and discover the subtle symmetry-breaking factors in the physical systems. Empirical results validate that our approach can not only provide insights into the symmetry-breaking factors in phase transitions but also achieves superior performance in fluid super-resolution tasks.
    摘要 深度对称模型使用对称性来提高样本效率和泛化性。然而,在许多情况下,这些模型假设的完美对称性可能是限制性的,特别是当数据不完全与这些对称性相对应。因此,我们在这篇论文中引入了放宽的八面体群 convolution来模型三维物理系统。这种灵活的 convolution 技术可以证明地保持数据中的最高水平对称性,同时发现物理系统中的微妙对称性破坏因素。实验结果验证了我们的方法不仅可以提供物理系统中对称性破坏因素的新的视角,还可以在液体超解像任务中实现更高的性能。
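A relaxed (symmetry-breaking-aware) group convolution can be illustrated on the small cyclic rotation group C4 instead of the full 3D octahedral group used in the paper: each group element gets its own learnable mixing weight, so equal weights recover an equivariant layer while unequal weights let the model break the symmetry. The shapes and the choice of C4 are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import convolve

rng = np.random.default_rng(0)

def relaxed_c4_conv(image, kernel, relax_weights):
    """Sum of convolutions with the 4 rotated copies of `kernel`, one weight per rotation."""
    out = np.zeros_like(image, dtype=float)
    for g in range(4):                                  # the four 90-degree rotations of C4
        rotated = np.rot90(kernel, k=g)
        out += relax_weights[g] * convolve(image, rotated, mode="wrap")
    return out

image = rng.random((32, 32))
kernel = rng.standard_normal((3, 3))

equivariant = relaxed_c4_conv(image, kernel, relax_weights=np.ones(4) / 4)   # fully symmetric case
relaxed = relaxed_c4_conv(image, kernel, relax_weights=np.array([0.7, 0.1, 0.1, 0.1]))
print(equivariant.shape, relaxed.shape)
```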

AlignDiff: Aligning Diverse Human Preferences via Behavior-Customisable Diffusion Model

  • paper_url: http://arxiv.org/abs/2310.02054
  • repo_url: None
  • paper_authors: Zibin Dong, Yifu Yuan, Jianye Hao, Fei Ni, Yao Mu, Yan Zheng, Yujing Hu, Tangjie Lv, Changjie Fan, Zhipeng Hu
  • for: 本文提出了一种新的框架 AlignDiff,用于在强化学习中将智能体行为与多样化的人类偏好对齐,应对偏好的抽象性与可变性。
  • methods: 本文使用基于人类反馈的强化学习(RLHF)来量化人类偏好,并利用这些量化结果引导扩散规划,以实现零样本的行为定制。
  • results: 本文在多种运动(locomotion)任务上证明了 AlignDiff 在偏好匹配(matching)、切换(switching)和覆盖(covering)方面的优越性。此外,它还能在人类指令下完成未见过的下游任务。
    Abstract Aligning agent behaviors with diverse human preferences remains a challenging problem in reinforcement learning (RL), owing to the inherent abstractness and mutability of human preferences. To address these issues, we propose AlignDiff, a novel framework that leverages RL from Human Feedback (RLHF) to quantify human preferences, covering abstractness, and utilizes them to guide diffusion planning for zero-shot behavior customizing, covering mutability. AlignDiff can accurately match user-customized behaviors and efficiently switch from one to another. To build the framework, we first establish the multi-perspective human feedback datasets, which contain comparisons for the attributes of diverse behaviors, and then train an attribute strength model to predict quantified relative strengths. After relabeling behavioral datasets with relative strengths, we proceed to train an attribute-conditioned diffusion model, which serves as a planner with the attribute strength model as a director for preference aligning at the inference phase. We evaluate AlignDiff on various locomotion tasks and demonstrate its superior performance on preference matching, switching, and covering compared to other baselines. Its capability of completing unseen downstream tasks under human instructions also showcases the promising potential for human-AI collaboration. More visualization videos are released on https://aligndiff.github.io/.
    摘要 将智能体行为与多样化的人类偏好对齐,仍然是强化学习(RL)中的一个挑战,因为人类偏好天然具有抽象性和多变性。为解决这些问题,我们提出了 AlignDiff 框架,它利用基于人类反馈的强化学习(RLHF)来量化人类偏好(应对抽象性),并使用这些量化结果引导扩散规划,实现零样本的行为定制(应对多变性)。AlignDiff 可以准确匹配用户自定义的行为,并在不同行为之间高效切换。为构建该框架,我们首先建立了多视角的人类反馈数据集,其中包含对多种行为属性的比较,然后训练一个属性强度模型来预测量化的相对强度。在用相对强度重新标注行为数据集之后,我们训练一个以属性为条件的扩散模型,它在推理阶段作为规划器,由属性强度模型充当导演来完成偏好对齐。我们在多种运动任务上评估了 AlignDiff,结果表明它在偏好匹配、切换和覆盖方面显著优于其他基线方法。它还能在人类指令下完成未见过的下游任务,展示了人机协作的良好潜力。更多可视化视频见 https://aligndiff.github.io/。

Jury: A Comprehensive Evaluation Toolkit

  • paper_url: http://arxiv.org/abs/2310.02040
  • repo_url: https://github.com/obss/jury
  • paper_authors: Devrim Cavusoglu, Ulas Sert, Secil Sen, Sinan Altinuc
  • for: 本研究旨在标准化和改进深度学习系统的评估方法,以便在不同任务和度量之间进行评估。
  • methods: 本研究使用了一个名为“jury”的工具包,提供了一个统一的评估框架,可以在不同任务和度量之间进行评估。
  • results: 在发布于GitHub的开源版本中,“jury”已经获得了广泛的关注和使用,并且可以帮助学术社区解决评估挑战。
    Abstract Evaluation plays a critical role in deep learning as a fundamental block of any prediction-based system. However, the vast number of Natural Language Processing (NLP) tasks and the development of various metrics have led to challenges in evaluating different systems with different metrics. To address these challenges, we introduce jury, a toolkit that provides a unified evaluation framework with standardized structures for performing evaluation across different tasks and metrics. The objective of jury is to standardize and improve metric evaluation for all systems and aid the community in overcoming the challenges in evaluation. Since its open-source release, jury has reached a wide audience and is available at https://github.com/obss/jury.
    摘要 评估在深度学习中扮演了关键的角色,是任何预测基本系统的基本块。然而,由于各种自然语言处理(NLP)任务的庞大数量和不同的评价指标的发展,评估不同系统的评价带来了挑战。为解决这些挑战,我们引入了一个名为“评审团”(jury)的工具包,它提供了一个统一的评估框架,可以在不同任务和指标之间进行标准化的评估。jury的目标是标准化和改进所有系统的评估,以帮助社区超越评估的挑战。自其开源发布以来,jury已经达到了广泛的用户群和可以在https://github.com/obss/jury上下载。

An evaluation of pre-trained models for feature extraction in image classification

  • paper_url: http://arxiv.org/abs/2310.02037
  • repo_url: https://github.com/Jawad-Dar/Jaya-Honey-Badger-Optimization-based-Deep-Neuro-Fuzzy-Network-structure-for-detection-of-Covid-19-
  • paper_authors: Erick da Silva Puls, Matheus V. Todescato, Joel L. Carbonera
  • for: 这个研究的目的是比较不同预训网络模型在图像分类任务中的表现。
  • methods: 这个研究使用了16个预训网络模型,并在四个图像dataset上进行评估。
  • results: 我们的结果显示,CLIP-ViT-B和ViT-H-14模型在所有dataset上均有最好的总表现,而CLIP-ResNet50模型则有相似的表现,但较少的波动。这显示了这些模型在图像分类任务中的表现。
    Abstract In recent years, we have witnessed a considerable increase in performance in image classification tasks. This performance improvement is mainly due to the adoption of deep learning techniques. Generally, deep learning techniques demand a large set of annotated data, making it a challenge when applying it to small datasets. In this scenario, transfer learning strategies have become a promising alternative to overcome these issues. This work aims to compare the performance of different pre-trained neural networks for feature extraction in image classification tasks. We evaluated 16 different pre-trained models in four image datasets. Our results demonstrate that the best general performance along the datasets was achieved by CLIP-ViT-B and ViT-H-14, where the CLIP-ResNet50 model had similar performance but with less variability. Therefore, our study provides evidence supporting the choice of models for feature extraction in image classification tasks.
    摘要 近年来,我们所目睹到的图像分类任务中表现的提升非常显著。这种表现提升主要归功于深度学习技术的推广。深度学习技术通常需要大量的标注数据,因此对于小 datasets 来说是一大挑战。在这种情况下,转移学习策略成为了一个有前途的解决方案。本研究的目的是比较不同预训练神经网络的特征提取性能在图像分类任务中。我们在四个图像 datasets 中评估了16个预训练模型。我们的结果显示,CLIP-ViT-B 和 ViT-H-14 模型在所有 datasets 中表现最佳,而 CLIP-ResNet50 模型具有类似表现,但变化较少。因此,本研究提供了支持预训练模型选择的证据,以便在图像分类任务中进行特征提取。

OceanGPT: A Large Language Model for Ocean Science Tasks

  • paper_url: http://arxiv.org/abs/2310.02031
  • repo_url: https://github.com/zjunlp/knowlm
  • paper_authors: Zhen Bi, Ningyu Zhang, Yida Xue, Yixin Ou, Daxiong Ji, Guozhou Zheng, Huajun Chen
  • for: The paper aims to explore the potential of Large Language Models (LLMs) for ocean science tasks, and to address the limitations of current LLMs in catering to the needs of domain experts like oceanographers.
  • methods: The authors propose a novel framework called DoInstruct to automatically obtain a large volume of ocean domain instruction data, and construct the first oceanography benchmark called OceanBench to evaluate the capabilities of LLMs in the ocean domain.
  • results: The authors introduce OceanGPT, the first-ever LLM in the ocean domain, which shows a higher level of knowledge expertise for ocean science tasks and gains preliminary embodied intelligence capabilities in ocean technology through comprehensive experiments.
    Abstract Ocean science, which delves into the oceans that are reservoirs of life and biodiversity, is of great significance given that oceans cover over 70% of our planet's surface. Recently, advances in Large Language Models (LLMs) have transformed the paradigm in science. Despite the success in other domains, current LLMs often fall short in catering to the needs of domain experts like oceanographers, and the potential of LLMs for ocean science is under-explored. The intrinsic reason may be the immense and intricate nature of ocean data as well as the necessity for higher granularity and richness in knowledge. To alleviate these issues, we introduce OceanGPT, the first-ever LLM in the ocean domain, which is expert in various ocean science tasks. We propose DoInstruct, a novel framework to automatically obtain a large volume of ocean domain instruction data, which generates instructions based on multi-agent collaboration. Additionally, we construct the first oceanography benchmark, OceanBench, to evaluate the capabilities of LLMs in the ocean domain. Though comprehensive experiments, OceanGPT not only shows a higher level of knowledge expertise for oceans science tasks but also gains preliminary embodied intelligence capabilities in ocean technology. Codes, data and checkpoints will soon be available at https://github.com/zjunlp/KnowLM.
    摘要 海洋覆盖了地球表面 70% 以上,海洋科学深入研究这些孕育生命与生物多样性的水体,意义重大。近年来,大语言模型(LLMs)的进展改变了科学研究的范式。尽管在其他领域取得了成功,现有的 LLMs 往往难以满足海洋学家等领域专家的需求,其在海洋科学中的潜力也尚未被充分挖掘。内在原因可能在于海洋数据的庞大与复杂,以及对知识粒度和丰富度的更高要求。为缓解这些问题,我们提出了 OceanGPT——首个海洋领域的大语言模型,擅长多种海洋科学任务。我们还提出了 DoInstruct,一种基于多智能体协作自动获取大量海洋领域指令数据的新框架。此外,我们构建了首个海洋学基准 OceanBench,用于评估 LLMs 在海洋领域的能力。通过全面的实验,OceanGPT 不仅在海洋科学任务上展现出更高的知识专业性,还在海洋技术方面获得了初步的具身智能能力。代码、数据和检查点即将在 https://github.com/zjunlp/KnowLM 发布。

Prompting Audios Using Acoustic Properties For Emotion Representation

  • paper_url: http://arxiv.org/abs/2310.02298
  • repo_url: None
  • paper_authors: Hira Dhamyal, Benjamin Elizalde, Soham Deshmukh, Huaming Wang, Bhiksha Raj, Rita Singh
  • for: 用于改进情感表示和识别
  • methods: 使用自然语言描述(提示)和对比学习目标,基于与情感相关的声学属性(音高、强度、语速、发音速率)自动生成“声学提示”,并将语音与其对应的提示进行匹配
  • results: 在情感音频检索(EAR)和语音情感识别(SER)任务上,声学提示显著提高了模型性能:EAR 任务的多个 Precision@K 指标均有提升,在 RAVDESS 数据集上 SER 的相对准确率提高了 3.8%
    Abstract Emotions lie on a continuum, but current models treat emotions as a finite valued discrete variable. This representation does not capture the diversity in the expression of emotion. To better represent emotions we propose the use of natural language descriptions (or prompts). In this work, we address the challenge of automatically generating these prompts and training a model to better learn emotion representations from audio and prompt pairs. We use acoustic properties that are correlated to emotion like pitch, intensity, speech rate, and articulation rate to automatically generate prompts i.e. 'acoustic prompts'. We use a contrastive learning objective to map speech to their respective acoustic prompts. We evaluate our model on Emotion Audio Retrieval and Speech Emotion Recognition. Our results show that the acoustic prompts significantly improve the model's performance in EAR, in various Precision@K metrics. In SER, we observe a 3.8% relative accuracy improvement on the Ravdess dataset.
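The "acoustic prompt" idea — turning correlates of emotion such as pitch, intensity and speech rate into a natural-language description that a contrastive objective can pair with the audio — might look roughly like this; the thresholds and the wording of the prompts are hypothetical, not the paper's.

```python
def acoustic_prompt(pitch_hz: float, intensity_db: float, speech_rate_sps: float) -> str:
    """Map raw acoustic measurements to a textual prompt describing how the speaker sounds."""
    pitch = "high pitched" if pitch_hz > 200 else "low pitched"
    loudness = "loud" if intensity_db > 65 else "soft"
    rate = "fast" if speech_rate_sps > 4 else "slow"
    return f"this person is talking in a {pitch}, {loudness} voice at a {rate} pace"

# prompts like these are paired with their utterances and pulled together in embedding space
# by a contrastive loss (CLIP/CLAP-style), while mismatched pairs are pushed apart
print(acoustic_prompt(pitch_hz=240.0, intensity_db=70.0, speech_rate_sps=5.2))
print(acoustic_prompt(pitch_hz=150.0, intensity_db=55.0, speech_rate_sps=2.8))
```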

Towards Feasible Counterfactual Explanations: A Taxonomy Guided Template-based NLG Method

  • paper_url: http://arxiv.org/abs/2310.02019
  • repo_url: https://github.com/pedramsalimi/nlgxai
  • paper_authors: Pedram Salimi, Nirmalie Wiratunga, David Corsar, Anjana Wijekoon
  • for: 本研究的目的是提出一种新的自然语言Counterfactual Explanation(Natural-XAI)方法,以便更好地解释模型决策过程中的必要变量更改。
  • methods: 本研究通过一项用户研究,归纳出人类撰写的反事实解释中的两类主题:内容相关的,关注如何从反事实和查询两个角度纳入特征及其取值;结构相关的,关注描述必要取值更改时所用的结构和术语。
  • results: 本研究提出了一个包含四个类别的特征可行性分类法(taxonomy),用于简化反事实解释的呈现过程。基于该分类法和用户研究的洞见,我们设计了可泛化的基于模板的自然语言生成(NLG)方法,它与现有解释器(如 DICE、NICE 和 DisCERN)兼容,能提高反事实解释的可读性与可行性。
    Abstract Counterfactual Explanations (cf-XAI) describe the smallest changes in feature values necessary to change an outcome from one class to another. However, many cf-XAI methods neglect the feasibility of those changes. In this paper, we introduce a novel approach for presenting cf-XAI in natural language (Natural-XAI), giving careful consideration to actionable and comprehensible aspects while remaining cognizant of immutability and ethical concerns. We present three contributions to this endeavor. Firstly, through a user study, we identify two types of themes present in cf-XAI composed by humans: content-related, focusing on how features and their values are included from both the counterfactual and the query perspectives; and structure-related, focusing on the structure and terminology used for describing necessary value changes. Secondly, we introduce a feature actionability taxonomy with four clearly defined categories, to streamline the explanation presentation process. Using insights from the user study and our taxonomy, we created a generalisable template-based natural language generation (NLG) method compatible with existing explainers like DICE, NICE, and DisCERN, to produce counterfactuals that address the aforementioned limitations of existing approaches. Finally, we conducted a second user study to assess the performance of our taxonomy-guided NLG templates on three domains. Our findings show that the taxonomy-guided Natural-XAI approach (n-XAI^T) received higher user ratings across all dimensions, with significantly improved results in the majority of the domains assessed for articulation, acceptability, feasibility, and sensitivity dimensions.
    摘要 counterfactual 解释 (cf-XAI) 描述最小改变Feature值能够改变结果从一个类别转移到另一个类别。然而,许多 cf-XAI 方法忽略了这些改变的可行性。在这篇论文中,我们介绍了一种新的方法,用于在自然语言 (Natural-XAI) 中提供 counterfactual 解释,同时考虑到可行性和可理解性的考虑因素,并保持决策和伦理问题的注意。我们在这篇论文中提供了三项贡献。首先,通过用户研究,我们发现了 counterfactual 解释中 humans 所创作的两种主题:内容相关,关注从 counterfactual 和查询角度来看 feature 和其值的包含方式;和结构相关,关注 counterfactual 解释中 feature 值改变所需的结构和术语使用方式。其次,我们引入了一个功能可行分类,用于总结 counterfactual 解释中 feature 值改变的可行性。使用用户研究和我们的分类,我们创建了一种可与现有的解释器 like DICE、NICE 和 DisCERN 兼容的通用 template-based自然语言生成 (NLG) 方法,以生成可以Addressing the limitations of existing approaches的 counterfactuals。最后,我们进行了第二次用户研究,以评估我们的分类导向 NLG 模板在三个领域的表现。我们的发现显示,与我们的分类导向 NLG 模板相比,传统的 counterfactual 解释方法在大多数领域都表现较差,特别是在某些领域的可行性、可理解性、可行性和敏感度方面表现较差。
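A template-based realisation of a counterfactual, guided by a feature-actionability taxonomy, could be sketched as below; the four category names and the templates are assumptions for illustration and are compatible in spirit with counterfactuals produced by explainers such as DiCE.

```python
# hypothetical actionability taxonomy: each feature falls into one category
TAXONOMY = {
    "income": "actionable",
    "education": "actionable-with-effort",
    "age": "non-actionable",
    "credit_history_length": "immutable",
}

TEMPLATES = {
    "actionable": "increase your {f} from {old} to {new}",
    "actionable-with-effort": "with some effort, raise your {f} from {old} to {new}",
    "non-actionable": "note that {f} would need to be {new} (it cannot be acted upon directly)",
    "immutable": "{f} cannot be changed, so this suggestion is omitted",
}

def verbalise(query: dict, counterfactual: dict) -> str:
    parts = []
    for f, new in counterfactual.items():
        old = query[f]
        if old == new:
            continue
        category = TAXONOMY.get(f, "actionable")
        parts.append(TEMPLATES[category].format(f=f.replace("_", " "), old=old, new=new))
    return "To change the outcome: " + "; ".join(parts) + "."

query = {"income": 30000, "education": 12, "age": 45, "credit_history_length": 5}
cf = {"income": 42000, "education": 14, "age": 45, "credit_history_length": 5}
print(verbalise(query, cf))
```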

Towards Training Without Depth Limits: Batch Normalization Without Gradient Explosion

  • paper_url: http://arxiv.org/abs/2310.02012
  • repo_url: https://github.com/alexandrumeterez/bngrad
  • paper_authors: Alexandru Meterez, Amir Joudaki, Francesco Orabona, Alexander Immer, Gunnar Rätsch, Hadi Daneshmand
  • for: 这个论文的目的是提出一种拥有优化信号传递特性,但避免深度梯度爆炸的多层感知网络。
  • methods: 这个论文使用了批量Normalization层,并采用了Weingarten calculus来建立一种非对易的理论模型,以确定批量Normalization层在深度学习中的表现。
  • results: 论文的研究结果表明,通过特定的MLP结构和批量Normalization层的组合,可以实现保持优化信号传递特性,同时避免深度梯度爆炸的目标。此外,论文还提出了一种活动填充方案,可以在非线性激活函数下实现相似的性能。
    Abstract Normalization layers are one of the key building blocks for deep neural networks. Several theoretical studies have shown that batch normalization improves the signal propagation, by avoiding the representations from becoming collinear across the layers. However, results on mean-field theory of batch normalization also conclude that this benefit comes at the expense of exploding gradients in depth. Motivated by these two aspects of batch normalization, in this study we pose the following question: "Can a batch-normalized network keep the optimal signal propagation properties, but avoid exploding gradients?" We answer this question in the affirmative by giving a particular construction of an Multi-Layer Perceptron (MLP) with linear activations and batch-normalization that provably has bounded gradients at any depth. Based on Weingarten calculus, we develop a rigorous and non-asymptotic theory for this constructed MLP that gives a precise characterization of forward signal propagation, while proving that gradients remain bounded for linearly independent input samples, which holds in most practical settings. Inspired by our theory, we also design an activation shaping scheme that empirically achieves the same properties for certain non-linear activations.
    摘要 归并层是深度神经网络的关键组件之一。许多理论研究表明,批处Normalization可以改善信号传递,避免层之间的表示变得相互平行。然而,基于mean-field theory的研究也表明,这些优点是随着深度层数的增加而导致梯度爆炸的代价。为了解决这个问题,我们提出以下问题:“是否可以在批处Normalization的情况下保持最佳的信号传递特性,而免除深度层数随着增加而导致梯度爆炸?”我们回答这个问题的答案是肯定的,并给出了一种特殊的多层感知机(MLP),其中每层使用线性活动函数和批处Normalization,可以证明在任意深度下都有稳定梯度。基于Weingarten calculus,我们开发了一种精确和非对数学的理论,可以准确地描述这种构造的前向信号传递特性,同时证明在线性独立输入样本上,梯度都具有有界值。受到我们的理论启发,我们还设计了一种活动形态的调整方案,可以实际实现相同的特性 для某些非线性活动函数。
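The phenomenon discussed — signal propagation versus gradient growth with depth in a batch-normalized linear MLP — can be probed with a short experiment; the widths, depth, and Gaussian inputs are illustrative, and this measures gradients empirically rather than reproducing the paper's theoretical construction.

```python
import torch
import torch.nn as nn

def bn_mlp(depth: int, width: int) -> nn.Sequential:
    layers = []
    for _ in range(depth):
        layers += [nn.Linear(width, width, bias=False), nn.BatchNorm1d(width)]  # linear activations
    return nn.Sequential(*layers)

torch.manual_seed(0)
width, batch = 64, 128
x = torch.randn(batch, width)

for depth in [5, 20, 50]:
    net = bn_mlp(depth, width)
    out = net(x)
    loss = out.pow(2).mean()
    loss.backward()
    grad_norm = net[0].weight.grad.norm().item()   # gradient reaching the first layer
    print(f"depth={depth:3d}  first-layer grad norm={grad_norm:.3e}")
```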

Generalized Convergence Analysis of Tsetlin Machines: A Probabilistic Approach to Concept Learning

  • paper_url: http://arxiv.org/abs/2310.02005
  • repo_url: None
  • paper_authors: Mohamed-Bachir Belaid, Jivitesh Sharma, Lei Jiao, Ole-Christoffer Granmo, Per-Arne Andersen, Anis Yazidi
  • for: 这篇论文的目的是为了解释Tsetlin机器(TM)在机器学习领域的应用中的性能,以及TM的整体吞吐量和可靠性。
  • methods: 这篇论文使用了Tsetlin自动机基于的机器学习算法,并提出了一种新的框架——概率概念学习(PCL),以解决TM在扩展的情况下的收敛问题。
  • results: 研究发现,PCL在$n$个特征下可以学习一组连接规则$C_i$,每个规则都有一个特定的包含概率$p_i$。此外,研究还证明了,对于任何规则$C_k$,PCL都可以收敛到一个连接规则。这一结论不仅有助于理解TM的性能,还有可能导致更加稳定和可解释的机器学习模型。
    Abstract Tsetlin Machines (TMs) have garnered increasing interest for their ability to learn concepts via propositional formulas and their proven efficiency across various application domains. Despite this, the convergence proof for the TMs, particularly for the AND operator (\emph{conjunction} of literals), in the generalized case (inputs greater than two bits) remains an open problem. This paper aims to fill this gap by presenting a comprehensive convergence analysis of Tsetlin automaton-based Machine Learning algorithms. We introduce a novel framework, referred to as Probabilistic Concept Learning (PCL), which simplifies the TM structure while incorporating dedicated feedback mechanisms and dedicated inclusion/exclusion probabilities for literals. Given $n$ features, PCL aims to learn a set of conjunction clauses $C_i$ each associated with a distinct inclusion probability $p_i$. Most importantly, we establish a theoretical proof confirming that, for any clause $C_k$, PCL converges to a conjunction of literals when $0.5
    摘要 Tsetlin 机(TM)因能够通过命题公式学习概念,并在多种应用领域中展现出已被验证的效率,而受到越来越多的关注。然而,TM 的收敛证明——特别是针对 AND 算子(文字的合取)在广义情形(输入多于两位)下——仍然是一个未解决的问题。本文旨在填补这一空白,对基于 Tsetlin 自动机的机器学习算法给出全面的收敛分析。我们提出了一种新的框架,称为概率概念学习(PCL),它简化了 TM 的结构,同时为文字(literal)引入了专门的反馈机制和专门的包含/排除概率。给定 $n$ 个特征,PCL 的目标是学习一组合取子句 $C_i$,每个子句对应一个不同的包含概率 $p_i$。最重要的是,我们建立了理论证明:对于任何子句 $C_k$,当 $0.5 < p_k < 1$ 时,PCL 收敛到一个由文字组成的合取式。

Fill in the Blank: Exploring and Enhancing LLM Capabilities for Backward Reasoning in Math Word Problems

  • paper_url: http://arxiv.org/abs/2310.01991
  • repo_url: None
  • paper_authors: Aniruddha Deb, Neeva Oza, Sarthak Singla, Dinesh Khandelwal, Dinesh Garg, Parag Singla
  • for: 这篇论文探讨了大语言模型(LLM)在数学问题上的后向理解能力,即给定一个数学问题和其解答,能否由LLM回归出 omitted 信息?
  • methods: 本文首先定义了数学问题上的后向理解任务,并对 GSM8k、SVAMP 和 MultiArith 三个数据集进行修改以进行评估。然后,提出了三种新的技巧来提高 LLM 的表现:Rephrase、PAL-Tools 和 Check your Work。
  • results: 实验结果表明,使用这些技巧可以成功地提高 LLM 在后向理解任务中的表现。最终,提出了一种 Bayesian 形式的 ensemble 方法,通过与一个高精度的自然验证器相结合,进一步提高 LLM 的表现。
    Abstract While forward reasoning (i.e. find the answer given the question) has been explored extensively in the recent literature, backward reasoning is relatively unexplored. We examine the backward reasoning capabilities of LLMs on Math Word Problems (MWPs): given a mathematical question and its answer, with some details omitted from the question, can LLMs effectively retrieve the missing information? In this paper, we formally define the backward reasoning task on math word problems and modify three datasets to evaluate this task: GSM8k, SVAMP and MultiArith. Our findings show a significant drop in the accuracy of models on backward reasoning compared to forward reasoning across four SOTA LLMs (GPT4, GPT3.5, PaLM-2, and LLaMa-2). Utilizing the specific format of this task, we propose three novel techniques that improve performance: Rephrase reformulates the given problem into a forward reasoning problem, PAL-Tools combines the idea of Program-Aided LLMs to produce a set of equations that can be solved by an external solver, and Check your Work exploits the availability of natural verifier of high accuracy in the forward direction, interleaving solving and verification steps. Finally, realizing that each of our base methods correctly solves a different set of problems, we propose a novel Bayesian formulation for creating an ensemble over these base methods aided by a verifier to further boost the accuracy by a significant margin. Extensive experimentation demonstrates that our techniques successively improve the performance of LLMs on the backward reasoning task, with the final ensemble-based method resulting in a substantial performance gain compared to the raw LLMs with standard prompting techniques such as chain-of-thought.
    摘要 尽管前向推理(即根据问题找出答案)在近期文献中已得到广泛探讨,后向推理的研究仍相对较少。我们关注 LLMs 在数学应用题(MWPs)上的后向推理能力:给定一个数学问题及其答案,当问题中的某些细节被省略时,LLMs 能否有效地找回缺失的信息?在这篇论文中,我们正式定义了 MWPs 上的后向推理任务,并修改 GSM8k、SVAMP 和 MultiArith 三个数据集来评估该任务。我们发现四种 SOTA LLMs(GPT4、GPT3.5、PaLM-2 和 LLaMa-2)在后向推理上的准确率相较前向推理明显下降。利用该任务的特定格式,我们提出了三种新技术来改善性能:Rephrase 将给定问题改写为前向推理问题;PAL-Tools 借鉴程序辅助 LLMs 的思想,生成可由外部求解器求解的方程组;Check your Work 则利用前向方向上高精度的自然验证器,交替进行求解与验证步骤。最后,由于每种基本方法各自正确解决了不同的问题子集,我们提出了一种由验证器辅助的贝叶斯集成方法来组合这些基本方法,进一步显著提升准确率。大量实验表明,我们的技术逐步提升了 LLMs 在后向推理任务上的表现,最终基于集成的方法相较于使用思维链等标准提示的原始 LLMs 获得了显著的性能增益。

Soda: An Object-Oriented Functional Language for Specifying Human-Centered Problems

  • paper_url: http://arxiv.org/abs/2310.01961
  • repo_url: None
  • paper_authors: Julian Alfredo Mendez
  • for: 本文旨在提供一种自然地处理质量和量的语言,以便更好地检查其正确性。
  • methods: 本文使用符号目标描述分析(Soda)语言,该语言可以帮助描述复杂的计算机系统需求,并且提供了适当的键性特性来支持这些需求的模型。
  • results: 本文提供了一种轻松描述问题的工具,可以更加透明地描述复杂的需求,从而减少错误的可能性。
    Abstract We present Soda (Symbolic Objective Descriptive Analysis), a language that helps to treat qualities and quantities in a natural way and greatly simplifies the task of checking their correctness. We present key properties for the language motivated by the design of a descriptive language to encode complex requirements on computer systems, and we explain how these key properties must be addressed to model these requirements with simple definitions. We give an overview of a tool that helps to describe problems in an easy way that we consider more transparent and less error-prone.
    摘要 我们介绍 Soda(Symbolic Objective Descriptive Analysis)语言,它能以自然的方式处理质与量,并大大简化检查其正确性的任务。我们介绍了该语言的关键属性,这些属性源自为计算机系统的复杂需求编写描述性语言的设计动机,并解释了为何必须满足这些关键属性才能用简单的定义来建模这些需求。我们还概述了一个工具,它能以我们认为更透明、更不易出错的方式轻松地描述问题。

Language Models as Knowledge Bases for Visual Word Sense Disambiguation

  • paper_url: http://arxiv.org/abs/2310.01960
  • repo_url: https://github.com/anastasiakrith/llm-for-vwsd
  • paper_authors: Anastasia Kritharoula, Maria Lymperaiou, Giorgos Stamou
  • for: 本研究的目的是提高视听语言变换器(VL transformer)的检索性能,通过使用大语言模型(LLM)作为知识库。
  • methods: 本研究使用了知识库中的知识,通过适当的提示来检索知识,以及将视听语言变换问题转化为文本问题,并使用链条思维(CoT)提示来探究内部的思维过程。
  • results: 本研究表明,通过使用LLM作为知识库,可以提高VL transformer的检索性能,并且通过转化为文本问题,可以更好地探究内部的思维过程。
    Abstract Visual Word Sense Disambiguation (VWSD) is a novel challenging task that lies between linguistic sense disambiguation and fine-grained multimodal retrieval. The recent advancements in the development of visiolinguistic (VL) transformers suggest some off-the-self implementations with encouraging results, which however we argue that can be further improved. To this end, we propose some knowledge-enhancement techniques towards improving the retrieval performance of VL transformers via the usage of Large Language Models (LLMs) as Knowledge Bases. More specifically, knowledge stored in LLMs is retrieved with the help of appropriate prompts in a zero-shot manner, achieving performance advancements. Moreover, we convert VWSD to a purely textual question-answering (QA) problem by considering generated image captions as multiple-choice candidate answers. Zero-shot and few-shot prompting strategies are leveraged to explore the potential of such a transformation, while Chain-of-Thought (CoT) prompting in the zero-shot setting is able to reveal the internal reasoning steps an LLM follows to select the appropriate candidate. In total, our presented approach is the first one to analyze the merits of exploiting knowledge stored in LLMs in different ways to solve WVSD.
    摘要 Visual Word Sense Disambiguation (VWSD) 是一个新兴的挑战任务,位于语言意义归一化和细部多媒体搜寻之间。 recent advancements in visiolinguistic (VL) transformers 的开发,提供了一些可用的 implementation with encouraging results,但我们认为这些结果可以进一步改善。 To this end, we propose some knowledge-enhancement techniques towards improving the retrieval performance of VL transformers via the usage of Large Language Models (LLMs) as Knowledge Bases. More specifically, knowledge stored in LLMs is retrieved with the help of appropriate prompts in a zero-shot manner, achieving performance advancements. Moreover, we convert VWSD to a purely textual question-answering (QA) problem by considering generated image captions as multiple-choice candidate answers. Zero-shot and few-shot prompting strategies are leveraged to explore the potential of such a transformation, while Chain-of-Thought (CoT) prompting in the zero-shot setting is able to reveal the internal reasoning steps an LLM follows to select the appropriate candidate. In total, our presented approach is the first one to analyze the merits of exploiting knowledge stored in LLMs in different ways to solve WVSD.

Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving

  • paper_url: http://arxiv.org/abs/2310.01957
  • repo_url: https://github.com/wayveai/driving-with-llms
  • paper_authors: Long Chen, Oleg Sinavski, Jan Hünermann, Alice Karnsund, Andrew James Willmott, Danny Birch, Daniel Maund, Jamie Shotton
  • for: 本研究旨在提高自动驾驶中Context理解,特别是通过大语言模型(LLM)的普适性和可解释性。
  • methods: 我们提出了一种unicode object-level multimodal LLM架构,将vectorized numeric modalities与预训练LLM结合,以提高驾驶场景中Context的理解。我们还开发了10000个驾驶场景的160000个问答对,并使用RL代理和教师LLM(GPT-3.5)生成问题和答案。为了将数字vec模态与静态LLM表示相alin,我们提出了一种新的预训练策略。
  • results: 我们的研究发现,使用LLM-driver可以在驾驶场景中更好地理解 Context,回答问题,并做出决策。与传统行为宠模型相比,LLM-based driving action generation表现出了更高的普适性和可解释性。我们的研究结果和数据集、模型将被提供给进一步的探索。
    Abstract Large Language Models (LLMs) have shown promise in the autonomous driving sector, particularly in generalization and interpretability. We introduce a unique object-level multimodal LLM architecture that merges vectorized numeric modalities with a pre-trained LLM to improve context understanding in driving situations. We also present a new dataset of 160k QA pairs derived from 10k driving scenarios, paired with high quality control commands collected with RL agent and question answer pairs generated by teacher LLM (GPT-3.5). A distinct pretraining strategy is devised to align numeric vector modalities with static LLM representations using vector captioning language data. We also introduce an evaluation metric for Driving QA and demonstrate our LLM-driver's proficiency in interpreting driving scenarios, answering questions, and decision-making. Our findings highlight the potential of LLM-based driving action generation in comparison to traditional behavioral cloning. We make our benchmark, datasets, and model available for further exploration.

Probabilistic Reach-Avoid for Bayesian Neural Networks

  • paper_url: http://arxiv.org/abs/2310.01951
  • repo_url: https://github.com/matthewwicker/bnnreachavoid
  • paper_authors: Matthew Wicker, Luca Laurenti, Andrea Patane, Nicola Paoletti, Alessandro Abate, Marta Kwiatkowska
  • for: 本研究旨在同时学习未知的随机环境动力学和生成优化策略,并确保策略在安全关键场景中的决策是安全和可靠的。
  • methods: 本研究使用interval propagation和backward recursion技术计算动力学模型下的下界,以确保策略满足给定的 reach-avoid 规范(达到目标状态,避免危险状态)。然后,使用控制合成算法derive最优的策略,以提高安全性的可靠性。
  • results: 在一系列控制准则中,我们的方法能够提供更多的 certificatable 状态和更高的平均保证的 reach-avoid 概率,比较传统的数据驱动策略。在最具挑战性的准则中,我们的优化算法能够提供更多的 certificatable 状态和更高的平均保证的 reach-avoid 概率,比较传统的数据驱动策略。
    Abstract Model-based reinforcement learning seeks to simultaneously learn the dynamics of an unknown stochastic environment and synthesise an optimal policy for acting in it. Ensuring the safety and robustness of sequential decisions made through a policy in such an environment is a key challenge for policies intended for safety-critical scenarios. In this work, we investigate two complementary problems: first, computing reach-avoid probabilities for iterative predictions made with dynamical models, with dynamics described by Bayesian neural network (BNN); second, synthesising control policies that are optimal with respect to a given reach-avoid specification (reaching a "target" state, while avoiding a set of "unsafe" states) and a learned BNN model. Our solution leverages interval propagation and backward recursion techniques to compute lower bounds for the probability that a policy's sequence of actions leads to satisfying the reach-avoid specification. Such computed lower bounds provide safety certification for the given policy and BNN model. We then introduce control synthesis algorithms to derive policies maximizing said lower bounds on the safety probability. We demonstrate the effectiveness of our method on a series of control benchmarks characterized by learned BNN dynamics models. On our most challenging benchmark, compared to purely data-driven policies the optimal synthesis algorithm is able to provide more than a four-fold increase in the number of certifiable states and more than a three-fold increase in the average guaranteed reach-avoid probability.
    摘要 In this work, we address two related problems: computing reach-avoid probabilities for iterative predictions made with dynamical models, and synthesizing control policies that are optimal with respect to a given reach-avoid specification and a learned Bayesian neural network (BNN) model. Our approach leverages interval propagation and backward recursion techniques to compute lower bounds for the probability that a policy's sequence of actions leads to satisfying the reach-avoid specification. These lower bounds provide safety certification for the given policy and BNN model.We then introduce control synthesis algorithms to derive policies that maximize the computed lower bounds on the safety probability. Our method is demonstrated on a series of control benchmarks characterized by learned BNN dynamics models. On our most challenging benchmark, our optimal synthesis algorithm is able to provide more than a four-fold increase in the number of certifiable states and more than a three-fold increase in the average guaranteed reach-avoid probability compared to purely data-driven policies.
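The backward-recursion part of the certification can be illustrated on a finite-state abstraction: starting from the target set, the probability of reaching it within a horizon while avoiding unsafe states is propagated backwards through lower bounds on transition probabilities. The tiny transition matrix below is a hypothetical abstraction, not a bound extracted from a BNN via interval propagation.

```python
import numpy as np

# hypothetical lower bounds on transition probabilities under a fixed policy,
# e.g. obtained by abstracting the state space and bound-propagating a learned model
P_lower = np.array([
    [0.1, 0.6, 0.1, 0.1],   # state 0
    [0.0, 0.2, 0.6, 0.1],   # state 1
    [0.0, 0.0, 0.9, 0.0],   # state 2 (target)
    [0.0, 0.0, 0.0, 1.0],   # state 3 (unsafe, absorbing)
])
target, unsafe, horizon = {2}, {3}, 10

# V[s] = lower bound on P(reach target within the horizon while avoiding unsafe | start at s)
V = np.array([1.0 if s in target else 0.0 for s in range(4)])
for _ in range(horizon):
    V_next = V.copy()
    for s in range(4):
        if s in target or s in unsafe:
            continue                       # absorbing: already certified / violated
        V_next[s] = float(P_lower[s] @ V)  # one-step backward recursion on lower bounds
    V = V_next

print("certified reach-avoid lower bounds per initial state:", np.round(V, 3))
```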

Ravestate: Distributed Composition of a Causal-Specificity-Guided Interaction Policy

  • paper_url: http://arxiv.org/abs/2310.01943
  • repo_url: None
  • paper_authors: Joseph Birkner, Andreas Dolp, Negin Karimi, Nikita Basargin, Alona Kharchenko, Rafael Hostettler
  • for: 这 paper 的目的是提出一种基于规则的人机交互策略设计方法,这种方法是有效、可解释、表达力强和直观的。
  • methods: 这 paper 使用了 Signal-Rule-Slot 框架,该框架是根据先前的 Symbolic System 设计方法进行改进,并引入了一种新的 Bayesian 思想的交互规则实用性指标called Causal Pathway Self-information。
  • results: 通过用 Ravestate 开源实现和进行用户研究,这 paper 提供了一种有力的人机交互系统,并在文本、语音和视觉等场景中展示了 Contextual Behavior 的robust性。
    Abstract In human-robot interaction policy design, a rule-based method is efficient, explainable, expressive and intuitive. In this paper, we present the Signal-Rule-Slot framework, which refines prior work on rule-based symbol system design and introduces a new, Bayesian notion of interaction rule utility called Causal Pathway Self-information. We offer a rigorous theoretical foundation as well as a rich open-source reference implementation Ravestate, with which we conduct user studies in text-, speech-, and vision-based scenarios. The experiments show robust contextual behaviour of our probabilistically informed rule-based system, paving the way for more effective human-machine interaction.
    摘要 人机交互策略设计中,使用规则方法是有效、可解释、表达力强和直观的。本文提出了信号规则槽框架,对先前的规则基于符号系统设计做出了改进,并引入了一新的 bayesian 概念:交互规则用量含义。我们提供了坚实的理论基础以及rich的开源参考实现 Ravestate,并在文本、语音和视觉等方面进行了用户研究。实验结果表明了我们的概率知识基于规则系统在不同场景中具有强大的上下文行为,这将为人机交互带来更高效的交互。Note: Please keep in mind that the translation is done by a machine and may not be perfect. If you have any specific requirements or preferences, please let me know and I can provide a more tailored translation.

  • paper_url: http://arxiv.org/abs/2310.01929
  • repo_url: None
  • paper_authors: Mor Ventura, Eyal Ben-David, Anna Korhonen, Roi Reichart
  • for: 这个研究旨在探讨TEXT-TO-IMAGE(TTI)模型中嵌入的文化认知,以及这些模型在不同文化背景下的表现。
  • methods: 该研究使用了多种评估技术,包括CLIP空间的内在评估、VQA模型的外在评估以及人类评估,以探索TTI模型的文化认知。
  • results: 实验结果显示,TTI模型在不同文化背景下具有文化认知,并且可以适应不同文化的特点。这些模型还能够解释文化特点,并且可以在不同文化背景下提高表现。
    Abstract Text-To-Image (TTI) models, exemplified by DALL-E and StableDiffusion, have recently gained prominence for their remarkable zero-shot capabilities in generating images guided by textual prompts. Language, as a conduit of culture, plays a pivotal role in these models' multilingual capabilities, which in turn shape their cultural agency. In this study, we explore the cultural perception embedded in TTI models by characterizing culture across three hierarchical tiers: cultural dimensions, cultural domains, and cultural concepts. We propose a comprehensive suite of evaluation techniques, including intrinsic evaluations using the CLIP space, extrinsic evaluations with a Visual-Question-Answer (VQA) model, and human assessments, to discern TTI cultural perceptions. To facilitate our research, we introduce the CulText2I dataset, derived from four diverse TTI models and spanning ten languages. Our experiments reveal insights into these models' cultural awareness, cultural distinctions, and the unlocking of cultural features, releasing the potential for cross-cultural applications.
    摘要

DARTH: Holistic Test-time Adaptation for Multiple Object Tracking

  • paper_url: http://arxiv.org/abs/2310.01926
  • repo_url: https://github.com/mattiasegu/darth
  • paper_authors: Mattia Segu, Bernt Schiele, Fisher Yu
  • for: 这篇论文主要旨在提出一种在测试时进行多目标跟踪(MOT)系统的适应性问题,以提高自动驾驶系统的安全性。
  • methods: 该论文提出了一种涵盖目标检测和实例关联的全面测试时适应框架,包括一种自监督的检测一致性公式,以及一种新的块对比损失(patch contrastive loss)来适应实例外观表示。
  • results: 该论文在不同的领域变换(sim-to-real、outdoor-to-indoor、indoor-to-outdoor)中进行了广泛的测试,并得到了明显的性能提升。
    Abstract Multiple object tracking (MOT) is a fundamental component of perception systems for autonomous driving, and its robustness to unseen conditions is a requirement to avoid life-critical failures. Despite the urge of safety in driving systems, no solution to the MOT adaptation problem to domain shift in test-time conditions has ever been proposed. However, the nature of a MOT system is manifold - requiring object detection and instance association - and adapting all its components is non-trivial. In this paper, we analyze the effect of domain shift on appearance-based trackers, and introduce DARTH, a holistic test-time adaptation framework for MOT. We propose a detection consistency formulation to adapt object detection in a self-supervised fashion, while adapting the instance appearance representations via our novel patch contrastive loss. We evaluate our method on a variety of domain shifts - including sim-to-real, outdoor-to-indoor, indoor-to-outdoor - and substantially improve the source model performance on all metrics. Code: https://github.com/mattiasegu/darth.
    摘要 多目标跟踪(MOT)是自动驾驶感知系统的基本组件,其对未见条件的鲁棒性是避免危及生命的故障的必要条件。尽管驾驶系统对安全的需求迫切,但此前尚无针对测试时域偏移的 MOT 适应问题的解决方案。然而,MOT 系统本身包含多个环节(需要目标检测和实例关联),因此适应其全部组件并非易事。在这篇论文中,我们分析了域偏移对基于外观的跟踪器的影响,并提出了 DARTH,一个面向 MOT 的整体测试时适应框架。我们提出检测一致性公式,以自监督方式适应目标检测,并通过我们新提出的块对比损失(patch contrastive loss)来适应实例外观表示。我们在多种域偏移上进行了评估(包括 sim-to-real、outdoor-to-indoor 和 indoor-to-outdoor),并在所有指标上显著提升了源模型的性能。代码:https://github.com/mattiasegu/darth。
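
为直观说明"块对比损失"属于哪一类目标,下面给出两组实例块嵌入之间通用的 InfoNCE 式对比损失的 PyTorch 小示例;这只是同类损失的一般写法,DARTH 的具体形式请以官方代码库为准。

```python
# Generic InfoNCE-style contrastive loss between two views of instance-patch
# embeddings, given only to illustrate the family of objectives DARTH's
# "patch contrastive loss" belongs to; see the official repo for the exact form.
import torch
import torch.nn.functional as F

def patch_contrastive_loss(z1, z2, temperature=0.1):
    """z1, z2: (N, D) embeddings of the same N instance patches under two views."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature        # (N, N) similarity matrix
    targets = torch.arange(z1.size(0))        # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
print(patch_contrastive_loss(z1, z2).item())
```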

FiGURe: Simple and Efficient Unsupervised Node Representations with Filter Augmentations

  • paper_url: http://arxiv.org/abs/2310.01892
  • repo_url: https://github.com/microsoft/figure
  • paper_authors: Chanakya Ekbote, Ajinkya Pankaj Deshpande, Arun Iyer, Ramakrishna Bairi, Sundararajan Sellamanickam
  • for: This paper aims to improve the performance of unsupervised node representations learnt using contrastive learning-based methods on downstream tasks.
  • methods: The authors propose a simple filter-based augmentation method to capture different parts of the eigen-spectrum, which leads to significant improvements. They also share the same weights across different filter augmentations to reduce computational load.
  • results: The proposed method, FiGURe, achieves an average gain of up to 4.4% compared to the state-of-the-art unsupervised models across all datasets considered, both homophilic and heterophilic.Here’s the summary in Simplified Chinese:
  • for: 本文目的是提高基于对比学习的无监督节点表示的性能在下游任务中。
  • methods: 作者提出了一种简单的滤波器基于扩展方法,以捕捉不同的特征谱部分,并且通过共享相同权重来降低计算负担。
  • results: 提出的方法FiGURe,在所有考虑的数据集上,both homophilic和heterophilic,实现了4.4%的平均提升,比领先的无监督模型更高。
    Abstract Unsupervised node representations learnt using contrastive learning-based methods have shown good performance on downstream tasks. However, these methods rely on augmentations that mimic low-pass filters, limiting their performance on tasks requiring different eigen-spectrum parts. This paper presents a simple filter-based augmentation method to capture different parts of the eigen-spectrum. We show significant improvements using these augmentations. Further, we show that sharing the same weights across these different filter augmentations is possible, reducing the computational load. In addition, previous works have shown that good performance on downstream tasks requires high dimensional representations. Working with high dimensions increases the computations, especially when multiple augmentations are involved. We mitigate this problem and recover good performance through lower dimensional embeddings using simple random Fourier feature projections. Our method, FiGURe achieves an average gain of up to 4.4%, compared to the state-of-the-art unsupervised models, across all datasets in consideration, both homophilic and heterophilic. Our code can be found at: https://github.com/microsoft/figure.
    摘要 基于对比学习方法学习得到的无监督节点表示在下游任务上表现良好。然而,这些方法依赖于近似低通滤波器的数据增广,限制了它们在需要不同特征谱段的任务上的表现。本文提出了一种简单的基于滤波器的增广方法,以捕捉特征谱的不同部分,并带来了显著的性能提升。此外,我们表明不同滤波器增广之间可以共享同一组权重,从而降低计算负担。先前的工作还表明,下游任务的良好表现需要高维表示,而高维会增加计算量,尤其是在涉及多种增广时。我们通过简单的随机傅里叶特征投影,在较低维度的嵌入下缓解了这一问题并恢复了良好的性能。我们的方法 FiGURe 在所考虑的所有同配与异配数据集上,相比最先进的无监督模型平均取得最高 4.4% 的提升。代码见:https://github.com/microsoft/figure。
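
下面用 NumPy 给出文中提到的随机傅里叶特征投影的简要示意,其中的输入/输出维度与高斯带宽均为示例取值,并非论文中的设置。

```python
# Sketch of the random Fourier feature (RFF) projection mentioned above, used to
# recover performance from lower-dimensional embeddings. Dimensions and the
# Gaussian-kernel bandwidth below are placeholders, not the paper's settings.
import numpy as np

def rff_project(X, out_dim=512, gamma=1.0, seed=0):
    """Map X (n, d) to (n, out_dim) features approximating an RBF kernel."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, out_dim))
    b = rng.uniform(0, 2 * np.pi, size=out_dim)
    return np.sqrt(2.0 / out_dim) * np.cos(X @ W + b)

X = np.random.randn(100, 256)      # e.g. lower-dimensional node embeddings
Z = rff_project(X)
print(Z.shape)                     # (100, 512)
```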

Adaptive Hybrid Model for Enhanced Stock Market Predictions Using Improved VMD and Stacked Informer

  • paper_url: http://arxiv.org/abs/2310.01884
  • repo_url: https://github.com/DANNHIROAKI/Adaptive-Hybrid-Model-for-Enhanced-Stock-Market-Predictions-Using-Improved-VMD-and-Stacked-Informer
  • paper_authors: Jianan Zhang, Hongyi Duan
  • for: 该研究旨在提出一种适应性混合模型,用于股票市场预测,利用提高后 Variational Mode Decomposition (VMD)、Feature Engineering (FE) 和堆叠 Informer 以及适应损失函数。
  • methods: 该模型使用了增强的 VMD、FE 和堆叠 Informer,并将其与适应损失函数结合使用。
  • results: 实验结果表明,提出的模型(称为 Adam+GC+增强 informer,简称 VMGCformer)在股票市场数据中表现出色,其预测精度、应急性和泛化能力都高于传统和其他混合模型。
    Abstract This paper introduces an innovative adaptive hybrid model for stock market predictions, leveraging the capabilities of an enhanced Variational Mode Decomposition (VMD), Feature Engineering (FE), and stacked Informer integrated with an adaptive loss function. Through rigorous experimentation, the proposed model, termed Adam+GC+enhanced informer (We name it VMGCformer), demonstrates significant proficiency in addressing the intricate dynamics and volatile nature of stock market data. Experimental results, derived from multiple benchmark datasets, underscore the model's superiority in terms of prediction accuracy, responsiveness, and generalization capabilities over traditional and other hybrid models. The research further highlights potential avenues for optimization and introduces future directions to enhance predictive modeling, especially for small enterprises and feature engineering.
    摘要 这篇论文提出了一种新型的自适应混合股票市场预测模型,结合了改进的变分模态分解(VMD)、特征工程(FE)、堆叠 Informer 以及自适应损失函数。该模型被称为 VMGCformer,经过多个基准数据集的严格实验,在应对股票市场数据的复杂动态和高波动性方面表现出显著能力。实验结果显示,VMGCformer 在预测精度、响应性和泛化能力方面均优于传统模型和其他混合模型。研究还指出了潜在的优化途径,并给出了未来增强预测建模的方向,尤其是面向小企业和特征工程。

Towards Stable Backdoor Purification through Feature Shift Tuning

  • paper_url: http://arxiv.org/abs/2310.01875
  • repo_url: https://github.com/aisafety-hkust/stable_backdoor_purification
  • paper_authors: Rui Min, Zeyu Qin, Li Shen, Minhao Cheng
  • for: 本研究旨在提出一种简单易于实施的后门攻击防御方法,以帮助减少深度神经网络(DNN)受到后门攻击的风险。
  • methods: 我们使用了精心调整(fine-tuning)方法,并进行了广泛的测试和分析,以推断 vanilla 调整方法在低毒料比例下完全失效。我们还提出了一种叫做特征偏移调整(Feature Shift Tuning,FST)方法,以解决低毒料比例下后门纯化的问题。FST 通过强制抬升分类器的权重偏移自已损害的权重,以提高后门纯化的稳定性。
  • results: 我们的实验结果表明,FST 方法在不同的攻击场景下具有稳定的性能,并且只需要10个训练 epoch,可以很快地完成纯化过程。此外,FST 方法还可以在低毒料比例下提供更好的防御性能,而不需要复杂的参数调整。
    Abstract It has been widely observed that deep neural networks (DNN) are vulnerable to backdoor attacks where attackers could manipulate the model behavior maliciously by tampering with a small set of training samples. Although a line of defense methods is proposed to mitigate this threat, they either require complicated modifications to the training process or heavily rely on the specific model architecture, which makes them hard to deploy into real-world applications. Therefore, in this paper, we instead start with fine-tuning, one of the most common and easy-to-deploy backdoor defenses, through comprehensive evaluations against diverse attack scenarios. Observations made through initial experiments show that in contrast to the promising defensive results on high poisoning rates, vanilla tuning methods completely fail at low poisoning rate scenarios. Our analysis shows that with the low poisoning rate, the entanglement between backdoor and clean features undermines the effect of tuning-based defenses. Therefore, it is necessary to disentangle the backdoor and clean features in order to improve backdoor purification. To address this, we introduce Feature Shift Tuning (FST), a method for tuning-based backdoor purification. Specifically, FST encourages feature shifts by actively deviating the classifier weights from the originally compromised weights. Extensive experiments demonstrate that our FST provides consistently stable performance under different attack settings. Without complex parameter adjustments, FST also achieves much lower tuning costs, only 10 epochs. Our codes are available at https://github.com/AISafety-HKUST/stable_backdoor_purification.
    摘要 大量研究表明,深度神经网络(DNN)容易受到后门攻击:攻击者只需篡改一小部分训练样本,就能恶意操纵模型的行为。尽管已有一系列防御方法被提出以缓解这一威胁,但它们要么需要对训练过程进行复杂的修改,要么严重依赖特定的模型架构,难以部署到实际应用中。因此,本文转而从微调这一最常见且易于部署的后门防御手段入手,针对多种攻击场景进行了全面评估。初步实验表明,与高投毒率下可观的防御效果相反,普通的微调方法在低投毒率场景下完全失效。我们的分析显示,在低投毒率下,后门特征与干净特征相互纠缠,削弱了基于微调的防御效果。因此,有必要将后门特征与干净特征解耦,以改进后门净化。为此,我们提出了特征偏移调整(Feature Shift Tuning,FST),一种基于微调的后门净化方法。具体而言,FST 通过使分类器权重主动偏离原先被污染的权重来促进特征偏移。大量实验表明,FST 在不同攻击设置下都能提供稳定的性能;无需复杂的参数调整,FST 的调优成本也更低,仅需 10 个训练轮次。代码见:https://github.com/AISafety-HKUST/stable_backdoor_purification。
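
以下是对 FST 核心思想的一个带假设的示意(并非论文的精确目标函数):在常规任务损失之外,额外惩罚新分类器权重与原(可能被污染的)权重之间的余弦相似度,促使权重发生偏移;示例中的惩罚形式与系数 alpha 均为说明性选择。

```python
# Hedged sketch of the idea behind Feature Shift Tuning: fine-tune with the usual
# task loss while actively pushing the classifier weights away from the (possibly
# compromised) original weights. The cosine-similarity penalty and its strength
# alpha are illustrative choices, not necessarily the paper's exact objective.
import torch
import torch.nn as nn
import torch.nn.functional as F

feat_dim, num_classes, alpha = 512, 10, 0.2
classifier = nn.Linear(feat_dim, num_classes)
w_orig = classifier.weight.detach().clone()      # weights learned on poisoned data
opt = torch.optim.SGD(classifier.parameters(), lr=0.01)

def fst_step(features, labels):
    logits = classifier(features)
    task_loss = F.cross_entropy(logits, labels)
    # encourage a shift away from the original (compromised) weight direction
    shift_penalty = F.cosine_similarity(classifier.weight.flatten(),
                                        w_orig.flatten(), dim=0)
    loss = task_loss + alpha * shift_penalty
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

feats, labels = torch.randn(32, feat_dim), torch.randint(0, num_classes, (32,))
print(fst_step(feats, labels))
```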

Conditional Instrumental Variable Regression with Representation Learning for Causal Inference

  • paper_url: http://arxiv.org/abs/2310.01865
  • repo_url: None
  • paper_authors: Debo Cheng, Ziqi Xu, Jiuyong Li, Lin Liu, Jixue Liu, Thuc Duy Le
  • for: 本文研究在存在未观测混杂因素的情况下,从观测数据中估计因果效应这一难题。
  • methods: 两阶段最小二乘(TSLS)方法及其基于标准工具变量(IV)的变体常被用来消除混杂偏差(包括未观测混杂因素导致的偏差),但它们依赖线性假设,且标准 IV 所要求的无混杂工具变量条件过于严格、不切实际。为此,本文使用条件工具变量(CIV)放宽标准 IV 的工具变量条件,并提出一种非线性 CIV 回归方法 CBRL.CIV(Confounding Balancing Representation Learning),在不依赖线性假设的前提下,同时消除未观测混杂因素导致的偏差并平衡已观测的混杂因素。作者从理论上论证了 CBRL.CIV 的合理性。
  • results: 在合成数据集和两个真实数据集上的大量实验表明,CBRL.CIV 与最先进的基于 IV 的估计方法相比具有竞争力,并且在非线性情形下表现更优。
    Abstract This paper studies the challenging problem of estimating causal effects from observational data, in the presence of unobserved confounders. The two-stage least square (TSLS) method and its variants with a standard instrumental variable (IV) are commonly used to eliminate confounding bias, including the bias caused by unobserved confounders, but they rely on the linearity assumption. Besides, the strict condition of unconfounded instruments posed on a standard IV is too strong to be practical. To address these challenging and practical problems of the standard IV method (linearity assumption and the strict condition), in this paper, we use a conditional IV (CIV) to relax the unconfounded instrument condition of standard IV and propose a non-linear CIV regression with Confounding Balancing Representation Learning, CBRL.CIV, for jointly eliminating the confounding bias from unobserved confounders and balancing the observed confounders, without the linearity assumption. We theoretically demonstrate the soundness of CBRL.CIV. Extensive experiments on synthetic and two real-world datasets show the competitive performance of CBRL.CIV against state-of-the-art IV-based estimators and superiority in dealing with the non-linear situation.
    摘要
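
为了让上文对比的经典基线更具体,下面在合成数据上给出最小的两阶段最小二乘(TSLS)示例;它只用于说明 CBRL.CIV 所要放宽的线性 IV 假设,数据生成系数均为任意取值。

```python
# Minimal two-stage least squares (TSLS) on synthetic data, shown only to make
# concrete the classical IV baseline that CBRL.CIV relaxes (linearity + standard
# IV assumptions). The data-generating numbers below are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)                              # instrument
u = rng.normal(size=n)                              # unobserved confounder
t = 0.8 * z + 0.5 * u + rng.normal(size=n)          # treatment
y = 2.0 * t + 1.5 * u + rng.normal(size=n)          # outcome, true effect = 2.0

# Stage 1: regress treatment on instrument; Stage 2: regress outcome on fitted t
Z = np.column_stack([np.ones(n), z])
t_hat = Z @ np.linalg.lstsq(Z, t, rcond=None)[0]
X2 = np.column_stack([np.ones(n), t_hat])
beta = np.linalg.lstsq(X2, y, rcond=None)[0]
print("naive OLS:", np.polyfit(t, y, 1)[0], " TSLS:", beta[1])  # TSLS is close to 2.0
```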

Fine-tuned vs. Prompt-tuned Supervised Representations: Which Better Account for Brain Language Representations?

  • paper_url: http://arxiv.org/abs/2310.01854
  • repo_url: None
  • paper_authors: Jingyuan Sun, Marie-Francine Moens
  • for: investigate the effectiveness of prompt-tuning compared to fine-tuning in generating representations that better account for the brain’s language representations
  • methods: using neural decoding to compare the performance of prompt-tuned and fine-tuned representations in predicting linguistic stimuli from brain activities
  • results: full fine-tuning does not significantly outperform prompt-tuning in neural decoding, and tasks dealing with fine-grained concept meaning yield representations that better decode brain activation patterns than other tasks.
    Abstract To decipher the algorithm underlying the human brain's language representation, previous work probed brain responses to language input with pre-trained artificial neural network (ANN) models fine-tuned on NLU tasks. However, full fine-tuning generally updates the entire parametric space and distorts pre-trained features, cognitively inconsistent with the brain's robust multi-task learning ability. Prompt-tuning, in contrast, protects pre-trained weights and learns task-specific embeddings to fit a task. Could prompt-tuning generate representations that better account for the brain's language representations than fine-tuning? If so, what kind of NLU task leads a pre-trained model to better decode the information represented in the human brain? We investigate these questions by comparing prompt-tuned and fine-tuned representations in neural decoding, that is predicting the linguistic stimulus from the brain activities evoked by the stimulus. We find that on none of the 10 NLU tasks, full fine-tuning significantly outperforms prompt-tuning in neural decoding, implicating that a more brain-consistent tuning method yields representations that better correlate with brain data. Moreover, we identify that tasks dealing with fine-grained concept meaning yield representations that better decode brain activation patterns than other tasks, especially the syntactic chunking task. This indicates that our brain encodes more fine-grained concept information than shallow syntactic information when representing languages.
    摘要 为了解析人脑语言表示背后的机制,先前的研究使用在 NLU 任务上经过全量微调的预训练人工神经网络(ANN)模型来探测大脑对语言输入的响应。然而,全量微调通常会更新整个参数空间并扭曲预训练特征,这与人脑稳健的多任务学习能力并不一致。相比之下,prompt-tuning 保留预训练权重,仅学习任务特定的嵌入来适配任务。那么,prompt-tuning 能否生成比全量微调更符合人脑语言表示的表示?如果可以,哪类 NLU 任务能让预训练模型更好地解码人脑中表示的信息?我们通过神经解码(即从语言刺激诱发的大脑活动中预测该语言刺激)来比较 prompt-tuned 与全量微调的表示。结果显示,在全部 10 个 NLU 任务中,全量微调在神经解码上都没有显著优于 prompt-tuning,说明更符合大脑学习方式的调优方法能产生与大脑数据相关性更高的表示。此外,我们发现处理细粒度概念语义的任务所产生的表示,比其他任务(尤其是句法组块任务)更能解码大脑激活模式。这表明人脑在表示语言时,编码的细粒度概念信息多于浅层句法信息。
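
下面给出神经解码这一评测方式的简化示意:用岭回归从大脑活动预测刺激(语言)特征,并以留出集上的相关性打分;数组形状与正则系数均为占位取值,并非论文的具体配置。

```python
# Sketch of the neural-decoding setup used to compare representations: a ridge
# regression predicting stimulus features from brain activity, scored by held-out
# correlation. Array shapes, alpha, and the random data are placeholders.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

n_trials, n_voxels, n_feat = 200, 1000, 768
brain = np.random.randn(n_trials, n_voxels)            # fMRI responses
stimulus = np.random.randn(n_trials, n_feat)           # prompt- or fine-tuned features

Xtr, Xte, Ytr, Yte = train_test_split(brain, stimulus, test_size=0.2, random_state=0)
decoder = Ridge(alpha=10.0).fit(Xtr, Ytr)
pred = decoder.predict(Xte)

# per-dimension Pearson correlation between predicted and true stimulus features
corr = [np.corrcoef(pred[:, j], Yte[:, j])[0, 1] for j in range(n_feat)]
print("mean decoding correlation:", float(np.mean(corr)))
```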

LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

  • paper_url: http://arxiv.org/abs/2310.01852
  • repo_url: https://github.com/pku-yuangroup/languagebind
  • paper_authors: Bin Zhu, Bin Lin, Munan Ning, Yang Yan, Jiaxi Cui, HongFa Wang, Yatian Pang, Wenhao Jiang, Junwu Zhang, Zongwei Li, Wancai Zhang, Zhifeng Li, Wei Liu, Li Yuan
  • for: 提高多modalities视频语言模型的性能(improve the performance of multimodal video language models)
  • methods: 使用语言作为各模态之间的绑定(use language as the binding element),即冻结从 VL 预训练获得的语言编码器,然后通过对比学习训练其他模态的编码器。(freeze the language encoder obtained from VL pre-training, and then train encoders for other modalities with contrastive learning)
  • results: 在MSR-VTT数据集上表现出优于ImageBind的5.8% R@1(outperform ImageBind by 5.8% R@1 on the MSR-VTT dataset),并在其他多个任务中也表现出优异(and also outperform in other multiple tasks)
    Abstract The video-language (VL) pretraining has achieved remarkable improvement in multiple downstream tasks. However, the current VL pretraining framework is hard to extend to multiple modalities (N modalities, N>=3) beyond vision and language. We thus propose LanguageBind, taking the language as the bind across different modalities because the language modality is well-explored and contains rich semantics. Specifically, we freeze the language encoder acquired by VL pretraining, then train encoders for other modalities with contrastive learning. As a result, all modalities are mapped to a shared feature space, implementing multi-modal semantic alignment. While LanguageBind ensures that we can extend VL modalities to N modalities, we also need a high-quality dataset with alignment data pairs centered on language. We thus propose VIDAL-10M with Video, Infrared, Depth, Audio and their corresponding Language, naming as VIDAL-10M. In our VIDAL-10M, all videos are from short video platforms with complete semantics rather than truncated segments from long videos, and all the video, depth, infrared, and audio modalities are aligned to their textual descriptions. After pretraining on VIDAL-10M, we outperform ImageBind by 5.8% R@1 on the MSR-VTT dataset with only 15% of the parameters in the zero-shot video-text retrieval task. Beyond this, our LanguageBind has greatly improved in the zero-shot video, audio, depth, and infrared understanding tasks. For instance, LanguageBind surpassing InterVideo by 1.9% on MSR-VTT, 8.8% on MSVD, 6.3% on DiDeMo, and 4.4% on ActivityNet. On the LLVIP and NYU-D datasets, LanguageBind outperforms ImageBind with 23.8% and 11.1% top-1 accuracy. Code address: https://github.com/PKU-YuanGroup/LanguageBind.
    摘要 视频语言(VL)预训练已经实现了多个下游任务中的显著改进。然而,现有的VL预训练框架难以扩展到多个modalities(N模式,N≥3)以外的视觉语言。我们因此提出了LanguageBind,将语言作为所有modalities之间的绑定因素。具体来说,我们冻结获得的语言encoder,然后使用对比学习训练其他modalities的encoder。这使得所有modalities都映射到共同的特征空间,实现多modal semantic alignment。LanguageBind确保了我们可以扩展VL modalities到N modalities,但我们还需要一个高质量的数据集,其中包含对齐数据对。我们因此提出了VIDAL-10M,它包含视频、红外、深度、音频和其对应的语言。在我们的VIDAL-10M中,所有视频都来自短视频平台,完整的 semantics 而不是长视频中 truncated 的segment,而所有视频、深度、红外和音频modalities都与其文本描述进行了对齐。在VIDAL-10M上进行预训练后,我们在MSR-VTT数据集上出现了与ImageBind的5.8% R@1的提升,只使用15%的参数。此外,LanguageBind在零shot video、音频、深度和红外理解任务中也有了大幅提升。例如,LanguageBind在 MSR-VTT 上超过 InterVideo by 1.9%,在 MSVD 上超过 InterVideo by 8.8%,在 DiDeMo 上超过 InterVideo by 6.3%,在 ActivityNet 上超过 InterVideo by 4.4%。在 LLVIP 和 NYU-D 数据集上,LanguageBind也超过 ImageBind,具体的top-1准确率分别是23.8%和11.1%。代码地址:https://github.com/PKU-YuanGroup/LanguageBind。
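
以下 PyTorch 小示例示意 LanguageBind 的训练配方:语言编码器保持冻结,新模态编码器通过对称的 CLIP 式对比损失向语言特征对齐;示例中的两个编码器只是占位的线性层,并非真实模型。

```python
# Conceptual sketch of the recipe: keep the language encoder frozen and train
# another modality's encoder with a symmetric contrastive (CLIP-style) loss
# against the language features. The encoders here are stand-in layers only.
import torch
import torch.nn as nn
import torch.nn.functional as F

text_encoder = nn.Linear(300, 256)        # placeholder for the pretrained VL text encoder
depth_encoder = nn.Linear(128, 256)       # new modality encoder to be trained
for p in text_encoder.parameters():
    p.requires_grad = False               # the language tower stays frozen

def bind_loss(depth_in, text_in, tau=0.07):
    d = F.normalize(depth_encoder(depth_in), dim=1)
    t = F.normalize(text_encoder(text_in), dim=1)
    logits = d @ t.t() / tau
    labels = torch.arange(d.size(0))
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2

loss = bind_loss(torch.randn(16, 128), torch.randn(16, 300))
loss.backward()                           # gradients flow only into depth_encoder
print(loss.item())
```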

Zero-Shot Refinement of Buildings’ Segmentation Models using SAM

  • paper_url: http://arxiv.org/abs/2310.01845
  • repo_url: None
  • paper_authors: Ali Mayladan, Hasan Nasrallah, Hasan Moughnieh, Mustafa Shukor, Ali J. Ghandour
  • for: This paper aims to adapt foundation models for specific domains, specifically remote sensing imagery, to improve their generalization and recognition abilities.
  • methods: The authors introduce a novel approach that integrates a pre-trained CNN as a prompt generator to augment the Segment Anything Model (SAM) with recognition abilities. They evaluate their method on three remote sensing datasets and achieve improved performance.
  • results: The authors report a 5.47% increase in IoU and a 4.81% improvement in F1-score for out-of-distribution performance on the WHU dataset, and a 2.72% and 1.58% increase in True-Positive-IoU and True-Positive-F1 score, respectively, for in-distribution performance on the WHU dataset.
    Abstract Foundation models have excelled in various tasks but are often evaluated on general benchmarks. The adaptation of these models for specific domains, such as remote sensing imagery, remains an underexplored area. In remote sensing, precise building instance segmentation is vital for applications like urban planning. While Convolutional Neural Networks (CNNs) perform well, their generalization can be limited. For this aim, we present a novel approach to adapt foundation models to address existing models' generalization dropback. Among several models, our focus centers on the Segment Anything Model (SAM), a potent foundation model renowned for its prowess in class-agnostic image segmentation capabilities. We start by identifying the limitations of SAM, revealing its suboptimal performance when applied to remote sensing imagery. Moreover, SAM does not offer recognition abilities and thus fails to classify and tag localized objects. To address these limitations, we introduce different prompting strategies, including integrating a pre-trained CNN as a prompt generator. This novel approach augments SAM with recognition abilities, a first of its kind. We evaluated our method on three remote sensing datasets, including the WHU Buildings dataset, the Massachusetts Buildings dataset, and the AICrowd Mapping Challenge. For out-of-distribution performance on the WHU dataset, we achieve a 5.47% increase in IoU and a 4.81% improvement in F1-score. For in-distribution performance on the WHU dataset, we observe a 2.72% and 1.58% increase in True-Positive-IoU and True-Positive-F1 score, respectively. We intend to release our code repository, hoping to inspire further exploration of foundation models for domain-specific tasks within the remote sensing community.
    摘要 基础模型在多种任务中表现出色,但它们通常只在通用基准上进行评估;将这些模型适配到遥感图像等特定领域仍是一个未被充分探索的方向。在遥感中,精准的建筑物实例分割对城市规划等应用至关重要。虽然卷积神经网络(CNN)表现良好,但其泛化能力有限。为了解决这个问题,我们提出了一种适配基础模型的新方法,以弥补现有模型的泛化不足。在众多模型中,我们聚焦于 Segment Anything Model(SAM),这是一个以类别无关的图像分割能力著称的强大基础模型。我们首先指出了 SAM 的局限:它在遥感图像上表现欠佳,而且不具备识别能力,无法对局部对象进行分类和标注。为了解决这些限制,我们提出了不同的提示策略,包括将预训练的 CNN 集成为提示生成器。这种新方法首次为 SAM 赋予了识别能力。我们在三个遥感数据集上进行了评估,包括 WHU 建筑数据集、马萨诸塞建筑数据集和 AICrowd Mapping Challenge。对于 WHU 数据集的分布外性能,我们实现了 5.47% 的 IoU 提升和 4.81% 的 F1 分数提升;对于分布内性能,True-Positive-IoU 和 True-Positive-F1 分数分别提升 2.72% 和 1.58%。我们计划公开代码库,希望能启发遥感社区对基础模型在领域特定任务上的进一步探索。
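
下面是一个带假设的示意,演示如何用预训练 CNN 产生的边界框作为提示输入 SAM(调用遵循公开的 segment-anything 接口);其中的 checkpoint 文件名与 cnn_detector 均为假设,需要自行准备,且这只是示意写法,并非论文的完整流程。

```python
# Hedged sketch of prompting SAM with boxes produced by a pre-trained building
# detector. Requires the segment-anything package and a downloaded checkpoint;
# the checkpoint path and the cnn_detector callable are assumptions.
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")  # assumed local file
predictor = SamPredictor(sam)

def segment_buildings(image_rgb, cnn_detector):
    """image_rgb: HxWx3 uint8; cnn_detector returns [x0, y0, x1, y1] boxes."""
    predictor.set_image(image_rgb)
    masks = []
    for box in cnn_detector(image_rgb):                  # the CNN acts as a prompt generator
        m, scores, _ = predictor.predict(box=np.asarray(box), multimask_output=False)
        masks.append(m[0])
    return masks  # one binary mask per detected building
```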

Extending CAM-based XAI methods for Remote Sensing Imagery Segmentation

  • paper_url: http://arxiv.org/abs/2310.01837
  • repo_url: None
  • paper_authors: Abdul Karim Gizzini, Mustafa Shukor, Ali J. Ghandour
  • for: 这 paper 的目的是帮助解释深度学习模型在高分辨率卫星图像中的行为和决策过程,以提高模型的可读性和可信度。
  • methods: 这 paper 使用了一些最新的 XAI 技术,包括适应性抑制和监测 Entropy,来解释建筑物的分类和分割。
  • results: 研究发现,使用 XAI 技术可以帮助理解深度学习模型在高分辨率卫星图像中的行为和决策过程,并提高模型的可读性和可信度。
    Abstract Current AI-based methods do not provide comprehensible physical interpretations of the utilized data, extracted features, and predictions/inference operations. As a result, deep learning models trained using high-resolution satellite imagery lack transparency and explainability and can be merely seen as a black box, which limits their wide-level adoption. Experts need help understanding the complex behavior of AI models and the underlying decision-making process. The explainable artificial intelligence (XAI) field is an emerging field providing means for robust, practical, and trustworthy deployment of AI models. Several XAI techniques have been proposed for image classification tasks, whereas the interpretation of image segmentation remains largely unexplored. This paper offers to bridge this gap by adapting the recent XAI classification algorithms and making them usable for muti-class image segmentation, where we mainly focus on buildings' segmentation from high-resolution satellite images. To benchmark and compare the performance of the proposed approaches, we introduce a new XAI evaluation methodology and metric based on "Entropy" to measure the model uncertainty. Conventional XAI evaluation methods rely mainly on feeding area-of-interest regions from the image back to the pre-trained (utility) model and then calculating the average change in the probability of the target class. Those evaluation metrics lack the needed robustness, and we show that using Entropy to monitor the model uncertainty in segmenting the pixels within the target class is more suitable. We hope this work will pave the way for additional XAI research for image segmentation and applications in the remote sensing discipline.
    摘要 当前基于 AI 的方法无法对其所用数据、提取的特征以及预测/推理过程给出可理解的物理解释。因此,使用高分辨率卫星图像训练的深度学习模型缺乏透明性与可解释性,只能被视为黑盒,这限制了它们的大规模应用。专家需要工具来理解 AI 模型的复杂行为及其内在的决策过程。可解释人工智能(XAI)是一个新兴领域,为 AI 模型的稳健、实用和可信部署提供了手段。针对图像分类任务,已有多种 XAI 技术被提出,而图像分割的可解释性在很大程度上仍未被探索。本文旨在弥补这一空白,将最新的 XAI 分类算法加以改造,使其可用于多类图像分割,并主要关注高分辨率卫星图像中建筑物的分割。为了对所提方法进行基准评估与比较,我们提出了一种基于熵(Entropy)的新 XAI 评估方法与度量,用于衡量模型的不确定性。传统的 XAI 评估方法主要是将图像中的感兴趣区域重新输入预训练(效用)模型,然后计算目标类别概率的平均变化;这类度量缺乏必要的稳健性。我们表明,用熵来监测模型在目标类别像素分割上的不确定性更为合适。我们希望这项工作能为图像分割方向的 XAI 研究以及遥感领域的应用铺平道路。
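
作为基于熵的不确定性度量的简化示意,下面的 NumPy 片段计算目标类像素上的平均预测熵,熵越高表示模型越不确定;数组形状与概率来源均为占位示例。

```python
# Sketch of the entropy-based evaluation idea: measure model uncertainty over the
# pixels inside the target class, where higher entropy indicates a less reliable
# segmentation/explanation. Shapes and the random probabilities are placeholders.
import numpy as np

def mean_pixel_entropy(prob_map, target_mask, eps=1e-8):
    """prob_map: (C, H, W) softmax probabilities; target_mask: (H, W) bool."""
    p = np.clip(prob_map, eps, 1.0)
    entropy = -(p * np.log(p)).sum(axis=0)          # per-pixel entropy, shape (H, W)
    return float(entropy[target_mask].mean())

probs = np.random.dirichlet(np.ones(2), size=(64, 64)).transpose(2, 0, 1)  # (2, 64, 64)
building_mask = np.zeros((64, 64), dtype=bool)
building_mask[16:48, 16:48] = True
print(mean_pixel_entropy(probs, building_mask))
```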

Formalizing Natural Language Intent into Program Specifications via Large Language Models

  • paper_url: http://arxiv.org/abs/2310.01831
  • repo_url: None
  • paper_authors: Madeline Endres, Sarah Fakhoury, Saikat Chakraborty, Shuvendu K. Lahiri
  • for: 本文旨在利用大语言模型(LLM)将非正式自然语言 especifications 翻译成正式方法 postconditions,以提高代码质量和可靠性。
  • methods: 本文使用了多种方法来评估和比较不同的 LLM4nl2post approaches,包括正确性和分类力等指标。同时,本文还使用了质量和量化的方法来评估 LLM4nl2post postconditions 的质量。
  • results: 本文的结果表明,LLM4nl2post 可以生成正确的 postconditions,并且可以distinguish correct code from incorrect code。此外,本文还发现,使用 LLM4nl2post 可以捕捉70个历史bugs。
    Abstract Informal natural language that describes code functionality, such as code comments or function documentation, may contain substantial information about a programs intent. However, there is typically no guarantee that a programs implementation and natural language documentation are aligned. In the case of a conflict, leveraging information in code-adjacent natural language has the potential to enhance fault localization, debugging, and code trustworthiness. In practice, however, this information is often underutilized due to the inherent ambiguity of natural language which makes natural language intent challenging to check programmatically. The "emergent abilities" of Large Language Models (LLMs) have the potential to facilitate the translation of natural language intent to programmatically checkable assertions. However, it is unclear if LLMs can correctly translate informal natural language specifications into formal specifications that match programmer intent. Additionally, it is unclear if such translation could be useful in practice. In this paper, we describe LLM4nl2post, the problem leveraging LLMs for transforming informal natural language to formal method postconditions, expressed as program assertions. We introduce and validate metrics to measure and compare different LLM4nl2post approaches, using the correctness and discriminative power of generated postconditions. We then perform qualitative and quantitative methods to assess the quality of LLM4nl2post postconditions, finding that they are generally correct and able to discriminate incorrect code. Finally, we find that LLM4nl2post via LLMs has the potential to be helpful in practice; specifications generated from natural language were able to catch 70 real-world historical bugs from Defects4J.
    摘要 描述代码功能的非正式自然语言(如代码注释或函数文档)往往包含关于程序意图的大量信息。然而,程序实现与自然语言文档之间通常无法保证一致。当二者冲突时,利用代码旁的自然语言信息有望改进错误定位、调试与代码可信度。但在实践中,由于自然语言本身含义模糊、难以用程序化方式检查,这些信息往往得不到充分利用。大型语言模型(LLM)的涌现能力有望帮助将自然语言意图转化为可程序化检查的断言。然而,LLM 能否把非正式的自然语言规范正确地翻译成符合程序员意图的形式化规范,以及这种翻译在实践中是否有用,目前尚不清楚。本文提出 LLM4nl2post 问题,即利用 LLM 将非正式自然语言转化为以程序断言形式表达的形式化后置条件(postcondition)。我们引入并验证了用于衡量和比较不同 LLM4nl2post 方法的指标,包括所生成后置条件的正确性与判别力。随后,我们通过定性与定量分析评估 LLM4nl2post 后置条件的质量,发现它们总体正确,并能够区分错误代码。最后,我们发现基于 LLM 的 LLM4nl2post 在实践中具有应用潜力:由自然语言生成的规范捕捉到了 Defects4J 中的 70 个真实历史 bug。
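
下面的玩具示例演示:一旦自然语言意图被形式化为后置条件(程序断言),它就能把正确实现与有缺陷的实现区分开;示例中的后置条件为手写说明用例,并非 LLM 的输出。

```python
# Toy harness illustrating how a natural-language intent, once formalized as a
# postcondition (program assertion), can discriminate correct from buggy code.
# The postcondition string is hand-written for illustration, not LLM output.
def correct_abs(x):
    return x if x >= 0 else -x

def buggy_abs(x):
    return x  # forgets to negate negative inputs

# "the result is non-negative and has the same magnitude as the input"
postcondition = "result >= 0 and abs(result) == abs(x)"

def satisfies(func, inputs, post):
    for x in inputs:
        result = func(x)
        if not eval(post, {}, {"x": x, "result": result, "abs": abs}):
            return False
    return True

tests = [-3, -1, 0, 2, 7]
print(satisfies(correct_abs, tests, postcondition))  # True
print(satisfies(buggy_abs, tests, postcondition))    # False: the assertion catches the bug
```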

Trainable Noise Model as an XAI evaluation method: application on Sobol for remote sensing image segmentation

  • paper_url: http://arxiv.org/abs/2310.01828
  • repo_url: None
  • paper_authors: Hossein Shreim, Abdul Karim Gizzini, Ali J. Ghandour
  • for: 这篇论文旨在将基于 Sobol 的无梯度可解释性(XAI)方法适配到遥感图像的语义分割任务,以缓解计算机视觉应用中黑盒模型的可解释性问题。
  • methods: 本文采用 Sobol 方法,并提出一种基于可学习噪声模型的定量评估方法,用于衡量分割场景下 XAI 方法的性能。
  • results: 研究使用所提出的基于噪声的评估技术,对 Seg-Grad-CAM、Seg-Grad-CAM++ 与 Seg-Sobol 三种 XAI 方法在高分辨率卫星图像上进行了基准比较,这是首次在此类数据上运行并评估 XAI 方法。
    Abstract eXplainable Artificial Intelligence (XAI) has emerged as an essential requirement when dealing with mission-critical applications, ensuring transparency and interpretability of the employed black box AI models. The significance of XAI spans various domains, from healthcare to finance, where understanding the decision-making process of deep learning algorithms is essential. Most AI-based computer vision models are often black boxes; hence, providing explainability of deep neural networks in image processing is crucial for their wide adoption and deployment in medical image analysis, autonomous driving, and remote sensing applications. Recently, several XAI methods for image classification tasks have been introduced. On the contrary, image segmentation has received comparatively less attention in the context of explainability, although it is a fundamental task in computer vision applications, especially in remote sensing. Only some research proposes gradient-based XAI algorithms for image segmentation. This paper adapts the recent gradient-free Sobol XAI method for semantic segmentation. To measure the performance of the Sobol method for segmentation, we propose a quantitative XAI evaluation method based on a learnable noise model. The main objective of this model is to induce noise on the explanation maps, where higher induced noise signifies low accuracy and vice versa. A benchmark analysis is conducted to evaluate and compare performance of three XAI methods, including Seg-Grad-CAM, Seg-Grad-CAM++ and Seg-Sobol using the proposed noise-based evaluation technique. This constitutes the first attempt to run and evaluate XAI methods using high-resolution satellite images.
    摘要 可解释人工智能(XAI)已成为关键任务应用中的基本要求,用以确保所采用的黑盒 AI 模型的透明性和可解释性。XAI 在医疗、金融等领域具有重要的意义,因为在这些领域理解深度学习算法的决策过程至关重要。然而,尽管图像分割是计算机视觉应用(特别是遥感)中的基础任务,它在可解释性方面得到的关注相对较少,只有少数研究提出了基于梯度的分割 XAI 算法。本文将最近的无梯度 Sobol XAI 方法适配到语义分割。为评估 Sobol 方法在分割上的性能,我们提出了一种基于可学习噪声模型的定量 XAI 评估方法。这种方法的主要思想是在解释图上诱导噪声,诱导出的噪声越大表示解释准确率越低,反之亦然。我们使用所提出的基于噪声的评估技术,对 Seg-Grad-CAM、Seg-Grad-CAM++ 和 Seg-Sobol 三种 XAI 方法的性能进行了基准比较。这是首次使用高分辨率卫星图像来运行和评估 XAI 方法。

Learning and reusing primitive behaviours to improve Hindsight Experience Replay sample efficiency

  • paper_url: http://arxiv.org/abs/2310.01827
  • repo_url: https://github.com/franroldans/qmp-her
  • paper_authors: Francisco Roldan Sanchez, Qiang Wang, David Cordova Bulens, Kevin McGuinness, Stephen Redmond, Noel O’Connor
  • for: 提高RL基于代理人的训练效率,解决目标基于机器人操作任务中的寻找问题
  • methods: 使用先前学习的基本行为指导代理人在探索过程中选择更有奖励的动作,使用评估网络在每个时刻决定使用先前学习的原始策略提议的动作
  • results: 比较HER和其他更高效的变种,在块 manipulate 任务中表现出更高的效率和更快的计算时间,代表代理人可以更快地学习成功策略
    Abstract Hindsight Experience Replay (HER) is a technique used in reinforcement learning (RL) that has proven to be very efficient for training off-policy RL-based agents to solve goal-based robotic manipulation tasks using sparse rewards. Even though HER improves the sample efficiency of RL-based agents by learning from mistakes made in past experiences, it does not provide any guidance while exploring the environment. This leads to very large training times due to the volume of experience required to train an agent using this replay strategy. In this paper, we propose a method that uses primitive behaviours that have been previously learned to solve simple tasks in order to guide the agent toward more rewarding actions during exploration while learning other more complex tasks. This guidance, however, is not executed by a manually designed curriculum, but rather using a critic network to decide at each timestep whether or not to use the actions proposed by the previously-learned primitive policies. We evaluate our method by comparing its performance against HER and other more efficient variations of this algorithm in several block manipulation tasks. We demonstrate the agents can learn a successful policy faster when using our proposed method, both in terms of sample efficiency and computation time. Code is available at https://github.com/franroldans/qmp-her.
    摘要 <> translate("Hindsight Experience Replay(HER)是一种在强化学习(RL)中使用的技术,可以帮助RL-based agent通过过去的经验来解决目标基于机器人操作任务,使用稀有的奖励。尽管HER可以提高RL-based agent的样本效率,但是它不会在环境中探索时提供任何指导。这会导致训练时间很长,因为需要大量的经验来训练一个agent使用这种播放策略。在这篇论文中,我们提出了一种方法,使用先前学习的基本行为来导引agent在探索时选择更有奖励的动作,以便更快地学习更复杂的任务。这些指导不是由手动设计的课程来实施,而是通过一个批评网络来在每个时刻决定是否使用由先前学习的基本策略提出的动作。我们通过对HER和其他更高效的变种进行比较,在块 manipulate任务中评估了我们的方法。我们发现,使用我们的方法,agent可以更快地学习成功策略,同时减少样本数和计算时间。代码可以在https://github.com/franroldans/qmp-her中找到。")Here's the translation in Traditional Chinese:<> translate("Hindsight Experience Replay(HER)是一种在强化学习(RL)中使用的技术,可以帮助RL-based agent通过过去的经验来解决目标基于机器人操作任务,使用稀有的奖励。尽管HER可以提高RL-based agent的样本效率,但是它不会在环境中探索时提供任何指导。这会导致训练时间很长,因为需要大量的经验来训练一个agent使用这种播放策略。在这篇论文中,我们提出了一种方法,使用先前学习的基本行为来导引agent在探索时选择更有奖励的动作,以便更快地学习更复杂的任务。这些指导不是由手动设计的课程来实施,而是通过一个批评网络来在每个时刻决定是否使用由先前学习的基本策略提出的动作。我们通过对HER和其他更高效的变种进行比较,在块 manipulate 任务中评估了我们的方法。我们发现,使用我们的方法,agent可以更快地学习成功策略,同时减少样本数和计算时间。代码可以在https://github.com/franroldans/qmp-her中找到。")

Empirical Study of PEFT techniques for Winter Wheat Segmentation

  • paper_url: http://arxiv.org/abs/2310.01825
  • repo_url: None
  • paper_authors: Mohamad Hasan Zahweh, Hasan Nasrallah, Mustafa Shukor, Ghaleb Faour, Ali J. Ghandour
  • for: This paper aims to explore the feasibility of cross-area and cross-year out-of-distribution generalization for crop monitoring using the State-of-the-Art (SOTA) wheat crop monitoring model.
  • methods: The paper uses various PEFT (Parameter Efficient Fine Tuning) techniques, including BigFit, LoRA, Adaptformer, and prompt tuning, to adapt the SOTA TSViT model for winter wheat field segmentation.
  • results: The paper achieved notable results comparable to those achieved using full fine-tuning methods while training only a mere 0.7% parameters of the whole TSViT architecture. The in-house labeled data-set, referred to as the Beqaa-Lebanon dataset, comprises high-quality annotated polygons for wheat and non-wheat classes with a total surface of 170 kmsq, over five consecutive years. Using Sentinel-2 images, the model achieved an 84% F1-score. Here is the simplified Chinese text for the three key points:
  • for: 这篇论文旨在探讨利用 SOTA 小麦监测模型进行跨区域、跨年度分布外泛化以用于农作物监测的可行性。
  • methods: 论文使用多种 PEFT 技术,包括 BigFit、LoRA、Adaptformer 和 prompt tuning,将 SOTA 的 TSViT 模型适配到冬小麦田分割任务。
  • results: 论文在仅训练整个 TSViT 架构 0.7% 参数的情况下,取得了与全量微调方法相当的显著结果;自行标注的 Beqaa-Lebanon 数据集包含连续五年、总面积约 170 平方公里的高质量小麦与非小麦多边形标注;基于 Sentinel-2 图像,模型取得了 84% 的 F1 分数。
    Abstract Parameter Efficient Fine Tuning (PEFT) techniques have recently experienced significant growth and have been extensively employed to adapt large vision and language models to various domains, enabling satisfactory model performance with minimal computational needs. Despite these advances, more research has yet to delve into potential PEFT applications in real-life scenarios, particularly in the critical domains of remote sensing and crop monitoring. The diversity of climates across different regions and the need for comprehensive large-scale datasets have posed significant obstacles to accurately identify crop types across varying geographic locations and changing growing seasons. This study seeks to bridge this gap by comprehensively exploring the feasibility of cross-area and cross-year out-of-distribution generalization using the State-of-the-Art (SOTA) wheat crop monitoring model. The aim of this work is to explore PEFT approaches for crop monitoring. Specifically, we focus on adapting the SOTA TSViT model to address winter wheat field segmentation, a critical task for crop monitoring and food security. This adaptation process involves integrating different PEFT techniques, including BigFit, LoRA, Adaptformer, and prompt tuning. Using PEFT techniques, we achieved notable results comparable to those achieved using full fine-tuning methods while training only a mere 0.7% parameters of the whole TSViT architecture. The in-house labeled data-set, referred to as the Beqaa-Lebanon dataset, comprises high-quality annotated polygons for wheat and non-wheat classes with a total surface of 170 kmsq, over five consecutive years. Using Sentinel-2 images, our model achieved a 84% F1-score. We intend to publicly release the Lebanese winter wheat data set, code repository, and model weights.
    摘要 参数高效微调(PEFT)技术近年来发展迅速,被广泛用于将大型视觉与语言模型适配到各种领域,在极小计算开销下取得令人满意的模型性能。尽管如此,针对实际应用场景(尤其是遥感与农作物监测这类关键领域)中 PEFT 潜在应用的研究仍然不足。不同地区气候差异大,且需要全面的大规模数据集,这使得跨地理位置、跨生长季节准确识别作物类型面临巨大挑战。本研究旨在弥补这一差距,全面探讨利用最先进(SOTA)小麦作物监测模型进行跨区域、跨年度分布外泛化的可行性,并探索 PEFT 方法在作物监测中的应用。具体而言,我们将 SOTA 的 TSViT 模型适配到冬小麦田分割这一对作物监测与粮食安全至关重要的任务。适配过程集成了多种 PEFT 技术,包括 BigFit、LoRA、Adaptformer 和 prompt tuning。借助 PEFT 技术,我们在仅训练整个 TSViT 架构 0.7% 参数的情况下,取得了与全量微调方法相当的结果。我们自行标注的 Beqaa-Lebanon 数据集包含连续五年、总面积约 170 平方公里的高质量小麦与非小麦类别标注多边形。基于 Sentinel-2 图像,模型取得了 84% 的 F1 分数。我们计划公开黎巴嫩冬小麦数据集、代码仓库和模型权重。
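
作为文中评测的 PEFT 技术之一,下面给出一个最小的 LoRA 线性层示例:冻结的基础权重加上可训练的低秩更新 B·A;秩与缩放系数为示例取值,且这只是通用写法,并非 TSViT 中的具体集成方式。

```python
# Minimal LoRA layer of the kind evaluated in the paper: the frozen base weight is
# augmented with a trainable low-rank update B @ A. Rank and scaling are placeholders,
# and this is a generic sketch rather than the TSViT-specific integration.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # freeze the pretrained weight
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A.t() @ self.B.t()) * self.scale

layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable fraction: {trainable / total:.3%}")    # only a small share of the layer
```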

Mini-BEHAVIOR: A Procedurally Generated Benchmark for Long-horizon Decision-Making in Embodied AI

  • paper_url: http://arxiv.org/abs/2310.01824
  • repo_url: https://github.com/stanfordvl/mini_behavior
  • paper_authors: Emily Jin, Jiaheng Hu, Zhuoyi Huang, Ruohan Zhang, Jiajun Wu, Li Fei-Fei, Roberto Martín-Martín
  • for: 本研究开发了一个新的embodied AIbenchmark,名为Mini-BEHAVIOR,用于测试和评估在人工智能执行任务时的决策和计划能力。
  • methods: 该benchmark使用Gridworld环境,并提供了一系列的任务和学习环境,以测试和评估embodied AI的决策和计划能力。
  • results: Mini-BEHAVIOR提供了一个许多任务的集合,可以用于评估和研究embodied AI的决策和计划能力,并且可以快速进行protoype和学习。
    Abstract We present Mini-BEHAVIOR, a novel benchmark for embodied AI that challenges agents to use reasoning and decision-making skills to solve complex activities that resemble everyday human challenges. The Mini-BEHAVIOR environment is a fast, realistic Gridworld environment that offers the benefits of rapid prototyping and ease of use while preserving a symbolic level of physical realism and complexity found in complex embodied AI benchmarks. We introduce key features such as procedural generation, to enable the creation of countless task variations and support open-ended learning. Mini-BEHAVIOR provides implementations of various household tasks from the original BEHAVIOR benchmark, along with starter code for data collection and reinforcement learning agent training. In essence, Mini-BEHAVIOR offers a fast, open-ended benchmark for evaluating decision-making and planning solutions in embodied AI. It serves as a user-friendly entry point for research and facilitates the evaluation and development of solutions, simplifying their assessment and development while advancing the field of embodied AI. Code is publicly available at https://github.com/StanfordVL/mini_behavior.
    摘要 我们介绍了 Mini-BEHAVIOR,一个新的具身智能(embodied AI)基准,挑战智能体运用推理与决策能力来解决类似日常人类活动的复杂任务。Mini-BEHAVIOR 是一个快速、逼真的 Gridworld 环境,既便于快速原型开发和使用,又保留了复杂具身智能基准中符号层面的物理真实性与复杂度。我们引入了程序化生成等关键特性,可以生成无数任务变体,支持开放式学习。Mini-BEHAVIOR 提供了原始 BEHAVIOR 基准中多种家务任务的实现,以及用于数据收集和强化学习智能体训练的起始代码。简而言之,Mini-BEHAVIOR 是一个快速、开放式的基准,用于评估具身智能中的决策与规划方案;它为研究提供了易于上手的切入点,简化了方案的评估与开发,推动具身智能领域的发展。代码可以在 https://github.com/StanfordVL/mini_behavior 上获取。

MIMO-NeRF: Fast Neural Rendering with Multi-input Multi-output Neural Radiance Fields

  • paper_url: http://arxiv.org/abs/2310.01821
  • repo_url: None
  • paper_authors: Takuhiro Kaneko
  • for: 提高NeRF的渲染速度和质量之间的平衡,以及减少NeRF的训练时间。
  • methods: 将SISO MLP replaced with MIMO MLP,并在组内进行映射,以减少NeRF的MLP数量。自动学习方法可以解决这种抽象的问题,不需要使用预训练模型。
  • results: 通过对比和缺失研究,显示MIMO-NeRF可以在理想的训练时间内获得良好的平衡。此外,MIMO-NeRF可以与之前的NeRF快速技术(如DONeRF和TensoRF)结合使用,以提高渲染质量。
    Abstract Neural radiance fields (NeRFs) have shown impressive results for novel view synthesis. However, they depend on the repetitive use of a single-input single-output multilayer perceptron (SISO MLP) that maps 3D coordinates and view direction to the color and volume density in a sample-wise manner, which slows the rendering. We propose a multi-input multi-output NeRF (MIMO-NeRF) that reduces the number of MLPs running by replacing the SISO MLP with a MIMO MLP and conducting mappings in a group-wise manner. One notable challenge with this approach is that the color and volume density of each point can differ according to a choice of input coordinates in a group, which can lead to some notable ambiguity. We also propose a self-supervised learning method that regularizes the MIMO MLP with multiple fast reformulated MLPs to alleviate this ambiguity without using pretrained models. The results of a comprehensive experimental evaluation including comparative and ablation studies are presented to show that MIMO-NeRF obtains a good trade-off between speed and quality with a reasonable training time. We then demonstrate that MIMO-NeRF is compatible with and complementary to previous advancements in NeRFs by applying it to two representative fast NeRFs, i.e., a NeRF with sample reduction (DONeRF) and a NeRF with alternative representations (TensoRF).
    摘要
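
下面的小示例对比 SISO 与 MIMO 两种形式:MIMO MLP 一次性把一组 G 个(坐标、视角)样本映射为 G 组输出,使 MLP 前向次数约降为 1/G;层宽与组大小均为示意取值,并非论文的网络结构。

```python
# Sketch contrasting the SISO and MIMO formulations: instead of mapping one
# (coordinate, view) sample per forward row, a MIMO MLP maps a whole group of G
# samples jointly, cutting the number of MLP evaluations by roughly G.
# Layer sizes and the group size are illustrative, not the paper's architecture.
import torch
import torch.nn as nn

G, in_dim, out_dim = 4, 6, 4          # 6 = 3D coord + view dir, 4 = RGB + density

siso_mlp = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, out_dim))
mimo_mlp = nn.Sequential(nn.Linear(G * in_dim, 256), nn.ReLU(), nn.Linear(256, G * out_dim))

samples = torch.randn(1024, G, in_dim)                    # 1024 groups of G samples each
siso_out = siso_mlp(samples.reshape(-1, in_dim))          # 1024 * G forward rows
mimo_out = mimo_mlp(samples.reshape(-1, G * in_dim))      # 1024 forward rows
print(siso_out.shape, mimo_out.reshape(-1, G, out_dim).shape)
```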

Discrete, compositional, and symbolic representations through attractor dynamics

  • paper_url: http://arxiv.org/abs/2310.01807
  • repo_url: None
  • paper_authors: Andrew Nam, Eric Elmoznino, Nikolay Malkin, Chen Sun, Yoshua Bengio, Guillaume Lajoie
  • for: 这个论文的目的是探讨符号系统中的可组合性特性,以及如何通过模型化吸引器动力学来实现在符号空间中的可组合性。
  • methods: 该论文使用了建立在吸引器网络之上的新训练方法,以模型符号空间中的吸引器动力学,从而实现在符号空间中的可组合性。
  • results: 研究人员发现,通过吸引器动力学模型,可以在符号空间中实现可组合性,并且该模型可以处理rich的感知输入。此外,研究人员还发现,该模型在处理感知输入时会经历信息瓶颈现象,这可能与生物体的意识经验有关。
    Abstract Compositionality is an important feature of discrete symbolic systems, such as language and programs, as it enables them to have infinite capacity despite a finite symbol set. It serves as a useful abstraction for reasoning in both cognitive science and in AI, yet the interface between continuous and symbolic processing is often imposed by fiat at the algorithmic level, such as by means of quantization or a softmax sampling step. In this work, we explore how discretization could be implemented in a more neurally plausible manner through the modeling of attractor dynamics that partition the continuous representation space into basins that correspond to sequences of symbols. Building on established work in attractor networks and introducing novel training methods, we show that imposing structure in the symbolic space can produce compositionality in the attractor-supported representation space of rich sensory inputs. Lastly, we argue that our model exhibits the process of an information bottleneck that is thought to play a role in conscious experience, decomposing the rich information of a sensory input into stable components encoding symbolic information.
    摘要 组合性是离散符号系统(如语言和程序)的一个重要特性,它使这些系统即便只使用有限的符号集也能拥有无限的表达容量。组合性在认知科学和人工智能中都是一种有用的抽象,但连续处理与符号处理之间的接口往往是在算法层面强行规定的,例如通过量化或 softmax 采样步骤。在这项工作中,我们探索如何以更符合神经机制的方式实现离散化:通过建模吸引子动力学,将连续表示空间划分为对应于符号序列的吸引盆。我们在已有吸引子网络研究的基础上引入新的训练方法,并表明在符号空间中施加结构,可以使吸引子支撑的丰富感知输入表示空间产生组合性。最后,我们论证该模型体现了被认为与意识经验相关的信息瓶颈过程:将感知输入中的丰富信息分解为编码符号信息的稳定成分。
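
作为"通过吸引子动力学实现离散化"的经典示意,下面给出一个最小的 Hopfield 式吸引子网络:存储的符号模式成为不动点,带噪的连续输入被吸引到最近模式的吸引盆;这只是经典示例,并非论文提出的结构或训练方法。

```python
# Minimal Hopfield-style attractor network: stored symbol patterns become fixed
# points, and a noisy continuous input is driven into the basin of the nearest
# pattern. A classical illustration of discretization via attractor dynamics only.
import numpy as np

rng = np.random.default_rng(0)
patterns = rng.choice([-1.0, 1.0], size=(3, 64))         # three stored "symbols"
W = patterns.T @ patterns / patterns.shape[1]             # Hebbian weights
np.fill_diagonal(W, 0.0)

def settle(x, steps=20):
    for _ in range(steps):
        x = np.sign(W @ x)                                # synchronous update
        x[x == 0] = 1.0
    return x

noisy = patterns[1] + 0.8 * rng.normal(size=64)           # corrupted continuous input
recovered = settle(np.sign(noisy))
print("matches stored symbol:", np.array_equal(recovered, patterns[1]))
```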

Improvement and Enhancement of YOLOv5 Small Target Recognition Based on Multi-module Optimization

  • paper_url: http://arxiv.org/abs/2310.01806
  • repo_url: None
  • paper_authors: Qingyang Li, Yuchen Li, Hongyi Duan, JiaLiang Kang, Jianan Zhang, Xueqian Gan, Ruotong Xu
  • for: 这个论文主要针对小目标检测任务中YOLOv5s模型的局限性进行深入研究和改进。
  • methods: 该论文提出了基于GhostNet convolutional模块、RepGFPN颈部模块优化、CA和Transformer的注意机制以及损失函数改进等多种改进策略,以提高模型的精度、回归率和MAP指标。
  • results: 实验结果表明,这些改进策略有效地提高了模型的表现,尤其是在复杂背景和微小目标的实际应用测试中表现出色。
    Abstract In this paper, the limitations of YOLOv5s model on small target detection task are deeply studied and improved. The performance of the model is successfully enhanced by introducing GhostNet-based convolutional module, RepGFPN-based Neck module optimization, CA and Transformer's attention mechanism, and loss function improvement using NWD. The experimental results validate the positive impact of these improvement strategies on model precision, recall and mAP. In particular, the improved model shows significant superiority in dealing with complex backgrounds and tiny targets in real-world application tests. This study provides an effective optimization strategy for the YOLOv5s model on small target detection, and lays a solid foundation for future related research and applications.
    摘要 本文对 YOLOv5s 模型在小目标检测任务中的局限性进行了深入研究和改进。通过引入基于 GhostNet 的卷积模块、基于 RepGFPN 的 Neck 模块优化、CA 与 Transformer 注意力机制,以及使用 NWD 改进损失函数,成功提升了模型的性能。实验结果验证了这些改进策略对模型精度、召回率和 mAP 的积极影响。特别是,改进后的模型在实际应用测试中,对复杂背景和微小目标表现出显著优势。本研究为 YOLOv5s 模型在小目标检测上的优化提供了有效策略,并为后续相关研究与应用奠定了坚实基础。
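
下面给出文中替换进 YOLOv5s 的 GhostNet "Ghost 模块"的一个示意实现:少量普通卷积生成主特征,再用廉价的逐通道卷积生成其余("ghost")特征图,从而降低计算量;通道数等均为示例取值。

```python
# Sketch of a GhostNet "Ghost module" of the kind substituted into YOLOv5s: a small
# primary convolution plus a cheap depthwise convolution generate the remaining
# ("ghost") feature maps, reducing FLOPs versus a full convolution.
# Channel sizes are illustrative.
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    def __init__(self, in_ch, out_ch, ratio=2, dw_kernel=3):
        super().__init__()
        primary_ch = out_ch // ratio
        cheap_ch = out_ch - primary_ch
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, primary_ch, 1, bias=False),
            nn.BatchNorm2d(primary_ch), nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(                      # depthwise: one filter per channel
            nn.Conv2d(primary_ch, cheap_ch, dw_kernel, padding=dw_kernel // 2,
                      groups=primary_ch, bias=False),
            nn.BatchNorm2d(cheap_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)

x = torch.randn(1, 64, 80, 80)
print(GhostModule(64, 128)(x).shape)                     # torch.Size([1, 128, 80, 80])
```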

Comparative study of microgrid optimal scheduling under multi-optimization algorithm fusion

  • paper_url: http://arxiv.org/abs/2310.01805
  • repo_url: None
  • paper_authors: Hongyi Duan, Qingyang Li, Yuchen Li, Jianan Zhang, Yuming Xie
  • for: 本研究旨在探讨微Grid的操作和环境成本之间的关系,通过多bjective优化模型进行探讨。
  • methods: 该研究提出了一种 integrate多种优化算法的方法,包括生物algorithm、Simulated Annealing、Ant Colony Optimization和Particle Swarm Optimization。
  • results: 实验结果表明,这些算法在经济和环境调度下提供了不同的派发结果,揭示了微Grid中 diesel机和微气轮机的不同角色。
    Abstract As global attention on renewable and clean energy grows, the research and implementation of microgrids become paramount. This paper delves into the methodology of exploring the relationship between the operational and environmental costs of microgrids through multi-objective optimization models. By integrating various optimization algorithms like Genetic Algorithm, Simulated Annealing, Ant Colony Optimization, and Particle Swarm Optimization, we propose an integrated approach for microgrid optimization. Simulation results depict that these algorithms provide different dispatch results under economic and environmental dispatch, revealing distinct roles of diesel generators and micro gas turbines in microgrids. Overall, this study offers in-depth insights and practical guidance for microgrid design and operation.
    摘要 为了满足全球对可再生和清洁能源日益增长的关注,微电网的研究和实施变得非常重要。本文介绍了通过多目标优化模型研究微电网运行成本与环境成本之间关系的方法论。通过结合不同的优化算法,如遗传算法、模拟退火算法、蚁群优化算法和粒子群优化算法,我们提出了一种用于微电网优化的综合方法。仿真结果表明,这些算法在经济调度与环境调度下提供了不同的调度结果,揭示了微电网中柴油发电机和微型燃气轮机的不同角色。总的来说,这项研究为微电网设计和运行提供了深入的理解和实践指导。
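
为把"经济调度与环境调度给出不同结果"这一点具体化,下面用一个玩具加权和线性规划示意两台可调机组(柴油发电机与微型燃气轮机)满足固定负荷时的调度;所有系数与限值均为虚构示例,线性规划在此仅作为文中各元启发式算法的简化替代演示。

```python
# Toy weighted-sum dispatch sketch: trade off an economic cost against an emission
# cost for two dispatchable units serving a fixed load. All coefficients and limits
# are made-up illustrative numbers; a linear program stands in for the
# metaheuristics compared in the paper.
from scipy.optimize import linprog

load = 120.0                                  # kW demand to be met
fuel_cost = [0.22, 0.30]                      # $/kWh for diesel, micro gas turbine
emission_cost = [0.90, 0.30]                  # relative emission penalty per kWh
p_min, p_max = [10.0, 10.0], [100.0, 80.0]

def dispatch(weight_env):
    """weight_env in [0, 1]: 0 = purely economic, 1 = purely environmental."""
    c = [(1 - weight_env) * fc + weight_env * ec for fc, ec in zip(fuel_cost, emission_cost)]
    res = linprog(c, A_eq=[[1.0, 1.0]], b_eq=[load],
                  bounds=list(zip(p_min, p_max)), method="highs")
    return res.x

print("economic dispatch:     ", dispatch(0.0))   # favors the cheaper diesel unit
print("environmental dispatch:", dispatch(1.0))   # favors the cleaner gas turbine
```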

Large Language Models Cannot Self-Correct Reasoning Yet

  • paper_url: http://arxiv.org/abs/2310.01798
  • repo_url: None
  • paper_authors: Jie Huang, Xinyun Chen, Swaroop Mishra, Huaixiu Steven Zheng, Adams Wei Yu, Xinying Song, Denny Zhou
  • for: 本研究旨在审视 LLM 中自我修正(self-correction)机制的作用与局限,以明确其在提升生成内容准确性与适当性方面的真实潜力。
  • methods: 本研究聚焦于内在自我修正,即 LLM 在没有外部反馈的情况下,仅凭自身能力修正初始回答,并以推理任务为背景考察其效果。
  • results: 研究发现,在缺乏外部反馈的情况下,LLM 很难自我修正其回答,有时自我修正后性能反而下降。
    Abstract Large Language Models (LLMs) have emerged as a groundbreaking technology with their unparalleled text generation capabilities across various applications. Nevertheless, concerns persist regarding the accuracy and appropriateness of their generated content. A contemporary methodology, self-correction, has been proposed as a remedy to these issues. Building upon this premise, this paper critically examines the role and efficacy of self-correction within LLMs, shedding light on its true potential and limitations. Central to our investigation is the notion of intrinsic self-correction, whereby an LLM attempts to correct its initial responses based solely on its inherent capabilities, without the crutch of external feedback. In the context of reasoning, our research indicates that LLMs struggle to self-correct their responses without external feedback, and at times, their performance might even degrade post self-correction. Drawing from these insights, we offer suggestions for future research and practical applications in this field.
    摘要 大型语言模型(LLM)凭借其在各类应用中无与伦比的文本生成能力,已成为一项突破性技术。然而,人们仍然担忧其生成内容的准确性与适当性。自我修正作为一种当代方法,被提出用于缓解这些问题。在此前提下,本文批判性地考察了自我修正在 LLM 中的作用与效果,揭示其真实潜力与局限。我们研究的核心是内在自我修正的概念,即 LLM 仅依靠自身固有能力、不借助外部反馈来修正其初始回答。在推理任务上的研究表明,LLM 在没有外部反馈时难以自我修正其回答,有时其性能在自我修正后甚至会下降。基于这些发现,我们为该领域的后续研究和实际应用提出了建议。
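
下面给出论文所研究的"内在自我修正"循环的结构示意:模型在没有外部反馈的情况下自我批评并修改答案;其中的 llm() 只是占位桩函数。论文的结论是,换成真实模型后,这一循环往往无法提升推理准确率,甚至可能使其下降。

```python
# Structural sketch of the intrinsic self-correction loop studied in the paper:
# the model critiques and revises its own answer without external feedback.
# llm() below is a placeholder stub, not a real model call.
def llm(prompt: str) -> str:
    """Placeholder for a call to an actual language model."""
    return "stub answer"

def self_correct(question: str, rounds: int = 2) -> str:
    answer = llm(f"Question: {question}\nAnswer:")
    for _ in range(rounds):
        critique = llm(f"Question: {question}\nProposed answer: {answer}\n"
                       "Review the answer and point out any mistakes.")
        answer = llm(f"Question: {question}\nPrevious answer: {answer}\n"
                     f"Critique: {critique}\nGive an improved final answer.")
    return answer

print(self_correct("What is 17 * 24?"))
```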

Online POMDP Planning with Anytime Deterministic Guarantees

  • paper_url: http://arxiv.org/abs/2310.01791
  • repo_url: https://github.com/moranbar/Online-POMDP-Planning-with-Anytime-Deterministic-Guarantees
  • paper_authors: Moran Barenboim, Vadim Indelman
  • for: This paper is written for researchers and practitioners interested in planning under uncertainty, particularly in real-world scenarios where autonomous agents operate.
  • methods: The paper uses partially observable Markov decision processes (POMDPs) to mathematically formalize planning under uncertainty. It also employs approximate algorithms such as tree search and sample-based methodologies to solve POMDPs, which offer probabilistic and asymptotic guarantees towards the optimal solution.
  • results: The paper derives a deterministic relationship between a simplified solution and the theoretically optimal one, providing bounds for selecting a subset of observations to branch from while computing a complete belief at each posterior node. The paper also extends these bounds to support reduction of both the state and observation spaces, and demonstrates how these guarantees can be integrated with existing state-of-the-art solvers. Additionally, the paper provides supporting experimental results to substantiate its findings.
    Abstract Autonomous agents operating in real-world scenarios frequently encounter uncertainty and make decisions based on incomplete information. Planning under uncertainty can be mathematically formalized using partially observable Markov decision processes (POMDPs). However, finding an optimal plan for POMDPs can be computationally expensive and is feasible only for small tasks. In recent years, approximate algorithms, such as tree search and sample-based methodologies, have emerged as state-of-the-art POMDP solvers for larger problems. Despite their effectiveness, these algorithms offer only probabilistic and often asymptotic guarantees toward the optimal solution due to their dependence on sampling. To address these limitations, we derive a deterministic relationship between a simplified solution that is easier to obtain and the theoretically optimal one. First, we derive bounds for selecting a subset of the observations to branch from while computing a complete belief at each posterior node. Then, since a complete belief update may be computationally demanding, we extend the bounds to support reduction of both the state and the observation spaces. We demonstrate how our guarantees can be integrated with existing state-of-the-art solvers that sample a subset of states and observations. As a result, the returned solution holds deterministic bounds relative to the optimal policy. Lastly, we substantiate our findings with supporting experimental results.
    摘要 我们在易于求解的简化解与理论最优解之间建立了确定性关系。首先,我们推导了在每个后验节点计算完整信念时,用于选择观测子集进行分支的界。然后,由于完整的信念更新可能代价高昂,我们将这些界扩展到同时支持状态空间与观测空间的缩减。我们展示了如何将这些保证与现有的、基于状态和观测采样的先进求解器相结合,从而使返回的解相对于最优策略具有确定性界。最后,我们通过实验结果验证了这些发现。
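
作为上文所述后验节点的基本运算,下面给出离散 POMDP 的最小信念更新示例;转移矩阵与观测矩阵均为玩具数值。

```python
# Minimal discrete POMDP belief update, the basic operation behind the posterior
# nodes discussed above. Transition/observation matrices are tiny toy values.
import numpy as np

# 2 states, a single action considered, 2 observations
T = np.array([[0.9, 0.1],        # T[s, s'] = P(s' | s, a)
              [0.2, 0.8]])
O = np.array([[0.75, 0.25],      # O[s', o] = P(o | s')
              [0.30, 0.70]])

def belief_update(b, obs):
    predicted = b @ T                      # prior over the next state
    unnormalized = predicted * O[:, obs]   # weight by the observation likelihood
    return unnormalized / unnormalized.sum()

b = np.array([0.5, 0.5])
for obs in [0, 0, 1]:
    b = belief_update(b, obs)
    print(b)
```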

Can large language models provide useful feedback on research papers? A large-scale empirical analysis

  • paper_url: http://arxiv.org/abs/2310.01783
  • repo_url: https://github.com/weixin-liang/llm-scientific-feedback
  • paper_authors: Weixin Liang, Yuhui Zhang, Hancheng Cao, Binglu Wang, Daisy Ding, Xinyu Yang, Kailas Vodrahalli, Siyu He, Daniel Smith, Yian Yin, Daniel McFarland, James Zou
  • for: The paper aims to evaluate the utility of using large language models (LLMs) to generate scientific feedback on research manuscripts.
  • methods: The authors created an automated pipeline using GPT-4 to provide comments on the full PDFs of scientific papers and evaluated the quality of GPT-4’s feedback through two large-scale studies.
  • results: The study found that GPT-4’s generated feedback overlaps with human peer reviewer feedback, and more than half of the users found the feedback helpful. However, the authors also identified several limitations of using LLM-generated feedback.Here are the three points in Simplified Chinese text:
  • for: 本研究旨在评估使用大语言模型(LLM)生成科学评论的有用性。
  • methods: 作者们创建了一个自动化管道,使用GPT-4对科学论文PDF提供反馈,并通过两项大规模研究评估GPT-4生成的反馈质量。
  • results: 研究发现,GPT-4生成的反馈与人类同行评审者的反馈重叠,并且超过50%的用户认为这些反馈是有帮助的。然而,作者们还发现了一些LLM生成反馈的局限性。
    Abstract Expert feedback lays the foundation of rigorous research. However, the rapid growth of scholarly production and intricate knowledge specialization challenge the conventional scientific feedback mechanisms. High-quality peer reviews are increasingly difficult to obtain. Researchers who are more junior or from under-resourced settings have especially hard times getting timely feedback. With the breakthrough of large language models (LLM) such as GPT-4, there is growing interest in using LLMs to generate scientific feedback on research manuscripts. However, the utility of LLM-generated feedback has not been systematically studied. To address this gap, we created an automated pipeline using GPT-4 to provide comments on the full PDFs of scientific papers. We evaluated the quality of GPT-4's feedback through two large-scale studies. We first quantitatively compared GPT-4's generated feedback with human peer reviewer feedback in 15 Nature family journals (3,096 papers in total) and the ICLR machine learning conference (1,709 papers). The overlap in the points raised by GPT-4 and by human reviewers (average overlap 30.85% for Nature journals, 39.23% for ICLR) is comparable to the overlap between two human reviewers (average overlap 28.58% for Nature journals, 35.25% for ICLR). The overlap between GPT-4 and human reviewers is larger for the weaker papers. We then conducted a prospective user study with 308 researchers from 110 US institutions in the field of AI and computational biology to understand how researchers perceive feedback generated by our GPT-4 system on their own papers. Overall, more than half (57.4%) of the users found GPT-4 generated feedback helpful/very helpful and 82.4% found it more beneficial than feedback from at least some human reviewers. While our findings show that LLM-generated feedback can help researchers, we also identify several limitations.
    摘要 专家反馈是严谨科研的基础。但是,随着学术产出的快速增长和知识专业化的日益细分,传统的科学反馈机制面临挑战,高质量的同行评审越来越难以获得;资历较浅或来自资源匮乏环境的研究人员尤其难以及时得到反馈。随着 GPT-4 等大语言模型(LLM)的突破,使用 LLM 为研究论文生成科学反馈的兴趣日益增长。然而,LLM 生成反馈的实用性尚未得到系统性的研究。为了填补这一空白,我们创建了一个自动化管道,使用 GPT-4 对科学论文全文 PDF 给出评论,并通过两项大规模研究评估 GPT-4 生成反馈的质量。首先,我们在 15 本 Nature 系列期刊(共 3,096 篇论文)和 ICLR 机器学习会议(1,709 篇论文)上,定量比较了 GPT-4 生成的反馈与人类同行评审意见。结果显示,GPT-4 与人类评审者所提意见的重叠率(Nature 系列平均 30.85%,ICLR 为 39.23%)与两位人类评审者之间的重叠率(Nature 系列平均 28.58%,ICLR 为 35.25%)相当;并且在较弱的论文上,GPT-4 与人类评审者的重叠率更高。其次,我们对来自美国 110 所机构、从事 AI 与计算生物学研究的 308 名研究人员开展了前瞻性用户研究,了解他们如何看待我们的 GPT-4 系统对其自己论文生成的反馈。总体而言,超过半数(57.4%)的用户认为 GPT-4 生成的反馈有帮助或非常有帮助,82.4% 的用户认为其比至少部分人类评审者的反馈更有益。尽管结果表明 LLM 生成的反馈能够帮助研究人员,我们也指出了其若干局限性。

STAMP: Differentiable Task and Motion Planning via Stein Variational Gradient Descent

  • paper_url: http://arxiv.org/abs/2310.01775
  • repo_url: None
  • paper_authors: Yewon Lee, Philip Huang, Krishna Murthy Jatavallabhula, Andrew Z. Li, Fabian Damken, Eric Heiden, Kevin Smith, Derek Nowrouzezahrai, Fabio Ramos, Florian Shkurti
  • for: solves task and motion planning problems for manipulation tasks, such as using tools or assembling parts, by leveraging parallelization and differentiable simulation to efficiently search for multiple diverse plans.
  • methods: uses Stein Variational Gradient Descent and parallelized differentiable physics simulators on the GPU to efficiently obtain gradients for inference, and employs imitation learning to introduce action abstractions that reduce the inference problem to lower dimensions.
  • results: produces multiple diverse plans in parallel and searches for plans more efficiently compared to existing TAMP baselines.
    Abstract Planning for many manipulation tasks, such as using tools or assembling parts, often requires both symbolic and geometric reasoning. Task and Motion Planning (TAMP) algorithms typically solve these problems by conducting a tree search over high-level task sequences while checking for kinematic and dynamic feasibility. While performant, most existing algorithms are highly inefficient as their time complexity grows exponentially with the number of possible actions and objects. Additionally, they only find a single solution to problems in which many feasible plans may exist. To address these limitations, we propose a novel algorithm called Stein Task and Motion Planning (STAMP) that leverages parallelization and differentiable simulation to efficiently search for multiple diverse plans. STAMP relaxes discrete-and-continuous TAMP problems into continuous optimization problems that can be solved using variational inference. Our algorithm builds upon Stein Variational Gradient Descent, a gradient-based variational inference algorithm, and parallelized differentiable physics simulators on the GPU to efficiently obtain gradients for inference. Further, we employ imitation learning to introduce action abstractions that reduce the inference problem to lower dimensions. We demonstrate our method on two TAMP problems and empirically show that STAMP is able to: 1) produce multiple diverse plans in parallel; and 2) search for plans more efficiently compared to existing TAMP baselines.
    摘要 许多操作任务(如使用工具或装配零件)的规划往往同时需要符号推理和几何推理。任务与运动规划(TAMP)算法通常通过在高层任务序列上进行树搜索并检查运动学与动力学可行性来求解这类问题。尽管效果不错,现有算法大多效率低下,其时间复杂度随可能动作和物体数量呈指数增长,而且对于存在多个可行方案的问题往往只给出一个解。为克服这些局限,我们提出一种名为 Stein Task and Motion Planning(STAMP)的新算法,利用并行化和可微分仿真高效地搜索多个多样化的规划方案。STAMP 将离散-连续混合的 TAMP 问题松弛为可用变分推断求解的连续优化问题。该算法建立在基于梯度的变分推断方法 Stein Variational Gradient Descent 之上,并借助 GPU 上并行化的可微分物理仿真器高效获得推断所需的梯度;此外,我们利用模仿学习引入动作抽象,将推断问题降至更低维度。我们在两个 TAMP 问题上验证了该方法,实证结果表明 STAMP 能够:1)并行生成多个多样化的规划方案;2)比现有 TAMP 基线更高效地搜索方案。
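
STAMP builds on Stein Variational Gradient Descent (SVGD). The sketch below implements the generic SVGD particle update on a toy 2-D Gaussian target, not the paper's planning pipeline; the RBF bandwidth, step size, and iteration count are arbitrary choices.

```python
# Minimal SVGD sketch on a toy 2-D Gaussian target (not the STAMP planner itself).
import numpy as np

def rbf_kernel(x, h=0.5):
    diff = x[:, None, :] - x[None, :, :]          # diff[i, j] = x_i - x_j, shape (n, n, d)
    sq = (diff ** 2).sum(-1)
    k = np.exp(-sq / (2 * h ** 2))                # kernel matrix k(x_i, x_j)
    grad_k = diff * k[..., None] / h ** 2         # grad_k[i, j] = d k(x_j, x_i) / d x_j
    return k, grad_k

def grad_log_p(x, mu, cov_inv):
    return -(x - mu) @ cov_inv                    # score of a Gaussian target

rng = np.random.default_rng(0)
particles = rng.normal(size=(50, 2)) * 3.0
mu = np.array([1.0, -1.0])
cov_inv = np.linalg.inv(np.array([[1.0, 0.3], [0.3, 0.5]]))

step = 0.05
for _ in range(500):
    k, grad_k = rbf_kernel(particles)
    score = grad_log_p(particles, mu, cov_inv)
    # SVGD update: attractive (kernel-weighted score) + repulsive (kernel gradient) terms
    phi = (k @ score + grad_k.sum(axis=1)) / len(particles)
    particles += step * phi

print(particles.mean(axis=0))                      # should approach mu
```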

A simple connection from loss flatness to compressed representations in neural networks

  • paper_url: http://arxiv.org/abs/2310.01770
  • repo_url: None
  • paper_authors: Shirui Chen, Stefano Recanatesi, Eric Shea-Brown
  • for: 研究深度神经网络的泛化能力
  • methods: 使用loss函数的形态和表示 manifold的结构来研究深度神经网络的泛化能力
  • results: 显示在深度神经网络的学习过程中,压缩表示 manifold的体积与损失函数的平坦性有直接的关系,并且这一关系可以通过简单的数学关系来预测。
    Abstract Deep neural networks' generalization capacity has been studied in a variety of ways, including at least two distinct categories of approach: one based on the shape of the loss landscape in parameter space, and the other based on the structure of the representation manifold in feature space (that is, in the space of unit activities). These two approaches are related, but they are rarely studied together and explicitly connected. Here, we present a simple analysis that makes such a connection. We show that, in the last phase of learning of deep neural networks, compression of the volume of the manifold of neural representations correlates with the flatness of the loss around the minima explored by ongoing parameter optimization. We show that this is predicted by a relatively simple mathematical relationship: loss flatness implies compression of neural representations. Our results build closely on prior work of Ma et al. (2021), which shows how flatness (i.e., small eigenvalues of the loss Hessian) develops in late phases of learning and leads to robustness to perturbations in network inputs. Moreover, we show there is no similarly direct connection between local dimensionality and sharpness, suggesting that this property may be controlled by different mechanisms than volume and hence may play a complementary role in neural representations. Overall, we advance a dual perspective on generalization in neural networks in both parameter and feature space.
    摘要 深度神经网络的泛化能力已从多个角度得到研究,其中至少包括两类方法:一类基于参数空间中损失地形的形状,另一类基于特征空间(即单元激活空间)中表示流形的结构。这两类方法彼此相关,却很少被放在一起研究并建立明确联系。本文给出一个简单的分析来建立这种联系:我们证明,在深度神经网络学习的最后阶段,神经表示流形体积的压缩与参数优化所探索的极小值附近损失的平坦性相关,并且这可以由一个相对简单的数学关系所预测——损失平坦意味着神经表示被压缩。我们的结果与 Ma et al. (2021) 的先前工作紧密相关,该工作表明平坦性(即损失 Hessian 的小特征值)在学习后期形成,并带来对网络输入扰动的鲁棒性。此外,我们发现局部维度与尖锐度之间并不存在类似的直接联系,说明该性质可能由与体积不同的机制控制,因而可能在神经表示中扮演互补的角色。总体而言,我们在参数空间与特征空间两个层面上提出了关于神经网络泛化的双重视角。
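
A toy illustration of the two quantities the paper connects, under simplifying assumptions that are not the paper's estimators: sharpness is taken as the top Hessian eigenvalue (here of a quadratic loss with a known Hessian, estimated by power iteration), and the representation "volume" as the log-determinant of the activation covariance.

```python
# Toy proxies for loss sharpness and representation-manifold volume (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

# --- sharpness proxy: top Hessian eigenvalue via power iteration ---
# For a quadratic loss L(w) = 0.5 * w^T A w the Hessian is A, so the estimate can be checked.
A = np.diag([5.0, 1.0, 0.1])                   # stand-in Hessian with known spectrum
def hessian_vector_product(v):
    return A @ v

v = rng.normal(size=3)
for _ in range(100):
    v = hessian_vector_product(v)
    v /= np.linalg.norm(v)
top_eig = v @ hessian_vector_product(v)        # Rayleigh quotient, approx. 5.0
print("sharpness proxy (top Hessian eigenvalue):", top_eig)

# --- volume proxy: log-determinant of the covariance of hidden representations ---
def log_volume(hidden, eps=1e-6):
    hidden = hidden - hidden.mean(axis=0, keepdims=True)
    cov = hidden.T @ hidden / len(hidden)
    sign, logdet = np.linalg.slogdet(cov + eps * np.eye(cov.shape[1]))
    return 0.5 * logdet                        # log of an ellipsoidal volume, up to constants

wide = rng.normal(size=(1000, 8))              # spread-out representations
compressed = wide * np.array([1, 1, 0.1, 0.1, 0.1, 0.01, 0.01, 0.01])
print("log-volume wide:", log_volume(wide), " compressed:", log_volume(compressed))
```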

Differentially Encoded Observation Spaces for Perceptive Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.01767
  • repo_url: https://github.com/a2r-lab/diffcompressdrl
  • paper_authors: Lev Grossman, Brian Plancher
  • for: 这篇论文旨在提高深度强化学习(DRL)系统的训练效率,使其能够在边缘设备上进行学习,以适应环境。
  • methods: 这篇论文提出差分编码的观察空间,将回放缓存中存储的图像观察重新视为视频,并利用无损差分视频编码方案压缩经验回放缓存,在不影响训练性能的情况下减小其内存占用。
  • results: 在 Atari 2600 与 DeepMind Control Suite(DMC)任务上,内存占用分别最多减少 14.2 倍和 16.7 倍;此前需要在闪存与 RAM 之间分页的大规模感知 DRL 训练得以完全在 RAM 中运行,DMC 任务的延迟最多改善 32%。
    Abstract Perceptive deep reinforcement learning (DRL) has led to many recent breakthroughs for complex AI systems leveraging image-based input data. Applications of these results range from super-human level video game agents to dexterous, physically intelligent robots. However, training these perceptive DRL-enabled systems remains incredibly compute and memory intensive, often requiring huge training datasets and large experience replay buffers. This poses a challenge for the next generation of field robots that will need to be able to learn on the edge in order to adapt to their environments. In this paper, we begin to address this issue through differentially encoded observation spaces. By reinterpreting stored image-based observations as a video, we leverage lossless differential video encoding schemes to compress the replay buffer without impacting training performance. We evaluate our approach with three state-of-the-art DRL algorithms and find that differential image encoding reduces the memory footprint by as much as 14.2x and 16.7x across tasks from the Atari 2600 benchmark and the DeepMind Control Suite (DMC) respectively. These savings also enable large-scale perceptive DRL that previously required paging between flash and RAM to be run entirely in RAM, improving the latency of DMC tasks by as much as 32%.
    摘要 基于图像输入的感知深度强化学习(DRL)近年来为复杂 AI 系统带来了许多突破,其应用范围从超越人类水平的视频游戏智能体到灵巧、具备物理智能的机器人。然而,训练这些感知 DRL 系统仍然极其耗费计算与内存,往往需要庞大的训练数据集和很大的经验回放缓存。这对下一代需要在边缘端学习、以适应环境的现场机器人提出了挑战。本文通过差分编码的观察空间着手解决这一问题:将存储的图像观察重新解释为视频,并利用无损差分视频编码方案压缩回放缓存,而不影响训练性能。我们在三种最先进的 DRL 算法上进行评估,发现差分图像编码可使 Atari 2600 基准与 DeepMind Control Suite(DMC)任务的内存占用分别最多减少 14.2 倍和 16.7 倍。这些节省还使得此前需要在闪存与 RAM 之间分页的大规模感知 DRL 能够完全在 RAM 中运行,将 DMC 任务的延迟最多改善 32%。
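
A rough sketch of delta-encoding image observations in a replay buffer: store the first frame plus lossless frame differences and compress each chunk. Frame differencing with zlib is only a stand-in for the video codecs used in the paper.

```python
# Illustrative delta-encoding of consecutive image observations (not the paper's codec).
import zlib
import numpy as np

def encode_episode(frames):
    """frames: list of uint8 arrays of identical shape. Store frame 0 plus int16 deltas."""
    packed = [zlib.compress(frames[0].tobytes())]
    for prev, cur in zip(frames, frames[1:]):
        delta = cur.astype(np.int16) - prev.astype(np.int16)   # lossless difference
        packed.append(zlib.compress(delta.tobytes()))
    return packed

def decode_episode(packed, shape):
    first = np.frombuffer(zlib.decompress(packed[0]), dtype=np.uint8).reshape(shape)
    frames = [first]
    for blob in packed[1:]:
        delta = np.frombuffer(zlib.decompress(blob), dtype=np.int16).reshape(shape)
        frames.append((frames[-1].astype(np.int16) + delta).astype(np.uint8))
    return frames

rng = np.random.default_rng(0)
base = rng.integers(0, 256, size=(84, 84), dtype=np.uint8)
episode = [base]
for _ in range(4):                                 # small, mostly-static changes compress well
    nxt = episode[-1].copy()
    nxt[:5, :5] = rng.integers(0, 256, size=(5, 5), dtype=np.uint8)
    episode.append(nxt)

packed = encode_episode(episode)
print("raw:", sum(f.nbytes for f in episode), "bytes  packed:", sum(len(b) for b in packed), "bytes")
assert all(np.array_equal(a, b) for a, b in zip(episode, decode_episode(packed, base.shape)))
```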

Improved Algorithms for Adversarial Bandits with Unbounded Losses

  • paper_url: http://arxiv.org/abs/2310.01756
  • repo_url: None
  • paper_authors: Mingyu Chen, Xuezhou Zhang
  • for: solve the Adversarial Multi-Armed Bandit (MAB) problem with unbounded losses, where the algorithms have no prior knowledge on the sizes of the losses.
  • methods: presents two algorithms, UMAB-NN and UMAB-G, for non-negative and general unbounded loss respectively.
  • results: achieves the first adaptive and scale-free regret bound without uniform exploration for non-negative unbounded loss, and can learn from arbitrary unbounded loss. Our analysis reveals the asymmetry between positive and negative losses in the MAB problem and provides additional insights.
    Abstract We consider the Adversarial Multi-Armed Bandits (MAB) problem with unbounded losses, where the algorithms have no prior knowledge on the sizes of the losses. We present UMAB-NN and UMAB-G, two algorithms for non-negative and general unbounded loss respectively. For non-negative unbounded loss, UMAB-NN achieves the first adaptive and scale-free regret bound without uniform exploration. Building on that, we further develop UMAB-G that can learn from arbitrary unbounded loss. Our analysis reveals the asymmetry between positive and negative losses in the MAB problem and provides additional insights. We also accompany our theoretical findings with extensive empirical evaluations, showing that our algorithms consistently outperform all existing algorithms that handle unbounded losses.
    摘要 我们研究带无界损失的对抗多臂老虎机(MAB)问题,其中算法对损失的大小没有任何先验知识。我们提出了 UMAB-NN 和 UMAB-G 两种算法,分别针对非负无界损失和一般无界损失。对于非负无界损失,UMAB-NN 在无需均匀探索的情况下取得了首个自适应且尺度无关(scale-free)的遗憾界。在此基础上,我们进一步提出能够从任意无界损失中学习的 UMAB-G。我们的分析揭示了 MAB 问题中正负损失之间的不对称性,并提供了更多洞见。我们还通过大量实验验证了理论结果,表明我们的算法始终优于现有的处理无界损失的算法。

Blending Imitation and Reinforcement Learning for Robust Policy Improvement

  • paper_url: http://arxiv.org/abs/2310.01737
  • repo_url: None
  • paper_authors: Xuefeng Liu, Takuma Yoneda, Rick L. Stevens, Matthew R. Walter, Yuxin Chen
  • for: 降低强化学习的样本复杂度、提高样本效率,使强化学习能够更广泛地应用于不同领域。
  • methods: combines imitation learning (IL) and reinforcement learning (RL), using oracle queries to facilitate exploration and gradually transitioning to RL as learning unfolds.
  • results: 在多个 benchmark 领域中表现出色,优于现有的最先进方法。
    Abstract While reinforcement learning (RL) has shown promising performance, its sample complexity continues to be a substantial hurdle, restricting its broader application across a variety of domains. Imitation learning (IL) utilizes oracles to improve sample efficiency, yet it is often constrained by the quality of the oracles deployed. To address this, we introduce Robust Policy Improvement (RPI), which actively interleaves between IL and RL based on an online estimate of their performance. RPI draws on the strengths of IL, using oracle queries to facilitate exploration, an aspect that is notably challenging in sparse-reward RL, particularly during the early stages of learning. As learning unfolds, RPI gradually transitions to RL, effectively treating the learned policy as an improved oracle. This algorithm is capable of learning from and improving upon a diverse set of black-box oracles. Integral to RPI are Robust Active Policy Selection (RAPS) and Robust Policy Gradient (RPG), both of which reason over whether to perform state-wise imitation from the oracles or learn from its own value function when the learner's performance surpasses that of the oracles in a specific state. Empirical evaluations and theoretical analysis validate that RPI excels in comparison to existing state-of-the-art methodologies, demonstrating superior performance across various benchmark domains.
    摘要 强化学习(RL)已经展现出可观的性能,但其样本复杂度仍是一大障碍,限制了它在多个领域的更广泛应用。模仿学习(IL)利用 oracle 来提高样本效率,但通常受到所用 oracle 质量的限制。为此,我们提出 Robust Policy Improvement(RPI),它基于对 IL 与 RL 表现的在线估计,在二者之间主动交替。RPI 借助 IL 的优势,利用对 oracle 的查询来促进探索,而这在稀疏奖励的强化学习中(尤其是学习早期)尤为困难;随着学习的推进,RPI 逐渐过渡到 RL,将已学得的策略视为改进后的 oracle。该算法能够从一组多样的黑盒 oracle 中学习并超越它们。RPI 的核心组成部分是鲁棒主动策略选择(RAPS)和鲁棒策略梯度(RPG),二者都会在具体状态下判断:当学习者的表现超过 oracle 时,是逐状态地模仿 oracle,还是依据自身的值函数进行学习。实验评估与理论分析均验证了 RPI 优于现有的最先进方法,在多个基准领域中表现更佳。
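
A highly simplified sketch of the state-wise switching idea: imitate the oracle where its estimated value exceeds the learner's, otherwise act with the learner. All value functions and policies below are dummies; the paper's RAPS/RPG procedures are considerably more involved.

```python
# Toy sketch of state-wise blending of imitation and self-improvement (not the paper's algorithm).
def oracle_value(state):          # dummy oracle value estimate
    return 1.0 - 0.1 * state

def learner_value(state):         # dummy learner value estimate (improves with state index here)
    return 0.05 * state

def oracle_action(state):
    return "oracle_act"

def learner_action(state):
    return "learner_act"

def choose_action(state):
    """Imitate the oracle only where it is (estimated to be) better than the learner."""
    if oracle_value(state) > learner_value(state):
        return oracle_action(state), "imitation"
    return learner_action(state), "reinforcement"

for state in range(0, 12, 2):
    action, mode = choose_action(state)
    print(f"state={state:2d}  mode={mode:13s}  action={action}")
```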

Learning Expected Appearances for Intraoperative Registration during Neurosurgery

  • paper_url: http://arxiv.org/abs/2310.01735
  • repo_url: None
  • paper_authors: Nazim Haouchine, Reuben Dorent, Parikshit Juvekar, Erickson Torio, William M. Wells III, Tina Kapur, Alexandra J. Golby, Sarah Frisken
  • for: 这个论文旨在提出一种新的术中患者到图像配准方法,通过学习预期外观(Expected Appearances)来实现。
  • methods: 该方法利用术前医学影像为特定病人合成预期视图,并通过最小化术中显微镜实时视图与合成预期纹理之间的差异来估计相机位姿。
  • results: 该方法在 synthetic 数据和6个临床案例的回顾数据上表现出优于当前临床标准的匹配精度。
    Abstract We present a novel method for intraoperative patient-to-image registration by learning Expected Appearances. Our method uses preoperative imaging to synthesize patient-specific expected views through a surgical microscope for a predicted range of transformations. Our method estimates the camera pose by minimizing the dissimilarity between the intraoperative 2D view through the optical microscope and the synthesized expected texture. In contrast to conventional methods, our approach transfers the processing tasks to the preoperative stage, reducing thereby the impact of low-resolution, distorted, and noisy intraoperative images, that often degrade the registration accuracy. We applied our method in the context of neuronavigation during brain surgery. We evaluated our approach on synthetic data and on retrospective data from 6 clinical cases. Our method outperformed state-of-the-art methods and achieved accuracies that met current clinical standards.
    摘要 我们提出了一种新的术中患者到图像配准方法,通过学习预期外观(Expected Appearances)来实现。该方法利用术前影像,针对预测的变换范围合成特定患者经手术显微镜观察到的预期视图,并通过最小化术中光学显微镜下的二维实时视图与合成预期纹理之间的差异来估计相机位姿。与传统方法不同,我们的方法将处理任务转移到术前阶段,从而减少了低分辨率、畸变和噪声等术中图像因素对配准精度的影响。我们在脑外科手术的神经导航场景中应用了该方法,并在合成数据和 6 个临床病例的回顾性数据上进行了评估。我们的方法优于现有最先进方法,达到了符合当前临床标准的精度。
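
A bare-bones sketch of the matching step: given expected-appearance images pre-rendered for candidate poses, pick the pose whose rendering is most similar to the live view. Mean squared intensity error stands in for the dissimilarity measure used in the paper, and the example images are synthetic.

```python
# Illustrative pose selection by comparing the live view against pre-rendered expected views.
import numpy as np

def estimate_pose(live_view, expected_views):
    """expected_views: list of (pose, image) pairs rendered preoperatively."""
    best_pose, best_err = None, np.inf
    for pose, rendered in expected_views:
        err = np.mean((live_view.astype(np.float32) - rendered.astype(np.float32)) ** 2)
        if err < best_err:
            best_pose, best_err = pose, err
    return best_pose, best_err

rng = np.random.default_rng(0)
scene = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
candidates = [((0, 0), scene),
              ((1, 0), np.roll(scene, 3, axis=1)),   # pretend nearby poses shift the view
              ((0, 1), np.roll(scene, 3, axis=0))]
live = np.roll(scene, 3, axis=1)                      # live view matches the second candidate
print(estimate_pose(live, candidates))                # -> ((1, 0), 0.0)
```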

Nugget: Neural Agglomerative Embeddings of Text

  • paper_url: http://arxiv.org/abs/2310.01732
  • repo_url: None
  • paper_authors: Guanghui Qin, Benjamin Van Durme
  • for: 在现代语言理解中,嵌入文本序列是一项广泛的需求。现有方法主要集中在固定大小的表示上,这会带来问题,因为文本所含的信息量通常随输入长度而变化。我们提出了一种名为块(Nugget)的解决方案,它通过动态选择输入 token 来编码语言。这些块通过自动编码和机器翻译任务学习得到,能够直观地将语言分割成有意义的单元。
  • methods: 我们使用了自动编码和机器翻译任务来学习块。
  • results: 我们证明了块在 semantic comparison 任务中超过相关的方法表现。此外,我们还表明了这些紧凑的单元可以扩大语言模型(LM)的语言上下文窗口,因此可能在未来的语言模型中引入更大量的内容。
    Abstract Embedding text sequences is a widespread requirement in modern language understanding. Existing approaches focus largely on constant-size representations. This is problematic, as the amount of information contained in text often varies with the length of the input. We propose a solution called Nugget, which encodes language into a representation based on a dynamically selected subset of input tokens. These nuggets are learned through tasks like autoencoding and machine translation, and intuitively segment language into meaningful units. We demonstrate Nugget outperforms related approaches in tasks involving semantic comparison. Finally, we illustrate these compact units allow for expanding the contextual window of a language model (LM), suggesting new future LMs that can condition on significantly larger amounts of content.
    摘要 嵌入文本序列是现代语言理解中广泛存在的需求。现有方法主要集中在固定大小的表示上,这会带来问题,因为文本所含的信息量通常随输入长度而变化。我们提出一种名为块(Nugget)的解决方案,它基于动态选择的输入 token 子集来编码语言。这些块通过自动编码和机器翻译等任务学习得到,能够直观地将语言分割成有意义的单元。我们展示了 Nugget 在涉及语义比较的任务中优于相关方法。最后,我们说明这些紧凑的单元可以扩大语言模型(LM)的上下文窗口,预示未来的语言模型有望以显著更多的内容作为条件。
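
A tiny sketch of the selection step at the heart of the idea: score every token and keep only the top-k token states as the "nuggets". The scorer weights and the keep ratio are placeholders; in the paper the selection is trained end-to-end through tasks such as autoencoding and translation.

```python
# Illustrative dynamic token selection ("nuggets") with placeholder weights.
import numpy as np

rng = np.random.default_rng(0)

def select_nuggets(token_states, ratio=0.25):
    """token_states: (seq_len, dim). Keep the top-k scoring token states."""
    seq_len, dim = token_states.shape
    scorer = rng.normal(size=dim)                  # stand-in for a learned scoring head
    scores = token_states @ scorer                 # one scalar score per token
    k = max(1, int(seq_len * ratio))
    keep = np.sort(np.argsort(scores)[-k:])        # top-k tokens, in original order
    return token_states[keep], keep

states = rng.normal(size=(16, 8))                  # 16 tokens, 8-dim states
nuggets, positions = select_nuggets(states)
print("kept positions:", positions, " compressed shape:", nuggets.shape)
```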

Time-LLM: Time Series Forecasting by Reprogramming Large Language Models

  • paper_url: http://arxiv.org/abs/2310.01728
  • repo_url: None
  • paper_authors: Ming Jin, Shiyu Wang, Lintao Ma, Zhixuan Chu, James Y. Zhang, Xiaoming Shi, Pin-Yu Chen, Yuxuan Liang, Yuan-Fang Li, Shirui Pan, Qingsong Wen
  • for: 这个研究是为了实现一个可以处理多种时间序列数据的通用时间序列预测模型。
  • methods: 这个研究提出了名为 Time-LLM 的重编程(reprogramming)框架,在保持主干大语言模型(LLM)不变的前提下将其改用于时间序列预测,先用文本原型对输入时间序列进行重编程,并使用 Prompt-as-Prefix(PaP)技术来增强模型对时间序列数据的推理能力。
  • results: 研究结果显示,Time-LLM 的预测性能优于专门的时间序列预测模型,并在少样本(few-shot)和零样本(zero-shot)学习情形下都表现出色。
    Abstract Time series forecasting holds significant importance in many real-world dynamic systems and has been extensively studied. Unlike natural language process (NLP) and computer vision (CV), where a single large model can tackle multiple tasks, models for time series forecasting are often specialized, necessitating distinct designs for different tasks and applications. While pre-trained foundation models have made impressive strides in NLP and CV, their development in time series domains has been constrained by data sparsity. Recent studies have revealed that large language models (LLMs) possess robust pattern recognition and reasoning abilities over complex sequences of tokens. However, the challenge remains in effectively aligning the modalities of time series data and natural language to leverage these capabilities. In this work, we present Time-LLM, a reprogramming framework to repurpose LLMs for general time series forecasting with the backbone language models kept intact. We begin by reprogramming the input time series with text prototypes before feeding it into the frozen LLM to align the two modalities. To augment the LLM's ability to reason with time series data, we propose Prompt-as-Prefix (PaP), which enriches the input context and directs the transformation of reprogrammed input patches. The transformed time series patches from the LLM are finally projected to obtain the forecasts. Our comprehensive evaluations demonstrate that Time-LLM is a powerful time series learner that outperforms state-of-the-art, specialized forecasting models. Moreover, Time-LLM excels in both few-shot and zero-shot learning scenarios.
    摘要 时间序列预测在许多现实世界的动态系统中具有重要意义,并已得到广泛研究。与自然语言处理(NLP)和计算机视觉(CV)中单个大模型可以处理多种任务不同,时间序列预测模型往往是专门化的,需要针对不同任务和应用分别设计。尽管预训练基础模型在 NLP 和 CV 领域取得了令人瞩目的进展,但在时间序列领域,其发展一直受制于数据稀疏。近期研究表明,大语言模型(LLM)对复杂的 token 序列具有稳健的模式识别与推理能力,然而如何有效对齐时间序列数据与自然语言这两种模态、以利用这些能力,仍是一个挑战。在这种背景下,我们提出 Time-LLM 重编程框架,在保持主干语言模型不变的情况下将 LLM 改用于通用时间序列预测。我们首先用文本原型对输入时间序列进行重编程,再输入冻结的 LLM,以对齐两种模态;为增强 LLM 对时间序列数据的推理能力,我们提出 Prompt-as-Prefix(PaP),以丰富输入上下文并引导重编程后的输入分块(patch)的转换;最后对 LLM 输出的时间序列分块进行投影以获得预测结果。全面的评估表明,Time-LLM 是强大的时间序列学习器,优于当前最先进的专门预测模型,并且在少样本和零样本学习场景下同样表现出色。
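
A loose sketch of the reprogramming step: slice the series into patches, embed them, and re-express each patch as a mixture of a small set of "text prototype" embeddings before prepending a prompt embedding (a stand-in for Prompt-as-Prefix). All matrices here are random placeholders rather than learned parameters.

```python
# Illustrative patch reprogramming with text prototypes (all parameters are random stand-ins).
import numpy as np

rng = np.random.default_rng(0)

def make_patches(series, patch_len=16, stride=8):
    return np.stack([series[i:i + patch_len]
                     for i in range(0, len(series) - patch_len + 1, stride)])

series = np.sin(np.linspace(0, 12 * np.pi, 256)) + 0.1 * rng.normal(size=256)
patches = make_patches(series)                               # (num_patches, 16)

d_model = 32
patch_proj = rng.normal(size=(patches.shape[1], d_model))    # patch embedding (placeholder)
prototypes = rng.normal(size=(8, d_model))                   # "text prototype" embeddings (placeholder)

patch_emb = patches @ patch_proj                             # (num_patches, d_model)
attn = patch_emb @ prototypes.T                              # similarity to each prototype
attn = attn - attn.max(axis=1, keepdims=True)                # numerical stability for softmax
attn = np.exp(attn) / np.exp(attn).sum(axis=1, keepdims=True)
reprogrammed = attn @ prototypes                             # patches re-expressed via prototypes

prompt_prefix = rng.normal(size=(1, d_model))                # stand-in for Prompt-as-Prefix
llm_input = np.concatenate([prompt_prefix, reprogrammed], axis=0)
print("sequence fed to the frozen LLM:", llm_input.shape)
```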

Can GPT-4 Replicate Empirical Software Engineering Research?

  • paper_url: http://arxiv.org/abs/2310.01727
  • repo_url: None
  • paper_authors: Jenny T. Liang, Carmen Badea, Christian Bird, Robert DeLine, Denae Ford, Nicole Forsgren, Thomas Zimmermann
  • for: 这个论文旨在探讨大型自然语言模型(LLM)在软件工程实践中的应用,以便帮助软件工程实践者和研究者更好地理解和复制现有的软件工程研究。
  • methods: 这个论文使用了大型自然语言模型(GPT-4)来复制现有的软件工程研究,并对GPT-4生成的假设和分析计划进行评估。研究者采用了用户研究,询问14名软件工程研究专家对GPT-4生成的假设和分析计划进行评估。
  • results: 研究发现,GPT-4 能够提出正确的假设,但难以生成反映软件工程数据常识的假设。在人工分析 GPT-4 生成的代码时,发现其在给定部分研究方法的前提下具有正确的高层逻辑,但包含许多实现层面的小错误,反映出软件工程知识的欠缺。这些结果对利用 LLM 开展软件工程研究以及软件团队中的数据科学从业者都有启示。
    Abstract Empirical software engineering research on production systems has brought forth a better understanding of the software engineering process for practitioners and researchers alike. However, only a small subset of production systems is studied, limiting the impact of this research. While software engineering practitioners benefit from replicating research on their own data, this poses its own set of challenges, since performing replications requires a deep understanding of research methodologies and subtle nuances in software engineering data. Given that large language models (LLMs), such as GPT-4, show promise in tackling both software engineering- and science-related tasks, these models could help democratize empirical software engineering research. In this paper, we examine LLMs' abilities to perform replications of empirical software engineering research on new data. We specifically study their ability to surface assumptions made in empirical software engineering research methodologies, as well as their ability to plan and generate code for analysis pipelines on seven empirical software engineering papers. We perform a user study with 14 participants with software engineering research expertise, who evaluate GPT-4-generated assumptions and analysis plans (i.e., a list of module specifications) from the papers. We find that GPT-4 is able to surface correct assumptions, but struggle to generate ones that reflect common knowledge about software engineering data. In a manual analysis of the generated code, we find that the GPT-4-generated code contains the correct high-level logic, given a subset of the methodology. However, the code contains many small implementation-level errors, reflecting a lack of software engineering knowledge. Our findings have implications for leveraging LLMs for software engineering research as well as practitioner data scientists in software teams.
    摘要 针对生产系统的实证软件工程研究,已为实践者和研究人员提供了对软件工程过程更深入的理解。然而,被研究的生产系统只占很小一部分,限制了这类研究的影响。软件工程实践者可以通过在自己的数据上复现研究而获益,但这本身也带来挑战,因为进行复现需要深入理解研究方法以及软件工程数据中的细微差别。鉴于 GPT-4 等大语言模型(LLM)在软件工程与科学相关任务上都展现出潜力,这类模型有望让实证软件工程研究更加普及。本文研究 LLM 在新数据上复现实证软件工程研究的能力,具体考察其能否揭示实证软件工程研究方法中隐含的假设,以及能否为七篇实证软件工程论文规划并生成分析流水线的代码。我们开展了一项有 14 名具备软件工程研究经验的参与者的用户研究,由他们评估 GPT-4 针对这些论文生成的假设和分析计划(即模块规格列表)。结果表明,GPT-4 能够提出正确的假设,但难以生成反映软件工程数据常识的假设;在对生成代码的人工分析中,我们发现在给定部分研究方法的前提下,GPT-4 生成的代码具有正确的高层逻辑,但包含许多实现层面的小错误,反映出软件工程知识的欠缺。我们的发现对利用 LLM 开展软件工程研究以及软件团队中的数据科学从业者都有启示。

PrACTiS: Perceiver-Attentional Copulas for Time Series

  • paper_url: http://arxiv.org/abs/2310.01720
  • repo_url: None
  • paper_authors: Cat P. Le, Chris Cannella, Ali Hasan, Yuting Ng, Vahid Tarokh
  • for: 提高时间序列预测性能
  • methods: 结合 perceiver 架构和 copula 结构,使用中点推断(midpoint inference)和局部注意力机制,并利用基于 copula 的注意力与输出方差检验机制来刻画缺失数据的联合分布,从而缓解预测中的误差传播。
  • results: 在单模态和多模态标准测试集上实现了相比先前方法的20%提高,同时使用的内存资源占用率低于50%。
    Abstract Transformers incorporating copula structures have demonstrated remarkable performance in time series prediction. However, their heavy reliance on self-attention mechanisms demands substantial computational resources, thus limiting their practical utility across a wide range of tasks. In this work, we present a model that combines the perceiver architecture with a copula structure to enhance time-series forecasting. By leveraging the perceiver as the encoder, we efficiently transform complex, high-dimensional, multimodal data into a compact latent space, thereby significantly reducing computational demands. To further reduce complexity, we introduce midpoint inference and local attention mechanisms, enabling the model to capture dependencies within imputed samples effectively. Subsequently, we deploy the copula-based attention and output variance testing mechanism to capture the joint distribution of missing data, while simultaneously mitigating error propagation during prediction. Our experimental results on the unimodal and multimodal benchmarks showcase a consistent 20\% improvement over the state-of-the-art methods, while utilizing less than half of available memory resources.
    摘要 含有 copula 结构的 Transformer 在时间序列预测中表现十分出色。然而,它们对自注意力机制的高度依赖带来了可观的计算开销,限制了其在各类任务中的实用性。为此,我们提出一种将 perceiver 架构与 copula 结构相结合的模型,以提升时间序列预测性能。以 perceiver 作为编码器,我们能够高效地将复杂、高维、多模态的数据变换到紧凑的隐空间,从而显著降低计算需求。为进一步降低复杂度,我们引入中点推断与局部注意力机制,使模型能够有效捕捉插补样本之间的依赖关系。随后,我们采用基于 copula 的注意力和输出方差检验机制来刻画缺失数据的联合分布,同时缓解预测过程中的误差传播。在单模态与多模态基准上的实验表明,我们的方法相比最先进方法取得了一致的 20% 提升,同时使用的内存不到可用内存资源的一半。

Ensemble Distillation for Unsupervised Constituency Parsing

  • paper_url: http://arxiv.org/abs/2310.01717
  • repo_url: https://github.com/manga-uofa/ed4ucp
  • paper_authors: Behzad Shayegh, Yanshuai Cao, Xiaodan Zhu, Jackie C. K. Cheung, Lili Mou
  • for: 这个论文主要用于解决无监督成分分析任务,即将句子中单词和短语组织成层次结构,不使用语言学上标注数据。
  • methods: 该论文提出了一种“树平均”思想,并基于此思想提出了一种新的 ensemble 方法。为提高推理效率,该方法还使用了一种学生模型减少过拟合问题。
  • results: 实验显示,该方法比之前的所有方法都更高效和稳定,可以在不同的 ensemble 组件、Run 和领域shift 条件下表现出色。
    Abstract We investigate the unsupervised constituency parsing task, which organizes words and phrases of a sentence into a hierarchical structure without using linguistically annotated data. We observe that existing unsupervised parsers capture differing aspects of parsing structures, which can be leveraged to enhance unsupervised parsing performance. To this end, we propose a notion of "tree averaging," based on which we further propose a novel ensemble method for unsupervised parsing. To improve inference efficiency, we further distill the ensemble knowledge into a student model; such an ensemble-then-distill process is an effective approach to mitigate the over-smoothing problem existing in common multi-teacher distilling methods. Experiments show that our method surpasses all previous approaches, consistently demonstrating its effectiveness and robustness across various runs, with different ensemble components, and under domain-shift conditions.
    摘要 我们研究无监督成分句法分析任务,即在不使用语言学标注数据的情况下,将句子中的词和短语组织成层次结构。我们观察到,现有的无监督句法分析器各自捕捉了句法结构的不同侧面,可以加以利用来提升无监督句法分析的性能。为此,我们提出了“树平均”的概念,并在此基础上提出了一种新的无监督句法分析集成方法。为提高推理效率,我们进一步将集成知识蒸馏到学生模型中;这种“先集成、再蒸馏”的过程能够有效缓解常见多教师蒸馏方法中存在的过度平滑问题。实验表明,我们的方法超越了所有先前的方法,在不同的运行、不同的集成组件以及领域迁移条件下都展现出一致的有效性与鲁棒性。
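
One very simple way to operationalize the "tree averaging" intuition (not necessarily the paper's exact formulation): represent each parser's output as a set of constituent spans and keep the ensemble member whose spans agree most, by span F1, with the others.

```python
# Illustrative span-agreement selection among unsupervised parses of one sentence.
def span_f1(pred, gold):
    pred, gold = set(pred), set(gold)
    if not pred or not gold:
        return 0.0
    p = len(pred & gold) / len(pred)
    r = len(pred & gold) / len(gold)
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)

def most_agreeing_parse(parses):
    """parses: list of span sets, e.g. {(0, 5), (0, 2), (3, 5)} for one sentence."""
    scores = []
    for i, cand in enumerate(parses):
        others = [p for j, p in enumerate(parses) if j != i]
        scores.append(sum(span_f1(cand, o) for o in others) / len(others))
    best = max(range(len(parses)), key=scores.__getitem__)
    return best, scores

parses = [
    {(0, 6), (0, 2), (3, 6), (3, 5)},   # parser A
    {(0, 6), (0, 2), (2, 6), (3, 5)},   # parser B
    {(0, 6), (0, 3), (3, 6), (4, 6)},   # parser C
]
print(most_agreeing_parse(parses))
```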

cs.CL - 2023-10-03

ResidualTransformer: Residual Low-rank Learning with Weight-sharing for Transformer Layers

  • paper_url: http://arxiv.org/abs/2310.02489
  • repo_url: None
  • paper_authors: Yiming Wang, Jinyu Li
  • for: 降低 Always-on 设备内存占用,以便部署语音处理模型。
  • methods: 对 Transformer 各层的权重进行重参数化,以实现模型压缩。具体来说,我们受 ResNet 和 LoRA 等工作的启发,提出名为 ResidualTransformer 的方法:每个 Transformer 层的权重矩阵由 1) 与相邻层共享的全秩分量和 2) 该层特有的低秩分量组成,低秩矩阵只带来很小的模型体积增加;此外,我们还加入对角权重矩阵,以提升低秩分量的建模能力。
  • results: 我们在 10k 小时语音识别和语音翻译任务中进行了实验,结果表明,可以将 Transformer Encoder 的大小减少约 3X,而性能下降非常小。
    Abstract Memory constraint of always-on devices is one of the major concerns when deploying speech processing models on these devices. While larger models trained with sufficiently large amount of data generally perform better, making them fit in the device memory is a demanding challenge. In this paper, we aim to reduce model size by reparameterizing model weights across Transformer encoder layers and assuming a special weight composition and structure. More specifically, inspired by ResNet and the more recent LoRA work, we propose an approach named ResidualTransformer, where each weight matrix in a Transformer layer comprises 1) a shared full-rank component with its adjacent layers, and 2) a unique low-rank component to itself. The low-rank matrices only account for a small amount of model size increase. In addition, we add diagonal weight matrices to improve modeling capacity of the low-rank matrices. Experiments of our 10k-hour speech recognition and speech translation tasks show that the Transformer encoder size can be reduced by ~3X with very slight performance degradation.
    摘要 内存限制是在 always-on 设备上部署语音处理模型时的主要顾虑之一。虽然用足够大的数据训练的更大模型通常表现更好,但让它们装入设备内存是一项艰巨的挑战。在本文中,我们旨在通过在 Transformer 编码器各层之间重参数化模型权重、并假设特殊的权重组合与结构来缩小模型规模。具体而言,受 ResNet 以及近期 LoRA 工作的启发,我们提出名为 ResidualTransformer 的方法:Transformer 层中的每个权重矩阵由 1) 与相邻层共享的全秩分量和 2) 该层特有的低秩分量组成。低秩矩阵只带来很小的模型体积增加;此外,我们加入对角权重矩阵以提升低秩矩阵的建模能力。我们在 1 万小时的语音识别和语音翻译任务上的实验表明,Transformer 编码器的规模可以缩小约 3 倍,而性能下降非常有限。
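
A compact PyTorch sketch of the weight composition described above: each layer's weight is a full-rank matrix shared with neighbouring layers plus a layer-specific low-rank term and a diagonal term. The dimensions, rank, and the grouping of layers sharing one matrix are arbitrary illustrative choices.

```python
# Illustrative shared-full-rank + private-low-rank + diagonal weight composition.
import torch
import torch.nn as nn

class ResidualLinear(nn.Module):
    """Linear layer whose weight = shared full-rank part + its own low-rank part + diagonal."""
    def __init__(self, shared_weight: nn.Parameter, dim: int, rank: int = 8):
        super().__init__()
        self.shared_weight = shared_weight                   # shared across adjacent layers
        self.u = nn.Parameter(torch.randn(dim, rank) * 0.02)
        self.v = nn.Parameter(torch.randn(rank, dim) * 0.02)
        self.diag = nn.Parameter(torch.zeros(dim))
        self.bias = nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        weight = self.shared_weight + self.u @ self.v + torch.diag(self.diag)
        return x @ weight.t() + self.bias

dim = 64
shared = nn.Parameter(torch.randn(dim, dim) * 0.02)          # one full-rank matrix for a layer group
layers = nn.ModuleList([ResidualLinear(shared, dim) for _ in range(3)])

x = torch.randn(2, 10, dim)
for layer in layers:
    x = torch.relu(layer(x))
print(x.shape)                                                # torch.Size([2, 10, 64])
```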

Short text classification with machine learning in the social sciences: The case of climate change on Twitter

  • paper_url: http://arxiv.org/abs/2310.04452
  • repo_url: https://github.com/shikarina/short_text_classification
  • paper_authors: Karina Shyrokykh, Maksym Girnyk, Lisa Dellmuth
  • for: 这篇论文是关于社会科学研究中自动分类大量文本的问题,以及计算机科学中提供的一些机器学习方法的性能的比较。
  • methods: 这篇论文应用了最常用的文本分类算法,包括监督机器学习和深度学习方法,并与最先进的词典方法进行比较。
  • results: 研究发现,监督机器学习方法优于最先进的词典方法,且随着类别平衡程度的提高优势更加明显;同时,逻辑回归、随机森林等传统机器学习方法与复杂的深度学习方法分类精度相近,却只需少得多的训练时间和计算资源。
    Abstract To analyse large numbers of texts, social science researchers are increasingly confronting the challenge of text classification. When manual labeling is not possible and researchers have to find automatized ways to classify texts, computer science provides a useful toolbox of machine-learning methods whose performance remains understudied in the social sciences. In this article, we compare the performance of the most widely used text classifiers by applying them to a typical research scenario in social science research: a relatively small labeled dataset with infrequent occurrence of categories of interest, which is a part of a large unlabeled dataset. As an example case, we look at Twitter communication regarding climate change, a topic of increasing scholarly interest in interdisciplinary social science research. Using a novel dataset including 5,750 tweets from various international organizations regarding the highly ambiguous concept of climate change, we evaluate the performance of methods in automatically classifying tweets based on whether they are about climate change or not. In this context, we highlight two main findings. First, supervised machine-learning methods perform better than state-of-the-art lexicons, in particular as class balance increases. Second, traditional machine-learning methods, such as logistic regression and random forest, perform similarly to sophisticated deep-learning methods, whilst requiring much less training time and computational resources. The results have important implications for the analysis of short texts in social science research.
    摘要 为了分析海量文本,社会科学研究者日益面临文本分类的挑战。当人工标注不可行、研究者不得不寻找自动化的文本分类方法时,计算机科学提供了一个实用的机器学习工具箱,但这些方法的性能在社会科学中仍缺乏研究。本文将最常用的文本分类器应用于社会科学研究中的一个典型场景——一个相对较小的标注数据集,其中感兴趣的类别出现频率较低,且该数据集只是大规模未标注数据的一部分——并比较它们的表现。作为示例,我们考察 Twitter 上关于气候变化的讨论,这是跨学科社会科学研究中日益受到关注的话题。利用一个包含来自多个国际组织、围绕高度含混的气候变化概念的 5,750 条推文的新数据集,我们评估了各种方法自动判断推文是否与气候变化有关的性能。在此背景下,我们强调两项主要发现:第一,监督机器学习方法优于最先进的词典方法,且随着类别平衡程度提高,优势更加明显;第二,逻辑回归、随机森林等传统机器学习方法与复杂的深度学习方法表现相近,却只需少得多的训练时间和计算资源。这些结果对社会科学研究中短文本的分析具有重要意义。
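
A minimal example of the kind of "traditional" supervised pipeline the study finds competitive: TF-IDF features with logistic regression and a random forest. The tweets and labels below are invented for illustration; they are not from the study's dataset.

```python
# Minimal supervised short-text classification baselines (illustrative data).
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tweets = [
    "Rising sea levels threaten coastal cities",          # about climate change
    "New IPCC report warns of warming beyond 1.5C",       # about climate change
    "Our quarterly earnings call starts at 3pm",           # not about climate change
    "Great match last night, what a final!",               # not about climate change
] * 10                                                      # tiny corpus, repeated so models can fit
labels = [1, 1, 0, 0] * 10

for clf in (LogisticRegression(max_iter=1000), RandomForestClassifier(n_estimators=100)):
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), clf)
    model.fit(tweets, labels)
    print(type(clf).__name__, model.predict(["heatwaves and droughts are becoming more frequent"]))
```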

The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising “Alignment” in Large Language Models

  • paper_url: http://arxiv.org/abs/2310.02457
  • repo_url: None
  • paper_authors: Hannah Rose Kirk, Bertie Vidgen, Paul Röttger, Scott A. Hale
  • for: 本文探讨了大语言模型(LLM)中的“对齐”概念,通过后结构主义社会政治理论的镜像,具体探讨其与空标语概念的相似之处。
  • methods: 本文提出了一个框架,以帮助研究者就“对齐”这类抽象概念如何在实证数据集中被操作化建立共同语汇。该框架划分两个层面:其一,模型行为的哪些维度被认为是重要的;其二,这些维度的含义和定义如何被赋予,以及由谁来赋予。
  • results: 本文通过这个框架,提供了一种透明和批判性评估的方法,以帮助社区在对 LLM 与人类 популяции进行对齐时,avigate复杂的对齐过程。
    Abstract In this paper, we address the concept of "alignment" in large language models (LLMs) through the lens of post-structuralist socio-political theory, specifically examining its parallels to empty signifiers. To establish a shared vocabulary around how abstract concepts of alignment are operationalised in empirical datasets, we propose a framework that demarcates: 1) which dimensions of model behaviour are considered important, then 2) how meanings and definitions are ascribed to these dimensions, and by whom. We situate existing empirical literature and provide guidance on deciding which paradigm to follow. Through this framework, we aim to foster a culture of transparency and critical evaluation, aiding the community in navigating the complexities of aligning LLMs with human populations.
    摘要 在这篇论文中,我们通过后结构主义社会政治理论的镜像,特别是空符号的概念,探讨LLMs中的“对齐”概念。为建立对大数据集中抽象概念的操作化的共同词汇,我们提出了一个框架,它包括:1)对模型行为中考虑重要的维度,然后2)对这些维度的含义和定义如何被赋予,以及谁将这些含义和定义塑造成为模型。我们将现有的实证文献综述,并提供指南,以帮助社区选择遵循哪种 парадигмы。通过这个框架,我们希望激发公共透明和批判性评估的文化,以便在对LMMs与人类人口进行对齐时, navigating complexity。

Backdoor Adjustment of Confounding by Provenance for Robust Text Classification of Multi-institutional Clinical Notes

  • paper_url: http://arxiv.org/abs/2310.02451
  • repo_url: None
  • paper_authors: Xiruo Ding, Zhecheng Sheng, Meliha Yetişgen, Serguei Pakhomov, Trevor Cohen
  • for: 这项研究是为了提高临床自然语言处理(NLP)的性能。
  • methods: 这项研究使用机器学习和深度学习方法来改进临床NLP的性能。
  • results: 研究发现,使用后门调整可以有效缓解由数据来源差异引起的混杂偏移(confounding shift)问题,并提高模型的稳健性。
    Abstract Natural Language Processing (NLP) methods have been broadly applied to clinical tasks. Machine learning and deep learning approaches have been used to improve the performance of clinical NLP. However, these approaches require sufficiently large datasets for training, and trained models have been shown to transfer poorly across sites. These issues have led to the promotion of data collection and integration across different institutions for accurate and portable models. However, this can introduce a form of bias called confounding by provenance. When source-specific data distributions differ at deployment, this may harm model performance. To address this issue, we evaluate the utility of backdoor adjustment for text classification in a multi-site dataset of clinical notes annotated for mentions of substance abuse. Using an evaluation framework devised to measure robustness to distributional shifts, we assess the utility of backdoor adjustment. Our results indicate that backdoor adjustment can effectively mitigate for confounding shift.
    摘要 自然语言处理(NLP)方法已被广泛应用于临床任务,机器学习和深度学习方法被用来提升临床 NLP 的性能。然而,这些方法需要足够大的训练数据,且已有研究表明训练好的模型在不同机构之间的迁移效果较差。这些问题促使人们在不同机构之间收集与整合数据,以获得准确且可移植的模型。然而,这可能引入一种称为“来源混杂”(confounding by provenance)的偏差:当部署时各来源的数据分布发生变化,模型性能可能受损。为解决这一问题,我们在一个标注了物质滥用相关提及的多机构临床笔记数据集上,评估后门调整在文本分类中的效用。借助一个用于衡量对分布偏移鲁棒性的评估框架,我们的结果表明,后门调整能够有效缓解混杂偏移的影响。
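
A schematic illustration of backdoor adjustment with provenance (the source site) as the adjustment variable: fit p(y | x, site) and average over the empirical p(site), i.e. p_adj(y | x) = Σ_site p(y | x, site) p(site). The data below are synthetic; the paper applies the idea to clinical-note classifiers.

```python
# Schematic backdoor adjustment over data provenance (synthetic data for illustration).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 2000, 20
site = rng.integers(0, 2, size=n)                        # provenance: which institution
x = rng.normal(size=(n, d)) + site[:, None] * 0.8         # site shifts the feature distribution
logits = x[:, 0] + 1.5 * site                              # site also confounds the label
y = (logits + rng.normal(size=n) > 0.8).astype(int)

# fit p(y | x, site) by including the site indicator as a feature
clf = LogisticRegression(max_iter=2000).fit(np.hstack([x, site[:, None]]), y)
p_site = np.bincount(site) / n                             # empirical p(site)

def adjusted_proba(x_new):
    """Backdoor-adjusted p(y=1 | x): average p(y=1 | x, site) over p(site)."""
    probs = 0.0
    for s, w in enumerate(p_site):
        xs = np.hstack([x_new, np.full((len(x_new), 1), s)])
        probs = probs + w * clf.predict_proba(xs)[:, 1]
    return probs

x_test = rng.normal(size=(5, d))
print(adjusted_proba(x_test))
```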

Novice Learner and Expert Tutor: Evaluating Math Reasoning Abilities of Large Language Models with Misconceptions

  • paper_url: http://arxiv.org/abs/2310.02439
  • repo_url: None
  • paper_authors: Naiming Liu, Shashank Sonkar, Zichao Wang, Simon Woodhead, Richard G. Baraniuk
  • for: 本研究旨在基于数学误解评估大语言模型(LLM)的数学推理能力,让模型根据特定误解给出错误答案,并识别错误答案背后的误解。
  • methods: 我们的主要方法是让 LLM 分别模拟新手学生(novice learner)和专家辅导者(expert tutor),并使用小学数学题进行实验。
  • results: 我们的实验表明,LLM 可以轻松地正确回答这些问题,但它们难以 1)给出与特定不完整知识(误解)相对应的错误答案,2)识别能解释某个错误答案的误解。这些结果为提升 LLM 的数学推理能力提供了新的机会,尤其是在智能辅导系统等教育应用中。
    Abstract We propose novel evaluations for mathematical reasoning capabilities of Large Language Models (LLMs) based on mathematical misconceptions. Our primary approach is to simulate LLMs as a novice learner and an expert tutor, aiming to identify the incorrect answer to math question resulted from a specific misconception and to recognize the misconception(s) behind an incorrect answer, respectively. Contrary to traditional LLMs-based mathematical evaluations that focus on answering math questions correctly, our approach takes inspirations from principles in educational learning sciences. We explicitly ask LLMs to mimic a novice learner by answering questions in a specific incorrect manner based on incomplete knowledge; and to mimic an expert tutor by identifying misconception(s) corresponding to an incorrect answer to a question. Using simple grade-school math problems, our experiments reveal that, while LLMs can easily answer these questions correctly, they struggle to identify 1) the incorrect answer corresponding to specific incomplete knowledge (misconceptions); 2) the misconceptions that explain particular incorrect answers. Our study indicates new opportunities for enhancing LLMs' math reasoning capabilities, especially on developing robust student simulation and expert tutoring models in the educational applications such as intelligent tutoring systems.
    摘要 我们提出了一种新的评估方法,用于评估语言模型(LLM)的数学理解能力,基于数学误解。我们的主要方法是模拟LLM作为新手学生和专家教师,以识别特定误解导致的错误答案,并识别误解。与传统的LLMs-based数学评估方法不同,我们的方法从教育学原则出发,Explicitly要求LLM答题时模拟新手学生的 incomplete 知识基础,并模拟专家教师的误解识别能力。使用Primary school 数学问题,我们的实验表明,虽然LLM可以轻松地回答这些问题,但它们困难于识别1)特定误解对应的错误答案;2)误解所解释的特定错误答案。我们的研究表明,可以通过开发Robust 学生模拟和专家指导模型来增强LLM的数学理解能力,特别是在教育应用程序中,如智能教学系统。

Can Large Language Models Provide Security & Privacy Advice? Measuring the Ability of LLMs to Refute Misconceptions

  • paper_url: http://arxiv.org/abs/2310.02431
  • repo_url: https://github.com/purseclab/llm_security_privacy_advice
  • paper_authors: Yufan Chen, Arjun Arunasalam, Z. Berkay Celik
  • for: This paper aims to measure the ability of Large Language Models (LLMs) to provide reliable security and privacy (S&P) advice by refuting popular S&P misconceptions.
  • methods: The authors use two popular LLMs (Bard and ChatGPT) and develop a labeling guide to evaluate their responses to S&P misconceptions. They also apply three strategies to comprehensively evaluate the responses: querying each misconception multiple times, generating and querying paraphrases, and soliciting source URLs of the responses.
  • results: The authors find that both LLMs demonstrate a non-negligible error rate (21.3% on average) in supporting popular S&P misconceptions, with the error rate increasing when the same or paraphrased misconceptions are repeatedly queried. Additionally, the models may partially support a misconception or remain noncommittal, and they may provide invalid URLs or point to unrelated sources.
    Abstract Users seek security & privacy (S&P) advice from online resources, including trusted websites and content-sharing platforms. These resources help users understand S&P technologies and tools and suggest actionable strategies. Large Language Models (LLMs) have recently emerged as trusted information sources. However, their accuracy and correctness have been called into question. Prior research has outlined the shortcomings of LLMs in answering multiple-choice questions and user ability to inadvertently circumvent model restrictions (e.g., to produce toxic content). Yet, the ability of LLMs to provide reliable S&P advice is not well-explored. In this paper, we measure their ability to refute popular S&P misconceptions that the general public holds. We first study recent academic literature to curate a dataset of over a hundred S&P-related misconceptions across six different topics. We then query two popular LLMs (Bard and ChatGPT) and develop a labeling guide to evaluate their responses to these misconceptions. To comprehensively evaluate their responses, we further apply three strategies: query each misconception multiple times, generate and query their paraphrases, and solicit source URLs of the responses. Both models demonstrate, on average, a 21.3% non-negligible error rate, incorrectly supporting popular S&P misconceptions. The error rate increases to 32.6% when we repeatedly query LLMs with the same or paraphrased misconceptions. We also expose that models may partially support a misconception or remain noncommittal, refusing a firm stance on misconceptions. Our exploration of information sources for responses revealed that LLMs are susceptible to providing invalid URLs (21.2% for Bard and 67.7% for ChatGPT) or point to unrelated sources (44.2% returned by Bard and 18.3% by ChatGPT).
    摘要 用户寻求安全与隐私(S&P)建议从在线资源中,包括可靠的网站和内容分享平台。这些资源帮助用户理解S&P技术和工具,并提供可行的策略。大型自然语言模型(LLM)最近在信息领域上崛起,成为用户信任的信息源。然而,其准确性和正确性受到质疑。先前的研究表明LLMs在回答多选题目时存在缺陷,用户可能会意外绕过模型限制(例如,生成恶意内容)。然而,LLMs是否可以提供可靠的S&P建议尚不彻底探讨。在这篇论文中,我们测量了它们能否推翻公众对S&P相关误区的认知。我们首先遍读最新的学术文献,并从六个不同的话题中筛选出超过一百个S&P相关的误区。然后,我们对两个流行的LLM(Bard和ChatGPT)进行查询,并开发了评估响应的标准化指南。为了全面评估它们的响应,我们还应用了三种策略:每个误区多次查询,生成并查询它们的重叠,以及寻求响应的源URL。两个模型的平均错误率为21.3%,错误地支持公众对S&P相关误区的认知。错误率随着重复查询误区而增加至32.6%。我们还发现,模型可能会部分支持误区,或者拒绝发表明确的看法。我们探索了LLMs提供的信息来源,发现它们可能会提供无效的URL(Bard的21.2%和ChatGPT的67.7%)或者指向无关的源(Bard返回的44.2%和ChatGPT返回的18.3%)。

Mixture of Quantized Experts (MoQE): Complementary Effect of Low-bit Quantization and Robustness

  • paper_url: http://arxiv.org/abs/2310.02410
  • repo_url: None
  • paper_authors: Young Jin Kim, Raffy Fahim, Hany Hassan Awadalla
  • for: 提高语言任务的模型质量,并解决大型混合专家模型(MoE)所带来的内存浪费和带宽瓶颈问题。
  • methods: 提出了一种简单的量化方法——量化专家权重(MoQE),通过对专家权重进行2位量化来降低内存占用和延迟问题。
  • results: 研究表明,在大多数情况下,采用 2 位专家权重的量化无需额外训练即可保持可靠的模型性能,同时显著减小内存占用;此外,MoE 模型中的专家层比常规前馈网络(FFN)层对量化更加鲁棒。
    Abstract Large Mixture of Experts (MoE) models could achieve state-of-the-art quality on various language tasks, including machine translation task, thanks to the efficient model scaling capability with expert parallelism. However, it has brought a fundamental issue of larger memory consumption and increased memory bandwidth bottleneck at deployment time. In this paper, we propose Mixture of Quantized Experts (MoQE) which is a simple weight-only quantization method applying ultra low-bit down to 2-bit quantizations only to expert weights for mitigating the increased memory and latency issues of MoE models. We show that low-bit quantization together with the MoE architecture delivers a reliable model performance while reducing the memory size significantly even without any additional training in most cases. In particular, expert layers in MoE models are much more robust to the quantization than conventional feedforward networks (FFN) layers. In our comprehensive analysis, we show that MoE models with 2-bit expert weights can deliver better model performance than the dense model trained on the same dataset. As a result of low-bit quantization, we show the model size can be reduced by 79.6% of the original half precision floating point (fp16) MoE model. Combined with an optimized GPU runtime implementation, it also achieves 1.24X speed-up on A100 GPUs.
    摘要 大型混合专家(MoE)模型凭借专家并行带来的高效模型扩展能力,能够在包括机器翻译在内的多种语言任务上达到最先进的质量。然而,这也带来了一个根本性问题:部署时内存消耗更大、内存带宽瓶颈加剧。本文提出混合量化专家(MoQE),这是一种仅对权重进行量化的简单方法,只对专家权重施加低至 2 位的超低位宽量化,以缓解 MoE 模型的内存与延迟问题。我们表明,低位宽量化与 MoE 架构相结合,在大多数情况下无需任何额外训练即可在显著减小内存占用的同时保持可靠的模型性能。特别地,MoE 模型中的专家层比常规前馈网络(FFN)层对量化更加鲁棒。在我们的全面分析中,采用 2 位专家权重的 MoE 模型甚至能取得比在同一数据集上训练的稠密模型更好的性能。得益于低位宽量化,模型大小可比原始半精度浮点(fp16)MoE 模型减少 79.6%;结合优化的 GPU 运行时实现,还能在 A100 GPU 上实现 1.24 倍的加速。
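
A rough sketch of symmetric per-row 2-bit round-to-nearest quantization of an expert weight matrix, plus dequantization and the resulting error. The exact quantization grid, grouping, and clipping used by MoQE are not specified here; this is only the generic recipe.

```python
# Illustrative 2-bit round-to-nearest weight quantization of an expert weight matrix.
import numpy as np

def quantize_2bit(w):
    """Symmetric per-row quantization to the 4 levels {-3, -1, 1, 3} * scale / 3."""
    scale = np.abs(w).max(axis=1, keepdims=True) + 1e-12     # one scale per output row
    levels = np.array([-3.0, -1.0, 1.0, 3.0]) / 3.0
    normalized = w / scale                                    # now in [-1, 1]
    idx = np.abs(normalized[..., None] - levels).argmin(axis=-1).astype(np.uint8)
    return idx, scale, levels

def dequantize(idx, scale, levels):
    return levels[idx] * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 1024)).astype(np.float32) * 0.05   # stand-in expert FFN weight
idx, scale, levels = quantize_2bit(w)

w_hat = dequantize(idx, scale, levels)
err = np.linalg.norm(w - w_hat) / np.linalg.norm(w)
bits = idx.size * 2 + scale.size * 16                        # 2 bits/weight + fp16 scales
print(f"relative reconstruction error: {err:.3f}, ~{bits / 8 / 1024:.0f} KiB vs "
      f"{w.size * 2 / 1024:.0f} KiB in fp16")
```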

MindTheDApp: A Toolchain for Complex Network-Driven Structural Analysis of Ethereum-based Decentralised Applications

  • paper_url: http://arxiv.org/abs/2310.02408
  • repo_url: None
  • paper_authors: Giacomo Ibba, Sabrina Aufiero, Silvia Bartolucci, Rumyana Neykova, Marco Ortu, Roberto Tonelli, Giuseppe Destefanis
  • for: 这篇论文是为了研究区块链技术中的智能合约和分布式应用程序(DApp)的结构分析而设计的工具链。
  • methods: 该工具链结合 ANTLR4 与抽象语法树(AST)遍历技术,将智能合约的架构与交互转换为一种专门的二分图。
  • results: 该图包括两个集群的节点:一个表示智能合约、接口和库,另一个包括函数、事件和修饰符。边在图中连接函数和智能合约,提供细节的交互和执行流视图,帮助研究人员和实践者更深入地理解分布式系统的稳定性、适应性和复杂性。
    Abstract This paper presents MindTheDApp, a toolchain designed specifically for the structural analysis of Ethereum-based Decentralized Applications (DApps), with a distinct focus on a complex network-driven approach. Unlike existing tools, our toolchain combines the power of ANTLR4 and Abstract Syntax Tree (AST) traversal techniques to transform the architecture and interactions within smart contracts into a specialized bipartite graph. This enables advanced network analytics to highlight operational efficiencies within the DApp's architecture. The bipartite graph generated by the proposed tool comprises two sets of nodes: one representing smart contracts, interfaces, and libraries, and the other including functions, events, and modifiers. Edges in the graph connect functions to smart contracts they interact with, offering a granular view of interdependencies and execution flow within the DApp. This network-centric approach allows researchers and practitioners to apply complex network theory in understanding the robustness, adaptability, and intricacies of decentralized systems. Our work contributes to the enhancement of security in smart contracts by allowing the visualisation of the network, and it provides a deep understanding of the architecture and operational logic within DApps. Given the growing importance of smart contracts in the blockchain ecosystem and the emerging application of complex network theory in technology, our toolchain offers a timely contribution to both academic research and practical applications in the field of blockchain technology.
    摘要 The bipartite graph consists of two sets of nodes: one representing smart contracts, interfaces, and libraries, and the other including functions, events, and modifiers. Edges in the graph connect functions to smart contracts they interact with, providing a detailed view of interdependencies and execution flow within the DApp. This network-centric approach allows researchers and practitioners to apply complex network theory to understand the robustness, adaptability, and intricacies of decentralized systems.Our work enhances the security of smart contracts by providing a visual representation of the network and deepening understanding of the architecture and operational logic within DApps. With the growing importance of smart contracts in the blockchain ecosystem and the emerging application of complex network theory in technology, our toolchain offers a timely contribution to both academic research and practical applications in the field of blockchain technology.
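
A small networkx sketch of the bipartite representation described above: one node set for contracts/interfaces/libraries, the other for functions/events/modifiers, with edges for "function interacts with contract". The example DApp is made up.

```python
# Illustrative contract-function bipartite graph for a made-up DApp.
import networkx as nx

G = nx.Graph()
contracts = ["Token", "Vault", "Ownable"]                     # contracts / interfaces / libraries
functions = ["transfer", "deposit", "withdraw", "onlyOwner"]  # functions / events / modifiers
G.add_nodes_from(contracts, bipartite="contract")
G.add_nodes_from(functions, bipartite="function")
G.add_edges_from([
    ("transfer", "Token"),
    ("deposit", "Vault"), ("withdraw", "Vault"),
    ("deposit", "Token"), ("withdraw", "Token"),              # Vault functions call the Token
    ("onlyOwner", "Ownable"), ("withdraw", "Ownable"),
])

degree = dict(G.degree(contracts))
print("contract degrees (how many functions touch each):", degree)
print("is bipartite:", nx.is_bipartite(G))
```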

Unsupervised Speech Recognition with N-Skipgram and Positional Unigram Matching

  • paper_url: http://arxiv.org/abs/2310.02382
  • repo_url: https://github.com/lwang114/graphunsupasr
  • paper_authors: Liming Wang, Mark Hasegawa-Johnson, Chang D. Yoo
  • for: trains unsupervised speech recognition systems
  • methods: combines lower-order N-skipgrams and positional unigram statistics
  • results: competitive performance in ASR and phoneme segmentation tasks
    Abstract Training unsupervised speech recognition systems presents challenges due to GAN-associated instability, misalignment between speech and text, and significant memory demands. To tackle these challenges, we introduce a novel ASR system, ESPUM. This system harnesses the power of lower-order N-skipgrams (up to N=3) combined with positional unigram statistics gathered from a small batch of samples. Evaluated on the TIMIT benchmark, our model showcases competitive performance in ASR and phoneme segmentation tasks. Access our publicly available code at https://github.com/lwang114/GraphUnsupASR.
    摘要 训练无监督语音识别系统面临多重挑战:GAN 带来的训练不稳定、语音与文本之间的错位,以及巨大的内存需求。为应对这些挑战,我们提出了一种新的 ASR 系统 ESPUM。该系统结合低阶 N-skipgram(N 最多为 3)与从一小批样本中收集的位置一元(unigram)统计。在 TIMIT 基准上的评估表明,我们的模型在 ASR 和音素切分任务中都取得了有竞争力的性能。代码公开于 https://github.com/lwang114/GraphUnsupASR。
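
A small sketch of the two kinds of statistics named in the abstract, computed over toy phoneme sequences: skipgrams of order up to N=3 drawn from a sliding window, and positional unigram counts. How ESPUM actually uses these statistics during training is not shown.

```python
# Illustrative N-skipgram and positional unigram statistics over toy phoneme sequences.
from collections import Counter
from itertools import combinations

def skipgrams(seq, n=3, window=4):
    """Ordered n-token subsequences drawn from a sliding window (inner tokens may be skipped)."""
    grams = Counter()
    for start in range(len(seq)):
        chunk = seq[start:start + window]
        for combo in combinations(range(len(chunk)), n):
            if combo[0] == 0:                       # anchor each window at its first token
                grams[tuple(chunk[i] for i in combo)] += 1
    return grams

def positional_unigrams(sequences, max_len=10):
    """Counts of each token at each position."""
    counts = Counter()
    for seq in sequences:
        for pos, tok in enumerate(seq[:max_len]):
            counts[(pos, tok)] += 1
    return counts

data = [["DH", "AH", "K", "AE", "T"], ["DH", "AH", "D", "AO", "G"]]
print(skipgrams(data[0], n=3).most_common(3))
print(positional_unigrams(data).most_common(4))
```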

Conversational Health Agents: A Personalized LLM-Powered Agent Framework

  • paper_url: http://arxiv.org/abs/2310.02374
  • repo_url: None
  • paper_authors: Mahyar Abbasian, Iman Azimi, Amir M. Rahmani, Ramesh Jain
  • for: 这篇论文的目的是提高个人健康服务的个性化响应,使用语言模型为谈话启用更多的功能。
  • methods: 该论文提出了一个基于语言模型的框架,使对话式健康代理(CHA)能够处理复杂的健康问题,包括访问个人用户的健康数据、整合最新发表的健康研究成果,以及与多种数据分析工具交互。
  • results: 通过一个实验研究,论文表明了该框架在处理压力水平估计任务中的能力,展示了代理人的认知和操作能力。
    Abstract Conversational Health Agents (CHAs) are interactive systems designed to enhance personal healthcare services by engaging in empathetic conversations and processing multimodal data. While current CHAs, especially those utilizing Large Language Models (LLMs), primarily focus on conversation, they often need more comprehensive agent capabilities. This limitation includes accessing personal user health data from wearables, ubiquitous data collection sources, and electronic health records, integrating the latest published health insights, and connecting with established multimodal data analysis tools. In this paper, we propose an LLM-powered framework to empower CHAs to generate a personalized response for users' healthcare queries. This framework provides critical thinking, knowledge acquisition, and problem-solving abilities by integrating healthcare data sources, enabling multilingual and multimodal conversations, and interacting with various user data analysis tools. We illustrate the framework's proficiency in handling complex healthcare tasks via a case study on stress level estimation, showcasing the agent's cognitive and operational capabilities.
    摘要 对话式健康代理(CHA)是一类交互系统,旨在通过富有同理心的对话和多模态数据处理来提升个人医疗服务。现有的 CHA,尤其是基于大语言模型(LLM)的系统,通常只专注于对话本身,往往还需要更全面的代理能力,包括访问来自可穿戴设备、无处不在的数据采集源和电子健康记录的个人健康数据,整合最新发表的健康研究成果,以及对接成熟的多模态数据分析工具。本文提出一个基于 LLM 的框架,使 CHA 能够针对用户的医疗问题生成个性化回应。该框架通过整合医疗数据源、支持多语言多模态对话,并与多种用户数据分析工具交互,具备批判性思维、知识获取和问题求解能力。我们通过一个压力水平估计的案例研究展示了该框架处理复杂医疗任务的能力,体现了代理的认知与操作能力。

Generalizable Long-Horizon Manipulations with Large Language Models

  • paper_url: http://arxiv.org/abs/2310.02264
  • repo_url: None
  • paper_authors: Haoyu Zhou, Mingyu Ding, Weikun Peng, Masayoshi Tomizuka, Lin Shao, Chuang Gan
  • for: 本研究利用大语言模型(LLM)生成可泛化的基元任务条件,用于完成涉及新物体和未见任务的长时程(long-horizon)机器人操作任务。
  • methods: 本研究使用 LLM 生成和调整动态运动 primitives(DMP)轨迹,以便在长期任务执行中实现高精度和稳定性。
  • results: 在仿真与真实环境中的大量实验表明,该框架在涉及新物体的熟悉任务以及新颖但相关的任务上均行之有效,凸显了 LLM 在提升机器人系统通用性与适应性方面的潜力。
    Abstract This work introduces a framework harnessing the capabilities of Large Language Models (LLMs) to generate primitive task conditions for generalizable long-horizon manipulations with novel objects and unseen tasks. These task conditions serve as guides for the generation and adjustment of Dynamic Movement Primitives (DMP) trajectories for long-horizon task execution. We further create a challenging robotic manipulation task suite based on Pybullet for long-horizon task evaluation. Extensive experiments in both simulated and real-world environments demonstrate the effectiveness of our framework on both familiar tasks involving new objects and novel but related tasks, highlighting the potential of LLMs in enhancing robotic system versatility and adaptability. Project website: https://object814.github.io/Task-Condition-With-LLM/
    摘要 这个研究框架利用大语言模型(LLM)来生成普适的任务条件,用于执行长期任务(long-horizon task),包括使用新物品和未看过的任务。这些任务条件作为指导,用于生成和调整动态运动 primitives(DMP)的轨迹,以便实现长期任务执行。我们还创建了一个复杂的机器人操作任务集,基于Pybullet,用于长期任务评估。实验表明,我们的框架在真实世界和模拟环境中具有很高的效果,能够在不熟悉的任务和新物品上提高机器人系统的多样性和适应力。项目网站:https://object814.github.io/Task-Condition-With-LLM/

Harnessing Pre-Trained Sentence Transformers for Offensive Language Detection in Indian Languages

  • paper_url: http://arxiv.org/abs/2310.02249
  • repo_url: None
  • paper_authors: Ananya Joshi, Raviraj Joshi
  • for: 防止仇恨言语和不良内容在社交媒体平台上迅速扩散
  • methods: 使用预训练的 BERT 和 SBERT 模型,在三种低资源印度语言(孟加拉语、阿萨姆语和古吉拉特语)上进行文本分类,判断推文是否包含冒犯性内容
  • results: 发现单语言的 Sentence-BERT 模型表现最佳,尤其是在孟加拉语上;而在阿萨姆语和古吉拉特语上仍有提升空间
    Abstract In our increasingly interconnected digital world, social media platforms have emerged as powerful channels for the dissemination of hate speech and offensive content. This work delves into the domain of hate speech detection, placing specific emphasis on three low-resource Indian languages: Bengali, Assamese, and Gujarati. The challenge is framed as a text classification task, aimed at discerning whether a tweet contains offensive or non-offensive content. Leveraging the HASOC 2023 datasets, we fine-tuned pre-trained BERT and SBERT models to evaluate their effectiveness in identifying hate speech. Our findings underscore the superiority of monolingual sentence-BERT models, particularly in the Bengali language, where we achieved the highest ranking. However, the performance in Assamese and Gujarati languages signifies ongoing opportunities for enhancement. Our goal is to foster inclusive online spaces by countering hate speech proliferation.
    摘要 在我们日益互联的数字世界中,社交媒体平台已成为仇恨言论和冒犯性内容传播的重要渠道。本文研究仇恨言论检测,特别关注三种低资源的印度语言:孟加拉语、阿萨姆语和古吉拉特语。该任务被定义为文本分类问题,旨在判断一条推文是否包含冒犯性内容。我们基于 HASOC 2023 数据集微调了预训练的 BERT 和 SBERT 模型,以评估它们识别仇恨言论的效果。结果表明单语言的 Sentence-BERT 模型表现更优,尤其是在孟加拉语上取得了最高名次;而在阿萨姆语和古吉拉特语上的表现则说明仍有继续改进的空间。我们的目标是通过遏制仇恨言论的扩散来营造包容的网络空间。

Can Language Models be Instructed to Protect Personal Information?

  • paper_url: http://arxiv.org/abs/2310.02224
  • repo_url: https://github.com/ethanm88/llm-access-control
  • paper_authors: Yang Chen, Ethan Mendes, Sauvik Das, Wei Xu, Alan Ritter
  • for: 本研究旨在评估多模态语言模型在隐私保护与实用性之间的权衡,并提出 PrivQA 模拟基准来评估这种权衡。
  • methods: 本研究使用了一种名为 PrivQA 的多模态测试 benchmark,并提出了一种基于自我调节的回答技术来提高隐私保护。
  • results: 通过一系列的红队攻击实验,研究人员发现了一些简单的破坏攻击方法,这些方法可以通过文本和/或图像输入绕过 PrivQA 中的隐私保护措施。
    Abstract Large multimodal language models have proven transformative in numerous applications. However, these models have been shown to memorize and leak pre-training data, raising serious user privacy and information security concerns. While data leaks should be prevented, it is also crucial to examine the trade-off between the privacy protection and model utility of proposed approaches. In this paper, we introduce PrivQA -- a multimodal benchmark to assess this privacy/utility trade-off when a model is instructed to protect specific categories of personal information in a simulated scenario. We also propose a technique to iteratively self-moderate responses, which significantly improves privacy. However, through a series of red-teaming experiments, we find that adversaries can also easily circumvent these protections with simple jailbreaking methods through textual and/or image inputs. We believe PrivQA has the potential to support the development of new models with improved privacy protections, as well as the adversarial robustness of these protections. We release the entire PrivQA dataset at https://llm-access-control.github.io/.
    摘要 大型多模态语言模型在众多应用中展现出变革性的作用。然而,这些模型被证明会记忆并泄露预训练数据,引发严重的用户隐私和信息安全担忧。在防止数据泄露的同时,也需要考量所提方法在隐私保护与模型实用性之间的权衡。在这篇论文中,我们提出 PrivQA——一个多模态基准,用于在模拟场景下评估当模型被指示保护特定类别的个人信息时的隐私/实用性权衡。我们还提出一种迭代式自我审查回答的技术,可显著提升隐私保护。然而,通过一系列红队实验,我们发现攻击者也能借助简单的越狱方法,通过文本和/或图像输入轻松绕过这些保护措施。我们认为 PrivQA 有潜力支持开发隐私保护更强的新模型,并提升这些保护措施的对抗鲁棒性。我们在 https://llm-access-control.github.io/ 发布了完整的 PrivQA 数据集。

Large Language Models Meet Knowledge Graphs to Answer Factoid Questions

  • paper_url: http://arxiv.org/abs/2310.02166
  • repo_url: None
  • paper_authors: Mikhail Salnikov, Hai Le, Prateek Rajput, Irina Nikishina, Pavel Braslavski, Valentin Malykh, Alexander Panchenko
  • for: 这篇论文旨在提升文本到文本(Text-to-Text)语言模型在回答事实型问题上的表现。
  • methods: 该论文提出基于问题实体与候选答案从知识图谱(Knowledge Graph)中抽取子图的算法,并通过对子图进行线性化,借助基于 Transformer 的模型获得易于解读的信息。
  • results: 实验结果表明,该方法可将预训练文本到文本语言模型的 Hits@1 分数提升 4-6%。
    Abstract Recently, it has been shown that the incorporation of structured knowledge into Large Language Models significantly improves the results for a variety of NLP tasks. In this paper, we propose a method for exploring pre-trained Text-to-Text Language Models enriched with additional information from Knowledge Graphs for answering factoid questions. More specifically, we propose an algorithm for subgraphs extraction from a Knowledge Graph based on question entities and answer candidates. Then, we procure easily interpreted information with Transformer-based models through the linearization of the extracted subgraphs. Final re-ranking of the answer candidates with the extracted information boosts Hits@1 scores of the pre-trained text-to-text language models by 4-6%.
    摘要 近期研究表明,将结构化知识融入大语言模型能显著提升多种 NLP 任务的效果。本文提出一种方法,利用知识图谱中的附加信息来增强预训练的文本到文本(Text-to-Text)语言模型,用于回答事实型问题。具体而言,我们提出一种基于问题实体与候选答案从知识图谱中抽取子图的算法,随后通过对抽取子图进行线性化,借助基于 Transformer 的模型获得易于解读的信息,最后利用抽取到的信息对候选答案重新排序,使预训练文本到文本语言模型的 Hits@1 分数提升 4-6%。
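
A toy sketch of the subgraph step: connect the question entities to an answer candidate via shortest paths in the knowledge graph, then linearize the resulting triples into text that can be appended to the model input before re-ranking. The tiny graph and the linearization format are illustrative choices.

```python
# Illustrative question/answer-candidate subgraph extraction and linearization.
import networkx as nx

kg = nx.DiGraph()
kg.add_edge("Douglas Adams", "The Hitchhiker's Guide to the Galaxy", relation="author of")
kg.add_edge("The Hitchhiker's Guide to the Galaxy", "science fiction", relation="genre")
kg.add_edge("Douglas Adams", "Cambridge", relation="born in")

def extract_subgraph(graph, question_entities, answer_candidate):
    nodes = set()
    undirected = graph.to_undirected(as_view=True)
    for q in question_entities:
        try:
            nodes.update(nx.shortest_path(undirected, q, answer_candidate))
        except (nx.NetworkXNoPath, nx.NodeNotFound):
            continue
    return graph.subgraph(nodes)

def linearize(subgraph):
    return " ; ".join(f"{u} [{d['relation']}] {v}" for u, v, d in subgraph.edges(data=True))

sub = extract_subgraph(kg, ["Douglas Adams"], "science fiction")
print(linearize(sub))   # text to append to the question before re-ranking the candidate
```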

Instance Needs More Care: Rewriting Prompts for Instances Yields Better Zero-Shot Performance

  • paper_url: http://arxiv.org/abs/2310.02107
  • repo_url: https://github.com/salokr/promptd
  • paper_authors: Saurabh Srivastava, Chengyue Huang, Weiguo Fan, Ziyu Yao
  • for: 提高大型自然语言模型(LLM)在零shot任务上的表现,以实现更好的任务泛化和劳动资源节省。
  • methods: 提出一种名为 PRoMPTd 的方法,通过为每个测试输入重写任务提示,提供更加具体、明确且完整的指导,以便 LLM 在零样本(zero-shot)情况下正确解决测试任务。
  • results: 以 GPT-4 作为任务 LLM,在涵盖代数、逻辑推理和代码生成等任务的八个数据集上进行评测,在复杂的 MATH 数据集上取得约 10% 的绝对提升,在 HumanEval 代码生成任务上取得约 5% 的提升,优于传统零样本方法。此外,重写后的提示还能更好地解释 LLM 如何解决每个测试输入,有望作为抵御对抗性提示(adversarial prompting)的防御机制。
    Abstract Enabling large language models (LLMs) to perform tasks in zero-shot has been an appealing goal owing to its labor-saving (i.e., requiring no task-specific annotations); as such, zero-shot prompting approaches also enjoy better task generalizability. To improve LLMs' zero-shot performance, prior work has focused on devising more effective task instructions (e.g., ``let's think step by step'' ). However, we argue that, in order for an LLM to solve them correctly in zero-shot, individual test instances need more carefully designed and customized instructions. To this end, we propose PRoMPTd, an approach that rewrites the task prompt for each individual test input to be more specific, unambiguous, and complete, so as to provide better guidance to the task LLM. We evaluated PRoMPTd on eight datasets covering tasks including arithmetics, logical reasoning, and code generation, using GPT-4 as the task LLM. Notably, PRoMPTd achieves an absolute improvement of around 10% on the complex MATH dataset and 5% on the code generation task on HumanEval, outperforming conventional zero-shot methods. In addition, we also showed that the rewritten prompt can provide better interpretability of how the LLM resolves each test instance, which can potentially be leveraged as a defense mechanism against adversarial prompting. The source code and dataset can be obtained from https://github.com/salokr/PRoMPTd
    摘要 让大型语言模型(LLM)以零样本方式完成任务一直是一个颇具吸引力的目标,因为它无需任务特定的标注,可节省人力;零样本提示方法也因此具有更好的任务泛化能力。为提升 LLM 的零样本表现,已有工作主要集中在设计更有效的任务指令(例如 "let's think step by step")。然而,我们认为,要让 LLM 在零样本情况下正确求解,每个测试实例都需要更精心设计和定制的指令。为此,我们提出 PRoMPTd:针对每个测试输入重写任务提示,使其更加具体、无歧义且完整,从而为任务 LLM 提供更好的引导。我们以 GPT-4 作为任务 LLM,在涵盖算术、逻辑推理和代码生成等任务的八个数据集上评估了 PRoMPTd。值得注意的是,PRoMPTd 在复杂的 MATH 数据集上取得约 10% 的绝对提升,在 HumanEval 代码生成任务上取得约 5% 的提升,优于传统零样本方法。此外,我们还表明重写后的提示能够更好地解释 LLM 如何求解每个测试实例,这有望被用作抵御对抗性提示的防御机制。源代码和数据集可从 https://github.com/salokr/PRoMPTd 获取。
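
A minimal sketch of the per-instance prompt-rewriting loop is given below. The `call_llm` callable and the meta-prompt wording are placeholders of our own, not the paper's actual prompts or API; the paper uses GPT-4 as the task LLM.

```python
# Sketch of per-instance prompt rewriting: first ask the model to customize the
# task prompt for this input, then solve the input with the rewritten prompt.

from typing import Callable

REWRITE_TEMPLATE = (
    "Rewrite the following task prompt so that it is specific, unambiguous, "
    "and complete for this particular input. Return only the rewritten prompt.\n\n"
    "Task prompt: {task_prompt}\nTest input: {test_input}"
)

def solve_with_rewritten_prompt(task_prompt: str,
                                test_input: str,
                                call_llm: Callable[[str], str]) -> str:
    # Step 1: rewrite the generic task prompt for this instance.
    rewritten = call_llm(REWRITE_TEMPLATE.format(task_prompt=task_prompt,
                                                 test_input=test_input))
    # Step 2: solve the instance zero-shot with the customized prompt.
    return call_llm(f"{rewritten}\n\nInput: {test_input}\nAnswer:")

if __name__ == "__main__":
    # A dummy "LLM" so the sketch runs end to end without any API.
    def dummy_llm(prompt: str) -> str:
        return "Solve the arithmetic expression step by step." if "Rewrite" in prompt else "42"
    print(solve_with_rewritten_prompt("Solve the problem.", "20 + 22 = ?", dummy_llm))
```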

Controlling Topic-Focus Articulation in Meaning-to-Text Generation using Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2310.02053
  • repo_url: None
  • paper_authors: Chunliu Wang, Rik van Noord, Johan Bos
  • for: 本研究旨在从意义表示生成文本时控制主题-焦点表达(topic-focus articulation),通过在意义表示中加入话题等语用信息,使自然语言生成系统能够更好地控制表层语言结构(如主动/被动语态)。
  • methods: 本研究使用图神经网络模型,因为图形式的意义表示没有显式的词序信息。研究尝试了三种不同的主题-焦点表达(TFA)策略,并提出一种通过深度优先搜索学习节点表示的新编码方式。
  • results: 结果显示,基于深度优先搜索学习节点表示可以在一般文本生成上取得与最先进图模型相当的竞争性能,并在主动/被动语态转换任务上获得显著提升。不同的 TFA 策略会对图模型的性能产生很大影响。
    Abstract A bare meaning representation can be expressed in various ways using natural language, depending on how the information is structured on the surface level. We are interested in finding ways to control topic-focus articulation when generating text from meaning. We focus on distinguishing active and passive voice for sentences with transitive verbs. The idea is to add pragmatic information such as topic to the meaning representation, thereby forcing either active or passive voice when given to a natural language generation system. We use graph neural models because there is no explicit information about word order in a meaning represented by a graph. We try three different methods for topic-focus articulation (TFA) employing graph neural models for a meaning-to-text generation task. We propose a novel encoding strategy about node aggregation in graph neural models, which instead of traditional encoding by aggregating adjacent node information, learns node representations by using depth-first search. The results show our approach can get competitive performance with state-of-art graph models on general text generation, and lead to significant improvements on the task of active-passive conversion compared to traditional adjacency-based aggregation strategies. Different types of TFA can have a huge impact on the performance of the graph models.
    摘要 同一个意义表示可以依据表层信息结构的不同,用多种自然语言表达方式来表达。我们关注在从意义生成文本时控制主题-焦点表达(TFA)的方法,重点区分含及物动词句子的主动语态与被动语态:通过在意义表示中加入话题等语用信息,促使自然语言生成系统生成主动或被动语态。由于图所表示的意义中没有显式的词序信息,我们采用图神经网络模型。我们在意义到文本生成任务上尝试了三种不同的 TFA 方法,并提出了一种新的节点聚合编码策略:不同于传统的聚合邻居信息的编码方式,该策略通过深度优先搜索来学习节点表示。结果显示,我们的方法在一般文本生成上可以取得与最先进图模型相当的竞争性能,并在主动/被动语态转换任务上相比传统的基于邻接的聚合策略获得显著提升。不同类型的 TFA 会对图模型的性能产生巨大影响。

Tuning Large language model for End-to-end Speech Translation

  • paper_url: http://arxiv.org/abs/2310.02050
  • repo_url: None
  • paper_authors: Hao Zhang, Nianwen Si, Yaqi Chen, Wenlin Zhang, Xukui Yang, Dan Qu, Xiaolin Jiao
  • for: 这篇论文旨在提高多模态模型在端到端语音翻译(E2E-ST)任务中的表现。
  • methods: 论文提出名为 LST 的大型多模态模型,包括语音前端、适配器和 LLM 后端。LST 的训练包括两个阶段:模态调整和下游任务微调。在模态调整阶段,调整适配器以将语音表示空间与文本嵌入空间对齐;在下游任务微调阶段,适配器和 LLM 同时训练,以优化 E2E-ST 任务的表现。
  • results: 实验结果表明,LST-13B 在 MuST-C 语音翻译基准上的 BLEU 分数为 30.39/41.55/35.33(En-De/En-Fr/En-Es 语言对),超过之前的模型,创造了新的最优水平(state-of-the-art)。此外,还对单模态模型选择和训练策略进行了深入分析,为未来研究奠定基础。代码和模型将在审稿后开源。
    Abstract With the emergence of large language models (LLMs), multimodal models based on LLMs have demonstrated significant potential. Models such as LLaSM, X-LLM, and SpeechGPT exhibit an impressive ability to comprehend and generate human instructions. However, their performance often falters when faced with complex tasks like end-to-end speech translation (E2E-ST), a cross-language and cross-modal translation task. In comparison to single-modal models, multimodal models lag behind in these scenarios. This paper introduces LST, a Large multimodal model designed to excel at the E2E-ST task. LST consists of a speech frontend, an adapter, and a LLM backend. The training of LST consists of two stages: (1) Modality adjustment, where the adapter is tuned to align speech representation with text embedding space, and (2) Downstream task fine-tuning, where both the adapter and LLM model are trained to optimize performance on the E2EST task. Experimental results on the MuST-C speech translation benchmark demonstrate that LST-13B achieves BLEU scores of 30.39/41.55/35.33 on En-De/En-Fr/En-Es language pairs, surpassing previous models and establishing a new state-of-the-art. Additionally, we conduct an in-depth analysis of single-modal model selection and the impact of training strategies, which lays the foundation for future research. We will open up our code and models after review.
    摘要 随着大型语言模型(LLM)的出现,基于 LLM 的多模态模型展现出了巨大潜力。LLaSM、X-LLM 和 SpeechGPT 等模型能够很好地理解和生成人类指令。然而,当面临端到端语音翻译(E2E-ST)这类跨语言、跨模态的复杂任务时,多模态模型的表现落后于单模态模型。本文介绍 LST,一个为出色完成 E2E-ST 任务而设计的大型多模态模型。LST 包括语音前端、适配器和 LLM 后端。LST 的训练分为两个阶段:(1)模态调整,即调整适配器以将语音表示与文本嵌入空间对齐;(2)下游任务微调,即同时训练适配器和 LLM 以优化 E2E-ST 任务的性能。实验结果表明,LST-13B 在 MuST-C 语音翻译基准上的 En-De/En-Fr/En-Es 语言对上分别取得 30.39/41.55/35.33 的 BLEU 分数,超过之前的模型,创造了新的最优水平。此外,我们还对单模态模型选择和训练策略的影响进行了深入分析,为后续研究奠定基础。我们将在审稿后开源代码和模型。
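
The adapter stage can be pictured with a short PyTorch sketch that maps frame-level speech features into an LLM's embedding space. The dimensions, the 2x temporal downsampling, and the two-layer design are assumptions for illustration, not LST's actual architecture.

```python
# Sketch of a speech-to-LLM adapter: stack consecutive speech frames and project
# them into the text-embedding space of a frozen LLM backend.

import torch
import torch.nn as nn

class SpeechToLLMAdapter(nn.Module):
    def __init__(self, speech_dim=1024, llm_dim=4096, downsample=2):
        super().__init__()
        self.downsample = downsample
        # Stack `downsample` consecutive frames, then project into the LLM space.
        self.proj = nn.Sequential(
            nn.Linear(speech_dim * downsample, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, speech_feats):           # (batch, frames, speech_dim)
        b, t, d = speech_feats.shape
        t = (t // self.downsample) * self.downsample
        x = speech_feats[:, :t, :].reshape(b, t // self.downsample, d * self.downsample)
        return self.proj(x)                    # (batch, frames/downsample, llm_dim)

if __name__ == "__main__":
    adapter = SpeechToLLMAdapter()
    fake_speech = torch.randn(2, 50, 1024)     # stand-in for speech-frontend output
    print(adapter(fake_speech).shape)          # torch.Size([2, 25, 4096])
```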

Hierarchical Evaluation Framework: Best Practices for Human Evaluation

  • paper_url: http://arxiv.org/abs/2310.01917
  • repo_url: None
  • paper_authors: Iva Bojic, Jessica Chen, Si Yuan Chang, Qi Chwen Ong, Shafiq Joty, Josip Car
  • for: 评估自然语言处理(NLP)系统的质量和相关性。
  • methods: 基于现有文献的分析和自己开发的层次评估框架。
  • results: 评估Machine Reading Comprehension系统的表现,发现输入质量和输出相关性的关系,并且指出了评估输入和输出两个组件的重要性。
    Abstract Human evaluation plays a crucial role in Natural Language Processing (NLP) as it assesses the quality and relevance of developed systems, thereby facilitating their enhancement. However, the absence of widely accepted human evaluation metrics in NLP hampers fair comparisons among different systems and the establishment of universal assessment standards. Through an extensive analysis of existing literature on human evaluation metrics, we identified several gaps in NLP evaluation methodologies. These gaps served as motivation for developing our own hierarchical evaluation framework. The proposed framework offers notable advantages, particularly in providing a more comprehensive representation of the NLP system's performance. We applied this framework to evaluate the developed Machine Reading Comprehension system, which was utilized within a human-AI symbiosis model. The results highlighted the associations between the quality of inputs and outputs, underscoring the necessity to evaluate both components rather than solely focusing on outputs. In future work, we will investigate the potential time-saving benefits of our proposed framework for evaluators assessing NLP systems.
    摘要 人类评估在自然语言处理(NLP)中扮演着关键角色:它评估所开发系统的质量和相关性,从而促进系统改进。然而,NLP 领域缺乏被广泛认可的人类评估指标,这使得不同系统之间难以公平比较,也难以建立通用的评估标准。通过对现有人类评估指标文献的广泛分析,我们发现了 NLP 评估方法中的若干缺口,这些缺口促使我们开发自己的层次评估框架。所提出的框架具有显著优势,尤其是能够更全面地刻画 NLP 系统的性能。我们将该框架应用于评估我们开发的、用于人机共生模式的机器阅读理解系统。结果揭示了输入质量与输出质量之间的关联,说明需要同时评估输入和输出两个组件,而不是只关注输出。在未来工作中,我们将研究所提框架为评估 NLP 系统的评估者带来的潜在时间节省。

Ring Attention with Blockwise Transformers for Near-Infinite Context

  • paper_url: http://arxiv.org/abs/2310.01889
  • repo_url: https://github.com/lhao499/llm_large_context
  • paper_authors: Hao Liu, Matei Zaharia, Pieter Abbeel
  • for: 提高Transformer模型对长序列的处理能力,解决由各个设备的内存限制所带来的挑战。
  • methods: 提出一种新方法 Ring Attention,通过分块计算自注意力,将长序列分布到多个设备上,并将键值块的通信与分块注意力的计算相互重叠。
  • results: Ring Attention 使可处理的序列长度随设备数量线性扩展,显著提升语言模型的性能和可扩展性。
    Abstract Transformers have emerged as the architecture of choice for many state-of-the-art AI models, showcasing exceptional performance across a wide range of AI applications. However, the memory demands imposed by Transformers limit their ability to handle long sequences, thereby creating challenges for tasks involving extended sequences or long-term dependencies. We present a distinct approach, Ring Attention, which leverages blockwise computation of self-attention to distribute long sequences across multiple devices while overlapping the communication of key-value blocks with the computation of blockwise attention. Ring Attention enables training and inference of sequences that are up to device count times longer than those of prior memory-efficient Transformers, effectively eliminating the memory constraints imposed by individual devices. Extensive experiments on language modeling tasks demonstrate the effectiveness of Ring Attention in allowing large sequence input size and improving performance.
    摘要 Transformer 已成为许多最先进 AI 模型的首选架构,在各种 AI 应用中展现出卓越性能。然而,Transformer 的内存需求限制了其处理长序列的能力,为涉及超长序列或长程依赖关系的任务带来了挑战。我们提出一种不同的方法 Ring Attention:利用分块计算自注意力,将长序列分布到多个设备上,并将 key-value 块的通信与分块注意力的计算相互重叠。Ring Attention 使可训练和推理的序列长度达到此前内存高效 Transformer 的设备数量倍,从而有效消除单个设备带来的内存限制。在语言建模任务上的大量实验证明了 Ring Attention 在支持更大输入序列长度和提升性能方面的有效性。
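
The core of blockwise attention, computing exact attention one key-value block at a time with running online-softmax statistics, can be sketched on a single device as below. The ring scheduling of key-value blocks across devices and the overlap of communication with computation, which are the paper's contribution, are omitted here.

```python
# Sketch of blockwise (online-softmax) attention: iterate over key/value blocks
# and keep only running max / sum statistics instead of the full attention matrix.

import numpy as np

def blockwise_attention(q, k, v, block_size=128):
    d = q.shape[-1]
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(q)
    row_max = np.full(q.shape[0], -np.inf)
    row_sum = np.zeros(q.shape[0])
    for start in range(0, k.shape[0], block_size):
        kb = k[start:start + block_size]
        vb = v[start:start + block_size]
        scores = (q @ kb.T) * scale                      # (n_q, block)
        block_max = scores.max(axis=1)
        new_max = np.maximum(row_max, block_max)
        # Rescale previous accumulators to the new running max.
        correction = np.exp(row_max - new_max)
        p = np.exp(scores - new_max[:, None])
        out = out * correction[:, None] + p @ vb
        row_sum = row_sum * correction + p.sum(axis=1)
        row_max = new_max
    return out / row_sum[:, None]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q, k, v = rng.normal(size=(64, 32)), rng.normal(size=(256, 32)), rng.normal(size=(256, 32))
    ref = np.exp((q @ k.T) / np.sqrt(32))
    ref = (ref / ref.sum(axis=1, keepdims=True)) @ v
    print(np.allclose(blockwise_attention(q, k, v), ref))  # True
```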

Effective and Parameter-Efficient Reusing Fine-Tuned Models

  • paper_url: http://arxiv.org/abs/2310.01886
  • repo_url: None
  • paper_authors: Weisen Jiang, Baijiong Lin, Han Shi, Yu Zhang, Zhenguo Li, James T. Kwok
  • for: 提高下游任务的效果和精度,降低存储和服务负担
  • methods: 对完全微调模型,通过幅值剪枝将稀疏任务向量注入合并模型(PERU-FFT);对 LoRA 微调模型,利用奇异值分解以低秩矩阵近似 LoRA 矩阵(PERU-LoRA)。
  • results: 在计算机视觉和自然语言处理任务上的大量实验表明,所提出的参数高效模型复用方法(PERU)中的 PERU-FFT 和 PERU-LoRA 大幅优于现有的模型复用方法,并达到与为每个任务单独使用微调模型相当的性能。
    Abstract Many pre-trained large-scale models provided online have become highly effective in transferring to downstream tasks. At the same time, various task-specific models fine-tuned on these pre-trained models are available online for public use. In practice, as collecting task-specific data is labor-intensive and fine-tuning the large pre-trained models is computationally expensive, one can reuse task-specific finetuned models to deal with downstream tasks. However, using a model per task causes a heavy burden on storage and serving. Recently, many training-free and parameter-efficient methods have been proposed for reusing multiple fine-tuned task-specific models into a single multi-task model. However, these methods exhibit a large accuracy gap compared with using a fine-tuned model per task. In this paper, we propose Parameter-Efficient methods for ReUsing (PERU) fine-tuned models. For reusing Fully Fine-Tuned (FFT) models, we propose PERU-FFT by injecting a sparse task vector into a merged model by magnitude pruning. For reusing LoRA fine-tuned models, we propose PERU-LoRA use a lower-rank matrix to approximate the LoRA matrix by singular value decomposition. Both PERUFFT and PERU-LoRA are training-free. Extensive experiments conducted on computer vision and natural language process tasks demonstrate the effectiveness and parameter-efficiency of the proposed methods. The proposed PERU-FFT and PERU-LoRA outperform existing reusing model methods by a large margin and achieve comparable performance to using a fine-tuned model per task.
    摘要 许多在线提供的预训练大规模模型在迁移到下游任务时已非常有效,同时也有大量在这些预训练模型上微调得到的任务特定模型公开可用。在实践中,由于收集任务特定数据耗费人力、微调大规模预训练模型计算开销高昂,人们可以直接复用任务特定的微调模型来处理下游任务。然而,每个任务使用一个模型会带来沉重的存储与部署负担。近期,许多免训练且参数高效的方法被提出,用于将多个任务特定微调模型合并为单一多任务模型,但这些方法与为每个任务单独使用微调模型相比仍存在较大的精度差距。本文提出参数高效的模型复用方法(PERU)。针对复用完全微调(FFT)模型,我们提出 PERU-FFT,通过幅值剪枝将稀疏任务向量注入合并模型;针对复用 LoRA 微调模型,我们提出 PERU-LoRA,利用奇异值分解以更低秩的矩阵近似 LoRA 矩阵。PERU-FFT 和 PERU-LoRA 均无需训练。在计算机视觉和自然语言处理任务上的大量实验证明了所提方法的有效性和参数高效性:PERU-FFT 和 PERU-LoRA 大幅超越现有的模型复用方法,并达到与为每个任务单独使用微调模型相当的性能。
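
The PERU-LoRA step, approximating a LoRA update with a lower-rank matrix via SVD, can be sketched in a few lines of NumPy. The shapes and ranks are arbitrary illustrations, and the paper's procedure for merging multiple task-specific models is not reproduced.

```python
# Sketch: compress a LoRA update B @ A to a lower rank with truncated SVD.

import numpy as np

def truncated_lora(B, A, new_rank):
    """Return factors (B', A') of rank `new_rank` approximating B @ A."""
    delta = B @ A                              # full LoRA update, shape (out, in)
    U, s, Vt = np.linalg.svd(delta, full_matrices=False)
    U, s, Vt = U[:, :new_rank], s[:new_rank], Vt[:new_rank]
    return U * s, Vt                           # B' = U diag(s), A' = Vt

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    B, A = rng.normal(size=(512, 16)), rng.normal(size=(16, 256))   # rank-16 LoRA
    B2, A2 = truncated_lora(B, A, new_rank=4)
    err = np.linalg.norm(B @ A - B2 @ A2) / np.linalg.norm(B @ A)
    print(B2.shape, A2.shape, f"relative error: {err:.3f}")
```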

Benchmarking and Improving Generator-Validator Consistency of Language Models

  • paper_url: http://arxiv.org/abs/2310.01846
  • repo_url: None
  • paper_authors: Xiang Lisa Li, Vaishnavi Shrivastava, Siyan Li, Tatsunori Hashimoto, Percy Liang
  • for: 提高语言模型(LM)的一致性,增强LM的可靠性和可信度。
  • methods: 提出一种衡量生成与验证一致性(generator-validator consistency,GV-consistency)的评估框架,并在此基础上进行一致性微调,以提升 LM 的一致性和可靠性。
  • results: 通过在经筛选的、满足 GV 一致性的 generator 与 validator 响应上进行微调,可显著提升 GV-consistency,并且在不使用任何标注数据的情况下同时提升 generator 质量和 validator 准确率。
    Abstract As of September 2023, ChatGPT correctly answers "what is 7+8" with 15, but when asked "7+8=15, True or False" it responds with "False". This inconsistency between generating and validating an answer is prevalent in language models (LMs) and erodes trust. In this paper, we propose a framework for measuring the consistency between generation and validation (which we call generator-validator consistency, or GV-consistency), finding that even GPT-4, a state-of-the-art LM, is GV-consistent only 76% of the time. To improve the consistency of LMs, we propose to finetune on the filtered generator and validator responses that are GV-consistent, and call this approach consistency fine-tuning. We find that this approach improves GV-consistency of Alpaca-30B from 60% to 93%, and the improvement extrapolates to unseen tasks and domains (e.g., GV-consistency for positive style transfers extrapolates to unseen styles like humor). In addition to improving consistency, consistency fine-tuning improves both generator quality and validator accuracy without using any labeled data. Evaluated across 6 tasks, including math questions, knowledge-intensive QA, and instruction following, our method improves the generator quality by 16% and the validator accuracy by 6.3% across all tasks.
    摘要 截至 2023 年 9 月,ChatGPT 能正确回答"7+8 是多少"为 15,但当被问到"7+8=15,对还是错"时,它却回答"错"。这种生成答案与验证答案之间的不一致在语言模型(LM)中普遍存在,并会侵蚀用户信任。本文提出一个衡量生成与验证一致性的框架(generator-validator consistency,简称 GV-consistency),并发现即使是最先进的 GPT-4,其 GV 一致率也仅有 76%。为提高 LM 的一致性,我们提出在经筛选的、满足 GV 一致性的生成器与验证器响应上进行微调,称之为一致性微调(consistency fine-tuning)。该方法将 Alpaca-30B 的 GV 一致率从 60% 提升到 93%,且这种改进可以推广到未见过的任务和领域(例如,正向风格迁移上的 GV 一致性可推广到幽默等未见过的风格)。除提升一致性外,一致性微调还能在不使用任何标注数据的情况下同时提升生成质量和验证准确率。在数学问题、知识密集型问答和指令遵循等 6 个任务上的评估显示,我们的方法将生成质量平均提升 16%,验证器准确率平均提升 6.3%。
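
Measuring generator-validator consistency can be sketched as two calls to the same model per question, as below. The `ask` callable, the prompt wording, and the answer parsing are our own illustrative choices rather than the paper's exact protocol.

```python
# Sketch: fraction of questions where the validator confirms the generator's answer.

from typing import Callable

def gv_consistency(questions, ask: Callable[[str], str]) -> float:
    consistent = 0
    for q in questions:
        answer = ask(f"Question: {q}\nAnswer briefly:")                              # generator side
        verdict = ask(f"Question: {q}\nProposed answer: {answer}\nTrue or False?")   # validator side
        if verdict.strip().lower().startswith("true"):
            consistent += 1
    return consistent / len(questions)

if __name__ == "__main__":
    # A dummy model that answers "15" but (inconsistently) validates it as False.
    def dummy(prompt):
        return "False" if "True or False" in prompt else "15"
    print(gv_consistency(["What is 7 + 8?"], dummy))   # 0.0 -> inconsistent
```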

Preserving Phonemic Distinctions for Ordinal Regression: A Novel Loss Function for Automatic Pronunciation Assessment

  • paper_url: http://arxiv.org/abs/2310.01839
  • repo_url: None
  • paper_authors: Bi-Cheng Yan, Hsin-Wei Wang, Yi-Cheng Wang, Jiun-Ting Li, Chi-Han Lin, Berlin Chen
  • for: automatic pronunciation assessment (APA) for second language (L2) learners
  • methods: uses neural models with a phonemic contrast ordinal (PCO) loss function to preserve phonemic distinctions and ordinal relationships
  • results: effective in capturing proficiency levels and preserving phonemic distinctions, as demonstrated by experiments on the speechocean762 benchmark dataset
    Abstract Automatic pronunciation assessment (APA) manages to quantify the pronunciation proficiency of a second language (L2) learner in a language. Prevailing approaches to APA normally leverage neural models trained with a regression loss function, such as the mean-squared error (MSE) loss, for proficiency level prediction. Despite most regression models can effectively capture the ordinality of proficiency levels in the feature space, they are confronted with a primary obstacle that different phoneme categories with the same proficiency level are inevitably forced to be close to each other, retaining less phoneme-discriminative information. On account of this, we devise a phonemic contrast ordinal (PCO) loss for training regression-based APA models, which aims to preserve better phonemic distinctions between phoneme categories meanwhile considering ordinal relationships of the regression target output. Specifically, we introduce a phoneme-distinct regularizer into the MSE loss, which encourages feature representations of different phoneme categories to be far apart while simultaneously pulling closer the representations belonging to the same phoneme category by means of weighted distances. An extensive set of experiments carried out on the speechocean762 benchmark dataset suggest the feasibility and effectiveness of our model in relation to some existing state-of-the-art models.
    摘要 自动发音评测(APA)旨在量化第二语言(L2)学习者的发音水平。主流的 APA 方法通常利用以回归损失(如均方误差 MSE)训练的神经模型进行水平预测。尽管多数回归模型能够在特征空间中有效刻画水平等级的有序性,但它们面临一个主要障碍:水平相同但音素类别不同的样本不可避免地被迫彼此靠近,从而保留的音素判别信息较少。为此,我们为基于回归的 APA 模型设计了音素对比有序(PCO)损失,在兼顾回归目标输出有序关系的同时,更好地保留不同音素类别之间的区分。具体而言,我们在 MSE 损失中引入一个音素区分正则项,通过加权距离使不同音素类别的特征表示彼此远离,同时拉近同一音素类别内的表示。在 speechocean762 基准数据集上的大量实验表明,相较于一些现有最优模型,我们的方法是可行且有效的。
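
A simplified PyTorch sketch of a loss in the spirit of the PCO loss is given below: an MSE term for ordinal regression plus a contrastive regularizer that pulls together features of the same phoneme (more strongly when their scores are close) and pushes apart features of different phonemes. The specific weighting, margin, and hyperparameters are assumptions, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def pco_style_loss(features, pred_scores, target_scores, phoneme_ids,
                   margin=1.0, lam=0.1):
    """MSE on the ordinal scores plus a phoneme-contrast regularizer."""
    mse = F.mse_loss(pred_scores, target_scores)

    # Pairwise squared distances between feature vectors.
    sq_dist = (features.unsqueeze(0) - features.unsqueeze(1)).pow(2).sum(-1)
    same = phoneme_ids.unsqueeze(0) == phoneme_ids.unsqueeze(1)
    score_gap = (target_scores.unsqueeze(0) - target_scores.unsqueeze(1)).abs()

    # Same phoneme: pull representations together, weighted so that pairs with
    # similar proficiency scores are pulled harder. Different phoneme: push
    # representations apart until a (squared-distance) margin is reached.
    pull = (same.float() * torch.exp(-score_gap) * sq_dist).mean()
    push = ((~same).float() * F.relu(margin - sq_dist)).mean()
    return mse + lam * (pull + push)

if __name__ == "__main__":
    torch.manual_seed(0)
    feats = torch.randn(8, 16, requires_grad=True)
    pred = torch.randn(8, requires_grad=True)
    target = torch.randint(0, 3, (8,)).float()      # ordinal proficiency levels
    phones = torch.randint(0, 4, (8,))               # phoneme class of each frame
    loss = pco_style_loss(feats, pred, target, phones)
    loss.backward()
    print(float(loss))
```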

Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs

  • paper_url: http://arxiv.org/abs/2310.01801
  • repo_url: None
  • paper_authors: Suyu Ge, Yunan Zhang, Liyuan Liu, Minjia Zhang, Jiawei Han, Jianfeng Gao
  • for: 本研究推出了适应型KV缓存压缩,用于减少生成推理中Large Language Models(LLMs)的内存占用。
  • methods: 我们通过有针对性的 profiling 识别注意力模块的内在结构,并据此自适应地构建 KV 缓存:对侧重局部上下文的注意力头驱逐远距离上下文,对以特殊 token 为中心的注意力头丢弃非特殊 token,仅对广泛关注所有 token 的注意力头使用标准 KV 缓存。
  • results: 我们在多种任务上进行了实验,结果显示 GPU 内存占用大幅降低,而生成质量损失几乎可以忽略不计。我们将发布代码及配套的 CUDA 内核,以便复现。
    Abstract In this study, we introduce adaptive KV cache compression, a plug-and-play method that reduces the memory footprint of generative inference for Large Language Models (LLMs). Different from the conventional KV cache that retains key and value vectors for all context tokens, we conduct targeted profiling to discern the intrinsic structure of attention modules. Based on the recognized structure, we then construct the KV cache in an adaptive manner: evicting long-range contexts on attention heads emphasizing local contexts, discarding non-special tokens on attention heads centered on special tokens, and only employing the standard KV cache for attention heads that broadly attend to all tokens. Moreover, with the lightweight attention profiling used to guide the construction of the adaptive KV cache, FastGen can be deployed without resource-intensive fine-tuning or re-training. In our experiments across various asks, FastGen demonstrates substantial reduction on GPU memory consumption with negligible generation quality loss. We will release our code and the compatible CUDA kernel for reproducibility.
    摘要 在这项研究中,我们提出自适应 KV 缓存压缩,一种即插即用的方法,用于减少大语言模型(LLM)生成推理的内存占用。与为所有上下文 token 保留键值向量的传统 KV 缓存不同,我们通过有针对性的 profiling 识别注意力模块的内在结构,并据此自适应地构建 KV 缓存:对侧重局部上下文的注意力头,驱逐远距离上下文;对以特殊 token 为中心的注意力头,丢弃非特殊 token;仅对广泛关注所有 token 的注意力头使用标准 KV 缓存。此外,由于用于指导自适应 KV 缓存构建的注意力 profiling 非常轻量,FastGen 无需资源密集的微调或重新训练即可部署。在多种任务上的实验表明,FastGen 在生成质量损失几乎可忽略的情况下显著降低了 GPU 内存消耗。我们将发布代码及配套的 CUDA 内核以便复现。
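
A toy version of head-adaptive KV-cache eviction is sketched below: each head's profiled attention decides whether it keeps only a local window, only special tokens, or the full cache. The thresholds, window size, and the notion of special tokens are illustrative choices, not FastGen's actual policy.

```python
# Sketch: decide per attention head which KV positions to keep, based on how much
# attention mass falls on a local window vs. on special tokens during profiling.

import torch

def kv_keep_mask(attn, special_pos, local_window=8, thresh=0.9):
    """attn: (heads, seq, seq) averaged attention weights from profiling.
    Returns a (heads, seq) boolean mask of KV positions to keep."""
    heads, seq, _ = attn.shape
    pos = torch.arange(seq)
    local = (pos.unsqueeze(0) - pos.unsqueeze(1)).abs() <= local_window   # (seq, seq)
    special = torch.zeros(seq, dtype=torch.bool)
    special[special_pos] = True

    keep = torch.ones(heads, seq, dtype=torch.bool)                       # default: full cache
    for h in range(heads):
        local_mass = (attn[h] * local).sum(-1).mean()
        special_mass = attn[h][:, special].sum(-1).mean()
        if local_mass >= thresh:            # head mostly looks at nearby tokens
            keep[h] = pos >= seq - local_window - 1
        elif special_mass >= thresh:        # head mostly looks at special tokens
            keep[h] = special
    return keep

if __name__ == "__main__":
    torch.manual_seed(0)
    attn = torch.softmax(torch.randn(4, 32, 32), dim=-1)
    mask = kv_keep_mask(attn, special_pos=[0], local_window=4, thresh=0.5)
    print(mask.sum(dim=1))   # KV entries retained per head
```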

SEA: Sparse Linear Attention with Estimated Attention Mask

  • paper_url: http://arxiv.org/abs/2310.01777
  • repo_url: None
  • paper_authors: Heejun Lee, Jina Kim, Jeffrey Willette, Sung Ju Hwang
  • for: 提高大型transformer模型在受限制的设备上的运行效率,以及使用较少的内存进行语言理解任务。
  • methods: 提出 SEA:先用基于核函数的线性注意力以线性复杂度估计注意力矩阵,再通过 top-k 选择构造稀疏近似,从而执行稀疏注意力操作。
  • results: 在 Wikitext2 语言建模任务上,以往的线性和稀疏注意力方法的困惑度约为二次复杂度基线 OPT-125M 的两倍,而 SEA 的困惑度优于 OPT-125M,且内存消耗约为 OPT-125M 的一半。此外,SEA 保留了可解释的注意力矩阵,并可利用知识蒸馏降低现有预训练 Transformer 的复杂度。
    Abstract The transformer architecture has made breakthroughs in recent years on tasks which require modeling pairwise relationships between sequential elements, as is the case in natural language understanding. However, transformers struggle with long sequences due to the quadratic complexity of the attention operation, and previous research has aimed to lower the complexity by sparsifying or linearly approximating the attention matrix. Yet, these approaches cannot straightforwardly distill knowledge from a teacher's attention matrix, and often require complete retraining from scratch. Furthermore, previous sparse and linear approaches may also lose interpretability if they do not produce full quadratic attention matrices. To address these challenges, we propose SEA: Sparse linear attention with an Estimated Attention mask. SEA estimates the attention matrix with linear complexity via kernel-based linear attention, then creates a sparse approximation to the full attention matrix with a top-k selection to perform a sparse attention operation. For language modeling tasks (Wikitext2), previous linear and sparse attention methods show a roughly two-fold worse perplexity scores over the quadratic OPT-125M baseline, while SEA achieves an even better perplexity than OPT-125M, using roughly half as much memory as OPT-125M. Moreover, SEA maintains an interpretable attention matrix and can utilize knowledge distillation to lower the complexity of existing pretrained transformers. We believe that our work will have a large practical impact, as it opens the possibility of running large transformers on resource-limited devices with less memory.
    摘要 Transformer 架构近年来在需要建模序列元素两两关系的任务(如自然语言理解)上取得了突破。然而,由于注意力操作的二次复杂度,Transformer 难以处理长序列;已有研究试图通过稀疏化或线性近似注意力矩阵来降低复杂度。但这些方法无法直接从教师模型的注意力矩阵中蒸馏知识,且通常需要从头完全重新训练;此外,如果不生成完整的二次注意力矩阵,先前的稀疏和线性方法还可能丧失可解释性。为应对这些挑战,我们提出 SEA:基于估计注意力掩码的稀疏线性注意力。SEA 先通过基于核函数的线性注意力以线性复杂度估计注意力矩阵,再通过 top-k 选择构造完整注意力矩阵的稀疏近似,执行稀疏注意力操作。在语言建模任务(Wikitext2)上,先前的线性和稀疏注意力方法的困惑度约为二次复杂度基线 OPT-125M 的两倍,而 SEA 的困惑度甚至优于 OPT-125M,且内存消耗约为 OPT-125M 的一半。此外,SEA 保留了可解释的注意力矩阵,并可借助知识蒸馏来降低现有预训练 Transformer 的复杂度。我们相信这项工作将带来很大的实际影响,因为它使在内存受限的设备上运行大型 Transformer 成为可能。
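
The "estimate, then sparsify" pattern can be sketched as follows: a cheap proxy picks the top-k keys per query, and exact softmax attention is computed only over those keys. The random-projection estimator below is a stand-in for SEA's kernel-based linear attention and learned mask.

```python
# Sketch: select top-k keys per query with a cheap proxy score, then compute
# exact softmax attention restricted to the selected keys.

import torch

def topk_sparse_attention(q, k, v, top_k=16, proj_dim=8):
    n_q, d = q.shape
    # Cheap proxy scores in a low-dimensional space (illustrative estimator).
    proj = torch.randn(d, proj_dim) / proj_dim ** 0.5
    approx = (q @ proj) @ (k @ proj).T
    idx = approx.topk(top_k, dim=-1).indices                  # (n_q, top_k)

    # Exact attention restricted to the selected keys.
    k_sel = k[idx]                                             # (n_q, top_k, d)
    v_sel = v[idx]
    scores = (q.unsqueeze(1) * k_sel).sum(-1) / d ** 0.5       # (n_q, top_k)
    weights = torch.softmax(scores, dim=-1)
    return (weights.unsqueeze(-1) * v_sel).sum(1)              # (n_q, d)

if __name__ == "__main__":
    torch.manual_seed(0)
    q, k, v = torch.randn(32, 64), torch.randn(128, 64), torch.randn(128, 64)
    print(topk_sparse_attention(q, k, v).shape)                # torch.Size([32, 64])
```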

Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns

  • paper_url: http://arxiv.org/abs/2310.01749
  • repo_url: https://github.com/bdusell/stack-attention
  • paper_authors: Brian DuSell, David Chiang
  • for: 这个论文是为了解决自然语言处理中的层次结构识别问题,以提高模型的表达能力和泛化能力。
  • methods: 论文提出一种新的注意力算子——栈注意力(Stack Attention),通过引入栈结构来处理层次模式,从而提升模型对层次结构的识别能力。
  • results: 实验表明,带有栈注意力的 Transformer 能够学习标准 Transformer 难以掌握的上下文无关语言(CFL),包括解析难度达到理论上限的 CFL;在受限参数预算下,栈注意力在自然语言建模上也更为有效,并给出了机器翻译上的结果。
    Abstract Attention, specifically scaled dot-product attention, has proven effective for natural language, but it does not have a mechanism for handling hierarchical patterns of arbitrary nesting depth, which limits its ability to recognize certain syntactic structures. To address this shortcoming, we propose stack attention: an attention operator that incorporates stacks, inspired by their theoretical connections to context-free languages (CFLs). We show that stack attention is analogous to standard attention, but with a latent model of syntax that requires no syntactic supervision. We propose two variants: one related to deterministic pushdown automata (PDAs) and one based on nondeterministic PDAs, which allows transformers to recognize arbitrary CFLs. We show that transformers with stack attention are very effective at learning CFLs that standard transformers struggle on, achieving strong results on a CFL with theoretically maximal parsing difficulty. We also show that stack attention is more effective at natural language modeling under a constrained parameter budget, and we include results on machine translation.
    摘要 注意力机制,特别是缩放点积注意力,已被证明对自然语言十分有效,但它缺乏处理任意嵌套深度层次模式的机制,这限制了它识别某些句法结构的能力。为弥补这一缺陷,我们提出栈注意力:一种引入栈的注意力算子,其灵感来自栈与上下文无关语言(CFL)之间的理论联系。我们表明,栈注意力与标准注意力类似,但内含一个无需句法监督的潜在句法模型。我们提出两种变体:一种与确定性下推自动机(PDA)相关,另一种基于非确定性 PDA,使 Transformer 能够识别任意 CFL。实验显示,带有栈注意力的 Transformer 在标准 Transformer 难以学习的 CFL 上表现出色,在解析难度达到理论上限的 CFL 上取得了强劲结果。我们还表明,在受限参数预算下,栈注意力在自然语言建模中更为有效,并给出了机器翻译上的结果。

Deciphering Diagnoses: How Large Language Models Explanations Influence Clinical Decision Making

  • paper_url: http://arxiv.org/abs/2310.01708
  • repo_url: None
  • paper_authors: D. Umerenkov, G. Zubkova, A. Nesterov
  • for: 这项研究旨在评估大语言模型(LLMs)在基于病人投诉的诊断上的效果和可靠性。
  • methods: 该研究使用LLMs生成病人投诉与医生和模型诊断之间的联系,并通过三名经验丰富的医生在多个阶段进行评估。
  • results: 研究发现,LLM 解释显著提高了医生对诊断的认同率,但也揭示了 LLM 输出中 5% 到 30% 不等的潜在错误。
    Abstract Clinical Decision Support Systems (CDSS) utilize evidence-based knowledge and patient data to offer real-time recommendations, with Large Language Models (LLMs) emerging as a promising tool to generate plain-text explanations for medical decisions. This study explores the effectiveness and reliability of LLMs in generating explanations for diagnoses based on patient complaints. Three experienced doctors evaluated LLM-generated explanations of the connection between patient complaints and doctor and model-assigned diagnoses across several stages. Experimental results demonstrated that LLM explanations significantly increased doctors' agreement rates with given diagnoses and highlighted potential errors in LLM outputs, ranging from 5% to 30%. The study underscores the potential and challenges of LLMs in healthcare and emphasizes the need for careful integration and evaluation to ensure patient safety and optimal clinical utility.
    摘要 临床决策支持系统(CDSS)利用循证知识和患者数据提供实时建议,而大型语言模型(LLM)正成为为医疗决策生成纯文本解释的有前景的工具。本研究探讨了 LLM 在基于患者主诉为诊断生成解释方面的有效性和可靠性。三位经验丰富的医生分多个阶段评估了 LLM 生成的、关于患者主诉与医生及模型给出诊断之间联系的解释。实验结果显示,LLM 解释显著提高了医生对给定诊断的认同率,同时也暴露出 LLM 输出中 5% 到 30% 不等的潜在错误。本研究强调了 LLM 在医疗领域的潜力与挑战,并指出需要谨慎地集成和评估,以确保患者安全和最佳临床效用。

cs.LG - 2023-10-03

DON-LSTM: Multi-Resolution Learning with DeepONets and Long Short-Term Memory Neural Networks

  • paper_url: http://arxiv.org/abs/2310.02491
  • repo_url: https://github.com/katarzynamichalowska/don_lstm
  • paper_authors: Katarzyna Michałowska, Somdatta Goswami, George Em Karniadakis, Signe Riemer-Sørensen
  • for: 用于多个非线性系统的长时间演化建模
  • methods: 结合 DeepONet 与 LSTM 两种架构,既能利用多分辨率数据,又能捕捉长序列中的时间相关性
  • results: 在多个非线性系统的长时间演化建模中,相比原始(vanilla)版本取得显著更低的泛化误差,且需要更少的高分辨率样本
    Abstract Deep operator networks (DeepONets, DONs) offer a distinct advantage over traditional neural networks in their ability to be trained on multi-resolution data. This property becomes especially relevant in real-world scenarios where high-resolution measurements are difficult to obtain, while low-resolution data is more readily available. Nevertheless, DeepONets alone often struggle to capture and maintain dependencies over long sequences compared to other state-of-the-art algorithms. We propose a novel architecture, named DON-LSTM, which extends the DeepONet with a long short-term memory network (LSTM). Combining these two architectures, we equip the network with explicit mechanisms to leverage multi-resolution data, as well as capture temporal dependencies in long sequences. We test our method on long-time-evolution modeling of multiple non-linear systems and show that the proposed multi-resolution DON-LSTM achieves significantly lower generalization error and requires fewer high-resolution samples compared to its vanilla counterparts.
    摘要 深度算子网络(DeepONet,DON)相比传统神经网络具有一个独特优势:可以在多分辨率数据上训练。这一特性在实际场景中尤为重要,因为高分辨率测量往往难以获得,而低分辨率数据更易得到。然而,与其他最先进算法相比,单独的 DeepONet 往往难以捕捉并保持长序列上的依赖关系。我们提出一种新的架构 DON-LSTM,将 DeepONet 与长短期记忆网络(LSTM)相结合。通过结合这两种架构,网络获得了显式利用多分辨率数据以及捕捉长序列时间依赖关系的机制。我们在多个非线性系统的长时间演化建模上测试了该方法,结果表明所提出的多分辨率 DON-LSTM 相比其原始(vanilla)版本取得了显著更低的泛化误差,并且需要更少的高分辨率样本。

Splitting the Difference on Adversarial Training

  • paper_url: http://arxiv.org/abs/2310.02480
  • repo_url: https://github.com/matanle51/splitting-the-difference-on-adversarial-training
  • paper_authors: Matan Levi, Aryeh Kontorovich
  • for: 本研究旨在提升深度神经网络的对抗鲁棒性;不同于传统的对抗训练方法,我们将每个类别的对抗样本视为单独的类别来学习,从而简化决策边界。
  • methods: 我们将每个类别拆分为"干净"和"对抗"两个子类并分别学习;类别数量因此翻倍,但每个类别的决策边界被大幅简化。
  • results: 实验结果表明,该方法在保持模型自然准确率的同时提升了对抗鲁棒性:在 CIFAR-10 数据集上获得接近最优的 95.01% 自然准确率,并在多个任务上表现出显著的鲁棒性。
    Abstract The existence of adversarial examples points to a basic weakness of deep neural networks. One of the most effective defenses against such examples, adversarial training, entails training models with some degree of robustness, usually at the expense of a degraded natural accuracy. Most adversarial training methods aim to learn a model that finds, for each class, a common decision boundary encompassing both the clean and perturbed examples. In this work, we take a fundamentally different approach by treating the perturbed examples of each class as a separate class to be learned, effectively splitting each class into two classes: "clean" and "adversarial." This split doubles the number of classes to be learned, but at the same time considerably simplifies the decision boundaries. We provide a theoretical plausibility argument that sheds some light on the conditions under which our approach can be expected to be beneficial. Likewise, we empirically demonstrate that our method learns robust models while attaining optimal or near-optimal natural accuracy, e.g., on CIFAR-10 we obtain near-optimal natural accuracy of $95.01\%$ alongside significant robustness across multiple tasks. The ability to achieve such near-optimal natural accuracy, while maintaining a significant level of robustness, makes our method applicable to real-world applications where natural accuracy is at a premium. As a whole, our main contribution is a general method that confers a significant level of robustness upon classifiers with only minor or negligible degradation of their natural accuracy.
    摘要 对抗样本的存在揭示了深度神经网络的一个基本弱点。对抗训练是应对这类样本最有效的防御手段之一,它通过训练具有一定鲁棒性的模型来实现,但通常以牺牲自然准确率为代价。大多数对抗训练方法旨在为每个类别学习一个同时覆盖干净样本与扰动样本的共同决策边界。在这项工作中,我们采取了一种根本不同的方法:将每个类别的扰动样本视为一个单独的、需要学习的类别,相当于把每个类别拆分为"干净"和"对抗"两个类别。这种拆分使需要学习的类别数量翻倍,但同时大幅简化了决策边界。我们给出了一个理论上的合理性论证,说明在何种条件下可以期望该方法带来收益。同时,实验也表明该方法能在学习鲁棒模型的同时保持最优或接近最优的自然准确率,例如在 CIFAR-10 上,我们获得了接近最优的 95.01% 自然准确率,并在多个任务上表现出显著的鲁棒性。能够在保持相当鲁棒性的同时实现接近最优的自然准确率,使该方法适用于对自然准确率要求很高的实际应用。总体而言,我们的主要贡献是一种通用方法,它能在自然准确率仅有轻微甚至可忽略损失的情况下赋予分类器显著的鲁棒性。
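
The class-splitting idea translates into very little code: adversarial copies of class y are trained as class y + C, and the split logits are merged at test time. In this sketch the "adversarial" examples are just noisy copies so it runs without an attack, and merging by an element-wise max of logits is our own simplification.

```python
# Sketch: train a 2C-way classifier where perturbed examples of class y get
# label y + C, then merge clean/adversarial logits per class at inference.

import torch
import torch.nn as nn

NUM_CLASSES = 10

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(),
                      nn.Linear(128, 2 * NUM_CLASSES))        # 2C outputs
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

def train_step(x_clean, y):
    x_adv = x_clean + 0.1 * torch.randn_like(x_clean)          # stand-in for a real attack (e.g. PGD)
    x = torch.cat([x_clean, x_adv])
    labels = torch.cat([y, y + NUM_CLASSES])                   # "adversarial" copies of each class
    opt.zero_grad()
    loss = loss_fn(model(x), labels)
    loss.backward()
    opt.step()
    return loss.item()

def predict(x):
    logits = model(x)
    clean, adv = logits[:, :NUM_CLASSES], logits[:, NUM_CLASSES:]
    return torch.maximum(clean, adv).argmax(dim=1)              # merge the split classes

if __name__ == "__main__":
    torch.manual_seed(0)
    x, y = torch.randn(16, 1, 28, 28), torch.randint(0, NUM_CLASSES, (16,))
    print(train_step(x, y), predict(x)[:5])
```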

ML4EJ: Decoding the Role of Urban Features in Shaping Environmental Injustice Using Interpretable Machine Learning

  • paper_url: http://arxiv.org/abs/2310.02476
  • repo_url: None
  • paper_authors: Yu-Hsuan Ho, Zhewei Liu, Cheng-Chun Lee, Ali Mostafavi
  • for: This paper aims to examine the effects of various urban features and their non-linear interactions on exposure disparities of three primary hazards: air pollution, urban heat, and flooding.
  • methods: The study uses an interpretable machine learning model, combining Random Forest and XGBoost, with data from six metropolitan counties in the United States to train and test the models.
  • results: The analysis reveals that features related to social-demographic characteristics are the most prominent urban features that shape hazard extent, while features related to infrastructure distribution and land cover are relatively important for urban heat and air pollution exposure, respectively. The study also finds limited transferability among different regions and hazards, highlighting the intricate differences among hazards and regions and the way in which urban features shape hazard exposures.
    Abstract Understanding the key factors shaping environmental hazard exposures and their associated environmental injustice issues is vital for formulating equitable policy measures. Traditional perspectives on environmental injustice have primarily focused on the socioeconomic dimensions, often overlooking the influence of heterogeneous urban characteristics. This limited view may obstruct a comprehensive understanding of the complex nature of environmental justice and its relationship with urban design features. To address this gap, this study creates an interpretable machine learning model to examine the effects of various urban features and their non-linear interactions to the exposure disparities of three primary hazards: air pollution, urban heat, and flooding. The analysis trains and tests models with data from six metropolitan counties in the United States using Random Forest and XGBoost. The performance is used to measure the extent to which variations of urban features shape disparities in environmental hazard levels. In addition, the analysis of feature importance reveals features related to social-demographic characteristics as the most prominent urban features that shape hazard extent. Features related to infrastructure distribution and land cover are relatively important for urban heat and air pollution exposure respectively. Moreover, we evaluate the models' transferability across different regions and hazards. The results highlight limited transferability, underscoring the intricate differences among hazards and regions and the way in which urban features shape hazard exposures. The insights gleaned from this study offer fresh perspectives on the relationship among urban features and their interplay with environmental hazard exposure disparities, informing the development of more integrated urban design policies to enhance social equity and environmental injustice issues.
    摘要 理解环境风险暴露的关键因素和相关的环境正义问题,对制定公平的政策措施至关重要。传统的环境正义观点主要关注社会经济维度,常常忽略城市特有的多样性特征。这种有限的视角可能会阻碍我们全面理解环境正义的复杂性及其与城市设计特征之间的关系。为弥补这一缺陷,本研究构建了可解释的机器学习模型,考察不同城市特征及其非线性相互作用对三种主要环境风险暴露差异的影响。研究使用美国六个都会县的数据,基于 Random Forest 和 XGBoost 进行训练和测试,并以模型性能衡量城市特征变化对环境风险水平差异的塑造程度。特征重要性分析表明,与社会人口特征相关的特征是塑造风险程度最显著的城市特征,而与基础设施分布和土地覆盖相关的特征分别对城市热量和空气污染暴露较为重要。此外,我们还评估了模型在不同地区和风险之间的可迁移性,结果表明可迁移性有限,这凸显了不同风险与地区之间的复杂差异,以及城市特征塑造风险暴露的方式。本研究的发现为城市特征及其与环境风险暴露差异的相互作用提供了新的视角,可为制定更具整体性的城市设计政策、促进社会公平并改善环境正义问题提供参考。

Prompting-based Efficient Temporal Domain Generalization

  • paper_url: http://arxiv.org/abs/2310.02473
  • repo_url: None
  • paper_authors: Sepidehsadat Hosseini, Mengyao Zhai, Hossein Hajimirsadegh, Frederick Tung
  • for: 本研究旨在解决机器学习模型在不同时间段上的泛化问题,即训练和测试数据的分布不同的情况下,模型的泛化性强度不足。
  • methods: 我们提出了一种基于提问的 temporal domain generalization 方法,该方法不需要训练时间段中的目标频道数据(未来时间段的数据),并且可以在不同任务上(如分类、回归、时间序列预测)进行扩展。我们学习了全局提问、域pecific提问以及演化aware提问,以捕捉下降的时间动态。
  • results: 我们的方法在多种任务上实现了新的state-of-the-art benchmark,并且可以快速、parameter-efficient地在不同时间段上进行泛化。代码库将公开。
    Abstract Machine learning traditionally assumes that training and testing data are distributed independently and identically. However, in many real-world settings, the data distribution can shift over time, leading to poor generalization of trained models in future time periods. Our paper presents a novel prompting-based approach to temporal domain generalization that is parameter-efficient, time-efficient, and does not require access to the target domain data (i.e., unseen future time periods) during training. Our method adapts a target pre-trained model to temporal drift by learning global prompts, domain-specific prompts, and drift-aware prompts that capture underlying temporal dynamics. It is compatible across diverse tasks, such as classification, regression, and time series forecasting, and sets a new state-of-the-art benchmark in temporal domain generalization. The code repository will be publicly shared.
    摘要 传统上,机器学习假设训练和测试数据独立同分布。然而,在实际场景中,数据分布可能会随时间变化,导致训练好的模型在未来时间段的泛化能力变差。我们的论文提出了一种新的基于提示的时间域泛化方法,它参数高效、时间高效,并且不需要在训练过程中访问目标域数据(即未来时间段的数据)。我们的方法通过学习全局提示、域特定提示和漂移感知提示,来捕捉潜在的时间动态。它可以兼容分类、回归和时间序列预测等多种任务,并在时间域泛化上创造了新的最优基准。代码仓库将公开发布。

Differentiable Chemical Physics by Geometric Deep Learning for Gradient-based Property Optimization of Mixtures

  • paper_url: http://arxiv.org/abs/2310.03047
  • repo_url: None
  • paper_authors: Shang Zhu, Bharath Ramsundar, Emil Annevelink, Hongyi Lin, Adarsh Dave, Pin-Wen Guan, Kevin Gering, Venkatasubramanian Viswanathan
  • for: 模型化化学混合物的多目标性能和约束,用于化学过程和电化学设备中。
  • methods: 利用几何深度学习(GDL)将分子种类、组成和环境条件映射到物理系数,从而扩展混合物的热力学与输运定律。
  • results: 相比纯数据驱动的变体具有更高的预测精度和模型稳健性,并能基于自动微分得到的梯度高效优化电解质的输运性质。
    Abstract Chemical mixtures, satisfying multi-objective performance metrics and constraints, enable their use in chemical processes and electrochemical devices. In this work, we develop a differentiable chemical-physics framework for modeling chemical mixtures, DiffMix, where geometric deep learning (GDL) is leveraged to map from molecular species, compositions and environment conditions, to physical coefficients in the mixture physics laws. In particular, we extend mixture thermodynamic and transport laws by creating learnable physical coefficients, where we use graph neural networks as the molecule encoder and enforce component-wise permutation-invariance. We start our model evaluations with thermodynamics of binary mixtures, and further benchmarked multicomponent electrolyte mixtures on their transport properties, in order to test the model generalizability. We show improved prediction accuracy and model robustness of DiffMix than its purely data-driven variants. Furthermore, we demonstrate the efficient optimization of electrolyte transport properties, built on the gradient obtained using DiffMix auto-differentiation. Our simulation runs are then backed up by the data generated by a robotic experimentation setup, Clio. By combining mixture physics and GDL, DiffMix expands the predictive modeling methods for chemical mixtures and provides low-cost optimization approaches in large chemical spaces.
    摘要 我们首先评估二元混合物的热力学性质,随后在多组分电解质混合物的输运性质上进行基准测试,以检验模型的泛化能力。结果表明,与纯数据驱动的变体相比,DiffMix 具有更高的预测精度和模型稳健性。此外,我们利用 DiffMix 自动微分得到的梯度,高效地优化了电解质的输运性质。我们的模拟结果还得到了机器人实验平台 Clio 所产生数据的支持。通过将混合物物理与几何深度学习(GDL)相结合,DiffMix 扩展了化学混合物的预测建模方法,并为大规模化学空间提供了低成本的优化途径。

Distributionally Safe Reinforcement Learning under Model Uncertainty: A Single-Level Approach by Differentiable Convex Programming

  • paper_url: http://arxiv.org/abs/2310.02459
  • repo_url: None
  • paper_authors: Alaa Eddine Chriat, Chuangchuang Sun
  • for: 本研究旨在提供一个易于求解的分布安全强化学习框架,在分布偏移下保证安全性。
  • methods: 本研究使用 Wasserstein 度量刻画分布偏移,并借助对偶理论与可微凸规划将双层问题化简为单层问题。
  • results: 研究表明,该框架可以提供有效的安全保证,并且在不同的系统中表现出了显著的改善。
    Abstract Safety assurance is uncompromisable for safety-critical environments with the presence of drastic model uncertainties (e.g., distributional shift), especially with humans in the loop. However, incorporating uncertainty in safe learning will naturally lead to a bi-level problem, where at the lower level the (worst-case) safety constraint is evaluated within the uncertainty ambiguity set. In this paper, we present a tractable distributionally safe reinforcement learning framework to enforce safety under a distributional shift measured by a Wasserstein metric. To improve the tractability, we first use duality theory to transform the lower-level optimization from infinite-dimensional probability space where distributional shift is measured, to a finite-dimensional parametric space. Moreover, by differentiable convex programming, the bi-level safe learning problem is further reduced to a single-level one with two sequential computationally efficient modules: a convex quadratic program to guarantee safety followed by a projected gradient ascent to simultaneously find the worst-case uncertainty. This end-to-end differentiable framework with safety constraints, to the best of our knowledge, is the first tractable single-level solution to address distributional safety. We test our approach on first and second-order systems with varying complexities and compare our results with the uncertainty-agnostic policies, where our approach demonstrates a significant improvement on safety guarantees.
    摘要 在存在剧烈模型不确定性(例如分布偏移)的安全关键环境中,尤其是当人类参与其中时,安全保证不容妥协。然而,在安全学习中纳入不确定性会自然导致一个双层问题:在下层需要在不确定性模糊集内评估(最坏情况下的)安全约束。在这篇论文中,我们提出一种易于求解的分布安全强化学习框架,用于在以 Wasserstein 度量刻画的分布偏移下保证安全。为提高可求解性,我们首先利用对偶理论,将下层优化从度量分布偏移的无限维概率空间转化到有限维参数空间;随后,借助可微凸规划,将双层安全学习问题进一步化简为单层问题,由两个顺序执行、计算高效的模块组成:先用一个凸二次规划保证安全,再用投影梯度上升同时寻找最坏情况的不确定性。据我们所知,这一端到端可微、带安全约束的框架是首个解决分布安全问题的易于求解的单层方案。我们在不同复杂度的一阶和二阶系统上测试了该方法,并与不考虑不确定性的策略进行比较,结果表明我们的方法在安全保证方面有显著提升。

Dual-stage Flows-based Generative Modeling for Traceable Urban Planning

  • paper_url: http://arxiv.org/abs/2310.02453
  • repo_url: None
  • paper_authors: Xuanming Hu, Wei Fan, Dongjie Wang, Pengyang Wang, Yong Li, Yanjie Fu
  • for: 这篇论文旨在开发一种自动化的城市规划技术,以应对传统城市规划复杂且繁重的问题。
  • methods: 这篇论文提出一种基于标准化流(normalizing flows)的新生成框架,即双阶段城市流(DSUF)框架。该框架包括两个阶段:首先利用分区级城市规划流生成城市功能分区,然后利用信息融合模块刻画不同功能分区之间的关系,最后利用配置级城市规划流从融合后的信息中得到合适的土地利用配置。
  • results: 论文的实验结果显示,这个新生成框架可以比其他生成模型更好地完成城市规划任务。
    Abstract Urban planning, which aims to design feasible land-use configurations for target areas, has become increasingly essential due to the high-speed urbanization process in the modern era. However, the traditional urban planning conducted by human designers can be a complex and onerous task. Thanks to the advancement of deep learning algorithms, researchers have started to develop automated planning techniques. While these models have exhibited promising results, they still grapple with a couple of unresolved limitations: 1) Ignoring the relationship between urban functional zones and configurations and failing to capture the relationship among different functional zones. 2) Less interpretable and stable generation process. To overcome these limitations, we propose a novel generative framework based on normalizing flows, namely Dual-stage Urban Flows (DSUF) framework. Specifically, the first stage is to utilize zone-level urban planning flows to generate urban functional zones based on given surrounding contexts and human guidance. Then we employ an Information Fusion Module to capture the relationship among functional zones and fuse the information of different aspects. The second stage is to use configuration-level urban planning flows to obtain land-use configurations derived from fused information. We design several experiments to indicate that our framework can outperform compared to other generative models for the urban planning task.
    摘要 城市规划旨在为目标区域设计可行的土地利用配置,随着现代高速城市化进程而变得愈发重要。然而,由人类设计师完成的传统城市规划是一项复杂而繁重的任务。得益于深度学习算法的进步,研究人员开始开发自动化规划技术。虽然这些模型已展现出有前景的结果,但仍存在两个尚未解决的局限:1)忽略了城市功能分区与配置之间的关系,也未能刻画不同功能分区之间的关系;2)生成过程的可解释性和稳定性较差。为克服这些局限,我们提出一种基于标准化流的新生成框架,即双阶段城市流(DSUF)框架。具体来说,第一阶段利用分区级城市规划流,基于周边环境和人工指导生成城市功能分区;随后利用信息融合模块刻画各功能分区之间的关系并融合不同方面的信息;第二阶段利用配置级城市规划流,从融合后的信息中得到土地利用配置。我们设计了多组实验,结果表明该框架在城市规划任务上优于其他生成模型。

Feather: An Elegant Solution to Effective DNN Sparsification

  • paper_url: http://arxiv.org/abs/2310.02448
  • repo_url: https://github.com/athglentis/feather
  • paper_authors: Athanasios Glentis Georgoulakis, George Retsinas, Petros Maragos
  • for: 这篇论文旨在提出一种高效的神经网络稀疏化(剪枝)方法,以适应资源受限的环境并保持高性能。
  • methods: 这篇论文提出名为 Feather 的高效稀疏训练模块,以 Straight-Through Estimator 为核心,并结合新的阈值算子和梯度缩放技术,实现稳健、开箱即用的稀疏化性能。
  • results: 这篇论文在 CIFAR 数据集上用多种架构进行了评测,并在 ImageNet 上用 ResNet-50 架构取得了最先进的 Top-1 验证准确率,以明显优势超越现有方法,包括更复杂、计算量更大的方法。
    Abstract Neural Network pruning is an increasingly popular way for producing compact and efficient models, suitable for resource-limited environments, while preserving high performance. While the pruning can be performed using a multi-cycle training and fine-tuning process, the recent trend is to encompass the sparsification process during the standard course of training. To this end, we introduce Feather, an efficient sparse training module utilizing the powerful Straight-Through Estimator as its core, coupled with a new thresholding operator and a gradient scaling technique, enabling robust, out-of-the-box sparsification performance. Feather's effectiveness and adaptability is demonstrated using various architectures on the CIFAR dataset, while on ImageNet it achieves state-of-the-art Top-1 validation accuracy using the ResNet-50 architecture, surpassing existing methods, including more complex and computationally heavy ones, by a considerable margin. Code is publicly available at https://github.com/athglentis/feather .
    摘要 神经网络剪枝是一种日益流行的方法,用于生成适合资源受限环境的紧凑高效模型,同时保持高性能。剪枝可以通过多轮训练加微调的流程完成,但近期趋势是将稀疏化过程纳入标准训练流程之中。为此,我们提出 Feather,一种高效的稀疏训练模块,以强大的 Straight-Through Estimator 为核心,并结合新的阈值算子和梯度缩放技术,实现了稳健、开箱即用的稀疏化性能。Feather 的有效性和适应性在 CIFAR 数据集的多种架构上得到了验证;在 ImageNet 上,它使用 ResNet-50 架构取得了最先进的 Top-1 验证准确率,以相当大的优势超越现有方法(包括更复杂、计算量更大的方法)。代码公开于 https://github.com/athglentis/feather 。
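
Straight-through sparsification of a single linear layer can be sketched as below: the forward pass uses magnitude-pruned weights while gradients flow to the dense weights. Feather's specific thresholding operator and gradient-scaling scheme are not reproduced; the sparsity level is an arbitrary example.

```python
# Sketch: straight-through estimator (STE) magnitude pruning inside a linear layer.

import torch
import torch.nn as nn

class STESparseLinear(nn.Module):
    def __init__(self, in_features, out_features, sparsity=0.9):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.sparsity = sparsity

    def forward(self, x):
        k = int(self.weight.numel() * self.sparsity)
        threshold = self.weight.abs().flatten().kthvalue(k).values
        mask = (self.weight.abs() > threshold).float()
        # Straight-through estimator: masked weights in the forward pass,
        # identity gradient to the dense weights in the backward pass.
        w = self.weight + (self.weight * mask - self.weight).detach()
        return nn.functional.linear(x, w, self.bias)

if __name__ == "__main__":
    torch.manual_seed(0)
    layer = STESparseLinear(64, 32, sparsity=0.9)
    out = layer(torch.randn(8, 64))
    out.sum().backward()
    nonzero_grad = (layer.weight.grad != 0).float().mean()
    print(out.shape, f"dense-gradient fraction: {nonzero_grad:.2f}")   # gradients stay dense
```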

Machine learning assist nyc subway navigation safer and faster

  • paper_url: http://arxiv.org/abs/2310.02447
  • repo_url: None
  • paper_authors: Wencheng Bao, Shi Feng
  • for: 主流导航软件(如 Google 和 Apple Maps)通常无法提供以安全为优先的路线,而安全仍是许多人的首要关注点。本研究旨在找到一种在安全与效率之间取得平衡的方法。
  • methods: 构建一个同时考虑最短路径与最安全路线的整数规划模型,并利用机器学习(广义线性模型、线性回归和循环神经网络)推导安全系数;通过不同地铁站之间的均方根误差(RMSE)进行评估,以选出最准确的安全系数估计模型。
  • results: 我们将对不同的最短路算法进行全面评估,根据时间复杂度和实际数据来判断它们在合并安全和时间效率方面的适用程度。
    Abstract Mainstream navigation software, like Google and Apple Maps, often lacks the ability to provide routes prioritizing safety. However, safety remains a paramount concern for many. Our aim is to strike a balance between safety and efficiency. To achieve this, we're devising an Integer Programming model that takes into account both the shortest path and the safest route. We will harness machine learning to derive safety coefficients, employing methodologies such as generalized linear models, linear regression, and recurrent neural networks. Our evaluation will be based on the Root Mean Square Error (RMSE) across various subway stations, helping us identify the most accurate model for safety coefficient estimation. Furthermore, we'll conduct a comprehensive review of different shortest-path algorithms, assessing them based on time complexity and real-world data to determine their appropriateness in merging both safety and time efficiency.
    摘要 主流导航软件(如 Google 和 Apple Maps)通常缺乏提供以安全为优先的路线的能力,而安全仍然是许多人最关心的问题。我们的目标是在安全与效率之间取得平衡。为此,我们正在构建一个同时考虑最短路径与最安全路线的整数规划模型,并利用机器学习推导安全系数,所用方法包括广义线性模型、线性回归和循环神经网络。我们基于不同地铁站之间的均方根误差(RMSE)进行评估,以选出最准确的安全系数估计模型。此外,我们还将根据时间复杂度和真实数据对不同的最短路径算法进行全面评估,以确定它们在兼顾安全与时间效率方面的适用性。
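
A toy version of the safety/efficiency trade-off can be expressed as a shortest-path search whose edge cost mixes travel time and a safety coefficient. The graph, the per-edge risks, and the linear trade-off weight alpha are illustrative; in the paper the coefficients come from trained regression models and the routing is formulated as an integer program.

```python
# Sketch: Dijkstra search where each edge cost blends travel time and safety risk.

import heapq

def safest_fast_path(graph, source, target, alpha=0.5):
    """graph: {node: [(neighbor, travel_time, safety_risk), ...]}.
    Edge cost = alpha * time + (1 - alpha) * risk."""
    dist = {source: 0.0}
    prev = {}
    pq = [(0.0, source)]
    while pq:
        d, node = heapq.heappop(pq)
        if node == target:
            break
        if d > dist.get(node, float("inf")):
            continue
        for nbr, time, risk in graph.get(node, []):
            nd = d + alpha * time + (1 - alpha) * risk
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(pq, (nd, nbr))
    path, node = [], target
    while node != source:
        path.append(node)
        node = prev[node]
    return [source] + path[::-1], dist[target]

if __name__ == "__main__":
    subway = {
        "A": [("B", 4, 0.2), ("C", 2, 0.9)],
        "B": [("D", 5, 0.1)],
        "C": [("D", 2, 0.8)],
    }
    # With a small alpha, safety dominates and the safer A-B-D route is chosen.
    print(safest_fast_path(subway, "A", "D", alpha=0.1))
```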

GenCO: Generating Diverse Solutions to Design Problems with Combinatorial Nature

  • paper_url: http://arxiv.org/abs/2310.02442
  • repo_url: None
  • paper_authors: Aaron Ferber, Arman Zharmagambetov, Taoan Huang, Bistra Dilkina, Yuandong Tian
  • for: This paper aims to address the challenge of generating diverse and high-quality solutions for combinatorial optimization problems, which are common in computer graphics, animation, industrial design, and other fields.
  • methods: The proposed method, called GenCO, integrates deep generative models with embedded combinatorial solvers to generate instances of combinatorial optimization problems. The method differs from conventional generative models in that it focuses on generating combinatorial solutions rather than final objects.
  • results: The authors demonstrate the effectiveness of GenCO on a variety of generative tasks characterized by combinatorial intricacies, including game level generation and map creation for path planning. The results show that GenCO can generate diverse, high-quality solutions that reliably adhere to user-specified combinatorial properties.
    Abstract Generating diverse objects (e.g., images) using generative models (such as GAN or VAE) has achieved impressive results in the recent years, to help solve many design problems that are traditionally done by humans. Going beyond image generation, we aim to find solutions to more general design problems, in which both the diversity of the design and conformity of constraints are important. Such a setting has applications in computer graphics, animation, industrial design, material science, etc, in which we may want the output of the generator to follow discrete/combinatorial constraints and penalize any deviation, which is non-trivial with existing generative models and optimization solvers. To address this, we propose GenCO, a novel framework that conducts end-to-end training of deep generative models integrated with embedded combinatorial solvers, aiming to uncover high-quality solutions aligned with nonlinear objectives. While structurally akin to conventional generative models, GenCO diverges in its role - it focuses on generating instances of combinatorial optimization problems rather than final objects (e.g., images). This shift allows finer control over the generated outputs, enabling assessments of their feasibility and introducing an additional combinatorial loss component. We demonstrate the effectiveness of our approach on a variety of generative tasks characterized by combinatorial intricacies, including game level generation and map creation for path planning, consistently demonstrating its capability to yield diverse, high-quality solutions that reliably adhere to user-specified combinatorial properties.
    摘要 近年来,使用生成模型(如 GAN 或 VAE)生成多样化对象(例如图像)取得了令人瞩目的成果,帮助解决许多传统上由人工完成的设计问题。我们希望超越图像生成,解决更一般的设计问题,其中设计的多样性和约束的满足同样重要。这类场景在计算机图形、动画、工业设计、材料科学等领域都有应用:我们可能希望生成器的输出满足离散/组合约束,并对任何偏离加以惩罚,而这在现有的生成模型和优化求解器中并不容易实现。为此,我们提出 GenCO,一个将深度生成模型与内嵌组合求解器进行端到端联合训练的新框架,旨在找到符合非线性目标的高质量解。GenCO 在结构上与传统生成模型类似,但其角色不同:它生成的是组合优化问题的实例,而不是最终对象(例如图像)。这一转变使我们能够更精细地控制生成输出、评估其可行性,并引入额外的组合损失项。我们在多个具有组合复杂性的生成任务上验证了该方法的有效性,包括游戏关卡生成和路径规划地图生成,结果一致表明它能够生成多样化、高质量且可靠满足用户指定组合性质的解。

EGraFFBench: Evaluation of Equivariant Graph Neural Network Force Fields for Atomistic Simulations

  • paper_url: http://arxiv.org/abs/2310.02428
  • repo_url: None
  • paper_authors: Vaibhav Bihani, Utkarsh Pratiush, Sajid Mannan, Tao Du, Zhimin Chen, Santiago Miret, Matthieu Micoulaut, Morten M Smedskjaer, Sayan Ranu, N M Anoop Krishnan
  • for: This paper aims to evaluate the performance of equivariant graph neural network (EGraFF) force fields for real-world atomistic simulations, and to provide a systematic benchmarking of six EGraFF algorithms.
  • methods: The paper uses eight existing datasets and releases two new benchmark datasets to evaluate the performance of EGraFF models. The authors also propose four new metrics and three new challenging tasks to assess the models’ capabilities and limitations.
  • results: The study finds that no single EGraFF model outperforms others on all datasets and tasks, and that the performance of all models on out-of-distribution datasets is unreliable. The authors highlight the need for developing a foundation model for force fields that can be used in real-world simulations.
    Abstract Equivariant graph neural networks force fields (EGraFFs) have shown great promise in modelling complex interactions in atomic systems by exploiting the graphs' inherent symmetries. Recent works have led to a surge in the development of novel architectures that incorporate equivariance-based inductive biases alongside architectural innovations like graph transformers and message passing to model atomic interactions. However, thorough evaluations of these deploying EGraFFs for the downstream task of real-world atomistic simulations, is lacking. To this end, here we perform a systematic benchmarking of 6 EGraFF algorithms (NequIP, Allegro, BOTNet, MACE, Equiformer, TorchMDNet), with the aim of understanding their capabilities and limitations for realistic atomistic simulations. In addition to our thorough evaluation and analysis on eight existing datasets based on the benchmarking literature, we release two new benchmark datasets, propose four new metrics, and three new challenging tasks. The new datasets and tasks evaluate the performance of EGraFF to out-of-distribution data, in terms of different crystal structures, temperatures, and new molecules. Interestingly, evaluation of the EGraFF models based on dynamic simulations reveals that having a lower error on energy or force does not guarantee stable or reliable simulation or faithful replication of the atomic structures. Moreover, we find that no model clearly outperforms other models on all datasets and tasks. Importantly, we show that the performance of all the models on out-of-distribution datasets is unreliable, pointing to the need for the development of a foundation model for force fields that can be used in real-world simulations. In summary, this work establishes a rigorous framework for evaluating machine learning force fields in the context of atomic simulations and points to open research challenges within this domain.
    摘要 等变图神经网络力场(EGraFF)利用图的内在对称性,在建模原子体系的复杂相互作用方面展现出巨大潜力。近期涌现出大量新架构,将基于等变性的归纳偏置与图 Transformer、消息传递等架构创新相结合来建模原子间相互作用。然而,针对将 EGraFF 用于真实原子模拟这一下游任务的系统评估仍然缺乏。为此,我们对 6 种 EGraFF 算法(NequIP、Allegro、BOTNet、MACE、Equiformer、TorchMDNet)进行了系统的基准测试,以了解它们在真实原子模拟中的能力和局限。除了基于基准文献在 8 个现有数据集上进行全面评估与分析外,我们还发布了 2 个新的基准数据集,提出了 4 个新指标和 3 个新的挑战性任务。新的数据集和任务从不同晶体结构、温度和新分子等方面评估 EGraFF 对分布外数据的表现。有趣的是,基于动力学模拟的评估表明,能量或力的误差较低并不能保证模拟稳定可靠,也不能保证忠实复现原子结构。此外,没有任何一个模型在所有数据集和任务上都明显优于其他模型。重要的是,所有模型在分布外数据集上的表现都不可靠,这表明需要发展可用于真实模拟的力场基础模型。总之,本文为在原子模拟背景下评估机器学习力场建立了一个严格的框架,并指出了该领域有待解决的研究挑战。

Delta-AI: Local objectives for amortized inference in sparse graphical models

  • paper_url: http://arxiv.org/abs/2310.02423
  • repo_url: https://github.com/gfnorg/delta-ai
  • paper_authors: Jean-Pierre Falet, Hae Beom Lee, Nikolay Malkin, Chen Sun, Dragos Secrieru, Dinghuai Zhang, Guillaume Lajoie, Yoshua Bengio
  • for: 本研究旨在为稀疏概率图模型(PGM)提出一种新的摊销推理(amortized inference)算法。
  • methods: 该算法基于 PGM 的稀疏性:将变量的采样视为智能体执行的动作序列,即可在策略学习目标中实现局部归因,从而可以采用生成流网络(GFlowNets)风格的方法进行离策略训练,且无需在每次参数更新时实例化所有随机变量,因此训练速度很快。
  • results: 该算法能够高效地从合成 PGM 中采样,并训练具有稀疏因子结构的隐变量模型,同时支持对部分变量子集的推理。
    Abstract We present a new algorithm for amortized inference in sparse probabilistic graphical models (PGMs), which we call $\Delta$-amortized inference ($\Delta$-AI). Our approach is based on the observation that when the sampling of variables in a PGM is seen as a sequence of actions taken by an agent, sparsity of the PGM enables local credit assignment in the agent's policy learning objective. This yields a local constraint that can be turned into a local loss in the style of generative flow networks (GFlowNets) that enables off-policy training but avoids the need to instantiate all the random variables for each parameter update, thus speeding up training considerably. The $\Delta$-AI objective matches the conditional distribution of a variable given its Markov blanket in a tractable learned sampler, which has the structure of a Bayesian network, with the same conditional distribution under the target PGM. As such, the trained sampler recovers marginals and conditional distributions of interest and enables inference of partial subsets of variables. We illustrate $\Delta$-AI's effectiveness for sampling from synthetic PGMs and training latent variable models with sparse factor structure.
    摘要 我们提出了一种用于稀疏概率图模型(PGM)摊销推理的新算法,称为 $\Delta$-摊销推理($\Delta$-AI)。我们的方法基于这样一个观察:当把 PGM 中变量的采样看作智能体依次执行的动作时,PGM 的稀疏性使得智能体的策略学习目标可以进行局部归因。由此得到的局部约束可以转化为类似生成流网络(GFlowNets)风格的局部损失,从而支持离策略训练,且无需在每次参数更新时实例化所有随机变量,大大加快训练速度。$\Delta$-AI 的目标是让一个结构为贝叶斯网络、易于采样的学习采样器中,变量在给定其马尔可夫毯下的条件分布与目标 PGM 中的相应条件分布相匹配。因此,训练得到的采样器能够恢复感兴趣的边缘分布和条件分布,并支持对部分变量子集进行推理。我们在合成 PGM 的采样以及具有稀疏因子结构的隐变量模型训练上展示了 $\Delta$-AI 的有效性。

Automated Bug Generation in the era of Large Language Models

  • paper_url: http://arxiv.org/abs/2310.02407
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: Ali Reza Ibrahimzada, Yang Chen, Ryan Rong, Reyhaneh Jabbarvand
  • For: 这篇论文旨在提出一种能够生成多种复杂 bug 的方法,以支持软件工程中检测与修复技术的评估。
  • Methods: 该方法使用大语言模型(LLM)对代码进行多处修改以生成复杂的 bug;具体来说,它根据底层模型的注意力来选择修改位置,以确保修改不会使代码表示发生过大的变化。
  • Results: 该方法在来自逾 250 万个变异体的 32 万余个 bug 上进行了广泛评估,并与两种替代方法比较;结果显示,该方法生成的 bug 更难被基于学习的 bug 预测方法检测,也更难被当前最先进的基于学习的程序修复技术修复。
    Abstract Bugs are essential in software engineering; many research studies in the past decades have been proposed to detect, localize, and repair bugs in software systems. Effectiveness evaluation of such techniques requires complex bugs, i.e., those that are hard to detect through testing and hard to repair through debugging. From the classic software engineering point of view, a hard-to-repair bug differs from the correct code in multiple locations, making it hard to localize and repair. Hard-to-detect bugs, on the other hand, manifest themselves under specific test inputs and reachability conditions. These two objectives, i.e., generating hard-to-detect and hard-to-repair bugs, are mostly aligned; a bug generation technique can change multiple statements to be covered only under a specific set of inputs. However, these two objectives are conflicting for learning-based techniques: A bug should have a similar code representation to the correct code in the training data to challenge a bug prediction model to distinguish them. The hard-to-repair bug definition remains the same but with a caveat: the more a bug differs from the original code (at multiple locations), the more distant their representations are and easier to be detected. We propose BugFarm, to transform arbitrary code into multiple complex bugs. BugFarm leverages LLMs to mutate code in multiple locations (hard-to-repair). To ensure that multiple modifications do not notably change the code representation, BugFarm analyzes the attention of the underlying model and instructs LLMs to only change the least attended locations (hard-to-detect). Our comprehensive evaluation of 320k+ bugs from over 2.5M mutants generated by BugFarm and two alternative approaches demonstrates our superiority in generating bugs that are hard to detect by learning-based bug prediction approaches and hard to repair by SOTA learning-based program repair technique.
    摘要 在软件工程中,缺陷(bug)是不可或缺的研究对象;过去几十年里,人们提出了大量用于检测、定位和修复软件系统中缺陷的技术。评估这些技术的有效性需要复杂的缺陷,即难以通过测试发现、又难以通过调试修复的缺陷。从经典软件工程的角度看,难以修复的缺陷与正确代码在多个位置上存在差异,因而难以定位和修复;而难以检测的缺陷只会在特定的测试输入和可达性条件下显现。这两个目标(生成难检测且难修复的缺陷)在多数情况下是一致的:缺陷生成技术可以修改多条语句,使它们只在特定输入集合下被覆盖。然而,对于基于学习的技术而言,这两个目标是冲突的:缺陷的代码表示应与训练数据中的正确代码足够相似,才能挑战缺陷预测模型去区分它们;难修复缺陷的定义不变,但需要注意,缺陷在越多位置上偏离原始代码,其表示与原代码的距离就越大,也就越容易被检测出来。我们提出了 BugFarm,可将任意代码转化为多个复杂缺陷。BugFarm 利用 LLM 在多个位置对代码进行变异(难修复);为确保多处修改不会显著改变代码表示,BugFarm 分析底层模型的注意力,并指示 LLM 只修改注意力最低的位置(难检测)。我们对 BugFarm 与两种替代方法从逾 250 万个变异体中生成的 32 万余个缺陷进行了全面评估,结果表明我们在生成难以被基于学习的缺陷预测方法检测、且难以被最先进的基于学习的程序修复技术修复的缺陷方面更具优势。
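
The location-selection step can be pictured independently of any particular LLM: rank code tokens by how little attention the underlying model pays to them and mutate only those positions. The sketch below is a hypothetical illustration with synthetic attention weights; it is not BugFarm's interface, and the final mutation prompt to the LLM is only described in a comment.

```python
import numpy as np

def least_attended_locations(attention, tokens, k=3):
    """Rank code tokens by how little attention the model pays to them.

    attention: (num_layers, num_heads, seq_len, seq_len) self-attention weights.
    Returns indices of the k least-attended tokens (candidate hard-to-detect mutation sites).
    """
    # Total attention each token *receives*, averaged over layers and heads.
    received = attention.mean(axis=(0, 1)).sum(axis=0)     # shape: (seq_len,)
    order = np.argsort(received)                            # ascending = least attended first
    return [int(i) for i in order[:k]]

# Toy example with made-up attention weights.
tokens = ["def", "area", "(", "w", ",", "h", ")", ":", "return", "w", "*", "h"]
rng = np.random.default_rng(1)
attn = rng.random((4, 8, len(tokens), len(tokens)))
attn /= attn.sum(axis=-1, keepdims=True)                    # normalize rows like softmax output

sites = least_attended_locations(attn, tokens, k=2)
print("mutate tokens:", [tokens[i] for i in sites])
# A prompt to the LLM would then request a buggy variant that only edits these positions
# (hard-to-repair: several small edits; hard-to-detect: edits at low-attention locations).
```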

On the Parallel Complexity of Multilevel Monte Carlo in Stochastic Gradient Descent

  • paper_url: http://arxiv.org/abs/2310.02402
  • repo_url: None
  • paper_authors: Kei Ishikawa
  • for: 这个研究是用于探讨Stochastic Gradient Descent(SGD)中的sequential simulation,特别是Neural Stochastic Differential Equations(NSDE)中的Multilevel Monte Carlo(MLMC)方法。
  • methods: 本研究使用了delayed MLMC gradient estimator,它可以将MLMC的大量且平行的复杂性压缩到naive Monte Carlo方法的水平以下,从而提高SGD的并发性。
  • results: 在numerical experiments中,我们使用了deep hedging来证明了我们的方法可以在SGD中提高并发性,并且与标准MLMC方法相比,其parallel complexity可以得到更好的性能。
    Abstract In the stochastic gradient descent (SGD) for sequential simulations such as the neural stochastic differential equations, the Multilevel Monte Carlo (MLMC) method is known to offer better theoretical computational complexity compared to the naive Monte Carlo approach. However, in practice, MLMC scales poorly on massively parallel computing platforms such as modern GPUs, because of its large parallel complexity which is equivalent to that of the naive Monte Carlo method. To cope with this issue, we propose the delayed MLMC gradient estimator that drastically reduces the parallel complexity of MLMC by recycling previously computed gradient components from earlier steps of SGD. The proposed estimator provably reduces the average parallel complexity per iteration at the cost of a slightly worse per-iteration convergence rate. In our numerical experiments, we use an example of deep hedging to demonstrate the superior parallel complexity of our method compared to the standard MLMC in SGD.
    摘要 在随机梯度下降(SGD)用于神经随机微分方程等连续模拟时,多层蒙特卡洛(MLMC)方法在理论上比朴素蒙特卡洛方法具有更好的计算复杂度。然而在实践中,MLMC 在现代 GPU 等大规模并行计算平台上扩展性较差,因为其并行复杂度与朴素蒙特卡洛方法相当。为了解决这个问题,我们提出了延迟 MLMC 梯度估计器,通过重用 SGD 早期步骤中已经计算过的梯度分量,大幅降低 MLMC 的并行复杂度。可以证明,该估计器降低了每次迭代的平均并行复杂度,代价是每次迭代的收敛速度略有下降。在数值实验中,我们以深度对冲(deep hedging)为例,展示了我们的方法相比标准 MLMC 在 SGD 中具有更优的并行复杂度。

Reducing Intraspecies and Interspecies Covariate Shift in Traumatic Brain Injury EEG of Humans and Mice Using Transfer Euclidean Alignment

  • paper_url: http://arxiv.org/abs/2310.02398
  • repo_url: None
  • paper_authors: Manoj Vishwanath, Steven Cao, Nikil Dutt, Amir M. Rahmani, Miranda M. Lim, Hung Cao
  • for: 这篇论文旨在提出一种转移学习技术来解决生物医学数据缺乏问题,以提高机器学习和深度学习模型在不同数据集上的表现。
  • methods: 本论文使用了转移学习技术,试用了不同的机器学习模型,包括rule-based机器学习模型和EEGNet-based深度学习模型,并在不同的数据集上进行了评估。
  • results: 研究发现,转移学习技术可以提高机器学习和深度学习模型在不同数据集上的表现,特别是在内species数据集上的表现有14.42%的提升,在interspecies数据集上的表现有5.53%的提升。
    Abstract While analytics of sleep electroencephalography (EEG) holds certain advantages over other methods in clinical applications, high variability across subjects poses a significant challenge when it comes to deploying machine learning models for classification tasks in the real world. In such instances, machine learning models that exhibit exceptional performance on a specific dataset may not necessarily demonstrate similar proficiency when applied to a distinct dataset for the same task. The scarcity of high-quality biomedical data further compounds this challenge, making it difficult to evaluate the model's generality comprehensively. In this paper, we introduce Transfer Euclidean Alignment - a transfer learning technique to tackle the problem of the dearth of human biomedical data for training deep learning models. We tested the robustness of this transfer learning technique on various rule-based classical machine learning models as well as the EEGNet-based deep learning model by evaluating on different datasets, including human and mouse data in a binary classification task of detecting individuals with versus without traumatic brain injury (TBI). By demonstrating notable improvements with an average increase of 14.42% for intraspecies datasets and 5.53% for interspecies datasets, our findings underscore the importance of the use of transfer learning to improve the performance of machine learning and deep learning models when using diverse datasets for training.
    摘要 尽管睡眠脑电图(EEG)分析在临床应用中相对其他方法具有一定优势,但受试者之间的高度变异给在真实世界中部署用于分类任务的机器学习模型带来了巨大挑战。在这种情形下,在某一特定数据集上表现出色的机器学习模型,换到同一任务的另一数据集上未必能保持同样的水平。高质量生物医学数据的稀缺进一步加剧了这一难题,使得难以全面评估模型的泛化能力。本文提出迁移欧几里得对齐(Transfer Euclidean Alignment)——一种用于缓解人类生物医学数据不足、难以训练深度学习模型问题的迁移学习技术。我们在多种基于规则的经典机器学习模型以及基于 EEGNet 的深度学习模型上检验了该技术的稳健性,评估任务为在人类与小鼠数据上二分类判别有无创伤性脑损伤(TBI)。结果显示,种内(intraspecies)数据集平均提升 14.42%,跨物种(interspecies)数据集平均提升 5.53%,这凸显了在使用多样化数据集训练时,迁移学习对提升机器学习与深度学习模型性能的重要性。
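
Euclidean Alignment itself is a short whitening step: each subject's trials are re-referenced by the inverse square root of that subject's mean spatial covariance, pulling subjects (or species) toward a common distribution before a classifier such as EEGNet is trained. A minimal sketch assuming trials of shape (n_trials, channels, samples); the transfer-specific weighting used in the paper is not reproduced here.

```python
import numpy as np

def euclidean_alignment(trials):
    """Align one subject's EEG trials (n_trials, n_channels, n_samples) so that the
    average spatial covariance of the aligned trials is the identity matrix."""
    covs = np.stack([x @ x.T / x.shape[1] for x in trials])   # per-trial spatial covariance
    r_bar = covs.mean(axis=0)                                  # subject reference matrix
    # Inverse matrix square root via eigendecomposition (r_bar is symmetric PSD).
    vals, vecs = np.linalg.eigh(r_bar)
    inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(np.clip(vals, 1e-12, None))) @ vecs.T
    return np.einsum("cd,ndt->nct", inv_sqrt, trials)

# Example with synthetic data: one "subject" with idiosyncratic per-channel scaling.
rng = np.random.default_rng(0)
subject_a = rng.normal(size=(20, 8, 256)) * rng.uniform(0.5, 3.0, size=(1, 8, 1))
aligned_a = euclidean_alignment(subject_a)
print(np.round(np.mean([x @ x.T / x.shape[1] for x in aligned_a], axis=0), 2))  # ≈ identity
```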

Implicit regularization of multi-task learning and finetuning in overparameterized neural networks

  • paper_url: http://arxiv.org/abs/2310.02396
  • repo_url: None
  • paper_authors: Jack W. Lindsey, Samuel Lippl
  • for: 这个论文研究了 auxiliary task 学习的 inductive biases,包括同时学习 (multi-task learning, MTL) 和预训练后 fine-tuning (PT+FT)。
  • methods: 作者使用了二层对角线Linear Network,并使用梯度下降来训练。
  • results: 研究发现,在训练 auxiliary task 时,网络会受到各种各样的 inductive biases,包括强制共享任务之间的特征和特征精炼。这些特征会导致网络在继续训练时运行在一种混合的 “核心”(或 “懒散”)模式和 “特征学习” (“rich”) 模式之间。此外,PT+FT 还可能导致一种新的 “嵌入特征学习” 行为,这会帮助网络提取一个稀缺的特征集。在 ReLU 网络中,作者复制了这些Qualitative behaviors。此外,作者还发现,PT+FT 会学习一些与auxiliary task 相关的 yet distinct from 的特征,而 MTL 则倾向于使用同一个特征来解决两个任务。因此,在实际情况下,MTL 在数据 scarcity 情况下更好地通用,而 PT+FT 在有更多数据可用时表现更好。作者还证明了这些结论在图像分类任务上是正确的。
    Abstract It is common in deep learning to train networks on auxiliary tasks with the expectation that the learning will transfer, at least partially, to another task of interest. In this work, we investigate the inductive biases that result from learning auxiliary tasks, either simultaneously (multi-task learning, MTL) or sequentially (pretraining and subsequent finetuning, PT+FT). In the simplified setting of two-layer diagonal linear networks trained with gradient descent, we identify implicit regularization penalties associated with MTL and PT+FT, both of which incentivize feature sharing between tasks and sparsity in learned task-specific features. Notably, our results imply that during finetuning, networks operate in a hybrid of the kernel (or "lazy") regime and the feature learning ("rich") regime identified in prior work. Moreover, PT+FT can exhibit a novel "nested feature learning" behavior not captured by either regime, which biases it to extract a sparse subset of the features learned during pretraining. In ReLU networks, we reproduce all of these qualitative behaviors. We also observe that PT+FT (but not MTL) is biased to learn features that are correlated with (but distinct from) those needed for the auxiliary task, while MTL is biased toward using identical features for both tasks. As a result, we find that in realistic settings, MTL generalizes better when comparatively little data is available for the task of interest, while PT+FT outperforms it with more data available. We show that our findings hold qualitatively for a deep architecture trained on image classification tasks. Our characterization of the nested feature learning regime also motivates a modification to PT+FT that we find empirically improves performance. Overall, our results shed light on the impact of auxiliary task learning and suggest ways to leverage it more effectively.
    摘要 通常在深度学习中,我们会使用 auxiliary task 来培养网络,期望学习将传播到另一个任务中。在这项工作中,我们研究了协助任务学习中的偏见,包括同时学习多个任务(多任务学习,MTL)和先后学习和精度调整(预训练和后续精度调整,PT+FT)。在简化的两层对角线网络中,我们发现了同时学习和预训练+精度调整都会带来隐式的规范化罚款,这些罚款激励feature sharing между任务和精度调整。另外,我们发现在继续训练时,网络会处于“囊括”(或“懒散”)模式和“特征学习”(或“丰富”)模式之间,并且PT+FT可能会展现出一种新的“嵌套特征学习”行为,这种行为会吸引网络学习一 subset of the features during pretraining。在ReLU网络中,我们复制了所有这些qualitative行为。此外,我们发现PT+FT(而不是MTL)会学习与auxiliary task相关的 yet distinct from 的特征,而MTL则倾向于使用同一个特征来进行两个任务。因此,我们发现在实际情况下,MTL在数据不足时会更好地泛化,而PT+FT在有更多数据时会表现更好。我们发现这些结论在图像分类任务上也是如此。我们的结论还适用于深度架构,并且我们的“嵌套特征学习”模式的描述也鼓励了一种PT+FT的修改,我们在实验中发现这种修改可以提高性能。总之,我们的结论 shed light on the impact of auxiliary task learning和提供了更好地利用它的方法。

Secure and Effective Data Appraisal for Machine Learning

  • paper_url: http://arxiv.org/abs/2310.02373
  • repo_url: None
  • paper_authors: Xu Ouyang, Changhong Yang, Felix Xiaozhu Lin, Yangfeng Ji
  • for: 该论文目的是提出一种能够在数据所有者和模型所有者之间实现隐私保护的数据选择方法,以便在数据和模型之间进行交易。
  • methods: 该论文使用多方计算(MPC)技术来评估目标模型,并提出了一种新的方法,可以在评估过程中实现数据选择。
  • results: 该论文的实验结果表明,相比直接使用MPC评估目标模型,该新方法可以减少评估时间从千小时减少到只有几十小时,并且准确率下降的程度非常小(仅0.20%)。
    Abstract Essential for an unfettered data market is the ability to discreetly select and evaluate training data before finalizing a transaction between the data owner and model owner. To safeguard the privacy of both data and model, this process involves scrutinizing the target model through Multi-Party Computation (MPC). While prior research has posited that the MPC-based evaluation of Transformer models is excessively resource-intensive, this paper introduces an innovative approach that renders data selection practical. The contributions of this study encompass three pivotal elements: (1) a groundbreaking pipeline for confidential data selection using MPC, (2) replicating intricate high-dimensional operations with simplified low-dimensional MLPs trained on a limited subset of pertinent data, and (3) implementing MPC in a concurrent, multi-phase manner. The proposed method is assessed across an array of Transformer models and NLP/CV benchmarks. In comparison to the direct MPC-based evaluation of the target model, our approach substantially reduces the time required, from thousands of hours to mere tens of hours, with only a nominal 0.20% dip in accuracy when training with the selected data.
    摘要 要建立一个不受约束的数据市场,必须能够在数据所有者与模型所有者完成交易之前,保密地选择并评估训练数据。为了同时保护数据和模型的隐私,这一过程需要通过多方计算(MPC)来审查目标模型。尽管先前的研究认为基于 MPC 评估 Transformer 模型的开销过大,本文提出了一种使数据选择切实可行的新方法。本研究的贡献包括三个关键要素:(1)一条基于 MPC 的保密数据选择流水线;(2)用在少量相关数据上训练的低维 MLP 来替代复杂的高维运算;(3)以并发、多阶段的方式实现 MPC。我们在多种 Transformer 模型和 NLP/CV 基准上评估了所提方法。与直接用 MPC 评估目标模型相比,我们的方法将所需时间从数千小时降低到几十小时,而使用所选数据训练时的准确率仅下降 0.20%。

Reinforcement Learning from Automatic Feedback for High-Quality Unit Test Generation

  • paper_url: http://arxiv.org/abs/2310.02368
  • repo_url: None
  • paper_authors: Benjamin Steenhoek, Michele Tufano, Neel Sundaresan, Alexey Svyatkovskiy
  • For: The paper aims to improve the quality of test cases generated by Large Language Models (LLMs) using Reinforcement Learning (RL) and static quality metrics.
  • Methods: The authors propose a novel technique called Reinforcement Learning from Static Quality Metrics (RLSQM), which involves analyzing anti-patterns generated by LLMs, training specific reward models for each static quality metric, and using Proximal Policy Optimization (PPO) to train models for optimizing a single quality metric at a time.
  • Results: RL-optimized models consistently generate high-quality test cases compared to the base LLM, improving the model by up to 21% and generating nearly 100% syntactically correct code; RLSQM also outperformed GPT-4 on four out of seven metrics.
    Abstract Software testing is a crucial aspect of software development, and the creation of high-quality tests that adhere to best practices is essential for effective maintenance. Recently, Large Language Models (LLMs) have gained popularity for code generation, including the automated creation of test cases. However, these LLMs are often trained on vast amounts of publicly available code, which may include test cases that do not adhere to best practices and may even contain test smells (anti-patterns). To address this issue, we propose a novel technique called Reinforcement Learning from Static Quality Metrics (RLSQM). To begin, we analyze the anti-patterns generated by the LLM and show that LLMs can generate undesirable test smells. Thus, we train specific reward models for each static quality metric, then utilize Proximal Policy Optimization (PPO) to train models for optimizing a single quality metric at a time. Furthermore, we amalgamate these rewards into a unified reward model aimed at capturing different best practices and quality aspects of tests. By comparing RL-trained models with those trained using supervised learning, we provide insights into how reliably utilize RL to improve test generation quality and into the effects of various training strategies. Our experimental results demonstrate that the RL-optimized model consistently generated high-quality test cases compared to the base LLM, improving the model by up to 21%, and successfully generates nearly 100% syntactically correct code. RLSQM also outperformed GPT-4 on four out of seven metrics. This represents a significant step towards enhancing the overall efficiency and reliability of software testing through Reinforcement Learning and static quality metrics. Our data are available at this link: https://figshare.com/s/ded476c8d4c221222849.
    摘要 软件测试是软件开发过程中不可或缺的一部分,创建符合最佳实践的测试用例是软件维护的关键。最近,大型自然语言模型(LLM)在代码生成方面得到了广泛的应用,其中包括自动生成测试用例。然而,这些LLM通常是通过庞大量公共可用的代码进行训练,这些代码可能包含不符合最佳实践的测试用例和测试臭味(anti-patterns)。为解决这个问题,我们提出了一种新的技术 called Reinforcement Learning from Static Quality Metrics(RLSQM)。我们首先分析了LLM生成的测试臭味,并证明LLM可以生成不良测试用例。然后,我们为每种静态质量指标提供特定的奖励模型,然后使用Proximal Policy Optimization(PPO)训练模型,以便在不同的质量指标上优化单个测试用例。此外,我们将这些奖励汇集到一个统一的奖励模型,以捕捉不同的最佳实践和质量方面。通过比较RL训练的模型与超出学习训练的模型,我们提供了如何可靠地利用RL提高测试生成质量的信息,以及不同训练策略的效果。我们的实验结果表明,RL优化后的模型可以与基础LLM相比提高至21%,并成功生成99%的符号正确的代码。RLSQM还在四个 из七个指标上超越GPT-4。这表明RLSQM在软件测试中的应用可以提高总效率和可靠性。我们的数据可以在以下链接中找到:https://figshare.com/s/ded476c8d4c221222849。
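
The reward-shaping step can be sketched as scoring a generated test against several static quality checks and combining the per-metric rewards into the single scalar that PPO optimizes. The checks and weights below are invented stand-ins; the paper trains a learned reward model per metric before aggregating.

```python
import re

# Hypothetical static checks for common test smells / best practices (stand-ins for learned reward models).
STATIC_METRICS = {
    "has_assertion":       lambda t: 1.0 if re.search(r"\bassert\w*\(", t) else 0.0,
    "not_trivially_short": lambda t: 1.0 if len(t.strip().splitlines()) > 3 else 0.0,
    "no_print_statement":  lambda t: 0.0 if "System.out.println" in t else 1.0,
    "descriptive_name":    lambda t: 1.0 if re.search(r"void\s+test[A-Z]\w{3,}", t) else 0.0,
}

def unified_reward(test_code, weights=None):
    """Weighted combination of per-metric rewards, used as the scalar reward for PPO."""
    if weights is None:
        weights = {name: 1.0 / len(STATIC_METRICS) for name in STATIC_METRICS}
    return sum(w * STATIC_METRICS[name](test_code) for name, w in weights.items())

example = """\
@Test
void testAreaOfRectangle() {
    Rectangle r = new Rectangle(2, 3);
    assertEquals(6, r.area());
}"""
print(round(unified_reward(example), 2))   # 1.0 for this (deliberately well-formed) test
```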

Stochastic force inference via density estimation

  • paper_url: http://arxiv.org/abs/2310.02366
  • repo_url: None
  • paper_authors: Victor Chardès, Suryanarayana Maddu, Michael J. Shelley
  • for: 本研究旨在推断低分辨率时间数据中的动力模型,尤其在蛋白质物理中,分离分子程序与噪声仍然是一个重要的开放问题。
  • methods: 我们提出了一种方法,它基于下游分布的概率流来推断一个自主、非线性的力场,将分子程序与内在噪声分离开。我们使用了Score-matching来 отлиčiсть力场与内在噪声。
  • results: 我们通过了一些生物物理的实际例子,显示了我们的方法可以从非站ARY数据中提取非保守的力场,在平衡状态数据上学习平衡动力学,并且可以处理 additive 和 multiplicative 噪声模型。
    Abstract Inferring dynamical models from low-resolution temporal data continues to be a significant challenge in biophysics, especially within transcriptomics, where separating molecular programs from noise remains an important open problem. We explore a common scenario in which we have access to an adequate amount of cross-sectional samples at a few time-points, and assume that our samples are generated from a latent diffusion process. We propose an approach that relies on the probability flow associated with an underlying diffusion process to infer an autonomous, nonlinear force field interpolating between the distributions. Given a prior on the noise model, we employ score-matching to differentiate the force field from the intrinsic noise. Using relevant biophysical examples, we demonstrate that our approach can extract non-conservative forces from non-stationary data, that it learns equilibrium dynamics when applied to steady-state data, and that it can do so with both additive and multiplicative noise models.
    摘要 从低分辨率时间数据中推断动力学模型一直是生物物理学中的一大挑战,在转录组学中尤为突出:如何将分子程序与噪声分离仍是一个重要的开放问题。我们考虑一种常见情形:在少数几个时间点上拥有数量充足的横截面样本,并假设这些样本由一个潜在的扩散过程生成。我们提出一种方法,利用底层扩散过程所对应的概率流,推断一个在各分布之间插值的自治非线性力场。在给定噪声模型先验的情况下,我们使用得分匹配(score matching)将力场与内在噪声区分开来。通过相关的生物物理示例,我们展示了该方法能够从非平稳数据中提取非保守力,在应用于稳态数据时学习到平衡动力学,并且对加性与乘性噪声模型均适用。

Investigating Speed Deviation Patterns During Glucose Episodes: A Quantile Regression Approach

  • paper_url: http://arxiv.org/abs/2310.02351
  • repo_url: None
  • paper_authors: Aparna Joshi, Jennifer Merickel, Cyrus V. Desouza, Matthew Rizzo, Pujitha Gunaratne, Anuj Sharma
  • for: 这个研究旨在探讨 диабе特 sufferers 在驾驶过程中的行为差异,以了解diabetes 对驾驶能力的影响。
  • methods: 该研究使用分布式分析方法,捕捉驾驶者的速度差异模式,以进一步了解diabetes 的影响。
  • results: 研究发现,diabetes 患者在血糖控制不良时,有较高的速度差异和驾驶能力下降风险。
    Abstract Given the growing prevalence of diabetes, there has been significant interest in determining how diabetes affects instrumental daily functions, like driving. Complication of glucose control in diabetes includes hypoglycemic and hyperglycemic episodes, which may impair cognitive and psychomotor functions needed for safe driving. The goal of this paper was to determine patterns of diabetes speed behavior during acute glucose to drivers with diabetes who were euglycemic or control drivers without diabetes in a naturalistic driving environment. By employing distribution-based analytic methods which capture distribution patterns, our study advances prior literature that has focused on conventional approach of average speed to explore speed deviation patterns.
    摘要 随着糖尿病患病率的增长,人们越来越关注糖尿病如何影响驾驶等工具性日常功能。糖尿病血糖控制的并发情况包括低血糖和高血糖发作,这些发作可能损害安全驾驶所需的认知与精神运动功能。本文的目的是在自然驾驶环境中,比较处于急性血糖发作期的糖尿病驾驶者与血糖正常的糖尿病驾驶者及无糖尿病的对照驾驶者之间的速度行为模式。通过采用能够刻画分布形态的基于分布的分析方法,本研究超越了以往仅关注平均速度的传统做法,进一步探讨了速度偏差模式。
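
Quantile regression fits conditional quantiles rather than the conditional mean by minimizing the pinball (tilted absolute) loss, which is what lets the analysis compare whole distributions of speed deviation across driver groups. A self-contained sketch on synthetic data; variable names and the data-generating process are illustrative only.

```python
import numpy as np

def fit_quantile_regression(X, y, tau, lr=0.05, epochs=4000):
    """Linear quantile regression for quantile level `tau` via (sub)gradient descent
    on the pinball loss  L(r) = max(tau*r, (tau-1)*r),  r = y - Xb @ beta."""
    Xb = np.column_stack([np.ones(len(X)), X])        # add intercept
    beta = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        r = y - Xb @ beta
        # dL/dbeta = Xb.T @ g / n, where g = -tau on positive residuals and (1 - tau) on negative ones.
        g = np.where(r > 0, -tau, 1 - tau)
        beta -= lr * (Xb.T @ g) / len(y)
    return beta

# Synthetic "speed deviation vs glucose episode" data with heteroscedastic noise.
rng = np.random.default_rng(0)
glucose_flag = rng.integers(0, 2, size=500)            # 1 = acute glucose episode
speed_dev = 1.0 + 2.0 * glucose_flag + rng.normal(scale=1 + 2 * glucose_flag)

for tau in (0.1, 0.5, 0.9):
    b0, b1 = fit_quantile_regression(glucose_flag, speed_dev, tau)
    print(f"tau={tau}: intercept={b0:.2f}, episode effect={b1:.2f}")
```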

Learning unitaries with quantum statistical queries

  • paper_url: http://arxiv.org/abs/2310.02254
  • repo_url: None
  • paper_authors: Armando Angrisani
  • for: 这篇论文主要研究学习幺正算符(unitary operator)的问题。
  • methods: 论文使用量子统计查询(QSQ)来学习幺正算符。这种查询方式只接收测量期望值的带噪估计作为输入,而不需要直接访问幺正算符及其逆。
  • results: 论文提出了若干利用量子统计查询学习幺正算符的算法,包括量子 Goldreich-Levin 算法,并证明 $\mathcal{O}(\log n)$-juntas、总影响力为常数的量子布尔函数以及常数深度电路都可以在该模型下以样本高效的方式被学习。
    Abstract We propose several algorithms for learning unitary operators from quantum statistical queries (QSQs) with respect to their Choi-Jamiolkowski state. Quantum statistical queries capture the capabilities of a learner with limited quantum resources, which receives as input only noisy estimates of expected values of measurements. Our methods hinge on a novel technique for estimating the Fourier mass of a unitary on a subset of Pauli strings with a single quantum statistical query, generalizing a previous result for uniform quantum examples. Exploiting this insight, we show that the quantum Goldreich-Levin algorithm can be implemented with quantum statistical queries, whereas the prior version of the algorithm involves oracle access to the unitary and its inverse. Moreover, we prove that $\mathcal{O}(\log n)$-juntas and quantum Boolean functions with constant total influence are efficiently learnable in our model, and constant-depth circuits are learnable sample-efficiently with quantum statistical queries. On the other hand, all previous algorithms for these tasks require direct access to the Choi-Jamiolkowski state or oracle access to the unitary. In addition, our upper bounds imply that the actions of those classes of unitaries on locally scrambled ensembles can be efficiently learned. We also demonstrate that, despite these positive results, quantum statistical queries lead to an exponentially larger sample complexity for certain tasks, compared to separable measurements to the Choi-Jamiolkowski state. In particular, we show an exponential lower bound for learning a class of phase-oracle unitaries and a double exponential lower bound for testing the unitarity of channels, adapting to our setting previous arguments for quantum states. Finally, we propose a new definition of average-case surrogate models, showing a potential application of our results to hybrid quantum machine learning.
    摘要 我们提出了若干利用量子统计查询(QSQ)针对其 Choi-Jamiolkowski 态来学习幺正算符的算法。量子统计查询刻画了量子资源受限的学习者的能力:学习者只能获得测量期望值的带噪估计作为输入。我们方法的核心是一种新技术,只需一次量子统计查询即可估计幺正算符在一部分 Pauli 串上的傅里叶质量,从而推广了此前针对均匀量子样本的结果。利用这一观察,我们证明量子 Goldreich-Levin 算法可以仅通过量子统计查询实现,而此前版本的算法需要对幺正算符及其逆进行 oracle 访问。此外,我们证明 $\mathcal{O}(\log n)$-juntas 与总影响力为常数的量子布尔函数在该模型下可以被高效学习,常数深度电路也可以通过量子统计查询以样本高效的方式学习;而所有先前针对这些任务的算法都需要直接访问 Choi-Jamiolkowski 态或对幺正算符进行 oracle 访问。我们的上界还意味着这些幺正算符类在局部混洗系综上的作用可以被高效学习。同时我们也证明,尽管有上述积极结果,对于某些任务,量子统计查询相比对 Choi-Jamiolkowski 态的可分测量会导致样本复杂度呈指数级增大:我们给出了学习一类相位 oracle 幺正算符的指数下界,以及检验信道幺正性的双指数下界。最后,我们提出了平均情形替代模型的新定义,展示了我们的结果在混合量子机器学习中的潜在应用。

Why do autoencoders work?

  • paper_url: http://arxiv.org/abs/2310.02250
  • repo_url: None
  • paper_authors: Matthew D. Kvalheim, Eduardo D. Sontag
  • for: 这个论文的目的是解释深度神经网络自动编码器在计算机上的应用,它可以识别数据中的内在维度,并将数据Projected onto a lower-dimensional space。
  • methods: 这个论文使用了深度神经网络自动编码器,包括编码层和解码层,以实现数据的压缩和重建。在编码层中,数据从高维空间 proyected onto a lower-dimensional space,而在解码层中,数据从 lower-dimensional space 重建到高维空间。
  • results: 这个论文证明了深度神经网络自动编码器在实际应用中的效果是非常好,但是存在某些概念上的概念上的限制,这些限制可以通过 differential geometry 来解释。
    Abstract Deep neural network autoencoders are routinely used computationally for model reduction. They allow recognizing the intrinsic dimension of data that lie in a $k$-dimensional subset $K$ of an input Euclidean space $\mathbb{R}^n$. The underlying idea is to obtain both an encoding layer that maps $\mathbb{R}^n$ into $\mathbb{R}^k$ (called the bottleneck layer or the space of latent variables) and a decoding layer that maps $\mathbb{R}^k$ back into $\mathbb{R}^n$, in such a way that the input data from the set $K$ is recovered when composing the two maps. This is achieved by adjusting parameters (weights) in the network to minimize the discrepancy between the input and the reconstructed output. Since neural networks (with continuous activation functions) compute continuous maps, the existence of a network that achieves perfect reconstruction would imply that $K$ is homeomorphic to a $k$-dimensional subset of $\mathbb{R}^k$, so clearly there are topological obstructions to finding such a network. On the other hand, in practice the technique is found to "work" well, which leads one to ask if there is a way to explain this effectiveness. We show that, up to small errors, indeed the method is guaranteed to work. This is done by appealing to certain facts from differential geometry. A computational example is also included to illustrate the ideas.
    摘要 深度神经网络自动编码器在计算上广泛应用于模型减少。它们可以识别数据中的内在维度,即在输入欧几里得空间 $\mathbb{R}^n$ 中的 $k$-维子空间 $K$ 中。基本的想法是通过设置网络参数(权重),使得网络可以将 $\mathbb{R}^n$ 映射到 $\mathbb{R}^k$,并将 $\mathbb{R}^k$ 映射回 $\mathbb{R}^n$,以达到原始输入数据的恢复。这是通过调整参数来实现,以最小化输入和重建输出之间的差异。由于神经网络( WITH 连续活化函数)计算连续Map,因此存在一个网络可以实现完美重建,那么 $K$ 必然是 $\mathbb{R}^k$ 中一个 $k$-维子空间的同构,这会导致找到这种网络的困难。然而,在实践中,这种技术实际上是有效的,这使得人们开始思考是否有一种解释这种效果的方法。我们表明,在小误差下,实际上这种方法是可靠的,这是通过 differential geometry 中的一些事实来证明的。此外,我们还提供了一个计算示例,以 Illustrate 这些想法。
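
For readers who want the object under analysis in code: a plain fully connected autoencoder with a k-dimensional bottleneck, trained by minimizing reconstruction error on data that lie on a k-dimensional subset of R^n. This is a generic sketch, not the paper's construction.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n, k = 10, 2                                   # ambient dimension and bottleneck dimension

# Data lying on a k-dimensional subset K of R^n: random 2-D latents pushed through a fixed smooth map.
latent = torch.rand(2048, k) * 2 - 1
push = nn.Sequential(nn.Linear(k, n), nn.Tanh(), nn.Linear(n, n))
with torch.no_grad():
    data = push(latent)

encoder = nn.Sequential(nn.Linear(n, 32), nn.Tanh(), nn.Linear(32, k))   # bottleneck / latent layer
decoder = nn.Sequential(nn.Linear(k, 32), nn.Tanh(), nn.Linear(32, n))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

for epoch in range(500):
    recon = decoder(encoder(data))
    loss = nn.functional.mse_loss(recon, data)         # adjust weights to minimize reconstruction error
    opt.zero_grad()
    loss.backward()
    opt.step()

print("final reconstruction MSE:", float(loss))
```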

Learning quantum Hamiltonians at any temperature in polynomial time

  • paper_url: http://arxiv.org/abs/2310.02243
  • repo_url: None
  • paper_authors: Ainesh Bakshi, Allen Liu, Ankur Moitra, Ewin Tang
  • for: 学习一个本地量子哈密顿 $H$,给出了多个复本的幂函数状态 $\rho = e^{-\beta H}/\text{tr}(e^{-\beta H})$,其中 $\beta > 0$ 是知道的倒数。
  • methods: 我们使用了一种新的平方函数approximation,将幂函数状态转化为多元scalar polynomials和嵌套 commutators,然后将哈密顿学习转化为一个多项系统。我们最后解释,对于这个多项系统,解一个低度 sum-of-squares relaxation 即可准确地学习哈密顿。
  • results: 我们完全解决了这个问题,提供了一个 polynomial time 算法,可以准确地学习 $H$ 到 precision $\epsilon$,只需要 polynomially many copies of the Gibbs state,不 matter what $\beta > 0$ 是。这意味着在任何常数 $\beta$ 下,我们可以实现 computationally efficient 的哈密顿学习算法。
    Abstract We study the problem of learning a local quantum Hamiltonian $H$ given copies of its Gibbs state $\rho = e^{-\beta H}/\textrm{tr}(e^{-\beta H})$ at a known inverse temperature $\beta>0$. Anshu, Arunachalam, Kuwahara, and Soleimanifar (arXiv:2004.07266) gave an algorithm to learn a Hamiltonian on $n$ qubits to precision $\epsilon$ with only polynomially many copies of the Gibbs state, but which takes exponential time. Obtaining a computationally efficient algorithm has been a major open problem [Alhambra'22 (arXiv:2204.08349)], [Anshu, Arunachalam'22 (arXiv:2204.08349)], with prior work only resolving this in the limited cases of high temperature [Haah, Kothari, Tang'21 (arXiv:2108.04842)] or commuting terms [Anshu, Arunachalam, Kuwahara, Soleimanifar'21]. We fully resolve this problem, giving a polynomial time algorithm for learning $H$ to precision $\epsilon$ from polynomially many copies of the Gibbs state at any constant $\beta > 0$. Our main technical contribution is a new flat polynomial approximation to the exponential function, and a translation between multi-variate scalar polynomials and nested commutators. This enables us to formulate Hamiltonian learning as a polynomial system. We then show that solving a low-degree sum-of-squares relaxation of this polynomial system suffices to accurately learn the Hamiltonian.
    摘要 我们研究了学习本地量子哈密顿($H$)的问题,将其复复制($\rho = e^{-\beta H}/\text{tr}(e^{-\beta H})$)的副本复本给定的値$\beta > 0$。安修、阿伦那查姆、桑迪哈拉、索利曼费(arXiv:2004.07266)提供了一个算法,可以从多个副本中学习$H$到精度$\epsilon$,但是需要多过多过多的时间。取得 computationally efficient algorithm 是一个主要的开问题,直到[阿伦那查姆'22(arXiv:2204.08349)],[安修、阿伦那查姆'22(arXiv:2204.08349)],仅在高温度情况下解决了这个问题。我们将这个问题完全解决,提供一个 polynomial time 算法,可以从多个副本中学习$H$到精度$\epsilon$,且适用于任何常数$\beta > 0$。我们的主要技术贡献是一个新的平方多项式减少函数,以及将多项式函数与嵌套幂数转换为一个问题。这使我们能够将哈密顿学习推广为一个多项式系统。我们还证明,解决一个低度缩数幂数关推运动的问题,可以精确地学习哈密顿。

Generalized Schrödinger Bridge Matching

  • paper_url: http://arxiv.org/abs/2310.02233
  • repo_url: None
  • paper_authors: Guan-Horng Liu, Yaron Lipman, Maximilian Nickel, Brian Karrer, Evangelos A. Theodorou, Ricky T. Q. Chen
  • for: 本文旨在提出一种新的分布匹配算法,用于直接在 diffusion 或 flow 模型中训练分布。
  • methods: 本文使用 Generalized Schrödinger Bridge (GSB) 问题设置,并提出 Generalized Schrödinger Bridge Matching (GSBM) 算法,这种算法可以扩展到考虑任务特定的状态成本。
  • results: 作者在多个实验设置中证明了 GSBM 算法的可靠性和可扩展性,并且在许多情况下显示了改进的扩展性和稳定性。
    Abstract Modern distribution matching algorithms for training diffusion or flow models directly prescribe the time evolution of the marginal distributions between two boundary distributions. In this work, we consider a generalized distribution matching setup, where these marginals are only implicitly described as a solution to some task-specific objective function. The problem setup, known as the Generalized Schr\"odinger Bridge (GSB), appears prevalently in many scientific areas both within and without machine learning. We propose Generalized Schr\"odinger Bridge Matching (GSBM), a new matching algorithm inspired by recent advances, generalizing them beyond kinetic energy minimization and to account for task-specific state costs. We show that such a generalization can be cast as solving conditional stochastic optimal control, for which efficient variational approximations can be used, and further debiased with the aid of path integral theory. Compared to prior methods for solving GSB problems, our GSBM algorithm always preserves a feasible transport map between the boundary distributions throughout training, thereby enabling stable convergence and significantly improved scalability. We empirically validate our claims on an extensive suite of experimental setups, including crowd navigation, opinion depolarization, LiDAR manifolds, and image domain transfer. Our work brings new algorithmic opportunities for training diffusion models enhanced with task-specific optimality structures.
    摘要 现代分布匹配算法直接定义 diffusion or flow 模型的时间演化 marginal 分布 между两个边缘分布。在这项工作中,我们考虑一种总体分布匹配设置,其中这些 marginal 分布只是被解释为某种任务特定的目标函数的解。这个问题设置被称为通用Schrödinger Bridge(GSB)问题。我们提出一种基于最近进步的Generalized Schrödinger Bridge Matching(GSBM)算法,其扩展了之前的劳动能矩阵最小化,并考虑了任务特定的状态成本。我们表明,这种扩展可以转化为解 conditional stochastic optimal control 问题,并可以使用高效的 Variational approximations 和 path integral theory 进行逼减。与先前的 GSB 问题解法相比,我们的 GSBM 算法总是保持两个边缘分布之间的可靠运输映射,从而实现稳定的转化和显著提高的可扩展性。我们在一系列实验中证明了我们的主张,包括人群导航、意见减轻、 LiDAR manifold 和图像领域传输。我们的工作带来了新的算法机遇,用于增强 diffusion 模型的任务特定优化结构。

HoloNets: Spectral Convolutions do extend to Directed Graphs

  • paper_url: http://arxiv.org/abs/2310.02232
  • repo_url: None
  • paper_authors: Christian Koke, Daniel Cremers
  • for: directed graph convolutional networks
  • methods: advanced tools from complex analysis and spectral theory
  • results: new state of the art results for heterophilic node classification on many datasets, stable to resolution-scale varying topological perturbations.
    Abstract Within the graph learning community, conventional wisdom dictates that spectral convolutional networks may only be deployed on undirected graphs: Only there could the existence of a well-defined graph Fourier transform be guaranteed, so that information may be translated between spatial- and spectral domains. Here we show this traditional reliance on the graph Fourier transform to be superfluous and -- making use of certain advanced tools from complex analysis and spectral theory -- extend spectral convolutions to directed graphs. We provide a frequency-response interpretation of newly developed filters, investigate the influence of the basis used to express filters and discuss the interplay with characteristic operators on which networks are based. In order to thoroughly test the developed theory, we conduct experiments in real world settings, showcasing that directed spectral convolutional networks provide new state of the art results for heterophilic node classification on many datasets and -- as opposed to baselines -- may be rendered stable to resolution-scale varying topological perturbations.
    摘要 在图学学术社区中,传统观点认为spectral convolutional networks只能应用于无向图:只有在这种情况下,存在 Graph Fourier Transform的定义可以保证信息在空间频率域和spectral频率域之间进行翻译。我们现在采用了一些复杂分析和频谱理论的高级工具,扩展spectral convolutions到导向图。我们提供了新的滤波器频率响应的解释,研究表达滤波器使用的基准集的影响,并讨论与特征Operator相关的交互。为了彻底测试开发的理论,我们在实际世界中进行了实验,显示了 dirigible spectral convolutional networks在许多数据集上提供了新的状态态Results for heterophilic node classification,并且与基elines相比,可以在分辨率尺度变化时保持稳定。

Structurally guided task decomposition in spatial navigation tasks

  • paper_url: http://arxiv.org/abs/2310.02221
  • repo_url: None
  • paper_authors: Ruiqi He, Carlos G. Correa, Thomas L. Griffiths, Mark K. Ho
  • for: 研究人员想要了解人们如何快速准备计划,即使有限的认知资源。
  • methods: 研究人员通过扩展现有的人任务剖分模型来解释更复杂的计划问题,并在更复杂的 Navigation 领域中应用该模型。
  • results: 研究人员在在线实验中发现,使用该模型可以正确预测大多数参与者的导航策略。
    Abstract How are people able to plan so efficiently despite limited cognitive resources? We aimed to answer this question by extending an existing model of human task decomposition that can explain a wide range of simple planning problems by adding structure information to the task to facilitate planning in more complex tasks. The extended model was then applied to a more complex planning domain of spatial navigation. Our results suggest that our framework can correctly predict the navigation strategies of the majority of the participants in an online experiment.
    摘要 在认知资源有限的情况下,人们为何仍能如此高效地制定计划?为了回答这个问题,我们扩展了一个现有的人类任务分解模型,通过向任务中加入结构信息来帮助在更复杂的计划问题中进行规划。随后我们将扩展后的模型应用于更复杂的空间导航规划领域。结果表明,该框架能够正确预测在线实验中大多数参与者的导航策略。

An experimental system for detection and localization of hemorrhage using ultra-wideband microwaves with deep learning

  • paper_url: http://arxiv.org/abs/2310.02215
  • repo_url: None
  • paper_authors: Eisa Hedayati, Fatemeh Safari, George Verghese, Vito R. Ciancia, Daniel K. Sodickson, Seena Dehkharghani, Leeor Alon
  • for: stroke detection
  • methods: 使用低能量微波探测技术和深度学习算法
  • results: 检测的敏感度与特异性均高于 0.99,平均定位误差约为 1.65 毫米
    Abstract Stroke is a leading cause of mortality and disability. Emergent diagnosis and intervention are critical, and predicated upon initial brain imaging; however, existing clinical imaging modalities are generally costly, immobile, and demand highly specialized operation and interpretation. Low-energy microwaves have been explored as low-cost, small form factor, fast, and safe probes of tissue dielectric properties, with both imaging and diagnostic potential. Nevertheless, challenges inherent to microwave reconstruction have impeded progress, hence microwave imaging (MWI) remains an elusive scientific aim. Herein, we introduce a dedicated experimental framework comprising a robotic navigation system to translate blood-mimicking phantoms within an anatomically realistic human head model. An 8-element ultra-wideband (UWB) array of modified antipodal Vivaldi antennas was developed and driven by a two-port vector network analyzer spanning 0.6-9.0 GHz at an operating power of 1 mw. Complex scattering parameters were measured, and dielectric signatures of hemorrhage were learned using a dedicated deep neural network for prediction of hemorrhage classes and localization. An overall sensitivity and specificity for detection >0.99 was observed, with Rayliegh mean localization error of 1.65 mm. The study establishes the feasibility of a robust experimental model and deep learning solution for UWB microwave stroke detection.
    摘要 中风是导致死亡和残疾的主要原因之一。紧急诊断与干预至关重要,而这依赖于初始的脑部影像;然而现有的临床影像手段通常成本高、不可移动,且需要高度专业化的操作与判读。低能量微波作为一种低成本、小体积、快速且安全的组织介电特性探测手段已被探索,兼具成像与诊断潜力。然而,微波重建本身的困难阻碍了进展,因此微波成像(MWI)仍是一个难以实现的科学目标。本文介绍了一个专门的实验框架,包含一个机器人导航系统,可在解剖学上逼真的人类头部模型内移动模拟血液的体模。我们研制了由 8 个改进型对趾(antipodal)Vivaldi 天线组成的超宽带(UWB)阵列,由双端口矢量网络分析仪驱动,频率范围 0.6-9.0 GHz,工作功率为 1 mW。我们测量了复散射参数,并用专门的深度神经网络学习出血的介电特征,以预测出血类别并进行定位。检测的总体敏感度与特异性均高于 0.99,瑞利平均定位误差为 1.65 mm。该研究验证了用于 UWB 微波中风检测的稳健实验模型与深度学习方案的可行性。

Chunking: Forgetting Matters in Continual Learning even without Changing Tasks

  • paper_url: http://arxiv.org/abs/2310.02206
  • repo_url: None
  • paper_authors: Thomas L. Lee, Amos Storkey
  • for: This paper focuses on the problem of continual learning (CL) with dynamically-changing data distribution, specifically addressing the chunking of data and its impact on CL performance.
  • methods: The paper analyzes the chunking sub-problem in CL and shows that current CL algorithms do not effectively address this issue, leading to performance drops. The authors propose per-chunk weight averaging as a solution to improve performance in the chunking setting and demonstrate its transfer to the full CL setting.
  • results: The paper reveals that chunking is an important part of CL, accounting for around half of the performance drop from offline learning in the authors’ experiments. The proposed per-chunk weight averaging method improves performance in the chunking setting and transfers to the full CL setting, demonstrating its effectiveness in addressing the chunking sub-problem.
    Abstract Work on continual learning (CL) has largely focused on the problems arising from the dynamically-changing data distribution. However, CL can be decomposed into two sub-problems: (a) shifts in the data distribution, and (b) dealing with the fact that the data is split into chunks and so only a part of the data is available to be trained on at any point in time. In this work, we look at the latter sub-problem -- the chunking of data -- and note that previous analysis of chunking in the CL literature is sparse. We show that chunking is an important part of CL, accounting for around half of the performance drop from offline learning in our experiments. Furthermore, our results reveal that current CL algorithms do not address the chunking sub-problem, only performing as well as plain SGD training when there is no shift in the data distribution. We analyse why performance drops when learning occurs on chunks of data, and find that forgetting, which is often seen to be a problem due to distribution shift, still arises and is a significant problem. Motivated by an analysis of the linear case, we show that per-chunk weight averaging improves performance in the chunking setting and that this performance transfers to the full CL setting. Hence, we argue that work on chunking can help advance CL in general.
    摘要 持续学习(CL)的研究大多关注数据分布动态变化带来的问题。然而,CL 可以分解为两个子问题:(a)数据分布的偏移,以及(b)数据被切分成若干块(chunk),因此任一时刻只有一部分数据可用于训练。本文关注后一个子问题——数据的分块——并指出以往 CL 文献对分块的分析很少。我们证明分块是 CL 的重要组成部分:在我们的实验中,它大约贡献了相对离线学习的一半性能损失。此外,我们的结果显示,现有的 CL 算法并没有解决分块子问题,在数据分布没有偏移时,它们的表现仅与普通的 SGD 训练相当。我们分析了在数据块上学习导致性能下降的原因,发现通常被归因于分布偏移的遗忘问题在这里仍然出现,并且是一个显著的问题。受线性情形分析的启发,我们证明逐块权重平均(per-chunk weight averaging)能在分块设定中提升性能,并且这种提升可以迁移到完整的 CL 设定中。因此,我们认为对分块问题的研究有助于推动 CL 的整体进展。
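
Per-chunk weight averaging is simple to state: after training on each chunk, snapshot the parameters and keep a running average; evaluation then uses the averaged weights. A minimal sketch with a toy model and synthetic chunks drawn from a fixed distribution (i.e., chunking without distribution shift).

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# Stream of data chunks from a *fixed* distribution (no task shift), as in the chunking setting.
def make_chunk(n=128):
    x = torch.randn(n, 10)
    y = (x[:, 0] + x[:, 1] > 0).long()
    return x, y

avg_state, n_chunks = None, 0
for chunk_id in range(20):
    x, y = make_chunk()
    for _ in range(10):                                    # train only on the current chunk
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    # Per-chunk weight averaging: running mean of the end-of-chunk parameters.
    n_chunks += 1
    snap = copy.deepcopy(model.state_dict())
    if avg_state is None:
        avg_state = snap
    else:
        for key in avg_state:
            avg_state[key] += (snap[key] - avg_state[key]) / n_chunks

eval_model = nn.Linear(10, 2)
eval_model.load_state_dict(avg_state)                      # evaluate with the averaged weights
x_test, y_test = make_chunk(1000)
acc = (eval_model(x_test).argmax(1) == y_test).float().mean()
print(f"accuracy with per-chunk averaged weights: {acc:.3f}")
```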

Probabilistically Rewired Message-Passing Neural Networks

  • paper_url: http://arxiv.org/abs/2310.02156
  • repo_url: https://github.com/chendiqian/PR-MPNN
  • paper_authors: Chendi Qian, Andrei Manolache, Kareem Ahmed, Zhe Zeng, Guy Van den Broeck, Mathias Niepert, Christopher Morris
  • for: 这个研究的目的是提出一种可学习的Message-passing graph neural networks(MPNNs)模型,可以处理具有随机遗传的图像输入。
  • methods: 这个模型使用了最近的精确和可微分$k$-subset抽样法,可以学习添加有用的图像关系,同时忽略不重要的关系。
  • results: 这个研究的 тео리式分析表明,PR-MPNNs可以增强表达能力,并且我们提出了具体的条件,在这些条件下,PR-MPNNs 会比Randomized MPNNs表现更好。实验结果显示,我们的方法可以有效地解决过压和不足的问题。此外,我们的方法在一些知名的实际世界数据集上表现了竞争性或更好的预测性,比起传统的 MPNN 模型和最近的图像转换架构。
    Abstract Message-passing graph neural networks (MPNNs) emerged as powerful tools for processing graph-structured input. However, they operate on a fixed input graph structure, ignoring potential noise and missing information. Furthermore, their local aggregation mechanism can lead to problems such as over-squashing and limited expressive power in capturing relevant graph structures. Existing solutions to these challenges have primarily relied on heuristic methods, often disregarding the underlying data distribution. Hence, devising principled approaches for learning to infer graph structures relevant to the given prediction task remains an open challenge. In this work, leveraging recent progress in exact and differentiable $k$-subset sampling, we devise probabilistically rewired MPNNs (PR-MPNNs), which learn to add relevant edges while omitting less beneficial ones. For the first time, our theoretical analysis explores how PR-MPNNs enhance expressive power, and we identify precise conditions under which they outperform purely randomized approaches. Empirically, we demonstrate that our approach effectively mitigates issues like over-squashing and under-reaching. In addition, on established real-world datasets, our method exhibits competitive or superior predictive performance compared to traditional MPNN models and recent graph transformer architectures.
    摘要 message-passing图 neural networks(MPNNs)已经出现为处理图结构输入的强大工具。然而,它们使用固定的输入图结构,忽略了可能的噪声和缺失信息。此外,它们的本地聚合机制可能会导致过抑压和限制表达力,使得不能够捕捉相关的图结构。现有的解决方案主要依靠了规则性的方法,经常忽略了下面数据分布。因此,把学习推理出相关的图结构与给定预测任务相关 remains an open challenge。在这种情况下,我们利用最近的精确和可微的 $k$-subset sampling技术,设计出 probabilistically rewired MPNNs(PR-MPNNs),它可以学习添加相关的边而忽略不重要的边。我们的理论分析表明,PR-MPNNs可以增强表达力,并且我们确定了其表达力超过了各种随机化方法的条件。实际上,我们的方法可以有效地解决过抑压和下降的问题,并且在已知的实际世界数据上显示出与传统 MPNN 模型和最新的图 transformer 架构相当或更高的预测性能。
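
The paper relies on recent exact and differentiable k-subset sampling; purely as an illustration of the rewiring step, the sketch below scores candidate edges and draws k of them with Gumbel-top-k noise, which is a simpler stochastic stand-in rather than the method used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def rewire_edges(edge_scores, k):
    """Sample k candidate edges with probability increasing in their learned scores,
    using Gumbel-top-k: add Gumbel noise to the scores and keep the k largest."""
    gumbel = -np.log(-np.log(rng.random(edge_scores.shape)))
    return np.argsort(edge_scores + gumbel)[-k:]

# Toy graph: 6 nodes on a path; every non-edge is a rewiring candidate.
n = 6
adj = np.zeros((n, n), dtype=int)
adj[[0, 1, 2, 3, 4], [1, 2, 3, 4, 5]] = 1
candidates = [(i, j) for i in range(n) for j in range(i + 1, n) if adj[i, j] == 0]

# Stand-in for an upstream model's per-edge logits (e.g., computed from node embeddings).
scores = rng.normal(size=len(candidates))

for idx in rewire_edges(scores, k=2):
    i, j = candidates[idx]
    adj[i, j] = adj[j, i] = 1                               # probabilistically added edges
print(adj)
```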

Graph Neural Network-based EEG Classification: A Survey

  • paper_url: http://arxiv.org/abs/2310.02152
  • repo_url: None
  • paper_authors: Dominik Klepl, Min Wu, Fei He
  • for: 这篇论文旨在系统地审查和分类使用图 neural network (GNN) 来分类 EEG 数据的方法。
  • methods: 论文使用了各种方法来设计 GNN 基类器,包括 spectral graph convolutional layers 和 differential entropy 等。
  • results: 论文发现了这些方法的相似性和差异,以及标准的节点特征 forma , Raw EEG signal 是最为受欢迎的节点特征之一。 论文还提出了一些可能的研究方向,如 Transfer learning 方法和 Cross-frequency interactions 的合适模elling。
    Abstract Graph neural networks (GNN) are increasingly used to classify EEG for tasks such as emotion recognition, motor imagery and neurological diseases and disorders. A wide range of methods have been proposed to design GNN-based classifiers. Therefore, there is a need for a systematic review and categorisation of these approaches. We exhaustively search the published literature on this topic and derive several categories for comparison. These categories highlight the similarities and differences among the methods. The results suggest a prevalence of spectral graph convolutional layers over spatial. Additionally, we identify standard forms of node features, with the most popular being the raw EEG signal and differential entropy. Our results summarise the emerging trends in GNN-based approaches for EEG classification. Finally, we discuss several promising research directions, such as exploring the potential of transfer learning methods and appropriate modelling of cross-frequency interactions.
    摘要 Graph neural networks (GNN) 是越来越多地用于类型化 EEG,用于情绪识别、motor imagery 和 neuroscience diseases 和疾病。许多方法已经提议用于设计 GNN-based 分类器。因此,有需要一篇系统性的评论和分类这些方法。我们对这些方法进行了广泛的文献搜索,并 derivated 出一些比较categories。这些类别 highlights 这些方法之间的相似性和差异。结果表明 spectral graph convolutional layers 的使用更为普遍,而 node features 的标准化也有 Raw EEG signal 和 differential entropy 等。我们的结果总结了 GNN-based EEG 分类的emerging trends,并提出了一些有前途的研究方向,如通过 transfer learning 方法和cross-frequency interactions 的合适模elling。
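
Differential entropy, the most common node feature noted in the survey, is typically computed per channel and per frequency band under a Gaussian assumption, where it reduces to 0.5*log(2*pi*e*variance) of the band-filtered signal. A short sketch using scipy band-pass filters; band edges and sizes are illustrative.

```python
import numpy as np
from scipy.signal import butter, filtfilt

BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30), "gamma": (30, 45)}

def differential_entropy_features(eeg, fs):
    """Per-channel, per-band differential entropy of an EEG segment.

    eeg: (n_channels, n_samples). Under a Gaussian assumption,
    DE = 0.5 * log(2*pi*e*sigma^2) of the band-filtered signal.
    Returns an array of shape (n_channels, n_bands): one node-feature vector per channel.
    """
    feats = np.zeros((eeg.shape[0], len(BANDS)))
    for b, (lo, hi) in enumerate(BANDS.values()):
        bcoef, acoef = butter(4, [lo, hi], btype="bandpass", fs=fs)
        filtered = filtfilt(bcoef, acoef, eeg, axis=1)
        feats[:, b] = 0.5 * np.log(2 * np.pi * np.e * filtered.var(axis=1))
    return feats

# Example: 1 second of synthetic 32-channel EEG sampled at 256 Hz.
rng = np.random.default_rng(0)
segment = rng.normal(size=(32, 256))
print(differential_entropy_features(segment, fs=256).shape)   # (32, 5)
```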

Symmetric Single Index Learning

  • paper_url: http://arxiv.org/abs/2310.02117
  • repo_url: None
  • paper_authors: Aaron Zweig, Joan Bruna
  • for: 这个论文主要研究了单指数模型在卷积神经网络中的学习问题。
  • methods: 该论文使用了梯度流动方法来解决单指数模型的学习问题。
  • results: 论文证明了梯度流动方法可以在卷积神经网络中收敛到隐藏的植入方向。
    Abstract Few neural architectures lend themselves to provable learning with gradient based methods. One popular model is the single-index model, in which labels are produced by composing an unknown linear projection with a possibly unknown scalar link function. Learning this model with SGD is relatively well-understood, whereby the so-called information exponent of the link function governs a polynomial sample complexity rate. However, extending this analysis to deeper or more complicated architectures remains challenging. In this work, we consider single index learning in the setting of symmetric neural networks. Under analytic assumptions on the activation and maximum degree assumptions on the link function, we prove that gradient flow recovers the hidden planted direction, represented as a finitely supported vector in the feature space of power sum polynomials. We characterize a notion of information exponent adapted to our setting that controls the efficiency of learning.
    摘要 只有一些神经网络架构适合可证明学习,使用梯度基本方法。一种受欢迎的模型是单指数模型,labels是通过不确定的线性投影和可能不确定的整数链函数生成的。使用SGD进行学习这种模型的理论知识比较完善,其中链函数的信息指数控制了波动样本复杂度的 polynomial 速率。然而,推广这种分析到更深或更复杂的架构仍然是一个挑战。在这种工作中,我们考虑了单指数学习在同质神经网络中。对于 activation 和最大度函数的假设,我们证明了梯度流可以重现隐藏的植入方向,表示为特定的幂 sums 空间中的可数支持向量。我们还定义了适用于我们设定的信息指数,用于控制学习效率的概念。

Hierarchical Concept Discovery Models: A Concept Pyramid Scheme

  • paper_url: http://arxiv.org/abs/2310.02116
  • repo_url: None
  • paper_authors: Konstantinos P. Panousis, Dino Ienco, Diego Marcos
  • for: 本研究旨在提高深度学习模型的 ante hoc 解释性,具体来说是基于概念瓶颈模型(CBM)。
  • methods: 该研究提议一种基于图文模型的新型层次概念发现方法,通过数据驱动和稀疏化 bayesian 理论来实现多级划分概念选择。
  • results: 实验结果表明,提议的构建不仅能够超越当前CBM方法,还提供了一个原则性的解释性框架。
    Abstract Deep Learning algorithms have recently gained significant attention due to their impressive performance. However, their high complexity and un-interpretable mode of operation hinders their confident deployment in real-world safety-critical tasks. This work targets ante hoc interpretability, and specifically Concept Bottleneck Models (CBMs). Our goal is to design a framework that admits a highly interpretable decision making process with respect to human understandable concepts, on multiple levels of granularity. To this end, we propose a novel hierarchical concept discovery formulation leveraging: (i) recent advances in image-text models, and (ii) an innovative formulation for multi-level concept selection via data-driven and sparsity inducing Bayesian arguments. Within this framework, concept information does not solely rely on the similarity between the whole image and general unstructured concepts; instead, we introduce the notion of concept hierarchy to uncover and exploit more granular concept information residing in patch-specific regions of the image scene. As we experimentally show, the proposed construction not only outperforms recent CBM approaches, but also yields a principled framework towards interpetability.
    摘要 深度学习算法近来因其出色的性能而受到广泛关注。然而,其高复杂性和难以解释的工作方式妨碍了它们在现实世界安全关键任务中的可靠部署。本工作面向事前(ante hoc)可解释性,特别是概念瓶颈模型(CBM)。我们的目标是设计一个框架,使决策过程能够在多个粒度层级上,针对人类可理解的概念保持高度可解释。为此,我们提出了一种新的层次概念发现形式,利用:(i)图文模型的最新进展,以及(ii)一种通过数据驱动、诱导稀疏的贝叶斯论证进行多层级概念选择的新方案。在该框架中,概念信息不再仅仅依赖整幅图像与一般非结构化概念之间的相似度;我们引入概念层次的思想,以挖掘并利用图像场景中各局部区域所蕴含的更细粒度概念信息。实验表明,所提出的构造不仅优于最近的 CBM 方法,还提供了一个有原则的可解释性框架。

FLEDGE: Ledger-based Federated Learning Resilient to Inference and Backdoor Attacks

  • paper_url: http://arxiv.org/abs/2310.02113
  • repo_url: None
  • paper_authors: Jorge Castillo, Phillip Rieger, Hossein Fereidooni, Qian Chen, Ahmad Sadeghi
  • for: 防止潜在攻击者利用联合学习系统进行贪婪和欺诈行为,保证跨多个点之间的资料隐私和安全性。
  • methods: 利用零知识认证和抵抗式认证技术,实现各统计点之间的资料隐私和安全性,并通过奖励良性行为和惩罚违规行为的机制,增强统计点之间的信任和责任感。
  • results: 在四个公共数据集上进行了广泛的评估,试验结果显示FLEDGE可以实现强大的隐私保证和模型价值,同时能够成功地抵制不同的毒素攻击,并且提供了唯一的奖励机制来增强统计点之间的信任和责任感。
    Abstract Federated learning (FL) is a distributed learning process that uses a trusted aggregation server to allow multiple parties (or clients) to collaboratively train a machine learning model without having them share their private data. Recent research, however, has demonstrated the effectiveness of inference and poisoning attacks on FL. Mitigating both attacks simultaneously is very challenging. State-of-the-art solutions have proposed the use of poisoning defenses with Secure Multi-Party Computation (SMPC) and/or Differential Privacy (DP). However, these techniques are not efficient and fail to address the malicious intent behind the attacks, i.e., adversaries (curious servers and/or compromised clients) seek to exploit a system for monetization purposes. To overcome these limitations, we present a ledger-based FL framework known as FLEDGE that allows making parties accountable for their behavior and achieve reasonable efficiency for mitigating inference and poisoning attacks. Our solution leverages crypto-currency to increase party accountability by penalizing malicious behavior and rewarding benign conduct. We conduct an extensive evaluation on four public datasets: Reddit, MNIST, Fashion-MNIST, and CIFAR-10. Our experimental results demonstrate that (1) FLEDGE provides strong privacy guarantees for model updates without sacrificing model utility; (2) FLEDGE can successfully mitigate different poisoning attacks without degrading the performance of the global model; and (3) FLEDGE offers unique reward mechanisms to promote benign behavior during model training and/or model aggregation.
    摘要 federated learning (FL) 是一种分布式学习过程,使用一个可信的聚合服务器,让多个方(或客户端)共同训练一个机器学习模型,无需共享私人数据。然而, latest research 表明,FL 受到推理和毒击攻击的威胁。 simultaneously mitigating both attacks 是非常困难的。 current solutions 提出使用毒素防御技术(SMPC)和/或差分隐私(DP),但这些技术不是高效的,而且不能 Addressing the malicious intent behind the attacks, i.e., adversaries (curious servers and/or compromised clients) seek to exploit the system for monetization purposes。To overcome these limitations, we present a ledger-based FL framework known as FLEDGE that allows making parties accountable for their behavior and achieve reasonable efficiency for mitigating inference and poisoning attacks. Our solution leverages crypto-currency to increase party accountability by penalizing malicious behavior and rewarding benign conduct. We conduct an extensive evaluation on four public datasets: Reddit, MNIST, Fashion-MNIST, and CIFAR-10. Our experimental results demonstrate that:1. FLEDGE provides strong privacy guarantees for model updates without sacrificing model utility;2. FLEDGE can successfully mitigate different poisoning attacks without degrading the performance of the global model;3. FLEDGE offers unique reward mechanisms to promote benign behavior during model training and/or model aggregation.

Stochastic Gradient Descent with Preconditioned Polyak Step-size

  • paper_url: http://arxiv.org/abs/2310.02093
  • repo_url: https://github.com/fxrshed/scaledsps
  • paper_authors: Farshed Abdukhakimov, Chulu Xiang, Dmitry Kamzolov, Martin Takáč
  • for: 提高 Stochastic Gradient Descent 的性能在 badly scaled 和/或 ill-conditioned 数据集上
  • methods: 使用 preconditioning 技术,如 Hutchinson’s method、Adam 和 AdaGrad,提高 SPS 的性能
  • results: 提高 SPS 的性能,使其在 badly scaled 和/或 ill-conditioned 数据集上更高效
    Abstract Stochastic Gradient Descent (SGD) is one of the many iterative optimization methods that are widely used in solving machine learning problems. These methods display valuable properties and attract researchers and industrial machine learning engineers with their simplicity. However, one of the weaknesses of this type of methods is the necessity to tune learning rate (step-size) for every loss function and dataset combination to solve an optimization problem and get an efficient performance in a given time budget. Stochastic Gradient Descent with Polyak Step-size (SPS) is a method that offers an update rule that alleviates the need of fine-tuning the learning rate of an optimizer. In this paper, we propose an extension of SPS that employs preconditioning techniques, such as Hutchinson's method, Adam, and AdaGrad, to improve its performance on badly scaled and/or ill-conditioned datasets.
    摘要 随机梯度下降(SGD)是求解机器学习问题时广泛使用的迭代优化方法之一。这类方法因其简洁性而受到研究者和工业界机器学习工程师的青睐,但其弱点之一是:针对每一组损失函数与数据集的组合,都需要调节学习率(步长),才能在给定的时间预算内高效地求解优化问题。带 Polyak 步长的随机梯度下降(SPS)提供了一种无需精细调节学习率的更新规则。在本文中,我们提出了 SPS 的一种扩展,引入 Hutchinson 方法、Adam 与 AdaGrad 等预条件技术,以改善其在尺度失衡和/或病态数据集上的表现。
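
A minimal sketch of one natural preconditioned Polyak step-size update, assuming interpolation (per-sample optimal values f_i* = 0) and an AdaGrad-style diagonal preconditioner: the step is (f_i(x) - f_i*) divided by the preconditioned squared gradient norm, and the update direction is the preconditioned gradient. The paper's exact update and its Hutchinson/Adam variants may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# A badly scaled least-squares problem satisfying interpolation (so f_i* = 0 for every sample).
n, d = 200, 5
A = rng.normal(size=(n, d)) * np.array([100.0, 10.0, 1.0, 0.1, 0.01])   # very different feature scales
b = A @ rng.normal(size=d)

def f_i(x, i):
    return 0.5 * (A[i] @ x - b[i]) ** 2

def grad_i(x, i):
    return (A[i] @ x - b[i]) * A[i]

x = np.zeros(d)
diag = np.full(d, 1e-8)                    # running AdaGrad-style accumulator for the diagonal preconditioner
for it in range(3000):
    i = rng.integers(n)
    g = grad_i(x, i)
    diag += g ** 2
    p_inv = 1.0 / np.sqrt(diag)            # P^{-1} (diagonal)
    # Preconditioned Polyak step-size: (f_i - f_i*) / (g^T P^{-1} g); update along P^{-1} g.
    gamma = f_i(x, i) / max(g @ (p_inv * g), 1e-12)
    x -= gamma * p_inv * g

print("final full-batch loss:", 0.5 * np.mean((A @ x - b) ** 2))
```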

1D-CapsNet-LSTM: A Deep Learning-Based Model for Multi-Step Stock Index Forecasting

  • paper_url: http://arxiv.org/abs/2310.02090
  • repo_url: None
  • paper_authors: Cheng Zhang, Nilam Nur Amir Sjarif, Roslina Ibrahim
  • for: 预测股票市场指数价格的多步预测任务是金融领域的关键任务,对各种金融活动的决策起着关键性的作用。然而,预测结果经常不满足要求,归因于数据的随机和抖抖性。研究人员已经尝试了各种方法,这一过程仍在继续。
  • methods: 该研究借鉴了卷积神经网络-长短期记忆网络(CNN-LSTM)利用一维 CNN 进行特征提取以提升模型性能的思路,并引入胶囊网络(CapsNet)作为更高级的特征提取器,与 LSTM 层结合以捕捉时间依赖关系。为了保持不同时间步预测值之间的随机依赖关系,该模型采用了多输入多输出(MIMO)策略。
  • results: 该研究对实际的股票市场指数进行了评估,包括标普500指数(S&P 500)、道琼工业指数(DJIA)、纳斯达克股票指数(IXIC)和纽约股票交易所指数(NYSE)。与基准模型 such as LSTM、RNN和CNN-LSTM进行比较,结果表明1D-CapsNet-LSTM模型在各种评价指标上表现出色,并有很大的潜力在复杂预测任务中。
    Abstract Multi-step forecasting of stock market index prices is a crucial task in the financial sector, playing a pivotal role in decision-making across various financial activities. However, forecasting results are often unsatisfactory owing to the stochastic and volatile nature of the data. Researchers have made various attempts, and this process is ongoing. Inspired by convolutional neural network long short-term memory (CNN-LSTM) networks that utilize a 1D CNN for feature extraction to boost model performance, this study explores the use of a capsule network (CapsNet) as an advanced feature extractor in an LSTM-based forecasting model to enhance multi-step predictions. To this end, a novel neural architecture called 1D-CapsNet-LSTM was introduced, which combines a 1D CapsNet to extract high-level features from 1D sequential data and an LSTM layer to capture the temporal dependencies between the previously extracted features and uses a multi-input multi-output (MIMO) strategy to maintain the stochastic dependencies between the predicted values at different time steps. The proposed model was evaluated based on several real-world stock market indices, including Standard & Poor's 500 (S&P 500), Dow Jones Industrial Average (DJIA), Nasdaq Composite Index (IXIC), and New York Stock Exchange (NYSE), and was compared with baseline models such as LSTM, recurrent neural network (RNN), and CNN-LSTM in terms of various evaluation metrics. The comparison results suggest that the 1D-CapsNet-LSTM model outperforms the baseline models and has immense potential for the effective handling of complex prediction tasks.
    摘要 股票市场指数价格的多步预测是金融领域的一项关键任务,在各类金融活动的决策中起着举足轻重的作用。然而,由于数据的随机性和波动性,预测结果往往难以令人满意;研究者为此做出了各种尝试,这一过程仍在持续。受利用一维 CNN 进行特征提取以提升模型性能的 CNN-LSTM 网络的启发,本研究探索将胶囊网络(CapsNet)作为更高级的特征提取器嵌入基于 LSTM 的预测模型,以增强多步预测。为此,我们提出了一种名为 1D-CapsNet-LSTM 的新神经网络架构:先用一维 CapsNet 从一维序列数据中提取高层特征,再用 LSTM 层捕捉所提取特征之间的时间依赖关系,并采用多输入多输出(MIMO)策略以保持不同时间步预测值之间的随机依赖关系。我们在标普500指数(S&P 500)、道琼斯工业平均指数(DJIA)、纳斯达克综合指数(IXIC)和纽约证券交易所指数(NYSE)等真实股票市场指数上评估了所提模型,并在多种评价指标上与 LSTM、循环神经网络(RNN)和 CNN-LSTM 等基线模型进行比较。比较结果表明,1D-CapsNet-LSTM 模型优于基线模型,在处理复杂预测任务方面具有巨大潜力。
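
As a point of reference for the architecture being extended, here is a compact 1D-CNN + LSTM forecaster with a multi-output (MIMO) head that predicts several future steps at once. The capsule layers of the actual 1D-CapsNet-LSTM are not reproduced, and all sizes and the synthetic series are illustrative.

```python
import torch
import torch.nn as nn

class CNNLSTMForecaster(nn.Module):
    """1D CNN feature extractor -> LSTM -> linear MIMO head for multi-step forecasting."""
    def __init__(self, lookback=30, horizon=5, channels=32, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.lstm = nn.LSTM(channels, hidden, batch_first=True)
        self.head = nn.Linear(hidden, horizon)          # MIMO: all future steps predicted jointly

    def forward(self, x):                               # x: (batch, lookback)
        z = self.cnn(x.unsqueeze(1))                    # (batch, channels, lookback)
        out, _ = self.lstm(z.transpose(1, 2))           # (batch, lookback, hidden)
        return self.head(out[:, -1])                    # (batch, horizon)

# Smoke test on a synthetic random-walk "index" series.
torch.manual_seed(0)
series = torch.cumsum(torch.randn(1000), dim=0)
X = torch.stack([series[i:i + 30] for i in range(900)])
Y = torch.stack([series[i + 30:i + 35] for i in range(900)])

model = CNNLSTMForecaster()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(50):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(X), Y)
    loss.backward()
    opt.step()
print("training MSE:", float(loss))
```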

Learning Quantum Processes with Quantum Statistical Queries

  • paper_url: http://arxiv.org/abs/2310.02075
  • repo_url: https://github.com/chirag-w/qpsq-learning
  • paper_authors: Chirag Wadhwa, Mina Doosti
  • for: 本文为研究量子过程学习提出了首个学习框架,并在量子统计查询(QSQ)模型下给出了对量子过程的统计查询(QPSQ)的首个正式定义。
  • methods: 该框架使得我们能够针对任意量子过程提出高效的QPSQ学习算法,并附带可证明的性能保证;数值模拟也验证了该算法的有效性。
  • results: 本文通过在密码分析中的应用展示了该框架的实际意义,揭示了经典读出量子物理不可克隆函数(CR-QPUF)的漏洞,解决了量子硬件安全领域的一个重要开放问题。这项工作对理解量子过程的可学习性及其安全影响迈出了重要一步。
    Abstract Learning complex quantum processes is a central challenge in many areas of quantum computing and quantum machine learning, with applications in quantum benchmarking, cryptanalysis, and variational quantum algorithms. This paper introduces the first learning framework for studying quantum process learning within the Quantum Statistical Query (QSQ) model, providing the first formal definition of statistical queries to quantum processes (QPSQs). The framework allows us to propose an efficient QPSQ learner for arbitrary quantum processes accompanied by a provable performance guarantee. We also provide numerical simulations to demonstrate the efficacy of this algorithm. The practical relevance of this framework is exemplified through application in cryptanalysis, highlighting vulnerabilities of Classical-Readout Quantum Physical Unclonable Functions (CR-QPUFs), addressing an important open question in the field of quantum hardware security. This work marks a significant step towards understanding the learnability of quantum processes and shedding light on their security implications.
    摘要 学习复杂量子过程是量子计算与量子机器学习诸多领域的核心挑战,其应用包括量子基准测试、密码分析和变分量子算法。本文在量子统计查询(QSQ)模型下提出了首个研究量子过程学习的框架,并首次给出了对量子过程的统计查询(QPSQ)的正式定义。该框架使我们能够针对任意量子过程提出高效的QPSQ学习算法,并附带可证明的性能保证;我们还通过数值模拟验证了该算法的有效性。该框架的实际意义通过其在密码分析中的应用得到体现:它揭示了经典读出量子物理不可克隆函数(CR-QPUF)的漏洞,解决了量子硬件安全领域的一个重要开放问题。这项工作标志着在理解量子过程可学习性及其安全影响方面迈出了重要一步。

ACE: A fast, skillful learned global atmospheric model for climate prediction

  • paper_url: http://arxiv.org/abs/2310.02074
  • repo_url: None
  • paper_authors: Oliver Watt-Meyer, Gideon Dresdner, Jeremy McGibbon, Spencer K. Clark, Brian Henn, James Duncan, Noah D. Brenowitz, Karthik Kashinath, Michael S. Pritchard, Boris Bonev, Matthew E. Peters, Christopher S. Bretherton
  • for: 该论文是为了提出一种基于人工智能的气候预测模型,以提高气候预测的稳定性和物理一致性。
  • methods: 该论文使用一个具有2亿参数的自回归机器学习模型,来模拟一个现有的100公里分辨率综合全球大气模型。ACE的建模方式允许检验质量守恒、水汽守恒等物理定律。
  • results: 结果显示,ACE可在10年尺度上保持稳定,在不施加显式约束的情况下近似保持柱水汽守恒,并忠实再现参考模型的气候,在80%以上的跟踪变量上超越了一个具有挑战性的基线模型。同时,在常见可用资源下,ACE所需的挂钟时间约为参考模型的1/100,能耗效率约高100倍。
    Abstract Existing ML-based atmospheric models are not suitable for climate prediction, which requires long-term stability and physical consistency. We present ACE (AI2 Climate Emulator), a 200M-parameter, autoregressive machine learning emulator of an existing comprehensive 100-km resolution global atmospheric model. The formulation of ACE allows evaluation of physical laws such as the conservation of mass and moisture. The emulator is stable for 10 years, nearly conserves column moisture without explicit constraints and faithfully reproduces the reference model's climate, outperforming a challenging baseline on over 80% of tracked variables. ACE requires nearly 100x less wall clock time and is 100x more energy efficient than the reference model using typically available resources.
    摘要
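A toy sketch of the kind of diagnostic the summary refers to: roll an autoregressive emulator forward and track a global, area-weighted column-moisture total at every step. The one-step "emulator" here is a placeholder linear map, not ACE, and all numbers are invented for the example.

```python
import numpy as np

def emulator_step(state, rng):
    """Placeholder one-step emulator: a fixed linear map plus small noise.
    In ACE this would be the 200M-parameter autoregressive network."""
    drift = 0.99 * state + 0.01 * state.mean(axis=-1, keepdims=True)
    return drift + 0.001 * rng.normal(size=state.shape)

def rollout_with_moisture_check(state0, area_weights, n_steps=120, rng=None):
    """Roll the emulator forward and record the global (area-weighted) column
    moisture at each step -- the kind of conservation diagnostic ACE reports."""
    rng = rng or np.random.default_rng(0)
    state, totals = state0, []
    for _ in range(n_steps):
        state = emulator_step(state, rng)
        totals.append(float((state * area_weights).sum() / area_weights.sum()))
    return np.array(totals)

ncells = 1000
rng = np.random.default_rng(0)
q0 = np.abs(rng.normal(25.0, 5.0, size=ncells))    # column water (kg/m^2), toy values
w = np.abs(rng.normal(1.0, 0.1, size=ncells))      # grid-cell area weights
totals = rollout_with_moisture_check(q0, w, rng=rng)
drift_pct = 100 * (totals[-1] - totals[0]) / totals[0]
print(f"global-mean column moisture drift over rollout: {drift_pct:.2f}%")
```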

VENOM: A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores

  • paper_url: http://arxiv.org/abs/2310.02065
  • repo_url: https://github.com/udc-gac/venom
  • paper_authors: Roberto L. Castro, Andrei Ivanov, Diego Andrade, Tal Ben-Nun, Basilio B. Fraguela, Torsten Hoefler
  • for: 提高深度学习模型的计算效率和能力
  • methods: 使用稀疏化方法、剪枝算法和专门的稀疏向量单元支持
  • results: 实现了2倍加速,可达到更高的稀疏比率(相对cuBLAS最高37倍加速),同时在现代Transformer中几乎不损失精度
    Abstract The increasing success and scaling of Deep Learning models demands higher computational efficiency and power. Sparsification can lead to both smaller models as well as higher compute efficiency, and accelerated hardware is becoming available. However, exploiting it efficiently requires kernel implementations, pruning algorithms, and storage formats, to utilize hardware support of specialized sparse vector units. An example of those are the NVIDIA's Sparse Tensor Cores (SPTCs), which promise a 2x speedup. However, SPTCs only support the 2:4 format, limiting achievable sparsity ratios to 50%. We present the V:N:M format, which enables the execution of arbitrary N:M ratios on SPTCs. To efficiently exploit the resulting format, we propose Spatha, a high-performance sparse-library for DL routines. We show that Spatha achieves up to 37x speedup over cuBLAS. We also demonstrate a second-order pruning technique that enables sparsification to high sparsity ratios with V:N:M and little to no loss in accuracy in modern transformers.
    摘要 深度学习模型的持续成功和规模扩张要求更高的计算效率和算力。稀疏化既可以缩小模型,也可以提高计算效率,而相应的加速硬件也正在出现。然而,要高效利用这些硬件,需要内核实现、剪枝算法和存储格式相互配合,以利用专用稀疏向量单元的硬件支持。NVIDIA的稀疏张量核心(SPTC)就是一个例子,其承诺2倍加速,但SPTC只支持2:4格式,可达到的稀疏率被限制在50%。我们提出了V:N:M格式,使SPTC能够执行任意的N:M比例。为了高效利用这种格式,我们提出了Spatha,一个面向深度学习例程的高性能稀疏库。实验表明,Spatha相对cuBLAS最高可达37倍加速。我们还展示了一种二阶剪枝技术,可在V:N:M格式下将现代Transformer稀疏化到很高的稀疏率,而几乎不损失精度。
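The N:M sparsity pattern itself is easy to state in code: keep the N largest-magnitude weights in every group of M consecutive weights. The sketch below builds such a mask with NumPy; it only illustrates the format (2:4 is the SPTC-native case, other ratios are what V:N:M targets) and says nothing about Spatha's kernels or storage layout.

```python
import numpy as np

def n_m_mask(weights, n=2, m=4):
    """Keep the `n` largest-magnitude entries in every group of `m` consecutive
    weights along the flattened last axis; everything else is pruned."""
    w = weights.reshape(-1, m)
    # indices of the (m - n) smallest-magnitude entries in each group
    drop = np.argsort(np.abs(w), axis=1)[:, : m - n]
    mask = np.ones_like(w, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=1)
    return mask.reshape(weights.shape)

W = np.random.default_rng(0).normal(size=(8, 16))
mask = n_m_mask(W, n=2, m=4)
W_sparse = W * mask
print("sparsity:", 1 - mask.mean())                          # 0.5 for the 2:4 pattern
print("nonzeros per group of 4:", mask.reshape(-1, 4).sum(axis=1)[:5])
```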

Lessons Learned from EXMOS User Studies: A Technical Report Summarizing Key Takeaways from User Studies Conducted to Evaluate The EXMOS Platform

  • paper_url: http://arxiv.org/abs/2310.02063
  • repo_url: None
  • paper_authors: Aditya Bhattacharya, Simone Stumpf, Lucija Gosak, Gregor Stiglic, Katrien Verbert
  • for: 本研究旨在探讨如何在交互式机器学习系统中提供解释,以帮助域专家更好地调试和改进预测模型。
  • methods: 本研究开展了两项用户研究,包括定量分析和定性评估,以探索不同类型的解释对领域专家的影响。
  • results: 研究发现,仅靠全局的以模型为中心的解释不足以有效指导用户进行数据配置;以数据为中心的解释则有助于增强对配置后系统变化的理解;而两者结合在建立信任、提升可理解性以及帮助领域专家改进模型方面效果最佳。
    Abstract In the realm of interactive machine-learning systems, the provision of explanations serves as a vital aid in the processes of debugging and enhancing prediction models. However, the extent to which various global model-centric and data-centric explanations can effectively assist domain experts in detecting and resolving potential data-related issues for the purpose of model improvement has remained largely unexplored. In this technical report, we summarise the key findings of our two user studies. Our research involved a comprehensive examination of the impact of global explanations rooted in both data-centric and model-centric perspectives within systems designed to support healthcare experts in optimising machine learning models through both automated and manual data configurations. To empirically investigate these dynamics, we conducted two user studies, comprising quantitative analysis involving a sample size of 70 healthcare experts and qualitative assessments involving 30 healthcare experts. These studies were aimed at illuminating the influence of different explanation types on three key dimensions: trust, understandability, and model improvement. Results show that global model-centric explanations alone are insufficient for effectively guiding users during the intricate process of data configuration. In contrast, data-centric explanations exhibited their potential by enhancing the understanding of system changes that occur post-configuration. However, a combination of both showed the highest level of efficacy for fostering trust, improving understandability, and facilitating model enhancement among healthcare experts. We also present essential implications for developing interactive machine-learning systems driven by explanations. These insights can guide the creation of more effective systems that empower domain experts to harness the full potential of machine learning
    摘要 在交互式机器学习系统中,提供解释是调试和改进预测模型的重要辅助手段。然而,各类全局的以模型为中心和以数据为中心的解释,究竟能在多大程度上有效帮助领域专家发现并解决与数据相关的问题、进而改进模型,仍缺乏深入研究。在这份技术报告中,我们总结了两项用户研究的主要发现。我们的研究系统考察了在支持医疗专家通过自动与手动数据配置来优化机器学习模型的系统中,基于数据视角和模型视角的全局解释所产生的影响。为实证检验这些关系,我们开展了两项用户研究:一项是针对70名医疗专家的定量分析,另一项是针对30名医疗专家的定性评估,旨在揭示不同解释类型对信任、可理解性和模型改进三个关键维度的影响。结果表明,仅靠全局的以模型为中心的解释不足以在复杂的数据配置过程中有效引导用户;以数据为中心的解释则显示出其潜力,有助于理解配置之后系统发生的变化;而两者结合在建立信任、提升可理解性和促进模型改进方面对医疗专家最为有效。我们还给出了开发由解释驱动的交互式机器学习系统的重要启示,这些见解有助于构建更有效的系统,使领域专家能够充分发挥机器学习的潜力。

The Inhibitor: ReLU and Addition-Based Attention for Efficient Transformers

  • paper_url: http://arxiv.org/abs/2310.02041
  • repo_url: None
  • paper_authors: Rickard Brännvall
  • for: 提高量化Transformer的计算效率
  • methods: 仅使用加法和ReLU激活来替代点积和基于Softmax的注意力机制
  • results: 在测试集上取得了与传统点积注意力相当的预测分数,并且在同态加密下显著降低了计算成本。
    Abstract To enhance the computational efficiency of quantized Transformers, we replace the dot-product and Softmax-based attention with an alternative mechanism involving addition and ReLU activation only. This side-steps the expansion to double precision often required by matrix multiplication and avoids costly Softmax evaluations but maintains much of the core functionality of conventional dot-product attention. It can enable more efficient execution and support larger quantized Transformer models on resource-constrained hardware or alternative arithmetic systems like homomorphic encryption. Training experiments on four common benchmark tasks show test set prediction scores comparable to those of conventional Transformers with dot-product attention. Our scaling experiments also suggest significant computational savings, both in plaintext and under encryption. In particular, we believe that the ReLU and addition-based attention mechanism introduced in this paper may enable privacy-preserving AI applications operating under homomorphic encryption by avoiding the costly multiplication of encrypted variables.
    摘要 为提高量化Transformer的计算效率,我们用一种仅涉及加法和ReLU激活的机制来替代点积和基于Softmax的注意力。这样既避免了矩阵乘法常常需要的双精度扩展,也避免了代价高昂的Softmax计算,同时保留了传统点积注意力的大部分核心功能。该机制可以在资源受限的硬件或同态加密等非常规算术体系上支持更高效的执行和更大的量化Transformer模型。在四个常见基准任务上的训练实验表明,其测试集预测分数与使用点积注意力的传统Transformer相当。我们的扩展实验还显示,无论是明文计算还是加密计算,都能获得显著的计算开销节省。特别地,我们认为本文提出的基于ReLU和加法的注意力机制,通过避免加密变量之间代价高昂的乘法,有望支持在同态加密下运行的隐私保护AI应用。
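The abstract only says the mechanism uses additions and ReLU in place of dot products and Softmax; the exact formulation is in the paper. The sketch below is therefore just one illustrative way to build attention weights from element-wise subtractions and a ReLU gate (the final value mixing and normalisation here still use ordinary floating-point ops), not the Inhibitor's actual definition.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def additive_relu_attention(Q, K, V, eps=1e-6):
    """Illustrative attention: scores come from additions/subtractions only
    (negative L1 distance between query and key), and a ReLU gate replaces
    Softmax; gated weights are normalised by their row sum before mixing V."""
    scores = -np.abs(Q[:, None, :] - K[None, :, :]).sum(axis=-1)
    gates = relu(scores - scores.mean(axis=-1, keepdims=True))   # keep above-average matches
    weights = gates / (gates.sum(axis=-1, keepdims=True) + eps)
    return weights @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(5, 8)) for _ in range(3))
out = additive_relu_attention(Q, K, V)
print(out.shape)   # (5, 8)
```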

aSAGA: Automatic Sleep Analysis with Gray Areas

  • paper_url: http://arxiv.org/abs/2310.02032
  • repo_url: None
  • paper_authors: Matias Rusanen, Gabriel Jouan, Riku Huttunen, Sami Nikkonen, Sigríður Sigurðardóttir, Juha Töyräs, Brett Duce, Sami Myllymaa, Erna Sif Arnardottir, Timo Leppänen, Anna Sigridur Islind, Samu Kainulainen, Henri Korkalainen
  • for: 这个研究旨在提出一个人工智能与人类合作的睡眠分析方法,以实现 automatization 的睡眠分析,并且能够与职业睡眠技术员之间的互动进行协同运作。
  • methods: 这个研究使用了自动睡眠分析模型(aSAGA),该模型能够对来自临床波形测量和家用睡眠测量的睡眠资料进行自动分析,并且能够处理不同类型的睡眠资料。
  • results: 研究发现,使用这个自动睡眠分析模型可以与人类职业睡眠技术员的分析结果相互匹配,并且可以运用不同类型的睡眠资料进行自动分析。此外,这个研究还发现了一个称为“灰色区域”的概念,这个概念可以用来描述自动睡眠分析中存在着不确定性的部分,并且可以帮助睡眠技术员更好地处理这些部分。
    Abstract State-of-the-art automatic sleep staging methods have already demonstrated comparable reliability and superior time efficiency to manual sleep staging. However, fully automatic black-box solutions are difficult to adapt into clinical workflow and the interaction between explainable automatic methods and the work of sleep technologists remains underexplored and inadequately conceptualized. Thus, we propose a human-in-the-loop concept for sleep analysis, presenting an automatic sleep staging model (aSAGA), that performs effectively with both clinical polysomnographic recordings and home sleep studies. To validate the model, extensive testing was conducted, employing a preclinical validation approach with three retrospective datasets; open-access, clinical, and research-driven. Furthermore, we validate the utilization of uncertainty mapping to identify ambiguous regions, conceptualized as gray areas, in automatic sleep analysis that warrants manual re-evaluation. The results demonstrate that the automatic sleep analysis achieved a comparable level of agreement with manual analysis across different sleep recording types. Moreover, validation of the gray area concept revealed its potential to enhance sleep staging accuracy and identify areas in the recordings where sleep technologists struggle to reach a consensus. In conclusion, this study introduces and validates a concept from explainable artificial intelligence into sleep medicine and provides the basis for integrating human-in-the-loop automatic sleep staging into clinical workflows, aiming to reduce black-box criticism and the burden associated with manual sleep staging.
    摘要 当前最先进的自动睡眠分期方法已经展现出与人工分期相当的可靠性和更高的时间效率。然而,完全自动的黑盒方案难以融入临床工作流程,可解释的自动方法与睡眠技师工作之间的交互也仍缺乏探索和充分的概念化。为此,我们提出了一种人在环路的睡眠分析概念,并给出一个自动睡眠分期模型(aSAGA),它在临床多导睡眠图记录和家庭睡眠监测数据上都能有效工作。为验证该模型,我们采用临床前验证方式,在开放获取、临床和科研三个回顾性数据集上进行了广泛测试。此外,我们验证了利用不确定性映射来识别自动睡眠分析中需要人工复核的模糊区域(即"灰色区域")的做法。结果表明,自动睡眠分析在不同类型的睡眠记录上与人工分析达到了相当的一致性水平;对灰色区域概念的验证也显示,它有望提升睡眠分期的准确性,并识别出记录中睡眠技师难以达成共识的部分。总之,本研究将可解释人工智能的概念引入睡眠医学并加以验证,为把人在环路的自动睡眠分期整合进临床工作流程奠定了基础,旨在减少对黑盒方法的质疑以及人工睡眠分期带来的负担。
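A minimal sketch of the gray-area idea, assuming the automatic scorer outputs per-epoch stage probabilities: epochs whose top probability falls below a confidence threshold are flagged for manual review. The stage set and the 0.7 threshold are illustrative assumptions, not values from the paper.

```python
import numpy as np

STAGES = ["Wake", "N1", "N2", "N3", "REM"]

def flag_gray_areas(stage_probs, confidence_threshold=0.7):
    """Return the argmax hypnogram plus a boolean mask of low-confidence
    ("gray area") epochs that should be routed to a sleep technologist."""
    pred = stage_probs.argmax(axis=1)
    confidence = stage_probs.max(axis=1)
    return pred, confidence < confidence_threshold

# toy probabilities for 10 thirty-second epochs
rng = np.random.default_rng(0)
logits = rng.normal(size=(10, len(STAGES)))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
pred, gray = flag_gray_areas(probs)
for i, (p, g) in enumerate(zip(pred, gray)):
    print(f"epoch {i:02d}: {STAGES[p]:4s}  manual review: {bool(g)}")
```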

Between accurate prediction and poor decision making: the AI/ML gap

  • paper_url: http://arxiv.org/abs/2310.02029
  • repo_url: None
  • paper_authors: Gianluca Bontempi
  • for: 本研究旨在探讨智能代理在决策时,预测准确性与效用评估准确性对最终决策的影响。
  • methods: 本研究讨论了使用AI/ML技术预测候选动作的后果并优化决策策略的做法,并通过理论分析和模拟来评估效用评估误差对决策的影响。
  • results: 结果显示,不准确的效用评估可能与不准确的概率估计同样有害,有时甚至更为严重。作者据此建议AI/ML社区从单纯追求预测精度转向更加重视效用评估的方法论。
    Abstract Intelligent agents rely on AI/ML functionalities to predict the consequence of possible actions and optimise the policy. However, the effort of the research community in addressing prediction accuracy has been so intense (and successful) that it created the illusion that the more accurate the learner prediction (or classification) the better would have been the final decision. Now, such an assumption is valid only if the (human or artificial) decision maker has complete knowledge of the utility of the possible actions. This paper argues that AI/ML community has taken so far a too unbalanced approach by devoting excessive attention to the estimation of the state (or target) probability to the detriment of accurate and reliable estimations of the utility. In particular, few evidence exists about the impact of a wrong utility assessment on the resulting expected utility of the decision strategy. This situation is creating a substantial gap between the expectations and the effective impact of AI solutions, as witnessed by recent criticisms and emphasised by the regulatory legislative efforts. This paper aims to study this gap by quantifying the sensitivity of the expected utility to the utility uncertainty and comparing it to the one due to probability estimation. Theoretical and simulated results show that an inaccurate utility assessment may as (and sometimes) more harmful than a poor probability estimation. The final recommendation to the community is then to undertake a focus shift from a pure accuracy-driven (or obsessed) approach to a more utility-aware methodology.
    摘要 智能代理依靠AI/ML功能来预测候选动作的后果并优化策略。然而,研究界在提升预测精度上的投入如此之大(且成效显著),以至于造成了一种错觉:学习器的预测(或分类)越准确,最终决策就越好。事实上,只有当(人类或人工)决策者完全了解各个候选动作的效用时,这一假设才成立。本文认为,AI/ML社区迄今采取的路线过于失衡,把过多注意力放在状态(或目标)概率的估计上,而忽视了对效用的准确可靠估计;尤其是,关于错误的效用评估会如何影响决策策略的期望效用,现有证据很少。这种状况正在造成AI方案的预期与实际影响之间的巨大落差,近期的批评声音和监管立法努力都印证了这一点。本文旨在通过量化期望效用对效用不确定性的敏感度,并将其与概率估计误差的影响进行比较,来研究这一落差。理论和模拟结果表明,不准确的效用评估可能与糟糕的概率估计同样有害,有时甚至更甚。因此,我们对社区的最终建议是:从单纯以精度为导向(甚至为其痴迷)的方法,转向更加重视效用的研究方法论。
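A small Monte-Carlo sketch of the paper's central question: how much expected utility is lost when the decision maker's utilities are misestimated versus when the state probabilities are. The decision problem, noise model, and noise level below are arbitrary; the simulation only illustrates that utility errors can hurt at least as much as probability errors.

```python
import numpy as np

def expected_utility_gap(p, U, p_hat, U_hat):
    """Utility lost, relative to the best action under the true (p, U), when the
    agent instead picks the action maximising its *estimated* expected utility."""
    true_eu = p @ U                                 # true expected utility of each action
    chosen = int(np.argmax(p_hat @ U_hat))
    return float(true_eu.max() - true_eu[chosen])

rng = np.random.default_rng(0)
n_states, n_actions, sigma, trials = 4, 3, 0.3, 5000
gap_from_utils, gap_from_probs = [], []
for _ in range(trials):
    p = rng.dirichlet(np.ones(n_states))                    # true state probabilities
    U = rng.normal(size=(n_states, n_actions))              # true utilities
    noisy_U = U + sigma * rng.normal(size=U.shape)          # wrong utilities, right probabilities
    noisy_p = np.clip(p + sigma * rng.normal(size=n_states), 1e-6, None)
    noisy_p /= noisy_p.sum()                                # wrong probabilities, right utilities
    gap_from_utils.append(expected_utility_gap(p, U, p, noisy_U))
    gap_from_probs.append(expected_utility_gap(p, U, noisy_p, U))
print("mean loss from mis-estimated utilities    :", round(float(np.mean(gap_from_utils)), 4))
print("mean loss from mis-estimated probabilities:", round(float(np.mean(gap_from_probs)), 4))
```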

DeepHGCN: Toward Deeper Hyperbolic Graph Convolutional Networks

  • paper_url: http://arxiv.org/abs/2310.02027
  • repo_url: None
  • paper_authors: Jiaxu Liu, Xinping Yi, Xiaowei Huang
  • for: 本研究旨在提出一种深度多层的双曲图卷积网络(HGCN),以解决现有HGCN中双曲操作代价高昂、且随深度增加出现过平滑的问题。
  • methods: 本研究提出了两项支撑深度HGCN的关键技术:其一是一种新的双曲特征变换层,可实现快速而准确的线性映射;其二是借助高效的双曲中点方法实现的双曲残差连接以及针对权重和特征的正则化。
  • results: 在链接预测和节点分类任务上,DeepHGCN相比欧氏GCN和浅层双曲GCN变体均取得了显著提升。
    Abstract Hyperbolic graph convolutional networks (HGCN) have demonstrated significant potential in extracting information from hierarchical graphs. However, existing HGCNs are limited to shallow architectures, due to the expensive hyperbolic operations and the over-smoothing issue as depth increases. Although in GCNs, treatments have been applied to alleviate over-smoothing, developing a hyperbolic therapy presents distinct challenges since operations should be carefully designed to fit the hyperbolic nature. Addressing the above challenges, in this work, we propose DeepHGCN, the first deep multi-layer HGCN architecture with dramatically improved computational efficiency and substantially alleviated over-smoothing effect. DeepHGCN presents two key enablers of deep HGCNs: (1) a novel hyperbolic feature transformation layer that enables fast and accurate linear maps; and (2) Techniques such as hyperbolic residual connections and regularization for both weights and features facilitated by an efficient hyperbolic midpoint method. Extensive experiments demonstrate that DeepHGCN obtains significant improvements in link prediction and node classification tasks compared to both Euclidean and shallow hyperbolic GCN variants.
    摘要 双曲图卷积网络(HGCN)在从层次化图中提取信息方面已展现出巨大潜力。然而,由于双曲操作代价高昂,且随着深度增加会出现过平滑问题,现有的HGCN大多局限于浅层架构。虽然在欧氏GCN中已有缓解过平滑的做法,但为双曲网络设计相应的方案面临独特挑战,因为各种操作必须精心设计以契合双曲几何的特性。为应对上述挑战,本文提出了DeepHGCN,这是首个深度多层HGCN架构,其计算效率大幅提升,过平滑问题也得到显著缓解。DeepHGCN包含两个使深度HGCN成为可能的关键设计:(1)一种新的双曲特征变换层,能够实现快速而准确的线性映射;(2)借助高效的双曲中点方法实现的双曲残差连接以及针对权重和特征的正则化技术。大量实验表明,DeepHGCN在链接预测和节点分类任务上相比欧氏GCN和浅层双曲GCN变体均取得了显著提升。

DeepZero: Scaling up Zeroth-Order Optimization for Deep Model Training

  • paper_url: http://arxiv.org/abs/2310.02025
  • repo_url: https://github.com/Phoveran/DeepZero
  • paper_authors: Aochuan Chen, Yimeng Zhang, Jinghan Jia, James Diffenderfer, Jiancheng Liu, Konstantinos Parasyris, Yihua Zhang, Zheng Zhang, Bhavya Kailkhura, Sijia Liu
  • for: This paper aims to develop a principled zero-order (ZO) deep learning framework for training deep neural networks (DNNs) without a significant decrease in performance.
  • methods: The proposed framework, called DeepZero, uses three primary innovations to scale ZO optimization to DNN training: coordinate-wise gradient estimation (CGE), a sparsity-induced ZO training protocol, and feature reuse with forward parallelization.
  • results: The proposed DeepZero framework achieves state-of-the-art (SOTA) accuracy on ResNet-20 trained on CIFAR-10, approaching first-order (FO) training performance for the first time. Additionally, the framework demonstrates practical utility in applications such as certified adversarial defense and DL-based partial differential equation error correction, achieving 10-20% improvement over SOTA.
    Abstract Zeroth-order (ZO) optimization has become a popular technique for solving machine learning (ML) problems when first-order (FO) information is difficult or impossible to obtain. However, the scalability of ZO optimization remains an open problem: Its use has primarily been limited to relatively small-scale ML problems, such as sample-wise adversarial attack generation. To our best knowledge, no prior work has demonstrated the effectiveness of ZO optimization in training deep neural networks (DNNs) without a significant decrease in performance. To overcome this roadblock, we develop DeepZero, a principled ZO deep learning (DL) framework that can scale ZO optimization to DNN training from scratch through three primary innovations. First, we demonstrate the advantages of coordinate-wise gradient estimation (CGE) over randomized vector-wise gradient estimation in training accuracy and computational efficiency. Second, we propose a sparsity-induced ZO training protocol that extends the model pruning methodology using only finite differences to explore and exploit the sparse DL prior in CGE. Third, we develop the methods of feature reuse and forward parallelization to advance the practical implementations of ZO training. Our extensive experiments show that DeepZero achieves state-of-the-art (SOTA) accuracy on ResNet-20 trained on CIFAR-10, approaching FO training performance for the first time. Furthermore, we show the practical utility of DeepZero in applications of certified adversarial defense and DL-based partial differential equation error correction, achieving 10-20% improvement over SOTA. We believe our results will inspire future research on scalable ZO optimization and contribute to advancing DL with black box.
    摘要 零阶(ZO)优化已成为在一阶(FO)信息难以或无法获得时求解机器学习问题的流行技术。然而,ZO优化的可扩展性仍是一个悬而未决的问题:它的应用此前主要局限于规模较小的机器学习问题,例如逐样本的对抗攻击生成。据我们所知,此前还没有工作能证明ZO优化可以在不显著损失性能的前提下从头训练深度神经网络(DNN)。为突破这一障碍,我们开发了DeepZero,一个有原则的ZO深度学习框架,通过三项主要创新将ZO优化扩展到从零开始的DNN训练:第一,我们展示了坐标式梯度估计(CGE)在训练精度和计算效率上优于随机向量式梯度估计;第二,我们提出了一种稀疏诱导的ZO训练协议,仅用有限差分便扩展了模型剪枝方法,以探索并利用CGE中的稀疏深度学习先验;第三,我们开发了特征复用和前向并行化方法,以推进ZO训练的实际实现。大量实验表明,DeepZero在CIFAR-10上训练的ResNet-20达到了最先进(SOTA)精度,并首次逼近一阶训练的性能。此外,我们还展示了DeepZero在可认证对抗防御和基于深度学习的偏微分方程误差校正等应用中的实用价值,相比SOTA取得了10-20%的提升。我们相信这些结果将激励未来关于可扩展ZO优化的研究,并推动黑盒场景下的深度学习发展。
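Coordinate-wise gradient estimation is simple to write down: one forward-difference query per coordinate, optionally restricted to a sparsity mask so that pruned coordinates are never queried (the sparsity-induced protocol in spirit). The sketch below checks the estimator on a quadratic; the smoothing parameter `mu` and the mask are illustrative choices.

```python
import numpy as np

def cge_gradient(loss_fn, x, mu=1e-3, mask=None):
    """Coordinate-wise gradient estimation: one forward-difference loss query per
    (unmasked) coordinate, no backpropagation required.

    mask: optional boolean vector; coordinates outside the mask are treated as
    pruned and their gradient entries stay zero."""
    g = np.zeros_like(x)
    f0 = loss_fn(x)
    coords = np.flatnonzero(mask) if mask is not None else range(len(x))
    for i in coords:
        e = np.zeros_like(x)
        e[i] = mu
        g[i] = (loss_fn(x + e) - f0) / mu
    return g

# toy check against the analytic gradient of f(x) = 0.5 * ||x||^2
x = np.random.default_rng(0).normal(size=20)
mask = np.zeros(20, dtype=bool)
mask[:10] = True                                   # "keep" only the first 10 weights
g_hat = cge_gradient(lambda v: 0.5 * v @ v, x, mask=mask)
print("max error on unmasked coords:", np.abs(g_hat[:10] - x[:10]).max())   # ~ mu / 2
print("masked coords stay zero     :", np.allclose(g_hat[10:], 0.0))
```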

Nash Regret Guarantees for Linear Bandits

  • paper_url: http://arxiv.org/abs/2310.02023
  • repo_url: None
  • paper_authors: Ayush Sawarni, Soumybrata Pal, Siddharth Barman
  • for: 本文的主要目标是在随机线性赌博机(Stochastic Linear Bandits)框架下给出一种带有公平性保证的算法,并提供严格的遗憾(regret)上界。
  • methods: 本文基于逐次淘汰(Successive Elimination)方法,并引入了若干新的技术手段,包括定制的集中不等式,以及结合Kiefer-Wolfowitz最优设计、利用John椭球进行采样。
  • results: 本文得到了 $O\left( \sqrt{\frac{d\nu}{T} \log( T |X|)}\right)$ 的纳什遗憾上界;对于有界的正奖励情形,在臂集不一定有限时还得到了 $O\left( \frac{d^{\frac{5}{4}}\nu^{\frac{1}{2}}}{\sqrt{T}} \log(T)\right)$ 的纳什遗憾上界。
    Abstract We obtain essentially tight upper bounds for a strengthened notion of regret in the stochastic linear bandits framework. The strengthening -- referred to as Nash regret -- is defined as the difference between the (a priori unknown) optimum and the geometric mean of expected rewards accumulated by the linear bandit algorithm. Since the geometric mean corresponds to the well-studied Nash social welfare (NSW) function, this formulation quantifies the performance of a bandit algorithm as the collective welfare it generates across rounds. NSW is known to satisfy fairness axioms and, hence, an upper bound on Nash regret provides a principled fairness guarantee. We consider the stochastic linear bandits problem over a horizon of $T$ rounds and with set of arms ${X}$ in ambient dimension $d$. Furthermore, we focus on settings in which the stochastic reward -- associated with each arm in ${X}$ -- is a non-negative, $\nu$-sub-Poisson random variable. For this setting, we develop an algorithm that achieves a Nash regret of $O\left( \sqrt{\frac{d\nu}{T} \log( T |X|)}\right)$. In addition, addressing linear bandit instances in which the set of arms ${X}$ is not necessarily finite, we obtain a Nash regret upper bound of $O\left( \frac{d^{\frac{5}{4}}\nu^{\frac{1}{2}}}{\sqrt{T}} \log(T)\right)$. Since bounded random variables are sub-Poisson, these results hold for bounded, positive rewards. Our linear bandit algorithm is built upon the successive elimination method with novel technical insights, including tailored concentration bounds and the use of sampling via John ellipsoid in conjunction with the Kiefer-Wolfowitz optimal design.
    摘要 我们在随机线性赌博机框架下,为一种强化的遗憾概念给出了基本上紧的上界。这一强化概念称为纳什遗憾(Nash regret),定义为(事先未知的)最优值与线性赌博机算法累积的期望奖励的几何平均之差。由于几何平均对应于被广泛研究的纳什社会福利(NSW)函数,这一表述将赌博机算法的性能刻画为其在各轮中产生的集体福利。NSW满足公平性公理,因此纳什遗憾的上界提供了一种有原则的公平性保证。我们考虑时间跨度为 $T$ 轮、臂集为 ${X}$、环境维度为 $d$ 的随机线性赌博机问题,并关注每个臂的随机奖励为非负的 $\nu$-次泊松($\nu$-sub-Poisson)随机变量的设定。针对该设定,我们设计的算法可达到 $O\left( \sqrt{\frac{d\nu}{T} \log( T |X|)}\right)$ 的纳什遗憾。此外,针对臂集 ${X}$ 不一定有限的线性赌博机实例,我们得到了 $O\left( \frac{d^{\frac{5}{4}}\nu^{\frac{1}{2}}}{\sqrt{T}} \log(T)\right)$ 的纳什遗憾上界。由于有界随机变量都是次泊松的,这些结果同样适用于有界的正奖励。我们的线性赌博机算法建立在逐次淘汰方法之上,并引入了新的技术见解,包括定制的集中不等式,以及结合Kiefer-Wolfowitz最优设计、利用John椭球进行采样。
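The Nash regret itself is easy to compute from per-round expected rewards: it replaces the arithmetic mean used by standard regret with a geometric mean, so rounds with near-zero reward are penalised much more heavily. A small numeric illustration (all values invented):

```python
import numpy as np

def nash_regret(expected_rewards, optimal_value):
    """Nash regret = v* - geometric mean of per-round expected rewards
    (rewards must be positive for the geometric mean to make sense)."""
    r = np.asarray(expected_rewards, dtype=float)
    geo_mean = np.exp(np.mean(np.log(r)))
    return optimal_value - geo_mean

def average_regret(expected_rewards, optimal_value):
    """Standard (average) regret uses the arithmetic mean instead."""
    return optimal_value - float(np.mean(expected_rewards))

# an algorithm that mostly plays well but "starves" a few rounds
rewards = np.array([0.9] * 95 + [0.05] * 5)
print("average regret:", round(average_regret(rewards, 1.0), 4))   # 0.1425
print("Nash regret   :", round(nash_regret(rewards, 1.0), 4))      # ~0.22, noticeably larger
```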

Ranking a Set of Objects using Heterogeneous Workers: QUITE an Easy Problem

  • paper_url: http://arxiv.org/abs/2310.02016
  • repo_url: None
  • paper_authors: Alessandro Nordio, Alberto tarable, Emilio Leonardi
  • for: 本研究旨在解决对 $N$ 个对象进行排名的问题,从一群不均衡的工作者提供的不准确对比数据开始。
  • methods: 本研究提出了一种非适应式排名算法 QUITE,该算法同时估计工作者的可靠性和对象的质量。
  • results: 对于不同场景,QUITE 的表现与之前提出的算法进行比较,并且可以自然地做出适应式改进。
    Abstract We focus on the problem of ranking $N$ objects starting from a set of noisy pairwise comparisons provided by a crowd of unequal workers, each worker being characterized by a specific degree of reliability, which reflects her ability to rank pairs of objects. More specifically, we assume that objects are endowed with intrinsic qualities and that the probability with which an object is preferred to another depends both on the difference between the qualities of the two competitors and on the reliability of the worker. We propose QUITE, a non-adaptive ranking algorithm that jointly estimates workers' reliabilities and qualities of objects. Performance of QUITE is compared in different scenarios against previously proposed algorithms. Finally, we show how QUITE can be naturally made adaptive.
    摘要 我们研究如下问题:从一群能力各异的工人所提供的带噪声成对比较出发,对 $N$ 个对象进行排序;每个工人具有特定的可靠度,反映其对成对对象进行比较的能力。更具体地,我们假设对象具有内在质量,而一个对象被偏好于另一个对象的概率,既取决于两者质量之差,也取决于工人的可靠度。我们提出了QUITE,一种非自适应的排序算法,可同时估计工人的可靠度和对象的质量。我们在不同场景下将QUITE的性能与此前提出的算法进行了比较,最后说明了如何自然地将QUITE改造成自适应算法。

Spectral operator learning for parametric PDEs without data reliance

  • paper_url: http://arxiv.org/abs/2310.02013
  • repo_url: None
  • paper_authors: Junho Choi, Taehyun Yun, Namjung Kim, Youngjoon Hong
  • for: solves parametric partial differential equations (PDEs) without the need for data harnessing.
  • methods: employs expansions using orthogonal functions, such as Fourier series and Legendre polynomials, and merges the merits of spectral methods with the prowess of deep neural networks.
  • results: accurately predicts solutions of complex parametric PDEs, including singularly perturbed convection-diffusion equations and the Navier-Stokes equations, without the need for paired input-output training data.
    Abstract In this paper, we introduce the Spectral Coefficient Learning via Operator Network (SCLON), a novel operator learning-based approach for solving parametric partial differential equations (PDEs) without the need for data harnessing. The cornerstone of our method is the spectral methodology that employs expansions using orthogonal functions, such as Fourier series and Legendre polynomials, enabling accurate PDE solutions with fewer grid points. By merging the merits of spectral methods - encompassing high accuracy, efficiency, generalization, and the exact fulfillment of boundary conditions - with the prowess of deep neural networks, SCLON offers a transformative strategy. Our approach not only eliminates the need for paired input-output training data, which typically requires extensive numerical computations, but also effectively learns and predicts solutions of complex parametric PDEs, ranging from singularly perturbed convection-diffusion equations to the Navier-Stokes equations. The proposed framework demonstrates superior performance compared to existing scientific machine learning techniques, offering solutions for multiple instances of parametric PDEs without harnessing data. The mathematical framework is robust and reliable, with a well-developed loss function derived from the weak formulation, ensuring accurate approximation of solutions while exactly satisfying boundary conditions. The method's efficacy is further illustrated through its ability to accurately predict intricate natural behaviors like the Kolmogorov flow and boundary layers. In essence, our work pioneers a compelling avenue for parametric PDE solutions, serving as a bridge between traditional numerical methodologies and cutting-edge machine learning techniques in the realm of scientific computation.
    摘要 本文提出了一种基于算子学习的新方法——谱系数算子网络学习(SCLON),用于在无需数据的情况下求解参数化偏微分方程(PDE)。该方法的核心是谱方法:利用Fourier级数和Legendre多项式等正交函数展开,使得用更少的网格点即可获得精确的PDE解。SCLON将谱方法的优点(高精度、高效率、良好的泛化能力以及对边界条件的精确满足)与深度神经网络的强大能力相结合,提供了一种变革性的策略。我们的方法不仅免除了通常需要大量数值计算才能获得的成对输入-输出训练数据,还能有效地学习并预测复杂参数化PDE的解,涵盖从奇异摄动对流-扩散方程到Navier-Stokes方程等问题。所提框架的性能优于现有的科学机器学习技术,能够在不依赖数据的情况下求解参数化PDE的多个实例。其数学框架稳健可靠,基于弱形式导出的损失函数保证了解的精确逼近,并严格满足边界条件。该方法还能准确预测Kolmogorov流和边界层等复杂的自然行为。总之,本工作为参数化PDE求解开辟了一条令人信服的新途径,在科学计算领域架起了传统数值方法与前沿机器学习技术之间的桥梁。

fmeffects: An R Package for Forward Marginal Effects

  • paper_url: http://arxiv.org/abs/2310.02008
  • repo_url: None
  • paper_authors: Holger Löwe, Christian A. Scholbeck, Christian Heumann, Bernd Bischl, Giuseppe Casalicchio
  • for: 提供一种易于理解且可付诸行动的模型解释方法,回答这样的问题:如果我们将变量 $x$ 改变一个量 $h$,预测结果 $\widehat{y}$ 会改变多少?
  • methods: 基于Forward Marginal Effects(FMEs)的模型解释方法,提供了一个可用的R包实现。
  • results: 提供了一种可用的R包,可以帮助用户快速地计算和分析模型的解释结果。
    Abstract Forward marginal effects (FMEs) have recently been introduced as a versatile and effective model-agnostic interpretation method. They provide comprehensible and actionable model explanations in the form of: If we change $x$ by an amount $h$, what is the change in predicted outcome $\widehat{y}$? We present the R package fmeffects, the first software implementation of FMEs. The relevant theoretical background, package functionality and handling, as well as the software design and options for future extensions are discussed in this paper.
    摘要 前向边际效应(Forward Marginal Effects,FMEs)最近被提出,是一种通用且有效的模型无关解释方法。它以易于理解、可付诸行动的形式给出模型解释:如果我们将变量 $x$ 改变一个量 $h$,预测结果 $\widehat{y}$ 会改变多少?我们介绍了R包fmeffects,这是FMEs的首个软件实现。本文讨论了相关的理论背景、包的功能与使用方法,以及软件设计和未来扩展的方向。
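fmeffects itself is an R package; the snippet below only illustrates the FME concept in Python — the per-observation change in prediction when one feature is moved by h — using a random forest on synthetic data. It is not the fmeffects API.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def forward_marginal_effects(model, X, feature, h):
    """FME per observation: prediction after moving `feature` by +h minus the
    prediction at the original point, i.e. yhat(x + h*e_j) - yhat(x)."""
    X_shifted = X.copy()
    X_shifted[:, feature] += h
    return model.predict(X_shifted) - model.predict(X)

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(500, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.1, size=500)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

fme_x1 = forward_marginal_effects(model, X, feature=1, h=0.5)
# roughly 0.5 * h for the linear term, up to boundary and model-fit effects
print("mean FME of x1 for a step of h=0.5:", round(float(fme_x1.mean()), 3))
```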

Benign Overfitting in Two-Layer ReLU Convolutional Neural Networks for XOR Data

  • paper_url: http://arxiv.org/abs/2310.01975
  • repo_url: None
  • paper_authors: Xuran Meng, Difan Zou, Yuan Cao
  • for: This paper aims to study the “benign overfitting” phenomenon in deep learning models, specifically in the context of XOR-type classification tasks with label-flipping noises.
  • methods: The paper uses an over-parameterized ReLU CNN trained by gradient descent to achieve near Bayes-optimal accuracy in the XOR problems, and establishes a matching lower bound result to demonstrate the efficiency of the CNN in learning the tasks.
  • results: The paper shows that, under certain conditions on the sample complexity and signal-to-noise ratio, the over-parameterized CNN can achieve near Bayes-optimal accuracy in the XOR problems, and establishes a lower bound result to demonstrate the absolute constant gap between the CNN’s accuracy and the Bayes-optimal rate.
    Abstract Modern deep learning models are usually highly over-parameterized so that they can overfit the training data. Surprisingly, such overfitting neural networks can usually still achieve high prediction accuracy. To study this "benign overfitting" phenomenon, a line of recent works has theoretically studied the learning of linear models and two-layer neural networks. However, most of these analyses are still limited to the very simple learning problems where the Bayes-optimal classifier is linear. In this work, we investigate a class of XOR-type classification tasks with label-flipping noises. We show that, under a certain condition on the sample complexity and signal-to-noise ratio, an over-parameterized ReLU CNN trained by gradient descent can achieve near Bayes-optimal accuracy. Moreover, we also establish a matching lower bound result showing that when the previous condition is not satisfied, the prediction accuracy of the obtained CNN is an absolute constant away from the Bayes-optimal rate. Our result demonstrates that CNNs have a remarkable capacity to efficiently learn XOR problems, even in the presence of highly correlated features.
    摘要 现代深度学习模型通常是高度过参数化的,因而能够过拟合训练数据;令人惊讶的是,这类过拟合的神经网络往往仍能取得很高的预测精度。为研究这种"良性过拟合"现象,近期一系列工作从理论上研究了线性模型和两层神经网络的学习,但这些分析大多仍局限于贝叶斯最优分类器为线性的非常简单的学习问题。在本工作中,我们研究了一类带有标签翻转噪声的XOR型分类任务。我们证明,在关于样本复杂度和信噪比的特定条件下,用梯度下降训练的过参数化ReLU CNN可以达到接近贝叶斯最优的精度;我们还建立了相匹配的下界,表明当上述条件不满足时,所得CNN的预测精度与贝叶斯最优率之间存在一个绝对常数的差距。我们的结果表明,即使在特征高度相关的情况下,CNN也具有高效学习XOR问题的出色能力。

Federated Wasserstein Distance

  • paper_url: http://arxiv.org/abs/2310.01973
  • repo_url: None
  • paper_authors: Alain Rakotomamonjy, Kimia Nadjahi, Liva Ralaivola
  • for: 这个论文是为了计算在分布式环境中的沃氏距离而设计的。
  • methods: 这个论文使用的方法是在中央服务器的协调下,通过利用沃氏距离的几何性质和相应的测地线(geodesics)来估算沃氏距离。
  • results: 这个论文的结果表明,这种方法可以有效地计算沃氏距离,并且可以用来提高 Federated Learning 算法的性能。
    Abstract We introduce a principled way of computing the Wasserstein distance between two distributions in a federated manner. Namely, we show how to estimate the Wasserstein distance between two samples stored and kept on different devices/clients whilst a central entity/server orchestrates the computations (again, without having access to the samples). To achieve this feat, we take advantage of the geometric properties of the Wasserstein distance -- in particular, the triangle inequality -- and that of the associated {\em geodesics}: our algorithm, FedWad (for Federated Wasserstein Distance), iteratively approximates the Wasserstein distance by manipulating and exchanging distributions from the space of geodesics in lieu of the input samples. In addition to establishing the convergence properties of FedWad, we provide empirical results on federated coresets and federate optimal transport dataset distance, that we respectively exploit for building a novel federated model and for boosting performance of popular federated learning algorithms.
    摘要 我们介绍了一种有原则的联邦方式来计算两个分布之间的沃氏距离。具体来说,我们展示了如何在样本分别存放于不同设备/客户端的情况下估计两者之间的沃氏距离,由中央实体/服务器协调计算且无需访问这些样本。为实现这一目标,我们利用了沃氏距离的几何性质(特别是三角不等式)以及相应测地线的性质:我们的算法FedWad(联邦沃氏距离)通过操作并交换测地线空间中的分布来迭代逼近沃氏距离,而无需交换输入样本。除了建立FedWad的收敛性之外,我们还给出了在联邦核心集(coresets)和联邦最优传输数据集距离上的实验结果,并分别将其用于构建一种新的联邦模型以及提升常见联邦学习算法的性能。

Epidemic Learning: Boosting Decentralized Learning with Randomized Communication

  • paper_url: http://arxiv.org/abs/2310.01972
  • repo_url: https://github.com/sacs-epfl/decentralizepy
  • paper_authors: Martijn de Vos, Sadegh Farhadkhani, Rachid Guerraoui, Anne-Marie Kermarrec, Rafael Pires, Rishi Sharma
  • for: 这篇论文提出了一种新的去中心化学习(Decentralized Learning,DL)算法,称为流行病学习(Epidemic Learning,EL)。该算法利用随轮次变化的随机通信拓扑来加速模型收敛。
  • methods: 在EL的每一轮中,每个节点将自己的模型更新发送给随机选择的 $s$ 个节点(系统共有 $n$ 个节点),由此形成的随机变化拓扑带来了更快的收敛。
  • results: 我们对EL进行了广泛的理论分析,证明其收敛性质优于现有的静态和动态拓扑。对于平滑非凸损失函数,EL达到渐近线性加速所需的过渡轮数为 $O(n^3/s^2)$,比已知最优界 $O(n^3)$ 改进了 $s^2$ 倍,表明随机化通信对去中心化学习有益。在包含96个节点的网络实验中,EL的收敛速度比基线DL算法快至1.7倍,且在相同通信量下精度高出2.2%。
    Abstract We present Epidemic Learning (EL), a simple yet powerful decentralized learning (DL) algorithm that leverages changing communication topologies to achieve faster model convergence compared to conventional DL approaches. At each round of EL, each node sends its model updates to a random sample of $s$ other nodes (in a system of $n$ nodes). We provide an extensive theoretical analysis of EL, demonstrating that its changing topology culminates in superior convergence properties compared to the state-of-the-art (static and dynamic) topologies. Considering smooth non-convex loss functions, the number of transient iterations for EL, i.e., the rounds required to achieve asymptotic linear speedup, is in $O(n^3/s^2)$ which outperforms the best-known bound $O(n^3)$ by a factor of $s^2$, indicating the benefit of randomized communication for DL. We empirically evaluate EL in a 96-node network and compare its performance with state-of-the-art DL approaches. Our results illustrate that EL converges up to $ 1.7\times$ quicker than baseline DL algorithms and attains $2.2 $\% higher accuracy for the same communication volume.
    摘要 我们提出了流行病学习(Epidemic Learning,EL),一种简单而强大的去中心化学习(DL)算法,它利用不断变化的通信拓扑,使模型收敛速度快于传统的DL方法。在EL的每一轮中,每个节点将其模型更新发送给随机抽取的 $s$ 个节点(系统共有 $n$ 个节点)。我们对EL进行了深入的理论分析,证明这种变化的拓扑带来了优于现有(静态和动态)拓扑的收敛性质。对于平滑非凸损失函数,EL达到渐近线性加速所需的过渡轮数为 $O(n^3/s^2)$,比已知最优界 $O(n^3)$ 改进了 $s^2$ 倍,体现了随机化通信对DL的益处。我们在一个96节点的网络中对EL进行了实证评估,并与最先进的DL方法进行比较。结果显示,EL的收敛速度最高可比基线DL算法快1.7倍,且在相同通信量下取得高出2.2%的精度。
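A schematic round of the push-based communication described above: every node takes a local step, pushes its model to s random peers, and averages what it received. This toy version uses quadratic objectives and ignores many details (step-size schedules, the exact averaging rule, etc.) that matter in the actual EL analysis.

```python
import numpy as np

def epidemic_round(models, grads, s, lr=0.1, rng=None):
    """One EL-style round: each node takes a local step, then pushes its model
    to `s` uniformly random peers; each node averages its own model with
    everything it received this round."""
    rng = rng or np.random.default_rng()
    n = len(models)
    locals_ = [m - lr * g for m, g in zip(models, grads)]
    inbox = [[m] for m in locals_]                     # every node keeps its own copy
    for i in range(n):
        peers = rng.choice([j for j in range(n) if j != i], size=s, replace=False)
        for j in peers:
            inbox[j].append(locals_[i])
    return [np.mean(msgs, axis=0) for msgs in inbox]

# toy run: 8 nodes, each pulling toward a different local target
rng = np.random.default_rng(0)
n, s, d = 8, 3, 4
targets = rng.normal(size=(n, d))
models = [np.zeros(d) for _ in range(n)]
for _ in range(50):
    grads = [m - t for m, t in zip(models, targets)]   # gradient of 0.5 * ||m - target||^2
    models = epidemic_round(models, grads, s, rng=rng)
spread = np.std(np.stack(models), axis=0).max()
print("disagreement across nodes after 50 rounds :", round(float(spread), 4))
print("node 0 distance to the average target     :",
      round(float(np.linalg.norm(models[0] - targets.mean(0))), 4))
```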

Beyond Labeling Oracles: What does it mean to steal ML models?

  • paper_url: http://arxiv.org/abs/2310.01959
  • repo_url: None
  • paper_authors: Avital Shafran, Ilia Shumailov, Murat A. Erdogdu, Nicolas Papernot
  • for: 这篇论文关注模型提取攻击,即仅凭对ML模型的API查询访问来窃取其中已训练好的模型。
  • methods: 论文系统评估了影响模型提取成功的各种因素,指出现有攻击隐含地依赖于攻击者能够从受害模型的数据分布中采样,并提出了一个用于直接评估攻击策略的基准。
  • results: 论文发现,现有的模型提取攻击通常无法带来预期的成本节省,因为攻击者仍需获得充足的分布内数据,实际上只省下了标注成本;此外,攻击者的先验知识(即能否获得分布内数据)比其查询策略对攻击成功的影响更大。
    Abstract Model extraction attacks are designed to steal trained models with only query access, as is often provided through APIs that ML-as-a-Service providers offer. ML models are expensive to train, in part because data is hard to obtain, and a primary incentive for model extraction is to acquire a model while incurring less cost than training from scratch. Literature on model extraction commonly claims or presumes that the attacker is able to save on both data acquisition and labeling costs. We show that the attacker often does not. This is because current attacks implicitly rely on the adversary being able to sample from the victim model's data distribution. We thoroughly evaluate factors influencing the success of model extraction. We discover that prior knowledge of the attacker, i.e. access to in-distribution data, dominates other factors like the attack policy the adversary follows to choose which queries to make to the victim model API. Thus, an adversary looking to develop an equally capable model with a fixed budget has little practical incentive to perform model extraction, since for the attack to work they need to collect in-distribution data, saving only on the cost of labeling. With low labeling costs in the current market, the usefulness of such attacks is questionable. Ultimately, we demonstrate that the effect of prior knowledge needs to be explicitly decoupled from the attack policy. To this end, we propose a benchmark to evaluate attack policy directly.
    摘要 模型提取攻击旨在仅凭查询访问(如ML即服务提供商开放的API)来窃取已训练好的模型。机器学习模型的训练代价高昂,部分原因在于数据难以获得,而模型提取的主要动机正是以低于从头训练的成本获得模型。关于模型提取的文献通常声称或默认攻击者可以同时节省数据获取和标注成本。我们的研究表明,攻击者往往做不到这一点,因为现有攻击隐含地依赖于攻击者能够从受害模型的数据分布中采样。我们全面评估了影响模型提取成功的因素,发现攻击者的先验知识(即能否获得分布内数据)的影响超过了其他因素,例如攻击者选择向受害模型API发起何种查询的攻击策略。因此,对于希望在固定预算下得到同等能力模型的攻击者而言,进行模型提取几乎没有实际动机:要使攻击奏效,他们仍需收集分布内数据,节省的只有标注成本;而在当前市场标注成本很低的情况下,这类攻击的实用性值得怀疑。最后,我们说明需要把先验知识的影响与攻击策略明确解耦,并为此提出了一个直接评估攻击策略的基准。

Causal Inference with Conditional Front-Door Adjustment and Identifiable Variational Autoencoder

  • paper_url: http://arxiv.org/abs/2310.01937
  • repo_url: None
  • paper_authors: Ziqi Xu, Debo Cheng, Jiuyong Li, Jixue Liu, Lin Liu, Kui Yu
  • for: 这篇论文旨在解决因果推断中的因果效应估计问题,特别是存在未观测混杂变量的情形。
  • methods: 论文提出了条件前门(conditional front-door,CFD)调整的概念,证明了该调整能够保证因果效应的可识别性,并提出了CFDiVAE模型,利用可识别变分自编码器直接从数据中学习CFD调整变量的表示,同时从理论上证明了模型的可识别性。
  • results: 实验结果显示,CFDiVAE优于现有方法,且其性能对未观测混杂变量的因果强度不敏感;论文还将CFDiVAE应用于一个真实数据集,以展示其潜在的应用价值。
    Abstract An essential and challenging problem in causal inference is causal effect estimation from observational data. The problem becomes more difficult with the presence of unobserved confounding variables. The front-door adjustment is a practical approach for dealing with unobserved confounding variables. However, the restriction for the standard front-door adjustment is difficult to satisfy in practice. In this paper, we relax some of the restrictions by proposing the concept of conditional front-door (CFD) adjustment and develop the theorem that guarantees the causal effect identifiability of CFD adjustment. Furthermore, as it is often impossible for a CFD variable to be given in practice, it is desirable to learn it from data. By leveraging the ability of deep generative models, we propose CFDiVAE to learn the representation of the CFD adjustment variable directly from data with the identifiable Variational AutoEncoder and formally prove the model identifiability. Extensive experiments on synthetic datasets validate the effectiveness of CFDiVAE and its superiority over existing methods. The experiments also show that the performance of CFDiVAE is less sensitive to the causal strength of unobserved confounding variables. We further apply CFDiVAE to a real-world dataset to demonstrate its potential application.
    摘要 因果推断中一个重要而具有挑战性的问题,是从观测数据中估计因果效应;当存在未观测的混杂变量时,该问题变得更加困难。前门调整是处理未观测混杂变量的一种实用方法,但标准前门调整的限制条件在实践中往往难以满足。本文通过提出条件前门(CFD)调整的概念放宽了部分限制,并给出了保证CFD调整下因果效应可识别性的定理。此外,由于实践中往往无法直接给定CFD变量,我们希望能够从数据中学习它。借助深度生成模型的能力,我们提出了CFDiVAE,利用可识别变分自编码器直接从数据中学习CFD调整变量的表示,并从理论上证明了模型的可识别性。在合成数据集上的大量实验验证了CFDiVAE的有效性及其相对现有方法的优势;实验还显示,CFDiVAE的性能对未观测混杂变量的因果强度不敏感。我们进一步将CFDiVAE应用于一个真实数据集,以展示其潜在的应用价值。

Unsupervised Complex Semi-Binary Matrix Factorization for Activation Sequence Recovery of Quasi-Stationary Sources

  • paper_url: http://arxiv.org/abs/2310.02295
  • repo_url: None
  • paper_authors: Romain Delabeye, Martin Ghienne, Olivia Penas, Jean-Luc Dion
  • for: 本研究旨在提高工业过程和生产系统的理解,以便提高能源可持续性和可靠性。
  • methods: 本研究使用稀疏回归技术从序列数据中提取主导动态,并借助字典学习算法从混合信号中恢复各执行器的激活序列。
  • results: 研究表明,稀疏回归技术可以有效地从混合传感器信号中恢复各个执行器的二值激活序列。
    Abstract Advocating for a sustainable, resilient and human-centric industry, the three pillars of Industry 5.0 call for an increased understanding of industrial processes and manufacturing systems, as well as their energy sustainability. One of the most fundamental elements of comprehension is knowing when the systems are operated, as this is key to locating energy intensive subsystems and operations. Such knowledge is often lacking in practice. Activation statuses can be recovered from sensor data though. Some non-intrusive sensors (accelerometers, current sensors, etc.) acquire mixed signals containing information about multiple actuators at once. Despite their low cost as regards the fleet of systems they monitor, additional signal processing is required to extract the individual activation sequences. To that end, sparse regression techniques can extract leading dynamics in sequential data. Notorious dictionary learning algorithms have proven effective in this regard. This paper considers different industrial settings in which the identification of binary subsystem activation sequences is sought. In this context, it is assumed that each sensor measures an extensive physical property, source signals are periodic, quasi-stationary and independent, albeit these signals may be correlated and their noise distribution is arbitrary. Existing methods either restrict these assumptions, e.g., by imposing orthogonality or noise characteristics, or lift them using additional assumptions, typically using nonlinear transforms.
    摘要 工业5.0倡导可持续、有韧性且以人为本的工业,其三大支柱要求人们更深入地理解工业过程与制造系统及其能源可持续性。最基本的理解要素之一是知道系统何时在运行,这是定位高能耗子系统和工序的关键,而这类知识在实践中往往缺失。不过,激活状态可以从传感器数据中恢复:一些非侵入式传感器(加速度计、电流传感器等)采集的是混合信号,其中同时包含多个执行器的信息。尽管相对于所监测的设备群而言这类传感器成本很低,但仍需要额外的信号处理来提取各个执行器的激活序列。为此,稀疏回归技术可以从序列数据中提取主导动态,经典的字典学习算法在这方面已被证明行之有效。本文考虑了多种需要识别二值子系统激活序列的工业场景。在此背景下,假设每个传感器测量的是广延物理量,源信号是周期的、准平稳且相互独立的,但这些信号之间可能存在相关性,且其噪声分布任意。现有方法要么收紧这些假设(例如施加正交性或特定噪声特性),要么借助额外假设(通常是非线性变换)来放宽它们。

Synthetic CT Generation via Variant Invertible Network for All-digital Brain PET Attenuation Correction

  • paper_url: http://arxiv.org/abs/2310.01885
  • repo_url: None
  • paper_authors: Yu Guan, Bohui Shen, Xinchong Shi, Xiangsong Zhang, Bingxuan Li, Qiegen Liu
  • for: 这篇论文旨在解决正电子发射断层成像(PET)中的衰减校正(AC)问题,以获得更高质量的PET图像。
  • methods: 论文提出了一种无需解剖影像即可实现PET衰减校正的深度学习方法:使用一种可逆网络并结合变量增强策略,从未经衰减校正的PET图像生成连续取值的CT图像。
  • results: 在来自37名临床患者、共1440组数据上的对比研究表明,所提出的用于PET衰减校正的可逆网络优于其他AC模型,显示了该方法的潜力以及在无CT条件下实现脑PET衰减校正的可行性。
    Abstract Attenuation correction (AC) is essential for the generation of artifact-free and quantitatively accurate positron emission tomography (PET) images. However, AC of PET faces challenges including inter-scan motion and erroneous transformation of structural voxel-intensities to PET attenuation-correction factors. Nowadays, the problem of AC for quantitative PET have been solved to a large extent after the commercial availability of devices combining PET with computed tomography (CT). Meanwhile, considering the feasibility of a deep learning approach for PET AC without anatomical imaging, this paper develops a PET AC method, which uses deep learning to generate continuously valued CT images from non-attenuation corrected PET images for AC on brain PET imaging. Specifically, an invertible network combined with the variable augmentation strategy that can achieve the bidirectional inference processes is proposed for synthetic CT generation (IVNAC). To evaluate the performance of the proposed algorithm, we conducted a comprehensive study on a total of 1440 data from 37 clinical patients using comparative algorithms (such as Cycle-GAN and Pix2pix). Perceptual analysis and quantitative evaluations illustrate that the invertible network for PET AC outperforms other existing AC models, which demonstrates the potential of the proposed method and the feasibility of achieving brain PET AC without CT.
    摘要 衰减校正(AC)对于生成无伪影、定量准确的正电子发射断层成像(PET)图像至关重要。然而,PET的衰减校正面临诸多挑战,包括扫描间的运动,以及将结构体素强度错误地转换为PET衰减校正因子。随着PET/CT一体化设备的商业化,定量PET的衰减校正问题已在很大程度上得到解决。与此同时,考虑到在无解剖影像条件下用深度学习实现PET衰减校正的可行性,本文提出了一种PET AC方法:利用深度学习从未经衰减校正的PET图像生成连续取值的CT图像,用于脑PET成像的衰减校正。具体而言,我们提出了一种结合变量增强策略、可实现双向推断过程的可逆网络(IVNAC),用于合成CT生成。为评估所提算法的性能,我们在来自37名临床患者的共1440组数据上与Cycle-GAN、Pix2pix等对比算法进行了全面研究。感知分析和定量评估均表明,用于PET衰减校正的可逆网络优于其他现有AC模型,这展示了所提方法的潜力以及在无CT条件下实现脑PET衰减校正的可行性。

AutoCast++: Enhancing World Event Prediction with Zero-shot Ranking-based Context Retrieval

  • paper_url: http://arxiv.org/abs/2310.01880
  • repo_url: None
  • paper_authors: Qi Yan, Raihan Seraj, Jiawei He, Lili Meng, Tristan Sylvain
  • for: 基于机器的现实世界事件预测因其有助于明智决策而日益受到关注。传统预测主要依赖时间序列等结构化数据,而语言模型的最新突破使得利用非结构化文本进行预测成为可能。
  • methods: 我们提出了AutoCast++,一种零样本、基于排序的上下文检索系统,用于从庞大的新闻文档库中筛选适合事件预测的新闻。该方法首先依据零样本的"问题-段落"相关性对文章重新排序,聚焦语义相关的新闻;随后对选出的文章进行零样本摘要以获得精炼的上下文。我们利用预训练语言模型完成相关性评估和文章摘要,无需领域特定训练;重排序机制还偏向更新的文章,并对多段落表示学习加以正则化,使其与人类预测者在不同日期给出的回答保持一致。
  • results: 实验结果表明,AutoCast++在多项指标上带来显著提升:多项选择题(MCQ)的性能提升48%,真/假(TF)题最高提升8%。
    Abstract Machine-based prediction of real-world events is garnering attention due to its potential for informed decision-making. Whereas traditional forecasting predominantly hinges on structured data like time-series, recent breakthroughs in language models enable predictions using unstructured text. In particular, (Zou et al., 2022) unveils AutoCast, a new benchmark that employs news articles for answering forecasting queries. Nevertheless, existing methods still trail behind human performance. The cornerstone of accurate forecasting, we argue, lies in identifying a concise, yet rich subset of news snippets from a vast corpus. With this motivation, we introduce AutoCast++, a zero-shot ranking-based context retrieval system, tailored to sift through expansive news document collections for event forecasting. Our approach first re-ranks articles based on zero-shot question-passage relevance, honing in on semantically pertinent news. Following this, the chosen articles are subjected to zero-shot summarization to attain succinct context. Leveraging a pre-trained language model, we conduct both the relevance evaluation and article summarization without needing domain-specific training. Notably, recent articles can sometimes be at odds with preceding ones due to new facts or unanticipated incidents, leading to fluctuating temporal dynamics. To tackle this, our re-ranking mechanism gives preference to more recent articles, and we further regularize the multi-passage representation learning to align with human forecaster responses made on different dates. Empirical results underscore marked improvements across multiple metrics, improving the performance for multiple-choice questions (MCQ) by 48% and true/false (TF) questions by up to 8%.
    摘要 基于机器的现实世界事件预测因其有助于明智决策而日益受到关注。传统预测主要依赖时间序列等结构化数据,而语言模型的最新突破使得利用非结构化文本进行预测成为可能。特别地,(Zou et al., 2022)提出了AutoCast,一个利用新闻文章回答预测问题的新基准。然而,现有方法的表现仍落后于人类。我们认为,准确预测的基石在于从庞大的语料库中找出简洁而信息丰富的新闻片段子集。基于这一动机,我们提出了AutoCast++,一种零样本、基于排序的上下文检索系统,专为在海量新闻文档中为事件预测筛选材料而设计。我们的方法首先依据零样本的"问题-段落"相关性对文章重新排序,聚焦语义相关的新闻;随后对选出的文章进行零样本摘要,以获得精炼的上下文。我们利用预训练语言模型完成相关性评估与文章摘要,无需领域特定训练。值得注意的是,由于新的事实或突发事件,较新的文章有时会与较早的文章相矛盾,导致时间动态的波动。为此,我们的重排序机制偏向更新的文章,并进一步对多段落表示学习加以正则化,使其与人类预测者在不同日期作出的回答保持一致。实验结果显示多项指标均有显著提升:多项选择题(MCQ)的性能提升48%,真/假(TF)题的性能最高提升8%。
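A rough sketch of the first stage of such a retrieve-then-summarise pipeline, with TF-IDF cosine similarity standing in for the zero-shot question-passage relevance model and an exponential recency discount standing in for the preference for newer articles. The half-life and top-k values are invented for the example.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_articles(question, articles, days_old, half_life=30.0, top_k=3):
    """Rank articles by question-passage similarity, discounted by age so that
    more recent articles are preferred, and keep the top_k as context."""
    vec = TfidfVectorizer().fit(articles + [question])
    sims = cosine_similarity(vec.transform([question]), vec.transform(articles))[0]
    recency = 0.5 ** (np.asarray(days_old) / half_life)      # exponential decay with age
    scores = sims * recency
    order = np.argsort(-scores)[:top_k]
    return [(articles[i], round(float(scores[i]), 3)) for i in order]

question = "Will the central bank raise interest rates at its next meeting?"
articles = [
    "The central bank signalled possible rate hikes amid persistent inflation.",
    "A new sports stadium opened downtown to large crowds.",
    "Analysts expect the bank to hold rates steady after weak jobs data.",
    "Inflation cooled slightly last month, easing pressure on policymakers.",
]
days_old = [3, 1, 10, 45]
for text, score in rank_articles(question, articles, days_old):
    print(f"{score:5.3f}  {text}")
```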

DeepDecipher: Accessing and Investigating Neuron Activation in Large Language Models

  • paper_url: http://arxiv.org/abs/2310.01870
  • repo_url: None
  • paper_authors: Albert Garde, Esben Kran, Fazl Barez
  • for: 这篇论文旨在提供一套可解释、透明的工具,用于分析基于Transformer的大语言模型(LLM)。
  • methods: 该工具集提供了一个API和界面,用于探查Transformer模型MLP层中的神经元、比较不同模型,并获取有关模型行为的洞见。
  • results: 该工具集可以帮助研究人员、工程师和开发者快速诊断问题、审核系统,提高LLM的可解释性和透明度,使LLM更加透明、可信和安全。
    Abstract As large language models (LLMs) become more capable, there is an urgent need for interpretable and transparent tools. Current methods are difficult to implement, and accessible tools to analyze model internals are lacking. To bridge this gap, we present DeepDecipher - an API and interface for probing neurons in transformer models' MLP layers. DeepDecipher makes the outputs of advanced interpretability techniques for LLMs readily available. The easy-to-use interface also makes inspecting these complex models more intuitive. This paper outlines DeepDecipher's design and capabilities. We demonstrate how to analyze neurons, compare models, and gain insights into model behavior. For example, we contrast DeepDecipher's functionality with similar tools like Neuroscope and OpenAI's Neuron Explainer. DeepDecipher enables efficient, scalable analysis of LLMs. By granting access to state-of-the-art interpretability methods, DeepDecipher makes LLMs more transparent, trustworthy, and safe. Researchers, engineers, and developers can quickly diagnose issues, audit systems, and advance the field.
    摘要 随着大语言模型(LLM)能力不断增强,对可解释、透明工具的需求日益迫切。现有方法实现困难,而用于分析模型内部机制的易用工具仍然匮乏。为弥补这一空白,我们提出了DeepDecipher——一个用于探查Transformer模型MLP层中神经元的API和界面。DeepDecipher使得针对LLM的先进可解释性技术的输出可以方便地获取,其易用的界面也让检查这些复杂模型变得更加直观。本文介绍了DeepDecipher的设计与功能,演示了如何分析神经元、比较模型并洞察模型行为,并将其功能与Neuroscope、OpenAI的Neuron Explainer等类似工具进行了对比。DeepDecipher支持对LLM进行高效、可扩展的分析。通过提供最先进的可解释性方法,DeepDecipher让LLM更加透明、可信和安全,研究人员、工程师和开发者可以借助它快速诊断问题、审核系统并推动该领域的发展。

High-Probability Convergence for Composite and Distributed Stochastic Minimization and Variational Inequalities with Heavy-Tailed Noise

  • paper_url: http://arxiv.org/abs/2310.01860
  • repo_url: None
  • paper_authors: Eduard Gorbunov, Abdurakhmon Sadiev, Marina Danilova, Samuel Horváth, Gauthier Gidel, Pavel Dvurechensky, Alexander Gasnikov, Peter Richtárik
  • for: 这篇论文研究在重尾噪声下,随机一阶优化方法求解复合优化与分布式优化问题的高概率收敛性分析。
  • methods: 作者提出了基于随机梯度差裁剪的新方法,并证明了这些方法在复合与分布式优化问题中具有紧的高概率收敛保证。
  • results: 作者为新方法给出了紧的(包括近乎最优的)高概率收敛结果,涵盖了强凸问题等此前结果未覆盖的重要特例;基于类似思想,作者还为复合与分布式变分不等式提出了新方法并分析了其高概率收敛性。
    Abstract High-probability analysis of stochastic first-order optimization methods under mild assumptions on the noise has been gaining a lot of attention in recent years. Typically, gradient clipping is one of the key algorithmic ingredients to derive good high-probability guarantees when the noise is heavy-tailed. However, if implemented na\"ively, clipping can spoil the convergence of the popular methods for composite and distributed optimization (Prox-SGD/Parallel SGD) even in the absence of any noise. Due to this reason, many works on high-probability analysis consider only unconstrained non-distributed problems, and the existing results for composite/distributed problems do not include some important special cases (like strongly convex problems) and are not optimal. To address this issue, we propose new stochastic methods for composite and distributed optimization based on the clipping of stochastic gradient differences and prove tight high-probability convergence results (including nearly optimal ones) for the new methods. Using similar ideas, we also develop new methods for composite and distributed variational inequalities and analyze the high-probability convergence of these methods.
    摘要 近年来,在温和噪声假设下对随机一阶优化方法进行高概率分析受到了广泛关注。当噪声为重尾分布时,梯度裁剪通常是获得良好高概率保证的关键算法要素之一。然而,若简单粗暴地实施裁剪,即使在没有噪声的情况下,也可能破坏复合优化与分布式优化中常用方法(Prox-SGD/并行SGD)的收敛性。因此,许多高概率分析工作只考虑无约束、非分布式的问题,而现有针对复合/分布式问题的结果既未涵盖一些重要特例(如强凸问题),也并非最优。为解决这一问题,我们提出了基于随机梯度差裁剪的新的复合与分布式优化随机方法,并为其证明了紧的(包括近乎最优的)高概率收敛结果。利用类似思想,我们还提出了针对复合与分布式变分不等式的新方法,并分析了这些方法的高概率收敛性。
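A schematic of the clipping idea in the summary: instead of clipping raw stochastic gradients, clip the difference between the fresh gradient sample and a running estimate, which tames heavy-tailed samples without distorting the whole search direction. The toy problem, clipping radius, and step size below are illustrative, not the paper's parameters or exact method.

```python
import numpy as np

def clip(v, lam):
    """Project v onto the ball of radius lam: min(1, lam / ||v||) * v."""
    norm = np.linalg.norm(v)
    return v if norm <= lam else (lam / norm) * v

def clipped_sgd_step(x, g_estimate, stoch_grad, lr, lam):
    """Schematic step in the spirit of gradient-difference clipping: the update
    direction is the running estimate plus a *clipped* correction, so a single
    heavy-tailed gradient sample cannot blow up one iteration."""
    g_estimate = g_estimate + clip(stoch_grad - g_estimate, lam)
    return x - lr * g_estimate, g_estimate

# toy problem: f(x) = 0.5 * ||x||^2 with heavy-tailed (Student-t, df=2) gradient noise
rng = np.random.default_rng(0)
x, g_est = np.full(10, 5.0), np.zeros(10)
for t in range(500):
    noisy_grad = x + rng.standard_t(df=2, size=10)      # infinite-variance noise
    x, g_est = clipped_sgd_step(x, g_est, noisy_grad, lr=0.05, lam=1.0)
print("final distance to optimum:", round(float(np.linalg.norm(x)), 3))
```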

Variational Gaussian approximation of the Kushner optimal filter

  • paper_url: http://arxiv.org/abs/2310.01859
  • repo_url: None
  • paper_authors: Marc Lambert, Silvère Bonnabel, Francis Bach
  • for: 这篇论文研究的问题是:在给定连续时间观测的情况下,刻画动力系统状态的概率密度(最优滤波)。
  • methods: 论文提出了两种可计算的变分高斯近似,分别对应与概率密度传播和贝叶斯更新相关的两个近端损失:一个基于Wasserstein度量,另一个基于Fisher度量;后者的解由作者此前提出的关于均值和协方差的隐式更新给出。这两种变分更新可以融合,并被证明满足一组关于高斯均值与协方差矩阵的随机微分方程(SDE)。
  • results: 由此得到的高斯流在线性情形下与Kalman-Bucy流和Riccati流一致,并将它们推广到非线性情形,从而为库什纳最优滤波提供了一种可计算的高斯近似。
    Abstract In estimation theory, the Kushner equation provides the evolution of the probability density of the state of a dynamical system given continuous-time observations. Building upon our recent work, we propose a new way to approximate the solution of the Kushner equation through tractable variational Gaussian approximations of two proximal losses associated with the propagation and Bayesian update of the probability density. The first is a proximal loss based on the Wasserstein metric and the second is a proximal loss based on the Fisher metric. The solution to this last proximal loss is given by implicit updates on the mean and covariance that we proposed earlier. These two variational updates can be fused and shown to satisfy a set of stochastic differential equations on the Gaussian's mean and covariance matrix. This Gaussian flow is consistent with the Kalman-Bucy and Riccati flows in the linear case and generalize them in the nonlinear one.

Score-based Data Assimilation for a Two-Layer Quasi-Geostrophic Model

  • paper_url: http://arxiv.org/abs/2310.01853
  • repo_url: https://github.com/francois-rozet/sda
  • paper_authors: François Rozet, Gilles Louppe
  • for: Addresses the problem of identifying plausible state trajectories of dynamical systems given noisy or incomplete observations.
  • methods: The score-based data assimilation (SDA) method, with modifications to the score network architecture to reduce memory consumption and execution time.
  • results: Promising results for a two-layer quasi-geostrophic model.
    Abstract Data assimilation addresses the problem of identifying plausible state trajectories of dynamical systems given noisy or incomplete observations. In geosciences, it presents challenges due to the high-dimensionality of geophysical dynamical systems, often exceeding millions of dimensions. This work assesses the scalability of score-based data assimilation (SDA), a novel data assimilation method, in the context of such systems. We propose modifications to the score network architecture aimed at significantly reducing memory consumption and execution time. We demonstrate promising results for a two-layer quasi-geostrophic model.

EMBERSim: A Large-Scale Databank for Boosting Similarity Search in Malware Analysis

  • paper_url: http://arxiv.org/abs/2310.01835
  • repo_url: https://github.com/crowdstrike/embersim-databank
  • paper_authors: Dragos Georgian Corlatescu, Alexandru Dinu, Mihaela Gaman, Paul Sumedrea
  • for: Advancing similarity research on binary files in order to strengthen malware detection.
  • methods: Builds on EMBER, one of the largest malware classification datasets, enriching it with similarity information and with malware class tags derived from VirusTotal data via the open-source tool AVClass.
  • results: Releases EMBERSim, an augmented version of EMBER with similarity-informed and automatically determined class tags, together with the implementation of the class scoring technique and the leaf similarity method.
    Abstract In recent years there has been a shift from heuristics-based malware detection towards machine learning, which proves to be more robust in the current heavily adversarial threat landscape. While we acknowledge machine learning to be better equipped to mine for patterns in the increasingly high amounts of similar-looking files, we also note a remarkable scarcity of the data available for similarity-targeted research. Moreover, we observe that the focus in the few related works falls on quantifying similarity in malware, often overlooking the clean data. This one-sided quantification is especially dangerous in the context of detection bypass. We propose to address the deficiencies in the space of similarity research on binary files, starting from EMBER - one of the largest malware classification data sets. We enhance EMBER with similarity information as well as malware class tags, to enable further research in the similarity space. Our contribution is threefold: (1) we publish EMBERSim, an augmented version of EMBER, that includes similarity-informed tags; (2) we enrich EMBERSim with automatically determined malware class tags using the open-source tool AVClass on VirusTotal data and (3) we describe and share the implementation for our class scoring technique and leaf similarity method.
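
To make the leaf-similarity idea concrete, here is a generic sketch of one common construction on tree ensembles: two samples are considered similar when they land in the same leaf of many trees. The dataset, model, and scoring here are illustrative assumptions; the paper's exact implementation lives in the linked repository and may differ.

```python
# Hedged sketch of a leaf-similarity measure over a tree ensemble.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=30, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

leaves = model.apply(X)          # (n_samples, n_trees): leaf index per tree

def leaf_similarity(i, j):
    """Fraction of trees in which samples i and j share a leaf."""
    return np.mean(leaves[i] == leaves[j])

# Rank the samples most similar to query sample 0.
sims = np.array([leaf_similarity(0, j) for j in range(len(X))])
nearest = np.argsort(-sims)[:5]
```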

Towards Robust Fidelity for Evaluating Explainability of Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2310.01820
  • repo_url: None
  • paper_authors: Xu Zheng, Farhad Shirani, Tianchun Wang, Wei Cheng, Zhuomin Chen, Haifeng Chen, Hua Wei, Dongsheng Luo
  • for: Providing an information-theoretic definition of explainability for GNN explanation functions and evaluating how well such functions perform.
  • methods: Introduces a formal, information-theoretic definition of explainability and proposes a robust class of fidelity measures that avoid the distribution-shift issues affecting existing metrics.
  • results: Experiments on synthetic and real datasets show that the proposed measures are more stable and more coherent with gold-standard metrics than prior fidelity metrics.
    Abstract Graph Neural Networks (GNNs) are neural models that leverage the dependency structure in graphical data via message passing among the graph nodes. GNNs have emerged as pivotal architectures in analyzing graph-structured data, and their expansive application in sensitive domains requires a comprehensive understanding of their decision-making processes -- necessitating a framework for GNN explainability. An explanation function for GNNs takes a pre-trained GNN along with a graph as input, to produce a `sufficient statistic' subgraph with respect to the graph label. A main challenge in studying GNN explainability is to provide fidelity measures that evaluate the performance of these explanation functions. This paper studies this foundational challenge, spotlighting the inherent limitations of prevailing fidelity metrics, including $Fid_+$, $Fid_-$, and $Fid_\Delta$. Specifically, a formal, information-theoretic definition of explainability is introduced and it is shown that existing metrics often fail to align with this definition across various statistical scenarios. The reason is due to potential distribution shifts when subgraphs are removed in computing these fidelity measures. Subsequently, a robust class of fidelity measures are introduced, and it is shown analytically that they are resilient to distribution shift issues and are applicable in a wide range of scenarios. Extensive empirical analysis on both synthetic and real datasets are provided to illustrate that the proposed metrics are more coherent with gold standard metrics.
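
For reference, the standard fidelity metrics that the paper critiques can be written down in a few lines. The sketch below uses abstract callables (`predict`, `keep`, `drop`) that are assumptions for illustration: `predict` returns the model's probability for the original label, and `keep`/`drop` build the subgraph with only, or without, the explanation edges.

```python
# Hedged sketch of the conventional Fid+/Fid-/FidΔ fidelity metrics.
def fidelity_plus(predict, graph, explanation, drop):
    """Probability drop when the explanation is removed (higher is better)."""
    return predict(graph) - predict(drop(graph, explanation))

def fidelity_minus(predict, graph, explanation, keep):
    """Probability drop when only the explanation is kept (lower is better)."""
    return predict(graph) - predict(keep(graph, explanation))

def fidelity_delta(predict, graph, explanation, keep, drop):
    return (fidelity_plus(predict, graph, explanation, drop)
            - fidelity_minus(predict, graph, explanation, keep))
```

The paper's point is that `drop`/`keep` push the modified graphs off the training distribution, which is why these quantities can fail to reflect causal attribution.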

AutoLoRa: A Parameter-Free Automated Robust Fine-Tuning Framework

  • paper_url: http://arxiv.org/abs/2310.01818
  • repo_url: None
  • paper_authors: Xilie Xu, Jingfeng Zhang, Mohan Kankanhalli
  • for: Obtaining adversarial robustness in downstream applications via robust fine-tuning, without large computational resources or extensive data collection.
  • methods: Disentangles robust fine-tuning (RFT) with a low-rank (LoRa) branch, optimizing the natural objective via the LoRa branch and the adversarial objective via the feature extractor, and introduces heuristic strategies that automatically schedule the learning rate and the scalars of the loss terms.
  • results: The automated, LoRa-disentangled RFT (AutoLoRa) achieves new state-of-the-art results across a range of downstream tasks and automatically converts a pre-trained feature extractor into an adversarially robust model without hyperparameter search.
    Abstract Robust Fine-Tuning (RFT) is a low-cost strategy to obtain adversarial robustness in downstream applications, without requiring a lot of computational resources and collecting significant amounts of data. This paper uncovers an issue with the existing RFT, where optimizing both adversarial and natural objectives through the feature extractor (FE) yields significantly divergent gradient directions. This divergence introduces instability in the optimization process, thereby hindering the attainment of adversarial robustness and rendering RFT highly sensitive to hyperparameters. To mitigate this issue, we propose a low-rank (LoRa) branch that disentangles RFT into two distinct components: optimizing natural objectives via the LoRa branch and adversarial objectives via the FE. Besides, we introduce heuristic strategies for automating the scheduling of the learning rate and the scalars of loss terms. Extensive empirical evaluations demonstrate that our proposed automated RFT disentangled via the LoRa branch (AutoLoRa) achieves new state-of-the-art results across a range of downstream tasks. AutoLoRa holds significant practical utility, as it automatically converts a pre-trained FE into an adversarially robust model for downstream tasks without the need for searching hyperparameters.
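
The sketch below illustrates the structural idea of a low-rank auxiliary branch attached to a layer, so that one objective can be routed through the base weights and the other through the low-rank path. The layer sizes, zero-initialization, and routing flag are illustrative assumptions, not AutoLoRa's exact design.

```python
# Conceptual PyTorch sketch of a linear layer with a LoRA-style branch.
import torch
import torch.nn as nn

class LinearWithLoRa(nn.Module):
    def __init__(self, d_in, d_out, rank=8):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        self.lora_a = nn.Linear(d_in, rank, bias=False)
        self.lora_b = nn.Linear(rank, d_out, bias=False)
        nn.init.zeros_(self.lora_b.weight)   # branch starts as a no-op

    def forward(self, x, use_lora_branch: bool):
        out = self.base(x)
        if use_lora_branch:                   # e.g., natural-objective path
            out = out + self.lora_b(self.lora_a(x))
        return out                            # e.g., adversarial-objective path otherwise

layer = LinearWithLoRa(128, 64)
x = torch.randn(4, 128)
nat_out = layer(x, use_lora_branch=True)
adv_out = layer(x, use_lora_branch=False)
```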

What Determines the Price of NFTs?

  • paper_url: http://arxiv.org/abs/2310.01815
  • repo_url: https://github.com/paralleluniversium/pulproject
  • paper_authors: Vivian Ziemke, Benjamin Estermann, Roger Wattenhofer, Ye Wang
  • for: Understanding how the prices of non-fungible tokens (NFTs) are determined and which factors influence them.
  • methods: Analyzes on-chain and off-chain data of NFT collections traded on OpenSea, using the text and image data of the NFTs to explain price variation.
  • results: Text and image features explain price variation within collections but do not generalize to new, unseen collections; a collection's trading volume often relates to its online presence, such as social media followers and website traffic.
    Abstract In the evolving landscape of digital art, Non-Fungible Tokens (NFTs) have emerged as a groundbreaking platform, bridging the realms of art and technology. NFTs serve as the foundational framework that has revolutionized the market for digital art, enabling artists to showcase and monetize their creations in unprecedented ways. NFTs combine metadata stored on the blockchain with off-chain data, such as images, to create a novel form of digital ownership. It is not fully understood how these factors come together to determine NFT prices. In this study, we analyze both on-chain and off-chain data of NFT collections trading on OpenSea to understand what influences NFT pricing. Our results show that while text and image data of the NFTs can be used to explain price variations within collections, the extracted features do not generalize to new, unseen collections. Furthermore, we find that an NFT collection's trading volume often relates to its online presence, like social media followers and website traffic.

Simulation-based Inference with the Generalized Kullback-Leibler Divergence

  • paper_url: http://arxiv.org/abs/2310.01808
  • repo_url: None
  • paper_authors: Benjamin Kurt Miller, Marco Federici, Christoph Weniger, Patrick Forré
  • for: Solving inverse problems in simulation-based inference, where the likelihood is only known implicitly.
  • methods: Optimizes a generalized Kullback-Leibler divergence that accounts for the normalization constant of unnormalized surrogates, extending Neural Posterior Estimation beyond normalized density estimators.
  • results: The objective recovers Neural Posterior Estimation when the model class is normalized and unifies it with Neural Ratio Estimation; a hybrid model combining a normalized base distribution with a learned ratio offers the best of both worlds, supported by benchmark results.
    Abstract In Simulation-based Inference, the goal is to solve the inverse problem when the likelihood is only known implicitly. Neural Posterior Estimation commonly fits a normalized density estimator as a surrogate model for the posterior. This formulation cannot easily fit unnormalized surrogates because it optimizes the Kullback-Leibler divergence. We propose to optimize a generalized Kullback-Leibler divergence that accounts for the normalization constant in unnormalized distributions. The objective recovers Neural Posterior Estimation when the model class is normalized and unifies it with Neural Ratio Estimation, combining both into a single objective. We investigate a hybrid model that offers the best of both worlds by learning a normalized base distribution and a learned ratio. We also present benchmark results.
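
One common way to extend the KL divergence to an unnormalized surrogate $\tilde q$ is shown below; when $\tilde q$ integrates to one the last two terms cancel and the standard KL (hence Neural Posterior Estimation) is recovered. Whether this form coincides exactly with the paper's objective is an assumption made here for illustration.

```latex
\mathrm{GKL}\bigl(p \,\|\, \tilde q\bigr)
  = \int p(x)\,\log\frac{p(x)}{\tilde q(x)}\,\mathrm{d}x
  \;-\; \int p(x)\,\mathrm{d}x
  \;+\; \int \tilde q(x)\,\mathrm{d}x .
```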

GNNX-BENCH: Unravelling the Utility of Perturbation-based GNN Explainers through In-depth Benchmarking

  • paper_url: http://arxiv.org/abs/2310.01794
  • repo_url: None
  • paper_authors: Mert Kosan, Samidha Verma, Burouj Armgaan, Khushbu Pahwa, Ambuj Singh, Sourav Medya, Sayan Ranu
  • for: This paper aims to provide a comprehensive understanding of the state-of-the-art explainability methods for Graph Neural Networks (GNNs), and to identify potential research problems for further enhancement.
  • methods: The paper presents a benchmarking study on perturbation-based explainability methods for GNNs, evaluating and comparing a wide range of explainability techniques.
  • results: The study reveals that all algorithms are affected by stability issues when faced with noisy data, and that current counterfactual explainers often fail to provide feasible recourses due to violations of topological constraints encoded by domain-specific considerations.
    Abstract Numerous explainability methods have been proposed to shed light on the inner workings of GNNs. Despite the inclusion of empirical evaluations in all the proposed algorithms, the interrogative aspects of these evaluations lack diversity. As a result, various facets of explainability pertaining to GNNs, such as a comparative analysis of counterfactual reasoners, their stability to variational factors such as different GNN architectures, noise, stochasticity in non-convex loss surfaces, feasibility amidst domain constraints, and so forth, have yet to be formally investigated. Motivated by this need, we present a benchmarking study on perturbation-based explainability methods for GNNs, aiming to systematically evaluate and compare a wide range of explainability techniques. Among the key findings of our study, we identify the Pareto-optimal methods that exhibit superior efficacy and stability in the presence of noise. Nonetheless, our study reveals that all algorithms are affected by stability issues when faced with noisy data. Furthermore, we have established that the current generation of counterfactual explainers often fails to provide feasible recourses due to violations of topological constraints encoded by domain-specific considerations. Overall, this benchmarking study empowers stakeholders in the field of GNNs with a comprehensive understanding of the state-of-the-art explainability methods, potential research problems for further enhancement, and the implications of their application in real-world scenarios.

How Over-Parameterization Slows Down Gradient Descent in Matrix Sensing: The Curses of Symmetry and Initialization

  • paper_url: http://arxiv.org/abs/2310.01769
  • repo_url: None
  • paper_authors: Nuoya Xiong, Lijun Ding, Simon S. Du
  • for: Characterizing how over-parameterization changes the convergence behavior of gradient descent (GD) in matrix sensing, where an unknown low-rank ground-truth matrix $M^{*}$ is recovered from near-isotropic linear measurements.
  • methods: Proves a novel $\Omega(1/T^2)$ lower bound for randomly initialized GD in the over-parameterized symmetric setting and, building on prior work, global exact convergence results for the asymmetric parameterization.
  • results: With the symmetric parameterization, randomly initialized GD converges at an $\exp(-\Omega(T))$ rate in the exact-parameterization case ($k=r$) but only at an $\Omega(1/T^2)$ rate under over-parameterization ($k>r$); with the asymmetric parameterization, an $\exp(-\Omega(\alpha^2 T))$ rate holds even when over-parameterized, and a method that modifies a single GD step achieves a rate independent of the initialization scale $\alpha$.
    Abstract This paper rigorously shows how over-parameterization changes the convergence behaviors of gradient descent (GD) for the matrix sensing problem, where the goal is to recover an unknown low-rank ground-truth matrix from near-isotropic linear measurements. First, we consider the symmetric setting with the symmetric parameterization where $M^* \in \mathbb{R}^{n \times n}$ is a positive semi-definite unknown matrix of rank $r \ll n$, and one uses a symmetric parameterization $XX^\top$ to learn $M^*$. Here $X \in \mathbb{R}^{n \times k}$ with $k > r$ is the factor matrix. We give a novel $\Omega (1/T^2)$ lower bound of randomly initialized GD for the over-parameterized case ($k >r$) where $T$ is the number of iterations. This is in stark contrast to the exact-parameterization scenario ($k=r$) where the convergence rate is $\exp (-\Omega (T))$. Next, we study asymmetric setting where $M^* \in \mathbb{R}^{n_1 \times n_2}$ is the unknown matrix of rank $r \ll \min\{n_1,n_2\}$, and one uses an asymmetric parameterization $FG^\top$ to learn $M^*$ where $F \in \mathbb{R}^{n_1 \times k}$ and $G \in \mathbb{R}^{n_2 \times k}$. Building on prior work, we give a global exact convergence result of randomly initialized GD for the exact-parameterization case ($k=r$) with an $\exp (-\Omega(T))$ rate. Furthermore, we give the first global exact convergence result for the over-parameterization case ($k>r$) with an $\exp(-\Omega(\alpha^2 T))$ rate where $\alpha$ is the initialization scale. This linear convergence result in the over-parameterization case is especially significant because one can apply the asymmetric parameterization to the symmetric setting to speed up from $\Omega (1/T^2)$ to linear convergence. On the other hand, we propose a novel method that only modifies one step of GD and obtains a convergence rate independent of $\alpha$, recovering the rate in the exact-parameterization case.
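
The setting is easy to reproduce numerically. The sketch below runs randomly initialized GD on the symmetric factorization $XX^\top$ for a small matrix sensing instance; the problem sizes, step size, and initialization scale are illustrative choices, and with $k > r$ and a small initialization the residual is expected to decay slowly, consistent with the lower bound discussed above.

```python
# numpy sketch of GD on the symmetric parameterization for matrix sensing.
import numpy as np

rng = np.random.default_rng(0)
n, r, k, num_meas = 20, 2, 5, 400          # k > r: over-parameterized factor

U = rng.standard_normal((n, r)) / np.sqrt(n)
M_star = U @ U.T                            # rank-r PSD ground truth
A = rng.standard_normal((num_meas, n, n)) / np.sqrt(num_meas)  # near-isotropic measurements
y = np.einsum('mij,ij->m', A, M_star)

alpha = 1e-3                                # small initialization scale
X = alpha * rng.standard_normal((n, k))
lr = 0.05
for t in range(5000):
    resid = np.einsum('mij,ij->m', A, X @ X.T) - y
    G = np.einsum('m,mij->ij', resid, A)
    grad = (G + G.T) @ X                    # gradient of 0.5 * ||A(XX^T) - y||^2
    X = X - lr * grad

err = np.linalg.norm(X @ X.T - M_star) / np.linalg.norm(M_star)
```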

Backdiff: a diffusion model for generalized transferable protein backmapping

  • paper_url: http://arxiv.org/abs/2310.01768
  • repo_url: None
  • paper_authors: Yikai Liu, Ming Chen, Guang Lin
  • for: A generalized backmapping method that can be applied across different coarse-grained (CG) models and proteins.
  • methods: BackDiff, a generative model based on a conditional score-based diffusion model with geometric representations, trained in a self-supervised framework that adapts to different CG atoms and constrains the diffusion sampling paths with arbitrary CG auxiliary variables as conditions.
  • results: Extensive experiments over multiple popular CG models show performance superior to existing state-of-the-art approaches, along with generalization and flexibility those approaches cannot achieve, and efficient sampling across proteins and CG models without retraining.
    Abstract Coarse-grained (CG) models play a crucial role in the study of protein structures, protein thermodynamic properties, and protein conformation dynamics. Due to the information loss in the coarse-graining process, backmapping from CG to all-atom configurations is essential in many protein design and drug discovery applications when detailed atomic representations are needed for in-depth studies. Despite recent progress in data-driven backmapping approaches, devising a backmapping method that can be universally applied across various CG models and proteins remains unresolved. In this work, we propose BackDiff, a new generative model designed to achieve generalization and reliability in the protein backmapping problem. BackDiff leverages the conditional score-based diffusion model with geometric representations. Since different CG models can contain different coarse-grained sites which include selected atoms (CG atoms) and simple CG auxiliary functions of atomistic coordinates (CG auxiliary variables), we design a self-supervised training framework to adapt to different CG atoms, and constrain the diffusion sampling paths with arbitrary CG auxiliary variables as conditions. Our method facilitates end-to-end training and allows efficient sampling across different proteins and diverse CG models without the need for retraining. Comprehensive experiments over multiple popular CG models demonstrate BackDiff's superior performance to existing state-of-the-art approaches, and generalization and flexibility that these approaches cannot achieve. A pretrained BackDiff model can offer a convenient yet reliable plug-and-play solution for protein researchers, enabling them to investigate further from their own CG models.

Exploring Counterfactual Alignment Loss towards Human-centered AI

  • paper_url: http://arxiv.org/abs/2310.01766
  • repo_url: None
  • paper_authors: Mingzhou Liu, Xinwei Sun, Ching-Wen Lee, Yu Qiao, Yizhou Wang
  • for: Making deep neural networks more trustworthy in supervised learning tasks, especially in safety-critical domains such as healthcare.
  • methods: A human-centered framework that exploits counterfactual generation for causal attribution, introducing a CounterFactual Alignment (CF-Align) loss that aligns the features attributed by the counterfactual generation with human annotations; the loss is optimized via the implicit function theorem and is architecture-agnostic.
  • results: Experiments on a lung cancer diagnosis dataset demonstrate faithful alignment between the model and human annotations.
    Abstract Deep neural networks have demonstrated impressive accuracy in supervised learning tasks. However, their lack of transparency makes it hard for humans to trust their results, especially in safe-critic domains such as healthcare. To address this issue, recent explanation-guided learning approaches proposed to align the gradient-based attention map to image regions annotated by human experts, thereby obtaining an intrinsically human-centered model. However, the attention map these methods are based on may fail to causally attribute the model predictions, thus compromising their validity for alignment. To address this issue, we propose a novel human-centered framework based on counterfactual generation. In particular, we utilize the counterfactual generation's ability for causal attribution to introduce a novel loss called the CounterFactual Alignment (CF-Align) loss. This loss guarantees that the features attributed by the counterfactual generation for the classifier align with the human annotations. To optimize the proposed loss that entails a counterfactual generation with an implicit function form, we leverage the implicit function theorem for backpropagation. Our method is architecture-agnostic and, therefore can be applied to any neural network. We demonstrate the effectiveness of our method on a lung cancer diagnosis dataset, showcasing faithful alignment to humans.

Data Cleaning and Machine Learning: A Systematic Literature Review

  • paper_url: http://arxiv.org/abs/2310.01765
  • repo_url: None
  • paper_authors: Pierre-Olivier Côté, Amin Nikanjam, Nafisa Ahmed, Dmytro Humeniuk, Foutse Khomh
  • for: Summarizing the latest approaches to data cleaning for ML and ML for data cleaning, and providing future work recommendations.
  • methods: A systematic literature review of papers published between 2016 and 2022 inclusively.
  • results: Summarizes 101 papers covering data cleaning activities such as feature cleaning, label cleaning, entity matching, outlier detection, imputation, and holistic data cleaning, and provides 24 future work recommendations.
    Abstract Context: Machine Learning (ML) is integrated into a growing number of systems for various applications. Because the performance of an ML model is highly dependent on the quality of the data it has been trained on, there is a growing interest in approaches to detect and repair data errors (i.e., data cleaning). Researchers are also exploring how ML can be used for data cleaning; hence creating a dual relationship between ML and data cleaning. To the best of our knowledge, there is no study that comprehensively reviews this relationship. Objective: This paper's objectives are twofold. First, it aims to summarize the latest approaches for data cleaning for ML and ML for data cleaning. Second, it provides future work recommendations. Method: We conduct a systematic literature review of the papers published between 2016 and 2022 inclusively. We identify different types of data cleaning activities with and for ML: feature cleaning, label cleaning, entity matching, outlier detection, imputation, and holistic data cleaning. Results: We summarize the content of 101 papers covering various data cleaning activities and provide 24 future work recommendations. Our review highlights many promising data cleaning techniques that can be further extended. Conclusion: We believe that our review of the literature will help the community develop better approaches to clean data.

Sampling Multimodal Distributions with the Vanilla Score: Benefits of Data-Based Initialization

  • paper_url: http://arxiv.org/abs/2310.01762
  • repo_url: None
  • paper_authors: Frederic Koehler, Thuy-Duong Vuong
  • for: Studying whether the vanilla score function alone can be used to sample multimodal distributions learned from data.
  • methods: Vanilla score matching as proposed by Hyvärinen, combined with Langevin diffusion initialized at the empirical data distribution and stopped early.
  • results: Proves that Langevin diffusion with early stopping, initialized at the empirical distribution and run on a score function estimated from data, successfully generates natural multimodal distributions (mixtures of log-concave distributions).
    Abstract There is a long history, as well as a recent explosion of interest, in statistical and generative modeling approaches based on score functions -- derivatives of the log-likelihood of a distribution. In seminal works, Hyvärinen proposed vanilla score matching as a way to learn distributions from data by computing an estimate of the score function of the underlying ground truth, and established connections between this method and established techniques like Contrastive Divergence and Pseudolikelihood estimation. It is by now well-known that vanilla score matching has significant difficulties learning multimodal distributions. Although there are various ways to overcome this difficulty, the following question has remained unanswered -- is there a natural way to sample multimodal distributions using just the vanilla score? Inspired by a long line of related experimental works, we prove that the Langevin diffusion with early stopping, initialized at the empirical distribution, and run on a score function estimated from data successfully generates natural multimodal distributions (mixtures of log-concave distributions).
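
The sampler analyzed above is simple to demonstrate on a 1-D bimodal target. In the sketch below, the closed-form mixture score stands in for a score estimated from data; the chains are initialized at the empirical distribution and stopped after a small number of steps. All parameter choices are illustrative.

```python
# Minimal sketch: unadjusted Langevin dynamics with data-based initialization
# and early stopping, run on the score of a two-mode Gaussian mixture.
import numpy as np

rng = np.random.default_rng(0)

def score(x):
    """Score of an equal mixture of N(-4, 1) and N(+4, 1)."""
    w = 1.0 / (1.0 + np.exp(-8.0 * x))        # responsibility of the +4 mode
    return (1 - w) * (-(x + 4.0)) + w * (-(x - 4.0))

# "Empirical distribution": samples from the true mixture.
data = np.concatenate([rng.normal(-4, 1, 500), rng.normal(4, 1, 500)])

x = rng.choice(data, size=1000, replace=True)  # data-based initialization
step, n_steps = 0.01, 200                      # early stopping: few steps
for _ in range(n_steps):
    x = x + step * score(x) + np.sqrt(2 * step) * rng.standard_normal(x.shape)

# x now approximates the bimodal target; roughly half the chains remain near
# each mode because they started from the empirical distribution.
```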

Linearization of ReLU Activation Function for Neural Network-Embedded Optimization: Optimal Day-Ahead Energy Scheduling

  • paper_url: http://arxiv.org/abs/2310.01758
  • repo_url: None
  • paper_authors: Cunzhi Zhao, Xingpeng Li
  • for: Making neural-network-embedded optimization problems tractable when the embedded networks use ReLU activation functions, e.g., for battery-degradation-aware microgrid day-ahead energy scheduling.
  • methods: Four linearization methods tailored to the ReLU activation function are developed, analyzed, and compared, each replacing the ReLU with a set of linear constraints.
  • results: The proposed linearization methods effectively remove the nonlinearity introduced by the neural network model, overcoming the associated computational challenges and making the embedded optimization problems solvable.
    Abstract Neural networks have been widely applied in the power system area. They can be used for better predicting input information and modeling system performance with increased accuracy. In some applications such as battery degradation neural network-based microgrid day-ahead energy scheduling, the input features of the trained learning model are variables to be solved in optimization models that enforce limits on the output of the same learning model. This will create a neural network-embedded optimization problem; the use of nonlinear activation functions in the neural network will make such problems extremely hard to solve if not unsolvable. To address this emerging challenge, this paper investigated different methods for linearizing the nonlinear activation functions with a particular focus on the widely used rectified linear unit (ReLU) function. Four linearization methods tailored for the ReLU activation function are developed, analyzed and compared in this paper. Each method employs a set of linear constraints to replace the ReLU function, effectively linearizing the optimization problem, which can overcome the computational challenges associated with the nonlinearity of the neural network model. These proposed linearization methods provide valuable tools for effectively solving optimization problems that integrate neural network models with ReLU activation functions.
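
For intuition, the classical big-M mixed-integer formulation below is a representative example of replacing $y = \mathrm{ReLU}(x) = \max(x, 0)$ with linear constraints plus a binary variable; the paper's four tailored methods may differ in detail. Here $z$ is binary and $L \le x \le U$ are valid bounds on the pre-activation.

```latex
\begin{aligned}
  y &\ge x, \qquad y \ge 0, \\
  y &\le x - L\,(1 - z), \\
  y &\le U\, z, \\
  z &\in \{0, 1\}, \qquad L \le x \le U .
\end{aligned}
```

Setting $z=1$ forces $y=x$ (the active case, feasible only when $x \ge 0$), while $z=0$ forces $y=0$ (the inactive case, feasible only when $x \le 0$), so the nonlinear activation is encoded exactly by linear constraints.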

CausalTime: Realistically Generated Time-series for Benchmarking of Causal Discovery

  • paper_url: http://arxiv.org/abs/2310.01753
  • repo_url: https://github.com/jarrycyx/unn
  • paper_authors: Yuxiao Cheng, Ziqian Wang, Tingxiong Xiao, Qin Zhong, Jinli Suo, Kunlun He
  • for: Providing a reliable way to benchmark time-series causal discovery (TSCD) algorithms on data that resembles real applications.
  • methods: The CausalTime pipeline generates time series that highly resemble real data: deep neural networks with normalizing flows capture realistic dynamics, hypothesized causal graphs are extracted via importance analysis or prior knowledge, and ground-truth causal graphs are derived by splitting the causal model into causal, residual, and noise terms.
  • results: Qualitative and quantitative experiments validate the fidelity of the generated data, which is then used to benchmark existing TSCD algorithms; a user-friendly website is provided at www.causaltime.cc.
    Abstract Time-series causal discovery (TSCD) is a fundamental problem of machine learning. However, existing synthetic datasets cannot properly evaluate or predict the algorithms' performance on real data. This study introduces the CausalTime pipeline to generate time-series that highly resemble the real data and with ground truth causal graphs for quantitative performance evaluation. The pipeline starts from real observations in a specific scenario and produces a matching benchmark dataset. Firstly, we harness deep neural networks along with normalizing flow to accurately capture realistic dynamics. Secondly, we extract hypothesized causal graphs by performing importance analysis on the neural network or leveraging prior knowledge. Thirdly, we derive the ground truth causal graphs by splitting the causal model into causal term, residual term, and noise term. Lastly, using the fitted network and the derived causal graph, we generate corresponding versatile time-series proper for algorithm assessment. In the experiments, we validate the fidelity of the generated data through qualitative and quantitative experiments, followed by a benchmarking of existing TSCD algorithms using these generated datasets. CausalTime offers a feasible solution to evaluating TSCD algorithms in real applications and can be generalized to a wide range of fields. For easy use of the proposed approach, we also provide a user-friendly website, hosted on www.causaltime.cc.

5G Network Slicing: Analysis of Multiple Machine Learning Classifiers

  • paper_url: http://arxiv.org/abs/2310.01747
  • repo_url: None
  • paper_authors: Mirsad Malkoc, Hisham A. Kholidy
  • for: Investigating the accuracy and precision of different machine learning classifiers for detecting 5G network slices.
  • methods: Evaluates a logistic regression model, a linear discriminant model, a k-nearest neighbors model, a decision tree model, a random forest model, an SVC model, a BernoulliNB model, and a GaussianNB model.
  • results: According to this summary, the SVC, BernoulliNB, and GaussianNB models achieve the best accuracy and precision in detecting network slices, while the remaining classifiers perform comparatively worse.
    Abstract The division of one physical 5G communications infrastructure into several virtual network slices with distinct characteristics such as bandwidth, latency, reliability, security, and service quality is known as 5G network slicing. Each slice is a separate logical network that meets the requirements of specific services or use cases, such as virtual reality, gaming, autonomous vehicles, or industrial automation. The network slice can be adjusted dynamically to meet the changing demands of the service, resulting in a more cost-effective and efficient approach to delivering diverse services and applications over a shared infrastructure. This paper assesses various machine learning techniques, including the logistic regression model, linear discriminant model, k-nearest neighbor's model, decision tree model, random forest model, SVC BernoulliNB model, and GaussianNB model, to investigate the accuracy and precision of each model on detecting network slices. The report also gives an overview of 5G network slicing.
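
A comparison of this kind is straightforward to set up with scikit-learn. The sketch below uses a synthetic placeholder dataset purely for illustration; the paper's study uses 5G slicing features and its own evaluation protocol.

```python
# Illustrative classifier comparison on a synthetic stand-in dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import BernoulliNB, GaussianNB
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=16, n_classes=3,
                           n_informative=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "LDA": LinearDiscriminantAnalysis(),
    "kNN": KNeighborsClassifier(),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "RandomForest": RandomForestClassifier(random_state=0),
    "SVC": SVC(),
    "BernoulliNB": BernoulliNB(),
    "GaussianNB": GaussianNB(),
}
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    print(name,
          round(accuracy_score(y_te, pred), 3),
          round(precision_score(y_te, pred, average="macro"), 3))
```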

Randomized Dimension Reduction with Statistical Guarantees

  • paper_url: http://arxiv.org/abs/2310.01739
  • repo_url: None
  • paper_authors: Yijun Dong
  • for: Developing fast dimension-reduction algorithms and sample-efficient learning algorithms to push the limits of what modern algorithms can achieve.
  • methods: Fast randomized low-rank decomposition algorithms for large matrices based on matrix sketching (randomized pivoting-based interpolative and CUR decompositions, and randomized subspace approximations), together with learning algorithms that incorporate data augmentation for provably better generalization and distributional robustness.
  • results: The thesis delivers both fast-execution and data-efficient algorithms, including a sample complexity analysis for data augmentation consistency regularization and an adaptively weighted variant for distributionally robust optimization with applications in medical image segmentation.
    Abstract Large models and enormous data are essential driving forces of the unprecedented successes achieved by modern algorithms, especially in scientific computing and machine learning. Nevertheless, the growing dimensionality and model complexity, as well as the non-negligible workload of data pre-processing, also bring formidable costs to such successes in both computation and data aggregation. As the deceleration of Moore's Law slackens the cost reduction of computation from the hardware level, fast heuristics for expensive classical routines and efficient algorithms for exploiting limited data are increasingly indispensable for pushing the limit of algorithm potency. This thesis explores some of such algorithms for fast execution and efficient data utilization. From the computational efficiency perspective, we design and analyze fast randomized low-rank decomposition algorithms for large matrices based on "matrix sketching", which can be regarded as a dimension reduction strategy in the data space. These include the randomized pivoting-based interpolative and CUR decomposition discussed in Chapter 2 and the randomized subspace approximations discussed in Chapter 3. From the sample efficiency perspective, we focus on learning algorithms with various incorporations of data augmentation that improve generalization and distributional robustness provably. Specifically, Chapter 4 presents a sample complexity analysis for data augmentation consistency regularization where we view sample efficiency from the lens of dimension reduction in the function space. Then in Chapter 5, we introduce an adaptively weighted data augmentation consistency regularization algorithm for distributionally robust optimization with applications in medical image segmentation.
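
The core "matrix sketching" idea behind such decompositions is captured by the generic randomized range-finder sketched below: project onto a random low-dimensional subspace, orthogonalize, and factor the much smaller projected matrix. This is a standard construction shown for intuition, not the thesis's exact pivoting-based or subspace-approximation algorithms.

```python
# Generic randomized SVD via sketching (Halko-style range finder).
import numpy as np

def randomized_svd(A, rank, oversample=10, rng=None):
    rng = np.random.default_rng(rng)
    m, n = A.shape
    Omega = rng.standard_normal((n, rank + oversample))   # sketching matrix
    Q, _ = np.linalg.qr(A @ Omega)                        # orthonormal range basis
    B = Q.T @ A                                           # small (rank+p) x n matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :rank], s[:rank], Vt[:rank]

# Quick check on a synthetic low-rank matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((2000, 50)) @ rng.standard_normal((50, 500))
U, s, Vt = randomized_svd(A, rank=50)
err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
```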

Large Language Models for Test-Free Fault Localization

  • paper_url: http://arxiv.org/abs/2310.01726
  • repo_url: https://github.com/squareslab/llmao
  • paper_authors: Aidan Z. H. Yang, Ruben Martins, Claire Le Goues, Vincent J. Hellendoorn
  • for: Automatically localizing buggy lines of code, a key first step in many manual and automatic debugging tasks; prior fault localization (FL) techniques assume input tests and often require extensive program analysis or instrumentation, and prior deep learning for APR struggles to learn from small datasets and yields limited results on real-world programs.
  • methods: Leverages large language models (LLMs) of code, fine-tuning a small set of bidirectional adapter layers on top of the representations learned by the LLMs to produce LLMAO, the first language-model-based fault localization approach that locates buggy lines without any test coverage information.
  • results: Fault localization confidence scales consistently with model size (LLMs with 350 million, 6 billion, and 16 billion parameters fine-tuned on small, manually curated corpora of buggy programs); LLMAO improves Top-1 results over state-of-the-art machine learning fault localization (MLFL) baselines by 2.3%-54.4% and Top-5 results by 14.4%-35.6%, and is the first FL technique based on a language model architecture that can detect security vulnerabilities at the code-line level.
    Abstract Fault Localization (FL) aims to automatically localize buggy lines of code, a key first step in many manual and automatic debugging tasks. Previous FL techniques assume the provision of input tests, and often require extensive program analysis, program instrumentation, or data preprocessing. Prior work on deep learning for APR struggles to learn from small datasets and produces limited results on real-world programs. Inspired by the ability of large language models (LLMs) of code to adapt to new tasks based on very few examples, we investigate the applicability of LLMs to line level fault localization. Specifically, we propose to overcome the left-to-right nature of LLMs by fine-tuning a small set of bidirectional adapter layers on top of the representations learned by LLMs to produce LLMAO, the first language model based fault localization approach that locates buggy lines of code without any test coverage information. We fine-tune LLMs with 350 million, 6 billion, and 16 billion parameters on small, manually curated corpora of buggy programs such as the Defects4J corpus. We observe that our technique achieves substantially more confidence in fault localization when built on the larger models, with bug localization performance scaling consistently with the LLM size. Our empirical evaluation shows that LLMAO improves the Top-1 results over the state-of-the-art machine learning fault localization (MLFL) baselines by 2.3%-54.4%, and Top-5 results by 14.4%-35.6%. LLMAO is also the first FL technique trained using a language model architecture that can detect security vulnerabilities down to the code line level.
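
The structural idea, a frozen left-to-right LLM with a small trainable bidirectional head scoring each line, can be sketched as below. The layer sizes and the choice of a bidirectional LSTM as the adapter are illustrative assumptions, not LLMAO's exact architecture.

```python
# Conceptual PyTorch sketch of a bidirectional adapter over per-line LLM states.
import torch
import torch.nn as nn

class BidirectionalFaultAdapter(nn.Module):
    def __init__(self, hidden_dim=1024, adapter_dim=256):
        super().__init__()
        self.adapter = nn.LSTM(hidden_dim, adapter_dim, batch_first=True,
                               bidirectional=True)     # mixes left and right context
        self.head = nn.Linear(2 * adapter_dim, 1)

    def forward(self, line_states):                     # (batch, n_lines, hidden_dim)
        h, _ = self.adapter(line_states)
        return torch.sigmoid(self.head(h)).squeeze(-1)  # per-line bug probability

# line_states would come from a frozen code LLM, one vector per source line.
model = BidirectionalFaultAdapter()
fake_states = torch.randn(2, 40, 1024)
suspiciousness = model(fake_states)
```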

Large Language Models as Analogical Reasoners

  • paper_url: http://arxiv.org/abs/2310.01714
  • repo_url: None
  • paper_authors: Michihiro Yasunaga, Xinyun Chen, Yujia Li, Panupong Pasupat, Jure Leskovec, Percy Liang, Ed H. Chi, Denny Zhou
  • for: Enhancing the reasoning abilities of language models.
  • methods: Analogical Prompting, which prompts language models to self-generate relevant exemplars or knowledge in context before solving the given problem, removing the need to label or retrieve exemplars and tailoring the exemplars to each problem.
  • results: Outperforms 0-shot and manual few-shot chain-of-thought prompting across a variety of reasoning tasks, including math problem solving in GSM8K and MATH, code generation in Codeforces, and other reasoning tasks in BIG-Bench.
    Abstract Chain-of-thought (CoT) prompting for language models demonstrates impressive performance across reasoning tasks, but typically needs labeled exemplars of the reasoning process. In this work, we introduce a new prompting approach, Analogical Prompting, designed to automatically guide the reasoning process of large language models. Inspired by analogical reasoning, a cognitive process in which humans draw from relevant past experiences to tackle new problems, our approach prompts language models to self-generate relevant exemplars or knowledge in the context, before proceeding to solve the given problem. This method presents several advantages: it obviates the need for labeling or retrieving exemplars, offering generality and convenience; it can also tailor the generated exemplars and knowledge to each problem, offering adaptability. Experimental results show that our approach outperforms 0-shot CoT and manual few-shot CoT in a variety of reasoning tasks, including math problem solving in GSM8K and MATH, code generation in Codeforces, and other reasoning tasks in BIG-Bench.
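
A toy illustration of the prompting pattern is given below: instead of supplying labeled exemplars, the prompt asks the model to first recall related problems and solutions before tackling the target problem. The exact wording is an assumption for illustration, not the paper's prompt.

```python
# Sketch of an analogical-style prompt template.
def analogical_prompt(problem: str, n_exemplars: int = 3) -> str:
    return (
        f"Problem: {problem}\n\n"
        f"First, recall {n_exemplars} relevant and distinct problems you know, "
        "and describe each problem and its solution.\n"
        "Then, using what those examples suggest, solve the problem above "
        "step by step and state the final answer.\n"
    )

print(analogical_prompt("What is the area of a triangle with vertices "
                        "(0, 0), (4, 0), and (0, 6)?"))
```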

On Representation Complexity of Model-based and Model-free Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.01706
  • repo_url: None
  • paper_authors: Hanlin Zhu, Baihe Huang, Stuart Russell
  • for: Studying the representation complexity of model-based and model-free reinforcement learning (RL) through the lens of circuit complexity.
  • methods: Proves that for a broad class of MDPs the transition and reward functions can be represented by constant-depth circuits of polynomial size, while the optimal $Q$-function requires exponential size in constant-depth circuits.
  • results: The theory offers a representation-complexity explanation for why model-based algorithms often enjoy better sample complexity than model-free ones; empirically, the approximation errors of the transition kernel and reward function are consistently lower than those of the optimal $Q$-function across various MuJoCo environments.
    Abstract We study the representation complexity of model-based and model-free reinforcement learning (RL) in the context of circuit complexity. We prove theoretically that there exists a broad class of MDPs such that their underlying transition and reward functions can be represented by constant depth circuits with polynomial size, while the optimal $Q$-function suffers an exponential circuit complexity in constant-depth circuits. By drawing attention to the approximation errors and building connections to complexity theory, our theory provides unique insights into why model-based algorithms usually enjoy better sample complexity than model-free algorithms from a novel representation complexity perspective: in some cases, the ground-truth rule (model) of the environment is simple to represent, while other quantities, such as $Q$-function, appear complex. We empirically corroborate our theory by comparing the approximation error of the transition kernel, reward function, and optimal $Q$-function in various Mujoco environments, which demonstrates that the approximation errors of the transition kernel and reward function are consistently lower than those of the optimal $Q$-function. To the best of our knowledge, this work is the first to study the circuit complexity of RL, which also provides a rigorous framework for future research.

eess.IV - 2023-10-03

Deep learning-based image super-resolution of a novel end-expandable optical fiber probe for application in esophageal cancer diagnostics

  • paper_url: http://arxiv.org/abs/2310.02171
  • repo_url: None
  • paper_authors: Xiaohui Zhang, Mimi Tan, Mansour Nabil, Richa Shukla, Shaleen Vasavada, Sharmila Anandasabapathy, Mark A. Anastasio, Elena Petrova
  • for: Improving the efficiency of endoscopic screening for esophageal cancer to enable earlier diagnosis and treatment.
  • methods: A novel end-expandable endoscopic optical fiber probe for a larger field of visualization, combined with a deep learning-based image super-resolution (DL-SR) method to overcome the probe's limited sampling capability.
  • results: Across the degradation models considered, DL-SR improved traditional image-quality measures on simulated low-resolution microendoscopic images, and endoscopists' interpretations of the super-resolved images were comparable to those of the high-resolution ones.
    Abstract Significance: Endoscopic screening for esophageal cancer may enable early cancer diagnosis and treatment. While optical microendoscopic technology has shown promise in improving specificity, the limited field of view (<1 mm) significantly reduces the ability to survey large areas efficiently in esophageal cancer screening. Aim: To improve the efficiency of endoscopic screening, we proposed a novel end-expandable endoscopic optical fiber probe for larger field of visualization and employed a deep learning-based image super-resolution (DL-SR) method to overcome the issue of limited sampling capability. Approach: To demonstrate feasibility of the end-expandable optical fiber probe, DL-SR was applied on simulated low-resolution (LR) microendoscopic images to generate super-resolved (SR) ones. Varying the degradation model of image data acquisition, we identified the optimal parameters for optical fiber probe prototyping. The proposed screening method was validated with a human pathology reading study. Results: For various degradation parameters considered, the DL-SR method demonstrated different levels of improvement of traditional measures of image quality. The endoscopist interpretations of the SR images were comparable to those performed on the high-resolution ones. Conclusions: This work suggests avenues for development of DL-SR-enabled end-expandable optical fiber probes to improve high-yield esophageal cancer screening.

Detecting internal disorders in fruit by CT. Part 1: Joint 2D to 3D image registration workflow for comparing multiple slice photographs and CT scans of apple fruit

  • paper_url: http://arxiv.org/abs/2310.01987
  • repo_url: https://github.com/d1rk123/apple_photo_ct_workflow
  • paper_authors: Dirk Elias Schut, Rachael Maree Wood, Anna Katharina Trull, Rob Schouten, Robert van Liere, Tristan van Leeuwen, Kees Joost Batenburg
  • for: This paper aims to create a dataset of image pairs of photographs of apple slices and their corresponding CT slices to study internal disorders in apples using CT imaging.
  • methods: The workflow includes data acquisition, image segmentation, image registration, and validation methods. The image registration method aligns all available slices of an apple within a single optimization problem, assuming that the slices are parallel.
  • results: The dataset was acquired from 107 'Kanzi' apples that had been stored in controlled atmosphere (CA) storage for 8 months. In this dataset, the distance between annotations in the slice photograph and the matching CT slice was, on average, $1.47 \pm 0.40$ mm.
    Abstract A large percentage of apples are affected by internal disorders after long-term storage, which makes them unacceptable in the supply chain. CT imaging is a promising technique for in-line detection of these disorders. Therefore, it is crucial to understand how different disorders affect the image features that can be observed in CT scans. This paper presents a workflow for creating datasets of image pairs of photographs of apple slices and their corresponding CT slices. By having CT and photographic images of the same part of the apple, the complementary information in both images can be used to study the processes underlying internal disorders and how internal disorders can be measured in CT images. The workflow includes data acquisition, image segmentation, image registration, and validation methods. The image registration method aligns all available slices of an apple within a single optimization problem, assuming that the slices are parallel. This method outperformed optimizing the alignment separately for each slice. The workflow was applied to create a dataset of 1347 slice photographs and their corresponding CT slices. The dataset was acquired from 107 'Kanzi' apples that had been stored in controlled atmosphere (CA) storage for 8 months. In this dataset, the distance between annotations in the slice photograph and the matching CT slice was, on average, $1.47 \pm 0.40$ mm. Our workflow allows collecting large datasets of accurately aligned photo-CT image pairs, which can help distinguish internal disorders with a similar appearance on CT. With slight modifications, a similar workflow can be applied to other fruits or MRI instead of CT scans.

Spectro-spatial hyperspectral image reconstruction from interferometric acquisitions

  • paper_url: http://arxiv.org/abs/2310.01898
  • repo_url: None
  • paper_authors: Daniele Picone, Mohamad Jouni, Mauro Dalla-Mura
  • for: Reconstructing interpretable hyperspectral data cubes from the raw acquisitions of imaging spectrometers based on interferometry, where the spectral component of each pixel is expressed as an interferogram.
  • methods: Inversion of the interferometric acquisition within a Bayesian framework, integrating a spatial regularizer for image reconstruction alongside the spectral one instead of working pixel by pixel.
  • results: Combining spectral and spatial regularizers improves reconstruction performance over the pixelwise case; results are compared with Plug-and-Play techniques, whose denoisers can be injected seamlessly into the physics-based formulation of the optimization problem.
    Abstract In the last decade, novel hyperspectral cameras have been developed with particularly desirable characteristics of compactness and short acquisition time, retaining their potential to obtain spectral/spatial resolution competitive with respect to traditional cameras. However, a computational effort is required to recover an interpretable data cube. In this work we focus our attention on imaging spectrometers based on interferometry, for which the raw acquisition is an image whose spectral component is expressed as an interferogram. Previous works have focused on the inversion of such acquisition on a pixel-by-pixel basis within a Bayesian framework, leaving behind critical information on the spatial structure of the image data cube. In this work, we address this problem by integrating a spatial regularization for image reconstruction, showing that the combination of spectral and spatial regularizers leads to enhanced performances with respect to the pixelwise case. We compare our results with Plug-and-Play techniques, as its strategy to inject a set of denoisers from the literature can be implemented seamlessly with our physics-based formulation of the optimization problem.

eess.SP - 2023-10-03

Dual-Polarization Phase Retrieval Receiver in Silicon Photonics

  • paper_url: http://arxiv.org/abs/2310.02467
  • repo_url: None
  • paper_authors: Brian Stern, Hanzi Huang, Haoshuo Chen, Kwangwoong Kim, Mohamad Hossein Idjadi
  • for: Demonstrating a dual-polarization phase retrieval receiver in silicon photonics.
  • methods: The receiver recovers phase from intensity-only measurements without a local oscillator or transmitted carrier, using silicon waveguides that provide long delays and microring resonators with large dispersion to enable symbol-to-symbol interference and the dispersive projection required by the phase retrieval algorithm.
  • results: The full field of polarization-division multiplexed 30-GBd QPSK and 20-GBd 8QAM signals is retrieved over 80 km of SSMF.
    Abstract We demonstrate a silicon photonic dual-polarization phase retrieval receiver. The receiver recovers phase from intensity-only measurements without a local oscillator or transmitted carrier. We design silicon waveguides providing long delays and microring resonators with large dispersion to enable symbol-to-symbol interference and dispersive projection in the phase retrieval algorithm. We retrieve the full field of a polarization-division multiplexed 30-GBd QPSK and 20-GBd 8QAM signals over 80 km of SSMF.

  • paper_url: http://arxiv.org/abs/2310.02399
  • repo_url: None
  • paper_authors: Ashutosh Srivastava, Qing Zhao, Yi Lu, Ping Wang, Qi Qu, Zhu Ji, Yee Sin Chan, Shivendra S. Panwar
  • for: Analyzing whether NR sidelink (SL) can support augmented reality (AR) applications.
  • methods: System-level simulations incorporating the PHY layer structure and MAC layer resource scheduling of 3GPP SL, standard 3GPP channel models, and MCS configurations.
  • results: The current SL standard specifications are insufficient for high-end AR use cases with heavy interaction but can support simpler previews and file transfers; two proposed enhancements to SL resource allocation have the potential to offer significant performance improvements for AR applications.
    Abstract Smart glasses that support augmented reality (AR) have the potential to become the consumer's primary medium of connecting to the future internet. For the best quality of user experience, AR glasses must have a small form factor and long battery life, while satisfying the data rate and latency requirements of AR applications. To extend the AR glasses' battery life, the computation and processing involved in AR may be offloaded to a companion device, such as a smartphone, through a wireless connection. Sidelink (SL), i.e., the D2D communication interface of 5G NR, is a potential candidate for this wireless link. In this paper, we use system-level simulations to analyze the feasibility of NR SL for supporting AR. Our simulator incorporates the PHY layer structure and MAC layer resource scheduling of 3GPP SL, standard 3GPP channel models, and MCS configurations. Our results suggest that the current SL standard specifications are insufficient for high-end AR use cases with heavy interaction but can support simpler previews and file transfers. We further propose two enhancements to SL resource allocation, which have the potential to offer significant performance improvements for AR applications.
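    As a rough illustration of the kind of question the system-level study answers, the sketch below runs a back-of-the-envelope feasibility check for carrying an AR stream over a sidelink grant. All numbers and the latency model are illustrative assumptions, not 3GPP parameters or the paper's simulation setup.

```python
# Hypothetical back-of-the-envelope feasibility check for offloading AR traffic
# over a sidelink grant. The numbers below are illustrative assumptions only.

def sl_feasible(bandwidth_hz, spectral_eff_bps_per_hz, slot_share,
                ar_bitrate_bps, frame_bits, latency_budget_s, slot_s=0.5e-3):
    """Return (ok, achievable_rate, latency) for a toy sidelink grant."""
    achievable_rate = bandwidth_hz * spectral_eff_bps_per_hz * slot_share
    frame_airtime = frame_bits / achievable_rate
    # crude latency model: airtime plus one scheduling period of queueing
    latency = frame_airtime + slot_s / slot_share
    ok = achievable_rate >= ar_bitrate_bps and latency <= latency_budget_s
    return ok, achievable_rate, latency

# e.g. 100 MHz SL carrier, MCS giving 3 bit/s/Hz, 40% of slots granted,
# a 100 Mbit/s AR stream with 2 Mbit frames and a 20 ms per-frame budget
ok, rate, latency = sl_feasible(100e6, 3.0, 0.4, 100e6, 2e6, 20e-3)
print(f"feasible={ok}, rate={rate/1e6:.0f} Mbit/s, latency={latency*1e3:.2f} ms")
```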

Time-Reflection of Microwaves by a Fast Optically-Controlled Time-Boundary

  • paper_url: http://arxiv.org/abs/2310.02377
  • repo_url: None
  • paper_authors: Thomas R. Jones, Alexander V. Kildishev, Mordechai Segev, Dimitrios Peroulis
  • for: This paper investigates the time-reflection of electromagnetic (EM) waves in a medium whose properties change abruptly in time.
  • methods: The authors use a periodically-loaded microstrip line with optically-controlled picosecond-switchable photodiodes to observe the time-reflection of microwave pulses at the highest frequency ever observed (0.59 GHz).
  • results: The authors present experimental evidence of the phase-conjugation nature of time-reflected waves and demonstrate the feasibility of realizing Photonic Time-Crystals at GHz frequencies.
    Abstract When an electromagnetic (EM) wave is propagating in a medium whose properties are varied abruptly in time, the wave experiences refractions and reflections known as "time-refractions" and "time-reflections", both manifesting spectral translation as a consequence of the abrupt change of the medium and the conservation of momentum. However, while the time-refracted wave continues to propagate with the same wave-vector, the time-reflected wave is propagating backward with a conjugate phase, despite the lack of any spatial interface. Importantly, while time-refraction is always significant, observing time-reflection poses a major challenge - because it requires a large change in the medium occurring within a single cycle. For that reason, time-reflection of EM waves was observed only recently. Here, we present the observation of microwave pulses at the highest frequency ever observed (0.59 GHz), and the experimental evidence of the phase-conjugation nature of time-reflected waves. Our experiments are carried out in a periodically-loaded microstrip line with optically-controlled picosecond-switchable photodiodes. Our system paves the way to the experimental realization of Photonic Time-Crystals at GHz frequencies.
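    The frequency translation and the phase-conjugate backward wave at a time boundary follow from the textbook continuity of D and B across the boundary, for an ideal, non-dispersive, non-magnetic medium with a sudden permittivity step. The short calculation below evaluates those standard relations; it is an idealized illustration, not a model of the paper's optically-switched microstrip line, and the example numbers are assumptions.

```python
import numpy as np

def temporal_boundary(eps1, eps2, f1):
    """Field coefficients for a wave crossing a sudden permittivity step eps1 -> eps2
    in an ideal non-magnetic, non-dispersive medium (wave-vector k is conserved).
    Energy is not conserved at a temporal boundary, so T and R need not sum to 1."""
    r = eps1 / eps2
    f2 = f1 * np.sqrt(r)                 # new frequency (k conserved, phase velocity changes)
    T = 0.5 * (r + np.sqrt(r))           # time-refracted (forward) amplitude coefficient
    R = 0.5 * (r - np.sqrt(r))           # time-reflected (backward, phase-conjugate) amplitude
    return f2, T, R

# e.g. the line's effective permittivity drops abruptly by a factor of 4
f2, T, R = temporal_boundary(eps1=4.0, eps2=1.0, f1=0.59e9)
print(f"new frequency: {f2/1e9:.2f} GHz, time-refraction T={T:.2f}, time-reflection R={R:.2f}")
```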

Integrate-and-fire circuit for converting analog signals to spikes using phase encoding

  • paper_url: http://arxiv.org/abs/2310.02055
  • repo_url: None
  • paper_authors: Javier Lopez-Randulfe, Nico Reeb, Alois Knoll
  • for: This work proposes a hardware-compatible encoder for converting analog sensor signals into spikes for processing with spiking neural networks on digital neuromorphic chips.
  • methods: An adaptive control of the refractory period of the leaky integrate-and-fire (LIF) neuron model encodes continuous analog signals into phase-coded spike trains; the neuron model is implemented on a physical circuit.
  • results: The prototype converts continuous sensor signals into time-coded spikes; a digital neuromorphic chip processes the spike trains and computes the signal's frequency spectrum with a spiking Fourier transform for inputs up to 1 kHz, without an ADC, supporting low energy consumption and fast end-to-end neuromorphic processing.
    Abstract Processing sensor data with spiking neural networks on digital neuromorphic chips requires converting continuous analog signals into spike pulses. Two strategies are promising for achieving low energy consumption and fast processing speeds in end-to-end neuromorphic applications. First, to directly encode analog signals to spikes to bypass the need for an analog-to-digital converter (ADC). Second, to use temporal encoding techniques to maximize the spike sparsity, which is a crucial parameter for fast and efficient neuromorphic processing. In this work, we propose an adaptive control of the refractory period of the leaky integrate-and-fire (LIF) neuron model for encoding continuous analog signals into a train of time-coded spikes. The LIF-based encoder generates phase-encoded spikes that are compatible with digital hardware. We implemented the neuron model on a physical circuit and tested it with different electric signals. A digital neuromorphic chip processed the generated spike trains and computed the signal's frequency spectrum using a spiking version of the Fourier transform. We tested the prototype circuit on electric signals up to 1 kHz. Thus, we provide an end-to-end neuromorphic application that generates the frequency spectrum of an electric signal without the need for an ADC or a digital signal processing algorithm.
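    A software model of the encoding idea — a leaky integrate-and-fire neuron whose refractory period adapts to the input level so that spike timing carries the analog amplitude — is sketched below. The threshold, time constant, gain, and adaptation rule are assumptions chosen for illustration, not the parameters of the authors' circuit.

```python
import numpy as np

def lif_phase_encode(signal, fs, v_thresh=1.0, tau=5e-3, gain=2000.0,
                     t_ref_min=0.2e-3, t_ref_max=2.0e-3):
    """Toy leaky integrate-and-fire encoder with an input-adaptive refractory period.
    Stronger inputs shorten the refractory period, so the spike timing (phase)
    tracks the analog amplitude. Returns spike times in seconds."""
    dt = 1.0 / fs
    v, t_ref_left = 0.0, 0.0
    spike_times = []
    for i, x in enumerate(signal):
        if t_ref_left > 0.0:
            t_ref_left -= dt
            continue
        # leaky integration of the (rectified) input current
        v += dt * (-v / tau + gain * max(x, 0.0))
        if v >= v_thresh:
            spike_times.append(i * dt)
            v = 0.0
            # adapt the refractory period to the instantaneous input level
            level = np.clip(abs(x), 0.0, 1.0)
            t_ref_left = t_ref_max - level * (t_ref_max - t_ref_min)
    return np.array(spike_times)

# toy usage: encode a 200 Hz tone sampled at 100 kHz
fs = 100_000
t = np.arange(0, 0.05, 1 / fs)
x = 0.5 * (1 + np.sin(2 * np.pi * 200 * t))      # analog input in [0, 1]
spikes = lif_phase_encode(x, fs)
print(f"{len(spikes)} spikes, first five times (ms): {np.round(spikes[:5]*1e3, 2)}")
```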

Implementation of hyperspectral inversion algorithms on FPGA: Hardware comparison using High Level Synthesis

  • paper_url: http://arxiv.org/abs/2310.01906
  • repo_url: None
  • paper_authors: El Mehdi Abdali, Daniele Picone, Mauro Dalla-Mura, Stéphane Mancini
  • for: This work benchmarks various spectrum-reconstruction (inversion) algorithms and design architectures to identify optimal trade-offs for specific applications.
  • methods: A High-Level Synthesis (HLS) workflow is used to implement and evaluate the different algorithms and architectures on recent FPGAs.
  • results: A unified evaluation framework is provided to quantitatively compare performance and guide the choice of algorithm and architecture; the analysis is applied to spectrum reconstruction from interferometric acquisitions taken by Fourier transform spectrometers.
    Abstract Hyperspectral imaging is gathering significant attention due to its potential in various domains such as geology, agriculture, ecology, and surveillance. However, the associated processing algorithms, which are essential for enhancing output quality and extracting relevant information, are often computationally intensive and have to deal with substantial data volumes. Our focus lies on reconfigurable hardware, particularly recent FPGAs. While FPGA design can be complex, High Level Synthesis (HLS) workflows have emerged as a solution, abstracting low-level design intricacies and enhancing productivity. Despite successful prior efforts using HLS for hyperspectral imaging acceleration, we lack a comprehensive study benchmarking various algorithms and architectures within a unified framework. This study aims to quantitatively evaluate performance across different inversion algorithms and design architectures, providing insights into optimal trade-offs for specific applications. We apply this analysis to the case study of spectrum reconstruction processed from interferometric acquisitions taken by Fourier transform spectrometers.
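    The computational kernel that such FPGA designs accelerate is, at its core, a per-pixel regularized inversion of the interferogram. The Python sketch below shows that kernel in matrix form, with a hypothetical Fourier-transform-spectrometer forward model; it illustrates the arithmetic an HLS design would pipeline, not any of the benchmarked architectures or the paper's forward model.

```python
import numpy as np

def build_fts_operator(n_samples, n_bands, opd_max=1.0):
    """Hypothetical forward model of a Fourier-transform spectrometer:
    each interferogram sample is a cosine mixture of the spectral bands."""
    opd = np.linspace(0, opd_max, n_samples)[:, None]     # optical path differences
    sigma = np.linspace(1, n_bands, n_bands)[None, :]     # wavenumbers (arbitrary units)
    return 0.5 * (1 + np.cos(2 * np.pi * opd * sigma))

def pixelwise_inverse(A, lam=1e-2):
    """Precompute the Tikhonov-regularized pseudo-inverse applied to every pixel.
    The resulting dense matrix-vector product is the kernel an HLS design would pipeline."""
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T)

rng = np.random.default_rng(0)
A = build_fts_operator(n_samples=128, n_bands=32)
A_pinv = pixelwise_inverse(A)
spectrum = rng.random(32)
interferogram = A @ spectrum + 0.01 * rng.standard_normal(128)
spectrum_hat = A_pinv @ interferogram
print("relative error:", np.linalg.norm(spectrum_hat - spectrum) / np.linalg.norm(spectrum))
```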

Waveform Manipulation Against DNN-based Modulation Classification Attacks

  • paper_url: http://arxiv.org/abs/2310.01894
  • repo_url: None
  • paper_authors: Dimitrios Varkatzas, Antonios Argyriou
  • for: Defend against an eavesdropper that uses a deep neural network (DNN) to learn the modulation of wireless communication signals.
  • methods: The emitted waveform is manipulated by mixing a continuous-time frequency-modulated (FM) obfuscating signal with the modulated data, so that a legitimate receiver (LRx) can still demodulate the data while the test error of the eavesdropper's DNN classifier increases.
  • results: With careful selection of the obfuscating-waveform parameters, the eavesdropper's classification performance drops below 10% in AWGN and fading channels with no performance loss at the LRx.
    Abstract In this paper we propose a method for defending against an eavesdropper that uses a Deep Neural Network (DNN) for learning the modulation of wireless communication signals. Our method is based on manipulating the emitted waveform with the aid of a continuous time frequency-modulated (FM) obfuscating signal that is mixed with the modulated data. The resulting waveform allows a legitimate receiver (LRx) to demodulate the data but it increases the test error of a pre-trained or adversarially-trained DNN classifier at the eavesdropper. The scheme works for analog modulation and digital single carrier and multi carrier orthogonal frequency division multiplexing (OFDM) waveforms, while it can be implemented in frame-based wireless protocols. The results indicate that careful selection of the parameters of the obfuscating waveform can drop classification performance at the eavesdropper to less than 10% in AWGN and fading channels with no performance loss at the LRx.
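    A toy version of the mixing idea is sketched below: a QPSK baseband signal is multiplied by a unit-modulus FM obfuscating phase, which the legitimate receiver removes using the shared FM parameters while an eavesdropper observes only the smeared waveform. The pulse shaping, FM message, deviation, and the absence of noise and channel effects are simplifying assumptions; the paper's parameter selection and DNN evaluation are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
fs, n_sym, sps = 1_000_000, 500, 8        # sample rate (Hz), symbols, samples per symbol

# QPSK baseband data (toy rectangular pulse shaping)
symbols = np.exp(1j * (np.pi / 4 + (np.pi / 2) * rng.integers(0, 4, n_sym)))
tx = np.repeat(symbols, sps)

# continuous-time FM obfuscating signal (unit modulus); message and deviation are assumed
t = np.arange(len(tx)) / fs
m = np.sin(2 * np.pi * 1_500 * t)
k_f = 20_000                               # assumed peak frequency deviation (Hz)
phi = 2 * np.pi * k_f * np.cumsum(m) / fs
obfuscation = np.exp(1j * phi)

# transmitted waveform: modulated data mixed with the obfuscating phase
x = tx * obfuscation

# the legitimate receiver knows the FM parameters and strips the obfuscation exactly
rx = x * np.conj(obfuscation)
symbol_error = np.mean(np.abs(rx[::sps] - symbols) > 1e-9)
print("LRx symbol error rate (noise-free):", symbol_error)

# the eavesdropper observes x directly: its phase relative to the data is smeared
residual_phase = np.angle(x * np.conj(tx))
print("phase smearing seen by eavesdropper (std, rad):", round(np.std(residual_phase), 2))
```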

On the Peak-to-Average Power Ratio of Vibration Signals: Analysis and Signal Companding for an Efficient Remote Vibration-Based Condition Monitoring

  • paper_url: http://arxiv.org/abs/2310.01718
  • repo_url: None
  • paper_authors: Sulaiman Aburakhia, Abdallah Shami
  • for: This paper addresses the problem of peak-to-average power ratio (PAPR) control in vibration-based condition monitoring (VBCM) systems and proposes a lightweight autoencoder-based signal companding scheme to improve power efficiency and mitigate the impact of nonlinear distortion.
  • methods: The proposed scheme employs a lightweight reconstruction autoencoder with a compression-based activation function at the source to compress the vibration signals, and a denoising-expansion autoencoder at the destination to expand the compressed signals while minimizing noise enhancement.
  • results: The experimental results demonstrate that the companding scheme prevents nonlinear distortion, improves the efficiency of power amplification at the source, and restores the PAPR characteristics at the destination while avoiding noise expansion, preserving the vibration waveform and the reliability of condition monitoring.
    Abstract Vibration-based condition monitoring (VBCM) is widely utilized in various applications due to its non-destructive nature. Recent advancements in sensor technology, the Internet of Things (IoT), and computing have enabled the facilitation of reliable distributed VBCM where sensor nodes are deployed at multiple locations and connected wirelessly to monitoring centers. However, sensor nodes are typically constrained by limited power resources, necessitating control over the peak-to-average power ratio (PAPR) of the generated vibration signals. Effective control of PAPR is crucial to prevent nonlinear distortion and reduce power consumption within the node. Additionally, avoiding nonlinear distortion in the vibration signal and preserving its waveform is essential to ensure the reliability of condition monitoring. This paper conducts an in-depth analysis of the PAPR of vibration signals in VBCM systems, evaluates the impact of nonlinear power amplification on the system performance, and proposes a lightweight autoencoder-based signal companding scheme to control the PAPR to improve power efficiency and mitigate the impact of nonlinear distortion. The proposed scheme employs a lightweight reconstruction autoencoder with a compression-based activation function in the source to compress the vibration signals and avoid increasing the average power of the compressed signal. In the destination, the proposed scheme uses a denoising-expansion autoencoder to expand the compressed signals while minimizing noise enhancement during the expansion process. The experimental results demonstrate the effectiveness of the proposed companding scheme in preventing nonlinear distortion, improving the efficiency of power amplification in the source, and restoring the PAPR characteristics in the destination while avoiding the undesired effect of noise expansion.
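    To make the PAPR/companding trade-off concrete, the snippet below measures the PAPR of a toy vibration signal (harmonics plus sparse impacts) and applies classical μ-law companding and expansion. μ-law is used here purely as a simple stand-in for the paper's autoencoder-based companding; the signal and parameters are invented for illustration.

```python
import numpy as np

def papr_db(x):
    """Peak-to-average power ratio of a real-valued signal, in dB."""
    return 10 * np.log10(np.max(x**2) / np.mean(x**2))

def mu_compress(x, mu=255.0):
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mu_expand(y, mu=255.0):
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu

# toy vibration-like signal: a few harmonics plus sparse, high-amplitude impacts
rng = np.random.default_rng(0)
fs, dur = 10_000, 1.0
t = np.arange(0, dur, 1 / fs)
vib = 0.2 * np.sin(2 * np.pi * 97 * t) + 0.1 * np.sin(2 * np.pi * 290 * t)
impacts = rng.random(len(t)) < 1e-3
vib[impacts] += rng.uniform(0.6, 0.9, impacts.sum())
vib /= np.max(np.abs(vib))

compressed = mu_compress(vib)
restored = mu_expand(compressed)
print(f"PAPR original:   {papr_db(vib):.1f} dB")
print(f"PAPR compressed: {papr_db(compressed):.1f} dB")
print(f"round-trip max error: {np.max(np.abs(restored - vib)):.2e}")
```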

cs.SD - 2023-10-02

Scaling Up Music Information Retrieval Training with Semi-Supervised Learning

  • paper_url: http://arxiv.org/abs/2310.01353
  • repo_url: None
  • paper_authors: Yun-Ning Hung, Ju-Chiang Wang, Minz Won, Duc Le
  • for: Improve the performance of Music Information Retrieval (MIR) tasks and address the scarcity of labeled data.
  • methods: A semi-supervised teacher-student training approach that iteratively creates and refines pseudo-labels on 240k hours of unlabeled music, combined with knowledge expansion to scale model sizes from under 3M to almost 100M parameters.
  • results: By scaling up both model size and training data, the models achieve state-of-the-art results on several MIR tasks, outperforming both supervised models and models based on self-supervised pretraining.
    Abstract In the era of data-driven Music Information Retrieval (MIR), the scarcity of labeled data has been one of the major concerns to the success of an MIR task. In this work, we leverage the semi-supervised teacher-student training approach to improve MIR tasks. For training, we scale up the unlabeled music data to 240k hours, which is much larger than any public MIR datasets. We iteratively create and refine the pseudo-labels in the noisy teacher-student training process. Knowledge expansion is also explored to iteratively scale up the model sizes from as small as less than 3M to almost 100M parameters. We study the performance correlation between data size and model size in the experiments. By scaling up both model size and training data, our models achieve state-of-the-art results on several MIR tasks compared to models that are either trained in a supervised manner or based on a self-supervised pretrained model. To our knowledge, this is the first attempt to study the effects of scaling up both model and training data for a variety of MIR tasks.
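    The teacher-student loop can be sketched with a toy classifier standing in for the large audio models. The snippet below trains a teacher on a small labeled set, pseudo-labels a large synthetic "unlabeled" pool with a confidence threshold, and retrains a student for a few generations; noise injection, knowledge expansion, and the 240k-hour audio pipeline are not modeled.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n, d=20, n_classes=4):
    """Synthetic stand-in for audio embeddings with class-dependent means."""
    y = rng.integers(0, n_classes, n)
    centers = rng.standard_normal((n_classes, d)) * 2.0
    X = centers[y] + rng.standard_normal((n, d))
    return X, y

X_lab, y_lab = make_data(200)          # small labeled set
X_unlab, _ = make_data(20_000)         # large "unlabeled" pool
X_test, y_test = make_data(2_000)

teacher = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
print("teacher accuracy:", round(teacher.score(X_test, y_test), 3))

student = teacher
for generation in range(3):
    # pseudo-label the unlabeled pool and keep only confident predictions
    proba = student.predict_proba(X_unlab)
    keep = proba.max(axis=1) > 0.9
    X_pl = np.vstack([X_lab, X_unlab[keep]])
    y_pl = np.concatenate([y_lab, proba[keep].argmax(axis=1)])
    # the new student becomes the next generation's teacher
    student = LogisticRegression(max_iter=1000).fit(X_pl, y_pl)
    print(f"generation {generation}: kept {keep.sum()} pseudo-labels, "
          f"accuracy {student.score(X_test, y_test):.3f}")
```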

F0 analysis of Ghanaian pop singing reveals progressive alignment with equal temperament over the past three decades: a case study

  • paper_url: http://arxiv.org/abs/2310.00870
  • repo_url: None
  • paper_authors: Iran R. Roman, Daniel Faronbi, Isabelle Burger-Weiser, Leila Adu-Gilmore
  • for: This study examines how contemporary Ghanaian popular singing combines European and traditional Ghanaian influences, and whether exposure to equal temperament has progressively shifted the singing style.
  • methods: Gaussian mixture modeling (GMM) is applied to F0 values extracted from isolated vocals of songs by the Ghanaian singer Daddy Lumba released between 1989 and 2016.
  • results: Daddy Lumba's singing increasingly aligns with equal temperament in recent years, with decreasing pitch variance and less microtonal content overall, suggesting that Ghanaian scales may be affected by exposure to equal temperament and motivating research that maps and archives singing styles.
    Abstract Contemporary Ghanaian popular singing combines European and traditional Ghanaian influences. We hypothesize that access to technology embedded with equal temperament catalyzed a progressive alignment of Ghanaian singing with equal-tempered scales over time. To test this, we study the Ghanaian singer Daddy Lumba, whose work spans from the earliest Ghanaian electronic style in the late 1980s to the present. Studying a singular musician as a case study allows us to refine our analysis without over-interpreting the findings. We curated a collection of his songs, distributed between 1989 and 2016, to extract F0 values from isolated vocals. We used Gaussian mixture modeling (GMM) to approximate each song's scale and found that the pitch variance has been decreasing over time. We also determined whether the GMM components follow the arithmetic relationships observed in equal-tempered scales, and observed that Daddy Lumba's singing better aligns with equal temperament in recent years. Together, results reveal the impact of exposure to equal-tempered scales, resulting in lessened microtonal content in Daddy Lumba's singing. Our study highlights a potential vulnerability of Ghanaian musical scales and implies a need for research that maps and archives singing styles.
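    The F0-to-scale analysis can be approximated with a short script: convert F0 values to cents, fit a Gaussian mixture, and measure how far each component mean sits from the nearest 100-cent (12-TET) degree. The data below are synthetic and the number of components is assumed; this is a sketch of the method, not the study's pipeline.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def f0_to_cents(f0_hz, ref_hz=440.0):
    """Convert F0 values (Hz) to cents relative to the reference pitch."""
    return 1200 * np.log2(np.asarray(f0_hz) / ref_hz)

def scale_from_f0(f0_hz, n_components=7, seed=0):
    """Fit a GMM to F0 values (in cents) and report each component's deviation
    from the nearest equal-tempered (100-cent) degree."""
    cents = f0_to_cents(f0_hz).reshape(-1, 1)
    gmm = GaussianMixture(n_components=n_components, random_state=seed).fit(cents)
    means = np.sort(gmm.means_.ravel())
    deviation = means - 100 * np.round(means / 100)
    return means, deviation

# toy usage: synthetic F0 samples drawn around slightly detuned scale degrees
rng = np.random.default_rng(1)
degrees = np.array([0, 200, 310, 500, 700, 890, 1100], dtype=float)  # cents, some off-grid
f0 = 440.0 * 2 ** ((rng.choice(degrees, 5000) + rng.normal(0, 15, 5000)) / 1200)
means, dev = scale_from_f0(f0)
print("component means (cents):", np.round(means, 1))
print("deviation from 12-TET (cents):", np.round(dev, 1))
```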

eess.AS - 2023-10-02

A Fused Deep Denoising Sound Coding Strategy for Bilateral Cochlear Implants

  • paper_url: http://arxiv.org/abs/2310.01122
  • repo_url: None
  • paper_authors: Tom Gajecki, Waldo Nogueira
  • for: The paper aims to improve speech intelligibility in noisy environments for individuals with severe sensorineural hearing loss using bilateral cochlear implants (BiCI).
  • methods: The authors propose a deep-learning-based bilateral speech enhancement model that shares information between both hearing sides, connecting two monaural end-to-end deep denoising sound coding techniques through intermediary latent fusion layers.
  • results: The proposed fused BiCI sound coding strategy achieves higher interaural coherence, superior noise reduction, and enhanced predicted speech intelligibility scores compared to the baseline methods, with speech-in-noise intelligibility in BiCI users reaching scores similar to those achieved in quiet conditions.
    Abstract Cochlear implants (CIs) provide a solution for individuals with severe sensorineural hearing loss to regain their hearing abilities. When someone experiences this form of hearing impairment in both ears, they may be equipped with two separate CI devices, which will typically further improve the CI benefits. This spatial hearing is particularly crucial when tackling the challenge of understanding speech in noisy environments, a common issue CI users face. Currently, extensive research is dedicated to developing algorithms that can autonomously filter out undesired background noises from desired speech signals. At present, some research focuses on achieving end-to-end denoising, either as an integral component of the initial CI signal processing or by fully integrating the denoising process into the CI sound coding strategy. This work is presented in the context of bilateral CI (BiCI) systems, where we propose a deep-learning-based bilateral speech enhancement model that shares information between both hearing sides. Specifically, we connect two monaural end-to-end deep denoising sound coding techniques through intermediary latent fusion layers. These layers amalgamate the latent representations generated by these techniques by multiplying them together, resulting in an enhanced ability to reduce noise and improve learning generalization. The objective instrumental results demonstrate that the proposed fused BiCI sound coding strategy achieves higher interaural coherence, superior noise reduction, and enhanced predicted speech intelligibility scores compared to the baseline methods. Furthermore, our speech-in-noise intelligibility results in BiCI users reveal that the deep denoising sound coding strategy can attain scores similar to those achieved in quiet conditions.
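    One plausible reading of the "intermediary latent fusion layers" — two monaural encoder/decoder coders whose latent representations are multiplied element-wise before decoding — is sketched below in PyTorch with deliberately tiny dimensions. The architecture, layer sizes, and fusion placement are assumptions for illustration, not the authors' network or training setup.

```python
import torch
import torch.nn as nn

class MonauralDenoiser(nn.Module):
    """Tiny 1-D conv encoder/decoder standing in for one ear's denoising coder."""
    def __init__(self, ch=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv1d(1, ch, 9, stride=2, padding=4), nn.ReLU(),
                                 nn.Conv1d(ch, ch, 9, stride=2, padding=4), nn.ReLU())
        self.dec = nn.Sequential(nn.ConvTranspose1d(ch, ch, 9, stride=2, padding=4,
                                                    output_padding=1), nn.ReLU(),
                                 nn.ConvTranspose1d(ch, 1, 9, stride=2, padding=4,
                                                    output_padding=1))

class FusedBilateralDenoiser(nn.Module):
    """Two monaural coders whose latent representations are fused by element-wise
    multiplication before decoding, one interpretation of a latent fusion layer."""
    def __init__(self, ch=32):
        super().__init__()
        self.left, self.right = MonauralDenoiser(ch), MonauralDenoiser(ch)

    def forward(self, x_left, x_right):
        z_left, z_right = self.left.enc(x_left), self.right.enc(x_right)
        fused = z_left * z_right                      # shared information between sides
        return self.left.dec(fused), self.right.dec(fused)

# toy usage: a batch of 1-second signals for each ear at 16 kHz
model = FusedBilateralDenoiser()
x_l, x_r = torch.randn(2, 1, 16000), torch.randn(2, 1, 16000)
y_l, y_r = model(x_l, x_r)
print(y_l.shape, y_r.shape)   # torch.Size([2, 1, 16000]) each
```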